CN114863951B - Rapid dysarthria detection method based on modal decomposition - Google Patents
- Publication number
- CN114863951B (application CN202210807739.8A)
- Authority
- CN
- China
- Prior art keywords
- signal
- dysarthria
- mel
- modal
- decomposition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/48—Other medical applications
- A61B5/4803—Speech analysis specially adapted for diagnostic purposes
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7235—Details of waveform analysis
- A61B5/725—Details of waveform analysis using specific filters therefor, e.g. Kalman or adaptive filters
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7235—Details of waveform analysis
- A61B5/7253—Details of waveform analysis characterised by using transforms
- A61B5/7257—Details of waveform analysis characterised by using transforms using Fourier transforms
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7235—Details of waveform analysis
- A61B5/7264—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
- A61B5/7267—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/87—Detection of discrete points within a voice signal
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention relates to a rapid dysarthria detection method based on modal decomposition, comprising the following steps: collecting an original voice signal; preprocessing the original voice signal; performing modal-decomposition-based acoustic feature extraction on the framed and windowed signal S to obtain statistical features; and inputting the statistical features into a machine learning classifier to detect dysarthria, the classifier being a support vector machine (SVM) model. The method overcomes the limitation of traditional acoustic features in nonlinear time-varying systems: the decomposed IMFs contain time-frequency information of the original audio signal at different levels, can well capture the speech physiological information of dysarthric patients, and reflect pathological changes of the vocal organs, improving the accuracy and robustness of dysarthria detection. The method can adapt to nonlinear and non-stationary voice signals, further improving the detection effect.
Description
Technical Field
The invention relates to the technical field of dysarthria detection, and in particular to a rapid dysarthria detection method based on modal decomposition.
Background
Dysarthria is an abnormality of respiration, phonation, resonance and prosody caused by pathological changes or morphological abnormalities of the vocal organs or nervous system. It manifests as difficulty in phonation, inaccurate articulation, and abnormalities of voice quality, pitch, rate and rhythm, together with perceptual speech changes such as hypernasality. Dysarthria is clinically common in neurological diseases such as Parkinson's disease and severely affects patients' quality of life and capacity for social participation.
Currently there is no dedicated assessment standard for dysarthria; clinical practice mostly relies on subjective auditory-perceptual methods such as the Frenchay Dysarthria Assessment. These require examination, recording and scoring by professional speech therapists or neurologists in neurology and rehabilitation departments, demand substantial time and labor, and suffer from inter-rater variability, which limits their wide clinical adoption. Features expressed in the speech of dysarthric patients, such as voice tremor, rhythm disturbance and lowered pitch, can be characterized by voice-signal analysis, enabling automatic dysarthria recognition.
Speech signal processing offers a non-invasive route to automatic dysarthria detection. The patent published as CN112927696A introduces an automatic dysarthria assessment system and method based on speech recognition, achieving objective and accurate assessment by combining deep learning; however, it requires a large number of continuous-speech samples from dysarthric patients to train a dedicated speech recognition system, its extracted acoustic features are fixed hand-crafted features, it incurs a large time cost, and its detection robustness depends heavily on the trained speech model. In the academic literature, articulation- and prosody-related features have been applied to dysarthria detection, but these features do not account for the time-frequency characteristics of sound. Other studies rely on spectral and cepstral features, which only reflect the static characteristics of a speech signal; processing nonlinear and non-stationary speech directly with them has limitations, ignores implicit information in the sound, and yields detection that is insufficiently accurate and robust.
Disclosure of Invention
The invention aims to provide a rapid dysarthria detection method based on modal decomposition that improves the accuracy and robustness of dysarthria detection and thus its overall effectiveness, while being convenient to operate, quick and low-cost, favoring large-scale popularization and application.
To achieve this aim, the invention adopts the following technical scheme. A rapid dysarthria detection method based on modal decomposition comprises the following steps:
(1) executing a speech paradigm through a standardized testing process and collecting the original voice signal x(t); the original speech signal x(t) is a time series consisting of a number of sampling points, where t is the sampling-point index and t ∈ ℕ, the set of natural numbers (positive integers);
(2) preprocessing the original voice signal x(t), including endpoint detection, pre-emphasis, framing and windowing, to obtain the framed and windowed signal S;
(3) performing modal-decomposition-based acoustic feature extraction on the framed and windowed signal S to obtain statistical features;
(4) inputting the statistical features into a machine learning classifier and outputting the dysarthria detection result; the machine learning classifier is a support vector machine (SVM) model.
In the step (1), the testing process comprises the standardization of a testing environment, an audio acquisition device and a data transmission and storage mode; the speech paradigm is a continuous vowel pronunciation task.
In step (2), endpoint detection uses a double-threshold algorithm: short-time energy and short-time zero-crossing rate features locate the effective speech segment of the original voice signal x(t), avoiding interference from non-speech segments in the subsequent analysis. Pre-emphasis balances the spectrum and improves the signal-to-noise ratio using a first-order filter:
y(t) = x(t) − α·x(t−1)
where α is the pre-emphasis coefficient, y(t) is the pre-emphasized speech signal, and t ∈ ℕ is the sampling-point index. Framing and windowing are realized by applying a Hamming window w(n) to each frame of y(t):
s_w(n) = s(n)·w(n), 0 ≤ n ≤ N − 1
where s(n) is a frame of y(t) and N is the frame length; the Hamming window w(n) is:
w(n) = 0.54 − 0.46·cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1.
the step (3) specifically comprises the following steps:
(3a) performing modal decomposition on the framed and windowed signal S: the CEEMDAN algorithm (Complete Ensemble Empirical Mode Decomposition with Adaptive Noise) decomposes S into k intrinsic mode components (IMFs);
(3b) applying the short-time Fourier transform (STFT) to each intrinsic mode component IMF and stacking the results in order of the modal components' frequency to obtain the spectrum matrix D(ω);
(3c) calculating the periodogram-based power spectrum estimate corresponding to the spectrum matrix D(ω), then applying a Mel filter bank and summing the energy within the frequency window of each filter to obtain the Mel frequency spectrum S_mel;
(3d) taking the logarithm of the Mel frequency spectrum S_mel and applying the discrete cosine transform (DCT) to obtain the L-dimensional cepstral coefficients C;
(3e) computing the first-order difference coefficients Δ and second-order difference coefficients Δ² corresponding to the L-dimensional cepstral coefficients C; Δ and Δ² are appended to C to form a 3L-dimensional feature vector;
(3f) computing, at the utterance level, statistical features of each dimension of the 3L-dimensional feature vector across all frames.
In step (3a), the CEEMDAN algorithm specifically comprises the following steps:
(3a1) superimposing Gaussian white noise with mean 0 and standard deviation 1 on the framed and windowed signal S and taking the complete ensemble average to obtain the first modal component:
IMF₁ = (1/I)·Σ_{i=1}^{I} E₁(S + ε₀·w⁽ⁱ⁾)
where Eₙ(·) denotes the n-th component obtained by empirical mode decomposition, I is the number of ensemble trials, and w⁽ⁱ⁾ is the white noise superimposed in the i-th trial, with amplitude ε₀;
(3a2) computing the first residual component r₁ = S − IMF₁ and superimposing the first modal component of the white noise again to obtain a new signal:
r₁ + ε₁·E₁(w⁽ⁱ⁾)
(3a3) decomposing the new signal with the computation of step (3a1) to obtain the second modal component:
IMF₂ = (1/I)·Σ_{i=1}^{I} E₁(r₁ + ε₁·E₁(w⁽ⁱ⁾))
(3a4) for n = 2, …, computing the n-th residual rₙ = rₙ₋₁ − IMFₙ and the (n+1)-th modal component:
IMFₙ₊₁ = (1/I)·Σ_{i=1}^{I} E₁(rₙ + εₙ·Eₙ(w⁽ⁱ⁾))
(3a5) repeating step (3a4) until the residual signal can no longer be decomposed; at this point the framed and windowed signal S is expressed as:
S = Σ_{n=1}^{k} IMFₙ + r_k
where k is the number of intrinsic mode components IMF obtained by the CEEMDAN decomposition and r_k is the final residual component.
In step (3b), the spectrum matrix D(ω) is computed as:
D(ω) = [D₁(ω), D₂(ω), …, D_k(ω)]ᵀ
where k is the number of intrinsic mode components IMF obtained by the CEEMDAN decomposition and D_k(ω) is the complex spectrum of the k-th intrinsic mode component after the short-time Fourier transform STFT at angular frequency ω.
In step (3c), the Mel frequency spectrum S_mel is computed as:
S_mel(n) = Σ_{p=1}^{P} H_n(p)·|D(p)|²/P, n = 1, …, M
where M is the number of filters, H_n(p) is the Mel-scale triangular filter function at the n-th frequency bin, |D(p)|²/P is the periodogram-based power spectrum estimate, and P is the number of points of the short-time Fourier transform STFT.
In step (3d), the i-th coefficient Cᵢ of the L-dimensional cepstral coefficients C is computed as:
Cᵢ = Σ_{m=1}^{M} log(S_mel(m))·cos(iπ(m − 0.5)/M), i = 1, …, L
where M is the number of filters and L is the dimension of the cepstral coefficients C.
In step (3e), the i-th coefficient Δᵢ of the L-dimensional first-order difference Δ is defined as:
Δᵢ(t) = Σ_{q=1}^{Q} q·(Cᵢ(t+q) − Cᵢ(t−q)) / (2·Σ_{q=1}^{Q} q²)
where Q is the time offset used for computing the difference and 1 ≤ i ≤ L, with L the dimension of the first-order difference Δ;
the i-th coefficient Δ²ᵢ of the L-dimensional second-order difference Δ² is defined by applying the same formula to the first-order difference Δᵢ.
in step (3 f), the statistical features include mean, standard deviation, skewness, and kurtosis.
According to the above technical scheme, the invention has the following beneficial effects. First, the introduction of modal-decomposition-based time-frequency analysis overcomes the limitation of traditional acoustic features in nonlinear time-varying systems: the decomposed IMFs contain time-frequency information of the original audio signal at different levels, can well capture the speech physiological information of dysarthric patients, and reflect pathological changes of the vocal organs, improving the accuracy and robustness of dysarthria detection. Second, the Mel-scale features reflect the nonlinear characteristics of the speech production mechanism and of auditory perception; combined with modal decomposition, the method can adapt to nonlinear and non-stationary voice signals, further improving the detection effect. Third, the CEEMDAN-based signal decomposition avoids the mode-aliasing and time-frequency distribution errors of traditional modal decomposition methods, accurately representing articulatory information with good completeness and effectively improving computational efficiency. Fourth, the testing process is standardized and highly automated, the test paradigm is simple, and the method is convenient to operate, quick and low-cost, favoring large-scale popularization and application.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
fig. 2 is a schematic diagram of an acoustic feature extraction method based on modal decomposition in fig. 1.
Detailed Description
As shown in fig. 1, a method for rapidly detecting dysarthria based on modal decomposition includes the following steps:
(1) executing a speech paradigm through a standardized testing process and collecting the original voice signal x(t); the original speech signal x(t) is a time series consisting of a number of sampling points, where t is the sampling-point index and t ∈ ℕ, the set of natural numbers (positive integers);
the speech paradigm is performed and voice data is collected through a standardized testing procedure that includes standardization of the testing environment, the audio acquisition device, the data transmission and storage means, recorded in a room with low ambient background noise (less than 45 dB), by a condenser microphone placed about 10 cm directly in front of the subject's mouth. The microphone is connected to a professional sound card, converted into an audio signal and transmitted to a computer, and simultaneously sampled to 44.1kHz frequency and 16bit resolution, and stored in a single-track wav format.
(2) preprocessing the original voice signal x(t), including endpoint detection, pre-emphasis, framing and windowing, to obtain the framed and windowed signal S;
(3) performing modal-decomposition-based acoustic feature extraction on the framed and windowed signal S to obtain statistical features;
(4) inputting the statistical characteristics into a machine learning classifier, and outputting a dysarthria detection result, wherein the machine learning classifier is a Support Vector Machine (SVM) model.
In step (1), the testing process comprises standardization of the testing environment, the audio acquisition device, and the data transmission and storage mode; the speech paradigm is a sustained vowel phonation task. The subject is asked to inhale deeply and then phonate a vowel steadily, at a comfortable pitch and loudness, for as long as possible; the measurement is repeated several times. The vowel is the monophthong /a/, and the measurement is repeated 3 times.
In step (2), endpoint detection uses a double-threshold algorithm: short-time energy and short-time zero-crossing rate features locate the effective speech segment of the original voice signal x(t), avoiding interference from non-speech segments in the subsequent analysis. Pre-emphasis balances the spectrum and improves the signal-to-noise ratio using a first-order filter:
y(t) = x(t) − α·x(t−1)
where α is the pre-emphasis coefficient, y(t) is the pre-emphasized speech signal, and t ∈ ℕ is the sampling-point index. Framing and windowing are realized by applying a Hamming window w(n) to each frame of y(t):
s_w(n) = s(n)·w(n), 0 ≤ n ≤ N − 1
where s(n) is a frame of y(t) and N is the frame length; the Hamming window w(n) is:
w(n) = 0.54 − 0.46·cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1.
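The preprocessing of step (2) can be sketched as follows. This is a minimal NumPy illustration; the function names, frame length and hop size are illustrative assumptions rather than values fixed by the patent.

```python
import numpy as np

def preemphasis(x, alpha=0.97):
    # First-order pre-emphasis filter: y(t) = x(t) - alpha * x(t-1).
    return np.append(x[0], x[1:] - alpha * x[:-1])

def frame_and_window(y, frame_len=1024, hop=512):
    # Split y into overlapping frames and apply a Hamming window
    # w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)) to each frame.
    n_frames = 1 + (len(y) - frame_len) // hop
    w = np.hamming(frame_len)
    frames = np.stack([y[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    return frames * w
```

The Hamming window tapers frame edges and reduces spectral leakage before the later STFT stage.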
As shown in fig. 2, the CEEMDAN algorithm is an improved modal decomposition method that repeatedly superimposes Gaussian white noise on the original signal; completeness of the decomposition is achieved with a small number of iterations, reducing the computational cost. Step (3) specifically comprises the following steps:
(3a) performing modal decomposition on the framed and windowed signal S: the CEEMDAN algorithm (Complete Ensemble Empirical Mode Decomposition with Adaptive Noise) decomposes S into k intrinsic mode components (IMFs);
(3b) applying the short-time Fourier transform (STFT) to each intrinsic mode component IMF and stacking the results in order of the modal components' frequency to obtain the spectrum matrix D(ω);
(3c) calculating the periodogram-based power spectrum estimate corresponding to the spectrum matrix D(ω), then applying a Mel filter bank and summing the energy within the frequency window of each filter to obtain the Mel frequency spectrum S_mel;
(3d) taking the logarithm of the Mel frequency spectrum S_mel and applying the discrete cosine transform (DCT) to obtain the L-dimensional cepstral coefficients C;
(3e) computing the first-order difference coefficients Δ and second-order difference coefficients Δ² corresponding to the L-dimensional cepstral coefficients C; Δ and Δ² are appended to C to form a 3L-dimensional feature vector characterizing the dynamic information of the speech;
(3f) computing, at the utterance level, statistical features of each dimension of the 3L-dimensional feature vector across all frames.
In step (3a), the CEEMDAN algorithm specifically comprises the following steps:
(3a1) superimposing Gaussian white noise with mean 0 and standard deviation 1 on the framed and windowed signal S and taking the complete ensemble average to obtain the first modal component:
IMF₁ = (1/I)·Σ_{i=1}^{I} E₁(S + ε₀·w⁽ⁱ⁾)
where Eₙ(·) denotes the n-th component obtained by empirical mode decomposition, I is the number of ensemble trials, and w⁽ⁱ⁾ is the white noise superimposed in the i-th trial, with amplitude ε₀;
(3a2) computing the first residual component r₁ = S − IMF₁ and superimposing the first modal component of the white noise again to obtain a new signal:
r₁ + ε₁·E₁(w⁽ⁱ⁾)
(3a3) decomposing the new signal with the computation of step (3a1) to obtain the second modal component:
IMF₂ = (1/I)·Σ_{i=1}^{I} E₁(r₁ + ε₁·E₁(w⁽ⁱ⁾))
(3a4) for n = 2, …, computing the n-th residual rₙ = rₙ₋₁ − IMFₙ and the (n+1)-th modal component:
IMFₙ₊₁ = (1/I)·Σ_{i=1}^{I} E₁(rₙ + εₙ·Eₙ(w⁽ⁱ⁾))
(3a5) repeating step (3a4) until the residual signal can no longer be decomposed; at this point the framed and windowed signal S is expressed as:
S = Σ_{n=1}^{k} IMFₙ + r_k
where k is the number of intrinsic mode components IMF obtained by the CEEMDAN decomposition and r_k is the final residual component.
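Stage (3a1), the noise-assisted ensemble average, can be sketched as below. This is a conceptual sketch only: a simplified fixed-count sifting stands in for a full empirical mode decomposition with a stopping criterion, and the names and parameters (I, eps, n_sift) are illustrative assumptions, not the patent's implementation.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_first_imf(x, n_sift=8):
    # Simplified E1(.): estimate the first IMF by repeated sifting with
    # cubic-spline envelopes (full EMD would use a stop criterion).
    h = x.astype(float).copy()
    t = np.arange(len(h))
    for _ in range(n_sift):
        mx = argrelextrema(h, np.greater)[0]
        mn = argrelextrema(h, np.less)[0]
        if len(mx) < 4 or len(mn) < 4:
            break
        upper = CubicSpline(mx, h[mx])(t)
        lower = CubicSpline(mn, h[mn])(t)
        h = h - (upper + lower) / 2.0   # subtract the local envelope mean
    return h

def ceemdan_first_mode(s, I=50, eps=0.2, seed=0):
    # Step (3a1): IMF1 = (1/I) * sum_i E1(s + eps * w_i), w_i ~ N(0, 1).
    rng = np.random.default_rng(seed)
    acc = np.zeros(len(s))
    for _ in range(I):
        w = rng.standard_normal(len(s))
        acc += sift_first_imf(s + eps * w)
    return acc / I
```

Averaging over many noise realizations is what cancels the added noise while stabilizing the decomposition against mode aliasing.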
In step (3b), the spectrum matrix D(ω) is computed as:
D(ω) = [D₁(ω), D₂(ω), …, D_k(ω)]ᵀ
where k is the number of intrinsic mode components IMF obtained by the CEEMDAN decomposition and D_k(ω) is the complex spectrum of the k-th intrinsic mode component after the short-time Fourier transform STFT at angular frequency ω.
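Step (3b) can be sketched with SciPy's STFT, assuming the IMFs arrive already ordered by characteristic frequency (CEEMDAN yields them from high to low); the stacking order and STFT parameters here are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft

def spectrum_matrix(imfs, fs=44100, nperseg=256):
    # Apply the STFT to each intrinsic mode component and stack the
    # resulting complex spectra row-wise into one matrix D(w).
    specs = [stft(imf, fs=fs, nperseg=nperseg)[2] for imf in imfs]
    return np.vstack(specs)
```

Each IMF contributes nperseg//2 + 1 frequency rows, so the stacked matrix keeps the per-mode time-frequency structure separated.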
In step (3c), the Mel frequency spectrum S_mel is computed as:
S_mel(n) = Σ_{p=1}^{P} H_n(p)·|D(p)|²/P, n = 1, …, M
where M is the number of filters, M = 26; H_n(p) is the Mel-scale triangular filter function at the n-th frequency bin; |D(p)|²/P is the periodogram-based power spectrum estimate; and P is the number of points of the short-time Fourier transform STFT, P = 256.
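Step (3c) can be sketched with a standard triangular Mel filter bank. The mel-scale formula and bin mapping below are the common textbook (HTK-style) construction and are assumptions, since the patent does not spell them out; M = 26 and P = 256 follow the embodiment.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(M=26, P=256, fs=44100):
    # M triangular filters, equally spaced on the mel scale, over the
    # P-point STFT bins. Adjacent filters may collapse to zero width at
    # low frequencies for small P; max(..., 1) guards the division.
    n_bins = P // 2 + 1
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), M + 2)
    bins = np.floor((P + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    H = np.zeros((M, n_bins))
    for m in range(1, M + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            H[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            H[m - 1, k] = (r - k) / max(r - c, 1)
    return H

def mel_spectrum(power_spec, H):
    # Sum the periodogram energy inside each filter's frequency window.
    return H @ power_spec
```

Applying `H` to a power spectrum yields the M-dimensional Mel energies that feed the log/DCT stage of step (3d).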
In step (3d), the i-th coefficient Cᵢ of the L-dimensional cepstral coefficients C is computed as:
Cᵢ = Σ_{m=1}^{M} log(S_mel(m))·cos(iπ(m − 0.5)/M), i = 1, …, L
where M is the number of filters, and L is the dimension of the cepstral coefficients C, L = 13.
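The DCT of step (3d) maps the log-Mel energies to cepstral coefficients; the sketch below is a direct transcription of the formula above (the function name is illustrative).

```python
import numpy as np

def mel_cepstral_coeffs(S_mel, L=13):
    # C_i = sum_{m=1..M} log(S_mel[m]) * cos(i*pi*(m - 0.5)/M), i = 1..L
    M = len(S_mel)
    log_S = np.log(S_mel)
    m = np.arange(1, M + 1)
    return np.array([np.sum(log_S * np.cos(i * np.pi * (m - 0.5) / M))
                     for i in range(1, L + 1)])
```

Keeping only the first L = 13 coefficients discards fast spectral ripple and retains the smooth envelope most informative about articulation.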
In step (3e), the i-th coefficient Δᵢ of the L-dimensional first-order difference Δ is defined as:
Δᵢ(t) = Σ_{q=1}^{Q} q·(Cᵢ(t+q) − Cᵢ(t−q)) / (2·Σ_{q=1}^{Q} q²)
where Q is the time offset used for computing the difference, Q = 2, and 1 ≤ i ≤ L, with L the dimension of the first-order difference Δ;
the i-th coefficient Δ²ᵢ of the L-dimensional second-order difference Δ² is defined by applying the same formula to the first-order difference Δᵢ.
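The difference coefficients of step (3e) can be sketched per coefficient trajectory as follows; handling the edge frames by boundary replication is an assumption, as the patent does not specify it.

```python
import numpy as np

def delta(c, Q=2):
    # First-order difference of one cepstral-coefficient trajectory c(t):
    # delta(t) = sum_{q=1..Q} q*(c(t+q) - c(t-q)) / (2 * sum_{q=1..Q} q^2)
    denom = 2.0 * sum(q * q for q in range(1, Q + 1))
    cp = np.pad(np.asarray(c, dtype=float), Q, mode='edge')
    T = len(c)
    return np.array([sum(q * (cp[t + Q + q] - cp[t + Q - q])
                         for q in range(1, Q + 1)) / denom
                     for t in range(T)])
```

The second-order coefficients Δ² are obtained simply as `delta(delta(c))`, matching the definition above.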
in step (3 f), the statistical features include mean, standard deviation, skewness, and kurtosis.
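Step (3f) reduces the frame-level 3L-dimensional features to one utterance-level vector; a sketch using SciPy's moment statistics (the function name is illustrative):

```python
import numpy as np
from scipy.stats import kurtosis, skew

def utterance_statistics(F):
    # F: (n_frames, 3L) frame-level feature matrix. For each dimension,
    # compute mean, standard deviation, skewness and kurtosis across all
    # frames, concatenated into a single utterance-level vector.
    return np.concatenate([F.mean(axis=0), F.std(axis=0),
                           skew(F, axis=0), kurtosis(F, axis=0)])
```

With L = 13 the frame features are 39-dimensional, so the utterance-level vector has 4 × 39 = 156 entries.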
Example one
120 speech samples were collected, comprising 60 dysarthric patients and 60 gender- and age-matched healthy controls. All acoustic features were extracted and divided into a training set and a test set; the training set was input into a support vector machine (SVM) model for training with cross-validation, and the model was then evaluated on the test set. The final results are shown in Table 1 below; in this embodiment, the detection accuracy reached 86.1%.
TABLE 1. Dysarthria detection test results

| Evaluation index | Accuracy | F1 score | AUC |
|---|---|---|---|
| Value | 0.8611 | 0.8718 | 0.8302 |
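The classification stage of step (4) and the embodiment's train/cross-validate/test protocol can be sketched with scikit-learn. The real 120-sample feature matrix is not available here, so synthetic stand-in data are used; the feature count, split ratio and kernel choice are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 120-utterance statistical-feature matrix.
X, y = make_classification(n_samples=120, n_features=52, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
cv = cross_val_score(clf, X_tr, y_tr, cv=5)   # cross-validation on train
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)                    # held-out test accuracy
```

Standardizing the statistical features before the RBF-kernel SVM matters because mean, variance, skewness and kurtosis live on very different scales.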
In conclusion, the method overcomes the limitation of traditional acoustic features in nonlinear time-varying systems: the decomposed IMFs contain time-frequency information of the original audio signal at different levels, can well capture the speech physiological information of dysarthric patients, and reflect pathological changes of the vocal organs, improving the accuracy and robustness of dysarthria detection. The Mel-scale features reflect the nonlinear characteristics of the speech production mechanism and of auditory perception; combined with modal decomposition, the method adapts to nonlinear and non-stationary voice signals and further improves the detection of dysarthria.
Claims (1)
1. A dysarthria rapid detection method based on modal decomposition is characterized in that: the method comprises the following steps in sequence:
(1) executing a speech paradigm through a standardized testing process and collecting the original voice signal x(t); the original speech signal x(t) is a time series consisting of a number of sampling points, where t is the sampling-point index and t ∈ ℕ, the set of natural numbers (positive integers);
(2) preprocessing the original voice signal x(t), including endpoint detection, pre-emphasis, framing and windowing, to obtain the framed and windowed signal S;
(3) performing modal-decomposition-based acoustic feature extraction on the framed and windowed signal S to obtain statistical features;
(4) inputting the statistical characteristics into a machine learning classifier, and outputting a dysarthria detection result, wherein the machine learning classifier is a Support Vector Machine (SVM) model;
in step (1), the testing process comprises standardization of the testing environment, the audio acquisition device, and the data transmission and storage mode; the speech paradigm is a continuous vowel pronunciation task;
in step (2), endpoint detection uses a double-threshold algorithm: short-time energy and short-time zero-crossing rate features locate the effective speech segment of the original voice signal x(t), avoiding interference from non-speech segments in the subsequent analysis; pre-emphasis balances the spectrum and improves the signal-to-noise ratio using a first-order filter:
y(t) = x(t) − α·x(t−1)
where α is the pre-emphasis coefficient, y(t) is the pre-emphasized speech signal, and t ∈ ℕ is the sampling-point index; framing and windowing are realized by applying a Hamming window w(n) to each frame of y(t):
s_w(n) = s(n)·w(n), 0 ≤ n ≤ N − 1
where s(n) is a frame of y(t) and N is the frame length; the Hamming window w(n) is:
w(n) = 0.54 − 0.46·cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1;
the step (3) specifically comprises the following steps:
(3a) performing modal decomposition on the framed and windowed signal S: the CEEMDAN algorithm (Complete Ensemble Empirical Mode Decomposition with Adaptive Noise) decomposes S into k intrinsic mode components (IMFs);
(3b) applying the short-time Fourier transform (STFT) to each intrinsic mode component IMF and stacking the results in order of the modal components' frequency to obtain the spectrum matrix D(ω);
(3c) calculating the periodogram-based power spectrum estimate corresponding to the spectrum matrix D(ω), then applying a Mel filter bank and summing the energy within the frequency window of each filter to obtain the Mel frequency spectrum S_mel;
(3d) taking the logarithm of the Mel frequency spectrum S_mel and applying the discrete cosine transform (DCT) to obtain the L-dimensional cepstral coefficients C;
(3e) computing the first-order difference coefficients Δ and second-order difference coefficients Δ² corresponding to the L-dimensional cepstral coefficients C; Δ and Δ² are appended to C to form a 3L-dimensional feature vector;
(3f) computing, at the utterance level, statistical features of each dimension of the 3L-dimensional feature vector across all frames;
in step (3a), the CEEMDAN algorithm specifically comprises the following steps:
(3a1) superimposing Gaussian white noise with mean 0 and standard deviation 1 on the framed and windowed signal S and taking the complete ensemble average to obtain the first modal component:
IMF₁ = (1/I)·Σ_{i=1}^{I} E₁(S + ε₀·w⁽ⁱ⁾)
where Eₙ(·) denotes the n-th component obtained by empirical mode decomposition, I is the number of ensemble trials, and w⁽ⁱ⁾ is the white noise superimposed in the i-th trial, with amplitude ε₀;
(3a2) computing the first residual component r₁ = S − IMF₁ and superimposing the first modal component of the white noise again to obtain a new signal:
r₁ + ε₁·E₁(w⁽ⁱ⁾)
(3a3) decomposing the new signal with the computation of step (3a1) to obtain the second modal component:
IMF₂ = (1/I)·Σ_{i=1}^{I} E₁(r₁ + ε₁·E₁(w⁽ⁱ⁾))
(3a4) for n = 2, …, computing the n-th residual rₙ = rₙ₋₁ − IMFₙ and the (n+1)-th modal component:
IMFₙ₊₁ = (1/I)·Σ_{i=1}^{I} E₁(rₙ + εₙ·Eₙ(w⁽ⁱ⁾))
(3a5) repeating step (3a4) until the residual signal can no longer be decomposed; at this point the framed and windowed signal S is expressed as:
S = Σ_{n=1}^{k} IMFₙ + r_k
where k is the number of intrinsic mode components IMF obtained by the CEEMDAN decomposition and r_k is the final residual component;
in step (3b), the spectrum matrix D(ω) is computed as:
D(ω) = [D₁(ω), D₂(ω), …, D_k(ω)]ᵀ
where k is the number of intrinsic mode components IMF obtained by the CEEMDAN decomposition and D_k(ω) is the complex-valued spectrum of the k-th intrinsic mode component after the short-time Fourier transform STFT at angular frequency ω;
In step (3c), the Mel spectrum $S_{mel}$ is computed as:

$$S_{mel}(n) = \sum_{p=1}^{P} \big|D(p)\big|^2 H_n(p), \quad n = 1, \dots, M$$

where $M$ is the number of filters, $H_n(\cdot)$ is the Mel-scale triangular filter function associated with the $n$-th filter, and $P$ is the number of points of the short-time Fourier transform (STFT);
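A sketch of step (3c) with a hand-rolled Mel filterbank; the sampling rate, FFT size, and filter count are illustrative assumptions, not values from the patent:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters H_n with peaks spaced uniformly on the Mel scale."""
    mel_pts = np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for n in range(1, n_filters + 1):
        lo, ctr, hi = bins[n - 1], bins[n], bins[n + 1]
        for p in range(lo, ctr):                   # rising edge
            fb[n - 1, p] = (p - lo) / max(ctr - lo, 1)
        for p in range(ctr, hi):                   # falling edge
            fb[n - 1, p] = (hi - p) / max(hi - ctr, 1)
    return fb

def mel_spectrum(D, sr=16000, n_filters=26):
    """Step (3c): pool the power spectrum |D|^2 through the M filters."""
    power = np.abs(D) ** 2
    fb = mel_filterbank(n_filters, 2 * (D.shape[1] - 1), sr)
    return power @ fb.T  # shape (n_frames, n_filters)
```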
In step (3d), the $i$-th coefficient $C_i$ of the $L$-dimensional cepstral coefficients $C$ is computed as:

$$C_i = \sum_{m=1}^{M} \log\big(S_{mel}(m)\big)\cos\left(\frac{\pi i (m - 0.5)}{M}\right), \quad i = 1, \dots, L$$

where $M$ is the number of filters and $L$ is the dimension of the cepstral coefficients;
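Step (3d) can be sketched as a logarithm followed by an explicit type-II DCT basis. The small floor added before the log is an assumption to avoid log(0), and the basis rows are indexed 0 to L-1 here (index 0 is the DC/energy term; the patent's formula numbers the coefficients 1 to L):

```python
import numpy as np

def cepstral_coeffs(S_mel, L=13):
    """Step (3d): take the log of the Mel spectrum and apply an explicit
    type-II DCT basis, keeping the first L coefficients per frame."""
    logS = np.log(S_mel + 1e-10)  # small floor so log(0) cannot occur
    M = S_mel.shape[1]
    m = np.arange(M)
    basis = np.cos(np.pi * np.outer(np.arange(L), m + 0.5) / M)
    return logS @ basis.T  # shape (n_frames, L)
```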
In step (3e), the $i$-th coefficient $\Delta_i$ of the $L$-dimensional first-order difference coefficients $\Delta$ is defined as:

$$\Delta_i = \frac{\sum_{q=1}^{Q} q\,\big(C_{i+q} - C_{i-q}\big)}{2\sum_{q=1}^{Q} q^2}$$

where $Q$ is the time difference over which the difference is computed, and $L$ is the dimension of the first-order difference coefficients;
The $i$-th coefficient $\Delta^2_i$ of the $L$-dimensional second-order difference coefficients $\Delta^2$ is defined by applying the same formula to the first-order coefficients:

$$\Delta^2_i = \frac{\sum_{q=1}^{Q} q\,\big(\Delta_{i+q} - \Delta_{i-q}\big)}{2\sum_{q=1}^{Q} q^2}$$
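Both difference orders of step (3e) can share one regression routine applied twice. The edge-replication padding is an assumption; the patent does not specify how frames outside the utterance are handled:

```python
import numpy as np

def delta(C, Q=2):
    """Step (3e): regression-based difference over +/-Q neighbouring frames,
    Delta_t = sum_q q * (C[t+q] - C[t-q]) / (2 * sum_q q^2)."""
    padded = np.pad(C, ((Q, Q), (0, 0)), mode="edge")  # repeat edge frames
    num = sum(q * (padded[Q + q : len(C) + Q + q]
                   - padded[Q - q : len(C) + Q - q])
              for q in range(1, Q + 1))
    return num / (2.0 * sum(q * q for q in range(1, Q + 1)))

def delta2(C, Q=2):
    """The second-order coefficients: the same formula applied to Delta."""
    return delta(delta(C, Q), Q)
```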
In step (3f), the statistical features include the mean, standard deviation, skewness, and kurtosis.
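Step (3f) can be sketched as pooling each feature dimension across all frames into the four statistics, giving one fixed-length vector per utterance. The non-excess kurtosis convention and the epsilon guard against zero variance are assumptions:

```python
import numpy as np

def pooled_stats(features):
    """Step (3f): pool each dimension of the per-frame feature matrix
    (n_frames, n_dims) across frames into mean, standard deviation,
    skewness, and (non-excess) kurtosis."""
    mu = features.mean(axis=0)
    sd = features.std(axis=0)
    z = (features - mu) / (sd + 1e-10)  # guard against zero variance
    skew = (z ** 3).mean(axis=0)
    kurt = (z ** 4).mean(axis=0)
    return np.concatenate([mu, sd, skew, kurt])  # length 4 * n_dims
```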
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210807739.8A CN114863951B (en) | 2022-07-11 | 2022-07-11 | Rapid dysarthria detection method based on modal decomposition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114863951A CN114863951A (en) | 2022-08-05 |
CN114863951B true CN114863951B (en) | 2022-09-23 |
Family
ID=82626745
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016206704A1 (en) * | 2015-06-25 | 2016-12-29 | Abdalla Magd Ahmed Kotb | The smart stethoscope |
CN106328120A (en) * | 2016-08-17 | 2017-01-11 | 重庆大学 | Public place abnormal sound characteristic extraction method |
EP3200188A1 (en) * | 2016-01-27 | 2017-08-02 | Telefonica Digital España, S.L.U. | Computer implemented methods for assessing a disease through voice analysis and computer programs thereof |
CN112183582A (en) * | 2020-09-07 | 2021-01-05 | 中国海洋大学 | Multi-feature fusion underwater target identification method |
CN114613389A (en) * | 2022-03-16 | 2022-06-10 | 大连交通大学 | Non-speech audio feature extraction method based on improved MFCC |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111210845B (en) * | 2019-12-20 | 2022-06-21 | 太原理工大学 | Pathological voice detection device based on improved autocorrelation characteristics |
US11158302B1 (en) * | 2020-05-11 | 2021-10-26 | New Oriental Education & Technology Group Inc. | Accent detection method and accent detection device, and non-transitory storage medium |
US20240042149A1 (en) * | 2020-07-31 | 2024-02-08 | Resmed Sensor Technologies Limited | Systems and methods for determining movement during respiratory therapy |
Non-Patent Citations (2)
Title |
---|
Biswajit Karan et al., "Parkinson disease prediction using intrinsic mode function based features from speech signal," Biocybernetics and Biomedical Engineering, Mar. 2020, pp. 249-264. *
Cheng Long et al., "Research on bird call recognition based on improved MFCC" (in Chinese), Journal of Communication University of China (Science and Technology), Jun. 2017, vol. 24, no. 3, pp. 41-44. *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Panek et al. | Acoustic analysis assessment in speech pathology detection | |
Asgari et al. | Predicting severity of Parkinson's disease from speech | |
AU2013274940B2 (en) | Cepstral separation difference | |
CN105825852A (en) | Oral English reading test scoring method | |
US20190298271A1 (en) | Methods and systems for estimation of obstructive sleep apnea severity in wake subjects by multiple speech analyses | |
CN108198576A (en) | A kind of Alzheimer's disease prescreening method based on phonetic feature Non-negative Matrix Factorization | |
Wallen et al. | A screening test for speech pathology assessment using objective quality measures | |
Khan et al. | Cepstral separation difference: A novel approach for speech impairment quantification in Parkinson's disease | |
CN110223688A (en) | A kind of self-evaluating system of compressed sensing based hepatolenticular degeneration disfluency | |
El Emary et al. | Towards developing a voice pathologies detection system | |
Khan et al. | Assessing Parkinson's disease severity using speech analysis in non-native speakers | |
EP4125088A1 (en) | Method and device for predicting potentially difficult airway based on machine learning voice technology | |
Cordella et al. | Classification-based screening of Parkinson’s disease patients through voice signal | |
Usman et al. | Dataset of raw and pre-processed speech signals, Mel Frequency Cepstral Coefficients of Speech and Heart Rate measurements | |
Dubey et al. | Pitch-Adaptive Front-end Feature for Hypernasality Detection. | |
CN114863951B (en) | Rapid dysarthria detection method based on modal decomposition | |
CN114822567B (en) | Pathological voice frequency spectrum reconstruction method based on energy operator | |
Godino-Llorente et al. | Discriminative methods for the detection of voice disorders | |
CN107910019A (en) | A kind of human acoustical signal's processing and analysis method | |
CN110211566A (en) | A kind of classification method of compressed sensing based hepatolenticular degeneration disfluency | |
Tripathi et al. | Automatic speech intelligibility assessment in dysarthric subjects | |
Usman et al. | Dataset of raw and pre-processed speech signals | |
CN112908343B (en) | Acquisition method and system for bird species number based on cepstrum spectrogram | |
Zhang et al. | Automated Detection of Wilson's Disease Based on Improved Mel-frequency Cepstral Coefficients with Signal Decomposition. | |
Cai et al. | Recognition and Extraction of Cough Sound from Audio Signals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||