CN114863951A - Rapid dysarthria detection method based on modal decomposition - Google Patents

Rapid dysarthria detection method based on modal decomposition

Info

Publication number: CN114863951A
Application number: CN202210807739.8A
Authority: CN (China)
Prior art keywords: dysarthria, decomposition, signal, modal, detection method
Legal status: Granted (Active)
Other languages: Chinese (zh)
Other versions: CN114863951B
Inventors: 李海 (Li Hai), 张政霖 (Zhang Zhenglin), 杨立状 (Yang Lizhuang), 王宏志 (Wang Hongzhi), 江海河 (Jiang Haihe)
Current and Original Assignee: Hefei Institutes of Physical Science of CAS
Priority/Filing date: 2022-07-11
Publication dates: 2022-08-05 (CN114863951A), 2022-09-23 (CN114863951B)

Classifications

    • G10L25/66: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, specially adapted for comparison or discrimination, for extracting parameters related to health condition
    • A61B5/4803: Measuring for diagnostic purposes; speech analysis specially adapted for diagnostic purposes
    • A61B5/725: Details of waveform analysis using specific filters therefor, e.g. Kalman or adaptive filters
    • A61B5/7257: Details of waveform analysis characterised by using transforms, using Fourier transforms
    • A61B5/7267: Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems, involving training the classification device
    • G06F18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches, based on the proximity to a decision surface, e.g. support vector machines
    • G10L25/21: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being power information
    • G10L25/24: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L25/87: Detection of discrete points within a voice signal
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Medical Informatics (AREA)
  • Veterinary Medicine (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Surgery (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Biophysics (AREA)
  • Psychiatry (AREA)
  • Physiology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Fuzzy Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Epidemiology (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention relates to a rapid dysarthria detection method based on modal decomposition, which comprises the following steps: collecting an original speech signal; preprocessing the original speech signal; performing acoustic feature extraction on the framed and windowed signal S based on modal decomposition to obtain statistical features; and inputting the statistical features into a machine learning classifier to realize the detection of dysarthria, wherein the machine learning classifier is a support vector machine (SVM) model. The invention overcomes the limitation of traditional acoustic features in nonlinear time-varying systems: the IMFs obtained by decomposition contain time-frequency information of the original audio signal at different levels, so the speech physiological information of dysarthric patients can be well captured and pathological changes of the vocal organs reflected, improving the accuracy and robustness of dysarthria detection. The method can adapt to nonlinear and non-stationary speech signals, further improving the dysarthria detection effect.

Description

Rapid dysarthria detection method based on modal decomposition
Technical Field
The invention relates to the technical field of dysarthria detection, in particular to a rapid dysarthria detection method based on modal decomposition.
Background
Dysarthria comprises abnormalities of respiration, phonation, resonance and rhythm caused by pathological changes or morphological abnormalities of the vocal organs or the nervous system. It manifests as difficult or inaccurate phonation, slurred articulation, abnormalities of voice quality, pitch, rate and rhythm, and changes in speech and auditory characteristics such as excessive nasality. Dysarthria is clinically common in neurological diseases such as Parkinson's disease and seriously affects patients' quality of life and ability to participate in social life.
Currently, there is no dedicated assessment standard for dysarthria, and subjective auditory-perceptual methods, such as the Frenchay Dysarthria Assessment, are mostly adopted clinically. These require examination, recording and scoring by professional speech therapists, neurologists, or rehabilitation physicians, demand high time and labor costs, and suffer from inter-rater differences, which limits their wide clinical adoption. Features expressed in the speech of dysarthric patients, such as voice tremor, rhythm disorder and monotone pitch, can be characterized through speech signal analysis, enabling automatic dysarthria recognition.
Speech signal processing technology provides a non-invasive route to automatic dysarthria detection. The patent published as CN112927696A introduces an automatic dysarthria assessment system and method based on speech recognition, combining deep learning to achieve objective and accurate assessment. However, it requires a large number of continuous-speech samples from dysarthric patients to train a specific speech recognition system, and the extracted acoustic features are fixed hand-crafted features, so the method carries a high time cost and its detection robustness depends heavily on the trained speech model. In the academic literature, pronunciation- and prosody-related features have been applied to dysarthria detection, but these features do not take the time-frequency characteristics of sound into account. Other studies rely on spectral and cepstral features, but these only reflect static characteristics of the speech signal and have limitations when directly processing nonlinear, non-stationary speech; implicit information in the sound is thus ignored, detection is not accurate enough, and robustness is poor.
Disclosure of Invention
The invention aims to provide a rapid dysarthria detection method based on modal decomposition that improves the accuracy and robustness of dysarthria detection and further improves the detection effect, while being convenient to operate, quick, and low-cost, which favors large-scale popularization and application.
In order to achieve this purpose, the invention adopts the following technical scheme: a rapid dysarthria detection method based on modal decomposition, comprising the following steps:
(1) Executing a speech paradigm through a standardized testing process to collect an original speech signal $x(t)$, where $x(t)$ is a time series of sampling points, $t$ is the sampling-point index, $t \in \mathbb{N}$, and $\mathbb{N}$ is the set of natural numbers, i.e., the set of positive integers;
(2) Preprocessing the original speech signal $x(t)$: including endpoint detection, pre-emphasis, framing and windowing, to obtain the framed and windowed signal S;
(3) Performing acoustic feature extraction on the framed and windowed signal S based on modal decomposition to obtain statistical features;
(4) Inputting the statistical features into a machine learning classifier and outputting the dysarthria detection result, wherein the machine learning classifier is a support vector machine (SVM) model.
In step (1), the testing process comprises standardization of the testing environment, the audio acquisition device, and the data transmission and storage mode; the speech paradigm is a sustained vowel phonation task.
In step (2), the endpoint detection adopts a double-threshold algorithm that uses short-time energy and short-time zero-crossing rate features to determine the start and end points of the original speech signal $x(t)$ and obtain the valid speech segment, avoiding interference from non-speech segments in subsequent analysis; the pre-emphasis balances the spectrum and improves the signal-to-noise ratio, using the first-order filter

$$y(t) = x(t) - \alpha\, x(t-1)$$

where $\alpha$ is the pre-emphasis coefficient, $y(t)$ is the pre-emphasized speech signal, $t$ is the sampling-point index, $t \in \mathbb{N}$, and $\mathbb{N}$ is the set of natural numbers; the framing and windowing are realized by applying a Hamming window $w(n)$ to each frame of $y(t)$:

$$S(n) = s(n)\, w(n), \quad 0 \le n \le N-1$$

where $s(n)$ is a frame of $y(t)$, $N$ is the frame length, and $w(n)$ is the Hamming window:

$$w(n) = 0.54 - 0.46 \cos\!\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$
the step (3) specifically comprises the following steps:
(3a) for framing and windowing the signalSCarrying out modal decomposition: framing and windowed signals by adopting CEEMDAN algorithm, namely ensemble empirical mode decomposition algorithmSIs decomposed into𝑘An intrinsic mode component IMF;
(3b) applying short-time Fourier transform (STFT) to each intrinsic modal component IMF, and stacking the result according to the frequency value of the modal component in a positive sequence to obtain a frequency spectrum matrixD(ω);
(3c) Calculating a spectrum matrixD(ω) Corresponding periodogram-based power spectrum estimation, followed by application of a Mel Filter Bank and summing of the energies within the frequency windows of each filter of the Mel Filter Bank to obtain the Mel frequency spectraS mel
(3d) For Mel frequency spectrumS mel Taking logarithm and performing Discrete Cosine Transform (DCT) to obtainLDimensional cepstrum coefficientsC
(3e) Respectively calculateLDimensional cepstrum coefficientsCCorresponding first order difference coefficient Δ and second order difference coefficient Δ 2 First order difference coefficient Δ and second order difference coefficient Δ 2 Is attached toLDimensional cepstrum coefficientsCUpper formation 3LDimensional feature vector
Figure 100002_DEST_PATH_IMAGE012
(3f) Computation at the Speech level 3LDimensional feature vector
Figure 104293DEST_PATH_IMAGE012
Statistical features across all frames per dimension.
In step (3a), the CEEMDAN algorithm specifically comprises the following steps:
(3a1) Superposing on the framed and windowed signal S a Gaussian white noise with mean 0 and standard deviation 1, and computing the ensemble average to obtain the first modal component:

$$\mathrm{IMF}_1 = \frac{1}{I}\sum_{i=1}^{I} E_1\!\left(S + \varepsilon_0\, w^{(i)}\right)$$

where $E_n(\cdot)$ denotes the $n$-th component obtained by empirical mode decomposition, $I$ is the number of trials in the ensemble average, $w^{(i)}$ is the white noise superposed in the $i$-th trial, and $\varepsilon_0$ is its amplitude;
(3a2) Computing the first residual component $r_1 = S - \mathrm{IMF}_1$ and superposing on it the first EMD component of the white noise to obtain the new signal $r_1 + \varepsilon_1 E_1\!\left(w^{(i)}\right)$;
(3a3) Decomposing the new signal with the calculation of step (3a1) to obtain the second modal component:

$$\mathrm{IMF}_2 = \frac{1}{I}\sum_{i=1}^{I} E_1\!\left(r_1 + \varepsilon_1 E_1\!\left(w^{(i)}\right)\right)$$

(3a4) By analogy, computing in turn the $n$-th residual $r_n = r_{n-1} - \mathrm{IMF}_n$ and the $(n+1)$-th modal component:

$$\mathrm{IMF}_{n+1} = \frac{1}{I}\sum_{i=1}^{I} E_1\!\left(r_n + \varepsilon_n E_n\!\left(w^{(i)}\right)\right)$$

(3a5) Repeating step (3a4) until the residual signal can no longer be decomposed, at which point the framed and windowed signal S is expressed as:

$$S = \sum_{n=1}^{k} \mathrm{IMF}_n + r_k$$

where $k$ is the number of intrinsic mode functions (IMFs) obtained by the CEEMDAN decomposition and $r_k$ is the final residual component.
In step (3b), the spectrum matrix $D(\omega)$ is computed as:

$$D(\omega) = \left[\, D_1(\omega),\ D_2(\omega),\ \ldots,\ D_k(\omega) \,\right]^{\mathsf{T}}$$

where $k$ is the number of IMFs obtained by the CEEMDAN decomposition and $D_k(\omega)$ is the $k$-th IMF after the short-time Fourier transform (STFT) at angular frequency $\omega$.
In step (3c), the Mel spectrum $S_{\mathrm{mel}}$ is computed as:

$$S_{\mathrm{mel}}(m) = \sum_{n=1}^{P} \frac{1}{P}\left|D(\omega_n)\right|^2 H_m(\omega_n), \quad m = 1, \ldots, M$$

where $M$ is the number of filters, $H_m(\omega_n)$ is the Mel-scale triangular filter function evaluated at the $n$-th frequency point, and $P$ is the number of points of the STFT.
In step (3d), the $i$-th coefficient $C_i$ of the $L$-dimensional cepstral coefficients $C$ is computed as:

$$C_i = \sum_{m=1}^{M} \log\!\left(S_{\mathrm{mel}}(m)\right) \cos\!\left(\frac{\pi i\,(m - 0.5)}{M}\right), \quad i = 1, \ldots, L$$

where $M$ is the number of filters and $L$ is the dimension of the cepstral coefficient vector $C$.
In step (3e), the $i$-th coefficient $\Delta_i$ of the $L$-dimensional first-order difference coefficients $\Delta$ is defined as:

$$\Delta_i = \frac{\sum_{q=1}^{Q} q\left(C_{i+q} - C_{i-q}\right)}{2\sum_{q=1}^{Q} q^2}, \quad 1 \le i \le L$$

where $Q$ is the time offset used to compute the difference and $L$ is the dimension of the first-order difference coefficients $\Delta$; the $i$-th coefficient $\Delta^2_i$ of the $L$-dimensional second-order difference coefficients $\Delta^2$ is defined by applying the same formula to $\Delta$:

$$\Delta^2_i = \frac{\sum_{q=1}^{Q} q\left(\Delta_{i+q} - \Delta_{i-q}\right)}{2\sum_{q=1}^{Q} q^2}, \quad 1 \le i \le L$$
in step (3 f), the statistical features include mean, standard deviation, skewness, and kurtosis.
According to the above technical scheme, the beneficial effects of the invention are as follows. First, the introduction of a time-frequency analysis theory based on modal decomposition overcomes the limitation of traditional acoustic features in nonlinear time-varying systems: the IMFs obtained by decomposition contain time-frequency information of the original audio signal at different levels, so the speech physiological information of dysarthric patients can be well captured, pathological changes of the vocal organs are reflected, and the accuracy and robustness of dysarthria detection are improved. Second, Mel-scale features reflect the nonlinear characteristics of the speech production mechanism and of auditory perception; combined with modal decomposition, the method can adapt to nonlinear and non-stationary speech signals, further improving the dysarthria detection effect. Third, the CEEMDAN-based signal decomposition avoids the mode-aliasing and time-frequency distribution errors of traditional modal decomposition methods, thereby accurately representing articulatory information with good completeness and effectively improving computational efficiency. Fourth, the testing process is standardized and highly automated, the testing paradigm is simple, and the method is convenient to operate, quick, and low-cost, which favors large-scale popularization and application.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the modal-decomposition-based acoustic feature extraction in FIG. 1.
Detailed Description
As shown in fig. 1, a method for rapidly detecting dysarthria based on modal decomposition includes the following steps:
(1) Executing a speech paradigm through a standardized testing process to collect an original speech signal $x(t)$, where $x(t)$ is a time series of sampling points, $t$ is the sampling-point index, $t \in \mathbb{N}$, and $\mathbb{N}$ is the set of natural numbers, i.e., the set of positive integers;
the speech paradigm is performed and voice data is collected through a standardized testing procedure that includes standardization of the testing environment, the audio acquisition device, the data transmission and storage means, recorded in a room with low ambient background noise (less than 45 dB), by a condenser microphone placed about 10 cm directly in front of the subject's mouth. The microphone is connected to a professional sound card, converted into an audio signal and transmitted to a computer, and simultaneously sampled to 44.1kHz frequency and 16bit resolution, and stored in a single-track wav format.
(2) Preprocessing the original speech signal $x(t)$: including endpoint detection, pre-emphasis, framing and windowing, to obtain the framed and windowed signal S;
(3) Performing acoustic feature extraction on the framed and windowed signal S based on modal decomposition to obtain statistical features;
(4) Inputting the statistical features into a machine learning classifier and outputting the dysarthria detection result, wherein the machine learning classifier is a support vector machine (SVM) model.
In step (1), the testing process comprises standardization of the testing environment, the audio acquisition device, and the data transmission and storage mode; the speech paradigm is a sustained vowel phonation task. The subject is required to take a deep breath and then phonate a vowel steadily, at a comfortable pitch and loudness, for as long as possible, repeating the measurement several times. The vowel is the monophthong /a/, and the measurement is repeated 3 times.
In step (2), the endpoint detection adopts a double-threshold algorithm that uses short-time energy and short-time zero-crossing rate features to determine the start and end points of the original speech signal $x(t)$ and obtain the valid speech segment, avoiding interference from non-speech segments in subsequent analysis; the pre-emphasis balances the spectrum and improves the signal-to-noise ratio, using the first-order filter

$$y(t) = x(t) - \alpha\, x(t-1)$$

where $\alpha$ is the pre-emphasis coefficient, $y(t)$ is the pre-emphasized speech signal, $t$ is the sampling-point index, $t \in \mathbb{N}$, and $\mathbb{N}$ is the set of natural numbers; the framing and windowing are realized by applying a Hamming window $w(n)$ to each frame of $y(t)$:

$$S(n) = s(n)\, w(n), \quad 0 \le n \le N-1$$

where $s(n)$ is a frame of $y(t)$, $N$ is the frame length, and $w(n)$ is the Hamming window:

$$w(n) = 0.54 - 0.46 \cos\!\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$
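A minimal sketch of this preprocessing chain follows, assuming NumPy; the frame length, hop size, and the energy and zero-crossing thresholds are illustrative assumptions (the patent fixes none of them), and the endpoint detector is a simplified single-pass variant of the double-threshold idea:

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """First-order pre-emphasis filter: y(t) = x(t) - alpha * x(t-1)."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def endpoint_detection(x, frame_len=1024, hop=512,
                       energy_thr=0.02, zcr_thr=0.15):
    """Simplified double-threshold VAD: keep frames whose short-time
    energy is high and whose zero-crossing rate is low (voiced speech)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    mask = np.zeros(len(x), dtype=bool)
    for i in range(n_frames):
        f = x[i * hop : i * hop + frame_len]
        energy = np.mean(f ** 2)
        zcr = np.mean(np.abs(np.diff(np.sign(f)))) / 2
        if energy > energy_thr and zcr < zcr_thr:
            mask[i * hop : i * hop + frame_len] = True
    return x[mask]

def frame_and_window(y, frame_len=1024, hop=512):
    """Split y into overlapping frames and apply the Hamming window
    w(n) = 0.54 - 0.46*cos(2*pi*n / (N-1))."""
    n_frames = 1 + (len(y) - frame_len) // hop
    w = np.hamming(frame_len)
    frames = np.stack([y[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    return frames * w  # each row is one framed-and-windowed signal S
```

A typical call order mirrors step (2): `frames = frame_and_window(pre_emphasis(endpoint_detection(x)))`.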
As shown in FIG. 2, the CEEMDAN algorithm is an improved modal decomposition method that repeatedly superposes Gaussian white noise on the original signal; completeness of the decomposition can be achieved with a small number of iterations, thereby reducing the computational cost. The step (3) specifically comprises the following steps:
(3a) Performing modal decomposition on the framed and windowed signal S: the CEEMDAN algorithm (complete ensemble empirical mode decomposition with adaptive noise) decomposes S into $k$ intrinsic mode functions (IMFs);
(3b) Applying the short-time Fourier transform (STFT) to each IMF and stacking the results in ascending order of the modal components' frequency values to obtain the spectrum matrix $D(\omega)$;
(3c) Computing the periodogram-based power spectrum estimate corresponding to $D(\omega)$, then applying a Mel filter bank and summing the energies within the frequency window of each filter to obtain the Mel spectrum $S_{\mathrm{mel}}$;
(3d) Taking the logarithm of the Mel spectrum $S_{\mathrm{mel}}$ and applying the discrete cosine transform (DCT) to obtain $L$-dimensional cepstral coefficients $C$;
(3e) Computing the first-order difference coefficients $\Delta$ and second-order difference coefficients $\Delta^2$ of $C$, and appending them to $C$ to form a $3L$-dimensional feature vector $F$ that characterizes the dynamic information of the speech;
(3f) Computing, at the utterance level, the statistical features of each dimension of $F$ across all frames.
In step (3a), the CEEMDAN algorithm specifically comprises the following steps:
(3a1) Superposing on the framed and windowed signal S a Gaussian white noise with mean 0 and standard deviation 1, and computing the ensemble average to obtain the first modal component:

$$\mathrm{IMF}_1 = \frac{1}{I}\sum_{i=1}^{I} E_1\!\left(S + \varepsilon_0\, w^{(i)}\right)$$

where $E_n(\cdot)$ denotes the $n$-th component obtained by empirical mode decomposition, $I$ is the number of trials in the ensemble average, $w^{(i)}$ is the white noise superposed in the $i$-th trial, and $\varepsilon_0$ is its amplitude;
(3a2) Computing the first residual component $r_1 = S - \mathrm{IMF}_1$ and superposing on it the first EMD component of the white noise to obtain the new signal $r_1 + \varepsilon_1 E_1\!\left(w^{(i)}\right)$;
(3a3) Decomposing the new signal with the calculation of step (3a1) to obtain the second modal component:

$$\mathrm{IMF}_2 = \frac{1}{I}\sum_{i=1}^{I} E_1\!\left(r_1 + \varepsilon_1 E_1\!\left(w^{(i)}\right)\right)$$

(3a4) By analogy, computing in turn the $n$-th residual $r_n = r_{n-1} - \mathrm{IMF}_n$ and the $(n+1)$-th modal component:

$$\mathrm{IMF}_{n+1} = \frac{1}{I}\sum_{i=1}^{I} E_1\!\left(r_n + \varepsilon_n E_n\!\left(w^{(i)}\right)\right)$$

(3a5) Repeating step (3a4) until the residual signal can no longer be decomposed, at which point the framed and windowed signal S is expressed as:

$$S = \sum_{n=1}^{k} \mathrm{IMF}_n + r_k$$

where $k$ is the number of intrinsic mode functions (IMFs) obtained by the CEEMDAN decomposition and $r_k$ is the final residual component.
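For reference, the open-source EMD-signal (PyEMD) package ships a CEEMDAN implementation; the sketch below, which assumes that package is acceptable, decomposes one framed-and-windowed signal into its IMFs:

```python
import numpy as np
from PyEMD import CEEMDAN  # pip install EMD-signal

def decompose_frame(s, trials=100):
    """Decompose one framed-and-windowed signal S into k IMFs with
    CEEMDAN; `trials` plays the role of I, the number of noise
    realizations in the ensemble average (100 is an illustrative value)."""
    ceemdan = CEEMDAN(trials=trials)
    return ceemdan(s.astype(np.float64))  # array of shape (k, frame_len)
```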
In step (3b), the spectrum matrix $D(\omega)$ is computed as:

$$D(\omega) = \left[\, D_1(\omega),\ D_2(\omega),\ \ldots,\ D_k(\omega) \,\right]^{\mathsf{T}}$$

where $k$ is the number of IMFs obtained by the CEEMDAN decomposition and $D_k(\omega)$ is the $k$-th IMF after the short-time Fourier transform (STFT) at angular frequency $\omega$.
In step (3c), the Mel spectrum $S_{\mathrm{mel}}$ is computed as:

$$S_{\mathrm{mel}}(m) = \sum_{n=1}^{P} \frac{1}{P}\left|D(\omega_n)\right|^2 H_m(\omega_n), \quad m = 1, \ldots, M$$

where $M$ is the number of filters, here $M = 26$; $H_m(\omega_n)$ is the Mel-scale triangular filter function evaluated at the $n$-th frequency point; and $P$ is the number of points of the STFT, here $P = 256$.
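Assuming librosa's Mel filter construction is acceptable, step (3c) can be sketched with the patent's $M = 26$ filters and $P = 256$ STFT points; applying the filter bank row-by-row (one row per IMF) is an illustrative reading:

```python
import numpy as np
import librosa

def mel_spectrum(D, fs=44100, nfft=256, n_mels=26):
    """Periodogram power estimate of each row of D(w), followed by a
    Mel filter bank; returns one Mel spectrum per IMF row."""
    power = (np.abs(D) ** 2) / nfft               # periodogram estimate
    fb = librosa.filters.mel(sr=fs, n_fft=nfft, n_mels=n_mels)
    return power @ fb.T                           # shape: (k, n_mels)
```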
In step (3d), the $i$-th coefficient $C_i$ of the $L$-dimensional cepstral coefficients $C$ is computed as:

$$C_i = \sum_{m=1}^{M} \log\!\left(S_{\mathrm{mel}}(m)\right) \cos\!\left(\frac{\pi i\,(m - 0.5)}{M}\right), \quad i = 1, \ldots, L$$

where $M$ is the number of filters and $L$ is the dimension of the cepstral coefficient vector $C$, here $L = 13$.
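Step (3d) is thus a log compression followed by a DCT; a sketch with scipy, keeping the patent's $L = 13$ coefficients:

```python
import numpy as np
from scipy.fft import dct

def cepstral_coefficients(S_mel, L=13):
    """Take the logarithm of the Mel spectrum and apply a type-II DCT,
    retaining the first L cepstral coefficients (L = 13 per the patent)."""
    log_mel = np.log(S_mel + 1e-10)  # small epsilon guards against log(0)
    return dct(log_mel, type=2, norm="ortho", axis=-1)[..., :L]
```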
In step (3e), the $i$-th coefficient $\Delta_i$ of the $L$-dimensional first-order difference coefficients $\Delta$ is defined as:

$$\Delta_i = \frac{\sum_{q=1}^{Q} q\left(C_{i+q} - C_{i-q}\right)}{2\sum_{q=1}^{Q} q^2}, \quad 1 \le i \le L$$

where $Q$ is the time offset used to compute the difference, here $Q = 2$, and $L$ is the dimension of the first-order difference coefficients $\Delta$; the $i$-th coefficient $\Delta^2_i$ of the $L$-dimensional second-order difference coefficients $\Delta^2$ is defined by applying the same formula to $\Delta$:

$$\Delta^2_i = \frac{\sum_{q=1}^{Q} q\left(\Delta_{i+q} - \Delta_{i-q}\right)}{2\sum_{q=1}^{Q} q^2}, \quad 1 \le i \le L$$
in step (3 f), the statistical features include mean, standard deviation, skewness, and kurtosis.
Example one
120 speech samples were collected, comprising 60 dysarthric patients and 60 gender- and age-matched healthy controls. All acoustic features were extracted and divided into a training set and a test set; the training set was input into a support vector machine (SVM) model for training and cross-validation, and finally an experimental test was carried out on the test set. The final results are shown in Table 1 below. In this example, the detection accuracy reaches 86.1%.
TABLE 1. Dysarthria detection test results

Evaluation index    Accuracy    F1 score    AUC
Value               0.8611      0.8718      0.8302
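The classification stage of this example can be sketched with scikit-learn; the train/test split ratio, RBF kernel, and 5-fold cross-validation are illustrative assumptions, since the embodiment only states that an SVM was trained and cross-validated:

```python
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_and_evaluate(X, y):
    """X: (n_samples, n_features) utterance-level statistics;
    y: 1 = dysarthric, 0 = healthy control (label coding assumed)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    cv_acc = cross_val_score(clf, X_tr, y_tr, cv=5).mean()
    clf.fit(X_tr, y_tr)
    return cv_acc, clf.score(X_te, y_te)  # CV and held-out accuracy
```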
In conclusion, the method overcomes the limitation of traditional acoustic features in nonlinear time-varying systems: the decomposed IMFs contain time-frequency information of the original audio signal at different levels, so the speech physiological information of dysarthric patients can be well captured and pathological changes of the vocal organs reflected, improving the accuracy and robustness of dysarthria detection. Mel-scale features reflect the nonlinear characteristics of the speech production mechanism and auditory perception; combined with modal decomposition, the method adapts to nonlinear and non-stationary speech signals and further improves the dysarthria detection effect.

Claims (10)

1. A rapid dysarthria detection method based on modal decomposition, characterized in that the method comprises the following steps in sequence:
(1) Executing a speech paradigm through a standardized testing process to collect an original speech signal $x(t)$, where $x(t)$ is a time series of sampling points, $t$ is the sampling-point index, $t \in \mathbb{N}$, and $\mathbb{N}$ is the set of natural numbers, i.e., the set of positive integers;
(2) Preprocessing the original speech signal $x(t)$: including endpoint detection, pre-emphasis, framing and windowing, to obtain the framed and windowed signal S;
(3) Performing acoustic feature extraction on the framed and windowed signal S based on modal decomposition to obtain statistical features;
(4) Inputting the statistical features into a machine learning classifier and outputting the dysarthria detection result, wherein the machine learning classifier is a support vector machine (SVM) model.
2. The rapid dysarthria detection method based on modal decomposition according to claim 1, wherein in step (1), the testing process comprises standardization of the testing environment, the audio acquisition device, and the data transmission and storage mode; the speech paradigm is a sustained vowel phonation task.
3. The rapid dysarthria detection method based on modal decomposition according to claim 1, wherein in step (2), the endpoint detection adopts a double-threshold algorithm that uses short-time energy and short-time zero-crossing rate features to determine the start and end points of the original speech signal $x(t)$ and obtain the valid speech segment, avoiding interference from non-speech segments in subsequent analysis; the pre-emphasis balances the spectrum and improves the signal-to-noise ratio, using the first-order filter

$$y(t) = x(t) - \alpha\, x(t-1)$$

where $\alpha$ is the pre-emphasis coefficient, $y(t)$ is the pre-emphasized speech signal, $t$ is the sampling-point index, $t \in \mathbb{N}$, and $\mathbb{N}$ is the set of natural numbers; the framing and windowing are realized by applying a Hamming window $w(n)$ to each frame of $y(t)$:

$$S(n) = s(n)\, w(n), \quad 0 \le n \le N-1$$

where $s(n)$ is a frame of $y(t)$, $N$ is the frame length, and $w(n)$ is the Hamming window:

$$w(n) = 0.54 - 0.46 \cos\!\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$
4. The rapid dysarthria detection method based on modal decomposition according to claim 1, wherein the step (3) specifically comprises the following steps:
(3a) Performing modal decomposition on the framed and windowed signal S: the CEEMDAN algorithm (complete ensemble empirical mode decomposition with adaptive noise) decomposes S into $k$ intrinsic mode functions (IMFs);
(3b) Applying the short-time Fourier transform (STFT) to each IMF and stacking the results in ascending order of the modal components' frequency values to obtain the spectrum matrix $D(\omega)$;
(3c) Computing the periodogram-based power spectrum estimate corresponding to $D(\omega)$, then applying a Mel filter bank and summing the energies within the frequency window of each filter to obtain the Mel spectrum $S_{\mathrm{mel}}$;
(3d) Taking the logarithm of the Mel spectrum $S_{\mathrm{mel}}$ and applying the discrete cosine transform (DCT) to obtain $L$-dimensional cepstral coefficients $C$;
(3e) Computing the first-order difference coefficients $\Delta$ and second-order difference coefficients $\Delta^2$ of $C$, and appending them to $C$ to form a $3L$-dimensional feature vector $F$;
(3f) Computing, at the utterance level, the statistical features of each dimension of $F$ across all frames.
5. The rapid dysarthria detection method based on modal decomposition according to claim 4, wherein in step (3a), the CEEMDAN algorithm specifically comprises the following steps:
(3a1) Superposing on the framed and windowed signal S a Gaussian white noise with mean 0 and standard deviation 1, and computing the ensemble average to obtain the first modal component:

$$\mathrm{IMF}_1 = \frac{1}{I}\sum_{i=1}^{I} E_1\!\left(S + \varepsilon_0\, w^{(i)}\right)$$

where $E_n(\cdot)$ denotes the $n$-th component obtained by empirical mode decomposition, $I$ is the number of trials in the ensemble average, $w^{(i)}$ is the white noise superposed in the $i$-th trial, and $\varepsilon_0$ is its amplitude;
(3a2) Computing the first residual component $r_1 = S - \mathrm{IMF}_1$ and superposing on it the first EMD component of the white noise to obtain the new signal $r_1 + \varepsilon_1 E_1\!\left(w^{(i)}\right)$;
(3a3) Decomposing the new signal with the calculation of step (3a1) to obtain the second modal component:

$$\mathrm{IMF}_2 = \frac{1}{I}\sum_{i=1}^{I} E_1\!\left(r_1 + \varepsilon_1 E_1\!\left(w^{(i)}\right)\right)$$

(3a4) By analogy, computing in turn the $n$-th residual $r_n = r_{n-1} - \mathrm{IMF}_n$ and the $(n+1)$-th modal component:

$$\mathrm{IMF}_{n+1} = \frac{1}{I}\sum_{i=1}^{I} E_1\!\left(r_n + \varepsilon_n E_n\!\left(w^{(i)}\right)\right)$$

(3a5) Repeating step (3a4) until the residual signal can no longer be decomposed, at which point the framed and windowed signal S is expressed as:

$$S = \sum_{n=1}^{k} \mathrm{IMF}_n + r_k$$

where $k$ is the number of intrinsic mode functions (IMFs) obtained by the CEEMDAN decomposition and $r_k$ is the final residual component.
6. The rapid dysarthria detection method based on modal decomposition according to claim 4, wherein in step (3b), the spectrum matrix $D(\omega)$ is computed as:

$$D(\omega) = \left[\, D_1(\omega),\ D_2(\omega),\ \ldots,\ D_k(\omega) \,\right]^{\mathsf{T}}$$

where $k$ is the number of IMFs obtained by the CEEMDAN decomposition and $D_k(\omega)$ is the $k$-th IMF after the short-time Fourier transform (STFT) at angular frequency $\omega$.
7. The rapid dysarthria detection method based on modal decomposition according to claim 4, wherein in step (3c), the Mel spectrum $S_{\mathrm{mel}}$ is computed as:

$$S_{\mathrm{mel}}(m) = \sum_{n=1}^{P} \frac{1}{P}\left|D(\omega_n)\right|^2 H_m(\omega_n), \quad m = 1, \ldots, M$$

where $M$ is the number of filters, $H_m(\omega_n)$ is the Mel-scale triangular filter function evaluated at the $n$-th frequency point, and $P$ is the number of points of the STFT.
8. The rapid dysarthria detection method based on modal decomposition according to claim 4, wherein in step (3d), the $i$-th coefficient $C_i$ of the $L$-dimensional cepstral coefficients $C$ is computed as:

$$C_i = \sum_{m=1}^{M} \log\!\left(S_{\mathrm{mel}}(m)\right) \cos\!\left(\frac{\pi i\,(m - 0.5)}{M}\right), \quad i = 1, \ldots, L$$

where $M$ is the number of filters and $L$ is the dimension of the cepstral coefficient vector $C$.
9. The rapid dysarthria detection method based on modal decomposition according to claim 4, wherein in step (3e), the $i$-th coefficient $\Delta_i$ of the $L$-dimensional first-order difference coefficients $\Delta$ is defined as:

$$\Delta_i = \frac{\sum_{q=1}^{Q} q\left(C_{i+q} - C_{i-q}\right)}{2\sum_{q=1}^{Q} q^2}, \quad 1 \le i \le L$$

where $Q$ is the time offset used to compute the difference and $L$ is the dimension of the first-order difference coefficients $\Delta$; the $i$-th coefficient $\Delta^2_i$ of the $L$-dimensional second-order difference coefficients $\Delta^2$ is defined by applying the same formula to $\Delta$:

$$\Delta^2_i = \frac{\sum_{q=1}^{Q} q\left(\Delta_{i+q} - \Delta_{i-q}\right)}{2\sum_{q=1}^{Q} q^2}, \quad 1 \le i \le L$$
10. The rapid dysarthria detection method based on modal decomposition according to claim 4, wherein in step (3f), the statistical features include the mean, standard deviation, skewness, and kurtosis.
CN202210807739.8A 2022-07-11 2022-07-11 Rapid dysarthria detection method based on modal decomposition Active CN114863951B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210807739.8A CN114863951B (en) 2022-07-11 2022-07-11 Rapid dysarthria detection method based on modal decomposition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210807739.8A CN114863951B (en) 2022-07-11 2022-07-11 Rapid dysarthria detection method based on modal decomposition

Publications (2)

Publication Number Publication Date
CN114863951A true CN114863951A (en) 2022-08-05
CN114863951B CN114863951B (en) 2022-09-23

Family

ID=82626745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210807739.8A Active CN114863951B (en) 2022-07-11 2022-07-11 Rapid dysarthria detection method based on modal decomposition

Country Status (1)

Country Link
CN (1) CN114863951B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016206704A1 (en) * 2015-06-25 2016-12-29 Abdalla Magd Ahmed Kotb The smart stethoscope
EP3200188A1 (en) * 2016-01-27 2017-08-02 Telefonica Digital España, S.L.U. Computer implemented methods for assessing a disease through voice analysis and computer programs thereof
CN106328120A (en) * 2016-08-17 2017-01-11 重庆大学 Public place abnormal sound characteristic extraction method
CN111210845A (en) * 2019-12-20 2020-05-29 太原理工大学 Pathological voice detection device based on improved autocorrelation characteristics
US20210350791A1 (en) * 2020-05-11 2021-11-11 Neworiental Education & Technology Group Ltd. Accent detection method and accent detection device, and non-transitory storage medium
WO2022024046A1 (en) * 2020-07-31 2022-02-03 Resmed Sensor Technologies Limited Systems and methods for determining movement during respiratory therapy
CN112183582A (en) * 2020-09-07 2021-01-05 中国海洋大学 Multi-feature fusion underwater target identification method
CN114613389A (en) * 2022-03-16 2022-06-10 大连交通大学 Non-speech audio feature extraction method based on improved MFCC

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BISWAJIT KARAN et al.: "Parkinson disease prediction using intrinsic mode function based features from speech signal", Biocybernetics and Biomedical Engineering *
程龙 (Cheng Long) et al.: "Research on a bird-sound recognition method based on improved MFCC", Journal of Communication University of China (Natural Science Edition) *

Also Published As

Publication number Publication date
CN114863951B (en) 2022-09-23

Similar Documents

Publication Publication Date Title
Khan et al. Classification of speech intelligibility in Parkinson's disease
Panek et al. Acoustic analysis assessment in speech pathology detection
CN106073706B (en) A kind of customized information and audio data analysis method and system towards Mini-mental Status Examination
AU2013274940B2 (en) Cepstral separation difference
CN105825852A (en) Oral English reading test scoring method
CN111554256B (en) Piano playing ability evaluation system based on strong and weak standards
CN109727608A (en) A kind of ill voice appraisal procedure based on Chinese speech
EP4125088B1 (en) Method and device for predicting potentially difficult airway based on machine learning voice technology
CN110223688A (en) A kind of self-evaluating system of compressed sensing based hepatolenticular degeneration disfluency
Cordella et al. Classification-based screening of Parkinson’s disease patients through voice signal
Khan et al. Assessing Parkinson's disease severity using speech analysis in non-native speakers
Dubey et al. Pitch-Adaptive Front-end Feature for Hypernasality Detection.
CN114822567A (en) Pathological voice frequency spectrum reconstruction method based on energy operator
CN113974607A (en) Sleep snore detecting system based on impulse neural network
CN117198340B (en) Dysarthria correction effect analysis method based on optimized acoustic parameters
Usman et al. Dataset of raw and pre-processed speech signals, mel frequency cepstral coefficients of speech and heart rate measurements
CN114863951B (en) Rapid dysarthria detection method based on modal decomposition
CN110211566A (en) A kind of classification method of compressed sensing based hepatolenticular degeneration disfluency
Godino-Llorente et al. Discriminative methods for the detection of voice disorders
Laska et al. Cough sound analysis using vocal tract models
Tripathi et al. Automatic speech intelligibility assessment in dysarthric subjects
CN112603266B (en) Method and system for acquiring target five-tone characteristics
RU2758550C1 (en) Method for diagnosing signs of bronchopulmonary diseases associated with covid-19 virus disease
Usman et al. Dataset of raw and pre-processed speech signals
Cai et al. Recognition and Extraction of Cough Sound from Audio Signals

Legal Events

Code    Description
PB01    Publication
SE01    Entry into force of request for substantive examination
GR01    Patent grant