CN116434739A - Device for constructing classification model for identifying different stages of heart failure and related assembly - Google Patents
- Publication number
- CN116434739A (Application CN202310205344.5A)
- Authority
- CN
- China
- Prior art keywords
- classification model
- heart failure
- voice
- identifying
- stage
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0635—Training updating or merging of old and new templates; Mean values; Weighting
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a device for constructing classification models for identifying different stages of heart failure, together with related components. The device comprises a sample processing unit, which converts an acquired voice analog signal into a voice digital signal, preprocesses the voice digital signal, and extracts features from the preprocessed signal to obtain multiple classes of voice feature samples; and a model training unit, which constructs a classification model for recognizing the heart failure stage and trains and optimizes it with the multi-class voice feature samples to obtain an optimal classification model. The optimal classification model can accurately identify different stages of heart failure.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a device for constructing a classification model for identifying different stages of heart failure based on sound characteristics, a computer readable storage medium and computer equipment.
Background
Heart failure is a complex clinical syndrome caused by abnormal changes in cardiac structure and/or function due to various causes, and is the severe and terminal stage of many common cardiovascular diseases. Currently, according to the occurrence and development of heart failure, the disease is divided into four stages: the at-risk stage (stage A), the pre-heart failure stage (stage B), the symptomatic heart failure stage (stage C) and the end-stage heart failure stage (stage D). Heart failure staging aims at early detection, early diagnosis and early intervention. Timely identification and treatment of patients at risk of heart failure or in the pre-heart failure stage, and early intervention, are of great significance for delaying ventricular remodeling and the progression of heart failure, protecting heart function, improving quality of life and reducing the re-hospitalization rate.
In the prior art, identification of the heart failure stage usually depends on medical history, physical examination, laboratory tests, cardiac imaging and functional examinations, which are generally performed only after patients present with symptoms and are therefore unfavorable for early identification. In recent years, advance warning of heart failure decompensation events has been achieved through hemodynamic or pulmonary water content monitoring by implanted devices, such as the CardioMEMS, MultiSENSE and ReDS sensor systems, with evaluation by the HeartLogic multisensor index and alarm algorithm. However, these methods are expensive and invasive, require the implantation of sensors or the installation of pacemakers, are suitable for only a small proportion of patients with severe or refractory heart failure, and are not suitable for screening patients in the at-risk and pre-heart failure stages. On the basis of strengthening standardized diagnosis and treatment and patient education, developing a noninvasive, convenient and universally applicable monitoring and early-warning method to identify heart failure patients at different stages, and strengthening home monitoring and early warning, are key to chronic heart failure management for reducing re-hospitalization and mortality rates. In the prior art, classification models for identifying some diseases have been obtained by training on certain parameters, but the data these models rely on still come from many clinical examinations, and their identification accuracy is not high.
Disclosure of Invention
Embodiments of the present invention aim to provide an apparatus, a computer-readable storage medium and a computer device for constructing a classification model capable of accurately identifying different stages of heart failure based on sound features.
In a first aspect, an embodiment of the present invention provides an apparatus for constructing a classification model for identifying different stages of heart failure based on acoustic features, including:
the sample processing unit is used for converting the collected voice analog signals into voice digital signals, preprocessing the voice digital signals, and extracting the characteristics of the preprocessed voice digital signals to obtain multiple types of voice characteristic samples;
the model training unit is used for constructing a classification model for identifying heart failure stage, training and optimizing the classification model by utilizing the multi-class voice feature samples to obtain an optimal classification model, wherein the classification model for identifying heart failure stage A and heart failure stage B is an AdaBoost classification model based on an original variable; the classification model used for identifying the B phase and the C phase of heart failure is an AdaBoost classification model based on Lasso dimension reduction; the classification model used for identifying heart failure stage A and B and stage C is an AdaBoost classification model based on Lasso dimension reduction.
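The claims name AdaBoost (on the original variables, or on Lasso-reduced variables) as the classifier for each pair of stages, but give no internals. As a minimal, hypothetical sketch of the boosting principle only, assuming one-dimensional decision stumps as weak learners and labels in {-1, +1} (function names are illustrative, and this is not the patent's actual implementation, which trains on the multi-class voice feature samples):

```python
import math

def train_adaboost(X, y, n_rounds=10):
    """Minimal AdaBoost over decision stumps. X: list of feature vectors,
    y: labels in {-1, +1}. Returns a list of weighted stumps."""
    n = len(X)
    w = [1.0 / n] * n                      # uniform sample weights
    learners = []
    for _ in range(n_rounds):
        best = None                        # (weighted error, feature, threshold, polarity)
        for j in range(len(X[0])):
            for t in sorted(set(x[j] for x in X)):
                for polarity in (1, -1):
                    err = sum(w[i] for i in range(n)
                              if (polarity if X[i][j] >= t else -polarity) != y[i])
                    if best is None or err < best[0]:
                        best = (err, j, t, polarity)
        err, j, t, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)          # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)        # learner weight
        learners.append((alpha, j, t, polarity))
        # reweight: misclassified samples gain weight for the next round
        for i in range(n):
            pred = polarity if X[i][j] >= t else -polarity
            w[i] *= math.exp(-alpha * y[i] * pred)
        total = sum(w)
        w = [wi / total for wi in w]
    return learners

def adaboost_predict(learners, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(alpha * (polarity if x[j] >= t else -polarity)
                for alpha, j, t, polarity in learners)
    return 1 if score >= 0 else -1

# Toy separable data: feature value below 2 -> class -1, otherwise +1
model = train_adaboost([[0.0], [1.0], [2.0], [3.0]], [-1, -1, 1, 1], n_rounds=3)
```

In the patented scheme the same boosting procedure would simply be fed either the 100-dimensional voice feature vectors directly (stages A vs. B) or their Lasso-selected subset (B vs. C, and A+B vs. C).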
In a second aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the following method: converting the collected voice analog signals into voice digital signals, preprocessing the voice digital signals, and extracting features of the preprocessed voice digital signals to obtain multiple types of voice feature samples; constructing a classification model for identifying heart failure stage, and training and optimizing the classification model by utilizing the multi-class voice feature samples to obtain an optimal classification model, wherein the classification model for identifying heart failure stage A and heart failure stage B is an AdaBoost classification model based on an original variable; the classification model used for identifying the B phase and the C phase of heart failure is an AdaBoost classification model based on Lasso dimension reduction; the classification model used for identifying heart failure stage A and B and stage C is an AdaBoost classification model based on Lasso dimension reduction.
In a third aspect, embodiments of the present invention further provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the following method: converting the collected voice analog signals into voice digital signals, preprocessing the voice digital signals, and extracting features of the preprocessed voice digital signals to obtain multiple types of voice feature samples; constructing a classification model for identifying heart failure stage, and training and optimizing the classification model by utilizing the multi-class voice feature samples to obtain an optimal classification model, wherein the classification model for identifying heart failure stage A and heart failure stage B is an AdaBoost classification model based on an original variable; the classification model used for identifying the B phase and the C phase of heart failure is an AdaBoost classification model based on Lasso dimension reduction; the classification model used for identifying heart failure stage A and B and stage C is an AdaBoost classification model based on Lasso dimension reduction.
By constructing, for each pair of stages, a classification model based on the sound features that reflect those stages, embodiments of the invention improve the accuracy with which the models identify the corresponding heart failure stages.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic structural diagram of an apparatus for constructing classification models for identifying different stages of heart failure based on sound features according to an embodiment of the present invention;
FIG. 2 is a graph showing the effect of different heart failure stages on Jitter according to an embodiment of the present invention;
FIG. 3 is a graph showing the effect of different heart failure stages on Shimmer provided by an embodiment of the present invention;
FIG. 4 is a graph showing the effect of different heart failure stages on the Harmonic difference according to an embodiment of the present invention;
FIG. 5 is a graph showing the effect of different heart failure stages on HNR according to an embodiment of the present invention;
FIG. 6 is a graph showing the effect of different heart failure stages on the Alpha Ratio provided by an embodiment of the present invention;
FIG. 7 is a graph showing the effect of different heart failure stages on voiced/unvoiced duration according to an embodiment of the present invention;
FIG. 8 is a graph showing the effect of different heart failure stages on Loudness provided by an embodiment of the present invention;
FIG. 9 is a graph showing the effect of different heart failure stages on the Hammarberg Index provided by an embodiment of the present invention;
FIG. 10 is a graph showing the effect of different heart failure stages on the Spectral Slope according to an embodiment of the present invention;
FIG. 11 is a graph of cross-correlation coefficients of glottal cycles at different heart failure stages provided by an embodiment of the present invention;
FIG. 12 is a graph of nonlinear analysis of voice features at different heart failure stages provided by an embodiment of the present invention;
FIG. 13 is a graph of cepstrum-based voice features at different heart failure stages provided by an embodiment of the present invention;
FIG. 14 is a sample-level ROC graph of the optimal model (i.e., AdaBoost on the original variables) provided by an embodiment of the present invention;
FIG. 15 is an individual-level ROC graph of the optimal model (i.e., AdaBoost with Lasso dimension reduction) provided by an embodiment of the present invention;
FIG. 16 is a sample-level ROC graph of the optimal model (i.e., AdaBoost with Lasso dimension reduction) provided by an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an apparatus for constructing classification models for identifying different stages of heart failure based on sound features according to an embodiment of the present invention. The device comprises a sample processing unit 101 and a model training unit 201;
the sample processing unit 101 is configured to convert the collected voice analog signal into a voice digital signal, perform preprocessing on the voice digital signal, and perform feature extraction on the preprocessed voice digital signal to obtain multiple types of voice feature samples.
In this unit, the sound is an analog signal, which needs to be converted into a digital signal for processing by a computer, and the sample processing unit includes: a first conversion unit for sampling, a second conversion unit for quantization, and a third conversion unit for encoding;
the first conversion unit is specifically configured to convert a speech analog signal with continuous time into signal samples with discrete time and continuous amplitude according to a predetermined sampling period;
The sampling period is the time interval between two adjacent sampling points, and the sampling frequency is the reciprocal of the sampling period; for example, a sampling frequency of 8 kHz means that 8000 samples are collected per second. The higher the sampling frequency, the more faithfully the original sound is reproduced.
The second conversion unit is specifically configured to convert each signal sample whose amplitude is a continuous value (an analog quantity) into a discrete value (a digital quantity), represented in binary, to obtain digital data;
the quantized signal samples are usually represented by binary, the sampling precision refers to the number of binary bits occupied by each signal sample, the quality of sound can be reflected, as a common CD adopts a sampling depth of 16 bits, 65535 (2≡16) different values can be represented, the DVD uses a sampling depth of 24 bits, and most telephone devices use a sampling depth of 8 bits.
The third conversion unit is specifically configured to convert the digital data into a binary code stream, so as to obtain a voice digital signal.
Encoding converts the sampled and quantized digital data into a binary code stream for computer storage, processing and transmission; among coding schemes, pulse code modulation (PCM) reaches the highest fidelity level. The data transmission rate of sound can be calculated from the sampling frequency and precision: data transmission rate (bps) = sampling frequency × sampling precision × number of channels. Likewise, the data amount of the sound signal can be calculated: data amount (bytes) = data transmission rate × duration / 8.
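The two formulas above can be checked with a short calculation; the function names are illustrative:

```python
def audio_data_rate_bps(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Data transmission rate in bits per second:
    rate = sampling frequency * sampling precision * number of channels."""
    return sample_rate_hz * bit_depth * channels

def audio_data_amount_bytes(rate_bps: int, duration_s: float) -> float:
    """Total data amount in bytes: amount = rate * duration / 8."""
    return rate_bps * duration_s / 8

# CD-quality PCM audio: 44.1 kHz sampling, 16-bit depth, 2 channels
rate = audio_data_rate_bps(44100, 16, 2)   # 1,411,200 bps
size = audio_data_amount_bytes(rate, 60)   # bytes in one minute of audio
```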
In this embodiment, in order to improve the quality of the voice digital signal and preserve more voice information, the voice digital signal needs to be preprocessed before being analyzed; that is, the sample processing unit further includes a synchronization unit, an endpoint detection unit, an emphasis unit, and a framing and windowing unit.
The synchronous unit is used for synchronizing the voice digital signals to a uniform sampling rate by adopting a downsampling method;
Different voice acquisition devices may differ in sampling frequency, duration and placement, causing differences in voice amplitude, so the original voice needs to be downsampled before signal analysis. For example, if the sampling frequency of the original signal is 22050 Hz, the signal can be downsampled to 16000 Hz in accordance with the Nyquist sampling theorem, reducing computational complexity and improving signal-processing efficiency without losing the main speech components.
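A naive sketch of the rate-conversion step, using linear interpolation purely for illustration; a production pipeline would first apply an anti-aliasing low-pass filter and use polyphase resampling (e.g. scipy.signal.resample_poly) rather than this simplified version:

```python
def resample_linear(signal, orig_rate, target_rate):
    """Naive resampler via linear interpolation between neighbouring samples.
    Illustrative only: no anti-aliasing filtering is performed here."""
    ratio = orig_rate / target_rate
    n_out = int(len(signal) / ratio)
    out = []
    for i in range(n_out):
        pos = i * ratio                 # fractional position in the input
        j = int(pos)
        frac = pos - j
        right = signal[j + 1] if j + 1 < len(signal) else signal[-1]
        out.append((1 - frac) * signal[j] + frac * right)
    return out

# One second of a dummy 22050 Hz signal downsampled to 16000 Hz
x = [float(i) for i in range(22050)]
y = resample_linear(x, 22050, 16000)
```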
The end point detection unit is used for carrying out end point detection on the voice digital signal after the unified sampling rate to distinguish a voice area and a non-voice area;
endpoint detection, also known as voice activity detection, aims to distinguish between regions of speech and non-speech, i.e. to accurately detect the starting and ending points of speech from noisy speech and to remove silence and noise parts to find truly valid speech content. Common methods such as a double-threshold method based on short-time energy and short-time zero-crossing rate can be used for detecting a voiced sound part and extracting the unvoiced sound part by using short-time energy firstly because the energy of the voiced sound is higher than that of the unvoiced sound and the zero-crossing rate of the unvoiced sound is higher than that of the unvoiced sound part, so that the endpoint detection of the voice can be completed.
The emphasis unit is used for emphasizing the high-frequency part of the voice area and increasing the high-frequency resolution of the voice;
because the average power spectrum of the voice signal is affected by glottal excitation and oral-nasal radiation, the power spectrum is reduced along with the increase of frequency, and the energy of the voice is mainly concentrated in a low-frequency part, the pre-emphasis treatment is needed for the purpose of emphasizing a high-frequency part of the voice, removing the influence of oral-labial radiation, and increasing the high-frequency resolution of the voice, so that the signal spectrum can be obtained by using the same signal-to-noise ratio in the whole frequency band from low frequency to high frequency, and the spectrum analysis or the channel parameter analysis is convenient. Pre-emphasis is typically achieved by the transfer function being a high-pass digital filter, i.e. H (z) =1-az -1 Wherein a is a pre-emphasis coefficient in the range of 0.9<a<1.0, a=0.97 is generally taken.
The framing and windowing unit is used for framing and windowing the emphasized voice region to obtain a plurality of voice signal segments.
The speech signal is a non-stationary continuous signal with time-varying characteristics, but it is essentially constant, i.e. relatively stable, over a short period (typically 10-30 ms), so the speech signal has short-time stationarity. This means that any analysis and processing of the speech signal must be "short-time analysis": the signal is divided into segments, each called a "frame", whose characteristic parameters are then analyzed; the frame length is typically 10-30 ms. To make the transition between frames smooth and maintain continuity, overlapping segmentation is used. The overlap between one frame and the next is called the frame shift, and the ratio of frame shift to frame length is generally 0 to 0.5. In this embodiment, the frame length is 25 ms and the frame shift is 10 ms.
Because the speech signal is only short-time stationary, it is windowed in addition to being framed, in order to emphasize the speech waveform near the sampled point and attenuate the rest of the waveform. Common window functions include the rectangular window and the Hamming window. The rectangular window has higher spectral resolution, but its strong side lobes cause severe interference between adjacent harmonics, losing high-frequency components and waveform detail; the Hamming window behaves in the opposite way, trading some spectral resolution for far less leakage.
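The framing and windowing described above (25 ms frames, 10 ms shift, Hamming window) can be sketched as follows, with illustrative function names:

```python
import math

def hamming(n_len):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_len - 1))
            for n in range(n_len)]

def frame_signal(signal, sample_rate, frame_ms=25, shift_ms=10):
    """Split a signal into overlapping frames and apply a Hamming window
    to each (frame length 25 ms, frame shift 10 ms by default)."""
    frame_len = int(sample_rate * frame_ms / 1000)   # 400 samples at 16 kHz
    shift = int(sample_rate * shift_ms / 1000)       # 160 samples at 16 kHz
    win = hamming(frame_len)
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append([s * w for s, w in zip(signal[start:start + frame_len], win)])
        start += shift
    return frames

frames = frame_signal([0.0] * 16000, sample_rate=16000)  # one second of silence
```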
In this embodiment, the sample processing unit further includes: an extraction unit and a merging unit;
the extraction unit is used for extracting a multidimensional first voice feature sample by using an openSMILE open source toolkit and extracting a multidimensional second voice feature sample by using python;
the merging unit is used for combining the multi-dimensional first voice feature sample and the multi-dimensional second voice feature sample to obtain the multi-class voice feature samples.
It should be noted that the first voice feature sample and the second voice feature sample may be extracted according to actual requirements. The multiple classes of voice feature samples extracted in this embodiment are 100-dimensional in total. The first speech feature sample uses the eGeMAPS feature set, an extension of GeMAPS: an 88-dimensional set of hand-crafted features extracted with the openSMILE open-source toolkit. GeMAPS contains 18 low-level descriptors (Low-Level Descriptors, LLDs); eGeMAPS adds 5 spectral features (MFCC 1-4 and spectral flux) and 2 frequency-related features (the bandwidths of the second and third formants) on the GeMAPS basis, covering frequency-related, energy/amplitude-related and spectral features. A 12-dimensional second speech feature sample was additionally extracted with python.
In this embodiment, the first speech feature sample includes: pitch features, frequency perturbation (jitter) features, formant features, amplitude perturbation (shimmer) features, loudness features, harmonic-to-noise ratio features, harmonic difference features, alpha ratio features, Hammarberg index features, spectral slope features, mel-frequency cepstral features, spectral flux features, loudness-peak rate features, voiced/unvoiced region duration features and equivalent sound level features. The detailed information of the first speech feature sample is shown in Table 1.
TABLE 1
In table 1, the pitch feature, i.e., the fundamental frequency of vocal cord vibration, represents the number of vocal cord vibrations per second; feature description: log F0, calculated on a semitone frequency scale starting from 27.5 Hz. The frequency perturbation (jitter) feature reflects the frequency change between adjacent cycles of the sound wave; feature description: deviations within a single consecutive pitch period. Formant features: the center frequency and bandwidth of the first, second and third formants, and the energy ratio of the first three formants to the fundamental tone. Amplitude perturbation (shimmer) features: reflect the change in amplitude between adjacent cycles of the sound wave. Loudness features: the perceived magnitude of a sound. Harmonic-to-noise ratio features: the proportion of periodically repeated harmonic components in the sound wave of stationary vowels. Harmonic difference features: the energy ratio of the first pitch harmonic H1 to the second pitch harmonic H2, or the energy ratio of the first pitch harmonic H1 to the third formant A3. Alpha ratio features: the energy sum of 50-1000 Hz divided by the energy sum of 1-5 kHz. Hammarberg coefficient features: the strongest energy peak in 0-2 kHz divided by the strongest energy peak in 2-5 kHz. Spectral slope features: linear regression slopes of the logarithmic power spectrum in the ranges 0-500 Hz and 500-1500 Hz. Mel-frequency cepstral features: Mel cepstral coefficients 1-4. Spectral flux features: the spectral difference between two adjacent frames. Loudness-peak ratio features: the number of loudness peaks per second. Continuous voiced and unvoiced region features: the duration of continuous voiced (F0 > 0) segments and of unvoiced (F0 = 0) segments. Equivalent sound level: the energy-based average of the A-weighted sound level over a period of time.
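As a concrete illustration of the perturbation features in table 1, the following is a minimal sketch (not the openSMILE implementation; the cycle-level period and amplitude values are synthetic placeholders) of local jitter and shimmer as relative cycle-to-cycle deviations:

```python
# Hedged sketch: period perturbation (jitter) and amplitude perturbation
# (shimmer) as mean absolute cycle-to-cycle differences, normalised by the
# mean. Real values would come from pitch-cycle detection on audio.

def jitter_local(periods):
    """Mean absolute difference of consecutive pitch periods / mean period."""
    diffs = [abs(periods[i] - periods[i - 1]) for i in range(1, len(periods))]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def shimmer_local(amplitudes):
    """Mean absolute difference of consecutive cycle amplitudes / mean amplitude."""
    diffs = [abs(amplitudes[i] - amplitudes[i - 1]) for i in range(1, len(amplitudes))]
    return (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))

periods = [0.0080, 0.0082, 0.0079, 0.0081, 0.0080]   # seconds per glottal cycle
amps = [0.92, 0.95, 0.90, 0.94, 0.93]                # peak amplitude per cycle
print(round(jitter_local(periods), 4), round(shimmer_local(amps), 4))
```

The mean and standard deviation of such cycle-level values over an utterance give the jitter/shimmer statistics listed above.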
In this embodiment, the second speech feature sample includes: glottal noise excitation ratio characteristics, vocal cord excitation ratio characteristics, cyclic period density entropy characteristics, trend fluctuation elimination analysis characteristics, sample entropy characteristics and multi-scale entropy characteristics. The details of the second speech feature sample are shown in table 2.
TABLE 2
In table 2, the glottal noise excitation ratio (GNE) and the vocal fold excitation ratio (VFER) are both energy ratios used to quantify, in a normal speech signal, the extent to which all frequency bands are excited simultaneously by glottal pulses versus, in a noise signal, the extent to which the bands are excited in a disordered way by chaotic noise (usually caused by incomplete closure of the glottis). For example, to calculate the GNE parameters for an original voice signal with a sampling frequency of 44.1 kHz, the signal is first downsampled to 10 kHz; an inverse-filtering method is then used to find the opening and closing time points of the glottis, giving the opening and closing time sequences; then, for each time sequence, a filter with a 500 Hz bandwidth is used to separate the bands 0-500 Hz, 500-1000 Hz, 1000-1500 Hz and so on up to 11.5 kHz; the first five low-frequency bands (0-2.5 kHz) are taken as signal and the remaining high-frequency bands (2.5-11.5 kHz) as noise, and the SEO and TKEO energy values of the signal and the noise are calculated respectively; finally, the signal-to-noise ratio (SNR) and noise-to-signal ratio (NSR) are obtained from the calculated energy values. The VFER parameters are calculated in much the same way as GNE, except that the downsampling step of GNE is omitted and the DYSPA algorithm is used in the second step, instead of inverse filtering, to obtain the glottal opening and closing sequence.
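The SEO/TKEO energy step of the GNE/VFER procedure above can be sketched as follows; this is a simplified illustration assuming the band splitting and glottal detection have already been done, with a synthetic low-band "signal" and high-band "noise" standing in for the real filtered bands:

```python
import numpy as np

def seo_energy(x):
    """Squared energy operator: mean of x[n]^2."""
    return np.mean(x ** 2)

def tkeo_energy(x):
    """Teager-Kaiser energy operator: mean of x[n]^2 - x[n-1]*x[n+1]."""
    return np.mean(x[1:-1] ** 2 - x[:-2] * x[2:])

rng = np.random.default_rng(0)
t = np.arange(0, 1, 1 / 10_000)                  # 1 s at 10 kHz (post-downsampling rate)
signal_band = np.sin(2 * np.pi * 200 * t)        # stand-in for the 0-2.5 kHz "signal" bands
noise_band = 0.1 * rng.standard_normal(t.size)   # stand-in for the 2.5-11.5 kHz "noise" bands

snr_seo = seo_energy(signal_band) / seo_energy(noise_band)
snr_tkeo = tkeo_energy(signal_band) / tkeo_energy(noise_band)
nsr_seo = 1.0 / snr_seo                          # noise-to-signal ratio
print(f"SEO SNR ~ {snr_seo:.1f}, TKEO SNR ~ {snr_tkeo:.2f}")
```

The NSR is simply the reciprocal ratio; in the real pipeline these ratios are computed per time window and then averaged.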
In other words, GNE and VFER detect glottal pulses within a given time window using inverse filtering (for GNE) or DYSPA (for VFER); the original sound is then split into two parts, with the content above 2.5 kHz treated as noise and the content below 2.5 kHz as the energy signal; combining the SEO and TKEO concepts, the energy values of the different frequency bands of the signal are calculated to obtain the signal-to-noise ratio. After the value for each time window has been calculated, the mean and standard deviation of GNE and VFER are computed. The following parameters can thus be obtained: GNE_SEO_SNR, GNE_TKEO_SNR, GNE_mean, GNE_std, VFER_SEO_SNR, VFER_TKEO_SNR, VFER_mean, VFER_std.
The cyclic (recurrence) period density entropy (RPDE) is a method employed in the fields of dynamical systems, stochastic processes and time-series analysis for determining the periodicity or repeatability of a signal. Its value lies between 0 and 1: close to 0 for quasi-periodic signals and close to 1 for uniform white noise. The calculation method is as follows:
First, the time sequence X_n = [x_n, x_{n+τ}, x_{n+2τ}, …, x_{n+(M-1)τ}] is projected into a phase space according to Takens' embedding theorem, where M is the embedding dimension and τ is the embedding delay; these parameters are obtained by a parameter-optimization algorithm. Secondly, an M-dimensional ball of some radius is drawn around each point X_n in the phase space; each time the time sequence returns to the ball, the time difference between entering and leaving it is recorded; a histogram of these time differences is drawn and finally normalized to obtain a recurrence density function P(T).
The value of the recurrence period density entropy is then obtained from the formula

H_norm = −( Σ_{T=1}^{T_max} P(T) ln P(T) ) / ln(T_max)

where T_max is the maximum recurrence time embedded in the phase space.
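A rough numerical sketch of the RPDE procedure described above (Takens embedding, recurrence-time histogram, normalized entropy); the embedding dimension, delay, radius and T_max below are illustrative choices, not the optimized parameters the text refers to:

```python
import numpy as np

def rpde(x, dim=4, tau=5, eps=0.12, t_max=200):
    """Recurrence period density entropy of series x (illustrative parameters)."""
    n = len(x) - (dim - 1) * tau
    # Takens embedding: each row is a delay vector X_n
    emb = np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])
    times = []
    for i in range(n):
        d = np.linalg.norm(emb - emb[i], axis=1)
        later = np.flatnonzero(d < eps)
        later = later[later > i]
        if later.size and later[0] - i <= t_max:
            times.append(later[0] - i)       # first recurrence time into the eps-ball
    hist = np.bincount(times, minlength=t_max + 1)[1:]
    p = hist / hist.sum()                    # recurrence density function P(T)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() / np.log(t_max))

t = np.linspace(0, 4 * np.pi, 2000)
r_sin = rpde(np.sin(25 * t))                 # near-periodic signal: RPDE stays low
print(round(r_sin, 3))
```

A near-periodic sine concentrates the recurrence-time histogram on a few values, keeping the normalized entropy well below 1.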
The detrended fluctuation analysis (DFA) method is a scaling-exponent calculation method that removes the influence of trend components in a time series on the fluctuation analysis. It is used to analyse the long-range correlation of a speech signal, i.e., to judge whether the noise term in the time series has positive or negative autocorrelation. Its advantages are that it effectively filters out trend components of each order in the sequence and can detect long-range power-law correlations in noisy signals superimposed with polynomial trend signals, making it suitable for long-range power-law correlation analysis of non-stationary time series.
The implementation method is as follows:
1. First, for a sequence x(t) of length N, calculate its cumulative deviation

y(t) = Σ_{i=1}^{t} ( x(i) − x̄ )

where x̄ is the mean of the time series, which is thus filtered out first. Since a general time series may contain cyclic or fluctuating components as well as random components, removing such components from the series can be helpful.
2. Perform sequence reconstruction: divide y(t) into m non-overlapping segments of equal length s, where m = ⌊N/s⌋ (rounded down). Since the sequence length is not always an integer multiple of s, a small amount of data at the end of the sequence would otherwise go unused; the same division is therefore also performed on the reversed sequence, giving 2m equal-length segments in total.
3. For each segment v, perform a first-order linear fit to the s data points it contains by the least-squares method.
4. Calculate the mean square error after removing the trend in each segment (the forward and reversed segments are calculated with the corresponding formulas):

F²(v, s) = (1/s) Σ_{t=1}^{s} [ y( (v−1)s + t ) − y_v(t) ]²

where y_v(t) is the local trend fitted to segment v in step 3.
5. Average over all equal-length segments and take the square root to obtain the DFA fluctuation function:

F(s) = { (1/2m) Σ_{v=1}^{2m} F²(v, s) }^{1/2}
6. If the time series {x(t)} has long-range power-law correlation, then F(s) and s satisfy the following power-law relationship:

F(s) ~ s^h

Taking the logarithm of both sides gives:

ln(F(s)) ~ h·ln(s)
Plotting (ln(s), ln(F(s))) as a scatter diagram on a double-logarithmic scale and fitting the data points by the least-squares method, the slope of the straight-line portion is the Hurst exponent h.
Relationship between the Hurst exponent and correlation:
(1) When 0.5 < h < 1, the time series has long-range correlation and shows persistent behavior: if the series is in an increasing (decreasing) trend in one time period, the next period also tends to increase (decrease), and the closer h is to 1, the stronger the correlation.
(2) When h = 0.5, the time series is uncorrelated and is an independent random process, i.e., the current state does not affect the future state.
(3) When 0 < h < 0.5, the time series has only negative correlation and shows anti-persistent behavior: if the series is in an increasing (decreasing) trend in one time period, it tends to decrease (increase) in the next.
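The DFA steps above can be sketched as follows (forward segments only for brevity; the reversed-order pass described in step 2 is omitted):

```python
import numpy as np

def dfa(x, scales):
    """Detrended fluctuation analysis; returns F(s) per scale and the exponent h."""
    y = np.cumsum(x - np.mean(x))                 # step 1: cumulative deviation
    fluct = []
    for s in scales:
        m = len(y) // s                           # step 2: m segments of length s
        segs = y[:m * s].reshape(m, s)
        t = np.arange(s)
        f2 = []
        for seg in segs:
            coef = np.polyfit(t, seg, 1)          # step 3: first-order linear fit
            f2.append(np.mean((seg - np.polyval(coef, t)) ** 2))  # step 4: MSE
        fluct.append(np.sqrt(np.mean(f2)))        # step 5: fluctuation function F(s)
    # step 6: slope of ln F(s) versus ln s gives the Hurst exponent h
    h = np.polyfit(np.log(scales), np.log(fluct), 1)[0]
    return np.array(fluct), h

rng = np.random.default_rng(1)
_, h = dfa(rng.standard_normal(4000), scales=[16, 32, 64, 128, 256])
print(f"h for white noise ~ {h:.2f}")             # theory predicts h near 0.5
```

For uncorrelated white noise the fitted exponent should fall near h = 0.5, consistent with case (2) above.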
Empirical mode decomposition excitation ratio (EMD-ER): an original speech signal with a sampling frequency of 44.1 kHz can be decomposed into a finite number of intrinsic mode functions (Intrinsic Mode Function, IMF for short), each decomposed IMF component containing local feature signals of the original signal at different time scales. The IMFs obtained first in the decomposition are high-frequency noise signals, and those obtained later are the practically useful signal. From the energy-operator formulas and Shannon entropy, the SEO, TKEO and Shannon entropy of each IMF can be calculated. In calculating the signal-to-noise ratio, the first four IMFs are used as noise signals, and the parameters relating to the signal-to-noise ratio are obtained from the corresponding ratio formula (where u denotes the SEO, TKEO or Shannon entropy value of each IMF and D is the number of IMFs obtained by the decomposition). When calculating the noise-to-signal ratio, the logarithm of each IMF is first taken, the first two IMFs are used as noise signals, the SEO, TKEO and Shannon entropy values of each IMF are then calculated, and the parameters related to the noise-to-signal ratio are obtained from the corresponding formula (u and D as above).
The sample entropy feature (SampEn) is an improved measure of time-series complexity based on approximate entropy;
the multi-scale entropy feature (MSEn) extends sample entropy to multiple time scales and calculates the complexity of the signal at different time scales.
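A minimal sketch of sample entropy as a complexity measure, with the usual template length m and tolerance r (taken here relative to the series' standard deviation); a regular signal should score lower than white noise:

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r): -ln(A/B), where B counts template matches of length m
    and A counts matches of length m+1 (self-matches excluded)."""
    x = np.asarray(x, dtype=float)
    tol = r * np.std(x)

    def count_matches(length):
        templ = np.array([x[i:i + length] for i in range(len(x) - length)])
        total = 0
        for i in range(len(templ)):
            # Chebyshev distance to all later templates
            d = np.max(np.abs(templ[i + 1:] - templ[i]), axis=1)
            total += int(np.sum(d < tol))
        return total

    b = count_matches(m)
    a = count_matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else float("inf")

rng = np.random.default_rng(2)
regular = np.sin(np.linspace(0, 40 * np.pi, 1000))
noisy = rng.standard_normal(1000)
print(sample_entropy(regular) < sample_entropy(noisy))  # regular signal is simpler
```

Multi-scale entropy would repeat this computation on coarse-grained (block-averaged) versions of the series at increasing scale factors.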
The glottal quotient (GQ) measures the stability of vocal cord vibration through the mean and standard deviation of the speech signal. First, the opening and closing points of the glottis are found with the DYSPA algorithm, dividing a speech signal into a number of glottal open segments and glottal closed segments; the mean and standard deviation of the open segments and of the closed segments are then calculated respectively.
MFCC is a cepstral parameter extracted in the Mel-scale frequency domain, which describes the nonlinear characteristics of human-ear frequency perception. The 39-dimensional MFCC consists of one log energy and 12 cepstral parameters, together with their first-order and second-order differences.
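How the 39-dimensional vector is assembled can be sketched as follows; the regression-based delta below is one common formulation, and the 13 static coefficients are random placeholders standing in for real MFCC extraction:

```python
import numpy as np

def delta(feat, width=2):
    """Regression-based delta over +/-width neighbouring frames (frames x dims)."""
    padded = np.pad(feat, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    return sum(
        k * (padded[width + k:len(feat) + width + k]
             - padded[width - k:len(feat) + width - k])
        for k in range(1, width + 1)
    ) / denom

frames, static_dim = 100, 13          # 13 = one log energy + 12 cepstral parameters
static = np.random.default_rng(3).standard_normal((frames, static_dim))
d1 = delta(static)                    # first-order differences
d2 = delta(d1)                        # second-order differences
mfcc39 = np.hstack([static, d1, d2])  # 13 static + 13 delta + 13 delta-delta
print(mfcc39.shape)                   # (100, 39): 100 frames x 39 dimensions
```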
The model training unit 201 is configured to construct classification models for identifying heart failure stages, and to train and optimize the classification models using the multi-class speech feature samples to obtain optimal classification models, where the classification model for identifying heart failure stage A versus stage B is an AdaBoost classification model based on the original variables; the classification model for identifying heart failure stage B versus stage C is an AdaBoost classification model based on LASSO dimension reduction; and the classification model for identifying heart failure stages A and B versus stage C is an AdaBoost classification model based on LASSO dimension reduction.
In one embodiment, the classifier algorithm is optimized by combining classification models with dimension-reduction methods. In this embodiment, screening is performed over 6 classification models and 2 dimension-reduction modes: the support vector machine classification model (SVM), decision tree classification model (DT), adaptive boosting classification model (AdaBoost), least absolute shrinkage and selection operator classification model (LASSO), ridge regression classification model and elastic net classification model, together with principal component analysis (PCA) dimension reduction and LASSO dimension reduction.
The finally built models are shown in table 3 and comprise 3 types: (1) a classification model into which the original variables are substituted; (2) a classification model into which the principal components are substituted after PCA dimension reduction; (3) a classification model into which the feature variables selected by LASSO are substituted.
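The three model variants can be sketched with scikit-learn (assumed available here); the synthetic data below merely stands in for the 100-dimensional speech features, and the hyperparameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the 100-dimensional speech features of two groups.
X, y = make_classification(n_samples=120, n_features=100,
                           n_informative=10, random_state=0)

models = {
    # (1) original variables substituted directly
    "original + AdaBoost": make_pipeline(AdaBoostClassifier(random_state=0)),
    # (2) principal components substituted after PCA dimension reduction
    "PCA + AdaBoost": make_pipeline(PCA(n_components=10),
                                    AdaBoostClassifier(random_state=0)),
    # (3) feature variables selected by LASSO substituted
    "LASSO selection + AdaBoost": make_pipeline(
        SelectFromModel(Lasso(alpha=0.01)), AdaBoostClassifier(random_state=0)
    ),
}
for name, model in models.items():
    model.fit(X, y)
    print(name, round(model.score(X, y), 3))
```

In the actual device, any of the 6 classifiers could replace AdaBoost in these pipelines when screening for the optimal combination.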
TABLE 3
The device of the embodiment of the invention further comprises an evaluation unit for performing model evaluation on the classification model according to a hold-out method.
Model building should avoid "over-fitting" as far as possible, i.e., the classifier treating idiosyncrasies of the training samples as general properties of all potential samples, which reduces generalization performance: the final model performs well on the training set but poorly on the test set. "Under-fitting" should also be avoided, i.e., the model failing to learn even the general properties of the training samples, so that performance is poor on both the training set and the test set. Over-fitting cannot be avoided entirely and can only be alleviated, whereas under-fitting can be overcome by increasing the number of features, increasing model complexity, reducing the regularization coefficient, and so on. Ultimately, the model is expected to fit the training data set well (i.e., low training error) while also fitting unknown data (i.e., the test set) well, reflecting good generalization ability.
Model building generally requires dividing the sample data into a training set and a test set, two mutually exclusive sets. Model evaluation then uses the test set to test the learner's ability to discriminate new samples, and the test error on the test set approximates the generalization error. Common methods include the hold-out method, cross-validation and the bootstrap method.
The hold-out method: the data set is directly divided into two mutually exclusive sets, one being the training set and the other the test set. The division into training and test sets should preserve the consistency of the data distribution as far as possible, to avoid additional bias introduced during data division from influencing the final result. Since the estimate obtained from a single use of the hold-out method is often unreliable, the division and experimental estimation are generally repeated several times at random and the average taken as the hold-out estimate; usually 2/3 to 4/5 of the samples are used for training and the rest for testing. In this embodiment, the leave-one-out method is adopted, which is a special case of this approach.
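The leave-one-out procedure can be sketched as follows; the 1-nearest-neighbour classifier is only a stand-in for the models discussed in the text:

```python
# Hedged sketch of leave-one-out evaluation: each sample serves as the test
# set exactly once while the remaining samples form the training set.

def leave_one_out_accuracy(xs, ys):
    correct = 0
    for i in range(len(xs)):
        train = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j != i]
        # stand-in classifier: label of the nearest training sample
        pred = min(train, key=lambda p: abs(p[0] - xs[i]))[1]
        correct += pred == ys[i]
    return correct / len(xs)

xs = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3]
ys = ["A", "A", "A", "B", "B", "B"]
print(leave_one_out_accuracy(xs, ys))   # each held-out sample is classified once
```

With n samples this trains n models, which keeps nearly all data available for training at the cost of n fitting passes.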
The device of the embodiment of the invention further comprises a performance measurement unit for performing performance measurement on the classification model according to preset indexes, where the preset indexes include error rate and accuracy, precision and recall, F1 value, specificity, sensitivity, ROC curve and AUC, and unweighted average recall.
The generalization performance of the classifier is evaluated using performance measures, i.e., evaluation standards that gauge the generalization ability of the model. For the classification problem, a "confusion matrix" can be formed from the combinations of the true class and the class predicted by the classifier, covering four cases: true positive (TP), true negative (TN), false positive (FP) and false negative (FN). The confusion matrix is shown in table 4.
TABLE 4
The preset indexes are respectively as follows:
accuracy (ACC): also called accuracy, refers to the ratio of the number of correctly classified samples to the total number of samples. The method is suitable for both the two-class task and the multi-class task, and can judge the total accuracy, but under the condition of unbalanced samples, the accuracy can be invalid and can not be used as an index for measuring the result, and other indexes are needed to be supplemented.
ACC=(TP+TN)/(TP+FN+FP+TN)
Error rate (ERR): indicates the ratio of the number of samples of all classification errors to the total number of samples, err=1-ACC.
ERR=(FN+FP)/(TP+FN+FP+TN)
Precision (P): also called the precision rate, is oriented to the prediction result and refers to the proportion of the samples predicted to be positive that are actually positive.
P=TP/(TP+FP)
Recall (R): also called the recall rate or sensitivity (SEN), is oriented to the original samples and refers to the proportion of the actually positive samples that are predicted to be positive.
R=TP/(TP+FN)
Specificity (SPE): the proportion of the actually negative samples that are predicted to be negative.
SPE=TN/(FP+TN)
F1 Score: the harmonic mean of precision and recall, with a value between 0 and 1, often used in statistics to measure the accuracy of a binary (or multi-task binary) classification model.
F1=2*P*R/(P+R)
ROC curve and AUC: ROC is short for "receiver operating characteristic" curve; its ordinate is the true positive rate (i.e., sensitivity) and its abscissa is the false positive rate (1 − specificity); coordinate points are obtained under different thresholds and connected. The closer the ROC curve is to the diagonal, the lower the accuracy of the model. If the ROC curve of classifier A can "enclose" that of classifier B, classifier A can be said to have better classification performance; but when the ROC curves of two classifiers cross, it is difficult to judge which performs better, and the area under the ROC curve, i.e., the AUC (Area Under ROC Curve), can then be used as the measure. Since the ROC curve generally lies above the line y = x, the AUC value is generally between 0.5 and 1, and the larger the AUC (area), the better the performance of the classifier.
Unweighted average recall (UAR): if the class labels are unevenly distributed, traditional evaluation indexes (e.g., ACC, P, R, F1) yield over-optimistic results for the classes with more samples; UAR can be used as a performance measure to keep the proposed classifier from over-fitting to a particular class.
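The preset indexes above can be computed directly from confusion-matrix counts; a minimal sketch follows (for the two-class case, UAR reduces to the mean of sensitivity and specificity):

```python
# Hedged sketch of the preset indexes, computed from TP/FN/FP/TN counts.
# The counts below are illustrative, not results from the study.

def metrics(tp, fn, fp, tn):
    acc = (tp + tn) / (tp + fn + fp + tn)
    err = 1 - acc
    p = tp / (tp + fp)                  # precision
    r = tp / (tp + fn)                  # recall / sensitivity
    spe = tn / (fp + tn)                # specificity
    f1 = 2 * p * r / (p + r)
    uar = (r + spe) / 2                 # unweighted average recall (2 classes)
    return {"ACC": acc, "ERR": err, "P": p, "R": r, "SPE": spe, "F1": f1, "UAR": uar}

m = metrics(tp=40, fn=10, fp=5, tn=45)
print({k: round(v, 3) for k, v in m.items()})
```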
DETAILED DESCRIPTION OF EMBODIMENT (S) OF INVENTION
Patient data: from April 2021 to December 2022, 101 patients were enrolled and grouped according to heart failure stage into groups A (stage A, n=35), B (stage B, n=26) and C (stage C, n=40), while 29 volunteers without heart failure were enrolled as group N (n=29). There were no significant differences among the three heart failure groups in sex, body mass index (BMI), systolic blood pressure, hemoglobin, creatinine, low-density lipoprotein cholesterol (LDL-C), history of coronary heart disease, history of hypertension, history of diabetes, history of smoking, history of drinking, or history of dyslipidemia. The differences among the three heart failure groups in age, creatinine, troponin, N-terminal pro-brain natriuretic peptide (NT-proBNP), left ventricular ejection fraction and left ventricular inner diameter were all statistically significant (P < 0.05). The specific clinical data are shown in table 5:
TABLE 5
| | Stage A (n=35) | Stage B (n=26) | Stage C (n=40) | P |
| --- | --- | --- | --- | --- |
| Sex (%, M) | 28 (68%) | 28 (97%) | 32 (80%) | 0.014 |
| Age | 46±12 | 51±13 | 57±12 | <0.001 |
| BMI | 25.7±3.5 | 26.9±3.8 | 25.3±4.8 | 0.246 |
| Systolic blood pressure (mmHg) | 139±20 | 141±24 | 121±18 | <0.001 |
| Hemoglobin | 145±15 | 146±11 | 141±28 | 0.518 |
| Troponin (ng/ml) | 0.005 (0.003) | 0.011 (0.013) | 0.031 (0.045) | <0.001 |
| NT-ProBNP (pg/ml) | 31.0 (45.4) | 59.0 (168.4) | 1391.0 (1754.0) | <0.001 |
| LDL-c (mmol/L) | 2.90±1.09 | 2.65±1.12 | 2.54±1.10 | 0.326 |
| Left ventricular ejection fraction (%) | 67±5 | 66±5 | 41±10 | <0.001 |
| Left ventricular inner diameter (diastolic, mm) | 45±3 | 47±4 | 61±10 | <0.001 |
| History of coronary disease (%) | 15 (37%) | 14 (48%) | 23 (57%) | 0.168 |
| History of hypertension (%) | 24 (59%) | 24 (83%) | 19 (48%) | 0.011 |
| History of diabetes (%) | 9 (22%) | 8 (28%) | 11 (28%) | 0.809 |
| History of smoking (%) | 7 (17%) | 11 (38%) | 16 (40%) | 0.053 |
| History of dyslipidemia (%) | 13 (32%) | 12 (41%) | 13 (33%) | 0.664 |
130 cases meeting the selection criteria were included in this embodiment, with 4055 voice samples and an effective duration of 2.216 hours, of which 63 cases (%) were aged 30-50 years and 67 cases (%) were aged > 50 years. The heart failure grouping was as follows: group A, 35 cases (%), containing 1085 voice samples with an effective duration of 0.574 h; group B, 26 cases (%), containing 849 voice samples with an effective duration of 0.462 h; group C, 40 cases (%), containing 1231 voice samples with an effective duration of 0.715 h; group N, 29 cases (%), containing 890 voice samples with an effective duration of 0.465 h; and the control group, 18 cases (%), containing 890 voice samples with an effective duration of 0.465 h.
The speech features in this embodiment use the eGeMAPS feature set, an extension of GeMAPS: 88-dimensional hand-crafted features extracted by the openSMILE open-source toolkit, containing 18 low-level descriptors (Low-Level Descriptors, LLDs) and adding, on the GeMAPS basis, 5 spectral features (MFCC 1-4 and spectral flux) and 2 frequency-related features (the bandwidths of the second and third formants), covering frequency, energy/amplitude-related and spectral features. An additional 12-dimensional set of features was also extracted using Python: GNE_SEO_SNR, GNE_TKEO_SNR, GNE_mean, GNE_std, VFER_SEO_SNR, VFER_TKEO_SNR, VFER_mean, VFER_std, recurrence period density entropy (RPDE), detrended fluctuation analysis (DFA), sample entropy (SampEn), and multi-scale entropy (MSEn). The feature parameters used in this study are therefore 100-dimensional in total. Analysis of these 100-dimensional voice features of patients at different heart failure stages shows that the voice features reflecting voice roughness and breathiness differ significantly across heart failure stages.
Voice roughness reflects the control ability of the glottis and vocal cords and the degree of hoarseness; the main indexes include jitter, shimmer, the harmonic differences, HNR, the alpha ratio, etc.
Jitter represents the deviation within a single continuous pitch period and reflects the timbre characteristics of vocal prosody; the statistics are the mean and standard deviation. There are 2 features in total, with significant differences across heart failure stages. The experimental results are shown in fig. 2.
Shimmer represents the difference between amplitude peaks during adjacent pitch cycles and also reflects the timbre characteristics of the vocal prosody. The statistics are mean and standard deviation. There are 2 features in total, and the heart failure type has a significant effect. The experimental results are shown in fig. 3.
Harmonic differences: H1-H2, the energy ratio of the first pitch harmonic H1 to the second pitch harmonic H2; the statistics are the mean and standard deviation, 2 features in total, and the heart failure type has a significant effect. H1-A3, the energy ratio of the first pitch harmonic H1 to the third formant A3; the statistics are the mean and standard deviation, 2 features in total, and the heart failure type has a significant effect. The experimental results are shown in FIG. 4.
HNR represents the harmonic-to-noise ratio, i.e., the proportion of periodically repeated harmonic components in the sound wave of a stationary vowel. The statistic is mean value and standard deviation; there are 2 features in total, and the heart failure type has a significant effect. The experimental results are shown in FIG. 5.
The Alpha Ratio represents the energy sum of 50-1000 Hz divided by the energy sum of 1-5 kHz. The statistics are the mean and standard deviation of the voiced regions and the mean of the unvoiced regions; there are 3 features in total, all significantly affected by heart failure type. The experimental results are shown in FIG. 6.
Voice breathiness reflects the cadence and intensity of the voice, including loudness, voiced/unvoiced duration, the Hammarberg index, spectral slope, etc. Voice breathiness differs significantly across heart failure stages.
Voiced/unvoiced duration represents the duration of continuous voiced sound (F0 > 0), with the average length and standard deviation as statistics, for a total of 2 features; for the unvoiced (F0 = 0) duration, the statistics are likewise the average length and standard deviation, 2 features in total; the number of voiced regions per second is 1 further feature. The heart failure type has a significant impact on all 5 features. The experimental results are shown in fig. 7.
For Loudness, the statistics are the mean, standard deviation, the 20th/50th/80th percentiles, the range between the 20th and 80th percentiles, and the slopes of the rising/falling speech signal; there are 10 features in total, all significantly affected by the different heart failure stages. The experimental results are shown in fig. 8.
Hammarberg Index represents the strongest energy peak at 0-2kHz divided by the strongest energy peak at 2-5 kHz.
The statistics are the mean and standard deviation of the voiced regions and the mean of the unvoiced regions; there are 3 features in total, all significantly affected by heart failure type. The experimental results are shown in fig. 9.
The Spectral Slope represents the linear regression slope (decay rate) of the logarithmic power spectrum in the ranges 0-500 Hz and 500-1500 Hz; the steeper the slope of the spectral envelope, the greater the attenuation of the signal above the dividing frequency. The statistics are the mean and standard deviation of the voiced regions and the mean of the unvoiced regions. There are 6 features in total, and the heart failure type has a significant impact on 5 of them. The experimental results are shown in FIG. 10.
The glottal noise features are the cross-correlation coefficient of each glottal cycle, or the ratio of the energy above 2.5 kHz to the energy below 2.5 kHz. There are 4 features in total, and the heart failure type has a significant effect. The experimental results are shown in fig. 11.
As for nonlinear analysis, nonlinear parameters are better suited to describing the inherent characteristics of acoustic signals with poor periodicity, and heart failure voices often show poor periodicity. Entropy reflects the disorder of the distribution of speech information over the frequency domain, and sample entropy can reflect the complexity of heart sound signals from the time domain. This study therefore extracts nonlinear parameters such as recurrence period density entropy (Recurrence Period Density Entropy, RPDE), detrended fluctuation analysis (Detrended Fluctuation Analysis, DFA), sample entropy (Sample Entropy) and multi-scale entropy for describing periodic, aperiodic and chaotic speech signal features.
The recurrence period density entropy RPDE, detrended fluctuation analysis DFA, sample entropy SampEn and multi-scale entropy MSEn all reflect the roughness of the voice. There are 4 features in total, and the heart failure type has a significant effect. The experimental results are shown in FIG. 12.
For the cepstral-based acoustic feature parameters, the Mel-frequency cepstral coefficients are the cepstral parameters extracted from the Mel-scale frequency domain, which describes the non-linear characteristics of human ear frequencies. The acoustic characteristic parameters based on cepstrum analysis can effectively avoid analysis inaccuracy caused by fundamental frequency irregularity.
Mel-frequency cepstral coefficients 1-4: the statistics are the mean and standard deviation over the whole signal and over the voiced segments; there are 16 features in total, and the heart failure type has a significant impact on 15 of them. The experimental results are shown in fig. 13.
Note that in figs. 2-13 the abscissa represents the group in the heart failure staging study and the ordinate represents the corresponding voice feature parameter; for example, the 2 extracted jitter voice feature parameters, jitterLocal_sma3nz_amean and jitterLocal_sma3nz_stddevNorm, are consistent with table 1.
In this embodiment, based on the original 100-dimensional eGeMAPS voice features of patients with different-stage heart failure, binary classification experiments were used to further observe the performance of different classifiers in identifying patients at different heart failure stages: dimension reduction with the PCA method and the LASSO method was compared across 6 different classifiers, including the support vector machine (SVM), decision tree (DT), adaptive boosting (Adaptive Boosting, AdaBoost), least absolute shrinkage and selection operator (Least Absolute Shrinkage and Selection Operator, LASSO), ridge regression and elastic net.
A, B binary classification experiment (leave-one-out method)
Classification recognition modeling was performed with the heart failure stage A and stage B populations as dependent variables. Based on the classification model into which the original variables are substituted, among the original 100-dimensional voice feature classification models the AdaBoost classifier model is optimal, with an accuracy of 0.869, a precision (P) of 0.846, a recall (R) of 0.846 and an F1 score of 0.846. After PCA dimension reduction, the AdaBoost classification model is optimal, with a training-set accuracy of 0.738; after LASSO dimension reduction, the AdaBoost classification model is optimal, with a training-set accuracy of 0.770, lower than the binary recognition model based on the original features, which is considered possibly related to feature loss after dimension reduction. The sample-level ROC curve of the optimal model, i.e., the original-variable AdaBoost, is shown in fig. 14 (AUC = 0.793). The results show that voice features can identify the heart failure stage A and stage B populations, and can be used for preliminary screening of patients at risk of heart failure and patients in the pre-heart-failure period with target organ damage. The classification accuracy (mean, standard deviation) based on the original 100-dimensional eGeMAPS features is shown in table 6.
TABLE 6
Binary classification models built on the original variables, with the heart failure stage A and stage B groups as dependent variables
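The leave-one-out evaluation protocol used in this experiment can be sketched as follows, with synthetic data standing in for the stage A / stage B voice-feature samples (the accuracy figures reported in the text are not reproduced here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Synthetic stand-in for the stage A / stage B voice-feature samples.
X, y = make_classification(n_samples=60, n_features=100, n_informative=10, random_state=0)

# Leave-one-out: each sample is held out once and predicted by a model
# trained on all remaining samples.
pred = cross_val_predict(AdaBoostClassifier(random_state=0), X, y, cv=LeaveOneOut())

acc = accuracy_score(y, pred)
p, r = precision_score(y, pred), recall_score(y, pred)
f1 = f1_score(y, pred)
print(f"Accuracy={acc:.3f} P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

The pooled held-out predictions also supply the entries of the confusion matrices shown in the tables below.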
The AdaBoost confusion matrix is shown in tables 7 and 8:
TABLE 7
TABLE 8
The results of the classification models with principal components substituted after PCA dimension reduction are shown in Table 9:
TABLE 9
The AdaBoost confusion matrix is shown in tables 10 and 11:
table 10
TABLE 11
The results of the classification models with the feature variables selected by LASSO dimension reduction are shown in Table 12:
table 12
The AdaBoost confusion matrix is shown in tables 13 and 14:
TABLE 13
TABLE 14
After LASSO dimension reduction, the AdaBoost classification model is optimal, with a training-set accuracy of 0.770; the accuracy is reduced compared with the binary classification model based on the original features, which may be related to feature loss after dimension reduction. The feature importances of AdaBoost on the original 100 dimensions (of which 2 dimensions have an importance of 0) are shown in Table 15.
TABLE 15
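The per-dimension importances tabulated above, including the dimensions with importance 0, can be read directly from a fitted AdaBoost model; a minimal sketch with synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic stand-in for the 100-dimensional voice-feature samples.
X, y = make_classification(n_samples=100, n_features=100, n_informative=10, random_state=0)
clf = AdaBoostClassifier(random_state=0).fit(X, y)

imp = clf.feature_importances_       # one importance value per feature dimension
top = np.argsort(imp)[::-1][:10]     # the ten most important dimensions
n_zero = int(np.sum(imp == 0))       # dimensions the ensemble never split on
print("top-10 dims:", top, "| zero-importance dims:", n_zero)
```

Dimensions with zero importance were never selected by any weak learner in the ensemble, which is how a table like Table 15 can contain importance-0 entries.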
Summary of A/B binary classification results (leave-one-out method)
The performance evaluation indexes of the original classification model in the test set are shown in table 16:
table 16
The performance evaluation indexes of the PCA dimension reduction+classification model in the test set are shown in table 17:
TABLE 17
The performance evaluation indexes of the LASSO feature selection+classification model in the test set are shown in table 18:
TABLE 18
The sample-level ROC curve of the optimal model (AdaBoost on the original variables, AUC = 0.793) is shown in fig. 14, where the abscissa and ordinate indicate the false positive rate and the true positive rate, respectively; Receiver Operating Characteristic denotes the receiver operating characteristic curve.
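The sample-level ROC curve and its AUC can be computed from the held-out labels and classifier scores; a small hand-made score vector is used here for illustration, so the values below are not those of the patent's model:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 0, 1, 1, 1, 0, 1])                      # illustrative A/B labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.5, 0.3])   # classifier scores

# roc_curve sweeps the decision threshold: abscissa = FPR, ordinate = TPR.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# AUC = fraction of (positive, negative) pairs ranked correctly by the score.
auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.4f}")
```

Plotting `fpr` against `tpr` reproduces figures like fig. 14; an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.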
B/C binary classification experiment (leave-one-out method)
Binary classification modeling is performed with the heart failure stage B and stage C groups as dependent variables. Among the classification models built on the original variables, i.e. the original 100-dimensional voice features, the AdaBoost classifier model is optimal, reaching an Accuracy of 0.788, a Precision (P) of 0.771, a Recall (R) of 0.925, and an F1 score of 0.841. After PCA dimension reduction, the AdaBoost and SVM models perform similarly, with a training-set accuracy of 0.773; the Elastic Net classification model has a training accuracy of 0.742, lower than the classification model based on the original features. After LASSO dimension reduction, the AdaBoost model is optimal, with a training-set accuracy of 0.803, an improvement over the binary classification model based on the original features. As shown in fig. 15, the area under the ROC curve (AUC) of the optimal model, i.e. AdaBoost with Lasso dimension reduction, is 0.819. The results show that voice features can distinguish the stage B and stage C heart failure populations, so voice features can be used to distinguish pre-heart failure patients with target organ damage from symptomatic heart failure patients.
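The "PCA dimension reduction + classifier" setup compared in this experiment can be sketched as a pipeline on synthetic data; retaining 95% of the variance is an illustrative assumption, since the patent does not state the number of principal components kept:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the stage B / stage C voice-feature samples.
X, y = make_classification(n_samples=120, n_features=100, n_informative=15, random_state=1)

# Standardize, project onto principal components, then classify.
pipe = make_pipeline(StandardScaler(), PCA(n_components=0.95),
                     AdaBoostClassifier(random_state=0))
acc = cross_val_score(pipe, X, y, cv=5).mean()

pipe.fit(X, y)
k = pipe.named_steps["pca"].n_components_   # components kept for 95% variance
print(f"components kept: {k}, CV accuracy: {acc:.3f}")
```

Fitting the PCA inside the pipeline ensures the projection is learned only from each training fold, so the held-out accuracy is not optimistically biased.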
The classification accuracy (mean, standard deviation) based on the original 100-dimensional eGeMAPS features is shown in Table 19:
TABLE 19
The AdaBoost confusion matrix is shown in tables 20 and 21:
table 20
Table 21
The results of the classification models with principal components substituted after PCA dimension reduction are shown in Table 22:
table 22
The AdaBoost confusion matrix is shown in tables 23 and 24:
table 23
Table 24
The SVM confusion matrix is shown in table 25 and table 26:
table 25
Table 26
The results of the classification models with the feature variables selected by LASSO dimension reduction are shown in Table 27:
table 27
The AdaBoost confusion matrix is shown in tables 28 and 29:
table 28
Table 29
Features with non-zero LASSO regularization coefficients total 66 dimensions; the remaining 34 dimensions have coefficients of 0, as shown in Table 30:
table 30
Summary of B/C binary classification results (leave-one-out method)
The performance evaluation indexes of the original classification model in the test set are shown in table 31:
table 31
The performance evaluation indexes of the PCA dimension reduction+classification model in the test set are shown in table 32:
table 32
The performance evaluation indexes of the LASSO feature selection+classification model in the test set are shown in table 33:
table 33
The sample-level ROC curve of the optimal model (AdaBoost with Lasso dimension reduction, AUC = 0.819) is shown in fig. 15, where the abscissa and ordinate indicate the false positive rate and the true positive rate, respectively; Receiver Operating Characteristic denotes the receiver operating characteristic curve.
AB/C binary classification experiment (leave-one-out method)
Binary classification modeling is performed with patients in heart failure stages A and B (AB) versus patients in stage C as dependent variables. Among the classification models built on the original variables, i.e. the original 100-dimensional voice features, the AdaBoost classifier model is optimal, reaching an Accuracy of 0.802, a Precision (P) of 0.857, a Recall (R) of 0.600, and an F1 score of 0.706. After PCA dimension reduction, the SVM model is optimal, but the training-set accuracy drops to 0.723; after LASSO dimension reduction, the AdaBoost model is optimal, with a training-set accuracy of 0.812, an improvement over the binary classification model based on the original features. As shown in fig. 16, the area under the ROC curve (AUC) of the optimal model, i.e. AdaBoost with Lasso dimension reduction, is 0.731. The results show that voice features can distinguish patients at risk of heart failure (stage A) and pre-heart failure patients with target organ damage (stage B) from symptomatic stage C heart failure patients; that is, voice features can help identify symptomatic heart failure patients. The classification accuracy (mean, standard deviation) based on the original 100-dimensional eGeMAPS features is shown in Table 34.
TABLE 34
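The "LASSO dimension reduction + AdaBoost" setup found optimal in this experiment can be sketched as follows on synthetic data. Fitting a Lasso regressor to the 0/1 class labels and keeping the dimensions with non-zero coefficients is an assumption about how the selection step was implemented, and the alpha value is illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the AB / C voice-feature samples.
X, y = make_classification(n_samples=120, n_features=100, n_informative=15, random_state=2)

# LASSO as a feature selector: only dimensions with non-zero coefficients survive.
lasso = Lasso(alpha=0.01).fit(X, y)
keep = np.flatnonzero(lasso.coef_)
X_sel = X[:, keep]

acc = cross_val_score(AdaBoostClassifier(random_state=0), X_sel, y, cv=5).mean()
print(f"{keep.size} of {X.shape[1]} dimensions kept, CV accuracy: {acc:.3f}")
```

Note that, strictly, the LASSO selection step should be nested inside the cross-validation (selected anew on each training fold) to avoid information leakage into the held-out estimate.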
The AdaBoost confusion matrix is shown in tables 35 and 36:
Table 35
Table 36
The results of the classification models with principal components substituted after PCA dimension reduction are shown in Table 37:
table 37
The SVM confusion matrix is shown in table 38 and table 39:
table 38
Table 39
The results of the dimension reduction based on LASSO are shown in table 40:
table 40
The AdaBoost confusion matrix is shown in tables 41 and 42:
table 41
Table 42
Features with non-zero LASSO regularization coefficients total 56 dimensions; the remaining 44 dimensions have coefficients of 0, as shown in Table 43:
table 43
Summary of AB/C binary classification results (leave-one-out method)
The performance evaluation indexes of the original classification model in the test set are shown in table 44:
table 44
The performance evaluation indexes of the PCA dimension reduction + classification model in the test set are shown in Table 45:
table 45
The performance evaluation indexes of the LASSO feature selection+classification model in the test set are shown in Table 46:
TABLE 46
The sample-level ROC curve of the optimal model (AdaBoost with Lasso dimension reduction, AUC = 0.731) is shown in fig. 16, where the abscissa and ordinate indicate the false positive rate and the true positive rate, respectively; Receiver Operating Characteristic denotes the receiver operating characteristic curve.
The experiments demonstrate the following. (1) Voice features differ between patients at different stages of heart failure. Based on the 100-dimensional features extracted with the eGeMAPS feature set and python, the main indicators reflecting voice quality, such as roughness, jitter, shimmer, harmonic difference, HNR, and Alpha Ratio, differ between heart failure stages. (2) The main indicators reflecting voice loudness and spectral characteristics, including Loudness, voiced/unvoiced segment duration, Hammarberg Index, and Spectral Slope, also differ significantly between heart failure stages. Among these voice indicators, the fundamental characteristics of the voice, including frequency, energy/amplitude-related features, and nonlinearity, differ across heart failure stages. (3) The contributions of individual voice features are not identical across stages. (4) Based on the original 100-dimensional eGeMAPS voice features of patients at different heart failure stages, classification models are constructed by binary classification, comparing the original variables, PCA dimension reduction, and LASSO dimension reduction, and comparing the performance of different classifiers in identifying patients at different heart failure stages. After optimization, the optimal model for distinguishing stage A from stage B heart failure patients is AdaBoost on the original variables, with an ROC curve AUC of 0.793; the optimal model for distinguishing stage B from stage C heart failure patients is AdaBoost with Lasso dimension reduction, with an ROC curve AUC of 0.819; and the optimal model for distinguishing stages AB from stage C is AdaBoost with Lasso dimension reduction, with an ROC curve AUC of 0.731.
The invention also provides a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the following method when executing the computer program: converting the collected voice analog signals into voice digital signals, preprocessing the voice digital signals, and extracting features of the preprocessed voice digital signals to obtain multiple types of voice feature samples; constructing a classification model for identifying heart failure stage, and training and optimizing the classification model by utilizing the multi-class voice feature samples to obtain an optimal classification model, wherein the classification model for identifying heart failure stage A and heart failure stage B is an AdaBoost classification model based on the original variables; the classification model for identifying heart failure stage B and stage C is an AdaBoost classification model based on Lasso dimension reduction; and the classification model for identifying heart failure stages A and B versus stage C is an AdaBoost classification model based on Lasso dimension reduction.
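The signal chain in the method above, converting the analog signal to digital samples, downsampling to a uniform rate, emphasizing the high-frequency part, and framing with windowing, can be sketched roughly as follows. The sampling rates, pre-emphasis coefficient, and frame sizes are illustrative assumptions, and the decimation step omits the anti-aliasing filter a real implementation would need:

```python
import numpy as np

def preprocess(signal, sr, target_sr=16000, alpha=0.97, frame_ms=25, hop_ms=10):
    """Downsample (naive decimation), pre-emphasize, then frame + Hamming window."""
    if sr > target_sr and sr % target_sr == 0:
        signal = signal[:: sr // target_sr]          # crude downsampling, no anti-alias filter
    # Pre-emphasis y[n] = x[n] - alpha * x[n-1] boosts the high-frequency part.
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    frame_len = int(target_sr * frame_ms / 1000)     # 400 samples at 16 kHz / 25 ms
    hop = int(target_sr * hop_ms / 1000)             # 160 samples at 16 kHz / 10 ms
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return emphasized[idx] * np.hamming(frame_len)   # windowed voice signal segments

sr = 48000
t = np.arange(sr) / sr                               # 1 s of a 200 Hz test tone
frames = preprocess(np.sin(2 * np.pi * 200 * t), sr)
print(frames.shape)                                  # (n_frames, frame_len)
```

Feature extraction (e.g. the eGeMAPS set via openSMILE) would then operate on these windowed segments.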
The present invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the following method: converting the collected voice analog signals into voice digital signals, preprocessing the voice digital signals, and extracting features of the preprocessed voice digital signals to obtain multiple types of voice feature samples; constructing a classification model for identifying heart failure stage, and training and optimizing the classification model by utilizing the multi-class voice feature samples to obtain an optimal classification model, wherein the classification model for identifying heart failure stage A and heart failure stage B is an AdaBoost classification model based on an original variable; the classification model used for identifying the B phase and the C phase of heart failure is an AdaBoost classification model based on Lasso dimension reduction; the classification model used for identifying heart failure stage A and B and stage C is an AdaBoost classification model based on Lasso dimension reduction.
In this description, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the others, so the identical or similar parts of the embodiments may be referred to one another. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method section. It should be noted that those skilled in the art can make various modifications and adaptations of the invention without departing from its principles, and such modifications and adaptations are intended to fall within the scope of the invention as defined by the following claims.
It should also be noted that in this specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Claims (10)
1. An apparatus for constructing a classification model based on acoustic features that identifies different stages of heart failure, comprising:
the sample processing unit is used for converting the collected voice analog signals into voice digital signals, preprocessing the voice digital signals, and extracting the characteristics of the preprocessed voice digital signals to obtain multiple types of voice characteristic samples;
the model training unit is used for constructing a classification model for identifying heart failure stage, training and optimizing the classification model by utilizing the multi-class voice feature samples to obtain an optimal classification model, wherein the classification model for identifying heart failure stage A and heart failure stage B is an AdaBoost classification model based on an original variable; the classification model used for identifying the B phase and the C phase of heart failure is an AdaBoost classification model based on Lasso dimension reduction; the classification model used for identifying heart failure stage A and B and stage C is an AdaBoost classification model based on Lasso dimension reduction.
2. The apparatus for constructing a classification model for identifying different stages of heart failure based on acoustic features according to claim 1, wherein the sample processing unit comprises:
a first conversion unit for converting a time-continuous speech analog signal into time-discrete, amplitude-continuous signal samples at a predetermined sampling period;
The second conversion unit is used for converting each amplitude-continuous signal sample into a discrete value and representing the discrete value in binary to obtain digital data;
and the third conversion unit is used for converting the digital data into a binary code stream to obtain a voice digital signal.
3. The apparatus for constructing a classification model for identifying different stages of heart failure based on acoustic features of claim 1, wherein the sample processing unit further comprises:
a synchronizing unit for synchronizing the plurality of voice digital signals to a uniform sampling rate by adopting a downsampling method;
the end point detection unit is used for carrying out end point detection on the voice digital signal after the unified sampling rate and distinguishing a voice area and a non-voice area;
the emphasis unit is used for emphasizing the high-frequency part of the voice area and increasing the high-frequency resolution of the voice;
and the framing and windowing unit is used for framing and windowing the emphasized voice region to obtain a plurality of voice signal segments.
4. The apparatus for constructing a classification model for identifying different stages of heart failure based on acoustic features of claim 1, wherein the sample processing unit further comprises:
An extraction unit for extracting a multi-dimensional first speech feature sample using an openSMILE open source toolkit and extracting a multi-dimensional second speech feature sample using python;
and the merging unit is used for merging the multi-dimensional first voice characteristic sample and the multi-dimensional second voice characteristic sample to obtain multi-class voice characteristic samples.
5. The apparatus for constructing a classification model identifying different stages of heart failure based on acoustic features of claim 4, wherein the first speech feature sample comprises: pitch characteristics, frequency perturbation characteristics, formant characteristics, amplitude perturbation characteristics, loudness characteristics, harmonic-to-noise ratio characteristics, harmonic difference characteristics, alpha ratio characteristics, Hammarberg coefficient characteristics, spectral slope characteristics, mel-frequency cepstrum characteristics, spectral flux characteristics, rate-of-loudness-peaks characteristics, voiced and unvoiced region characteristics, and equivalent sound level characteristics.
6. The apparatus for constructing a classification model based on acoustic features that identifies different stages of heart failure according to claim 4, wherein the second speech feature sample comprises: glottal-to-noise excitation ratio characteristics, vocal fold excitation ratio characteristics, recurrence period density entropy characteristics, detrended fluctuation analysis characteristics, sample entropy characteristics, and multi-scale entropy characteristics.
7. The apparatus for constructing a classification model for identifying different stages of heart failure based on acoustic features of claim 1, further comprising:
and the evaluation unit is used for carrying out model evaluation on the classification model according to a hold-out method.
8. The apparatus for constructing a classification model for identifying different stages of heart failure based on acoustic features of claim 1, further comprising:
and the performance measurement unit is used for performing performance measurement on the classification model according to preset indexes, wherein the preset indexes comprise error rate and accuracy, precision and recall, F1 score, specificity, sensitivity, ROC curve and AUC, and unweighted average recall.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following method when executing the computer program: converting the collected voice analog signals into voice digital signals, preprocessing the voice digital signals, and extracting features of the preprocessed voice digital signals to obtain multiple types of voice feature samples; constructing a classification model for identifying heart failure stage, and training and optimizing the classification model by utilizing the multi-class voice feature samples to obtain an optimal classification model, wherein the classification model for identifying heart failure stage A and heart failure stage B is an AdaBoost classification model based on an original variable; the classification model used for identifying the B phase and the C phase of heart failure is an AdaBoost classification model based on Lasso dimension reduction; the classification model used for identifying heart failure stage A and B and stage C is an AdaBoost classification model based on Lasso dimension reduction.
10. A computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, the computer program when executed by a processor implementing the method of: converting the collected voice analog signals into voice digital signals, preprocessing the voice digital signals, and extracting features of the preprocessed voice digital signals to obtain multiple types of voice feature samples; constructing a classification model for identifying heart failure stage, and training and optimizing the classification model by utilizing the multi-class voice feature samples to obtain an optimal classification model, wherein the classification model for identifying heart failure stage A and heart failure stage B is an AdaBoost classification model based on an original variable; the classification model used for identifying the B phase and the C phase of heart failure is an AdaBoost classification model based on Lasso dimension reduction; the classification model used for identifying heart failure stage A and B and stage C is an AdaBoost classification model based on Lasso dimension reduction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310205344.5A CN116434739A (en) | 2023-03-06 | 2023-03-06 | Device for constructing classification model for identifying different stages of heart failure and related assembly |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310205344.5A CN116434739A (en) | 2023-03-06 | 2023-03-06 | Device for constructing classification model for identifying different stages of heart failure and related assembly |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116434739A true CN116434739A (en) | 2023-07-14 |
Family
ID=87078617
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310205344.5A Pending CN116434739A (en) | 2023-03-06 | 2023-03-06 | Device for constructing classification model for identifying different stages of heart failure and related assembly |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116434739A (en) |
2023-03-06: CN CN202310205344.5A patent/CN116434739A/en, active, Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117727296A (en) * | 2023-12-18 | 2024-03-19 | 杭州恒芯微电子技术有限公司 | Speech recognition control system based on single fire panel |
CN117727296B (en) * | 2023-12-18 | 2024-08-09 | 杭州恒芯微电子技术有限公司 | Speech recognition control system based on single fire panel |
CN117898684A (en) * | 2024-03-20 | 2024-04-19 | 北京大学 | Method, device and equipment for monitoring heart failure illness state and readable storage medium |
CN117898684B (en) * | 2024-03-20 | 2024-06-18 | 北京大学 | Method, device and equipment for monitoring heart failure illness state and readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107657964B (en) | Depression auxiliary detection method and classifier based on acoustic features and sparse mathematics | |
Dibazar et al. | Feature analysis for automatic detection of pathological speech | |
Parsa et al. | Identification of pathological voices using glottal noise measures | |
CN102429662B (en) | Screening system for sleep apnea syndrome in family environment | |
Umapathy et al. | Discrimination of pathological voices using a time-frequency approach | |
CN116434739A (en) | Device for constructing classification model for identifying different stages of heart failure and related assembly | |
CN110123367B (en) | Computer device, heart sound recognition method, model training device, and storage medium | |
CN108896878A (en) | A kind of detection method for local discharge based on ultrasound | |
CN108305639B (en) | Speech emotion recognition method, computer-readable storage medium and terminal | |
CN112820279B (en) | Parkinson detection model construction method based on voice context dynamic characteristics | |
Kapoor et al. | Parkinson’s disease diagnosis using Mel-frequency cepstral coefficients and vector quantization | |
CN108682432B (en) | Speech emotion recognition device | |
Reddy et al. | The automatic detection of heart failure using speech signals | |
Reggiannini et al. | A flexible analysis tool for the quantitative acoustic assessment of infant cry | |
Gómez-García et al. | On the design of automatic voice condition analysis systems. Part III: Review of acoustic modelling strategies | |
Friedman | Pseudo-maximum-likelihood speech pitch extraction | |
Dubuisson et al. | On the use of the correlation between acoustic descriptors for the normal/pathological voices discrimination | |
Singh et al. | Preliminary analysis of cough sounds | |
Cordeiro et al. | Spectral envelope first peak and periodic component in pathological voices: A spectral analysis | |
WO2022152751A1 (en) | Speech-analysis based automated physiological and pathological assessment | |
CN118016106A (en) | Elderly emotion health analysis and support system | |
Wu et al. | GMAT: Glottal closure instants detection based on the multiresolution absolute Teager–Kaiser energy operator | |
CN113974607A (en) | Sleep snore detecting system based on impulse neural network | |
Hess | Pitch and voicing determination of speech with an extension toward music signals | |
Tran et al. | Robust Pitch Regression with Voiced/Unvoiced Classification in Nonstationary Noise Environments. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |