CN109192221A - A cluster-based method for detecting Parkinson's disease severity from speech - Google Patents
- Publication number
- CN109192221A (application number CN201811032625.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/48—Other medical applications
- A61B5/4803—Speech analysis specially adapted for diagnostic purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
Abstract
The invention discloses a cluster-based method for detecting the severity of Parkinson's disease from speech, comprising the following steps: 1. acquisition of the voice signal; 2. preprocessing of the voice signal; 3. extraction of all voice features, including the fundamental frequency feature (Pitch), fundamental frequency perturbation (Jitter), amplitude perturbation (Shimmer), signal-to-noise ratio features, and nonlinear features; 4. modeling and calculation; 5. prediction: for each cluster, the corresponding classification and regression models are loaded, the classification result is obtained, and the patient's severity is estimated from the predicted label value. Finally, the prediction result is fed back to the front end and shown to the user through the interface. The invention is carried out by computer software analysis. It addresses the lack of a fixed clinical index for determining whether a person has Parkinson's disease, as well as the long duration and high cost of clinical observation of Parkinson's disease, and is real-time, efficient, and low-cost.
Description
Technical field
The present invention relates to machine learning, artificial intelligence, voice-based diagnosis, and data mining, and more specifically to a cluster-based method for detecting Parkinson's disease severity from speech.
Background art
The SVM (Support Vector Machine) is one of the most popular classifiers at present. The basic problem an SVM solves is binary classification. Its essential idea is to use a convex optimization algorithm to find, from the training data, one hyperplane or a group of hyperplanes that separates the two classes of data. At prediction time, these hyperplanes determine which class the data to be predicted belong to. Existing machine-learning-based techniques for Parkinson's diagnosis only perform diagnostic classification, that is, they determine whether a suspected patient has Parkinson's disease. Because Parkinson's disease is irreversible, such a yes-or-no result does little in practice to solve the patient's actual problem.
Support Vector Regression (SVR) is a regression algorithm derived from the basic idea of the SVM. Its main idea is to find a hyperplane that maps the samples. Unlike other regression algorithms, if the absolute difference between the mapped value and the true value is smaller than a specified range, it is not counted in the loss.
The prior art nevertheless has defects. Some methods can only classify whether a subject has Parkinson's disease and cannot give the degree of illness. Other patents do assess the severity of a Parkinson's patient's illness using the UPDRS (Unified Parkinson's Disease Rating Scale), but they compute and predict over the whole population and cannot make personalized predictions according to the condition of the individual patient, which would improve the efficiency of diagnosis.
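The loss behaviour described above for SVR (errors inside a specified range are not counted) is commonly called the epsilon-insensitive loss. The following is a minimal Python sketch of that idea only. The function name and the default `epsilon` are illustrative assumptions and do not come from the patent text.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean epsilon-insensitive loss.

    An error whose absolute value is at most `epsilon` costs nothing;
    a larger error costs only the part that exceeds `epsilon`.
    The default epsilon is an illustrative assumption.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)
```

For example, with `epsilon=0.1` a prediction that is off by 0.05 contributes nothing, while one that is off by 1.0 contributes 0.9.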
Summary of the invention
The present invention provides a cluster-based method for detecting Parkinson's disease severity from speech.
To achieve the above object, the method comprises the following steps:
S1. Acquisition of the voice signal
A vowel is selected as the pronunciation and recorded with the acquisition equipment. The following information is collected: patient number, name, age, gender, whether the patient has been diagnosed with Parkinson's disease, whether the patient has any other disease that causes voice disorders, duration of illness, UPDRS (motor), UPDRS (total), date and time of acquisition, and which acquisition of the day it is;
S2. Preprocessing of the voice signal
The voice signal is preprocessed, including format conversion, sampling frequency conversion, pre-emphasis, windowing and framing, removal of unvoiced (silent) segments, and fundamental frequency extraction;
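The pre-emphasis, windowing and framing, and silence-removal steps listed above can be sketched in a few lines of Python. This is an illustration under assumptions and not the implementation used in the patent: the pre-emphasis coefficient 0.97, the Hamming window, and the energy-threshold rule for dropping quiet frames are common choices that the text does not specify, and all function names are hypothetical.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; alpha = 0.97 is an assumed value."""
    if not signal:
        return []
    return [signal[0]] + [signal[i] - alpha * signal[i - 1]
                          for i in range(1, len(signal))]


def hamming(length):
    """Hamming window coefficients (assumed window type)."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]


def frame_signal(signal, frame_len, hop):
    """Split the signal into overlapping windowed frames.

    A trailing partial frame is dropped.
    """
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + i] * window[i] for i in range(frame_len)])
    return frames


def drop_silent_frames(frames, energy_threshold):
    """Keep only frames whose mean energy exceeds the (assumed) threshold."""
    return [f for f in frames
            if sum(v * v for v in f) / len(f) > energy_threshold]
```

Format conversion and resampling depend on the audio container and are left out of the sketch.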
S3. Extraction of all voice features
The features include the fundamental frequency feature (Pitch), fundamental frequency perturbation (Jitter), amplitude perturbation (Shimmer), signal-to-noise ratio features, and nonlinear features;
S4. Modeling and calculation
The classification algorithm is based on the support vector machine, using a linearly separable SVM classifier and a nonlinear model in which the SVM builds the model by introducing a kernel function. The feature data obtained in S3 are matched one-to-one with the information provided by the doctor and arranged into a data set. The data set is clustered with k-means, and for each cluster the data are divided into a training set and a test set in a 3:1 ratio. For each cluster, the classification model is trained on the training set with the SVM classification algorithm, and the regression model is trained with the support vector regression (SVR) method. The models are optimized by grid search, and the trained model parameters of each cluster are saved.
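The clustering and 3:1 split in S4 can be illustrated with a small standard-library Python sketch. It shows only plain k-means (Lloyd's algorithm) followed by a per-cluster train/test split; training the SVM and SVR models and the grid search are deliberately left out. The function names, the seeds, and the fixed iteration count are assumptions made for the illustration.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm with k initial centroids sampled from the data."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


def split_3_to_1(rows, seed=0):
    """Shuffle and split rows into (train, test) in a 3:1 ratio."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = (3 * len(shuffled)) // 4
    return shuffled[:cut], shuffled[cut:]


def cluster_and_split(points, k):
    """Cluster the feature vectors, then split each cluster 3:1."""
    centroids, labels = kmeans(points, k)
    splits = {}
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        splits[j] = split_3_to_1(members)
    return centroids, splits
```

Each entry of `splits` would then feed one classifier and one regressor for that cluster, and the saved centroids are what a later prediction step needs to assign new samples.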
S5. Prediction
For each cluster, the classification and regression models obtained in the above training process are loaded. The data to be judged and predicted are input, the distances to the clusters are calculated to determine which cluster the data belong to, and the model corresponding to that cluster is used to classify the data and obtain the classification result. The classification result is then processed further: for a subject predicted not to have Parkinson's disease, the label value is set to 0; for a subject predicted to have Parkinson's disease, SVR is then used for regression prediction to obtain the calculated label value. The patient's severity is estimated from the label value, and the process ends;
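The routing logic of S5 can be sketched as follows. The sketch assumes that cluster membership is decided by squared Euclidean distance to each cluster centroid, and it represents the trained per-cluster SVM and SVR models as plain Python callables; `classifiers`, `regressors`, and `predict_severity` are hypothetical names, not part of the patent.

```python
def predict_severity(features, centroids, classifiers, regressors):
    """Return (cluster index, predicted label value) for one feature vector.

    classifiers[j](features) -> truthy if Parkinson's disease is predicted
    regressors[j](features)  -> predicted label value (for example UPDRS)
    Both lists are stand-ins for the models trained per cluster.
    """
    cluster = min(
        range(len(centroids)),
        key=lambda j: sum((x - c) ** 2 for x, c in zip(features, centroids[j])),
    )
    if not classifiers[cluster](features):
        return cluster, 0.0  # predicted healthy: label value is set to 0
    return cluster, regressors[cluster](features)
```

A quick usage example with dummy models:

```python
centroids = [[0.0, 0.0], [10.0, 10.0]]
classifiers = [lambda f: False, lambda f: True]
regressors = [lambda f: 99.0, lambda f: 25.0]
print(predict_severity([9.0, 9.0], centroids, classifiers, regressors))
```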
S6. Feedback of the result
The prediction result is fed back to the front end through the interface and shown to the user.
In a preferred embodiment, in step S3 the fundamental frequency feature Pitch is extracted using the autocorrelation method;
For a deterministic signal, the short-time autocorrelation function is defined as:
Then, for the autocorrelation function of each frame, the maximum peak after its first zero crossing is located. The corresponding index k is the pitch period of that frame of speech, and the reciprocal of k (converted to seconds using the sampling rate) is the fundamental frequency;
The fundamental frequency is generally denoted F0. Because the voice signal was divided into frames during preprocessing, each frame has a corresponding F0, which yields a fundamental frequency sequence.
The fundamental frequency features are simple statistical parameters calculated from the extracted fundamental frequency sequence.
In a preferred embodiment, in step S3 Jitter denotes the perturbation of the fundamental frequency, that is, the degree to which the pitch period deviates from a regular period. Because the audio uses sustained phonation of a long vowel, fundamental frequency perturbation caused by alternation between vowels and consonants is excluded;
Jitter is calculated not from the fundamental frequency but from the pitch period, which is defined as follows:
$$T_{0,n} = \frac{1}{F_{0,n}}, \quad \text{s.t. } n = 1, 2, \ldots, N \tag{2.1}$$
where n is the frame index, N is the total number of frames, $F_{0,n}$ is the fundamental frequency of the n-th frame, and $T_{0,n}$ is the pitch period of the n-th frame;
(1) Jitter: the relative perturbation of the pitch period. It is the ratio of the mean absolute difference between adjacent pitch periods to the mean pitch period, and reflects the subject's overall relative control over vocal cord vibration; The formula is:
(2) Jitter_abs: the absolute perturbation of the pitch period. It is the mean absolute difference between adjacent pitch periods, and reflects the subject's overall absolute control over vocal cord vibration; The formula is:
(3) Jitter_PPQ5: the five-point perturbation of the pitch period. It is the mean absolute difference between the pitch period of a frame and the average pitch period of its 5 adjacent frames, and reflects, to some extent, the subject's control over vocal cord vibration over a period of time; The formula is:
(4) Jitter_rap: the three-point perturbation of the pitch period. It is the mean absolute difference between the pitch period of a frame and the average pitch period of its 3 adjacent frames, and reflects, to some extent, the subject's control over vocal cord vibration over a period of time; The formula is:
(5) Jitter_ddp: the difference of the three-point perturbations of the pitch period. It is the mean absolute value of the difference between the differences of the pitch periods of 3 adjacent frames, and reflects, to some extent, the variation in the subject's control over vocal cord vibration over a short period of time; The formula is:
In a preferred embodiment, in step S3 Shimmer is the perturbation of the voice amplitude, that is, the degree to which the amplitude deviates from the mean amplitude:
The amplitude measured by Shimmer is defined as:
$$A_{0,n} = \max(P_n) - \min(P_n), \quad \text{s.t. } n = 1, 2, \ldots, N \tag{3.1}$$
where the sequence $P_n$ is the sequence of voice signal values of the n-th frame, and $A_{0,n}$ is the amplitude of the n-th frame;
(1) Shimmer: the amplitude perturbation (as a percentage). It is the ratio of the mean absolute difference between adjacent amplitudes to the mean amplitude, and reflects the subject's relative control over voice amplitude; The formula is:
(2) Shimmer_dB: the amplitude perturbation (in decibels). It is the mean of the ratio of adjacent amplitudes, expressed in decibels (dB), and reflects the subject's relative control over voice amplitude; The formula is:
(3) Shimmer_APQ5: the five-point perturbation of the amplitude. It is the mean absolute difference between the amplitude of a frame and the average amplitude of its 5 adjacent frames, and reflects, to some extent, the subject's control over voice amplitude over a period of time; The formula is:
(4) Shimmer_APQ3: the three-point perturbation of the amplitude. It is the mean absolute difference between the amplitude of a frame and the average amplitude of its 3 adjacent frames, and reflects, to some extent, the subject's control over voice amplitude over a period of time; The formula is:
(5) Shimmer_dda: the difference of the three-point perturbations of the amplitude. It is the mean absolute value of the difference between the differences of the amplitudes of 3 adjacent frames, and reflects, to some extent, the variation in the subject's control over voice amplitude over a short period of time; The formula is:
(6) Shimmer_APQ11: the eleven-point perturbation of the amplitude. It is the mean absolute difference between the amplitude of a frame and the average amplitude of its 11 adjacent frames, and reflects, to some extent, the subject's control over voice amplitude over a longer period of time; The formula is:
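The verbal Shimmer definitions above can be turned into a short Python sketch. It is written under explicit assumptions, because the formulas themselves are not reproduced in this text: the k-point windows are taken as centred on the frame and including the frame itself, the windowed measures and Shimmer_dda are normalised by the mean amplitude (a common convention that the text does not state), and Shimmer_dB uses the absolute value of 20·log10 of the ratio of adjacent amplitudes. All function names are hypothetical.

```python
import math


def frame_amplitude(frame):
    """Peak-to-peak amplitude of one frame, as in equation (3.1)."""
    return max(frame) - min(frame)


def mean_abs_window_deviation(values, width):
    """Mean |v[n] - mean of the `width` frames centred on n| (assumed window)."""
    half = width // 2
    devs = [abs(values[n] - sum(values[n - half:n + half + 1]) / width)
            for n in range(half, len(values) - half)]
    return sum(devs) / len(devs)


def shimmer_features(amplitudes):
    """Shimmer measures from a sequence of positive per-frame amplitudes."""
    a = list(amplitudes)
    n = len(a)
    if n < 11 or min(a) <= 0:
        raise ValueError("need at least 11 strictly positive amplitudes")
    mean_a = sum(a) / n
    abs_diff = sum(abs(a[i] - a[i + 1]) for i in range(n - 1)) / (n - 1)
    dda = sum(abs((a[i + 1] - a[i]) - (a[i] - a[i - 1]))
              for i in range(1, n - 1)) / (n - 2)
    return {
        "Shimmer": abs_diff / mean_a,
        "Shimmer_dB": sum(abs(20 * math.log10(a[i + 1] / a[i]))
                          for i in range(n - 1)) / (n - 1),
        # Normalising the next four by mean_a is an assumption.
        "Shimmer_APQ3": mean_abs_window_deviation(a, 3) / mean_a,
        "Shimmer_APQ5": mean_abs_window_deviation(a, 5) / mean_a,
        "Shimmer_APQ11": mean_abs_window_deviation(a, 11) / mean_a,
        "Shimmer_dda": dda / mean_a,
    }
```

Under these assumptions Shimmer_dda is exactly three times Shimmer_APQ3, which is a convenient sanity check on an implementation.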
In a preferred embodiment, in step S3 the nonlinear features are detrended fluctuation analysis (DFA), recurrence period density entropy (RPDE), and pitch period entropy (PPE).
The present invention is carried out by computer software analysis. It addresses the lack of a fixed clinical index for determining whether a person has Parkinson's disease, as well as the long duration and high cost of clinical observation of Parkinson's disease, and is real-time, efficient, and low-cost. This cluster-based method for detecting Parkinson's disease severity from speech can not only judge from speech whether a subject has Parkinson's disease, but also judge the severity of the subject's Parkinson's disease from the predicted UPDRS value. Clustering divides the data into multiple classes, and performing prediction and regression within these smaller classes can significantly improve prediction accuracy. The prediction result can be used to decide how the patient should subsequently receive auxiliary treatment, which is good news for the personalized treatment of Parkinson's patients.
Description of the drawings
Fig. 1 is the model construction flowchart;
Fig. 2 is the UPDRS prediction flowchart;
Fig. 3 shows the linearly separable SVM classifier;
Fig. 4 shows the linear SVR regressor.
Detailed description of the embodiments
I. Acquisition of the voice signal
1.1 Choice of pronunciation
The pronunciation content to be recorded should be brief while still reflecting the patient's voice disorder to some extent. Considering that subjects differ in language, dialect, and accent, and that aphasia and other factors need to be avoided, sustained phonation was chosen. This method is widely used, works well, and is easy to carry out.
A vowel was chosen for the phonation because different sounds are formed by different mechanisms:
(1) Vowel: a sound produced when the airflow passes through the oral cavity without obstruction. During phonation, the airflow is exhaled from the lungs, passes through the glottis, and strikes the vocal cords so that they vibrate evenly. It then passes unobstructed through the oral and nasal cavities, and different sounds are produced by adjusting the tongue and lips. The vocal cords therefore always vibrate when a vowel is produced.
(2) Consonant: a sound produced when the airflow is obstructed in the oral cavity or pharynx. During phonation the airflow is obstructed in various ways by the vocal organs, and the sound relies mainly on this obstruction. The vocal cords therefore do not necessarily vibrate when a consonant is produced.
1.2 Acquisition equipment
Unlike previous experiments, which used professional recording equipment, the recordings here were made with ordinary mobile phones so that the method can be packaged as an app. Because phones run different operating systems, two phones were used for acquisition: an iPhone 5s and an OPPO R9s. The system uses the Python Flask framework on the server to build the background processing platform and the audio transmission interface. Recordings are made with the purpose-built app, and the voice files are stored on the background server for subsequent processing.
1.3 Acquisition scheme
To obtain a large amount of experimental data, each Parkinson's patient is measured multiple times (on state, off state, before medication, after medication, 1 hour after medication, and 3 hours after medication). Sampling takes place in a quiet room in which the subject can relax, and the person assisting the subject also keeps quiet. If unexpected noise occurs during recording, the recording is deleted and made again to guarantee recording quality. The collected information includes: patient number, name, age, gender, whether the patient has been diagnosed with Parkinson's disease, whether the patient has any other disease that causes voice disorders, duration of illness, UPDRS (motor), UPDRS (total), date and time of acquisition, and which acquisition of the day it is.
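One way to hold the collected information listed above is a small record type. The following Python dataclass is a sketch only: the field names, the types, the use of `Optional` for values that may be unknown, and the extra `medication_state` field (added to record which of the measurement states the recording belongs to) are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RecordingSession:
    """Metadata stored with one recording (all field names are hypothetical)."""
    patient_id: str
    name: str
    age: int
    gender: str
    diagnosed_with_parkinson: bool
    other_voice_affecting_disease: bool
    disease_duration_years: Optional[float]
    updrs_motor: Optional[float]
    updrs_total: Optional[float]
    recorded_at: datetime
    session_index_of_day: int  # which acquisition of the day, starting at 1
    medication_state: str      # assumed label, e.g. "on", "off", "after_1h"

    def __post_init__(self):
        if self.session_index_of_day < 1:
            raise ValueError("session_index_of_day starts at 1")
```

The UPDRS fields are the values a regression model would later be trained to predict.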
II. Preprocessing of the voice signal
The voice signal is preprocessed, including format conversion, sampling frequency conversion, pre-emphasis, windowing and framing, removal of unvoiced (silent) segments, and fundamental frequency extraction, using Matlab R2017a as the processing tool. (Implementations of this part can be found online.)
III. Extraction of all voice features
The features include the fundamental frequency feature (Pitch), fundamental frequency perturbation (Jitter), amplitude perturbation (Shimmer), signal-to-noise ratio features, and nonlinear features. The specific steps for calculating the voice features are as follows:
Table 1. All voice disorder features used
1. Fundamental frequency feature Pitch
The fundamental frequency is extracted with the autocorrelation method, which is the most common approach. It is highly interpretable and well suited to a fast speech-processing mobile application such as this one. The computation is fairly simple: the fundamental frequency is estimated from the autocorrelation function.
First, the short-time autocorrelation is computed for each previously extracted frame of the voice signal. For a deterministic signal, the short-time autocorrelation function is defined as:
Then, for the autocorrelation function of each frame, the maximum peak after its first zero crossing is located. The corresponding index k is the pitch period of that frame of speech, and the reciprocal of k (converted to seconds using the sampling rate) is the fundamental frequency.
The fundamental frequency is generally denoted F0. Because the voice signal was divided into frames during preprocessing, each frame has a corresponding F0, which yields a fundamental frequency sequence.
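The autocorrelation method described above can be sketched in Python as follows. This is a minimal illustration, not the Matlab implementation used in the patent. It assumes the common unnormalised short-time autocorrelation (the sum of products of the frame with a lagged copy of itself), treats the first negative autocorrelation value as the first zero crossing, and introduces `max_lag` as an assumed search limit; the function names are hypothetical.

```python
import math


def autocorrelation(frame, max_lag):
    """Unnormalised short-time autocorrelation for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[m] * frame[m + k] for m in range(n - k))
            for k in range(max_lag + 1)]


def pitch_period(frame, max_lag):
    """Lag (in samples) of the highest autocorrelation value after the
    first zero crossing, or None if no usable peak exists."""
    r = autocorrelation(frame, max_lag)
    first_negative = next((k for k, v in enumerate(r) if v < 0), None)
    if first_negative is None:
        return None  # e.g. a silent frame
    best = max(range(first_negative, len(r)), key=lambda k: r[k])
    return best if r[best] > 0 else None


def fundamental_frequency(frame, sample_rate, max_lag):
    """F0 in Hz: the sampling rate divided by the pitch period in samples."""
    k = pitch_period(frame, max_lag)
    return sample_rate / k if k else None
```

Usage example on a synthetic 100 Hz tone sampled at 8000 Hz:

```python
frame = [math.sin(2 * math.pi * 100 * m / 8000) for m in range(400)]
print(fundamental_frequency(frame, sample_rate=8000, max_lag=160))
```

Applying `fundamental_frequency` to every frame gives the fundamental frequency sequence used by the later features.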
The fundamental frequency features are simple statistical parameters calculated from the extracted fundamental frequency sequence.
(1) F0_mean: the mean of the fundamental frequency sequence. It reflects the overall level of the subject's vocal cord vibration frequency, which differs to some degree between men and women.
(2) F0_max: the maximum of the fundamental frequency sequence. It reflects the maximum of the subject's vocal cord vibration frequency.
(3) F0_min: the minimum of the fundamental frequency sequence. It reflects the minimum of the subject's vocal cord vibration frequency.
(4) F0_median: the median of the fundamental frequency sequence. It reflects, to some extent, the overall level of the subject's vocal cord vibration frequency.
(5) F0_std: the standard deviation of the fundamental frequency sequence. It reflects the dispersion of the subject's vocal cord vibration frequency.
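The five statistics above map directly onto Python's `statistics` module. The sketch below assumes that frames without a pitch estimate are represented as `None` and skipped, and that the standard deviation is the sample (N-1) form; the text does not say which form is used, so that choice is an assumption.

```python
import statistics


def f0_features(f0_sequence):
    """Summary statistics of a per-frame F0 sequence in Hz.

    Frames without a pitch estimate (None) are ignored, which is an
    assumed convention.
    """
    voiced = [f for f in f0_sequence if f is not None]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return {
        "F0_mean": statistics.mean(voiced),
        "F0_max": max(voiced),
        "F0_min": min(voiced),
        "F0_median": statistics.median(voiced),
        "F0_std": statistics.stdev(voiced),  # sample (N-1) form, assumed
    }
```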
2. fundamental frequency disturbs Jitter
Jitter sheet is the meaning of shake, referred in Probability an event actually occur with it is ideal occur when
Between deviation, be made of certainty content and Gauss (random) content.
Here Jitter is used to indicate the disturbance of fundamental frequency, the i.e. degree in pitch period deviation period, since audio uses
Be to continue pronunciation process, and send out long vowel, the alternate fundamental frequency disturbance of first consonant can be excluded, so Jitter is to a certain degree
On reflect subject to the control ability of vocal cord vibration.
The not instead of fundamental frequency that Jitter is used when calculating, pitch period, pitch period are defined as follows:
S.t.n=1,2 ..., N (2.1)
Wherein, n indicates that frame, N represent the sum of frame, refer to the fundamental frequency of n-th frame, and what is referred to is exactly the fundamental tone week of the n-th frame
Phase.
(1) Jitter: the relative perturbation of the pitch period, i.e. the ratio of the mean absolute difference between adjacent pitch periods to the mean pitch period. It reflects the subject's overall relative control over vocal cord vibration. Its formula is:
(2) Jitter_abs: the absolute perturbation of the pitch period, i.e. the mean absolute difference between adjacent pitch periods. It reflects the subject's overall absolute control over vocal cord vibration. Its formula is:
(3) Jitter_PPQ5: the five-point perturbation of the pitch period, i.e. the mean absolute difference between the pitch period of a frame and the average pitch period of its 5 neighboring frames. It reflects, to a certain degree, the subject's control over vocal cord vibration over a period of time. Its formula is:
(4) Jitter_rap: the three-point perturbation of the pitch period, i.e. the mean absolute difference between the pitch period of a frame and the average pitch period of its 3 neighboring frames. It reflects, to a certain degree, the subject's control over vocal cord vibration over a period of time. Its formula is:
(5) Jitter_ddp: the difference of the three-point perturbation of the pitch period, i.e. the mean absolute value of the difference between the successive differences of the pitch periods of 3 adjacent frames. It reflects, to a certain degree, how the subject's control over vocal cord vibration changes over a short period of time. Its formula is:
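The five Jitter measures can be sketched in code directly from the verbal definitions above. This is a minimal illustration, not the original formulas: the function name `jitter_features` is hypothetical, the neighborhood window is assumed to be centred on the frame and to include the frame itself, and PPQ5, rap and ddp are left unnormalized because the text does not say whether they are divided by the mean period.

```python
from statistics import mean


def jitter_features(periods):
    """Jitter measures from a list of per-frame pitch periods (assumed layout)."""
    n = len(periods)
    if n < 5:
        raise ValueError("need at least 5 pitch periods")

    def local_perturbation(width):
        # Mean |T_n - average of the `width` periods centred on frame n|.
        half = width // 2
        return mean(
            abs(periods[i] - mean(periods[i - half:i + half + 1]))
            for i in range(half, n - half)
        )

    # Mean absolute difference between adjacent pitch periods.
    jitter_abs = mean(abs(periods[i + 1] - periods[i]) for i in range(n - 1))
    # Mean absolute difference of successive differences over 3 frames.
    ddp = mean(
        abs((periods[i + 1] - periods[i]) - (periods[i] - periods[i - 1]))
        for i in range(1, n - 1)
    )
    return {
        "Jitter": jitter_abs / mean(periods),  # relative to the mean period
        "Jitter_abs": jitter_abs,
        "Jitter_PPQ5": local_perturbation(5),
        "Jitter_rap": local_perturbation(3),
        "Jitter_ddp": ddp,
    }
```

For a perfectly steady voice (all periods equal) every measure is 0; larger values indicate less stable vocal cord vibration.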
3. Amplitude perturbation (Shimmer)
Shimmer literally means "flicker". Here it plays the same role as Jitter, but it measures a different quantity: the perturbation of the speech amplitude, i.e. the degree to which the amplitude deviates from the mean amplitude.
The speech amplitude measured in the experiment depends not only on the subject's phonation but also on other factors such as the distance between the subject and the mobile phone, so it is difficult to use the amplitude directly as a feature. However, a characteristic of Parkinson's patients is that their voice becomes progressively quieter while speaking, so they control speech amplitude less well than healthy people, and Shimmer captures this well.
The amplitude measured by Shimmer is defined as:
$A_{0,n} = \max(P_n) - \min(P_n), \quad n = 1, 2, \ldots, N \qquad (3.1)$
where the sequence $P_n$ is the sequence of speech signal values in the n-th frame and $A_{0,n}$ is the corresponding amplitude of the n-th frame.
(1) Shimmer: the amplitude perturbation (as a percentage), i.e. the ratio of the mean absolute difference between adjacent amplitudes to the mean amplitude. It reflects the subject's relative control over speech amplitude. Its formula is:
(2) Shimmer_dB: the amplitude perturbation (in decibels), i.e. the mean of the ratio of adjacent amplitudes, expressed in decibels (dB). It reflects the subject's relative control over speech amplitude. Its formula is:
(3) Shimmer_APQ5: the five-point amplitude perturbation, i.e. the mean absolute difference between the amplitude of a frame and the average amplitude of its 5 neighboring frames. It reflects, to a certain degree, the subject's control over speech amplitude over a period of time. Its formula is:
(4) Shimmer_APQ3: the three-point amplitude perturbation, i.e. the mean absolute difference between the amplitude of a frame and the average amplitude of its 3 neighboring frames. It reflects, to a certain degree, the subject's control over speech amplitude over a period of time. Its formula is:
(5) Shimmer_dda: the difference of the three-point amplitude perturbation, i.e. the mean absolute value of the difference between the successive differences of the amplitudes of 3 adjacent frames. It reflects, to a certain degree, how the subject's control over speech amplitude changes over a short period of time. Its formula is:
(6) Shimmer_APQ11: the eleven-point amplitude perturbation, i.e. the mean absolute difference between the amplitude of a frame and the average amplitude of its 11 neighboring frames. It reflects, to a certain degree, the subject's control over speech amplitude over a longer period of time. Its formula is:
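The Shimmer measures can be sketched the same way from the verbal definitions and formula (3.1). The names `frame_amplitude` and `shimmer_features` are hypothetical; the neighborhood window is assumed to be centred on the frame and to include it, the decibel measure is assumed to be the mean absolute value of 20·log10 of the adjacent amplitude ratio, and the APQ and dda measures are left unnormalized because the text does not specify a normalization.

```python
import math
from statistics import mean


def frame_amplitude(frame):
    """Peak-to-peak amplitude of one frame, as in formula (3.1)."""
    return max(frame) - min(frame)


def shimmer_features(amps):
    """Shimmer measures from a list of per-frame amplitudes (all > 0)."""
    n = len(amps)
    if n < 11:
        raise ValueError("need at least 11 frame amplitudes")

    def apq(width):
        # Mean |A_n - average of the `width` amplitudes centred on frame n|.
        half = width // 2
        return mean(
            abs(amps[i] - mean(amps[i - half:i + half + 1]))
            for i in range(half, n - half)
        )

    adjacent = mean(abs(amps[i + 1] - amps[i]) for i in range(n - 1))
    return {
        "Shimmer": adjacent / mean(amps),
        "Shimmer_dB": mean(
            abs(20.0 * math.log10(amps[i + 1] / amps[i])) for i in range(n - 1)
        ),
        "Shimmer_APQ3": apq(3),
        "Shimmer_APQ5": apq(5),
        "Shimmer_APQ11": apq(11),
        "Shimmer_dda": mean(
            abs((amps[i + 1] - amps[i]) - (amps[i] - amps[i - 1]))
            for i in range(1, n - 1)
        ),
    }
```

Because Shimmer is a ratio and Shimmer_dB depends only on amplitude ratios, both are insensitive to a constant gain such as a change in the distance to the phone, which matches the motivation given above.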
4. Signal-to-noise ratio features
The harmonics-to-noise ratio (HNR) and the noise-to-harmonics ratio (NHR) are a commonly used pair of features for assessing the proportion of noise in speech, and their diagnostic value in speech pathology has been widely reported. They measure well the additive noise in the acoustic waveform of a voiced speech signal. Since the sustained vowel used in this experiment is a voiced signal, HNR and NHR are good voice disorder features.
If the acquired speech signal is clean enough, the noise in it comes mainly from incomplete closure of the vocal cords during vibration. Parkinson's patients have insufficient control over the muscles near the vocal cords, which may tremble, so these features can be used to assess vocal cord closure during phonation in Parkinson's patients.
A relatively simple calculation method is also based on the autocorrelation function. First, the autocorrelation function (short-time autocorrelation) of each frame of the speech signal is calculated with the method described in the fundamental frequency feature subsection. Define a parameter $l_{max}$ as the lag k, other than zero lag, at which the autocorrelation value is largest; it approximates the pitch period of the frame. Then, according to formula (1.1), $R_{xx}(l_{max})$ is the autocorrelation function value at that maximum point.
The harmonic component is then:
The noise component is:
HNR (calculated in decibels) is then defined as:
NHR (calculated as a ratio) is defined as:
Each frame yields one HNR value and one NHR value, so, as with the fundamental frequency features, some simple statistical parameters need to be calculated from the HNR and NHR sequences.
(1) HNR_mean: the mean harmonics-to-noise ratio, which assesses, in decibels, the average vocal cord closure ability during the subject's phonation.
(2) HNR_std: the standard deviation of the harmonics-to-noise ratio, which assesses, in decibels, the average variation of vocal cord closure during the subject's phonation.
(3) NHR_mean: the mean noise-to-harmonics ratio, which assesses, on an ordinary (linear) scale, the average vocal cord closure ability during the subject's phonation.
(4) NHR_std: the standard deviation of the noise-to-harmonics ratio, which assesses, on an ordinary (linear) scale, the average variation of vocal cord closure during the subject's phonation.
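The autocorrelation-based calculation can be sketched as follows. This is an assumption-laden illustration rather than the original formulas: the harmonic component is taken to be $R_{xx}(l_{max})$, the noise component to be $R_{xx}(0) - R_{xx}(l_{max})$, HNR to be ten times the base-10 logarithm of their ratio, and NHR to be the inverse ratio. The function names and the lag search range are hypothetical, and the population standard deviation is used for the `_std` statistics.

```python
import math
from statistics import mean, pstdev


def autocorrelation(frame, lag):
    """Unnormalized short-time autocorrelation of one frame at a given lag."""
    return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))


def frame_hnr_nhr(frame, min_lag, max_lag):
    """Return (HNR in dB, NHR as a ratio, l_max) for one voiced frame."""
    # l_max: the non-zero lag with the largest autocorrelation value.
    l_max = max(range(min_lag, max_lag + 1),
                key=lambda k: autocorrelation(frame, k))
    harmonic = autocorrelation(frame, l_max)        # assumed harmonic energy
    noise = autocorrelation(frame, 0) - harmonic    # assumed noise energy
    if harmonic <= 0 or noise <= 0:
        raise ValueError("frame is not suitable for HNR estimation")
    return 10.0 * math.log10(harmonic / noise), noise / harmonic, l_max


def summarize(values):
    """Mean and (population) standard deviation of a per-frame sequence."""
    return mean(values), pstdev(values)
```

Applying `frame_hnr_nhr` to every frame and passing the two resulting sequences to `summarize` gives HNR_mean, HNR_std, NHR_mean and NHR_std. Note that the unnormalized autocorrelation shrinks with lag because fewer samples overlap, so even a noise-free frame gets a finite HNR in this sketch.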
5. Nonlinear features
Experiments have found that models built on nonlinear features distinguish the voices of Parkinson's patients from those of healthy people better, so more and more related studies in recent years have worked on new nonlinear features.
Nonlinear features are more complicated to calculate. The nonlinear features used in this experiment are detrended fluctuation analysis (DFA), recurrence period density entropy (RPDE), correlation dimension (D2), and pitch period entropy (PPE). The principles and calculation methods of these features are introduced below.
5.1 DFA (detrended fluctuation analysis)
Detrended fluctuation analysis (DFA) is one of the most direct methods for assessing the fractal autocorrelation of a signal; it was first used to assess the variation of structural order in DNA chains. Its main idea is to remove the intrinsic trend in the signal so as to obtain the signal's fluctuation characteristics. It is a nonlinear model, used mainly to compute the scaling exponent of non-stationary time series (in statistics, sequences whose properties such as variance and autocorrelation change over time).
Its calculation is fairly involved. First the input signal x(n) is accumulated:
It is usually assumed that x(n) is independent and identically distributed, in which case y(n) can be regarded as statistically autocorrelated. Then y(n) is divided into non-overlapping segments. By default, the scales used for segmentation start from a division into 2 segments and go up to the integer closest to $\log_2 M$, where M is the total number of samples of the input signal.
After segmentation, each segment is fitted with a straight line, usually a first-order least-squares fit (linear regression), which gives a slope a and an intercept b for each segment. The mean squared error between the fit and the true values (i.e. the fluctuation) is then computed:
where L is the length of each segment, $y_i(n)$ is the n-th value of y in the i-th segment, and $a_i$, $b_i$ are the slope and intercept fitted for the i-th segment.
If the segmentation scale takes N values, i.e. L takes N values, then the corresponding F(L) must be computed for each of these N values. The relationship between the segment length L and the scaling exponent α is $F(L) \propto L^{\alpha}$, so α cannot be obtained directly. Instead the relationship is mapped into log-log space, i.e. $\log F(L) \propto \alpha \log L$, and a linear fit is performed, again a first-order least-squares fit (linear regression). This yields α, which is the DFA value.
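The DFA steps described above (accumulate, segment, fit a line per segment, measure the fluctuation, fit in log-log space) can be sketched as follows. This is a minimal illustration under stated assumptions: F(L) is taken to be the root of the mean squared residual, the segment lengths are supplied by the caller instead of being derived from $\log_2 M$, and the names `linear_fit` and `dfa_exponent` are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """First-order least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dfa_exponent(x, lengths):
    """Scaling exponent alpha from the slope of log F(L) versus log L."""
    # Step 1: accumulate the input signal.
    y, total = [], 0.0
    for value in x:
        total += value
        y.append(total)

    log_l, log_f = [], []
    for seg_len in lengths:
        squared, count = 0.0, 0
        # Step 2: non-overlapping segments of length seg_len.
        for start in range(0, len(y) - seg_len + 1, seg_len):
            segment = y[start:start + seg_len]
            # Step 3: detrend each segment with a straight-line fit.
            a, b = linear_fit(list(range(seg_len)), segment)
            squared += sum((segment[k] - (a * k + b)) ** 2
                           for k in range(seg_len))
            count += seg_len
        # Step 4: fluctuation F(L), assumed to be the RMS residual.
        log_l.append(math.log(seg_len))
        log_f.append(0.5 * math.log(squared / count))

    # Step 5: linear fit in log-log space gives alpha.
    alpha, _ = linear_fit(log_l, log_f)
    return alpha
```

For uncorrelated noise the exponent comes out near 0.5, while a strongly anti-correlated input (for example an alternating +1, -1 sequence) gives a value near 0.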
5.2 RPDE (recurrence period density entropy)
Recurrence period density entropy (RPDE) is a chaos-theory-based algorithm for finding the periodicity of a signal. It quantifies how strongly a time series repeats itself. The basic idea is to measure the repetitiveness of the signal, i.e. its recurrence periods, in phase space, from which the amount of random noise can be judged. It is a nonlinear model.
Its calculation involves chaos theory.
(1) Chaos theory
Some basics of chaos theory are introduced first. Unlike ordinary time series analysis, a chaotic time series must be mapped into phase space, i.e. phase space reconstruction. In general, according to the embedding theorem, the phase space can be reconstructed by the method of delay coordinates:
$X = \{X_i \mid X_i = [x_i, x_{i+\tau}, x_{i+2\tau}, \ldots, x_{i+(m-1)\tau}]^T,\ i = 1, 2, \ldots, M\} \qquad (5.2.1)$
In formula (5.2.1), $x_i$ is the i-th value of the input time series x, $X_i$ is an embedding vector produced by the phase space reconstruction, τ is the embedding delay, m is the embedding dimension, i.e. the length of each embedding vector, and M is the number of embedding vectors. The choice of embedding delay and embedding dimension is a crucial part of phase space reconstruction.
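The delay-coordinate reconstruction in formula (5.2.1) is short enough to show in code. The sketch below uses zero-based indexing and a hypothetical function name `delay_embed`; the number of embedding vectors is taken to be the largest M for which every index stays inside the series.

```python
def delay_embed(x, m, tau):
    """Delay-coordinate embedding: X_i = [x_i, x_{i+tau}, ..., x_{i+(m-1)tau}]."""
    count = len(x) - (m - 1) * tau  # number of embedding vectors M
    if count <= 0:
        raise ValueError("series too short for this m and tau")
    return [[x[i + k * tau] for k in range(m)] for i in range(count)]
```

For example, a series of 6 values embedded with m = 3 and τ = 2 gives 2 vectors of length 3.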
(2) C-C algorithm
There are many theories for solving for the embedding delay and m. The C-C algorithm is one of the most classical and obtains both directly. It is based on the correlation integral:
where $d_{ij} = \|X_i - X_j\|_{\infty}$ is the infinity norm, and N is the length of the input time series.
For the calculation, the input time series is decomposed into t subsequences as follows:
N is an integer multiple of t. Next:
The radius r is varied, and the following is defined:
$\Delta S_2(m, r, t) = \max\{S_2(m, r_j, t)\} - \min\{S_2(m, r_j, t)\} \qquad (5.2.6)$
The values of N, m and r can usually be estimated from the BDS statistic; for example, N = 3000, m = 2, 3, 4, 5 and $r_i = i \times 0.5\sigma$, i = 1, 2, 3, 4, where σ is the standard deviation of the input time series. Then:
The t at the first zero of the averaged $S_2$, or at the first minimum of the averaged $\Delta S_2$, is the required optimal delay.
The t at the minimum of $S_{2cor}(t)$ gives the embedding window L = (m − 1)t, from which the optimal m is obtained.
(3) RPDE
Once m and the embedding delay are obtained, the M embedding vectors X are obtained from formula (5.2.1). The close-returns method is then used. A radius r and a vector P of length M are defined in advance, with P initialized to the zero vector. The procedure is as follows: select the i-th embedding vector $X_i$ and compare it with the vectors that follow it. Counting starts at the first $X_j$ whose distance to $X_i$ is less than r and stops at the first $X_{j+n}$ whose distance to $X_i$ is greater than r, i.e. there are n − 1 vectors whose distance to it is less than r. The (n − 1)-th element of P is then incremented by 1. The same operation is applied to the next vector $X_{i+1}$, until all vectors have been processed.
RPDE is then:
where $T_{max}$ is the length of P that is used; by default all of P is used.
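The final entropy step can be sketched once the count vector P is available. The sketch assumes the commonly used RPDE form, a Shannon entropy of the normalized counts divided by $\ln T_{max}$ so that the result lies between 0 and 1; the original formula is not reproduced here, and the name `normalized_period_entropy` is hypothetical.

```python
import math


def normalized_period_entropy(counts):
    """Normalized Shannon entropy of a count vector P (assumed RPDE form)."""
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        raise ValueError("need at least two bins and one recorded count")
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    # Dividing by ln(T_max), with T_max = len(counts), maps the value to [0, 1].
    return entropy / math.log(len(counts))
```

A count vector concentrated in a single bin (a perfectly periodic signal) gives 0, and a uniform count vector (no preferred period) gives 1.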
5.3 D2 (correlation dimension)
The correlation dimension (D2) is also an index based on chaos theory; it reflects the degree of correlation of the signal.
D2 is based on the correlation integral C(r), defined as in formula (5.2.2), except that $d_{ij}$ uses the 2-norm instead of the infinity norm, i.e. $d_{ij} = \|X_i - X_j\|_2$. In formula (5.2.2), r is the radius, and the relationship between the correlation exponent v and r is taken to be $C(r) \propto r^{v}$. As with DFA, this is mapped into log-log space, i.e. $\log C(r) \propto v \log r$, and a linear fit is performed with a first-order least-squares fit (linear regression). This yields v, which is the D2 value.
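The D2 estimate can be sketched as a correlation integral followed by a log-log slope fit. The correlation integral is assumed here to be the fraction of distinct vector pairs closer than r (formula (5.2.2) itself is not reproduced in this text), and the names `correlation_integral` and `correlation_dimension` and the choice of radii are hypothetical.

```python
import math


def correlation_integral(vectors, r):
    """Fraction of distinct pairs whose Euclidean (2-norm) distance is below r."""
    n = len(vectors)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(vectors[i], vectors[j]) < r:
                close += 1
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(vectors, radii):
    """Least-squares slope of log C(r) against log r."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(correlation_integral(vectors, r)) for r in radii]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

As a sanity check, points spread evenly along a straight line give a slope close to 1, the dimension of a line. The radii must be large enough that C(r) is non-zero for each of them.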
5.4 PPE (pitch period entropy)
Pitch period entropy (PPE) is a feature that measures the stability of the fundamental frequency during sustained phonation. The idea is to map the fundamental frequency into logarithmic space and remove various confounding factors before computing the entropy of the pitch period. It is a nonlinear model.
First the fundamental frequency F0 is converted into a logarithmic semitone sequence:
F0,per=12log2(F0/127)22 (5.4.1)
The power spectrum of $F_{0,per}$ then needs to be flattened. This experiment uses a common linear whitening filter: the power spectrum $G_x(\omega)$ of the input speech signal is computed first, and the filter is then:
Power spectrum flattening filters out the influence of the average semitone (which is related to gender and to the content being pronounced). The sequence obtained after filtering is r, and its power spectrum P(r) is then computed. Finally, the entropy is computed in the same way as for RPDE:
where L is the length of the power spectrum sequence.
The above features are calculated for every acquired audio segment.
Four. Model introduction and calculation process
The support vector machine (SVM) is one of the most classical classification algorithms. Its main idea is to find a hyperplane that separates the samples such that the margin between the positive-class and negative-class samples is as large as possible. Fig. 3 illustrates a basic linearly separable SVM classifier.
Assume first that there are M samples $\{(x_i, y_i) \mid i \in \{1, 2, \ldots, M\}\}$, where $x_i$ is an n-dimensional vector representing the feature vector of the i-th sample, its n values corresponding to the n features of that sample, and $y_i \in \{+1, -1\}$ is the class label of the i-th sample, i.e. positive class or negative class. In Fig. 3, H is the hyperplane used for classification and can be expressed by the following linear equation:
$w^T x + b = 0 \qquad (1)$
where $w = (w_1; w_2; \ldots; w_n)$ is the normal vector of H, also n-dimensional, and b is the displacement term, which determines the distance between H and the origin. For the i-th sample $(x_i, y_i)$, if $y_i = +1$ then $w^T x_i + b > 0$, and if $y_i = -1$ then $w^T x_i + b < 0$, so the samples can be classified.
H1 and H2 are two hyperplanes parallel to H and equidistant from it, where H1: $w^T x + b = -1$ and H2: $w^T x + b = 1$. In the SVM approach, a good classifier must satisfy two conditions: (1) no sample lies between H1 and H2; (2) the distance between H1 and H2 is maximal.
The formula for the distance between the two hyperplanes H1 and H2 can be derived from the formula for the distance between parallel lines:
where D is the distance between the two hyperplanes H1 and H2. According to condition (1), no sample may lie between H1 and H2; according to condition (2), D must be maximized, which can be converted into a minimization problem:
Formula (3) is the basic model of the SVM. Clearly, formula (3) is a convex quadratic programming problem. It can be solved with convex optimization methods: by the method of Lagrange multipliers, the optimum is obtained through the dual problem.
First, Lagrange multipliers are introduced into the above problem, giving the Lagrangian function:
where $\alpha = (\alpha_1; \alpha_2; \ldots; \alpha_M)$. Setting the partial derivatives of this function with respect to w and b to 0 gives:
Substituting formulas (5) and (6) into (4) gives the dual problem:
$\alpha_i \ge 0, \quad i = 1, 2, \ldots, M$
This problem is usually solved with the SMO algorithm. By duality theory, w can be obtained from α and, conversely, α from w, so solving for the optimal w is converted into solving for the optimal α. Once $\alpha^*$ is found, substituting it into formula (4) gives $w^*$ and $b^*$, and finally the optimal hyperplane $H^*$. When a new sample is input, substituting it into the function given by formula (1) yields the class of the sample.
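For orientation, the derivation above follows the standard textbook hard-margin SVM. The block below writes out those textbook forms; the equation numbers are assigned on the assumption that they line up with the references to formulas (2) to (7) in the text, and the expressions are the standard ones rather than a transcription of the original formulas.

```latex
\begin{align}
D &= \frac{2}{\lVert w \rVert} \tag{2} \\
\min_{w,b}\ & \tfrac{1}{2}\lVert w \rVert^{2}
  \quad \text{s.t. } y_i\,(w^{T}x_i + b) \ge 1,\ i = 1,\dots,M \tag{3} \\
L(w,b,\alpha) &= \tfrac{1}{2}\lVert w \rVert^{2}
  + \sum_{i=1}^{M} \alpha_i \bigl(1 - y_i\,(w^{T}x_i + b)\bigr) \tag{4} \\
w &= \sum_{i=1}^{M} \alpha_i y_i x_i \tag{5} \\
0 &= \sum_{i=1}^{M} \alpha_i y_i \tag{6} \\
\max_{\alpha}\ & \sum_{i=1}^{M} \alpha_i
  - \tfrac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{M}
    \alpha_i \alpha_j y_i y_j\, x_i^{T} x_j
  \quad \text{s.t. } \sum_{i=1}^{M} \alpha_i y_i = 0,\ \alpha_i \ge 0 \tag{7}
\end{align}
```

In this standard form a new sample x is assigned the class $\operatorname{sign}(w^{*T}x + b^{*})$.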
However, if the data are noisy, using the above model directly introduces some error. The model can then be allowed to let some data points deviate from the hyperplane within a certain range, which introduces the concept of slack variables $\xi_i$, where $\xi_i \ge 0$.
$\text{s.t.}\ y_i(w^T x_i + b) + \xi_i \ge 1, \quad i = 1, 2, \ldots, M$
$\xi_i \ge 0, \quad i = 1, 2, \ldots, M$
After the slack variables $\xi_i$ are introduced, the functional margin of a sample point is allowed to be less than 1, so some sample points may lie between the hyperplanes or in the region on the other side. The second term of the objective function is a penalty term that penalizes outliers: the more outliers there are, the larger the objective value, and the objective value is to be made as small as possible. C is the weight of the outliers; the larger C is, the larger the objective value.
In practical problems, however, a linear model usually cannot classify accurately (the data are linearly inseparable), and a nonlinear model must be used. SVM solves this by introducing a kernel function, whose principle is to map the samples from the original space into a higher-dimensional feature space in which they are linearly separable.
Let φ(x) be the feature vector of x after the mapping. Substituting it into formula (8) gives:
The calculation requires $\phi(x_i)^T \phi(x_j)$, which is why the kernel function is introduced:
$\kappa(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle = \phi(x_i)^T \phi(x_j) \qquad (10)$
Because of this principle, the choice of kernel function is closely tied to the performance of the classifier. Table 5.1 lists several common kernel functions.
Table 5.1 Common kernel functions
The final solution is:
There are many ways to implement SVM. The most popular at present is the LIBSVM toolbox developed by Professor Chih-Jen Lin of National Taiwan University. It is easy to use, fast and full-featured, supports the Windows and Unix operating systems, and supports a wide range of languages, including Java, Python, R, Matlab and Ruby. This experiment calls LIBSVM, version 3.21, from Matlab. The kernel functions that LIBSVM itself supports are the linear kernel, polynomial kernel, RBF kernel and sigmoid kernel; after testing, this experiment finally uses the Gaussian kernel (RBF kernel).
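To make the chosen kernel concrete, the sketch below evaluates the Gaussian (RBF) kernel in the form LIBSVM documents it, exp(−γ‖u − v‖²). It is a plain-Python illustration, not a call into LIBSVM; the function names `rbf_kernel` and `gram_matrix` and the value of γ are hypothetical.

```python
import math


def rbf_kernel(u, v, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def gram_matrix(samples, gamma):
    """Kernel matrix K[i][j] = kappa(x_i, x_j) used in place of inner products."""
    return [[rbf_kernel(a, b, gamma) for b in samples] for a in samples]
```

The kernel value is 1 for identical samples and decays toward 0 as they move apart; γ controls how quickly, which is why it is typically one of the parameters tuned by grid search together with C.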
Support vector regression (SVR) is a regression algorithm derived from the basic idea of SVM. Its main idea is to find a hyperplane that maps the samples. Unlike other regression algorithms, if the absolute difference between the mapped value and the true value is less than a specified distance, it is not counted in the loss. Fig. 2 illustrates a basic linear SVR regressor.
Assume first that there are M samples $\{(x_i, y_i) \mid i \in \{1, 2, \ldots, M\}\}$, where $x_i$ is an n-dimensional vector representing the feature vector of the i-th sample, its n values corresponding to the n features of that sample, and $y_i$ is the regression output corresponding to the i-th sample, i.e. a continuous value. In Fig. 2, H is the hyperplane used for regression and can be expressed by the following linear equation:
$w^T x + b = 0 \qquad (12)$
where $w = (w_1; w_2; \ldots; w_n)$ is the normal vector of H, also n-dimensional, and b is the displacement term, which determines the distance between H and the origin. For the i-th sample $(x_i, y_i)$, feeding its feature vector into the regressor gives the model output $f(x_i) = w^T x_i + b$, which performs the regression.
H1 and H2 are two hyperplanes parallel to H and equidistant from it, where H1: $w^T x + b = -\varepsilon$ and H2: $w^T x + b = \varepsilon$. According to the basic idea of SVR, an error of up to ε between the true output y and the model output f(x) is tolerated, and a loss is counted only when the error exceeds ε. As shown in Fig. 4, predictions that fall between H1 and H2 are therefore treated as correct. At the same time, H1 and H2 should be as close together as possible, so formula (13) is obtained from formula (2) and the above conditions:
where C is the coefficient of the regularization term and $l_{\varepsilon}$ is the loss that needs to be calculated, called the ε-insensitive loss:
As with SVM, slack variables $\xi_i$ and $\hat{\xi}_i$ are introduced; two slack variables are used because the degree of slack on the two sides may differ. The kernel function $\kappa(x_i, x_j)$ is then introduced, defined as for SVM. The problem finally becomes:
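The ε-insensitive loss described above can be sketched in a few lines. The sketch assumes the standard form, zero inside the ε tube and linear outside it, and a standard linear SVR objective of half the squared norm of w plus C times the summed loss; the function names are hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero if the error is within epsilon, otherwise the excess over epsilon."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_objective(w, b, samples, targets, c, epsilon):
    """Assumed linear SVR objective: 0.5 * ||w||^2 + C * sum of losses."""
    total_loss = 0.0
    for x, y in zip(samples, targets):
        prediction = sum(wi * xi for wi, xi in zip(w, x)) + b
        total_loss += epsilon_insensitive_loss(y, prediction, epsilon)
    return 0.5 * sum(wi * wi for wi in w) + c * total_loss
```

Predictions that fall between H1 and H2 contribute nothing to the objective, which is what distinguishes SVR from a squared-error regression.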
It is solved in exactly the same way as SVM, by introducing Lagrange multipliers and then using the dual problem.
There are also many ways to implement SVR. For convenience, the LIBSVM toolbox is again called from Matlab here; only its parameters need to be adjusted to change the algorithm type. The version used in this experiment provides two types, ε-SVR and ν-SVR; after testing, the more commonly used ε-SVR is selected.
The feature data processed in the previous subsection are paired one-to-one with the UPDRS scores provided by the doctor and arranged into a data set. This data set is clustered with k-means, and, for each cluster, the data are divided into a training set and a test set at a ratio of 3:1. For each cluster, a classification model is trained on the training set with SVM and a regression model is trained with SVR; the models are optimized by grid search, and the trained model parameters of each cluster are saved.
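The per-cluster 3:1 split can be sketched as follows, assuming the k-means cluster label of every sample is already known. The function name `split_by_cluster`, the seeded shuffle and the rounding rule (training size is the floor of three quarters of the cluster) are illustrative assumptions, not details given in the text.

```python
import random


def split_by_cluster(samples, cluster_labels, seed=0):
    """Split each cluster's samples into (train, test) at a ratio of 3:1."""
    rng = random.Random(seed)
    groups = {}
    for sample, label in zip(samples, cluster_labels):
        groups.setdefault(label, []).append(sample)

    splits = {}
    for label, members in groups.items():
        shuffled = members[:]          # keep the caller's list untouched
        rng.shuffle(shuffled)
        cut = (3 * len(shuffled)) // 4  # three quarters go to training
        splits[label] = (shuffled[:cut], shuffled[cut:])
    return splits
```

Each cluster then gets its own SVM classifier and SVR regressor trained on its training part, with the test part held back for evaluation.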
Five. Prediction
For each cluster, the classification and regression models obtained in the above training process are loaded. The data to be judged and predicted are input, the cluster they belong to is determined by computing distances, and the model corresponding to that cluster is used to classify the data, giving the classification result. The classification result is then processed: for a tester predicted not to have Parkinson's disease, the UPDRS value is set to 0; for a tester predicted to have Parkinson's disease, SVR is then used for regression prediction to obtain the calculated UPDRS value, from which the patient's severity is inferred. The process then ends.
Six. Feedback of results
The prediction result is fed back to the front end through an interface and presented to the user.
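The prediction flow can be sketched as a small routing function. The sketch assumes that "computing distances" means choosing the nearest k-means centroid by Euclidean distance, and it represents the trained per-cluster models as plain callables; `nearest_cluster`, `predict_updrs` and the convention that a positive classifier output means "has Parkinson's disease" are all hypothetical.

```python
import math


def nearest_cluster(features, centroids):
    """Index of the centroid closest to the feature vector."""
    return min(range(len(centroids)),
               key=lambda k: math.dist(features, centroids[k]))


def predict_updrs(features, centroids, classifiers, regressors):
    """Route a sample to its cluster's models; return (cluster, UPDRS value)."""
    k = nearest_cluster(features, centroids)
    if classifiers[k](features) <= 0:
        # Predicted not to have Parkinson's disease: UPDRS is set to 0.
        return k, 0.0
    # Predicted to have Parkinson's disease: regress the UPDRS value.
    return k, regressors[k](features)
```

In a real system the callables would wrap the saved SVM and SVR models of each cluster; here they only illustrate the order of the decisions.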
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited to it. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the protection scope of the present invention.
Claims (5)
1. A cluster-based method for detecting Parkinson's severity from speech, characterized by comprising the following steps:
S1. Acquisition of the speech signal
A vowel is selected and, using an acquisition device, the following are collected: patient number, name, age, gender, whether diagnosed with Parkinson's disease, whether the patient has other diseases that cause voice disorders, duration of illness, UPDRS (motor), UPDRS (total), acquisition date and time, and the number of the acquisition on that day;
S2. Preprocessing of the speech signal
Preprocessing of the speech signal comprises format conversion, sampling frequency conversion, pre-emphasis, windowing and framing of the speech, removal of silent segments, and fundamental frequency extraction;
S3. Extraction of all speech features
The features comprise the fundamental frequency feature Pitch, fundamental frequency perturbation Jitter, amplitude perturbation Shimmer, signal-to-noise ratio features and nonlinear features;
S4. Model and calculation
A classification algorithm based on the support vector machine is used, with a linearly separable SVM classifier and a nonlinear model, the SVM establishing the model by introducing a kernel function; the feature data obtained in S3 are paired one-to-one with the information provided by the doctor and arranged into a data set; clustering is performed with k-means and, for each cluster, the data set is divided into a training set and a test set at a ratio of 3:1; for each cluster, a classification model is trained on the training set with the support vector machine classification algorithm and SVM model, a regression model is trained with the support vector regression (SVR) method, the models are optimized by grid search, and the trained model parameters of each cluster are saved;
S5. Prediction
For each cluster, the classification and regression models obtained in the above training process are loaded; the data to be judged and predicted are input, the cluster they belong to is determined by computing distances, and the model corresponding to that cluster is used to classify the data, giving the classification result; the classification result is then processed: for a tester predicted not to have Parkinson's disease, the label value is set to 0; for a tester predicted to have Parkinson's disease, SVR is then used for regression prediction to obtain the calculated label value, from which the patient's severity is inferred, and the process ends;
S6. Feedback of results
The prediction result is fed back to the front end through an interface and presented to the user.
2. The cluster-based method for detecting Parkinson's severity from speech according to claim 1, characterized in that, in step S3, for the fundamental frequency feature Pitch, the fundamental frequency is extracted by the autocorrelation method;
For a deterministic signal, the short-time autocorrelation function is defined as:
Then, for the autocorrelation function of each frame, the largest peak after its first zero crossing is found; the corresponding lag k is the pitch period of that frame of speech, and the reciprocal of k is the fundamental frequency;
The fundamental frequency is generally denoted F0. Because the speech signal is divided into frames during preprocessing, each frame has a corresponding F0, which yields a fundamental frequency sequence.
The fundamental frequency features are simple statistical parameters calculated from the extracted fundamental frequency sequence.
3. The cluster-based method for detecting Parkinson's severity from speech according to claim 1, characterized in that, in step S3, Jitter denotes the perturbation of the fundamental frequency, i.e. the degree to which the pitch period deviates from the mean period; because the audio used is a sustained phonation of a long vowel, the pitch perturbation caused by alternating vowels and consonants can be excluded;
Jitter is calculated not from the fundamental frequency but from the pitch period, which is defined as follows:
where n is the frame index, N is the total number of frames, and the two quantities in the definition are the fundamental frequency of the n-th frame and the pitch period of the n-th frame;
(1) Jitter: the relative perturbation of the pitch period, i.e. the ratio of the mean absolute difference between adjacent pitch periods to the mean pitch period, reflecting the subject's overall relative control over vocal cord vibration; its formula is:
(2) Jitter_abs: the absolute perturbation of the pitch period, i.e. the mean absolute difference between adjacent pitch periods, reflecting the subject's overall absolute control over vocal cord vibration; its formula is:
(3) Jitter_PPQ5: the five-point perturbation of the pitch period, i.e. the mean absolute difference between the pitch period of a frame and the average pitch period of its 5 neighboring frames, reflecting, to a certain degree, the subject's control over vocal cord vibration over a period of time; its formula is:
(4) Jitter_rap: the three-point perturbation of the pitch period, i.e. the mean absolute difference between the pitch period of a frame and the average pitch period of its 3 neighboring frames, reflecting, to a certain degree, the subject's control over vocal cord vibration over a period of time; its formula is:
(5) Jitter_ddp: the difference of the three-point perturbation of the pitch period, i.e. the mean absolute value of the difference between the successive differences of the pitch periods of 3 adjacent frames, reflecting, to a certain degree, how the subject's control over vocal cord vibration changes over a short period of time; its formula is:
4. The cluster-based method for detecting Parkinson's severity from speech according to claim 1, characterized in that, in step S3, Shimmer is the perturbation of the speech amplitude, i.e. the degree to which the amplitude deviates from the mean amplitude:
The amplitude measured by Shimmer is defined as:
$A_{0,n} = \max(P_n) - \min(P_n), \quad n = 1, 2, \ldots, N \qquad (3.1)$
where the sequence $P_n$ is the sequence of speech signal values in the n-th frame and $A_{0,n}$ is the corresponding amplitude of the n-th frame;
(1) Shimmer: the amplitude perturbation (as a percentage), i.e. the ratio of the mean absolute difference between adjacent amplitudes to the mean amplitude, reflecting the subject's relative control over speech amplitude; its formula is:
(2) Shimmer_dB: the amplitude perturbation (in decibels), i.e. the mean of the ratio of adjacent amplitudes, expressed in decibels (dB), reflecting the subject's relative control over speech amplitude; its formula is:
(3) Shimmer_APQ5: the five-point amplitude perturbation, i.e. the mean absolute difference between the amplitude of a frame and the average amplitude of its 5 neighboring frames, reflecting, to a certain degree, the subject's control over speech amplitude over a period of time; its formula is:
(4) Shimmer_APQ3: the three-point amplitude perturbation, i.e. the mean absolute difference between the amplitude of a frame and the average amplitude of its 3 neighboring frames, reflecting, to a certain degree, the subject's control over speech amplitude over a period of time; its formula is:
(5) Shimmer_dda: the difference of the three-point amplitude perturbation, i.e. the mean absolute value of the difference between the successive differences of the amplitudes of 3 adjacent frames, reflecting, to a certain degree, how the subject's control over speech amplitude changes over a short period of time; its formula is:
(6) Shimmer_APQ11: the eleven-point amplitude perturbation, i.e. the mean absolute difference between the amplitude of a frame and the average amplitude of its 11 neighboring frames, reflecting, to a certain degree, the subject's control over speech amplitude over a longer period of time; its formula is:
5. The cluster-based method for detecting Parkinson's severity from speech according to claim 1, characterized in that, in step S3, the nonlinear features are detrended fluctuation analysis (DFA), recurrence period density entropy (RPDE) and pitch period entropy (PPE).
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810276038 | 2018-03-30 | ||
CN201810276038X | 2018-03-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109192221A true CN109192221A (en) | 2019-01-11 |
Family
ID=64914571
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811032625.0A Withdrawn CN109192221A (en) | 2018-03-30 | 2018-09-05 | It is a kind of that phonetic decision Parkinson severity detection method is used based on cluster |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109192221A (en) |
2018
- 2018-09-05: CN application CN201811032625.0A, published as CN109192221A; status: not active (Withdrawn)
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112102849A (en) * | 2019-06-18 | 2020-12-18 | 成都医学院 | Sound analysis method and device |
CN110335624A (en) * | 2019-07-29 | 2019-10-15 | 吉林大学 | Parkinson's disease speech detection method based on power normalization cepstrum coefficient feature |
CN111210840A (en) * | 2020-01-02 | 2020-05-29 | 厦门快商通科技股份有限公司 | Age prediction method, device and equipment |
CN111292851A (en) * | 2020-02-27 | 2020-06-16 | 平安医疗健康管理股份有限公司 | Data classification method and device, computer equipment and storage medium |
CN112259126A (en) * | 2020-09-24 | 2021-01-22 | 广州大学 | Robot and method for assisting in recognizing autism voice features |
CN113361563A (en) * | 2021-04-22 | 2021-09-07 | 重庆大学 | Parkinson's disease voice data classification system based on sample and feature double transformation |
CN113393932A (en) * | 2021-07-06 | 2021-09-14 | 重庆大学 | Parkinson's disease voice sample segment multi-type reconstruction transformation method |
CN117059133A (en) * | 2023-10-13 | 2023-11-14 | 首都医科大学附属北京天坛医院 | Speech function evaluation device, electronic apparatus, and storage medium |
CN117059133B (en) * | 2023-10-13 | 2024-01-26 | 首都医科大学附属北京天坛医院 | Speech function evaluation device, electronic apparatus, and storage medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109192221A (en) | A clustering-based method for detecting Parkinson's disease severity from speech | |
CN107657964B (en) | Depression auxiliary detection method and classifier based on acoustic features and sparse mathematics | |
Tsanas et al. | Remote assessment of Parkinson’s disease symptom severity using the simulated cellular mobile telephone network | |
CN109285551B (en) | Parkinson patient voiceprint recognition method based on WMFCC and DNN | |
US11672472B2 (en) | Methods and systems for estimation of obstructive sleep apnea severity in wake subjects by multiple speech analyses | |
Yap | Speech production under cognitive load: Effects and classification | |
Wang et al. | Automatic assessment of pathological voice quality using multidimensional acoustic analysis based on the GRBAS scale | |
Reggiannini et al. | A flexible analysis tool for the quantitative acoustic assessment of infant cry | |
López-Pabón et al. | Cepstral analysis and Hilbert-Huang transform for automatic detection of Parkinson’s disease | |
Grzywalski et al. | Parameterization of Sequence of MFCCs for DNN-based voice disorder detection | |
Wang et al. | Automatic hypernasality detection in cleft palate speech using cnn | |
Kulkarni et al. | Child cry classification-an analysis of features and models | |
Madruga et al. | Multicondition training for noise-robust detection of benign vocal fold lesions from recorded speech | |
Revathi et al. | Robust respiratory disease classification using breathing sounds (RRDCBS) multiple features and models | |
Reddy et al. | Exemplar-Based Sparse Representations for Detection of Parkinson's Disease From Speech | |
Ankışhan | Classification of acoustic signals with new feature: Fibonacci space (FSp) | |
Deepa et al. | Speech technology in healthcare | |
Coro et al. | A self-training automatic infant-cry detector | |
RU2559689C2 (en) | Method of determining risk of development of individual's disease by their voice and hardware-software complex for method realisation | |
Naikare et al. | Classification of voice disorders using i-vector analysis | |
Campi et al. | Ataxic speech disorders and Parkinson’s disease diagnostics via stochastic embedding of empirical mode decomposition | |
Safdar et al. | Prediction of Specific Language Impairment in Children using Cepstral Domain Coefficients | |
Korvel et al. | Investigating Noise Interference on Speech Towards Applying the Lombard Effect Automatically | |
Howard | Speech fundamental period estimation using pattern classification | |
Hasan | Bird Species Classification And Acoustic Features Selection Based on Distributed Neural Network with Two Stage Windowing of Short-Term Features |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20190111 |