CN114400025A - Automatic schizophrenia voice detection method and system based on EHHT and CI - Google Patents


Info

Publication number: CN114400025A
Application number: CN202210100356.7A
Authority: CN (China)
Prior art keywords: voice, schizophrenia, EHHT, automatic, formant
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202210100356.7A
Other languages: Chinese (zh)
Inventors: 田维维 (Tian Weiwei), 冯瑞 (Feng Rui)
Current assignee: Fudan University
Original assignee: Fudan University
Application filed by Fudan University
Priority to CN202210100356.7A
Publication of CN114400025A
Legal status: Pending

Classifications

    • G PHYSICS › G10 MUSICAL INSTRUMENTS; ACOUSTICS › G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/66: Speech or voice analysis techniques specially adapted for extracting parameters related to health condition
    • G10L 15/02: Feature extraction for speech recognition; selection of recognition unit
    • G10L 15/063: Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/08: Speech classification or search
    • G10L 25/63: Speech or voice analysis techniques specially adapted for estimating an emotional state

Abstract

The invention provides an automatic schizophrenia voice detection method and system based on EHHT and CI. By analyzing the clinical characteristics of schizophrenic speech, the existing formant extraction algorithm is improved; the improved algorithm is used to obtain, from voice samples, a set of acoustic feature parameters reflecting changes in voice quality and emotion, and an SVM classifier then performs classification, realizing automatic detection that separates the speech of schizophrenia patients from that of a healthy control group. Designed experiments further examine how four factors (the number of white-noise integrations, the white-noise variance, the number of IMF components, and the window length) influence the detection result. The results show that the detection accuracy can reach 98.8% and that schizophrenia patients differ significantly from healthy controls in the formant acoustic parameters that characterize voice quality, potentially providing a new objective, quantitative, and efficient index for research on the clinical auxiliary diagnosis of schizophrenia.

Description

Automatic schizophrenia voice detection method and system based on EHHT and CI
Technical Field
The invention belongs to the technical field of pattern recognition and pathological speech signal processing, relates to a method for automatically detecting schizophrenia from voice signals, and particularly relates to an automatic schizophrenia voice detection method and system based on EHHT and CI.
Background
Schizophrenia is a chronic neurodegenerative disorder with a protracted course, characterized by a high recurrence rate, a high disability rate, and poor treatment compliance. It usually accompanies patients throughout their lives, severely impairs their quality of life and social cognition, and causes great productivity losses for patients and their families. Studies have shown that its major clinical manifestations include one or more symptoms across five dimensions: auditory hallucinations, delusions, speech (thought) disorders, marked tension or abnormal behavior, and negative symptoms (e.g., poverty of speech, apathy, and lack of motivation). China faces a serious shortage of psychiatrists and an uneven distribution of mental health resources, so patients often miss the optimal window for treatment. If a reasonable and effective automatic detection method could be applied early in the onset of psychiatric symptoms, with timely intervention, it would help improve patients' condition and reduce the damage to their social functioning.
Currently, the diagnosis, monitoring, and evaluation of schizophrenia mainly rely on six approaches. (1) Clinical scales, including the Scale for the Assessment of Negative Symptoms (SANS), the Scale for the Assessment of Positive Symptoms (SAPS), and the Positive and Negative Syndrome Scale (PANSS): these rely heavily on patients' self-reports and family history, are easily influenced by subjective factors of the rater such as interview skill and clinical experience, and lack standardization. (2) Brain imaging: functional magnetic resonance imaging (fMRI) of brain regions mainly focuses on changes in the brain parenchyma and brain function of schizophrenia patients, but the instruments are complex to operate, detection costs are high, and objective biomarkers are lacking. (3) Electroencephalography: EEG contains rich physiological state information and fully reflects brain activity, but alpha and beta waves are not specific to schizophrenia, so its diagnostic value is limited. (4) Video analysis: compared with healthy people, schizophrenia patients move less and have duller facial expressions, so automatic detection can be attempted by extracting expression and movement features of subjects; however, the current bottleneck is the lack of a unified experimental paradigm, the field is still in its infancy, and it has not reached the level of clinical auxiliary diagnosis, requiring further in-depth study. (5) Genomics: variants closely related to schizophrenia exist in exonic regions across the genome, but gene-sequencing capture still suffers from false positives and false negatives. (6) Voice signals: as a new area, studies have shown that the negative symptoms of schizophrenia are closely related to the expression of speech emotion.
With the rapid development of big data, artificial intelligence, speech signal processing algorithms and the cross-fusion research of medical science, computer science and other multidisciplines, speech emotion characteristics extracted by computer aided analysis and diagnosis technology are being gradually applied to the research of schizophrenia.
At present, Compton et al have found that schizophrenia patients have chronic speech expression disorders, with marked negative symptoms such as apathy and affective flattening, a narrow melodic range, and abnormal voice quality. Gold R et al studied the effect of stimuli associated with emotional perception on 92 patients and 73 control subjects, and showed that schizophrenia patients have significant deficits in speech emotion recognition that are associated with potential impairment in processing acoustic features, with a significant impact on social cognitive function. Xu S et al automatically extracted lexical features and document vectors from the interview recordings of 50 schizophrenia patients and 25 age-matched healthy controls using natural language processing techniques, and classified patients and controls with an ensemble machine learning algorithm at a maximum accuracy of 78.7%. Cohen A S et al quantified the speech of schizophrenia patients with four basic acoustic features, namely pitch frequency (F0), first formant (F1), second formant (F2), and intensity, and found that tongue movement was more pronounced in patients with activation and hostility, the formation of F1 and F2 being related to vertical and anterior-posterior tongue movement, respectively. By evaluating 10 phonetic parameters of 26 patients (at enrollment and one week later) and 30 healthy control samples (at enrollment), Jin Zhang et al found that the linear prediction coefficient scores of the patient group were significantly higher than those of the control group, while the Mel-frequency cepstral coefficient scores were significantly lower; moreover, 10 of the 170 correlation coefficients formed with 17 clinical features reached the significance level.
The above studies show that there is a potential relationship between acoustic feature parameters of speech and psychopathological symptoms, and further in-depth studies on larger and more diverse patient populations are required. Realizing automatic detection of the speech of schizophrenia patients would not only help to identify and track patients' emotional changes early, but could also provide a new scientific basis to assist psychiatrists in diagnosing a patient's condition and monitoring treatment effects. However, methods and systems for the automatic detection of speech in schizophrenia patients are currently lacking.
Disclosure of Invention
The invention was made to solve the above problems, and aims to provide an automatic schizophrenia voice detection method that extracts formant parameters reflecting changes in speech emotion and voice quality based on the EHHT and CI algorithms, and automatically detects and classifies the speech of schizophrenia patients and healthy persons with an SVM classifier. The following technical scheme is adopted:
the invention provides an automatic schizophrenia voice detection method based on EHHT and CI, which is characterized by comprising the following steps: step S1, collecting the voice of the subject to obtain a voice signal; step S2, preprocessing the collected voice signal; step S3, performing feature extraction on the preprocessed voice signal by using an improved formant extraction algorithm based on EHHT and CI to obtain an acoustic feature parameter set reflecting the voice quality emotion change of the voice; step S4, extracting parameters with significant differences from the acoustic feature parameter set by using a hypothesis testing method, and combining the parameters into the acoustic feature parameter set with reduced dimensions; step S5, forming a training sample by the acoustic feature parameter set subjected to dimension reduction and the label category verified and confirmed by a psychiatrist, and training an SVM classifier; and step S6, carrying out classification detection on the voices by using the trained SVM classifier, thereby realizing automatic classification detection on the voices of the schizophrenia patients and the voices of the healthy control groups.
The EHHT and CI-based automatic schizophrenia voice detection method provided by the present invention may further have the technical feature that, in step S2, the preprocessing includes: performing DC removal, normalization, and pre-emphasis on the vowel signals within the voice signal, so that the high-frequency components of the voice signal are enhanced and the interference of the subject's lip radiation with formant extraction is reduced.
The automatic schizophrenia voice detection method based on EHHT and CI according to the present invention may further have the technical feature that step S3 includes the following sub-steps: step S3-1, performing empirical mode decomposition on the voice signal multiple times with integrated random white Gaussian noise, and averaging the decomposition results to obtain the intrinsic mode functions; step S3-2, calculating, for each IMF component of the intrinsic mode functions, the ratio of the normalized band energy-entropy ratios of the corresponding Hilbert marginal spectrum, and screening the IMF components containing the formants to reconstruct the voice signal; and step S3-3, after framing and windowing the reconstructed voice signal, extracting a plurality of formant characteristic parameters of a plurality of formants by cepstrum interpolation to obtain the acoustic feature parameter set.
The automatic voice detection method for schizophrenia based on EHHT and CI according to the present invention may further include a technical feature that, in step S3-1, the empirical mode decomposition is an integrated empirical mode decomposition.
The method for automatically detecting schizophrenia voice based on EHHT and CI according to the present invention may further include the technical feature that, in step S3-3, the number of peaks, mean, variance, interquartile range, median, mode, range, skewness, and kurtosis corresponding to the frequencies, bandwidths, and amplitudes of the first three formants are extracted as the set of formant feature parameters.
The automatic schizophrenia voice detection method based on EHHT and CI provided by the invention may also have the technical feature that, in step S3-1, the number of integrations of random white Gaussian noise is 100, the variance of the noise is 0.1, and the number of IMF components is 3; and in step S3-3, the window length is 8.
The EHHT and CI-based automatic schizophrenia voice detection method provided by the present invention may further have the technical feature that, in step S5, a ten-fold cross-validation method is adopted: the training samples are divided into ten parts, 90% are used for training and the remaining 10% for testing in each round, and the mean over the ten repetitions is used as the estimate of the detection accuracy of the SVM classifier.
The invention also provides an automatic schizophrenia voice detection system based on EHHT and CI, which is characterized by comprising: a voice acquisition module for collecting the subject's voice to obtain a voice signal; a preprocessing module for preprocessing the acquired voice signal; a feature extraction module for extracting features from the preprocessed voice signal using the improved EHHT- and CI-based formant extraction algorithm to obtain an acoustic feature parameter set reflecting changes in voice quality and emotion; a feature dimension-reduction module that extracts parameters with significant differences from the acoustic feature parameter set using a hypothesis-testing method and combines them into a dimension-reduced acoustic feature parameter set; a pattern recognition module that classifies the voice using an SVM classifier trained on the dimension-reduced acoustic feature parameter set and the label categories verified by a psychiatrist, thereby realizing automatic classification of the speech of schizophrenia patients and that of a healthy control group; and a control module for coordinating the work of the modules.
Action and Effect of the invention
According to the automatic schizophrenia voice detection method based on EHHT and CI, the clinical characteristics of schizophrenic speech are analyzed, EHHT and CI are used to improve the existing formant extraction algorithm, the improved algorithm obtains from voice samples an acoustic feature parameter set reflecting changes in voice quality and emotion, and an SVM classifier then performs classification, realizing automatic detection that separates the speech of schizophrenia patients from that of a healthy control group. This is not only helpful for early recognition and tracking of patients' emotional changes, but can also provide a new scientific basis for assisting psychiatrists in diagnosing the condition of schizophrenia and monitoring treatment effects.
Drawings
FIG. 1 is a flow chart of the EHHT and CI based automatic schizophrenia voice detection method in an embodiment of the present invention;
FIG. 2 is a simplified flow chart of the EHHT and CI based automatic schizophrenia voice detection method in an embodiment of the present invention;
FIG. 3 is a time-domain waveform and spectrogram of a vowel uttered by a schizophrenia patient in an embodiment of the present invention;
FIG. 4 is a time-domain waveform and spectrogram of a vowel uttered by a healthy control in an embodiment of the present invention;
FIG. 5 is a box plot of the peak-number feature of the first formant in an embodiment of the present invention;
FIG. 6 is a box plot of the peak-number feature of the second formant in an embodiment of the present invention;
FIG. 7 is a box plot of the peak-number feature of the third formant in an embodiment of the present invention;
FIG. 8 is a line graph showing the influence of the number of white-noise integrations and the variance on the detection result in an embodiment of the present invention;
FIG. 9 is a block diagram of the EHHT and CI based automatic schizophrenia voice detection system in an embodiment of the present invention.
Reference numerals:
a speech automatic detection system 10 for schizophrenia; a voice acquisition module 11; a pre-processing module 12; a feature extraction module 13; a feature dimension reduction module 14; a pattern recognition module 15; a control module 16.
Detailed Description
To make the technical means, creative features, objectives, and effects of the invention easy to understand, the EHHT and CI-based automatic schizophrenia voice detection method of the present invention is described below with reference to the embodiments and the accompanying drawings.
< example >
Fig. 1 is a flowchart of the automatic voice detection method for schizophrenia based on EHHT and CI in the present embodiment. Fig. 2 is a simplified flow chart of the automatic voice detection method for schizophrenia based on EHHT and CI in this embodiment.
As shown in fig. 1 and fig. 2, in the present embodiment, the method for automatically detecting schizophrenia voice based on EHHT and CI specifically includes the following steps:
and step S1, collecting the voice of the subject to obtain a voice signal.
In this embodiment, the subjects include schizophrenia patients and healthy persons, i.e., healthy controls. During the experiment, each subject was instructed to read aloud, freely, a standardized text containing all the vowels of Mandarin syllables. The read speech was collected by a recording device to obtain the voice signal, which was stored in WAV format at a sampling rate of 8000 Hz with 16-bit quantization and a single channel. After acquisition, psychiatrists checked and screened the voice signals to ensure their validity.
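As a rough illustration of the storage format described above (8000 Hz, 16-bit, mono WAV), the following Python sketch writes and then reads such a file with the standard-library `wave` module; the file name and the synthetic tone are placeholders, not part of the patent.

```python
import wave
import numpy as np

def read_sample(path):
    """Load one recording stored as described above: 8000 Hz, 16-bit, mono WAV."""
    with wave.open(path, "rb") as w:
        assert w.getframerate() == 8000
        assert w.getsampwidth() == 2 and w.getnchannels() == 1
        data = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    return data.astype(float) / 32768.0   # scale to [-1, 1)

# round-trip demo with a synthetic one-second 220 Hz tone ("demo_vowel.wav"
# is a placeholder file name, not part of the patent)
tone = (0.3 * np.sin(2 * np.pi * 220 * np.arange(8000) / 8000) * 32767)
tone = tone.astype(np.int16)
with wave.open("demo_vowel.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(tone.tobytes())
samples = read_sample("demo_vowel.wav")
```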
In this embodiment, 686 vowel data samples from 14 schizophrenia patients and 793 vowel data samples from 14 healthy persons are selected according to the phonation characteristics of vowels to build a pathological voice database, as shown in Table 1; the 14 healthy persons are matched with the 14 schizophrenia patients in gender, age, education level, etc.
Table 1 pathological speech database table
[Table 1 is reproduced as an image in the original publication and is not shown here.]
FIG. 3 is a time-domain waveform and spectrogram of the vowel /u/ uttered by a schizophrenia patient in this embodiment. FIG. 4 is a time-domain waveform and spectrogram of the vowel /u/ uttered by a healthy person in this embodiment.
As shown in FIGS. 3 and 4, the frequency distribution and fluctuation range of the formants of the vowel signals differ significantly between schizophrenia patients and healthy persons and show complicated time-varying behavior, mainly caused by changes in vocal tract structure during pronunciation.
And step S2, preprocessing the acquired voice signals.
In this embodiment, the preprocessing performed on the speech signal includes DC removal, normalization, and pre-emphasis of the vowel signals, which enhances the high-frequency components of the voice signal and reduces the interference of the subject's lip radiation with the subsequent extraction of formant components.
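The preprocessing chain just described (DC removal, normalization, pre-emphasis) can be sketched as follows; the pre-emphasis coefficient 0.97 is a common textbook choice, not a value stated in the patent.

```python
import numpy as np

def preprocess(signal, alpha=0.97):
    """DC removal, peak normalization, then pre-emphasis
    y[n] = x[n] - alpha * x[n-1] to boost high frequencies."""
    x = np.asarray(signal, dtype=float)
    x = x - np.mean(x)              # DC removal
    peak = np.max(np.abs(x))
    if peak > 0:
        x = x / peak                # normalize to [-1, 1]
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]  # pre-emphasis (first-order high-pass)
    return y

# demo: a 50 Hz tone with a DC offset, sampled at 8000 Hz as in the embodiment
fs = 8000
t = np.arange(fs) / fs
raw = 0.5 * np.sin(2 * np.pi * 50 * t) + 0.2
clean = preprocess(raw)
```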
And step S3, feature extraction: an acoustic feature parameter set reflecting changes in voice quality and emotion is obtained using the improved formant extraction algorithm based on EHHT and CI.
Based on the differences and time-varying nature of the frequency distribution and fluctuation range of the formants of vowel signals from schizophrenia patients and healthy persons, the embodiment of the present invention provides an improved formant extraction algorithm based on cepstrum interpolation. However, due to interference from noise, high-frequency harmonics, and the like, the actually obtained spectral envelope peaks may still deviate from the formant positions; therefore, to further improve the accuracy of formant extraction in an interference environment, the embodiment of the present invention adopts a new algorithm combining the ensemble Hilbert-Huang transform (EHHT) and cepstrum interpolation (CI).
In this embodiment, step S3 specifically includes the following sub-steps:
step S3-1, performing Empirical Mode Decomposition (EMD) by integrating random white gaussian noise for a plurality of times on the speech signal, and averaging the results of the multiple Decomposition to obtain an Intrinsic Mode Function (IMF).
Wherein, the adopted time-frequency domain analysis method assisted by noise-integrated Empirical Mode Decomposition (EEMD) specifically comprises the following substeps:
step S3-1-1, adding different white gaussian noises with the same distribution attribute and a mean value of 0 and a variance of a predetermined value std to the original speech signal, synthesizing a target signal y (t), and making k equal to 1;
step S3-1-2, where n (t) is equal to y (t);
s3-1-3, extracting all maximum value points and all minimum value points of n (t), and respectively obtaining an upper envelope and a lower envelope by adopting cubic spline interpolation;
step S3-1-4, calculating average value m of upper envelope and lower envelopek(t),hk(t)=n(t)-mk(t);
Step S3-1-5, mixing hk(t) is regarded as n (t), and the step S3-1-3 is returned until hk(t) if the constraint of IMF component is satisfied, the kth IMF component ck(t)=hk(t);
Step S3-1-6, let rk(t)=y(t)-ck(t), k is k +1, and r isk(t) regarding y (t) and returning to the step S3-1-2 until the number of screening times reaches log2(L) -1; wherein, L represents the signal length, and the influence of local disturbance and newly added data can be reduced by fixing the screening times;
step S3-1-7, repeating the steps S3-1-1 to S3-1-6M times;
and S3-1-8, averaging the corresponding IMF components in the result to obtain the final decomposition result.
And step S3-2, for each IMF component of the intrinsic mode functions, calculating the ratio of the normalized band energy-entropy ratios of the corresponding Hilbert marginal spectrum, and screening the IMF components containing the formants to reconstruct the voice signal.
And step S3-3, after performing frame division and windowing processing on the reconstructed voice signal, extracting a plurality of formant characteristic parameters of a plurality of formants by using a cepstrum interpolation method, and obtaining the acoustic characteristic parameter set.
In this embodiment, formant characteristic parameters of the first three formants are extracted. For each formant, the number of peaks, mean, variance, interquartile range, median, mode, range, skewness, and kurtosis corresponding to its frequency, bandwidth, and amplitude are computed; that is, 81 formant characteristic parameters in total (3 formants x 3 quantities x 9 statistics) are extracted, and these 81 parameters constitute the acoustic feature parameter set of the speech.
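A generic sketch of cepstral-liftering formant estimation with quadratic (parabolic) peak interpolation is shown below; the window length of 8 cepstral coefficients follows the embodiment, while the resonator test signal and all other details are illustrative assumptions rather than the patented CI algorithm.

```python
import numpy as np
from scipy.signal import lfilter

def cepstral_formants(frame, fs, cepst_len=8, n_formants=3):
    """Formant estimates from a low-quefrency-liftered cepstral envelope,
    refined by quadratic (parabolic) interpolation around each peak."""
    n = len(frame)
    nfft = 1 << (n - 1).bit_length()
    spec = np.fft.rfft(frame * np.hamming(n), nfft)
    log_mag = np.log(np.abs(spec) + 1e-12)
    ceps = np.fft.irfft(log_mag)            # real cepstrum
    lifter = np.zeros_like(ceps)            # keep the vocal-tract part,
    lifter[:cepst_len] = 1.0                # drop the pitch part
    lifter[-(cepst_len - 1):] = 1.0
    env = np.fft.rfft(ceps * lifter).real   # smoothed log-spectral envelope
    peaks = np.where((env[1:-1] > env[:-2]) & (env[1:-1] >= env[2:]))[0] + 1
    freqs = []
    for k in peaks[:n_formants]:
        a, b, c = env[k - 1], env[k], env[k + 1]
        denom = a - 2 * b + c
        delta = 0.5 * (a - c) / denom if denom != 0 else 0.0
        delta = float(np.clip(delta, -0.5, 0.5))
        freqs.append((k + delta) * fs / nfft)
    return freqs

def resonator(f0, bw, fs):
    """Two-pole resonator, used only to synthesize a vowel-like test frame."""
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * f0 / fs
    return [1.0], [1.0, -2 * r * np.cos(theta), r * r]

# synthesize a frame: 100 Hz impulse train through resonators at 700/1700 Hz
fs = 8000
excitation = np.zeros(400)
excitation[::80] = 1.0
b1, a1 = resonator(700, 130, fs)
b2, a2 = resonator(1700, 150, fs)
frame = lfilter(b1, a1, lfilter(b2, a2, excitation))
formants = cepstral_formants(frame, fs, cepst_len=8)
```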
And step S4, reducing the dimension of the features, extracting parameters with significant differences from the acoustic feature parameter set by using a hypothesis testing method, and combining the parameters into the dimension-reduced acoustic feature parameter set.
In this example, a two-sample t-test was used to extract the parameters with significant differences, with the significance level set to 0.05; the results obtained are shown in Tables 2 to 4.
TABLE 2 table of p-values of characteristic parameters of the first formants
[Table 2 is reproduced as an image in the original publication and is not shown here.]
TABLE 3 table of p-values of characteristic parameters of the second formants
[Table 3 is reproduced as an image in the original publication and is not shown here.]
TABLE 4 table of p-values of each characteristic parameter of the third formant
[Table 4 is reproduced as an image in the original publication and is not shown here.]
As can be seen from the results in Tables 2-4, the p-values of some features (e.g., all the variance features in Table 2, and the median and mean of the amplitude) are considerably larger than those of the remaining features. Using these high-p-value features to distinguish patients from healthy persons would yield poor classification accuracy, so they are discarded in the actual classification.
Fig. 5-7 are characteristic box diagrams of the number of peaks of the first three formants in this embodiment.
As shown in Tables 2-4 and FIGS. 5-7, the peak-number feature shows significant differences for all of the first three formants, and FIG. 5 makes it apparent that the distribution of this feature differs markedly between the two classes of speech signals, further indicating its effectiveness for classification.
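The dimension-reduction step (two-sample t-test at a 0.05 significance level) can be sketched as follows on synthetic data; the real 81-dimensional feature set is not available here, so the two-column example is an assumption for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind

def select_significant(group_a, group_b, alpha=0.05):
    """Return indices of feature columns whose two-sample t-test p-value
    falls below alpha; group_* are (samples, features) arrays."""
    _, p = ttest_ind(group_a, group_b, axis=0)
    return np.where(p < alpha)[0], p

# demo: column 0 separates the groups, column 1 does not (synthetic data)
rng = np.random.default_rng(1)
patients = np.column_stack([rng.normal(5.0, 1.0, 60), rng.normal(0.0, 1.0, 60)])
controls = np.column_stack([rng.normal(3.0, 1.0, 60), rng.normal(0.0, 1.0, 60)])
keep, p = select_significant(patients, controls)
```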
And step S5, combining the acoustic feature parameter set with the reduced dimension and the label category verified and confirmed by the psychiatrist into a training sample, and training the SVM classifier.
In this embodiment, to verify the reliability and practicality of the trained model, ten-fold cross-validation is adopted: the data are divided into ten parts, 90% are used for training and 10% for testing in each round, and the mean of the ten repeated experiments is used as the estimate of the detection accuracy.
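The ten-fold cross-validated SVM evaluation can be sketched as below; the RBF kernel, the feature standardization, and the synthetic stand-in data are assumptions, since the patent specifies only an SVM classifier and the ten-fold protocol.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# synthetic stand-in for the dimension-reduced feature set; the real
# 686 patient / 793 control vowel samples are not available here
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 5)),    # healthy controls
               rng.normal(1.5, 1.0, (100, 5))])   # patients
y = np.array([0] * 100 + [1] * 100)               # 1 = patient

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)        # ten 90%/10% splits
mean_acc = scores.mean()                          # estimate of accuracy
```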
And step S6, pattern recognition, namely, carrying out classification detection on the voice by using a trained SVM classifier, thereby realizing automatic classification detection on the voice of the schizophrenia patient and the voice of a healthy control group.
In this embodiment, the influence of the white noise integration times M, the variance std, the number n of IMF components, and the window length cepstL on the detection accuracy is respectively tested.
(1) Influence of white noise integration times M and variance std on schizophrenia voice automatic classification detection result
Fig. 8 is a line graph showing the influence of the number of integrations of white noise and the variance on the detection result in the present embodiment.
As shown in fig. 8, the number n of IMF components is set to 3 and the window length cepstL to 8. Detection performance is best when the number of white-noise integrations is 100 and the variance std is 0.1. According to the EHHT principle, as the number of integrations increases, high-frequency intermittent signals are gradually averaged out by the noise and the low-frequency components related to the formants are extracted more cleanly, but the computational cost also grows. If the white-noise variance is too small, the added noise cannot change the distribution of the extrema on which EMD sifting depends, so the mode-mixing problem is not effectively suppressed; if it is too large, the number of integrations must be increased to cancel the influence of the added noise on the decomposition, reducing computational efficiency. Therefore, within an acceptable computational budget, more integrations weaken the mode-mixing effect and raise the accuracy of distinguishing patient speech from healthy-control speech, and the variance is then chosen accordingly. Balancing these considerations, this embodiment selects M = 100 integrations with variance std = 0.1.
(2) Influence of number of IMF components on schizophrenia voice automatic classification detection result
TABLE 5 Correspondence between the number of IMF components and detection accuracy
The white-noise integration count M is set to 100, the white-noise variance std to 0.1, and the window length cepstL to 8. As table 5 shows, detection accuracy is highest when the number n of IMF components is 3. The IMF components are ranked in descending order of the ratio of the energy-entropy ratio in the 200-3000 Hz band to the energy-entropy ratio over the whole frequency domain, and the first n components are selected to reconstruct the voice signal. When n is too small, IMF components that carry formant information are discarded; when n is too large, useless components are introduced, which harms the reconstruction. The experiments show that n = 3 gives the best extraction of the formant-bearing signal. Because of the band-aliasing effect of the EMD algorithm, one formant may be spread over several IMF components; the EEMD algorithm adopted in this embodiment effectively merges the components belonging to the same formant into a new IMF component, so that each component contains only one formant. Thus the three IMF components with the largest normalized energy-entropy-ratio quotients reflect the structure of the first three formants of the speech signal, and reconstructing the signal from them yields the best classification of schizophrenic versus healthy-control speech.
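The selection rule above can be sketched as follows. The patent does not spell out the exact formula for the energy-entropy ratio, so the definition below (band energy divided by band spectral entropy) is an assumption for illustration, as are the function names:

```python
import numpy as np

def energy_entropy_ratio(x, fs, band=None):
    """Energy divided by spectral entropy, optionally restricted to a
    frequency band (definition assumed for illustration)."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    if band is not None:
        mask = (freqs >= band[0]) & (freqs <= band[1])
        power = power[mask]
    p = power / (power.sum() + 1e-12)
    entropy = -np.sum(p * np.log(p + 1e-12))
    return power.sum() / (entropy + 1e-12)

def reconstruct_from_imfs(imfs, fs, n=3, band=(200.0, 3000.0)):
    """Rank IMFs by their in-band vs. full-band energy-entropy-ratio
    quotient and sum the top n to rebuild the formant-bearing signal."""
    scores = [energy_entropy_ratio(c, fs, band) / energy_entropy_ratio(c, fs)
              for c in imfs]
    top = np.argsort(scores)[::-1][:n]
    return np.sum([imfs[i] for i in top], axis=0)
```

A component whose energy sits inside 200-3000 Hz (where the first formants live) scores near 1, while components dominated by very low or very high frequencies score near 0 and are dropped.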
(3) Influence of the window length on the automatic classification detection of schizophrenic speech
The white-noise integration count M is set to 100, the variance std to 0.1, and the number n of IMF components to 3. As table 6 shows, detection accuracy is highest when the window length cepstL is 8. The window length is tied to the resolution of the cepstrum, i.e. to the FFT length and the sampling frequency. In the cepstral domain the vocal-tract response (carrying formant information) and the glottal-pulse excitation (carrying pitch information) are separated from each other, and the optimal formant-bearing cepstral component is isolated by adjusting the window length: if cepstL is too long, part of the fundamental-frequency component is admitted; if it is too short, formant information is lost. The experiments confirm that speech classification works best with cepstL = 8.
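The separation that cepstL exploits can be shown on a synthetic voiced frame. The 100 Hz pitch, the single resonance near 1000 Hz, and the frame length below are illustrative assumptions, not values from the patent:

```python
import numpy as np
from scipy.signal import lfilter

fs = 8000
f0 = 100                                # pitch of the synthetic voiced frame
excitation = np.zeros(1024)
excitation[::fs // f0] = 1.0            # glottal impulse train every 80 samples
# single vocal-tract resonance near 1000 Hz (illustrative two-pole filter)
r, theta = 0.95, 2 * np.pi * 1000 / fs
frame = lfilter([1.0], [1.0, -2 * r * np.cos(theta), r * r], excitation)

log_spec = np.log(np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) + 1e-12)
cepstrum = np.fft.irfft(log_spec)

# The glottal (pitch) peak sits at quefrency fs/f0 = 80 samples, far above
# a lifter cutoff of cepstL = 8, so a short low-quefrency lifter keeps only
# the vocal-tract envelope and excludes the fundamental frequency.
pitch_quefrency = 20 + np.argmax(cepstrum[20:200])
```

The run locates the pitch peak around quefrency 80, an order of magnitude above cepstL = 8, which is why lengthening the window eventually admits the fundamental-frequency part, as the text notes.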
TABLE 6 Correspondence between window length and detection accuracy
The above experiments examined, in turn, the influence of the white-noise integration count M and variance std, the number n of IMF components, and the window length cepstL on the accuracy of automatic schizophrenic-speech classification; across these settings the average classification accuracy is about 93.0%. By discussing each factor in turn, fixing its optimal value, and carrying it into the discussion of the next factor, while respecting the limits on computational load, the final optimum is reached at M = 100, std = 0.1, n = 3 and cepstL = 8, where the average classification accuracy peaks at 98.8%.
TABLE 7 Detection accuracy for each vowel
TABLE 8 Detection accuracy for each parameter type
Tables 7 and 8 show that classification accuracy differs markedly across vowels (some vowels are detected significantly more accurately than /a/, /e/ and /i/) and between male and female speakers. These differences trace back to differences in the formant distributions of schizophrenic and healthy-control speech, which are in turn caused by differences in the shape and size of the vocal cavity during the articulation of each vowel, such as the degree of openness, lip rounding and tightening. Meanwhile, the accuracy obtained from the first formant frequency and amplitude alone is slightly lower than that of the other parameter types, indicating that the two groups differ more visibly in tongue advancement, lip rounding, soft-palate height and the concentration of spectral energy. Combining several parameters with significant differences therefore raises the overall detection accuracy of schizophrenic speech and achieves the best result.
This embodiment also provides an automatic schizophrenia voice detection system based on EHHT and CI.
Fig. 9 is a block diagram of the structure of the EHHT and CI-based automatic schizophrenia voice detection system in this embodiment.
As shown in fig. 9, the EHHT and CI-based automatic schizophrenia voice detection system 10 comprises a voice acquisition module 11, a preprocessing module 12, a feature extraction module 13, a feature dimension-reduction module 14, a pattern recognition module 15 and a control module 16.
The voice acquisition module 11 collects the subject's voice by the method of step S1 to obtain a voice signal; the preprocessing module 12 preprocesses this signal by the method of step S2; the feature extraction module 13 extracts features from the preprocessed signal by the method of step S3 to obtain the acoustic feature parameter set; the feature dimension-reduction module 14 reduces the dimension of the extracted set by the method of step S4; the pattern recognition module 15 classifies the speech by the method of step S6, with the SVM classifier trained by the method of step S5; and the control module 16 coordinates the operation of all modules.
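The module wiring of Fig. 9 can be sketched as a thin pipeline. The callable-per-module interface below is an assumption for illustration, not the patented design:

```python
from typing import Any, Callable

class SczVoiceDetectionSystem:
    """Sketch of the system of Fig. 9: modules 11-15 are injected as
    callables, and the control module (16) is modeled as the sequential
    driver that hands each module's output to the next."""

    def __init__(self,
                 acquire: Callable[[Any], Any],           # module 11 (step S1)
                 preprocess: Callable[[Any], Any],        # module 12 (step S2)
                 extract_features: Callable[[Any], Any],  # module 13 (step S3)
                 reduce_dims: Callable[[Any], Any],       # module 14 (step S4)
                 classify: Callable[[Any], Any]):         # module 15 (steps S5/S6)
        self.stages = [acquire, preprocess, extract_features,
                       reduce_dims, classify]

    def run(self, subject: Any) -> Any:
        data = subject
        for stage in self.stages:  # control module: coordinate modules in order
            data = stage(data)
        return data
```

Wiring the stages through one driver keeps each module independently replaceable, which mirrors the separation of modules 11-16 in the block diagram.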
Functions and effects of the embodiment
The method and system for automatic schizophrenia voice detection based on EHHT and CI of this embodiment start from an analysis of the clinical characteristics of schizophrenic speech and use EHHT and CI to improve the existing formant extraction algorithm. With the improved algorithm, an acoustic feature parameter set reflecting the tone-quality and emotional changes of schizophrenic speech is obtained from each voice sample, and an SVM classifier then performs the classification. This realizes automatic classification of the speech of schizophrenia patients versus a healthy control group, which helps to recognize and track the emotional changes of patients early and provides a new scientific basis for assisting psychiatrists in diagnosing schizophrenia and monitoring the treatment effect on patients.
Further, the experiments of this embodiment examined the influence of four factors on detection performance: the white-noise integration count and variance, the number of IMF components, and the window length. The results show that the proposed method reaches a detection accuracy of 98.8%, indicating that schizophrenia patients differ significantly from the healthy control group in the formant-based acoustic parameters characterizing tone quality, and that these parameters can provide a new objective, quantitative and efficient index for clinically assisted diagnosis of schizophrenia.
Furthermore, the proposed formant extraction algorithm based on EEMD and CI uses EEMD to avoid the mode-aliasing problem, screens out the formant-bearing components of schizophrenic speech according to the quotient of normalized IMF energy-entropy ratios to reconstruct the voice signal, and then extracts the formant parameters with the CI algorithm. This overcomes defects of the prior art such as merged formants and false formants, and achieves a good automatic classification result.
The above-described embodiments are merely illustrative of specific embodiments of the present invention, and the present invention is not limited to the description of the above-described embodiments.

Claims (8)

1. An automatic voice detection method for schizophrenia based on EHHT and CI, which is characterized by comprising the following steps:
step S1, collecting the voice of the subject to obtain a voice signal;
step S2, preprocessing the collected voice signal;
step S3, performing feature extraction on the preprocessed voice signal by using an improved formant extraction algorithm based on EHHT and CI to obtain an acoustic feature parameter set reflecting the tone-quality and emotional changes of the voice;
step S4, extracting parameters with significant differences from the acoustic feature parameter set by using a hypothesis testing method, and combining the parameters into the acoustic feature parameter set with reduced dimensions;
step S5, forming training samples from the dimension-reduced acoustic feature parameter set and the label categories verified and confirmed by a psychiatrist, and training an SVM classifier;
and step S6, carrying out classification detection on the voices by using the trained SVM classifier, thereby realizing automatic classification detection on the voices of the schizophrenia patients and the voices of the healthy control groups.
2. The automatic voice detection method for schizophrenia based on EHHT and CI according to claim 1, wherein:
wherein, in step S2, the preprocessing includes:
DC removal, normalization and pre-emphasis are applied to the vowel signals within the voice signal, boosting the high-frequency components of the signal and reducing the interference of the subject's lip radiation with the extraction of the formant components.
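A minimal sketch of this preprocessing chain, assuming a conventional pre-emphasis coefficient of 0.97 (the claim does not fix one):

```python
import numpy as np

def preprocess(x, alpha=0.97):
    """DC removal, peak normalization, then first-order pre-emphasis
    y[n] = x[n] - alpha * x[n-1] to boost high frequencies (alpha = 0.97
    is a conventional value, assumed here)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove DC offset
    x = x / (np.max(np.abs(x)) + 1e-12)   # normalize to [-1, 1]
    return np.append(x[0], x[1:] - alpha * x[:-1])
```

The first-order difference acts as a gentle high-pass: slowly varying content is attenuated while rapid alternations are amplified by up to (1 + alpha).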
3. The automatic voice detection method for schizophrenia based on EHHT and CI according to claim 1, wherein:
wherein, step S3 includes the following substeps:
step S3-1, performing empirical mode decomposition on the voice signal with random Gaussian white noise integrated multiple times, and averaging the multiple decomposition results to obtain the intrinsic mode functions;
step S3-2, calculating the quotient of normalized band energy-entropy ratios of the Hilbert marginal spectra corresponding to the IMF components of the intrinsic mode functions, and screening out the IMF components containing the formants to reconstruct the voice signal;
and step S3-3, after framing and windowing the reconstructed voice signal, extracting a plurality of formant characteristic parameters of a plurality of formants by cepstrum interpolation to obtain the acoustic feature parameter set.
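Step S3-3 can be sketched as follows; the low-quefrency lifter shape and the parabolic peak refinement are one common reading of "cepstrum interpolation" and are assumptions here, not the claimed implementation:

```python
import numpy as np

def formants_by_cepstrum(frame, fs, cepstL=8):
    """Estimate formant frequencies and amplitudes from one frame:
    keep the low-quefrency part of the real cepstrum (the vocal-tract
    envelope), transform back to a smoothed log spectrum, and refine
    each envelope peak with parabolic interpolation."""
    n = len(frame)
    log_spec = np.log(np.abs(np.fft.rfft(frame * np.hamming(n))) + 1e-12)
    ceps = np.fft.irfft(log_spec)
    lifter = np.zeros(n)
    lifter[:cepstL] = 1.0              # symmetric low-quefrency lifter
    lifter[n - cepstL + 1:] = 1.0
    envelope = np.fft.rfft(ceps * lifter).real

    formants = []
    for k in range(1, len(envelope) - 1):
        a, b, c = envelope[k - 1], envelope[k], envelope[k + 1]
        if b > a and b > c:                            # local maximum
            delta = 0.5 * (a - c) / (a - 2 * b + c)    # parabolic offset
            freq = (k + delta) * fs / n
            amp = b - 0.25 * (a - c) * delta
            formants.append((freq, amp))
    return formants
```

The parabolic step is what lets the formant frequency fall between FFT bins, rather than being quantized to the bin spacing fs/n.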
4. The automatic voice detection method for schizophrenia based on EHHT and CI according to claim 3, wherein:
in step S3-1, the empirical mode decomposition is ensemble empirical mode decomposition.
5. The automatic voice detection method for schizophrenia based on EHHT and CI according to claim 3, wherein:
in step S3-3, the number of peaks, mean, variance, interquartile range, median, mode, range, skewness and kurtosis corresponding to the frequency, bandwidth and amplitude of the first three formants are extracted as the formant characteristic parameter set.
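The nine statistics of this claim can be computed per parameter track roughly as follows; taking the mode of rounded values is an assumption, since the mode of a continuous-valued track is otherwise ill-defined:

```python
import numpy as np
from scipy import stats

def track_statistics(track):
    """Nine statistics of one formant-parameter track (e.g. the
    frame-wise F1 frequency): peak count, mean, variance, interquartile
    range, median, mode (of rounded values, an assumption for continuous
    data), range, skewness and kurtosis."""
    x = np.asarray(track, dtype=float)
    vals, counts = np.unique(np.round(x), return_counts=True)
    return {
        "n_peaks": int(np.sum((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))),
        "mean": float(x.mean()),
        "variance": float(x.var(ddof=1)),
        "iqr": float(stats.iqr(x)),
        "median": float(np.median(x)),
        "mode": float(vals[np.argmax(counts)]),
        "range": float(np.ptp(x)),
        "skewness": float(stats.skew(x)),
        "kurtosis": float(stats.kurtosis(x)),
    }
```

With three formants and three parameters (frequency, bandwidth, amplitude) each, this would yield 3 × 3 × 9 = 81 candidate features per utterance before the hypothesis-testing dimension reduction of step S4 (a count implied by, not stated in, the claim).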
6. The automatic voice detection method for schizophrenia based on EHHT and CI according to claim 3, wherein:
in step S3-1, the integration frequency of the random white gaussian noise is 100 times, the variance of the random white gaussian noise is 0.1, and the number of IMF components is 3;
in step S3-3, the window length is 8.
7. The automatic voice detection method for schizophrenia based on EHHT and CI according to claim 1, wherein:
in step S5, ten-fold cross validation is adopted: the training samples are divided into ten parts; in each round, 90% are used for training and the remaining 10% for testing; and the average over the ten rounds is taken as the estimate of the detection accuracy of the SVM classifier.
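The ten-fold protocol maps directly onto scikit-learn; the synthetic two-class features below merely stand in for the real dimension-reduced acoustic parameter set:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the reduced acoustic feature set:
# 50 "patient" and 50 "control" samples in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 5)),
               rng.normal(2.0, 1.0, (50, 5))])
y = np.array([0] * 50 + [1] * 50)

# Ten folds: each round trains on 90% and tests on the held-out 10%;
# the mean of the ten fold accuracies estimates classifier accuracy.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=cv)
mean_accuracy = scores.mean()
```

StratifiedKFold keeps the patient/control proportion the same in every fold, which matters when the two groups are not equally sized.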
8. An automatic voice detection system for schizophrenia based on EHHT and CI, comprising:
the voice acquisition module is used for acquiring the voice of a subject to obtain a voice signal;
the preprocessing module is used for preprocessing the acquired voice signals;
the feature extraction module extracts features from the preprocessed voice signal by using an improved formant extraction algorithm based on EHHT and CI to obtain an acoustic feature parameter set reflecting the tone-quality and emotional changes of the voice;
the feature dimension-reduction module extracts parameters with significant differences from the acoustic feature parameter set by hypothesis testing and combines them into the dimension-reduced acoustic feature parameter set;
the pattern recognition module classifies the voice using an SVM classifier trained on the dimension-reduced acoustic feature parameter set and the label categories verified and confirmed by a psychiatrist, thereby realizing automatic classification detection of the speech of schizophrenia patients versus the healthy control group; and
and the control module is used for carrying out coordination management on the work of each module.
CN202210100356.7A 2022-01-27 2022-01-27 Automatic schizophrenia voice detection method and system based on EHHT and CI Pending CN114400025A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210100356.7A CN114400025A (en) 2022-01-27 2022-01-27 Automatic schizophrenia voice detection method and system based on EHHT and CI

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210100356.7A CN114400025A (en) 2022-01-27 2022-01-27 Automatic schizophrenia voice detection method and system based on EHHT and CI

Publications (1)

Publication Number Publication Date
CN114400025A true CN114400025A (en) 2022-04-26

Family

ID=81232589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210100356.7A Pending CN114400025A (en) 2022-01-27 2022-01-27 Automatic schizophrenia voice detection method and system based on EHHT and CI

Country Status (1)

Country Link
CN (1) CN114400025A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117373492A (en) * 2023-12-08 2024-01-09 北京回龙观医院(北京心理危机研究与干预中心) Deep learning-based schizophrenia voice detection method and system
CN117373492B (en) * 2023-12-08 2024-02-23 北京回龙观医院(北京心理危机研究与干预中心) Deep learning-based schizophrenia voice detection method and system

Similar Documents

Publication Publication Date Title
US9936914B2 (en) Phonologically-based biomarkers for major depressive disorder
Ozdas et al. Investigation of vocal jitter and glottal flow spectrum as possible cues for depression and near-term suicidal risk
Montaña et al. A Diadochokinesis-based expert system considering articulatory features of plosive consonants for early detection of Parkinson’s disease
Cummins et al. A review of depression and suicide risk assessment using speech analysis
Khan et al. Classification of speech intelligibility in Parkinson's disease
Di Liberto et al. Low-frequency cortical entrainment to speech reflects phoneme-level processing
US10278637B2 (en) Accurate analysis tool and method for the quantitative acoustic assessment of infant cry
Tsanas Accurate telemonitoring of Parkinson’s disease symptom severity using nonlinear speech signal processing and statistical machine learning
US7139699B2 (en) Method for analysis of vocal jitter for near-term suicidal risk assessment
Bandini et al. Automatic identification of dysprosody in idiopathic Parkinson's disease
Kato et al. Easy screening for mild Alzheimer's disease and mild cognitive impairment from elderly speech
Ambrosini et al. Automatic speech analysis to early detect functional cognitive decline in elderly population
Seneviratne et al. Extended Study on the Use of Vocal Tract Variables to Quantify Neuromotor Coordination in Depression.
Kato et al. Detection of mild Alzheimer's disease and mild cognitive impairment from elderly speech: Binary discrimination using logistic regression
Almaghrabi et al. Bio-acoustic features of depression: A review
Nishikawa et al. Machine learning model for discrimination of mild dementia patients using acoustic features
Ozdas et al. Analysis of fundamental frequency for near term suicidal risk assessment
CN114400025A (en) Automatic schizophrenia voice detection method and system based on EHHT and CI
Cordeiro et al. Spectral envelope first peak and periodic component in pathological voices: A spectral analysis
Keskinpala et al. Screening for high risk suicidal states using mel-cepstral coefficients and energy in frequency bands
Elisha et al. Automatic detection of obstructive sleep apnea using speech signal analysis
Elisha et al. Detection of obstructive sleep apnea using speech signal analysis
Sanadi et al. Acoustic analysis of speech based on power spectral density features in detecting suicidal risk among female patients
Akkaralaertsest et al. Classification of depressed speech samples with spectral energy ratios as depression indicator
Bendale Voice Based Disease Identification System

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination