CN113555004A - Voice depression state identification method based on feature selection and transfer learning - Google Patents
- Publication number: CN113555004A
- Application number: CN202110801507.7A
- Authority: CN (China)
- Prior art keywords: speech, voice, feature, transfer learning, depression
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/02 — Speech recognition; feature extraction for speech recognition; selection of recognition unit
- G10L15/063 — Speech recognition; creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/08 — Speech recognition; speech classification or search
- G10L25/63 — Speech or voice analysis techniques specially adapted for estimating an emotional state
Abstract
The invention provides a speech depression state recognition method based on feature selection and transfer learning, fusing the Lasso feature-selection method with the transfer learning method CORAL, aimed at two problems that arise when modeling depression from speech: high feature dimensionality, and a feature distribution influenced by individual differences among subjects beyond their depression level. The method has two advantages: 1. Lasso filters redundant information out of the features and retains the effective features, further improving recognition accuracy while improving model efficiency; 2. without leaking depression-label information, the transfer learning method CORAL draws the feature distributions of the training and test sets together, reducing the influence of factors other than depression level on the feature distribution. Combining the two methods further improves the accuracy and stability of depression screening.
Description
Technical Field
The invention belongs to the field of voice signal processing, and particularly relates to a voice depression state identification method based on feature selection and transfer learning.
Background
Depression is a typical and common psychogenic disease worldwide, covering all age groups; its diagnosis depends on the clinical experience of doctors and on scales filled in by patients, so the whole process is time-consuming and the diagnostic procedure inefficient. Speech is an important external expression of emotion, and its unique advantages, such as few usage restrictions, low equipment cost, and a contactless, non-invasive, and convenient acquisition procedure, make it a key direction for researchers working toward automatic depression recognition.
At present there is no specific feature with clear theoretical support for depression identification; at the feature-design level, the aim is to extract as much depression-related information from speech as possible, generally by using high-dimensional features from multiple domains and comparing the classification results of different feature combinations. However, when too many features are used, the model becomes overly complex, recognition takes too long, and diagnostic efficiency drops.
Speaking is a very complex process, and many studies have explored the differences in brain structure and function of patients with depression, as well as the potential factors that affect speech besides depression, mainly: gender, age, emotional state, language style, and educational and occupational background. These factors further increase the differences in feature distribution across subjects' speech signals and raise the difficulty of model recognition.
In addition, machine learning on speech signals usually assumes, when splitting the data into training and test sets, that the two are independently and identically distributed. However, the feature distribution of a subject's speech signal is affected not only by the depression level but also by individual differences such as age, gender, and occupation, so this assumption does not hold and the model's performance degrades.
Disclosure of Invention
In order to solve the problems, the invention provides a speech depression state identification method based on feature selection and transfer learning, which adopts the following technical scheme:
The invention provides a speech depression state identification method based on feature selection and transfer learning, characterized by comprising the following steps: step S1, collecting speech with a recording device to obtain speech samples; step S2, preprocessing the speech samples; step S3, extracting the speech features from the speech samples, the speech features at least comprising chroma features; step S4, calculating statistics of the speech features and taking the statistics as the feature set; step S5, performing feature selection on the feature set with a Lasso model to obtain an effective feature set; step S6, based on the effective feature set, performing transfer learning with the CORAL method to obtain the transferred training-set features; and step S7, classifying the speech samples based on the training-set features and outputting the classification result.
The speech depression state identification method based on feature selection and transfer learning provided by the invention can also have the technical characteristics, wherein the speech characteristics further comprise acoustic characteristics, frequency domain characteristics, pause characteristics and Mel frequency cepstrum coefficients.
The speech depression state identification method based on feature selection and transfer learning provided by the invention can also have the technical characteristic that the statistics comprise the maximum value, the minimum value, the range, the mean, the median, the intercept term of a linear regression, the independent-variable coefficient of the linear regression, the R2 of the linear regression, the standard deviation, the skewness, the kurtosis, and the coefficient of variation.
The speech depression state identification method based on feature selection and transfer learning provided by the invention can also have the technical characteristic that the classifier model used in classification is XGBoost.
The speech depression state identification method based on feature selection and transfer learning provided by the invention can also have the technical characteristics that the preprocessing comprises the removal of noise fragments, the removal of mute fragments, high-pass filtering and down-sampling.
The invention provides a speech depression state recognition device based on feature selection and transfer learning, which is characterized by comprising the following components: a voice collecting part for collecting the voice sample; a preprocessing section for preprocessing the voice sample; a feature extraction unit configured to extract the speech feature of the speech sample; the characteristic processing part is used for processing the voice characteristics to obtain the effective characteristic set; the transfer learning part is used for carrying out transfer learning on the effective characteristic set to obtain the characteristics of the training set after the transfer; a classification section for classifying the voice sample.
Action and Effect of the invention
According to the speech depression state recognition method based on feature selection and transfer learning, after the collected speech samples are preprocessed, the speech features are extracted and 12 statistics of the speech features are calculated as the feature set; the feature set then undergoes feature selection and transfer learning, yielding the training-set features used to classify the speech samples. Because the Lasso model is used for feature selection, redundant information in the features is filtered out and effective features are retained, so the method achieves better recognition accuracy with fewer features and lower model complexity, solving the technical problem of high feature dimensionality when modeling from speech while also improving model efficiency.
On the other hand, because the feature-based unsupervised transfer learning method CORAL is used, the feature distributions of the training and test sets can be drawn together by aligning their second-order covariance matrices without revealing depression-label information, and the influence of factors other than depression level on the feature distribution is reduced; this solves the technical problem that, when modeling from speech, the feature distribution is affected by individual differences among subjects beyond their depression level. Combining the two methods further improves the accuracy and stability of depression screening.
Drawings
FIG. 1 is a flow chart of the speech depression state recognition method based on feature selection and transfer learning according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the speech depression state recognition apparatus based on feature selection and transfer learning according to an embodiment of the present invention.
Detailed Description
In order to make the technical means, creative features, objectives, and effects of the present invention easy to understand, the speech depression state identification method based on feature selection and transfer learning is described in detail below with reference to the embodiments and the accompanying drawings.
<Example 1>
Fig. 1 is a flowchart of the speech depression state recognition method based on feature selection and transfer learning according to the embodiment of the present invention.
As shown in Fig. 1, the speech depression state recognition method based on feature selection and transfer learning of this embodiment comprises the following steps:
and step S1, acquiring voice information, acquiring voice by using a recording device, designing questions of different speech task types, answering the test according to prompts on a screen, acquiring the complete speaking process of the test by using the recording device, and recording the complete speaking process as a wav file, wherein the file is a voice sample.
Step S2, speech signal preprocessing: the collected speech samples are preprocessed; obvious noise segments, such as coughs and the sound of dropped objects, are screened out manually, and high-pass filtering, down-sampling, and silent-segment detection and removal are performed.
In this Embodiment 1, a second-order Butterworth filter with a cutoff frequency of 137.8 Hz is used for high-pass filtering to reduce the interference of low-frequency noise with the effective speech information; the speech signal is uniformly down-sampled to 16000 Hz with the toolkit librosa; and the toolkit pyAudioAnalysis is used to detect voiced and silent segments and remove the silent segments, which carry no speech. Short-time Fourier transform: window length 0.1 s, sliding step 0.05 s, Hamming window, NFFT = 1024.
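The following Python sketch applies this preprocessing chain with the same parameters, assuming scipy and librosa; the function name and the energy-based silence threshold are our illustrative choices (the patent itself uses pyAudioAnalysis for silence detection):

```python
import librosa
import numpy as np
from scipy.signal import butter, sosfilt

def preprocess(wav_path, target_sr=16000, cutoff_hz=137.8):
    # Load the recording at its native sampling rate.
    y, sr = librosa.load(wav_path, sr=None)
    # Second-order Butterworth high-pass filter at 137.8 Hz
    # to suppress low-frequency noise.
    sos = butter(2, cutoff_hz, btype="highpass", fs=sr, output="sos")
    y = sosfilt(sos, y)
    # Uniformly down-sample to 16000 Hz with librosa.
    y = librosa.resample(y, orig_sr=sr, target_sr=target_sr)
    # Detect and drop silent segments; a 30 dB energy threshold stands in
    # here for the patent's pyAudioAnalysis-based detection (an assumption).
    intervals = librosa.effects.split(y, top_db=30)
    y = np.concatenate([y[s:e] for s, e in intervals])
    return y, target_sr
```

The short-time analysis that follows would then frame this signal with a 0.1 s Hamming window and a 0.05 s hop, per the parameters above.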
Step S3, extracting the speech features from the speech sample, including acoustic features, frequency-domain features, pause features, Mel-frequency cepstral coefficients, and chroma features; see Table 1.
TABLE 1 Summary of speech features
As shown in Table 1, there are 6 acoustic features, related to fundamental frequency, energy, and zero-crossing rate. The energy features comprise the sound intensity and the sound-intensity envelope; the zero-crossing-rate-related features comprise the zero-crossing rate, the zero-crossing amplitude (the maximum signal amplitude between two zero crossings), and the zero-crossing interval (the time interval between two zero crossings).
There are 5 frequency-domain features: spectral centroid, spectral entropy, spectral spread, spectral roll-off point, and spectral flux.
There are 13 Mel-frequency cepstral coefficients in total, a common feature in speech signal processing.
There are 12 chroma features in total, a general term covering chromagrams and chroma vectors; they represent the energy in each of the 12 pitch classes per unit time, with the energy of the same pitch class across different octaves accumulated. Chroma features are widely used in the music field, and this method introduces them into depression identification.
There are 3 pause features: the pause count, the pause-time ratio, and the average pause duration.
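These frame-level features can be computed with librosa, as in the sketch below; the function name, the pitch-search range for the fundamental frequency, and the silence threshold behind the pause features are illustrative assumptions rather than values fixed by the patent:

```python
import librosa
import numpy as np

def extract_features(y, sr=16000, hop=800):
    feats = {}
    # 13 Mel-frequency cepstral coefficients per frame.
    feats["mfcc"] = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop)
    # 12 chroma features: energy per pitch class, octaves accumulated.
    feats["chroma"] = librosa.feature.chroma_stft(y=y, sr=sr, hop_length=hop)
    # Frequency-domain features (two of the five shown here).
    feats["centroid"] = librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=hop)
    feats["rolloff"] = librosa.feature.spectral_rolloff(y=y, sr=sr, hop_length=hop)
    # Acoustic features: fundamental frequency, energy, zero-crossing rate.
    feats["f0"] = librosa.yin(y, fmin=50, fmax=500, sr=sr, hop_length=hop)
    feats["energy"] = librosa.feature.rms(y=y, hop_length=hop)
    feats["zcr"] = librosa.feature.zero_crossing_rate(y, hop_length=hop)
    # Pause features from the silent intervals (assumed 30 dB threshold):
    # pause count, pause-time ratio, average pause duration in seconds.
    voiced = librosa.effects.split(y, top_db=30)
    n_pauses = max(len(voiced) - 1, 0)
    voiced_len = sum(int(e - s) for s, e in voiced)
    pause_len = len(y) - voiced_len
    feats["pauses"] = (n_pauses,
                       pause_len / len(y),
                       pause_len / n_pauses / sr if n_pauses else 0.0)
    return feats
```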
Step S4, calculating feature statistics: 12 statistics of each speech feature are calculated, and these statistics form the feature set. The 12 statistics are: maximum, minimum, range, mean, median, the intercept term of a linear regression (with time as the independent variable), the independent-variable coefficient of that linear regression, the R2 of that linear regression, standard deviation, skewness, kurtosis, and coefficient of variation.
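A minimal sketch of these 12 statistics with numpy and scipy, using the frame index as the time variable of the regression (the function name is ours):

```python
import numpy as np
from scipy import stats

def feature_statistics(x):
    """x: 1-D array holding one feature over frames; returns the 12 statistics."""
    t = np.arange(len(x))
    reg = stats.linregress(t, x)  # linear regression with time as the argument
    mean, std = np.mean(x), np.std(x, ddof=1)
    return {
        "max": np.max(x), "min": np.min(x), "range": np.ptp(x),
        "mean": mean, "median": np.median(x),
        "lr_intercept": reg.intercept,   # intercept term of the regression
        "lr_slope": reg.slope,           # independent-variable coefficient
        "lr_r2": reg.rvalue ** 2,        # R2 of the regression
        "std": std,
        "skewness": stats.skew(x),
        "kurtosis": stats.kurtosis(x),
        "cv": std / mean if mean != 0 else np.nan,  # coefficient of variation
    }
```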
Step S5, feature selection: feature selection is performed on the feature set using a Lasso model, and the coefficients of non-significant variables are compressed, yielding the effective feature set.
Lasso selects feature variables through a penalty function and extracts effective features by compressing coefficients. Consider a general linear regression model $Y = X\beta + \varepsilon$ with response variable $Y = (y_1, y_2, \ldots, y_n)^T$ and independent variables $X = (X^{(1)}, X^{(2)}, \ldots, X^{(m)})$, where each $X^{(i)}$ is an $n \times 1$ vector and the regression coefficients are $\beta = (\beta_1, \beta_2, \ldots, \beta_m)^T$. Starting from ordinary least-squares estimation, the regression coefficients are compressed by adding a penalty function; some coefficients are compressed to 0, the features whose coefficients become 0 are discarded, and the remaining features are the retained effective features. The Lasso estimate is:

$$\hat{\beta}^{\mathrm{lasso}} = \arg\min_{\beta}\left\{\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{m} x_{ij}\beta_j\Big)^2 + \lambda\sum_{j=1}^{m}\lvert\beta_j\rvert\right\}$$
the method adopts Lasso-Logistic regression for classification tasks, compares different lambda parameters on the basis of the fixed parameters of a Logistic regression model, and determines the hyper-parameters according to the optimal accuracy. The penalty coefficient λ is determined by adjusting parameters through multi-round experimental cross validation, and it is tried to set the penalty coefficient λ to 1, 0.1, 0.01, 0.005, 0.001, 0.0005, 0.0001, 0.00005, and finally 0.005.
Step S6, transfer learning: based on the effective feature set, transfer learning is performed with the domain-adaptation method CORAL, which draws the feature distributions of the test set and the training set together by aligning their second-order covariance matrices, yielding the transferred training-set features.
To reduce the difference in feature distribution between the training and test sets caused by individual factors other than depression level, and without leaking depression-label information, the invention introduces a feature-based unsupervised transfer learning method: CORrelation ALignment (CORAL), which aligns the second-order covariance matrices to draw the feature distributions of the training and test sets together. After white-noise information is added to the covariance matrices, a linear transformation is applied; CORAL needs to compute only two things: (1) the covariance matrices of the source-domain and target-domain features; (2) the linear transformation using the white-noise-augmented matrices. The specific steps of the transfer algorithm are shown in Table 2.
TABLE 2 CORAL Algorithm steps
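A minimal sketch of this CORAL transform, assuming the standard formulation: whiten the source (training) features with their noise-augmented covariance, then re-color them with the target (test) covariance:

```python
import numpy as np
from scipy import linalg

def coral(Xs, Xt):
    """Align source (training) features Xs to target (test) features Xt."""
    d = Xs.shape[1]
    # Covariance matrices with identity (white noise) added.
    Cs = np.cov(Xs, rowvar=False) + np.eye(d)
    Ct = np.cov(Xt, rowvar=False) + np.eye(d)
    # Linear transformation: Xs * Cs^(-1/2) * Ct^(1/2).
    Xs_aligned = Xs @ linalg.fractional_matrix_power(Cs, -0.5)
    Xs_aligned = Xs_aligned @ linalg.fractional_matrix_power(Ct, 0.5)
    return np.real(Xs_aligned)
```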
Step S7, classification: based on the transferred training-set features, an XGBoost classifier model classifies the speech samples and outputs the classification results.
XGBoost is a tree-boosting model based on the Boosting framework; it reduces the model's recognition error and variance by integrating multiple CART decision trees into one strong classifier. Built on gradient-boosted trees, XGBoost at each round learns the residual between the previous round's prediction and the target; the score obtained at each node is computed per sample, and the sum of all scores is the sample's classification result. Let the model trained in the t-th iteration be $f_t(x)$; then

$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$$

where $\hat{y}_i^{(t)}$ is the model's classification result for the i-th sample after t iterations, $x_i$ denotes the i-th sample, $\hat{y}_i^{(t-1)}$ is the prediction of the first t−1 trees, and $f_t$ is the t-th tree. The objective function is set to

$$Obj^{(t)} = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t)}\big) + \sum_{k=1}^{t} \Omega(f_k)$$

where $Obj^{(t)}$ is the objective value after t iterations, $l(y_i, \hat{y}_i^{(t)})$ is the training error of the i-th sample, and $\sum_{k=1}^{t}\Omega(f_k)$, the sum of the model complexities of the t trees, serves as the regularization term in the objective. The model complexity Ω is determined by the total number of decision-tree leaf nodes T and the leaf weight coefficients w, written as

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda\,\lVert w\rVert^2$$

where $\lVert w\rVert^2$ is the L2 norm of the weight coefficients, γ is the coefficient for splitting a leaf node, used to control the total number of nodes, and λ is the regularization coefficient.
During training, termination is judged from the objective function above. In implementation, a greedy algorithm traverses all features as candidate split points; a split is kept if it improves the objective relative to not splitting, and splitting stops once the weight coefficients or the tree depth exceed their thresholds, which avoids overfitting the model.
After training is completed, the model can classify a speech sample, judging whether it belongs to a depressed subject or a normal subject, and finally outputs the classification result.
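A sketch of this classification step with the xgboost package; the hyper-parameter values shown are illustrative assumptions, as the patent does not specify them:

```python
from xgboost import XGBClassifier

def train_and_classify(X_train_aligned, y_train, X_test):
    clf = XGBClassifier(
        n_estimators=100,   # number of CART trees (assumed)
        max_depth=4,        # depth threshold against overfitting (assumed)
        learning_rate=0.1,  # shrinkage per boosting round (assumed)
        reg_lambda=1.0,     # L2 regularization coefficient lambda
        gamma=0.0,          # leaf-split coefficient gamma
        eval_metric="logloss",
    )
    clf.fit(X_train_aligned, y_train)
    # Convention assumed here: 1 = depressed subject, 0 = normal subject.
    return clf.predict(X_test), clf.predict_proba(X_test)[:, 1]
```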
This embodiment also provides three evaluation indexes for the speech depression-state classification results: accuracy, the F1 score, and the AUC value, defined as follows.

Accuracy is the proportion of correctly classified samples:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$

The F1 score is the harmonic mean of recall and precision, with range [0, 1]:

$$F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad \text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}$$

The AUC value is the area enclosed by the receiver operating characteristic (ROC) curve and the coordinate axes. The abscissa of the ROC curve is the false positive rate $\frac{FP}{FP + TN}$ and the ordinate is the true positive rate $\frac{TP}{TP + FN}$; the curve lies above y = x, so the AUC ranges over [0.5, 1].
The definitions of TP, FP, FN, and TN are shown in Table 3.
TABLE 3 Confusion matrix of speech depression-state classification results

| | Audio from a depressed subject | Audio from a normal subject |
|---|---|---|
| Judged as audio from a depressed subject | True Positive (TP) | False Positive (FP) |
| Judged as audio from a normal subject | False Negative (FN) | True Negative (TN) |
The values of all three evaluation indexes are positively correlated with classification performance: the larger the value, the better the classification result.
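These three indexes can be computed with scikit-learn, as in the short sketch below (function and variable names are ours):

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def evaluate(y_true, y_pred, y_score):
    """y_pred: hard labels; y_score: predicted probability of depression."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),  # (TP+TN)/(TP+FP+FN+TN)
        "f1": f1_score(y_true, y_pred),              # harmonic mean of P and R
        "auc": roc_auc_score(y_true, y_score),       # area under the ROC curve
    }
```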
Thus, the speech depression-state identification method based on feature selection and transfer learning identifies the depression state of a subject's speech segment, produces the segment's classification result, and provides an evaluation of that result.
<Example 2>
As described above, Embodiment 1 provides a speech depression-state recognition method based on feature selection and transfer learning, mainly comprising steps S1 to S7. In practical application, the steps of the method of Embodiment 1 can be configured as corresponding computer modules, namely a speech acquisition section, a preprocessing section, a feature extraction section, a feature processing section, a transfer learning section, and a classification section, which together form a device for classifying and identifying the speech depression state; a speech depression-state recognition device based on feature selection and transfer learning can therefore also be provided.
Fig. 2 is a schematic diagram of a speech depression state recognition apparatus based on feature selection and transfer learning according to an embodiment of the present invention.
As shown in Fig. 2, a speech depression-state recognition apparatus 100 based on feature selection and transfer learning (hereinafter simply the recognition apparatus) comprises a speech acquisition section 11, a preprocessing section 12, a feature extraction section 13, a feature processing section 14, a transfer learning section 15, and a classification section 16. The speech depression-state recognition device 100 recognizes a target speech segment and obtains the recognition result, i.e., whether the segment belongs to a depressed subject or a normal subject.
The speech acquisition unit 11 obtains the speech sample by recording the subject's speech segment, using the acquisition method of step S1.
The preprocessing section 12 is for preprocessing the voice sample by the preprocessing method of step S2.
The feature extraction unit 13 is configured to extract a speech feature in the speech sample, and employs the speech feature extraction method of step S3.
The feature processing unit 14 is configured to process the extracted speech features to obtain an active feature set, and to adopt the feature processing method of steps S4 to S5.
The transfer learning unit 15 performs transfer learning to obtain the transferred training-set features, using the transfer-learning method of step S6.
The classification unit 16 classifies the speech segment and outputs the result, and adopts the classification method of step S7.
The execution process of each section is consistent with the corresponding step of the speech depression-state recognition method based on feature selection and transfer learning described above, and is not repeated here.
Effects of the Embodiments
According to the speech depression-state recognition method of the above embodiments, the collected speech samples are preprocessed, the speech features are extracted, and 12 statistics of the speech features are calculated as the feature set; the feature set then undergoes feature selection and transfer learning to obtain the training-set features used to classify the speech samples. Because the Lasso model performs the feature selection, redundant information in the features is filtered out and the effective features are retained, so better recognition accuracy is achieved with fewer features and lower model complexity, solving the technical problem of high feature dimensionality when modeling from speech while also improving model efficiency.
On the other hand, because the embodiments use the feature-based unsupervised transfer learning method CORAL, the feature distributions of the training and test sets can be drawn together by aligning their second-order covariance matrices without revealing depression-label information, and the influence of factors other than depression level on the feature distribution is reduced; this solves the technical problem that, when modeling from speech, the feature distribution is affected by individual differences among subjects beyond their depression level. Combining the two methods further improves the accuracy and stability of depression screening.
The above-described embodiments are merely illustrative of specific embodiments of the present invention, and the present invention is not limited to the description of the above-described embodiments.
For example, in the embodiment, the penalty coefficient λ of the Lasso model is set to 0.005, and in the present invention, the penalty coefficient λ may also be adjusted to other suitable values, so that the technical effects of the present invention can also be achieved.
In the embodiments, the classifier model used for classification is XGBoost; in the present invention, other classifier models may also be used for classification, for example LightGBM, and the technical effects of the present invention can still be achieved.
Claims (6)
1. A speech depression state identification method based on feature selection and transfer learning is used for identifying speech depression states and is characterized by comprising the following steps:
step S1, collecting voice by using a recording device to obtain a voice sample;
step S2, preprocessing the voice sample;
step S3, extracting the speech features from the speech sample, wherein the speech features at least comprise chroma features;
step S4, calculating statistic of the voice features, and taking the statistic as a feature set;
step S5, using a Lasso model to perform feature selection on the feature set to obtain an effective feature set;
step S6, based on the effective feature set, using a CORAL method to perform transfer learning to obtain the characteristics of the training set after transfer;
and step S7, classifying the voice samples based on the training set characteristics, and outputting a classification result.
2. The speech depression state recognition method based on feature selection and transfer learning according to claim 1, characterized in that:
wherein the speech features further include acoustic features, frequency domain features, pause features, and mel-frequency cepstrum coefficients.
3. The speech depression state recognition method based on feature selection and transfer learning according to claim 1, characterized in that:
wherein the statistics include a maximum value, a minimum value, a range, a mean, a median, an intercept term of the linear regression, an independent variable coefficient of the linear regression, R2 of the linear regression, a standard deviation, a skewness, a kurtosis, and a coefficient of variation.
4. The speech depression state recognition method based on feature selection and transfer learning according to claim 1, characterized in that:
wherein the classifier model used for classification is XGBoost.
5. The speech depression state recognition method based on feature selection and transfer learning according to claim 1, characterized in that:
wherein the preprocessing includes removal of noise segments, removal of silence segments, high-pass filtering, and down-sampling.
6. A speech depression state recognition apparatus based on feature selection and transfer learning, comprising:
a voice collecting part for collecting the voice sample;
a preprocessing section for preprocessing the voice sample;
a feature extraction unit configured to extract the speech feature of the speech sample;
the characteristic processing part is used for processing the voice characteristics to obtain the effective characteristic set;
the transfer learning part is used for carrying out transfer learning on the effective characteristic set to obtain the characteristics of the training set after the transfer;
a classification section for classifying the voice sample.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110801507.7A CN113555004A (en) | 2021-07-15 | 2021-07-15 | Voice depression state identification method based on feature selection and transfer learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110801507.7A CN113555004A (en) | 2021-07-15 | 2021-07-15 | Voice depression state identification method based on feature selection and transfer learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113555004A (en) | 2021-10-26
Family
ID=78131917
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110801507.7A Pending CN113555004A (en) | 2021-07-15 | 2021-07-15 | Voice depression state identification method based on feature selection and transfer learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113555004A (en) |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017021267A (en) * | 2015-07-14 | 2017-01-26 | 日本電信電話株式会社 | Wiener filter design device, sound enhancement device, acoustic feature amount selection device, and method and program therefor |
CN106725532A (en) * | 2016-12-13 | 2017-05-31 | 兰州大学 | Depression automatic evaluation system and method based on phonetic feature and machine learning |
CN107221344A (en) * | 2017-04-07 | 2017-09-29 | 南京邮电大学 | A kind of speech emotional moving method |
CN107657964A (en) * | 2017-08-15 | 2018-02-02 | 西北大学 | Depression aided detection method and grader based on acoustic feature and sparse mathematics |
CN108830645A (en) * | 2018-05-31 | 2018-11-16 | 厦门快商通信息技术有限公司 | A kind of visitor's attrition prediction method and system |
US20190385711A1 (en) * | 2018-06-19 | 2019-12-19 | Ellipsis Health, Inc. | Systems and methods for mental health assessment |
CN110808072A (en) * | 2019-11-08 | 2020-02-18 | 广州科慧健远医疗科技有限公司 | Method for evaluating dysarthria of children based on optimized acoustic parameters of data mining technology |
CN110956310A (en) * | 2019-11-14 | 2020-04-03 | 佛山科学技术学院 | Fish feed feeding amount prediction method and system based on feature selection and support vector |
CN110991535A (en) * | 2019-12-04 | 2020-04-10 | 中山大学 | pCR prediction method based on multi-type medical data |
CN111210846A (en) * | 2020-01-07 | 2020-05-29 | 重庆大学 | Parkinson voice recognition system based on integrated manifold dimensionality reduction |
CN111444747A (en) * | 2019-01-17 | 2020-07-24 | 复旦大学 | Epileptic state identification method based on transfer learning and cavity convolution |
CN111898095A (en) * | 2020-07-10 | 2020-11-06 | 佛山科学技术学院 | Deep migration learning intelligent fault diagnosis method and device, storage medium and equipment |
CN111915596A (en) * | 2020-08-07 | 2020-11-10 | 杭州深睿博联科技有限公司 | Method and device for predicting benign and malignant pulmonary nodules |
US20210064829A1 (en) * | 2019-08-27 | 2021-03-04 | Nuance Communications, Inc. | System and method for language processing using adaptive regularization |
CN112906644A (en) * | 2021-03-22 | 2021-06-04 | 重庆大学 | Mechanical fault intelligent diagnosis method based on deep migration learning |
CN112927722A (en) * | 2021-01-25 | 2021-06-08 | 中国科学院心理研究所 | Method for establishing depression perception system based on individual voice analysis and depression perception system thereof |
Non-Patent Citations (5)
Title |
---|
LEI SHEN ET AL.: "Epileptic States Recognition Using Transfer Learning", 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) * |
NIVEDHITHA MAHENDRAN ET AL.: "Realizing a Stacking Generalization Model to Improve the Prediction Accuracy of Major Depressive Disorder in Adults", IEEE Access * |
CUI Hongyan et al.: "Research and prospects of feature selection methods in machine learning", Journal of Beijing University of Posts and Telecommunications * |
Peter Bühlmann et al.: "Statistics for High-Dimensional Data: Methods, Theory and Applications", National Defense Industry Press, September 2018 * |
WANG Jingxing: "Research on regression-based house price prediction models", National Circulation Economy * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107657964B (en) | Depression auxiliary detection method and classifier based on acoustic features and sparse mathematics | |
CN109044396B (en) | Intelligent heart sound identification method based on bidirectional long-time and short-time memory neural network | |
Ittichaichareon et al. | Speech recognition using MFCC | |
Dibazar et al. | Feature analysis for automatic detection of pathological speech | |
CN101620853A (en) | Speech-emotion recognition method based on improved fuzzy vector quantization | |
CN109285551A (en) | Disturbances in patients with Parkinson disease method for recognizing sound-groove based on WMFCC and DNN | |
Srinivasan et al. | Artificial neural network based pathological voice classification using MFCC features | |
Ramashini et al. | Robust cepstral feature for bird sound classification | |
CN111489763A (en) | Adaptive method for speaker recognition in complex environment based on GMM model | |
CN115457966B (en) | Pig cough sound identification method based on improved DS evidence theory multi-classifier fusion | |
CN113674767A (en) | Depression state identification method based on multi-modal fusion | |
CN116842460A (en) | Cough-related disease identification method and system based on attention mechanism and residual neural network | |
da Silva et al. | Evaluation of a sliding window mechanism as DataAugmentation over emotion detection on speech | |
CN114299996A (en) | AdaBoost algorithm-based speech analysis method and system for key characteristic parameters of symptoms of frozen gait of Parkinson's disease | |
Dibazar et al. | A system for automatic detection of pathological speech | |
CN113724731A (en) | Method and device for audio discrimination by using audio discrimination model | |
Sabet et al. | COVID-19 detection in cough audio dataset using deep learning model | |
Roy et al. | Pathological voice classification using deep learning | |
Kumar et al. | Parkinson’s Speech Detection Using YAMNet | |
Vieira et al. | Combining entropy measures and cepstral analysis for pathological voices assessment | |
Cai et al. | The best input feature when using convolutional neural network for cough recognition | |
CN113555004A (en) | Voice depression state identification method based on feature selection and transfer learning | |
Faseela et al. | Machine Learning Based Parkinson's Disease Detection from Enhanced Speech | |
CN114299925A (en) | Method and system for obtaining importance measurement index of dysphagia symptom of Parkinson disease patient based on voice | |
Satyasai et al. | A gammatonegram based abnormality detection in PCG signals using CNN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20211026 |