CN113555004A - Voice depression state identification method based on feature selection and transfer learning

Voice depression state identification method based on feature selection and transfer learning

Info

Publication number
CN113555004A
Authority
CN
China
Prior art keywords
speech
voice
feature
transfer learning
depression
Prior art date
2021-07-15
Legal status
Pending
Application number
CN202110801507.7A
Other languages
Chinese (zh)
Inventor
Zhao Zhang (赵张)
Wang Shouyan (王守岩)
Wang Jingying (汪静莹)
Liu Wei (刘伟)
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date
2021-07-15
Filing date
2021-07-15
Publication date
2021-10-26
Application filed by Fudan University filed Critical Fudan University
Priority to CN202110801507.7A
Publication of CN113555004A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Training
    • G10L15/08: Speech classification or search
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/63: Speech or voice analysis techniques specially adapted for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a speech depression state recognition method based on feature selection and transfer learning, fusing the Lasso method with the transfer learning method CORAL, aimed at two problems of speech-based modeling: high feature dimensionality, and feature distributions influenced by individual differences among subjects beyond their depression level. The method has the following advantages: 1. Lasso filters redundant information out of the features and retains the effective ones, further improving recognition accuracy while improving model efficiency; 2. without leaking depression label information, the transfer learning method CORAL draws the feature distributions of the training and test sets closer, reducing the influence of factors other than depression level on the feature distribution. Combining the two methods further improves the accuracy and stability of depression screening.

Description

Voice depression state identification method based on feature selection and transfer learning
Technical Field
The invention belongs to the field of voice signal processing, and particularly relates to a voice depression state identification method based on feature selection and transfer learning.
Background
Depression is a typical and common psychogenic disorder worldwide, affecting all age groups. Its diagnosis depends on clinicians' experience and on scales completed by patients; the whole process is time-consuming and the diagnostic workflow is inefficient. Voice is an important external expression of emotion and, owing to unique advantages such as few usage restrictions, low equipment cost, and a contactless, non-invasive, convenient acquisition mode, it has become a key direction for researchers pursuing automatic depression recognition.
At present, there is no specific feature with clear theoretical support for depression recognition. Feature design therefore aims to extract as much depression-related information from speech as possible, generally using high-dimensional features from multiple domains and comparing the classification results of different feature combinations. However, when too many features are used, the model becomes overly complex, recognition takes too long, and diagnostic efficiency drops.
Speech production is a very complex process. Many studies have explored differences in brain structure and function in patients with depression, as well as the potential factors besides depression that affect speech, mainly: gender, age, emotional state, language style, and educational or occupational background. These factors further increase the differences in feature distribution across subjects' speech signals and make model recognition harder.
In addition, machine learning on speech signals usually assumes that the training-set and test-set data are independent and identically distributed when the split is made. However, the feature distribution of a subject's speech signal is affected not only by depression level but also by individual differences such as age, gender, and occupation, so this assumption does not hold and model performance degrades.
Disclosure of Invention
To solve the above problems, the invention provides a speech depression state recognition method based on feature selection and transfer learning, which adopts the following technical scheme:
the invention provides a speech depression state identification method based on feature selection and transfer learning, which is characterized by comprising the following steps: step S1, collecting voice by using a recording device to obtain a voice sample; step S2, preprocessing the voice sample; step S3, extracting the voice characteristics in the voice sample, wherein the voice characteristics at least comprise chrominance characteristics; step S4, calculating statistic of voice characteristics, and taking the statistic as a characteristic set; step S5, using a Lasso model to perform feature selection on the feature set to obtain an effective feature set; step S6, based on the effective feature set, using CORAL method to perform transfer learning to obtain the training set features after transfer; and step S7, classifying the voice samples based on the characteristics of the training set, and outputting the classification result.
The speech depression state recognition method based on feature selection and transfer learning provided by the invention can also have the technical feature that the speech features further comprise acoustic features, frequency-domain features, pause features, and Mel-frequency cepstral coefficients.
The speech depression state recognition method based on feature selection and transfer learning provided by the invention can also have the technical feature that the statistics comprise the maximum, minimum, range, mean, median, intercept term of a linear regression, independent-variable coefficient of the linear regression, R² of the linear regression, standard deviation, skewness, kurtosis, and coefficient of variation.
The speech depression state recognition method based on feature selection and transfer learning provided by the invention can also have the technical feature that the classifier model used for classification is XGBoost.
The speech depression state recognition method based on feature selection and transfer learning provided by the invention can also have the technical feature that the preprocessing comprises removal of noise segments, removal of silent segments, high-pass filtering, and down-sampling.
The invention also provides a speech depression state recognition device based on feature selection and transfer learning, characterized by comprising: a voice acquisition part for collecting the voice samples; a preprocessing part for preprocessing the voice samples; a feature extraction part for extracting the speech features of the voice samples; a feature processing part for processing the speech features to obtain the effective feature set; a transfer learning part for performing transfer learning on the effective feature set to obtain the transferred training-set features; and a classification part for classifying the voice samples.
Action and Effect of the invention
According to the speech depression state recognition method based on feature selection and transfer learning, after the collected voice samples are preprocessed, speech features are extracted, 12 statistics of the speech features are computed as the feature set, and the feature set further undergoes feature selection and transfer learning to obtain the training-set features used for classifying the voice samples. Because the Lasso model is used for feature selection, redundant information is filtered out of the features and the effective features are retained, so the method achieves better recognition accuracy with fewer features and lower model complexity, solving the technical problem of high feature dimensionality in speech-based modeling while improving model efficiency.
On the other hand, because the feature-based unsupervised transfer learning method CORAL is used, the feature distributions of the training and test sets can be drawn closer by aligning their second-order covariance matrices without revealing depression label information, reducing the influence of factors other than depression level on the feature distribution. This solves the technical problem that, in speech-based modeling, the feature distribution is influenced by subjects' individual differences beyond depression level. Combining the two methods further improves the accuracy and stability of depression screening.
Drawings
FIG. 1 is a flowchart of the speech depression state recognition method based on feature selection and transfer learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the speech depression state recognition device based on feature selection and transfer learning according to an embodiment of the present invention.
Detailed Description
In order to make the technical means, creative features, objectives, and effects of the present invention easy to understand, the speech depression state recognition method based on feature selection and transfer learning is described below with reference to the embodiments and the accompanying drawings.
< example 1>
Fig. 1 is a flowchart of the speech depression state recognition method based on feature selection and transfer learning according to the embodiment of the present invention.
As shown in fig. 1, the speech depression state recognition method based on feature selection and transfer learning of the embodiment of the present invention includes the following steps:
and step S1, acquiring voice information, acquiring voice by using a recording device, designing questions of different speech task types, answering the test according to prompts on a screen, acquiring the complete speaking process of the test by using the recording device, and recording the complete speaking process as a wav file, wherein the file is a voice sample.
Step S2, speech-signal preprocessing: the collected voice samples are preprocessed. Obvious noise segments, such as coughs and the sound of dropped objects, are manually screened out and removed, followed by high-pass filtering, down-sampling, and detection and removal of silent segments.
In this embodiment 1, a second-order Butterworth filter with a cutoff frequency of 137.8 Hz is used for high-pass filtering to reduce the interference of low-frequency noise with the effective voice information; the toolkit librosa uniformly down-samples the speech signal to 16000 Hz; and the toolkit pyAudioAnalysis detects voiced and silent segments and removes the silent ones. Short-time Fourier transform: window length 0.1 s, sliding step 0.05 s, Hamming window, NFFT = 1024.
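As a concrete illustration of this preprocessing chain, the following is a minimal Python sketch using scipy and librosa under the embodiment's stated parameters (second-order Butterworth high-pass at 137.8 Hz, down-sampling to 16000 Hz). The silence-removal call follows the pyAudioAnalysis API as published in recent releases; exact function names vary between versions, so treat that call as an assumption rather than the patent's own code.

```python
import librosa
import numpy as np
from scipy.signal import butter, sosfilt
from pyAudioAnalysis import audioSegmentation as aS

def preprocess(path, target_sr=16000, cutoff_hz=137.8):
    """Steps of S2: high-pass filter, down-sample, drop silent segments."""
    y, sr = librosa.load(path, sr=None)                     # keep native rate
    sos = butter(2, cutoff_hz, btype="highpass", fs=sr, output="sos")
    y = sosfilt(sos, y)                                     # 2nd-order Butterworth HPF
    y = librosa.resample(y, orig_sr=sr, target_sr=target_sr)
    # Voiced-segment detection (API name per recent pyAudioAnalysis versions).
    segments = aS.silence_removal(y, target_sr, 0.1, 0.05,
                                  smooth_window=0.5, weight=0.5)
    voiced = (np.concatenate([y[int(s * target_sr):int(e * target_sr)]
                              for s, e in segments])
              if segments else y)
    return voiced, target_sr
```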
Step S3, extracting speech features from the voice sample, including acoustic features, frequency-domain features, pause features, Mel-frequency cepstral coefficients, and chroma features; see Table 1.
TABLE 1 Summary of speech features

Feature category                      Count   Features
Acoustic features                     6       fundamental frequency; sound intensity; intensity envelope; zero-crossing rate; zero-crossing amplitude; zero-crossing interval
Frequency-domain features             5       spectral centroid; spectral entropy; spectral spread; spectral roll-off; spectral flux
Mel-frequency cepstral coefficients   13      MFCC 1-13
Chroma features                       12      energy in the 12 pitch classes
Pause features                        3       pause count; pause-time ratio; average pause-duration ratio
As shown in Table 1, there are 6 acoustic features, related to fundamental frequency, energy, and zero-crossing rate. The energy features comprise the sound intensity and the intensity envelope; the zero-crossing-rate-related features comprise the zero-crossing rate, the zero-crossing amplitude (the maximum signal amplitude between two zero crossings), and the zero-crossing interval (the time interval between two zero crossings).
There are 5 frequency-domain features: spectral centroid, spectral entropy, spectral spread, spectral roll-off, and spectral flux.
There are 13 Mel-frequency cepstral coefficients in total, a common feature in speech signal processing.
There are 12 chroma features in total, a collective name for chromagrams and chroma vectors. They represent the energy in each of the 12 pitch classes per unit time, with the energy of the same pitch class accumulated across octaves. Chroma features are widely used in the music field; this method introduces them into the field of depression recognition.
There are 3 pause features: the pause count, the pause-time ratio, and the average pause-duration ratio.
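As an illustrative sketch (not the patent's own implementation), the frame-level trajectories of several of these features, including the 12-dimensional chroma features, can be extracted with librosa roughly as follows. One assumption to flag: the embodiment's 0.1 s window is 1600 samples at 16000 Hz, which exceeds its stated NFFT of 1024, so this sketch uses n_fft = 2048 to accommodate the window.

```python
import librosa
import numpy as np

def extract_features(y, sr=16000, win_s=0.1, hop_s=0.05):
    """Frame-level speech features of step S3 (subset, for illustration)."""
    hop = int(hop_s * sr)
    win = int(win_s * sr)
    S = np.abs(librosa.stft(y, n_fft=2048, hop_length=hop,
                            win_length=win, window="hamming"))
    return {
        "chroma": librosa.feature.chroma_stft(S=S, sr=sr),   # 12 pitch classes
        "mfcc": librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop),
        "spectral_centroid": librosa.feature.spectral_centroid(S=S, sr=sr),
        "spectral_rolloff": librosa.feature.spectral_rolloff(S=S, sr=sr),
        "zcr": librosa.feature.zero_crossing_rate(y, frame_length=win,
                                                  hop_length=hop),
    }
```

Each entry is an array of shape (n_dims, n_frames), whose per-dimension trajectories feed the statistics of step S4.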
Step S4, calculating feature statistics: 12 statistics of each speech feature are calculated, and these statistics together constitute the feature set. The 12 statistics are: maximum, minimum, range, mean, median, intercept term of a linear regression (with time as the independent variable), independent-variable coefficient of that linear regression, R² of that linear regression, standard deviation, skewness

$\mathrm{skew}(X) = E\!\left[\left(\frac{X-\mu}{\sigma}\right)^{3}\right],$

kurtosis

$\mathrm{kurt}(X) = E\!\left[\left(\frac{X-\mu}{\sigma}\right)^{4}\right],$

and coefficient of variation

$c_v = \frac{\sigma}{\mu}.$
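A minimal numpy/scipy sketch of step S4, computing the 12 statistics for one feature trajectory; the dictionary keys are illustrative names, not identifiers from the patent.

```python
import numpy as np
from scipy import stats

def feature_statistics(x):
    """12 statistics of a frame-level feature trajectory x (step S4)."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))                              # time as the regressor
    slope, intercept, r, _, _ = stats.linregress(t, x)
    return {
        "max": x.max(), "min": x.min(), "range": np.ptp(x),
        "mean": x.mean(), "median": np.median(x),
        "lr_intercept": intercept, "lr_coef": slope, "lr_r2": r ** 2,
        "std": x.std(), "skewness": stats.skew(x),
        "kurtosis": stats.kurtosis(x, fisher=False),   # raw 4th standardized moment
        "cv": x.std() / x.mean(),
    }
```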
Step S5, feature selection: feature selection is performed on the feature set with a Lasso model, which compresses the coefficients of non-significant variables to obtain the effective feature set.
Lasso selects feature variables through a penalty function and extracts effective features by shrinking coefficients. Consider the general linear regression model $Y = X\beta + \varepsilon$, with response variable $Y = (y_1, y_2, \ldots, y_n)^{T}$, independent variables $X = (X^{(1)}, X^{(2)}, \ldots, X^{(m)})$ where each $X^{(i)}$ is an $n \times 1$ vector, and regression coefficients $\beta = (\beta_1, \beta_2, \ldots, \beta_m)^{T}$. Starting from ordinary least-squares estimation, the regression coefficients are shrunk by adding a penalty function; some coefficients are compressed exactly to 0, the features whose coefficients reach 0 are discarded, and the remaining features are retained as effective features. The Lasso estimator is:

$\hat{\beta} = \arg\min_{\beta} \left\{ \|Y - X\beta\|_{2}^{2} + \lambda \sum_{j=1}^{m} |\beta_j| \right\}$
the method adopts Lasso-Logistic regression for classification tasks, compares different lambda parameters on the basis of the fixed parameters of a Logistic regression model, and determines the hyper-parameters according to the optimal accuracy. The penalty coefficient λ is determined by adjusting parameters through multi-round experimental cross validation, and it is tried to set the penalty coefficient λ to 1, 0.1, 0.01, 0.005, 0.001, 0.0005, 0.0001, 0.00005, and finally 0.005.
Step S6, transfer learning: based on the effective feature set, transfer learning is performed with the domain-adaptation method CORAL, which draws the feature distributions of the test and training sets together by aligning their second-order covariance matrices, yielding the transferred training-set features.
To reduce the difference in feature distribution between the training and test sets caused by individual factors other than depression level, without leaking depression label information, the invention introduces a feature-based unsupervised transfer learning method: CORrelation ALignment (CORAL), which aligns the second-order covariance matrices to draw the feature distributions of the training and test sets together. After white-noise (identity) regularization is added to the covariance matrix of the target domain, a linear transformation is applied; CORAL only needs to compute two things: (1) the covariance matrices of the source-domain and target-domain features; (2) the linear transformation built from the regularized matrices. The specific steps of the transfer algorithm are shown in Table 2.
TABLE 2 CORAL algorithm steps

Input: source-domain (training-set) features D_S, target-domain (test-set) features D_T
1. C_S = cov(D_S) + eye(size(D_S, 2))
2. C_T = cov(D_T) + eye(size(D_T, 2))
3. D_S = D_S * C_S^(-1/2)    (whitening of the source features)
4. D*_S = D_S * C_T^(1/2)    (re-coloring with the target covariance)
Output: transformed source-domain features D*_S
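The table above corresponds to the published CORAL algorithm (Sun, Feng & Saenko, 2016). A compact numpy/scipy sketch, assuming the identity matrix plays the role of the added white noise:

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def coral(Xs, Xt):
    """Align source (training) features Xs to target (test) features Xt."""
    d = Xs.shape[1]
    Cs = np.cov(Xs, rowvar=False) + np.eye(d)   # regularized source covariance
    Ct = np.cov(Xt, rowvar=False) + np.eye(d)   # regularized target covariance
    Xs_white = Xs @ fractional_matrix_power(Cs, -0.5)        # whiten source
    return np.real(Xs_white @ fractional_matrix_power(Ct, 0.5))  # re-color

# Xs_aligned = coral(X_train_effective, X_test_effective)
```

Note that only the training-set features are transformed; the test-set features are left unchanged, which is why no label information leaks.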
Step S7, classification: based on the training-set features, the voice samples are classified with an XGBoost classifier model, and the classification results of the voice samples are output.
XGBoost is a boosted-tree model based on the Boosting framework, which integrates multiple CART decision trees into a strong classifier to reduce the recognition error and variance of the model. Following the gradient-boosted-tree formulation, each iteration learns the residual between the previous prediction and the target; the score a sample obtains at each tree is computed, and the sum of all these scores is the sample's classification result. Let the model to be trained in the t-th iteration be $f_t(x)$; then

$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$

where $\hat{y}_i^{(t)}$ is the model's classification result for the i-th sample after t iterations, $x_i$ denotes the i-th sample, $\hat{y}_i^{(t-1)}$ is the prediction of the first t−1 trees, and $f_t$ is the t-th tree. The objective function is set to:

$\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t)}\big) + \sum_{k=1}^{t} \Omega(f_k)$

where $\mathrm{Obj}^{(t)}$ is the objective value at iteration t, $l\big(y_i, \hat{y}_i^{(t)}\big)$ is the training error of the i-th sample, and $\sum_{k=1}^{t} \Omega(f_k)$ is the total model complexity of the t trees, serving as the regularization term of the objective. The model complexity $\Omega$ is determined by the total number of leaf nodes $T$ and the leaf weight coefficients $w$ of the decision tree, written as:

$\Omega(f) = \gamma T + \frac{1}{2}\, \lambda\, \|w\|^{2}$

where $\|w\|^{2}$ is the squared L2 norm of the weight coefficients, $\gamma$ is the coefficient for splitting a leaf node, used to control the total number of nodes, and $\lambda$ is the regularization coefficient.
Training terminates based on the objective function above. At implementation time, a greedy algorithm traverses all features as candidate split points; if the objective after a split improves over the objective before it, splitting continues, and splitting stops when the weight coefficients or the tree depth exceed their thresholds, thereby avoiding overfitting of the model.
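A sketch of step S7 with the xgboost package; the hyper-parameters shown (tree count, depth, learning rate) are illustrative assumptions not specified by the patent, while reg_lambda and gamma correspond to the λ and γ of the complexity term above.

```python
from xgboost import XGBClassifier

clf = XGBClassifier(
    n_estimators=200,     # illustrative; not given by the patent
    max_depth=4,          # depth threshold that stops splitting
    learning_rate=0.1,
    reg_lambda=1.0,       # lambda: L2 penalty on leaf weights
    gamma=0.0,            # gamma: minimum loss reduction to split a leaf
)
# clf.fit(Xs_aligned, y_train)           # CORAL-transformed training features
# y_pred = clf.predict(X_test_effective) # test features are left unchanged
```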
After training is complete, the model can classify and predict voice samples, judging whether a sample belongs to a depressed subject or a normal subject, and finally outputs the classification result.
This embodiment also provides three evaluation indexes for the speech depression state classification results: accuracy, F1 score, and AUC value. The three indexes are defined as follows:

$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \quad \mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}$

The F1 score is the harmonic mean of recall and precision, with value range [0, 1].

The AUC value is the area enclosed by the receiver operating characteristic (ROC) curve and the coordinate axes. The abscissa of the ROC curve is the false positive rate

$FPR = \frac{FP}{FP + TN}$

and the ordinate is the true positive rate

$TPR = \frac{TP}{TP + FN}.$

The curve lies above the line $y = x$, so the value range is [0.5, 1].

The definitions of TP, FP, FN, and TN are shown in Table 3.
TABLE 3 confusion matrix of classification results of speech depression states
                                   Audio of depressed subject    Audio of normal subject
Classified as depressed subject    True Positive (TP)            False Positive (FP)
Classified as normal subject       False Negative (FN)           True Negative (TN)
The values of the three evaluation indexes are positively correlated with the classification performance, and the larger the value is, the better the classification result is.
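These three indexes map directly onto scikit-learn metrics; a small self-contained sketch with illustrative labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0])              # 1 = depressed, 0 = normal
y_prob = np.array([0.9, 0.2, 0.6, 0.4, 0.3, 0.1])  # P(depressed) from the classifier
y_pred = (y_prob >= 0.5).astype(int)

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("AUC:", roc_auc_score(y_true, y_prob))       # area under the ROC curve
```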
Therefore, through the voice depression state identification method based on feature selection and transfer learning, the depression state identification of the tested voice segment is realized, the classification result of the voice segment is obtained, and the evaluation of the classification result is obtained.
< example 2>
As described above, embodiment 1 provides a speech depression state recognition method based on feature selection and transfer learning, mainly comprising steps S1 to S7. In practical applications, the steps of the method of embodiment 1 can be configured as corresponding computer modules, namely a voice acquisition part, a preprocessing part, a feature extraction part, a feature processing part, a transfer learning part, and a classification part, which together form a device for classifying and recognizing the speech depression state. A speech depression state recognition device based on feature selection and transfer learning can therefore also be provided.
Fig. 2 is a schematic diagram of a speech depression state recognition apparatus based on feature selection and transfer learning according to an embodiment of the present invention.
As shown in fig. 2, a speech depression state recognition device (hereinafter simply referred to as the recognition device) 100 based on feature selection and transfer learning includes a voice acquisition part 11, a preprocessing part 12, a feature extraction part 13, a feature processing part 14, a transfer learning part 15, and a classification part 16. The speech depression state recognition device 100 is used to recognize a target speech segment and obtain a recognition result, namely whether the speech segment belongs to a depressed subject or a normal subject.
The voice acquisition part 11 collects the subject's speech segments to obtain voice samples, using the acquisition method of step S1.
The preprocessing section 12 is for preprocessing the voice sample by the preprocessing method of step S2.
The feature extraction unit 13 is configured to extract a speech feature in the speech sample, and employs the speech feature extraction method of step S3.
The feature processing unit 14 is configured to process the extracted speech features to obtain an active feature set, and to adopt the feature processing method of steps S4 to S5.
The transfer learning part 15 performs transfer learning to obtain the transferred training-set features, using the transfer learning method of step S6.
The classification unit 16 classifies the speech segment and outputs the result, and adopts the classification method of step S7.
The execution process of each part is consistent with the process described in the corresponding steps of the speech depression state recognition method based on feature selection and transfer learning, and is not repeated here.
Examples effects and effects
According to the speech depression state recognition method based on feature selection and transfer learning of the embodiments, after the collected voice samples are preprocessed, speech features are extracted, 12 statistics of the speech features are computed as the feature set, and the feature set further undergoes feature selection and transfer learning to obtain the training-set features used for classifying the voice samples. Because the Lasso model is used for feature selection, redundant information is filtered out of the features and the effective features are retained, so the method achieves better recognition accuracy with fewer features and lower model complexity, solving the technical problem of high feature dimensionality in speech-based modeling while improving model efficiency.
On the other hand, because the embodiments use the feature-based unsupervised transfer learning method CORAL, the feature distributions of the training and test sets can be drawn closer by aligning their second-order covariance matrices without revealing depression label information, reducing the influence of factors other than depression level on the feature distribution. This solves the technical problem that, in speech-based modeling, the feature distribution is influenced by subjects' individual differences beyond depression level. Combining the two methods further improves the accuracy and stability of depression screening.
The above-described embodiments are merely illustrative of specific embodiments of the present invention, and the present invention is not limited to the description of the above-described embodiments.
For example, in the embodiment, the penalty coefficient λ of the Lasso model is set to 0.005, and in the present invention, the penalty coefficient λ may also be adjusted to other suitable values, so that the technical effects of the present invention can also be achieved.
In the embodiments, the classifier model used for classification is XGBoost; in the present invention, other classifier models may also be used for classification, for example LightGBM, to achieve the technical effects of the invention, as sketched below.
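As a sketch of that substitution (illustrative parameters, assuming the lightgbm package):

```python
from lightgbm import LGBMClassifier

# Drop-in alternative to the XGBoost classifier of step S7.
clf = LGBMClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
# clf.fit(Xs_aligned, y_train); y_pred = clf.predict(X_test_effective)
```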

Claims (6)

1. A speech depression state identification method based on feature selection and transfer learning is used for identifying speech depression states and is characterized by comprising the following steps:
step S1, collecting voice by using a recording device to obtain a voice sample;
step S2, preprocessing the voice sample;
step S3, extracting the speech features from the voice sample, wherein the speech features at least comprise chroma features;
step S4, calculating statistic of the voice features, and taking the statistic as a feature set;
step S5, using a Lasso model to perform feature selection on the feature set to obtain an effective feature set;
step S6, based on the effective feature set, using a CORAL method to perform transfer learning to obtain the characteristics of the training set after transfer;
and step S7, classifying the voice samples based on the training set characteristics, and outputting a classification result.
2. The speech depression state recognition method based on feature selection and transfer learning according to claim 1, characterized in that:
wherein the speech features further include acoustic features, frequency domain features, pause features, and mel-frequency cepstrum coefficients.
3. The speech depression state recognition method based on feature selection and transfer learning according to claim 1, characterized in that:
wherein the statistics include a maximum, a minimum, a range, a mean, a median, an intercept term of the linear regression, an independent-variable coefficient of the linear regression, R² of the linear regression, a standard deviation, a skewness, a kurtosis, and a coefficient of variation.
4. The speech depression state recognition method based on feature selection and transfer learning according to claim 1, characterized in that:
wherein the classifier model used for classification is XGBoost.
5. The speech depression state recognition method based on feature selection and transfer learning according to claim 1, characterized in that:
wherein the preprocessing includes removal of noise segments, removal of silence segments, high-pass filtering, and down-sampling.
6. A speech depression state recognition apparatus based on feature selection and transfer learning, comprising:
a voice collecting part for collecting the voice sample;
a preprocessing section for preprocessing the voice sample;
a feature extraction unit configured to extract the speech feature of the speech sample;
the characteristic processing part is used for processing the voice characteristics to obtain the effective characteristic set;
the transfer learning part is used for carrying out transfer learning on the effective characteristic set to obtain the characteristics of the training set after the transfer;
a classification section for classifying the voice sample.
CN202110801507.7A 2021-07-15 2021-07-15 Voice depression state identification method based on feature selection and transfer learning Pending CN113555004A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110801507.7A CN113555004A (en) 2021-07-15 2021-07-15 Voice depression state identification method based on feature selection and transfer learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110801507.7A CN113555004A (en) 2021-07-15 2021-07-15 Voice depression state identification method based on feature selection and transfer learning

Publications (1)

Publication Number Publication Date
CN113555004A true CN113555004A (en) 2021-10-26

Family

ID=78131917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110801507.7A Pending CN113555004A (en) 2021-07-15 2021-07-15 Voice depression state identification method based on feature selection and transfer learning

Country Status (1)

Country Link
CN (1) CN113555004A (en)


Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017021267A (en) * 2015-07-14 2017-01-26 日本電信電話株式会社 Wiener filter design device, sound enhancement device, acoustic feature amount selection device, and method and program therefor
CN106725532A (en) * 2016-12-13 2017-05-31 兰州大学 Depression automatic evaluation system and method based on phonetic feature and machine learning
CN107221344A (en) * 2017-04-07 2017-09-29 南京邮电大学 A kind of speech emotional moving method
CN107657964A (en) * 2017-08-15 2018-02-02 西北大学 Depression aided detection method and grader based on acoustic feature and sparse mathematics
CN108830645A (en) * 2018-05-31 2018-11-16 厦门快商通信息技术有限公司 A kind of visitor's attrition prediction method and system
US20190385711A1 (en) * 2018-06-19 2019-12-19 Ellipsis Health, Inc. Systems and methods for mental health assessment
CN111444747A (en) * 2019-01-17 2020-07-24 复旦大学 Epileptic state identification method based on transfer learning and cavity convolution
US20210064829A1 (en) * 2019-08-27 2021-03-04 Nuance Communications, Inc. System and method for language processing using adaptive regularization
CN110808072A (en) * 2019-11-08 2020-02-18 广州科慧健远医疗科技有限公司 Method for evaluating dysarthria of children based on optimized acoustic parameters of data mining technology
CN110956310A (en) * 2019-11-14 2020-04-03 佛山科学技术学院 Fish feed feeding amount prediction method and system based on feature selection and support vector
CN110991535A (en) * 2019-12-04 2020-04-10 中山大学 pCR prediction method based on multi-type medical data
CN111210846A (en) * 2020-01-07 2020-05-29 重庆大学 Parkinson voice recognition system based on integrated manifold dimensionality reduction
CN111898095A (en) * 2020-07-10 2020-11-06 佛山科学技术学院 Deep migration learning intelligent fault diagnosis method and device, storage medium and equipment
CN111915596A (en) * 2020-08-07 2020-11-10 杭州深睿博联科技有限公司 Method and device for predicting benign and malignant pulmonary nodules
CN112927722A (en) * 2021-01-25 2021-06-08 中国科学院心理研究所 Method for establishing depression perception system based on individual voice analysis and depression perception system thereof
CN112906644A (en) * 2021-03-22 2021-06-04 重庆大学 Mechanical fault intelligent diagnosis method based on deep migration learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LEI SHEN ET AL.: "Epileptic States Recognition Using Transfer Learning", 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) *
NIVEDHITHA MAHENDRAN ET AL.: "Realizing a Stacking Generalization Model to Improve the Prediction Accuracy of Major Depressive Disorder in Adults", IEEE Access *
CUI Hongyan et al.: "Research and prospects of feature selection methods in machine learning", Journal of Beijing University of Posts and Telecommunications *
Peter Bühlmann et al.: "Statistics for High-Dimensional Data: Methods, Theory and Applications" (Chinese edition), National Defense Industry Press, September 2018 *
WANG Jingxing: "Research on regression-based house price prediction models", China Circulation Economy *

Similar Documents

Publication Publication Date Title
CN107657964B (en) Depression auxiliary detection method and classifier based on acoustic features and sparse mathematics
CN109044396B (en) Intelligent heart sound identification method based on bidirectional long-time and short-time memory neural network
Ittichaichareon et al. Speech recognition using MFCC
Dibazar et al. Feature analysis for automatic detection of pathological speech
CN108281146A (en) A kind of phrase sound method for distinguishing speek person and device
CN101620853A (en) Speech-emotion recognition method based on improved fuzzy vector quantization
Srinivasan et al. Artificial neural network based pathological voice classification using MFCC features
CN109285551A (en) Disturbances in patients with Parkinson disease method for recognizing sound-groove based on WMFCC and DNN
CN115457966B (en) Pig cough sound identification method based on improved DS evidence theory multi-classifier fusion
CN113674767A (en) Depression state identification method based on multi-modal fusion
CN111489763A (en) Adaptive method for speaker recognition in complex environment based on GMM model
CN116842460A (en) Cough-related disease identification method and system based on attention mechanism and residual neural network
Janbakhshi et al. Automatic dysarthric speech detection exploiting pairwise distance-based convolutional neural networks
da Silva et al. Evaluation of a sliding window mechanism as DataAugmentation over emotion detection on speech
CN114299996A (en) AdaBoost algorithm-based speech analysis method and system for key characteristic parameters of symptoms of frozen gait of Parkinson's disease
Dibazar et al. A system for automatic detection of pathological speech
Roy et al. Pathological voice classification using deep learning
Vieira et al. Combining entropy measures and cepstral analysis for pathological voices assessment
Cai et al. The best input feature when using convolutional neural network for cough recognition
CN113555004A (en) Voice depression state identification method based on feature selection and transfer learning
CN114299925A (en) Method and system for obtaining importance measurement index of dysphagia symptom of Parkinson disease patient based on voice
CN113724731A (en) Method and device for audio discrimination by using audio discrimination model
CN113571050A (en) Voice depression state identification method based on Attention and Bi-LSTM
Kumar et al. Parkinson’s Speech Detection Using YAMNet
CN112259107A (en) Voiceprint recognition method under meeting scene small sample condition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20211026)