CN111951824A - Detection method for distinguishing depression based on sound - Google Patents

Detection method for distinguishing depression based on sound

Info

Publication number
CN111951824A
CN111951824A CN202010817892.XA CN202010817892A CN111951824A CN 111951824 A CN111951824 A CN 111951824A CN 202010817892 A CN202010817892 A CN 202010817892A CN 111951824 A CN111951824 A CN 111951824A
Authority
CN
China
Prior art keywords
depression
sound
layer
output
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010817892.XA
Other languages
Chinese (zh)
Inventor
陆可
李青青
赵双双
王颖捷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Guoling Technology Research Intelligent Technology Co ltd
Original Assignee
Suzhou Guoling Technology Research Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Guoling Technology Research Intelligent Technology Co ltd filed Critical Suzhou Guoling Technology Research Intelligent Technology Co ltd
Priority to CN202010817892.XA priority Critical patent/CN111951824A/en
Publication of CN111951824A publication Critical patent/CN111951824A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition

Abstract

The invention discloses a detection method for judging depression based on sound, which discriminates depression through voice feature extraction and deep-learning processing. The sound elements are collected and stored in digital form, the sound file data are analysed with a blind source separation (BSS) algorithm, and the speech is identified; the speech signal to be processed is analysed with MFCCs as characteristic parameters, converted to the Mel frequency scale and subjected to cepstrum analysis; several groups of training data are used to collect data from the recordings, and a convolutional neural network model is established for discrimination; the obtained test sample data are classified and analysed with a BP neural network; and an ROC (receiver operating characteristic) and AUC (area under the curve) model evaluation method based on the confusion matrix is used to verify the accuracy of the sound-based estimate of an individual's probability of suffering from depression. The depression discrimination rate is significantly improved, and the cost is low.

Description

Detection method for distinguishing depression based on sound
Technical Field
The invention belongs to the technical field of voice processing, and particularly relates to a detection method for judging depression based on voice.
Background
Depression is a mental disorder accompanied by abnormalities in thought and behaviour, and has become a serious public-health and social problem worldwide. A report published by the World Health Organization in 2017 shows that more than 300 million people worldwide are afflicted by depression; in China, the number of depression patients has reached about 54 million (roughly 4.2 percent of the population), an incidence close to the global level (4.4 percent). Among Chinese young people aged 15-24, about 1.2 million suffer from depression, and the incidence of depression among Chinese college students is as high as 23.8 percent (similar to data from British universities). A 2015 report of the United Nations Children's Fund shows that the incidence of depression among teenagers in rural areas is higher than that of their urban peers. In China alone, absenteeism and medical and funeral costs caused by depression result in losses of 78 million US dollars each year. Patients with depression look no different from healthy people on the outside, but they suffer inwardly and often feel persistently low; the symptoms progress from gloominess in the early stage to self-harm and social difficulties later on, and can even include suicidal thoughts or behaviour. Therefore, one effective way to reduce the suicide rate is to detect depression in advance and treat it in time, which requires an effective depression detection method. For years, the diagnosis of depression has depended on traditional detection instruments such as the SDS depression self-rating scale; the SDS is mainly suitable for adults with depressive symptoms and can be used in psychological counselling as well as for psychiatric outpatients or inpatients, but assessment is difficult for depressed patients with severe retardation symptoms. Scholars at home and abroad have also carried out a great deal of research; Ozdas et al. explored risk factors for depression and suicide based on vocal jitter and the spectral characteristics of the glottal wave. However, the number of experimental samples was small, verification on large samples was lacking, and the samples were collected from different communication devices and environments, which affects the accuracy of the experimental results to a certain extent.
In addition, some journal literature at home and abroad discloses methods for detecting depression based on sound. For example, Yanchu Jade et al. studied depression recognition technology based on speech and facial features and analysed the audio data recorded in interviews from the speech-feature side. The audio features provided by the data set were extracted from the audio recordings with the COVAREP toolkit; a timestamp is placed every 0.3334 s, and the extracted audio features are recorded under each timestamp. In view of the time-series nature of the audio features, a long short-term memory network (LSTM) was built, the data set was also split by gender, and the features were fed to the LSTM in timestamp order to obtain a prediction based on the audio features. Wang Tianyang et al. studied effective feature analysis of speech data and its application to depression-level assessment; they used GMMs to build a multi-feature-set decision system, trained models on several feature sets separately and then fused the predictions by decision fusion, obtaining 70% and 75% classification accuracy on male and female data respectively.
In addition, some domestic patent documents disclose methods for detecting depression based on sound. For example, Chinese patent CN106725532A discloses an automatic depression assessment system and method based on speech features and machine learning, which uses speech processing, feature extraction and machine-learning technology to find the relation between speech features and depression and provides an objective reference for the clinical diagnosis of depression. Chinese patent CN107657964A discloses a depression auxiliary detection method and classifier based on acoustic features and sparse mathematics, in which the depression judgement relies on joint recognition of voice and facial emotion; the glottal signal is estimated with an inverse filter, the speech signal is analysed globally, characteristic parameters are extracted, their temporal and distributional characteristics are analysed, and the prosodic patterns of different emotional speech are identified and used as the basis of emotion recognition; the speech signal to be processed is analysed with MFCCs as characteristic parameters, data are collected from the recordings with several groups of training data, and a neural network model is established for discrimination. Chinese patent CN109171769A discloses a method and system for extracting voice and facial features for depression detection: features are extracted from the audio data by an energy-information method to obtain spectral and acoustic parameters; these parameters are fed into a first deep neural network model to obtain deep voice features; static features are extracted from the video images to obtain frame images, which are fed into a second deep neural network model to obtain facial feature data; dynamic features are extracted from the video images to obtain optical-flow images, which are fed into a third deep neural network model to obtain facial-motion feature data; the facial feature data and the motion feature data are fed into the third deep neural network model to obtain deep facial features; and the deep voice features and deep facial features are fed into a fourth neural network model to obtain the fused data. Chinese patent CN111329494A discloses a depression detection method based on voice keyword retrieval and speech emotion recognition, which collects the voice information of the person to be tested and automatically recognizes depression from the speech features and speech text extracted from that voice.
While there have been many attempts to detect depression from audio using neural networks, existing methods label one sample with a single audio file during training and ultimately output only the total prediction accuracy; a single file does not receive its own probability of being predicted correctly. The present invention is more representative because it processes each file individually and produces an estimate and judgement for the uniqueness of each single individual.
In summary, the problems of the prior art are as follows: the traditional depression detection methods rely on the SDS depression self-rating scale and the subjective judgement of clinicians, which carry large errors; they do not apply BP neural network binary classification and AUC accuracy verification after MFCC voice-feature extraction, and therefore lack scientific rigour and an effective objective evaluation index.
Disclosure of Invention
1. Problems to be solved
Aiming at the defects in the prior art, the invention provides a detection method for distinguishing depression based on sound, which greatly improves the depression recognition rate; the system implementing the method can easily be built on a hospital detector or a computer, so the software and hardware cost is low.
2. Technical scheme
In order to solve the problems, the technical scheme adopted by the invention is as follows:
the invention relates to a detection method for distinguishing depression based on voice, which discriminates depression through voice feature extraction and deep-learning processing. The sound elements are collected and stored in digital form, the sound file data are analysed with a blind source separation (BSS) algorithm, and the speech is identified; the speech signal to be processed is analysed with MFCCs as characteristic parameters, converted to the Mel frequency scale and subjected to cepstrum analysis; several groups of training data are used to collect data from the recordings, and a convolutional neural network model is established for discrimination; the obtained test sample data are classified and analysed with a BP neural network; and an ROC (receiver operating characteristic) and AUC (area under the curve) model evaluation method based on the confusion matrix is used to verify the accuracy of the sound-based estimate of an individual's probability of suffering from depression.
The invention discloses a detection method for distinguishing depression based on sound, which comprises the following steps:
step S101, blind source separation (BSS) analysis is carried out on the collected voice WAV files, and then the sound is digitized;
step S102, the physical speech information is encoded and cepstrum analysis (spectral envelope and spectral detail) is carried out to obtain the 13-dimensional MFCC feature vector for machine recognition; the original 13 static MFCC coefficients are extended to the 39-dimensional MFCC used in recognition, namely 13 static coefficients + 13 first-order difference coefficients + 13 second-order difference coefficients, which are input into the convolutional neural network model;
step S103, a convolutional neural network model is established and trained, and features are autonomously extracted and selected;
step S104, the BP network end receives the output feature vector, carries out error back-propagation training and binary-classifies the input vector;
step S105, an accumulated value is obtained by statistical analysis to give the probability that the individual suffers from depression;
and step S106, the binary classification model is evaluated with the AUC and ROC metrics to verify its accuracy (an end-to-end outline of these steps is sketched below).
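For illustration only, the following Python sketch outlines how steps S101 to S106 could be connected. The function names and the use of the librosa and scikit-learn libraries are assumptions made for this sketch, not part of the claimed method.

```python
# Hypothetical outline of steps S101-S106 (illustrative sketch only).
import numpy as np
import librosa
from sklearn.metrics import roc_auc_score

def extract_mfcc_39(wav_path, sr=16000):
    """S101/S102: load the recording and build 39-dimensional MFCC frames
    (13 static coefficients + 13 first-order + 13 second-order differences)."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape (13, n_frames)
    d1 = librosa.feature.delta(mfcc, order=1)
    d2 = librosa.feature.delta(mfcc, order=2)
    return np.vstack([mfcc, d1, d2]).T                   # shape (n_frames, 39)

def individual_probability(frame_is_depressed):
    """S105: fraction of frames whose classification points to depression."""
    return float(np.mean(frame_is_depressed))

# S103/S104: a CNN + BP classifier is trained on the 39-dimensional frames
# (see the separate sketches later in this description).
# S106: frame-level scores are evaluated with ROC/AUC, e.g.
#   auc = roc_auc_score(frame_labels, frame_scores)
```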
Further, the step S101 specifically includes:
(1) sampling, quantizing and coding the recording to ensure precision;
(2) explicitly extracting the three main parameters of sound-signal digitization: the sampling frequency, the number of quantization bits and the number of channels.
Further, the step S102 specifically includes:
(1) MFCC feature extraction comprises two key steps: converting to Mel frequency, and performing cepstrum analysis;
(2) the Mel-scale filter bank has high resolution in the low-frequency region, which is consistent with the auditory characteristics of the human ear and is the physical meaning of the Mel scale; conversion to Mel frequency proceeds by first applying a Fourier transform to bring the time-domain signal into the frequency domain, then dividing the frequency-domain signal with the Mel-scale filter bank, so that each frequency band finally corresponds to one numerical value;
(3) cepstrum analysis applies a Fourier transform to the time-domain signal, takes the logarithm and then applies an inverse Fourier transform; cepstra can be divided into the complex cepstrum, the real cepstrum and the power cepstrum, and the power cepstrum is preferred here.
Further, the specific MFCC feature-extraction process of step S102 is as follows:
(1) pre-emphasis: the spectrum is multiplied by a coefficient that is positively correlated with frequency, which boosts the high-frequency amplitude; in practice a high-pass filter H(z) = 1 - k·z^(-1) is used, i.e. S'(n) = S(n) - k·S(n-1);
(2) windowing: the signal is windowed with a Hamming window, S'(n) = {0.54 - 0.46·cos[2π(n-1)/(N-1)]}·S(n), where N is the window length; compared with a rectangular window this reduces the side lobes and the spectral leakage after the FFT;
(3) frequency-domain conversion: the time-domain signal is converted into the frequency domain for the subsequent frequency analysis;
(4) Mel-scale filter-bank filtering: the amplitude spectrum obtained from the FFT is multiplied with each filter and accumulated, giving one value per filter, namely the energy of the frame in that filter's frequency band; with 22 filters, 22 energy values are obtained;
(5) taking the logarithm of the energy values: human perception of sound is not linear and is better described by a logarithmic relation, and taking the log also makes the cepstrum analysis possible;
(6) discrete cosine transform: an inverse transform is applied and a low-pass filter keeps the low-frequency part, giving the final characteristic parameters;
(7) dynamic differences: so that the features better reflect temporal continuity, information from neighbouring frames is appended to the feature dimensions, typically as first-order and second-order differences, converting the 13-dimensional MFCC into the 39-dimensional MFCC input to the convolutional neural network model.
Further, step S103 specifically includes:
(1) the first stage is a stage of data propagation from a low level to a high level, namely a forward propagation stage;
(2) the other stage is a stage of carrying out propagation training on the error from a high level to a bottom level when the result obtained by the current propagation is inconsistent with the expectation, namely a back propagation stage;
the method comprises the following specific steps:
a. initializing a weight value by the network;
b. the input data is transmitted forwards through a convolution layer, a down-sampling layer and a full-connection layer to obtain an output value;
c. calculating the error between the output value of the network and the target value;
d. when the error is larger than the expected value, the error is transmitted back to the network, and the errors of the full connection layer, the down sampling layer and the convolution layer are sequentially obtained;
e. when the error is equal to or less than our expected value, the training is finished;
f. update the weights according to the obtained error, and then return to step b.
Further, the step S104 specifically includes:
(1) network initialization: determine the number of input-layer nodes n, hidden-layer nodes l and output-layer nodes m from the system input-output sequence (X, Y); initialize the connection weights ωij and ωjk between the neurons of the input, hidden and output layers, initialize the hidden-layer thresholds a and the output-layer thresholds b, and set the learning rate and the neuron activation function;
(2) hidden-layer output calculation: from the input vector X, the input-to-hidden connection weights ωij and the hidden-layer thresholds a, compute the hidden-layer output H: Hj = f(Σi ωij·xi - aj), j = 1, 2, …, l, where l is the number of hidden-layer nodes and f is the hidden-layer activation function;
(3) output-layer output calculation: from the hidden-layer output H, the connection weights ωjk and the thresholds b, compute the BP neural network output O: Ok = Σj Hj·ωjk - bk, k = 1, 2, …, m;
(4) error calculation: from the network prediction output O and the expected output Y, compute the network prediction error e: ek = Yk - Ok, k = 1, 2, …, m;
(5) weight update: update the network connection weights ωij and ωjk according to the prediction error e: ωij = ωij + η·Hj·(1 - Hj)·xi·Σk ωjk·ek, i = 1, 2, …, n, j = 1, 2, …, l; ωjk = ωjk + η·Hj·ek, j = 1, 2, …, l, k = 1, 2, …, m, where η is the learning rate;
(6) threshold update: update the node thresholds a and b according to the prediction error e: aj = aj + η·Hj·(1 - Hj)·Σk ωjk·ek, j = 1, 2, …, l; bk = bk + ek, k = 1, 2, …, m;
(7) judging whether the algorithm iteration is finished or not, and if not, returning to the step (2);
(8) the supervised-learning classification algorithm outputs a qualitative classification, each frame being classified as pointing to depression or not.
Further, the step S105 specifically includes:
(1) 10 million frames of test data are extracted, and the cumulative number of frames pointing to depression is counted;
(2) a threshold is set: if 8 million of those frames are classified as pointing to depression, the person can be said to suffer from depression with a probability of 80%; with one frame lasting 20 ms, this is equivalent to a 10-minute recording in which sound totalling 8 minutes points to depression.
Further, the step S106 specifically includes:
(1) based on the concepts of Positive, Negative, True and False in the confusion matrix: a predicted class of 1 is Positive and a predicted class of 0 is Negative; a correct prediction is True and a wrong prediction is False; combining these four concepts yields the confusion matrix;
(2) the True Positive Rate and False Positive Rate are calculated as TPRate = TP/(TP + FN) and FPRate = FP/(FP + TN); TPRate is the proportion predicted as 1 among all samples whose true class is 1, and FPRate is the proportion predicted as 1 among all samples whose true class is 0;
(3) when the classifier is effective, the probability of predicting 1 for a sample whose true class is 1 (TPRate) is greater than the probability of predicting 1 for a sample whose true class is 0 (FPRate), i.e. y > x on the ROC plot;
(4) in the experiments, with 0.8 used as the threshold, a series of TPRate and FPRate values is obtained, the points are plotted and the area under the curve is computed to give the AUC; the AUC value is high, so the accuracy of the sound-based depression assessment method is reliable.
In contrast, Chinese patent CN109599129A discloses a speech depression recognition method based on an attention mechanism and a convolutional neural network. It first preprocesses the speech data and segments longer recordings, on the premise that the segments still fully contain depression-related features; a Mel spectrogram is then extracted from each segment and resized for input to the neural network model so as to facilitate training; the weights of a pre-trained AlexNet deep convolutional neural network are then fine-tuned to extract higher-level speech features from the Mel spectrogram; an attention-mechanism algorithm then re-weights the segment-level speech features to obtain sentence-level speech features; finally the sentence-level features are classified for depression with an SVM model. That patent also extracts features from speech data with a convolutional neural network, extracts a Mel spectrogram for optimization and adjustment, uses the Mel-frequency cepstral coefficients (MFCC) of the speech signal as matrix-vector features to represent the participant's voice, and then continuously updates the weights to obtain the best prediction. There are, however, many differences. First, in the preprocessing of the speech data, we delete the long silent part of each audio file and splice the remainder into a new whole; after this, a label indicating whether the participant is healthy is added to each file, label 0 for healthy persons and label 1 for depressed persons. Through supervised learning, the prediction probability of the individual file is finally output through the softmax layer, from which it is judged how likely the tested person is to suffer from depression.
3. Advantageous effects
Compared with the prior art, the invention has the beneficial effects that:
(1) compared with simple clinical examination or the SDS depression self-rating scale, the method avoids the interference of illumination, behaviour, age and similar factors with the detection; it extracts voice features based on the MFCC (Mel-frequency cepstral coefficients) and processes them with deep learning, frames and analyses a large amount of recorded data, statistically accumulates the output classifications of the BP neural network to obtain the probability that an individual suffers from depression, and evaluates the binary classification model with the AUC and the ROC (receiver operating characteristic) curve; the experimental results support its accuracy, so the method of the invention can serve as a low-cost and efficient way of detecting whether depression is present;
(2) the detection method for judging depression based on sound greatly improves the depression recognition rate, and the system implementing it can easily be built on a hospital detector or a computer, so the software and hardware cost is low; it is an accurate and effective depression detection method.
Drawings
The technical solutions of the present invention will be described in further detail below with reference to the accompanying drawings and examples, but it should be understood that these drawings are designed for illustrative purposes only and thus do not limit the scope of the present invention. Furthermore, unless otherwise indicated, the drawings are intended to be illustrative of the structural configurations described herein and are not necessarily drawn to scale.
FIG. 1 is a schematic flow chart of the detection method for distinguishing depression based on sound according to the present invention;
FIG. 2 shows a processing procedure of the detection method for determining depression based on voice according to the present invention;
FIG. 3 shows another processing procedure of the detection method for discriminating depression based on sound according to the present invention.
Detailed Description
Exemplary embodiments of the present invention are described in detail below. Although these exemplary embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, it should be understood that other embodiments may be realized and that various changes to the invention may be made without departing from the spirit and scope of the present invention. The following more detailed description of the embodiments of the invention is not intended to limit the scope of the invention, as claimed, but is presented for purposes of illustration only and not limitation to describe the features and characteristics of the invention, to set forth the best mode of carrying out the invention, and to sufficiently enable one skilled in the art to practice the invention. Accordingly, the scope of the invention is to be limited only by the following claims.
As shown in fig. 1, the detection method for discriminating depression based on sound includes the steps of:
step S101, blind source separation (BSS) analysis is carried out on the collected voice WAV files, and then the sound is digitized;
the step S101 specifically includes:
(1) sampling, quantizing and coding the recording to ensure precision;
(2) explicitly extracting the three main parameters of sound-signal digitization: the sampling frequency, the number of quantization bits and the number of channels (a short sketch of reading these parameters is given below).
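As an illustration of the three digitization parameters in step S101, the following sketch reads a WAV file with Python's standard wave module; the file name and the 16 kHz / 16-bit / mono values in the comments are assumptions, not requirements of the method.

```python
# Sketch: inspect the three main digitization parameters of a recording.
import wave

with wave.open("interview.wav", "rb") as wav:        # hypothetical file name
    sample_rate = wav.getframerate()                 # sampling frequency, e.g. 16000 Hz
    bit_depth = wav.getsampwidth() * 8               # quantization bits, e.g. 16 bit
    n_channels = wav.getnchannels()                  # channel count, e.g. 1 (mono)
    print(sample_rate, bit_depth, n_channels)
```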
Step S102, coding operation is carried out on the voice physical information, cepstrum (spectrum envelope and details) is carried out, 13-dimensional feature vectors of the MFCC are obtained for machine identification, 13-dimensional static coefficients of the original MFCC are supplemented, and the 13-dimensional static coefficients are converted into 39-dimensional MFCC used in identification, and the method comprises the following steps: inputting the static coefficient +13 first-order difference coefficient +13 second-order difference coefficient into a convolutional neural network model;
the step S012 specifically includes:
(1) MFCC feature extraction comprises two key steps: converting to Mel frequency, and performing cepstrum analysis;
the specific process of feature extraction of the MFCC is as follows:
(1) pre-emphasis: the spectrum is multiplied by a coefficient that is positively correlated with frequency, which boosts the high-frequency amplitude; in practice a high-pass filter H(z) = 1 - k·z^(-1) is used, i.e. S'(n) = S(n) - k·S(n-1);
(2) windowing: the signal is windowed with a Hamming window, S'(n) = {0.54 - 0.46·cos[2π(n-1)/(N-1)]}·S(n), where N is the window length; compared with a rectangular window this reduces the side lobes and the spectral leakage after the FFT;
(3) frequency-domain conversion: the time-domain signal is converted into the frequency domain for the subsequent frequency analysis;
(4) Mel-scale filter-bank filtering: the amplitude spectrum obtained from the FFT is multiplied with each filter and accumulated, giving one value per filter, namely the energy of the frame in that filter's frequency band; with 22 filters, 22 energy values are obtained;
(5) taking the logarithm of the energy values: human perception of sound is not linear and is better described by a logarithmic relation, and taking the log also makes the cepstrum analysis possible;
(6) discrete cosine transform: an inverse transform is applied and a low-pass filter keeps the low-frequency part, giving the final characteristic parameters;
(7) dynamic differences: so that the features better reflect temporal continuity, information from neighbouring frames is appended to the feature dimensions, typically as first-order and second-order differences, converting the 13-dimensional MFCC into the 39-dimensional MFCC input to the convolutional neural network model.
(2) The Mel-scale filter bank has high resolution in the low-frequency region, which is consistent with the auditory characteristics of the human ear and is the physical meaning of the Mel scale; conversion to Mel frequency proceeds by first applying a Fourier transform to bring the time-domain signal into the frequency domain, then dividing the frequency-domain signal with the Mel-scale filter bank, so that each frequency band finally corresponds to one numerical value;
(3) cepstrum analysis applies a Fourier transform to the time-domain signal, takes the logarithm and then applies an inverse Fourier transform; cepstra can be divided into the complex cepstrum, the real cepstrum and the power cepstrum, and the power cepstrum is preferred here (a step-by-step extraction sketch follows below).
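The following numpy sketch walks through the listed steps, pre-emphasis, Hamming windowing, FFT, Mel filter-bank energies, logarithm and DCT, for a single recording. The 16 kHz sampling rate, 20 ms frames, 512-point FFT and 22 filters are assumed values, not values fixed by the method; appending first- and second-order differences (e.g. with librosa.feature.delta) then yields the 39-dimensional vectors.

```python
# Sketch of MFCC steps (1)-(6); parameter values are illustrative assumptions.
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_13(signal, sr=16000, frame_len=0.020, frame_step=0.010,
            n_fft=512, n_filters=22, n_ceps=13, k=0.97):
    # (1) pre-emphasis: S'(n) = S(n) - k*S(n-1)
    sig = np.append(signal[0], signal[1:] - k * signal[:-1])
    # (2) framing and Hamming windowing
    flen, fstep = int(frame_len * sr), int(frame_step * sr)
    n_frames = 1 + (len(sig) - flen) // fstep
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(flen)
    # (3) FFT to the frequency domain (power spectrum)
    pspec = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # (4) Mel-scale triangular filter bank -> one energy value per filter
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, ce, hi = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, lo:ce] = (np.arange(lo, ce) - lo) / max(ce - lo, 1)
        fbank[i - 1, ce:hi] = (hi - np.arange(ce, hi)) / max(hi - ce, 1)
    energies = np.maximum(pspec @ fbank.T, 1e-10)
    # (5) log of the band energies, (6) DCT keeps the first n_ceps coefficients
    return dct(np.log(energies), type=2, axis=1, norm='ortho')[:, :n_ceps]
```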
S103, establishing a convolutional neural network model for training, and autonomously extracting selection characteristics;
the method specifically comprises the following steps:
(1) the first stage is a stage of data propagation from a low level to a high level, namely a forward propagation stage;
(2) the other stage is a stage of carrying out propagation training on the error from a high level to a bottom level when the result obtained by the current propagation is inconsistent with the expectation, namely a back propagation stage;
the method comprises the following specific steps:
a. initializing a weight value by the network;
b. the input data is transmitted forwards through a convolution layer, a down-sampling layer and a full-connection layer to obtain an output value;
c. calculating the error between the output value of the network and the target value;
d. when the error is larger than the expected value, the error is transmitted back to the network, and the errors of the full connection layer, the down sampling layer and the convolution layer are sequentially obtained;
e. when the error is equal to or less than our expected value, the training is finished;
f. update the weights according to the obtained error, and then return to step b (an illustrative training-loop sketch follows below).
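A minimal PyTorch sketch of this forward/back-propagation loop is given below. The layer sizes, learning rate and target error are illustrative assumptions, and the input is taken to be one 39-dimensional MFCC frame per sample; the patent does not prescribe these values.

```python
# Illustrative sketch of step S103: convolution, down-sampling (pooling) and
# fully connected layers, with error back-propagation when the error is too large.
import torch
import torch.nn as nn

class FrameCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),                                  # down-sampling layer
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.classifier = nn.Linear(32 * 9, n_classes)        # 39 -> 19 -> 9 after pooling

    def forward(self, x):                                     # x: (batch, 1, 39)
        return self.classifier(self.features(x).flatten(1))

model, criterion = FrameCNN(), nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)      # step a: initialize weights

def train_epoch(loader, target_error=0.05):
    for x, y in loader:                    # x: (batch, 1, 39), y: 0 = healthy, 1 = depressed
        out = model(x)                     # step b: forward propagation
        loss = criterion(out, y)           # step c: error between output and target
        if loss.item() > target_error:     # steps d/e: back-propagate while error too large
            optimizer.zero_grad()
            loss.backward()                # error flows back through FC, pooling, conv layers
            optimizer.step()               # step f: update weights, continue from step b
```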
Step S104, the BP network end receives the output characteristic vector, carries out error back-propagation training and classifies the input vector II;
the method specifically comprises the following steps:
(1) network initialization: determine the number of input-layer nodes n, hidden-layer nodes l and output-layer nodes m from the system input-output sequence (X, Y); initialize the connection weights ωij and ωjk between the neurons of the input, hidden and output layers, initialize the hidden-layer thresholds a and the output-layer thresholds b, and set the learning rate and the neuron activation function;
(2) hidden-layer output calculation: from the input vector X, the input-to-hidden connection weights ωij and the hidden-layer thresholds a, compute the hidden-layer output H: Hj = f(Σi ωij·xi - aj), j = 1, 2, …, l, where l is the number of hidden-layer nodes and f is the hidden-layer activation function;
(3) output-layer output calculation: from the hidden-layer output H, the connection weights ωjk and the thresholds b, compute the BP neural network output O: Ok = Σj Hj·ωjk - bk, k = 1, 2, …, m;
(4) error calculation: from the network prediction output O and the expected output Y, compute the network prediction error e: ek = Yk - Ok, k = 1, 2, …, m;
(5) weight update: update the network connection weights ωij and ωjk according to the prediction error e: ωij = ωij + η·Hj·(1 - Hj)·xi·Σk ωjk·ek, i = 1, 2, …, n, j = 1, 2, …, l; ωjk = ωjk + η·Hj·ek, j = 1, 2, …, l, k = 1, 2, …, m, where η is the learning rate;
(6) threshold update: update the node thresholds a and b according to the prediction error e: aj = aj + η·Hj·(1 - Hj)·Σk ωjk·ek, j = 1, 2, …, l; bk = bk + ek, k = 1, 2, …, m;
(7) judging whether the algorithm iteration is finished or not, and if not, returning to the step (2);
(8) the supervised-learning classification algorithm outputs a qualitative classification, each frame being classified as pointing to depression or not (a compact sketch of these update rules is given below).
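The update rules of step S104 can be written compactly in numpy as below; the sigmoid activation and the learning-rate value are assumptions made for this sketch.

```python
# One BP training step following the formulas of step S104 (illustrative sketch).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_step(x, y, w_ij, w_jk, a, b, eta=0.1):
    """x: (n,) input, y: (m,) expected output,
    w_ij: (n, l) input->hidden weights, w_jk: (l, m) hidden->output weights,
    a: (l,) hidden thresholds, b: (m,) output thresholds."""
    H = sigmoid(x @ w_ij - a)                  # (2) Hj = f(sum_i wij*xi - aj)
    O = H @ w_jk - b                           # (3) Ok = sum_j Hj*wjk - bk
    e = y - O                                  # (4) ek = Yk - Ok
    grad_h = H * (1 - H) * (w_jk @ e)          # shared hidden-layer error term
    w_ij += eta * np.outer(x, grad_h)          # (5) weight updates
    w_jk += eta * np.outer(H, e)
    a += eta * grad_h                          # (6) threshold updates
    b += e
    return O                                   # (8) classify by thresholding the output
```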
S105, obtaining an accumulated value by using a statistical analysis method to obtain the probability of suffering from depression of an individual;
the method specifically comprises the following steps:
(1) 10 million frames of test data are extracted, and the cumulative number of frames pointing to depression is counted;
(2) a threshold is set: if 8 million of those frames are classified as pointing to depression, the person can be said to suffer from depression with a probability of 80%; with one frame lasting 20 ms, this is equivalent to a 10-minute recording in which sound totalling 8 minutes points to depression (a short accumulation sketch follows below).
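A minimal sketch of this accumulation, assuming the per-frame classifications are already available from the classifier:

```python
# Sketch of step S105: turn per-frame classifications into an individual probability.
import numpy as np

def depression_probability(frame_labels, threshold=0.8, frame_ms=20):
    """frame_labels: 1 if the frame's classification points to depression, else 0."""
    frame_labels = np.asarray(frame_labels)
    prob = frame_labels.mean()                            # e.g. 8M of 10M frames -> 0.80
    depressed_minutes = frame_labels.sum() * frame_ms / 1000.0 / 60.0
    return prob, bool(prob >= threshold), depressed_minutes

# Example: in a 10-minute recording with 20 ms frames, frames totalling 8 minutes
# pointing to depression give prob = 0.8, which meets the 0.8 threshold.
```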
And step S106, the binary classification model is evaluated with the AUC and ROC metrics to verify its accuracy.
The method specifically comprises the following steps:
(1) based on the concepts of Positive, Negative, True and False in the confusion matrix: a predicted class of 1 is Positive and a predicted class of 0 is Negative; a correct prediction is True and a wrong prediction is False; combining these four concepts yields the confusion matrix;
(2) the True Positive Rate and False Positive Rate are calculated as TPRate = TP/(TP + FN) and FPRate = FP/(FP + TN); TPRate is the proportion predicted as 1 among all samples whose true class is 1, and FPRate is the proportion predicted as 1 among all samples whose true class is 0;
(3) when the classifier is effective, the probability of predicting 1 for a sample whose true class is 1 (TPRate) is greater than the probability of predicting 1 for a sample whose true class is 0 (FPRate), i.e. y > x on the ROC plot;
(4) in the experiments, with 0.8 used as the threshold, a series of TPRate and FPRate values is obtained, the points are plotted and the area under the curve is computed to give the AUC; the AUC value is high, so the accuracy of the sound-based depression assessment method is reliable (a short evaluation sketch follows below).
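The following scikit-learn sketch computes the confusion-matrix rates at the 0.8 threshold and the AUC over all thresholds; the choice of library is an assumption for illustration.

```python
# Sketch of step S106: confusion matrix, TPRate/FPRate and AUC.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_curve, auc

def evaluate(y_true, y_score, threshold=0.8):
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    tp_rate = tp / (tp + fn)                  # TPRate = TP / (TP + FN)
    fp_rate = fp / (fp + tn)                  # FPRate = FP / (FP + TN)
    fprs, tprs, _ = roc_curve(y_true, y_score)
    return tp_rate, fp_rate, auc(fprs, tprs)  # area under the ROC curve
```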
Example 1
As shown in FIGS. 2 and 3, the Distress Analysis Interview Corpus Wizard-of-Oz (DAIC-WOZ) dataset was used as the experimental data, and the above-described method was applied to it.
Firstly, the DAIC-WOZ samples are preprocessed to reduce noise interference with the subsequent feature extraction. Without preprocessing, the raw speech data contains intermittent silent stretches. In this method a threshold is set flexibly to decide whether the current segment is silent; segments exceeding the silence threshold are deleted, and blanks of 0.03 s are added at both ends of the audio to keep the sound stable. At the same time each file is labelled "depressed" or "healthy" to facilitate the subsequent data processing (a minimal preprocessing sketch follows below);
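A minimal preprocessing sketch using librosa is shown below; the 30 dB silence threshold is an assumed value, and 0.03 s of silence is padded at both ends as described above.

```python
# Sketch of the preprocessing step: drop long silences, pad, and attach a label.
import numpy as np
import librosa

def preprocess(wav_path, label, top_db=30, pad_s=0.03, sr=16000):
    """label: 1 = depressed, 0 = healthy; top_db is an assumed silence threshold."""
    y, sr = librosa.load(wav_path, sr=sr)
    intervals = librosa.effects.split(y, top_db=top_db)              # voiced segments
    voiced = np.concatenate([y[s:e] for s, e in intervals]) if len(intervals) else y
    pad = np.zeros(int(pad_s * sr))                                   # 0.03 s of silence
    return np.concatenate([pad, voiced, pad]), label
```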
secondly, the Mel-frequency cepstral coefficients of the speech signals are extracted: through pre-emphasis, framing and windowing, FFT, Mel filtering and the related steps applied to the speech files, the MFCC feature data capturing the unique voice attributes of the participants are finally obtained; these feature data are vital to the normal training of the network model;
and finally, the extracted MFCC features are input into the convolutional neural network model; classification prediction through the convolutional layers, fully connected layers and softmax function gives the error between the actual result and the target value; the error value is then back-propagated with the BP algorithm to update the network weights and optimize the network structure, finally giving the probability that a single file is predicted as healthy or depressed. The trained model is evaluated on the test set to obtain the proportion of correctly predicted frames within a single file, i.e. the final prediction accuracy of that file.
Overall, the total prediction accuracy was 0.86, and the average prediction accuracy for a single file was 0.84. The ROC and AUC evaluation considers both the rate at which healthy persons are predicted to be healthy and the rate at which depressed persons are predicted to be depressed. When the relevant training parameters are adjusted, the model retains high stability and prediction accuracy, which demonstrates the effectiveness of the method.

Claims (9)

1. A detection method for judging depression based on voice, characterized in that the sound elements are collected and stored in digital form, the sound file data are analysed with a blind source separation (BSS) algorithm and the speech is identified; the speech signal to be processed is analysed with MFCCs as characteristic parameters, converted to the Mel frequency scale and subjected to cepstrum analysis; several groups of training data are used to collect data from the recordings, and a convolutional neural network model is established for discrimination; the obtained test sample data are classified and analysed with a BP neural network; and an ROC (receiver operating characteristic) and AUC (area under the curve) model evaluation method based on the confusion matrix is used to verify the accuracy of the sound-based estimate of an individual's probability of suffering from depression.
2. The detection method for distinguishing depression based on sound according to claim 1, characterized by comprising the following specific steps:
step S101, carrying out blind source separation (BSS) analysis on the collected voice WAV files, and then digitizing the sound;
step S102, encoding the physical speech information and performing cepstrum analysis to obtain the 13-dimensional MFCC feature vector for machine recognition, and extending the original 13 static MFCC coefficients to the 39-dimensional MFCC used in recognition, namely 13 static coefficients + 13 first-order difference coefficients + 13 second-order difference coefficients, which are input into the convolutional neural network model;
step S103, establishing and training a convolutional neural network model that autonomously extracts and selects features;
step S104, the BP network end receiving the output feature vector, carrying out error back-propagation training and binary-classifying the input vector;
step S105, obtaining an accumulated value by statistical analysis to give the probability that the individual suffers from depression;
and step S106, evaluating the binary classification model with the AUC and ROC metrics to verify its accuracy.
3. The detection method for distinguishing depression based on sound according to claim 2, wherein the step S101 specifically includes:
(1) sampling, quantizing and coding the audio record to ensure the precision;
(2) explicitly extracting the three main parameters of sound-signal digitization: the sampling frequency, the number of quantization bits and the number of channels.
4. The detection method for distinguishing depression based on sound according to claim 2, wherein the step S102 specifically includes:
(1) MFCC feature extraction comprises two key steps: converting to Mel frequency, and performing cepstrum analysis;
(2) firstly, Fourier transform is carried out on a time domain signal to convert the time domain signal into a frequency domain, then, a filter bank with a Mel frequency scale is utilized to divide the frequency domain signal, and finally, each frequency segment corresponds to a numerical value;
(3) the cepstrum analysis is to perform Fourier transform on time domain signals, then take log, perform inverse Fourier transform, and can be divided into complex cepstrum, real cepstrum and power cepstrum, and preferentially select the power cepstrum.
5. The sound-based depression detection method according to claim 2, wherein the MFCC extraction features specifically include:
(1) pre-emphasis: multiplying the spectrum by a coefficient positively correlated with frequency, implemented with a high-pass filter H(z) = 1 - k·z^(-1), i.e. S'(n) = S(n) - k·S(n-1);
(2) windowing: windowing the signal with a Hamming window, S'(n) = {0.54 - 0.46·cos[2π(n-1)/(N-1)]}·S(n), where N is the window length;
(3) converting the frequency domain, namely converting the time domain signal into the frequency domain for subsequent frequency analysis;
(4) filtering by using a Mel scale filter bank, and respectively multiplying and accumulating the amplitude spectrum obtained by FFT with each filter to obtain a value, namely the energy value of the frame data in the corresponding frequency band of the filter;
(5) taking log of the energy value, and performing cepstrum analysis after the log is taken;
(6) discrete cosine transform, performing inverse Fourier transform, and then obtaining a final low-frequency signal through a low-pass filter to obtain a final characteristic parameter;
(7) converting the 13-dimensional MFCC into the 39-dimensional MFCC input to the convolutional neural network model by appending first-order and second-order differences.
6. The detection method for distinguishing depression based on sound according to claim 2, wherein the step S103 specifically includes:
(1) the first stage is a stage of data propagation from a low level to a high level, namely a forward propagation stage;
(2) the other stage is a stage of propagating the error from the high level back to the low level for training when the result obtained by the current propagation is inconsistent with the expectation, namely the back-propagation stage; specifically as follows:
a. initializing a weight value by the network;
b. the input data is transmitted forwards through a convolution layer, a down-sampling layer and a full-connection layer to obtain an output value;
c. calculating the error between the output value of the network and the target value;
d. when the error is larger than the expected value, the error is transmitted back to the network, and the errors of the full connection layer, the down sampling layer and the convolution layer are sequentially obtained;
e. when the error is equal to or less than our expected value, the training is finished;
f. updating the weights according to the obtained error, and then returning to step b.
7. The detection method for distinguishing depression based on sound according to claim 2, wherein the step S104 specifically includes:
(1) network initialization: determining the number of input-layer nodes n, hidden-layer nodes l and output-layer nodes m from the system input-output sequence (X, Y); initializing the connection weights ωij and ωjk between the neurons of the input, hidden and output layers, initializing the hidden-layer thresholds a and the output-layer thresholds b, and setting the learning rate and the neuron activation function;
(2) hidden-layer output calculation: from the input vector X, the input-to-hidden connection weights ωij and the hidden-layer thresholds a, calculating the hidden-layer output H: Hj = f(Σi ωij·xi - aj), j = 1, 2, …, l, where l is the number of hidden-layer nodes and f is the hidden-layer activation function;
(3) output-layer output calculation: from the hidden-layer output H, the connection weights ωjk and the thresholds b, calculating the BP neural network output O: Ok = Σj Hj·ωjk - bk, k = 1, 2, …, m;
(4) error calculation: from the network prediction output O and the expected output Y, calculating the network prediction error e: ek = Yk - Ok, k = 1, 2, …, m;
(5) weight update: updating the network connection weights ωij and ωjk according to the prediction error e: ωij = ωij + η·Hj·(1 - Hj)·xi·Σk ωjk·ek, i = 1, 2, …, n, j = 1, 2, …, l; ωjk = ωjk + η·Hj·ek, j = 1, 2, …, l, k = 1, 2, …, m, where η is the learning rate;
(6) threshold update: updating the node thresholds a and b according to the prediction error e: aj = aj + η·Hj·(1 - Hj)·Σk ωjk·ek, j = 1, 2, …, l; bk = bk + ek, k = 1, 2, …, m;
(7) judging whether the algorithm iteration is finished or not, and if not, returning to the step (2);
(8) the supervised-learning classification algorithm outputs a qualitative classification, each frame being classified as pointing to depression or not.
8. The detection method for distinguishing depression based on sound according to claim 2, wherein the step S105 specifically includes:
(1) 10 million frames of test data are extracted, and the cumulative number of frames pointing to depression is counted;
(2) setting a threshold: if 8 million of those frames are classified as pointing to depression, the person suffers from depression with a probability of 80%; with one frame lasting 20 ms, in a 10-minute recording, if sound totalling 8 minutes points to depression, the person is judged to suffer from depression.
9. The detection method for distinguishing depression based on sound according to claim 2, wherein the step S106 specifically includes:
(1) based on the concepts of Positive, Negative, True and False in the confusion matrix: a predicted class of 1 is Positive, a predicted class of 0 is Negative, a correct prediction is True and a wrong prediction is False;
(2) calculating the True Positive Rate and False Positive Rate: TPRate = TP/(TP + FN) and FPRate = FP/(FP + TN); TPRate is the proportion predicted as 1 among all samples whose true class is 1, and FPRate is the proportion predicted as 1 among all samples whose true class is 0;
(3) when the classifier is effective, the probability of predicting 1 for a sample whose true class is 1 (TPRate) is greater than the probability of predicting 1 for a sample whose true class is 0 (FPRate), i.e. y > x;
(4) through experiments, 0.8 is set as the threshold, a series of TPRate and FPRate values is obtained, the points are plotted and the area is calculated to give the AUC; the AUC value is high, so the accuracy of the sound-based depression assessment method is reliable.
CN202010817892.XA 2020-08-14 2020-08-14 Detection method for distinguishing depression based on sound Pending CN111951824A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010817892.XA CN111951824A (en) 2020-08-14 2020-08-14 Detection method for distinguishing depression based on sound

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010817892.XA CN111951824A (en) 2020-08-14 2020-08-14 Detection method for distinguishing depression based on sound

Publications (1)

Publication Number Publication Date
CN111951824A true CN111951824A (en) 2020-11-17

Family

ID=73343223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010817892.XA Pending CN111951824A (en) 2020-08-14 2020-08-14 Detection method for distinguishing depression based on sound

Country Status (1)

Country Link
CN (1) CN111951824A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112472065A (en) * 2020-11-18 2021-03-12 天机医用机器人技术(清远)有限公司 Disease detection method based on cough sound recognition and related equipment thereof
CN112908435A (en) * 2021-01-28 2021-06-04 南京脑科医院 Depression cognitive behavior training system and voice data processing method
CN112818892A (en) * 2021-02-10 2021-05-18 杭州医典智能科技有限公司 Multi-modal depression detection method and system based on time convolution neural network
CN113509183A (en) * 2021-04-21 2021-10-19 杭州聚视鼎特科技有限公司 Method for analyzing emotional anxiety, depression and tension based on AR artificial intelligence
CN113274023A (en) * 2021-06-30 2021-08-20 中国科学院自动化研究所 Multi-modal mental state assessment method based on multi-angle analysis
CN113274023B (en) * 2021-06-30 2021-12-14 中国科学院自动化研究所 Multi-modal mental state assessment method based on multi-angle analysis
CN115346561A (en) * 2022-08-15 2022-11-15 南京脑科医院 Method and system for estimating and predicting depression mood based on voice characteristics
CN115346561B (en) * 2022-08-15 2023-11-24 南京医科大学附属脑科医院 Depression emotion assessment and prediction method and system based on voice characteristics
CN116978408A (en) * 2023-04-26 2023-10-31 新疆大学 Depression detection method and system based on voice pre-training model
CN116978409A (en) * 2023-09-22 2023-10-31 苏州复变医疗科技有限公司 Depression state evaluation method, device, terminal and medium based on voice signal

Similar Documents

Publication Publication Date Title
CN111951824A (en) Detection method for distinguishing depression based on sound
Godino-Llorente et al. Automatic detection of voice impairments by means of short-term cepstral parameters and neural network based detectors
Fujimura et al. Classification of voice disorders using a one-dimensional convolutional neural network
CN111798874A (en) Voice emotion recognition method and system
US10548534B2 (en) System and method for anhedonia measurement using acoustic and contextual cues
Vrindavanam et al. Machine learning based COVID-19 cough classification models-a comparative analysis
CN112820279B (en) Parkinson detection model construction method based on voice context dynamic characteristics
Dahmani et al. Vocal folds pathologies classification using Naïve Bayes Networks
CN113012720A (en) Depression detection method by multi-voice characteristic fusion under spectral subtraction noise reduction
CN111329494A (en) Depression detection method based on voice keyword retrieval and voice emotion recognition
CN115346561B (en) Depression emotion assessment and prediction method and system based on voice characteristics
CN113674767A (en) Depression state identification method based on multi-modal fusion
WO2023139559A1 (en) Multi-modal systems and methods for voice-based mental health assessment with emotion stimulation
CN115862684A (en) Audio-based depression state auxiliary detection method for dual-mode fusion type neural network
Jiang et al. A novel infant cry recognition system using auditory model‐based robust feature and GMM‐UBM
Whitehill et al. Whosecough: In-the-wild cougher verification using multitask learning
CN112466284B (en) Mask voice identification method
CN113571095B (en) Speech emotion recognition method and system based on nested deep neural network
Sabet et al. COVID-19 detection in cough audio dataset using deep learning model
Villanueva et al. Respiratory Sound Classification Using Long-Short Term Memory
CN113571050A (en) Voice depression state identification method based on Attention and Bi-LSTM
Akshay et al. Identification of Parkinson disease patients classification using feed forward technique based on speech signals
CN114038562A (en) Psychological development assessment method, device and system and electronic equipment
Baquirin et al. Artificial neural network (ANN) in a small dataset to determine neutrality in the pronunciation of english as a foreign language in filipino call center agents: Neutrality classification of Filipino call center agent's pronunciation
Jothi et al. Speech intelligence using machine learning for aphasia individual

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 706, 7th Floor, Building 1, No. 2 Litai Road, Taiping Street, Xiangcheng District, Suzhou City, Jiangsu Province, 215100

Applicant after: Suzhou Guoling technology research Intelligent Technology Co.,Ltd.

Address before: Room 609, building C, Caohu science and Technology Park, xijiaoda, No.1, Guantang Road, Caohu street, economic and Technological Development Zone, Xiangcheng District, Suzhou City, Jiangsu Province

Applicant before: Suzhou Guoling technology research Intelligent Technology Co.,Ltd.

CB02 Change of applicant information