CN112998709A - Depression degree detection method using audio data - Google Patents

Depression degree detection method using audio data

Info

Publication number
CN112998709A
Authority
CN
China
Prior art keywords
audio data
network model
layer
degree
sample
Prior art date
Legal status
Pending
Application number
CN202110212777.4A
Other languages
Chinese (zh)
Inventor
乔亚男
杨帆
罗丹
王珊
薄钧戈
黄程
黄鑫
房琛琛
Current Assignee
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN202110212777.4A
Publication of CN112998709A
Legal status: Pending

Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00: Measuring for diagnostic purposes; identification of persons
    • A61B 5/16: Devices for psychotechnics; testing reaction times; devices for evaluating the psychological state
    • A61B 5/165: Evaluating the state of mind, e.g. depression, anxiety
    • A61B 5/48: Other medical applications
    • A61B 5/4803: Speech analysis specially adapted for diagnostic purposes
    • A61B 5/72: Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B 5/7235: Details of waveform analysis
    • A61B 5/7264: Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems

Abstract

The invention discloses a depression degree detection method using audio data, comprising the following steps: 1) acquiring a plurality of audio data samples; 2) extracting features from each audio data sample; 3) computing global features from the features extracted in step 2), then segmenting each audio data sample and computing the global features of each segment; 4) training a deep convolutional network model on the per-segment global features obtained in step 3), and then detecting the depression degree of a subject using the trained deep convolutional network model.

Description

Depression degree detection method using audio data
Technical Field
The invention relates to a depression degree detection method, in particular to a depression degree detection method using audio data.
Background
At present, clinical diagnostic standards for depression include the diagnostic criteria of the WHO's International Statistical Classification of Diseases and Related Health Problems, 10th edition (ICD-10); the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV); the Chinese Classification of Mental Disorders, 3rd edition (CCMD-3); and the diagnostic standards for syndrome differentiation and typing of mental diseases in integrated traditional Chinese and Western medicine. Most current medical diagnosis of depression is performed by professional physicians who follow these published diagnostic criteria, interview the suspected patient, and administer health questionnaires. However, this diagnostic approach is highly subjective and inflexible, and its accuracy in grading the degree of depression is low.
Disclosure of Invention
The present invention is directed to overcoming the above-mentioned disadvantages of the prior art by providing a depression degree detection method using audio data, which can detect the degree of depression more accurately.
In order to achieve the above object, the method for detecting a degree of depression using audio data according to the present invention comprises the steps of:
1) acquiring a plurality of audio data samples;
2) extracting the characteristics of each audio data sample;
3) acquiring global features according to the features extracted in the step 2), and then segmenting each audio data sample to acquire the global features of each segment of samples obtained by segmentation;
4) training the deep convolutional network model on the per-segment global features obtained in step 3), and then detecting the depression degree of the subject to be detected using the trained deep convolutional network model.
The specific operation of the step 2) is as follows:
performing feature extraction on each audio data sample using COVAREP, with each feature sampled once every 10 milliseconds, wherein the extracted features comprise F0, VUV, NAQ, QOQ, H1H2, PSP, MDQ, peakSlope, Rd, Rd_conf, MCEP_0-24, HMPDM_0-24 and HMPDD_0-12.
The specific operation of the step 3) is as follows:
averaging the 10 ms frame values of all features of each audio data sample and taking the averaged result as the global feature of that audio data sample, then dividing each audio data sample into 100 parts and obtaining the global feature of each segment in the same way.
The deep convolutional network model comprises a male deep convolutional network model and a female deep convolutional network model.
The male deep convolutional network model comprises a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a third pooling layer and a fully-connected layer.
The female deep convolutional network model comprises a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer and a fully-connected layer.
The invention has the following beneficial effects:
In operation, the method for detecting the degree of depression using audio data only needs to acquire audio data from the subject; the audio data is then fed into the trained deep convolutional network model, which judges the subject's degree of depression.
Drawings
FIG. 1 is a schematic diagram of the male deep convolutional network model;
FIG. 2 is a schematic diagram of the female deep convolutional network model.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings:
the depression degree detection method using audio data according to the present invention includes the steps of:
1) Acquiring a plurality of audio data samples: audio of a conversation with the user is recorded first; the recorded audio is then denoised and the interviewer's spectrum is removed, so that only the user's own audio remains, and the user's audio is taken as the audio data sample.
2) Extracting the characteristics of each audio data sample;
Specifically, COVAREP is used to extract features from each audio data sample, with each feature sampled once every 10 milliseconds. The extracted features comprise F0, VUV, NAQ, QOQ, H1H2, PSP, MDQ, peakSlope, Rd, Rd_conf, MCEP_0-24, HMPDM_0-24 and HMPDD_0-12, i.e. 13 feature types with 73 feature dimensions in total.
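As a sanity check on the counts above, the 13 feature types can be expanded into their per-frame dimensions. The dimension table below is an assumption based on the standard COVAREP feature layout (with Rd listed alongside Rd_conf), not stated verbatim in the patent:

```python
# Sketch: expand the COVAREP-style feature types from step 2) into their
# per-frame dimensions. The counts are assumptions based on the standard
# COVAREP feature layout, not specified by the patent itself.
FEATURE_DIMS = {
    "F0": 1, "VUV": 1, "NAQ": 1, "QOQ": 1, "H1H2": 1, "PSP": 1,
    "MDQ": 1, "peakSlope": 1, "Rd": 1, "Rd_conf": 1,
    "MCEP_0-24": 25,   # mel cepstral coefficients 0..24
    "HMPDM_0-24": 25,  # harmonic model phase distortion means 0..24
    "HMPDD_0-12": 13,  # harmonic model phase distortion deviations 0..12
}

def total_dims(dims: dict) -> int:
    """Total feature dimensions extracted per 10 ms frame."""
    return sum(dims.values())

print(len(FEATURE_DIMS), total_dims(FEATURE_DIMS))  # 13 feature types, 73 dims
```

Under these assumed per-type dimensions, the 13 types do sum to the 73 dimensions claimed in the text.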
3) Acquiring global features according to the features extracted in the step 2), and then segmenting each audio data sample to acquire the global features of each segment of samples obtained by segmentation;
Specifically, the 10 ms frame values of all features of each audio data sample are averaged, and the averaged result is taken as the global feature of that sample. Because the audio duration of each sample differs, the amount of audio data per sample also differs; to satisfy a uniform data format and to obtain more training data, each audio data sample is divided into 100 parts and the global feature of each segment is computed.
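The averaging and 100-way segmentation described above can be sketched as follows; the array shapes, frame count, and function name are illustrative assumptions, since the patent does not give an implementation:

```python
import numpy as np

def segment_global_features(frames: np.ndarray, n_segments: int = 100):
    """Sketch of step 3): `frames` is a (num_frames, 73) array of 10 ms
    feature frames for one recording (shape assumed, not specified in
    the patent). Returns the whole-sample global feature vector and one
    global feature vector per segment."""
    # Whole-sample global feature: mean of every feature over all frames.
    global_feat = frames.mean(axis=0)
    # Split along time into roughly equal segments (np.array_split
    # tolerates lengths not divisible by n_segments), then average each.
    segments = np.array_split(frames, n_segments, axis=0)
    segment_feats = np.stack([seg.mean(axis=0) for seg in segments])
    return global_feat, segment_feats

# Example: a 60-second recording -> 6000 frames at 10 ms each.
frames = np.random.rand(6000, 73)
g, s = segment_global_features(frames)
print(g.shape, s.shape)  # (73,) (100, 73)
```

This yields a uniform (100, 73) representation per sample regardless of recording length, which is what lets recordings of different durations share one network input format.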
4) Training the deep convolutional network model on the per-segment global features obtained in step 3), and then detecting the depression degree of the subject to be detected using the trained deep convolutional network model.
Because male and female voices differ greatly in pitch, timbre and other respects, applying a single strategy to both would introduce large errors. Separate networks are therefore used to model men and women, and the degree of depression is predicted separately for each; that is, the deep convolutional network model comprises a male deep convolutional network model and a female deep convolutional network model.
The male deep convolutional network model comprises a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a third pooling layer and a fully-connected layer.
The female deep convolutional network model comprises a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer and a fully-connected layer.
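The patent specifies only the layer order of the two models, not kernel sizes, channel counts, strides, or the input length. The sketch below encodes the two stacks as data and walks a 1-D feature-map length through them, assuming same-padded convolutions, stride-2 pooling, and the 100-segment input from step 3); all of those numeric choices are illustrative assumptions:

```python
# Layer order comes from the patent; everything numeric here
# (input length 100, pool factor 2, same-padded convs) is assumed.
MALE_MODEL = ["conv", "pool", "conv", "pool", "conv", "pool", "fc"]
FEMALE_MODEL = ["conv", "pool", "conv", "pool", "fc"]

def output_length(layers, length=100, pool=2):
    """Walk a 1-D feature-map length through the stack: same-padded
    convolutions keep the length, each pooling layer halves it, and the
    fully-connected layer consumes the flattened map to emit one score."""
    for layer in layers:
        if layer == "pool":
            length //= pool
    return length

print(output_length(MALE_MODEL))    # 100 -> 50 -> 25 -> 12 before the fc layer
print(output_length(FEMALE_MODEL))  # 100 -> 50 -> 25 before the fc layer
```

The extra conv/pool pair in the male model gives it one more halving of the temporal dimension; the patent does not explain the size difference beyond the male/female audio differences noted above.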

Claims (6)

1. A depression degree detection method using audio data, comprising the steps of:
1) acquiring a plurality of audio data samples;
2) extracting the characteristics of each audio data sample;
3) acquiring global features according to the features extracted in the step 2), and then segmenting each audio data sample to acquire the global features of each segment of samples obtained by segmentation;
4) training the deep convolutional network model on the per-segment global features obtained in step 3), and then detecting the depression degree of the subject to be detected using the trained deep convolutional network model.
2. The method for detecting a degree of depression using audio data according to claim 1, wherein the specific operation of step 2) is:
performing feature extraction on each audio data sample using COVAREP, with each feature sampled once every 10 milliseconds, wherein the extracted features comprise F0, VUV, NAQ, QOQ, H1H2, PSP, MDQ, peakSlope, Rd, Rd_conf, MCEP_0-24, HMPDM_0-24 and HMPDD_0-12.
3. The method of detecting a degree of depression using audio data according to claim 1, wherein the specific operation of step 3) is:
averaging the 10 ms data of all features of each audio data sample, taking the averaged result as the global feature of the audio data sample, and dividing each audio data sample into 100 parts to obtain the global feature of each segment.
4. The method of detecting a degree of depression using audio data according to claim 1, wherein the deep convolutional network model includes a deep convolutional network model for males and a deep convolutional network model for females.
5. The method of detecting depression degree using audio data according to claim 1, wherein the male deep convolutional network model includes a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a third pooling layer, and a fully-connected layer.
6. The method of detecting depression degree using audio data according to claim 1, wherein the female deep convolutional network model includes a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, and a fully-connected layer.
CN202110212777.4A 2021-02-25 2021-02-25 Depression degree detection method using audio data Pending CN112998709A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110212777.4A CN112998709A (en) 2021-02-25 2021-02-25 Depression degree detection method using audio data

Publications (1)

Publication Number Publication Date
CN112998709A 2021-06-22

Family

ID=76386008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110212777.4A (CN112998709A, Pending) 2021-02-25 2021-02-25 Depression degree detection method using audio data

Country Status (1)

Country Link
CN (1) CN112998709A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012003523A1 (en) * 2010-07-06 2012-01-12 Rmit University Emotional and/or psychiatric state detection
CN107704549A (en) * 2017-09-26 2018-02-16 百度在线网络技术(北京)有限公司 Voice search method, device and computer equipment
CN109599129A (en) * 2018-11-13 2019-04-09 杭州电子科技大学 Voice depression recognition methods based on attention mechanism and convolutional neural networks


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李金鸣 et al.: "Audio depression recognition based on deep learning" (基于深度学习的音频抑郁症识别), 《计算机应用与软件》 (Computer Applications and Software) *
码农家园: "DAIC-WOZ dataset" (DAIC-WOZ数据集), 《码农家园》 (blog) *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210622