WO2021147363A1 - Text-based major depressive disorder recognition method

Info

Publication number: WO2021147363A1
Application number: PCT/CN2020/117579
Authority: WO
Inventors: 王迎雪; 王刚; 邹博超; 王英华; 陈勤琴; 刘弋锋; 谢海永; 丰雷; 冯媛
Original assignee: 中国电子科技集团公司电子科学研究院; 首都医科大学附属北京安定医院
Priority date: 2020-01-20
Filing date: 2020-09-25
Publication date: 2021-07-29
Also published as: CN111241817A

Abstract

Disclosed is a text-based major depressive disorder recognition method capable of improving the accuracy of recognizing major depressive disorder. The text-based major depressive disorder recognition method comprises: acquiring voice information of a user under test, and converting same into text information; converting the text information into a text embedding vector; using a major depressive disorder prediction model to perform prediction on the basis of the text embedding vector, and obtaining a prediction result, the major depressive disorder prediction model being obtained by using a long short-term memory (LSTM) model to train text embedding vector samples, the text embedding vector samples including major depressive disorder text embedding vector samples and non-major depressive disorder text embedding vector samples; and determining a target keyword contained in the text information; and determining whether the user under test is affected with major depressive disorder according to a weighting result of the prediction result and a weight value corresponding thereto and the target keyword and a weight value corresponding thereto.

Description

A Text-based Recognition Method of Depression

Cross-references to related applications

This disclosure claims priority to Chinese patent application No. 2020100650965, filed with the Chinese Patent Office on January 20, 2020 and titled "A text-based method for identifying depression", the entire content of which is incorporated into this disclosure by reference.

Technical field

The present disclosure relates to the field of machine learning technology, and in particular to a text-based depression recognition method.

Background art

The global prevalence of depression (Major Depressive Disorder, MDD) is as high as 5%-12%, and 15% of patients commit suicide. The prevalence of depression in China is 6.1%. According to estimates by the Chinese Center for Disease Control and Prevention, the share of depression in the total burden of disease in China will increase to 7.3% in 2020. Depression has become a major public health problem with urgent clinical research needs.

Current depression recognition methods include methods based on video information, methods based on audio features, and others. These methods use only single-modality features, such as voice features or video features, to identify depression. The discriminative information contained in such features is insufficient, which lowers the accuracy of depression recognition.

Summary of the invention

The embodiments of the present disclosure provide a text-based depression recognition method to improve the accuracy of depression recognition.

According to an embodiment of the present disclosure, a text-based method for identifying depression is provided, and the method includes:

Obtaining voice information of the tested user and converting it into text information;

Converting the text information into a text embedding vector;

Performing prediction based on the text embedding vector by using a depression prediction model to obtain a prediction result, wherein the depression prediction model is obtained by training on text embedding vector samples using a long short-term memory (LSTM) model, and the text embedding vector samples include depression text embedding vector samples and non-depression text embedding vector samples;

Determining the target keywords contained in the text information; and

Determining whether the tested user is a user with depression according to a weighted result of the prediction result and its corresponding weight value and the target keywords and their corresponding weight values.

Optionally, the voice information may be converted into the text information either by using a mature voice-to-text conversion algorithm model or by professional manual transcription.

Optionally, the text information is composed of several sentences arranged in chronological order; and the conversion of the text information into a text embedding vector specifically includes:

Based on the Bert (Bidirectional Encoder Representations from Transformers) model, the several sentences arranged in chronological order are respectively converted into text embedding vectors, so that several text embedding vectors arranged in chronological order are obtained.

Optionally, converting the text information into a text embedding vector may adopt sentence-level 768-dimensional text embedding based on the Bert model.

Optionally, the long short-term memory (LSTM) model may be a bidirectional variable-length LSTM model.

Optionally, determining the target keyword contained in the text information specifically includes:

Searching for preset candidate keywords from the text information;

For each candidate keyword found, determining whether the sentence in which the candidate keyword is located contains a negation word; and

Determining that the candidate keywords whose sentences contain no negation word are the target keywords.

Optionally, the target keyword may include multiple categories, wherein the weight value corresponding to the target keyword of each category is different.

According to an embodiment of the present disclosure, a text-based depression recognition device is provided, and the device includes:

A text conversion unit, which is configured to obtain the voice information of the tested user and convert it into text information;

A vector conversion unit configured to convert the text information into a text embedding vector;

A prediction unit configured to perform prediction based on the text embedding vector using a depression prediction model to obtain a prediction result, wherein the depression prediction model is obtained by training on text embedding vector samples using a long short-term memory (LSTM) model, and the text embedding vector samples include depression text embedding vector samples and non-depression text embedding vector samples;

A first determining unit configured to determine the target keyword contained in the text information; and

A second determining unit configured to determine whether the tested user is a user with depression according to a weighted result of the prediction result and its corresponding weight value and the target keyword and its corresponding weight value.

Optionally, the text information is composed of several sentences arranged in chronological order; and

The vector conversion unit is specifically configured to convert, based on the Bert model, the several sentences arranged in chronological order into text embedding vectors respectively, so as to obtain several text embedding vectors arranged in chronological order.

Optionally, the first determining unit is specifically configured to: search for preset candidate keywords in the text information; for each candidate keyword found, determine whether the sentence in which the candidate keyword is located contains a negation word; and determine that the candidate keywords whose sentences contain no negation word are the target keywords.

According to an embodiment of the present disclosure, a computing device is provided. The computing device includes a memory, a processor, and a computer program that is stored in the memory and can run on the processor. When executing the computer program, the processor implements the steps of any of the above methods.

According to an embodiment of the present disclosure, a computer storage medium is provided, and a computer program is stored on the computer storage medium, and when the computer program is executed by a processor, the steps of any one of the above methods are implemented.

With the above technical solutions, the present disclosure has at least the following advantages:

The text-based depression recognition method described in the present disclosure converts speech information into text information, uses the Bert model to convert the text information into text embedding vectors, and uses the recurrent layers of an LSTM neural network to model the text embedding vectors, so as to better express the contextual relevance of the text information. By optimizing the model and increasing its complexity, deeper text features can be mined, thereby greatly improving the recognition accuracy of depression;

The method provided by the embodiments of the present disclosure can also be used in most occasions, is not limited by the place of use, is faster and more efficient, and does not need to collect patient facial videos, which helps to protect the privacy of subjects;

In addition, the method provided in the embodiments of the present disclosure provides more quantitative and objective results than the traditional method based on the PHQ-9 questionnaire, which is more subjective; and

The method provided by the embodiment of the present disclosure can be deployed on a PC (personal computer), a mobile client, etc., has the characteristics of simplicity, efficiency, and speed, and can assist the diagnosis and identification of depression.

Description of the drawings

In order to explain the specific embodiments of the present disclosure more clearly, the drawings used in the specific embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present disclosure and therefore should not be regarded as limiting its scope. Those of ordinary skill in the art can obtain other drawings based on these drawings without creative work.

FIG. 1 is a schematic block diagram of processing steps of a text-based depression recognition method according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of an implementation process of a text-based depression recognition method according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of the processing flow of sentence-level text embedding based on the Bert model according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of the frequency of occurrence of four types of keywords in depression/non-depression patients according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of the process of fusing the prediction result of the depression prediction model with the keywords determined from the text information according to an embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of a text-based depression recognition device according to an embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of a computing device according to an embodiment of the present disclosure.

Reference signs:

61-text conversion unit, 62-vector conversion unit, 63-prediction unit, 64-first determination unit, 65-second determination unit, 70-computing device, 71-processor, 72-memory, 721-random access memory (RAM), 722-cache memory, 723-read only memory (ROM), 724-program module, 725-program/utility, 73-bus, 74-external device, 75-input/output (I/O) interface, 76-network adapter.

Detailed description

In order to further illustrate the technical means and effects adopted by the present disclosure to achieve the predetermined purpose, the present disclosure will be described in detail below with reference to the accompanying drawings and preferred embodiments. It should be understood that the specific embodiments described herein are only used to explain the present disclosure, but not used to limit the present disclosure.

First, some terms involved in the embodiments of the present disclosure will be described to facilitate the understanding by those skilled in the art.

Bert (Bidirectional Encoder Representations from Transformers) model: a general-purpose pre-trained language representation model based on the bidirectional encoder of the Transformer. When processing a word, it takes into account the words both before and after that word, and thus captures the semantics of the context.

It should be noted that the terms “first” and “second” in the description and claims of the embodiments of the present disclosure and in the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific sequence or priority. It should be understood that terms used in this way can be interchanged under appropriate circumstances, so that the embodiments described herein can be implemented in a sequence other than the one illustrated or described herein.

The terms "plurality" and "several" used herein mean two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships are possible; for example, A and/or B can mean: A exists alone, A and B exist at the same time, or B exists alone. The character "/" generally indicates that the associated objects are in an "or" relationship.

Depression, also known as depressive disorder, is the main type of mood disorder, with significant and lasting low mood as its main clinical feature. At present, depression accounts for the largest burden of mental illness in mankind, and it is the main reason for people's loss of mobility. Because depression patients often experience psychiatric symptoms such as hallucinations and delusions, and even suicide attempts or suicidal behavior, depression has a great impact on patients, their families, and society. The World Health Organization reports that, as of 2017, about 300 million people in the world suffer from depression. In China, the incidence of depression is about 6.1%, and currently about 30 million people have been diagnosed with depression. Less than 10% of these 30 million depression patients receive professional assistance and treatment. At the same time, quite a few patients are not aware that they are suffering from depression. Moreover, the recognition rate of depression in many hospitals is not high, so patients are often missed. If depression is not treated in time, it may become chronic or develop into a disease that is more difficult to treat. In severe cases, it leads to disability and even suicide. Therefore, accurate recognition of depression is one of the medical problems that urgently need to be solved.

One of the current mainstream clinical self-rating scales for depression is the PHQ-9 (Depression Self-Assessment Scale) questionnaire. The questionnaire contains specific questions, such as how the respondent sleeps, whether the respondent feels depressed, and whether the respondent has ever shown suicidal behavior. Each answer is given a corresponding score, and the scores are summed to obtain the PHQ-9 total score. The PHQ-9 score serves only as an important reference indicator for judging whether depression is present; the final diagnosis still relies on long-term observation and inquiry by professional doctors. This method has two flaws. First, the PHQ-9 self-assessment form is filled in entirely by the tested user, who may be unwilling to disclose certain things; typically, many people are unwilling to share their own suicide experiences, which affects the PHQ-9 score. Second, the medical resources of professional doctors are limited, and even professional doctors inevitably make misdiagnoses and missed diagnoses of depression; different doctors may even reach different diagnoses for the same tested user. It is therefore difficult to objectively quantify the diagnosis and recognition of depression.

In order to overcome the above-mentioned drawbacks, depression detection methods based on single-modality information, such as video information or audio features, have emerged. For audio data, recognition is performed by extracting features such as the Mel spectral coefficients of the audio data, and the best accuracy reaches 74.3%. For video data, OpenFace is used to extract 68 three-dimensional facial feature points for recognition, and the accuracy reaches 73.7%. Although these methods use machine learning to identify depression automatically, they rely only on audio features or video features, the discriminative information contained therein is insufficient, and the recognition accuracy does not meet clinical requirements.

In view of this, the embodiments of the present disclosure combine traditional clinical diagnosis methods with modern machine learning methods and propose a text-based depression recognition method. In this method, a professional doctor first conducts a dialogue with the tested user, the content of which consists of specific questions designed for depression, and the acquired voice information is then converted into text information. Based on the converted text information, discrimination is performed by fusing Bert-LSTM-based text emotion recognition with keyword recognition. The Bert-LSTM-based text emotion recognition method uses Bert to convert the text information into text embedding vectors, uses an LSTM to model the text embedding vectors, and finally uses the trained Bert-LSTM model to classify the text. The goal of keyword recognition is to pick out the sensitive keywords in the dialogue and find the vocabulary that differs significantly between normal people and depression patients. The results obtained by the two methods are then fused at the decision level, with appropriate weights determined through repeated experiments and training. Compared with methods based on video and audio data, the information carried by text is more accurate and intuitive and goes straight to the point, so the recognition accuracy is relatively high. The method provided by the embodiments of the present disclosure can be deployed on a PC (personal computer), a mobile client, etc., is simple, efficient, and fast, and can assist the diagnosis and identification of depression.

Referring to FIG. 1, the text-based depression recognition method provided by the embodiments of the present disclosure consists of the following five parts: data collection and preprocessing, text conversion, text embedding modeling, keyword recognition, and decision-level fusion. In data collection and preprocessing, the voice information of the tested user is collected through a microphone, and the voice of the questions asked by the doctor or the machine is deleted to avoid excessive external voice noise. In text conversion, voice-to-text conversion technology is adopted, or professionals are asked to perform transcription, so as to avoid individual confounding factors such as accent and speech speed. In text embedding modeling, Bert sentence-level embedding is adopted, and an LSTM (long short-term memory) network is then used to model the sentences in the text information. In keyword recognition, sensitive words related to depression are screened out, these target keywords are identified in the text information according to its semantics, and a weighted score is computed for discrimination. In decision-level fusion, the discrimination results of text embedding modeling and keyword recognition are fused to give the final result.

Specifically, as shown in FIG. 2, it is a schematic diagram of the implementation process of the text-based depression recognition method provided by the embodiments of the present disclosure, and the method includes the following steps:

S21: Acquire the voice information of the tested user and convert it into text information.

In this step, a question-and-answer interview can be used. The questions concern matters related to depression, such as how the user has slept over the last three months, whether the user has lost appetite or has been overeating, whether the user feels depressed, whether the user finds it difficult to concentrate on things, and so on.

After the voice information of the tested user is collected, it can be preprocessed to delete the voices that are unrelated to the tested user's answers. The main purpose is to delete the question voice and the gaps in the dialogue, so as to ensure that the data input into the model contains only the data of the tested user.

During specific implementation, preprocessing can be performed in any of the following ways:

Method 1: Record the start and end time of the question and answer separately, and delete them according to the time interval;

Method 2: Screen and eliminate according to voice characteristics.
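Method 1 can be pictured with a short sketch. It is only an illustration under stated assumptions: the recording is represented as a plain list of samples, and the function name `remove_intervals`, the sample rate, and the question time spans are hypothetical and do not come from the disclosure.

```python
from typing import List, Tuple


def remove_intervals(samples: List[float], sample_rate: int,
                     question_spans: List[Tuple[float, float]]) -> List[float]:
    """Drop every sample whose timestamp falls inside a recorded question span."""
    kept = []
    for i, value in enumerate(samples):
        t = i / sample_rate  # timestamp of this sample, in seconds
        in_question = any(start <= t < end for start, end in question_spans)
        if not in_question:
            kept.append(value)
    return kept


# Toy recording: 10 samples at 2 Hz (5 seconds of audio).
# The question is assumed to occupy the interval from 0 s to 2 s.
recording = [0.1 * i for i in range(10)]
answer_only = remove_intervals(recording, sample_rate=2,
                               question_spans=[(0.0, 2.0)])
print(len(answer_only))  # 6 samples remain
```

The same interval list could also hold the pauses between question and answer, so that both are removed in one pass.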

In another embodiment, it is also possible to record only the voice of the user under test when recording the voice, which is the most convenient and quickest. In addition, the modal particles uttered by the tested user, or the voices such as sobbing, laughing, etc. should be retained. These often contain very critical information, and sometimes some deep features can be extracted from these semantic data.

In specific implementation, the voice information of the tested user can be obtained in one of the following ways. The first is to develop the method into a mobile phone APP (application) or computer software, so that the tested user can carry out the assessment alone or with the help of family members, which is convenient and quick. The second is for the tested user to talk directly with a doctor, online or at the hospital, while the answers are recorded. This kind of question and answer is more flexible: the doctor can make further inquiries based on the patient's own situation or answers and make a more accurate diagnosis.

After the voice information of the tested user is obtained, it is converted into text information. First, this avoids confounding factors that vary between individuals, such as accent, speech speed, and intonation. Second, if the voice information were used directly for discrimination, the discrimination would often rest only on spectral characteristics, or on amplitude and phase characteristics, and would ignore semantic characteristics, so the recognition accuracy would not be very high.

In the embodiments of the present disclosure, there are mainly two methods for converting voice information into text information. One is to use a mature voice-to-text conversion algorithm model, which saves time and effort and is convenient. The other is professional manual transcription, which is time-consuming but achieves higher conversion accuracy than the former.

S22: Convert the obtained text information into a text embedding vector.

In specific implementation, the text information converted from the voice information of the tested user is composed of several sentences arranged in chronological order. On this basis, the Bert model can be used to convert the several sentences arranged in chronological order into text embedding vectors respectively, so that several text embedding vectors arranged in chronological order are obtained.

Specifically, as shown in FIG. 3, the embodiment of the present disclosure adopts sentence-level 768-dimensional text embedding based on the Bert model. Each word in each sentence of the tested user is first converted into a 768-dimensional vector. For example, "I can't sleep" is converted into three 768-dimensional vectors, which are word-level text embeddings. Compared with the classic Word2Vec embedding, the Bert embedding is context-sensitive: the word "bank" in "I work in a bank" and in "river bank" is converted into different embedding vectors.

Optionally, the embodiment of the present disclosure may adopt a Bert pre-trained model based on Chinese text to process Chinese text information. The embodiment of the present disclosure adopts sentence-level text embedding: the three 768-dimensional vectors obtained after conversion are averaged to obtain one 768-dimensional vector, which is the sentence-level vector representation of the sentence "I can't sleep". A tested user gives many answer sentences. In specific implementation, every sentence of the tested user can be converted into such a 768-dimensional vector, so that multiple chronologically arranged 768-dimensional text embedding vectors are obtained, the number of which equals the total number of sentences answered by the tested user.
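The averaging step can be illustrated with a minimal sketch. The word vectors below are toy 4-dimensional values standing in for the 768-dimensional Bert outputs, and the function name `mean_pool` is a hypothetical name chosen for this illustration.

```python
from typing import List


def mean_pool(word_vectors: List[List[float]]) -> List[float]:
    """Average word-level vectors element-wise into one sentence-level vector."""
    if not word_vectors:
        raise ValueError("a sentence needs at least one word vector")
    dim = len(word_vectors[0])
    count = len(word_vectors)
    return [sum(vec[d] for vec in word_vectors) / count for d in range(dim)]


# Three toy vectors stand in for the three word vectors of "I can't sleep".
# Real values would come from a Bert model and have 768 dimensions each.
words = [[1.0, 0.0, 2.0, 4.0],
         [3.0, 0.0, 2.0, 1.0],
         [2.0, 3.0, 2.0, 1.0]]
sentence_vector = mean_pool(words)
print(sentence_vector)  # [2.0, 1.0, 2.0, 2.0]
```

Applying the same function to every answer sentence in order would yield the chronologically arranged sequence of sentence vectors described above.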

S23: Based on the obtained text embedding vector, use the depression prediction model to perform prediction to obtain a prediction result.

The depression prediction model is obtained by training on text embedding vector samples using a long short-term memory (LSTM) model, and the text embedding vector samples include depression text embedding vector samples and non-depression text embedding vector samples.

Through Bert modeling, a sequence of sentence vectors with temporal characteristics is obtained, each of dimension 768. The LSTM model can learn in-depth information from temporal characteristics, is suitable for processing time-series problems, and has unique advantages in solving the problem of vanishing or exploding gradients in traditional recurrent neural networks. In the embodiment of the present disclosure, the 768 dimensions are treated as the number of features, the number of time steps is the total number of sentences of the tested user, and the recurrence proceeds in the chronological order of the sentences. Since the total number of sentences differs between tested users, a bidirectional variable-length LSTM model is used here: the maximum number of time steps is the maximum number of sentences, and each tested user has a variable that records his or her effective length. When the hidden layer of the LSTM network has looped up to the effective length, it stops the loop and outputs the result to the output layer.
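The variable-length handling can be sketched without any deep learning framework. The sketch below only shows how sessions with different sentence counts might be padded to a common number of time steps while keeping each effective length; the recurrence itself is omitted. The names `pad_sessions` and `last_valid_steps` and the 2-dimensional toy vectors are assumptions made for illustration. In a framework such as PyTorch, one possible counterpart would be `torch.nn.utils.rnn.pack_padded_sequence` combined with a bidirectional `torch.nn.LSTM`, although the disclosure does not name a framework.

```python
from typing import List, Tuple

Session = List[List[float]]  # one sentence vector per time step


def pad_sessions(sessions: List[Session], dim: int) -> Tuple[List[Session], List[int]]:
    """Pad every session to the longest one and record each effective length."""
    lengths = [len(s) for s in sessions]
    max_steps = max(lengths)
    padded = [s + [[0.0] * dim for _ in range(max_steps - len(s))]
              for s in sessions]
    return padded, lengths


def last_valid_steps(padded: List[Session], lengths: List[int]) -> Session:
    """Return, per session, the vector at its final valid time step."""
    return [session[n - 1] for session, n in zip(padded, lengths)]


# Toy data: user A answered with 3 sentences, user B with 1 (2-dim vectors).
user_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
user_b = [[7.0, 8.0]]
padded, lengths = pad_sessions([user_a, user_b], dim=2)
print(lengths)                            # [3, 1]
print(last_valid_steps(padded, lengths))  # [[5.0, 6.0], [7.0, 8.0]]
```

Keeping the effective length next to the padded data lets a recurrent layer stop at the last real sentence instead of reading the zero padding.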

In an embodiment, a cross-entropy loss function may be used, with a learning rate of 0.01 and 64 neuron nodes in each layer.
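For readers unfamiliar with the cross-entropy loss, the following sketch computes it for a two-class output (non-depression = 0, depression = 1). The class numbering, the logit values, and the function names are assumptions for illustration; the disclosure does not specify the form of the output layer.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], target: int) -> float:
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[target])


# An undecided model (equal scores) versus one that favours the true class.
loss_uncertain = cross_entropy([0.0, 0.0], target=1)  # equals ln 2
loss_confident = cross_entropy([0.0, 4.0], target=1)  # much smaller
print(loss_uncertain, loss_confident)
```

The loss shrinks as the model assigns more probability to the correct class, which is what training with the stated learning rate would try to achieve.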

S24: Determine the target keyword contained in the text information.

Table 1 lists common keywords related to depression.

Table 1

FIG. 4 shows, based on the English DAIC (Distress Analysis Interview Corpus) data set, how frequently the four categories of keywords occur among depressed and non-depressed patients. The value 2.48 in the first column indicates that, among non-depressed patients, first-category keywords occur 2.48 times per 10,000 words on average. As can be seen from FIG. 4, the frequencies of the four categories of keywords differ significantly between depressed and non-depressed patients. Optionally, the target keyword includes multiple categories, and the weight value corresponding to the target keyword of each category is different.
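A frequency of the kind reported in the figure (occurrences per 10,000 words) could be computed as in the sketch below. The transcript and keyword list are toy data, the tokenisation is a simple regular expression, and the function name `hits_per_10k_words` is hypothetical; none of this is claimed to be how the figure was actually produced.

```python
import re
from typing import List


def hits_per_10k_words(text: str, keywords: List[str]) -> float:
    """Count keyword occurrences and normalise them to 10,000 words of text."""
    lowered = text.lower()
    total_words = len(re.findall(r"[a-z']+", lowered))
    if total_words == 0:
        return 0.0
    hits = sum(len(re.findall(r"\b" + re.escape(k) + r"\b", lowered))
               for k in keywords)
    return hits * 10000 / total_words


# Toy transcript of 10 words containing 2 sleep-related keywords.
transcript = "I have insomnia and nightmares and feel tired all day"
rate = hits_per_10k_words(transcript, ["insomnia", "nightmares"])
print(rate)  # 2000.0
```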

In one embodiment, keywords related to depression can be divided into four categories. The first category consists of keywords that are highly related to depression, such as "suicidal", "kill myself", "depression", and "mental illness". Suicidal tendency is a significant feature of depression, and many depression patients also know that they are suffering from depression or mental illness, so words related to suicide and depression are highly relevant keywords. The second category consists of keywords related to sleep, such as "not sleep", "difficult sleeping", "insomnia", "nightmares", and "toss and turn". Depression patients usually have symptoms such as long-term insomnia and loss of appetite, and the keywords related to insomnia are extracted here as a separate category. The third category covers the general manifestations of depression patients, mainly feeling depressed, anxious, and helpless, with keywords such as "depressed", "upset", "hopelessness", and "helpless". This category describes the typical symptoms of depression. Normal people occasionally feel low, but patients with depression remain in low spirits for a long time. According to statistics, 90% of depression patients enter a depressive state after continuous mania, and 60% of patients show manic symptoms after experiencing a continuous depressive state. Therefore, the fourth category mainly concerns irritability and loneliness. Its relevance is relatively small, because normal people also experience mania, irritability, loneliness, and similar states, although these occur more frequently in depression patients. Such keywords include "irritable", "uncontrollable", "seclusive", and "loner".

In specific implementation, these four categories of keywords can be searched for in the text information of the tested user, and an appropriate weight can be trained for each category. If a keyword appears multiple times, its weight score is counted only once and is not accumulated.

In addition, if a negation word, such as "not", "no", "without", "never", "hardly", "none", "neither", "little", or "few", is also identified in the sentence containing the keyword, the keyword is invalid. For example, in "I don't want to suicide", the keyword "suicide" is not included in the score. Finally, the total keyword score of the tested user is calculated; for example, the scores for the four categories may be 10, 5, 3, and 1.

Based on this, in the embodiments of the present disclosure, the target keywords contained in the text information can be determined according to the following process: searching for preset candidate keywords in the text information; for each candidate keyword found, judging whether the sentence in which the candidate keyword is located contains a negation word; and determining that the candidate keywords whose sentences contain no negation word are the target keywords.
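The search-then-filter process can be sketched as follows. The sentence splitting, the tokenisation, and the treatment of contractions ending in "n't" as negations are assumptions added for this illustration, and the function name `find_target_keywords` is hypothetical. Candidate keywords that themselves contain a negation word (for example "not sleep") would need special handling that this sketch does not attempt.

```python
import re
from typing import List

# Negation words taken from the examples given in the text.
NEGATIONS = {"not", "no", "without", "never", "hardly",
             "none", "neither", "little", "few"}


def find_target_keywords(text: str, candidates: List[str]) -> List[str]:
    """Keep candidate keywords found in sentences that contain no negation."""
    targets: List[str] = []
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"[a-z']+", sentence)
        # Assumption: contractions such as "don't" also count as negations.
        negated = any(tok in NEGATIONS or tok.endswith("n't") for tok in tokens)
        if negated:
            continue
        for keyword in candidates:
            found = re.search(r"\b" + re.escape(keyword) + r"\b", sentence)
            if found and keyword not in targets:
                targets.append(keyword)  # each keyword is recorded only once
    return targets


text = ("I don't want to suicide. I have insomnia almost every night. "
        "I feel helpless.")
candidates = ["suicide", "insomnia", "helpless", "irritable"]
print(find_target_keywords(text, candidates))  # ['insomnia', 'helpless']
```

Because the check is made per sentence, a negation in one answer does not invalidate keywords found in another answer.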

S25: Determine whether the tested user is a user with depression according to the prediction result and its corresponding weight value and the weighted result of the target keyword and its corresponding weight value.

In this step, the prediction result of step S23 is combined with the score of the keyword recognition. Specifically, the score of the prediction result and the score of the target keywords may be weighted by their corresponding weight values to obtain a total score.

For example, in specific implementation, the total score of the tested user can be determined according to the following formula: R*α+K*β, where R represents the score corresponding to the prediction result output by the depression prediction model, and K represents the score corresponding to the target keywords. Optionally, when multiple keywords are hit, K can be calculated as the sum of the scores corresponding to each keyword. α and β represent the weight values corresponding to R and K respectively, and α+β=1.

Optionally, R can be set according to empirical values or experimental results, which is not limited in the embodiments of the present disclosure. For example, it can be set as if the prediction result output by the depression prediction model is "yes", R=22, and if the prediction result output by the depression prediction model is "no", R=0.

During specific implementation, the scores corresponding to various types of keywords can be set according to experience values or experimental results, and the scores corresponding to each type of keywords can be different, which is not limited in the embodiments of the present disclosure. For example, according to experience values or experimental results, you can set the score corresponding to the first category of keywords to 10, determine the score corresponding to the second category of keywords to 7 points, and set the score corresponding to the third category of keywords to 5. The score for the fourth category of keywords is 3 points. The initial value of K is 0. Each time a different keyword is hit, the corresponding score will be accumulated according to the category of the hit keyword; if the same keyword is hit multiple times, the weight score is calculated only once, and the count is not accumulated. For example, when you hit the second type of keyword for the first time, add 7 to the current keyword score, so you can get the value of K. If you hit the second type of keyword again, no more points will be accumulated.
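The accumulation of K can be read in more than one way. The sketch below assumes that each category contributes its score at most once, which matches the statement that hitting the second category again adds nothing and agrees with the value K = 17 in the worked example of this section. The keyword-to-category mapping and the function name `keyword_score` are hypothetical.

```python
from typing import Dict, Iterable

# Example category scores from the text: categories 1 to 4.
CATEGORY_SCORES = {1: 10, 2: 7, 3: 5, 4: 3}

# Hypothetical mapping from keyword to category, for illustration only.
KEYWORD_CATEGORY = {"suicidal": 1, "depression": 1,
                    "insomnia": 2, "nightmares": 2,
                    "helpless": 3, "irritable": 4}


def keyword_score(hit_keywords: Iterable[str],
                  keyword_category: Dict[str, int]) -> int:
    """Sum category scores, counting each hit category only once (assumption)."""
    hit_categories = {keyword_category[k] for k in hit_keywords
                      if k in keyword_category}
    return sum(CATEGORY_SCORES[c] for c in hit_categories)


# Two first-category keywords and one repeated second-category keyword.
k = keyword_score(["suicidal", "depression", "insomnia", "insomnia"],
                  KEYWORD_CATEGORY)
print(k)  # 17
```

Under the alternative reading, in which every distinct keyword adds its category score, the set of categories would be replaced by a set of distinct keywords and the same input would give a larger value.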

For ease of understanding, take as an example the case in which the prediction result output by the depression prediction model is depression and the target keywords hit include two keywords of the first category and one keyword of the second category. In this case, R=22 and K=17, and the total score corresponding to the tested user is 22*α+17*β. Optionally, it can be determined whether the total score corresponding to the tested user is greater than a preset optimal threshold; if it is greater, the tested user is determined to be depressed, and otherwise the tested user is determined not to be depressed.

Optionally, during specific implementation, the specific values of α and β may be set according to empirical values or experimental results, which are not limited in the embodiments of the present disclosure.
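As a rough illustration of the weighted fusion R*α+K*β and the threshold comparison, the following sketch uses R=22 and K=17 from the example above. The weights α=β=0.5 and the threshold of 15 are hypothetical values chosen only for the illustration, since the disclosure leaves them to empirical or experimental tuning; the function names are likewise assumptions.

```python
def prediction_score(model_says_depressed):
    """Map the model output to R, using the example values 22 and 0."""
    return 22 if model_says_depressed else 0


def total_score(r, k, alpha, beta):
    """Compute R*alpha + K*beta, requiring alpha + beta = 1."""
    if abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("alpha and beta must sum to 1")
    return r * alpha + k * beta


def is_depressed(r, k, alpha, beta, threshold):
    """Return True when the total score is strictly greater than the threshold."""
    return total_score(r, k, alpha, beta) > threshold


# Hypothetical weights and threshold, for illustration only.
ALPHA, BETA, THRESHOLD = 0.5, 0.5, 15

r = prediction_score(True)  # model output "yes", so R = 22
k = 17                      # keyword score from the worked example
print(total_score(r, k, ALPHA, BETA))              # 19.5
print(is_depressed(r, k, ALPHA, BETA, THRESHOLD))  # True
```

With the same hypothetical settings, a "no" prediction (R=0) and K=17 gives a total of 8.5, which does not exceed the threshold.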

FIG. 5 is a schematic diagram of the flow in which the results of step S23 and step S24 are fused.

In the embodiments of the present disclosure, the voice signal is obtained through the microphone array and converted into text information, and then the text-based features are used for recognition. This method does not need to collect the patient's facial video, which helps to protect the subject's privacy. The traditional method is based on the PHQ-9 questionnaire for testing, which is highly subjective, and the results obtained by the embodiments of the present disclosure are more quantitative and objective. On the English DAIC data set, the average recognition accuracy of the present disclosure is 80%. Compared with the current machine learning methods based on video and audio, the recognition accuracy is improved.

Compared with the prior art, the present disclosure can be used in most settings, such as in the home of the user under test; it is not necessarily limited to regular hospitals and is not limited by the place of use. It is faster and more efficient, and can better protect patient privacy. In addition, the present disclosure uses a long short-term memory (LSTM) network to model the text information, which better expresses the contextual relevance of the text information, and can mine deeper text features by optimizing the model and increasing the model complexity, thereby improving the recognition rate of depression.

The embodiment of the present disclosure also provides a text-based depression recognition device. As shown in FIG. 6, the device includes:

The text conversion unit 61 is configured to obtain the voice information of the tested user and convert it into text information;

The vector conversion unit 62 is configured to convert the text information into a text embedding vector;

The prediction unit 63 is configured to perform prediction based on the text embedding vector using a depression prediction model to obtain a prediction result, where the depression prediction model is obtained by training on text embedding vector samples using a long short-term memory (LSTM) model, and the text embedding vector samples include depression text embedding vector samples and non-depression text embedding vector samples;

The first determining unit 64 is configured to determine the target keyword contained in the text information;

The second determining unit 65 is configured to determine whether the tested user is a user with depression according to a weighted result of the prediction result with its corresponding weight value and the target keyword with its corresponding weight value.
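The way units 61 to 65 hand data to one another can be sketched as a small composition of callables. This is only an illustrative skeleton: the class name, field names, and type signatures are assumptions, and the lambdas used below are trivial placeholders standing in for real speech-to-text, embedding, LSTM prediction, keyword search, and fusion components.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DepressionRecognitionDevice:
    """Hypothetical wiring of units 61-65; each field is a pluggable callable."""
    text_conversion: Callable[[bytes], List[str]]                 # unit 61
    vector_conversion: Callable[[List[str]], List[List[float]]]   # unit 62
    prediction: Callable[[List[List[float]]], bool]               # unit 63
    first_determining: Callable[[List[str]], List[str]]           # unit 64
    second_determining: Callable[[bool, List[str]], bool]         # unit 65

    def recognize(self, voice: bytes) -> bool:
        sentences = self.text_conversion(voice)        # voice -> text
        embeddings = self.vector_conversion(sentences)  # text -> vectors
        predicted = self.prediction(embeddings)         # model prediction
        keywords = self.first_determining(sentences)    # target keywords
        return self.second_determining(predicted, keywords)  # fusion


# Placeholder components (assumptions, not real models).
device = DepressionRecognitionDevice(
    text_conversion=lambda voice: voice.decode("utf-8").split(". "),
    vector_conversion=lambda sentences: [[float(len(s))] for s in sentences],
    prediction=lambda embeddings: len(embeddings) > 1,
    first_determining=lambda sentences: [s for s in sentences if "example_keyword" in s],
    second_determining=lambda predicted, keywords: predicted and bool(keywords),
)

print(device.recognize(b"a. example_keyword b"))
```

Keeping each unit behind a callable mirrors the statement that the units are divided by function and may be implemented in one or more pieces of software or hardware.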

In one embodiment, the text information is composed of several sentences arranged in chronological order;

The vector conversion unit 62 is specifically configured to respectively convert the several sentences arranged in chronological order into text embeddings based on the Bert model, to obtain several text embedding vectors arranged in chronological order.

In one embodiment, the first determining unit 64 is specifically configured to: search for preset candidate keywords in the text information; for each candidate keyword found, determine whether the sentence in which the candidate keyword is located contains a negative word; and determine a candidate keyword whose sentence does not contain a negative word to be the target keyword.
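The candidate search and negative-word check performed by the first determining unit can be sketched as follows. The keyword list, the negative-word list, and the function name are hypothetical examples for English text, not taken from the disclosure; a real system would use its own preset lists and likely a more careful tokenizer.

```python
import re

# Hypothetical preset lists, for illustration only.
CANDIDATE_KEYWORDS = ("hopeless", "worthless")
NEGATIVE_WORDS = ("not", "no", "never")


def target_keywords(sentences):
    """Return candidate keywords found in sentences that contain no negative word."""
    targets = []
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        # Treat contractions such as "don't" as negative words as well.
        has_negative = any(w in NEGATIVE_WORDS or w.endswith("n't") for w in words)
        for keyword in CANDIDATE_KEYWORDS:
            if keyword in words and not has_negative:
                targets.append(keyword)
    return targets


print(target_keywords(["I feel hopeless.", "I am not worthless."]))
```

In this sketch the check is made per sentence, as in the text: a negative word anywhere in the sentence excludes every candidate keyword in that sentence.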

In one embodiment, the target keywords include multiple categories, wherein the target keywords of each category have different weight values.

For convenience of description, the above parts are divided into modules (or units) according to their functions and described separately. Of course, when implementing the present disclosure, the functions of each module (or unit) can be implemented in one or more pieces of software or hardware.

After introducing the text-based depression recognition method and device of the exemplary embodiment of the present disclosure, next, a computing device according to another exemplary embodiment of the present disclosure is introduced.

Those skilled in the art can understand that various aspects of the present disclosure can be implemented as a system, a method, or a program product. Therefore, various aspects of the present disclosure can be specifically implemented in the following forms, namely: complete hardware implementation, complete software implementation (including firmware, microcode, etc.), or a combination of hardware and software implementations, which may be collectively referred to herein as "Circuit", "Module" or "System".

In some possible implementations, the computing device according to the present disclosure may include at least one processor and at least one memory. Optionally, the memory may store program code, and when the program code is executed by the processor, the processor is caused to execute the steps of the text-based depression recognition method described above. For example, the processor may execute the steps shown in FIG. 2: step S21, obtaining the voice information of the tested user and converting it into text information; step S22, converting the obtained text information into a text embedding vector; step S23, performing prediction based on the obtained text embedding vector using the depression prediction model to obtain a prediction result; step S24, determining the target keyword contained in the text information; and step S25, determining whether the tested user is a user with depression according to a weighted result of the prediction result with its corresponding weight value and the target keyword with its corresponding weight value.

The computing device 70 according to this embodiment of the present disclosure will be described below with reference to FIG. 7. The computing device 70 shown in FIG. 7 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.

As shown in FIG. 7, the computing device 70 is represented in the form of a general-purpose computing device. The components of the computing device 70 may include, but are not limited to: the aforementioned at least one processor 71, the aforementioned at least one memory 72, and a bus 73 connecting different system components (including the memory 72 and the processor 71).

Optionally, the bus 73 may represent one or more of several types of bus structures, including a memory bus or a memory controller, a peripheral bus, a processor, or a local bus using any bus structure among multiple bus structures.

Optionally, the memory 72 may include a readable medium in the form of a volatile memory, such as a random access memory (RAM) 721 and/or a cache memory 722, and may further include a read-only memory (ROM) 723.

Optionally, the memory 72 may also include a program/utility tool 725 having a set of (at least one) program modules 724. Such program modules 724 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.

The computing device 70 can also communicate with one or more external devices 74 (such as keyboards, pointing devices, etc.), with one or more devices that enable a user to interact with the computing device 70, and/or with any device (such as a router, modem, etc.) that enables the computing device 70 to communicate with one or more other computing devices. This communication can be performed through an input/output (I/O) interface 75. In addition, the computing device 70 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through the network adapter 76. As shown in the figure, the network adapter 76 communicates with the other modules of the computing device 70 through the bus 73. It should be understood that although not shown in the figure, other hardware and/or software modules can be used in conjunction with the computing device 70, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

In some possible implementations, various aspects of the text-based depression recognition method provided in the present disclosure can also be implemented in the form of a program product that includes program code. When the program product runs on a computer device, the program code is used to make the computer device execute the steps in the text-based depression recognition method according to the various exemplary embodiments of the present disclosure described above in this specification. For example, the computer device may execute the steps shown in FIG. 2: step S21, obtaining the voice information of the tested user and converting it into text information; step S22, converting the obtained text information into a text embedding vector; step S23, performing prediction based on the obtained text embedding vector using the depression prediction model to obtain a prediction result; step S24, determining the target keyword contained in the text information; and step S25, determining whether the tested user is a user with depression according to a weighted result of the prediction result with its corresponding weight value and the target keyword with its corresponding weight value.

Optionally, the program product may use any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the above. More specific examples (a non-exhaustive list) of readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.

The program product for text-based depression recognition of the embodiment of the present disclosure may adopt a portable compact disk read-only memory (CD-ROM) and include program code, and may run on a computing device. However, the program product of the present disclosure is not limited thereto. In this document, the readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.

The readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and readable program code is carried therein. This propagated data signal can take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing. The readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.

The program code contained on the readable medium can be transmitted by any suitable medium, including, but not limited to, wireless, wired, optical cable, RF, etc., or any suitable combination of the above.

The program code used to perform the operations of the present disclosure can be written in any combination of one or more programming languages. The programming languages include object-oriented programming languages, such as Java and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. In the case of a remote computing device, the remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (for example, via the Internet using an Internet service provider).

Through the description of the specific embodiments, it should be possible to gain a more in-depth and specific understanding of the technical means and effects adopted by the present disclosure to achieve the predetermined purpose. However, the accompanying drawings are used for reference and explanation, and are not used to limit the present disclosure.

Industrial applicability

The text-based depression recognition method provided by the embodiments of the present disclosure converts speech information into text information, then uses the Bert model to convert the text information into a text embedding vector, and at the same time uses a long short-term memory (LSTM) network to model the text embedding vector, so as to better express the contextual relevance of the text information and mine deeper text features, thereby greatly improving the recognition accuracy of depression.

Claims

A text-based method for identifying depression, characterized in that the method includes:

Obtaining the voice information of the tested user and converting it into text information;

Converting the text information into a text embedding vector;

Performing prediction based on the text embedding vector using a depression prediction model to obtain a prediction result, wherein the depression prediction model is obtained by training on text embedding vector samples using a long short-term memory (LSTM) model, and the text embedding vector samples include depression text embedding vector samples and non-depression text embedding vector samples;

Determining the target keyword contained in the text information; and

Determining whether the tested user is a user with depression according to a weighted result of the prediction result with its corresponding weight value and the target keyword with its corresponding weight value.
The method according to claim 1, wherein the method for converting the voice information into the text information comprises: converting using a mature voice-text conversion algorithm model; and performing professional manual translation.
The method according to any one of claims 1 to 2, wherein the text information is composed of several sentences arranged in chronological order; and converting the text information into a text embedding vector specifically includes:

Respectively converting the several sentences arranged in chronological order into text embeddings based on the Bert model, to obtain several text embedding vectors arranged in chronological order.
The method according to any one of claims 1 to 3, wherein the conversion of the text information into a text embedding vector adopts a sentence-level 768-dimensional text embedding based on a Bert model.
The method according to any one of claims 1 to 4, wherein the long short-term memory (LSTM) model is a bidirectional variable-length LSTM model.
The method according to any one of claims 1 to 5, wherein determining the target keyword contained in the text information specifically includes:

Searching for preset candidate keywords from the text information;

For each candidate keyword found, determining whether the sentence in which the candidate keyword is located contains a negative word; and

Determining a candidate keyword whose sentence does not contain a negative word to be the target keyword.
The method according to any one of claims 1 to 6, wherein the target keyword includes a plurality of categories, wherein the weight value corresponding to the target keyword of each category is different.
A text-based depression recognition device, characterized in that the device comprises:

A text conversion unit, which is configured to obtain the voice information of the tested user and convert it into text information;

A vector conversion unit configured to convert the text information into a text embedding vector;

A prediction unit configured to perform prediction based on the text embedding vector using a depression prediction model to obtain a prediction result, wherein the depression prediction model is obtained by training on text embedding vector samples using a long short-term memory (LSTM) model, and the text embedding vector samples include depression text embedding vector samples and non-depression text embedding vector samples;

A first determining unit configured to determine the target keyword contained in the text information; and

A second determining unit configured to determine whether the tested user is a user with depression according to a weighted result of the prediction result with its corresponding weight value and the target keyword with its corresponding weight value.
The device according to claim 8, wherein the method for converting the voice information into the text information comprises: using a mature voice-text conversion algorithm model for conversion; and performing professional manual translation.
The device according to any one of claims 8 to 9, wherein the text information is composed of several sentences arranged in chronological order; and

Wherein the vector conversion unit is specifically configured to respectively convert the several sentences arranged in chronological order into text embeddings based on the Bert model, to obtain several text embedding vectors arranged in chronological order.
The device according to any one of claims 8 to 10, wherein the conversion of the text information into a text embedding vector adopts a sentence-level 768-dimensional text embedding based on a Bert model.
The device according to any one of claims 8 to 11, wherein the long short-term memory (LSTM) model is a bidirectional variable-length LSTM model.
The device according to any one of claims 8 to 12, characterized in that:

The first determining unit is specifically configured to: search for preset candidate keywords in the text information; for each candidate keyword found, determine whether the sentence in which the candidate keyword is located contains a negative word; and determine a candidate keyword whose sentence does not contain a negative word to be the target keyword.
The device according to any one of claims 8 to 13, wherein the target keyword includes a plurality of categories, wherein each category has a different weight value corresponding to the target keyword.
A computing device, characterized in that the computing device includes: a memory, a processor, and a computer program stored on the memory and capable of running on the processor, wherein when the computer program is executed by the processor, the steps of the method as claimed in any one of claims 1 to 7 are implemented.
A computer storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 7 are realized.