WO2021147363A1 - Text-based major depressive disorder recognition method - Google Patents

Text-based major depressive disorder recognition method

Publication number
WO2021147363A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
depression
embedding vector
model
text information
Prior art date
Application number
PCT/CN2020/117579
Other languages
French (fr)
Chinese (zh)
Inventor
王迎雪
王刚
邹博超
王英华
陈勤琴
刘弋锋
谢海永
丰雷
冯媛
Original Assignee
中国电子科技集团公司电子科学研究院
首都医科大学附属北京安定医院
Priority date
Filing date
Publication date
Application filed by 中国电子科技集团公司电子科学研究院 and 首都医科大学附属北京安定医院

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • the present disclosure relates to the field of machine learning technology, and in particular to a text-based depression recognition method.
  • the global prevalence of depression (Major Depressive Disorder, MDD) is as high as 5%-12%, and 15% of patients commit suicide.
  • the prevalence rate of depression in China is 6.1%.
  • the proportion of depression in the total burden of disease in China is expected to increase to 7.3% in 2020.
  • Depression has become a major public health problem and has urgent clinical research needs.
  • the recognition methods of depression include video information-based depression recognition methods, audio features-based depression recognition methods, and other methods.
  • the above methods only use single modal features such as voice features or video features to identify depression, and the discriminative information contained therein is not enough, thereby reducing the recognition accuracy of depression recognition.
  • the embodiments of the present disclosure provide a text-based depression recognition method to improve the accuracy of depression recognition.
  • a text-based method for identifying depression includes:
  • a prediction result is obtained by using a depression prediction model
  • the depression prediction model is obtained by training on text embedding vector samples using a long short-term memory (LSTM) model
  • the text embedding vector samples include depression text embedding vector samples and non-depression text embedding vector samples
  • according to the weighted result of the prediction result with its corresponding weight value and the target keyword with its corresponding weight value, it is determined whether the tested user is a user with depression.
  • the method for converting the voice information into the text information may include: converting with a mature speech-to-text conversion algorithm model; or manual transcription by professionals.
  • the text information is composed of several sentences arranged in chronological order; and the conversion of the text information into a text embedding vector specifically includes:
  • Bert Bidirectional Encoder Representations from Transformers
  • converting the text information into a text embedding vector may adopt sentence-level 768-dimensional text embedding based on the Bert model.
  • the long-short-term memory model LSTM may be a bidirectional variable-length LSTM model.
  • determining the target keyword contained in the text information specifically includes:
  • candidate keywords that do not contain negative words in the sentence are the target keywords.
  • the target keyword may include multiple categories, wherein the weight value corresponding to the target keyword of each category is different.
  • a text-based depression recognition device includes:
  • a text conversion unit which is configured to obtain the voice information of the tested user and convert it into text information
  • a vector conversion unit configured to convert the text information into a text embedding vector
  • a prediction unit configured to perform prediction based on the text embedding vector using a depression prediction model to obtain a prediction result, where the depression prediction model is obtained by training on text embedding vector samples using a long short-term memory (LSTM) model, and the text embedding vector samples include depression text embedding vector samples and non-depression text embedding vector samples;
  • a first determining unit configured to determine the target keyword contained in the text information
  • the second determining unit is configured to determine whether the tested user is a user with depression according to the prediction result and its corresponding weight value and the weighted result of the target keyword and its corresponding weight value.
  • the text information is composed of several sentences arranged in chronological order;
  • the vector conversion unit is specifically configured to convert, based on the Bert model, the several sentences arranged in chronological order into text embedding vectors, to obtain several text embedding vectors arranged in chronological order.
  • converting the text information into a text embedding vector may adopt sentence-level 768-dimensional text embedding based on the Bert model.
  • the long-short-term memory model LSTM may be a bidirectional variable-length LSTM model.
  • the first determining unit is specifically configured to: search for preset candidate keywords in the text information; for each candidate keyword found, determine whether the sentence in which the candidate keyword is located contains negative words; and determine that candidate keywords whose sentences do not contain negative words are the target keywords.
  • the target keyword may include multiple categories, wherein the weight value corresponding to the target keyword of each category is different.
  • a computing device includes a memory, a processor, and a computer program that is stored on the memory and can run on the processor.
  • when the processor executes the computer program, the steps described in any of the above methods are implemented.
  • a computer storage medium is provided, and a computer program is stored on the computer storage medium, and when the computer program is executed by a processor, the steps of any one of the above methods are implemented.
  • the present disclosure has at least the following advantages:
  • the text-based depression recognition method described in the present disclosure converts speech information into text information, then uses the Bert model to convert the text information into text embedding vectors, and uses the recurrent layer of an LSTM neural network to model the text embedding vectors, so as to better express the contextual relevance of the text information.
  • the method provided by the embodiments of the present disclosure can also be used in most occasions, is not limited by the place of use, is faster and more efficient, and does not need to collect patient facial videos, which helps to protect the privacy of subjects;
  • the method provided in the embodiments of the present disclosure provides more quantitative and objective results than the traditional method based on the PHQ-9 questionnaire, which is more subjective;
  • the method provided by the embodiment of the present disclosure can be deployed on a PC (personal computer), a mobile client, etc., has the characteristics of simplicity, efficiency, and speed, and can assist the diagnosis and identification of depression.
  • FIG. 1 is a schematic block diagram of processing steps of a text-based depression recognition method according to an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of an implementation process of a text-based depression recognition method according to an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of the processing flow of sentence-level text embedding based on the Bert model according to an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of the frequency of occurrence of four types of keywords in depression/non-depression patients according to an embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of the process of fusing the prediction result of the depression prediction model with the keywords determined from the text information according to an embodiment of the present disclosure;
  • FIG. 6 is a schematic structural diagram of a text-based depression recognition device according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a computing device according to an embodiment of the present disclosure.
  • 61-text conversion unit, 62-vector conversion unit, 63-prediction unit, 64-first determination unit, 65-second determination unit, 70-computing device, 71-processor, 72-memory, 721-random access memory (RAM), 722-cache memory, 723-read-only memory (ROM), 724-program module, 725-program/utility, 73-bus, 74-external device, 75-input/output (I/O) interface, 76-network adapter.
  • Bert (Bidirectional Encoder Representations from Transformers) model: a general pre-trained language representation model based on bidirectional encoder representations from Transformers; that is, when processing a word, it can take into account the words before and after that word, thereby capturing the semantics of the context.
  • Depression, also known as depressive disorder, is a major type of mood disorder whose main clinical feature is significant and lasting low mood.
  • depression has become the largest burden of mental illness in centuries, and it is the main reason for people's loss of mobility. Because depression patients often experience psychiatric symptoms such as hallucinations and delusions, and even suicide attempts or behaviors, depression has already had a great impact on patients, their families, and society.
  • the World Health Organization reports that as of 2017, about 300 million people in the world suffer from depression. In China, the incidence of depression is about 6.1%, and currently about 30 million people have been diagnosed with depression. Less than 10% of these 30 million depression patients receive professional assistance and treatment. At the same time, there are still quite a few patients who are not aware that they are suffering from depression.
  • PHQ-9 Depression Self-Assessment Scale
  • the questionnaire contains some specific questions, such as how you sleep, whether you feel depressed, whether you have ever had suicidal behavior, etc. Each answer is given a corresponding score, and the scores are summed to obtain the PHQ-9 total score.
  • the PHQ-9 score is only used as an important reference indicator for judging whether a person has depression. The final diagnosis is still based on long-term observation and inquiry by professional doctors. This method has two flaws. One is that the PHQ-9 self-assessment form is completed entirely by the tested user, who may not be willing to disclose certain things; a typical example is that many people are unwilling to share their own suicide experiences.
  • depression detection methods based on video information, audio features and other monomodal information have emerged.
  • recognition is performed by extracting features such as the Mel spectrum coefficient of the audio data, and its best accuracy reaches 74.3%.
  • OpenFace extracts 68 points of 3D features on the face for recognition, and its accuracy reaches 73.7%.
  • a text-based depression recognition method is proposed. In this method, a professional doctor first conducts a dialogue with the tested user. The dialogue consists of specific questions designed for depression, and the acquired voice information is then converted into text information. Based on the converted text information, a method that fuses Bert-LSTM-based text emotion recognition with keyword recognition is used for discrimination.
  • the Bert-LSTM-based text emotion recognition method first uses Bert to convert text information into text embedding vectors, then uses LSTM to model the text embedding vectors, and finally uses the trained Bert-LSTM model to classify the text.
  • the goal of keyword recognition is to distinguish the sensitive keywords in the dialogue and find the vocabulary that is significantly different between normal people and depression patients. After that, the results obtained by the two methods are merged at the decision-making level, and appropriate weights are given through repeated experiments and training. Compared with methods based on video and audio data, this method uses text to bring information that is more accurate and intuitive, and it can cut to the point, and its recognition accuracy is relatively high.
  • the text-based depression recognition method provided by the embodiments of the present disclosure can be composed of the following five parts: data collection preprocessing, text conversion, text embedding modeling, keyword recognition, and decision-level fusion.
  • data collection preprocessing: the embodiment of the present disclosure collects the voice information of the tested user through a microphone, and deletes the voice of the questions asked by the doctor or the machine to avoid excessive external voice noise.
  • text conversion: the embodiments of the present disclosure adopt speech-to-text conversion technology, or ask professionals to perform the transcription, so as to avoid individual confounding factors such as accent and speech speed.
  • text embedding modeling: the embodiment of the present disclosure adopts Bert sentence-level embedding, and then adopts an LSTM (long short-term memory) network to model the sentences in the text information.
  • keyword recognition: the embodiments of the present disclosure screen sensitive words related to depression, then identify these target keywords from the semantics of the text information and perform weighted score discrimination.
  • decision-level fusion: the discrimination results of the two processes of text embedding modeling and keyword recognition are fused, and the final result is then given.
  • FIG. 2 is a schematic diagram of the implementation process of the text-based depression recognition method provided by the embodiments of the present disclosure, and the method includes the following steps:
  • the questions include some questions related to depression, such as how you have slept in the last three months, whether you have loss of appetite or overeating, whether you feel depressed, whether you find it difficult to concentrate on something, and so on.
  • the collected voice information of the tested user can be preprocessed, and the voices that are not related to the tested user’s answer can be deleted.
  • the main purpose is to delete the question voice and the gaps in the dialogue, so as to ensure that the data input into the model includes only the data of the tested user.
  • preprocessing can be performed in any of the following ways:
  • Method 1: record the start and end times of each question and answer separately, and delete the question segments according to the time intervals;
  • Method 2: screen and eliminate according to voice characteristics.
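Method 1 above can be illustrated with a minimal sketch. It assumes the recording is available as a flat list of samples and that the start and end times of each answer have already been logged in seconds; the function name `keep_answer_samples` and the interval representation are hypothetical and not taken from the disclosure.

```python
from typing import List, Tuple

# (start_seconds, end_seconds) of one answer by the tested user (assumed format)
Interval = Tuple[float, float]


def keep_answer_samples(samples: List[float], sample_rate: int,
                        answer_intervals: List[Interval]) -> List[float]:
    """Keep only the samples inside the logged answer intervals.

    Everything outside the intervals (question voice, gaps in the
    dialogue) is dropped. Intervals are assumed not to overlap.
    """
    kept: List[float] = []
    for start, end in sorted(answer_intervals):
        # Convert times to sample indices and clamp them to the recording.
        first = max(0, int(round(start * sample_rate)))
        last = min(len(samples), int(round(end * sample_rate)))
        if first < last:
            kept.extend(samples[first:last])
    return kept
```

With a toy recording of ten samples at one sample per second and answers logged at 2-4 s and 7-9 s, only the samples at indices 2, 3, 7 and 8 are kept.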
  • one of the following methods can be used to obtain the voice information of the tested user:
  • the first is to develop a mobile phone app (application) or computer software, so that the tested user can be diagnosed alone or with the help of family members, which is convenient and quick;
  • the second is that the tested user talks directly with a doctor, online or at a hospital, and the answers are recorded.
  • this kind of question and answer is more flexible: the doctor can make further inquiries based on the patient's own situation or answers, and make a more accurate diagnosis.
  • after the voice information of the tested user is obtained, it is converted into text information. First, this avoids confounding factors such as accent, speech speed, and intonation that differ between individuals. Second, if the voice information is used directly for discrimination, the discrimination is often based only on spectral characteristics, or on amplitude and phase characteristics, and ignores semantic characteristics, so the recognition accuracy will not be very high.
  • the text information converted from the voice information of the tested user is composed of several sentences arranged in chronological order. On this basis, the Bert model can be used to convert each of these sentences into a text embedding vector, so that several text embedding vectors arranged in chronological order are obtained.
  • the embodiment of the present disclosure adopts sentence-level 768-dimensional text embedding based on the Bert model to convert each word in each sentence of the tested user into a 768-dimensional vector.
  • for example, "i can't sleep" is converted into three 768-dimensional vectors, which are word-level text embeddings.
  • Bert embedding is context-sensitive: for example, the embedding vectors produced for "bank" in "I work in a bank" and in "riverbank" are different.
  • the embodiment of the present disclosure may adopt a Bert pre-training model based on Chinese text to process Chinese text information.
  • the embodiment of the present disclosure adopts sentence-level text embedding: the three 768-dimensional vectors obtained after conversion are averaged into one 768-dimensional vector, which is the sentence-level vector expression of the sentence "can't sleep". A tested user gives many answer sentences. In the specific implementation, every sentence of the tested user can be converted into such a 768-dimensional vector, so that multiple chronologically arranged 768-dimensional text embedding vectors are obtained, whose number equals the total number of sentences answered by the tested user.
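The averaging step described above can be sketched as follows. This is only an illustration of the mean pooling; the word-level vectors would in practice come from a Bert model, which is not included here, and the function name `sentence_embedding` is hypothetical.

```python
from typing import List


def sentence_embedding(word_vectors: List[List[float]]) -> List[float]:
    """Average word-level vectors into one sentence-level vector."""
    if not word_vectors:
        raise ValueError("a sentence needs at least one word vector")
    dim = len(word_vectors[0])
    if any(len(vector) != dim for vector in word_vectors):
        raise ValueError("all word vectors must have the same dimension")
    count = len(word_vectors)
    # Element-wise mean over the words of the sentence.
    return [sum(vector[i] for vector in word_vectors) / count
            for i in range(dim)]


# Toy example with 2 dimensions standing in for the 768 used in the text.
three_words = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = sentence_embedding(three_words)  # one vector of the same dimension
```

Applying the function to every answer sentence in order yields the chronologically arranged sequence of sentence vectors that is fed to the LSTM.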
  • the depression prediction model is obtained by training on text embedding vector samples using a long short-term memory (LSTM) model, and the text embedding vector samples include depression text embedding vector samples and non-depression text embedding vector samples.
  • the LSTM model can learn some in-depth information in the timing characteristics, which is suitable for processing timing problems, and has unique advantages in solving the problem of vanishing or exploding gradients in traditional recurrent neural networks.
  • the 768 dimensions of each vector are treated as the number of features, the number of time steps is the total number of sentences of the tested user, and the recurrence proceeds in the chronological order of the sentences. Since the total number of sentences differs between tested users, a bidirectional variable-length LSTM model is used here; that is, the maximum number of time steps is the maximum number of sentences, and each tested user has a variable that stores its effective length.
  • the loss function may be a cross-entropy loss function, the learning rate is 0.01, and the number of neuron nodes in each layer is 64.
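The variable-length handling described above (a common maximum number of time steps plus one effective length per tested user) can be sketched as a padding step. This is an assumption about how the input batch might be prepared, not the disclosure's implementation; the name `pad_sequences` is hypothetical, and the LSTM itself is left out.

```python
from typing import List, Tuple

Vector = List[float]


def pad_sequences(users: List[List[Vector]],
                  dim: int) -> Tuple[List[List[Vector]], List[int]]:
    """Pad each user's sentence vectors to the longest sequence.

    Returns the padded batch and the effective length of each user,
    so that a variable-length LSTM can ignore the zero padding.
    """
    lengths = [len(sentences) for sentences in users]
    max_steps = max(lengths, default=0)
    padded: List[List[Vector]] = []
    for sentences in users:
        # Zero vectors fill the time steps after the last real sentence.
        padding = [[0.0] * dim for _ in range(max_steps - len(sentences))]
        padded.append([list(vector) for vector in sentences] + padding)
    return padded, lengths


# Two toy users with 1 and 2 sentences (2 dimensions instead of 768).
batch, effective_lengths = pad_sequences(
    [[[1.0, 1.0]], [[2.0, 2.0], [3.0, 3.0]]], dim=2)
```

In a deep learning framework the padded batch and the effective lengths would typically be passed together to the recurrent layer, for example through a packed-sequence utility such as PyTorch's `pack_padded_sequence`.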
  • the value 2.48 in the first column indicates that, among non-depressive patients, first-category keywords occur 2.48 times per 10,000 words on average.
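The per-10,000-words rate mentioned above is a simple normalization. The sketch below shows the arithmetic only; the hit and word counts in the usage line are made-up numbers chosen so that the result matches the 2.48 quoted in the text, and they are not data from the disclosure.

```python
def per_ten_thousand(keyword_hits: int, total_words: int) -> float:
    """Occurrences of a keyword category per 10,000 words of transcript."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return keyword_hits * 10000 / total_words


# Hypothetical counts: 31 hits in 125,000 transcribed words.
rate = per_ten_thousand(31, 125000)
```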
  • the frequency of these four keywords in depressed patients and non-depressive patients is significantly different.
  • the target keyword includes multiple categories, and the weight value corresponding to the target keyword of each category is different.
  • keywords related to depression can be divided into four categories.
  • the first category is keywords that are highly related to depression, such as "suicidal", "kill myself", "depression", and "mental illness". Suicidal tendency is a significant feature of depression. In addition, many depression patients know that they are suffering from depression or mental illness. Therefore, words related to suicide and depression are highly relevant keywords.
  • the second category is keywords related to sleep, such as "not sleep", "difficult sleeping", "insomnia", "nightmares", "toss and turn". Depression patients usually have symptoms such as long-term insomnia and loss of appetite. Here the keywords related to insomnia are extracted separately as a category.
  • the third category is the general manifestations of depression patients, mainly feeling depressed, anxious, and helpless, such as "depressed", "upset", "hopelessness", "helpless", etc.
  • this category describes the typical symptoms of depression. Normal people occasionally feel low, but patients with depression remain depressed and unable to feel excited for a long time. According to statistics, 90% of depression patients enter a depressive state after continuous mania, and 60% of patients show manic symptoms after experiencing a continuous depressive state. Therefore, the fourth category mainly covers irritability and loneliness. Its relevance is relatively small, because normal people also experience mania, irritability, loneliness, and similar states, but these occur more frequently in depression patients. Such keywords include "irritable", "uncontrollable", "seclusive", "loner", etc.
  • these four types of keywords can be searched for in the text information of the tested user, and appropriate weights can be trained for each type of keyword. If a keyword appears multiple times, the weight score is calculated only once and the count is not accumulated.
  • negative words are also identified in the sentence containing the keyword, such as "not", "no", "without", "never", "hardly", "none", "neither", "little", "few", and other such words; if one is present, the keyword is invalid. For example, in "i don't want to suicide", the keyword "suicide" is not included in the score. Finally, the total keyword score of the tested user is calculated; for example, the scores for the four categories are 10, 5, 3, and 1.
  • the target keywords contained in the text information can be determined according to the following process: search for preset candidate keywords in the text information; for each candidate keyword found, judge whether the sentence in which it is located contains negative words; and determine that candidate keywords whose sentences do not contain negative words are the target keywords.
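The search-then-filter process above can be sketched in a few lines. This is a minimal illustration under several assumptions that are not in the disclosure: sentences are split on end punctuation, contractions ending in "n't" count as negative words, and the negation check ignores the keyword's own text so that a keyword such as "not sleep" does not cancel itself. The function names are hypothetical.

```python
import re
from typing import List

# Negative words listed in the text (with "little" for the garbled entry).
NEGATION_WORDS = {"not", "no", "without", "never", "hardly",
                  "none", "neither", "little", "few"}


def contains_negation(fragment: str) -> bool:
    """True if the fragment has a negative word or an "n't" contraction."""
    words = re.findall(r"[a-z']+", fragment.lower())
    return any(w in NEGATION_WORDS or w.endswith("n't") for w in words)


def find_target_keywords(text: str, candidates: List[str]) -> List[str]:
    """Return candidate keywords found in sentences without negation."""
    targets: List[str] = []
    for sentence in re.split(r"[.!?]+", text.lower()):
        for keyword in candidates:
            pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
            if not re.search(pattern, sentence):
                continue
            # Check the rest of the sentence, without the keyword itself.
            remainder = re.sub(pattern, " ", sentence)
            if not contains_negation(remainder) and keyword not in targets:
                targets.append(keyword)
    return targets


answers = "I don't want to suicide. I have insomnia every night."
found = find_target_keywords(answers, ["suicide", "insomnia"])
```

In the usage line the negated "suicide" is filtered out and only "insomnia" remains as a target keyword, matching the example given in the text.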
  • S25: determine whether the tested user is a user with depression according to the weighted result of the prediction result with its corresponding weight value and the target keyword with its corresponding weight value.
  • the prediction result of step S23 is combined with the score of the keyword recognition. Specifically, the weight value corresponding to the prediction result and the target keyword may be weighted to obtain the total score.
  • the total score of the tested user can be determined according to the following formula: R*α + K*β, where R represents the score corresponding to the prediction result output by the depression prediction model, K represents the score corresponding to the target keywords, and α and β are their respective weight values.
  • K can be calculated as the sum of the scores corresponding to each keyword.
  • the scores corresponding to various types of keywords can be set according to experience values or experimental results, and the scores corresponding to each type of keywords can be different, which is not limited in the embodiments of the present disclosure.
  • for example, the score corresponding to the first category of keywords can be set to 10 points, the score corresponding to the second category to 7 points, the score corresponding to the third category to 5 points, and the score corresponding to the fourth category to 3 points.
  • the initial value of K is 0.
  • the corresponding score will be accumulated according to the category of the hit keyword; if the same keyword is hit multiple times, the weight score is calculated only once, and the count is not accumulated. For example, when you hit the second type of keyword for the first time, add 7 to the current keyword score, so you can get the value of K. If you hit the second type of keyword again, no more points will be accumulated.
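The accumulation rule above can be sketched as follows. The sketch reads the rule as "each category contributes its score at most once", which is one interpretation of the text, and uses the example scores 10, 7, 5 and 3; the function name `keyword_score` is hypothetical.

```python
from typing import Dict, Iterable

# Example category scores from the text (categories 1 to 4).
CATEGORY_SCORES: Dict[int, int] = {1: 10, 2: 7, 3: 5, 4: 3}


def keyword_score(hit_categories: Iterable[int],
                  category_scores: Dict[int, int] = CATEGORY_SCORES) -> int:
    """Compute K: each hit category adds its score only once."""
    k = 0  # the initial value of K is 0
    seen = set()
    for category in hit_categories:
        if category in seen:
            continue  # repeated hits are not accumulated
        seen.add(category)
        k += category_scores[category]
    return k


# Hits in category 2, then category 1, then category 2 again.
k_value = keyword_score([2, 1, 2])
```

Under this reading, hits in the first and second categories give K = 10 + 7 = 17, however many times the keywords recur.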
  • for example, when R is 22 and K is 17, the total score corresponding to the tested user is: 22*α + 17*β.
  • the specific values of α and β may be set according to empirical values or experimental results, which are not limited in the embodiments of the present disclosure.
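The decision-level fusion is a weighted sum of the model score R and the keyword score K. The sketch below only evaluates that sum; the two weight values are named `alpha` and `beta` here as an assumption, the value 0.5 used for both is purely illustrative, and how the total is turned into a final decision is not specified in the text.

```python
def total_score(r: float, k: float, alpha: float, beta: float) -> float:
    """Weighted fusion of the model score R and the keyword score K."""
    return r * alpha + k * beta


# R = 22 and K = 17 as in the example above; the weights are hypothetical.
fused = total_score(22, 17, alpha=0.5, beta=0.5)
```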
  • FIG. 5 is a schematic diagram of the flow of fusing the results of step S23 and step S24.
  • the voice signal is obtained through the microphone array and converted into text information, and then the text-based features are used for recognition.
  • This method does not need to collect the patient's facial video, which helps to protect the subject's privacy.
  • the traditional method is based on the PHQ-9 questionnaire for testing, which is highly subjective, and the results obtained by the embodiments of the present disclosure are more quantitative and objective.
  • the average recognition accuracy of the present disclosure is 80%. Compared with the current machine learning methods based on video and audio, the recognition accuracy is improved.
  • the present disclosure can be used in most settings, such as the home of the tested user, and is not limited to regular hospitals or to any particular place of use. It is faster and more efficient, and better protects patient privacy.
  • the present disclosure uses the long short-term memory (LSTM) network to model the text information, which better expresses the contextual relevance of the text information, and can mine deeper text features by optimizing the model and increasing the model complexity, so as to improve the recognition rate of depression.
  • the embodiment of the present disclosure also provides a text-based depression recognition device. As shown in FIG. 6, the device includes:
  • the text conversion unit 61 is configured to obtain the voice information of the tested user and convert it into text information
  • the vector conversion unit 62 is configured to convert the text information into a text embedding vector
  • the prediction unit 63 is configured to perform prediction based on the text embedding vector using a depression prediction model to obtain a prediction result, where the depression prediction model is obtained by training on text embedding vector samples using a long short-term memory (LSTM) model, and the text embedding vector samples include depression text embedding vector samples and non-depression text embedding vector samples;
  • the first determining unit 64 is configured to determine the target keyword contained in the text information
  • the second determining unit 65 is configured to determine whether the tested user is a user with depression according to the prediction result and its corresponding weight value and the weighted result of the target keyword and its corresponding weight value.
  • the text information is composed of several sentences arranged in chronological order;
  • the vector conversion unit 62 is specifically configured to convert, based on the Bert model, the several sentences arranged in chronological order into text embedding vectors, to obtain several text embedding vectors arranged in chronological order.
  • the first determining unit 64 is specifically configured to: search for preset candidate keywords in the text information; for each candidate keyword found, determine whether the sentence in which the candidate keyword is located contains a negative word; and determine that a candidate keyword whose sentence does not contain a negative word is a target keyword.
  • the target keywords include multiple categories, wherein the target keywords of each category have different weight values.
  • the functions of each module (or unit) can be implemented in one or more pieces of software or hardware.
  • the computing device may include at least one processor and at least one memory.
  • the memory may store program code, and when the program code is executed by the processor, the processor is caused to execute the steps in the text-based depression recognition method.
  • the processor may execute as shown in FIG.
  • Step S21: obtain the voice information of the tested user and convert it into text information
  • Step S22: convert the obtained text information into a text embedding vector
  • Step S23: based on the obtained text embedding vector, perform prediction using the depression prediction model to obtain the prediction result
  • Step S24: determine the target keyword contained in the text information
  • Step S25: determine whether the tested user is a user with depression according to the weighted result of the prediction result with its corresponding weight value and the target keyword with its corresponding weight value.
  • the computing device 70 according to this embodiment of the present disclosure will be described below with reference to FIG. 7.
  • the computing device 70 shown in FIG. 7 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.
  • the computing device 70 is represented in the form of a general-purpose computing device.
  • the components of the computing device 70 may include, but are not limited to: the aforementioned at least one processor 71, the aforementioned at least one memory 72, and a bus 73 connecting different system components (including the memory 72 and the processor 71).
  • the bus 73 may represent one or more of several types of bus structures, including a memory bus or a memory controller, a peripheral bus, a processor, or a local bus using any bus structure among multiple bus structures.
  • the memory 72 may include a readable medium in the form of a volatile memory, such as a random access memory (RAM) 721 and/or a cache memory 722, and may further include a read-only memory (ROM) 723.
  • the memory 72 may also include a program/utility 725 having a set of (at least one) program modules 724.
  • the program modules 724 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
  • the computing device 70 can also communicate with one or more external devices 74 (such as keyboards, pointing devices, etc.), with one or more devices that enable a user to interact with the computing device 70, and/or with any device (such as a router, modem, etc.) that enables the computing device 70 to communicate with one or more other computing devices. This communication can be performed through an input/output (I/O) interface 75.
  • the computing device 70 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through the network adapter 76. As shown in the figure, the network adapter 76 communicates with the other modules of the computing device 70 through the bus 73.
  • other hardware and/or software modules may be used in combination with the computing device 70, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
  • various aspects of the text-based depression recognition method provided in the present disclosure can also be implemented in the form of a program product, which includes program code, and when the program product runs on a computer device,
  • the program code is used to make the computer device execute the steps in the text-based depression recognition method according to various exemplary embodiments of the present disclosure described above in this specification.
  • the computer device may execute the steps as shown in FIG.
  • step S21: obtain the voice information of the tested user and convert it into text information
  • step S22: convert the obtained text information into a text embedding vector
  • step S23: based on the obtained text embedding vector, perform prediction using the depression prediction model to obtain a prediction result
  • step S24: determine the target keywords contained in the text information
  • step S25: determine whether the tested user is a user with depression according to the weighted result of the prediction result and its corresponding weight value and the target keywords and their corresponding weight values.
  • the program product may use any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the program product for text-based depression recognition of the embodiment of the present disclosure may adopt a portable compact disk read-only memory (CD-ROM) and include program code, and may run on a computing device.
  • the program product of the present disclosure is not limited thereto.
  • the readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.
  • the readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and readable program code is carried therein. This propagated data signal can take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the readable medium can be transmitted by any suitable medium, including, but not limited to, wireless, wired, optical cable, RF, etc., or any suitable combination of the above.
  • the program code used to perform the operations of the present disclosure can be written in any combination of one or more programming languages.
  • the programming languages include object-oriented programming languages, such as Java and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
  • the remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (for example, via the Internet using an Internet service provider).
  • the text-based depression recognition method converts speech information into text information, uses the Bert model to convert the text information into text embedding vectors, and uses a long short-term memory (LSTM) network to model the text embedding vectors, so as to better express the contextual relevance of the text information and mine deeper text features, thereby greatly improving the recognition accuracy of depression;
  • the method provided by the embodiments of the present disclosure can also be used in most settings, is not limited by the place of use, is faster and more efficient, and does not need to collect facial videos of patients, which helps to protect the privacy of subjects;
  • the method provided in the embodiments of the present disclosure provides more quantitative and objective results than the traditional method based on the PHQ-9 questionnaire, which is more subjective;
  • the method provided by the embodiment of the present disclosure can be deployed on a PC (personal computer), a mobile client, etc., has the characteristics of simplicity, efficiency, and speed, and can assist the diagnosis and identification of depression.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Machine Translation (AREA)

Abstract

Disclosed is a text-based major depressive disorder recognition method capable of improving the accuracy of recognizing major depressive disorder. The text-based major depressive disorder recognition method comprises: acquiring voice information of a user under test, and converting same into text information; converting the text information into a text embedding vector; using a major depressive disorder prediction model to perform prediction on the basis of the text embedding vector, and obtaining a prediction result, the major depressive disorder prediction model being obtained by using a long short-term memory (LSTM) model to train text embedding vector samples, the text embedding vector samples including major depressive disorder text embedding vector samples and non-major depressive disorder text embedding vector samples; and determining a target keyword contained in the text information; and determining whether the user under test is affected with major depressive disorder according to a weighting result of the prediction result and a weight value corresponding thereto and the target keyword and a weight value corresponding thereto.

Description

A Text-Based Method for Recognizing Depression
Cross-Reference to Related Applications
This disclosure claims priority to Chinese patent application No. 2020100650965, titled "A Text-Based Method for Recognizing Depression" and filed with the Chinese Patent Office on January 20, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of machine learning, and in particular to a text-based depression recognition method.
Background
The global prevalence of depression (Major Depressive Disorder, MDD) is as high as 5%-12%, and 15% of patients die by suicide. The prevalence of depression in China is 6.1%. According to estimates by the Chinese Center for Disease Control and Prevention, the share of depression in China's total disease burden will rise to 7.3% in 2020. Depression has become a major public health problem with urgent clinical research needs.
Current methods for recognizing depression include methods based on video information, methods based on audio features, and others. These methods rely on the features of a single modality, such as voice features or video features, which do not carry enough discriminative information, and this lowers the accuracy of depression recognition.
Summary
The embodiments of the present disclosure provide a text-based depression recognition method to improve the accuracy of depression recognition.
According to an embodiment of the present disclosure, a text-based depression recognition method is provided, the method including:
obtaining voice information of a tested user and converting it into text information;
converting the text information into a text embedding vector;
performing prediction based on the text embedding vector using a depression prediction model to obtain a prediction result, wherein the depression prediction model is obtained by training a long short-term memory (LSTM) model on text embedding vector samples, and the text embedding vector samples include depression text embedding vector samples and non-depression text embedding vector samples;
determining target keywords contained in the text information; and
determining whether the tested user is a user with depression according to the weighted result of the prediction result and its corresponding weight value and the target keywords and their corresponding weight values.
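The weighted decision in the last step can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the source says only that the prediction result and the target keywords each carry a weight value and that the weighted result drives the decision. The function name `fuse_decision`, the category names, every numeric weight, and the 0.5 threshold are hypothetical.

```python
from typing import Dict, Iterable

# Hypothetical per-category keyword weights; the source states only that
# each keyword category has its own weight value.
KEYWORD_WEIGHTS: Dict[str, float] = {"sleep": 0.05, "mood": 0.10, "self_harm": 0.30}


def fuse_decision(
    model_prob: float,
    keyword_categories: Iterable[str],
    model_weight: float = 0.7,  # assumed weight of the model's prediction
    threshold: float = 0.5,     # assumed decision threshold
) -> bool:
    """Combine the model's probability with keyword evidence at the decision level."""
    if not 0.0 <= model_prob <= 1.0:
        raise ValueError("model_prob must lie in [0, 1]")
    # Each distinct category found in the text contributes its weight once;
    # unknown categories contribute nothing.
    keyword_score = sum(KEYWORD_WEIGHTS.get(c, 0.0) for c in set(keyword_categories))
    fused = model_weight * model_prob + keyword_score
    return fused >= threshold
```

With these assumed values, a borderline model probability of 0.5 alone does not cross the threshold, while the same probability together with a keyword from the hypothetical `self_harm` category does.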
Optionally, the method for converting the voice information into the text information may include: conversion using a mature speech-to-text algorithm model; and manual transcription by professionals.
Optionally, the text information consists of several sentences arranged in chronological order, and converting the text information into a text embedding vector specifically includes:
based on the Bert (Bidirectional Encoder Representations from Transformers) model, converting each of the chronologically ordered sentences into a text embedding, so as to obtain several text embedding vectors arranged in chronological order.
Optionally, converting the text information into a text embedding vector may use sentence-level, 768-dimensional text embedding based on the Bert model.
Optionally, the long short-term memory (LSTM) model may be a bidirectional variable-length LSTM model.
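Because each interview yields a different number of sentences, a variable-length recurrent model needs the true length of every sequence alongside the embeddings. The sketch below shows one common way to prepare such input: pad the chronologically ordered 768-dimensional sentence embeddings to a common length and keep the original sentence counts. The source does not describe batching, so the function name `pad_interviews` and the zero-padding scheme are assumptions.

```python
from typing import List, Sequence, Tuple

EMBED_DIM = 768  # sentence-level embedding size stated in the text


def pad_interviews(
    interviews: Sequence[Sequence[Sequence[float]]],
) -> Tuple[List[List[List[float]]], List[int]]:
    """Pad per-interview sentence embeddings to a common length.

    Returns the padded batch (batch x max_len x EMBED_DIM) and the true
    sentence count of each interview, so that a variable-length recurrent
    model can ignore the padding.
    """
    lengths = [len(seq) for seq in interviews]
    max_len = max(lengths, default=0)
    batch: List[List[List[float]]] = []
    for seq in interviews:
        for vec in seq:
            if len(vec) != EMBED_DIM:
                raise ValueError(f"expected {EMBED_DIM}-dimensional embeddings")
        # Keep chronological order; append zero vectors after the last sentence.
        padding = [[0.0] * EMBED_DIM for _ in range(max_len - len(seq))]
        batch.append([list(vec) for vec in seq] + padding)
    return batch, lengths
```

In a framework such as PyTorch, the padded batch and the length list would typically be passed through `torch.nn.utils.rnn.pack_padded_sequence` before a bidirectional `torch.nn.LSTM`, so that the padded positions do not influence the hidden states.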
Optionally, determining the target keywords contained in the text information specifically includes:
searching the text information for preset candidate keywords;
for each candidate keyword found, judging whether the sentence in which the candidate keyword is located contains a negation word; and
determining the candidate keywords whose sentences contain no negation word to be the target keywords.
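The three keyword steps above can be sketched as follows. This is an illustrative sketch only: the keyword and negation lists are hypothetical (the source does not enumerate them), and the whitespace-style tokenization assumes English text, whereas Chinese transcripts would first need word segmentation.

```python
import re
from typing import List

# Hypothetical lists for illustration; the source does not enumerate them.
CANDIDATE_KEYWORDS = ["insomnia", "hopeless", "suicide"]
NEGATION_WORDS = ["not", "no", "never", "don't"]


def find_target_keywords(text: str) -> List[str]:
    """Return candidate keywords whose sentence contains no negation word."""
    targets: List[str] = []
    # Split on sentence-ending punctuation (Latin and CJK).
    sentences = [s for s in re.split(r"[.!?。！？]+", text.lower()) if s.strip()]
    for sentence in sentences:
        tokens = re.findall(r"[\w']+", sentence)
        negated = any(neg in tokens for neg in NEGATION_WORDS)
        for keyword in CANDIDATE_KEYWORDS:
            # A keyword counts only if its whole sentence is free of negation.
            if keyword in tokens and not negated:
                targets.append(keyword)
    return targets
```

Checking negation at sentence level, as the text describes, is deliberately coarse: a negation anywhere in the sentence suppresses every candidate keyword in that sentence.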
Optionally, the target keywords may fall into multiple categories, wherein the target keywords of each category have a different corresponding weight value.
According to an embodiment of the present disclosure, a text-based depression recognition device is provided, the device including:
a text conversion unit configured to obtain voice information of a tested user and convert it into text information;
a vector conversion unit configured to convert the text information into a text embedding vector;
a prediction unit configured to perform prediction based on the text embedding vector using a depression prediction model to obtain a prediction result, wherein the depression prediction model is obtained by training a long short-term memory (LSTM) model on text embedding vector samples, and the text embedding vector samples include depression text embedding vector samples and non-depression text embedding vector samples;
a first determining unit configured to determine target keywords contained in the text information; and
a second determining unit configured to determine whether the tested user is a user with depression according to the weighted result of the prediction result and its corresponding weight value and the target keywords and their corresponding weight values.
Optionally, the text information consists of several sentences arranged in chronological order; and
the vector conversion unit is specifically configured to convert, based on the Bert model, each of the chronologically ordered sentences into a text embedding, so as to obtain several text embedding vectors arranged in chronological order.
Optionally, converting the text information into a text embedding vector may use sentence-level, 768-dimensional text embedding based on the Bert model.
Optionally, the long short-term memory (LSTM) model may be a bidirectional variable-length LSTM model.
Optionally, the first determining unit is specifically configured to: search the text information for preset candidate keywords; for each candidate keyword found, judge whether the sentence in which the candidate keyword is located contains a negation word; and determine the candidate keywords whose sentences contain no negation word to be the target keywords.
Optionally, the target keywords may fall into multiple categories, wherein the target keywords of each category have a different corresponding weight value.
According to an embodiment of the present disclosure, a computing device is provided. The computing device includes a memory, a processor, and a computer program that is stored in the memory and can run on the processor; when the computer program is executed by the processor, the steps of any of the above methods are implemented.
According to an embodiment of the present disclosure, a computer storage medium is provided. A computer program is stored on the computer storage medium, and when the computer program is executed by a processor, the steps of any of the above methods are implemented.
With the above technical solutions, the present disclosure has at least the following advantages:
The text-based depression recognition method described in the present disclosure converts speech information into text information, uses the Bert model to convert the text information into text embedding vectors, and uses the recurrent layers of an LSTM neural network to model the text embedding vectors. It can therefore better express the contextual relevance of the text information, and deeper text features can be mined by optimizing the model, increasing model complexity, and similar means, which greatly improves the recognition accuracy of depression;
The method provided by the embodiments of the present disclosure can also be used in most settings, is not limited by the place of use, is faster and more efficient, and does not need to collect facial videos of patients, which helps to protect the privacy of subjects;
In addition, compared with the traditional, more subjective method of testing with the PHQ-9 questionnaire, the method provided by the embodiments of the present disclosure yields more quantitative and objective results; and
The method provided by the embodiments of the present disclosure can be deployed on a PC (personal computer), a mobile client, and the like, is simple, efficient, and fast, and can assist in the diagnosis and recognition of depression.
Description of the Drawings
In order to explain the specific embodiments of the present disclosure more clearly, the drawings used in the specific embodiments are briefly introduced below. The drawings described below show only some embodiments of the present disclosure and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic block diagram of the processing steps of a text-based depression recognition method according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of the implementation of a text-based depression recognition method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of the processing flow of sentence-level text embedding based on the Bert model according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the frequency with which four categories of keywords occur among subjects with and without depression according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of the flow for fusing the prediction result of the depression prediction model with the keywords determined from the text information according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a text-based depression recognition device according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a computing device according to an embodiment of the present disclosure.
Reference signs:
61-text conversion unit, 62-vector conversion unit, 63-prediction unit, 64-first determining unit, 65-second determining unit, 70-computing device, 71-processor, 72-memory, 721-random access memory (RAM), 722-cache memory, 723-read-only memory (ROM), 724-program module, 725-program/utility, 73-bus, 74-external device, 75-input/output (I/O) interface, 76-network adapter.
Detailed Description
To further explain the technical means adopted by the present disclosure to achieve its intended purpose, and their effects, the present disclosure is described in detail below with reference to the accompanying drawings and preferred embodiments. It should be understood that the specific embodiments described here serve only to explain the present disclosure and are not intended to limit it.
First, some of the terms used in the embodiments of the present disclosure are explained to help those skilled in the art understand them.
Bert (Bidirectional Encoder Representations from Transformers) model: a general-purpose pre-trained language representation model based on bidirectional encoder representations from Transformers. When processing a word, it takes into account the words both before and after that word, and thereby captures the semantics of the context.
It should be noted that the terms "first", "second", and the like in the description and claims of the embodiments of the present disclosure and in the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described here.
"A plurality of" or "several" as used herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
Depression, also known as depressive disorder, is the main type of mood disorder, and its main clinical feature is a marked and persistent low mood. Depression has become the largest mental illness burden for humankind and a leading cause of people losing the ability to function. Because patients with depression often experience psychotic symptoms such as hallucinations and delusions, and may even attempt or commit suicide, depression has a great impact on patients, their families, and society as a whole. The World Health Organization reported that, as of 2017, about 300 million people worldwide suffered from depression. In China, the incidence of depression is about 6.1%, and about 30 million people have been diagnosed with it. Fewer than 10% of these 30 million patients receive professional help and treatment, and a considerable number of other patients do not realize that they have depression at all. In addition, many hospitals have a low recognition rate for depression, so patients are often missed. If depression is not treated in time, it can become chronic or develop into a condition that is harder to treat; in severe cases it can lead to disability or even suicide. Accurate recognition of depression is therefore a medical problem in urgent need of a solution.
One of the current mainstream clinical self-rating scales for depression is the PHQ-9 (depression self-assessment scale) questionnaire. The questionnaire contains specific questions, for example about sleep, whether the respondent feels down, and whether the respondent has ever attempted suicide. Each answer is given a corresponding score, and the scores are summed to obtain the total PHQ-9 score. The PHQ-9 score serves only as an important reference indicator for judging whether a person has depression; the final diagnosis still depends on long-term observation and questioning by a professional doctor. This approach has two drawbacks. First, the PHQ-9 self-assessment is determined entirely by the tested user, who may be unwilling to disclose certain things; typically, many people are unwilling to share their past suicide attempts, and this affects the PHQ-9 score. Second, the medical resources of professional doctors are limited, and even professional doctors inevitably misdiagnose or miss cases of depression; different doctors may even reach different diagnoses for the same tested user. The diagnosis and recognition of depression is therefore difficult to quantify objectively.
To overcome these drawbacks, depression detection methods based on single-modality information, such as video information or audio features, have emerged. For audio data, recognition is performed by extracting features such as the Mel spectrum coefficients of the audio, with a best accuracy of 74.3%. For video data, recognition is performed on the 3D features of 68 facial points extracted with OpenFace, with an accuracy of 73.7%. Although these methods recognize depression automatically through machine learning, they rely on audio features or video features alone, which do not carry enough discriminative information, and their recognition accuracy does not meet clinical requirements.
In view of this, the embodiments of the present disclosure combine traditional clinical diagnosis with modern machine learning and propose a text-based depression recognition method. In this method, a professional doctor first holds a dialogue with the tested user, in which the doctor asks specific questions designed for depression, and the acquired voice information is then converted into text information. Based on the converted text information, the judgment is made by fusing Bert-Lstm-based text emotion recognition with keyword recognition. The Bert-Lstm-based text emotion recognition method uses Bert to convert the text information into text embedding vectors, then uses an LSTM to model the text embedding vectors, and finally uses the trained Bert-Lstm model to classify the text. The goal of keyword recognition is to identify sensitive keywords in the dialogue, that is, words whose use differs significantly between healthy people and patients with depression. The results of the two methods are then fused at the decision level, with appropriate weights assigned through repeated experiments and training. Compared with methods based on video or audio data, the information carried by text is more accurate, more intuitive, and more to the point, and the recognition accuracy is also relatively high. The method provided by the embodiments of the present disclosure can be deployed on a PC (personal computer), a mobile client, and the like, is simple, efficient, and fast, and can assist in the diagnosis and recognition of depression.
Referring to FIG. 1, the text-based depression recognition method provided by the embodiments of the present disclosure consists of five parts: data collection and preprocessing, text conversion, text embedding modeling, keyword recognition, and decision-level fusion. In data collection and preprocessing, the voice of the tested user is collected through a microphone, and the questions spoken by the doctor or the machine are deleted so that extraneous speech does not contaminate the data. In text conversion, speech-to-text technology or transcription by professionals is used, which avoids individual confounding factors such as accent and speaking rate. In text embedding modeling, BERT sentence-level embeddings are used, and an LSTM (long short-term memory) network then models the sentences in the text information. In keyword recognition, sensitive words related to depression are selected in advance, these target keywords are then identified in the text information, and a weighted score is computed. In decision-level fusion, the results of text embedding modeling and keyword recognition are fused to give the final result.
Specifically, FIG. 2 is a schematic flowchart of the text-based depression recognition method provided by the embodiments of the present disclosure. The method includes the following steps:
S21: Acquire the voice information of the tested user and convert it into text information.
This step may take the form of a question-and-answer interview whose questions relate to depression, for example: how the user has slept over the last three months, whether there has been loss of appetite or overeating, whether the user feels depressed, and whether it is difficult to concentrate on a task.
After the voice information of the tested user is collected, it can be preprocessed to delete speech unrelated to the user's answers, mainly the question audio and the silent gaps in the dialogue. This ensures that the data fed into the model contains only the tested user's data.
In specific implementations, preprocessing can be performed in either of the following ways:
Method 1: Record the start and end times of each question and answer, and delete the unwanted audio according to the recorded time intervals;
Method 2: Filter out the unwanted audio according to voice characteristics.
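Method 1 can be sketched as simple interval arithmetic. The following Python sketch is illustrative only: the function name `keep_user_intervals` and the representation of segments as (start, end) pairs in seconds are assumptions, not part of the disclosure.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds); hypothetical representation


def keep_user_intervals(total: Interval, removed: List[Interval]) -> List[Interval]:
    """Return the parts of `total` that are not covered by any interval in `removed`.

    `removed` holds the recorded question segments and silent gaps to delete.
    """
    start, end = total
    kept: List[Interval] = []
    cursor = start
    for r_start, r_end in sorted(removed):
        # Clip the removed segment to the recording boundaries.
        r_start, r_end = max(r_start, start), min(r_end, end)
        if r_start >= r_end or r_end <= cursor:
            continue  # nothing left to remove in this segment
        if r_start > cursor:
            kept.append((cursor, r_start))
        cursor = max(cursor, r_end)
    if cursor < end:
        kept.append((cursor, end))
    return kept


# A 60-second recording with a question at 0-5 s and another at 20-27 s.
user_segments = keep_user_intervals((0.0, 60.0), [(0.0, 5.0), (20.0, 27.0)])
```

The retained intervals would then be used to cut the audio; the cutting itself depends on the audio library and is omitted here.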
In another embodiment, only the tested user's voice is recorded in the first place, which is the most convenient approach. In addition, interjections uttered by the tested user, as well as sounds such as sobbing or laughing, should be retained. These often carry critical information, and deep features can sometimes be extracted from them.
In specific implementations, the voice information of the tested user can be obtained in either of the following ways. The first is a mobile phone app or computer software, with which the tested user can carry out the assessment alone or with the help of family members, conveniently and quickly. The second is a direct conversation between the tested user and a doctor, online or at a hospital, in which the answers are recorded. This kind of interview is more flexible: the doctor can ask follow-up questions based on the patient's condition or answers and reach a more accurate diagnosis.
After the voice information of the tested user is obtained, it is converted into text information. First, this avoids confounding factors that vary between individuals, such as accent, speaking rate, and intonation. Second, discrimination performed directly on the voice signal typically relies only on spectral features or on amplitude and phase features and ignores semantic features, which limits recognition accuracy.
In the embodiments of the present disclosure, there are two main ways to convert voice information into text information. The first uses a mature speech-to-text model, which saves time and effort. The second is manual transcription by professionals, which is more time-consuming but more accurate than the first.
S22: Convert the obtained text information into a text embedding vector.
In specific implementations, the text information converted from the tested user's voice information consists of several sentences arranged in chronological order. Accordingly, a BERT model can be used to convert each of these sentences into a text embedding vector, yielding several text embedding vectors arranged in chronological order.
Specifically, as shown in FIG. 3, the embodiments of the present disclosure use 768-dimensional sentence-level text embeddings based on the BERT model. Each word in each sentence spoken by the tested user is first converted into a 768-dimensional vector; for example, "i can't sleep" is converted into three 768-dimensional vectors. These are word-level text embeddings. Unlike classic Word2Vec embeddings, BERT embeddings are context-dependent: the "bank" in "I work in a bank" and the "bank" in "riverbank" are converted into different embedding vectors.
Optionally, a BERT model pre-trained on Chinese text may be used to process Chinese text information. To obtain the sentence-level embedding, the three converted 768-dimensional vectors are averaged into a single 768-dimensional vector, which is the sentence-level representation of "i can't sleep". A tested user typically gives many answer sentences. Each of them can be converted into a 768-dimensional vector in this way, which yields a chronologically ordered sequence of 768-dimensional text embedding vectors whose length equals the total number of sentences the tested user answered.
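The averaging step can be illustrated with a minimal Python sketch. It assumes the word-level vectors have already been produced by a BERT model; the function name `sentence_embedding` and the toy vectors below are hypothetical and stand in for real model outputs.

```python
from typing import List

Vector = List[float]


def sentence_embedding(word_vectors: List[Vector]) -> Vector:
    """Average word-level vectors element-wise into one sentence-level vector."""
    if not word_vectors:
        raise ValueError("a sentence needs at least one word vector")
    dim = len(word_vectors[0])
    if any(len(v) != dim for v in word_vectors):
        raise ValueError("all word vectors must have the same dimension")
    n = len(word_vectors)
    return [sum(v[i] for v in word_vectors) / n for i in range(dim)]


# Toy stand-ins for the three 768-dimensional vectors of "i can't sleep".
toy_words = [[1.0] * 768, [2.0] * 768, [3.0] * 768]
toy_sentence = sentence_embedding(toy_words)  # one 768-dimensional vector
```

Applying this function to every answer sentence in order gives the chronological sequence of sentence vectors used in the next step.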
S23: Based on the obtained text embedding vector, perform prediction with a depression prediction model to obtain a prediction result.
The depression prediction model is obtained by training a long short-term memory (LSTM) model on text embedding vector samples, which include depression text embedding vector samples and non-depression text embedding vector samples.
BERT modeling yields a time-ordered sequence of sentence vectors, each of dimension 768. The LSTM model can learn deep information in sequential features, is well suited to sequence problems, and has particular advantages in addressing the vanishing or exploding gradient problem of traditional recurrent neural networks. In the embodiments of the present disclosure, the 768 dimensions are treated as the features, the number of time steps is the total number of sentences of the tested user, and the network iterates over the sentences in chronological order. Because the total number of sentences differs between tested users, a bidirectional variable-length LSTM model is used: the maximum number of time steps is the largest sentence count, and each tested user has a variable recording that user's valid length. When the hidden layer of the LSTM network has iterated up to the valid length, it stops and passes the result to the output layer.
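The variable-length bookkeeping described above can be sketched in plain Python. This is not the LSTM itself, only an illustration of padding each user's sequence to the longest one while recording the valid length; the names `pad_sequences` and `last_valid_step` are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def pad_sequences(sequences: List[List[Vector]], dim: int) -> Tuple[List[List[Vector]], List[int]]:
    """Pad every user's sentence sequence with zero vectors up to the longest sequence.

    Returns the padded sequences and each user's valid length (real sentence count).
    """
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    padded = [
        [list(v) for v in seq] + [[0.0] * dim for _ in range(max_len - len(seq))]
        for seq in sequences
    ]
    return padded, lengths


def last_valid_step(padded_seq: List[Vector], valid_length: int) -> Vector:
    """Return the vector at the last real (non-padding) time step."""
    return padded_seq[valid_length - 1]


# Two toy users with 1 and 3 sentences; 2 dimensions instead of 768 for brevity.
users = [[[1.0, 1.0]], [[2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]]
padded_users, valid_lengths = pad_sequences(users, dim=2)
```

A recurrent network that honors the valid length would read its output at the position returned by `last_valid_step` rather than at the padded end of the sequence.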
In one embodiment, the loss function is the cross-entropy loss, the learning rate is 0.01, and the number of neuron nodes is 64 throughout.
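For a two-class problem (depression versus non-depression), the cross-entropy loss for one sample can be written out as follows. This is a generic sketch of the standard binary cross-entropy, assuming the model outputs a probability of depression; the function name and the clipping constant `eps` are illustrative choices, not values from the disclosure.

```python
import math


def cross_entropy(p_depressed: float, label: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one sample.

    p_depressed: predicted probability that the user is depressed.
    label: 1 for a depression sample, 0 for a non-depression sample.
    """
    # Clip the probability so that log(0) never occurs.
    p = min(max(p_depressed, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


confident_right = cross_entropy(0.9, 1)  # small loss
confident_wrong = cross_entropy(0.1, 1)  # large loss
```

The loss is small when the predicted probability agrees with the label and grows as the prediction moves away from it.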
S24: Determine the target keywords contained in the text information.
Table 1 lists common keywords related to depression.
Table 1
Figure PCTCN2020117579-appb-000001
FIG. 4 shows, for the English DAIC (Distress Analysis Interview Corpus) data set, how often the four categories of keywords occur among depressed and non-depressed participants. For example, the value 2.48 in the first column means that non-depressed participants use, on average, 2.48 first-category keywords per 10,000 words. As FIG. 4 shows, the frequencies of the four categories differ markedly between depressed and non-depressed participants. Optionally, the target keywords include multiple categories, and the target keywords of each category have a different weight value.
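The per-10,000-word frequency is a simple normalization, sketched below in Python. The counts used in the example (31 keyword hits in 125,000 words) are made up purely to reproduce the arithmetic and are not taken from the DAIC data set.

```python
def keywords_per_10k(keyword_hits: int, total_words: int) -> float:
    """Number of keyword occurrences per 10,000 spoken words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return keyword_hits / total_words * 10_000


# Hypothetical counts: 31 first-category hits in 125,000 words.
rate = keywords_per_10k(31, 125_000)
```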
In one embodiment, the depression-related keywords are divided into four categories. The first category contains keywords highly correlated with depression, such as "suicidal", "kill myself", "depression", and "mental illness". Suicidal tendency is a prominent feature of depression, and many patients are aware that they suffer from depression or a mental illness, so words related to suicide and depression are highly relevant keywords. The second category contains sleep-related keywords, such as "not sleep", "difficult sleeping", "insomnia", "nightmares", and "toss and turn". Depression is often accompanied by symptoms such as long-term insomnia and loss of appetite, and the keywords related to insomnia are extracted as a separate category. The third category covers the usual manifestations of depression, mainly feeling depressed, anxious, or helpless, with keywords such as "depressed", "upset", "hopelessness", and "helpless". This category describes the typical symptoms of depression: healthy people occasionally feel low, whereas patients with depression remain dejected and unenthusiastic for long periods. According to statistics, 90% of patients with depression enter a depressive state after sustained mania, and 60% of patients show manic symptoms again after a sustained depressive state. The fourth category therefore mainly covers irritability and loneliness, with keywords such as "irritable", "uncontrollable", "seclusive", and "loner". Its correlation with depression is relatively weak, because healthy people also experience mania, irritability, and loneliness, only less frequently than patients with depression.
In specific implementations, these four categories of keywords are searched for in the tested user's text information, and an appropriate weight is trained for each category. If a keyword appears multiple times, its weight score is counted only once and is not accumulated.
In addition, if a negation word such as "not", "no", "without", "never", "hardly", "none", "neither", "little", or "few" is identified in the sentence containing a keyword, that keyword is invalid. For example, in "i don't want to suicide" the keyword "suicide" is not counted toward the score. The total keyword score of the tested user is then calculated; for example, the scores of the four categories may be 10, 5, 3, and 1, respectively.
Accordingly, in implementations of the present disclosure, the target keywords contained in the text information can be determined as follows: search the text information for preset candidate keywords; for each candidate keyword found, determine whether the sentence containing it also contains a negation word; and take the candidate keywords whose sentences contain no negation word as the target keywords.
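The search-then-filter procedure can be sketched in Python as follows. The keyword lists are abbreviated and hypothetical (the full lists are in Table 1), the sentence splitting and tokenization are deliberately naive, and keyword phrases that themselves contain a negation word, such as "not sleep", would need special handling that this sketch omits.

```python
import re
from typing import Dict, List, Set

NEGATIONS = {"not", "no", "without", "never", "hardly", "none", "neither", "little", "few"}

# Hypothetical, abbreviated candidate lists: category number -> keyword phrases.
CANDIDATES: Dict[int, List[str]] = {
    1: ["suicidal", "suicide", "kill myself", "depression", "mental illness"],
    2: ["insomnia", "nightmares", "toss and turn", "difficult sleeping"],
    3: ["depressed", "upset", "hopelessness", "helpless"],
    4: ["irritable", "uncontrollable", "seclusive", "loner"],
}


def tokenize(sentence: str) -> List[str]:
    """Lowercase, expand "n't" into " not", and split into word tokens."""
    text = sentence.lower().replace("\u2019", "'").replace("n't", " not")
    return re.findall(r"[a-z']+", text)


def find_target_keywords(text: str) -> Dict[int, Set[str]]:
    """Return candidate keywords, grouped by category, whose sentence has no negation word."""
    found: Dict[int, Set[str]] = {}
    for sentence in re.split(r"[.!?]+", text):
        tokens = tokenize(sentence)
        if NEGATIONS.intersection(tokens):
            continue  # a negation word invalidates the keywords in this sentence
        joined = " ".join(tokens)
        for category, phrases in CANDIDATES.items():
            for phrase in phrases:
                if re.search(r"\b" + re.escape(phrase) + r"\b", joined):
                    found.setdefault(category, set()).add(phrase)
    return found


hits = find_target_keywords(
    "I feel helpless. I don't want to suicide. I have insomnia and insomnia again."
)
```

Because hits are stored in sets, a keyword repeated several times is recorded once, matching the rule that repeated occurrences are not accumulated.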
S25: Determine whether the tested user is a user with depression according to the weighted result of the prediction result with its corresponding weight value and the target keywords with their corresponding weight value.
In this step, the prediction result of step S23 and the keyword recognition score are fused at the decision level. Specifically, the prediction result and the target keywords are weighted by their respective weight values to obtain a total score.
For example, in specific implementations, the total score of the tested user can be determined by the formula R*α + K*β, where R is the score corresponding to the prediction result output by the depression prediction model, K is the score corresponding to the target keywords, and α and β are the weight values of R and K, with α + β = 1. Optionally, when multiple keywords are hit, K is the sum of the scores corresponding to those keywords.
Optionally, R can be set according to empirical values or experimental results, which is not limited in the embodiments of the present disclosure. For example, R = 22 when the depression prediction model outputs "yes", and R = 0 when it outputs "no".
In specific implementations, the score for each category of keywords can be set according to empirical values or experimental results, and the scores of different categories may differ, which is not limited in the embodiments of the present disclosure. For example, the score may be set to 10 for the first category, 7 for the second, 5 for the third, and 3 for the fourth. K is initialized to 0. Each time a different keyword is hit, the score of its category is added; a keyword hit multiple times is scored only once and is not counted again. For example, the first time a second-category keyword is hit, 7 is added to the current keyword score; if a second-category keyword is hit again, no further points are added. This gives the value of K.
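The computation of K can be sketched in Python under one reading of the rule above: each category contributes its score at most once, which is the reading consistent with the example in which a second hit in the second category adds nothing. That interpretation, the function name `keyword_score`, and the dictionary layout are assumptions; the scores 10, 7, 5, and 3 are the example values from the text.

```python
from typing import Dict, Iterable

# Example category scores from the description; other values may be chosen.
CATEGORY_SCORES: Dict[int, int] = {1: 10, 2: 7, 3: 5, 4: 3}


def keyword_score(hit_categories: Iterable[int]) -> int:
    """Compute K, counting each hit category once (assumed interpretation)."""
    k = 0  # K is initialized to 0
    for category in set(hit_categories):
        k += CATEGORY_SCORES[category]
    return k


# Two first-category hits and one second-category hit.
k_example = keyword_score([1, 1, 2])
```

If a per-keyword rather than per-category count were intended, the set would be built over keywords instead of categories.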
For ease of understanding, suppose the depression prediction model predicts depression and the target keywords hit two first-category keywords and one second-category keyword. Then R = 22 and K = 10 + 7 = 17, since each hit category contributes its score once, and the total score of the tested user is 22*α + 17*β. Optionally, the total score is compared with a preset optimal threshold: if it is greater than the threshold, the tested user is determined to have depression; otherwise, the tested user is determined not to have depression.
Optionally, the specific values of α and β can be set according to empirical values or experimental results, which is not limited in the embodiments of the present disclosure.
FIG. 5 is a schematic flowchart of the fusion of step S23 and step S24.
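The decision-level fusion can be sketched in Python as follows. The formula R*α + K*β with α + β = 1 and the example value R = 22 come from the description; the values α = 0.5 and threshold = 15 are placeholders chosen only to make the example concrete, and the function names are hypothetical.

```python
def total_score(r: float, k: float, alpha: float) -> float:
    """Fuse the model score R and the keyword score K as R*alpha + K*beta, beta = 1 - alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    return r * alpha + k * beta


def is_depressed(model_says_depressed: bool, k: float, alpha: float,
                 threshold: float, r_positive: float = 22.0) -> bool:
    """Return True when the fused score is strictly greater than the threshold."""
    r = r_positive if model_says_depressed else 0.0
    return total_score(r, k, alpha) > threshold


# R = 22, K = 17, with placeholder alpha = 0.5 and threshold = 15.
score = total_score(22.0, 17.0, alpha=0.5)
decision = is_depressed(True, 17.0, alpha=0.5, threshold=15.0)
```

In practice α, β, and the threshold would be tuned on labeled data, as the description leaves them to empirical values or experimental results.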
In the embodiments of the present disclosure, a voice signal is obtained through a microphone array and converted into text information, and text-based features are then used for recognition. The method does not require video of the patient's face, which helps protect the subject's privacy. The traditional approach relies on the PHQ-9 questionnaire, which is highly subjective, whereas the results obtained by the embodiments of the present disclosure are more quantitative and objective. On the English DAIC data set, the average recognition accuracy of the present disclosure is 80%, an improvement over current machine learning methods based on video and audio.
Compared with the prior art, the present disclosure can be used in most settings, such as the tested user's home, and is not restricted to a formal hospital or to any particular place of use. It is faster and more efficient, and it better protects the patient's privacy. In addition, the present disclosure uses a long short-term memory (LSTM) network to model the text information, which better captures the contextual dependencies of the text. Deeper text features can be mined by optimizing the model or increasing model complexity, thereby substantially improving the recognition rate of depression.
The embodiments of the present disclosure also provide a text-based depression recognition device. As shown in FIG. 6, the device includes:
a text conversion unit 61, configured to obtain the voice information of the tested user and convert it into text information;
a vector conversion unit 62, configured to convert the text information into a text embedding vector;
a prediction unit 63, configured to perform prediction based on the text embedding vector using a depression prediction model to obtain a prediction result, where the depression prediction model is obtained by training a long short-term memory (LSTM) model on text embedding vector samples, and the text embedding vector samples include depression text embedding vector samples and non-depression text embedding vector samples;
a first determining unit 64, configured to determine the target keywords contained in the text information;
a second determining unit 65, configured to determine whether the tested user is a user with depression according to the weighted result of the prediction result with its corresponding weight value and the target keywords with their corresponding weight value.
In one embodiment, the text information consists of several sentences arranged in chronological order;
the vector conversion unit 62 is specifically configured to convert, based on the BERT model, each of the chronologically ordered sentences into a text embedding vector, obtaining several text embedding vectors arranged in chronological order.
In one embodiment, the first determining unit 64 is specifically configured to search the text information for preset candidate keywords; for each candidate keyword found, determine whether the sentence containing it also contains a negation word; and take the candidate keywords whose sentences contain no negation word as the target keywords.
In one embodiment, the target keywords include multiple categories, and the target keywords of each category have a different weight value.
For convenience of description, the above parts are described separately as modules (or units) divided by function. Of course, when implementing the present disclosure, the functions of the modules (or units) can be implemented in one or more pieces of software or hardware.
Having introduced the text-based depression recognition method and device of the exemplary embodiments of the present disclosure, a computing device according to another exemplary embodiment of the present disclosure is introduced next.
Those skilled in the art will understand that aspects of the present disclosure can be implemented as a system, a method, or a program product. Accordingly, aspects of the present disclosure can take the form of an entirely hardware implementation, an entirely software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software, which may be collectively referred to herein as a "circuit", "module", or "system".
In some possible implementations, the computing device according to the present disclosure includes at least one processor and at least one memory. Optionally, the memory stores program code which, when executed by the processor, causes the processor to perform the steps of the text-based depression recognition method according to the various exemplary embodiments of the present disclosure described above in this specification. For example, the processor may perform the steps shown in FIG. 2: step S21, acquire the voice information of the tested user and convert it into text information; step S22, convert the obtained text information into a text embedding vector; step S23, based on the obtained text embedding vector, perform prediction with the depression prediction model to obtain a prediction result; step S24, determine the target keywords contained in the text information; and step S25, determine whether the tested user is a user with depression according to the weighted result of the prediction result with its corresponding weight value and the target keywords with their corresponding weight value.
The computing device 70 according to this embodiment of the present disclosure is described below with reference to FIG. 7. The computing device 70 shown in FIG. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 7, the computing device 70 takes the form of a general-purpose computing device. Its components may include, but are not limited to, the at least one processor 71, the at least one memory 72, and a bus 73 connecting the different system components (including the memory 72 and the processor 71).
Optionally, the bus 73 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a processor or local bus using any of a variety of bus structures.
Optionally, the memory 72 includes readable media in the form of volatile memory, such as a random access memory (RAM) 721 and/or a cache memory 722, and may further include a read-only memory (ROM) 723.
Optionally, the memory 72 also includes a program/utility 725 having a set of (at least one) program modules 724. Such program modules 724 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The computing device 70 can also communicate with one or more external devices 74 (such as a keyboard or a pointing device), with one or more devices that enable a user to interact with the computing device 70, and/or with any device (such as a router or a modem) that enables the computing device 70 to communicate with one or more other computing devices. Such communication can take place through an input/output (I/O) interface 75. The computing device 70 can also communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet, through a network adapter 76. As shown in the figure, the network adapter 76 communicates with the other modules of the computing device 70 through the bus 73. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with the computing device 70, including but not limited to microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
In some possible implementations, aspects of the text-based depression recognition method provided by the present disclosure can also be implemented as a program product that includes program code. When the program product runs on a computer device, the program code causes the computer device to perform the steps of the text-based depression recognition method according to the various exemplary embodiments of the present disclosure described above in this specification. For example, the computer device may perform the steps shown in FIG. 2: step S21, acquire the voice information of the tested user and convert it into text information; step S22, convert the obtained text information into a text embedding vector; step S23, based on the obtained text embedding vector, perform prediction with the depression prediction model to obtain a prediction result; step S24, determine the target keywords contained in the text information; and step S25, determine whether the tested user is a user with depression according to the weighted result of the prediction result with its corresponding weight value and the target keywords with their corresponding weight value.
Optionally, the program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The program product for text-based depression recognition according to the embodiments of the present disclosure may take the form of a portable compact disc read-only memory (CD-ROM) containing program code, and may run on a computing device. However, the program product of the present disclosure is not limited to this. In this document, a readable storage medium can be any tangible medium that contains or stores a program, where the program can be used by or in combination with an instruction execution system, apparatus, or device.
可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了可读程序代码。这种传播的数据信号可以采用多种形式,包括——但不限于——电磁信号、光信号或上述的任意合适的组合。可读信号介质还可以是可读存储介质以外的任何可读介质,该可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。The readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and readable program code is carried therein. This propagated data signal can take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing. The readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
The program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, and the like, or any suitable combination of the above.
Program code for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. In scenarios involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
The description of the specific embodiments should give a deeper and more specific understanding of the technical means adopted by the present disclosure to achieve its intended purpose, and of their effects. The accompanying drawings, however, are provided for reference and illustration only and are not intended to limit the present disclosure.
Industrial applicability
The text-based depression recognition method provided by the embodiments of the present disclosure converts speech information into text information, uses a BERT model to convert the text information into text embedding vectors, and models the text embedding vectors with a long short-term memory (LSTM) network. This better captures the contextual dependencies in the text information and extracts deeper text features, thereby greatly improving the accuracy of depression recognition;
The method provided by the embodiments of the present disclosure can also be used in most settings without being restricted to a particular location, is faster and more efficient, and does not require capturing facial video of the patient, which helps protect the subject's privacy;
In addition, compared with the traditional, more subjective approach of testing with the PHQ-9 questionnaire, the method provided by the embodiments of the present disclosure yields more quantitative and objective results; and
The method provided by the embodiments of the present disclosure can be deployed on a PC (personal computer), a mobile client, and the like; it is simple, efficient, and fast, and can assist in the diagnosis and identification of depression.

Claims (16)

  1. A text-based depression recognition method, characterized in that the method comprises:
    acquiring speech information of a tested user and converting it into text information;
    converting the text information into a text embedding vector;
    performing prediction with a depression prediction model based on the text embedding vector to obtain a prediction result, wherein the depression prediction model is obtained by training a long short-term memory (LSTM) model on text embedding vector samples, and the text embedding vector samples include depression text embedding vector samples and non-depression text embedding vector samples;
    determining target keywords contained in the text information; and
    determining whether the tested user is a user with depression according to a weighted result of the prediction result with its corresponding weight value and the target keywords with their corresponding weight values.
  2. The method according to claim 1, characterized in that the manner of converting the speech information into the text information comprises: converting using a mature speech-to-text algorithm model; and performing professional manual transcription.
  3. The method according to any one of claims 1 to 2, characterized in that the text information consists of several sentences arranged in chronological order; and converting the text information into a text embedding vector specifically comprises:
    based on a BERT model, converting each of the sentences arranged in chronological order into a text embedding, to obtain several text embedding vectors arranged in chronological order.
  4. The method according to any one of claims 1 to 3, characterized in that converting the text information into a text embedding vector uses sentence-level, 768-dimensional text embedding based on a BERT model.
  5. The method according to any one of claims 1 to 4, characterized in that the long short-term memory (LSTM) model is a bidirectional variable-length LSTM model.
  6. The method according to any one of claims 1 to 5, characterized in that determining the target keywords contained in the text information specifically comprises:
    searching the text information for preset candidate keywords;
    for each candidate keyword found, determining whether the sentence containing the candidate keyword contains a negation word; and
    determining the candidate keywords whose sentences contain no negation word to be the target keywords.
  7. The method according to any one of claims 1 to 6, characterized in that the target keywords comprise a plurality of categories, wherein each category of target keywords corresponds to a different weight value.
  8. A text-based depression recognition device, characterized in that the device comprises:
    a text conversion unit configured to acquire speech information of a tested user and convert it into text information;
    a vector conversion unit configured to convert the text information into a text embedding vector;
    a prediction unit configured to perform prediction with a depression prediction model based on the text embedding vector to obtain a prediction result, wherein the depression prediction model is obtained by training a long short-term memory (LSTM) model on text embedding vector samples, and the text embedding vector samples include depression text embedding vector samples and non-depression text embedding vector samples;
    a first determining unit configured to determine target keywords contained in the text information; and
    a second determining unit configured to determine whether the tested user is a user with depression according to a weighted result of the prediction result with its corresponding weight value and the target keywords with their corresponding weight values.
  9. The device according to claim 8, characterized in that the manner of converting the speech information into the text information comprises: converting using a mature speech-to-text algorithm model; and performing professional manual transcription.
  10. The device according to any one of claims 8 to 9, characterized in that the text information consists of several sentences arranged in chronological order; and
    wherein the vector conversion unit is specifically configured to convert, based on a BERT model, each of the sentences arranged in chronological order into a text embedding, to obtain several text embedding vectors arranged in chronological order.
  11. The device according to any one of claims 8 to 10, characterized in that converting the text information into a text embedding vector uses sentence-level, 768-dimensional text embedding based on a BERT model.
  12. The device according to any one of claims 8 to 11, characterized in that the long short-term memory (LSTM) model is a bidirectional variable-length LSTM model.
  13. The device according to any one of claims 8 to 12, characterized in that
    the first determining unit is specifically configured to: search the text information for preset candidate keywords; for each candidate keyword found, determine whether the sentence containing the candidate keyword contains a negation word; and determine the candidate keywords whose sentences contain no negation word to be the target keywords.
  14. The device according to any one of claims 8 to 13, characterized in that the target keywords comprise a plurality of categories, wherein each category of target keywords corresponds to a different weight value.
  15. A computing device, characterized in that the computing device comprises: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method according to any one of claims 1 to 7.
  16. A computer storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
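
As an illustration only, the keyword screening of claims 6 and 7 and the weighted decision at the end of claim 1 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicants' implementation: the keyword list, the category names, the negation words, all weight values, the decision threshold, and the function names `find_target_keywords` and `decide` are hypothetical, and the additive fusion formula is just one possible reading of "weighted result". The model score stands in for the output of the LSTM-based depression prediction model.

```python
import re

# Hypothetical categories and weights; the claims only state that each
# category of target keywords has a different weight value.
KEYWORD_WEIGHTS = {
    "suicidal_ideation": 0.5,
    "low_mood": 0.3,
    "sleep_problem": 0.2,
}

# Hypothetical preset candidate keywords, each mapped to its category.
CANDIDATE_KEYWORDS = {
    "want to die": "suicidal_ideation",
    "hopeless": "low_mood",
    "insomnia": "sleep_problem",
}

# Hypothetical negation words.
NEGATION_WORDS = {"not", "no", "never", "don't", "doesn't"}


def split_sentences(text):
    """Split text into sentences on common end-of-sentence punctuation."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def find_target_keywords(text):
    """Return (keyword, category) pairs whose sentence has no negation word."""
    targets = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        tokens = set(re.findall(r"[a-z']+", lowered))
        has_negation = bool(tokens & NEGATION_WORDS)
        for keyword, category in CANDIDATE_KEYWORDS.items():
            # Keep a candidate only if its sentence contains no negation word.
            if keyword in lowered and not has_negation:
                targets.append((keyword, category))
    return targets


def decide(model_score, targets, model_weight=0.6, threshold=0.5):
    """Fuse the model score with keyword weights (assumed additive rule)."""
    categories = {category for _, category in targets}
    keyword_score = sum(KEYWORD_WEIGHTS[c] for c in categories)
    fused = model_weight * model_score + keyword_score
    return fused, fused >= threshold


if __name__ == "__main__":
    sample = "I feel hopeless. I do not have insomnia."
    found = find_target_keywords(sample)
    print(found, decide(0.4, found))
```

In this sketch the second sentence contains "not", so "insomnia" is discarded as a target keyword and only the "low_mood" category contributes to the fused score. A real system would need a negation check that is more robust than sentence-level word matching.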
PCT/CN2020/117579 2020-01-20 2020-09-25 Text-based major depressive disorder recognition method WO2021147363A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010065096.5A CN111241817A (en) 2020-01-20 2020-01-20 Text-based depression identification method
CN202010065096.5 2020-01-20

Publications (1)

Publication Number Publication Date
WO2021147363A1 (en) 2021-07-29

Family

ID=70865650

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117579 WO2021147363A1 (en) 2020-01-20 2020-09-25 Text-based major depressive disorder recognition method

Country Status (2)

Country Link
CN (1) CN111241817A (en)
WO (1) WO2021147363A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241817A (en) * 2020-01-20 2020-06-05 首都医科大学 Text-based depression identification method
CN112582061A (en) * 2020-12-14 2021-03-30 首都医科大学 Text question-answer-based depression auxiliary screening method and system and storage medium
CN112768070A (en) * 2021-01-06 2021-05-07 万佳安智慧生活技术(深圳)有限公司 Mental health evaluation method and system based on dialogue communication
CN112927781A (en) * 2021-02-10 2021-06-08 杭州医典智能科技有限公司 Depression detection method based on natural language processing and time sequence convolution network
CN114298012B (en) * 2021-12-31 2022-10-25 中国电子科技集团公司电子科学研究院 Optimization method for generating long text scientific and technological information model
CN115631772A (en) * 2022-10-27 2023-01-20 四川大学华西医院 Method and device for evaluating risk of suicide injury, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106782602A (en) * 2016-12-01 2017-05-31 南京邮电大学 Speech-emotion recognition method based on length time memory network and convolutional neural networks
CN110110330A (en) * 2019-04-30 2019-08-09 腾讯科技(深圳)有限公司 Text based keyword extracting method and computer equipment
CN110413788A (en) * 2019-07-30 2019-11-05 携程计算机技术(上海)有限公司 Prediction technique, system, equipment and the storage medium of the scene type of session text
CN110532387A (en) * 2019-08-14 2019-12-03 成都中科云集信息技术有限公司 A kind of depression aided detection method based on open question and answer text
CN111241817A (en) * 2020-01-20 2020-06-05 首都医科大学 Text-based depression identification method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109284387B (en) * 2018-10-19 2021-06-01 昆山杜克大学 Engraving idiom detection system, engraving idiom detection method, computer device and storage medium
CN110147445A (en) * 2019-04-09 2019-08-20 平安科技(深圳)有限公司 Intension recognizing method, device, equipment and storage medium based on text classification
CN110222827A (en) * 2019-06-11 2019-09-10 苏州思必驰信息科技有限公司 The training method of text based depression judgement network model

Also Published As

Publication number Publication date
CN111241817A (en) 2020-06-05

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20915701; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 20915701; Country of ref document: EP; Kind code of ref document: A1)