CN116168824A - Multi-modal mental disorder assessment method, computer device, and storage medium

Multi-modal mental disorder assessment method, computer device, and storage medium

Info

Publication number: CN116168824A
Application number: CN202211630192.5A
Authority: CN (China)
Prior art keywords: data, modal, text, neural network, attribute classification
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 郭田友, 寻潺潺, 李敏健, 梁臻, 王松
Current Assignee: Shenzhen Yanghe Technology Co., Ltd.
Original Assignee: Shenzhen Yanghe Technology Co., Ltd.
Application filed by Shenzhen Yanghe Technology Co., Ltd.

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/20: ICT specially adapted for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H 50/30: ICT specially adapted for calculating health indices; for individual health risk assessment
    • G16H 50/50: ICT specially adapted for simulation or modelling of medical disorders

Abstract

The application discloses a multimodal mental disorder assessment method, a computer device and a storage medium. The multimodal mental disorder assessment method comprises the following steps: acquiring sound modal data, text modal data and image modal data of a target object; processing the sound modal data, the text modal data and the image modal data through a neural network model to obtain an attribute classification result; locating a decision tree diagram according to the attribute classification result; and guiding the target object to complete the decision tree diagram so as to complete the mental disorder assessment. In this multimodal mental disorder assessment method, the text data, voice data and video data generated while the target object interacts with the machine equipment are processed by artificial-intelligence multimodal technology, the decision tree diagram is completed with the cooperation of a neural network trained on a training data set, and the target object is guided to complete the decision tree diagram so as to complete the mental disorder assessment. The method supports the DSM-5 diagnostic criteria, meets the patient's need for privacy protection, and reduces the patient's resistance to mental disorder diagnosis.

Description

Multi-modal mental disorder assessment method, computer device, and storage medium
Technical Field
The present application relates to the field of mental disease diagnosis, and in particular, to a multimodal mental disease assessment method, a computer apparatus, and a storage medium.
Background
At present, public awareness of mental disorder diagnosis is limited. Because of this lack of understanding, patients and their relatives often hold prejudices against the diagnosis and treatment of mental disorders, so patients do not receive timely intervention and treatment.
Neither the free-dialogue robots nor the fixed-task robots of the related art support the DSM-5 diagnostic standard. In addition, since these two types of robots collect only voice and text as input data, they cannot accurately judge the type of a patient's mental disorder from voice and text alone during an actual diagnosis.
In addition, clinical psychology currently bases mental disorder diagnosis on DSM-5 and completes the diagnosis either through face-to-face dialogue or through handwritten questionnaires. Neither mode meets patients' need for privacy protection, and some patients become resistant to the process.
Disclosure of Invention
In view of this, the present invention aims to solve, at least to some extent, one of the problems in the related art. To this end, an object of the present application is to provide a multimodal mental disorder assessment method, a computer apparatus, and a storage medium.
The embodiment of the application provides a multi-mode mental disease assessment method. The multimodal mental disorder assessment method comprises the following steps: acquiring sound mode data, text mode data and image mode data of a target object; processing the sound modal data, the text modal data and the image modal data through a neural network model to obtain an attribute classification result; positioning a decision tree diagram according to the attribute classification result; directing the target subject to complete the decision tree graph to complete the mental disorder assessment.
Thus, the multimodal mental disorder assessment method processes the text data, voice data and video data generated during the patient's interaction through artificial-intelligence multimodal technology, completes the decision tree diagram with the cooperation of a neural network trained on a training data set, and obtains a final diagnosis result based on the DSM-5 diagnostic criteria. The multimodal mental disorder assessment method improves the efficiency of diagnosing mental disorders, protects patient privacy, and alleviates the shortage of mental health professionals.
In some embodiments, the processing the sound modal data, the text modal data, and the image modal data by the neural network model to obtain the attribute classification result includes: preprocessing the sound modal data, the text modal data and the image modal data respectively to obtain corresponding vector matrixes, and respectively processing the vector matrixes through a plurality of sub-networks of the neural network model to output corresponding preliminary attribute classification results; and integrating the preliminary attribute classification result to obtain the attribute classification result.
Therefore, the multi-mode mental disease evaluation method processes and integrates the sound mode data, the text mode data and the image mode data of the target object through a plurality of sub-networks of the neural network model to obtain the attribute classification result, and the attribute classification result is used as a key basis for guiding the target object to complete subsequent mental disease evaluation.
In some embodiments, the preprocessing the sound mode data, the text mode data, and the image mode data to obtain corresponding vector matrices includes: performing Fourier transform on the sound mode data to obtain a corresponding vector matrix; converting the text modal data into a corresponding vector matrix by a natural language processing encoder; and extracting time slices from the video images of the image modal data to obtain a corresponding vector matrix.
Thus, the multi-modal mental disease assessment method realizes conversion of modal data into a vector matrix by utilizing a Fourier transform method, a natural language coding function and a video image extraction slicing function. The multi-modal mental disease assessment method converts modal data into a vector matrix, and is beneficial to obtaining a preliminary attribute classification result through subsequent processing.
In some embodiments, the sound mode data includes sound frequency mode data, voice mode data, intonation mode data, and voiceprint mode data, the image mode data includes limb mode data, facial expression mode data, and eye movement mode data, and the obtaining the sound mode data, text mode data, and image mode data of the target object includes: acquiring the voice frequency mode data, the voice mode data, the intonation mode data and the voiceprint mode data of human-computer interaction of the target object through a voice acquisition device; acquiring the text modal data of human-computer interaction of the target object through a user input device and/or a voice recognition technology; and acquiring the limb modal data, the facial expression modal data and the eye movement modal data of human-computer interaction of the target object through an image acquisition device.
Therefore, the multi-modal mental disease assessment method is used for acquiring different modal data of the target object through the sound acquisition device, the user input device and the image acquisition device, and providing enough diagnosis data for multi-modal mental disease assessment, so that the multi-modal mental disease assessment result is more accurate.
In some embodiments, the processing, by the multiple sub-networks of the neural network model, the vector matrix to output the corresponding preliminary attribute classification result includes: and processing vector matrixes corresponding to the voice frequency modal data, the voice modal data, the intonation modal data and the voiceprint modal data through an emotion classification model to output classification results of all the attributes in a preset attribute set, and obtaining the preliminary attribute classification results corresponding to the voice modal data.
Therefore, the multi-modal mental disease assessment method processes the vector matrix corresponding to the sound modal data through the emotion classification model to obtain the preliminary attribute classification result, and is convenient for integrating the preliminary classification result corresponding to the sound modal data.
In some embodiments, the processing, by the multiple sub-networks of the neural network model, the vector matrix to output the corresponding preliminary attribute classification result includes: and processing the vector matrix corresponding to the text modal data through a long-short-term memory neural network to output classification results of all the attributes in a preset attribute set, and obtaining the preliminary attribute classification results corresponding to the text modal data.
Therefore, the multimodal mental disorder assessment method of the invention obtains the preliminary attribute classification result by processing the vectorized text modal data through the long short-term memory neural network, which facilitates integrating the preliminary attribute classification result of the text modal data with the preliminary attribute classification results corresponding to the other modal data, and improves the efficiency and accuracy of diagnosing the target object.
In some embodiments, the processing, by the multiple sub-networks of the neural network model, the vector matrix to output the corresponding preliminary attribute classification result includes: processing a vector matrix corresponding to the limb modal data through a first convolutional neural network to output classification results of all the attributes in a preset attribute set, and obtaining the preliminary attribute classification results corresponding to the limb modal data; processing a vector matrix corresponding to the facial expression modal data through a second convolutional neural network to output classification results of all the attributes in the preset attribute set, and obtaining the preliminary attribute classification results corresponding to the facial expression modal data; and processing the vector matrix corresponding to the eye movement mode data through a third convolution neural network to output the classification result of each attribute in the preset attribute set, so as to obtain the preliminary attribute classification result corresponding to the eye movement mode data.
In this way, the multi-modal mental disease assessment method processes the limb modal data, the facial modal data and the eye movement modal data through the convolutional neural network, acquires the limb behavior, the facial expression and the eye movement data of the target object from the image angle, comprehensively judges the inner mind of the target object, and improves the accuracy of multi-modal mental disease diagnosis.
In certain embodiments, the multimodal mental disorder assessment method comprises: if the mental disorder assessment cannot be completed from the decision tree diagram alone, invoking a discrimination table to complete the mental disorder assessment cooperatively.
Therefore, the multimodal mental disorder assessment method can complete the mental disorder assessment of the target object by calling the discrimination table to work in cooperation with the decision tree diagram, so that the mental disorder diagnosis result for the target object is more accurate.
The present application also provides a computer device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, implements the method of any of the above embodiments.
Thus, by applying the multimodal mental disorder assessment method, the computer device processes the text data, voice data and video data generated while the target object interacts with the machine equipment through artificial-intelligence multimodal technology, completes the decision tree diagram with the cooperation of the neural network trained on the training data set, and obtains the final diagnosis result based on the DSM-5 diagnostic criteria. The multimodal mental disorder assessment method improves the efficiency of diagnosing mental disorders, protects patient privacy, and alleviates the shortage of mental health professionals.
The present application also provides a computer readable storage medium storing a computer program which, when executed by one or more processors, implements the method of any of the above embodiments.
Thus, by applying the multimodal mental disorder assessment method, the computer-readable storage medium of the application processes the text data, voice data and video data generated while the target object interacts with the machine equipment through artificial-intelligence multimodal technology, completes the decision tree diagram with the cooperation of a neural network trained on the training data set, and obtains the final diagnosis result based on the DSM-5 diagnostic criteria. The multimodal mental disorder assessment method improves the efficiency of diagnosing mental disorders, protects patient privacy, and alleviates the shortage of mental health professionals.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a multimodal mental disorder assessment method in certain embodiments of the present application;
FIG. 2 is a flow chart of a multimodal mental disorder assessment method in certain embodiments of the present application;
FIG. 3 is a flow chart of a multimodal mental disorder assessment method in certain embodiments of the present application;
FIG. 4 is a flow chart of a multimodal mental disorder assessment method in certain embodiments of the present application;
FIG. 5 is a flow chart of a multimodal mental disorder assessment method in certain embodiments of the present application;
FIG. 6 is a schematic structural diagram of a deep learning module of a multimodal mental disorder assessment method in some embodiments of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
In the description of the present application, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or an implicit indication of the number of technical features being indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more of the described features. In the description of the present application, the meaning of "a plurality" is two or more, unless specifically defined otherwise.
In the description of the present application, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; may be mechanically connected, may be electrically connected, or may be in communication with each other; can be directly connected or indirectly connected through an intermediate medium, and can be communicated with the inside of two elements or the interaction relationship of the two elements. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art as the case may be.
The following disclosure provides many different embodiments or examples for implementing different structures of the present application. In order to simplify the disclosure of the present application, the components and arrangements of specific examples are described below. Of course, they are merely examples and are not intended to limit the present application. Furthermore, the present application may repeat reference numerals and/or letters in the various examples, which are for the purpose of brevity and clarity, and which do not in themselves indicate the relationship between the various embodiments and/or arrangements discussed.
In view of this, referring to fig. 1, in some embodiments, the present application provides a method for multimodal mental disorder assessment. The multimodal mental disorder assessment method comprises the following steps:
02: acquiring sound mode data, text mode data and image mode data of a target object;
04: processing the sound mode data, the text mode data and the image mode data through the neural network model to obtain attribute classification results;
06: positioning the decision tree diagram according to the attribute classification result;
08: the target subject is guided to complete the decision tree graph to complete the mental disorder assessment.
The application also provides a computer device. The computer device includes a memory and a processor, the memory having a computer program stored therein. The processor is used for acquiring sound mode data, text mode data and image mode data of the target object; processing the sound mode data, the text mode data and the image mode data through the neural network model to obtain attribute classification results; positioning the decision tree diagram according to the attribute classification result; the target subject is guided to complete the decision tree graph to complete the mental disorder assessment.
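For orientation, the following is a minimal Python sketch of how the four steps above could be wired together. It is illustrative only: the parameters acquire, classify, locate and guide are hypothetical callables standing in for the acquisition devices, the neural network model, the decision tree lookup and the guided dialogue, and are not part of the disclosed implementation.

```python
from typing import Callable, List


def assess(acquire: Callable, classify: Callable, locate: Callable,
           guide: Callable, target) -> List:
    """Wire steps 02-08 together (illustrative only; callers supply the real implementations)."""
    sound, text, image = acquire(target)            # step 02: acquire the three kinds of modal data
    attributes = classify(sound, text, image)       # step 04: neural network model -> attribute classification result
    trees = locate(attributes)                      # step 06: locate the matching decision tree diagram(s)
    return [guide(tree, target) for tree in trees]  # step 08: guide the target object through each tree
```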
Specifically, first, sound modality data, text modality data, and image modality data of a target object are acquired. The target object in the embodiments of the present application may be a patient with mental disorder, or may be a patient with other mental diseases, which is not limited herein.
Modal data can be understood as data organized by modality, where a modality corresponds to a particular form of existence or source of information. Sound modal data is data in the modality formed by sound, text modal data is data in the modality formed by text, and image modal data is data in the modality formed by images.
The sound modal data in the embodiments of the application may be obtained directly by extracting audio data with a sound acquisition device, obtained indirectly by converting video into audio, or obtained in other ways, which is not limited herein. The sound modal data comprises sound frequency modal data, voice modal data, intonation modal data and voiceprint modal data.
The text modal data may be obtained directly by a user input device extracting a text data mode, or may be obtained indirectly by a voice recognition technology, or may be obtained by other modes, which is not limited herein.
The image modality data includes limb modality data, facial expression modality data, and eye movement modality data. The image mode data can be obtained by extracting the mode of the video data through the image acquisition device, and can also be obtained through other modes, and the image mode data are not limited herein.
According to the multi-mode mental disease evaluation method, the sound mode data, the text mode data and the image mode data of the target object are obtained, sufficient data sources are provided for multi-mode mental disease evaluation, and the data authenticity and the data integrity of the multi-mode mental disease evaluation are ensured.
And then, the sound modal data, the text modal data and the image modal data are processed through the neural network model to obtain an attribute classification result. The neural network model in the embodiments of the present application may be composed of a plurality of neural networks. In detail, each neural network is trained using, as a training data set, the corresponding data generated when a psychiatrist makes an actual diagnosis based on the DSM-5 diagnostic criteria, so as to form the neural network model. Of the training data set, 70% is used as the training set, 30% as the verification set, and 20% as the test set; the training set, verification set and test set are all generated by random sampling with replacement. The training set is used to train the neural network. The verification set is used to judge whether the trained neural network has fitting problems such as overfitting. The test set is used to evaluate the prediction ability and performance of the trained neural network. The attribute classification result is obtained by integrating, through the neural network model, the sets of attribute classification results corresponding to the different modal data into a new set of attribute classification results without repetition.

For example, the attribute classification result corresponding to the text modal data is: { anxiety, insomnia, avoidance behavior }. The attribute classification result corresponding to the sound modal data is: { speech disorders, anxiety, depressed mood }. The attribute classification result corresponding to the limb modal data is: { avoidance behavior, panic attacks, suicidal ideation or behavior }. The attribute classification result corresponding to the facial expression modal data is: { anxiety, somnolence, avoidance behavior }. The attribute classification result corresponding to the eye movement modal data is: { depressed mood, avoidance behavior }. The integrated attribute classification result is: { anxiety, insomnia, avoidance behavior, speech disorders, depressed mood, panic attacks, suicidal ideation or behavior, somnolence }. That is, the multimodal mental disorder assessment method of the application obtains the attribute classification result by processing the modal data with the neural network model, and this result serves as the basis for locating the subsequent decision tree diagram.
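As an illustration of the integration just described, the sketch below forms the de-duplicated union of per-modality attribute sets. The variable names and the use of Python sets are assumptions for illustration, not a prescribed implementation.

```python
# Illustrative integration of per-modality attribute classification results into one
# de-duplicated attribute classification result (assumed representation: sets of labels).
per_modality_results = {
    "text": {"anxiety", "insomnia", "avoidance behavior"},
    "sound": {"speech disorders", "anxiety", "depressed mood"},
    "limb": {"avoidance behavior", "panic attacks", "suicidal ideation or behavior"},
    "facial expression": {"anxiety", "somnolence", "avoidance behavior"},
    "eye movement": {"depressed mood", "avoidance behavior"},
}

integrated = set().union(*per_modality_results.values())
print(integrated)
# e.g. {'anxiety', 'insomnia', 'avoidance behavior', 'speech disorders', 'depressed mood',
#       'panic attacks', 'suicidal ideation or behavior', 'somnolence'}
```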
And then, the decision tree diagram is located according to the attribute classification result. The decision tree diagrams are different tree diagrams preset for different mental disorder conditions.
It will be appreciated that the DSM-5 Handbook of Differential Diagnosis contains 29 decision tree diagrams in total. The 29 decision tree diagrams are, respectively: a decision tree diagram for poor school performance, a decision tree diagram for behavioral problems in children or adolescents, a decision tree diagram for speech disorders, a decision tree diagram for transfer with the environment (distractibility), a decision tree diagram for delusions, a decision tree diagram for hallucinations, a decision tree diagram for symptoms of stress, a decision tree diagram for elevated or inflated mood, a decision tree diagram for irritable mood, a decision tree diagram for depressed mood, a decision tree diagram for suicidal ideation or behavior, a decision tree diagram for psychomotor retardation, a decision tree diagram for anxiety, a decision tree diagram for panic attacks, a decision tree diagram for avoidance behavior, a decision tree diagram for etiology involving trauma or a psychosocial stress source, a decision tree diagram for somatic complaints or disease/appearance anxiety, a decision tree diagram for appetite changes or abnormal feeding behavior, a decision tree diagram for insomnia, a decision tree diagram for somnolence, a decision tree diagram for female sexual dysfunction, a decision tree diagram for male sexual dysfunction, a decision tree diagram for attack behavior, a decision tree diagram for impulse or impulse control problems, a decision tree diagram for self-injury or self-mutilation, a decision tree diagram for excessive substance use, a decision tree diagram for memory loss, a decision tree diagram for cognitive impairment, and a decision tree diagram for a somatic disease as etiology.
From the detailed contents of the decision trees in the DSM-5 Handbook of Differential Diagnosis above, it can be seen that the number of suitable decision tree diagrams may be one or more, depending on the condition of the target object. Thus, the decision tree diagram of the present application may be any one or more of the 29 decision tree diagrams described above.
Specifically, taking the integrated attribute classification result { anxiety, insomnia, avoidance behavior, speech disorders, depressed mood, panic attacks, suicidal ideation or behavior, somnolence } as an example, the multimodal mental disorder assessment method matches decision tree diagrams suited to the condition of the target object according to the terms in the attribute classification result. For example, "anxiety" locates the decision tree diagram for anxiety in the DSM-5 Handbook of Differential Diagnosis, "insomnia" locates the decision tree diagram for insomnia, and so on, until "somnolence" locates the decision tree diagram for somnolence; that is, eight decision tree diagrams are located from the integrated attribute classification result { anxiety, insomnia, avoidance behavior, speech disorders, depressed mood, panic attacks, suicidal ideation or behavior, somnolence } in this example. In other words, the multimodal mental disorder assessment method in the embodiments of the application can locate decision tree diagrams from the keywords of the attribute classification result, which improves the efficiency of searching based on the attribute classification result.
Finally, the target object is guided to complete the decision tree diagram so as to complete the mental disorder assessment. The target object may be guided to complete the decision tree diagram by means of psychological techniques, or in other ways, which are not limited herein. That is, after the appropriate decision tree diagrams are located, the artificial intelligence system applying the multimodal mental disorder assessment method of the present application completes the one or more located decision tree diagrams by guiding the target object through them step by step. The artificial intelligence system can flexibly ask the target object questions according to the flow of the tree diagram, and the branch taken at each node of the tree diagram is determined by capturing keywords in the target object's answers. Since the psychological diagnosis process is not a pure chat, the system's reply style, questioning style, wording and so on are formulated by psychologists. In the multimodal mental disorder assessment method, the target object is guided to complete the decision tree diagram using psychological techniques delivered through artificial intelligence technology, which protects the privacy of the target object, alleviates the shortage of professionals in the psychological field, and improves the efficiency of diagnosing the target object's mental disorder.
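The keyword-based localization and guided traversal described above could be sketched as follows. The nested-dictionary tree representation, the yes/no matching rule, and the example diagnoses are illustrative assumptions only, not the patented data structure or clinical content.

```python
# Illustrative sketch: locate decision tree diagrams by attribute keyword,
# then walk one tree by matching keywords in the target object's answers.
DECISION_TREES = {
    "anxiety": {
        "question": "Is the anxiety due to the physiological effects of a substance?",
        "yes": {"diagnosis": "substance/medication-induced anxiety disorder (example only)"},
        "no": {
            "question": "Do panic attacks occur unexpectedly and repeatedly?",
            "yes": {"diagnosis": "panic disorder (example only)"},
            "no": {"diagnosis": "further differential assessment required"},
        },
    },
    # ... one entry per decision tree diagram, e.g. "insomnia", "somnolence", ...
}


def locate_trees(attributes):
    """Return the decision trees whose keyword appears in the attribute classification result."""
    return {key: tree for key, tree in DECISION_TREES.items() if key in attributes}


def walk_tree(node, answer_fn):
    """Follow yes/no branches until a leaf is reached; answer_fn asks the target object."""
    while "diagnosis" not in node:
        answer = answer_fn(node["question"])          # e.g. collected by the dialogue robot
        node = node["yes"] if "yes" in answer.lower() else node["no"]
    return node["diagnosis"]


trees = locate_trees({"anxiety", "insomnia", "avoidance behavior"})
result = walk_tree(trees["anxiety"], answer_fn=lambda q: input(q + " (yes/no): "))
```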
Thus, the multimodal mental disorder assessment method processes the text data, voice data and video data of the mental disorder patient's interaction process through artificial-intelligence multimodal technology, and completes the decision tree diagram with the cooperation of a neural network trained on a training data set to obtain the final diagnosis result. The multimodal mental disorder assessment method improves the efficiency of diagnosing mental disorders, protects patient privacy, and alleviates the shortage of mental health professionals.
Referring to fig. 2, in some embodiments, step 04 includes:
041: respectively preprocessing sound mode data, text mode data and image mode data to obtain a corresponding vector matrix;
042: respectively processing vector matrixes through a plurality of sub-networks of the neural network model to output corresponding primary attribute classification results;
043: and integrating the preliminary attribute classification results to obtain attribute classification results.
The processor is used for respectively preprocessing the sound mode data, the text mode data and the image mode data to obtain a corresponding vector matrix; respectively processing vector matrixes through a plurality of sub-networks of the neural network model to output corresponding primary attribute classification results; and integrating the preliminary attribute classification results to obtain attribute classification results.
Specifically, firstly, the multi-modal mental disease evaluation method of the application respectively preprocesses sound modal data, text modal data and image modal data to obtain corresponding vector matrixes. A vector matrix is a multi-dimensional array consisting of a plurality of equal length vectors, one for each column or row. The vector matrix corresponding to the sound mode data in the embodiment of the application may be a set of sound multidimensional arrays, the vector matrix corresponding to the text mode data may be a set of text multidimensional arrays, and the vector matrix corresponding to the image mode data may be a set of image multidimensional arrays. The preprocessing in the embodiments of the present application is performed in the preprocessor of the deep learning module. For sound modality data, preprocessing in embodiments of the present application may transform sound modality data into a corresponding vector matrix through fourier transformation. For text modal data, preprocessing in embodiments of the present application may convert the text modal data into a corresponding vector matrix through a natural language processed encoder. For image mode data, preprocessing in the embodiment of the application can convert the image mode data into a corresponding vector matrix in a mode of extracting time slices through videos. That is, the multi-modal mental disease assessment method of the application vectorizes different modal data, facilitates calculation and processing of the data, and improves efficiency and accuracy of the multi-modal mental disease assessment method.
Secondly, the multi-mode mental disease assessment method respectively processes vector matrixes through a plurality of sub-networks of the neural network model to output corresponding primary attribute classification results. The plurality of sub-networks of the neural network model in the embodiments of the present application may include a long-short-term memory neural network, an emotion classification model, a first convolutional neural network, a second convolutional neural network, and a third convolutional neural network.
For the text modal data, the long-term and short-term memory neural network can be used for classifying the text vector matrix to obtain a corresponding primary attribute classification result. For the voice modal data, the emotion classification model classifies the voice vector matrix to obtain a corresponding primary attribute classification result. The image modality data includes limb modality data, facial expression modality data, and eye movement modality data. For the limb modal data, the first convolutional neural network can be used for classifying the image vector matrix to obtain a corresponding preliminary attribute classification result. For facial expression modal data, the second convolutional neural network can be used for classifying the image vector matrix to obtain a corresponding preliminary attribute classification result. For the eye movement modal data, the third convolutional neural network can be used for classifying the image vector matrix to obtain a corresponding primary attribute classification result.
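A minimal PyTorch-style sketch of this sub-network arrangement is given below, assuming each modality's vector matrix has already been prepared by the preprocessor. The use of torch, the layer sizes, the MLP stand-in for the emotion classification model, and the number of attributes are illustrative assumptions, not the disclosed architecture.

```python
import torch
import torch.nn as nn

NUM_ATTRIBUTES = 26  # size of the preset attribute set (illustrative value)


class TextBranch(nn.Module):
    """Long short-term memory network over the text vector matrix (seq_len x feat_dim)."""
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, NUM_ATTRIBUTES)

    def forward(self, x):                 # x: (batch, seq_len, feat_dim)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1]).softmax(dim=-1)


class ImageBranch(nn.Module):
    """Convolutional network over an image-derived vector matrix (one per image modality)."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(16, NUM_ATTRIBUTES)

    def forward(self, x):                 # x: (batch, channels, H, W)
        return self.head(self.conv(x).flatten(1)).softmax(dim=-1)


# One branch per modality: an emotion model for sound (stand-in: a small MLP over spectral
# features), an LSTM for text, and three separate CNNs for limb, facial expression and eye movement.
sound_branch = nn.Sequential(nn.Linear(257, 64), nn.ReLU(),
                             nn.Linear(64, NUM_ATTRIBUTES), nn.Softmax(dim=-1))
text_branch = TextBranch()
limb_branch, face_branch, eye_branch = ImageBranch(), ImageBranch(), ImageBranch()
```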
For example, the preliminary attribute classification result of the text modal data output by the long-short term memory neural network may be: { bad school performance: 6%, behavioral problems in children or young age: 4%, speech disorders: 7%, transfer with the environment: 3%, delusions: 2%, illusion: 8%, irritability mood: 2.5%, suicidal ideation or behavior: 7.5%, psychomotor retardation: 4.5%, anxiety: 5.5%, panic attacks: 3.5%, avoidance behavior: 6.5%, etiology involves trauma or a source of psychosocial stress: 1.5%, somatic complaints or disease/appearance anxiety: 8.5%, appetite altered or abnormal feeding behaviour: 2.3%, insomnia: 2.7%, somnolence: 2.2%, sexual dysfunction: 2.8%, attack behavior: 1.3%, impulse or impulse control problem: 1.7%, self-injury or self-residual: 1.4%, excessive substance use: 1.6%, memory loss: 1.9%, cognitive impairment: 2.1%, somatic disease as etiology: 9% and others: 1% }.
The preliminary attribute classification result of the sound modality data output by the emotion classification model may be: { speech disorders: 1%, symptoms of stress: 9%, elevated or inflated mood: 12%, irritability mood: 8%, depressed mood: 7%, anxiety: 3%, etiology involves trauma or a source of psychosocial stress: 6%, somatic complaints or illness/appearance anxiety: 14%, attack behavior: 5%, impulse or impulse control problem: 25%, cognitive impairment: 8.5% and others: 1.5% }.
The preliminary attribute classification result of the limb modal data output by the first convolutional neural network may be: { bad school performance: 10%, behavioral problems in children or young age: 1%, transfer with the environment: 9%, delusions: 2%, illusion: 8%, symptoms of stress: 3%, suicide concept or behavior: 7%, psychomotor retardation: 4%, panic attacks: 6%, avoidance behavior: 2.5%, etiology involves trauma or a source of psychosocial stress: 7.5%, attack behavior: 3.5%, impulse or impulse control problem: 6.5%, self-injury or self-residual: 15%, excessive substance use: 13.5% and others: 1.5% }.
The preliminary attribute classification result of the facial expression modal data output by the second convolutional neural network may be: { symptoms of stress: 10%, elevated or inflated mood: 2%, irritability mood: 8%, depressed mood: 7%, anxiety: 3%, etiology involves trauma or a source of psychosocial stress: 6%, somatic complaints or illness/appearance anxiety: 4%, insomnia: 5%, somnolence: 15%, suicide concept or behavior: 7.5%, psychomotor retardation: 2.5%, avoidance behavior: 11.5%, panic attacks: 8.5%, sexual dysfunction: 9% and others: 1% }.
The preliminary attribute classification result of the eye movement mode data output by the third convolutional neural network may be: { bad school performance: 30%, behavioral problems in children or young adulthood: 10%, depressed mood: 13%, psychomotor retardation: 17%, avoidance behavior: 25% and others: 5% }.
That is, the multi-modal mental disease assessment method of the application respectively processes the vector matrix through a plurality of sub-networks of the neural network model to output corresponding preliminary attribute classification results, so that the subsequent integration of the classification results is facilitated.
Finally, the multi-mode mental disease assessment method integrates the preliminary attribute classification result to obtain an attribute classification result. The attribute classification results in the embodiment of the application can be the preliminary attribute classification results corresponding to different modal data to be integrated, so that a new set without repetition and with various attribute classification results is obtained.
The attribute classification result is cooperatively regulated by two five-dimensional parameters K and θ, where K = [k1, k2, k3, k4, k5] and θ = [θ1, θ2, θ3, θ4, θ5]. For each modality, k specifies the minimum number of categories to take from the corresponding preliminary attribute classification result, and θ is a preset percentage threshold for that preliminary attribute classification result.
The integration process of the preliminary attribute classification results may be as follows. For each modality, the categories are first sorted by percentage from high to low, and n is the number of categories whose percentage is greater than or equal to θ. If n ≥ k, the first n categories are taken from the sorted classification result; if n < k, the first k categories are taken. The per-modality category sets obtained in this way are then integrated into a new, non-repeating set of categories, which is the attribute classification result. Specifically, the process of integrating the preliminary attribute classification results to obtain the attribute classification result is as follows:
First, for the text modal data, the ranked preliminary attribute classification result may be, for example: { somatic disease is causative: 9%, somatic complaints or illness/appearance anxiety: 8.5%, illusion: 8%, suicide concept or behavior: 7.5%, speech disorders: 7%, avoidance behavior: 6.5%, bad school performance: 6%, anxiety: 5.5%, psychomotor retardation: 4.5%, behavioral problems in children or young age: 4%, panic attacks: 3.5%, transfer with environment: 3%, sexual dysfunction: 2.8%, insomnia: 2.7%, irritability mood: 2.5%, appetite altered or abnormal feeding behaviour: 2.3%, somnolence: 2.2%, cognitive impairment: 2.1%, delusions: 2%, memory loss: 1.9%, impulse or impulse control problem: 1.7%, excessive substance use: 1.6%, etiology involves trauma or a source of psychosocial stress: 1.5%, self-wound or self-residual: 1.4%, attack behavior: 1.3% and others: 1% }. Assuming θ1 = 7% and k1 = 3, the preliminary attribute classification results with a percentage greater than or equal to θ1 are: { somatic disease is causative: 9%, somatic complaints or illness/appearance anxiety: 8.5%, illusion: 8%, suicide concept or behavior: 7.5%, speech disorders: 7% }, i.e. n1 = 5 and n1 ≥ k1, so the first 5 categories are taken.
That is, the set of categories derived from the text modality data is: { somatic disorders as etiology, somatic complaints or disorder/appearance anxiety, hallucinations, suicidal ideation or behavior, and speech disorders }.
Secondly, for the sound modality data, the sorted preliminary attribute classification result may be, for example: { impulse or impulse control problem: 25%, somatic complaints or illness/appearance anxiety: 14%, elevated or inflated mood: 12%, symptoms of stress: 9%, cognitive impairment: 8.5%, irritability mood: 8%, depressed mood: 7%, etiology involves trauma or a source of psychosocial stress: 6%, attack behavior: 5%, anxiety: 3%, others: 1.5% and speech disorders: 1% }. Assuming θ2 = 20% and k2 = 2, the preliminary attribute classification result with a percentage greater than or equal to θ2 is: { impulse or impulse control problem: 25% }, i.e. n2 = 1 and n2 < k2, so the first 2 categories are taken.
That is, the set of categories derived from the sound modality data is: { problem of impulse or impulse control and physical complaint or illness/physical anxiety }.
Then, for the limb modality data, the ranked preliminary attribute classification result may be, for example: { self-injury or self-residual: 15%, excessive substance use: 13.5%, bad school performance: 10%, transfer with environment: 9%, illusion: 8%, etiology involves trauma or a source of psychosocial stress: 7.5%, suicide concept or behavior: 7%, impulse or impulse control problem: 6.5%, panic attacks: 6%, psychomotor retardation: 4%, attack behavior: 3.5%, symptoms of stress: 3%, avoidance behavior: 2.5%, delusions: 2%, others: 1.5% and behavioral problems in children or young age: 1% }. Assuming θ3 = 10% and k3 = 1, the preliminary attribute classification results with a percentage greater than or equal to θ3 are: { self-injury or self-residual: 15%, excessive substance use: 13.5%, bad school performance: 10% }, i.e. n3 = 3 and n3 ≥ k3, so the first 3 categories are taken.
That is, the set of categories derived from limb modality data is: { self-injury or self-residual, excessive substance use and poor school performance }.
Next, for the facial expression modal data, the ranked preliminary attribute classification result may be, for example: { somnolence: 15%, avoidance behavior: 11.5%, symptoms of stress: 10%, sexual dysfunction: 9%, panic attacks: 8.5%, irritability mood: 8%, suicide concept or behavior: 7.5%, depressed mood: 7%, etiology involves trauma or a source of psychosocial stress: 6%, insomnia: 5%, somatic complaints or illness/appearance anxiety: 4%, anxiety: 3%, psychomotor retardation: 2.5%, elevated or inflated mood: 2% and others: 1% }. Assuming θ4 = 11% and k4 = 1, the preliminary attribute classification results with a percentage greater than or equal to θ4 are: { somnolence: 15%, avoidance behavior: 11.5% }, i.e. n4 = 2 and n4 ≥ k4, so the first 2 categories are taken.
That is, the set of categories derived from the limb facial expression modality data is: { sleepiness and avoidance behavior }.
Finally, for the eye movement mode data, the sorted preliminary attribute classification result may be, for example: { bad school performance: 30%, avoidance behavior: 25%, psychomotor retardation: 17%, depressed mood: 13%, behavioral problems in children or young adulthood: 10%, and others: 5% }. Assuming θ5 = 24% and k5 = 2, the preliminary attribute classification results with a percentage greater than or equal to θ5 are: { bad school performance: 30%, avoidance behavior: 25% }, i.e. n5 = 2 and n5 ≥ k5, so the first 2 categories are taken.
That is, the category set obtained from the eye movement modality data is: { bad school performance and avoidance behavior }.
Therefore, the integrated attribute classification result is: { somatic disorders as etiology, somatic complaints or disease/appearance anxiety, hallucinations, suicidal ideation or behavior, speech disorders, impulsivity or impulse control problems, self-injury or self-disability, excessive substance use, poor school performance, somnolence and avoidance behavior }.
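The selection rule governed by K and θ that was just worked through can be sketched as follows. The function assumes each preliminary result is a label-to-percentage mapping, which is an illustrative representation rather than the disclosed data format.

```python
def select_categories(preliminary, theta, k):
    """Keep categories with percentage >= theta; if fewer than k qualify, keep the top k."""
    ranked = sorted(preliminary.items(), key=lambda kv: kv[1], reverse=True)
    n = sum(1 for _, pct in ranked if pct >= theta)
    take = n if n >= k else k
    return {label for label, _ in ranked[:take]}


def integrate(per_modality, thetas, ks):
    """Apply the rule per modality, then take the de-duplicated union (the attribute classification result)."""
    result = set()
    for (name, prelim), theta, k in zip(per_modality.items(), thetas, ks):
        result |= select_categories(prelim, theta, k)
    return result


# Reproducing the eye movement example above: theta5 = 24, k5 = 2.
eye = {"bad school performance": 30, "avoidance behavior": 25, "psychomotor retardation": 17,
       "depressed mood": 13, "behavioral problems in children or young age": 10, "others": 5}
print(select_categories(eye, theta=24, k=2))   # {'bad school performance', 'avoidance behavior'}
```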
That is, the multi-modal mental disease assessment method in the embodiment of the present application obtains the attribute classification result by integrating the preliminary attribute classification result obtained according to the modal data of the target object, as a key basis for guiding the target object to complete the subsequent mental disease assessment.
Therefore, the multi-mode mental disease evaluation method processes and integrates the sound mode data, the text mode data and the image mode data of the target object through a plurality of sub-networks of the neural network model to obtain the attribute classification result, and the attribute classification result is used as a key basis for guiding the target object to complete subsequent mental disease evaluation.
Referring to FIG. 3, in some embodiments, step 041 includes:
0411: carrying out Fourier transform on the voice modal data to obtain a corresponding vector matrix;
0412: converting the text modal data into a corresponding vector matrix through a natural language processing encoder;
0413: and extracting time slices from the video images of the image modal data to obtain a corresponding vector matrix.
The processor is used for carrying out Fourier transform on the sound mode data to obtain a corresponding vector matrix; converting the text modal data into a corresponding vector matrix through a natural language processing encoder; and extracting time slices from the video images of the image modal data to obtain a corresponding vector matrix.
Specifically, first, the multi-modal mental disease evaluation method of the present application performs fourier transform on sound modal data to obtain a corresponding vector matrix. The fourier transform in the embodiments of the present application may represent the acoustic mode data function satisfying a certain condition as a trigonometric function or a linear combination of the integrals of the functions. The acoustic mode data is continuous waveform data, and the fourier transform of the present application may be continuous fourier transform, or may be another form of transform, and is not limited herein. That is, the multi-modal mental disease assessment method of the application vectorizes sound modal data through Fourier transformation, is favorable for unified and efficient processing of vector data, and improves the efficiency of multi-modal mental disease assessment.
Secondly, the multi-modal mental disease assessment method converts text modal data into a corresponding vector matrix through a natural language processing encoder. The model architecture of natural language processing can be divided into three parts, encoder, context and decoder. The encoder for natural language processing in the embodiment of the application can segment and extract features of text modal data and then convert the text modal data into a text vector matrix. That is, the multi-modal mental disease assessment method utilizes the encoder of natural language processing to vectorize text modal data, realizes automatic text classification and important label extraction, realizes text data monitoring, and enables natural language processing to achieve high precision and high efficiency.
Finally, the multi-mode mental disease evaluation method extracts time slices from video images of the image mode data to obtain a corresponding vector matrix. The tool for extracting the time slice from the video image in the application can be Photoshop software or other modes, and is not limited herein.
Thus, the multi-modal mental disease assessment method realizes conversion of modal data into a vector matrix by utilizing a Fourier transform method, a natural language coding function and a video image extraction slicing function. The multi-modal mental disease assessment method converts modal data into a vector matrix, and is beneficial to obtaining a preliminary attribute classification result through subsequent processing.
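The three preprocessing routes described above (Fourier transform for sound, an encoder for text, time slices for video) could be sketched as follows. The use of numpy and OpenCV, the frame sizes, and the hashing trick standing in for a learned natural-language-processing encoder are all illustrative assumptions.

```python
import numpy as np
import cv2  # assumption: OpenCV is used for frame extraction; any video library would do


def sound_to_matrix(waveform, frame_len=512, hop=256):
    """Short-time Fourier transform: each row is the magnitude spectrum of one frame.
    Assumes the waveform is a float array longer than frame_len."""
    frames = [waveform[i:i + frame_len] for i in range(0, len(waveform) - frame_len, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))


def text_to_matrix(text, dim=128):
    """Stand-in for the NLP encoder: one hashed one-hot row per token.
    (A real system would use a learned encoder; this hashing trick is an illustrative assumption.)"""
    tokens = text.split()
    matrix = np.zeros((len(tokens), dim))
    for row, token in enumerate(tokens):
        matrix[row, hash(token) % dim] = 1.0
    return matrix


def video_to_matrix(path, every_n_seconds=1.0):
    """Extract one time slice (frame) every N seconds and stack the flattened frames."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    slices, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % max(int(fps * every_n_seconds), 1) == 0:
            slices.append(cv2.resize(frame, (64, 64)).astype(np.float32).ravel())
        index += 1
    cap.release()
    return np.stack(slices) if slices else np.empty((0, 64 * 64 * 3))
```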
Referring to fig. 4, in some embodiments, step 02 includes:
021: acquiring sound frequency mode data, voice mode data, intonation mode data and voiceprint mode data of human-computer interaction of a target object through a sound acquisition device;
022: acquiring text modal data of human-computer interaction of a target object through a user input device and/or a voice recognition technology;
023: and acquiring limb mode data, facial expression mode data and eye movement mode data of the target object for human-computer interaction through an image acquisition device.
The processor is used for acquiring sound frequency mode data, voice mode data, intonation mode data and voiceprint mode data of human-computer interaction of the target object through the sound acquisition device; acquiring text modal data of human-computer interaction of a target object through a user input device and/or a voice recognition technology; and acquiring limb mode data, facial expression mode data and eye movement mode data of the target object for human-computer interaction through an image acquisition device.
Specifically, firstly, the multi-mode mental disease evaluation method of the application acquires sound frequency mode data, voice mode data, intonation mode data and voiceprint mode data of human-computer interaction of a target object through a sound acquisition device. The man-machine interaction in the embodiment of the application may be the interaction between the target object and the mechanical device, and the mechanical device may include, for example, a dialogue robot and an artificial intelligence system, or may be other interactions, which is not limited herein. It will be appreciated that the mechanical device may include a sound recording device, a keyboard, and a camera to obtain audio, text, and video data of the target object.
The sound collection device in the embodiment of the present application may be a sound recorder, or may be other sound recording devices, which is not limited herein. The sound mode data comprises sound frequency mode data, voice mode data, intonation mode data and voiceprint mode data. The sound frequency mode data, the voice mode data, the intonation mode data and the voiceprint mode data can be obtained through an audio analysis module of the mechanical equipment, and can also be obtained through other modes, and the method is not limited herein. The sound frequency refers to the speed of the dialogue, and can be used for judging the reaction speed of the target object. Speech and intonation refer to the intonation of a conversation and may be used to determine the emotional change of a target subject in the conversation process. Voiceprints are acoustic spectra carrying speech information, have specificity and relative stability, and can be used for judging the identity of a target object. That is, the multi-modal mental illness evaluation method obtains the voice frequency mode data, the voice mode data, the intonation mode data and the voiceprint mode data of the target object through the voice acquisition device, and processes the voice frequency mode data, the voice mode data, the intonation mode data and the voiceprint mode data to obtain the clearer and complete dialogue content and emotion change of the target object in the human-computer interaction process, so that the multi-modal mental illness evaluation method is more accurate.
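As one way to picture how sound frequency (speaking rate) and intonation could be quantified from the recorded waveform, the following numpy-only sketch computes two rough proxies. The threshold values and the autocorrelation-based pitch estimate are illustrative assumptions, not the disclosed audio analysis module.

```python
import numpy as np


def speaking_rate_proxy(waveform, sr, frame_len=400, energy_thresh=0.02):
    """Fraction of frames containing speech energy: a rough proxy for how fast or how much
    the target object is speaking (assumes the waveform is normalized to [-1, 1])."""
    frames = [waveform[i:i + frame_len] for i in range(0, len(waveform) - frame_len, frame_len)]
    energies = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    return float(np.mean(energies > energy_thresh))


def pitch_proxy(waveform, sr, fmin=75, fmax=400):
    """Crude fundamental-frequency estimate via autocorrelation, usable as an intonation feature."""
    corr = np.correlate(waveform, waveform, mode="full")[len(waveform) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(corr[lo:hi]))
    return sr / lag
```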
Secondly, the multi-modal mental illness assessment method obtains text modal data of human-computer interaction of the target object through a user input device and/or a voice recognition technology (Automatic Speech Recognition, ASR). That is, the text mode data may be obtained by means of a user input device or a voice recognition technology, or may be obtained by means of both a user input device and a voice recognition technology. The voice recognition technology can acquire text mode data from audio conversion characters of the recording equipment.
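As one possible way to obtain text modal data from the recorded audio of the dialogue, the sketch below uses the open-source SpeechRecognition package with its Google recognizer backend; both choices are assumptions for illustration and are not part of the disclosure.

```python
import speech_recognition as sr  # assumption: the SpeechRecognition package is available


def audio_file_to_text(path, language="zh-CN"):
    """Convert a recorded WAV file of the human-computer dialogue into text modal data."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return ""  # speech was unintelligible
```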
The user input device includes a keyboard, touch screen, or other device. Specifically, in the embodiment of the present application, the text mode data may be obtained by inputting a text through a keyboard by a user, or may be obtained by handwriting input of a text through a touch screen.
Finally, the multi-mode mental disease evaluation method acquires limb mode data, facial expression mode data and eye movement mode data of human-computer interaction of a target object through an image acquisition device. The image capturing device in this embodiment of the present application may be a camera, which includes an internal camera and an external camera, or may be other devices, which is not limited herein. The limb modality data, facial modality data, and eye movement modality data may be obtained by a video analysis module of the mechanical device. According to the method, the acquired limb mode data, facial mode data and eye movement mode data show the mental state activity and the behavior activity of the target object in the human-computer interaction process on the whole, and a more accurate diagnosis basis can be provided for the multi-mode mental disease assessment method.
Therefore, the multi-modal mental disease assessment method is used for acquiring different modal data of the target object through the sound acquisition device, the user input device and the image acquisition device, and providing enough diagnosis data for multi-modal mental disease assessment, so that the multi-modal mental disease assessment result is more accurate.
In certain embodiments, step 042 comprises:
0421: and processing vector matrixes corresponding to the voice frequency modal data, the voice modal data, the intonation modal data and the voiceprint modal data through the emotion classification model to output classification results of all the attributes in the preset attribute set, and obtaining a preliminary attribute classification result corresponding to the voice modal data.
The processor processes vector matrixes corresponding to the voice frequency modal data, the voice modal data, the intonation modal data and the voiceprint modal data through the emotion classification model to output classification results of all the attributes in the preset attribute set, and a preliminary attribute classification result corresponding to the voice modal data is obtained.
Specifically, the preset attribute set is: { emotion, behavior, cognition, event }. For example: the preliminary attribute classification result output by the neural network corresponding to the eye movement mode data is as follows: { emotion: depression, behavior: tic disorder, cognition: delay, event: insomnia }. The preliminary attribute classification result output by the neural network corresponding to the text modal data is as follows: { emotion: depression, behavior: tic disorder, cognition: delay, event: insomnia }. Specific content corresponding to the attribute of the preset attribute set corresponding to each mode data is different.
The corresponding preset attribute set of the sound modality data in the embodiment of the present application may be, for example: { speech disorders, catatonic symptoms, elevated or expanded mood, irritable mood, depressed mood, anxiety, etiology involving traumatic or psycho-social stressors, somatic complaints or disease/physical anxiety, aggression, impulse or impulse control problems, cognitive impairment, others }.
According to the multi-modal mental disease assessment method, the vector matrices corresponding to the sound frequency modality data, the voice modality data, the intonation modality data and the voiceprint modality data are processed by the emotion classification model, a classification result is output for each attribute in the preset attribute set, and the preliminary attribute classification result corresponding to the sound modality data is obtained. In detail, the emotion classification model first classifies the sound frequency modality data, the voice modality data, the intonation modality data and the voiceprint modality data to obtain an emotion classification result, and the emotion classification result is then further classified according to the weight ratio of each attribute in the preset attribute set corresponding to the sound modality data, so as to obtain the preliminary attribute classification result corresponding to the sound modality data.
The emotion classification result may be, for example: { sorrow: 0.8%, curiosity: 0.2%, depression: 0.9%, determination: 0.1%, worry: 0.7%, surprise: 0.3%, guilt: 0.6%, complaining: 0.4%, sadness: 1.1%, restlessness: 1.9%, dejection: 1.2%, confidence: 1.8%, upset: 1.3%, annoyance: 1.7%, heartbreak: 1.4%, disappointment: 1.6%, anxiety: 0.5%, disorientation: 1.5%, cynicism: 2.1%, self-blame: 2.9%, regret: 2.2%, expectation: 2.8%, doubt: 2.3%, discouragement: 2.7%, pain: 2.4%, panic: 2.6%, vexation: 3.1%, apprehension: 3.9%, tension: 3.2%, dysphoria: 3.8%, puzzlement: 3.3%, distress: 3.7%, despair: 3.4%, anger: 3.6%, calm: 2.5%, remorse: 3.5%, grievance: 4%, disgust: 6%, suspicion: 2%, fear: 8%, happiness: 1%, guilt: 7% }.
The emotion classification result is then further classified according to the weight ratio of each attribute in the preset attribute set corresponding to the sound modality data, to obtain the preliminary attribute classification result corresponding to the sound modality data. The preliminary attribute classification result may be, for example: { speech disorders: 1.734%, catatonic symptoms: 8%, elevated or expanded mood: 10%, irritable mood: 6%, depressed mood: 14%, anxiety: 15%, etiology involving traumatic or psycho-social stressors: 5%, somatic complaints or disease/physical anxiety: 11%, aggression: 9%, impulse or impulse control problems: 7%, cognitive impairment: 13%, others: 0.266% }.
The specific process of obtaining the preliminary attribute classification result from the emotion classification result may be as follows. The emotion classification result contains 42 emotions, so the weight of each attribute in the preset attribute set is a vector of dimension 42: the weight vector of "speech disorders" is w_1 = (w_1(1), w_1(2), ..., w_1(42)), the weight vector of "catatonic symptoms" is w_2 = (w_2(1), w_2(2), ..., w_2(42)), and so on, up to the weight vector of "others", w_12 = (w_12(1), w_12(2), ..., w_12(42)). Here w_i(1) is the weight of "sorrow" in vector w_i, w_i(2) is the weight of "curiosity" in vector w_i, and so on, with w_i(42) being the weight of "guilt" in vector w_i. That is, w_1 is the weight vector of "speech disorders", w_2 is the weight vector of "catatonic symptoms", and so on, with w_12 being the weight vector of "others".
Assuming that the weight vector of "speech disorders" is w_1 = [7%, 1%, 8%, 2%, 6%, 4%, 3.5%, 2.5%, 3.6%, 3.4%, 3.7%, 3.3%, 3.8%, 3.2%, 3.9%, 3.1%, 2.6%, 2.4%, 2.7%, 2.3%, 2.8%, 2.2%, 2.9%, 2.1%, 1.5%, 0.5%, 1.6%, 1.4%, 1.7%, 1.3%, 1.8%, 1.2%, 1.9%, 1.1%, 0.4%, 0.6%, 0.3%, 0.7%, 0.1%, 0.9%, 0.2%, 0.8%], the proportion of "speech disorders" in the preliminary attribute classification result is the sum of the products of each entry of this weight vector with the proportion of the corresponding emotion in the emotion classification result, which gives 1.734%. The proportions of the remaining attributes in the preliminary attribute classification result are calculated in the same way, each with its own weight vector, so the proportions of the attributes in the preliminary attribute classification result are not identical.
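For illustration, this weighted re-classification reduces to a dot product (or, for all 12 attributes at once, a matrix-vector product). The following Python sketch reproduces the "speech disorders" figure above; the variable names are assumptions for this example.

import numpy as np

# Emotion proportions from the emotion classification model (42 entries, summing to 100%)
emotion_probs = np.array([0.8, 0.2, 0.9, 0.1, 0.7, 0.3, 0.6, 0.4,
                          1.1, 1.9, 1.2, 1.8, 1.3, 1.7, 1.4, 1.6,
                          0.5, 1.5, 2.1, 2.9, 2.2, 2.8, 2.3, 2.7,
                          2.4, 2.6, 3.1, 3.9, 3.2, 3.8, 3.3, 3.7,
                          3.4, 3.6, 2.5, 3.5, 4.0, 6.0, 2.0, 8.0,
                          1.0, 7.0]) / 100.0

# Weight vector w_1 of the attribute "speech disorders" (42 entries, summing to 100%)
w1 = np.array([7, 1, 8, 2, 6, 4, 3.5, 2.5, 3.6, 3.4, 3.7, 3.3, 3.8, 3.2,
               3.9, 3.1, 2.6, 2.4, 2.7, 2.3, 2.8, 2.2, 2.9, 2.1, 1.5, 0.5,
               1.6, 1.4, 1.7, 1.3, 1.8, 1.2, 1.9, 1.1, 0.4, 0.6, 0.3, 0.7,
               0.1, 0.9, 0.2, 0.8]) / 100.0

# Proportion of "speech disorders" in the preliminary attribute classification result
speech_disorders = float(np.dot(w1, emotion_probs))
print(f"{speech_disorders:.5f}")  # 0.01734, i.e. 1.734%

# Stacking one weight vector per attribute into a 12 x 42 matrix W would give the whole
# preliminary attribute classification result as W @ emotion_probs.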
That is, the emotion classification model processes the vector matrices corresponding to the sound modality data to obtain a classification result for each attribute in the preset attribute set, and each attribute together with its classification result then forms a set, giving the preliminary attribute classification result corresponding to the sound modality data.
In this way, the emotion classification model processes the vectorized sound modality data and provides the preliminary attribute classification result corresponding to the sound modality data for the multi-modal mental disease assessment method.
Therefore, the multi-modal mental disease assessment method processes the vector matrices corresponding to the sound modality data through the emotion classification model to obtain the preliminary attribute classification result, which facilitates integrating the preliminary attribute classification result corresponding to the sound modality data with those of the other modalities.
In certain embodiments, step 042 comprises:
0422: and processing the vector matrix corresponding to the text modal data through the long-short-term memory neural network to output the classification result of each attribute in the preset attribute set, and obtaining the preliminary attribute classification result corresponding to the text modal data.
The processor processes the vector matrix corresponding to the text modal data through the long-term and short-term memory neural network to output the classification result of each attribute in the preset attribute set, and the preliminary attribute classification result corresponding to the text modal data is obtained.
Specifically, the multi-modal mental disease assessment method processes the vector matrix corresponding to the text modality data through the long short-term memory neural network to output a classification result for each attribute in the preset attribute set, and obtains a preliminary attribute classification result corresponding to the text modality data. A long short-term memory network (LSTM) is a special recurrent neural network that can capture long-term dependencies and memorize long-term information, combining the information retained from previous states to process the current task. The long short-term memory neural network in the embodiment of the present application can extract features from the ordered text vector matrix in both the forward and backward directions of the sequence to obtain the preliminary attribute classification result of the text modality data over the corresponding preset attribute set.
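As a non-limiting sketch, a bidirectional LSTM of this kind can be written in PyTorch as follows; the embedding size, hidden size and the 26-attribute output dimension are assumptions for this example rather than values specified by the present application.

import torch
import torch.nn as nn

class TextAttributeClassifier(nn.Module):
    # Illustrative bidirectional LSTM over a text vector matrix of shape (seq_len, embed_dim).
    def __init__(self, embed_dim=300, hidden_dim=128, num_attributes=26):
        super().__init__()
        # bidirectional=True covers both the forward and the backward direction of the sequence
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, num_attributes)

    def forward(self, x):              # x: (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(x)     # h_n: (2, batch, hidden_dim)
        h = torch.cat([h_n[0], h_n[1]], dim=-1)      # concatenate final forward/backward states
        return torch.softmax(self.head(h), dim=-1)   # proportions over the preset attribute set

# Usage: probs = TextAttributeClassifier()(torch.randn(1, 50, 300))  # one 50-token text vector matrix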
As described above, the preset attribute set is: { emotion, behavior, cognition, event }. For example: the preliminary attribute classification result output by the neural network corresponding to the eye movement mode data is as follows: { emotion: depression, behavior: tic disorder, cognition: delay, event: insomnia }. The preliminary attribute classification result output by the neural network corresponding to the text modal data is as follows: { emotion: depression, behavior: tic disorder, cognition: delay, event: insomnia }. The specific content of the preset attribute set corresponding to each mode data is different.
The preset attribute set corresponding to the text modal data in the embodiment of the present application may be, for example: { bad school performance, behavioral problems in children or young years, speech disturbances, mood swings, delusions, hallucinations, irritability mood, suicidal ideation or behavior, psychomotor retardation, anxiety, panic attacks, avoidance behavior, etiology involving traumatic or psychosocial stressors, somatic complaints or disease/appearance anxiety, appetite changing or abnormal eating behavior, insomnia, somnolence, sexual dysfunction, aggression, impulsivity or impulse control problems, self-injury or self-disability, excessive substance use, memory loss, cognitive impairment, somatic diseases as etiology and others }.
The preliminary attribute classification result corresponding to the text modal data may be, for example: { bad school performance: behavioral problems in 6%, children or young age: 4%, speech disorders: 7%, transfer with the environment: 3%, delusions: 2%, illusion: 8%, irritability mood: 2.5%, suicidal ideation or behavior: 7.5%, psychomotor retardation: 4.5%, anxiety: 5.5%, panic attacks: 3.5%, avoidance behavior: 6.5% etiology involves trauma or a source of psychosocial stress: 1.5%, somatic complaints or disease/appearance anxiety: 8.5%, appetite altered or abnormal feeding behaviour: 2.3%, insomnia: 2.7%, somnolence: 2.2%, sexual dysfunction: 2.8%, attack behavior: 1.3%, impulse or impulse control problem: 1.7%, self-injury or self-residual: 1.4%, excessive substance use: 1.6%, memory loss: 1.9%, cognitive impairment: 2.1%, somatic disease as etiology: 9% and others: 1% }.
That is, the long-term and short-term memory neural network processes the vector matrix corresponding to the text modal data to obtain the classification result of each attribute in the preset attribute set, and then each attribute and the classification result thereof form a set to obtain the preliminary attribute classification result corresponding to the text modal data.
In the multi-modal mental disease assessment method, the text vector matrix is processed by the long short-term memory neural network to obtain a preliminary attribute classification result, which cooperates with the preliminary attribute classification result of the sound modality data for diagnosis, improving the statistical efficiency of the text modality data and thereby further improving the accuracy of the multi-modal mental disease assessment method.
Therefore, the multi-modal mental disease assessment method of the present application processes the vectorized text modality data through the long short-term memory neural network to obtain the preliminary attribute classification result, which is beneficial for integrating the preliminary attribute classification result of the text modality data with the preliminary attribute classification result corresponding to the sound modality data and the preliminary attribute classification result corresponding to the image modality data, and improves the diagnosis efficiency and diagnosis accuracy for the target object.
Referring to fig. 5, in some embodiments, step 042 includes:
0423: processing a vector matrix corresponding to the limb modal data through a first convolutional neural network to output classification results of all the attributes in a preset attribute set, and obtaining a preliminary attribute classification result corresponding to the limb modal data;
0424: processing a vector matrix corresponding to the facial expression modal data through a second convolutional neural network to output classification results of all the attributes in the preset attribute set, and obtaining a preliminary attribute classification result corresponding to the facial expression modal data;
0425: and processing the vector matrix corresponding to the eye movement mode data through a third convolution neural network to output classification results of all the attributes in the preset attribute set, and obtaining a preliminary attribute classification result corresponding to the eye movement mode data.
The processor processes the vector matrix corresponding to the limb modal data through the first convolutional neural network to output classification results of all the attributes in the preset attribute set, and a preliminary attribute classification result corresponding to the limb modal data is obtained; processing a vector matrix corresponding to the facial expression modal data through a second convolutional neural network to output classification results of all the attributes in the preset attribute set, and obtaining a preliminary attribute classification result corresponding to the facial expression modal data; and processing the vector matrix corresponding to the eye movement mode data through a third convolution neural network to output classification results of all the attributes in the preset attribute set, and obtaining a preliminary attribute classification result corresponding to the eye movement mode data.
Specifically, firstly, the multi-modal mental disease assessment method processes the vector matrix corresponding to the limb modality data through the first convolutional neural network to output a classification result for each attribute in the preset attribute set, and obtains a preliminary attribute classification result corresponding to the limb modality data. A convolutional neural network (Convolutional Neural Network, CNN) is a feed-forward neural network with a deep structure that involves convolution operations and can be used for video processing. The limb modality data may include content such as the body language and behavioral intentions of the target object.
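As a further non-limiting sketch, such a convolutional classifier over time-sliced video frames can be written in PyTorch as follows; the layer sizes and the 15-attribute output dimension are assumptions for this example rather than values specified by the present application.

import torch
import torch.nn as nn

class LimbAttributeCNN(nn.Module):
    # Illustrative CNN over a time-sliced video frame for limb-modality classification.
    def __init__(self, num_attributes=15):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_attributes)

    def forward(self, frames):                # frames: (batch, 3, H, W)
        z = self.features(frames).flatten(1)  # (batch, 32)
        return torch.softmax(self.head(z), dim=-1)  # proportions over the preset attribute set

# The second and third convolutional neural networks for the facial expression and eye movement
# modality data could follow the same pattern, each with its own attribute dimension.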
As described above, the preset attribute set is: { emotion, behavior, cognition, event }. For example: the preliminary attribute classification result output by the neural network corresponding to the eye movement mode data is as follows: { emotion: depression, behavior: tic disorder, cognition: delay, event: insomnia }. The preliminary attribute classification result output by the neural network corresponding to the text modal data is as follows: { emotion: depression, behavior: tic disorder, cognition: delay, event: insomnia }. The specific content of the preset attribute set corresponding to each mode data is different.
The preset attribute set corresponding to the limb modal data in the embodiment of the present application may be, for example: { bad school performance, behavioral problems in children or young years, delusions, hallucinations, catatonic symptoms, suicidal ideation or behavior, psychomotor retardation, panic attacks, avoidance behavior, etiology involving traumatic or psychosocial stressors, aggression, impulsivity or impulse control problems, self-injuries or self-disabilities, excessive substance use, and others }.
The preliminary attribute classification result corresponding to the limb modal data may be, for example: { bad school performance: behavioral problems in 10%, children or young age: 1%, transfer with the environment: 9%, delusions: 2%, illusion: 8%, symptoms of stress: 3%, suicide concept or behavior: 7%, psychomotor retardation: 4%, panic attacks: 6%, avoidance behavior: 2.5% etiology involves trauma or a source of psychosocial stress: 7.5% of attack behavior: 3.5%, impulse or impulse control problem: 6.5%, self-injury or self-residual: 15%, excessive substance use: 13.5% and others: 1.5% }.
That is, the first convolutional neural network processes the vector matrix corresponding to the limb modal data to obtain the classification result of each attribute in the preset attribute set, and then each attribute and the classification result thereof form a set to obtain the preliminary attribute classification result corresponding to the limb modal data.
The first convolutional neural network in the embodiment of the present application processes the vector matrix corresponding to the limb modality data, so that the body language and behavioral intentions of the target object can be jointly assessed according to the corresponding attribute classification result.
Secondly, the multi-modal mental disease assessment method processes the vector matrix corresponding to the facial expression modality data through the second convolutional neural network to output a classification result for each attribute in the preset attribute set, and obtains a preliminary attribute classification result corresponding to the facial expression modality data. The facial expression modality data may include content such as changes in the facial expression of the target object and changes in the target object's inner state.
As described above, the preset attribute set is: { emotion, behavior, cognition, event }. For example: the preliminary attribute classification result output by the neural network corresponding to the eye movement mode data is as follows: { emotion: depression, behavior: tic disorder, cognition: delay, event: insomnia }. The preliminary attribute classification result output by the neural network corresponding to the text modal data is as follows: { emotion: depression, behavior: tic disorder, cognition: delay, event: insomnia }. The specific content of the preset attribute set corresponding to each mode data is different.
The preset attribute set corresponding to the facial expression modal data in the embodiment of the present application may be, for example: { stress symptoms, elevated or swollen mood, irritable mood, depressed mood, anxiety, etiology involving traumatic or psychosocial stressors, somatic complaints or disease/appearance anxiety, insomnia, somnolence, suicidal ideation or behavior, psychomotor retardation, avoidance behavior, panic attacks, sexual dysfunction and others }.
The preliminary attribute classification result corresponding to the facial expression modal data may be, for example: { symptoms of stress: 10%, elevated or inflated mood: 2%, irritability mood: 8%, depressed mood: 7%, anxiety: 3%, etiology involves trauma or a source of psychosocial stress: 6%, somatic complaints or illness/appearance anxiety: 4%, insomnia: 5%, somnolence: 15%, suicide concept or behavior: 7.5%, psychomotor retardation: 2.5%, avoidance behavior: 11.5%, panic attacks: 8.5%, sexual dysfunction: 9% and others: 1% }.
That is, the second convolutional neural network processes the vector matrix corresponding to the facial expression modal data to obtain the classification result of each attribute in the preset attribute set, and then each attribute and the classification result thereof form a set to obtain the preliminary attribute classification result corresponding to the facial expression modal data.
The second convolutional neural network in the embodiment of the present application processes the vector matrix corresponding to the facial expression modality data, so that the changes in the facial expression and in the inner state of the target object can be jointly assessed according to the corresponding attribute classification result.
Finally, the multi-modal mental disease assessment method processes the vector matrix corresponding to the eye movement modality data through the third convolutional neural network to output a classification result for each attribute in the preset attribute set, and obtains a preliminary attribute classification result corresponding to the eye movement modality data. The eye movement modality data may include content such as the trajectory changes of the target object's eyeballs and the corresponding changes in the target object's inner state.
As described above, the preset attribute set is: { emotion, behavior, cognition, event }. For example: the preliminary attribute classification result output by the neural network corresponding to the eye movement mode data is as follows: { emotion: depression, behavior: tic disorder, cognition: delay, event: insomnia }. The preliminary attribute classification result output by the neural network corresponding to the text modal data is as follows: { emotion: depression, behavior: tic disorder, cognition: delay, event: insomnia }. The specific content of the preset attribute set corresponding to each mode data is different.
The preset attribute set corresponding to the eye movement mode data in the embodiment of the present application may be, for example: { bad school performance, behavioral problems in children or young children, depressed mood, psychomotor retardation, avoidance behavior and others }.
The preliminary attribute classification result corresponding to the eye movement mode data may be, for example: { bad school performance: behavioral problems in 30%, children or young adulthood: 10%, depressed mood: 13%, psychomotor retardation: 17%, avoidance behavior: 25% and others: 5% }.
That is, the third convolutional neural network processes the vector matrix corresponding to the eye movement mode data to obtain the classification result of each attribute in the preset attribute set, and then each attribute and the classification result thereof form a set to obtain the preliminary attribute classification result corresponding to the eye movement mode data.
The third convolutional neural network in the embodiment of the present application processes the vector matrix corresponding to the eye movement modality data, so that the trajectory changes of the target object's eyeballs and the changes in the target object's inner state can be jointly assessed according to the corresponding attribute classification result.
In this way, the multi-modal mental disease assessment method processes the limb modality data, the facial expression modality data and the eye movement modality data through convolutional neural networks, acquires the limb behavior, facial expression and eye movement data of the target object from the image perspective, comprehensively assesses the inner state of the target object, and improves the accuracy of the multi-modal mental disease diagnosis.
In certain embodiments, the multimodal mental disorder assessment method comprises:
and if the mental disease evaluation cannot be completed according to the decision tree diagram, invoking the discrimination table to cooperatively complete the mental disease evaluation.
Specifically, if the processor cannot complete the mental disease assessment according to the decision tree diagram, the processor invokes the discrimination table to cooperatively complete the mental disease assessment. The decision tree diagram may be a "tree-shaped" branch structure graph that is based on the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) and that guides the target object, with the help of psychology, through the evaluation and judgment of mental disorders. The discrimination table may be a discrimination standard table built from an existing training data set established by psychiatrists from actual diagnostic data, and is used to complete the mental disease assessment of the target object in cooperation with the decision tree diagram. Taking the decision tree diagram for self-injury or self-disability as an example, assume that its final leaf nodes include "stereotypic movement disorder", "malingering", "Section III: non-suicidal self-injury", "adjustment disorder" and the like. "Stereotypic movement disorder", "malingering" and "Section III: non-suicidal self-injury" have no subtype classification and can therefore be output directly as the result of the decision tree diagram. However, since subtype classification exists for "adjustment disorder", the discrimination table needs to be invoked for further analysis on the basis of "adjustment disorder".
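For illustration only, the leaf-node logic described in this example can be sketched as follows; the node names and the subtype list are illustrative assumptions rather than content of the decision tree diagrams themselves.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LeafNode:
    # Hypothetical leaf node of a decision tree diagram.
    diagnosis: str
    subtypes: List[str] = field(default_factory=list)

def resolve(leaf: LeafNode) -> Optional[str]:
    # A leaf without subtypes can be output directly as the decision tree result;
    # otherwise the discrimination table must be invoked for further analysis.
    return leaf.diagnosis if not leaf.subtypes else None

direct = resolve(LeafNode("non-suicidal self-injury"))                        # output directly
pending = resolve(LeafNode("adjustment disorder", ["with depressed mood"]))   # None: invoke the discrimination table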
When the discrimination table is invoked, the artificial intelligence system continues a purposeful dialogue with the patient to determine the final result. The way the final result is determined is essentially still to collect and classify the data of the patient's different modalities; the specific implementation is consistent with the foregoing, except that the data and the classification categories differ, and it is not repeated here.
Therefore, the multi-modal mental disease assessment method can complete the mental disease assessment of the target object by invoking the discrimination table in cooperation with the decision tree diagram, so that the mental disorder diagnosis result of the target object is more accurate.
As shown in fig. 6, the deep learning module includes the preprocessors and neural networks that process the respective modality data. In detail, the specific workflow of the deep learning module is as follows: the different modality data are converted into corresponding vector matrices in the corresponding preprocessors, the corresponding neural networks process the vector matrices to obtain the preliminary attribute classification results, and the preliminary attribute classification results are integrated to obtain the attribute classification result. The artificial intelligence system then locates and matches a suitable decision tree diagram using the keywords in the attribute classification result according to the condition of the target object, and guides the target object to complete the decision tree diagram with the help of psychology. If the mental disease assessment cannot be completed according to the decision tree diagram, the discrimination table is invoked to cooperatively complete the mental disease assessment.
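As one possible illustration, the integration step of this workflow can be pictured as a simple fusion of the preliminary attribute classification results produced by the sub-networks, assuming their attributes have been mapped to a shared vocabulary; the averaging rule and the example values below are assumptions for this sketch, not the integration rule prescribed by the present application.

from collections import defaultdict
from typing import Dict, List

def integrate(preliminary: List[Dict[str, float]]) -> Dict[str, float]:
    # Average the proportion of each attribute over the modalities that report it.
    totals, counts = defaultdict(float), defaultdict(int)
    for result in preliminary:
        for attribute, proportion in result.items():
            totals[attribute] += proportion
            counts[attribute] += 1
    return {attribute: totals[attribute] / counts[attribute] for attribute in totals}

# Example with two modalities sharing some attributes
fused = integrate([
    {"depressed mood": 0.14, "anxiety": 0.15, "speech disorders": 0.01734},
    {"depressed mood": 0.07, "anxiety": 0.03, "insomnia": 0.05},
])
# The keywords of the fused attribute classification result can then be used to locate
# and match a suitable decision tree diagram.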
The present application also provides a computer device. The computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the method of any of the above embodiments.
Specifically, the memory in the embodiments of the present application may be a storage medium with a storage function, such as an internal memory, an external memory, a random access memory (Random Access Memory, RAM), a read-only memory (Read-Only Memory, ROM) or a hard disk, or may be another memory, which is not limited herein. The processor may be a central processing unit (CPU) or a graphics processor, or may be another processor, which is not limited herein. The computer program may be in the form of source code, object code or an executable file, or in other forms, without limitation. That is, the computer device in the embodiment of the present application stores the modality data acquired in the human-computer interaction process in the memory in the form of a computer program, and the computer program is executed by the processor to implement the above-described multi-modal mental disease assessment method.
Thus, the computer device of the present application applies the above multi-modal mental disease assessment method: text data, voice data and video data from the interaction process of a mental disorder patient are processed by an artificial intelligence multi-modal technique, a decision tree diagram is completed with the cooperation of the neural networks and with the help of the training data set, and the final diagnosis result is obtained based on the DSM-5 diagnostic criteria. The multi-modal mental disease assessment method improves the diagnostic efficiency for mental disorders, protects patient privacy and alleviates the shortage of mental health professionals.
The present application also provides a computer-readable storage medium. The computer readable storage medium stores a computer program which, when executed by one or more processors, implements the method of any of the above embodiments.
For example, when executed by a processor, the computer program performs the following steps of the method:
02: acquiring sound mode data, text mode data and image mode data of a target object;
04: processing the sound mode data, the text mode data and the image mode data through the neural network model to obtain attribute classification results;
06: positioning the decision tree diagram according to the attribute classification result;
08: guiding the target object to complete the decision tree diagram to complete the mental disorder assessment.
It is understood that the computer readable storage medium in the embodiments of the present application may include a USB flash drive, a removable hard disk, a recording medium, a magnetic disk, an optical disk, a computer memory, and the like. That is, the computer readable storage medium in the embodiment of the present application stores the modality data acquired during the human-computer interaction in the form of a computer program, and the computer program is executed by a processor to implement the above-described multi-modal mental disease assessment method.
Thus, the computer readable storage medium of the present application applies the above multi-modal mental disease assessment method: text data, voice data and video data generated during the interaction between the target object and the machine device are processed by an artificial intelligence multi-modal technique, a decision tree diagram is completed with the cooperation of the neural networks and with the help of the training data set, and the final diagnosis result is obtained based on the DSM-5 diagnostic criteria. The multi-modal mental disease assessment method improves the diagnostic efficiency for mental disorders, protects patient privacy and alleviates the shortage of mental health professionals.
The foregoing examples represent only several embodiments of the present application and are described in relative detail, but they should not therefore be construed as limiting the scope of the present application. It should be noted that various modifications and improvements can be made by those skilled in the art without departing from the spirit of the present application, and such modifications and improvements fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be defined by the appended claims.

Claims (10)

1. A method of multimodal mental disorder assessment comprising:
acquiring sound modality data, text modality data and image modality data of a target object;
processing the sound modality data, the text modality data and the image modality data through a neural network model to obtain an attribute classification result;
positioning a decision tree diagram according to the attribute classification result;
directing the target object to complete the decision tree diagram to complete the mental disorder assessment.
2. The method for assessing a multi-modal mental disorder according to claim 1, wherein the processing the sound modality data, the text modality data, and the image modality data through a neural network model to obtain an attribute classification result comprises:
preprocessing the sound modal data, the text modal data and the image modal data respectively to obtain a corresponding vector matrix;
respectively processing the vector matrix through a plurality of sub-networks of the neural network model to output corresponding preliminary attribute classification results;
and integrating the preliminary attribute classification result to obtain the attribute classification result.
3. The method for assessing a mental disorder according to claim 2, wherein preprocessing the sound modality data, the text modality data, and the image modality data to obtain corresponding vector matrices includes:
performing Fourier transform on the sound modality data to obtain a corresponding vector matrix;
converting the text modal data into a corresponding vector matrix by a natural language processing encoder;
and extracting time slices from the video images of the image modal data to obtain a corresponding vector matrix.
4. The method of claim 2, wherein the sound modality data includes sound frequency modality data, voice modality data, intonation modality data, and voiceprint modality data, the image modality data includes limb modality data, facial expression modality data, and eye movement modality data, and the acquiring sound modality data, text modality data, and image modality data of the target object includes:
acquiring the voice frequency mode data, the voice mode data, the intonation mode data and the voiceprint mode data of human-computer interaction of the target object through a voice acquisition device;
acquiring the text modal data of human-computer interaction of the target object through a user input device and/or a voice recognition technology;
and acquiring the limb modal data, the facial expression modal data and the eye movement modal data of human-computer interaction of the target object through an image acquisition device.
5. The method for assessing a multi-modal mental disorder according to claim 4, wherein the processing the vector matrix to output the corresponding preliminary attribute classification results through the sub-networks of the neural network model, respectively, comprises:
and processing vector matrixes corresponding to the voice frequency modal data, the voice modal data, the intonation modal data and the voiceprint modal data through an emotion classification model to output classification results of all the attributes in a preset attribute set, and obtaining the preliminary attribute classification results corresponding to the voice modal data.
6. The method for assessing a multi-modal mental disorder according to claim 4, wherein the processing the vector matrix to output the corresponding preliminary attribute classification results through the sub-networks of the neural network model, respectively, comprises:
and processing the vector matrix corresponding to the text modal data through a long-short-term memory neural network to output classification results of all the attributes in a preset attribute set, and obtaining the preliminary attribute classification results corresponding to the text modal data.
7. The method for assessing a multi-modal mental disorder according to claim 4, wherein the processing the vector matrix to output the corresponding preliminary attribute classification results through the sub-networks of the neural network model, respectively, comprises:
processing a vector matrix corresponding to the limb modal data through a first convolutional neural network to output classification results of all the attributes in a preset attribute set, and obtaining the preliminary attribute classification results corresponding to the limb modal data;
processing a vector matrix corresponding to the facial expression modal data through a second convolutional neural network to output classification results of all the attributes in the preset attribute set, and obtaining the preliminary attribute classification results corresponding to the facial expression modal data;
and processing the vector matrix corresponding to the eye movement mode data through a third convolution neural network to output the classification result of each attribute in the preset attribute set, so as to obtain the preliminary attribute classification result corresponding to the eye movement mode data.
8. The method for assessing the mental disorder according to claim 1, wherein the method for assessing the mental disorder comprises:
and if the mental disease evaluation cannot be completed according to the decision tree diagram, invoking an identification form to cooperatively complete the mental disease evaluation.
9. A computer device, characterized in that it comprises a memory and a processor, in which memory a computer program is stored which, when executed by the processor, implements the method according to any of claims 1-8.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by one or more processors, implements the method according to any of claims 1-8.
CN202211630192.5A 2022-12-19 2022-12-19 Multi-modal mental disorder assessment method, computer device, and storage medium Pending CN116168824A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211630192.5A CN116168824A (en) 2022-12-19 2022-12-19 Multi-modal mental disorder assessment method, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211630192.5A CN116168824A (en) 2022-12-19 2022-12-19 Multi-modal mental disorder assessment method, computer device, and storage medium

Publications (1)

Publication Number Publication Date
CN116168824A true CN116168824A (en) 2023-05-26

Family

ID=86419146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211630192.5A Pending CN116168824A (en) 2022-12-19 2022-12-19 Multi-modal mental disorder assessment method, computer device, and storage medium

Country Status (1)

Country Link
CN (1) CN116168824A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination