CN115331804A - Multi-modal psychological disease diagnosis method, computer device and storage medium - Google Patents

Multi-modal psychological disease diagnosis method, computer device and storage medium

Info

Publication number
CN115331804A
CN115331804A
Authority
CN
China
Prior art keywords
probability
user
diagnosis
disease state
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210781758.8A
Other languages
Chinese (zh)
Inventor
黄立
沈琳琳
周善斌
刘金婷
彭晓哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHENZHEN JINGXIANG TECHNOLOGY CO LTD
Original Assignee
SHENZHEN JINGXIANG TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENZHEN JINGXIANG TECHNOLOGY CO LTD filed Critical SHENZHEN JINGXIANG TECHNOLOGY CO LTD
Priority to CN202210781758.8A priority Critical patent/CN115331804A/en
Publication of CN115331804A publication Critical patent/CN115331804A/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174: Facial expression recognition
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Abstract

The application provides a multi-modal psychological disease diagnosis method comprising the following steps: acquiring video and audio data of a user answering a predetermined diagnostic question; acquiring text data of the user answering the predetermined diagnostic question; obtaining a first probability of a psychological disease state from the audio-visual data based on a first diagnostic model; obtaining a second probability of the psychological disease state from the text data based on a second diagnostic model; and obtaining a final probability of the user's psychological disease state from the first probability and the second probability. By acquiring the video data and text data of the user answering the predetermined diagnostic questions and applying preset models to these data, the method confirms the user's final disease probability. Because a model-based diagnosis is adopted, the method rests on a well-defined basis, the diagnostic result is scientific and objective, and the method is relatively universal, so psychological diseases can be diagnosed across a large population, helping to meet the diagnostic needs of all psychological disease patients under limited resource conditions.

Description

Multi-modal psychological disease diagnosis method, computer device and storage medium
Technical Field
The invention relates to the technical field of psychological disease diagnosis, and in particular to a multi-modal psychological disease diagnosis method, a computer device and a storage medium.
Background
Currently, the diagnosis of psychological diseases mostly relies either on users filling out questionnaires that professionals then analyze to assess the users' psychological state, or on users talking with psychologists. However, both approaches depend on professionals. Under limited resource conditions, it is difficult to diagnose a large population and to meet the diagnostic needs of all patients.
Disclosure of Invention
In view of the above, embodiments of the present invention provide a multi-modal psychological disease diagnosis method, a computer device, and a storage medium.
The embodiment of the invention provides a multi-modal mental disease diagnosis method, which comprises the following steps:
acquiring video and audio data of a user for answering a preset diagnosis question;
acquiring text data of the user for answering the predetermined diagnostic question;
obtaining a first probability of a psychological disease state according to the audio-visual data based on a first diagnosis model;
obtaining a second probability of the psychological disease state according to the text data based on a second diagnosis model;
and obtaining the final probability of the mental disease state of the user according to the first probability and the second probability.
In this way, the psychological disease diagnosis method of the embodiments of the present application acquires the audio-visual data and text data of the user answering the predetermined diagnostic question, uses preset diagnostic models to obtain probabilities of the user's psychological disease state from the respective data, and combines the two to confirm the user's final disease probability. Because a model-based diagnosis is adopted, the method rests on a well-defined basis, the diagnostic result is scientific and objective, and the method is relatively universal, so psychological diseases can be diagnosed across a large population, helping to meet the diagnostic needs of all psychological disease patients under limited resource conditions.
In some embodiments, the obtaining audio-visual data of the user answering the predetermined diagnostic question comprises:
acquiring video data of a user for answering a predetermined diagnosis question;
extracting image data of a user from the video data;
and extracting audio data of the user from the video data.
Therefore, by acquiring the video data of the user answering the preset diagnosis questions, the image data and the audio data can be extracted, and a data basis is provided for the subsequent psychological disease diagnosis.
In some embodiments, the deriving a first probability of a psychological disease state from the audiovisual data based on a first diagnostic model comprises:
extracting user facial feature information in the image data to form a facial feature vector matrix.
In this way, by extracting the user's facial feature information from the image data, a plurality of facial feature vectors of the same dimension are formed and assembled into a facial feature vector matrix, so that the user's psychological disease state can be analyzed from the features of each part of the face. The facial feature vector matrix is supplied to the first diagnostic model, providing a facial-feature basis for obtaining the first probability of the user's psychological disease state.
In some embodiments, the obtaining a first probability of a psychological disease state from the audiovisual data based on the first diagnostic model includes:
and carrying out emotion classification on the audio data based on a preset emotion classification model to form an emotion feature vector matrix.
In this way, by extracting the user's emotional feature information from the audio data, a plurality of emotion feature vectors of the same dimension are formed and assembled into an emotion feature vector matrix, which is supplied to the first diagnostic model, providing an emotional-feature basis for obtaining the first probability of the user's psychological disease state.
In some embodiments, the deriving a first probability of a psychological disease state from the audiovisual data based on a first diagnostic model comprises:
and mapping the dimension of the emotion characteristic vector matrix and the face characteristic vector matrix so that the dimension of the mapped emotion characteristic vector matrix is the same as the dimension of the face characteristic vector matrix.
In this way, the emotion feature vector matrix is mapped into a feature vector matrix with the same dimensions as the facial feature vector matrix, providing a basis for vector splicing and fusion of the two matrices.
In some embodiments, the deriving a first probability of a psychological disease state from the audiovisual data based on a first diagnostic model comprises:
carrying out vector splicing and fusion on the mapped emotion characteristic vector matrix and the face characteristic vector matrix;
and inputting the spliced feature vector matrix into the first diagnosis model to obtain a first probability of the psychological disease state.
In this way, the emotion feature vector matrix and the facial feature vector matrix are spliced into a single matrix, so that the first diagnostic model can jointly consider changes in the user's emotional features and facial features when analyzing the psychological disease state, finally yielding a combined image-and-audio model result: the first probability of the psychological disease state.
In some embodiments, said obtaining text data of said user answering said predetermined diagnostic question comprises:
and performing text extraction processing on the video and audio data of the user answering the preset diagnosis questions to obtain the text data.
In this way, after the user's audio and video data are obtained, the audio data are converted into text data. First, this avoids confounding factors that vary between individuals, such as accent, speech rate and intonation. Second, the meaning the user intends to express can be judged from the semantics.
In some embodiments, the deriving a second probability of a psychological disease state from the textual data based on a second diagnostic model comprises:
extracting the content of the text data;
and performing disease matching according to the extracted content to obtain a second probability of the psychological disease state.
In this way, the content of the text data is extracted and only the text of the user's answers is retained, so that after disease matching is performed on the text data, a second probability representing the user's psychological disease state can be obtained.
In some embodiments, said deriving a final probability of the user's mental disease state from the first and second probabilities comprises:
and performing fusion calculation on the first probability and the second probability according to a preset weight to obtain a final probability of the user mental disease state.
In this way, the first probability and the second probability are fused according to preset weights to obtain the final diagnostic result. The preset weights can be adjusted to actual conditions so that the final diagnosis matches reality, achieving the purpose of diagnosing the user's psychological disease state.
The invention provides a computer device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, implements the method of any of the above.
In this way, the computer device of the present application acquires the video, audio and text data of the user answering the predetermined diagnostic question, uses preset diagnostic models to obtain probabilities of the user's psychological disease state from the respective data, and combines them to confirm the user's final disease probability. Because a model-based diagnosis is adopted, the method rests on a well-defined basis, the diagnostic result is scientific and objective, and the method is relatively universal, so psychological diseases can be diagnosed across a large population, helping to meet the diagnostic needs of all psychological disease patients under limited resource conditions.
The present invention provides a non-transitory computer-readable storage medium storing a computer program which, when executed by one or more processors, causes the processors to perform the method described above.
In this way, the present application acquires the audio-visual and text data of the user answering the predetermined diagnostic questions, uses preset diagnostic models to obtain probabilities of the user's psychological disease state from the relevant data, and combines them to confirm the user's final disease probability. Because a model-based diagnosis is adopted, the method rests on a well-defined basis, the diagnostic result is scientific and objective, and the method is relatively universal, so psychological diseases can be diagnosed across a large population, helping to meet the diagnostic needs of all psychological disease patients under limited resource conditions.
Additional aspects and advantages of embodiments of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart of a method for diagnosing a psychological disorder according to some embodiments of the present invention;
FIG. 2 is a schematic flow chart of a method for diagnosing a psychological disease according to some embodiments of the invention;
FIG. 3 is a schematic flow chart of a method for diagnosing a psychological disorder according to some embodiments of the present invention;
FIG. 4 is a schematic flow chart of a method for diagnosing a psychological disorder according to some embodiments of the present invention;
FIG. 5 is a schematic flow chart of a method for diagnosing a psychological disorder according to some embodiments of the present invention;
FIG. 6 is a flow chart of a method for diagnosing a psychological disorder according to some embodiments of the present invention;
FIG. 7 is a flow chart of a method for diagnosing a psychological disease according to some embodiments of the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
Referring to fig. 1, the present application provides a method for multi-modal diagnosis of psychological diseases, comprising:
s10: acquiring video and audio data of a user for answering a preset diagnosis question;
s20: acquiring text data of a user for answering a predetermined diagnosis question;
s30: obtaining a first probability of a psychological disease state according to the audio-visual data based on the first diagnosis model;
s40: obtaining a second probability of the psychological disease state according to the text data based on the second diagnosis model;
s50: and obtaining the final probability of the mental disease state of the user according to the first probability and the second probability.
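As an illustration only, steps S10-S50 can be sketched in Python; the model callables and the equal 0.5/0.5 fusion weights below are hypothetical stand-ins, not the implementation claimed by the patent:

```python
def diagnose(video_path, transcript, va_model, text_model, w1=0.5, w2=0.5):
    """S30-S50: run both diagnostic models and fuse their probability vectors."""
    p1 = va_model(video_path)    # S30: first probability from audio-visual data
    p2 = text_model(transcript)  # S40: second probability from text data
    # S50: weighted fusion of the two probabilities
    return [w1 * a + w2 * b for a, b in zip(p1, p2)]

# Toy stand-in models returning (no, mild, moderate, major) depression probabilities
p_final = diagnose("user.mp4", "I sleep badly.",
                   va_model=lambda v: [0.7, 0.2, 0.08, 0.02],
                   text_model=lambda t: [0.5, 0.3, 0.15, 0.05])
```

Since each model returns a valid probability vector and the weights sum to 1, the fused result is again a valid probability vector.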
The application also provides a computer device by which the psychological disease diagnosis method can be implemented. The computer device comprises a memory and a processor, the memory storing a computer program. The processor is configured to acquire audio-visual data of a user answering a predetermined diagnostic question and to acquire text data of the user answering the predetermined diagnostic question; to obtain a first probability of a psychological disease state from the audio-visual data based on a first diagnostic model; to obtain a second probability of the psychological disease state from the text data based on a second diagnostic model; and to obtain a final probability of the user's psychological disease state from the first probability and the second probability. The computer device in this application may be a medical device with video and audio diagnostic functions.
Specifically, the multi-modal psychological disease diagnosis method is based at least on the audio-visual data and text data collected while the user answers predetermined diagnostic questions. The audio-visual data can be taken from a recording of the user answering diagnostic questions predetermined by a professional psychotherapist. The length of the video data may vary from recording to recording and is not limited here. The text data may be obtained by converting the audio data in the recording with Automatic Speech Recognition (ASR) technology. The predetermined diagnostic questions may concern the user's age, occupation, character and the like, or the issues that trouble the user most and that the user most wants resolved, for example "How has your sleep quality been recently?" or "What is the happiest thing that happened to you recently?"; they may be configured as needed and are not limited here. The first diagnostic model can be generated by collecting facial, behavioral, emotional and other features from a large number of people and performing label learning, so that a first probability of the psychological disease state can be obtained from the user's audio-visual data. For example, the model can be trained on the facial, behavioral and emotional features of ten thousand people, each labeled with a degree of depression; after training, the model can judge the degree of depression. When a piece of video data with an unknown degree of depression is passed to the model, the model returns the most likely degree of depression for that video.
The second diagnostic model may be generated by collecting text features from a large number of people and performing label learning, so that a second probability of the psychological disease state is obtained from the user's text data. For example, the model can be trained on the text features of ten thousand people, each labeled with a degree of depression; after training, the model can judge the degree of depression. When a piece of text data with an unknown degree of depression is passed to the model, the model returns the most likely degree of depression for that text. The accuracy of the model depends on how well it was trained on the collected feature material and on whether the model's structure is reasonable. The first and second probabilities can each represent the probabilities of different degrees of depression; for example, each can be expressed as (no depression a1, mild depression a2, moderate depression a3, major depression a4), with a1 + a2 + a3 + a4 = 1.
It can be understood that the first probability and the second probability are computed by different models, each with its own emphasis, so the two must be fused to obtain the final probability of the user's psychological disease state. The fusion may be computed as, for example, an arithmetic mean or a weighted average, and is not limited here.
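For instance, a weighted-average fusion over the four depression classes might look like the following sketch; the 0.6/0.4 weights are illustrative, not values given by the patent:

```python
def fuse(p1, p2, w1=0.6, w2=0.4):
    """Weighted fusion of two probability vectors; the weights must sum to 1."""
    assert abs(w1 + w2 - 1.0) < 1e-9
    return [w1 * a + w2 * b for a, b in zip(p1, p2)]

first = [0.1, 0.2, 0.3, 0.4]    # (a1, a2, a3, a4) from the audio-visual model
second = [0.3, 0.3, 0.2, 0.2]   # (a1, a2, a3, a4) from the text model
final = fuse(first, second)      # again a valid probability vector
```

Setting w1 = w2 = 0.5 recovers the arithmetic-mean variant mentioned above.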
In summary, the multi-modal psychological disease diagnosis method and computer device of the embodiments of the present application acquire the audio-visual and text data of the user answering the predetermined diagnostic question, use preset diagnostic models to obtain probabilities of the user's psychological disease state from the respective data, and combine the two to determine the user's final disease probability. Because a model-based diagnosis is adopted, the method rests on a well-defined basis, the diagnostic result is scientific and objective, and the method is relatively universal, so psychological diseases can be diagnosed across a large population, helping to meet the diagnostic needs of all psychological disease patients under limited resource conditions.
Referring to fig. 2, in some embodiments, S10 includes:
s11: acquiring video data of a user for answering a predetermined diagnosis question;
s12: extracting image data of a user from the video data;
s13: audio data of the user is extracted from the video data.
In some embodiments, the processor is configured to obtain video data of the user answering a predetermined diagnostic question, and to extract image data of the user from the video data, and to extract audio data of the user from the video data.
Specifically, in this step the video data may be an audio-visual recording of the user answering diagnostic questions predetermined by a professional psychotherapist. Image data and audio data can be extracted from the video data; the image data may be extracted frame by frame. The audio data, once extracted, provide emotional features for the first diagnostic model and text data for the second diagnostic model.
Therefore, by acquiring the video data of the user answering the preset diagnosis questions, the image data and the audio data can be extracted, and a data basis is provided for the subsequent psychological disease diagnosis.
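As a hedged illustration of the two extraction steps, the sketch below builds (without running) ffmpeg command lines for frame-by-frame image extraction and audio extraction. The file names, frame rate and sample rate are assumptions, and the patent does not prescribe any particular tool:

```python
def extraction_commands(video="answer.mp4", fps=25, sample_rate=16000):
    """Build ffmpeg commands: one for per-frame images, one for mono audio."""
    frames_cmd = ["ffmpeg", "-i", video,
                  "-vf", f"fps={fps}", "frames/%06d.png"]   # S12: image data
    audio_cmd = ["ffmpeg", "-i", video, "-vn",              # S13: drop video,
                 "-ac", "1", "-ar", str(sample_rate),       # keep mono audio
                 "answer.wav"]
    return frames_cmd, audio_cmd

frames_cmd, audio_cmd = extraction_commands()
```

The commands would be executed with, e.g., `subprocess.run(frames_cmd, check=True)`.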
Referring to fig. 3, in some embodiments, S30 includes:
s31: user facial feature information in the image data is extracted to form a facial feature vector matrix.
In some embodiments, the processor is configured to extract user facial feature information from the image data to form a facial feature vector matrix.
Specifically, key information related to the human face may be extracted from the image data as facial feature vectors. The facial features may be the eyes, nose, mouth, eyebrows, cheeks, chin, etc., so that a feature vector is formed for each feature, and a plurality of facial feature vectors of the same dimension form the facial feature vector matrix.
In this way, by extracting the user's facial feature information from the image data, a plurality of facial feature vectors of the same dimension are formed and assembled into a facial feature vector matrix, so that the user's psychological disease state can be analyzed from the features of each part of the face. The facial feature vector matrix is supplied to the first diagnostic model, providing a facial-feature basis for obtaining the first probability of the user's psychological disease state.
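A toy sketch of assembling per-region feature vectors into such a matrix; the three-dimensional vectors merely stand in for real facial descriptors:

```python
import numpy as np

# One hypothetical descriptor vector per facial region, all of equal dimension
features = {
    "eyes":     [0.1, 0.4, 0.2],
    "mouth":    [0.3, 0.1, 0.5],
    "eyebrows": [0.2, 0.2, 0.1],
}
# Stack the same-dimension vectors row by row into the facial feature matrix
face_matrix = np.array(list(features.values()))
```

Each row corresponds to one facial region, so the matrix has one row per extracted feature.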
Referring to fig. 3, in some embodiments, S30 includes:
s32: the audio data is mood classified based on a predetermined mood classification model to form a matrix of mood feature vectors.
In some embodiments, the processor is configured to perform emotion classification on the audio data based on a predetermined emotion classification model to form an emotion feature vector matrix.
Specifically, the predetermined emotion classification model may classify the user's speech into several categories such as calm, anger, surprise, depression, happiness, fear and sadness, each emotion being represented by a vector. For example, calm may be expressed as (1, 0, 0, ...), anger as (0, 1, 0, ...), and so on. The vector for each emotion may be represented as a 1 x n matrix, and together these vectors form the emotion feature vector matrix.
In this way, a plurality of emotion feature vectors of the same dimension are formed by extracting the user's emotional feature information from the audio data and assembled into an emotion feature vector matrix, which is supplied to the first diagnostic model, providing an emotional-feature basis for obtaining the first probability of the user's psychological disease state.
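The one-hot encoding described above can be sketched as follows; the seven-emotion list mirrors the categories named in the text, while the classified segments are invented:

```python
import numpy as np

EMOTIONS = ["calm", "anger", "surprise", "depression",
            "happiness", "fear", "sadness"]

def one_hot(emotion):
    """Represent an emotion label as a 1 x n one-hot row vector."""
    vec = np.zeros((1, len(EMOTIONS)))
    vec[0, EMOTIONS.index(emotion)] = 1.0
    return vec

# One classified audio segment per row -> emotion feature vector matrix
segments = ["calm", "sadness", "depression"]
emotion_matrix = np.vstack([one_hot(e) for e in segments])
```

Each row is a 1 x n one-hot vector, matching the "each emotion as a 1 x n matrix" description.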
Referring to fig. 3, in some embodiments, S30 includes:
s33: and mapping the dimension of the emotion characteristic vector matrix and the face characteristic vector matrix so that the dimension of the mapped emotion characteristic vector matrix is the same as the dimension of the face characteristic vector matrix.
In some embodiments, the processor is configured to map the dimensions of the matrix of emotional feature vectors with the matrix of facial feature vectors such that the dimensions of the matrix of mapped emotional feature vectors are the same as the dimensions of the matrix of facial feature vectors.
Specifically, the emotion feature vector matrix and the facial feature vector matrix have different dimensions, so they cannot be merged and cannot be analyzed together directly. For example, the emotion feature vector matrix may be a 1 x 2 matrix, written as:
A=[0 1]
the facial feature vector matrix may be a 3 x 3 matrix written as:
B = [b11 b12 b13; b21 b22 b23; b31 b32 b33] (generic entries)
further, mapping the emotion feature vector matrix into a feature vector matrix with the same dimension as the face feature vector matrix can be expressed as:
A' = [0 1 0]
it should be noted that the emotion feature vector matrix a may be a matrix of 1 × p, the facial feature vector matrix may be a matrix of m × n, and the emotion feature vector matrix a' after mapping is a matrix of 1 × n, which is not limited herein.
In this way, the emotion feature vector matrix is mapped into a feature vector matrix with the same dimensions as the facial feature vector matrix, providing a basis for vector splicing and fusion of the two matrices.
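One common way to realize such a dimension mapping is multiplication by a p x n projection matrix, A' = A W. The sketch below uses a random W purely for illustration; in practice such a projection would presumably be learned:

```python
import numpy as np

p, n = 7, 3                      # emotion dimension p, facial-feature width n
A = np.zeros((1, p))
A[0, 1] = 1.0                    # a 1 x p one-hot emotion row vector

rng = np.random.default_rng(0)
W = rng.standard_normal((p, n))  # p x n projection matrix (illustrative)
A_mapped = A @ W                 # 1 x n, now the same width as the face matrix
```

After the mapping, A_mapped has the same number of columns as the facial feature matrix, so the two can be stacked.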
Referring to fig. 3, in some embodiments, S30 includes:
s34: carrying out vector splicing and fusion on the mapped emotion characteristic vector matrix and the mapped face characteristic vector matrix;
s35: and inputting the spliced feature vector matrix into a first diagnosis model to obtain a first probability of the psychological disease state.
In some embodiments, the processor is configured to perform vector splicing and fusion on the mapped emotion feature vector matrix and the facial feature vector matrix, and to input the spliced feature vector matrix into the first diagnostic model to obtain the first probability of the psychological disease state.
Specifically, after mapping, the emotion feature vector matrix and the facial feature vector matrix have the same width and can be spliced and fused into one feature vector matrix. The spliced feature vector matrix is input into the first diagnostic model, which obtains the first probability of the psychological disease state through matching analysis. For example, the emotion feature vector matrix and the facial feature vector matrix of the same width may be written as:
emotion feature vector matrix:
A'=[0 1 0]
face feature vector matrix:
B = [b11 b12 b13; b21 b22 b23; b31 b32 b33] (generic entries)
further, the emotion characteristic vector matrix and the face characteristic vector matrix are spliced into a fusion, and the fusion is recorded as:
C = [0 1 0; b11 b12 b13; b21 b22 b23; b31 b32 b33]
it should be noted that the matrix C is formed by splicing and fusing the mapped emotion feature vector matrix (1 × n matrix) and facial feature vector matrix (m × n matrix), where the matrix C may be an (m + 1) × n matrix, and the specific embodiment is not limited herein.
In this way, the emotion feature vector matrix and the facial feature vector matrix are spliced into a single matrix, so that the first diagnostic model can jointly consider changes in the user's emotional features and facial features when analyzing the psychological disease state, finally yielding a combined image-and-audio model result: the first probability of the psychological disease state.
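The splicing of the mapped 1 x n emotion matrix with the m x n facial matrix into an (m + 1) x n matrix can be sketched directly; the entries of B below are toy values standing in for real facial features:

```python
import numpy as np

A_prime = np.array([[0, 1, 0]])      # mapped emotion matrix, 1 x n
B = np.arange(1, 10).reshape(3, 3)   # facial feature matrix, m x n (toy entries)
C = np.vstack([A_prime, B])          # spliced matrix, (m + 1) x n
```

The spliced matrix C would then be the input to the first diagnostic model.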
Referring to fig. 4, in some embodiments, S20 includes:
S21: performing text extraction processing on the audio-visual data of the user answering the predetermined diagnostic question to obtain text data.
In some embodiments, the processor is configured to perform text extraction processing on the audio-visual data of the user answering the predetermined diagnostic question to obtain the text data.
Specifically, in this step, the audio data is extracted from the audio-visual data of the user answering the predetermined diagnostic question. To convert the voice data into text data, automatic speech recognition (ASR) technology can be used, which saves time and labor and is convenient and fast; alternatively, manual transcription can be used, which offers higher conversion accuracy.
In this way, after the user's audio-visual data is obtained, the audio data therein is converted into text data, which avoids confounding factors that vary between individuals, such as accent, speech speed, and intonation. Moreover, the meaning the user intends to express can then be judged from the semantics.
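The text-extraction step (S21) can be sketched as below. A real system would invoke an ASR engine on the extracted audio track; `asr_transcribe` here is a hypothetical stand-in returning pre-canned segments, so only the shape of the pipeline (audio segments in, time-ordered text out) is illustrated.

```python
def asr_transcribe(audio_segment):
    """Hypothetical ASR stand-in: maps an audio segment id to text.
    In practice this would call a speech-recognition engine."""
    canned = {
        "seg-001": "It takes me a long time to fall asleep.",
        "seg-002": "Nothing has made me happy recently.",
    }
    return canned.get(audio_segment, "")

def extract_text(audio_segments):
    """Transcribe each segment in chronological order, dropping
    segments that yield no recognizable speech."""
    texts = [asr_transcribe(seg) for seg in audio_segments]
    return [t for t in texts if t]

text_data = extract_text(["seg-001", "seg-002"])
```

Segment identifiers and canned sentences are illustrative assumptions, not part of the application.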
Referring to fig. 5, in some embodiments, S40 includes:
S41: extracting content from the text data;
S42: performing disease matching according to the extracted content to obtain a second probability of the psychological disease state.
In some embodiments, the processor is configured to perform content extraction on the text data and perform disease matching based on the extracted content to obtain a second probability of a psychological disease state.
Specifically, the acquired text may be content-extracted by deleting speech irrelevant to the user's answers, mainly the predetermined diagnostic questions and the blank gaps between turns, thereby ensuring that the text data input into the second diagnosis model contains only the user's own words. The extracted text data may consist of several sentences arranged in chronological order. For example, the raw text may be: "How has your sleep quality been lately? It takes me a long time to fall asleep; I cannot fall asleep. What is the thing that made you happiest recently? Nothing has made me happy recently." After the predetermined diagnostic questions are deleted and the user's answers are retained, the extracted text consists of two sentences arranged in chronological order. Based on the second diagnosis model, the time-ordered sentences can be disease-matched, thereby obtaining the second probability of the psychological disease state.
In this way, the content of the text data is extracted and only the user's answers are retained, so that after disease matching is performed on the text data, a second probability representing the user's psychological disease state can be obtained.
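The content-extraction and disease-matching steps (S41/S42) can be sketched as follows. The question list, negative-marker keywords, and keyword scorer are hypothetical illustrations; the application's second diagnosis model is a trained model, not a keyword matcher.

```python
# Illustrative assumptions: the predetermined diagnostic questions are
# known verbatim, and "disease matching" is approximated by counting
# sentences that contain a negative marker.
PREDETERMINED_QUESTIONS = {
    "How has your sleep quality been lately?",
    "What is the thing that made you happiest recently?",
}

NEGATIVE_MARKERS = ("cannot fall asleep", "nothing", "sad")

def extract_answers(transcript):
    """Delete the predetermined diagnostic questions, keeping only the
    user's utterances in their original chronological order (S41)."""
    return [line for line in transcript if line not in PREDETERMINED_QUESTIONS]

def second_diagnosis_model(sentences):
    """Hypothetical stand-in for disease matching (S42): the fraction of
    sentences containing a negative marker, used as the second probability."""
    if not sentences:
        return 0.0
    hits = sum(any(m in s.lower() for m in NEGATIVE_MARKERS) for s in sentences)
    return hits / len(sentences)

transcript = [
    "How has your sleep quality been lately?",
    "It takes me a long time to fall asleep; I cannot fall asleep.",
    "What is the thing that made you happiest recently?",
    "Nothing has made me happy recently.",
]
answers = extract_answers(transcript)
second_probability = second_diagnosis_model(answers)
```

Note that filtering by exact question text keeps the design simple; a deployed system would align answers to questions by timestamps from the dialogue turns.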
Referring to fig. 6, in some embodiments, S50 includes:
S51: performing fusion calculation on the first probability and the second probability according to a predetermined weight to obtain a final probability of the psychological disease state of the user.
In some embodiments, the processor is configured to perform a fusion calculation on the first probability and the second probability according to a predetermined weight to obtain the final probability of the user's psychological disease state.
Specifically, the predetermined weights of the first and second probabilities may be configured according to how different users manifest the psychological disease state. For example, if a user shows no significant change in facial expression or emotion when answering the predetermined diagnostic questions, yet the semantics of the answers are negative, sad, and full of negative energy, the first probability weight may be lowered and the second probability weight raised. The adjustment can be made in real time according to actual conditions and is not limited here.
The final probability of the user's psychological disease state may be calculated from the first probability and the second probability according to the predetermined weight. For example, let the first probability be P1, the second probability be P2, the final probability be P, the first probability weight be F, and the second probability weight be 1 − F, where P1 ∈ [0,1], P2 ∈ [0,1], F ∈ [0,1]. Then P = P1 · F + P2 · (1 − F).
In this way, the first probability and the second probability are fused according to the predetermined weight to obtain the final diagnosis result, and the predetermined weight can be modified according to actual conditions so that the final diagnosis result matches reality, thereby achieving the purpose of diagnosing the user's psychological disease state.
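The weighted fusion P = P1 · F + P2 · (1 − F) from the passage above can be written as a small self-checked function; the concrete weight value is an illustrative assumption, tuned in practice to how strongly the user's facial and emotional signals are expressed.

```python
def fuse_probabilities(p1, p2, f):
    """Fuse the audio-visual probability p1 and the text probability p2,
    weighting p1 by f and p2 by (1 - f); all inputs must lie in [0, 1]."""
    if not all(0.0 <= v <= 1.0 for v in (p1, p2, f)):
        raise ValueError("p1, p2 and f must lie in [0, 1]")
    return p1 * f + p2 * (1.0 - f)

# A user with flat affect but strongly negative speech: down-weight the
# first (audio-visual) model and up-weight the second (text) model.
final_probability = fuse_probabilities(0.3, 0.9, 0.25)  # 0.3*0.25 + 0.9*0.75
```

Because F ∈ [0, 1] and both inputs are probabilities, the result is a convex combination and is itself guaranteed to lie in [0, 1].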
An embodiment of the present application also provides a computer-readable storage medium: one or more non-transitory computer-readable storage media storing computer-executable instructions which, when executed by one or more processors, cause the processors to perform the method of any of the embodiments described above.
In this way, the present application provides a non-transitory computer-readable storage medium storing a computer program which, when executed by one or more processors, causes the processors to perform the psychological disease diagnosis method.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, and the program can be stored in a non-volatile computer readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), or the like.
The above examples express only several embodiments of the present application, and while their description is specific and detailed, it should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (11)

1. A method for multi-modal diagnosis of a psychological disease, the method comprising:
acquiring audio-visual data of a user answering a predetermined diagnostic question;
acquiring text data of the user answering the predetermined diagnostic question;
obtaining a first probability of a psychological disease state from the audio-visual data based on a first diagnosis model;
obtaining a second probability of the psychological disease state from the text data based on a second diagnosis model; and
obtaining a final probability of the psychological disease state of the user from the first probability and the second probability.
2. The diagnostic method of claim 1, wherein the acquiring audio-visual data of the user answering the predetermined diagnostic question comprises:
acquiring video data of the user answering the predetermined diagnostic question;
extracting image data of the user from the video data; and
extracting audio data of the user from the video data.
3. The method of claim 2, wherein the obtaining a first probability of a psychological disease state from the audio-visual data based on the first diagnosis model comprises:
extracting facial feature information of the user from the image data to form a face feature vector matrix.
4. The method of claim 3, wherein the obtaining a first probability of a psychological disease state from the audio-visual data based on the first diagnosis model comprises:
performing emotion classification on the audio data based on a predetermined emotion classification model to form an emotion feature vector matrix.
5. The method of claim 4, wherein the obtaining a first probability of a psychological disease state from the audio-visual data based on the first diagnosis model comprises:
mapping the dimensions of the emotion feature vector matrix and the face feature vector matrix so that the dimensions of the mapped emotion feature vector matrix are the same as the dimensions of the face feature vector matrix.
6. The method of claim 5, wherein the obtaining a first probability of the psychological disease state from the audio-visual data based on the first diagnosis model comprises:
performing vector splicing and fusion on the mapped emotion feature vector matrix and the face feature vector matrix; and
inputting the spliced feature vector matrix into the first diagnosis model to obtain the first probability of the psychological disease state.
7. The diagnostic method of claim 1, wherein the acquiring text data of the user answering the predetermined diagnostic question comprises:
performing text extraction processing on the audio-visual data of the user answering the predetermined diagnostic question to obtain the text data.
8. The method of claim 7, wherein the obtaining a second probability of the psychological disease state from the text data based on a second diagnosis model comprises:
extracting content from the text data; and
performing disease matching according to the extracted content to obtain the second probability of the psychological disease state.
9. The diagnostic method of claim 1, wherein the obtaining a final probability of the psychological disease state of the user from the first probability and the second probability comprises:
performing fusion calculation on the first probability and the second probability according to a predetermined weight to obtain the final probability of the psychological disease state of the user.
10. A computer device, characterized in that the computer device comprises a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, implements the method of any one of claims 1-9.
11. A non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by one or more processors, implements the method of any one of claims 1-9.
CN202210781758.8A 2022-07-04 2022-07-04 Multi-modal psychological disease diagnosis method, computer device and storage medium Pending CN115331804A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210781758.8A CN115331804A (en) 2022-07-04 2022-07-04 Multi-modal psychological disease diagnosis method, computer device and storage medium

Publications (1)

Publication Number Publication Date
CN115331804A 2022-11-11

Family

ID=83917417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210781758.8A Pending CN115331804A (en) 2022-07-04 2022-07-04 Multi-modal psychological disease diagnosis method, computer device and storage medium

Country Status (1)

Country Link
CN (1) CN115331804A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108245161A (en) * 2017-12-26 2018-07-06 北京医拍智能科技有限公司 The assistant diagnosis system of lung's common disease
CN111540440A (en) * 2020-04-23 2020-08-14 深圳市镜象科技有限公司 Psychological examination method, device, equipment and medium based on artificial intelligence
CN111816301A (en) * 2020-07-07 2020-10-23 平安科技(深圳)有限公司 Medical inquiry assisting method, device, electronic equipment and medium
WO2021114736A1 (en) * 2020-07-07 2021-06-17 平安科技(深圳)有限公司 Medical consultation assistance method and apparatus, electronic device, and medium
CN113380418A (en) * 2021-06-22 2021-09-10 浙江工业大学 System for analyzing and identifying depression through dialog text
CN113724898A (en) * 2021-08-31 2021-11-30 平安科技(深圳)有限公司 Intelligent inquiry method, device, equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116543918A (en) * 2023-07-04 2023-08-04 武汉大学人民医院(湖北省人民医院) Method and device for extracting multi-mode disease features
CN116543918B (en) * 2023-07-04 2023-09-22 武汉大学人民医院(湖北省人民医院) Method and device for extracting multi-mode disease features
CN116530944A (en) * 2023-07-06 2023-08-04 荣耀终端有限公司 Sound processing method and electronic equipment
CN116530944B (en) * 2023-07-06 2023-10-20 荣耀终端有限公司 Sound processing method and electronic equipment

Similar Documents

Publication Publication Date Title
Avots et al. Audiovisual emotion recognition in wild
CN115413348B (en) System and method for automatically verifying and quantifying interview question answers
Perez-Gaspar et al. Multimodal emotion recognition with evolutionary computation for human-robot interaction
CN110969106B (en) Multi-mode lie detection method based on expression, voice and eye movement characteristics
Busso et al. Iterative feature normalization scheme for automatic emotion detection from speech
CN115331804A (en) Multi-modal psychological disease diagnosis method, computer device and storage medium
CN115329779B (en) Multi-person dialogue emotion recognition method
Kim et al. ISLA: Temporal segmentation and labeling for audio-visual emotion recognition
US20200357302A1 (en) Method for digital learning and non-transitory machine-readable data storage medium
CN113380271B (en) Emotion recognition method, system, device and medium
CN115713875A (en) Virtual reality simulation teaching method based on psychological analysis
CN111199205A (en) Vehicle-mounted voice interaction experience evaluation method, device, equipment and storage medium
CN113592251B (en) Multi-mode integrated teaching state analysis system
CN106991172B (en) Method for establishing multi-mode emotion interaction database
CN113223560A (en) Emotion recognition method, device, equipment and storage medium
CN111145903A (en) Method and device for acquiring vertigo inquiry text, electronic equipment and inquiry system
CN114549946A (en) Cross-modal attention mechanism-based multi-modal personality identification method and system
CN113035232B (en) Psychological state prediction system, method and device based on voice recognition
Suarez et al. Building a Multimodal Laughter Database for Emotion Recognition.
WO2022180860A1 (en) Video session evaluation terminal, video session evaluation system, and video session evaluation program
McTear et al. Affective conversational interfaces
CN115171673A (en) Role portrait based communication auxiliary method and device and storage medium
CN114492579A (en) Emotion recognition method, camera device, emotion recognition device and storage device
Esposito et al. The new Italian audio and video emotional database
JP7152825B1 (en) VIDEO SESSION EVALUATION TERMINAL, VIDEO SESSION EVALUATION SYSTEM AND VIDEO SESSION EVALUATION PROGRAM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20221111