CN110301920B - Multi-modal fusion method and device for psychological stress detection - Google Patents

Multi-modal fusion method and device for psychological stress detection

Info

Publication number
CN110301920B
CN110301920B (application CN201910567398.XA)
Authority
CN
China
Prior art keywords
feature matrix
matrix
attention
text
feature
Prior art date
Legal status
Active
Application number
CN201910567398.XA
Other languages
Chinese (zh)
Other versions
CN110301920A (en)
Inventor
冯铃
张慧君
曹檑
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201910567398.XA
Publication of CN110301920A
Application granted
Publication of CN110301920B
Status: Active
Anticipated expiration

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/16 Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • A61B5/165 Evaluating the state of mind, e.g. depression, anxiety
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 Details of waveform analysis
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Psychiatry (AREA)
  • Public Health (AREA)
  • Surgery (AREA)
  • Veterinary Medicine (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Child & Adolescent Psychology (AREA)
  • Evolutionary Computation (AREA)
  • Signal Processing (AREA)
  • Fuzzy Systems (AREA)
  • Physiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Developmental Disabilities (AREA)
  • Psychology (AREA)
  • Hospice & Palliative Care (AREA)
  • Educational Technology (AREA)
  • Social Psychology (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The embodiment of the invention provides a multi-modal fusion method and device for psychological stress detection. The method obtains attention-enhanced feature matrices for the six directed modality pairs physiological data→text, physiological data→picture, text→physiological data, text→picture, picture→physiological data and picture→text, and then, through a feedforward fully-connected neural network, obtains fusion feature matrices for the text, the picture and the physiological data. A fused representation matrix of the three modalities is then obtained from the importance weight values of the text, picture and physiological data together with their fusion feature matrices. Finally, a stress classification vector reflecting the psychological stress problem is obtained from the fused representation of the three modalities through a feedforward fully-connected network. By fusing text and picture data with physiological data, the invention compensates for the shortcomings caused by the subjectivity of text and picture data and addresses some inherent problems of physiological data.

Description

Multi-modal fusion method and device for psychological stress detection
Technical Field
The invention relates to the field of computer technology, and in particular to a multi-modal fusion method and device for psychological stress detection.
Background
As social competition increases, psychological stress among teenagers has gradually become a serious problem. Excessive psychological stress can cause many physiological and psychological problems, which makes psychological stress detection more and more important.
Existing work on psychological stress detection from social media focuses only on text and picture content; however, such content is subjective and sometimes fails to express the user's real mental state.
Some work on physiological signals, such as heart rate variability, electrocardiogram, galvanic skin response, electroencephalogram, blood pressure and electromyogram, has proven their effectiveness for detecting psychological stress. However, physiological data have inherent problems: for example, the signals measured in a state of extreme excitement and in a state of extreme stress are very similar, so physiological data alone cannot fully express the real psychological state.
As can be seen from the above, there is currently no sufficiently effective method or device for psychological stress detection.
Disclosure of Invention
To solve the problems in the prior art, embodiments of the present invention provide a multi-modal fusion method and apparatus for psychological stress detection.
In a first aspect, an embodiment of the present invention provides an attention weight correspondence method for feature-interactive fusion of two modality data, including:
obtaining, based on the feature matrices of the two modality data and using matrix multiplication, an association matrix reflecting the information association between different features of the two modality data;
obtaining, based on the association matrix and a feedforward fully-connected network model, an influence weight matrix of the feature matrix of one modality's data on the feature matrix of the other modality's data;
and obtaining, based on the influence weight matrices and the feature matrices of the two modality data and using element-wise matrix multiplication and residual connections, attention-enhanced feature matrices that contain the mutual influence weights of the feature matrices of the two modality data.
In a second aspect, based on the attention weight correspondence method of the first aspect, an embodiment of the present invention provides a multi-modal fusion method for psychological stress detection, including:
respectively obtaining a physiological data related feature matrix reflecting the user's physiological state, and a text feature matrix and a picture feature matrix reflecting the user's mental activity;
obtaining, from the physiological data related feature matrix, the text feature matrix and the picture feature matrix using the attention weight correspondence method: a first attention-enhanced feature matrix containing the influence weight of the physiological data related feature matrix on the text feature matrix; a second attention-enhanced feature matrix containing the influence weight of the physiological data related feature matrix on the picture feature matrix; a third attention-enhanced feature matrix containing the influence weight of the text feature matrix on the physiological data related feature matrix; a fourth attention-enhanced feature matrix containing the influence weight of the text feature matrix on the picture feature matrix; a fifth attention-enhanced feature matrix containing the influence weight of the picture feature matrix on the physiological data related feature matrix; and a sixth attention-enhanced feature matrix containing the influence weight of the picture feature matrix on the text feature matrix;
obtaining a text fusion feature matrix, a picture fusion feature matrix and a physiological data fusion feature matrix from the six attention-enhanced feature matrices through a feedforward fully-connected neural network;
obtaining text, picture and physiological data feature values from the physiological data related feature matrix, the text feature matrix and the picture feature matrix through a feedforward fully-connected neural network;
obtaining importance weight values for the text, the picture and the physiological data from these feature values using vector concatenation and an attention mechanism;
obtaining a fused representation matrix of the three modalities from the importance weight values and the three fusion feature matrices;
and obtaining a stress classification vector reflecting the psychological stress problem from the fused representation of the three modalities through a feedforward fully-connected network.
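The steps above can be sketched end to end in NumPy. This is a hedged illustration, not the patented implementation: all matrix sizes, the ReLU activation, the softmax importance weighting and the two-class output are assumptions, and all trained parameters are replaced by random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 8   # shared feature dimension (assumed)
n = 5   # rows per modality feature matrix (assumed)

def fc(x, w):
    # one feedforward fully-connected layer; ReLU activation is an assumption
    return np.maximum(x @ w, 0.0)

def f_amm(a, b):
    # pairwise attention weight correspondence; weights are random stand-ins
    c = a @ b.T                                            # association matrix
    a_t = (c @ rng.standard_normal((b.shape[0], k))) * a + a
    b_t = (c.T @ rng.standard_normal((a.shape[0], k))) * b + b
    return a_t, b_t

# toy feature matrices: text T, picture P, physiological data S
T, P, S = (rng.standard_normal((n, k)) for _ in range(3))

# six attention-enhanced feature matrices from the three modality pairs
T_s, S_t = f_amm(T, S)   # physiological->text (1st), text->physiological (3rd)
P_s, S_p = f_amm(P, S)   # physiological->picture (2nd), picture->physiological (5th)
T_p, P_t = f_amm(T, P)   # picture->text (6th), text->picture (4th)

# per-modality fusion: concatenate both enhanced matrices, map back to k dims
T_fused = fc(np.concatenate([T_s, T_p], axis=1), rng.standard_normal((2 * k, k)))
P_fused = fc(np.concatenate([P_s, P_t], axis=1), rng.standard_normal((2 * k, k)))
S_fused = fc(np.concatenate([S_t, S_p], axis=1), rng.standard_normal((2 * k, k)))

# modality feature values, then softmax importance weights (attention, assumed)
vals = np.array([fc(M, rng.standard_normal((k, 1))).mean() for M in (T, P, S)])
w = np.exp(vals) / np.exp(vals).sum()

# importance-weighted fusion of the three modalities, then stress classification
fused = w[0] * T_fused + w[1] * P_fused + w[2] * S_fused
logits = fused.mean(axis=0) @ rng.standard_normal((k, 2))  # 2 classes (assumed)
print(logits.shape)  # prints (2,)
```

The pooling into `vals` and the final `mean(axis=0)` are placeholders for whatever aggregation the trained model uses; only the data flow of the claimed steps is preserved.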
In a third aspect, an embodiment of the present invention further provides an attention weight correspondence apparatus for feature-interactive fusion of two modality data, including:
a first acquisition module, configured to obtain, based on the feature matrices of the two modality data and using matrix multiplication, an association matrix reflecting the information association between different features of the two modality data;
a second acquisition module, configured to obtain, based on the association matrix and a feedforward fully-connected network model, an influence weight matrix of the feature matrix of one modality's data on the feature matrix of the other modality's data;
and a third acquisition module, configured to obtain, based on the influence weight matrices and the feature matrices of the two modality data and using element-wise matrix multiplication and residual connections, attention-enhanced feature matrices containing the mutual influence weights of the feature matrices of the two modality data.
In a fourth aspect, based on the attention weight correspondence apparatus of the third aspect, an embodiment of the present invention further provides a multi-modal fusion apparatus for psychological stress detection, including:
a fourth acquisition module, configured to respectively obtain a physiological data related feature matrix reflecting the user's physiological state, and a text feature matrix and a picture feature matrix reflecting the user's mental activity;
a fifth acquisition module, configured to obtain, from the physiological data related feature matrix, the text feature matrix and the picture feature matrix using the attention weight correspondence method: a first attention-enhanced feature matrix containing the influence weight of the physiological data related feature matrix on the text feature matrix; a second attention-enhanced feature matrix containing the influence weight of the physiological data related feature matrix on the picture feature matrix; a third attention-enhanced feature matrix containing the influence weight of the text feature matrix on the physiological data related feature matrix; a fourth attention-enhanced feature matrix containing the influence weight of the text feature matrix on the picture feature matrix; a fifth attention-enhanced feature matrix containing the influence weight of the picture feature matrix on the physiological data related feature matrix; and a sixth attention-enhanced feature matrix containing the influence weight of the picture feature matrix on the text feature matrix;
a sixth acquisition module, configured to obtain a text fusion feature matrix, a picture fusion feature matrix and a physiological data fusion feature matrix from the six attention-enhanced feature matrices through a feedforward fully-connected neural network;
a seventh acquisition module, configured to obtain text, picture and physiological data feature values from the physiological data related feature matrix, the text feature matrix and the picture feature matrix through a feedforward fully-connected neural network;
an eighth acquisition module, configured to obtain importance weight values for the text, the picture and the physiological data from these feature values using vector concatenation and an attention mechanism;
a ninth acquisition module, configured to obtain a fused representation matrix of the three modalities from the importance weight values and the three fusion feature matrices;
and a tenth acquisition module, configured to obtain a stress classification vector reflecting the psychological stress problem from the fused representation of the three modalities through a feedforward fully-connected network.
In a fifth aspect, embodiments of the present invention further provide an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the program to implement the steps of the attention weight correspondence method for feature interactive fusion of two modality data according to the first aspect, and/or the steps of the multi-modality fusion method for mental stress detection according to the second aspect.
In a sixth aspect, the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the attention weight correspondence method for feature interactive fusion of two modality data as described in the first aspect, and/or the steps of the multi-modality fusion method for mental stress detection as described in the second aspect.
It can be seen from the above technical solutions that the attention weight correspondence method and apparatus of the embodiments of the present invention obtain, from the feature matrices of two modality data and using matrix multiplication, an association matrix reflecting the information association between their different features; obtain, from the association matrix and a feedforward fully-connected network model, an influence weight matrix of the feature matrix of one modality on that of the other; and finally obtain, using element-wise multiplication and residual connections, attention-enhanced feature matrices containing the mutual influence weights of the two feature matrices. On this basis, another embodiment of the present invention provides a multi-modal fusion method and apparatus for psychological stress detection: six attention-enhanced feature matrices (physiological data→text, physiological data→picture, text→physiological data, text→picture, picture→physiological data and picture→text) are obtained from the physiological data related feature matrix, the text feature matrix and the picture feature matrix using the attention weight correspondence method; a text fusion feature matrix, a picture fusion feature matrix and a physiological data fusion feature matrix are then obtained from these six matrices through a feedforward fully-connected neural network; text, picture and physiological data feature values are obtained from the three original feature matrices through a feedforward fully-connected neural network, and importance weight values are obtained from these feature values using vector concatenation and an attention mechanism; a fused representation matrix of the three modalities is obtained from the importance weight values and the three fusion feature matrices; and finally a stress classification vector reflecting the psychological stress problem is obtained from the fused representation through a feedforward fully-connected network.
By fusing text and picture data with physiological data, the embodiments of the invention compensate for the shortcomings caused by the subjectivity of text and picture data, address some inherent problems of physiological data (for example, the physiological data in a state of extreme excitement and in a state of extreme stress are very similar), and to some extent bridge the detection blind window caused by missing data in a single modality.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an attention weight correspondence method for feature interactive fusion of two modality data according to an embodiment of the present invention;
fig. 2 is a model structure diagram of the attention weight correspondence method for feature interactive fusion of two modality data according to an embodiment of the present invention;
FIG. 3 is a flowchart of a multi-modal fusion method for mental stress detection according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a text feature extraction process according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a physiological feature extraction process provided by an embodiment of the invention;
FIG. 6 is a block diagram of a fusion method for multi-modal detection of psychological stress problems for text, pictures and physiological related data according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a multi-modal fusion apparatus for mental stress detection according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an attention weight correspondence apparatus for feature interactive fusion of two modality data according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Before describing the scheme provided by the embodiments of the present invention, the origin of the invention is briefly explained. When a teenager is under psychological stress, daily activity levels (such as step counts) and sleep (difficulty falling asleep, waking early, etc.) often become abnormal. On the other hand, what teenagers write and the pictures they post can largely reflect their mental state and daily activities. The embodiments of the invention therefore detect teenagers' psychological stress by fusing text data, picture data and physiological data. Because this requires solving a multi-modal fusion problem, the embodiments first provide an attention weight correspondence method that enables feature-interactive fusion of two modality data, and then, building on it, a multi-modal fusion method for detecting psychological stress from text, picture and physiological data. Both are described in detail through the specific embodiments below.
Fig. 1 shows a flowchart of an attention weight correspondence method for feature interactive fusion of two modality data according to an embodiment of the present invention. As shown in fig. 1, the attention weight correspondence method for feature interactive fusion of two modality data according to the embodiment of the present invention includes the following steps:
step 101: based on the feature matrix of the two modal data, an incidence relation matrix reflecting the information incidence between different features of the two modal data is obtained by matrix multiplication.
In this embodiment, modality data refers to text data used for detecting teenagers' psychological stress (such as text questionnaires, diaries, essays and compositions), picture data used for the same purpose (such as picture questionnaires, favorite cartoons and hand scribbles), or physiologically relevant data such as exercise and sleep records.
In this embodiment, the two modality data may be text data and picture data, text data and physiological data, or picture data and physiological data.
Step 102: based on the association matrix and a feedforward fully-connected network model, obtain an influence weight matrix of the feature matrix of one modality's data on the feature matrix of the other modality's data.
Step 103: based on the influence weight matrices and the feature matrices of the two modality data, obtain, using element-wise multiplication and residual connections, attention-enhanced feature matrices containing the mutual influence weights of the two feature matrices.
The main purpose of this embodiment is to obtain the association between the feature matrices of the two modalities' data and map that association back onto the original feature matrices, so that the processed feature matrix of each modality contains information about the other modality's influence on it. As shown in fig. 2, the details are as follows:
assume the feature matrices of the two modality data are A and B, where
Figure BDA0002109908360000071
Multiplying the rank conversion matrixes of A and B by matrix multiplication to obtain a correlation relation matrix containing each characteristic in A and each characteristic in B
Figure BDA0002109908360000072
Figure BDA0002109908360000073
Through a layer of fully connected network, will
Figure BDA0002109908360000074
Is mapped back to
Figure BDA0002109908360000075
Of vector space, get AB→A
Figure BDA0002109908360000076
Figure BDA0002109908360000077
AB→ARepresents the weight of influence of mode B on mode A, W1Representing a first preset training parameter.
Using dot product operation to get AB→AMultiplying the obtained result by A and obtaining an attention-strengthening feature matrix after residual connection
Figure BDA0002109908360000078
Figure BDA0002109908360000079
Contains the information of B and the influence of B on A.
Figure BDA00021099083600000710
By the same way, the attention-strengthening feature matrix can be obtained
Figure BDA00021099083600000711
Figure BDA00021099083600000712
The information of A and the influence of A on B are included:
Figure BDA00021099083600000713
Figure BDA00021099083600000714
to facilitate the later embodiments to invoke this method, use fAMMTo express the method that
Figure BDA00021099083600000715
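The attention weight correspondence described above (association matrix, fully-connected mapping, element-wise product with residual connection) can be sketched in NumPy. This is a hedged illustration: the row counts m = 4 and n = 6 are arbitrary, W1 and W2 stand in for the trained parameters, and no normalization (e.g. softmax) is applied since the description does not specify one.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 4, 6, 8   # arbitrary row counts and shared feature dimension

def f_amm(A, B, W1, W2):
    # association matrix between each feature of A and each feature of B
    C = A @ B.T                  # shape (m, n)
    # one fully-connected layer maps C back to each modality's vector space
    A_w = C @ W1                 # influence weight of B on A, shape (m, k)
    B_w = C.T @ W2               # influence weight of A on B, shape (n, k)
    # element-wise product plus residual connection
    A_tilde = A_w * A + A
    B_tilde = B_w * B + B
    return A_tilde, B_tilde

A = rng.standard_normal((m, k))   # e.g. text feature matrix
B = rng.standard_normal((n, k))   # e.g. picture feature matrix
W1 = rng.standard_normal((n, k))  # stand-in for the first training parameter
W2 = rng.standard_normal((m, k))  # stand-in for the second training parameter
A_t, B_t = f_amm(A, B, W1, W2)
print(A_t.shape, B_t.shape)       # prints (4, 8) (6, 8)
```

Note that the residual connection guarantees each output keeps the shape of its input, which is what lets the later embodiments chain this method across the three modality pairs.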
It should be noted that the two modality data in this embodiment may be text data and picture data, text data and physiological data, or picture data and physiological data, each used for detecting psychological stress. Through the above processing, the embodiment of the present invention implements the attention weight correspondence method for feature-interactive fusion of each of these pairs. As a result, the processed text data contains the association influence of the picture data and of the physiological data on it, the processed picture data contains the association influence of the text data and of the physiological data on it, and the processed physiological data contains the association influence of the text data and of the picture data on it. That is, the processed feature matrix of each modality contains information about the influence of the other modalities, so that fusing the multi-modal feature data yields their combined effect.
Based on this method, the following embodiments provide a multi-modal fusion method and apparatus for psychological stress detection, which fuse text data, picture data and physiological data. This compensates for the shortcomings caused by the subjectivity of text and picture data, addresses some inherent problems of physiological data (for example, the physiological data in a state of extreme excitement and in a state of extreme stress are very similar), and to some extent bridges the detection blind window caused by missing data.
As can be seen from the above technical solution, the attention weight correspondence method for feature-interactive fusion of two modality data provided in the embodiment of the present invention aims to obtain the association between the two modalities' feature matrices and map it back onto the original feature matrices, so that the processed feature matrix of each modality contains information about the other modality's influence on it. The processing is as follows: based on the feature matrices of the two modality data, an association matrix reflecting the degree of information association between their different features is obtained using matrix multiplication; based on the association matrix and a feedforward fully-connected network model, an influence weight matrix of one modality's feature matrix on the other's is obtained; and finally, based on the influence weight matrices and the two feature matrices, attention-enhanced feature matrices containing their mutual influence weights are obtained using element-wise multiplication and residual connections. Applied to the pairs text/picture, text/physiological and picture/physiological, this makes the processed data of each modality contain the association influence of the other two modalities, so that fusing the multi-modal feature data yields their combined effect.
Further, based on the content of the foregoing embodiment, in the present embodiment, the foregoing step 101 may be implemented as follows:
acquiring an association matrix reflecting the degree of information association between different features of the two modality data by using the following first relation model:

M_{AB} = A · B^T

wherein M_{AB} represents the association matrix, A represents the feature matrix of the data of one modality, B represents the feature matrix of the data of the other modality, A, B ∈ ℝ^{k×1}, M_{AB} ∈ ℝ^{k×k}, ℝ represents real space, k represents the feature dimension of the two modality data, and B^T represents the transpose of B. Multiplying the feature matrix A by the transpose of the feature matrix B using matrix multiplication yields the association matrix M_{AB}, which contains the degree of association between each feature in the feature matrix A and each feature in the feature matrix B.
Further, based on the content of the foregoing embodiment, in this embodiment, the foregoing step 102 may be implemented as follows:
and acquiring the influence weight matrix of the feature matrix B on the feature matrix A by using the following second relation model:

A_{B→A} = softmax(M_{AB} W_1)

and acquiring the influence weight matrix of the feature matrix A on the feature matrix B by using the following third relation model:

B_{A→B} = softmax(M_{AB}^T W_2)

wherein A_{B→A} represents the influence weight matrix of the feature matrix B on the feature matrix A, B_{A→B} represents the influence weight matrix of the feature matrix A on the feature matrix B, A_{B→A}, B_{A→B} ∈ ℝ^{k×1}, softmax denotes the normalized exponential function, W_1 represents the first preset training parameter of the first class of training parameters, and W_2 represents the second preset training parameter of the first class of training parameters. Through a layer of fully connected network, the association matrix M_{AB} ∈ ℝ^{k×k} is mapped back to ℝ^{k×1}, yielding the influence weight matrix A_{B→A} of the feature matrix B on the feature matrix A and the influence weight matrix B_{A→B} of the feature matrix A on the feature matrix B.
Further, based on the content of the foregoing embodiment, in the present embodiment, the foregoing step 103 can be implemented as follows:

obtaining the attention-enhanced feature matrix Â by using the following fourth relation model:

Â = A_{B→A} ⊙ A + A

and obtaining the attention-enhanced feature matrix B̂ by using the following fifth relation model:

B̂ = B_{A→B} ⊙ B + B

wherein ⊙ indicates the dot product operation. Using the dot product operation, A_{B→A} is multiplied by A, and the attention-enhanced feature matrix Â is obtained after a residual connection; Â contains the information of B and the influence of B on A. Likewise, B_{A→B} is multiplied by B, and the attention-enhanced feature matrix B̂ is obtained after a residual connection; B̂ contains the information of A and the influence of A on B.

Wherein (Â, B̂) = f_{AMM}(A, B), and f_{AMM} represents the process of obtaining the attention-enhanced feature matrices Â and B̂ from the feature matrix A and the feature matrix B, namely the process of processing A and B using the first to fifth relation models.
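To make the data flow of steps 101 to 103 concrete, the following NumPy sketch re-implements the first to fifth relation models as one f_AMM function. It is a minimal illustration, not the patented implementation: the k×1 feature shape, the exact placement of W_1/W_2, and the transpose used when computing B_{A→B} are assumptions inferred from the dimensions stated above.

```python
import numpy as np

def softmax(z, axis=0):
    # normalized exponential function, numerically stable
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def f_amm(A, B, W1, W2):
    """First to fifth relation models for two modality feature
    matrices A, B of shape (k, 1); returns (A_hat, B_hat)."""
    M = A @ B.T                  # first model: association matrix, (k, k)
    A_w = softmax(M @ W1)        # second model: influence weight of B on A
    B_w = softmax(M.T @ W2)      # third model: influence weight of A on B
    A_hat = A_w * A + A          # fourth model: dot product + residual
    B_hat = B_w * B + B          # fifth model: dot product + residual
    return A_hat, B_hat

k = 8
rng = np.random.default_rng(0)
A = rng.normal(size=(k, 1))
B = rng.normal(size=(k, 1))
# training parameters drawn from U(-0.001, 0.001) as stated later in the text
W1 = rng.uniform(-0.001, 0.001, size=(k, 1))
W2 = rng.uniform(-0.001, 0.001, size=(k, 1))
A_hat, B_hat = f_amm(A, B, W1, W2)
print(A_hat.shape, B_hat.shape)  # (8, 1) (8, 1)
```

The residual connection guarantees that each enhanced matrix keeps its own modality's information even when the learned influence weights are near zero.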
Fig. 3 shows a flowchart of a multi-modal fusion method for mental stress detection according to an embodiment of the present invention. As shown in fig. 3, the multi-modal fusion method for detecting mental stress according to the embodiment of the present invention is implemented based on the attention weight correspondence method for performing feature interactive fusion on two modality data according to the foregoing embodiment, and the multi-modal fusion method for detecting mental stress according to the embodiment of the present invention includes the following steps:
Step 201: respectively acquiring a physiological-data-related feature matrix reflecting the physiological state of the user, and a text feature matrix and a picture feature matrix reflecting the psychological activity state of the user.
In this step, a feature matrix of text, pictures and physiologically relevant data needs to be obtained. For the process of acquiring the text feature matrix, see the schematic diagram of the acquiring process shown in fig. 4. For the acquisition process of the physiological data related feature matrix, refer to the schematic diagram of the acquisition process shown in fig. 5. The process of obtaining the feature matrix of text, picture and physiological related data is described in detail below.
For text, each text is denoted by w, w = {w_1, w_2, ···, w_n}, where w_i represents a word. For example, a pre-trained 300-dimensional vector from Chinese Word Vectors is selected as the initial word vector of each word, so the text is represented as X = {x_1, x_2, ···, x_n}, where x_i is a 1×300 vector representing the meaning of a word.
The LSTM (Long Short-Term Memory) network layer aims to compute a text representation that expresses context information: since a model cannot directly understand natural language, a representation the model can process must first be computed, here in the form of a matrix H. The text representation X = {x_1, x_2, ···, x_n} enters the LSTM layer as input, where n denotes the number of words contained in the text (n is taken as 20 in the present invention). The hidden-layer outputs H_f and H_b of the two LSTMs are obtained through a forward LSTM and a backward LSTM respectively, and the hidden-layer outputs at corresponding positions are added to obtain the text representation matrix H:

H = H_f + H_b

An attention mechanism is applied to obtain the contribution distribution weight of the text representation matrix H:

Attn_T = softmax(H W_3 + b_1)

wherein Attn_T is a contribution distribution weight vector representing the distribution of contribution weights over the representation of each word. Attn_T is multiplied by H and connected through a residual to obtain the reweighted text representation matrix H̃:

H̃ = Attn_T ⊙ H + H

Through a layer of fully connected network, H̃ is mapped to a k×1 vector space to obtain the text feature matrix F_T:

F_T = ReLU(W_4 H̃ + b_2)
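The attention-and-residual step above can be sketched as follows. This is an illustrative NumPy fragment, not the trained model: the BiLSTM hidden outputs are replaced with random placeholders, and the hidden size d, target dimension k, and the flattening used by the final fully connected layer are assumptions.

```python
import numpy as np

def softmax(z, axis=0):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

n, d, k = 20, 64, 16   # words per text; assumed LSTM hidden size; assumed k
rng = np.random.default_rng(1)

# Stand-ins for the forward/backward LSTM hidden outputs; the real model
# computes these from the 1x300 word vectors of the text.
H_fwd = rng.normal(size=(n, d))
H_bwd = rng.normal(size=(n, d))
H = H_fwd + H_bwd                       # add corresponding positions

W3 = rng.uniform(-0.001, 0.001, size=(d, 1))
b1 = np.zeros((n, 1))
Attn_T = softmax(H @ W3 + b1)           # per-word contribution weights, (n, 1)
H_tilde = Attn_T * H + H                # reweighting + residual connection

W4 = rng.uniform(-0.001, 0.001, size=(k, n * d))
b2 = np.zeros((k, 1))
F_T = np.maximum(0, W4 @ H_tilde.reshape(-1, 1) + b2)   # ReLU, (k, 1)
print(F_T.shape)  # (16, 1)
```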
Second, for pictures, each picture is represented as a 32×32×3 tensor (the number of channels is 3). A 4×4×512 feature map is obtained through the first three stages of a pre-trained ResNet network; a convolution layer with 1×1 kernels then yields a 4×4×32 feature map; the 4×4×32 map is flattened into a vector C of length 512, representing the initial picture feature; and one fully connected layer maps the picture feature to an n×1 vector space, yielding the picture feature matrix F_V:

F_V = ReLU(W_5 C + b_3).
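The shape flow of the picture branch can be sketched as below. The pre-trained ResNet trunk is replaced with a random 4×4×512 placeholder, and the 1×1 convolution is written as the per-position channel map it is mathematically equivalent to; n = 20 is an assumption matching the text length used above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20   # target dimension, matching the per-text word count used above

# Stand-in for the 4x4x512 feature map produced by the first three stages
# of a pre-trained ResNet from a 32x32x3 picture.
feat = rng.normal(size=(4, 4, 512))

# A 1x1 convolution is a per-position linear map over channels: 512 -> 32.
kernel = rng.uniform(-0.001, 0.001, size=(512, 32))
reduced = feat @ kernel                  # (4, 4, 32)

C = reduced.reshape(-1, 1)               # flatten 4*4*32 = 512 -> picture feature C
W5 = rng.uniform(-0.001, 0.001, size=(n, 512))
b3 = np.zeros((n, 1))
F_V = np.maximum(0, W5 @ C + b3)         # F_V = ReLU(W5 C + b3), (n, 1)
print(C.shape, F_V.shape)  # (512, 1) (20, 1)
```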
Third, for physiologically relevant data, sleep data and motion data can be collected through a wristband; feature extraction is performed on the sleep data and the motion data, and the resulting sleep feature vector and motion feature vector are concatenated as the feature vector of the physiologically relevant data. For example, considering the work-rest schedule of teenagers, the sleep condition from 8:00 pm to 10:00 am the next morning is taken, and 9 features are extracted: the sleep onset segment, sleep end segment, sleep segments, deep sleep segments, deep sleep ratio, total sleep amount, sleep amount per unit segment, sleep fluctuation amount, and number of awakenings during sleep. For the measurement of time features, every 15 minutes is taken as one segment; for example, 20:00-20:15 is segment 1, 20:15-20:30 is segment 2, and so on. The segment set is denoted by T,

T = {t_1, t_2, ···, t_56}, where t_i ∈ T represents the sleep amount of the i-th segment.
Sleep onset segment: the earliest segment in the sleep interval that starts at least 4 consecutive segments whose sleep data are all greater than 0, i.e., when t_i · t_{i+1} · t_{i+2} · t_{i+3} > 0 with t_i, t_{i+1}, t_{i+2}, t_{i+3} ∈ T, the sleep onset segment is taken as the segment with the minimum such i.
Sleep end segment: the latest segment in the sleep interval that terminates at least 4 consecutive sleep segments, i.e., t_i · t_{i-1} · t_{i-2} · t_{i-3} > 0 with t_i, t_{i-1}, t_{i-2}, t_{i-3} ∈ T, and the sleep end segment is taken as the segment with the maximum such i.
sleep segment: and the number of fragments with the sleep quantity larger than 0 in the sleep metering interval.
Deep sleep segment: when the sleep amount in the segment is higher than the threshold value theta, the segment is a deep sleep segment, the value of theta is generally 230, the threshold value is a bracelet parameter, and the value is variable according to different bracelets.
Deep sleep ratio: ratio of deep sleep segment to sleep segment.
Total sleep amount: sum of sleep amount between sleep onset segment and sleep end segment.
Sleep amount per unit segment: the ratio of the total sleep amount to the sleep segments is the unit segment sleep amount.
Amount of sleep fluctuation: the standard deviation of the amount of sleep between the sleep onset section and the sleep termination section is taken as the amount of sleep fluctuation.
Number of awakenings during sleep: the number of segments between the sleep onset segment and the sleep end segment whose sleep amount is less than the threshold β, where β takes the value 25; a sleep amount below 25 between the sleep onset and sleep end segments indicates an awakening.
Regarding the motion feature vector, 5 motion features are extracted: the number of steps per day, the calorie consumption per day, the motion distance per day, the motion duration per day, and the active motion duration per day. The steps, calorie consumption, motion distance, and motion duration per day can be obtained directly from the wristband. Active motion duration per day: each day (24 hours) is equally divided into 96 segments; a segment in which the step count, calorie consumption, motion distance, and motion duration are all higher than the per-segment averages of the corresponding items is an active motion segment, and the total number of active motion segments per day is the active motion duration per day. The 9 sleep features and 5 motion features are concatenated into a 14×1 physiologically relevant data feature E_S.
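The segment-based sleep feature definitions above can be turned into code directly. The sketch below implements them under the stated thresholds; the "awakenings" rule is read as counting segments below β between onset and end, and the toy night used for the demo is invented for illustration.

```python
import numpy as np

THETA, BETA = 230, 25   # wristband-dependent thresholds given in the text

def sleep_features(T):
    """Extract the sleep features from T, the 56 per-15-minute sleep
    amounts covering 20:00 to 10:00 the next morning."""
    T = np.asarray(T, dtype=float)
    pos = T > 0
    # sleep onset segment: earliest i starting 4 consecutive sleep segments
    onset = next(i for i in range(len(T) - 3) if pos[i:i + 4].all())
    # sleep end segment: latest i terminating 4 consecutive sleep segments
    end = max(i for i in range(3, len(T)) if pos[i - 3:i + 1].all())
    sleep_segs = int(pos.sum())
    deep_segs = int((T > THETA).sum())
    span = T[onset:end + 1]
    return {
        "onset": onset, "end": end,
        "sleep_segments": sleep_segs,
        "deep_segments": deep_segs,
        "deep_ratio": deep_segs / sleep_segs,
        "total_sleep": float(span.sum()),
        "per_segment_sleep": float(span.sum()) / sleep_segs,
        "fluctuation": float(span.std()),
        "awakenings": int((span < BETA).sum()),
    }

# Toy night: awake for the first 8 segments, asleep for 40, awake for 8.
night = [0] * 8 + [240] * 40 + [0] * 8
f = sleep_features(night)
print(f["onset"], f["end"], f["deep_ratio"])  # 8 47 1.0
```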
A better physiologically relevant data representation matrix E is obtained through a two-layer fully connected network:

E = ReLU(W_7(ReLU(W_6 E_S + b_4) + b_5))

An attention mechanism is applied to obtain the contribution distribution weight of the physiologically relevant data representation matrix E:

Attn_E = softmax(W_8 E + b_6)

wherein Attn_E is a contribution distribution weight vector representing the distribution of contribution weights over each physiological feature representation. Attn_E is multiplied by E and connected through a residual to obtain the reweighted representation matrix Ẽ:

Ẽ = Attn_E ⊙ E + E

Through a layer of fully connected network, Ẽ is mapped to a k×1 vector space to obtain the physiological data related feature matrix F_E:

F_E = ReLU(W_9 Ẽ + b_7)
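The physiological branch mirrors the text branch. The following NumPy fragment sketches it end to end under stated assumptions: E_S is a random 14×1 placeholder rather than real wristband features, the hidden width of the two fully connected layers is assumed equal to 14, and k = 16 is assumed.

```python
import numpy as np

def softmax(z, axis=0):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(3)
d, k = 14, 16            # E_S is 14x1 (9 sleep + 5 motion); k is assumed

E_S = rng.normal(size=(d, 1))   # placeholder physiological feature vector

W6, W7, W8 = (rng.uniform(-0.001, 0.001, size=(d, d)) for _ in range(3))
b4, b5, b6 = (np.zeros((d, 1)) for _ in range(3))

# two-layer fully connected network: E = ReLU(W7(ReLU(W6 E_S + b4) + b5))
E = np.maximum(0, W7 @ (np.maximum(0, W6 @ E_S + b4) + b5))

Attn_E = softmax(W8 @ E + b6)    # per-feature contribution weights, (d, 1)
E_tilde = Attn_E * E + E         # reweighting + residual connection

W9 = rng.uniform(-0.001, 0.001, size=(k, d))
b7 = np.zeros((k, 1))
F_E = np.maximum(0, W9 @ E_tilde + b7)   # F_E = ReLU(W9 E_tilde + b7)
print(F_E.shape)  # (16, 1)
```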
It should be noted that the above example is only an illustration, the physiological related data is not limited to the sleep data and the exercise data, and may be data such as blood pressure, pulse, galvanic skin response, electrocardiogram, electromyogram, etc. according to actual needs, which is not limited by the present invention.
Step 202: based on the physiological-data-related feature matrix, the text feature matrix and the picture feature matrix, using the attention weight correspondence method, obtain a first attention-enhanced feature matrix containing the influence weight of the physiological-data-related feature matrix on the text feature matrix, a second attention-enhanced feature matrix containing the influence weight of the physiological-data-related feature matrix on the picture feature matrix, a third attention-enhanced feature matrix containing the influence weight of the text feature matrix on the physiological-data-related feature matrix, a fourth attention-enhanced feature matrix containing the influence weight of the text feature matrix on the picture feature matrix, a fifth attention-enhanced feature matrix containing the influence weight of the picture feature matrix on the physiological-data-related feature matrix, and a sixth attention-enhanced feature matrix containing the influence weight of the picture feature matrix on the text feature matrix.
In this step, the attention weight correspondence method is applied between every two of the three feature matrices to obtain six attention-enhanced feature matrices. The first attention-enhanced feature matrix is the physiological-data→text attention-enhanced feature matrix F̂_{E→T}; the second is the physiological-data→picture attention-enhanced feature matrix F̂_{E→V}; the third is the text→physiological-data attention-enhanced feature matrix F̂_{T→E}; the fourth is the text→picture attention-enhanced feature matrix F̂_{T→V}; the fifth is the picture→physiological-data attention-enhanced feature matrix F̂_{V→E}; and the sixth is the picture→text attention-enhanced feature matrix F̂_{V→T}.
Thus, for each modality, two attention-enhanced feature matrices are obtained, which together contain the association information of the other two modalities.
Step 203: based on the first attention-enhanced feature matrix, the second attention-enhanced feature matrix, the third attention-enhanced feature matrix, the fourth attention-enhanced feature matrix, the fifth attention-enhanced feature matrix and the sixth attention-enhanced feature matrix, acquiring a text fusion feature matrix, a picture fusion feature matrix and a physiological data fusion feature matrix based on a feedforward fully-connected neural network.
In this step, the two attention-enhanced feature matrices of each modality are further merged into one fused feature matrix through a layer of fully connected network; each fused feature matrix contains the association and influence information of the other two modalities:

F̃_T = ReLU(W_10 F̂_{E→T} + W_11 F̂_{V→T} + b_8)

F̃_V = ReLU(W_12 F̂_{E→V} + W_13 F̂_{T→V} + b_9)

F̃_E = ReLU(W_14 F̂_{T→E} + W_15 F̂_{V→E} + b_10)
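One such merge can be sketched as follows, for the text modality. Note that the exact form of the fully connected merge is a reconstruction: giving each of the two inputs its own weight matrix is assumed because it matches the W_10 ~ W_15 and b_8 ~ b_10 parameter counts listed later; k = 16 and the inputs are placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)
k = 16   # assumed feature dimension

# two attention-enhanced feature matrices for one modality (text here):
F_ET = rng.normal(size=(k, 1))   # physiological data -> text
F_VT = rng.normal(size=(k, 1))   # picture -> text

# One fully connected layer merging both inputs; each input gets its own
# weight matrix (reconstruction matching the listed parameter counts).
W10, W11 = (rng.uniform(-0.001, 0.001, size=(k, k)) for _ in range(2))
b8 = np.zeros((k, 1))
F_T_fused = np.maximum(0, W10 @ F_ET + W11 @ F_VT + b8)
print(F_T_fused.shape)  # (16, 1)
```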
Step 204: obtaining the feature values of the text, the picture and the physiological data based on the physiological-data-related feature matrix, the text feature matrix and the picture feature matrix, using a feedforward fully connected neural network.
In this step, the text, picture and physiologically relevant feature matrices are mapped into (0, 1), and the text, picture and physiological feature values S_T, S_V and S_E are then obtained through one fully connected layer:

S_T = ReLU(W_16 softmax(F_T) + b_11)

S_V = ReLU(W_17 softmax(F_V) + b_12)

S_E = ReLU(W_18 softmax(F_E) + b_13)
Step 205: acquiring the importance weight values of the text, the picture and the physiological data based on the text, picture and physiological data feature values, using vector concatenation and an attention mechanism.
In this step, the text, picture and physiological data feature values S_T, S_V and S_E are concatenated, and the importance weight values weight_T, weight_V and weight_E of the text, picture and physiological data are obtained through an attention mechanism:

(weight_T, weight_V, weight_E) = softmax([S_T, S_V, S_E] W_19)
Step 206: acquiring the fused representation matrix of the three modalities based on the importance weight values of the text, the picture and the physiological data, and on the text fusion feature matrix, the picture fusion feature matrix and the physiological data fusion feature matrix.
In this step, the three modalities are the text data modality, the picture data modality and the physiologically-relevant-data modality. The importance weight values weight_T, weight_V and weight_E are multiplied by the corresponding fused feature matrices F̃_T, F̃_V and F̃_E, and the results are added to obtain the fused representation matrix R_W of the three modalities:

R_W = weight_T · F̃_T + weight_V · F̃_V + weight_E · F̃_E
Step 207: acquiring a pressure classification vector reflecting the psychological pressure problem based on the fused representation matrix of the three modalities and a feedforward fully connected network.
In this step, a linear classifier is used to obtain a 1×2 pressure classification vector y indicating whether mental pressure exists; the two dimensions represent the presence and absence of pressure respectively, and the meaning corresponding to the position with the larger value is taken as the final classification result. For example, the pressure classification vector y may be obtained by the following model:

y = softmax(W_23 R_W + b_14)
wherein W_1 ~ W_23 represent the first to twenty-third preset training parameters of the first class of training parameters, and b_1 ~ b_14 represent the first to fourteenth preset training parameters of the second class of training parameters. Both classes of training parameters are initialized from the uniform distribution U(-0.001, 0.001), and the first to twenty-third preset training parameters of the first class and the first to fourteenth preset training parameters of the second class are set according to actual needs.
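Steps 204 to 207 form a short pipeline from per-modality feature matrices to the final classification, which can be sketched as below. All feature matrices are random placeholders, k = 16 is assumed, and the reading of dimension 0 as "pressure present" follows the ordering stated above.

```python
import numpy as np

def softmax(z, axis=0):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(5)
k = 16
mods = ("T", "V", "E")                               # text, picture, physiological

raw = {m: rng.normal(size=(k, 1)) for m in mods}     # F_T, F_V, F_E (placeholders)
fused = {m: rng.normal(size=(k, 1)) for m in mods}   # fused feature matrices

# step 204: scalar feature values S_T, S_V, S_E from the raw feature matrices
W_s = {m: rng.uniform(-0.001, 0.001, size=(1, k)) for m in mods}
S = {m: np.maximum(0, W_s[m] @ softmax(raw[m])).item() for m in mods}

# step 205: importance weights via attention over the concatenated values
W19 = rng.uniform(-0.001, 0.001, size=(3, 3))
weights = softmax(np.array([[S["T"], S["V"], S["E"]]]) @ W19, axis=1).ravel()

# step 206: weighted sum -> fused representation matrix of the three modalities
R_W = sum(w * fused[m] for w, m in zip(weights, mods))

# step 207: linear classifier -> 1x2 pressure classification vector
W23 = rng.uniform(-0.001, 0.001, size=(2, k))
b14 = np.zeros((2, 1))
y = softmax(W23 @ R_W + b14)
# dimension 0 = pressure present, dimension 1 = pressure absent (per the text)
print("stressed" if y[0, 0] > y[1, 0] else "not stressed")
```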
Referring to fig. 6, which shows the model structure of the fusion method for multi-modal detection of the psychological stress problem on text, picture and physiologically relevant data, the multi-modal fusion method for mental stress detection provided in the embodiment of the present invention proceeds as follows. Based on the physiological-data-related feature matrix, the text feature matrix and the picture feature matrix, the attention weight correspondence method is used to obtain the six attention-enhanced feature matrices: the first, containing the influence weight of the physiological-data-related feature matrix on the text feature matrix; the second, containing its influence weight on the picture feature matrix; the third, containing the influence weight of the text feature matrix on the physiological-data-related feature matrix; the fourth, containing the influence weight of the text feature matrix on the picture feature matrix; the fifth, containing the influence weight of the picture feature matrix on the physiological-data-related feature matrix; and the sixth, containing the influence weight of the picture feature matrix on the text feature matrix.

From these six attention-enhanced feature matrices, the text fusion feature matrix, the picture fusion feature matrix and the physiological data fusion feature matrix are obtained through a feedforward fully connected neural network. Then, based on the physiological-data-related feature matrix, the text feature matrix and the picture feature matrix, the feature values of the text, the picture and the physiological data are obtained through a feedforward fully connected neural network, and from these feature values the importance weight values of the text, the picture and the physiological data are obtained by vector concatenation and an attention mechanism. Next, the fused representation matrix of the three modalities is obtained from the importance weight values and the three fusion feature matrices; and finally, a pressure classification vector reflecting the psychological pressure problem is obtained from the fused representation matrix through a feedforward fully connected network. By fusing the text and picture data with the physiologically relevant data, the embodiment of the present invention compensates for the deficiency caused by the subjectivity of the text and picture data, mitigates some inherent problems of the physiologically relevant data (for example, physiologically relevant data in an extremely excited state and an extremely stressed state are very similar), and bridges, to some extent, the mental-detection window period caused by the loss of some data.
Further, based on the content of the foregoing embodiment, in the present embodiment, the foregoing step 201 may be implemented as follows:
acquiring a text feature matrix reflecting the psychological activity state of the user by using the following sixth processing model:

H = H_f + H_b

Attn_T = softmax(H W_3 + b_1)

H̃ = Attn_T ⊙ H + H

F_T = ReLU(W_4 H̃ + b_2)

wherein F_T represents the text feature matrix, H represents the text representation matrix, H̃ represents the text representation matrix readjusted by the weight distribution, and Attn_T represents the contribution distribution weight vector of the text representation matrix H. The text representation X = {x_1, x_2, ···, x_n} enters the long short-term memory (LSTM) layer as input; the hidden-layer outputs H_f and H_b of the two LSTMs are obtained through the forward LSTM and the backward LSTM respectively, and the hidden-layer outputs at corresponding positions are added to obtain the text representation matrix H. The attention mechanism is applied to obtain the contribution distribution weight vector Attn_T of H: Attn_T = softmax(H W_3 + b_1), where Attn_T represents the distribution of contribution weights over the representation of each word. Attn_T is multiplied by H and connected through a residual to obtain the readjusted text representation matrix H̃ = Attn_T ⊙ H + H. Through a layer of fully connected network, H̃ is mapped to a k×1 vector space to obtain the text feature matrix F_T = ReLU(W_4 H̃ + b_2).

Wherein W_3 represents the third preset training parameter of the first class of training parameters, W_4 represents the fourth preset training parameter of the first class of training parameters, b_1 represents the first preset training parameter of the second class of training parameters, b_2 represents the second preset training parameter of the second class of training parameters, ReLU represents the activation function, and softmax represents the normalized exponential function; in the text representation X = {x_1, x_2, ···, x_n}, x_i is a vector representing the meaning of a word, and n represents the number of words contained in the text;
and acquiring a picture feature matrix reflecting the psychological activity state of the user by using the following seventh processing model:

F_V = ReLU(W_5 C + b_3)

wherein F_V represents the picture feature matrix and C represents the picture feature; one fully connected layer maps the picture feature C to an n×1 vector space to obtain the picture feature matrix F_V. W_5 represents the fifth preset training parameter of the first class of training parameters, and b_3 represents the third preset training parameter of the second class of training parameters;
and acquiring a physiological data related feature matrix reflecting the physiological state of the user by using the following eighth processing model:

E = ReLU(W_7(ReLU(W_6 E_S + b_4) + b_5))

Attn_E = softmax(W_8 E + b_6)

Ẽ = Attn_E ⊙ E + E

F_E = ReLU(W_9 Ẽ + b_7)

wherein F_E represents the physiological data related feature matrix, E_S represents the feature matrix of the physiologically relevant data and contains the preset physiological features, E represents the physiologically relevant data representation matrix obtained by passing E_S through a two-layer fully connected network, Attn_E represents the contribution distribution weight vector of E, and Ẽ represents the representation matrix readjusted by the weight distribution. E_S is passed through the two-layer fully connected network: E = ReLU(W_7(ReLU(W_6 E_S + b_4) + b_5)); the attention mechanism is applied to obtain Attn_E = softmax(W_8 E + b_6), which represents the distribution of contribution weights over each physiological feature representation; Attn_E is multiplied by E and connected through a residual to obtain Ẽ = Attn_E ⊙ E + E; and through a layer of fully connected network, Ẽ is mapped to a k×1 vector space to obtain the physiological data related feature matrix F_E = ReLU(W_9 Ẽ + b_7).

Wherein W_6 ~ W_9 represent the sixth to ninth preset training parameters of the first class of training parameters, and b_4 ~ b_7 represent the fourth to seventh preset training parameters of the second class of training parameters.
Further, based on the content of the foregoing embodiment, in the present embodiment, the foregoing step 202 may be implemented as follows:
based on the physiological data related feature matrix and the text feature matrix, applying the attention weight corresponding method for performing feature interactive fusion on two modal data, to obtain a first attention-enhanced feature matrix including the influence weight of the physiological data related feature matrix on the text feature matrix; the first attention-enhancing feature matrix is physiological data->Text attention-enhancing feature matrix
Figure BDA0002109908360000191
Figure BDA0002109908360000192
Based on the physiological data related feature matrix and the picture feature matrix, the above attention weight correspondence method for feature interactive fusion of two modal data is applied to obtain a second attention-enhanced feature matrix including the influence weight of the physiological data related feature matrix on the picture feature matrix; the second attention-enhanced feature matrix is the physiological data→picture attention-enhanced feature matrix F̃_{E→V}, obtained as (F̃_{E→V}, F̃_{V→E}) = f_{AMM}(F_V, F_E).
Based on the physiological data related feature matrix and the text feature matrix, the same method is applied to obtain a third attention-enhanced feature matrix including the influence weight of the text feature matrix on the physiological data related feature matrix; the third attention-enhanced feature matrix is the text→physiological data attention-enhanced feature matrix F̃_{T→E}, obtained as (F̃_{T→E}, F̃_{E→T}) = f_{AMM}(F_E, F_T).
Based on the text feature matrix and the picture feature matrix, the same method is applied to obtain a fourth attention-enhanced feature matrix including the influence weight of the text feature matrix on the picture feature matrix; the fourth attention-enhanced feature matrix is the text→picture attention-enhanced feature matrix F̃_{T→V}, obtained as (F̃_{T→V}, F̃_{V→T}) = f_{AMM}(F_V, F_T).
Based on the physiological data related feature matrix and the picture feature matrix, the same method is applied to obtain a fifth attention-enhanced feature matrix including the influence weight of the picture feature matrix on the physiological data related feature matrix; the fifth attention-enhanced feature matrix is the picture→physiological data attention-enhanced feature matrix F̃_{V→E}, obtained as (F̃_{V→E}, F̃_{E→V}) = f_{AMM}(F_E, F_V).
Based on the text feature matrix and the picture feature matrix, the same method is applied to obtain a sixth attention-enhanced feature matrix including the influence weight of the picture feature matrix on the text feature matrix; the sixth attention-enhanced feature matrix is the picture→text attention-enhanced feature matrix F̃_{V→T}, obtained as (F̃_{V→T}, F̃_{T→V}) = f_{AMM}(F_T, F_V).
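As a minimal NumPy sketch (the equation images in the source are illegible, so shapes, parameter names, and the f_amm helper are assumptions consistent with the relation models described in this document), three pairwise applications of the attention weight correspondence method yield all six attention-enhanced feature matrices:

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def f_amm(A, B, W1, W2):
    # Incidence matrix -> influence weights -> dot multiplication + residual,
    # following the first to fifth relation models of the embodiment.
    M = A @ B.T                          # association between features of A and B
    A_enh = softmax(M @ W1) * A + A      # contains B's influence on A
    B_enh = softmax(M.T @ W2) * B + B    # contains A's influence on B
    return A_enh, B_enh

k = 4                                    # feature dimension (illustrative)
rng = np.random.default_rng(0)
F_E, F_T, F_V = (rng.standard_normal((k, 1)) for _ in range(3))  # physio/text/picture
Ws = [rng.standard_normal((k, 1)) for _ in range(6)]             # hypothetical weights

F_EtoT, F_TtoE = f_amm(F_T, F_E, Ws[0], Ws[1])  # first and third matrices
F_EtoV, F_VtoE = f_amm(F_V, F_E, Ws[2], Ws[3])  # second and fifth matrices
F_TtoV, F_VtoT = f_amm(F_V, F_T, Ws[4], Ws[5])  # fourth and sixth matrices
```

Each enhanced matrix keeps the shape of the feature matrix it strengthens, which is what allows the later fusion layers to combine them.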
Further, based on the content of the foregoing embodiment, in this embodiment, the foregoing step 203 may be implemented as follows:
acquiring a text fusion feature matrix F̄_T through one layer of fully-connected network, based on the first attention-enhanced feature matrix F̃_{E→T} and the sixth attention-enhanced feature matrix F̃_{V→T}, by using the following ninth processing model:
F̄_T = ReLU(W_{10}F̃_{E→T} + W_{11}F̃_{V→T} + b_8)
acquiring a picture fusion feature matrix F̄_V through one layer of fully-connected network, based on the second attention-enhanced feature matrix F̃_{E→V} and the fourth attention-enhanced feature matrix F̃_{T→V}, by using the following tenth processing model:
F̄_V = ReLU(W_{12}F̃_{E→V} + W_{13}F̃_{T→V} + b_9)
acquiring a physiological data fusion feature matrix F̄_E through one layer of fully-connected network, based on the third attention-enhanced feature matrix F̃_{T→E} and the fifth attention-enhanced feature matrix F̃_{V→E}, by using the following eleventh processing model:
F̄_E = ReLU(W_{14}F̃_{T→E} + W_{15}F̃_{V→E} + b_{10})
wherein W_{10}~W_{15} represent the tenth to fifteenth preset training parameters in the first class of training parameters, and b_8~b_{10} represent the eighth to tenth preset training parameters in the second class of training parameters.
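A sketch of the ninth to eleventh processing models (one fully-connected layer combining the two attention-enhanced matrices that target each modality; the exact weight shapes and the way the two inputs are combined are assumptions, since the source equation images are illegible):

```python
import numpy as np

relu = lambda x: np.maximum(x, 0.0)

k = 4
rng = np.random.default_rng(1)
# Stand-ins for the six attention-enhanced feature matrices.
F_EtoT, F_VtoT, F_EtoV, F_TtoV, F_TtoE, F_VtoE = (
    rng.standard_normal((k, 1)) for _ in range(6))
W = {i: rng.standard_normal((k, k)) for i in range(10, 16)}  # W10..W15 (assumed k x k)
b = {i: rng.standard_normal((k, 1)) for i in range(8, 11)}   # b8..b10

F_T_fus = relu(W[10] @ F_EtoT + W[11] @ F_VtoT + b[8])   # text fusion matrix
F_V_fus = relu(W[12] @ F_EtoV + W[13] @ F_TtoV + b[9])   # picture fusion matrix
F_E_fus = relu(W[14] @ F_TtoE + W[15] @ F_VtoE + b[10])  # physiological fusion matrix
```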
Further, based on the content of the foregoing embodiment, in the present embodiment, the foregoing step 204 may be implemented as follows:
acquiring characteristic values of texts, pictures and physiological data by using the following twelfth processing model:
S_T = ReLU(W_{16} softmax(F_T) + b_{11})
S_V = ReLU(W_{17} softmax(F_V) + b_{12})
S_E = ReLU(W_{18} softmax(F_E) + b_{13})
wherein softmax maps the physiological data related feature matrix F_E, the text feature matrix F_T and the picture feature matrix F_V into (0,1), and one layer of full connection then yields the feature values S_T, S_V and S_E of the text, picture and physiological data; W_{16}~W_{18} represent the sixteenth to eighteenth preset training parameters in the first class of training parameters, and b_{11}~b_{13} represent the eleventh to thirteenth preset training parameters in the second class of training parameters.
Further, based on the content of the foregoing embodiment, in this embodiment, the foregoing step 205 may be implemented as follows:
using the following thirteenth processing model, the text, picture and physiological data feature values S_T, S_V and S_E are spliced together, and the importance weight values weight_T, weight_V and weight_E of the text, picture and physiological data are obtained through an attention mechanism:
(weight_T, weight_V, weight_E) = softmax([S_T, S_V, S_E]W_{19})
wherein W_{19} represents the nineteenth preset training parameter in the first class of training parameters.
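The twelfth and thirteenth processing models can be sketched as follows (the 1×k shape for W16–W18 and the 3×3 shape for W19 are assumptions chosen so that each modality reduces to a scalar feature value and the spliced vector produces three importance weights):

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

relu = lambda x: np.maximum(x, 0.0)

k = 4
rng = np.random.default_rng(2)
F_T, F_V, F_E = (rng.standard_normal((k, 1)) for _ in range(3))
W16, W17, W18 = (rng.standard_normal((1, k)) for _ in range(3))
b11, b12, b13 = rng.standard_normal(3)
W19 = rng.standard_normal((3, 3))

# Twelfth model: softmax maps each feature matrix into (0, 1), then one
# fully-connected layer yields a feature value per modality.
S_T = relu(W16 @ softmax(F_T) + b11)
S_V = relu(W17 @ softmax(F_V) + b12)
S_E = relu(W18 @ softmax(F_E) + b13)

# Thirteenth model: splice the feature values, apply attention via softmax.
S = np.concatenate([S_T, S_V, S_E], axis=1)          # shape (1, 3)
weight_T, weight_V, weight_E = softmax(S @ W19, axis=1).ravel()
```

The three importance weights are a probability distribution over the modalities, which is what the fourteenth model uses to mix the fusion matrices.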
Further, based on the content of the foregoing embodiment, in the present embodiment, the foregoing step 206 may be implemented as follows:
utilizing the following fourteenth processing model, the importance weight values weight_T, weight_V and weight_E of the text, picture and physiological data are correspondingly multiplied with the text fusion feature matrix F̄_T, the picture fusion feature matrix F̄_V and the physiological data fusion feature matrix F̄_E, and the products are added, obtaining the fusion representation matrix R_W of the three modalities:
R_W = weight_T·W_{20}F̄_T + weight_V·W_{21}F̄_V + weight_E·W_{22}F̄_E
wherein W_{20}~W_{22} represent the twentieth to twenty-second preset training parameters in the first class of training parameters.
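A sketch of the fourteenth processing model (the placement of W20–W22 as per-modality projections is an assumption; the weighted values are illustrative):

```python
import numpy as np

k = 4
rng = np.random.default_rng(3)
F_T_fus, F_V_fus, F_E_fus = (rng.standard_normal((k, 1)) for _ in range(3))
W20, W21, W22 = (rng.standard_normal((k, k)) for _ in range(3))
weight_T, weight_V, weight_E = 0.5, 0.3, 0.2   # importance weights (illustrative)

# Corresponding multiplication and addition: each projected fusion matrix is
# scaled by its modality's importance weight, then the three are summed.
R_W = (weight_T * (W20 @ F_T_fus)
       + weight_V * (W21 @ F_V_fus)
       + weight_E * (W22 @ F_E_fus))
```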
Fig. 7 is a schematic structural diagram illustrating an attention weight correspondence apparatus for feature interactive fusion of two modality data according to an embodiment of the present invention. As shown in fig. 7, the attention weight correspondence apparatus for feature interactive fusion of two modality data according to the embodiment of the present invention includes: a first obtaining module 11, a second obtaining module 12 and a third obtaining module 13, wherein:
the first obtaining module 11 is configured to obtain, based on a feature matrix of two types of modal data, an incidence relation matrix reflecting information relevance between different features of the two types of modal data by using matrix multiplication;
a second obtaining module 12, configured to obtain, based on the incidence relation matrix and the feedforward full-connection network model, a weight matrix of influence of a feature matrix of one modal data on a feature matrix of another modal data;
a third obtaining module 13, configured to obtain an attention-enhanced feature matrix including mutual influence weights of feature matrices of the two modality data by using matrix dot multiplication and residual connection based on the influence weight matrix and the feature matrices of the two modality data.
The attention weight corresponding device for performing feature interactive fusion on two modality data provided in the embodiment of the present invention may be used to execute the attention weight corresponding method for performing feature interactive fusion on two modality data described in the above embodiment, and the working principle and the beneficial effect are similar, so detailed description is omitted here, and specific contents may refer to the description of the above embodiment.
Fig. 8 is a schematic structural diagram of a multi-modal fusion apparatus for mental stress detection according to an embodiment of the present invention. As shown in fig. 8, the multi-modal fusion apparatus for mental stress detection according to an embodiment of the present invention is implemented based on the attention weight correspondence apparatus for feature interactive fusion of two modality data described in the above embodiment, and includes: a fourth obtaining module 21, a fifth obtaining module 22, a sixth obtaining module 23, a seventh obtaining module 24, an eighth obtaining module 25, a ninth obtaining module 26, and a tenth obtaining module 27, wherein:
a fourth obtaining module 21, configured to obtain a physiological data related feature matrix reflecting the physiological state of the user, and a text feature matrix and a picture feature matrix reflecting the psychological activity state of the user, respectively;
a fifth obtaining module 22, configured to obtain, based on the physiological data related feature matrix, the text feature matrix and the picture feature matrix and by using the attention weight correspondence method, a first attention-enhanced feature matrix including the influence weight of the physiological data related feature matrix on the text feature matrix, a second attention-enhanced feature matrix including the influence weight of the physiological data related feature matrix on the picture feature matrix, a third attention-enhanced feature matrix including the influence weight of the text feature matrix on the physiological data related feature matrix, a fourth attention-enhanced feature matrix including the influence weight of the text feature matrix on the picture feature matrix, a fifth attention-enhanced feature matrix including the influence weight of the picture feature matrix on the physiological data related feature matrix, and a sixth attention-enhanced feature matrix including the influence weight of the picture feature matrix on the text feature matrix;
a sixth obtaining module 23, configured to obtain a text fusion feature matrix, a picture fusion feature matrix and a physiological data fusion feature matrix through a feedforward fully-connected neural network, based on the first to sixth attention-enhanced feature matrices;
a seventh obtaining module 24, configured to obtain the text, picture and physiological data feature values through a feedforward fully-connected neural network, based on the physiological data related feature matrix, the text feature matrix and the picture feature matrix;
an eighth obtaining module 25, configured to obtain importance weight values of the text, the picture and the physiological data through vector splicing and an attention mechanism, based on the text, picture and physiological data feature values;
a ninth obtaining module 26, configured to obtain a fusion expression matrix of three modalities based on the importance weight values of the text, the picture and the physiological data, and the text fusion feature matrix, the picture fusion feature matrix and the physiological data fusion feature matrix;
a tenth obtaining module 27, configured to obtain a pressure classification vector reflecting a psychological pressure problem based on the fusion representation matrix of the three modalities and the feedforward full-connection network.
Since the multi-modal fusion apparatus for psychological stress detection provided by the embodiment of the present invention can be used to perform the multi-modal fusion method for psychological stress detection described in the above embodiment, and the operation principle and the beneficial effect are similar, detailed descriptions are not provided herein, and specific contents can be referred to the description of the above embodiment.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device, which specifically includes the following components, with reference to fig. 9: a processor 301, a memory 302, a communication interface 303, and a communication bus 304;
the processor 301, the memory 302 and the communication interface 303 complete mutual communication through the communication bus 304;
the processor 301 is configured to call a computer program in the memory 302, and when the processor executes the computer program, the processor implements the above-mentioned attention weight corresponding method for feature interactive fusion of two modality data, and/or all the steps of a multi-modality fusion method for mental stress detection, for example, when the processor executes the computer program, the processor implements the following processes:
based on the feature matrix of the two modal data, acquiring an incidence relation matrix reflecting information incidence between different features of the two modal data by utilizing matrix multiplication; acquiring an influence weight matrix of the characteristic matrix of one modal data to the characteristic matrix of the other modal data based on the incidence relation matrix and the feedforward full-connection network model; and acquiring an attention-strengthening feature matrix containing the mutual influence weight of the feature matrices of the two modal data by utilizing matrix dot multiplication and residual connection based on the influence weight matrix and the feature matrices of the two modal data.
As another example, the processor, when executing the computer program, implements the following:
respectively acquiring a physiological data related feature matrix reflecting the physiological state of the user, and a text feature matrix and a picture feature matrix reflecting the psychological activity state of the user; based on the physiological data related feature matrix, the text feature matrix and the picture feature matrix, obtaining, by using the attention weight correspondence method, a first attention-enhanced feature matrix including the influence weight of the physiological data related feature matrix on the text feature matrix, a second attention-enhanced feature matrix including the influence weight of the physiological data related feature matrix on the picture feature matrix, a third attention-enhanced feature matrix including the influence weight of the text feature matrix on the physiological data related feature matrix, a fourth attention-enhanced feature matrix including the influence weight of the text feature matrix on the picture feature matrix, a fifth attention-enhanced feature matrix including the influence weight of the picture feature matrix on the physiological data related feature matrix, and a sixth attention-enhanced feature matrix including the influence weight of the picture feature matrix on the text feature matrix; acquiring a text fusion feature matrix, a picture fusion feature matrix and a physiological data fusion feature matrix through a feedforward fully-connected neural network, based on the first to sixth attention-enhanced feature matrices; acquiring the text, picture and physiological data feature values through a feedforward fully-connected neural network, based on the physiological data related feature matrix, the text feature matrix and the picture feature matrix; acquiring importance weight values of the text, the picture and the physiological data through vector splicing and an attention mechanism, based on the text, picture and physiological data feature values; acquiring a fusion representation matrix of the three modalities, based on the importance weight values of the text, the picture and the physiological data and on the text fusion feature matrix, the picture fusion feature matrix and the physiological data fusion feature matrix; and acquiring a pressure classification vector reflecting the psychological pressure problem, based on the fusion representation matrix of the three modalities and a feedforward fully-connected network.
Based on the same inventive concept, yet another embodiment of the present invention provides a non-transitory computer-readable storage medium having stored thereon a computer program, which when executed by a processor implements the above-mentioned attention weight correspondence method for feature interactive fusion of two modality data, and/or all steps of a multimodal fusion method for psychological stress detection, for example, the processor implements the following processes when executing the computer program:
based on the feature matrix of the two modal data, acquiring an incidence relation matrix reflecting information incidence between different features of the two modal data by utilizing matrix multiplication; acquiring an influence weight matrix of the characteristic matrix of one modal data to the characteristic matrix of the other modal data based on the incidence relation matrix and the feedforward full-connection network model; and acquiring an attention-strengthening feature matrix containing the mutual influence weight of the feature matrices of the two modal data by utilizing matrix dot multiplication and residual connection based on the influence weight matrix and the feature matrices of the two modal data.
As another example, the processor, when executing the computer program, implements the following:
respectively acquiring a physiological data related feature matrix reflecting the physiological state of the user, and a text feature matrix and a picture feature matrix reflecting the psychological activity state of the user; based on the physiological data related feature matrix, the text feature matrix and the picture feature matrix, obtaining, by using the attention weight correspondence method, a first attention-enhanced feature matrix including the influence weight of the physiological data related feature matrix on the text feature matrix, a second attention-enhanced feature matrix including the influence weight of the physiological data related feature matrix on the picture feature matrix, a third attention-enhanced feature matrix including the influence weight of the text feature matrix on the physiological data related feature matrix, a fourth attention-enhanced feature matrix including the influence weight of the text feature matrix on the picture feature matrix, a fifth attention-enhanced feature matrix including the influence weight of the picture feature matrix on the physiological data related feature matrix, and a sixth attention-enhanced feature matrix including the influence weight of the picture feature matrix on the text feature matrix; acquiring a text fusion feature matrix, a picture fusion feature matrix and a physiological data fusion feature matrix through a feedforward fully-connected neural network, based on the first to sixth attention-enhanced feature matrices; acquiring the text, picture and physiological data feature values through a feedforward fully-connected neural network, based on the physiological data related feature matrix, the text feature matrix and the picture feature matrix; acquiring importance weight values of the text, the picture and the physiological data through vector splicing and an attention mechanism, based on the text, picture and physiological data feature values; acquiring a fusion representation matrix of the three modalities, based on the importance weight values of the text, the picture and the physiological data and on the text fusion feature matrix, the picture fusion feature matrix and the physiological data fusion feature matrix; and acquiring a pressure classification vector reflecting the psychological pressure problem, based on the fusion representation matrix of the three modalities and a feedforward fully-connected network.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the above technical solutions may be essentially or partially implemented in the form of software products, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the multi-modal fusion method for mental stress detection according to the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (12)

1. A multi-modal fusion method for psychological stress detection based on an attention weight correspondence method for feature interactive fusion of two modal data,
the attention weight corresponding method for carrying out feature interactive fusion on two modal data comprises the following steps:
based on the feature matrix of the two modal data, acquiring an incidence relation matrix reflecting information incidence between different features of the two modal data by utilizing matrix multiplication;
acquiring an influence weight matrix of the characteristic matrix of one modal data to the characteristic matrix of the other modal data based on the incidence relation matrix and the feedforward full-connection network model;
based on the influence weight matrix and the feature matrices of the two modal data, acquiring an attention-enhanced feature matrix containing the mutual influence weights of the feature matrices of the two modal data by utilizing matrix dot multiplication and residual connection;
accordingly, the multi-modal fusion method for psychological stress detection comprises the following steps:
respectively acquiring a physiological data related feature matrix reflecting the physiological state of the user, and a text feature matrix and a picture feature matrix reflecting the psychological activity state of the user;
based on the physiological data related feature matrix, the text feature matrix and the picture feature matrix, obtaining, by using the attention weight correspondence method, a first attention-enhanced feature matrix including the influence weight of the physiological data related feature matrix on the text feature matrix, a second attention-enhanced feature matrix including the influence weight of the physiological data related feature matrix on the picture feature matrix, a third attention-enhanced feature matrix including the influence weight of the text feature matrix on the physiological data related feature matrix, a fourth attention-enhanced feature matrix including the influence weight of the text feature matrix on the picture feature matrix, a fifth attention-enhanced feature matrix including the influence weight of the picture feature matrix on the physiological data related feature matrix, and a sixth attention-enhanced feature matrix including the influence weight of the picture feature matrix on the text feature matrix;
acquiring a text fusion feature matrix, a picture fusion feature matrix and a physiological data fusion feature matrix based on the first attention enhancement feature matrix, the second attention enhancement feature matrix, the third attention enhancement feature matrix, the fourth attention enhancement feature matrix, the fifth attention enhancement feature matrix and the sixth attention enhancement feature matrix and based on a feedforward full-connection neural network;
based on the physiological data related feature matrix, the text feature matrix and the picture feature matrix, obtaining a text, a picture and a physiological data feature value based on a feedforward full-connection neural network;
based on the text, the picture and the physiological data characteristic value, acquiring importance weight values of the text, the picture and the physiological data based on vector splicing and attention mechanism;
acquiring fusion expression matrixes of three modes based on the importance weight values of the text, the picture and the physiological data and the text fusion characteristic matrix, the picture fusion characteristic matrix and the physiological data fusion characteristic matrix;
and acquiring a pressure classification vector reflecting the psychological pressure problem based on the fusion expression matrix of the three modes and the feedforward full-connection network.
2. The multi-modal fusion method for mental stress detection according to claim 1, wherein the obtaining, based on the feature matrices of the two modal data, of an incidence relation matrix reflecting the information correlation degree between different features of the two modal data by using matrix multiplication specifically comprises:
acquiring the incidence relation matrix reflecting the information correlation degree between different features of the two modal data by using the following first relation model:
M = AB^T
wherein M ∈ ℝ^{k×k} represents the incidence relation matrix, A represents the feature matrix of data of one modality, B represents the feature matrix of data of the other modality, A, B ∈ ℝ^{k×1}, ℝ represents real space, k represents the feature dimension of the two modal data, and B^T represents the transpose of B; the feature matrix A is multiplied by the transpose of the feature matrix B by using matrix multiplication, obtaining the incidence relation matrix M, which contains the correlation between each feature in the feature matrix A and each feature in the feature matrix B.
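A minimal NumPy sketch of the first relation model (k = 4 and the random feature matrices are purely illustrative):

```python
import numpy as np

k = 4
rng = np.random.default_rng(4)
A = rng.standard_normal((k, 1))   # feature matrix of one modality
B = rng.standard_normal((k, 1))   # feature matrix of the other modality

# Matrix multiplication of A with the transpose of B gives a k x k incidence
# relation matrix; entry (i, j) couples feature i of A with feature j of B.
M = A @ B.T
```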
3. The multi-modal fusion method for mental stress detection according to claim 2, wherein the obtaining of the influence weight matrix of the feature matrix of one modal data on the feature matrix of the other modal data based on the incidence relation matrix and the feedforward full-connection network model specifically comprises:
acquiring the influence weight matrix of the feature matrix B on the feature matrix A by using the following second relation model:
A_{B→A} = softmax(MW_1)
and acquiring the influence weight matrix of the feature matrix A on the feature matrix B by using the following third relation model:
B_{A→B} = softmax(M^TW_2)
wherein A_{B→A} represents the influence weight matrix of the feature matrix B on the feature matrix A, B_{A→B} represents the influence weight matrix of the feature matrix A on the feature matrix B, A_{B→A}, B_{A→B} ∈ ℝ^{k×1}, softmax denotes the normalized exponential function, W_1 represents the first preset training parameter in the first class of training parameters, and W_2 represents the second preset training parameter in the first class of training parameters; through one layer of fully-connected network, the incidence relation matrix M ∈ ℝ^{k×k} is mapped back to ℝ^{k×1}, obtaining the influence weight matrix A_{B→A} of the feature matrix B on the feature matrix A and the influence weight matrix B_{A→B} of the feature matrix A on the feature matrix B.
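Continuing the sketch for the second and third relation models (one fully-connected layer maps M back to k×1; using M for one direction and its transpose for the other is an assumption consistent with the claim):

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

k = 4
rng = np.random.default_rng(5)
M = rng.standard_normal((k, k))    # incidence relation matrix from claim 2
W1 = rng.standard_normal((k, 1))   # first preset training parameter
W2 = rng.standard_normal((k, 1))   # second preset training parameter

A_BtoA = softmax(M @ W1)     # influence weight matrix of B's features on A
B_AtoB = softmax(M.T @ W2)   # influence weight matrix of A's features on B
```

Because of the softmax, each influence weight matrix is a k×1 probability distribution over features.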
4. The multi-modal fusion method for mental stress detection according to claim 3, wherein the obtaining of the attention-enhanced feature matrices containing the mutual influence weights of the feature matrices of the two modal data by using matrix dot multiplication and residual connection, based on the influence weight matrices and the feature matrices of the two modal data, specifically comprises:
obtaining the attention-enhanced feature matrix Ã by using the following fourth relation model:
Ã = A_{B→A} ⊙ A + A
and obtaining the attention-enhanced feature matrix B̃ by using the following fifth relation model:
B̃ = B_{A→B} ⊙ B + B
wherein ⊙ denotes the dot product operation; using the dot product operation, A_{B→A} is multiplied with A and, after residual connection, the attention-enhanced feature matrix Ã is obtained, which contains the information of B and the influence of B on A; using the dot product operation, B_{A→B} is multiplied with B and, after residual connection, the attention-enhanced feature matrix B̃ is obtained, which contains the information of A and the influence of A on B;
wherein (Ã, B̃) = f_{AMM}(A, B), f_{AMM} representing the process of obtaining the attention-enhanced feature matrices Ã and B̃ from the feature matrix A and the feature matrix B, specifically: processing the feature matrix A and the feature matrix B by using the first to fifth relation models to obtain the attention-enhanced feature matrices Ã and B̃.
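The first to fifth relation models together (f_{AMM}) can be sketched as a single helper (NumPy; shapes illustrative):

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def f_amm(A, B, W1, W2):
    """Attention weight correspondence for two modality feature matrices."""
    M = A @ B.T                       # claim 2: incidence relation matrix
    A_BtoA = softmax(M @ W1)          # claim 3: influence of B on A
    B_AtoB = softmax(M.T @ W2)        # claim 3: influence of A on B
    A_tilde = A_BtoA * A + A          # claim 4: dot multiplication + residual
    B_tilde = B_AtoB * B + B
    return A_tilde, B_tilde

k = 4
rng = np.random.default_rng(6)
A, B = rng.standard_normal((k, 1)), rng.standard_normal((k, 1))
W1, W2 = rng.standard_normal((k, 1)), rng.standard_normal((k, 1))
A_tilde, B_tilde = f_amm(A, B, W1, W2)
```

Since the influence weights lie in (0, 1), the residual connection scales each feature of A by a factor between 1 and 2, preserving its sign.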
5. The multi-modal fusion method for detecting psychological stress according to claim 1, wherein the obtaining of the physiological data related feature matrix reflecting the physiological status of the user and the text feature matrix and the picture feature matrix reflecting the psychological activity status of the user respectively comprises:
acquiring a text feature matrix reflecting the psychological activity state of the user by using the following sixth processing model:

H = h_f + h_b

Attn_T = softmax(H·W_3 + b_1)

H~ = Attn_T ⊗ H + H

F_T = ReLU(W_4·H~ + b_2)

wherein F_T represents the text feature matrix, H represents the text representation matrix, H~ represents the text representation matrix readjusted by the weight distribution, and Attn_T represents the contribution-degree distribution weight vector of the text representation matrix H; the text representation X = {x_1, x_2, …, x_n} enters a long short-term memory network (LSTM) layer as input, and the forward LSTM and the backward LSTM respectively produce the two hidden-layer outputs h_f and h_b; the hidden-layer outputs at corresponding positions are added to obtain the text representation matrix H; an attention mechanism is applied to obtain the contribution-degree distribution weight vector Attn_T of the text representation matrix H: Attn_T = softmax(H·W_3 + b_1), where Attn_T represents the distribution of contribution weights over the representation of each word; Attn_T is multiplied element-wise by H and connected through a residual to obtain the text representation matrix readjusted by the weight distribution: H~ = Attn_T ⊗ H + H; through one layer of fully connected network, H~ is mapped to a k×1 vector space to obtain the text feature matrix F_T = ReLU(W_4·H~ + b_2);

wherein W_3 represents a third preset training parameter of the first class of training parameters, W_4 represents a fourth preset training parameter of the first class of training parameters, b_1 represents a first preset training parameter of the second class of training parameters, b_2 represents a second preset training parameter of the second class of training parameters, ReLU represents the activation function, and softmax represents the normalized exponential function; the text is represented as X = {x_1, x_2, …, x_n}, where x_i is a vector representing the meaning of a word and n represents the number of words contained in the text;
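The sixth processing model above can be sketched in a few lines of numpy. This is a minimal illustration only: the BiLSTM step is replaced by random hidden states, and all shapes, seeds, and parameter values are assumptions chosen for demonstration, not values from the patent.

```python
import numpy as np

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
n, d, k = 5, 8, 4                          # words, hidden size, output size (assumed)

h_fw = rng.normal(size=(n, d))             # stand-in for forward-LSTM hidden states
h_bw = rng.normal(size=(n, d))             # stand-in for backward-LSTM hidden states
H = h_fw + h_bw                            # text representation matrix H

W3 = rng.normal(size=(d, 1)); b1 = 0.0
attn_T = softmax(H @ W3 + b1, axis=0)      # one contribution weight per word
H_tilde = attn_T * H + H                   # element-wise reweight + residual

W4 = rng.normal(size=(k, n * d)); b2 = np.zeros(k)
F_T = relu(W4 @ H_tilde.reshape(-1) + b2)  # text feature matrix mapped to k x 1
```

The flattening before the final fully connected layer is one possible reading of "mapping to a k×1 vector space"; the claim does not pin down the intermediate shapes.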
and acquiring a picture feature matrix reflecting the psychological activity state of the user by using the following seventh processing model:

F_V = ReLU(W_5·C + b_3)

wherein F_V represents the picture feature matrix and C represents the picture feature; a fully connected layer maps the dimension of the picture feature C to an n×1 vector space to obtain the picture feature matrix F_V, where W_5 represents a fifth preset training parameter of the first class of training parameters and b_3 represents a third preset training parameter of the second class of training parameters;
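The seventh processing model is a single fully connected layer. A minimal numpy sketch follows; the input picture feature and all sizes are illustrative assumptions (e.g. C standing in for a CNN encoder output).

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(1)
m, n = 16, 5                        # picture-feature size and target size (assumed)
C = rng.normal(size=m)              # picture feature, e.g. from a CNN encoder
W5 = rng.normal(size=(n, m)); b3 = np.zeros(n)
F_V = relu(W5 @ C + b3)             # picture feature matrix F_V, n x 1
```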
and acquiring a physiological-data-related feature matrix reflecting the physiological state of the user by using the following eighth processing model:

E = ReLU(W_7·(ReLU(W_6·E_S + b_4) + b_5))

Attn_E = softmax(W_8·E + b_6)

E~ = Attn_E ⊗ E + E

F_E = ReLU(W_9·E~ + b_7)

wherein F_E represents the physiological-data-related feature matrix; E_S represents the physiologically relevant data feature matrix, which contains a plurality of preset physiological features; E represents the physiological-data representation matrix obtained by passing E_S through a two-layer fully connected network; Attn_E represents the contribution-degree distribution weight vector of the physiological-data representation matrix E; and E~ represents the physiological-data representation matrix readjusted by the weight distribution. E_S is passed through the two-layer fully connected network to obtain E = ReLU(W_7·(ReLU(W_6·E_S + b_4) + b_5)); an attention mechanism is applied to obtain the contribution-degree distribution weight vector Attn_E of the physiological-data representation matrix E: Attn_E = softmax(W_8·E + b_6); Attn_E is multiplied element-wise by E and connected through a residual to obtain the readjusted physiological-data representation matrix E~ = Attn_E ⊗ E + E; through one layer of fully connected network, E~ is mapped to a k×1 vector space to obtain the physiological-data-related feature matrix F_E = ReLU(W_9·E~ + b_7);

wherein Attn_E represents the distribution of contribution weights over each physiological feature representation; W_6, W_7, W_8 and W_9 represent the sixth to ninth preset training parameters of the first class of training parameters, and b_4, b_5, b_6 and b_7 represent the fourth to seventh preset training parameters of the second class of training parameters.
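The eighth processing model (two-layer encoder, attention reweighting with a residual, then one fully connected layer) can be sketched as below. All shapes and values are illustrative assumptions; note that the claim places b_5 inside the outer product with W_7, and the sketch follows that literally.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(2)
p, d, k = 6, 8, 4                    # physiological features, hidden size, output size

E_S = rng.normal(size=p)             # preset physiological features
W6 = rng.normal(size=(d, p)); b4 = np.zeros(d)
W7 = rng.normal(size=(d, d)); b5 = np.zeros(d)
E = relu(W7 @ (relu(W6 @ E_S + b4) + b5))   # two-layer FC representation E

W8 = rng.normal(size=(d, d)); b6 = np.zeros(d)
attn_E = softmax(W8 @ E + b6)               # contribution weight per dimension
E_tilde = attn_E * E + E                    # reweight + residual

W9 = rng.normal(size=(k, d)); b7 = np.zeros(k)
F_E = relu(W9 @ E_tilde + b7)               # physiological feature matrix, k x 1
```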
6. The multi-modal fusion method for psychological stress detection as set forth in claim 5, wherein the obtaining, by using the attention weight correspondence method, of a first attention-strengthened feature matrix containing the influence weight of the physiological-data-related feature matrix on the text feature matrix, a second attention-strengthened feature matrix containing the influence weight of the physiological-data-related feature matrix on the picture feature matrix, a third attention-strengthened feature matrix containing the influence weight of the text feature matrix on the physiological-data-related feature matrix, a fourth attention-strengthened feature matrix containing the influence weight of the text feature matrix on the picture feature matrix, a fifth attention-strengthened feature matrix containing the influence weight of the picture feature matrix on the physiological-data-related feature matrix, and a sixth attention-strengthened feature matrix containing the influence weight of the picture feature matrix on the text feature matrix specifically comprises:
based on the physiological-data-related feature matrix and the text feature matrix, applying the attention weight correspondence method for feature interactive fusion of two modal data to obtain a first attention-strengthened feature matrix containing the influence weight of the physiological-data-related feature matrix on the text feature matrix; the first attention-strengthened feature matrix is the physiological-data→text attention-strengthened feature matrix, denoted F_{E→T};

based on the physiological-data-related feature matrix and the picture feature matrix, applying the attention weight correspondence method for feature interactive fusion of two modal data to obtain a second attention-strengthened feature matrix containing the influence weight of the physiological-data-related feature matrix on the picture feature matrix; the second attention-strengthened feature matrix is the physiological-data→picture attention-strengthened feature matrix, denoted F_{E→V};

based on the physiological-data-related feature matrix and the text feature matrix, applying the attention weight correspondence method for feature interactive fusion of two modal data to obtain a third attention-strengthened feature matrix containing the influence weight of the text feature matrix on the physiological-data-related feature matrix; the third attention-strengthened feature matrix is the text→physiological-data attention-strengthened feature matrix, denoted F_{T→E};

based on the text feature matrix and the picture feature matrix, applying the attention weight correspondence method for feature interactive fusion of two modal data to obtain a fourth attention-strengthened feature matrix containing the influence weight of the text feature matrix on the picture feature matrix; the fourth attention-strengthened feature matrix is the text→picture attention-strengthened feature matrix, denoted F_{T→V};

based on the physiological-data-related feature matrix and the picture feature matrix, applying the attention weight correspondence method for feature interactive fusion of two modal data to obtain a fifth attention-strengthened feature matrix containing the influence weight of the picture feature matrix on the physiological-data-related feature matrix; the fifth attention-strengthened feature matrix is the picture→physiological-data attention-strengthened feature matrix, denoted F_{V→E};

and based on the text feature matrix and the picture feature matrix, applying the attention weight correspondence method for feature interactive fusion of two modal data to obtain a sixth attention-strengthened feature matrix containing the influence weight of the picture feature matrix on the text feature matrix; the sixth attention-strengthened feature matrix is the picture→text attention-strengthened feature matrix, denoted F_{V→T}.
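One such cross-modal strengthening step can be sketched following the three modules the apparatus claim describes for the attention weight correspondence method: an association matrix by matrix multiplication, a fully connected layer producing influence weights, then multiplication with a residual connection. The exact formulation is defined in earlier claims not shown here, so every shape, the softmax placement, and the residual form below are assumptions.

```python
import numpy as np

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(3)
k = 4
F_E = rng.normal(size=(k, 1))        # physiological-data-related feature matrix
F_T = rng.normal(size=(k, 1))        # text feature matrix

assoc = F_E @ F_T.T                  # association relation matrix (k x k)
W = rng.normal(size=(k, k)); b = 0.0
influence = softmax(W @ assoc + b, axis=0)   # influence weight matrix (assumed FC)
F_ET = influence @ F_T + F_T                 # weighted combination + residual
```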
7. The multi-modal fusion method for mental stress detection according to claim 6, wherein the obtaining of the text fusion feature matrix, the picture fusion feature matrix and the physiological data fusion feature matrix through a feedforward fully connected neural network, based on the first to sixth attention-strengthened feature matrices, specifically comprises:

acquiring the text fusion feature matrix F~_T through one layer of fully connected network, based on the first attention-strengthened feature matrix F_{E→T} and the sixth attention-strengthened feature matrix F_{V→T}, by using the following ninth processing model:

F~_T = ReLU(W_10·F_{E→T} + W_11·F_{V→T} + b_8)

acquiring the picture fusion feature matrix F~_V through one layer of fully connected network, based on the second attention-strengthened feature matrix F_{E→V} and the fourth attention-strengthened feature matrix F_{T→V}, by using the following tenth processing model:

F~_V = ReLU(W_12·F_{E→V} + W_13·F_{T→V} + b_9)

acquiring the physiological data fusion feature matrix F~_E through one layer of fully connected network, based on the third attention-strengthened feature matrix F_{T→E} and the fifth attention-strengthened feature matrix F_{V→E}, by using the following eleventh processing model:

F~_E = ReLU(W_14·F_{T→E} + W_15·F_{V→E} + b_10)

wherein W_10 to W_15 represent the tenth to fifteenth preset training parameters of the first class of training parameters, and b_8 to b_10 represent the eighth to tenth preset training parameters of the second class of training parameters.
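A sketch of one of these fusion layers, assuming the summed linear combination implied by the parameter list (two weight matrices and one bias per model); shapes are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(4)
k = 4
F_ET = rng.normal(size=(k, 1))     # physiological->text strengthened matrix
F_VT = rng.normal(size=(k, 1))     # picture->text strengthened matrix
W10 = rng.normal(size=(k, k)); W11 = rng.normal(size=(k, k)); b8 = np.zeros((k, 1))
F_T_fused = relu(W10 @ F_ET + W11 @ F_VT + b8)   # text fusion feature matrix
```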
8. The multi-modal fusion method for mental stress detection according to claim 7, wherein the obtaining of the text, picture and physiological data feature values through a feedforward fully connected neural network, based on the physiological-data-related feature matrix, the text feature matrix and the picture feature matrix, specifically comprises:

acquiring the feature values of the text, the picture and the physiological data by using the following twelfth processing model:

S_T = ReLU(W_16·softmax(F_T) + b_11)

S_V = ReLU(W_17·softmax(F_V) + b_12)

S_E = ReLU(W_18·softmax(F_E) + b_13)

wherein the softmax function maps the physiological-data-related feature matrix F_E, the text feature matrix F_T and the picture feature matrix F_V into the interval (0, 1), and one fully connected layer then yields the feature values S_T, S_V and S_E of the text, the picture and the physiological data; W_16 to W_18 represent the sixteenth to eighteenth preset training parameters of the first class of training parameters, and b_11 to b_13 represent the eleventh to thirteenth preset training parameters of the second class of training parameters.
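The twelfth processing model reduces each modality's feature matrix to a scalar feature value. A minimal sketch for the text branch, with assumed sizes:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(5)
k = 4
F_T = rng.normal(size=k)                 # text feature matrix
W16 = rng.normal(size=(1, k)); b11 = 0.0
S_T = relu(W16 @ softmax(F_T) + b11)     # scalar feature value for the text modality
```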
9. The multi-modal fusion method for mental stress detection according to claim 8, wherein the obtaining of the importance weight values of the text, the picture and the physiological data through vector splicing and an attention mechanism, based on the text, picture and physiological data feature values, specifically comprises:

splicing the text, picture and physiological data feature values S_T, S_V and S_E together and, through an attention mechanism, obtaining the importance weight values weight_T, weight_V and weight_E of the text, the picture and the physiological data by using the following thirteenth processing model:

(weight_T, weight_V, weight_E) = softmax([S_T, S_V, S_E]·W_19)

wherein W_19 represents a nineteenth preset training parameter of the first class of training parameters.
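The thirteenth processing model can be sketched as follows; the example feature values and the 3×3 shape of W_19 are assumptions (the claim only requires the softmax over the spliced vector to yield three weights).

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(6)
S = np.array([0.7, 0.2, 0.5])      # example feature values [S_T, S_V, S_E]
W19 = rng.normal(size=(3, 3))      # assumed attention parameter shape
weight_T, weight_V, weight_E = softmax(S @ W19)
```

By construction the three importance weights are positive and sum to one, so they can directly scale the per-modality fusion matrices.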
10. The multi-modal fusion method for mental stress detection according to claim 9, wherein the obtaining of a fusion representation matrix of the three modalities, based on the importance weight values of the text, the picture and the physiological data and on the text fusion feature matrix, the picture fusion feature matrix and the physiological data fusion feature matrix, specifically comprises:

multiplying the importance weight values weight_T, weight_V and weight_E of the text, the picture and the physiological data correspondingly by the text fusion feature matrix F~_T, the picture fusion feature matrix F~_V and the physiological data fusion feature matrix F~_E, and adding the results to obtain the fusion representation matrix R_W of the three modalities, by using the following fourteenth processing model:

R_W = weight_T·W_20·F~_T + weight_V·W_21·F~_V + weight_E·W_22·F~_E

wherein W_20 to W_22 represent the twentieth to twenty-second preset training parameters of the first class of training parameters.
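The fourteenth processing model is a weighted sum of the three fusion feature matrices. A minimal sketch, with assumed sizes and example importance weights:

```python
import numpy as np

rng = np.random.default_rng(7)
k = 4
wT, wV, wE = 0.5, 0.2, 0.3              # importance weight values (example)
FT_f = rng.normal(size=(k, 1))          # text fusion feature matrix
FV_f = rng.normal(size=(k, 1))          # picture fusion feature matrix
FE_f = rng.normal(size=(k, 1))          # physiological fusion feature matrix
W20 = rng.normal(size=(k, k))
W21 = rng.normal(size=(k, k))
W22 = rng.normal(size=(k, k))
# corresponding multiplication and addition across the three modalities
R_W = wT * (W20 @ FT_f) + wV * (W21 @ FV_f) + wE * (W22 @ FE_f)
```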
11. A multi-modal fusion apparatus for mental stress detection based on an attention weight correspondence apparatus for performing feature interactive fusion on two modal data, wherein the attention weight correspondence apparatus for performing feature interactive fusion on two modal data comprises:
the first acquisition module is used for acquiring an incidence relation matrix reflecting information incidence between different characteristics of the two modal data by utilizing matrix multiplication based on the characteristic matrix of the two modal data;
the second acquisition module is used for acquiring an influence weight matrix of the characteristic matrix of one modal data to the characteristic matrix of the other modal data based on the incidence relation matrix and the feedforward full-connection network model;
a third obtaining module, configured to obtain an attention-strengthened feature matrix containing the mutual influence weights of the feature matrices of the two modal data by using matrix dot multiplication and residual connection, based on the influence weight matrix and the feature matrices of the two modal data;

accordingly, the multi-modal fusion device for psychological stress detection comprises:
the fourth acquisition module is used for respectively acquiring a physiological data related characteristic matrix reflecting the physiological state of the user and a text characteristic matrix and an image characteristic matrix reflecting the psychological activity state of the user;
a fifth obtaining module, configured to obtain, by using the attention weight correspondence method and based on the physiological-data-related feature matrix, the text feature matrix and the picture feature matrix, a first attention-strengthened feature matrix containing the influence weight of the physiological-data-related feature matrix on the text feature matrix, a second attention-strengthened feature matrix containing the influence weight of the physiological-data-related feature matrix on the picture feature matrix, a third attention-strengthened feature matrix containing the influence weight of the text feature matrix on the physiological-data-related feature matrix, a fourth attention-strengthened feature matrix containing the influence weight of the text feature matrix on the picture feature matrix, a fifth attention-strengthened feature matrix containing the influence weight of the picture feature matrix on the physiological-data-related feature matrix, and a sixth attention-strengthened feature matrix containing the influence weight of the picture feature matrix on the text feature matrix;
a sixth obtaining module, configured to obtain a text fusion feature matrix, an image fusion feature matrix, and a physiological data fusion feature matrix based on a feedforward fully-connected neural network based on the first attention-enhanced feature matrix, the second attention-enhanced feature matrix, the third attention-enhanced feature matrix, the fourth attention-enhanced feature matrix, the fifth attention-enhanced feature matrix, and the sixth attention-enhanced feature matrix;
a seventh obtaining module, configured to obtain a text, a picture, and a physiological data feature value based on the physiological data related feature matrix, the text feature matrix, and the picture feature matrix, and based on a feedforward fully-connected neural network;
the eighth acquiring module is used for acquiring importance weight values of the text, the picture and the physiological data based on the text, the picture and the physiological data characteristic values and based on a vector splicing and attention mechanism;
a ninth obtaining module, configured to obtain a fusion expression matrix of three modalities based on the importance weight values of the text, the picture and the physiological data, and the text fusion feature matrix, the picture fusion feature matrix and the physiological data fusion feature matrix;
and the tenth acquisition module is used for acquiring a pressure classification vector reflecting the psychological pressure problem based on the fusion expression matrix of the three modes and the feedforward full-connection network.
12. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the multimodal fusion method for mental stress detection as claimed in any one of claims 1 to 10.
CN201910567398.XA 2019-06-27 2019-06-27 Multi-mode fusion method and device for psychological pressure detection Active CN110301920B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910567398.XA CN110301920B (en) 2019-06-27 2019-06-27 Multi-mode fusion method and device for psychological pressure detection


Publications (2)

Publication Number Publication Date
CN110301920A CN110301920A (en) 2019-10-08
CN110301920B true CN110301920B (en) 2020-06-02

Family

ID=68076687


Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113837390A (en) * 2020-06-23 2021-12-24 华为技术有限公司 Modal information completion method, device and equipment
CN112155577B (en) * 2020-10-15 2023-05-05 深圳大学 Social pressure detection method and device, computer equipment and storage medium
CN112861945B (en) * 2021-01-28 2022-05-13 清华大学 Multi-mode fusion lie detection method
CN112998652B (en) * 2021-02-23 2022-07-19 华南理工大学 Photoelectric volume pulse wave pressure identification method and system
CN113241178B (en) * 2021-05-28 2023-06-27 温州康宁医院股份有限公司 Device for determining severity of depression of tested person
CN113704502B (en) * 2021-08-27 2023-04-21 电子科技大学 Multi-mode information fusion account number position identification method based on social media
CN113940638B (en) * 2021-10-22 2023-09-19 上海理工大学 Pulse wave signal identification and classification method based on frequency domain dual-feature fusion
CN114201041B (en) * 2021-11-09 2024-01-26 北京电子工程总体研究所 Man-machine interaction command method and device based on brain-computer interface

Citations (4)

Publication number Priority date Publication date Assignee Title
CN103126690A (en) * 2013-01-28 2013-06-05 周万荣 Human emotion recognition and control method, device and system based on applications
CN103838836A (en) * 2014-02-25 2014-06-04 中国科学院自动化研究所 Multi-modal data fusion method and system based on discriminant multi-modal deep confidence network
CN106250855A (en) * 2016-08-02 2016-12-21 南京邮电大学 A kind of multi-modal emotion identification method based on Multiple Kernel Learning
CN109801706A (en) * 2018-12-12 2019-05-24 清华大学 The cognitive method and device of psychological pressure problem

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US10105089B2 (en) * 2014-06-18 2018-10-23 Hong Kong Applied Science And Technology Research Institute Co., Ltd. Systems and methods for blood pressure measurement with psychological status validation
JP6986680B2 (en) * 2016-08-29 2021-12-22 パナソニックIpマネジメント株式会社 Stress management system and stress management method


Non-Patent Citations (2)

Title
Multimodal information fusion based on broad learning; Jia Chen et al.; CAAI Transactions on Intelligent Systems; 2019-01-31; Vol. 14, No. 1, pp. 150-157 *
A survey of multimodal deep learning; Liu Jianwei et al.; Application Research of Computers; 2019-04-26; Vol. 37, No. 6, pp. 2-19 *


Similar Documents

Publication Publication Date Title
CN110301920B (en) Multi-mode fusion method and device for psychological pressure detection
Jiang et al. Probing the visual representation of faces with adaptation: A view from the other side of the mean
US11113890B2 (en) Artificial intelligence enabled mixed reality system and method
Barata et al. Internet of Things based on electronic and mobile health systems for blood glucose continuous monitoring and management
CN109801706B (en) Psychological stress problem sensing method and device
KR102239163B1 (en) Method and apparatus for predicting the presence or absence of diseases using artificial neural networks
CN114386528B (en) Model training method and device, computer equipment and storage medium
Huttunen et al. Assessment of obstructive sleep apnea-related sleep fragmentation utilizing deep learning-based sleep staging from photoplethysmography
WO2019128515A1 (en) Method, device, and equipment for information alert
CN108888277A (en) Psychological test method, system and terminal device
AU2021206060A1 (en) Dynamic user response data collection method
JP2022523631A (en) Heart rate measurement system
Gerger et al. It felt fluent but I did not like it: Fluency effects in faces versus patterns
Biancardi et al. A computational model for managing impressions of an embodied conversational agent in real-time
EP3856012B1 (en) Visualized virtual agent
CN116807476A (en) Multi-mode psychological health assessment system and method based on interface type emotion interaction
JP2008242534A (en) Healing system, server device, information processor and program
US20220284649A1 (en) Virtual Representation with Dynamic and Realistic Behavioral and Emotional Responses
McTear et al. Affective conversational interfaces
CN114424934A (en) Apnea event screening model training method and device and computer equipment
Sengupta Stress Detection: A Predictive Analysis
CN113748441A (en) Virtual agent team
KR102427093B1 (en) A method of creating a blending ratio of protein donuts considering the user's constitution
Das et al. A deep cnn framework for distress detection using facial expression
Younis et al. Machine learning for human emotion recognition: a comprehensive review

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant