CN115564027A - Multi-modal learning behavior analysis method, system and storage medium - Google Patents

Multi-modal learning behavior analysis method, system and storage medium

Info

Publication number
CN115564027A
CN115564027A CN202211323486.3A
Authority
CN
China
Prior art keywords
modal
graph
representation
decoupling
learning behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211323486.3A
Other languages
Chinese (zh)
Inventor
梅晓勇
周友根
黄昌勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Normal University CJNU
Original Assignee
Zhejiang Normal University CJNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Normal University CJNU filed Critical Zhejiang Normal University CJNU
Priority to CN202211323486.3A priority Critical patent/CN115564027A/en
Publication of CN115564027A publication Critical patent/CN115564027A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-modal learning behavior analysis method, system and storage medium, applied to the technical field of artificial intelligence, which can realize high-performance and highly interpretable multi-modal learning behavior analysis and provide reliable analysis and explanation for the education decision-making process. The method comprises the following steps: obtaining multi-modal behavior data of an object to be analyzed in the learning process and preprocessing it to obtain multi-modal sequence data; performing collaborative embedding representation on the multi-modal sequence data to obtain an initial feature representation; constructing a learning behavior data association graph from the initial feature representation; decoupling the learning behavior data association graph through a graph decoupling neural network to obtain a multi-modal decoupling graph; constructing an attribute routing mechanism from the relationships among the nodes of the multi-modal decoupling graph; updating the node embedded representation and the graph structure through the attribute routing mechanism to obtain a target feature representation and a target graph structure; and performing learning behavior analysis according to the target feature representation and visualizing the target graph structure to obtain a visualized learning behavior analysis result.

Description

Multi-modal learning behavior analysis method, system and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a multi-modal learning behavior analysis method, a multi-modal learning behavior analysis system, and a storage medium.
Background
With the surge of educational big data and the rise of artificial intelligence technology, integrating artificial intelligence with education has become an important means in the practice of intelligent education. Intelligent education is also shifting from online delivery to practical application scenarios such as precision teaching assistance based on the analysis of students' whole-process behavior data. In the related art, as student learning behavior data grows, high-dimensional features become difficult for ordinary people to understand and lose their practical semantic association with the metadata, so no good method exists that provides reliable analysis and explanation for the education decision-making process while achieving both high performance and high interpretability.
Disclosure of Invention
In order to solve at least one of the above technical problems, the present invention provides a method, a system and a storage medium for multi-modal learning behavior analysis, which can implement multi-modal learning behavior analysis with high performance and high interpretability, and provide reliable analysis and interpretation for an education decision process.
In one aspect, an embodiment of the present invention provides a multi-modal learning behavior analysis method, including the following steps:
acquiring multi-modal behavior data of an object to be analyzed in a learning process;
preprocessing the multi-modal behavior data to obtain corresponding multi-modal sequence data;
performing collaborative embedding representation according to the multi-modal sequence data to obtain initial feature representation;
constructing a learning behavior data association graph according to the initial feature representation;
decoupling the learning behavior data association graph through a graph decoupling neural network to obtain a multi-modal decoupling graph;
constructing an attribute routing mechanism according to the relationships among the nodes in the multi-modal decoupling graph;
iteratively updating the node embedded representation and the graph structure of the multi-modal decoupling graph through the attribute routing mechanism to obtain a target feature representation and a target graph structure;
and performing learning behavior analysis according to the target feature representation and visualizing the target graph structure to obtain a visualized learning behavior analysis result.
The multi-modal learning behavior analysis method provided by the embodiment of the invention has at least the following beneficial effects: in this embodiment, multi-modal behavior data of the object to be analyzed in the learning process is first acquired and preprocessed to obtain the corresponding multi-modal sequence data, which facilitates deep feature learning. Then, this embodiment performs collaborative embedding representation on the multi-modal sequence data to obtain an initial feature representation, mapping the information of multiple modalities into the same embedding space through collaborative embedding and thereby enhancing the complementarity between modalities. Next, a learning behavior data association graph is constructed from the obtained initial feature representation, so that the representation capability of the data is improved. Meanwhile, the learning behavior association graph is decoupled through the graph decoupling neural network to obtain a multi-modal decoupling graph. In addition, in this embodiment, an attribute routing mechanism is constructed from the relationships between the nodes in the multi-modal decoupling graph, and the node embedded representation and graph structure of the multi-modal decoupling graph are iteratively updated through the attribute routing mechanism to obtain a target feature representation and a target graph structure, so that the transfer of information in the model is effectively controlled through the constructed attribute routing mechanism, improving the accuracy and transparency of the learning behavior analysis result.
Further, the embodiment performs learning behavior analysis according to the target feature representation and visualizes the target graph structure, thereby obtaining a visualized learning behavior analysis result, implementing high-performance and high-interpretability multi-modal learning behavior analysis, and providing reliable analysis explanation for the education decision process.
According to some embodiments of the invention, the preprocessing the multi-modal behavior data to obtain corresponding multi-modal sequence data comprises:
performing corresponding data cleaning according to the data form of the multi-modal behavior data to obtain multi-modal cleaning data;
and carrying out data serialization on the multi-modal cleaning data to obtain the multi-modal sequence data.
According to some embodiments of the invention, the performing the collaborative embedding representation according to the multi-modal sequence data to obtain an initial feature representation comprises:
and carrying out the collaborative embedding representation on different modal sequence data in the multi-modal sequence data through a convolutional neural network and a long-short term memory network to obtain the initial feature representation.
According to some embodiments of the invention, the constructing a learning behavior data association graph from the initial feature representation comprises:
obtaining the association relationships between time slice nodes according to the initial feature representation; wherein a time slice node is a time slice in the multi-modal sequence data;
constructing an adjacency matrix according to the association relationships;
and constructing the learning behavior data association graph according to the adjacency matrix and the time slice nodes.
According to some embodiments of the present invention, the constructing an attribute routing mechanism according to the relationships between the nodes in the multi-modal decoupling graph includes:
calculating a first difference score matrix between the nodes of each modal decoupling graph in the multi-modal decoupling graph; wherein the first difference score matrix is a mask matrix for intra-modal information transfer;
and calculating a second difference score matrix for the same node across different modalities in the multi-modal decoupling graph; wherein the second difference score matrix is a mask matrix for cross-modal information transfer.
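The patent does not give the formula for the two difference score matrices; one plausible realization, sketched here under the assumption that a "difference score" is a pairwise embedding distance thresholded into a 0/1 gate, is shown below. The function names `intra_modal_mask` and `cross_modal_mask` and the threshold `tau` are illustrative, not from the patent.

```python
import numpy as np

def intra_modal_mask(H_k, tau=1.0):
    # First difference-score matrix: pairwise embedding distances between the
    # nodes of one modality's decoupled graph, thresholded into a 0/1 mask
    # that gates intra-modal message passing.
    diff = np.linalg.norm(H_k[:, None, :] - H_k[None, :, :], axis=-1)
    return (diff < tau).astype(float)

def cross_modal_mask(H_by_mod, tau=1.0):
    # Second difference-score matrix: for each node, distances between its
    # representations in different modalities, gating cross-modal transfer.
    mods = list(H_by_mod)
    n = next(iter(H_by_mod.values())).shape[0]
    m = len(mods)
    mask = np.zeros((n, m, m))
    for a in range(m):
        for b in range(m):
            d = np.linalg.norm(H_by_mod[mods[a]] - H_by_mod[mods[b]], axis=-1)
            mask[:, a, b] = (d < tau).astype(float)
    return mask
```

With this shape, `mask[i, a, b]` gates whether node i's block in modality a may receive information from its block in modality b.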
According to some embodiments of the present invention, the iteratively updating the node embedded representation and the graph structure of the multi-modal decoupling graph through the attribute routing mechanism to obtain a target feature representation and a target graph structure includes:
controlling information transfer in the time dimension of each modal decoupling graph in the multi-modal decoupling graph through the first difference score matrix to obtain a time-dimension representation update;
controlling information transfer in the modal dimension between the modal decoupling graphs in the multi-modal decoupling graph through the second difference score matrix to obtain a modal-dimension representation update;
updating the node embedded representation according to the time-dimension representation update and the modal-dimension representation update to obtain the target feature representation;
and updating the graph structure according to the target feature representation to obtain the target graph structure.
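A minimal sketch of the mask-gated update loop, assuming the gate is applied elementwise to the adjacency matrix and the refreshed graph structure is recomputed from cosine similarity of the updated embeddings; the function name `routed_update`, the residual mixing factor 0.5, and the step count are illustrative choices, not from the patent.

```python
import numpy as np

def routed_update(H, A, mask, steps=2):
    # Iteratively update node embeddings with mask-gated propagation: a
    # neighbor contributes only where the attribute-routing mask permits
    # information transfer; then the graph structure is refreshed from the
    # new embeddings.
    for _ in range(steps):
        gated = A * mask                           # elementwise routing gate
        deg = gated.sum(axis=1, keepdims=True) + 1e-9
        H = 0.5 * H + 0.5 * (gated @ H) / deg      # residual mean aggregation
    norm = H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-9)
    return H, norm @ norm.T                        # target features, target graph
```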
According to some embodiments of the present invention, the performing learning behavior analysis according to the target feature representation and visualizing the target graph structure to obtain a visualized learning behavior analysis result includes:
inputting the target feature representation into a preset graph decoupling learning behavior analysis model for learning behavior analysis to obtain a learning behavior analysis result;
visualizing the target graph structure according to the mask matrices to obtain a preset visualized image; wherein the preset visualized image comprises a heat map and a structure visualization map;
and obtaining the visualized learning behavior analysis result according to the learning behavior analysis result and the preset visualized image.
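Rendering the mask matrix as a heat map mostly reduces to normalizing its scores to a displayable range; a small sketch of that step (the helper name `mask_to_heatmap` is hypothetical, and any image library could consume the resulting grayscale array):

```python
import numpy as np

def mask_to_heatmap(mask):
    # Min-max normalize a mask/score matrix to 0-255 grayscale so it can be
    # rendered as a heat map by any plotting or image library.
    lo, hi = float(mask.min()), float(mask.max())
    scaled = (mask - lo) / (hi - lo + 1e-9)
    return (scaled * 255).astype(np.uint8)
```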
On the other hand, an embodiment of the present invention further provides a multi-modal learning behavior analysis system, including:
the data acquisition module is used for acquiring multi-modal behavior data of an object to be analyzed in the learning process;
the preprocessing module is used for preprocessing the multi-modal behavior data to obtain corresponding multi-modal sequence data;
the embedding representation module is used for performing collaborative embedding representation according to the multi-modal sequence data to obtain an initial feature representation;
the association graph construction module is used for constructing a learning behavior data association graph according to the initial feature representation;
the decoupling module is used for decoupling the learning behavior data association graph through a graph decoupling neural network to obtain a multi-modal decoupling graph;
the routing construction module is used for constructing an attribute routing mechanism according to the relationships among the nodes in the multi-modal decoupling graph;
the routing updating module is used for iteratively updating the node embedded representation and the graph structure of the multi-modal decoupling graph through the attribute routing mechanism to obtain a target feature representation and a target graph structure;
and the result analysis module is used for performing learning behavior analysis according to the target feature representation and visualizing the target graph structure to obtain a visualized learning behavior analysis result.
On the other hand, an embodiment of the present invention further provides a multi-modal learning behavior analysis system, including:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the multi-modal learning behavior analysis method described in the above embodiments.
In another aspect, the present invention further provides a computer storage medium, in which a program executable by a processor is stored, and when the program is executed by the processor, the program is used to implement the multi-modal learning behavior analysis method according to the above embodiments.
Drawings
FIG. 1 is a flow chart of a method for multi-modal learning behavior analysis according to an embodiment of the present invention;
fig. 2 is a schematic block diagram of a multi-modal learning behavior analysis system according to an embodiment of the present invention.
Detailed Description
The embodiments described herein should not be considered as limiting the present application; all other embodiments obtained by those skilled in the art without creative effort shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
With the surge of educational big data and the rise of artificial intelligence technology, the deep integration of artificial intelligence with education has become an important means in the practice of intelligent education. Intelligent education is also shifting from online delivery to practical application scenarios such as precision teaching assistance based on the analysis of students' whole-process behavior data. Students' individual behaviors are diverse and can, to a certain extent, reflect their learning styles and psychological and emotional characteristics. With a deep-learning-based behavior analysis model, behavior data covering the whole learning process, including multi-modal behavior data such as audio, video and facial expressions, can be mined to understand students' learning processes more comprehensively and efficiently, uncover their learning patterns, expand the depth and breadth of learning analytics research, and serve the learning process. At present, most multi-modal learning behavior analysis models perceive various kinds of behavior information through intelligent recognition technologies and collect multi-modal data for unified representation so as to judge learning conditions accurately. These methods are mostly combined with machine learning, modeling around the algorithmic model while neglecting the interpretability of the data. Moreover, as student learning behavior data grows, the high-dimensional features become difficult for ordinary people to understand and lose their practical semantic association with the metadata.
Models built on deep learning contain uncertainty; most of their results are only weakly associated with education and teaching principles and lack interpretability, which poses serious threats to actual teaching practice, especially to technical applications in result-sensitive tasks such as academic early warning and knowledge tracing.
One embodiment of the invention provides a multi-modal learning behavior analysis method, a multi-modal learning behavior analysis system and a storage medium, which can realize high-performance and high-interpretability multi-modal learning behavior analysis and provide reliable analysis and explanation for an education decision process. Referring to fig. 1, the method of the embodiment of the present invention includes, but is not limited to, step S110, step S120, step S130, step S140, step S150, step S160, step S170, and step S180.
Specifically, the method application process of the embodiment of the invention includes, but is not limited to, the following steps:
s110: and acquiring multi-modal behavior data of the object to be analyzed in the learning process.
S120: and preprocessing the multi-modal behavior data to obtain corresponding multi-modal sequence data.
S130: and performing collaborative embedding representation according to the multi-modal sequence data to obtain an initial feature representation.
S140: and constructing a learning behavior data association diagram according to the initial feature representation.
S150: and decoupling the learning behavior data association diagram through a graph decoupling neural network to obtain a multi-mode decoupling diagram.
S160: and constructing an attribute routing mechanism according to the relationship among all nodes in the multi-modal decoupling graph.
S170: and updating the node embedded representation and the graph structure of the iterative multi-modal decoupling graph through an attribute routing mechanism to obtain a target feature representation and a target graph structure.
S180: and performing learning behavior analysis according to the target feature representation and visualizing the structure of the target graph to obtain a visualized learning behavior analysis result.
During operation, this embodiment first obtains multi-modal behavior data of the learning process of the object to be analyzed. Data is the basis of model-driven analysis; compared with single-modal data, multi-modal data comes from diversified sources, allows data complementation, and effectively enhances the analysis effect. This embodiment enhances the complementarity between data by acquiring multi-modal behavior data of the object to be analyzed in the learning process. Illustratively, the multi-modal behavior data obtained in this embodiment includes physiological-level data, psychological-level data, behavioral-level data, and mixed-type data. For the physiological-level data a_t, this embodiment acquires neurobiological data such as students' eye movement frequency, brain waves, electrocardiogram and galvanic skin response through biological data acquisition technology; such data can reflect the emotion and physical health condition of students. Meanwhile, through Internet of Things sensing and wearable devices, this embodiment collects body sign data related to the concentration and activity of the object to be analyzed, such as body temperature, blood pressure and heart rate. For the psychological-level data b_t, unstructured data in the learning platform, such as comment data of the object to be analyzed, is acquired through web crawler technology; emotion information such as facial expressions is acquired through emotion recognition technology; and the speech content of the object to be analyzed is acquired through speech technologies such as automatic speech recognition.
Next, for the behavioral-level data c_t, this embodiment records various activities on campus through the campus one-card (smart card) system, such as the number of library visits, electronic resource retrieval, and book borrowing frequency. For the mixed-type data d_t, this embodiment applies log search technology to the data stored in the teaching management platform to acquire data such as learner click streams and tests, as well as the number of courseware views and interaction frequency. Further, this embodiment preprocesses each kind of multi-modal behavior data to obtain the corresponding multi-modal sequence data. Because the acquired data is multi-modal behavior data and the representations of different modalities differ, the behavior data of each modality must be preprocessed separately to obtain the corresponding multi-modal sequence data, which facilitates deep feature learning.
Further, this embodiment performs collaborative embedding representation on the multi-modal sequence data to obtain the initial feature representation. Because certain semantic barriers exist between the behavior data of different modalities, the preprocessed multi-modal sequence data is collaboratively embedded so that the information of multiple modalities is mapped into the same embedding space, breaking the semantic barriers between modalities and enhancing their complementarity. Next, this embodiment constructs a learning behavior data association graph from the initial feature representation. Compared with sequence data, graph-structured data retains not only temporal information but also structural information. Meanwhile, different education decision tasks require attribute information from different aspects of the data, such as learning style and learning time, and such attribute information can be reflected by giving decision explanations through different graph structures. This embodiment constructs the learning behavior data association graph from the obtained initial feature representation to enhance the representation capability of the data. Then, this embodiment decouples the learning behavior data association graph through a graph decoupling neural network to obtain a multi-modal decoupling graph. By performing graph decoupling on the learning behavior data association graph, this embodiment alleviates the problem of missing semantics in attention-based interpretation methods, associates the extracted abstract features with specific modal attributes, enriches and refines the node representation, reduces message loss in the subsequent feature extraction process, enhances the learning behavior analysis effect, and improves interpretability.
The multi-modal decoupling graph contains high-dimensional features with specific semantics. Further, this embodiment constructs an attribute routing mechanism according to the relationships between the nodes in the multi-modal decoupling graph: after the learning behavior data association graph is decoupled, each block of a node's embedded representation corresponds to a specific modal attribute, and an individual channel is provided for each such representation to guide the signal flow, so that valuable messages are extracted from the nodes of the learning behavior association graph for learning behavior analysis. This embodiment introduces the constructed attribute routing mechanism into the embedding propagation process to guide the signal flow and thereby extract information valuable for learning behavior analysis. Next, the node embedded representation and graph structure of the multi-modal decoupling graph are iteratively updated through the attribute routing mechanism to obtain the target feature representation and the target graph structure, so that the transfer of information in the model is effectively controlled and the accuracy and transparency of the learning behavior analysis result are improved. Further, this embodiment performs learning behavior analysis according to the target feature representation and visualizes the target graph structure, thereby obtaining a visualized learning behavior analysis result, realizing high-performance and highly interpretable multi-modal learning behavior analysis, and providing reliable analysis and explanation for the education decision process.
In some embodiments of the present invention, the multimodal behavior data is preprocessed to obtain corresponding multimodal sequence data, including but not limited to:
and carrying out corresponding data cleaning according to the data form of the multi-modal behavior data to obtain multi-modal cleaning data.
And carrying out data serialization on the multi-modal cleaning data to obtain multi-modal sequence data.
In this specific embodiment, the preprocessing of the multi-modal behavior data includes data cleaning and data serialization, performed separately for the behavior data of each modality. This embodiment performs data cleaning appropriate to the data form of the multi-modal behavior data to obtain multi-modal cleaned data, and then serializes the multi-modal cleaned data to obtain the multi-modal sequence data. Illustratively, for text data this embodiment removes stop words, redundant spaces and symbols, and performs transformation enhancement on word stems; the enhanced data is then turned into sequence data through tokenization, e.g., WordPiece tokenization for sentence-level text and SentencePiece tokenization for document-level text. For image data, this embodiment first performs image enhancement processing such as super-resolution and defogging according to the picture quality and noise, and then splits the image into patches arranged into sequence data in top-to-bottom order. Meanwhile, for data such as speech, physiological and psychological signals, after signal enhancement operations such as high-pass and low-pass filtering, the data is sliced into sequences at fixed time intervals. After the above operations, the multi-modal behavior data is unified into multi-modal sequence data, represented by the following formula (1):
S_k = {w_{k,1}, w_{k,2}, …, w_{k,i}, …, w_{k,t}}   (1)
where w_{k,i} denotes the i-th time slice of the sequence in modality k.
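The serialization of the three data forms above can be sketched with simple stand-ins; the whitespace tokenizer below replaces WordPiece/SentencePiece, and the helper names (`serialize_text`, `serialize_image`, `serialize_signal`) are illustrative, not from the patent.

```python
import re
import numpy as np

def serialize_text(text):
    # Naive lowercase whitespace tokenization standing in for
    # WordPiece/SentencePiece tokenization.
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    return cleaned.split()

def serialize_image(img, p):
    # Split an HxW array into non-overlapping p x p patches, ordered
    # top-to-bottom, left-to-right.
    h, w = img.shape
    return [img[r:r + p, c:c + p] for r in range(0, h, p) for c in range(0, w, p)]

def serialize_signal(samples, slice_len):
    # Cut a 1-D signal (speech/physiological) into fixed-length time slices;
    # a trailing partial slice is dropped.
    return [samples[i:i + slice_len]
            for i in range(0, len(samples) - slice_len + 1, slice_len)]
```

Each helper yields the per-modality slice sequence S_k of formula (1).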
In some embodiments of the present invention, the collaborative embedded representation is performed based on multi-modal sequence data, resulting in an initial feature representation, including but not limited to:
and carrying out cooperative embedded representation on different modal sequence data in the multi-modal sequence data through a convolutional neural network and a long-short term memory network to obtain initial feature representation.
In this embodiment, collaborative embedding representation is performed through convolutional neural networks (CNN) and long short-term memory networks (LSTM). Specifically, the converted sequence data of the different modalities, such as image modality data and text modality data, are collaboratively embedded through the convolutional neural network and the long short-term memory network, so as to map the information of multiple modalities into the same embedding space, enhance the complementarity between modalities, and facilitate the subsequent similarity-based graph construction and structuring operations. The resulting embedded representation is shown in formula (2):
X_k = {x_{k,1}, x_{k,2}, …, x_{k,i}, …, x_{k,t}}   (2)
where x_{k,i} denotes the embedded representation of the data at time slice i in modality k.
Further, this embodiment performs a downsampling operation on the embedded representations obtained above by feature pooling along the modal dimension, fusing the local features of the respective modalities to obtain an initial embedded representation for each time slice as the initial feature representation, as shown in formulas (3) and (4):
x_t = [x_{1,t}, x_{2,t}, …, x_{K,t}]   (3)
h_t^0 = Pooling(x_t)   (4)
where h_t^0 denotes the learning behavior state representation of the student at time t, i.e., the initial feature representation, and K denotes the number of modalities.
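The co-embedding and pooling steps can be sketched as follows, with random linear projections standing in for the trained CNN/LSTM encoders; the function names `co_embed` and `pool_modalities` and the use of mean pooling are illustrative assumptions.

```python
import numpy as np

def co_embed(modalities, dim=4, seed=0):
    # Project each modality's (t, d_k) sequence into a shared (t, dim) space;
    # a random linear map stands in for a trained CNN/LSTM encoder.
    rng = np.random.default_rng(seed)
    return {k: X @ rng.standard_normal((X.shape[1], dim))
            for k, X in modalities.items()}

def pool_modalities(embedded):
    # Analogue of formulas (3)/(4): fuse the per-modality embeddings of each
    # time slice by mean pooling along the modal dimension, yielding h_t^0.
    return np.mean(np.stack(list(embedded.values())), axis=0)
```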
In some embodiments of the invention, a learning behavior data correlation graph is constructed from the initial feature representations, including but not limited to:
and obtaining the incidence relation between the time slice nodes according to the initial feature representation. Wherein the time-sliced node is a time slice in the multimodal sequence data.
And constructing an adjacency matrix according to the incidence relation.
And constructing a learning behavior data association diagram according to the adjacency matrix and the time slice nodes.
In this embodiment, the association relations between time-slice nodes are obtained from the initial feature representation, where each time slice w_i in the multi-modal sequence data serves as a node in the learning behavior data association graph, i.e., a time-slice node. An adjacency matrix is then constructed from the association relations, and the learning behavior data association graph is built from the adjacency matrix and the time-slice nodes. In general, an association graph consists of nodes and the edges connecting them. This embodiment takes each time slice w_i of the multi-modal sequence data as a node v_i in the learning behavior data association graph. The connecting edges represent the association relations between nodes and are usually represented by an adjacency matrix, which can also express the degree of dependency between nodes: for example, if v_i and v_j are associated, then A_{ij} = 1, and otherwise A_{ij} = 0. Further, to obtain the initial graph structure, this embodiment starts from the initial feature representations h_i of the nodes.
The cosine similarity between node pairs is calculated in the embedding space to obtain a weighted adjacency matrix, as shown in the following formula (5):
A_{ij} = cos(w_p ⊙ h_i, w_p ⊙ h_j)   (5)
where w_p is a learnable weight parameter, h_i and h_j are the representations of nodes v_i and v_j, and cos(·, ·) denotes cosine similarity.
Further, to preserve the sequence information of the time-series data, the nodes in the graph are connected in the order in which they appear in the multi-modal sequence data, so that the learning behavior data association graph is constructed from the adjacency matrix and the time-slice nodes. This similarity-based graph construction is dynamic and heuristic: it is updated as the node information is updated, so the finally learned graph structure can be visualized as a presentation of the structural information, which lends a degree of interpretability. After this operation, the multi-modal sequence data S_k is converted into a learning behavior data association graph consisting of time-slice nodes and connecting edges,
G_k = (V, E), where V = {v_1, …, v_n} denotes the nodes appearing in the graph and n is the number of nodes appearing in the sample. The association relations between the nodes are described by the edge set E, represented by an adjacency matrix A ∈ R^{n×n}. At the same time, the initial embedded representation H^0 ∈ R^{n×c} is obtained as the feature matrix of the graph nodes, where c is the dimension of the feature vector.
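As an illustrative sketch only (not the patented implementation), the similarity-based construction of the weighted adjacency matrix in formula (5) can be rendered in NumPy as follows; here w_p is set to an all-ones vector, whereas in the model it would be a trained parameter.

```python
import numpy as np

def weighted_adjacency(H, w_p):
    """Weighted adjacency matrix from node features H (n x c) via
    learnable-weighted cosine similarity, in the spirit of formula (5)."""
    Hw = H * w_p                                   # reweight features by w_p
    Hn = Hw / np.clip(np.linalg.norm(Hw, axis=1, keepdims=True), 1e-12, None)
    return Hn @ Hn.T                               # A[i, j] = cos(w_p*h_i, w_p*h_j)

# Toy graph: 4 time-slice nodes with c = 3 features each.
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
A = weighted_adjacency(H, np.ones(3))              # plain cosine when w_p = 1
```

With a trained w_p, features that matter for the analysis task are amplified before the similarity is taken, so the learned graph differs from plain cosine similarity.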
It should be noted that, unlike attention mechanisms that directly use the original node embeddings to calculate each node's contribution to the learning behavior analysis, this embodiment obtains the multi-modal decoupling graphs by decoupling the learning behavior data association graph with a graph decoupling neural network. Specifically, each dimension of a node's initial embedded representation is first encoded into a hidden neuron as that neuron's input signal; the representation can be decomposed as shown in formula (6):
h_i = {h_{i,1}, h_{i,2}, …, h_{i,m}, …, h_{i,c}}   (6)
where h_i denotes the initial embedded representation of the node and h_{i,m} denotes the m-th input signal of node v_i.
Further, this embodiment partitions the initial embedded representation of each node into blocks that serve as neurons under a linear model. To tie the output signals of the neurons to specific semantics, the neuron outputs are associated with specific modal attribute features through a supervision task: each modal attribute corresponds to the embedded representation x_{k,i} of the original modal data at a time node, and the feature representation on the time slice of the original modal data is used as the supervision label for that modal attribute. The neuron for a specific modality is therefore represented by formula (7):
h_{i,k} = W_k h_i + b_{i,k}   (7)
where k is the modal attribute indicator, W_k is a trainable weight matrix, and b_{i,k} is a learnable offset vector. Because the linear model can obtain the contribution of each input signal from the weight matrix, it offers good interpretability.
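Formula (7) applies K independent affine projections to the fused node embedding, one per modal attribute. A minimal sketch (the dimensions, values, and stacked-tensor layout are illustrative assumptions):

```python
import numpy as np

def decouple(h, W, b):
    """Modality-specific representations h_{i,k} = W_k h_i + b_{i,k},
    as in formula (7). W: (K, d, c), b: (K, d), h: (c,)."""
    return np.einsum('kdc,c->kd', W, h) + b

rng = np.random.default_rng(1)
c, d, K = 6, 4, 3                  # fused dim, per-modality dim, modalities
h = rng.normal(size=c)             # fused node embedding h_i
W = rng.normal(size=(K, d, c))     # trainable weight matrices W_k
b = np.zeros((K, d))               # learnable offset vectors b_{i,k}
H_k = decouple(h, W, b)            # one row per modal attribute
```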
Further, without a supervision task, signal extraction cannot guarantee that the extracted attribute information is consistent with the desired specific attribute. This embodiment therefore introduces a multi-label classification loss into the neurons' signal extraction process and uses the original modal attribute labels to constrain the attribute feature extraction, as shown in formulas (8) and (9):
[Formula (8), rendered as an image in the original publication, maps the modality-specific representation h_{i,k} to a predicted attribute probability.]
L_dec = − Σ_{k=1}^{K} [ P_{i,k} log p̂_{i,k} + (1 − P_{i,k}) log(1 − p̂_{i,k}) ]   (9)
where K is the number of attribute information types, P_{i,k} is the true attribute label, and p̂_{i,k} is the predicted probability of that attribute. It is readily appreciated that the graph decoupling operation in this embodiment can be understood as decomposing the learning behavior data association graph into multiple modal decoupling graphs G_k = (V, E): while the connection structure of nodes and edges remains unchanged, the initial node embedding h_{i,k} on each modal decoupling graph now represents a specific modal attribute, so the modalities are distinguished from one another. Graph decoupling therefore overcomes the semantic-missing problem of attention-based interpretation methods: the extracted abstract features are associated with specific modal attributes, which enriches and refines the node representations, mitigates information loss in the subsequent feature extraction, enhances the learning behavior analysis effect, and improves interpretability.
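The supervision in formulas (8)–(9) amounts to a multi-label binary cross-entropy over the K attribute labels. A minimal sketch, assuming a sigmoid readout (the exact readout in the original is rendered as an image):

```python
import numpy as np

def multilabel_decoupling_loss(logits, labels):
    """Multi-label binary cross-entropy constraining each modality-specific
    representation to predict its attribute label (cf. formulas (8)-(9))."""
    p = 1.0 / (1.0 + np.exp(-logits))   # sigmoid: predicted probability
    eps = 1e-12
    return float(-np.mean(labels * np.log(p + eps)
                          + (1 - labels) * np.log(1 - p + eps)))

logits = np.array([4.0, -4.0, 3.0])     # K = 3 attribute scores
labels = np.array([1.0, 0.0, 1.0])      # true attribute tags P_{i,k}
loss = multilabel_decoupling_loss(logits, labels)   # small when predictions match
```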
In some embodiments of the invention, an attribute routing mechanism is constructed from relationships between nodes in the multi-modal decoupling graph, including but not limited to:
and calculating a first difference fraction matrix between nodes of each modal decoupling graph in the multi-modal decoupling graph. Wherein the first difference score matrix is a mask matrix for intra-modality information transfer.
And calculating a second difference fraction matrix of the same node in different modes in the multi-mode decoupling graph. And the second difference fraction matrix is a mask matrix crossing modal information transfer.
In this embodiment, an attribute routing mechanism is first constructed according to the relationships between the nodes in the multi-modal decoupling graph. Specifically, first difference score matrices between the nodes of each modal decoupling graph are calculated and used as mask matrices for intra-modal information transfer; second difference score matrices of the same node across different modalities are calculated and used as mask matrices for cross-modal information transfer. To capture the temporal context of the learning behavior information, the first difference score matrix realizes the transfer of the learning behavior information of the object to be analyzed along the time dimension. On a single decoupled modal graph, the difference score matrix between interacting nodes, i.e., the first difference score matrix, is calculated as the mask matrix for intra-modal information transfer, as shown in formula (10):
[Formula (10), rendered as an image in the original publication, computes the first difference score from the initial embedded representations h_{i,k} and h_{j,k} on a given modal decoupling graph, where k is the modality indicator.]
Further, the learning behavior data of the object to be analyzed may exhibit complementarity and differences across modalities, and this embodiment realizes the transfer of learning behavior information along the modality dimension by calculating the second difference score matrix. On each modal decoupling graph, the difference score matrix of the same node across different modalities, i.e., the second difference score matrix, is calculated as the mask matrix for cross-modal information transfer, as shown in formula (11):
[Formula (11), rendered as an image in the original publication, computes the second difference score from the representations h_{i,k} of the same node on different modal decoupling graphs, where k is the modality indicator and K is the number of modal decoupling graphs.]
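Since formulas (10) and (11) are rendered as images in the original, the scoring function below (1 − cosine similarity) is only an assumed stand-in; the sketch illustrates the two mask shapes: an n × n intra-modal matrix per modality, and a K × K cross-modal matrix per node.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def intra_modal_mask(Hk):
    """First difference score matrix on one modal decoupling graph:
    pairwise scores between its n nodes (cf. formula (10))."""
    n = len(Hk)
    return np.array([[1 - cosine(Hk[i], Hk[j]) for j in range(n)]
                     for i in range(n)])

def cross_modal_mask(H):
    """Second difference score matrices: scores of the same node across
    the K modalities (cf. formula (11)); H has shape (K, n, d)."""
    K, n, _ = H.shape
    return np.array([[[1 - cosine(H[k, i], H[l, i]) for l in range(K)]
                      for k in range(K)] for i in range(n)])

Hk = np.eye(3)                             # 3 orthogonal node embeddings
Q = intra_modal_mask(Hk)                   # zero diagonal, ones elsewhere
C = cross_modal_mask(np.stack([Hk, Hk]))   # identical modalities -> all zeros
```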
In some embodiments of the present invention, iteratively updating the node embedded representation and graph structure of the multi-modal decoupling graph through the attribute routing mechanism to obtain a target feature representation and a target graph structure includes, but is not limited to:
Controlling the transfer of information in the time dimension of each modal decoupling graph in the multi-modal decoupling graph according to the first difference score matrix, to obtain a time dimension representation update.
Controlling the transfer of information in the modal dimension between the modal decoupling graphs according to the second difference score matrix, to obtain a modal dimension representation update.
Updating the node embedded representation according to the time dimension representation update and the modal dimension representation update, to obtain the target feature representation.
Updating the graph structure according to the target feature representation, to obtain the target graph structure.
In this embodiment, the transfer of information along the time dimension inside each modal decoupling graph is first controlled according to the first difference score matrix, yielding the time dimension representation update. Next, the transfer of information along the modal dimension between the modal decoupling graphs is controlled according to the second difference score matrix, yielding the modal dimension representation update. The node embedded representation and the graph structure are then updated according to these two updates, yielding the corresponding target feature representation and target graph structure. Specifically, once the mask matrix for intra-modal information transfer, i.e., the first difference score matrix, has been calculated, this embodiment transfers information within a modal decoupling graph according to that mask matrix so as to update the node representations; the node representation update along the time dimension within a modality is shown in formula (12):
[Formula (12), rendered as an image in the original publication, gives the node embedded representation updated along the time dimension by aggregating the information that node v_i receives from its intra-modal neighbors.]
Further, once the mask matrix for cross-modal information transfer, i.e., the second difference score matrix, has been calculated, this embodiment controls the transfer of information between the decoupling graphs of different modalities according to that mask matrix so as to update the node representations; the node representation update along the modal dimension is shown in formula (13):
[Formula (13), rendered as an image in the original publication, gives the node embedded representation updated along the modal dimension by aggregating the information that node v_i receives from its counterparts on the other modal decoupling graphs.]
Further, this embodiment combines the obtained time dimension and modal dimension representation updates into the iteratively updated node embedded representation, i.e., the target feature representation. The updated target feature representation is given by formula (14):
[Formula (14), rendered as an image in the original publication, combines the time dimension and modal dimension updates with a learnable offset vector μ to yield the node's temporary feature embedded representation.]
Further, after the temporary feature update of the target node has been calculated, this embodiment performs the final node embedded representation update according to the information update principle on the graph; the node embedded representation after an update iteration over the whole graph is shown in formula (15):
[Formula (15), rendered as an image in the original publication, propagates the updated features over the whole graph, mapping the node feature representation of the previous layer of the graph structure to the node feature representation of the current network layer.]
Further, based on the updated target node feature representation of the current network layer, this embodiment recalculates the graph structure in the current state, i.e., the target graph structure, according to formula (5).
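Since formulas (12)–(15) are rendered as images in the original, the sketch below only illustrates the aggregation pattern described in the text: intra-modal masking along the time dimension, cross-modal masking across decoupling graphs, a learnable offset μ, and U update iterations. The tanh nonlinearity and tensor layout are assumptions.

```python
import numpy as np

def routing_update(H, Q, C, mu, iters=3):
    """Schematic attribute-routing update (cf. formulas (12)-(15)).
    H:  (K, n, d) modality-specific node embeddings
    Q:  (K, n, n) intra-modal mask matrices (time dimension)
    C:  (n, K, K) cross-modal mask matrices (modal dimension)
    mu: (d,) learnable offset vector; iters = U update iterations."""
    for _ in range(iters):
        time_up = np.einsum('knm,kmd->knd', Q, H)    # formula (12) analogue
        modal_up = np.einsum('nkl,lnd->knd', C, H)   # formula (13) analogue
        H = np.tanh(time_up + modal_up + mu)         # formula (14) analogue
    return H

K, n, d = 2, 3, 4
H = np.full((K, n, d), 0.1)
Q = np.full((K, n, n), 1.0 / n)    # uniform intra-modal masks
C = np.full((n, K, K), 1.0 / K)    # uniform cross-modal masks
H_new = routing_update(H, Q, C, np.zeros(d), iters=3)
```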
In some embodiments of the present invention, the learning behavior analysis is performed according to the target feature representation and the target graph structure is visualized to obtain a visualized learning behavior analysis result, including but not limited to:
and inputting the target characteristic representation into a preset graph decoupling learning behavior analysis model for learning behavior analysis to obtain a learning behavior analysis result.
And visualizing the structure of the target graph according to the mask matrix to obtain a preset visualized image. The preset visual image comprises a heat map and a structure visual map.
And obtaining a visual learning behavior analysis result according to the learning behavior analysis result and a preset visual image.
In this embodiment, the target feature representation is first input into the preset graph decoupling learning behavior analysis model for learning behavior analysis. Specifically, different educational decision requirements call for different learning behavior analysis tasks, and therefore different graph structures as input; the overall educational decision task can be divided into node classification tasks, such as score prediction and knowledge tracing, and graph classification tasks, such as emotion analysis and learning situation prediction. This also corresponds to different levels of interpretation. To realize decision making and interpretation generation for multiple tasks within a unified framework, the tasks on the graph are converted into index generation tasks, and the overall classification index and the per-node sequence indexes are generated for decision making, as shown in formula (16):
[Formula (16), rendered as an image in the original publication, applies softmax to the (aggregated) node feature matrix to generate the classification indexes.]
where softmax is used to generate the indexes; for the graph classification task, an agg aggregation operation is performed on the node feature matrix to obtain a single vector representation from which the overall classification index is generated. Further, this embodiment constrains the classifier with a cross-entropy loss function, as shown in formula (17):
[Formula (17), rendered as an image in the original publication, is a cross-entropy loss summed over the training samples.]
where D_train is the partitioned training set and S is a training sample in the training set. The learning behavior analysis model obtained through this construction, i.e., the preset graph decoupling learning behavior analysis model, can realize a variety of educational decision tasks.
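For a graph classification task, the index generation of formula (16) and the cross-entropy constraint of formula (17) reduce to aggregate-then-softmax with a log-loss; a toy sketch with arbitrary weights (the mean is used here as one possible agg operation):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # numerically stable softmax (formula (16))
    return e / e.sum()

def cross_entropy(logits, label):
    """Cross-entropy loss constraining the classifier (cf. formula (17))."""
    return float(-np.log(softmax(logits)[label] + 1e-12))

H = np.array([[0.2, 1.0],          # node feature matrix (n x d)
              [0.4, 0.6]])
W = np.array([[1.0, -1.0],         # toy readout weights (d x classes)
              [0.5, 2.0]])
logits = H.mean(axis=0) @ W        # agg aggregation, then linear readout
loss = cross_entropy(logits, label=1)
```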
Further, to train the preset graph decoupling learning behavior analysis model, the collected multi-modal behavior data is divided into a training set and a test set during the multi-modal learning behavior deep learning modeling, and the model designed according to the above steps is trained and validated. During training, the learning behavior analysis model adopts a dynamic association graph structure that is iteratively updated according to the learned node features; meanwhile, to further strengthen the effect of the graph decoupling and routing mechanisms, the node representations are updated through U iterations of the designed attribute routing mechanism. Further, to realize end-to-end unified training, a unified objective function and optimization approach are adopted for the multi-modal learning behavior analysis model; the objective function of the final learning behavior analysis model is shown in formula (18):
[Formula (18), rendered as an image in the original publication, combines the losses, with α and β weighting the classification loss and the decoupling loss, respectively.]
The number of iterations of the attribute routing mechanism in this embodiment is U = 3.
Further, to generate a structured interpretation of the learning analysis results, given the trained preset graph decoupling learning behavior analysis model Φ and the learning behavior analysis prediction result Y, an interpretation is generated by identifying the subgraph G_s of the sample graph and the node feature subset X_s that most influence the prediction of the preset graph decoupling learning behavior analysis model Φ. The interpretation process, optimized overall in the form of mutual information (MI), is shown in formula (19):
max_{G_s, X_s} MI(Y, (G_s, X_s)) = f(Y) − f(Y | G = G_s, X = X_s)   (19)
where f(·) denotes an entropy calculation function. It should be noted that f(Y) is fixed, because the decision parameters of the trained preset graph decoupling learning behavior analysis model are fixed. The overall optimization therefore reduces to minimizing f(Y | G = G_s, X = X_s), as shown in formula (20):
min_{G_s, X_s} f(Y | G = G_s, X = X_s)   (20)
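The quantity minimized in formula (20) is the conditional entropy of the prediction given the candidate subgraph and feature subset. A small sketch of an entropy function f(·) with illustrative distributions: a subgraph that keeps the model's prediction confident yields low entropy and is therefore preferred.

```python
import numpy as np

def entropy(p):
    """Shannon entropy f(p) of a discrete prediction distribution (nats)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                   # 0 * log 0 is taken as 0
    return float(-(p * np.log(p)).sum())

full = entropy([0.25, 0.25, 0.25, 0.25])   # uncertain prediction: ln 4
sub = entropy([0.9, 0.05, 0.03, 0.02])     # confident prediction: much lower
```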
Further, the target graph structure is visualized according to the mask matrices to obtain the preset visualized image, which comprises a heat map and a structure visualization map. Specifically, this embodiment realizes information flow transfer and graph structure updating through the constructed attribute routing mechanism, and therefore obtains the preset visualized image by visualizing the mask matrices of the attribute routing mechanism. Based on the idea of feature importance, this embodiment extends the traditional practice of depicting node and edge weights with an attention mechanism: it gives the features specific semantics, overcoming the inaccurate and non-unique interpretations caused by the missing semantics of attention weights, while delineating the graph structure at a fine granularity. Illustratively, an object to be analyzed is randomly selected and its learning behavior data is collected. The student's multi-modal learning behavior data is preprocessed and initially embedded, and an initial graph structure is constructed from feature similarity for information transfer. After the graph construction is completed, graph decoupling is performed on the fused feature representation using the supervision task, the node feature representations are updated, and H_k = {h_{1,k}, …, h_{i,k}} is generated. The feature representations are then fed into the trained model, and, through the processing of the attribute routing mechanism, the mask value q of each node and region under a specific modality and the mask value c between modalities are output.
Then, after the preset graph decoupling learning behavior analysis model has been trained, it is passed to the interpretation generation optimization process to obtain the subgraph G_s and the node feature subset X_s that most influence the result. Next, this embodiment draws, via matplotlib, the heat map corresponding to the obtained mask values of the attribute routing mechanism together with the corresponding graph structure, assigning different color identifiers according to the magnitude of the numerical weights; the target graph structure is thereby visualized and the preset visualized image obtained. It should be noted that the preset visualized image is divided into a heat map and a structure visualization map, where a darker color represents a larger weight and a greater influence on the learning analysis result. On the one hand, this visual representation intuitively reflects the importance of different nodes and regions, helps guide information transfer, and highlights the importance of specific information. On the other hand, since different modalities carry different weights, the heat map can reflect both the importance between modalities and the importance of particular node information within a given modality. Further, this embodiment combines the learning behavior analysis result with the preset visualized image to obtain the visualized learning behavior analysis result.
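The heat-map rendering described above can be reproduced with matplotlib roughly as follows; the mask values, colormap, and file name are illustrative choices, with darker cells indicating larger influence.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                       # headless backend: render to file
import matplotlib.pyplot as plt

# Toy mask values of the attribute routing mechanism:
# rows are modalities, columns are time-slice nodes.
mask = np.array([[0.9, 0.2, 0.6],
                 [0.1, 0.8, 0.3],
                 [0.4, 0.5, 0.7]])

fig, ax = plt.subplots()
im = ax.imshow(mask, cmap="Reds")           # heat map of mask weights
ax.set_xlabel("time-slice node")
ax.set_ylabel("modality")
fig.colorbar(im, ax=ax)
fig.savefig("mask_heatmap.png")
```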
It should be noted that, in some embodiments of the invention, multi-modal learning behavior analysis support is provided through the preset graph decoupling learning behavior analysis model, and visual and natural-language-interactive interpretations are generated from the message flow of the model's decision process. For example, suppose an object to be analyzed Z takes a course: variables related to the student's learning process are recorded in the system background, and Z's end-of-term examination performance (classified into five grades: fail, pass, medium, good, and excellent) must be predicted. Shallow feature analysis is first required on the data of the different modalities, followed by normalization and decoupled representation learning of the shallow features; a global feature representation is then obtained by fusion according to the routing mechanism and used as the basis for analysis. Concretely, data cleaning is first performed on the data of each modality, different deep learning algorithms are adopted to obtain the initial node vector representation X, and the graph structure G is formed. The nodes in the graph structure are then decoupled according to the available multi-modal data attributes to obtain the decoupled representations h_{i,k}; the node representations are updated according to the decoupled features and the routing mechanism, and the node features are fused to obtain h_i. Further, a decision is made according to the features h_i, (G_s, X_s) is computed in whole or in part, and an interpretation is given by the visualization method according to the updated graph structure.
An embodiment of the present invention also provides a multimodal learning behavior analysis system, including:
and the data acquisition module is used for acquiring multi-modal behavior data of the object to be analyzed in the learning process.
And the preprocessing module is used for preprocessing the multi-modal behavior data to obtain corresponding multi-modal sequence data.
And the embedded representation module is used for carrying out collaborative embedded representation according to the multi-modal sequence data to obtain initial feature representation.
And the association diagram building module is used for building a learning behavior data association diagram according to the initial characteristic representation.
And the decoupling module is used for decoupling the learning behavior data association diagram through a graph decoupling neural network to obtain a multi-mode decoupling diagram.
And the routing construction module is used for constructing an attribute routing mechanism according to the relationship among the nodes in the multi-modal decoupling graph.
And the routing updating module is used for updating the node embedded representation and the graph structure of the iterative multi-modal decoupling graph through an attribute routing mechanism to obtain a target feature representation and a target graph structure.
And the result analysis module is used for carrying out learning behavior analysis according to the target characteristic representation and visualizing the target graph structure to obtain a visualized learning behavior analysis result.
Referring to fig. 2, an embodiment of the present invention further provides a multi-modal learning behavior analysis system, including:
at least one processor 210.
At least one memory 220 for storing at least one program.
When the at least one program is executed by the at least one processor 210, the at least one processor 210 implements the multi-modal learning behavior analysis method as described in the above embodiments.
An embodiment of the present invention also provides a computer-readable storage medium storing computer-executable instructions for execution by one or more control processors, e.g., to perform the steps described in the above embodiments.
One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, or suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor; as hardware; or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those skilled in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVDs) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, as is well known to those skilled in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media.
While the preferred embodiments of the present invention have been described in detail, it will be understood by those skilled in the art that the foregoing and various other changes, omissions and deviations in the form and detail thereof may be made without departing from the scope of this invention.

Claims (10)

1. A multi-modal learning behavior analysis method is characterized by comprising the following steps:
acquiring multi-modal behavior data of an object to be analyzed in a learning process;
preprocessing the multi-modal behavior data to obtain corresponding multi-modal sequence data;
performing collaborative embedding representation according to the multi-modal sequence data to obtain initial feature representation;
constructing a learning behavior data association diagram according to the initial feature representation;
decoupling the learning behavior data association diagram through a graph decoupling neural network to obtain a multi-mode decoupling diagram;
constructing an attribute routing mechanism according to the relationship among all nodes in the multi-modal decoupling graph;
updating and iterating the node embedded representation and the graph structure of the multi-modal decoupling graph through the attribute routing mechanism to obtain a target characteristic representation and a target graph structure;
and performing learning behavior analysis according to the target characteristic representation and visualizing the target graph structure to obtain a visualized learning behavior analysis result.
2. The multi-modal learning behavior analysis method as claimed in claim 1, wherein the pre-processing the multi-modal behavior data to obtain corresponding multi-modal sequence data comprises:
performing corresponding data cleaning according to the data form of the multi-modal behavior data to obtain multi-modal cleaning data;
and carrying out data serialization on the multi-modal cleaning data to obtain the multi-modal sequence data.
3. The multi-modal learning behavior analysis method according to claim 1, wherein the performing of the collaborative embedding representation according to the multi-modal sequence data to obtain an initial feature representation comprises:
and performing the collaborative embedding representation on different modal sequence data in the multi-modal sequence data through a convolutional neural network and a long-short term memory network to obtain the initial feature representation.
4. The multi-modal learning behavior analysis method according to claim 1, wherein the building of the learning behavior data association diagram according to the initial feature representation comprises:
obtaining an association relation between time slice nodes according to the initial feature representation; wherein the time slice nodes are time slices in the multi-modal sequence data;
constructing an adjacency matrix according to the association relation;
and constructing the learning behavior data association graph according to the adjacency matrix and the time slice nodes.
5. The method according to claim 1, wherein the constructing an attribute routing mechanism according to the relationship between the nodes in the multi-modal decoupling graph comprises:
calculating a first difference score matrix between the nodes of each modal decoupling graph in the multi-modal decoupling graph; wherein the first difference score matrix is a mask matrix for intra-modal information transfer;
calculating a second difference score matrix of the same node in different modes in the multi-mode decoupling graph; wherein the second difference score matrix is a mask matrix that spans modal information transfer.
6. The multi-modal learning behavior analysis method of claim 5, wherein the updating and iterating the node embedded representation and the graph structure of the multi-modal decoupling graph through the attribute routing mechanism to obtain a target feature representation and a target graph structure comprises:
controlling the transfer of information in the time dimension of each modal decoupling graph in the multi-modal decoupling graph according to the first difference score matrix, to obtain a time dimension representation update;
controlling the transfer of information in the modal dimension between the modal decoupling graphs in the multi-modal decoupling graph according to the second difference score matrix, to obtain a modal dimension representation update;
updating the node embedded representation according to the time dimension representation updating and the modal dimension representation updating to obtain the target feature representation;
and updating the graph structure according to the target feature representation to obtain the target graph structure.
7. The multi-modal learning behavior analysis method according to claim 5, wherein the performing learning behavior analysis according to the target feature representation and visualizing the target graph structure to obtain a visualized learning behavior analysis result comprises:
inputting the target characteristic representation into a preset graph decoupling learning behavior analysis model for learning behavior analysis to obtain a learning behavior analysis result;
visualizing the target graph structure according to the mask matrix to obtain a preset visualized image; the preset visual image comprises a heat map and a structure visual map;
and obtaining the visual learning behavior analysis result according to the learning behavior analysis result and the preset visual image.
8. A multimodal learning behavior analysis system, comprising:
the data acquisition module is used for acquiring multi-modal behavior data of an object to be analyzed in the learning process;
the preprocessing module is used for preprocessing the multi-modal behavior data to obtain corresponding multi-modal sequence data;
the embedding representation module is used for performing collaborative embedding representation according to the multi-modal sequence data to obtain an initial feature representation;
the association graph construction module is used for constructing a learning behavior data association graph according to the initial feature representation;
the decoupling module is used for decoupling the learning behavior data association graph through a graph decoupling neural network to obtain a multi-modal decoupling graph;
the routing construction module is used for constructing an attribute routing mechanism according to the relationships among the nodes in the multi-modal decoupling graph;
the routing update module is used for updating and iterating the node embedded representation and the graph structure of the multi-modal decoupling graph through the attribute routing mechanism to obtain a target feature representation and a target graph structure;
and the result analysis module is used for performing learning behavior analysis according to the target feature representation and visualizing the target graph structure to obtain a visualized learning behavior analysis result.
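The module decomposition of claim 8 can be mirrored as a processing pipeline. The scaffold below is purely hypothetical: every stage is a stand-in (random data, z-scoring, identity embedding, similarity adjacency, mean pooling) chosen only to make the module boundaries concrete, since the patent does not disclose the module internals here.

```python
import numpy as np

class MultimodalLearningBehaviorAnalyzer:
    """Hypothetical scaffold mirroring claim 8's modules; all stage bodies
    are illustrative stand-ins, not the patented implementations."""

    def __init__(self, num_modalities: int = 2, dim: int = 4):
        self.M, self.d = num_modalities, dim

    def acquire(self, n_steps: int = 5) -> np.ndarray:
        # data acquisition module: stand-in random multi-modal behavior data
        rng = np.random.default_rng(0)
        return rng.normal(size=(self.M, n_steps, self.d))

    def preprocess(self, raw: np.ndarray) -> np.ndarray:
        # preprocessing module: z-score each modality over time (assumed)
        mu = raw.mean(axis=1, keepdims=True)
        sd = raw.std(axis=1, keepdims=True) + 1e-8
        return (raw - mu) / sd

    def embed(self, seq: np.ndarray) -> np.ndarray:
        # embedding representation module: identity stand-in for the
        # collaborative embedding representation
        return seq

    def build_graph(self, H: np.ndarray) -> np.ndarray:
        # association graph construction module: per-modality similarity adjacency
        return np.stack([h @ h.T for h in H])

    def analyze(self, H: np.ndarray) -> np.ndarray:
        # result analysis module: pooled representation as a stand-in score
        return H.mean(axis=(0, 1))

    def run(self):
        H = self.embed(self.preprocess(self.acquire()))
        A = self.build_graph(H)
        return self.analyze(H), A
```

The decoupling and routing modules would slot in between `build_graph` and `analyze`; they are omitted here because the claims above already describe them separately.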
9. A multimodal learning behavior analysis system, comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the multi-modal learning behavior analysis method of any one of claims 1-7.
10. A computer storage medium in which a processor-executable program is stored, the processor-executable program being configured to implement the multimodal learning behavior analysis method according to any one of claims 1 to 7 when executed by the processor.
CN202211323486.3A 2022-10-27 2022-10-27 Multi-modal learning behavior analysis method, system and storage medium Pending CN115564027A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211323486.3A CN115564027A (en) 2022-10-27 2022-10-27 Multi-modal learning behavior analysis method, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211323486.3A CN115564027A (en) 2022-10-27 2022-10-27 Multi-modal learning behavior analysis method, system and storage medium

Publications (1)

Publication Number Publication Date
CN115564027A true CN115564027A (en) 2023-01-03

Family

ID=84768786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211323486.3A Pending CN115564027A (en) 2022-10-27 2022-10-27 Multi-modal learning behavior analysis method, system and storage medium

Country Status (1)

Country Link
CN (1) CN115564027A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116502882A (en) * 2023-06-30 2023-07-28 杭州新中大科技股份有限公司 Engineering progress determining method and device based on multi-mode time sequence information fusion
CN116502882B (en) * 2023-06-30 2023-10-20 杭州新中大科技股份有限公司 Engineering progress determining method and device based on multi-mode time sequence information fusion
CN117076573A (en) * 2023-10-16 2023-11-17 深圳博十强志科技有限公司 Data processing analysis system based on big data technology
CN117076573B (en) * 2023-10-16 2024-01-05 深圳博十强志科技有限公司 Data processing analysis system based on big data technology
CN117155583A (en) * 2023-10-24 2023-12-01 清华大学 Multi-mode identity authentication method and system for incomplete information deep fusion
CN117155583B (en) * 2023-10-24 2024-01-23 清华大学 Multi-mode identity authentication method and system for incomplete information deep fusion

Similar Documents

Publication Publication Date Title
Li et al. A survey of data-driven and knowledge-aware explainable ai
CN111291185B (en) Information extraction method, device, electronic equipment and storage medium
Zhang et al. Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction
CN111078836B (en) Machine reading understanding method, system and device based on external knowledge enhancement
US20190034814A1 (en) Deep multi-task representation learning
CN115564027A (en) Multi-modal learning behavior analysis method, system and storage medium
Crowder et al. Artificial cognition architectures
CN112703494A (en) Electronic device and method of controlling the same
JPWO2018203555A1 (en) Signal search device, method, and program
CN112231491B (en) Similar test question identification method based on knowledge structure
CN111434118A (en) Apparatus and method for generating user interest information
Chandiok et al. CIT: Integrated cognitive computing and cognitive agent technologies based cognitive architecture for human-like functionality in artificial systems
Gilbert et al. Epistemic therapy for bias in automated decision-making
Priyadarshini et al. Artificial intelligence: applications and innovations
Denter et al. Forecasting future bigrams and promising patents: introducing text-based link prediction
Cannataro et al. Artificial intelligence in bioinformatics: from omics analysis to deep learning and network mining
Browne et al. Critical challenges for the visual representation of deep neural networks
WO2018203551A1 (en) Signal retrieval device, method, and program
Mikheyenkova Cognitive knowledge discovery in social sciences
Vukojičić et al. Optimization of Multimodal Trait Prediction Using Particle Swarm Optimization
CN113821610A (en) Information matching method, device, equipment and storage medium
Strømsvåg Exploring the why in ai: Investigating how visual question answering models can be interpreted by post-hoc linguistic and visual explanations
Thakur et al. Machine learning and deep learning for intelligent and smart applications
CN113568983A (en) Scene graph generation method and device, computer readable medium and electronic equipment
Vergara et al. A Schematic Review of Knowledge Reasoning Approaches Based on the Knowledge Graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination