CN116361697A - Learner learning state prediction method based on heterogeneous graph neural network model - Google Patents
Learner learning state prediction method based on heterogeneous graph neural network model
- Publication number: CN116361697A (application CN202310342704.6A)
- Authority: CN (China)
- Prior art keywords: heterogeneous, learning, model, node, learner
- Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/24 — Electric digital data processing; pattern recognition; analysing; classification techniques
- G06N3/042 — Computing arrangements based on specific computational models; neural networks; knowledge-based neural networks; logical representations of neural networks
- Y02D10/00 — Climate change mitigation technologies in ICT; energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to a learner learning state prediction method based on a heterogeneous graph neural network model, comprising the following steps: acquiring an online learning data set, and cleaning and preprocessing it; constructing a heterogeneous interaction graph by using a heterogeneous graph construction method; establishing a heterogeneous GNN model, initializing the model and setting hyperparameters, wherein the heterogeneous GNN model comprises a GNN unit and a window attention unit; training and testing the heterogeneous GNN model with the heterogeneous interaction graph to obtain a learning state prediction model; and, based on actual online learning data, constructing a corresponding heterogeneous interaction graph, feeding it to the learning state prediction model, and outputting the corresponding learning state recognition result. Compared with the prior art, the method effectively improves the accuracy of student state estimation, problem difficulty discrimination, and the like on large-scale online learning data, and can accurately identify, track, and predict the learning state of a learner.
Description
Technical Field
The invention relates to the technical field of online learning data analysis, in particular to a learner learning state prediction method based on a heterogeneous graph neural network model.
Background
In recent years, large-scale offline education has gradually been replaced by blended education, while homogeneous online learning content is increasingly unable to meet the personalized learning needs of a growing number of participants. Learners need more detailed, personalized learning feedback and help, but current online learning platforms are not flexible enough to cope with these challenges, and traditional methods that analyze a learner's knowledge state from examination results and paper answers lack the resources and environment needed to support high-quality, deliberate remote learning. Educational data mining therefore plays a vital role in data analysis and intelligent tutoring systems. Knowledge Tracing (KT) is a learning-analytics task that tracks learning activities using computer-aided large-scale data processing, and can provide diagnostic results to learners without complex manual effort or teacher involvement.
In KT, learning content is represented by knowledge components (KCs), and a learner's knowledge state (KS) is a set of KCs. For example, by tracking a learner's mastery of KCs over the first ten exercises, their performance (KS) on the 11th exercise can be estimated. Although current deep KT models outperform traditional models in predicting a learner's mastery of KCs, existing KT models have three main shortcomings. First, the simulation of learner memory in RNN-type models relies entirely on gating information, so there is no explicit feedback; the transitions in Transformer-based models can be seen as a forced review of information, which offers some visual interpretability but still violates the intuition of the learner's memory process. Second, although GNNs that aggregate higher-order relations have made progress in practical applications, the relations among problems, knowledge points, and learners in real data sets are very complex, and research on aggregating heterogeneous features and large-scale learner interaction graph structures is still lacking. Finally, deep KT models often lack interpretable problem parameters, whereas for an ITS (intelligent tutoring system), providing actionable practical feedback to the learner matters more than model-fitting capability.
Currently, the KT field has various methods for estimating a learner's mastery of KCs, such as Bayesian methods, logistic regression, and deep learning. Traditional methods generally include psychometric approaches (based on statistical regression, etc.) and Bayesian approaches. Item Response Theory (IRT) is widely used in KT frameworks, where the Rasch model (1PL-IRT) is also used to estimate item difficulty. With the rapid development of deep learning, RNN-based methods have been proposed to model long-range dependencies between interactions, and Transformer-based methods have been proposed to emphasize the weights of different KCs. Helped by their strong fitting capability, these deep models show considerable performance. Among deep KT methods, Deep Knowledge Tracing (DKT) was an early RNN-based deep KT model, while Self-Attentive Knowledge Tracing (SAKT) was the first model to introduce the Transformer structure into KT. An RNN processes sequential data step by step, retains certain information in its memory unit based on gating, and can maintain information captured from step t_0 to t_{n-1} to help predict t_n. Most online learning data are sequential, which makes them well suited to RNN processing. In addition, SAKT introduces an attention mechanism for the sequential task, so the contributions of different KCs to predicting mastery of the current KC can be emphasized. However, deep learning methods require large amounts of data to train a model, and their interpretability cannot be guaranteed, so the accuracy of the prediction results is limited.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a learner learning state prediction method based on a heterogeneous graph neural network model, which can accurately identify, track and predict the learner learning state.
The aim of the invention can be achieved by the following technical scheme: a learner learning state prediction method based on a heterogeneous graph neural network model comprises the following steps:
s1, acquiring an online learning data set, and cleaning and preprocessing the online learning data set;
s2, constructing a heterogeneous interaction diagram by utilizing a heterogeneous diagram construction method and combining the data obtained in the step S1;
s3, building a heterogeneous GNN (Graph Neural Network, graphic neural network) model, initializing the model and setting super parameters, wherein the heterogeneous GNN model comprises a GNN unit and a window attention unit;
s4, training and testing the heterogeneous GNN model established in the step S3 by utilizing the heterogeneous interaction diagram established in the step S2 to obtain a learning state prediction model;
s5, based on actual online learning data, constructing a corresponding heterogeneous interaction diagram, inputting a learning state prediction model, and outputting a corresponding learning state recognition result.
Further, the preprocessing operations in step S1 include log processing, data standardization, resampling, and data segmentation, where data segmentation specifically divides the online learning data set into a training set, a validation set, and a test set according to a set proportion.
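As a concrete illustration of the data segmentation step, the sketch below shuffles the cleaned interaction records and splits them by a set proportion. The 8:1:1 split, seed, and function name are assumptions for illustration; the patent only specifies that a set proportion is used.

```python
import random

def split_dataset(records, train=0.8, val=0.1, seed=42):
    """Shuffle interaction records and split into train/validation/test sets.

    The 8:1:1 proportion is an illustrative assumption.
    """
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train)
    n_val = int(n * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train_set, val_set, test_set = split_dataset(list(range(100)))
```

A fixed seed keeps the segmentation reproducible across training runs.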
Further, the heterogeneous interaction graph contains learner, problem, and response nodes, together with the interaction information and interactions among them.
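A heterogeneous interaction graph of this shape can be sketched as typed edge lists. The relation names (`answers`, `covers`, `responds`) are illustrative assumptions; the patent only states that the graph contains learners, problems, and responses plus the interactions between them.

```python
from collections import defaultdict

def build_interaction_graph(interactions):
    """Build a heterogeneous interaction graph from
    (learner, problem, knowledge_point, correct) tuples.

    Keys are (source_type, relation, target_type) triples; values are edge
    lists. The schema is an illustrative assumption.
    """
    edges = defaultdict(list)
    for learner, problem, kc, correct in interactions:
        edges[("learner", "answers", "problem")].append((learner, problem))
        edges[("problem", "covers", "kc")].append((problem, kc))
        edges[("learner", "responds", "response")].append((learner, (problem, correct)))
    return dict(edges)

g = build_interaction_graph([("u1", "q1", "kc3", 1), ("u1", "q2", "kc3", 0)])
```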
Further, the GNN unit in step S3 includes a difficulty estimation module, a basic feature extraction module, and a hierarchical feature extraction module, where the difficulty estimation module is configured to generate interpretable problem discrimination and problem difficulty;
the basic feature extraction module is used to extract the higher-order relations between problems and knowledge points;
the hierarchical feature extraction module is used to compute coarsened information for the different node types and propagate it over the heterogeneous graph, so as to obtain a representation of the hierarchical structure features.
Further, the window attention unit in step S3 is configured to extract sequence features and to predict the learner's learning state from the learning behavior and the output of the GNN unit.
Further, the specific working process of the difficulty estimation module is as follows:
the difficulty and discrimination of the problems and knowledge points generated by and involved in the online learning behavior are calculated by a TrueSkill system, yielding interpretable learning parameters;
for the knowledge points and problems encountered in online learning activities, their difficulty and discrimination are continuously estimated through small-batch sub-sampling, and the parameters computed by TrueSkill are converted into difficulty and discrimination features with the following formula:
d_τ = Softmax(Linear_{τ(t)}([μ_τ, σ_τ]))
where μ_τ and σ_τ are the TrueSkill parameters (mean and standard deviation) of problem or knowledge point τ; the TrueSkill scores are continuously updated with small data batches during training, and these parameters are not updated during prediction.
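The μ/σ-to-feature conversion can be sketched in plain Python. The 2×K linear weights below are illustrative stand-ins for the learned, type-specific Linear layer, and the softmax is implemented directly:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def difficulty_feature(mu, sigma, weight, bias):
    """d_tau = Softmax(Linear([mu, sigma])): map the TrueSkill parameters of a
    problem/knowledge point to a difficulty/discrimination feature vector.

    `weight` (2xK) and `bias` (length K) are illustrative stand-ins for the
    learned Linear layer.
    """
    logits = [mu * w_mu + sigma * w_sigma + b
              for (w_mu, w_sigma), b in zip(weight, bias)]
    return softmax(logits)

# Example with TrueSkill's conventional prior mu=25, sigma=25/3 and assumed weights.
d = difficulty_feature(mu=25.0, sigma=8.3,
                       weight=[(0.1, -0.2), (-0.1, 0.2)],
                       bias=[0.0, 0.0])
```

In practice the μ and σ values would come from a TrueSkill rating system updated batch by batch during training.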
Further, the specific working process of the basic feature extraction module comprises the following steps:
1) Edge relation representation among heterogeneous nodes
The basic feature extraction module is trained and sampled to obtain relative difficulty information between different problems. Formally, the module takes H^{(l)}[t] to be the representation of node t at layer l of the GNN; during the update from layer l-1 to layer l, the feature representations of the previous layer are aggregated to update the node representation. Messages are computed differently for different edge weights: the weight matrices of the different interaction operators are parameterized as a source-node projection, an edge projection, and a target-node projection, and distinct attention vectors are computed for different edges using a multi-head attention mechanism, as in the following formula:
For attention head i, ATT-head_i(s, e, t) projects the source node s into the i-th key vector K_i(s) according to the source node's type. Each node type in the graph has its own unique Linear projection, which maximizes the differences between node types. A mask matrix is introduced for the problem embeddings to prevent edge information from being aggregated into the node representation prematurely. The dot-product attention between different nodes of the heterogeneous graph differs from the original Transformer model: links from different relations generally carry different weights for the target node, and different edge types may exist between problems and knowledge nodes and between learners and problems, so each edge type φ(e) keeps a unique edge-based matrix to capture the different relation information;
2) Knowledge point and problem node representation and update
Message passing is the key GNN operation that transfers information from a source node to a target node, while edge meta-relations are added during message passing to mitigate the distribution differences between the various node and edge types. The message passing over the learner interaction data is given by:
Message(s, e, t) = ||_{i∈[1,h]} MSG-head_i(s, e, t)
where the information of the i-th message head MSG-head_i(s, e, t) comes from the other nodes linked to the target node, with corresponding weights, and h is the number of heads. The Message(s, e, t) of each node is obtained by combining all message heads. After the attention and the messages have been computed, the new messages must be aggregated at the target node to complete the whole update process. The new node feature is formed by adding the residual of the original feature to the new feature: after multiplying attention and message, a hidden representation H̃^{(l)}[t] of target t is obtained; the new node feature incorporates all attention weights and messages, and the feature information of the target node is updated through a residual connection, specifically:
H^{(l)}[t] = Linear_{τ(t)}(σ(H̃^{(l)}[t])) + H^{(l-1)}[t].
further, the hierarchical feature extraction module specifically calculates a vector by adopting a graph convolution calculationIt weights all nodes, and the coarsening process of the nodes is as follows:
further, the window attention unit calculates the importance of the first and last positions of the feature problem to improve the prediction performance:
wherein beta is t,τ The importance of the first and last positions is a feature problem.
Further, step S4 specifically uses 5-fold cross-validation and evaluates the trained model with Accuracy, Recall, Precision, F1 score, and Area Under the Curve (AUC).
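The evaluation metrics named above can be computed as in the following sketch; AUC uses the rank-based (Mann–Whitney) formulation, and the helper is an illustration rather than the patent's own code:

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Compute Accuracy, Recall, Precision, F1, and AUC for binary predictions."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # AUC: probability that a random positive scores above a random negative.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "auc": auc}

m = binary_metrics([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7])
```

Under 5-fold cross-validation these metrics would be averaged over the five held-out folds.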
Compared with the prior art, the invention has the following advantages:
1. By constructing the heterogeneous interaction graph and establishing the heterogeneous GNN model comprising the GNN unit and the window attention unit, the invention can dynamically identify and characterize the learner's learning content and knowledge points by training the heterogeneous GNN model, and predict the learner's learning state with the window attention mechanism. The heterogeneous GNN extracts the higher-order relations between problems and knowledge points, heterogeneous graph coarsening yields a rich representation of knowledge-point-level information, and the final prediction is completed by an attention mechanism that combines problem discrimination with window sequence memory, effectively improving the accuracy of learning state recognition and prediction.
2. The invention designs a difficulty estimation module in the GNN unit that combines the TrueSkill system with a deep network, generating interpretable problem discrimination and problem difficulty so that learners receive interpretable feedback, and ensuring the interpretability of the learning parameters.
3. The invention designs a hierarchical feature extraction module in the GNN unit that computes the coarsened information of the different node types and propagates it over the heterogeneous graph to obtain a rich representation of the hierarchical structure features, so accurate hierarchical features can be obtained that fully reflect the hierarchical information of the real problems.
4. The invention designs the window attention unit by introducing a window-memory-limited sequence-position attention mechanism; by emphasizing the importance of the first and last positions of a specific problem, prediction performance is effectively improved.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of an application process of an embodiment;
FIG. 3 is a schematic diagram of a knowledge tracking overall framework constructed in accordance with an embodiment;
FIG. 4 is a schematic diagram of the process of extracting interactive features and knowledge structure features in the heterogeneous graph according to the present invention;
FIG. 5 is a schematic diagram of a hierarchical feature extraction process in accordance with the present invention;
fig. 6 is a schematic diagram of an application effect of the embodiment.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples.
Examples
As shown in fig. 1, a learner learning state prediction method based on a heterogeneous graph neural network model includes the following steps:
s1, acquiring an online learning data set, and cleaning and preprocessing the online learning data set;
s2, constructing a heterogeneous interaction diagram by utilizing a heterogeneous diagram construction method and combining the data obtained in the step S1;
s3, building a heterogeneous GNN (Graph Neural Network, graphic neural network) model, initializing the model and setting super parameters, wherein the heterogeneous GNN model comprises a GNN unit and a window attention unit;
s4, training and testing the heterogeneous GNN model established in the step S3 by utilizing the heterogeneous interaction diagram established in the step S2 to obtain a learning state prediction model;
s5, based on actual online learning data, constructing a corresponding heterogeneous interaction diagram, inputting a learning state prediction model, and outputting a corresponding learning state recognition result.
By applying the technical solution, as shown in fig. 2 and fig. 6, the main contents include:
1) Constructing a knowledge tracing model based on heterogeneous GNN with multi-level knowledge structure summarization and strengthened learner attention: the higher-order relations between problems and knowledge points are extracted through the heterogeneous GNN, a rich representation of knowledge-point-level information is obtained through heterogeneous graph coarsening, and the final prediction is completed through an attention mechanism combining problem discrimination with window sequence memory.
2) Implementing the higher-order problem/knowledge-point embedded representation of the heterogeneous learner interaction graph on large-scale data. Combining the TrueSkill system with a deep network allows the model to generate interpretable problem discrimination and problem difficulty, giving learners interpretable feedback. By introducing a window-memory-limited sequence-position attention mechanism, the learner's short-term learning state is accurately highlighted.
3) According to the characteristics of the constructed knowledge tracing model, cleaning the student-system interaction data collected in 2009-2010 and the two years of student-system interaction data collected on the Santa platform to obtain the learner's learning behavior features, preprocessing the data, selecting suitable features to construct the interaction graph structure with the heterogeneous graph construction method, and selecting suitable training and testing data.
4) Training the heterogeneous GNN model to obtain a framework that dynamically identifies and characterizes the learner's learning content and knowledge points, combining the obtained problem and knowledge point features with the window attention mechanism, and predicting the student's learning state.
5) Tuning the parameters and evaluating the effect of the model.
5) Parameter adjustment and effect evaluation of the model.
In the constructed heterogeneous GNN model, as shown in fig. 3, the working process comprises three steps: problem discrimination acquisition, learner learning sequence graph feature recognition, and hierarchical feature extraction. The specific steps are as follows:
step 1: formally, given an input Graph bg= (V, E, λ) and a Graph network model, the key idea of most Graph Neural Networks (GNNs) is to aggregate feature information from the direct (first order) neighbors of a node, such as generalized representation learning (Graph SAGE) or Graph annotation networks (GAT) on a large Graph. However, they cannot aggregate information from different nodes and are susceptible to interference from information of neighboring nodes. The basic feature extraction module based on the heterogeneous GNN method is used as a method for problem feature and knowledge point processing. The extractor uses the problem sequence { q } 1 ,q 2 ,…q i Sequence of knowledge points { c }, and 1 ,c 2 ,…c i as input. Interaction map information is also used as additional input, which is a heterogeneous map containing interactions between interaction information and learner, questions, and responses. The output of the basic feature extraction module is the embedding of questions and knowledge points, including intersectionHigh order features behind the fork. The original data needs to be projected through the embedded layer into the desired embedded information. In this module, the embedded layer is represented as And->Representing the number of knowledge points and questions, respectively.
The discrimination of knowledge points and problems during knowledge tracing is related not only to themselves but also to the order of interaction. The original graph is heterogeneous, and more feature information related to the current input sequence can be found among the input problems and knowledge points. Therefore, inspired by the Heterogeneous Graph Transformer (HGT), global sampling is introduced in a multi-hop-sampling-based Basic Feature Extractor (BFE) over small batches of input sequences, and feature data is acquired according to its richness. After multi-hop sampling, the sampled data and the original data are each fed into a heterogeneous graph neural network for information aggregation, where the embeddings of the different knowledge points and problems absorb the features of their adjacent related nodes. The original data uses a mask to prevent the relations to be predicted from being captured in advance. The knowledge state extraction and message-passing update methods under the heterogeneous graph structure (shown in fig. 4) are described below.
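The small-batch multi-hop sampling described here can be sketched as breadth-first expansion with a per-hop fanout cap. The fanout and hop counts are illustrative assumptions:

```python
import random

def multi_hop_sample(adj, seeds, hops=2, fanout=3, seed=0):
    """Sample a multi-hop neighborhood around a mini-batch of seed nodes.

    `adj` maps each node to its neighbor list in the heterogeneous interaction
    graph; at each hop, at most `fanout` neighbors are kept per frontier node.
    """
    rng = random.Random(seed)
    visited = set(seeds)
    frontier = list(seeds)
    for _ in range(hops):
        nxt = []
        for node in frontier:
            neighbors = adj.get(node, [])
            picked = (neighbors if len(neighbors) <= fanout
                      else rng.sample(neighbors, fanout))
            for nb in picked:
                if nb not in visited:
                    visited.add(nb)
                    nxt.append(nb)
        frontier = nxt
    return visited

# Tiny assumed graph: problem q1 links to knowledge point kc1 and learner u1.
adj = {"q1": ["kc1", "u1"], "kc1": ["q2"], "q2": ["kc1"], "u1": ["q1", "q2"]}
sub = multi_hop_sample(adj, ["q1"], hops=2, fanout=2)
```

The sampled subgraph (here all four nodes) is then fed to the heterogeneous GNN for aggregation.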
1.1 Edge relation representation between heterogeneous nodes
The Base Feature Extractor is trained and sampled to obtain relative difficulty information between different problems. Formally, the module takes H^{(l)}[t] to be the representation of node t at layer l of the GNN; during the update from layer l-1 to layer l, the feature representations of the previous layer are aggregated to update the node feature representation. Messages are computed differently for different edge weights. The weight matrices of the different interaction operators are parameterized as a source-node projection, an edge projection, and a target-node projection. Distinct attention vectors are computed for different edges, and a multi-head attention mechanism is used, as shown in the following formula:
For attention head i, ATT-head_i(s, e, t) projects the source node s into the i-th key vector K_i(s) according to the source node's type. Each node type in the graph has a corresponding unique Linear projection, so the differences between node types are preserved to the maximum extent. A mask matrix is introduced for the problem embeddings to prevent edge information from being aggregated into the node representation in advance. The dot-product attention between different nodes of the heterogeneous graph differs from the original Transformer model: links from different relations generally carry different weights for the target node. Different edge types, such as τ(s) and τ(i), may exist between problems and knowledge nodes and between learners and problems. Thus, each edge type φ(e) keeps a unique edge-based matrix to capture the different relation information.
1.2 Knowledge point and problem node representation and update
Message passing is the key operation of GNNs, aimed at passing information from the source node to the target node, while edge meta-relations are added during message passing to mitigate the distribution differences between the various node and edge types. The message passing over the learner interaction data is given by:
Message(s, e, t) = ||_{i∈[1,h]} MSG-head_i(s, e, t) (2)
The information of the i-th message head MSG-head_i(s, e, t) comes from the other nodes linked to the target node, with corresponding weights, where h is the number of heads. The Message(s, e, t) of each node is obtained by merging all message heads. After the attention and the messages have been computed, the new messages must be aggregated at the target node to complete the whole update process. The new node feature is formed by adding the residual of the original feature to the new feature: after multiplying attention and message, a hidden representation H̃^{(l)}[t] of target t is obtained. The new node feature incorporates all attention weights and messages, and the feature information of the target node is updated through a residual connection, specifically:
H^{(l)}[t] = Linear_{τ(t)}(σ(H̃^{(l)}[t])) + H^{(l-1)}[t] (5)
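The attention-weighted aggregation and residual update for a single target node can be sketched with plain lists. A scalar `linear_w` stands in for the type-specific Linear_{τ(t)} projection and ReLU stands in for σ; both are illustrative simplifications:

```python
import math

def attention_weighted_update(h_prev, messages, att_logits, linear_w=1.0):
    """One target-node update: softmax the attention logits over incoming
    edges, aggregate the messages, apply a nonlinearity and a (scalar,
    illustrative) linear map, then add the residual from the previous layer.
    """
    m = max(att_logits)
    exps = [math.exp(a - m) for a in att_logits]
    z = sum(exps)
    att = [e / z for e in exps]                       # attention weights
    dim = len(h_prev)
    h_tilde = [sum(att[i] * messages[i][d] for i in range(len(messages)))
               for d in range(dim)]                   # aggregated message
    relu = [max(0.0, x) for x in h_tilde]             # sigma(.)
    return [linear_w * relu[d] + h_prev[d] for d in range(dim)]  # residual

h_new = attention_weighted_update(
    h_prev=[1.0, -1.0],
    messages=[[2.0, 0.0], [0.0, 2.0]],
    att_logits=[0.0, 0.0])
```

With equal logits the two messages are averaged, and the previous-layer feature is added back through the residual connection.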
Step 2: In general, the knowledge-point structure information provided in a dataset does not fully represent the true hierarchy of knowledge-point variability. In the HGKT model, hierarchical information has proven critical to the knowledge tracing task; however, hierarchical information obtained by clustering does not fully reflect the true hierarchy of the problems. Taking the heterogeneous graph as input, accurate hierarchical features are obtained through heterogeneous graph coarsening. Specifically, as shown in FIG. 5, coarsening information computed for the different node types is propagated over the heterogeneous graph to obtain a rich representation of the hierarchical features. Learners and knowledge points are related not only through node information but also through graph structure information. To exploit both node features and subgraph features, the present invention constructs a hierarchical feature extraction (HFE) module, which uses graph convolution to compute a weight vector over all nodes; the node coarsening process is as follows:
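The graph-convolution weighting idea can be sketched as follows, under the assumption that the HFE scores each node with one convolution pass and pools node features by the normalized scores; the scoring vector `w` is an illustrative parameter, not one from the patent:

```python
import numpy as np

rng = np.random.default_rng(2)

def hfe_coarsen(A, X, w):
    """One coarsening step in the spirit of the HFE module: a graph
    convolution scores each node, the scores are normalized, and the
    scores weight the node features into one hierarchical summary
    vector for the subgraph."""
    deg = A.sum(axis=1, keepdims=True) + 1e-9
    H = (A @ X) / deg                    # mean-over-neighbors convolution
    s = H @ w                            # per-node importance score
    a = np.exp(s) / np.exp(s).sum()      # normalized node weights
    return (a[:, None] * X).sum(axis=0)  # weighted pooled representation

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], float)  # toy adjacency
X = rng.normal(size=(3, 4))                             # 3 nodes, dim 4
z = hfe_coarsen(A, X, rng.normal(size=4))
```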
the model is a process-based transducer model for serialized data. Since the original dot product attention is inconsistent with the short-term memory capabilities of humans, the trace cannot be fully applied to learner capabilities, and HHSKT improves the calculation of attention through a windowing mechanism.
2.1 windowed attention mechanism
Experience with classical knowledge tracing models and with the AKT model's treatment of time effects suggests that repeated short-term stimulation of memory is very helpful for predicting learner responses. However, a learner works through a long series of problems in the learning system, and the serial-position effect indicates that the learner is likely to retain only the beginning and the most recent part of the content. HHSKT therefore improves predictive performance by emphasizing the importance of the first and the most recent positions for a particular problem, as shown in the following formula:
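One plausible reading of the windowing mechanism — keep, for each step, only the sequence start (primacy) and the most recent `window` steps (recency) — can be expressed as a boolean attention mask; the window size is a free hyperparameter here, not a value from the patent:

```python
import numpy as np

def windowed_mask(seq_len, window):
    """Causal attention mask that keeps, for each query step t, only the
    first interaction and the most recent `window` steps, reflecting the
    serial-position effect described in the text."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for t in range(seq_len):
        mask[t, 0] = True                 # primacy: first interaction
        lo = max(0, t - window + 1)
        mask[t, lo:t + 1] = True          # recency: last `window` steps
    return mask

m = windowed_mask(6, 2)
```

Attention logits at masked-out positions would be set to -inf before the softmax, so only the emphasized positions contribute.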
2.2 problem difficulty and differentiation estimation
The difficulty of a problem is an intrinsic property of the problem and must be taken into account. The Rasch model, generally regarded as a 1PL IRT model, has proven effective for problem embedding. The TrueSkill system is a rating system based on Bayesian inference: it assumes that a player's skill can be represented by a normal distribution described by two parameters, the mean μ and the variance σ. Lee's work indicates that the TrueSkill system can correctly estimate problem difficulty features. Learners, problems and knowledge points are treated as different game participants, with a response of 1 marking the learner as the winner and vice versa; the update of the problem and knowledge-point features is then a process of estimating the whole from subgraphs. The TrueSkill scores are continuously updated during training with mini-batches of data, and these parameters are not updated during prediction. Once rating is complete, μ and σ are mapped into a one-dimensional vector through linear projection, and the final difficulty prediction feature is obtained through Softmax.
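A simplified, self-contained version of the TrueSkill win/loss update (no draws, no dynamics term; the patent presumably uses the full system) illustrates how a correct response raises the learner's mean, lowers the problem's, and shrinks both uncertainties:

```python
import math

BETA = 25.0 / 6.0  # skill-class width, the common TrueSkill default

def norm_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def rate_1vs1(winner, loser):
    """Simplified TrueSkill update for one win/loss observation;
    `winner` and `loser` are (mu, sigma) pairs describing the normal
    skill distributions of the two participants."""
    (mu_w, s_w), (mu_l, s_l) = winner, loser
    c = math.sqrt(2 * BETA ** 2 + s_w ** 2 + s_l ** 2)
    t = (mu_w - mu_l) / c
    v = norm_pdf(t) / norm_cdf(t)  # mean-shift factor
    w = v * (v + t)                # variance-shrink factor
    new_w = (mu_w + s_w ** 2 / c * v,
             s_w * math.sqrt(max(1 - s_w ** 2 / c ** 2 * w, 1e-9)))
    new_l = (mu_l - s_l ** 2 / c * v,
             s_l * math.sqrt(max(1 - s_l ** 2 / c ** 2 * w, 1e-9)))
    return new_w, new_l

# A response of 1 makes the learner the "winner" over the problem.
learner, problem = (25.0, 25.0 / 3.0), (25.0, 25.0 / 3.0)
learner, problem = rate_1vs1(learner, problem)
```

The resulting (μ, σ) pairs are what the text then maps through a linear projection and Softmax into difficulty features.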
Step 3: and (5) data processing and index selection. The whole process is divided into data preprocessing, model training and parameter adjustment and model evaluation.
3.1 data Pre-processing
This part includes log processing, data normalization and resampling. First, the logs are the interaction and answer records left by students on the online learning platform; these log records (tagged with behavior labels) are processed according to specified rules to obtain the students' online learning behavior features. Finally, the dataset is divided into a training set, a validation set and a test set at 60%, 20% and 20%, grouped by student number for 5-fold cross-validation.
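The 60/20/20 split, grouped so that no student's records leak across partitions, might look like this; the grouping-by-student detail follows the text's student numbers, and the seed is arbitrary:

```python
import numpy as np

def split_by_student(student_ids, seed=0):
    """Shuffle the unique student numbers and split them 60/20/20 so
    that one student's interactions never straddle the training,
    validation and test sets."""
    ids = np.unique(student_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(ids)
    n = len(ids)
    a, b = int(0.6 * n), int(0.8 * n)
    return set(ids[:a]), set(ids[a:b]), set(ids[b:])

# toy data: 10 students with 3 log records each
train, val, test = split_by_student(np.repeat(np.arange(10), 3))
```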
3.2 training and adjustment of models
Specific details of model training are as follows. First, the preprocessed data are used to train the model under a 5-fold cross-validation strategy, in which the input training set is divided into five subsets train_i, i ∈ [1,5]. In 5-fold cross-validation, each fold uses subset i as the validation set and the remaining four subsets j, j ∈ [1,5] ∧ j ≠ i, as the training set. The trained model is then used to predict on the test set. Model tuning adopts a parameter-by-parameter optimization method, combining coarse tuning via random search with fine tuning via grid search to find the best parameters for the current dataset; the deep parameters are optimized through the back-propagation algorithm to improve the model's predictive performance.
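The fold rotation described above can be sketched directly — index bookkeeping only, with the actual model training omitted:

```python
import numpy as np

def five_fold(indices, seed=0):
    """Rotate five equal folds: fold i is the validation set and the
    other four form the training set, matching the train_i scheme in
    the text."""
    rng = np.random.default_rng(seed)
    idx = np.array(indices)
    rng.shuffle(idx)
    folds = np.array_split(idx, 5)
    for i in range(5):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(5) if j != i])
        yield train, val

splits = list(five_fold(range(100)))
```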
3.3 evaluation of the model
The performance of the trained model was evaluated using Accuracy, Recall, Precision, F1 score and Area Under the Curve (AUC). Among these, AUC is the most important and classical metric and is widely used in knowledge tracing.
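Assuming scikit-learn is available, the five metrics can be computed from predicted correctness probabilities like so; AUC uses the raw probabilities, while the other four use thresholded labels:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 1]                 # observed responses
y_prob = [0.9, 0.2, 0.7, 0.4, 0.6, 0.8]    # predicted correctness probabilities
y_pred = [int(p >= 0.5) for p in y_prob]   # thresholded predictions

metrics = {
    "accuracy":  accuracy_score(y_true, y_pred),
    "recall":    recall_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "f1":        f1_score(y_true, y_pred),
    "auc":       roc_auc_score(y_true, y_prob),  # AUC from probabilities
}
```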
In this embodiment, two types of datasets are selected. The first, smaller dataset was collected from 2009 through 2010. The complete dataset is split into two distinct files, one for all knowledge-point-builder data and one for all non-knowledge-point-builder data; knowledge-point-builder data are also referred to as mastery learning data. The data come from a knowledge-point-construction (mastery learning) question set in which a student is considered to have mastered a class of knowledge points once certain criteria are met (typically, answering 3 questions correctly in succession), after which no further questions on those knowledge points are given. The second, larger dataset consists of all student-system interactions collected over 2 years by the Santa platform; Santa is a multi-platform AI tutoring service with over 780,000 users in Korea, available through Android, iOS and the web. In this embodiment, only the knowledge points, questions, students, time and similar information in the datasets are selected as experimental input data, with the predicted answer outcome as output.
The multi-level knowledge-structure summarization, learner-attention-enhanced knowledge tracing method of the present invention, based on heterogeneous GNNs, is presented below with the following specific steps:
1) Heterogeneous graph feature extraction: a graph structure is constructed from the problems, knowledge points, answering times and correct/incorrect information in the students' learning records; the learning-activity features present in the graph structure are extracted, and a hierarchical knowledge-point representation is then obtained.
(1) Data preprocessing, data segmentation, random oversampling and data standardization are performed on the dataset, which is divided into a training set, a test set and a validation set at 60%, 20% and 20%. In the experiment, learners with fewer than 5 learning records are removed.
(2) Formally, the input is a graph G = (V, E, λ) and a graph network model; the key idea of most graph neural networks (GNNs) is to aggregate feature information from a node's direct (first-order) neighbors. The learners' answering-activity records yield the input graph, and to train on this oversized graph, a subset of students is selected each time to form a subgraph for training. The Base Feature Extractor is pre-trained and sampled to obtain relative difficulty information between different problems, and the Hierarchical Feature Extractor (HFE) is used to obtain hierarchical features of the knowledge points.
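The "select part of the students to form a subgraph" step can be sketched as simple edge filtering over (student, problem) interaction pairs — toy data and an arbitrary seed:

```python
import numpy as np

def sample_student_subgraph(edges, student_ids, k, seed=0):
    """Pick k students at random and keep only the interaction edges
    touching them, forming a mini-batch subgraph for training when the
    full answering-activity graph is too large to train on at once."""
    rng = np.random.default_rng(seed)
    chosen = set(rng.choice(np.asarray(student_ids), size=k,
                            replace=False).tolist())
    sub = [(s, q) for s, q in edges if s in chosen]
    return chosen, sub

# toy (student, problem) interaction edges
edges = [(0, 10), (1, 11), (2, 12), (0, 13), (3, 14)]
chosen, sub = sample_student_subgraph(edges, [0, 1, 2, 3], k=2)
```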
2) Problem and knowledge-point discrimination estimation and sequence attention computation: first, the TrueSkill system, applied in a sliding-window fashion, estimates the relative difficulty and discrimination of the sequence features in the subgraph; attention is then computed over each learner's learning sequence in the subgraph, with the attention scope restricted so that the model focuses more on performance during the most recent learning period; the model is then trained to obtain reasonable parameters.
(1) The TrueSkill system is a rating system based on Bayesian inference. It assumes that a player's skill can be represented by a normal distribution described by two parameters: mean and variance. After rating estimation is completed via parameter estimation over the interaction sequences in the subgraph, the mean and variance are mapped into a one-dimensional vector through linear projection, and the final difficulty prediction feature is obtained through Softmax.
(2) Learner problem records are typically time series. By computing attention, the model assigns different attention weights to different moments, and by limiting the range over which the model's attention operates, the sequence-start position information and the most recent window information are used as attention-computation features.
(3) Model training: the model of the present invention accepts the learner interaction sequence as input and produces the predicted response as output. To evaluate the difference between the model's final predictions and the dataset labels, the negative log-likelihood of the observed sequence is used as the loss function. During model training, cross-validation is adopted: the learners are divided into 5 disjoint groups to verify the generalization ability of the model. Specifically, the preprocessed data are first trained under a 5-fold cross-validation strategy, in which the input training set is divided into five subsets train_i, i ∈ [1,5]. In 5-fold cross-validation, each fold uses subset i as the validation set and trains the classifier on the remaining four subsets j, j ∈ [1,5] ∧ j ≠ i. The trained model is then used to predict on the test set.
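The negative log-likelihood over an observed binary response sequence reduces to the familiar form below — a direct transcription of the stated loss, with clipping added for numerical safety:

```python
import numpy as np

def nll_loss(p, r, eps=1e-9):
    """Negative log-likelihood of the observed response sequence:
    L = -sum_t [ r_t * log(p_t) + (1 - r_t) * log(1 - p_t) ],
    where p_t is the predicted correctness probability and r_t the
    observed response."""
    p = np.clip(np.asarray(p, float), eps, 1 - eps)  # avoid log(0)
    r = np.asarray(r, float)
    return float(-np.sum(r * np.log(p) + (1 - r) * np.log(1 - p)))

good = nll_loss([0.9, 0.1, 0.8], [1, 0, 1])  # well-calibrated predictions
bad = nll_loss([0.1, 0.9, 0.2], [1, 0, 1])   # badly-calibrated predictions
```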
(4) Subsequently, to further evaluate the performance of the proposed model, it is compared with baseline machine learning methods: Item Response Theory (IRT), Performance Factor Analysis (PFA), the regression-based DAS3H, Deep Knowledge Tracing (DKT), SAINT+, the self-attentive knowledge tracing model (SAKT), and the context-aware attentive knowledge tracing model (AKT). All of these methods are trained on the same processed data, with consistent training-parameter settings, and results are averaged over multiple runs. Compared with the baseline machine learning and deep learning methods, the knowledge tracing model of the invention exhibits better predictive performance.
(5) To optimize the model and mine its potential to the greatest extent, model tuning adopts a strategy of random search plus item-by-item optimization: a rough parameter range is first determined through coarse random search, and grid search then fine-tunes within it to find the best parameters for the current dataset; the deep-parameter part optimizes the network parameters through the back-propagation algorithm to improve the model's predictive performance.
(6) Finally, Accuracy, Recall, Precision, F1 score and Area Under the Curve (AUC) are used to evaluate the overall performance of the trained model. Compared with traditional knowledge tracing methods and early deep knowledge tracing methods, the model of the invention produces more accurate results on the same datasets.
In summary, the technical scheme exploits the strong expressive power of graph neural networks and the advantages of the attention mechanism to construct an overall model structure for learner knowledge tracing; the cross-entropy loss function has been widely used in prior knowledge tracing research and proven highly reliable; and the learner's recency effects and memory mechanisms have been shown to help learners absorb knowledge. The invention therefore makes full use of existing research results and, addressing the problems that prior research does not consider hierarchical knowledge-point state changes, that static knowledge-point difficulty insufficiently discriminates knowledge features, and that interpretable parameters are lacking, proposes a multi-level knowledge-structure summarization, learner-attention-enhanced knowledge tracing model based on heterogeneous GNNs. The method uses heterogeneous GNNs to extract the higher-order relations between problems and knowledge points, obtains a rich representation of knowledge-point hierarchy information through heterogeneous graph coarsening, and completes the final prediction through an attention mechanism that combines problem discrimination with windowed sequence memory. It ultimately achieves more accurate predictions on large-scale online education data and thus has practical application prospects.
Claims (10)
1. The learner learning state prediction method based on the heterogeneous graph neural network model is characterized by comprising the following steps of:
s1, acquiring an online learning data set, and cleaning and preprocessing the online learning data set;
s2, constructing a heterogeneous interaction diagram by utilizing a heterogeneous diagram construction method and combining the data obtained in the step S1;
s3, establishing a heterogeneous GNN model, initializing the model and setting super parameters, wherein the heterogeneous GNN model comprises a GNN unit and a window attention unit;
s4, training and testing the heterogeneous GNN model established in the step S3 by utilizing the heterogeneous interaction diagram established in the step S2 to obtain a learning state prediction model;
s5, based on actual online learning data, constructing a corresponding heterogeneous interaction diagram, inputting a learning state prediction model, and outputting a corresponding learning state recognition result.
2. The method for predicting learning states of learners based on a heterogeneous graph neural network model according to claim 1, wherein the preprocessing operation in step S1 includes log processing, data standardization, resampling and data segmentation, and the data segmentation specifically includes dividing an online learning data set into a training set, a verification set and a test set according to a set proportion.
3. The method for predicting learning states of learners based on a heterogeneous graph neural network model according to claim 1, wherein the heterogeneous interaction graph includes the learners, problems and responses, together with the interaction information among them.
4. The method for predicting learning states of learners based on heterogeneous graph neural network models according to claim 1, wherein the GNN unit in the step S3 includes a difficulty estimating module, a basic feature extracting module and a hierarchical feature extracting module, and the difficulty estimating module is used for generating interpretable problem differentiation and problem difficulty;
the basic feature extraction module is used for extracting a higher-order relation between the problem and the knowledge points;
the hierarchical feature extraction module is used for calculating coarsening information of different types of nodes and expanding the coarsening information to the heterogeneous graph so as to obtain the representation of the hierarchical structure features.
5. The method for predicting learning state of learner based on heterogeneous graph neural network model according to claim 4, wherein the window attention unit in step S3 is used for extracting sequence features and predicting learning state of learner according to learning behavior and output data of GNN unit.
6. The method for predicting learning states of learners based on heterogeneous graph neural network model according to claim 4, wherein the specific working process of the difficulty estimation module is as follows:
calculating the difficulty and the distinguishing degree of the problems and the knowledge points generated and related by the online learning behavior through a TrueSkill system, so as to obtain interpretable learning parameters;
aiming at knowledge points and problems faced in online learning activities, the difficulty and the distinguishing degree of the knowledge points and problems are continuously estimated through small-batch sub-sampling, and parameters calculated by TrueSkill are converted into difficulty and distinguishing degree features by using the following formula:
d_τ = Softmax(Linear_{τ(t)}([μ_τ, σ_τ]))
wherein μ_τ and σ_τ respectively represent the parameter weights of the problem or knowledge point τ; the TrueSkill score is continuously updated during training with mini-batches of data, and this parameter is not updated during prediction.
7. The method for predicting learning states of learners based on heterogeneous graph neural network model according to claim 4, wherein the specific working process of the basic feature extraction module comprises:
1) Marginal relationship representation among heterogeneous nodes
The basic feature extraction module is trained and sampled to obtain relative difficulty information between different problems; formally, the module assumes that the node representation of node t at the l-th GNN layer is updated from layer l-1 to layer l by aggregating the feature representations of the previous layer, thereby updating the node feature representation; messages are computed with different information for different edge weights; the weight matrices of the different interaction operators are parameterized as source-node projections, edge projections and target-node projections; different attention vectors are computed for different edges; and a multi-head attention mechanism is used, as in the following formula:
for attention head i, ATT-head_i(s, e, t) projects the source node s into the i-th key vector K_i(s) according to the source-node type; each node type in the graph has a corresponding unique Linear projection, so that the differences between node types are preserved to the maximum extent; a mask matrix is introduced for problem embedding, preventing edge information from being aggregated into the node representation prematurely; the computation of dot-product attention between different nodes of the heterogeneous graph differs from the original Transformer model, links from different relations typically having different weights for the target node; different edge types may exist between problems and knowledge nodes, and between learners and problems; and each edge type φ(e) retains a unique edge-based matrix to capture the different relation information;
2) Knowledge point and problem node representation and update
Messaging is a key operation of GNN to transfer information from a source node to a target node while adding edge meta-relationships during messaging to mitigate the differences in the distribution of different types of nodes and edges, and the messaging of learner interaction data is presented in the equation:
Message(s, e, t) = ||_{i∈[1,h]} Msg-head_i(s, e, t)
ith message header MSGhead i The information of (s, e, t) comes from the rest of the nodes linked by the target node,representing weights, wherein h represents the counting of multiple heads, messages (s, e, t) corresponding to each node are obtained by combining all Message heads, and after the calculation of attention and messages is completed, new messages are needed to be aggregated to a target nodeThe point, completing the whole updating process, the new node characteristic is formed by connecting the residual error of the original characteristic with the new characteristic, and after multiplying the attention and the message, the hidden representation H of the target t is obtained (l) [t]The new node characteristics comprise all attention weights and messages, and the characteristic information of the target node is updated through residual connection, and the method specifically comprises the following steps:
H^(l)[t] = Linear_{τ(t)}(σ(H̃^(l)[t])) + H^(l-1)[t].
8. The method for predicting learning states of learners based on the heterogeneous graph neural network model according to claim 4, wherein the hierarchical feature extraction module uses graph convolution to compute a weight vector that weights all nodes, and the node coarsening process is as follows:
9. The method for predicting learning states of learners based on a heterogeneous graph neural network model according to claim 5, wherein the window attention unit computes the importance of the first and most recent positions of a given problem to improve prediction performance:
wherein β_{t,τ} denotes the importance of the first and last positions of the problem feature.
10. The method for predicting learning states of learners based on heterogeneous graph neural network models according to claim 2, wherein the step S4 is specifically a 5-fold cross-validation training, and performance evaluation is performed on the model obtained by training by using accuracy, recall, precision, F1 score and area under a curve.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310342704.6A CN116361697A (en) | 2023-03-31 | 2023-03-31 | Learner learning state prediction method based on heterogeneous graph neural network model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310342704.6A CN116361697A (en) | 2023-03-31 | 2023-03-31 | Learner learning state prediction method based on heterogeneous graph neural network model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116361697A true CN116361697A (en) | 2023-06-30 |
Family
ID=86923620
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310342704.6A Withdrawn CN116361697A (en) | 2023-03-31 | 2023-03-31 | Learner learning state prediction method based on heterogeneous graph neural network model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116361697A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117152147A (en) * | 2023-10-31 | 2023-12-01 | 杭州德适生物科技有限公司 | Online chromosome collaborative analysis method, system and medium |
CN117152147B (en) * | 2023-10-31 | 2024-02-09 | 杭州德适生物科技有限公司 | Online chromosome collaborative analysis method, system and medium |
CN117556381A (en) * | 2024-01-04 | 2024-02-13 | 华中师范大学 | Knowledge level depth mining method and system for cross-disciplinary subjective test questions |
CN117556381B (en) * | 2024-01-04 | 2024-04-02 | 华中师范大学 | Knowledge level depth mining method and system for cross-disciplinary subjective test questions |
CN117932285A (en) * | 2024-03-25 | 2024-04-26 | 清华大学 | Data importance assessment method and device based on different composition |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kumari et al. | An efficient use of ensemble methods to predict students academic performance | |
Ahmed et al. | Clarify of the random forest algorithm in an educational field | |
CN116361697A (en) | Learner learning state prediction method based on heterogeneous graph neural network model | |
CN112446591A (en) | Evaluation system for student comprehensive capacity evaluation and zero sample evaluation method | |
Fuge et al. | Automatically inferring metrics for design creativity | |
CN113851020A (en) | Self-adaptive learning platform based on knowledge graph | |
Liu | Data Analysis of Educational Evaluation Using K‐Means Clustering Method | |
CN117540104B (en) | Learning group difference evaluation method and system based on graph neural network | |
Wanjau et al. | Improving student enrollment prediction using ensemble classifiers | |
Kour et al. | Analysis of student performance using Machine learning Algorithms | |
Aji et al. | An implementation of C4. 5 classification algorithm to analyze student’s performance | |
Chaudhari et al. | Student performance prediction system using data mining approach | |
Durak et al. | Classification and prediction‐based machine learning algorithms to predict students’ low and high programming performance | |
Ahmed et al. | Predicting and analysis of students’ academic performance using data mining techniques | |
Jayanthi et al. | Research contemplate on educational data mining | |
Huang et al. | Collaborative prediction of examinee performance based on fuzzy cognitive diagnosis via cloud model | |
Zhou | Research on teaching resource recommendation algorithm based on deep learning and cognitive diagnosis | |
Kumari et al. | A study of AdaBoost and bagging approaches on student dataset | |
CN112001536A (en) | High-precision finding method for minimal sample of mathematical capability point defect of primary and secondary schools based on machine learning | |
Therón et al. | Visual sensitivity analysis for artificial neural networks | |
Gao et al. | Classification decision tree algorithm in predicting students’ course preference | |
Mahboob et al. | A comparative study of engineering students pedagogical progress | |
Chou et al. | A Code Distance Approach to Measure Originality in Computer Programming. | |
Triayudi et al. | New Framework of Educational Data Mining to Predict Student Learning Performance | |
Villa-Torrano et al. | Early prediction of students' efficiency during online assessments using a Long-Short Term Memory architecture. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication |
Application publication date: 20230630 |