CN117992924A

CN117992924A - HyperMixer-based knowledge tracking method

Info

Publication number: CN117992924A
Application number: CN202410389529.0A
Authority: CN
Inventors: 王俊; 陈恳; 李子杰; 王明杰; 夏跃龙; 邹伟; 周菊香; 甘健侯
Original assignee: Yunnan Normal University
Current assignee: Yunnan Normal University
Priority date: 2024-04-02
Filing date: 2024-04-02
Publication date: 2024-05-07
Anticipated expiration: 2044-04-02
Also published as: CN117992924B

Abstract

The invention relates to the technical field of deep learning, in particular to a HyperMixer-based knowledge tracking method. The feature-mixing layers unique to the HyperMixer architecture perform well across different tasks and can mix feature information over the whole sequence, so the model can capture long-range dependencies while preserving fine-grained processing of local features. The method addresses the problem of how to enable a model to capture long-range dependencies while preserving fine-grained processing of local features.

Description

HyperMixer-based knowledge tracking method

Technical Field

The invention relates to the technical field of deep learning, in particular to a HyperMixer-based knowledge tracking method.

Background

With the continuous development of artificial intelligence technology, intelligent education has become a topic of interest to researchers. The rapid rise of deep learning brings new opportunities for applying technology in education, and using various neural networks in intelligent educational algorithms has become a popular research direction.

Knowledge tracking (also known as knowledge tracing) is an important part of intelligent education research. It aims to predict a student's future performance from the student's historical interaction records, which are usually the student's past answers to problems. With this technology, teachers and learners using a learning platform can grasp the learner's current knowledge state and use it to diagnose which knowledge points the student has not yet mastered sufficiently, so that subsequent teaching strategies or learning paths can be adjusted proactively.

From the perspective of sequence modeling, existing knowledge tracking methods can be broadly divided into recurrent neural network (RNN)-based methods and Transformer-based methods. These methods follow the same setting as other sequential tasks: they encode the historical information with an encoder and then use the hidden state at the corresponding time step for prediction.

However, when the encoder processes the historical information, important details or context may be lost, so the encoder easily loses local feature details in the sequence data. Moreover, when the hidden state at the corresponding time step is used for prediction, the model may rely too heavily on recent history and fail to capture long-term information, so long-range dependencies in the sequence data cannot be captured.

The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.

Disclosure of Invention

The main object of the invention is to provide a HyperMixer-based knowledge tracking method that solves the problem of how to enable a model to capture long-range dependencies while preserving fine-grained processing of local features.

In order to achieve the above object, the present invention provides a HyperMixer-based knowledge tracking method, which includes:

acquiring an interaction sequence corresponding to a learner at a target moment;

determining exercise labels, knowledge point labels and answer situation labels in the interaction sequence;

embedding the exercise label, the knowledge point label and the answer condition label;

capturing full-length sequence features of the embedded interaction sequence with a HyperMixer-based global feature mixer;

capturing interaction features within a preset historical time period of the embedded interaction sequence with a HyperMixer-based local feature mixer; and

fusing the full-length sequence features with the interaction features to obtain a feature mixer result, and predicting the learner's exercise result, knowledge point result and answer situation result at the next moment according to the feature mixer result.

Optionally, the step of embedding the exercise labels, the knowledge point labels and the answer situation labels includes:

embedding each exercise label in the interaction sequence to obtain an exercise embedding matrix with one row for each of the learner's exercises (the total number of exercises) and d columns, where d denotes the embedding dimension;

embedding each knowledge point label in the interaction sequence to obtain a knowledge point embedding matrix with one row for each of the learner's knowledge points (the total number of knowledge points) and d columns, where d denotes the embedding dimension;

embedding the answer situation to obtain an answer embedding matrix.

Optionally, the HyperMixer-based global feature mixer includes a global sequence mixer and a global channel mixer, and the full-length sequence features of the embedded interaction sequence are captured by the HyperMixer-based global feature mixer according to the following expression:

where,

the input consists of all features of the embedded interaction sequence, the two weight matrices are those of the Sequence Mixer and the Channel Mixer respectively, the two bias terms are bias vectors, GELU is a nonlinear activation function, and the intermediate term represents the inputs of the global sequence mixer and the global channel mixer.

Optionally, the two weight matrices are generated by parameterized functions, each of which is another MLP that generates the weight matrix from each token independently:

where the position-encoding vector adds positional information to each token, and MLP denotes a multilayer perceptron.

Optionally, the HyperMixer-based local feature mixer includes a local sequence mixer and a local channel mixer, and the step of capturing the interaction features within the preset historical time period of the embedded interaction sequence with the HyperMixer-based local feature mixer includes:

where the output denotes the interaction features of the interaction sequence within the preset historical time period, and

the two weight matrices are those of the local sequence mixer and the local Channel Mixer respectively, the two bias terms are bias vectors, and GELU is a nonlinear activation function.

Optionally, the step of fusing the full-length sequence features and the interaction features to obtain a feature mixer result, and predicting the learner's exercise result, knowledge point result and answer situation result at the next moment according to the feature mixer result, includes:

concatenating the full-length sequence features and the interaction features to obtain a concatenated feature:

taking the concatenated feature as the sequence feature at the current moment, and predicting the exercise result, the knowledge point result and the answer situation result at the next moment;

where the sigmoid function is applied to the output of a fully connected layer whose parameters reduce the dimensionality through the neural network while mapping the output into the range between 0 and 1, and the three label terms are the exercise label, the knowledge point label and the answer situation label at time t+1.
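The fusion and prediction step can be sketched as follows. This NumPy version is an illustration under stated assumptions: the concatenated feature is passed through three hypothetical fully connected heads (biases omitted), a sigmoid maps each output into the range between 0 and 1, and, in the style of DKT, the probability for the exercise and knowledge point at t+1 is read by indexing the head output with their labels. The head names, sizes and this readout are assumptions, not taken from the description.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict_next(h_global, h_local, heads, exercise_next, concept_next):
    """Concatenate the two feature vectors and score the interaction at t + 1 (sketch)."""
    h = np.concatenate([h_global, h_local])                  # concatenated feature
    p_exercise = sigmoid(h @ heads["exercise"])[exercise_next]
    p_concept = sigmoid(h @ heads["concept"])[concept_next]
    p_answer = sigmoid(h @ heads["answer"]).item()           # single output unit
    return float(p_exercise), float(p_concept), p_answer


rng = np.random.default_rng(2)
d, num_exercises, num_concepts = 8, 10, 4                    # illustrative sizes (assumed)
heads = {                                                    # hypothetical head weights
    "exercise": rng.normal(0.0, 0.1, (2 * d, num_exercises)),
    "concept": rng.normal(0.0, 0.1, (2 * d, num_concepts)),
    "answer": rng.normal(0.0, 0.1, (2 * d, 1)),
}
probs = predict_next(rng.normal(size=d), rng.normal(size=d), heads,
                     exercise_next=9, concept_next=2)
print(probs)
```

Indexing one output unit per label keeps each head a single matrix multiply; a real model would learn the head weights and biases jointly with the mixers.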

Optionally, after the step of fusing the full-length sequence features with the interaction features to obtain a feature mixer result and predicting the learner's exercise result, knowledge point result and answer situation result at the next moment according to the feature mixer result, the method further includes:

calculating a score for each interaction sequence of the learner at each time step based on a binary cross-entropy loss function, where the loss of a single sequence is expressed as:

where the target term is the student's true label for each interaction.

In addition, to achieve the above object, the present invention further provides a HyperMixer-based knowledge tracking system, which includes a memory, a processor, and a HyperMixer-based knowledge tracking program stored in the memory and executable on the processor; when executed by the processor, the HyperMixer-based knowledge tracking program implements the steps of the HyperMixer-based knowledge tracking method described in any one of the above.

In addition, to achieve the above object, the present invention further provides a computer-readable storage medium having stored thereon a knowledge tracking program based on HyperMixer, which when executed by a processor, implements the steps of the knowledge tracking method based on HyperMixer as set forth in any one of the above.

The invention provides a HyperMixer-based knowledge tracking method, a HyperMixer-based knowledge tracking device and a computer-readable storage medium. The feature-mixing layers unique to the HyperMixer architecture perform well across different tasks and can mix feature information globally, so the model can capture long-range dependencies while preserving fine-grained processing of local features.

Drawings

FIG. 1 is a schematic architecture diagram of a hardware operating environment of HyperMixer-based knowledge tracking system in accordance with an embodiment of the invention;

FIG. 2 is a flowchart of a first embodiment of a HyperMixer-based knowledge tracking method according to the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

In order to better understand the above technical solution, exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

As an implementation manner, fig. 1 is a schematic architecture diagram of a hardware running environment of a HyperMixer-based knowledge tracking system according to an embodiment of the present invention.

As shown in fig. 1, the HyperMixer-based knowledge tracking system may include: a processor 1001, such as a CPU, memory 1005, user interface 1003, network interface 1004, communication bus 1002. Wherein the communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include a Display, an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may further include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 1005 may be a high-speed RAM memory or a stable memory (non-volatile memory), such as a disk memory. The memory 1005 may also optionally be a storage device separate from the processor 1001 described above.

Those skilled in the art will appreciate that the HyperMixer-based knowledge tracking system architecture shown in fig. 1 does not constitute a limitation of the HyperMixer-based knowledge tracking system, and may include more or fewer components than shown, or may combine certain components, or may be a different arrangement of components.

As shown in fig. 1, the memory 1005, as a storage medium, may include an operating system, a network communication module, a user interface module and a HyperMixer-based knowledge tracking program. The operating system is a program that manages and controls the hardware and software resources of the HyperMixer-based knowledge tracking system and supports the running of the HyperMixer-based knowledge tracking program and other software or programs.

In the HyperMixer-based knowledge tracking system shown in fig. 1, the user interface 1003 is mainly used for connecting a terminal, and is in data communication with the terminal; the network interface 1004 is mainly used for a background server and is in data communication with the background server; the processor 1001 may be configured to invoke HyperMixer-based knowledge tracking programs stored in the memory 1005.

In this embodiment, the HyperMixer-based knowledge tracking system includes: a memory 1005, a processor 1001, and a HyperMixer-based knowledge tracking program stored in the memory and executable on the processor, wherein:

when the processor 1001 calls the HyperMixer-based knowledge tracking program stored in the memory 1005, the following operations are performed:

embedding the answer situation to obtain an answer embedding matrix.

Based on the above hardware architecture of the deep-learning-based HyperMixer knowledge tracking system, an embodiment of the HyperMixer-based knowledge tracking method is provided.

Referring to fig. 2, in a first embodiment, the HyperMixer-based knowledge tracking method includes the steps of:

Step S10, obtaining an interaction sequence corresponding to a learner at a target moment;

Step S20, determining exercise labels, knowledge point labels and response condition labels in the interaction sequence;

In this embodiment, a historical time step in the learner's interaction sequence is selected as the target time t, and the exercise label, knowledge point label and answer situation label of the interaction sequence at target time t are determined.

The interaction sequence records the learner's interactions during learning. In this embodiment, each interaction mainly consists of an exercise label, a knowledge point label and an answer situation label. Analyzing the interaction sequence to determine these labels for a specific time period allows the model to identify the learner's learning performance.

The exercise labels reflect the learner's performance on different exercise tasks, including but not limited to the difficulty, completion and time spent on each exercise.

The knowledge point labels identify the knowledge points involved in the learner's learning process. They help the model learn how well the learner has mastered each knowledge point and how knowledge is associated and transferred between different knowledge points.

The answer situation labels characterize the learner's performance when answering, including but not limited to answer accuracy, error type and answer time. By analyzing them, the model can learn about the learner's error-correction ability, reasoning ability and self-reflection when solving problems.
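For illustration, one learner's interaction sequence can be held as a list of records, each carrying the three labels described above. The sketch below is a minimal assumption: the hypothetical Interaction record, its field names and the integer-ID encoding are not from the description, and only a 0/1 correctness label stands in for the answer situation (richer signals such as error type or answer time are omitted). It also shows how the history up to the target time t is separated from the interaction at t+1 that the model predicts.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Interaction:
    exercise_id: int  # exercise label, encoded as an integer index (assumed)
    concept_id: int   # knowledge point label, encoded as an integer index (assumed)
    correct: int      # answer situation label: 1 = correct, 0 = wrong (simplified)


def split_at(seq: List[Interaction], t: int) -> Tuple[List[Interaction], Interaction]:
    """Return the history up to and including time t, and the interaction at t + 1."""
    if not 0 <= t < len(seq) - 1:
        raise ValueError("t must leave at least one later interaction to predict")
    return seq[: t + 1], seq[t + 1]


seq = [
    Interaction(3, 0, 1),
    Interaction(7, 2, 0),
    Interaction(3, 0, 1),
    Interaction(9, 2, 1),
]
history, target = split_at(seq, t=2)
print(len(history), target)
```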

Step S30, embedding the exercise label, the knowledge point label and the answer condition label;

In this embodiment, the exercise labels, knowledge point labels and answer situation labels in the sequence are discrete. Embedding encodes these high-dimensional discrete symbols or categories into low-dimensional continuous vector representations, which makes them easier for the model to learn.

Optionally, to embed the exercise labels, each exercise label in the interaction sequence is embedded to obtain an exercise embedding matrix with one row for each of the learner's exercises (the total number of exercises) and d columns, where d denotes the embedding dimension.

Optionally, to embed the knowledge point labels, each knowledge point label in the interaction sequence is embedded to obtain a knowledge point embedding matrix with one row for each of the learner's knowledge points (the total number of knowledge points) and d columns, where d denotes the embedding dimension;

Optionally, to embed the answer situations: since an answer has only two cases, "correct" or "wrong", the answer embedding matrix has only two rows, one per case.
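The embedding step amounts to a table lookup. The NumPy sketch below assumes randomly initialized tables (in training they would be learned), illustrative sizes, and concatenation of the three embeddings at each time step; the description does not state how the three embedded sequences are combined, so summation or another scheme is equally possible.

```python
import numpy as np

rng = np.random.default_rng(0)
num_exercises, num_concepts, d = 10, 4, 8  # illustrative sizes (assumed)

# Embedding tables: one row per exercise, per knowledge point, and per answer case.
E_exercise = rng.normal(size=(num_exercises, d))
E_concept = rng.normal(size=(num_concepts, d))
E_answer = rng.normal(size=(2, d))  # row 0 = wrong, row 1 = correct


def embed(exercise_ids, concept_ids, answers):
    """Look up the three embeddings per time step and concatenate them (assumed fusion)."""
    e = E_exercise[np.asarray(exercise_ids)]
    c = E_concept[np.asarray(concept_ids)]
    a = E_answer[np.asarray(answers)]
    return np.concatenate([e, c, a], axis=-1)  # shape (T, 3 * d)


X = embed([3, 7, 3, 9], [0, 2, 0, 2], [1, 0, 1, 1])
print(X.shape)  # (4, 24)
```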

Step S40, capturing full-length sequence features of the embedded interaction sequence with the HyperMixer-based global feature mixer, and capturing interaction features within a preset historical time period of the embedded interaction sequence with the HyperMixer-based local feature mixer;

In this embodiment, the HyperMixer model includes a global feature mixer and a local feature mixer. The global feature mixer captures the full-length sequence features of the embedded interaction sequence, and the local feature mixer captures the interaction features within the preset historical time period of the embedded interaction sequence. The feature-mixing layers unique to the HyperMixer architecture perform well across different tasks and mix feature information globally, so the model can capture long-range dependencies while preserving fine-grained processing of local features.

Optionally, the global feature mixer includes a global sequence mixer and a global channel mixer, and the full-length sequence features are expressed as:

where,

the input consists of all features of the embedded interaction sequence, the two weight matrices are those of the Sequence Mixer and the Channel Mixer respectively, the two bias terms are bias vectors, and GELU is a nonlinear activation function.

Further, optionally, the two weight matrices are generated by parameterized functions, each of which is another MLP that generates the weight matrix from each token independently:

Illustratively, a maximum sequence length is set, and the input of the global feature mixer is:

where the t-th element is the interaction occurring at time t, which comprises the embedded sequence formed by the exercise labels, the embedded sequence formed by the knowledge point labels and the embedded sequence formed by the answer situation labels at time t.
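Because the formulas themselves are not reproduced here, the following NumPy sketch follows the publicly described HyperMixer token-mixing scheme rather than the patent's exact equations: two hypernetwork MLPs generate one row of each token-mixing weight matrix from every token plus its position encoding, token mixing computes W2 @ GELU(W1.T @ X) across positions, and a per-token channel MLP follows. Residual connections, sinusoidal position encodings, the tanh approximation of GELU, the layer sizes and random initialization are all assumptions; layer normalization is omitted for brevity.

```python
import numpy as np


def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def sinusoidal_positions(n, d):
    # standard sinusoidal position encodings, shape (n, d)
    pos = np.arange(n)[:, None]
    i = np.arange(d)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))


def mlp(x, W_a, b_a, W_b, b_b):
    # two-layer perceptron applied to every row of x
    return gelu(x @ W_a + b_a) @ W_b + b_b


def init_mlp(rng, d_in, d_hidden, d_out, scale=0.1):
    return [rng.normal(0.0, scale, (d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(0.0, scale, (d_hidden, d_out)), np.zeros(d_out)]


class HyperMixerBlock:
    """One token-mixing + channel-mixing layer in the style of HyperMixer (sketch)."""

    def __init__(self, d, token_hidden, channel_hidden, rng):
        self.h1 = init_mlp(rng, d, d, token_hidden)   # hypernetwork generating W1
        self.h2 = init_mlp(rng, d, d, token_hidden)   # hypernetwork generating W2
        self.channel = init_mlp(rng, d, channel_hidden, d)

    def __call__(self, X):
        n, d = X.shape
        Xp = X + sinusoidal_positions(n, d)
        W1 = mlp(Xp, *self.h1)  # (n, token_hidden): one row generated per token
        W2 = mlp(Xp, *self.h2)  # (n, token_hidden)
        # Sequence (token) mixing: every output position sees every input position.
        X = X + W2 @ gelu(W1.T @ X)
        # Channel mixing: the same MLP applied independently to each position.
        return X + mlp(X, *self.channel)


rng = np.random.default_rng(0)
block = HyperMixerBlock(d=8, token_hidden=16, channel_hidden=32, rng=rng)
print(block(rng.normal(size=(5, 8))).shape)   # (5, 8)
print(block(rng.normal(size=(12, 8))).shape)  # (12, 8): no fixed sequence length
```

Because the mixing weights are generated per token, the same parameters handle any sequence length, which is what distinguishes HyperMixer from a fixed-size MLP-Mixer. This sketch mixes all positions; in knowledge tracking a prediction for t+1 should only see interactions up to t, so a practical version would apply the mixer to the prefix ending at t or otherwise prevent later steps from leaking in.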

Optionally, the HyperMixer-based local feature mixer includes a local sequence mixer and a local channel mixer, and the interaction features within the preset historical time period are captured as follows:

Further, optionally, the two weight matrices of the local mixer are generated by parameterized functions, each of which is another MLP that generates the weight matrix from each token independently:

Illustratively, the local feature mixer is set to capture the features of a preset number of most recent moments before the current moment, and the input of the local mixer is:
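The local branch differs mainly in its input: only the most recent interactions before the current moment are mixed. The sketch below assumes a window of k steps ending at time t, a single-layer hypernetwork whose generated matrix is shared by both token-mixing products (a simplification; the description mentions separate weight matrices), no position encoding, and that the row for time t is read out as the local interaction feature. All of these choices are hypothetical.

```python
import numpy as np


def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def local_window(X, t, k):
    """Embedded features of the last k time steps up to and including t (fewer near the start)."""
    start = max(0, t - k + 1)
    return X[start : t + 1]


def local_token_mix(Xw, Wh, bh):
    """Token mixing inside the window with per-token generated weights (simplified, tied)."""
    W = gelu(Xw @ Wh + bh)             # (window_len, hidden): one row per token
    return Xw + W @ gelu(W.T @ Xw)     # residual + mixing across the window only


rng = np.random.default_rng(1)
X = rng.normal(size=(10, 8))           # embedded sequence, T = 10, feature size 8
Wh, bh = rng.normal(0.0, 0.1, (8, 4)), np.zeros(4)

window = local_window(X, t=6, k=3)     # rows for times 4, 5, 6
local_feature = local_token_mix(window, Wh, bh)[-1]  # feature read out at time t
print(window.shape, local_feature.shape)  # (3, 8) (8,)
```

Restricting mixing to the window means that steps after t can never influence the local feature, which is the behaviour a next-step predictor needs.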

Step S50, fusing the full-length sequence features and the interaction features to obtain a feature mixer result, and predicting the learner's exercise result, knowledge point result and answer situation result at the next moment according to the feature mixer result.

In this embodiment, the full-length sequence features and the interaction features are fused into a feature mixer result that captures long-range dependencies while retaining local feature details, and the learner's exercise result, knowledge point result and answer situation result at the next moment are predicted according to the feature mixer result.

Optionally, the full-length sequence features and the interaction features are concatenated to obtain a concatenated feature:

Furthermore, to train the model, a score for each interaction sequence of the learner at each time step is calculated based on a binary cross-entropy loss function, where the loss of a single sequence is expressed as:

where the target term is the student's true label for each interaction. The model is optimized with the Adam optimizer, and the training objective is to minimize the loss.
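The per-sequence loss can be illustrated with a plain-Python binary cross-entropy. Summing over time steps (rather than averaging) and clipping probabilities away from 0 and 1 for numerical safety are assumptions, since the exact formula is not reproduced here; the Adam optimization step is not shown.

```python
import math
from typing import Sequence


def sequence_bce(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Binary cross-entropy summed over the time steps of one learner's sequence."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


print(sequence_bce([0.9, 0.2, 0.7], [1, 0, 1]))
```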

In the technical solution provided by this embodiment, the feature-mixing layers unique to the HyperMixer architecture perform well across different tasks and mix feature information globally, so the model can capture long-range dependencies while preserving fine-grained processing of local features.

In addition, to verify the effect of the present invention, its performance was compared with baseline models such as DKT, SAKT and SAINT in two modes, Question Level and KC Level, under five-fold cross-validation. All experiments were run on the same platform: computation was provided by an Intel Xeon E-2288G eight-core processor, and GPU computation by an NVIDIA A100 PCIe (80 GB HBM2e) for the high-load deep learning tasks. An early-stopping strategy was used during training; the number of training epochs, the other training hyperparameters and the exercise embedding dimension d were set to preset values, and dropout was set to 0.2 to prevent overfitting.
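Five-fold cross-validation can be organized as below. Splitting by learner (so that no learner's interactions appear in both the training and test folds), the fixed shuffling seed and the function name k_fold_by_learner are assumptions; the description only states that five-fold cross-validation was used.

```python
import random
from typing import Iterable, Iterator, List, Tuple


def k_fold_by_learner(learner_ids: Iterable[int], k: int = 5,
                      seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_ids, test_ids) pairs; each learner lands in exactly one test fold."""
    ids = list(learner_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]  # near-equal fold sizes
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


splits = list(k_fold_by_learner(range(23)))
print([len(test) for _, test in splits])  # [5, 5, 5, 4, 4]
```

Metrics would then typically be averaged over the five test folds.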

Table 1 shows the performance of the method and the baseline models at Question Level (All-in-One) and KC Level (All-in-One) on the AS2009 and AL2005 datasets. The method outperforms the other baseline models at both Question Level and KC Level on both datasets. At Question Level, the accuracy of the method is 1.05% higher than DKT and 4% higher than SAKT on the AS2009 dataset, and 1.17% higher than DKT on the AL2005 dataset. At KC Level, the method also improves: the accuracy is 1.62% higher than DKT, 4.96% higher than SAKT, 0.47% higher than DKT and 5.11% higher than SAKT on the AS2009 dataset. The results show that even without a complex sequence model structure, a pure MLP architecture can effectively capture students' knowledge states and learning progress.

Furthermore, it will be appreciated by those of ordinary skill in the art that implementing all or part of the processes in the methods of the above embodiments may be accomplished by computer programs to instruct related hardware. The computer program comprises program instructions, and the computer program may be stored in a storage medium, which is a computer readable storage medium. The program instructions are executed by at least one processor in the HyperMixer-based knowledge tracking system to implement the flow steps of the embodiments of the method described above.

Accordingly, the present invention also provides a computer-readable storage medium storing a HyperMixer-based knowledge tracking program which, when executed by a processor, implements the steps of the HyperMixer-based knowledge tracking method described in the above embodiment.

The computer-readable storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disk, or any other medium that can store program code.

It should be noted that, because the storage medium provided in the embodiments of the present application is a storage medium used for implementing the method in the embodiments of the present application, based on the method described in the embodiments of the present application, a person skilled in the art can understand the specific structure and the modification of the storage medium, and therefore, the description thereof is omitted herein. All storage media adopted by the method of the embodiment of the application belong to the scope of protection of the application.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A HyperMixer-based knowledge tracking method, the method comprising:

2. The method of claim 1, wherein the step of embedding the exercise label, the knowledge point label and the answer situation label comprises:

embedding each exercise label in the interaction sequence to obtain an exercise embedding matrix with one row for each of the learner's exercises (the total number of exercises) and d columns, where d denotes the embedding dimension;

embedding each knowledge point label in the interaction sequence to obtain a knowledge point embedding matrix with one row for each of the learner's knowledge points (the total number of knowledge points) and d columns, where d denotes the embedding dimension;

embedding the answer situation to obtain an answer embedding matrix.

3. The method of claim 1, wherein the HyperMixer-based global feature mixer comprises a global sequence mixer and a global channel mixer, and the expression for capturing the full-length sequence features of the embedded interaction sequence with the HyperMixer-based global feature mixer is as follows:

where,

the input consists of all features of the embedded interaction sequence, the two weight matrices are those of the Sequence Mixer and the Channel Mixer respectively, the two bias terms are bias vectors, GELU is a nonlinear activation function, and the intermediate term represents the inputs of the sequence mixer and the global channel mixer.

4. The method of claim 3, wherein,

the two weight matrices are generated by parameterized functions, each of which is another MLP that generates the weight matrix from each token independently:

where the position-encoding vector adds positional information to each token, and MLP denotes a multilayer perceptron.

5. The method of claim 1, wherein the HyperMixer-based local feature mixer comprises a local sequence mixer and a local channel mixer, and the step of capturing the interaction features within the preset historical time period of the embedded interaction sequence with the HyperMixer-based local feature mixer comprises:

where the two weight matrices are those of the local sequence mixer and the local Channel Mixer respectively, the two bias terms are bias vectors, and GELU is a nonlinear activation function.

6. The method of claim 5, wherein,


7. The method of claim 1, wherein the step of fusing the full-length features of the sequence with the interactive features to obtain feature mixer results, and predicting the learner's exercise results, knowledge point results, and response situation results at a next time based on the feature mixer results comprises:

taking the concatenated feature as the sequence feature at the current moment, and predicting the exercise result, the knowledge point result and the answer situation result at the next moment;

where the sigmoid function is applied to the output of a fully connected layer whose parameters reduce the dimensionality through the neural network while mapping the output into the range between 0 and 1, and the three label terms are the exercise label, the knowledge point label and the answer situation label at time t+1.

8. The method of claim 1, wherein after the step of fusing the full-length sequence features with the interaction features to obtain a feature mixer result and predicting the learner's exercise result, knowledge point result and answer situation result at the next moment according to the feature mixer result, the method further comprises:

9. A HyperMixer-based knowledge tracking system, the HyperMixer-based knowledge tracking system comprising: a memory, a processor, and a HyperMixer-based knowledge tracking program stored on the memory and executable on the processor, the HyperMixer-based knowledge tracking program, when executed by the processor, implementing the steps of HyperMixer-based knowledge tracking method as claimed in any one of claims 1 to 8.

10. A computer readable storage medium, characterized in that it has stored thereon a HyperMixer based knowledge tracking program, which when executed by a processor implements the steps of HyperMixer based knowledge tracking method according to any of claims 1 to 8.