CN117591870A - Deep reinforcement learning-based emotion perception intelligent teaching method and system - Google Patents

Deep reinforcement learning-based emotion perception intelligent teaching method and system Download PDF

Info

Publication number
CN117591870A
CN117591870A CN202311327454.5A CN202311327454A CN117591870A CN 117591870 A CN117591870 A CN 117591870A CN 202311327454 A CN202311327454 A CN 202311327454A CN 117591870 A CN117591870 A CN 117591870A
Authority
CN
China
Prior art keywords
data
emotion
learning
teaching
deep
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311327454.5A
Other languages
Chinese (zh)
Inventor
李志勇
谭昕
李亮
李珩
许蕤
刘明国
余灿灿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Vocational And Technical University
Original Assignee
Shenzhen Vocational And Technical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Vocational And Technical University filed Critical Shenzhen Vocational And Technical University
Priority to CN202311327454.5A
Publication of CN117591870A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/10Pre-processing; Data cleansing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • G06Q50/205Education administration or guidance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Educational Technology (AREA)
  • Educational Administration (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses an emotion perception intelligent teaching method and system based on deep reinforcement learning. The method first collects multi-modal student data, including text, audio, and video data generated during teaching, and preprocesses it with a deep learning model. The preprocessed data is then input into a deep emotion network to obtain the student's emotional state. Finally, according to the student's learning behavior features and emotional state, a deep reinforcement learning algorithm dynamically adjusts the teaching content and mode, realizing personalized, emotion-aware teaching. The update process of the deep reinforcement learning takes the student's learning behavior features, emotional state, and teaching feedback into account, so that the teaching effect is continuously optimized. The method and system can be widely applied to online education, distance teaching, personalized education, and related fields.

Description

Deep reinforcement learning-based emotion perception intelligent teaching method and system
Technical Field
The invention relates to the technical field of intelligent education, in particular to an emotion perception intelligent teaching method and system based on deep reinforcement learning.
Background
With the development of modern educational technology, personalized and intelligent teaching methods have become a hotspot of research and application. Among these methods, deep learning and reinforcement learning algorithms are widely applied to the student learning process, enabling deep understanding of, and effective intervention in, students' learning behaviors and learning outcomes. However, traditional methods mainly focus on making learning and teaching decisions from students' behavioral data and learning result data, and neglect the important influence of students' emotional states on their learning effectiveness.
The emotional states of students, such as pleasure, boredom, and tension, have an important influence on their learning effectiveness. A student's emotional state affects their attention, their absorption and understanding of knowledge, and their engagement with learning tasks. Therefore, accurately identifying and understanding students' emotional states, and taking those states into account in teaching decisions, is of great significance for improving the teaching effect.
Therefore, it is necessary to provide a new emotion perception intelligent teaching method and system that can effectively use deep learning and reinforcement learning techniques to accurately identify and understand students' emotional states and take them into account in teaching decisions, so as to improve the teaching effect.
Disclosure of Invention
The application provides an emotion perception intelligent teaching method and system based on deep reinforcement learning so as to improve teaching effects.
The emotion perception intelligent teaching method based on deep reinforcement learning provided by the application comprises the following steps:
collecting multi-modal data of students, including text data, audio data, and video data generated during teaching;
preprocessing the multi-modal data by using a deep learning model to obtain preprocessed data, wherein the preprocessed data comprises learning behavior characteristics and characteristics related to emotion;
taking the preprocessed data as input, and obtaining the emotion state of the student by using a deep emotion network, wherein the expression of the deep emotion network is as follows: e=f (X; phi), wherein E represents the emotion state of the student, X represents the preprocessed data, phi represents the parameters of the deep emotion network, and f represents the operation process of the deep emotion network and is output as the emotion state of the student;
according to the learning behavior features and emotional states of the students, dynamically adjusting the teaching content and mode by using a deep reinforcement learning algorithm, wherein the update process of the deep reinforcement learning can be described by the following formula:
θ_{t+1} = θ_t + α · δ_t · ∇_θ ln π(A_t | S_t, θ_t)
where θ_t is the policy network parameter at time t, α is the learning rate, δ_t is the TD error, A_t is the action taken in state S_t, π(A_t|S_t, θ_t) is the probability of taking action A_t in state S_t, and the output is the dynamically adjusted teaching content and mode.
Still further, the method further comprises:
and adjusting and optimizing the parameters of the deep reinforcement learning according to the learning result and feedback information of the students so as to improve the future teaching effect.
Further, the preprocessing the multi-modal data using the deep learning model to obtain preprocessed data includes:
converting the text data into a numerical representation reflecting its semantic information by using a pre-trained Transformer model, to obtain text features;
processing the audio data with a convolutional neural network or a recurrent neural network, converting the audio data into a spectrogram and extracting audio features from the spectrogram;
processing the video data with a 3D convolutional neural network or a temporal convolutional network to obtain video features, including information on the students' facial expressions and body language;
integrating the text features, audio features, and video features into a unified multi-modal representation using a specific feature fusion strategy, and taking the integrated multi-modal representation as the preprocessed data, wherein the preprocessed data comprises learning behavior features and emotion-related features.
Further, the obtaining the emotion state of the student by using the deep emotion network with the preprocessed data as input includes:
performing modality-specific convolution and pooling operations on the preprocessed data to obtain the key information in the text data, the key information in the audio data, and the key information in the video data;
performing feature fusion on the key information in the text data, audio data, and video data to obtain fused features, wherein the feature fusion operation can be carried out by simply concatenating the features, or by using a weighted average or an attention mechanism to assign different weights to the features of different modalities;
performing a fully connected operation on the fused features to obtain fully connected data, wherein the fully connected operation comprises a linear transformation through weights and biases and a nonlinear transformation through a nonlinear activation function, so as to perform further feature extraction and combination;
processing the fully connected data with a long short-term memory (LSTM) neural network to obtain LSTM-processed data;
and outputting, from the LSTM-processed data, the emotional state of the student through a linear transformation and an activation function operation.
Further, the performing of modality-specific convolution and pooling on the preprocessed data to obtain the key information in the text data, the audio data, and the video data includes:
for text data, converting the text into a fixed-length vector through an embedding operation, and extracting the key information through one-dimensional convolution and pooling operations;
for audio data, extracting the key information through two-dimensional convolution and pooling operations;
for video data, extracting the key information through three-dimensional convolution and pooling operations.
Furthermore, the dynamically adjusting teaching content and mode by using the deep reinforcement learning algorithm according to the learning behavior characteristics and the emotion states of the students further comprises:
setting personalized emotion threshold values according to learning history data of students;
and when the emotion states of the students exceed the set personalized emotion threshold values, dynamically adjusting the teaching content and modes by using a deep reinforcement learning algorithm.
Furthermore, the method for dynamically adjusting teaching content and mode by using the deep reinforcement learning algorithm according to the learning behavior characteristics and the emotion state of the student further comprises the following steps:
When the emotion states of the students exceed the set personalized emotion threshold values, selecting a corresponding teaching strategy according to the specific emotion states;
wherein the teaching strategy comprises repeating or explaining concepts in a different way, or introducing teaching elements such as analogies, stories, or games to re-explain hard-to-understand content in a more vivid and interesting manner.
Still further, the adjusting and optimizing the parameters of the deep reinforcement learning according to learning results and feedback information of the students to improve the future teaching effect includes:
determining a reward function based on the learning outcome and feedback information of the student;
and updating parameters of the deep reinforcement learning by using the reward function and a deep reinforcement learning algorithm.
Still further, the reward function is:
R=α*L+β*F+γ*E+δ*P+ε*I,
wherein R is the output value of the reward function, representing the overall reward for the system taking a certain action in the current state; L represents learning outcomes; F represents feedback satisfaction; E represents the student's emotional state; P represents learning progress; I represents the improvement in emotional state; and α, β, γ, δ, ε are coefficients for adjusting the weights.
The application also provides an emotion perception intelligent teaching system based on deep reinforcement learning, which comprises:
The collecting unit is used for collecting multi-mode data of students, including text data, audio data and video data generated in the teaching process;
the data acquisition unit is used for preprocessing the multi-mode data by using a deep learning model to acquire preprocessed data, wherein the preprocessed data comprises learning behavior characteristics and characteristics related to emotion;
the emotion obtaining unit is used for obtaining the emotion state of the student by using a deep emotion network by taking the preprocessed data as input, wherein the expression of the deep emotion network is as follows: e=f (X; phi), wherein E represents the emotion state of the student, X represents the preprocessed data, phi represents the parameters of the deep emotion network, and f represents the operation process of the deep emotion network and is output as the emotion state of the student;
the adjusting unit is used for dynamically adjusting the teaching content and mode by using a deep reinforcement learning algorithm according to the learning behavior features and emotional states of the students, wherein the update process of the deep reinforcement learning can be described by the following formula:
θ_{t+1} = θ_t + α · δ_t · ∇_θ ln π(A_t | S_t, θ_t)
where θ_t is the policy network parameter at time t, α is the learning rate, δ_t is the TD error, A_t is the action taken in state S_t, π(A_t|S_t, θ_t) is the probability of taking action A_t in state S_t, and the output is the dynamically adjusted teaching content and mode.
Unlike other schemes in the prior art, the technical solution provided by the application adopts deep learning and reinforcement learning techniques and can process students' multi-modal data, including text, audio, and video data. Through the deep emotion network, the student's emotional state can be obtained from the preprocessed data, so that the student's emotional changes can be taken into account during teaching and the teaching effect improved. The method uses a deep reinforcement learning algorithm to dynamically adjust the teaching content and mode according to the students' learning behavior features and emotional states.
The technical scheme provided by the application has the following beneficial effects:
(1) Processing students' multi-modal data with deep learning technology allows the students' learning behaviors and emotional states to be understood more comprehensively during teaching, improving the teaching effect.
(2) Using the deep emotion network to obtain the students' emotional states allows emotional changes to be taken into account during teaching, improving teaching quality and the students' learning experience.
(3) Dynamically adjusting the teaching content and mode through the deep reinforcement learning algorithm provides a highly adaptive, personalized teaching method, improving teaching efficiency and the students' learning effect.
Drawings
Fig. 1 is a flowchart of an emotion perception intelligent teaching method based on deep reinforcement learning according to a first embodiment of the present application.
Fig. 2 is a schematic diagram of a multimodal feature fusion model according to a first embodiment of the present application.
Fig. 3 is a block diagram of the structure of the deep emotion network.
Fig. 4 is a block diagram of the system architecture of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. This application is, however, susceptible of embodiment in many other ways than those herein described and similar generalizations can be made by those skilled in the art without departing from the spirit of the application and the application is therefore not limited to the specific embodiments disclosed below.
The first embodiment of the application provides an emotion perception intelligent teaching method based on deep reinforcement learning. Referring to fig. 1, a schematic diagram of a first embodiment of the present application is shown. The following provides a detailed description of an emotion perception intelligent teaching method based on deep reinforcement learning in a first embodiment of the present application with reference to fig. 1.
The method comprises the following steps:
step S101: multimodal data for students is collected, including text data, audio data, and video data generated during teaching.
The goal of this step is to obtain as comprehensive student information as possible for the subsequent analysis of emotional states. Text data can include students' chat records on an online teaching platform, answers to in-class quizzes, and the like; such text can reflect the students' understanding of the learning content and their emotional responses. Audio data is mainly derived from students' voice responses, such as speech in classroom discussions or spoken answers, and may reflect the students' mood and emotion. Video data is derived primarily from students' facial expressions and body language, such as real-time video in video teaching, which can help analyze emotional states such as happy, depressed, or confused.
Step S102: preprocessing the multi-modal data by using a deep learning model to obtain preprocessed data, wherein the preprocessed data comprises learning behavior characteristics and characteristics related to emotion.
First, the learning behavior feature and emotion related feature related to this step will be briefly described.
Learning behavior features refer to behaviors related to students' direct learning activities, such as when they learn, how long they learn, how many courses or tasks they have completed, how their learning effects, and so on. Specifically, learning behavioral characteristics may include:
(1) Learning time: including total learning time, duration of each learning, frequency of learning, etc.
(2) Learning progress: including the number of tasks completed, the number of tests passed, the number of courses learned, etc.
(3) Learning method: including whether notes are taken, review frequency, learning mode (e.g., reading, listening to lectures), etc.
(4) Learning results: including examination achievements, task completion, course completion, etc.
The emotion related features in this embodiment may include:
(1) Expression characteristics: the student's facial expressions, such as smiling, frowning, surprise, and confusion, are analyzed from the video data; these may be related to the student's emotional state.
(2) Voice characteristics: the student's intonation, volume, and speaking rate are analyzed from the audio data; these may also be related to the student's emotional state.
(3) Text characteristics: the student's emotion is analyzed from text data (e.g., online chat records, learning feedback), such as the vocabulary used, the tone, and the emotion expressed.
(4) Behavior characteristics: analyzing the student's behavior (e.g., clicking behavior on a learning platform, course browsing behavior) can help reveal the emotional state. These behavior characteristics differ from the learning behavior features mentioned above: the former focus more on the student's general performance, which may include classroom behavior such as whether notes are frequently taken, whether questions are frequently asked, and whether the student arrives early or late to lectures. It may also include behavior on the learning platform, such as which courses were browsed, which resources were clicked, and how much time was spent on a particular task.
The main tasks of this step are preprocessing of the data and feature extraction. Although in many scenarios, it is possible to use separate deep learning models to process different types of data, in this embodiment, to achieve the best personalized teaching effect, it is selected to integrate the three data into one unified multi-modal deep learning model. The multi-modal deep learning model, which can be called as a multi-modal feature fusion model, can accept three types of data, namely text, audio and video, and then internally performs feature extraction and feature fusion.
The multi-modal feature fusion model is described in detail below in conjunction with fig. 2.
As shown in fig. 2, in this multi-modal feature fusion model, the following four key modules are mainly included: a text feature extraction module 201, an audio feature extraction module 202, a video feature extraction module 203, and a feature fusion module 204.
The text feature extraction module 201 converts the student's text data into a numerical representation that reflects its semantic information. Here, a pre-trained Transformer model (e.g., BERT or GPT) may be used as the core component. These models are pre-trained on large-scale text data and have learned rich semantic information. By feeding the student's text data into such a model, an embedded representation (i.e., a vectorized representation) of each word or phrase can be obtained; the word or phrase representations can then be integrated into a text-level representation by some strategy (e.g., averaging, or using an attention mechanism), and the resulting text-level representation can be used as the text feature.
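As a concrete illustration of module 201, the following is a minimal sketch of Transformer-based text feature extraction with masked mean pooling. It assumes the Hugging Face transformers library; the checkpoint name bert-base-chinese and the pooling strategy are illustrative choices, not fixed by this embodiment.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
encoder = AutoModel.from_pretrained("bert-base-chinese")

def extract_text_features(texts):
    """Return one text-level vector per student text (masked mean pooling)."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = encoder(**batch).last_hidden_state   # (B, T, 768)
    mask = batch["attention_mask"].unsqueeze(-1)                # ignore padding
    return (token_embeddings * mask).sum(1) / mask.sum(1)       # (B, 768)
```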
An audio feature extraction module 202, the goal of which is to extract useful features from the student's speech data. The audio data may be processed using a Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN). First, an audio signal is converted into a spectrogram, and then the spectrogram is provided as an input to the CNN or RNN. In this way, the network can learn the characteristics of the audio, such as pitch, intensity, speed of speech, etc., from the spectrogram. The output of the network will be a representation of the audio level.
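A corresponding sketch of module 202, assuming torchaudio for the waveform-to-spectrogram conversion; the two-layer CNN and the 16 kHz mono input are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torchaudio

mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)

audio_cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),       # -> (B, 32) audio-level vector
)

def extract_audio_features(waveform):
    """waveform: (B, num_samples) mono audio at 16 kHz."""
    spec = mel(waveform).unsqueeze(1)            # (B, 1, n_mels, frames)
    return audio_cnn(spec)
```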
The video feature extraction module 203 is configured to extract information of facial expressions and body language of the student. The video data may be processed using a 3D convolutional neural network (3D-CNN) or a Temporal Convolutional Network (TCN). First, a sequence of video frames is input into the 3D-CNN or TCN, and then the network is allowed to learn the dynamic information in the video. The output of the network will be a representation of the video level.
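Module 203 can be sketched the same way with three-dimensional convolutions; the clip length, resolution, and channel sizes below are assumptions for illustration.

```python
import torch
import torch.nn as nn

video_cnn = nn.Sequential(
    nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
    nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(),       # -> (B, 32) video-level vector
)

def extract_video_features(frames):
    """frames: (B, 3, T, H, W) clip of RGB frames, e.g. 16 x 112 x 112."""
    return video_cnn(frames)
```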
The feature fusion module 204 integrates the outputs of the above three modules into a unified representation. Specifically, feature fusion strategies such as concatenation or weighted averaging may be employed, or more complex fusion strategies such as attention mechanisms. In addition, to ensure the effect of feature fusion, a fully connected layer or another mapping function is introduced to perform the final feature mapping. The output of this module is a multi-modal representation incorporating text, audio, and video information. This multimodal representation serves as the preprocessed data, including but not limited to learning behavior features and emotion-related features.
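One possible realization of module 204 is attention-weighted fusion followed by a fully connected mapping, as described above. The dimensions reuse the sketches of modules 201-203 and are assumptions, not specified by this embodiment.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dims=(768, 32, 32), out_dim=128):
        super().__init__()
        # project each modality into a common space, then score each one
        self.proj = nn.ModuleList([nn.Linear(d, out_dim) for d in dims])
        self.score = nn.Linear(out_dim, 1)
        self.fc = nn.Linear(out_dim, out_dim)    # final feature mapping

    def forward(self, text_feat, audio_feat, video_feat):
        feats = torch.stack(
            [p(x) for p, x in zip(self.proj, (text_feat, audio_feat, video_feat))],
            dim=1)                                          # (B, 3, out_dim)
        weights = torch.softmax(self.score(feats), dim=1)   # per-modality weights
        fused = (weights * feats).sum(dim=1)                # (B, out_dim)
        return torch.relu(self.fc(fused))        # unified multi-modal representation
```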
The above is a detailed description of the multimodal feature fusion model. The model's innovation lies in integrating three different types of data, namely text, audio, and video, into a unified model, so that the model can learn the shared features of the three data types and the interrelationships among the features; the students' state can thus be understood more comprehensively and accurately, and the teaching strategy adjusted more effectively.
In the above description, the preprocessed data is obtained directly using the trained multi-modal feature fusion model. The training process of this multimodal feature fusion model is described below:
first, a large amount of multimodal data needs to be collected as training data, including text data of students (e.g., online chat recordings, learning feedback, etc.), audio data (e.g., voice recordings of students), video data (e.g., video recordings of students), and tag data corresponding thereto (e.g., learning results, learning behaviors, emotional states, etc. of students).
Then, training of the model is performed according to the following steps:
step S2001: and (5) preprocessing data. The collected multimodal data is subjected to necessary preprocessing, such as word segmentation of text data, sonography of audio data, frame extraction of video data, and the like.
Step S2002: and (5) extracting characteristics. The preprocessed data are respectively input into a text feature extraction module 201, an audio feature extraction module 202 and a video feature extraction module 203 to obtain text features, audio features and video features.
Step S2003: and (5) feature fusion. The features obtained by the three modules are input into a feature fusion module 204 to obtain the fused multi-modal features.
Step S2004: and (5) model training. The obtained multi-modal characteristics and corresponding label data are input into a supervised learning model (e.g., SVM, decision tree, neural network, etc.) for training. The training aims to enable the model to predict corresponding labels according to the input multi-modal characteristics. A suitable optimization algorithm (e.g., gradient descent, random gradient descent, adam, etc.) is used during training to continuously adjust the parameters of the model so that the model's predicted results are as close as possible to the actual tag data.
Step S2005: and (5) evaluating a model. During training, a portion of the data that is not used for training (referred to as validation data) needs to be used periodically to evaluate the performance of the model. The metrics evaluated may include accuracy, precision, recall, F1 score, etc. If the model performs poorly on verification data, it may be necessary to adjust parameters of the model, or to change the structure of the model, and then re-train.
Step S2006: and (5) model optimization. Depending on the performance of the model on the verification data, the necessary model optimization can be performed. For example, if the performance of the model does not significantly improve over several consecutive iterations, it may be desirable to reduce the learning rate, increase the regularization term, or adjust the structure of the model, etc.
Step S2007: and (5) model testing. After model training and optimization is completed, another portion of the data that is not used for training and validation (referred to as test data) is needed to test the performance of the model. If the performance of the model on the test data is also good, the generalization capability of the model is strong, and the model can be used for actual multi-mode data processing tasks.
The training process of the multi-mode feature fusion model is as above. This process requires appropriate adjustments to be made according to specific task requirements and data characteristics. For example, in some tasks, it may be desirable to use more complex feature extraction and feature fusion methods, or to use more complex model training methods.
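To make steps S2004 and S2005 concrete, the following is a condensed sketch of the supervised training and validation loop, assuming the fused 128-dimensional features feed a small emotion classifier; train_loader and val_loader are hypothetical DataLoaders yielding (features, label) batches, and the four output classes are an assumption.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(20):
    model.train()
    for features, labels in train_loader:        # step S2004: fit to label data
        optimizer.zero_grad()
        loss = criterion(model(features), labels)
        loss.backward()
        optimizer.step()

    model.eval()                                 # step S2005: validate
    correct = total = 0
    with torch.no_grad():
        for features, labels in val_loader:
            correct += (model(features).argmax(1) == labels).sum().item()
            total += labels.numel()
    print(f"epoch {epoch}: validation accuracy {correct / total:.3f}")
```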
Step S103: taking the preprocessed data as input, and obtaining the emotion state of the student by using a deep emotion network, wherein the expression of the deep emotion network is as follows: e=f (X; phi), wherein E represents the emotion state of the student, X represents the preprocessed data, phi represents the parameters of the deep emotion network, and f represents the operation process of the deep emotion network, and the output is the emotion state of the student.
In this step, the preprocessed data is fed into the deep emotion network to obtain the student's emotional state. For example, the deep emotion network may judge that a student is confused because the homework answers are inaccurate, the voice responses sound hesitant, and the facial expression appears confused.
The deep emotion network is described in detail below with reference to fig. 3. As shown in fig. 3, the deep emotion network includes:
an input layer 301, which layer functions to receive input data, i.e. said preprocessed data obtained in step S102.
Modality-specific convolution and pooling layers: the data of each modality (text, audio, video) received by the input layer is processed through its own series of convolution and pooling layers.
Convolution and pooling layer of text modality 302:
for text data, the text is first converted into a fixed-length vector by an embedding layer, and then key information in the text is extracted by a one-dimensional convolution layer and a pooling layer. Parameters of the convolution layer and the pooling layer (such as convolution kernel size, step size, pooling type, etc.) need to be set according to actual situations.
Convolution and pooling layer 303 of audio modality:
For audio data, key information in the audio is extracted through a two-dimensional convolution layer and a pooling layer. The convolution layer may capture local patterns and frequency characteristics in the audio signal, while the pooling layer may help reduce the dimensions of the features, preventing overfitting.
Convolution and pooling layer 304 of video modality:
for video data, key information in the video is extracted through a three-dimensional convolution layer and a pooling layer. The three-dimensional convolution layer can take into account both spatial (intra-frame) and temporal (inter-frame) information to capture motion information in the video.
A feature fusion layer 305 that receives output from all modality-specific layers and fuses the features. This fusion can be carried out by simple concatenation, or by more complex operations. For example, a weighted average or attention mechanism may be used to assign different weights to features of different modalities.
The fully connected layer 306 accepts the output of the feature fusion layer, performs linear transformation through a series of weights and biases, and then performs nonlinear transformation through nonlinear activation functions (e.g., reLU, tanh, etc.), thereby performing further feature extraction and combination.
LSTM layer 307, which accepts the output of the fully connected layer 306 and captures the dynamic changes of the features over time through a series of gates (forget gate, input gate, output gate) and states (cell state and hidden state). A deep LSTM network may be formed by stacking multiple LSTM layers. Here, LSTM denotes a Long Short-Term Memory neural network.
Output layer 308, which accepts the output of LSTM layer 307 and outputs the final emotional state through a linear transformation (weights and biases) followed by an activation function (e.g., sigmoid or softmax). The emotional state may be classified into two categories (positive and negative) or into multiple categories (happy, sad, angry, surprised, etc.).
The network parameters (Φ) of the deep emotion network need to be initialized before training is started. These parameters include the weight of the convolution kernel, the weight and offset of the full connection layer, and the various weights and offsets of the LSTM layer. Initialization of these parameters may be performed according to various methods, for example using gaussian distributed random numbers, or using pre-trained model parameters.
And then training the deep emotion network by using a training sample to obtain a network parameter (phi).
The preprocessed data (X) is input into a deep emotion network after training, so that the emotion state of the student is obtained.
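The layer stack of Fig. 3 can be summarized in a module of the form E = f(X; Φ). The sketch below assumes the modality-specific convolution and pooling layers (302-304) have already produced per-time-step key information vectors; the feature sizes and the four emotion classes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DeepEmotionNetwork(nn.Module):
    def __init__(self, feat_dim=128, hidden=64, n_emotions=4):
        super().__init__()
        self.fusion = nn.Linear(3 * feat_dim, feat_dim)  # layer 305 (concatenation)
        self.fc = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())  # layer 306
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)            # layer 307
        self.out = nn.Linear(hidden, n_emotions)                         # layer 308

    def forward(self, text, audio, video):
        # each input: (B, T, feat_dim) sequence of per-modality key information
        fused = torch.relu(self.fusion(torch.cat([text, audio, video], dim=-1)))
        h, _ = self.lstm(self.fc(fused))
        return torch.softmax(self.out(h[:, -1]), dim=-1)  # emotional state E
```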
Step S104: According to the learning behavior features and emotional states of the students, the teaching content and mode are dynamically adjusted using a deep reinforcement learning algorithm, wherein the update process of the deep reinforcement learning can be described by the following formula:
θ_{t+1} = θ_t + α · δ_t · ∇_θ ln π(A_t | S_t, θ_t)
where θ_t is the policy network parameter at time t, α is the learning rate, δ_t is the TD error, A_t is the action taken in state S_t, π(A_t|S_t, θ_t) is the probability of taking action A_t in state S_t, and the output is the dynamically adjusted teaching content and mode. The meaning of this formula is: the policy network parameter θ_t at the current time step t is updated to the parameter θ_{t+1} of the next time step t+1 by adding to the current parameter θ_t an update term formed by multiplying the learning rate α, the TD error δ_t, and the gradient of the log action probability ∇_θ ln π(A_t|S_t, θ_t).
In the step, according to the learning behavior characteristics and the emotion states of students, the teaching contents and modes are dynamically adjusted by using a deep reinforcement learning algorithm. For example, if the system detects that a student is showing confusion at a certain knowledge point, it may choose to repeat some important concepts or interpret them differently, all automatically determined by the algorithm.
The following details the implementation procedure of step S104:
(1) Definition of state (S) and action (A)
First, states and actions need to be defined. In this scenario, the states may include the student's learning behavioral characteristics and emotional state, both of which are derived from previous deep learning models.
An action may then be defined as a selection of tutorial content and manner. For example, some possible actions include:
(1a) Increase or reduce the teaching difficulty: if a student's learning behaviors indicate that they have understood the current teaching content well, the teaching system may increase the difficulty and introduce more complex concepts or problems. Conversely, if the student shows a struggling or confused emotional state, the teaching system may reduce the difficulty, explain the current concept in more detail, or provide more practice.
(1b) Changing the teaching mode: teaching means may include instructions, examples, discussions, team activities, practice activities, and the like. If the student's learning behavior and emotional state indicate that they are not interested in the current teaching mode or are not learning effectively, the teaching system may change the teaching mode, for example, switch from lecturing to practice activities, or switch from personal learning to team discussion.
(1c) Personalized feedback and help: the teaching system may provide personalized feedback and assistance based on the learning behavior and emotional state of the student. For example, if a student is hesitant on a problem, the teaching system may provide relevant prompts or explanations; the teaching system may provide encouragement and support if the student is confused or frustrated.
Through deep reinforcement learning, the teaching system can dynamically adjust the teaching content and mode according to the students' feedback (e.g., learning behavior features and emotional states), so as to better meet their personalized learning needs. This process is like playing a game: the teaching system (i.e., the reinforcement learning agent) tries different actions (i.e., adjusting the teaching content and mode), observes the feedback of the environment (i.e., the student's learning behaviors and emotional state), and then optimizes its strategy (i.e., the teaching method) based on this feedback so that the long-term cumulative return (e.g., the student's learning effect or satisfaction) is maximized.
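For illustration only, the state and action space described above might be encoded as follows; the feature names and the action families (1a)-(1c) follow the text, while the concrete encoding is an assumption.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Action(Enum):
    RAISE_DIFFICULTY = auto()        # (1a)
    LOWER_DIFFICULTY = auto()        # (1a)
    SWITCH_TEACHING_MODE = auto()    # (1b)
    GIVE_FEEDBACK_OR_HELP = auto()   # (1c)

@dataclass
class State:
    learning_features: list   # e.g. learning time, progress, accuracy
    emotion_state: list       # output E of the deep emotion network

    def to_vector(self):
        return self.learning_features + self.emotion_state
```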
Still further, when defining the state, a unique emotion threshold may be set for each student to quantify their emotional responses. For example, some students may need help when they are only slightly confused, while others seek help only when deeply confused. Such personalized emotion thresholds can be learned and adjusted from the student's historical learning data. This setting helps the teaching system judge and adapt to each student's needs more finely.
The following describes how to determine a unique emotion threshold for each student:
(a) Initial setting: when students begin to use the system, an initial emotion threshold may be set, which may be based on educational psychology theory or past study data.
(b) Collecting data: the system needs to continuously collect learning data and emotional responses of the students. Learning data includes their performance in performing various learning activities, such as speed, accuracy, etc. of completing a task. Emotional responses may be collected in a variety of ways, such as self-reporting questionnaires, facial expression analysis, speech emotion analysis, and the like.
(c) Data analysis: through deep learning or machine learning analysis on the collected data, the emotional response modes of the students in different environments can be obtained. This includes their emotional response in the face of learning tasks of varying difficulty, and their emotional changes in the face of difficulty.
(d) Threshold adjustment: based on the analysis results, the student's emotion threshold can be adjusted. For example, if the analysis shows that a student becomes significantly confused when encountering a slight difficulty, the threshold may be lowered; conversely, if a student only becomes confused when encountering deep difficulty, the threshold may be raised.
(e) Iterative optimization: the above process needs to be repeated, the system needs to continuously collect and analyze data, and then the emotion threshold value is adjusted according to the new analysis result, so that the threshold value can more accurately reflect the actual emotion requirement of the student.
The setting of personalized emotion threshold values is a continuous learning and adjustment process that needs to be based on a large amount of student learning data and emotion response data. The setting mode can help the intelligent teaching system to more accurately understand and adapt to the emotion requirements of each student, so that more personalized and effective teaching support is provided.
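One simple way to realize the adjustment loop (a)-(e) is to drift each student's threshold toward the confusion level at which help actually proved necessary; the smoothing rule below is a hypothetical sketch, not the embodiment's fixed procedure.

```python
def update_emotion_threshold(threshold, observed_confusion, needed_help,
                             smoothing=0.1):
    """Lower the threshold if help was needed below it; raise it if the student
    coped fine above it. `smoothing` is an assumed hyperparameter."""
    if needed_help and observed_confusion < threshold:
        target = observed_confusion      # intervene earlier next time
    elif not needed_help and observed_confusion > threshold:
        target = observed_confusion      # tolerate more before intervening
    else:
        return threshold
    return (1 - smoothing) * threshold + smoothing * target
```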
Further, the system dynamically adjusts the teaching content and mode when the emotional state of the student exceeds a set emotional threshold. The dynamic adjustment here is not just to change the teaching content or change the teaching mode, but rather to select teaching strategies for specific emotional states.
For example, if a student is confused, the teaching system may choose not only to repeat or interpret concepts differently, but also consider introducing elements like analogy, story, game, etc. This approach may re-teach what is difficult to understand in a more engaging and lively manner. For example, when teaching physical concepts, related daily life instances may be introduced, or through games to assist students in understanding and mastering. This approach may help students understand complex concepts from different perspectives while also alleviating their confusion and anxiety.
(2) Strategy (pi)
A policy is a function that decides which action to select in a given state. In deep reinforcement learning, the strategy is typically represented by a neural network whose parameter is θ. This neural network, commonly referred to as a policy network, has the task of outputting a probability for each possible action based on the current state. The input to the network is the state (in this case, the student's learned behavioral characteristics and emotional state) and the output is the probability of each possible action. That is, pi (A_t|S_t, θ_t) is the probability of taking action A_t in state S_t, which is given by the policy network. The policy network may be a multi-layer perceptron (MLP) that can handle structured data well, in which case the learning behavior features and emotional state of the student can be encoded as a numerical vector.
The relationship between the parameters for deep reinforcement learning and the parameters of the policy network is briefly described below:
Deep reinforcement learning combines the features of deep learning and reinforcement learning. One of its key components is the policy network, which is used to determine what action the agent (e.g., the emotion perception intelligent teaching system of this embodiment) should take in a given state.
The parameters of the policy network are parameters of deep reinforcement learning. These parameters determine the structure and behavior of the policy network, that is, they determine the probability distribution of actions that an agent should take in a given environmental state. During the training process, these parameters are continually adjusted to find strategies that maximize the jackpot.
Thus, the parameters of deep reinforcement learning and the parameters of the policy network are in fact the same set of parameters, except that the roles of this set of parameters are described from different angles: from a deep reinforcement learning perspective, these parameters are used to decide how to learn the optimal action strategy from environmental conditions and rewards; from the perspective of the policy network, these parameters determine the behavior of the network, i.e. the action policy of the agent.
(3) TD error (delta)
The TD error (Temporal Difference error) is an important concept that measures the gap between the expected return and the actual return. In reinforcement learning, the return is generally defined as the sum of all future rewards starting from the current state. In the application scenario of this embodiment, the reward may be defined as the student's learning progress. If the expected learning improvement is higher than the actual learning improvement, the TD error is negative; conversely, if the expected learning improvement is lower than the actual learning improvement, the TD error is positive.
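In one-step form the TD error can be written as δ_t = r_t + γ·V(S_{t+1}) − V(S_t). A one-line sketch, assuming a learned value estimate V and a discount factor γ:

```python
def td_error(r, v_s, v_s_next, gamma=0.99):
    # positive when the observed outcome beats the expectation, negative otherwise
    return r + gamma * v_s_next - v_s
```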
(4) Updating parameters (θ) of neural networks
Given the TD error and the learning rate (α), the parameters of the neural network can be updated with the following formula:
θ_{t+1} = θ_t + α · δ_t · ∇_θ ln π(A_t | S_t, θ_t)
The meaning of this formula is that the parameters of the neural network are adjusted according to the size and sign of the TD error and the probability of the selected action in the current state. If the TD error is positive, the probability of selecting the action is increased; conversely, if the TD error is negative, the probability of selecting the action is reduced.
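A minimal PyTorch sketch of this update: minimizing −δ_t · ln π(A_t|S_t, θ_t) with a gradient step of size α performs exactly the gradient ascent described above. The input size (128 fused features plus a 4-value emotional state) and the network layout are assumptions.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(132, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.SGD(policy.parameters(), lr=0.01)  # lr plays the role of alpha

def update_policy(state, action, delta):
    """state: (132,) tensor; action: int index; delta: TD error (float)."""
    probs = torch.softmax(policy(state), dim=-1)
    log_prob = torch.log(probs[action])      # ln pi(A_t | S_t, theta_t)
    loss = -delta * log_prob                 # descent on -delta*ln pi = ascent on delta*ln pi
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```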
(5) Dynamically adjusting teaching content and mode
By continuing the above process, the parameters of the neural network are updated and the policy changes accordingly. This means the system dynamically adjusts the teaching content and mode according to the students' learning behaviors and emotional states. For example, if a student has mastered a portion of the content very well, the system may choose to skip it and move to the next portion; conversely, if a student has difficulty with a certain piece of content, the system may choose to re-teach it in a different manner. Such dynamic adjustment allows each student to receive their own personalized teaching.
A specific example for this step is given below.
For example, assuming a student encounters difficulty in learning the programming language Python, his/her learning behavior characteristics may include spending excessive time at a certain knowledge point, or frequently viewing related references, and emotional states may be confusing and frustrating.
In this case, the deep reinforcement learning system provided in the present embodiment may perform the following operations:
(1) Definition state (S): the state consists of learning behavioral characteristics of the student (e.g., spending excessive time at a certain knowledge point, frequently viewing related references) and emotional states (e.g., confusion and frustration).
(2) Defining actions (A): the action may be choosing to teach different Python resources, or adjusting the teaching mode, such as using more graphical explanations or giving more practical examples.
(3) Strategy (pi): the policy network will give a probability for each possible action, e.g. selecting a different teaching material or changing teaching mode. In this example, the policy network may recommend more intuitive textbooks or interactive teaching.
(4) TD error (δ): assuming that the system predicts that the student will understand Python better after using the new teaching material or teaching mode, but in practice the student's understanding is not improved, the expected return (improvement in student understanding) is higher than the actual return and the TD error is negative.
(5) Updating the parameters (θ) of the neural network: because the TD error is negative, the probability of selecting this action (selecting that teaching material or teaching mode) needs to be reduced, so the update formula θ_{t+1} = θ_t + α · δ_t · ∇_θ ln π(A_t|S_t, θ_t) is applied to adjust the parameters of the neural network accordingly.
(6) Dynamically adjusting teaching content and mode: as the parameters of the neural network are continually updated, the policies change. In this example, the system may recommend another teaching material to be used or take a completely different teaching form.
The above is a specific example describing the process of dynamically adjusting teaching content and modes using a deep reinforcement learning algorithm according to learning behavioral characteristics and emotional states of students.
In this embodiment, the method for emotion perception intelligent teaching further includes adjusting and optimizing parameters of the deep reinforcement learning according to learning results and feedback information of students, so as to improve future teaching effects.
Adjusting and optimizing the parameters of deep reinforcement learning is an iterative process, with the goal of improving teaching results in the future. The following steps are specific:
(1) Collection of learning results and feedback information: first, a mechanism is required to collect students' learning results and feedback information. The learning result may be the student's performance in practice, homework, or tests; for example, the student's answer accuracy, the speed at which homework is completed, or test scores may be calculated. The feedback information may come from the students' direct assessment of the teaching content and mode; for example, students may be asked to fill out a questionnaire after each learning activity, expressing their satisfaction with the teaching content and mode.
(2) Definition of the reward function: a reward function needs to be defined to measure how good each action is. This reward function may be based on the students' learning outcomes and feedback information. For example, if students' accuracy increases, or their assessment of the teaching content and mode is higher, a greater reward may be given. Conversely, if students' accuracy decreases, or their assessment is lower, a smaller reward or a penalty may be given.
This embodiment provides the following reward function:
R=α*L+β*F+γ*E+δ*P+ε*I
wherein:
r: the output value of the reward function represents the overall reward for the system to take some action in the current state;
l: representing learning results such as student accuracy, rate of completion of work, test score, etc. Appropriate normalization strategies can be adopted to convert the indexes into a unified measure;
f: representing feedback satisfaction, which can be obtained by a questionnaire filled in by a student after a teaching activity, and also can be converted into a unified measure by adopting a proper standardization strategy;
e: representing the emotional state of the student, the emotional state of the student can be predicted and quantified through the deep emotion network, for example, the emotional state is converted into a value in the range of [0,1 ];
P: representing learning progress such as improvement of accuracy of students, acceleration of completion of homework, increase of test score, etc. Appropriate normalization strategies may be employed to translate the metrics onto a unified metric.
I: representing improvements in emotional state may be quantified by the predicted outcome of the deep emotion network. For example, an average improvement in the emotional state of a student over a period of time may be calculated.
Alpha, beta, gamma, delta, epsilon: the weight coefficient is a coefficient for adjusting the weight, and the weight coefficient can be adjusted according to actual needs so as to reflect the importance degree of learning achievements, feedback satisfaction and emotion states.
The design of the reward function fully considers the learning effect, satisfaction degree and emotion state of the students and the improvement of the students, and is helpful for the intelligent teaching method provided by the embodiment to make better decisions, so that more personalized and efficient teaching experience is provided for the students. Meanwhile, the reward function is adjustable and can be dynamically adjusted according to actual teaching targets and feedback of students.
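The reward function transcribes directly into code; the example weights are placeholders to be tuned per teaching goal, and all five inputs are assumed normalized to [0, 1] as described above.

```python
def reward(L, F, E, P, I,
           alpha=0.3, beta=0.2, gamma=0.2, delta=0.2, epsilon=0.1):
    """L: learning outcome, F: feedback satisfaction, E: emotional state,
    P: learning progress, I: improvement in emotional state."""
    return alpha * L + beta * F + gamma * E + delta * P + epsilon * I
```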
(3) And (3) adjusting deep reinforcement learning parameters: at each step, parameters of the policy network in this embodiment may be updated according to a reward function and a deep reinforcement learning algorithm (e.g., Q-learning or Actor-Critic, etc.). For example, a gradient ascent method may be used to update parameters of the policy network so that in the current state, the probability of an action that rewards is expected to be maximized is higher.
The adjustment of the deep reinforcement learning parameters is a core link, and is the key of strategy optimization in an intelligent teaching system. In deep reinforcement learning, policies are represented and enforced by a policy network, and the parameters of the policy network determine its behavior.
Next, this embodiment describes this process in detail:
(a) Rewarding functions and status assuming that the intelligent teaching system is performing interactive teaching with a student, each teaching action may be to explain a new concept, provide an instance, provide some additional learning resources, or perform a quiz, etc. At each step, the system can observe the current state, including the student's progress of learning, degree of understanding, emotional state, and their feedback on previous teaching and patterns, etc. In addition, the system receives rewards for the last action, which are calculated by the rewards function defined above.
(b) Decision making and execution the policy network generates an action based on the current state and the bonus function, which is selected from all possible actions. This selection process is typically probability-based, that is, there is a probability that each possible action is selected, which is determined by parameters of the policy network.
(c) Parameter updating, namely, after the action is executed, the system obtains new state and rewards. The parameters of the policy network can then be updated based on this new information. In particular, the present embodiment contemplates that the policy network may be more biased towards selecting actions that result in higher rewards. To achieve this goal, a gradient-ramp method may be used to adjust parameters of the policy network. This process can be seen as climbing up the hill in the policy space, hopefully finding a policy that maximizes the total prize that is expected in all possible states.
For parameter updating, the gradient is first calculated. In deep reinforcement learning, the gradient is typically calculated by backtracking (backprojection). An estimate of the value function or the merit function is calculated based on the reward and the new state, and then a gradient of the difference between this estimate and the policy network output is calculated. This gradient reflects how the parameters of the policy network need to be changed in order to increase the expected total rewards.
This gradient can then be used to update the policy network parameters: moving the parameters a small step in the direction of the gradient increases the expected total reward. The step size is usually determined by a hyperparameter called the learning rate.
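Consistent with this description, one possible single-step update is sketched below in the same assumed PyTorch style; td_error stands for the TD error δ_t, and the learning rate lives inside the optimizer:

    def update_policy(policy_net, optimizer, log_prob, td_error):
        # Gradient ascent on delta_t * log pi(A_t | S_t, theta_t), implemented
        # as gradient descent on the negated objective. The TD error is treated
        # as a constant weight, so it is detached from the computation graph.
        loss = -(td_error.detach() * log_prob).sum()
        optimizer.zero_grad()
        loss.backward()   # backpropagation computes the gradient
        optimizer.step()  # one small step, scaled by the learning rate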
The above is the adjustment process for the deep reinforcement learning parameters. Note that this process requires a large amount of data and computing resources; to improve efficiency, updates are therefore usually performed on a batch of data rather than at every single step. This method, called batch updating or mini-batch updating, greatly improves computational efficiency and learning stability, as sketched below.
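A mini-batch variant of the same update, again only a sketch, averages the per-step objectives over a batch of stored transitions before taking a single optimizer step:

    import torch

    def update_policy_batch(policy_net, optimizer, log_probs, td_errors):
        # log_probs and td_errors are lists of scalar tensors collected over a
        # batch of steps; averaging reduces gradient variance and stabilizes
        # learning compared with updating at every single step.
        loss = -(torch.stack(td_errors).detach() * torch.stack(log_probs)).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()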
(4) Policy execution and evaluation: the updated policy is then used to determine new teaching content and modes. The new policy is executed, and new learning results and feedback information are collected; these new data can then be used to evaluate the policy of this embodiment and check whether it has improved.
(5) Iterative optimization: the process is iterative and can be repeated (collecting new data, adjusting the reward function, updating the policy network parameters, executing the new policy, and evaluating the effect) until the result is satisfactory.
Depending on the specific situation, some adjustments may be required along the way; for example, the reward function may need to be tuned continually in practice to ensure that it effectively reflects the objectives of this embodiment. A minimal end-to-end sketch of this loop is given below.
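Combining the pieces above, one possible end-to-end sketch of the iterative loop could read as follows; env is a hypothetical stand-in for the interactive teaching session, and the update of the value network used to compute the TD error is omitted for brevity:

    def training_loop(env, policy_net, value_net, optimizer,
                      num_episodes=100, gamma=0.99):
        for _ in range(num_episodes):
            state = env.reset()
            done = False
            while not done:
                action, log_prob = select_action(policy_net, state)
                next_state, reward, done = env.step(action)
                # TD error: delta_t = r + gamma * V(s') - V(s)
                td_error = (reward + gamma * value_net(next_state)
                            - value_net(state)).squeeze()
                update_policy(policy_net, optimizer, log_prob, td_error)
                state = next_state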
The above embodiment provides an emotion perception intelligent teaching method based on deep reinforcement learning; correspondingly, the application also provides an emotion perception intelligent teaching system based on deep reinforcement learning. Refer to fig. 4, a schematic diagram of an embodiment of the emotion perception intelligent teaching system based on deep reinforcement learning. Since this embodiment, i.e. the second embodiment, is substantially similar to the method embodiment, its description is relatively brief; for the relevant points, refer to the description of the method embodiment. The system embodiment described below is merely illustrative.
The second embodiment of the application provides an emotion perception intelligent teaching system based on deep reinforcement learning, which comprises:
a collecting unit 401 for collecting multi-modal data of students, including text data, audio data, and video data generated during teaching;
a data obtaining unit 402, configured to perform preprocessing on the multimodal data using a deep learning model, and obtain preprocessed data, where the preprocessed data includes, but is not limited to, learning behavior features and features related to emotion;
an emotion obtaining unit 403, configured to obtain the emotional state of the student using a deep emotion network with the preprocessed data as input, where the expression of the deep emotion network is: E=f(X; phi), in which E represents the emotional state of the student, X represents the preprocessed data, phi represents the parameters of the deep emotion network, and f represents the operation of the deep emotion network, whose output is the emotional state of the student (an illustrative sketch of such a network is given after this list);
The adjusting unit 404 is configured to dynamically adjust the teaching content and mode using a deep reinforcement learning algorithm according to the learning behavior features and the emotional state of the student, where the update process of the deep reinforcement learning can be described by the following formula:
θ_{t+1} = θ_t + α * δ_t * ∇_θ ln π(A_t|S_t, θ_t),
where θ_t is the policy network parameter at time t, α is the learning rate, δ_t is the TD error, A_t is the action taken in state S_t, and π(A_t|S_t, θ_t) is the probability of taking action A_t in state S_t; the output is the dynamically adjusted teaching content and mode.
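For concreteness, the mapping E=f(X; phi) implemented by the emotion obtaining unit can be pictured with the following illustrative sketch; the layer sizes and the number of emotion categories are assumptions of the sketch, not values fixed by this embodiment:

    import torch.nn as nn

    class DeepEmotionNet(nn.Module):
        # Illustrative stand-in for E = f(X; phi): a fused multimodal feature
        # vector X is mapped to a probability distribution over emotion states.
        def __init__(self, feature_dim=256, num_emotions=6):
            super().__init__()
            self.f = nn.Sequential(
                nn.Linear(feature_dim, 128),
                nn.ReLU(),
                nn.Linear(128, num_emotions),
            )

        def forward(self, x):
            return self.f(x).softmax(dim=-1)  # emotion state E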
A third embodiment of the present application provides an electronic device, including:
a processor;
and a memory for storing a program which, when read and executed by the processor, performs the emotion perception intelligent teaching method provided in the first embodiment of the application.
A fourth embodiment of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the emotion-awareness intelligent teaching method provided in the first embodiment of the present application.
While the preferred embodiment has been described, it is not intended to limit the invention thereto, and any person skilled in the art may make variations and modifications without departing from the spirit and scope of the present invention, so that the scope of the present invention shall be defined by the claims of the present application.

Claims (10)

1. An emotion perception intelligent teaching method based on deep reinforcement learning comprises the following steps:
collecting multi-modal data of students, including text data, audio data, and video data generated during teaching;
preprocessing the multi-modal data by using a deep learning model to obtain preprocessed data, wherein the preprocessed data comprises learning behavior characteristics and characteristics related to emotion;
taking the preprocessed data as input, and obtaining the emotion state of the student by using a deep emotion network, wherein the expression of the deep emotion network is as follows: e=f (X; phi), wherein E represents the emotion state of the student, X represents the preprocessed data, phi represents parameters of the deep emotion network, and f represents the operation process of the deep emotion network and is output as the emotion state of the student;
according to the learning behavior features and the emotional state of the student, dynamically adjusting the teaching content and mode using a deep reinforcement learning algorithm, wherein the update process of the deep reinforcement learning is described by the following formula:
θ_{t+1} = θ_t + α * δ_t * ∇_θ ln π(A_t|S_t, θ_t),
where θ_t is the policy network parameter at time t, α is the learning rate, δ_t is the TD error, A_t is the action taken in state S_t, and π(A_t|S_t, θ_t) is the probability of taking action A_t in state S_t; the output is the dynamically adjusted teaching content and mode.
2. The method as recited in claim 1, further comprising:
and adjusting and optimizing the parameters of the deep reinforcement learning according to the learning result and feedback information of the students so as to improve the future teaching effect.
3. The method of claim 1, wherein preprocessing the multi-modal data using a deep learning model to obtain preprocessed data comprises:
converting the text data into a numerical representation reflecting its semantic information using a pre-trained Transformer model, to obtain text features;
processing the audio data with a convolutional neural network or a recurrent neural network, converting the audio data into a spectrogram and extracting audio features from the spectrogram;
processing the video data with a 3D convolutional neural network or a temporal convolutional network to obtain video features, including information on the students' facial expressions and body language;
integrating the text features, audio features, and video features into a unified multi-modal representation using a specific feature fusion strategy, and taking the integrated multi-modal representation as the preprocessed data, wherein the preprocessed data comprises learning behavior features and emotion-related features.
4. The method of claim 1, wherein obtaining the emotional state of the student using the deep emotion network, with the preprocessed data as input, comprises:
performing modality-specific convolution and pooling operations on the preprocessed data to obtain key information in the text data, key information in the audio data, and key information in the video data;
performing feature fusion on the key information in the text data, the key information in the audio data, and the key information in the video data to obtain fused features, wherein the feature fusion operation can be a simple concatenation of the features, or a weighted average or attention mechanism that assigns different weights to the features of different modalities;
performing a fully connected operation on the fused features to obtain fully connected data, wherein the fully connected operation comprises a linear transformation through weights and biases and a nonlinear transformation through a nonlinear activation function, for further feature extraction and combination;
processing the fully connected data with a long short-term memory (LSTM) neural network to obtain LSTM-processed data;
outputting the emotional state of the student from the LSTM-processed data through a linear transformation and an activation function operation.
5. The method of claim 4, wherein performing modality-specific convolution and pooling operations on the preprocessed data to obtain the key information in the text data, the key information in the audio data, and the key information in the video data comprises:
for the text data, converting the text into fixed-length vectors through an embedding operation, and extracting key information from the text data through one-dimensional convolution and pooling operations;
for the audio data, extracting key information from the audio data through two-dimensional convolution and pooling operations;
for the video data, extracting key information from the video data through three-dimensional convolution and pooling operations.
6. The method of claim 1, wherein dynamically adjusting the teaching content and mode using a deep reinforcement learning algorithm based on the learning behavioral characteristics and the emotional state of the student further comprises:
setting personalized emotion threshold values according to learning history data of students;
and when the emotion states of the students exceed the set personalized emotion threshold values, dynamically adjusting the teaching content and modes by using a deep reinforcement learning algorithm.
7. The method of claim 6, wherein the dynamically adjusting teaching content and mode using a deep reinforcement learning algorithm based on the learning behavioral characteristics and the emotional state of the student, further comprises:
when the emotion states of the students exceed the set personalized emotion threshold values, selecting a corresponding teaching strategy according to the specific emotion states;
wherein the teaching strategy comprises repeating or explaining the concept in a different way, or introducing teaching elements such as analogies, stories, and games to re-explain the hard-to-understand content in a more vivid and interesting way.
8. The method of claim 2, wherein adjusting and optimizing parameters of the deep reinforcement learning to improve future teaching effects based on learning outcome and feedback information of the student comprises:
determining a reward function based on the learning outcome and feedback information of the student;
and updating parameters of the deep reinforcement learning by using the reward function and a deep reinforcement learning algorithm.
9. The method of claim 8, wherein the reward function is:
R=α*L+β*F+γ*E+δ*P+ε*I,
wherein R is the output value of the reward function, representing the overall reward the system receives for taking a certain action in the current state; L represents learning achievement; F represents feedback satisfaction; E represents the emotional state of the student; P represents learning progress; I represents the improvement in emotional state; and alpha, beta, gamma, delta, epsilon are coefficients for adjusting the weights.
10. An emotion perception intelligent teaching system based on deep reinforcement learning is characterized by comprising:
the collecting unit is used for collecting multi-mode data of students, including text data, audio data and video data generated in the teaching process;
the data acquisition unit is used for preprocessing the multi-mode data by using a deep learning model to acquire preprocessed data, wherein the preprocessed data comprises learning behavior characteristics and characteristics related to emotion;
the emotion obtaining unit is used for obtaining the emotion state of the student by using a deep emotion network by taking the preprocessed data as input, wherein the expression of the deep emotion network is as follows: e=f (X; phi), wherein E represents the emotion state of the student, X represents the preprocessed data, phi represents the parameters of the deep emotion network, and f represents the operation process of the deep emotion network and is output as the emotion state of the student;
the adjusting unit is used for dynamically adjusting the teaching content and mode using a deep reinforcement learning algorithm according to the learning behavior features and the emotional state of the student, wherein the update process of the deep reinforcement learning can be described by the following formula:
θ_{t+1} = θ_t + α * δ_t * ∇_θ ln π(A_t|S_t, θ_t),
where θ_t is the policy network parameter at time t, α is the learning rate, δ_t is the TD error, A_t is the action taken in state S_t, and π(A_t|S_t, θ_t) is the probability of taking action A_t in state S_t; the output is the dynamically adjusted teaching content and mode.
CN202311327454.5A 2023-10-13 2023-10-13 Deep reinforcement learning-based emotion perception intelligent teaching method and system Pending CN117591870A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311327454.5A CN117591870A (en) 2023-10-13 2023-10-13 Deep reinforcement learning-based emotion perception intelligent teaching method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311327454.5A CN117591870A (en) 2023-10-13 2023-10-13 Deep reinforcement learning-based emotion perception intelligent teaching method and system

Publications (1)

Publication Number Publication Date
CN117591870A true CN117591870A (en) 2024-02-23

Family

ID=89908803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311327454.5A Pending CN117591870A (en) 2023-10-13 2023-10-13 Deep reinforcement learning-based emotion perception intelligent teaching method and system

Country Status (1)

Country Link
CN (1) CN117591870A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117875407A (en) * 2024-03-11 2024-04-12 中国兵器装备集团自动化研究所有限公司 Multi-mode continuous learning method, device, equipment and storage medium
CN117875407B (en) * 2024-03-11 2024-06-04 中国兵器装备集团自动化研究所有限公司 Multi-mode continuous learning method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
KR102071582B1 (en) Method and apparatus for classifying a class to which a sentence belongs by using deep neural network
Chrysafiadi et al. Advances in personalized web-based education
CN110377707B (en) Cognitive diagnosis method based on depth item reaction theory
CN111126552B (en) Intelligent learning content pushing method and system
Mehrotra Basics of artificial intelligence & machine learning
CN112257966A (en) Model processing method and device, electronic equipment and storage medium
Cabada et al. Mining of educational opinions with deep learning
CN112687374A (en) Psychological crisis early warning method based on text and image information joint calculation
CN116738959B (en) Resume rewriting method and system based on artificial intelligence
CN114254127A (en) Student ability portrayal method and learning resource recommendation method and device
CN113360618A (en) Intelligent robot dialogue method and system based on offline reinforcement learning
Rathore et al. Intelligent tutoring system
CN116186250A (en) Multi-mode learning level mining method, system and medium under small sample condition
CN113591988B (en) Knowledge cognitive structure analysis method, system, computer equipment, medium and terminal
Casalino et al. Deep learning for knowledge tracing in learning analytics: an overview.
Hussain et al. Robotics and Automation with Artificial Intelligence: Improving Efficiency and Quality
Rathi et al. Course complexity in engineering education using E-learner's affective-state prediction
CN117438047A (en) Psychological consultation model training and psychological consultation processing method and device and electronic equipment
CN117112742A (en) Dialogue model optimization method and device, computer equipment and storage medium
CN116936037A (en) Online psychological consultation method and system based on artificial intelligence
He et al. Analysis of concentration in English education learning based on CNN model
CN117591870A (en) Deep reinforcement learning-based emotion perception intelligent teaching method and system
CN116450783A (en) Method, system, storage medium and electronic equipment for extracting event facing chapter level
Su et al. Dialog State Tracking and action selection using deep learning mechanism for interview coaching
Som et al. Automated student group collaboration assessment and recommendation system using individual role and behavioral cues

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination