CN113158872A - Online learner emotion recognition method - Google Patents

Online learner emotion recognition method

Info

Publication number
CN113158872A
CN113158872A (application number CN202110408590.1A)
Authority
CN
China
Prior art keywords
online
learner
model
attention
expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110408590.1A
Other languages
Chinese (zh)
Inventor
吕伟刚
吕立
张雅
池梦娅
刘鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China
Priority to CN202110408590.1A
Publication of CN113158872A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention belongs to the technical field of online education and expression recognition, and in particular relates to an emotion recognition method for online learners. The method comprises: (1) constructing a regional attention network model based on transfer learning; (2) training the model constructed in step (1) with an online learner expression database; (3) performing emotion recognition on input online learner expressions using the trained model. The automatic emotion recognition technique provided by the invention can be applied to real online courses to automatically recognize learners' emotions during the learning process, allowing instructors to intervene and guide learners in a timely and effective manner and improving the effectiveness and experience of online learning.

Description

Online learner emotion recognition method
Technical Field
The invention belongs to the technical field of online education and expression recognition, and particularly relates to an emotion recognition method for online learners.
Background
With the spread of the internet and the development of digital media technology, online learning has become an option for more and more people. At the same time, online learning exposes some problems. In a traditional classroom, the teacher and the learners share the same physical space; eye contact, facial expressions, posture and gestures all convey important information, and the teacher can adjust instruction according to the learners' emotional expression. In an online learning environment, learning content is usually presented directly by electronic devices, and because of the separation in time and space and the lack of interaction, the learner's emotions receive little attention from the teacher. Many studies in neuroscience, psychology and education have shown that emotion directly influences cognitive processes, especially learning (Meneses A, Liy-Salmeron G. Serotonin and emotion, learning and memory [J]. Reviews in the Neurosciences, 2012, 23(5):543-553; an overview of research progress on the relation between emotion and cognition [J]. Psychological Science, 2004(01):241-243; Ainley M. Students, tasks and emotions: identifying the contribution of emotions to students' reading of popular culture and popular science texts [J]. Learning and Instruction, 433-447). Positive emotional states such as happiness and curiosity contribute to the quality of learning, while negative states such as anxiety, fear and sadness can hinder the learning process. It is therefore very necessary to recognize students' emotions in the online learning environment.
At present, many scholars at home and abroad are devoted to this problem. Common emotion recognition methods include direct inquiry, questionnaires, latent-parameter tracking, physiological-signal recognition, speech recognition, expression recognition and gesture recognition.
Questionnaires are a widely used method of learner emotion recognition. Lopatovska, in studying the role of affective factors in information search, asked participants to complete an achievement emotion questionnaire after finishing a search task in order to detect the emotions experienced during the search (Lopatovska I. Searching for good mood: examining relationships between search task and mood [J]. Proceedings of the American Society for Information Science & Technology, 2015, 46(1):1-13). Monitoring physiological signals is another approach to emotion recognition. Chen et al. used the emWave system to identify learners' emotions when studying the effect of multimedia materials on learning emotion and performance (Chen C M, Wang H P. Using emotion recognition technology to assess the effects of different multimedia materials on learning emotion and performance [J]. Library & Information Science Research, 2011, 33(3):244-255). The system, developed by the HeartMath Institute in the United States, uses human pulse signals to distinguish three emotional states: negative, calm and positive. In addition, many studies attempt to detect learners' emotions automatically with a computer and then make targeted adjustments so that the learning process better matches learners' actual needs. Faria et al. proposed an emotional feedback model and developed a software prototype that uses affective computing techniques to adjust students' learning processes (Faria A R, Almeida A, Martins C, et al. A global perspective on an emotional learning model proposal [J]. Telematics & Informatics, 2016, 34(6):824-837). Lee proposed a research and application framework for emotion-based text interaction recognition, analyzed learners' emotions in an online learning environment, and gave corresponding emotion-adjustment strategies (Lee J M. Different types of human interaction in online learning environment [J]. Proceedings of the American Society for Information Science and Technology, 2006, 43(1):1-11). Ouherou et al. compared the facial expressions of children with and without learning disabilities in a virtual learning environment, detecting the children's emotions through facial expression analysis while the two groups played an educational game (Ouherou N, Elhammoumi O, Benmarrakchi F, et al. Comparative study on emotions analysis from facial expressions in children with and without learning disabilities in virtual learning environment [J]. Education and Information Technologies, 2019, 24(2):1777-1792). Nezami et al. proposed a deep learning model that detects student engagement in online learning environments using expression recognition (Nezami O M, Dras M, Hamey L, et al. Automatic recognition of student engagement using deep learning and facial expression [C]. Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2019:273-289). However, the expressions studied in these works are all basic expressions of joy, sadness, anger, surprise, disgust and fear, and learning scenarios are not specifically addressed. The expressions that appear during learning differ from the common basic expressions; for example, curiosity, confusion, distraction, fatigue and anxiety, which frequently occur while learning, differ greatly from the basic expressions and require further research.
Expression recognition technology can collect learners' facial expression data non-invasively and then perform automatic emotion recognition, and it is increasingly applied in the field of education (for example, Chinese theses on student classroom-concentration evaluation models based on face recognition technology, 2020). However, owing to the particularity of the online learning environment, learners' faces are often partially occluded by their hands, which affects the accuracy of expression recognition to a certain extent (Sharma P, Joshi S, Gautam S, et al. Student Engagement Detection Using Emotion Analysis, Eye Tracking and Head Movement with Machine Learning [J]. Technology and Innovation in Learning, Teaching and Education, 2019:1-8). Research on occlusion in expression recognition has mainly focused on glasses, hats, scarves and the like, where the approximate position of the occlusion is predictable and its texture differs greatly from that of the face. The position of hand occlusion, by contrast, is difficult to predict because people have different movement habits, and the color and texture of hand skin are close to those of facial skin, so expression recognition under hand occlusion is more challenging.
Disclosure of Invention
The invention aims to provide an online learner emotion recognition method which accurately recognizes the emotions of online learners by establishing a learner expression database for the online learning environment and adopting an expression recognition method based on transfer learning, so that online learners can be guided and supported effectively and their learning experience improved.
In order to achieve the purpose, the invention adopts the technical scheme that: an online learner emotion recognition method, comprising:
(1) constructing a regional attention network model based on a transfer learning method;
(2) training the model constructed in the step (1) by using an online learner expression database;
(3) performing emotion recognition on the input online learner expressions using the trained model.
As a preferred embodiment of the present invention, the regional attention network model in step (1) is constructed by retraining a regional attention network model pre-trained on the AffectNet data set and replacing the last layer of the pre-trained model with a fully connected layer with a plurality of outputs.
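As a purely illustrative sketch of this head-replacement step (PyTorch is used because it is the toolkit named later in the description; the attribute name `fc` and the seven-output head are assumptions, not part of the disclosure):

```python
# Minimal sketch of the transfer-learning step described above, assuming a
# pre-trained RAN backbone is available as a torch.nn.Module whose classifier
# is its last fully connected layer (the attribute name `fc` is an assumption).
import torch.nn as nn

def build_online_learner_model(pretrained_ran: nn.Module, num_emotions: int = 7) -> nn.Module:
    """Replace the final layer of a RAN model pre-trained on AffectNet
    with a new fully connected layer sized for the target emotion classes."""
    in_features = pretrained_ran.fc.in_features               # width feeding the old output layer
    pretrained_ran.fc = nn.Linear(in_features, num_emotions)  # new head; all other weights kept for fine-tuning
    return pretrained_ran
```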
Further preferably, in the training of the model in step (2), a coarse attention weight is estimated with a fully connected layer and a Sigmoid function; the weight of region i is computed as shown in formula 1:

μ_i = f(F_i^T · q_0)    (1)

where q_0 is the parameter of the fully connected layer, f denotes the Sigmoid function, i denotes a region of the divided picture, F_i denotes the features of the i-th region, and the superscript T denotes transposition;

all region features F_i are weighted by their attention weights to obtain a global feature F_m, as shown in formula 2:

F_m = ( Σ_{i=0..n} μ_i · F_i ) / ( Σ_{i=0..n} μ_i )    (2)

F_m represents the content of all face regions and serves as the final input to the classifier; n denotes the maximum value of i, i.e. the number of divided regions.
Further preferably, in the training of the model in step (2), refined attention weights of the region features are estimated by concatenating each region feature with the global feature and applying another fully connected layer; the new attention weight ν_i of the i-th region in the relation-attention module is shown in formula 3:

ν_i = f([F_i : F_m]^T · q_1)    (3)

where q_1 is the parameter of the fully connected layer and f denotes the Sigmoid function;

all region features, together with the coarse attention weights and the new attention weights obtained from the relation-attention module, are combined by weighting to obtain a new global feature P_RAN, as shown in formula 4:

P_RAN = ( Σ_{i=0..n} μ_i · ν_i · F_i ) / ( Σ_{i=0..n} μ_i · ν_i )    (4)

P_RAN is used as the final representation of the proposed RAN method; ν_i denotes the further refined attention weight of the i-th region.
Further preferably, the model is optimized during training with a region bias loss, as shown in formula 5:

L_RB = max{0, α − (μ_max − μ_0)}    (5)

where α is a hyperparameter used as a margin, μ_0 is the attention weight of the original face image, and μ_max denotes the maximum weight among all local regions.
Further preferably, the online learner expression database contains facial expressions labelled with academic emotions.
Further preferably, the facial expressions are divided into four occlusion conditions: no occlusion, left-face occlusion, middle occlusion and right-face occlusion.
Further preferably, the academic emotions comprise at least the seven categories of neutral, curious, happy, confused, depressed, distracted and tired.
The invention provides a facial expression recognition method with transfer learning based on a regional attention network. The deep network structure learns end-to-end from the input image and reduces the influence of occlusion. The regional attention network architecture automatically perceives occluded face regions and performs feature learning and occlusion-region encoding simultaneously through attention weights. The transfer learning method improves the accuracy and training speed of the model on the database built by the invention. The invention first retrains a RAN model pre-trained on the AffectNet data set, adjusts the network structure of the output layer, and continuously optimizes the training process based on the softmax function and the ADAM optimizer; the resulting deep model achieves 89% expression recognition accuracy on the test set.
The automatic online learner emotion recognition technique provided by the invention can be applied to real online courses to automatically recognize learners' emotions during the learning process, allowing instructors to intervene and guide learners in a timely and effective manner and improving the effectiveness and experience of online learning.
Drawings
FIG. 1 shows some examples from the online learner expression database built in an embodiment of the present invention;
FIG. 2 is the confusion matrix for expressions without occlusion;
FIG. 3 is the confusion matrix for expressions with hand occlusion;
FIG. 4 is a flowchart of an expression recognition method according to the present invention;
FIG. 5 is a schematic diagram of a regional attention network architecture;
FIG. 6 is a graph of the variation of model accuracy over the training set and validation set in accordance with the present invention;
FIG. 7 is a graph of the variation of the model loss function of the present invention over the training set and the validation set.
Detailed Description
In order to facilitate an understanding of the invention, the invention is described in more detail below with reference to the accompanying drawings and specific examples. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
The invention provides an emotion recognition method for an online learner, which comprises the following specific steps and implementation processes:
I. Establishing an online learner expression database
The facial expression database is the basis of expression recognition research. When evaluating and benchmarking different expression recognition algorithms, experiments are performed on standardized databases so that meaningful comparisons can be made. Current research on expressions is mostly based on Ekman's theory of emotion, in which the six widely accepted basic expressions are happiness, sadness, anger, surprise, disgust and fear (Ekman P. An argument for basic emotions [J]. Cognition & Emotion, 1992, 6(3-4):169-200).
The existing literature already provides many large facial expression databases. Most expression libraries include some of the following emotions: anger, disgust, fear, happiness, sadness, surprise and contempt. Some databases include facial expressions of embarrassment, pride, shame or compassion, and some also contain body expressions. In general, to evaluate and benchmark different facial expression analysis algorithms, standardized databases are required to make meaningful comparisons; without comparative testing on such standard databases, it is difficult to identify the relative strengths and weaknesses of different facial expression recognition algorithms. Therefore, given the particularity of the online learning environment, the invention establishes an online learner expression database with hand occlusion conditions.
1. Collecting expression data
(1) Participants and mood inducing stimuli
Thirty volunteers (17 women and 13 men, average age 22.5 years) were recruited for the present invention. They were all healthy college students; all were informed of the purpose of the experiment and signed an informed consent form before the experiment began. The study collects image data of online learners, and information related to personal identity is kept confidential. In order to collect expressions that appear spontaneously during the learning process, no stimuli were prepared in advance; the volunteers participating in the experiment were free to choose online courses to study according to their professional field, learning style and course schedule.
(2) Recording environment and recording equipment
In order to obtain more spontaneous expressions, the experiment imposed no restrictions on the volunteers: no limits on head movement or hand movement, and free sitting posture. The volunteers could freely choose their preferred and habitual online learning environments for recording. The invention therefore covers most online learning scenes, such as dormitories, laboratories, study rooms, libraries, and living rooms or bedrooms at home, and the lighting conditions in these environments differ.
In order to reflect the real situation as closely as possible, the invention does not use additional equipment; each recording is made with the camera of the electronic device the volunteer uses for online learning, generally the camera in the middle of the top edge of a laptop display. In addition, some participants used a laptop model whose camera is hidden in the keyboard and pops up when pressed. Because the camera positions differ, the angles of the recorded videos differ, and because each volunteer's laptop configuration differs, so do the camera resolutions: the recorded videos have resolutions of 1920 × 1080, 1280 × 720 and 640 × 480.
2. Data processing and labeling
Since the experiments designed by the invention are non-invasive, the invention includes a self-annotation form for the subjective assessment of emotion. The form lists seven academic emotions with high frequency of occurrence (neutral, curious, happy, confused, depressed, distracted and tired) as well as four occlusion conditions (no occlusion, left-face occlusion, middle occlusion and right-face occlusion). After the expression videos were recorded, the volunteers were asked to record the expressions appearing in their video together with the start and end time of each; besides the expressions listed in the form, they could also add other expressions as needed.
After the expression video data and the volunteers' self-report forms were collected, the videos were processed in turn with the FormatFactory tool. The expression video collected for each volunteer lasts between one and two hours and basically covers the various expressions under different conditions. Each person's video was manually segmented into different expression clips according to the expression labels in the self-report form. Each clip lasts between 3 and 10 seconds and shows only one distinct expression; the frame rate of the videos is either 24 or 30 frames per second. Each volunteer contributed between 28 and 61 expression clips, yielding 786 expression video clips in total. The clips were processed further and key frames with expressions were extracted, giving 92947 pictures in total. The number of pictures for each expression is shown in Table 1. Fig. 1 shows some expression examples; each row represents the same occlusion condition and each column represents the same expression.
TABLE 1 Online learner hand-occlusion expression database

Expression     No occlusion   Left face occluded   Middle occluded   Right face occluded   Total
Neutral        6127           4311                 4132              5080                  19650
Curiosity      2789           1954                 2052              1849                  8644
Happiness      4315           2400                 2256              2216                  11187
Confusion      4214           3418                 2786              3331                  13749
Depression     4153           2281                 2249              2167                  10850
Distraction    5300           2904                 2692              3337                  14233
Fatigue        4610           3341                 2625              4058                  14634
Total          31508          20609                18792             22038                 92947
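The clip segmentation and key-frame extraction described above could, for example, be scripted as in the following sketch; the .mp4 file layout and the fixed sampling stride are assumptions for illustration rather than the procedure actually used:

```python
# Illustrative sketch of the key-frame extraction described above, assuming
# each expression clip is an .mp4 file and that sampling every k-th frame is
# an acceptable approximation of "key frames with expressions".
import os
import cv2

def extract_frames(clip_path: str, out_dir: str, stride: int = 5) -> int:
    """Save every `stride`-th frame of a clip as a JPEG; return the count saved."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(clip_path)
    saved, index = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % stride == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{index:05d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved
```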
3. Validity verification of the expression labels in the database
In order to ensure the validity of the labels given by the volunteers in the experiment, the invention uses a confusion-matrix method to evaluate the reliability of the expression labels in the established database. Three trained annotators completed this work.
The collected expression video clips were randomly shuffled, each video file was renamed with a numeric code, and the clips were then presented to the annotators. Each annotator watched every expression video clip and then selected an expression label for that video. After the annotators' data were collected, they were statistically analyzed and confusion matrices were calculated. FIG. 2 shows the confusion matrix for the case of no occlusion and FIG. 3 the confusion matrix for the case of hand occlusion. The columns of the confusion matrix represent the volunteers' self-reported expressions, and the rows show the average percentage of expression labels given by the annotators; the diagonal values represent the agreement between the volunteers' self-reported labels and the labels given by the annotators.
The confusion matrices show good internal consistency between the labels given by different annotators, so the image labels in the facial expression database established by the invention have high reliability. Analysis also shows that happiness is the expression recognized most easily and accurately. Hand occlusion has a clear influence on expression recognition: under occlusion, the recognition rate of every expression drops to some degree.
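For illustration, the annotator-agreement statistics behind FIG. 2 and FIG. 3 could be computed roughly as follows; the label names, the data layout and the use of scikit-learn are assumptions of this sketch:

```python
# Hypothetical sketch of the agreement analysis: each cell is normalized per
# self-reported expression, giving the share of annotator labels it received.
from sklearn.metrics import confusion_matrix

LABELS = ["neutral", "curious", "happy", "confused", "depressed", "distracted", "tired"]

def agreement_matrix(self_reported, annotator_labels):
    """self_reported and annotator_labels are parallel lists of label strings."""
    cm = confusion_matrix(self_reported, annotator_labels, labels=LABELS)
    return cm / cm.sum(axis=1, keepdims=True)  # rows: self-reported, columns: annotator labels
```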
II. Automatic facial expression recognition with transfer learning based on a regional attention network
Among the several publicly available deep learning models, and in view of the hand-occlusion problem, the invention provides an automatic facial expression recognition method with transfer learning based on the Regional Attention Network (RAN) model. The method learns end-to-end from the input image, with targeted design of specific details of the RAN such as the network structure, loss function and optimization strategy; it reduces the dependence on facial geometric features and other preprocessing techniques and reduces the influence of hand occlusion.
The RAN model has performed well on the AffectNet data set, which is available for scientific research. AffectNet is the largest facial expression data set to date, providing both categorical annotations and valence annotations. The data set contains about one million images downloaded from the internet, covering people of different genders, ages, countries and ethnicities. The researchers collected the pictures from three major search engines (Google, Bing and Yahoo) using 1250 emotion-related keywords in six different languages (English, Spanish, Portuguese, German, Arabic and Farsi).
The basic flow of the proposed automatic facial expression recognition method with transfer learning based on the regional attention network is shown in FIG. 4. The method comprises two stages: training and testing.
In the training stage, the invention retrains, on the basis of transfer learning, the RAN model pre-trained on the AffectNet data set and replaces the last layer of the RAN pre-trained model with a fully connected layer with seven outputs. Training is performed on the online learner expression database with hand occlusion using a softmax activation function and the ADAM optimizer, and continuous optimization yields a deep model for online learner expression recognition under hand occlusion. In the testing stage, the deep model is used to predict the emotion of the online learner in the input test sample.
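A minimal sketch of this fine-tuning stage might look as follows, assuming the adapted RAN model from step (1) and a DataLoader over the hand-occlusion database; nn.CrossEntropyLoss applies the softmax internally, and the learning rate shown is illustrative rather than the setting disclosed later:

```python
# A minimal fine-tuning loop for the training stage, assuming `model` is the
# adapted RAN and `train_loader` yields (images, labels) batches.
import torch
import torch.nn as nn

def finetune(model, train_loader, epochs: int = 10, lr: float = 1e-3, device: str = "cuda"):
    model.to(device).train()
    criterion = nn.CrossEntropyLoss()                      # log-softmax + NLL, i.e. softmax classification
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)        # classification loss on the 7 emotion classes
            loss.backward()
            optimizer.step()
    return model
```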
1. Regional attention network architecture
The RAN is composed of a feature extraction module, a region attention module and a relation-attention module. The latter two modules learn coarse and refined attention weights, respectively, and are optimized together with the global representation. The RAN learns the attention weights of different regions in an end-to-end manner, adaptively captures the importance of different facial region features, and makes a reasonable trade-off between regional and global features. The basic framework of the regional attention network is shown in FIG. 5.
The input facial expression image is encoded by a VGG16 network, and the feature map of the last convolutional layer is divided into 24 local regions using a region decomposition method (Wang K, Peng X, Yang J, et al. Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition [J]. IEEE Transactions on Image Processing, 2020, 29:4057-4069). Each local region is processed by a PG-unit, which encodes the region as a feature vector and evaluates its information content with an attention mechanism. The complete feature map is encoded by the GG-unit and weighted as a feature vector. Two fully connected layers follow, and finally the parameters of the whole network are learned by minimizing the softmax loss.
The RAN can automatically perceive occluded face regions and focuses primarily on unoccluded, information-rich regions. Each local region in the RAN is assigned a weight according to the presence of occlusion and its importance for expression recognition, and these adaptive weights are learned continuously during network training; the more occluded a region is, the lower its weight. Each unit can learn occlusion patterns from the data, encode them with the model weights, and assign different weights to unoccluded images. The weighted local features and the global feature are then concatenated and fed into the classification module. The advantage of the RAN is that no explicit handling of occlusion is required, which avoids detection or repair errors. Moreover, the RAN unifies the two tasks of feature learning and occlusion-region encoding in an end-to-end network architecture, and the two tasks promote each other during training.
2. Design of loss function
The region attention module applies an FC layer and a Sigmoid function to the region features to estimate coarse attention weights; the weight of region i is computed as shown in formula 1.

μ_i = f(F_i^T · q_0)    (1)

where q_0 is the parameter of the FC layer and f denotes the Sigmoid function. At this stage, the invention aggregates all region features with their attention weights into a global feature denoted F_m, as shown in formula 2.

F_m = ( Σ_{i=0..n} μ_i · F_i ) / ( Σ_{i=0..n} μ_i )    (2)

F_m can be used as the final input to the classifier.
The region attention module learns weights from individual region features with a non-linear mapping. Since F_m represents the content of all facial regions, the attention weights can be further refined by modeling the relationship between the region features and this global representation.
The present invention concatenates each region feature with the global feature and applies another FC layer to estimate refined attention weights. The new attention weight ν_i of the i-th region in the relation-attention module is shown in formula 3.

ν_i = f([F_i : F_m]^T · q_1)    (3)

where q_1 is the parameter of the FC layer and f denotes the Sigmoid function. At this stage, the invention weights all the region features with the coarse attention weights and the new attention weights obtained from the relation-attention module to obtain a new global feature P_RAN, as shown in formula 4.

P_RAN = ( Σ_{i=0..n} μ_i · ν_i · F_i ) / ( Σ_{i=0..n} μ_i · ν_i )    (4)

P_RAN is used as the final representation of the RAN method proposed by the present invention.
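A minimal PyTorch sketch of the aggregation in formulas 1-4 is given below for illustration; it assumes the region features have already been extracted as a tensor of shape (batch, n_regions, dim), and the normalization by the summed weights is an assumption of this sketch:

```python
# Sketch of the self-attention (formulas 1-2) and relation-attention
# (formulas 3-4) aggregation over pre-extracted region features.
import torch
import torch.nn as nn

class RegionAggregation(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q0 = nn.Linear(dim, 1)        # self-attention FC, formula 1
        self.q1 = nn.Linear(2 * dim, 1)    # relation-attention FC, formula 3

    def forward(self, F):                  # F: (batch, n_regions, dim)
        mu = torch.sigmoid(self.q0(F))                          # (batch, n, 1), formula 1
        Fm = (mu * F).sum(1) / mu.sum(1).clamp(min=1e-6)         # (batch, dim), formula 2
        Fm_rep = Fm.unsqueeze(1).expand_as(F)                    # broadcast global feature to each region
        nu = torch.sigmoid(self.q1(torch.cat([F, Fm_rep], -1)))  # (batch, n, 1), formula 3
        w = mu * nu
        P = (w * F).sum(1) / w.sum(1).clamp(min=1e-6)            # (batch, dim), formula 4
        return P, mu, nu
```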
Different facial expressions are mainly defined by different facial regions, so the present invention places a direct constraint on the self-attention weights, namely the region bias loss (RB-Loss). This constraint requires that the largest attention weight among the local facial regions be larger than that of the original face image by a margin. The formal definition of RB-Loss is shown in formula 5.

L_RB = max{0, α − (μ_max − μ_0)}    (5)

where α is a hyperparameter used as a margin, μ_0 is the attention weight of the original face image, and μ_max denotes the maximum weight among all local regions.
In training, the classification loss is jointly optimized with the region bias loss. The proposed RB-Loss enhances the effect of regional attention and allows the RAN to obtain higher weights for regional and global representations.
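For illustration, the region bias loss of formula 5 could be written as follows, assuming the attention weights μ are available as a tensor whose index 0 holds the weight of the original face image:

```python
# Sketch of the region bias loss (formula 5), assuming mu has shape
# (batch, n_regions, 1) with index 0 holding the original face image's weight.
import torch

def region_bias_loss(mu: torch.Tensor, alpha: float = 0.02) -> torch.Tensor:
    mu = mu.squeeze(-1)                    # (batch, n_regions)
    mu0 = mu[:, 0]                         # weight of the original face image
    mu_max = mu[:, 1:].max(dim=1).values   # maximum weight among the local regions
    return torch.clamp(alpha - (mu_max - mu0), min=0).mean()
```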
3. Model parameter setting
Parameter settings also have a great influence on the effect of a deep learning algorithm; appropriate parameters help accelerate training of the deep model and improve its performance. During joint training with RB-Loss and the cross-entropy loss, the default loss weight ratio is 1:1. During training on the data set, the learning rate is initialized to 0.01, the momentum is set to 0.9, the weight decay is set to 0.0005, and the margin in RB-Loss defaults to 0.02. In the training stage, the invention sets the batch size to 128 and the maximum number of iterations to 50K. For optimization, the invention tried different strategies and finally adopted an ADAM optimizer with a learning-rate decay of 0.005, which performed better in the experiments.
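One possible, non-authoritative reading of these settings in code is sketched below; the description mixes SGD-style values (momentum, 0.0005 weight decay) with a final ADAM optimizer and a 0.005 decay, so whether that decay applies to the learning rate or the weights is treated here as an assumption:

```python
# Illustrative translation of the stated settings; values and scheduler choice
# follow the text above, while the interpretation of the 0.005 decay is assumed.
import torch

def make_optimizer_and_scheduler(model):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=1.0 - 0.005)
    return optimizer, scheduler

LOSS_WEIGHTS = {"cross_entropy": 1.0, "rb_loss": 1.0}  # default 1:1 joint training
RB_MARGIN = 0.02                                        # alpha in formula 5
BATCH_SIZE = 128
MAX_ITERS = 50_000
```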
4. Evaluation of the model
In order to evaluate the accuracy and feasibility of the method for online learner emotion prediction, model training was performed on an Intel Xeon E5-1650 v4 CPU with two NVIDIA GeForce GTX 1080 Ti GPUs (11 GB each) and 32 GB of RAM. The experimental environment was built on the Ubuntu system with the Python language and the PyTorch toolkit. In addition, PyTorch integrates with TensorBoard, a tool for visualizing the results of neural network training that can provide feedback for evaluating model performance and adjusting the training process.
The established online learner expression database with hand occlusion is divided into a training set, a validation set and a test set in the ratio 8:1:1. The model is trained on the training set and simultaneously validated on the validation set; 300 epochs are trained from scratch, and the accuracy and loss on the training and validation sets gradually stabilize.
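The 8:1:1 split could be produced, for example, as in the following sketch, assuming the 92947 pictures are wrapped in a torch Dataset; the fixed random seed is an assumption added for reproducibility:

```python
# Sketch of the 8:1:1 split over a torch.utils.data.Dataset.
import torch
from torch.utils.data import random_split

def split_dataset(dataset):
    n = len(dataset)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    n_test = n - n_train - n_val          # remainder goes to the test set
    return random_split(dataset, [n_train, n_val, n_test],
                        generator=torch.Generator().manual_seed(42))
```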
In terms of model accuracy, analysis of FIG. 6 shows that after 100 epochs of training the model already reaches a recognition accuracy of 80%. After a further 200 epochs, the accuracy improves slowly to nearly 90%. The variation of the loss function during training is shown in FIG. 7: during the first 100 epochs the loss drops rapidly to a very low value, and over the following 200 epochs it decreases by only about a further 0.001.
The expression recognition model obtained after 300 epochs of training achieves an average expression recognition accuracy of 89% on the test set.
5. Comparative experiment
In addition to the proposed method, the invention uses three traditional facial expression recognition methods to perform self-labelled academic emotion inference experiments on the created online learner hand-occlusion expression database, and compares the resulting facial expression recognition accuracies with the result obtained by the proposed method.
Experiment one: the AAM-DTC algorithm uses an Active Appearance Model (AAM) for feature extraction and a Decision Tree Classifier (DTC) for classification. The results of experiment one show that hand occlusion has a large influence on this recognition method.
In view of the low recognition rate in experiment one, experiment two (the LGBPHS-DTC algorithm) replaces the feature extraction algorithm, adopting the Local Gabor Binary Pattern Histogram Sequence (LGBPHS). Although the result of experiment two improves on experiment one to a certain extent, it is still not ideal.
Experiment three: the LGBPHS-KNN algorithm keeps the feature extraction algorithm of experiment two and selects a K-Nearest Neighbour (KNN) classifier instead. Although the recognition accuracy improves further, the results are still not satisfactory.
Compared with the traditional algorithms, the automatic facial expression recognition method with transfer learning based on the regional attention network overcomes the influence of hand occlusion to a certain extent and clearly improves expression recognition accuracy, with an effect close to that of manual annotation. The experimental results are shown in Table 2.
TABLE 2 Comparison of results of different expression recognition methods

Method                    Accuracy
AAM-DTC                   29%
LGBPHS-DTC                66%
LGBPHS-KNN                80%
Method of the invention   89%
The invention provides a facial expression recognition method with transfer learning based on a regional attention network. The deep network structure learns end-to-end from the input image and reduces the influence of hand occlusion. The regional attention network architecture automatically perceives occluded face regions and performs feature learning and occlusion-region encoding simultaneously through attention weights. The transfer learning method improves the accuracy and training speed of the model on the database built by the invention.

Claims (8)

1. An online learner emotion recognition method, comprising:
(1) constructing a regional attention network model based on a transfer learning method;
(2) training the model constructed in the step (1) by using an online learner expression database;
(3) performing emotion recognition on the input online learner expressions using the trained model.
2. The emotion recognition method for online learners according to claim 1, wherein the regional attention network model in step (1) is constructed by: retraining the pre-trained regional attention network model on the AffectNet data set, and replacing the last layer of the pre-trained model with a full connection layer with a plurality of outputs.
3. The online learner emotion recognition method of claim 2, wherein the training of the model in step (2) estimates the attention weight μ_i of each region feature using a fully connected layer and a Sigmoid function, calculated as shown in formula 1:

μ_i = f(F_i^T · q_0)    (1)

where q_0 is the parameter of the fully connected layer, f denotes the Sigmoid function, i denotes a region of the divided picture, F_i denotes the features of the i-th region, and the superscript T denotes transposition;

all region features F_i are weighted by their attention weights to obtain a global feature F_m, as shown in formula 2:

F_m = ( Σ_{i=0..n} μ_i · F_i ) / ( Σ_{i=0..n} μ_i )    (2)

F_m represents the content of all face regions and serves as the final input to the classifier; n denotes the maximum value of i, i.e. the number of divided regions.
4. The online learner emotion recognition method of claim 3, wherein, in the training of the model in step (2), refined attention weights of the region features are estimated by concatenating each region feature with the global feature and applying another fully connected layer; the new attention weight ν_i of the i-th region in the relation-attention module is shown in formula 3:

ν_i = f([F_i : F_m]^T · q_1)    (3)

where q_1 is the parameter of the fully connected layer, f denotes the Sigmoid function, and ν_i denotes the further refined attention weight of the i-th region;

all region features, together with the coarse attention weights and the new attention weights obtained from the relation-attention module, are combined by weighting to obtain a new global feature P_RAN, as shown in formula 4:

P_RAN = ( Σ_{i=0..n} μ_i · ν_i · F_i ) / ( Σ_{i=0..n} μ_i · ν_i )    (4)

P_RAN is used as the final representation of the proposed RAN method.
5. The online learner emotion recognition method of claim 4, wherein the model is optimized during training using a region bias loss, as shown in formula 5:

L_RB = max{0, α − (μ_max − μ_0)}    (5)

where α is a hyperparameter used as a margin, μ_0 is the attention weight of the original face image, and μ_max denotes the maximum weight among all local regions.
6. The online learner emotion recognition method of any one of claims 1 to 5, wherein the online learner expression database contains facial expressions with academic emotions.
7. The method of claim 6, wherein the facial expressions are classified into four cases, i.e. no occlusion, left occlusion, middle occlusion, and right occlusion, according to the occlusion situation.
8. The method of claim 6, wherein the academic emotion is at least one of neutral, curious, happy, confused, depressed, distracted and tired.
CN202110408590.1A 2021-04-16 2021-04-16 Online learner emotion recognition method Pending CN113158872A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110408590.1A CN113158872A (en) 2021-04-16 2021-04-16 Online learner emotion recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110408590.1A CN113158872A (en) 2021-04-16 2021-04-16 Online learner emotion recognition method

Publications (1)

Publication Number Publication Date
CN113158872A true CN113158872A (en) 2021-07-23

Family

ID=76868026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110408590.1A Pending CN113158872A (en) 2021-04-16 2021-04-16 Online learner emotion recognition method

Country Status (1)

Country Link
CN (1) CN113158872A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110837777A (en) * 2019-10-10 2020-02-25 天津大学 Partial occlusion facial expression recognition method based on improved VGG-Net
CN111178242A (en) * 2019-12-27 2020-05-19 上海掌学教育科技有限公司 Student facial expression recognition method and system for online education
CN111797683A (en) * 2020-05-21 2020-10-20 台州学院 Video expression recognition method based on depth residual error attention network
CN112101241A (en) * 2020-09-17 2020-12-18 西南科技大学 Lightweight expression recognition method based on deep learning
CN112307958A (en) * 2020-10-30 2021-02-02 河北工业大学 Micro-expression identification method based on spatiotemporal appearance movement attention network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KAI WANG ET AL.: "Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition", 《JOURNAL OF LATEX CLASS FILES》 *
YONG LI ET AL.: "Occlusion aware facial expression recognition using cnn with attention mechanism", 《IEEE TRANSACTIONS ON IMAGE PROCESSING》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114913461A (en) * 2022-05-19 2022-08-16 广东电网有限责任公司 Emotion recognition method and device, terminal equipment and computer readable storage medium
CN115439915A (en) * 2022-10-12 2022-12-06 首都师范大学 Classroom participation identification method and device based on region coding and sample balance optimization

Similar Documents

Publication Publication Date Title
Dewan et al. Engagement detection in online learning: a review
Ding et al. A depression recognition method for college students using deep integrated support vector algorithm
CN110148318B (en) Digital teaching assistant system, information interaction method and information processing method
Dewan et al. A deep learning approach to detecting engagement of online learners
Shen et al. Assessing learning engagement based on facial expression recognition in MOOC’s scenario
Chen et al. Analyze spontaneous gestures for emotional stress state recognition: A micro-gesture dataset and analysis with deep learning
CN109902912B (en) Personalized image aesthetic evaluation method based on character features
Bu Human motion gesture recognition algorithm in video based on convolutional neural features of training images
CN115239527A (en) Teaching behavior analysis system for teaching characteristic fusion and modeling based on knowledge base
Wei et al. Bioinspired visual-integrated model for multilabel classification of textile defect images
Aslan et al. Multimodal video-based apparent personality recognition using long short-term memory and convolutional neural networks
CN112529054B (en) Multi-dimensional convolution neural network learner modeling method for multi-source heterogeneous data
CN113158872A (en) Online learner emotion recognition method
Nandi et al. A survey on multimodal data stream mining for e-learner’s emotion recognition
ALISAWI et al. Real-Time Emotion Recognition Using Deep Learning Methods: Systematic Review
AU2020101294A4 (en) Student’s physiological health behavioural prediction model using svm based machine learning algorithm
Luo et al. 3DLIM: Intelligent analysis of students’ learning interest by using multimodal fusion technology
Maddu et al. Online learners’ engagement detection via facial emotion recognition in online learning context using hybrid classification model
Tan et al. Towards automatic engagement recognition of autistic children in a machine learning approach
Song Emotional recognition and feedback of students in English e-learning based on computer vision and face recognition algorithms
Fu et al. Affective computation of students’ behaviors under classroom scenes
Xiaoning Application of artificial neural network in teaching quality evaluation
Li Learning behaviour recognition method of English online course based on multimodal data fusion
Huang et al. [Retracted] Analysis of Ideological and Political Classroom Based on Artificial Intelligence and Data Mining Technology
Qi Interaction and psychological characteristics of art teaching based on Openpose and Long Short-Term Memory network

Legal Events

Code   Title
PB01   Publication
SE01   Entry into force of request for substantive examination
RJ01   Rejection of invention patent application after publication (application publication date: 20210723)