CN113158872A - Online learner emotion recognition method - Google Patents

Online learner emotion recognition method

Info

Publication number
CN113158872A
CN113158872A (application number CN202110408590.1A)
Authority
CN
China
Prior art keywords
online
learner
model
attention
expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110408590.1A
Other languages
Chinese (zh)
Inventor
吕伟刚
吕立
张雅
池梦娅
刘鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China
Priority to CN202110408590.1A
Publication of CN113158872A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention belongs to the technical field of online education and expression recognition, and in particular relates to an emotion recognition method for online learners. The method comprises: (1) constructing a regional attention network model based on transfer learning; (2) training the model constructed in step (1) with an online learner expression database; (3) performing emotion recognition on input online learner expressions using the trained model. The automatic emotion recognition technique provided by the invention can be applied to real online courses to automatically recognize learners' emotions during the learning process, allowing instructors to intervene and guide learners in a timely and effective manner and improving the effectiveness and experience of online learning.

Description

Online learner emotion recognition method
Technical Field
The invention belongs to the technical field of online education and expression recognition, and particularly relates to an emotion recognition method for online learners.
Background
With the spread of the internet and the development of digital media technology, online learning has become an option for more and more people. At the same time, online learning exposes some problems. In a traditional classroom, the teacher and the learners share the same physical space; eye contact, facial expressions, posture and gestures all convey important information, and the teacher can adjust instruction according to the learners' emotional expression. In an online learning environment, learning content is usually presented directly by electronic devices, and because of the separation in time and space and the lack of interaction, the learner's emotions receive little attention from the teacher. Many studies in neuroscience, psychology and education have shown that emotion directly influences cognitive processes, especially learning (Meneses A, Liy-Salmeron G. Serotonin and emotion, learning and memory [J]. Reviews in the Neurosciences, 2012, 23(5):543-553; an overview of research progress on the relation between emotion and cognition [J]. Psychological Science, 2004(01):241-243; Ainley M. Students, tasks and emotions: identifying the contribution of emotions to students' reading of popular culture and popular science texts [J]. Learning and Instruction, 433-447). Positive emotional states such as happiness and curiosity contribute to the quality of learning, while negative states such as anxiety, fear and sadness can hinder the learning process. It is therefore very necessary to recognize students' emotions in the online learning environment.
At present, many scholars at home and abroad are devoted to this problem. Common emotion recognition methods include direct inquiry, questionnaires, latent-parameter tracking, physiological-signal recognition, speech recognition, expression recognition and gesture recognition.
Questionnaires are a widely used method of learner emotion recognition. Lopatovska, in studying the role of affective factors in information search, asked participants to complete an achievement emotion questionnaire after finishing a search task in order to detect the emotions experienced during the search (Lopatovska I. Searching for good mood: examining relationships between search task and mood [J]. Proceedings of the American Society for Information Science & Technology, 2015, 46(1):1-13). Monitoring physiological signals is another approach to emotion recognition. Chen et al. used the emWave system to identify learners' emotions when studying the effect of multimedia materials on learning emotion and performance (Chen C M, Wang H P. Using emotion recognition technology to assess the effects of different multimedia materials on learning emotion and performance [J]. Library & Information Science Research, 2011, 33(3):244-255). The system, developed by the HeartMath Institute in the United States, uses human pulse signals to distinguish three emotional states: negative, calm and positive. In addition, many studies attempt to detect learners' emotions automatically with a computer and then make targeted adjustments so that the learning process better matches learners' actual needs. Faria et al. proposed an emotional feedback model and developed a software prototype that uses affective computing techniques to adjust students' learning processes (Faria A R, Almeida A, Martins C, et al. A global perspective on an emotional learning model proposal [J]. Telematics & Informatics, 2016, 34(6):824-837). Lee proposed a research and application framework for emotion-based text interaction recognition, analyzed learners' emotions in an online learning environment, and gave corresponding emotion-adjustment strategies (Lee J M. Different types of human interaction in online learning environment [J]. Proceedings of the American Society for Information Science and Technology, 2006, 43(1):1-11). Ouherou et al. compared the facial expressions of children with and without learning disabilities in a virtual learning environment, detecting the children's emotions through facial expression analysis while the two groups played an educational game (Ouherou N, Elhammoumi O, Benmarrakchi F, et al. Comparative study on emotions analysis from facial expressions in children with and without learning disabilities in virtual learning environment [J]. Education and Information Technologies, 2019, 24(2):1777-1792). Nezami et al. proposed a deep learning model that detects student engagement in online learning environments using expression recognition (Nezami O M, Dras M, Hamey L, et al. Automatic recognition of student engagement using deep learning and facial expression [C]. Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2019:273-289). However, the expressions studied in these works are all basic expressions of joy, sadness, anger, surprise, disgust and fear, and learning scenarios are not specifically addressed. The expressions that appear during learning differ from the common basic expressions; for example, curiosity, confusion, distraction, fatigue and anxiety, which frequently occur while learning, differ greatly from the basic expressions and require further research.
Expression recognition technology can collect learners' facial expression data non-invasively and then perform automatic emotion recognition, and it is increasingly applied in the field of education (for example, Chinese theses on student classroom-concentration evaluation models based on face recognition technology, 2020). However, owing to the particularity of the online learning environment, learners' faces are often partially occluded by their hands, which affects the accuracy of expression recognition to a certain extent (Sharma P, Joshi S, Gautam S, et al. Student Engagement Detection Using Emotion Analysis, Eye Tracking and Head Movement with Machine Learning [J]. Technology and Innovation in Learning, Teaching and Education, 2019:1-8). Research on occlusion in expression recognition has mainly focused on glasses, hats, scarves and the like, where the approximate position of the occlusion is predictable and its texture differs greatly from that of the face. The position of hand occlusion, by contrast, is difficult to predict because people have different movement habits, and the color and texture of hand skin are close to those of facial skin, so expression recognition under hand occlusion is more challenging.
Disclosure of Invention
The invention aims to provide an online learner emotion recognition method which accurately recognizes the emotions of online learners by establishing a learner expression database for the online learning environment and adopting an expression recognition method based on transfer learning, so that online learners can be guided and supported effectively and their learning experience improved.
In order to achieve the purpose, the invention adopts the technical scheme that: an online learner emotion recognition method, comprising:
(1) constructing a regional attention network model based on a transfer learning method;
(2) training the model constructed in the step (1) by using an online learner expression database;
(3) performing emotion recognition on the input online learner expressions using the trained model.
As a preferred embodiment of the present invention, the regional attention network model in step (1) is constructed by retraining a regional attention network model pre-trained on the AffectNet data set and replacing the last layer of the pre-trained model with a fully connected layer with a plurality of outputs.
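As a purely illustrative sketch of this head-replacement step (PyTorch is used because it is the toolkit named later in the description; the attribute name `fc` and the seven-output head are assumptions, not part of the disclosure):

```python
# Minimal sketch of the transfer-learning step described above, assuming a
# pre-trained RAN backbone is available as a torch.nn.Module whose classifier
# is its last fully connected layer (the attribute name `fc` is an assumption).
import torch.nn as nn

def build_online_learner_model(pretrained_ran: nn.Module, num_emotions: int = 7) -> nn.Module:
    """Replace the final layer of a RAN model pre-trained on AffectNet
    with a new fully connected layer sized for the target emotion classes."""
    in_features = pretrained_ran.fc.in_features               # width feeding the old output layer
    pretrained_ran.fc = nn.Linear(in_features, num_emotions)  # new head; all other weights kept for fine-tuning
    return pretrained_ran
```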
Further preferably, in the training of the model in step (2), a coarse attention weight is estimated with a fully connected layer and a Sigmoid function; the weight of region i is computed as shown in formula 1:

μ_i = f(F_i^T · q_0)    (1)

where q_0 is the parameter of the fully connected layer, f denotes the Sigmoid function, i denotes a region of the divided picture, F_i denotes the features of the i-th region, and the superscript T denotes transposition;

all region features F_i are weighted by their attention weights to obtain a global feature F_m, as shown in formula 2:

F_m = ( Σ_{i=0..n} μ_i · F_i ) / ( Σ_{i=0..n} μ_i )    (2)

F_m represents the content of all face regions and serves as the final input to the classifier; n denotes the maximum value of i, i.e. the number of divided regions.
Further preferably, in the training of the model in step (2), refined attention weights of the region features are estimated by concatenating each region feature with the global feature and applying another fully connected layer; the new attention weight ν_i of the i-th region in the relation-attention module is shown in formula 3:

ν_i = f([F_i : F_m]^T · q_1)    (3)

where q_1 is the parameter of the fully connected layer and f denotes the Sigmoid function;

all region features, together with the coarse attention weights and the new attention weights obtained from the relation-attention module, are combined by weighting to obtain a new global feature P_RAN, as shown in formula 4:

P_RAN = ( Σ_{i=0..n} μ_i · ν_i · F_i ) / ( Σ_{i=0..n} μ_i · ν_i )    (4)

P_RAN is used as the final representation of the proposed RAN method; ν_i denotes the further refined attention weight of the i-th region.
Further preferably, the model is optimized during training with a region bias loss, as shown in formula 5:

L_RB = max{0, α − (μ_max − μ_0)}    (5)

where α is a hyperparameter used as a margin, μ_0 is the attention weight of the original face image, and μ_max denotes the maximum weight among all local regions.
Further preferably, the online learner expression database contains facial expressions labelled with academic emotions.
Further preferably, the facial expressions are divided into four occlusion conditions: no occlusion, left-face occlusion, middle occlusion and right-face occlusion.
Further preferably, the academic emotions comprise at least the seven categories of neutral, curious, happy, confused, depressed, distracted and tired.
The invention provides a facial expression recognition method with transfer learning based on a regional attention network. The deep network structure learns end-to-end from the input image and reduces the influence of occlusion. The regional attention network architecture automatically perceives occluded face regions and performs feature learning and occlusion-region encoding simultaneously through attention weights. The transfer learning method improves the accuracy and training speed of the model on the database built by the invention. The invention first retrains a RAN model pre-trained on the AffectNet data set, adjusts the network structure of the output layer, and continuously optimizes the training process based on the softmax function and the ADAM optimizer; the resulting deep model achieves 89% expression recognition accuracy on the test set.
The automatic online learner emotion recognition technique provided by the invention can be applied to real online courses to automatically recognize learners' emotions during the learning process, allowing instructors to intervene and guide learners in a timely and effective manner and improving the effectiveness and experience of online learning.
Drawings
FIG. 1 shows some examples from the online learner expression database built in an embodiment of the present invention;
FIG. 2 is the confusion matrix for expressions without occlusion;
FIG. 3 is the confusion matrix for expressions with hand occlusion;
FIG. 4 is a flowchart of an expression recognition method according to the present invention;
FIG. 5 is a schematic diagram of a regional attention network architecture;
FIG. 6 is a graph of the variation of model accuracy over the training set and validation set in accordance with the present invention;
FIG. 7 is a graph of the variation of the model loss function of the present invention over the training set and the validation set.
Detailed Description
In order to facilitate an understanding of the invention, the invention is described in more detail below with reference to the accompanying drawings and specific examples. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
The invention provides an emotion recognition method for an online learner, which comprises the following specific steps and implementation processes:
I. Establishing an online learner expression database
The facial expression database is the basis of expression recognition research. When evaluating and benchmarking different expression recognition algorithms, experiments are performed on standardized databases so that meaningful comparisons can be made. Current research on expressions is mostly based on Ekman's theory of emotion, in which the six widely accepted basic expressions are happiness, sadness, anger, surprise, disgust and fear (Ekman P. An argument for basic emotions [J]. Cognition & Emotion, 1992, 6(3-4):169-200).
The existing literature already provides many large facial expression databases. Most expression libraries include some of the following emotions: anger, disgust, fear, happiness, sadness, surprise and contempt. Some databases include facial expressions of embarrassment, pride, shame or compassion, and some also contain body expressions. In general, to evaluate and benchmark different facial expression analysis algorithms, standardized databases are required to make meaningful comparisons; without comparative testing on such standard databases, it is difficult to identify the relative strengths and weaknesses of different facial expression recognition algorithms. Therefore, given the particularity of the online learning environment, the invention establishes an online learner expression database with hand occlusion conditions.
1. Collecting expression data
(1) Participants and mood inducing stimuli
Thirty volunteers (17 women and 13 men, average age 22.5 years) were recruited for the present invention. They were all healthy college students; all were informed of the purpose of the experiment and signed an informed consent form before the experiment began. The study collects image data of online learners, and information related to personal identity is kept confidential. In order to collect expressions that appear spontaneously during the learning process, no stimuli were prepared in advance; the volunteers participating in the experiment were free to choose online courses to study according to their professional field, learning style and course schedule.
(2) Recording environment and recording equipment
In order to obtain more spontaneous expressions, the experiment imposed no restrictions on the volunteers: no limits on head movement or hand movement, and free sitting posture. The volunteers could freely choose their preferred and habitual online learning environments for recording. The invention therefore covers most online learning scenes, such as dormitories, laboratories, study rooms, libraries, and living rooms or bedrooms at home, and the lighting conditions in these environments differ.
In order to reflect the real situation as closely as possible, the invention does not use additional equipment; each recording is made with the camera of the electronic device the volunteer uses for online learning, generally the camera in the middle of the top edge of a laptop display. In addition, some participants used a laptop model whose camera is hidden in the keyboard and pops up when pressed. Because the camera positions differ, the angles of the recorded videos differ, and because each volunteer's laptop configuration differs, so do the camera resolutions: the recorded videos have resolutions of 1920 × 1080, 1280 × 720 and 640 × 480.
2. Data processing and labeling
Since the experiments designed by the invention are non-invasive, the invention includes a self-annotation form for the subjective assessment of emotion. The form lists seven academic emotions with high frequency of occurrence (neutral, curious, happy, confused, depressed, distracted and tired) as well as four occlusion conditions (no occlusion, left-face occlusion, middle occlusion and right-face occlusion). After the expression videos were recorded, the volunteers were asked to record the expressions appearing in their video together with the start and end time of each; besides the expressions listed in the form, they could also add other expressions as needed.
After the expression video data and the volunteers' self-report forms were collected, the videos were processed in turn with the FormatFactory tool. The expression video collected for each volunteer lasts between one and two hours and basically covers the various expressions under different conditions. Each person's video was manually segmented into different expression clips according to the expression labels in the self-report form. Each clip lasts between 3 and 10 seconds and shows only one distinct expression; the frame rate of the videos is either 24 or 30 frames per second. Each volunteer contributed between 28 and 61 expression clips, yielding 786 expression video clips in total. The clips were processed further and key frames with expressions were extracted, giving 92947 pictures in total. The number of pictures for each expression is shown in Table 1. Fig. 1 shows some expression examples; each row represents the same occlusion condition and each column represents the same expression.
TABLE 1 Online learner hand-occlusion expression database

Expression     No occlusion   Left face occluded   Middle occluded   Right face occluded   Total
Neutral        6127           4311                 4132              5080                  19650
Curiosity      2789           1954                 2052              1849                  8644
Happiness      4315           2400                 2256              2216                  11187
Confusion      4214           3418                 2786              3331                  13749
Depression     4153           2281                 2249              2167                  10850
Distraction    5300           2904                 2692              3337                  14233
Fatigue        4610           3341                 2625              4058                  14634
Total          31508          20609                18792             22038                 92947
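The clip segmentation and key-frame extraction described above could, for example, be scripted as in the following sketch; the .mp4 file layout and the fixed sampling stride are assumptions for illustration rather than the procedure actually used:

```python
# Illustrative sketch of the key-frame extraction described above, assuming
# each expression clip is an .mp4 file and that sampling every k-th frame is
# an acceptable approximation of "key frames with expressions".
import os
import cv2

def extract_frames(clip_path: str, out_dir: str, stride: int = 5) -> int:
    """Save every `stride`-th frame of a clip as a JPEG; return the count saved."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(clip_path)
    saved, index = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % stride == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{index:05d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved
```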
3. Validity verification of the expression labels in the database
In order to ensure the validity of the labels given by the volunteers in the experiment, the invention uses a confusion-matrix method to evaluate the reliability of the expression labels in the established database. Three trained annotators completed this work.
The collected expression video clips were randomly shuffled, each video file was renamed with a numeric code, and the clips were then presented to the annotators. Each annotator watched every expression video clip and then selected an expression label for that video. After the annotators' data were collected, they were statistically analyzed and confusion matrices were calculated. FIG. 2 shows the confusion matrix for the case of no occlusion and FIG. 3 the confusion matrix for the case of hand occlusion. The columns of the confusion matrix represent the volunteers' self-reported expressions, and the rows show the average percentage of expression labels given by the annotators; the diagonal values represent the agreement between the volunteers' self-reported labels and the labels given by the annotators.
The confusion matrices show good internal consistency between the labels given by different annotators, so the image labels in the facial expression database established by the invention have high reliability. Analysis also shows that happiness is the expression recognized most easily and accurately. Hand occlusion has a clear influence on expression recognition: under occlusion, the recognition rate of every expression drops to some degree.
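For illustration, the annotator-agreement statistics behind FIG. 2 and FIG. 3 could be computed roughly as follows; the label names, the data layout and the use of scikit-learn are assumptions of this sketch:

```python
# Hypothetical sketch of the agreement analysis: each cell is normalized per
# self-reported expression, giving the share of annotator labels it received.
from sklearn.metrics import confusion_matrix

LABELS = ["neutral", "curious", "happy", "confused", "depressed", "distracted", "tired"]

def agreement_matrix(self_reported, annotator_labels):
    """self_reported and annotator_labels are parallel lists of label strings."""
    cm = confusion_matrix(self_reported, annotator_labels, labels=LABELS)
    return cm / cm.sum(axis=1, keepdims=True)  # rows: self-reported, columns: annotator labels
```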
II. Automatic facial expression recognition with transfer learning based on a regional attention network
Among the several publicly available deep learning models, and in view of the hand-occlusion problem, the invention provides an automatic facial expression recognition method with transfer learning based on the Regional Attention Network (RAN) model. The method learns end-to-end from the input image, with targeted design of specific details of the RAN such as the network structure, loss function and optimization strategy; it reduces the dependence on facial geometric features and other preprocessing techniques and reduces the influence of hand occlusion.
The RAN model has performed well on the AffectNet data set, which is available for scientific research. AffectNet is the largest facial expression data set to date, providing both categorical annotations and valence annotations. The data set contains about one million images downloaded from the internet, covering people of different genders, ages, countries and ethnicities. The researchers collected the pictures from three major search engines (Google, Bing and Yahoo) using 1250 emotion-related keywords in six different languages (English, Spanish, Portuguese, German, Arabic and Farsi).
The basic flow of the proposed automatic facial expression recognition method with transfer learning based on the regional attention network is shown in FIG. 4. The method comprises two stages: training and testing.
In the training stage, the invention retrains, on the basis of transfer learning, the RAN model pre-trained on the AffectNet data set and replaces the last layer of the RAN pre-trained model with a fully connected layer with seven outputs. Training is performed on the online learner expression database with hand occlusion using a softmax activation function and the ADAM optimizer, and continuous optimization yields a deep model for online learner expression recognition under hand occlusion. In the testing stage, the deep model is used to predict the emotion of the online learner in the input test sample.
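A minimal sketch of this fine-tuning stage might look as follows, assuming the adapted RAN model from step (1) and a DataLoader over the hand-occlusion database; nn.CrossEntropyLoss applies the softmax internally, and the learning rate shown is illustrative rather than the setting disclosed later:

```python
# A minimal fine-tuning loop for the training stage, assuming `model` is the
# adapted RAN and `train_loader` yields (images, labels) batches.
import torch
import torch.nn as nn

def finetune(model, train_loader, epochs: int = 10, lr: float = 1e-3, device: str = "cuda"):
    model.to(device).train()
    criterion = nn.CrossEntropyLoss()                      # log-softmax + NLL, i.e. softmax classification
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)        # classification loss on the 7 emotion classes
            loss.backward()
            optimizer.step()
    return model
```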
1. Regional attention network architecture
The RAN is composed of a feature extraction module, a region attention module and a relation-attention module. The latter two modules learn coarse and refined attention weights, respectively, and are optimized together with the global representation. The RAN learns the attention weights of different regions in an end-to-end manner, adaptively captures the importance of different facial region features, and makes a reasonable trade-off between regional and global features. The basic framework of the regional attention network is shown in FIG. 5.
The input facial expression image is encoded by a VGG16 network, and the feature map of the last convolutional layer is divided into 24 local regions using a region decomposition method (Wang K, Peng X, Yang J, et al. Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition [J]. IEEE Transactions on Image Processing, 2020, 29:4057-4069). Each local region is processed by a PG-unit, which encodes the region as a feature vector and evaluates its information content with an attention mechanism. The complete feature map is encoded by the GG-unit and weighted as a feature vector. Two fully connected layers follow, and finally the parameters of the whole network are learned by minimizing the softmax loss.
The RAN can automatically perceive occluded face regions and focuses primarily on unoccluded, information-rich regions. Each local region in the RAN is assigned a weight according to the presence of occlusion and its importance for expression recognition, and these adaptive weights are learned continuously during network training; the more occluded a region is, the lower its weight. Each unit can learn occlusion patterns from the data, encode them with the model weights, and assign different weights to unoccluded images. The weighted local features and the global feature are then concatenated and fed into the classification module. The advantage of the RAN is that no explicit handling of occlusion is required, which avoids detection or repair errors. Moreover, the RAN unifies the two tasks of feature learning and occlusion-region encoding in an end-to-end network architecture, and the two tasks promote each other during training.
2. Design of loss function
The region attention module applies an FC layer and a Sigmoid function to the region features to estimate coarse attention weights; the weight of region i is computed as shown in formula 1.

μ_i = f(F_i^T · q_0)    (1)

where q_0 is the parameter of the FC layer and f denotes the Sigmoid function. At this stage, the invention aggregates all region features with their attention weights into a global feature denoted F_m, as shown in formula 2.

F_m = ( Σ_{i=0..n} μ_i · F_i ) / ( Σ_{i=0..n} μ_i )    (2)

F_m can be used as the final input to the classifier.
The region attention module learns weights from individual region features with a non-linear mapping. Since F_m represents the content of all facial regions, the attention weights can be further refined by modeling the relationship between the region features and this global representation.
The present invention concatenates each region feature with the global feature and applies another FC layer to estimate refined attention weights. The new attention weight ν_i of the i-th region in the relation-attention module is shown in formula 3.

ν_i = f([F_i : F_m]^T · q_1)    (3)

where q_1 is the parameter of the FC layer and f denotes the Sigmoid function. At this stage, the invention weights all the region features with the coarse attention weights and the new attention weights obtained from the relation-attention module to obtain a new global feature P_RAN, as shown in formula 4.

P_RAN = ( Σ_{i=0..n} μ_i · ν_i · F_i ) / ( Σ_{i=0..n} μ_i · ν_i )    (4)

P_RAN is used as the final representation of the RAN method proposed by the present invention.
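A minimal PyTorch sketch of the aggregation in formulas 1-4 is given below for illustration; it assumes the region features have already been extracted as a tensor of shape (batch, n_regions, dim), and the normalization by the summed weights is an assumption of this sketch:

```python
# Sketch of the self-attention (formulas 1-2) and relation-attention
# (formulas 3-4) aggregation over pre-extracted region features.
import torch
import torch.nn as nn

class RegionAggregation(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q0 = nn.Linear(dim, 1)        # self-attention FC, formula 1
        self.q1 = nn.Linear(2 * dim, 1)    # relation-attention FC, formula 3

    def forward(self, F):                  # F: (batch, n_regions, dim)
        mu = torch.sigmoid(self.q0(F))                          # (batch, n, 1), formula 1
        Fm = (mu * F).sum(1) / mu.sum(1).clamp(min=1e-6)         # (batch, dim), formula 2
        Fm_rep = Fm.unsqueeze(1).expand_as(F)                    # broadcast global feature to each region
        nu = torch.sigmoid(self.q1(torch.cat([F, Fm_rep], -1)))  # (batch, n, 1), formula 3
        w = mu * nu
        P = (w * F).sum(1) / w.sum(1).clamp(min=1e-6)            # (batch, dim), formula 4
        return P, mu, nu
```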
Different facial expressions are mainly defined by different facial regions, so the present invention places a direct constraint on the self-attention weights, namely the region bias loss (RB-Loss). This constraint requires that the largest attention weight among the local facial regions be larger than that of the original face image by a margin. The formal definition of RB-Loss is shown in formula 5.

L_RB = max{0, α − (μ_max − μ_0)}    (5)

where α is a hyperparameter used as a margin, μ_0 is the attention weight of the original face image, and μ_max denotes the maximum weight among all local regions.
In training, the classification loss is jointly optimized with the region bias loss. The proposed RB-Loss enhances the effect of regional attention and allows the RAN to obtain higher weights for regional and global representations.
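For illustration, the region bias loss of formula 5 could be written as follows, assuming the attention weights μ are available as a tensor whose index 0 holds the weight of the original face image:

```python
# Sketch of the region bias loss (formula 5), assuming mu has shape
# (batch, n_regions, 1) with index 0 holding the original face image's weight.
import torch

def region_bias_loss(mu: torch.Tensor, alpha: float = 0.02) -> torch.Tensor:
    mu = mu.squeeze(-1)                    # (batch, n_regions)
    mu0 = mu[:, 0]                         # weight of the original face image
    mu_max = mu[:, 1:].max(dim=1).values   # maximum weight among the local regions
    return torch.clamp(alpha - (mu_max - mu0), min=0).mean()
```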
3. Model parameter setting
Parameter settings also have a great influence on the effect of a deep learning algorithm; appropriate parameters help accelerate training of the deep model and improve its performance. During joint training with RB-Loss and the cross-entropy loss, the default loss weight ratio is 1:1. During training on the data set, the learning rate is initialized to 0.01, the momentum is set to 0.9, the weight decay is set to 0.0005, and the margin in RB-Loss defaults to 0.02. In the training stage, the invention sets the batch size to 128 and the maximum number of iterations to 50K. For optimization, the invention tried different strategies and finally adopted an ADAM optimizer with a learning-rate decay of 0.005, which performed better in the experiments.
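One possible, non-authoritative reading of these settings in code is sketched below; the description mixes SGD-style values (momentum, 0.0005 weight decay) with a final ADAM optimizer and a 0.005 decay, so whether that decay applies to the learning rate or the weights is treated here as an assumption:

```python
# Illustrative translation of the stated settings; values and scheduler choice
# follow the text above, while the interpretation of the 0.005 decay is assumed.
import torch

def make_optimizer_and_scheduler(model):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=1.0 - 0.005)
    return optimizer, scheduler

LOSS_WEIGHTS = {"cross_entropy": 1.0, "rb_loss": 1.0}  # default 1:1 joint training
RB_MARGIN = 0.02                                        # alpha in formula 5
BATCH_SIZE = 128
MAX_ITERS = 50_000
```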
4. Evaluation of the model
In order to evaluate the accuracy and feasibility of the method for online learner emotion prediction, model training was performed on an Intel Xeon E5-1650 v4 CPU with two NVIDIA GeForce GTX 1080 Ti GPUs (11 GB each) and 32 GB of RAM. The experimental environment was built on the Ubuntu system with the Python language and the PyTorch toolkit. In addition, PyTorch integrates with TensorBoard, a tool for visualizing the results of neural network training that can provide feedback for evaluating model performance and adjusting the training process.
The established online learner expression database with hand occlusion is divided into a training set, a validation set and a test set in the ratio 8:1:1. The model is trained on the training set and simultaneously validated on the validation set; 300 epochs are trained from scratch, and the accuracy and loss on the training and validation sets gradually stabilize.
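The 8:1:1 split could be produced, for example, as in the following sketch, assuming the 92947 pictures are wrapped in a torch Dataset; the fixed random seed is an assumption added for reproducibility:

```python
# Sketch of the 8:1:1 split over a torch.utils.data.Dataset.
import torch
from torch.utils.data import random_split

def split_dataset(dataset):
    n = len(dataset)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    n_test = n - n_train - n_val          # remainder goes to the test set
    return random_split(dataset, [n_train, n_val, n_test],
                        generator=torch.Generator().manual_seed(42))
```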
In terms of model accuracy, analysis of FIG. 6 shows that after 100 epochs of training the model already reaches a recognition accuracy of 80%. After a further 200 epochs, the accuracy improves slowly to nearly 90%. The variation of the loss function during training is shown in FIG. 7: during the first 100 epochs the loss drops rapidly to a very low value, and over the following 200 epochs it decreases by only about a further 0.001.
The expression recognition model obtained after 300 epochs of training achieves an average expression recognition accuracy of 89% on the test set.
5. Comparative experiment
In addition to the proposed method, the invention uses three traditional facial expression recognition methods to perform self-labelled academic emotion inference experiments on the created online learner hand-occlusion expression database, and compares the resulting facial expression recognition accuracies with the result obtained by the proposed method.
Experiment one: the AAM-DTC algorithm uses an Active Appearance Model (AAM) for feature extraction and a Decision Tree Classifier (DTC) for classification. The results of experiment one show that hand occlusion has a large influence on this recognition method.
In view of the low recognition rate in experiment one, experiment two (the LGBPHS-DTC algorithm) replaces the feature extraction algorithm, adopting the Local Gabor Binary Pattern Histogram Sequence (LGBPHS). Although the result of experiment two improves on experiment one to a certain extent, it is still not ideal.
Experiment three: the LGBPHS-KNN algorithm keeps the feature extraction algorithm of experiment two and selects a K-Nearest Neighbour (KNN) classifier instead. Although the recognition accuracy improves further, the results are still not satisfactory.
Compared with the traditional algorithms, the automatic facial expression recognition method with transfer learning based on the regional attention network overcomes the influence of hand occlusion to a certain extent and clearly improves expression recognition accuracy, with an effect close to that of manual annotation. The experimental results are shown in Table 2.
TABLE 2 Comparison of results of different expression recognition methods

Method                    Accuracy
AAM-DTC                   29%
LGBPHS-DTC                66%
LGBPHS-KNN                80%
Method of the invention   89%
The invention provides a facial expression recognition method with transfer learning based on a regional attention network. The deep network structure learns end-to-end from the input image and reduces the influence of hand occlusion. The regional attention network architecture automatically perceives occluded face regions and performs feature learning and occlusion-region encoding simultaneously through attention weights. The transfer learning method improves the accuracy and training speed of the model on the database built by the invention.

Claims (8)

1. An online learner emotion recognition method, comprising:
(1) constructing a regional attention network model based on a transfer learning method;
(2) training the model constructed in the step (1) by using an online learner expression database;
(3) performing emotion recognition on the input online learner expressions using the trained model.
2. The emotion recognition method for online learners according to claim 1, wherein the regional attention network model in step (1) is constructed by: retraining the pre-trained regional attention network model on the AffectNet data set, and replacing the last layer of the pre-trained model with a full connection layer with a plurality of outputs.
3. The online learner emotion recognition method of claim 2, wherein the training of the model in step (2) estimates the attention weight μ_i of each region feature using a fully connected layer and a Sigmoid function, calculated as shown in formula 1:

μ_i = f(F_i^T · q_0)    (1)

where q_0 is the parameter of the fully connected layer, f denotes the Sigmoid function, i denotes a region of the divided picture, F_i denotes the features of the i-th region, and the superscript T denotes transposition;

all region features F_i are weighted by their attention weights to obtain a global feature F_m, as shown in formula 2:

F_m = ( Σ_{i=0..n} μ_i · F_i ) / ( Σ_{i=0..n} μ_i )    (2)

F_m represents the content of all face regions and serves as the final input to the classifier; n denotes the maximum value of i, i.e. the number of divided regions.
4. The online learner emotion recognition method of claim 3, wherein, in the training of the model in step (2), refined attention weights of the region features are estimated by concatenating each region feature with the global feature and applying another fully connected layer; the new attention weight ν_i of the i-th region in the relation-attention module is shown in formula 3:

ν_i = f([F_i : F_m]^T · q_1)    (3)

where q_1 is the parameter of the fully connected layer, f denotes the Sigmoid function, and ν_i denotes the further refined attention weight of the i-th region;

all region features, together with the coarse attention weights and the new attention weights obtained from the relation-attention module, are combined by weighting to obtain a new global feature P_RAN, as shown in formula 4:

P_RAN = ( Σ_{i=0..n} μ_i · ν_i · F_i ) / ( Σ_{i=0..n} μ_i · ν_i )    (4)

P_RAN is used as the final representation of the proposed RAN method.
5. The online learner emotion recognition method of claim 4, wherein the model is optimized during training using a region bias loss, as shown in formula 5:

L_RB = max{0, α − (μ_max − μ_0)}    (5)

where α is a hyperparameter used as a margin, μ_0 is the attention weight of the original face image, and μ_max denotes the maximum weight among all local regions.
6. The online learner emotion recognition method of any one of claims 1 to 5, wherein the online learner expression database contains facial expressions with academic emotions.
7. The method of claim 6, wherein the facial expressions are classified into four cases, i.e. no occlusion, left occlusion, middle occlusion, and right occlusion, according to the occlusion situation.
8. The method of claim 6, wherein the academic emotion is at least one of neutral, curious, happy, confused, depressed, distracted and tired.
CN202110408590.1A 2021-04-16 2021-04-16 Online learner emotion recognition method Pending CN113158872A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110408590.1A CN113158872A (en) 2021-04-16 2021-04-16 Online learner emotion recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110408590.1A CN113158872A (en) 2021-04-16 2021-04-16 Online learner emotion recognition method

Publications (1)

Publication Number Publication Date
CN113158872A true CN113158872A (en) 2021-07-23

Family

ID=76868026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110408590.1A Pending CN113158872A (en) 2021-04-16 2021-04-16 Online learner emotion recognition method

Country Status (1)

Country Link
CN (1) CN113158872A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110837777A (en) * 2019-10-10 2020-02-25 天津大学 Partial occlusion facial expression recognition method based on improved VGG-Net
CN111178242A (en) * 2019-12-27 2020-05-19 上海掌学教育科技有限公司 Student facial expression recognition method and system for online education
CN111797683A (en) * 2020-05-21 2020-10-20 台州学院 Video expression recognition method based on depth residual error attention network
CN112101241A (en) * 2020-09-17 2020-12-18 西南科技大学 Lightweight expression recognition method based on deep learning
CN112307958A (en) * 2020-10-30 2021-02-02 河北工业大学 Micro-expression identification method based on spatiotemporal appearance movement attention network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KAI WANG ET AL.: "Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition", 《JOURNAL OF LATEX CLASS FILES》 *
YONG LI ET AL.: "Occlusion aware facial expression recognition using cnn with attention mechanism", 《IEEE TRANSACTIONS ON IMAGE PROCESSING》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114913461A (en) * 2022-05-19 2022-08-16 广东电网有限责任公司 Emotion recognition method and device, terminal equipment and computer readable storage medium
CN115439915A (en) * 2022-10-12 2022-12-06 首都师范大学 Classroom participation identification method and device based on region coding and sample balance optimization

Similar Documents

Publication Publication Date Title
Dewan et al. Engagement detection in online learning: a review
Ding et al. A depression recognition method for college students using deep integrated support vector algorithm
CN110148318B (en) Digital teaching assistant system, information interaction method and information processing method
Dewan et al. A deep learning approach to detecting engagement of online learners
Shen et al. Assessing learning engagement based on facial expression recognition in MOOC’s scenario
Chen et al. Analyze spontaneous gestures for emotional stress state recognition: A micro-gesture dataset and analysis with deep learning
CN109902912B (en) Personalized image aesthetic evaluation method based on character features
Bu Human motion gesture recognition algorithm in video based on convolutional neural features of training images
CN115239527A (en) Teaching behavior analysis system for teaching characteristic fusion and modeling based on knowledge base
Wei et al. Bioinspired visual-integrated model for multilabel classification of textile defect images
Aslan et al. Multimodal video-based apparent personality recognition using long short-term memory and convolutional neural networks
CN112529054B (en) Multi-dimensional convolution neural network learner modeling method for multi-source heterogeneous data
CN113158872A (en) Online learner emotion recognition method
Nandi et al. A survey on multimodal data stream mining for e-learner’s emotion recognition
ALISAWI et al. Real-Time Emotion Recognition Using Deep Learning Methods: Systematic Review
AU2020101294A4 (en) Student’s physiological health behavioural prediction model using svm based machine learning algorithm
Luo et al. 3DLIM: Intelligent analysis of students’ learning interest by using multimodal fusion technology
Maddu et al. Online learners’ engagement detection via facial emotion recognition in online learning context using hybrid classification model
Tan et al. Towards automatic engagement recognition of autistic children in a machine learning approach
Song Emotional recognition and feedback of students in English e-learning based on computer vision and face recognition algorithms
Fu et al. Affective computation of students’ behaviors under classroom scenes
Xiaoning Application of artificial neural network in teaching quality evaluation
Li Learning behaviour recognition method of English online course based on multimodal data fusion
Huang et al. [Retracted] Analysis of Ideological and Political Classroom Based on Artificial Intelligence and Data Mining Technology
Qi Interaction and psychological characteristics of art teaching based on Openpose and Long Short-Term Memory network

Legal Events

Code   Title
PB01   Publication
SE01   Entry into force of request for substantive examination
RJ01   Rejection of invention patent application after publication (application publication date: 20210723)