CN113331839A - Network learning attention monitoring method and system based on multi-source information fusion - Google Patents

Network learning attention monitoring method and system based on multi-source information fusion

Info

Publication number
CN113331839A
Authority
CN
China
Prior art keywords
learning
human eye
electroencephalogram
video data
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110592337.6A
Other languages
Chinese (zh)
Inventor
杨娟
黄奥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Science and Engineering WUSE
Wuhan University of Science and Technology WHUST
Original Assignee
Wuhan University of Science and Engineering WUSE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Science and Engineering WUSE filed Critical Wuhan University of Science and Engineering WUSE
Priority to CN202110592337.6A priority Critical patent/CN113331839A/en
Publication of CN113331839A publication Critical patent/CN113331839A/en
Pending legal-status Critical Current

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 3/00 Apparatus for testing the eyes; Instruments for examining the eyes
    • A61B 3/0025 Operational features thereof characterised by electronic signal processing, e.g. eye models
    • A61B 3/113 Objective types, i.e. instruments for examining the eyes independent of the patients' perceptions or reactions, for determining or recording eye movement
    • A61B 3/145 Arrangements specially adapted for eye photography by video means
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/1116 Determining posture transitions
    • A61B 5/1118 Determining activity level
    • A61B 5/1121 Determining geometric values, e.g. centre of rotation or angular range of movement
    • A61B 5/1128 Measuring movement of the entire body or parts thereof using image analysis
    • A61B 5/16 Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • A61B 5/163 Evaluating the psychological state by tracking eye movement, gaze, or pupil change
    • A61B 5/168 Evaluating attention deficit, hyperactivity
    • A61B 5/369 Electroencephalography [EEG]
    • A61B 5/372 Analysis of electroencephalograms
    • A61B 5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B 5/7267 Classification involving training the classification device
    • A61B 5/7275 Determining trends in physiological measurement data; Predicting development of a medical condition based on physiological measurements, e.g. determining a risk factor
    • A61B 5/7285 Specific aspects of physiological measurement analysis for synchronising or triggering a physiological measurement or image acquisition with a physiological event or waveform, e.g. an ECG signal

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Health & Medical Sciences (AREA)
  • Pathology (AREA)
  • Psychiatry (AREA)
  • Physiology (AREA)
  • Artificial Intelligence (AREA)
  • Psychology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Developmental Disabilities (AREA)
  • Ophthalmology & Optometry (AREA)
  • Dentistry (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Social Psychology (AREA)
  • Child & Adolescent Psychology (AREA)
  • Educational Technology (AREA)
  • Hospice & Palliative Care (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a network learning attention monitoring method and system based on multi-source information fusion. The method comprises the following steps: S1: collecting learning video data and brain wave data; S2: preprocessing the learning video data and the brain wave data, estimating the human eye sight line from the preprocessed learning video data to obtain human eye sight line positioning information, and extracting electroencephalogram features from the preprocessed electroencephalogram data; S3: inputting the human eye sight line positioning information and the electroencephalogram features into a time sequence prediction model for learning attention monitoring. The method and system overcome the prior-art limitation of merely inferring, from head posture information alone, whether the learner is staring at the screen and hence whether the learner is concentrating, and can accurately judge a learner's online learning attention.

Description

Network learning attention monitoring method and system based on multi-source information fusion
Technical Field
The invention belongs to the technical field of monitoring, and particularly relates to a network learning attention monitoring method and system based on multi-source information fusion.
Background
With the steady advance of education informatization, online learning has become one of the mainstream learning paradigms because it is not constrained by physical space and time. During online learning, however, learners are easily distracted by external factors and find it hard to stay focused; at the same time, because of the teacher-to-student ratio and the physical separation inherent in online teaching, teachers struggle to notice a learner's loss of focus in time. The result is low online learning efficiency and poor learning outcomes.
Existing network teaching attention recognition mainly analyzes the learner's head posture and similar cues from video data to judge whether the learner is gazing at the learning screen, and from that whether the learner is concentrating on the network learning content. Such methods, however, cannot reliably recognize the case where a learner gazes at the screen without actually attending to the content.
Disclosure of Invention
The invention mainly aims to overcome the defects of the prior art and provide a network learning attention monitoring method and system based on multi-source information fusion. First, the learner's human eye features and head posture information at each sampling moment are obtained from the learning image data, from which the learner's human eye sight line positioning information is derived; at the same time, brain activity features at each sampling moment are obtained from the electroencephalogram data. A learning attention time sequence prediction model then combines the electroencephalogram features and the human eye sight line positioning information to accurately judge the learner's online learning attention, and feedback prompts are given to the learning terminal according to the time sequence prediction results over a given period. This solves the problem that the learner's attention cannot be accurately identified in existing online teaching scenarios.
According to one aspect of the invention, the invention provides a network learning attention monitoring method based on multi-source information fusion, which comprises the following steps:
s1: collecting learning video data and brain wave data;
s2: preprocessing the learning video data and the brain wave data; estimating the sight of human eyes according to the preprocessed learning video data to obtain positioning information of the sight of human eyes; extracting electroencephalogram features according to the preprocessed electroencephalogram data;
s3: and inputting the human eye sight line positioning information and the electroencephalogram characteristics into a time sequence prediction model for learning attention monitoring.
Preferably, preprocessing the learning video data and the brain wave data includes:
sampling the learning video data according to a sampling period T to obtain a learning sequence picture; and segmenting the waveform data of the electroencephalogram data according to the sampling period T.
Preferably, estimating the human eye sight line from the preprocessed learning video data to obtain the human eye sight line positioning information includes:
performing face recognition and rectification on the learning sequence pictures, and passing the rectified pictures through a deep CNN network H1 with multiple convolution operations to obtain the learner's human eye features; passing the rectified pictures through a deep CNN network H2 with multiple convolution operations to obtain the learner's head posture positioning features, and feeding the head posture positioning features into a classification regression network to obtain head posture information;
and fusing the human eye features and the head posture information by using a feature splicing mode to obtain human eye sight line positioning information.
Preferably, the time sequence prediction model is an LSTM prediction model based on line-of-sight-electroencephalogram modal gating, and performing learning attention monitoring includes:
performing modal gate operation on the electroencephalogram characteristics and the human eye sight positioning information input at each moment, giving different dynamic weights to the characteristics of different modes, realizing dynamic fusion of the electroencephalogram characteristics and the human eye sight positioning information, and inputting a dynamic fusion result into an LSTM prediction model;
the LSTM prediction model outputs the corresponding prediction result according to the application scene of network learning attention monitoring; the application scenes comprise regression and classification, and the corresponding prediction results are learning attention values and learning attention categories respectively.
Preferably, whether periodic learning attention feedback is given through the learning terminal is determined according to the prediction result of the time sequence prediction model and a preset threshold value.
According to another aspect of the present invention, the present invention further provides a network learning attention monitoring system based on multi-source information fusion, the system comprising:
the acquisition module is used for acquiring learning video data and brain wave data;
the processing module is used for preprocessing the learning video data and the brain wave data; carrying out human eye sight estimation according to the preprocessed learning video data to obtain human eye sight positioning information; extracting electroencephalogram characteristics according to the preprocessed electroencephalogram data;
and the monitoring module is used for inputting the human eye sight line positioning information and the electroencephalogram characteristics into a time sequence prediction model to carry out learning attention monitoring.
Preferably, preprocessing the learning video data and the brain wave data includes:
sampling the learning video data according to a sampling period T to obtain a learning sequence picture; and segmenting the waveform data of the electroencephalogram data according to the sampling period T.
Preferably, estimating the human eye sight line from the preprocessed learning video data to obtain the human eye sight line positioning information includes:
performing face recognition and rectification on the learning sequence pictures, and passing the rectified pictures through a deep CNN network H1 with multiple convolution operations to obtain the learner's human eye features; passing the rectified pictures through a deep CNN network H2 with multiple convolution operations to obtain the learner's head posture positioning features, and feeding the head posture positioning features into a classification regression network to obtain head posture information;
and fusing the human eye features and the head posture information by using a feature splicing mode to obtain human eye sight line positioning information.
Preferably, the time sequence prediction model is an LSTM prediction model based on line-of-sight-electroencephalogram modal gating, and performing learning attention monitoring includes:
performing modal gate operation on the electroencephalogram characteristics and the human eye sight positioning information input at each moment, giving different dynamic weights to the characteristics of different modes, realizing dynamic fusion of the electroencephalogram characteristics and the human eye sight positioning information, and inputting a dynamic fusion result into an LSTM prediction model;
the LSTM prediction model outputs the corresponding prediction result according to the application scene of network learning attention monitoring; the application scenes comprise regression and classification, and the corresponding prediction results are learning attention values and learning attention categories respectively.
Preferably, the system further includes a feedback module configured to determine, according to the prediction result of the time sequence prediction model and a preset threshold, whether to give periodic learning attention feedback through the learning terminal.
Advantageous effects: the invention addresses the shortcomings of existing learning attention monitoring techniques, which cannot accurately judge from head posture information alone whether the learner's gaze is on the screen and cannot detect the case where a learner stares at the screen while distracted, thereby improving the accuracy of learning attention monitoring.
The features and advantages of the present invention will become apparent by reference to the following drawings and detailed description of specific embodiments of the invention.
Drawings
FIG. 1 is a flow chart of a network learning attention monitoring method based on multi-source information fusion;
FIG. 2 is a flow chart of another network learning attention monitoring method based on multi-source information fusion;
FIG. 3 is a schematic diagram of a human eye feature extraction method based on an attention mechanism;
FIG. 4 is a schematic diagram of a head pose location method based on a classification regression network;
FIG. 5 is a schematic diagram of a learning attention prediction model based on EEG-VIA mode gating;
FIG. 6 is a schematic diagram of a network learning attention monitoring system based on multi-source information fusion.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
FIG. 1 is a flow chart of a network learning attention monitoring method based on multi-source information fusion. As shown in fig. 1, the present invention provides a network learning attention monitoring method based on multi-source information fusion, which includes the following steps:
s1: learning video data and brain wave data are collected.
Specifically, a learner wears a portable electroencephalograph and performs network learning facing a computer screen; the system includes a camera located above the screen or in the same plane as the screen. The network learning video data captured by the camera and the brain wave data are sent to a learner data warehouse, which stores the learner's video data, electroencephalogram data and historical learning attention states during the learning process. The learning video data are collected by the camera on the learning terminal, and the electroencephalogram data by the portable electroencephalograph. The initial value of the historical learning attention state comes from the teacher's subjective judgment of the learner's previous performance, and the prediction results of the network learning attention recognition engine are continuously stored during learning.
S2: preprocessing the learning video data and the brain wave data; estimating the sight of human eyes according to the preprocessed learning video data to obtain positioning information of the sight of human eyes; and extracting electroencephalogram characteristics according to the preprocessed electroencephalogram data.
Preferably, preprocessing the learning video data and the brain wave data includes:
sampling the learning video data according to a sampling period T to obtain a learning sequence picture; and segmenting the waveform data of the electroencephalogram data according to the sampling period T.
Specifically, the learner's video data and electroencephalogram data are temporally consistent during network learning, so this embodiment preprocesses the video data with a fixed sampling period T; the learning sequence pictures at times T, 2T, ..., nT can be represented as (I1, I2, ..., In). Because brain activity changes faster than image data, this embodiment segments the brain wave data into waveform segments by the same sampling period T, so the electroencephalogram data in each period T can be represented as (E1, E2, ..., En).
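For illustration, this sampling-and-segmentation step might be implemented as follows (a minimal Python sketch assuming OpenCV and NumPy; all names and rates are illustrative, not part of the patent):

```python
# Minimal sketch of step S2 preprocessing: one frame per period T from the
# learning video, and T-second segments from the EEG stream, aligned by index.
import cv2
import numpy as np

def preprocess(video_path: str, eeg: np.ndarray, eeg_rate: float, t: float):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    step = max(int(round(fps * t)), 1)       # frames per sampling period T
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:                  # keep frames at t = 0, T, 2T, ...
            frames.append(frame)
        idx += 1
    cap.release()

    seg_len = int(eeg_rate * t)              # EEG samples per period T
    n = min(len(frames), len(eeg) // seg_len)
    segments = eeg[:n * seg_len].reshape(n, seg_len)
    return frames[:n], segments              # (I1..In) and (E1..En)
```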
Preferably, estimating the human eye sight line from the preprocessed learning video data to obtain the human eye sight line positioning information includes:
performing face recognition and rectification on the learning sequence pictures, and passing the rectified pictures through a deep CNN network H1 with multiple convolution operations to obtain the learner's human eye features; passing the rectified pictures through a deep CNN network H2 with multiple convolution operations to obtain the learner's head posture positioning features, and feeding the head posture positioning features into a classification regression network to obtain head posture information;
and fusing the human eye features and the head posture information by using a feature splicing mode to obtain human eye sight line positioning information.
Specifically, referring to fig. 2, after the learning video data is sampled, a series of pictures is obtained. Face detection is first performed on each learning image I to determine the bounding box of the learner's face. In this embodiment, the open-source MTCNN algorithm is used: the original learning image is repeatedly resized to build an image pyramid, and the face position is then located from coarse to fine with cascaded CNNs, yielding a rectified image of the learner's face that supports subsequent eye feature extraction and head posture positioning.
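A minimal sketch of this detection-and-rectification stage, assuming the open-source mtcnn Python package and OpenCV; the eye-line rotation used as the "correction" step is an assumption, since the text does not specify the rectification method:

```python
# Detect the largest face with MTCNN, rotate so the eye line is horizontal,
# then crop the face box. The crop-after-rotation is an approximation.
import cv2
import numpy as np
from mtcnn import MTCNN

detector = MTCNN()

def detect_and_rectify(image_rgb: np.ndarray):
    faces = detector.detect_faces(image_rgb)     # coarse-to-fine CNN cascade
    if not faces:
        return None
    face = max(faces, key=lambda f: f['confidence'])
    x, y, w, h = face['box']
    le = face['keypoints']['left_eye']
    re = face['keypoints']['right_eye']
    angle = np.degrees(np.arctan2(re[1] - le[1], re[0] - le[0]))
    center = (x + w / 2, y + h / 2)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(image_rgb, M, image_rgb.shape[1::-1])
    return rotated[max(y, 0):y + h, max(x, 0):x + w]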
As shown in FIG. 3, the rectified learner image is passed through a deep CNN network H1 with multiple convolution operations to obtain the learner's human eye features, denoted U. The invention further processes the human eye features U with an attention mechanism. Suppose U contains L channels denoted {c1, c2, ..., ck, ..., cL}. A global average pooling operation over each channel of U yields a vector V, and a fully connected mapping of V gives the attention weight of each channel. If channel ck has weight cwk under the attention mechanism, the weights of all channels can be written as cw, and the human eye features under attention, Ua, are:

Ua = {cw1 · c1, cw2 · c2, ..., cwL · cL}
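This channel-attention step can be sketched as follows in PyTorch (the framework and the reduction ratio are illustrative assumptions; the patent names neither):

```python
# Squeeze-and-excitation style channel attention: global average pooling over
# each of the L channels of U, a fully connected mapping to per-channel
# weights cw, and channel-wise reweighting to produce Ua.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: V = GAP(U)
        self.fc = nn.Sequential(                     # full-connection mapping
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                            # cw_k in (0, 1)
        )

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = u.shape
        v = self.pool(u).view(b, c)                  # vector V
        cw = self.fc(v).view(b, c, 1, 1)             # per-channel weights cw
        return u * cw                                # Ua = {cw_k * c_k}
```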
as shown in FIG. 4, the rectified learning image passes through a deep CNN network H2The positioning characteristics of the head posture of the learner are obtained through multiple convolution operations, and then the positioning characteristics of the head posture are sent to 3 classification regression networks to position the head posture parameters. That is, each classification regression network will predict the head pose belonging to a classification result under a certain pose angle (for example: pitch) according to the head pose positioning feature, and the classification result indicates the rough range of the learner under the head pose angle; and finding a rough range under a certain head attitude angle based on the classification result, and then performing regression to realize accurate positioning of the head attitude angle. Therefore, based on the results of the three classification regression networks, the 3 parameters of the head pose pitch, yaw and roll of the learner corresponding to a certain learning image can be obtained and recorded as h.
During network learning, the learner's head posture and eyes both reflect whether attention is on the screen, so this embodiment fuses the head posture information with the eye features to estimate the eye sight line. First, Ua and h are fused by feature splicing; a gaze estimation prediction algorithm H3 then yields more accurate human eye sight line positioning information Fs:

Fs = H3(Ua, h)
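A minimal sketch of H3, under the assumptions that it is a small fully connected network over the spliced features, that Ua has been flattened to a vector, and that the output is a 2-D screen coordinate (the patent fixes none of these):

```python
# Fusion step Fs = H3(Ua, h): concatenate the eye features with the three
# head posture angles and map them to gaze coordinates with an MLP.
import torch
import torch.nn as nn

class GazeEstimator(nn.Module):
    def __init__(self, eye_dim: int, pose_dim: int = 3, out_dim: int = 2):
        super().__init__()
        self.h3 = nn.Sequential(
            nn.Linear(eye_dim + pose_dim, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, out_dim),
        )

    def forward(self, ua: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        return self.h3(torch.cat([ua, h], dim=-1))   # feature splicing, then H3
```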
preferably, extracting electroencephalogram features according to the preprocessed electroencephalogram data, comprising:
based on the electroencephalogram data preprocessing mode, electroencephalogram characteristics of electroencephalogram data in each T time period can be extracted from the aspects of time domain, frequency domain and wavelet. The time domain features of the electroencephalogram data include: signal amplitude ratio, peak signal difference, peak time window, inter-peak slope, signal power, signal mean, kurtosis, mobility, and complexity; the frequency domain features of the electroencephalogram data include: power spectral density and band strength; the wavelet characteristics of the brain electrical data include: entropy and energy, therefore, the electroencephalogram data characteristics can be recorded as F based on the characteristic extraction methode
S3: and inputting the human eye sight line positioning information and the electroencephalogram characteristics into a time sequence prediction model for learning attention monitoring.
Preferably, the time sequence prediction model is an LSTM prediction model based on line-of-sight-electroencephalogram modal gating, and performing learning attention monitoring includes:
performing modal gate operation on the electroencephalogram characteristics and the human eye sight positioning information input at each moment, giving different dynamic weights to the characteristics of different modes, realizing dynamic fusion of the electroencephalogram characteristics and the human eye sight positioning information, and inputting a dynamic fusion result into an LSTM prediction model;
the LSTM prediction model outputs the corresponding prediction result according to the application scene of network learning attention monitoring; the application scenes comprise regression and classification, and the corresponding prediction results are learning attention values and learning attention categories respectively.
Specifically, as shown in fig. 5, the features of learner i over the time steps (T, 2T, ..., jT, ..., nT) can be characterized as

((Fe1, Fs1), (Fe2, Fs2), ..., (Fej, Fsj), ..., (Fen, Fsn))

where Fej and Fsj denote respectively the electroencephalogram features and the human eye sight line positioning features at time jT. In the LSTM prediction model based on line-of-sight-electroencephalogram modal gating of this embodiment, the model first performs a modal gate operation on the electroencephalogram features and the human eye sight line positioning information input at each time step: different dynamic weights are given to the features of the different modes, realizing dynamic fusion of the electroencephalogram features and the human eye sight line, and the fusion result serves as the input of the LSTM network. Suppose at the j-th sampling instant the original input is (Fej, Fsj); after line-of-sight-electroencephalogram modal gating the integrated input xj is:

xj = concat(f(WF·Fej + VF·Fsj + QF·hj-1)[0] · Fej, f(WF·Fej + VF·Fsj + QF·hj-1)[1] · Fsj)

where WF, VF and QF are the network parameters of the fusion gating layer corresponding to the electroencephalogram features, the human eye sight line and the hidden state respectively, f is the sigmoid activation function, and hj-1 is the hidden state output by the LSTM network at sampling instant j-1. The output yj of the LSTM network at the j-th sampling instant is:

yj = f(WY·xj + QY·hj-1)

where WY and QY are the network parameters of the output layer corresponding to the integrated input and the hidden state respectively. Depending on the specific application scene of network learning attention monitoring, attention monitoring can be cast as a regression or a classification problem: for regression the predicted output yj is a learning attention value, and for classification yj is a learning attention category.
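The gating and recurrence above can be sketched in PyTorch as follows; the sigmoid output head treats yj as a regression value in (0, 1), matching the formulas, while all dimensions are illustrative assumptions:

```python
# Gaze-EEG modal-gating LSTM: two fusion gates computed from both modalities
# and the previous hidden state scale their respective features; the
# concatenation x_j feeds an LSTMCell, and y_j = f(W_Y x_j + Q_Y h_{j-1}).
import torch
import torch.nn as nn

class GatedFusionLSTM(nn.Module):
    def __init__(self, eeg_dim: int, gaze_dim: int, hidden_dim: int):
        super().__init__()
        self.w_f = nn.Linear(eeg_dim, 2, bias=False)     # W_F
        self.v_f = nn.Linear(gaze_dim, 2, bias=False)    # V_F
        self.q_f = nn.Linear(hidden_dim, 2, bias=False)  # Q_F
        self.cell = nn.LSTMCell(eeg_dim + gaze_dim, hidden_dim)
        self.w_y = nn.Linear(eeg_dim + gaze_dim, 1, bias=False)  # W_Y
        self.q_y = nn.Linear(hidden_dim, 1, bias=False)          # Q_Y

    def forward(self, fe_seq: torch.Tensor, fs_seq: torch.Tensor) -> torch.Tensor:
        # fe_seq: (batch, n, eeg_dim); fs_seq: (batch, n, gaze_dim)
        b = fe_seq.size(0)
        h = fe_seq.new_zeros(b, self.cell.hidden_size)
        c = fe_seq.new_zeros(b, self.cell.hidden_size)
        ys = []
        for j in range(fe_seq.size(1)):                  # steps T, 2T, ..., nT
            fe, fs = fe_seq[:, j], fs_seq[:, j]
            g = torch.sigmoid(self.w_f(fe) + self.v_f(fs) + self.q_f(h))
            x = torch.cat([g[:, 0:1] * fe, g[:, 1:2] * fs], dim=-1)  # x_j
            ys.append(torch.sigmoid(self.w_y(x) + self.q_y(h)))      # y_j
            h, c = self.cell(x, (h, c))                  # advance hidden state
        return torch.stack(ys, dim=1)                    # per-step attention
```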
Preferably, whether periodic learning attention feedback is given through the learning terminal is determined according to the prediction result of the time sequence prediction model and a preset threshold value.
Specifically, an online lesson consists of multiple teaching activities covering multiple knowledge points, and each teaching activity or knowledge point lasts only a few minutes. This embodiment therefore takes the attention results (y1, y2, ..., yn, ...) predicted by the network learning attention recognition engine, intercepts them in segments of N minutes, and checks the sequence of predictions within each segment. If m% of the results are attention-focused and m >= threshold (where threshold is the application-specific cutoff for judging focused versus unfocused attention in the actual network teaching scene), no feedback to the learning terminal is needed; otherwise, the learner is judged inattentive during the period, and the learning terminal issues a reminder (such as a pop-up box or voice prompt) to urge the learner back into the learning state as soon as possible.
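A sketch of this windowed check; the 0/1 attention labels, the remind() hook and the threshold value are hypothetical placeholders:

```python
# Check one N-minute window of predicted attention labels: feedback is
# skipped when the focused share m% reaches the threshold.
def window_is_focused(predictions, threshold_pct: float) -> bool:
    """predictions: 0/1 attention labels for one N-minute window."""
    m = 100.0 * sum(predictions) / max(len(predictions), 1)
    return m >= threshold_pct

# Usage sketch: if not window_is_focused(preds, 80.0): remind()  # pop-up/voice
```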
Compared with existing attention monitoring techniques, the invention makes full use of multi-source data from the network learning process: human eye features and head posture positioning information are learned from the image data and fused to determine the human eye sight line; time domain, frequency domain and wavelet features are extracted from the electroencephalogram data; a learning attention time sequence prediction model with line-of-sight-electroencephalogram modal gating then combines the electroencephalogram features and the human eye sight line information to judge the learner's network learning attention, and the time sequence predictions are analyzed over fixed periods to decide whether to prompt the learning terminal at an appropriate time. This embodiment identifies the online learner's attention state more accurately, overcoming the prior art's rough inference of concentration from head posture information alone, and has significant value for teaching applications.
Example 2
FIG. 6 is a schematic diagram of a network learning attention monitoring system based on multi-source information fusion. As shown in fig. 6, the present invention further provides a network learning attention monitoring system based on multi-source information fusion, wherein the system includes:
the acquisition module is used for acquiring learning video data and brain wave data;
the processing module is used for preprocessing the learning video data and the brain wave data; carrying out human eye sight estimation according to the preprocessed learning video data to obtain human eye sight positioning information; extracting electroencephalogram characteristics according to the preprocessed electroencephalogram data;
and the monitoring module is used for inputting the human eye sight line positioning information and the electroencephalogram characteristics into a time sequence prediction model to carry out learning attention monitoring.
Preferably, preprocessing the learning video data and the brain wave data includes:
sampling the learning video data according to a sampling period T to obtain a learning sequence picture; and segmenting the waveform data of the electroencephalogram data according to the sampling period T.
Preferably, estimating the human eye sight line from the preprocessed learning video data to obtain the human eye sight line positioning information includes:
performing face recognition and rectification on the learning sequence pictures, and passing the rectified pictures through a deep CNN network H1 with multiple convolution operations to obtain the learner's human eye features; passing the rectified pictures through a deep CNN network H2 with multiple convolution operations to obtain the learner's head posture positioning features, and feeding the head posture positioning features into a classification regression network to obtain head posture information;
and fusing the human eye features and the head posture information by using a feature splicing mode to obtain human eye sight line positioning information.
Preferably, the time sequence prediction model is an LSTM prediction model based on line-of-sight-electroencephalogram modal gating, and performing learning attention monitoring includes:
performing modal gate operation on the electroencephalogram characteristics and the human eye sight positioning information input at each moment, giving different dynamic weights to the characteristics of different modes, realizing dynamic fusion of the electroencephalogram characteristics and the human eye sight positioning information, and inputting a dynamic fusion result into an LSTM prediction model;
the LSTM prediction model outputs the corresponding prediction result according to the application scene of network learning attention monitoring; the application scenes comprise regression and classification, and the corresponding prediction results are learning attention values and learning attention categories respectively.
Preferably, the system further includes a feedback module configured to determine, according to the prediction result of the time sequence prediction model and a preset threshold, whether to give periodic learning attention feedback through the learning terminal.
The specific implementation process of the method steps executed by each module in embodiment 2 of the present invention is the same as the implementation process of each step in embodiment 1, and is not described herein again.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A network learning attention monitoring method based on multi-source information fusion is characterized by comprising the following steps:
s1: collecting learning video data and brain wave data;
s2: preprocessing the learning video data and the brain wave data; estimating the sight of human eyes according to the preprocessed learning video data to obtain positioning information of the sight of human eyes; extracting electroencephalogram characteristics according to the preprocessed electroencephalogram data;
s3: and inputting the human eye sight line positioning information and the electroencephalogram characteristics into a time sequence prediction model for learning attention monitoring.
2. The method according to claim 1, wherein preprocessing the learning video data and the brain wave data comprises:
sampling the learning video data according to a sampling period T to obtain a learning sequence picture; and segmenting the waveform data of the electroencephalogram data according to the sampling period T.
3. The method of claim 2, wherein performing human eye gaze estimation based on the preprocessed learning video data to obtain human eye gaze location information comprises:
performing face recognition and rectification on the learning sequence pictures, and passing the rectified pictures through a deep CNN network H1 with multiple convolution operations to obtain the learner's human eye features; passing the rectified pictures through a deep CNN network H2 with multiple convolution operations to obtain the learner's head posture positioning features, and feeding the head posture positioning features into a classification regression network to obtain head posture information;
and fusing the human eye features and the head posture information by using a feature splicing mode to obtain human eye sight positioning information.
4. The method of claim 1, wherein the timing prediction model is an LSTM prediction model based on line-of-sight-electroencephalogram modal gating, and performing learning attention monitoring includes:
performing modal gate operation on the electroencephalogram characteristics and the human eye sight positioning information input at each moment, giving different dynamic weights to the characteristics of different modes, realizing dynamic fusion of the electroencephalogram characteristics and the human eye sight positioning information, and inputting a dynamic fusion result into an LSTM prediction model;
the LSTM prediction model outputs the corresponding prediction result according to the application scene of network learning attention monitoring; the application scenes comprise regression and classification, and the corresponding prediction results are learning attention values and learning attention categories respectively.
5. The method according to claim 1, wherein whether periodic learning attention feedback is given through a learning terminal is determined according to the prediction result of the time sequence prediction model and a preset threshold value.
6. A network learning attention monitoring system based on multi-source information fusion, characterized in that the system comprises:
the acquisition module is used for acquiring learning video data and brain wave data;
the processing module is used for preprocessing the learning video data and the brain wave data; estimating the sight of human eyes according to the preprocessed learning video data to obtain positioning information of the sight of human eyes; extracting electroencephalogram characteristics according to the preprocessed electroencephalogram data;
and the monitoring module is used for inputting the human eye sight line positioning information and the electroencephalogram characteristics into a time sequence prediction model to carry out learning attention monitoring.
7. The system according to claim 6, wherein preprocessing the learning video data and the brain wave data comprises:
sampling the learning video data according to a sampling period T to obtain a learning sequence picture; and segmenting the waveform data of the electroencephalogram data according to the sampling period T.
8. The system according to claim 7, wherein estimating the human eye gaze according to the preprocessed learning video data to obtain the human eye gaze positioning information comprises:
performing face recognition and rectification on the learning sequence pictures, and passing the rectified pictures through a deep CNN network H1 with multiple convolution operations to obtain the learner's human eye features; passing the rectified pictures through a deep CNN network H2 with multiple convolution operations to obtain the learner's head posture positioning features, and feeding the head posture positioning features into a classification regression network to obtain head posture information;
and fusing the human eye features and the head posture information by using a feature splicing mode to obtain human eye sight positioning information.
9. The system of claim 6, wherein the timing prediction model is an LSTM prediction model based on line-of-sight-electroencephalogram modal gating, and performing learning attention monitoring includes:
performing modal gate operation on the electroencephalogram characteristics and the human eye sight positioning information input at each moment, giving different dynamic weights to the characteristics of different modes, realizing dynamic fusion of the electroencephalogram characteristics and the human eye sight positioning information, and inputting a dynamic fusion result into an LSTM prediction model;
the LSTM prediction model outputs the corresponding prediction result according to the application scene of network learning attention monitoring; the application scenes comprise regression and classification, and the corresponding prediction results are learning attention values and learning attention categories respectively.
10. The system of claim 6, further comprising a feedback module configured to determine, according to the prediction result of the time sequence prediction model and a preset threshold, whether to give periodic learning attention feedback through a learning terminal.
CN202110592337.6A 2021-05-28 2021-05-28 Network learning attention monitoring method and system based on multi-source information fusion Pending CN113331839A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110592337.6A CN113331839A (en) 2021-05-28 2021-05-28 Network learning attention monitoring method and system based on multi-source information fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110592337.6A CN113331839A (en) 2021-05-28 2021-05-28 Network learning attention monitoring method and system based on multi-source information fusion

Publications (1)

Publication Number Publication Date
CN113331839A (en) 2021-09-03

Family

ID=77471974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110592337.6A Pending CN113331839A (en) 2021-05-28 2021-05-28 Network learning attention monitoring method and system based on multi-source information fusion

Country Status (1)

Country Link
CN (1) CN113331839A (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103340637A (en) * 2013-06-06 2013-10-09 同济大学 System and method for driver alertness intelligent monitoring based on fusion of eye movement and brain waves
WO2019144542A1 (en) * 2018-01-26 2019-08-01 Institute Of Software Chinese Academy Of Sciences Affective interaction systems, devices, and methods based on affective computing user interface
CN109044363A (en) * 2018-09-04 2018-12-21 华南师范大学 Driver Fatigue Detection based on head pose and eye movement
CN109793528A (en) * 2019-01-28 2019-05-24 华南理工大学 A kind of mood classification method based on dynamic brain function network
CN109875568A (en) * 2019-03-08 2019-06-14 北京联合大学 A kind of head pose detection method for fatigue driving detection
CN110101397A (en) * 2019-03-29 2019-08-09 中国地质大学(武汉) Focus detector based on TGAM
WO2020204810A1 (en) * 2019-03-29 2020-10-08 Agency For Science, Technology And Research Identifying and extracting electroencephalogram signals
CN110610168A (en) * 2019-09-20 2019-12-24 合肥工业大学 Electroencephalogram emotion recognition method based on attention mechanism
CN111046734A (en) * 2019-11-12 2020-04-21 重庆邮电大学 Multi-modal fusion sight line estimation method based on expansion convolution
CN111160239A (en) * 2019-12-27 2020-05-15 中国联合网络通信集团有限公司 Concentration degree evaluation method and device
CN112132058A (en) * 2020-09-25 2020-12-25 山东大学 Head posture estimation method based on multi-level image feature refining learning, implementation system and storage medium thereof
CN112603336A (en) * 2020-12-30 2021-04-06 国科易讯(北京)科技有限公司 Attention analysis method and system based on brain waves

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘嘉敏, et al.: "Video-EEG interactive collaborative emotion recognition based on long short-term memory and information attention" (in Chinese) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116226712A (en) * 2023-03-03 2023-06-06 湖北商贸学院 Online learner concentration monitoring method, system and readable storage medium

Similar Documents

Publication Publication Date Title
CN109117731B (en) Classroom teaching cognitive load measurement system
CN108399376B (en) Intelligent analysis method and system for classroom learning interest of students
CN105516280B (en) A kind of Multimodal Learning process state information packed record method
CN111046823A (en) Student classroom participation degree analysis system based on classroom video
CN107392120B (en) Attention intelligent supervision method based on sight line estimation
CN106599881A (en) Student state determination method, device and system
Hu et al. Research on abnormal behavior detection of online examination based on image information
CN106055894A (en) Behavior analysis method and system based on artificial intelligence
CN110807585A (en) Student classroom learning state online evaluation method and system
CN107909037B (en) Information output method and device
CN113282840B (en) Comprehensive training acquisition management platform
CN111695442A (en) Online learning intelligent auxiliary system based on multi-mode fusion
CN113705349A (en) Attention power analysis method and system based on sight estimation neural network
CN114120432A (en) Online learning attention tracking method based on sight estimation and application thereof
CN113344479B (en) Online classroom-oriented learning participation intelligent assessment method and device
CN114663734A (en) Online classroom student concentration degree evaluation method and system based on multi-feature fusion
Bhamare et al. Deep neural networks for lie detection with attention on bio-signals
CN116434341A (en) Student classroom abnormal behavior identification method and system
Ashwin et al. Unobtrusive students' engagement analysis in computer science laboratory using deep learning techniques
CN113331839A (en) Network learning attention monitoring method and system based on multi-source information fusion
Duraisamy et al. Classroom engagement evaluation using computer vision techniques
CN113591678A (en) Classroom attention determination method, device, equipment, storage medium and program product
CN112818741A (en) Behavior etiquette dimension evaluation method and device for intelligent interview
CN110852284A (en) System for predicting user concentration degree based on virtual reality environment and implementation method
CN116052264A (en) Sight estimation method and device based on nonlinear deviation calibration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210903