CN117152841A - Student attention evaluation method based on double-flow sliding attention network - Google Patents

Student attention evaluation method based on double-flow sliding attention network

Info

Publication number
CN117152841A
Authority
CN
China
Prior art keywords
attention
attention evaluation
evaluation index
head
physiological
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311108274.8A
Other languages
Chinese (zh)
Inventor
刘海
张昭理
刘新
邢师珍
郑媛
黄小吉
刘攀
刘婷婷
李友福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central China Normal University
Original Assignee
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central China Normal University filed Critical Central China Normal University
Priority to CN202311108274.8A priority Critical patent/CN117152841A/en
Publication of CN117152841A publication Critical patent/CN117152841A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Abstract

The embodiment of the application discloses a student attention evaluation method based on a double-flow sliding attention network, relating to the technical field of intelligent teaching. The method comprises the following steps: acquiring image information and physiological signals of a target student in a teaching period, inputting the preprocessed data into trained models, and recognizing head gestures and physiological indexes; judging the head gesture type and calculating a first attention evaluation index; if the first attention evaluation index is lower than a threshold value, further calculating a second attention evaluation index based on the physiological indexes in the time period corresponding to the head gesture type; and then calculating the attention evaluation result. By collecting the physiological signals and the image information of the target student at the same time, the embodiment of the application can calculate the attention evaluation indexes automatically, accurately and in real time based on the head gesture and the physiological indexes, effectively avoid erroneous judgment, and give a more accurate attention evaluation result, thereby helping teachers better understand the learning states of students in a network teaching scene.

Description

Student attention evaluation method based on double-flow sliding attention network
Technical Field
The application relates to the technical field of intelligent teaching, in particular to a student attention evaluation method based on a double-flow sliding attention network.
Background
In the traditional teaching mode, interaction between a teacher and students mainly depends on spoken and written expression. However, this mode suffers from low communication efficiency and limited mutual understanding; under such conditions, the teacher cannot perceive the students' emotions and specific concentration levels in time, nor adjust the teaching state and content accordingly.
Currently, with the development of internet technology, online education occupies an increasingly large share of the education industry, and video-based network teaching has shown explosive growth. Student concentration is one of the preconditions for guaranteeing classroom quality. Unlike a traditional classroom, in a net class scenario the teacher can only communicate with students through video; meanwhile, the number of student videos the teacher can attend to is limited, so it is difficult to grasp the students' attention in class, and in particular the learning state of a single student cannot be determined. Therefore, recognizing students' learning states in real time and avoiding prolonged negative states in class is an essential link in improving teaching quality, and is of great significance for improving teaching effects, optimizing teaching content and supporting personalized learning.
Disclosure of Invention
The embodiment of the application provides a student attention evaluation method based on a double-flow sliding attention network, which is used for overcoming the defect in the related art that the attention of a target student in the teaching process cannot be accurately judged. The technical scheme is as follows:
in a first aspect, an embodiment of the present application provides a student attention evaluation method based on a dual-flow sliding attention network, including:
acquiring image information of a target student in a teaching period, and acquiring physiological signals of the target student in the teaching period, wherein the physiological signals comprise electroencephalogram signals and electrooculogram signals; preprocessing the image information and the physiological signals;
inputting the preprocessed image information into a trained head gesture recognition model to recognize and obtain the head gesture of each image frame; inputting the preprocessed physiological signals into a trained physiological signal recognition model to recognize and obtain physiological indexes of students at all moments;
judging and obtaining the head posture types of target students in different time periods according to the head postures of the image frames, and outputting a first attention evaluation index according to the head posture types and the corresponding duration time;
If the first attention evaluation index is lower than a first attention evaluation index threshold, outputting a second attention evaluation index based on the physiological index in the time period corresponding to the head posture type;
and outputting the attention evaluation result of the target student in the teaching period based on the first attention evaluation index and/or the second attention evaluation index of the target student in the teaching period.
In an alternative of the first aspect, the method includes:
the head gesture comprises a yaw angle, a pitch angle and a roll angle of the head, and the head gesture type of the target student in each time period is judged based on the yaw angle, the pitch angle and the roll angle;
wherein the head pose types include a facing screen pose and other head pose types.
In an alternative aspect of the first aspect, the outputting the first attention evaluation index according to the head pose type and the corresponding duration includes:
acquiring duration time of a corresponding head gesture type in each time period, and weighting and calculating to obtain the first attention evaluation index based on the head gesture type and the corresponding duration time;
If the head gesture type is the right-facing screen gesture, judging that the first attention evaluation index in the corresponding time period is not smaller than the first attention evaluation index threshold;
if the head gesture type is the other head gesture type, judging whether the duration time of the other head gesture type is larger than a gesture time threshold, and if so, judging that the first attention evaluation index in the corresponding time period is smaller than the first attention evaluation index threshold.
In an optional aspect of the first aspect, after the determining that the first attention evaluation index in the corresponding period of time is smaller than the first attention evaluation index threshold, the method further includes:
acquiring a reference physiological index of a target student based on historical data, calculating a difference value between the physiological index of the target student in the time period of the other head posture types and the reference physiological index, and calculating an average change rate of the physiological index in the time period of the other head posture types, wherein the average change rate comprises an electroencephalogram signal change rate and an eyeball displacement change rate;
calculating the second attention evaluation index based on the average change rate and the difference value;
And if the electroencephalogram signal change rate, the eyeball displacement change rate and the difference value are all larger than the corresponding physiological index threshold, judging that the second attention evaluation index is smaller than the second attention evaluation index threshold.
In an optional aspect of the first aspect, the outputting, based on the first attention evaluation index and/or the second attention evaluation index of the target student in the lecture period, an attention evaluation result of the target student in the lecture period includes:
acquiring the times and corresponding durations that the first attention evaluation index of the target student is lower than the first attention evaluation index threshold and/or the times and corresponding durations that the second attention evaluation index is lower than the second attention evaluation index threshold;
and calculating the attention evaluation result of the target student in the teaching period based on the times and the time length in a weighting mode.
In an optional implementation manner of the first aspect, the inputting the preprocessed image information into a trained head pose recognition model to recognize a head pose of each image frame includes:
acquiring each image frame of the preprocessed RGB video of the target student, inputting each image frame into the two data streams of a trained head gesture recognition model for feature fusion, inputting the fused features into a fully connected layer of the head gesture recognition model, and outputting a rotation matrix;
And calculating the Euler angle of the image frame based on the rotation matrix.
In a second aspect, an embodiment of the present application further provides a student attention evaluation device based on a dual-flow sliding attention network, including:
the information acquisition module is used for acquiring image information of the target student in the teaching period and acquiring physiological signals of the target student in the teaching period, wherein the physiological signals comprise electroencephalogram signals and electrooculogram signals; and for preprocessing the image information and the physiological signals;
the recognition module is used for inputting the preprocessed image information into a trained head gesture recognition model to recognize and obtain the head gesture of each image frame; inputting the preprocessed physiological signals into a trained physiological signal recognition model to recognize and obtain physiological indexes of students at all moments;
the first evaluation module is used for judging and obtaining the head posture types of the target students in different time periods according to the head postures of the image frames, and outputting a first attention evaluation index according to the head posture types and the corresponding duration time;
the second evaluation module is used for outputting a second attention evaluation index based on the physiological index in the time period corresponding to the head posture type when the first attention evaluation index is lower than a first attention evaluation index threshold;
And the attention evaluation module is used for outputting an attention evaluation result of the target student in the teaching period based on the first attention evaluation index and/or the second attention evaluation index of the target student in the teaching period.
In an optional aspect of the second aspect, the attention evaluation module is configured to output the attention evaluation result of the target student in the teaching period based on the first attention evaluation index and/or the second attention evaluation index of the target student in the teaching period, including:
acquiring, through the first evaluation module, the times and corresponding durations that the first attention evaluation index of the target student is lower than the first attention evaluation index threshold, and/or acquiring, through the second evaluation module, the times and corresponding durations that the second attention evaluation index is lower than the second attention evaluation index threshold;
and the attention evaluation module obtains the attention evaluation result of the target student in the teaching period based on the times and the time length weighted calculation.
In a third aspect, an embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program stored on the memory and capable of running on the processor, where the processor executes the program to implement the steps of any one of the methods provided in the first aspect of the embodiment of the present application.
In a fourth aspect, the present application also provides a non-transitory computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor implements the steps of the method as provided by the first aspect of the embodiments of the application.
The technical scheme provided by the embodiments of the application has the beneficial effects that at least:
according to the student attention evaluation method based on the double-flow sliding attention network, which is provided by the embodiment of the application, the head gesture and the physiological index of the target student are identified and obtained by collecting the physiological signals and the image information of the target student at the same time, so that the actual state of the student in class can be reflected more comprehensively; the recognition is carried out through the trained neural network, so that the recognition result of the head gesture type and the physiological index can be automatically and accurately given in real time; judging a first evaluation index based on the head posture type and the corresponding duration, and judging whether a second evaluation index needs to be calculated according to the evaluation result of the first evaluation index, so that the calculation force can be saved to the greatest extent; when the first evaluation index is smaller than the first attention evaluation index threshold, the second attention evaluation index based on the physiological signal is used for judging again, so that erroneous judgment can be effectively avoided, more accurate attention evaluation results can be given out, and therefore a teacher can be accurately helped to know the learning state of the student in real time in a network teaching scene.
Drawings
In order to more clearly illustrate the application or the technical solutions in the related art, the drawings used in the description of the embodiments or the related art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a student attention assessment method based on a dual-flow sliding attention network according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an application scenario according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a trained head pose recognition model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a trained physiological signal recognition model according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "comprising" and "having" and any variations thereof in the description and claims of the application and in the foregoing drawings are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to only those steps or modules but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the term "first/second" related to the present application is merely to distinguish similar objects, and does not represent a specific order for the objects, and it should be understood that "first/second" may interchange a specific order or precedence where allowed. It is to be understood that the "first\second" distinguishing aspects may be interchanged where appropriate to enable embodiments of the application described herein to be implemented in sequences other than those described or illustrated herein.
Next, the student attention evaluation method based on the dual-flow sliding attention network provided by the embodiment of the application is introduced, taking execution at the terminal of the target student as an example. The method comprises the following steps:
S1, acquiring image information of a target student in a teaching period, and acquiring physiological signals of the target student in the teaching period, wherein the physiological signals comprise brain electrical signals and eye electrical signals; preprocessing the image information and the physiological signal.
Optionally, the image information of the student can be obtained through a camera carried on the net lesson terminal used by the student, or through an external camera, a pan-tilt camera and the like; the physiological signals of the student can be acquired through wearable devices, for example, the electroencephalogram (EEG) signals are acquired through a head-mounted device, and the electrooculogram (EOG) signals are acquired through electrodes around the eyes.
Optionally, preprocessing the image comprises collecting the video in the teaching period and processing it through feature extraction, cropping, portrait identification and the like, finally obtaining single-frame RGB images of the student; after the electroencephalogram signals and the electrooculogram signals are obtained, the signals can be filtered to remove noise, and the information can be extracted through binarization, to which the embodiment of the application is not limited.
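As a non-limiting illustration of the signal preprocessing above, the following sketch band-pass filters the two physiological signals; the sampling rate, filter bands and dummy input data are assumptions made for illustration only and are not part of the claimed method:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(signal: np.ndarray, fs: float, low: float, high: float) -> np.ndarray:
    """Zero-phase Butterworth band-pass filter to suppress noise."""
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, signal)

fs = 250.0                                    # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
raw_eeg = rng.standard_normal(30 * int(fs))   # dummy 30 s EEG segment
raw_eog = rng.standard_normal(30 * int(fs))   # dummy 30 s EOG segment
eeg_clean = bandpass(raw_eeg, fs, 0.5, 45.0)  # typical EEG band (assumption)
eog_clean = bandpass(raw_eog, fs, 0.1, 10.0)  # typical EOG band (assumption)
```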
S2, inputting the preprocessed image information into a trained head gesture recognition model to recognize and obtain the head gesture of each image frame; inputting the preprocessed physiological signals into a trained physiological signal recognition model to recognize and obtain physiological indexes of students at all moments.
Specifically, after the head gesture corresponding to each image frame is extracted, the data are packaged and stored, the time stamp corresponding to each head gesture is retained, and the result is recorded as a time-ordered head gesture log;
the inventor has found that human cognitive behaviors and psychological activities have a strong correlation with electroencephalogram signals, and emotional fluctuations can be reflected by analyzing the electroencephalogram signals; eye movement can be divided into several types, such as regression, saccade and fixation, and different eye movements produce eye movement waveforms with different characteristics, so that by detecting and analyzing the eye movement waveforms it can be reflected whether a person is in a focused reading state.
And S3, judging and obtaining the head posture types of the target students in different time periods according to the head postures of the image frames, and outputting a first attention evaluation index according to the head posture types and the corresponding duration time.
Specifically, the head gesture comprises a yaw angle, a pitch angle and a roll angle of the head, and the head gesture type of the target student in each time period is judged based on the yaw angle, the pitch angle and the roll angle.
Wherein the head pose types include a facing screen pose and other head pose types.
By way of example, head pose types are divided based on the yaw, pitch and roll angles of the head pose, see Table 1; the other head poses may be divided into head up, head down, and head turned left or right. These divisions are provided here as a further illustration of the embodiment of the application, which is not limited thereto:
TABLE 1 student head pose type classification rules
As an example, if the head posture is facing the screen, the first attention evaluation index is larger than the first attention evaluation index threshold, and the first attention evaluation index of the target student is high; if the head posture is head-down and the duration of the head-down posture is 15 minutes while the duration threshold corresponding to the first attention evaluation index threshold is 20 minutes, the first attention evaluation index of the target student is still high; if the duration of the head-down posture is 30 minutes, the first attention evaluation index of the target student is low.
Based on the above, by judging the head posture type, the gaze direction of the target student can be estimated and it can be judged whether the gaze falls on the screen; if it falls on the screen, the student can be judged to be in a focused listening state; otherwise, it is determined that the student may not be concentrating on the content displayed on the screen.
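The following sketch illustrates the S3 decision logic described above. Since the angle thresholds of Table 1 and the weighting scheme are not fixed numerically in this text, the ±15° limit, the 0.6 index threshold and the names PoseSegment, pose_type and first_attention_index are assumptions for illustration only:

```python
from dataclasses import dataclass

@dataclass
class PoseSegment:
    yaw: float         # degrees
    pitch: float       # degrees
    roll: float        # degrees
    duration_s: float  # seconds the pose was held

def pose_type(seg: PoseSegment, limit_deg: float = 15.0) -> str:
    """Classify a segment as 'facing_screen' or an 'other' head pose type."""
    if max(abs(seg.yaw), abs(seg.pitch), abs(seg.roll)) <= limit_deg:
        return "facing_screen"
    return "other"

def first_attention_index(segments: list[PoseSegment]) -> float:
    """Duration-weighted share of time spent facing the screen, in [0, 1]."""
    total = sum(s.duration_s for s in segments)
    facing = sum(s.duration_s for s in segments if pose_type(s) == "facing_screen")
    return facing / total if total else 0.0

segments = [PoseSegment(2, -5, 1, 900), PoseSegment(40, -30, 0, 1800)]
index = first_attention_index(segments)   # 0.33 for this toy input
needs_physio_check = index < 0.6          # assumed first-index threshold
```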
And S4, if the first attention evaluation index is lower than a first attention evaluation index threshold, outputting a second attention evaluation index based on the physiological index in the time period corresponding to the head posture type.
It can be understood that judging concentration only from the fact that the student's gaze is not on the screen is not accurate enough; there are situations in which the student needs to think, consult lecture notes and the like. Therefore, the time during which the student stays in the other head gesture types needs to be considered, and the emotion recognition of the electroencephalogram signals and the waveform recognition of the electrooculogram signals are further combined to judge whether the student is in a concentrated state.
Further, if the first attention evaluation index of the target student is lower than the threshold, it indicates that the student has been in the other head gesture types for a long time, and the evaluation needs to be refined in combination with the physiological indexes; it is therefore necessary to acquire the physiological indexes of the student during the time periods of the other head gesture types, and obtain the second attention evaluation index on that basis.
As an example, when the physiological indexes obtained from the electroencephalogram information and the electrooculogram information are judged to be stable, that is, the change rates of the two signals in the given time period are low, the corresponding second attention evaluation index is high; otherwise, the second attention evaluation index is low.
S5, outputting attention evaluation results of the target students in the teaching period based on the first attention evaluation index and/or the second attention evaluation index of the target students in the teaching period.
Specifically, the attention of the student in the teaching period is comprehensively calculated based on the first attention evaluation index and/or the second attention evaluation index of the target student in the teaching period: if the student keeps watching the screen or keeps the other head postures only for short durations, only the first attention evaluation index is generated; if a duration of keeping other head postures in the teaching period is long, the second attention evaluation index is further given.
Specifically, as shown in fig. 2, an application scenario of the embodiment of the present application is that an image acquisition device and a physiological signal acquisition device connected through a net lesson terminal acquire image information and physiological signals of students and upload the image information and physiological signals to a server, and the image information and the physiological signals are preprocessed by the server, so as to calculate and obtain evaluation indexes.
In a specific embodiment, the outputting the first attention evaluation index according to the head pose type and the corresponding duration includes:
Acquiring duration time of a corresponding head gesture type in each time period, and weighting and calculating to obtain the first attention evaluation index based on the head gesture type and the corresponding duration time;
if the head gesture type is the right-facing screen gesture, judging that the first attention evaluation index in the corresponding time period is not smaller than the first attention evaluation index threshold;
if the head gesture type is the other head gesture type, judging whether the duration time of the other head gesture type is larger than a gesture time threshold, and if so, judging that the first attention evaluation index in the corresponding time period is smaller than the first attention evaluation index threshold.
When the head posture type is facing the screen, it can be directly judged that the first attention evaluation index is not smaller than the first attention evaluation index threshold, and no further judgment of the duration is needed.
In a specific embodiment, after the determining that the first attention evaluation index in the corresponding period of time is smaller than the first attention evaluation index threshold, the method further includes:
acquiring a reference physiological index of a target student based on historical data, calculating a difference value between the physiological index of the target student in the time period of the other head posture types and the reference physiological index, and calculating an average change rate of the physiological index in the time period of the other head posture types, wherein the average change rate comprises an electroencephalogram signal change rate and an eyeball displacement change rate;
Calculating the second attention evaluation index based on the average change rate and the difference value;
if the electroencephalogram signal change rate, the eyeball displacement change rate and the difference value are all larger than the corresponding physiological index threshold, judging that the second attention evaluation index is smaller than a second attention evaluation index threshold;
specifically, when the electroencephalogram signal change rate, the eyeball displacement change rate and the difference value are all larger than the corresponding physiological index threshold, determining that the emotion of the target student is fluctuated and the eyeballs are in a motion state at the moment, and indicating that the target student is not in a concentration state at the moment, so that the second attention evaluation index is smaller than the second attention evaluation index threshold.
Specifically, the electroencephalogram and the electrooculogram signals of the target students during the period of concentrating on reading can be collected to obtain the reference physiological indexes of the electroencephalogram and the electrooculogram, so that the second attention evaluation index is calculated more accurately for the single students.
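A hedged sketch of this second-index computation is given below; the concrete monotone mapping from the change rates and the baseline deviation to an index in [0, 1], as well as the function names, are assumptions, since the embodiment does not fix a formula:

```python
import numpy as np

def mean_change_rate(x: np.ndarray, dt: float) -> float:
    """Average absolute first difference per second over the window."""
    return float(np.mean(np.abs(np.diff(x)))) / dt

def second_attention_index(eeg: np.ndarray, eye_disp: np.ndarray,
                           reference: float, dt: float = 1.0 / 250.0) -> float:
    """Map signal fluctuation and deviation from the baseline to [0, 1]."""
    eeg_rate = mean_change_rate(eeg, dt)       # electroencephalogram change rate
    eye_rate = mean_change_rate(eye_disp, dt)  # eyeball displacement change rate
    deviation = abs(float(np.mean(eeg)) - reference)
    # Larger fluctuation/deviation -> lower attention (assumed monotone mapping).
    return 1.0 / (1.0 + eeg_rate + eye_rate + deviation)
```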
In a specific embodiment, the outputting, based on the first attention evaluation index and/or the second attention evaluation index of the target student in the teaching period, of the attention evaluation result of the target student in the teaching period includes:
acquiring the times and corresponding durations that the first attention evaluation index of the target student is lower than the first attention evaluation index threshold and/or the times and corresponding durations that the second attention evaluation index is lower than the second attention evaluation index threshold;
and calculating the attention evaluation result of the target student in the teaching period based on the times and the time length in a weighting mode.
As an example, the total evaluation result may be calculated by counting the number of times and the duration for which the two attention evaluation indexes fall below their corresponding thresholds. For example, if the number of times the first attention evaluation index falls below its threshold is 0, the second attention evaluation index does not need to be calculated, and the corresponding attention evaluation result may be given a high rating; if the number of times the first attention evaluation index falls below its threshold is 2, with durations of 15 minutes and 5 minutes respectively (one episode of not looking at the screen being relatively long), the second attention evaluation index is further calculated, and if the second attention evaluation index falls below its threshold with a duration of 5 minutes, the corresponding attention evaluation result may be given a general rating. This is only an example of the embodiment of the present application and should not be taken as limiting the application.
The specific evaluation result can be in the forms of scores, grades and the like, the attention evaluation result can be obtained based on a preset attention degree mapping table according to the weighted calculation result, and the attention level can be directly reflected through the weighted calculation score.
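As one non-limiting illustration of such a weighted calculation, the sketch below maps the episode counts and durations to a rating; all weights and grade cut-offs are assumptions chosen only so that the toy inputs reproduce the example ratings above:

```python
def attention_result(n1: int, t1_min: float, n2: int, t2_min: float) -> str:
    """Weighted rating from counts (n) and total sub-threshold minutes (t)."""
    penalty = 0.5 * n1 + 0.02 * t1_min + 1.0 * n2 + 0.05 * t2_min
    score = max(0.0, 100.0 - 10.0 * penalty)
    if score >= 85.0:
        return "high"
    if score >= 60.0:
        return "general"
    return "low"

print(attention_result(0, 0.0, 0, 0.0))   # -> "high"
print(attention_result(2, 20.0, 1, 5.0))  # -> "general"
```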
Furthermore, when the attention of the target student is low, a prompt can be sent to the teacher and/or the student, reminding the teacher to interact with the student so as to raise the student's attention, or a prompt message can be displayed on the display screen of the net lesson terminal to remind the user to concentrate on listening.
Specifically, the inputting the preprocessed image information into a trained head gesture recognition model to recognize the head gesture of each image frame includes:
acquiring each image frame of the preprocessed RGB video of the target student, inputting each image frame into the two data streams of the trained head gesture recognition model for feature fusion, inputting the fused features into a fully connected layer of the head gesture recognition model, and outputting a rotation matrix;
and calculating the Euler angle of the image frame based on the rotation matrix.
As shown in fig. 3, the architecture details of the trained head gesture recognition model are as follows: the network is a double-flow sliding attention network, and the input image frames are divided into two data streams to be processed separately. Specifically, two basic DSC blocks are used: DSCr(c) ≡ [DSC(3×3, c)-BN-ReLU] and DSCt(c) ≡ [DSC(3×3, c)-BN-Tanh], where c is the channel parameter and BN denotes batch normalization. Trans denotes a custom Transformer encoder with 8 heads, 3 encoder layers, 32 expected features in the encoder (denoted k) and a feed-forward dimension of 64. The first data stream has the structure [DSCr(16)-AvgPool(2×2)-DSCr(32)-DSCr(32)-AvgPool(2×2)]-[DSCr(32)-DSCr(32)-Trans-AvgPool(2×2)]-[DSCr(32)-DSCr(32)-Trans], where each pair of brackets forms a stage and AvgPool is average pooling. The second data stream has the same structure as the first, except that it uses DSCt instead of DSCr and max pooling MaxPool(2×2) instead of average pooling AvgPool(2×2). The feature fusion module consists of element-wise multiplication of the features from the two data streams. The prediction head consists of a 1×1 convolution layer that reduces the number of channels from 32 to 16 (the first stage has an AvgPool after the 1×1 convolution). The flattened features are then fed into a linear layer with 6 output units, and the final head gesture is calculated by taking a weighted average of the three predicted head gestures (one for each stage).
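The first stage of this dual-stream layout can be sketched in PyTorch as follows. The DSC composition, the pooling types, the element-wise fusion and the prediction head follow the text above; the class names Stream and PredictionHead, the 64×64 input resolution and the omission of the Transformer-bearing stages 2-3 are simplifications made for illustration:

```python
import torch
import torch.nn as nn

def dsc(cin: int, cout: int, act: nn.Module) -> nn.Sequential:
    """3x3 depthwise separable convolution, then BN, then the activation."""
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, padding=1, groups=cin),  # depthwise 3x3
        nn.Conv2d(cin, cout, 1),                        # pointwise 1x1
        nn.BatchNorm2d(cout),
        act,
    )

class Stream(nn.Module):
    """First stage of one stream: relu=True gives the DSCr/AvgPool stream,
    relu=False the DSCt/MaxPool stream. Stages 2-3 (with the Transformer
    encoder) follow the same pattern and are omitted for brevity."""
    def __init__(self, relu: bool = True):
        super().__init__()
        act = nn.ReLU if relu else nn.Tanh
        pool = nn.AvgPool2d if relu else nn.MaxPool2d
        self.stage1 = nn.Sequential(
            dsc(3, 16, act()), pool(2),
            dsc(16, 32, act()), dsc(32, 32, act()), pool(2),
        )
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.stage1(x)

class PredictionHead(nn.Module):
    """1x1 conv reducing 32 -> 16 channels, flatten, linear layer, 6 outputs."""
    def __init__(self, spatial: int):
        super().__init__()
        self.conv = nn.Conv2d(32, 16, 1)
        self.fc = nn.Linear(16 * spatial * spatial, 6)
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(torch.flatten(self.conv(x), 1))

stream_a, stream_b = Stream(relu=True), Stream(relu=False)
frame = torch.randn(1, 3, 64, 64)            # assumed input resolution
fused = stream_a(frame) * stream_b(frame)    # element-wise feature fusion
pose_6d = PredictionHead(spatial=16)(fused)  # 64 / (2*2) = 16 after two pools
```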
Further, predicting the rotation matrix via the 6-degree-of-freedom (6D) representation and comparing it with the true rotation matrix through the geodesic distance loss includes:
the Gram-Schmidt mapping is performed inside the representation itself by simply discarding the last column vector of the rotation matrix, which reduces the 3×3 matrix to a six-parameter rotation representation.
The predicted 6D representation matrix can then be mapped back to SO(3), where SO(3) denotes the group of rotation operations in three-dimensional space: the remaining column vector is simply determined by the cross product, which ensures that the orthogonality constraint is satisfied by the resulting 3×3 matrix. Thus, the network only needs to predict 6 parameters, which are mapped to the 3×3 rotation matrix in a subsequent transform while also satisfying the orthogonality constraint.
A common loss function for head-pose-related tasks is the l2 norm. However, using the Frobenius norm to measure the distance between two matrices breaks the geometry of the SO(3) manifold. In contrast, the shortest path between two 3D rotations is geometrically interpreted as the geodesic distance. Let R_p and R_gt ∈ SO(3) be the estimated rotation matrix and the true rotation matrix respectively; then the geodesic distance between the two rotation matrices is defined as:
d(R_p, R_gt) = arccos((tr(R_p R_gtᵀ) − 1) / 2).
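The 6D-to-SO(3) mapping and the geodesic loss just defined can be sketched as follows; this is written in PyTorch and the (batch, 6) tensor layout is an assumption:

```python
import torch
import torch.nn.functional as F

def rotation_from_6d(x6: torch.Tensor) -> torch.Tensor:
    """Map a (batch, 6) prediction to (batch, 3, 3) rotation matrices in SO(3)."""
    a1, a2 = x6[:, :3], x6[:, 3:]
    b1 = F.normalize(a1, dim=1)                              # Gram-Schmidt step
    b2 = F.normalize(a2 - (b1 * a2).sum(1, keepdim=True) * b1, dim=1)
    b3 = torch.cross(b1, b2, dim=1)                          # cross-product column
    return torch.stack((b1, b2, b3), dim=2)                  # columns b1, b2, b3

def geodesic_loss(r_pred: torch.Tensor, r_gt: torch.Tensor) -> torch.Tensor:
    """Mean geodesic distance arccos((tr(Rp Rgt^T) - 1) / 2) over the batch."""
    m = torch.bmm(r_pred, r_gt.transpose(1, 2))
    cos = (m.diagonal(dim1=1, dim2=2).sum(-1) - 1.0) / 2.0
    return torch.acos(cos.clamp(-1.0 + 1e-7, 1.0 - 1e-7)).mean()
```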
further, the specific steps of converting the rotation matrix into the Euler angle representation are as follows:
the Euler angles about the three axes x, y and z are θ_x, θ_y and θ_z respectively. Rotating intrinsically in the order z-y-x, and denoting the corresponding sines and cosines by s_x, c_x, s_y, c_y, s_z, c_z, the rotation matrix is:
R(θ_z, θ_y, θ_x) = R(z, θ_z) · R(y, θ_y) · R(x, θ_x);
expanding this product gives the rotation matrix with entries R_ij, in particular R_31 = −s_y, R_32 = c_y s_x, R_33 = c_y c_x, R_21 = s_z c_y and R_11 = c_z c_y. The Euler angles are then obtained by solving this equation:
θ_y = −arcsin(R_31), θ_x = atan2(R_32, R_33), θ_z = atan2(R_21, R_11).
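A minimal sketch of this recovery, ignoring the gimbal-lock case cos θ_y = 0, is:

```python
import numpy as np

def euler_zyx(R: np.ndarray) -> tuple[float, float, float]:
    """Recover (theta_x, theta_y, theta_z) in radians from R = Rz . Ry . Rx."""
    theta_y = -np.arcsin(np.clip(R[2, 0], -1.0, 1.0))  # R[2,0] = -sin(theta_y)
    theta_x = np.arctan2(R[2, 1], R[2, 2])             # R[2,1]/R[2,2] = tan(theta_x)
    theta_z = np.arctan2(R[1, 0], R[0, 0])             # R[1,0]/R[0,0] = tan(theta_z)
    return float(theta_x), float(theta_y), float(theta_z)
```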
optionally, the head pose recognition model is trained with historically measured student head pose data and/or the commonly used 300W-LP dataset as the training set.
Further, referring to fig. 4, the processing flow of the physiological signal recognition model is as follows:
the preprocessed EEG signals and EOG signals are input into a feature extraction and fusion module; the EEG and EOG inputs can be continuous 30-second waveforms, and the starting time can be the moment at which a long-duration head posture abnormality begins to be recorded.
The fused features are then input into a ResNet50 deep convolutional neural network for learning and inference. To learn global feature information without adding much computational cost, improved attention mechanisms are employed, namely Channel and Spatial Joint Attention (CSJA) blocks and squeeze-and-excitation (SE) blocks, to recalibrate the features. Accordingly, a channel complexity adjustment factor (f) is introduced to uniformly adjust the number of channels before cooperating with the attention blocks.
Further, the features are input into the Channel and Spatial Joint Attention (CSJA) block. In its "squeeze" section, the original features undergo global average pooling and global max pooling. In the excitation section, a convolution is used to form a new feature map containing the importance of positional information. In the final feature fusion section, the attention feature maps trained separately at the spatial and channel levels are added, and the adjusted feature map fusing the dual attention is finally obtained.
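Since the exact internal wiring of the CSJA block is not given here, the following PyTorch sketch is only one plausible realization of the described structure (average/max squeeze, convolutional excitation encoding positional importance, additive fusion of the channel- and spatial-level attention maps); its concrete details are assumptions:

```python
import torch
import torch.nn as nn

class CSJA(nn.Module):
    """Assumed CSJA-style block: channel and spatial attention, fused additively."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1))
        self.spatial_conv = nn.Conv2d(2, 1, 7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=(2, 3), keepdim=True)            # squeeze: global average
        mx = x.amax(dim=(2, 3), keepdim=True)             # squeeze: global max
        ch = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        sp_in = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], 1)
        sp = torch.sigmoid(self.spatial_conv(sp_in))      # positional importance map
        return x * ch + x * sp                            # additive dual-attention fusion
```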
A channel attention mechanism called SE block is introduced at the bottom of the network, aimed at adaptively recalibrating the abstract features extracted by the encoder. First, spatial information corresponding to each channel is compressed, such as a global response of the signal within 30 s. Second, two convolution operations are used to learn the nonlinear features and perform channel level feature selection. Finally, tensor multiplication enhances useful features and suppresses invalid features.
Finally, the features are input into a classification prediction head composed of several linear projection layers. The final prediction is divided into two groups, emotion (stable vs. fluctuating) and eye movement (smooth movement vs. saccade), so the resulting categories comprise: stable emotion with smooth eye movement, fluctuating emotion with smooth eye movement, stable emotion with eye saccade, and fluctuating emotion with eye saccade.
According to the scheme, the SE module is specifically designed as follows:
the SE module mainly comprises two parts, namely compression of the Squeeze and Excitation of the expression. Since the output is generated by summing all channels, the channel correlation is implicitly embedded in, but entangled with the local spatial correlation captured by the filter. It is desirable to enhance the learning of convolution features by explicitly modeling the interdependencies of the channels so that the network can increase its sensitivity to information features that can be exploited by subsequent transformations. Thus, the module wants to provide a way for it to obtain global information and recalibrate the filter response, i.e. the convolution kernel of the neural network, in two steps (Sequeeze and Excitation) before entering the next transition.
Consider first the signal of each channel in the output feature map.
Further, since each learned filter works with a local receptive field, each unit of the transformed output U cannot utilize context information outside that region. To address this problem, the global spatial information can be compressed into channel descriptors by generating channel-wise statistics using global average pooling. Precisely, z ∈ R^C is the result of performing global average pooling on the feature U over the spatial dimensions H × W, so each element of z is expressed as follows:
z_c = F_sq(u_c) = (1 / (H × W)) Σ_{i=1..H} Σ_{j=1..W} u_c(i, j).
in order to exploit the information aggregated in the compress squeze operation, the second operation needs to be continued to trigger the specification in order to fully capture the channel dependencies, i.e. the channels of the feature map. To achieve this, the functionality must meet two criteria: first, flexible operation must be enabled, and nonlinear relationships between channels must be learned; second, non-exclusive relationships must be learned so that multiple channels can be enhanced (rather than just enhancing a channel feature like one-hot encoding). To meet these criteria, the module chooses to use a simple gating mechanism with sigmoid activation.
In order to meet the two criteria described above, the following form is used:
s = F_ex(z, W) = σ(g(z, W)) = σ(W_2 δ(W_1 z));
where δ denotes the ReLU function, σ the sigmoid function, and W_1 and W_2 the weights of the two fully connected layers.
in step 1.5.4, to limit the complexity of the model and generalize it, two FC layers are used to parameterize the gating mechanism, namely a dimension reduction layer with dimension reduction rate r, a ReLU, then a dimension elevation layer, and then to the channel dimension of the output feature map. The final output of the Block is obtained by using the active rescaled feature map. After s is obtained, the final output of SE Block can be obtained by:
the following are device embodiments of the present application that may be used to perform method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.
The student attention evaluation device based on the double-flow sliding attention network provided by the exemplary embodiment of the application can be realized as all or part of a terminal through software, hardware or a combination of the two, or can be integrated on a server as an independent module. The student attention evaluation device based on the double-flow sliding attention network in the embodiment of the application can be applied to a terminal or to the cloud, and the device comprises:
The information acquisition module is used for acquiring image information of the target student in the teaching period and acquiring physiological signals of the target student in the teaching period, wherein the physiological signals comprise electroencephalogram signals and electrooculogram signals; and for preprocessing the image information and the physiological signals;
the recognition module is used for inputting the preprocessed image information into a trained head gesture recognition model to recognize and obtain the head gesture of each image frame; inputting the preprocessed physiological signals into a trained physiological signal recognition model to recognize and obtain physiological indexes of students at all moments;
the first evaluation module is used for judging and obtaining the head posture types of the target students in different time periods according to the head postures of the image frames, and outputting a first attention evaluation index according to the head posture types and the corresponding duration time;
the second evaluation module is used for outputting a second attention evaluation index based on the physiological index in the time period corresponding to the head posture type when the first attention evaluation index is lower than a first attention evaluation index threshold;
and the attention evaluation module is used for outputting an attention evaluation result of the target student in the teaching period based on the first attention evaluation index and/or the second attention evaluation index of the target student in the teaching period.
Further, the attention evaluation module is configured to output an attention evaluation result of the target student in the lecture period based on the first attention evaluation index and/or the second attention evaluation index of the target student in the lecture period, and includes:
acquiring, through the first evaluation module, the times and corresponding durations that the first attention evaluation index of the target student is lower than the first attention evaluation index threshold, and/or acquiring, through the second evaluation module, the times and corresponding durations that the second attention evaluation index is lower than the second attention evaluation index threshold;
and the attention evaluation module obtains the attention evaluation result of the target student in the teaching period based on the times and the time length weighted calculation.
It should be noted that, when the student attention evaluation device based on the dual-flow sliding attention network provided in the foregoing embodiment performs the student attention evaluation method based on the dual-flow sliding attention network, only the division of the foregoing functional modules is used for illustration, and in practical application, the foregoing functional allocation may be performed by different functional modules, that is, the internal structure of the apparatus is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the device provided in the above embodiment and the student attention evaluation method embodiment based on the dual-flow sliding attention network belong to the same concept, and the detailed implementation process is referred to the method embodiment, which is not repeated here.
The embodiment of the application also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the method of any embodiment when executing the program.
Fig. 5 is a block diagram of an electronic device according to an embodiment of the present application.
As shown in fig. 5, the electronic device 500 includes: a processor 501 and a memory 502.
In the embodiment of the present application, the processor 501 is a control center of a computer system, and may be a processor of a physical machine or a processor of a virtual machine. Processor 501 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 501 may be implemented in at least one hardware form of DSP (Digital Signal Processing ), FPGA (Field-Programmable Gate Array, field programmable gate array), PLA (Programmable Logic Array ).
The processor 501 may also include a main processor and a coprocessor, the main processor being a processor for processing data in an awake state, also referred to as a CPU (Central Processing Unit ); a coprocessor is a low-power processor for processing data in a standby state.
Memory 502 may include one or more computer-readable storage media, which may be non-transitory. Memory 502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments of the application, a non-transitory computer readable storage medium in memory 502 is used to store at least one instruction for execution by processor 501 to implement the method in embodiments of the application.
In some embodiments, the electronic device 500 further includes: a peripheral interface 503 and at least one peripheral. The processor 501, the memory 502, and the peripheral interface 503 may be connected by buses or signal lines. Each peripheral may be connected to the peripheral interface 503 by a bus, signal line or circuit board. Specifically, the peripherals include: a display 504, a camera 505 and audio circuitry 506. The peripheral interface 503 may be used to connect at least one Input/Output (I/O) related peripheral to the processor 501 and the memory 502.
In some embodiments of the application, processor 501, memory 502, and peripheral interface 503 are integrated on the same chip or circuit board; in some other embodiments of the application, either or both of the processor 501, memory 502, and peripheral interface 503 may be implemented on separate chips or circuit boards. The embodiment of the present application is not particularly limited thereto.
The display 504 is used to display the UI. The UI may include graphics, text, icons, video, and any combination thereof. When the display 504 is a touch screen, the display 504 also has the ability to collect touch signals at or above the surface of the display 504. The touch signal may be input as a control signal to the processor 501 for processing. At this point, the display 504 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards.
In some embodiments of the present application, the display 504 may be one and disposed on the front panel of the electronic device 500; in other embodiments of the present application, the display 504 may be at least two, respectively disposed on different surfaces of the electronic device 500 or in a folded design; in still other embodiments of the present application, the display 504 may be a flexible display disposed on a curved surface or a folded surface of the electronic device 500. Even more, the display 504 may be arranged in a non-rectangular irregular pattern, i.e. a shaped screen. The display 504 may be made of LCD (Liquid Crystal Display ), OLED (Organic Light-Emitting Diode) or other materials.
The camera 505 is used to capture images or video. Optionally, the camera 505 includes a front camera and a rear camera. In general, the front camera is disposed on the front panel of the electronic device, and the rear camera is disposed on the rear surface. In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth camera can be fused to realize a background-blurring function, or the main camera and the wide-angle camera can be fused to realize panoramic shooting, Virtual Reality (VR) shooting or other fused shooting functions. In some embodiments of the application, the camera 505 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash; a dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuit 506 may include a microphone and a speaker. The microphone is used for collecting sound waves of the user and the environment and converting them into electric signals to be input to the processor 501 for processing. For the purpose of stereo acquisition or noise reduction, there may be multiple microphones, each disposed at a different location of the electronic device 500. The microphone may also be an array microphone or an omnidirectional pickup microphone.
The power supply 507 is used to power the various components in the electronic device 500. The power source 507 may be alternating current, direct current, disposable or rechargeable. When the power source 507 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
The block diagrams of the electronic device according to the embodiments of the present application do not limit the electronic device 500, and the electronic device 500 may include more or less components than those shown, or may combine some components, or may employ different arrangements of components.
The present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of any of the previous embodiments. The computer readable storage medium may include, among other things, any type of disk including floppy disks, optical disks, DVDs, CD-ROMs, micro-drives, and magneto-optical disks, ROM, RAM, EPROM, EEPROM, DRAM, VRAM, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on such understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the related art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A student attention assessment method based on a dual-flow sliding attention network, comprising:
acquiring image information of a target student in a teaching period, and acquiring physiological signals of the target student in the teaching period, wherein the physiological signals comprise electroencephalogram signals and electrooculogram signals; preprocessing the image information and the physiological signals;
inputting the preprocessed image information into a trained head gesture recognition model to recognize and obtain the head gesture of each image frame; inputting the preprocessed physiological signals into a trained physiological signal recognition model to recognize and obtain physiological indexes of students at all moments;
judging and obtaining the head posture types of target students in different time periods according to the head postures of the image frames, and outputting a first attention evaluation index according to the head posture types and the corresponding duration time;
if the first attention evaluation index is lower than a first attention evaluation index threshold, outputting a second attention evaluation index based on the physiological index in the time period corresponding to the head posture type;
and outputting the attention evaluation result of the target student in the teaching period based on the first attention evaluation index and/or the second attention evaluation index of the target student in the teaching period.
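By way of illustration only (not claim language): a minimal Python sketch of the two-stage decision recited in claim 1, in which the physiology-based second index is computed only when the pose-based first index falls below its threshold. The function name, the threshold value, and the scalar representation of the two indices are hypothetical.

```python
from typing import Optional

FIRST_INDEX_THRESHOLD = 0.5  # hypothetical value; the claim leaves it unspecified

def evaluate_period(first_index: float, physio_index: float) -> tuple[float, Optional[float]]:
    """Claim-1 flow for one time period: fall back to the physiology-based
    second index only when the pose-based first index is below threshold."""
    second_index = physio_index if first_index < FIRST_INDEX_THRESHOLD else None
    return first_index, second_index
```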
2. The method of claim 1, wherein the head pose comprises a yaw angle, a pitch angle, and a roll angle of the head, and the head pose type of the target student in each time period is determined based on the yaw angle, the pitch angle, and the roll angle;
wherein the head pose types include a screen-facing pose and other head pose types.
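For illustration, a minimal sketch of the claim-2 classification under the assumption that the screen-facing pose corresponds to all three Euler angles staying within small bounds; the angular limits and label strings are hypothetical, as the claim does not fix concrete values.

```python
# Assumed angular limits (degrees); the claim specifies no concrete values.
YAW_MAX, PITCH_MAX, ROLL_MAX = 20.0, 15.0, 15.0

def head_pose_type(yaw: float, pitch: float, roll: float) -> str:
    """Classify a head pose as 'screen_facing' when all three Euler angles
    stay near zero, otherwise as 'other'."""
    if abs(yaw) <= YAW_MAX and abs(pitch) <= PITCH_MAX and abs(roll) <= ROLL_MAX:
        return "screen_facing"
    return "other"
```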
3. The method of claim 2, wherein outputting the first attention evaluation index according to the head pose type and the corresponding duration comprises:
acquiring the duration of the corresponding head pose type in each time period, and obtaining the first attention evaluation index by weighted calculation based on the head pose type and the corresponding duration;
if the head pose type is the screen-facing pose, determining that the first attention evaluation index in the corresponding time period is not lower than the first attention evaluation index threshold; and
if the head pose type is one of the other head pose types, judging whether the duration of that head pose type exceeds a pose-duration threshold, and if so, determining that the first attention evaluation index in the corresponding time period is lower than the first attention evaluation index threshold.
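For illustration, a sketch of the claim-3 logic under the assumption that the weighted calculation reduces to the screen-facing share of time, with the per-period threshold decision as recited; the weights, normalization, and threshold values are hypothetical.

```python
POSE_TIME_THRESHOLD_S = 5.0  # assumed pose-duration threshold (seconds)

def first_attention_index(pose_durations: list[tuple[str, float]]) -> float:
    """pose_durations: (pose_type, duration_s) pairs for one time window.
    Here the weighted calculation is taken to be the screen-facing share."""
    total = sum(d for _, d in pose_durations) or 1.0
    facing = sum(d for t, d in pose_durations if t == "screen_facing")
    return facing / total

def below_first_threshold(pose_type: str, duration_s: float) -> bool:
    """Claim-3 per-period decision: a screen-facing period is never below
    the threshold; another pose is, once it outlasts the time threshold."""
    return pose_type != "screen_facing" and duration_s > POSE_TIME_THRESHOLD_S
```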
4. The method of claim 3, further comprising, after determining that the first attention evaluation index in the corresponding time period is lower than the first attention evaluation index threshold:
acquiring a reference physiological indicator of the target student based on historical data, calculating the difference between the physiological indicator of the target student during the time period of the other head pose type and the reference physiological indicator, and calculating the average change rate of the physiological indicator during that time period, wherein the average change rate comprises an EEG signal change rate and an eyeball displacement change rate;
calculating the second attention evaluation index based on the average change rate and the difference; and
if the EEG signal change rate, the eyeball displacement change rate, and the difference are all greater than the corresponding physiological indicator thresholds, determining that the second attention evaluation index is lower than the second attention evaluation index threshold.
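For illustration, a sketch of the claim-4 quantities under assumed definitions: the average change rate is taken as the mean absolute first difference per second, and the second index decreases as the EEG change rate, the eye-displacement change rate, and the deviation from the historical baseline grow. The combining rule is hypothetical; the claim fixes only the inputs.

```python
import numpy as np

def average_change_rate(samples: np.ndarray, dt: float) -> float:
    """Mean absolute first difference per second over the period."""
    return float(np.mean(np.abs(np.diff(samples)))) / dt

def second_attention_index(eeg: np.ndarray, eye_disp: np.ndarray,
                           baseline: float, dt: float = 1.0) -> float:
    """Lower index for stronger fluctuations and larger baseline deviation."""
    eeg_rate = average_change_rate(eeg, dt)
    eye_rate = average_change_rate(eye_disp, dt)
    delta = abs(float(np.mean(eeg)) - baseline)  # difference from reference indicator
    return 1.0 / (1.0 + eeg_rate + eye_rate + delta)
```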
5. The method of any one of claims 1 to 4, wherein outputting the attention evaluation result of the target student during the teaching period based on the first attention evaluation index and/or the second attention evaluation index comprises:
acquiring the number of times, and the corresponding durations, that the first attention evaluation index of the target student falls below the first attention evaluation index threshold, and/or the number of times, and the corresponding durations, that the second attention evaluation index falls below the second attention evaluation index threshold; and
calculating the attention evaluation result of the target student over the teaching period by weighting the numbers of times and the durations.
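For illustration, a sketch of the claim-5 aggregation under the assumption of a linear penalty on the counts and total durations of below-threshold episodes; the weights and the clamping to [0, 1] are hypothetical choices.

```python
def attention_result(n_below_1: int, t_below_1: float,
                     n_below_2: int, t_below_2: float,
                     w_count: float = 0.05, w_time: float = 0.01) -> float:
    """Weighted combination of how often and how long each attention
    evaluation index fell below its threshold during the teaching period."""
    penalty = w_count * (n_below_1 + n_below_2) + w_time * (t_below_1 + t_below_2)
    return max(0.0, 1.0 - penalty)
```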
6. The method of claim 1, wherein inputting the preprocessed image information into the trained head pose recognition model to recognize the head pose of each image frame comprises:
acquiring each image frame of the preprocessed RGB video of the target student, feeding each frame into the two data streams of the trained head pose recognition model, fusing the features extracted by the two streams, passing the fused features through a fully connected layer of the head pose recognition model, and outputting a rotation matrix; and
calculating the Euler angles of the image frame based on the rotation matrix.
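For illustration, a sketch of the Euler-angle extraction in claim 6 under the common ZYX (yaw-pitch-roll) convention; the claim does not fix a convention, so this choice is an assumption.

```python
import numpy as np

def rotation_matrix_to_euler(R: np.ndarray) -> tuple[float, float, float]:
    """Return (yaw, pitch, roll) in degrees from a 3x3 rotation matrix,
    assuming R = Rz(yaw) @ Ry(pitch) @ Rx(roll)."""
    pitch = np.arcsin(np.clip(-R[2, 0], -1.0, 1.0))
    if np.isclose(np.cos(pitch), 0.0):            # gimbal lock: yaw/roll couple
        yaw = np.arctan2(-R[0, 1], R[1, 1])
        roll = 0.0
    else:
        yaw = np.arctan2(R[1, 0], R[0, 0])
        roll = np.arctan2(R[2, 1], R[2, 2])
    yaw_d, pitch_d, roll_d = np.degrees([yaw, pitch, roll])
    return float(yaw_d), float(pitch_d), float(roll_d)
```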
7. A student attention evaluation device based on a dual-flow sliding attention network, comprising:
an information acquisition module, configured to acquire image information of a target student during a teaching period and physiological signals of the target student during the teaching period, wherein the physiological signals comprise EEG signals and EOG signals, and to preprocess the image information and the physiological signals;
a recognition module, configured to input the preprocessed image information into a trained head pose recognition model to recognize the head pose of each image frame, and to input the preprocessed physiological signals into a trained physiological signal recognition model to recognize physiological indicators of the student at each moment;
a first evaluation module, configured to determine head pose types of the target student in different time periods according to the head pose of each image frame, and to output a first attention evaluation index according to the head pose type and the corresponding duration;
a second evaluation module, configured to output a second attention evaluation index based on the physiological indicators within the time period corresponding to the head pose type when the first attention evaluation index is lower than a first attention evaluation index threshold; and
an attention evaluation module, configured to output an attention evaluation result of the target student during the teaching period based on the first attention evaluation index and/or the second attention evaluation index.
8. The student attention evaluation device based on the dual-flow sliding attention network of claim 7, wherein the attention evaluation module being configured to output the attention evaluation result of the target student during the teaching period based on the first attention evaluation index and/or the second attention evaluation index comprises:
acquiring, via the first evaluation module, the number of times, and the corresponding durations, that the first attention evaluation index of the target student falls below the first attention evaluation index threshold, and/or acquiring, via the second evaluation module, the number of times, and the corresponding durations, that the second attention evaluation index falls below the second attention evaluation index threshold; and
the attention evaluation module obtaining the attention evaluation result of the target student over the teaching period by weighting the numbers of times and the durations.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1 to 6 when executing the program.
10. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 6.
CN202311108274.8A 2023-08-30 2023-08-30 Student attention evaluation method based on double-flow sliding attention network Pending CN117152841A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311108274.8A CN117152841A (en) 2023-08-30 2023-08-30 Student attention evaluation method based on double-flow sliding attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311108274.8A CN117152841A (en) 2023-08-30 2023-08-30 Student attention evaluation method based on double-flow sliding attention network

Publications (1)

Publication Number Publication Date
CN117152841A true CN117152841A (en) 2023-12-01

Family

ID=88903926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311108274.8A Pending CN117152841A (en) 2023-08-30 2023-08-30 Student attention evaluation method based on double-flow sliding attention network

Country Status (1)

Country Link
CN (1) CN117152841A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination