CN113723233A - Student learning engagement assessment method based on hierarchical temporal multi-instance learning - Google Patents

Student learning engagement assessment method based on hierarchical temporal multi-instance learning

Info

Publication number
CN113723233A
Authority
CN
China
Prior art keywords
video
level
learning
segment
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110942289.9A
Other languages
Chinese (zh)
Other versions
CN113723233B (en)
Inventor
李特
姜新波
马嘉遥
秦学英
顾建军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202110942289.9A priority Critical patent/CN113723233B/en
Publication of CN113723233A publication Critical patent/CN113723233A/en
Application granted granted Critical
Publication of CN113723233B publication Critical patent/CN113723233B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a student learning engagement assessment method based on hierarchical temporal multi-instance learning. The method trains an evaluation model with three types of features extracted from the video (head posture, facial expression, and body posture) and only video-level learning engagement labels, and the trained model yields both the video-level learning engagement and the engagement of every video segment. The method is easy to implement, computationally efficient, and provides reliable learning engagement assessment accuracy.

Description

Student learning engagement assessment method based on hierarchical temporal multi-instance learning
Technical Field
The invention belongs to the fields of computer vision, artificial intelligence, and education, and particularly relates to a student learning engagement assessment method based on hierarchical temporal multi-instance learning.
Background
The advent of massive open online courses (MOOCs) has raised widespread interest and high expectations in the educational community. Despite the broad potential of this new form of education, students' low course completion rates are considered one of its major problems. To address this deficiency, dynamic assessment of each student's engagement during online learning activities enables timely teaching intervention, which can improve completion rates and support personalized learning. Since a MOOC environment typically involves a very large number of students, performing such assessments manually is prohibitively expensive. Research on automated techniques for assessing student learning engagement in real time is therefore receiving increasing attention.
Automatic assessment of learning engagement faces the following problems:
1) Because segment-by-segment annotation is time-consuming and labor-intensive, most previous methods only assess the learning engagement of the whole video and pay little attention to the more meaningful assessment of per-segment engagement.
2) Distance-education lessons usually last tens of minutes or even an hour, and the resulting large amount of video data makes assessment difficult. How to obtain effective features that represent both the whole video and each short segment is therefore an urgent problem.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention aims to provide a student learning engagement assessment method based on hierarchical temporal multi-instance learning.
The purpose of the invention is achieved by the following technical solution: a student learning engagement assessment method based on hierarchical temporal multi-instance learning, comprising the following steps:
Step 1: extract image frames from each video; every l consecutive frames form a video segment, and N video segments are obtained from each video.
Step 2: use the OpenPose, FSA-NET, and PLFD networks to extract the body posture features, head posture features, and facial key point features of each frame in each video segment for learning engagement assessment.
Step 3: for each type of feature sequence of a video segment, use a Bi-LSTM network to obtain the hidden state at each time step; feed the hidden states into a bottom-level temporal multi-instance learning module (B-TMIL) to obtain the feature representation of the video segment. Process the features extracted from all segments of a video through a fully connected layer and a top-level temporal multi-instance learning module (T-TMIL) to obtain the feature representation of the video. B-TMIL and T-TMIL are both implemented with a self-attention mechanism and have the same structure.
Step 4: fuse the three types of segment-level features extracted by B-TMIL in step 3, and fuse the three types of video-level features extracted by T-TMIL in step 3.
Step 5: apply a fully connected operation to the segment-level fused features to obtain the learning engagement of each video segment, and apply a fully connected operation to the video-level fused features to obtain the learning engagement of the video. Establish local and global supervision using, respectively, the average of the segment-level engagement values and the video-level engagement, and train the whole network.
Further, in step 1, the image frames are extracted by keeping one frame every few frames at equal intervals.
Further, in step 2, for a video segment frame v_{i,j}, the OpenPose, FSA-NET, and PLFD networks are used to extract the head posture feature e_{i,j}, the body posture feature b_{i,j}, and the facial key points m_{i,j}. For a video segment V_i, this yields the head posture sequence E_i = {e_{i,1}, e_{i,2}, …, e_{i,l}}, the body posture sequence B_i = {b_{i,1}, b_{i,2}, …, b_{i,l}}, and the facial key point sequence M_i = {m_{i,1}, m_{i,2}, …, m_{i,l}}.
Further, in step 3, the B-TMIL module acts on the sequence of sampled video frames that constitute a short video segment, where the frames are instances and the segments are bags. A valid representation of a bag must be obtained in order to accurately predict its label. A multi-instance learning module with an attention mechanism is applied to the Bi-LSTM hidden states at all time steps, and the bag representation is obtained adaptively through trainable parameters.
Further, let X_i denote one of the head posture sequence, the body posture sequence, and the facial key point sequence. X_i is input into the Bi-LSTM to obtain the hidden state sequence H_i = {h_{i,1}, h_{i,2}, …, h_{i,l}}. The segment-level aggregated feature f_i^X corresponding to X_i is computed as a weighted sum of the dimension-reduced hidden states:
f_i^X = Σ_{j=1}^{l} a_{i,j} · δ(FC_r(h_{i,j}))
a_{i,j} = exp(s_{i,j}) / Σ_{k=1}^{l} exp(s_{i,k})
s_{i,j} = w^T ( τ(V h_{i,j}) ⊙ σ(U h_{i,j}) )
where f_i^X represents one of the segment-level head posture feature, the segment-level body posture feature, and the segment-level facial key point feature; δ is the ReLU function; FC_r is a fully connected operation for dimension reduction; w, V, and U are weight matrices; ⊙ is element-wise multiplication; σ is the Sigmoid function; and τ is the Tanh function.
Further, in step 3, T-TMIL acts between video segments, where the video segments are instances and the complete video composed of the segments is a bag. The MIL module is applied to the segment-level features. The dimensionality of the segment-level features is reduced by a fully connected operation to generate a more effective embedded representation. A weighted combination of the video segments is constructed to represent the video.
Further, let g^X denote one of the video-level head posture feature, the video-level body posture feature, and the video-level facial key point feature:
z_i^X = δ(FC_e(f_i^X))
g^X = Σ_{i=1}^{N} c_i · z_i^X
c_i = exp(u^T ( τ(P z_i^X) ⊙ σ(Q z_i^X) )) / Σ_{k=1}^{N} exp(u^T ( τ(P z_k^X) ⊙ σ(Q z_k^X) ))
where u, P, and Q are the weight matrices in T-TMIL, FC_e is a fully connected operation for dimension reduction, and f_i^X represents one of the segment-level head posture feature, the segment-level body posture feature, and the segment-level facial key point feature.
Further, in step 4, a weighted feature fusion method is used to extract the segment-level and video-level fused features for evaluation. The weight matrix has the same size as the feature matrix and consists of trainable parameters. Each column of the weight matrix is normalized with a Softmax function to obtain the proportion of each feature in the corresponding dimension, and the weighted sum yields the fused feature.
The invention has the following beneficial effects: a hierarchical temporal multi-instance learning model is established according to the temporal correlation between instances, consisting of a bottom-level module from video frames to video segments and a top-level module from video segments to the video. The evaluation model is trained with three types of features extracted from the video (head posture, facial expression, and body posture) and only video-level learning engagement labels, and the trained model yields not only the video-level learning engagement but also the engagement of every video segment. The method is easy to implement, computationally efficient, and provides reliable learning engagement assessment accuracy.
Drawings
FIG. 1 is a schematic diagram of a learning engagement assessment framework;
FIG. 2 is a schematic diagram of a feature fusion process.
Detailed Description
As shown in FIG. 1, the student learning engagement assessment method based on hierarchical temporal multi-instance learning of the present invention includes the following steps:
Step 1: preprocessing. 3000 image frames are extracted at equal intervals from each video; every 30 frames form one video segment, so that 100 video segments are obtained for each video.
Down-sampling: the body posture, head posture and facial key points of the learner tend to change gradually and slowly during the learning process. Therefore, we downsample each original video by keeping one frame every few frames to achieve more efficient computational processing. In our experiment, 3000 frames per video were reserved for evaluation.
Segmentation: since the features at the current moment have little influence on the engagement assessment at other moments, and a network usually has difficulty processing a very long feature sequence, we split the input video into short video segments as the basic units of analysis. In all our experiments we set the segment length to l = 30 frames to trade off computational efficiency against the accuracy of the proposed method. The number of video segments extracted from each video is N = 100.
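The following is a minimal sketch of this preprocessing, assuming the video is read with OpenCV; the function name sample_and_segment and the exact sampling strategy are illustrative assumptions rather than part of the patent.

```python
# Preprocessing sketch (step 1): keep 3000 equally spaced frames per video and group
# every 30 consecutive kept frames into one segment (N = 100 segments per video).
import cv2
import numpy as np

def sample_and_segment(video_path, num_frames=3000, seg_len=30):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Indices of the frames to keep, spaced at (approximately) equal intervals.
    keep = set(np.linspace(0, total - 1, num_frames).astype(int).tolist())
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx in keep:
            frames.append(frame)
        idx += 1
    cap.release()
    # Every seg_len consecutive kept frames form one video segment.
    return [frames[i:i + seg_len] for i in range(0, len(frames), seg_len)]
```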
Step 2: feature extraction. The OpenPose, FSA-NET, and PLFD networks are used to extract the body posture features, head posture features, and facial key point features of each frame in each video segment for learning engagement assessment.
According to previous studies, body language and facial expression are strongly correlated with a learner's engagement. We therefore use head posture features, body posture features, and facial key points as inputs to improve the accuracy and robustness of the model. For a video segment frame v_{i,j}, we use the OpenPose, FSA-NET, and PLFD networks to extract the head posture feature e_{i,j}, the body posture feature b_{i,j}, and the facial key points m_{i,j}, where i = 1, …, N and j = 1, …, l. For a video segment V_i we then obtain the head posture sequence E_i = {e_{i,1}, e_{i,2}, …, e_{i,l}}, the body posture sequence B_i = {b_{i,1}, b_{i,2}, …, b_{i,l}}, and the facial key point sequence M_i = {m_{i,1}, m_{i,2}, …, m_{i,l}}.
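The sketch below only illustrates how the three per-frame feature sequences of one segment could be assembled; the extractor callables stand in for FSA-NET, OpenPose, and PLFD, and their names and signatures are hypothetical placeholders, since the patent does not specify how those networks are invoked.

```python
# Assembling the per-frame feature sequences E_i, B_i, M_i for one segment V_i.
from typing import Callable, List
import numpy as np

def extract_segment_features(
    frames: List[np.ndarray],
    head_pose_fn: Callable[[np.ndarray], np.ndarray],
    body_pose_fn: Callable[[np.ndarray], np.ndarray],
    landmarks_fn: Callable[[np.ndarray], np.ndarray],
):
    E = np.stack([head_pose_fn(f) for f in frames])  # head posture sequence E_i
    B = np.stack([body_pose_fn(f) for f in frames])  # body posture sequence B_i
    M = np.stack([landmarks_fn(f) for f in frames])  # facial key point sequence M_i
    return E, B, M
```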
Step 3: hierarchical temporal multi-instance learning model (H-TMIL). Based on a frame-segment-video structure with only video-level labels, as shown in FIG. 1, we propose a hierarchical temporal multi-instance learning model (H-TMIL) consisting of a bottom-level temporal multi-instance learning module (B-TMIL) and a top-level temporal multi-instance learning module (T-TMIL), which learn the latent relationships between segments and their constituent frames and between videos and their constituent segments, respectively. Through this framework we establish a connection between the underlying video frames and the video-level labels, and can implicitly learn the intermediate representations, i.e., the segment-level features that are useful for evaluating segment-level learning engagement.
Step 3.1: bottom-level temporal multi-instance learning module (B-TMIL). For each type of feature sequence of a video segment, we use a Bi-LSTM network to obtain the hidden state at each time step. The hidden states are fed into the bottom-level temporal multi-instance learning module (B-TMIL) to obtain the feature representation of the video segment. As shown in the lower left corner of FIG. 1, B-TMIL is implemented with a self-attention mechanism.
The B-TMIL module operates on the sequence of sampled video frames that constitute a short video segment, where the frames are instances and the segment is a bag. We need to obtain a valid representation of the bag in order to accurately predict its label. However, unlike conventional multi-instance learning (MIL), the frames in a short time sequence have a strong temporal correlation. Using a Bi-LSTM to capture this temporal correlation but representing the sequence only with the last hidden state loses early information. To address this problem, we apply an attention-based multi-instance learning module to the Bi-LSTM hidden states at all time steps, obtaining the bag representation adaptively through trainable parameters. The body posture feature is used as an example below.
First, the body posture sequence B_i is input into the Bi-LSTM to obtain the hidden state sequence H_i = {h_{i,1}, h_{i,2}, …, h_{i,l}}. The segment-level aggregated feature f_i^B is computed as a weighted sum of the dimension-reduced hidden states:
f_i^B = Σ_{j=1}^{l} a_{i,j} · δ(FC_r(h_{i,j}))
where δ denotes the ReLU function and FC_r denotes a fully connected operation for dimension reduction. The weight a_{i,j} is computed as:
a_{i,j} = exp(s_{i,j}) / Σ_{k=1}^{l} exp(s_{i,k})
where the attention score s_{i,j} is computed as:
s_{i,j} = w^T ( τ(V h_{i,j}) ⊙ σ(U h_{i,j}) )
in which w, V, and U are weight matrices, ⊙ is element-wise multiplication, σ is the Sigmoid function, and τ is the Tanh function. The Tanh function captures the correlation between features, and the Sigmoid function serves as a gating mechanism.
Similarly to the segment-level body posture feature f_i^B, the B-TMIL module is used to extract the segment-level head posture feature f_i^E and the segment-level facial key point feature f_i^M, i.e., B_i in the above process is replaced by E_i or M_i.
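Below is a minimal PyTorch sketch of the B-TMIL computation described above. The class name, layer sizes, and choice of PyTorch are assumptions for illustration; only the overall structure, a Bi-LSTM followed by gated-attention pooling over all hidden states, is taken from the description.

```python
# B-TMIL sketch: Bi-LSTM over one segment's per-frame features, then gated-attention
# pooling (Tanh correlation branch, Sigmoid gate branch) over all hidden states.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BTMIL(nn.Module):
    def __init__(self, in_dim, hidden_dim=128, reduced_dim=64, attn_dim=64):
        super().__init__()
        self.bilstm = nn.LSTM(in_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc_r = nn.Linear(2 * hidden_dim, reduced_dim)  # dimension-reduction FC
        self.V = nn.Linear(2 * hidden_dim, attn_dim)         # Tanh branch (feature correlation)
        self.U = nn.Linear(2 * hidden_dim, attn_dim)         # Sigmoid branch (gate)
        self.w = nn.Linear(attn_dim, 1)                      # attention score

    def forward(self, x):                    # x: (batch, l, in_dim), one bag per row
        h, _ = self.bilstm(x)                # hidden states: (batch, l, 2*hidden_dim)
        s = self.w(torch.tanh(self.V(h)) * torch.sigmoid(self.U(h)))  # scores s_{i,j}
        a = torch.softmax(s, dim=1)          # attention weights a_{i,j} over the l frames
        z = F.relu(self.fc_r(h))             # dimension-reduced hidden states
        f = (a * z).sum(dim=1)               # segment-level aggregated feature f_i
        return f, a.squeeze(-1)
```

For one feature type, x would hold a batch of segments, each with l = 30 per-frame feature vectors; f corresponds to the segment-level feature f_i and a to the weights a_{i,j}.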
Step 3.2: top-level temporal multi-instance learning module (T-TMIL). The features extracted from all segments of a video are processed by a fully connected layer and the top-level temporal multi-instance learning module (T-TMIL) to obtain the feature representation of the video. As shown in the lower left corner of FIG. 1, T-TMIL is also implemented with a self-attention mechanism and is similar to B-TMIL.
Similar to B-TMIL, T-TMIL acts between video segments, where the segments can be regarded as instances and the complete video composed of the segments is a bag. However, since each video lasts a long time, we consider that there is no longer a strong temporal relationship between the video segments. Many previous works simply represent the video-level feature as the average of all segment-level features. Such methods treat all video segments equally, which adversely affects the accuracy of the final assessment. To obtain a more robust and flexible video representation, we still apply the MIL module on top of the segment-level features, and the top-level module becomes closer to the conventional multi-instance structure. Here we again describe the process of T-TMIL with the body posture feature as an example.
The dimensionality of the segment-level features is first reduced by a fully connected operation to generate a more effective embedded representation. We then construct a weighted combination of the video segments to represent the video, calculated as follows:
z_i^B = δ(FC_e(f_i^B))
g^B = Σ_{i=1}^{N} c_i · z_i^B
where g^B is the video-level body posture feature and FC_e is a fully connected operation for dimension reduction. The weight c_i is calculated as:
c_i = exp(u^T ( τ(P z_i^B) ⊙ σ(Q z_i^B) )) / Σ_{k=1}^{N} exp(u^T ( τ(P z_k^B) ⊙ σ(Q z_k^B) ))
where u, P, and Q are the weight matrices in T-TMIL. Similarly to the process described above, we use the T-TMIL module to aggregate the video-level head posture feature g^E and the video-level facial key point feature g^M, i.e., f_i^B in the above formulas is replaced by f_i^E or f_i^M.
Step 4: feature fusion. The three types of segment-level features extracted in step 3.1 are fused, and the three types of video-level features extracted in step 3.2 are fused. The fusion process is shown in FIG. 2.
We use three types of features to assess learning engagement, and our hierarchical modules process each type of feature separately. To let the different features complement one another and provide more discriminative information, we propose a weighted feature fusion method to extract the segment-level and video-level fused features used for evaluation. As shown in FIG. 2, the weight matrix has the same size as the feature matrix and consists of trainable parameters. Each column of the weight matrix is normalized with a Softmax function to obtain the proportion of each feature in that dimension, and the weighted sum yields the fused feature.
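A minimal sketch of this weighted fusion follows; the assumption that the three feature types have already been projected to a common dimension is ours, for illustration.

```python
# Weighted feature fusion sketch (step 4): a trainable weight matrix with the same
# shape as the stacked feature matrix is Softmax-normalized over the feature types
# in each column (dimension), and the fused feature is the weighted sum.
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    def __init__(self, num_feats=3, feat_dim=64):
        super().__init__()
        # One trainable weight per (feature type, dimension) entry.
        self.weight = nn.Parameter(torch.zeros(num_feats, feat_dim))

    def forward(self, feats):                  # feats: (batch, num_feats, feat_dim)
        w = torch.softmax(self.weight, dim=0)  # normalize each column over feature types
        return (feats * w).sum(dim=1)          # fused feature: (batch, feat_dim)
```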
Step 5: a fully connected operation is applied to the segment-level fused features to obtain the learning engagement of each video segment, and a fully connected operation is applied to the video-level fused features to obtain the learning engagement of the video. Local and global supervision are established using, respectively, the average of the segment-level engagement values and the video-level engagement, and the network is trained.
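The sketch below illustrates one way to set up this supervision with only video-level labels; using mean-squared error for both terms and weighting them equally by default are our assumptions, as the patent does not specify the loss functions.

```python
# Supervision sketch (step 5): a global loss on the video-level prediction and a
# local loss on the mean of the segment-level predictions, both against the
# video-level engagement label.
import torch
import torch.nn.functional as F

def engagement_loss(seg_pred, video_pred, video_label, local_weight=1.0):
    """seg_pred: (batch, N); video_pred, video_label: (batch,)."""
    global_loss = F.mse_loss(video_pred, video_label)            # global supervision
    local_loss = F.mse_loss(seg_pred.mean(dim=1), video_label)   # local supervision on segment average
    return global_loss + local_weight * local_loss
```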

Claims (8)

1. A student learning engagement assessment method based on hierarchical temporal multi-instance learning, characterized by comprising the following steps:
Step 1: extract image frames from each video; every l consecutive frames form a video segment, and N video segments are obtained from each video.
Step 2: use the OpenPose, FSA-NET, and PLFD networks to extract the body posture features, head posture features, and facial key point features of each frame in each video segment for learning engagement assessment.
Step 3: for each type of feature sequence of a video segment, use a Bi-LSTM network to obtain the hidden state at each time step; feed the hidden states into a bottom-level temporal multi-instance learning module (B-TMIL) to obtain the feature representation of the video segment; process the features extracted from all segments of a video through a fully connected layer and a top-level temporal multi-instance learning module (T-TMIL) to obtain the feature representation of the video; B-TMIL and T-TMIL are both implemented with a self-attention mechanism and have the same structure.
Step 4: fuse the three types of segment-level features extracted by B-TMIL in step 3, and fuse the three types of video-level features extracted by T-TMIL in step 3.
Step 5: apply a fully connected operation to the segment-level fused features to obtain the learning engagement of each video segment, and apply a fully connected operation to the video-level fused features to obtain the learning engagement of the video; establish local and global supervision using, respectively, the average of the segment-level engagement values and the video-level engagement, and train the whole network.
2. The student learning engagement assessment method based on hierarchical temporal multi-instance learning as claimed in claim 1, wherein in step 1, the image frames are extracted by keeping one frame every few frames at equal intervals.
3. The student learning engagement assessment method based on hierarchical temporal multi-instance learning as claimed in claim 1, wherein in step 2, for a video segment frame v_{i,j}, the OpenPose, FSA-NET, and PLFD networks are used to extract the head posture feature e_{i,j}, the body posture feature b_{i,j}, and the facial key points m_{i,j}; for a video segment V_i, this yields the head posture sequence E_i = {e_{i,1}, e_{i,2}, …, e_{i,l}}, the body posture sequence B_i = {b_{i,1}, b_{i,2}, …, b_{i,l}}, and the facial key point sequence M_i = {m_{i,1}, m_{i,2}, …, m_{i,l}}.
4. The student learning engagement assessment method based on hierarchical temporal multi-instance learning according to claim 1, wherein in step 3, the B-TMIL module acts on the sequence of sampled video frames that constitute a short video segment, where the frames are instances and the segments are bags; a valid representation of the bag must be obtained in order to accurately predict its label; a multi-instance learning module with an attention mechanism is applied to the Bi-LSTM hidden states at all time steps, and the bag representation is obtained adaptively through trainable parameters.
5. The student learning engagement assessment method based on hierarchical temporal multi-instance learning according to claim 4, wherein X_i denotes one of the head posture sequence, the body posture sequence, and the facial key point sequence; X_i is input into the Bi-LSTM to obtain the hidden state sequence H_i = {h_{i,1}, h_{i,2}, …, h_{i,l}}; the segment-level aggregated feature f_i^X corresponding to X_i is computed as a weighted sum of the dimension-reduced hidden states:
f_i^X = Σ_{j=1}^{l} a_{i,j} · δ(FC_r(h_{i,j}))
a_{i,j} = exp(s_{i,j}) / Σ_{k=1}^{l} exp(s_{i,k})
s_{i,j} = w^T ( τ(V h_{i,j}) ⊙ σ(U h_{i,j}) )
wherein f_i^X represents one of the segment-level head posture feature, the segment-level body posture feature, and the segment-level facial key point feature; δ is the ReLU function; FC_r is a fully connected operation for dimension reduction; w, V, and U are weight matrices; ⊙ is element-wise multiplication; σ is the Sigmoid function; and τ is the Tanh function.
6. The student learning engagement assessment method based on hierarchical temporal multi-instance learning as claimed in claim 1, wherein in step 3, T-TMIL acts between video segments, where the video segments are instances and the complete video composed of the segments is a bag; the MIL module is applied to the segment-level features; the dimensionality of the segment-level features is reduced by a fully connected operation to generate a more effective embedded representation; and a weighted combination of the video segments is constructed to represent the video.
7. The student learning engagement assessment method based on hierarchical temporal multi-instance learning according to claim 6, wherein g^X denotes one of the video-level head posture feature, the video-level body posture feature, and the video-level facial key point feature:
z_i^X = δ(FC_e(f_i^X))
g^X = Σ_{i=1}^{N} c_i · z_i^X
c_i = exp(u^T ( τ(P z_i^X) ⊙ σ(Q z_i^X) )) / Σ_{k=1}^{N} exp(u^T ( τ(P z_k^X) ⊙ σ(Q z_k^X) ))
wherein u, P, and Q are the weight matrices in T-TMIL, FC_e is a fully connected operation for dimension reduction, and f_i^X represents one of the segment-level head posture feature, the segment-level body posture feature, and the segment-level facial key point feature.
8. The student learning engagement assessment method based on hierarchical temporal multi-instance learning as claimed in claim 1, wherein in step 4, a weighted feature fusion method is used to extract the segment-level and video-level fused features for evaluation; the weight matrix has the same size as the feature matrix and consists of trainable parameters; each column of the weight matrix is normalized with a Softmax function to obtain the proportion of each feature in the corresponding dimension, and the weighted sum yields the fused feature.
CN202110942289.9A 2021-08-17 2021-08-17 Student learning participation assessment method based on hierarchical time sequence multi-example learning Active CN113723233B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110942289.9A CN113723233B (en) 2021-08-17 2021-08-17 Student learning participation assessment method based on hierarchical time sequence multi-example learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110942289.9A CN113723233B (en) 2021-08-17 2021-08-17 Student learning participation assessment method based on hierarchical time sequence multi-example learning

Publications (2)

Publication Number Publication Date
CN113723233A true CN113723233A (en) 2021-11-30
CN113723233B CN113723233B (en) 2024-03-26

Family

ID=78676052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110942289.9A Active CN113723233B (en) 2021-08-17 2021-08-17 Student learning participation assessment method based on hierarchical time sequence multi-example learning

Country Status (1)

Country Link
CN (1) CN113723233B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116485792A (en) * 2023-06-16 2023-07-25 中南大学 Histopathological subtype prediction method and imaging method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178142A (en) * 2019-12-05 2020-05-19 浙江大学 Hand posture estimation method based on space-time context learning
CN112287891A (en) * 2020-11-23 2021-01-29 福州大学 Method for evaluating learning concentration through video based on expression and behavior feature extraction
US20210042937A1 (en) * 2019-08-08 2021-02-11 Nec Laboratories America, Inc. Self-supervised visual odometry framework using long-term modeling and incremental learning
CN112541529A (en) * 2020-12-04 2021-03-23 北京科技大学 Expression and posture fusion bimodal teaching evaluation method, device and storage medium
WO2021051579A1 (en) * 2019-09-17 2021-03-25 平安科技(深圳)有限公司 Body pose recognition method, system, and apparatus, and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210042937A1 (en) * 2019-08-08 2021-02-11 Nec Laboratories America, Inc. Self-supervised visual odometry framework using long-term modeling and incremental learning
WO2021051579A1 (en) * 2019-09-17 2021-03-25 平安科技(深圳)有限公司 Body pose recognition method, system, and apparatus, and storage medium
CN111178142A (en) * 2019-12-05 2020-05-19 浙江大学 Hand posture estimation method based on space-time context learning
CN112287891A (en) * 2020-11-23 2021-01-29 福州大学 Method for evaluating learning concentration through video based on expression and behavior feature extraction
CN112541529A (en) * 2020-12-04 2021-03-23 北京科技大学 Expression and posture fusion bimodal teaching evaluation method, device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
康洪晶; 王甲生: "MOOC在线学习者参与度的模糊综合评判研究" [Research on fuzzy comprehensive evaluation of MOOC online learners' engagement], 计算机与数字工程, no. 11, 20 November 2018 (2018-11-20) *
缪佳; 禹东川: "基于课堂视频的学生课堂参与度分析" [Analysis of student classroom engagement based on classroom videos], 教育生物学杂志, no. 04, 15 December 2019 (2019-12-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116485792A (en) * 2023-06-16 2023-07-25 中南大学 Histopathological subtype prediction method and imaging method
CN116485792B (en) * 2023-06-16 2023-09-15 中南大学 Histopathological subtype prediction method and imaging method

Also Published As

Publication number Publication date
CN113723233B (en) 2024-03-26

Similar Documents

Publication Publication Date Title
CN108229338B (en) Video behavior identification method based on deep convolution characteristics
CN107766447A (en) It is a kind of to solve the method for video question and answer using multilayer notice network mechanism
CN111652202B (en) Method and system for solving video question-answer problem by improving video-language representation learning through self-adaptive space-time diagram model
CN110889672A (en) Student card punching and class taking state detection system based on deep learning
Hieu et al. Identifying learners’ behavior from videos affects teaching methods of lecturers in Universities
CN106897671A (en) A kind of micro- expression recognition method encoded based on light stream and FisherVector
CN111611854B (en) Classroom condition evaluation method based on pattern recognition
Zhou et al. Classroom learning status assessment based on deep learning
CN114898460B (en) Teacher nonverbal behavior detection method based on graph convolution neural network
CN111723667A (en) Human body joint point coordinate-based intelligent lamp pole crowd behavior identification method and device
CN116935447A (en) Self-adaptive teacher-student structure-based unsupervised domain pedestrian re-recognition method and system
CN115240259A (en) Face detection method and face detection system based on YOLO deep network in classroom environment
CN113723233A (en) Student learning participation degree evaluation method based on layered time sequence multi-example learning
Tang et al. Automatic facial expression analysis of students in teaching environments
CN113989608A (en) Student experiment classroom behavior identification method based on top vision
CN110941976A (en) Student classroom behavior identification method based on convolutional neural network
CN113688789B (en) Online learning input degree identification method and system based on deep learning
CN117058752A (en) Student classroom behavior detection method based on improved YOLOv7
Tang et al. Multi-level Amplified iterative training of semi-supervision deep learning for glaucoma diagnosis
CN114638988A (en) Teaching video automatic classification method and system based on different presentation modes
CN113469001A (en) Student classroom behavior detection method based on deep learning
CN113536926A (en) Human body action recognition method based on distance vector and multi-angle self-adaptive network
CN111144368A (en) Student behavior detection method based on long-time and short-time memory neural network
CN111507241A (en) Lightweight network classroom expression monitoring method
Zhu et al. Emotion Recognition in Learning Scenes Supported by Smart Classroom and Its Application.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant