CN112949461A - Learning state analysis method and device and electronic equipment

Info

Publication number: CN112949461A
Application number: CN202110219573.3A
Authority: CN (China)
Prior art keywords: target user, learning, information, distraction, state
Legal status: Pending (assumption, not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 赵德玺, 陈清飞
Current Assignee: Beijing Gaotu Yunji Education Technology Co Ltd
Original Assignee: Beijing Gaotu Yunji Education Technology Co Ltd
Application filed by Beijing Gaotu Yunji Education Technology Co Ltd
Priority to CN202110219573.3A
Publication of CN112949461A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/197 Matching; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V40/45 Detection of the body part being alive

Abstract

The disclosure provides a learning state analysis method, a learning state analysis device and electronic equipment. The method includes: acquiring a learning video of a target user in the process of learning a preset learning course; detecting attention state information of the target user according to image frames in the learning video; and, when the target user is determined to be in a distracted state according to the attention state information, acquiring distraction information of the target user and sending the distraction information to a server for statistical analysis. By detecting and analyzing the attention state information of the target user through the learning video, the embodiments of the present disclosure can detect the attention state of the target user in real time during online learning, thereby realizing quantitative analysis of online courses and improving the online learning quality of the target user.

Description

Learning state analysis method and device and electronic equipment
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a learning state analysis method and apparatus, and an electronic device.
Background
With the development of the Internet, forms of education have become increasingly diversified, and the popularization of Internet-based online education provides an education mode in which users can learn anytime and anywhere. In the existing online education mode, because teaching is not conducted face to face as in offline teaching, the teaching end cannot keep track of the attention state of each learning user at the learning end in a timely manner. In offline teaching, for example, a teacher can determine the attention state of each student at any time by observing the students' expressions; in online teaching, however, the learning video of each learning user is not displayed at the teaching end in real time, so the teaching end cannot grasp the attention state of each learning user throughout the learning process.
Disclosure of Invention
The embodiment of the disclosure at least provides a learning state analysis method and device and electronic equipment.
In a first aspect, an embodiment of the present disclosure provides a learning state analysis method, including: acquiring a learning video of a target user in the process of learning a preset learning course; detecting attention state information of the target user according to image frames in the learning video, wherein the attention state information is used for representing the degree of distraction of the target user when learning the preset learning course; and, when the target user is determined to be in a distracted state according to the attention state information, acquiring distraction information of the target user and performing statistical analysis on the distraction information.
With reference to the first aspect, embodiments of the present disclosure provide a first possible implementation manner of the first aspect, where: the detecting attention state information of the target user according to the image frames in the learning video comprises: detecting image frames in the learning video to obtain a target detection result; and determining the attention state information according to the target detection result, wherein the target detection result comprises: and the expression detection result of the target user and/or the sight line detection result of the target user.
With reference to the first possible implementation manner of the first aspect, this disclosure provides a second possible implementation manner of the first aspect, where the detecting an image frame in the learning video to obtain a target detection result includes: extracting eye images and face images from image frames of the learning video, wherein the face images carry mark information used for representing face position information; and performing sight detection on the eye image and the face image through a sight detection model to obtain the gaze point position information of the eyes of the target user, and determining the gaze point position information as the sight detection result.
With reference to the first possible implementation manner of the first aspect, an embodiment of the present disclosure provides a third possible implementation manner of the first aspect, where the method further includes: judging whether the expression indicated by the expression detection result is contained in a plurality of preset expressions, the preset expressions being expressions used for representing that the user is in a distracted state; and determining that the target user is in a distracted state if the judgment result is that the expression is contained.
With reference to the second possible implementation manner of the first aspect, an embodiment of the present disclosure provides a fourth possible implementation manner of the first aspect, where the method further includes: judging whether the gaze point is located outside a range indicated by a display screen of the learning terminal according to the gaze point position information under the condition that the attention state information is the sight line detection result; in the case of a determination of yes, determining that the target user is in a distracted state.
With reference to the first aspect, the present disclosure provides a fifth possible implementation manner of the first aspect, where the detecting attention state information of the target user according to image frames in the learning video includes: detecting a face image in an image frame of the learning video; under the condition that the face image is detected, performing living body detection on the face image through a living body detection model; and under the condition that the living body detection passes, performing expression detection on the image frame through an expression detection model to obtain an expression detection result.
With reference to the first aspect, an embodiment of the present disclosure provides a sixth possible implementation manner of the first aspect, where the obtaining of the attention distraction information of the target user includes: after detecting that the target user is in the distraction state, counting the duration of time that the target user is in the distraction state; determining the distraction information of the target user from the duration.
With reference to the sixth possible implementation manner of the first aspect, an embodiment of the present disclosure provides a seventh possible implementation manner of the first aspect, where the preset learning course includes a plurality of time periods, and different time periods correspond to different knowledge points; the determining the distraction information of the target user according to the duration includes: determining, among the plurality of time periods, a target time period to which the duration belongs; determining a target knowledge point corresponding to the target time period; and determining the distraction information according to the target time period and/or the target knowledge point.
In a second aspect, an embodiment of the present disclosure further provides a learning state analysis apparatus, including: a first acquisition module, configured to acquire a learning video of a target user in the process of learning a preset learning course; a detection module, configured to detect attention state information of the target user according to image frames in the learning video, wherein the attention state information is used for representing the degree of distraction of the target user when learning the preset learning course; and a second acquisition module, configured to acquire distraction information of the target user and perform statistical analysis on the distraction information when the target user is determined to be in a distracted state according to the attention state information.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the first aspect described above, or any possible implementation of the first aspect.
According to the learning state analysis method and apparatus and the electronic device provided by the embodiments of the present disclosure, a learning video of a target user in the process of learning a preset learning course is first obtained; attention state information of the target user is then detected according to image frames in the learning video; and finally, when the target user is determined to be in a distracted state according to the attention state information, distraction information of the target user is acquired and sent to a server for statistical analysis. By detecting and analyzing the attention state information of the target user through the learning video, the embodiments of the present disclosure can detect the attention state of the target user in real time during online learning, thereby realizing quantitative analysis of online courses and improving the online learning quality of the target user.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly described below. The drawings, which are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; for those skilled in the art, other related drawings can be derived from these drawings without creative effort.
Fig. 1 is a flowchart illustrating a learning state analysis method provided by an embodiment of the present disclosure;
FIG. 2 is a flow chart illustrating another learning state analysis method provided by embodiments of the present disclosure;
FIG. 3 is a top view of a position relationship between a target user and a learning terminal provided by an embodiment of the disclosure;
FIG. 4 is a front view illustrating a position relationship between a target user and a learning terminal provided by an embodiment of the present disclosure;
fig. 5 is a flowchart illustrating a specific method for detecting the attention status information of the target user according to the image frames in the learning video in the learning status analysis method provided by the embodiment of the present disclosure;
fig. 6 is a schematic diagram illustrating an analysis apparatus for learning states according to an embodiment of the present disclosure;
fig. 7 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The term "and/or" herein merely describes an associative relationship, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Research shows that in the existing online education mode, because teaching is not conducted face to face as in offline teaching, the teaching end cannot keep track of the attention state of each learning user at the learning end in a timely manner. In offline teaching, for example, a teacher can determine the attention state of each student at any time by observing the students' expressions; in online teaching, however, the learning video of each learning user is not displayed at the teaching end in real time, so the teaching end cannot grasp the attention state of each learning user throughout the learning process.
Based on the above research, the present disclosure provides a learning state analysis method and apparatus, and an electronic device, and in the embodiments of the present disclosure, a mode of detecting and analyzing attention state information of a target user through a learning video is used, so that the attention state of the target user can be detected in real time in an online learning process, thereby implementing quantitative analysis of online courses, and improving online learning quality of the target user.
To facilitate understanding of the present embodiment, first, a method for analyzing a learning state disclosed in the embodiments of the present disclosure is described in detail, where an execution subject of the method for analyzing a learning state provided in the embodiments of the present disclosure is generally an electronic device with certain computing capability, and the electronic device includes, for example: a terminal device, which may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle mounted device, a wearable device, or a server or other processing device. In some possible implementations, the method of analyzing the learning state may be implemented by a processor calling computer readable instructions stored in a memory.
Referring to fig. 1, a flowchart of a learning state analysis method provided in an embodiment of the present disclosure is shown. The method for analyzing the learning state provided by the embodiment of the present disclosure may be applied to a learning terminal, and may also be applied to a server, where the learning terminal and the server may be the electronic device described above, and the present disclosure does not specifically limit this. The method comprises steps S101-S105, wherein:
s101: and acquiring a learning video of the target user in the process of learning the preset learning course.
In the embodiment of the disclosure, an application program for online teaching may be installed in the learning terminal in advance, and when the application program for online teaching is installed, an opening authority for opening the camera device of the learning terminal may be set for the application program for online teaching, and at this time, the camera device of the learning terminal may be opened through the opening authority, so as to collect a learning video of a target user in a preset learning course through the camera device.
In the embodiment of the present disclosure, the camera device of the learning terminal may be configured to periodically collect the learning video of the target user in the learning process, for example, collect the learning video of the target user in the learning process every 5 seconds. It should be noted that, in the embodiment of the present disclosure, the period time for acquiring the learning video is not limited, and the user may set the period time according to actual needs.
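As a purely illustrative sketch of such periodic collection (the use of OpenCV, the 5-second period and the segment length are assumptions for illustration, not requirements of the disclosure):

    import time
    import cv2  # OpenCV; assumed available on the learning terminal

    CAPTURE_INTERVAL_SECONDS = 5   # hypothetical collection period
    SEGMENT_FRAMES = 30            # hypothetical number of frames per collected segment


    def capture_learning_segment(camera: cv2.VideoCapture, num_frames: int = SEGMENT_FRAMES):
        """Grab a short burst of frames forming one 'learning video' sample."""
        frames = []
        while len(frames) < num_frames:
            ok, frame = camera.read()
            if not ok:
                break
            frames.append(frame)
        return frames


    def collect_periodically(camera: cv2.VideoCapture, handle_segment):
        """Collect a segment every CAPTURE_INTERVAL_SECONDS and hand it to a callback."""
        while True:
            segment = capture_learning_segment(camera)
            if segment:
                handle_segment(segment)
            time.sleep(CAPTURE_INTERVAL_SECONDS)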
S103: and detecting attention state information of the target user according to image frames in the learning video, wherein the attention state information is used for representing the attention distraction degree of the target user when learning the preset learning course.
In an embodiment of the present disclosure, the attention state information may include: expression information and/or gaze information. Therefore, the degree of distraction of the target user in learning the preset learning course can be determined by the expression information and/or the sight line information.
Specifically, image processing may be performed on image frames in the learning video through an attention state detection model, so as to obtain the attention state information, i.e., the expression information and/or the sight line information. In the embodiments of the present disclosure, there may be one or more attention state detection models. When the attention state information includes multiple kinds of information, one attention state detection model may be set for each kind of information, so that each attention state detection model performs image processing on image frames in the learning video to obtain the corresponding information. Alternatively, a single attention state detection model may be set for multiple kinds of information; for example, image frames in the learning video are input into the attention state detection model as input data, and the model performs image processing on the image frames to produce output data containing the multiple kinds of information.
It should be noted that, when there are a plurality of attention state detection models, the model structures between any two attention state detection models may be the same or different, and this disclosure does not specifically limit this.
S105: and under the condition that the target user is determined to be in the distraction state according to the distraction state information, acquiring distraction information of the target user, and carrying out statistical analysis on the distraction information.
In the embodiment of the present disclosure, after obtaining the attention state information, it may be determined whether the target user is in the distracted state according to the attention state information. If so, acquiring the attention dispersion information of the target user. For example, when the attention state information is expression information and/or gaze information, it may be determined whether the target user is in a distracted state according to the expression information and/or the gaze information. If so, acquiring the attention dispersion information of the target user.
The distraction information can be understood as time information of the target user in a distraction state in the process of learning the preset learning course, and learning contents (e.g., knowledge points) corresponding to the time information.
Specifically, the time information of the state of distraction may be at least one of: the duration of the target user in the distracted state, the total duration of the target user in the distracted state, the course progress time corresponding to the target user in the distracted state, and the learning terminal time of the target user in the distracted state. The learning content corresponding to the time information may be information such as a knowledge point corresponding to a preset learning course when the target user is in a distracted state.
In addition to the above, the distraction information may also include user identification information of the target user and course identification information of the preset learning course; the user identification information may be terminal ID information of the learning terminal used by the target user, or identity information of the target user (e.g., the target user's student number or account number).
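Purely for illustration, the fields described above could be grouped into a simple record structure; every field name below is a hypothetical placeholder rather than terminology from the disclosure:

    from dataclasses import dataclass
    from typing import Optional


    @dataclass
    class DistractionInfo:
        """Hypothetical container for the distraction information described above."""
        user_id: str                           # identity of the target user (account or student number)
        terminal_id: Optional[str] = None      # ID of the learning terminal in use
        course_id: Optional[str] = None        # identifier of the preset learning course
        start_time_s: float = 0.0              # course-progress time at which distraction began
        end_time_s: float = 0.0                # course-progress time at which distraction ended
        knowledge_point: Optional[str] = None  # learning content covered during the distraction period

        @property
        def duration_s(self) -> float:
            return max(0.0, self.end_time_s - self.start_time_s)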
In the embodiment of the disclosure, a learning video of a target user in the process of learning a preset learning course is first obtained; attention state information of the target user is then detected according to image frames in the learning video; and finally, when the target user is determined to be in a distracted state according to the attention state information, distraction information of the target user is acquired and sent to a server for statistical analysis. By detecting and analyzing the attention state information of the target user through the learning video, the embodiments of the present disclosure can detect the attention state of the target user in real time during online learning, thereby realizing quantitative analysis of online courses and improving the online learning quality of the target user.
As can be seen from the above description, in the embodiment of the present disclosure, after a learning video is acquired, attention state information of the target user is first detected according to image frames in the learning video.
In an optional embodiment, as shown in fig. 2, step S103 specifically includes the following steps:
s1031: detecting image frames in the learning video to obtain a target detection result;
s1032: determining the attention state information according to the target detection result, wherein the target detection result comprises: and the expression detection result of the target user and/or the sight line detection result of the target user.
As can be seen from the above description, in the embodiment of the present disclosure, the image frames in the learning video may be detected by the attention state detection model, so as to obtain the target detection result. The attention state detection model can perform sight line detection and/or expression detection on the image frames. In this case, the expression detection result, and/or the sight line detection result may be included in the target detection result.
Specifically, the expression detection result may be expression type information of the facial expression of the target user, for example, a confused expression, a distracted expression, a pleasant expression; the sight line detection result is used for determining whether the eye fixation point of the target user is within the screen range of the learning terminal. And when the expression type information of the facial expression of the target user is determined to be the distracted expression according to the expression detection result, and/or when the eye gaze point of the target user is determined to deviate from the screen range of the learning terminal according to the sight line detection result, determining that the target user is in the distracted state.
The following description will be given taking an example in which the attention state detection model includes a sight line detection model and an expression detection model.
And performing sight line detection on the image frames in the learning video through the sight line detection model to obtain a sight line detection result of the target user. And performing expression detection on the image frames of the learning video through the expression detection model to obtain expression detection results, and then determining the sight line detection results and the expression detection results as attention state information.
As can be seen from the above description, in the embodiment of the present disclosure, when the image frames in the learning video are detected by the attention state detection model, the gaze and/or expression of the target user can be detected by the attention state detection model, so as to determine the attention state information by the detection result of the gaze and/or expression. By the method, the attention state information of the target user can be detected more comprehensively, so that the accuracy of the detection result is improved.
In an optional implementation manner, in step S1031, the image frames in the learning video are detected to obtain a target detection result, which specifically includes the following processes:
firstly, an eye image and a face image are extracted from an image frame of the learning video, wherein the face image carries mark information used for representing face position information.
In the embodiment of the present disclosure, when performing line-of-sight detection on the image frames in the learning video through the attention state detection model, first, an eye image in the image frames is extracted, where the eye image includes left eye image information and right eye image information of a target user, and a face image in the image frames may also be extracted, where the face image includes mark information used for representing face position information, for example, the face image includes position information of a face feature point.
Secondly, performing sight line detection on the eye image and the face image through a sight line detection model to obtain the gaze point position information of the eyes of the target user, and determining the gaze point position information as the sight line detection result.
After the eye image and the face image are obtained, the left eye image information, the right eye image information, the face image and the face position information in the eye image can be used as input data of the sight line detection model and are respectively input into the four processing branches of the sight line detection model to be processed, so that the fixation point is estimated according to the four input data, and a two-dimensional coordinate position is obtained through estimation, wherein the two-dimensional coordinate position is the eye fixation point position information of the target user.
Note that gaze point estimation estimates the point at which the focus of both eyes lands. The typical scenario is to estimate the gaze point of a person on a two-dimensional plane, which may be a mobile phone screen, a tablet screen, a television screen, or the like. When the target user is in a distracted state, the eye gaze point of the target user is generally outside the learning screen of the learning terminal.
In the embodiment of the present disclosure, the position information of the gaze point of the user may be obtained by estimating the eye gaze point of the target user through the face position information, the face image, the left eye image information, and the right eye image information, so as to determine whether the target user is in a state of distractions according to the position information. The processing method can accurately judge the attention focusing condition of the target user, can be suitable for various terminal devices, and is wide in application range and high in robustness.
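Purely as an illustrative sketch (the patent does not disclose a concrete network architecture, so all layer sizes below are assumptions), the four-branch processing described above could be organized as follows, using PyTorch for illustration:

    import torch
    import torch.nn as nn


    class GazeEstimationSketch(nn.Module):
        """Illustrative four-branch gaze model: left eye, right eye, face crop, face-position features."""

        def __init__(self, landmark_dim: int = 136):
            super().__init__()

            def image_branch():
                return nn.Sequential(
                    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                )

            self.left_eye = image_branch()
            self.right_eye = image_branch()
            self.face = image_branch()
            self.landmarks = nn.Sequential(nn.Linear(landmark_dim, 32), nn.ReLU())
            self.head = nn.Linear(32 * 3 + 32, 2)  # output: 2-D gaze-point coordinates (x, y)

        def forward(self, left_eye, right_eye, face, landmarks):
            feats = torch.cat([
                self.left_eye(left_eye),
                self.right_eye(right_eye),
                self.face(face),
                self.landmarks(landmarks),
            ], dim=1)
            return self.head(feats)  # estimated gaze point on the screen plane

The sketch only shows how the four inputs are combined into a single two-dimensional estimate; in practice such a model would be trained on labelled gaze data, which the disclosure does not detail.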
In the embodiment of the present disclosure, after obtaining the attention state information of the target user in the manner described above, it may be determined whether the target user is in the distracted state according to the attention state information. As can be seen from the above description, the attention status information includes: the expression detection result and/or the sight line detection result, how to determine whether the target user is in the distracted state according to the expression detection result and/or the sight line detection result will be described below.
The first condition is as follows: the attention state information includes a sight line detection result.
In this case, after detecting the attention state information of the target user from the image frames in the learning video, the following process is further included:
(1) and judging whether the gaze point is located outside the range indicated by the display screen of the learning terminal according to the gaze point position information under the condition that the attention state information is the sight line detection result.
As can be seen from the above description, when the target user is in the distracted state, the eye gaze point of the target user is generally located outside the learning screen range of the learning terminal, or the target gaze point is located at an edge position of the learning screen. Therefore, in the embodiments of the present disclosure, the distraction state of the target user can be determined from the position information of the eye fixation point of the target user.
In the embodiment of the disclosure, first, size information of a display screen of the learning terminal may be obtained, and a preset sight distance length from eyes of the target user to the display screen of the learning terminal may be obtained, where the sight distance length is a relatively fixed value selected empirically. The apparent distance length may vary with the application scenario, and the distance value of the apparent distance length is not specifically limited in the embodiments of the present disclosure.
The above determination process is described below with reference to fig. 3 and 4. Fig. 3 is a top view showing the positional relationship between the target user and the learning terminal, and fig. 4 is a front view showing the positional relationship between the target user and the learning terminal. As can be seen from fig. 3 and 4, point a is the position of the eye of the target user, and point B is the position of the eye fixation point of the target user. As can be seen from fig. 3 and 4, the eye fixation point of the target user is located outside the display screen of the learning terminal. As can be seen from fig. 3, after the viewing distance length and the size information are determined, it can be determined that the included angle between the viewing line AC and the viewing line AD is an included angle 2, and the included angle between the viewing line AC and the viewing line AB is an included angle 1, when the eye gaze point of the target user is located outside the display screen of the learning terminal, the included angle 1 is greater than the included angle 2, and at this time, it can be determined that the eye gaze point of the target user is located outside the display screen of the learning terminal.
Based on this, in the embodiment of the present disclosure, whether the eye gaze point of the target user is located outside the display screen of the learning terminal may be determined by comparing the sizes of included angle 1 and included angle 2: when included angle 1 is greater than or equal to included angle 2, it can be determined that the eye gaze point of the target user is located outside the display screen of the learning terminal (an illustrative code sketch of this comparison is given at the end of this case).
(2) And if yes, determining that the target user is in a distraction state.
In the embodiment of the present disclosure, if it is determined that the eye gaze point position of the target user is outside the display screen range of the learning terminal according to the above-described manner, it is determined that the target user is in the distraction state.
In another implementation manner of the embodiment of the present disclosure, a threshold value indicating distraction of the target user may be preset. When the gazing point position of the target user stays in the display screen range of the learning terminal all the time, detecting whether the gazing point position of the target user moves along with the movement of a teaching picture on the display screen of the terminal equipment or not in real time, and when the time that the gazing point of the target user stays at a fixed position exceeds the set threshold value, determining that the user is in a distraction state of sight.
In the embodiment of the disclosure, by detecting the position information of the eye gaze point of the target user and determining whether the target user is in a distracted state according to the gaze point position information, the attention concentration of the target user can be judged accurately. The processing method can be applied to various terminal devices, has a wide application range and strong robustness, and can better reproduce the mutual perception between teachers and students found in the offline teaching mode.
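Purely for illustration, the comparison between included angle 1 and included angle 2 described in this case might be computed as follows, assuming point C is the point on the screen directly in front of the eyes and that offsets are measured in the screen plane (these geometric assumptions follow figs. 3 and 4 only loosely):

    import math


    def gaze_outside_screen(gaze_offset_cm: float,
                            screen_half_width_cm: float,
                            viewing_distance_cm: float) -> bool:
        """Return True when angle 1 (eye to gaze point) is at least angle 2 (eye to screen edge).

        gaze_offset_cm: lateral distance from the centre line to the estimated gaze point B.
        screen_half_width_cm: lateral distance from the centre line to the screen edge D.
        viewing_distance_cm: preset viewing distance from the eyes A to the screen (segment AC).
        """
        angle_1 = math.atan2(gaze_offset_cm, viewing_distance_cm)        # angle between AC and AB
        angle_2 = math.atan2(screen_half_width_cm, viewing_distance_cm)  # angle between AC and AD
        return angle_1 >= angle_2


    # Example with assumed numbers: a 30 cm-wide screen viewed from 50 cm away.
    print(gaze_outside_screen(gaze_offset_cm=22.0, screen_half_width_cm=15.0, viewing_distance_cm=50.0))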
Case two: the attention state information includes an expression detection result.
In this case, after detecting the attention state information of the target user from the image frames in the learning video, the following process is further included:
(1) judging whether the expression indicated by the expression detection result is contained in a plurality of preset expressions or not under the condition that the attention state information is the expression detection result; the preset expressions are expressions used for representing that the user is in a distraction state.
(2) And determining that the target user is in a distraction state when the content is judged to be contained.
As can be seen from the above description, in the embodiment of the present disclosure, expression detection may be performed on the image frames in the learning video through an expression detection model, so as to obtain an expression detection result. For example, the expression detection result may be a probability that the facial expression in the image frame belongs to a plurality of preset expressions (e.g., confusion, depression, or stubborn). The facial expression in one image frame can correspond to one or more preset expressions.
In the embodiment of the present disclosure, a plurality of expressions (i.e., the above-mentioned preset expressions) for representing that the user is in a distracted state may be preset. After the expression detection model is used for carrying out expression detection on the image frames in the learning video, expression detection results of the image frames can be obtained, wherein the expression detection results are used for indicating expression types of faces contained in the image frames. After the expression detection result is obtained, the expression detection result can be matched with a plurality of preset expressions, and if the expression identical to the expression detection result is matched in the plurality of preset expressions, the target user is determined to be in a distraction state.
For example, the preset expressions include preset expression type A1, preset expression type A2 and preset expression type A3. After the image frames in the learning video are recognized by the expression detection model, probabilities P1, P2 and P3 that the image frames belong to preset expression types A1 to A3 can be obtained. The expression type indicated by the expression detection result can then be judged from the probabilities P1, P2 and P3.

For example, a threshold C1 may be set; if, among the probabilities P1, P2 and P3, the probability P1 is greater than the threshold C1, the expression type indicated by the expression detection result is determined to be preset expression type A1.

Preset expression type A1 may then be matched against the preset expressions, which include, for example, preset expression type B1, preset expression type B2 and preset expression type B3. If, among the preset expression types B1 to B3, preset expression type B2 is matched as being the same expression type as preset expression type A1, it is determined that the target user is in a distracted state.
In the embodiment of the disclosure, whether the target user is in the distracted state or not is determined by the expression detection result, so that the learning state of the target user can be remotely monitored, the learning state of the target user can be mastered in real time, and the learning quality of the target user is improved.
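A minimal sketch of the threshold-and-match logic described in this case; the expression labels, the threshold value and the simplifying assumption that detected and preset expressions share the same labels are all placeholders:

    # Placeholder labels standing in for the preset distraction expressions B1-B3.
    DISTRACTION_EXPRESSIONS = {"confused", "drowsy", "absent_minded"}
    PROBABILITY_THRESHOLD_C1 = 0.6  # hypothetical threshold C1


    def expression_indicates_distraction(expression_probs: dict) -> bool:
        """expression_probs maps expression labels to probabilities output by the expression model."""
        # Step 1: keep the expression types whose probability exceeds the threshold.
        detected = [label for label, p in expression_probs.items() if p > PROBABILITY_THRESHOLD_C1]
        # Step 2: the user is treated as distracted if any detected type matches a preset expression.
        return any(label in DISTRACTION_EXPRESSIONS for label in detected)


    print(expression_indicates_distraction({"pleasant": 0.1, "confused": 0.8, "drowsy": 0.1}))  # True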
As shown in fig. 5, in an optional implementation manner of the embodiment of the present disclosure, detecting the attention state information of the target user according to the image frames in the learning video in step S103 includes the following process:
s501: a face image is detected in an image frame of a learning video.
S502: and under the condition that the face image is detected, performing living body detection on the face image through a living body detection model.
In the embodiment of the present disclosure, after the learning video is acquired, the face detection may be performed on the image frames in the learning video through the face detection model.
It should be noted that, in the embodiment of the present disclosure, face detection may be performed on every image frame in the learning video, or only on some of the image frames. Specifically, the image frames on which face detection is performed may be selected from the learning video according to the memory consumption of the CPU of the learning terminal. For example, when the memory consumption of the CPU of the learning terminal is high, in order to ensure the normal operation of the learning terminal, only part of the image frames may be selected from the learning video for face detection, for example by performing face detection on every Nth image frame.
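A hedged sketch of this load-dependent sampling; the use of psutil and the 80% threshold are assumptions for illustration only:

    import psutil  # assumed available for reading system load


    def frame_sampling_stride(high_load_stride: int = 10) -> int:
        """Return how many frames to skip between face-detection runs, based on memory pressure."""
        memory_used_percent = psutil.virtual_memory().percent
        # Under low load every frame is checked; under high load only every Nth frame is checked.
        return high_load_stride if memory_used_percent > 80.0 else 1


    def frames_for_face_detection(frames):
        stride = frame_sampling_stride()
        return frames[::stride]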
In the case where it is detected that the image frame includes a face image, the living body detection may be performed on the face image. In an alternative embodiment, the face image may be input into the living body detection model for processing, and a living body detection result is obtained, where the living body detection result is used to represent a probability that the face image is the living body face image. After the probability is obtained, if the probability is greater than a preset threshold value, determining that the face image is a living body face image. After the living body detection passes, the expression detection model is used for carrying out expression detection on the image frame to obtain an expression detection result.
In another optional embodiment, after the image frame is detected to contain the face image, an interaction instruction may be further sent to the target user, where the interaction instruction is used to instruct the target user to perform an action indicated to be performed in the interaction instruction, for example, a head nodding operation or a head shaking operation; after the interactive instruction is sent, the action performed by the user is detected, and in the case that the action performed by the user is determined to be the execution action indicated in the interactive instruction, the pass of the living body detection is determined.
In the embodiment of the disclosure, after an image frame is detected to contain a face image, user information of the target user, such as the user's avatar information, may also be obtained. The face image in the image frame is then compared with the user's avatar information, and whether the user in the image frame is the target user is judged according to the face comparison result. If it is judged that the user is not the target user, a record is generated stating that target user XX did not perform online learning at XX and is in an absent state.
S503: and under the condition that the living body detection passes, performing expression detection on the image frame through an expression detection model to obtain an expression detection result.
In the embodiment of the disclosure, when the living body detection passes, the expression detection model may perform expression detection on the image frame to obtain an expression detection result, and whether the target user is in a distracted state is determined according to the expression detection result. If so, the distraction information of the target user is acquired and sent to the server for statistical analysis.
In the embodiment of the disclosure, image frames in the learning video are processed through face detection and living body detection: when it is determined that an image frame contains a face, living body detection is then performed on the face. This processing makes online teaching more intelligent and allows abnormal online course-attendance behavior to be identified accurately; at the same time, attention state detection can be skipped for image frames that fail living body detection, further saving CPU and memory on the learning terminal.
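A minimal sketch of the per-frame pipeline of steps S501 to S503, with the face detection, living body detection and expression detection models represented by placeholder callables (all names are assumptions):

    def analyze_frame(frame,
                      detect_face,          # placeholder: returns a face crop or None
                      liveness_model,       # placeholder: returns probability that the face is live
                      expression_model,     # placeholder: returns {expression label: probability}
                      liveness_threshold: float = 0.5):
        """Illustrative pipeline: face detection -> living body detection -> expression detection."""
        face = detect_face(frame)
        if face is None:
            return None                      # no face detected: nothing to analyse in this frame
        if liveness_model(face) <= liveness_threshold:
            return None                      # liveness failed: skip expression detection, saving resources
        expression_probs = expression_model(frame)
        return expression_probs              # handed on to the distraction judgment step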
As can be seen from the above description, in the embodiment of the present disclosure, after determining that the target user is in the distracted state according to the attention state information, the method may obtain the distracted information of the target user, which specifically includes the following steps:
(1) and after detecting that the target user is in the distraction state, counting the duration of the target user in the distraction state.
(2) And determining the attention distraction information of the target user according to the duration.
In the embodiment of the present disclosure, after determining that the target user is in the distracted state from the image frames in the learning video, each image frame following the image frame (denoted as image frame M1) may be detected until determining that the target user is in the non-distracted state (e.g., the attentive state) from the image frame (denoted as image frame M2). Then, the time interval between the image frame M1 and the image frame M2 is determined as the duration of the distraction state. The time information corresponding to the image frame M1 is the distraction state start time t1, and the time information corresponding to the image frame M2 is the distraction state end time t 2. After the duration is determined, the distraction information can be determined from the duration.
For example, if the course progress time at which the target user is detected to be in the distracted state is 17 minutes 21 seconds, and the time at which the distracted state is detected to end is 19 minutes 06 seconds, then the distraction information is: a start time of 17 minutes 21 seconds, an end time of 19 minutes 06 seconds, a duration of 1 minute 45 seconds, and the learning content of the preset learning course within the time period from 17 minutes 21 seconds to 19 minutes 06 seconds.
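An illustrative sketch of assembling such a distraction record from the start and end times t1 and t2 (all field names are placeholders, not terminology from the disclosure):

    def build_distraction_record(start_time_s: float, end_time_s: float, knowledge_point: str) -> dict:
        """Assemble the distraction information for one distraction episode."""
        return {
            "start": start_time_s,                   # course progress time t1 of image frame M1
            "end": end_time_s,                       # course progress time t2 of image frame M2
            "duration": end_time_s - start_time_s,   # duration of the distracted state
            "knowledge_point": knowledge_point,      # learning content covered in [t1, t2]
        }


    # Worked example matching the figures above: 17 min 21 s to 19 min 06 s gives 105 s (1 min 45 s).
    record = build_distraction_record(start_time_s=17 * 60 + 21, end_time_s=19 * 60 + 6,
                                      knowledge_point="content of the preset course in this period")
    print(record["duration"])  # 105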
In the embodiment of the disclosure, by determining the duration of the distracted state and determining the distraction information of the target user, the time during which the user is distracted and the learning content of the preset learning course within that period can be recorded accurately, which helps the teacher analyze and evaluate the students' learning effect in more detail and helps the target user understand his or her own learning state in real time.
In the embodiment of the disclosure, after the teacher end completes teaching, a video file of a preset learning video may be generated, and at this time, the teacher end may determine corresponding knowledge points and a time period corresponding to each knowledge point according to content information in a preset learning course, where different knowledge points correspond to different time periods. When determining the distraction information of a target user according to a duration, a target time period to which the duration belongs may be first determined among the plurality of time periods; and determining a target knowledge point corresponding to the target time period. Finally, the distraction information is determined based on the target time period and/or the target knowledge points.
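A sketch of mapping a distraction moment to its target time period and knowledge point, assuming the course timeline is available as a sorted list of period start times (the data layout is an assumption for illustration):

    import bisect


    def find_knowledge_point(course_timeline, moment_s: float):
        """course_timeline: list of (period_start_s, knowledge_point), sorted by period_start_s.

        Returns the knowledge point whose time period contains the given moment.
        """
        starts = [start for start, _ in course_timeline]
        index = bisect.bisect_right(starts, moment_s) - 1
        return course_timeline[index][1] if index >= 0 else None


    # Hypothetical timeline: three knowledge points starting at 0 s, 600 s and 1200 s.
    timeline = [(0, "knowledge point 1"), (600, "knowledge point 2"), (1200, "knowledge point 3")]
    print(find_knowledge_point(timeline, 17 * 60 + 21))  # "knowledge point 2" for a distraction at 17 min 21 s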
After the attention dispersion information is obtained, the total attention dispersion time of the target user, the number of times of attention dispersion of the target user, the dispersion frequency of the target user, the knowledge point with the largest number of distractions in the preset learning course, the time period with the largest number of distractions in the preset learning course and the like can be counted according to the attention dispersion information.
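For illustration, the server-side statistics mentioned above could be aggregated roughly as follows (the record fields follow the placeholder layout sketched earlier and are assumptions):

    from collections import Counter


    def summarize_distraction(records):
        """records: iterable of per-episode distraction dicts; returns simple course-level statistics."""
        records = list(records)
        total_time = sum(r["duration"] for r in records)
        by_knowledge_point = Counter(r["knowledge_point"] for r in records)
        most_distracting = by_knowledge_point.most_common(1)[0][0] if records else None
        return {
            "total_distraction_seconds": total_time,            # total time the target user was distracted
            "distraction_count": len(records),                  # number of distraction episodes
            "most_distracting_knowledge_point": most_distracting,
        }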
In the embodiment of the disclosure, after the teacher end completes teaching, a video file of a preset learning video may be generated, and the teacher end may play back the video file. During the playback process, the teacher end may determine a matching time period in the video file according to the target time period used for representing that the target user is in the distraction state in the distraction information, and record a mapping relationship between the target user and the matching time period. After analyzing the distraction information of each target user in the manner described above, a lecture listening report containing lecture listening information of each target user can be generated.
Besides, the expression detection result of the target user corresponding to each target time period can be determined, so that the attention dispersion type of the target user in the target time period can be determined according to the expression detection result, and the attention dispersion type can be recorded in the lecture report.
In the embodiment of the present disclosure, by generating the lecture listening report by the above-described method, the learning effect of the target user on the preset learning course can be evaluated in a quantitative manner, and the lecture listening report can be pushed to the target user to instruct the target user to change the learning state.
In summary, in the embodiment of the present disclosure, the attention state information of the target user is detected and analyzed through the learning video, and the attention state of the target user can be detected in real time in the online learning process, so that quantitative analysis of the online course is implemented, and the online learning quality of the target user is improved.
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
Based on the same inventive concept, an analysis apparatus for a learning state corresponding to the analysis method for a learning state is also provided in the embodiments of the present disclosure, and since the principle of the apparatus in the embodiments of the present disclosure for solving the problem is similar to the analysis method for a learning state described above in the embodiments of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated details are omitted.
Referring to fig. 6, a schematic diagram of an apparatus for analyzing a learning state according to an embodiment of the present disclosure is shown, where the apparatus includes: a first acquisition module 61, a detection module 62 and a second acquisition module 63; wherein:
the first obtaining module 61 is configured to obtain a learning video of a target user in a process of learning a preset learning course;
a detecting module 62, configured to detect attention state information of the target user according to image frames in the learning video, where the attention state information is used to represent a degree of distraction of the target user when learning the preset learning course;
a second obtaining module 63, configured to obtain distraction information of the target user and perform statistical analysis on the distraction information when it is determined that the target user is in a distracted state according to the attention state information.
The embodiment of the disclosure obtains information about the student's distracted state by detecting the student's distraction, providing a way to assist and reinforce learning, enabling the online lecture-listening effect to be analyzed quantitatively, and solving the problem in prior art schemes that a student's attention cannot be detected or managed when the student attends lectures online.
In a possible implementation, the detection module is further configured to: detecting image frames in the learning video to obtain a target detection result; and determining the attention state information according to the target detection result, wherein the target detection result comprises: and the expression detection result of the target user and/or the sight line detection result of the target user.
In a possible implementation, the detection module is further configured to: extracting eye images and face images from image frames of the learning video, wherein the face images carry mark information used for representing face position information; and performing sight detection on the eye image and the face image through a sight detection model to obtain the gaze point position information of the eyes of the target user, and determining the gaze point position information as the sight detection result.
In one possible embodiment, the apparatus is further configured to: judging whether the expression indicated by the expression detection result is contained in a plurality of preset expressions or not under the condition that the attention state information is the expression detection result; the preset expressions are expressions used for representing that the user is in a distraction state; and determining that the target user is in a distraction state when the judgment result shows that the target user is involved.
In one possible embodiment, the apparatus is further configured to: judging whether the gaze point is located outside a range indicated by a display screen of the learning terminal according to the gaze point position information under the condition that the attention state information is the sight line detection result; in the case of a determination of yes, determining that the target user is in a distracted state.
In a possible implementation, the detection module is further configured to: detecting a face image in an image frame of the learning video; under the condition that the face image is detected, performing living body detection on the face image through a living body detection model; and under the condition that the living body detection passes, performing expression detection on the image frame through an expression detection model to obtain an expression detection result.
In a possible implementation, the second obtaining module is further configured to: after detecting that the target user is in the distraction state, counting the duration of time that the target user is in the distraction state; determining the distraction information of the target user from the duration.
In a possible implementation, the second obtaining module is further configured to: determining a target time period to which the duration belongs in a plurality of time periods under the condition that a preset learning course comprises the plurality of time periods and different time periods correspond to different knowledge points; determining a target knowledge point corresponding to the target time period; and determining the attention distraction information according to the target time period and/or the target knowledge point.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
Corresponding to the analysis method of the learning state in fig. 1, an embodiment of the present disclosure further provides an electronic device 700, as shown in fig. 7, a schematic structural diagram of the electronic device 700 provided in the embodiment of the present disclosure includes:
a processor 71, a memory 72, and a bus 73; the memory 72 is used for storing execution instructions and includes a memory 721 and an external memory 722; the memory 721 is also referred to as an internal memory, and is used for temporarily storing the operation data in the processor 71 and the data exchanged with the external memory 722 such as a hard disk, the processor 71 exchanges data with the external memory 722 through the memory 721, and when the electronic device 700 operates, the processor 71 communicates with the memory 72 through the bus 73, so that the processor 71 executes the following instructions:
acquiring a learning video of a target user in the process of learning a preset learning course; detecting attention state information of the target user according to image frames in the learning video, wherein the attention state information is used for representing the degree of attention distraction of the target user when learning the preset learning course; and in the case that it is determined, according to the attention state information, that the target user is in a distraction state, acquiring distraction information of the target user and performing statistical analysis on the distraction information.
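Read together, these instructions amount to a per-frame processing loop. The sketch below combines the hypothetical helpers from the earlier sketches (DistractionTimer and locate_knowledge_point); the frames iterator, analyse_frame and send_to_server are assumptions about how a surrounding system could be wired, not the patented control flow.

```python
# End-to-end sketch of the per-frame loop implied by the instructions above,
# reusing DistractionTimer and locate_knowledge_point from the earlier sketches.
# frames, analyse_frame and send_to_server are hypothetical placeholders.
def analyse_learning_video(frames, analyse_frame, send_to_server) -> None:
    """frames yields (timestamp_s, image) pairs taken from the learning video."""
    timer = DistractionTimer()
    for timestamp, frame in frames:
        distracted = analyse_frame(frame)            # expression and/or sight line check
        duration = timer.update(timestamp, distracted)
        if duration is not None:                     # a distraction episode just ended
            period, point = locate_knowledge_point(timestamp - duration)
            send_to_server({"duration_s": duration,
                            "time_period": period,
                            "knowledge_point": point})
```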
The embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the learning state analysis method described in the above method embodiments are performed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product carrying program code; the instructions included in the program code may be used to execute the steps of the learning state analysis method in the foregoing method embodiments. For details, reference may be made to the foregoing method embodiments, which are not repeated here.
The computer program product may be implemented by hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative. For example, the division of the units is only one logical division, and there may be other divisions in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are merely specific embodiments of the present disclosure, used to illustrate the technical solutions of the present disclosure rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person familiar with the art can still, within the technical scope disclosed herein, modify the technical solutions described in the foregoing embodiments, easily conceive of changes, or make equivalent substitutions of some technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present disclosure and shall be included in the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. A learning state analysis method, comprising:
acquiring a learning video of a target user in the process of learning a preset learning course;
detecting attention state information of the target user according to image frames in the learning video, wherein the attention state information is used for representing the attention distraction degree of the target user when learning the preset learning course;
and under the condition that the target user is determined to be in a distraction state according to the attention state information, acquiring distraction information of the target user, and carrying out statistical analysis on the distraction information.
2. The method of claim 1, wherein the detecting attention state information of the target user from image frames in the learning video comprises:
detecting image frames in the learning video to obtain a target detection result; and determining the attention state information according to the target detection result, wherein the target detection result comprises: an expression detection result of the target user and/or a sight line detection result of the target user.
3. The method of claim 2, wherein the detecting image frames in the learning video to obtain target detection results comprises:
extracting eye images and face images from image frames of the learning video, wherein the face images carry mark information used for representing face position information;
and performing sight detection on the eye image and the face image through a sight detection model to obtain the gaze point position information of the eyes of the target user, and determining the gaze point position information as the sight detection result.
4. The method of claim 2, further comprising:
under the condition that the attention state information is the expression detection result, judging whether the expression indicated by the expression detection result is contained in a plurality of preset expressions, wherein the preset expressions are expressions used for representing that the user is in a distraction state;
and determining that the target user is in a distraction state if the judgment result is that the expression is contained in the preset expressions.
5. The method of claim 3, further comprising:
under the condition that the attention state information is the sight line detection result, judging, according to the gaze point position information, whether the gaze point is located outside the range indicated by a display screen of the learning terminal;
and if the judgment result is yes, determining that the target user is in a distraction state.
6. The method of claim 1, wherein the detecting attention state information of the target user from image frames in the learning video comprises:
detecting a face image in an image frame of the learning video;
under the condition that the face image is detected, performing living body detection on the face image through a living body detection model;
and under the condition that the living body detection passes, performing expression detection on the image frame through an expression detection model to obtain an expression detection result.
7. The method of claim 1, wherein the obtaining the distraction information of the target user comprises:
after detecting that the target user is in the distraction state, counting the duration of time that the target user is in the distraction state;
determining the distraction information of the target user from the duration.
8. The method of claim 7, wherein the predetermined learning course comprises a plurality of time periods, and different time periods correspond to different knowledge points;
the determining the distraction information of the target user according to the duration includes:
determining a target time period to which the duration belongs among the plurality of time periods; determining a target knowledge point corresponding to the target time period;
and determining the attention distraction information according to the target time period and/or the target knowledge point.
9. An apparatus for analyzing a learning state, comprising:
the first acquisition module is used for acquiring a learning video of a target user in the process of learning a preset learning course;
the detection module is used for detecting attention state information of the target user according to image frames in the learning video, wherein the attention state information is used for representing the degree of attention distraction of the target user when learning the preset learning course;
and the second acquisition module is used for acquiring the distraction information of the target user and carrying out statistical analysis on the distraction information under the condition that the target user is determined to be in a distraction state according to the attention state information.
10. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the method of analyzing a learning state according to any one of claims 1 to 8.
CN202110219573.3A 2021-02-26 2021-02-26 Learning state analysis method and device and electronic equipment Pending CN112949461A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110219573.3A CN112949461A (en) 2021-02-26 2021-02-26 Learning state analysis method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110219573.3A CN112949461A (en) 2021-02-26 2021-02-26 Learning state analysis method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN112949461A true CN112949461A (en) 2021-06-11

Family

ID=76246590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110219573.3A Pending CN112949461A (en) 2021-02-26 2021-02-26 Learning state analysis method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112949461A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114971425A (en) * 2022-07-27 2022-08-30 深圳市必提教育科技有限公司 Database information monitoring method, device, equipment and storage medium
CN114971425B (en) * 2022-07-27 2022-10-21 深圳市必提教育科技有限公司 Database information monitoring method, device, equipment and storage medium
CN115620385A (en) * 2022-11-07 2023-01-17 湖南苏科智能科技有限公司 Multivariate data-based security check worker attention detection method and system
CN115620385B (en) * 2022-11-07 2023-07-28 湖南苏科智能科技有限公司 Safety inspection staff attention detection method and system based on multivariate data
CN116797090A (en) * 2023-06-26 2023-09-22 国信蓝桥教育科技股份有限公司 Online assessment method and system for classroom learning state of student
CN116797090B (en) * 2023-06-26 2024-03-26 国信蓝桥教育科技股份有限公司 Online assessment method and system for classroom learning state of student

Similar Documents

Publication Publication Date Title
CN108648757B (en) Analysis method based on multi-dimensional classroom information
CN112949461A (en) Learning state analysis method and device and electronic equipment
CN108399376B (en) Intelligent analysis method and system for classroom learning interest of students
CN111046819B (en) Behavior recognition processing method and device
CN111027865B (en) Teaching analysis and quality assessment system and method based on behavior and expression recognition
CN108875785B (en) Attention degree detection method and device based on behavior feature comparison
WO2021047185A1 (en) Monitoring method and apparatus based on facial recognition, and storage medium and computer device
CN110674664A (en) Visual attention recognition method and system, storage medium and processor
CN108664649B (en) Knowledge content pushing method and device and pushing server
CN110945522A (en) Learning state judgment method and device and intelligent robot
CN112613780B (en) Method and device for generating learning report, electronic equipment and storage medium
WO2021218194A1 (en) Data processing method and apparatus, electronic device, and storage medium
CN111368808A (en) Method, device and system for acquiring answer data and teaching equipment
CN111160277A (en) Behavior recognition analysis method and system, and computer-readable storage medium
CN111768170A (en) Method and device for displaying operation correction result
Zakka et al. Estimating student learning affect using facial emotions
CN114021962A (en) Teaching evaluation method, evaluation device and related equipment and storage medium
Villegas-Ch et al. Identification of emotions from facial gestures in a teaching environment with the use of machine learning techniques
CN110111011B (en) Teaching quality supervision method and device and electronic equipment
KR101996247B1 (en) Method and apparatus of diagnostic test
Yi et al. Real time learning evaluation based on gaze tracking
CN116487012A (en) Intelligent practice teaching method, system, medium and equipment for clinical medical staff
CN112329634B (en) Classroom behavior identification method and device, electronic equipment and storage medium
Kim et al. Towards Automatic Recognition of Perceived Level of Understanding on Online Lectures using Earables
CN111507555A (en) Human body state detection method, classroom teaching quality evaluation method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination