CN113591678B - Classroom attention determination method, device, apparatus, storage medium, and program product - Google Patents


Info

Publication number
CN113591678B
Authority
CN
China
Prior art keywords
classroom
determining
scenes
class
different
Prior art date
Legal status
Active
Application number
CN202110858425.6A
Other languages
Chinese (zh)
Other versions
CN113591678A (en)
Inventor
刘海涛
李玉格
胡益珲
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110858425.6A
Publication of CN113591678A
Application granted
Publication of CN113591678B
Legal status: Active

Abstract

The disclosure provides a classroom attention determination method, apparatus, electronic device, computer-readable storage medium, and computer program product, relating to artificial intelligence technologies such as image processing, speech processing, and deep learning. The method comprises the following steps: determining a noisy degree distribution according to the audio data in a classroom monitoring video; determining a chaotic degree distribution according to the image data in the classroom monitoring video; determining the classroom scenes corresponding to different time periods according to the noisy degree distribution and the chaotic degree distribution; and determining the classroom attention parameters respectively corresponding to the different types of classroom scenes according to the face orientation information in the image data. By exploiting the characteristics that different scene types exhibit in the image and audio data during offline classroom teaching, the method accurately determines the classroom scene types of different time periods and then determines the classroom attention under each scene type as accurately as possible, so that the influence of different scenes on assessing the classroom attention parameters is not overlooked.

Description

Classroom attention determination method, device, apparatus, storage medium, and program product
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to the field of image processing, speech processing, deep learning, and other artificial intelligence technologies, and more particularly, to a classroom attention determination method, apparatus, electronic device, computer readable storage medium, and computer program product.
Background
While online education and online learning are gaining popularity, the face-to-face form of offline education between teacher and students remains the most conventional form of education and learning. How to better evaluate the state of offline teaching, so as to give teachers and students targeted improvement suggestions and improve teaching ability, is therefore a focus of study for technicians in the field.
Disclosure of Invention
Embodiments of the present disclosure provide a classroom attention determination method, apparatus, electronic device, computer-readable storage medium, and computer program product.
In a first aspect, an embodiment of the present disclosure provides a classroom attention determination method, including: determining the distribution of the noisy degree according to the audio data in the classroom monitoring video; determining chaotic degree distribution according to image data in the classroom monitoring video; determining class scenes corresponding to different time periods according to the noisy degree distribution and the chaotic degree distribution; and determining classroom attention parameters respectively corresponding to the different classes of classroom scenes according to the face orientation information in the image data.
In a second aspect, an embodiment of the present disclosure proposes a class attention determining apparatus including: the noisy degree distribution determining unit is configured to determine noisy degree distribution according to the audio data in the classroom monitoring video; a confusion-degree distribution determining unit configured to determine a confusion degree distribution from image data in the classroom monitoring video; the different classroom scene determining unit is configured to determine classroom scenes corresponding to different time periods according to the noisy degree distribution and the chaotic degree distribution; and the classroom attention parameter determination unit is configured to determine the classroom attention parameters respectively corresponding to different types of classroom scenes according to the face orientation information in the image data.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor which, when executed, enable the at least one processor to implement the classroom attention determination method described in any implementation of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a non-transitory computer-readable storage medium storing computer instructions for enabling a computer to implement a classroom attention determination method as described in any one of the implementations of the first aspect when executed.
In a fifth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, is capable of implementing a classroom attention determination method as described in any one of the implementations of the first aspect.
According to the classroom attention determining method provided by the embodiment of the disclosure, firstly, the noisy degree distribution is determined according to the audio data in the classroom monitoring video; meanwhile, determining chaotic degree distribution according to image data in the classroom monitoring video; then, determining class scenes corresponding to different time periods according to the noisy degree distribution and the chaotic degree distribution; and finally, determining the classroom attention parameters respectively corresponding to the different classes of classroom scenes according to the face orientation information in the image data.
By exploiting the characteristics that different scene types exhibit in the image and audio data during offline classroom teaching, the types of classroom scenes in different time periods are accurately determined; on that basis, the classroom attention under each scene type can be determined as accurately as possible, so that the influence of different scenes on evaluating the classroom attention parameters is not overlooked.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings:
FIG. 1 is an exemplary system architecture in which the present disclosure may be applied;
FIG. 2 is a flow chart of a classroom attention determination method provided by an embodiment of the present disclosure;
FIG. 3 is a flow chart of another class attention determination method provided by embodiments of the present disclosure;
fig. 4 is a flowchart of an off-line teaching suggestion generating method provided in an embodiment of the present disclosure;
fig. 5 is a block diagram of a classroom attention determination device according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device adapted to perform a classroom attention determination method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness. It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other.
In the technical solution of the present disclosure, the acquisition, storage, and application of the personal information involved comply with relevant laws and regulations, necessary security measures are taken, and public order and good morals are not violated.
FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the classroom attention determination methods, apparatus, electronic devices, and computer-readable storage media of the present disclosure may be applied.
As shown in fig. 1, a system architecture 100 may include a camera 101, a network 102, and a server 103. The network 102 is a medium used to provide a communication link between the camera 101 and the server 103. Network 102 may include various connection types such as wired, wireless communication links, or fiber optic cables, among others.
The camera 101 disposed in the offline classroom may interact with the server 103 via the network 102, for example, by transmitting the captured classroom monitoring video to the server 103. Various applications for implementing information communication between the camera 101 and the server 103 may be installed on the camera, such as a data transmission application, an instruction transmission application, and an instant messaging application.
The camera 101 may be of different sizes, shapes and specifications, and the server 103 may be hardware or software. When the server 103 is hardware, it may be implemented as a distributed server cluster formed by a plurality of servers, or may be implemented as a single server; when the server is software, the server may be implemented as a plurality of software or software modules, or may be implemented as a single software or software module, which is not particularly limited herein.
The server 103 may provide various services through various built-in applications. Taking as an example a classroom application that determines classroom attention from a received classroom monitoring video, the server 103 may achieve the following effects when running this application: first, receiving through the network 102 the classroom monitoring video shot by the camera arranged in the classroom; then, determining the noisy degree distribution according to the audio data in the classroom monitoring video; meanwhile, determining the chaotic degree distribution according to the image data in the classroom monitoring video; next, determining the classroom scenes corresponding to different time periods according to the noisy degree distribution and the chaotic degree distribution; and finally, determining the classroom attention parameters respectively corresponding to the different types of classroom scenes according to the face orientation information in the image data.
Note that, in addition to being acquired in real time from the camera 101 through the network 102, the classroom monitoring video may be stored in advance in the server 103 in various ways. When the server 103 detects that such data is already stored locally (e.g., a previously received classroom monitoring video awaiting processing), it may choose to retrieve the data directly from local storage, in which case the exemplary system architecture 100 may omit the camera 101 and the network 102.
Since determining classroom attention from the classroom monitoring video requires relatively complex processing and analysis of the audio and image content, and thus more computing resources and stronger computing power, the classroom attention determination method provided in the subsequent embodiments of the present disclosure is generally performed by the server 103, which has stronger computing power and more computing resources; accordingly, the classroom attention determination apparatus is also generally disposed in the server 103.
It should be understood that the number of cameras, networks and servers in fig. 1 is merely illustrative. There may be any number of cameras, networks, and servers, as desired for implementation.
Referring to fig. 2, fig. 2 is a flowchart of a classroom attention determination method according to an embodiment of the disclosure, wherein a flowchart 200 includes the following steps:
step 201: determining the distribution of the noisy degree according to the audio data in the classroom monitoring video;
This step aims at having the execution subject of the classroom attention determination method (e.g., the server 103 shown in fig. 1) determine the noisy degree distribution according to the audio data in the classroom monitoring video. The classroom monitoring video can be received from a camera arranged in the classroom of the target class whose classroom attention needs to be determined. The classroom attention described in this disclosure does not purely describe the attention of the students in class; it comprehensively characterizes the overall classroom attention that the teacher's teaching actions produce in the students. The cameras therefore include at least a front camera facing away from the blackboard, i.e., oriented in the same direction as the teacher faces the students, so that the face orientation information of the students can be determined conveniently.
The noisy degree can be characterized by feature parameters such as volume, sound clarity, and the overlap of sound source positions (i.e., whether different sound sources lie in different areas: the farther apart the areas, the lower the overlap; the closer together, the higher the overlap). The noisy degree may be a feature vector or multidimensional matrix that integrates multiple noisiness-characterizing features, or a quantized value obtained by some conversion. The louder the sound and the more scattered the sound source positions, the higher the noisy degree, and vice versa. The noisy degree distribution determined in this step is the distribution of the noisy degree over different periods of the full audio duration. Assuming the noisy degree is graded from low to high as A1, A2, A3, A4, the chronological distribution over a 10-minute audio clip might be: minutes 0-3 at grade A1, minutes 3-4 at grade A3, minutes 4-8 at grade A1, and minutes 8-10 at grade A2.
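As a concrete illustration of the grading and segmentation described above (not part of the claimed method itself), the following sketch maps per-frame RMS volume to the grades A1-A4 and collapses adjacent frames with the same grade into time segments. The RMS thresholds and one-second frame length are assumptions chosen only to mirror the example:

```python
def noise_level(rms: float, thresholds=(0.1, 0.3, 0.6)) -> str:
    """Map a frame's RMS volume to a coarse noisy-degree grade A1..A4.

    The thresholds are illustrative; a real system would calibrate them
    (and likely add clarity and sound-source-overlap features).
    """
    t1, t2, t3 = thresholds
    if rms < t1:
        return "A1"
    if rms < t2:
        return "A2"
    if rms < t3:
        return "A3"
    return "A4"


def noise_distribution(frame_rms, frame_seconds=1.0):
    """Collapse per-frame grades into (start_s, end_s, grade) segments."""
    segments = []
    for i, rms in enumerate(frame_rms):
        grade = noise_level(rms)
        start = i * frame_seconds
        if segments and segments[-1][2] == grade:
            # extend the current segment when the grade is unchanged
            segments[-1] = (segments[-1][0], start + frame_seconds, grade)
        else:
            segments.append((start, start + frame_seconds, grade))
    return segments
```

For example, `noise_distribution([0.05, 0.05, 0.4, 0.05])` yields three segments: a quiet stretch, a brief noisy one, and a return to quiet.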
Step 202: determining chaotic degree distribution according to image data in the classroom monitoring video;
This step aims at having the execution subject determine the confusion degree distribution according to the image data in the classroom monitoring video. In contrast to the audio data used for the noisy degree distribution, here the execution subject determines the confusion degree from the activity information of the students and teacher in the image data: under normal circumstances, students who are concentrating in class hold a fixed posture, so their movement amplitude is small and the confusion degree is low; otherwise, the confusion degree is high. Like the noisy degree distribution, the confusion degree distribution characterizes the confusion degree at different times over the whole video duration.
Similarly, the degree of confusion may be a feature vector or a multidimensional matrix that integrates a plurality of feature characterizing the confusion, or may be a quantized value obtained according to a transformation method.
Assuming the confusion degree is defined by movement amplitude and graded from low to high as B1, B2, B3, B4, the chronological confusion degree distribution over a 10-minute surveillance image stream might be: minutes 0-2 at grade B3, minutes 2-5 at grade B1, minutes 5-7 at grade B2, and minutes 7-10 at grade B4.
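The movement amplitude underlying these grades can be sketched as the mean displacement of matched body key points between consecutive frames; the pixel thresholds separating B1-B4 below are illustrative assumptions, not values from the disclosure:

```python
def frame_motion(prev_pts, cur_pts):
    """Mean Euclidean displacement (in pixels) of matched body key points
    between two consecutive frames."""
    dists = [((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
             for (x1, y1), (x2, y2) in zip(prev_pts, cur_pts)]
    return sum(dists) / len(dists)


def chaos_level(motion, thresholds=(1.0, 3.0, 6.0)):
    """Map a motion amplitude to a coarse confusion grade B1..B4."""
    for grade, t in zip(("B1", "B2", "B3"), thresholds):
        if motion < t:
            return grade
    return "B4"
```

Segmenting the per-period grades into a chronological distribution then works exactly as in the noisy-degree example.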
Step 203: determining class scenes corresponding to different time periods according to the noisy degree distribution and the chaotic degree distribution;
Based on step 201 and step 202, this step aims to have the execution subject integrate the noisy degree distribution and the confusion degree distribution, so as to determine as accurately as possible the type of classroom scene corresponding to each time period, i.e., the distribution of classroom scene types.
It should be appreciated that the noisy degree distribution determined from the audio data and the confusion degree distribution determined from the image data can each, to some extent, reflect the scene type of the class in different time periods; the scene type is abstracted and summarized from the different teaching modes of a class and is relevant for characterizing classroom attention. The two distributions thus serve as parallel influence factors that jointly determine the classroom scene type of the corresponding time period. Considering that the two distributions sometimes originate from the same cause, while in other scenes their association is not tight, each can first be assessed independently, and the two conclusions then combined for anomaly analysis or mutual-reinforcement analysis; when comprehensively determining the scene type, the conclusions can be weighted accordingly, yielding a result that better matches the actual situation.
One implementation, without limitation, is:
inputting the noisy degree parameter and the chaotic degree parameter of the same time period into a preset classroom scene determination model as input parameters; and
determining the result output by the classroom scene determination model as the classroom scene of the corresponding time period, thereby obtaining the classroom scenes corresponding to the different time periods.
The classroom scene determination model characterizes the correspondence between different noisy degree and chaotic degree parameters and different classroom scenes. In the training stage, video clips annotated with their real classroom scene types can serve as training samples, each clip carrying its noisy degree and chaotic degree analysis results, so that the model's structure can learn from the samples the latent relationship between these analysis results and the real scene types; the trained model is then used to determine the classroom scene types of actual video clips.
Specifically, the classroom scene determination model can be built from neural networks, convolutional networks, or deep learning networks of various structures, and suitable weights for the parallel noisy degree and chaotic degree parameters can be found during training and iteration through a fully connected layer or similar functional layer.
Different classroom scenes can be divided into: no-interaction and small-interaction scenes such as attentive listening or classroom questioning; large-interaction scenes such as organized activities or group discussion; and chaotic-interaction scenes of unorganized disorder. The standards for evaluating classroom attention obviously differ across these scene types, which demonstrates the necessity of determining the classroom scene in this step.
For ease of understanding, continuing the illustration: assume that within a 45-minute class, the classroom scene types are distributed as follows: the first 3 minutes are a chaotic interaction scene of students chatting among themselves; minutes 3-20 are a small interaction scene of attentive listening; minutes 20-30 are quiet question-answering; minutes 30-35 are a large interaction scene of group discussion; minutes 35-43 are a small interaction scene of classroom questioning about the discussion results; and minutes 43-45 are again a chaotic interaction scene of students chatting among themselves.
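The mapping that the scene determination model learns can be illustrated with a hard-coded rule table over the A1-A4 / B1-B4 grades from the earlier examples. This is only a stand-in for the trained model described above, and the grade combinations chosen for each scene type are assumptions:

```python
def classify_scene(noise_grade: str, chaos_grade: str) -> str:
    """Illustrative rule-based stand-in for the trained scene model."""
    n, c = int(noise_grade[1]), int(chaos_grade[1])
    if n <= 2 and c <= 2:
        # quiet and still: attentive listening or classroom Q&A
        return "small interaction"
    if n >= 3 and c >= 3:
        # loud and disorderly: unorganized confusion
        return "chaotic interaction"
    # loud-but-orderly or quiet-but-active: organized activity / discussion
    return "large interaction"


def scene_timeline(noise_grades, chaos_grades):
    """Per-period scene types from paired grade sequences."""
    return [classify_scene(n, c) for n, c in zip(noise_grades, chaos_grades)]
```

A trained model would replace this table with weights learned from annotated clips, but its input/output contract is the same: a (noisy degree, chaotic degree) pair per period in, a scene type per period out.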
Step 204: and determining classroom attention parameters respectively corresponding to the different classes of classroom scenes according to the face orientation information in the image data.
On the basis of step 203, this step aims at having the execution subject, after determining the classroom scene type for each period, determine the classroom attention parameters under the corresponding scene type from the face orientation information reflected in the image data during that period. It should be appreciated that the faces of concentrating students do not point the same way in different types of classroom scenes: when evaluating the classroom attention parameters in a small interaction scene of attentive listening, the number of faces oriented toward the teacher or blackboard should be the basis of evaluation; in a large interaction scene of group discussion, the number of faces oriented toward other students or content carriers such as textbooks should be the basis; and so on.
The class attention parameter described in this step is a parameterized description manner of quantifying the abstract class attention level so as to estimate the level thereof, and the quantification manner is various, such as scoring, grading, converting according to a formula, and the like, which is not particularly limited herein.
According to the classroom attention determination method provided by this embodiment, the characteristics that different scene types exhibit in the image and audio data during offline classroom teaching are used to accurately determine the classroom scene types of different time periods; on that basis, the classroom attention in each scene type is determined as accurately as possible, so that the influence of different scenes on assessing the classroom attention parameters is not overlooked.
Referring to fig. 3, fig. 3 is a flowchart of another classroom attention determination method according to an embodiment of the disclosure, wherein the flowchart 300 includes the following steps:
step 301: according to the audio data in the classroom monitoring video, determining the audio distribution and the audio variation trend of different time periods;
step 302: according to the overlapping degree and the volume change trend of the sound source distribution, determining the noisy degree distribution;
For the higher-level scheme provided in step 201 of the flow 200, this embodiment provides a concrete lower-level implementation through steps 301-302: first, the sound source distribution and volume variation trend of different time periods are determined from the audio data; then the noisy degree of each time period is determined from the overlapping degree of the sound source distribution and the volume variation trend, forming the noisy degree distribution.
A specific way of determination can be seen in the following steps:
determining a time period in which the overlapping degree is smaller than a first preset value and the volume variation conforms to a first preset variation trend as a teaching period of low noisy degree, the first preset variation trend being extracted from the volume variation of real teaching periods;
determining a time period in which the overlapping degree is greater than a second preset value and the volume variation conforms to a second preset variation trend as an unorganized discussion period of high noisy degree, the second preset variation trend being extracted from the volume variation of real unorganized discussion periods; and
summarizing the noisy degree parameters corresponding to each time period to determine the noisy degree distribution. Assuming that in practice all time periods can be divided into the teaching periods and unorganized discussion periods shown above, the noisy degree distribution is determined according to their time sequence.
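The two rules above can be sketched as follows. The overlap values, the interpretation of the preset trends as volume slopes, and all thresholds are illustrative assumptions, not values from the disclosure:

```python
def label_period(overlap: float, volume_slope: float,
                 first_preset: float = 0.3, second_preset: float = 0.7,
                 flat_slope: float = 0.1) -> str:
    """Rule-based period labelling from sound-source overlap (0..1) and the
    slope of volume over the period (used here as the 'variation trend')."""
    if overlap < first_preset and abs(volume_slope) < flat_slope:
        # one dominant, steady voice: lecturing
        return "teaching (low noisy degree)"
    if overlap > second_preset and volume_slope > flat_slope:
        # many overlapping, rising voices: unorganized discussion
        return "unorganized discussion (high noisy degree)"
    return "undetermined"
```

Periods labelled `"undetermined"` here would, in a fuller system, fall back on the chaotic degree distribution or the scene model for disambiguation.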
Step 303: identifying human body key points in image data in a classroom monitoring video;
step 304: determining the gesture according to the key points of the human body, and determining the change condition of the gesture;
step 305: according to the change conditions and consistency of the gestures of different human bodies, determining the confusion degree distribution;
For the higher-level scheme provided by step 202 in the flow 200, this embodiment provides a concrete lower-level implementation through steps 303-305: first, human body key points are identified in the image data; then the posture and its changes are determined from the key points; finally, the confusion degree is determined from the posture changes of different human bodies and the consistency among them. That is, under an organized activity, even if a person's posture changes frequently, if the pattern of change is consistent with that of others, the movement can be considered a cooperative or unified activity constrained by the organization, and the confusion degree at that moment is determined to be low; a person is not considered highly confused merely because of frequent posture changes.
A specific way of determination can be seen in the following steps:
determining as a teaching period of low confusion degree a period in which the posture changes of different human bodies are smaller than a first preset change amplitude and the posture consistency among different human bodies is greater than a first preset consistency degree, where a smaller degree of change means smaller limb movement amplitudes of the students and a higher consistency means the same posture shared among multiple students;
determining as an unorganized discussion period of high confusion degree a period in which the posture changes of different human bodies are greater than a second preset change amplitude (larger than the first preset change amplitude) and the posture consistency among different human bodies is smaller than a second preset consistency degree (smaller than the first preset consistency degree), where a larger degree of change means larger limb movement amplitudes of the students with weaker constraint, and a lower consistency means the students' postures differ from one another with a high degree of freedom; and
summarizing the confusion degree parameters corresponding to each time period to determine the confusion degree distribution. Assuming that in practice all time periods can be divided into the teaching periods and unorganized discussion periods shown above, the confusion degree distribution is determined according to their time sequence.
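Combining amplitude with inter-student consistency, as these rules require, might look like the sketch below. The consistency measure (an inverse of the population standard deviation of per-student motion) and all thresholds are assumptions for illustration:

```python
import statistics


def posture_consistency(motions):
    """Consistency in (0, 1]: identical per-student motion gives 1.0,
    widely differing motion drives the value toward 0."""
    if len(motions) < 2:
        return 1.0
    return 1.0 / (1.0 + statistics.pstdev(motions))


def label_confusion(motions, amp_lo=1.0, amp_hi=4.0,
                    cons_hi=0.8, cons_lo=0.5):
    """Rule-based period labelling from per-student motion amplitudes."""
    amp = sum(motions) / len(motions)
    cons = posture_consistency(motions)
    if amp < amp_lo and cons > cons_hi:
        # small, uniform movement: students sitting still and listening
        return "teaching (low confusion degree)"
    if amp > amp_hi and cons < cons_lo:
        # large, uncoordinated movement: unorganized activity
        return "unorganized discussion (high confusion degree)"
    # e.g. large but consistent movement: an organized group activity
    return "organized activity"
```

Note how a period of large but mutually consistent movement is deliberately not labelled as high confusion, matching the organized-activity reasoning above.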
Step 306: determining class scenes corresponding to different time periods according to the noisy degree distribution and the chaotic degree distribution;
Step 307: determining positive orientations corresponding to different classroom scenes respectively;
For example, when the classroom scene is a teaching scene, the direction toward the teaching subject (the teacher) or the carrier presenting the teaching content (the blackboard) may be determined as the positive orientation; in an organized discussion scene, the direction toward the other discussion participants or the knowledge record carriers may be determined as positive. That is, the positive orientation should characterize the students' attention in the corresponding type of classroom scene.
Step 308: extracting face orientation information from the image data by using a face recognition model provided with a multi-scale convolution kernel;
Because the sizes of target bodies in the video differ, choosing a suitable convolution kernel size has a considerable influence on the accuracy of the recognition result. The current mainstream convolution kernel size is 3×3; by comprehensively using multiple convolution kernels of different sizes (for example, 3×3 + 1×3 + 3×1, or a combination of 3×3 and 5×5), targets of different sizes in different pictures can be recognized accurately.
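A minimal numpy sketch of the multi-scale idea: a naive "same"-padded cross-correlation whose responses are summed over kernels of different sizes. A real face recognition model would implement this as parallel convolution branches inside a deep learning framework; this only shows how several kernel scales contribute to one output map:

```python
import numpy as np


def conv2d_same(img, kernel):
    """Naive 'same'-padded 2-D cross-correlation for odd-sized kernels."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out


def multi_scale_response(img, kernels):
    """Sum the responses of kernels of different sizes (e.g. 3x3 and 5x5)
    so that targets of several scales contribute to one feature map."""
    return sum(conv2d_same(img, k) for k in kernels)
```

The same function also handles the asymmetric 1×3 and 3×1 kernels mentioned above, since any odd height/width pads correctly.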
Step 309: and determining the class attention parameters under the corresponding class scenes according to the number of the faces which are facing towards and correspond to the different class scenes.
On the basis of step 308, this step is intended to determine, by the execution body, the classroom attention parameters under the corresponding classroom scenes according to the number of positively oriented faces respectively corresponding to the different classroom scenes. That is, the more faces are positively oriented, the more concentrated the classroom attention is characterized to be.
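The counting rule can be sketched as a simple ratio; the function and the per-scene counts are illustrative assumptions rather than the disclosure's fixed formula:

```python
def classroom_attention(forward_faces, students_present):
    """Attention parameter as the share of positively oriented faces:
    the more faces face forward, the more concentrated the class."""
    if students_present == 0:
        return 0.0
    return forward_faces / students_present

# Hypothetical per-scene counts: scene -> (positively oriented faces, students)
counts = {"lecture": (27, 30), "organized_discussion": (18, 30)}
attention = {s: classroom_attention(f, n) for s, (f, n) in counts.items()}
```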
Unlike the embodiment shown in flow 200, this embodiment provides concrete implementations for determining the noisy degree distribution and the chaotic degree distribution through steps 301-302 and steps 303-305 respectively, defines the positive orientations of the different types of classroom scenes through step 307, improves the recognition of multi-resolution images and multi-scale targets through the multi-scale convolution kernel provided in step 308, and finally determines the classroom attention parameters under the corresponding types of classroom scenes more accurately through step 309.
It should be noted that the above improvement points or embodiments do not depend on or cause one another; only step 307 and step 309 must be used together. Each of the other improvement points or embodiments may be combined with the embodiment of flow 200 alone to form a different independent embodiment; the present embodiment exists only as a preferred embodiment containing multiple improvement points at the same time.
On the basis of any of the above embodiments, in order to improve the accuracy of the determined classroom scenes as much as possible, when classroom planning information can be obtained through a preset data interface, the classroom scenes corresponding to different time periods, determined based on the noisy degree distribution and the chaotic degree distribution, may be adjusted according to the classroom planning information. The preset data interface may be an interface of the teaching system for acquiring the teaching plan information uploaded by a teacher before class; the planning information is the teacher's teaching plan for the class, including the arrangement of different types of classroom scenes, and the like.
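One way the adjustment could be realized is to let the teacher's plan override the detected label wherever the plan specifies a scene for a period; the period keys and scene labels below are hypothetical:

```python
def adjust_scenes(detected, plan):
    """Override detected classroom scenes with the planned ones where the
    teaching plan covers the period; keep the detection result otherwise.

    detected / plan: {(start_second, end_second): scene_label}
    """
    return {period: plan.get(period, scene) for period, scene in detected.items()}

detected = {(0, 300): "unorganized_discussion", (300, 600): "lecture"}
plan = {(0, 300): "group_discussion"}  # plan info from the preset data interface
adjusted = adjust_scenes(detected, plan)
```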
On the basis of any of the above embodiments, offline teaching suggestions may further be generated by the execution body in combination with the determined classroom attention corresponding to the different types of classroom scenes, that is, by looking for teaching modes that can bring about higher classroom attention. Referring to fig. 4, fig. 4 is a flowchart of an offline teaching suggestion generation method provided by an embodiment of the present disclosure, wherein flow 400 includes the following steps:
step 401: according to the classroom attention parameters respectively corresponding to each type of classroom scene, weighting calculation is carried out to obtain comprehensive attention parameters corresponding to the whole class;
This step is intended to obtain, by the execution body, the comprehensive attention parameter corresponding to a whole class through weighted calculation according to the classroom attention parameters respectively corresponding to each type of classroom scene; that is, a suitable weight is assigned in advance to each type of classroom scene, and the comprehensive attention parameter corresponding to the whole class is obtained by weighting with these weights.
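A sketch of the weighted calculation, assuming weights assigned per scene type in advance (the scene names and weight values are illustrative):

```python
def comprehensive_attention(scene_attention, scene_weights):
    """Weighted average of per-scene classroom attention parameters.

    Weights are re-normalized over the scenes actually present, so classes
    with different scene mixes remain comparable."""
    total = sum(scene_weights[s] for s in scene_attention)
    return sum(a * scene_weights[s] for s, a in scene_attention.items()) / total

weights = {"lecture": 0.5, "organized_discussion": 0.3, "questioning": 0.2}
att = {"lecture": 0.9, "organized_discussion": 0.6, "questioning": 0.8}
score = comprehensive_attention(att, weights)
```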
Step 402: determining the whole class with comprehensive attention parameters meeting preset requirements as a target class;
On the basis of step 401, this step is intended to determine, by the execution body, whole classes whose comprehensive attention parameters satisfy preset requirements as target classes. The preset requirements are set to screen preferred classes from which offline teaching suggestions can be generated, so a higher comprehensive attention parameter is generally required; in addition to the comprehensive attention parameter, the requirements may also involve the kinds, order, and so on of the classroom scene types involved.
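The screening of target classes might then look like the following; the threshold and class names are hypothetical, and real requirements could additionally inspect the kinds and order of the scene types:

```python
def select_target_classes(class_scores, threshold):
    """Return classes whose comprehensive attention parameter meets a preset
    threshold, best first; these become templates for teaching suggestions."""
    ranked = sorted(class_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]

scores = {"class_A": 0.79, "class_B": 0.55, "class_C": 0.83}
targets = select_target_classes(scores, 0.7)
```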
Step 403: generating offline teaching suggestions according to the distribution of different types of classroom scenes in the target classes.
On the basis of step 402, this step is intended to generate offline teaching suggestions by the execution body according to the distribution of different types of classroom scenes in the target classes. Furthermore, the generated offline teaching suggestions can be pushed to teachers of the same type to guide them in adjusting their teaching arrangements.
To deepen understanding, the present disclosure further provides a specific implementation scheme in combination with a specific application scenario:
Considering that a classroom is generally rectangular and that, in the classroom scene, the distance between each row of desks and the front-facing camera gradually increases toward the back, the faces of students seated at the rear of the classroom occupy a small proportion of the field of view. To better recognize faces of different sizes in this scene, this embodiment uses a TinyFace detection model, which detects faces of different sizes well, and additionally fuses multi-scale convolution kernels to improve the detection precision for inconsistent face sizes.
The classroom attention index is constructed using the number of faces detected in the video under the determined classroom scene type. For example, when listening attentively, a student's facial information is typically exposed to the field of view of the front camera, so the classroom attention index in an attentive-listening classroom scene can be determined by detecting the number of complete, recognizable faces in the video.
When processing the surveillance video, ffmpeg (a video processing tool) can be used to process the video data and separate out the audio information for analysis; the audio analysis then assists in identifying the classroom scene. The audio separated by ffmpeg is in WAV format, on the basis of which it needs to be re-encoded at a specified sampling rate; the audio data also needs to be normalized when the audio file is processed, to avoid the problem of non-uniform standards caused by unnormalized data.
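The normalization step can be sketched as follows; the ffmpeg command in the comment and the 16 kHz mono settings are illustrative assumptions about the re-encoding, not parameters fixed by the disclosure:

```python
import numpy as np

# Hypothetical extraction step, shown for illustration only:
#   ffmpeg -i classroom.mp4 -vn -ac 1 -ar 16000 classroom.wav
# (-vn drops the video stream; -ac/-ar set mono audio at a 16 kHz rate)

def normalize_audio(samples):
    """Peak-normalize int16 PCM samples to floats in [-1, 1] so that later
    thresholding is not affected by differing recording levels."""
    x = np.asarray(samples, dtype=float)
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x

pcm = np.array([0, 16384, -32768, 8192], dtype=np.int16)
norm = normalize_audio(pcm)
```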
In audio processing, the division into different threshold bands affects the recognition of classroom interaction scenes, and multiple interaction scenes may exist within the same band. For example, when the audio level is between 0.3 and 0.5, the class is most likely in an attentive-listening or classroom-questioning stage; when the audio level is between 0.5 and 0.7, it may be in an organized-activity or group-discussion stage. In such cases the scene recognition task is difficult to perform by audio segmentation alone, so the result is fused with that of the video scene recognition module described below, making the detection of classroom interaction scenes more accurate.
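The threshold bands above can be encoded directly; the scene labels are illustrative, and a real system would pass the ambiguous candidates on to the video module for fusion:

```python
def candidate_scenes(audio_level):
    """Map a normalized audio level to the candidate interaction scenes of
    its threshold band (bands follow the illustrative values in the text)."""
    if 0.3 <= audio_level < 0.5:
        return ["attentive_listening", "classroom_questioning"]
    if 0.5 <= audio_level < 0.7:
        return ["organized_activity", "group_discussion"]
    return []  # outside the bands discussed here

scenes = candidate_scenes(0.4)
```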
Using audio detection also raises the problem of uneven sample distribution. Statistics on the data show that the stages in which students quietly do exercises, listen attentively, or take part in organized activities occupy a large proportion of a whole class, while the classroom questioning, group discussion, and unorganized chaotic stages occupy a small proportion of the class time. This can be handled by unbalanced sampling: in the data preprocessing stage, different sampling frequencies are adopted for different scenes, down-sampling the audio of the quiet-exercise, attentive-listening, and organized-activity stages and up-sampling the audio of the classroom-questioning, group-discussion, and unorganized chaotic stages, thereby improving the uniformity of the sample distribution.
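The unbalanced sampling could be sketched deterministically as below; the rate values and scene labels are hypothetical, and a real pipeline would more likely sample at the frame or clip level with randomized rates:

```python
def rebalance(clips, rates):
    """Resample labelled clips: a rate >= 1 repeats each clip int(rate)
    times (up-sampling rare scenes); a rate < 1 keeps every
    int(1/rate)-th clip of that scene (down-sampling frequent scenes)."""
    seen, out = {}, []
    for label, clip in clips:
        rate = rates.get(label, 1.0)
        if rate >= 1.0:
            out.extend([(label, clip)] * int(rate))
        else:
            step = int(round(1.0 / rate))
            if seen.get(label, 0) % step == 0:
                out.append((label, clip))
            seen[label] = seen.get(label, 0) + 1
    return out

clips = [("listening", i) for i in range(4)] + [("group_discussion", 0)]
balanced = rebalance(clips, {"listening": 0.5, "group_discussion": 2.0})
```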
The subsequent attention evaluation module can integrate the audio segmentation result, the face detection result and the video feature extraction result to realize the identification of the current video interaction scene.
Regarding the classification of classroom attention, this embodiment divides the attention situation into three cases: concentrated, moderate, and unconcentrated, with different classroom interaction scenes corresponding to different attention situations. In the overall distribution, the unorganized chaotic stage occupies the least time, appears least frequently in the whole data set, and is the most difficult to recognize. In the concentrated case, quiet exercises and attentive listening occupy the largest share of the data set, so with training on a large amount of data the recognition results are more accurate. In the moderate case, the differences in actions among the three interaction scenes of classroom questioning, group discussion, and organized activity are not obvious. The scheme provided by this embodiment therefore stands out in that it can amplify these subtle differences and further improve the discrimination accuracy.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of a class attention determining device, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the classroom attention determination apparatus 500 of the present embodiment may include: a noisy degree distribution determining unit 501, a chaotic degree distribution determining unit 502, a different classroom scene determining unit 503, and a classroom attention parameter determining unit 504. The noisy degree distribution determining unit 501 is configured to determine a noisy degree distribution according to the audio data in the classroom monitoring video; the chaotic degree distribution determining unit 502 is configured to determine a chaotic degree distribution according to the image data in the classroom monitoring video; the different classroom scene determining unit 503 is configured to determine classroom scenes corresponding to different time periods according to the noisy degree distribution and the chaotic degree distribution; the classroom attention parameter determining unit 504 is configured to determine classroom attention parameters respectively corresponding to different types of classroom scenes according to the face orientation information in the image data.
In the present embodiment, for the specific processing and technical effects of the noisy degree distribution determining unit 501, the chaotic degree distribution determining unit 502, the different classroom scene determining unit 503, and the classroom attention parameter determining unit 504 in the classroom attention determination apparatus 500, reference may be made to the relevant descriptions of steps 201-204 in the embodiment corresponding to fig. 2, which are not repeated here.
In some optional implementations of this embodiment, the classroom attention parameter determination unit 504 may include:
a forward direction per-scene determination subunit configured to determine forward directions respectively corresponding to different classroom scenes;
and the classroom attention parameter determination subunit is configured to determine the classroom attention parameters in the corresponding classroom scenes according to the number of faces which are facing and respectively correspond to the different classroom scenes.
In some optional implementations of the present embodiment, the positive orientation per-scene determination subunit may include:
a lecture scene forward direction determination module configured to determine a direction toward a lecture object or a carrier presenting lecture contents as a forward direction in response to the classroom scene being a lecture scene;
an organized discussion forward orientation determination module configured to determine a direction towards other discussion objects or knowledge record carriers as forward orientation in response to the classroom scenario being an organized discussion scenario.
In some optional implementations of the present embodiment, the classroom attention determination apparatus 500 may further include:
and a face recognition model calling unit configured to extract face orientation information from the image data using a face recognition model provided with a multi-scale convolution kernel.
In some optional implementations of the present embodiment, the noisy level distribution determination unit 501 may be further configured to:
determining sound source distribution and volume change trend of different time periods according to the audio data;
and determining the noisy degree distribution according to the overlapping degree of the sound source distribution and the volume change trend.
In some optional implementations of the present embodiment, the chaotic degree distribution determining unit 502 may be further configured to:
identifying human body key points in the image data;
determining the gesture according to the key points of the human body, and determining the change condition of the gesture;
and determining the confusion degree distribution according to the change conditions and consistency of the postures of different human bodies.
In some optional implementations of the present embodiment, the different classroom scene determination unit 503 may be further configured to:
the noisy degree parameter and the chaotic degree parameter in the same time period are used as input parameters to be input into a preset classroom scene determination model; the classroom scene determining model is used for representing the corresponding relation between different noisy degree parameters and chaotic degree parameters and different classroom scenes;
and determining the result output by the classroom scene determination model as the classroom scene of the corresponding time period to obtain the classroom scenes respectively corresponding to the different time periods.
In some optional implementations of the present embodiment, the classroom attention determination apparatus 500 may further include:
and the adjusting unit is configured to, in response to classroom planning information being obtained through the preset data interface, adjust the classroom scenes corresponding to different time periods determined based on the noisy degree distribution and the chaotic degree distribution, according to the planning information of different types of classroom scenes in the classroom planning information.
In some optional implementations of the present embodiment, the classroom attention determination apparatus 500 may further include:
a comprehensive attention parameter determination unit configured to obtain a comprehensive attention parameter corresponding to the whole class by weighting calculation according to class attention parameters respectively corresponding to each type of class scene;
a preferred class determination unit configured to determine an entire class having comprehensive attention parameters satisfying preset requirements as a target class;
and the teaching suggestion generation unit is configured to generate off-line teaching suggestions according to the distribution of different types of classroom scenes in the target classroom.
This embodiment exists as a device embodiment corresponding to the above method embodiment. The classroom attention determination device provided by this embodiment utilizes the characteristics that different types of scenes in the offline classroom teaching process are respectively exhibited in the image and audio data, so as to accurately determine the types of classroom scenes in different time periods; on this basis, it can determine the classroom attention under each type of scene as accurately as possible, without ignoring the influence of different scenes on evaluating the classroom attention parameters.
According to an embodiment of the present disclosure, the present disclosure further provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to implement the classroom attention determination method described in any of the embodiments above.
According to an embodiment of the present disclosure, there is also provided a readable storage medium storing computer instructions for enabling a computer to implement the classroom attention determination method described in any of the above embodiments when executed.
An embodiment of the present disclosure further provides a computer program product which, when executed by a processor, implements the classroom attention determination method described in any of the above embodiments.
Fig. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the respective methods and processes described above, such as the classroom attention determination method. For example, in some embodiments, the classroom attention determination method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When a computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the classroom attention determination method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the classroom attention determination method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the defects of high management difficulty and weak service expansibility in traditional physical host and Virtual Private Server (VPS) services.
According to the technical solution of the embodiments of the present disclosure, the characteristics that different types of scenes in the offline classroom teaching process are respectively exhibited in the image and audio data are utilized to accurately determine the types of classroom scenes in different time periods; on this basis, the classroom attention under each type of scene can be determined as accurately as possible, without ignoring the influence of different scenes on evaluating the classroom attention parameters.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (13)

1. A classroom attention determination method, comprising:
determining sound source distribution and volume change trend of different time periods according to the audio data in the classroom monitoring video;
determining a noisy degree distribution according to the overlapping degree of the sound source distribution and the volume change trend;
identifying human body key points in image data in the classroom monitoring video;
determining the gesture according to the human body key points, and determining the change condition of the gesture;
according to the change conditions and consistency of the gestures of different human bodies, determining the confusion degree distribution;
determining classroom scenes corresponding to different time periods according to the noisy degree distribution and the chaotic degree distribution;
in response to the classroom scene being a teaching scene, determining a direction toward a teaching object or a carrier presenting teaching content as a positive direction;
in response to the classroom scenario being an organized discussion scenario, determining a direction towards other discussion objects or knowledge record carriers as the positive direction;
and determining the class attention parameters under the corresponding class scenes according to the number of the faces which are respectively corresponding to different class scenes and are facing forward and are represented by the image data.
2. The method of claim 1, further comprising:
And extracting the face orientation information from the image data by using a face recognition model provided with a multi-scale convolution kernel.
3. The method of claim 1, wherein the determining classroom scenes corresponding to different time periods from the noisy level distribution and the chaotic level distribution comprises:
the noisy degree and the chaotic degree of the same period are used as input parameters to be input into a preset classroom scene determination model; the classroom scene determination model is used for representing the corresponding relation between different noisy degrees and chaotic degrees and different classroom scenes;
and determining the result output by the classroom scene determination model as the classroom scene of the corresponding time period to obtain the classroom scenes respectively corresponding to different time periods.
4. The method of claim 1, further comprising:
and responding to the classroom planning information obtained through a preset data interface, and adjusting the classroom scenes corresponding to different time periods determined based on the noisy degree distribution and the chaotic degree distribution according to the planning information of the classroom scenes of different types in the classroom planning information.
5. The method of any of claims 1-4, further comprising:
According to the classroom attention parameters respectively corresponding to each type of classroom scene, weighting calculation is carried out to obtain comprehensive attention parameters corresponding to the whole class;
determining the whole class with comprehensive attention parameters meeting preset requirements as a target class;
and generating offline teaching suggestions according to the distribution of different types of classroom scenes in the target classes.
6. A classroom attention determination device comprising:
the noisy degree distribution determining unit is configured to determine sound source distribution and volume change trend of different time periods according to the audio data in the classroom monitoring video; and determine a noisy degree distribution according to the overlapping degree of the sound source distribution and the volume change trend;
a confusion degree distribution determining unit configured to identify human body key points in image data in the classroom monitoring video; determining the gesture according to the human body key points, and determining the change condition of the gesture; according to the change conditions and consistency of the gestures of different human bodies, determining the confusion degree distribution;
a different class scene determination unit configured to determine class scenes corresponding to different periods according to the noisy degree distribution and the chaotic degree distribution;
A classroom attention parameter determination unit configured to determine, as a positive orientation, a direction toward a teaching object or a carrier presenting teaching contents in response to the classroom scene being a teaching scene; in response to the classroom scenario being an organized discussion scenario, determining a direction towards other discussion objects or knowledge record carriers as the positive direction; and determining the class attention parameters under the corresponding class scenes according to the number of the faces which are respectively corresponding to different class scenes and are facing forward and are represented by the image data.
7. The apparatus of claim 6, further comprising:
and a face recognition model calling unit configured to extract the face orientation information from the image data using a face recognition model provided with a multi-scale convolution kernel.
8. The apparatus of claim 6, wherein the different classroom scenario determination unit is further configured to:
input the noisy degree parameter and the chaotic degree parameter of the same time period as input parameters into a preset classroom scene determination model, the classroom scene determination model representing the correspondence between different combinations of noisy degree parameters and chaotic degree parameters on the one hand and different classroom scenes on the other;
and determine the result output by the classroom scene determination model as the classroom scene of the corresponding time period, thereby obtaining the classroom scenes respectively corresponding to different time periods.
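A minimal stand-in for the "classroom scene determination model" of claim 8 is a rule table mapping one period's (noisy, chaotic) pair to a scene label. The patent leaves the model form open (it could equally be a trained classifier); the thresholds and labels below are assumptions for illustration only.

```python
# Illustrative rule-based scene model: map one time period's noisy/chaotic
# parameters (normalized to 0..1) to a scene label. Thresholds are assumed.

def determine_scene(noisy: float, chaotic: float) -> str:
    """Classify a time period from its noisy and chaotic degree parameters."""
    if noisy < 0.3 and chaotic < 0.3:
        return "lecturing"             # quiet and still: teacher is lecturing
    if noisy >= 0.3 and chaotic < 0.6:
        return "organized_discussion"  # talkative but posture stays orderly
    return "unorganized"               # loud and disorderly

periods = [(0.1, 0.1), (0.5, 0.4), (0.8, 0.9)]
print([determine_scene(n, c) for n, c in periods])
# ['lecturing', 'organized_discussion', 'unorganized']
```

Running the rule over each period yields the per-period scene sequence that claim 8's second step describes.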
9. The apparatus of claim 6, further comprising:
an adjusting unit configured to, in response to obtaining classroom planning information through a preset data interface, adjust the classroom scenes of the different time periods determined based on the noisy degree distribution and the chaotic degree distribution, according to the planning of the different types of classroom scenes in the classroom planning information.
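The adjustment step of claim 9 can be sketched as reconciling detected scenes with a planned schedule. In this sketch the "preset data interface" is simplified to a dict lookup, and the reconciliation policy (planned scene wins where one exists) is an assumption; the patent does not fix a specific policy.

```python
# Hypothetical reconciliation of detected scenes with planned ones: where the
# planning information names a scene for a period, prefer it; otherwise keep
# the scene inferred from the noisy/chaotic distributions.

def adjust_scenes(detected: dict[int, str], planned: dict[int, str]) -> dict[int, str]:
    """Override detected scenes with planned scenes where a plan exists."""
    return {period: planned.get(period, scene)
            for period, scene in detected.items()}

detected = {1: "lecturing", 2: "unorganized", 3: "lecturing"}
planned = {2: "organized_discussion"}
print(adjust_scenes(detected, planned))
```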
10. The apparatus of any of claims 6-9, further comprising:
a comprehensive attention parameter determination unit configured to obtain a comprehensive attention parameter for the whole class by weighted calculation of the classroom attention parameters respectively corresponding to each type of classroom scene;
a preferred class determination unit configured to determine a whole class whose comprehensive attention parameter satisfies a preset requirement as a target class;
and a teaching suggestion generation unit configured to generate offline teaching suggestions according to the distribution of the different types of classroom scenes in the target class.
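The weighted calculation of claim 10 can be illustrated as a per-scene weighted average compared against a preset requirement. The weights and the 0.75 threshold below are assumptions for illustration; the patent specifies neither concrete weights nor a threshold value.

```python
# Illustrative aggregation: combine per-scene attention parameters with
# per-scene weights into one comprehensive score, then test it against a
# preset requirement. Weights and threshold are assumed values.

def comprehensive_attention(per_scene: dict[str, float],
                            weights: dict[str, float]) -> float:
    """Weighted average of the per-scene classroom attention parameters."""
    total_w = sum(weights[s] for s in per_scene)
    return sum(per_scene[s] * weights[s] for s in per_scene) / total_w

per_scene = {"lecturing": 0.9, "organized_discussion": 0.6}
weights = {"lecturing": 0.7, "organized_discussion": 0.3}
score = comprehensive_attention(per_scene, weights)
print(round(score, 2))   # comprehensive attention parameter
print(score >= 0.75)     # does this class meet the preset requirement?
```

A class whose score meets the requirement would become a "target class", whose scene distribution then feeds the teaching-suggestion generation of claim 10.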
11. An electronic device, comprising:
At least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the classroom attention determination method of any one of claims 1-5.
12. A non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the classroom attention determination method of any one of claims 1-5.
13. A computer program product comprising a computer program which, when executed by a processor, implements the classroom attention determination method of any one of claims 1-5.
CN202110858425.6A 2021-07-28 2021-07-28 Classroom attention determination method, device, apparatus, storage medium, and program product Active CN113591678B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110858425.6A CN113591678B (en) 2021-07-28 2021-07-28 Classroom attention determination method, device, apparatus, storage medium, and program product


Publications (2)

Publication Number Publication Date
CN113591678A CN113591678A (en) 2021-11-02
CN113591678B true CN113591678B (en) 2023-06-23

Family

ID=78251195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110858425.6A Active CN113591678B (en) 2021-07-28 2021-07-28 Classroom attention determination method, device, apparatus, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN113591678B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114493952A (en) * 2022-04-18 2022-05-13 北京梦蓝杉科技有限公司 Education software data processing system and method based on big data
CN115907507B (en) * 2022-10-13 2023-11-14 华中科技大学 Student class behavior detection and learning analysis method combined with class scene

Citations (5)

Publication number Priority date Publication date Assignee Title
CN108399376A (en) * 2018-02-07 2018-08-14 华中师范大学 Student classroom learning interest intelligent analysis method and system
CN108805009A (en) * 2018-04-20 2018-11-13 华中师范大学 Classroom learning state monitoring method based on multimodal information fusion and system
CN111046823A (en) * 2019-12-19 2020-04-21 东南大学 Student classroom participation degree analysis system based on classroom video
CN111898881A (en) * 2020-07-15 2020-11-06 杭州海康威视系统技术有限公司 Classroom teaching quality assessment method, device, equipment and storage medium
CN112287844A (en) * 2020-10-30 2021-01-29 北京市商汤科技开发有限公司 Student situation analysis method and device, electronic device and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9826149B2 (en) * 2015-03-27 2017-11-21 Intel Corporation Machine learning of real-time image capture parameters


Non-Patent Citations (1)

Title
Analysis of Student Classroom Engagement Based on Classroom Video; Miao Jia; Yu Dongchuan; Journal of Educational Biology (Issue 04), pp. 26-32 *


Similar Documents

Publication Publication Date Title
US10783711B2 (en) Switching realities for better task efficiency
CN109101919B (en) Method and apparatus for generating information
CN113591678B (en) Classroom attention determination method, device, apparatus, storage medium, and program product
US11288578B2 (en) Context-aware conversation thread detection for communication sessions
WO2021190086A1 (en) Face-to-face examination risk control method and apparatus, computer device, and storage medium
US20210042504A1 (en) Method and apparatus for outputting data
CN109214501B (en) Method and apparatus for identifying information
CN107766577B (en) Public opinion monitoring method, device, equipment and storage medium
CN111860362A (en) Method and device for generating human face image correction model and correcting human face image
CN112509690A (en) Method, apparatus, device and storage medium for controlling quality
CN112215700A (en) Credit face audit method and device
CN112149615A (en) Face living body detection method, device, medium and electronic equipment
CN111709362A (en) Method, device, equipment and storage medium for determining key learning content
EP4201041A1 (en) Methods, systems, and media for context-aware estimation of student attention in online learning
US20230085195A1 (en) Enhanced learning content in an interconnected environment
Duraisamy et al. Classroom engagement evaluation using computer vision techniques
WO2021218194A1 (en) Data processing method and apparatus, electronic device, and storage medium
US10719696B2 (en) Generation of interrelationships among participants and topics in a videoconferencing system
US20140095402A1 (en) System and Method of Scoring Candidate Audio Responses for a Hiring Decision
KR102383457B1 (en) Active artificial intelligence tutoring system that support teaching and learning and method for controlling the same
CN115762772B (en) Method, device, equipment and storage medium for determining emotional characteristics of target object
Gupta et al. An adaptive system for predicting student attentiveness in online classrooms
CN116911313B (en) Semantic drift text recognition method and device
CN110650369B (en) Video processing method and device, storage medium and electronic equipment
CN111949860B (en) Method and apparatus for generating a relevance determination model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant