CN109241830B - Classroom lecture-listening abnormality detection method based on an illumination generative adversarial network - Google Patents

Classroom lecture-listening abnormality detection method based on an illumination generative adversarial network

Info

Publication number
CN109241830B
Authority
CN
China
Prior art keywords
illumination
head position
head
image
classroom
Prior art date
Legal status
Active
Application number
CN201810831224.5A
Other languages
Chinese (zh)
Other versions
CN109241830A (en)
Inventor
谢昭
张安杰
吴克伟
肖泽宇
童赟
Current Assignee
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date
Filing date
Publication date
Application filed by Hefei University of Technology
Priority to CN201810831224.5A
Publication of CN109241830A
Application granted
Publication of CN109241830B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/44 Event detection
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Psychiatry (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a classroom lecture-listening abnormality detection method based on an illumination generative adversarial network, which comprises the following steps: collecting real classroom head pose data, rendering illuminated classroom head pose data, constructing an illumination generative adversarial network, generating adversarial samples, constructing a head pose detection model, detecting classroom head poses, and detecting classroom listening abnormalities. By using deep neural networks, the invention improves the accuracy of head-region localization and reduces the interference of non-head regions on the judgment of the not-listening state.

Description

Classroom lecture-listening abnormality detection method based on an illumination generative adversarial network
Technical Field
The invention belongs to the technical field of anomaly detection, and particularly relates to a classroom lecture-listening abnormality detection method based on an illumination generative adversarial network.
Background
Computer-based anomaly detection uses computer vision theory and video analysis methods to analyze, without human intervention, the video sequences recorded by monitoring equipment such as cameras. It locates, identifies and tracks targets in a structured scene, analyzes and judges their behavior on that basis, obtains an understanding of the image content and an interpretation of the objective scene, and can further guide and plan actions.
Existing anomaly detection methods usually adopt specific statistical analysis methods or deep learning methods. Chinese patent application No. 201510141935.6, "Crowd movement trajectory anomaly detection method for complex structured scenes", extracts and segments crowd movement trajectories from the historical surveillance video of a complex structured scene, learns a multi-center clustering algorithm based on maximum-minimum distances, and applies LOF-based anomaly detection, thereby solving the crowd trajectory detection problem efficiently and practically. Chinese patent application No. 201410795393.X, "Crowd anomaly detection and localization system and method based on a time-recurrent neural network", analyzes and trains on collected sample data with a time-recurrent neural network to perform anomaly detection. Chinese patent application No. 201710305833.2, "A video anomaly detection method", uses a gray-level projection algorithm for global motion estimation, effectively realizing image jitter detection and jitter-degree estimation. However, such methods easily fall into local optima, and when the training data set is large and the network is complex, the training time and cost become excessive and the efficiency is low.
The main application fields of current anomaly detection include intelligent transportation and intelligent surveillance. Chinese patent application No. 201410799626.3, "A traffic anomaly detection method and system", divides a normal traffic video image sequence into video block sequences, detects the number of shots in the video block sequences, builds a Gaussian model of that number, and uses the Gaussian model to perform anomaly detection on test traffic video images. Chinese patent application No. 201510670786.2, "Traffic scene anomaly detection method based on motion reconstruction technology", uses the spatial position information of motion patterns to explore the spatial structure information between different motion patterns, solving the inapplicability of existing anomaly detection methods to specific scenes. Chinese patent application No. 201710131835.4, "Urban road traffic anomaly detection method based on Isolation Forest", takes a road as the detection object, divides data sets of different types according to the average running speed of the road in different periods, trains an Isolation Forest on each data set, and decides whether the road is abnormal from the distance between the road speed and the root node in the Isolation Forest. Chinese patent application No. 201510046984.1, "An anomaly detection method based on vehicle trajectory similarity", computes similarity measures between a typical trajectory and vehicle trajectories of the same type to build a deviation statistical model, obtains a confidence interval of the similarity measure, computes the similarity between the trajectory under test and the typical trajectory, and judges from the confidence interval whether the trajectory is abnormal. However, these methods are complex, poorly adaptable across scenes, demanding in real data, and costly in data collection.
Existing anomaly detection methods do not address the analysis of the listening state in classroom teaching surveillance, and the task of the present invention differs from existing work in its classroom video and head pose data. The implementation also differs from conventional anomaly detection methods: on the basis of a deep learning method, samples are generated with a 3D model and illumination rendering, and illumination-refined head pose data are generated with a generative adversarial network, which resolves the inconsistency between illumination-rendered head position images and real head position image data. Using the generated adversarial samples effectively improves the accuracy of head pose detection, and abnormal classroom listening states are judged through statistical analysis of the detected head poses.
Disclosure of Invention
In order to remedy the deficiencies of the prior art, the invention provides a classroom lecture-listening abnormality detection method based on an illumination generative adversarial network.
The technical solution adopted by the invention is as follows:
A classroom lecture-listening abnormality detection method based on an illumination generative adversarial network, comprising the following steps:
step S1: acquiring real classroom head pose data:
collecting video frames from a real classroom, constructing a head position detection model, labelling candidate head position images, and obtaining a training set and trained parameters;
step S2: rendering illuminated classroom head pose data:
according to a designed 3D model of classroom students, setting the head pose, illumination condition and camera angle parameters in the model, rendering multiple times, and obtaining a set of classroom images under rendered illumination;
step S3: constructing the illumination generative adversarial network:
according to the 11-layer illumination generative adversarial network, deriving its target loss and training the network;
step S4: generating adversarial samples:
using the real classroom head pose data, obtaining illumination-rendered head position images under different illumination conditions, shooting angles and persons; using the trained illumination generative adversarial network model parameters, generating refined rendered-illumination head position images; computing the decision score of each illumination-refined head position image, setting a realistic-image threshold, and selecting the images above the threshold as realistic rendered-illumination head position images;
step S5: constructing a head pose detection model:
taking the realistic rendered-illumination head position images as training data for head pose detection, labelling the training data by class, setting up the head pose detection model, and obtaining its parameters by training;
step S6: detecting head poses in the classroom:
using the head pose detection model trained on the generated adversarial data to perform classroom head pose detection;
step S7: detecting abnormal class listening in the classroom:
inputting the video collected in the classroom in real time, extracting video frames, setting up the listening abnormality detection mechanism with the constructed models and trained parameters, and obtaining the proportions of students in different states.
Each step of the classroom lecture-listening abnormality detection method based on an illumination generative adversarial network is described in detail below.
Acquiring the real classroom head pose data comprises the following steps:
step S1-1: collecting real classroom video;
step S1-2: extracting video frames from the classroom video and performing sliding-window sampling to obtain candidate head position images, where each head position image contains the three RGB color channels;
step S1-3: constructing a head position detection model, which is an 8-layer neural network: the first 6 layers are convolutional layers and the 7th and 8th layers are fully connected layers;
step S1-3-1: the first 6 convolutional layers use the same parameters: each layer has 256 filters of size 3 x 3, the pooling method is sum pooling, i.e., the sum of the response values over the 256 channels is kept as the final output response, and the activation function is the ReLU function;
step S1-3-2: the 7th layer is a fully connected layer mapping 256 feature neurons to 4096 feature neurons, with a 256 x 4096 fully connected mapping matrix;
step S1-3-3: the 8th layer is a fully connected layer mapping the 4096 feature neurons to the output neuron, with a 4096 x 2 fully connected mapping matrix; the final output label is 1 for a head position image and 0 for a non-head image;
step S1-4: labelling the candidate head position images obtained in step S1-2 to obtain head and non-head training data, and constructing the head position detection training set;
step S1-5: training the head position detection model built in step S1-3 with the training set from step S1-4 to obtain the trained head position detection model parameters w_headpos;
step S1-6: applying the trained parameters w_headpos to the candidate head position images from step S1-2 to judge heads against non-heads, thereby extracting the real head position images real_headpos of the test video.
Rendering the illuminated classroom head pose data comprises the following steps:
step S2-1: designing a 3D model of classroom students;
step S2-2: setting the head poses of students attending class in the classroom student 3D model;
step S2-3: setting the illumination conditions in the classroom student 3D model;
step S2-4: setting the camera shooting angle in the classroom student 3D model;
step S2-5: rendering multiple shots under the conditions set in steps S2-1 to S2-4 to obtain the set of classroom images under rendered illumination;
step S2-6: for the classroom images under rendered illumination, applying the head position detection model parameters w_headpos trained in step S1-5 to obtain the illumination-rendered head position images render_headpos.
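The multi-pass rendering of steps S2-1 to S2-5 amounts to enumerating combinations of head pose, illumination condition and camera angle and rendering one classroom image per combination. The sketch below only illustrates that enumeration; the render_classroom function and the concrete parameter values are hypothetical placeholders, since the text does not name a rendering engine or a parameter grid.

```python
from itertools import product

# Hypothetical parameter grids; the patent only states that head pose,
# illumination and camera angle are varied in the classroom student 3D model.
head_poses = ["facing_blackboard", "head_down", "head_turned_left", "head_turned_right"]
illuminations = ["morning_window_light", "overhead_fluorescent", "dim_evening"]
camera_angles = ["front_elevated", "rear_corner"]

def render_classroom(pose, light, camera):
    """Placeholder for the 3D renderer: returns a description of the rendered frame."""
    return {"pose": pose, "illumination": light, "camera": camera}

# Step S2-5: render once per combination to build the rendered-illumination image set.
rendered_set = [render_classroom(p, l, c)
                for p, l, c in product(head_poses, illuminations, camera_angles)]
print(len(rendered_set), "rendered classroom images")   # 4 * 3 * 2 = 24
```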
Constructing the illumination generative adversarial network comprises the following steps:
step S3-1: setting up the illumination generative adversarial network, in which the first 4 layers form the illumination refinement network (the generator) and layers 5 to 11 form the illumination decision network (the discriminator);
step S3-1-1: setting up the illumination refinement network as a 4-layer convolutional neural network;
step S3-1-1-1: in the refinement network, every convolutional layer uses the same parameters: 256 filters of size 3 x 3 per layer, max pooling, i.e., the maximum response value over the 256 channels is kept as the final output response, and the ReLU activation function;
step S3-1-1-2: denoting all parameters of the illumination refinement network as w_ref;
step S3-1-1-3: passing the input image through the illumination refinement network to obtain a refined image whose resolution is the same as that of the original image;
step S3-1-2: setting up the illumination decision network as a 7-layer neural network: the first 5 layers are convolutional layers and the 6th and 7th layers are fully connected layers;
step S3-1-2-1: in the decision network, the first 5 layers use the same parameters: 64 filters of size 3 x 3 per layer, sum pooling, i.e., the sum of the response values over the 64 channels is kept as the final output response, and the ReLU activation function;
step S3-1-2-2: the 6th layer is a fully connected layer mapping 256 feature neurons to 4096 feature neurons, with a 256 x 4096 fully connected mapping matrix;
step S3-1-2-3: the 7th layer is a fully connected layer mapping the 4096 feature neurons to the output neuron, with a 4096 x 2 fully connected mapping matrix; the final output label y_real is 1 for a real head position image and 0 for an illumination-refined head position image;
step S3-1-2-4: denoting all parameters of the illumination decision network as w_judge;
step S3-1-2-5: an illumination-rendered image fed into the illumination decision network should be given a decision score close to 0, while a real head position image fed into the decision network should be given a decision score close to 1 (a schematic sketch of the two sub-networks follows below);
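A minimal sketch of the 11-layer illumination generative adversarial network of step S3-1, under assumptions analogous to the earlier sketch: the refinement network keeps the input resolution (a final 1 x 1 convolution back to RGB is added here), pooling is interpreted as spatial aggregation, and the decision head uses the 64 conv channels as its input width even though the text states a 256 x 4096 mapping.

```python
import torch
import torch.nn as nn

class IlluminationRefiner(nn.Module):
    """Layers 1-4: 3x3 conv, 256 filters, ReLU; the output keeps the input
    resolution (the 1x1 conv back to RGB is an added assumption)."""
    def __init__(self):
        super().__init__()
        layers, in_ch = [], 3
        for _ in range(4):
            layers += [nn.Conv2d(in_ch, 256, 3, padding=1), nn.ReLU(inplace=True)]
            in_ch = 256
        layers += [nn.Conv2d(256, 3, 1)]        # back to an RGB image (assumption)
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)                      # refined image, same H x W as input

class IlluminationJudge(nn.Module):
    """Layers 5-11: five 3x3 conv layers with 64 filters, then fully connected layers;
    the output is a realism score in (0, 1): ~1 for real, ~0 for rendered images."""
    def __init__(self):
        super().__init__()
        layers, in_ch = [], 3
        for _ in range(5):
            layers += [nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True)]
            in_ch = 64
        self.features = nn.Sequential(*layers)
        # The patent states a 256 x 4096 mapping; 64 is used here to match the conv width.
        self.fc = nn.Sequential(nn.Linear(64, 4096), nn.ReLU(inplace=True),
                                nn.Linear(4096, 1), nn.Sigmoid())

    def forward(self, x):
        x = self.features(x).sum(dim=(2, 3))    # sum pooling over spatial positions
        return self.fc(x).squeeze(1)            # decision score s_judge per image
```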
step S3-2: deriving the target loss of the illumination generative adversarial network;
step S3-2-1: computing the refinement loss of the illumination-rendered image;
step S3-2-1-1: feeding the illumination-rendered head position image render_headpos into the illumination refinement network of step S3-1-1 to obtain the illumination-refined head position image refine_headpos;
step S3-2-1-2: the refinement loss of the illumination-rendered head position image is the distance between the two images refine_headpos and render_headpos measured with the 1-norm, i.e.
d_ref = ||render_headpos - refine_headpos||_1
step S3-2-2: computing the decision loss of an image;
step S3-2-2-1: constructing a set of head position images img_headpos containing the real head position images real_headpos and the illumination-refined head position images refine_headpos;
step S3-2-2-2: according to the image type, assigning each head position image in img_headpos a label y_real, where y_real = 1 denotes a real head position image and y_real = 0 denotes an illumination-refined head position image;
step S3-2-2-3: feeding the head position image img_headpos into the illumination decision network of step S3-1-2 to obtain the decision score s_judge;
step S3-2-2-4: computing the decision loss d_judge of the image from the decision score s_judge and the image label y_real (the loss formula is reproduced as an image in the original publication);
step S3-2-3: the total loss of the illumination-rendered head position image consists of the refinement loss and the decision loss (a schematic sketch follows below), i.e.
loss = d_ref + d_judge
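The target loss of steps S3-2-1 to S3-2-3 combines the 1-norm refinement distance d_ref with the decision loss d_judge. Since the decision-loss formula is published only as an image, the binary cross-entropy used in the sketch below is an assumption consistent with the described behaviour (scores near 1 for real images and near 0 for refined ones).

```python
import torch
import torch.nn.functional as F

def refinement_loss(render_headpos, refine_headpos):
    # d_ref = || render_headpos - refine_headpos ||_1 : 1-norm distance between the two images
    return (render_headpos - refine_headpos).abs().sum()

def decision_loss(s_judge, y_real):
    # Assumed form: binary cross-entropy between the decision score s_judge and the
    # label y_real (1 = real head position image, 0 = illumination-refined image).
    return F.binary_cross_entropy(s_judge, y_real)

def total_loss(render_headpos, refine_headpos, s_judge, y_real):
    # loss = d_ref + d_judge  (step S3-2-3)
    return refinement_loss(render_headpos, refine_headpos) + decision_loss(s_judge, y_real)
```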
Step S3-3: training illumination to generate an antagonistic network;
step S3-3-1: training illumination to generate an optimized network;
step S3-3-1-1: input illumination rendering head position image renderheadpos
Step S3-3-1-2: computing illumination optimized head position image refineheadpos
Step S3-3-1-3: calculating a decision score s for an illumination-optimized head position imagejudge
Step S3-3-1-4: according to the step S3-2-3, calculating the total loss of the illumination rendering head position image;
step S3-3-1-5: adjusting model parameters for illumination generation optimization, and determining updated illumination generation optimization model parameters according to total loss and gradient descent method of illumination rendering head position image
Figure BDA0001743546990000062
Wherein t represents the t-th update of the model parameters;
step S3-3-2: training an illumination generation decision network;
step S3-3-2-1: repeating step S3-2-2-4 to calculate the decision loss d of all images in the image setjudge
Step S3-3-2-2: adjusting model parameters of illumination generation judgment, and determining updated illumination generation judgment model parameters according to gradient descent method
Figure BDA0001743546990000063
Wherein t represents the t-th update of the model parameters;
step S3-3-3: alternately repeating the steps S3-3-1 and S3-3-2, and iteratively optimizing the parameters of the illumination generation optimization model
Figure BDA0001743546990000071
And illumination generation decision model parameters
Figure BDA0001743546990000072
Until the model loss convergence no longer changes;
step S3-3-4: generating optimized model parameters by the converged illumination
Figure BDA0001743546990000073
And illumination generation decision model parameters
Figure BDA0001743546990000074
Recording as trained light generation confrontation network model parameter wadv={wref,wjudge}。
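The alternating optimization of steps S3-3-1 to S3-3-3 can be sketched as the loop below. It is again non-authoritative: the SGD optimizers, learning rate, batching, and the standard GAN convention of labelling refined images as real when updating the refiner are assumptions; rendered_loader and real_loader are hypothetical loaders yielding batches of image tensors from steps S2 and S1; refiner and judge are modules with the interfaces of the earlier sketch.

```python
import itertools
import torch
import torch.nn.functional as F

def train_illumination_gan(refiner, judge, rendered_loader, real_loader, steps=1000, lr=1e-4):
    """Alternating updates of the refinement network (w_ref) and decision network (w_judge)."""
    opt_ref = torch.optim.SGD(refiner.parameters(), lr=lr)
    opt_judge = torch.optim.SGD(judge.parameters(), lr=lr)
    pairs = zip(itertools.cycle(rendered_loader), itertools.cycle(real_loader))
    for t, (render_headpos, real_headpos) in enumerate(pairs):
        if t >= steps:
            break
        # Step S3-3-1: update w_ref from the total loss (d_ref + d_judge) by gradient descent.
        refine_headpos = refiner(render_headpos)
        d_ref = (render_headpos - refine_headpos).abs().mean()     # averaged 1-norm (choice)
        d_judge = F.binary_cross_entropy(judge(refine_headpos),
                                         torch.ones(len(render_headpos)))  # push towards "real"
        opt_ref.zero_grad(); (d_ref + d_judge).backward(); opt_ref.step()
        # Step S3-3-2: update w_judge from the decision loss over real and refined images.
        imgs = torch.cat([real_headpos, refiner(render_headpos).detach()])
        y_real = torch.cat([torch.ones(len(real_headpos)), torch.zeros(len(render_headpos))])
        loss_judge = F.binary_cross_entropy(judge(imgs), y_real)
        opt_judge.zero_grad(); loss_judge.backward(); opt_judge.step()
    return {"w_ref": refiner.state_dict(), "w_judge": judge.state_dict()}   # w_adv
```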
Generating the adversarial samples comprises the following steps:
step S4-1: using step S2, obtaining illumination-rendered head position images render_headpos under different illumination conditions, different shooting angles and different persons;
step S4-2: using the illumination generative adversarial network model parameters trained in step S3-3-4, generating the refined rendered-illumination head position images refine_headpos;
step S4-3: using the illumination decision model, computing the decision score s_judge of each illumination-refined head position image;
step S4-4: setting a realistic-image threshold and keeping the images whose decision score s_judge is greater than 0.5 as realistic rendered-illumination head position images.
Constructing the head pose detection model comprises the following steps:
step S5-1: using the realistic rendered-illumination head position images together with the real head position images obtained in step S1-6 as training data for head pose detection;
step S5-2: labelling the head pose training data with the class label y_listen; the data contain listening and not-listening samples, where y_listen = 1 denotes attentive listening and y_listen = 0 denotes not listening;
step S5-3: setting up the head pose detection model as a 7-layer neural network: the first 5 layers are convolutional layers and the 6th and 7th layers are fully connected layers;
step S5-3-1: the first 5 convolutional layers use the same parameters: 64 filters of size 3 x 3 per layer, sum pooling, i.e., the sum of the response values over the 64 channels is kept as the final output response, and the ReLU activation function;
step S5-3-2: the 6th layer is a fully connected layer mapping 256 feature neurons to 4096 feature neurons, with a 256 x 4096 fully connected mapping matrix;
step S5-3-3: the 7th layer is a fully connected layer mapping the 4096 feature neurons to the output neuron, with a 4096 x 2 fully connected mapping matrix; the final output label y_listen is 1 for attentive listening and 0 for not listening;
step S5-4: training the neural network model built in step S5-3 with the training set built in steps S5-1 and S5-2 to obtain the trained head pose detection model parameters w_listen.
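Training the head pose (listening / not-listening) classifier in step S5-4 is ordinary supervised learning on the mixed set of realistic refined images and real head crops. The sketch below assumes a cross-entropy objective, an SGD optimizer and a data loader yielding (head_crops, y_listen) batches, none of which are specified by the text above.

```python
import torch
import torch.nn as nn

def train_head_pose_model(model, loader, epochs=10, lr=1e-3):
    """Supervised training for step S5-4: y_listen = 1 (listening) / 0 (not listening).
    The optimizer, learning rate and cross-entropy objective are assumptions."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for head_crops, y_listen in loader:     # realistic refined + real head images
            logits = model(head_crops)           # 2-way output: {not listening, listening}
            loss = ce(logits, y_listen)          # y_listen as integer class labels
            opt.zero_grad(); loss.backward(); opt.step()
    return model.state_dict()                    # w_listen
```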
The classroom head pose detection comprises the following steps:
step S6-1: inputting the video collected in the classroom in real time and extracting video frames;
step S6-2: extracting the real head position images real_headpos with the trained head position detection model parameters w_headpos from step S1-6;
step S6-3: computing each student's head pose score with the head pose detection model parameters w_listen trained in step S5-4;
step S6-4: judging from the head pose score whether the student is listening, where y_listen = 1 denotes attentive listening and y_listen = 0 denotes not listening;
step S6-5: traversing all students in the video frame, judging everyone's listening state, and computing the proportion of students who are not listening;
step S6-6: setting a state threshold on the not-listening proportion: if the proportion of students not listening is greater than or equal to 5%, outputting the class not-listening state; if it is less than 5%, outputting the normal listening state.
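Per video frame, step S6 reduces to counting how many detected heads are classified as not listening and comparing the proportion against the 5% threshold; a minimal sketch, assuming a list of per-student y_listen predictions (1 = listening, 0 = not listening):

```python
def frame_listening_state(y_listen_per_student, threshold=0.05):
    """Step S6-5 / S6-6: return 'not_listening' if the proportion of students
    predicted as not listening (y_listen == 0) reaches the 5% threshold."""
    if not y_listen_per_student:
        return "normal"                                   # no detected heads in this frame
    not_listening = sum(1 for y in y_listen_per_student if y == 0)
    ratio = not_listening / len(y_listen_per_student)
    return "not_listening" if ratio >= threshold else "normal"

# e.g. 2 of 30 students not listening -> 6.7% >= 5% -> frame flagged as not listening
print(frame_listening_state([1] * 28 + [0] * 2))
```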
The class-listening abnormality detection comprises the following steps:
step S7-1: inputting the real-time surveillance video and reading video frames;
step S7-2: judging whether the video has ended; if the surveillance video has ended, terminating the real-time judgment of the abnormal listening state;
step S7-3: if the surveillance video is still running, detecting the not-listening state of consecutive video frames using step S6-6 and extracting the not-listening state of each frame;
step S7-4: if no not-listening state appears, clearing the not-listening state, clearing its start time, and clearing the abnormal listening state;
step S7-5: if a not-listening state appears, performing the classroom listening abnormality judgment;
step S7-5-1: if the not-listening state appears for the first time, initializing its start time to the current time and initializing the not-listening duration to 1 frame;
step S7-5-2: if the not-listening state does not appear for the first time, updating its duration by increasing it by 1 frame;
step S7-5-3: if the duration of the not-listening state reaches 50 frames, outputting the classroom abnormal listening state;
step S7-6: repeating steps S7-1 to S7-5 to analyze classroom listening data in real time, providing detection of the abnormal classroom listening state with real-time judgment.
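The temporal rule of step S7 is a small state machine over consecutive frames: the not-listening duration is cleared whenever a frame is normal, incremented otherwise, and an abnormality is reported once it reaches 50 frames. A minimal sketch of that logic:

```python
def detect_listening_abnormality(frame_states, duration_threshold=50):
    """Step S7: frame_states is an iterable of per-frame states ('normal' / 'not_listening').
    Yields the frame indices at which the classroom abnormal-listening state is reported."""
    duration = 0                                   # consecutive not-listening frames
    for i, state in enumerate(frame_states):
        if state == "not_listening":
            duration += 1                          # steps S7-5-1 / S7-5-2
            if duration == duration_threshold:     # step S7-5-3: sustained for 50 frames
                yield i
        else:
            duration = 0                           # step S7-4: clear the not-listening state

# e.g. 60 consecutive not-listening frames trigger one report at frame index 49
print(list(detect_listening_abnormality(["not_listening"] * 60)))
```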
Compared with the prior art, the invention has the following advantages:
(1) Against the problems of a complex background environment and the interference of a large amount of invalid information from non-head regions, the invention uses a deep neural network to improve the accuracy of head-region localization and to reduce the interference of non-head regions on the judgment of the not-listening state.
(2) Head pose data change with factors such as the head pose itself, the illumination conditions and the camera shooting angle, which affects the accuracy of head pose detection. Through the 3D model and the ambient-illumination rendering method, the invention repeatedly generates classroom image sets under rendered illumination, providing more data samples for the head pose detection model and facilitating adequate training of the model.
(3) Through the illumination generative adversarial network, the repeatedly generated rendered-illumination samples are refined into more realistic adversarial samples, which avoids the inconsistency between the rendered-illumination samples and the characteristics of real data and improves the effectiveness of model training. On the basis of the generative adversarial network, the invention achieves effective head pose detection, which benefits the judgment of the listening state and improves the accuracy of judging sustained abnormal listening states.
Drawings
The invention is further described below with reference to the accompanying drawings:
fig. 1 is a flowchart of a classroom attendance anomaly detection method for generating an countermeasure network based on illumination.
Fig. 2(a) proposes a model for the head position.
Fig. 2(b) is the collected real classroom head pose data.
Fig. 3 is a classroom student 3D model and lighting rendering environment.
Fig. 4(a) is a model of an illumination-generated countermeasure network.
Fig. 4(b) is a lighting optimized head position image.
Fig. 5 is a head pose detection model.
Fig. 6 shows a model for detecting a non-attending state.
Fig. 7 is a model for detecting abnormal listening status in a classroom.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The invention is a classroom lecture-listening abnormality detection method based on an illumination generative adversarial network; the overall flow is shown in Fig. 1, and the implementation comprises the following steps:
step S1: acquiring the real classroom head pose data, the specific operations being:
step S1-1: collecting real classroom video;
step S1-2: extracting video frames from the classroom video and performing sliding-window sampling to obtain candidate head position images, where each head position image contains the three RGB color channels;
step S1-3: constructing a head position detection model, which is an 8-layer neural network: the first 6 layers are convolutional layers and the 7th and 8th layers are fully connected layers, as shown in Fig. 2(a);
step S1-3-1: the first 6 convolutional layers use the same parameters: each layer has 256 filters of size 3 x 3, the pooling method is sum pooling, i.e., the sum of the response values over the 256 channels is kept as the final output response, and the activation function is the ReLU function;
step S1-3-2: the 7th layer is a fully connected layer mapping 256 feature neurons to 4096 feature neurons, with a 256 x 4096 fully connected mapping matrix;
step S1-3-3: the 8th layer is a fully connected layer mapping the 4096 feature neurons to the output neuron, with a 4096 x 2 fully connected mapping matrix; the final output label is 1 for a head position image and 0 for a non-head image;
step S1-4: labelling the candidate head position images obtained in step S1-2 to obtain head and non-head training data, and constructing the head position detection training set;
step S1-5: training the head position detection model built in step S1-3 with the training set from step S1-4 to obtain the trained head position detection model parameters w_headpos;
step S1-6: applying the trained parameters w_headpos to the candidate head position images from step S1-2 to judge heads against non-heads, thereby extracting the real head position images real_headpos of the test video, as shown in Fig. 2(b);
step S2: generating the rendered-illumination classroom head pose data, the specific operations being:
step S2-1: designing a 3D model of classroom students;
step S2-2: setting the head poses of students attending class in the classroom student 3D model;
step S2-3: setting the illumination conditions in the classroom student 3D model;
step S2-4: setting the camera shooting angle in the classroom student 3D model;
step S2-5: rendering multiple shots under the conditions set in steps S2-1 to S2-4, an example of which is shown in Fig. 3, to obtain the set of classroom images under rendered illumination;
step S2-6: for the classroom images under rendered illumination, applying the head position detection model parameters w_headpos trained in step S1-5 to obtain the illumination-rendered head position images render_headpos;
step S3: constructing the illumination generative adversarial network for listening-pose detection, the specific operations being:
step S3-1: setting up the illumination generative adversarial network, in which the first 4 layers form the illumination refinement network (the generator) and layers 5 to 11 form the illumination decision network (the discriminator), as shown in Fig. 4(a);
step S3-1-1: setting up the illumination refinement network as a 4-layer convolutional neural network;
step S3-1-1-1: in the refinement network, every convolutional layer uses the same parameters: 256 filters of size 3 x 3 per layer, max pooling, i.e., the maximum response value over the 256 channels is kept as the final output response, and the ReLU activation function;
step S3-1-1-2: denoting all parameters of the illumination refinement network as w_ref;
step S3-1-1-3: passing the input image through the illumination refinement network to obtain a refined image whose resolution is the same as that of the original image;
step S3-1-2: setting up the illumination decision network as a 7-layer neural network: the first 5 layers are convolutional layers and the 6th and 7th layers are fully connected layers;
step S3-1-2-1: in the decision network, the first 5 layers use the same parameters: 64 filters of size 3 x 3 per layer, sum pooling, i.e., the sum of the response values over the 64 channels is kept as the final output response, and the ReLU activation function;
step S3-1-2-2: the 6th layer is a fully connected layer mapping 256 feature neurons to 4096 feature neurons, with a 256 x 4096 fully connected mapping matrix;
step S3-1-2-3: the 7th layer is a fully connected layer mapping the 4096 feature neurons to the output neuron, with a 4096 x 2 fully connected mapping matrix; the final output label y_real is 1 for a real head position image and 0 for an illumination-refined head position image;
step S3-1-2-4: denoting all parameters of the illumination decision network as w_judge;
step S3-1-2-5: an illumination-rendered image fed into the illumination decision network should be given a decision score close to 0, while a real head position image fed into the decision network should be given a decision score close to 1;
step S3-2: deriving the target loss of the illumination generative adversarial network;
step S3-2-1: computing the refinement loss of the illumination-rendered image;
step S3-2-1-1: feeding the illumination-rendered head position image render_headpos into the illumination refinement network of step S3-1-1 to obtain the illumination-refined head position image refine_headpos;
step S3-2-1-2: the refinement loss of the illumination-rendered head position image is the distance between the two images refine_headpos and render_headpos measured with the 1-norm, i.e.
d_ref = ||render_headpos - refine_headpos||_1
step S3-2-2: computing the decision loss of an image;
step S3-2-2-1: constructing a set of head position images img_headpos containing the real head position images real_headpos and the illumination-refined head position images refine_headpos;
step S3-2-2-2: according to the image type, assigning each head position image in img_headpos a label y_real, where y_real = 1 denotes a real head position image and y_real = 0 denotes an illumination-refined head position image;
step S3-2-2-3: feeding the head position image img_headpos into the illumination decision network of step S3-1-2 to obtain the decision score s_judge;
step S3-2-2-4: computing the decision loss d_judge of the image from the decision score s_judge and the image label y_real (the loss formula is reproduced as an image in the original publication);
step S3-2-3: the total loss of the illumination-rendered head position image consists of the refinement loss and the decision loss, i.e.
loss = d_ref + d_judge
step S3-3: training the illumination generative adversarial network;
step S3-3-1: training the illumination refinement network;
step S3-3-1-1: inputting the illumination-rendered head position image render_headpos;
step S3-3-1-2: computing the illumination-refined head position image refine_headpos;
step S3-3-1-3: computing the decision score s_judge of the illumination-refined head position image;
step S3-3-1-4: computing the total loss of the illumination-rendered head position image according to step S3-2-3;
step S3-3-1-5: adjusting the parameters of the illumination refinement network: from the total loss of the illumination-rendered head position image and gradient descent, determining the updated refinement network parameters w_ref^(t+1), where t denotes the t-th update of the model parameters;
step S3-3-2: training the illumination decision network;
step S3-3-2-1: repeating step S3-2-2-4 to compute the decision loss d_judge of all images in the image set;
step S3-3-2-2: adjusting the parameters of the illumination decision network: by gradient descent, determining the updated decision network parameters w_judge^(t+1), where t denotes the t-th update of the model parameters;
step S3-3-3: alternately repeating steps S3-3-1 and S3-3-2, iteratively optimizing the refinement network parameters w_ref and the decision network parameters w_judge until the model loss converges and no longer changes;
step S3-3-4: recording the converged refinement network parameters w_ref and decision network parameters w_judge as the trained illumination generative adversarial network model parameters w_adv = {w_ref, w_judge};
step S4: refining the illumination classroom head pose data and generating the adversarial samples, the specific operations being:
step S4-1: using step S2, obtaining illumination-rendered head position images render_headpos under different illumination conditions, different shooting angles and different persons;
step S4-2: using the illumination generative adversarial network model parameters trained in step S3-3-4, generating the refined rendered-illumination head position images refine_headpos;
step S4-3: using the illumination decision model, computing the decision score s_judge of each illumination-refined head position image;
step S4-4: setting a realistic-image threshold and keeping the images whose decision score s_judge is greater than 0.5 as realistic rendered-illumination head position images, as shown in Fig. 4(b);
step S5: constructing the head pose detection model with the generated adversarial data, as shown in Fig. 5, the specific operations being:
step S5-1: using the realistic rendered-illumination head position images together with the real head position images obtained in step S1-6 as training data for head pose detection;
step S5-2: labelling the head pose training data with the class label y_listen; the data contain listening and not-listening samples, where y_listen = 1 denotes attentive listening and y_listen = 0 denotes not listening;
step S5-3: setting up the head pose detection model as a 7-layer neural network: the first 5 layers are convolutional layers and the 6th and 7th layers are fully connected layers;
step S5-3-1: the first 5 convolutional layers use the same parameters: 64 filters of size 3 x 3 per layer, sum pooling, i.e., the sum of the response values over the 64 channels is kept as the final output response, and the ReLU activation function;
step S5-3-2: the 6th layer is a fully connected layer mapping 256 feature neurons to 4096 feature neurons, with a 256 x 4096 fully connected mapping matrix;
step S5-3-3: the 7th layer is a fully connected layer mapping the 4096 feature neurons to the output neuron, with a 4096 x 2 fully connected mapping matrix; the final output label y_listen is 1 for attentive listening and 0 for not listening;
step S5-4: training the neural network model built in step S5-3 with the training set built in steps S5-1 and S5-2 to obtain the trained head pose detection model parameters w_listen;
step S6: using the head pose detection model trained on the generated adversarial data to realize classroom head pose detection and not-listening state detection, as shown in Fig. 6, the specific operations being:
step S6-1: inputting the video collected in the classroom in real time and extracting video frames;
step S6-2: extracting the real head position images real_headpos with the trained head position detection model parameters w_headpos from step S1-6;
step S6-3: computing each student's head pose score with the head pose detection model parameters w_listen trained in step S5-4;
step S6-4: judging from the head pose score whether the student is listening, where y_listen = 1 denotes attentive listening and y_listen = 0 denotes not listening;
step S6-5: traversing all students in the video frame, judging everyone's listening state, and computing the proportion of students who are not listening;
step S6-6: setting a state threshold on the not-listening proportion: if the proportion of students not listening is greater than or equal to 5%, outputting the class not-listening state; if it is less than 5%, outputting the normal listening state;
step S7: using the head pose detection model trained on the generated adversarial data to realize classroom listening abnormality detection, as shown in Fig. 7, the specific operations being:
step S7-1: inputting the real-time surveillance video and reading video frames;
step S7-2: judging whether the video has ended; if the surveillance video has ended, terminating the real-time judgment of the abnormal listening state;
step S7-3: if the surveillance video is still running, detecting the not-listening state of consecutive video frames using step S6-6 and extracting the not-listening state of each frame;
step S7-4: if no not-listening state appears, clearing the not-listening state, clearing its start time, and clearing the abnormal listening state;
step S7-5: if a not-listening state appears, performing the classroom listening abnormality judgment;
step S7-5-1: if the not-listening state appears for the first time, initializing its start time to the current time and initializing the not-listening duration to 1 frame;
step S7-5-2: if the not-listening state does not appear for the first time, updating its duration by increasing it by 1 frame;
step S7-5-3: if the duration of the not-listening state reaches 50 frames, outputting the classroom abnormal listening state;
step S7-6: repeating steps S7-1 to S7-5 to analyze classroom listening data in real time, providing detection of the abnormal classroom listening state with real-time judgment.

Claims (1)

1. A classroom lecture-listening abnormality detection method based on an illumination generative adversarial network, characterized by comprising the following steps:
step S1: acquiring real classroom head pose data:
collecting video frames from a real classroom, constructing a head position detection model, labelling candidate head position images, and obtaining a training set and trained parameters;
step S2: rendering illuminated classroom head pose data:
according to a designed 3D model of classroom students, setting the head pose, illumination condition and camera angle parameters in the model, rendering multiple times, and obtaining a set of classroom images under rendered illumination;
step S3: constructing the illumination generative adversarial network:
according to the 11-layer illumination generative adversarial network, deriving its target loss and training the network;
step S4: generating adversarial samples:
using the real classroom head pose data, obtaining illumination-rendered head position images under different illumination conditions, shooting angles and persons; using the trained illumination generative adversarial network model parameters, generating refined rendered-illumination head position images; computing the decision score of each illumination-refined head position image, setting a realistic-image threshold, and selecting the illumination-rendered head position images above the threshold as realistic rendered-illumination head position images;
step S5: constructing a head pose detection model:
taking the realistic rendered-illumination head position images as training data for head pose detection, labelling the training data, setting up the head pose detection model, and obtaining its parameters by training;
step S6: detecting head poses in the classroom:
using the head pose detection model trained on the generated adversarial data to perform classroom head pose detection;
step S7: detecting abnormal class listening in the classroom:
inputting the video collected in the classroom in real time, extracting video frames, setting up the listening abnormality detection mechanism with the constructed models and trained parameters, and obtaining the proportions of students in different states;
the method for acquiring the real classroom head pose data specifically comprises the following steps:
step S1-1: collecting real classroom video;
step S1-2: extracting video frames from the classroom video and performing sliding-window sampling to obtain candidate head position images, where each head position image contains the three RGB color channels;
step S1-3: constructing a head position detection model, which is an 8-layer neural network: the first 6 layers are convolutional layers and the 7th and 8th layers are fully connected layers;
step S1-3-1: the first 6 convolutional layers use the same parameters: each layer has 256 filters of size 3 x 3, the pooling method is sum pooling, i.e., the sum of the response values over the 256 channels is kept as the final output response, and the activation function is the ReLU function;
step S1-3-2: the 7th layer is a fully connected layer mapping 256 feature neurons to 4096 feature neurons, with a 256 x 4096 fully connected mapping matrix;
step S1-3-3: the 8th layer is a fully connected layer mapping the 4096 feature neurons to the output neuron, with a 4096 x 2 fully connected mapping matrix; the final output label is 1 for a head position image and 0 for a non-head image;
step S1-4: labelling the candidate head position images obtained in step S1-2 to obtain head and non-head training data, and constructing the head position detection training set;
step S1-5: training the head position detection model built in step S1-3 with the training set from step S1-4 to obtain the trained head position detection model parameters w_headpos;
step S1-6: applying the trained parameters w_headpos to the candidate head position images from step S1-2 to judge heads against non-heads, thereby extracting the real head position images real_headpos of the test video;
the method for rendering the illuminated classroom head pose data specifically comprises the following steps:
step S2-1: designing a 3D model of classroom students;
step S2-2: setting the head poses of students attending class in the classroom student 3D model;
step S2-3: setting the illumination conditions in the classroom student 3D model;
step S2-4: setting the camera shooting angle in the classroom student 3D model;
step S2-5: rendering multiple shots under the conditions set in steps S2-1 to S2-4 to obtain the set of classroom images under rendered illumination;
step S2-6: for the classroom images under rendered illumination, applying the head position detection model parameters w_headpos trained in step S1-5 to obtain the illumination-rendered head position images render_headpos;
The method for constructing the illumination generation countermeasure network specifically comprises the following steps:
step S3-1: setting an illumination generation countermeasure network, wherein the first 4 layers are illumination generation optimization networks, and the 5 th to 11 th layers are illumination generation judgment networks;
step S3-1-1: setting an illumination generation optimization network, and using a 4-layer convolutional neural network;
step S3-1-1-1: in the optimization network, the convolutional neural networks of each layer use the same parameters, the size of the filter of each layer is 3 x 3, the number of the filters is 256, the pooling method is maximum pooling, namely the maximum response value in 256 channels is reserved as the final output response, and the form of the excitation function is a relu function;
step S3-1-1-2: all parameters in the illumination generation optimization network are denoted as wref
Step S3-1-1-3: generating an optimized network by the input image through illumination, and obtaining an optimized image, wherein the resolution of the optimized image is the same as that of the original image;
step S3-1-2: setting an illumination generation judgment network, and using a neural network with 7 layers, wherein the first 5 layers are convolutional neural networks, and the 6 th layer and the 7 th layer are full-connection neural networks;
step S3-1-2-1: in the decision network, the same parameters are used in the first 5 layers, the size of the filter of each layer is 3 x 3, the number of the filters is 64, the pooling method is summation pooling, namely the summation result of response values in 64 channels is reserved as the final output response, and the form of the excitation function is relu function;
step S3-1-2-2: in the full-connection layer of the 6 th layer, 256 characteristic neurons are mapped into 4096 characteristic neurons, and a full-connection mapping parameter matrix is 256 x 4096;
step S3-1-2-3: and a full-connection layer of the 7 th layer, wherein 4096 characteristic neurons are mapped into single neurons, the full-connection mapping parameter matrix is 4096 x 2, and the type y of the neurons of the final output layerreal,yrealA value of 1 indicates a true head position image, and a value of 0 indicates a lighting-optimized head position image;
step S3-1-2-4: all parameters in the illumination generation decision network are denoted as wjudge
Step S3-1-2-5: rendering an illumination image, inputting the illumination to generate a judgment network, and judging that the score is closer to 0; inputting a real head position image into an illumination generation judgment network, and judging that the score is closer to 1;
step S3-2: solving the target loss of the illumination generation countermeasure network;
step S3-2-1: calculating the optimization loss of the illumination rendering image;
step S3-2-1-1: rendering head position images with lightingheadposInputting step S3-1-1 illumination generation optimization network, obtaining illumination optimization head position image refineheadpos
Step S3-2-1-2: solving the optimization loss of the illumination rendering head position image, namely illumination optimization head position image refineheadposRender head position image render with illuminationheadposUsing a 1 norm to calculate the distance between 2 images, i.e.
dref=||renderheadpos-refineheadpos||1
Step S3-2-2: calculating the judgment loss of the image;
step S3-2-2-1: constructing a set img of head position imagesheadposContaining real head position image realheadposAnd illumination optimized head position image refineheadpos
Step S3-2-2-2: according to the image type, img is set for the head position imageheadposSetting image flag yreal,yrealA value of 1 indicates a true head position image, and a value of 0 indicates a lighting-optimized head position image;
step S3-2-2-3: image img of head positionheadposInputting step S3-1-2 light generation judgment network to obtain judgment score Sjudge
Step S3-2-2-4: solving for a decision loss for an image based on the decision score and the image label
Figure FDA0003074870730000041
Step S3-2-3: solving the total loss of the illumination rendering head position image, wherein the total loss comprises 2 parts of optimization loss and judgment loss, and the total loss of the illumination rendering head position image is
loss=dref+djudge
Step S3-3: training illumination to generate an antagonistic network;
step S3-3-1: training illumination to generate an optimized network;
step S3-3-1-1: input illumination rendering head position image renderheadpos
Step S3-3-1-2: computing illumination optimized head position image refineheadpos
Step S3-3-1-3: calculating a decision score s for an illumination-optimized head position imagejudge
Step S3-3-1-4: according to the step S3-2-3, calculating the total loss of the illumination rendering head position image;
step S3-3-1-5: adjusting model for illumination generation optimizationParameters, and determining updated parameters of the illumination generation optimization model according to the total loss and gradient reduction method of the illumination rendering head position image
Figure FDA0003074870730000042
Wherein t represents the t-th update of the model parameters;
step S3-3-2: training an illumination generation decision network;
step S3-3-2-1: repeating step S3-2-2-4 to calculate the decision loss d of all images in the image setjudge
Step S3-3-2-2: adjusting model parameters of illumination generation judgment, and determining updated illumination generation judgment model parameters according to gradient descent method
Figure FDA0003074870730000051
Wherein t represents the t-th update of the model parameters;
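Both parameter-update formulas appear in the source only as embedded images. A standard gradient-descent form consistent with the surrounding text, with an assumed learning rate \eta, would be:

w_{ref}^{(t+1)} = w_{ref}^{(t)} - \eta\,\frac{\partial\, loss}{\partial\, w_{ref}^{(t)}},
\qquad
w_{judge}^{(t+1)} = w_{judge}^{(t)} - \eta\,\frac{\partial\, d_{judge}}{\partial\, w_{judge}^{(t)}}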
step S3-3-3: alternately repeating steps S3-3-1 and S3-3-2, iteratively optimizing the illumination generation optimization model parameters w_ref and the illumination generation decision model parameters w_judge until the model loss converges and no longer changes;
step S3-3-4: recording the converged illumination generation optimization model parameters w_ref and illumination generation decision model parameters w_judge as the trained illumination generation confrontation network model parameters w_adv = {w_ref, w_judge} (a sketch of this alternating training follows below);
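A minimal sketch of the alternating training of steps S3-3-1 to S3-3-3 in PyTorch. The optimizer, learning rate, epoch count, data loader and network objects are assumptions introduced only for illustration; what follows the text is the alternation between the two networks and the loss terms used for each.

import torch
import torch.nn.functional as F

def train_confrontation_network(refine_net, decision_net, loader, epochs=10, lr=1e-4):
    opt_ref = torch.optim.SGD(refine_net.parameters(), lr=lr)
    opt_judge = torch.optim.SGD(decision_net.parameters(), lr=lr)
    for _ in range(epochs):
        for render_headpos, real_headpos in loader:
            # S3-3-1: update the illumination generation optimization network
            refine_headpos = refine_net(render_headpos)
            d_ref = (render_headpos - refine_headpos).abs().sum(dim=(1, 2, 3)).mean()
            ones = torch.ones(len(refine_headpos), dtype=torch.long)
            loss = d_ref + F.cross_entropy(decision_net(refine_headpos), ones)
            opt_ref.zero_grad(); loss.backward(); opt_ref.step()
            # S3-3-2: update the illumination generation decision network on real vs. optimized images
            refine_headpos = refine_net(render_headpos).detach()
            imgs = torch.cat([real_headpos, refine_headpos])
            labels = torch.cat([torch.ones(len(real_headpos), dtype=torch.long),
                                torch.zeros(len(refine_headpos), dtype=torch.long)])
            d_judge = F.cross_entropy(decision_net(imgs), labels)
            opt_judge.zero_grad(); d_judge.backward(); opt_judge.step()
    # S3-3-4: the converged parameters together form w_adv = {w_ref, w_judge}
    return {"w_ref": refine_net.state_dict(), "w_judge": decision_net.state_dict()}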
The method for generating the confrontation sample specifically comprises the following steps:
step S4-1: using step S2 to obtain illumination-rendered head position images render_headpos under different illumination conditions, from different shooting angles and for different people;
step S4-2: generating the optimized rendered illumination head position images refine_headpos using the illumination generation confrontation network model parameters trained in step S3-3-4;
step S4-3: calculating the decision score s_judge of each illumination-optimized head position image using the illumination generation decision model;
step S4-4: setting a realism threshold and keeping the images whose decision score s_judge is greater than 0.5 as realistic rendered illumination head position images (a code sketch of steps S4-1 to S4-4 follows below);
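A minimal sketch of steps S4-1 to S4-4: optimize the rendered head position images and keep only those judged realistic (s_judge > 0.5) as confrontation samples. All names are illustrative, and the rendering step S2 is represented only by its output tensor render_headpos.

import torch

@torch.no_grad()
def generate_confrontation_samples(refine_net, decision_net, render_headpos):
    refine_headpos = refine_net(render_headpos)                         # S4-2: optimize the rendering
    s_judge = torch.softmax(decision_net(refine_headpos), dim=1)[:, 1]  # S4-3: decision score
    return refine_headpos[s_judge > 0.5]                                # S4-4: keep realistic images only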
the method for constructing the head posture detection model specifically comprises the following steps:
step S5-1: using the realistic rendered illumination head position images and the real head position images obtained in step S1-6 as training data for head posture detection;
step S5-2: labeling the head posture detection training data with class labels y_listen, the training data comprising attending-class samples and not-attending-class samples, where y_listen = 1 denotes attending class attentively and y_listen = 0 denotes not attending class;
step S5-3: setting a head posture detection model as a 7-layer neural network, wherein the first 5 layers are convolutional neural networks and the 6th and 7th layers are fully-connected neural networks;
step S5-3-1: in the head posture detection network, the first 5 layers use the same parameters: the filter size of each layer is 3 x 3, the number of filters is 64, the pooling method is summation pooling, i.e. the summed response values over the 64 channels are kept as the final output response, and the excitation function is the relu function;
step S5-3-2: in the fully-connected layer of the 6th layer, 256 feature neurons are mapped to 4096 feature neurons, with a fully-connected mapping parameter matrix of 256 x 4096;
step S5-3-3: in the fully-connected layer of the 7th layer, the 4096 feature neurons are mapped to the output layer, with a fully-connected mapping parameter matrix of 4096 x 2; the final output layer gives the class y_listen, where y_listen = 1 denotes attending class attentively and y_listen = 0 denotes not attending class;
step S5-4: training the neural network model constructed in step S5-3 with the training set constructed in steps S5-1 and S5-2 to obtain the trained head posture detection model parameters w_listen (a training-loop sketch follows below);
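A minimal sketch of step S5-4: train the head posture detector on the mixed real and realistic rendered head position images with labels y_listen. The network object listen_net, optimizer and hyperparameters are assumptions; since the layer sizes described above match the decision network sketched earlier, the same architecture could plausibly be reused with a fresh set of parameters.

import torch
import torch.nn.functional as F

def train_head_posture_detector(listen_net, loader, epochs=10, lr=1e-4):
    opt = torch.optim.SGD(listen_net.parameters(), lr=lr)
    for _ in range(epochs):
        for head_imgs, y_listen in loader:       # y_listen: 1 = attending class, 0 = not attending
            loss = F.cross_entropy(listen_net(head_imgs), y_listen)
            opt.zero_grad(); loss.backward(); opt.step()
    return listen_net.state_dict()               # trained head posture detection parameters w_listen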
The classroom head posture detection specifically comprises the following steps:
step S6-1: inputting a classroom real-time collected video and extracting a video frame;
step S6-2: extracting the real head position images real_headpos using the head position detection model parameters w_headpos trained in step S1-6;
step S6-3: calculating each student's head posture score using the head posture detection model parameters w_listen trained in step S5-4;
step S6-4: judging whether each student is attending class from the head posture score, where y_listen = 1 denotes attending class attentively and y_listen = 0 denotes not attending class;
step S6-5: traversing all students in the video frame, judging the lecture-listening state of each person, and calculating the proportion of students not attending class;
step S6-6: setting a state threshold on the not-attending proportion: if the proportion is greater than or equal to 5%, outputting a not-attending-class state for the frame, and if it is less than 5%, outputting a normal attending state (a code sketch of steps S6-3 to S6-6 follows below);
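A minimal sketch of steps S6-3 to S6-6 for one video frame: score every detected head, count the students judged as not attending, and compare the proportion against the 5% threshold. head_crops stands for the per-student head position images extracted in step S6-2 and is an assumption introduced for illustration.

import torch

@torch.no_grad()
def frame_not_attending(listen_net, head_crops, ratio_threshold=0.05):
    scores = torch.softmax(listen_net(head_crops), dim=1)   # S6-3: head posture scores
    y_listen = scores.argmax(dim=1)                          # S6-4: 1 = attending, 0 = not attending
    ratio = (y_listen == 0).float().mean().item()            # S6-5: proportion of students not attending
    return ratio >= ratio_threshold                          # S6-6: True = not-attending-class state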
the class attendance abnormity detection specifically comprises the following steps:
step S7-1: inputting a real-time monitoring video and reading a video frame;
step S7-2: judging whether the video is finished or not, and finishing the real-time judgment of abnormal class listening state if the monitoring video is finished;
step S7-3: if the monitoring video is still valid, detecting the non-lesson-listening state of the continuous video frames by using the step S6-6, and extracting the non-lesson-listening state of each frame;
step S7-4: if the non-listening state does not appear, clearing the non-listening state, clearing the starting time of the non-listening state and clearing the abnormal state of class listening;
step S7-5: if the state of not listening to the class appears, abnormal judgment of class listening in the class is carried out;
step S7-5-1: if the non-class state appears for the first time, initializing the current time as the starting time of the non-class state, and initializing the non-class duration as 1 frame;
step S7-5-2: if the non-lesson-listening state does not occur for the first time, updating the duration of the non-lesson-listening state, and increasing the duration by 1 frame;
step S7-5-3: if the duration of the non-lesson-listening state reaches 50 frames, outputting a class-listening abnormal state;
step S7-6: repeating steps S7-1 to S7-5 to analyze classroom lecture-listening data in real time and to detect abnormal classroom lecture-listening states as they occur (a sketch of this per-frame state machine follows below).
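A minimal sketch of the per-frame state machine in steps S7-1 to S7-6: the not-attending state must persist for 50 consecutive frames before an abnormal class-listening state is reported, and any attending frame resets the counter. frame_states is any iterable of per-frame booleans produced by the S6 check above, an assumption made for illustration.

def detect_listening_abnormality(frame_states, min_duration=50):
    duration = 0
    for t, not_attending in enumerate(frame_states):
        if not not_attending:
            duration = 0                      # S7-4: clear the not-attending state and its start time
        else:
            duration += 1                     # S7-5-1 / S7-5-2: start or extend the not-attending run
            if duration >= min_duration:      # S7-5-3: 50 consecutive frames reached
                yield t                       # report an abnormal class-listening state at frame t
    # S7-2: the loop ends when the monitoring video ends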
CN201810831224.5A 2018-07-26 2018-07-26 Classroom lecture listening abnormity detection method based on illumination generation countermeasure network Active CN109241830B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810831224.5A CN109241830B (en) 2018-07-26 2018-07-26 Classroom lecture listening abnormity detection method based on illumination generation countermeasure network

Publications (2)

Publication Number Publication Date
CN109241830A CN109241830A (en) 2019-01-18
CN109241830B true CN109241830B (en) 2021-09-17

Family

ID=65072427

Country Status (1)

Country Link
CN (1) CN109241830B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant