CN116071575A - Multi-mode data fusion-based student classroom abnormal behavior detection method and detection system - Google Patents

Multi-mode data fusion-based student classroom abnormal behavior detection method and detection system

Info

Publication number
CN116071575A
CN116071575A (application CN202211131074.XA)
Authority
CN
China
Prior art keywords
classroom
expression
abnormal
limb
data
Prior art date
Legal status
Pending
Application number
CN202211131074.XA
Other languages
Chinese (zh)
Inventor
Guo Shengnan (郭胜男)
Wu Yonghe (吴永和)
Current Assignee
Shanghai Xiyu Information Technology Co ltd
East China Normal University
Original Assignee
Shanghai Xiyu Information Technology Co ltd
East China Normal University
Priority date
Filing date
Publication date
Application filed by Shanghai Xiyu Information Technology Co., Ltd. and East China Normal University
Priority to CN202211131074.XA
Publication of CN116071575A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/758 Involving statistics of pixels or of feature values, e.g. histogram matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a student classroom abnormal behavior detection method based on multi-modal data fusion. The method preprocesses student expression data and limb data in classroom video, selects a region of interest, uses an LBP histogram as the expression feature description of students in the classroom, and uses a one-class support vector machine (One-Class SVM) as the classifier to detect expression abnormality; limb feature vectors in the classroom video stream are calculated with the optical flow histogram (HOFO) method, limb action abnormality is identified with a One-Class SVM classifier, and finally logical decision fusion is performed to obtain the final classroom abnormal behavior detection result. The invention adopts a new multi-modal data fusion method that can synthesize the data information of several modalities, making the detection result of students' classroom behavior more accurate. The invention also discloses a detection system implementing the method.

Description

Multi-mode data fusion-based student classroom abnormal behavior detection method and detection system
Technical Field
The invention belongs to the technical field of education informatization, and particularly relates to a student classroom abnormal behavior detection method and system based on multi-mode data fusion.
Background
Humans have long used sensory systems such as vision, hearing, smell, and touch to receive information from the external environment, forming a "mirror image" of the real world in the brain through multi-modal information synthesis. This biological information processing system can suppress environmental noise, extract the key information from each sensory channel, and resolve information conflicts to make decisions. An intelligent machine can likewise receive multi-modal information, such as video and audio, from the external environment through various channels, but artificial intelligence still lags far behind humans in multi-modal data processing capability, and many problems in multi-modal data fusion remain open: suppressing the different noise present in different modalities, resolving information conflicts between modalities, and fusing the modalities so as to improve the accuracy of the final recognition or decision. Fusing multi-modal data therefore helps resolve ambiguity and improves the accuracy of intelligent decision-making.
In the educational field, multimodal learning analytics (Multimodal Learning Analytics, MMLA) focuses on collecting and analyzing the traces obtained from different aspects of the learning process to better understand and improve teaching. By fusion level, multi-modal data fusion methods divide into pixel-level fusion, feature-level fusion, and decision-level fusion, as well as combinations of these methods. Each modality of multi-modal data contains, to a greater or lesser degree, information that contributes to the final classification or recognition task; to fuse multi-modal data effectively, one needs insight into the fusion hierarchy and must apply different fusion strategies to different situations.
In the field of education, the classroom is the main site of educational practice; scientific and effective classroom observation can help students improve their learning efficiency and assist teachers in improving teaching methods and strategies. Student classroom behavior is an important index of classroom participation, and limb actions and facial expressions convey 70% of the total learning-emotion information, making them indispensable non-verbal behaviors for measuring student participation in class. A survey of studies on student classroom behavior detection supported by visual recognition technology shows that most focus on single-mode data. For example, He Xiuling et al. [1] proposed a student classroom behavior recognition method based on the human skeleton and deep learning, extracting single-mode human skeleton information from images and recognizing students' classroom behaviors with CNN-10. Similarly, Xu Guzhen et al. [2] took actual classroom teaching video shot by monitoring equipment as the data source, extracted single-mode human skeleton information of students, and performed automatic identification with a Boosting algorithm and a convolutional neural network. Vermun et al. [3] applied limb action recognition in a distance teaching system to help feed back the learning state of remote learners. Gu Liyu et al. [4] analyzed student status from classroom video data, covering the number of students in the classroom, their positions, and their facial key points, and computed classroom activity through statistics. In summary, student status detection in the classroom scenario still faces the following challenges: first, detection of abnormal events in students' classroom state is not yet available; second, video-based student classroom state detection focuses on single-mode data and does not fully fuse multi-modal information; third, methodologically, feature descriptors mostly rely on hand-crafted features or on stacked denoising autoencoders operated in an unsupervised manner.
Disclosure of Invention
To remedy the defects of the prior art, the invention aims to provide a student classroom abnormal state detection method and system based on multi-modal data fusion. For the two different modalities contained in a classroom video stream, the method forms an expression feature vector and a limb feature vector with the Local Binary Pattern (LBP) and optical flow Histogram (HOFO) methods respectively; after dimension reduction, each feature vector is input into a one-class support vector machine (One-Class SVM) to obtain a per-modality abnormality detection result, and a decision fusion method finally produces the fused abnormality detection result, improving detection accuracy by fusing multi-modal information. The abnormal state comprises expression abnormality and limb abnormality; the expression abnormality includes crying, aversion, dull eyes, making faces, and the like; the limb abnormality includes eating, playing with a mobile phone, hitting others, standing up, waving hands, yawning, and the like.
The invention provides a student classroom abnormal behavior detection method based on multi-mode data fusion, which comprises the following steps:
Step one, capturing video stream data of students during classroom learning through a classroom camera;
step two, preprocessing the classroom learning video stream data in the step one;
the preprocessing comprises the steps of adjusting the resolution of an original video image to 256 x 1169, guaranteeing the definition of the image, carrying out image normalization, adjusting different video images to the same size as the pixel value, and carrying out image graying and noise filtering on the basis of the operation;
the method is characterized in that a color image matrix captured by a camera is large in required storage space and not suitable for image processing, gray processing refers to replacing three component values of a color image R, G, B by a numerical value to represent the color depth of an image pixel point, and the range of the numerical value is [0,255], and the method adopts a weighted average method; the weighted average of 3 values of R component, G component and B component of the pixel in the color image is used as the gray value of the gray map, and the respective weight value of the pixel on R, G, B component is selected according to the actual situation.
Noise, which is unnecessary or redundant information in image data, is an important factor interfering with image quality, is sometimes present in the image data of the captured video stream, and is denoised by a gaussian filtering method.
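For illustration, a minimal OpenCV sketch of this preprocessing stage (size normalization, weighted-average graying, Gaussian denoising); the target size and the 5 x 5 kernel are assumptions for the sketch, not values fixed by the invention:

```python
import cv2

def preprocess_frame(frame, size=(256, 256)):
    """Resize, gray, and denoise one classroom video frame.

    `size` is an illustrative assumption; the text adjusts all frames
    to one common resolution before further processing.
    """
    frame = cv2.resize(frame, size)                 # size normalization
    # OpenCV graying uses the weighted average 0.299 R + 0.587 G + 0.114 B
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    denoised = cv2.GaussianBlur(gray, (5, 5), 0)    # Gaussian noise filtering
    return denoised
```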
Step three, binarizing the image preprocessed in step two. Image binarization converts the image into a two-color (black or white) representation in which the color depth of each pixel is 0 or 255; the maximum inter-class difference method (Otsu's method) is adopted to select the region of interest in the video, making it convenient to extract students' expression and limb movement features in a concentrated way. The specific steps are as follows:
Pixels in a classroom video image divide into background pixels, i.e., pixels contained in a stable image background model, and foreground pixels, which relative to the background approximately contain the foreground moving object. Specifically, a background model is built with the ViBe algorithm, a differential image is obtained as the difference between the current frame and the background model, and a specific threshold T is set for binarization; T takes the image pixel mean as its initial value and is adjusted during training. The mathematical expression is as follows. If the environment changes, the background model must be updated continuously to adapt to the changing background; the binarized image is then processed morphologically and edges are extracted to obtain the foreground moving target.

f_binary(x, y) = 1 if |f_k(x, y) − B_k(x, y)| > T, and 0 otherwise,

where f_k(x, y) denotes the k-th frame image, B_k(x, y) is the currently established background model, and T is the threshold for image segmentation.
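A minimal sketch of this binarization step, assuming the background model B_k is already available (e.g., from ViBe, which is not shown here) and using the mean of the difference image as the initial threshold T, as in the text:

```python
import cv2

def foreground_mask(frame_gray, background_gray, T=None):
    """Binarize |f_k - B_k| against threshold T; T defaults to the mean
    of the difference image, mirroring the initial value in the text."""
    diff = cv2.absdiff(frame_gray, background_gray)
    if T is None:
        T = float(diff.mean())
    _, binary = cv2.threshold(diff, T, 255, cv2.THRESH_BINARY)
    # morphological opening cleans the binary image before edge extraction
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    return cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
```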
Step four, extracting the expression feature vectors and limb feature vectors of students in class from the processed classroom video stream data, using the LBP feature histogram and the optical flow histogram (HOFO) method respectively.
The Local Binary Pattern (LBP) has strong discriminative power and low computational complexity, making it suitable for appearance feature extraction. It computes a pixel value by comparing the gray value of the center pixel P with the gray values of its adjacent pixels (see fig. 2), with the following formula:

L ≈ l(s(n_0 − n_c), ..., s(n_7 − n_c))

where L denotes the computed value, l(x) denotes a single s(x) calculation result, n_0 denotes a pixel value in the window other than the center pixel, n_c denotes the value of the center pixel, and

s(x) = 1 if x ≥ 0, and 0 if x < 0.

Fig. 2 illustrates the LBP operator calculation process: the 8-bit binary number is read off in sequence as the eigenvalue of the pixel and converted into a decimal number according to the following formula:

LBP(x_c, y_c) = Σ_{i=0}^{7} s(n_i − n_c) · 2^i

where (x_c, y_c) is the position of the center pixel and n_i denotes the pixel values in the window other than the center pixel, indexed 0, 1, 2, ..., 7 starting from the upper-left corner of the window.
The LBP feature vector is computed as follows: first, the student's facial expression window is detected and divided into small cells of size 20 x 15; the LBP value of each pixel in each cell is calculated; the histogram of each cell is then computed and normalized; finally, the histograms of all cells are concatenated into the LBP feature vector.
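A sketch of this per-cell LBP histogram computation using scikit-image's LBP operator; the 20 x 15 cell size follows the text, while P = 8, R = 1, and the 256-bin histogram are assumptions matching the 8-neighbor operator of fig. 2:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_feature_vector(face_window, cell=(20, 15), P=8, R=1):
    """Concatenate normalized per-cell LBP histograms over the
    detected face window (a 2-D grayscale array)."""
    lbp = local_binary_pattern(face_window, P, R, method="default")
    h, w = face_window.shape
    ch, cw = cell
    histograms = []
    for y in range(0, h - h % ch, ch):
        for x in range(0, w - w % cw, cw):
            block = lbp[y:y + ch, x:x + cw]
            hist, _ = np.histogram(block, bins=256, range=(0, 256))
            histograms.append(hist / (hist.sum() + 1e-9))  # per-cell normalization
    return np.concatenate(histograms)
```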
Step five: students' limb movements are accompanied by changes of position and direction, while optical flow is formed by the brightness changes of images and can express the motion information of target points.
The descriptors in the optical flow Histogram (HOFO) are computed in units of blocks, each block comprising b_w × b_h cells, with each cell containing c_w × c_h pixels; w and h denote the width and height (of the block and of the cell, respectively). A one-dimensional histogram over all pixels is calculated in each cell. The invention adopts 8 bins to count the horizontal and vertical optical flow direction information, i.e., the 360° range of optical flow directions is divided into 8 direction sectors. Each pixel in a cell is mapped to a fixed angle range by its optical flow direction and counted in the statistical histogram, yielding the optical flow histogram of that cell. For example, if the mapped angle of a pixel's optical flow direction is 0° to 45°, the first bin of the histogram is incremented by 1. Several cells are then combined into a block, and the feature descriptors of all cells within a block are concatenated to obtain the HOFO feature descriptor of that block, as shown in fig. 3.
Fig. 4 illustrates, as an example, the calculation of the feature vector descriptor F_i of the i-th frame. The angle between each optical flow vector and the horizontal axis determines the bin interval into which the vector is projected. Each pixel in a cell is mapped to a fixed angular range according to its optical flow direction, so a gradient histogram of the cell can be generated for statistical analysis; for example, if the mapped angle of a pixel's optical flow direction falls in (0°, 45°], the first bin of the histogram (right side of fig. 5) is incremented by 1. The cells are then connected into a block, and the feature descriptors of all cells in each block are concatenated to obtain the HOFO feature descriptor of the entire block.
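A minimal HOFO sketch under these definitions, using Farneback dense optical flow (the patent does not name the flow estimator, so that choice is an assumption) and 8 direction bins of 45° each; block grouping is omitted for brevity:

```python
import cv2
import numpy as np

def hofo_descriptor(prev_gray, curr_gray, cell=16, bins=8):
    """Per-cell histogram of optical flow orientations (8 bins over 360°),
    concatenated into one frame descriptor."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    _, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1], angleInDegrees=True)
    h, w = ang.shape
    feats = []
    for y in range(0, h - h % cell, cell):
        for x in range(0, w - w % cell, cell):
            a = ang[y:y + cell, x:x + cell].ravel()
            # each pixel votes into one fixed 45-degree angle range
            hist, _ = np.histogram(a, bins=bins, range=(0, 360))
            feats.append(hist / (hist.sum() + 1e-9))
    return np.concatenate(feats)
```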
Before step six is performed, the classifiers are trained using students' normal classroom behavior data. A one-class support vector machine (One-Class SVM) is a special variant of the SVM that requires training with only one class of samples (e.g., a normal-behavior sample dataset).
The one-class support vector machine (One-Class SVM, OCSVM) works as follows: it aims to establish a decision boundary with maximum distance between the normal dataset and the origin. The optimal boundary is learned from the normal sample dataset and encloses almost all normal behavior sample points; at test time, samples falling inside the boundary are judged normal behaviors, and samples falling outside it are judged abnormal points. The OCSVM maps the training dataset into a high-dimensional feature space through a kernel function K(x, z) = (Φ(x)·Φ(z)), giving the input data better aggregation, and then iteratively finds the optimal margin hyperplane that maximizes the distance between the training data and the origin. Assume the training dataset

{x_1, x_2, ..., x_N}, x_i ∈ χ,

represents a normal dataset. To obtain the boundary, the optimization model is as follows:

min over w, ξ, ρ of (1/2)·‖w‖² + (1/(νN))·Σ_{i=1}^{N} ξ_i − ρ, subject to (w·Φ(x_i)) ≥ ρ − ξ_i, ξ_i ≥ 0    (1)
where w and ρ are the parameters determining the decision boundary, treated as variables to be located during optimization; x_i denotes a training sample, N is the total number of training samples, ν is a balance parameter, and ξ_i ∈ {ξ_1, ξ_2, ..., ξ_N} are slack variables measuring the distance of a sample from the boundary. Φ: χ → F denotes the mapping from the data input space χ to the feature space F, which can be implemented by a simple kernel function transformation; learning of the OCSVM model is performed in the feature space. Equation (2) is the general form of the kernel function:

K(x, z) = (Φ(x)·Φ(z))    (2)

where K(x, z) and Φ(x) denote the kernel function and the mapping function, respectively; Φ(x)·Φ(z) is the inner product of Φ(x) and Φ(z); and x, z ∈ χ are samples of the data input space.
For example, the Gaussian kernel function is as follows:

K(x, z) = exp(−‖x − z‖² / (2σ²))    (3)

where z denotes the kernel center and ‖x − z‖ is the Euclidean distance between vectors x and z; as the distance between the two vectors increases, the Gaussian kernel value decreases monotonically, and σ controls the range of action of the Gaussian kernel.
Equation (1) can be solved by the Lagrange multiplier method, giving the dual problem shown in equation (4):

min over α of (1/2)·Σ_{i,j} α_i α_j K(x_i, x_j), subject to 0 ≤ α_i ≤ 1/(νN), Σ_i α_i = 1    (4)

where α_i is a Lagrange multiplier and α = [α_1, α_2, ..., α_N]^T. The normal vector w is defined by

w = Σ_i α_i Φ(x_i),

and the boundary parameter ρ can be computed from any support vector x_j via ρ = Σ_i α_i K(x_i, x_j). The optimal boundary is then determined by the support vector expansion, and the decision function is as follows:

f(x) = sign(Σ_i α_i K(x_i, x) − ρ)    (5)

where α_i denotes a Lagrange multiplier, f(x) denotes the decision function, Σ denotes summation, x_i denotes a non-zero (support) training sample, and sign(·) denotes the sign function.
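Putting the classifier step together, a sketch using scikit-learn's OneClassSVM with an RBF (Gaussian) kernel; the feature matrices below are random placeholders standing in for the LBP/HOFO vectors, and the nu and gamma values are assumptions, since the patent does not fix them:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Placeholders: in practice, rows are LBP (expression) or HOFO (limb)
# feature vectors extracted from normal-behavior training videos.
normal_features = rng.normal(size=(200, 256))
test_features = rng.normal(size=(20, 256))

# nu bounds the fraction of training samples allowed outside the boundary
# (the balance parameter ν in equation (1)); gamma controls the RBF width.
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
clf.fit(normal_features)

# predict() returns +1 for inliers (normal) and -1 for outliers (abnormal)
labels = clf.predict(test_features)
is_normal = (labels == 1).astype(int)   # 1 = normal, 0 = abnormal
```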
Step six, based on the LBP feature vectors of step four and the HOFO feature vectors of step five, train the expression abnormality classifier and the limb movement abnormality classifier respectively, then input the preprocessed classroom learning test dataset to obtain classification results for whether the expression and the limb movement are abnormal. In the invention, crying, aversion, dull eyes, making faces, and the like are abnormal expressions; eating, playing with a mobile phone, hitting others, standing up, waving hands, yawning, and the like are abnormal classroom behaviors.
Step seven, logically fuse the two modalities' classification results (expression and limb) to obtain the final classroom abnormal behavior detection result. Before this step, the detection results for the student's expression and limb motion have been obtained; R_e and R_l denote the two detection results respectively, and R denotes the final detection result:

R_e = 1 if the expression is normal, and 0 if it is abnormal;
R_l = 1 if the limb movement is normal, and 0 if it is abnormal;
R = R_e ∩ R_l

where R denotes the overall detection result. When R = 1, the student is behaving normally, i.e., R_e = 1 and R_l = 1; when either the expression detection or the limb movement detection reports an anomaly, the overall result is abnormal, i.e., R = 0.
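The decision fusion itself reduces to a logical AND of the two per-modality results, e.g.:

```python
import numpy as np

def fuse(r_e, r_l):
    """R = R_e AND R_l: abnormal (0) when either modality flags an anomaly."""
    return np.logical_and(r_e, r_l).astype(int)

# expression results [1,1,0] and limb results [1,0,1] fuse to [1,0,0]
print(fuse(np.array([1, 1, 0]), np.array([1, 0, 1])))
```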
The invention also provides a detection system for realizing the method, which comprises the following steps: the system comprises a classroom learning behavior database module, an expression abnormality detection module, a limb movement abnormality detection module and a decision fusion module;
the classroom learning behavior database module is used for collecting video data of students and determining a student classroom learning expression data set and a limb action data set; preprocessing a training data set, selecting an interested region in a video image by using a method in an OpenCV library, storing the processed data set in a classroom learning video database, and marking a normal or abnormal label;
the expression abnormality detection module obtains a result of whether the expression is abnormal or not by inputting the expression to be detected into the trained model;
the limb movement abnormality detection module is used for obtaining the result of whether the limb movement is abnormal or not by inputting the limb movement to be detected into the trained model;
and the decision fusion module adopts a decision-level fusion strategy to logically fuse the expression abnormality detection result and the limb movement abnormality detection result, and judges the final behavior abnormality detection result.
The beneficial effects of the invention include: the invention designs a student classroom abnormal behavior detection method based on a semi-supervised approach that systematically integrates information from the two modalities of expression and limb motion; it can highlight the unique characteristics of each modality, draws on richer information, has high fault tolerance, and improves the accuracy of detection results. In addition, the invention adopts the semi-supervised One-Class SVM as classifier, which requires fewer samples and offers better timeliness.
Drawings
Fig. 1 is a diagram of a student classroom abnormal behavior detection method based on multi-mode data fusion.
Fig. 2 is a calculation diagram of the LBP operator provided by the present invention.
Fig. 3 shows the optical flow Histogram (HOFO) calculation step provided by the present invention.
Fig. 4 is a diagram illustrating a process of calculating an i-th frame HOFO according to the present invention.
Fig. 5 is a HOFO extraction visualization process of the present invention.
Fig. 6 is a flowchart of a real-time detection method for abnormal behaviors of students in a real classroom according to the embodiment of the present invention.
Fig. 7 is a functional block diagram of an implementation of the present invention.
Detailed Description
The invention will be described in further detail with reference to the following specific examples and drawings. The procedures, conditions, experimental methods, etc. for carrying out the present invention are common knowledge and common knowledge in the art, except for the following specific references, and the present invention is not particularly limited.
The invention discloses a student classroom abnormal behavior detection method based on multi-modal data fusion. The method preprocesses student expression data and limb data in classroom video, selects a region of interest (Region of Interest), uses an LBP histogram as the expression feature description of students in the classroom, and uses a one-class support vector machine (One-Class SVM) as the classifier to detect expression abnormality; limb feature vectors in the classroom video stream are calculated with the optical flow histogram (HOFO) method, limb action abnormality is identified with a One-Class SVM classifier, and finally logical decision fusion is performed to obtain the final classroom abnormal behavior detection result. The invention adopts a new multi-modal data fusion method that can synthesize the data information of several modalities, making the detection result of students' classroom behavior more accurate. The invention also discloses a detection system implementing the method.
Specifically, the student classroom abnormal behavior detection method based on multi-modal data fusion provided by the invention comprises the following steps:
step 1: collecting student classroom learning data by using a camera, wherein the data set is divided into a training data set and a testing data set, and the training data set comprises an expression training data set and a limb action training data set so as to establish a detection database;
step 2: respectively preprocessing an expression training data set and a limb action data set, wherein the preprocessing comprises image resolution adjustment, image noise reduction, image size normalization, image graying and the like;
step 3: determining an interest area containing a target to be detected, and then respectively carrying out feature extraction of expression and limb actions and classifier training;
step 4: respectively inputting the test set into two trained classifiers to respectively obtain an expression abnormality detection result and a limb movement abnormality detection result;
step 5: logically fuse the detection results to obtain the final classroom behavior abnormality detection result.
Examples:
the embodiment provides a real-time detection method for abnormal behaviors of students in a real classroom, wherein a flow chart is shown in fig. 6, and a functional module chart is shown in fig. 7.
(1) The specific contents of the video database for student classroom learning established in fig. 7 are as follows:
A student classroom learning video database is established in the classroom behavior database module: student classroom video data are collected, and the student classroom learning expression dataset and limb action dataset are determined in preparation for training the expression OCSVM and the limb action OCSVM respectively. The training dataset is preprocessed, including video definition and resolution adjustment, video size normalization, video noise reduction, and the like. The region of interest in a video image is selected with a method from the OpenCV library: the relative position is calculated and the pixels of the region of interest are marked, the width and height of the rectangular frame to be selected are input, the region of interest is extracted through a rectangle function, and all pixel values outside the region of interest are marked as 0, as sketched below. Through these steps, the processed dataset is stored in the classroom learning video database and labeled normal or abnormal.
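A sketch of the ROI step just described, assuming the rectangle is given as (x, y, width, height); every pixel outside the rectangle is set to 0:

```python
import cv2
import numpy as np

def apply_roi(frame, x, y, w, h):
    """Keep only the rectangular region of interest; all pixel values
    outside the rectangle are marked as 0, as described in the text."""
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    cv2.rectangle(mask, (x, y), (x + w, y + h), 255, thickness=-1)
    return cv2.bitwise_and(frame, frame, mask=mask)
```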
(2) Expression abnormality detection. The details of the expression abnormality detection in fig. 7 are as follows:
Expression abnormality detection is carried out by the expression abnormality detection module in two stages: an offline expression-classifier training stage and an online testing stage. The student classroom expression dataset comprises a normal-expression training set and a test set. In the training stage, the students' normal classroom expression dataset is selected and preprocessed for video definition, resolution, video size normalization, and noise reduction to guarantee the video quality of the training dataset; expression features are extracted with the LBP histogram to form feature vectors, which serve as input values to train the OCSVM classifier. In the testing stage, the test dataset consists of videos of students learning in class; after preprocessing, expression features are extracted and input into the trained OCSVM classifier for classification, and abnormal expressions are detected by the OCSVM as outliers.
(3) The specific contents of the detection of the abnormal limb movements in fig. 7 are as follows:
Limb movement abnormality detection is performed by the limb movement abnormality detection module in two stages: an offline limb-movement-classifier training stage and an online testing stage. The student classroom limb action dataset comprises a normal limb action training set and a test set. In the training stage, the students' normal classroom limb action dataset is selected and preprocessed for video definition, resolution, video size normalization, and noise reduction to guarantee the video quality of the training dataset; limb action features are extracted with the optical flow histogram to form feature vectors, which serve as input values to train the OCSVM classifier. In the testing stage, the test dataset consists of videos of students learning in class; after preprocessing, limb action features are extracted and input into the trained OCSVM classifier for classification, and abnormal limb actions are detected by the OCSVM as outliers.
(4) Decision fusion the specific content of the decision fusion in fig. 7 is as follows:
Decision fusion is carried out by the decision fusion module, which adopts a decision-level fusion strategy to logically fuse the expression abnormality detection result and the limb movement abnormality detection result. Only when the result of the logical fusion is 1 is the student's behavior normal; if the result is 0, at least one of the student's expression and limb movement is abnormal. The classroom abnormality detection result is finally output in real time and a prompt is given to the teacher.
References
[1] He Xiuling, Yang Fan, Chen Zengzhao, Fang Jing, Li Yangyang. Student classroom behavior recognition based on human skeleton and deep learning [J]. Modern Educational Technology, 2020, 30(11): 105-112.
[2] Xu Guzhen, Deng Wei, Wei Yantao. Automatic recognition of student classroom behavior based on human skeleton information [J]. Modern Educational Technology, 2020, 30(05): 108-113.
[3] Vermun K, Senapaty M, Sankhla A, et al. Gesture-based affective and cognitive states recognition using kinect for effective feedback during e-learning [A]. 2013 IEEE Fifth International Conference on Technology for Education (T4E 2013) [C]. Piscataway: IEEE, 2013: 107-110.
[4] Gu Liyu, Zhang Chaohui, Zhao Xiaoyan. Classroom student status analysis based on artificial intelligence video processing [J]. Modern Educational Technology, 2019, 29(12): 82-88.
The protection of the present invention is not limited to the above embodiments. Variations and advantages that would occur to one skilled in the art are included in the invention without departing from the spirit and scope of the inventive concept, and the scope of the invention is defined by the appended claims.

Claims (10)

1. A student classroom abnormal behavior detection method based on multi-mode data fusion is characterized by comprising the following steps:
step one, acquiring video data of students in a classroom learning process through a classroom camera;
step two, preprocessing the classroom video data obtained in the step one;
step three, binarizing the image preprocessed in the step two, selecting an interest area in each frame of image of the video stream, and intensively extracting student behavior characteristics;
step four, extracting expression feature vectors and limb feature vectors of students in class from the processed classroom video stream data, using the local binary pattern (LBP) method and the optical flow histogram (HOFO) method respectively;
step five, respectively inputting the obtained student classroom expression feature vector and limb feature vector into an OCSVM classifier to classify, and obtaining classification results of whether the classroom student behaviors are abnormal;
and step six, logically fusing the two-mode classification results of the expression and the limbs to obtain a final classroom abnormal behavior detection result.
2. The method according to claim 1, wherein in step two, the preprocessing includes image graying, video image resolution adjustment, video image size normalization, video noise removal;
the image graying means replacing the three component values R, G, B of a color image with a single numerical value representing the color depth of the image pixel, the value lying in the range [0, 255];
the video image resolution adjustment means adjusting the resolution to 256 x 1169 to ensure image definition;
the video image size normalization refers to adjusting different video images to be the same size as pixel values;
the video image noise removal refers to removing unnecessary or redundant information in the image data.
3. The method according to claim 1, wherein in step three, binarization means converting the image into a representation with only two colors, black and white, the color depth of each pixel being 0 or 255; the region of interest in the video is selected using the maximum inter-class difference method.
4. The method according to claim 1, wherein in step four, the local binary pattern LBP calculates a pixel value by comparing the gray value of the center pixel P of the image with the gray values of adjacent pixels, expressed as follows:

L ≈ l(s(n_0 − n_c), ..., s(n_7 − n_c)),

wherein L denotes the computed value, l(x) denotes a single s(x) calculation result, n_0 denotes a pixel value in the window other than the center pixel, n_c denotes the value of the center pixel, and

s(x) = 1 if x ≥ 0, and 0 if x < 0.
5. The method of claim 1, wherein in step four, the descriptors in the optical flow histogram method HOFO are calculated in units of blocks, each block comprising b_w × b_h cells and each cell containing c_w × c_h pixels; each pixel in a cell is mapped to a fixed angle range by its optical flow direction and counted in a statistical histogram to obtain the optical flow histogram of the whole cell; several cells are then combined into a block, and the feature descriptors of all cells in a block are concatenated to obtain the HOFO feature descriptor of the block.
6. The method of claim 1, wherein in step five, the one-class support vector machine is a special variant of the general support vector machine that requires training with only one type of sample, either normal or abnormal; an expression anomaly detector and a limb anomaly detector, i.e., an expression one-class support vector machine and a limb one-class support vector machine, are set up and trained for the expression feature vectors and the limb feature vectors respectively; the abnormal expression comprises crying, aversion, dull eyes, and making faces; the limb abnormality comprises eating, playing with a mobile phone, hitting others, standing up, waving hands, and yawning.
7. The method according to claim 6, wherein the one-class support vector machine OCSVM is intended to establish a decision boundary with maximum distance between the normal dataset and the origin; the optimal learning boundary is trained on a normal sample dataset and encloses almost all normal behavior sample points; at test time, samples inside the boundary are detected as normal behaviors and samples outside the boundary as abnormal points; the OCSVM maps the input data to a high-dimensional feature space through the kernel function, giving the input data better aggregation, and then iteratively finds the optimal margin hyperplane that maximizes the distance between the training data and the origin.
8. The method of claim 7, wherein the training dataset

{x_1, x_2, ..., x_N}, x_i ∈ χ,

is assumed to represent a normal dataset; to obtain the boundary, the optimization model is as follows:

min over w, ξ, ρ of (1/2)·‖w‖² + (1/(νN))·Σ_{i=1}^{N} ξ_i − ρ, subject to (w·Φ(x_i)) ≥ ρ − ξ_i, ξ_i ≥ 0,

wherein w and ρ are the parameters determining the decision boundary, treated as variables to be located during optimization; x_i denotes a training sample, N is the total number of training samples, ν is a balance parameter, and ξ_i ∈ {ξ_1, ξ_2, ..., ξ_N} are slack variables measuring the distance of a sample from the boundary; Φ: χ → F denotes the mapping from the data input space χ to the feature space F, realized by a kernel function transformation, and the learning of the OCSVM model is performed in the feature space; the form of the kernel function is as follows:

K(x, z) = (Φ(x)·Φ(z)),

wherein K(x, z) and Φ(x) denote the kernel function and the mapping function, respectively; Φ(x)·Φ(z) is the inner product of Φ(x) and Φ(z); and x, z ∈ χ are samples of the data input space;

the optimization model is solved by the Lagrange multiplier method, giving the dual problem

min over α of (1/2)·Σ_{i,j} α_i α_j K(x_i, x_j), subject to 0 ≤ α_i ≤ 1/(νN), Σ_i α_i = 1,

wherein α_i is a Lagrange multiplier, α = [α_1, α_2, ..., α_N]^T, and the normal vector w is defined by w = Σ_i α_i Φ(x_i); the boundary parameter ρ can be computed from any support vector x_j via ρ = Σ_i α_i K(x_i, x_j); the optimal boundary is then determined by the support vector expansion, and the decision function is as follows:

f(x) = sign(Σ_i α_i K(x_i, x) − ρ),

wherein α_i denotes a Lagrange multiplier, x_i denotes a non-zero (support) training sample, and f(x) denotes the decision function.
9. The method according to claim 1, wherein in step six, the expression abnormal behavior detection result and the limb abnormal behavior detection result are fused and the final classroom abnormal behavior detection result is output; R_e and R_l denote the expression abnormality detection result and the limb abnormality detection result respectively, and R denotes the final detection result:

R_e = 1 if the expression is normal, and 0 if it is abnormal;
R_l = 1 if the limb movement is normal, and 0 if it is abnormal;
R = R_e ∩ R_l,

wherein R denotes the overall detection result; when R = 1, the student is behaving normally, i.e., R_e = 1 and R_l = 1; when either the expression detection or the limb movement detection reports an anomaly, the overall result is abnormal, i.e., R = 0.
10. A detection system for implementing the method according to any one of claims 1-9, characterized in that the system comprises: the system comprises a classroom learning behavior database module, an expression abnormality detection module, a limb movement abnormality detection module and a decision fusion module;
the classroom learning behavior database module is used for collecting video data of students and determining a student classroom learning expression data set and a limb action data set; preprocessing a training data set, selecting an interested region in a video image by using a method in an OpenCV library, storing the processed data set in a classroom learning video database, and marking a normal or abnormal label;
the expression abnormality detection module obtains a result of whether the expression is abnormal or not by inputting the expression to be detected into the trained model;
the limb movement abnormality detection module is used for obtaining the result of whether the limb movement is abnormal or not by inputting the limb movement to be detected into the trained model;
and the decision fusion module adopts a decision-level fusion strategy to logically fuse the expression abnormality detection result and the limb movement abnormality detection result, and judges the final behavior abnormality detection result.
CN202211131074.XA 2022-09-16 2022-09-16 Multi-mode data fusion-based student classroom abnormal behavior detection method and detection system Pending CN116071575A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211131074.XA CN116071575A (en) 2022-09-16 2022-09-16 Multi-mode data fusion-based student classroom abnormal behavior detection method and detection system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211131074.XA CN116071575A (en) 2022-09-16 2022-09-16 Multi-mode data fusion-based student classroom abnormal behavior detection method and detection system

Publications (1)

Publication Number Publication Date
CN116071575A true CN116071575A (en) 2023-05-05

Family

ID=86172198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211131074.XA Pending CN116071575A (en) 2022-09-16 2022-09-16 Multi-mode data fusion-based student classroom abnormal behavior detection method and detection system

Country Status (1)

Country Link
CN (1) CN116071575A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117437208A (en) * 2023-11-10 2024-01-23 北京交通大学 Rail anomaly detection method and system using multi-sensor fusion


Similar Documents

Publication Publication Date Title
CN106960202B (en) Smiling face identification method based on visible light and infrared image fusion
CN110889672B (en) Student card punching and class taking state detection system based on deep learning
CN109389074B (en) Facial feature point extraction-based expression recognition method
Bascón et al. An optimization on pictogram identification for the road-sign recognition task using SVMs
CN111563452B (en) Multi-human-body gesture detection and state discrimination method based on instance segmentation
CN107767405A (en) A kind of nuclear phase for merging convolutional neural networks closes filtered target tracking
CN113793336B (en) Method, device and equipment for detecting blood cells and readable storage medium
CN101482923A (en) Human body target detection and sexuality recognition method in video monitoring
Reshna et al. Spotting and recognition of hand gesture for Indian sign language recognition system with skin segmentation and SVM
CN111158491A (en) Gesture recognition man-machine interaction method applied to vehicle-mounted HUD
CN111158457A (en) Vehicle-mounted HUD (head Up display) human-computer interaction system based on gesture recognition
CN112528777A (en) Student facial expression recognition method and system used in classroom environment
Roa’a et al. Automated cheating detection based on video surveillance in the examination classes
Elhassan et al. DFT-MF: Enhanced deepfake detection using mouth movement and transfer learning
Berrú-Novoa et al. Peruvian sign language recognition using low resolution cameras
Sharma et al. Deep learning based student emotion recognition from facial expressions in classrooms
CN116543261A (en) Model training method for image recognition, image recognition method device and medium
CN116071575A (en) Multi-mode data fusion-based student classroom abnormal behavior detection method and detection system
Rohini et al. Attendance monitoring system design based on face segmentation and recognition
Cowie et al. An intelligent system for facial emotion recognition
CN117437691A (en) Real-time multi-person abnormal behavior identification method and system based on lightweight network
CN111898454A (en) Weight binarization neural network and transfer learning human eye state detection method and device
Silva et al. POEM-based facial expression recognition, a new approach
Shanthi et al. Gender and age detection using deep convolutional neural networks
Bansal et al. Detection and Recognition of Hand Gestures for Indian Sign Language Recognition System

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination