CN116071575A - Multi-mode data fusion-based student classroom abnormal behavior detection method and detection system - Google Patents

Multi-mode data fusion-based student classroom abnormal behavior detection method and detection system

Info

Publication number
CN116071575A
CN116071575A (application CN202211131074.XA)
Authority
CN
China
Prior art keywords
classroom
expression
abnormal
limb
data
Prior art date
Legal status
Pending
Application number
CN202211131074.XA
Other languages
Chinese (zh)
Inventor
Guo Shengnan (郭胜男)
Wu Yonghe (吴永和)
Current Assignee
Shanghai Xiyu Information Technology Co ltd
East China Normal University
Original Assignee
Shanghai Xiyu Information Technology Co ltd
East China Normal University
Priority date
Filing date
Publication date
Application filed by Shanghai Xiyu Information Technology Co., Ltd. and East China Normal University
Priority to CN202211131074.XA
Publication of CN116071575A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/758 Involving statistics of pixels or of feature values, e.g. histogram matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a student classroom abnormal behavior detection method based on multi-modal data fusion. The method preprocesses student expression data and limb data in classroom video, selects a region of interest, uses an LBP histogram as the expression feature description of students in the classroom, and uses a one-class support vector machine (One-Class SVM) as the classifier to detect expression abnormality; limb feature vectors in the classroom video stream are calculated with the optical flow histogram (HOFO) method, limb action abnormality is identified with a One-Class SVM classifier, and finally logical decision fusion is performed to obtain the final classroom abnormal behavior detection result. The invention adopts a new multi-modal data fusion method that can synthesize the data information of several modalities, making the detection result of students' classroom behavior more accurate. The invention also discloses a detection system implementing the method.

Description

Multi-mode data fusion-based student classroom abnormal behavior detection method and detection system
Technical Field
The invention belongs to the technical field of education informatization, and particularly relates to a student classroom abnormal behavior detection method and system based on multi-mode data fusion.
Background
Humans have long used sensory systems such as vision, hearing, smell, and touch to receive information from the external environment, forming a "mirror image" of the real world in the brain through multi-modal information synthesis. This biological information processing system can suppress environmental noise, extract the key information from each sensory channel, and resolve information conflicts to make decisions. An intelligent machine can likewise receive multi-modal information, such as video and audio, from the external environment through various channels, but artificial intelligence still lags far behind humans in multi-modal data processing capability, and many problems in multi-modal data fusion remain open: suppressing the different noise present in different modalities, resolving information conflicts between modalities, and fusing the modalities so as to improve the accuracy of the final recognition or decision. Fusing multi-modal data therefore helps resolve ambiguity and improves the accuracy of intelligent decision-making.
In the educational field, multimodal learning analytics (Multimodal Learning Analytics, MMLA) focuses on collecting and analyzing the traces obtained from different aspects of the learning process to better understand and improve teaching. By fusion level, multi-modal data fusion methods divide into pixel-level fusion, feature-level fusion, and decision-level fusion, as well as combinations of these methods. Each modality of multi-modal data contains, to a greater or lesser degree, information that contributes to the final classification or recognition task; to fuse multi-modal data effectively, one needs insight into the fusion hierarchy and must apply different fusion strategies to different situations.
In the field of education, the classroom is the main site of educational practice; scientific and effective classroom observation can help students improve their learning efficiency and assist teachers in improving teaching methods and strategies. Student classroom behavior is an important index of classroom participation, and limb actions and facial expressions convey 70% of the total learning-emotion information, making them indispensable non-verbal behaviors for measuring student participation in class. A survey of studies on student classroom behavior detection supported by visual recognition technology shows that most focus on single-mode data. For example, He Xiuling et al. [1] proposed a student classroom behavior recognition method based on the human skeleton and deep learning, extracting single-mode human skeleton information from images and recognizing students' classroom behaviors with CNN-10. Similarly, Xu Guzhen et al. [2] took actual classroom teaching video shot by monitoring equipment as the data source, extracted single-mode human skeleton information of students, and performed automatic identification with a Boosting algorithm and a convolutional neural network. Vermun et al. [3] applied limb action recognition in a distance teaching system to help feed back the learning state of remote learners. Gu Liyu et al. [4] analyzed student status from classroom video data, covering the number of students in the classroom, their positions, and their facial key points, and computed classroom activity through statistics. In summary, student status detection in the classroom scenario still faces the following challenges: first, detection of abnormal events in students' classroom state is not yet available; second, video-based student classroom state detection focuses on single-mode data and does not fully fuse multi-modal information; third, methodologically, feature descriptors mostly rely on hand-crafted features or on stacked denoising autoencoders operated in an unsupervised manner.
Disclosure of Invention
To remedy the defects of the prior art, the invention aims to provide a student classroom abnormal state detection method and system based on multi-modal data fusion. For the two different modalities contained in a classroom video stream, the method forms an expression feature vector and a limb feature vector with the Local Binary Pattern (LBP) and optical flow Histogram (HOFO) methods respectively; after dimension reduction, each feature vector is input into a one-class support vector machine (One-Class SVM) to obtain a per-modality abnormality detection result, and a decision fusion method finally produces the fused abnormality detection result, improving detection accuracy by fusing multi-modal information. The abnormal state comprises expression abnormality and limb abnormality; the expression abnormality includes crying, aversion, dull eyes, making faces, and the like; the limb abnormality includes eating, playing with a mobile phone, hitting others, standing up, waving hands, yawning, and the like.
The invention provides a student classroom abnormal behavior detection method based on multi-mode data fusion, which comprises the following steps:
Step one, capturing video stream data of students during classroom learning through a classroom camera;
step two, preprocessing the classroom learning video stream data in the step one;
the preprocessing comprises the steps of adjusting the resolution of an original video image to 256 x 1169, guaranteeing the definition of the image, carrying out image normalization, adjusting different video images to the same size as the pixel value, and carrying out image graying and noise filtering on the basis of the operation;
the method is characterized in that a color image matrix captured by a camera is large in required storage space and not suitable for image processing, gray processing refers to replacing three component values of a color image R, G, B by a numerical value to represent the color depth of an image pixel point, and the range of the numerical value is [0,255], and the method adopts a weighted average method; the weighted average of 3 values of R component, G component and B component of the pixel in the color image is used as the gray value of the gray map, and the respective weight value of the pixel on R, G, B component is selected according to the actual situation.
Noise, which is unnecessary or redundant information in image data, is an important factor interfering with image quality, is sometimes present in the image data of the captured video stream, and is denoised by a gaussian filtering method.
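For illustration, a minimal OpenCV sketch of this preprocessing stage (size normalization, weighted-average graying, Gaussian denoising); the target size and the 5 x 5 kernel are assumptions for the sketch, not values fixed by the invention:

```python
import cv2

def preprocess_frame(frame, size=(256, 256)):
    """Resize, gray, and denoise one classroom video frame.

    `size` is an illustrative assumption; the text adjusts all frames
    to one common resolution before further processing.
    """
    frame = cv2.resize(frame, size)                 # size normalization
    # OpenCV graying uses the weighted average 0.299 R + 0.587 G + 0.114 B
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    denoised = cv2.GaussianBlur(gray, (5, 5), 0)    # Gaussian noise filtering
    return denoised
```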
Step three, binarizing the image preprocessed in step two. Image binarization converts the image into a two-color (black or white) representation in which the color depth of each pixel is 0 or 255; the maximum inter-class difference method (Otsu's method) is adopted to select the region of interest in the video, making it convenient to extract students' expression and limb movement features in a concentrated way. The specific steps are as follows:
Pixels in a classroom video image divide into background pixels, i.e., pixels contained in a stable image background model, and foreground pixels, which relative to the background approximately contain the foreground moving object. Specifically, a background model is built with the ViBe algorithm, a differential image is obtained as the difference between the current frame and the background model, and a specific threshold T is set for binarization; T takes the image pixel mean as its initial value and is adjusted during training. The mathematical expression is as follows. If the environment changes, the background model must be updated continuously to adapt to the changing background; the binarized image is then processed morphologically and edges are extracted to obtain the foreground moving target.

f_binary(x, y) = 1 if |f_k(x, y) − B_k(x, y)| > T, and 0 otherwise,

where f_k(x, y) denotes the k-th frame image, B_k(x, y) is the currently established background model, and T is the threshold for image segmentation.
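A minimal sketch of this binarization step, assuming the background model B_k is already available (e.g., from ViBe, which is not shown here) and using the mean of the difference image as the initial threshold T, as in the text:

```python
import cv2

def foreground_mask(frame_gray, background_gray, T=None):
    """Binarize |f_k - B_k| against threshold T; T defaults to the mean
    of the difference image, mirroring the initial value in the text."""
    diff = cv2.absdiff(frame_gray, background_gray)
    if T is None:
        T = float(diff.mean())
    _, binary = cv2.threshold(diff, T, 255, cv2.THRESH_BINARY)
    # morphological opening cleans the binary image before edge extraction
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    return cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
```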
Step four, extracting the expression feature vectors and limb feature vectors of students in class from the processed classroom video stream data, using the LBP feature histogram and the optical flow histogram (HOFO) method respectively.
The Local Binary Pattern (LBP) has strong discriminative power and low computational complexity, making it suitable for appearance feature extraction. It computes a pixel value by comparing the gray value of the center pixel P with the gray values of its adjacent pixels (see fig. 2), with the following formula:

L ≈ l(s(n_0 − n_c), ..., s(n_7 − n_c))

where L denotes the computed value, l(x) denotes a single s(x) calculation result, n_0 denotes a pixel value in the window other than the center pixel, n_c denotes the value of the center pixel, and

s(x) = 1 if x ≥ 0, and 0 if x < 0.

Fig. 2 illustrates the LBP operator calculation process: the 8-bit binary number is read off in sequence as the eigenvalue of the pixel and converted into a decimal number according to the following formula:

LBP(x_c, y_c) = Σ_{i=0}^{7} s(n_i − n_c) · 2^i

where (x_c, y_c) is the position of the center pixel and n_i denotes the pixel values in the window other than the center pixel, indexed 0, 1, 2, ..., 7 starting from the upper-left corner of the window.
The LBP feature vector is computed as follows: first, the student's facial expression window is detected and divided into small cells of size 20 x 15; the LBP value of each pixel in each cell is calculated; the histogram of each cell is then computed and normalized; finally, the histograms of all cells are concatenated into the LBP feature vector.
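A sketch of this per-cell LBP histogram computation using scikit-image's LBP operator; the 20 x 15 cell size follows the text, while P = 8, R = 1, and the 256-bin histogram are assumptions matching the 8-neighbor operator of fig. 2:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_feature_vector(face_window, cell=(20, 15), P=8, R=1):
    """Concatenate normalized per-cell LBP histograms over the
    detected face window (a 2-D grayscale array)."""
    lbp = local_binary_pattern(face_window, P, R, method="default")
    h, w = face_window.shape
    ch, cw = cell
    histograms = []
    for y in range(0, h - h % ch, ch):
        for x in range(0, w - w % cw, cw):
            block = lbp[y:y + ch, x:x + cw]
            hist, _ = np.histogram(block, bins=256, range=(0, 256))
            histograms.append(hist / (hist.sum() + 1e-9))  # per-cell normalization
    return np.concatenate(histograms)
```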
Step five: students' limb movements are accompanied by changes of position and direction, while optical flow is formed by the brightness changes of images and can express the motion information of target points.
The descriptors in the optical flow Histogram (HOFO) are computed in units of blocks, each block comprising b_w × b_h cells, with each cell containing c_w × c_h pixels; w and h denote the width and height (of the block and of the cell, respectively). A one-dimensional histogram over all pixels is calculated in each cell. The invention adopts 8 bins to count the horizontal and vertical optical flow direction information, i.e., the 360° range of optical flow directions is divided into 8 direction sectors. Each pixel in a cell is mapped to a fixed angle range by its optical flow direction and counted in the statistical histogram, yielding the optical flow histogram of that cell. For example, if the mapped angle of a pixel's optical flow direction is 0° to 45°, the first bin of the histogram is incremented by 1. Several cells are then combined into a block, and the feature descriptors of all cells within a block are concatenated to obtain the HOFO feature descriptor of that block, as shown in fig. 3.
Fig. 4 illustrates, as an example, the calculation of the feature vector descriptor F_i of the i-th frame. The angle between each optical flow vector and the horizontal axis determines the bin interval into which the vector is projected. Each pixel in a cell is mapped to a fixed angular range according to its optical flow direction, so a gradient histogram of the cell can be generated for statistical analysis; for example, if the mapped angle of a pixel's optical flow direction falls in (0°, 45°], the first bin of the histogram (right side of fig. 5) is incremented by 1. The cells are then connected into a block, and the feature descriptors of all cells in each block are concatenated to obtain the HOFO feature descriptor of the entire block.
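A minimal HOFO sketch under these definitions, using Farneback dense optical flow (the patent does not name the flow estimator, so that choice is an assumption) and 8 direction bins of 45° each; block grouping is omitted for brevity:

```python
import cv2
import numpy as np

def hofo_descriptor(prev_gray, curr_gray, cell=16, bins=8):
    """Per-cell histogram of optical flow orientations (8 bins over 360°),
    concatenated into one frame descriptor."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    _, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1], angleInDegrees=True)
    h, w = ang.shape
    feats = []
    for y in range(0, h - h % cell, cell):
        for x in range(0, w - w % cell, cell):
            a = ang[y:y + cell, x:x + cell].ravel()
            # each pixel votes into one fixed 45-degree angle range
            hist, _ = np.histogram(a, bins=bins, range=(0, 360))
            feats.append(hist / (hist.sum() + 1e-9))
    return np.concatenate(feats)
```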
Before step six is performed, the classifiers are trained using students' normal classroom behavior data. A one-class support vector machine (One-Class SVM) is a special variant of the SVM that requires training with only one class of samples (e.g., a normal-behavior sample dataset).
The one-class support vector machine (One-Class SVM, OCSVM) works as follows: it aims to establish a decision boundary with maximum distance between the normal dataset and the origin. The optimal boundary is learned from the normal sample dataset and encloses almost all normal behavior sample points; at test time, samples falling inside the boundary are judged normal behaviors, and samples falling outside it are judged abnormal points. The OCSVM maps the training dataset into a high-dimensional feature space through a kernel function K(x, z) = (Φ(x)·Φ(z)), giving the input data better aggregation, and then iteratively finds the optimal margin hyperplane that maximizes the distance between the training data and the origin. Assume the training dataset

{x_1, x_2, ..., x_N}, x_i ∈ χ,

represents a normal dataset. To obtain the boundary, the optimization model is as follows:

min over w, ξ, ρ of (1/2)·‖w‖² + (1/(νN))·Σ_{i=1}^{N} ξ_i − ρ, subject to (w·Φ(x_i)) ≥ ρ − ξ_i, ξ_i ≥ 0    (1)
where w and ρ are the parameters determining the decision boundary, treated as variables to be located during optimization; x_i denotes a training sample, N is the total number of training samples, ν is a balance parameter, and ξ_i ∈ {ξ_1, ξ_2, ..., ξ_N} are slack variables measuring the distance of a sample from the boundary. Φ: χ → F denotes the mapping from the data input space χ to the feature space F, which can be implemented by a simple kernel function transformation; learning of the OCSVM model is performed in the feature space. Equation (2) is the general form of the kernel function:

K(x, z) = (Φ(x)·Φ(z))    (2)

where K(x, z) and Φ(x) denote the kernel function and the mapping function, respectively; Φ(x)·Φ(z) is the inner product of Φ(x) and Φ(z); and x, z ∈ χ are samples of the data input space.
For example, the Gaussian kernel function is as follows:

K(x, z) = exp(−‖x − z‖² / (2σ²))    (3)

where z denotes the kernel center and ‖x − z‖ is the Euclidean distance between vectors x and z; as the distance between the two vectors increases, the Gaussian kernel value decreases monotonically, and σ controls the range of action of the Gaussian kernel.
Equation (1) can be solved by the Lagrange multiplier method, giving the dual problem shown in equation (4):

min over α of (1/2)·Σ_{i,j} α_i α_j K(x_i, x_j), subject to 0 ≤ α_i ≤ 1/(νN), Σ_i α_i = 1    (4)

where α_i is a Lagrange multiplier and α = [α_1, α_2, ..., α_N]^T. The normal vector w is defined by

w = Σ_i α_i Φ(x_i),

and the boundary parameter ρ can be computed from any support vector x_j via ρ = Σ_i α_i K(x_i, x_j). The optimal boundary is then determined by the support vector expansion, and the decision function is as follows:

f(x) = sign(Σ_i α_i K(x_i, x) − ρ)    (5)

where α_i denotes a Lagrange multiplier, f(x) denotes the decision function, Σ denotes summation, x_i denotes a non-zero (support) training sample, and sign(·) denotes the sign function.
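Putting the classifier step together, a sketch using scikit-learn's OneClassSVM with an RBF (Gaussian) kernel; the feature matrices below are random placeholders standing in for the LBP/HOFO vectors, and the nu and gamma values are assumptions, since the patent does not fix them:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Placeholders: in practice, rows are LBP (expression) or HOFO (limb)
# feature vectors extracted from normal-behavior training videos.
normal_features = rng.normal(size=(200, 256))
test_features = rng.normal(size=(20, 256))

# nu bounds the fraction of training samples allowed outside the boundary
# (the balance parameter ν in equation (1)); gamma controls the RBF width.
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
clf.fit(normal_features)

# predict() returns +1 for inliers (normal) and -1 for outliers (abnormal)
labels = clf.predict(test_features)
is_normal = (labels == 1).astype(int)   # 1 = normal, 0 = abnormal
```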
Step six, based on the LBP feature vectors of step four and the HOFO feature vectors of step five, train the expression abnormality classifier and the limb movement abnormality classifier respectively, then input the preprocessed classroom learning test dataset to obtain classification results for whether the expression and the limb movement are abnormal. In the invention, crying, aversion, dull eyes, making faces, and the like are abnormal expressions; eating, playing with a mobile phone, hitting others, standing up, waving hands, yawning, and the like are abnormal classroom behaviors.
Step seven, logically fuse the two modalities' classification results (expression and limb) to obtain the final classroom abnormal behavior detection result. Before this step, the detection results for the student's expression and limb motion have been obtained; R_e and R_l denote the two detection results respectively, and R denotes the final detection result:

R_e = 1 if the expression is normal, and 0 if it is abnormal;
R_l = 1 if the limb movement is normal, and 0 if it is abnormal;
R = R_e ∩ R_l

where R denotes the overall detection result. When R = 1, the student is behaving normally, i.e., R_e = 1 and R_l = 1; when either the expression detection or the limb movement detection reports an anomaly, the overall result is abnormal, i.e., R = 0.
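The decision fusion itself reduces to a logical AND of the two per-modality results, e.g.:

```python
import numpy as np

def fuse(r_e, r_l):
    """R = R_e AND R_l: abnormal (0) when either modality flags an anomaly."""
    return np.logical_and(r_e, r_l).astype(int)

# expression results [1,1,0] and limb results [1,0,1] fuse to [1,0,0]
print(fuse(np.array([1, 1, 0]), np.array([1, 0, 1])))
```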
The invention also provides a detection system for realizing the method, which comprises the following steps: the system comprises a classroom learning behavior database module, an expression abnormality detection module, a limb movement abnormality detection module and a decision fusion module;
the classroom learning behavior database module is used for collecting video data of students and determining a student classroom learning expression data set and a limb action data set; preprocessing a training data set, selecting an interested region in a video image by using a method in an OpenCV library, storing the processed data set in a classroom learning video database, and marking a normal or abnormal label;
the expression abnormality detection module obtains a result of whether the expression is abnormal or not by inputting the expression to be detected into the trained model;
the limb movement abnormality detection module is used for obtaining the result of whether the limb movement is abnormal or not by inputting the limb movement to be detected into the trained model;
and the decision fusion module adopts a decision-level fusion strategy to logically fuse the expression abnormality detection result and the limb movement abnormality detection result, and judges the final behavior abnormality detection result.
The beneficial effects of the invention include: the invention designs a student classroom abnormal behavior detection method based on a semi-supervised approach that systematically integrates information from the two modalities of expression and limb motion; it can highlight the unique characteristics of each modality, draws on richer information, has high fault tolerance, and improves the accuracy of detection results. In addition, the invention adopts the semi-supervised One-Class SVM as classifier, which requires fewer samples and offers better timeliness.
Drawings
Fig. 1 is a diagram of a student classroom abnormal behavior detection method based on multi-mode data fusion.
Fig. 2 is a calculation diagram of the LBP operator provided by the present invention.
Fig. 3 shows the optical flow Histogram (HOFO) calculation step provided by the present invention.
Fig. 4 is a diagram illustrating a process of calculating an i-th frame HOFO according to the present invention.
Fig. 5 is a HOFO extraction visualization process of the present invention.
Fig. 6 is a flowchart of a real-time detection method for abnormal behaviors of students in a real classroom according to the embodiment of the present invention.
Fig. 7 is a functional block diagram of an implementation of the present invention.
Detailed Description
The invention will be described in further detail with reference to the following specific examples and drawings. The procedures, conditions, experimental methods, etc. for carrying out the present invention are common knowledge and common knowledge in the art, except for the following specific references, and the present invention is not particularly limited.
The invention discloses a student classroom abnormal behavior detection method based on multi-modal data fusion. The method preprocesses student expression data and limb data in classroom video, selects a region of interest (Region of Interest), uses an LBP histogram as the expression feature description of students in the classroom, and uses a one-class support vector machine (One-Class SVM) as the classifier to detect expression abnormality; limb feature vectors in the classroom video stream are calculated with the optical flow histogram (HOFO) method, limb action abnormality is identified with a One-Class SVM classifier, and finally logical decision fusion is performed to obtain the final classroom abnormal behavior detection result. The invention adopts a new multi-modal data fusion method that can synthesize the data information of several modalities, making the detection result of students' classroom behavior more accurate. The invention also discloses a detection system implementing the method.
Specifically, the student classroom abnormal behavior detection method based on multi-modal data fusion provided by the invention comprises the following steps:
step 1: collecting student classroom learning data by using a camera, wherein the data set is divided into a training data set and a testing data set, and the training data set comprises an expression training data set and a limb action training data set so as to establish a detection database;
step 2: respectively preprocessing an expression training data set and a limb action data set, wherein the preprocessing comprises image resolution adjustment, image noise reduction, image size normalization, image graying and the like;
step 3: determining an interest area containing a target to be detected, and then respectively carrying out feature extraction of expression and limb actions and classifier training;
step 4: respectively inputting the test set into two trained classifiers to respectively obtain an expression abnormality detection result and a limb movement abnormality detection result;
step 5: logically fuse the detection results to obtain the final classroom behavior abnormality detection result.
Examples:
the embodiment provides a real-time detection method for abnormal behaviors of students in a real classroom, wherein a flow chart is shown in fig. 6, and a functional module chart is shown in fig. 7.
(1) The specific contents of the video database for student classroom learning established in fig. 7 are as follows:
A student classroom learning video database is established in the classroom behavior database module: student classroom video data are collected, and the student classroom learning expression dataset and limb action dataset are determined in preparation for training the expression OCSVM and the limb action OCSVM respectively. The training dataset is preprocessed, including video definition and resolution adjustment, video size normalization, video noise reduction, and the like. The region of interest in a video image is selected with a method from the OpenCV library: the relative position is calculated and the pixels of the region of interest are marked, the width and height of the rectangular frame to be selected are input, the region of interest is extracted through a rectangle function, and all pixel values outside the region of interest are marked as 0, as sketched below. Through these steps, the processed dataset is stored in the classroom learning video database and labeled normal or abnormal.
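A sketch of the ROI step just described, assuming the rectangle is given as (x, y, width, height); every pixel outside the rectangle is set to 0:

```python
import cv2
import numpy as np

def apply_roi(frame, x, y, w, h):
    """Keep only the rectangular region of interest; all pixel values
    outside the rectangle are marked as 0, as described in the text."""
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    cv2.rectangle(mask, (x, y), (x + w, y + h), 255, thickness=-1)
    return cv2.bitwise_and(frame, frame, mask=mask)
```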
(2) Expression abnormality detection. The details of the expression abnormality detection in fig. 7 are as follows:
Expression abnormality detection is carried out by the expression abnormality detection module in two stages: an offline expression-classifier training stage and an online testing stage. The student classroom expression dataset comprises a normal-expression training set and a test set. In the training stage, the students' normal classroom expression dataset is selected and preprocessed for video definition, resolution, video size normalization, and noise reduction to guarantee the video quality of the training dataset; expression features are extracted with the LBP histogram to form feature vectors, which serve as input values to train the OCSVM classifier. In the testing stage, the test dataset consists of videos of students learning in class; after preprocessing, expression features are extracted and input into the trained OCSVM classifier for classification, and abnormal expressions are detected by the OCSVM as outliers.
(3) The specific contents of the detection of the abnormal limb movements in fig. 7 are as follows:
Limb movement abnormality detection is performed by the limb movement abnormality detection module in two stages: an offline limb-movement-classifier training stage and an online testing stage. The student classroom limb action dataset comprises a normal limb action training set and a test set. In the training stage, the students' normal classroom limb action dataset is selected and preprocessed for video definition, resolution, video size normalization, and noise reduction to guarantee the video quality of the training dataset; limb action features are extracted with the optical flow histogram to form feature vectors, which serve as input values to train the OCSVM classifier. In the testing stage, the test dataset consists of videos of students learning in class; after preprocessing, limb action features are extracted and input into the trained OCSVM classifier for classification, and abnormal limb actions are detected by the OCSVM as outliers.
(4) Decision fusion the specific content of the decision fusion in fig. 7 is as follows:
Decision fusion is carried out by the decision fusion module, which adopts a decision-level fusion strategy to logically fuse the expression abnormality detection result and the limb movement abnormality detection result. Only when the result of the logical fusion is 1 is the student's behavior normal; if the result is 0, at least one of the student's expression and limb movement is abnormal. The classroom abnormality detection result is finally output in real time and a prompt is given to the teacher.
References
[1] He Xiuling, Yang Fan, Chen Zengzhao, Fang Jing, Li Yangyang. Student classroom behavior recognition based on human skeleton and deep learning [J]. Modern Educational Technology, 2020, 30(11): 105-112.
[2] Xu Guzhen, Deng Wei, Wei Yantao. Automatic recognition of student classroom behavior based on human skeleton information [J]. Modern Educational Technology, 2020, 30(05): 108-113.
[3] Vermun K, Senapaty M, Sankhla A, et al. Gesture-based affective and cognitive states recognition using kinect for effective feedback during e-learning [A]. 2013 IEEE Fifth International Conference on Technology for Education (T4E 2013) [C]. Piscataway: IEEE, 2013: 107-110.
[4] Gu Liyu, Zhang Chaohui, Zhao Xiaoyan. Classroom student status analysis based on artificial intelligence video processing [J]. Modern Educational Technology, 2019, 29(12): 82-88.
The protection of the present invention is not limited to the above embodiments. Variations and advantages that would occur to one skilled in the art are included in the invention without departing from the spirit and scope of the inventive concept, and the scope of the invention is defined by the appended claims.

Claims (10)

1. A student classroom abnormal behavior detection method based on multi-mode data fusion is characterized by comprising the following steps:
step one, acquiring video data of students in a classroom learning process through a classroom camera;
step two, preprocessing the classroom video data obtained in the step one;
step three, binarizing the image preprocessed in the step two, selecting an interest area in each frame of image of the video stream, and intensively extracting student behavior characteristics;
step four, extracting expression feature vectors and limb feature vectors of students in class from the processed classroom video stream data, using the local binary pattern (LBP) method and the optical flow histogram (HOFO) method respectively;
step five, respectively inputting the obtained student classroom expression feature vector and limb feature vector into an OCSVM classifier to classify, and obtaining classification results of whether the classroom student behaviors are abnormal;
and step six, logically fusing the two-mode classification results of the expression and the limbs to obtain a final classroom abnormal behavior detection result.
2. The method according to claim 1, wherein in step two, the preprocessing includes image graying, video image resolution adjustment, video image size normalization, video noise removal;
the image graying means replacing the three component values R, G, B of a color image with a single numerical value representing the color depth of the image pixel, the value lying in the range [0, 255];
the video image resolution adjustment means adjusting the resolution to 256 x 1169 to ensure image definition;
the video image size normalization refers to adjusting different video images to be the same size as pixel values;
the video image noise removal refers to removing unnecessary or redundant information in the image data.
3. The method according to claim 1, wherein in step three, binarization means converting the image into a representation with only two colors, black and white, the color depth of each pixel being 0 or 255; the region of interest in the video is selected using the maximum inter-class difference method.
4. The method according to claim 1, wherein in step four, the local binary pattern LBP calculates a pixel value by comparing the gray value of the center pixel P of the image with the gray values of adjacent pixels, expressed as follows:

L ≈ l(s(n_0 − n_c), ..., s(n_7 − n_c)),

wherein L denotes the computed value, l(x) denotes a single s(x) calculation result, n_0 denotes a pixel value in the window other than the center pixel, n_c denotes the value of the center pixel, and

s(x) = 1 if x ≥ 0, and 0 if x < 0.
5. The method of claim 1, wherein in step four, the descriptors in the optical flow histogram method HOFO are calculated in units of blocks, each block comprising b_w × b_h cells and each cell containing c_w × c_h pixels; each pixel in a cell is mapped to a fixed angle range by its optical flow direction and counted in a statistical histogram to obtain the optical flow histogram of the whole cell; several cells are then combined into a block, and the feature descriptors of all cells in a block are concatenated to obtain the HOFO feature descriptor of the block.
6. The method of claim 1, wherein in step five, the one-class support vector machine is a special variant of the general support vector machine that requires training with only one type of sample, either normal or abnormal; an expression anomaly detector and a limb anomaly detector, i.e., an expression one-class support vector machine and a limb one-class support vector machine, are set up and trained for the expression feature vectors and the limb feature vectors respectively; the abnormal expression comprises crying, aversion, dull eyes, and making faces; the limb abnormality comprises eating, playing with a mobile phone, hitting others, standing up, waving hands, and yawning.
7. The method according to claim 6, wherein the one-class support vector machine OCSVM is intended to establish a decision boundary with maximum distance between the normal dataset and the origin; the optimal learning boundary is trained on a normal sample dataset and encloses almost all normal behavior sample points; at test time, samples inside the boundary are detected as normal behaviors and samples outside the boundary as abnormal points; the OCSVM maps the input data to a high-dimensional feature space through the kernel function, giving the input data better aggregation, and then iteratively finds the optimal margin hyperplane that maximizes the distance between the training data and the origin.
8. The method of claim 7, wherein the training dataset

{x_1, x_2, ..., x_N}, x_i ∈ χ,

is assumed to represent a normal dataset; to obtain the boundary, the optimization model is as follows:

min over w, ξ, ρ of (1/2)·‖w‖² + (1/(νN))·Σ_{i=1}^{N} ξ_i − ρ, subject to (w·Φ(x_i)) ≥ ρ − ξ_i, ξ_i ≥ 0,

wherein w and ρ are the parameters determining the decision boundary, treated as variables to be located during optimization; x_i denotes a training sample, N is the total number of training samples, ν is a balance parameter, and ξ_i ∈ {ξ_1, ξ_2, ..., ξ_N} are slack variables measuring the distance of a sample from the boundary; Φ: χ → F denotes the mapping from the data input space χ to the feature space F, realized by a kernel function transformation, and the learning of the OCSVM model is performed in the feature space; the form of the kernel function is as follows:

K(x, z) = (Φ(x)·Φ(z)),

wherein K(x, z) and Φ(x) denote the kernel function and the mapping function, respectively; Φ(x)·Φ(z) is the inner product of Φ(x) and Φ(z); and x, z ∈ χ are samples of the data input space;

the optimization model is solved by the Lagrange multiplier method, giving the dual problem

min over α of (1/2)·Σ_{i,j} α_i α_j K(x_i, x_j), subject to 0 ≤ α_i ≤ 1/(νN), Σ_i α_i = 1,

wherein α_i is a Lagrange multiplier, α = [α_1, α_2, ..., α_N]^T, and the normal vector w is defined by w = Σ_i α_i Φ(x_i); the boundary parameter ρ can be computed from any support vector x_j via ρ = Σ_i α_i K(x_i, x_j); the optimal boundary is then determined by the support vector expansion, and the decision function is as follows:

f(x) = sign(Σ_i α_i K(x_i, x) − ρ),

wherein α_i denotes a Lagrange multiplier, x_i denotes a non-zero (support) training sample, and f(x) denotes the decision function.
9. The method according to claim 1, wherein in step six, the expression abnormal behavior detection result and the limb abnormal behavior detection result are fused and the final classroom abnormal behavior detection result is output; R_e and R_l denote the expression abnormality detection result and the limb abnormality detection result respectively, and R denotes the final detection result:

R_e = 1 if the expression is normal, and 0 if it is abnormal;
R_l = 1 if the limb movement is normal, and 0 if it is abnormal;
R = R_e ∩ R_l,

wherein R denotes the overall detection result; when R = 1, the student is behaving normally, i.e., R_e = 1 and R_l = 1; when either the expression detection or the limb movement detection reports an anomaly, the overall result is abnormal, i.e., R = 0.
10. A detection system for implementing the method according to any one of claims 1-9, characterized in that the system comprises: the system comprises a classroom learning behavior database module, an expression abnormality detection module, a limb movement abnormality detection module and a decision fusion module;
the classroom learning behavior database module is used for collecting video data of students and determining a student classroom learning expression data set and a limb action data set; preprocessing a training data set, selecting an interested region in a video image by using a method in an OpenCV library, storing the processed data set in a classroom learning video database, and marking a normal or abnormal label;
the expression abnormality detection module obtains a result of whether the expression is abnormal or not by inputting the expression to be detected into the trained model;
the limb movement abnormality detection module is used for obtaining the result of whether the limb movement is abnormal or not by inputting the limb movement to be detected into the trained model;
and the decision fusion module adopts a decision-level fusion strategy to logically fuse the expression abnormality detection result and the limb movement abnormality detection result, and judges the final behavior abnormality detection result.
CN202211131074.XA 2022-09-16 2022-09-16 Multi-mode data fusion-based student classroom abnormal behavior detection method and detection system Pending CN116071575A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211131074.XA CN116071575A (en) 2022-09-16 2022-09-16 Multi-mode data fusion-based student classroom abnormal behavior detection method and detection system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211131074.XA CN116071575A (en) 2022-09-16 2022-09-16 Multi-mode data fusion-based student classroom abnormal behavior detection method and detection system

Publications (1)

Publication Number Publication Date
CN116071575A true CN116071575A (en) 2023-05-05

Family

ID=86172198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211131074.XA Pending CN116071575A (en) 2022-09-16 2022-09-16 Multi-mode data fusion-based student classroom abnormal behavior detection method and detection system

Country Status (1)

Country Link
CN (1) CN116071575A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117437208A (en) * 2023-11-10 2024-01-23 北京交通大学 Rail anomaly detection method and system using multi-sensor fusion


Similar Documents

Publication Publication Date Title
CN106960202B (en) Smiling face identification method based on visible light and infrared image fusion
CN110889672B (en) Student card punching and class taking state detection system based on deep learning
CN109389074B (en) Facial feature point extraction-based expression recognition method
Bascón et al. An optimization on pictogram identification for the road-sign recognition task using SVMs
CN111563452B (en) Multi-human-body gesture detection and state discrimination method based on instance segmentation
CN107767405A (en) A kind of nuclear phase for merging convolutional neural networks closes filtered target tracking
CN113793336B (en) Method, device and equipment for detecting blood cells and readable storage medium
CN101482923A (en) Human body target detection and sexuality recognition method in video monitoring
Reshna et al. Spotting and recognition of hand gesture for Indian sign language recognition system with skin segmentation and SVM
CN111158491A (en) Gesture recognition man-machine interaction method applied to vehicle-mounted HUD
CN111158457A (en) Vehicle-mounted HUD (head Up display) human-computer interaction system based on gesture recognition
CN112528777A (en) Student facial expression recognition method and system used in classroom environment
Roa’a et al. Automated cheating detection based on video surveillance in the examination classes
Elhassan et al. DFT-MF: Enhanced deepfake detection using mouth movement and transfer learning
Berrú-Novoa et al. Peruvian sign language recognition using low resolution cameras
Sharma et al. Deep learning based student emotion recognition from facial expressions in classrooms
CN116543261A (en) Model training method for image recognition, image recognition method device and medium
CN116071575A (en) Multi-mode data fusion-based student classroom abnormal behavior detection method and detection system
Rohini et al. Attendance monitoring system design based on face segmentation and recognition
Cowie et al. An intelligent system for facial emotion recognition
CN117437691A (en) Real-time multi-person abnormal behavior identification method and system based on lightweight network
CN111898454A (en) Weight binarization neural network and transfer learning human eye state detection method and device
Silva et al. POEM-based facial expression recognition, a new approach
Shanthi et al. Gender and age detection using deep convolutional neural networks
Bansal et al. Detection and Recognition of Hand Gestures for Indian Sign Language Recognition System

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination