CN117058752A - Student classroom behavior detection method based on improved YOLOv7 - Google Patents

Student classroom behavior detection method based on improved YOLOv7

Info

Publication number
CN117058752A
Authority
CN
China
Prior art keywords
student
yolov7
image
features
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310884525.5A
Other languages
Chinese (zh)
Inventor
王莉娜 (Wang Lina)
代启国 (Dai Qiguo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Minzu University
Original Assignee
Dalian Minzu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Minzu University
Priority to CN202310884525.5A
Publication of CN117058752A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A student classroom behavior detection method based on improved YOLOv7 belongs to the technical field of classroom behavior detection. First, the Detect prediction head is replaced with an ASFFDetect structure, so that the YOLOv7 network model fuses features across different feature levels, capturing target information at different scales and improving target localization. Second, the CIoU loss function in the original YOLOv7 network model is replaced with an NWD-based loss, which adapts to unbalanced data and improves the generalization capability of the model. Finally, an ACmix attention module is added so that the network pays more attention to the objects to be detected and its feature-processing capability is enhanced. The improved YOLOv7 model provided by the application can effectively detect students' classroom behaviors under low image resolution, targets of different scales, and occlusion.

Description

Student classroom behavior detection method based on improved YOLOv7
Technical Field
The application relates to the technical field of classroom behavior detection, in particular to a student classroom behavior detection method based on improved YOLOv7.
Background
With the development of the education sector, classroom teaching has received growing attention, with particular focus on students' responses and behavioral changes in class. The introduction of new curriculum standards places higher demands on teaching evaluation. Meanwhile, in recent years China has steadily advanced the construction of smart campuses, gradually building school models featuring intelligent teaching, intelligent management, and intelligent campus life. The student classroom is one of the key links in building a smart campus, and its quality is influenced by many aspects, including teaching design, classroom practice, and teaching evaluation. Among them, teaching evaluation based on observing students' classroom behavior is an effective and commonly used method.
In conventional teaching evaluation, an assessment teacher typically sits in the back row to evaluate both the students' in-class states and the lecturer's teaching. However, due to this positional limitation, it is difficult to observe students' specific in-class states comprehensively: only a few students can be assessed, resulting in incomplete evaluation data. In addition, different assessment teachers differ in evaluation criteria, observation patterns, and perspectives, which also leads to divergent evaluation results. An assessment teacher's mental state varies across different periods of the same class, and it is difficult to observe students' classroom behavior attentively for a long time, further increasing the variance of the evaluation. Therefore, detecting and analyzing students' in-class behavior from an objective perspective is of great significance to assessment teachers, lecturers, school administrators, and parents. If computer technology can automatically identify and detect students' classroom behavior, it can provide comprehensive and objective data for teaching evaluation and help improve teaching quality.
With the development of video analysis and computer vision technology, analyzing student behavior in classroom videos or pictures can provide more accurate and objective feedback for teaching assessment. In the field of classroom behavior detection, common approaches include video-based action recognition, pose estimation, and object detection. Video action recognition faces large-scale, high-dimensional video data, requires large amounts of computing resources and memory, and actions have long-term dependencies in video that must be captured and modeled over time. Pose estimation is challenging when the poses of multiple people must be estimated simultaneously in a multi-person scene, and its accuracy degrades when parts of the human body are occluded or poses change drastically. Time-series analysis likewise requires establishing long-term dependencies to accommodate different behavioral patterns and contexts. Behavior recognition based on object detection, in contrast, can accurately locate the target object and detect and recognize multiple targets simultaneously in complex scenes such as multi-person interaction and group behavior. Object detection has also made remarkable progress in real-time applications, providing strong support for behavior recognition tasks.
Classroom teaching videos contain numerous student targets and severe occlusion, which pose substantial research challenges for student behavior recognition in classroom scenes. To automatically identify the classroom behavior of all students, a more robust multi-person behavior recognition model needs to be studied. Conventional object-detection-based methods for student classroom behavior detection are affected by factors such as numerous student targets, inconsistent target sizes, target occlusion, and low video or image resolution, and therefore cannot accurately and efficiently identify students' in-class behavioral states.
Disclosure of Invention
Aiming at the defects in the prior art, the application provides a student classroom behavior detection method based on improved YOLOv7. The method improves the backbone network, the prediction head, and the IoU-based loss of YOLOv7 so that the improved model focuses on the objects to be detected, thereby improving behavior detection in student classroom scenes and solving the problems described in the background art. Experimental results show that the method of the application outperforms the prior art.
In order to achieve the above purpose, the application adopts the technical scheme that: a student classroom behavior detection method based on improved YOLOv7 comprises the following steps:
step 1, acquiring a video of student classroom behavior, and splitting the acquired video into frames to obtain pictures of student classroom behavior;
step 2, preprocessing the images obtained in step 1, labeling them with the labelImg image annotation tool, and dividing the data to obtain a student classroom behavior dataset;
step 3, constructing a student classroom behavior detection network based on improved YOLOv7: adding an ACmix attention mechanism to the backbone network of the YOLOv7 algorithm, improving the prediction head by replacing the Detect head in the original YOLOv7 algorithm with an ASFFDetect structure, and introducing an NWD-based regression loss as the loss function;
step 4, feeding the image data in the dataset into the improved YOLOv7 model for training, obtaining a trained student classroom behavior detection model;
step 5, feeding the classroom scene image to be detected into the trained model to obtain the behavior category and confidence of each student;
the image preprocessing and image labeling of step 2 comprise the following steps:
step 2.1, preprocessing the obtained student classroom behavior images with the OpenCV library, for example adjusting brightness and contrast, removing background and partial regions, smoothing, reducing noise, and fusing pictures, to obtain the student classroom behavior images;
step 2.2, labeling the student actions in the obtained images with the labelImg image annotation tool, and saving the tag information in a txt file with the same name as the picture, to obtain the student classroom behavior dataset;
step 2.3, dividing the student classroom behavior image dataset into a training set and a test set: all pictures and their labels are split at a ratio of 8:2.
The student classroom behavior detection network based on improved YOLOv7 mainly comprises four parts: Input, Backbone, Neck, and Head. An ACmix attention convolution module is introduced into the Neck of the base YOLOv7 to highlight the key target features contained in the shallow network and weaken irrelevant information, improving the algorithm's detection of small targets and making the network focus more on the targets to be detected. In the Head, the Detect prediction head of the original network is replaced with an ASFFDetect prediction head, which learns an optimal fusion of features from different layers during training and filters out features from other layers that carry contradictory information, thereby resolving the inconsistency of learning targets. An NWD-based regression loss is introduced to replace CIoU in the original YOLOv7 network model, optimizing the loss function, adapting to unbalanced data, and improving the generalization capability of the model.
the ACmix attention convolution module introduced in the neg section can be roughly divided into three first stages: the input features are projected by 3 1 x1 convolutions and then recombined into N blocks. Thus, a feature map containing 3×n intermediate features is obtained. And a second stage: using according to a different paradigm, for a self-attention path, intermediate features are collected into N groups, where each group contains three features, corresponding to q, k, v. For a convolution path with the kernel size of K, a lightweight full-connection layer is adopted to generate K2 feature graphs, and features are generated through shifting and aggregation. The third stage adds the outputs of the two paths, the intensity of which is shown by two learnable scalar controls:
F_out = α·F_att + β·F_conv  (1)

where F_out denotes the final output of the module, F_att the output of the self-attention branch, and F_conv the output of the convolution branch; the parameters α and β are learnable scalars, both initialized to 1. Combining the outputs of the two branches takes both global and local features into account, improving the network's detection of small targets.
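As an illustration of equation (1), a minimal PyTorch sketch of this two-branch fusion follows; the class name and tensor arguments are assumptions for exposition, not the patent's reference implementation.

```python
import torch
import torch.nn as nn

class ACmixFusion(nn.Module):
    """Sketch of ACmix stage three: blend the self-attention branch and
    the convolution branch with two learnable scalars (equation (1));
    alpha and beta are initialized to 1 as described in the text."""

    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))  # weight of F_att
        self.beta = nn.Parameter(torch.ones(1))   # weight of F_conv

    def forward(self, f_att: torch.Tensor, f_conv: torch.Tensor) -> torch.Tensor:
        # F_out = alpha * F_att + beta * F_conv
        return self.alpha * f_att + self.beta * f_conv
```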
In the Head, the Detect prediction head of the original network is replaced with an ASFFDetect prediction head. The ASFF module comprises two steps: rescaling to a common size and adaptive feature fusion. Feature rescaling: the feature maps of different levels have inconsistent sizes, so whatever the fusion approach, they must be reshaped to the same size; enlarging a small feature map requires upsampling, and shrinking a large one requires downsampling. Adaptive fusion: taking ASFF-3 as an example, the features X1, X2, and X3 from level1, level2, and level3 are multiplied by the weight parameters α3, β3, and γ3 respectively and summed to obtain the new fused feature ASFF-3:
y_ij^l = α_ij^l · x_ij^(1→l) + β_ij^l · x_ij^(2→l) + γ_ij^l · x_ij^(3→l)  (2)

where y_ij^l denotes the (i, j) vector of the output feature map y^l across channels, x_ij^(n→l) is the feature vector from level n resized to level l, and α_ij^l, β_ij^l, and γ_ij^l are the spatial importance weights of the feature maps from the three different levels with respect to level l. Because the fusion is additive, the features output by level1 to level3 must have the same size and the same channel number at the time of addition, so the features of different layers are upsampled or downsampled and their channel numbers adjusted. The weight parameters α, β, and γ are obtained by applying 1×1 convolutions to the resized feature maps of level1 to level3; after passing through the concat layer, a softmax function constrains them to [0, 1] with a sum of 1:

α_ij^l = e^(λα_ij^l) / (e^(λα_ij^l) + e^(λβ_ij^l) + e^(λγ_ij^l))  (3)
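A schematic PyTorch sketch of the adaptive fusion of equations (2) and (3) is given below; it assumes the three level features have already been resized to a common shape and channel number, and all module names are illustrative.

```python
import torch
import torch.nn as nn

class ASFFFusion(nn.Module):
    """Sketch of ASFF adaptive fusion for one output level: predict a
    per-pixel weight map for each of the three (already resized) level
    features, normalize the maps with softmax so they lie in [0, 1] and
    sum to 1, then form the weighted sum of equation (2)."""

    def __init__(self, channels: int):
        super().__init__()
        # one 1x1 conv per level produces a single-channel weight map
        self.weight_convs = nn.ModuleList(
            [nn.Conv2d(channels, 1, kernel_size=1) for _ in range(3)]
        )

    def forward(self, x1, x2, x3):  # features resized to the same H x W
        logits = torch.cat([conv(x) for conv, x in
                            zip(self.weight_convs, (x1, x2, x3))], dim=1)
        w = torch.softmax(logits, dim=1)  # alpha, beta, gamma maps
        return w[:, 0:1] * x1 + w[:, 1:2] * x2 + w[:, 2:3] * x3
```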
The CIoU loss of the original model is replaced by a loss designed from the NWD metric:

NWD(N_p, N_g) = exp(-√(W_2^2(N_p, N_g)) / C)  (4)

Loss_NWD = 1 - NWD(N_p, N_g)  (5)

where N_p is the Gaussian distribution model of the prediction box P, N_g is the Gaussian distribution model of the ground-truth (GT) box G, W_2^2(N_p, N_g) is the squared 2-Wasserstein distance between the two Gaussians, and C is a constant related to the dataset. The NWD-based loss still provides gradients in the cases |P ∩ G| = 0 and |P ∩ G| = P or G, where IoU-based losses saturate.
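For concreteness, a short sketch of the NWD-based loss follows. It uses the known closed form of the 2-Wasserstein distance between the box Gaussians (the squared L2 distance between the [cx, cy, w/2, h/2] vectors); the constant C = 12.8 is an illustrative assumption, not the patent's setting.

```python
import torch

def nwd_loss(pred: torch.Tensor, gt: torch.Tensor, c: float = 12.8) -> torch.Tensor:
    """Sketch of the NWD-based regression loss. pred and gt are (N, 4)
    boxes in (cx, cy, w, h) form. Each box is modeled as a 2-D Gaussian;
    the squared 2-Wasserstein distance between two such Gaussians reduces
    to the squared L2 distance between their [cx, cy, w/2, h/2] vectors."""
    p = torch.stack([pred[:, 0], pred[:, 1], pred[:, 2] / 2, pred[:, 3] / 2], dim=1)
    g = torch.stack([gt[:, 0], gt[:, 1], gt[:, 2] / 2, gt[:, 3] / 2], dim=1)
    w2 = ((p - g) ** 2).sum(dim=1)           # squared Wasserstein distance
    nwd = torch.exp(-torch.sqrt(w2) / c)     # normalized similarity in (0, 1]
    return (1.0 - nwd).mean()                # loss of equation (5)
```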
The technical scheme of the application achieves the following effects. In the student classroom behavior detection method based on improved YOLOv7, adding the ACmix attention convolution module highlights the key target features contained in the shallow network and weakens irrelevant information, making the network focus more on the targets to be detected and addressing the numerous student targets and target occlusion in classroom scenes. Replacing the Detect prediction head in the Head of the original YOLOv7 model with the ASFFDetect prediction head learns an optimal fusion of features from different layers during training and filters out features carrying contradictory information, resolving the inconsistency of learning targets and the large variation of target sizes in classroom scenes. In addition, an NWD-based regression loss replaces CIoU in the original YOLOv7 network model, optimizing the loss function, adapting to unbalanced data, improving the generalization capability of the model, and addressing detection under the low image resolution typical of classroom scenes.
Drawings
Fig. 1 is a flowchart of the student classroom behavior detection method based on improved YOLOv7.
Fig. 2 is the network model structure of the student classroom behavior detection method based on improved YOLOv7.
Fig. 3 is a detection effect diagram of the student classroom behavior detection method based on improved YOLOv7.
Detailed Description
The application is described in further detail below with reference to the attached drawings and specific embodiments. It will be apparent that the described examples are only some, not all, embodiments of the application.
Fig. 1 shows the flowchart of the student classroom behavior detection method based on improved YOLOv7. The method specifically comprises the following steps:
step 1, acquiring a video of student classroom behavior, and splitting the acquired video into frames to obtain pictures of student classroom behavior;
and acquiring a student class behavior video, downloading a student class behavior data set, downloading the class behavior video from a data source Github, reading the video, setting the resolution of an output image, and outputting each frame in an image format in sequence to obtain a student class behavior image.
Step 2, preprocessing the images obtained in step 1, labeling them with the labelImg image annotation tool, and dividing the data to obtain a student classroom behavior dataset;
and 2.1, preprocessing the student classroom behavior image by using an OpenCV library, changing brightness and contrast, removing background, carrying out smoothing treatment on partial images, reducing noise, and fusing pictures to obtain the student classroom behavior image.
Step 2.2, label the student actions in the obtained images with the labelImg image annotation tool, and save the tag information in a txt file with the same name as the picture, to obtain the student classroom behavior dataset;
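In YOLO mode, labelImg writes one line per object, "class_id cx cy w h", with coordinates normalized to [0, 1], into a .txt file named after the picture. A small parsing sketch follows; the behavior class list is an assumed example, not the dataset's actual label set.

```python
from pathlib import Path

CLASSES = ["listening", "writing", "raising_hand", "sleeping"]  # assumed labels

def read_yolo_labels(txt_path: str):
    """Parse a labelImg YOLO-format .txt file: each line is
    'class_id cx cy w h' with coordinates normalized to [0, 1]."""
    boxes = []
    for line in Path(txt_path).read_text().splitlines():
        cls, cx, cy, w, h = line.split()
        boxes.append((CLASSES[int(cls)], float(cx), float(cy), float(w), float(h)))
    return boxes
```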
step 2.3, dividing the student classroom behavior image dataset into a training set and a test set: all pictures and their labels are split at a ratio of 8:2;
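The 8:2 split can be realized with a short script such as the following sketch; the directory layout is an assumption.

```python
import random
import shutil
from pathlib import Path

def split_dataset(img_dir: str, label_dir: str, out_dir: str, train_ratio: float = 0.8):
    """Shuffle all images and copy them (with their same-named .txt
    labels) into train/ and test/ subfolders at the given ratio."""
    images = sorted(Path(img_dir).glob("*.jpg"))
    random.shuffle(images)
    n_train = int(len(images) * train_ratio)
    for i, img in enumerate(images):
        subset = "train" if i < n_train else "test"
        label = Path(label_dir) / (img.stem + ".txt")
        for src, kind in ((img, "images"), (label, "labels")):
            dst = Path(out_dir) / subset / kind
            dst.mkdir(parents=True, exist_ok=True)
            shutil.copy(src, dst / src.name)
```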
step 3, constructing a student classroom behavior detection network based on improved YOLOv7: adding an ACmix attention mechanism to the backbone network of the YOLOv7 algorithm, improving the prediction head by replacing the Detect head in the original YOLOv7 algorithm with an ASFFDetect structure, and introducing an NWD-based regression loss as the loss function;
the student class behavior detection network based on the improved YOLOv7 is constructed, and specifically comprises an attention adding convolution module, a change prediction head and a replacement loss function:
the student class behavior detection network based on the improved YOLOv7 mainly comprises an Input Backbone network (backbox), a Neck (Neck) and a Head (Head) 4 part, an ACmix attention convolution module is introduced into the Neck part of the basic YOLOv7, key target characteristics contained in a shallow network are highlighted, irrelevant information is weakened, the detection performance of an algorithm on small targets is improved, and the network is more focused on the targets to be detected. And replacing the Detect pre-measurement Head in the original network with the ASFFdetect pre-measurement Head in the Head part, and filtering out other layer characteristics carrying contradictory information by an optimal fusion method for learning different layer characteristics in the training process, thereby solving the problem of inconsistent learning targets. Introducing NWD-based Regression Loss to replace CIoU in the original YOLOv7 network model to optimize a loss function, adapting to unbalanced data and improving the generalization capability of the model;
the ACmix attention convolution module introduced in the neg section can be roughly divided into three first stages: the input features are projected by 3 1 x1 convolutions and then recombined into N blocks. Thus, a feature map containing 3×n intermediate features is obtained. And a second stage: using according to a different paradigm, for a self-attention path, intermediate features are collected into N groups, where each group contains three features, corresponding to q, k, v. For a convolution path with the kernel size of K, a lightweight full-connection layer is adopted to generate K2 feature graphs, and features are generated through shifting and aggregation. The third stage adds the outputs of the two paths, the intensity of which is shown by two learnable scalar controls:
F_out = α·F_att + β·F_conv  (1)

where F_out denotes the final output of the module, F_att the output of the self-attention branch, and F_conv the output of the convolution branch; the parameters α and β are learnable scalars, both initialized to 1. Combining the outputs of the two branches takes both global and local features into account, improving the network's detection of small targets.
In the Head, the Detect prediction head of the original network is replaced with an ASFFDetect prediction head. The ASFF module comprises two steps: rescaling to a common size and adaptive feature fusion. Feature rescaling: the feature maps of different levels have inconsistent sizes, so whatever the fusion approach, they must be reshaped to the same size; enlarging a small feature map requires upsampling, and shrinking a large one requires downsampling. Adaptive fusion: taking ASFF-3 as an example, the features X1, X2, and X3 from level1, level2, and level3 are multiplied by the weight parameters α3, β3, and γ3 respectively and summed to obtain the new fused feature ASFF-3:
y_ij^l = α_ij^l · x_ij^(1→l) + β_ij^l · x_ij^(2→l) + γ_ij^l · x_ij^(3→l)  (2)

where y_ij^l denotes the (i, j) vector of the output feature map y^l across channels, x_ij^(n→l) is the feature vector from level n resized to level l, and α_ij^l, β_ij^l, and γ_ij^l are the spatial importance weights of the feature maps from the three different levels with respect to level l. Because the fusion is additive, the features output by level1 to level3 must have the same size and the same channel number at the time of addition, so the features of different layers are upsampled or downsampled and their channel numbers adjusted. The weight parameters α, β, and γ are obtained by applying 1×1 convolutions to the resized feature maps of level1 to level3; after passing through the concat layer, a softmax function constrains them to [0, 1] with a sum of 1:

α_ij^l = e^(λα_ij^l) / (e^(λα_ij^l) + e^(λβ_ij^l) + e^(λγ_ij^l))  (3)
The CIoU loss of the original model is replaced by a loss designed from the NWD metric:

NWD(N_p, N_g) = exp(-√(W_2^2(N_p, N_g)) / C)  (4)

Loss_NWD = 1 - NWD(N_p, N_g)  (5)

where N_p is the Gaussian distribution model of the prediction box P, N_g is the Gaussian distribution model of the ground-truth (GT) box G, W_2^2(N_p, N_g) is the squared 2-Wasserstein distance between the two Gaussians, and C is a constant related to the dataset. The NWD-based loss still provides gradients in the cases |P ∩ G| = 0 and |P ∩ G| = P or G, where IoU-based losses saturate.
Step 4, feed the image data in the dataset into the improved YOLOv7 model for training, obtaining a trained student classroom behavior detection model;
and (3) sending the image data in the student class behavior data set into an improved YOLOv7 model for training, setting a training parameter, setting a learning rate to be 0.001, setting a confidence coefficient threshold to be 0.5, inputting all pictures in the training set into the improved YOLOv7 model for training, and repeating the training operation to obtain the model with the best training effect.
Step 5, feed the classroom scene image to be detected into the trained model to obtain the behavior category and confidence of each student;
and detecting the classroom behavior of the students by using a trained student classroom behavior detection network based on improved YOLOv 7.
Fig. 2 shows the network model structure of the student classroom behavior detection method based on improved YOLOv7. As shown in the figure, the method designs a new network structure for student classroom behavior detection: on the basis of the YOLOv7 network, an ACmix attention convolution module is added, the prediction head is changed to ASFFDetect, and the original loss function is replaced with an NWD-based regression loss. Experimental results show that, compared with the prior art, the method has advantages in both accuracy and real-time performance.
Taking a group of student classroom behavior images as input and running the detection of step 5 on each image yields the student classroom behavior detection images. Fig. 3 shows the detection results for this group of pictures; it can be seen that the method accurately detects students' behaviors in a multi-target, occluded classroom scene, demonstrating its feasibility and effectiveness.

Claims (2)

1. A student classroom behavior detection method based on improved YOLOv7, characterized by comprising the following steps:
step 1, acquiring a video of student classroom behavior, and splitting the acquired video into frames to obtain pictures of student classroom behavior;
step 2, preprocessing the images obtained in step 1, labeling them with the labelImg image annotation tool, and dividing the data to obtain a student classroom behavior dataset;
step 3, constructing a student classroom behavior detection network based on improved YOLOv7: adding an ACmix attention convolution module to the backbone network of the YOLOv7 algorithm; improving the prediction head by replacing the Detect head in the original YOLOv7 algorithm with an ASFFDetect structure, which learns an optimal fusion of features from different layers during training and filters out features from other layers that carry contradictory information; and introducing an NWD-based regression loss as the loss function;
the student classroom behavior detection network based on improved YOLOv7 mainly comprises four parts: Input, Backbone, Neck, and Head; the ACmix attention convolution module introduced in the Neck operates as follows:
first stage: projecting the input features through three 1×1 convolutions and then reshaping the projected features into N blocks, obtaining a feature map comprising 3×N intermediate features;
second stage: using the intermediate features according to two different paradigms; for the self-attention path, gathering the intermediate features into N groups, each containing three features corresponding to q, k, and v; for the convolution path with kernel size K, generating K² feature maps with a lightweight fully connected layer and producing features by shifting and aggregation;
third stage: adding the outputs of the two paths, their strengths controlled by two learnable scalars:
F_out = α·F_att + β·F_conv  (1)

where F_out denotes the final output of the module, F_att the output of the self-attention branch, and F_conv the output of the convolution branch; the parameters α and β are learnable scalars, both initialized to 1;
in the Head, the Detect prediction head of the original network is replaced with an ASFFDetect prediction head, and the ASFFDetect module comprises two steps: rescaling to a common size and adaptive feature fusion;
(1) feature rescaling: the feature maps of different levels have inconsistent sizes and are reshaped to the same size; enlarging a small feature map requires upsampling, and shrinking a large one requires downsampling;
(2) adaptive fusion: multiplying the features X1, X2, and X3 from level1, level2, and level3 by the weight parameters α, β, and γ respectively and summing them to obtain the new fused feature ASFF-3:
y_ij^l = α_ij^l · x_ij^(1→l) + β_ij^l · x_ij^(2→l) + γ_ij^l · x_ij^(3→l)  (2)

where y_ij^l denotes the (i, j) vector of the output feature map y^l across channels, and α_ij^l, β_ij^l, and γ_ij^l are the spatial importance weights of the feature maps from the three different levels with respect to level l; because the fusion is additive, the features of different layers are upsampled or downsampled and their channel numbers adjusted so that the output features of level1 to level3 have the same size and the same channel number;
the weight parameters α, β, and γ are obtained by applying 1×1 convolutions to the resized feature maps of level1 to level3; after passing through the concat layer, a softmax function constrains the weight parameters α, β, and γ to [0, 1] with a sum of 1:
α_ij^l = e^(λα_ij^l) / (e^(λα_ij^l) + e^(λβ_ij^l) + e^(λγ_ij^l))  (3)

the loss function of the original model is replaced by a loss designed from the NWD metric:

NWD(N_p, N_g) = exp(-√(W_2^2(N_p, N_g)) / C)  (4)

Loss_NWD = 1 - NWD(N_p, N_g)  (5)

where N_p is the Gaussian distribution model of the prediction box P, N_g is the Gaussian distribution model of the ground-truth (GT) box G, W_2^2(N_p, N_g) is the squared 2-Wasserstein distance between the two Gaussians, and C is a constant related to the dataset; the NWD-based loss provides gradients in the cases |P ∩ G| = 0 and |P ∩ G| = P or G;
step 4, feeding the image data in the dataset into the improved YOLOv7 model for training, obtaining a trained student classroom behavior detection model;
and step 5, feeding the classroom scene image to be detected into the trained model to obtain the behavior category and confidence of each student.
2. The student classroom behavior detection method based on improved YOLOv7 of claim 1, wherein the image preprocessing and image labeling of step 2 comprise the following steps:
step 2.1, preprocessing the obtained student classroom behavior images with the OpenCV library: adjusting brightness and contrast, removing background, smoothing local regions, reducing noise, and fusing pictures, to obtain the student classroom behavior images;
step 2.2, labeling the student actions in the obtained images with the labelImg image annotation tool, and saving the tag information in a txt file with the same name as the picture, to obtain the student classroom behavior dataset;
and 2.3, dividing the student class behavior image data set into a training data set and a test data set, and dividing all pictures and marked labels into the training set and the test set according to the proportion of 8:2.
CN202310884525.5A 2023-07-19 2023-07-19 Student classroom behavior detection method based on improved YOLOv7 Pending CN117058752A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310884525.5A CN117058752A (en) 2023-07-19 2023-07-19 Student classroom behavior detection method based on improved YOLOv7

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310884525.5A CN117058752A (en) 2023-07-19 2023-07-19 Student classroom behavior detection method based on improved YOLOv7

Publications (1)

Publication Number Publication Date
CN117058752A (en) 2023-11-14

Family

ID=88661621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310884525.5A Pending CN117058752A (en) 2023-07-19 2023-07-19 Student classroom behavior detection method based on improved YOLOv7

Country Status (1)

Country Link
CN (1) CN117058752A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117611998A * 2023-11-22 2024-02-27 Yancheng Institute of Technology Optical remote sensing image target detection method based on improved YOLOv7



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination