CN110728316A - Classroom behavior detection method, system, device and storage medium - Google Patents

Classroom behavior detection method, system, device and storage medium

Info

Publication number
CN110728316A
Authority
CN
China
Prior art keywords: negative sample, positive, matching, frame, adopting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910943862.0A
Other languages
Chinese (zh)
Inventor
曾文彬
郝禄国
吴楚权
葛海玉
杨琳
高星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guang Zhou Hai Noboru Computer Science And Technology Ltd
Original Assignee
Guang Zhou Hai Noboru Computer Science And Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guang Zhou Hai Noboru Computer Science And Technology Ltd
Priority to CN201910943862.0A
Publication of CN110728316A
Legal status: Pending (current)

Classifications

    • G06F18/22: Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06F18/214: Pattern recognition; Analysing; Design or setup of recognition systems or techniques; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24: Pattern recognition; Analysing; Classification techniques
    • G06V40/20: Recognition of biometric, human-related or animal-related patterns in image or video data; Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a classroom behavior detection method, system, device and storage medium, wherein the method comprises the following steps: acquiring an image set to be processed and preprocessing it with a preset algorithm to obtain positive/negative sample images in a preset ratio; performing feature extraction on the obtained positive/negative sample images to generate positive/negative sample feature maps; classifying the obtained positive/negative sample feature maps and then generating positive/negative sample prediction boxes; and matching the positive/negative sample prediction boxes by combining a loss function with a preset set of real (ground-truth) boxes, and outputting a detection result once the matching succeeds. The method achieves accurate recognition of classroom behavior, improves the precision and timeliness of target-behavior detection, reduces the dependence on hardware, has a simple structure, and makes the training set easy to construct, which is conducive to the further popularization and application of deep-learning-based target detection algorithms in the teaching field. The method can be widely applied in the technical field of computer vision processing.

Description

Classroom behavior detection method, system, device and storage medium
Technical Field
The invention relates to the technical field of computer vision processing, and in particular to a classroom behavior detection method, system, device and storage medium.
Background
Interpretation of terms:
One-stage (single-step) target detection algorithm: a regression-based detection algorithm.
Two-stage (two-step) target detection algorithm: a detection algorithm based on region candidates (proposals).
NMS (Non-Maximum Suppression): a non-maximum suppression algorithm (see the sketch after this list of terms).
Two-Stream (dual-stream): a behavior detection algorithm for video.
OHEM (Online Hard Example Mining): an online hard example mining algorithm.
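As context for the NMS term above, the following is a minimal, framework-free sketch of IoU-based non-maximum suppression. The [x1, y1, x2, y2] box layout and the 0.5 overlap threshold are illustrative assumptions and are not values fixed by this disclosure; the iou and nms helpers defined here are reused in the later sketches.

```python
import numpy as np

def iou(box, boxes):
    """Intersection-over-union of one box against an array of boxes, all in [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.5):
    """Repeatedly keep the highest-scoring box and drop boxes that overlap it too much."""
    order = np.argsort(scores)[::-1]   # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps < iou_thresh]  # suppress boxes that overlap the kept one
    return keep
```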
In recent years, with the rapid development of artificial intelligence, great breakthroughs have been made in deep-learning-based target detection. Both one-stage and two-stage target detection algorithms have undergone rapid development: the two-stage family evolved from R-CNN to Fast R-CNN and on to R-FCN, while the one-stage family is represented by SSD, YOLO and the like. Because these target detection algorithms offer good accuracy and real-time performance, deep-learning-based target detection has been widely applied in many fields, such as face detection, pedestrian detection and license plate recognition.
Deep-learning-based detection algorithms also have great application prospects and value in the teaching field. At present there are two main ways of detecting student behavior in a classroom environment with deep-learning-based detection algorithms: first, research based on two-stream algorithms and 3D convolution; second, using existing one-stage or two-stage target detection algorithms to detect student behavior and to locate and identify the students' positions. However, the two-stream and 3D-convolution approach has the following disadvantages: the computation is complex, the expected effect is difficult to achieve, the requirements on hardware performance are high, the training data set is difficult to construct, and the accuracy is low. A one-stage or two-stage detection method used alone, on the other hand, can hardly satisfy the requirements of high precision and fast real-time response at the same time. In addition, neither of the above detection approaches optimizes the data set: the students' action information is captured directly by recording-and-broadcasting equipment and then fed into the detection network for localization and prediction, so the detection accuracy is difficult to guarantee.
Disclosure of Invention
The present invention is directed to solving one of the above technical problems in the related art, and an object of the present invention is to provide a method, a system, an apparatus and a storage medium for detecting classroom behavior, which can meet the requirements of high accuracy and fast real-time performance for classroom behavior identification.
The first technical scheme adopted by the invention is as follows:
a classroom behavior detection method comprises the following steps:
acquiring an image set to be processed, and preprocessing the image set with a preset algorithm to acquire positive/negative sample images in a preset ratio;
performing feature extraction on the obtained positive/negative sample images to generate positive/negative sample feature maps;
classifying the obtained positive/negative sample feature maps and then generating positive/negative sample prediction boxes;
and matching the positive/negative sample prediction boxes by combining a loss function with a preset set of real boxes, and outputting a detection result after the matching succeeds.
Further, the step of acquiring an image set to be processed and processing the image set by using a preset algorithm to acquire a positive/negative sample image with a preset ratio specifically includes the following steps:
acquiring an image set to be processed;
performing foreground-pixel intersection-over-union (IoU) matching on the image set to be processed with a preset algorithm to obtain a plurality of positive sample images;
performing background-pixel confidence matching on the image set to be processed with a preset algorithm to obtain a plurality of negative sample images to be processed;
and down-sampling the negative sample images to be processed according to their background-pixel confidence values to obtain a plurality of negative sample images.
Further, the step of classifying the obtained positive/negative sample feature maps and then generating the positive/negative sample prediction boxes specifically includes the following steps:
dividing each positive/negative sample feature map into a plurality of positive/negative sample grids with a preset sliding window;
and taking the plurality of positive/negative sample grids and dividing them into a plurality of positive/negative sample prediction boxes according to an anchor-box mechanism.
Further, the step of matching the positive/negative sample prediction boxes by combining the loss function with the preset set of real boxes, and outputting a detection result after the matching succeeds, specifically comprises the following steps:
combining the preset set of real boxes with the positive/negative sample prediction boxes, taking the prediction box with the largest foreground-pixel confidence value with respect to the real box set as a positive sample preselection box, and taking the negative sample prediction box with the largest background-pixel confidence value with respect to the real box set as a negative sample preselection box;
and performing loss matching on the positive samples and the negative samples with a loss function, and outputting a detection result after the matching succeeds.
Further, the loss matching includes position-error matching and confidence-error matching, and the step of performing loss matching on the positive samples and on the negative samples with a loss function and outputting a detection result after the matching succeeds specifically includes the following step:
performing position-error matching and confidence-error matching on the positive sample preselection boxes with the loss function, performing position-error matching and confidence-error matching on the negative sample preselection boxes with the loss function, and outputting a detection result when both the positive sample loss matching and the negative sample loss matching succeed.
The second technical scheme adopted by the invention is as follows:
a classroom behavior detection system comprising:
the preprocessing module is used for acquiring an image set to be processed and preprocessing the image set with a preset algorithm so as to acquire positive/negative sample images in a preset ratio;
the feature extraction module is used for performing feature extraction on the obtained positive/negative sample images with a convolutional network so as to generate positive/negative sample feature maps;
the classification processing module is used for classifying the obtained positive/negative sample feature maps with a two-stage detection algorithm and then generating positive/negative sample prediction boxes;
and the matching module is used for matching the positive/negative sample prediction boxes by combining the loss function with the preset set of real boxes, and outputting a detection result after the matching succeeds.
Further, the preprocessing module includes:
a first acquisition unit for acquiring an image set to be processed;
the first matching unit is used for performing foreground-pixel intersection-over-union (IoU) matching on the image set to be processed with a preset algorithm so as to obtain a plurality of positive sample images;
the second matching unit is used for performing background-pixel confidence matching on the image set to be processed with a preset algorithm so as to obtain a plurality of negative sample images to be processed;
and the second acquisition unit is used for down-sampling the negative sample images to be processed according to their background-pixel confidence values so as to obtain a plurality of negative sample images.
Further, the classification processing module comprises:
the grid unit is used for dividing each positive/negative sample feature map into a plurality of positive/negative sample grids with a preset sliding window;
and the prediction box unit is used for taking the plurality of positive/negative sample grids and dividing them into a plurality of positive/negative sample prediction boxes according to the anchor-box mechanism.
Further, the matching module includes:
the preselection box unit is used for combining the preset set of real boxes with the positive/negative sample prediction boxes, taking the prediction box with the largest foreground-pixel confidence value with respect to the real box set as a positive sample preselection box, and taking the negative sample prediction box with the largest background-pixel confidence value with respect to the real box set as a negative sample preselection box;
and the output unit is used for performing loss matching on the positive samples and the negative samples with a loss function and outputting the detection result after the matching succeeds.
Further, the output unit includes:
an output subunit used for performing position-error matching and confidence-error matching on the positive sample preselection boxes with the loss function, performing position-error matching and confidence-error matching on the negative sample preselection boxes with the loss function, and outputting a detection result when both the positive sample loss matching and the negative sample loss matching succeed.
The third technical scheme adopted by the invention is as follows:
A classroom behavior detection device, comprising a memory and a processor, wherein the memory is used for storing at least one program and the processor is used for loading the at least one program to execute the method described above.
The fourth technical scheme adopted by the invention is as follows:
a storage medium having stored therein processor-executable instructions for performing the method as described above when executed by a processor.
The invention has the beneficial effects that: a positive sample image set and a negative sample image set in a preset ratio are obtained by preprocessing the image set to be processed; feature extraction is then performed on the positive sample images to generate positive sample feature maps and on the negative sample images to generate negative sample feature maps, which speeds up feature extraction; the positive and negative samples are classified to generate a plurality of positive and negative sample prediction boxes, which comprehensively cover the regions in which classroom behaviors may appear and reduce missed detections of classroom behavior; finally, the positive and negative sample prediction boxes are matched respectively through the loss function, and the detection result is output once both the positive samples and the negative samples are successfully matched. The method thus achieves accurate detection of the classroom behavior of the persons being educated in a classroom environment, improves recognition accuracy while speeding up detection, has a simple structure with a training set that is easy to construct, and reduces the dependence on hardware facilities.
Drawings
FIG. 1 is a flow chart of the steps of a classroom behavior detection method of the present invention;
FIG. 2 is a block diagram of a classroom behavior detection system according to the present invention.
Detailed Description
Example one
As shown in fig. 1, the present embodiment provides a classroom behavior detection method, which includes the following steps:
s1, acquiring an image set to be processed, and preprocessing the image set by adopting a preset algorithm to acquire positive/negative sample images with preset proportion;
s2, performing feature extraction on the obtained positive/negative sample image to generate a positive/negative sample feature map;
s3, after the obtained positive/negative sample characteristic graph is classified, a positive/negative sample prediction box is generated;
and S4, matching the positive/negative sample prediction frame by combining the loss function and the preset real frame set, and outputting a detection result after the matching is successful.
In this embodiment, the applicable scenario covers the various teaching activities that involve educators and persons being educated, such as primary school, middle school and university classrooms, which are not enumerated further here. An image set to be processed is acquired and preprocessed with a preset algorithm so as to obtain positive sample images and negative sample images in a preset ratio, where the image set consists of single frames extracted from a surveillance video of the classroom teaching process, the surveillance video records the classroom behavior of the persons being educated, and the preset algorithm makes the obtained negative sample images more targeted and more representative of the whole negative-sample space, thereby improving the effect of classifying positive and negative samples. Feature extraction is then performed on the obtained positive sample images and negative sample images to generate positive sample feature maps and negative sample feature maps respectively, where a feature map refers to a pattern whose color, texture, shape and spatial-relationship features closely resemble those of the real target classroom behavior; the feature maps are preferably extracted with regression-based one-stage target detection algorithms such as YOLO (You Only Look Once) or the Single Shot MultiBox Detector (SSD). The obtained positive sample feature maps and negative sample feature maps are classified and then positive sample prediction boxes and negative sample prediction boxes are generated respectively, where the classification is preferably performed with region-candidate-based two-stage target detection algorithms such as Faster R-CNN or R-FCN. Finally, the positive sample prediction boxes and the negative sample prediction boxes are matched respectively by combining the loss function with a preset set of real boxes, and a detection result is output after the matching succeeds, where the preset set of real boxes refers to single-frame images of a number of classroom behaviors (roll call, behaviors of the persons being educated) in a real classroom environment collected in advance. By combining the one-stage and two-stage target detection algorithms, recognition accuracy is improved and feature extraction is accelerated, so that both the precision and the timeliness of classroom behavior detection are taken into account; the classroom behavior of the persons being educated is recognized accurately, the dependence on hardware facilities is reduced, the structure is simple, the training set is easy to construct, and the further popularization and application of deep-learning-based target detection algorithms in the teaching field is facilitated.
Further, as a preferred embodiment, the step S1 specifically includes the following steps:
s10, acquiring an image set to be processed;
s11, performing foreground pixel intersection and comparison matching on the image set to be processed by adopting a preset algorithm to obtain a plurality of positive sample images;
s12, performing background pixel confidence coefficient matching on the image set to be processed by adopting a preset algorithm to obtain a plurality of negative sample images to be processed;
and S13, performing descending sampling on the negative sample images to be processed according to the background pixel confidence values to obtain a plurality of negative sample images.
In this embodiment, an online hard example mining (OHEM) algorithm is preferably used to preprocess the acquired image set to be processed. Specifically, foreground-pixel IoU matching is performed on the images in the image set against a foreground-pixel IoU threshold preset by the online hard example mining algorithm; images whose IoU exceeds the preset foreground-pixel IoU threshold are taken as positive sample images to be processed, and from these the positive sample images with the largest foreground-pixel IoU are screened with the non-maximum suppression algorithm to serve as the positive sample images. Background-confidence matching is then performed on the images in the image set against a preset background-pixel confidence threshold; images below the preset background-pixel confidence threshold are taken as negative sample images to be processed, and the negative sample images to be processed with the smallest background-pixel confidence values are taken as negative sample images in order of background-pixel confidence from small to large, the selection being iterated several times to obtain a plurality of negative sample images, so that the obtained positive sample images and negative sample images conform to the preset ratio (in this embodiment the preferred ratio of positive to negative sample images is 1:3). Preprocessing the acquired data set with a preset algorithm such as online hard example mining (OHEM) makes the obtained negative samples more targeted and thereby improves the classification effect.
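A minimal sketch of the sampling just described, reusing the iou and nms helpers from the NMS sketch in the term list. The 0.5 IoU threshold and the 0.5 background-confidence threshold are illustrative assumptions; the 1:3 positive-to-negative ratio follows the preferred value stated above.

```python
import numpy as np

def ohem_select(boxes, fg_iou, bg_conf, iou_thresh=0.5, bg_thresh=0.5, neg_per_pos=3):
    """Pick positive samples by foreground IoU (then NMS) and hard negative samples
    by ascending background confidence, keeping roughly a 1:neg_per_pos ratio.

    boxes   : (N, 4) candidate boxes
    fg_iou  : (N,) IoU of each candidate with its best-matching real box
    bg_conf : (N,) background-pixel confidence of each candidate
    """
    # Positives: IoU above the preset threshold, filtered with NMS to keep the best overlaps.
    pos_idx = np.where(fg_iou > iou_thresh)[0]
    if pos_idx.size > 0:
        keep = nms(boxes[pos_idx], fg_iou[pos_idx], iou_thresh=iou_thresh)  # nms() from the NMS sketch
        pos_idx = pos_idx[keep]

    # Hard negatives: background confidence below the preset threshold,
    # sorted from smallest to largest so the hardest negatives come first.
    neg_idx = np.where(bg_conf < bg_thresh)[0]
    neg_idx = neg_idx[np.argsort(bg_conf[neg_idx])]
    neg_idx = neg_idx[: neg_per_pos * max(pos_idx.size, 1)]

    return pos_idx, neg_idx
```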
Further, as a preferred embodiment, the step S3 specifically includes the following steps:
s30, dividing the positive/negative sample characteristic graph into a plurality of positive/negative sample grids by adopting a preset sliding window;
and S31, acquiring a plurality of positive/negative sample grids, and dividing the positive/negative sample grids into a plurality of positive/negative sample prediction boxes according to an anchor box mechanism.
In this embodiment, a region-candidate-based detection algorithm is used to classify the positive sample feature maps and the negative sample feature maps; a region-candidate-based detection algorithm such as the Region Proposal Network (RPN) introduced by Faster R-CNN, or R-FCN, classifies and detects the positive and negative sample feature maps extracted by the one-stage target detection algorithm. Specifically, each obtained positive sample feature map and negative sample feature map is divided by a preset sliding window into N × N positive sample grids and N × N negative sample grids (N being a natural number greater than 1), and each positive sample grid and negative sample grid is then divided into a plurality of positive sample prediction boxes and negative sample prediction boxes according to an anchor-box mechanism, so as to reduce missed detections of classroom behavior.
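A sketch of the grid-plus-anchor step: divide the image (or feature map) into an N × N grid and place several anchor boxes at each cell centre. The value N = 4 and the particular scales and aspect ratios are illustrative assumptions; the disclosure only requires N > 1 and a plurality of anchor boxes per grid.

```python
import numpy as np

def generate_anchor_boxes(img_w, img_h, n=4, scales=(0.1, 0.25), ratios=(1.0, 0.5, 2.0)):
    """Lay an n x n grid over the image and place len(scales) * len(ratios)
    anchor boxes at each grid-cell centre; returns boxes in [x1, y1, x2, y2]."""
    boxes = []
    cell_w, cell_h = img_w / n, img_h / n
    for row in range(n):
        for col in range(n):
            cx = (col + 0.5) * cell_w   # grid-cell centre
            cy = (row + 0.5) * cell_h
            for s in scales:
                for r in ratios:
                    w = img_w * s * np.sqrt(r)   # anchor width for this scale/ratio
                    h = img_h * s / np.sqrt(r)   # anchor height
                    boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(boxes)

# Example: a 4 x 4 grid with 6 anchors per cell yields 96 candidate prediction boxes.
print(generate_anchor_boxes(640, 480).shape)  # (96, 4)
```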
Further, as a preferred embodiment, the step S4 specifically includes the following steps:
s40, combining a preset real frame set and a positive/negative sample prediction frame, acquiring a prediction frame with the maximum confidence value of foreground pixels of the real frame set as a positive sample pre-selection frame, and acquiring a negative sample prediction frame with the maximum confidence value of background pixels of the real frame set as a negative sample pre-selection frame;
and S41, performing loss matching on the positive sample and the negative sample by adopting a loss function, and outputting a detection result after the matching is successful.
In this embodiment, the foreground-pixel confidence of each positive sample prediction box is evaluated against the preset set of real boxes, and the prediction box with the largest foreground-pixel confidence value with respect to the real box set is taken as a positive sample preselection box; the background-pixel confidence of each negative sample prediction box is likewise evaluated, and the negative sample prediction box with the largest background-pixel confidence with respect to the real box set is taken as a negative sample preselection box. Loss matching is then performed on the plurality of positive/negative sample preselection boxes with the loss function for further matching, and a detection result is output after the matching succeeds.
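A sketch of the preselection step under one possible reading of the paragraph above: for each real box, the overlapping prediction box with the highest foreground confidence becomes a positive preselection box, and the prediction box with the highest background confidence becomes the negative preselection box. Using iou (from the NMS sketch) to decide which prediction boxes are compared against a given real box is an assumption, not the patent's exact rule.

```python
import numpy as np

def select_preselection_boxes(pred_boxes, fg_conf, bg_conf, real_boxes):
    """pred_boxes: (N, 4); fg_conf, bg_conf: (N,); real_boxes: (M, 4).
    Returns indices of positive preselection boxes and the negative preselection box."""
    positive_idx = []
    for gt in real_boxes:
        overlaps = iou(gt, pred_boxes)           # iou() from the NMS sketch above
        candidates = np.where(overlaps > 0)[0]   # prediction boxes overlapping this real box
        if candidates.size > 0:
            positive_idx.append(int(candidates[np.argmax(fg_conf[candidates])]))
    negative_idx = int(np.argmax(bg_conf))       # most confidently background prediction box
    return positive_idx, negative_idx
```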
Further, as a preferred embodiment, the step S41 specifically includes:
s410, carrying out position error matching and confidence error matching on the positive sample preselection frame by adopting a loss function, carrying out position error matching and confidence error matching on the negative sample preselection frame by adopting the loss function, and outputting a detection result when the positive sample loss matching is successful and the negative sample loss matching is successful.
In this embodiment, position-error matching and confidence-error matching are performed on the plurality of positive sample preselection boxes and the plurality of negative sample preselection boxes through the loss function so as to further refine the predicted category and position information; a detection result is finally output once both the confidence error and the position error of the positive sample preselection boxes and the confidence error and the position error of the negative sample preselection boxes have been successfully matched.
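A sketch of one common way to realise the "position error plus confidence error" matching described above, in the style of the SSD multibox loss: smooth-L1 for the position error of positive preselection boxes and softmax cross-entropy for the confidence error of both positive and negative preselection boxes. The smooth-L1/cross-entropy choice and the weight alpha = 1.0 are assumptions; the disclosure itself only names the two error terms.

```python
import numpy as np

def smooth_l1(pred, target):
    """Position error between predicted and target box offsets (smooth-L1)."""
    diff = np.abs(pred - target)
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).sum()

def cross_entropy(logits, label):
    """Confidence error for a single box: softmax cross-entropy against its label."""
    logits = logits - logits.max()                     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())  # log-softmax
    return -log_probs[label]

def matching_loss(pos_pred, pos_target, pos_logits, pos_labels, neg_logits, alpha=1.0):
    """Total loss: position + confidence error on positive preselection boxes,
    confidence error (against background class 0) on negative preselection boxes."""
    loc_loss = sum(smooth_l1(p, t) for p, t in zip(pos_pred, pos_target))
    conf_loss = sum(cross_entropy(l, y) for l, y in zip(pos_logits, pos_labels))
    conf_loss += sum(cross_entropy(l, 0) for l in neg_logits)   # class 0 = background
    n = max(len(pos_pred), 1)
    return (conf_loss + alpha * loc_loss) / n
```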
Example two
As shown in fig. 2, the present embodiment provides a classroom behavior detection system, including:
the preprocessing module is used for acquiring an image set to be processed and preprocessing the image set with a preset algorithm so as to acquire positive/negative sample images in a preset ratio;
the feature extraction module is used for performing feature extraction on the obtained positive/negative sample images with a convolutional network so as to generate positive/negative sample feature maps;
the classification processing module is used for classifying the obtained positive/negative sample feature maps with a two-stage detection algorithm and then generating positive/negative sample prediction boxes;
and the matching module is used for matching the positive/negative sample prediction boxes by combining the loss function with the preset set of real boxes, and outputting a detection result after the matching succeeds.
Further as a preferred embodiment, the preprocessing module comprises:
a first acquisition unit for acquiring an image set to be processed;
the first matching unit is used for performing foreground-pixel intersection-over-union (IoU) matching on the image set to be processed with a preset algorithm so as to obtain a plurality of positive sample images;
the second matching unit is used for performing background-pixel confidence matching on the image set to be processed with a preset algorithm so as to obtain a plurality of negative sample images to be processed;
and the second acquisition unit is used for down-sampling the negative sample images to be processed according to their background-pixel confidence values so as to obtain a plurality of negative sample images.
Further as a preferred embodiment, the classification processing module includes:
the grid unit is used for dividing each positive/negative sample feature map into a plurality of positive/negative sample grids with a preset sliding window;
and the prediction box unit is used for taking the plurality of positive/negative sample grids and dividing them into a plurality of positive/negative sample prediction boxes according to the anchor-box mechanism.
Further as a preferred embodiment, the matching module comprises:
the preselection box unit is used for combining the preset set of real boxes with the positive/negative sample prediction boxes, taking the prediction box with the largest foreground-pixel confidence value with respect to the real box set as a positive sample preselection box, and taking the negative sample prediction box with the largest background-pixel confidence value with respect to the real box set as a negative sample preselection box;
and the output unit is used for performing loss matching on the positive samples and the negative samples with a loss function and outputting the detection result after the matching succeeds.
Further as a preferred embodiment, the output unit includes:
an output subunit used for performing position-error matching and confidence-error matching on the positive sample preselection boxes with the loss function, performing position-error matching and confidence-error matching on the negative sample preselection boxes with the loss function, and outputting a detection result when both the positive sample loss matching and the negative sample loss matching succeed.
The classroom behavior detection system of this embodiment can execute the classroom behavior detection method provided in the first method embodiment of the invention, can execute any combination of the implementation steps of the method embodiment, and has the corresponding functions and beneficial effects of the method.
Example three
A classroom behavior detection device, comprising a memory and a processor, the memory being used for storing at least one program and the processor for loading the at least one program to perform the method of embodiment one.
The device of this embodiment can execute the classroom behavior detection method provided in the first method embodiment of the invention, can execute any combination of the implementation steps of the method embodiment, and has the corresponding functions and beneficial effects of the method.
Example four
A storage medium having stored therein processor-executable instructions for performing a method as in embodiment one when executed by a processor.
The storage medium of this embodiment may execute the classroom behavior detection method provided in the first embodiment of the method of the present invention, may execute any combination of the implementation steps of the method embodiments, and has corresponding functions and advantages of the method.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A classroom behavior detection method is characterized by comprising the following steps:
acquiring an image set to be processed, and preprocessing the image set with a preset algorithm to acquire positive/negative sample images in a preset ratio;
performing feature extraction on the obtained positive/negative sample images to generate positive/negative sample feature maps;
classifying the obtained positive/negative sample feature maps and then generating positive/negative sample prediction boxes;
and matching the positive/negative sample prediction boxes by combining a loss function with a preset set of real boxes, and outputting a detection result after the matching succeeds.
2. The classroom behavior detection method as recited in claim 1, wherein the step of obtaining the image set to be processed and processing the image set with a predetermined algorithm to obtain positive/negative sample images at a predetermined ratio comprises the steps of:
acquiring an image set to be processed;
performing foreground-pixel intersection-over-union (IoU) matching on the image set to be processed with a preset algorithm to obtain a plurality of positive sample images;
performing background-pixel confidence matching on the image set to be processed with a preset algorithm to obtain a plurality of negative sample images to be processed;
and down-sampling the negative sample images to be processed according to their background-pixel confidence values to obtain a plurality of negative sample images.
3. The classroom behavior detection method as recited in claim 2, wherein the step of generating the positive/negative sample prediction box after classifying the obtained positive/negative sample feature map specifically comprises the steps of:
dividing each positive/negative sample feature map into a plurality of positive/negative sample grids with a preset sliding window;
and acquiring a plurality of positive/negative sample grids, and dividing the positive/negative sample grids into a plurality of positive/negative sample prediction boxes according to an anchor box mechanism.
4. The classroom behavior detection method as recited in claim 3, wherein the step of matching the positive/negative sample prediction boxes by combining the loss function with the preset set of real boxes and outputting the detection result after the matching succeeds comprises the following steps:
combining the preset set of real boxes with the positive/negative sample prediction boxes, taking the prediction box with the largest foreground-pixel confidence value with respect to the real box set as a positive sample preselection box, and taking the negative sample prediction box with the largest background-pixel confidence value with respect to the real box set as a negative sample preselection box;
and performing loss matching on the positive sample preselection boxes and the negative sample preselection boxes with a loss function, and outputting a detection result after the matching succeeds.
5. The classroom behavior detection method as claimed in claim 4, wherein the loss matching includes position-error matching and confidence-error matching, and the step of performing loss matching on the positive samples and on the negative samples with a loss function and outputting a detection result after the matching succeeds comprises the following step:
performing position-error matching and confidence-error matching on the positive sample preselection boxes with the loss function, performing position-error matching and confidence-error matching on the negative sample preselection boxes with the loss function, and outputting a detection result when both the positive sample loss matching and the negative sample loss matching succeed.
6. A classroom behavior detection system, comprising:
the preprocessing module is used for acquiring an image set to be processed and preprocessing the image set with a preset algorithm so as to acquire positive/negative sample images in a preset ratio;
the feature extraction module is used for performing feature extraction on the obtained positive/negative sample images with a convolutional network so as to generate positive/negative sample feature maps;
the classification processing module is used for classifying the obtained positive/negative sample feature maps with a two-stage detection algorithm and then generating positive/negative sample prediction boxes;
and the matching module is used for matching the positive/negative sample prediction boxes by combining the loss function with the preset set of real boxes, and outputting a detection result after the matching succeeds.
7. The classroom behavior detection system of claim 6, wherein the preprocessing module comprises:
a first acquisition unit for acquiring an image set to be processed;
the first matching unit is used for performing foreground-pixel intersection-over-union (IoU) matching on the image set to be processed with a preset algorithm so as to obtain a plurality of positive sample images;
the second matching unit is used for performing background-pixel confidence matching on the image set to be processed with a preset algorithm so as to obtain a plurality of negative sample images to be processed;
and the second acquisition unit is used for down-sampling the negative sample images to be processed according to their background-pixel confidence values so as to obtain a plurality of negative sample images.
8. The classroom behavior detection system according to claim 7, wherein the classification processing module comprises: a grid unit used for dividing each positive/negative sample feature map into a plurality of positive/negative sample grids with a preset sliding window;
and a prediction box unit used for taking the plurality of positive/negative sample grids and dividing them into a plurality of positive/negative sample prediction boxes according to the anchor-box mechanism.
9. A classroom behavior detection device, comprising a memory for storing at least one program and a processor for loading the at least one program to perform the method of any one of claims 1 to 5.
10. A storage medium having stored therein processor-executable instructions, which when executed by a processor, are configured to perform the method of any one of claims 1-5.
CN201910943862.0A 2019-09-30 2019-09-30 Classroom behavior detection method, system, device and storage medium Pending CN110728316A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910943862.0A CN110728316A (en) 2019-09-30 2019-09-30 Classroom behavior detection method, system, device and storage medium


Publications (1)

Publication Number: CN110728316A; Publication Date: 2020-01-24

Family

ID=69218694

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910943862.0A Pending CN110728316A (en) 2019-09-30 2019-09-30 Classroom behavior detection method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN110728316A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800778A (en) * 2018-12-03 2019-05-24 浙江工业大学 A kind of Faster RCNN object detection method for dividing sample to excavate based on hardly possible
CN109726652A (en) * 2018-12-19 2019-05-07 杭州叙简科技股份有限公司 A method of based on convolutional neural networks detection operator on duty's sleep behavior
CN109784190A (en) * 2018-12-19 2019-05-21 华东理工大学 A kind of automatic Pilot scene common-denominator target Detection and Extraction method based on deep learning
CN109711320A (en) * 2018-12-24 2019-05-03 兴唐通信科技有限公司 A kind of operator on duty's unlawful practice detection method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Bai Hao (白皓): "Intelligent visual surveillance system for examination rooms based on deep learning", China Master's Theses Full-text Database, Information Science and Technology series *
Jiang Qinyi (蒋沁沂) et al.: "Student classroom behavior recognition based on residual networks", 《图形图形》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657152A (en) * 2021-07-07 2021-11-16 国网江苏省电力有限公司电力科学研究院 Classroom student behavior recognition system construction method
CN117152846A (en) * 2023-10-30 2023-12-01 云南师范大学 Student behavior recognition method, device and system and computer readable storage medium
CN117152846B (en) * 2023-10-30 2024-01-26 云南师范大学 Student behavior recognition method, device and system and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN108846835B (en) Image change detection method based on depth separable convolutional network
CN112801146B (en) Target detection method and system
CN107273832B (en) License plate recognition method and system based on integral channel characteristics and convolutional neural network
CN110969166A (en) Small target identification method and system in inspection scene
CN111401293B (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
CN112580458B (en) Facial expression recognition method, device, equipment and storage medium
CN110020658B (en) Salient object detection method based on multitask deep learning
CN112861575A (en) Pedestrian structuring method, device, equipment and storage medium
CN105930794A (en) Indoor scene identification method based on cloud computing
CN110751232A (en) Chinese complex scene text detection and identification method
John et al. A comparative study of various object detection algorithms and performance analysis
CN110728316A (en) Classroom behavior detection method, system, device and storage medium
CN109064464B (en) Method and device for detecting burrs of battery pole piece
Balmik et al. A robust object recognition using modified YOLOv5 neural network
CN116152576B (en) Image processing method, device, equipment and storage medium
CN117152094A (en) Method, device and system for analyzing surface defects of steel plate based on computer vision
Al-Shammri et al. A Combined Method for Object Detection under Rain Conditions Using Deep Learning
CN110751163B (en) Target positioning method and device, computer readable storage medium and electronic equipment
CN114757941A (en) Transformer substation equipment defect identification method and device, electronic equipment and storage medium
CN112861652B (en) Video target tracking and segmentation method and system based on convolutional neural network
CN112348011B (en) Vehicle damage assessment method and device and storage medium
CN112818832B (en) Weak supervision object positioning device and method based on component perception
Sanin et al. K-tangent spaces on Riemannian manifolds for improved pedestrian detection
CN114639013A (en) Remote sensing image airplane target detection and identification method based on improved Orient RCNN model
CN114743257A (en) Method for detecting and identifying image target behaviors

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 20200124)