CN114140876A - Classroom real-time human body action recognition method, computer equipment and readable medium - Google Patents

Classroom real-time human body action recognition method, computer equipment and readable medium

Info

Publication number
CN114140876A
Authority
CN
China
Prior art keywords
optical flow
human body
sequence
classroom
dimensional
Prior art date
Legal status
Pending
Application number
CN202111407773.8A
Other languages
Chinese (zh)
Inventor
廖盛斌
杨宗凯
Current Assignee
Central China Normal University
Original Assignee
Central China Normal University
Priority date
Filing date
Publication date
Application filed by Central China Normal University
Priority to CN202111407773.8A
Publication of CN114140876A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a classroom real-time human body action recognition method, computer equipment and a readable medium. The method comprises the following steps: extracting RGB images and optical flow graphs in real time from a classroom video to be analyzed; and inputting the RGB images, or the RGB images together with the optical flow graph, into a trained hybrid convolutional neural network to obtain a human body action classification result for the classroom video. The hybrid convolutional neural network has a wide residual network structure and comprises a two-dimensional convolutional network and a three-dimensional convolutional network connected in parallel with it, which extract two-dimensional feature vectors and three-dimensional feature vectors from the RGB images and optical flow graphs respectively; human body actions are recognized based on the feature map obtained by fusing the two-dimensional and three-dimensional feature vectors. By combining a deep convolutional neural network with RGB images and optical flow graphs and using an efficient data extraction algorithm, the invention not only further improves the accuracy of human body action recognition but also avoids frame dropping and stalling during real-time video human body action recognition.

Description

Classroom real-time human body action recognition method, computer equipment and readable medium
Technical Field
The application relates to the technical field of education informatization, in particular to a classroom real-time human body action recognition method, computer equipment and a readable medium.
Background
In recent years, technologies such as artificial intelligence and computer vision based on deep learning have developed rapidly, accelerating innovation in the field of education informatization. Concepts such as intelligent education and artificial intelligence in education are continually being proposed, and machine-assisted teaching is becoming a focus and hotspot of research, as it relieves teachers' teaching pressure and helps them formulate teaching plans.
Classroom teaching is the main setting in which schools educate, and students' classrooms are becoming increasingly informatized and intelligent. Using artificial intelligence technology to analyze and evaluate the classroom is of great significance to education and teaching. Students' classroom actions are a main component of classroom evaluation: analyzing them not only reveals students' learning state but also allows a warning to be sent to the teacher when a student engages in dangerous non-learning behavior.
Chinese patent application No. 201910674395.6 discloses a human body behavior recognition method for student classrooms based on target detection, which can recognize common classroom behaviors such as sitting upright, writing, lying on the desk, looking around, raising hands and standing up, but it neglects some important non-learning behaviors present in the classroom. In addition, the method uses a multi-channel 3D CNN as the reference model for training and testing; this reduces the amount of computation to a certain extent, but it cannot perform real-time human body motion recognition.
Chinese patent application No. 201711200452.4 discloses a real-time human body motion recognition method that first obtains a depth image of the video, then extracts human skeleton data from the depth image and normalizes it. The training data it uses must have the starting position of each motion marked in advance and a set of key points defined. Although the method achieves real-time human body motion recognition, it must additionally extract human skeleton data, which increases the amount of computation and the complexity.
In summary, existing human body action recognition methods oriented to student classrooms mainly have the following defects: 1. they focus on students' learning behaviors and therefore ignore important non-learning behaviors; 2. the network models used cannot perform real-time human body action recognition; 3. video frame data are not fully utilized, and acquiring data such as human skeletons is cumbersome.
Disclosure of Invention
In view of at least one of the defects or improvement needs of the prior art, the present invention provides a classroom real-time human body motion recognition method, a computer device, and a readable medium, in which a deep convolutional neural network serves as the reference model; the network employs a wide residual structure and integrates two-dimensional convolution and three-dimensional convolution, effectively reducing the amount of computation. In addition, the method takes RGB video frames and optical flow graphs as input and requires no extra computation.
To achieve the above object, according to a first aspect of the present invention, there is provided a classroom real-time human body motion recognition method, comprising:
acquiring a classroom video to be analyzed, and extracting an RGB image and an optical flow graph from the classroom video in real time;
inputting the RGB image, or the RGB image and the optical flow graph, into a trained hybrid convolutional neural network to obtain a human body action classification result in the classroom video; the human body action classification result is a preset learning behavior or non-learning behavior;
the hybrid convolutional neural network has a wide residual network structure and comprises a two-dimensional convolutional network and a three-dimensional convolutional network connected in parallel with it, which are respectively used to extract two-dimensional feature vectors and three-dimensional feature vectors from the RGB images and optical flow graphs; and human body actions are recognized based on the feature map obtained by fusing the two-dimensional and three-dimensional feature vectors.
Preferably, in the classroom real-time human body action recognition method, the RGB images and the optical flow graph are serialized data from the same time period;
extracting the RGB images and optical flow graphs from the classroom video in real time comprises:
extracting RGB images in real time using the OpenCV algorithm library;
extracting optical flow graphs of adjacent frames from the classroom video by using a TV-L1 algorithm, wherein the TV-L1 algorithm is as follows:
min_u { λ ∫_Ω |I₁(x + u(x)) − I₀(x)| dx + ∫_Ω |∇u| dx }
wherein I₀ and I₁ represent two adjacent RGB images, u: Ω → R² is the disparity (flow) field to be obtained, ∫_Ω |I₁(x + u(x)) − I₀(x)| dx represents the image data fidelity term, ∫_Ω |∇u| dx represents the regularization term, and the weight λ balances data fidelity against regularization. Letting ρ(u) = I₁(x + u(x)) − I₀(x), the following formula is obtained:
E(u) = ∫_Ω ( λ |ρ(u)| + |∇u| ) dx
The u that minimizes E is the optical flow graph of the adjacent frames.
Preferably, in the classroom real-time human body action recognition method, inputting the RGB images into a trained hybrid convolutional neural network to obtain the human body action classification result in the classroom video includes:
selecting N RGB images and adding them to a first image sequence;
extracting the first N/2 RGB images from the first image sequence and adding them to a second image sequence; the second image sequence holds N RGB images, and the first N/2 RGB images in the sequence are deleted each time before new N/2 RGB images are added, so that the second image sequence always maintains a length of N;
inputting the N RGB images in the second image sequence into the deep convolutional neural network to obtain a candidate prediction result Pc;
outputting the final prediction result P = (P + Pc)/2.
Preferably, in the classroom real-time human body motion recognition method, inputting the RGB images and the optical flow graph into a trained hybrid convolutional neural network to obtain the human body motion classification result in the classroom video includes:
adding N RGB images to a first image sequence, and adding M optical flow graphs to a first optical flow sequence;
extracting the first N/2 images from the first image sequence and adding them to a second image sequence; extracting the first M/2 optical flow graphs from the first optical flow sequence and adding them to a second optical flow sequence;
the second image sequence and the second optical flow sequence hold N RGB images and M optical flow graphs respectively, and the first N/2 images and M/2 optical flow graphs are deleted each time before new sequence data are added, so that the second image sequence and the second optical flow sequence always maintain lengths of N and M;
inputting the N RGB images in the second image sequence into the deep convolutional neural network to obtain a candidate prediction result Pc1, and inputting the M optical flow graphs in the second optical flow sequence into the deep convolutional neural network to obtain a candidate prediction result Pc2;
outputting the final prediction result P = (P + Pc)/2, wherein Pc = x·Pc1 + y·Pc2, x and y being the weights of the candidate prediction results Pc1 and Pc2, respectively.
Preferably, in the classroom real-time human body action recognition method, in the hybrid convolutional neural network the three-dimensional convolutional network serves as the residual connection of the two-dimensional convolutional network, and the two-dimensional convolutional network and the three-dimensional convolutional network are combined by the reshaping:
[B*S,C,H,W]→[B,C,S,H,W]
where B denotes the batch size, S the number of input images, and C, H and W the number of channels and the height and width of the feature map, respectively.
Preferably, in the classroom real-time human body motion recognition method, the calculation method of the deep convolutional neural network error is as follows:
Loss(xᵢ) = − l(xᵢ) · log( p(xᵢ) )
wherein l(xᵢ) represents the label value of the i-th input datum, p(xᵢ) is the prediction obtained for the i-th input datum after passing through the convolutional network, and Loss(xᵢ) represents the loss function.
Preferably, in the classroom real-time human body action recognition method,
the N RGB images may be N consecutive images or N images sampled at equal intervals;
the M optical flow graphs may be M consecutive optical flow graphs or M optical flow graphs sampled at equal intervals; the M optical flow graphs contain optical flow graphs in both the X direction and the Y direction.
Preferably, the classroom real-time human body action recognition method further includes:
if the human body action classification result is a learning behavior, recording it in a log file;
and if the human body action classification result is a non-learning behavior, sending out a warning signal.
According to a second aspect of the present invention, there is also provided a computer device comprising at least one processing unit, and at least one memory unit, wherein the memory unit stores a computer program which, when executed by the processing unit, causes the processing unit to perform any of the steps of the classroom real-time human action recognition method described above.
According to a third aspect of the present invention, there is also provided a computer readable medium storing a computer program executable by a computer device, the computer program, when run on the computer device, causing the computer device to perform the steps of any of the above-mentioned real-time classroom human motion recognition methods.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
(1) In the present invention, the deep convolutional neural network adopts a wide residual structure and combines two-dimensional and three-dimensional convolution to extract two-dimensional feature vectors and three-dimensional feature vectors from the RGB images and optical flow graph respectively, and human body actions are recognized based on the feature map obtained by fusing them. On the premise of ensuring accuracy, the network parameters are further reduced, so that real-time human body action recognition becomes possible. The RGB images and optical flow graph serve as input data of the deep convolutional neural network, no extra human skeleton data needs to be extracted, and the amount of computation is reduced.
(2) The recognition of human body actions in the video is divided into learning behaviors and non-learning behaviors, so the classroom video is analyzed comprehensively and objectively, overcoming the defect of analyzing learning behaviors alone.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the embodiments are briefly described below. The drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a classroom real-time human body motion recognition method based on RGB images and an optical flow graph according to an embodiment of the present application;
fig. 2 is a schematic diagram of the network structure of the hybrid convolutional neural network provided in an embodiment of the present application;
fig. 3 is a schematic diagram illustrating an example of a wide residual network according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an RGB image and optical flow graph real-time extraction algorithm provided in the present application;
FIG. 5 is a schematic diagram of a real-time human body motion recognition feedback provided in an embodiment of the present application;
fig. 6 is a logic block diagram of a computer device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
In the following description, the terms "first" and "second" are used for descriptive purposes only and are not intended to indicate or imply relative importance. The following description provides embodiments of the invention, which may be combined with or substituted for one another, and the invention is thus to be construed as embracing all possible combinations of the same and/or different embodiments described. Thus, if one embodiment includes features A, B and C and another embodiment includes features B and D, the invention should also be construed as including embodiments containing one or more of all other possible combinations of A, B, C and D, even though such embodiments may not be explicitly recited in the following text.
The following description provides examples, and does not limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements described without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as appropriate. For example, the described methods may be performed in an order different than the order described, and various steps may be added, omitted, or combined. Furthermore, features described with respect to some examples may be combined into other examples.
The technical idea of the invention is as follows: classroom video data are acquired through a device such as a camera; RGB image and optical flow graph data of the video are obtained using the OpenCV algorithm library and the TV-L1 algorithm; a large amount of the acquired data is labeled and input as training samples into the constructed deep convolutional neural network for training, and the trained model is saved. Specifically, the human action labels are divided into learning behaviors and non-learning behaviors, where learning behaviors include raising hands, standing up, writing and the like, and non-learning behaviors include throwing, falling, eating and the like. The deep convolutional neural network adopts a wide residual structure and combines two-dimensional convolution and three-dimensional convolution. Then, RGB image and optical flow graph data are extracted from the real-time classroom video to be analyzed and input into the trained model for real-time human body action recognition; the recognition results fall into two classes, learning behaviors and non-learning behaviors. The invention integrates a deep convolutional neural network with RGB images and optical flow graphs and uses an efficient data extraction algorithm, which not only further improves the accuracy of human body action recognition but also avoids frame dropping and stalling during real-time video human body action recognition.
Fig. 1 is a schematic flowchart of a classroom real-time human body motion recognition method based on RGB images and an optical flow graph according to this embodiment; referring to fig. 1, the method comprises the steps of:
s1, acquiring a classroom video to be analyzed, and extracting an RGB image and an optical flow graph from the classroom video in real time;
in this embodiment, the video data of a classroom is acquired by using a conventional camera or other devices, and the RGB image and the optical flow graph extracted in real time from the classroom video are serialized data in the same time period.
Specifically, real-time video data are collected through a camera, then each video frame is extracted through an Opencv algorithm library, and an optical flow graph is extracted through a TV-L1 algorithm; the TV-L1 algorithm is as follows:
min_u { λ ∫_Ω |I₁(x + u(x)) − I₀(x)| dx + ∫_Ω |∇u| dx }
wherein I₀ and I₁ represent two adjacent RGB images, u: Ω → R² is the disparity (flow) field to be obtained, ∫_Ω |I₁(x + u(x)) − I₀(x)| dx represents the image data fidelity term, ∫_Ω |∇u| dx represents the regularization term, and the weight λ balances data fidelity against regularization. Letting ρ(u) = I₁(x + u(x)) − I₀(x), the following formula is obtained:
E(u) = ∫_Ω ( λ |ρ(u)| + |∇u| ) dx
The u that minimizes E is the optical flow graph of the adjacent frames.
Optionally, the extracted RGB images and optical flow graphs are adjusted to a uniform size, such as 224 × 224, and are encoded in time order in the format frame000001, frame000002, ….
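As an illustration of this extraction step, the following minimal Python sketch reads a video with OpenCV, saves resized RGB frames, and computes TV-L1 optical flow between adjacent frames. It assumes the opencv-contrib-python package, whose cv2.optflow module provides the TV-L1 implementation; the flow-file naming and the 0–255 rescaling of the flow components are illustrative choices, not prescribed by this embodiment.

```python
import cv2

def extract_rgb_and_flow(video_path, size=(224, 224)):
    """Save resized RGB frames and TV-L1 optical flow graphs from a video."""
    cap = cv2.VideoCapture(video_path)
    tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()  # TV-L1 solver (opencv-contrib)
    prev_gray, idx = None, 1
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, size)
        cv2.imwrite(f"frame{idx:06d}.jpg", frame)    # frame000001, frame000002, ...
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is not None:
            flow = tvl1.calc(prev_gray, gray, None)  # H x W x 2 (X and Y components)
            # store the X- and Y-direction flow maps separately, rescaled to 8-bit
            for c, axis in enumerate("xy"):
                comp = cv2.normalize(flow[..., c], None, 0, 255, cv2.NORM_MINMAX)
                cv2.imwrite(f"flow_{axis}{idx:06d}.jpg", comp.astype("uint8"))
        prev_gray = gray
        idx += 1
    cap.release()
```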
S2, inputting the RGB images, or the RGB images and the optical flow graph, into a trained hybrid convolutional neural network, where the hybrid convolutional neural network is a deep convolutional neural network with a wide residual network structure and comprises a two-dimensional convolutional network and a three-dimensional convolutional network connected in parallel with it, used to extract two-dimensional feature vectors and three-dimensional feature vectors from the RGB images and the optical flow graph respectively;
identifying human body actions based on the feature map obtained by fusing the two-dimensional and three-dimensional feature vectors, and generating the human body action classification result in the classroom video; the human body action classification result is a preset learning behavior or non-learning behavior;
specifically, the method can be divided into two modes of using optical flow diagram data and not using optical flow diagram data;
(1) if the optical flow graph data is not used:
adding N RGB images into an Incoming _ frames image sequence, wherein the N image sequences can be selected from N continuous images or N images with the same interval;
extracting the first N/2 image sequences from the Incoming _ frames image sequence and adding the image sequences to Working _ frames;
the Working _ frames is a sequence of images with N RGB images and the first N/2 image sequences are deleted before each new image sequence is added to Working _ frames, so that Working _ frames is always a sequence of images of length N;
inputting images in Working _ frames into the deep convolutional neural network, extracting two-dimensional image feature vectors from N RGB images by the two-dimensional convolutional network in the deep convolutional neural network, extracting three-dimensional image feature vectors from the N RGB images by the three-dimensional convolutional network, fusing the two-dimensional image feature vectors and the three-dimensional image feature vectors, and classifying based on the fused image feature vectors to obtain a candidate prediction result Pc;
final predicted result P: (P + Pc)/2.
(2) If optical flow data are used:
adding N RGB images to an Incoming_frames image sequence and M optical flow graphs to an Incoming_flows optical flow sequence, where adjacent picture frames are selected from the Incoming_frames image sequence; the N images can be N consecutive images or N images sampled at equal intervals, and the M optical flow graphs can be M consecutive optical flow graphs or M optical flow graphs sampled at equal intervals; however, the M optical flow graphs include optical flow graphs in both the X direction and the Y direction;
extracting the first N/2 images and the first M/2 optical flow graphs from Incoming_frames and Incoming_flows respectively, and adding them to Working_frames and Working_flows respectively;
Working_frames and Working_flows hold N RGB images and M optical flow graphs respectively, and the first N/2 images and M/2 optical flow graphs are deleted each time before new sequence data are added to Working_frames and Working_flows, so that they are always sequences of lengths N and M;
inputting the pictures in Working_frames into the deep convolutional neural network: the two-dimensional convolutional network extracts two-dimensional image feature vectors from the N RGB images, the three-dimensional convolutional network extracts three-dimensional image feature vectors from them, the two are fused, and classification based on the fused feature vector yields a candidate prediction result Pc1;
inputting the optical flow in Working_flows into the deep convolutional neural network: the two-dimensional convolutional network extracts two-dimensional image feature vectors from the M optical flow graphs, the three-dimensional convolutional network extracts three-dimensional image feature vectors from them, the two are fused, and classification based on the fused feature vector yields a candidate prediction result Pc2;
Pc = x·Pc1 + y·Pc2, x and y being the weights of the prediction results Pc1 and Pc2, respectively;
the final prediction result is P = (P + Pc)/2. A sketch of this fusion follows.
In this embodiment, the input of the network may be RGB images alone, or RGB images together with the optical flow graph: if the requirement for accuracy is high, RGB image and optical flow graph data may be used as input simultaneously; if the requirement for the speed of real-time motion recognition is high, only RGB images may be selected as input.
In this scheme, RGB video frames and the optical flow graph serve as input and their features are fused for human body action recognition, so data such as human skeletons need not be additionally extracted and the amount of computation is reduced.
Fig. 2 is a schematic diagram of the network structure of the hybrid convolutional neural network provided in this embodiment; as shown in fig. 2, the three-dimensional convolutional network in the hybrid convolutional neural network serves as the residual connection of the two-dimensional convolutional network, and the two-dimensional convolutional network and the three-dimensional convolutional network are combined by the reshaping:
[B*S,C,H,W]→[B,C,S,H,W]
where B denotes the batch size, S the number of input images, and C, H and W the number of channels and the height and width of the feature map, respectively.
In a specific example, the two-dimensional convolutional network uses a BN-Inception network as its base network and the three-dimensional convolutional network uses a 3D ResNet-18 network as its base network; with the three-dimensional convolutional network serving as the residual connection of the two-dimensional convolutional network, the hybrid convolutional neural network uses two layers of residual connections in width, forming a wide residual network structure. Thanks to its shortcut connections, the residual network effectively avoids the degradation problem. The residual structure of the hybrid convolutional neural network is expressed by the following formula:
F(x) = H(x) - x
where F(x) represents the residual of the network, H(x) is the prediction value, and x represents the input of the convolutional layer. A sketch of this combination follows.
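The following minimal PyTorch sketch shows how a three-dimensional convolution can act as the residual branch of a two-dimensional convolution via the [B*S, C, H, W] → [B, C, S, H, W] reshape described above; the channel counts and kernel sizes are illustrative and do not reproduce the BN-Inception / 3D ResNet-18 configuration of this embodiment.

```python
import torch
import torch.nn as nn

class Hybrid2D3DBlock(nn.Module):
    """2D conv main path with a 3D conv residual branch (illustrative sizes)."""

    def __init__(self, channels: int, seq_len: int):
        super().__init__()
        self.seq_len = seq_len                                     # S: frames per clip
        self.conv2d = nn.Conv2d(channels, channels, 3, padding=1)  # per-frame (2D) path
        self.conv3d = nn.Conv3d(channels, channels, 3, padding=1)  # temporal (3D) residual path

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [B*S, C, H, W], assumed ordered clip by clip (clip 0's S frames first)
        bs, c, h, w = x.shape
        b = bs // self.seq_len
        # [B*S, C, H, W] -> [B, C, S, H, W]: expose time as a dimension for the 3D branch
        x3d = x.view(b, self.seq_len, c, h, w).permute(0, 2, 1, 3, 4)
        res = self.conv3d(x3d)                                     # three-dimensional features
        res = res.permute(0, 2, 1, 3, 4).reshape(bs, c, h, w)      # back to the 2D layout
        return self.conv2d(x) + res                                # fuse 2D and 3D features

# usage: 2 clips of 16 feature maps, 64 channels, 56x56 spatial size
# out = Hybrid2D3DBlock(channels=64, seq_len=16)(torch.randn(2 * 16, 64, 56, 56))
```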
The hybrid convolutional neural network provided in this embodiment adopts a wide residual structure and integrates a two-dimensional convolutional network with a three-dimensional convolutional network, effectively reducing the amount of computation; on the premise of ensuring accuracy, the network parameters are further reduced, so that human body action recognition can be performed in real time.
Fig. 3 is a schematic diagram of the training process of the hybrid convolutional neural network provided in this embodiment; referring to fig. 3, the training process includes:
acquiring training samples, where the training samples are RGB image and optical flow graph data extracted from classroom videos and each training sample carries a human body action label. Specifically, the human action labels are divided into learning behaviors and non-learning behaviors, where learning behaviors include raising hands, standing up, writing and the like, and non-learning behaviors include throwing, falling, eating and the like. In one specific example, when RGB images and the optical flow graph are used simultaneously, the RGB images and the optical flow graph are trained independently: when training on RGB images, the input of the deep convolutional neural network is 16 images randomly selected from the RGB image sequence; when training on the optical flow graph, the input is 6 optical flow images randomly selected from the optical flow graph sequence, comprising 3 optical flow graphs in the X direction and 3 in the Y direction. Referring to fig. 4, taking the RGB images as an example, they are fed into the deep convolutional neural network from Working_frames and Incoming_frames: the last half of the data in Working_frames and the first half of the data in Incoming_frames are combined and input together, and the RGB images and optical flow graphs may be continuous or discontinuous.
inputting the training samples and their corresponding human body action labels into the hybrid convolutional neural network to be trained, which outputs a human body action prediction for each training sample;
calculating the loss function from the human body action labels and the predictions output by the hybrid convolutional neural network, and adjusting the model parameters of the network to be trained by backpropagation until the loss function is minimized, obtaining the trained hybrid convolutional neural network model. The loss function is calculated as follows:
Loss(xᵢ) = − l(xᵢ) · log( p(xᵢ) )
wherein l(xᵢ) represents the label value of the i-th input datum, p(xᵢ) is the prediction obtained for the i-th input datum after passing through the convolutional network, and Loss(xᵢ) represents the loss function.
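A minimal PyTorch training-step sketch against this loss, assuming the standard cross-entropy form reconstructed above; the model, data loader, and optimizer are assumed to exist and are not part of this embodiment's text.

```python
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, device="cpu"):
    """One epoch of backpropagation against a cross-entropy loss."""
    criterion = nn.CrossEntropyLoss()      # averages -l(x_i) * log p(x_i) over the batch
    model.train()
    for frames, labels in loader:          # labels: integer action-class ids
        frames, labels = frames.to(device), labels.to(device)
        optimizer.zero_grad()
        logits = model(frames)             # class scores p(x_i) for each sample
        loss = criterion(logits, labels)
        loss.backward()                    # propagate the error backwards
        optimizer.step()                   # adjust the model parameters
```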
S3, outputting the real-time human body action recognition results in the classroom to help teachers manage and teach the class;
Fig. 5 is a schematic diagram of the real-time human body action recognition feedback provided in this embodiment; referring to fig. 5, the human body action recognition results fall into two categories, learning behaviors and non-learning behaviors, and the human body actions in the classroom are analyzed according to the final classification result of the deep convolutional neural network. Specifically: when the recognition result is a learning behavior, the action is recorded in a log to help the teacher formulate a teaching plan; when the recognition result is a non-learning behavior, a warning is sent to the teacher to help the teacher manage the classroom. Recognizing and analyzing the classroom in real time thus helps the teacher manage the classroom better and provides support for formulating teaching strategies. A sketch of this feedback rule follows.
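The feedback rule can be sketched with Python's standard logging module; the behavior sets below are illustrative examples taken from the label split described earlier, and the log file name is an assumption.

```python
import logging

logging.basicConfig(filename="classroom_actions.log", level=logging.INFO)

# illustrative label split taken from the description above
LEARNING = {"raising hand", "standing up", "writing"}
NON_LEARNING = {"throwing", "falling", "eating"}

def feedback(action: str) -> None:
    """Record learning behaviors; raise a warning for non-learning ones."""
    if action in LEARNING:
        logging.info("learning behavior observed: %s", action)
    elif action in NON_LEARNING:
        logging.warning("ALERT for teacher: non-learning behavior: %s", action)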
It should be noted that although in the above-described embodiments, the operations of the methods of the embodiments of the present specification are described in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the steps depicted in the flowcharts may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
The present embodiment further provides a computer device, as shown in fig. 6, which includes at least one processor and at least one memory, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the steps of the classroom real-time human body motion recognition method; in this embodiment, the types of the processor and the memory are not particularly limited, for example: the processor may be a microprocessor, digital information processor, on-chip programmable logic system, or the like; the memory may be volatile memory, non-volatile memory, a combination thereof, or the like.
The computer device may also communicate with one or more external devices (e.g., keyboard, pointing terminal, display, etc.), with one or more terminals that enable a user to interact with the computer device, and/or with any terminals (e.g., network card, modem, etc.) that enable the computer device to communicate with one or more other computing terminals. Such communication may be through an input/output (I/O) interface. Also, the computer device may communicate with one or more networks (e.g., a Local Area Network (LAN), Wide Area Network (WAN), and/or a public Network, such as the internet) via the Network adapter.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A classroom real-time human body action recognition method is characterized by comprising the following steps:
acquiring a classroom video to be analyzed, and extracting an RGB image and an optical flow graph from the classroom video in real time;
inputting the RGB image, or the RGB image and the optical flow graph, into a trained hybrid convolutional neural network to obtain a human body action classification result in the classroom video; the human body action classification result is a preset learning behavior or non-learning behavior;
the hybrid convolutional neural network has a wide residual network structure and comprises a two-dimensional convolutional network and a three-dimensional convolutional network connected in parallel with it, which are respectively used to extract two-dimensional feature vectors and three-dimensional feature vectors from the RGB images and optical flow graphs; and recognizing the human body action based on the feature map obtained by fusing the two-dimensional feature vector and the three-dimensional feature vector.
2. The real-time classroom human body motion recognition method as claimed in claim 1, wherein the RGB images and the optical flow graph are serialized data of the same time period;
extracting the RGB images and optical flow graphs from the classroom video in real time comprises:
extracting RGB images in real time using the OpenCV algorithm library;
extracting optical flow graphs of adjacent frames from the classroom video by using a TV-L1 algorithm, wherein the TV-L1 algorithm is as follows:
min_u { λ ∫_Ω |I₁(x + u(x)) − I₀(x)| dx + ∫_Ω |∇u| dx }
wherein I₀ and I₁ represent two adjacent RGB images, u: Ω → R² is the disparity (flow) field to be obtained, ∫_Ω |I₁(x + u(x)) − I₀(x)| dx represents the image data fidelity term, ∫_Ω |∇u| dx represents the regularization term, and the weight λ balances data fidelity against regularization. Letting ρ(u) = I₁(x + u(x)) − I₀(x), the following formula is obtained:
E(u) = ∫_Ω ( λ |ρ(u)| + |∇u| ) dx
The u that minimizes E is the optical flow graph of the adjacent frames.
3. The classroom real-time human body motion recognition method according to claim 1 or 2, wherein inputting the RGB images into a trained hybrid convolutional neural network to obtain the human body motion classification result in the classroom video comprises:
selecting N RGB images and adding them to a first image sequence;
extracting the first N/2 RGB images from the first image sequence and adding them to a second image sequence; the second image sequence holds N RGB images, and the first N/2 RGB images in the sequence are deleted each time before new N/2 RGB images are added, so that the second image sequence always maintains a length of N;
inputting the N RGB images in the second image sequence into the deep convolutional neural network to obtain a candidate prediction result Pc;
outputting the final prediction result P = (P + Pc)/2.
4. The classroom real-time human body motion recognition method according to claim 1 or 2, wherein inputting the RGB images and the optical flow graph into a trained hybrid convolutional neural network to obtain the human body motion classification result in the classroom video comprises:
adding N RGB images to a first image sequence, and adding M optical flow graphs to a first optical flow sequence;
extracting the first N/2 images from the first image sequence and adding them to a second image sequence; extracting the first M/2 optical flow graphs from the first optical flow sequence and adding them to a second optical flow sequence;
the second image sequence and the second optical flow sequence hold N RGB images and M optical flow graphs respectively, and the first N/2 images and M/2 optical flow graphs are deleted each time before new sequence data are added, so that the second image sequence and the second optical flow sequence always maintain lengths of N and M;
inputting the N RGB images in the second image sequence into the deep convolutional neural network to obtain a candidate prediction result Pc1, and inputting the M optical flow graphs in the second optical flow sequence into the deep convolutional neural network to obtain a candidate prediction result Pc2;
outputting the final prediction result P = (P + Pc)/2, wherein Pc = x·Pc1 + y·Pc2, x and y being the weights of the candidate prediction results Pc1 and Pc2, respectively.
5. The classroom real-time human body action recognition method according to claim 1, wherein in the hybrid convolutional neural network the three-dimensional convolutional network serves as the residual connection of the two-dimensional convolutional network, and the two-dimensional convolutional network and the three-dimensional convolutional network are combined by the reshaping:
[B*S,C,H,W]→[B,C,S,H,W]
where B denotes the batch size, S the number of input images, and C, H and W the number of channels and the height and width of the feature map, respectively.
6. The real-time classroom human body motion recognition method as recited in claim 1, wherein the deep convolutional neural network error is calculated by:
Loss(xᵢ) = − l(xᵢ) · log( p(xᵢ) )
wherein l(xᵢ) represents the label value of the i-th input datum, p(xᵢ) is the prediction obtained for the i-th input datum after passing through the convolutional network, and Loss(xᵢ) represents the loss function.
7. The real-time classroom human body motion recognition method according to claim 4,
the N RGB images may be N consecutive images or N images sampled at equal intervals;
the M optical flow graphs may be M consecutive optical flow graphs or M optical flow graphs sampled at equal intervals; the M optical flow graphs contain optical flow graphs in both the X direction and the Y direction.
8. The real-time classroom human body motion recognition method as recited in claim 1, further comprising:
if the human body action classification result is a learning behavior, recording it in a log file;
and if the human body action classification result is a non-learning behavior, sending out a warning signal.
9. A computer arrangement comprising at least one processing unit and at least one memory unit, wherein the memory unit stores a computer program which, when executed by the processing unit, causes the processing unit to carry out the steps of the method according to any one of claims 1 to 8.
10. A computer-readable medium, in which a computer program is stored which is executable by a computer device, and which, when run on the computer device, causes the computer device to carry out the steps of the method according to any one of claims 1 to 8.
CN202111407773.8A 2021-11-24 2021-11-24 Classroom real-time human body action recognition method, computer equipment and readable medium Pending CN114140876A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111407773.8A CN114140876A (en) 2021-11-24 2021-11-24 Classroom real-time human body action recognition method, computer equipment and readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111407773.8A CN114140876A (en) 2021-11-24 2021-11-24 Classroom real-time human body action recognition method, computer equipment and readable medium

Publications (1)

Publication Number Publication Date
CN114140876A 2022-03-04

Family

ID=80391543

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111407773.8A Pending CN114140876A (en) 2021-11-24 2021-11-24 Classroom real-time human body action recognition method, computer equipment and readable medium

Country Status (1)

Country Link
CN (1) CN114140876A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116824640A (en) * 2023-08-28 2023-09-29 江南大学 Leg identification method, system, medium and equipment based on MT and three-dimensional residual error network
CN116824640B (en) * 2023-08-28 2023-12-01 江南大学 Leg identification method, system, medium and equipment based on MT and three-dimensional residual error network
CN117237984A (en) * 2023-08-31 2023-12-15 江南大学 MT leg identification method, system, medium and equipment based on label consistency
CN117237984B (en) * 2023-08-31 2024-06-21 江南大学 MT leg identification method, system, medium and equipment based on label consistency
CN117523677A (en) * 2024-01-02 2024-02-06 武汉纺织大学 Classroom behavior recognition method based on deep learning
CN117523677B (en) * 2024-01-02 2024-06-11 武汉纺织大学 Classroom behavior recognition method based on deep learning

Similar Documents

Publication Publication Date Title
CN114140876A (en) Classroom real-time human body action recognition method, computer equipment and readable medium
CN110889672B (en) Student card punching and class taking state detection system based on deep learning
CN111931585A (en) Classroom concentration degree detection method and device
CN111428448B (en) Text generation method, device, computer equipment and readable storage medium
CN113870395A (en) Animation video generation method, device, equipment and storage medium
CN113538441A (en) Image segmentation model processing method, image processing method and device
KR102042168B1 (en) Methods and apparatuses for generating text to video based on time series adversarial neural network
CN112215171B (en) Target detection method, device, equipment and computer readable storage medium
CN112116684A (en) Image processing method, device, equipment and computer readable storage medium
CN111191664A (en) Training method of label identification network, label identification device/method and equipment
CN112668638A (en) Image aesthetic quality evaluation and semantic recognition combined classification method and system
CN110728194A (en) Intelligent training method and device based on micro-expression and action recognition and storage medium
CN113435432A (en) Video anomaly detection model training method, video anomaly detection method and device
CN115546692A (en) Remote education data acquisition and analysis method, equipment and computer storage medium
CN115050002A (en) Image annotation model training method and device, electronic equipment and storage medium
CN113255787B (en) Small sample target detection method and system based on semantic features and metric learning
CN111414895A (en) Face recognition method and device and storage equipment
CN110969109A (en) Blink detection model under non-limited condition and construction method and application thereof
US20230085938A1 (en) Visual analytics systems to diagnose and improve deep learning models for movable objects in autonomous driving
CN115512267A (en) Video behavior identification method, system, device and equipment
CN111459050B (en) Intelligent simulation type nursing teaching system and teaching method based on dual-network interconnection
CN115661904A (en) Data labeling and domain adaptation model training method, device, equipment and medium
CN115115979A (en) Identification and replacement method of component elements in video and video recommendation method
CN114882580A (en) Measuring method for motion action consistency based on deep learning
CN113688789A (en) Online learning investment recognition method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination