CN114140876A - Classroom real-time human body action recognition method, computer equipment and readable medium - Google Patents

Classroom real-time human body action recognition method, computer equipment and readable medium

Info

Publication number
CN114140876A
Authority
CN
China
Prior art keywords
optical flow
human body
sequence
classroom
dimensional
Prior art date
Legal status
Pending
Application number
CN202111407773.8A
Other languages
Chinese (zh)
Inventor
廖盛斌
杨宗凯
Current Assignee
Central China Normal University
Original Assignee
Central China Normal University
Priority date
Filing date
Publication date
Application filed by Central China Normal University
Priority to CN202111407773.8A
Publication of CN114140876A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a classroom real-time human body action recognition method, computer equipment and a readable medium. The method comprises the following steps: extracting RGB images and optical flow graphs in real time from a classroom video to be analyzed; and inputting the RGB images, or the RGB images together with the optical flow graph, into a trained hybrid convolutional neural network to obtain a human body action classification result for the classroom video. The hybrid convolutional neural network has a wide residual network structure and comprises a two-dimensional convolutional network and a three-dimensional convolutional network connected in parallel with it, which extract two-dimensional feature vectors and three-dimensional feature vectors from the RGB images and optical flow graphs respectively; human body actions are recognized based on the feature map obtained by fusing the two-dimensional and three-dimensional feature vectors. By combining a deep convolutional neural network with RGB images and optical flow graphs and using an efficient data extraction algorithm, the invention not only further improves the accuracy of human body action recognition but also avoids frame dropping and stalling during real-time video human body action recognition.

Description

Classroom real-time human body action recognition method, computer equipment and readable medium
Technical Field
The application relates to the technical field of education informatization, in particular to a classroom real-time human body action recognition method, computer equipment and a readable medium.
Background
In recent years, technologies such as artificial intelligence and computer vision based on deep learning have developed rapidly, accelerating innovation in the field of education informatization. Concepts such as intelligent education and artificial intelligence in education are continually being proposed, and machine-assisted teaching is becoming a focus and hotspot of research, as it relieves teachers' teaching pressure and helps them formulate teaching plans.
Classroom teaching is the main setting in which schools educate, and students' classrooms are becoming increasingly informatized and intelligent. Using artificial intelligence technology to analyze and evaluate the classroom is of great significance to education and teaching. Students' classroom actions are a main component of classroom evaluation: analyzing them not only reveals students' learning state but also allows a warning to be sent to the teacher when a student engages in dangerous non-learning behavior.
Chinese patent application No. 201910674395.6 discloses a human body behavior recognition method for student classrooms based on target detection, which can recognize common classroom behaviors such as sitting upright, writing, lying on the desk, looking around, raising hands and standing up, but it neglects some important non-learning behaviors present in the classroom. In addition, the method uses a multi-channel 3D CNN as the reference model for training and testing; this reduces the amount of computation to a certain extent, but it cannot perform real-time human body motion recognition.
Chinese patent application No. 201711200452.4 discloses a real-time human body motion recognition method that first obtains a depth image of the video, then extracts human skeleton data from the depth image and normalizes it. The training data it uses must have the starting position of each motion marked in advance and a set of key points defined. Although the method achieves real-time human body motion recognition, it must additionally extract human skeleton data, which increases the amount of computation and the complexity.
In summary, existing human body action recognition methods oriented to student classrooms mainly have the following defects: 1. they focus on students' learning behaviors and therefore ignore important non-learning behaviors; 2. the network models used cannot perform real-time human body action recognition; 3. video frame data are not fully utilized, and acquiring data such as human skeletons is cumbersome.
Disclosure of Invention
In view of at least one of the defects or improvement needs of the prior art, the present invention provides a classroom real-time human body motion recognition method, a computer device, and a readable medium, in which a deep convolutional neural network serves as the reference model; the network employs a wide residual structure and integrates two-dimensional convolution and three-dimensional convolution, effectively reducing the amount of computation. In addition, the method takes RGB video frames and optical flow graphs as input and requires no extra computation.
To achieve the above object, according to a first aspect of the present invention, there is provided a classroom real-time human body motion recognition method, comprising:
acquiring a classroom video to be analyzed, and extracting an RGB image and an optical flow graph from the classroom video in real time;
inputting the RGB image, or the RGB image and the optical flow graph, into a trained hybrid convolutional neural network to obtain a human body action classification result in the classroom video; the human body action classification result is a preset learning behavior or non-learning behavior;
the hybrid convolutional neural network has a wide residual network structure and comprises a two-dimensional convolutional network and a three-dimensional convolutional network connected in parallel with it, which are respectively used to extract two-dimensional feature vectors and three-dimensional feature vectors from the RGB images and optical flow graphs; and human body actions are recognized based on the feature map obtained by fusing the two-dimensional and three-dimensional feature vectors.
Preferably, in the classroom real-time human body action recognition method, the RGB images and the optical flow graph are serialized data from the same time period;
extracting the RGB images and optical flow graphs from the classroom video in real time comprises:
extracting RGB images in real time using the OpenCV algorithm library;
extracting optical flow graphs of adjacent frames from the classroom video by using a TV-L1 algorithm, wherein the TV-L1 algorithm is as follows:
min_u { λ ∫_Ω |I₁(x + u(x)) − I₀(x)| dx + ∫_Ω |∇u| dx }
wherein I₀ and I₁ represent two adjacent RGB images, u: Ω → R² is the disparity (flow) field to be obtained, ∫_Ω |I₁(x + u(x)) − I₀(x)| dx represents the image data fidelity term, ∫_Ω |∇u| dx represents the regularization term, and the weight λ balances data fidelity against regularization. Letting ρ(u) = I₁(x + u(x)) − I₀(x), the following formula is obtained:
E(u) = ∫_Ω ( λ |ρ(u)| + |∇u| ) dx
The u that minimizes E is the optical flow graph of the adjacent frames.
Preferably, in the classroom real-time human body action recognition method, inputting the RGB images into a trained hybrid convolutional neural network to obtain the human body action classification result in the classroom video includes:
selecting N RGB images and adding them to a first image sequence;
extracting the first N/2 RGB images from the first image sequence and adding them to a second image sequence; the second image sequence holds N RGB images, and the first N/2 RGB images in the sequence are deleted each time before new N/2 RGB images are added, so that the second image sequence always maintains a length of N;
inputting the N RGB images in the second image sequence into the deep convolutional neural network to obtain a candidate prediction result Pc;
outputting the final prediction result P = (P + Pc)/2.
Preferably, in the classroom real-time human body motion recognition method, inputting the RGB images and the optical flow graph into a trained hybrid convolutional neural network to obtain the human body motion classification result in the classroom video includes:
adding N RGB images to a first image sequence, and adding M optical flow graphs to a first optical flow sequence;
extracting the first N/2 images from the first image sequence and adding them to a second image sequence; extracting the first M/2 optical flow graphs from the first optical flow sequence and adding them to a second optical flow sequence;
the second image sequence and the second optical flow sequence hold N RGB images and M optical flow graphs respectively, and the first N/2 images and M/2 optical flow graphs are deleted each time before new sequence data are added, so that the second image sequence and the second optical flow sequence always maintain lengths of N and M;
inputting the N RGB images in the second image sequence into the deep convolutional neural network to obtain a candidate prediction result Pc1, and inputting the M optical flow graphs in the second optical flow sequence into the deep convolutional neural network to obtain a candidate prediction result Pc2;
outputting the final prediction result P = (P + Pc)/2, wherein Pc = x·Pc1 + y·Pc2, x and y being the weights of the candidate prediction results Pc1 and Pc2, respectively.
Preferably, in the classroom real-time human body action recognition method, in the hybrid convolutional neural network the three-dimensional convolutional network serves as the residual connection of the two-dimensional convolutional network, and the two-dimensional convolutional network and the three-dimensional convolutional network are combined by the reshaping:
[B*S,C,H,W]→[B,C,S,H,W]
where B denotes the batch size, S the number of input images, and C, H and W the number of channels and the height and width of the feature map, respectively.
Preferably, in the classroom real-time human body motion recognition method, the calculation method of the deep convolutional neural network error is as follows:
Loss(xᵢ) = − l(xᵢ) · log( p(xᵢ) )
wherein l(xᵢ) represents the label value of the i-th input datum, p(xᵢ) is the prediction obtained for the i-th input datum after passing through the convolutional network, and Loss(xᵢ) represents the loss function.
Preferably, in the classroom real-time human body action recognition method,
the N RGB images may be N consecutive images or N images sampled at equal intervals;
the M optical flow graphs may be M consecutive optical flow graphs or M optical flow graphs sampled at equal intervals; the M optical flow graphs contain optical flow graphs in both the X direction and the Y direction.
Preferably, the classroom real-time human body action recognition method further includes:
if the human body action classification result is a learning behavior, recording it in a log file;
and if the human body action classification result is a non-learning behavior, sending out a warning signal.
According to a second aspect of the present invention, there is also provided a computer device comprising at least one processing unit, and at least one memory unit, wherein the memory unit stores a computer program which, when executed by the processing unit, causes the processing unit to perform any of the steps of the classroom real-time human action recognition method described above.
According to a third aspect of the present invention, there is also provided a computer readable medium storing a computer program executable by a computer device, the computer program, when run on the computer device, causing the computer device to perform the steps of any of the above-mentioned real-time classroom human motion recognition methods.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
(1) In the present invention, the deep convolutional neural network adopts a wide residual structure and combines two-dimensional and three-dimensional convolution to extract two-dimensional feature vectors and three-dimensional feature vectors from the RGB images and optical flow graph respectively, and human body actions are recognized based on the feature map obtained by fusing them. On the premise of ensuring accuracy, the network parameters are further reduced, so that real-time human body action recognition becomes possible. The RGB images and optical flow graph serve as input data of the deep convolutional neural network, no extra human skeleton data needs to be extracted, and the amount of computation is reduced.
(2) The recognition of human body actions in the video is divided into learning behaviors and non-learning behaviors, so the classroom video is analyzed comprehensively and objectively, overcoming the defect of analyzing learning behaviors alone.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the embodiments are briefly described below. The drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a classroom real-time human body motion recognition method based on RGB images and an optical flow graph according to an embodiment of the present application;
fig. 2 is a schematic diagram of the network structure of the hybrid convolutional neural network provided in an embodiment of the present application;
fig. 3 is a schematic diagram illustrating an example of a wide residual network according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an RGB image and optical flow graph real-time extraction algorithm provided in the present application;
FIG. 5 is a schematic diagram of a real-time human body motion recognition feedback provided in an embodiment of the present application;
fig. 6 is a logic block diagram of a computer device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
In the following description, the terms "first" and "second" are used for descriptive purposes only and are not intended to indicate or imply relative importance. The following description provides embodiments of the invention, which may be combined with or substituted for one another, and the invention is thus to be construed as embracing all possible combinations of the same and/or different embodiments described. Thus, if one embodiment includes features A, B and C and another embodiment includes features B and D, the invention should also be construed as including embodiments containing one or more of all other possible combinations of A, B, C and D, even though such embodiments may not be explicitly recited in the following text.
The following description provides examples, and does not limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements described without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as appropriate. For example, the described methods may be performed in an order different than the order described, and various steps may be added, omitted, or combined. Furthermore, features described with respect to some examples may be combined into other examples.
The technical idea of the invention is as follows: classroom video data are acquired through a device such as a camera; RGB image and optical flow graph data of the video are obtained using the OpenCV algorithm library and the TV-L1 algorithm; a large amount of the acquired data is labeled and input as training samples into the constructed deep convolutional neural network for training, and the trained model is saved. Specifically, the human action labels are divided into learning behaviors and non-learning behaviors, where learning behaviors include raising hands, standing up, writing and the like, and non-learning behaviors include throwing, falling, eating and the like. The deep convolutional neural network adopts a wide residual structure and combines two-dimensional convolution and three-dimensional convolution. Then, RGB image and optical flow graph data are extracted from the real-time classroom video to be analyzed and input into the trained model for real-time human body action recognition; the recognition results fall into two classes, learning behaviors and non-learning behaviors. The invention integrates a deep convolutional neural network with RGB images and optical flow graphs and uses an efficient data extraction algorithm, which not only further improves the accuracy of human body action recognition but also avoids frame dropping and stalling during real-time video human body action recognition.
Fig. 1 is a schematic flowchart of a classroom real-time human body motion recognition method based on RGB images and an optical flow graph according to this embodiment; referring to fig. 1, the method comprises the steps of:
s1, acquiring a classroom video to be analyzed, and extracting an RGB image and an optical flow graph from the classroom video in real time;
in this embodiment, the video data of a classroom is acquired by using a conventional camera or other devices, and the RGB image and the optical flow graph extracted in real time from the classroom video are serialized data in the same time period.
Specifically, real-time video data are collected through a camera, then each video frame is extracted through an Opencv algorithm library, and an optical flow graph is extracted through a TV-L1 algorithm; the TV-L1 algorithm is as follows:
min_u { λ ∫_Ω |I₁(x + u(x)) − I₀(x)| dx + ∫_Ω |∇u| dx }
wherein I₀ and I₁ represent two adjacent RGB images, u: Ω → R² is the disparity (flow) field to be obtained, ∫_Ω |I₁(x + u(x)) − I₀(x)| dx represents the image data fidelity term, ∫_Ω |∇u| dx represents the regularization term, and the weight λ balances data fidelity against regularization. Letting ρ(u) = I₁(x + u(x)) − I₀(x), the following formula is obtained:
E(u) = ∫_Ω ( λ |ρ(u)| + |∇u| ) dx
The u that minimizes E is the optical flow graph of the adjacent frames.
Optionally, the extracted RGB images and optical flow graphs are adjusted to a uniform size, such as 224 × 224, and are encoded in time order in the format frame000001, frame000002, ….
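As an illustration of this extraction step, the following minimal Python sketch reads a video with OpenCV, saves resized RGB frames, and computes TV-L1 optical flow between adjacent frames. It assumes the opencv-contrib-python package, whose cv2.optflow module provides the TV-L1 implementation; the flow-file naming and the 0–255 rescaling of the flow components are illustrative choices, not prescribed by this embodiment.

```python
import cv2

def extract_rgb_and_flow(video_path, size=(224, 224)):
    """Save resized RGB frames and TV-L1 optical flow graphs from a video."""
    cap = cv2.VideoCapture(video_path)
    tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()  # TV-L1 solver (opencv-contrib)
    prev_gray, idx = None, 1
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, size)
        cv2.imwrite(f"frame{idx:06d}.jpg", frame)    # frame000001, frame000002, ...
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is not None:
            flow = tvl1.calc(prev_gray, gray, None)  # H x W x 2 (X and Y components)
            # store the X- and Y-direction flow maps separately, rescaled to 8-bit
            for c, axis in enumerate("xy"):
                comp = cv2.normalize(flow[..., c], None, 0, 255, cv2.NORM_MINMAX)
                cv2.imwrite(f"flow_{axis}{idx:06d}.jpg", comp.astype("uint8"))
        prev_gray = gray
        idx += 1
    cap.release()
```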
S2, inputting the RGB images, or the RGB images and the optical flow graph, into a trained hybrid convolutional neural network, where the hybrid convolutional neural network is a deep convolutional neural network with a wide residual network structure and comprises a two-dimensional convolutional network and a three-dimensional convolutional network connected in parallel with it, used to extract two-dimensional feature vectors and three-dimensional feature vectors from the RGB images and the optical flow graph respectively;
identifying human body actions based on the feature map obtained by fusing the two-dimensional and three-dimensional feature vectors, and generating the human body action classification result in the classroom video; the human body action classification result is a preset learning behavior or non-learning behavior;
specifically, the method can be divided into two modes of using optical flow diagram data and not using optical flow diagram data;
(1) if the optical flow graph data is not used:
adding N RGB images into an Incoming _ frames image sequence, wherein the N image sequences can be selected from N continuous images or N images with the same interval;
extracting the first N/2 image sequences from the Incoming _ frames image sequence and adding the image sequences to Working _ frames;
the Working _ frames is a sequence of images with N RGB images and the first N/2 image sequences are deleted before each new image sequence is added to Working _ frames, so that Working _ frames is always a sequence of images of length N;
inputting images in Working _ frames into the deep convolutional neural network, extracting two-dimensional image feature vectors from N RGB images by the two-dimensional convolutional network in the deep convolutional neural network, extracting three-dimensional image feature vectors from the N RGB images by the three-dimensional convolutional network, fusing the two-dimensional image feature vectors and the three-dimensional image feature vectors, and classifying based on the fused image feature vectors to obtain a candidate prediction result Pc;
final predicted result P: (P + Pc)/2.
(2) If optical flow data are used:
adding N RGB images to an Incoming_frames image sequence and M optical flow graphs to an Incoming_flows optical flow sequence, where adjacent picture frames are selected from the Incoming_frames image sequence; the N images can be N consecutive images or N images sampled at equal intervals, and the M optical flow graphs can be M consecutive optical flow graphs or M optical flow graphs sampled at equal intervals; however, the M optical flow graphs include optical flow graphs in both the X direction and the Y direction;
extracting the first N/2 images and the first M/2 optical flow graphs from Incoming_frames and Incoming_flows respectively, and adding them to Working_frames and Working_flows respectively;
Working_frames and Working_flows hold N RGB images and M optical flow graphs respectively, and the first N/2 images and M/2 optical flow graphs are deleted each time before new sequence data are added to Working_frames and Working_flows, so that they are always sequences of lengths N and M;
inputting the pictures in Working_frames into the deep convolutional neural network: the two-dimensional convolutional network extracts two-dimensional image feature vectors from the N RGB images, the three-dimensional convolutional network extracts three-dimensional image feature vectors from them, the two are fused, and classification based on the fused feature vector yields a candidate prediction result Pc1;
inputting the optical flow in Working_flows into the deep convolutional neural network: the two-dimensional convolutional network extracts two-dimensional image feature vectors from the M optical flow graphs, the three-dimensional convolutional network extracts three-dimensional image feature vectors from them, the two are fused, and classification based on the fused feature vector yields a candidate prediction result Pc2;
Pc = x·Pc1 + y·Pc2, x and y being the weights of the prediction results Pc1 and Pc2, respectively;
the final prediction result is P = (P + Pc)/2. A sketch of this fusion follows.
In this embodiment, the input of the network may be RGB images alone, or RGB images together with the optical flow graph: if the requirement for accuracy is high, RGB image and optical flow graph data may be used as input simultaneously; if the requirement for the speed of real-time motion recognition is high, only RGB images may be selected as input.
In this scheme, RGB video frames and the optical flow graph serve as input and their features are fused for human body action recognition, so data such as human skeletons need not be additionally extracted and the amount of computation is reduced.
Fig. 2 is a schematic diagram of the network structure of the hybrid convolutional neural network provided in this embodiment; as shown in fig. 2, the three-dimensional convolutional network in the hybrid convolutional neural network serves as the residual connection of the two-dimensional convolutional network, and the two-dimensional convolutional network and the three-dimensional convolutional network are combined by the reshaping:
[B*S,C,H,W]→[B,C,S,H,W]
where B denotes the batch size, S the number of input images, and C, H and W the number of channels and the height and width of the feature map, respectively.
In a specific example, the two-dimensional convolutional network uses a BN-Inception network as its base network and the three-dimensional convolutional network uses a 3D ResNet-18 network as its base network; with the three-dimensional convolutional network serving as the residual connection of the two-dimensional convolutional network, the hybrid convolutional neural network uses two layers of residual connections in width, forming a wide residual network structure. Thanks to its shortcut connections, the residual network effectively avoids the degradation problem. The residual structure of the hybrid convolutional neural network is expressed by the following formula:
F(x) = H(x) - x
where F(x) represents the residual of the network, H(x) is the prediction value, and x represents the input of the convolutional layer. A sketch of this combination follows.
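The following minimal PyTorch sketch shows how a three-dimensional convolution can act as the residual branch of a two-dimensional convolution via the [B*S, C, H, W] → [B, C, S, H, W] reshape described above; the channel counts and kernel sizes are illustrative and do not reproduce the BN-Inception / 3D ResNet-18 configuration of this embodiment.

```python
import torch
import torch.nn as nn

class Hybrid2D3DBlock(nn.Module):
    """2D conv main path with a 3D conv residual branch (illustrative sizes)."""

    def __init__(self, channels: int, seq_len: int):
        super().__init__()
        self.seq_len = seq_len                                     # S: frames per clip
        self.conv2d = nn.Conv2d(channels, channels, 3, padding=1)  # per-frame (2D) path
        self.conv3d = nn.Conv3d(channels, channels, 3, padding=1)  # temporal (3D) residual path

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [B*S, C, H, W], assumed ordered clip by clip (clip 0's S frames first)
        bs, c, h, w = x.shape
        b = bs // self.seq_len
        # [B*S, C, H, W] -> [B, C, S, H, W]: expose time as a dimension for the 3D branch
        x3d = x.view(b, self.seq_len, c, h, w).permute(0, 2, 1, 3, 4)
        res = self.conv3d(x3d)                                     # three-dimensional features
        res = res.permute(0, 2, 1, 3, 4).reshape(bs, c, h, w)      # back to the 2D layout
        return self.conv2d(x) + res                                # fuse 2D and 3D features

# usage: 2 clips of 16 feature maps, 64 channels, 56x56 spatial size
# out = Hybrid2D3DBlock(channels=64, seq_len=16)(torch.randn(2 * 16, 64, 56, 56))
```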
The hybrid convolutional neural network provided in this embodiment adopts a wide residual structure and integrates a two-dimensional convolutional network with a three-dimensional convolutional network, effectively reducing the amount of computation; on the premise of ensuring accuracy, the network parameters are further reduced, so that human body action recognition can be performed in real time.
Fig. 3 is a schematic diagram of the training process of the hybrid convolutional neural network provided in this embodiment; referring to fig. 3, the training process includes:
acquiring training samples, where the training samples are RGB image and optical flow graph data extracted from classroom videos and each training sample carries a human body action label. Specifically, the human action labels are divided into learning behaviors and non-learning behaviors, where learning behaviors include raising hands, standing up, writing and the like, and non-learning behaviors include throwing, falling, eating and the like. In one specific example, when RGB images and the optical flow graph are used simultaneously, the RGB images and the optical flow graph are trained independently: when training on RGB images, the input of the deep convolutional neural network is 16 images randomly selected from the RGB image sequence; when training on the optical flow graph, the input is 6 optical flow images randomly selected from the optical flow graph sequence, comprising 3 optical flow graphs in the X direction and 3 in the Y direction. Referring to fig. 4, taking the RGB images as an example, they are fed into the deep convolutional neural network from Working_frames and Incoming_frames: the last half of the data in Working_frames and the first half of the data in Incoming_frames are combined and input together, and the RGB images and optical flow graphs may be continuous or discontinuous.
inputting the training samples and their corresponding human body action labels into the hybrid convolutional neural network to be trained, which outputs a human body action prediction for each training sample;
calculating the loss function from the human body action labels and the predictions output by the hybrid convolutional neural network, and adjusting the model parameters of the network to be trained by backpropagation until the loss function is minimized, obtaining the trained hybrid convolutional neural network model. The loss function is calculated as follows:
Loss(xᵢ) = − l(xᵢ) · log( p(xᵢ) )
wherein l(xᵢ) represents the label value of the i-th input datum, p(xᵢ) is the prediction obtained for the i-th input datum after passing through the convolutional network, and Loss(xᵢ) represents the loss function.
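A minimal PyTorch training-step sketch against this loss, assuming the standard cross-entropy form reconstructed above; the model, data loader, and optimizer are assumed to exist and are not part of this embodiment's text.

```python
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, device="cpu"):
    """One epoch of backpropagation against a cross-entropy loss."""
    criterion = nn.CrossEntropyLoss()      # averages -l(x_i) * log p(x_i) over the batch
    model.train()
    for frames, labels in loader:          # labels: integer action-class ids
        frames, labels = frames.to(device), labels.to(device)
        optimizer.zero_grad()
        logits = model(frames)             # class scores p(x_i) for each sample
        loss = criterion(logits, labels)
        loss.backward()                    # propagate the error backwards
        optimizer.step()                   # adjust the model parameters
```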
S3, outputting the real-time human body action recognition results in the classroom to help teachers manage and teach the class;
Fig. 5 is a schematic diagram of the real-time human body action recognition feedback provided in this embodiment; referring to fig. 5, the human body action recognition results fall into two categories, learning behaviors and non-learning behaviors, and the human body actions in the classroom are analyzed according to the final classification result of the deep convolutional neural network. Specifically: when the recognition result is a learning behavior, the action is recorded in a log to help the teacher formulate a teaching plan; when the recognition result is a non-learning behavior, a warning is sent to the teacher to help the teacher manage the classroom. Recognizing and analyzing the classroom in real time thus helps the teacher manage the classroom better and provides support for formulating teaching strategies. A sketch of this feedback rule follows.
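The feedback rule can be sketched with Python's standard logging module; the behavior sets below are illustrative examples taken from the label split described earlier, and the log file name is an assumption.

```python
import logging

logging.basicConfig(filename="classroom_actions.log", level=logging.INFO)

# illustrative label split taken from the description above
LEARNING = {"raising hand", "standing up", "writing"}
NON_LEARNING = {"throwing", "falling", "eating"}

def feedback(action: str) -> None:
    """Record learning behaviors; raise a warning for non-learning ones."""
    if action in LEARNING:
        logging.info("learning behavior observed: %s", action)
    elif action in NON_LEARNING:
        logging.warning("ALERT for teacher: non-learning behavior: %s", action)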
It should be noted that although in the above-described embodiments, the operations of the methods of the embodiments of the present specification are described in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the steps depicted in the flowcharts may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
The present embodiment further provides a computer device, as shown in fig. 6, which includes at least one processor and at least one memory, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the steps of the classroom real-time human body motion recognition method; in this embodiment, the types of the processor and the memory are not particularly limited, for example: the processor may be a microprocessor, digital information processor, on-chip programmable logic system, or the like; the memory may be volatile memory, non-volatile memory, a combination thereof, or the like.
The computer device may also communicate with one or more external devices (e.g., keyboard, pointing terminal, display, etc.), with one or more terminals that enable a user to interact with the computer device, and/or with any terminals (e.g., network card, modem, etc.) that enable the computer device to communicate with one or more other computing terminals. Such communication may be through an input/output (I/O) interface. Also, the computer device may communicate with one or more networks (e.g., a Local Area Network (LAN), Wide Area Network (WAN), and/or a public Network, such as the internet) via the Network adapter.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A classroom real-time human body action recognition method is characterized by comprising the following steps:
acquiring a classroom video to be analyzed, and extracting an RGB image and an optical flow graph from the classroom video in real time;
inputting the RGB image, or the RGB image and the optical flow graph, into a trained hybrid convolutional neural network to obtain a human body action classification result in the classroom video; the human body action classification result is a preset learning behavior or non-learning behavior;
the hybrid convolutional neural network has a wide residual network structure and comprises a two-dimensional convolutional network and a three-dimensional convolutional network connected in parallel with it, which are respectively used to extract two-dimensional feature vectors and three-dimensional feature vectors from the RGB images and optical flow graphs; and recognizing the human body action based on the feature map obtained by fusing the two-dimensional feature vector and the three-dimensional feature vector.
2. The real-time classroom human body motion recognition method as claimed in claim 1, wherein the RGB images and the optical flow graph are serialized data of the same time period;
extracting the RGB images and optical flow graphs from the classroom video in real time comprises:
extracting RGB images in real time using the OpenCV algorithm library;
extracting optical flow graphs of adjacent frames from the classroom video by using a TV-L1 algorithm, wherein the TV-L1 algorithm is as follows:
min_u { λ ∫_Ω |I₁(x + u(x)) − I₀(x)| dx + ∫_Ω |∇u| dx }
wherein I₀ and I₁ represent two adjacent RGB images, u: Ω → R² is the disparity (flow) field to be obtained, ∫_Ω |I₁(x + u(x)) − I₀(x)| dx represents the image data fidelity term, ∫_Ω |∇u| dx represents the regularization term, and the weight λ balances data fidelity against regularization. Letting ρ(u) = I₁(x + u(x)) − I₀(x), the following formula is obtained:
E(u) = ∫_Ω ( λ |ρ(u)| + |∇u| ) dx
The u that minimizes E is the optical flow graph of the adjacent frames.
3. The classroom real-time human body motion recognition method according to claim 1 or 2, wherein inputting the RGB images into a trained hybrid convolutional neural network to obtain the human body motion classification result in the classroom video comprises:
selecting N RGB images and adding them to a first image sequence;
extracting the first N/2 RGB images from the first image sequence and adding them to a second image sequence; the second image sequence holds N RGB images, and the first N/2 RGB images in the sequence are deleted each time before new N/2 RGB images are added, so that the second image sequence always maintains a length of N;
inputting the N RGB images in the second image sequence into the deep convolutional neural network to obtain a candidate prediction result Pc;
outputting the final prediction result P = (P + Pc)/2.
4. The classroom real-time human body motion recognition method according to claim 1 or 2, wherein inputting the RGB images and the optical flow graph into a trained hybrid convolutional neural network to obtain the human body motion classification result in the classroom video comprises:
adding N RGB images to a first image sequence, and adding M optical flow graphs to a first optical flow sequence;
extracting the first N/2 images from the first image sequence and adding them to a second image sequence; extracting the first M/2 optical flow graphs from the first optical flow sequence and adding them to a second optical flow sequence;
the second image sequence and the second optical flow sequence hold N RGB images and M optical flow graphs respectively, and the first N/2 images and M/2 optical flow graphs are deleted each time before new sequence data are added, so that the second image sequence and the second optical flow sequence always maintain lengths of N and M;
inputting the N RGB images in the second image sequence into the deep convolutional neural network to obtain a candidate prediction result Pc1, and inputting the M optical flow graphs in the second optical flow sequence into the deep convolutional neural network to obtain a candidate prediction result Pc2;
outputting the final prediction result P = (P + Pc)/2, wherein Pc = x·Pc1 + y·Pc2, x and y being the weights of the candidate prediction results Pc1 and Pc2, respectively.
5. The classroom real-time human body action recognition method according to claim 1, wherein in the hybrid convolutional neural network the three-dimensional convolutional network serves as the residual connection of the two-dimensional convolutional network, and the two-dimensional convolutional network and the three-dimensional convolutional network are combined by the reshaping:
[B*S,C,H,W]→[B,C,S,H,W]
where B denotes the batch size, S the number of input images, and C, H and W the number of channels and the height and width of the feature map, respectively.
6. The real-time classroom human body motion recognition method as recited in claim 1, wherein the deep convolutional neural network error is calculated by:
Loss(xᵢ) = − l(xᵢ) · log( p(xᵢ) )
wherein l(xᵢ) represents the label value of the i-th input datum, p(xᵢ) is the prediction obtained for the i-th input datum after passing through the convolutional network, and Loss(xᵢ) represents the loss function.
7. The real-time classroom human body motion recognition method according to claim 4,
the N RGB images may be N consecutive images or N images sampled at equal intervals;
the M optical flow graphs may be M consecutive optical flow graphs or M optical flow graphs sampled at equal intervals; the M optical flow graphs contain optical flow graphs in both the X direction and the Y direction.
8. The real-time classroom human body motion recognition method as recited in claim 1, further comprising:
if the human body action classification result is a learning behavior, recording it in a log file;
and if the human body action classification result is a non-learning behavior, sending out a warning signal.
9. A computer arrangement comprising at least one processing unit and at least one memory unit, wherein the memory unit stores a computer program which, when executed by the processing unit, causes the processing unit to carry out the steps of the method according to any one of claims 1 to 8.
10. A computer-readable medium, in which a computer program is stored which is executable by a computer device, and which, when run on the computer device, causes the computer device to carry out the steps of the method according to any one of claims 1 to 8.
CN202111407773.8A 2021-11-24 2021-11-24 Classroom real-time human body action recognition method, computer equipment and readable medium Pending CN114140876A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111407773.8A CN114140876A (en) 2021-11-24 2021-11-24 Classroom real-time human body action recognition method, computer equipment and readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111407773.8A CN114140876A (en) 2021-11-24 2021-11-24 Classroom real-time human body action recognition method, computer equipment and readable medium

Publications (1)

Publication Number Publication Date
CN114140876A 2022-03-04

Family

ID=80391543

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111407773.8A Pending CN114140876A (en) 2021-11-24 2021-11-24 Classroom real-time human body action recognition method, computer equipment and readable medium

Country Status (1)

Country Link
CN (1) CN114140876A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116824640A (en) * 2023-08-28 2023-09-29 江南大学 Leg identification method, system, medium and equipment based on MT and three-dimensional residual error network
CN116824640B (en) * 2023-08-28 2023-12-01 江南大学 Leg identification method, system, medium and equipment based on MT and three-dimensional residual error network
CN117237984A (en) * 2023-08-31 2023-12-15 江南大学 MT leg identification method, system, medium and equipment based on label consistency
CN117237984B (en) * 2023-08-31 2024-06-21 江南大学 MT leg identification method, system, medium and equipment based on label consistency
CN117523677A (en) * 2024-01-02 2024-02-06 武汉纺织大学 Classroom behavior recognition method based on deep learning
CN117523677B (en) * 2024-01-02 2024-06-11 武汉纺织大学 Classroom behavior recognition method based on deep learning

Similar Documents

Publication Publication Date Title
CN114140876A (en) Classroom real-time human body action recognition method, computer equipment and readable medium
CN110889672B (en) Student card punching and class taking state detection system based on deep learning
CN111931585A (en) Classroom concentration degree detection method and device
CN111428448B (en) Text generation method, device, computer equipment and readable storage medium
CN113870395A (en) Animation video generation method, device, equipment and storage medium
CN113538441A (en) Image segmentation model processing method, image processing method and device
KR102042168B1 (en) Methods and apparatuses for generating text to video based on time series adversarial neural network
CN112215171B (en) Target detection method, device, equipment and computer readable storage medium
CN112116684A (en) Image processing method, device, equipment and computer readable storage medium
CN111191664A (en) Training method of label identification network, label identification device/method and equipment
CN112668638A (en) Image aesthetic quality evaluation and semantic recognition combined classification method and system
CN110728194A (en) Intelligent training method and device based on micro-expression and action recognition and storage medium
CN113435432A (en) Video anomaly detection model training method, video anomaly detection method and device
CN115546692A (en) Remote education data acquisition and analysis method, equipment and computer storage medium
CN115050002A (en) Image annotation model training method and device, electronic equipment and storage medium
CN113255787B (en) Small sample target detection method and system based on semantic features and metric learning
CN111414895A (en) Face recognition method and device and storage equipment
CN110969109A (en) Blink detection model under non-limited condition and construction method and application thereof
US20230085938A1 (en) Visual analytics systems to diagnose and improve deep learning models for movable objects in autonomous driving
CN115512267A (en) Video behavior identification method, system, device and equipment
CN111459050B (en) Intelligent simulation type nursing teaching system and teaching method based on dual-network interconnection
CN115661904A (en) Data labeling and domain adaptation model training method, device, equipment and medium
CN115115979A (en) Identification and replacement method of component elements in video and video recommendation method
CN114882580A (en) Measuring method for motion action consistency based on deep learning
CN113688789A (en) Online learning investment recognition method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination