CN112905811A - Teaching audio and video pushing method and system based on student classroom behavior analysis - Google Patents
Teaching audio and video pushing method and system based on student classroom behavior analysis
Info
- Publication number
- CN112905811A CN112905811A CN202110168340.5A CN202110168340A CN112905811A CN 112905811 A CN112905811 A CN 112905811A CN 202110168340 A CN202110168340 A CN 202110168340A CN 112905811 A CN112905811 A CN 112905811A
- Authority
- CN
- China
- Prior art keywords
- video
- classroom
- teaching audio
- recognition result
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000004458 analytical method Methods 0.000 title claims abstract description 36
- 238000000034 method Methods 0.000 title claims abstract description 30
- 238000003062 neural network model Methods 0.000 claims abstract description 30
- 238000004806 packaging method and process Methods 0.000 claims abstract description 29
- 238000012549 training Methods 0.000 claims abstract description 12
- 238000001514 detection method Methods 0.000 claims abstract description 9
- 230000006399 behavior Effects 0.000 claims description 89
- 238000012216 screening Methods 0.000 claims description 18
- 238000005538 encapsulation Methods 0.000 claims description 9
- 238000010276 construction Methods 0.000 claims description 7
- 230000008676 import Effects 0.000 claims 2
- 238000010586 diagram Methods 0.000 description 7
- 230000036544 posture Effects 0.000 description 6
- 238000012545 processing Methods 0.000 description 6
- 238000013527 convolutional neural network Methods 0.000 description 5
- 230000000694 effects Effects 0.000 description 4
- 238000005516 engineering process Methods 0.000 description 3
- 230000001815 facial effect Effects 0.000 description 3
- 230000009471 action Effects 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 2
- 230000005540 biological transmission Effects 0.000 description 2
- 230000015572 biosynthetic process Effects 0.000 description 2
- 230000006835 compression Effects 0.000 description 2
- 238000007906 compression Methods 0.000 description 2
- 238000013135 deep learning Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000003786 synthesis reaction Methods 0.000 description 2
- 230000001960 triggered effect Effects 0.000 description 2
- 239000013598 vector Substances 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 239000000969 carrier Substances 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000009191 jumping Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 238000007670 refining Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/48—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/483—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/43—Querying
- G06F16/435—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/45—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/63—Querying
- G06F16/635—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/65—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/73—Querying
- G06F16/735—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/75—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- Library & Information Science (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Evolutionary Biology (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a teaching audio and video pushing method based on student classroom behavior analysis, which comprises the following steps: collecting classroom behavior videos of students while collecting classroom teaching audio and video; inputting the classroom behavior video into a trained neural network model to obtain a recognition result, wherein the neural network model comprises a face detection model, a face recognition model and a behavior posture model, and the recognition result comprises an identity recognition result and a behavior recognition result; and packaging the classroom teaching audio and video together with the recognition result and storing them in a file library of a teaching audio and video application platform. The invention solves the problems that videos on a teaching video application platform can only be pushed along coarse dimensions, such as a video's additional information and playing information, so that the pushing is not precise enough and cannot meet users' personalized requirements.
Description
Technical Field
The invention relates to the field of video pushing, in particular to a teaching audio and video pushing method and system based on student classroom behavior analysis.
Background
With the rapid development of multimedia and network technology, teaching videos have become ever easier to capture and transmit, and have gradually become one of the main carriers of teaching resources. Existing video pushing methods mainly push either according to additional information manually attached to a video, or according to the video's playing and browsing statistics. In the first method, text identifiers are manually added to a video, such as its title, the course and specialty it belongs to, the teaching teacher, relevant knowledge points and a course introduction; a chosen piece of additional information then serves as the pushing basis, for example pushing by course or pushing by teaching teacher. In the second method, the video's browsing count, like count, collection count and comment count are recorded, and a chosen playing statistic serves as the pushing basis, for example pushing by browsing count or pushing by like count.
Both methods require manually adding course information, which is labor-intensive, and their dimensions are too coarse: every user is pushed the same content, the accuracy is low, and users' personalized requirements are not met, so the click-through rate of the pushed videos is very low. Teaching videos are currently growing rapidly in number, and a teaching resource platform needs to push videos to users in a targeted manner in order to raise their utilization rate. Conventional video pushing technology, however, can only push along coarse dimensions, such as the course a video belongs to or the teacher giving the lesson. Such coarse-dimension pushing is not precise enough and cannot recognize a user's individual need, for example that the user may only be interested in the video of a class he or she missed.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a teaching audio and video pushing method and system based on student classroom behavior analysis, which can solve the problems that videos on a teaching video application platform can only be pushed along coarse dimensions, such as a video's additional information and playing information, and are therefore not accurate enough to meet users' personalized requirements.
In order to solve the technical problem, the invention provides a teaching audio and video pushing method based on student classroom behavior analysis, which comprises the following steps: collecting classroom behavior videos of students while collecting classroom teaching audio and video; inputting the classroom behavior video into a trained neural network model to obtain a recognition result, wherein the neural network model comprises a face detection model, a face recognition model and a behavior posture model, and the recognition result comprises an identity recognition result and a behavior recognition result; packaging the classroom teaching audio and video together with the recognition result and storing them in a file library of a teaching audio and video application platform; acquiring the identity information of a user logging in to the teaching audio and video application platform; and screening out, from the file library, the identity recognition results consistent with the user's identity information, and pushing the classroom teaching audio and video corresponding to those identity recognition results to the user.
Preferably, the teaching audio and video pushing method based on student classroom behavior analysis further includes: acquiring timestamp information of the behavior recognition result and generating corresponding description information; and packaging the timestamp information, the description information and the classroom teaching audio and video and storing the timestamp information, the description information and the classroom teaching audio and video in a file library of a teaching audio and video application platform.
Preferably, the teaching audio and video pushing method based on student classroom behavior analysis further includes: acquiring identity information of the corresponding students in the classroom; and packaging the identity information of the student and the classroom teaching audio and video and storing the identity information and the classroom teaching audio and video in a file library of a teaching audio and video application platform.
Preferably, the teaching audio and video pushing method based on student classroom behavior analysis further includes: collecting the sign-in information of the corresponding students; and packaging the identity information, the sign-in information and the classroom teaching audio and video of the student and storing the identity information, the sign-in information and the classroom teaching audio and video in a file library of a teaching audio and video application platform.
Preferably, the trained neural network model is imported in a plug-in directory mode.
The invention also provides a teaching audio and video pushing system based on student classroom behavior analysis, which comprises a first collection module, a second collection module, a recognition model construction module, a packaging storage module, a first obtaining module, a screening module and a pushing module. The first collection module is used for collecting the audio and video data of classroom teaching, and the second collection module is used for collecting the video data of student behaviors. The recognition model construction module is used for constructing and training a neural network model and inputting the classroom behavior video into the trained neural network model to obtain a recognition result, wherein the neural network model comprises a face detection model, a face recognition model and a behavior posture model, and the recognition result comprises an identity recognition result and a behavior recognition result. The packaging storage module is used for packaging the classroom teaching audio and video together with the recognition result and storing them in a file library of a teaching audio and video application platform. The first obtaining module is used for obtaining the identity information of a user logging in to the teaching audio and video application platform; the screening module is used for screening out, from the file library, the identity recognition results consistent with the user's identity information; and the pushing module is used for pushing the classroom teaching audio and video corresponding to those identity recognition results to the user.
Preferably, the teaching audio and video pushing system based on student classroom behavior analysis further comprises a second obtaining module, the second obtaining module is used for obtaining the timestamp information of the behavior recognition results and generating the corresponding description information, and the packaging storage module is further used for packaging the timestamp information and the description information together with the classroom teaching audio and video and storing them in the file library of the teaching audio and video application platform.
Preferably, the teaching audio and video pushing system based on student classroom behavior analysis further comprises a third obtaining module, the third obtaining module is used for obtaining the identity information of the students in the corresponding classroom, and the packaging storage module is further used for packaging the students' identity information together with the classroom teaching audio and video and storing them in the file library of the teaching audio and video application platform.
Preferably, the teaching audio and video pushing system based on student classroom behavior analysis further comprises a third collection module, the third collection module is used for collecting the sign-in information of the students, and the packaging storage module is further used for packaging the students' identity information and sign-in information together with the classroom teaching audio and video and storing them in the file library of the teaching audio and video application platform.
Preferably, the teaching audio and video pushing system based on student classroom behavior analysis further comprises a recognition model importing module, and the recognition model importing module is used for importing the trained neural network model in a plug-in directory mode.
The beneficial effects of the implementation of the invention are as follows:
by collecting the classroom teaching audio and video and the students' classroom behavior videos at the same time, recognizing the students' identity recognition results and behavior recognition results from the classroom behavior videos, screening out the identity recognition results consistent with the identity information of the user logging in to the teaching audio and video application platform, and pushing the corresponding classroom teaching audio and video to that user, the invention solves the problems that videos on a teaching video application platform can only be pushed along coarse dimensions, such as a video's additional information and playing information, so that the pushing is not precise enough and cannot meet users' personalized requirements.
Drawings
FIG. 1 is a flow chart of a teaching audio/video pushing method based on student classroom behavior analysis according to the present invention;
FIG. 2 is a schematic diagram of a P-Net network according to the present invention;
FIG. 3 is a schematic diagram of the structure of an R-Net network provided by the present invention;
FIG. 4 is a schematic diagram of the structure of an O-Net network provided by the present invention;
FIG. 5 is a schematic structural diagram of the FaceNet model provided by the present invention;
fig. 6 is a schematic structural diagram of the OpenPose network provided by the present invention;
fig. 7 is a schematic diagram of a teaching audio and video pushing system based on student classroom behavior analysis provided by the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings. It should be noted that the invention is not intended to be limited to the specific forms set forth herein; the embodiments described with reference to the drawings are merely illustrative, and other specific forms of embodiment also fall within the scope of the invention.
As shown in fig. 1, the invention provides a teaching audio and video pushing method based on student classroom behavior analysis, which comprises the following steps:
s101, collecting classroom behavior videos of students while collecting classroom teaching audios and videos.
It should be noted that, the classroom teaching audio and video and the classroom behavior video of the students are collected at the same time to ensure the time consistency and prepare for the subsequent processing.
S102, inputting the classroom behavior video into a trained neural network model to obtain a recognition result, wherein the neural network model comprises a face detection model, a face recognition model and a behavior posture model, and the recognition result comprises an identity recognition result and a behavior recognition result.
It should be noted that these models can be deployed in a production environment so that, in a real-time video stream, the identities and behavior postures of students are recognized by running inference with the trained models. The method uses an AI algorithm training platform with GPU-based CUDA parallel-computing hardware, and trains on an image data set with deep learning frameworks such as TensorFlow and Caffe and network models such as MTCNN, FaceNet and OpenPose, tuning hyper-parameters such as the learning rate and the loss function, to obtain the trained neural network models.
Preferably, the trained neural network models are imported in a plug-in directory mode. The invention uses a GPU-optimized neural network and image processing algorithm library and imports the corresponding algorithm models in a plug-in directory mode, so the neural network models can be rapidly deployed for AI inference without installing a deep learning framework. The inference engine greatly increases the running speed of the algorithms and achieves real-time video analysis.
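The plug-in directory import can be sketched as follows. This is an illustrative stand-in, not the patent's implementation: each model plug-in is assumed, for the sketch, to be a Python file dropped into a directory and loaded with the standard library's `importlib`.

```python
import importlib.util
import os
import tempfile

def load_model_plugins(plugin_dir):
    """Import every *.py model plug-in found under plugin_dir, so a new
    recognition model can be dropped in without reinstalling a framework.
    The directory layout is an assumption for illustration."""
    models = {}
    for name in sorted(os.listdir(plugin_dir)):
        if not name.endswith(".py"):
            continue
        spec = importlib.util.spec_from_file_location(
            name[:-3], os.path.join(plugin_dir, name))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        models[name[:-3]] = module
    return models

# Demo: create a throwaway plug-in directory holding one fake model plug-in.
plugin_dir = tempfile.mkdtemp()
with open(os.path.join(plugin_dir, "face_det.py"), "w") as f:
    f.write("NAME = 'mtcnn'\n")
mods = load_model_plugins(plugin_dir)
```

Dropping a new `.py` file into the directory and re-running the loader is then enough to register a new model, which is the operational point of the plug-in mode.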
First, an MTCNN network model is imported for face detection. MTCNN is a multi-task convolutional neural network divided into a three-layer network structure of P-Net, R-Net and O-Net.
The basic structure of P-Net is a fully convolutional network (FCN), as shown in fig. 2. It performs initial feature extraction and frame calibration, then adjusts windows through Bounding-Box Regression and filters out most windows through non-maximum suppression (NMS). P-Net is a region-proposal network for face regions: after the input passes through the network's three convolution layers, a face classifier judges whether each region contains a face, and frame regression together with a facial-keypoint locator gives a preliminary proposal of the face region. This stage finally outputs a number of regions where faces may exist, which are fed into R-Net for further processing.
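The NMS filtering that P-Net relies on can be illustrated with a minimal greedy implementation (a generic sketch, not the patent's code; the overlap threshold of 0.5 is an illustrative choice):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, threshold=0.5):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    remaining box and drop every box overlapping it beyond `threshold`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep
```

Two heavily overlapping face-candidate windows thus collapse into the single higher-scoring one, which is exactly why NMS removes most of P-Net's redundant windows.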
As shown in fig. 3, the basic structure of R-Net is a convolutional neural network; compared with the first-layer P-Net, a fully connected layer is added, making the screening of input data stricter. After a picture passes through P-Net, many prediction windows remain; all of them are sent to R-Net, which filters out a large number of poor candidate frames, and finally Bounding-Box Regression and NMS are applied to the selected candidate frames to further optimize the predictions.
Because the output of P-Net is only a possible face region with a certain confidence, this network refines the selection of its input and eliminates most erroneous inputs; frame regression and the facial-keypoint locator are applied again to the face region, and finally a more credible face region is output for O-Net to use. Compared with the 1x1x32 features output by the fully convolutional P-Net, R-Net uses a 128-unit fully connected layer after its last convolution layer, retaining more image features and achieving better accuracy than P-Net.
As shown in FIG. 4, the basic structure of O-Net is a more complex convolutional neural network, with one more convolution layer than R-Net. O-Net differs from R-Net in that its structure identifies the face region through more supervision and also regresses the facial feature points, finally outputting five facial landmarks. The O-Net network takes more input features and ends in a larger 256-unit fully connected layer, retaining more image features while performing face judgment, face-region frame regression and facial-feature localization; it finally outputs the upper-left and lower-right coordinates of the face region together with its five landmarks. With more feature input and a more complex structure, O-Net also performs better, and its output serves as the final output of the network model.
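The overall P-Net → R-Net → O-Net pipeline is a cascade in which each stage is a stricter filter over the previous stage's surviving candidates. A minimal sketch of that control flow (with precomputed toy scores standing in for the real stage classifiers, which the patent does not spell out):

```python
def cascade_filter(candidates, stages):
    """Pass candidate face regions through successive stage classifiers;
    each stage keeps only the candidates scoring at or above its threshold,
    so later (stricter) stages see fewer, better candidates."""
    for score_fn, threshold in stages:
        candidates = [c for c in candidates if score_fn(c) >= threshold]
    return candidates

# Toy candidates with a precomputed "face quality" score q standing in
# for the real P-Net / R-Net / O-Net classifier outputs.
candidates = [{"id": 1, "q": 0.95}, {"id": 2, "q": 0.60}, {"id": 3, "q": 0.20}]
stages = [(lambda c: c["q"], 0.5),   # P-Net stage: permissive
          (lambda c: c["q"], 0.7),   # R-Net stage: stricter
          (lambda c: c["q"], 0.9)]   # O-Net stage: strictest
survivors = cascade_filter(candidates, stages)
```

Only high-confidence candidates survive all three thresholds, mirroring how the cascade cheaply discards most non-face windows early.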
Second, a FaceNet network model is imported for face recognition. FaceNet maps images into a Euclidean space through convolutional-neural-network learning, where spatial distance directly reflects image similarity: different images of the same person lie at a small distance, while images of different persons lie at a large distance. FaceNet trains the neural network directly with a triplet-based loss function in the spirit of LMNN (large margin nearest neighbor classification), and the network directly outputs a 128-dimensional vector space. The network model structure is shown in fig. 5: it comprises a batch input layer and a deep CNN, followed by L2 normalization, which generates the embedding feature vectors; finally the triplet loss is used for training.
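The triplet loss on L2-normalized embeddings can be written out as a short sketch (generic FaceNet-style formulation; the margin value 0.2 is an illustrative choice, not taken from the patent):

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors pass through unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def squared_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on L2-normalized embeddings: pull same-identity pairs
    together and push different identities at least `margin` further apart."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, squared_dist(a, p) - squared_dist(a, n) + margin)
```

At recognition time, the same `squared_dist` on the 128-dimensional embeddings is what decides whether two face crops belong to the same student.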
Finally, an OpenPose network model is imported for behavior recognition. OpenPose works "bottom-up": all human-body keypoints in a picture are detected first, and the keypoints are then assigned to the different human bodies. The network model is shown in fig. 6: F is the set of feature maps produced by the first 10 layers of VGG-19 and is fed as input to the first stage. The overall scheme is a "two-branch multi-stage CNN", where one branch predicts the confidence maps (S) and the other predicts the Part Affinity Fields (L). Stage 1 of the network receives the features F and passes them through the Branch 1 and Branch 2 networks to obtain S_1 and L_1 respectively. From stage 2 onward, the input of the stage-t network consists of three parts: S_{t-1}, L_{t-1} and F.
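The stage-wise data flow just described (F alone at stage 1; S_{t-1}, L_{t-1} and F afterwards) can be sketched with placeholder branch functions standing in for the real Branch 1 and Branch 2 sub-networks:

```python
def run_stages(F, branch1, branch2, num_stages):
    """Two-branch multi-stage refinement: stage 1 sees only the VGG feature
    maps F; every later stage t also receives the previous stage's
    confidence maps S and part affinity fields L."""
    S = branch1(F, None, None)   # S_1
    L = branch2(F, None, None)   # L_1
    for _ in range(1, num_stages):
        # Both branches read the same (F, S_{t-1}, L_{t-1}) triple.
        S, L = branch1(F, S, L), branch2(F, S, L)
    return S, L
```

With real networks, each extra stage lets the confidence maps and affinity fields correct each other, which is why OpenPose repeats the two branches rather than predicting once.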
Through the three neural network models, the identity of the student and the behavior action can be recognized. Based on the identity and the behavior action, the video can be labeled, and a basis is provided for the screening and retrieval of the video.
And S103, packaging the classroom teaching audio and video and the recognition result and storing the packaged classroom teaching audio and video and recognition result in a file library of a teaching audio and video application platform.
It should be noted that the invention first pre-processes the collected audio and video data, including audio and video encoding and decoding, video scaling and synthesis, audio and video storage and encapsulation, and audio-video synchronization, and then forms structured data stored in the file library of the teaching audio and video application platform. The classroom teaching audio and video are written to disk as an MP4 video file, and the label information of the recognition results is written into a MySQL database.
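The label-storage side of this step can be sketched as follows. The patent names MySQL as the target; `sqlite3` stands in here so the sketch is self-contained, and the table schema and field names are illustrative assumptions, not the patent's.

```python
import sqlite3

def store_labels(db, video_file, labels):
    """Write recognition-result labels for one archived lecture video.
    Illustrative schema: each row links a video file to one recognized
    student, a behavior label, and the timestamp (in seconds)."""
    db.execute("""CREATE TABLE IF NOT EXISTS video_label (
                      video_file TEXT, student_id TEXT,
                      behavior TEXT, ts_sec REAL)""")
    db.executemany(
        "INSERT INTO video_label VALUES (?, ?, ?, ?)",
        [(video_file, lab["student_id"], lab["behavior"], lab["ts"])
         for lab in labels])
    db.commit()

db = sqlite3.connect(":memory:")
store_labels(db, "lesson_0101.mp4",
             [{"student_id": "S001", "behavior": "dozing", "ts": 600.0}])
```

Keeping the MP4 on disk and only the structured labels in the database is the usual split: the database rows stay small and queryable while the bulky media lives in the file library.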
And S104, acquiring the identity information of the user logging in the teaching audio and video application platform.
It should be noted that each user has an account in the teaching audio/video application platform, and each account is associated with identity information of one user.
S105, screening the identity recognition result consistent with the user identity information from the file library, and pushing the classroom teaching audio and video corresponding to the identity recognition result to the user.
It should be noted that, by screening according to identity information and pushing the teaching audio and video to the relevant users, the invention achieves accurate, on-demand pushing and improves the utilization value of the teaching audio and video.
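The identity-based screening of S105 reduces to a filter over the file library's records. A minimal sketch with illustrative field names (the patent does not specify a record layout):

```python
def screen_and_push(library, user_id):
    """Select every archived lecture whose identity recognition results
    include the logged-in user, i.e. the lessons this student actually
    appears in. Record fields are illustrative."""
    return [record["video_file"] for record in library
            if user_id in record["recognized_ids"]]

# Toy file library: each record pairs a stored lecture video with the set
# of student identities recognized in its classroom behavior video.
library = [
    {"video_file": "lesson_mon.mp4", "recognized_ids": {"S001", "S002"}},
    {"video_file": "lesson_tue.mp4", "recognized_ids": {"S003"}},
]
```

A student logging in as `S001` would thus be pushed only the Monday lesson, never the unrelated Tuesday one.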
By collecting the classroom teaching audio and video and the students' classroom behavior videos at the same time, recognizing the students' identity recognition results and behavior recognition results from the classroom behavior videos, screening out the identity recognition results consistent with the identity information of the user logging in to the teaching audio and video application platform, and pushing the corresponding classroom teaching audio and video to that user, the invention solves the problems that videos on a teaching video application platform can only be pushed along coarse dimensions, such as a video's additional information and playing information, so that the pushing is not precise enough and cannot meet users' personalized requirements.
Preferably, the teaching audio and video pushing method based on student classroom behavior analysis further includes: acquiring timestamp information of the behavior recognition result and generating corresponding description information; and packaging the timestamp information, the description information and the classroom teaching audio and video and storing them in a file library of the teaching audio and video application platform.
It should be noted that adding the timestamp information of the behavior recognition result and generating the corresponding description information facilitates later screening and recommendation. According to the method, the student information is queried from the user's login account and then retrieved from the file library, and description information is generated from the timestamps of the behavior recognition results (covering different dimensions such as absence, dozing and distraction, for which the relevant videos are found and recommended). The user can log in to the video application platform in either of two ways (through a browser or an APP); after login, the background is automatically triggered to initiate retrieval and recommendation, matching the video courses relevant to the user together with the description information of those courses. When the video description information is clicked, playback automatically jumps to the corresponding time segment. For example, if a student dozes or plays with a mobile phone at the 10th minute of a video, playback automatically starts from the tenth-minute segment when the user views the video, thereby realizing targeted pushing.
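A hedged sketch of how behavior-recognition results might be turned into timestamped description entries that a client can use to jump straight to the flagged segment. The event and description field names, and the English behavior labels, are assumptions for illustration:

```python
# Build human-readable description entries from behavior-recognition
# events. Each entry carries a seek offset so the player can jump to
# the flagged moment. Field names are illustrative assumptions.
def build_descriptions(events):
    labels = {"dozing": "dozed off",
              "playing_phone": "played with a phone",
              "distracted": "appeared distracted"}
    return [{"text": (f"Student {e['student']} "
                      f"{labels.get(e['behavior'], e['behavior'])} "
                      f"at minute {int(e['ts'] // 60)}"),
             "seek_to": e["ts"]} for e in events]

descs = build_descriptions(
    [{"student": "S001", "behavior": "dozing", "ts": 600}])
print(descs[0]["text"])     # Student S001 dozed off at minute 10
print(descs[0]["seek_to"])  # 600
```

A client receiving such an entry would seek the video player to `seek_to` seconds when the description text is clicked, matching the "jump to the tenth-minute segment" behavior described above.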
Preferably, the teaching audio and video pushing method based on student classroom behavior analysis further includes: acquiring the identity information of the students corresponding to the classroom; and packaging the students' identity information together with the classroom teaching audio and video and storing them in a file library of the teaching audio and video application platform.
Further, the teaching audio and video pushing method based on student classroom behavior analysis further includes: collecting the sign-in information of the corresponding students; and packaging the students' identity information and sign-in information together with the classroom teaching audio and video and storing them in a file library of the teaching audio and video application platform.
It should be noted that, to ensure that students who were not in attendance can still receive the corresponding pushed videos, the identity information of the students belonging to the class is acquired, and the students' identity information and the classroom teaching audio and video are packaged and stored in the file library of the teaching audio and video application platform. The students' sign-in information (signed in, not signed in, or left after signing in) is added to the corresponding classroom teaching audio and video, and for a student who left after signing in, the leaving timestamp is also added. Therefore, whether or not a student attended the class, the corresponding pushed videos can be received in a targeted manner when the student logs in to the teaching audio and video application platform, and playback automatically jumps to the corresponding time segment once a video is opened, achieving the aim of accurate pushing.
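An illustrative sketch of attaching sign-in metadata (signed in, absent, or left early with a leaving timestamp) to a classroom recording before storage. The record layout and parameter names are assumptions for demonstration:

```python
# Package a classroom recording together with a student's sign-in
# state. A student who signed in but left early additionally gets a
# leaving timestamp, per the description above. Layout is assumed.
def package_with_attendance(video_path, student_id, signed_in, left_at=None):
    record = {"video": video_path,
              "student": student_id,
              "signed_in": signed_in}
    if signed_in and left_at is not None:
        record["left_at"] = left_at  # seconds offset of early departure
    return record

r1 = package_with_attendance("lesson_01.mp4", "S003", signed_in=False)
r2 = package_with_attendance("lesson_01.mp4", "S004",
                             signed_in=True, left_at=1800)
print(r1["signed_in"], r2.get("left_at"))  # False 1800
```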
As shown in fig. 2, the invention further provides a teaching audio and video pushing system based on student classroom behavior analysis, which comprises a first acquisition module 1, a second acquisition module 2, a recognition model construction module 3, a packaging storage module 4, a first obtaining module 5, a screening module 6 and a pushing module 7. The first acquisition module 1 is used for acquiring audio and video data of classroom teaching, and the second acquisition module 2 is used for acquiring video data of student behaviors. The recognition model construction module 3 is used for constructing and training a neural network model and inputting the classroom behavior video into the trained neural network model to obtain a recognition result, wherein the neural network model comprises a face detection model, a face recognition model and a behavior posture model, and the recognition result comprises an identity recognition result and a behavior recognition result. The packaging storage module 4 is used for packaging the classroom teaching audio and video with the recognition result and storing them in a file library of a teaching audio and video application platform. The first obtaining module 5 is used for obtaining the identity information of the user logging in to the teaching audio and video application platform. The screening module 6 is configured to screen out, from the file library, the identity recognition result consistent with the user identity information. The pushing module 7 is used for pushing the classroom teaching audio and video corresponding to the identity recognition result to the user.
According to the invention, the classroom teaching audio and video and the students' classroom behavior video are simultaneously acquired by the first acquisition module and the second acquisition module; the identity recognition result and the behavior recognition result of each student are recognized from the classroom behavior video; the identity recognition result consistent with the identity information of the user logging in to the teaching audio and video application platform is screened out; and the classroom teaching audio and video corresponding to that identity recognition result is pushed to the user. This solves the problem that videos on a teaching video application platform can only be pushed by coarse dimensions such as a video's additional information and playing information, which is not accurate and cannot meet users' personalized requirements.
Preferably, the teaching audio and video pushing system based on student classroom behavior analysis further comprises a second obtaining module 8. The second obtaining module 8 is used for obtaining timestamp information of the behavior recognition result and generating corresponding description information, and the packaging storage module 4 is further used for packaging the timestamp information, the description information and the classroom teaching audio and video and storing them in a file library of the teaching audio and video application platform.
It should be noted that adding the timestamp information of the behavior recognition result and generating the corresponding description information facilitates later screening and recommendation. According to the invention, the student information is queried from the user's login account and then retrieved from the file library, and description information is generated from the timestamps of the behavior recognition results (covering different dimensions such as absence, dozing and distraction, for which the relevant videos are found and recommended). The user can log in to the video application platform in either of two ways (through a browser or an APP); after login, the background is automatically triggered to initiate retrieval and recommendation, matching the video courses relevant to the user together with the description information of those courses. When the video description information is clicked, playback automatically jumps to the corresponding time segment. For example, if a student dozes or plays with a mobile phone at the 10th minute of a video, playback automatically starts from the tenth-minute segment when the user views the video, thereby realizing targeted pushing.
Preferably, the teaching audio and video pushing system based on student classroom behavior analysis further comprises a third obtaining module 9. The third obtaining module 9 is used for obtaining the identity information of the students corresponding to the classroom, and the packaging storage module 4 is further used for packaging the students' identity information together with the classroom teaching audio and video and storing them in a file library of the teaching audio and video application platform.
Further, the teaching audio and video pushing system based on student classroom behavior analysis further comprises a third collection module 10. The third collection module 10 is used for collecting the sign-in information of the corresponding students, and the packaging storage module 4 is further used for packaging the students' identity information and sign-in information together with the classroom teaching audio and video and storing them in a file library of the teaching audio and video application platform.
It should be noted that, to ensure that students who were not in attendance can still receive the corresponding pushed videos, the identity information of the students belonging to the class is acquired, and the students' identity information and the classroom teaching audio and video are packaged and stored in the file library of the teaching audio and video application platform. The students' sign-in information (signed in, not signed in, or left after signing in) is added to the corresponding classroom teaching audio and video, and for a student who left after signing in, the leaving timestamp is also added. Therefore, whether or not a student attended the class, the corresponding pushed videos can be received in a targeted manner when the student logs in to the teaching audio and video application platform, and playback automatically jumps to the corresponding time segment once a video is opened, achieving the aim of accurate pushing.
Preferably, the teaching audio and video pushing system based on student classroom behavior analysis further comprises a recognition model importing module 11, which is used for importing the trained neural network model in a plug-in directory manner.
It should be noted that a comparison image library of various classroom behaviors is preset in the neural network model, covering, for example, classroom behaviors such as lying on a desk dozing or lowering the head to play with a mobile phone.
In the invention, the first acquisition module and the second acquisition module are, for example, a recording and broadcasting host, a camera and a sound pickup, but are not limited thereto. After the audio data and the video data are collected, the data are distributed. On receiving the data, the packaging storage module 4 encodes and compresses the audio data, and performs processing such as scaling, synthesis and encoding compression on the video data. A neural network model is then constructed and trained by the recognition model construction module 3, and the classroom behavior video is input into the trained neural network model to obtain a recognition result, wherein the neural network model comprises a face detection model, a face recognition model and a behavior posture model, and the recognition result comprises an identity recognition result and a behavior recognition result. The recognition result and the corresponding classroom teaching audio and video are then packaged and stored in the file library. When the user logs in to the video application platform through a browser or an APP, the recommendation system is triggered to run: it confirms the user's identity information from the account, initiates a screening request, and searches the file library for the user's video content. After screening is finished, personalized recommended content is generated, and the personalized recommended courses and their description information are presented on the user's client. Clicking the description information automatically jumps playback to the corresponding time segment.
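The flow described above (collect, recognize, package, then screen and push on login) can be condensed into a minimal sketch. The stub recognizer and record layout below are assumptions standing in for the trained neural network models and the platform's real file library:

```python
# Minimal end-to-end sketch: collect -> recognize -> package ->
# (on login) screen -> push. The recognizer is a stub in place of
# the face-detection, face-recognition, and behavior-posture models.
def recognize(frames):
    # Stub: a real system would run the trained models over the frames.
    return [{"identity": "S001", "behavior": "dozing", "ts": 600}]

def run_pipeline(frames, video_path, library):
    # Package each recognition result with its video and store it.
    for result in recognize(frames):
        library.append({"video": video_path, **result})

def on_login(library, user_identity):
    # Triggered when a user logs in via browser or APP: screen the
    # library for entries matching the user's identity.
    return [e for e in library if e["identity"] == user_identity]

library = []
run_pipeline(frames=[], video_path="lesson_01.mp4", library=library)
recs = on_login(library, "S001")
print(recs[0]["video"], recs[0]["ts"])  # lesson_01.mp4 600
```

The `ts` field carried through the pipeline is what allows the client to jump playback to the flagged segment once a recommended video is opened.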
In conclusion, the invention converts students' classroom behavior into structured data and then pushes data in a targeted manner according to that data, so the pushed data better matches users' real requirements. Moreover, the data conversion and pushing are completed automatically by the system, without manual compilation, and with high efficiency.
While the present disclosure has been described in considerable detail and with particular reference to a few illustrative embodiments, it is not intended to be limited to any such details or embodiments or to any particular embodiment; it is to be construed, with reference to the appended claims and in view of the prior art, as effectively covering the intended scope of the disclosure. Furthermore, the foregoing describes the disclosure in terms of embodiments foreseen by the inventor for which an enabling description was available, notwithstanding that insubstantial modifications of the disclosure not presently foreseen may nonetheless represent equivalents thereto.
Claims (10)
1. A teaching audio and video pushing method based on student classroom behavior analysis is characterized by comprising the following steps:
collecting classroom behavior videos of the students while collecting the classroom teaching audio and video;
inputting the classroom behavior video into a trained neural network model to obtain a recognition result, wherein the neural network model comprises a face detection model, a face recognition model and a behavior posture model, and the recognition result comprises an identity recognition result and a behavior recognition result;
packaging the classroom teaching audio and video and the recognition result and storing the packaged classroom teaching audio and video and the recognition result in a file library of a teaching audio and video application platform;
acquiring user identity information for logging in the teaching audio and video application platform;
and screening the identity recognition result consistent with the user identity information from the file library, and pushing the classroom teaching audio and video corresponding to the identity recognition result to the user.
2. The teaching audio and video pushing method based on student classroom behavior analysis as recited in claim 1, further comprising:
acquiring timestamp information of the behavior recognition result and generating corresponding description information;
and packaging the timestamp information, the description information and the classroom teaching audio and video and storing them in a file library of a teaching audio and video application platform.
3. The teaching audio and video pushing method based on student classroom behavior analysis as recited in claim 1, further comprising:
acquiring identity information of the corresponding students in the classroom;
and packaging the identity information of the students together with the classroom teaching audio and video and storing them in a file library of a teaching audio and video application platform.
4. The teaching audio and video pushing method based on student classroom behavior analysis as recited in claim 3, further comprising:
collecting the sign-in information of the corresponding students;
and packaging the identity information and sign-in information of the students together with the classroom teaching audio and video and storing them in a file library of a teaching audio and video application platform.
5. The teaching audio and video pushing method based on student classroom behavior analysis as claimed in claim 1, wherein the trained neural network model is imported in a plug-in directory manner.
6. A teaching audio and video pushing system based on student classroom behavior analysis, characterized by comprising a first acquisition module, a second acquisition module, a recognition model construction module, a packaging storage module, a first obtaining module, a screening module and a pushing module;
the first acquisition module is used for acquiring audio and video data of classroom teaching, and the second acquisition module is used for acquiring video data of student behaviors;
the recognition model construction module is used for constructing and training a neural network model, and inputting the classroom behavior video into the trained neural network model to obtain a recognition result, wherein the neural network model comprises a face detection model, a face recognition model and a behavior posture model, and the recognition result comprises an identity recognition result and a behavior recognition result;
the packaging storage module is used for packaging the classroom teaching audio and video and the recognition result and storing the packaged classroom teaching audio and video and the recognition result in a file library of a teaching audio and video application platform;
the first obtaining module is used for obtaining the identity information of a user logging in the teaching audio and video application platform;
the screening module is used for screening the identity recognition result consistent with the user identity information in the document library;
the pushing module is used for pushing the classroom teaching audio and video corresponding to the identity recognition result to a user.
7. The teaching audio and video pushing system based on student classroom behavior analysis as claimed in claim 6, further comprising a second obtaining module, wherein the second obtaining module is configured to obtain timestamp information of the behavior recognition result and generate corresponding description information, and the encapsulation storage module is further configured to encapsulate the timestamp information, the description information and the classroom teaching audio and video and store the encapsulated timestamp information and description information and the classroom teaching audio and video in a file library of a teaching audio and video application platform.
8. The teaching audio and video pushing system based on student classroom behavior analysis as claimed in claim 6, further comprising a third obtaining module, wherein the third obtaining module is used for obtaining the identity information of the students corresponding to the classroom, and the encapsulation storage module is further used for encapsulating the identity information of the corresponding students and the classroom teaching audio and video and storing them in a file library of a teaching audio and video application platform.
9. The teaching audio and video pushing system based on student classroom behavior analysis as claimed in claim 6, further comprising a third collection module, wherein the third collection module is configured to collect the sign-in information of the students, and the encapsulation storage module is further configured to encapsulate the identity information and sign-in information of the students together with the classroom teaching audio and video and store them in a file library of a teaching audio and video application platform.
10. The teaching audio and video pushing system based on student classroom behavior analysis as claimed in claim 6, further comprising a recognition model import module, wherein the recognition model import module is used for importing the trained neural network model by means of a plug-in directory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110168340.5A CN112905811A (en) | 2021-02-07 | 2021-02-07 | Teaching audio and video pushing method and system based on student classroom behavior analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112905811A true CN112905811A (en) | 2021-06-04 |
Family
ID=76123639
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110168340.5A Pending CN112905811A (en) | 2021-02-07 | 2021-02-07 | Teaching audio and video pushing method and system based on student classroom behavior analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112905811A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110443226A (en) * | 2019-08-16 | 2019-11-12 | 重庆大学 | A kind of student's method for evaluating state and system based on gesture recognition |
CN110992741A (en) * | 2019-11-15 | 2020-04-10 | 深圳算子科技有限公司 | Learning auxiliary method and system based on classroom emotion and behavior analysis |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113626642A (en) * | 2021-08-11 | 2021-11-09 | 赞同科技股份有限公司 | Assembling method and system of video script semantic structure and electronic device |
CN113626642B (en) * | 2021-08-11 | 2023-08-25 | 赞同科技股份有限公司 | Method, system and electronic device for assembling video script semantic structure |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||