CN111327939A - Distributed teaching video processing system - Google Patents
- Publication number
- CN111327939A
- Application number
- CN202010114831.7A
- Authority
- CN
- China
- Prior art keywords
- video
- module
- human eye
- eye feature
- human
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4312—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/165—Detection; Localisation; Normalisation using facial parts and geometric relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
- G06V40/19—Sensors therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
- G06V40/193—Preprocessing; Feature extraction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/2181—Source of audio or video content, e.g. local disk arrays comprising remotely distributed storage units, e.g. when movies are replicated over a plurality of video servers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440236—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
Abstract
The invention discloses a distributed teaching video processing system comprising a teaching video file transmission module, a teaching video GPU acceleration processing module, and a teaching video automatic human-eye mosaic module; each module can be implemented as a software module in C++, yielding a teaching video processing system that runs on a server. The teaching video file transmission module downloads the original video file from the video storage server and uploads the processed video file. The teaching video GPU acceleration processing module applies GPU acceleration to video clipping, video transcoding, and video resolution adjustment. The automatic human-eye mosaic module automatically applies a mosaic to the human eyes in a video. The invention moves video processing from the local machine to the server, speeding up video processing, providing an automatic human-eye mosaic function that common video processing software lacks, and simplifying the video processing workflow.
Description
Technical Field
The invention relates to the technical field of video processing, in particular to a distributed teaching video processing system.
Background
At present, in a learning management system, a video of a teacher's lectures is generally recorded and then made into an online course for students. However, since the size and resolution of the originally captured teaching video are usually very large, some processing is typically performed to save bandwidth in the learning management system, such as cropping the original video, reducing its resolution to shrink the file size, or transcoding it. In addition, some teaching videos, such as those of traditional Chinese medicine diagnosis, require higher-order processing, for example applying a mosaic over patients' eyes to protect their privacy.
Faced with these video processing tasks, users typically turn to third-party video processing software such as Adobe Premiere. However, these tasks are tedious and repetitive for teachers, and such software is usually proprietary, offers mostly basic functions, lacks the advanced ones needed here, and is unfriendly to ordinary users.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a distributed teaching video processing system that fully exploits the parallel processing capability of the GPU. It addresses the problems that local processing of teaching videos is slow and cumbersome and that users must rely on third-party video processing software, and it supplies video processing functions that such software lacks. The system acts as a teaching video processing engine: on one hand it GPU-accelerates conventional teaching video processing such as transcoding, resolution adjustment, and clipping; on the other hand it provides higher-order functions such as automatically applying a mosaic to human eyes in teaching videos.
In order to achieve this purpose, the technical scheme provided by the invention is as follows: a distributed teaching video processing system comprising:
the teaching video file transmission module, used for downloading the original video file from the video storage server and uploading the video file processed by the video processing system;
the teaching video GPU acceleration processing module, used for adjusting a video's resolution according to the video name and target resolution input by the user, transcoding a video according to the video name and target format input by the user, and clipping a video according to the video name, start time, and end time input by the user;
the teaching video automatic human-eye mosaic module, used for decomposing a teaching video into image frames, detecting the faces in each frame, detecting the eye feature points within each face, applying a mosaic to the eye feature-point regions, and assembling the mosaicked image frames together with the audio into a video.
Further, the teaching video file transmission module comprises a user login module, a Cookie processing module, a video file downloading module and a video file uploading module, wherein:
the user login module logs in the video storage server according to a user name and a password input by a user;
the Cookie processing module is responsible for storing Cookies returned by the video storage server to the local and loading the Cookies from the local;
the video file downloading module is responsible for downloading video files from the video storage server;
and the video file uploading module is responsible for uploading the processed video to the video storage server.
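The Cookie processing module's save-and-reload behaviour can be sketched in Python (the embodiment is implemented in C++ and Python) with the standard-library `http.cookiejar`; the file name `session_cookies.txt` and the helper names are illustrative assumptions, not the patent's actual implementation:

```python
import http.cookiejar

COOKIE_FILE = "session_cookies.txt"  # hypothetical local cookie store

def save_cookies(jar: http.cookiejar.MozillaCookieJar) -> None:
    # Persist the cookies returned by the video storage server to disk.
    jar.save(COOKIE_FILE, ignore_discard=True)

def load_cookies() -> http.cookiejar.MozillaCookieJar:
    # Reload cookies from disk for later download/upload requests.
    jar = http.cookiejar.MozillaCookieJar()
    jar.load(COOKIE_FILE, ignore_discard=True)
    return jar
```

Subsequent download and upload requests would attach the reloaded jar to the HTTP client so the session with the storage server survives restarts.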
Further, the teaching video GPU acceleration processing module comprises a video resolution adjustment module, a video transcoding module, and a video clipping module, wherein:
the video resolution adjustment module adjusts the video resolution: it parses the request parameters of the HTTP POST request to obtain the target resolution, then uses the video processing tool FFmpeg to adjust the resolution of the specified video file; during the adjustment it fully exploits the parallel computing capability of the GPU, accelerating video encoding and decoding with NVIDIA's NVENC and NVDEC to achieve GPU-accelerated video processing and improve efficiency;
the video transcoding module converts the video format: it parses the request parameters of the HTTP POST request to obtain the target video format, then uses the video processing tool FFmpeg to convert the format of the specified video file; during the conversion it likewise exploits the parallel computing capability of the GPU, accelerating video encoding and decoding with NVIDIA's NVENC and NVDEC;
the video clipping module clips videos: it parses the request parameters of the HTTP POST request to obtain the start time and end time of the segment to be cut, then uses the video processing tool FFmpeg to clip the specified video file.
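The FFmpeg invocations described above can be sketched as command builders in Python. The flags (`-hwaccel cuda`, `-hwaccel_output_format cuda`, `scale_cuda`, `h264_nvenc`) are FFmpeg's real CUDA/NVENC options, but the function names and the choice of H.264 output are illustrative assumptions:

```python
def nvenc_transcode_cmd(src, dst, resolution=None):
    # Decode with NVDEC, optionally scale on the GPU, encode with NVENC.
    cmd = ["ffmpeg", "-hwaccel", "cuda", "-hwaccel_output_format", "cuda",
           "-i", src]
    if resolution:  # e.g. "1280x720"
        w, h = resolution.split("x")
        cmd += ["-vf", f"scale_cuda={w}:{h}"]  # scale without a host copy
    return cmd + ["-c:v", "h264_nvenc", "-c:a", "copy", dst]

def clip_cmd(src, dst, start, end):
    # Cut [start, end] (timestamps like "00:01:00") without re-encoding.
    return ["ffmpeg", "-ss", start, "-to", end, "-i", src, "-c", "copy", dst]
```

The returned argv lists would be passed to `subprocess.run` on the processing server; transcoding to another container only needs a different `dst` extension.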
Further, the teaching video automatic human-eye mosaic module comprises an in-image face detection module, an in-face eye feature point detection module, a consecutive-frame eye feature point tracking module, an eye feature-point region mosaic module, and an image frame and audio synthesis module, wherein:
the in-image face detection module automatically detects the coordinates of the face regions in an image: a deep-learning neural network model for face detection is trained in advance and its weights are saved locally; during detection the weights are loaded from local storage, and when an image frame is fed to the model it automatically detects the coordinate region of each face in the image;
the in-face eye feature point detection module automatically detects the coordinates of the eye feature points within a face: a deep-learning neural network model that detects eye feature points from a face is trained in advance and its weights are saved locally; during detection the weights are loaded from local storage, and when the face region of an image frame is fed to the model it automatically detects the eye feature points of that face;
the consecutive-frame eye feature point tracking module tracks eye feature points across two consecutive frames: a deep-learning neural network model for eye feature point tracking is trained in advance and its weights are saved locally; during tracking the weights are loaded from local storage, and when the face region of the next frame is fed to the model the tracker tracks the eye feature points; if the tracked result meets expectations, processing continues with the following frame, otherwise the frame is handed back to the eye feature point detection module for re-detection;
the eye feature-point region mosaic module applies a mosaic to the eye feature-point regions: once the eye feature points have been detected in an image, the corresponding mosaic area is computed from them and the eye mosaic is generated;
the image frame and audio synthesis module combines the mosaicked image frames and the audio into a video file: once the mosaicked image frames have been obtained, they are combined with the audio previously separated from the video, using the video processing tool FFmpeg, into the final mosaicked video.
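The mosaic step above can be sketched as follows, assuming frames are NumPy arrays and eye landmarks are (x, y) pixel coordinates; the block size, the margin, and the helper names are illustrative defaults rather than the patent's parameters:

```python
import numpy as np

def eye_box(landmarks, margin=6):
    # Bounding box (x1, y1, x2, y2) around eye landmark points [(x, y), ...].
    xs = [p[0] for p in landmarks]
    ys = [p[1] for p in landmarks]
    return (max(min(xs) - margin, 0), max(min(ys) - margin, 0),
            max(xs) + margin, max(ys) + margin)

def pixelate_region(frame, box, block=8):
    # Mosaic the region by replacing each block x block tile with its mean colour.
    x1, y1, x2, y2 = box
    out = frame.copy()
    for y in range(y1, y2, block):
        for x in range(x1, x2, block):
            ye, xe = min(y + block, y2), min(x + block, x2)
            out[y:ye, x:xe] = out[y:ye, x:xe].mean(axis=(0, 1))
    return out
```

Per frame, the detected landmarks for each eye would be turned into a box with `eye_box` and passed to `pixelate_region` before the frame is re-encoded.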
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The teaching video processing task is moved from the local machine to the server. The server fully exploits the parallel processing capability of the GPU and, combined with the video processing tool FFmpeg, accelerates video transcoding, resolution adjustment, and clipping, greatly improving processing speed, simplifying the workflow, and freeing users from having to learn third-party video processing software.
2. Built on deep learning, neural network technology, and GPU computing power, the system deploys a face detection model, an eye feature point detection model, and an eye feature point tracking model to provide a higher-order automatic human-eye mosaic function that common video processing software lacks.
3. The video processing system is distributed: the teaching video storage server is separated from the teaching video processing servers, the processing side consists of multiple video processing servers, and the Web server Nginx distributes processing requests among them to achieve load balancing.
Drawings
FIG. 1 is a schematic diagram of the overall architecture of the system of the present invention.
Fig. 2 is a flowchart of the entire video processing process.
Detailed Description
The present invention will be further described with reference to the following specific examples.
The distributed teaching video processing system provided in this embodiment is developed with Visual Studio Code in the C++ and Python languages and runs on a server. As shown in Fig. 1, the Web server Nginx forwards requests and performs load balancing, passing each video processing request to a FastCGI server for processing. The teaching video processing system comprises:
the teaching video file transmission module is used for downloading an original video file from the video storage server and uploading the video file processed by the video processing system;
the teaching video GPU acceleration processing module, used for adjusting a video's resolution according to the video name and target resolution input by the user, transcoding a video according to the video name and target format input by the user, and clipping a video according to the video name and the start and end times input by the user;
the teaching video automatic human-eye mosaic module, used for decomposing a teaching video into image frames, detecting the faces in each frame, detecting the eye feature points within each face, applying a mosaic to the eye feature-point regions, and assembling the mosaicked image frames into a video.
The teaching video file transmission module comprises a user login module, a Cookie processing module, a video file downloading module and a video file uploading module, wherein:
the user login module logs in the video storage server according to a user name and a password input by a user;
the Cookie processing module is responsible for storing Cookies returned by the video storage server to the local and loading the Cookies from the local;
the video file downloading module is responsible for downloading video files from the video storage server;
and the video file uploading module is responsible for uploading the processed video to the video storage server.
The teaching video GPU acceleration processing module comprises a video resolution adjusting module, a video transcoding module and a video cutting module, wherein:
the video resolution adjustment module adjusts the video resolution: it parses the request parameters of the HTTP POST request to obtain the target resolution, then uses the FFmpeg tool to adjust the resolution of the specified video file; during the adjustment it fully exploits the parallel computing capability of the GPU, accelerating video encoding and decoding with NVIDIA's NVENC and NVDEC to achieve GPU-accelerated video processing and improve efficiency;
the video transcoding module converts the video format: it parses the request parameters of the HTTP POST request to obtain the target video format, then uses the FFmpeg tool to convert the format of the specified video file, likewise exploiting the GPU through NVENC and NVDEC;
the video clipping module clips videos: it parses the request parameters of the HTTP POST request to obtain the start time and end time of the segment to be cut, then uses the FFmpeg tool to clip the specified video file;
the automatic human eye mosaic printing module for teaching video comprises a human face detection module in an image, a human eye feature point detection module in a human face, a human eye feature point region mosaic printing module and an image frame and audio synthesis video module, wherein:
the in-image face detection module automatically detects the coordinates of the face regions in an image: a deep-learning neural network model for face detection is trained in advance. The model improves on the FaceBoxes face detector: it modifies the network structure, increases the depth and width of the network, and filters ambiguous samples out of the training set. After training, the weights of the neural network model are saved locally. During face detection the weights are loaded from local storage; when an image frame is fed to the neural network model, it automatically detects the coordinate region of each face in the image, comprising the coordinates of the top-left corner, the bottom-right corner, and the center of the face region.
The in-face eye feature point detection module automatically detects the coordinates of the eye feature points within a face: a deep-learning neural network model that detects eye feature points from a face is trained in advance; it adapts the O-Net stage of the MTCNN model and is trained on an adjusted dataset. After training, the weights of the neural network are saved locally. During eye feature point detection the weights are loaded from local storage; when the face region of an image frame is fed to the model, it automatically detects the eye feature points of that face;
the eye feature point tracking module tracks eye feature points across two consecutive frames: a deep-learning neural network model for eye feature point tracking is trained in advance and its weights are saved locally. During tracking the weights are loaded from local storage; when the face region of the next frame is fed to the model, the tracker first tracks the eye feature points; if the tracked result meets expectations, processing continues with the following frame, otherwise the frame is handed back to the eye feature point detection module for re-detection;
the eye feature-point region mosaic module applies a mosaic to the eye feature-point regions: once the eye feature points have been detected in an image, the mosaic area is computed from them and the corresponding mosaic is generated;
the image frame and audio synthesis module combines the mosaicked image frames and the audio into a video file: once the mosaicked image frames have been obtained, they are combined with the audio separated from the video, using the FFmpeg tool, into the final video.
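The track-then-validate-then-redetect control flow of the tracking module described above can be sketched as follows; `tracker` and `detector` stand in for the pre-trained networks, and the confidence threshold is an assumed plausibility check since the patent does not specify how a tracked result "meets expectations":

```python
def eye_landmarks_for_frame(prev_pts, face_region, tracker, detector,
                            threshold=0.8):
    # Try to track landmarks from the previous frame first (cheaper),
    # falling back to full detection when the result is implausible.
    pts, confidence = tracker(prev_pts, face_region)
    if confidence >= threshold:   # tracked result meets expectations
        return pts
    return detector(face_region)  # hand back to the detection module
```

The same loop runs once per frame, feeding each frame's output landmarks back in as `prev_pts` for the next frame.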
As shown in fig. 2, the overall flow of the distributed teaching video processing system in this embodiment is as follows: a video is first uploaded from the browser to the video storage server; once the upload succeeds, the browser sends a video processing request to the video processing system; on receiving the request, the video processing system downloads the video from the video storage server and processes it; when processing is complete, the processed video is uploaded back to the video storage server.
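The final synthesis step in this flow, recombining the mosaicked frames with the previously separated audio, can likewise be expressed as an FFmpeg command builder; the numbered-frame pattern, the frame rate default, and the function name are assumptions for illustration:

```python
def mux_frames_audio_cmd(frames_pattern, audio, dst, fps=25):
    # Reassemble numbered image frames (e.g. "frame_%05d.png") with the
    # audio stream separated from the original video, encoding on the GPU.
    return ["ffmpeg", "-framerate", str(fps), "-i", frames_pattern,
            "-i", audio, "-c:v", "h264_nvenc", "-c:a", "copy",
            "-shortest", dst]
```

`-shortest` stops the output at the shorter of the two inputs, so the muxed video does not run past the audio track (or vice versa).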
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the invention is not limited to them; any change to the form or principle of the invention shall fall within its protection scope.
Claims (4)
1. A distributed teaching video processing system, comprising:
the teaching video file transmission module is used for downloading an original video file from the video storage server and uploading the video file processed by the video processing system;
the teaching video GPU acceleration processing module, used for adjusting a video's resolution according to the video name and target resolution input by the user, transcoding a video according to the video name and target format input by the user, and clipping a video according to the video name, start time, and end time input by the user;
the teaching video automatic human-eye mosaic module, used for decomposing a teaching video into image frames, detecting the faces in each frame, detecting the eye feature points within each face, applying a mosaic to the eye feature-point regions, and assembling the mosaicked image frames together with the audio into a video.
2. The distributed teaching video processing system according to claim 1, wherein the teaching video file transmission module comprises a user login module, a Cookie processing module, a video file downloading module, and a video file uploading module, wherein:
the user login module logs in the video storage server according to a user name and a password input by a user;
the Cookie processing module is responsible for storing Cookies returned by the video storage server to the local and loading the Cookies from the local;
the video file downloading module is responsible for downloading video files from the video storage server;
and the video file uploading module is responsible for uploading the processed video to the video storage server.
3. The distributed teaching video processing system according to claim 1, wherein the teaching video GPU acceleration processing module comprises a video resolution adjustment module, a video transcoding module, and a video clipping module, wherein:
the video resolution adjustment module adjusts the video resolution: it parses the request parameters of the HTTP POST request to obtain the target resolution, then uses the video processing tool FFmpeg to adjust the resolution of the specified video file; during the adjustment it fully exploits the parallel computing capability of the GPU, accelerating video encoding and decoding with NVIDIA's NVENC and NVDEC to achieve GPU-accelerated video processing and improve efficiency;
the video transcoding module converts the video format: it parses the request parameters of the HTTP POST request to obtain the target video format, then uses the video processing tool FFmpeg to convert the format of the specified video file, likewise exploiting the GPU through NVENC and NVDEC;
the video clipping module clips videos: it parses the request parameters of the HTTP POST request to obtain the start time and end time of the segment to be cut, then uses the video processing tool FFmpeg to clip the specified video file.
4. A distributed instructional video processing system according to claim 1, wherein: automatic people's eye of teaching video beats mosaic module includes in the image people's eye detection module, people's eye feature point detection module in the people's face, continuous interframe people's eye feature point tracking module, people's eye feature point region beat mosaic module and image frame and audio synthesis video module, wherein:
the in-image face detection module automatically detects coordinate points of a face area in one image: pre-training a neural network model for detecting a human face based on deep learning, storing a weight value of the neural network to a local area, loading the weight value of the neural network from the local area when detecting the human face, and automatically detecting a coordinate area of the human face in an image by the model when inputting an image frame into the model;
the module for detecting the characteristic points of the human eyes in the human face automatically detects the coordinate points of the characteristic points of the human eyes in the human face: pre-training a neural network model based on deep learning for detecting human eye feature points from a human face, storing the weight value of the neural network to the local, loading the weight value of the neural network from the local during human eye feature point detection, and automatically detecting the human eye feature points of the human face by the model when a human face region in an image frame is input into the model;
the continuous inter-frame human eye feature point tracking module tracks the human eye feature points across two consecutive frames: a deep-learning neural network model for human eye feature point tracking is pre-trained and its weights are stored locally; at tracking time the weights are loaded from local storage, and when the face region of the next frame is fed into the model, the tracker tracks the human eye feature points; if the tracked feature point result meets expectations, processing continues with the next frame; otherwise the frame is handed back to the human eye feature point detection module for re-detection;
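The detect/track hand-off described above can be sketched as a per-frame loop; `detect_eyes`, `track_eyes`, and `is_plausible` are hypothetical stand-ins for the detection network, the tracking network, and the quality check on the tracked result:

```python
def mosaic_pipeline(frames, detect_eyes, track_eyes, is_plausible):
    """Per-frame loop: track eye feature points from the previous frame,
    falling back to full re-detection when tracking looks wrong."""
    points = None
    results = []
    for frame in frames:
        if points is None:
            points = detect_eyes(frame)            # first frame: detect
        else:
            candidate = track_eyes(frame, points)  # cheap tracker
            if is_plausible(candidate):
                points = candidate                 # tracking accepted
            else:
                points = detect_eyes(frame)        # expensive re-detection
        results.append(points)
    return results
```

The design point is that tracking is cheaper than detection, so re-detection only runs on the frames where the tracked result fails the plausibility check.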
the human eye feature point region mosaicking module applies a mosaic to the human eye feature point region: after the human eye feature points have been detected in an image, the corresponding mosaic region is computed from the feature points, and the corresponding human eye mosaic is generated;
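One common way to realize the mosaic itself (the claim does not mandate a particular method) is block pixelation: every block×block tile of the eye region is replaced by a single representative value. A pure-Python sketch over a 2-D region given as a list of rows:

```python
def pixelate(region, block=4):
    """Mosaic a 2D region (list of rows) by filling each block×block
    tile with the value of its top-left pixel."""
    h, w = len(region), len(region[0])
    out = [row[:] for row in region]           # copy, leave input intact
    for y in range(0, h, block):
        for x in range(0, w, block):
            v = region[y][x]                   # representative pixel
            for yy in range(y, min(y + block, h)):
                for xx in range(x, min(x + block, w)):
                    out[yy][xx] = v
    return out
```

In a real pipeline this would run on the eye bounding box computed from the feature points, per color channel, with `block` chosen relative to the eye region size.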
the image frame and audio synthesis module combines the encoded image frames and the audio into a video file: after the encoded image frames are obtained, the video processing tool FFMPEG merges the image frames with the audio previously separated from the video to produce the final encoded video.
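A sketch of this final muxing step, assuming the encoded frames are already in a video-only file and the audio was saved earlier as a separate track (file names are illustrative):

```python
def build_mux_cmd(video_only: str, audio: str, dst: str) -> list:
    """Build an FFmpeg command that merges an encoded video stream
    with previously separated audio, copying both streams as-is."""
    return [
        "ffmpeg", "-y",
        "-i", video_only,       # mosaicked, re-encoded frames
        "-i", audio,            # audio separated before processing
        "-c", "copy",           # no re-encoding of either track
        "-map", "0:v:0",        # video from the first input
        "-map", "1:a:0",        # audio from the second input
        dst,
    ]

cmd = build_mux_cmd("frames.mp4", "audio.aac", "final.mp4")
```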
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010114831.7A CN111327939A (en) | 2020-02-25 | 2020-02-25 | Distributed teaching video processing system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111327939A true CN111327939A (en) | 2020-06-23 |
Family
ID=71171165
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010114831.7A Pending CN111327939A (en) | 2020-02-25 | 2020-02-25 | Distributed teaching video processing system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111327939A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060104366A1 (en) * | 2004-11-16 | 2006-05-18 | Ming-Yen Huang | MPEG-4 streaming system with adaptive error concealment |
CN101420452A (en) * | 2008-12-05 | 2009-04-29 | 深圳市迅雷网络技术有限公司 | Video file publishing method and device |
CN109743579A (en) * | 2018-12-24 | 2019-05-10 | 秒针信息技术有限公司 | A kind of method for processing video frequency and device, storage medium and processor |
CN110418144A (en) * | 2019-08-28 | 2019-11-05 | 成都索贝数码科技股份有限公司 | A method of realizing that one enters to have more transcoding multi code Rate of Chinese character video file based on NVIDIA GPU |
Non-Patent Citations (1)
Title |
---|
WU, JIAXIAN: "Automatic Human Eye Mosaic System Based on Convolutional Neural Networks", China Master's Theses Full-text Database, Information Science and Technology Series * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111738769A (en) * | 2020-06-24 | 2020-10-02 | 湖南快乐阳光互动娱乐传媒有限公司 | Video processing method and device |
CN111738769B (en) * | 2020-06-24 | 2024-02-20 | 湖南快乐阳光互动娱乐传媒有限公司 | Video processing method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11941883B2 (en) | Video classification method, model training method, device, and storage medium | |
WO2021114881A1 (en) | Intelligent commentary generation method, apparatus and device, intelligent commentary playback method, apparatus and device, and computer storage medium | |
CN111242962A (en) | Method, device and equipment for generating remote training video and storage medium | |
CN114419300A (en) | Stylized image generation method and device, electronic equipment and storage medium | |
CN110652726B (en) | Game auxiliary system based on image recognition and audio recognition | |
WO2023138549A1 (en) | Image processing method and apparatus, and electronic device and storage medium | |
CN113452944B (en) | Picture display method of cloud mobile phone | |
US20210117723A1 (en) | Method and system for auto multiple image captioning | |
WO2023125181A1 (en) | Image processing method and apparatus, electronic device, and storage medium | |
CN112383807A (en) | Online education method and device and electronic equipment | |
CN111327939A (en) | Distributed teaching video processing system | |
CN113689440A (en) | Video processing method and device, computer equipment and storage medium | |
CN114463470A (en) | Virtual space browsing method and device, electronic equipment and readable storage medium | |
US20210195134A1 (en) | Method and device for generating dynamic image, mobile platform, and storage medium | |
KR20170127354A (en) | Apparatus and method for providing video conversation using face conversion based on facial motion capture | |
CN111274925A (en) | Method and device for generating recommended video, electronic equipment and computer storage medium | |
CN115526772B (en) | Video processing method, device, equipment and storage medium | |
CN115052197B (en) | Virtual portrait video generation method and device | |
CN116229311A (en) | Video processing method, device and storage medium | |
CN115862172A (en) | Attendance system with expression discernment | |
KR102526263B1 (en) | Method and System for Auto Multiple Image Captioning | |
CN109255996A (en) | A kind of the broadcasting optimization method and system of Online class | |
CN113327306B (en) | Exclusive animation generation method and system based on hand shadow realization | |
US11665215B1 (en) | Content delivery system | |
KR102541599B1 (en) | Digital sheet music management system and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
WD01 | Invention patent application deemed withdrawn after publication ||
Application publication date: 2020-06-23