CN115348461A - Teaching video processing method, device, equipment and storage medium


Info

Publication number
CN115348461A
Authority
CN
China
Prior art keywords
video
frame
video frame
information
teaching
Prior art date
Legal status
Pending
Application number
CN202110528340.1A
Other languages
Chinese (zh)
Inventor
朱帅 (Zhu Shuai)
Current Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shiyuan Artificial Intelligence Innovation Research Institute Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shiyuan Artificial Intelligence Innovation Research Institute Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd, Guangzhou Shiyuan Artificial Intelligence Innovation Research Institute Co Ltd filed Critical Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority to CN202110528340.1A
Publication of CN115348461A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N 21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 17/00 Diagnosis, testing or measuring for television systems or their details
    • H04N 17/004 Diagnosis, testing or measuring for digital television systems
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Abstract

Embodiments of this application disclose a teaching video processing method, apparatus, device and storage medium, relating to the field of online education. The method comprises: acquiring video data of a teaching video, the video data comprising a plurality of video frames; uniformly extracting a plurality of first video frames in sequence from the video data according to a frame-extraction count, and dividing the video data into a plurality of video groups according to the first video frames, each video group corresponding to one first video frame; inputting the first video frames into an information-frame detection model to obtain second video frames output by the model, each second video frame containing an information frame; performing Gaussian blur processing on the pictures corresponding to the information frame in each second video frame and in its video group to obtain third video frames; and replacing the corresponding video frames in the video data with the third video frames to obtain new video data. These technical means solve the prior-art problems of slow, unreliable review and the loss of high-quality teaching resources.

Description

Teaching video processing method, device, equipment and storage medium
Technical Field
Embodiments of this application relate to the field of online education, and in particular to a teaching video processing method, apparatus, device and storage medium.
Background
With the spread of education informatization and advances in network audio/video technology, modern teaching increasingly takes the form of recorded or live-streamed video lessons, which break through the constraints of time and space and widen the reach of instruction. When a teacher records a teaching video, the recording software may capture the teacher's login information along with the lesson, including the teacher's name, mobile phone number, e-mail address and the like. To protect the personal privacy of teachers and users, platforms that publish teaching videos must manually check every video for leaked user information.
The inventor has found that a typical teaching video runs from a few minutes to over an hour, and manual review requires watching the complete video, so review efficiency is low. With the spread of information-based education in recent years, video has become a widely used carrier of teaching content and the number of teaching videos has grown rapidly; because manual review is slow, the backlog of videos awaiting review keeps accumulating. Reviewers inevitably tire during the process, and fatigued review undermines the reliability and validity of review results. Moreover, if a reviewer finds that a teaching video contains private user information, the video is classified as unusable for privacy reasons, so excellent teaching videos go unused and are lost.
Disclosure of Invention
Embodiments of this application provide a teaching video processing method, apparatus, device and storage medium, solving the prior-art problems of slow, low-quality review and the loss of high-quality teaching resources.
In a first aspect, an embodiment of the present application provides a teaching video processing method, including:
acquiring a teaching video, and extracting audio data and video data from the teaching video, the video data comprising a plurality of video frames;
uniformly extracting a plurality of first video frames in sequence from the video data according to a preset frame-extraction count, and dividing the video data into a plurality of video groups according to the first video frames, each video group corresponding to one first video frame;
inputting the first video frames into a pre-trained information-frame detection model to obtain second video frames output by the model, each second video frame containing an information frame detected and correspondingly marked by the model;
performing Gaussian blur processing on the pictures corresponding to the information frame in each second video frame and in its corresponding video group to obtain third video frames;
and replacing the corresponding video frames in the video data with the third video frames to obtain new video data, and synthesizing the new video data and the audio data into a new teaching video.
In a second aspect, an embodiment of the present application provides a teaching video processing apparatus, including:
a video frame acquisition module configured to acquire a teaching video and extract audio data and video data from it, the video data comprising a plurality of video frames;
a video frame extraction module configured to uniformly extract a plurality of first video frames in sequence from the video data according to a preset frame-extraction count, and to divide the video data into a plurality of video groups according to the first video frames, each video group corresponding to one first video frame;
an information detection module configured to input the first video frames into a pre-trained information-frame detection model to obtain second video frames output by the model, each second video frame containing an information frame detected and correspondingly marked by the model;
a blur processing module configured to perform Gaussian blur processing on the pictures corresponding to the information frame in each second video frame and in its corresponding video group to obtain third video frames;
and a video synthesis module configured to replace the corresponding video frames in the video data with the third video frames to obtain new video data, and to synthesize the new video data and the audio data into a new teaching video.
In a third aspect, an embodiment of the present application provides a teaching video processing device, including:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the teaching video processing method of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium on which a computer program is stored, which when executed by a processor, implements the teaching video processing method according to the first aspect.
With the teaching video processing method, apparatus, device and storage medium of the embodiments, frames are uniformly extracted from the teaching video and privacy detection is run only on the first video frames, which greatly shortens the desensitization time and, compared with manual review, greatly improves review efficiency. The information-frame detection model detects private information in the first video frames and marks it with information frames, and a Gaussian smoothing operator blurs the marked pictures, achieving automatic and accurate detection and desensitization of private information. Because the video data contains runs of video frames showing the same courseware content, the first video frames divide the teaching video into video groups whose picture content each first video frame represents; processing a video group in the same way as its first video frame desensitizes the entire teaching video. The desensitized video data and the stored audio data are synthesized into a new teaching video free of private information, which meets the requirements for online teaching videos and can be uploaded to the online course repository. Compared with manual review, under which videos containing private information are discarded as unusable, this approach retains more high-quality teaching video resources.
Drawings
Fig. 1 is a flowchart of a teaching video processing method according to an embodiment of the present application;
FIG. 2 is a diagram of a second video frame provided in an embodiment of the present application;
FIG. 3 is a schematic coordinate diagram of a pixel map provided by an embodiment of the present application;
FIG. 4 is a Gaussian weight matrix representation of a pixel map provided by an embodiment of the present application;
FIG. 5 is a pixel diagram of a pixel map provided by an embodiment of the present application;
FIG. 6 is a schematic drawing of a frame of video data according to an embodiment of the present application;
FIG. 7 is a diagram of a first video frame provided by an embodiment of the present application;
FIG. 8 is a diagram of a second video frame provided by an embodiment of the present application;
FIG. 9 is a diagram of a third video frame according to an embodiment of the present application;
FIG. 10 is a flow diagram of another instructional video processing method provided by one embodiment of the present application;
fig. 11 is a schematic structural diagram of an instructional video processing apparatus according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of an instructional video processing device according to an embodiment of the present application.
Detailed Description
The application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to illustrate the application, not to limit it. It should further be noted that, for ease of description, the drawings show only the structures related to the application rather than all structures.
It should be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity, action or object from another, and do not necessarily require or imply any actual relationship or order between them. For example, "first" and "second" in "first video frame" and "second video frame" distinguish video frames produced by different steps.
The teaching video processing method provided in the embodiments of this application may be executed by a teaching video processing device. The device may be implemented in software and/or hardware, and may consist of one physical entity or of two or more physical entities. For example, the teaching video processing device may be a smart device such as a computer.
The teaching video processing device runs at least one operating system, including but not limited to Android, Linux and Windows. In an embodiment, the device is installed with at least one application program capable of executing the teaching video processing method; in this sense, the teaching video processing device may also be understood as the application program itself.
For ease of understanding, this embodiment takes a computer as an exemplary teaching video processing device.
Fig. 1 is a flowchart of a teaching video processing method according to an embodiment of the present application. Referring to fig. 1, the teaching video processing method includes:
s110, obtaining a teaching video, and extracting audio data and video data from the teaching video, wherein the video data comprises a plurality of video frames.
In the conventional review process, every time a teacher uploads a teaching video, a reviewer must open it and watch it in full to check whether it contains the teacher's personal private information; the whole process can take from several minutes to tens of minutes. After watching, the reviewer makes a judgment: if the teaching video contains no private information, it passes review and is added to the online video repository for use. If it does contain private information, the video is rejected. Such handling wastes high-quality content and forces the teacher to re-record the video, which dampens teachers' enthusiasm and hinders the long-term development of the online teaching platform; yet retaining such videos would expose the teacher's private information. The private information contained in teaching videos is thus the root cause of both resource waste and information exposure. If technical means are used to desensitize the private information in uploaded teaching videos so that it is no longer exposed, the resulting videos qualify for publication and can be uploaded to the online video repository, protecting user privacy while retaining high-quality teaching resources.
This embodiment therefore desensitizes the private information in a teaching video so that it is no longer exposed. In general, the private information is personal information such as the teacher's name and telephone number contained in the courseware the teacher uses. A teaching video consists of audio data and video data, and private information appears only in the video data, which is a sequence of video frames arranged in recording order; the private information takes the form of private-information pictures within video frame pictures. To desensitize the teaching video, every video frame containing such pictures must be processed.
To obtain the video data, the audio data and video data of the teaching video are separated; the audio data is stored for later use, and the subsequent privacy desensitization is applied to the video data. Specifically, the audio and video streams are separated with FFmpeg, and the stored audio data is later synthesized with the desensitized video data into a new teaching video. The video data is then read and cut into frames to obtain all of its video frames. The FFmpeg module is an external toolkit that the Python programming language can import; it includes tools for decomposing and for synthesizing audio/video streams. It will be appreciated that the audio data and the desensitized video data are subsequently also synthesized with FFmpeg.
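As a concrete illustration, the stream separation and later re-synthesis can be sketched as the construction of FFmpeg command lines. This is a minimal sketch under assumptions of our own (file names, stream-copy codec choices); the patent only states that FFmpeg is used, not how it is invoked.

```python
# Hypothetical sketch of the FFmpeg invocations behind S110 and the final
# synthesis step. "-vn" drops video, "-an" drops audio, and "copy" avoids
# re-encoding; the output paths are illustrative assumptions.

def demux_commands(src, audio_out, video_out):
    """Return the two ffmpeg invocations that separate audio and video."""
    extract_audio = ["ffmpeg", "-i", src, "-vn", "-acodec", "copy", audio_out]
    extract_video = ["ffmpeg", "-i", src, "-an", "-vcodec", "copy", video_out]
    return extract_audio, extract_video

def remux_command(video_in, audio_in, dst):
    """Return the ffmpeg invocation that recombines the desensitized video
    with the stored audio into a new teaching video."""
    return ["ffmpeg", "-i", video_in, "-i", audio_in, "-c", "copy", dst]
```

The command lists can be executed with `subprocess.run`, or the equivalent calls can be made through an FFmpeg wrapper package for Python.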
And S120, uniformly extracting a plurality of first video frames from the video data in sequence according to the preset number of the extracted frames, and dividing the video data into a plurality of video groups according to the first video frames, wherein each video group corresponds to one first video frame.
A piece of video data usually contains tens of thousands of video frames. If every frame were individually checked for private information, at roughly 120 ms per frame for the information-frame detection model, detecting tens of thousands of frames would incur an enormous time cost and seriously slow down video review. This embodiment is therefore based on the practical teaching scenario: according to statistics from the actual service side, the content of a teacher-recorded teaching video is the teaching courseware, and private information appears only in the courseware. When the teacher explains the courseware, each page is discussed for a while, so many consecutive video frames record the same page of content; as long as the frames extracted from the video data cover every page of the courseware, all private information in the courseware can be detected. Note that the frame-extraction count must be greater than the number of courseware pages; in this embodiment it may be entered by the reviewer according to the page count.
For example, for video data with total frame count F, suppose the reviewer sets the frame-extraction count to N. A video frame is then extracted every F/N frames according to the recording-time information of the video data and taken as a first video frame. The video data can be regarded as divided by the first video frames into a number of video intervals, each containing several video frames; one video interval corresponds to one video group, and each video group corresponds to one first video frame. Each video interval is adjacent to a first video frame on each side, and the first video frame assigned to the group is one of those two. Specifically, the grouping of video data falls into three cases:
in the first case, the first frame of the video data is taken as the first video frame, and then one video frame is extracted every F/N frame as the first video frame until N frames are extracted. At this time, the video data is divided into N video intervals, that is, divided into N video groups, and the first video frame corresponding to each video group is the first video frame adjacent to the corresponding video interval in the forward direction, for example, the first segment of the video data corresponds to the first frame.
In the second case, the last frame of the video data is taken as a first video frame, and one video frame is extracted every F/N frames moving forward until N frames have been extracted. The video data is again divided into N video intervals, i.e., N video groups, and each group corresponds to its backward-adjacent first video frame; for example, the last interval corresponds to the last frame.
In the third case, neither the first nor the last frame is used as a first video frame; the N frames are extracted from the middle. The video data is then divided into N+1 video intervals, i.e., N+1 video groups. The first interval corresponds to its backward-adjacent first video frame, the last interval to its forward-adjacent one, and the intermediate intervals uniformly correspond either to their backward-adjacent or to their forward-adjacent first video frames.
In this embodiment a video group is regarded as the set of video frames that continuously record the same courseware picture, and its first video frame as a frame containing the same courseware picture as every frame in the group; that is, the private-information picture in the first video frame stands for those of all frames in the group. In practice, a few frames in a group may differ from the first video frame, so blurring (or not blurring) them accordingly may run contrary to what is intended. However, once the frame-extraction count is set to at least twice the number of courseware pages, then, on the principle that a larger extraction count means fewer frames per group, fewer such mismatched frames occur; combined with the fact that private information does not appear on every courseware page, mismatched frames arise only within an occasional single video group. The scheme shows no obvious negative effect in practice: only a small fraction of frames may be processed contrary to expectation, a minor cost against the gain in overall speed. If mismatched frames must be avoided entirely, the extracted frames and their video groups can be assigned manually.
It should be noted that "video group" is merely a defined term describing the video frames that receive the same desensitization processing as their corresponding first video frame; this embodiment does not require an explicit grouping step to be performed on the video data.
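The uniform extraction and forward-adjacent grouping of the first case above can be sketched in Python. The function name and the integer-step simplification (F/N rounded down) are illustrative assumptions, not part of the patent.

```python
def sample_and_group(total_frames, n):
    """Pick n evenly spaced first-video-frame indices (first case: frame 0 is
    sampled), and assign every frame to the group of the nearest preceding
    sampled frame, i.e., its forward-adjacent first video frame."""
    step = total_frames // n  # F/N interval, rounded down for simplicity
    sampled = [i * step for i in range(n)]
    groups = {s: [] for s in sampled}
    for f in range(total_frames):
        # The trailing frames beyond the last sampled index join the last group.
        key = sampled[min(f // step, n - 1)]
        groups[key].append(f)
    return sampled, groups
```

For F = 100 and N = 5 this yields first video frames at indices 0, 20, 40, 60 and 80, with each group covering the 20 frames that follow its first video frame.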
S130, inputting the first video frame into a pre-trained information frame detection model to obtain a second video frame output by the information frame detection model, wherein the second video frame comprises information frames which are detected by the information frame detection model and correspondingly marked by the information frame detection model.
The first video frames can be regarded as covering the courseware content of every page. Running privacy detection on them determines whether the courseware contains private information, so that desensitization can then be applied to each first video frame containing private information and to the video group sharing its courseware content.
Illustratively, this embodiment uses RetinaNet, a deep convolutional neural network model for object detection, to detect private information in the first video frames. Specifically, a batch of sample data for training the RetinaNet model is prepared manually in advance: negative samples are video frames containing no private information, and positive samples are video frames containing a private-information picture, manually marked with an information frame. Training RetinaNet on these samples yields an information-frame detection model that can detect private-information pictures in video frames and mark them with information frames.
In one instance, 2000 positive and 2000 negative samples are prepared in advance, the RetinaNet model is trained on these 4000 samples, and a first video frame is input into the trained model to obtain its detection result. If the first video frame contains private information, the result is a second video frame; Fig. 2 is a schematic diagram of a second video frame provided in an embodiment of this application. As shown in Fig. 2, the picture of the second video frame contains a private-information picture, circled by an information frame 11. If the first video frame contains no private information, the result is a fourth video frame, which contains no information frame.
It should be noted that a fourth video frame is identical to its first video frame; the term "fourth video frame" merely distinguishes it from a second video frame so that the two can be processed differently later.
In addition, when a second video frame is output, the information-frame detection model can also output the pixel coordinates of its information frame, so that the pixels inside the frame can later be determined from those coordinates and Gaussian-blurred to desensitize the private information. The picture inside the information frame is the private-information picture detected by the model.
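A minimal sketch of how the detector output might be consumed follows. The fine-tuned RetinaNet model is assumed to return `(x1, y1, x2, y2, score)` tuples per detection; the confidence threshold is an assumption of this sketch, not a value given in the patent.

```python
SCORE_THRESHOLD = 0.5  # assumed confidence cut-off, not specified in the patent

def classify_frame(detections, threshold=SCORE_THRESHOLD):
    """Split frames per the text: a 'second video frame' carries at least one
    information frame; a 'fourth video frame' carries none. Returns the frame
    kind and the pixel coordinates of the retained information frames."""
    boxes = [(x1, y1, x2, y2)
             for (x1, y1, x2, y2, score) in detections
             if score >= threshold]
    kind = "second" if boxes else "fourth"
    return kind, boxes
```

A frame whose detections all fall below the threshold is treated as a fourth video frame and passes through unblurred.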
S140, carrying out Gaussian blur processing on the second video frame and the picture corresponding to the information frame in the corresponding video group to obtain a third video frame.
As described above, a first video frame shows the same courseware picture as the frames of its video group, and a second video frame is a first video frame in which the detection model has found private information and marked an information frame. The video group corresponding to a second video frame therefore also contains the private-information picture, at the same position as in the second video frame. The private-information pictures in the second video frame and in its video group can thus be located from the pixel coordinates of the information frame and Gaussian-blurred to obtain third video frames. Accordingly, the Gaussian blur step specifically comprises S1401 to S1402:
and S1401, correspondingly marking the information frame in all the video frames of the corresponding video group according to the information frame in the second video frame.
Illustratively, the privacy information picture corresponding to each video frame in the video group is determined according to the pixel coordinates of the information frame of the second video frame. Referring to fig. 2, assume the upper left corner of the second video frame is the origin of coordinates (0, 0), the pixel coordinate of the upper left corner of the information frame is (x1, y1), and the pixel coordinate of the lower right corner of the information frame is (x2, y2). Taking the upper left corner of each video frame in the video group corresponding to the second video frame as the origin of coordinates, a corresponding information frame is marked in each video frame of the video group at the same coordinates (x1, y1) and (x2, y2) as the information frame of the second video frame.
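Because every frame in a video group shares the second video frame's origin, step S1401 amounts to copying the information frame's corner coordinates onto each frame of the group. A minimal sketch, with a hypothetical tuple representation of the box and frames:

```python
def mark_group(box, group_frames):
    """Attach the second video frame's information box (x1, y1, x2, y2)
    to every frame of the corresponding video group; all frames share
    the same top-left coordinate origin, so the box carries over as-is."""
    return [(frame, box) for frame in group_frames]

marked = mark_group((10, 20, 110, 60), ["frame_1", "frame_2"])
```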
And S1402, performing Gaussian blur processing on the picture of the information frame.
Illustratively, in this embodiment a Gaussian blur operator is used to blur the pixel points in the information frame. The Gaussian blur operator is a smoothing operator commonly used in image processing; its function is to smooth and blur a picture. Blurring can be understood as each pixel point taking a weighted average of its surrounding pixels, thereby reducing its difference from them. Accordingly, the Gaussian blur calculation step specifically includes S14021 to S14023:
S14021, according to the position information of the information frame, obtaining the pixel points of the corresponding video frame at the relative positions.
Illustratively, the pixel points in the information frame are determined according to the pixel coordinates of the information frame in the corresponding video frame. For example, referring to fig. 2, assume the upper left corner of the video frame is the origin of coordinates (0, 0), the pixel coordinate of the upper left corner of the information frame is (x1, y1), and the pixel coordinate of the lower right corner of the information frame is (x2, y2). The pixel points contained in the frame are then X = {(xi, yi) | x1 ≤ xi ≤ x2 and y1 ≤ yi ≤ y2}.
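The set X described above can be enumerated directly from the two corner coordinates. A small sketch (the function name is illustrative, not from the patent):

```python
def pixels_in_box(x1, y1, x2, y2):
    """Enumerate the pixel coordinates inside an information box whose
    top-left corner is (x1, y1) and bottom-right corner is (x2, y2),
    with the frame's top-left corner as the origin (0, 0)."""
    return [(x, y) for y in range(y1, y2 + 1) for x in range(x1, x2 + 1)]

# A box spanning x in [2, 4] and y in [1, 2] contains 3 * 2 = 6 pixels.
px = pixels_in_box(2, 1, 4, 2)
```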
And S14022, acquiring a preset Gaussian weight matrix, and acquiring a pixel map formed by pixel points and eight peripheral pixel points.
Illustratively, since the images in a computer are all represented by two-dimensional matrices, the present embodiment uses a two-dimensional Gaussian function to calculate the Gaussian weight matrix. The two-dimensional Gaussian function formula is:

G(x, y) = (1 / (2πσ²)) · e^(−(x² + y²) / (2σ²))

where x and y are the pixel coordinates of a pixel point, and σ is a constant; σ = 1.5 is taken in this embodiment.
Further, in this embodiment a pixel map three pixels wide and three pixels high is used to perform the weighted calculation on the pixel point located at the center of the pixel map. Specifically, fig. 3 is a coordinate diagram of a pixel map provided in an embodiment of the present application. As shown in fig. 3, the coordinate of the center pixel point is set to (0, 0), and the coordinates of the other points in the pixel map are (-1, 1), (0, 1), (1, 1), (-1, 0), (1, 0), (-1, -1), (0, -1), and (1, -1). Correspondingly, fig. 4 is a Gaussian weight matrix diagram of a pixel map provided in an embodiment of the present application. As shown in fig. 4, each coordinate in fig. 3 is substituted into the two-dimensional Gaussian function formula to calculate the Gaussian weight matrix of fig. 4.
It should be noted that Gaussian blur processing replaces each pixel's value with a weighted average of itself and its peripheral pixels, thereby reducing its difference from the peripheral pixels. Although the Gaussian weight matrix is obtained from the coordinates of the peripheral pixels, the blurred pixel value obtained by convolving the Gaussian weight matrix with the pixel map depends on the pixel values of those pixels, so the blurring effect is achieved regardless of what coordinate values are assigned in the pixel map. For this reason, in the present embodiment, the coordinates of the pixel map corresponding to each pixel point in each frame are set to the coordinates in fig. 3, and correspondingly, the Gaussian weight matrix of fig. 4 is used directly in the Gaussian blur calculation.
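The weight matrix of fig. 4 can be reproduced by evaluating the two-dimensional Gaussian function with σ = 1.5 at the nine coordinates of fig. 3 and normalizing so the weights sum to 1 (the constant factor 1/(2πσ²) cancels under normalization). A sketch, not the patent's actual implementation:

```python
import math

def gaussian_weight_matrix(sigma=1.5):
    """3x3 Gaussian weight matrix over the coordinates of fig. 3,
    normalized so that the nine weights sum to 1."""
    raw = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            for x in (-1, 0, 1)] for y in (1, 0, -1)]
    total = sum(sum(row) for row in raw)
    return [[w / total for w in row] for row in raw]

m = gaussian_weight_matrix()
# Center weight ~0.1478, edge weights ~0.1183, corner weights ~0.0947.
```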
And S14023, multiplying the weight of the Gaussian weight matrix by the corresponding pixel value in the pixel map, and summing all the products to obtain the pixel value of the pixel point after the Gaussian blur processing.
For example, fig. 5 is a pixel diagram of a pixel map provided in an embodiment of the present application. As shown in fig. 5, assume the pixel value of the central pixel point before the Gaussian blur calculation is a, and the pixel values of the peripheral pixel points are a1, a2, a3, a4, a5, a6, a7, and a8; if a peripheral pixel point of the central pixel point is not in the information frame, its pixel value is taken as 0. The blurred pixel value of the central pixel point after the Gaussian blur calculation is a' = 0.09474a1 + 0.11831a2 + 0.09474a3 + 0.11831a4 + 0.14776a + 0.11831a5 + 0.09474a6 + 0.11831a7 + 0.09474a8.
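Step S14023 then reduces to a single 3×3 weighted sum per pixel. A minimal sketch with the normalized fig. 4 weights hardcoded (σ = 1.5); out-of-box neighbors are passed as 0, as described above:

```python
# Normalized Gaussian weights of fig. 4 (sigma = 1.5), hardcoded here.
W = [[0.09474, 0.11831, 0.09474],
     [0.11831, 0.14776, 0.11831],
     [0.09474, 0.11831, 0.09474]]

def blur_pixel(center, neighbors):
    """Weighted sum of the center pixel value and its eight neighbors.
    `neighbors` lists a1..a8 row by row (top-left to bottom-right,
    skipping the center); neighbors outside the box should be 0."""
    values = neighbors[:4] + [center] + neighbors[4:]
    flat_w = [w for row in W for w in row]
    return sum(w * v for w, v in zip(flat_w, values))

# On a uniform patch the weights sum to ~1, so the value is unchanged.
```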
Further, after the second video frame and each pixel point of the corresponding video group corresponding to the privacy information picture in the information frame are subjected to Gaussian blur calculation, a corresponding third video frame can be obtained.
It should be noted that the fourth video frame does not include an information frame, and the video group corresponding to the fourth video frame does not include a private information picture, so that it is not necessary to perform the gaussian blur processing on the fourth video frame and the corresponding video group.
S150, correspondingly replacing the video frame in the video data with the third video frame to obtain new video data, and synthesizing the new video data and the audio data into a new teaching video.
Illustratively, the third video frame is a video frame after Gaussian blur processing and no longer displays the privacy information picture in the courseware picture. The third video frame therefore correspondingly replaces the video frame in the video data, so that the video frames that originally displayed the privacy information picture are replaced by blurred video frames, avoiding exposure of the user's privacy information. The audio data and the new video data are then synthesized into a new teaching video through FFmpeg; since the privacy information picture in this teaching video has been blurred and no longer exposes the user's privacy information, the teaching video meets the auditing standard, and it is then uploaded to the online video resource library.
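The patent does not specify how FFmpeg is invoked; one plausible way to drive the muxing step is a stream-copy command built in Python. The filenames are hypothetical, and the invocation is only shown, not executed:

```python
import subprocess  # the eventual invocation would use subprocess.run

def build_mux_cmd(video_path, audio_path, out_path):
    """ffmpeg command that remuxes the blurred video with the original
    audio track, copying both streams without re-encoding."""
    return ["ffmpeg", "-y",
            "-i", video_path, "-i", audio_path,
            "-c:v", "copy", "-c:a", "copy",
            out_path]

cmd = build_mux_cmd("new_video.mp4", "audio.aac", "new_lecture.mp4")
# subprocess.run(cmd, check=True)  # run only where ffmpeg is installed
```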
The technical solutions provided by the embodiments are exemplarily described below:
the method comprises the steps of obtaining a teaching video uploaded by a user, separating the teaching video into audio data and video data through FFmpeg, and cutting frames of the video data to obtain all video frames in the video data. Assuming that the teaching courseware in the teaching video has 20 pages, the auditor sets the number of extracted frames to 40 according to the number of courseware pages. Assuming that the video data includes 8000 video frames, one video frame is extracted as a first video frame every 200 frames. Fig. 6 is a schematic diagram of frame extraction from video data according to an embodiment of the present application. Referring to fig. 6, the 1st, 201st, 401st, etc. video frames, at intervals of 200 frames, are extracted as the first video frames 12. The 199 video frames between the 1st and the 201st video frame form the video group 13 of the 1st first video frame 12; correspondingly, the video frames between two adjacent first video frames 12 form the video group 13 of the earlier of the two. Fig. 7 is a schematic diagram of a first video frame according to an embodiment of the present application. Referring to fig. 7, the courseware page 15 of the first video frame 12 contains a privacy information picture 14. The first video frame 12 is input into a pre-trained information frame detection model to obtain a second video frame output by the information frame detection model. Fig. 8 is a schematic diagram of a second video frame according to an embodiment of the present application. Referring to fig. 8, the privacy information picture 14 in the second video frame 17 is marked by an information frame 16.
Furthermore, after all the first video frames are subjected to information detection, a second video frame or a fourth video frame corresponding to the first video frame is obtained. And marking the corresponding information frame in the video group corresponding to the second video frame according to the pixel coordinates of the information frame of the second video frame. And carrying out fuzzy processing on pixel points in the second video frame and the information frame in the corresponding video group to obtain a third video frame. Fig. 9 is a schematic diagram of a third video frame according to an embodiment of the present application. Referring to fig. 9, the information frame pictures in the third video frame 18 become a blurred pixel map 19.
After the third video frame is obtained, it replaces the corresponding video frame in the video data according to the position, in the video data, of the first video frame corresponding to the third video frame. For example, if the first video frame corresponding to the third video frame is the 1st video frame of the video data, the third video frame replaces the 1st video frame of the video data. After all the third video frames have correspondingly replaced the video frames in the video data, new video data is obtained, and the new video data and the audio data are synthesized into a new teaching video through FFmpeg. According to the teaching video processing method, frames are uniformly extracted from the teaching video and privacy information detection is performed only on the first video frames, so that the information desensitization processing time of the teaching video is greatly shortened; compared with manual review, the review efficiency is greatly improved. The privacy information in the first video frame is detected by the information frame detection model and the information frame is marked, and the picture in the information frame is blurred and desensitized by the Gaussian smoothing operator, realizing automatic, accurate detection and desensitization of privacy information. Since the video data comprises multiple groups of video frames with the same courseware content, the teaching video is divided into a plurality of video groups by the first video frames, the picture content of a first video frame represents the picture content of its corresponding video group, and each video group is processed in the same way as its corresponding first video frame, thereby realizing information desensitization of the whole teaching video.
The desensitized video data and the audio data are synthesized into a new teaching video which does not contain privacy information, the teaching video meets the requirement of the online teaching video, the video can be uploaded to an online course resource library, and compared with the method that the teaching video containing the privacy information is divided into unavailable resources during manual examination, more high-quality teaching video resources are reserved in the method.
Fig. 10 is a schematic flowchart of another teaching video processing method according to an embodiment of the present application. The present embodiment is embodied on the basis of the above-described embodiments. Referring to fig. 10, the teaching video processing method provided in this embodiment includes:
s210, obtaining a teaching video, and extracting audio data and video data from the teaching video, wherein the video data comprises a plurality of video frames.
Illustratively, a teaching video uploaded by a user is obtained, audio and video separation processing is performed on the teaching video to obtain audio data and video data, frame cutting processing is performed on the video data to obtain a plurality of video frames, and the number of the video frames of the video data is determined.
S220, obtaining teaching courseware corresponding to the teaching video, determining the number of pages of the teaching courseware, and determining the number of frames according to the number of pages.
Illustratively, when the user uploads the teaching video, the corresponding teaching courseware is uploaded as well. The teaching courseware uploaded by the user is obtained, the number of pages of the teaching courseware is determined, and the number of extracted frames is set to two or more times the number of pages of the teaching courseware.
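Step S220 is simple arithmetic; a sketch with the "two or more times" rule as a parameter (the function name is illustrative):

```python
def frame_extract_count(num_pages, factor=2):
    """Number of frames to extract: `factor` times the number of
    courseware pages, with factor >= 2 per the description."""
    if factor < 2:
        raise ValueError("factor must be at least 2")
    return factor * num_pages

# 20 courseware pages -> 40 extracted frames, as in the example above.
count = frame_extract_count(20)
```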
And S230, calculating the ratio of the number of the video frames to the number of the frames in the video data.
Illustratively, the ratio of the number of video frames to the number of extracted frames is calculated as the extraction interval for uniform frame extraction.
And S240, taking the first frame of video frame of the video data and the video frame of the interval ratio from the first frame of video frame as the first video frame.
Illustratively, extraction starts from the first frame of the video data, and thereafter one frame is extracted at every interval equal to the ratio of the number of video frames to the number of extracted frames; each extracted video frame is taken as a first video frame.
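Steps S230–S240 amount to integer-interval sampling. With the numbers from the example above (8000 frames, 40 extracted frames) the interval is 200 and the 1st, 201st, 401st, … frames are selected (0-based indices 0, 200, 400, …). A sketch:

```python
def first_frame_indices(num_frames, num_extract):
    """0-based indices of the uniformly extracted first video frames:
    the first frame, then every `interval` frames thereafter."""
    interval = num_frames // num_extract  # ratio from step S230
    return list(range(0, num_frames, interval))

idx = first_frame_indices(8000, 40)
# 0-based index 200 corresponds to the 201st frame in 1-based counting.
```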
And S250, inputting the first video frame into a pre-trained information frame detection model to obtain a second video frame or a fourth video frame output by the information frame detection model, wherein the second video frame comprises an information frame which is detected by the information frame detection model and is correspondingly marked, and the fourth video frame is a video frame which is output by the information frame detection model and does not comprise the information frame.
Illustratively, a first video frame is input into an infoframe detection model to detect whether the first video frame contains a private information picture. If the first video frame contains the privacy information picture, the information frame detection model outputs a second video frame, and if the first video frame does not contain the privacy information picture, the information frame detection model outputs a fourth video frame.
And S260, correspondingly replacing the video frames in the video data with the second video frame and the fourth video frame to obtain the video data to be blurred.
It is understood that a first video frame corresponds to a second video frame or a fourth video frame, and the first video frame corresponds to a video frame in the video data, so the second video frame or the fourth video frame can also find the corresponding video frame in the video data. And correspondingly replacing the video frames in the video data with the second video frame and the fourth video frame to obtain the video data to be blurred, so as to determine a video group corresponding to the second video frame or the fourth video frame according to the video frame recording sequence of the video data to be blurred.
S270, sequentially extracting video frames in the video data to be blurred; if the extracted video frame is the second video frame, executing step S280; if the extracted video frame is the fourth video frame, executing step S290; if the extracted video frame is neither the second video frame nor the fourth video frame, executing step S300.
It should be noted that the present embodiment refers to the first case mentioned above, that is, the first frame of the video data to be blurred is a first video frame; in this case, each video frame belongs to the video group of the nearest preceding first video frame. The video frames in the video data to be blurred are extracted in sequence; after a second video frame or a fourth video frame is extracted, any subsequently extracted video frame that is neither a second nor a fourth video frame is determined to belong to the video group of the second or fourth video frame most recently extracted before it.
And S280, performing Gaussian blur processing on the picture of the information frame, and taking the video frame after the blur processing as a third video frame.
For example, if the extracted video frame is a second video frame, and the second video frame includes an information frame, the second video frame needs to be subjected to gaussian blur processing.
And S290, taking the video frame as a third video frame.
For example, if the extracted video frame is a fourth video frame, the fourth video frame does not include a privacy information picture nor an information frame, and the fourth video frame does not need to be subjected to gaussian blur processing.
And S300, performing the same processing on the video frame as the previous frame, and taking the processed video frame as a third video frame.
For example, if the extracted video frame is neither the second video frame nor the fourth video frame, it is determined that the extracted video frame belongs to the video group corresponding to the second video frame or the fourth video frame extracted last before, and the extracted video frame needs to be processed in the same way as the corresponding video frame. The video frames between the extracted video frame and the corresponding video frame also belong to the video group, and the video frames in the same video group are processed in the same way, so that the extracted video frame can be processed in the same way as the previous frame.
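Steps S270–S300 can be sketched as a single pass over the frames that carries forward the action of the most recent marker frame; the `(tag, payload)` representation here is invented for illustration:

```python
def process_sequence(frames):
    """One pass over (tag, payload) frames. Tag 'second' means an
    information frame was detected (blur it and its group), 'fourth'
    means none was (pass through); 'plain' frames repeat the action
    of the nearest preceding marker frame."""
    out, last_action = [], None
    for tag, payload in frames:
        if tag == "second":
            last_action = "blur"
        elif tag == "fourth":
            last_action = "pass"
        # tag == "plain": keep last_action from the nearest marker frame
        out.append((payload, last_action))
    return out

seq = [("second", "f0"), ("plain", "f1"), ("fourth", "f2"), ("plain", "f3")]
seq_out = process_sequence(seq)
# f0 and f1 are blurred; f2 and f3 pass through unchanged.
```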
And S310, correspondingly replacing the video frame in the video data with the third video frame to obtain new video data, and synthesizing the new video data and the audio data into a new teaching video.
For example, as can be seen from the above, all video frames in the video data to be blurred are converted into third video frames, so that all the third video frames obtained in the above steps are synthesized into new video data according to the recording time sequence. And then the new video data and the audio data are synthesized into a new teaching video through the FFmpeg.
According to the teaching video processing method, the frames are uniformly extracted from the teaching video, and only the privacy information detection is carried out on the first video frame, so that the information desensitization processing time of the teaching video is greatly shortened, and compared with manual review, the review efficiency is greatly optimized. And detecting the privacy information in the first video frame through an information frame detection model, marking out an information frame, and performing fuzzification desensitization on the picture of the information frame through a Gaussian smoothing operator to realize automatic accurate detection and desensitization of the privacy information. The video data comprises a plurality of groups of video frames with the same courseware content, the teaching video is divided into a plurality of video groups through the first video frame, the picture content of the first video frame represents the picture content of the corresponding video group, and the video group and the corresponding first video frame are processed in the same way, so that the information desensitization of the whole teaching video is realized. The desensitized video data and the audio data are synthesized into a new teaching video which does not contain privacy information, the teaching video meets the requirement of the online teaching video, the videos can be uploaded to an online course resource library, and compared with the teaching video which contains the privacy information and is divided into unavailable resources during manual examination, more high-quality teaching video resources are reserved in the embodiment.
Fig. 11 is a schematic structural diagram of a teaching video processing apparatus according to an embodiment of the present application. Referring to fig. 11, the teaching video processing apparatus includes: a video frame acquisition module 401, a video frame extraction module 402, an information detection module 403, a blur processing module 404, and a video composition module 405.
The video frame acquiring module 401 is configured to acquire a teaching video, and extract audio data and video data from the teaching video, where the video data includes a plurality of video frames.
The video frame extraction module 402 is configured to uniformly extract a plurality of first video frames in sequence from the video data according to a preset number of extracted frames, and divide the video data into a plurality of video groups according to the first video frames, where each video group corresponds to one first video frame.
The information detection module 403 is configured to input the first video frame into a pre-trained information frame detection model to obtain a second video frame output by the information frame detection model, where the second video frame includes an information frame that is detected by the information frame detection model and is marked correspondingly.
And a blurring processing module 404 configured to perform gaussian blurring processing on the second video frame and a picture corresponding to the information frame in the corresponding video group to obtain a third video frame.
And a video synthesis module 405 configured to correspondingly replace the video frame in the video data with the third video frame to obtain new video data, and synthesize the new video data and the audio data into a new teaching video.
On the basis of the above embodiment, the teaching video processing apparatus further includes a frame number determining module configured to acquire a teaching piece corresponding to the teaching video, determine the number of pages of the teaching piece, and determine the frame number according to the number of pages.
On the basis of the above embodiment, the video frame extraction module includes: the interval calculation unit is configured to calculate the ratio of the number of video frames to the number of extracted frames in the video data; and the extraction unit is configured to take the first frame video frame of the video data and the video frame of the interval ratio from the first frame video frame as the first video frame.
On the basis of the above embodiment, the blur processing module includes: the video frame replacing unit is configured to correspondingly replace the video frames in the video data with the second video frame and the fourth video frame to obtain the video data to be blurred, the fourth video frame being a video frame which is output by the information frame detection model and does not contain an information frame; the extraction unit is configured to extract the video frames in the video data to be blurred in sequence; the first processing unit is configured to perform Gaussian blur processing on the picture of the information frame if the extracted video frame is the second video frame, and take the blurred video frame as the third video frame; the second processing unit is configured to take the extracted video frame as a third video frame if it is the fourth video frame; and the third processing unit is configured to perform the same processing as the previous frame on the extracted video frame if it is neither the second video frame nor the fourth video frame, and take the processed video frame as the third video frame.
On the basis of the above embodiment, the blur processing module includes: an information frame marking unit configured to correspondingly mark the information frame in all the video frames of the corresponding video group according to the information frame in the second video frame; a Gaussian blur processing unit configured to perform Gaussian blur processing on the picture of the information frame.
On the basis of the above embodiment, the first processing unit includes: the first pixel determination subunit is configured to acquire pixel points of the corresponding video frames at the relative positions according to the position information of the information frame; the first pixel map acquisition subunit is configured to acquire a preset Gaussian weight matrix and acquire a pixel map composed of pixel points and eight peripheral pixel points; and the first Gaussian blur calculation subunit is configured to multiply the weight of the Gaussian weight matrix with the corresponding pixel value in the pixel map, and sum all the products to obtain the pixel value of the pixel point after the Gaussian blur processing.
On the basis of the above embodiment, the gaussian blur processing unit includes: the second pixel determination subunit is configured to acquire pixel points of the corresponding video frames at the relative positions according to the position information of the information frame; the second pixel map acquisition sub-unit is configured to acquire a preset Gaussian weight matrix and acquire a pixel map composed of pixel points and eight peripheral pixel points; and the second Gaussian blur calculation subunit is configured to multiply the weight of the Gaussian weight matrix with the corresponding pixel value in the pixel map, and sum all the products to obtain the pixel value of the pixel point after the Gaussian blur processing.
On the basis of the above embodiment, the video synthesis module includes: and the separation unit is configured to separate the audio data and the video data in the teaching video and correspondingly acquire the audio data and the video data.
It should be noted that, in the embodiment based on the teaching video processing apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the present application.
The teaching video processing device provided by the embodiment of the application is included in teaching video processing equipment, can be used for executing the teaching video processing method provided by any embodiment, and has corresponding functions and beneficial effects.
Fig. 12 is a schematic structural diagram of a teaching video processing device according to an embodiment of the present application. As shown in fig. 12, the teaching video processing apparatus includes a processor 50, a memory 51, an input device 52, an output device 53, and a display screen 54; the number of the processors 50 in the teaching video processing device may be one or more, and one processor 50 is taken as an example in fig. 12; the number of the display screens 54 in the teaching video processing apparatus may be one or more, and one display screen 54 is exemplified in fig. 12; the processor 50, the memory 51, the input device 52, the output device 53, and the display 54 in the teaching video processing apparatus may be connected by a bus or other means, and the bus connection is exemplified in fig. 12.
The memory 51 is used as a computer-readable storage medium for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the teaching video processing method in the embodiment of the present application (for example, the video frame acquiring module 401, the video frame extracting module 402, the information detecting module 403, the blurring processing module 404, and the video synthesizing module 405 in the teaching video processing apparatus). The processor 50 executes various functional applications and data processing of the teaching video processing apparatus by executing software programs, instructions, and modules stored in the memory 51, that is, implements the teaching video processing method described above.
The memory 51 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of the teaching video processing apparatus, and the like. Further, the memory 51 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 51 may further include memory located remotely from processor 50, which may be connected to the instructional video processing device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 52 is operable to receive input numeric or character information and generate key signal inputs relating to user settings and function controls of the teaching video processing apparatus. The output device 53 may include an audio output device such as a speaker. The display screen 54 may display video frames.
The teaching video processing device comprises a teaching video processing device, can be used for executing any teaching video processing method, and has corresponding functions and beneficial effects.
Embodiments of the present application also provide a computer-readable storage medium, on which a computer program is stored, which is used to execute the teaching video processing method provided in the above embodiments when the program is executed by a processor.
Of course, the computer-readable storage medium provided in the embodiments of the present application has computer-executable instructions that are not limited to the method operations described above, and may also perform related operations in the teaching video processing method provided in any embodiments of the present application.
From the above description of the embodiments, it is obvious for those skilled in the art that the present application can be implemented by software and necessary general hardware, and certainly can be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods of the embodiments of the present application.
It is to be noted that the foregoing is only illustrative of the presently preferred embodiments and application of the principles of the present invention. Those skilled in the art will appreciate that the present application is not limited to the particular embodiments described herein, but is capable of many obvious modifications, rearrangements and substitutions without departing from the scope of the application. Therefore, although the present application has been described in more detail with reference to the above embodiments, the present application is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the appended claims.

Claims (10)

1. A method for teaching video processing, comprising:
obtaining a teaching video, and extracting audio data and video data from the teaching video, wherein the video data comprises a plurality of video frames;
uniformly extracting a plurality of first video frames in sequence from the video data according to a preset frame extraction quantity, and dividing the video data into a plurality of video groups according to the first video frames, wherein each video group corresponds to one first video frame;
inputting the first video frame into a pre-trained information frame detection model to obtain a second video frame output by the information frame detection model, wherein the second video frame comprises an information frame detected and correspondingly marked by the information frame detection model;
performing Gaussian blur processing on the picture corresponding to the information frame in the second video frame and in the corresponding video group, to obtain a third video frame;
and replacing the corresponding video frames in the video data with the third video frames to obtain new video data, and synthesizing the new video data and the audio data into a new teaching video.
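For illustration only (not part of the claims), the claim-1 pipeline can be sketched as follows. Frames are modeled as plain values, and `detect_info_box` and `blur_region` are hypothetical stand-ins for the information frame detection model and the Gaussian blur step; only the group-wise structure (detect on each group's first frame, apply the result to the whole group) follows the claim.

```python
# Sketch of the claim-1 pipeline: sample k "first video frames", split the
# video into k groups, run detection once per group, blur group-wise.
# detect_info_box and blur_region are hypothetical callables supplied by
# the caller; real frames would be image arrays, not integers.

def sample_first_frames(frames, k):
    """Uniformly pick k first-frame indices and split frames into k groups."""
    step = len(frames) // k
    starts = [i * step for i in range(k)]
    groups = [frames[s:s + step] for s in starts[:-1]] + [frames[starts[-1]:]]
    return starts, groups

def process(frames, k, detect_info_box, blur_region):
    starts, groups = sample_first_frames(frames, k)
    out = []
    for s, group in zip(starts, groups):
        box = detect_info_box(frames[s])   # run the model on the group's first frame only
        for frame in group:
            out.append(blur_region(frame, box) if box else frame)
    return out
```

In this sketch the detection cost scales with the number of groups rather than the number of frames, which reflects the efficiency motivation of sampling first video frames.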
2. The method according to claim 1, further comprising, before uniformly extracting the plurality of first video frames according to the preset frame extraction quantity:
acquiring a teaching courseware corresponding to the teaching video, determining the number of pages of the teaching courseware, and determining the frame extraction quantity according to the number of pages.
3. The method according to claim 1, wherein said uniformly extracting a plurality of first video frames in sequence from the video data according to a preset number of frames comprises:
calculating the ratio of the number of video frames in the video data to the frame extraction quantity;
and taking the first frame of the video data, together with the video frames spaced from it by the ratio, as the first video frames.
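As an illustration of claim 3 (not part of the claims), the sampled indices can be computed directly; the function name and the assumption that the ratio is floored to an integer are mine, since the claim does not specify rounding.

```python
def first_frame_indices(num_frames, frame_count):
    """Claim-3 sampling sketch: the first frame, plus every frame spaced
    by num_frames // frame_count from it. Flooring the ratio is an
    assumption; when the division is not exact this may yield one extra
    sample."""
    ratio = num_frames // frame_count
    return list(range(0, num_frames, ratio))
```

For example, a 100-frame video sampled with a frame extraction quantity of 4 yields indices 0, 25, 50, and 75.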
4. The method according to claim 3, wherein the performing Gaussian blur processing on the second video frame and a picture corresponding to the information frame in the corresponding video group to obtain a third video frame comprises:
replacing the corresponding video frames in the video data with the second video frame and a fourth video frame to obtain video data to be blurred, wherein the fourth video frame is a video frame output by the information frame detection model that does not contain an information frame;
sequentially extracting video frames in the video data to be blurred;
if the extracted video frame is the second video frame, performing Gaussian blur processing on the picture of the information frame, and taking the blurred video frame as a third video frame;
if the extracted video frame is the fourth video frame, taking the video frame as a third video frame;
and if the extracted video frame is neither the fourth video frame nor the second video frame, applying to it the same processing as was applied to the previous frame, and taking the processed video frame as a third video frame.
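For illustration only (not claim text), the per-frame decision of claim 4 can be sketched as a single pass that carries the previous frame's treatment forward. This assumes the sampled frames in which the model marked an information frame are the ones blurred; `blur_region` and the `detections` mapping are hypothetical names.

```python
def apply_blur_schedule(frames, detections, blur_region):
    """Claim-4 sketch. frames: the full frame sequence; detections:
    {index: box or None}, populated only at the sampled positions
    (box for a frame with a marked information frame, None for one
    without); blur_region: callable that blurs a frame's box region.
    Sampled frames with a box are blurred, sampled frames without one
    pass through, and every other frame repeats the treatment applied
    to the previous frame."""
    out, active_box = [], None
    for i, frame in enumerate(frames):
        if i in detections:             # a sampled frame resets the treatment
            active_box = detections[i]  # None clears any active blur box
        out.append(blur_region(frame, active_box) if active_box else frame)
    return out
```

The carried `active_box` is what makes "the same processing as the previous frame" cheap: non-sampled frames never re-run the detection model.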
5. The method according to claim 1, wherein the performing the gaussian blurring process on the second video frame and the picture corresponding to the information frame in the corresponding video group to obtain a third video frame comprises:
marking the information frames in all video frames of the corresponding video group according to the information frame in the second video frame;
and performing Gaussian blur processing on the picture of the information frame.
6. The method according to claim 4 or 5, wherein the performing Gaussian blur processing on the picture of the information frame comprises:
acquiring, according to position information of the information frame, the pixel points at the corresponding positions of the corresponding video frame;
acquiring a preset Gaussian weight matrix, and acquiring a pixel map formed by each pixel point and its eight surrounding pixel points;
and multiplying the weights of the Gaussian weight matrix by the corresponding pixel values in the pixel map, and summing all the products to obtain the Gaussian-blurred pixel value of the pixel point.
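The claim-6 computation for a single pixel can be sketched as below (for illustration only, not claim text). The particular 3x3 weights, a normalised kernel with sigma near 1, are an assumption; the claim only requires a preset Gaussian weight matrix.

```python
# Claim-6 sketch: blur one pixel by multiplying a 3x3 Gaussian weight
# matrix with the pixel and its eight neighbours, then summing the
# products. The specific weights are an assumed normalised kernel.
GAUSS_3X3 = [
    [1/16, 2/16, 1/16],
    [2/16, 4/16, 2/16],
    [1/16, 2/16, 1/16],
]

def blur_pixel(image, x, y):
    """Return the Gaussian-blurred value of image[y][x] (interior pixels
    only; border handling is out of scope for this sketch)."""
    total = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            total += GAUSS_3X3[dy + 1][dx + 1] * image[y + dy][x + dx]
    return total
```

Because the weights sum to 1, a uniform region is unchanged by the blur, while an isolated bright pixel is spread across its neighbourhood.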
7. The method of claim 1, wherein said extracting audio data and video data from said instructional video comprises:
separating the audio data and the video data in the teaching video, thereby obtaining the audio data and the video data respectively.
8. A teaching video processing apparatus, comprising:
the video frame acquisition module is configured to acquire a teaching video, and extract audio data and video data from the teaching video, wherein the video data comprises a plurality of video frames;
the video frame extraction module is configured to uniformly extract a plurality of first video frames from the video data in sequence according to a preset frame extraction quantity, and divide the video data into a plurality of video groups according to the first video frames, wherein each video group corresponds to one first video frame;
the information detection module is configured to input the first video frame into a pre-trained information frame detection model to obtain a second video frame output by the information frame detection model, wherein the second video frame comprises an information frame detected and correspondingly marked by the information frame detection model;
the fuzzy processing module is configured to perform Gaussian fuzzy processing on the second video frame and a picture corresponding to the information frame in the corresponding video group to obtain a third video frame;
and the video synthesis module is configured to correspondingly replace the video frame in the video data with the third video frame to obtain new video data, and synthesize the new video data and the audio data into a new teaching video.
9. A teaching video processing device, comprising:
one or more processors;
a memory, configured to store one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the teaching video processing method according to any one of claims 1-7.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the teaching video processing method according to any one of claims 1-7.
CN202110528340.1A 2021-05-14 2021-05-14 Teaching video processing method, device, equipment and storage medium Pending CN115348461A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110528340.1A CN115348461A (en) 2021-05-14 2021-05-14 Teaching video processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115348461A true CN115348461A (en) 2022-11-15

Family

ID=83977975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110528340.1A Pending CN115348461A (en) 2021-05-14 2021-05-14 Teaching video processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115348461A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103686244A (en) * 2013-12-26 2014-03-26 乐视网信息技术(北京)股份有限公司 Video data managing method and system
CN109040824A (en) * 2018-08-28 2018-12-18 百度在线网络技术(北京)有限公司 Method for processing video frequency, device, electronic equipment and readable storage medium storing program for executing
CN109743579A (en) * 2018-12-24 2019-05-10 秒针信息技术有限公司 A kind of method for processing video frequency and device, storage medium and processor
CN111614959A (en) * 2019-02-26 2020-09-01 北京嘀嘀无限科技发展有限公司 Video coding method and device and electronic equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination