CN112860943A - Teaching video auditing method, device, equipment and medium

Info

Publication number: CN112860943A
Application number: CN202110004554.9A
Authority: CN
Inventors: 陈甜甜
Original assignee: Zhejiang Nuonuo Network Technology Co ltd
Current assignee: Zhejiang Nuonuo Network Technology Co ltd
Priority date: 2021-01-04
Filing date: 2021-01-04
Publication date: 2021-05-28

Abstract

The application discloses a teaching video auditing method, device, equipment and medium. The method comprises: extracting key frames from a target teaching video according to a preset extraction rule to obtain a key frame set, and extracting audio clips from the target teaching video based on a number of bytes to obtain an audio clip set; inputting the key frame set into a pre-created image detection model to obtain the image type corresponding to each key frame, and determining violation images according to the image types; converting the audio clips into text through speech recognition to obtain a text set, inputting the text set into a pre-created text auditing model to obtain the audio type corresponding to each audio clip, and determining violation audio according to the audio types; and determining the time point at which each violation image appears in the target teaching video and the time period in which the violation audio occurs in the target teaching video. Because the image detection model and the text auditing model examine the video from both the image side and the audio side, the efficiency and accuracy of auditing the target teaching video are improved.

Description

Teaching video auditing method, device, equipment and medium

Technical Field

The invention relates to the field of video auditing, in particular to a teaching video auditing method, device, equipment and medium.

Background

At present, with the continuous development of computer and internet technologies, teaching modes are also constantly being updated, and modes such as online courses and live teaching are widely accepted by parents and students. However, internet teaching videos also carry various violation risks. Violating teaching videos are harmful to a healthy internet environment and, more seriously, can harm the physical and mental health of viewers, especially teenagers. Moreover, because pictures and text alternate frequently in teaching videos and lecturers vary, violation detection for teaching videos is more difficult. In the prior art, auditing is performed manually, which consumes a large amount of manpower and material resources. Therefore, how to detect violating content in teaching videos efficiently and accurately is a problem that urgently needs to be solved.

Disclosure of Invention

In view of this, the present invention provides a teaching video auditing method, apparatus, device and medium, which can improve the efficiency and accuracy of target teaching video auditing. The specific scheme is as follows:

In a first aspect, the present application discloses a teaching video auditing method, including:

extracting key frames from a target teaching video according to a preset extraction rule to obtain a key frame set, and extracting audio clips from the target teaching video based on byte number to obtain an audio clip set;

inputting the key frame set into a pre-created image detection model to obtain an image type corresponding to the key frame, and determining a violation image according to the image type;

converting the audio clips into text through speech recognition to obtain a text set, inputting the text set into a pre-created text auditing model to obtain audio types corresponding to the audio clips, and determining violation audio according to the audio types;

and determining the time point of the violation image in the target teaching video, and determining the time period of the violation audio in the target teaching video.

Optionally, the extracting an audio clip from the target teaching video based on the number of bytes to obtain an audio clip set includes:

converting the audio corresponding to the target teaching video into binary data according to a preset sampling rate to obtain audio data;

determining the number of target bytes based on the preset sampling rate and the preset audio time length;

and dividing the audio data into a plurality of data segments according to the target byte number to obtain corresponding audio segments so as to obtain the audio segment set.

Optionally, the extracting key frames from the target teaching video according to the preset extraction rule to obtain a key frame set includes:

extracting a first image frame from the target teaching video as a key frame;

sequentially extracting multiple frames of target image frames according to a preset time interval, and sequentially calculating the similarity between the extracted target image frames and adjacent key frames based on a structural similarity algorithm;

and if the similarity is smaller than a preset similarity threshold, taking the target image frame as a key frame to obtain the key frame set.

Optionally, the extracting key frames from the target teaching video according to the preset extraction rule to obtain a key frame set, and extracting audio clips from the target teaching video based on the number of bytes to obtain an audio clip set includes:

acquiring the target teaching video, and separating the target teaching video to obtain a corresponding image sequence and an audio file;

segmenting the image sequence and the audio file to obtain a segmented image sequence and a segmented audio file;

extracting the key frames from the segmented image sequence by using a thread pool according to a preset extraction rule to obtain the key frame set;

and extracting the audio clip from the segmented audio file by utilizing a thread pool and based on the number of bytes to obtain the audio clip set.

Optionally, the process of creating the image detection model includes:

acquiring violation images, and adding corresponding type marking information to the violation images; the type marking information comprises any one or more of pornography, violence, political sensitivity and vulgarity;

grouping the illegal images containing the type marking information to obtain a training image set, a verification image set and a test image set;

and constructing a blank model based on an artificial neural network, and training and detecting the blank model by using the training image set, the verification image set and the test image set to obtain the image detection model.

Optionally, the process of creating the text audit model includes:

acquiring violation text data, and adding corresponding type marking information to the violation text data; the type marking information comprises any one or more of pornography, violence, political sensitivity and vulgarity;

carrying out data cleaning and corpus preprocessing on the illegal text data, and extracting illegal keywords from the processed illegal text data;

grouping the illegal keywords to obtain a training data set, a verification data set and a test data set;

and constructing a blank model based on machine learning, and training and detecting the blank model by using the training data set, the verification data set and the test data set to obtain the text auditing model.

Optionally, the determining a time point of the violation image in the target teaching video and determining a time period of the violation audio in the target teaching video includes:

determining the time point of the violation image in the target teaching video according to the frame number of the violation image in the target teaching video and the number of frames transmitted by the target teaching video per second;

and determining the time period of the violation audio in the target teaching video according to the target byte number and the preset sampling rate.

In a second aspect, the present application discloses a teaching video auditing device, including:

the extraction module is used for extracting key frames from the target teaching video according to a preset extraction rule to obtain a key frame set, and extracting audio clips from the target teaching video based on the number of bytes to obtain an audio clip set;

the violation image detection module is used for inputting the key frame set into a pre-established image detection model to obtain an image type corresponding to the key frame, and determining a violation image according to the image type;

the illegal audio detection module is used for converting the audio clips into texts through a voice recognition technology to obtain a text set, inputting the text set into a pre-established text auditing model to obtain audio types corresponding to the audio clips, and determining illegal audio according to the audio types;

and the time determining module is used for determining the time point of the violation image in the target teaching video and determining the time period of the violation audio in the target teaching video.

In a third aspect, the present application discloses an electronic device, comprising:

a memory for storing a computer program;

and the processor is used for executing the computer program to realize the teaching video auditing method.

In a fourth aspect, the present application discloses a computer readable storage medium for storing a computer program; wherein the computer program when executed by the processor implements the teaching video auditing method described above.

According to the method, key frames are extracted from a target teaching video according to a preset extraction rule to obtain a key frame set, and audio clips are extracted from the target teaching video based on the number of bytes to obtain an audio clip set; the key frame set is input into a pre-created image detection model to obtain the image type corresponding to each key frame, and violation images are determined according to the image types; the audio clips are converted into text through speech recognition to obtain a text set, the text set is input into a pre-created text auditing model to obtain the audio type corresponding to each audio clip, and violation audio is determined according to the audio types; and the time point of each violation image and the time period of the violation audio in the target teaching video are determined. Thus the key frame set is extracted according to the preset extraction rule, the audio clip set is extracted quickly on a byte basis, violation images are detected by the image detection model, and violation audio is detected by speech-to-text conversion followed by the text auditing model. The target teaching video is therefore checked from several aspects, the times at which violation images and violation audio occur in it are determined, and the efficiency and accuracy of auditing the target teaching video are improved.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a teaching video auditing method provided in the present application;

FIG. 2 is a flowchart of a key frame extraction method provided in the present application;

FIG. 3 is a flowchart of a specific teaching video auditing method provided in the present application;

FIG. 4 is a schematic structural diagram of a teaching video auditing device provided in the present application;

FIG. 5 is a block diagram of an electronic device provided in the present application.

Detailed Description

In the prior art, teaching videos are audited manually, which makes auditing inefficient. To overcome this technical problem, the application provides a teaching video auditing method that can improve the efficiency and accuracy of teaching video auditing.

The embodiment of the application discloses a teaching video auditing method, and as shown in figure 1, the method can comprise the following steps:

Step S11: extracting key frames from the target teaching video according to a preset extraction rule to obtain a key frame set, and extracting audio clips from the target teaching video based on byte number to obtain an audio clip set.

In this embodiment, first, a key frame is extracted from a target teaching video according to a preset extraction rule to obtain a key frame set, and an audio clip with a corresponding length is extracted from the target teaching video based on the number of bytes to obtain an audio clip set.

In this embodiment, the extracting key frames from the target teaching video according to the preset extraction rule to obtain the key frame set may include: extracting a first image frame from the target teaching video as a key frame; sequentially extracting multiple frames of target image frames according to a preset time interval, and sequentially calculating the similarity between the extracted target image frames and adjacent key frames based on a structural similarity algorithm; and if the similarity is smaller than a preset similarity threshold, taking the target image frame as a key frame to obtain the key frame set.

It is understood that the first image frame may be the very first frame of the target teaching video, or any one of its first several frames. Specifically, as shown in fig. 2, after the first image frame is taken as the first key frame, a second image frame is extracted at the preset time interval after the first key frame, and the similarity between the second image frame and the first key frame is calculated with the structural similarity algorithm. If the similarity is smaller than the preset similarity threshold, the second image frame is taken as the second key frame. If the similarity is greater than or equal to the preset similarity threshold, the second image frame is filtered out, a third image frame is extracted at the preset time interval after the second image frame, and the similarity between the third image frame and the first key frame is calculated. Extraction and comparison continue according to this screening principle until the whole target teaching video has been processed, and the frames that satisfy the condition form the key frame set. The preset time interval and the preset similarity threshold can be adjusted according to the video type and the auditing requirements. As an alternative filtering strategy, the earlier of the two compared image frames may be the one that is filtered out. The resulting key frame set covers the main content of the target teaching video with reduced redundancy, which avoids the cost of frame-by-frame detection while preserving detection accuracy.

In this embodiment, before the similarity of image frames is calculated with the structural similarity algorithm, the image frames may be preprocessed, for example by scaling and noise reduction. For instance, two frames of the same size are scaled proportionally, with the longest side kept at 460 or 380 pixels, and the two extracted frames are then compared using the Structural Similarity (SSIM) algorithm to obtain their similarity.
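The key frame screening described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: frames are represented as flat lists of grayscale values, `global_ssim` is a simplified single-window form of SSIM (practical SSIM implementations, such as `structural_similarity` in scikit-image, use local windows), and the names `global_ssim` and `select_key_frames` are hypothetical.

```python
def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def global_ssim(a, b, data_range=255.0):
    """Simplified SSIM computed over the whole frame as one window.

    a, b: equally sized grayscale frames given as flat lists of numbers.
    The constants follow the usual SSIM choice K1=0.01, K2=0.03.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = _mean(a), _mean(b)
    var_a = _mean((x - mu_a) ** 2 for x in a)
    var_b = _mean((y - mu_b) ** 2 for y in b)
    cov = _mean((x - mu_a) * (y - mu_b) for x, y in zip(a, b))
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def select_key_frames(frames, step, threshold, similarity=global_ssim):
    """Return the indices of the key frames.

    frames: sequence of decoded frames; step: sampling stride in frames
    (the preset time interval multiplied by the frame rate); a sampled
    frame becomes a key frame when its similarity to the most recent
    key frame is below the threshold.
    """
    if not frames:
        return []
    key_indices = [0]  # the first image frame is always a key frame
    for i in range(step, len(frames), step):
        if similarity(frames[i], frames[key_indices[-1]]) < threshold:
            key_indices.append(i)
    return key_indices
```

With three identical dark frames followed by two identical bright frames, a stride of 1 and a threshold of 0.8, the function keeps the first dark frame and the first bright frame.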

In this embodiment, extracting audio clips from the target teaching video based on the number of bytes to obtain the audio clip set may include: converting the audio corresponding to the target teaching video into binary data at a preset sampling rate to obtain audio data; determining a target byte number from the preset sampling rate and a preset audio duration; and dividing the audio data into a plurality of data segments of the target byte number to obtain the corresponding audio clips and thus the audio clip set. The audio clips may be extracted continuously or at intervals. For example, the audio is converted into mono binary data at a sampling rate of 16 kHz; with a preset audio duration of 60 s, the number of bytes per 60 s audio clip is determined to be 60 × 16000 × 2 = 1,920,000 (the factor 2 is consistent with 2 bytes per sample), and a plurality of audio clips are extracted according to this byte number. Extracting audio clips by byte count therefore speeds up extraction and improves the efficiency of video auditing.
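The byte-count arithmetic in the example above can be illustrated with a short Python sketch. It assumes raw mono PCM data with 2 bytes per sample, which is an interpretation of the factor 2 in the example rather than something the text states outright; the function names are hypothetical.

```python
def target_byte_count(sample_rate_hz, duration_s, bytes_per_sample=2, channels=1):
    """Bytes occupied by duration_s seconds of raw PCM audio."""
    return sample_rate_hz * duration_s * bytes_per_sample * channels


def split_audio(pcm, chunk_bytes):
    """Cut raw PCM bytes into consecutive clips of chunk_bytes each.

    The last clip may be shorter when the data length is not a multiple
    of chunk_bytes.
    """
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]


# 60 s of 16 kHz mono audio at 2 bytes per sample
clip_bytes = target_byte_count(16000, 60)
```

Because the split is a plain slice over a byte buffer, no audio decoding is needed per clip, which is the source of the speed advantage claimed in the text.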

Step S12: inputting the key frame set into a pre-created image detection model to obtain an image type corresponding to the key frame, and determining a violation image according to the image type.

In this embodiment, after the key frame set is obtained, it is input into a pre-created image detection model to obtain the image types corresponding to the different key frames, and violation images are determined according to the image types, where the image types include, but are not limited to, pornographic images, violent images, politically sensitive images, vulgar images, and legal images. In this embodiment, the process of creating the image detection model may include: acquiring violation images, and adding corresponding type marking information to the violation images, where the type marking information comprises any one or more of pornography, violence, political sensitivity and vulgarity; grouping the violation images containing the type marking information to obtain a training image set, a verification image set and a test image set; and constructing a blank model based on an artificial neural network, and training and testing the blank model with the training image set, the verification image set and the test image set to obtain the image detection model.

It can be understood that images of types such as pornographic, violent, politically sensitive, vulgar and disgusting content are collected and manually reviewed and labeled to obtain violation images containing type marking information, and these images are then divided into a training image set, a verification image set and a test image set. A blank model is constructed based on an artificial neural network, which may be a ResNet-18 (18-layer residual network) model comprising an input layer, convolution layers, pooling layers, a fully connected layer and an output recognition layer. The blank model is then trained by forward propagation and backward propagation using the training image set, the verification image set and the test image set, iterating until the model converges. Before the violation images are input into the model, image preprocessing, including but not limited to noise reduction, brightness adjustment, stretching, flipping and scaling, can be applied to them, which effectively improves the generalization ability of the model. The scaling may specifically adjust the shortest side of the image to 224 pixels, after which a 224 × 224 crop is taken and mean preprocessing is applied to it.
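The scaling step above (shortest side to 224, then a 224 × 224 crop) comes down to a small geometry calculation, sketched here in Python. The text does not say where the crop is taken; a center crop is assumed, and both function names are hypothetical.

```python
def resize_shortest_side(width, height, target=224):
    """New (width, height) after scaling so the shortest side equals target."""
    scale = target / min(width, height)
    # max() guards against rounding leaving a side one pixel short
    return max(target, round(width * scale)), max(target, round(height * scale))


def center_crop_box(width, height, size=224):
    """(left, top, right, bottom) of a size x size crop centered in the image.

    Assumes width >= size and height >= size, which holds after
    resize_shortest_side with the same target.
    """
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size
```

For a 640 × 480 image this gives a resized image of 299 × 224 and a crop box of (37, 0, 261, 224).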

The training process of the image detection model is specifically as follows. The training data set is input into a network model based on the ResNet-18 architecture, and the convolution feature maps C2, C3, C4 and C5 are obtained during bottom-up forward propagation, with batch normalization applied after each convolution and before the activation function. Because the higher layers of the network carry richer semantic information, the convolution feature map C5 is selected as the extracted feature, and a fully connected layer then outputs the predicted probability that the current data sample belongs to each type. The learning rate is set to 1e-4, namely 0.0001, the number of iterations is set to 20, and the parameters are updated repeatedly over these iterations. The loss is calculated as the cross entropy between the true value and the predicted value, and the loss function is minimized by a preset optimization algorithm, which may be the Batch Gradient Descent algorithm, so that the parameters are updated by the optimizer. Training can be stopped when the loss value falls below a certain threshold, or the state of training can be judged from the accuracy on the verification image set, so as to prevent overfitting.
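The cross-entropy loss between the true type and the predicted probabilities can be illustrated for a single sample as follows. This is a generic sketch of softmax cross entropy in plain Python, not the applicant's training code; the five-element input stands for the outputs of the fully connected layer over five assumed image types, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw fully connected layer outputs into class probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_index):
    """Cross entropy between a one-hot true label and the predicted distribution."""
    return -math.log(softmax(logits)[true_index])


# Uniform predictions over five types give a loss of ln(5)
loss = cross_entropy([0.0, 0.0, 0.0, 0.0, 0.0], 2)
```

The loss shrinks toward zero as the probability assigned to the true type approaches 1, which is why a loss threshold can serve as one of the stopping criteria mentioned above.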

Step S13: converting the audio clips into text through speech recognition to obtain a text set, inputting the text set into a pre-created text auditing model to obtain audio types corresponding to the audio clips, and determining violation audio according to the audio types.

In this embodiment, after the audio clip is obtained, the audio clip is converted into a text through a voice recognition technology to obtain a text set, the text set is input to a pre-created text auditing model to obtain an audio type corresponding to the audio clip, and an illegal audio is determined according to the audio type, where the audio type includes, but is not limited to, pornographic audio, violent audio, politically sensitive audio, vulgar audio, horrific audio, and legal audio.

In this embodiment, the process of creating the text auditing model may include: acquiring violation text data, and adding corresponding type marking information to the violation text data, where the type marking information comprises any one or more of pornography, violence, political sensitivity and vulgarity; carrying out data cleaning and corpus preprocessing on the violation text data, and extracting violation keywords from the processed text data; grouping the violation keywords to obtain a training data set, a verification data set and a test data set; and constructing a blank model based on machine learning, and training and testing the blank model with the training data set, the verification data set and the test data set to obtain the text auditing model. Specifically, text data relating to pornography, abuse, political sensitivity, violence, terrorism and the like are collected and manually labeled, and data cleaning and corpus preprocessing are then performed, where the corpus preprocessing includes but is not limited to word segmentation, stop-word removal and conversion between traditional and simplified Chinese. Features are then extracted from the text data; specifically, keywords can be extracted with the TF-IDF algorithm to obtain the keywords corresponding to the different types of text data, which are divided into a training data set, a verification data set and a test data set. The blank model constructed based on machine learning may be a naive Bayes model or a decision tree model. Step S12 and step S13 above may be performed simultaneously.
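The TF-IDF keyword extraction mentioned above can be sketched in plain Python. The sketch assumes the documents have already been segmented into tokens with stop words removed, and it uses the common weighting tf × ln(N / df); the text does not specify which TF-IDF variant is used, and `tfidf_keywords` is a hypothetical name.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring tokens of each document.

    docs: list of token lists (already segmented, stop words removed).
    Score = term frequency in the document * ln(N / document frequency).
    """
    n = len(docs)
    # document frequency: in how many documents each token appears
    df = Counter(tok for doc in docs for tok in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        # sort by descending score, ties broken alphabetically
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results
```

A token that appears in every document receives a score of zero, so only tokens that distinguish one type of text from the others surface as keywords for the classifier.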

Step S14: and determining the time point of the violation image in the target teaching video, and determining the time period of the violation audio in the target teaching video.

In this embodiment, after the violation image and the violation audio are obtained, the specific time of the violation image and the violation audio appearing in the target teaching video is determined, so as to generate an audit result of the target teaching video.

In this embodiment, determining the time point of the violation image in the target teaching video and the time period of the violation audio in the target teaching video may include: determining the time point of the violation image according to its frame number in the target teaching video and the number of frames per second of the target teaching video; and determining the time period of the violation audio according to the target byte number and the preset sampling rate. It can be understood that the time point of a violation image can be obtained by converting its frame number using the frames per second (fps) of the target teaching video, and the time period of a violation audio clip can be obtained by converting the target byte number using the preset sampling rate. The position at which a violation image appears and the time period of the violation audio can thus both be recorded to second-level accuracy. Finally, the auditing result of the target teaching video can be obtained by result polling. Specifically, when the user submits the target teaching video for auditing, a unique ID is generated for it, a message-digest algorithm is applied to the ID of each requested task to obtain a corresponding MD5 code, and the result is then queried by the MD5 code to output the types of the violation images and violation audio, the positions of the violation images and the time periods of the violation audio.
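The two conversions and the MD5 lookup key described above can be sketched as follows. The sketch assumes 2 bytes per audio sample and consecutive, equally sized audio clips numbered from 0; the function names and the form of the task ID are hypothetical.

```python
import hashlib


def frame_time_seconds(frame_index, fps):
    """Time point of a frame, from its frame number and the frames per second."""
    return frame_index / fps


def audio_segment_period(segment_index, target_bytes, sample_rate_hz, bytes_per_sample=2):
    """(start, end) in seconds of the segment_index-th audio clip (0-based)."""
    segment_seconds = target_bytes / (sample_rate_hz * bytes_per_sample)
    start = segment_index * segment_seconds
    return start, start + segment_seconds


def task_key(task_id):
    """MD5 hex digest of a task ID, used as the key when polling for results."""
    return hashlib.md5(task_id.encode("utf-8")).hexdigest()
```

For example, frame 750 of a 25 fps video lies at 30 s, and the third 1,920,000-byte clip of 16 kHz audio covers 120 s to 180 s.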

As can be seen from the above, in this embodiment, a key frame set is obtained by extracting key frames from a target teaching video according to a preset extraction rule, and an audio clip set is obtained by extracting an audio clip from the target teaching video based on the number of bytes; inputting the key frame set into a pre-established image detection model to obtain an image type corresponding to the key frame, and determining an illegal image according to the image type; converting the audio clips into texts through a voice recognition technology to obtain a text set, inputting the text set into a pre-established text auditing model to obtain audio types corresponding to the audio clips, and determining illegal audios according to the audio types; and determining the time point of the violation image in the target teaching video, and determining the time period of the violation audio in the target teaching video. Therefore, the key frame set of the target teaching video is extracted through the preset extraction rule, the audio clip set is rapidly extracted on the basis of bytes, the violation image is detected through the image detection model, the violation audio is detected through the text conversion and text auditing model, the multi-aspect detection of the target teaching video can be realized, the time of the violation image and the violation audio in the target teaching video is finally determined, and the efficiency and the accuracy of auditing the target teaching video are improved.

The embodiment of the application discloses a specific teaching video auditing method, which is shown in fig. 3 and can comprise the following steps:

Step S21: acquiring the target teaching video, and separating the target teaching video to obtain a corresponding image sequence and an audio file.

In this embodiment, the target teaching video is first acquired. Specifically, the complete video can be downloaded from the website to which the user uploaded it, or a live teaching video that is currently playing can be actively captured in real time at intervals. After the target teaching video is obtained, audio-video separation is performed on it to obtain the corresponding image sequence and audio file.

Step S22: and segmenting the image sequence and the audio file to obtain a segmented image sequence and a segmented audio file.

In this embodiment, after the image sequence and the audio file are obtained, they may each be segmented according to a preset segmentation quantity value to obtain two or more segmented image sequences and two or more segmented audio files. The preset segmentation quantity values for the image sequence and the audio file may be the same or different, and the segmentation may be uniform or, depending on the actual situation, non-uniform.

Step S23: and extracting the key frames from the segmented image sequence by utilizing a thread pool according to a preset extraction rule to obtain the key frame set.

In this embodiment, after the segmented image sequences are obtained, key frames are extracted from the multiple segmented image sequences simultaneously by a thread pool according to the preset extraction rule. Before the thread pool is used for key frame extraction, the efficiency of extraction can be ensured by configuring the number of threads the pool can hold, namely setting it to be larger than the total number of segmented image sequences. Multithreaded processing with the thread pool therefore improves the efficiency of video auditing, and in particular the speed of auditing long videos.

Step S24: and extracting the audio clip from the segmented audio file by utilizing a thread pool and based on the number of bytes to obtain the audio clip set.

In this embodiment, after the segmented audio files are obtained, audio clips are extracted from the segmented audio files by a thread pool based on the number of bytes. Before the thread pool is used for audio clip extraction, the efficiency of extraction can be ensured by configuring the number of threads the pool can hold, namely setting it to be larger than the total number of segmented audio files. Multithreaded processing with the thread pool therefore improves the efficiency of video auditing, and in particular the speed of auditing long videos.
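The segmentation and thread pool processing of steps S22 to S24 can be sketched with the Python standard library. This is an illustrative sketch only: uniform segmentation is assumed, `worker` stands for whichever extraction routine is applied to one segment, and the function names are hypothetical. The pool size is set to one more than the number of segments, following the text's rule that the thread count exceed the number of segments.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, parts):
    """Divide a sequence into `parts` consecutive segments of near-equal length."""
    size, extra = divmod(len(items), parts)
    segments, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        segments.append(items[start:end])
        start = end
    return segments


def process_segments(segments, worker):
    """Run worker on every segment concurrently and keep results in segment order."""
    with ThreadPoolExecutor(max_workers=len(segments) + 1) as pool:
        return list(pool.map(worker, segments))


# Example: three segments of a ten-frame sequence, counted in parallel
segments = split_evenly(list(range(10)), 3)
counts = process_segments(segments, len)
```

Because `pool.map` returns results in submission order, per-segment key frame indices or audio clips can be concatenated directly while preserving their order in the video. In CPython, threads mainly help when the per-segment work releases the global interpreter lock, as I/O and many native decoding libraries do.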

Step S25: inputting the key frame set into a pre-created image detection model to obtain an image type corresponding to the key frame, and determining a violation image according to the image type.

Step S26: converting the audio clips into text through speech recognition to obtain a text set, inputting the text set into a pre-created text auditing model to obtain audio types corresponding to the audio clips, and determining violation audio according to the audio types.

Step S27: and determining the time point of the violation image in the target teaching video, and determining the time period of the violation audio in the target teaching video.

For the specific processes from step S25 to step S27, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.

As can be seen from the above, in this embodiment, the thread pool is used to extract the key frame from the segmented image sequence according to the preset extraction rule to obtain the key frame set, and the thread pool is used to extract the audio clip from the segmented audio file based on the number of bytes to obtain the audio clip set.

Correspondingly, the embodiment of the present application further discloses a teaching video auditing device, as shown in fig. 4, the device includes:

the extraction module 11 is configured to extract a key frame from a target teaching video according to a preset extraction rule to obtain a key frame set, and extract an audio clip from the target teaching video based on the number of bytes to obtain an audio clip set;

the violation image detection module 12 is configured to input the key frame set to a pre-created image detection model, obtain an image type corresponding to the key frame, and determine a violation image according to the image type;

the violation audio detection module 13 is configured to convert the audio clips into text by using a voice recognition technology to obtain a text set, input the text set to a pre-created text auditing model to obtain an audio type corresponding to each audio clip, and determine violation audio according to the audio type;

and the time determination module 14 is configured to determine a time point of the violation image in the target teaching video, and determine a time period of the violation audio in the target teaching video.
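The way the four modules hand data to one another can be sketched as follows. This is only an illustrative composition under assumed interfaces: the class names `TeachingVideoAuditor` and `AuditResult`, the field names, and the choice to pass indices of violating items between modules are all hypothetical and are not specified by the application.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AuditResult:
    violation_image_times: list[float]                 # seconds
    violation_audio_periods: list[tuple[float, float]]  # (start, end) seconds


@dataclass
class TeachingVideoAuditor:
    # Module 11: video path -> (key frame set, audio clip set)
    extract: Callable[[str], tuple[list, list]]
    # Module 12: key frames -> indices of violation images (assumed interface)
    detect_images: Callable[[list], list[int]]
    # Module 13: audio clips -> indices of violation audio (assumed interface)
    detect_audio: Callable[[list], list[int]]
    # Module 14: violating indices -> time points and time periods
    locate: Callable[[list[int], list[int]], AuditResult]

    def audit(self, video_path: str) -> AuditResult:
        key_frames, audio_clips = self.extract(video_path)
        bad_frames = self.detect_images(key_frames)
        bad_clips = self.detect_audio(audio_clips)
        return self.locate(bad_frames, bad_clips)
```

Passing each module in as a callable keeps the sketch independent of any particular image detection model or speech recognition service, neither of which is fixed by this passage.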

As can be seen from the above, in this embodiment, the key frame set of the target teaching video is extracted according to a preset extraction rule, the audio clip set is rapidly extracted based on the number of bytes, violation images are detected through the image detection model, and violation audio is detected through speech-to-text conversion and the text auditing model. The target teaching video is thus examined from both the image and the audio aspects, the times at which the violation images and the violation audio occur in the target teaching video are finally determined, and the efficiency and accuracy of auditing the target teaching video are improved.

In some specific embodiments, the extraction module 11 may specifically include:

the audio clip extraction unit is used for converting the audio corresponding to the target teaching video into binary data according to a preset sampling rate to obtain audio data; determining a target byte number based on the preset sampling rate and a preset audio duration; and dividing the audio data into a plurality of data segments according to the target byte number, each data segment being an audio clip, so as to obtain the audio clip set;

the key frame extraction unit is used for extracting the first image frame from the target teaching video as a key frame; sequentially extracting a plurality of target image frames at a preset time interval, and sequentially calculating, based on a structural similarity algorithm, the similarity between each extracted target image frame and the adjacent key frame; and, if the similarity is smaller than a preset similarity threshold, taking the target image frame as a key frame, so as to obtain the key frame set.
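The two extraction units can be sketched as below. Several points are assumptions made for illustration rather than details from the application: the audio is taken to be 16-bit mono PCM (the `sample_width` and `channels` parameters), frames are flat lists of grayscale pixel values already sampled at the preset interval, and `global_ssim` is a simplified single-window form of structural similarity, whereas practical implementations usually average over local windows. All function names are hypothetical.

```python
def target_byte_count(sample_rate: int, clip_seconds: int,
                      sample_width: int = 2, channels: int = 1) -> int:
    """Bytes per clip from sampling rate and clip duration.

    sample_width and channels are assumptions (16-bit mono PCM).
    """
    return sample_rate * clip_seconds * sample_width * channels


def split_audio(audio_data: bytes, target_bytes: int) -> list[bytes]:
    """Divide the binary audio data into clips of target_bytes each."""
    return [audio_data[i:i + target_bytes]
            for i in range(0, len(audio_data), target_bytes)]


def global_ssim(a: list[float], b: list[float],
                dynamic_range: float = 255.0) -> float:
    """Simplified SSIM computed over the whole frame as one window."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))


def select_key_frames(frames: list[list[float]], threshold: float) -> list[int]:
    """Return indices of key frames among the sampled frames.

    The first frame is always a key frame; a later frame becomes a key
    frame when its similarity to the most recent key frame is below
    the threshold.
    """
    if not frames:
        return []
    key_indices = [0]
    for i in range(1, len(frames)):
        if global_ssim(frames[i], frames[key_indices[-1]]) < threshold:
            key_indices.append(i)
    return key_indices
```

Comparing each candidate against the most recently selected key frame is one reading of "adjacent key frame"; it means a slowly changing slide is kept once, while an abrupt scene change produces a new key frame.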

In some specific embodiments, the time determination module 14 may specifically include:

the violation image time point determining unit is used for determining the time point of the violation image in the target teaching video according to the frame number of the violation image in the target teaching video and the number of frames per second (frame rate) of the target teaching video;

and the violation audio time period determining unit is used for determining the time period of the violation audio in the target teaching video according to the target byte number and the preset sampling rate.
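The arithmetic behind the two units can be sketched as follows. The text states only that the frame number and frame rate give the time point, and that the target byte number and sampling rate give the time period. The use of a zero-based clip index, and the 16-bit mono PCM defaults (`sample_width`, `channels`), are assumptions added here so that the example is complete.

```python
def image_time_point(frame_number: int, frames_per_second: float) -> float:
    """Time point (seconds) of a frame: frame number divided by frame rate."""
    return frame_number / frames_per_second


def audio_time_period(clip_index: int, target_bytes: int, sample_rate: int,
                      sample_width: int = 2,
                      channels: int = 1) -> tuple[float, float]:
    """Start and end time (seconds) of the clip at a zero-based index.

    sample_width and channels are assumptions (16-bit mono PCM).
    """
    bytes_per_second = sample_rate * sample_width * channels
    clip_seconds = target_bytes / bytes_per_second
    start = clip_index * clip_seconds
    return start, start + clip_seconds
```

For example, frame 750 of a 25 frames-per-second video lies at 30 seconds, and the fourth clip (index 3) of 320000-byte clips sampled at 16000 Hz covers the period from 30 to 40 seconds under the stated assumptions.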

In some embodiments, the teaching video auditing device may further include:

the audio and video separation module is used for acquiring the target teaching video and separating the target teaching video to obtain a corresponding image sequence and an audio file;

and the segmentation module is used for segmenting the image sequence and the audio file to obtain a segmented image sequence and a segmented audio file.

Further, an embodiment of the present application also discloses an electronic device, shown in Fig. 5; the content of the drawing should not be regarded as limiting the scope of the present application.

Fig. 5 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 is used for storing a computer program, and the computer program is loaded and executed by the processor 21 to implement the relevant steps of the teaching video auditing method disclosed in any of the foregoing embodiments.

In this embodiment, the power supply 23 is configured to provide a working voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and a communication protocol followed by the communication interface is any communication protocol applicable to the technical solution of the present application, and is not specifically limited herein; the input/output interface 25 is configured to obtain external input data or output data to the outside, and a specific interface type thereof may be selected according to specific application requirements, which is not specifically limited herein.

In addition, the memory 22 serves as a carrier for resource storage and may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like. The resources stored on it include an operating system 221, a computer program 222, and data 223 that includes the target teaching video, and the storage may be either transient or permanent.

The operating system 221 is used for managing and controlling each hardware device and the computer program 222 on the electronic device 20, so that the processor 21 can operate on and process the mass data 223 in the memory 22; it may be Windows Server, Netware, Unix, Linux, or the like. In addition to the computer program used to perform the teaching video auditing method disclosed in any of the foregoing embodiments and executed by the electronic device 20, the computer program 222 may further include computer programs used to perform other specific tasks. The data 223 may include the target teaching video collected by the electronic device 20.

Further, an embodiment of the present application further discloses a computer storage medium, where computer-executable instructions are stored in the computer storage medium, and when the computer-executable instructions are loaded and executed by a processor, the steps of the teaching video auditing method disclosed in any of the foregoing embodiments are implemented.

The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be understood by reference to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Finally, it should also be noted that relational terms such as "first" and "second" are used herein solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

The teaching video auditing method, device, equipment and medium provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is only intended to help in understanding the method and core idea of the invention. Meanwhile, a person skilled in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A teaching video auditing method is characterized by comprising the following steps:

2. The teaching video auditing method of claim 1, wherein said extracting an audio clip from the target teaching video based on number of bytes to obtain an audio clip set comprises:

3. The teaching video auditing method of claim 1, wherein said extracting key frames from the target teaching video according to the preset extraction rule to obtain a key frame set comprises:

extracting a first image frame from the target teaching video as a key frame;

4. The teaching video auditing method of claim 1, wherein said extracting key frames from a target teaching video according to a preset extraction rule to obtain a key frame set, and extracting audio clips from the target teaching video based on byte count to obtain an audio clip set, comprises:

5. The teaching video auditing method of claim 1, wherein the process of creating the image detection model comprises:

6. The teaching video auditing method of claim 1, wherein the process of creating the text auditing model comprises:

7. The teaching video auditing method according to any one of claims 2 to 6, wherein said determining a time point of the violation image in the target teaching video and determining a time period of the violation audio in the target teaching video comprises:

8. A teaching video auditing device, comprising:

9. An electronic device, comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the teaching video auditing method of any one of claims 1 to 7.

10. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the teaching video auditing method of any one of claims 1 to 7.