CN112860943A - Teaching video auditing method, device, equipment and medium - Google Patents

Publication number
CN112860943A
Authority
CN
China
Prior art keywords
audio
image
teaching video
target
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110004554.9A
Other languages
Chinese (zh)
Inventor
陈甜甜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Nuonuo Network Technology Co ltd
Original Assignee
Zhejiang Nuonuo Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Nuonuo Network Technology Co ltd
Priority to CN202110004554.9A
Publication of CN112860943A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70: Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • G06F16/7844: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • G06F16/7867: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures

Abstract

The application discloses a teaching video auditing method, device, equipment and medium. The method comprises the following steps: extracting key frames from a target teaching video according to a preset extraction rule to obtain a key frame set, and extracting audio clips from the target teaching video based on the number of bytes to obtain an audio clip set; inputting the key frame set into a pre-established image detection model to obtain the image type corresponding to each key frame, and determining violation images according to the image types; converting the audio clips into text through speech recognition to obtain a text set, inputting the text set into a pre-established text auditing model to obtain the audio type corresponding to each audio clip, and determining violation audio according to the audio types; and determining the time point of each violation image in the target teaching video and the time period of each violation audio in the target teaching video. Because the image detection model and the text auditing model examine the video from both the image side and the audio side, the efficiency and accuracy of auditing the target teaching video are improved.

Description

Teaching video auditing method, device, equipment and medium
Technical Field
The invention relates to the field of video auditing, in particular to a teaching video auditing method, device, equipment and medium.
Background
With the continuous development of computer and Internet technologies, teaching modes are also being constantly updated. Online courses, live teaching and similar modes are widely accepted by parents and students. However, Internet teaching videos also carry various risks of violating content. Such videos hinder the construction of a healthy Internet environment and, more seriously, can harm the physical and mental health of viewers, especially teenagers. Moreover, because the images and text in a teaching video alternate and change frequently and the lecturers vary, violation detection for teaching videos is difficult. In the prior art, auditing is performed manually, which consumes a large amount of manpower and material resources. Therefore, how to detect violating content in teaching videos efficiently and accurately is a problem to be solved urgently.
Disclosure of Invention
In view of this, the present invention provides a teaching video auditing method, apparatus, device and medium, which can improve the efficiency and accuracy of target teaching video auditing. The specific scheme is as follows:
in a first aspect, the present application discloses a teaching video auditing method, including:
extracting key frames from a target teaching video according to a preset extraction rule to obtain a key frame set, and extracting audio clips from the target teaching video based on byte number to obtain an audio clip set;
inputting the key frame set into a pre-established image detection model to obtain an image type corresponding to the key frame, and determining a violation image according to the image type;
converting the audio clips into texts through a voice recognition technology to obtain a text set, inputting the text set into a pre-established text auditing model to obtain audio types corresponding to the audio clips, and determining violation audio according to the audio types;
and determining the time point of the violation image in the target teaching video, and determining the time period of the violation audio in the target teaching video.
Optionally, the extracting an audio clip from the target teaching video based on the number of bytes to obtain an audio clip set includes:
converting the audio corresponding to the target teaching video into binary data according to a preset sampling rate to obtain audio data;
determining the number of target bytes based on the preset sampling rate and the preset audio time length;
and dividing the audio data into a plurality of data segments according to the target byte number to obtain corresponding audio segments so as to obtain the audio segment set.
Optionally, the extracting key frames from the target teaching video according to the preset extraction rule to obtain a key frame set includes:
extracting a first image frame from the target teaching video as a key frame;
sequentially extracting multiple frames of target image frames according to a preset time interval, and sequentially calculating the similarity between the extracted target image frames and adjacent key frames based on a structural similarity algorithm;
and if the similarity is smaller than a preset similarity threshold, taking the target image frame as a key frame to obtain the key frame set.
Optionally, the extracting key frames from the target teaching video according to the preset extraction rule to obtain a key frame set, and extracting audio clips from the target teaching video based on the number of bytes to obtain an audio clip set includes:
acquiring the target teaching video, and separating the target teaching video to obtain a corresponding image sequence and an audio file;
segmenting the image sequence and the audio file to obtain a segmented image sequence and a segmented audio file;
extracting the key frames from the segmented image sequence by using a thread pool according to a preset extraction rule to obtain the key frame set;
and extracting the audio clip from the segmented audio file by utilizing a thread pool and based on the number of bytes to obtain the audio clip set.
Optionally, the process of creating the image detection model includes:
acquiring an illegal image, and adding corresponding type marking information to the illegal image; the type marking information comprises any one or more of pornography, violence, political sensitivity and vulgar;
grouping the illegal images containing the type marking information to obtain a training image set, a verification image set and a test image set;
and constructing a blank model based on an artificial neural network, and training and detecting the blank model by using the training image set, the verification image set and the test image set to obtain the image detection model.
Optionally, the process of creating the text audit model includes:
acquiring illegal text data, and adding corresponding type marking information to the illegal text data; the type marking information comprises any one or more of pornography, violence, political sensitivity and vulgar;
carrying out data cleaning and corpus preprocessing on the illegal text data, and extracting illegal keywords from the processed illegal text data;
grouping the illegal keywords to obtain a training data set, a verification data set and a test data set;
and constructing a blank model based on machine learning, and training and detecting the blank model by using the training data set, the verification data set and the test data set to obtain the text auditing model.
Optionally, the determining a time point of the violation image in the target teaching video and determining a time period of the violation audio in the target teaching video includes:
determining the time point of the violation image in the target teaching video according to the frame number of the violation image in the target teaching video and the number of frames transmitted by the target teaching video per second;
and determining the time period of the violation audio in the target teaching video according to the target byte number and the preset sampling rate.
In a second aspect, the present application discloses a teaching video auditing device, including:
the extraction module is used for extracting key frames from the target teaching video according to a preset extraction rule to obtain a key frame set, and extracting audio clips from the target teaching video based on the number of bytes to obtain an audio clip set;
the violation image detection module is used for inputting the key frame set into a pre-established image detection model to obtain an image type corresponding to the key frame, and determining a violation image according to the image type;
the illegal audio detection module is used for converting the audio clips into texts through a voice recognition technology to obtain a text set, inputting the text set into a pre-established text auditing model to obtain audio types corresponding to the audio clips, and determining illegal audio according to the audio types;
and the time determining module is used for determining the time point of the violation image in the target teaching video and determining the time period of the violation audio in the target teaching video.
In a third aspect, the present application discloses an electronic device, comprising:
a memory for storing a computer program;
and the processor is used for executing the computer program to realize the teaching video auditing method.
In a fourth aspect, the present application discloses a computer readable storage medium for storing a computer program; wherein the computer program when executed by the processor implements the teaching video auditing method described above.
According to the method, key frames are extracted from a target teaching video according to a preset extraction rule to obtain a key frame set, and audio clips are extracted from the target teaching video based on the number of bytes to obtain an audio clip set; the key frame set is input into a pre-established image detection model to obtain the image type corresponding to each key frame, and violation images are determined according to the image types; the audio clips are converted into text through speech recognition to obtain a text set, the text set is input into a pre-established text auditing model to obtain the audio type corresponding to each audio clip, and violation audio is determined according to the audio types; and the time point of each violation image and the time period of each violation audio in the target teaching video are determined. In this way, the key frame set is extracted by the preset extraction rule, the audio clip set is extracted quickly by byte count, violation images are detected by the image detection model, and violation audio is detected by speech-to-text conversion followed by the text auditing model. The target teaching video is therefore examined from multiple aspects, the times at which violation images and violation audio occur are determined, and the efficiency and accuracy of auditing the target teaching video are improved.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only embodiments of the present invention, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flowchart of a teaching video auditing method provided by the present application;
FIG. 2 is a flowchart of a key frame extraction method provided by the present application;
FIG. 3 is a flowchart of a specific teaching video auditing method provided by the present application;
FIG. 4 is a schematic structural diagram of a teaching video auditing device provided by the present application;
FIG. 5 is a block diagram of an electronic device provided by the present application.
Detailed Description
In the prior art, teaching videos are audited manually, which makes auditing inefficient. To overcome this technical problem, the application provides a teaching video auditing method that can improve the efficiency and accuracy of teaching video auditing.
The embodiment of the application discloses a teaching video auditing method which, as shown in FIG. 1, may comprise the following steps:
Step S11: extracting key frames from the target teaching video according to a preset extraction rule to obtain a key frame set, and extracting audio clips from the target teaching video based on byte number to obtain an audio clip set.
In this embodiment, first, a key frame is extracted from a target teaching video according to a preset extraction rule to obtain a key frame set, and an audio clip with a corresponding length is extracted from the target teaching video based on the number of bytes to obtain an audio clip set.
In this embodiment, the extracting key frames from the target teaching video according to the preset extraction rule to obtain the key frame set may include: extracting a first image frame from the target teaching video as a key frame; sequentially extracting multiple frames of target image frames according to a preset time interval, and sequentially calculating the similarity between the extracted target image frames and adjacent key frames based on a structural similarity algorithm; and if the similarity is smaller than a preset similarity threshold, taking the target image frame as a key frame to obtain the key frame set.
It is understood that the first image frame may be the first frame of the target teaching video, or any one of its first several frames. For example, as shown in FIG. 2, after the first image frame is taken as the first key frame, a second image frame is extracted at the preset time interval after the first key frame, and the similarity between the second image frame and the first key frame is calculated with the structural similarity algorithm. If the similarity is smaller than the preset similarity threshold, the second image frame is taken as the second key frame. If the similarity is greater than or equal to the preset similarity threshold, the second image frame is filtered out, a third image frame is extracted at the preset time interval after the second image frame, and the similarity between the third image frame and the first key frame is calculated. This compare-and-filter procedure is repeated until the whole target teaching video has been processed, and the frames that meet the condition form the key frame set. The preset time interval and the preset similarity threshold can be adjusted according to the video type and the auditing requirements. Alternatively, the similarity filter may discard the earlier of the two frames being compared rather than the later one. The resulting key frame set covers the main content of the target teaching video with less redundancy, which avoids the cost of frame-by-frame detection while preserving detection accuracy.
In this embodiment, before the similarity is calculated with the structural similarity algorithm, the image frames may be preprocessed, for example by scaling and noise reduction. For instance, two frames of the same size are scaled by the same ratio so that the longest side is kept at 460 or 380, and the two frames are then compared with the Structural Similarity (SSIM) algorithm to obtain their similarity.
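The compare-and-filter procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: frames are assumed to be already sampled at the preset time interval and flattened to grayscale pixel lists, the function names and the threshold 0.8 are hypothetical, and `global_ssim` evaluates the SSIM formula once over the whole frame instead of averaging it over sliding windows as full SSIM implementations usually do.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Simplified SSIM: one window covering the whole frame (an assumption)."""
    if len(x) != len(y) or not x:
        raise ValueError("frames must be non-empty and of equal size")
    n = len(x)
    # Standard SSIM stabilising constants for the given dynamic range.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


def select_key_frames(frames, threshold=0.8, similarity=global_ssim):
    """Return indices of key frames among frames sampled at the preset interval.

    A sampled frame becomes a key frame when its similarity to the most
    recent key frame is below the threshold; otherwise it is filtered out.
    """
    if not frames:
        return []
    key_indices = [0]  # the first sampled frame is always a key frame
    for i in range(1, len(frames)):
        last_key = frames[key_indices[-1]]
        if similarity(frames[i], last_key) < threshold:
            key_indices.append(i)
    return key_indices
```

Passing a different `similarity` callable (for example a windowed SSIM from an image library) leaves the selection logic unchanged.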
In this embodiment, the extracting an audio clip from the target teaching video based on the number of bytes to obtain an audio clip set may include: converting the audio corresponding to the target teaching video into binary data according to a preset sampling rate to obtain audio data; determining the number of target bytes based on the preset sampling rate and the preset audio time length; and dividing the audio data into a plurality of data segments according to the target byte number to obtain corresponding audio segments so as to obtain the audio segment set. The audio segments may be extracted continuously or at intervals. For example, the audio is converted into mono binary data at a sampling rate of 16 kHz; then, for a preset audio duration of 60 s, the number of bytes in each 60 s audio segment is determined to be 60 × 16000 × 2 = 1,920,000 (the factor 2 is consistent with 16-bit, i.e. 2-byte, samples), and the audio segments are extracted according to this number of bytes. Extracting audio clips by byte count speeds up extraction and thereby improves video auditing efficiency.
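The byte-count calculation and splitting can be illustrated with a short sketch. It assumes raw mono PCM data with 2 bytes per sample (which matches the factor 2 in the example above but is not stated explicitly in the text) and continuous extraction; the function names are hypothetical.

```python
def target_byte_count(sample_rate_hz, duration_s, bytes_per_sample=2, channels=1):
    """Bytes occupied by duration_s seconds of raw PCM audio.

    bytes_per_sample=2 (16-bit samples) and channels=1 (mono) are assumptions.
    """
    return sample_rate_hz * duration_s * bytes_per_sample * channels


def split_audio(pcm, segment_bytes):
    """Split raw PCM bytes into consecutive segments of segment_bytes each.

    The final segment may be shorter if the audio length is not a multiple.
    """
    if segment_bytes <= 0:
        raise ValueError("segment_bytes must be positive")
    return [pcm[i:i + segment_bytes] for i in range(0, len(pcm), segment_bytes)]


# Example with the values from the text: 16 kHz sampling, 60 s segments.
segment_size = target_byte_count(16000, 60)
```

Because the split is plain byte slicing, no decoding is needed per segment, which is the source of the speed-up the text refers to.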
Step S12: inputting the key frame set into a pre-established image detection model to obtain an image type corresponding to the key frame, and determining an illegal image according to the image type.
In this embodiment, after the key frame set is obtained, it is input into the pre-created image detection model to obtain the image type of each key frame, and violation images are determined according to the image types, where the image types include, but are not limited to, pornographic images, violent images, politically sensitive images, vulgar images and legal images. In this embodiment, the process of creating the image detection model may include: acquiring violation images, and adding corresponding type marking information to the violation images, where the type marking information comprises any one or more of pornography, violence, political sensitivity and vulgarity; grouping the violation images containing the type marking information to obtain a training image set, a verification image set and a test image set; and constructing a blank model based on an artificial neural network, and training and testing the blank model by using the training image set, the verification image set and the test image set to obtain the image detection model.
It can be understood that images of types such as pornographic, violent, politically sensitive, vulgar and disgusting are collected and manually reviewed and labeled to obtain violation images containing type marking information, which are then divided into a training image set, a verification image set and a test image set. A blank model is constructed based on an artificial neural network, which may be an 18-layer residual network (ResNet-18) comprising an input layer, convolution layers, pooling layers, a fully connected layer and an output layer. The blank model is then trained by forward and backward propagation using the training image set, the verification image set and the test image set, iterating until the model converges. Before the violation images are input into the model, image preprocessing including but not limited to noise reduction, brightness adjustment, stretching, flipping and scaling may be applied, which can effectively improve the generalization ability of the model. The scaling may specifically adjust the shortest side of the image to 224, crop a 224 × 224 image, and then apply mean preprocessing to the cropped image.
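The scaling step can be made concrete with a small geometry sketch. It only computes sizes and crop coordinates; the centre crop and the per-pixel mean subtraction are assumptions, since the text does not say where the 224 × 224 region is taken from or how the mean is applied. All function names are hypothetical.

```python
def resize_shortest_side(width, height, target=224):
    """Scale (width, height) so the shortest side equals target, keeping aspect ratio."""
    scale = target / min(width, height)
    return max(target, round(width * scale)), max(target, round(height * scale))


def center_crop_box(width, height, size=224):
    """Return (left, top, right, bottom) of a centred size x size crop (assumed centre crop)."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def subtract_mean(pixels, mean):
    """Mean preprocessing, assumed here to be subtraction of a dataset mean per pixel value."""
    return [p - mean for p in pixels]


# A 640 x 480 frame: the shortest side (480) becomes 224, then a 224 x 224 crop is taken.
new_w, new_h = resize_shortest_side(640, 480)
box = center_crop_box(new_w, new_h)
```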
Specifically, the image detection model is trained as follows. The training image set is input into a network model based on the ResNet-18 architecture, and the convolutional feature maps C2, C3, C4 and C5 are obtained during bottom-up forward propagation; batch normalization is applied after each convolution and before the activation function. Because the high-level semantic information of the network is richer, the feature map C5 is selected as the extracted feature, and a fully connected layer then outputs the predicted probability of each type for the current data sample. The learning rate is set to 1e-4 (0.0001) and the number of iterations to 20, and the parameters are updated iteratively. The loss is calculated as the cross entropy between the true value and the predicted value and is minimized with a preset optimization algorithm, which may be the Batch Gradient Descent algorithm; the optimizer then updates the parameters. Training can be stopped when the loss falls below a certain threshold, or the training state can be judged from the accuracy on the verification image set, so as to prevent overfitting.
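The loss and update rule mentioned above can be written out in a few lines. This is only a sketch of the arithmetic (softmax probabilities, cross-entropy loss against the true class, and one gradient descent step with the learning rate 1e-4 from the text); it is not a ResNet-18 training loop, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stabilised)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Cross entropy between a one-hot true label and the predicted probabilities."""
    return -math.log(probs[true_index])


def gradient_step(params, grads, learning_rate=1e-4):
    """One gradient descent update: each parameter moves against its gradient."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


# Five illustrative classes: pornographic, violent, politically sensitive, vulgar, legal.
probs = softmax([2.0, 0.5, 0.1, 0.1, 1.0])
loss = cross_entropy(probs, true_index=0)
```

A lower loss means the predicted probability of the true class is closer to 1; training stops once the loss or the validation accuracy meets the chosen criterion.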
Step S13: converting the audio clips into texts through a voice recognition technology to obtain a text set, inputting the text set into a pre-established text auditing model to obtain audio types corresponding to the audio clips, and determining illegal audios according to the audio types.
In this embodiment, after the audio clips are obtained, they are converted into text through speech recognition to obtain a text set, the text set is input into the pre-created text auditing model to obtain the audio type corresponding to each audio clip, and violation audio is determined according to the audio types, where the audio types include, but are not limited to, pornographic audio, violent audio, politically sensitive audio, vulgar audio, terror-related audio and legal audio.
In this embodiment, the process of creating the text auditing model may include: acquiring violation text data, and adding corresponding type marking information to the violation text data, where the type marking information comprises any one or more of pornography, violence, political sensitivity and vulgarity; carrying out data cleaning and corpus preprocessing on the violation text data, and extracting violation keywords from the processed text data; grouping the violation keywords to obtain a training data set, a verification data set and a test data set; and constructing a blank model based on machine learning, and training and testing the blank model by using the training data set, the verification data set and the test data set to obtain the text auditing model. Specifically, text data relating to pornography, abuse, political sensitivity, violence, terrorism and the like are collected and manually labeled, and then data cleaning and corpus preprocessing are performed, where the corpus preprocessing includes but is not limited to word segmentation, stop-word removal and conversion between traditional and simplified Chinese. Feature extraction is then performed on the text data; specifically, keywords can be extracted with the TF-IDF algorithm to obtain the keywords corresponding to the different types of text data, which are divided into a training data set, a verification data set and a test data set. The blank model constructed based on machine learning may be based on naive Bayes or on a decision tree. Step S12 and step S13 above may be performed simultaneously.
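Keyword extraction with TF-IDF can be sketched as below. The input is assumed to be already cleaned and tokenised (word segmentation and stop-word removal done), and the weighting used here (term frequency times the unsmoothed logarithm of inverse document frequency) is one common TF-IDF variant chosen for illustration; the text does not specify which variant is used. Function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(score, k=3):
    """Highest-scoring terms; ties are broken alphabetically for determinism."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy corpus: terms shared by every document get a score of zero.
docs = [["fight", "attack", "class"], ["class", "homework", "class"]]
doc_scores = tf_idf(docs)
```

The extracted keywords would then serve as features for the naive Bayes or decision tree classifier mentioned in the text.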
Step S14: determining the time point of the violation image in the target teaching video, and determining the time period of the violation audio in the target teaching video.
In this embodiment, after the violation image and the violation audio are obtained, the specific time of the violation image and the violation audio appearing in the target teaching video is determined, so as to generate an audit result of the target teaching video.
In this embodiment, the determining a time point of the violation image in the target teaching video and determining a time period of the violation audio in the target teaching video may include: determining the time point of the violation image in the target teaching video according to the frame number of the violation image in the target teaching video and the number of frames transmitted by the target teaching video per second; and determining the time period of the violation audio in the target teaching video according to the target byte number and the preset sampling rate. It can be understood that the time point of a violation image can be obtained by conversion from its frame number in the target teaching video and the frames per second (fps) of the target teaching video, and the time period of a violation audio can be obtained by conversion from the target byte number and the preset sampling rate. The position of a violation image and the time period of a violation audio can thus both be recorded with an accuracy of seconds. Finally, the auditing result of the target teaching video can be obtained by result polling: when a user submits the target teaching video for auditing, a unique ID is generated for it; a message digest algorithm is applied to the ID of each request task to obtain the corresponding MD5 code; and the result is then queried by the MD5 code to output the types of the violation images and violation audio, the specific positions of the violation images and the specific time periods of the violation audio.
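The two conversions and the MD5 task key can be sketched as follows. The sketch assumes 0-based frame numbers and segment indices, continuous (gap-free) audio segments, and 2-byte samples; the function names and the use of a UUID as the unique ID are illustrative assumptions, not details given in the text.

```python
import hashlib
import uuid


def frame_time_seconds(frame_number, fps):
    """Time point of a frame: frame number divided by frames per second."""
    return frame_number / fps


def audio_segment_period(segment_index, segment_bytes, sample_rate_hz, bytes_per_sample=2):
    """(start, end) in seconds of the segment at segment_index.

    Assumes segments are contiguous and bytes_per_sample matches the PCM format.
    """
    seconds_per_segment = segment_bytes / (sample_rate_hz * bytes_per_sample)
    start = segment_index * seconds_per_segment
    return start, start + seconds_per_segment


def task_key(task_id):
    """MD5 hex digest of a task ID, used as the key for polling the result."""
    return hashlib.md5(task_id.encode("utf-8")).hexdigest()


# Hypothetical request: a fresh unique ID and its polling key.
request_id = str(uuid.uuid4())
polling_key = task_key(request_id)
```

For example, frame 750 of a 25 fps video maps to the 30-second mark, and the third 1,920,000-byte segment at 16 kHz covers seconds 120 to 180.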
As can be seen from the above, in this embodiment, key frames are extracted from a target teaching video according to a preset extraction rule to obtain a key frame set, and audio clips are extracted from the target teaching video based on the number of bytes to obtain an audio clip set; the key frame set is input into a pre-established image detection model to obtain the image type corresponding to each key frame, and violation images are determined according to the image types; the audio clips are converted into text through speech recognition to obtain a text set, the text set is input into a pre-established text auditing model to obtain the audio type corresponding to each audio clip, and violation audio is determined according to the audio types; and the time point of each violation image and the time period of each violation audio in the target teaching video are determined. In this way, the key frame set is extracted by the preset extraction rule, the audio clip set is extracted quickly by byte count, violation images are detected by the image detection model, and violation audio is detected by speech-to-text conversion followed by the text auditing model. The target teaching video is therefore examined from multiple aspects, the times at which violation images and violation audio occur are finally determined, and the efficiency and accuracy of auditing the target teaching video are improved.
The embodiment of the application discloses a specific teaching video auditing method which, as shown in FIG. 3, may comprise the following steps:
Step S21: acquiring the target teaching video, and separating the target teaching video to obtain a corresponding image sequence and an audio file.
In this embodiment, the target teaching video is first acquired. Specifically, a complete video can be downloaded from the website to which the user uploaded it, or a live teaching video that is being played can be actively captured at timed intervals. After the target teaching video is obtained, audio-video separation is performed on it to obtain the corresponding image sequence and audio file.
Step S22: segmenting the image sequence and the audio file to obtain a segmented image sequence and a segmented audio file.
In this embodiment, after obtaining the image sequence and the audio file, the image sequence and the audio file may be simply segmented according to a preset segmentation quantity value to obtain more than 2 segmented image sequences and more than 2 segmented audio files; the preset segmentation quantity values corresponding to the image sequence and the audio file can be the same or different; the specific segmentation mode can be uniform segmentation, and can also be a non-uniform segmentation mode according to the actual situation.
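A minimal sketch of the uniform segmentation mode described above, assuming the image sequence is addressed by frame index and the audio file by byte offset. The helper name `split_uniform` and the example counts are hypothetical; the non-uniform mode is not covered.

```python
def split_uniform(total, parts):
    """Split range(total) into `parts` contiguous (start, end) pairs, end exclusive."""
    if parts < 1 or total < parts:
        raise ValueError("need 1 <= parts <= total")
    base, extra = divmod(total, parts)
    bounds, start = [], 0
    for i in range(parts):
        # The first `extra` segments take one additional element each.
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


# The two preset segmentation quantity values may differ, as the text allows.
frame_segments = split_uniform(9000, 4)      # 9000 frames into 4 segments
audio_segments = split_uniform(19200000, 6)  # 19,200,000 audio bytes into 6 segments
```

For raw audio it would additionally be sensible to round each boundary to a whole sample so that no sample is cut in half; that refinement is left out here.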
Step S23: extracting the key frames from the segmented image sequences by using a thread pool according to the preset extraction rule to obtain the key frame set.
In this embodiment, after the segmented image sequences are obtained, a thread pool is used to extract key frames from the multiple segmented image sequences simultaneously according to the preset extraction rule. Before the thread pool is used, the efficiency of key frame extraction can be ensured by setting the number of threads the pool can hold to be larger than the total number of segmented image sequences. The multithreaded processing of the thread pool thus improves the efficiency of video auditing, and in particular the speed of auditing long videos.
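The thread-pool arrangement of step S23 might look like the sketch below, which uses Python's standard `concurrent.futures.ThreadPoolExecutor`. The per-segment rule `extract_key_frames` is a placeholder (it simply keeps every fifth frame index) standing in for the preset extraction rule, and all names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_key_frames(segment):
    """Placeholder rule (hypothetical): keep every 5th frame index of the segment."""
    start, end = segment
    return list(range(start, end, 5))


def extract_all(segments):
    """Run the per-segment extraction in parallel and merge the results in order."""
    # As in the text, the pool can hold more threads than there are segments,
    # so no segment has to wait for a free thread.
    with ThreadPoolExecutor(max_workers=len(segments) + 1) as pool:
        # map() returns results in the order of `segments`.
        per_segment = list(pool.map(extract_key_frames, segments))
    return [idx for frames in per_segment for idx in frames]
```

In CPython, threads give the largest speed-up when the per-segment work is I/O-bound or runs in native code, such as video decoding inside a C library; a purely Python rule would be limited by the global interpreter lock.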
Step S24: extracting the audio clips from the segmented audio files by using a thread pool, based on the number of bytes, to obtain the audio clip set.
In this embodiment, after the segmented audio files are obtained, a thread pool is used to extract audio clips from the segmented audio files based on the number of bytes. Before the thread pool is used, the efficiency of audio clip extraction can be ensured by setting the number of threads the pool can hold to be larger than the total number of segmented audio files. The multithreaded processing of the thread pool thus improves the efficiency of video auditing, and in particular the speed of auditing long videos.
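Step S24 can be illustrated with the following sketch, which assumes the audio is raw PCM held in `bytes` objects. The text derives the target byte number from the preset sampling rate and the preset clip duration; the 2-byte sample width and single channel used here are added assumptions, as are all function names.

```python
from concurrent.futures import ThreadPoolExecutor


def target_byte_count(sample_rate, clip_seconds, bytes_per_sample=2, channels=1):
    """Bytes in one clip of raw PCM; sample width and channel count are assumed."""
    return int(sample_rate * bytes_per_sample * channels * clip_seconds)


def slice_segment(job):
    data, base_offset, clip_bytes = job
    # Each clip keeps its absolute byte offset so it can be mapped back to a time.
    return [(base_offset + i, data[i:i + clip_bytes])
            for i in range(0, len(data), clip_bytes)]


def extract_clips(segments, clip_bytes):
    """segments: list of (data, absolute_offset) pairs for the segmented audio."""
    jobs = [(data, offset, clip_bytes) for data, offset in segments]
    with ThreadPoolExecutor(max_workers=len(jobs) + 1) as pool:
        results = list(pool.map(slice_segment, jobs))
    return [clip for clips in results for clip in clips]
```

With a 16 kHz sampling rate and 10-second clips, `target_byte_count(16000, 10)` gives 320000 bytes per clip under these assumptions. The last clip of a segment may be shorter than the target byte number.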
Step S25: inputting the key frame set into a pre-created image detection model to obtain the image type corresponding to each key frame, and determining a violation image according to the image type.
Step S26: converting the audio clips into text by speech recognition to obtain a text set, inputting the text set into a pre-created text auditing model to obtain the audio type corresponding to each audio clip, and determining violation audio according to the audio type.
Step S27: determining the time point of the violation image in the target teaching video, and determining the time period of the violation audio in the target teaching video.
For the specific processes from step S25 to step S27, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
As can be seen from the above, in this embodiment, the thread pool is used to extract the key frame from the segmented image sequence according to the preset extraction rule to obtain the key frame set, and the thread pool is used to extract the audio clip from the segmented audio file based on the number of bytes to obtain the audio clip set.
Correspondingly, the embodiment of the present application further discloses a teaching video auditing device, as shown in fig. 4, the device includes:
the extraction module 11 is configured to extract a key frame from a target teaching video according to a preset extraction rule to obtain a key frame set, and extract an audio clip from the target teaching video based on the number of bytes to obtain an audio clip set;
the violation image detection module 12 is configured to input the key frame set to a pre-created image detection model, obtain an image type corresponding to the key frame, and determine a violation image according to the image type;
the violation audio detection module 13 is configured to convert the audio clips into text by speech recognition to obtain a text set, input the text set to a pre-created text auditing model to obtain the audio type corresponding to each audio clip, and determine violation audio according to the audio type;
and the time determining module 14 is configured to determine a time point of the violation image in the target teaching video, and determine a time period of the violation audio in the target teaching video.
As can be seen from the above, in this embodiment, the key frame set of the target teaching video is extracted by a preset extraction rule, the audio clip set is extracted quickly on a byte basis, violation images are detected by the image detection model, and violation audio is detected by speech-to-text conversion followed by the text auditing model. The target teaching video is therefore checked from several aspects, the times at which violation images and violation audio occur in the target teaching video are determined, and the efficiency and accuracy of auditing the target teaching video are improved.
In some specific embodiments, the extraction module 11 may specifically include:
the audio clip extraction unit is used for converting the audio corresponding to the target teaching video into binary data according to a preset sampling rate to obtain audio data; determining the number of target bytes based on the preset sampling rate and the preset audio time length; dividing the audio data into a plurality of data segments according to the target byte number to obtain corresponding audio segments so as to obtain the audio segment set;
the key frame extraction unit is used for extracting a first image frame from the target teaching video to serve as a key frame; sequentially extracting multiple frames of target image frames according to a preset time interval, and sequentially calculating the similarity between the extracted target image frames and adjacent key frames based on a structural similarity algorithm; and if the similarity is smaller than a preset similarity threshold, taking the target image frame as a key frame to obtain the key frame set.
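The key frame rule of the key frame extraction unit can be sketched as below. This is a simplified illustration, not the patented implementation: `global_ssim` computes the structural similarity over the whole frame as a single window (practical SSIM implementations usually average over sliding windows), frames are flat sequences of grayscale values, "adjacent key frame" is read as the most recently selected key frame, and the 0.8 threshold is an arbitrary example value.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM of two equal-length grayscale pixel sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Conventional stabilising constants of the SSIM formula.
    c1, c2 = (0.01 * dynamic_range) ** 2, (0.03 * dynamic_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def select_key_frames(frames, threshold=0.8):
    """Return the indices of the frames kept as key frames.

    `frames` holds the frames already sampled at the preset time interval.
    """
    if not frames:
        return []
    keys = [0]  # the first image frame is always a key frame
    for i in range(1, len(frames)):
        # A frame that differs enough from the latest key frame becomes a key frame.
        if global_ssim(frames[i], frames[keys[-1]]) < threshold:
            keys.append(i)
    return keys
```

Because a frame is only kept when its similarity falls below the threshold, runs of near-identical frames, which are common when a slide stays on screen, collapse to a single key frame.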
In some specific embodiments, the time determination module 14 may specifically include:
the violation image time point determining unit is used for determining the time point of the violation image in the target teaching video according to the frame number of the violation image in the target teaching video and the transmission frame number of the target teaching video per second;
and the violation audio time period determining unit is used for determining the time period of the violation audio in the target teaching video according to the target byte number and the preset sampling rate.
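The two time calculations above amount to simple arithmetic, sketched here under stated assumptions: frame numbers count from 0, the audio is raw PCM with an assumed 2-byte sample width and a single channel, and a violating clip is identified by its position `clip_index` in the sequence of clips. These names and defaults do not come from the source.

```python
def image_time_point(frame_number, fps):
    """Seconds from the start of the video for a frame number (0-based here)."""
    return frame_number / fps


def audio_time_period(clip_index, target_bytes, sample_rate,
                      bytes_per_sample=2, channels=1):
    """(start, end) in seconds of the clip_index-th clip of target_bytes each."""
    bytes_per_second = sample_rate * bytes_per_sample * channels
    clip_seconds = target_bytes / bytes_per_second
    start = clip_index * clip_seconds
    return start, start + clip_seconds
```

For example, frame 750 of a 25 frames-per-second video lies 30 seconds in, and the fourth clip (index 3) of 320000-byte clips at 16 kHz covers seconds 30 to 40 under these assumptions.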
In some embodiments, the teaching video auditing apparatus may specifically include:
the audio and video separation module is used for acquiring the target teaching video and separating the target teaching video to obtain a corresponding image sequence and an audio file;
and the segmentation module is used for segmenting the image sequence and the audio file to obtain a segmented image sequence and a segmented audio file.
Further, an embodiment of the present application also discloses an electronic device, as shown in Fig. 5; the content of the drawing should not be construed as limiting the scope of the application in any way.
Fig. 5 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present disclosure. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. Wherein, the memory 22 is used for storing a computer program, and the computer program is loaded and executed by the processor 21 to implement the relevant steps in the teaching video auditing method disclosed in any of the foregoing embodiments.
In this embodiment, the power supply 23 is configured to provide a working voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and a communication protocol followed by the communication interface is any communication protocol applicable to the technical solution of the present application, and is not specifically limited herein; the input/output interface 25 is configured to obtain external input data or output data to the outside, and a specific interface type thereof may be selected according to specific application requirements, which is not specifically limited herein.
In addition, the memory 22, as a carrier for resource storage, may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like; the resources stored on it include an operating system 221, a computer program 222, and data 223 including the target teaching video, and the storage may be transient or permanent.
The operating system 221 is used for managing and controlling each hardware device and the computer program 222 on the electronic device 20, so as to realize the operation and processing of the mass data 223 in the memory 22 by the processor 21, and may be Windows Server, Netware, Unix, Linux, and the like. The computer program 222 may further include a computer program that can be used to perform other specific tasks in addition to the computer program that can be used to perform the teaching video review method performed by the electronic device 20 disclosed in any of the foregoing embodiments. The data 223 may include a target instructional video captured by the electronic device 20.
Further, an embodiment of the present application further discloses a computer storage medium, where computer-executable instructions are stored in the computer storage medium, and when the computer-executable instructions are loaded and executed by a processor, the steps of the teaching video auditing method disclosed in any of the foregoing embodiments are implemented.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be cross-referenced. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is brief, and the relevant details can be found in the description of the method.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The teaching video auditing method, device, equipment and medium provided by the invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is intended only to help in understanding the method and the core idea of the invention. A person skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application; in summary, the content of this specification should not be construed as limiting the invention.

Claims (10)

1. A teaching video auditing method is characterized by comprising the following steps:
extracting key frames from a target teaching video according to a preset extraction rule to obtain a key frame set, and extracting audio clips from the target teaching video based on byte number to obtain an audio clip set;
inputting the key frame set into a pre-created image detection model to obtain an image type corresponding to the key frame, and determining a violation image according to the image type;
converting the audio clips into text by speech recognition to obtain a text set, inputting the text set into a pre-created text auditing model to obtain audio types corresponding to the audio clips, and determining violation audio according to the audio types;
and determining the time point of the violation image in the target teaching video, and determining the time period of the violation audio in the target teaching video.
2. The teaching video auditing method of claim 1, wherein said extracting an audio clip from the target teaching video based on number of bytes to obtain an audio clip set comprises:
converting the audio corresponding to the target teaching video into binary data according to a preset sampling rate to obtain audio data;
determining the number of target bytes based on the preset sampling rate and the preset audio time length;
and dividing the audio data into a plurality of data segments according to the target byte number to obtain corresponding audio segments so as to obtain the audio segment set.
3. The teaching video auditing method of claim 1, wherein said extracting key frames from the target teaching video according to the preset extraction rule to obtain the key frame set comprises:
extracting a first image frame from the target teaching video as a key frame;
sequentially extracting multiple frames of target image frames according to a preset time interval, and sequentially calculating the similarity between the extracted target image frames and adjacent key frames based on a structural similarity algorithm;
and if the similarity is smaller than a preset similarity threshold, taking the target image frame as a key frame to obtain the key frame set.
4. The teaching video auditing method of claim 1, wherein said extracting key frames from a target teaching video according to a preset extraction rule to obtain a key frame set, and extracting audio clips from the target teaching video based on byte count to obtain an audio clip set, comprises:
acquiring the target teaching video, and separating the target teaching video to obtain a corresponding image sequence and an audio file;
segmenting the image sequence and the audio file to obtain a segmented image sequence and a segmented audio file;
extracting the key frames from the segmented image sequence by using a thread pool according to a preset extraction rule to obtain the key frame set;
and extracting the audio clip from the segmented audio file by utilizing a thread pool and based on the number of bytes to obtain the audio clip set.
5. The teaching video auditing method of claim 1, wherein the process of creating the image detection model comprises:
acquiring violation images, and adding corresponding type marking information to the violation images; the type marking information comprises any one or more of pornography, violence, political sensitivity and vulgarity;
grouping the violation images containing the type marking information to obtain a training image set, a verification image set and a test image set;
and constructing a blank model based on an artificial neural network, and training and testing the blank model by using the training image set, the verification image set and the test image set to obtain the image detection model.
6. The teaching video auditing method of claim 1, wherein the process of creating the text auditing model comprises:
acquiring violation text data, and adding corresponding type marking information to the violation text data; the type marking information comprises any one or more of pornography, violence, political sensitivity and vulgarity;
performing data cleaning and corpus preprocessing on the violation text data, and extracting violation keywords from the processed violation text data;
grouping the violation keywords to obtain a training data set, a verification data set and a test data set;
and constructing a blank model based on machine learning, and training and testing the blank model by using the training data set, the verification data set and the test data set to obtain the text auditing model.
7. The teaching video auditing method according to any one of claims 2 to 6, wherein said determining the time point of the violation image in the target teaching video and determining the time period of the violation audio in the target teaching video comprises:
determining the time point of the violation image in the target teaching video according to the frame number of the violation image in the target teaching video and the number of frames transmitted by the target teaching video per second;
and determining the time period of the violation audio in the target teaching video according to the target byte number and the preset sampling rate.
8. A teaching video auditing device, comprising:
the extraction module is used for extracting key frames from the target teaching video according to a preset extraction rule to obtain a key frame set, and extracting audio clips from the target teaching video based on the number of bytes to obtain an audio clip set;
the violation image detection module is used for inputting the key frame set into a pre-established image detection model to obtain an image type corresponding to the key frame, and determining a violation image according to the image type;
the violation audio detection module is used for converting the audio clips into text by speech recognition to obtain a text set, inputting the text set into a pre-created text auditing model to obtain audio types corresponding to the audio clips, and determining violation audio according to the audio types;
and the time determining module is used for determining the time point of the violation image in the target teaching video and determining the time period of the violation audio in the target teaching video.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the teaching video auditing method of any one of claims 1 to 7.
10. A computer-readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements the teaching video auditing method of any one of claims 1 to 7.
CN202110004554.9A 2021-01-04 2021-01-04 Teaching video auditing method, device, equipment and medium Pending CN112860943A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110004554.9A CN112860943A (en) 2021-01-04 2021-01-04 Teaching video auditing method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110004554.9A CN112860943A (en) 2021-01-04 2021-01-04 Teaching video auditing method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN112860943A true CN112860943A (en) 2021-05-28

Family

ID=76001437

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110004554.9A Pending CN112860943A (en) 2021-01-04 2021-01-04 Teaching video auditing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN112860943A (en)

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060228029A1 (en) * 2005-03-29 2006-10-12 Microsoft Corporation Method and system for video clip compression
CN106341519A (en) * 2015-07-08 2017-01-18 腾讯科技(深圳)有限公司 Audio data processing method and device
CN105391708A (en) * 2015-11-02 2016-03-09 北京锐安科技有限公司 Audio data detection method and device
CN106708990A (en) * 2016-12-15 2017-05-24 腾讯音乐娱乐(深圳)有限公司 Music clip extraction method and device
CN108124191A (en) * 2017-12-22 2018-06-05 北京百度网讯科技有限公司 A kind of video reviewing method, device and server
CN108364660A (en) * 2018-02-09 2018-08-03 腾讯音乐娱乐科技(深圳)有限公司 Accent identification method, device and computer readable storage medium
CN108419091A (en) * 2018-03-02 2018-08-17 北京未来媒体科技股份有限公司 A kind of verifying video content method and device based on machine learning
CN110969066A (en) * 2018-09-30 2020-04-07 北京金山云网络技术有限公司 Live video identification method and device and electronic equipment
CN109495783A (en) * 2018-11-02 2019-03-19 平安科技(深圳)有限公司 Video reviewing method, device, electronic equipment and medium
CN109756746A (en) * 2018-12-28 2019-05-14 广州华多网络科技有限公司 Video reviewing method, device, server and storage medium
CN111382623A (en) * 2018-12-28 2020-07-07 广州市百果园信息技术有限公司 Live broadcast auditing method, device, server and storage medium
CN110852231A (en) * 2019-11-04 2020-02-28 云目未来科技(北京)有限公司 Illegal video detection method and device and storage medium
CN111225234A (en) * 2019-12-23 2020-06-02 广州市百果园信息技术有限公司 Video auditing method, video auditing device, equipment and storage medium
CN111105779A (en) * 2020-01-02 2020-05-05 标贝(北京)科技有限公司 Text playing method and device for mobile client
CN111462735A (en) * 2020-04-10 2020-07-28 网易(杭州)网络有限公司 Voice detection method and device, electronic equipment and storage medium
CN111797752A (en) * 2020-06-29 2020-10-20 广州市百果园信息技术有限公司 Illegal video detection method, device, equipment and storage medium
CN111813367A (en) * 2020-07-22 2020-10-23 广州繁星互娱信息科技有限公司 Method, device and equipment for adjusting volume and storage medium

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113689862A (en) * 2021-08-23 2021-11-23 南京优飞保科信息技术有限公司 Quality inspection method and system for customer service seat voice data
CN113689862B (en) * 2021-08-23 2024-03-22 南京优飞保科信息技术有限公司 Quality inspection method and system for customer service agent voice data
CN113850162A (en) * 2021-09-10 2021-12-28 北京百度网讯科技有限公司 Video auditing method and device and electronic equipment
WO2023035923A1 (en) * 2021-09-10 2023-03-16 北京百度网讯科技有限公司 Video checking method and apparatus and electronic device
CN114666618B (en) * 2022-03-15 2023-10-13 广州欢城文化传媒有限公司 Audio auditing method, device, equipment and readable storage medium
CN114612839A (en) * 2022-03-18 2022-06-10 壹加艺术(武汉)文化有限公司 Short video analysis processing method, system and computer storage medium
CN114612839B (en) * 2022-03-18 2023-10-31 壹加艺术(武汉)文化有限公司 Short video analysis processing method, system and computer storage medium
CN114979727A (en) * 2022-05-18 2022-08-30 雨果网(厦门)跨境电商有限公司 Advertisement violation gathering auditing system
CN115515002A (en) * 2022-09-22 2022-12-23 深圳市木愚科技有限公司 Intelligent admire class generation method and device based on virtual digital person and storage medium
CN115905584B (en) * 2023-01-09 2023-08-11 共道网络科技有限公司 Video splitting method and device
CN115905584A (en) * 2023-01-09 2023-04-04 共道网络科技有限公司 Video splitting method and device
CN116052222A (en) * 2023-03-06 2023-05-02 吉林大学 Cattle face recognition method for naturally collecting cattle face image
CN117727047A (en) * 2024-02-07 2024-03-19 深圳市多易得信息技术股份有限公司 AI-based large-model content security quality inspection processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210528