CN113051236B

CN113051236B - Method and device for auditing video and computer-readable storage medium

Info

Publication number: CN113051236B
Application number: CN202110255203.5A
Authority: CN
Inventors: 刘伟科; 韩卫召; 沈俊杰
Original assignee: Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2021-03-09
Filing date: 2021-03-09
Publication date: 2022-06-07
Anticipated expiration: 2041-03-09
Also published as: CN113051236A; WO2022188510A1

Abstract

The disclosure provides a method and a device for auditing videos and a computer-readable storage medium, and relates to the technical field of computers. In the disclosure, a video frame set corresponding to a video to be audited is obtained; the similarity between every two adjacent frames in the video frame set is calculated; based on the similarity, dissimilar video frames are screened out from the video frame set to form a representative frame set; and the representative frame set is output and audited. The representative frame set is extracted from the video frame set, which comprises all frames of the whole video to be audited, and contains only a small number of representative frames. Auditing the representative frame set yields the auditing result for the whole video, which saves manpower, improves efficiency, and maintains high accuracy. The method can therefore be applied to large-scale video auditing scenarios that require both high efficiency and high accuracy.

Description

Method and device for auditing video and computer-readable storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for auditing videos, and a computer-readable storage medium.

Background

Content auditing is an important part of keeping the internet healthy. At present, video auditing relies on a combination of spot checks and manual auditing: a spot-check frequency is set, one or more frames are extracted from the video to be audited at timed intervals, and the extracted frames are then audited manually.

Disclosure of Invention

In the related art, which combines spot checks with manual auditing, the efficiency and accuracy of the audit result depend on the spot-check frequency. If the spot-check frequency is set too high, efficiency is low; if it is set too low, content is easily missed and audit accuracy is low.

In the embodiments of the disclosure, a video frame set corresponding to a video to be audited is obtained; the similarity between every two adjacent frames in the video frame set is calculated; based on the similarity, dissimilar video frames are screened out from the video frame set to form a representative frame set; and the representative frame set is output and audited. The representative frame set is extracted from the video frame set, which comprises all frames of the whole video to be audited, and contains only a small number of representative frames. Auditing the representative frame set yields the auditing result for the whole video, which saves manpower, improves efficiency, and maintains high accuracy. The method can therefore be applied to large-scale video auditing scenarios that require both high efficiency and high accuracy.

According to some embodiments of the present disclosure, there is provided a method of reviewing a video, comprising: acquiring a video frame set corresponding to a video to be audited; calculating the similarity between every two adjacent frames in the video frame set; based on the similarity, screening out dissimilar video frames from the video frame set to form a representative frame set; and outputting the representative frame set, and auditing the representative frame set.

In some embodiments, the method further comprises: before the representative frame set is output, selecting one frame from the video frame set at every preset interval and adding the frame to the representative frame set.

In some embodiments, screening out dissimilar video frames from the set of video frames based on the similarity, and forming the representative frame set comprises: adding a first frame appearing in the video frame set to the representative frame set; and calculating the similarity between every two adjacent frames in the video frame set from the first frame, and adding the later frame in the adjacent frames into the representative frame set when the similarity is smaller than a preset value.

In some embodiments, said calculating the similarity between each two adjacent frames in the set of video frames comprises: calculating bit arrays corresponding to every two adjacent frames respectively; and determining the similarity between every two adjacent frames according to the Hamming distance between the bit arrays respectively corresponding to every two adjacent frames.

In some embodiments, calculating the bit array corresponding to a frame comprises: calculating the average value of the pixel values corresponding to all pixel points of the frame; setting the pixel points of the frame whose pixel values are greater than or equal to the average value to a first preset value, and setting the pixel points of the frame whose pixel values are less than the average value to a second preset value; and determining the array formed by the first preset values and the second preset values as the bit array corresponding to the frame.

In some embodiments, the set of video frames comprises a set of video frames consisting of original video frames or a set of video frames consisting of compressed video frames.

In some embodiments, the video frame set composed of the compressed video frames is obtained by parallel compression processing performed by a plurality of compression servers, where the parallel compression includes: acquiring the number of frames corresponding to a video to be audited; determining the number of frames to be processed of each compression server according to the number of the frames and the number of the compression servers; compressing the corresponding frame to be processed by using the compression server; and forming a video frame set by the compressed video frames output by all the compression servers.

In some embodiments, said compressing, by the compression server, the respective to-be-processed frame includes: dividing all pixel points of each frame to be processed into a plurality of groups by using the compression server, wherein each group corresponds to one pixel point in the compressed frame; calculating the average value of the pixel values corresponding to all the pixel points in each group; and taking the average value of the pixel values corresponding to all the pixel points in each group as the pixel value of the pixel point corresponding to the group in the compressed frame.

In some embodiments, in a case that the pixel points of the frame to be processed are represented by a pixel value of an R channel, a pixel value of a G channel, and a pixel value of a B channel, the calculating an average value of pixel values corresponding to all the pixel points in each group includes: calculating the average value of the pixel values of the R channels corresponding to all the pixel points in each group; calculating the average value of the pixel values of the G channels corresponding to all the pixel points in each group; calculating the average value of the pixel values of the B channels corresponding to all the pixel points in each group; wherein, the taking the average value of the pixel values corresponding to all the pixel points in each group as the pixel value of the pixel point corresponding to the group in the compressed frame comprises: and respectively taking the average value of the pixel values of the R channel, the average value of the pixel values of the G channel and the average value of the pixel values of the B channel corresponding to all the pixel points in each group as the pixel value of the R channel, the pixel value of the G channel and the pixel value of the B channel of the pixel point corresponding to the group in the compressed frame.

In some embodiments, obtaining a video frame set corresponding to a video to be audited includes: cutting the video to be audited into a plurality of sub-videos; processing the plurality of sub-videos in parallel to obtain a sub-frame set corresponding to each sub-video; generating an identifier for each frame in all the sub-frame sets by using the Snowflake algorithm; and arranging all frames in all the sub-frame sets in the order of their identifiers, the sorted frames forming the video frame set.

According to further embodiments of the present disclosure, there is provided an apparatus for auditing videos, including: the acquisition module is configured to acquire a video frame set corresponding to a video to be audited; a calculation module configured to calculate a similarity between each two adjacent frames in the set of video frames; the determining module is configured to screen out dissimilar video frames from the video frame set based on the similarity to form a representative frame set; and the auditing module is configured to output the representative frame set and audit the representative frame set.

According to still further embodiments of the present disclosure, there is provided an apparatus for auditing videos, including: a memory; and a processor coupled to the memory, the processor configured to perform the method of reviewing videos of any embodiment based on instructions stored in the memory.

According to still further embodiments of the disclosure, there is provided a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of reviewing a video of any of the embodiments.

Drawings

The drawings that will be used in the description of the embodiments or the related art will be briefly described below. The present disclosure will be more clearly understood from the following detailed description, which proceeds with reference to the accompanying drawings.

It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without undue inventive faculty.

Fig. 1 illustrates a flow diagram of a method of reviewing a video, in accordance with some embodiments of the present disclosure.

Fig. 2 illustrates a schematic diagram of the relationship among a video, shots, one-second shots, and video frames, according to some embodiments of the present disclosure.

Fig. 3 illustrates a flow diagram for parsing a video to be audited into a set of video frames, according to some embodiments of the present disclosure.

Fig. 4 illustrates a schematic diagram of a video frame set composed of compressed video frames resulting from parallel compression processing by a compression server according to some embodiments of the present disclosure.

Fig. 5 illustrates a schematic diagram of a compression process for a frame, according to some embodiments of the present disclosure.

Fig. 6a illustrates a schematic diagram of two frames being dissimilar, according to some embodiments of the present disclosure.

Fig. 6b shows a schematic diagram of two frames being similar, according to some embodiments of the present disclosure.

Fig. 7 illustrates a schematic diagram of calculating a similarity between every two adjacent frames in a set of video frames according to some embodiments of the present disclosure.

Fig. 8 illustrates a schematic diagram of an apparatus for reviewing videos, according to some embodiments of the present disclosure.

Fig. 9 shows a schematic diagram of an apparatus for reviewing videos, according to further embodiments of the present disclosure.

Detailed Description

The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure.

The descriptions of "first", "second", etc. in this disclosure are intended to refer to different objects, and are not intended to refer to the meaning of size or timing, etc., unless otherwise specified.

Fig. 1 illustrates a flow diagram of a method of reviewing a video, in accordance with some embodiments of the present disclosure. The method may be performed, for example, by an apparatus reviewing a video.

As shown in FIG. 1, the method of this embodiment includes steps 110-130 and 150 and, in some embodiments, further includes step 140.

In step 110, a video frame set corresponding to a video to be audited is obtained.

As shown in Fig. 2, a video may be composed of a plurality of shots, and a shot may be composed of a plurality of 1-second shots. A 1-second shot is typically composed of, for example, 24 frames, and each frame is composed of n pixels (commonly referred to as the picture size). Here, a frame is also referred to as an image or a picture.

Therefore, a general formula for the number of video frames may be, for example: number of frames = video duration (minutes) × 60 (seconds per minute) × 24 (frames per second). For example, a ten-minute video corresponds to 10 × 60 × 24 = 14400 frames.

In some embodiments, obtaining a video frame set corresponding to a video to be audited includes: cutting the video to be audited into a plurality of sub-videos; processing the plurality of sub-videos in parallel to obtain a sub-frame set corresponding to each sub-video; generating an identifier for each frame in all the sub-frame sets by using the Snowflake algorithm; and arranging all frames in all the sub-frame sets in the order of their identifiers, the sorted frames forming the video frame set.

Parsing the video to be audited into the video frame set in parallel reduces the time cost, improves processing efficiency, and meets the efficiency requirements of video auditing scenarios.
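The text names the Snowflake algorithm but does not say how identifiers are assigned so that sorting restores playback order across sub-videos. The following is only a minimal sketch of the widely used Snowflake layout (41-bit millisecond timestamp, 10-bit worker id, 12-bit sequence); the class name `SnowflakeIdGenerator`, the custom epoch, and the error handling are assumptions made for illustration and are not taken from the source.

```java
/** Minimal Snowflake-style ID generator (hypothetical helper, not from the source). */
final class SnowflakeIdGenerator {
    // Arbitrary custom epoch (2021-01-01T00:00:00Z); an assumption for this sketch.
    private static final long EPOCH_MS = 1609459200000L;
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId >= (1L << WORKER_BITS)) {
            throw new IllegalArgumentException("workerId out of range");
        }
        this.workerId = workerId;
    }

    /** Returns the next ID; IDs from one generator are strictly increasing. */
    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: spin until the next one.
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        // Layout: [timestamp delta | worker id | sequence]
        return ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

Because the timestamp occupies the high bits, identifiers produced by a single generator sort in generation order. Ordering frames produced by several parallel workers would additionally need a rule tying identifiers to frame position, which the text leaves open.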

As shown in FIG. 3, the method of this embodiment includes steps 310-330.

In step 310, the video to be audited is loaded by a parsing server; this may be implemented, for example, with the Java statement File f = new File($path).

In step 320, the video frames are acquired in sequence.

Wherein step 320 may include sub-steps 321-323.

In sub-step 321, a frame image is extracted as a Frame object by means of the multimedia framework FFmpeg.

In sub-step 322, the image in the Frame object is extracted as a Java in-memory image instance. For example, the image in the Frame object obtained through the grabImage method in FFmpeg can be extracted into a Java in-memory image by Java2DFrameConverter.

In sub-step 323, a video frame is generated from the Java in-memory image instance. For example, the extracted Java in-memory image instance may be rendered into a picture in a format such as jpg or png by the drawImage method via a BufferedImage object of the Java language, and stored in a specified folder. The continuity of the saved pictures may be guaranteed, for example, with timestamps and/or other continuity means.

Repeating sub-steps 321-323 parses the entire video to be audited into a video frame set.

In step 330, the frames are collected sequentially, and a video frame set is obtained.
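Sub-step 323 can be sketched with the Java standard library alone. The example below assumes the frame is already available as a BufferedImage (the FFmpeg grabbing of sub-steps 321-322 is omitted because it depends on an external library); the class name `FrameWriter`, the file-name pattern, and the use of a zero-padded frame index as the continuity mechanism are illustrative assumptions.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

/** Hypothetical helper that stores one in-memory frame as a PNG file. */
final class FrameWriter {
    /** Draws the frame into a fresh RGB image and saves it under a zero-padded index. */
    static File save(BufferedImage frame, File folder, long frameIndex) {
        BufferedImage copy = new BufferedImage(
                frame.getWidth(), frame.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = copy.createGraphics();
        try {
            g.drawImage(frame, 0, 0, null);
        } finally {
            g.dispose();
        }
        // Zero-padded names keep the files in playback order when sorted by name.
        File out = new File(folder, String.format("frame_%08d.png", frameIndex));
        try {
            if (!ImageIO.write(copy, "png", out)) {
                throw new IOException("no PNG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Calling `FrameWriter.save(image, folder, 7)` writes `frame_00000007.png` into the given folder; a timestamp could be used in the name instead of the index.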

In some embodiments, the set of video frames may comprise, for example, a set of video frames consisting of original video frames or a set of video frames consisting of compressed video frames.

The video frame set composed of compressed video frames is obtained by parallel compression processing performed by a plurality of compression servers. Parallel compression may include, for example: acquiring the number of frames corresponding to the video to be audited; determining the number of frames to be processed by each compression server according to the number of frames and the number of compression servers; compressing the corresponding frames to be processed by using each compression server; and forming a video frame set from the compressed video frames output by all the compression servers.

By compressing a frame, the number of pixel points of the frame is reduced, and the subsequent processing efficiency (for example, the reduction of the number of computations in the computation of the similarity) can be improved. By adopting a parallel compression method, the compression efficiency can be further improved, so that the efficiency of video auditing is improved, and the requirement of high efficiency is met.

For example, number of frames / number of compression servers = number of frames each compression server needs to compress. The number of compression servers may be set to a multiple of 24, for example, to simplify the calculation of compression efficiency. Assuming there are 240 compression servers, 10 seconds of video can be processed in each round of compression (240 frames / 24 frames per second = 10 seconds); if one round of compression takes 100 milliseconds, the performance improvement is 10 seconds / 100 milliseconds = 100 times (1 second = 1000 milliseconds).

As shown in Fig. 4, for example, 3 compression servers are used to perform parallel compression processing on the video to be audited. The video to be audited is parsed into original video frames; if, for example, the number of frames is 10, then number of frames / number of compression servers = 10 / 3 = 3 with a remainder of 1, so 2 compression servers each need to compress 3 frames and 1 compression server needs to compress 4 frames. Each compression server compresses its original video frames in parallel, yielding compressed video frames 1-10.
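The division with remainder described above can be sketched as follows. The class name `FrameAllocator` is hypothetical, and giving the leftover frames to the last servers is an assumption; the text only states that one of the three servers handles 4 frames.

```java
/** Hypothetical helper that splits a frame count across compression servers. */
final class FrameAllocator {
    /** Returns the number of frames per server; leftover frames go to the last servers. */
    static int[] allocate(int frameCount, int serverCount) {
        if (frameCount < 0 || serverCount <= 0) {
            throw new IllegalArgumentException("invalid frame or server count");
        }
        int base = frameCount / serverCount;      // frames every server receives
        int remainder = frameCount % serverCount; // frames left over after even split
        int[] perServer = new int[serverCount];
        for (int i = 0; i < serverCount; i++) {
            // The last `remainder` servers take one extra frame each.
            perServer[i] = base + (i >= serverCount - remainder ? 1 : 0);
        }
        return perServer;
    }
}
```

For 10 frames and 3 servers, `FrameAllocator.allocate(10, 3)` yields 3, 3, and 4 frames, matching the Fig. 4 example.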

In some embodiments, compressing, with the compression server, the respective to-be-processed frames comprises: dividing all pixel points of each frame to be processed into a plurality of groups by using a compression server, wherein each group corresponds to one pixel point in the compressed frame; calculating the average value of the pixel values corresponding to all the pixel points in each group; and taking the average value of the pixel values corresponding to all the pixel points in each group as the pixel value of the pixel point corresponding to the group in the compressed frame.

In some embodiments, in a case where the pixel points of the frame to be processed are represented by a pixel value of an R channel, a pixel value of a G channel, and a pixel value of a B channel, calculating the average value of the pixel values corresponding to all the pixel points in each group includes: calculating the average value of the R-channel pixel values of all the pixel points in each group; calculating the average value of the G-channel pixel values of all the pixel points in each group; and calculating the average value of the B-channel pixel values of all the pixel points in each group. In this case, taking the average value of the pixel values corresponding to all the pixel points in each group as the pixel value of the pixel point corresponding to the group in the compressed frame includes: taking the average R-channel, G-channel, and B-channel pixel values of all the pixel points in each group as, respectively, the R-channel, G-channel, and B-channel pixel values of the pixel point corresponding to the group in the compressed frame.

Fig. 5 illustrates a schematic diagram of a frame compression process according to some embodiments of the present disclosure.

As shown in Fig. 5, assume that a frame to be compressed has 4 pixels, identified as pixels 1-4, each represented by a pixel value of an R channel, a pixel value of a G channel, and a pixel value of a B channel. The RGB values of the pixels in hexadecimal are: 31ADF1, 31ADF1, 31ADF1, and 45FA8B.

The pixel values of the R channel, the G channel, and the B channel are averaged separately as follows:

On the R channel: (31 + 31 + 31 + 45) / 4 = 138 / 4 = 34 (rounded down).

On the G channel: (AD + AD + AD + FA) / 4, which in decimal is (173 + 173 + 173 + 250) / 4 = 192 (rounded down), or C0 in hexadecimal.

On the B channel: (F1 + F1 + F1 + 8B) / 4, which in decimal is (241 + 241 + 241 + 139) / 4 = 215 (rounded down), or D7 in hexadecimal.

The pixel value of the compressed output pixel 5 is therefore 34C0D7.
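The per-channel averaging of one pixel group can be sketched as below; the class name `PixelGroupAverager` and the packed 0xRRGGBB integer representation are assumptions for illustration. Note that when the R values are parsed strictly as hexadecimal (0x31 = 49, 0x45 = 69), the R average is 216 / 4 = 54 = 0x36, whereas the worked example arrives at 34 by adding the R digits as written; the G and B results (C0 and D7) agree with the example.

```java
/** Hypothetical helper that merges a group of RGB pixels into one pixel. */
final class PixelGroupAverager {
    /** Averages packed 0xRRGGBB pixels channel by channel using integer (floor) division. */
    static int averageRgb(int[] pixels) {
        if (pixels.length == 0) {
            throw new IllegalArgumentException("empty pixel group");
        }
        long r = 0, g = 0, b = 0;
        for (int p : pixels) {
            r += (p >> 16) & 0xFF; // R channel
            g += (p >> 8) & 0xFF;  // G channel
            b += p & 0xFF;         // B channel
        }
        int n = pixels.length;
        // Re-pack the three channel averages into one 0xRRGGBB value.
        return (int) (((r / n) << 16) | ((g / n) << 8) | (b / n));
    }
}
```

With the four pixels of Fig. 5, `PixelGroupAverager.averageRgb(new int[] {0x31ADF1, 0x31ADF1, 0x31ADF1, 0x45FA8B})` evaluates to 0x36C0D7.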

For example, according to the above method, every 4 pixels in each frame are compressed into one pixel in each pass, and after three passes the picture can be compressed to 1/64 of the original picture size; for example, a 1920 × 1280 picture is compressed to 30 × 20. The compression can be performed by a conventional server, and compressing one picture takes only about 100 milliseconds.

For example, the compression may be implemented as follows: a BufferedImage object of the Java language can read a specified pixel according to a height parameter and a width parameter, and can thereby read the pixel data of the whole picture; the object can also draw a picture from the pixel data that has been read. This implementation reads quickly and can meet the requirement of high efficiency.
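A whole-picture pass can be sketched with BufferedImage.getRGB and setRGB. The sketch assumes that each group is a 2 × 2 block, which the text does not state; the class name `BlockAverageCompressor` is likewise hypothetical. Under this assumption three passes reduce the pixel count to 1/64 (1920 × 1280 becomes 240 × 160), while the 30 × 20 figure in the text corresponds to 1/64 along each side, so the grouping used in the source may differ.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper that shrinks a picture by averaging 2x2 pixel blocks. */
final class BlockAverageCompressor {
    /** Halves width and height; each output pixel is the per-channel mean of a 2x2 block. */
    static BufferedImage halve(BufferedImage src) {
        int w = src.getWidth() / 2;
        int h = src.getHeight() / 2;
        if (w == 0 || h == 0) {
            throw new IllegalArgumentException("image too small to halve");
        }
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int r = 0, g = 0, b = 0;
                for (int dy = 0; dy < 2; dy++) {
                    for (int dx = 0; dx < 2; dx++) {
                        int p = src.getRGB(2 * x + dx, 2 * y + dy);
                        r += (p >> 16) & 0xFF;
                        g += (p >> 8) & 0xFF;
                        b += p & 0xFF;
                    }
                }
                // Integer division rounds each channel average down.
                dst.setRGB(x, y, ((r / 4) << 16) | ((g / 4) << 8) | (b / 4));
            }
        }
        return dst;
    }
}
```

Applying `halve` three times in a row gives the three-pass compression; any odd trailing row or column is dropped in this sketch.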

In step 120, the similarity between every two adjacent frames in the video frame set is calculated.

Calculating the similarity between every two adjacent frames in the video frame set comprises: calculating the bit array (i.e., bitArray) corresponding to each of the two adjacent frames; and determining the similarity between the two adjacent frames according to the Hamming distance between their bit arrays. The similarity calculation can be processed in parallel: after each server in the parallel processing completes its calculation, the similarity between the last frame handled by one server and the first frame handled by the adjacent server is determined. This can further improve processing efficiency.

Calculating the bit array corresponding to a frame comprises: calculating the average value of the pixel values corresponding to all pixel points of the frame; setting the pixel points of the frame whose pixel values are greater than or equal to the average value to a first preset value, and setting the pixel points of the frame whose pixel values are less than the average value to a second preset value; and determining the array formed by the first preset values and the second preset values as the bit array corresponding to the frame.

In some embodiments, in a case where the pixel points of the frame are represented by a pixel value of an R channel, a pixel value of a G channel, and a pixel value of a B channel, calculating the average value of the pixel values corresponding to all the pixel points of the frame includes: calculating the average of the R-channel, G-channel, and B-channel pixel values of all the pixel points of the frame. For example, a frame has 4 pixels, denoted 31ADF1, 31ADF1, 31ADF1, and 45FA8B (hexadecimal). The averages are calculated as follows: on the R channel, (31 + 31 + 31 + 45) / 4 = 34 (hexadecimal); on the G channel, (AD + AD + AD + FA) / 4 = C0 (hexadecimal); on the B channel, (F1 + F1 + F1 + 8B) / 4 = D7 (hexadecimal). The per-channel averages 34, C0, and D7 are then averaged, which gives 110.25 (decimal) in this example.

The bit value of each pixel is then calculated for each of the two frames: each pixel value x is compared in turn with the average value m; if the pixel value is greater than or equal to the average value, the pixel point is marked as 1, and if it is less than the average value, the pixel point is marked as 0. That is, the bit value can be expressed as x >= m ? 1 : 0. The marking results of all the pixels of a frame are written bit by bit into a bit array, for example [0,1,1,0,1,1,0,1,0,1].
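The mean-threshold marking can be sketched as follows. The class name `FrameBitArray` is hypothetical; the input is assumed to be one numeric value per pixel, and `channelMean` (the mean of the three channels of a packed 0xRRGGBB pixel) is only one possible way to obtain such a value.

```java
/** Hypothetical helper that turns a frame's pixel values into a bit array. */
final class FrameBitArray {
    /** Marks each pixel 1 if its value is greater than or equal to the frame mean, else 0. */
    static int[] toBitArray(double[] pixelValues) {
        if (pixelValues.length == 0) {
            throw new IllegalArgumentException("frame has no pixels");
        }
        double sum = 0;
        for (double v : pixelValues) {
            sum += v;
        }
        double mean = sum / pixelValues.length;
        int[] bits = new int[pixelValues.length];
        for (int i = 0; i < pixelValues.length; i++) {
            bits[i] = pixelValues[i] >= mean ? 1 : 0; // x >= m ? 1 : 0
        }
        return bits;
    }

    /** Collapses a packed 0xRRGGBB pixel to the mean of its three channels (an assumption). */
    static double channelMean(int rgb) {
        return (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3.0;
    }
}
```

For pixel values 10, 20, 30, and 40 the mean is 25, so `toBitArray` returns [0, 0, 1, 1]; a completely uniform frame maps to all ones because of the greater-than-or-equal comparison.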

In step 130, based on the similarity, dissimilar video frames are screened out from the video frame set to form a representative frame set.

Screening out dissimilar video frames from the video frame set based on the similarity to form a representative frame set comprises: adding the first frame appearing in the video frame set to the representative frame set; and, starting from the first frame, calculating the similarity between every two adjacent frames in the video frame set and adding the later frame of the two adjacent frames to the representative frame set when the similarity is smaller than a preset value.

In some embodiments, the last frame of a shot may also be added to the representative frame set. Two adjacent dissimilar frames added in this way are the first frames of two different shots, respectively. This step selects the representative frame set in a targeted manner, which can further improve audit accuracy.

By screening out dissimilar frames as representative frames, frames that cannot be caught by the naked eye or by a fixed sampling frequency alone can be identified, which avoids missed judgments and improves audit accuracy.

Suppose that two frames are considered dissimilar when more than 10% of the values in their bit arrays are inconsistent, i.e., the similarity is less than 90%, and are considered similar when 10% or fewer of the values are inconsistent, i.e., the similarity is greater than or equal to 90%. For bit arrays of 30 × 20 = 600 bits, the two frames are dissimilar if more than 60 bits are inconsistent and similar if 60 or fewer bits are inconsistent.

For example, the bit arrays corresponding to frame A and frame B each have 10 bits, of which 2 bits differ and 8 bits are the same, so the similarity is 8/10 = 80%. Since two frames are considered dissimilar when the similarity is less than 90%, frame A and frame B are dissimilar.

As another example, the bit arrays corresponding to frame A and frame B each have 10 bits, of which 1 bit differs and 9 bits are the same, so the similarity is 9/10 = 90%. Since two frames are considered similar when the similarity is not less than 90%, frame A and frame B are similar.
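The two worked examples can be reproduced with a short sketch that derives the similarity from the Hamming distance. The class name `FrameSimilarity` and the representation of a bit array as an int array of 0s and 1s are assumptions for illustration; the 90% threshold is the one used in the examples above.

```java
/** Hypothetical helper that compares two frames by their bit arrays. */
final class FrameSimilarity {
    /** Counts the positions at which the two bit arrays differ. */
    static int hammingDistance(int[] a, int[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("bit arrays must be non-empty and equally long");
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                distance++;
            }
        }
        return distance;
    }

    /** Similarity is the fraction of matching bits, between 0.0 and 1.0. */
    static double similarity(int[] a, int[] b) {
        return (double) (a.length - hammingDistance(a, b)) / a.length;
    }

    /** Two frames are similar when the similarity is not less than the threshold. */
    static boolean isSimilar(int[] a, int[] b, double threshold) {
        return similarity(a, b) >= threshold;
    }
}
```

With 10-bit arrays, 2 differing bits give a similarity of 0.8 (dissimilar at a 0.9 threshold) and 1 differing bit gives 0.9 (similar).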

Assume that the number of consecutive frames corresponding to the video to be audited is 5, which are respectively frame a, frame B, frame C, frame D, and frame E.

First, frame A is added to the representative frame set, and the similarity between frame A and frame B is calculated. If frame A and frame B are similar, the similarity between frame B and frame C is calculated; if frame B and frame C are similar, the similarity between frame C and frame D is calculated; and so on. If frame C and frame D are dissimilar, the dissimilar frame D is added to the representative frame set.

The two dissimilar frames A and D are the first frames of two different shots, respectively; that is, the first shot includes frame A, frame B, and frame C, and the second shot includes frame D and frame E. The last frame of the first shot, namely frame C, is also added to the representative frame set, and the process is repeated starting from frame D.
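The walk over frames A to E can be sketched as a single pass over adjacent pairs. The class name `RepresentativeFrameSelector`, the use of frame indices as the result, and the choice not to add the final frame of the last shot are assumptions; the text only describes adding the first frame, each dissimilar later frame, and the last frame of the shot that just ended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper that picks representative frame indices from bit arrays. */
final class RepresentativeFrameSelector {
    /** Returns sorted indices: frame 0, plus both frames around every dissimilar adjacent pair. */
    static List<Integer> select(List<int[]> bitArrays, double threshold) {
        TreeSet<Integer> chosen = new TreeSet<>();
        if (bitArrays.isEmpty()) {
            return new ArrayList<>(chosen);
        }
        chosen.add(0); // the first frame is always representative
        for (int i = 1; i < bitArrays.size(); i++) {
            if (similarity(bitArrays.get(i - 1), bitArrays.get(i)) < threshold) {
                chosen.add(i - 1); // last frame of the shot that just ended
                chosen.add(i);     // first frame of the new shot
            }
        }
        return new ArrayList<>(chosen);
    }

    /** Fraction of matching bits; both arrays are assumed to have the same non-zero length. */
    private static double similarity(int[] a, int[] b) {
        int same = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == b[i]) {
                same++;
            }
        }
        return (double) same / a.length;
    }
}
```

For five frames in which A, B, and C are alike and D and E are alike, the sketch returns indices 0, 2, and 3, i.e., frames A, C, and D.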

In step 140, one frame is selected from the video frame set at every preset interval and added to the representative frame set.

A representative frame is extracted at every preset interval, and the calculation formula may be, for example: total number of frames of the shot / 1440 = number of representative frames extracted at intervals (where 1440 is the number of frames in 1 minute of video), i.e., one representative frame is extracted every 1440 frames. Assuming a shot length of 8 minutes and 30 seconds, a total of 10 frames (frames 0, 1440, 2880, ..., 12240) need to be extracted.

Selecting one frame at every preset interval and adding it to the representative frame set guards against an excessive difference between the first and last frames of a shot and prevents violating frames within slowly changing consecutive frames from being overlooked, thereby reducing the probability of missed checks and improving audit accuracy.
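The interval sampling can be sketched as below. An 8 minute 30 second shot has 510 × 24 = 12240 frames, so the last entry of the listing equals the total frame count; the sketch therefore treats it as the end position and appends it after the multiples of the interval. That reading, like the class name `IntervalSampler`, is an assumption rather than something the text states.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists the frame positions sampled at a fixed interval. */
final class IntervalSampler {
    /** Returns 0, interval, 2 * interval, ... below totalFrames, followed by the end position. */
    static List<Integer> positions(int totalFrames, int interval) {
        if (totalFrames <= 0 || interval <= 0) {
            throw new IllegalArgumentException("totalFrames and interval must be positive");
        }
        List<Integer> result = new ArrayList<>();
        for (int p = 0; p < totalFrames; p += interval) {
            result.add(p);
        }
        // End position, following the "..., 12240" listing in the text (an interpretation).
        result.add(totalFrames);
        return result;
    }
}
```

For `positions(12240, 1440)` the sketch yields 10 positions, from 0 to 11520 in steps of 1440 plus 12240.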

In step 150, the representative frame set is output, and the representative frame set is audited.

Through the above steps, no more than 200 representative frames are extracted from a 100-minute video to be audited, and the audit can be completed within 1 minute.

In this embodiment, the representative frame set is extracted from the video frame set, which comprises all frames of the whole video to be audited, and contains only a small number of representative frames. Auditing the representative frame set yields the auditing result for the whole video, which saves manpower, improves efficiency, and maintains high accuracy. The method can be applied to large-scale video auditing scenarios, meets the requirements of high efficiency and high accuracy, and allows the business to be completed quickly, conveniently, and accurately.

As shown in fig. 8, the apparatus 800 for reviewing a video of this embodiment includes: an acquisition module 810, a calculation module 820, a determination module 830, and an audit module 840.

The obtaining module 810 is configured to obtain a video frame set corresponding to a video to be audited. The method for acquiring the video frame set corresponding to the video to be audited comprises the following steps: cutting a video to be audited into a plurality of sub-videos; performing parallel processing on a plurality of sub-videos to obtain a sub-frame set corresponding to each sub-video; utilizing a snowflake algorithm to generate an identifier of each frame in all the subframe sets; and aiming at all the subframe sets, arranging all frames in all the subframe sets according to the identification sequence, and forming a video frame set by all the sequenced frames.

The video frame set may include, for example, a video frame set composed of original video frames or a video frame set composed of compressed video frames. And a video frame set formed by the compressed video frames is obtained by parallel compression processing of a plurality of compression servers. Wherein the parallel compression comprises: acquiring the number of frames corresponding to a video to be audited; and determining the number of frames to be processed by each compression server according to the number of the frames and the number of the compression servers. Compressing the corresponding frame to be processed by using a compression server; and forming a video frame set by the compressed video frames output by all the compression servers.

The compression processing of the corresponding frame to be processed by using the compression server comprises the following steps: dividing all pixel points of each frame to be processed into a plurality of groups by using a compression server, wherein each group corresponds to one pixel point in the compressed frame; calculating the average value of the pixel values corresponding to all the pixel points in each group; and taking the average value of the pixel values corresponding to all the pixel points in each group as the pixel value of the pixel point corresponding to the group in the compressed frame. In some embodiments, in a case that the pixel points of the frame to be processed are represented as a pixel value of an R channel, a pixel value of a G channel, and a pixel value of a B channel, calculating an average value of pixel values corresponding to all the pixel points in each group includes: calculating the average value of the pixel values of the R channels corresponding to all the pixel points in each group; calculating the average value of the pixel values of the G channels corresponding to all the pixel points in each group; calculating the average value of the pixel values of the B channels corresponding to all the pixel points in each group; the method for compressing the frame comprises the following steps of (1) taking the average value of pixel values corresponding to all pixel points in each group as the pixel value of the corresponding pixel point in the group in the compressed frame, wherein the step of: and respectively taking the average value of the pixel values of the R channel, the average value of the pixel values of the G channel and the average value of the pixel values of the B channel corresponding to all the pixel points in each group as the pixel value of the R channel, the pixel value of the G channel and the pixel value of the B channel of the pixel point corresponding to the group in the compressed frame.

A calculation module 820 configured to calculate a similarity between every two adjacent frames in the video frame set. Calculating the similarity between every two adjacent frames in the video frame set comprises: calculating the bit array corresponding to each of the two adjacent frames; and determining the similarity between the two adjacent frames according to the Hamming distance between their respective bit arrays.
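By way of a hedged example, the Hamming-distance comparison might look like the sketch below. The disclosure states only that the similarity is determined according to the Hamming distance; the normalization used here (one minus the fraction of differing positions) is an assumption, and both function names are hypothetical.

```python
def hamming_distance(a: list[int], b: list[int]) -> int:
    """Count the positions at which two equal-length bit arrays differ."""
    if len(a) != len(b):
        raise ValueError("bit arrays must have the same length")
    return sum(x != y for x, y in zip(a, b))


def similarity(a: list[int], b: list[int]) -> float:
    """Map the Hamming distance to a score in [0, 1].

    Assumption: similarity = 1 - distance / length, so identical arrays
    score 1.0 and completely different arrays score 0.0.
    """
    if not a:
        return 1.0
    return 1.0 - hamming_distance(a, b) / len(a)
```

Under this assumed normalization, a smaller Hamming distance gives a higher similarity, which matches the later step of keeping a frame when its similarity to the preceding frame falls below a preset value.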

Calculating the bit array corresponding to a frame comprises: calculating the average value of the pixel values of all pixel points of the frame; setting the pixel points in the frame whose pixel values are greater than or equal to the average value to a first preset value, and setting the pixel points in the frame whose pixel values are less than the average value to a second preset value; and determining the array formed by the first preset values and the second preset values as the bit array corresponding to the frame. In some embodiments, in a case where the pixel points of the frame are represented by a pixel value of an R channel, a pixel value of a G channel, and a pixel value of a B channel, calculating the average value of the pixel values of all pixel points of the frame includes: averaging the R-channel pixel values, the G-channel pixel values, and the B-channel pixel values of all pixel points of the frame.
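The thresholding step can be illustrated with the following sketch. It assumes that the first preset value is 1 and the second preset value is 0, and that the pixel value of an RGB pixel point is taken as the mean of its three channels; neither choice is specified in the text, and the name `bit_array` is hypothetical.

```python
def bit_array(frame: list[list[tuple[int, int, int]]]) -> list[int]:
    """Compute a bit array for a frame by thresholding against the frame mean.

    Assumptions: each pixel point is reduced to the mean of its R, G and B
    channels; the first preset value is 1 and the second preset value is 0.
    """
    # One scalar value per pixel point, in row-major order.
    values = [sum(pixel) / 3 for row in frame for pixel in row]
    mean = sum(values) / len(values)
    # Pixel points at or above the mean map to 1, the rest to 0.
    return [1 if value >= mean else 0 for value in values]
```

Because the threshold is the frame's own mean, a uniform brightness shift applied to the whole frame leaves the bit array largely unchanged, so adjacent frames that differ only in exposure tend to compare as similar.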

A determining module 830 configured to screen out dissimilar video frames from the video frame set based on the similarity to form a representative frame set. Screening out dissimilar video frames from the video frame set based on the similarity to form the representative frame set comprises: adding the first frame in the video frame set to the representative frame set; and, starting from the first frame, calculating the similarity between every two adjacent frames in the video frame set, and adding the later of the two adjacent frames to the representative frame set when the similarity is smaller than a preset value.
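The screening procedure above may be sketched as follows. The similarity function is passed in as a parameter so that the sketch stays independent of how similarity is computed; the names `select_representatives` and `threshold` are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def select_representatives(
    frames: Sequence[T],
    similarity: Callable[[T, T], float],
    threshold: float,
) -> list[T]:
    """Keep the first frame, plus every frame that differs from its predecessor.

    A frame is kept when its similarity to the immediately preceding frame
    is smaller than `threshold` (the preset value).
    """
    if not frames:
        return []
    representatives = [frames[0]]
    for previous, current in zip(frames, frames[1:]):
        if similarity(previous, current) < threshold:
            representatives.append(current)
    return representatives
```

Note that each frame is compared with its immediate predecessor in the video frame set, not with the most recently selected representative frame, which follows the adjacent-frame wording of the description.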

An auditing module 840 configured to output the representative frame set and audit the representative frame set.

As shown in Fig. 9, the apparatus 900 for auditing videos of this embodiment includes: a memory 910 and a processor 920 coupled to the memory 910, the processor 920 being configured to perform the method for auditing videos in any of the embodiments of the present disclosure based on instructions stored in the memory 910.

The memory 910 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.

The apparatus 900 for auditing videos may further include an input/output interface 930, a network interface 940, a storage interface 950, and the like. These interfaces 930, 940, 950, the memory 910, and the processor 920 may be connected, for example, by a bus 960. The input/output interface 930 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 940 provides a connection interface for various networking devices. The storage interface 950 provides a connection interface for external storage devices such as an SD card and a USB flash drive.

As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more non-transitory computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer program code embodied therein.

The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only exemplary of the present disclosure and is not intended to limit the present disclosure, so that any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims

1. A method of reviewing a video, comprising:

obtaining a video frame set corresponding to a video to be audited, comprising:

cutting the video to be audited into a plurality of sub-videos;

performing parallel processing on a plurality of sub-videos to obtain a sub-frame set corresponding to each sub-video;

generating an identifier for each frame in all the sub-frame sets by using a snowflake algorithm;

for all the sub-frame sets, arranging all frames in all the sub-frame sets in order of their identifiers, and forming the video frame set from all the sorted frames;

calculating the similarity between every two adjacent frames in the video frame set, including calculating the similarities in parallel on a plurality of servers and determining, for every two adjacent servers, the similarity between the tail frame on one server and the first frame on the other server;

based on the similarity, screening out dissimilar video frames from the video frame set to form a representative frame set;

and outputting the representative frame set, and auditing the representative frame set.

2. A method of reviewing a video as recited in claim 1, further comprising:

before the set of representative frames is output,

and selecting one frame from the video frame set at preset intervals, and adding the frame into the representative frame set.

3. A method for reviewing videos as recited in claim 1, wherein screening out dissimilar video frames from a set of video frames based on the similarity, forming a representative frame set comprises:

adding a first frame appearing in the video frame set to the representative frame set;

and calculating the similarity between every two adjacent frames in the video frame set from the first frame, and adding the later frame in the adjacent frames into the representative frame set when the similarity is smaller than a preset value.

4. A method of reviewing a video as recited in claim 1, wherein said calculating a similarity between each two adjacent frames in the set of video frames comprises:

calculating bit arrays corresponding to every two adjacent frames respectively;

and determining the similarity between every two adjacent frames according to the Hamming distance between the bit arrays respectively corresponding to every two adjacent frames.

5. A method of reviewing a video as recited in claim 4, wherein calculating the bit array corresponding to a frame comprises:

calculating the average value of pixel values corresponding to all pixel points of the frame;

setting the pixel points of which the pixel values are greater than or equal to the average value in the frame as a first preset value, and setting the pixel points of which the pixel values are less than the average value in the frame as a second preset value;

and determining an array obtained by the first preset value and the second preset value as a bit array corresponding to the frame.

6. A method of reviewing videos as recited in claim 1,

the video frame set comprises a video frame set formed by original video frames or a video frame set formed by compressed video frames.

7. A method for auditing videos according to claim 6, wherein a video frame set consisting of the compressed video frames is obtained by parallel compression processing performed by a plurality of compression servers, wherein the parallel compression includes:

acquiring the number of frames corresponding to a video to be audited;

determining the number of frames to be processed of each compression server according to the number of the frames and the number of the compression servers;

compressing the corresponding frame to be processed by using the compression server;

and forming a video frame set by the compressed video frames output by all the compression servers.

8. A method of reviewing a video as claimed in claim 7, wherein said compressing, with said compression server, respective frames to be processed comprises:

dividing all pixel points of each frame to be processed into a plurality of groups by using the compression server, wherein each group corresponds to one pixel point in the compressed frame;

calculating the average value of the pixel values corresponding to all the pixel points in each group;

and taking the average value of the pixel values corresponding to all the pixel points in each group as the pixel value of the pixel point corresponding to the group in the compressed frame.

9. A method of reviewing a video, according to claim 8,

under the condition that the pixel points of the frame to be processed are represented by the pixel values of the R channel, the G channel and the B channel, the calculating the average value of the pixel values corresponding to all the pixel points in each group comprises:

calculating the average value of the pixel values of the R channels corresponding to all the pixel points in each group;

calculating the average value of the pixel values of the G channels corresponding to all the pixel points in each group;

calculating the average value of the pixel values of the B channels corresponding to all the pixel points in each group;

wherein, the taking the average value of the pixel values corresponding to all the pixel points in each group as the pixel value of the pixel point corresponding to the group in the compressed frame comprises:

and respectively taking the average value of the pixel values of the R channel, the average value of the pixel values of the G channel and the average value of the pixel values of the B channel corresponding to all the pixel points in each group as the pixel value of the R channel, the pixel value of the G channel and the pixel value of the B channel of the pixel point corresponding to the group in the compressed frame.

10. An apparatus for auditing videos, comprising:

an acquisition module configured to acquire a video frame set corresponding to a video to be audited, including: cutting the video to be audited into a plurality of sub-videos; performing parallel processing on the plurality of sub-videos to obtain a sub-frame set corresponding to each sub-video; generating an identifier for each frame in all the sub-frame sets by using a snowflake algorithm; and, for all the sub-frame sets, arranging all frames in all the sub-frame sets in order of their identifiers, and forming the video frame set from all the sorted frames;

a computing module configured to calculate the similarity between every two adjacent frames in the video frame set, including calculating the similarities in parallel on a plurality of servers and determining, for every two adjacent servers, the similarity between the tail frame on one server and the first frame on the other server;

a determining module configured to screen out dissimilar video frames from the video frame set based on the similarity to form a representative frame set;

and an auditing module configured to output the representative frame set and audit the representative frame set.

11. An apparatus for reviewing a video, comprising:

a memory; and

a processor coupled to the memory, the processor configured to perform the method of reviewing a video of any one of claims 1-9 based on instructions stored in the memory.

12. A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of reviewing a video of any one of claims 1-9.
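For illustration only, the parallel similarity calculation recited in claims 1 and 10, including the comparison of the tail frame of one server with the first frame of the adjacent server, might be sketched as follows. Threads stand in for servers here; the chunking rule and the names `adjacent_similarities` and `parallel_adjacent_similarities` are hypothetical and form no part of the claims.

```python
from concurrent.futures import ThreadPoolExecutor


def adjacent_similarities(frames, similarity):
    """Similarity of every two adjacent frames within one chunk."""
    return [similarity(a, b) for a, b in zip(frames, frames[1:])]


def parallel_adjacent_similarities(frames, similarity, num_workers):
    """Compute all adjacent-frame similarities using several workers.

    Assumption: frames are split into contiguous chunks of near-equal size,
    one chunk per worker (a thread stands in for a server).
    """
    size = max(1, -(-len(frames) // num_workers))  # ceiling division
    chunks = [frames[i:i + size] for i in range(0, len(frames), size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        inner = list(pool.map(lambda c: adjacent_similarities(c, similarity), chunks))
    result = []
    for index, sims in enumerate(inner):
        result.extend(sims)
        if index + 1 < len(chunks):
            # Boundary step: tail frame of this chunk against the first
            # frame of the next chunk, which no single worker has seen.
            result.append(similarity(chunks[index][-1], chunks[index + 1][0]))
    return result
```

Without the boundary step, the pair of frames straddling two chunks would never be compared, and a scene change falling exactly on a chunk boundary would be missed. With it, the result matches what a single sequential pass over the whole video frame set would produce.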