WO2022188510A1 - Method, apparatus, and computer-readable storage medium for reviewing video - Google Patents

Method, apparatus, and computer-readable storage medium for reviewing video

Info

Publication number
WO2022188510A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
frame
frames
frame set
pixel
Prior art date
Application number
PCT/CN2021/140756
Other languages
English (en)
French (fr)
Inventor
刘伟科
韩卫召
沈俊杰
Original Assignee
北京沃东天骏信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司
Publication of WO2022188510A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73Querying
    • G06F16/735Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/174Redundancy elimination performed by the file system
    • G06F16/1744Redundancy elimination performed by the file system using compression, e.g. sparse files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Definitions

  • the present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, and a computer-readable storage medium for reviewing video.
  • the auditing method for the video field combines random inspection with manual auditing: a sampling frequency is set, one or more frames are periodically extracted from the video to be reviewed, and the extracted frames are then reviewed manually.
  • with the combination of random inspection and manual auditing, the efficiency and accuracy of the audit results are correlated with the sampling frequency. If the sampling frequency is set too high, efficiency is low. If the sampling frequency is set too low, frames are easily missed, which in turn leads to low audit accuracy.
  • a video frame set corresponding to the video to be reviewed is obtained; the similarity between every two adjacent frames in the video frame set is calculated; based on the similarity, dissimilar video frames are selected from the video frame set to form a representative frame set; the representative frame set is output, and the representative frame set is reviewed.
  • the representative frame set is extracted according to the video frame set containing all frames corresponding to the entire video to be reviewed.
  • the representative frame set includes a few representative frames.
  • a method for reviewing a video, including: acquiring a video frame set corresponding to a video to be reviewed; calculating the similarity between every two adjacent frames in the video frame set; based on the similarity, selecting dissimilar video frames from the video frame set to form a representative frame set; outputting the representative frame set, and reviewing the representative frame set.
  • the method further includes: before outputting the representative frame set, selecting a frame from the video frame set at every preset interval and adding it to the representative frame set.
  • screening out dissimilar video frames from the video frame set based on the similarity to form the representative frame set includes: adding the first frame appearing in the video frame set to the representative frame set; and, starting from the first frame, calculating the similarity between every two adjacent frames in the video frame set and, when the similarity is less than the preset value, adding the later of the two adjacent frames to the representative frame set.
  • the calculating the similarity between every two adjacent frames in the video frame set includes: calculating the bit array corresponding to each of the two adjacent frames; and determining the similarity between the two adjacent frames according to the Hamming distance between their corresponding bit arrays.
  • the calculating a bit array corresponding to a frame includes: calculating the average value of the pixel values corresponding to all pixel points in the frame; setting the pixel points in the frame whose pixel values are greater than or equal to the average value to the first preset value, and setting the pixel points in the frame whose pixel values are less than the average value to the second preset value; and determining the array obtained from the first preset value and the second preset value as the bit array corresponding to the frame.
  • the video frame set includes a video frame set composed of original video frames or a video frame set composed of compressed video frames.
  • the video frame set composed of the compressed video frames is obtained through parallel compression processing by multiple compression servers, wherein the parallel compression includes: acquiring the number of frames corresponding to the video to be reviewed; determining the number of frames to be processed by each compression server according to the number of frames and the number of compression servers; using each compression server to compress its corresponding frames to be processed; and forming the compressed video frames output by all the compression servers into a video frame set.
  • using the compression server to perform compression processing on the corresponding frame to be processed includes: using the compression server to divide all pixel points of each frame to be processed into multiple groups, each group corresponding to one pixel point in the compressed frame; calculating the average value of the pixel values corresponding to all the pixel points in each group; and taking the average value of the pixel values corresponding to all the pixel points in each group as the pixel value of the pixel point corresponding to that group in the compressed frame.
  • the calculating the average value of the pixel values corresponding to all the pixel points in each group includes: calculating the average value of the R-channel pixel values of all the pixel points in each group; calculating the average value of the G-channel pixel values of all the pixel points in each group; and calculating the average value of the B-channel pixel values of all the pixel points in each group; wherein taking the average value of the pixel values corresponding to all the pixel points in each group as the pixel value of the pixel point corresponding to that group in the compressed frame includes: taking the average R-channel, G-channel, and B-channel pixel values of all the pixel points in each group as, respectively, the R-channel, G-channel, and B-channel pixel values of the pixel point corresponding to that group in the compressed frame.
  • the acquiring the video frame set corresponding to the video to be reviewed includes: cropping the video to be reviewed into multiple sub-videos; performing parallel processing on the multiple sub-videos to obtain a sub-frame set corresponding to each sub-video; using a snowflake algorithm to generate an identifier for each frame in all sub-frame sets; and arranging all frames in all sub-frame sets in the order of their identifiers, the sorted frames constituting the video frame set.
  • an apparatus for reviewing video, comprising: an obtaining module configured to obtain a video frame set corresponding to the video to be reviewed; a calculation module configured to calculate the similarity between every two adjacent frames in the video frame set; a determining module configured to screen out dissimilar video frames from the video frame set based on the similarity to form a representative frame set; and a review module configured to output the representative frame set and review the representative frame set.
  • an apparatus for reviewing video comprising: a memory; and a processor coupled to the memory, the processor configured to execute, based on instructions stored in the memory, The method for reviewing a video according to any one of the embodiments.
  • a non-transitory computer-readable storage medium having a computer program stored thereon, and when the program is executed by a processor, implements the method for reviewing a video according to any one of the embodiments.
  • FIG. 1 shows a schematic flowchart of a method for reviewing a video according to some embodiments of the present disclosure.
  • FIG. 2 shows a schematic diagram of the relationship between video and sub-shots, one-second shots, and video frames according to some embodiments of the present disclosure.
  • FIG. 3 shows a schematic flowchart of parsing a video to be reviewed into a set of video frames according to some embodiments of the present disclosure.
  • FIG. 4 shows a schematic diagram of a video frame set composed of compressed video frames obtained by parallel compression processing by a compression server according to some embodiments of the present disclosure.
  • FIG. 5 illustrates a schematic diagram of a frame compression process according to some embodiments of the present disclosure.
  • Figure 6a shows a schematic diagram of two frames being dissimilar according to some embodiments of the present disclosure.
  • Figure 6b shows a schematic diagram of two frames being similar according to some embodiments of the present disclosure.
  • FIG. 7 illustrates a schematic diagram of calculating the similarity between every two adjacent frames in a set of video frames according to some embodiments of the present disclosure.
  • FIG. 8 shows a schematic diagram of an apparatus for reviewing video according to some embodiments of the present disclosure.
  • FIG. 9 shows a schematic diagram of an apparatus for reviewing video according to other embodiments of the present disclosure.
  • FIG. 1 shows a schematic flowchart of a method for reviewing a video according to some embodiments of the present disclosure.
  • the method may, for example, be performed by a device for reviewing video.
  • the method of this embodiment includes steps 110-130 and 150, and in some embodiments further includes step 140.
  • in step 110, a video frame set corresponding to the video to be reviewed is acquired.
  • FIG. 2 shows a schematic diagram of the relationship between video and sub-shots, one-second shots, and video frames according to some embodiments of the present disclosure.
  • a video can be composed of multiple sub-shots, and a sub-shot can be composed of multiple 1-second shots.
  • a 1-second shot can be composed of, for example, 24 frames, and an image is composed of n pixels (the so-called picture size).
  • a frame is also called an image or a picture.
  • acquiring a set of video frames corresponding to the video to be reviewed includes: cropping the video to be reviewed into multiple sub-videos; performing parallel processing on the multiple sub-videos to obtain a set of sub-frames corresponding to each sub-video; using a snowflake algorithm to generate an identifier for each frame in all sub-frame sets; and arranging all frames in all sub-frame sets in the order of their identifiers, the sorted frames constituting the video frame set.
  • Parsing the video to be reviewed into a set of video frames in parallel can reduce time costs, improve processing efficiency, and meet the efficiency requirements in video review scenarios.
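The merge step can be illustrated with a minimal Python sketch. The source does not say how the snowflake identifiers are populated, so everything specific here is an assumption: the common 41/10/12-bit snowflake layout, the hypothetical helper names `make_frame_id`, `parse_sub_video`, and `merge_sub_frame_sets`, and in particular the choice to fill the timestamp field with each frame's position in the original video so that sorting by identifier restores playback order.

```python
from dataclasses import dataclass
from typing import List

# Assumed snowflake-style layout: timestamp | 10-bit worker id | 12-bit sequence.
WORKER_BITS = 10
SEQUENCE_BITS = 12


def make_frame_id(timestamp_ms: int, worker_id: int, sequence: int) -> int:
    """Pack the three fields into one sortable integer identifier."""
    if not 0 <= worker_id < (1 << WORKER_BITS):
        raise ValueError("worker_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    shift = WORKER_BITS + SEQUENCE_BITS
    return (timestamp_ms << shift) | (worker_id << SEQUENCE_BITS) | sequence


@dataclass(frozen=True)
class Frame:
    frame_id: int
    label: str  # stands in for the image data


def parse_sub_video(worker_id: int, start_ms: int, labels: List[str],
                    frame_interval_ms: int = 40) -> List[Frame]:
    """Hypothetical parser: the timestamp field is the frame's position in
    the original video, so identifiers increase in playback order."""
    return [
        Frame(make_frame_id(start_ms + i * frame_interval_ms, worker_id, 0), label)
        for i, label in enumerate(labels)
    ]


def merge_sub_frame_sets(sub_frame_sets: List[List[Frame]]) -> List[Frame]:
    """Flatten all sub-frame sets and sort the frames by identifier."""
    frames = [frame for sub in sub_frame_sets for frame in sub]
    return sorted(frames, key=lambda frame: frame.frame_id)


# Sub-videos may finish in any order; sorting by identifier restores the sequence.
second = parse_sub_video(worker_id=1, start_ms=80, labels=["c", "d"])
first = parse_sub_video(worker_id=0, start_ms=0, labels=["a", "b"])
merged = merge_sub_frame_sets([second, first])
print([frame.label for frame in merged])
```

Because the timestamp occupies the high bits, the sort is driven by frame position first and by worker id only as a tie-breaker.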
  • FIG. 3 shows a schematic flowchart of parsing a video to be reviewed into a set of video frames according to some embodiments of the present disclosure.
  • the method of this embodiment includes steps 310-330.
  • step 320 each video frame is sequentially acquired.
  • step 320 may include sub-steps 321-323.
  • in sub-step 321, a frame of image contained in a Frame object is extracted through the multimedia framework FFmpeg.
  • the image in the Frame object is extracted as a Java memory image instance.
  • the image in the Frame object can be extracted frame by frame as a Java memory image through the grabImage method and the Java2DFrameConverter.
  • a video frame is generated using the Java memory image instance.
  • the extracted Java memory image instance can be generated into a jpg or png format image through the drawImage method, and saved in the specified folder.
  • the continuity of the saved pictures can be guaranteed by time stamps and/or other continuous means.
  • by looping over sub-steps 321-323, the entire video to be reviewed can be parsed into a video frame set.
  • each frame is sequentially collected to obtain a set of video frames.
  • the video frame set may include, for example, a video frame set composed of original video frames or a video frame set composed of compressed video frames.
  • the video frame set composed of the compressed video frames is obtained by performing parallel compression processing by multiple compression servers.
  • the parallel compression may, for example, include: acquiring the number of frames corresponding to the video to be reviewed; and determining the number of frames to be processed by each compression server according to the number of frames and the number of compression servers.
  • the corresponding frames to be processed are compressed by the compression server; the compressed video frames output by all the compression servers are formed into a video frame set.
  • the compression efficiency can be further improved, thereby improving the efficiency of video review and meeting the requirements of high efficiency.
  • the number of frames each compression server needs to compress = the number of frames / the number of compression servers.
  • FIG. 4 shows a schematic diagram of a video frame set composed of compressed video frames obtained by parallel compression processing by a compression server according to some embodiments of the present disclosure.
  • each compression server is used to perform parallel compression processing on the original video frames to obtain compressed video frames 1-10 respectively.
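The allocation of frames to compression servers can be sketched as follows. The source gives only the division of the frame count by the server count; how a remainder is distributed is not stated, so spreading the leftover frames over the first servers is an assumption, and `assign_frames` is a hypothetical helper name.

```python
from typing import List


def assign_frames(frame_count: int, server_count: int) -> List[range]:
    """Split frame indices 0..frame_count-1 into contiguous ranges, one per server."""
    if server_count <= 0:
        raise ValueError("server_count must be positive")
    base, extra = divmod(frame_count, server_count)
    ranges = []
    start = 0
    for server in range(server_count):
        # Assumption: the first `extra` servers take one additional frame.
        size = base + (1 if server < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


for server, frames in enumerate(assign_frames(10, 3)):
    print(server, list(frames))
```

Contiguous ranges keep each server's output in playback order, so the compressed frames can be concatenated server by server.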
  • using the compression server to perform compression processing on the corresponding frame to be processed includes: using the compression server to divide all the pixel points of each frame to be processed into multiple groups, each group corresponding to one pixel point in the compressed frame; calculating the average value of the pixel values corresponding to all the pixel points in each group; and taking the average value of the pixel values corresponding to all the pixel points in each group as the pixel value of the pixel point corresponding to that group in the compressed frame.
  • where the pixel points of the frame to be processed are represented by the pixel value of the R channel, the pixel value of the G channel, and the pixel value of the B channel, calculating the average value of the pixel values corresponding to all the pixel points in each group includes: calculating the average value of the R-channel pixel values of all the pixel points in each group; calculating the average value of the G-channel pixel values of all the pixel points in each group; and calculating the average value of the B-channel pixel values of all the pixel points in each group; wherein taking the average value of the pixel values corresponding to all the pixel points in each group as the pixel value of the pixel point corresponding to that group in the compressed frame includes: taking the average R-channel, G-channel, and B-channel pixel values of all the pixel points in each group as, respectively, the R-channel, G-channel, and B-channel pixel values of the pixel point corresponding to that group in the compressed frame.
  • FIG. 5 illustrates a schematic diagram of a frame compression process according to some embodiments of the present disclosure.
  • a certain frame to be compressed has 4 pixel points, which are identified as pixels 1-4. These pixel points are expressed as the pixel value of the R channel, the pixel value of the G channel and the pixel value of the B channel.
  • the hexadecimal RGB values of the four pixels are: 31ADF1, 31ADF1, 31ADF1, 45FA8B.
  • the average values of the R-channel, G-channel, and B-channel pixel values are (in hexadecimal, fractional parts discarded): R = (31 + 31 + 31 + 45) / 4 = 36; G = (AD + AD + AD + FA) / 4 = C0; B = (F1 + F1 + F1 + 8B) / 4 = D7.
  • the pixel value of the final output compressed pixel 5 is represented as 36C0D7.
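The channel-wise averaging in this example can be reproduced with a short Python sketch. It is illustrative only: the helper names `average_group` and `compress_once` are hypothetical, integer division (discarding the fractional part) is an assumed rounding rule, and grouping pixels as 2x2 blocks is an assumed way of forming groups of four.

```python
from typing import List


def average_group(hex_pixels: List[str]) -> str:
    """Average a group of RRGGBB hex pixels channel by channel."""
    totals = [0, 0, 0]
    for pixel in hex_pixels:
        value = int(pixel, 16)
        totals[0] += (value >> 16) & 0xFF  # R channel
        totals[1] += (value >> 8) & 0xFF   # G channel
        totals[2] += value & 0xFF          # B channel
    # Assumption: fractional parts are discarded.
    r, g, b = (total // len(hex_pixels) for total in totals)
    return f"{r:02X}{g:02X}{b:02X}"


def compress_once(frame: List[List[str]]) -> List[List[str]]:
    """Halve both dimensions by averaging each 2x2 block (assumed grouping)."""
    height, width = len(frame), len(frame[0])
    return [
        [
            average_group([frame[y][x], frame[y][x + 1],
                           frame[y + 1][x], frame[y + 1][x + 1]])
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]


# The four example pixels collapse into a single compressed pixel.
print(average_group(["31ADF1", "31ADF1", "31ADF1", "45FA8B"]))

# Three passes over an 8x8 frame leave 1/64 of the pixels.
frame = [["31ADF1"] * 8 for _ in range(8)]
for _ in range(3):
    frame = compress_once(frame)
print(len(frame), len(frame[0]))
```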
  • in each pass, every 4 pixels in a frame are compressed into one pixel, and after three passes the picture is compressed to 1/64 of its original size.
  • the compressed size is 30*20
  • the compression process can be performed by a regular server, and it only takes 100 milliseconds to compress an image.
  • the compression technology is implemented as follows: the BufferedImage object in the Java language can read the specified pixels according to the height parameter and the width parameter, so as to read the pixel data of the entire image, and the object can also draw a picture based on the pixel data that has been read.
  • the above implementation has the characteristics of fast reading speed and can meet the requirements of high efficiency.
  • step 120 the similarity between every two adjacent frames in the set of video frames is calculated.
  • calculating the similarity between every two adjacent frames in the video frame set includes: calculating the bit array (i.e., bitArray) corresponding to each of the two adjacent frames; and determining the similarity between the two adjacent frames according to the Hamming distance between their bit arrays.
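A minimal Python sketch of the Hamming-distance comparison follows. The source states only that the similarity is determined from the Hamming distance; mapping the distance to a score between 0 and 1 as the fraction of matching bits is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit arrays differ."""
    if len(a) != len(b):
        raise ValueError("bit arrays must have the same length")
    return sum(x != y for x, y in zip(a, b))


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Assumed mapping: 1.0 for identical arrays, 0.0 when every bit differs."""
    if not a:
        raise ValueError("bit arrays must not be empty")
    return 1.0 - hamming_distance(a, b) / len(a)


print(hamming_distance([1, 1, 1, 0], [1, 0, 1, 1]))
print(similarity([1, 1, 1, 0], [1, 0, 1, 1]))
```

A smaller Hamming distance therefore corresponds to a higher similarity, which matches the later rule that a frame is selected when the similarity falls below a preset value.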
  • the calculation of similarity can be processed in parallel. After each parallel server completes its calculation, the similarity is judged between the last frame handled by one server and the first frame handled by the next, for every two adjacent servers. Processing efficiency can thus be further improved.
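The boundary handling described here can be sketched in Python. This is a sketch under stated assumptions: threads stand in for the parallel servers, the similarity is taken as the fraction of matching bits, and `parallel_similarities` is a hypothetical helper name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of equal bits; an assumed mapping from the Hamming distance."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def chunk_similarities(chunk: List[Sequence[int]]) -> List[float]:
    """Similarities of the adjacent pairs that lie entirely inside one chunk."""
    return [similarity(chunk[i], chunk[i + 1]) for i in range(len(chunk) - 1)]


def parallel_similarities(bit_arrays: List[Sequence[int]], workers: int) -> List[float]:
    """Adjacent-frame similarities computed chunk by chunk, in frame order."""
    if not bit_arrays:
        return []
    size = -(-len(bit_arrays) // workers)  # ceiling division
    chunks = [bit_arrays[i:i + size] for i in range(0, len(bit_arrays), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        inner = list(pool.map(chunk_similarities, chunks))
    result: List[float] = []
    for index, values in enumerate(inner):
        result.extend(values)
        if index + 1 < len(chunks):
            # Boundary pair: last frame of this chunk against first frame of the next.
            result.append(similarity(chunks[index][-1], chunks[index + 1][0]))
    return result


arrays = [[0, 0], [0, 1], [1, 1], [1, 1], [0, 0]]
print(parallel_similarities(arrays, workers=2))
```

The boundary comparisons are what make the chunked result identical to a single sequential pass over all adjacent pairs.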
  • calculating the bit array corresponding to a frame includes: calculating the average value of the pixel values corresponding to all the pixel points in the frame; setting the pixel points whose pixel values are greater than or equal to the average value to the first preset value, and setting the pixel points whose pixel values are less than the average value to the second preset value; and determining the array obtained from the first preset value and the second preset value as the bit array corresponding to the frame.
  • where the pixel points of the frame are represented by the pixel value of the R channel, the pixel value of the G channel, and the pixel value of the B channel, calculating the average value of the pixel values corresponding to all the pixel points of the frame includes: taking, for each pixel point of the frame, the average of its R-channel, G-channel, and B-channel pixel values as the pixel value corresponding to that pixel point.
  • a certain frame has 4 pixels, which are represented as 31ADF1, 31ADF1, 31ADF1, 45FA8B (hexadecimal).
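For these four pixels, the bit array can be derived with the following Python sketch. The source names only a "first preset value" and a "second preset value"; using 1 and 0 for them is an assumption, as are the helper names `channel_sum` and `bit_array`. The comparison is carried out on integer sums to avoid floating-point rounding.

```python
from typing import List


def channel_sum(hex_pixel: str) -> int:
    """R + G + B of an RRGGBB hex pixel; the pixel value is this sum divided by 3."""
    value = int(hex_pixel, 16)
    return ((value >> 16) & 0xFF) + ((value >> 8) & 0xFF) + (value & 0xFF)


def bit_array(hex_pixels: List[str], first: int = 1, second: int = 0) -> List[int]:
    """Mark each pixel with `first` if its value is at least the frame average,
    otherwise with `second` (1 and 0 are assumed preset values)."""
    sums = [channel_sum(pixel) for pixel in hex_pixels]
    total = sum(sums)
    count = len(sums)
    # s / 3 >= (total / 3) / count  is equivalent to  s * count >= total.
    return [first if s * count >= total else second for s in sums]


print(bit_array(["31ADF1", "31ADF1", "31ADF1", "45FA8B"]))
```

Here the first three pixels have a value of about 154.3 and the fourth about 152.7, against a frame average of about 153.9, so only the fourth pixel falls below the average.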
  • step 130 based on the similarity, dissimilar video frames are selected from the video frame set to form a representative frame set.
  • screening out dissimilar video frames from the video frame set to form a representative frame set includes: adding the first frame appearing in the video frame set to the representative frame set; and, starting from the first frame, calculating the similarity between every two adjacent frames in the video frame set and, when the similarity is less than the preset value, adding the later of the two adjacent frames to the representative frame set.
  • the last frame of the shot may also be added to the set of representative frames.
  • Two adjacent dissimilar frames respectively constitute the first frame of two different sub-shots. This step selects a set of representative frames in a targeted manner, which can further improve the accuracy of the review.
  • Screening out dissimilar frames as representative frames can identify frames that cannot be identified by the naked eye or only by setting the sampling frequency, avoiding missed judgments and improving the accuracy of review.
  • Figure 6a shows a schematic diagram of two frames being dissimilar according to some embodiments of the present disclosure.
  • Figure 6b shows a schematic diagram of two frames being similar according to some embodiments of the present disclosure.
  • FIG. 7 illustrates a schematic diagram of calculating the similarity between every two adjacent frames in a set of video frames according to some embodiments of the present disclosure.
  • Two adjacent dissimilar frames A and D respectively constitute the first frame of two different sub-shots, that is, the first sub-shot includes frame A, frame B, and frame C, and the second sub-shot includes frame D and frame E.
  • the last frame of the first sub-shot, which is frame C, is also added to the set of representative frames, and the above process is repeated starting from D.
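The selection of representative frames for frames A to E can be sketched in Python as follows. The sketch follows the adjacent-frame wording of step 130; the similarity values, the threshold of 0.5, and the helper name `select_representatives` are illustrative assumptions rather than values from the source.

```python
from typing import List, Sequence


def select_representatives(frames: Sequence[str], similarities: Sequence[float],
                           threshold: float, keep_shot_end: bool = False) -> List[str]:
    """similarities[i] is the similarity between frames[i] and frames[i + 1]."""
    if not frames:
        return []
    if len(similarities) != len(frames) - 1:
        raise ValueError("expected one similarity per adjacent pair")
    selected = [0]  # the first frame is always a representative
    for i, value in enumerate(similarities):
        if value < threshold:
            # Optionally keep the last frame of the sub-shot that just ended.
            if keep_shot_end and selected[-1] != i:
                selected.append(i)
            # The later frame of the dissimilar pair starts a new sub-shot.
            selected.append(i + 1)
    return [frames[i] for i in selected]


frames = ["A", "B", "C", "D", "E"]
similarities = [0.9, 0.95, 0.2, 0.9]  # only the pair C/D is dissimilar
print(select_representatives(frames, similarities, threshold=0.5))
print(select_representatives(frames, similarities, threshold=0.5, keep_shot_end=True))
```

With `keep_shot_end` enabled, frame C is kept alongside A and D, which mirrors the embodiment in which the last frame of a sub-shot is also added.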
  • step 140 a frame is selected from the set of video frames at preset intervals and added to the set of representative frames.
  • Selecting a frame at every preset interval and adding it to the representative frame set guards against the first and last frames of a sub-shot differing too much, and against violating frames being overlooked within slowly changing consecutive frames, thereby reducing the probability of missed detections and increasing the accuracy of the audit.
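Step 140 can be sketched as a merge of two index sets. The source does not specify the unit of the preset interval; measuring it in frames is an assumption, and `add_interval_frames` is a hypothetical helper name.

```python
from typing import Iterable, List


def add_interval_frames(frame_count: int, representative_indices: Iterable[int],
                        interval: int) -> List[int]:
    """Merge the similarity-based representatives with one frame every
    `interval` frames, keeping playback order and dropping duplicates."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    sampled = range(0, frame_count, interval)
    return sorted(set(representative_indices) | set(sampled))


print(add_interval_frames(frame_count=10, representative_indices=[0, 3, 4], interval=4))
```

Even if no adjacent pair ever falls below the similarity threshold, the interval samples ensure that a slowly drifting sub-shot still contributes frames for review.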
  • the representative frame set is output, and the representative frame set is reviewed.
  • for a 100-minute video to be reviewed, no more than 200 representative frames are extracted, and the review can be completed in 1 minute.
  • the representative frame set is extracted according to the video frame set containing all frames corresponding to the entire video to be reviewed, and the representative frame set includes a few representative frames.
  • the review result for the entire video to be reviewed can be obtained by reviewing the representative frame set, which saves manpower and improves efficiency while ensuring high accuracy. The method can be applied to large-scale video auditing scenarios, meeting their requirements of high efficiency and high accuracy and allowing the work to be completed quickly, easily, and accurately.
  • FIG. 8 shows a schematic diagram of an apparatus for reviewing video according to some embodiments of the present disclosure.
  • the apparatus 800 for reviewing video in this embodiment includes: an acquisition module 810, a calculation module 820, a determination module 830, and a review module 840.
  • the obtaining module 810 is configured to obtain a video frame set corresponding to the video to be reviewed.
  • acquiring the video frame set corresponding to the video to be reviewed includes: cropping the video to be reviewed into multiple sub-videos; performing parallel processing on the multiple sub-videos to obtain a sub-frame set corresponding to each sub-video; using the snowflake algorithm to generate an identifier for each frame in all sub-frame sets; and arranging all frames in all sub-frame sets in the order of their identifiers, the sorted frames constituting the video frame set.
  • the video frame set may include, for example, a video frame set composed of original video frames or a video frame set composed of compressed video frames.
  • the video frame set composed of the compressed video frames is obtained by performing parallel compression processing by multiple compression servers.
  • the parallel compression includes: acquiring the number of frames corresponding to the video to be reviewed; and determining the number of frames to be processed by each compression server according to the number of frames and the number of compression servers.
  • the corresponding frames to be processed are compressed by the compression server; the compressed video frames output by all the compression servers are formed into a video frame set.
  • Using the compression server to compress the corresponding frame to be processed includes: using the compression server to divide all the pixels of each frame to be processed into multiple groups, and each group corresponds to a pixel in the compressed frame; The average value of the pixel values corresponding to all the pixel points in the group; the average value of the pixel values corresponding to all the pixel points in each group is taken as the pixel value of the corresponding pixel point in the group in the compressed frame.
  • where the pixel points of the frame to be processed are represented by the pixel value of the R channel, the pixel value of the G channel, and the pixel value of the B channel, calculating the average value of the pixel values corresponding to all the pixel points in each group includes: calculating the average value of the R-channel pixel values of all the pixel points in each group; calculating the average value of the G-channel pixel values of all the pixel points in each group; and calculating the average value of the B-channel pixel values of all the pixel points in each group; wherein taking the average value of the pixel values corresponding to all the pixel points in each group as the pixel value of the pixel point corresponding to that group in the compressed frame includes: taking the average R-channel, G-channel, and B-channel pixel values of all the pixel points in each group as, respectively, the R-channel, G-channel, and B-channel pixel values of the pixel point corresponding to that group in the compressed frame.
  • the calculation module 820 is configured to calculate the similarity between every two adjacent frames in the video frame set, wherein calculating the similarity between every two adjacent frames in the video frame set includes: calculating the bit array corresponding to each of the two adjacent frames; and determining the similarity between the two adjacent frames according to the Hamming distance between their bit arrays.
  • calculating the bit array corresponding to a frame includes: calculating the average value of the pixel values corresponding to all pixel points of the frame; in some embodiments, where the pixel points of the frame are represented by the pixel value of the R channel, the pixel value of the G channel, and the pixel value of the B channel, calculating this average value includes taking, for each pixel point of the frame, the average of its R-channel, G-channel, and B-channel pixel values as the pixel value corresponding to that pixel point.
  • the pixel points in the frame whose pixel values are greater than or equal to the average value are set to the first preset value, and the pixel points in the frame whose pixel values are less than the average value are set to the second preset value;
  • the array obtained from the first preset value and the second preset value is determined as the bit array corresponding to the frame.
  • the determining module 830 is configured to screen out dissimilar video frames from the video frame set based on the similarity to form a representative frame set, which includes: adding the first frame that appears in the video frame set to the representative frame set; and, starting from the first frame, calculating the similarity between every two adjacent frames in the video frame set and, when the similarity is less than the preset value, adding the later of the two adjacent frames to the representative frame set.
  • the auditing module 840 is configured to output the representative frame set, and audit the representative frame set.
  • the representative frame set is extracted according to the video frame set containing all frames corresponding to the entire video to be reviewed, and the representative frame set includes a few representative frames.
  • the review result for the entire video to be reviewed can be obtained by reviewing the representative frame set, which saves manpower and improves efficiency while ensuring high accuracy. The apparatus can be applied to large-scale video auditing scenarios, meeting their requirements of high efficiency and high accuracy and allowing the work to be completed quickly, easily, and accurately.
  • FIG. 9 shows a schematic diagram of an apparatus for reviewing video according to other embodiments of the present disclosure.
  • the apparatus 900 for reviewing video in this embodiment includes: a memory 910 and a processor 920 coupled to the memory 910, the processor 920 being configured to execute, based on the instructions stored in the memory 910, the method of reviewing video in any of the embodiments of the present disclosure.
  • the memory 910 may include, for example, a system memory, a fixed non-volatile storage medium, and the like.
  • the system memory stores, for example, an operating system, an application program, a boot loader (Boot Loader), and other programs.
  • the apparatus 900 for reviewing video may further include an input/output interface 930, a network interface 940, a storage interface 950, and the like. These interfaces 930, 940, 950, the memory 910, and the processor 920 can be connected, for example, through a bus 960.
  • the input and output interface 930 provides a connection interface for input and output devices such as a display, a mouse, a keyboard, and a touch screen.
  • Network interface 940 provides a connection interface for various networked devices.
  • the storage interface 950 provides a connection interface for external storage devices such as SD cards and U disks.
  • embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more non-transitory computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory result in an article of manufacture comprising instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

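The present disclosure provides a method and an apparatus for reviewing video, and a computer-readable storage medium, and relates to the field of computer technology. In the present disclosure, a video frame set corresponding to a video to be reviewed is obtained; the similarity between every two adjacent frames in the video frame set is calculated; based on the similarity, dissimilar video frames are selected from the video frame set to form a representative frame set; and the representative frame set is output and reviewed. Because the representative frame set is extracted from a video frame set containing all frames of the video to be reviewed and consists of a small number of representative frames, the review result for the entire video can be obtained by reviewing the representative frame set alone. This saves labor and improves efficiency while maintaining high accuracy, and suits large-scale video review scenarios that demand both.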

Description

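Method and apparatus for reviewing video, and computer-readable storage medium
Cross-Reference to Related Applications
This application is based on and claims priority to Chinese application No. 202110255203.5, filed on March 9, 2021, the disclosure of which is incorporated herein in its entirety.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a method and an apparatus for reviewing video, and a computer-readable storage medium.
Background
Content review is an important part of keeping the Internet healthy. At present, video is reviewed by combining spot checks with manual review: a sampling frequency is set, one or more frames are periodically extracted from the video to be reviewed, and the extracted frames are then reviewed manually.
Summary
In the related art, which combines spot checks with manual review, the efficiency and accuracy of the review depend on the sampling frequency: a sampling frequency that is too high results in low efficiency, while one that is too low makes it easy to miss frames, which lowers review accuracy.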
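In embodiments of the present disclosure, a video frame set corresponding to a video to be reviewed is obtained; the similarity between every two adjacent frames in the video frame set is calculated; based on the similarity, dissimilar video frames are selected from the video frame set to form a representative frame set; and the representative frame set is output and reviewed. Because the representative frame set is extracted from a video frame set containing all frames of the video to be reviewed and consists of a small number of representative frames, the review result for the entire video can be obtained by reviewing the representative frame set alone. This saves labor and improves efficiency while maintaining high accuracy, and suits large-scale video review scenarios that demand both.
According to some embodiments of the present disclosure, a method for reviewing video is provided, comprising: obtaining a video frame set corresponding to a video to be reviewed; calculating the similarity between every two adjacent frames in the video frame set; selecting, based on the similarity, dissimilar video frames from the video frame set to form a representative frame set; and outputting the representative frame set and reviewing the representative frame set.
In some embodiments, the method further comprises: before outputting the representative frame set, selecting one frame from the video frame set at every preset interval and adding it to the representative frame set.
In some embodiments, selecting, based on the similarity, dissimilar video frames from the video frame set to form the representative frame set comprises: adding the first frame in the video frame set to the representative frame set; and, starting from the first frame, calculating the similarity between every two adjacent frames in the video frame set and, when the similarity is less than a preset value, adding the later of the two adjacent frames to the representative frame set.
In some embodiments, calculating the similarity between every two adjacent frames in the video frame set comprises: calculating the bit array corresponding to each of the two adjacent frames; and determining the similarity between the two adjacent frames according to the Hamming distance between their bit arrays.
In some embodiments, calculating the bit array corresponding to a frame comprises: calculating the average of the pixel values of all pixels of the frame; setting pixels of the frame whose pixel value is greater than or equal to the average to a first preset value, and pixels whose pixel value is less than the average to a second preset value; and determining the array formed by the first and second preset values as the bit array corresponding to the frame.
In some embodiments, the video frame set comprises a video frame set composed of original video frames or a video frame set composed of compressed video frames.
In some embodiments, the video frame set composed of compressed video frames is obtained through parallel compression by a plurality of compression servers, wherein the parallel compression comprises: obtaining the number of frames of the video to be reviewed; determining, according to the number of frames and the number of compression servers, the number of frames to be processed by each compression server; compressing the corresponding frames with each compression server; and combining the compressed video frames output by all compression servers into the video frame set.
In some embodiments, compressing the corresponding frames with the compression server comprises: dividing, by the compression server, all pixels of each frame to be processed into a plurality of groups, each group corresponding to one pixel in the compressed frame; calculating the average of the pixel values of all pixels in each group; and using the average of the pixel values of all pixels in each group as the pixel value of the pixel corresponding to that group in the compressed frame.
In some embodiments, where the pixels of the frame to be processed are represented by an R-channel pixel value, a G-channel pixel value and a B-channel pixel value, calculating the average of the pixel values of all pixels in each group comprises: calculating the average of the R-channel pixel values of all pixels in each group; calculating the average of the G-channel pixel values of all pixels in each group; and calculating the average of the B-channel pixel values of all pixels in each group; wherein using the average of the pixel values of all pixels in each group as the pixel value of the pixel corresponding to that group in the compressed frame comprises: using the averages of the R-channel, G-channel and B-channel pixel values of all pixels in each group as the R-channel, G-channel and B-channel pixel values, respectively, of the pixel corresponding to that group in the compressed frame.
In some embodiments, obtaining the video frame set corresponding to the video to be reviewed comprises: cutting the video to be reviewed into a plurality of sub-videos; processing the sub-videos in parallel to obtain one sub-frame set for each sub-video; generating an identifier for each frame in all sub-frame sets using the snowflake algorithm; and arranging all frames of all sub-frame sets in identifier order, the sorted frames forming the video frame set.
According to other embodiments of the present disclosure, an apparatus for reviewing video is provided, comprising: an obtaining module configured to obtain a video frame set corresponding to a video to be reviewed; a calculation module configured to calculate the similarity between every two adjacent frames in the video frame set; a determination module configured to select, based on the similarity, dissimilar video frames from the video frame set to form a representative frame set; and a review module configured to output the representative frame set and review the representative frame set.
According to still other embodiments of the present disclosure, an apparatus for reviewing video is provided, comprising: a memory; and a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the method for reviewing video according to any embodiment.
According to yet other embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided, on which a computer program is stored, the program implementing the method for reviewing video according to any embodiment when executed by a processor.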
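Brief Description of the Drawings
The drawings needed for describing the embodiments or the related art are briefly introduced below. The present disclosure can be understood more clearly from the following detailed description with reference to the drawings.
Obviously, the drawings described below show only some embodiments of the present disclosure; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for reviewing video according to some embodiments of the present disclosure.
FIG. 2 is a schematic diagram of the relationship between a video, shots, one-second shots and video frames according to some embodiments of the present disclosure.
FIG. 3 is a schematic flowchart of parsing a video to be reviewed into a video frame set according to some embodiments of the present disclosure.
FIG. 4 is a schematic diagram of obtaining a video frame set composed of compressed video frames through parallel compression by compression servers according to some embodiments of the present disclosure.
FIG. 5 is a schematic diagram of compressing a frame according to some embodiments of the present disclosure.
FIG. 6a is a schematic diagram of two dissimilar frames according to some embodiments of the present disclosure.
FIG. 6b is a schematic diagram of two similar frames according to some embodiments of the present disclosure.
FIG. 7 is a schematic diagram of calculating the similarity between every two adjacent frames in a video frame set according to some embodiments of the present disclosure.
FIG. 8 is a schematic diagram of an apparatus for reviewing video according to some embodiments of the present disclosure.
FIG. 9 is a schematic diagram of an apparatus for reviewing video according to other embodiments of the present disclosure.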
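Detailed Description
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the drawings of those embodiments.
Unless otherwise specified, terms such as "first" and "second" in the present disclosure are used to distinguish different objects and do not indicate size, order in time, or the like.
FIG. 1 is a schematic flowchart of a method for reviewing video according to some embodiments of the present disclosure. The method may be performed, for example, by an apparatus for reviewing video.
As shown in FIG. 1, the method of this embodiment includes steps 110-130 and 150; in some embodiments it further includes step 140.
In step 110, a video frame set corresponding to the video to be reviewed is obtained.
FIG. 2 is a schematic diagram of the relationship between a video, shots, one-second shots and video frames according to some embodiments of the present disclosure.
As shown in FIG. 2, a video may consist of multiple shots, and a shot may consist of multiple one-second shots. Typically, a one-second shot consists of, for example, 24 frames, and one image consists of n pixels (commonly called the picture size). A frame is also referred to as an image or a picture.
Therefore, the number of frames in a typical video can be calculated, for example, as: video duration (minutes) * 60 (seconds) * 24 (frames). For example, a ten-minute video contains 10*60*24=14400 frames.
In some embodiments, obtaining the video frame set corresponding to the video to be reviewed comprises: cutting the video to be reviewed into a plurality of sub-videos; processing the sub-videos in parallel to obtain one sub-frame set for each sub-video; generating an identifier for each frame in all sub-frame sets using the snowflake algorithm; and arranging all frames of all sub-frame sets in identifier order, the sorted frames forming the video frame set.
Parsing the video to be reviewed into a video frame set in parallel reduces the time cost and improves processing efficiency, meeting the efficiency requirements of video review scenarios.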
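The identifier generation mentioned above can be illustrated with a minimal snowflake-style generator. The disclosure does not give the bit layout or say how identifiers are mapped to playback order, so the layout below (41-bit millisecond timestamp, 10-bit worker id, 12-bit sequence), the custom epoch and the class name `SnowflakeIdGenerator` are assumptions for illustration only.

```java
/**
 * Snowflake-style 64-bit identifier generator (illustrative sketch).
 * Assumed layout: 41-bit millisecond timestamp | 10-bit worker id | 12-bit sequence.
 */
final class SnowflakeIdGenerator {
    // Arbitrary custom epoch (2021-01-01T00:00:00Z); an assumption, not from the disclosure.
    private static final long EPOCH_MS = 1_609_459_200_000L;
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId >= (1L << WORKER_BITS)) {
            throw new IllegalArgumentException("workerId out of range");
        }
        this.workerId = workerId;
    }

    /** Returns identifiers that strictly increase for a single generator instance. */
    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Clock moved backwards (or we ran ahead): reuse the last timestamp.
            now = lastTimestamp;
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: advance logically by 1 ms.
                now = lastTimestamp + 1;
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        return ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

Identifiers from one generator sort in creation order; identifiers produced concurrently by different workers are only roughly time-ordered, so a real pipeline would need some additional convention to recover playback order across sub-videos.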
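FIG. 3 is a schematic flowchart of parsing a video to be reviewed into a video frame set according to some embodiments of the present disclosure.
As shown in FIG. 3, the method of this embodiment includes steps 310-330.
In step 310, the video to be reviewed is loaded by a parsing server, which can be done, for example, with File f=new File($path) in Java.
In step 320, the video frames are obtained one by one.
Step 320 may include sub-steps 321-323.
In sub-step 321, one frame of image wrapped in an FFmpeg Frame object is extracted through the multimedia framework FFmpeg.
In sub-step 322, the image in the Frame object is extracted as a Java in-memory image instance. For example, frames can be grabbed one by one with the grabImage method of the FFmpeg frame grabber, and Java2DFrameConverter converts the image in each Frame object into a Java in-memory image.
In sub-step 323, a video frame is generated from the Java in-memory image instance. For example, using Java's BufferedImage object, the extracted in-memory image instance can be rendered with the drawImage method into a picture in a format such as jpg or png and saved to a specified folder. The order of the saved pictures can be preserved, for example, by timestamps or other sequential naming.
Repeating sub-steps 321-323 parses the entire video to be reviewed into a video frame set.
In step 330, the frames are collected in order to obtain the video frame set.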
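Sub-step 323 and step 330 can be sketched with the standard library alone. The example below assumes a frame has already been decoded into a `BufferedImage` (the FFmpeg decoding itself is not shown) and writes it as a PNG whose zero-padded file name preserves frame order; the class name `FrameWriter` and the naming pattern are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

final class FrameWriter {
    /**
     * Writes one decoded frame as a PNG file. The zero-padded index keeps
     * the files in frame order when they are sorted by name.
     */
    static File writeFrame(BufferedImage frame, File dir, long index) {
        File out = new File(dir, String.format("frame-%08d.png", index));
        try {
            if (!ImageIO.write(frame, "png", out)) {
                throw new IOException("no PNG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

A sequential index is used here instead of a timestamp because it sorts unambiguously; either satisfies the requirement that saved pictures stay in order.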
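In some embodiments, the video frame set may comprise, for example, a video frame set composed of original video frames or a video frame set composed of compressed video frames.
The video frame set composed of compressed video frames is obtained through parallel compression by a plurality of compression servers. Parallel compression may include, for example: obtaining the number of frames of the video to be reviewed; determining, according to the number of frames and the number of compression servers, the number of frames to be processed by each compression server; compressing the corresponding frames with each compression server; and combining the compressed video frames output by all compression servers into the video frame set.
Compressing a frame reduces its number of pixels, which makes subsequent processing more efficient (for example, fewer operations when calculating similarity). Compressing in parallel further improves compression efficiency and therefore the efficiency of video review, meeting the requirement for high efficiency.
For example, number of frames / number of compression servers = number of frames each compression server needs to compress. The number of compression servers may, for example, be set to a multiple of 24 to make the compression efficiency easy to calculate. Assuming 240 compression servers, each round of compression can process 10 seconds of video (240 servers / 24 frames per second). If one compression takes 100 milliseconds, the speed-up is 10 seconds / 100 milliseconds = 100 times (1 second = 1000 milliseconds).
FIG. 4 is a schematic diagram of obtaining a video frame set composed of compressed video frames through parallel compression by compression servers according to some embodiments of the present disclosure.
As shown in FIG. 4, for example, 3 compression servers compress the video to be reviewed in parallel. The video to be reviewed is parsed into original video frames; with 10 frames, for example, number of frames / number of compression servers = 10/3 = 3 remainder 1, so 2 compression servers each compress 3 frames and 1 compression server compresses 4 frames. Each compression server compresses its original video frames in parallel, yielding compressed video frames 1-10.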
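The division of frames among compression servers can be sketched as follows. The disclosure only states that the per-server count is derived from the frame count and the server count; giving the extra frames to the first servers is an assumption, and `FramePartitioner` is a hypothetical name.

```java
import java.util.Arrays;

final class FramePartitioner {
    /**
     * Returns how many frames each server handles. Every server gets
     * frameCount / serverCount frames, and the first
     * (frameCount % serverCount) servers take one extra frame.
     */
    static int[] framesPerServer(int frameCount, int serverCount) {
        if (frameCount < 0 || serverCount <= 0) {
            throw new IllegalArgumentException("invalid frame or server count");
        }
        int base = frameCount / serverCount;
        int remainder = frameCount % serverCount;
        int[] counts = new int[serverCount];
        Arrays.fill(counts, base);
        for (int i = 0; i < remainder; i++) {
            counts[i]++;
        }
        return counts;
    }
}
```

For the FIG. 4 example, `framesPerServer(10, 3)` yields one server with 4 frames and two servers with 3 frames each.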
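In some embodiments, compressing the corresponding frames with the compression server comprises: dividing, by the compression server, all pixels of each frame to be processed into a plurality of groups, each group corresponding to one pixel in the compressed frame; calculating the average of the pixel values of all pixels in each group; and using the average of the pixel values of all pixels in each group as the pixel value of the pixel corresponding to that group in the compressed frame.
In some embodiments, where the pixels of the frame to be processed are represented by an R-channel pixel value, a G-channel pixel value and a B-channel pixel value, calculating the average of the pixel values of all pixels in each group comprises: calculating the average of the R-channel pixel values of all pixels in each group; calculating the average of the G-channel pixel values of all pixels in each group; and calculating the average of the B-channel pixel values of all pixels in each group; wherein using the average of the pixel values of all pixels in each group as the pixel value of the pixel corresponding to that group in the compressed frame comprises: using the averages of the R-channel, G-channel and B-channel pixel values of all pixels in each group as the R-channel, G-channel and B-channel pixel values, respectively, of the pixel corresponding to that group in the compressed frame.
FIG. 5 is a schematic diagram of compressing a frame according to some embodiments of the present disclosure.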
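As shown in FIG. 5, suppose a frame to be compressed has 4 pixels, labeled pixels 1-4, each represented by R-channel, G-channel and B-channel pixel values; in hexadecimal RGB format they are 31ADF1, 31ADF1, 31ADF1 and 45FA8B.
Averaging the R-channel, G-channel and B-channel pixel values separately gives:
On the R channel, (31+31+31+45)/4 in hexadecimal, which in decimal is (49+49+49+69)/4=54, or 36 in hexadecimal.
On the G channel, (AD+AD+AD+FA)/4, which in decimal is (173+173+173+250)/4=192 (rounded down), or C0 in hexadecimal.
On the B channel, (F1+F1+F1+8B)/4, which in decimal is (241+241+241+139)/4=215 (rounded down), or D7 in hexadecimal.
The pixel value of the resulting compressed pixel 5 is therefore 36C0D7.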
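For example, with the above method, every 4 pixels in a frame are compressed into one pixel in each pass, so three passes reduce the image to 1/64 of its original pixel count and six passes reduce each side to 1/64 of its original length; taking a 1920*1280 image as an example, the size after six passes is 30*20. The compression can be performed on ordinary servers, and compressing one image takes only about 100 milliseconds.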
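For example, the compression can be implemented as follows: Java's BufferedImage object can read a specified pixel according to the height and width parameters and can thus read the pixel data of the whole picture; the object can also draw a picture from the pixel data that has been read. This implementation reads quickly and can meet the requirement for high efficiency.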
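One compression pass can be sketched with `BufferedImage.getRGB` and `setRGB`. The disclosure does not say how pixels are grouped, so the 2*2 block grouping below is an assumption, as are the names `FrameCompressor`, `averageRgb` and `halve`; channel values are parsed as hexadecimal bytes and averaged with integer division.

```java
import java.awt.image.BufferedImage;

final class FrameCompressor {
    /** Averages packed 0xRRGGBB pixels channel by channel (integer division). */
    static int averageRgb(int... pixels) {
        if (pixels.length == 0) {
            throw new IllegalArgumentException("no pixels");
        }
        long r = 0, g = 0, b = 0;
        for (int p : pixels) {
            r += (p >> 16) & 0xFF;
            g += (p >> 8) & 0xFF;
            b += p & 0xFF;
        }
        int n = pixels.length;
        return (int) (((r / n) << 16) | ((g / n) << 8) | (b / n));
    }

    /**
     * One compression pass: each 2x2 block of the source becomes one pixel.
     * Assumes the source is at least 2x2; odd trailing rows/columns are dropped.
     */
    static BufferedImage halve(BufferedImage src) {
        int w = src.getWidth() / 2;
        int h = src.getHeight() / 2;
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                dst.setRGB(x, y, averageRgb(
                        src.getRGB(2 * x, 2 * y), src.getRGB(2 * x + 1, 2 * y),
                        src.getRGB(2 * x, 2 * y + 1), src.getRGB(2 * x + 1, 2 * y + 1)));
            }
        }
        return dst;
    }
}
```

Calling `halve` repeatedly applies further passes; each call divides the pixel count by four.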
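In step 120, the similarity between every two adjacent frames in the video frame set is calculated.
Calculating the similarity between every two adjacent frames in the video frame set comprises: calculating the bit array (bitArray) corresponding to each of the two adjacent frames; and determining the similarity between the two adjacent frames according to the Hamming distance between their bit arrays. Similarity can be computed in parallel: after each parallel server finishes its own computation, only the last frame of one server and the first frame of the next server remain to be compared. This further improves processing efficiency.
Calculating the bit array corresponding to a frame comprises: calculating the average of the pixel values of all pixels of the frame; setting pixels of the frame whose pixel value is greater than or equal to the average to a first preset value, and pixels whose pixel value is less than the average to a second preset value; and determining the array formed by the first and second preset values as the bit array corresponding to the frame.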
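In some embodiments, where the pixels of the frame are represented by an R-channel pixel value, a G-channel pixel value and a B-channel pixel value, calculating the average of the pixel values of all pixels of the frame comprises: calculating the average of the R-channel, G-channel and B-channel pixel values over all pixels of the frame. For example, a frame has 4 pixels, 31ADF1, 31ADF1, 31ADF1 and 45FA8B (hexadecimal). The channel averages are: on the R channel, (31+31+31+45)/4=36 (hexadecimal); on the G channel, (AD+AD+AD+FA)/4=C0 (hexadecimal); on the B channel, (F1+F1+F1+8B)/4=D7 (hexadecimal). The channel averages are then averaged: (54+192+215)/3≈153.67 (decimal).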
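To build the bit array of a frame, for example, each pixel value x is compared in turn with the average m: if the pixel value is greater than or equal to the average, the pixel is marked 1; if it is less than the average, the pixel is marked 0. As a ternary expression: x>=m?1:0. The marks of all pixels of a frame are written bit by bit into a bit array, for example [0,1,1,0,1,1,0,1,0,1].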
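The thresholding described above can be sketched as follows. The sketch assumes that the "pixel value" of an RGB pixel is the mean of its three channels, and it compares channel sums with integer arithmetic, which is equivalent to comparing against the mean while avoiding rounding; `FrameBits` and `toBits` are hypothetical names.

```java
final class FrameBits {
    /**
     * Builds the bit array of a frame from packed 0xRRGGBB pixels.
     * A bit is true (1) when the pixel's value is >= the frame average.
     */
    static boolean[] toBits(int[] rgbPixels) {
        int n = rgbPixels.length;
        int[] channelSums = new int[n];
        long total = 0;
        for (int i = 0; i < n; i++) {
            int p = rgbPixels[i];
            // R + G + B is three times the pixel's mean channel value.
            channelSums[i] = ((p >> 16) & 0xFF) + ((p >> 8) & 0xFF) + (p & 0xFF);
            total += channelSums[i];
        }
        boolean[] bits = new boolean[n];
        for (int i = 0; i < n; i++) {
            // x >= m ? 1 : 0, with both sides scaled by 3 * n to stay in integers.
            bits[i] = (long) channelSums[i] * n >= total;
        }
        return bits;
    }
}
```

A `java.util.BitSet` would be more compact for large frames; a `boolean[]` is used here to keep the positions explicit.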
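In step 130, based on the similarity, dissimilar video frames are selected from the video frame set to form a representative frame set.
Selecting, based on the similarity, dissimilar video frames from the video frame set to form the representative frame set comprises: adding the first frame in the video frame set to the representative frame set; and, starting from the first frame, calculating the similarity between every two adjacent frames in the video frame set and, when the similarity is less than a preset value, adding the later of the two adjacent frames to the representative frame set.
In some embodiments, the last frame of a shot may also be added to the representative frame set. Two adjacent dissimilar frames in the representative frame set are the first frames of two different shots. Selecting the representative frame set in this targeted way can further improve review accuracy.
Selecting dissimilar frames as representative frames picks out frames that cannot be caught by the naked eye or by a fixed sampling frequency alone, which avoids missed judgments and improves review accuracy.
Suppose that two frames are deemed dissimilar if more than 10% of the values in their bit arrays differ, i.e. the similarity is less than 90%, and similar if no more than 10% of the values differ, i.e. the similarity is at least 90%. Then, for 30*20=600 bits, two frames are dissimilar if more than 60 bits differ and similar if 60 bits or fewer differ.
FIG. 6a is a schematic diagram of two dissimilar frames according to some embodiments of the present disclosure.
The bit arrays of frame A and frame B both have 10 bits, of which 2 differ and 8 are the same, so the similarity is 8/10=80%. If frames with a similarity below 90% are deemed dissimilar, frame A and frame B are dissimilar.
FIG. 6b is a schematic diagram of two similar frames according to some embodiments of the present disclosure.
The bit arrays of frame A and frame B both have 10 bits, of which 1 differs and 9 are the same, so the similarity is 9/10=90%. If frames with a similarity of at least 90% are deemed similar, frame A and frame B are similar.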
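The comparison in FIG. 6a and FIG. 6b can be sketched as a Hamming distance followed by a threshold test. The sketch uses integer arithmetic so that exactly 90% counts as similar; the names `FrameSimilarity`, `hammingDistance` and `isSimilar` are hypothetical.

```java
final class FrameSimilarity {
    /** Number of positions at which the two bit arrays differ. */
    static int hammingDistance(boolean[] a, boolean[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("bit arrays must have equal length");
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                distance++;
            }
        }
        return distance;
    }

    /**
     * Two frames are similar when their similarity, (length - distance) / length,
     * is at least minSimilarityPercent.
     */
    static boolean isSimilar(boolean[] a, boolean[] b, int minSimilarityPercent) {
        int distance = hammingDistance(a, b);
        return (long) distance * 100 <= (long) a.length * (100 - minSimilarityPercent);
    }
}
```

With a 90% threshold, two 10-bit arrays differing in 2 positions are dissimilar, and two differing in 1 position are similar.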
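FIG. 7 is a schematic diagram of calculating the similarity between every two adjacent frames in a video frame set according to some embodiments of the present disclosure.
Suppose the video to be reviewed has 5 consecutive frames: frame A, frame B, frame C, frame D and frame E.
First, frame A is added to the representative frame set and the similarity between frame A and frame B is calculated. If frame A and frame B are similar, the similarity between frame B and frame C is calculated next. If frame B and frame C are also similar, the similarity between frame C and frame D is calculated in the same way; if frame C and frame D are dissimilar, the dissimilar frame D is added to the representative frame set.
Frame A and frame D, which are adjacent dissimilar frames in the representative frame set, are the first frames of two different shots: the first shot includes frames A, B and C, and the second shot includes frames D and E. Frame C, the last frame of the first shot, is also added to the representative frame set, and the above process is repeated starting from frame D.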
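The walk over adjacent frames can be sketched as below. The similarity test is passed in as a predicate, and whether the very last frame of the video is also kept is not stated in the text, so this sketch only keeps shot tails at detected boundaries; `RepresentativeFrames` and `select` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class RepresentativeFrames {
    /**
     * Selects representative frame indices.
     *
     * @param frameCount        number of frames in the video frame set
     * @param similarToPrevious returns true when frame i is similar to frame i - 1
     * @param keepShotTails     also keep the last frame of a shot that ends at a boundary
     */
    static List<Integer> select(int frameCount, IntPredicate similarToPrevious,
                                boolean keepShotTails) {
        List<Integer> representatives = new ArrayList<>();
        if (frameCount <= 0) {
            return representatives;
        }
        representatives.add(0); // the first frame is always kept
        for (int i = 1; i < frameCount; i++) {
            if (!similarToPrevious.test(i)) {
                int tail = i - 1;
                // Avoid adding the same index twice when a shot has a single frame.
                if (keepShotTails && representatives.get(representatives.size() - 1) != tail) {
                    representatives.add(tail);
                }
                representatives.add(i); // first frame of the new shot
            }
        }
        return representatives;
    }
}
```

For frames A-E with a single boundary between C and D, the selected indices correspond to frames A, C and D when shot tails are kept, and to A and D otherwise.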
In step 140, one frame is selected from the video frame set at every preset interval and added to the representative frame set.
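One representative frame is extracted at every preset interval. The calculation can be, for example: total number of frames in the shot / 1440 = number of representative frames extracted by interval (1440 being the number of frames in 1 minute of video), i.e. one representative frame is extracted every 1440 frames. Suppose a shot lasts 8 minutes 30 seconds, i.e. 12240 frames; frames 0, 1440, 2880, ..., 11520 are then extracted, 9 frames in total (10 if the last frame of the shot is also counted).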
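Interval sampling can be sketched as follows, using zero-based frame indices so that only indices below the frame count are returned. `IntervalSampler` and `sample` are hypothetical names, and starting at index 0 is an assumption consistent with the example.

```java
import java.util.ArrayList;
import java.util.List;

final class IntervalSampler {
    /** Returns the zero-based indices 0, interval, 2 * interval, ... below totalFrames. */
    static List<Integer> sample(int totalFrames, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        List<Integer> picked = new ArrayList<>();
        for (int i = 0; i < totalFrames; i += interval) {
            picked.add(i);
        }
        return picked;
    }
}
```

With 24 frames per second, an interval of 1440 corresponds to one sampled frame per minute of video.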
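Selecting a frame at every preset interval and adding it to the representative frame set guards against cases where the first and last frames of a shot differ greatly, and avoids missing non-compliant frames within a run of slowly changing consecutive frames, thereby lowering the probability of missed detections and improving review accuracy.
In step 150, the representative frame set is output and reviewed.
With the above steps, for a 100-minute video to be reviewed, no more than 200 representative frames are extracted, and the review can be completed in 1 minute.
In the above embodiments, the representative frame set is extracted from a video frame set containing all frames of the video to be reviewed and consists of a small number of representative frames, so the review result for the entire video can be obtained by reviewing the representative frame set alone. This saves labor and improves efficiency while maintaining high accuracy, suits large-scale video review scenarios that demand both, and completes the task quickly and conveniently with high precision.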
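FIG. 8 is a schematic diagram of an apparatus for reviewing video according to some embodiments of the present disclosure.
As shown in FIG. 8, the apparatus 800 for reviewing video of this embodiment includes an obtaining module 810, a calculation module 820, a determination module 830 and a review module 840.
The obtaining module 810 is configured to obtain a video frame set corresponding to a video to be reviewed. Obtaining the video frame set corresponding to the video to be reviewed comprises: cutting the video to be reviewed into a plurality of sub-videos; processing the sub-videos in parallel to obtain one sub-frame set for each sub-video; generating an identifier for each frame in all sub-frame sets using the snowflake algorithm; and arranging all frames of all sub-frame sets in identifier order, the sorted frames forming the video frame set.
The video frame set may comprise, for example, a video frame set composed of original video frames or a video frame set composed of compressed video frames. The video frame set composed of compressed video frames is obtained through parallel compression by a plurality of compression servers. The parallel compression comprises: obtaining the number of frames of the video to be reviewed; determining, according to the number of frames and the number of compression servers, the number of frames to be processed by each compression server; compressing the corresponding frames with each compression server; and combining the compressed video frames output by all compression servers into the video frame set.
Compressing the corresponding frames with the compression server comprises: dividing, by the compression server, all pixels of each frame to be processed into a plurality of groups, each group corresponding to one pixel in the compressed frame; calculating the average of the pixel values of all pixels in each group; and using the average of the pixel values of all pixels in each group as the pixel value of the pixel corresponding to that group in the compressed frame. In some embodiments, where the pixels of the frame to be processed are represented by an R-channel pixel value, a G-channel pixel value and a B-channel pixel value, calculating the average of the pixel values of all pixels in each group comprises: calculating the average of the R-channel pixel values of all pixels in each group; calculating the average of the G-channel pixel values of all pixels in each group; and calculating the average of the B-channel pixel values of all pixels in each group; wherein using the average of the pixel values of all pixels in each group as the pixel value of the pixel corresponding to that group in the compressed frame comprises: using the averages of the R-channel, G-channel and B-channel pixel values of all pixels in each group as the R-channel, G-channel and B-channel pixel values, respectively, of the pixel corresponding to that group in the compressed frame.
The calculation module 820 is configured to calculate the similarity between every two adjacent frames in the video frame set. Calculating the similarity between every two adjacent frames in the video frame set comprises: calculating the bit array corresponding to each of the two adjacent frames; and determining the similarity between the two adjacent frames according to the Hamming distance between their bit arrays.
Calculating the bit array corresponding to a frame comprises: calculating the average of the pixel values of all pixels of the frame. In some embodiments, where the pixels of the frame are represented by an R-channel pixel value, a G-channel pixel value and a B-channel pixel value, calculating the average of the pixel values of all pixels of the frame comprises: averaging the R-channel, G-channel and B-channel pixel values of all pixels of the frame. Pixels of the frame whose pixel value is greater than or equal to the average are set to a first preset value, and pixels whose pixel value is less than the average are set to a second preset value; the array formed by the first and second preset values is determined as the bit array corresponding to the frame.
The determination module 830 is configured to select, based on the similarity, dissimilar video frames from the video frame set to form a representative frame set. Selecting, based on the similarity, dissimilar video frames from the video frame set to form the representative frame set comprises: adding the first frame in the video frame set to the representative frame set; and, starting from the first frame, calculating the similarity between every two adjacent frames in the video frame set and, when the similarity is less than a preset value, adding the later of the two adjacent frames to the representative frame set.
The review module 840 is configured to output the representative frame set and review the representative frame set.
In the above embodiments, the representative frame set is extracted from a video frame set containing all frames of the video to be reviewed and consists of a small number of representative frames, so the review result for the entire video can be obtained by reviewing the representative frame set alone. This saves labor and improves efficiency while maintaining high accuracy, suits large-scale video review scenarios that demand both, and completes the task quickly and conveniently with high precision.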
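FIG. 9 is a schematic diagram of an apparatus for reviewing video according to other embodiments of the present disclosure.
As shown in FIG. 9, the apparatus 900 for reviewing video of this embodiment includes a memory 910 and a processor 920 coupled to the memory 910; the processor 920 is configured to execute, based on instructions stored in the memory 910, the method for reviewing video in any embodiment of the present disclosure.
The memory 910 may include, for example, a system memory, a fixed non-volatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.
The apparatus 900 for reviewing video may further include an input/output interface 930, a network interface 940, a storage interface 950, and the like. These interfaces 930, 940, 950, the memory 910 and the processor 920 may be connected, for example, through a bus 960. The input/output interface 930 provides a connection interface for input and output devices such as a display, a mouse, a keyboard and a touch screen. The network interface 940 provides a connection interface for various networked devices. The storage interface 950 provides a connection interface for external storage devices such as SD cards and USB flash drives.
Those skilled in the art should understand that embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more non-transitory computer-readable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are only preferred embodiments of the present disclosure and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present disclosure shall fall within its scope of protection.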

Claims (15)

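  1. A method for reviewing video, comprising:
    obtaining a video frame set corresponding to a video to be reviewed;
    calculating the similarity between every two adjacent frames in the video frame set;
    selecting, based on the similarity, dissimilar video frames from the video frame set to form a representative frame set; and
    outputting the representative frame set and reviewing the representative frame set.
  2. The method for reviewing video according to claim 1, further comprising:
    before outputting the representative frame set, selecting one frame from the video frame set at every preset interval and adding it to the representative frame set.
  3. The method for reviewing video according to claim 1, wherein selecting, based on the similarity, dissimilar video frames from the video frame set to form the representative frame set comprises:
    adding the first frame in the video frame set to the representative frame set; and
    starting from the first frame, calculating the similarity between every two adjacent frames in the video frame set and, when the similarity is less than a preset value, adding the later of the two adjacent frames to the representative frame set.
  4. The method for reviewing video according to claim 1, wherein calculating the similarity between every two adjacent frames in the video frame set comprises:
    calculating the bit array corresponding to each of the two adjacent frames; and
    determining the similarity between the two adjacent frames according to the Hamming distance between the bit arrays corresponding to the two adjacent frames.
  5. The method for reviewing video according to claim 4, wherein calculating the bit array corresponding to a frame comprises:
    calculating the average of the pixel values of all pixels of the frame;
    setting pixels of the frame whose pixel value is greater than or equal to the average to a first preset value, and pixels of the frame whose pixel value is less than the average to a second preset value; and
    determining the array formed by the first preset value and the second preset value as the bit array corresponding to the frame.
  6. The method for reviewing video according to claim 1, wherein
    the video frame set comprises a video frame set composed of original video frames or a video frame set composed of compressed video frames.
  7. The method for reviewing video according to claim 6, wherein the video frame set composed of compressed video frames is obtained through parallel compression by a plurality of compression servers, the parallel compression comprising:
    obtaining the number of frames of the video to be reviewed;
    determining, according to the number of frames and the number of compression servers, the number of frames to be processed by each compression server;
    compressing the corresponding frames to be processed with the compression servers; and
    combining the compressed video frames output by all compression servers into the video frame set.
  8. The method for reviewing video according to claim 7, wherein compressing the corresponding frames to be processed with the compression servers comprises:
    dividing, by the compression server, all pixels of each frame to be processed into a plurality of groups, each group corresponding to one pixel in the compressed frame;
    calculating the average of the pixel values of all pixels in each group; and
    using the average of the pixel values of all pixels in each group as the pixel value of the pixel corresponding to that group in the compressed frame.
  9. The method for reviewing video according to claim 8, wherein,
    where the pixels of the frame to be processed are represented by an R-channel pixel value, a G-channel pixel value and a B-channel pixel value, calculating the average of the pixel values of all pixels in each group comprises:
    calculating the average of the R-channel pixel values of all pixels in each group;
    calculating the average of the G-channel pixel values of all pixels in each group; and
    calculating the average of the B-channel pixel values of all pixels in each group;
    wherein using the average of the pixel values of all pixels in each group as the pixel value of the pixel corresponding to that group in the compressed frame comprises:
    using the averages of the R-channel, G-channel and B-channel pixel values of all pixels in each group as the R-channel, G-channel and B-channel pixel values, respectively, of the pixel corresponding to that group in the compressed frame.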
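  10. The method for reviewing video according to any one of claims 1-9, wherein obtaining the video frame set corresponding to the video to be reviewed comprises:
    cutting the video to be reviewed into a plurality of sub-videos;
    processing the plurality of sub-videos in parallel to obtain one sub-frame set for each sub-video;
    generating an identifier for each frame in all sub-frame sets using the snowflake algorithm; and
    for all sub-frame sets, arranging all frames of all sub-frame sets in identifier order, the sorted frames forming the video frame set.
  11. The method for reviewing video according to any one of claims 1-9, wherein obtaining the video frame set corresponding to the video to be reviewed comprises:
    loading the video to be reviewed with a parsing server;
    parsing the video to be reviewed into a video frame set; and
    collecting the frames in order to obtain the video frame set.
  12. The method for reviewing video according to any one of claims 1-9, further comprising:
    adding the last frame of a shot to the representative frame set, wherein two adjacent dissimilar frames respectively constitute the first frames of two different shots.
  13. An apparatus for reviewing video, comprising:
    an obtaining module configured to obtain a video frame set corresponding to a video to be reviewed;
    a calculation module configured to calculate the similarity between every two adjacent frames in the video frame set;
    a determination module configured to select, based on the similarity, dissimilar video frames from the video frame set to form a representative frame set; and
    a review module configured to output the representative frame set and review the representative frame set.
  14. An apparatus for reviewing video, comprising:
    a memory; and
    a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the method for reviewing video according to any one of claims 1-12.
  15. A non-transitory computer-readable storage medium on which a computer program is stored, the program implementing the method for reviewing video according to any one of claims 1-12 when executed by a processor.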
PCT/CN2021/140756 2021-03-09 2021-12-23 Method and apparatus for reviewing video, and computer-readable storage medium WO2022188510A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110255203.5A CN113051236B (zh) 2021-03-09 2021-03-09 Method and apparatus for reviewing video, and computer-readable storage medium
CN202110255203.5 2021-03-09

Publications (1)

Publication Number Publication Date
WO2022188510A1 true WO2022188510A1 (zh) 2022-09-15

Family

ID=76510879

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/140756 WO2022188510A1 (zh) 2021-03-09 2021-12-23 Method and apparatus for reviewing video, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN113051236B (zh)
WO (1) WO2022188510A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
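CN113051236B (zh) * 2021-03-09 2022-06-07 北京沃东天骏信息技术有限公司 Method and apparatus for reviewing video, and computer-readable storage medium
CN113724211B (zh) * 2021-08-13 2023-01-31 扬州美德莱医疗用品股份有限公司 Method and system for automatic fault identification based on state sensing
CN114205632A (zh) * 2021-12-17 2022-03-18 深圳Tcl新技术有限公司 Video preview method and apparatus, electronic device, and computer-readable storage medium
CN115361584B (zh) * 2022-08-22 2023-10-03 广东电网有限责任公司 Video data processing method and apparatus, electronic device, and readable storage medium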

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101076115A (zh) * 2006-12-26 2007-11-21 腾讯科技(深圳)有限公司 Video content review system and method
CN101547366A (zh) * 2009-05-07 2009-09-30 硅谷数模半导体(北京)有限公司 Overdrive image compression method
CN105224600A (zh) * 2015-08-31 2016-01-06 北京奇虎科技有限公司 Method and apparatus for detecting sample similarity
US20190147279A1 (en) * 2017-11-13 2019-05-16 Aupera Technologies, Inc. System of a video frame detector for video content identification and method thereof
CN110913243A (zh) * 2018-09-14 2020-03-24 华为技术有限公司 Video review method, apparatus and device
CN111491180A (zh) * 2020-06-24 2020-08-04 腾讯科技(深圳)有限公司 Method and apparatus for determining key frames
WO2021035227A1 (en) * 2020-05-30 2021-02-25 Futurewei Technologies, Inc. Systems and methods for retreiving videos using natural language description
CN113051236A (zh) * 2021-03-09 2021-06-29 北京沃东天骏信息技术有限公司 Method and apparatus for reviewing video, and computer-readable storage medium

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090207316A1 (en) * 2008-02-19 2009-08-20 Sorenson Media, Inc. Methods for summarizing and auditing the content of digital video
US20160014482A1 (en) * 2014-07-14 2016-01-14 The Board Of Trustees Of The Leland Stanford Junior University Systems and Methods for Generating Video Summary Sequences From One or More Video Segments
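CN106851437A (zh) * 2017-01-17 2017-06-13 南通同洲电子有限责任公司 Method for extracting a video summary
CN108881947B (zh) * 2017-05-15 2021-08-17 阿里巴巴集团控股有限公司 Method and apparatus for detecting infringement in live streams
CN110324706B (zh) * 2018-03-30 2022-03-04 阿里巴巴(中国)有限公司 Method and apparatus for generating a video cover, and computer storage medium
CN110971833B (zh) * 2018-09-30 2021-05-14 北京微播视界科技有限公司 Image processing method and apparatus, electronic device, and storage medium
CN109756746B (zh) * 2018-12-28 2021-03-19 广州华多网络科技有限公司 Video review method and apparatus, server, and storage medium
CN110321447A (zh) * 2019-07-08 2019-10-11 北京字节跳动网络技术有限公司 Method and apparatus for determining duplicate images, electronic device, and storage medium
CN111294646B (zh) * 2020-02-17 2022-08-30 腾讯科技(深圳)有限公司 Video processing method, apparatus, device, and storage medium
CN111586473B (zh) * 2020-05-20 2023-01-17 北京字节跳动网络技术有限公司 Video cropping method, apparatus, device, and storage medium
CN111813996B (zh) * 2020-07-22 2022-03-01 四川长虹电器股份有限公司 Video search method based on parallel single-frame and continuous multi-frame sampling
CN112203122B (zh) * 2020-10-10 2024-01-26 腾讯科技(深圳)有限公司 Artificial-intelligence-based similar video processing method and apparatus, and electronic device
CN112363790B (zh) * 2020-11-11 2022-11-29 北京字跳网络技术有限公司 Table view display method and apparatus, and electronic device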


Also Published As

Publication number Publication date
CN113051236A (zh) 2021-06-29
CN113051236B (zh) 2022-06-07


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21929972; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.01.2024))