CN110913243B - Video auditing method, device and equipment - Google Patents

Video auditing method, device and equipment

Info

Publication number
CN110913243B
Authority
CN
China
Prior art keywords
frame
frames
shot
video
candidate
Prior art date
Legal status
Active
Application number
CN201811076274.3A
Other languages
Chinese (zh)
Other versions
CN110913243A (en)
Inventor
赵海宾
杨振华
Current Assignee
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201811076274.3A (CN110913243B)
Priority to PCT/CN2019/087933 (WO2020052270A1)
Publication of CN110913243A
Application granted
Publication of CN110913243B

Classifications

    • H04N21/23418: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • G06V10/40: Extraction of image or video features
    • H04N23/60: Control of cameras or camera modules

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a video auditing method comprising: calculating frame features of the frames in a video, where a frame feature is obtained by combining a first feature and a second feature of the frame, and obtaining the similarity distance of each frame according to its frame feature; selecting some frames of the video as candidate shot boundaries according to the similarity distances of the frames; acquiring shots from the video according to the candidate shot boundaries; and acquiring key frames from the shots and auditing the video according to the key frames. By segmenting the video into its individual shots, the precision of video auditing is improved.

Description

Video auditing method, device and equipment
Technical Field
The present application relates to the field of computers, and in particular, to a method for video review, and an apparatus and device for performing the method.
Background
With the flourishing development of Internet technology, video resources on the network are numerous. Video auditing is an important means of filtering bad or illegal content out of a network video resource pool. In the traditional technology, manually auditing videos is time-consuming, labor-intensive, and of limited efficiency; the alternative of extracting frames from a video at fixed intervals and then obtaining audit results for the extracted frames through image auditing technology still suffers from high resource consumption, low precision, and other problems.
Disclosure of Invention
The application provides a video auditing method, which improves the precision of video auditing.
In a first aspect, the present application provides a method for video review, the method performed by a computing device, comprising: calculating frame features of frames in a video, wherein a frame feature is obtained by combining a first feature and a second feature of the frame, and obtaining the similarity distance of the frame according to the frame feature; selecting partial frames of the video as candidate shot boundaries according to the similarity distances of the frames; acquiring shots from the video according to the candidate shot boundaries; and acquiring a key frame from the shot, and auditing the video according to the key frame. By determining shot boundaries in advance and obtaining key frames from within each shot, the method avoids selecting key frames that cannot accurately express the video content during review, improving the precision of video auditing.
The first feature or the second feature may be any feature in terms of color, texture, shape, or the like of one frame image. For example: RGB histogram features; HSV histogram feature; HOG edge features; an LBP feature; haar features, etc. The first feature and the second feature are generally features of different aspects, such that a frame feature obtained by combining the first feature and the second feature may describe a frame image from different aspects.
In a possible implementation manner of the first aspect, the similarity distance of the frame is the Bhattacharyya distance between the frame feature and the frame feature of a preceding frame of the frame in the video.
In a possible implementation manner of the first aspect, the selecting, according to the similarity distance of the frames, a partial frame of the video as a candidate shot boundary includes: and determining that the similarity distance of the frames is greater than a first threshold value, and selecting the frames as candidate shot boundaries. The specific operation is to judge the relation between the similarity distance of the frame and a first threshold, and when the similarity distance of the frame is greater than the first threshold, the frame is selected as a candidate shot boundary. The first threshold is a preset maximum threshold of the similarity distance of the frames.
In a possible implementation manner of the first aspect, the selecting, according to the similarity distance of the frames, a partial frame of the video as a candidate shot boundary includes: determining that the similarity distance of the frame is less than or equal to a second threshold, and determining whether the frame is a candidate shot boundary according to the similarity distances of other frames in the shot candidate window of the frame. The specific operation is to judge the relationship between the similarity distance of the frame and the second threshold; when the similarity distance of the frame is less than or equal to the second threshold, whether the frame is a candidate shot boundary is determined according to the similarity distances of the other frames in its shot candidate window. The second threshold is a preset minimum threshold of the frame similarity distance; in this application, the first threshold is greater than or equal to the second threshold.
In a possible implementation manner of the first aspect, the shot candidate window of the frame is a frame set that includes a certain number of frames and is centered on the frame.
In a possible implementation manner of the first aspect, determining whether the frame is a candidate shot boundary according to the similarity distances of other frames in the shot candidate window of the frame includes: calculating the mean and variance of the similarity distances of all frames in the shot candidate window of the frame to obtain a similarity distance judgment value for the frame; and when the similarity distance of the frame is greater than its similarity distance judgment value, selecting the frame as a candidate shot boundary.
In a possible implementation manner of the first aspect, the acquiring a key frame from the shot includes: determining that the similarity distance of any subsequent frame after the first frame in the shot is greater than a preset third threshold, and selecting that subsequent frame as a candidate key frame; key frames are then obtained from the selected candidate key frames. Specifically, the first frame in the shot is selected as a candidate key frame, the similarity distance of each subsequent frame in the shot is compared with the third threshold, and a subsequent frame is selected as a candidate key frame when its similarity distance is greater than the third threshold. The third threshold is a preset key frame selection threshold whose value is determined by actual conditions such as the application scenario.
In a second aspect, the present application provides an apparatus for video review, the apparatus comprising: a shot segmentation module, a key frame determination module, and an image auditing module. The shot segmentation module is used for calculating frame features of frames in a video, wherein a frame feature is obtained by combining a first feature and a second feature of the frame, and obtaining the similarity distance of the frame according to the frame feature; selecting partial frames of the video as candidate shot boundaries according to the similarity distances of the frames; and acquiring shots from the video according to the candidate shot boundaries. The key frame determination module is used for acquiring key frames from the shots; and the image auditing module is used for auditing the video according to the key frames. The apparatus for video auditing is configured to perform the method provided in the foregoing first aspect or any possible implementation manner of the first aspect.
In one possible implementation manner of the second aspect, the similarity distance of the frame is the Bhattacharyya distance between the frame feature and the frame feature of a preceding frame of the frame in the video.
In a possible implementation manner of the second aspect, the selecting, by the shot segmentation module, a partial frame of the video as a candidate shot boundary according to the similarity distance of the frame includes: and determining that the similarity distance of the frames is greater than a first threshold value, and selecting the frames as candidate shot boundaries.
In a possible implementation manner of the second aspect, the selecting, by the shot segmentation module, a partial frame of the video as a candidate shot boundary according to the similarity distance of the frame includes: and determining whether the similarity distance of the frame is smaller than or equal to a second threshold value, and determining whether the frame is a candidate shot boundary according to the similarity distance of other frames in the shot candidate window of the frame.
In a possible implementation manner of the second aspect, the shot candidate window of the frame is a frame set that includes a certain number of frames and is centered on the frame.
In a possible implementation manner of the second aspect, the determining, by the shot segmentation module, whether the frame is a candidate shot boundary according to similarity distances of other frames in a shot candidate window of the frame includes: calculating the mean and variance of the similarity distances of all frames in the shot candidate window of the frame to obtain a similarity distance judgment value for the frame; and determining that the similarity distance of the frame is greater than the similarity distance judgment value of the frame, and selecting the frame as a candidate shot boundary.
In a possible implementation manner of the second aspect, the key frame determining module is configured to obtain key frames from the shots, and includes: determining that the similarity distance of any subsequent frame of the first frame in the shot is greater than a preset third threshold value, and selecting the subsequent frame as a candidate key frame; key frames are obtained from the selected candidate key frames.
In a third aspect, the present application provides a computing device system. The computing device system includes at least one computing device. Each computing device includes a memory and a processor. The processor of at least one computing device is configured to access code in the memory of the at least one computing device to perform the method provided by the first aspect or any possible implementation of the first aspect.
In a fourth aspect, the present application provides a non-transitory readable storage medium storing a program that, when executed by a computing device, performs the method provided in the foregoing first aspect or any possible implementation manner of the first aspect. The storage medium includes, but is not limited to, volatile memory such as random access memory, and non-volatile memory such as flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
In a fifth aspect, the present application provides a computing device program product comprising computer instructions that, when executed by a computing device, perform the method provided in the first aspect or any possible implementation manner of the first aspect. The computer program product may be a software installation package, which can be downloaded and executed on a computing device when the method provided in the first aspect or any possible implementation manner of the first aspect needs to be used.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the embodiments are briefly described below.
Fig. 1 is a schematic diagram of a relationship between a video, a shot, and a frame according to an embodiment of the present application;
fig. 2a is a schematic view of an application scenario of a video auditing method according to an embodiment of the present application;
fig. 2b is a schematic view of another application scenario of the video auditing method according to the embodiment of the present application;
fig. 3 is a schematic flowchart of a method for video review according to an embodiment of the present application;
fig. 4 is a schematic flowchart of video shot segmentation provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of a shot candidate window according to an embodiment of the present application;
fig. 6 is a schematic flowchart of determining key frames in a shot according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an apparatus for video auditing according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a computing device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a computing device system according to an embodiment of the present application.
Detailed Description
The following describes technical solutions of a method, an apparatus, and a device for video review provided in the present application in detail with reference to the drawings in the present application.
In the present application, the relationship between video, shot and frame is shown in fig. 1, and the video includes a plurality of frames that change with time. Shots represent video segments within which there is generally a correlation in the content of frames, each shot comprising at least one frame. A video may be divided into several shots according to the content, and each shot includes a certain number of frames.
The video auditing method has the following two main application scenarios.
In fig. 2a, a destination computing device (e.g., a server) receives a video transmitted by a source computing device (e.g., a client) over a communication path, and the video is stored in a memory of the destination computing device. The video auditing apparatus reads the video from the memory and executes the video auditing method: the video first undergoes shot segmentation, key frame determination, and key frame image review, yielding a review result for each shot of the video; the review results of the shots are then combined into the review result of the video, which is stored in the memory, for example in text form.
As shown in fig. 2b, the destination computing device receives the video sent by the source computing device and stores it in a cloud storage device providing a storage service (which may be a block storage service device, a file storage service device, or an object storage service device; this embodiment takes the object storage service device as an example). A video auditing apparatus in the destination device reads the video from the object storage service device, executes the video auditing method, and stores the resulting audit result in the object storage service device. Take, for example, a source computing device that is a client of a video operation company and a destination computing device that is a virtual machine or physical machine in a cloud environment. The video operation company sets up a video auditing service in the cloud environment; a video file received by the company's client is uploaded to the object storage service device; the video auditing service is started, reads the video from a designated storage location in the object storage service device, performs the video audit, and saves the audit result as a file in JSON format; after the video file to be processed has been processed, the JSON file, i.e., the audit result, is saved in the object storage service device.
Fig. 3 is a schematic flow chart of a video auditing method according to an embodiment of the present application.
In this application, a frame feature refers to a feature formed by combining the first feature and the second feature of the image information represented by a frame in a video.
The first feature or the second feature may be any feature of the color, texture, shape, or similar aspects of a frame image. For example: Red, Green, Blue (RGB) histogram features; Hue, Saturation, Value (HSV) histogram features; Histogram of Oriented Gradients (HOG) edge features; Local Binary Pattern (LBP) features; Haar features, and so on. The first and second features are generally different types of features.
In one embodiment of the present application, the first feature and the second feature are an HSV histogram feature and an HOG edge feature, respectively.
In the present application, the similarity distance of a frame denotes the Bhattacharyya distance between the frame feature of the frame and the frame feature of a preceding frame in the video; the preceding frame may be any frame located before the frame in the video in which it resides.
S101, reading the video. The video auditing device reads the video to be audited from the memory or the cloud storage service equipment.
And S102, determining the shot in the video. The method comprises the steps of standardizing the size of a frame of a video to be audited, calculating frame characteristics, pre-judging candidate shot boundaries according to the similarity distance of the frame, locally and adaptively judging the candidate shot boundaries by using a shot candidate window, and determining the shot boundaries in the candidate shot boundaries.
After a shot boundary is determined, the frames from the first frame of the video (or from the frame following the preceding adjacent shot boundary) up to that shot boundary constitute the frames of one shot; this process of determining the frames in a shot is the shot segmentation process.
After the execution of S102 is completed, a shot (shot 1) is divided, and then key frames in the shot are identified and the content of the key frames is checked through S103-S105. In the execution of S103-S105, S102 may be executed on the subsequent frame of the video to identify the next shot (shot 2). Thus, after the execution of S103-S105 of the shot 1 is completed, S103-S105 can be executed on the shot 2, and the recognition of the shot 2 does not need to be executed after the execution of S105 of the shot 1 is completed.
And S103, determining key frames in the shot. Firstly, marking a first frame in a shot as a candidate key frame, traversing subsequent frames in the shot, marking all the candidate key frames according to the comparison between the similarity distance of the frames and a preset threshold value, and finally determining the key frames from the candidate key frames.
And S104, reviewing the key frames in the shot. The determined key frames are input as images to be audited into the image recognition model, and the key frames in the shot are reviewed through the image recognition model.
Optionally, the image recognition model may be an image classification model trained and continuously optimized on a large data set based on a deep neural network. For example, a ResNet101 deep residual network model with high recognition accuracy can be used for classification and recognition. Inputting a key frame into the trained image recognition model quickly yields a recognition result for the key frame. The application does not limit the specific image auditing technique; typical machine learning methods such as support vector machines, or deep learning network models, can alternatively be used in the video auditing method provided by this application.
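As an illustration of this step, the sketch below loads a pre-trained ResNet101 classifier and reviews a single key frame. PyTorch/torchvision, the ImageNet weights, and the preprocessing pipeline are illustrative assumptions only: the patent names a ResNet101 residual network but specifies neither a framework nor a label set, and a real deployment would use a model trained on audit categories.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Hypothetical stand-in for the trained image recognition model:
# a torchvision ResNet101 with its default (ImageNet) weights.
model = models.resnet101(weights=models.ResNet101_Weights.DEFAULT)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def review_keyframe(image_path):
    # Returns the predicted class index for one key frame image.
    img = Image.open(image_path).convert("RGB")
    batch = preprocess(img).unsqueeze(0)      # shape: (1, 3, 224, 224)
    with torch.no_grad():
        logits = model(batch)
    return int(logits.argmax(dim=1))
```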
And S105, after the review results of all key frames in the shot are obtained, the review result of one key frame, or of some N key frames, is selected as the review result of the shot, and the shot review result is stored (for example, as text). The key frame may be selected by median filtering, i.e., taking the key frame whose recognition score is the intermediate value. At this point, the review of the shot is complete.
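A minimal sketch of this median-filtering selection; the (frame_id, score) pair is a hypothetical representation of a per-key-frame review result:

```python
def shot_review_result(keyframe_results):
    # keyframe_results: list of (frame_id, score) pairs, one per key frame
    # in the shot. Median filtering: return the result whose recognition
    # score sits at the intermediate (median) position.
    ordered = sorted(keyframe_results, key=lambda r: r[1])
    return ordered[len(ordered) // 2]
```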
And S106, after the review of each shot is finished, judging whether the video still has shots to be reviewed; if so, repeating S102-S106. The review results of all shots in the same video are merged and stored (for example, into the same text).
And S107, storing the video auditing result to a local memory or cloud storage equipment, wherein the video auditing result comprises the auditing results of all the shots in a video, and the video auditing result can be a text.
When there are a plurality of videos to be audited, the aforementioned S101 to S107 are performed for each video.
The specific implementation of determining the candidate shot boundaries in step S102 is shown in fig. 4.
The shot candidate window is designed for locally and adaptively determining candidate shot boundaries; its schematic structure is shown in fig. 5. Denote a frame in the currently processed video as frame P; with frame P as the center, frame P and the N frames before and after it together form a shot candidate window of 2N+1 frames, recorded as the shot candidate window of frame P. The shot candidate window moves as the current frame advances and can be regarded as a sliding window; N is a positive integer.
S201, scaling a frame P in the video to a standard size of M rows and M columns to complete frame size standardization, wherein M is a positive integer.
Alternatively, the size normalization method in S201 may use a bilinear scaling algorithm.
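A one-line sketch of the size standardization in S201, assuming OpenCV; cv2.INTER_LINEAR performs the bilinear scaling mentioned above:

```python
import cv2

def standardize(frame, m):
    # S201: scale the frame to the standard M x M size with bilinear interpolation.
    return cv2.resize(frame, (m, m), interpolation=cv2.INTER_LINEAR)
```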
S202, frame feature calculation is performed on the size-standardized frame P (i.e., an image), and the frame feature is obtained by combining the first feature and the second feature.
In one embodiment of the application, the HSV color model is selected, converting the R, G, B values of each pixel in the image to H, S, V values, where H represents hue, H ∈ [0, 2π], S represents saturation, S ∈ [0, 1], and V represents value (lightness), V ∈ [0, 1]. In the HSV color space, an HS two-dimensional histogram is computed; the histogram is normalized and reduced to a one-dimensional vector, giving the first feature. HOG edge statistics are extracted in the grayscale color space to obtain the second feature; optionally, the quantization level is 9 and the cell size is 1/4 of the frame width by 1/4 of the frame height. The first feature and the second feature are combined by concatenating the two one-dimensional vectors, giving the frame feature. The similarity distance Sim of frame P is then calculated, expressed as the Bhattacharyya distance between the frame feature of P and that of its preceding frame, with Sim ∈ [0, ∞); the larger the value of Sim, the greater the difference between the picture content of the two adjacent frames, and vice versa.
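The sketch below mirrors this embodiment with OpenCV and NumPy, both illustrative assumptions (the patent names no library): an HS two-dimensional histogram as the first feature, HOG edge statistics with 9 bins and cells of 1/4 frame width by 1/4 frame height as the second feature, concatenation into the frame feature, and a Bhattacharyya-style distance Sim = -ln(Σᵢ √(pᵢ·qᵢ)) between the features of adjacent frames. The 16x16 bin count and L1 normalization are assumptions; the standard size M is assumed divisible by 4.

```python
import cv2
import numpy as np

def frame_feature(frame_bgr):
    # First feature: HS two-dimensional histogram in HSV space,
    # normalized and flattened to a one-dimensional vector.
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [16, 16], [0, 180, 0, 256])
    hist = cv2.normalize(hist, None, alpha=1.0, norm_type=cv2.NORM_L1).flatten()
    # Second feature: HOG edge statistics on the grayscale image,
    # 9 orientation bins, cells of 1/4 frame width x 1/4 frame height.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape
    hog = cv2.HOGDescriptor((w, h), (w // 2, h // 2), (w // 4, h // 4),
                            (w // 4, h // 4), 9)
    hog_vec = hog.compute(gray).flatten()
    # Frame feature: concatenation of the two one-dimensional vectors.
    return np.concatenate([hist, hog_vec])

def similarity_distance(feat, prev_feat, eps=1e-12):
    # Bhattacharyya-style distance between the frame features of two
    # adjacent frames; a larger Sim means a larger content difference.
    p = feat / (feat.sum() + eps)
    q = prev_feat / (prev_feat.sum() + eps)
    bc = np.sum(np.sqrt(p * q))      # Bhattacharyya coefficient in [0, 1]
    return -np.log(bc + eps)         # Sim in [0, +inf)
```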
Optionally, the calculated similarity distance Sim of the frame P is stored in the storage module for use in subsequent operations.
S203, according to the preset maximum threshold Tm of the frame similarity distance, determine whether frame P is a candidate shot boundary. Compare Sim with Tm: if Sim > Tm, it can be determined that the picture content of frame P differs significantly from that of frame P-1, and step S204 is entered; otherwise, step S205 is entered.
And S204, marking the frame P as a candidate shot boundary.
S205, according to the preset minimum threshold Tn of the frame similarity distance, determine whether frame P is a non-shot boundary. Compare Sim with Tn: if Sim ≤ Tn, the similarity distance between frame P and frame P-1 is small and their picture content is close, so step S206 is entered; otherwise, step S207 is entered.
S206, mark the frame P as a non-shot boundary.
S207, the similarity distance Sim of frame P is greater than Tn and less than or equal to Tm, so whether frame P is a candidate shot boundary is determined adaptively using its shot candidate window. Determine whether the similarity distances Sim of all frames in the shot candidate window centered on frame P have been calculated; if so, proceed to step S208; otherwise, take frame P+1 (the next frame after frame P) as frame P and return to step S201. Once the similarity distances of all frames within the shot candidate window of frame P have been calculated, the process proceeds to step S208.
S208, calculate the mean M and variance V of the similarity distance of all frames in the shot candidate window of the frame P.
S209, compare the similarity distance Sim of frame P with M + V × scale, where scale is a preset adjustment parameter. When Sim > M + V × scale, proceed to step S204; when Sim ≤ M + V × scale, proceed to step S206. Shot boundaries are then determined among the candidate shot boundaries. Optionally, among the candidate shot boundaries marked by the pre-judgment and the adaptive judgment, those whose similarity distance is not a maximum within the local area are eliminated by non-maximum suppression, leaving the candidate shot boundaries whose similarity distance is a local maximum.
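Steps S203-S209 amount to a three-way judgment per frame; below is a sketch under the assumption that the similarity distances of all relevant frames are already available (variable names are illustrative):

```python
import numpy as np

def is_candidate_boundary(sims, i, t_m, t_n, n, scale):
    # sims: similarity distance Sim of every frame in the video;
    # i: index of frame P; t_m / t_n: preset maximum / minimum thresholds;
    # n: half-width of the shot candidate window; scale: adjustment parameter.
    sim = sims[i]
    if sim > t_m:        # S203/S204: obvious content change -> candidate boundary
        return True
    if sim <= t_n:       # S205/S206: frames too similar -> non-shot boundary
        return False
    # S207-S209: local adaptive judgment inside the shot candidate window.
    window = sims[max(0, i - n): i + n + 1]
    m, v = float(np.mean(window)), float(np.var(window))
    return sim > m + v * scale
```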
A shot is then delimited by the shot boundary and either the first frame of the video or the frame following the preceding adjacent shot boundary.
The specific implementation of step S103 is shown in fig. 6.
S301, marking the first frame in the shot as a candidate key frame.
S302, reading the similarity distance Sim of the next frame from the storage module. If the similarity distances of the frames obtained in step S102 were not stored in the storage module, the similarity distance Sim of the frame is calculated in step S302 as described in step S202.
S303, comparing the similarity distance Sim of the frame with the preset threshold Tx. When Sim is greater than the threshold Tx, proceed to step S304. When Sim is less than or equal to the threshold Tx, the current frame is considered similar to the previous frame; since reviewing the content of one of the two frames suffices for auditing purposes, the frame is not marked as a candidate key frame and the process proceeds directly to step S305.
S304, marking the frame as a candidate key frame.
S305, determining whether all frames in the shot have been processed. If there are unprocessed frames in the shot, return to step S302; if all frames in the shot have been processed, proceed to step S306.
Optionally, after every frame in the shot has been processed through S301 to S305, the similarity distance of each frame is no longer needed in subsequent steps, so the stored similarity distance values of the frames in the shot can be released.
S306, filtering the candidate key frames and determining the key frames in the shot.
Optionally, the candidate key frames may be filtered by the gray-level variance of each frame. Calculate the gray-level mean μk and the gray-level variance σk² of the kth frame. When the frame gray-level variance σk² is greater than a preset threshold Ty, the frame content is judged to be rich with clear tonal levels, so the frame keeps its candidate key frame mark. When the frame gray-level variance σk² is less than the threshold Ty, the frame content is not rich and carries little information, so the candidate key frame mark is removed. Through this importance judgment of the candidate key frames, frames with poor visual content are eliminated.
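A sketch of this gray-level variance filter with NumPy; the candidate index list and grayscale frame array are assumed inputs:

```python
import numpy as np

def filter_by_gray_variance(candidates, gray_frames, t_y):
    # Keep a candidate key frame only if its gray-level variance exceeds
    # the preset threshold Ty, i.e. the frame carries enough information.
    kept = []
    for k in candidates:
        gray = gray_frames[k].astype(np.float64)
        mu_k = gray.mean()                      # gray-level mean of frame k
        var_k = np.mean((gray - mu_k) ** 2)     # gray-level variance of frame k
        if var_k > t_y:
            kept.append(k)
    return kept
```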
Optionally, candidate key frames can also be filtered by fixing the number of key frames per shot. According to the principle of uniform key frame distribution, the key frames in the shot are kept evenly spread so that they represent the key content of the shot to the maximum extent. For example, suppose a shot has 5 candidate key frames and the upper limit of the number of key frames per shot is fixed at 3. Since the first frame of each shot is determined to be a key frame and the position of the last candidate key frame is known, the one of the remaining three candidate key frames positioned closest to the central frame of the shot is selected as a key frame; the three key frames of the shot are thereby determined for subsequent image review, as shown in the sketch below. It should be noted that the application does not limit the value of the per-shot key frame upper limit, which may be set according to the actual situation. When there are fewer candidate key frames than the set upper limit, all of them can be regarded as key frames without considering the uniform distribution principle.
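A sketch of the fixed-count selection in the example above (upper limit of 3 key frames); the exact selection rule beyond that worked example is an assumption:

```python
def select_uniform(candidates, shot_start, shot_end, limit=3):
    # candidates: frame indices of candidate key frames, ascending order.
    # Follows the 3-key-frame example: keep the first frame, the last
    # candidate, and the middle candidate closest to the shot's center frame.
    if len(candidates) <= limit:
        return list(candidates)      # fewer candidates than the limit: keep all
    center = (shot_start + shot_end) // 2
    middle = min(candidates[1:-1], key=lambda c: abs(c - center))
    return [candidates[0], middle, candidates[-1]]
```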
In the embodiments of the present application, when determining the key frames within a shot, the two candidate key frame filtering methods above may be used together or individually.
The present application provides a video review device 400. As shown in fig. 7, the video review apparatus 400 includes a shot segmentation module 401, a key frame determination module 402, an image review module 403, and a storage module 404. The shot segmentation module 401, the key frame determination module 402, the image review module 403, and the storage module 404 may be software modules running on a computing device.
The shot segmentation module 401 includes a preprocessing sub-module 4011 and a shot boundary determination sub-module 4012. The shot segmentation module 401 executes the foregoing steps S101-S102: the preprocessing sub-module 4011 executes S101 and the S201-S202 part of S102, and the shot boundary determination sub-module 4012 executes the S203-S209 part of S102. The shot segmentation module 401 establishes communication with the key frame determination module 402 and transmits the segmented shots to it; it also establishes communication with the storage module 404 to store the similarity distance values Sim of the frames. The key frame determination module 402 receives the shots from the shot segmentation module and executes the aforementioned step S103 (specifically steps S301-S306), reading the similarity distance value Sim of a frame from the storage module 404 when executing step S302. After the key frames are determined, the key frame determination module 402 establishes communication with the image review module 403 and transmits the key frames to be reviewed to it. The image review module 403 performs the aforementioned steps S104-S107.
The present application provides a computing device 500. As shown in fig. 8, the computing device 500 includes a bus 501, a processor 502, a communication interface 503, and a memory 504. The processor 502, the memory 504, and the communication interface 503 communicate via the bus 501. The communication interface 503 is used for communicating with the outside, such as receiving a video to be audited or transmitting a video audit result. The memory 504 stores executable code that the processor 502 executes to perform the video auditing method described above.
The processor 502 may be a central processing unit (CPU). The memory 504 may include volatile memory, such as random access memory (RAM). The memory 504 may also include non-volatile memory, such as read-only memory (ROM), flash memory, an HDD, or an SSD.
Specifically, the memory 504 stores the shot segmentation module, the key frame determination module, and the image review module; the storage module may be a storage space provided by the memory 504. Besides the aforementioned modules, the memory 504 may include other software modules required for running processes, such as an operating system. The operating system may be LINUX™, UNIX™, WINDOWS™, and the like.
Various portions of the video review apparatus 400 may be distributed for execution on multiple computing devices, and thus the present application also provides a video review system as shown in fig. 9, where the organization of computing device 500A, computing device 500B, and computing device 500C is described with reference to fig. 8. The computing devices in fig. 9 establish a communication path through a communication network. Any one or more of the shot segmentation module, key frame determination module, image review module, and storage module 404 may run on each computing device. Meanwhile, referring to fig. 2b, the video to be audited and the video audit result may be stored in the cloud storage device. The descriptions of the flows corresponding to the above figures each have their own emphasis; for parts not described in detail in a certain flow, reference may be made to the related descriptions of the other flows.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions that, when loaded and executed on a computer, produce, in whole or in part, the processes or functions described in accordance with the embodiments of the invention. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device that includes one or more available media, such as a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., SSD).

Claims (14)

1. A method of video review, comprising:
calculating frame characteristics of frames in a video, wherein the frame characteristics are obtained by combining first characteristics and second characteristics of the frames, and the similarity distance of the frames is obtained according to the frame characteristics; determining whether the similarity distance of the frame is smaller than or equal to a first threshold and larger than a second threshold, and determining whether the frame is a candidate shot boundary according to the similarity distance of other frames in a shot candidate window of the frame;
acquiring shots from the video according to the frames determined as the candidate shot boundaries in the video;
and acquiring a key frame from the shot, and auditing the video according to the key frame.
2. The method of claim 1, wherein the shot candidate window of the frame is a set of frames that is centered on the frame and comprises a certain number of frames.
3. The method of claim 1 or 2, wherein said determining whether the frame is a candidate shot boundary according to similarity distances of other frames within a shot candidate window for the frame comprises:
calculating the mean value and the variance of the similarity distances of all frames in the shot candidate window of the frame to obtain a similarity distance judgment value of the frame;
and determining that the similarity distance of the frame is greater than the similarity distance judgment value of the frame, and selecting the frame as a candidate shot boundary.
4. The method of claim 1 or 2, wherein the similarity distance of the frame is the Bhattacharyya distance between the frame feature and the frame feature of a preceding frame of the frame in the video.
5. The method of claim 1 or 2, wherein said obtaining key frames from said shots comprises:
determining that the similarity distance of any subsequent frame of the first frame in the shot is greater than a preset third threshold value, and selecting the subsequent frame as a candidate key frame;
key frames are obtained from the selected candidate key frames.
6. A method of video review, comprising:
calculating frame characteristics of frames in a video, wherein the frame characteristics are obtained by combining first characteristics and second characteristics of the frames, and the similarity distance of the frames is obtained according to the frame characteristics;
selecting partial frames of the video as candidate shot boundaries according to the similarity distance of the frames;
acquiring shots from the video according to the candidate shot boundaries;
acquiring a key frame from the shot, and auditing the video according to the key frame;
the selecting a partial frame of the video as a candidate shot boundary according to the similarity distance of the frame comprises:
and determining whether the similarity distance of the frame is smaller than or equal to a second threshold value, and determining whether the frame is a candidate shot boundary according to the similarity distance of other frames in the shot candidate window of the frame.
7. An apparatus for video auditing, the apparatus comprising:
the shot segmentation module is used for calculating frame characteristics of frames in a video, wherein the frame characteristics are obtained by combining first characteristics and second characteristics of the frames, and the similarity distance of the frames is obtained according to the frame characteristics; determining whether the similarity distance of the frame is smaller than or equal to a first threshold and larger than a second threshold, and determining whether the frame is a candidate shot boundary according to the similarity distance of other frames in a shot candidate window of the frame; acquiring shots from the video according to the frames determined as the candidate shot boundaries in the video;
a key frame determining module, configured to obtain a key frame from the shot;
and the image auditing module is used for auditing the video according to the key frame.
8. The apparatus of claim 7, wherein the shot candidate window of the frame is a set of frames that is centered on the frame and comprises a certain number of frames.
9. The apparatus of claim 7 or 8, wherein the shot segmentation module for the determining whether the frame is a candidate shot boundary according to similarity distances of other frames within a shot candidate window of the frame comprises:
calculating the mean value and the variance of the similarity distances of all frames in the shot candidate window of the frame to obtain a similarity distance judgment value of the frame;
and determining that the similarity distance of the frame is greater than the similarity distance judgment value of the frame, and selecting the frame as a candidate shot boundary.
10. The apparatus of claim 7 or 8, wherein the similarity distance of the frame is the Bhattacharyya distance between the frame feature and the frame feature of a preceding frame of the frame in the video.
11. The apparatus of claim 7 or 8, wherein the key frame determination module for the obtaining of key frames from the shots comprises:
determining that the similarity distance of any subsequent frame of the first frame in the shot is greater than a preset third threshold value, and selecting the subsequent frame as a candidate key frame;
key frames are obtained from the selected candidate key frames.
12. An apparatus for video auditing, the apparatus comprising:
the shot segmentation module is used for calculating frame characteristics of frames in a video, wherein the frame characteristics are obtained by combining first characteristics and second characteristics of the frames, and the similarity distance of the frames is obtained according to the frame characteristics; selecting partial frames of the video as candidate shot boundaries according to the similarity distance of the frames; acquiring shots from the video according to the candidate shot boundaries;
a key frame determining module, configured to obtain a key frame from the shot;
the image auditing module is used for auditing the video according to the key frame;
the shot segmentation module is further configured to determine that the similarity distance of the frame is less than or equal to a second threshold, and determine whether the frame is a candidate shot boundary according to the similarity distance of other frames in the shot candidate window of the frame.
13. A computing device system comprising at least one computing device, wherein each computing device comprises a memory and a processor,
a memory of the at least one computing device to store computer instructions; the processor of the at least one computing device executes the computer instructions stored by the memory to perform the method of any of the above claims 1-6.
14. A non-transitory readable storage medium, wherein the non-transitory readable storage medium, when executed by a computing device, performs the method of any of claims 1-6.
CN201811076274.3A 2018-09-14 2018-09-14 Video auditing method, device and equipment Active CN110913243B (en)

Priority Applications (2)

Application Number | Publication | Priority Date | Filing Date | Title
CN201811076274.3A | CN110913243B (en) | 2018-09-14 | 2018-09-14 | Video auditing method, device and equipment
PCT/CN2019/087933 | WO2020052270A1 (en) | 2018-09-14 | 2019-05-22 | Video review method and apparatus, and device

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN201811076274.3A | CN110913243B (en) | 2018-09-14 | 2018-09-14 | Video auditing method, device and equipment

Publications (2)

Publication Number | Publication Date
CN110913243A (en) | 2020-03-24
CN110913243B (en) | 2021-09-14

Family

ID=69777296

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date
CN201811076274.3A | Active | CN110913243B (en) | 2018-09-14 | 2018-09-14

Country Status (2)

Country Link
CN (1) CN110913243B (en)
WO (1) WO2020052270A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113542725B (en) * 2020-04-22 2023-09-05 百度在线网络技术(北京)有限公司 Video auditing method, video auditing device and electronic equipment
CN111625683B (en) * 2020-05-07 2023-05-23 山东师范大学 Automatic video abstract generation method and system based on graph structure difference analysis
CN113762014A (en) * 2021-01-05 2021-12-07 北京沃东天骏信息技术有限公司 Method and device for determining similar videos
CN114979742B (en) * 2021-02-24 2024-04-09 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium
CN113014957B (en) * 2021-02-25 2023-01-31 北京市商汤科技开发有限公司 Video shot segmentation method and device, medium and computer equipment
CN113051236B (en) * 2021-03-09 2022-06-07 北京沃东天骏信息技术有限公司 Method and device for auditing video and computer-readable storage medium
CN113393449A (en) * 2021-06-25 2021-09-14 上海市第一人民医院 Endoscope video image automatic storage method based on artificial intelligence
CN114650435B (en) * 2022-02-23 2023-09-05 京东科技信息技术有限公司 Method and device for searching repeated segments in video and related equipment

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090083790A1 (en) * 2007-09-26 2009-03-26 Tao Wang Video scene segmentation and categorization
US8358837B2 (en) * 2008-05-01 2013-01-22 Yahoo! Inc. Apparatus and methods for detecting adult videos
CN101360184B (en) * 2008-09-22 2010-07-28 腾讯科技(深圳)有限公司 System and method for extracting key frame of video
CN101620629A (en) * 2009-06-09 2010-01-06 中兴通讯股份有限公司 Method and device for extracting video index and video downloading system
CN101650830B (en) * 2009-08-06 2012-08-15 中国科学院声学研究所 Combined automatic segmentation method for abrupt change and gradual change of compressed domain video lens
CN102073841B (en) * 2009-11-20 2012-08-01 中国移动通信集团广东有限公司 Poor video detection method and device
CN102254006B (en) * 2011-07-15 2013-06-19 上海交通大学 Method for retrieving Internet video based on contents
CN102930553B (en) * 2011-08-10 2016-03-30 中国移动通信集团上海有限公司 Bad video content recognition method and device
CN102509084B (en) * 2011-11-18 2014-05-07 中国科学院自动化研究所 Multi-examples-learning-based method for identifying horror video scene
US9736520B2 (en) * 2012-02-01 2017-08-15 Futurewei Technologies, Inc. System and method for organizing multimedia content
CN103065300B (en) * 2012-12-24 2015-03-25 安科智慧城市技术(中国)有限公司 Method for video labeling and device for video labeling
CN103400155A (en) * 2013-06-28 2013-11-20 西安交通大学 Pornographic video detection method based on semi-supervised learning of images
CN104318208A (en) * 2014-10-08 2015-01-28 合肥工业大学 Video scene detection method based on graph partitioning and instance learning
CN107798304B (en) * 2017-10-20 2021-11-02 央视国际网络无锡有限公司 Method for rapidly auditing video
CN108182421B (en) * 2018-01-24 2020-07-14 北京影谱科技股份有限公司 Video segmentation method and device

Also Published As

Publication Number | Publication Date
CN110913243A (en) | 2020-03-24
WO2020052270A1 (en) | 2020-03-19


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
TR01: Transfer of patent right
Effective date of registration: 20220222
Patentee after: Huawei Cloud Computing Technology Co., Ltd., Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province, 550025
Patentee before: Huawei Technologies Co., Ltd., Bantian HUAWEI headquarters office building, Longgang District, Shenzhen, Guangdong, 518129