CN110913243B - Video auditing method, device and equipment - Google Patents

Video auditing method, device and equipment

Info

Publication number
CN110913243B
Authority
CN
China
Prior art keywords
frame
frames
shot
video
candidate
Prior art date
Legal status
Active
Application number
CN201811076274.3A
Other languages
Chinese (zh)
Other versions
CN110913243A (en)
Inventor
赵海宾
杨振华
Current Assignee
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201811076274.3A (CN110913243B)
Priority to PCT/CN2019/087933 (WO2020052270A1)
Publication of CN110913243A
Application granted
Publication of CN110913243B

Classifications

    • H04N21/23418: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • G06V10/40: Extraction of image or video features
    • H04N23/60: Control of cameras or camera modules

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a video auditing method comprising: calculating frame features of the frames in a video, where a frame feature is obtained by combining a first feature and a second feature of the frame, and obtaining the similarity distance of each frame according to its frame feature; selecting some frames of the video as candidate shot boundaries according to the similarity distances of the frames; acquiring shots from the video according to the candidate shot boundaries; and acquiring key frames from the shots and auditing the video according to the key frames. By segmenting the video into its individual shots, the precision of video auditing is improved.

Description

Video auditing method, device and equipment
Technical Field
The present application relates to the field of computers, and in particular, to a method for video review, and an apparatus and device for performing the method.
Background
With the flourishing development of Internet technology, video resources on the network are numerous. Video auditing is an important means of filtering bad or illegal content out of a network video resource pool. In the traditional technology, manually auditing videos is time-consuming, labor-intensive, and of limited efficiency; the alternative of extracting frames from a video at fixed intervals and then obtaining audit results for the extracted frames through image auditing technology still suffers from high resource consumption, low precision, and other problems.
Disclosure of Invention
The application provides a video auditing method, which improves the precision of video auditing.
In a first aspect, the present application provides a method for video review, the method performed by a computing device, comprising: calculating frame features of frames in a video, wherein a frame feature is obtained by combining a first feature and a second feature of the frame, and obtaining the similarity distance of the frame according to the frame feature; selecting partial frames of the video as candidate shot boundaries according to the similarity distances of the frames; acquiring shots from the video according to the candidate shot boundaries; and acquiring a key frame from the shot, and auditing the video according to the key frame. By determining shot boundaries in advance and obtaining key frames from within each shot, the method avoids selecting key frames that cannot accurately express the video content during review, improving the precision of video auditing.
The first feature or the second feature may be any feature in terms of color, texture, shape, or the like of one frame image. For example: RGB histogram features; HSV histogram feature; HOG edge features; an LBP feature; haar features, etc. The first feature and the second feature are generally features of different aspects, such that a frame feature obtained by combining the first feature and the second feature may describe a frame image from different aspects.
In a possible implementation manner of the first aspect, the similarity distance of the frame is the Bhattacharyya distance between the frame feature and the frame feature of a preceding frame of the frame in the video.
In a possible implementation manner of the first aspect, the selecting, according to the similarity distance of the frames, a partial frame of the video as a candidate shot boundary includes: and determining that the similarity distance of the frames is greater than a first threshold value, and selecting the frames as candidate shot boundaries. The specific operation is to judge the relation between the similarity distance of the frame and a first threshold, and when the similarity distance of the frame is greater than the first threshold, the frame is selected as a candidate shot boundary. The first threshold is a preset maximum threshold of the similarity distance of the frames.
In a possible implementation manner of the first aspect, the selecting, according to the similarity distance of the frames, a partial frame of the video as a candidate shot boundary includes: determining that the similarity distance of the frame is less than or equal to a second threshold, and determining whether the frame is a candidate shot boundary according to the similarity distances of other frames in the shot candidate window of the frame. The specific operation is to judge the relationship between the similarity distance of the frame and the second threshold; when the similarity distance of the frame is less than or equal to the second threshold, whether the frame is a candidate shot boundary is determined according to the similarity distances of the other frames in its shot candidate window. The second threshold is a preset minimum threshold of the frame similarity distance; in this application, the first threshold is greater than or equal to the second threshold.
In a possible implementation manner of the first aspect, the shot candidate window of the frame is a frame set that includes a certain number of frames and is centered on the frame.
In a possible implementation manner of the first aspect, determining whether the frame is a candidate shot boundary according to the similarity distances of other frames in the shot candidate window of the frame includes: calculating the mean and variance of the similarity distances of all frames in the shot candidate window of the frame to obtain a similarity distance judgment value for the frame; and when the similarity distance of the frame is greater than its similarity distance judgment value, selecting the frame as a candidate shot boundary.
In a possible implementation manner of the first aspect, the acquiring a key frame from the shot includes: determining that the similarity distance of any subsequent frame after the first frame in the shot is greater than a preset third threshold, and selecting that subsequent frame as a candidate key frame; key frames are then obtained from the selected candidate key frames. Specifically, the first frame in the shot is selected as a candidate key frame, the similarity distance of each subsequent frame in the shot is compared with the third threshold, and a subsequent frame is selected as a candidate key frame when its similarity distance is greater than the third threshold. The third threshold is a preset key frame selection threshold whose value is determined by actual conditions such as the application scenario.
In a second aspect, the present application provides an apparatus for video review, the apparatus comprising: a shot segmentation module, a key frame determination module, and an image auditing module. The shot segmentation module is used for calculating frame features of frames in a video, wherein a frame feature is obtained by combining a first feature and a second feature of the frame, and obtaining the similarity distance of the frame according to the frame feature; selecting partial frames of the video as candidate shot boundaries according to the similarity distances of the frames; and acquiring shots from the video according to the candidate shot boundaries. The key frame determination module is used for acquiring key frames from the shots; and the image auditing module is used for auditing the video according to the key frames. The apparatus for video auditing is configured to perform the method provided in the foregoing first aspect or any possible implementation manner of the first aspect.
In one possible implementation manner of the second aspect, the similarity distance of the frame is the Bhattacharyya distance between the frame feature and the frame feature of a preceding frame of the frame in the video.
In a possible implementation manner of the second aspect, the selecting, by the shot segmentation module, a partial frame of the video as a candidate shot boundary according to the similarity distance of the frame includes: and determining that the similarity distance of the frames is greater than a first threshold value, and selecting the frames as candidate shot boundaries.
In a possible implementation manner of the second aspect, the selecting, by the shot segmentation module, a partial frame of the video as a candidate shot boundary according to the similarity distance of the frame includes: and determining whether the similarity distance of the frame is smaller than or equal to a second threshold value, and determining whether the frame is a candidate shot boundary according to the similarity distance of other frames in the shot candidate window of the frame.
In a possible implementation manner of the second aspect, the shot candidate window of the frame is a frame set that includes a certain number of frames and is centered on the frame.
In a possible implementation manner of the second aspect, the determining, by the shot segmentation module, whether the frame is a candidate shot boundary according to similarity distances of other frames in a shot candidate window of the frame includes: calculating the mean and variance of the similarity distances of all frames in the shot candidate window of the frame to obtain a similarity distance judgment value for the frame; and determining that the similarity distance of the frame is greater than the similarity distance judgment value of the frame, and selecting the frame as a candidate shot boundary.
In a possible implementation manner of the second aspect, the key frame determining module is configured to obtain key frames from the shots, and includes: determining that the similarity distance of any subsequent frame of the first frame in the shot is greater than a preset third threshold value, and selecting the subsequent frame as a candidate key frame; key frames are obtained from the selected candidate key frames.
In a third aspect, the present application provides a computing device system. The computing device system includes at least one computing device. Each computing device includes a memory and a processor. The processor of at least one computing device is configured to access code in the memory of the at least one computing device to perform the method provided by the first aspect or any possible implementation of the first aspect.
In a fourth aspect, the present application provides a non-transitory readable storage medium storing a program that, when executed by a computing device, performs the method provided in the foregoing first aspect or any possible implementation manner of the first aspect. The storage medium includes, but is not limited to, volatile memory such as random access memory, and non-volatile memory such as flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
In a fifth aspect, the present application provides a computing device program product comprising computer instructions that, when executed by a computing device, perform the method provided in the first aspect or any possible implementation manner of the first aspect. The computer program product may be a software installation package, which can be downloaded and executed on a computing device when the method provided in the first aspect or any possible implementation manner of the first aspect needs to be used.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the embodiments are briefly described below.
Fig. 1 is a schematic diagram of a relationship between a video, a shot, and a frame according to an embodiment of the present application;
fig. 2a is a schematic view of an application scenario of a video auditing method according to an embodiment of the present application;
fig. 2b is a schematic view of another application scenario of the video auditing method according to the embodiment of the present application;
fig. 3 is a schematic flowchart of a method for video review according to an embodiment of the present application;
fig. 4 is a schematic flowchart of video shot segmentation provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of a shot candidate window according to an embodiment of the present application;
fig. 6 is a schematic flowchart of determining key frames in a shot according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an apparatus for video auditing according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a computing device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a computing device system according to an embodiment of the present application.
Detailed Description
The following describes technical solutions of a method, an apparatus, and a device for video review provided in the present application in detail with reference to the drawings in the present application.
In the present application, the relationship between video, shot and frame is shown in fig. 1, and the video includes a plurality of frames that change with time. Shots represent video segments within which there is generally a correlation in the content of frames, each shot comprising at least one frame. A video may be divided into several shots according to the content, and each shot includes a certain number of frames.
The video auditing method has the following two main application scenarios.
In fig. 2a, a destination computing device (e.g., a server) receives a video transmitted by a source computing device (e.g., a client) over a communication path, and the video is stored in a memory of the destination computing device. The video auditing apparatus reads the video from the memory and executes the video auditing method: the video first undergoes shot segmentation, key frame determination, and key frame image review, yielding a review result for each shot of the video; the review results of the shots are then combined into the review result of the video, which is stored in the memory, for example in text form.
As shown in fig. 2b, the destination computing device receives the video sent by the source computing device and stores it in a cloud storage device providing a storage service (which may be a block storage service device, a file storage service device, or an object storage service device; this embodiment takes the object storage service device as an example). A video auditing apparatus in the destination device reads the video from the object storage service device, executes the video auditing method, and stores the resulting audit result in the object storage service device. Take, for example, a source computing device that is a client of a video operation company and a destination computing device that is a virtual machine or physical machine in a cloud environment. The video operation company sets up a video auditing service in the cloud environment; a video file received by the company's client is uploaded to the object storage service device; the video auditing service is started, reads the video from a designated storage location in the object storage service device, performs the video audit, and saves the audit result as a file in JSON format; after the video file to be processed has been processed, the JSON file, i.e., the audit result, is saved in the object storage service device.
Fig. 3 is a schematic flow chart of a video auditing method according to an embodiment of the present application.
In this application, a frame feature refers to a feature formed by combining the first feature and the second feature of the image information represented by a frame in a video.
The first feature or the second feature may be any feature of the color, texture, shape, or similar aspects of a frame image. For example: Red, Green, Blue (RGB) histogram features; Hue, Saturation, Value (HSV) histogram features; Histogram of Oriented Gradients (HOG) edge features; Local Binary Pattern (LBP) features; Haar features, and so on. The first and second features are generally different types of features.
In one embodiment of the present application, the first feature and the second feature are an HSV histogram feature and an HOG edge feature, respectively.
In the present application, the similarity distance of a frame denotes the Bhattacharyya distance between the frame feature of the frame and the frame feature of a preceding frame in the video; the preceding frame may be any frame located before the frame in the video in which it resides.
S101, reading the video. The video auditing device reads the video to be audited from the memory or the cloud storage service equipment.
And S102, determining the shot in the video. The method comprises the steps of standardizing the size of a frame of a video to be audited, calculating frame characteristics, pre-judging candidate shot boundaries according to the similarity distance of the frame, locally and adaptively judging the candidate shot boundaries by using a shot candidate window, and determining the shot boundaries in the candidate shot boundaries.
After a shot boundary is determined, the frames from the first frame of the video (or from the frame following the preceding adjacent shot boundary) up to that shot boundary constitute the frames of one shot; this process of determining the frames in a shot is the shot segmentation process.
After the execution of S102 is completed, a shot (shot 1) is divided, and then key frames in the shot are identified and the content of the key frames is checked through S103-S105. In the execution of S103-S105, S102 may be executed on the subsequent frame of the video to identify the next shot (shot 2). Thus, after the execution of S103-S105 of the shot 1 is completed, S103-S105 can be executed on the shot 2, and the recognition of the shot 2 does not need to be executed after the execution of S105 of the shot 1 is completed.
And S103, determining key frames in the shot. Firstly, marking a first frame in a shot as a candidate key frame, traversing subsequent frames in the shot, marking all the candidate key frames according to the comparison between the similarity distance of the frames and a preset threshold value, and finally determining the key frames from the candidate key frames.
And S104, reviewing the key frames in the shot. The determined key frames are input as images to be audited into the image recognition model, and the key frames in the shot are reviewed through the image recognition model.
Optionally, the image recognition model may be an image classification model trained and continuously optimized on a large data set based on a deep neural network. For example, a ResNet101 deep residual network model with high recognition accuracy can be used for classification and recognition. Inputting a key frame into the trained image recognition model quickly yields a recognition result for the key frame. The application does not limit the specific image auditing technique; typical machine learning methods such as support vector machines, or deep learning network models, can alternatively be used in the video auditing method provided by this application.
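As an illustration of this step, the sketch below loads a pre-trained ResNet101 classifier and reviews a single key frame. PyTorch/torchvision, the ImageNet weights, and the preprocessing pipeline are illustrative assumptions only: the patent names a ResNet101 residual network but specifies neither a framework nor a label set, and a real deployment would use a model trained on audit categories.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Hypothetical stand-in for the trained image recognition model:
# a torchvision ResNet101 with its default (ImageNet) weights.
model = models.resnet101(weights=models.ResNet101_Weights.DEFAULT)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def review_keyframe(image_path):
    # Returns the predicted class index for one key frame image.
    img = Image.open(image_path).convert("RGB")
    batch = preprocess(img).unsqueeze(0)      # shape: (1, 3, 224, 224)
    with torch.no_grad():
        logits = model(batch)
    return int(logits.argmax(dim=1))
```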
And S105, after the review results of all key frames in the shot are obtained, the review result of one key frame, or of some N key frames, is selected as the review result of the shot, and the shot review result is stored (for example, as text). The key frame may be selected by median filtering, i.e., taking the key frame whose recognition score is the intermediate value. At this point, the review of the shot is complete.
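A minimal sketch of this median-filtering selection; the (frame_id, score) pair is a hypothetical representation of a per-key-frame review result:

```python
def shot_review_result(keyframe_results):
    # keyframe_results: list of (frame_id, score) pairs, one per key frame
    # in the shot. Median filtering: return the result whose recognition
    # score sits at the intermediate (median) position.
    ordered = sorted(keyframe_results, key=lambda r: r[1])
    return ordered[len(ordered) // 2]
```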
And S106, after the review of each shot is finished, judging whether the video still has shots to be reviewed; if so, repeating S102-S106. The review results of all shots in the same video are merged and stored (for example, into the same text).
And S107, storing the video auditing result to a local memory or cloud storage equipment, wherein the video auditing result comprises the auditing results of all the shots in a video, and the video auditing result can be a text.
When there are a plurality of videos to be audited, the aforementioned S101 to S107 are performed for each video.
The specific implementation of determining the candidate shot boundaries in step S102 is shown in fig. 4.
The shot candidate window is designed for locally and adaptively determining candidate shot boundaries; its schematic structure is shown in fig. 5. Denote a frame in the currently processed video as frame P; with frame P as the center, frame P and the N frames before and after it together form a shot candidate window of 2N+1 frames, recorded as the shot candidate window of frame P. The shot candidate window moves as the current frame advances and can be regarded as a sliding window; N is a positive integer.
S201, scaling a frame P in the video to a standard size of M rows and M columns to complete frame size standardization, wherein M is a positive integer.
Alternatively, the size normalization method in S201 may use a bilinear scaling algorithm.
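A one-line sketch of the size standardization in S201, assuming OpenCV; cv2.INTER_LINEAR performs the bilinear scaling mentioned above:

```python
import cv2

def standardize(frame, m):
    # S201: scale the frame to the standard M x M size with bilinear interpolation.
    return cv2.resize(frame, (m, m), interpolation=cv2.INTER_LINEAR)
```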
S202, frame feature calculation is performed on the size-standardized frame P (i.e., an image), and the frame feature is obtained by combining the first feature and the second feature.
In one embodiment of the application, the HSV color model is selected, converting the R, G, B values of each pixel in the image to H, S, V values, where H represents hue, H ∈ [0, 2π], S represents saturation, S ∈ [0, 1], and V represents value (lightness), V ∈ [0, 1]. In the HSV color space, an HS two-dimensional histogram is computed; the histogram is normalized and reduced to a one-dimensional vector, giving the first feature. HOG edge statistics are extracted in the grayscale color space to obtain the second feature; optionally, the quantization level is 9 and the cell size is 1/4 of the frame width by 1/4 of the frame height. The first feature and the second feature are combined by concatenating the two one-dimensional vectors, giving the frame feature. The similarity distance Sim of frame P is then calculated, expressed as the Bhattacharyya distance between the frame feature of P and that of its preceding frame, with Sim ∈ [0, ∞); the larger the value of Sim, the greater the difference between the picture content of the two adjacent frames, and vice versa.
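The sketch below mirrors this embodiment with OpenCV and NumPy, both illustrative assumptions (the patent names no library): an HS two-dimensional histogram as the first feature, HOG edge statistics with 9 bins and cells of 1/4 frame width by 1/4 frame height as the second feature, concatenation into the frame feature, and a Bhattacharyya-style distance Sim = -ln(Σᵢ √(pᵢ·qᵢ)) between the features of adjacent frames. The 16x16 bin count and L1 normalization are assumptions; the standard size M is assumed divisible by 4.

```python
import cv2
import numpy as np

def frame_feature(frame_bgr):
    # First feature: HS two-dimensional histogram in HSV space,
    # normalized and flattened to a one-dimensional vector.
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [16, 16], [0, 180, 0, 256])
    hist = cv2.normalize(hist, None, alpha=1.0, norm_type=cv2.NORM_L1).flatten()
    # Second feature: HOG edge statistics on the grayscale image,
    # 9 orientation bins, cells of 1/4 frame width x 1/4 frame height.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape
    hog = cv2.HOGDescriptor((w, h), (w // 2, h // 2), (w // 4, h // 4),
                            (w // 4, h // 4), 9)
    hog_vec = hog.compute(gray).flatten()
    # Frame feature: concatenation of the two one-dimensional vectors.
    return np.concatenate([hist, hog_vec])

def similarity_distance(feat, prev_feat, eps=1e-12):
    # Bhattacharyya-style distance between the frame features of two
    # adjacent frames; a larger Sim means a larger content difference.
    p = feat / (feat.sum() + eps)
    q = prev_feat / (prev_feat.sum() + eps)
    bc = np.sum(np.sqrt(p * q))      # Bhattacharyya coefficient in [0, 1]
    return -np.log(bc + eps)         # Sim in [0, +inf)
```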
Optionally, the calculated similarity distance Sim of the frame P is stored in the storage module for use in subsequent operations.
S203, according to the preset maximum threshold Tm of the frame similarity distance, determine whether frame P is a candidate shot boundary. Compare Sim with Tm: if Sim > Tm, it can be determined that the picture content of frame P differs significantly from that of frame P-1, and step S204 is entered; otherwise, step S205 is entered.
And S204, marking the frame P as a candidate shot boundary.
S205, according to the preset minimum threshold Tn of the frame similarity distance, determine whether frame P is a non-shot boundary. Compare Sim with Tn: if Sim ≤ Tn, the similarity distance between frame P and frame P-1 is small and their picture content is close, so step S206 is entered; otherwise, step S207 is entered.
S206, mark the frame P as a non-shot boundary.
S207, the similarity distance Sim of frame P is greater than Tn and less than or equal to Tm, so whether frame P is a candidate shot boundary is determined adaptively using its shot candidate window. Determine whether the similarity distances Sim of all frames in the shot candidate window centered on frame P have been calculated; if so, proceed to step S208; otherwise, take frame P+1 (the next frame after frame P) as frame P and return to step S201. Once the similarity distances of all frames within the shot candidate window of frame P have been calculated, the process proceeds to step S208.
S208, calculate the mean M and variance V of the similarity distance of all frames in the shot candidate window of the frame P.
S209, compare the similarity distance Sim of frame P with M + V × scale, where scale is a preset adjustment parameter. When Sim > M + V × scale, proceed to step S204; when Sim ≤ M + V × scale, proceed to step S206. Shot boundaries are then determined among the candidate shot boundaries. Optionally, among the candidate shot boundaries marked by the pre-judgment and the adaptive judgment, those whose similarity distance is not a maximum within the local area are eliminated by non-maximum suppression, leaving the candidate shot boundaries whose similarity distance is a local maximum.
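Steps S203-S209 amount to a three-way judgment per frame; below is a sketch under the assumption that the similarity distances of all relevant frames are already available (variable names are illustrative):

```python
import numpy as np

def is_candidate_boundary(sims, i, t_m, t_n, n, scale):
    # sims: similarity distance Sim of every frame in the video;
    # i: index of frame P; t_m / t_n: preset maximum / minimum thresholds;
    # n: half-width of the shot candidate window; scale: adjustment parameter.
    sim = sims[i]
    if sim > t_m:        # S203/S204: obvious content change -> candidate boundary
        return True
    if sim <= t_n:       # S205/S206: frames too similar -> non-shot boundary
        return False
    # S207-S209: local adaptive judgment inside the shot candidate window.
    window = sims[max(0, i - n): i + n + 1]
    m, v = float(np.mean(window)), float(np.var(window))
    return sim > m + v * scale
```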
A shot is then delimited by the shot boundary and either the first frame of the video or the frame following the preceding adjacent shot boundary.
The specific implementation of step S103 is shown in fig. 6.
S301, marking the first frame in the shot as a candidate key frame.
S302, reading the similarity distance Sim of the next frame from the storage module. If the similarity distances of the frames obtained in step S102 were not stored in the storage module, the similarity distance Sim of the frame is calculated in step S302 as described in step S202.
S303, comparing the similarity distance Sim of the frame with the preset threshold Tx. When Sim is greater than the threshold Tx, proceed to step S304. When Sim is less than or equal to the threshold Tx, the current frame is considered similar to the previous frame; since reviewing the content of one of the two frames suffices for auditing purposes, the frame is not marked as a candidate key frame and the process proceeds directly to step S305.
S304, marking the frame as a candidate key frame.
S305, determining whether all frames in the shot have been processed. If there are unprocessed frames in the shot, return to step S302; if all frames in the shot have been processed, proceed to step S306.
Optionally, after every frame in the shot has been processed through S301 to S305, the similarity distance of each frame is no longer needed in subsequent steps, so the stored similarity distance values of the frames in the shot can be released.
S306, filtering the candidate key frames and determining the key frames in the shot.
Optionally, the candidate key frames may be filtered by the gray-level variance of each frame. Calculate the gray-level mean μk and the gray-level variance σk² of the kth frame. When the frame gray-level variance σk² is greater than a preset threshold Ty, the frame content is judged to be rich with clear tonal levels, so the frame keeps its candidate key frame mark. When the frame gray-level variance σk² is less than the threshold Ty, the frame content is not rich and carries little information, so the candidate key frame mark is removed. Through this importance judgment of the candidate key frames, frames with poor visual content are eliminated.
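A sketch of this gray-level variance filter with NumPy; the candidate index list and grayscale frame array are assumed inputs:

```python
import numpy as np

def filter_by_gray_variance(candidates, gray_frames, t_y):
    # Keep a candidate key frame only if its gray-level variance exceeds
    # the preset threshold Ty, i.e. the frame carries enough information.
    kept = []
    for k in candidates:
        gray = gray_frames[k].astype(np.float64)
        mu_k = gray.mean()                      # gray-level mean of frame k
        var_k = np.mean((gray - mu_k) ** 2)     # gray-level variance of frame k
        if var_k > t_y:
            kept.append(k)
    return kept
```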
Optionally, candidate key frames can also be filtered by fixing the number of key frames per shot. According to the principle of uniform key frame distribution, the key frames in the shot are kept evenly spread so that they represent the key content of the shot to the maximum extent. For example, suppose a shot has 5 candidate key frames and the upper limit of the number of key frames per shot is fixed at 3. Since the first frame of each shot is determined to be a key frame and the position of the last candidate key frame is known, the one of the remaining three candidate key frames positioned closest to the central frame of the shot is selected as a key frame; the three key frames of the shot are thereby determined for subsequent image review, as shown in the sketch below. It should be noted that the application does not limit the value of the per-shot key frame upper limit, which may be set according to the actual situation. When there are fewer candidate key frames than the set upper limit, all of them can be regarded as key frames without considering the uniform distribution principle.
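A sketch of the fixed-count selection in the example above (upper limit of 3 key frames); the exact selection rule beyond that worked example is an assumption:

```python
def select_uniform(candidates, shot_start, shot_end, limit=3):
    # candidates: frame indices of candidate key frames, ascending order.
    # Follows the 3-key-frame example: keep the first frame, the last
    # candidate, and the middle candidate closest to the shot's center frame.
    if len(candidates) <= limit:
        return list(candidates)      # fewer candidates than the limit: keep all
    center = (shot_start + shot_end) // 2
    middle = min(candidates[1:-1], key=lambda c: abs(c - center))
    return [candidates[0], middle, candidates[-1]]
```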
In the embodiments of the present application, when determining the key frames within a shot, the two candidate key frame filtering methods above may be used together or individually.
The present application provides a video review device 400. As shown in fig. 7, the video review apparatus 400 includes a shot segmentation module 401, a key frame determination module 402, an image review module 403, and a storage module 404. The shot segmentation module 401, the key frame determination module 402, the image review module 403, and the storage module 404 may be software modules running on a computing device.
The shot segmentation module 401 includes a preprocessing sub-module 4011 and a shot boundary determination sub-module 4012. The shot segmentation module 401 executes the foregoing steps S101-S102: the preprocessing sub-module 4011 executes S101 and the S201-S202 part of S102, and the shot boundary determination sub-module 4012 executes the S203-S209 part of S102. The shot segmentation module 401 establishes communication with the key frame determination module 402 and transmits the segmented shots to it; it also establishes communication with the storage module 404 to store the similarity distance values Sim of the frames. The key frame determination module 402 receives the shots from the shot segmentation module and executes the aforementioned step S103 (specifically steps S301-S306), reading the similarity distance value Sim of a frame from the storage module 404 when executing step S302. After the key frames are determined, the key frame determination module 402 establishes communication with the image review module 403 and transmits the key frames to be reviewed to it. The image review module 403 performs the aforementioned steps S104-S107.
The present application provides a computing device 500. As shown in fig. 8, the computing device 500 includes a bus 501, a processor 502, a communication interface 503, and a memory 504. The processor 502, the memory 504, and the communication interface 503 communicate via the bus 501. The communication interface 503 is used for communicating with the outside, such as receiving a video to be audited or transmitting a video audit result. The memory 504 stores executable code that the processor 502 executes to perform the video auditing method described above.
The processor 502 may be a central processing unit (CPU). The memory 504 may include volatile memory, such as random access memory (RAM). The memory 504 may also include non-volatile memory, such as read-only memory (ROM), flash memory, an HDD, or an SSD.
Specifically, the memory 504 stores the shot segmentation module, the key frame determination module, and the image review module; the storage module may be a storage space provided by the memory 504. Besides the aforementioned modules, the memory 504 may include other software modules required for running processes, such as an operating system. The operating system may be LINUX™, UNIX™, WINDOWS™, and the like.
Various portions of the video review apparatus 400 may be distributed for execution on multiple computing devices, and thus the present application also provides a video review system as shown in fig. 9, where the organization of computing device 500A, computing device 500B, and computing device 500C is described with reference to fig. 8. The computing devices in fig. 9 establish a communication path through a communication network. Any one or more of the shot segmentation module, key frame determination module, image review module, and storage module 404 may run on each computing device. Meanwhile, referring to fig. 2b, the video to be audited and the video audit result may be stored in the cloud storage device. The descriptions of the flows corresponding to the above figures each have their own emphasis; for parts not described in detail in a certain flow, reference may be made to the related descriptions of the other flows.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions that, when loaded and executed on a computer, produce, in whole or in part, the processes or functions described in accordance with the embodiments of the invention. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device that includes one or more available media, such as a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., SSD).

Claims (14)

1. A method of video review, comprising:
calculating frame characteristics of frames in a video, wherein the frame characteristics are obtained by combining first characteristics and second characteristics of the frames, and the similarity distance of the frames is obtained according to the frame characteristics; determining whether the similarity distance of the frame is smaller than or equal to a first threshold and larger than a second threshold, and determining whether the frame is a candidate shot boundary according to the similarity distance of other frames in a shot candidate window of the frame;
acquiring shots from the video according to the frames determined as the candidate shot boundaries in the video;
and acquiring a key frame from the shot, and auditing the video according to the key frame.
2. The method of claim 1, wherein the shot candidate window of the frame is a set of frames that is centered on the frame and comprises a certain number of frames.
3. The method of claim 1 or 2, wherein said determining whether the frame is a candidate shot boundary according to similarity distances of other frames within a shot candidate window for the frame comprises:
calculating the mean value and the variance of the similarity distances of all frames in the shot candidate window of the frame to obtain a similarity distance judgment value of the frame;
and determining that the similarity distance of the frame is greater than the similarity distance judgment value of the frame, and selecting the frame as a candidate shot boundary.
4. The method of claim 1 or 2, wherein the similarity distance of the frame is the Bhattacharyya distance between the frame feature and the frame feature of a preceding frame of the frame in the video.
5. The method of claim 1 or 2, wherein said obtaining key frames from said shots comprises:
determining that the similarity distance of any subsequent frame of the first frame in the shot is greater than a preset third threshold value, and selecting the subsequent frame as a candidate key frame;
key frames are obtained from the selected candidate key frames.
6. A method of video review, comprising:
calculating frame characteristics of frames in a video, wherein the frame characteristics are obtained by combining first characteristics and second characteristics of the frames, and the similarity distance of the frames is obtained according to the frame characteristics;
selecting partial frames of the video as candidate shot boundaries according to the similarity distance of the frames;
acquiring shots from the video according to the candidate shot boundaries;
acquiring a key frame from the shot, and auditing the video according to the key frame;
the selecting a partial frame of the video as a candidate shot boundary according to the similarity distance of the frame comprises:
and determining whether the similarity distance of the frame is smaller than or equal to a second threshold value, and determining whether the frame is a candidate shot boundary according to the similarity distance of other frames in the shot candidate window of the frame.
7. An apparatus for video auditing, the apparatus comprising:
the shot segmentation module is used for calculating frame characteristics of frames in a video, wherein the frame characteristics are obtained by combining first characteristics and second characteristics of the frames, and the similarity distance of the frames is obtained according to the frame characteristics; determining whether the similarity distance of the frame is smaller than or equal to a first threshold and larger than a second threshold, and determining whether the frame is a candidate shot boundary according to the similarity distance of other frames in a shot candidate window of the frame; acquiring shots from the video according to the frames determined as the candidate shot boundaries in the video;
a key frame determining module, configured to obtain a key frame from the shot;
and the image auditing module is used for auditing the video according to the key frame.
8. The apparatus of claim 7, wherein the shot candidate window of the frame is a set of frames that is centered on the frame and comprises a certain number of frames.
9. The apparatus of claim 7 or 8, wherein the shot segmentation module for the determining whether the frame is a candidate shot boundary according to similarity distances of other frames within a shot candidate window of the frame comprises:
calculating the mean value and the variance of the similarity distances of all frames in the shot candidate window of the frame to obtain a similarity distance judgment value of the frame;
and determining that the similarity distance of the frame is greater than the similarity distance judgment value of the frame, and selecting the frame as a candidate shot boundary.
10. The apparatus of claim 7 or 8, wherein the similarity distance of the frame is the Bhattacharyya distance between the frame feature and the frame feature of a preceding frame of the frame in the video.
11. The apparatus of claim 7 or 8, wherein the key frame determination module for the obtaining of key frames from the shots comprises:
determining that the similarity distance of any subsequent frame of the first frame in the shot is greater than a preset third threshold value, and selecting the subsequent frame as a candidate key frame;
key frames are obtained from the selected candidate key frames.
12. An apparatus for video auditing, the apparatus comprising:
the shot segmentation module is used for calculating frame characteristics of frames in a video, wherein the frame characteristics are obtained by combining first characteristics and second characteristics of the frames, and the similarity distance of the frames is obtained according to the frame characteristics; selecting partial frames of the video as candidate shot boundaries according to the similarity distance of the frames; acquiring shots from the video according to the candidate shot boundaries;
a key frame determining module, configured to obtain a key frame from the shot;
the image auditing module is used for auditing the video according to the key frame;
the shot segmentation module is further configured to determine that the similarity distance of the frame is less than or equal to a second threshold, and determine whether the frame is a candidate shot boundary according to the similarity distance of other frames in the shot candidate window of the frame.
13. A computing device system comprising at least one computing device, wherein each computing device comprises a memory and a processor,
a memory of the at least one computing device to store computer instructions; the processor of the at least one computing device executes the computer instructions stored by the memory to perform the method of any of the above claims 1-6.
14. A non-transitory readable storage medium, wherein the non-transitory readable storage medium, when executed by a computing device, performs the method of any of claims 1-6.
CN201811076274.3A 2018-09-14 2018-09-14 Video auditing method, device and equipment Active CN110913243B (en)

Priority Applications (2)

Application Number | Publication | Priority Date | Filing Date | Title
CN201811076274.3A | CN110913243B (en) | 2018-09-14 | 2018-09-14 | Video auditing method, device and equipment
PCT/CN2019/087933 | WO2020052270A1 (en) | 2018-09-14 | 2019-05-22 | Video review method and apparatus, and device

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN201811076274.3A | CN110913243B (en) | 2018-09-14 | 2018-09-14 | Video auditing method, device and equipment

Publications (2)

Publication Number | Publication Date
CN110913243A (en) | 2020-03-24
CN110913243B (en) | 2021-09-14

Family

ID=69777296

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date
CN201811076274.3A | Active | CN110913243B (en) | 2018-09-14 | 2018-09-14

Country Status (2)

Country Link
CN (1) CN110913243B (en)
WO (1) WO2020052270A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113542725B (en) * 2020-04-22 2023-09-05 百度在线网络技术(北京)有限公司 Video auditing method, video auditing device and electronic equipment
CN111625683B (en) * 2020-05-07 2023-05-23 山东师范大学 Automatic video abstract generation method and system based on graph structure difference analysis
CN113762014A (en) * 2021-01-05 2021-12-07 北京沃东天骏信息技术有限公司 Method and device for determining similar videos
CN114979742B (en) * 2021-02-24 2024-04-09 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium
CN113014957B (en) * 2021-02-25 2023-01-31 北京市商汤科技开发有限公司 Video shot segmentation method and device, medium and computer equipment
CN113051236B (en) * 2021-03-09 2022-06-07 北京沃东天骏信息技术有限公司 Method and device for auditing video and computer-readable storage medium
CN113393449A (en) * 2021-06-25 2021-09-14 上海市第一人民医院 Endoscope video image automatic storage method based on artificial intelligence
CN114650435B (en) * 2022-02-23 2023-09-05 京东科技信息技术有限公司 Method and device for searching repeated segments in video and related equipment

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090083790A1 (en) * 2007-09-26 2009-03-26 Tao Wang Video scene segmentation and categorization
US8358837B2 (en) * 2008-05-01 2013-01-22 Yahoo! Inc. Apparatus and methods for detecting adult videos
CN101360184B (en) * 2008-09-22 2010-07-28 腾讯科技(深圳)有限公司 System and method for extracting key frame of video
CN101620629A (en) * 2009-06-09 2010-01-06 中兴通讯股份有限公司 Method and device for extracting video index and video downloading system
CN101650830B (en) * 2009-08-06 2012-08-15 中国科学院声学研究所 Combined automatic segmentation method for abrupt change and gradual change of compressed domain video lens
CN102073841B (en) * 2009-11-20 2012-08-01 中国移动通信集团广东有限公司 Poor video detection method and device
CN102254006B (en) * 2011-07-15 2013-06-19 上海交通大学 Method for retrieving Internet video based on contents
CN102930553B (en) * 2011-08-10 2016-03-30 中国移动通信集团上海有限公司 Bad video content recognition method and device
CN102509084B (en) * 2011-11-18 2014-05-07 中国科学院自动化研究所 Multi-examples-learning-based method for identifying horror video scene
US9736520B2 (en) * 2012-02-01 2017-08-15 Futurewei Technologies, Inc. System and method for organizing multimedia content
CN103065300B (en) * 2012-12-24 2015-03-25 安科智慧城市技术(中国)有限公司 Method for video labeling and device for video labeling
CN103400155A (en) * 2013-06-28 2013-11-20 西安交通大学 Pornographic video detection method based on semi-supervised learning of images
CN104318208A (en) * 2014-10-08 2015-01-28 合肥工业大学 Video scene detection method based on graph partitioning and instance learning
CN107798304B (en) * 2017-10-20 2021-11-02 央视国际网络无锡有限公司 Method for rapidly auditing video
CN108182421B (en) * 2018-01-24 2020-07-14 北京影谱科技股份有限公司 Video segmentation method and device

Also Published As

Publication Number | Publication Date
CN110913243A (en) | 2020-03-24
WO2020052270A1 (en) | 2020-03-19


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
TR01: Transfer of patent right
Effective date of registration: 20220222
Patentee after: Huawei Cloud Computing Technology Co., Ltd., Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province, 550025
Patentee before: Huawei Technologies Co., Ltd., Bantian HUAWEI headquarters office building, Longgang District, Shenzhen, Guangdong, 518129