WO2022116668A1 - Video filtering method and system based on identical content, and device - Google Patents

Video filtering method and system based on identical content, and device

Info

Publication number
WO2022116668A1
WO2022116668A1 (PCT/CN2021/121872)
Authority
WO
WIPO (PCT)
Prior art keywords
video
videos
target video
similar
information content
Prior art date
Application number
PCT/CN2021/121872
Other languages
French (fr)
Chinese (zh)
Inventor
李美影
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2022116668A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/23418: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/45: Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N 21/454: Content or additional data filtering, e.g. blocking advertisements

Definitions

  • the present application relates to the field of video technology, and in particular, to a video filtering method, system, and device based on identical content.
  • the purpose of the present application is to propose a video filtering method, system, and device based on identical content, so as to solve the problem in the prior art that videos with identical content cannot be accurately filtered out.
  • the present application provides a video filtering method based on identical content, comprising the following steps:
  • sample extraction processing is performed on the matched video segments, respectively, to obtain n frames of images
  • the image information content is calculated for n frames of images respectively, and the video information content of the corresponding target video and similar videos is obtained based on the calculated image information content, and the videos with small video information content are eliminated.
  • performing timing extraction processing on the target video and the similar video respectively to obtain their respective feature timing diagrams includes: extracting, at timed intervals and frame by frame, the gray value of the center pixel of each image from the target video and the similar video respectively, and establishing each gray value as a function of time to obtain the respective feature timing diagrams.
  • matching the feature timing diagram of the target video against the feature timing diagram of the similar video through the dynamic time warping algorithm to obtain the identical video segments includes: selecting, through the dynamic time warping algorithm, the key frames that differ between the feature timing diagram of the target video and that of the similar video, and obtaining the matched identical video segments based on the key frames.
  • performing sample extraction processing on the matched video segments respectively based on the duration of the target video to obtain n frames of images from each includes:
  • establishing an extraction frequency function based on the duration of the target video, and extracting frames from the matched video segments according to the extraction frequency function to obtain n frames of images from each.
  • establishing the extraction frequency function based on the duration of the target video includes: dividing the duration of the target video into four time gradients, namely 1 s to 60 s, 1 min to 60 min, 1 h to 10 h, and more than 10 h, and establishing a different number of extractions for each time gradient.
  • the image information content calculation includes: calculating the image two-dimensional entropy as the image information content.
  • obtaining the video information amounts of the corresponding target video and similar video respectively based on the calculated image information amounts, and removing the video with the smaller video information amount includes:
  • computing a first average of the image information amounts of the n frames of the target video, and taking the product of the first average and the duration of the target video as its video information amount;
  • computing a second average of the image information amounts of the n frames of the similar video, and taking the product of the second average and the duration of the similar video as its video information amount;
  • comparing the video information amount of the target video with that of the similar video, and removing the video with the smaller video information amount.
  • the method further includes: in response to the video with the smaller video information amount being removed, updating the identical-content video library and the regular video library, wherein the regular video library is used to store videos with standalone content.
  • Another aspect of the present application also provides a video filtering system based on the same content, including:
  • a video detection module configured to detect and obtain a corresponding similar video in a video library with the same content based on the target video
  • the timing extraction module is configured to perform timing extraction processing on the target video and similar videos respectively to obtain their respective feature sequence diagrams
  • the video segment matching module is configured to match the feature sequence diagram of the target video and the feature sequence diagram of similar videos to the same video segment through the dynamic time warping algorithm;
  • a sample extraction module configured to perform sample extraction processing on the matched video segments based on the duration of the target video, respectively, to obtain n frames of images
  • a video culling module, configured to calculate the image information amount of each of the n frames of images, obtain the video information amounts of the corresponding target video and similar video based on the calculated image information amounts, and remove the video with the smaller video information amount.
  • a computer device including a memory and a processor, where a computer program is stored in the memory, and when the computer program is executed by the processor, any one of the above methods is executed.
  • FIG. 1 is a schematic diagram of an embodiment of the video filtering method based on identical content provided by the present application;
  • FIG. 2 is a feature timing diagram of a target video and a corresponding similar video matched by a dynamic time warping algorithm according to an embodiment of the present application;
  • FIG. 3 is a schematic diagram of an embodiment of the video filtering system based on identical content provided by the present application;
  • FIG. 4 is a schematic diagram of the hardware structure of an embodiment of a computer device for performing the video filtering method based on identical content provided by the present application.
  • FIG. 1 shows a schematic diagram of an embodiment of the video filtering method based on identical content provided by the present application.
  • the embodiment of the present application includes the following steps:
  • Step S10 based on the target video, detect and obtain the corresponding similar video in the video library with the same content;
  • Step S20 performing timing extraction processing on the target video and the similar video respectively, to obtain respective feature sequence diagrams
  • Step S30 match the feature sequence diagram of the target video and the feature sequence diagram of the similar video to the same video segment by the dynamic time warping algorithm
  • Step S40 performing sample extraction processing on the matched video clips based on the duration of the target video, respectively obtaining n frames of images;
  • Step S50 Calculate the amount of image information for the n frames of images respectively, and obtain the video information amount of the corresponding target video and similar videos based on the calculated amount of image information, and remove the videos with small video information amount.
  • timing extraction processing and video segment matching are performed on the target video and the similar video respectively, so that video segments with identical content are screened out efficiently; through sample extraction processing, image information amount calculation, and video information amount calculation, the video that carries more information and more value is selected; and by removing the video with the smaller video information amount, disk usage is reduced, network resources are saved, the efficiency of video retrieval is improved, and greater economic value is brought to the video platform.
  • performing timing extraction processing on the target video and the similar video respectively to obtain their respective feature timing diagrams includes: extracting, at timed intervals and frame by frame, the gray value of the center pixel of each image from the target video and the similar video respectively, and establishing each gray value as a function of time to obtain the respective feature timing diagrams.
  • Figure 2 shows the feature timing diagrams of the target video and the similar video, where video A denotes the target video and video B denotes the similar video; the abscissa indicates that video A and video B are sampled at timed intervals, and the ordinate indicates the gray value of the center pixel of the frame at the corresponding time. The gray values and time form a time-series function, and this time series serves as the feature value of the video.
  • matching the feature timing diagram of the target video against the feature timing diagram of the similar video through the dynamic time warping algorithm to obtain the identical video segments includes: selecting, through the dynamic time warping algorithm, the key frames that differ between the feature timing diagram of the target video and that of the similar video, and obtaining the matched identical video segments based on the key frames.
  • Fig. 2 is the feature timing diagram of the target video and the corresponding similar video matched by the dynamic time warping algorithm according to the embodiment of the present application; the differing key frames are obtained by the dynamic time warping algorithm, and the matched identical video segments are indicated by the dashed dividing lines in the figure.
  • performing sample extraction processing on the matched video segments respectively based on the duration of the target video to obtain n frames of images from each includes: establishing an extraction frequency function based on the duration of the target video, and extracting frames from the matched video segments according to the extraction frequency function to obtain n frames of images from each.
  • establishing the extraction frequency function based on the duration of the target video includes: dividing the duration of the target video into four time gradients, namely 1 s to 60 s, 1 min to 60 min, 1 h to 10 h, and more than 10 h, and establishing a different number of extractions for each time gradient. In this embodiment, the longer the video, the more frames are extracted, because a longer video needs more data for the similarity comparison.
  • the extraction frequency function f(x), where x is the video duration, is defined piecewise: when the video duration is 1 s to 60 s, one frame is extracted every 0.05x seconds; when it is 1 min to 60 min, one frame is extracted every 0.025x minutes; when it is 1 h to 10 h, one frame is extracted every 0.01x hours; and when it is more than 10 h, one frame is extracted every 0.01x hours.
  • the image information amount calculation includes: calculating the two-dimensional entropy of the image as the image information amount.
  • the two-dimensional entropy of the image is used as the image information amount. The neighborhood gray-level mean of each image pixel is selected as the spatial feature of the gray-level distribution and combined with the pixel's gray level to form a feature pair, denoted (i, j), where i is the gray value of the pixel and j is the neighborhood gray-level mean; f(i, j) is the frequency of occurrence of the feature pair (i, j), and M, N denote the image dimensions, so that p(i, j) = f(i, j) / (M × N).
  • the two-dimensional entropy of the image is then H = -Σ_i Σ_j p(i, j) · log2 p(i, j).
  • obtaining the video information amounts of the corresponding target video and similar video based on the calculated image information amounts, and removing the video with the smaller video information amount includes: computing a first average of the image information amounts of the n frames of the target video, and taking the product of the first average and the duration of the target video as its video information amount; computing a second average of the image information amounts of the n frames of the similar video, and taking the product of the second average and the duration of the similar video as its video information amount; and comparing the video information amount of the target video with that of the similar video, and removing the video with the smaller video information amount.
  • the video information amount equals the two-dimensional image entropy multiplied by the duration; with H_sum denoting the video information amount and L the video duration in frames, the video information amount is H_sum = H × L.
  • the duplicate video with the smaller information amount is removed or deleted: if the video information amount of the target video is smaller, the target video is removed; if the video information amount of the corresponding similar video is smaller, the similar video is removed.
  • the method further includes: in response to the video with the smaller video information amount being removed, updating the identical-content video library and the regular video library, wherein the regular video library is used to store videos with standalone content.
  • FIG. 3 shows a schematic diagram of an embodiment of the video filtering system based on identical content provided by the present application.
  • the video filtering system based on identical content includes: a video detection module 10, configured to detect and obtain a corresponding similar video in an identical-content video library based on a target video; and a timing extraction module 20, configured to perform timing extraction processing on the target video and the similar video respectively to obtain their respective feature timing diagrams.
  • the video segment matching module 30 is configured to match the feature timing diagram of the target video against that of the similar video through a dynamic time warping algorithm to obtain the identical video segments; the sample extraction module 40 is configured to perform sample extraction processing on the matched video segments respectively based on the duration of the target video to obtain n frames of images from each; and the video culling module 50 is configured to calculate the image information amount of each of the n frames of images, obtain the video information amounts of the corresponding target video and similar video based on the calculated image information amounts, and remove the video with the smaller video information amount.
  • the video filtering system based on identical content of this embodiment efficiently screens out video segments with identical content by performing timing extraction processing and video segment matching on the target video and the similar video respectively; through sample extraction processing, image information amount calculation, and video information amount calculation, the video that carries more information and more value is selected; and by removing the video with the smaller video information amount, disk usage is reduced, network resources are saved, the efficiency of video retrieval is improved, and greater economic value is brought to the video platform.
  • a computer device, including a memory 302 and a processor 301, where a computer program is stored in the memory, and when the computer program is executed by the processor, the method of any one of the foregoing embodiments is implemented.
  • FIG. 4 is a schematic diagram of the hardware structure of an embodiment of a computer device for performing the video filtering method based on identical content provided by the present application.
  • the computer device includes a processor 301 and a memory 302 , and may also include an input device 303 and an output device 304 .
  • the processor 301 , the memory 302 , the input device 303 and the output device 304 may be connected by a bus or in other ways, and the connection by a bus is taken as an example in FIG. 4 .
  • the input device 303 may receive input numeric or character information, and generate key signal inputs related to the user settings and function control of the video filtering system based on identical content.
  • the output device 304 may include a display device such as a display screen.
  • the processor 301 executes the various functional applications and data processing of the server by running the non-volatile software programs, instructions and modules stored in the memory 302, that is, implements the video filtering method based on identical content of the above method embodiments.
  • nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM), which may act as external cache memory.
  • RAM is available in various forms such as synchronous RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DRRAM).
  • the storage devices of the disclosed aspects are intended to include, but not be limited to, these and other suitable types of memory.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided are a video filtering method and system based on identical content, and a device. The method comprises the following steps: on the basis of a target video, detecting and acquiring corresponding similar videos from a library of videos having identical content; respectively performing timing extraction on the target video and the similar videos so as to obtain respective feature timing diagrams; matching the feature timing diagram of the target video and the feature timing diagrams of the similar videos by using a dynamic time warping algorithm, so as to obtain identical video clips; respectively performing, on the basis of the duration of the target video, sample extraction on the video clips obtained by means of matching, so as to obtain N frames of images, respectively; and respectively performing image information amount calculation on the N frames of images, respectively obtaining corresponding video information amounts of the target video and the similar videos on the basis of the calculated image information amount, and removing videos with a small video information amount. In the present application, videos with a greater information amount and having higher values are efficiently screened out, thereby reducing disk usage, saving on network resources, and facilitating the improvement of the efficiency of video retrieval.

Description

Video filtering method, system, and device based on identical content
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on December 4, 2020, with application number 202011406954.4 and entitled "Video filtering method, system and device based on identical content", the entire contents of which are incorporated into this application by reference.
Technical Field
The present application relates to the field of video technology, and in particular to a video filtering method, system, and device based on identical content.
Background Art
With the rapid development and wide application of information technology, the number of online videos keeps growing, and with it a large number of similar videos whose content is identical apart from format conversion, scaling and deformation, added watermarks, advertisements, filters, and the like. Such similar videos duplicate content, occupy a large amount of disk and network resources, and slow down video retrieval, which leads to a huge waste of economic value. For video platforms, cost is a top priority: similar videos carry the same content and very little value, or can even be ignored outright, yet they consume a large amount of resources; at the same time, pushing videos with identical content to users easily degrades the user experience. There is therefore an urgent need to screen videos with identical content, and how to screen them while retaining the more valuable video has become a pressing problem.
The existing techniques for filtering videos with identical content, and their technical problems, are as follows:
1. Comparing video file sizes: the larger file is assumed to carry more information, and the smaller file is filtered out. When the video format has been converted, the resolution raised, or the frame rate increased, this method is likely to select a video that carries little information, or the same information while consuming more resources.
2. Comparing video resolutions: the higher resolution is assumed to carry more information, and the lower-resolution video is filtered out. However, some video websites convert source files to higher-resolution videos for unified resolution management on the platform; in that case this method is likely to select a video that carries little information, or the same information while consuming more resources.
3. Comparing video durations: the longer video is assumed to carry more information, and the shorter video is filtered out. A long video may be blurry while a short one is clearer and carries more information; in that case this method is likely to select the video with the smaller information amount.
Summary of the Invention
In view of this, the purpose of the present application is to propose a video filtering method, system, and device based on identical content, so as to solve the problem in the prior art that videos with identical content cannot be accurately filtered out.
Based on the above purpose, the present application provides a video filtering method based on identical content, comprising the following steps:
detecting and obtaining a corresponding similar video in an identical-content video library based on a target video;
performing timing extraction processing on the target video and the similar video respectively to obtain their respective feature timing diagrams;
matching the feature timing diagram of the target video against the feature timing diagram of the similar video through a dynamic time warping algorithm to obtain identical video segments;
performing sample extraction processing on the matched video segments respectively based on the duration of the target video to obtain n frames of images from each;
calculating the image information amount of each of the n frames of images, obtaining the video information amounts of the corresponding target video and similar video respectively based on the calculated image information amounts, and removing the video with the smaller video information amount.
In some embodiments, performing timing extraction processing on the target video and the similar video respectively to obtain their respective feature timing diagrams includes: extracting, at timed intervals and frame by frame, the gray value of the center pixel of each image from the target video and the similar video respectively, and establishing each gray value as a function of time to obtain the respective feature timing diagrams.
In some embodiments, matching the feature timing diagram of the target video against the feature timing diagram of the similar video through the dynamic time warping algorithm to obtain the identical video segments includes: selecting, through the dynamic time warping algorithm, the key frames that differ between the feature timing diagram of the target video and that of the similar video, and obtaining the matched identical video segments based on the key frames.
In some embodiments, performing sample extraction processing on the matched video segments respectively based on the duration of the target video to obtain n frames of images from each includes:
establishing an extraction frequency function based on the duration of the target video;
extracting frames from the matched video segments according to the extraction frequency function to obtain n frames of images from each.
In some embodiments, establishing the extraction frequency function based on the duration of the target video includes: dividing the duration of the target video into four time gradients, namely 1 s to 60 s, 1 min to 60 min, 1 h to 10 h, and more than 10 h, and establishing a different number of extractions for each time gradient.
In some embodiments, the image information amount calculation includes: calculating the two-dimensional entropy of the image as the image information amount.
In some embodiments, obtaining the video information amounts of the corresponding target video and similar video respectively based on the calculated image information amounts, and removing the video with the smaller video information amount includes:
computing a first average of the image information amounts of the n frames of the target video, and taking the product of the first average and the duration of the target video as its video information amount;
computing a second average of the image information amounts of the n frames of the similar video, and taking the product of the second average and the duration of the similar video as its video information amount;
comparing the video information amount of the target video with that of the similar video, and removing the video with the smaller video information amount.
In some embodiments, the method further includes: in response to the video with the smaller video information amount being removed, updating the identical-content video library and the regular video library, wherein the regular video library is used to store videos with standalone content.
In another aspect, the present application further provides a video filtering system based on identical content, including:
a video detection module, configured to detect and obtain a corresponding similar video in an identical-content video library based on a target video;
a timing extraction module, configured to perform timing extraction processing on the target video and the similar video respectively to obtain their respective feature timing diagrams;
a video segment matching module, configured to match the feature timing diagram of the target video against the feature timing diagram of the similar video through a dynamic time warping algorithm to obtain the identical video segments;
a sample extraction module, configured to perform sample extraction processing on the matched video segments respectively based on the duration of the target video to obtain n frames of images from each; and
a video culling module, configured to calculate the image information amount of each of the n frames of images, obtain the video information amounts of the corresponding target video and similar video respectively based on the calculated image information amounts, and remove the video with the smaller video information amount.
In yet another aspect, the present application further provides a computer device, including a memory and a processor, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, any one of the above methods is performed.
The present application has at least the following beneficial technical effects:
By performing timing extraction processing and video segment matching on the target video and the similar video respectively, the present application efficiently screens out the video segments with identical content; through sample extraction processing, image information amount calculation, and video information amount calculation, the video that carries more information and more value is selected; and by removing the video with the smaller video information amount, disk usage is reduced, network resources are saved, the efficiency of video retrieval is improved, and greater economic value is brought to the video platform.
Brief Description of the Drawings
In order to explain the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can obtain other embodiments from these drawings without creative effort.
FIG. 1 is a schematic diagram of an embodiment of the video filtering method based on identical content provided by the present application;
FIG. 2 is a feature timing diagram of a target video and a corresponding similar video matched by a dynamic time warping algorithm according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an embodiment of the video filtering system based on identical content provided by the present application;
FIG. 4 is a schematic diagram of the hardware structure of an embodiment of a computer device for performing the video filtering method based on identical content provided by the present application.
Detailed Description of the Embodiments
In order to make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application are further described in detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that all uses of "first" and "second" in the embodiments of the present application are intended to distinguish two non-identical entities or non-identical parameters that share a name; "first" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present application. In addition, the terms "comprising" and "having", and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device comprising a series of steps or units may also include other steps or units inherent to it.
Based on the above purpose, a first aspect of the embodiments of the present application proposes an embodiment of a video filtering method based on identical content. FIG. 1 shows a schematic diagram of an embodiment of the video filtering method based on identical content provided by the present application. As shown in FIG. 1, the embodiment of the present application includes the following steps:
Step S10: detecting and obtaining a corresponding similar video in an identical-content video library based on a target video;
Step S20: performing timing extraction processing on the target video and the similar video respectively to obtain their respective feature timing diagrams;
Step S30: matching the feature timing diagram of the target video against the feature timing diagram of the similar video through a dynamic time warping algorithm to obtain the identical video segments;
Step S40: performing sample extraction processing on the matched video segments respectively based on the duration of the target video to obtain n frames of images from each;
Step S50: calculating the image information amount of each of the n frames of images, obtaining the video information amounts of the corresponding target video and similar video respectively based on the calculated image information amounts, and removing the video with the smaller video information amount.
In the embodiment of the present application, timing extraction processing and video segment matching are performed on the target video and the similar video respectively, so that video segments with identical content are screened out efficiently; through sample extraction processing, image information amount calculation, and video information amount calculation, the video that carries more information and more value is selected; and by removing the video with the smaller video information amount, disk usage is reduced, network resources are saved, the efficiency of video retrieval is improved, and greater economic value is brought to the video platform.
In some embodiments, performing timing extraction processing on the target video and the similar video respectively to obtain their respective feature timing diagrams includes: extracting, at timed intervals and frame by frame, the gray value of the center pixel of each image from the target video and the similar video respectively, and establishing each gray value as a function of time to obtain the respective feature timing diagrams. FIG. 2 shows the feature timing diagrams of the target video and the similar video, where video A denotes the target video and video B denotes the similar video; the abscissa indicates that video A and video B are sampled at timed intervals, and the ordinate indicates the gray value of the center pixel of the frame at the corresponding time. The gray values and time form a time-series function, and this time series serves as the feature value of the video.
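For illustration, the timed center-pixel extraction described above could be prototyped as in the following sketch. This is not the patent's reference implementation; it assumes OpenCV (cv2) and NumPy are available, and the sampling step and FPS fallback are arbitrary example choices.

```python
import cv2
import numpy as np

def center_pixel_gray_series(video_path, step_frames=5):
    """Sample the gray value of the center pixel every `step_frames` frames.

    Returns (times_in_seconds, gray_values) as NumPy arrays; this time series
    plays the role of the feature timing diagram described above.
    """
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS metadata is missing
    times, grays = [], []
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step_frames == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            h, w = gray.shape
            grays.append(int(gray[h // 2, w // 2]))  # gray value of the middle pixel
            times.append(index / fps)
        index += 1
    cap.release()
    return np.asarray(times), np.asarray(grays)
```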
在一些实施例中,通过动态时间规整算法将目标视频的特征时序图和相似视频的特征时序图匹配出相同的视频片段包括:通过动态时间规整算法选出目标视频的特征时序图和相似视频的特征时序图中有差异的关键帧,并基于关键帧得到匹配的相同的视频片段。图2为根据本申请实施例的目标视频和对应的相似视频经动态时间规整算法匹配的特征时序图,经动态时间规整算法得到了有差异的关键帧,通过图中的虚线分割示意出了匹配的相同的视频片段。In some embodiments, using the dynamic time warping algorithm to match the feature sequence diagram of the target video and the feature sequence diagram of the similar video into the same video segment includes: selecting the feature sequence diagram of the target video and the feature sequence diagram of the similar video by the dynamic time warping algorithm. There are different keyframes in the feature sequence diagram, and based on the keyframes, the same video clips that are matched are obtained. Fig. 2 is the feature sequence diagram that the target video according to the embodiment of the present application and the corresponding similar video are matched by the dynamic time warping algorithm, the key frame with difference is obtained by the dynamic time warping algorithm, and the matching is shown by the dotted line segmentation in the figure of the same video clip.
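A generic dynamic time warping alignment over the two gray-value series might look like the sketch below. It is a textbook DTW formulation rather than code taken from the patent, and the difference threshold used to flag differing key frames is an assumed parameter.

```python
import numpy as np

def dtw_path(a, b):
    """Return the optimal warping path between 1-D series a and b."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(float(a[i - 1]) - float(b[j - 1]))
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from (n, m) to recover the frame-to-frame alignment.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def differing_keyframes(a, b, threshold=10.0):
    """Aligned positions whose gray-value difference exceeds `threshold`; the
    stretches between them correspond to the matched identical segments."""
    return [(i, j) for i, j in dtw_path(a, b) if abs(float(a[i]) - float(b[j])) > threshold]
```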
In some embodiments, performing sample extraction processing on the matched video segments respectively based on the duration of the target video to obtain n frames of images from each includes: establishing an extraction frequency function based on the duration of the target video, and extracting frames from the matched video segments according to the extraction frequency function to obtain n frames of images from each. In some embodiments, establishing the extraction frequency function based on the duration of the target video includes: dividing the duration of the target video into four time gradients, namely 1 s to 60 s, 1 min to 60 min, 1 h to 10 h, and more than 10 h, and establishing a different number of extractions for each time gradient. In this embodiment, the longer the video, the more frames are extracted, because a longer video needs more data for comparison to complete the similarity comparison. The extraction frequency function f(x) is given by the following formula, where x is the video duration: when the video duration is 1 s to 60 s, one frame is extracted every 0.05x seconds; when it is 1 min to 60 min, one frame is extracted every 0.025x minutes; when it is 1 h to 10 h, one frame is extracted every 0.01x hours; and when it is more than 10 h, one frame is extracted every 0.01x hours.
f(x) =
    0.05x seconds, when x is 1 s to 60 s
    0.025x minutes, when x is 1 min to 60 min
    0.01x hours, when x is 1 h to 10 h
    0.01x hours, when x is more than 10 h
where x is the video duration and f(x) is the interval between two successive frame extractions.
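The piecewise extraction interval can be written down directly from the four time gradients. In the sketch below, the unit in which x is expressed in each branch (seconds, minutes, or hours) is an interpretation of the text, so the conversions are assumptions; the helper for frame indices is likewise illustrative.

```python
def extraction_interval_seconds(duration_seconds):
    """Interval between two extracted frames, in seconds, per the four time gradients."""
    if duration_seconds <= 60:                 # 1 s to 60 s: once every 0.05x seconds
        return 0.05 * duration_seconds
    elif duration_seconds <= 60 * 60:          # 1 min to 60 min: once every 0.025x minutes
        return 0.025 * (duration_seconds / 60.0) * 60.0
    elif duration_seconds <= 10 * 60 * 60:     # 1 h to 10 h: once every 0.01x hours
        return 0.01 * (duration_seconds / 3600.0) * 3600.0
    else:                                      # more than 10 h: once every 0.01x hours
        return 0.01 * (duration_seconds / 3600.0) * 3600.0

def sample_frame_indices(duration_seconds, fps):
    """Indices of the n frames to extract from a matched segment."""
    step = max(1, int(round(extraction_interval_seconds(duration_seconds) * fps)))
    return list(range(0, int(duration_seconds * fps), step))
```

For example, a 30 s segment gives an interval of 0.05 × 30 = 1.5 s, so roughly 20 frames would be extracted.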
In some embodiments, the image information amount calculation includes: calculating the two-dimensional entropy of the image as the image information amount. In this embodiment, the two-dimensional entropy of the image is used as the image information amount. The neighborhood gray-level mean of each image pixel is selected as the spatial feature of the gray-level distribution and combined with the pixel's gray level to form a feature pair, denoted (i, j), where i is the gray value of the pixel and j is the neighborhood gray-level mean; f(i, j) is the frequency of occurrence of the feature pair (i, j), and M, N denote the image dimensions, as in the following formula:
p(i, j) = f(i, j) / (M × N)
The two-dimensional entropy of the image is given by the following formula:
H = -Σ_i Σ_j p(i, j) · log2 p(i, j)
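A direct implementation of the two formulas above is sketched here. The 3x3 window used for the neighborhood gray-level mean is an assumed choice, since the text only specifies a neighborhood mean, and OpenCV/NumPy are again assumed dependencies.

```python
import cv2
import numpy as np

def image_2d_entropy(gray):
    """H = -sum over (i, j) of p(i, j) * log2 p(i, j), with i the pixel gray value
    and j the neighborhood gray-level mean."""
    gray = gray.astype(np.uint8)
    neigh_mean = cv2.blur(gray, (3, 3))            # j: neighborhood gray-level mean
    m, n = gray.shape                              # M, N: image dimensions
    hist = np.zeros((256, 256), dtype=np.float64)  # f(i, j): frequency of each pair
    np.add.at(hist, (gray.ravel(), neigh_mean.ravel()), 1.0)
    p = hist / (m * n)                             # p(i, j) = f(i, j) / (M * N)
    nonzero = p > 0
    return float(-np.sum(p[nonzero] * np.log2(p[nonzero])))
```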
In some embodiments, obtaining the video information amounts of the corresponding target video and similar video respectively based on the calculated image information amounts, and removing the video with the smaller video information amount includes: computing a first average of the image information amounts of the n frames of the target video, and taking the product of the first average and the duration of the target video as its video information amount; computing a second average of the image information amounts of the n frames of the similar video, and taking the product of the second average and the duration of the similar video as its video information amount; and comparing the video information amount of the target video with that of the similar video, and removing the video with the smaller video information amount. In this embodiment, the video information amount equals the two-dimensional image entropy multiplied by the duration; with H_sum denoting the video information amount and L the video duration in frames, the video information amount is:
H_sum = H × L
Once the video information amounts are obtained, the video with the smaller amount can be regarded as less valuable, and the duplicate video with the smaller information amount is removed or deleted. That is, if the video information amount of the target video is smaller, the target video is removed; if the video information amount of the corresponding similar video is smaller, the similar video is removed.
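The final comparison reduces to computing H_sum for both videos and dropping whichever is smaller. The sketch below uses hypothetical helper names to illustrate the decision rule; it is not code taken from the patent.

```python
import numpy as np

def video_information_amount(frame_entropies, duration_frames):
    """H_sum = average two-dimensional entropy of the sampled frames multiplied by L (frames)."""
    return float(np.mean(frame_entropies)) * duration_frames

def video_to_remove(target_entropies, target_frames, similar_entropies, similar_frames):
    """Return which video carries the smaller information amount and should be culled."""
    h_target = video_information_amount(target_entropies, target_frames)
    h_similar = video_information_amount(similar_entropies, similar_frames)
    return "target" if h_target < h_similar else "similar"
```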
In some embodiments, the method further includes: in response to the video with the smaller video information amount being removed, updating the identical-content video library and the regular video library, wherein the regular video library is used to store videos with standalone content. Updating the identical-content video library and the regular video library helps to improve the efficiency of subsequent video retrieval.
A second aspect of the embodiments of the present application further provides a video filtering system based on identical content. FIG. 3 shows a schematic diagram of an embodiment of the video filtering system based on identical content provided by the present application. The video filtering system based on identical content includes: a video detection module 10, configured to detect and obtain a corresponding similar video in an identical-content video library based on a target video; a timing extraction module 20, configured to perform timing extraction processing on the target video and the similar video respectively to obtain their respective feature timing diagrams; a video segment matching module 30, configured to match the feature timing diagram of the target video against the feature timing diagram of the similar video through a dynamic time warping algorithm to obtain the identical video segments; a sample extraction module 40, configured to perform sample extraction processing on the matched video segments respectively based on the duration of the target video to obtain n frames of images from each; and a video culling module 50, configured to calculate the image information amount of each of the n frames of images, obtain the video information amounts of the corresponding target video and similar video respectively based on the calculated image information amounts, and remove the video with the smaller video information amount.
The video filtering system based on identical content of this embodiment efficiently screens out video segments with identical content by performing timing extraction processing and video segment matching on the target video and the similar video respectively; through sample extraction processing, image information amount calculation, and video information amount calculation, the video that carries more information and more value is selected; and by removing the video with the smaller video information amount, disk usage is reduced, network resources are saved, the efficiency of video retrieval is improved, and greater economic value is brought to the video platform.
A third aspect of the embodiments of the present application further provides a computer device, including a memory 302 and a processor 301, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the method of any one of the above embodiments is implemented.
FIG. 4 is a schematic diagram of the hardware structure of an embodiment of a computer device for performing the video filtering method based on identical content provided by the present application. Taking the computer device shown in FIG. 4 as an example, the computer device includes a processor 301 and a memory 302, and may further include an input device 303 and an output device 304. The processor 301, the memory 302, the input device 303, and the output device 304 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 4. The input device 303 may receive input numeric or character information and generate key signal inputs related to the user settings and function control of the video filtering system based on identical content. The output device 304 may include a display device such as a display screen. By running the non-volatile software programs, instructions, and modules stored in the memory 302, the processor 301 executes the various functional applications and data processing of the server, that is, implements the video filtering method based on identical content of the above method embodiments.
Finally, it should be noted that the computer-readable storage medium (for example, the memory) herein may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. By way of example and not limitation, non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM), which may act as an external cache memory. By way of example and not limitation, RAM is available in many forms, such as synchronous RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to include, but not be limited to, these and other suitable types of memory.
Those skilled in the art will also appreciate that the various exemplary logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends on the specific application and the design constraints imposed on the overall system. Those skilled in the art may implement the described functionality in various ways for each specific application, but such implementation decisions should not be interpreted as causing a departure from the scope disclosed by the embodiments of the present application.
The above are exemplary embodiments disclosed by the present application, but it should be noted that various changes and modifications may be made without departing from the scope of the embodiments of the present application as defined by the claims. The functions, steps, and/or actions of the method claims according to the disclosed embodiments described herein need not be performed in any particular order. In addition, although the elements disclosed in the embodiments of the present application may be described or claimed in the singular, they may also be construed as plural unless explicitly limited to the singular.
It should be understood that, as used herein, the singular form "a" is intended to include the plural form as well, unless the context clearly supports an exception. It should also be understood that "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. The serial numbers of the embodiments disclosed above are for description only and do not represent the relative merits of the embodiments.
Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is merely exemplary and is not intended to imply that the scope disclosed by the embodiments of the present application (including the claims) is limited to these examples. Within the spirit of the embodiments of the present application, the technical features of the above embodiments or of different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the present application as described above, which are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, improvement, and the like made within the spirit and principles of the embodiments of the present application shall be included within the protection scope of the embodiments of the present application.

Claims (11)

  1. A video filtering method based on identical content, characterized by comprising the following steps:
    detecting and obtaining a corresponding similar video in an identical-content video library based on a target video;
    performing timing extraction processing on the target video and the similar video respectively to obtain their respective feature timing diagrams;
    matching the feature timing diagram of the target video against the feature timing diagram of the similar video through a dynamic time warping algorithm to obtain identical video segments;
    performing sample extraction processing on the matched video segments respectively based on the duration of the target video to obtain n frames of images from each;
    calculating the image information amount of each of the n frames of images, obtaining the video information amounts of the corresponding target video and similar video respectively based on the calculated image information amounts, and removing the video with the smaller video information amount.
  2. The method according to claim 1, wherein performing timed extraction processing on the target video and the similar videos respectively to obtain respective feature time-sequence diagrams comprises:
    extracting, from the target video and the similar videos respectively, the grayscale value of the centermost pixel of a frame at timed intervals, and establishing, for each video, a function of the grayscale value against time, so as to obtain the respective feature time-sequence diagrams.
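A minimal sketch of the timed extraction in claim 2, assuming OpenCV is available: each sampled frame contributes the grayscale value of its centermost pixel, and the resulting (time, grayscale) pairs form the feature time-sequence diagram. The function name and the one-second sampling interval are illustrative choices, not values taken from the application.

```python
import cv2


def center_pixel_series(video_path: str, interval_s: float = 1.0):
    """Sample the grayscale value of the center pixel at fixed time steps.

    Returns a list of (timestamp_seconds, gray_value) pairs that can serve as
    the feature time-sequence diagram (grayscale as a function of time).
    """
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS metadata is missing
    step = max(int(round(fps * interval_s)), 1)

    series = []
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            h, w = gray.shape
            series.append((frame_idx / fps, int(gray[h // 2, w // 2])))
        frame_idx += 1
    cap.release()
    return series
```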
  3. The method according to claim 1, wherein matching out identical video segments between the feature time-sequence diagram of the target video and the feature time-sequence diagrams of the similar videos by means of a dynamic time warping algorithm comprises:
    selecting, by means of the dynamic time warping algorithm, key frames at which the feature time-sequence diagram of the target video and the feature time-sequence diagrams of the similar videos differ, and obtaining the matched identical video segments on the basis of the key frames.
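A minimal sketch of the alignment step in claim 3, assuming NumPy: a textbook dynamic time warping over the two feature time-sequence diagrams, with the warping path read back to find the aligned stretch treated as the shared segment. The difference tolerance and the way matched pairs are selected below are illustrative assumptions, not the application's exact key-frame criterion.

```python
import numpy as np


def dtw_path(a: np.ndarray, b: np.ndarray):
    """Return the DTW warping path between 1-D sequences a and b."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])

    # Backtrack from (n, m) towards (0, 0) to recover the alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def matched_segment(a, b, tolerance: float = 10.0):
    """Keep the aligned index pairs whose values differ by less than `tolerance`,
    i.e. the stretch treated as the shared video segment."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return [(i, j) for i, j in dtw_path(a, b) if abs(a[i] - b[j]) < tolerance]
```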
  4. The method according to claim 1, wherein performing sample extraction processing on the matched video segments respectively on the basis of the duration of the target video, so as to obtain n frames of images from each, comprises:
    establishing an extraction frequency function based on the duration of the target video; and
    extracting frames from the matched video segments according to the extraction frequency function, so as to obtain n frames of images from each.
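A minimal sketch of claim 4, assuming a matched segment is described by its start and end frame indices: the number of samples n is supplied by a duration-based extraction-count function (one possible such function is sketched after claim 5), and the n frame indices are spread evenly across the segment. The names and the even-spacing rule are illustrative, not taken from the application.

```python
from typing import Callable, List


def sample_frame_indices(segment_start: int,
                         segment_end: int,
                         target_duration_s: float,
                         count_fn: Callable[[float], int]) -> List[int]:
    """Pick n frame indices inside [segment_start, segment_end]."""
    n = max(count_fn(target_duration_s), 1)
    length = segment_end - segment_start
    if n == 1 or length <= 0:
        return [segment_start + max(length, 0) // 2]
    # Spread the n samples uniformly across the matched segment.
    return [segment_start + round(k * length / (n - 1)) for k in range(n)]
```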
  5. The method according to claim 4, wherein establishing an extraction frequency function based on the duration of the target video comprises:
    dividing the duration of the target video into four time gradients, namely 1 s to 60 s, 1 min to 60 min, 1 h to 10 h, and more than 10 h, and establishing a different number of extractions for each time gradient.
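A minimal sketch of the four time gradients in claim 5. The claim fixes the gradient boundaries but not the number of extractions per gradient, so the counts below are placeholder values chosen only for illustration.

```python
def extraction_count(duration_s: float) -> int:
    """Map a target-video duration onto a number of frames to sample."""
    if duration_s <= 60:            # 1 s to 60 s
        return 8
    if duration_s <= 60 * 60:       # 1 min to 60 min
        return 16
    if duration_s <= 10 * 60 * 60:  # 1 h to 10 h
        return 32
    return 64                       # more than 10 h
```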
  6. The method according to claim 1, wherein calculating the image information content comprises:
    calculating the two-dimensional entropy of the image as the image information content.
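A minimal sketch of the two-dimensional image entropy of claim 6, assuming NumPy and SciPy are available: the joint distribution of each pixel's gray level and its neighbourhood mean is histogrammed, and its Shannon entropy is taken as the image information content. The 3x3 neighbourhood is a common convention, not something fixed by the claims.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def two_dimensional_entropy(gray: np.ndarray) -> float:
    """Compute the 2-D entropy of an 8-bit grayscale image."""
    gray = gray.astype(np.uint8)
    neigh = uniform_filter(gray.astype(float), size=3)  # 3x3 neighbourhood mean
    neigh = np.clip(np.round(neigh), 0, 255).astype(np.uint8)

    # Joint histogram of (pixel value, neighbourhood mean), normalised to probabilities.
    hist, _, _ = np.histogram2d(gray.ravel(), neigh.ravel(),
                                bins=256, range=[[0, 256], [0, 256]])
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```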
  7. The method according to claim 1, wherein obtaining the video information content of the target video and of the similar videos respectively on the basis of the calculated image information content, and removing the video having the smaller video information content, comprises:
    calculating a first average of the image information content of the n frames of images of the target video, and taking the product of the first average and the duration of the target video as the video information content of the target video;
    calculating a second average of the image information content of the n frames of images of a similar video, and taking the product of the second average and the duration of the similar video as the video information content of the similar video; and
    comparing the video information content of the target video with the video information content of the similar video, and removing the video having the smaller video information content.
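A minimal sketch of the comparison in claim 7: average the n per-frame information-content values, multiply by the video duration to obtain the video information content, and mark the video with the smaller value for removal. The keep/remove return convention is an illustrative choice.

```python
from statistics import mean
from typing import List


def video_information(frame_infos: List[float], duration_s: float) -> float:
    """Average the n per-frame information values and scale by the video duration."""
    return mean(frame_infos) * duration_s


def pick_video_to_cull(target_infos: List[float], target_duration_s: float,
                       similar_infos: List[float], similar_duration_s: float) -> str:
    """Return which of the two videos carries less information and should be removed."""
    target_vi = video_information(target_infos, target_duration_s)
    similar_vi = video_information(similar_infos, similar_duration_s)
    return "target" if target_vi < similar_vi else "similar"
```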
  8. The method according to claim 1, further comprising:
    in response to the video having the smaller video information content being removed, updating the identical-content video library and a regular video library, wherein the regular video library is used for storing videos having individual content.
  9. The method according to claim 7, wherein comparing the video information content of the target video with the video information content of the similar video, and removing the video having the smaller video information content, further comprises:
    in response to the video information content of the target video being smaller than the video information content of the similar video, removing the target video; and
    in response to the video information content of the similar video being smaller than the video information content of the target video, removing the similar video.
  10. A video filtering system based on identical content, comprising:
    a video detection module, configured to detect and acquire, on the basis of a target video, corresponding similar videos in an identical-content video library;
    a timed extraction module, configured to perform timed extraction processing on the target video and the similar videos respectively, so as to obtain respective feature time-sequence diagrams;
    a video segment matching module, configured to match out identical video segments between the feature time-sequence diagram of the target video and the feature time-sequence diagrams of the similar videos by means of a dynamic time warping algorithm;
    a sample extraction module, configured to perform sample extraction processing on the matched video segments respectively on the basis of the duration of the target video, so as to obtain n frames of images from each; and
    a video removal module, configured to calculate the image information content of the n frames of images respectively, obtain the video information content of the target video and of the similar videos respectively on the basis of the calculated image information content, and remove the video having the smaller video information content.
  11. A computer device, comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, performs the method according to any one of claims 1 to 9.
PCT/CN2021/121872 2020-12-04 2021-09-29 Video filtering method and system based on identical content, and device WO2022116668A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011406954.4 2020-12-04
CN202011406954.4A CN112653928B (en) 2020-12-04 2020-12-04 Video filtering method, system and equipment based on same content

Publications (1)

Publication Number Publication Date
WO2022116668A1 true WO2022116668A1 (en) 2022-06-09

Family

ID=75350274

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/121872 WO2022116668A1 (en) 2020-12-04 2021-09-29 Video filtering method and system based on identical content, and device

Country Status (2)

Country Link
CN (1) CN112653928B (en)
WO (1) WO2022116668A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112653928B (en) * 2020-12-04 2022-12-02 苏州浪潮智能科技有限公司 Video filtering method, system and equipment based on same content
CN113542771A (en) * 2021-07-15 2021-10-22 广东电网有限责任公司中山供电局 Video high-efficiency compression processing method based on content weight
CN115243073B (en) * 2022-07-22 2024-05-14 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5658430B2 (en) * 2008-08-15 2015-01-28 パナソニックIpマネジメント株式会社 Image processing device
CN104504101B (en) * 2014-12-30 2018-10-30 北京奇艺世纪科技有限公司 A kind of determination method and device of similar video
CN109189991B (en) * 2018-08-17 2021-06-08 百度在线网络技术(北京)有限公司 Duplicate video identification method, device, terminal and computer readable storage medium
CN109726765A (en) * 2019-01-02 2019-05-07 京东方科技集团股份有限公司 A kind of sample extraction method and device of visual classification problem
CN110598014B (en) * 2019-09-27 2021-12-10 腾讯科技(深圳)有限公司 Multimedia data processing method, device and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3621323B2 (en) * 2000-02-28 2005-02-16 日本電信電話株式会社 Video registration / search processing method and video search device
US20110150423A1 (en) * 2009-12-18 2011-06-23 Electronics And Telecommunications Research Institute Digital video managing and searching system
CN102779184A (en) * 2012-06-29 2012-11-14 中国科学院自动化研究所 Automatic positioning method of approximately repeated video clips
CN103678702A (en) * 2013-12-30 2014-03-26 优视科技有限公司 Video duplicate removal method and device
CN110620937A (en) * 2019-10-21 2019-12-27 电子科技大学 Dynamic self-adaptive encrypted video traffic identification method based on HTTP
CN110996123A (en) * 2019-12-18 2020-04-10 广州市百果园信息技术有限公司 Video processing method, device, equipment and medium
CN112653928A (en) * 2020-12-04 2021-04-13 苏州浪潮智能科技有限公司 Video filtering method, system and equipment based on same content

Also Published As

Publication number Publication date
CN112653928B (en) 2022-12-02
CN112653928A (en) 2021-04-13

Similar Documents

Publication Publication Date Title
WO2022116668A1 (en) Video filtering method and system based on identical content, and device
CN109151501B (en) Video key frame extraction method and device, terminal equipment and storage medium
US11132555B2 (en) Video detection method, server and storage medium
US9036905B2 (en) Training classifiers for deblurring images
US20160196478A1 (en) Image processing method and device
CN112163120A (en) Classification method, terminal and computer storage medium
CN116363554A (en) Method, system, medium, equipment and terminal for extracting key frames of surveillance video
CN110991310A (en) Portrait detection method, portrait detection device, electronic equipment and computer readable medium
CN111222450A (en) Model training method, model training device, model live broadcast processing equipment and storage medium
CN113496208A (en) Video scene classification method and device, storage medium and terminal
US10839251B2 (en) Method and system for implementing image authentication for authenticating persons or items
WO2017070841A1 (en) Image processing method and apparatus
CN108966042B (en) Video abstract generation method and device based on shortest path
CN116431857B (en) Video processing method and system for unmanned scene
CN112001842A (en) Picture generation method and device, electronic equipment and computer readable storage medium
CN112749660B (en) Method and device for generating video content description information
Chathurika et al. A revised averaging algorithm for an effective feature extraction in component-based image retrieval system
CN102495843A (en) Salient region detection algorithm based on local features
CN113095239A (en) Key frame extraction method, terminal and computer readable storage medium
CN116033182B (en) Method and device for determining video cover map, electronic equipment and storage medium
CN114710474B (en) Data stream processing and classifying method based on Internet of things
WO2015054994A1 (en) Video extraction method and device
CN113434731B (en) Music video genre classification method, device, computer equipment and storage medium
CN111598053B (en) Image data processing method and device, medium and system thereof
WO2023056833A1 (en) Background picture generation method and apparatus, image fusion method and apparatus, and electronic device and readable medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21899702

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21899702

Country of ref document: EP

Kind code of ref document: A1