CN114051163B - Copyright monitoring method and system based on video subtitle comparison


Info

Publication number
CN114051163B
CN114051163B
Authority
CN
China
Prior art keywords
video
text
frames
caption information
video images
Prior art date
Legal status
Active
Application number
CN202111328854.9A
Other languages
Chinese (zh)
Other versions
CN114051163A (en)
Inventor
何平涛
陈雷勇
赵善为
张奕良
李天水
Current Assignee
Guangdong Electroshock Media Technology Co ltd
Original Assignee
Guangdong Electroshock Media Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Electroshock Media Technology Co., Ltd.
Priority to CN202111328854.9A
Publication of CN114051163A
Application granted
Publication of CN114051163B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/47: End-user applications
    • H04N21/488: Data services, e.g. news ticker
    • H04N21/4884: Data services for displaying subtitles
    • H04N21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83: Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/835: Generation of protective data, e.g. certificates
    • H04N21/8355: Generation of protective data involving usage data, e.g. number of copies or viewings allowed

Abstract

The invention discloses a copyright monitoring method and system based on video subtitle comparison, characterized by lower cost and higher accuracy and efficiency of copyright monitoring. The method comprises the following steps: acquiring a copyrighted video and extracting first subtitle information of the copyrighted video; capturing a crawler video according to a preset keyword and extracting second subtitle information of the crawler video; and comparing the second subtitle information with the first subtitle information to obtain a subtitle comparison result of the crawler video and the copyrighted video, wherein the subtitle comparison result is used for judging whether the crawler video infringes copyright.

Description

Copyright monitoring method and system based on video subtitle comparison
Technical Field
The invention relates to the technical field of video subtitle extraction, in particular to a copyright monitoring method and system based on video subtitle comparison.
Background
Currently, video copyright monitoring in the industry is generally implemented through video content comparison. For example, a video DNA (also called a video fingerprint) is extracted from the original video content, another video DNA is extracted from the suspected pirated content, the two are compared, and if their similarity reaches a certain threshold, infringement is determined.
However, schemes based on video content comparison have the following problems:
1. The technical threshold for extracting video DNA is high, and so is the cost.
2. The content features of different videos are not very distinctive, so clear separation cannot be achieved during comparison, resulting in poor monitoring accuracy.
3. The comparison threshold is hard to set well: if it is high, the low distinctiveness of content features leads to a low hit rate; if it is low, a large number of candidate videos still have to be checked manually one by one after comparison and screening, so monitoring efficiency is low.
Therefore, a copyright monitoring scheme with lower cost and higher accuracy and efficiency is needed.
Disclosure of Invention
In view of the high cost, low accuracy and low efficiency of copyright monitoring in the prior art, the invention aims to provide a copyright monitoring method and system based on video subtitle comparison that offer lower cost and higher accuracy and efficiency of copyright monitoring.
In a first aspect, an embodiment of the present invention provides a method for monitoring copyright based on video subtitle comparison, including:
acquiring a copyrighted video and extracting first subtitle information of the copyrighted video;
capturing a crawler video according to a preset keyword, and extracting second subtitle information of the crawler video;
and comparing the second subtitle information with the first subtitle information to obtain a subtitle comparison result of the crawler video and the copyright video, wherein the subtitle comparison result is used for judging whether copyright infringement exists in the crawler video.
In one possible design, acquiring a copyrighted video and extracting first subtitle information of the copyrighted video includes:
performing frame extraction on the copyrighted video using the open-source image/video processing tool OpenCV to obtain multiple frames of video images;
extracting subtitle information from the multiple frames of video images using Baidu's open-source deep learning framework PaddlePaddle;
and merging the subtitle information of the multiple frames of video images to obtain the first subtitle information.
In one possible design, performing frame extraction on the copyrighted video using OpenCV to obtain multiple frames of video images includes:
capturing the copyrighted video with cv2.VideoCapture;
performing frame extraction on the copyrighted video at a preset frame-extraction interval to obtain multiple frames of original video images;
and resizing the multiple frames of original video images to the same dimensions to obtain the multiple frames of video images.
In one possible design, extracting subtitle information from the multiple frames of video images using Baidu's open-source deep learning framework PaddlePaddle includes:
instantiating a text detector according to the detection algorithm of a preset model;
instantiating a text recognizer according to the recognition mode of the preset model;
performing detection on the multiple frames of video images based on the text detector to obtain text range vectors of the multiple frames of video images;
arranging the text range vectors of the multiple frames of video images in a preset order;
pruning the text range vectors of the multiple frames of video images to obtain pruned text range vectors;
and recognizing and extracting, based on the text recognizer, the text within the pruned text range vectors to obtain the subtitle information of the multiple frames of video images.
In one possible design, pruning the text range vectors of the multiple frames of video images to obtain the pruned text range vectors includes:
computing the norms (moduli) of the text range vectors of the multiple frames of video images using the linear algebra library numpy;
and pruning the text region range vectors of the multiple frames of video images based on these norms to obtain the pruned text range vectors.
In one possible design, the preset model is the text recognition model ppocr;
before instantiating the text detector according to the detection algorithm of the preset model and instantiating the text recognizer according to the recognition mode of the preset model, the method further comprises:
setting the model type and version;
setting the model detection algorithm;
setting whether to use the GPU;
setting whether to use space (null) characters;
setting whether angle_cls is used;
and setting the path of the source video images.
In one possible design, recognizing and extracting, based on the text recognizer, the text within the pruned text range vectors of the multiple frames of video images to obtain the first subtitle information includes:
using the text recognition model ppocr, based on the text recognizer, to locate and recognize the text within the pruned text range vectors of the multiple frames of video images, and extracting that text to obtain the subtitle information of the multiple frames of video images.
In one possible design, merging the subtitle information of the multiple frames of video images to obtain the first subtitle information includes:
extending the coordinate ranges of the text range vectors of the multiple frames of video images outward, and calculating the edit distance between any two adjacent pieces of subtitle information;
if the edit distance between two adjacent pieces of subtitle information is smaller than or equal to a preset threshold, retaining only one of the two; or,
if the edit distance between two adjacent pieces of subtitle information is greater than the preset threshold, retaining both;
and merging all retained subtitle information to obtain the first subtitle information.
In a second aspect, an embodiment of the present invention further provides a copyright monitoring system based on video subtitle comparison, including:
a processing unit, configured to acquire a copyrighted video and extract first subtitle information of the copyrighted video, and to capture a crawler video according to a preset keyword and extract second subtitle information of the crawler video;
and a comparison unit, configured to compare the second subtitle information with the first subtitle information to obtain a subtitle comparison result of the crawler video and the copyrighted video, the subtitle comparison result being used for judging whether the crawler video infringes copyright.
In one possible design, the processing unit is specifically configured to:
perform frame extraction on the copyrighted video using the open-source image/video processing tool OpenCV to obtain multiple frames of video images;
extract subtitle information from the multiple frames of video images using Baidu's open-source deep learning framework PaddlePaddle;
and merge the subtitle information of the multiple frames of video images to obtain the first subtitle information.
In one possible design, the processing unit is specifically configured to:
capture the copyrighted video with cv2.VideoCapture;
perform frame extraction on the copyrighted video at a preset frame-extraction interval to obtain multiple frames of original video images;
and resize the multiple frames of original video images to the same dimensions to obtain the multiple frames of video images.
In one possible design, the processing unit is specifically configured to:
instantiate a text detector according to the detection algorithm of a preset model;
instantiate a text recognizer according to the recognition mode of the preset model;
perform detection on the multiple frames of video images based on the text detector to obtain text range vectors of the multiple frames of video images;
arrange the text range vectors of the multiple frames of video images in a preset order;
prune the text range vectors of the multiple frames of video images to obtain pruned text range vectors;
and recognize and extract, based on the text recognizer, the text within the pruned text range vectors to obtain the subtitle information of the multiple frames of video images.
In one possible design, the processing unit is specifically configured to:
compute the norms (moduli) of the text range vectors of the multiple frames of video images using the linear algebra library numpy;
and prune the text region range vectors of the multiple frames of video images based on these norms to obtain the pruned text range vectors.
In one possible design, the preset model is the text recognition model ppocr; the processing unit is further configured to:
set the model type and version;
set the model detection algorithm;
set whether to use the GPU;
set whether to use space (null) characters;
set whether angle_cls is used;
and set the path of the source video images.
In one possible design, the processing unit is specifically configured to:
based on the text recognizer, use the text recognition model ppocr to locate and recognize the text within the pruned text range vectors of the multiple frames of video images, and extract that text to obtain the subtitle information of the multiple frames of video images.
In one possible design, the processing unit is specifically configured to:
extend the coordinate ranges of the text range vectors of the multiple frames of video images outward, and calculate the edit distance between any two adjacent pieces of subtitle information;
if the edit distance between two adjacent pieces of subtitle information is smaller than or equal to a preset threshold, retain only one of the two; or,
if the edit distance between two adjacent pieces of subtitle information is greater than the preset threshold, retain both;
and merge all retained subtitle information to obtain the first subtitle information.
In a third aspect, embodiments of the present invention further provide a computer-readable storage medium storing at least one program; when the at least one program is executed by a processor, the method according to any one of the possible designs described above is implemented.
The technical scheme provided by the embodiment of the invention has the following beneficial technical effects:
a. High discriminability: video subtitles distinguish different videos very clearly, so clear separation can be achieved when comparing subtitle information, which improves the accuracy of copyright monitoring.
b. Low technical threshold and low cost: the subtitle-extraction techniques and the algorithm models adopted by the method are very mature.
c. High efficiency: once the subtitle information of the videos has been extracted, the comparison itself is simple and efficient, which improves the efficiency of copyright monitoring.
d. High accuracy: compared with the hit rate of video content feature comparison, the hit rate of subtitle information comparison is high, which improves the accuracy of copyright monitoring.
Drawings
Fig. 1 is a flow chart of a copyright monitoring method based on video subtitle comparison according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a process of executing step S101 according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an architecture of a copyright monitoring system based on video subtitle comparison according to an embodiment of the present invention.
Detailed Description
Orientation terms such as up, down, left, right, front, rear, top and bottom mentioned or possibly mentioned in this specification are defined with respect to the described constructions and are relative concepts; they may therefore change with the position and use state of the device. These and other directional terms should not be construed as limiting.
The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of implementations consistent with aspects of the present disclosure.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
The shapes and sizes of the individual components in the drawings do not reflect true proportions, and are intended to illustrate only implementations described in the following exemplary examples.
Before describing the embodiments of the present invention, some terms related to the embodiments of the present invention are first explained so as to facilitate understanding of technical solutions provided by the embodiments of the present invention by those skilled in the art.
PaddlePaddle, referred to in the embodiments of the present invention, is Baidu's open-source deep learning framework.
PaddleCV, referred to in the embodiments of the present invention, is Baidu's open-source computer vision model library.
ppocr, referred to in the embodiments of the present invention, is a character recognition model and toolkit built on Baidu's PaddlePaddle deep learning framework.
OpenCV, referred to in the embodiments of the present invention, is an open-source computer vision library that can perform frame extraction on images and videos to obtain feature vectors.
numpy, referred to in the embodiments of the present invention, is an open-source linear algebra library that can perform various operations on vectors.
pandas, referred to in the embodiments of the present invention, is a tabular data processing tool built on numpy that conveniently supports batch processing of data.
For better understanding and implementation, the technical solutions provided by the embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Referring to fig. 1, a flow chart of a video subtitle comparison-based copyright monitoring method is provided in an embodiment of the present invention. As shown in fig. 1, the method may include the steps of:
s101, acquiring a copyrighted video and extracting first subtitle information of the copyrighted video.
In some embodiments, the process of implementing step S101 may include the steps of:
s201, performing frame extraction processing on the copyrighted video by using an open source picture video processing tool OpenCV to obtain multi-frame video images.
In a specific implementation, cv2.VideoCapture may be used to capture the copyrighted video. Frame extraction is then performed on the copyrighted video at a preset frame-extraction interval to obtain multiple frames of original video images. The original frames can then be resized to the same dimensions, so that multiple uniformly sized video images are obtained. It can be understood that adjusting video images of different resolutions to the same size fixes the coordinate position of the subtitles, so that subtitles can be located in video images of various specifications.
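A minimal sketch of this frame-extraction step is given below, assuming a local video file path; the one-second interval and the 960x540 target size are illustrative choices, not values fixed by the embodiment.

import cv2  # OpenCV, the open-source image/video processing tool used in this step

def extract_frames(video_path, interval_sec=1.0, size=(960, 540)):
    """Sample one frame every interval_sec seconds and resize all frames
    to the same dimensions so subtitle coordinates stay comparable."""
    cap = cv2.VideoCapture(video_path)       # capture the copyrighted video
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back to 25 fps if metadata is missing
    step = max(int(fps * interval_sec), 1)   # preset frame-extraction interval
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(cv2.resize(frame, size))  # unify sizes across resolutions
        idx += 1
    cap.release()
    return frames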
S202, extracting subtitle information from the multiple frames of video images using Baidu's open-source deep learning framework PaddlePaddle.
In a specific implementation process, the relevant information of the preset model can be set first, so that the subtitle information of the copyrighted video can be conveniently recognized and extracted. The specific setup procedure may include, but is not limited to, the following:
A. Set the model type and version.
B. Set the model detection algorithm.
C. Set whether to use the GPU.
D. Set whether to use space (null) characters.
E. Set whether angle_cls is used.
F. Set the path of the source video images.
It should be noted that the embodiment of the present invention does not limit the order of the above settings, which may be chosen according to actual needs.
In a specific implementation, the preset model may be the text recognition model ppocr.
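The settings A-F above map naturally onto the constructor of the open-source paddleocr package. A hedged sketch follows; the argument names (ocr_version, det_algorithm, use_gpu, use_space_char, use_angle_cls) come from recent public releases of paddleocr and may differ across versions, and the chosen values and the image file name are assumptions for illustration only.

from paddleocr import PaddleOCR  # pip install paddleocr

# A/B: model type, version and detection algorithm; C: GPU switch;
# D: space (null) characters; E: angle classification; F is passed per call.
ocr = PaddleOCR(
    ocr_version="PP-OCRv3",  # A: model type and version (assumed value)
    det_algorithm="DB",      # B: model detection algorithm (assumed value)
    use_gpu=False,           # C: whether to use the GPU
    use_space_char=True,     # D: whether to use space (null) characters
    use_angle_cls=True,      # E: whether angle_cls is used
    lang="ch",               # subtitles in this embodiment are Chinese
)
result = ocr.ocr("frame_0001.jpg", cls=True)  # F: path of a source video image (assumed name)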
In a specific implementation process, after the preset model's information has been set, the text detector can be instantiated according to the detection algorithm of the preset model, and the text recognizer according to its recognition mode. The text recognizer may be instantiated after or simultaneously with the text detector; the embodiment of the present invention does not limit this.
In a specific implementation process, after the text detector is instantiated, the multiple frames of video images can be processed by the text detector to obtain their text range vectors, which can be regarded as a group of vectors. These vectors may then be arranged in a preset order, for example from high to low by the ordinate of the text region range vector. The ordinate of a text region range vector corresponds to the order in which the video images appear in the copyrighted video, so sorting avoids misjudgments during subtitle comparison caused by out-of-order video images and improves the accuracy of the subsequent comparison.
In a specific implementation process, after the text range vectors of the multiple frames of video images have been arranged, they can be pruned to obtain pruned text range vectors, which simplifies the content features of the frames and facilitates the recognition and extraction of subtitle information. For example, the linear algebra library numpy may be used to compute the norm (modulus) of each text range vector, e.g. via np.linalg.norm. The text region range vectors of the multiple frames of video images are then pruned based on these norms, yielding the pruned text range vectors.
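A sketch of this pruning step with numpy is shown below, assuming each detected text box is the 4x2 corner-point array commonly returned by OCR detectors; the minimum-width filter and the top-to-bottom sort are one illustrative reading of the pruning and arranging described above.

import numpy as np  # the linear algebra library used for the norm computation

def prune_boxes(boxes, min_width=40.0):
    """Drop text boxes too small to hold a subtitle, measured by the norm
    (modulus) of the top-edge vector, then sort the rest top-to-bottom."""
    kept = []
    for box in boxes:  # box: 4x2 array of corner coordinates
        box = np.asarray(box, dtype=float)
        width = np.linalg.norm(box[1] - box[0])  # modulus of the top-edge vector
        if width >= min_width:  # prune regions below the size threshold
            kept.append(box)
    kept.sort(key=lambda b: b[:, 1].min())  # arrange by ordinate, i.e. order of appearance
    return kept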
In a specific implementation process, after the text recognizer is instantiated, the text within the pruned text range vectors can be recognized and extracted by the text recognizer to obtain the subtitle information of the multiple frames of video images. For example, based on the text recognizer, the text recognition model ppocr locates and recognizes the text within the pruned text range vectors and extracts it, yielding the subtitle information of the multiple frames of video images.
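Combining detection and recognition, the sketch below extracts one candidate subtitle string per frame; extract_frames and the ocr object come from the sketches above, and taking the text line nearest the bottom of the frame as the subtitle is an assumption (subtitles usually sit near the bottom edge), not a rule stated by the embodiment.

def frame_subtitles(frames, ocr):
    """Run OCR on each frame and keep the recognized text lowest in the
    frame, assumed here to be the subtitle line."""
    subtitles = []
    for frame in frames:
        result = ocr.ocr(frame, cls=True)  # detect + recognize in one call
        lines = result[0] if result and result[0] else []
        if not lines:
            continue
        # each item is (box, (text, confidence)); pick the box lowest in the frame
        _box, (text, _score) = max(lines, key=lambda item: max(p[1] for p in item[0]))
        subtitles.append(text)
    return subtitles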
S203, merging the subtitle information of the multiple frames of video images to obtain the first subtitle information of the copyrighted video.
In a specific implementation process, after extending the coordinate ranges of the text range vectors outward, the edit distance between any two adjacent pieces of subtitle information can be calculated. If the edit distance between two adjacent pieces is smaller than or equal to a preset threshold, only one of the two is retained; if it is greater than the preset threshold, both are retained. For example, with the preset threshold set to 0, an edit distance of 0 between two adjacent pieces indicates that they are identical, so one of them is retained; an edit distance greater than 0 indicates that they differ, so both are retained. Finally, all retained subtitle information is merged to obtain the first subtitle information of the copyrighted video.
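A sketch of this merging step follows. The plain-Python Levenshtein function stands in for whatever edit-distance routine the embodiment actually uses, the default threshold of 0 mirrors the example above, and comparing each line against the most recently retained line is a simplification of the adjacent-pair rule.

def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def merge_subtitles(subtitles, threshold=0):
    """Keep one copy of adjacent subtitle lines whose edit distance is at
    or below the threshold; keep both lines otherwise."""
    merged = []
    for text in subtitles:
        if merged and edit_distance(merged[-1], text) <= threshold:
            continue  # near-duplicate of the previous line: keep only one copy
        merged.append(text)
    return merged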
S102, capturing a crawler video according to a preset keyword, and extracting second subtitle information of the crawler video.
In a specific implementation process, after the crawler video has been captured according to the preset keyword, its second subtitle information may be extracted in the same or a similar manner as the first subtitle information of the copyrighted video; the details above apply and are not repeated here.
In a specific implementation process, the preset keywords may be set according to actual requirements, which is not limited by the embodiment of the present invention.
S103, comparing the second subtitle information of the crawler video with the first subtitle information of the copyrighted video to obtain a subtitle comparison result of the crawler video and the copyrighted video.
In some embodiments, the subtitle comparison result may be used to determine whether the crawler video has copyright infringement.
In some embodiments, a comparison threshold between the second subtitle information of the crawler video and the first subtitle information of the copyrighted video may be set. For example, repeated comparison tests that take factors such as subtitle extraction quality into account showed relatively high accuracy with a threshold of 5, so the preset threshold may be set to 5.
In some embodiments, the second subtitle information of each crawler video may be compared in turn against the first subtitle information of all copyrighted videos in the copyright library, i.e., the subtitles of each video brought back by the crawler are compared with those of every video in the library, yielding a subtitle comparison result for each crawler video.
For example, if the subtitle comparison result for a crawler video indicates that the number of subtitle lines matching between its second subtitle information and the first subtitle information of some copyrighted video exceeds the preset threshold, e.g. 5, that crawler video is deemed to infringe copyright.
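A sketch of this comparison loop is given below. Counting matching subtitle lines and flagging counts above a threshold of 5 follows the example in the text; treating exact string equality as a match, and the dict-based library layout, are assumptions for illustration.

def find_infringements(crawler_subs, copyright_library, threshold=5):
    """crawler_subs: subtitle lines of one crawled video.
    copyright_library: dict mapping copyrighted-video id -> subtitle lines.
    Returns ids of copyrighted videos matched above the threshold."""
    crawled = set(crawler_subs)
    suspects = []
    for video_id, subs in copyright_library.items():
        matches = sum(1 for line in subs if line in crawled)
        if matches > threshold:        # more matches than the preset threshold
            suspects.append(video_id)  # flag for manual review (step S104)
    return suspects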
In one applicable scenario of the embodiment of the present invention, as shown in fig. 1, the copyright monitoring method based on video subtitle comparison may further include the following step:
s104, confirming the correctness of the comparison result in a manual auditing mode.
In some embodiments, confirming the correctness of the comparison result by manual review further improves the accuracy of judging whether the crawler video infringes copyright.
As can be seen from the above description, the technical solution provided by the embodiments of the present invention monitors video copyright by comparing video subtitle information, which neatly avoids the complicated processing of video content features and the problem of their insufficient distinctiveness. Compared with prior-art schemes based on video content, and since different videos are unlikely to share the same subtitles, distinguishing videos by their subtitles simplifies the attribute features used for discrimination; only the quality of the extracted subtitles needs to be ensured.
As can be seen from the above description, the technical solution provided in the embodiment of the present invention may have, but is not limited to, the following features:
a. High discriminability: video subtitles distinguish different videos very clearly, so clear separation can be achieved when comparing subtitle information, which improves the accuracy of copyright monitoring.
b. Low technical threshold and low cost: the subtitle-extraction techniques and the algorithm models adopted by the method are very mature.
c. High efficiency: once the subtitle information of the videos has been extracted, the comparison itself is simple and efficient, which improves the efficiency of copyright monitoring.
d. High accuracy: compared with the hit rate of video content feature comparison, the hit rate of subtitle information comparison is high, which improves the accuracy of copyright monitoring.
Based on the same inventive concept, an embodiment of the present invention further provides a copyright monitoring system based on video subtitle comparison. As shown in fig. 3, the system may include:
a processing unit 301, configured to acquire a copyrighted video and extract first subtitle information of the copyrighted video, and to capture a crawler video according to a preset keyword and extract second subtitle information of the crawler video;
and a comparison unit 302, configured to compare the second subtitle information with the first subtitle information to obtain a subtitle comparison result of the crawler video and the copyrighted video, the subtitle comparison result being used for judging whether the crawler video infringes copyright.
In one possible design, the processing unit 301 is specifically configured to:
perform frame extraction on the copyrighted video using the open-source image/video processing tool OpenCV to obtain multiple frames of video images;
extract subtitle information from the multiple frames of video images using Baidu's open-source deep learning framework PaddlePaddle;
and merge the subtitle information of the multiple frames of video images to obtain the first subtitle information.
In one possible design, the processing unit 301 is specifically configured to:
capture the copyrighted video with cv2.VideoCapture;
perform frame extraction on the copyrighted video at a preset frame-extraction interval to obtain multiple frames of original video images;
and resize the multiple frames of original video images to the same dimensions to obtain the multiple frames of video images.
In one possible design, the processing unit 301 is specifically configured to:
instantiate a text detector according to the detection algorithm of a preset model;
instantiate a text recognizer according to the recognition mode of the preset model;
perform detection on the multiple frames of video images based on the text detector to obtain text range vectors of the multiple frames of video images;
arrange the text range vectors of the multiple frames of video images in a preset order;
prune the text range vectors of the multiple frames of video images to obtain pruned text range vectors;
and recognize and extract, based on the text recognizer, the text within the pruned text range vectors to obtain the subtitle information of the multiple frames of video images.
In one possible design, the processing unit 301 is specifically configured to:
compute the norms (moduli) of the text range vectors of the multiple frames of video images using the linear algebra library numpy;
and prune the text region range vectors of the multiple frames of video images based on these norms to obtain the pruned text range vectors.
In one possible design, the preset model is the text recognition model ppocr; the processing unit 301 is further configured to:
set the model type and version;
set the model detection algorithm;
set whether to use the GPU;
set whether to use space (null) characters;
set whether angle_cls is used;
and set the path of the source video images.
In one possible design, the processing unit 301 is specifically configured to:
based on the text recognizer, use the text recognition model ppocr to locate and recognize the text within the pruned text range vectors of the multiple frames of video images, and extract that text to obtain the subtitle information of the multiple frames of video images.
In one possible design, the processing unit 301 is specifically configured to:
extend the coordinate ranges of the text range vectors of the multiple frames of video images outward, and calculate the edit distance between any two adjacent pieces of subtitle information;
if the edit distance between two adjacent pieces of subtitle information is smaller than or equal to a preset threshold, retain only one of the two; or,
if the edit distance between two adjacent pieces of subtitle information is greater than the preset threshold, retain both;
and merge all retained subtitle information to obtain the first subtitle information.
It should be noted that the processing unit 301 and the comparison unit 302 may be integrated on the same device or disposed independently on different devices; the embodiment of the present invention does not limit this.
This embodiment is based on the same concept as the copyright monitoring method based on video subtitle comparison shown in fig. 1; given the foregoing detailed description of the method, those skilled in the art can clearly understand the implementation of the copyright monitoring system in this embodiment, so for brevity it is not described again here.
Based on the same inventive concept, an embodiment of the present invention further provides a computer-readable storage medium that stores at least one program; when the at least one program is executed by a processor, the copyright monitoring method based on video subtitle comparison shown in fig. 1 is implemented.
It should be appreciated that a computer readable storage medium is any data storage device that can store data or a program, which can thereafter be read by a computer system. Examples of the computer readable storage medium include: read-only memory, random access memory, CD-ROM, HDD, DVD, magnetic tape, optical data storage devices, and the like.
The computer readable storage medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
Program code embodied on a computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wireline, optical fiber cable, radio frequency (RF), or the like, or any suitable combination of the foregoing.
The above examples illustrate only a few embodiments of the invention; they are described in detail but are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention.

Claims (8)

1. A copyright monitoring method based on video subtitle comparison is characterized by comprising the following steps:
acquiring a copyrighted video and extracting first subtitle information of the copyrighted video, comprising: performing frame extraction on the copyrighted video using the open-source image/video processing tool OpenCV to obtain multiple frames of video images; extracting subtitle information from the multiple frames of video images using Baidu's open-source deep learning framework PaddlePaddle; and merging the subtitle information of the multiple frames of video images to obtain the first subtitle information;
capturing a crawler video according to a preset keyword, and extracting second subtitle information of the crawler video;
comparing the second subtitle information with the first subtitle information to obtain a subtitle comparison result of the crawler video and the copyrighted video, wherein the subtitle comparison result is used for judging whether the crawler video infringes copyright;
the extracting the caption information of the video images of the plurality of frames by adopting a hundred-degree open-source deep learning framework PaddlePaddle comprises the following steps: instantiating a text detector according to a detection algorithm of a preset model; instantiating a text recognizer according to a recognition mode of the preset model; based on the text detector, performing recognition processing on a plurality of frames of the video images to obtain text range vectors of the plurality of frames of the video images; arranging text range vectors of a plurality of frames of video images according to a preset mode; clipping the text range vector of the multi-frame video image to obtain a clipped text range vector of the multi-frame video image; and based on the text identifier, identifying and extracting texts in the text range vectors of the multiple frames of the video images after trimming, and obtaining caption information of the multiple frames of the video images.
2. The method of claim 1, wherein performing frame extraction on the copyrighted video using the open-source image/video processing tool OpenCV to obtain multiple frames of video images comprises:
capturing the copyrighted video with cv2.VideoCapture;
performing frame extraction on the copyrighted video at a preset frame-extraction interval to obtain multiple frames of original video images;
and resizing the multiple frames of original video images to the same dimensions to obtain the multiple frames of video images.
3. The method of claim 1, wherein pruning the text range vectors of the multiple frames of video images to obtain the pruned text range vectors comprises:
computing the norms (moduli) of the text range vectors of the multiple frames of video images using the linear algebra library numpy;
and pruning the text region range vectors of the multiple frames of video images based on these norms to obtain the pruned text range vectors.
4. The method of claim 1, wherein the preset model is the text recognition model ppocr;
before instantiating the text detector according to the detection algorithm of the preset model and instantiating the text recognizer according to the recognition mode of the preset model, the method further comprises:
setting the model type and version;
setting the model detection algorithm;
setting whether to use the GPU;
setting whether to use space (null) characters;
setting whether angle_cls is used;
and setting the path of the source video images.
5. The method of claim 4, wherein recognizing and extracting, based on the text recognizer, the text within the text range vectors of the multiple frames of video images to obtain the first subtitle information comprises:
using the text recognition model ppocr, based on the text recognizer, to locate and recognize the text within the pruned text range vectors of the multiple frames of video images, and extracting that text to obtain the subtitle information of the multiple frames of video images.
6. The method of claim 1, wherein merging the subtitle information of the multiple frames of video images to obtain the first subtitle information comprises:
extending the coordinate ranges of the text range vectors of the multiple frames of video images outward, and calculating the edit distance between any two adjacent pieces of subtitle information;
if the edit distance between two adjacent pieces of subtitle information is smaller than or equal to a preset threshold, retaining only one of the two; or,
if the edit distance between two adjacent pieces of subtitle information is greater than the preset threshold, retaining both;
and merging all retained subtitle information to obtain the first subtitle information.
7. A copyright monitoring system based on video subtitle comparison, characterized by comprising:
a processing unit, configured to acquire a copyrighted video and extract first subtitle information of the copyrighted video, comprising: performing frame extraction on the copyrighted video using the open-source image/video processing tool OpenCV to obtain multiple frames of video images; extracting subtitle information from the multiple frames of video images using Baidu's open-source deep learning framework PaddlePaddle; and merging the subtitle information of the multiple frames of video images to obtain the first subtitle information; and further configured to capture a crawler video according to a preset keyword and extract second subtitle information of the crawler video; wherein extracting subtitle information from the multiple frames of video images using Baidu's open-source deep learning framework PaddlePaddle comprises: instantiating a text detector according to the detection algorithm of a preset model; instantiating a text recognizer according to the recognition mode of the preset model; performing detection on the multiple frames of video images based on the text detector to obtain text range vectors of the multiple frames of video images; arranging the text range vectors in a preset order; pruning the text range vectors to obtain pruned text range vectors; and recognizing and extracting, based on the text recognizer, the text within the pruned text range vectors to obtain the subtitle information of the multiple frames of video images;
and a comparison unit, configured to compare the second subtitle information with the first subtitle information to obtain a subtitle comparison result of the crawler video and the copyrighted video, the subtitle comparison result being used for judging whether the crawler video infringes copyright.
8. A computer-readable storage medium, wherein the computer-readable storage medium stores at least one program; when the at least one program is executed by a processor, the method according to any one of claims 1-6 is performed.
CN202111328854.9A 2021-11-10 2021-11-10 Copyright monitoring method and system based on video subtitle comparison Active CN114051163B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111328854.9A CN114051163B (en) 2021-11-10 2021-11-10 Copyright monitoring method and system based on video subtitle comparison

Publications (2)

Publication Number Publication Date
CN114051163A (en) 2022-02-15
CN114051163B (en) 2024-03-22

Family

ID=80208258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111328854.9A Active CN114051163B (en) 2021-11-10 2021-11-10 Copyright monitoring method and system based on video subtitle comparison

Country Status (1)

Country Link
CN (1) CN114051163B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101453333A (en) * 2008-10-16 2009-06-10 北京光线传媒有限公司 Copyright recognition method, apparatus and system for media file
CN104143055A (en) * 2014-08-16 2014-11-12 合一网络技术(北京)有限公司 Pirated video monitoring method and system
CN107529068A (en) * 2016-06-21 2017-12-29 北京新岸线网络技术有限公司 Video content discrimination method and system
CN108881947A (en) * 2017-05-15 2018-11-23 阿里巴巴集团控股有限公司 A kind of infringement detection method and device of live stream
CN111539929A (en) * 2020-04-21 2020-08-14 北京奇艺世纪科技有限公司 Copyright detection method and device and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8717499B2 (en) * 2011-09-02 2014-05-06 Dialogic Corporation Audio video offset detector

Also Published As

Publication number Publication date
CN114051163A (en) 2022-02-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant