CN114051163B - Copyright monitoring method and system based on video subtitle comparison


Info

Publication number
CN114051163B
CN114051163B
Authority
CN
China
Prior art keywords
video
text
frames
caption information
video images
Prior art date
Legal status
Active
Application number
CN202111328854.9A
Other languages
Chinese (zh)
Other versions
CN114051163A (en)
Inventor
何平涛
陈雷勇
赵善为
张奕良
李天水
Current Assignee
Guangdong Electroshock Media Technology Co ltd
Original Assignee
Guangdong Electroshock Media Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Electroshock Media Technology Co., Ltd.
Priority to CN202111328854.9A
Publication of CN114051163A
Application granted
Publication of CN114051163B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/47: End-user applications
    • H04N21/488: Data services, e.g. news ticker
    • H04N21/4884: Data services for displaying subtitles
    • H04N21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83: Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/835: Generation of protective data, e.g. certificates
    • H04N21/8355: Generation of protective data involving usage data, e.g. number of copies or viewings allowed

Abstract

The invention discloses a copyright monitoring method and system based on video subtitle comparison, characterized by lower cost and higher accuracy and efficiency of copyright monitoring. The method comprises the following steps: acquiring a copyrighted video and extracting first subtitle information of the copyrighted video; capturing a crawler video according to a preset keyword and extracting second subtitle information of the crawler video; and comparing the second subtitle information with the first subtitle information to obtain a subtitle comparison result of the crawler video and the copyrighted video, wherein the subtitle comparison result is used for judging whether the crawler video infringes copyright.

Description

Copyright monitoring method and system based on video subtitle comparison
Technical Field
The invention relates to the technical field of video subtitle extraction, in particular to a copyright monitoring method and system based on video subtitle comparison.
Background
Currently, video copyright monitoring in the industry is generally implemented through video content comparison. For example, a video DNA (also called a video fingerprint) is extracted from the original video content, another video DNA is extracted from the suspected pirated content, the two are compared, and if their similarity reaches a certain threshold, infringement is determined.
However, schemes based on video content comparison have the following problems:
1. The technical threshold for extracting video DNA is high, and so is the cost.
2. The content features of different videos are not very distinctive, so clear separation cannot be achieved during comparison, resulting in poor monitoring accuracy.
3. The comparison threshold is hard to set well: if it is high, the low distinctiveness of content features leads to a low hit rate; if it is low, a large number of candidate videos still have to be checked manually one by one after comparison and screening, so monitoring efficiency is low.
Therefore, a copyright monitoring scheme with lower cost and higher accuracy and efficiency is needed.
Disclosure of Invention
In view of the high cost, low accuracy and low efficiency of copyright monitoring in the prior art, the invention aims to provide a copyright monitoring method and system based on video subtitle comparison that offer lower cost and higher accuracy and efficiency of copyright monitoring.
In a first aspect, an embodiment of the present invention provides a method for monitoring copyright based on video subtitle comparison, including:
acquiring a copyrighted video and extracting first subtitle information of the copyrighted video;
capturing a crawler video according to a preset keyword, and extracting second subtitle information of the crawler video;
and comparing the second subtitle information with the first subtitle information to obtain a subtitle comparison result of the crawler video and the copyright video, wherein the subtitle comparison result is used for judging whether copyright infringement exists in the crawler video.
In one possible design, acquiring a copyrighted video and extracting first subtitle information of the copyrighted video includes:
performing frame extraction on the copyrighted video using the open-source image/video processing tool OpenCV to obtain multiple frames of video images;
extracting subtitle information from the multiple frames of video images using Baidu's open-source deep learning framework PaddlePaddle;
and merging the subtitle information of the multiple frames of video images to obtain the first subtitle information.
In one possible design, performing frame extraction on the copyrighted video using OpenCV to obtain multiple frames of video images includes:
capturing the copyrighted video with cv2.VideoCapture;
performing frame extraction on the copyrighted video at a preset frame-extraction interval to obtain multiple frames of original video images;
and resizing the multiple frames of original video images to the same dimensions to obtain the multiple frames of video images.
In one possible design, extracting subtitle information from the multiple frames of video images using Baidu's open-source deep learning framework PaddlePaddle includes:
instantiating a text detector according to the detection algorithm of a preset model;
instantiating a text recognizer according to the recognition mode of the preset model;
performing detection on the multiple frames of video images based on the text detector to obtain text range vectors of the multiple frames of video images;
arranging the text range vectors of the multiple frames of video images in a preset order;
pruning the text range vectors of the multiple frames of video images to obtain pruned text range vectors;
and recognizing and extracting, based on the text recognizer, the text within the pruned text range vectors to obtain the subtitle information of the multiple frames of video images.
In one possible design, pruning the text range vectors of the multiple frames of video images to obtain the pruned text range vectors includes:
computing the norms (moduli) of the text range vectors of the multiple frames of video images using the linear algebra library numpy;
and pruning the text region range vectors of the multiple frames of video images based on these norms to obtain the pruned text range vectors.
In one possible design, the preset model is the text recognition model ppocr;
before instantiating the text detector according to the detection algorithm of the preset model and instantiating the text recognizer according to the recognition mode of the preset model, the method further comprises:
setting the model type and version;
setting the model detection algorithm;
setting whether to use the GPU;
setting whether to use space (null) characters;
setting whether angle_cls is used;
and setting the path of the source video images.
In one possible design, recognizing and extracting, based on the text recognizer, the text within the pruned text range vectors of the multiple frames of video images to obtain the first subtitle information includes:
using the text recognition model ppocr, based on the text recognizer, to locate and recognize the text within the pruned text range vectors of the multiple frames of video images, and extracting that text to obtain the subtitle information of the multiple frames of video images.
In one possible design, merging the subtitle information of the multiple frames of video images to obtain the first subtitle information includes:
extending the coordinate ranges of the text range vectors of the multiple frames of video images outward, and calculating the edit distance between any two adjacent pieces of subtitle information;
if the edit distance between two adjacent pieces of subtitle information is smaller than or equal to a preset threshold, retaining only one of the two; or,
if the edit distance between two adjacent pieces of subtitle information is greater than the preset threshold, retaining both;
and merging all retained subtitle information to obtain the first subtitle information.
In a second aspect, an embodiment of the present invention further provides a copyright monitoring system based on video subtitle comparison, including:
a processing unit, configured to acquire a copyrighted video and extract first subtitle information of the copyrighted video, and to capture a crawler video according to a preset keyword and extract second subtitle information of the crawler video;
and a comparison unit, configured to compare the second subtitle information with the first subtitle information to obtain a subtitle comparison result of the crawler video and the copyrighted video, the subtitle comparison result being used for judging whether the crawler video infringes copyright.
In one possible design, the processing unit is specifically configured to:
perform frame extraction on the copyrighted video using the open-source image/video processing tool OpenCV to obtain multiple frames of video images;
extract subtitle information from the multiple frames of video images using Baidu's open-source deep learning framework PaddlePaddle;
and merge the subtitle information of the multiple frames of video images to obtain the first subtitle information.
In one possible design, the processing unit is specifically configured to:
capture the copyrighted video with cv2.VideoCapture;
perform frame extraction on the copyrighted video at a preset frame-extraction interval to obtain multiple frames of original video images;
and resize the multiple frames of original video images to the same dimensions to obtain the multiple frames of video images.
In one possible design, the processing unit is specifically configured to:
instantiate a text detector according to the detection algorithm of a preset model;
instantiate a text recognizer according to the recognition mode of the preset model;
perform detection on the multiple frames of video images based on the text detector to obtain text range vectors of the multiple frames of video images;
arrange the text range vectors of the multiple frames of video images in a preset order;
prune the text range vectors of the multiple frames of video images to obtain pruned text range vectors;
and recognize and extract, based on the text recognizer, the text within the pruned text range vectors to obtain the subtitle information of the multiple frames of video images.
In one possible design, the processing unit is specifically configured to:
compute the norms (moduli) of the text range vectors of the multiple frames of video images using the linear algebra library numpy;
and prune the text region range vectors of the multiple frames of video images based on these norms to obtain the pruned text range vectors.
In one possible design, the preset model is the text recognition model ppocr; the processing unit is further configured to:
set the model type and version;
set the model detection algorithm;
set whether to use the GPU;
set whether to use space (null) characters;
set whether angle_cls is used;
and set the path of the source video images.
In one possible design, the processing unit is specifically configured to:
based on the text recognizer, use the text recognition model ppocr to locate and recognize the text within the pruned text range vectors of the multiple frames of video images, and extract that text to obtain the subtitle information of the multiple frames of video images.
In one possible design, the processing unit is specifically configured to:
extend the coordinate ranges of the text range vectors of the multiple frames of video images outward, and calculate the edit distance between any two adjacent pieces of subtitle information;
if the edit distance between two adjacent pieces of subtitle information is smaller than or equal to a preset threshold, retain only one of the two; or,
if the edit distance between two adjacent pieces of subtitle information is greater than the preset threshold, retain both;
and merge all retained subtitle information to obtain the first subtitle information.
In a third aspect, embodiments of the present invention further provide a computer-readable storage medium storing at least one program; when the at least one program is executed by a processor, the method according to any one of the possible designs described above is implemented.
The technical scheme provided by the embodiment of the invention has the following beneficial technical effects:
a. High discriminability: video subtitles distinguish different videos very clearly, so clear separation can be achieved when comparing subtitle information, which improves the accuracy of copyright monitoring.
b. Low technical threshold and low cost: the subtitle-extraction techniques and the algorithm models adopted by the method are very mature.
c. High efficiency: once the subtitle information of the videos has been extracted, the comparison itself is simple and efficient, which improves the efficiency of copyright monitoring.
d. High accuracy: compared with the hit rate of video content feature comparison, the hit rate of subtitle information comparison is high, which improves the accuracy of copyright monitoring.
Drawings
Fig. 1 is a flow chart of a copyright monitoring method based on video subtitle comparison according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a process of executing step S101 according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an architecture of a copyright monitoring system based on video subtitle comparison according to an embodiment of the present invention.
Detailed Description
Orientation terms such as up, down, left, right, front, rear, top and bottom mentioned or possibly mentioned in this specification are defined with respect to the described constructions and are relative concepts; they may therefore change with the position and use state of the device. These and other directional terms should not be construed as limiting.
The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of implementations consistent with aspects of the present disclosure.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
The shapes and sizes of the individual components in the drawings do not reflect true proportions, and are intended to illustrate only implementations described in the following exemplary examples.
Before describing the embodiments of the present invention, some terms related to the embodiments of the present invention are first explained so as to facilitate understanding of technical solutions provided by the embodiments of the present invention by those skilled in the art.
PaddlePaddle, referred to in the embodiments of the present invention, is Baidu's open-source deep learning framework.
PaddleCV, referred to in the embodiments of the present invention, is Baidu's open-source computer vision model library.
ppocr, referred to in the embodiments of the present invention, is a character recognition model and toolkit built on Baidu's PaddlePaddle deep learning framework.
OpenCV, referred to in the embodiments of the present invention, is an open-source computer vision library that can perform frame extraction on images and videos to obtain feature vectors.
numpy, referred to in the embodiments of the present invention, is an open-source linear algebra library that can perform various operations on vectors.
pandas, referred to in the embodiments of the present invention, is a tabular data processing tool built on numpy that conveniently supports batch processing of data.
For better understanding and implementation, the technical solutions provided by the embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Referring to fig. 1, a flow chart of a video subtitle comparison-based copyright monitoring method is provided in an embodiment of the present invention. As shown in fig. 1, the method may include the steps of:
s101, acquiring a copyrighted video and extracting first subtitle information of the copyrighted video.
In some embodiments, the process of implementing step S101 may include the steps of:
s201, performing frame extraction processing on the copyrighted video by using an open source picture video processing tool OpenCV to obtain multi-frame video images.
In a specific implementation, cv2.VideoCapture may be used to capture the copyrighted video. Frame extraction is then performed on the copyrighted video at a preset frame-extraction interval to obtain multiple frames of original video images. The original frames can then be resized to the same dimensions, so that multiple uniformly sized video images are obtained. It can be understood that adjusting video images of different resolutions to the same size fixes the coordinate position of the subtitles, so that subtitles can be located in video images of various specifications.
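A minimal sketch of this frame-extraction step is given below, assuming a local video file path; the one-second interval and the 960x540 target size are illustrative choices, not values fixed by the embodiment.

import cv2  # OpenCV, the open-source image/video processing tool used in this step

def extract_frames(video_path, interval_sec=1.0, size=(960, 540)):
    """Sample one frame every interval_sec seconds and resize all frames
    to the same dimensions so subtitle coordinates stay comparable."""
    cap = cv2.VideoCapture(video_path)       # capture the copyrighted video
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back to 25 fps if metadata is missing
    step = max(int(fps * interval_sec), 1)   # preset frame-extraction interval
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(cv2.resize(frame, size))  # unify sizes across resolutions
        idx += 1
    cap.release()
    return frames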
S202, extracting subtitle information from the multiple frames of video images using Baidu's open-source deep learning framework PaddlePaddle.
In a specific implementation process, the relevant information of the preset model can be set first, so that the subtitle information of the copyrighted video can be conveniently recognized and extracted. The specific setup procedure may include, but is not limited to, the following:
A. Set the model type and version.
B. Set the model detection algorithm.
C. Set whether to use the GPU.
D. Set whether to use space (null) characters.
E. Set whether angle_cls is used.
F. Set the path of the source video images.
It should be noted that the embodiment of the present invention does not limit the order of the above settings, which may be chosen according to actual needs.
In a specific implementation, the preset model may be the text recognition model ppocr.
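The settings A-F above map naturally onto the constructor of the open-source paddleocr package. A hedged sketch follows; the argument names (ocr_version, det_algorithm, use_gpu, use_space_char, use_angle_cls) come from recent public releases of paddleocr and may differ across versions, and the chosen values and the image file name are assumptions for illustration only.

from paddleocr import PaddleOCR  # pip install paddleocr

# A/B: model type, version and detection algorithm; C: GPU switch;
# D: space (null) characters; E: angle classification; F is passed per call.
ocr = PaddleOCR(
    ocr_version="PP-OCRv3",  # A: model type and version (assumed value)
    det_algorithm="DB",      # B: model detection algorithm (assumed value)
    use_gpu=False,           # C: whether to use the GPU
    use_space_char=True,     # D: whether to use space (null) characters
    use_angle_cls=True,      # E: whether angle_cls is used
    lang="ch",               # subtitles in this embodiment are Chinese
)
result = ocr.ocr("frame_0001.jpg", cls=True)  # F: path of a source video image (assumed name)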
In a specific implementation process, after the preset model's information has been set, the text detector can be instantiated according to the detection algorithm of the preset model, and the text recognizer according to its recognition mode. The text recognizer may be instantiated after or simultaneously with the text detector; the embodiment of the present invention does not limit this.
In a specific implementation process, after the text detector is instantiated, the multiple frames of video images can be processed by the text detector to obtain their text range vectors, which can be regarded as a group of vectors. These vectors may then be arranged in a preset order, for example from high to low by the ordinate of the text region range vector. The ordinate of a text region range vector corresponds to the order in which the video images appear in the copyrighted video, so sorting avoids misjudgments during subtitle comparison caused by out-of-order video images and improves the accuracy of the subsequent comparison.
In a specific implementation process, after the text range vectors of the multiple frames of video images have been arranged, they can be pruned to obtain pruned text range vectors, which simplifies the content features of the frames and facilitates the recognition and extraction of subtitle information. For example, the linear algebra library numpy may be used to compute the norm (modulus) of each text range vector, e.g. via np.linalg.norm. The text region range vectors of the multiple frames of video images are then pruned based on these norms, yielding the pruned text range vectors.
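A sketch of this pruning step with numpy is shown below, assuming each detected text box is the 4x2 corner-point array commonly returned by OCR detectors; the minimum-width filter and the top-to-bottom sort are one illustrative reading of the pruning and arranging described above.

import numpy as np  # the linear algebra library used for the norm computation

def prune_boxes(boxes, min_width=40.0):
    """Drop text boxes too small to hold a subtitle, measured by the norm
    (modulus) of the top-edge vector, then sort the rest top-to-bottom."""
    kept = []
    for box in boxes:  # box: 4x2 array of corner coordinates
        box = np.asarray(box, dtype=float)
        width = np.linalg.norm(box[1] - box[0])  # modulus of the top-edge vector
        if width >= min_width:  # prune regions below the size threshold
            kept.append(box)
    kept.sort(key=lambda b: b[:, 1].min())  # arrange by ordinate, i.e. order of appearance
    return kept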
In a specific implementation process, after the text recognizer is instantiated, the text within the pruned text range vectors can be recognized and extracted by the text recognizer to obtain the subtitle information of the multiple frames of video images. For example, based on the text recognizer, the text recognition model ppocr locates and recognizes the text within the pruned text range vectors and extracts it, yielding the subtitle information of the multiple frames of video images.
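Combining detection and recognition, the sketch below extracts one candidate subtitle string per frame; extract_frames and the ocr object come from the sketches above, and taking the text line nearest the bottom of the frame as the subtitle is an assumption (subtitles usually sit near the bottom edge), not a rule stated by the embodiment.

def frame_subtitles(frames, ocr):
    """Run OCR on each frame and keep the recognized text lowest in the
    frame, assumed here to be the subtitle line."""
    subtitles = []
    for frame in frames:
        result = ocr.ocr(frame, cls=True)  # detect + recognize in one call
        lines = result[0] if result and result[0] else []
        if not lines:
            continue
        # each item is (box, (text, confidence)); pick the box lowest in the frame
        _box, (text, _score) = max(lines, key=lambda item: max(p[1] for p in item[0]))
        subtitles.append(text)
    return subtitles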
S203, merging the subtitle information of the multiple frames of video images to obtain the first subtitle information of the copyrighted video.
In a specific implementation process, after extending the coordinate ranges of the text range vectors outward, the edit distance between any two adjacent pieces of subtitle information can be calculated. If the edit distance between two adjacent pieces is smaller than or equal to a preset threshold, only one of the two is retained; if it is greater than the preset threshold, both are retained. For example, with the preset threshold set to 0, an edit distance of 0 between two adjacent pieces indicates that they are identical, so one of them is retained; an edit distance greater than 0 indicates that they differ, so both are retained. Finally, all retained subtitle information is merged to obtain the first subtitle information of the copyrighted video.
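A sketch of this merging step follows. The plain-Python Levenshtein function stands in for whatever edit-distance routine the embodiment actually uses, the default threshold of 0 mirrors the example above, and comparing each line against the most recently retained line is a simplification of the adjacent-pair rule.

def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def merge_subtitles(subtitles, threshold=0):
    """Keep one copy of adjacent subtitle lines whose edit distance is at
    or below the threshold; keep both lines otherwise."""
    merged = []
    for text in subtitles:
        if merged and edit_distance(merged[-1], text) <= threshold:
            continue  # near-duplicate of the previous line: keep only one copy
        merged.append(text)
    return merged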
S102, capturing a crawler video according to a preset keyword, and extracting second subtitle information of the crawler video.
In a specific implementation process, after the crawler video has been captured according to the preset keyword, its second subtitle information may be extracted in the same or a similar manner as the first subtitle information of the copyrighted video; the details above apply and are not repeated here.
In a specific implementation process, the preset keywords may be set according to actual requirements, which is not limited by the embodiment of the present invention.
S103, comparing the second subtitle information of the crawler video with the first subtitle information of the copyrighted video to obtain a subtitle comparison result of the crawler video and the copyrighted video.
In some embodiments, the subtitle comparison result may be used to determine whether the crawler video has copyright infringement.
In some embodiments, a comparison threshold between the second subtitle information of the crawler video and the first subtitle information of the copyrighted video may be set. For example, repeated comparison tests that take factors such as subtitle extraction quality into account showed relatively high accuracy with a threshold of 5, so the preset threshold may be set to 5.
In some embodiments, the second subtitle information of each crawler video may be compared in turn against the first subtitle information of all copyrighted videos in the copyright library, i.e., the subtitles of each video brought back by the crawler are compared with those of every video in the library, yielding a subtitle comparison result for each crawler video.
For example, if the subtitle comparison result for a crawler video indicates that the number of subtitle lines matching between its second subtitle information and the first subtitle information of some copyrighted video exceeds the preset threshold, e.g. 5, that crawler video is deemed to infringe copyright.
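A sketch of this comparison loop is given below. Counting matching subtitle lines and flagging counts above a threshold of 5 follows the example in the text; treating exact string equality as a match, and the dict-based library layout, are assumptions for illustration.

def find_infringements(crawler_subs, copyright_library, threshold=5):
    """crawler_subs: subtitle lines of one crawled video.
    copyright_library: dict mapping copyrighted-video id -> subtitle lines.
    Returns ids of copyrighted videos matched above the threshold."""
    crawled = set(crawler_subs)
    suspects = []
    for video_id, subs in copyright_library.items():
        matches = sum(1 for line in subs if line in crawled)
        if matches > threshold:        # more matches than the preset threshold
            suspects.append(video_id)  # flag for manual review (step S104)
    return suspects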
In one applicable scenario of the embodiment of the present invention, as shown in fig. 1, the copyright monitoring method based on video subtitle comparison may further include the following step:
s104, confirming the correctness of the comparison result in a manual auditing mode.
In some embodiments, confirming the correctness of the comparison result by manual review further improves the accuracy of judging whether the crawler video infringes copyright.
As can be seen from the above description, the technical solution provided by the embodiments of the present invention monitors video copyright by comparing video subtitle information, which neatly avoids the complicated processing of video content features and the problem of their insufficient distinctiveness. Compared with prior-art schemes based on video content, and since different videos are unlikely to share the same subtitles, distinguishing videos by their subtitles simplifies the attribute features used for discrimination; only the quality of the extracted subtitles needs to be ensured.
As can be seen from the above description, the technical solution provided in the embodiment of the present invention may have, but is not limited to, the following features:
a. High discriminability: video subtitles distinguish different videos very clearly, so clear separation can be achieved when comparing subtitle information, which improves the accuracy of copyright monitoring.
b. Low technical threshold and low cost: the subtitle-extraction techniques and the algorithm models adopted by the method are very mature.
c. High efficiency: once the subtitle information of the videos has been extracted, the comparison itself is simple and efficient, which improves the efficiency of copyright monitoring.
d. High accuracy: compared with the hit rate of video content feature comparison, the hit rate of subtitle information comparison is high, which improves the accuracy of copyright monitoring.
Based on the same inventive concept, an embodiment of the present invention further provides a copyright monitoring system based on video subtitle comparison. As shown in fig. 3, the system may include:
a processing unit 301, configured to acquire a copyrighted video and extract first subtitle information of the copyrighted video, and to capture a crawler video according to a preset keyword and extract second subtitle information of the crawler video;
and a comparison unit 302, configured to compare the second subtitle information with the first subtitle information to obtain a subtitle comparison result of the crawler video and the copyrighted video, the subtitle comparison result being used for judging whether the crawler video infringes copyright.
In one possible design, the processing unit 301 is specifically configured to:
perform frame extraction on the copyrighted video using the open-source image/video processing tool OpenCV to obtain multiple frames of video images;
extract subtitle information from the multiple frames of video images using Baidu's open-source deep learning framework PaddlePaddle;
and merge the subtitle information of the multiple frames of video images to obtain the first subtitle information.
In one possible design, the processing unit 301 is specifically configured to:
capture the copyrighted video with cv2.VideoCapture;
perform frame extraction on the copyrighted video at a preset frame-extraction interval to obtain multiple frames of original video images;
and resize the multiple frames of original video images to the same dimensions to obtain the multiple frames of video images.
In one possible design, the processing unit 301 is specifically configured to:
instantiate a text detector according to the detection algorithm of a preset model;
instantiate a text recognizer according to the recognition mode of the preset model;
perform detection on the multiple frames of video images based on the text detector to obtain text range vectors of the multiple frames of video images;
arrange the text range vectors of the multiple frames of video images in a preset order;
prune the text range vectors of the multiple frames of video images to obtain pruned text range vectors;
and recognize and extract, based on the text recognizer, the text within the pruned text range vectors to obtain the subtitle information of the multiple frames of video images.
In one possible design, the processing unit 301 is specifically configured to:
compute the norms (moduli) of the text range vectors of the multiple frames of video images using the linear algebra library numpy;
and prune the text region range vectors of the multiple frames of video images based on these norms to obtain the pruned text range vectors.
In one possible design, the preset model is the text recognition model ppocr; the processing unit 301 is further configured to:
set the model type and version;
set the model detection algorithm;
set whether to use the GPU;
set whether to use space (null) characters;
set whether angle_cls is used;
and set the path of the source video images.
In one possible design, the processing unit 301 is specifically configured to:
based on the text recognizer, use the text recognition model ppocr to locate and recognize the text within the pruned text range vectors of the multiple frames of video images, and extract that text to obtain the subtitle information of the multiple frames of video images.
In one possible design, the processing unit 301 is specifically configured to:
extend the coordinate ranges of the text range vectors of the multiple frames of video images outward, and calculate the edit distance between any two adjacent pieces of subtitle information;
if the edit distance between two adjacent pieces of subtitle information is smaller than or equal to a preset threshold, retain only one of the two; or,
if the edit distance between two adjacent pieces of subtitle information is greater than the preset threshold, retain both;
and merge all retained subtitle information to obtain the first subtitle information.
It should be noted that the processing unit 301 and the comparison unit 302 may be integrated on the same device or disposed independently on different devices; the embodiment of the present invention does not limit this.
This embodiment is based on the same concept as the copyright monitoring method based on video subtitle comparison shown in fig. 1; given the foregoing detailed description of the method, those skilled in the art can clearly understand the implementation of the copyright monitoring system in this embodiment, so for brevity it is not described again here.
Based on the same inventive concept, an embodiment of the present invention further provides a computer-readable storage medium that stores at least one program; when the at least one program is executed by a processor, the copyright monitoring method based on video subtitle comparison shown in fig. 1 is implemented.
It should be appreciated that a computer readable storage medium is any data storage device that can store data or a program, which can thereafter be read by a computer system. Examples of the computer readable storage medium include: read-only memory, random access memory, CD-ROM, HDD, DVD, magnetic tape, optical data storage devices, and the like.
The computer readable storage medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
Program code embodied on a computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wireline, optical fiber cable, radio frequency (RF), or the like, or any suitable combination of the foregoing.
The above examples illustrate only a few embodiments of the invention; they are described in detail but are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention.

Claims (8)

1. A copyright monitoring method based on video subtitle comparison is characterized by comprising the following steps:
acquiring a copyrighted video and extracting first subtitle information of the copyrighted video, comprising: performing frame extraction on the copyrighted video using the open-source image/video processing tool OpenCV to obtain multiple frames of video images; extracting subtitle information from the multiple frames of video images using Baidu's open-source deep learning framework PaddlePaddle; and merging the subtitle information of the multiple frames of video images to obtain the first subtitle information;
capturing a crawler video according to a preset keyword, and extracting second subtitle information of the crawler video;
comparing the second subtitle information with the first subtitle information to obtain a subtitle comparison result of the crawler video and the copyrighted video, wherein the subtitle comparison result is used for judging whether the crawler video infringes copyright;
the extracting the caption information of the video images of the plurality of frames by adopting a hundred-degree open-source deep learning framework PaddlePaddle comprises the following steps: instantiating a text detector according to a detection algorithm of a preset model; instantiating a text recognizer according to a recognition mode of the preset model; based on the text detector, performing recognition processing on a plurality of frames of the video images to obtain text range vectors of the plurality of frames of the video images; arranging text range vectors of a plurality of frames of video images according to a preset mode; clipping the text range vector of the multi-frame video image to obtain a clipped text range vector of the multi-frame video image; and based on the text identifier, identifying and extracting texts in the text range vectors of the multiple frames of the video images after trimming, and obtaining caption information of the multiple frames of the video images.
2. The method of claim 1, wherein performing frame extraction on the copyrighted video using the open-source image/video processing tool OpenCV to obtain multiple frames of video images comprises:
capturing the copyrighted video with cv2.VideoCapture;
performing frame extraction on the copyrighted video at a preset frame-extraction interval to obtain multiple frames of original video images;
and resizing the multiple frames of original video images to the same dimensions to obtain the multiple frames of video images.
3. The method of claim 1, wherein pruning the text range vectors of the multiple frames of video images to obtain the pruned text range vectors comprises:
computing the norms (moduli) of the text range vectors of the multiple frames of video images using the linear algebra library numpy;
and pruning the text region range vectors of the multiple frames of video images based on these norms to obtain the pruned text range vectors.
4. The method of claim 1, wherein the preset model is the text recognition model ppocr;
before instantiating the text detector according to the detection algorithm of the preset model and instantiating the text recognizer according to the recognition mode of the preset model, the method further comprises:
setting the model type and version;
setting the model detection algorithm;
setting whether to use the GPU;
setting whether to use space (null) characters;
setting whether angle_cls is used;
and setting the path of the source video images.
5. The method of claim 4, wherein recognizing and extracting, based on the text recognizer, the text within the text range vectors of the multiple frames of video images to obtain the first subtitle information comprises:
using the text recognition model ppocr, based on the text recognizer, to locate and recognize the text within the pruned text range vectors of the multiple frames of video images, and extracting that text to obtain the subtitle information of the multiple frames of video images.
6. The method of claim 1, wherein merging the subtitle information of the multiple frames of video images to obtain the first subtitle information comprises:
extending the coordinate ranges of the text range vectors of the multiple frames of video images outward, and calculating the edit distance between any two adjacent pieces of subtitle information;
if the edit distance between two adjacent pieces of subtitle information is smaller than or equal to a preset threshold, retaining only one of the two; or,
if the edit distance between two adjacent pieces of subtitle information is greater than the preset threshold, retaining both;
and merging all retained subtitle information to obtain the first subtitle information.
7. A copyright monitoring system based on video subtitle comparison, characterized by comprising:
a processing unit, configured to acquire a copyrighted video and extract first subtitle information of the copyrighted video, comprising: performing frame extraction on the copyrighted video using the open-source image/video processing tool OpenCV to obtain multiple frames of video images; extracting subtitle information from the multiple frames of video images using Baidu's open-source deep learning framework PaddlePaddle; and merging the subtitle information of the multiple frames of video images to obtain the first subtitle information; and further configured to capture a crawler video according to a preset keyword and extract second subtitle information of the crawler video; wherein extracting subtitle information from the multiple frames of video images using Baidu's open-source deep learning framework PaddlePaddle comprises: instantiating a text detector according to the detection algorithm of a preset model; instantiating a text recognizer according to the recognition mode of the preset model; performing detection on the multiple frames of video images based on the text detector to obtain text range vectors of the multiple frames of video images; arranging the text range vectors in a preset order; pruning the text range vectors to obtain pruned text range vectors; and recognizing and extracting, based on the text recognizer, the text within the pruned text range vectors to obtain the subtitle information of the multiple frames of video images;
and a comparison unit, configured to compare the second subtitle information with the first subtitle information to obtain a subtitle comparison result of the crawler video and the copyrighted video, the subtitle comparison result being used for judging whether the crawler video infringes copyright.
8. A computer-readable storage medium, wherein the computer-readable storage medium stores at least one program; when the at least one program is executed by a processor, the method according to any one of claims 1-6 is performed.
CN202111328854.9A 2021-11-10 2021-11-10 Copyright monitoring method and system based on video subtitle comparison Active CN114051163B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111328854.9A CN114051163B (en) 2021-11-10 2021-11-10 Copyright monitoring method and system based on video subtitle comparison

Publications (2)

Publication Number Publication Date
CN114051163A (en) 2022-02-15
CN114051163B (en) 2024-03-22

Family

ID=80208258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111328854.9A Active CN114051163B (en) 2021-11-10 2021-11-10 Copyright monitoring method and system based on video subtitle comparison

Country Status (1)

Country Link
CN (1) CN114051163B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101453333A (en) * 2008-10-16 2009-06-10 北京光线传媒有限公司 Copyright recognition method, apparatus and system for media file
CN104143055A (en) * 2014-08-16 2014-11-12 合一网络技术(北京)有限公司 Pirated video monitoring method and system
CN107529068A (en) * 2016-06-21 2017-12-29 北京新岸线网络技术有限公司 Video content discrimination method and system
CN108881947A (en) * 2017-05-15 2018-11-23 阿里巴巴集团控股有限公司 A kind of infringement detection method and device of live stream
CN111539929A (en) * 2020-04-21 2020-08-14 北京奇艺世纪科技有限公司 Copyright detection method and device and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8717499B2 (en) * 2011-09-02 2014-05-06 Dialogic Corporation Audio video offset detector

Also Published As

Publication number Publication date
CN114051163A (en) 2022-02-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant