CN113779304A - Method and device for detecting infringement video - Google Patents
Method and device for detecting infringement video
- Publication number
- CN113779304A (application number CN202010837511.4A)
- Authority
- CN
- China
- Prior art keywords
- video
- detected
- segment
- infringement
- determining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/73—Querying
- G06F16/732—Query formulation
- G06F16/7328—Query by example, e.g. a complete video frame or video sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7847—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
Abstract
The invention discloses a method and a device for detecting infringing video, and relates to the field of computer technology. One embodiment of the method comprises: determining a video to be detected; dividing the video to be detected into at least two video segments to be detected; determining the image features of each of the at least two video segments to be detected; calculating, according to the image features, at least two first similarities between the at least two video segments to be detected and original video segments, wherein the original video segments correspond to copyrighted original videos in a copyright video library; and, when one of the at least two first similarities is greater than a preset first threshold, determining the video segment whose first similarity exceeds the threshold to be an infringing video segment corresponding to the original video segment, and determining the video to be detected to be an infringing video. The method and the device can effectively identify and locate infringing video segments, improve the detection accuracy for infringing video, and reduce the risk of copyright disputes.
Description
Technical Field
The invention relates to the field of computer technology, and in particular to a method and a device for detecting infringing video.
Background
Currently, video has become an important medium for transmitting information. In practice, some videos uploaded or shared by users may infringe the copyright of others. For example, a video M uploaded by a user may have been obtained by splicing a segment of copyrighted video A with a segment of copyrighted video B; the upload platform should detect video M to avoid copyright disputes caused by users uploading infringing video.
In the prior art, infringing video is generally detected by treating the video to be detected as a whole. For example, a fingerprint of the whole video is extracted and compared with the video fingerprints in a video library; or the whole video is converted into a text description, and copyright detection is performed through text comparison.
In the process of implementing the invention, the inventors found at least the following problem in the prior art:
because the prior art mainly detects the video as a whole, when only a certain segment of the video to be detected duplicates a copyrighted video, the accuracy of the detection result is low, which increases the risk of copyright disputes.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for detecting infringing video that can effectively identify and locate infringing video segments in a video to be detected, even when only a certain segment of that video is identical to a copyrighted video, thereby improving the detection accuracy for infringing video and reducing the risk of copyright disputes.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of detecting an infringing video.
The method for detecting the infringement video comprises the following steps:
determining a video to be detected;
dividing the video to be detected into at least two video segments to be detected;
respectively determining the image characteristics of the at least two video clips to be detected;
respectively calculating at least two first similarities between the at least two video clips to be detected and the original video clip according to the image characteristics; wherein the original video segment corresponds to an original video with copyright in a copyright video library;
when one of the at least two first similarities is greater than a preset first threshold, determining the video segment to be detected with the first similarity greater than the preset first threshold as an infringing video segment corresponding to the original video segment, and determining the video to be detected as an infringing video.
Optionally,
determining the image features of the at least two video segments to be detected respectively comprises:
performing frame extraction on each video segment to be detected at a sampling frequency to obtain a plurality of image frames;
and extracting depth local features from the plurality of image frames using the LIFT algorithm, and determining the image features according to the depth local features.
Optionally,
determining the image features according to the depth local features comprises:
determining feature regions in the plurality of image frames using an optical flow method, according to the temporal order of the plurality of image frames;
and selecting, from the depth local features of the plurality of image frames, the depth local features corresponding to the feature regions, and taking the selected depth local features as the image features.
Optionally,
calculating the at least two first similarities between the at least two video segments to be detected and the original video segments in the copyright video library respectively comprises:
executing, for each video segment to be detected: encoding its image features to a preset dimension using a bag-of-words model, and calculating the first similarity between the video segment to be detected and the original video segment according to the encoded image features.
Optionally,
dividing the video to be detected into at least two video segments to be detected comprises:
determining scene change frames in the video to be detected;
and dividing the video to be detected into at least two video segments to be detected according to the scene change frames.
Optionally,
the second similarity between the video to be detected and the original video in the copyright video library is greater than a preset second threshold, and the original video segments are obtained by dividing the original video according to its scene change frames or according to the durations of the video segments to be detected.
Optionally, the method further comprises:
determining the positions of the at least two video segments to be detected on the time axis of the video to be detected;
and determining the start time and the end time of the infringement in the infringing video according to the time axis and the infringing video segments.
Optionally,
when several adjacent infringing video segments exist in the video to be detected, determining the start time and the end time of the infringement in the infringing video comprises:
taking, according to the time axis, the start time of the first of the adjacent infringing video segments as the start time of the infringement, and the end time of the last of the adjacent infringing video segments as the end time of the infringement.
To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided an apparatus for detecting infringement video.
The apparatus for detecting infringing video according to an embodiment of the invention comprises: a segment dividing module, a feature determining module, a similarity calculation module, and an infringing video determining module; wherein,
the segment dividing module is used for determining a video to be detected and dividing the video to be detected into at least two video segments to be detected;
the feature determining module is used for respectively determining the image features of the at least two video segments to be detected;
the similarity calculation module is used for calculating at least two first similarities between the at least two to-be-detected video clips and the original video clip respectively according to the image characteristics; the original video segment is a segment of an original video with copyright in a copyright video library;
the infringing video determining module is configured to determine, when one of the at least two first similarities is greater than a preset first threshold, the to-be-detected video segment whose first similarity is greater than the preset first threshold as an infringing video segment corresponding to the original video segment, and determine that the to-be-detected video is an infringing video.
Optionally,
the feature determining module is used for performing frame extraction on each video segment to be detected at a sampling frequency to obtain a plurality of image frames; extracting depth local features from the plurality of image frames using the LIFT algorithm; determining feature regions in the plurality of image frames using an optical flow method, according to the temporal order of the plurality of image frames; and selecting, from the depth local features of the plurality of image frames, the depth local features corresponding to the feature regions as the image features.
To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided an electronic device for detecting an infringement video.
The electronic device for detecting infringing video according to an embodiment of the invention comprises: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for detecting infringing video according to an embodiment of the invention.
To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided a computer-readable storage medium.
A computer-readable storage medium of an embodiment of the present invention has stored thereon a computer program that, when executed by a processor, implements a method of detecting infringement video of an embodiment of the present invention.
One embodiment of the above invention has the following advantages or benefits: the video to be detected is divided into several video segments to be detected, a first similarity between each video segment to be detected and a copyrighted original video segment is calculated according to the image features of the segments, and when a calculated first similarity is greater than a preset first threshold, the corresponding video segment is determined to be an infringing video segment and the video to be detected is determined to be an infringing video. Infringing segments in the video to be detected are thereby effectively identified and located, which improves the detection accuracy for infringing video and reduces the risk of copyright disputes; the method applies both to whole-video infringement and to segment-level infringement, and has a degree of robustness against composite changes to the video. Moreover, because video is easily re-dubbed, the similarity is calculated mainly from the image features of the video segments to be detected, without relying on easily tampered audio data, which further improves the detection accuracy.
Further effects of the above non-conventional alternatives are described below in connection with specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
fig. 1 is a schematic diagram illustrating main steps of a method for detecting infringement video according to an embodiment of the invention;
fig. 2 is a schematic diagram illustrating a division of a video segment to be detected according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the main steps of another method for detecting infringement video according to an embodiment of the invention;
fig. 4 is a schematic diagram of a video clip to be detected corresponding to a time axis of a video to be detected according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating the main steps of another method for detecting infringement video according to an embodiment of the invention;
FIG. 6 is a schematic diagram of the main modules of an apparatus for detecting infringement video according to an embodiment of the invention;
FIG. 7 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
fig. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the embodiments of the present invention and the technical features of the embodiments may be combined with each other without conflict.
Fig. 1 is a schematic diagram of main steps of a method for detecting infringement video according to an embodiment of the invention.
As shown in fig. 1, a method for detecting an infringement video according to an embodiment of the present invention mainly includes the following steps:
step S101: and determining the video to be detected.
For example, the video to be detected may be a video that a user is about to upload or has already uploaded to a video platform, or a video shared by one user with others; in short, any video that needs to be checked for infringement risk may serve as the video to be detected.
Step S102: and dividing the video to be detected into at least two video segments to be detected.
In the embodiment of the invention, the video to be detected can be divided into at least two video segments to be detected in different ways according to actual requirements. For example, the video can be divided evenly according to its duration: a video lasting 60 s can be divided into 6 segments of 10 s each. The number of segments can of course be adjusted according to actual requirements.
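The uniform division described above can be sketched as follows. This is a hypothetical helper, not part of the patent; the function name and signature are illustrative:

```python
def uniform_segments(duration_s: float, n_segments: int):
    """Split the interval [0, duration_s] into n_segments equal (start, end) spans."""
    step = duration_s / n_segments
    return [(i * step, (i + 1) * step) for i in range(n_segments)]

# A 60 s video divided into 6 segments of 10 s each, as in the example above.
segments = uniform_segments(60, 6)
print(len(segments))   # 6
print(segments[0])     # (0.0, 10.0)
```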
In a preferred embodiment of the present invention, a scene change frame in a video to be detected may be determined first; and then dividing the video to be detected into at least two video segments to be detected according to the scene conversion frame.
If the video to be detected includes an infringing video segment, that is, if it was obtained by splicing several infringing segments together, or by splicing an infringing segment with non-infringing segments, then the image or scene is likely to change drastically between the frames immediately before and after each splice point. Dividing the video to be detected according to scene change frames therefore yields segments that better match the actual splicing, so that infringing segments can subsequently be detected more quickly and accurately.
Generally, information about the scene change frames of a video is written into the video file header, so the scene change frames of the video to be detected can be determined by reading and parsing the header information. Alternatively, frames can be extracted from the video to be detected and the scene change frames determined from information in the extracted image frames; for example, if the similarity between two consecutive extracted frames changes by more than a specified ratio, the latter frame can be taken as a scene change frame.
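A rough sketch of the frame-comparison approach follows. The specifics (grayscale frames, histogram intersection as the similarity measure, a fixed threshold) are assumptions for illustration; the patent does not prescribe them:

```python
import numpy as np

def scene_change_frames(frames, min_similarity=0.5):
    """Return indices i where frame i differs sharply from frame i-1 (a scene cut).
    `frames` is a list of 2-D grayscale arrays with pixel values in [0, 255]."""
    cuts = []
    for i in range(1, len(frames)):
        # Normalized 32-bin intensity histograms of the two consecutive frames.
        h_prev = np.histogram(frames[i - 1], bins=32, range=(0, 256))[0].astype(float)
        h_cur = np.histogram(frames[i], bins=32, range=(0, 256))[0].astype(float)
        h_prev /= h_prev.sum()
        h_cur /= h_cur.sum()
        # Histogram intersection lies in [0, 1]; a low value signals a cut.
        similarity = np.minimum(h_prev, h_cur).sum()
        if similarity < min_similarity:
            cuts.append(i)
    return cuts

dark = np.zeros((8, 8))           # one scene: all-black frames
bright = np.full((8, 8), 255.0)   # abrupt cut to all-white frames
print(scene_change_frames([dark, dark, bright, bright]))  # [2]
```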
As shown in fig. 2, when the video A to be detected contains 4 scene change frames, it can be divided into 5 video segments to be detected (a1 to a5) according to those frames. The segments obtained by dividing at scene change frames may have the same or different lengths; fig. 2 shows the case where the lengths differ.
Step S103: and respectively determining the image characteristics of the at least two video clips to be detected.
When determining the image features of each video segment to be detected, frames can be extracted from the segment at a set sampling frequency to obtain a plurality of image frames; depth local features are then extracted from those image frames using the LIFT algorithm, and the image features are determined from the depth local features.
After the video to be detected has been divided into several segments, each segment is sampled at the set sampling frequency to obtain a plurality of image frames. Taking the video segment a1 as an example: sampling it uniformly at an FPS (frames per second) of 1, i.e. extracting one frame per second, yields 8 image frames when the duration of a1 is 8 s.
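A minimal sketch of uniform frame sampling at a target rate, assuming the segment has already been decoded into a list of frames (the decoding step itself is omitted here):

```python
def sample_frames(frames, video_fps: float, sample_fps: float = 1.0):
    """Uniformly sample a decoded frame list at `sample_fps` frames per second."""
    step = max(1, round(video_fps / sample_fps))
    return frames[::step]

# An 8 s clip decoded at 25 fps has 200 frames; sampling at 1 fps keeps 8 of them.
frames = list(range(200))
print(len(sample_frames(frames, video_fps=25)))  # 8
```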
Then, depth local features are extracted from the image frames using the LIFT (Learned Invariant Feature Transform) algorithm. For example, with a keypoint feature dimension of 256, a different number of 256-dimensional depth local features is extracted from each image frame. Continuing with segment a1: from its 8 image frames (a11 to a18), the LIFT algorithm may extract depth local features of dimension 10 × 256 from frame a11, 8 × 256 from frame a12, and 15 × 256 from frame a18. The image features of the video segment can then be determined from these extracted depth local features.
In order to reduce the amount of calculation in the video detection process and improve the efficiency of the infringement video detection, in one embodiment of the present invention, the feature areas in the plurality of image frames may be determined by an optical flow method according to the time sequence of the plurality of image frames; and selecting a depth local feature corresponding to the feature region from the depth local features of the plurality of image frames, and taking the selected depth local feature as the image feature.
The optical flow method uses the temporal changes of pixels in an image sequence and the correlation between adjacent frames to find correspondences between the previous frame and the current frame, and thereby computes the motion of objects between adjacent frames. A video segment to be detected may contain a feature region with large pixel changes as well as a background region with small pixel changes. For example, a video formed by a user filming a playing screen (such as a television) will likely include, in addition to the television content, content related to the television's surroundings (such as the wall behind it or the cabinet it stands on). In such a video, the television picture is the key to detecting infringement, while the surroundings contribute little. In other words, the depth local features of the background region carry little useful information.
Therefore, for the image frames of each video segment to be detected, a feature region with large pixel changes can be determined by the optical flow method according to the temporal order of the frames; in this example, the feature region is the region showing the television picture. The depth local features corresponding to the feature region are then selected as the image features of the segment, so that the depth local features of the low-motion background region are ignored and the amount of computation is reduced. For example, of the 10 × 256 depth local features extracted from frame a11, those falling in the feature region might have dimension 8 × 256; similarly, 6 × 256 for frame a12 and 10 × 256 for frame a18. The selected depth local features are then taken as the image features of the video segment, so that computing the first similarity between the segment and the original video segments requires less calculation, improving detection efficiency.
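The region selection can be approximated as below. This sketch replaces a true optical-flow computation with a mean absolute temporal-difference map; that simplification, the threshold value, and the keypoint format are all illustrative assumptions:

```python
import numpy as np

def motion_mask(frames, threshold=10.0):
    """Boolean mask of pixels whose mean absolute change over time exceeds
    `threshold` -- a crude stand-in for an optical-flow magnitude map."""
    stack = np.stack([np.asarray(f, dtype=float) for f in frames])
    change = np.abs(np.diff(stack, axis=0)).mean(axis=0)
    return change > threshold

def select_features(keypoints, descriptors, mask):
    """Keep only descriptors whose (row, col) keypoint lies inside the mask."""
    kept = [d for (r, c), d in zip(keypoints, descriptors) if mask[r, c]]
    return np.array(kept)

# Toy example: the left half of the frame moves, the right half is static.
f0 = np.zeros((4, 4)); f1 = f0.copy(); f2 = f0.copy()
f1[:, :2] = 50; f2[:, :2] = 100
mask = motion_mask([f0, f1, f2])
kps = [(0, 0), (0, 3)]                 # one keypoint per region
descs = [np.ones(4), np.zeros(4)]      # hypothetical local descriptors
print(select_features(kps, descs, mask).shape)  # (1, 4) -- only the moving region survives
```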
Step S104: respectively calculating at least two first similarities between the at least two video clips to be detected and the original video clip according to the image characteristics; wherein the original video segment corresponds to an original video with copyright in a copyright video library.
Here, the at least two first similarities between the at least two video segments to be detected and the original video segments mean: each first similarity corresponds to one pair consisting of a video segment to be detected and an original video segment. Since there are multiple segments on both sides, at least two (and in general many) first similarities are obtained.
Step S105: when one of the at least two first similarities is greater than a preset first threshold, determining the video segment to be detected with the first similarity greater than the preset first threshold as an infringing video segment corresponding to the original video segment, and determining the video to be detected as an infringing video.
Because the video segments to be detected may differ in length, the number of image frames obtained by uniform frame extraction differs from segment to segment; the number of depth local features in each image frame also differs. The dimension of the image features therefore differs from one video segment to another.
In the embodiment of the invention, in order to calculate the first similarity between the video segment to be detected and the original video segment according to the image characteristics, the image characteristics are encoded according to the preset dimensionality by using a bag-of-words model, and the first similarity between the video segment to be detected and the original video segment is calculated according to the encoding of the image characteristics.
In this embodiment, encoding the image features to a preset dimension with the bag-of-words model maps depth local features of different dimensions to the same dimension. For example, with a bag-of-words size of 1000, i.e. a preset dimension of 1000, the depth local features of each image frame are encoded into a vector of length 1000: whether the 8 × 256 features of frame a11, the 6 × 256 features of frame a12, or the 10 × 256 features of frame a18, encoding with a bag-of-words model of size 1000 yields a feature vector of length 1000.
Since the video segments contain different numbers of image frames, the encodings corresponding to each segment can be averaged over its frames to further improve computational efficiency. For example, 8 image frames were extracted from the video segment a1; after bag-of-words encoding, each frame's depth local features have dimension 1 × 1000, so the segment's image features have dimension 8 × 1000. Averaging over the 8 frames then encodes the image features of segment a1 into a single vector of length 1000, i.e. dimension 1 × 1000. Likewise, if 4 image frames are extracted from the video segment a2, its image features are also 1 × 1000 after bag-of-words encoding and averaging.
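The bag-of-words encoding and per-segment averaging can be sketched as follows, assuming a pre-trained codebook of visual words (in practice the codebook would be learned, e.g. by k-means over many descriptors; the toy codebook here is an assumption):

```python
import numpy as np

def bow_encode(descriptors, codebook):
    """Encode an (n, d) set of local descriptors as a histogram over the
    (k, d) `codebook` of visual words; returns a length-k vector."""
    # Assign each descriptor to its nearest visual word by Euclidean distance.
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    return np.bincount(words, minlength=len(codebook)).astype(float)

def segment_feature(frame_descriptor_sets, codebook):
    """Average the per-frame bag-of-words codes into one segment vector."""
    codes = [bow_encode(d, codebook) for d in frame_descriptor_sets]
    return np.mean(codes, axis=0)

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])   # k = 2 toy visual words
frame_a = np.array([[0.1, 0.2], [9.8, 9.9]])      # one descriptor near each word
frame_b = np.array([[0.0, 0.1]])                  # one descriptor near word 0
print(segment_feature([frame_a, frame_b], codebook))  # [1.  0.5]
```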
Then, the image features of the plurality of video segments to be detected are aggregated to obtain a matrix of image features corresponding to the video to be detected. For example, since the video a to be detected is divided into 5 video segments to be detected (a1 to a5) according to the scene change frames it contains, the matrix of its corresponding image features is a 5 × 1000 dimensional matrix. Further, the image features of the video to be detected may be subjected to normalization processing (e.g., TF-IDF normalization) and regularization processing (e.g., L2 regularization), and the first similarity with the original video segment may then be calculated according to the processing result.
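The TF-IDF and L2 steps mentioned above might look as follows; the patent does not fix exact formulas, so the smoothed-IDF variant and the function name here are assumptions:

```python
import numpy as np

def normalize_features(X):
    """TF-IDF weight then L2-normalize a (num_segments x vocab_size)
    matrix of aggregated bag-of-words features. Uses a smoothed IDF;
    the exact weighting scheme is an assumption."""
    n = X.shape[0]
    df = (X > 0).sum(axis=0)              # document frequency per word
    idf = np.log((1 + n) / (1 + df)) + 1  # smoothed inverse document frequency
    W = X * idf
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    norms[norms == 0] = 1.0               # avoid division by zero for empty rows
    return W / norms                      # each row has unit L2 norm
```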
It is worth mentioning that, for the copyrighted videos in the copyright video library, the same processing as for the video to be detected can be adopted: each copyrighted video is divided into at least two original video segments, the image features of the original video segments are determined respectively, and the first similarity between the copyrighted video in the copyright video library and the video to be detected is calculated according to the image features of the original video segments and the image features of the video segments to be detected.
In order to reduce the processing amount of the original videos in the copyright video library and improve the detection efficiency of the infringement video, in one embodiment of the present invention, a plurality of second similarities between the video to be detected and the plurality of videos in the copyright video library may first be determined respectively. A video in the copyright video library whose second similarity is greater than a preset second threshold is then used as the original video to be compared with the video to be detected, and this original video may be further divided, according to the scene change frames in the original video or the duration of the video segments to be detected, into at least two original video segments to be compared with the video segments to be detected.
When the original video is divided, it can be divided in the same way as the video to be detected. For example, when the video to be detected is divided into a plurality of video segments to be detected according to the scene change frames in the video to be detected, the original video is likewise divided into a plurality of original video segments according to the scene change frames of the original video. Or, when the video to be detected is divided uniformly into a plurality of video segments to be detected, the original video is also divided uniformly into a plurality of original video segments.
In addition, the original video can be divided according to the duration of the video segments to be detected, so as to obtain a plurality of original video segments. For example, when the duration of the video segment a1 to be detected is 8 s, the original video compared with the video segment a1 to be detected is divided into a plurality of original video segments each with a duration of 8 s; when the duration of the video segment a2 to be detected is 10 s, the original video compared with the video segment a2 to be detected is divided into a plurality of original video segments each with a duration of 10 s. Of course, the video segments a1 and a2 to be detected may correspond to the same original video; in that case, the original video may be divided multiple times according to the durations of the different video segments to be detected, so as to obtain a plurality of original video segments with different durations, thereby facilitating comparison with video segments to be detected of different durations. It can be understood that, when the original video is divided according to duration, the duration of the last original video segment may be less than the duration of the video segment to be detected.
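The duration-based division above, including the shorter final segment, can be sketched as follows (function name assumed for illustration):

```python
def split_by_duration(total_s, seg_len_s):
    """Split a video of `total_s` seconds into consecutive segments of
    `seg_len_s` seconds each; as noted above, the last segment may be
    shorter than the duration of the video segment to be detected."""
    bounds = []
    t = 0.0
    while t < total_s:
        bounds.append((t, min(t + seg_len_s, total_s)))
        t += seg_len_s
    return bounds
```

For a 26 s original video compared against an 8 s segment to be detected, this yields segments of 8, 8, 8, and 2 seconds.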
In a preferred embodiment of the present invention, when a scene change frame exists in an original video, the original video is divided according to the scene change frame of the original video, and when the scene change frame does not exist in the original video, the original video is divided according to the duration of a video segment to be detected.
Generally speaking, the second similarity is the overall similarity between the video to be detected and a video in the copyright video library, and this overall screening is only a preliminary screening of the original videos in the copyright video library, so the second similarity is typically smaller than the first similarity. Specifically, the second similarity between the video to be detected and an original video in the copyright video library can be calculated as a whole; original videos with a higher overall similarity to the video to be detected are then selected from the copyright video library according to the second similarity, and each selected original video is divided into a plurality of original video segments according to the scene change frames in it. In this way, the videos in the copyright video library are screened by overall similarity, and only the original videos with a higher overall similarity to the video to be detected are processed, so that the processing amount of the original videos in the copyright video library is reduced and the detection efficiency of the infringement video is improved.
In addition, after an original video whose second similarity is greater than the preset second threshold is divided into a plurality of original video segments, the image features of the original video segments may be determined in the same manner as for the video segments to be detected. For example, each original video segment is subjected to frame extraction at the same sampling frequency to obtain a plurality of image frames; the depth local features in the plurality of image frames of the original video segment are then extracted by using the LIFT algorithm; the feature regions in the plurality of image frames of the original video segment are determined by using the optical flow method; and the depth local features corresponding to the feature regions are selected as the image features of the original video segment. Of course, after the image features of the original video segment are determined, they can also be encoded by using the bag-of-words model, so as to obtain the encoding of the depth local features corresponding to the original video segment with the preset dimension.
According to the above embodiment, as shown in fig. 3, a method for detecting an infringement video according to an embodiment of the present invention may include the following steps S301 to S310:
step S301: determining a video to be detected, and dividing the video to be detected into at least two video segments to be detected according to scene conversion frames in the video to be detected.
Step S302: and determining a second similarity between the video to be detected and the video in the copyright video library, and dividing the original video into at least two original video segments according to the scene conversion frame in the original video with the second similarity being greater than a preset second threshold value.
Step S303: executing, for each video segment to be detected: and performing frame extraction on the video clip to be detected according to the sampling frequency to obtain a plurality of image frames, and extracting the depth local features in the plurality of image frames by utilizing a LIFT algorithm.
Step S304: determining feature regions in the plurality of image frames using an optical flow method according to a temporal order of the plurality of image frames.
Step S305: and selecting the depth local features corresponding to the feature regions from the depth local features of the image frames, and taking the selected depth local features as the image features of the video clip to be detected.
Step S306: image characteristics of an original video segment are determined.
Here, the original video segment may be processed according to the methods shown in steps S303 to S305 to determine the image characteristics of the original video segment.
Step S307: and coding the image characteristics of the video segment to be detected and the image characteristics of the original video segment according to a preset dimension by using a bag-of-words model.
Step S308: respectively calculating a first similarity k between each video clip to be detected and each original video clip according to the codes of the image characteristics of the video clips to be detected and the original video clips; step S309 is executed when the first similarity k is greater than a preset first threshold n, and step S310 is executed when each of the first similarities k is not greater than the preset first threshold n.
For example, after the bag-of-words encoding, the dimension of the image features of each video segment to be detected is 1 × 1000, and the dimension of the image features of each original video segment is also 1 × 1000. If the video to be detected comprises 5 video segments to be detected and the original video also comprises 5 original video segments, the dimension of the matrix of image features corresponding to the video to be detected is 5 × 1000, and the dimension of the matrix of image features corresponding to the original video is 5 × 1000. The first similarities between the 5 video segments to be detected and the 5 original video segments can then be obtained by calculating with the matrices of image features of the video to be detected and the original video. Of course, the vector of image features of each video segment to be detected may also be compared with the vectors of image features of the 5 original video segments respectively, so as to obtain the first similarity between each video segment to be detected and each of the 5 original video segments.
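As an illustrative sketch of this calculation (the patent does not fix the similarity formula; the cosine measure and function name here are assumptions):

```python
import numpy as np

def pairwise_first_similarity(A, B):
    """Cosine similarity between each row of A (e.g. the 5 x 1000
    feature matrix of the video to be detected) and each row of B
    (the 5 x 1000 feature matrix of the original video). Returns a
    (segments-to-detect x original-segments) similarity matrix."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T
```

Entry (i, j) of the result is the first similarity between the i-th video segment to be detected and the j-th original video segment, which is then compared against the preset first threshold n.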
Step S309: determining the video segment to be detected as an infringing video segment corresponding to the original video segment, determining the video to be detected as an infringing video, and ending the current process.
Step S310: and determining the video to be detected as a non-infringing video.
And when the similarity between each video clip to be detected and all original video clips is not larger than a preset first threshold value, determining that the video to be detected is a non-infringing video.
In the embodiment of the present invention, the time axis positions of the at least two video segments to be detected in the video to be detected may also be determined, and the start time and the end time of the infringement in the infringement video may be determined according to the time axis and the infringement video segments.
As shown in fig. 4, for the video a to be detected, the time axis position of each video segment to be detected in the video a can be determined respectively; that is, the video segment a1 to be detected corresponds to t1-t2 on the time axis of the video a, the video segment a2 to be detected corresponds to t2-t3, and so on, until the video segment a5 to be detected corresponds to t5-t6. Then, the start time and the end time of the infringement in the infringement video can be determined according to the time axis positions of the infringing video segments in the video to be detected.
For example, if it is determined according to the first similarity that only the video segment A3 to be detected is an infringing video segment in the video a to be detected, the video a to be detected is the infringing video, the start time of the infringing is the start time t3 of the infringing video segment A3 in the infringing video a, and the end time of the infringing is the end time t4 of the infringing video segment A3.
When a plurality of adjacent infringement video clips exist in the video to be detected, according to the time axis, the start time of a first infringement video clip in the plurality of adjacent infringement video clips is used as the start time of the infringement, and the end time of a last infringement video clip in the plurality of adjacent infringement video clips is used as the end time of the infringement.
For example, when the video segments a2 and A3 to be detected in the video a to be detected are both infringing video segments, the video a to be detected is an infringing video; in the infringing video a, the start time of the infringement is the start time (t2) of the first of the adjacent infringing video segments, and the end time of the infringement is the end time (t4) of the last of them. That is, the infringing time period in the infringing video a is determined to be t2-t4. In order to facilitate determination of the infringement time period in the infringing video, in an embodiment of the present invention, when a plurality of adjacent infringing video segments exist in the video to be detected, the plurality of adjacent infringing video segments may be combined into a whole, and the combination of the plurality of infringing video segments is used as the infringing video segment in the infringing video.
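The merging of adjacent infringing video segments described above can be sketched as follows (the representation of segments as sorted (start, end) pairs and the function name are assumptions):

```python
def merge_adjacent(segments):
    """Merge runs of adjacent infringing segments, given as sorted
    (start, end) pairs where adjacency means one segment's end time
    equals the next segment's start time. Each merged run spans from
    the start of its first segment to the end of its last segment."""
    merged = []
    for s, e in segments:
        if merged and merged[-1][1] == s:
            merged[-1] = (merged[-1][0], e)  # extend the current run
        else:
            merged.append((s, e))            # start a new run
    return merged
```

With segments a2 (t2-t3) and A3 (t3-t4) infringing, the merged infringement period is t2-t4, as in the example above.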
It can be understood that, for the original video segment corresponding to the infringing video segment, the time length of the infringing of the original video can also be determined according to the time axis of the original video segment corresponding to the original video. Still taking the infringing video a as an example, if the infringing video segment a2 in the infringing video a corresponds to the original video segment X3 in the original video X, then the time period of infringing in the original video may be determined to be T3-T4 according to the time axis (e.g., T3-T4) of the original video corresponding to the original video segment X3.
In addition, for the case that a plurality of non-adjacent infringing video clips exist in the infringing video, the start time and the end time of the infringing in the infringing video can be determined according to the time axis of each infringing video clip. For example, when the infringing video clips in the infringing video a are A3 and a5, it is determined that the infringing time periods in the infringing video a are t3-t4 and t5-t6 according to the time axes of A3 and a5 corresponding to the infringing video a, respectively.
It should be noted that, in an embodiment of the present invention, if the first-similarity differences between two non-adjacent infringing video segments and the video segment to be detected located between them (a non-infringing video segment) are both smaller than a preset third threshold, the combination of the two infringing video segments and the segment between them may be regarded as one infringing video segment in the infringing video, and the start time and the end time of the infringement in the infringing video are determined according to this combination. For example, suppose the infringing video segments in the infringing video a are A3 and a5 and the preset first threshold is 0.99: the first similarity between the infringing video segment A3 and the original video segment X3 is greater than 0.99, such as 0.998; the first similarity between the infringing video segment a5 and the original video segment X5 is also greater than 0.99, such as 0.999; and the first similarity between the video segment a4 to be detected and the original video segment X4 is not greater than 0.99, such as 0.96. If the third threshold is 0.050, then since the first-similarity difference between the infringing video segment A3 and the video segment a4 to be detected is 0.038, and that between the infringing video segment a5 and the video segment a4 to be detected is 0.039, both smaller than the third threshold 0.050, the combination of the video segments A3-a5 can be regarded as an infringing video segment in the infringing video a; the start time of the infringement in the infringing video a is t3 and the end time is t6, that is, the infringing time period in the infringing video a is t3-t6.
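The third-threshold bridging rule above can be sketched as follows; the per-segment list of first similarities and the function name are assumptions, and only the single-gap case described in the example is handled:

```python
def bridge_segments(sims, first_thresh, third_thresh):
    """Flag segments whose first similarity exceeds `first_thresh` as
    infringing, then also flag a single non-infringing segment lying
    between two infringing ones when its similarity is within
    `third_thresh` of both neighbours (the A3-a4-a5 case above)."""
    flags = [s > first_thresh for s in sims]
    for i in range(1, len(sims) - 1):
        if (flags[i - 1] and flags[i + 1] and not flags[i]
                and sims[i - 1] - sims[i] < third_thresh
                and sims[i + 1] - sims[i] < third_thresh):
            flags[i] = True  # bridge the gap between the two runs
    return flags
```

Using the worked numbers above (A3 at 0.998, a4 at 0.96, a5 at 0.999, first threshold 0.99, third threshold 0.050), segments A3 through a5 all end up flagged, giving the infringement period t3-t6.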
Accordingly, assuming that the original video segments X3-X5 correspond to the original video X with time axes T3-T4, T4-T5 and T5-T6, respectively, the infringed time periods in the original video X are T3-T6. In addition, it is understood that, although it is exemplified that a plurality of infringement video segments correspond to a plurality of original video segments in the same original video, in practical applications, the plurality of infringement video segments may respectively correspond to different original videos, such as the infringement video segment A3 corresponding to the original video segment X3 in the original video X, and the infringement video segment a5 corresponding to the original video segment Y2 in the original video Y.
According to the above embodiment, the method for detecting an infringement video provided by the embodiment of the invention may include the following steps:
step S501: determining a video to be detected, and dividing the video to be detected into at least two video segments to be detected according to scene conversion frames in the video to be detected.
Step S502: and determining a second similarity between the video to be detected and the video in the copyright video library, and dividing the original video into at least two original video segments according to the scene conversion frame in the original video with the second similarity being greater than a preset second threshold value.
Step S503: and respectively determining the image characteristics of the at least two video clips to be detected and the at least two original video clips.
Step S504: and respectively calculating a first similarity between each video clip to be detected and each original video clip according to the image characteristics.
Step S505: and when the first similarity is larger than a preset first threshold value, determining that the video segment to be detected is an infringing video segment corresponding to the original video segment, and determining that the video to be detected is an infringing video.
Step S506: determining first time axis positions of the at least two video segments to be detected in the video to be detected and second time axis positions of the at least two original video segments in the original video, determining the start time and the end time of the infringement in the infringement video according to the first time axis and the infringement video segments, and determining the start time and the end time of the infringed period in the original video according to the second time axis and the original video segments corresponding to the infringement video segments.
According to the method for detecting the infringement video provided by the embodiment of the invention, the video to be detected is divided into a plurality of video segments to be detected; the first similarity between the video segments to be detected and the copyrighted original video segments is then calculated according to the image features of the segments to be detected; and when a calculated first similarity is greater than a preset first threshold, the corresponding video segment to be detected is determined to be an infringing video segment and the video to be detected is determined to be an infringing video. In this way, the infringing video segments in the video to be detected are effectively identified and located, so that the detection accuracy of the infringement video is improved and the risk of copyright disputes is reduced; the method is applicable both to infringement of an entire video and to infringement of video segments, and has certain robustness to composite video changes. Moreover, because videos are easily re-dubbed, the similarity is calculated mainly based on the image features of the video segments to be detected, without considering audio data that is easy to tamper with, so that the detection accuracy of the infringing video is further improved.
Fig. 6 is a schematic diagram of main blocks of an apparatus for detecting infringement video according to an embodiment of the invention.
As shown in fig. 6, an apparatus 600 for detecting infringement video according to an embodiment of the present invention includes: a segment dividing module 601, a feature determining module 602, a similarity calculating module 603 and an infringing video determining module 604; wherein,
the segment dividing module 601 is configured to determine a video to be detected, and divide the video to be detected into at least two video segments to be detected;
the feature determining module 602 is configured to determine image features of the at least two video segments to be detected respectively;
the similarity calculation module 603 is configured to calculate at least two first similarities between the at least two to-be-detected video segments and the original video segment according to the image features; the original video segment is a segment of an original video with copyright in a copyright video library;
the infringing video determining module 604 is configured to determine, when one of the at least two first similarities is greater than a preset first threshold, the to-be-detected video segment whose first similarity is greater than the preset first threshold as an infringing video segment corresponding to the original video segment, and determine that the to-be-detected video is an infringing video.
In an embodiment of the present invention, the feature determining module 602 is configured to perform frame extraction on the video segment to be detected according to a sampling frequency to obtain a plurality of image frames; extract depth local features in the plurality of image frames by utilizing a LIFT algorithm; determine feature regions in the plurality of image frames by using an optical flow method according to the time sequence of the plurality of image frames; and select the depth local features corresponding to the feature regions from the depth local features of the plurality of image frames, taking the selected depth local features as the image features.
In an embodiment of the present invention, the similarity calculation module 603 is configured to encode the image features according to a preset dimension by using a bag-of-words model, and calculate a first similarity between the to-be-detected video segment and the original video segment according to the encoding of the image features.
In an embodiment of the present invention, the segment dividing module 601 is configured to determine a scene change frame in the video to be detected; and dividing the video to be detected into at least two video segments to be detected according to the scene conversion frame.
In an embodiment of the present invention, the second similarity between the video to be detected and the original video in the copyright video library is greater than a preset second threshold, and the original video segments are obtained by dividing the original video according to the scene change frames in the original video or the duration of the video segments to be detected.
In an embodiment of the present invention, the infringing video determining module 604 is configured to determine that the at least two video clips to be detected respectively correspond to a time axis of the video to be detected; and determining the start time of infringement and the end time of infringement in the infringement video according to the time axis and the infringement video clip.
In an embodiment of the present invention, the infringing video determining module 604 is configured to, when a plurality of adjacent infringing video segments exist in the video to be detected, according to the time axis, use a start time of a first infringing video segment in the adjacent infringing video segments as a start time of the infringement, and use an end time of a last infringing video segment in the adjacent infringing video segments as an end time of the infringement.
According to the device for detecting the infringement video provided by the embodiment of the invention, the video to be detected is divided into a plurality of video segments to be detected; the first similarity between the video segments to be detected and the copyrighted original video segments is then calculated according to the image features of the segments to be detected; and when a calculated first similarity is greater than a preset first threshold, the corresponding video segment to be detected is determined to be an infringing video segment and the video to be detected is determined to be an infringing video. In this way, the infringing video segments in the video to be detected are effectively identified and located, so that the detection accuracy of the infringement video is improved and the risk of copyright disputes is reduced; the device is applicable both to infringement of an entire video and to infringement of video segments, and has certain robustness to composite video changes. Moreover, because videos are easily re-dubbed, the similarity is calculated mainly based on the image features of the video segments to be detected, without considering audio data that is easy to tamper with, so that the detection accuracy of the infringing video is further improved.
Fig. 7 shows an exemplary system architecture 700 of a method of detecting infringement video or an apparatus for detecting infringement video to which embodiments of the invention may be applied.
As shown in fig. 7, the system architecture 700 may include terminal devices 701, 702, 703, a network 704, and a server 705. The network 704 serves to provide a medium for communication links between the terminal devices 701, 702, 703 and the server 705. Network 704 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 701, 702, 703 to interact with a server 705 over a network 704, to receive or send messages or the like. Various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, and the like, may be installed on the terminal devices 701, 702, and 703.
The terminal devices 701, 702, 703 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 705 may be a server that provides various services, such as a background management server that supports shopping websites browsed by users using the terminal devices 701, 702, and 703. The background management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (e.g., target push information and product information) to the terminal device.
It should be noted that the method for detecting an infringement video provided by the embodiment of the present invention is generally performed by the server 705, and accordingly, the apparatus for detecting an infringement video is generally disposed in the server 705.
It should be understood that the number of terminal devices, networks, and servers in fig. 7 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 8, shown is a block diagram of a computer system 800 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU)801 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage portion 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage section 808 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program executes the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 801.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented in software or hardware. The described modules may also be provided in a processor, which may be described as: a processor comprising a segment dividing module, a feature determining module, a similarity calculating module, and an infringing video determining module. The names of these modules do not, in some cases, limit the modules themselves; for example, the segment dividing module may also be described as a "module that divides the video to be detected into at least two video segments to be detected".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to: determine a video to be detected; divide the video to be detected into at least two video segments to be detected; respectively determine image features of the at least two video segments to be detected; respectively calculate, according to the image features, at least two first similarities between the at least two video segments to be detected and an original video segment, wherein the original video segment corresponds to a copyrighted original video in a copyright video library; and, when one of the at least two first similarities is greater than a preset first threshold, determine the video segment to be detected whose first similarity is greater than the preset first threshold as an infringing video segment corresponding to the original video segment, and determine the video to be detected as an infringing video.
According to the technical scheme of the embodiments of the present invention, the video to be detected is divided into a plurality of video segments to be detected; a first similarity between each video segment to be detected and a copyrighted original video segment is then calculated according to the image features of that segment; and when a calculated first similarity is greater than a preset first threshold, that video segment is determined to be an infringing video segment and the video to be detected is determined to be an infringing video. Infringing segments within the video to be detected are thereby effectively identified and located, which improves the detection accuracy for infringing videos and reduces the risk of copyright disputes. The method applies both to whole-video infringement and to segment-level infringement, and has a degree of robustness against composite video transformations. Moreover, because a video is easily re-dubbed, the similarity is calculated mainly from the image features of the video segments to be detected, and easily tampered audio data is not considered, which further improves detection accuracy.
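To make the described flow concrete, the following is a minimal, hedged sketch of the segment-level matching loop in Python. It is not the patented implementation: the function names, the cosine similarity measure, and the 0.9 threshold are illustrative assumptions, standing in for the bag-of-words encoding and preset first threshold described above.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (e.g. bag-of-words codes)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def detect_infringement(segment_features, original_features, threshold=0.9):
    """Compare each segment of the video to be detected against each
    copyrighted original segment and flag pairs whose similarity
    exceeds the preset first threshold.

    Returns (is_infringing, hits), where hits is a list of
    (segment_index, original_index, similarity) tuples.
    """
    hits = []
    for i, seg in enumerate(segment_features):
        for j, orig in enumerate(original_features):
            sim = cosine_similarity(seg, orig)
            if sim > threshold:
                hits.append((i, j, sim))
    return len(hits) > 0, hits
```

If any segment pair crosses the threshold, the whole video to be detected is flagged as infringing, mirroring the claim 1 logic.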
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (12)
1. A method of detecting infringement video, comprising:
determining a video to be detected;
dividing the video to be detected into at least two video segments to be detected;
respectively determining image features of the at least two video segments to be detected;
respectively calculating, according to the image features, at least two first similarities between the at least two video segments to be detected and an original video segment; wherein the original video segment corresponds to a copyrighted original video in a copyright video library;
when one of the at least two first similarities is greater than a preset first threshold, determining the video segment to be detected whose first similarity is greater than the preset first threshold as an infringing video segment corresponding to the original video segment, and determining the video to be detected as an infringing video.
2. The method according to claim 1, wherein the respectively determining image features of the at least two video segments to be detected comprises:
executing, for each video segment to be detected: performing frame extraction on the video segment to be detected according to a sampling frequency to obtain a plurality of image frames;
and extracting depth local features from the plurality of image frames by using a LIFT (Learned Invariant Feature Transform) algorithm, and determining the image features according to the depth local features.
3. The method according to claim 2, wherein the determining the image features according to the depth local features comprises:
determining feature regions in the plurality of image frames by using an optical flow method, according to the temporal order of the plurality of image frames;
and selecting, from the depth local features of the plurality of image frames, the depth local features corresponding to the feature regions, and taking the selected depth local features as the image features.
4. The method according to claim 3, wherein the respectively calculating at least two first similarities between the at least two video segments to be detected and the original video segment in the copyright video library comprises:
encoding the image features into a preset dimension by using a bag-of-words model, and calculating a first similarity between the video segment to be detected and the original video segment according to the encoding of the image features.
5. The method according to claim 1, wherein the dividing the video to be detected into at least two video segments to be detected comprises:
determining scene change frames in the video to be detected;
and dividing the video to be detected into at least two video segments to be detected according to the scene change frames.
6. The method according to claim 5, wherein a second similarity between the video to be detected and the original video in the copyright video library is greater than a preset second threshold, and the original video segment is obtained by dividing the original video according to scene change frames in the original video or according to the duration of the video segments to be detected.
7. The method of claim 1, further comprising:
determining positions of the at least two video segments to be detected on a time axis of the video to be detected;
and determining a start time and an end time of infringement in the infringing video according to the time axis and the infringing video segment.
8. The method according to claim 7, wherein, when a plurality of adjacent infringing video segments exist in the video to be detected, the determining a start time and an end time of infringement in the infringing video comprises:
taking, according to the time axis, the start time of the first infringing video segment among the adjacent infringing video segments as the start time of the infringement, and taking the end time of the last infringing video segment among the adjacent infringing video segments as the end time of the infringement.
9. An apparatus for detecting infringement video, comprising: a segment dividing module, a feature determining module, a similarity calculating module, and an infringing video determining module; wherein,
the segment dividing module is used for determining a video to be detected and dividing the video to be detected into at least two video segments to be detected;
the feature determining module is used for respectively determining image features of the at least two video segments to be detected;
the similarity calculating module is used for respectively calculating, according to the image features, at least two first similarities between the at least two video segments to be detected and the original video segment; wherein the original video segment is a segment of a copyrighted original video in a copyright video library;
the infringing video determining module is configured to determine, when one of the at least two first similarities is greater than a preset first threshold, the to-be-detected video segment whose first similarity is greater than the preset first threshold as an infringing video segment corresponding to the original video segment, and determine that the to-be-detected video is an infringing video.
10. The apparatus of claim 9,
the feature determining module is used for performing frame extraction on the video segment to be detected according to a sampling frequency to obtain a plurality of image frames; extracting depth local features from the plurality of image frames by using a LIFT algorithm; determining feature regions in the plurality of image frames by using an optical flow method according to the temporal order of the plurality of image frames; and selecting, from the depth local features of the plurality of image frames, the depth local features corresponding to the feature regions, and taking the selected depth local features as the image features.
11. An electronic device for detecting infringement video, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-8.
12. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-8.
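Claims 7 and 8 locate infringement on the video's time axis by merging runs of adjacent infringing segments: the start time of the first segment in a run becomes the infringement start time, and the end time of the last segment in the run becomes the infringement end time. A small illustrative sketch of that merge step in pure Python (the function name and data layout are assumptions for demonstration, not language from the claims):

```python
def infringement_time_ranges(segment_times, infringing_indices):
    """Merge runs of adjacent infringing segments into (start, end) ranges.

    segment_times: list of (start_sec, end_sec) per segment, in timeline order.
    infringing_indices: sorted indices of segments flagged as infringing.
    """
    ranges = []
    run_start = None  # index of the first segment in the current adjacent run
    prev = None
    for idx in infringing_indices:
        if run_start is None:
            run_start = idx
        elif idx != prev + 1:  # gap on the time axis: close the previous run
            ranges.append((segment_times[run_start][0], segment_times[prev][1]))
            run_start = idx
        prev = idx
    if run_start is not None:  # close the final run
        ranges.append((segment_times[run_start][0], segment_times[prev][1]))
    return ranges
```

For example, with segments covering 0-10 s, 10-25 s, 25-40 s, and 40-60 s, flagging segments 1 and 2 yields a single infringement range from 10 s to 40 s.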
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010837511.4A CN113779304A (en) | 2020-08-19 | 2020-08-19 | Method and device for detecting infringement video |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113779304A true CN113779304A (en) | 2021-12-10 |
Family
ID=78835190
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010837511.4A Pending CN113779304A (en) | 2020-08-19 | 2020-08-19 | Method and device for detecting infringement video |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113779304A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107750015A (en) * | 2017-11-02 | 2018-03-02 | 腾讯科技(深圳)有限公司 | Detection method, device, storage medium and the equipment of video copy |
CN109740499A (en) * | 2018-12-28 | 2019-05-10 | 北京旷视科技有限公司 | Methods of video segmentation, video actions recognition methods, device, equipment and medium |
CN109905780A (en) * | 2019-03-30 | 2019-06-18 | 山东云缦智能科技有限公司 | A kind of video clip sharing method and Intelligent set top box |
CN110675453A (en) * | 2019-10-16 | 2020-01-10 | 北京天睿空间科技股份有限公司 | Self-positioning method for moving target in known scene |
CN111539929A (en) * | 2020-04-21 | 2020-08-14 | 北京奇艺世纪科技有限公司 | Copyright detection method and device and electronic equipment |
Non-Patent Citations (1)
Title |
---|
Yan Yousan: "Deep Learning for Face Image Processing: Core Algorithms and Practical Case Studies", 31 July 2020, Beijing: China Machine Press, pages: 220 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116471452A (en) * | 2023-05-10 | 2023-07-21 | 武汉亿臻科技有限公司 | Video editing platform based on intelligent AI |
CN116471452B (en) * | 2023-05-10 | 2024-01-19 | 武汉亿臻科技有限公司 | Video editing platform based on intelligent AI |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110163079B (en) | Video detection method and device, computer readable medium and electronic equipment | |
Zhang et al. | Efficient video frame insertion and deletion detection based on inconsistency of correlations between local binary pattern coded frames | |
CN109844736B (en) | Summarizing video content | |
CN111651636B (en) | Video similar segment searching method and device | |
CN1538351B (en) | Method and computer for generating visually representative video thumbnails | |
US9646358B2 (en) | Methods for scene based video watermarking and devices thereof | |
US20220172476A1 (en) | Video similarity detection method, apparatus, and device | |
CN105744292A (en) | Video data processing method and device | |
CN110365973B (en) | Video detection method and device, electronic equipment and computer readable storage medium | |
CN110309795A (en) | Video detecting method, device, electronic equipment and storage medium | |
CN110149529B (en) | Media information processing method, server and storage medium | |
US11800201B2 (en) | Method and apparatus for outputting information | |
CN109934142B (en) | Method and apparatus for generating feature vectors of video | |
CN112291634B (en) | Video processing method and device | |
CN112929728A (en) | Video rendering method, device and system, electronic equipment and storage medium | |
CN106503112B (en) | Video retrieval method and device | |
CN109919220B (en) | Method and apparatus for generating feature vectors of video | |
CN110188782B (en) | Image similarity determining method and device, electronic equipment and readable storage medium | |
US9646005B2 (en) | System and method for creating a database of multimedia content elements assigned to users | |
CN109241344B (en) | Method and apparatus for processing information | |
CN113779304A (en) | Method and device for detecting infringement video | |
CN114973293B (en) | Similarity judging method, key frame extracting method and device, medium and equipment | |
Vega et al. | A robust video identification framework using perceptual image hashing | |
CN111666449B (en) | Video retrieval method, apparatus, electronic device, and computer-readable medium | |
CN112487943B (en) | Key frame de-duplication method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||