CN113722541A - Video fingerprint generation method and device, electronic equipment and storage medium


Info

Publication number
CN113722541A
CN113722541A
Authority
CN
China
Prior art keywords
video
processed
frame image
key frame
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111004360.5A
Other languages
Chinese (zh)
Inventor
陈海峰
郭彤
王泽楷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd
Priority to CN202111004360.5A
Publication of CN113722541A
Priority to PCT/CN2022/076929

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval of video data
    • G06F 16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783: Retrieval characterised by using metadata automatically derived from the content
    • G06F 16/75: Clustering; Classification
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/23: Clustering techniques
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/10: Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F 21/16: Program or content traceability, e.g. by watermarking
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/08: Learning methods

Abstract

The disclosure relates to a video fingerprint generation method and device, an electronic device and a storage medium. The method comprises the following steps: determining at least one key frame image in a video to be processed; for any key frame image in the at least one key frame image, performing feature extraction on the key frame image through a first neural network to obtain feature information of the key frame image, wherein the first neural network is trained by adopting a video frame image set in advance, the video frame image set comprises video frame images belonging to a plurality of video clip sets, different video clips in any one of the video clip sets are obtained based on the same original edition video clip, and different video frame images belonging to the same video clip set in the video frame image set have the same category labeling information; and determining fingerprint information of the video to be processed according to the characteristic information of the at least one key frame image.

Description

Video fingerprint generation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of video technologies, and in particular, to a method and an apparatus for generating a video fingerprint, an electronic device, and a storage medium.
Background
Among media such as audio, video images, and text, video images carry the richest information. Faced with the massive video image data generated on the Internet, mining the key information of video images and retrieving video images efficiently is a highly challenging and significant technical problem.
Disclosure of Invention
The present disclosure provides a technical solution for generating video fingerprints.
According to an aspect of the present disclosure, there is provided a method for generating a video fingerprint, including:
determining at least one key frame image in a video to be processed;
for any key frame image in the at least one key frame image, performing feature extraction on the key frame image through a first neural network to obtain feature information of the key frame image, wherein the first neural network is trained by adopting a video frame image set in advance, the video frame image set comprises video frame images belonging to a plurality of video clip sets, different video clips in any one of the video clip sets are obtained based on the same original edition video clip, and different video frame images belonging to the same video clip set in the video frame image set have the same category labeling information;
and determining fingerprint information of the video to be processed according to the characteristic information of the at least one key frame image.
The first neural network is trained using a video frame image set, where the video frame image set includes video frame images belonging to a plurality of video segment sets, different video segments in any one of the video segment sets are obtained based on the same original edition video segment, and different video frame images belonging to the same video segment set have the same category label information. The first neural network can thus learn to extract similar feature information from video frame images in different video segments obtained based on the same original edition video segment, so that the feature information extracted by the trained first neural network can cope with different video attack means, and the accuracy of video comparison can be improved. After training of the first neural network is completed, for any key frame image in at least one key frame image of the video to be processed, feature extraction is performed on the key frame image through the first neural network to obtain feature information of the key frame image, and the fingerprint information of the video to be processed is determined according to the feature information of the at least one key frame image. The determined fingerprint information of the video to be processed therefore has high stability, can cope with different video attack means, and facilitates copyright protection of the video.
In a possible implementation manner, the any video clip set includes an original video clip and a copied video clip obtained by performing attack processing based on the original video clip.
The video frame images belonging to the original edition video segment and the video frame images belonging to the copy video segment of the original edition video segment are adopted to train the first neural network, and the video frame images belonging to the original edition video segment and the video frame images belonging to the copy video segment of the original edition video segment have the same category marking information, so that the first neural network can learn the capability of extracting similar characteristic information from the video frame images in the corresponding original edition video segment and the copy video segment, the characteristic information extracted by the first neural network obtained through training can cope with video attacks, and the accuracy of video comparison can be improved.
In a possible implementation manner, the any video segment set includes a plurality of copied video segments obtained by performing attack processing based on the same original video segment.
The first neural network is trained by adopting the video frame images of the plurality of the copied video segments belonging to the original edition video segments, and the video frame images belonging to the plurality of the copied video segments have the same category label information, so that the first neural network can learn the capability of extracting similar characteristic information from the video frame images in different copied video segments obtained based on the same original edition video segment, the characteristic information extracted by the trained first neural network can cope with different video attack means, and the accuracy of video comparison can be improved.
In one possible implementation, the video frame images belonging to any one of the video clip sets in the video frame image set include: key frame images of video clips in the set of video clips.
In this implementation, for any one of the plurality of video segment sets, the first neural network is trained by using the key frame images of the video segments in the video segment set, so that a better training effect can be obtained by using fewer training images.
In one possible implementation, the determining at least one key frame image in the video to be processed includes:
performing shot segmentation on a video to be processed to obtain at least one video clip of the video to be processed;
key frame images are respectively determined in the at least one video segment.
The method comprises the steps of obtaining at least one video clip of a video to be processed by carrying out shot segmentation on the video to be processed, respectively determining key frame images in the at least one video clip, and determining fingerprint information of the video to be processed according to the feature information of the at least one key frame image, so that the feature extraction of redundant video frame images can be greatly reduced, the data size of the determined fingerprint information of the video to be processed is small, the required storage space is small, and the rapid retrieval of massive videos can be realized. In addition, the determined fingerprint information of the video to be processed can cover the key visual information of each shot in the video to be processed, covering a large amount of the video's information with a small amount of data, with high reliability, so that rapid and accurate video retrieval can be realized. Thus, the implementation can provide a technically feasible, commercially viable video fingerprint generation scheme.
In one possible implementation, after obtaining the feature information of the key frame image, the method further includes:
for any video clip in the at least one video clip, determining the characteristic information of the key frame image in the video clip as the fingerprint information of the video clip.
According to the implementation mode, the fingerprint information of each video segment in the video to be processed can be obtained, the fingerprint information of each video segment of the video to be processed determined according to the example has high stability, different video attack means can be responded, and the copyright protection of the video segments is facilitated.
In a possible implementation manner, the performing shot segmentation on the video to be processed to obtain at least one video segment of the video to be processed includes:
performing shot segmentation on a video to be processed through a second neural network to obtain a preliminary segmentation result of video segments of the video to be processed;
and obtaining at least one video segment of the video to be processed based on the preliminary segmentation result.
In the implementation mode, the shot segmentation is carried out on the video to be processed through the second neural network, so that the accuracy and the speed of the shot segmentation on the video to be processed can be improved.
In a possible implementation manner, the obtaining at least one video segment of the video to be processed based on the preliminary segmentation result includes:
in the case that the number of video segments in the preliminary segmentation result is greater than or equal to 2, merging, for a first video segment and a second video segment that are adjacent in the preliminary segmentation result, the first video segment and the second video segment in response to a similarity between a last frame of the first video segment and a first frame of the second video segment being greater than or equal to a first preset threshold, wherein the second video segment is a next video segment of the first video segment.
According to the implementation mode, adjacent video clips with slight jitter or small variation can be combined, so that the number of key frame images for feature extraction can be further reduced, and the data volume of fingerprint information of the video to be processed can be further reduced.
In one possible implementation, the determining key frame images in the at least one video segment respectively includes:
for any video clip in the at least one video clip, determining a key frame image of the video clip according to a video frame in the middle position of the video clip.
In this implementation, for any one of the at least one video segment, the key frame image of the video segment is determined according to the video frame located in the middle of the video segment, so that the speed of determining the key frame image in each video segment can be greatly increased on the premise that the determined key frame image contains richer visual information of the video segment.
In a possible implementation manner, the determining fingerprint information of the video to be processed according to the feature information of the at least one key frame image includes:
and under the condition that the number of the key frame images is at least two, according to the sequence of the at least two key frame images in the video to be processed, forming a feature sequence by using the feature information of the at least two key frame images, and using the feature sequence as the fingerprint information of the video to be processed.
The fingerprint information of the video to be processed determined according to the implementation mode can reflect the time sequence information of the visual information in the video to be processed, so that the accuracy of video comparison is further improved.
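The text does not prescribe a data layout for this feature sequence, so the following minimal Python sketch is only illustrative: it assumes the feature information of each key frame is a fixed-length numeric vector and stacks the vectors in playback order (function and variable names are assumptions, not from the source).

```python
import numpy as np

def build_video_fingerprint(keyframe_features: list[np.ndarray]) -> np.ndarray:
    """Form the fingerprint of the video to be processed as a feature
    sequence: per-keyframe feature vectors stacked in the order the key
    frames appear in the video, so temporal order is preserved."""
    if not keyframe_features:
        raise ValueError("at least one key frame feature is required")
    # Shape: (num_keyframes, feature_dim), an ordered feature sequence.
    return np.stack(keyframe_features, axis=0)
```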
In one possible implementation, after the determining the fingerprint information of the video to be processed, the method further includes at least one of:
adding the fingerprint information of the video to be processed into a header file of the video to be processed;
adding fingerprint information of the video to be processed to the least significant bits of a video frame of the video to be processed;
and adding the characteristic information of the key frame image to the least significant bit of the video frame in the video clip to which the key frame image belongs.
According to this implementation, a digital watermark can be added to the video to be processed. The digital watermark is invisible to users watching the video, that is, it has no influence on the video quality, so adding it does not disturb viewers and the viewing experience is preserved. When a video copyright dispute arises, the original video and the copied video can be distinguished according to the digital watermark of the video, that is, the copyright ownership of the video can be identified.
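As a concrete illustration of least-significant-bit embedding, here is a minimal sketch assuming 8-bit (uint8) pixel values and a fingerprint already serialized to a bit array; this is an assumed encoding for illustration, not the mandated one.

```python
import numpy as np

def embed_bits_lsb(frame: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Write a bit string into the least significant bits of a uint8 video
    frame. Only the lowest bit of each affected pixel value changes, so
    the watermark is invisible to viewers."""
    flat = frame.reshape(-1).copy()
    b = bits.astype(np.uint8)
    n = min(b.size, flat.size)
    flat[:n] = (flat[:n] & 0xFE) | b[:n]  # clear the LSB, then write the bit
    return flat.reshape(frame.shape)

def extract_bits_lsb(frame: np.ndarray, n_bits: int) -> np.ndarray:
    """Recover the first n_bits embedded by embed_bits_lsb."""
    return frame.reshape(-1)[:n_bits] & 1
```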
In one possible implementation, after the determining the fingerprint information of the video to be processed, the method further includes:
and under the condition that the video to be processed belongs to the original edition video, storing the fingerprint information of the video to be processed in a preset database, wherein the preset database is used for storing the fingerprint information of the original edition video.
According to the implementation mode, the fingerprint information of each original edition video can be stored in the preset database, so that the storage of the fingerprint information of the original edition video can be realized. And when a newly uploaded video is detected, comparing the fingerprint information of the newly uploaded video with the fingerprint information of the original edition videos in the preset database to judge whether the newly uploaded video is a duplicated video of any original edition video in the preset database.
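A minimal sketch of this flow, under two assumptions not fixed by the text: the preset database is modeled as an in-memory mapping from video id to fingerprint, and cosine similarity with an illustrative threshold is used for the comparison.

```python
import numpy as np

fingerprint_db: dict[str, np.ndarray] = {}  # hypothetical preset database

def store_original_fingerprint(video_id: str, fingerprint: np.ndarray) -> None:
    """Store the fingerprint of a video that belongs to the original edition videos."""
    fingerprint_db[video_id] = fingerprint

def find_matching_originals(new_fp: np.ndarray, threshold: float = 0.9) -> list:
    """Compare a newly uploaded video's fingerprint with every stored
    original fingerprint; the threshold value is an assumption."""
    matches = []
    for vid, fp in fingerprint_db.items():
        if fp.shape != new_fp.shape:
            continue  # this sketch only compares same-length fingerprints
        sim = float((fp.ravel() @ new_fp.ravel()) /
                    (np.linalg.norm(fp) * np.linalg.norm(new_fp) + 1e-8))
        if sim >= threshold:
            matches.append((vid, sim))
    return matches
```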
In one possible implementation, after the feature information of the key frame image is determined as the fingerprint information of the video segment, the method further includes:
comparing the fingerprint information of the at least one video clip with the fingerprint information of the video clip of any original edition video to obtain a first comparison result;
and in response to the first comparison result meeting a first preset condition, determining the video to be processed as a copy video of the original video.
According to the implementation mode, the duplicated video can be found quickly.
In one possible implementation manner, the first preset condition includes:
and the continuous number of the video clips matched with the original edition video of the video to be processed reaches a second preset threshold value.
According to the implementation mode, the accuracy of video comparison can be improved.
In one possible implementation, after the determining the to-be-processed video as a copy of the original video, the method further includes:
and determining a propagation path of the copied video of the original edition video according to the creation time of the video to be processed and the creation time of other copied videos of the original edition video.
In this implementation, the propagation path of the duplicated videos of the original video may be determined by sorting according to the creation times of the multiple duplicated videos of the original video, thereby facilitating the infringement analysis of the videos.
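A minimal sketch of this sorting step, assuming each detected copy is recorded with its creation time (the record layout and names are illustrative):

```python
from datetime import datetime

def propagation_path(copies: list[dict]) -> list[str]:
    """Order the detected copies of one original video by creation time;
    the sorted sequence approximates the propagation path of the copies.

    Each dict is assumed to hold a video id and its creation timestamp,
    e.g. {"id": "v42", "created": datetime(2021, 8, 30, 12, 0)}.
    """
    ordered = sorted(copies, key=lambda v: v["created"])
    return [v["id"] for v in ordered]
```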
In one possible implementation, after the feature information of the key frame image is determined as the fingerprint information of the video segment, the method further includes:
comparing the fingerprint information of the at least one video segment with the fingerprint information of the video segment of the video to be compared under the condition that the video to be processed belongs to the original edition video to obtain a second comparison result;
and determining the video to be compared as a copy video of the video to be processed in response to the second comparison result meeting a second preset condition.
In this implementation, after the user uploads the original edition video, a large-scale search can be performed based on fingerprint information of the original edition video to find a duplicated video in the history video.
In one possible implementation, after the feature information of the key frame image is determined as the fingerprint information of the video segment, the method further includes:
acquiring fingerprint information of video clips of videos in a to-be-processed video set, wherein the to-be-processed video set comprises the to-be-processed videos;
clustering fingerprint information of video segments of the videos in the to-be-processed video set to obtain fingerprint information of a clustering center;
and storing the fingerprint information of the clustering center.
According to this implementation, massive video segments can be clustered, their fingerprint information collided (matched against one another), and a clustering result based on the fingerprint information of the massive video segments obtained, so that automatic classification and structuring of video segments can be realized and the efficiency of subsequent retrieval improved. For example, it may be applied to the automatic classification of entertainment media video segments and the structuring of entertainment video information.
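As an illustration, clustering segment fingerprints and keeping only the cluster centers could look like the following sketch, which assumes fixed-length fingerprint vectors and uses k-means from scikit-learn; the clustering algorithm and parameter values are assumptions, not specified by the text.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_segment_fingerprints(fingerprints: np.ndarray,
                                 n_clusters: int) -> np.ndarray:
    """Cluster per-segment fingerprint vectors and return the cluster
    centers, which can then be stored as compact class-level fingerprints.

    `fingerprints` has shape (num_segments, feature_dim)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    km.fit(fingerprints)
    return km.cluster_centers_
```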
According to an aspect of the present disclosure, there is provided a video fingerprint generation apparatus, including:
the first determination module is used for determining at least one key frame image in the video to be processed;
a feature extraction module, configured to perform feature extraction on any key frame image in the at least one key frame image through a first neural network to obtain feature information of the key frame image, where the first neural network is trained in advance by using a video frame image set, the video frame image set includes video frame images belonging to multiple video clip sets, different video clips in any one of the multiple video clip sets are obtained based on a same original version video clip, and different video frame images belonging to a same video clip set in the video frame image set have the same category tagging information;
and the second determining module is used for determining the fingerprint information of the video to be processed according to the characteristic information of the at least one key frame image.
In a possible implementation manner, the any video clip set includes an original video clip and a copied video clip obtained by performing attack processing based on the original video clip.
In a possible implementation manner, the any video segment set includes a plurality of copied video segments obtained by performing attack processing based on the same original video segment.
In one possible implementation, the video frame images belonging to any one of the video clip sets in the video frame image set include: key frame images of video clips in the set of video clips.
In one possible implementation manner, the first determining module is configured to:
performing shot segmentation on a video to be processed to obtain at least one video clip of the video to be processed;
key frame images are respectively determined in the at least one video segment.
In one possible implementation, the apparatus further includes:
and the third determining module is used for determining the characteristic information of the key frame image in the video clip as the fingerprint information of the video clip for any video clip in the at least one video clip.
In one possible implementation manner, the first determining module is configured to:
performing shot segmentation on a video to be processed through a second neural network to obtain a preliminary segmentation result of video segments of the video to be processed;
and obtaining at least one video segment of the video to be processed based on the preliminary segmentation result.
In one possible implementation manner, the first determining module is configured to:
in the case that the number of video segments in the preliminary segmentation result is greater than or equal to 2, merging, for a first video segment and a second video segment that are adjacent in the preliminary segmentation result, the first video segment and the second video segment in response to a similarity between a last frame of the first video segment and a first frame of the second video segment being greater than or equal to a first preset threshold, wherein the second video segment is a next video segment of the first video segment.
In one possible implementation manner, the first determining module is configured to:
for any video clip in the at least one video clip, determining a key frame image of the video clip according to a video frame in the middle position of the video clip.
In one possible implementation manner, the second determining module is configured to:
and under the condition that the number of the key frame images is at least two, according to the sequence of the at least two key frame images in the video to be processed, forming a feature sequence by using the feature information of the at least two key frame images, and using the feature sequence as the fingerprint information of the video to be processed.
In one possible implementation manner, the apparatus further includes an adding module, and the adding module is configured to at least one of:
adding the fingerprint information of the video to be processed into a header file of the video to be processed;
adding fingerprint information of the video to be processed to the least significant bits of a video frame of the video to be processed;
and adding the characteristic information of the key frame image to the least significant bit of the video frame in the video clip to which the key frame image belongs.
In one possible implementation, the apparatus further includes:
the first storage module is used for storing the fingerprint information of the video to be processed in a preset database under the condition that the video to be processed belongs to an original edition video, wherein the preset database is used for storing the fingerprint information of the original edition video.
In one possible implementation, the apparatus further includes:
the first comparison module is used for comparing the fingerprint information of the at least one video clip with the fingerprint information of the video clip of any original edition video to obtain a first comparison result;
and the fourth determining module is used for determining the video to be processed as a duplicated video of the original video in response to the first comparison result meeting a first preset condition.
In one possible implementation manner, the first preset condition includes:
and the continuous number of the video clips matched with the original edition video of the video to be processed reaches a second preset threshold value.
In one possible implementation, the apparatus further includes:
and the fifth determining module is used for determining the propagation path of the duplicated video of the original edition video according to the creation time of the video to be processed and the creation time of other duplicated videos of the original edition video.
In one possible implementation, the apparatus further includes:
the second comparison module is used for comparing the fingerprint information of the at least one video segment with the fingerprint information of the video segment of the video to be compared under the condition that the video to be processed belongs to the original edition video to obtain a second comparison result;
and the sixth determining module is used for determining that the video to be compared is a duplicated video of the video to be processed in response to the second comparison result meeting a second preset condition.
In one possible implementation, the apparatus further includes:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring fingerprint information of video clips of videos in a video set to be processed, and the video set to be processed comprises the videos to be processed;
the clustering module is used for clustering the fingerprint information of the video segments of the videos in the video set to be processed to obtain the fingerprint information of a clustering center;
and the second storage module is used for storing the fingerprint information of the clustering center.
According to an aspect of the present disclosure, there is provided an electronic device including: one or more processors; a memory for storing executable instructions; wherein the one or more processors are configured to invoke the memory-stored executable instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In the embodiment of the disclosure, a video frame image set is used to train a first neural network, where the video frame image set includes video frame images belonging to a plurality of video segment sets, different video segments in any one of the video segment sets are obtained based on the same original edition video segment, and different video frame images belonging to the same video segment set have the same category label information. The first neural network can thus learn to extract similar feature information from video frame images in different video segments obtained based on the same original edition video segment, so that the feature information extracted by the trained first neural network can cope with different video attack means, and the accuracy of video comparison can be improved. After training of the first neural network is completed, for any key frame image in at least one key frame image of the video to be processed, feature extraction is performed on the key frame image through the first neural network to obtain feature information of the key frame image, and the fingerprint information of the video to be processed is determined according to the feature information of the at least one key frame image. The determined fingerprint information of the video to be processed therefore has high stability, can cope with different video attack means, and facilitates copyright protection of the video.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flowchart of a method for generating a video fingerprint according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram illustrating an application scenario of a video fingerprint generation method provided by an embodiment of the present disclosure.
Fig. 3 shows a block diagram of a video fingerprint generation apparatus provided by an embodiment of the present disclosure.
Fig. 4 illustrates a block diagram of an electronic device 800 provided by an embodiment of the disclosure.
Fig. 5 shows a block diagram of an electronic device 1900 provided by an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
In the embodiment of the disclosure, a video frame image set is used to train a first neural network, where the video frame image set includes video frame images belonging to a plurality of video segment sets, different video segments in any one of the video segment sets are obtained based on the same original edition video segment, and different video frame images belonging to the same video segment set have the same category label information. The first neural network can thus learn to extract similar feature information from video frame images in different video segments obtained based on the same original edition video segment, so that the feature information extracted by the trained first neural network can cope with different video attack means, and the accuracy of video comparison can be improved. After training of the first neural network is completed, for any key frame image in at least one key frame image of the video to be processed, feature extraction is performed on the key frame image through the first neural network to obtain feature information of the key frame image, and the fingerprint information of the video to be processed is determined according to the feature information of the at least one key frame image. The determined fingerprint information of the video to be processed therefore has high stability, can cope with different video attack means, and facilitates copyright protection of the video.
The following describes a method for generating a video fingerprint according to an embodiment of the present disclosure in detail with reference to the accompanying drawings. Fig. 1 shows a flowchart of a method for generating a video fingerprint according to an embodiment of the present disclosure. In a possible implementation manner, the video fingerprint generation method may be executed by a terminal device or a server or other processing devices. The terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, or a wearable device. In some possible implementations, the method for generating a video fingerprint may be implemented by a processor calling computer-readable instructions stored in a memory. As shown in fig. 1, the method for generating a video fingerprint includes steps S11 to S13.
In step S11, at least one key frame image in the video to be processed is determined.
In step S12, for any key frame image in the at least one key frame image, feature extraction is performed on the key frame image through a first neural network to obtain feature information of the key frame image, where the first neural network is trained in advance by using a video frame image set, the video frame image set includes video frame images belonging to a plurality of video clip sets, different video clips in any one of the plurality of video clip sets are obtained based on the same original video clip, and different video frame images belonging to the same video clip set in the video frame image set have the same category label information.
In step S13, fingerprint information of the video to be processed is determined according to the feature information of the at least one key frame image.
In the embodiment of the present disclosure, the video to be processed may represent any video that needs to generate fingerprint information. Wherein the fingerprint information may be information for uniquely identifying the video. In the embodiment of the disclosure, one or more than two key frame images can be determined from a video to be processed, feature information of each key frame image is extracted through a first neural network, and fingerprint information of the video to be processed is determined according to the feature information of each key frame image. The key frame image in the video to be processed may represent a video frame image that contains richer visual information of the video to be processed in the video frame image of the video to be processed.
In the embodiment of the present disclosure, feature extraction may be performed on each key frame image by using the first neural network, where the first neural network may be a deep neural network (DNN). Compared with traditional hand-crafted features such as the Scale-Invariant Feature Transform (SIFT) feature and the Histogram of Oriented Gradients (HOG) feature, extracting deep features of the key frame image with the first neural network can capture richer visual information in the key frame image.
In embodiments of the present disclosure, a first neural network may be trained using a set of video frame images. The video frame image set comprises a plurality of video frame images, and each video frame image can have category label information. The category label information of any video frame image can be used to indicate the category to which the video frame image belongs. For example, if the number of categories is 5 and a certain video frame image in the video frame image set belongs to the second category of the 5 categories, the category label information of the video frame image may be [0,1,0,0,0]; for another example, if the number of categories is 4 and a certain video frame image in the video frame image set belongs to the third category of the 4 categories, the category label information of the video frame image may be [0,0,1,0]. In practical applications, the number of categories is usually large. For example, a large number of classes of video frame images may be collected to train the first neural network, and the trained first neural network may be used as a feature extractor for extracting features of the key frame images. Of course, the embodiments of the present disclosure may also be applied to an application scenario with a small number of categories, which is not limited herein.
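The text does not name a network architecture or training recipe, so the following PyTorch sketch is only an assumption for illustration: a ResNet-18 backbone trained as a classifier over the clip-set categories, after which the classification head is dropped and the remaining embedding serves as the feature extractor.

```python
import torch
import torch.nn as nn
import torchvision.models as models

num_classes = 1000  # number of video-clip-set categories (illustrative)
backbone = models.resnet18(weights=None)
backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)

# Integer class indices play the role of the one-hot category label
# information described above.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.01, momentum=0.9)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One training step: frames from the same video clip set share a label,
    which pushes their features together in the embedding space."""
    optimizer.zero_grad()
    loss = criterion(backbone(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# After training, strip the classifier head; the pooled embedding is used
# as the feature information of a key frame image.
feature_extractor = nn.Sequential(*list(backbone.children())[:-1])
```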
In the embodiment of the present disclosure, the video frame image set used for training the first neural network includes a plurality of groups of video frame images, where each group of video frame images includes at least one video frame image, each video frame image in each group of video frame images belongs to the same video segment set, and the category labeling information of each video frame image in each group of video frame images is the same.
For example, the video frame image set includes video frame images 1 to 10, where video frame images 1 to 10 are video frame images in video segments 1 to 10, respectively; video segment set 1 includes video segments 1, 2 and 3, video segment set 2 includes video segments 4, 5, 6 and 7, and video segment set 3 includes video segments 8, 9 and 10. The category label information of video frame images 1, 2 and 3 is [1,0,0], the category label information of video frame images 4, 5, 6 and 7 is [0,1,0], and the category label information of video frame images 8, 9 and 10 is [0,0,1].
For another example, suppose original video O includes video segment 1, video segment 2, and video segment 3; a copied video P1 of original video O includes video segment 1', video segment 2', and video segment 3'; and another copied video P2 of original video O includes video segment 1'', video segment 2'', and video segment 3''. If video segments 1, 1', and 1'' correspond to one another, video segments 2, 2', and 2'' correspond to one another, and video segments 3, 3', and 3'' correspond to one another, then video segments 1, 1', and 1'' may be attributed to video segment set 1, video segments 2, 2', and 2'' to video segment set 2, and video segments 3, 3', and 3'' to video segment set 3. Accordingly, [1,0,0] may be used as the category label information of the video frame images in video segments 1, 1', and 1''; [0,1,0] as the category label information of the video frame images in video segments 2, 2', and 2''; and [0,0,1] as the category label information of the video frame images in video segments 3, 3', and 3''.
For any of the plurality of video clip sets, the video clip set includes at least one video clip. In the case that the video clip set includes two or more video clips, each video clip in the video clip set is derived based on the same original edition video clip. The video clips obtained based on any original edition video clip may include at least one of: the original edition video clip, a video clip obtained by copying the original edition video clip, and a copied video clip obtained by carrying out attack processing based on the original edition video clip. The copied video clip obtained by performing attack processing based on the original edition video clip may include: a copied video clip obtained by carrying out attack processing on the original edition video clip, and/or a copied video clip obtained by carrying out attack processing on a copied video clip of the original edition video clip. The copied video clips of an original edition video clip may include multi-level copied video clips of that original edition video clip. A first-level copied video clip of the original edition video clip represents a copied video clip obtained by performing attack processing directly on the original edition video clip. An Nth-level copied video clip of the original edition video clip represents a copied video clip obtained by performing attack processing on an (N-1)th-level copied video clip of the original edition video clip, where N is an integer greater than or equal to 2. That is, a copied video clip of the original edition video clip at the second level or higher represents a copied video clip obtained by performing attack processing on a copied video clip of the original edition video clip, rather than directly on the original edition video clip.
Wherein, the attack processing of the video clip can represent the modification of the video clip, so that the modified video clip is different from the original video clip. Correspondingly, the attack processing is carried out on the video, which can mean that the video is modified, so that the modified video is different from the original video. The manner in which the video segments are treated for attacks may include geometric attacks and/or non-geometric attacks. For example, the attack processing manner of the video segment may include at least one of gamma correction, resolution adjustment, picture distortion, edge cropping, watermarking, video format conversion, coding manner conversion, advertisement splicing, video flipping, gaussian filtering, video compression, video occlusion, and the like. It should be noted that, although the manner of performing attack processing on a video segment is described above by way of example, those skilled in the art can understand that the present disclosure should not be limited thereto.
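To make the attack-processing step concrete, here is a minimal OpenCV sketch that generates a few of the attacked variants listed above from a single frame; the parameter values are illustrative assumptions, and applying such transforms to every frame of an original segment would yield copied segments for training.

```python
import cv2
import numpy as np

def gamma_correct(frame: np.ndarray, gamma: float = 1.5) -> np.ndarray:
    # One of the non-geometric attacks listed above (gamma correction).
    return np.clip(((frame / 255.0) ** gamma) * 255.0, 0, 255).astype(np.uint8)

def simulate_attacks(frame: np.ndarray) -> list[np.ndarray]:
    """Produce attacked variants of one uint8 video frame."""
    h, w = frame.shape[:2]
    return [
        gamma_correct(frame),                               # gamma correction
        cv2.resize(frame, (w // 2, h // 2)),                # resolution adjustment
        cv2.flip(frame, 1),                                 # video flipping
        cv2.GaussianBlur(frame, (5, 5), 0),                 # gaussian filtering
        frame[h // 10: h - h // 10, w // 10: w - w // 10],  # edge cropping
    ]
```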
In a possible implementation manner, the any video clip set includes an original video clip and a copied video clip obtained by performing attack processing based on the original video clip. In this implementation, any of the plurality of video segment sets may include an original video segment and a duplicate video segment of the original video segment. Wherein, the number of the duplicated video clips obtained by carrying out attack processing on any video clip set based on the original edition video clip can be one or more than two. The video frame images belonging to the original edition video segment and the video frame images belonging to the copy video segment of the original edition video segment are adopted to train the first neural network, and the video frame images belonging to the original edition video segment and the video frame images belonging to the copy video segment of the original edition video segment have the same category marking information, so that the first neural network can learn the capability of extracting similar characteristic information from the video frame images in the corresponding original edition video segment and the copy video segment, the characteristic information extracted by the first neural network obtained through training can cope with video attacks, and the accuracy of video comparison can be improved.
In another possible implementation manner, any one of the video segment sets includes a plurality of copied video segments obtained by performing attack processing based on the same original video segment. In this implementation, any of the plurality of video segment sets may include more than two duplicated video segments of the master video segment. The first neural network is trained by adopting the video frame images of the plurality of the copied video segments belonging to the original edition video segments, and the video frame images belonging to the plurality of the copied video segments have the same category label information, so that the first neural network can learn the capability of extracting similar characteristic information from the video frame images in different copied video segments obtained based on the same original edition video segment, the characteristic information extracted by the trained first neural network can cope with different video attack means, and the accuracy of video comparison can be improved.
In an embodiment of the present disclosure, the video frame images belonging to any video clip set in the video frame image set may include: any one or more than two video frame images in each video segment of the video segment set.
In one possible implementation, the video frame images belonging to any one of the video clip sets in the video frame image set include: key frame images of video clips in the set of video clips. In this implementation, for any one of the plurality of video segment sets, the key frame image of each video segment in the video segment set may be used as the video frame image in the video frame image set, that is, the key frame image of each video segment in the video segment set may be used to train the first neural network. Wherein, the amount of the key frame image of any video clip in any video clip set in the plurality of video clip sets can be one or more than two. For example, a video clip set 1 includes a video clip 1, a video clip 2, and a video clip 3, and the video frame image set may include a key frame image 1 of the video clip 1, a key frame image 2 of the video clip 2, and a key frame image 3 of the video clip 3. In this implementation, for any one of the plurality of video segment sets, the first neural network is trained by using the key frame images of the video segments in the video segment set, so that a better training effect can be obtained by using fewer training images.
In another possible implementation manner, the video frame images belonging to any video clip set in the video frame image set include: video frame images other than the key frame images of the video clips in the video clip set. For example, the video frame images belonging to any one of the video clip sets in the video frame image set may include: the key frame images of the video clips in the video clip set, and the video frame images except the key frame images of the video clips in the video clip set.
In one possible implementation, the determining at least one key frame image in the video to be processed includes: performing shot segmentation on a video to be processed to obtain at least one video clip of the video to be processed; key frame images are respectively determined in the at least one video segment. In this implementation, shot segmentation is performed on the video to be processed, so that one or more than two video clips of the video to be processed can be obtained. By performing shot segmentation on the video to be processed, the starting time point and the ending time point of each video clip in the video to be processed can be obtained. For any of at least one video segment of a video to be processed, one or more key frame images may be determined in the video segment. For example, a key frame image may be determined in each video clip of the video to be processed. For example, if the video to be processed includes M video segments, one key frame image may be determined in each of the M video segments to obtain M key frame images, where M is an integer greater than or equal to 1.
For example, at a frame rate of 25 frames/second, 1 minute of video includes 60 × 25 = 1500 frames, and a large volume of videos last tens of minutes or hours. Identifying every frame of a video would consume a large amount of computing resources and is not feasible in practice. In the related art, frames are usually sampled from a video, for example 1 frame every second. However, sampling 1 frame per second still extracts 3600 frames from a 1-hour video; given the Internet's current video production volume, the number of frames to extract per day remains a considerable order of magnitude. In the process of implementing the present invention, the inventors of the present application found that, during video shooting, the picture information varies greatly with the shot's angle, focal length, pitch angle, and direction, while within the same shot the background information is highly consistent and the features are highly similar. In a video, shots switch at a low frequency, so performing feature extraction per shot can reduce the number of video frame images used while still covering the visual information of every shot of the video. In this implementation, at least one video clip of the video to be processed is obtained by performing shot segmentation on the video to be processed, key frame images are respectively determined in the at least one video clip, and fingerprint information of the video to be processed is determined according to the feature information of the at least one key frame image. Feature extraction of redundant video frame images can thereby be greatly reduced, the data size of the determined fingerprint information of the video to be processed is small, the required storage space is small, and rapid retrieval of massive videos can be realized. In addition, the determined fingerprint information of the video to be processed can cover the key visual information of each shot in the video to be processed, covering a large amount of the video's information with a small amount of data, with high reliability, so that rapid and accurate video retrieval can be realized. Thus, the implementation can provide a technically feasible, commercially viable video fingerprint generation scheme.
As an example of this implementation, the performing shot segmentation on the video to be processed to obtain at least one video segment of the video to be processed includes: performing shot segmentation on the video to be processed through a second neural network to obtain a preliminary segmentation result of video segments of the video to be processed; and obtaining at least one video segment of the video to be processed based on the preliminary segmentation result. The second neural network may be a deep neural network; for example, it may employ a neural network such as C3D, I3D, or TSN. The second neural network may be trained in advance using a training video set, where each training video in the training video set may include annotation data of shot switching time points. In this example, the preliminary segmentation result may be used as the final segmentation result of the video segments of the video to be processed, or the preliminary segmentation result may be further processed to obtain the final segmentation result. In this example, performing shot segmentation on the video to be processed through the second neural network can improve the accuracy and speed of shot segmentation of the video to be processed.
In one example, the obtaining at least one video segment of the video to be processed based on the preliminary segmentation result includes: in the case that the number of video segments in the preliminary segmentation result is greater than or equal to 2, merging, for a first video segment and a second video segment that are adjacent in the preliminary segmentation result, the first video segment and the second video segment in response to a similarity between a last frame of the first video segment and a first frame of the second video segment being greater than or equal to a first preset threshold, wherein the second video segment is a next video segment of the first video segment. In this example, if the similarity between the last frame of the first video segment and the first frame of the second video segment is less than the first preset threshold, it may be determined that the difference between the first video segment and the second video segment is large, and thus the first video segment and the second video segment may not be merged. According to this example, adjacent video clips with little jitter or variation can be merged, whereby the number of key frame images subjected to feature extraction can be further reduced, and thus the data amount of fingerprint information of a video to be processed can be further reduced.
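A minimal sketch of this merging rule, assuming segments are lists of frames and using cosine similarity between the boundary frames; the text does not fix a similarity measure, and the threshold below stands in for the first preset threshold.

```python
import numpy as np

def frame_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two frames (an assumed measure)."""
    x, y = a.astype(np.float32).ravel(), b.astype(np.float32).ravel()
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-8))

def merge_segments(segments: list[list[np.ndarray]],
                   threshold: float = 0.95) -> list[list[np.ndarray]]:
    """Merge adjacent segments whose boundary frames are near-duplicates:
    the last frame of one segment vs. the first frame of the next."""
    if not segments:
        return []
    merged = [segments[0]]
    for seg in segments[1:]:
        if frame_similarity(merged[-1][-1], seg[0]) >= threshold:
            merged[-1] = merged[-1] + seg  # similar boundary: merge
        else:
            merged.append(seg)
    return merged
```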
As another example of the implementation, for any two adjacent video frame images of the video to be processed, the similarity of the two video frame images may be calculated, and in response to the similarity of the two video frame images being less than a fourth preset threshold, it may be determined that a shot cut has occurred between the two video frame images.
In this implementation, for any video clip in the at least one video clip, the key frame image of the video clip may be determined according to any frame in the video clip.
As an example of this implementation, the determining key frame images in the at least one video clip respectively includes: for any video clip in the at least one video clip, determining a key frame image of the video clip according to a video frame in the middle position of the video clip. For example, the f-th frame of the video segment may be determined as a key frame image of the video segment, where f = [F/2] or f = [F/2] + 1, F denotes the number of video frames in the video segment, and [F/2] denotes the largest integer not greater than F/2. In this example, by determining the key frame image of each video clip according to the video frame at the middle position of the video clip, the speed of determining the key frame images in the video clips can be greatly increased on the premise that the determined key frame images contain richer visual information of the video clips.
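In code, this middle-frame rule is a one-liner; the sketch below uses 0-based indexing, which differs only by one frame from the [F/2] / [F/2] + 1 convention above.

```python
def middle_keyframe(frames: list):
    """Pick the frame at the middle position of a video segment as its
    key frame image: index [F/2] with F = len(frames), 0-based here."""
    if not frames:
        raise ValueError("empty video segment")
    return frames[len(frames) // 2]
```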
In this implementation, the feature information of the key frame image may be in the form of data such as "1010101010101010100110101010101010101010101", and is not limited herein.
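The disclosure does not fix how such a bit string is produced. One common scheme, shown here purely as an assumption, is sign binarization of the embedding output by the first neural network.

    import numpy as np

    def binarize_features(embedding):
        """Turn a float embedding into a '1010...' bit string (assumed scheme)."""
        bits = (np.asarray(embedding) > 0).astype(int)  # sign binarization
        return "".join(str(b) for b in bits)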
As an example of this implementation, after obtaining the feature information of the key frame image, the method further includes: for any video clip in the at least one video clip, determining the feature information of the key frame image in the video clip as the fingerprint information of the video clip. In this way, fingerprint information can be obtained for each video segment of the video to be processed; the segment-level fingerprint information determined according to this example has high stability, can cope with different video attack means, and facilitates copyright protection of the video segments.
In one example, after the determining the fingerprint information of the video segment, the method further comprises: comparing the fingerprint information of the at least one video segment with the fingerprint information of the video segments of any original edition video to obtain a first comparison result; and in response to the first comparison result meeting a first preset condition, determining the video to be processed as a copied video of the original video. In this example, in the case that the video to be processed has not yet been determined to be an original video, fingerprint information of the video segments in the video to be processed may be compared with fingerprint information of the video segments of one or more original videos to determine whether the video to be processed is a copied video of any of the original videos. In this example, the first comparison result may represent a comparison result of the fingerprint information of the video segments of the video to be processed with the fingerprint information of the video segments of any original video; for example, the first comparison result may include the similarity of the fingerprint information of each video segment of the video to be processed to the fingerprint information of each video segment of any original video. If the first comparison result meets the first preset condition, it may be determined that the video to be processed is a copied video of the original video; if the first comparison result does not meet the first preset condition, it may be determined that the video to be processed is not a copied video of the original video. According to this example, copied videos can be found quickly.
In one example, the first preset condition includes: the number of consecutive video segments of the video to be processed that match the original edition video reaches a second preset threshold. If the similarity between the fingerprint information of a third video segment of the video to be processed and the fingerprint information of a fourth video segment of the original edition video is greater than or equal to a fifth preset threshold, it may be determined that the third video segment matches the fourth video segment, where the third video segment represents any video segment of the video to be processed and the fourth video segment represents any video segment of the original edition video; if the similarity between the fingerprint information of the third video segment and that of the fourth video segment is less than the fifth preset threshold, it may be determined that the third video segment does not match the fourth video segment. For example, if P consecutive video segments in the video to be processed sequentially match P consecutive video segments in the original video, the video to be processed may be determined to be a copied video of the original video, where P represents the second preset threshold. For example, the second preset threshold may be 3. Of course, a person skilled in the art may flexibly set the second preset threshold according to the requirements of the actual application scenario, which is not limited herein. According to this example, the accuracy of video comparison can be improved.
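The consecutive-match condition can be sketched as follows, with per-segment fingerprints and an assumed similarity function over fingerprint pairs; the parameter names mirror the fifth and second preset thresholds above.

    def has_consecutive_match(query_fps, original_fps, similarity,
                              fifth_threshold=0.8, second_threshold=3):
        """True if second_threshold consecutive segments of the query match
        consecutive segments of the original, in order."""
        P = second_threshold
        for i in range(len(query_fps) - P + 1):
            for j in range(len(original_fps) - P + 1):
                if all(similarity(query_fps[i + k], original_fps[j + k])
                       >= fifth_threshold for k in range(P)):
                    return True
        return False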
In another example, the first preset condition includes: the number of video segments of the video to be processed that match the original edition video reaches a third preset threshold.
In another example, the first preset condition includes: the number of consecutive video segments of the video to be processed that match the original edition video reaches the second preset threshold, and the number of video segments of the video to be processed that match the original edition video reaches the third preset threshold, where the third preset threshold is greater than or equal to the second preset threshold.
In one example, in response to the video to be processed being a copied video of the original video, alarm information may be issued to prompt the user to call up the corresponding videos and clips for confirmation.
In one example, after the determining the video to be processed as a copied video of the original video, the method further comprises: determining a propagation path of the copied videos of the original edition video according to the creation time of the video to be processed and the creation times of the other copied videos of the original edition video. In this example, the propagation path of the copied videos of the original video may be determined by sorting the copied videos according to their creation times, thereby facilitating infringement analysis of the videos.
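A minimal sketch of this ordering step, assuming each copied video is represented as a dictionary carrying a creation_time field (an illustrative assumption):

    def propagation_path(copied_videos):
        """Order copied videos by creation time to approximate how a copy spread."""
        return sorted(copied_videos, key=lambda video: video["creation_time"])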
For a copied video, evidence may be preserved by at least one of webpage evidence collection, screenshot evidence collection, screen-recording evidence collection, video-recording evidence collection, and the like.
In another example, after the determining the fingerprint information of the video segment, the method further comprises: in the case that the video to be processed belongs to an original edition video, comparing the fingerprint information of the at least one video segment with the fingerprint information of the video segments of a video to be compared to obtain a second comparison result; and determining the video to be compared as a copied video of the video to be processed in response to the second comparison result meeting a second preset condition. In this example, the video to be compared may be any video that has not been determined to be an original video. In the case that the video to be processed has been determined to belong to the original videos, the fingerprint information of the video segments in the video to be processed may be compared with the fingerprint information of the video segments of one or more videos to be compared to determine whether each video to be compared is a copied video of the video to be processed. In this example, the second comparison result may represent a comparison result of the fingerprint information of the video segments of the video to be processed with the fingerprint information of the video segments of any video to be compared; for example, it may include the similarity of the fingerprint information of each video segment of the video to be processed to the fingerprint information of each video segment of any video to be compared. If the second comparison result meets the second preset condition, it may be determined that the video to be compared is a copied video of the video to be processed; if it does not, it may be determined that the video to be compared is not a copied video of the video to be processed. In this example, after a user uploads an original video, a large-scale search can be performed based on the fingerprint information of the original video to find copied videos among historical videos.
In one example, the second preset condition may include: the number of consecutive video segments of the video to be compared that match the video to be processed reaches the second preset threshold; and/or the number of video segments of the video to be compared that match the video to be processed reaches the third preset threshold.
In one example, after the determining the fingerprint information of the video segment, the method further comprises: acquiring fingerprint information of the video segments of the videos in a video set to be processed, where the video set to be processed includes the video to be processed; clustering the fingerprint information of the video segments of the videos in the video set to be processed to obtain fingerprint information of clustering centers; and storing the fingerprint information of the clustering centers. According to this example, massive video segments can be clustered and their fingerprint information matched against one another to obtain a clustering result based on the fingerprint information of the massive video segments, so that automatic classification and structuring of video segments can be realized and the efficiency of subsequent retrieval can be improved. For example, this may be applied to the automatic classification of entertainment media video segments and the structuring of entertainment video information.
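The disclosure does not name a clustering algorithm; as one assumed realization, k-means from scikit-learn can be applied to fingerprints cast to numeric vectors, with the cluster centers re-binarized so that they can be stored like ordinary fingerprints.

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_fingerprints(fingerprints, num_clusters=10):
        """Cluster equal-length '0'/'1' fingerprint strings; return labels and centers."""
        X = np.array([[int(b) for b in fp] for fp in fingerprints], dtype=float)
        kmeans = KMeans(n_clusters=num_clusters, n_init=10).fit(X)
        centers = (kmeans.cluster_centers_ > 0.5).astype(int)  # re-binarize centers
        return kmeans.labels_, ["".join(map(str, c)) for c in centers]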
In another possible implementation manner, K frames may be randomly extracted from the video to be processed as the key frame image of the video to be processed, where K is an integer greater than or equal to 1.
In another possible implementation, some or all of the intra-coded frames (I-frames) of the video to be processed may be used as the key frame images of the video to be processed.
In a possible implementation manner, the determining fingerprint information of the video to be processed according to the feature information of the at least one key frame image includes: in the case that the number of key frame images is at least two, forming a feature sequence from the feature information of the at least two key frame images according to the order of the at least two key frame images in the video to be processed, and using the feature sequence as the fingerprint information of the video to be processed. For example, if the number of key frame images is M, the feature information of the M key frame images may be combined into a feature sequence according to the order of the M key frame images in the video to be processed, and the feature sequence may be used as the fingerprint information of the video to be processed. The feature sequence includes M elements, which are respectively the feature information of the M key frame images. Fingerprint information determined according to this implementation reflects the temporal order of the visual information in the video to be processed, thereby further improving the accuracy of video comparison.
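A sketch of assembling such a feature sequence, assuming each key frame carries its frame index so the sequence preserves the order of the key frames in the video:

    def video_fingerprint(key_frames):
        """key_frames: list of (frame_index, feature_info) pairs; returns the
        feature sequence ordered as the key frames appear in the video."""
        ordered = sorted(key_frames, key=lambda kf: kf[0])
        return [feature for _, feature in ordered]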
In one possible implementation, after the determining the fingerprint information of the video to be processed, the method further includes at least one of: adding the fingerprint information of the video to be processed to a header file of the video to be processed; adding the fingerprint information of the video to be processed to the least significant bits of video frames of the video to be processed; and adding the feature information of a key frame image to the least significant bits of the video frames in the video clip to which the key frame image belongs. According to this implementation, a digital watermark can be added to the video to be processed. The digital watermark is invisible to viewers, i.e., it has no perceptible influence on video quality, so adding it does not affect, and may even improve, the viewing experience. When a copyright dispute arises, the original video and the copied video can be identified according to the digital watermark, i.e., copyright ownership of the video can be established.
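A minimal sketch of the least-significant-bit option for a single 8-bit frame, using one standard LSB scheme assumed for illustration; changing a pixel's LSB alters its value by at most 1, which is why the watermark is imperceptible.

    import numpy as np

    def embed_lsb(frame, bit_string):
        """Write bits into the least significant bits of a uint8 (H, W, 3) frame."""
        flat = frame.reshape(-1).copy()
        bits = np.array([int(b) for b in bit_string], dtype=np.uint8)
        if bits.size > flat.size:
            raise ValueError("fingerprint longer than frame capacity")
        flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits  # clear LSB, then set it
        return flat.reshape(frame.shape)

    def extract_lsb(frame, num_bits):
        """Read the embedded bits back out of the frame."""
        return "".join(str(int(v & 1)) for v in frame.reshape(-1)[:num_bits])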
In one possible implementation, after the determining the fingerprint information of the video to be processed, the method further includes: in the case that the video to be processed belongs to an original edition video, storing the fingerprint information of the video to be processed in a preset database, where the preset database is used for storing the fingerprint information of original edition videos. According to this implementation, the fingerprint information of each original edition video can be stored in the preset database. When a newly uploaded video is detected, its fingerprint information can be compared with the fingerprint information of the original edition videos in the preset database to judge whether the newly uploaded video is a copied video of any original edition video in the preset database.
As an example of this implementation, the start time and the end time of each video segment in the video to be processed may also be recorded under the ID of the video to be processed in a preset database.
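A sketch of such a preset database using SQLite, with an assumed schema; the disclosure only requires that original-video fingerprints, and optionally per-segment start and end times under the video ID, be stored.

    import sqlite3

    def store_original_fingerprint(db_path, video_id, fingerprint, segments):
        """segments: list of (index, start_time, end_time, segment_fingerprint)."""
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS originals "
                     "(video_id TEXT, fingerprint TEXT)")
        conn.execute("CREATE TABLE IF NOT EXISTS segments "
                     "(video_id TEXT, idx INTEGER, start REAL, end_ REAL, fp TEXT)")
        conn.execute("INSERT INTO originals VALUES (?, ?)",
                     (video_id, str(fingerprint)))
        conn.executemany("INSERT INTO segments VALUES (?, ?, ?, ?, ?)",
                         [(video_id, i, s, e, fp) for i, s, e, fp in segments])
        conn.commit()
        conn.close()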
In one possible implementation, after the determining the fingerprint information of the video to be processed, the method further includes: determining the similarity between the fingerprint information of the video to be processed and the fingerprint information of any original edition video; and determining the video to be processed as a duplicated video of the original edition video in response to the similarity between the fingerprint information of the video to be processed and the fingerprint information of the original edition video being greater than or equal to a sixth preset threshold value.
In another possible implementation manner, after the determining the fingerprint information of the video to be processed, the method further includes: determining the similarity between the fingerprint information of the video to be processed and the fingerprint information of the video to be compared under the condition that the video to be processed belongs to the original edition video; and determining that the video to be compared is a duplicated video of the video to be processed in response to the fact that the similarity between the fingerprint information of the video to be processed and the fingerprint information of the video to be compared is larger than or equal to a seventh preset threshold value.
The method for generating a video fingerprint provided by the embodiment of the present disclosure is described below by a specific application scenario. Fig. 2 is a schematic diagram illustrating an application scenario of a video fingerprint generation method provided by an embodiment of the present disclosure.
In this application scenario, shot segmentation may be performed on the original video through the second neural network to obtain video clip 1, video clip 2 and video clip 3 of the original video. Key frame extraction may be performed on video clip 1, video clip 2 and video clip 3 respectively to obtain key frame image 1, key frame image 2 and key frame image 3. The first neural network may be adopted to extract depth features from key frame image 1, key frame image 2 and key frame image 3 respectively, obtaining feature information 1 of key frame image 1, feature information 2 of key frame image 2 and feature information 3 of key frame image 3. Feature information 1, feature information 2 and feature information 3 may be combined, in that order, into a feature sequence, and the feature sequence may be used as the fingerprint information of the original video. After the fingerprint information of the original video is obtained, it may be stored in an original fingerprint database, which is used for storing the fingerprint information of original videos.
Shot segmentation may likewise be performed on the video to be compared through the second neural network to obtain video clip 1′, video clip 2′ and video clip 3′ of the video to be compared. Key frame extraction may be performed on video clip 1′, video clip 2′ and video clip 3′ respectively to obtain key frame image 1′, key frame image 2′ and key frame image 3′. The first neural network may be adopted to extract depth features from key frame image 1′, key frame image 2′ and key frame image 3′ respectively, obtaining feature information 1′ of key frame image 1′, feature information 2′ of key frame image 2′ and feature information 3′ of key frame image 3′. Feature information 1′, feature information 2′ and feature information 3′ may be combined, in that order, into a feature sequence, and the feature sequence may be used as the fingerprint information of the video to be compared.
The fingerprint information of the video to be compared may be compared with the fingerprint information of at least one original video in the original fingerprint database. In response to the number of consecutive video segments matching any original video in the original fingerprint database reaching the second preset threshold, the video to be compared may be determined to be a copied video of that original video.
It can be understood that the above-mentioned method embodiments of the present disclosure may be combined with each other to form combined embodiments without departing from the principles and logic; due to space limitations, the details are not repeated in the present disclosure. Those skilled in the art will appreciate that, in the above methods of the specific embodiments, the specific execution order of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure also provides a video fingerprint generation apparatus, an electronic device, a computer-readable storage medium, and a program, which can be used to implement any one of the video fingerprint generation methods provided by the present disclosure, and corresponding technical solutions and technical effects can be referred to in corresponding descriptions of the method section and are not described again.
Fig. 3 shows a block diagram of a video fingerprint generation apparatus provided by an embodiment of the present disclosure. As shown in fig. 3, the video fingerprint generation apparatus includes:
a first determining module 31, configured to determine at least one key frame image in a video to be processed;
a feature extraction module 32, configured to perform feature extraction on any key frame image in the at least one key frame image through a first neural network to obtain feature information of the key frame image, where the first neural network is trained in advance by using a video frame image set, the video frame image set includes video frame images belonging to multiple video clip sets, different video clips in any one of the multiple video clip sets are obtained based on a same original version video clip, and different video frame images belonging to a same video clip set in the video frame image set have the same category tagging information;
a second determining module 33, configured to determine fingerprint information of the video to be processed according to the feature information of the at least one key frame image.
In a possible implementation manner, the any video clip set includes an original video clip and a copied video clip obtained by performing attack processing based on the original video clip.
In a possible implementation manner, the any video segment set includes a plurality of copied video segments obtained by performing attack processing based on the same original video segment.
In one possible implementation, the video frame images belonging to any one of the video clip sets in the video frame image set include: key frame images of video clips in the set of video clips.
In a possible implementation manner, the first determining module 31 is configured to:
performing shot segmentation on a video to be processed to obtain at least one video clip of the video to be processed;
key frame images are respectively determined in the at least one video segment.
In one possible implementation, the apparatus further includes:
and the third determining module is used for determining the characteristic information of the key frame image in the video clip as the fingerprint information of the video clip for any video clip in the at least one video clip.
In a possible implementation manner, the first determining module 31 is configured to:
performing shot segmentation on a video to be processed through a second neural network to obtain a preliminary segmentation result of the video segments of the video to be processed;
and obtaining at least one video segment of the video to be processed based on the preliminary segmentation result.
In a possible implementation manner, the first determining module 31 is configured to:
in the case that the number of video segments in the preliminary segmentation result is greater than or equal to 2, merging, for a first video segment and a second video segment that are adjacent in the preliminary segmentation result, the first video segment and the second video segment in response to a similarity between a last frame of the first video segment and a first frame of the second video segment being greater than or equal to a first preset threshold, wherein the second video segment is a next video segment of the first video segment.
In a possible implementation manner, the first determining module 31 is configured to:
for any video clip in the at least one video clip, determining a key frame image of the video clip according to a video frame in the middle position of the video clip.
In a possible implementation manner, the second determining module 33 is configured to:
and under the condition that the number of the key frame images is at least two, according to the sequence of the at least two key frame images in the video to be processed, forming a feature sequence by using the feature information of the at least two key frame images, and using the feature sequence as the fingerprint information of the video to be processed.
In one possible implementation manner, the apparatus further includes an adding module, and the adding module is configured to at least one of:
adding the fingerprint information of the video to be processed into a header file of the video to be processed;
adding fingerprint information of the video to be processed to the least significant bits of a video frame of the video to be processed;
and adding the characteristic information of the key frame image to the least significant bit of the video frame in the video clip to which the key frame image belongs.
In one possible implementation, the apparatus further includes:
the first storage module is used for storing the fingerprint information of the video to be processed in a preset database under the condition that the video to be processed belongs to an original edition video, wherein the preset database is used for storing the fingerprint information of the original edition video.
In one possible implementation, the apparatus further includes:
the first comparison module is used for comparing the fingerprint information of the at least one video clip with the fingerprint information of the video clip of any original edition video to obtain a first comparison result;
and the fourth determining module is used for determining the video to be processed as a duplicated video of the original video in response to the first comparison result meeting a first preset condition.
In one possible implementation manner, the first preset condition includes:
and the continuous number of the video clips matched with the original edition video of the video to be processed reaches a second preset threshold value.
In one possible implementation, the apparatus further includes:
and the fifth determining module is used for determining the propagation path of the duplicated video of the original edition video according to the creation time of the video to be processed and the creation time of other duplicated videos of the original edition video.
In one possible implementation, the apparatus further includes:
the second comparison module is used for comparing the fingerprint information of the at least one video segment with the fingerprint information of the video segment of the video to be compared under the condition that the video to be processed belongs to the original edition video to obtain a second comparison result;
and the sixth determining module is used for determining that the video to be compared is a duplicated video of the video to be processed in response to the second comparison result meeting a second preset condition.
In one possible implementation, the apparatus further includes:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring fingerprint information of video clips of videos in a video set to be processed, and the video set to be processed comprises the videos to be processed;
the clustering module is used for clustering the fingerprint information of the video segments of the videos in the video set to be processed to obtain the fingerprint information of a clustering center;
and the second storage module is used for storing the fingerprint information of the clustering center.
In the embodiment of the present disclosure, a video frame image set is used to train the first neural network, where the video frame image set includes video frame images belonging to a plurality of video segment sets, different video segments in any one of the video segment sets are obtained based on the same original edition video segment, and different video frame images belonging to the same video segment set have the same category labeling information. The first neural network can thus learn to extract similar feature information from video frame images in different video segments derived from the same original edition video segment, so the feature information extracted by the trained first neural network can cope with different video attack means, and the accuracy of video comparison can be improved. After the first neural network is trained, for any key frame image in at least one key frame image of the video to be processed, feature extraction is performed on the key frame image through the first neural network to obtain the feature information of the key frame image, and the fingerprint information of the video to be processed is determined according to the feature information of the at least one key frame image. The fingerprint information so determined has high stability, can cope with different video attack means, and facilitates copyright protection of videos.
In some embodiments, functions or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementations and technical effects thereof may refer to the description of the above method embodiments, which are not described herein again for brevity.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-described method. The computer-readable storage medium may be a non-volatile computer-readable storage medium, or may be a volatile computer-readable storage medium.
Embodiments of the present disclosure also provide a computer program, which includes computer readable code, and when the computer readable code runs in an electronic device, a processor in the electronic device executes the above method.
The disclosed embodiments also provide a computer program product comprising computer readable code or a non-volatile computer readable storage medium carrying computer readable code, which when run in an electronic device, a processor in the electronic device performs the above method.
An embodiment of the present disclosure further provides an electronic device, including: one or more processors; a memory for storing executable instructions; wherein the one or more processors are configured to invoke the memory-stored executable instructions to perform the above-described method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 4 illustrates a block diagram of an electronic device 800 provided by an embodiment of the disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or another such terminal.
Referring to fig. 4, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge Coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as a wireless network (Wi-Fi), a second generation mobile communication technology (2G), a third generation mobile communication technology (3G), a fourth generation mobile communication technology (4G)/long term evolution of universal mobile communication technology (LTE), a fifth generation mobile communication technology (5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 5 shows a block diagram of an electronic device 1900 provided by an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 5, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA) can be personalized by utilizing state information of the computer-readable program instructions, and this electronic circuitry may execute the computer-readable program instructions to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (20)

1. A method for generating video fingerprints is characterized by comprising the following steps:
determining at least one key frame image in a video to be processed;
for any key frame image in the at least one key frame image, performing feature extraction on the key frame image through a first neural network to obtain feature information of the key frame image, wherein the first neural network is trained by adopting a video frame image set in advance, the video frame image set comprises video frame images belonging to a plurality of video clip sets, different video clips in any one of the video clip sets are obtained based on the same original edition video clip, and different video frame images belonging to the same video clip set in the video frame image set have the same category labeling information;
and determining fingerprint information of the video to be processed according to the characteristic information of the at least one key frame image.
2. The method according to claim 1, wherein any one of the video segment sets comprises an original video segment and a copied video segment obtained by performing attack processing based on the original video segment.
3. The method according to claim 1 or 2, wherein any one of the video segment sets comprises a plurality of copied video segments obtained by performing attack processing based on the same original video segment.
4. The method according to any one of claims 1 to 3, wherein the video frame images belonging to any one of the video clip sets in the video frame image set comprise: key frame images of video clips in the set of video clips.
5. The method according to any one of claims 1 to 4, wherein the determining at least one key frame image in the video to be processed comprises:
performing shot segmentation on a video to be processed to obtain at least one video clip of the video to be processed;
key frame images are respectively determined in the at least one video segment.
6. The method of claim 5, wherein after obtaining feature information of the key frame image, the method further comprises:
for any video clip in the at least one video clip, determining the characteristic information of the key frame image in the video clip as the fingerprint information of the video clip.
7. The method according to claim 5 or 6, wherein the performing shot segmentation on the video to be processed to obtain at least one video segment of the video to be processed comprises:
performing shot segmentation on a video to be processed through a second neural network to obtain a preliminary segmentation result of the video segments of the video to be processed;
and obtaining at least one video segment of the video to be processed based on the preliminary segmentation result.
8. The method according to claim 7, wherein the deriving at least one video segment of the video to be processed based on the preliminary segmentation result comprises:
in the case that the number of video segments in the preliminary segmentation result is greater than or equal to 2, merging, for a first video segment and a second video segment that are adjacent in the preliminary segmentation result, the first video segment and the second video segment in response to a similarity between a last frame of the first video segment and a first frame of the second video segment being greater than or equal to a first preset threshold, wherein the second video segment is a next video segment of the first video segment.
9. The method according to any one of claims 5 to 8, wherein the determining key frame images in the at least one video segment respectively comprises:
for any video clip in the at least one video clip, determining a key frame image of the video clip according to a video frame in the middle position of the video clip.
10. The method according to any one of claims 1 to 9, wherein the determining fingerprint information of the video to be processed according to the feature information of the at least one key frame image comprises:
and under the condition that the number of the key frame images is at least two, according to the sequence of the at least two key frame images in the video to be processed, forming a feature sequence by using the feature information of the at least two key frame images, and using the feature sequence as the fingerprint information of the video to be processed.
11. The method according to any one of claims 1 to 10, wherein after the determining the fingerprint information of the video to be processed, the method further comprises at least one of:
adding the fingerprint information of the video to be processed into a header file of the video to be processed;
adding fingerprint information of the video to be processed to the least significant bits of a video frame of the video to be processed;
and adding the characteristic information of the key frame image to the least significant bit of the video frame in the video clip to which the key frame image belongs.
12. The method according to any one of claims 1 to 11, wherein after the determining the fingerprint information of the video to be processed, the method further comprises:
and under the condition that the video to be processed belongs to the original edition video, storing the fingerprint information of the video to be processed in a preset database, wherein the preset database is used for storing the fingerprint information of the original edition video.
13. The method of claim 6, wherein after the determining the fingerprint information of the video segment, the method further comprises:
comparing the fingerprint information of the at least one video clip with the fingerprint information of the video clip of any original edition video to obtain a first comparison result;
and in response to the first comparison result meeting a first preset condition, determining the video to be processed as a copy video of the original video.
14. The method according to claim 13, wherein the first preset condition comprises:
and the continuous number of the video clips matched with the original edition video of the video to be processed reaches a second preset threshold value.
15. The method of claim 13 or 14, wherein after said determining said video to be processed as a duplicate of said original video, said method further comprises:
and determining a propagation path of the copied video of the original edition video according to the creation time of the video to be processed and the creation time of other copied videos of the original edition video.
16. The method of claim 6, wherein after the determining the fingerprint information of the video segment, the method further comprises:
comparing the fingerprint information of the at least one video segment with the fingerprint information of the video segment of the video to be compared under the condition that the video to be processed belongs to the original edition video to obtain a second comparison result;
and determining the video to be compared as a copy video of the video to be processed in response to the second comparison result meeting a second preset condition.
17. The method according to any one of claims 6 and 13 to 16, wherein after the determining the fingerprint information of the video segment, the method further comprises:
acquiring fingerprint information of video clips of videos in a to-be-processed video set, wherein the to-be-processed video set comprises the to-be-processed videos;
clustering fingerprint information of video segments of the videos in the to-be-processed video set to obtain fingerprint information of a clustering center;
and storing the fingerprint information of the clustering center.
18. A video fingerprint generation apparatus, comprising:
the first determination module is used for determining at least one key frame image in the video to be processed;
a feature extraction module, configured to perform feature extraction on any key frame image in the at least one key frame image through a first neural network to obtain feature information of the key frame image, where the first neural network is trained in advance by using a video frame image set, the video frame image set includes video frame images belonging to multiple video clip sets, different video clips in any one of the multiple video clip sets are obtained based on a same original version video clip, and different video frame images belonging to a same video clip set in the video frame image set have the same category tagging information;
and the second determining module is used for determining the fingerprint information of the video to be processed according to the characteristic information of the at least one key frame image.
19. An electronic device, comprising:
one or more processors;
a memory for storing executable instructions;
wherein the one or more processors are configured to invoke the memory-stored executable instructions to perform the method of any one of claims 1 to 17.
20. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 17.
CN202111004360.5A 2021-08-30 2021-08-30 Video fingerprint generation method and device, electronic equipment and storage medium Pending CN113722541A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111004360.5A CN113722541A (en) 2021-08-30 2021-08-30 Video fingerprint generation method and device, electronic equipment and storage medium
PCT/CN2022/076929 WO2023029389A1 (en) 2021-08-30 2022-02-18 Video fingerprint generation method and apparatus, electronic device, storage medium, computer program, and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111004360.5A CN113722541A (en) 2021-08-30 2021-08-30 Video fingerprint generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113722541A true CN113722541A (en) 2021-11-30

Family

ID=78679074

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111004360.5A Pending CN113722541A (en) 2021-08-30 2021-08-30 Video fingerprint generation method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113722541A (en)
WO (1) WO2023029389A1 (en)


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110602527B (en) * 2019-09-12 2022-04-08 北京小米移动软件有限公司 Video processing method, device and storage medium
CN111026915B (en) * 2019-11-25 2023-09-15 Oppo广东移动通信有限公司 Video classification method, video classification device, storage medium and electronic equipment
CN110941594B (en) * 2019-12-16 2023-04-18 北京奇艺世纪科技有限公司 Splitting method and device of video file, electronic equipment and storage medium
CN111274426B (en) * 2020-01-19 2023-09-12 深圳市商汤科技有限公司 Category labeling method and device, electronic equipment and storage medium
CN113722541A (en) * 2021-08-30 2021-11-30 深圳市商汤科技有限公司 Video fingerprint generation method and device, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023029389A1 (en) * 2021-08-30 2023-03-09 上海商汤智能科技有限公司 Video fingerprint generation method and apparatus, electronic device, storage medium, computer program, and computer program product
CN115115822A (en) * 2022-06-30 2022-09-27 小米汽车科技有限公司 Vehicle-end image processing method and device, vehicle, storage medium and chip
CN115115822B (en) * 2022-06-30 2023-10-31 小米汽车科技有限公司 Vehicle-end image processing method and device, vehicle, storage medium and chip

Also Published As

Publication number Publication date
WO2023029389A1 (en) 2023-03-09

Similar Documents

Publication Title
CN111881956B (en) Network training method and device, target detection method and device and electronic equipment
EP3125135A1 (en) Picture processing method and device
CN108804980B (en) Video scene switching detection method and device
CN110472091B (en) Image processing method and device, electronic equipment and storage medium
CN109635142B (en) Image selection method and device, electronic equipment and storage medium
CN110781957A (en) Image processing method and device, electronic equipment and storage medium
CN112911239B (en) Video processing method and device, electronic equipment and storage medium
TW202109360A (en) Image processing method and device, electronic equipment and storage medium
CN110858924B (en) Video background music generation method and device and storage medium
CN109671051B (en) Image quality detection model training method and device, electronic equipment and storage medium
WO2023029389A1 (en) Video fingerprint generation method and apparatus, electronic device, storage medium, computer program, and computer program product
TWI766458B (en) Information identification method and apparatus, electronic device, and storage medium
US20220172476A1 (en) Video similarity detection method, apparatus, and device
CN112954450A (en) Video processing method and device, electronic equipment and storage medium
CN109344703B (en) Object detection method and device, electronic equipment and storage medium
JP2022541358A (en) Video processing method and apparatus, electronic device, storage medium, and computer program
CN113326768A (en) Training method, image feature extraction method, image recognition method and device
CN112085097A (en) Image processing method and device, electronic equipment and storage medium
CN108171222B (en) Real-time video classification method and device based on multi-stream neural network
CN110781842A (en) Image processing method and device, electronic equipment and storage medium
EP3885934A1 (en) Video search method and apparatus, computer device, and storage medium
CN112597944A (en) Key point detection method and device, electronic equipment and storage medium
CN104899588A (en) Method and device for recognizing characters in image
CN112613447A (en) Key point detection method and device, electronic equipment and storage medium
CN112131999B (en) Identity determination method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40058667
Country of ref document: HK