CN114860991A - Short video de-duplication method and computer readable storage medium - Google Patents

Short video de-duplication method and computer readable storage medium

Info

Publication number
CN114860991A
CN114860991A · Application CN202210284145.3A
Authority
CN
China
Prior art keywords
video
key frame
feature
label
database
Prior art date
Legal status
Withdrawn
Application number
CN202210284145.3A
Other languages
Chinese (zh)
Inventor
赵舰波
张晓瑾
张善庄
刘怀亮
杨斌
王亚凯
Current Assignee
Xi'an Zhile Technology Co ltd
Original Assignee
Xi'an Zhile Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Xi'an Zhile Technology Co ltd filed Critical Xi'an Zhile Technology Co ltd
Priority to CN202210284145.3A
Publication of CN114860991A
Status: Withdrawn

Classifications

    • G06F16/70: Information retrieval; Database structures therefor of video data
    • G06F16/75: Clustering; Classification
    • G06V10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/48: Matching video sequences
    • Y02A10/40: Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping


Abstract

The invention relates to a short video deduplication method and a computer-readable storage medium, wherein the method comprises the following steps: creating a feature database, the feature database comprising first feature point descriptors generated from first key frames of original videos; extracting a plurality of second key frames from a target video, the second key frames generating second feature point descriptors; and matching the second feature point descriptor of each second key frame against the first feature point descriptors in the feature database to judge whether the second key frame already exists in the feature database, and if so, deleting the target video corresponding to that second key frame to obtain a deduplicated video set. The method builds the image-label feature database hierarchically, by category and level, which speeds up identification of the target video and improves deduplication efficiency. By matching the second feature point descriptors of the second key frames against the first feature point descriptors in the feature database, the method preserves identification accuracy while reducing identification cost.

Description

Short video duplicate removal method and computer readable storage medium
Technical Field
The invention belongs to the technical field of video processing, and relates to a short video duplicate removal method and a computer readable storage medium.
Background
With the rapid development of the self-media industry, short video has become a popular form of entertainment and lifestyle, and ever more short videos are created and spread. Finding short videos with identical content in a massive video library and removing the redundant ones saves a large amount of storage space for short video platforms, eases management, prevents a flood of homogeneous videos, and improves the viewing experience of users. In addition, reviewing short video content also protects the copyright of the original creator. Detecting and deleting short videos with repeated content is therefore of great practical significance.
Currently, three video deduplication methods are in common use:
1. Deduplication by video metadata. The metadata of the target video (such as title, author, description, and cover image) is compared, by traversal, against the metadata of the existing videos in the video library to identify duplicate videos.
2. Deduplication by hash algorithm. Key frames are extracted from the target video and down-sampled, and a hash algorithm computes the similarity between the target video and the existing videos in the video library; common hash algorithms include the average hash, the difference hash, and the perceptual hash. A threshold is set, and a video whose similarity exceeds the threshold is judged to be a duplicate.
3. Deduplication by features. Key frames are extracted from the target video, image features are extracted with a convolutional neural network, clustering identifies the features shared between the target video and the video library, and the repeated videos are removed.
These techniques have the following drawbacks: 1. Low deduplication efficiency: traversing the target video's metadata is time-consuming and can exclude only a few duplicate videos. 2. Low deduplication accuracy: hash-based similarity is computed on the global features of the video and is insensitive to slight changes in it, so its accuracy is limited. 3. High deduplication cost: although a convolutional neural network can extract finer-grained video features, its computation is large and complex, so in the face of rapidly growing short video volumes its cost is high.
Therefore, providing a deduplication method with high efficiency, high accuracy, and low cost is an urgent problem to be solved.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention provides a short video deduplication method and a computer-readable storage medium. The technical problem to be solved by the invention is realized by the following technical scheme:
An embodiment of the invention provides a short video deduplication method, which comprises the following steps:
step 1, creating a feature database, wherein the feature database comprises first feature point descriptors generated from first key frames of original videos, and all the first feature point descriptors are stored in the feature database according to a label hierarchy;
step 2, extracting a plurality of second key frames from a target video, the second key frames generating second feature point descriptors;
and step 3, matching the second feature point descriptor of each second key frame against the first feature point descriptors in the feature database to judge whether the second key frame already exists in the feature database, and if so, deleting the target video corresponding to that second key frame to obtain a deduplicated video set.
In one embodiment of the present invention, the step 1 comprises:
step 1.1, acquiring the original video;
step 1.2, calculating the optical flow of object motion in the video frames of the original video by an optical flow method, extracting the video frame with the least total optical flow, and taking that video frame as a first key frame;
step 1.3, identifying the visual content in the first key frame;
step 1.4, labeling the first key frame according to the visual content in the first key frame, based on a preset rule;
step 1.5, judging whether the label of the first key frame exists in the feature database; if it exists, iterating to judge whether the label level of the first key frame is the bottom label level and, if not, continuing to iterate downward until the bottom label level is reached; if the label does not exist in the feature database, creating a new label category and assigning the first key frame to the new label category;
step 1.6, extracting image features of the first key frame with the SIFT technique to generate a first feature point descriptor of the first key frame;
step 1.7, storing the first feature point descriptors generated from the first key frame into the feature database according to their label categories.
In one embodiment of the present invention, the step 1.3 comprises:
step 1.31, acquiring a training set;
step 1.32, extracting image features from the training set by using a histogram of oriented gradients (HOG);
step 1.33, training an SVM classifier by using the image characteristics so as to obtain an image classification model;
and 1.34, identifying and classifying the label of the first key frame by using the image classification model.
In one embodiment of the present invention, the step 1.6 comprises:
step 1.61, constructing a scale space by using a Gaussian difference function;
step 1.62, comparing each sampling point of the first key frame with its adjacent points at different σ values in the same scale space, the sampling point being a first feature point at that scale if it is a maximum or minimum of the Gaussian difference function in that scale space, where σ is the scale coordinate;
step 1.63, retaining second feature points from among the first feature points according to contrast and principal curvature ratio;
step 1.64, locating the position and scale of each second feature point by fitting a three-dimensional quadratic function;
step 1.65, obtaining the main direction of each second feature point from the peak of the histogram;
step 1.66, rotating the coordinate axes to the main direction of the second feature point, forming the multi-dimensional SIFT feature for the second feature point, and length-normalizing the SIFT feature vectors to obtain the first feature point descriptors.
In one embodiment of the present invention, the step 1.63 includes:
and judging the relation between the contrast absolute value of the first characteristic point and a first threshold value and the relation between the main curvature ratio of the first characteristic point and a second threshold value, if the contrast absolute value of the first characteristic point is smaller than the first threshold value or the main curvature ratio of the first characteristic point is larger than the second threshold value, removing the first characteristic point, and taking the reserved first characteristic point as the second characteristic point.
In one embodiment of the present invention, the step 2 comprises:
step 2.1, acquiring the target video;
step 2.2, calculating the optical flow of object motion in the video frames of the target video by an optical flow method, extracting the video frame with the least total optical flow, and taking that video frame as a second key frame;
step 2.3, identifying the visual content in the second key frame;
step 2.4, labeling the second key frame according to the visual content in the second key frame, based on a preset rule;
step 2.5, judging whether the label of the second key frame exists in the feature database; if it exists, iterating to judge whether the label level of the second key frame is the bottom label level and, if not, continuing to iterate downward until the bottom label level is reached; if the label does not exist in the feature database, creating a new label category and classifying the second key frame under the new label category;
step 2.6, extracting image features of the second key frame with the SIFT technique to generate a second feature point descriptor of the second key frame.
In one embodiment of the present invention, the step 3 comprises:
traversing the feature points under the lowest label level in the feature database and computing, in turn, the similarity between each second feature point descriptor of the target video and the first feature point descriptors in the feature database; if the similarity is below a distance threshold, the second key frame already exists in the feature database, so the target video corresponding to that second key frame is deleted, and steps 2 and 3 are repeated until the deduplicated video set is obtained, the similarity being the nearest block distance divided by the next-nearest block distance.
In an embodiment of the present invention, after the step 3, the method further includes:
step 4, updating the feature database and the video database according to the target videos in the deduplicated video set.
In one embodiment of the present invention, the step 4 comprises:
step 4.1, storing the second feature point descriptors of the target videos in the deduplicated video set into the feature database according to their label categories so as to update the feature database;
and step 4.2, storing the target videos in the deduplicated video set into the video database so as to update the video database.
The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, carries out the steps of the above method.
Compared with the prior art, the invention has the following beneficial effects:
the method of the invention establishes the image label characteristic database by classification and classification, which can improve the speed of identifying the target video and improve the duplicate removal efficiency. According to the method, the second feature point descriptor of the second key frame is matched with the first feature point descriptor in the feature database, so that the identification accuracy is guaranteed, and the identification cost is reduced.
Other aspects and features of the present invention will become apparent from the following detailed description, which proceeds with reference to the accompanying drawings. It is to be understood, however, that the drawings are designed solely for purposes of illustration and not as a definition of the limits of the invention, for which reference should be made to the appended claims. It should be further understood that the drawings are not necessarily drawn to scale and that, unless otherwise indicated, they are merely intended to conceptually illustrate the structures and procedures described herein.
Drawings
Fig. 1 is a schematic flowchart of a short video deduplication method according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of another short video deduplication method according to an embodiment of the present invention;
fig. 3 is a schematic diagram illustrating an example of tag determination according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a computer device module according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but the embodiments of the present invention are not limited thereto.
Example one
Referring to fig. 1 and fig. 2, fig. 1 is a schematic flow chart of a short video deduplication method according to an embodiment of the present invention, and fig. 2 is a schematic flow chart of another short video deduplication method according to an embodiment of the present invention. The present invention provides a short video deduplication method, which comprises:
Step 1, creating a feature database, wherein the feature database comprises first feature point descriptors generated from first key frames of original videos, and all the first feature point descriptors are stored in the feature database according to the label hierarchy.
In a specific embodiment, step 1 may specifically include steps 1.1-1.7, wherein:
and 1.1, acquiring an original video.
Specifically, the original videos are the mass of videos in the original video library.
Step 1.2, calculating the optical flow of object motion in the video frames of the original video by an optical flow method, extracting the video frame with the least total optical flow, and taking that video frame as the first key frame. The specific formulas are as follows:
$$M(k)=\sum_{i}\sum_{j}\Big(\big|L_x(i,j,k)\big|+\big|L_y(i,j,k)\big|\Big)$$

$$M(k_i)=\min_k\,M(k)$$

where $M(k)$ is the total optical flow of the $k$-th frame, $L_x(i,j,k)$ and $L_y(i,j,k)$ are the $x$ and $y$ components of the optical flow at pixel $(i,j)$ of the $k$-th frame, and $M(k_i)$ is the minimum total optical flow, attained at frame $k_i$.
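As an illustrative sketch of this key-frame selection (the patent does not prescribe an implementation; OpenCV's Farnebäck dense optical flow, its parameter values, and returning a single frame index rather than a plurality of key frames are assumptions of this example):

```python
import cv2
import numpy as np

def least_motion_keyframe(video_path: str) -> int:
    """Return the index of the frame with minimum total optical flow M(k)."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    best_k, best_m, k = -1, float("inf"), 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        k += 1
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Dense optical flow between consecutive frames:
        # flow[..., 0] is L_x, flow[..., 1] is L_y
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        # M(k) = sum over pixels (i, j) of |L_x| + |L_y|
        m_k = np.abs(flow[..., 0]).sum() + np.abs(flow[..., 1]).sum()
        if m_k < best_m:
            best_k, best_m = k, m_k
        prev_gray = gray
    cap.release()
    return best_k
```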
Step 1.3, identifying the visual content in the first key frame.
In a specific embodiment, step 1.3 may specifically include steps 1.31-1.34, wherein:
and 1.31, acquiring a training set.
Specifically, the training set is a key frame image collected based on a preset rule.
The preset rule refers to a plurality of different types of labels divided according to different levels according to different attributes of the training set image, for example, the first-level labels may be set as: objects, scenes, concepts; the secondary label under the object label can be set as: animals, plants, humans, etc.; the tertiary label under the animal label can be set as: monkey, rabbit, tiger, etc., and so on.
Step 1.32, extracting image features from the training set by using a histogram of oriented gradients (HOG).
Step 1.33, training an SVM (support vector machine) classifier with the image features to obtain an image classification model.
Step 1.34, identifying and classifying the label of the first key frame by using the image classification model.
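Purely as an illustrative sketch of steps 1.31 to 1.34 (the patent does not name libraries; scikit-image, scikit-learn, the 128 × 128 resize, and the HOG parameters below are assumptions of this example):

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import SVC

def hog_features(images):
    """Histogram-of-oriented-gradients features for a list of grayscale images."""
    feats = []
    for img in images:
        img = resize(img, (128, 128))  # fixed size so all HOG vectors match
        feats.append(hog(img, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)))
    return np.array(feats)

def train_classifier(train_images, train_labels):
    """train_images: grayscale key-frame images collected per the preset rule;
    train_labels: one leaf label per image (e.g. "monkey"); both assumed
    to be prepared by the caller."""
    clf = SVC(kernel="linear")  # linear SVM over HOG features
    clf.fit(hog_features(train_images), train_labels)
    return clf
```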
Step 1.4, labeling the first key frame according to the visual content in the first key frame, based on the preset rule. For example, if the visual content is a monkey, the first-level label of the first key frame is object, the second-level label is animal, and the third-level label is monkey.
Step 1.5, judging whether the label of the first key frame exists in the feature database: if it exists, iterating to judge whether the label level of the first key frame is the bottom label level and, if not, continuing to iterate downward until the bottom label level is reached; if the label does not exist in the feature database, creating a new label category and assigning the first key frame to the new label category.
For example, referring to fig. 3, if the label of the first key frame is monkey, the iteration passes the first-level label object and the second-level label animal and reaches the third-level label monkey; since monkey is the bottom-level label, the iteration stops there. A sketch of this walk follows below.
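For concreteness, the label-hierarchy iteration of step 1.5 can be pictured as a walk down a nested dictionary; the layout and the names below (feature_db, locate_or_create) are this example's own, not the patent's:

```python
# A toy label hierarchy: each inner node maps a label to its children;
# leaves (bottom-level labels) map to lists of stored descriptors.
feature_db = {
    "object": {
        "animal": {"monkey": [], "rabbit": [], "tiger": []},
        "plant": {},
        "human": {},
    },
    "scene": {},
    "concept": {},
}

def locate_or_create(db, label_path):
    """Iterate down the label levels, creating missing categories on the way.

    label_path is e.g. ["object", "animal", "monkey"]; the returned list
    holds the feature point descriptors stored under the bottom-level label.
    """
    node = db
    for depth, label in enumerate(label_path):
        is_bottom = depth == len(label_path) - 1
        if label not in node:
            node[label] = [] if is_bottom else {}  # new label category
        node = node[label]
    return node

descriptors = locate_or_create(feature_db, ["object", "animal", "monkey"])
```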
Step 1.6, extracting the image features of the first keyframe by using a Scale-invariant feature transform (SIFT) technique to generate a first feature point descriptor of the first keyframe.
In a specific embodiment, step 1.6 may specifically include steps 1.61-1.66, wherein:
and 1.61, constructing a scale space by using a Gaussian difference function (DoG).
In particular, multi-scale features of image data are modeled to ensure that image features are scale independent. In this embodiment, a gaussian difference function is used to construct a scale space, and the specific formula is as follows:
$$L(x,y,\sigma)=G(x,y,\sigma)*I(x,y)$$

$$G(x,y,\sigma)=\frac{1}{2\pi\sigma^2}\,e^{-\frac{x^2+y^2}{2\sigma^2}}$$

$$D(x,y,\sigma)=L(x,y,k\sigma)-L(x,y,\sigma)$$

where $(x,y)$ are the spatial coordinates, $\sigma$ is the scale coordinate, $L(x,y,\sigma)$ is the two-dimensional image scale-space function, $I(x,y)$ is the original image function, $G(x,y,\sigma)$ is the variable-scale Gaussian function, $D(x,y,\sigma)$ is the Gaussian difference scale-space function, and $k$ is the ratio of two adjacent image scales.
Step 1.62, comparing each sampling point of the first key frame with its adjacent points at different σ values in the same scale space; if the sampling point is a maximum or minimum there, it is a first feature point at that scale.
Specifically, the sampling points are the points of the first key frame: every point of the first key frame is compared in turn with its adjacent points at different σ values in the same scale space, and if the point under comparison is a maximum or minimum of the Gaussian difference function in that scale space, it is taken as a first feature point of the image at that scale.
Step 1.63, retaining second feature points from among the first feature points according to contrast and principal curvature ratio.
Specifically, the absolute contrast of each first feature point is compared against a first threshold and its principal curvature ratio against a second threshold; if the absolute contrast is below the first threshold or the principal curvature ratio is above the second threshold, the point is removed, and the retained first feature points become the second feature points. This step removes low-contrast points and unstable edge responses, keeping representative feature points.
Here, a low-contrast point is a feature point whose absolute contrast is below 0.03, and an unstable edge response is a feature point whose principal curvature ratio exceeds 10; that is, the first threshold is preferably 0.03 and the second threshold is preferably 10.
Step 1.64, locating the position and scale of each second feature point by fitting a three-dimensional quadratic function.
Step 1.65, obtaining the main direction of each second feature point from the peak of the histogram.
Specifically, the gradient directions of the pixels in a neighborhood centered on each second feature point are accumulated, the statistics are represented as a histogram, and the peak of the histogram gives the main direction of the second feature point. The gradient magnitude and main direction are computed as follows:
$$m(x,y)=\sqrt{\big(L(x+1,y)-L(x-1,y)\big)^2+\big(L(x,y+1)-L(x,y-1)\big)^2}$$

$$\theta(x,y)=\tan^{-1}\frac{L(x,y+1)-L(x,y-1)}{L(x+1,y)-L(x-1,y)}$$

where $m(x,y)$ is the gradient magnitude and $\theta(x,y)$ is the main direction.
Step 1.66, rotating the coordinate axes to the main direction of the second feature point, forming the multi-dimensional SIFT feature for the second feature point, and length-normalizing the SIFT feature vector to obtain the first feature point descriptor. The coordinate axes here are those of the feature point and of the surrounding pixels that contribute to it; the contributing pixel region generally takes a 4 × 4 window centered on the feature point as the neighborhood.
The steps above determine the three key pieces of information of a SIFT feature point: position, scale, and direction. To speed up subsequent matching, feature point descriptors are generated; a descriptor covers the feature point and the surrounding pixels that contribute to it. It is generated by first rotating the coordinate axes to the main direction of the feature point and then forming the multi-dimensional SIFT feature for that point.
Preferably, the SIFT feature is 128-dimensional, since the 128-dimensional feature vector gives stronger matching robustness.
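For orientation only: steps 1.61 to 1.66 correspond closely to what OpenCV's built-in SIFT implementation performs internally, so a sketch can delegate to it. The threshold values below mirror the preferred 0.03 and 10 above, though OpenCV scales contrastThreshold internally, so the mapping is approximate and is an assumption of this example:

```python
import cv2

def sift_descriptors(keyframe_bgr):
    """128-dimensional SIFT descriptors for one key frame.

    OpenCV's SIFT builds the DoG scale space, finds extrema, filters them
    by contrast and principal-curvature ratio, assigns main directions, and
    emits length-normalized 128-D descriptors, roughly steps 1.61 to 1.66.
    """
    gray = cv2.cvtColor(keyframe_bgr, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create(contrastThreshold=0.03,  # first threshold (contrast)
                           edgeThreshold=10)        # second threshold (curvature ratio)
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    return keypoints, descriptors  # descriptors: (num_points, 128) float32
```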
Step 1.7, storing the first feature point descriptors generated from the first key frame into the feature database according to their label categories.
Step 2, extracting a plurality of second key frames from the target video, the second key frames generating second feature point descriptors; the target video is a video that must be checked for duplication.
In a specific embodiment, step 2 may specifically include steps 2.1-2.6, where:
and 2.1, acquiring a target video.
And 2.2, calculating the optical flow of the object motion in the video frames of the target video by using an optical flow method, extracting the video frame with the minimum optical flow moving frequency, and taking the video frame with the minimum optical flow moving frequency as a second key frame.
And 2.3, identifying the visual content in the second key frame.
Specifically, the labels of the second keyframes are identified and categorized using an image classification model.
And 2.4, labeling labels for the second key frames according to the visual contents in the second key frames based on a preset rule.
And 2.5, judging whether the label of the second key frame exists in the feature database, if the label of the second key frame exists in the feature database, continuously iterating to judge whether the label level of the second key frame is the bottom label level, if not, continuously iterating downwards until the second key frame reaches the bottom label level, and if the label of the second key frame does not exist in the feature database, creating a new label category and classifying the second key frame under the changed label category.
And 2.6, extracting the image features of the second key frame by using an SIFT technology to generate a second feature point descriptor of the second key frame.
For step 2, please refer to the feature extraction step in step 1 for detailed implementation steps, which are not described herein again.
Step 3, matching the second feature point descriptor of each second key frame against the first feature point descriptors in the feature database to judge whether the second key frame already exists in the feature database; if so, deleting the target video corresponding to that second key frame to obtain a deduplicated video set, the video set comprising the target videos remaining after deduplication.
Specifically, the feature points under the lowest label level in the feature database are traversed, and the similarity between each second feature point descriptor of the target video and the first feature point descriptors in the feature database is computed in turn; if the similarity is below a distance threshold, the second key frame already exists in the feature database, so the target video corresponding to that second key frame is deleted, and steps 2 and 3 are repeated until the deduplicated video set is obtained.
Preferably, to reduce the computational burden of matching, the similarity is measured with the block (city-block) distance: the similarity is the nearest block distance divided by the next-nearest block distance. The block distance is computed as

$$L_0=|x_1-x_2|+|y_1-y_2|$$

where $L_0$ is the block distance and $(x_1,y_1)$, $(x_2,y_2)$ are the two-dimensional coordinates of two pixels.
Preferably, the distance threshold is 0.8: two images are considered the same if the nearest distance divided by the next-nearest distance (best match over second-best match) is below 0.8.
Further, after the videos judged identical are deleted, steps 2 and 3 are repeated until no similarity ratio falls below the 0.8 threshold, yielding the deduplicated video set.
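A sketch of this ratio-test matching with the block (L1) distance, using OpenCV's brute-force matcher. Applying the L1 norm to whole 128-D descriptors rather than pixel coordinates, and the min_matches vote, are assumptions of this example, not values from the patent:

```python
import cv2

def keyframe_is_duplicate(second_desc, first_desc, ratio=0.8, min_matches=10):
    """Ratio test: similarity = nearest block distance / next-nearest.

    second_desc: SIFT descriptors of a target-video key frame.
    first_desc:  descriptors stored under the same bottom-level label.
    """
    matcher = cv2.BFMatcher(cv2.NORM_L1)  # NORM_L1 is the block distance
    good = 0
    for pair in matcher.knnMatch(second_desc, first_desc, k=2):
        if len(pair) < 2:
            continue  # not enough stored descriptors for a ratio
        nearest, next_nearest = pair
        if next_nearest.distance > 0 and nearest.distance / next_nearest.distance < ratio:
            good += 1  # this descriptor already exists in the database
    return good >= min_matches

# If keyframe_is_duplicate(...) holds for a second key frame, the target
# video it came from is deleted, and steps 2 and 3 repeat on the next video.
```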
Step 4, updating the feature database and the video database according to the target videos in the deduplicated video set.
In a specific embodiment, step 4 may specifically include steps 4.1-4.2:
Step 4.1, storing the second feature point descriptors of the target videos in the deduplicated video set into the feature database according to their label categories, so as to update the feature database.
Step 4.2, storing the target videos of the deduplicated video set into the video database so as to update the video database; the updated video database can serve as the original video library for the next round.
The invention addresses the low efficiency, low accuracy, and high cost of short video deduplication. It builds a hierarchical image-feature label database by classification, using the SIFT algorithm and image recognition techniques, and deduplicates by matching the key-frame features of the target video. The hierarchical feature database avoids a full traversal of the feature database and greatly reduces the objects to be traversed during matching. The SIFT algorithm is computationally simple and quickly locates local features in an image regardless of image size or orientation, so extracting the image feature data of the target video's key frames based on SIFT allows local image regions to be located and identified accurately, greatly improving deduplication accuracy. Replacing the Euclidean distance with the block distance when computing video image feature similarity reduces the computational burden and thus the cost of video deduplication.
Example two
The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of embodiment one.
In general, the computer-readable storage medium may be disposed in a computer device. Referring to fig. 4, the computer device may comprise a processor, a communication interface, a computer-readable storage medium, and a communication bus, the processor, the communication interface, and the memory communicating with one another through the communication bus;
a computer-readable storage medium for storing a computer program;
a processor, configured to implement the following steps when executing a program stored on a computer-readable storage medium:
step 1, creating a feature database, wherein the feature database comprises first feature point descriptors generated from first key frames of original videos, and all the first feature point descriptors are stored in the feature database according to a label hierarchy;
step 2, extracting a plurality of second key frames from a target video, the second key frames generating second feature point descriptors;
and step 3, matching the second feature point descriptor of each second key frame against the first feature point descriptors in the feature database to judge whether the second key frame already exists in the feature database, and if so, deleting the target video corresponding to that second key frame to obtain a deduplicated video set.
The communication bus mentioned in the above computer device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc.
The communication interface is used for communication between the electronic equipment and other equipment.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The computer device may be: desktop computers, laptop computers, intelligent mobile terminals, servers, and the like. Without limitation, any electronic device that can implement the present invention is within the scope of the present invention.
As for the computer device/storage medium embodiment, since it is substantially similar to the method embodiment, its description is kept brief; for relevant points, refer to the partial description of method embodiment one.
In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples, and those skilled in the art can combine different embodiments or examples described in this specification.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "module" or "system. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. A computer program stored/distributed on a suitable medium supplied together with or as part of other hardware, may also take other forms of distribution, such as via the Internet or other wired or wireless telecommunication systems.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (10)

1. A short video deduplication method, wherein the short video deduplication method comprises:
step 1, creating a feature database, wherein the feature database comprises first feature point descriptors generated from first key frames of original videos, and all the first feature point descriptors are stored in the feature database according to a label hierarchy;
step 2, extracting a plurality of second key frames from a target video, the second key frames generating second feature point descriptors;
and step 3, matching the second feature point descriptor of each second key frame against the first feature point descriptors in the feature database to judge whether the second key frame already exists in the feature database, and if so, deleting the target video corresponding to that second key frame to obtain a deduplicated video set.
2. The short video deduplication method of claim 1, wherein the step 1 comprises:
step 1.1, acquiring the original video;
step 1.2, calculating the optical flow of object motion in the video frames of the original video by an optical flow method, extracting the video frame with the least total optical flow, and taking that video frame as a first key frame;
step 1.3, identifying the visual content in the first key frame;
step 1.4, labeling the first key frame according to the visual content in the first key frame, based on a preset rule;
step 1.5, judging whether the label of the first key frame exists in the feature database; if it exists, iterating to judge whether the label level of the first key frame is the bottom label level and, if not, continuing to iterate downward until the bottom label level is reached; if the label does not exist in the feature database, creating a new label category and assigning the first key frame to the new label category;
step 1.6, extracting image features of the first key frame with the SIFT technique to generate a first feature point descriptor of the first key frame;
and step 1.7, storing the first feature point descriptors generated from the first key frame into the feature database according to their label categories.
3. The short video deduplication method of claim 2, wherein the step 1.3 comprises:
step 1.31, acquiring a training set;
step 1.32, extracting image features from the training set by using a histogram of oriented gradients (HOG);
step 1.33, training an SVM classifier by using the image characteristics so as to obtain an image classification model;
and 1.34, identifying and classifying the label of the first key frame by using the image classification model.
4. The short video deduplication method of claim 2, wherein the step 1.6 comprises:
step 1.61, constructing a scale space by utilizing a Gaussian difference function;
step 1.62, comparing each sampling point of the first key frame with its adjacent points at different σ values in the same scale space, the sampling point being a first feature point at that scale if it is a maximum or minimum of the Gaussian difference function in that scale space, where σ is the scale coordinate;
step 1.63, retaining second feature points from among the first feature points according to contrast and principal curvature ratio;
step 1.64, locating the position and scale of each second feature point by fitting a three-dimensional quadratic function;
step 1.65, obtaining the main direction of each second feature point from the peak of the histogram;
step 1.66, rotating the coordinate axes to the main direction of the second feature point, forming the multi-dimensional SIFT feature for the second feature point, and length-normalizing the SIFT feature vectors to obtain first feature point descriptors.
5. The short video deduplication method of claim 4, wherein the step 1.63 comprises:
and judging the relation between the contrast absolute value of the first characteristic point and a first threshold value and the relation between the main curvature ratio of the first characteristic point and a second threshold value, if the contrast absolute value of the first characteristic point is smaller than the first threshold value or the main curvature ratio of the first characteristic point is larger than the second threshold value, removing the first characteristic point, and taking the reserved first characteristic point as the second characteristic point.
6. The short video deduplication method of claim 1, wherein the step 2 comprises:
step 2.1, acquiring the target video;
step 2.2, calculating the optical flow of object motion in the video frames of the target video by an optical flow method, extracting the video frame with the least total optical flow, and taking that video frame as a second key frame;
step 2.3, identifying the visual content in the second key frame;
step 2.4, labeling the second key frame according to the visual content in the second key frame, based on a preset rule;
step 2.5, judging whether the label of the second key frame exists in the feature database; if it exists, iterating to judge whether the label level of the second key frame is the bottom label level and, if not, continuing to iterate downward until the bottom label level is reached; if the label does not exist in the feature database, creating a new label category and classifying the second key frame under the new label category;
and step 2.6, extracting image features of the second key frame with the SIFT technique to generate a second feature point descriptor of the second key frame.
7. The short video deduplication method of claim 1, wherein the step 3 comprises:
traversing the feature points under the lowest label level in the feature database and computing, in turn, the similarity between each second feature point descriptor of the target video and the first feature point descriptors in the feature database; if the similarity is below a distance threshold, the second key frame already exists in the feature database, so the target video corresponding to that second key frame is deleted, and steps 2 and 3 are repeated until the deduplicated video set is obtained, the similarity being the nearest block distance divided by the next-nearest block distance.
8. The short video deduplication method of claim 1, wherein after the step 3, further comprising:
step 4, updating the feature database and the video database according to the target videos in the deduplicated video set.
9. The short video deduplication method of claim 8, wherein the step 4 comprises:
step 4.1, storing the second feature point descriptors of the target videos in the deduplicated video set into the feature database according to their label categories so as to update the feature database;
and step 4.2, storing the target videos in the deduplicated video set into the video database so as to update the video database.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 9.
CN202210284145.3A · priority date 2022-03-22 · filing date 2022-03-22 · Short video de-duplication method and computer readable storage medium · Withdrawn · CN114860991A (en)

Priority Applications (1)

Application Number: CN202210284145.3A · Publication: CN114860991A (en) · Title: Short video de-duplication method and computer readable storage medium


Publications (1)

Publication Number: CN114860991A · Publication Date: 2022-08-05

Family

ID=82627093

Family Applications (1)

Application Number: CN202210284145.3A (CN) · Priority Date: 2022-03-22 · Filing Date: 2022-03-22 · Title: Short video de-duplication method and computer readable storage medium

Country Status (1)

Country: CN · CN114860991A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115641648A (en) * 2022-12-26 2023-01-24 深圳飞蝶虚拟现实科技有限公司 3D remote interactive processing system based on visual pair repeated action analysis and filtration
CN115641648B (en) * 2022-12-26 2023-08-18 苏州飞蝶虚拟现实科技有限公司 3D remote interactive processing system based on visual repetitive action analysis and filtration
CN117156200A (en) * 2023-06-06 2023-12-01 青岛尘元科技信息有限公司 Method, system, electronic equipment and medium for removing duplication of massive videos


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20220805)