CN111143619A - Video fingerprint generation method, video fingerprint retrieval method, electronic device and medium - Google Patents

Video fingerprint generation method, video fingerprint retrieval method, electronic device and medium Download PDF

Info

Publication number
CN111143619A
CN111143619A (application CN201911378319.7A)
Authority
CN
China
Prior art keywords
video
fingerprint
brief
detected
fine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911378319.7A
Other languages
Chinese (zh)
Other versions
CN111143619B (en)
Inventor
闫威
徐嵩
王琦
李琳
王科
杜欧杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Culture Technology Co Ltd
Priority to CN201911378319.7A priority Critical patent/CN111143619B/en
Publication of CN111143619A publication Critical patent/CN111143619A/en
Application granted granted Critical
Publication of CN111143619B publication Critical patent/CN111143619B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Landscapes

  • Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Collating Specific Patterns (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the invention provide a video fingerprint generation method, a retrieval method, an electronic device and a storage medium. The generation method comprises the following steps: acquiring a video fine fingerprint, characterized by a two-dimensional feature vector, based on a video feature extraction model; and converting the video fine fingerprint into a video brief fingerprint, characterized by a one-dimensional feature vector, based on information entropy. With the generation method of the embodiments, an independent video fingerprint database is established, and a one-dimensional feature vector that accurately characterizes the video is obtained through calculation based on information entropy. Further, in the retrieval method, the video to be detected is quickly compared against the video fingerprints in the established video fingerprint library, so that videos can be accurately retrieved and confirmed, the number of identical or similar videos appearing in content-distribution and personalized-recommendation results is reduced, and user experience is improved.

Description

Video fingerprint generation method, video fingerprint retrieval method, electronic device and medium
Technical Field
The invention relates to the technical field of video processing, in particular to a video fingerprint generation method, a retrieval method, electronic equipment and a medium.
Background
A video fingerprinting service is based on video fingerprint technology: a string of fingerprint characters that uniquely identifies the current video is generated from the video content, and this fingerprint is robust to operations on the video file such as format conversion, editing, cutting and splicing, compression and rotation. The technique can be used in many scenarios, such as video similarity de-duplication, video copyright checking and advertisement identification, and video fingerprints serve purposes including video identification, retrieval and copyright protection.
In existing video fingerprint matching technology, frame pictures obtained from the video to be detected are partitioned into blocks, and each frame picture is encoded according to the pixels of each block to form a video fingerprint; the codes of the frame pictures of the video to be detected are hash-mapped to obtain hash mapping addresses; these addresses are then looked up in the hash table of a pre-established video fingerprint database, and matching identification is performed according to the lookup result.
In the prior art, in the video fingerprint extraction stage, a video interesting region is used as a basic unit for extracting video fingerprints, and a clustering algorithm is adopted to remove the time domain redundancy characteristics of the video fingerprints; in the video fingerprint matching stage, the video fingerprint matching is carried out by adopting a distance average method of a plurality of video fingerprints.
In the prior art, a video sequence key frame is partitioned according to inter-frame correlation and video features are extracted from the video sequence key frame, the occurrence frequency of each element of a pixel feature dictionary is counted in each sub-block of the key frame according to the classification result of the pixel, feature vectors of the sub-blocks are obtained and spliced into high-dimensional fingerprints of the key frame, and the fingerprints of the key frame are connected into a video key frame fingerprint string according to the time sequence through dimension reduction.
In the prior art, the video is also subjected to label preprocessing; before searching videos, multi-label classification is carried out on the videos, and then the identification range is narrowed through video library label matching.
However, such block processing generates high-dimensional feature vectors, and the computing power required to retrieve and match videos through high-dimensional feature vectors is enormous, while one-dimensional representations in the usual sense cause information loss or inaccurate expression of the video. How to simplify video fingerprints for use in video retrieval and matching is therefore an important problem that the industry urgently needs to solve.
Disclosure of Invention
The embodiment of the invention provides a video fingerprint generation method, a retrieval method, electronic equipment and a medium, which are used for solving the problems of inaccurate video retrieval and large computation amount in the prior art.
In a first aspect, an embodiment of the present invention provides a method for generating a video fingerprint, where the method includes: acquiring a video fine fingerprint represented by a two-dimensional feature vector based on a video feature extraction model; and converting the video fine fingerprints into video brief fingerprints represented by one-dimensional feature vectors based on the information entropy.
The method for generating the video fingerprint further comprises the following steps: establishing a corresponding video fine fingerprint database and a corresponding video brief fingerprint database according to the video fine fingerprint and the video brief fingerprint; and establishing a space coordinate index of the video brief fingerprint database corresponding to the one-dimensional characteristic vector value according to the video brief fingerprint.
Wherein the converting the video fine fingerprint into a video brief fingerprint characterized by a one-dimensional feature vector comprises: performing one-dimensional compression on the two-dimensional feature vector; acquiring, for each frame of the compressed video, the sum of the information entropies of the frame's feature vector values as the information entropy of that frame; and converting the frame sequence of the video into the one-dimensional feature vector based on the information entropies of the frames.
In a second aspect, an embodiment of the present invention provides a method for retrieving a video fingerprint, including: acquiring a fine fingerprint of a to-be-detected video represented by a two-dimensional feature vector based on a video feature extraction model; based on the information entropy, converting the fine fingerprint of the video to be detected into a brief fingerprint of the video to be detected represented by a one-dimensional characteristic vector; and matching the brief fingerprint of the video to be detected with a brief video fingerprint library based on the spatial coordinate index, and acquiring a primary selection video set corresponding to a matching result from the brief video fingerprint library.
The video fingerprint retrieval method further comprises the following steps: and matching the fine fingerprint of the video to be detected with the fine fingerprint library corresponding to the primary selection video set to obtain a matching result.
The method comprises the following steps of matching the brief fingerprint of the video to be detected with a brief video fingerprint library based on a spatial coordinate index, and acquiring a primary selection video set corresponding to a matching result from the brief video fingerprint library, wherein the method comprises the following steps: determining a one-dimensional characteristic vector of the video to be detected, and adjusting the value range of the one-dimensional characteristic vector of the video to be detected according to the influence factor and the step length; matching each value of the one-dimensional characteristic vector of the video to be detected with the spatial coordinate index, and confirming that a matching result is in the value range; and recording the corresponding primary selection space coordinate in the space coordinate index, and determining a corresponding primary selection video set according to the primary selection space coordinate.
Wherein the matching each value of the one-dimensional feature vector of the video to be detected with the spatial coordinate index comprises: calculating a standard deviation according to the entropy value corresponding to the spatial coordinate matched with the spatial coordinate index and the one-dimensional feature vector of each frame of image of the video to be detected; and determining the initial selection matching degree of the video to be detected according to the relation between the standard deviation and a preset threshold, wherein the initial selection matching degree is used for representing the similarity degree of the video to be detected and the video corresponding to the brief fingerprint database.
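A hedged sketch of this initial-selection criterion: assuming the standard deviation is taken over the per-frame differences between the query video's one-dimensional vector and the entropy values at the matched spatial coordinates, and using an arbitrary threshold (the patent does not fix either choice, and all names below are hypothetical):

```python
from statistics import pstdev

def initial_match_degree(query_vec, matched_entropies, threshold=0.1):
    """Standard deviation of the per-frame differences between the
    query video's 1-D feature vector and the entropy values at the
    matched spatial coordinates, compared with a preset threshold.
    A small deviation means the query and the library video are
    similar, so the library video enters the initially selected set."""
    diffs = [q - m for q, m in zip(query_vec, matched_entropies)]
    sd = pstdev(diffs)
    return sd <= threshold, sd

# Identical per-frame entropies give deviation 0, a perfect initial match.
ok, sd = initial_match_degree([2.0, 1.3, 0.8], [2.0, 1.3, 0.8])
print(ok, sd)
```

A uniform shift of all entropy values (e.g. a global brightness change) leaves the standard deviation of the differences at zero, which is one reason a deviation-based measure is more forgiving than a direct distance.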
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the method for generating a video fingerprint according to the first aspect when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the video fingerprint retrieval method according to the second aspect.
In a fifth aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method for generating a video fingerprint according to the first aspect.
In a sixth aspect, the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of the video fingerprint retrieval method according to the second aspect.
According to the video fingerprint generation method, the video fingerprint retrieval method, the electronic device and the storage medium, the independent video fingerprint database is established, and the one-dimensional characteristic vector accurately representing the video is obtained based on the calculation of the information entropy. Furthermore, the video to be detected is quickly compared with the video fingerprints in the established video fingerprint library, so that the retrieval and confirmation of the video can be accurately carried out, repeated videos or repeated segments can be recalled in real time, the content in the media asset library is prevented from being stored repeatedly, and the media asset storage efficiency is improved; the phenomenon that the same or similar videos are too many in content distribution and personalized recommendation results is reduced, and user experience is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for generating video fingerprints based on information entropy according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a video fingerprint retrieval method based on entropy according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a process of a video feature extraction model according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a video feature extraction model according to an embodiment of the present invention;
FIG. 5 is a diagram of a video fingerprint generation device according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating an apparatus for retrieving video fingerprints according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device for generating video fingerprints according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device for retrieving video fingerprints according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In general, the method carries out feature extraction on an original video or a video to be detected through an improved unsupervised deep learning method, and generates two-dimensional feature vectors with uniform formats as fine fingerprints of the video; and then, the video fine fingerprints are measured by using an information entropy measurement method to generate the video brief fingerprints of the one-dimensional characteristic vectors, so that the video representation accuracy is improved.
Furthermore, a space coordinate index is generated by the one-dimensional characteristic vector value and is used as a basis for fast matching of the video fingerprints, and a fingerprint library of corresponding brief fingerprints is formed.
During retrieval or matching, a standard two-dimensional feature vector and a standard one-dimensional feature vector corresponding to the video to be detected are generated. The brief fingerprint characterized by the one-dimensional feature vector is quickly searched and matched against the one-dimensional feature vectors in the video brief fingerprint library according to the spatial coordinate index and a corresponding matching algorithm, yielding an initially selected set. The fine fingerprint is then matched against the videos of the initially selected set, and the final matching result is output, thereby improving recall and quickly narrowing the matching subset.
Fig. 1 is a flowchart of a method for generating a video fingerprint according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps: s11, acquiring video fine fingerprints represented by two-dimensional feature vectors based on the video feature extraction model; and S12, converting the video fine fingerprints into video brief fingerprints represented by one-dimensional feature vectors based on the information entropy.
In step S11, a video feature extraction model is constructed, and based on the automatic coding and decoding network, a video fine fingerprint corresponding to the two-dimensional feature vector is obtained, so as to obtain a two-dimensional feature vector library of the video fine fingerprint.
In one embodiment, all videos with different resolutions are subjected to dimensionality reduction and key frame extraction based on a video feature extraction model, namely a neural network of an unsupervised self-encoder, and a two-dimensional feature vector matrix corresponding to the videos with uniform reference resolution is obtained. In the two-dimensional feature vectors, a first dimension represents a feature vector group of each key frame of the video, and a second dimension represents the time sequence of the key frame. Note that the video feature extraction model, in this application, may be a neural network of an unsupervised auto-encoder, wherein the auto-codec network is part of the neural network.
As shown in fig. 3, fig. 3 is a schematic diagram of a video feature extraction model according to an embodiment of the present invention. Further, acquiring the two-dimensional feature vector matrix corresponding to the video at the uniform reference resolution includes: extracting all key frames from the video and inputting them into the self-encoder shown in fig. 3; and acquiring feature values of image frames at the same reference resolution through an automatic coding and decoding network sharing 5 hidden layers. By constructing an unsupervised self-encoder-based neural network, videos of all different resolutions are reduced in dimensionality and their core features are extracted, and video fingerprint feature vector matrices of uniform size are output, providing reference support for subsequent comparison of videos with the same content but different resolutions.
In a specific embodiment, all key frames are extracted from each video in the video library, and each key frame image is subjected to gray level conversion and serves as the input of the self-encoder. Outputting an image with the same dimensionality as an original image through an automatic coding and decoding network sharing 5 hidden layers; and training the model, and finishing the training when the error target is smaller than a specific value.
Further, since the resolutions of different videos are different, in fig. 3, the model is modified, and the input and output layers with different resolutions are accessed separately, but share a common hidden layer, i.e., a codec network, to ensure that the same feature extraction rules are generated.
Wherein, in one embodiment, the feature value is confirmed to meet the threshold requirement, and a two-dimensional feature vector corresponding to the video is obtained.
In another embodiment, even for the same content, identical videos of different resolutions may coexist at the same time. If the model extracts two markedly different feature codes for them, the recall of video fingerprint de-duplication is reduced. To solve this problem, after model pre-training finishes and the initial weights are preset, the model is further improved.
Fig. 4 is a schematic diagram of an improved video feature extraction model according to an embodiment of the present invention. As shown in fig. 4, a reference resolution is set for the video, for example 480p, and the output layer features are adjusted to features at the consistent reference resolution level. Further, the error determination criterion is modified to compare against the original feature input value at the reference resolution.
That is, the original error function e(x_out − x_in) is changed to e(x_out′ − x_in′), where:
x_in represents the original image features of the content video provided in advance, for example at 1080p;
x_out represents the image features at the matched resolution output after the original x_in is trained with the self-coding network, i.e. a 1080p-based feature output;
x_in′ represents the original image features of the 480p version of the same content provided in advance, used as the reference input;
x_out′ represents the image feature values output after self-coding network training at only the reference resolution of 480p, used as the reference output.
In this way, it is ensured that the same feature code is always output after model calculation no matter how many different resolution inputs the video with the same content has. And continuously training the improved model, inputting key frames of the equal time sequence with different resolutions and the same content, outputting the characteristic value of the image frame with the same reference resolution and carrying out error judgment until the requirement of a threshold value is met. After the training is finished, the part from the input layer to the coding layer is taken as the final feature extraction model, namely the part inside the dashed box shown in fig. 4.
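The modified error criterion can be illustrated with a toy mean-squared-error function. The choice of MSE as e(·) and all feature values below are assumptions for this sketch; the patent does not fix the exact form of the error function:

```python
def mse(a, b):
    """Mean squared error between two equal-length feature vectors,
    standing in for the patent's error function e(.)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Original criterion: e(x_out - x_in), the network's full-resolution
# output scored against its own full-resolution input.
# Modified criterion: e(x_out' - x_in'), the reference-resolution
# (e.g. 480p) output scored against the reference-resolution original
# features, so every input resolution is judged against one target.
x_in_ref = [0.20, 0.50, 0.90]   # x_in' : 480p original features (toy data)
x_out_ref = [0.21, 0.48, 0.90]  # x_out': 480p network output (toy data)
print(mse(x_out_ref, x_in_ref))
```

Because every resolution variant of the same content is trained toward the same 480p feature target, the per-resolution inputs converge to one shared feature code.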
In one embodiment, each video is converted into a two-dimensional feature vector, wherein the first dimension is a set of feature vectors representing each key frame image and the second dimension represents the time sequence of each key frame image.
In step S12, a one-dimensional feature vector based on the information entropy is generated, and the brief fingerprint of the video is correspondingly generated. Further, a video brief fingerprint database is constructed.
Wherein, in one embodiment, the converting the video fine fingerprint into a video brief fingerprint characterized by a one-dimensional feature vector comprises: performing one-dimensional compression on the two-dimensional feature vector; acquiring, for each frame of the compressed video, the sum of the information entropies of the frame's feature vector values as the information entropy of that frame; and converting the frame sequence of the video into the one-dimensional feature vector based on the information entropies of the frames.
Namely, the generated two-dimensional feature vector group of each video is compressed in one dimension, and the video is converted into the one-dimensional feature vector group as the brief fingerprint of the video by adopting an information entropy method.
Wherein, the information entropy is the measurement representation of the information, and an image can be represented by the information entropy. When the contents of the two images are equal or close, the information entropies of the two images are also equal or close.
The expression formula of the information entropy is:

H = −Σ_i p_i · log(p_i)

where p_i is the corresponding vector value and H is the corresponding information entropy value. Information measurement of the image content is thus realized through the information entropy; even if the original video is tampered with, for example by rotating the video picture, the value of the information entropy is unchanged, which improves the fault tolerance and recall of video matching.
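As an illustrative sketch only (the per-frame feature values, the normalization, the log base, and the function names below are not specified by the patent and are invented for the example), the entropy-based conversion of a fine fingerprint into a brief fingerprint might look like this:

```python
import math

def frame_entropy(features):
    """Shannon entropy of one frame's feature vector. The values are
    normalized to sum to 1 so they can be treated as a probability
    distribution (a normalization the patent does not specify)."""
    total = sum(features)
    probs = [v / total for v in features if v > 0]
    return -sum(p * math.log2(p) for p in probs)

def brief_fingerprint(fine_fingerprint):
    """Collapse a 2-D fine fingerprint (frames x feature values, in
    time order) into a 1-D brief fingerprint: one entropy per frame."""
    return [frame_entropy(frame) for frame in fine_fingerprint]

# Toy fine fingerprint: two key frames, four feature values each.
fine = [[0.25, 0.25, 0.25, 0.25],  # uniform distribution -> 2.0 bits
        [0.70, 0.10, 0.10, 0.10]]
print(brief_fingerprint(fine))
```

With this reading, two frames whose feature distributions are equal or close produce equal or close entropy values, which is exactly the property the matching stage relies on.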
In addition, a spatial index of the entropy-based vector values is established, providing a basis for subsequent fast lookup. Because the video feature vectors are converted from two dimensions to one, the time complexity of a coarse comparison of one video can, in the best case, be reduced from O(n²) to O(log n), thereby greatly reducing the lookup time.
Establishing a corresponding video brief fingerprint database according to the video brief fingerprint; and establishing a space coordinate index of the video brief fingerprint database corresponding to the one-dimensional characteristic vector value according to the video brief fingerprint.
In a specific embodiment, the sum of the information entropies is obtained for the feature values of each image of the video and used as the information entropy of that image, and the image sequence of the video is converted into a one-dimensional feature vector serving as the brief fingerprint of the video. Further, a spatial coordinate index in the format (one-dimensional vector value, array coordinate sequence number) is established for the one-dimensional vector, in preparation for subsequent fast lookup.
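A minimal sketch of such a (vector value, array coordinate sequence number) index, with toy data and hypothetical names rather than the patent's actual implementation, could be:

```python
import bisect

def build_spatial_index(brief_fp):
    """(entropy value, array coordinate sequence number) pairs,
    sorted by value so lookups can use binary search."""
    return sorted((v, i) for i, v in enumerate(brief_fp))

def lookup(index, value):
    """Binary search for an exact entropy value: O(log n) per probe
    instead of a linear scan of the fingerprint."""
    pos = bisect.bisect_left(index, (value,))
    if pos < len(index) and index[pos][0] == value:
        return index[pos][1]
    return None

idx = build_spatial_index([1.7, 0.9, 2.4])
print(lookup(idx, 0.9))  # -> 1, the array coordinate of value 0.9
```

Sorting by entropy value is what turns each probe into a binary search, giving the best-case O(log n) coarse comparison mentioned above.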
Fig. 2 is a schematic flowchart of a video fingerprint retrieval method based on information entropy according to an embodiment of the present invention, and as shown in fig. 2, the video fingerprint retrieval method includes: s21, acquiring a fine fingerprint of the video to be detected represented by the two-dimensional feature vector based on the video feature extraction model; based on the information entropy, converting the fine fingerprint of the video to be detected into a brief fingerprint of the video to be detected represented by a one-dimensional characteristic vector; and S22, matching the brief fingerprint of the video to be detected with the brief video fingerprint library based on the spatial coordinate index, and acquiring a primary selection video set corresponding to the matching result from the brief video fingerprint library.
The process of acquiring the fine fingerprint and the brief fingerprint in S21 is the same as in the previous embodiment; there, the fingerprints were acquired as data for the fingerprint database to be matched against, whereas in the present embodiment they serve as the sample data actually being matched.
In S21, a video fine fingerprint corresponding to the two-dimensional feature vector is obtained based on the automatic coding and decoding network by constructing a video feature extraction model.
In one embodiment, all videos with different resolutions are subjected to dimensionality reduction and key frame extraction based on a video feature extraction model, namely a neural network of an unsupervised self-encoder, and a two-dimensional feature vector matrix corresponding to the videos with uniform reference resolution is obtained. In the two-dimensional feature vectors, a first dimension represents a feature vector group of each key frame of the video, and a second dimension represents the time sequence of the key frame. Note that the video feature extraction model, in this application, may be a neural network of an unsupervised auto-encoder, wherein the auto-codec network is part of the neural network.
In a specific embodiment, all key frames of a video to be detected are extracted, and each image is subjected to gray level conversion and is used as the input of a self-encoder; and outputting the image with the same dimensionality as the original image through an automatic coding and decoding network with 5 hidden layers.
Furthermore, because the resolutions of different videos are different, the model is improved, and the input and output layers with different resolutions are accessed independently, but share a hidden layer, namely a coding and decoding network, so that the same feature extraction rule is ensured to be generated.
As shown in fig. 4, a reference resolution is set for the video, for example 480p. The output layer features are adjusted to features at the reference resolution level. Further, the error determination criterion is modified to compare against the original feature input value at the reference resolution, i.e. the original error function e(x_out − x_in) is changed to e(x_out′ − x_in′).
where x_in represents the original image features of the content video provided in advance, for example at 1080p;
x_out represents the image features at the matched resolution output after the original x_in is trained with the self-coding network, i.e. a 1080p-based feature output;
x_in′ represents the original image features of the 480p version of the same content provided in advance, used as the reference input;
x_out′ represents the image feature values output after self-coding network training at only the reference resolution of 480p, used as the reference output.
In this way, it is ensured that the same feature code is always output after model calculation no matter how many videos with different resolutions are input. And continuously training the improved model, inputting key frames of the equal time sequence with different resolutions and the same content, outputting the characteristic value of the image frame with the same reference resolution and carrying out error judgment until the requirement of a threshold value is met.
In step S21, a one-dimensional feature vector based on the information entropy is generated, and the brief fingerprint of the video to be detected is correspondingly generated. Wherein, in one embodiment, the converting the video fine fingerprint into a video brief fingerprint characterized by a one-dimensional feature vector comprises: performing one-dimensional compression on the two-dimensional feature vector; acquiring, for each frame of the compressed video, the sum of the information entropies of the frame's feature vector values as the information entropy of that frame; and converting the frame sequence of the video into the one-dimensional feature vector based on the information entropies of the frames.
Wherein, the information entropy is the measurement representation of the information, and an image can be represented by the information entropy. When the contents of the two images are equal or close, the information entropies of the two images are also equal or close.
The formula of the information entropy is:

H = -Σ_i p_i · log₂ p_i

where p_i is the probability of the i-th feature value occurring in the frame.
The information entropy thus measures the information expressed by the image content; even if the original video has been tampered with, for example by rotating the video picture, the value of the information entropy is unchanged, which improves the fault tolerance and recall of video matching.
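The rotation-invariance claim can be checked directly: rotating a frame permutes its pixels without changing their value distribution, so a histogram-based entropy is unchanged (a sketch; `frame_entropy` and the bin count are illustrative, not from the patent):

```python
import numpy as np

def frame_entropy(frame, bins=16):
    """Shannon entropy (base 2) of a frame's value histogram."""
    hist, _ = np.histogram(frame, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins; 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
frame = rng.random((64, 64))
rotated = np.rot90(frame)  # tampering such as picture rotation
assert abs(frame_entropy(frame) - frame_entropy(rotated)) < 1e-12
```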
In addition, a spatial index of the entropy-based vector values is established, providing a basis for subsequent fast search. Because the video feature vectors are converted from two dimensions to one, the time complexity of a coarse comparison of one video can, in the best case, drop from O(n²) to O(log n), greatly reducing lookup time.
In a specific embodiment, for each image of a video, the sum of the information entropies of its feature values is obtained and used as the information entropy of that image; the image sequence of the video is converted into a one-dimensional feature vector, which serves as the brief fingerprint of the video; and for this one-dimensional vector, a spatial coordinate index is established in the format (vector value, array coordinate sequence number).
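One way to realize such an index is a value-to-position map over the brief fingerprint (a sketch; the quantization step `round(value, 2)` is an illustrative choice, not specified by the patent):

```python
from collections import defaultdict

def build_spatial_index(brief_fingerprint):
    """Map each (quantized) entropy value to the frame positions where it occurs."""
    index = defaultdict(list)
    for pos, value in enumerate(brief_fingerprint):
        index[round(value, 2)].append(pos)  # quantize so values are matchable
    return index

brief = [3.91, 4.05, 4.05, 3.72]  # hypothetical per-frame entropies
idx = build_spatial_index(brief)
assert idx[4.05] == [1, 2]
assert idx[3.91] == [0]
```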
That is, the fine fingerprint and the brief fingerprint of the video to be detected are output, and the brief fingerprint is matched against the brief fingerprint of each video in the video library.
In an embodiment, the matching, based on the spatial coordinate index, the brief fingerprint of the video to be detected with the brief video fingerprint library, and acquiring a primary selection video set corresponding to a matching result from the brief video fingerprint library, includes: determining a one-dimensional characteristic vector of the video to be detected, and adjusting the value range of the one-dimensional characteristic vector of the video to be detected according to the influence factor and the step length; matching each value of the one-dimensional characteristic vector of the video to be detected with the spatial coordinate index, and confirming that a matching result is in the value range; and recording the corresponding primary selection space coordinate in the space coordinate index, and determining a corresponding primary selection video set according to the primary selection space coordinate.
In an embodiment, the matching each value of the one-dimensional feature vector of the video to be detected with the spatial coordinate index includes: calculating a standard deviation according to the entropy value corresponding to the spatial coordinate matched with the spatial coordinate index and the one-dimensional feature vector of each frame of image of the video to be detected; and determining the initial selection matching degree of the video to be detected according to the relation between the standard deviation and a preset threshold, wherein the initial selection matching degree is used for representing the similarity degree of the video to be detected and the video corresponding to the brief fingerprint database.
In a specific embodiment, for each video, the first value of the brief fingerprint vector of the video to be detected is taken as the original value; an influence factor ε is set to 0.03 and a step length to 0.01, and the value range to be detected is adjusted to [original value ± ε], forming set A.
For each value in set A, matching is performed in turn against the image space index established in advance for the video; if the same value is found, the corresponding spatial position coordinates are returned to form set B.
The first coordinate value in set B is selected as the starting point for comparing the brief fingerprint of the candidate video: the brief fingerprint feature vector of the video to be detected is compared against the candidate video vector from that starting point, and the standard deviation is calculated. If the standard deviation is smaller than a specific threshold, the fit between the video to be detected and the candidate video is high, the probability that their contents are similar is considered high, the candidate video is marked as belonging to the initial set that subsequently needs fine fingerprint comparison, and matching ends.
Specifically, if the standard deviation is larger than the threshold, the next coordinate in set B is used as the comparison starting point for feature vector matching; this loops in sequence until the standard deviation between some coordinate starting point and the fingerprint to be detected is smaller than the threshold, in which case the match is considered successful and the loop exits, or until all coordinates in set B have been tried as starting points.
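The coarse-matching loop above can be sketched as follows (a minimal illustration; `eps`, `step`, `threshold` values and all function names are assumptions for demonstration, and the index is the value-to-position map described earlier):

```python
import numpy as np

def coarse_match(query_brief, candidate_brief, index, eps=0.03, step=0.01, threshold=0.5):
    """Return the first starting coordinate whose aligned standard deviation of
    differences falls below the threshold, or None if no starting point matches."""
    original = round(query_brief[0], 2)
    k_max = round(eps / step)
    # Set A: values in [original - eps, original + eps] on the step grid
    set_a = [round(original + k * step, 2) for k in range(-k_max, k_max + 1)]
    # Set B: spatial positions in the candidate where any value of A occurs
    set_b = [pos for v in set_a for pos in index.get(v, [])]
    for start in set_b:
        if start + len(query_brief) > len(candidate_brief):
            continue  # not enough candidate frames left to align against
        diff = (np.asarray(query_brief)
                - np.asarray(candidate_brief[start:start + len(query_brief)]))
        if diff.std() < threshold:
            return start  # candidate marked for fine-fingerprint comparison
    return None

candidate = [3.0, 3.91, 4.05, 3.70, 2.5]       # hypothetical library brief fingerprint
index = {3.0: [0], 3.91: [1], 4.05: [2], 3.7: [3], 2.5: [4]}
query = [3.92, 4.04, 3.71]                      # brief fingerprint of video to be detected
assert coarse_match(query, candidate, index) == 1
```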
The fine fingerprint of the video to be detected is then matched against the fine fingerprint library corresponding to the initially selected video set, and the matching result is output.
Further, according to the previously recorded comparison starting-point coordinates, the fine fingerprint of the video to be detected is compared with the fine fingerprints of the initially selected video subset using an existing precise image-similarity comparison algorithm or a combination of such algorithms, including but not limited to Euclidean distance, cosine angle, Hamming distance, or a higher-level algorithm such as the Scale-Invariant Feature Transform (SIFT); for example, duplication between a library video and the content of the video to be detected is confirmed, and manual intervention is prompted.
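The distance measures named above can be sketched for per-frame fine-fingerprint vectors (SIFT itself requires a dedicated library such as OpenCV and is omitted; the function names here are illustrative):

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two feature vectors (0 means identical)."""
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1 means same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def hamming(a_bits, b_bits):
    """Number of differing positions between two binary fingerprints."""
    return int(np.count_nonzero(a_bits != b_bits))

f1 = np.array([1.0, 0.0, 2.0])
f2 = np.array([1.0, 0.0, 2.0])
assert euclidean(f1, f2) == 0.0
assert abs(cosine_similarity(f1, f2) - 1.0) < 1e-12
assert hamming(np.array([1, 0, 1]), np.array([1, 1, 1])) == 1
```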
Fig. 5 is a schematic diagram of a video fingerprint generation device according to an embodiment of the present invention, and as shown in fig. 5, a video fingerprint generation device includes: the fine fingerprint acquisition module 51, the fine fingerprint acquisition module 51 is configured to acquire a video fine fingerprint represented by a two-dimensional feature vector based on a video feature extraction model; a brief fingerprint acquisition module 52 configured to convert the video fine fingerprint into a video brief fingerprint represented by a one-dimensional feature vector based on entropy.
Further, the apparatus for generating video fingerprints further includes a spatial index establishing module, where the spatial index establishing module is configured to: establishing a corresponding video fine fingerprint database and a corresponding video brief fingerprint database according to the video fine fingerprint and the video brief fingerprint; the spatial index establishing module is further used for establishing a spatial coordinate index of the video brief fingerprint database corresponding to the one-dimensional feature vector value according to the video brief fingerprint.
Further, the fine fingerprint acquisition module is further configured to:
based on a video feature extraction model, performing dimensionality reduction and key frame extraction on all videos with different resolutions to obtain a two-dimensional feature vector matrix corresponding to the videos with uniform reference resolution;
in the two-dimensional feature vectors, a first dimension represents a feature vector group of each key frame of the video, and a second dimension represents the time sequence of the key frame.
Further, the brief fingerprint acquisition module is further configured to:
performing one-dimensional compression on the two-dimensional feature vector;
acquiring the sum of information entropies of the feature vector values of each frame of the video after compression to serve as the information entropy of the frame;
converting the sequence of frames of the video into the one-dimensional feature vector based on the entropy of the information of the frames.
Fig. 6 is a schematic diagram of a video fingerprint retrieval device according to an embodiment of the present invention, and as shown in fig. 6, the video fingerprint retrieval device includes: a video fingerprint acquisition module 61 and a video fingerprint matching module 62; the video fingerprint acquisition module 61 is configured to acquire a fine fingerprint of a to-be-detected video represented by a two-dimensional feature vector based on a video feature extraction model; the video fingerprint acquisition module 61 is further configured to convert the fine fingerprint of the video to be detected into a brief fingerprint of the video to be detected represented by a one-dimensional feature vector based on an information entropy; the video fingerprint matching module 62 is configured to match the brief fingerprint of the video to be detected with the brief video fingerprint library based on the spatial coordinate index, and obtain a primary selection video set corresponding to a matching result from the brief video fingerprint library.
Further, the video fingerprint matching module 62 is further configured to: and matching the fine fingerprint of the video to be detected with the fine fingerprint library corresponding to the primary selection video set to obtain a matching result.
Further, the video fingerprint obtaining module 61 is further configured to: determining a one-dimensional characteristic vector of the video to be detected, and adjusting the value range of the one-dimensional characteristic vector of the video to be detected according to the influence factor and the step length;
matching each value of the one-dimensional characteristic vector of the video to be detected with the spatial coordinate index, and confirming that a matching result is in the value range;
and recording the corresponding primary selection space coordinate in the space coordinate index, and determining a corresponding primary selection video set according to the primary selection space coordinate.
Fig. 7 illustrates a physical structure diagram of an electronic device, and as shown in fig. 7, the electronic device may include: a processor (processor) 710, a communication interface (Communications Interface) 720, a memory (memory) 730, and a communication bus 740. The processor 710, the communication interface 720, and the memory 730 communicate with each other via the communication bus 740. Processor 710 may call logic instructions in memory 730 to perform the following method: acquiring a video fine fingerprint represented by a two-dimensional feature vector based on a video feature extraction model; and converting the video fine fingerprint into a video brief fingerprint represented by a one-dimensional feature vector based on the information entropy.
It should be noted that, when being implemented specifically, the electronic device in this embodiment may be a server, a PC, or other devices, as long as the structure includes the processor 710, the communication interface 720, the memory 730, and the communication bus 740 shown in fig. 7, where the processor 710, the communication interface 720, and the memory 730 complete mutual communication through the communication bus 740, and the processor 710 may call the logic instruction in the memory 730 to execute the above method. The embodiment does not limit the specific implementation form of the electronic device.
In addition, the logic instructions in the memory 730 can be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Fig. 8 illustrates a physical structure diagram of an electronic device, and as shown in fig. 8, the electronic device may include: a processor (processor)810, a communication Interface 820, a memory 830 and a communication bus 840, wherein the processor 810, the communication Interface 820 and the memory 830 communicate with each other via the communication bus 840. The processor 810 may call logic instructions in the memory 830 to perform the following method: acquiring a fine fingerprint of a to-be-detected video represented by a two-dimensional feature vector based on a video feature extraction model; based on the information entropy, converting the fine fingerprint of the video to be detected into a brief fingerprint of the video to be detected represented by a one-dimensional characteristic vector; and matching the brief fingerprint of the video to be detected with a brief video fingerprint library based on the spatial coordinate index, and acquiring a primary selection video set corresponding to a matching result from the brief video fingerprint library.
It should be noted that, when being implemented specifically, the electronic device in this embodiment may be a server, a PC, or other devices, as long as the structure includes the processor 810, the communication interface 820, the memory 830, and the communication bus 840 shown in fig. 8, where the processor 810, the communication interface 820, and the memory 830 complete mutual communication through the communication bus 840, and the processor 810 may call the logic instructions in the memory 830 to execute the above method. The embodiment does not limit the specific implementation form of the electronic device.
In addition, the logic instructions in the memory 830 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Further, an embodiment of the present invention discloses a computer program product, the computer program product includes a computer program stored on a non-transitory computer readable storage medium, the computer program includes program instructions, when the program instructions are executed by a computer, the computer can execute a video fingerprint generation method provided by the above-mentioned method embodiments, for example, the method includes: acquiring a video fine fingerprint represented by a two-dimensional feature vector based on a video feature extraction model; and converting the video fine fingerprints into video brief fingerprints represented by one-dimensional feature vectors based on the information entropy.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program is implemented to perform the video fingerprint retrieval method provided in the foregoing embodiments when executed by a processor, for example, the method includes: acquiring a fine fingerprint of a to-be-detected video represented by a two-dimensional feature vector based on a video feature extraction model; based on the information entropy, converting the fine fingerprint of the video to be detected into a brief fingerprint of the video to be detected represented by a one-dimensional characteristic vector; and matching the brief fingerprint of the video to be detected with a brief video fingerprint library based on the spatial coordinate index, and acquiring a primary selection video set corresponding to a matching result from the brief video fingerprint library.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for generating a video fingerprint, the method comprising:
acquiring a video fine fingerprint represented by a two-dimensional feature vector based on a video feature extraction model;
and converting the video fine fingerprints into video brief fingerprints represented by one-dimensional feature vectors based on the information entropy.
2. The method for generating video fingerprints according to claim 1, further comprising:
establishing a corresponding video fine fingerprint database and a corresponding video brief fingerprint database according to the video fine fingerprint and the video brief fingerprint;
and establishing a space coordinate index of the video brief fingerprint database corresponding to the one-dimensional characteristic vector value according to the video brief fingerprint.
3. The method for generating video fingerprints according to claim 1, wherein the obtaining a video fine fingerprint represented by a two-dimensional feature vector based on the video feature extraction model comprises:
based on a video feature extraction model, performing dimensionality reduction and key frame extraction on all videos with different resolutions to obtain a two-dimensional feature vector matrix corresponding to the videos with uniform reference resolution;
in the two-dimensional feature vectors, a first dimension represents a feature vector group of each key frame of the video, and a second dimension represents the time sequence of the key frame.
4. The method of generating video fingerprints according to claim 1, wherein said converting the video fine fingerprint into a video profile fingerprint represented by a one-dimensional feature vector comprises:
performing one-dimensional compression on the two-dimensional feature vector;
acquiring the sum of information entropies of the feature vector values of each frame of the video after compression to serve as the information entropy of the frame;
converting the sequence of frames of the video into the one-dimensional feature vector based on the entropy of the information of the frames.
5. A method for retrieving video fingerprints is characterized by comprising the following steps:
acquiring a fine fingerprint of a to-be-detected video represented by a two-dimensional feature vector based on a video feature extraction model;
based on the information entropy, converting the fine fingerprint of the video to be detected into a brief fingerprint of the video to be detected represented by a one-dimensional characteristic vector;
and matching the brief fingerprint of the video to be detected with a brief video fingerprint library based on the spatial coordinate index, and acquiring a primary selection video set corresponding to a matching result from the brief video fingerprint library.
6. The method for retrieving video fingerprints according to claim 5, further comprising:
and matching the fine fingerprint of the video to be detected with the fine fingerprint library corresponding to the primary selection video set to obtain a matching result.
7. The method for retrieving video fingerprints according to claim 5, wherein the matching the brief fingerprint of the video to be detected with the video brief fingerprint library based on the spatial coordinate index, and obtaining the initially selected video set corresponding to the matching result from the video brief fingerprint library comprises:
determining a one-dimensional characteristic vector of the video to be detected, and adjusting the value range of the one-dimensional characteristic vector of the video to be detected according to the influence factor and the step length;
matching each value of the one-dimensional characteristic vector of the video to be detected with the spatial coordinate index, and confirming that a matching result is in the value range;
and recording the corresponding primary selection space coordinate in the space coordinate index, and determining a corresponding primary selection video set according to the primary selection space coordinate.
8. The method for retrieving video fingerprints according to claim 7, wherein the matching each value of the one-dimensional feature vector of the video to be detected with the spatial coordinate index comprises:
calculating a standard deviation according to the entropy value corresponding to the spatial coordinate matched with the spatial coordinate index and the one-dimensional feature vector of each frame of image of the video to be detected;
and determining the initial selection matching degree of the video to be detected according to the relation between the standard deviation and a preset threshold, wherein the initial selection matching degree is used for representing the similarity degree of the video to be detected and the video corresponding to the brief fingerprint database.
9. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for generating a video fingerprint according to any one of claims 1 to 4 or the steps of the method for retrieving a video fingerprint according to any one of claims 5 to 8.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method for generating a video fingerprint according to any one of claims 1 to 4 or the steps of the method for retrieving a video fingerprint according to any one of claims 5 to 8 when executing said program.
CN201911378319.7A 2019-12-27 2019-12-27 Video fingerprint generation method, search method, electronic device and medium Active CN111143619B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911378319.7A CN111143619B (en) 2019-12-27 2019-12-27 Video fingerprint generation method, search method, electronic device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911378319.7A CN111143619B (en) 2019-12-27 2019-12-27 Video fingerprint generation method, search method, electronic device and medium

Publications (2)

Publication Number Publication Date
CN111143619A true CN111143619A (en) 2020-05-12
CN111143619B CN111143619B (en) 2023-08-15

Family

ID=70521020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911378319.7A Active CN111143619B (en) 2019-12-27 2019-12-27 Video fingerprint generation method, search method, electronic device and medium

Country Status (1)

Country Link
CN (1) CN111143619B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113886629A (en) * 2021-12-09 2022-01-04 深圳行动派成长科技有限公司 Course picture retrieval model establishing method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102176208A (en) * 2011-02-28 2011-09-07 西安电子科技大学 Robust video fingerprint method based on three-dimensional space-time characteristics
CN103593464A (en) * 2013-11-25 2014-02-19 华中科技大学 Video fingerprint detecting and video sequence matching method and system based on visual features
CN104063706A (en) * 2014-06-27 2014-09-24 电子科技大学 Video fingerprint extraction method based on SURF algorithm
US20170185675A1 (en) * 2014-05-27 2017-06-29 Telefonaktiebolaget Lm Ericsson (Publ) Fingerprinting and matching of content of a multi-media file
CN108021927A (en) * 2017-11-07 2018-05-11 天津大学 A kind of method for extracting video fingerprints based on slow change visual signature
CN109960960A (en) * 2017-12-14 2019-07-02 中国移动通信集团安徽有限公司 Video finger print generation and matching process and device, computer equipment and storage medium
CN110278449A (en) * 2019-06-26 2019-09-24 腾讯科技(深圳)有限公司 A kind of video detecting method, device, equipment and medium
CN110503076A (en) * 2019-08-29 2019-11-26 腾讯科技(深圳)有限公司 Video classification methods, device, equipment and medium based on artificial intelligence


Non-Patent Citations (2)

Title
Ozgun Cirakman; Bilge Gunsel; et al.: "Key-frame based video fingerprinting by NMF", 2010 IEEE International Conference on Image Processing
Xu Tao: "Research on Fingerprint Feature Extraction Technology for Video Management", Chinese Outstanding Master's Theses Collection


Also Published As

Publication number Publication date
CN111143619B (en) 2023-08-15

Similar Documents

Publication Publication Date Title
CN106445939B (en) Image retrieval, image information acquisition and image identification method, device and system
Li et al. No-reference and robust image sharpness evaluation based on multiscale spatial and spectral features
CN108694225B (en) Image searching method, feature vector generating method and device and electronic equipment
Ye et al. No-reference image quality assessment using visual codebooks
Ye et al. Real-time no-reference image quality assessment based on filter learning
US9349072B2 (en) Local feature based image compression
CN113378710B (en) Layout analysis method and device for image file, computer equipment and storage medium
Jiang et al. Learning sparse representation for objective image retargeting quality assessment
CN112434553B (en) Video identification method and system based on deep dictionary learning
WO2021175040A1 (en) Video processing method and related device
EP3191980A1 (en) Method and apparatus for image retrieval with feature learning
Varna et al. Modeling and analysis of correlated binary fingerprints for content identification
CN110598019A (en) Repeated image identification method and device
CN111930985A (en) Image retrieval method and device, electronic equipment and readable storage medium
CN110674331A (en) Information processing method, related device and computer storage medium
CN110807110A (en) Image searching method and device combining local and global features and electronic equipment
US9875386B2 (en) System and method for randomized point set geometry verification for image identification
TW201828109A (en) Image search, image information acquisition and image recognition methods, apparatuses and systems effectively improving the image search accuracy, reducing the rearrangement filtering workload, and improving the search efficiency
CN114021646A (en) Image description text determination method and related equipment thereof
CN111143619B (en) Video fingerprint generation method, search method, electronic device and medium
CN114005019A (en) Method for identifying copied image and related equipment thereof
CN112966676A (en) Document key information extraction method based on zero sample learning
Ding et al. Image quality assessment method based on nonlinear feature extraction in kernel space
CN113704551A (en) Video retrieval method, storage medium and equipment
CN110276051B (en) Method and device for splitting font part

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant