CN111143619B - Video fingerprint generation method, search method, electronic device and medium - Google Patents


Info

Publication number
CN111143619B
CN111143619B (application CN201911378319.7A)
Authority
CN
China
Prior art keywords
video
fingerprint
feature vector
detected
dimensional feature
Prior art date
Legal status
Active
Application number
CN201911378319.7A
Other languages
Chinese (zh)
Other versions
CN111143619A (en)
Inventor
闫威
徐嵩
王�琦
李琳
王科
杜欧杰
Current Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and MIGU Culture Technology Co Ltd
Priority to CN201911378319.7A
Publication of CN111143619A
Application granted
Publication of CN111143619B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Abstract

Embodiments of the present application provide a video fingerprint generation method, a retrieval method, an electronic device, and a storage medium. The generation method comprises the following steps: obtaining, based on a video feature extraction model, a video fine fingerprint characterized by a two-dimensional feature vector; and converting, based on information entropy, the video fine fingerprint into a video brief fingerprint characterized by a one-dimensional feature vector. By establishing an independent video fingerprint library and computing on the basis of information entropy, the generation method yields a one-dimensional feature vector that accurately represents the video. In addition, in the retrieval method, the video to be detected is rapidly compared against the video fingerprints in the established video fingerprint library, so that videos can be accurately retrieved and confirmed, the recurrence of identical or similar videos in content-distribution and personalized-recommendation results is reduced, and the user experience is improved.

Description

Video fingerprint generation method, search method, electronic device and medium
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a video fingerprint generating method, a video fingerprint retrieving method, an electronic device, and a medium.
Background
The video fingerprinting (Video Fingerprinting) service is based on video fingerprint technology: it generates a string of fingerprint characters that uniquely identifies the current video according to its content, and remains effective despite operations on the video file such as format conversion, editing, cutting and splicing, compression, and rotation. It can be used in scenarios such as video similarity checking, video copyright protection, and advertisement identification, and video fingerprints serve various purposes including video identification, retrieval, and copyright protection.
In existing video fingerprint matching technology, the frame pictures of the video to be detected are divided into blocks, and each frame picture is encoded according to the pixels of each block to form a video fingerprint; the frame-picture codes of the video to be detected are then hash-mapped to obtain hash mapping addresses; each hash mapping address is searched in the hash table of a pre-established video fingerprint library, and matching identification is carried out according to the search result.
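The block-and-hash scheme described above can be sketched as follows (a minimal illustration only; the 4×4 grid, the mean-brightness thresholding rule, and the MD5-based bucket mapping are assumptions for the sketch, not details taken from the patent):

```python
import hashlib
import numpy as np

def block_encode(frame, grid=4):
    """Split a grayscale frame into grid x grid blocks; encode each block
    as 1 if its mean brightness exceeds the whole-frame mean, else 0."""
    h, w = frame.shape
    bh, bw = h // grid, w // grid
    frame_mean = frame.mean()
    bits = []
    for i in range(grid):
        for j in range(grid):
            block = frame[i*bh:(i+1)*bh, j*bw:(j+1)*bw]
            bits.append(1 if block.mean() > frame_mean else 0)
    return bytes(bits)

def hash_address(code, table_size=1 << 16):
    """Map a frame's block code to a bucket of the fingerprint hash table."""
    return int.from_bytes(hashlib.md5(code).digest()[:4], "big") % table_size

frame = np.arange(64 * 64, dtype=np.float64).reshape(64, 64)  # synthetic frame
code = block_encode(frame)
addr = hash_address(code)
```

Matching then reduces to looking up `addr` in the pre-built hash table of library fingerprints.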
In the prior art, in the video fingerprint extraction stage, a video region of interest is used as a basic unit for extracting video fingerprints, and a clustering algorithm is adopted to remove time domain redundancy features of the video fingerprints; and in the video fingerprint matching stage, performing video fingerprint matching by adopting a method of averaging the distances of a plurality of video fingerprints.
In other prior art, video sequence key frames are segmented according to inter-frame correlation and video features are extracted from them; within each sub-block of a key frame, the occurrence counts of pixel-feature-dictionary elements are tallied according to the pixel classification results; the resulting sub-block feature vectors are spliced into a high-dimensional fingerprint of the key frame; and, after dimension reduction, the key-frame fingerprints are concatenated in time order into a video key-frame fingerprint string.
In still other prior art, the video is subjected to label preprocessing: before retrieval, multi-label classification is performed on the video, and the identification range is then narrowed through label matching against the video library.
Such block-based processing generates high-dimensional feature vectors, and the computing power required to retrieve and match videos through high-dimensional feature vectors is enormous; meanwhile, a one-dimensional representation in the usual sense tends to lose information or express the video inaccurately. How to simplify the application of video fingerprints in video retrieval and matching is therefore an important problem in the industry.
Disclosure of Invention
Embodiments of the present application provide a video fingerprint generation method, a retrieval method, an electronic device, and a medium, which solve the problems of inaccurate video retrieval and heavy computation in the prior art.
In a first aspect, an embodiment of the present application provides a method for generating a video fingerprint, where the method includes: based on the video feature extraction model, obtaining a video fine fingerprint characterized by a two-dimensional feature vector; and based on information entropy, converting the video fine fingerprint into a video brief fingerprint characterized by a one-dimensional feature vector.
The method for generating the video fingerprint further comprises the following steps: establishing a corresponding video fine fingerprint library and a corresponding video brief fingerprint library according to the video fine fingerprint and the video brief fingerprint; and establishing a spatial coordinate index of the video brief fingerprint library corresponding to the one-dimensional characteristic vector value according to the video brief fingerprint.
Wherein the converting the video fine fingerprint into a video brief fingerprint characterized by a one-dimensional feature vector comprises: carrying out one-dimensional compression on the two-dimensional feature vector; obtaining the sum of information entropy of the feature vector values compressed by each frame of the video as the information entropy of the frame; and converting the frame sequence of the video into the one-dimensional feature vector based on the information entropy of the frame.
In a second aspect, an embodiment of the present application provides a method for searching a video fingerprint, including: based on the video feature extraction model, acquiring a fine fingerprint of the video to be detected, which is characterized by the two-dimensional feature vector; based on information entropy, converting the fine fingerprint of the video to be detected into a brief fingerprint of the video to be detected, which is characterized by a one-dimensional feature vector; and matching the brief fingerprints of the video to be detected with a video brief fingerprint library based on the space coordinate index, and acquiring a primary video set corresponding to a matching result from the video brief fingerprint library.
The video fingerprint retrieval method further comprises the following steps: and matching the fine fingerprint of the video to be detected with a fine fingerprint library corresponding to the primary video set to obtain a matching result.
The method for matching the brief fingerprint of the video to be detected with the brief fingerprint library of the video based on the space coordinate index, obtaining a primary video set corresponding to a matching result from the brief fingerprint library of the video, comprises the following steps: determining a one-dimensional feature vector of the video to be detected, and adjusting the value range of the one-dimensional feature vector of the video to be detected according to the influence factor and the step length; matching each value of the one-dimensional feature vector of the video to be detected with the space coordinate index, and confirming that a matching result is in the value range; and recording corresponding primary selection space coordinates in the space coordinate index, and determining a corresponding primary selection video set according to the primary selection space coordinates.
Wherein said matching each value of the one-dimensional feature vector of the video to be detected with the spatial coordinate index includes: calculating standard deviation according to entropy values corresponding to the spatial coordinates of the one-dimensional feature vector of each frame of image of the video to be detected and the spatial coordinate index; and determining the primary selection matching degree of the video to be detected according to the relation between the standard deviation and a preset threshold value, wherein the primary selection matching degree is used for representing the similarity degree of the video to be detected and the video corresponding to the brief fingerprint library.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of a method for generating a video fingerprint according to the first aspect when the program is executed.
In a fourth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the video fingerprint generation method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the video fingerprint retrieval method according to the second aspect when the program is executed.
In a sixth aspect, an embodiment of the present application provides a non-transitory computer readable storage medium, on which is stored a computer program, which when executed by a processor, implements the steps of a video fingerprint retrieval method according to the second aspect.
According to the video fingerprint generation method, the video fingerprint retrieval method, the electronic equipment and the storage medium, the independent video fingerprint library is built, and the one-dimensional feature vector accurately representing the video is obtained based on calculation of information entropy. Further, the video to be detected is quickly compared with the video fingerprints in the established video fingerprint library, so that the video can be accurately retrieved and confirmed, repeated videos or repeated fragments can be recalled in real time, the repeated storage of the content in the media asset library is prevented, and the media asset storage efficiency is improved; and the phenomenon that the same or similar videos are too much in the content distribution and personalized recommendation results is reduced, and the user experience is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for generating video fingerprints based on information entropy according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for retrieving video fingerprints based on information entropy according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a video feature extraction model according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a video feature extraction model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a video fingerprint generating apparatus according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a video fingerprint retrieval device according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device for generating a video fingerprint according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device for retrieving video fingerprints according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In general, the application performs feature extraction on an original video or a video to be detected by an improved unsupervised deep learning method to generate a two-dimensional feature vector in a uniform format as the fine fingerprint of the video; an information-entropy-based measurement is then applied to the video fine fingerprint to generate a video brief fingerprint as a one-dimensional feature vector, improving the accuracy of video representation.
Further, the one-dimensional characteristic vector value is used for generating a space coordinate index as a basis for rapid matching of video fingerprints, and a fingerprint library of corresponding brief fingerprints is formed.
When retrieving or matching, a standard two-dimensional feature vector and a standard one-dimensional feature vector are generated for the video to be detected. The brief fingerprint represented by the one-dimensional feature vector is rapidly searched and matched against the one-dimensional feature vectors in the video brief fingerprint library according to the spatial coordinate index and the corresponding matching algorithm, yielding a primary selection set. The fine fingerprint is then matched against the videos of the primary selection set, and the final matching result is output, thereby improving recall and rapidly dividing the matching subset.
Fig. 1 is a flowchart of a method for generating a video fingerprint according to an embodiment of the present application, as shown in fig. 1, the method includes the following steps: s11, based on a video feature extraction model, obtaining a video fine fingerprint represented by a two-dimensional feature vector; and S12, converting the video fine fingerprint into a video brief fingerprint characterized by one-dimensional feature vectors based on information entropy.
In step S11, a video feature extraction model is constructed, and based on an automatic encoding and decoding network therein, a video fine fingerprint corresponding to a two-dimensional feature vector is obtained, so as to obtain a two-dimensional feature vector library of the video fine fingerprint.
In one embodiment, all videos with different resolutions are subjected to dimension reduction and key frame extraction based on a video feature extraction model, namely a neural network of an unsupervised self-encoder, and a two-dimensional feature vector matrix corresponding to the videos with uniform reference resolution is obtained. Wherein, in the two-dimensional feature vector, a first dimension represents a feature vector group of each key frame of the video, and a second dimension represents a time sequence of the key frame. Note that the video feature extraction model may be a neural network of unsupervised self-encoders in the present application, wherein the automatic codec network is part of the neural network.
Fig. 3 is a schematic diagram of a video feature extraction model according to an embodiment of the application. As shown in Fig. 3, obtaining a two-dimensional feature vector matrix corresponding to the video at a uniform reference resolution comprises: extracting all key frames from the video and inputting them into the self-encoder of Fig. 3; and obtaining the feature values of the image frames at the same reference resolution through an automatic encoding-and-decoding network sharing 5 hidden layers. By constructing an unsupervised self-encoder neural network, all videos of different resolutions are reduced in dimension and their core features extracted, and a video fingerprint feature vector matrix of uniform size is output, providing reference support for subsequent comparison of videos with the same content but different resolutions.
In a specific embodiment, all key frames are extracted from each video in the video library, and each key frame image is converted to grayscale as input to the self-encoder. An image with the same dimensions as the original is output through the automatic encoding-and-decoding network sharing 5 hidden layers; the model is trained, and training ends when the error falls below a specific target value.
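A minimal sketch of such an auto-encoder forward pass follows (untrained, with random weights; the layer widths, ReLU activation, and 32×32 flattened input size are illustrative assumptions — the text above only specifies 5 shared hidden layers and grayscale key-frame input):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Hypothetical layer widths: the patent specifies 5 shared hidden layers
# but not their sizes; 1024 = a 32x32 grayscale key frame, flattened.
sizes = [1024, 512, 256, 128, 256, 512, 1024]   # encoder -> code -> decoder
layers = [(rng.normal(0, 0.1, (a, b)), np.zeros(b))
          for a, b in zip(sizes[:-1], sizes[1:])]

def forward(x):
    """Pass one flattened grayscale key frame through the auto-encoder;
    return (code, reconstruction). The middle layer output is the code."""
    h = x
    code = None
    for k, (W, b) in enumerate(layers):
        h = relu(h @ W + b)
        if k == 2:          # third layer produces the 128-dim coding layer
            code = h
    return code, h

key_frame = rng.random(1024)       # stand-in for a grayscale key frame
code, recon = forward(key_frame)
```

After training, only the input-to-coding-layer half would be kept as the feature extractor, and `code` would supply one row of the two-dimensional fine fingerprint.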
Further, since the resolutions of different videos are different, in this fig. 3, the model is improved, and the input and output layers with different resolutions are separately accessed, but share an implicit layer, i.e. a codec network, to ensure that the same feature extraction rule is generated.
In one embodiment, the feature value is confirmed to meet a threshold requirement, and a two-dimensional feature vector corresponding to the video is obtained.
In another embodiment, the same content may exist simultaneously as videos of different resolutions. If the model extracts two markedly different feature codes for them, the recall of video fingerprint de-duplication drops. To solve this problem, the model is further improved after pre-training is completed and the initial weights are preset.
Fig. 4 is a schematic diagram of an improved video feature extraction model according to an embodiment of the application. As shown in Fig. 4, a reference resolution of the video is set, for example 480p; the output-layer features are adjusted to features at the consistent reference resolution level. Further, the error determination criterion is modified to compare against the original feature input values at the reference resolution.
That is, the original error function e(x_out − x_in) is changed to e(x_out′ − x_in′), where:
x_in represents the original image features at the resolution of the content video provided in advance, for example 1080p;
x_out represents the image output of the trained self-coding network at the resolution matching the original x_in, i.e. likewise based on 1080p features;
x_in′ represents the original image features of the same content at the 480p reference resolution, serving as the reference input;
x_out′ represents the image feature values at the 480p reference resolution output by the self-coding network after training, serving as the reference output.
In this way, it is ensured that the same content video always outputs the same feature code after model calculation, no matter how many inputs of different resolutions are available. And continuously training the improved model, inputting key frames of equal time sequences with different resolutions and the same content, outputting characteristic values of image frames with the same reference resolution, and performing error judgment until the threshold requirement is met. At the end of training, the part from the input layer to the coding layer is taken as the final feature extraction model, i.e. the part within the dashed box shown in fig. 4.
Wherein in one embodiment, each video is converted into a two-dimensional feature vector, wherein the first dimension is a set of feature vectors representing each key frame image and the second dimension is a time sequence representing each key frame image.
In step S12, a one-dimensional feature vector based on information entropy is generated, and the corresponding video brief fingerprint is generated. Further, a video brief fingerprint library is constructed.
Wherein, in one embodiment, the converting the video fine fingerprint into a video brief fingerprint characterized by a one-dimensional feature vector comprises: carrying out one-dimensional compression on the two-dimensional feature vector; obtaining the sum of information entropy of the feature vector values compressed by each frame of the video as the information entropy of the frame; and converting the frame sequence of the video into the one-dimensional feature vector based on the information entropy of the frame.
The method comprises the steps of generating two-dimensional feature vector groups of each video, carrying out one-dimensional compression on the two-dimensional feature vector groups, and converting the video into the one-dimensional feature vector groups by adopting an information entropy method to serve as brief fingerprints of the video.
Where information entropy is a measure representation of information, an image may be represented by information entropy. When the contents of the two images are equal or close, the information entropy is also equal or close.
The information entropy is expressed by the following formula:

H = −Σ_i P_i log P_i

where P_i is the probability of the corresponding vector value and H is the corresponding information entropy value. The information entropy thus provides an information-theoretic measurement of the expressed image content: even if the original video has been tampered with, for example by rotating the video picture, the value of the information entropy is unchanged, which improves the fault tolerance and recall ratio of video matching.
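The conversion from a two-dimensional fine fingerprint to a one-dimensional, entropy-based brief fingerprint can be sketched as follows (a histogram estimate of P_i with a base-2 logarithm and 16 bins is assumed; the patent does not fix these choices). The final lines also illustrate the rotation-invariance claim: rotating a frame permutes its values but leaves their histogram, and hence the entropy, unchanged.

```python
import numpy as np

def frame_entropy(features, bins=16):
    """Information entropy H = -sum(P_i * log2 P_i) of one frame's
    compressed feature-vector values, with P_i estimated by a histogram."""
    hist, _ = np.histogram(features, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                    # empty bins contribute 0 (0 * log 0 = 0)
    return float(-(p * np.log2(p)).sum())

def brief_fingerprint(fine_fp):
    """Convert a fine fingerprint (key frames x feature dim, the 2-D
    feature vector) into the 1-D brief fingerprint: one entropy per frame."""
    return np.array([frame_entropy(frame) for frame in fine_fp])

fine = np.random.default_rng(1).random((10, 128))  # 10 key frames, 128 features
brief = brief_fingerprint(fine)

# "Rotating" a frame permutes its values but not their distribution,
# so the entropy value -- and the brief fingerprint -- is unchanged.
rotated = fine[0].reshape(16, 8)[::-1, ::-1].reshape(-1)
assert frame_entropy(rotated) == frame_entropy(fine[0])
```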
In addition, establishing a spatial index of the entropy-based vector values provides a basis for subsequent rapid searching; since the video feature vector is converted from two dimensions to one dimension, the time complexity of the coarse comparison of a video can, in the best case, be reduced from O(n²) to O(log n), greatly reducing the search time.
According to the video brief fingerprints, a corresponding video brief fingerprint library is established; and establishing a spatial coordinate index of the video brief fingerprint library corresponding to the one-dimensional characteristic vector value according to the video brief fingerprint.
In one embodiment, the sum of the information entropy over each image's compressed feature values is taken as the information entropy of that image, and the image sequence of the video is thereby converted into a one-dimensional feature vector serving as the brief fingerprint of the video. Further, a spatial coordinate index in the format <vector value of the one-dimensional feature vector, array coordinate sequence number> is established for the one-dimensional vector, in preparation for subsequent rapid searching.
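The spatial coordinate index and its logarithmic-time lookup can be sketched as follows (the concrete entry layout <entropy value, (video id, frame position)> and the ε tolerance in `lookup` are illustrative assumptions for the sketch):

```python
import bisect

def build_index(brief_fps):
    """Build the spatial coordinate index: a list of
    <entropy value, (video id, frame position)> entries sorted by value."""
    entries = sorted(
        (value, (vid, pos))
        for vid, fp in brief_fps.items()
        for pos, value in enumerate(fp)
    )
    keys = [value for value, _ in entries]   # kept alongside for O(log n) bisect
    return keys, entries

def lookup(keys, entries, value, eps=0.03):
    """Return all coordinates whose entropy lies in [value - eps, value + eps]."""
    lo = bisect.bisect_left(keys, value - eps)
    hi = bisect.bisect_right(keys, value + eps)
    return [coord for _, coord in entries[lo:hi]]

keys, entries = build_index({"v1": [3.1, 3.5, 2.8], "v2": [3.12, 2.2]})
hits = lookup(keys, entries, 3.11)           # probes the range [3.08, 3.14]
```

Each probe is two binary searches over the sorted values, which is what brings the coarse comparison down toward O(log n).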
Fig. 2 is a flow chart of a video fingerprint retrieval method based on information entropy according to an embodiment of the present application, and as shown in fig. 2, a video fingerprint retrieval method includes: s21, based on a video feature extraction model, acquiring a fine fingerprint of a video to be detected, which is characterized by a two-dimensional feature vector; based on information entropy, converting the fine fingerprint of the video to be detected into a brief fingerprint of the video to be detected, which is characterized by a one-dimensional feature vector; s22, based on the space coordinate index, the brief fingerprints of the video to be detected and the video brief fingerprint library are matched, and a primary video set corresponding to a matching result is obtained from the video brief fingerprint library.
The process of acquiring the fine and brief fingerprints in S21 is the same as in the embodiment above; there they were acquired as fingerprint library data to be matched against, whereas here they serve as the sample data actually being matched.
Wherein, in S21, a video fine fingerprint corresponding to the two-dimensional feature vector is acquired based on the automatic codec network therein by constructing a video feature extraction model.
In one embodiment, all videos with different resolutions are subjected to dimension reduction and key frame extraction based on a video feature extraction model, namely a neural network of an unsupervised self-encoder, and a two-dimensional feature vector matrix corresponding to the videos with uniform reference resolution is obtained. Wherein, in the two-dimensional feature vector, a first dimension represents a feature vector group of each key frame of the video, and a second dimension represents a time sequence of the key frame. Note that the video feature extraction model may be a neural network of unsupervised self-encoders in the present application, wherein the automatic codec network is part of the neural network.
In a specific embodiment, all key frames are extracted from the video to be detected, and each image is converted to grayscale as input to the self-encoder; an image with the same dimensions as the original is output through the automatic encoding-and-decoding network with 5 hidden layers.
Further, since the resolutions of different videos are different, the model is improved, and the input and output layers with different resolutions are independently accessed, but the hidden layers, namely the coding and decoding networks, are shared, so that the same feature extraction rule is ensured to be generated.
As shown in Fig. 4, a reference resolution of the video is set, for example 480p. The output-layer features are adjusted to features at the reference resolution level. Further, the error determination criterion is modified to compare against the original feature input values at the reference resolution. That is, the original error function e(x_out − x_in) is changed to e(x_out′ − x_in′), where:
x_in represents the original image features at the resolution of the content video provided in advance, for example 1080p;
x_out represents the image output of the trained self-coding network at the resolution matching the original x_in, i.e. likewise based on 1080p features;
x_in′ represents the original image features of the same content at the 480p reference resolution, serving as the reference input;
x_out′ represents the image feature values at the 480p reference resolution output by the self-coding network after training, serving as the reference output.
In this way, it is ensured that the same content video always outputs the same feature code after model calculation, no matter how many inputs of different resolutions there are. The improved model is trained continuously: key frames of equal time sequences with different resolutions but the same content are input, feature values of image frames at the same reference resolution are output, and error judgment is performed until the threshold requirement is met.
In step S21, a one-dimensional feature vector based on information entropy is generated, and the corresponding video brief fingerprint is generated. Wherein, in one embodiment, the converting the fine fingerprint of the video to be detected into a brief fingerprint characterized by a one-dimensional feature vector comprises: carrying out one-dimensional compression on the two-dimensional feature vector; obtaining the sum of the information entropy of the feature-vector values compressed for each frame of the video as the information entropy of that frame; and converting the frame sequence of the video into the one-dimensional feature vector based on the frame information entropy.
Where information entropy is a measure representation of information, an image may be represented by information entropy. When the contents of the two images are equal or close, the information entropy is also equal or close.
The information entropy is given, as above, by the formula H = −Σ_i P_i log P_i. The information measurement of the image content is realized through the information entropy: even if the original video has been tampered with, for example by rotating the video picture, the value of the information entropy is unchanged, which improves the fault tolerance and recall ratio of video matching.
In addition, establishing a spatial index of the entropy-based vector values provides a basis for subsequent rapid searching; since the video feature vector is converted from two dimensions to one dimension, the time complexity of the coarse comparison of a video can be reduced from O(n²) to O(log n), greatly reducing the search time.
In one embodiment, the sum of the information entropy over the feature values of each image of the video is taken as the information entropy of that image; the image sequence of the video is converted into a one-dimensional feature vector serving as the brief fingerprint of the video, and a spatial coordinate index in the format <vector value of the one-dimensional feature vector, array coordinate sequence number> is established for the one-dimensional vector.
That is, the fine fingerprint and the brief fingerprint of the video to be detected are output, so that the brief fingerprint can be matched against the brief fingerprint of each video in the video library.
In one embodiment, matching the brief fingerprint of the video to be detected against the video brief fingerprint library based on the spatial coordinate index, and obtaining from the library the primary video set corresponding to the matching result, comprises: determining the one-dimensional feature vector of the video to be detected, and adjusting the value range of that vector according to an influence factor and a step length; matching each value of the one-dimensional feature vector against the spatial coordinate index and confirming that the matching result lies within the value range; and recording the corresponding primary-selection spatial coordinates in the index, then determining the corresponding primary video set from those coordinates.
In one embodiment, matching each value of the one-dimensional feature vector of the video to be detected against the spatial coordinate index comprises: calculating a standard deviation from the entropy values corresponding to the one-dimensional feature vector of each frame image of the video to be detected and the spatial coordinates in the index; and determining a primary-selection matching degree of the video to be detected according to the relation between the standard deviation and a preset threshold, the primary-selection matching degree characterizing how similar the video to be detected is to the video corresponding to the brief fingerprint library.
In a specific embodiment, for each video, the first value of the brief fingerprint vector of the video to be detected is taken as the original value, the influence factor ε is set to 0.03 and the step length to 0.01, and the range of values to be tested is adjusted to [original value ± ε], forming a set A.
Each value in set A is then matched in turn against the pre-established image spatial index of the videos; whenever an equal value is found, the corresponding spatial position coordinate is returned, forming a set B.
The first coordinate value in set B is selected as the starting point for the candidate-video brief fingerprint comparison; feature vector comparison then proceeds between the brief fingerprint feature vector of the video to be detected and the candidate video vector from that starting point, and a standard deviation is calculated. If the standard deviation is smaller than a given threshold, the video to be detected fits the candidate video content closely: the candidate is considered highly likely to be similar, is marked as belonging to the preliminary set that will subsequently undergo fine fingerprint comparison, and the matching ends.
Specifically, if the standard deviation is greater than the threshold, the next coordinate in set B is used as the comparison starting point for feature vector matching; this loop continues until either a starting coordinate is found whose matching result against the fingerprint to be detected has a standard deviation below the threshold, in which case the match is deemed successful and the loop exits, or all coordinates in set B have been tried as starting points.
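The loop over set B described above can be sketched as follows (an illustrative assumption of the alignment: the query fingerprint is compared frame-by-frame against the candidate fingerprint starting at each coordinate, and the standard deviation is taken over the per-frame entropy differences):

```python
import math

def match_candidate(query_fp, candidate_fp, start_coords, threshold):
    """Try each start coordinate from set B in turn; succeed on the first
    alignment whose per-frame entropy differences have a standard
    deviation below the threshold."""
    for start in start_coords:
        segment = candidate_fp[start:start + len(query_fp)]
        if len(segment) < len(query_fp):
            continue  # not enough frames left in the candidate to align
        diffs = [q - c for q, c in zip(query_fp, segment)]
        mean = sum(diffs) / len(diffs)
        std = math.sqrt(sum((d - mean) ** 2 for d in diffs) / len(diffs))
        if std < threshold:
            return start  # mark candidate for fine-fingerprint comparison
    return None  # all coordinates in set B exhausted without a match
```

A candidate that matches from its second frame onward returns starting coordinate 1; a candidate whose entropy differences fluctuate widely returns None.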
The fine fingerprint of the video to be detected is then matched against the fine fingerprint library of the primary selection, and the result is output. Further, matching the fine fingerprint of the video to be detected against the fine fingerprint library corresponding to the primary video set yields the final matching result.
Further, the fine fingerprint of the video to be detected and the fine fingerprints of the primary selected video subset are compared, starting from the previously recorded comparison starting-point coordinates, using an existing precise image-similarity comparison algorithm or combination of algorithms, including but not limited to Euclidean distance, cosine angle, Hamming distance, or a higher-level algorithm such as the Scale-Invariant Feature Transform (SIFT); for example, a video confirmed to duplicate the content of the video to be detected is flagged and manual intervention is prompted.
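The three elementary distance measures named above can be sketched for feature vectors as follows (an illustrative sketch only; the patent leaves the choice of algorithm or combination open, and the Hamming variant here assumes binarized fingerprints):

```python
import numpy as np

def euclidean(a, b):
    # straight-line distance between two feature vectors
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    # cosine of the angle between the vectors; 1.0 means identical direction
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def hamming(a, b):
    # for binarized fingerprints: fraction of positions that differ
    return float(np.mean(a != b))
```

In practice, a small standard deviation in the brief-fingerprint stage narrows the candidates, and one or more of these measures (or SIFT matching) makes the final repeat/no-repeat decision.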
Fig. 5 is a schematic diagram of a video fingerprint generating apparatus according to an embodiment of the present application, and as shown in fig. 5, a video fingerprint generating apparatus includes: a fine fingerprint acquisition module 51, wherein the fine fingerprint acquisition module 51 is used for acquiring a video fine fingerprint characterized by a two-dimensional feature vector based on a video feature extraction model; a brief fingerprint acquisition module 52 for converting the video fine fingerprint into a video brief fingerprint characterized by a one-dimensional feature vector based on information entropy.
Further, the video fingerprint generating device further comprises a spatial index establishing module, wherein the spatial index establishing module is used for: establishing a corresponding video fine fingerprint library and a corresponding video brief fingerprint library according to the video fine fingerprint and the video brief fingerprint; the spatial index establishing module is further used for establishing a spatial coordinate index of the video brief fingerprint library corresponding to the one-dimensional characteristic vector value according to the video brief fingerprint.
Further, the fine fingerprint acquisition module is further configured to:
performing dimension reduction and key frame extraction on all videos with different resolutions based on a video feature extraction model, and acquiring a two-dimensional feature vector matrix corresponding to the videos, wherein the two-dimensional feature vector matrix corresponds to the videos and has uniform reference resolution;
wherein, in the two-dimensional feature vector, a first dimension represents a feature vector group of each key frame of the video, and a second dimension represents a time sequence of the key frame.
Further, the brief fingerprint acquisition module is further configured to:
carrying out one-dimensional compression on the two-dimensional feature vector;
obtaining the sum of information entropy of the feature vector values compressed by each frame of the video as the information entropy of the frame;
and converting the frame sequence of the video into the one-dimensional feature vector based on the information entropy of the frame.
Fig. 6 is a schematic diagram of a video fingerprint retrieving apparatus according to an embodiment of the present application, as shown in fig. 6, the video fingerprint retrieving apparatus includes: a video fingerprint acquisition module 61 and a video fingerprint matching module 62; the video fingerprint acquisition module 61 is configured to acquire a fine fingerprint of a video to be detected, which is characterized by a two-dimensional feature vector, based on a video feature extraction model; the video fingerprint acquisition module 61 is further configured to convert, based on information entropy, a fine fingerprint of the video to be detected into a brief fingerprint of the video to be detected characterized by a one-dimensional feature vector; the video fingerprint matching module 62 is configured to match the brief fingerprint of the video to be detected with a video brief fingerprint library based on the spatial coordinate index, and obtain a primary video set corresponding to the matching result from the video brief fingerprint library.
Further, the video fingerprint matching module 62 is configured to: and matching the fine fingerprint of the video to be detected with a fine fingerprint library corresponding to the primary video set to obtain a matching result.
Further, the video fingerprint acquisition module 61 is further configured to: determining a one-dimensional feature vector of the video to be detected, and adjusting the value range of the one-dimensional feature vector of the video to be detected according to the influence factor and the step length;
matching each value of the one-dimensional feature vector of the video to be detected with the space coordinate index, and confirming that a matching result is in the value range;
and recording corresponding primary selection space coordinates in the space coordinate index, and determining a corresponding primary selection video set according to the primary selection space coordinates.
Fig. 7 illustrates a physical schematic diagram of an electronic device. As shown in fig. 7, the electronic device may include: a processor 710, a communication interface (Communications Interface) 720, a memory 730, and a communication bus 740, wherein the processor 710, communication interface 720, and memory 730 communicate with each other via the communication bus 740. The processor 710 may call logic instructions in the memory 730 to perform the following method: based on the video feature extraction model, obtaining a video fine fingerprint characterized by a two-dimensional feature vector; and based on information entropy, converting the video fine fingerprint into a video brief fingerprint characterized by a one-dimensional feature vector.
It should be noted that, in this embodiment, the electronic device may be a server, a PC, or other devices in the specific implementation, so long as the structure of the electronic device includes a processor 710, a communication interface 720, a memory 730, and a communication bus 740 as shown in fig. 7, where the processor 710, the communication interface 720, and the memory 730 complete communication with each other through the communication bus 740, and the processor 710 may call logic instructions in the memory 730 to execute the above method. The embodiment does not limit a specific implementation form of the electronic device.
Further, the logic instructions in the memory 730 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a standalone product. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Fig. 8 illustrates a physical structure diagram of an electronic device, as shown in fig. 8, which may include: processor 810, communication interface (Communications Interface) 820, memory 830, and communication bus 840, wherein processor 810, communication interface 820, memory 830 accomplish communication with each other through communication bus 840. The processor 810 may call logic instructions in the memory 830 to perform the following method: based on the video feature extraction model, acquiring a fine fingerprint of the video to be detected, which is characterized by the two-dimensional feature vector; based on information entropy, converting the fine fingerprint of the video to be detected into a brief fingerprint of the video to be detected, which is characterized by a one-dimensional feature vector; and matching the brief fingerprints of the video to be detected with a video brief fingerprint library based on the space coordinate index, and acquiring a primary video set corresponding to a matching result from the video brief fingerprint library.
It should be noted that, in this embodiment, the electronic device may be a server, a PC, or other devices in the specific implementation, so long as the structure of the electronic device includes a processor 810, a communication interface 820, a memory 830, and a communication bus 840 as shown in fig. 8, where the processor 810, the communication interface 820, and the memory 830 complete communication with each other through the communication bus 840, and the processor 810 may call logic instructions in the memory 830 to execute the above method. The embodiment does not limit a specific implementation form of the electronic device.
Further, the logic instructions in the memory 830 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a standalone product. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Further, embodiments of the present application disclose a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform a method of generating a video fingerprint as provided by the above method embodiments, for example comprising: based on the video feature extraction model, obtaining a video fine fingerprint characterized by a two-dimensional feature vector; and based on information entropy, converting the video fine fingerprint into a video brief fingerprint characterized by a one-dimensional feature vector.
In another aspect, an embodiment of the present application further provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor is implemented to perform the method for searching video fingerprints provided in the foregoing embodiments, for example, including: based on the video feature extraction model, acquiring a fine fingerprint of the video to be detected, which is characterized by the two-dimensional feature vector; based on information entropy, converting the fine fingerprint of the video to be detected into a brief fingerprint of the video to be detected, which is characterized by a one-dimensional feature vector; and matching the brief fingerprints of the video to be detected with a video brief fingerprint library based on the space coordinate index, and acquiring a primary video set corresponding to a matching result from the video brief fingerprint library.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the application without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (9)

1. A method for generating a video fingerprint, the method comprising:
based on the video feature extraction model, obtaining a video fine fingerprint characterized by a two-dimensional feature vector;
based on information entropy, converting the video fine fingerprint into a video brief fingerprint characterized by a one-dimensional feature vector;
wherein the obtaining, based on the video feature extraction model, of the video fine fingerprint characterized by the two-dimensional feature vector comprises the following steps:
performing dimension reduction and key frame extraction on all videos with different resolutions based on a video feature extraction model, and acquiring a two-dimensional feature vector matrix corresponding to the videos, wherein the two-dimensional feature vector matrix corresponds to the videos and has uniform reference resolution;
wherein, in the two-dimensional feature vector, a first dimension represents a feature vector group of each key frame of the video, and a second dimension represents a time sequence of the key frame.
2. The method for generating a video fingerprint according to claim 1, further comprising:
establishing a corresponding video fine fingerprint library and a corresponding video brief fingerprint library according to the video fine fingerprint and the video brief fingerprint;
and establishing a spatial coordinate index of the video brief fingerprint library corresponding to the one-dimensional characteristic vector value according to the video brief fingerprint.
3. The method of generating a video fingerprint according to claim 1, wherein said converting the video fine fingerprint into a video brief fingerprint characterized by a one-dimensional feature vector comprises:
carrying out one-dimensional compression on the two-dimensional feature vector;
obtaining the sum of information entropy of the feature vector values compressed by each frame of the video as the information entropy of the frame;
and converting the frame sequence of the video into the one-dimensional feature vector based on the information entropy of the frame.
4. A method for retrieving a video fingerprint, comprising:
based on the video feature extraction model, acquiring a fine fingerprint of the video to be detected, which is characterized by the two-dimensional feature vector;
based on information entropy, converting the fine fingerprint of the video to be detected into a brief fingerprint of the video to be detected, which is characterized by a one-dimensional feature vector;
based on the space coordinate index, matching the brief fingerprint of the video to be detected with a video brief fingerprint library, and acquiring a primary video set corresponding to a matching result from the video brief fingerprint library;
wherein the obtaining, based on the video feature extraction model, of the fine fingerprint of the video to be detected characterized by the two-dimensional feature vector comprises the following steps:
performing dimension reduction and key frame extraction on all videos to be detected with different resolutions based on a video feature extraction model, and obtaining a two-dimensional feature vector matrix corresponding to the videos to be detected, wherein the two-dimensional feature vector matrix corresponds to the videos to be detected and has uniform reference resolution;
and the first dimension of the two-dimensional feature vector represents a feature vector group of each key frame of the video to be detected, and the second dimension represents the time sequence of the key frame.
5. The method for retrieving a video fingerprint according to claim 4, further comprising:
and matching the fine fingerprint of the video to be detected with a fine fingerprint library corresponding to the primary video set to obtain a matching result.
6. The method for searching video fingerprints according to claim 4, wherein the matching the brief fingerprints of the video to be detected with the video brief fingerprint library based on the spatial coordinate index, and obtaining the initial video set corresponding to the matching result from the video brief fingerprint library, includes:
determining a one-dimensional feature vector of the video to be detected, and adjusting the value range of the one-dimensional feature vector of the video to be detected according to the influence factor and the step length;
matching each value of the one-dimensional feature vector of the video to be detected with the space coordinate index, and confirming that a matching result is in the value range;
and recording corresponding primary selection space coordinates in the space coordinate index, and determining a corresponding primary selection video set according to the primary selection space coordinates.
7. The method according to claim 6, wherein matching each value of the one-dimensional feature vector of the video to be detected with the spatial coordinate index comprises:
calculating standard deviation according to entropy values corresponding to the spatial coordinates of the one-dimensional feature vector of each frame of image of the video to be detected and the spatial coordinate index;
and determining the primary selection matching degree of the video to be detected according to the relation between the standard deviation and a preset threshold value, wherein the primary selection matching degree is used for representing the similarity degree of the video to be detected and the video corresponding to the brief fingerprint library.
8. A non-transitory computer readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of generating a video fingerprint according to any one of claims 1 to 3 or the steps of the method of retrieving a video fingerprint according to any one of claims 4 to 7.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of generating a video fingerprint according to any one of claims 1 to 3 or the steps of the method of retrieving a video fingerprint according to any one of claims 4 to 7 when the program is executed.
CN201911378319.7A 2019-12-27 2019-12-27 Video fingerprint generation method, search method, electronic device and medium Active CN111143619B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911378319.7A CN111143619B (en) 2019-12-27 2019-12-27 Video fingerprint generation method, search method, electronic device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911378319.7A CN111143619B (en) 2019-12-27 2019-12-27 Video fingerprint generation method, search method, electronic device and medium

Publications (2)

Publication Number Publication Date
CN111143619A CN111143619A (en) 2020-05-12
CN111143619B true CN111143619B (en) 2023-08-15

Family

ID=70521020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911378319.7A Active CN111143619B (en) 2019-12-27 2019-12-27 Video fingerprint generation method, search method, electronic device and medium

Country Status (1)

Country Link
CN (1) CN111143619B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113886629B (en) * 2021-12-09 2022-02-25 深圳行动派成长科技有限公司 Course picture retrieval model establishing method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102176208A (en) * 2011-02-28 2011-09-07 西安电子科技大学 Robust video fingerprint method based on three-dimensional space-time characteristics
CN103593464A (en) * 2013-11-25 2014-02-19 华中科技大学 Video fingerprint detecting and video sequence matching method and system based on visual features
CN104063706A (en) * 2014-06-27 2014-09-24 电子科技大学 Video fingerprint extraction method based on SURF algorithm
CN108021927A (en) * 2017-11-07 2018-05-11 天津大学 A kind of method for extracting video fingerprints based on slow change visual signature
CN109960960A (en) * 2017-12-14 2019-07-02 中国移动通信集团安徽有限公司 Video finger print generation and matching process and device, computer equipment and storage medium
CN110278449A (en) * 2019-06-26 2019-09-24 腾讯科技(深圳)有限公司 A kind of video detecting method, device, equipment and medium
CN110503076A (en) * 2019-08-29 2019-11-26 腾讯科技(深圳)有限公司 Video classification methods, device, equipment and medium based on artificial intelligence

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015183148A1 (en) * 2014-05-27 2015-12-03 Telefonaktiebolaget L M Ericsson (Publ) Fingerprinting and matching of content of a multi-media file

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102176208A (en) * 2011-02-28 2011-09-07 西安电子科技大学 Robust video fingerprint method based on three-dimensional space-time characteristics
CN103593464A (en) * 2013-11-25 2014-02-19 华中科技大学 Video fingerprint detecting and video sequence matching method and system based on visual features
CN104063706A (en) * 2014-06-27 2014-09-24 电子科技大学 Video fingerprint extraction method based on SURF algorithm
CN108021927A (en) * 2017-11-07 2018-05-11 天津大学 A kind of method for extracting video fingerprints based on slow change visual signature
CN109960960A (en) * 2017-12-14 2019-07-02 中国移动通信集团安徽有限公司 Video finger print generation and matching process and device, computer equipment and storage medium
CN110278449A (en) * 2019-06-26 2019-09-24 腾讯科技(深圳)有限公司 A kind of video detecting method, device, equipment and medium
CN110503076A (en) * 2019-08-29 2019-11-26 腾讯科技(深圳)有限公司 Video classification methods, device, equipment and medium based on artificial intelligence

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ozgun Cirakman; Bilge Gunsel; et al. "Key-frame based video fingerprinting by NMF." 2010 IEEE International Conference on Image Processing, 2010 (full text). *

Also Published As

Publication number Publication date
CN111143619A (en) 2020-05-12

Similar Documents

Publication Publication Date Title
Yue et al. Cloud-based image coding for mobile devices—Toward thousands to one compression
CN106445939B (en) Image retrieval, image information acquisition and image identification method, device and system
Tsai et al. Location coding for mobile image retrieval
JP5911578B2 (en) Method for encoding feature point position information of image, computer program, and mobile device
Duan et al. Compact descriptors for visual search
Zhang et al. A joint compression scheme of video feature descriptors and visual content
US9420299B2 (en) Method for processing an image
CN111382555B (en) Data processing method, medium, device and computing equipment
US20140254936A1 (en) Local feature based image compression
CN112434553B (en) Video identification method and system based on deep dictionary learning
WO2021175040A1 (en) Video processing method and related device
CN111182364B (en) Short video copyright detection method and system
EP3191980A1 (en) Method and apparatus for image retrieval with feature learning
EP3239896A1 (en) Data structure for describing an image sequence image, and methods for extracting and matching these data structures
WO2023087184A1 (en) Image processing method and apparatus, computing device, and medium
US10445613B2 (en) Method, apparatus, and computer readable device for encoding and decoding of images using pairs of descriptors and orientation histograms representing their respective points of interest
Nian et al. Efficient near-duplicate image detection with a local-based binary representation
CN111382620A (en) Video tag adding method, computer storage medium and electronic device
Kumar et al. Multiple forgery detection in video using inter-frame correlation distance with dual-threshold
CN111143619B (en) Video fingerprint generation method, search method, electronic device and medium
CN114005019A (en) Method for identifying copied image and related equipment thereof
CN116229313A (en) Label construction model generation method and device, electronic equipment and storage medium
CN114372169A (en) Method, device and storage medium for searching homologous videos
CN113704551A (en) Video retrieval method, storage medium and equipment
Ranjan et al. Image retrieval using dictionary similarity measure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant