CN111339368B

CN111339368B - Video retrieval method and device based on video fingerprint and electronic equipment

Info

Publication number: CN111339368B
Application number: CN202010105187.7A
Authority: CN
Inventors: 傅致晖; 孟丹; 李宏宇; 李晓林
Original assignee: Tongdun Holdings Co Ltd
Current assignee: Tongdun Holdings Co Ltd
Priority date: 2020-02-20
Filing date: 2020-02-20
Publication date: 2024-04-02
Anticipated expiration: 2040-02-20
Also published as: CN111339368A

Abstract

The invention discloses a video retrieval method, device, and electronic equipment based on video fingerprints. The method comprises the following steps: respectively constructing a first video fingerprint matrix of a video to be retrieved and a second video fingerprint matrix of a preset video; calculating a correlation matrix of the first video fingerprint matrix and the second video fingerprint matrix; determining a connected region in the correlation matrix; and determining a video retrieval result based on the attributes of the connected region. Because the connected region lies in the correlation matrix of the two fingerprint matrices, it represents consecutive frames with high similarity. The correlation between the video to be retrieved and the preset video can therefore be determined from the attributes of the connected region, so that highly similar video segments can be located accurately. The method is robust to partial video matching and to various temporal transformations of the video, and it offers high accuracy and high interpretability.

Description

Video retrieval method and device based on video fingerprint and electronic equipment

Technical Field

The invention relates to the technical field of data processing, in particular to a video retrieval method and device based on video fingerprints and electronic equipment.

Background

With the rapid development of internet technology, the volume of information content has grown explosively. Illegal videos, such as infringing videos that violate the copyright of original videos or illegal riot videos, are likely to be mixed into this massive amount of data. The following description takes video infringement as an example: to protect the interests associated with the original video, videos that may be infringing need to be identified so that the copyright of the original video can be upheld.

Video identification in the related art mainly takes two forms. The first is visual inspection. The second is to cut the video into pictures and then search for similar pictures in the original video or in a preset video sample library by means of image search.

However, the first approach, visual inspection, is unreliable and inefficient: similar segments that differ in fine detail are hard for a person to spot, and the quality of human inspection fluctuates, with missed detections becoming more likely after fatigue. A trained auditor checks approximately 1000 video segments per day, which is inefficient. The second approach, cutting the video into pictures and then searching for similar pictures in the original video or a preset video by image search, depends heavily on the sample library: if the sample library is not updated in time or similar pictures have not been collected, similar pictures cannot be detected correctly. Both approaches can only identify cases in which most of the video clips are similar or overlapping, that is, they can only detect videos that are highly similar to a video in the sample library over a long duration, and they are not robust to transformations of the video frames (such as changes in brightness, resolution, or watermarks). In practical application scenarios, however, videos are often edited. Existing retrieval methods cannot retrieve and locate matches accurately when videos overlap only in a small part or when video frames have undergone various transformations.

In view of the above problems, no effective solution has been proposed at present.

Disclosure of Invention

The technical problem to be solved by the embodiment of the invention is how to recognize the video more accurately.

According to a first aspect, an embodiment of the present invention provides a video retrieval method based on video fingerprints, including: respectively constructing a first video fingerprint matrix of a video to be searched and a second video fingerprint matrix of a preset video; calculating a correlation matrix of the first video fingerprint and the second video fingerprint; determining a connected region in the correlation matrix; and determining a video retrieval result based on the connected region attribute.

Optionally, the determining the connected region in the correlation matrix includes: binarizing the correlation matrix through a preset threshold value; and determining a connected region in the binarized correlation matrix based on a contour labeling method.

Optionally, the determining the video retrieval result based on the connected region attribute includes: judging whether the projection of the connected region in the first dimension of the correlation matrix is larger than a first preset value and whether the projection of the connected region in the second dimension of the correlation matrix is larger than a second preset value; and if the projection of the connected region in the first dimension of the correlation matrix is larger than the first preset value and the projection of the connected region in the second dimension of the correlation matrix is larger than the second preset value, confirming that similar video segments exist in the video to be retrieved and the preset video.

Optionally, the determining the video search result based on the connected region attribute further includes: and obtaining the starting time and the ending time of the similar video clips in the video to be searched and the preset video respectively through the projection of the first dimension and the projection of the second dimension.

Optionally, the determining the video search result based on the connected region attribute further includes: and confirming the frame transformation of the video to be retrieved relative to the preset video based on the shape of the connected region.

Optionally, constructing the first video fingerprint matrix of the video to be retrieved and the second video fingerprint matrix of the preset video respectively includes: respectively extracting fingerprint vectors of a plurality of video frames of the video to be searched and the preset video; and respectively merging fingerprint vectors of the plurality of video frames to construct the first video fingerprint matrix and the second video fingerprint matrix.

According to a second aspect, an embodiment of the present invention provides a video retrieval device based on video fingerprints, including: a construction module, configured to respectively construct a first video fingerprint matrix of the video to be retrieved and a second video fingerprint matrix of the preset video; a calculation module, configured to calculate a correlation matrix of the first video fingerprint matrix and the second video fingerprint matrix; a connected region demarcation module, configured to determine a connected region in the correlation matrix; and a result confirmation module, configured to determine a video retrieval result based on the connected region attribute.

Optionally, the connected region demarcation module includes: a binarization unit, configured to binarize the correlation matrix through a preset threshold; and a labeling unit, configured to determine a connected region in the binarized correlation matrix based on a contour labeling method.

Optionally, the result confirmation module includes: a judging unit, configured to judge whether the projection of the connected region in the first dimension of the correlation matrix is larger than a first preset value and whether the projection of the connected region in the second dimension of the correlation matrix is larger than a second preset value; and a similar video segment confirming unit, configured to confirm that similar video segments exist in the video to be retrieved and the preset video when the projection of the connected region in the first dimension of the correlation matrix is larger than the first preset value and the projection of the connected region in the second dimension of the correlation matrix is larger than the second preset value.

Optionally, the result confirmation module further includes: the position confirmation unit is used for obtaining the starting time and the ending time of the similar video segment in the video to be searched and the preset video through the projection of the first dimension and the projection of the second dimension respectively.

Optionally, the result confirmation module further includes: and the frame transformation confirming unit is used for confirming the frame transformation of the video to be retrieved relative to the preset video based on the shape of the connected region.

Optionally, the building module includes: the fingerprint extraction unit is used for respectively extracting fingerprint vectors of a plurality of video frames of the video to be searched and the preset video; and the matrix construction unit is used for respectively combining the fingerprint vectors of the plurality of video frames to construct the first video fingerprint matrix and the second video fingerprint matrix.

According to a third aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer instructions for causing the computer to perform the video fingerprint-based video retrieval method of any one of the above first aspects.

According to a fourth aspect, an embodiment of the present invention provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor to cause the at least one processor to perform the video fingerprint-based video retrieval method of any one of the first aspects above.

According to the method and the device, the first video fingerprint matrix of the video to be retrieved and the second video fingerprint matrix of the preset video are constructed, the correlation matrix of the two fingerprint matrices is calculated, and a connected region is searched for in the correlation matrix. By constructing the fingerprint correlation matrix, the similarity judgment of video segments is converted into a search for connected regions in that matrix. Because a connected region lies in the correlation matrix of the two fingerprint matrices, it represents consecutive frames with high similarity, that is, a highly similar segment shared by the video to be retrieved and the preset video. The correlation between the two videos can therefore be determined from the attributes of the connected region, and the video retrieval result is determined accordingly. Highly similar video segments can be located accurately; the method is robust to partial video matching and to various temporal transformations of the video, and it offers high accuracy and high interpretability.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 shows a schematic diagram of a video retrieval method based on video fingerprints of the present embodiment;

Fig. 2 shows a schematic diagram of a video retrieval device based on video fingerprints according to an embodiment of the present invention;

Fig. 3 shows a schematic diagram of an electronic device according to an embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.

A video fingerprint is a unique video feature generated by analyzing the image information in a video, and it can represent the corresponding video file. Because video fingerprints describe video content, they can help locate similar videos and thus support video search applications. Video fingerprints also allow two videos to be compared quickly. For videos with the same content but different frame rates or resolutions, the MD5 message digest of the video files cannot determine whether the two videos are the same. A video fingerprint, however, describes the video content, so two videos with the same content can be found by comparing their fingerprints. This property enables operations such as deduplication of identical videos. In addition, because video fingerprints describe video content, the videos watched by a user can be clustered by fingerprint so that similar videos can be recommended within the categories that interest the user. Furthermore, during the distribution of video content, for example on a video sharing platform, the earliest uploader of identical or highly similar videos can be determined from the video fingerprints and upload times, so that the original creator of the video content is identified and pirated videos can be combated.

Therefore, an embodiment of the present invention provides a video retrieval method based on video fingerprints. As shown in Fig. 1, the method may include the following steps:

S11. Respectively constructing a first video fingerprint matrix of the video to be retrieved and a second video fingerprint matrix of the preset video. Specifically, the preset video may be a target video suspected of infringement, or may be any one of a plurality of videos in a video library. In this embodiment, a plurality of video frames may be extracted from the video to be retrieved and from the preset video, and the image features extracted from each of these video frames are used as the fingerprint feature of the corresponding frame. The fingerprint feature may be a feature map output by any one or more layers of a multi-layer neural network after the network processes the video frame. The video frames used for fingerprint extraction may be all video frames of the video to be retrieved and the preset video, or only their key frames. The following description takes key frames as an example:

Specifically, a valid video segment for extracting a video fingerprint may be determined from the video, and key frames may be extracted from the valid video segment. In some embodiments, a portion of the video may be determined to be the valid video segment. For example, the video may be cut into segments of a preset length of time (e.g., 15 s, or any other possible preset length of time), and the cut segments may be used as valid video segments. For example, the segment of the video from 0 to 15 seconds may be determined to be a valid video segment, or a segment of the preset length of time starting at an arbitrary position in the video may be determined to be a valid video segment. In other embodiments, the entire video may be determined to be the valid video segment.

Key frames may then be extracted from the determined valid video segment. In some embodiments, each frame in the valid video segment may be determined to be a key frame. In other embodiments, the valid video segment may be divided into shots by analyzing each of its frames, and key frames may be extracted for each shot based on the result of the shot segmentation. For example, the first frame of each shot determined by shot segmentation may be determined to be a key frame. In still other embodiments, video frames may be selected as key frames by sampling. For example, the valid video segment may be sampled at regular intervals (e.g., every 5 s, or at any preset time interval), with the sampled video frames used as key frames. For a valid video segment 15 seconds long, sampling at 5-second intervals yields 3 key frames. As another example, the valid video segment may be sampled at arbitrary intervals, with the sampled video frames used as key frames.
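The fixed-interval sampling described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `sample_keyframe_times` is hypothetical, and it assumes that sampling starts at the first frame of the segment and excludes the segment end, which is consistent with obtaining 3 key frames from a 15 s segment sampled every 5 s.

```python
def sample_keyframe_times(segment_start, segment_length, interval):
    """Return the timestamps (in seconds) of key frames sampled at a fixed interval.

    Assumption: the first sample is taken at segment_start and the
    segment end itself is not sampled.
    """
    if segment_length <= 0 or interval <= 0:
        raise ValueError("segment_length and interval must be positive")
    times = []
    k = 0
    # Multiply instead of accumulating to avoid floating-point drift.
    while k * interval < segment_length:
        times.append(segment_start + k * interval)
        k += 1
    return times


keyframe_times = sample_keyframe_times(0.0, 15.0, 5.0)
print(keyframe_times)  # [0.0, 5.0, 10.0]
```

The returned timestamps would then be passed to whatever decoder is used to fetch the corresponding frames.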

For each of the plurality of video frames, the video frame is processed using a neural network having a plurality of layers, and the feature map output by an intermediate layer of the neural network is used as the fingerprint vector of the video frame. These fingerprint features may be used to generate the video fingerprint. In this embodiment, the fingerprint vectors of the individual frames are combined into a video fingerprint matrix.

As an alternative embodiment, the video fingerprint may also be generated in other ways; for example, it may be extracted based on salient regions, based on sparse coding, or based on temporal and spatial gradients. The fingerprints of the extracted video frames are merged into a video fingerprint matrix, so that the first video fingerprint matrix of the video to be retrieved and the second video fingerprint matrix of the preset video can be constructed. Each row of the first video fingerprint matrix and the second video fingerprint matrix characterizes the fingerprint vector of one video frame.
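As a small illustration of the merging step, the sketch below stacks per-frame fingerprint vectors into a matrix with one row per frame. The function name `build_fingerprint_matrix` and the three-dimensional example vectors are hypothetical; how each frame's vector is actually extracted (neural network, salient regions, and so on) is outside this sketch.

```python
def build_fingerprint_matrix(frame_vectors):
    """Stack per-frame fingerprint vectors into a matrix, one row per frame."""
    matrix = [[float(x) for x in vec] for vec in frame_vectors]
    if not matrix:
        raise ValueError("at least one frame fingerprint is required")
    dim = len(matrix[0])
    if dim == 0 or any(len(row) != dim for row in matrix):
        raise ValueError("all frame fingerprints must share one non-zero length")
    return matrix


# Hypothetical 3-dimensional fingerprints for three key frames.
F0 = build_fingerprint_matrix([[0.1, 0.9, 0.3], [0.4, 0.2, 0.8], [0.7, 0.5, 0.1]])
print(len(F0), len(F0[0]))  # 3 3
```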

S12. Calculating a correlation matrix of the first video fingerprint matrix and the second video fingerprint matrix. Specifically, let the first video fingerprint matrix of the video to be retrieved be denoted F0 and the second video fingerprint matrix of the preset video be denoted F1. The correlation matrix of F0 and F1 can then be calculated. The size of this fingerprint correlation matrix is M x N, where M is the number of rows of F0 and N is the number of rows of F1, and the element (m, n) of the fingerprint correlation matrix is the correlation coefficient of the m-th row of F0 and the n-th row of F1.
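The computation in step S12 can be sketched as below. The text only says "correlation coefficient"; this sketch assumes the Pearson correlation coefficient, and the helper names `pearson` and `correlation_matrix` are hypothetical. Rows of the result index the frames of F0 and columns index the frames of F1, following the definition of element (m, n) above.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    denom = math.sqrt(var_u * var_v)
    return cov / denom if denom > 0 else 0.0


def correlation_matrix(f0, f1):
    """Return the M x N matrix whose (m, n) entry correlates row m of f0 with row n of f1."""
    return [[pearson(row0, row1) for row1 in f1] for row0 in f0]


# Toy fingerprints: 2 frames in F0, 3 frames in F1.
F0 = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
F1 = [[2.0, 4.0, 6.0], [1.0, 2.0, 3.0], [6.0, 4.0, 2.0]]
C = correlation_matrix(F0, F1)
print(len(C), len(C[0]))  # 2 3
```

For long videos a vectorized implementation would normally be preferred; the nested loops here are only meant to make the definition explicit.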

S13. Determining a connected region in the correlation matrix. In this embodiment, the connected region may be found by binarizing the correlation matrix and then searching the binarized matrix. Specifically, a preset threshold may be used to binarize the correlation matrix: when the value of a correlation coefficient is greater than the preset threshold, the binarized value may be 1, and when it is less than the preset threshold, the binarized value may be 0. As an alternative embodiment, the correlation matrix may be binarized with dual preset thresholds comprising a first threshold and a second threshold, the first threshold being greater than the second threshold: when the value of a correlation coefficient is greater than the first threshold, the binarized value may be 1, and when it is less than the second threshold, the binarized value is 0. As an exemplary embodiment, the binarized correlation matrix may be as follows:

[0,0,1,0],
[0,1,0,0],
[1,0,0,0]

Here, the vertical axis may represent the preset video, and the horizontal axis may represent the video to be retrieved.

As an exemplary embodiment, a connected region may be defined as a set of 8-connected matrix elements in which every element has the value 1. Based on the contour labeling method, the connected region present in the matrix above may be obtained as follows:

[0,0,1],
[0,1,0],
[1,0,0]
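The binarization and connected-region steps above can be sketched as follows. Two points are assumptions rather than statements from the text: a coefficient exactly equal to the threshold is mapped to 0, and the regions are found with a breadth-first flood fill over 8-connected neighbours, which is a simple stand-in for the contour labeling method and yields the same sets of cells. The function names are hypothetical.

```python
from collections import deque


def binarize(corr, threshold):
    """Map each coefficient to 1 if it exceeds the threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in corr]


def connected_regions(binary):
    """Return the 8-connected regions of 1-cells as sorted lists of (row, col)."""
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                cr, cc = queue.popleft()
                region.append((cr, cc))
                # Visit all 8 neighbours (the centre cell is already marked seen).
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and binary[nr][nc] == 1 and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            regions.append(sorted(region))
    return regions


binary = [[0, 0, 1, 0],
          [0, 1, 0, 0],
          [1, 0, 0, 0]]
print(connected_regions(binary))  # [[(0, 2), (1, 1), (2, 0)]]
```

With 4-connectivity the three cells of the example would form three separate regions, which is why the 8-connected definition matters for diagonal matches.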

S14. Determining a video retrieval result based on the attributes of the connected region. In this embodiment, the retrieval result may include whether the video to be retrieved and the preset video contain similar video segments; in theory, as long as a connected region exists, similar video segments exist. Because the correlation matrix is computed from the first video fingerprint matrix and the second video fingerprint matrix, with the vertical axis representing the preset video and the horizontal axis representing the video to be retrieved, the positions of the similar video segments in the two videos, namely their start and end times in the video to be retrieved and in the preset video respectively, can be determined from the projections of the connected region onto the two dimensions of the correlation matrix. In addition, if the similar video segments have not undergone frame transformation, they map identically in the video to be retrieved and in the preset video, so the connected region is a line segment whose included angle with the vertical or horizontal axis of the correlation matrix is 45°. If the similar video segment is stretched or compressed in the video to be retrieved, that included angle is no longer 45°. Connected regions of different shapes correspond to different temporal transformations of the frames: for example, a connected region that is a line segment running toward the lower right at 45° indicates no temporal transformation, while a connected region that is a line segment running toward the upper right at 60° indicates that the video frames are in reverse order and temporally compressed. After other frame transformations, for example when the order of the video frames is shuffled, the connected region takes an irregular shape, such as a curved or discontinuous distribution. The shape of the connected region therefore allows the method to remain highly robust to videos to be retrieved that have undergone frame transformation.
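One possible way to read a temporal transformation off the shape of a connected region is sketched below. It fits a least-squares line through the (row, col) cells of the region and reports its angle. Everything here is an illustrative assumption: the function names, the 5° tolerance, and the sign convention (a positive angle means the row index grows with the column index, i.e. the line runs toward the lower right when row 0 is drawn at the top). Irregular regions, such as those produced by shuffled frames, are not handled.

```python
import math


def region_angle_degrees(region):
    """Angle (degrees) of the least-squares line through the region's cells.

    Measured from the column axis; returns 90.0 for a vertical line and
    None for a single cell.
    """
    cells = list(region)
    if not cells:
        raise ValueError("region must contain at least one cell")
    n = len(cells)
    mean_r = sum(r for r, _ in cells) / n
    mean_c = sum(c for _, c in cells) / n
    cov = sum((r - mean_r) * (c - mean_c) for r, c in cells)
    var_r = sum((r - mean_r) ** 2 for r, _ in cells)
    var_c = sum((c - mean_c) ** 2 for _, c in cells)
    if var_c == 0:
        return 90.0 if var_r > 0 else None
    return math.degrees(math.atan(cov / var_c))


def describe_time_transform(angle, tolerance=5.0):
    """Coarse label for the temporal transformation implied by the angle."""
    if angle is None:
        return "undetermined"
    if angle < 0:
        return "reverse order"
    if abs(angle - 45.0) <= tolerance:
        return "no time transformation"
    return "time stretch or compression"


print(describe_time_transform(region_angle_degrees([(0, 0), (1, 1), (2, 2)])))
# no time transformation
print(describe_time_transform(region_angle_degrees([(0, 2), (1, 1), (2, 0)])))
# reverse order
print(describe_time_transform(region_angle_degrees([(0, 0), (1, 2), (2, 4)])))
# time stretch or compression
```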

As an exemplary embodiment, the positions of the similar video segments in the video to be retrieved and the preset video may be determined from the projections of the connected region onto the horizontal and vertical axes of the correlation matrix. Taking the connected region described above as an example:

[0,0,1],
[0,1,0],
[1,0,0]

The projections of the connected region onto the horizontal axis and the vertical axis are 0-2 and 0-2 respectively, so it can be confirmed that frames 0 to 2 of the video to be retrieved and frames 0 to 2 of the preset video are similar video segments.
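The projection in this example can be reproduced with a few lines. The sketch assumes a region is given as a list of (row, col) cells; the function name `project_region` is hypothetical, and which axis corresponds to which video depends on the axis convention chosen for the correlation matrix.

```python
def project_region(region):
    """Return ((first_row, last_row), (first_col, last_col)) for a region."""
    rows = [r for r, _ in region]
    cols = [c for _, c in region]
    return (min(rows), max(rows)), (min(cols), max(cols))


region = [(0, 2), (1, 1), (2, 0)]
print(project_region(region))  # ((0, 2), (0, 2))
```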

Because the internet contains a very large number of videos, it is hard to avoid cases in which one or a few frames happen to be similar. To prevent misjudgment, as an exemplary embodiment, the number of frames of the similar video segments in the video to be retrieved and the preset video may be further checked. In this embodiment, a preset value may be set, and similar video segments are considered to exist in the video to be retrieved and the preset video only when the number of similar frames exceeds the preset value. Specifically, it is judged whether the length of the connected region is greater than a first preset value and whether the width of the connected region is greater than a second preset value; if both conditions hold, it is confirmed that similar video segments exist in the video to be retrieved and the preset video. It will be appreciated by those skilled in the art that the first preset value and the second preset value may be the same or different and may be set according to actual requirements.

When the length of the connected region is greater than the first preset value and its width is greater than the second preset value, the start time and end time of the similar video segments in the video to be retrieved and the preset video can be calculated from the projections of the connected region onto the horizontal axis and the vertical axis.
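The size check and the conversion from frame indices to times might look like the sketch below. Several details are assumptions made only for illustration: the extent along each axis is counted in frames as last index minus first index plus one, "greater than" is treated as a strict comparison, the end time is taken as the timestamp of the last matching frame, and `seconds_per_frame` stands for the key frame sampling interval. The function name and the result keys are hypothetical.

```python
def locate_similar_segment(region, min_rows, min_cols, seconds_per_frame):
    """Return start/end times along both axes, or None if the region is too small."""
    rows = [r for r, _ in region]
    cols = [c for _, c in region]
    row_extent = max(rows) - min(rows) + 1
    col_extent = max(cols) - min(cols) + 1
    # Both projections must exceed their preset values.
    if row_extent <= min_rows or col_extent <= min_cols:
        return None
    return {
        "row_axis_seconds": (min(rows) * seconds_per_frame,
                             max(rows) * seconds_per_frame),
        "col_axis_seconds": (min(cols) * seconds_per_frame,
                             max(cols) * seconds_per_frame),
    }


region = [(0, 2), (1, 1), (2, 0)]
print(locate_similar_segment(region, 2, 2, 5.0))
# {'row_axis_seconds': (0.0, 10.0), 'col_axis_seconds': (0.0, 10.0)}
print(locate_similar_segment(region, 3, 3, 5.0))  # None
```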

An embodiment of the present invention provides a video retrieval device based on video fingerprints. As shown in Fig. 2, the device may include: a construction module 10, configured to respectively construct a first video fingerprint matrix of a video to be retrieved and a second video fingerprint matrix of a preset video; a calculation module 20, configured to calculate a correlation matrix of the first video fingerprint matrix and the second video fingerprint matrix; a connected region demarcation module 30, configured to determine a connected region in the correlation matrix; and a result confirmation module 40, configured to determine a video retrieval result based on the attributes of the connected region.

Optionally, the connected region demarcation module includes: a binarization unit, configured to binarize the correlation matrix through a preset threshold; and a labeling unit, configured to determine the connected region in the binarized correlation matrix based on the contour labeling method.

Optionally, the result confirmation module includes: a judging unit, configured to judge whether the projection of the connected region in the first dimension of the correlation matrix is larger than a first preset value and whether the projection of the connected region in the second dimension of the correlation matrix is larger than a second preset value; and a similar video segment confirming unit, configured to confirm that similar video segments exist in the video to be retrieved and the preset video when the projection of the connected region in the first dimension of the correlation matrix is larger than the first preset value and the projection of the connected region in the second dimension of the correlation matrix is larger than the second preset value.

Optionally, the result confirmation module further includes: the position confirmation unit is used for obtaining the starting time and the ending time of the similar video clips in the video to be searched and the preset video respectively through the projection of the first dimension and the projection of the second dimension.

Optionally, the result confirmation module further includes: and a frame transformation confirming unit for confirming the frame transformation of the video to be retrieved relative to the preset video based on the shape of the connected region.

Optionally, the building module includes: the fingerprint extraction unit is used for respectively extracting fingerprint vectors of a plurality of video frames of the video to be searched and the preset video; and the matrix construction unit is used for respectively combining the fingerprint vectors of the plurality of video frames to construct a first video fingerprint matrix and a second video fingerprint matrix.
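The module structure of the device can be pictured as a simple pipeline in which each module is a pluggable callable. The class below is only a structural sketch: the class name, parameter names, and the trivial stand-in callables in the usage example are hypothetical and do not implement real fingerprinting, correlation, or region labeling.

```python
class VideoRetrievalDevice:
    """Chains the four modules: construct, correlate, find regions, confirm."""

    def __init__(self, construct, correlate, find_regions, confirm):
        self.construct = construct          # construction module
        self.correlate = correlate          # calculation module
        self.find_regions = find_regions    # connected region demarcation module
        self.confirm = confirm              # result confirmation module

    def retrieve(self, query_video, preset_video):
        f0 = self.construct(query_video)
        f1 = self.construct(preset_video)
        corr = self.correlate(f0, f1)
        regions = self.find_regions(corr)
        return self.confirm(regions)


# Trivial stand-ins: videos are already lists of frame vectors, frames
# "correlate" only when identical, and all matching cells form one region.
device = VideoRetrievalDevice(
    construct=lambda video: [list(frame) for frame in video],
    correlate=lambda f0, f1: [[1.0 if a == b else 0.0 for b in f1] for a in f0],
    find_regions=lambda corr: [[(r, c) for r, row in enumerate(corr)
                                for c, v in enumerate(row) if v > 0.5]],
    confirm=lambda regions: any(len(region) >= 2 for region in regions),
)
print(device.retrieve([[1, 2], [3, 4]], [[1, 2], [3, 4]]))  # True
```

Passing the modules in as callables keeps each stage independently replaceable, mirroring the optional units listed for each module.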

An embodiment of the present invention provides an electronic device. As shown in Fig. 3, the electronic device includes one or more processors 31 and a memory 32; one processor 31 is taken as an example in Fig. 3.

The electronic device may further include: an input device 33 and an output device 34.

The processor 31, the memory 32, the input device 33, and the output device 34 may be connected by a bus or in another manner; connection by a bus is taken as an example in Fig. 3.

The processor 31 may be a central processing unit (CPU). The processor 31 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, discrete gate or transistor logic, a discrete hardware component, or a combination of the above. A general-purpose processor may be a microprocessor or any conventional processor.

The memory 32, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the video retrieval method in the embodiments of the present application. By running the non-transitory software programs, instructions, and modules stored in the memory 32, the processor 31 performs the various functional applications and data processing of the server, that is, implements the video fingerprint-based video retrieval method of the above method embodiments.

The memory 32 may include a program storage area and a data storage area. The program storage area may store an operating system and at least one application program required for functions; the data storage area may store data created according to the use of a processing device operated by the server, or the like. In addition, the memory 32 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 32 may optionally include memory located remotely from the processor 31, and such remote memory may be connected to a network connection device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 33 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the processing device of the server. The output device 34 may include a display device such as a display screen.

One or more modules are stored in the memory 32 that, when executed by the one or more processors 31, perform the method shown in fig. 1.

It will be appreciated by those skilled in the art that all or part of the flows of the methods in the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the storage medium may also comprise a combination of memories of the kinds described above.

Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims

1. A video retrieval method based on video fingerprints, comprising:

respectively constructing a first video fingerprint matrix of a video to be searched and a second video fingerprint matrix of a preset video;

determining the number of rows of the first video fingerprint matrix and the number of rows of the second video fingerprint matrix respectively, and determining the correlation matrix of the first video fingerprint matrix and the second video fingerprint matrix according to the numbers of rows of the first video fingerprint matrix and the second video fingerprint matrix, wherein the value of each element in the correlation matrix is a correlation coefficient of the first video fingerprint matrix and the second video fingerprint matrix;

binarizing the correlation matrix through a preset threshold value;

determining a connected region in the binarized correlation matrix based on a contour labeling method;

and determining a video retrieval result based on the attribute of the connected region.

2. The video retrieval method of claim 1, wherein the determining a video retrieval result based on the attribute of the connected region comprises:

judging whether the projection of the connected region in the first dimension of the correlation matrix is larger than a first preset value and whether the projection of the connected region in the second dimension of the correlation matrix is larger than a second preset value;

and if the projection of the connected region in the first dimension of the correlation matrix is larger than a first preset value and the projection of the connected region in the second dimension of the correlation matrix is larger than a second preset value, confirming that similar video fragments exist in the video to be searched and the preset video.

3. The video retrieval method of claim 2, wherein the determining a video retrieval result based on the attribute of the connected region further comprises:

and obtaining the starting time and the ending time of the similar video clips in the video to be searched and the preset video respectively through the projection of the first dimension and the projection of the second dimension.

4. The video retrieval method of claim 1, wherein the determining a video retrieval result based on the attribute of the connected region further comprises:

and confirming the frame transformation of the video to be retrieved relative to the preset video based on the shape of the connected region.

5. The video retrieval method according to claim 1, wherein constructing a first video fingerprint matrix of the video to be retrieved and a second video fingerprint matrix of the preset video, respectively, includes:

respectively extracting fingerprint vectors of a plurality of video frames of the video to be searched and the preset video;

and respectively merging fingerprint vectors of the plurality of video frames to construct the first video fingerprint matrix and the second video fingerprint matrix.

6. A video fingerprint-based video retrieval apparatus, comprising:

the construction module is used for respectively constructing a first video fingerprint matrix of the video to be searched and a second video fingerprint matrix of the preset video;

the computing module is used for respectively determining the number of rows of the first video fingerprint matrix and the number of rows of the second video fingerprint matrix, and determining the correlation matrix of the first video fingerprint matrix and the second video fingerprint matrix according to the numbers of rows of the first video fingerprint matrix and the second video fingerprint matrix, wherein the value of each element in the correlation matrix is a correlation coefficient of the first video fingerprint matrix and the second video fingerprint matrix;

the connected region defining module is used for binarizing the correlation matrix through a preset threshold value, and determining a connected region in the binarized correlation matrix based on a contour labeling method;

and the result confirmation module is used for determining a video retrieval result based on the attribute of the connected region.

7. A computer readable storage medium storing computer instructions for causing a computer to perform the video fingerprint-based video retrieval method of any one of claims 1-5.

8. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor to cause the at least one processor to perform the video fingerprint-based video retrieval method of any one of claims 1-5.
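
By way of non-limiting illustration, the steps recited in claims 1 to 3 may be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: each row of a fingerprint matrix is assumed to be the fingerprint vector of one frame, the correlation coefficient is assumed to be the Pearson coefficient between two rows, and a breadth-first flood fill with 8-connectivity is substituted for the contour labeling method. All function names, parameter names and default values (for example `find_similar_segments`, `threshold`, `frame_interval`) are hypothetical.

```python
from collections import deque
from math import sqrt


def correlation_matrix(a, b):
    """Pearson correlation between every row of a and every row of b (assumed measure)."""
    def center_normalize(row):
        mean = sum(row) / len(row)
        centered = [x - mean for x in row]
        norm = sqrt(sum(x * x for x in centered))
        # A constant row has no defined correlation; treat it as uncorrelated.
        return [x / norm for x in centered] if norm > 0 else [0.0] * len(row)

    an = [center_normalize(r) for r in a]
    bn = [center_normalize(r) for r in b]
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in bn] for ra in an]


def binarize(corr, threshold):
    """Set an element to 1 where the correlation reaches the preset threshold, else 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in corr]


def connected_regions(mask):
    """Collect 8-connected regions of 1s by flood fill (stand-in for contour labeling)."""
    rows = len(mask)
    cols = len(mask[0]) if mask else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for i in range(rows):
        for j in range(cols):
            if not mask[i][j] or seen[i][j]:
                continue
            seen[i][j] = True
            queue, region = deque([(i, j)]), []
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and mask[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            regions.append(region)
    return regions


def find_similar_segments(query_fp, preset_fp, threshold=0.9,
                          min_query_frames=3, min_preset_frames=3,
                          frame_interval=1.0):
    """Return time spans of similar segments; frame_interval is seconds per sampled frame."""
    mask = binarize(correlation_matrix(query_fp, preset_fp), threshold)
    segments = []
    for region in connected_regions(mask):
        q = [i for i, _ in region]   # projection on the first dimension (query frames)
        p = [j for _, j in region]   # projection on the second dimension (preset frames)
        q_len = max(q) - min(q) + 1
        p_len = max(p) - min(p) + 1
        # Both projections must exceed their preset values, as in claim 2.
        if q_len > min_query_frames and p_len > min_preset_frames:
            segments.append({
                "query_span": (min(q) * frame_interval, (max(q) + 1) * frame_interval),
                "preset_span": (min(p) * frame_interval, (max(p) + 1) * frame_interval),
            })
    return segments
```

In this sketch, a run of consecutive matching frames appears as a diagonal connected region in the binarized correlation matrix, and the extents of its projections on the two dimensions give the start and end times of the similar segment in each video.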