CN111241344B - Video duplicate checking method, system, server and storage medium - Google Patents

Info

Publication number: CN111241344B
Authority: CN (China)
Prior art keywords: characteristic, video, value, comparison, image
Legal status: Active
Application number: CN202010037940.3A
Other languages: Chinese (zh)
Other versions: CN111241344A (en)
Inventors: 李晓佳, 柴中进
Current Assignee: Xinhua Zhiyun Technology Co ltd
Original Assignee: Xinhua Zhiyun Technology Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Abstract

The invention discloses a video duplicate checking method, system, server and storage medium. Video frames are captured from the video to be checked at specific time points and a characteristic value is extracted from each frame; if the characteristic values of the frames captured from a comparison video at the same time points are the same, the two videos are considered to be the same. Because few characteristic images need to be captured, little storage space is required, characteristic values are extracted quickly, and duplicate checking is fast. In addition, the candidate data is screened by basic information such as video length before comparison, which reduces the data volume. When individual characteristic images are compared, the first characteristic image is converted into a first characteristic value and the characteristic values are compared directly: when the Hamming distance between the first characteristic value and one of the second characteristic values is less than or equal to a first threshold, the first characteristic image is considered identical to the second characteristic image corresponding to that second characteristic value. Only an exclusive OR operation is needed and no complex encoding or decoding is required, so duplicate checking is fast.

Description

Video duplicate checking method, system, server and storage medium
Technical Field
The invention relates to the technical field of video processing, and in particular to a video duplicate checking method, system, server and storage medium.
Background
In today's multimedia information society, users upload massive amounts of video to video platforms every day. Most of these are normal, valuable videos, but there are also some problem videos, which mainly include: videos that duplicate existing videos in the platform's video database, videos that duplicate videos in a copyright database (e.g., videos that require a copyright fee to be paid), and certain videos that are unsuitable or prohibited from being shown.
In addition, when a website ranks videos by view count in order to recommend them to users, a large number of duplicate or highly similar videos lowers the accuracy of both the ranking and the recommendations. It also makes it harder for users to find and watch videos, which degrades the user experience.
Therefore, the massive number of videos uploaded by users needs to be compared and deduplicated quickly. Existing methods for comparing and deduplicating videos suffer from low speed and high consumption of computing and storage resources.
Disclosure of Invention
The invention aims to solve the above problems by providing a video duplicate checking method, system, server and storage medium with high processing speed and low resource consumption.
In order to achieve the above object, an embodiment of the present invention provides a video duplicate checking method, including:
acquiring first characteristic images of the video to be checked at a plurality of specific time points;
converting the first characteristic image into a first characteristic value;
comparing the first characteristic value with the second characteristic values corresponding to all comparison videos, wherein the comparison videos are all videos with the same duration as the video to be checked, and the second characteristic values are the characteristic values corresponding to second characteristic images of the comparison videos at the specific time points; when the Hamming distance between the first characteristic value and the second characteristic value of one of the comparison videos is less than or equal to a first threshold, the first characteristic image and the second characteristic image corresponding to that second characteristic value are considered to be the same;
when the first characteristic images and the second characteristic images acquired from the video to be checked and one of the comparison videos at the plurality of specific time points are all the same, the video to be checked and that comparison video are determined to be the same.
Optionally, the specific steps of converting the first characteristic image into the first characteristic value include:
reducing the first characteristic image to a low-resolution grayscale image;
comparing the color value of each pixel of the grayscale image with a reference gray value, and setting the value of the pixel to 1 (or 0) when its color value is greater than or equal to the reference gray value and to 0 (or 1, respectively) when its color value is less than the reference gray value, so that the grayscale image is converted into the first characteristic value.
Optionally, the reference gray value is the average of the color values of all pixels of the grayscale image.
Optionally, the second characteristic value is obtained as follows:
based on the duration of the video to be checked, acquiring all videos with the same duration as the video to be checked as comparison videos;
acquiring second characteristic images of the comparison videos at the plurality of specific time points;
reducing the second characteristic image to a low-resolution grayscale image;
comparing the color value of each pixel of the grayscale image with a reference gray value, and setting the value of the pixel to 1 (or 0) when its color value is greater than or equal to the reference gray value and to 0 (or 1, respectively) when its color value is less than the reference gray value, so that the grayscale image is converted into the second characteristic value.
Optionally, the reduced grayscale image is 8×8 pixels, giving 64 bits, which equals the storage size of one long integer, so that each first characteristic value and each second characteristic value is stored as one long integer.
Optionally, when the video to be checked is compared with one of the comparison videos and the first and second characteristic images acquired at the plurality of specific time points are partly the same and partly different, the video to be checked is determined to be the same as that comparison video if the proportion of differing images is less than or equal to a second threshold.
Optionally, the specific steps of determining that the first characteristic image and the second characteristic image are the same include:
slicing the first characteristic value into N groups of first sub-characteristic values, wherein N is greater than V1 and V1 is the first threshold;
acquiring the N groups of second sub-characteristic values corresponding to each second characteristic value;
obtaining the comparison videos for which any (N-V1) groups of first sub-characteristic values are identical to the corresponding second sub-characteristic values, to form a first comparison video set;
comparing the first characteristic value of the first characteristic image with the second characteristic values of the second characteristic images of all videos in the first comparison video set, and when the Hamming distance between the first characteristic value and one of the second characteristic values is less than or equal to the first threshold, considering the first characteristic image to be identical to the second characteristic image corresponding to that second characteristic value.
Optionally, the method further comprises: first comparing the characteristic images at two specific time points to obtain all comparison videos for which at least one of the two characteristic images is the same as that of the video to be checked, thereby forming a second comparison video set;
and comparing the remaining characteristic images of the video to be checked with the remaining characteristic images of the videos in the second comparison video set to judge whether the video to be checked is identical to the corresponding comparison video.
Optionally, the method further comprises: acquiring the video to be checked and, based on its duration, acquiring the second characteristic values of the second characteristic images, at the specific time points, of all videos with the same duration as the video to be checked.
The embodiment of the invention also provides a video duplicate checking system, which comprises:
a characteristic image acquisition unit, configured to acquire characteristic images of the video to be checked and the comparison videos at a plurality of specific time points, wherein the comparison videos are all videos with the same duration as the video to be checked, the characteristic image of the video to be checked is a first characteristic image, and the characteristic image of a comparison video is a second characteristic image;
an image characteristic value acquisition unit, configured to convert the first characteristic image into a first characteristic value;
a characteristic value comparison unit, configured to compare the first characteristic value with the second characteristic values corresponding to all the second characteristic images, and to consider the first characteristic image identical to the second characteristic image corresponding to a second characteristic value when the Hamming distance between the first characteristic value and that second characteristic value is less than or equal to a first threshold;
and a duplicate checking judgment unit, configured to determine that the video to be checked is identical to the corresponding comparison video when the first characteristic images and the second characteristic images acquired from the video to be checked and one of the comparison videos at the plurality of specific time points are all the same.
Optionally, the system further comprises: a video acquisition unit, configured to acquire the video to be checked and, based on its duration, acquire the second characteristic values of the second characteristic images, at the specific time points, of all videos with the same duration as the video to be checked.
The embodiment of the invention also provides a server, which comprises a memory and a processor;
wherein the memory is used for storing a program;
and the processor is used for executing the program in the memory, specifically to execute the video duplicate checking method described above.
The embodiment of the invention also provides a computer readable storage medium comprising instructions that, when run on a computer, cause the computer to execute the video duplicate checking method described above.
The beneficial effects of the invention are as follows:
According to the video duplicate checking method, a first characteristic image is captured at each specific time point of the video to be checked and compared with the corresponding second characteristic image; if all of the captured characteristic images match, the two videos are considered to be the same. Because few characteristic images need to be captured, little storage space is required, characteristic extraction is fast, and duplicate checking is fast. In addition, when individual characteristic images are compared, the first characteristic image is converted into a first characteristic value and the characteristic values are compared directly: when the Hamming distance between the first characteristic value and one of the second characteristic values is less than or equal to a first threshold, the first characteristic image is considered identical to the second characteristic image corresponding to that second characteristic value. Only an exclusive OR operation is needed and no complex encoding or decoding is required, so the amount of computation is small and duplicate checking is fast.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a video duplication checking method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of converting a feature image into feature values according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a video duplicate checking system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the invention are described in detail below. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments based on the examples herein that fall within the scope of the invention as defined by the claims are within the scope of protection of the invention.
It should be appreciated that existing video websites store tens or even hundreds of millions of different videos. To avoid unnecessary storage consumption, a newly acquired video is checked for duplication, and if it is a duplicate it does not need to be stored again. Video duplication also arises when large numbers of videos are collected. Because duplicate checking compares the new video to be checked with all videos in the existing video library, the larger the library, the longer the comparison takes. A video duplicate checking method is therefore needed that, while not necessarily 100% accurate, is fast and consumes few resources.
The embodiment of the invention provides a video duplicate checking method, system, server and storage medium. A first characteristic image is captured at each specific time point of the video to be checked and compared with the corresponding second characteristic image; if all of the captured characteristic images match, the two videos are considered to be the same. Because few characteristic images need to be captured, little storage space is required, characteristic extraction is fast, and duplicate checking is fast. In addition, when individual characteristic images are compared, the first characteristic image is converted into a first characteristic value and the characteristic values are compared directly: when the Hamming distance between the first characteristic value and one of the second characteristic values is less than or equal to a first threshold, the first characteristic image is considered identical to the second characteristic image corresponding to that second characteristic value. Only an exclusive OR operation is needed and no complex encoding or decoding is required, so the amount of computation is small and duplicate checking is fast.
The invention provides a video duplicate checking method applied to a video duplicate checking system. The system may be deployed in a server to process new videos to be checked that the server acquires. In other embodiments, the system may also be deployed as a software system in a terminal device, including but not limited to a tablet computer, a notebook computer, a palmtop computer, a mobile phone, and a personal computer (PC); the terminal device acquires and processes a new video to be checked and analyzes whether it is a duplicate video.
The following describes specific embodiments of the video duplication checking method and the video duplication checking system in detail.
Referring to FIG. 1, the video duplicate checking method of an embodiment of the present invention specifically includes:
Step S100: obtaining a video to be checked and comparison videos, wherein the comparison videos are all videos in the existing video library with the same duration as the video to be checked.
The duplicate checking comparison of the invention only compares videos of identical duration. For example, the movie "Me and My Motherland" runs 155 minutes, but some versions have part of the beginning or end cut off so that only 151 minutes remain; the duplicate checking method of this embodiment cannot be used to judge whether those two versions are the same video.
Therefore, in step S100, the video to be checked and its duration are obtained first, and the duration is used to retrieve all videos of that duration from the existing database as the comparison videos for the subsequent check. Because this screening by duration is performed once up front, the amount of data involved in the duplicate checking comparison is greatly reduced.
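The duration screening of step S100 can be sketched as a lookup keyed by duration. This is only an illustrative sketch: the class name `FeatureLibrary`, its methods, and the choice of whole seconds as the duration unit are assumptions, since the text does not specify how the existing video library is organized.

```python
from collections import defaultdict


class FeatureLibrary:
    """Hypothetical in-memory stand-in for the existing video library."""

    def __init__(self):
        # duration (seconds) -> {video_id: list of characteristic values}
        self._by_duration = defaultdict(dict)

    def add(self, video_id, duration_seconds, feature_values):
        """Store the precomputed characteristic values of one video."""
        self._by_duration[duration_seconds][video_id] = list(feature_values)

    def comparison_videos(self, duration_seconds):
        """Return only the videos whose duration matches exactly."""
        return dict(self._by_duration.get(duration_seconds, {}))


library = FeatureLibrary()
library.add("a", 155 * 60, [1, 2, 3])
library.add("b", 151 * 60, [1, 2, 3])
library.add("c", 155 * 60, [7, 8, 9])

# Only videos "a" and "c" are candidates for a 155-minute video to be checked.
print(sorted(library.comparison_videos(155 * 60)))
```

Keying on the exact duration means a single lookup discards every video of a different length before any characteristic value is compared.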
Step S200: obtaining characteristic images of the video to be checked and the comparison videos at a plurality of specific time points, wherein the characteristic image of the video to be checked is a first characteristic image and the characteristic image of a comparison video is a second characteristic image.
Generally, 10 specific time points are selected uniformly over the total duration of the video to be checked, and the frames captured at those points are the first characteristic images. For example, when the total duration of the video to be checked is 110 minutes, characteristic images are captured at 10 specific time points: 10 minutes 0 seconds, 20 minutes 0 seconds, 30 minutes 0 seconds, ..., 100 minutes 0 seconds. In other embodiments, the characteristic images may be captured at 5, 20 or another number of specific time points; increasing the number of time points increases the comparison accuracy.
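As a minimal sketch of the uniform selection, the function below spaces the capture points at intervals of total duration divided by (count + 1). That spacing rule is an assumption inferred from the 110-minute example, not a formula stated in the text, and the name `select_time_points` is hypothetical.

```python
def select_time_points(total_seconds, count=10):
    """Return `count` evenly spaced capture times in seconds.

    Assumption: the interval is total / (count + 1), so neither the very
    start nor the very end of the video is sampled.
    """
    if total_seconds <= 0 or count <= 0:
        raise ValueError("total_seconds and count must be positive")
    return [total_seconds * (i + 1) // (count + 1) for i in range(count)]


# A 110-minute video sampled at 10 points.
points = select_time_points(110 * 60)
print([p // 60 for p in points])
```

Because the points depend only on the duration and the count, two videos of identical duration are always sampled at identical timestamps.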
In other embodiments, the beginning and end of the content may be skipped before characteristic images are captured, since the beginning and end tend to be blank or similar across videos.
In other embodiments, the characteristic images of the video to be checked may be captured at fixed time points instead of uniformly selected ones.
Since this embodiment captures images directly by timestamp, the whole video does not need to go through encoding and decoding; only 10 images do, so the process is very fast.
Because the comparison videos of this embodiment were themselves stored in the existing video library only after passing the same duplicate check, the second characteristic images at the specific time points have already been captured and converted into the corresponding second characteristic values for every video of the same duration. Only the first and second characteristic values need to be compared afterwards, so no image processing is repeated, which greatly speeds up duplicate checking and reduces data storage space. Therefore, in one embodiment, after the first characteristic values of the video to be checked are obtained, the second characteristic values of the corresponding comparison videos are retrieved directly using the duration of the video to be checked. This avoids capturing and decoding characteristic images for a large number of comparison videos and greatly reduces the checking time.
In other embodiments, if the characteristic images at the specific time points have not been captured in advance for a comparison video, they need to be captured from the comparison video as the second characteristic images. The capture method is the same as for the video to be checked, and the specific time points are the same in both timing and number.
Step S300: converting the first characteristic image into a first characteristic value.
In this embodiment, the specific steps of converting one of the first characteristic images into a first characteristic value include: reducing the first characteristic image to a low-resolution grayscale image; comparing the color value of each pixel of the grayscale image with a reference gray value, and setting the value of the pixel to 1 (or 0) when its color value is greater than or equal to the reference gray value and to 0 (or 1, respectively) when its color value is less than the reference gray value, so that the grayscale image is converted into the first characteristic value.
Referring specifically to FIG. 2, a half-life image is taken as an example of a characteristic image.
First, the middle 80% of the half-life image is taken, to remove interfering content such as black borders and advertisements. In other embodiments, the conversion may be performed directly on the entire content of the image.
After the 80% region is obtained, the image is reduced to 8×8 pixels and the color image is converted to a 64-level grayscale image. In other embodiments, the image may be reduced to other pixel dimensions.
The average of the color values of the 64 pixels is calculated and each pixel is compared with it: if the pixel's color value is greater than or equal to the average, the value for that pixel is set to 1; otherwise it is set to 0. The image is thus converted into a 64-bit value, i.e. the characteristic value is 64 bits (8 bytes), which is exactly the size of a long in Java. Each characteristic value can therefore be stored as a long integer, and the characteristic values of a video form an integer array, which is convenient to store and query and helps improve duplicate checking efficiency.
In this embodiment, the reference gray value is the average of the color values of the 64 pixels. In other embodiments, the reference gray value may also be a fixed color value.
In this embodiment, the characteristic value corresponding to one characteristic image is a long integer, which is convenient to store. In other embodiments, the characteristic value corresponding to a characteristic image may also be a set of binary values.
In this embodiment, if the pixel color value is greater than or equal to the average, the value corresponding to the pixel is set to 1; otherwise it is set to 0. In other embodiments, if the pixel color value is less than or equal to the average, the value corresponding to the pixel may be set to 1, and otherwise to 0.
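The thresholding step can be sketched as follows, assuming the image has already been cropped, reduced to 8×8 and converted to 64 gray values in row-major order. The function names and the bit order (first pixel in the most significant bit) are illustrative assumptions; the text does not fix a bit order. Python integers are unbounded, so a helper shows how the same 64 bits would read as a signed Java-style long.

```python
def average_hash(gray_pixels):
    """Convert 64 gray values (an 8x8 image, row-major) into a 64-bit integer."""
    if len(gray_pixels) != 64:
        raise ValueError("expected 64 gray values (8 x 8)")
    mean = sum(gray_pixels) / 64  # reference gray value = average of all pixels
    value = 0
    for p in gray_pixels:
        # Bit is 1 when the pixel is >= the average, otherwise 0.
        # Assumption: the first pixel ends up in the most significant bit.
        value = (value << 1) | (1 if p >= mean else 0)
    return value


def to_signed_64(value):
    """Reinterpret an unsigned 64-bit value as a signed 64-bit integer."""
    return value - (1 << 64) if value >= (1 << 63) else value


# Top half dark, bottom half bright: the low 32 bits are set.
half = average_hash([0] * 32 + [63] * 32)
print(hex(half))
```

A uniform image sets every bit, because every pixel equals the average; stored as a signed 64-bit value, that pattern reads as -1.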
Because the comparison videos in this embodiment were also stored in the existing video library only after passing the same duplicate check, the second characteristic images at the specific time points have already been captured for all comparison videos and converted into the corresponding second characteristic values. Only the first and second characteristic values need to be compared, so no image processing is repeated, which greatly speeds up duplicate checking and reduces data storage space. In this technical solution, the most time-consuming step is obtaining the characteristic values of a new video, which usually takes a few seconds; if the video is high-definition and long, generating the characteristic values is slower and can take about 1 minute. However, since every comparison video in the existing video library already has its characteristic values, nothing needs to be recalculated: only the new first characteristic values are calculated (about 1-5 seconds) and then queried and compared, and the query and comparison usually take only a few hundred milliseconds, which greatly reduces the duplicate checking time.
Because every comparison video is stored together with its integer array, only the Hamming distance between the converted first characteristic value and the prestored second characteristic values needs to be calculated; the massive number of comparison videos does not need to be decoded, which greatly reduces the checking time.
In other embodiments, if the second characteristic image at a specific time point has not been captured and converted into a second characteristic value in advance for a comparison video, the second characteristic image is converted into a second characteristic value by a method similar to step S300, which specifically includes:
reducing the second characteristic image to a low-resolution grayscale image; comparing the color value of each pixel of the grayscale image with a reference gray value, and setting the value of the pixel to 1 (or 0) when its color value is greater than or equal to the reference gray value and to 0 (or 1, respectively) when its color value is less than the reference gray value, so that the grayscale image is converted into the second characteristic value.
Step S400: comparing the first characteristic value with the second characteristic values corresponding to all the second characteristic images, and when the Hamming distance between the first characteristic value and one of the second characteristic values is less than or equal to a first threshold, considering the first characteristic image to be identical to the second characteristic image corresponding to that second characteristic value.
Converting characteristic images into characteristic values makes the comparison fast. However, when the number of comparison videos is still large, for example hundreds of thousands or millions, calculating the Hamming distance between the first characteristic value and hundreds of thousands or millions of second characteristic values takes a certain amount of time and puts considerable pressure on I/O. Therefore, in this embodiment, the first and second characteristic values are sliced into segments and compared step by step, which further reduces the calculation time.
Specifically, the first characteristic value is sliced into N groups of first sub-characteristic values, wherein N is greater than V1 and V1 is the first threshold.
The N groups of second sub-characteristic values corresponding to each second characteristic value are obtained.
The N groups of second sub-characteristic values may be obtained by slicing the second characteristic values in advance and storing the slices separately in a database. In other embodiments, the slicing may be performed after the second characteristic value is obtained.
The comparison videos for which any (N-V1) groups of first sub-characteristic values are identical to the corresponding second sub-characteristic values are obtained to form a first comparison video set;
the first characteristic value of the first characteristic image is then compared with the second characteristic values of the second characteristic images of all videos in the first comparison video set, and when the Hamming distance between the first characteristic value and one of the second characteristic values is less than or equal to the first threshold, the first characteristic image is considered identical to the second characteristic image corresponding to that second characteristic value. The first and second characteristic values are compared bit by bit, and each position at which the values differ adds 1 to the distance. For example, if the first 16 bits of the first characteristic value are 1000000000000000 and the first 16 bits of the second characteristic value are 0000000000000001, the 1st and 16th bits differ, so the Hamming distance is 2.
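The bitwise comparison amounts to an exclusive OR followed by counting the set bits. The sketch below is one way to write it; the function names are hypothetical, and the 64-bit mask is an assumption added so that values stored as signed 64-bit integers are handled correctly.

```python
MASK64 = (1 << 64) - 1


def hamming_distance(a, b):
    """Count the bit positions at which two 64-bit values differ."""
    # XOR leaves a 1 wherever the bits differ; the mask keeps the result
    # within 64 bits even when the inputs are negative (signed storage).
    return bin((a ^ b) & MASK64).count("1")


def same_image(first_value, second_value, first_threshold=2):
    """Treat two images as the same when the distance is within the threshold."""
    return hamming_distance(first_value, second_value) <= first_threshold


# Two 16-bit prefixes that differ in the 1st and 16th bits.
print(hamming_distance(0b1000000000000000, 0b0000000000000001))
```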
In this embodiment, the first threshold is 2 and N is 4; that is, when the Hamming distance between the first feature value and one of the second feature values is less than or equal to 2, the first feature image and the second feature image corresponding to that second feature value are considered to be the same.
Since the first feature value is a 64-bit value, it is divided into 4 groups of first sub-feature values, each of which is a 16-bit value. Because the Hamming distance is less than or equal to 2, at most two bits of the 64-bit value differ, and those two bits fall in at most two groups of first sub-feature values, so at least 2 of the 4 groups (i.e. N-V1) must be identical. If the 4 groups of first sub-feature values are named A, B, C and D, the combinations AB, AC, AD, BC, BD and CD of first sub-feature values are each compared with the corresponding combinations AB, AC, AD, BC, BD and CD of second sub-feature values of the comparison videos, and the comparison videos for which any (N-V1) groups of first sub-feature values are identical to the corresponding second sub-feature values are obtained to form the first comparison video set. Since an exact-match comparison requires far less computation than obtaining the Hamming distance through an exclusive-OR calculation, building the first comparison video set requires few computing resources and is fast. Moreover, after the sub-feature comparison, the first comparison video set contains far fewer videos than the original set of comparison videos, so comparing the first feature value only with the second feature values of the second feature images in the first comparison video set greatly reduces the number of comparisons and therefore the overall duplicate-checking time.
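A minimal sketch of this two-stage comparison follows, assuming N = 4, 16-bit groups and a first threshold of 2 as in this embodiment. The in-memory dictionary standing in for the database, and the names `split_feature`, `pair_keys`, `build_index`, `candidates` and `find_matches`, are illustrative assumptions and not part of the patent.

```python
from itertools import combinations

N_GROUPS = 4      # N in the text
GROUP_BITS = 16   # a 64-bit value split into 4 groups of 16 bits
THRESHOLD = 2     # V1, the first threshold


def split_feature(value: int) -> tuple:
    """Split a 64-bit feature value into the sub-feature values (A, B, C, D)."""
    mask = (1 << GROUP_BITS) - 1
    return tuple(
        (value >> (GROUP_BITS * (N_GROUPS - 1 - i))) & mask
        for i in range(N_GROUPS)
    )


def pair_keys(value: int) -> list:
    """Return one lookup key per combination AB, AC, AD, BC, BD, CD."""
    parts = split_feature(value)
    return [
        (idx, tuple(parts[k] for k in idx))
        for idx in combinations(range(N_GROUPS), N_GROUPS - THRESHOLD)
    ]


def build_index(library: dict) -> dict:
    """Map each key to the ids of the comparison videos that contain it."""
    index = {}
    for video_id, value in library.items():
        for key in pair_keys(value):
            index.setdefault(key, set()).add(video_id)
    return index


def candidates(query: int, index: dict) -> set:
    """Stage 1: exact-match lookups give the first comparison video set."""
    found = set()
    for key in pair_keys(query):
        found |= index.get(key, set())
    return found


def find_matches(query: int, library: dict, index: dict) -> set:
    """Stage 2: full Hamming comparison, restricted to the candidates."""
    return {
        video_id
        for video_id in candidates(query, index)
        if bin(query ^ library[video_id]).count("1") <= THRESHOLD
    }
```

The exact-match stage cannot miss a true match: if at most two bits differ, at most two of the four groups are affected, so at least one of the six combinations is identical. It can admit false candidates (for example three differing bits inside one group), which is why the Hamming check is still applied afterwards.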
Step S500: when the first feature images and the second feature images acquired from the video to be checked and from one of the comparison videos at the plurality of specific time points are the same, the video to be checked is determined to be the same as that comparison video.
Because the feature image is reduced and the feature value is then formed by judging the gray level of each pixel of the reduced image, a change in the gray level of some pixels may change the feature value; a strict comparison of feature values is therefore prone to misjudgment.
Therefore, when the video to be checked is compared with one of the comparison videos and the first and second feature images acquired at the 10 specific time points are partly the same and partly different, with the Hamming distances of the differing images greater than 2 but less than or equal to 5, the two may still be the same video. Accordingly, when the proportion of differing images is less than or equal to 2, i.e. the second threshold is 2, the video to be checked is still considered the same as the corresponding comparison video.
In other embodiments, the second threshold and the Hamming distance limits may be adjusted as appropriate.
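The tolerance rule can be sketched as follows. The text does not state precisely how the "proportion" is measured, so this sketch assumes the second threshold is a count of differing time points out of the 10 sampled, and it assumes that any image pair with a Hamming distance above 5 rules the match out. The constant and function names are hypothetical.

```python
FIRST_THRESHOLD = 2    # at or below this distance two images count as the same
LOOSE_DISTANCE = 5     # upper bound from the text for a tolerated differing image
SECOND_THRESHOLD = 2   # assumed here to be a count of differing time points


def is_same_video(first_values, second_values) -> bool:
    """Decide whether two videos match, tolerating a few near-miss images."""
    if len(first_values) != len(second_values):
        return False
    differing = 0
    for a, b in zip(first_values, second_values):
        distance = bin(a ^ b).count("1")
        if distance <= FIRST_THRESHOLD:
            continue              # images considered the same
        if distance > LOOSE_DISTANCE:
            return False          # too far apart to tolerate (assumption)
        differing += 1
    return differing <= SECOND_THRESHOLD
```

Both lists hold the feature values of one video at the same specific time points, in the same order.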
In addition, when the feature images at the 10 specific time points are judged, the feature images at two of the specific time points are first checked and compared to obtain all comparison videos that match the video to be checked in at least one of the two feature images, forming a second comparison video set; the remaining feature images of the video to be checked are then compared with the remaining feature images of the videos in the second comparison video set to judge whether the video to be checked is the same as the corresponding comparison video. Because the second comparison video set contains far fewer videos than the original set of comparison videos, the number of comparisons is greatly reduced, and the overall duplicate-checking time is reduced accordingly.
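The pre-screening on two time points might look like the sketch below. It assumes the first two time points are used as probes and that up to two mismatching time points are tolerated overall; both choices, and the names `images_match` and `screen_then_verify`, are illustrative assumptions. In a real deployment the first stage would typically be a database query rather than a loop over an in-memory dictionary.

```python
def images_match(a: int, b: int, threshold: int = 2) -> bool:
    """Two feature values match when their Hamming distance is small enough."""
    return bin(a ^ b).count("1") <= threshold


def screen_then_verify(query, library, probe_points=(0, 1), max_mismatches=2):
    """query: list of feature values; library: video id -> list of feature values."""
    rest = [i for i in range(len(query)) if i not in probe_points]
    matches = []
    for video_id, values in library.items():
        # Stage 1: count how many probe time points fail to match.
        probe_misses = sum(
            not images_match(query[p], values[p]) for p in probe_points
        )
        if probe_misses == len(probe_points):
            continue  # matched neither probe: not in the second comparison video set
        # Stage 2: compare the remaining time points for surviving videos only.
        rest_misses = sum(not images_match(query[i], values[i]) for i in rest)
        if probe_misses + rest_misses <= max_mismatches:
            matches.append(video_id)
    return matches
```

Videos that match neither probe image are discarded after two comparisons instead of ten, which is where the saving comes from.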
The invention also provides a video duplicate checking system which, referring to fig. 3, comprises:
a video acquisition unit 100, configured to acquire the video to be checked;
a feature image acquisition unit 200, configured to acquire first feature images of the video to be checked at a plurality of specific time points;
an image feature value obtaining unit 300, configured to convert the first feature image into a first feature value;
a feature value comparing unit 400, configured to compare the first feature value with the second feature values corresponding to all the comparison videos, where the comparison videos are all videos with the same duration as the video to be checked, and the second feature value is the feature value corresponding to a second feature image of a comparison video at a specific time point; when the Hamming distance between the first feature value and the second feature value of one of the comparison videos is less than or equal to a first threshold, the first feature image is considered identical to the second feature image corresponding to that second feature value;
and a duplicate checking judging unit 500, configured to determine that the video to be checked is the same as the corresponding comparison video when the first feature images and the second feature images acquired from the video to be checked and from one of the comparison videos at the plurality of specific time points are the same.
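As an illustration of the conversion performed by the image feature value obtaining unit 300, the sketch below follows the scheme set out in claims 2, 3 and 5: an 8 x 8 gray map is thresholded against its mean gray value to give a 64-bit value. It assumes the image has already been decoded and reduced to an 8 x 8 grid of gray levels (the decoding and scaling library is left out), and it picks the "1 when greater than or equal to the reference" convention; the function name `feature_value` is hypothetical.

```python
def feature_value(gray_8x8) -> int:
    """Convert an 8x8 grid of gray levels (0-255) into a 64-bit feature value."""
    pixels = [p for row in gray_8x8 for p in row]
    if len(pixels) != 64:
        raise ValueError("expected an 8x8 grid")
    reference = sum(pixels) / len(pixels)  # reference gray value: the mean
    value = 0
    for p in pixels:
        # Shift in one bit per pixel: 1 if at or above the reference, else 0.
        value = (value << 1) | (1 if p >= reference else 0)
    return value
```

For a grid whose top four rows are bright and bottom four rows are dark, the upper 32 bits of the result are 1 and the lower 32 bits are 0.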
In addition, the invention also provides a server comprising a memory and a processor;
wherein the memory is used for storing programs;
the processor is used for executing the program in the memory, and specifically executing the video duplicate checking method.
The present invention also provides a computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the steps of: acquiring a video to be checked and comparison videos, wherein the comparison videos are all videos in the existing video library with the same duration as the video to be checked; acquiring feature images of the video to be checked and of the comparison videos at a plurality of specific time points, wherein the feature image of the video to be checked is a first feature image and the feature image of a comparison video is a second feature image; converting the first feature image into a first feature value; comparing the first feature value with the second feature values corresponding to all the second feature images, and when the Hamming distance between the first feature value and one of the second feature values is less than or equal to a first threshold, considering the first feature image identical to the second feature image corresponding to that second feature value; and when the first feature images and the second feature images acquired from the video to be checked and from one of the comparison videos at the plurality of specific time points are the same, determining that the video to be checked is the same as the corresponding comparison video.
In the several embodiments provided in the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices or units, and may be electrical, mechanical or of another form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The foregoing describes merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any variation or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the invention. Therefore, the protection scope of the invention shall be subject to the protection scope of the claims.

Claims (11)

1. A video duplication checking method, comprising:
acquiring first characteristic images of the video to be checked at a plurality of specific time points, wherein, according to the total duration of the video to be checked, characteristic images of the video to be checked at a plurality of uniformly selected specific time points are taken as the first characteristic images;
converting the first characteristic image into a first characteristic value;
comparing the first characteristic value with second characteristic values corresponding to all comparison videos, wherein the comparison videos are all videos with the same duration as the video to be checked, the second characteristic values are characteristic values corresponding to second characteristic images of the comparison videos at the specific time points, and when the Hamming distance between the first characteristic value and the second characteristic value of one of the comparison videos is smaller than or equal to a first threshold value, the first characteristic image and the second characteristic image corresponding to the second characteristic value are considered to be the same;
when the first characteristic images and the second characteristic images acquired from the video to be checked and from one of the comparison videos at the plurality of specific time points are the same, determining that the video to be checked is the same as the corresponding comparison video; wherein the times and the number of the specific time points are the same for the video to be checked and the comparison video;
the method further comprising: first, checking and comparing the characteristic images at two specific time points to obtain all comparison videos that are the same as the video to be checked in at least one of the two characteristic images, so as to form a second comparison video set;
comparing the remaining characteristic images of the video to be checked with the remaining characteristic images in the second comparison video set, and judging whether the video to be checked is identical to the corresponding comparison video;
the specific step of judging that the first characteristic image and the second characteristic image are the same comprises the following steps:
performing feature slicing on the first feature values, and dividing the first feature values into N groups of first sub-feature values, wherein N is larger than V1, and V1 is a first threshold value;
acquiring N groups of second sub-feature values corresponding to the second feature values;
obtaining the comparison videos for which any (N-V1) groups of first sub-characteristic values are identical to the corresponding second sub-characteristic values, to form a first comparison video set;
and comparing the first characteristic value corresponding to the first characteristic image with the second characteristic values corresponding to the second characteristic images of all videos in the first comparison video set, and when the Hamming distance between the first characteristic value and one of the second characteristic values is smaller than or equal to the first threshold value, considering that the first characteristic image is identical to the second characteristic image corresponding to the second characteristic value.
2. The video duplication checking method of claim 1, wherein: the specific step of converting the first characteristic image into a first characteristic value comprises the following steps:
reducing the first characteristic image to a gray scale map with a low pixel count;
and comparing the color value of each pixel of the gray scale map with a reference gray value, setting the value of the corresponding pixel to 1 (or 0) when the color value of the pixel is greater than or equal to the reference gray value, and otherwise setting it to 0 (or 1) when the color value of the pixel is smaller than the reference gray value, so that the gray scale map is converted into the first characteristic value.
3. The video duplication checking method of claim 2 wherein the reference gray value is an average of color values of each pixel of the gray map.
4. The video duplication checking method of claim 2 wherein the obtaining manner of the second feature value includes:
based on the duration of the video to be checked, acquiring all videos with the same duration as the video to be checked as comparison videos;
acquiring second characteristic images of the comparison video at a plurality of specific time points;
scaling down the second characteristic image to a gray scale map with a low pixel count;
and comparing the color value of each pixel of the gray scale map with a reference gray value, setting the value of the corresponding pixel to 1 (or 0) when the color value of the pixel is greater than or equal to the reference gray value, and otherwise setting it to 0 (or 1) when the color value of the pixel is smaller than the reference gray value, so that the gray scale map is converted into a second characteristic value.
5. The video duplication checking method according to claim 2 or 4, wherein the scaled-down gray scale map has 8 x 8 pixels, giving 64 bits in total, which equals the storage size of one long integer, so that each of the first characteristic value and the second characteristic value is stored as one long integer.
6. The video duplication checking method of claim 1, wherein: when the video to be checked is compared with one of the comparison videos, and the first characteristic images and the second characteristic images acquired at the plurality of specific time points are partly the same and partly different, with the proportion of differing images smaller than or equal to a second threshold value, the video to be checked is considered to be the same as the corresponding comparison video.
7. The video duplication checking method of claim 1 further comprising: and acquiring the video to be checked, and acquiring second characteristic values of second characteristic images of all videos with the same time length as the video to be checked at a specific time point based on the time length of the video to be checked.
8. A video duplication checking system, comprising:
a characteristic image acquisition unit for acquiring characteristic images of the video to be checked and of the comparison videos at a plurality of specific time points,
wherein the comparison videos are all videos with the same duration as the video to be checked, and the characteristic images of the video to be checked are first characteristic images, wherein, according to the total duration of the video to be checked, characteristic images of the video to be checked are captured at a plurality of uniformly selected specific time points and taken as the first characteristic images; the characteristic images of the comparison videos are second characteristic images;
the image characteristic value acquisition unit is used for converting the first characteristic image into a first characteristic value; the time and the number of specific time points of the video to be checked and the comparison video are the same;
the characteristic value comparison unit compares the first characteristic value with second characteristic values corresponding to all the second characteristic images, and when the Hamming distance between the first characteristic value and one of the second characteristic values is smaller than or equal to a first threshold value, the first characteristic image is considered to be identical to the second characteristic image corresponding to the second characteristic value;
the duplicate checking judging unit is used for judging that the video to be checked is identical with the corresponding comparison video when the first characteristic image and the second characteristic image which are acquired by the video to be checked and one of the comparison videos at a plurality of specific time points are identical;
the system further comprising: first, checking and comparing the characteristic images at two specific time points to obtain all comparison videos that are the same as the video to be checked in at least one of the two characteristic images, so as to form a second comparison video set;
comparing the remaining characteristic images of the video to be checked with the remaining characteristic images in the second comparison video set, and judging whether the video to be checked is identical to the corresponding comparison video;
the specific step of judging that the first characteristic image and the second characteristic image are the same comprises the following steps:
performing feature slicing on the first feature values, and dividing the first feature values into N groups of first sub-feature values, wherein N is larger than V1, and V1 is a first threshold value;
acquiring N groups of second sub-feature values corresponding to the second feature values;
obtaining the comparison videos for which any (N-V1) groups of first sub-characteristic values are identical to the corresponding second sub-characteristic values, to form a first comparison video set;
and comparing the first characteristic value corresponding to the first characteristic image with the second characteristic values corresponding to the second characteristic images of all videos in the first comparison video set, and when the Hamming distance between the first characteristic value and one of the second characteristic values is smaller than or equal to the first threshold value, considering that the first characteristic image is identical to the second characteristic image corresponding to the second characteristic value.
9. The video duplication checking system of claim 8 further comprising: a video acquisition unit for acquiring the video to be checked and, based on the duration of the video to be checked, acquiring second characteristic values of second characteristic images, at the specific time points, of all videos with the same duration as the video to be checked.
10. A server, comprising: a memory, a processor;
wherein the memory is used for storing programs;
the processor is configured to execute a program in the memory, and in particular to execute the video duplication checking method according to any one of claims 1-7.
11. A computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the video duplication checking method of any one of claims 1-7.
CN202010037940.3A 2020-01-14 2020-01-14 Video duplicate checking method, system, server and storage medium Active CN111241344B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010037940.3A CN111241344B (en) 2020-01-14 2020-01-14 Video duplicate checking method, system, server and storage medium

Publications (2)

Publication Number Publication Date
CN111241344A CN111241344A (en) 2020-06-05
CN111241344B true CN111241344B (en) 2023-09-05

Family

ID=70871033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010037940.3A Active CN111241344B (en) 2020-01-14 2020-01-14 Video duplicate checking method, system, server and storage medium

Country Status (1)

Country Link
CN (1) CN111241344B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111696105B (en) * 2020-06-24 2023-05-23 北京金山云网络技术有限公司 Video processing method and device and electronic equipment
CN113905282B (en) * 2021-10-22 2024-02-20 贝壳找房(北京)科技有限公司 Video file uploading processing method and device, electronic equipment and storage medium
CN114913350B (en) * 2022-04-19 2023-04-07 深圳市东信时代信息技术有限公司 Material duplicate checking method, device, equipment and storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101567041A (en) * 2009-05-25 2009-10-28 公安部交通管理科学研究所 Method for recognizing characters of number plate images of motor vehicles based on trimetric projection
JP2012133484A (en) * 2010-12-20 2012-07-12 Nippon Telegr & Teleph Corp <Ntt> Image retrieval device, and image retrieval program
CN103186652A (en) * 2011-12-28 2013-07-03 英业达股份有限公司 Distributed data de-duplication system and method thereof
CN105468755A (en) * 2015-11-27 2016-04-06 东方网力科技股份有限公司 Video screening and storing method and device
CN106528743A (en) * 2016-11-01 2017-03-22 山东浪潮云服务信息科技有限公司 High-efficiency similar picture identification method based on picture mining technology
CN107665261A (en) * 2017-10-25 2018-02-06 北京奇虎科技有限公司 Video duplicate checking method and device
CN107705805A (en) * 2017-10-25 2018-02-16 北京奇虎科技有限公司 Audio duplicate checking method and device
CN107944371A (en) * 2017-11-17 2018-04-20 辽宁公安司法管理干部学院 A kind of public security video monitoring image processing method based on data mining
CN109189991A (en) * 2018-08-17 2019-01-11 百度在线网络技术(北京)有限公司 Repeat video frequency identifying method, device, terminal and computer readable storage medium
CN109274966A (en) * 2018-09-21 2019-01-25 华中科技大学 A kind of monitor video content De-weight method and system based on motion vector
CN109862391A (en) * 2019-03-18 2019-06-07 网易(杭州)网络有限公司 Video classification methods, medium, device and calculating equipment
CN110110147A (en) * 2017-12-27 2019-08-09 中兴通讯股份有限公司 A kind of method and device of video frequency searching
CN110162665A (en) * 2018-12-28 2019-08-23 腾讯科技(深圳)有限公司 Video searching method, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019058073A (en) * 2017-09-25 2019-04-18 オリンパス株式会社 Image processing apparatus, cell recognition apparatus, cell recognition method, and cell recognition program

Also Published As

Publication number Publication date
CN111241344A (en) 2020-06-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant