CN113722543A - Video similarity comparison method, system and equipment

Info

Publication number
CN113722543A
Authority
CN
China
Prior art keywords
video
compared
audio
similar
similarity
Prior art date
Legal status
Pending
Application number
CN202111072794.9A
Other languages
Chinese (zh)
Inventor
白书占
Current Assignee
Turing Chuangzhi Beijing Technology Co ltd
Original Assignee
Turing Chuangzhi Beijing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Turing Chuangzhi Beijing Technology Co ltd
Priority to CN202111072794.9A
Publication of CN113722543A
Status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval using metadata automatically derived from the content
    • G06F16/7834 Retrieval using metadata automatically derived from the content, using audio features
    • G06F16/7837 Retrieval using metadata automatically derived from the content, using objects detected or recognised in the video content
    • G06F16/7844 Retrieval using metadata automatically derived from the content, using original textual content or text extracted from visual content or transcript of audio data
    • G06F16/7867 Retrieval using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention discloses a video similarity comparison method, system and device. Image files and audio files are obtained for both a video to be compared and a compared video, and the two are compared separately. For the image comparison, the key frames of the two videos are compared for similarity to obtain similar key frame groups of the video to be compared, and time stream information is synchronized to locate the similar image segments. For the audio comparison, the audio files of the video to be compared and the compared video are segmented and their features extracted, and the cosine similarity between audio segments of the two videos is calculated to determine the similar audio segments. The beneficial effects of the invention are: comparing both the images and the audio of the videos makes the comparison more comprehensive and accurate, and locating the positions of similar segments from the synchronized time stream information of similar key frames makes the comparison result more intuitive.

Description

Video similarity comparison method, system and equipment
Technical Field
The invention relates to the technical field of computer video comparison, in particular to a method, a system and equipment for comparing video similarity.
Background
With the rapid development of the video industry, video copyright infringement has become widespread. The main forms of infringement currently include content re-uploading (e.g., stolen excerpts, masking watermarks, picture-in-picture), secondary creation (e.g., unauthorized derivative works), citation of video material (e.g., re-editing, splitting long videos, splicing short clips), and video recomposition such as pairing the same pictures with different dubbing or the same dubbing with different pictures. As infringement becomes increasingly concealed, extracting evidence of infringement is particularly important for judging whether infringement is established.
In the prior art, suspected videos are compared mainly by image comparison to determine whether infringement exists. As infringement forms become more concealed and diversified, image comparison alone can neither accurately determine whether infringement exists nor locate the infringing position and lock in evidence.
Disclosure of Invention
In order to solve the above problems, an object of the present invention is to provide a video comparison method, system and device that compare the images and audio of videos simultaneously, making the video comparison more comprehensive and accurate.
The invention provides a video similarity comparison method, which comprises the following steps:
processing the video set to be compared and the compared video set, and respectively acquiring an image file and an audio file of the video to be compared and an image file and an audio file of the compared video;
comparing the image file and the audio file of the video to be compared with those of the compared video, wherein comparing the image files of the video to be compared and the compared video comprises the following steps:
extracting N key frames from the image file of the video to be compared, and extracting M key frames from the image file of the compared video;
sequentially comparing the N key frames of the video to be compared with each key frame of the compared video for similarity, obtaining similar key frame groups of the video to be compared from the key frame similarity comparison, determining similar image groups of the video to be compared from the similar key frame groups, and synchronizing time stream information to obtain similar image segments;
comparing the audio files of the video to be compared and the compared video comprises:
segmenting and extracting features from the audio file of the video to be compared and the audio file of the compared video respectively, calculating the cosine similarity between the audio segments of the video to be compared and the compared video, and determining the similar audio segments according to the calculated cosine similarity.
As a further improvement of the present invention, the performing similarity comparison between N key frames of a video to be compared and each key frame of a compared video sequentially includes:
calculating the hash value of each key frame of the video to be compared and the compared video according to a difference hash algorithm;
and calculating the Hamming distance between the hash value of the video to be compared and the hash value of the compared video, and judging whether similar image segments exist between the video to be compared and the compared video according to the calculated Hamming distance.
As a further improvement of the present invention, determining a similar image group of the video to be compared according to the similar key frame group and synchronizing time stream information to obtain similar image segments includes determining a starting point and an ending point of the similar image segments, wherein determining the starting point of the similar image segments comprises:
calculating forward and synchronizing time stream information, taking the current similar key frame of the video to be compared and the compared video as the end frame and the previous similar key frame as the start frame;
performing bisection on the video to be compared and the compared video simultaneously to obtain key frames, and calculating the similarity of the obtained key frames:
if the obtained key frames are similar, continuing the bisection to obtain key frames and calculating their similarity; if the obtained key frames are not similar, the currently obtained key frame is the starting point of the similar image segment;
determining the end point of the similar image segments comprises:
calculating backward and synchronizing time stream information, taking the current similar key frame of the video to be compared and the compared video as the start frame and the next similar key frame as the end frame;
performing bisection on the video to be compared and the compared video simultaneously to obtain key frames, and calculating the similarity of the obtained key frames:
if the obtained key frames are similar, continuing the bisection to obtain key frames and calculating their similarity; and if the obtained key frames are not similar, the currently obtained key frame is the end point of the similar image segment.
As a further improvement of the present invention, comparing the audio file of the video to be compared with the audio file of the compared video further comprises:
before segmenting the audio file of the video to be compared and the audio file of the compared video, calculating the cosine similarity of the two audio files; if the cosine similarity is greater than a preset first threshold, determining that the audio file of the video to be compared and the audio file of the compared video are similar audio files.
As a further improvement of the present invention, after similar image segments of the video to be compared and the compared video are obtained, the audio segments corresponding to the similar image segments are intercepted for similarity comparison, which includes:
calculating the cosine similarity of the audio segments of the video to be compared and the compared video, wherein if the cosine similarity is greater than or equal to a preset first threshold, the audio segments corresponding to the similar image segments are similar audio segments;
if the cosine similarity is smaller than the preset first threshold, segmenting the audio segments corresponding to the similar image segments and comparing the segmented audio segments for similarity, including determining a starting point and an ending point of the similar audio segments, wherein determining the starting point of the similar audio segments includes:
if the similarity of the segmented audio segments of the video to be compared and the compared video is greater than a set second threshold, taking the start time of the video segment as the end time of the audio segment similarity comparison and the start time of the comparison as that end time minus a time interval T1; segmenting the audio of the video to be compared and the compared video sequentially at interval T1 and comparing the segmented audio segments in turn; if the cosine similarity is greater than or equal to the set first threshold, comparing the next segmented audio segment, until the cosine similarity is less than the set first threshold; the start time of the previous similar audio segment before the current audio segment is then the starting point of the similar audio segment;
determining the end point of the similar audio segments comprises:
taking the end time of the video segment as the start time of the audio segment similarity comparison and the end time of the comparison as that start time plus a time interval T1; segmenting the audio of the video to be compared and the compared video sequentially at interval T1 and comparing the segmented audio segments in turn; if the cosine similarity is greater than or equal to the set first threshold, comparing the next segmented audio segment, until the cosine similarity is less than the set first threshold; the end time of the last similar audio segment is then the end point of the similar audio segment.
As a further improvement of the invention, the method respectively extracts features from the audio segment of the video to be compared and the audio segment of the compared video, comprising the following steps:
step S1: processing the audio clip to obtain audio data and a sampling rate;
step S2: calculating the maximum frequency of the audio samples, sampling and quantizing;
step S3: pre-emphasis is performed on the audio signal obtained in step S2;
step S4: framing and windowing the audio signal obtained in the step S3 to obtain a frame array;
step S5: calculating the power spectrum of each frame after Fourier transform;
step S6: calculating a Mel triangular filter bank to obtain a preliminary feature matrix;
step S7: carrying out logarithmic operation on the filtered matrix characteristics;
step S8: performing discrete cosine transform on the logarithm result obtained in step S7 to obtain the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video.
As a further improvement of the present invention, a cosine similarity SIM is calculated according to the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video, and the formula for calculating the cosine similarity SIM is as follows:
SIM = Σ(arr1 × arr2) / (sqrt(Σ arr1^2) × sqrt(Σ arr2^2))
wherein arr1 and arr2 are respectively the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video.
The invention also provides a video similarity comparison system, which comprises:
the acquisition module is used for processing the video set to be compared and the compared video set, and respectively acquiring an image file and an audio file of the video to be compared and an image file and an audio file of the compared video;
the image comparison module is used for comparing the image file of the video to be compared with the image file of the compared video, and comprises:
extracting N key frames from the image file of the video to be compared, and extracting M key frames from the image file of the compared video;
sequentially comparing the N key frames of the video to be compared with each key frame of the compared video for similarity, obtaining similar key frame groups of the video to be compared from the key frame similarity comparison, determining similar image groups of the video to be compared from the similar key frame groups, and synchronizing time stream information to obtain similar image segments;
the audio file comparison module is used for comparing the audio file of the video to be compared with the audio file of the compared video, and comprises:
segmenting and extracting features from the audio file of the video to be compared and the audio file of the compared video respectively, calculating the cosine similarity between the audio segments of the video to be compared and the compared video, and determining the similar audio segments according to the calculated cosine similarity.
As a further improvement of the present invention, the image comparison module sequentially and respectively compares the similarity of N key frames of the video to be compared with each key frame of the compared video, and the comparison comprises:
calculating the hash value of each key frame of the video to be compared and the compared video according to a difference hash algorithm;
and calculating the Hamming distance between the hash value of the video to be compared and the hash value of the compared video, and judging whether similar image segments exist between the video to be compared and the compared video according to the calculated Hamming distance.
As a further improvement of the present invention, the image comparison module determining a similar image group of the video to be compared according to the similar key frame group and synchronizing time stream information to obtain similar image segments includes determining a starting point and an ending point of the similar image segments, wherein determining the starting point of the similar image segments comprises:
calculating forward and synchronizing time stream information, taking the current similar key frame of the video to be compared and the compared video as the end frame and the previous similar key frame as the start frame;
performing bisection on the video to be compared and the compared video simultaneously to obtain key frames, and calculating the similarity of the obtained key frames:
if the obtained key frames are similar, continuing the bisection to obtain key frames and calculating their similarity; if the obtained key frames are not similar, the currently obtained key frame is the starting point of the similar image segment;
determining the end point of the similar image segments comprises:
calculating backward and synchronizing time stream information, taking the current similar key frame of the video to be compared and the compared video as the start frame and the next similar key frame as the end frame;
performing bisection on the video to be compared and the compared video simultaneously to obtain key frames, and calculating the similarity of the obtained key frames:
if the obtained key frames are similar, continuing the bisection to obtain key frames and calculating their similarity; and if the obtained key frames are not similar, the currently obtained key frame is the end point of the similar image segment.
As a further improvement of the present invention, the audio file comparison module comparing the audio file of the video to be compared with the audio file of the compared video further comprises:
before segmenting the audio file of the video to be compared and the audio file of the compared video, calculating the cosine similarity of the two audio files; if the cosine similarity is greater than a preset first threshold, determining that the audio file of the video to be compared and the audio file of the compared video are similar audio files.
As a further improvement of the present invention, the audio file comparison module performs similarity comparison on the acquired audio clips corresponding to the similar image clips of the video to be compared and the compared video, and the similarity comparison includes:
calculating the cosine similarity of the audio segments of the video to be compared and the compared video, wherein if the cosine similarity is greater than or equal to a preset first threshold, the audio segments corresponding to the similar image segments are similar audio segments;
if the cosine similarity is smaller than the preset first threshold, segmenting the audio segments corresponding to the similar image segments and comparing the segmented audio segments for similarity, including determining a starting point and an ending point of the similar audio segments, wherein determining the starting point of the similar audio segments includes:
if the similarity of the segmented audio segments of the video to be compared and the compared video is greater than a set second threshold, taking the start time of the video segment as the end time of the audio segment similarity comparison and the start time of the comparison as that end time minus a time interval T1; segmenting the audio of the video to be compared and the compared video sequentially at interval T1 and comparing the segmented audio segments in turn; if the cosine similarity is greater than or equal to the set first threshold, comparing the next segmented audio segment, until the cosine similarity is less than the set first threshold; the start time of the previous similar audio segment before the current audio segment is then the starting point of the similar audio segment;
determining the end point of the similar audio segments comprises:
taking the end time of the video segment as the start time of the audio segment similarity comparison and the end time of the comparison as that start time plus a time interval T1; segmenting the audio of the video to be compared and the compared video sequentially at interval T1 and comparing the segmented audio segments in turn; if the cosine similarity is greater than or equal to the set first threshold, comparing the next segmented audio segment, until the cosine similarity is less than the set first threshold; the end time of the last similar audio segment is then the end point of the similar audio segment.
As a further improvement of the present invention, the audio comparison module respectively extracts features from the audio segment of the video to be compared and the audio segment of the compared video, comprising the following steps:
step S1: processing the audio clip to obtain audio data and a sampling rate;
step S2: calculating the maximum frequency of the audio samples, sampling and quantizing;
step S3: pre-emphasis is performed on the audio signal obtained in step S2;
step S4: framing and windowing the audio signal obtained in the step S3 to obtain a frame array;
step S5: calculating the power spectrum of each frame after Fourier transform;
step S6: calculating a Mel triangular filter bank to obtain a preliminary feature matrix;
step S7: carrying out logarithmic operation on the filtered matrix characteristics;
step S8: performing discrete cosine transform on the logarithm result obtained in step S7 to obtain the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video.
As a further improvement of the present invention, the audio comparison module calculates the cosine similarity SIM according to the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video, and the formula for calculating the cosine similarity SIM is as follows:
SIM = Σ(arr1 × arr2) / (sqrt(Σ arr1^2) × sqrt(Σ arr2^2))
wherein arr1 and arr2 are respectively the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video.
The invention provides an electronic device comprising a memory and a processor, wherein the memory is used for storing one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the above video comparison method.
The invention provides a computer-readable storage medium on which a computer program is stored, wherein the computer program is executed by a processor to implement the above video comparison method.
The invention has the beneficial effects that: by comparing both the images and the audio of the videos, the video comparison method is more comprehensive and accurate, and the positions of similar videos are found from the synchronized time stream information of similar key frames, making the comparison result more intuitive.
Drawings
Fig. 1 is a flowchart of a method for comparing video similarity according to an embodiment of the present invention;
fig. 2 is a flowchart of calculating a hash value by using a difference hash algorithm of the video similarity comparison method according to the embodiment of the present invention;
fig. 3 is an audio comparison flowchart of a video similarity comparison method according to an embodiment of the present invention;
fig. 4 is a schematic system structure diagram of a video similarity comparison system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be noted that, if directional indications (such as up, down, left, right, front, back, etc.) are involved in the embodiments of the present invention, the directional indications are only used to explain the relative positional relationship between the components, the movement situation, and the like in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indications change accordingly.
In addition, in the description of the present invention, the terms used are for illustrative purposes only and are not intended to limit the scope of the present disclosure. The terms "comprises" and/or "comprising" are used to specify the presence of elements, steps, operations, and/or components, but do not preclude the presence or addition of one or more other elements, steps, operations, and/or components. The terms "first," "second," and the like may be used to describe various elements, not necessarily order, and not necessarily limit the elements. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified. These terms are only used to distinguish one element from another. These and/or other aspects will become apparent to those of ordinary skill in the art in view of the following drawings, and the description of the embodiments of the present invention will be more readily understood. The drawings are used for the purpose of illustrating embodiments of the disclosure only. One skilled in the art will readily recognize from the following description that alternative embodiments of the illustrated structures and methods of the present invention may be employed without departing from the principles of the present disclosure.
As shown in fig. 1, a method for comparing video similarity according to an embodiment of the present invention includes:
processing the video set to be compared and the compared video set, and respectively acquiring an image file and an audio file of the video to be compared and an image file and an audio file of the compared video;
comparing the image file and the audio file of the video to be compared with those of the compared video, wherein comparing the image files of the video to be compared and the compared video comprises the following steps:
extracting N key frames from the image file of the video to be compared, and extracting M key frames from the image file of the compared video;
sequentially comparing the N key frames of the video to be compared with each key frame of the compared video for similarity, obtaining similar key frame groups of the video to be compared from the key frame similarity comparison, determining similar image groups of the video to be compared from the similar key frame groups, and synchronizing time stream information to obtain similar image segments;
comparing the audio files of the video to be compared and the compared video comprises:
segmenting and extracting features from the audio file of the video to be compared and the audio file of the compared video respectively, calculating the cosine similarity between the audio segments of the video to be compared and the compared video, and determining the similar audio segments according to the calculated cosine similarity.
For example, if, within M minutes, any group of data in the key frame group of the video to be compared matches the video information in the corresponding N groups in the library with a similarity greater than threshold X, and the similarity of the audio originals of the two videos is greater than Y, where X and Y are both preset thresholds, the two videos can be judged to be highly similar.
In an optional embodiment, when extracting the key frames, the key frames of the compared video may, for example, be extracted according to the duration of the video to be compared; the formula for calculating the average time difference of the key frames is:
td=total/framenum/fps/60
wherein total is the total frame number of the compared video, framenum is the number of extracted frames, and fps is the frame rate of the compared video.
If the time difference is between (0, 1):
the starting frame calculation formula is:
starttime=[fps*(total/framenum/fps/framenum)]*mu*2+100
wherein total is the total frame number of the compared video, framenum is the number of extracted frames, fps is the frame rate of the compared video, and mu is 1.
End frame calculation formula:
endtime = (formula given only as an image in the original publication)
wherein total is the total frame number of the compared video, framenum is the number of extracted frames, fps is the frame rate of the compared video, mu is 1, and menu is 1.
When time stream information is synchronized, time can be located according to the key frames:
hour = frames / rate / 3600
minute = (frames / rate % 3600) / 60
second = frames / rate % 60
wherein, frames is the video frame number, and rate is the video frame rate.
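A minimal Python sketch of this frame-to-time conversion (an assumption based on the standard frames/rate relation, since the original formulas are published only as images):

```python
def frame_to_timestamp(frames: int, rate: float) -> str:
    """Convert a key-frame index to an hh:mm:ss timestamp."""
    total_seconds = frames / rate              # elapsed time in seconds
    hours = int(total_seconds // 3600)
    minutes = int(total_seconds % 3600 // 60)
    seconds = int(total_seconds % 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```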
In an optional implementation manner, the similarity comparison is sequentially performed between the N key frames of the video to be compared and each key frame of the compared video, including calculating the hash value of each key frame of the video to be compared and the compared video according to a difference hash algorithm; and calculating the Hamming distance between the hash values of the video to be compared and the compared video, and judging whether similar image segments exist between the two videos according to the calculated Hamming distance.
As shown in fig. 2, the method for calculating the hash value of each key frame of the video to be compared and the compared video with the difference hash algorithm includes the following steps:
1) Reduce the images to the same scale; this removes detail while keeping the basic structural features, and speeds up generation of the hash value;
2) Convert the image to grayscale:
graying (the image elements being width, height and depth) is achieved by reducing the three RGB channels to a single channel;
3) Calculate the difference values: subtract each pair of adjacent elements (the right element from the left element) to obtain the specified number N of difference values;
4) Generate the hash value: mark a difference value that is positive or 0 with one symbol, and a negative difference value with the other;
5) Principle of operation:
dA(i, j) = A(i, j) - A(i, j + 1)
wherein A is the pixel value matrix of a certain frame of the video to be compared, and B is the pixel value matrix of a certain frame of the compared video;
dB(i, j) = B(i, j) - B(i, j + 1)
Finally, matrix information with characteristics of (N x N) is obtained; the sign of each number is judged and marked with a 0 or 1, and the Hamming distance (namely, the number of identical characters in the two strings) is then calculated. For example, if the number of identical characters is 8 and the total length of the hash values is 16, the similarity coefficient between the two is 8/16 = 0.5. The similarity coefficient can be adjusted according to requirements, and the chosen coefficient is called the threshold; for example, with a threshold coefficient of 0.9, two images are determined to be similar if their similarity coefficient is greater than 0.9. When the method is applied to video infringement judgment, the compared video is a legitimate video: if the similarity of the two videos is greater than the set similarity coefficient threshold, the video to be compared can be judged as infringing; if the similarity is less than the threshold, it is judged as not infringing.
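As an illustration of the difference-hash comparison described above, a minimal sketch using OpenCV and numpy; the 8 x 9 reduction size and the 0.9 threshold are illustrative assumptions, not values fixed by the text:

```python
import cv2
import numpy as np

def dhash(image_bgr: np.ndarray, size: int = 8) -> np.ndarray:
    """Difference hash: shrink, gray, left-right differences -> 0/1 bits."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)   # single channel
    small = cv2.resize(gray, (size + 1, size))           # size rows, size+1 cols
    diff = small[:, :-1].astype(int) - small[:, 1:].astype(int)  # left - right
    return (diff > 0).astype(np.uint8).flatten()         # positive -> 1, else 0

def similarity(h1: np.ndarray, h2: np.ndarray) -> float:
    """Fraction of identical bits, as in the 8/16 = 0.5 example above."""
    return float(np.sum(h1 == h2)) / h1.size

# two key frames are treated as similar when, for a threshold of 0.9:
# similarity(dhash(frame_a), dhash(frame_b)) > 0.9
```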
In an alternative embodiment, determining a similar image group of the video to be compared according to the similar key frame group and synchronizing time stream information to obtain similar image segments includes determining a starting point and an ending point of the similar image segments, wherein determining the starting point of the similar image segments comprises:
calculating forward and synchronizing time stream information, taking the current similar key frame of the video to be compared and the compared video as the end frame and the previous similar key frame as the start frame (if the current frame is the start frame, the time is calculated from the beginning of the video); performing bisection on the video to be compared and the compared video simultaneously to obtain key frames, and calculating the similarity of the obtained key frames: if the obtained key frames are similar, continuing the bisection to obtain key frames and calculating their similarity; if the obtained key frames are not similar, the currently obtained key frame is the starting point of the similar image segment;
determining the end point of the similar image segment comprises:
calculating backward and synchronizing time stream information, taking the current similar key frame of the video to be compared and the compared video as the start frame and the next similar key frame as the end frame (if the current frame is the end frame, the time is calculated up to the end of the video); performing bisection on the video to be compared and the compared video simultaneously to obtain key frames, and calculating the similarity of the obtained key frames: if the obtained key frames are similar, continuing the bisection to obtain key frames and calculating their similarity; if the obtained key frames are not similar, the currently obtained key frame is the end point of the similar image segment.
When the method is applied to video infringement judgment, if the video to be compared is a pirated video and the compared video is a legitimate video, the start time and end time of the suspected image infringement can be found by the above method, and the suspected image infringement position is located between the starting point and the end point.
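A sketch of the bisection search for the segment boundary, under the simplifying assumption that the two videos have already been aligned via the synchronized time stream so that one frame index addresses both; get_frame_a, get_frame_b and is_similar are hypothetical helpers (is_similar could wrap the difference-hash comparison above):

```python
def find_segment_start(get_frame_a, get_frame_b, is_similar,
                       start: int, end: int) -> int:
    """Bisect between a dissimilar frame (start) and a similar frame (end).

    Returns the first index at which the two videos become similar,
    i.e. the starting point of the similar image segment.
    """
    lo, hi = start, end                  # invariant: lo dissimilar, hi similar
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_similar(get_frame_a(mid), get_frame_b(mid)):
            hi = mid                     # similarity begins at or before mid
        else:
            lo = mid                     # still dissimilar; search right half
    return hi
```

The end point is located symmetrically, bisecting between the last similar key frame and the next dissimilar one.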
In an optional embodiment, comparing the audio file of the video to be compared with the audio file of the compared video further includes: before segmenting the two audio files, calculating their cosine similarity; if the cosine similarity is greater than a preset first threshold, determining that the audio file of the video to be compared and the audio file of the compared video are similar audio files. When the method is applied to video infringement judgment, if the cosine similarity of the audio files corresponding to the two videos is greater than the set threshold, the dubbing is determined to be identical and a suspected infringement exists.
In an optional embodiment, after obtaining similar image segments of the video to be compared and the compared video, the audio segments corresponding to the similar image segments are intercepted for similarity comparison, which includes:
calculating the cosine similarity of the audio segments of the video to be compared and the compared video; if the cosine similarity is greater than or equal to a preset first threshold, the audio segments corresponding to the similar image segments are similar audio segments; if the cosine similarity is smaller than the preset first threshold, segmenting the audio segments corresponding to the similar image segments and comparing the segmented audio segments for similarity, including determining a starting point and an ending point of the similar audio segments, wherein determining the starting point of the similar audio segments includes:
if the similarity of the segmented audio segments of the video to be compared and the compared video is greater than a set second threshold, taking the start time of the video segment as the end time of the audio segment similarity comparison, and the start time of the comparison as that end time minus a time interval T1 (for example, 5 seconds); the audio of the video to be compared and the compared video is segmented sequentially at interval T1 and the segmented audio segments are compared in turn; if the cosine similarity is greater than or equal to the set first threshold, the next segmented audio segment is compared, until the cosine similarity is less than the set first threshold; the start time of the previous similar audio segment before the current audio segment is then the starting point of the similar audio segment;
determining the end point of the similar audio segments includes: taking the end time of the video segment as the start time of the audio segment similarity comparison, and the end time of the comparison as that start time plus a time interval T1 (for example, 5 seconds); the audio of the video to be compared and the compared video is segmented sequentially at interval T1 and the segmented audio segments are compared in turn; if the cosine similarity is greater than or equal to the set first threshold, the next segmented audio segment is compared, until the cosine similarity is less than the set first threshold; the end time of the last similar audio segment is then the end point of the similar audio segment.
When the method is applied to video infringement judgment, if the video to be compared is a pirated video and the compared video is a legitimate video, the start time and end time of the suspected audio infringement can be found by the above method, and the suspected audio infringement position is located between the starting point and the end point.
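A sketch of the backward walk in T1-second steps that locates the starting point of a similar audio segment; sim_at is a hypothetical helper returning the cosine similarity of the two T1-second audio segments starting at time t, and the 5-second interval and 0.9 threshold are illustrative:

```python
def locate_audio_start(sim_at, image_seg_start: float,
                       t1: float = 5.0, threshold: float = 0.9) -> float:
    """Step backwards from the image segment's start while audio stays similar."""
    t = image_seg_start - t1                 # first segment to test
    while t >= 0.0 and sim_at(t) >= threshold:
        t -= t1                              # previous segment is still similar
    return t + t1                            # start of earliest similar segment
```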
In an optional implementation manner, features are respectively extracted from the audio segment of the video to be compared and the audio segment of the compared video, and the two are compared for similarity, as shown in fig. 3, including:
processing the audio clip to obtain the audio data and the sampling rate; for example, if the uploaded file is an .mp3 file, it is converted into a lossless .wav file, and the signal data and sampling rate are then read with scipy.
The maximum frequency of the audio samples is calculated; the sampling frequency must be at least twice the highest frequency in the signal, so hf = sr/2, where sr is the sampling frequency and hf is the maximum frequency.
Pre-emphasis, for example, using a difference equation to implement pre-emphasis, the pre-emphasis equation being:
y(n)=x(n)-ax(n-1)
wherein a is 0.95, and x(n) is the original audio signal, represented as an n × n matrix.
Pre-emphasis mainly removes the influence of lip radiation and boosts the high-frequency resolution of the speech, making the audio comparison more accurate.
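The difference equation above maps directly onto a one-line numpy implementation (a sketch; the first sample is kept unchanged):

```python
import numpy as np

def pre_emphasis(x: np.ndarray, a: float = 0.95) -> np.ndarray:
    """y(n) = x(n) - a * x(n - 1), boosting the high-frequency content."""
    return np.append(x[0], x[1:] - a * x[:-1])
```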
Framing and windowing to obtain a frame array:
the main purposes of framing and windowing are as follows: speech signals are macroscopically unstable, microscopically short-term, and gibbs effects may occur after framing.
In this embodiment, the frame acquisition time length is: wl × sr (wl is the window length, value 25ms, sr is the sampling frequency), step size between adjacent frames: ws × sr (ws is a window interval, value 10ms, sr is the sampling frequency), calculate the total length of the frame:
Figure BDA0003261028370000141
where sl is the total length of the signal, fl is the frame time length, and fs is the step size between adjacent frames.
The sample positions of all frames are then extracted by matrix indexing to obtain a matrix of (total frame count × fl), and the final frame matrix signal is formed by applying the window function.
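A compact framing-and-windowing sketch following the 25 ms window and 10 ms step above; the Hamming window is an assumption, since the text only says "window function":

```python
import numpy as np

def frame_signal(signal: np.ndarray, sr: int,
                 wl: float = 0.025, ws: float = 0.010) -> np.ndarray:
    """Slice the signal into overlapping frames and apply a window."""
    fl = int(round(wl * sr))                  # frame length in samples
    fs = int(round(ws * sr))                  # step between adjacent frames
    total = 1 + int(np.ceil(max(len(signal) - fl, 0) / fs))
    padded = np.append(signal, np.zeros(total * fs + fl - len(signal)))
    idx = fs * np.arange(total)[:, None] + np.arange(fl)[None, :]
    return padded[idx] * np.hamming(fl)       # windowed frame matrix
```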
Calculating the power spectrum after the Fourier transform of each frame: for example, the Fourier transform sp may be computed with the existing numpy scientific tools (if the matrix shape of the frame data is N × L, the shape after numpy.fft.rfft is N × (nfft/2 + 1), with nfft taken as 512). The power spectrum is then calculated to obtain the summed power spectrum; the power spectrum formula is:
pow = |sp|^2 / NFFT
wherein, NFFT takes 512, sp is the value after Fourier transform.
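In numpy, this step is (a sketch matching the |sp|^2 / NFFT formula above):

```python
import numpy as np

def power_spectrum(frames: np.ndarray, nfft: int = 512) -> np.ndarray:
    """Per-frame power spectrum of the framed, windowed signal."""
    sp = np.fft.rfft(frames, nfft)            # shape: N x (nfft // 2 + 1)
    return (np.abs(sp) ** 2) / nfft
```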
A Mel triangular filter bank is calculated to obtain a preliminary feature matrix; by simulating human hearing, small frequency changes at low frequencies become easier to distinguish. The method specifically comprises the following steps:
First, the frequency is converted to the Mel scale: human perception of pitch is nonlinear in frequency, so converting to Mel frequency yields a scale that can be divided linearly; the formula is:
m = 2595 × log10(1 + hz/700.0), wherein hz is the frequency;
and the calculated Mel frequency is converted back to hz with the formula: hz = 700 × (10^(m/2595) - 1), wherein m is the Mel value calculated above,
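The two conversions as plain functions (the 2595 and 700 constants follow the standard mel-scale formulas quoted above):

```python
import numpy as np

def hz_to_mel(hz: float) -> float:
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```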
The converted frequencies are obtained, their corresponding positions in the FFT are found, the filters are established, and the filter matrix is calculated through the filters. The formula is as follows:
Hm(k) = 0, for k < f(m-1)
Hm(k) = (k - f(m-1)) / (f(m) - f(m-1)), for f(m-1) ≤ k ≤ f(m)
Hm(k) = (f(m+1) - k) / (f(m+1) - f(m)), for f(m) ≤ k ≤ f(m+1)
Hm(k) = 0, for k > f(m+1)
wherein m is the number of the filters,
f(m) = floor((N + 1) × fl / W)
where N is 512, fl is the mel-factor, and W is the sampling rate.
Each frame of the energy spectrum is then summed row-wise, with the formula:
E(i) = Σj sp(i, j)
wherein sp is the energy spectrum, i is the matrix row index, and j is the matrix column index.
And then calculating a filtered result by using the filter and the summed energy spectrum, wherein the formula is as follows:
log(sp*fb.T)
wherein sp is the summed energy spectrum and fb is the filter.
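Putting the pieces together, a sketch that builds the triangular filter bank and applies it as log(sp * fb.T), using the hz_to_mel and mel_to_hz helpers above; the even spacing on the mel scale and the bin mapping are standard assumptions rather than details fixed by the text:

```python
import numpy as np

def mel_filterbank(nfilt: int = 26, nfft: int = 512, sr: int = 16000) -> np.ndarray:
    """Triangular filters spaced evenly on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), nfilt + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((nfilt, nfft // 2 + 1))
    for m in range(1, nfilt + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)    # rising slope
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)  # falling slope
    return fb

# filtered = np.log(np.dot(pow_spec, fb.T))   # the log(sp * fb.T) step above
```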
Logarithmic operation is then carried out on the filtered matrix features.
Discrete cosine transform is performed on the obtained logarithm result to concentrate the energy.
For example, the scipy scientific computation package is used for the computation; the discrete cosine transform kernel is:
c(u, y) = cos((2y + 1) × u × π / (2N))
The final feature matrix is obtained after the forward DCT transform, with the specific formula:
F(x, u) = c(u) × Σy f(x, y) × cos((2y + 1) × u × π / (2N)), where c(0) = sqrt(1/N) and c(u) = sqrt(2/N) for u > 0
wherein f (x, y) is a feature matrix after logarithmic operation.
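With scipy, the DCT step reduces to a single call (a sketch; keeping the first 13 coefficients is a common choice, not one specified in the text):

```python
import numpy as np
from scipy.fftpack import dct

def cepstral_features(log_fbank: np.ndarray, ncep: int = 13) -> np.ndarray:
    """DCT-II along the filter axis; energy concentrates in low-order terms."""
    return dct(log_fbank, type=2, axis=1, norm="ortho")[:, :ncep]
```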
The cosine similarity SIM is calculated from the obtained feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video, with the following formula:
SIM = Σ(arr1 × arr2) / (sqrt(Σ arr1^2) × sqrt(Σ arr2^2))
wherein arr1 and arr2 are respectively the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video.
The obtained similarity SIM is compared with the required similarity threshold to judge whether the dubbing is suspected of infringement.
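A sketch of the SIM computation, assuming the two feature matrices have the same shape and are flattened to equal-length vectors before the dot product:

```python
import numpy as np

def cosine_sim(arr1: np.ndarray, arr2: np.ndarray) -> float:
    """Cosine similarity of two feature matrices, per the SIM formula above."""
    a, b = arr1.ravel(), arr2.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0
```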
The present invention also provides a video similarity comparison system, as shown in fig. 4, the system includes:
the acquisition module is used for processing the video set to be compared and the compared video set, and respectively acquiring an image file and an audio file of the video to be compared and an image file and an audio file of the compared video;
the image comparison module is used for comparing the image file of the video to be compared with the image file of the compared video, and comprises:
extracting N key frames from the image file of the video to be compared, and extracting M key frames from the image file of the compared video;
sequentially comparing the N key frames of the video to be compared with each key frame of the compared video for similarity, obtaining similar key frame groups of the video to be compared from the key frame similarity comparison, determining similar image groups of the video to be compared from the similar key frame groups, and synchronizing time stream information to obtain similar image segments;
the audio file comparison module is used for comparing the audio file of the video to be compared with the audio file of the compared video, and comprises:
segmenting and extracting features from the audio file of the video to be compared and the audio file of the compared video respectively, calculating the cosine similarity between the audio segments of the video to be compared and the compared video, and determining the similar audio segments according to the calculated cosine similarity.
After processing is finished, the processed data are stored in a library. When a comparison result needs to be retrieved again, or similar videos need to be compared again later, the video can be read from the video library and checked: it is judged whether the video is in the key frame database, and if so, the corresponding groups of key frames, the corresponding difference hash values, the corresponding time stream information and the audio originals of the library videos are retrieved. The deviation frames are calculated for synchronization with the library video, and the key frame group and audio original of the uploaded video are obtained through its time stream.
The key frame information is stored in a database: the deviation frame rate is calculated to determine the corresponding frames of the library video. N groups of key frames within M minutes, the corresponding difference hash values and the audio originals of the library videos are acquired and stored in the key frame library. By repeating the above comparison method on the key frame groups and key frames of video data uploaded by a user together with the audio files in the database, identical video segments can be searched for and the comparison time greatly shortened.
In suspected-infringement video comparison, the above method applies when the original and pirated video contents are unknown and the comparison is performed purely by computer. If the two video contents to be compared are known and only the suspected infringement evidence needs to be locked, the following methods can be used to increase working efficiency:
1) Dynamically dividing the number of threads: the processing data for which each thread is responsible is dynamically divided according to the number of data items and the maximum thread count.
2) For infringing-content locking, the following situations are handled:
One original video and one pirated video: N key frames are extracted according to the average of the total frame count over the whole video time stream, and corresponding difference hash values and time stream information are generated for the whole key frame group.
Two original videos merged into one video, and one pirated video: N key frames are extracted from each of the two original videos, and difference hash values and corresponding time stream information are generated.
Two pirated videos merged into one, and one original video: N key frames are extracted according to the average of the total frame count over the whole video time stream, and corresponding difference hash values and time stream information are generated for the whole key frame group.
3) Key frame groups within M time periods before and after the time of the pirated frame are extracted:
starttime=[timecover-60*fpscover*frametime]
endtime=[timecover+60*fpscover*frametime]
wherein timecover is the pirated frame number after the determined deviation value is applied, fpscover is the pirated video frame rate, and frametime is the time multiple.
The start and end times are determined and the pirated key frame group is generated:
One original video and one pirated video: the key frame group in the M time period is acquired with the above formulas according to the frame rates of the original and pirated videos.
One original video, two pirated videos merged into one: a key frame group of N × 2 image groups is taken from the original video according to the above formula (because the duration of the original video equals, or approximately equals, the sum of the durations of the two pirated videos); when the frame numbers of the next N video frames reach the corresponding images, the total frame count of the first infringing video is subtracted, thereby obtaining the key frame groups of the second infringing video within the N M-time periods.
Two original videos merged into one video, and one pirated video: N key frames are extracted from each of the two original videos, and the combination of the key frames of the two videos is dynamically allocated according to a mathematical formula. For the key frames of the first original video, the key frame groups within the N M-time periods are taken out; for the key frames of the second original video, the frame numbers are added to the total of the first original video, thereby obtaining its key frame groups within the N M-time periods.
4) The infringing key frame group corresponding to each key frame is compared, and the image with the highest similarity is acquired: the similarity is judged through the Hamming distance, namely the number of identical characters in two strings of equal length, specifically:
A hash value is generated and the similarity calculated: hash values are generated for all pirated key frame groups corresponding to each original key frame, and the Hamming distances are calculated and stored in a sqlite database.
The most similar key frame of the image group within M minutes is acquired: the sqlite database contents are sorted in descending order to obtain the final key frame and its corresponding time node, as well as the corresponding key frame and time node of the original video.
The final data are returned: after the current data are processed, the total durations of the two videos and the corresponding video names are finally obtained.
The invention also relates to an electronic device comprising the server, the terminal and the like. The electronic device includes: at least one processor; a memory communicatively coupled to the at least one processor; and a communication component communicatively coupled to the storage medium, the communication component receiving and transmitting data under control of the processor; wherein the memory stores instructions executable by the at least one processor to implement the method of the above embodiments.
In an alternative embodiment, the memory is used as a non-volatile computer-readable storage medium for storing non-volatile software programs, non-volatile computer-executable programs, and modules. The processor executes various functional applications of the device and data processing, i.e., implements the method, by executing nonvolatile software programs, instructions, and modules stored in the memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store a list of options, etc. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to the external device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory and, when executed by the one or more processors, perform the methods of any of the method embodiments described above.
This product can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to that method; technical details not described in this embodiment can be found in the method provided by the embodiments of the present application.
The present invention also relates to a computer-readable storage medium for storing a computer-readable program for causing a computer to perform some or all of the above-described method embodiments.
That is, as can be understood by those skilled in the art, all or part of the steps of the methods in the embodiments described above may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions for causing a device (which may be a microcontroller, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Furthermore, those of ordinary skill in the art will appreciate that, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
It will be understood by those skilled in the art that while the present invention has been described with reference to exemplary embodiments, various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (10)

1. A video similarity comparison method, characterized by comprising the following steps:
processing a video set to be compared and a compared video set, and respectively acquiring an image file and an audio file of the video to be compared and an image file and an audio file of the compared video;
comparing the image files and the audio files of the video to be compared and the compared video, wherein comparing the image files of the video to be compared and the compared video comprises:
extracting N key frames from the image file of the video to be compared, and extracting M key frames from the image file of the compared video;
sequentially comparing the similarity of the N key frames of the video to be compared with each key frame of the compared video, obtaining a similar key frame group of the videos according to the key frame similarity comparison, determining a similar image group of the videos according to the similar key frame group, and synchronizing time stream information to obtain similar image segments;
and comparing the audio files of the video to be compared and the compared video comprises:
segmenting and extracting features from the audio file of the video to be compared and the audio file of the compared video respectively, calculating the cosine similarity of the audio segments of the video to be compared and the compared video, and determining similar audio segments according to the calculated cosine similarity.
2. The method according to claim 1, wherein sequentially comparing the similarity of the N key frames of the video to be compared with each key frame of the compared video comprises:
calculating a hash value for each key frame of the video to be compared and the compared video according to a difference hash algorithm;
calculating the Hamming distance between the hash values of the video to be compared and the compared video, and judging whether similar image segments exist between the video to be compared and the compared video according to the calculated Hamming distance.
3. The method of claim 1, wherein determining a similar image group of the videos according to the similar key frame group and synchronizing the time stream information to obtain similar image segments comprises determining a starting point and an ending point of the similar image segments, wherein determining the starting point of the similar image segments comprises:
calculating forward and synchronizing time stream information with the current similar key frame of the video to be compared and the compared video as the end frame, wherein the similar key frame preceding the current similar key frame serves as the start frame;
performing bisection on the video to be compared and the compared video simultaneously to acquire key frames, and calculating the similarity of the acquired key frames:
if the acquired key frames are similar, continuing the bisection to acquire key frames and calculating their similarity; if the acquired key frames are not similar, the currently acquired key frame is the starting point of the similar image segment;
determining the ending point of the similar image segments comprises:
calculating backward and synchronizing time stream information with the current similar key frame of the video to be compared and the compared video as the start frame, wherein the similar key frame following the current key frame serves as the end frame;
performing bisection on the video to be compared and the compared video simultaneously to acquire key frames, and calculating the similarity of the acquired key frames:
if the acquired key frames are similar, continuing the bisection to acquire key frames and calculating their similarity; if the acquired key frames are not similar, the currently acquired key frame is the ending point of the similar image segment (an illustrative sketch of this bisection search follows the claims).
4. The method of claim 1, wherein comparing the audio files of the video to be compared and the compared video further comprises:
before the audio files of the video to be compared and the compared video are segmented, calculating the cosine similarity of the two audio files; if the cosine similarity is greater than a preset first threshold, the audio files of the video to be compared and the compared video are determined to be similar audio files.
5. The method of claim 1, wherein, after similar image segments of the video to be compared and the compared video are obtained, intercepting the audio segments corresponding to the similar image segments for similarity comparison comprises:
calculating the cosine similarity of the audio segments of the video to be compared and the compared video, wherein if the cosine similarity is greater than or equal to the preset first threshold, the audio segments corresponding to the similar image segments are similar audio segments;
if the cosine similarity is smaller than the preset first threshold, dividing the audio segments corresponding to the similar image segments and comparing the similarity of the divided audio segments, including determining a starting point and an ending point of the similar audio segments, wherein determining the starting point of the similar audio segments comprises:
if the similarity of the divided audio segments of the video to be compared and the compared video is greater than a set second threshold, taking the start time of the video segment as the end time of the audio segment similarity comparison, and that end time minus a time interval T1 as its start time; sequentially dividing the audio segments of the video to be compared and the compared video at the time interval T1 and comparing their similarity in turn; if the cosine similarity is greater than or equal to the set first threshold, comparing the next divided audio segment, until the cosine similarity is smaller than the set first threshold, whereupon the start time of the last similar audio segment before the current audio segment is taken as the starting point of the similar audio segments;
determining the ending point of the similar audio segments comprises:
taking the end time of the video segment as the start time of the audio segment similarity comparison, and that start time plus a time interval T1 as its end time; sequentially dividing the audio segments of the video to be compared and the compared video at the time interval T1 and comparing their similarity in turn; if the cosine similarity is greater than or equal to the set first threshold, comparing the next divided audio segment, until the cosine similarity is smaller than the set first threshold, whereupon the end time of the last similar audio segment before the current audio segment is taken as the ending point of the similar audio segments (an illustrative sketch of this T1-step search follows the claims).
6. The method according to claim 1, wherein the feature extraction performed respectively on the audio segments of the video to be compared and the compared video comprises the following steps:
step S1: processing the audio segment to obtain the audio data and the sampling rate;
step S2: calculating the maximum frequency of the audio samples, and sampling and quantizing;
step S3: pre-emphasizing the audio signal obtained in step S2;
step S4: framing and windowing the audio signal obtained in step S3 to obtain an array of frames;
step S5: calculating the power spectrum of each frame after Fourier transform;
step S6: applying a Mel triangular filter bank to obtain a preliminary feature matrix;
step S7: performing a logarithmic operation on the filtered matrix features;
step S8: performing a discrete cosine transform on the logarithm result of step S7 to obtain the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video.
7. The method of claim 6, wherein the cosine similarity SIM is calculated from the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video according to the following formula:
$$\mathrm{SIM} = \frac{\sum_{i} arr1_{i} \, arr2_{i}}{\sqrt{\sum_{i} arr1_{i}^{2}} \; \sqrt{\sum_{i} arr2_{i}^{2}}}$$
wherein arr1 and arr2 are respectively the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video (an illustrative sketch of steps S1 to S8 and this formula follows the claims).
8. A video similarity comparison system, comprising:
the acquisition module is used for processing a video set to be compared and a compared video set, and respectively acquiring an image file and an audio file of the video to be compared and an image file and an audio file of the compared video;
the image comparison module is used for comparing the image file of the video to be compared with the image file of the compared video, and comprises:
extracting N key frames from the image file of the video to be compared, and extracting M key frames from the image file of the compared video;
sequentially comparing the similarity of the N key frames of the video to be compared with each key frame of the compared video, obtaining a similar key frame group of the videos according to the key frame similarity comparison, determining a similar image group of the videos according to the similar key frame group, and synchronizing time stream information to obtain similar image segments;
the audio file comparison module is used for comparing the audio file of the video to be compared with the audio file of the compared video, and comprises:
segmenting and extracting features from the audio file of the video to be compared and the audio file of the compared video respectively, calculating the cosine similarity of the audio segments of the video to be compared and the compared video, and determining similar audio segments according to the calculated cosine similarity.
9. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, the computer program being executable by a processor for implementing the method according to any one of claims 1-7.
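The following sketches are editorial illustrations, not part of the claims. First, a minimal Python rendering of the boundary searches in claims 3 and 5; the helper predicates is_similar and audio_similarity, the function names, and all parameter values are assumptions rather than anything specified by the patent:

```python
# Illustrative sketch only; `is_similar` and `audio_similarity` are assumed
# helpers (e.g. dHash + Hamming distance for frames, cosine similarity of
# feature matrices for audio), not functions defined by the patent.

def find_image_boundary(is_similar, t_dissimilar, t_similar, tol=0.04):
    """Bisect between a dissimilar time point and a similar time point in
    both videos simultaneously; converges on the start (or end) of the
    similar image segment.  tol ~ one frame at 25 fps (an assumption)."""
    lo, hi = t_dissimilar, t_similar
    while abs(hi - lo) > tol:
        mid = (lo + hi) / 2.0
        if is_similar(mid):
            hi = mid  # mid is similar: boundary lies toward the dissimilar side
        else:
            lo = mid  # mid is dissimilar: boundary lies toward the similar side
    return hi

def extend_audio_start(audio_similarity, seg_start, t1, first_threshold):
    """Step backwards from the video segment's start time in windows of
    length T1; stop when a window falls below the first threshold and
    return the start of the last similar window (claim 5, starting point)."""
    end = seg_start
    while audio_similarity(end - t1, end) >= first_threshold:
        end -= t1
    return end
```

Second, steps S1 to S8 of claim 6 read as a standard MFCC-style pipeline, and claim 7 as the cosine similarity of the resulting matrices; the sketch below follows that reading, with conventional defaults (0.97 pre-emphasis, 25 ms frames, 10 ms step, 26 filters, 13 coefficients) that the patent does not itself specify:

```python
# Illustrative sketch of steps S1-S8 (MFCC-style features) and the claim-7
# cosine similarity.  Parameter values and helper names are assumptions.
import numpy as np
from scipy.fftpack import dct
from scipy.io import wavfile

def audio_features(path, n_filters=26, n_ceps=13, frame_len=0.025, frame_step=0.01, nfft=512):
    rate, signal = wavfile.read(path)                   # S1: audio data + sampling rate
    signal = signal.astype(np.float64)
    if signal.ndim > 1:
        signal = signal.mean(axis=1)                    # downmix to mono
    # S2: the maximum representable frequency is the Nyquist limit, rate / 2
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # S3: pre-emphasis
    flen, fstep = int(rate * frame_len), int(rate * frame_step)
    emphasized = np.pad(emphasized, (0, max(0, flen - len(emphasized))))
    n_frames = 1 + (len(emphasized) - flen) // fstep
    frames = np.stack([emphasized[i * fstep:i * fstep + flen] for i in range(n_frames)])
    frames = frames * np.hamming(flen)                  # S4: framing + Hamming window
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft  # S5: power spectrum per frame
    # S6: Mel triangular filter bank between 0 Hz and the Nyquist frequency
    high_mel = 2595 * np.log10(1 + (rate / 2) / 700)
    hz = 700 * (10 ** (np.linspace(0, high_mel, n_filters + 2) / 2595) - 1)
    bins = np.floor((nfft + 1) * hz / rate).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    feat = power @ fbank.T                              # preliminary feature matrix
    feat = np.log(np.where(feat == 0, np.finfo(float).eps, feat))  # S7: logarithm
    return dct(feat, type=2, axis=1, norm="ortho")[:, :n_ceps]     # S8: DCT

def cosine_similarity(arr1, arr2):
    """SIM = (arr1 . arr2) / (|arr1| |arr2|) on flattened feature matrices;
    truncating to a common length is an assumption for unequal durations."""
    a, b = arr1.ravel(), arr2.ravel()
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Under these assumptions, cosine_similarity(audio_features("a.wav"), audio_features("b.wav")) would produce the SIM value that claims 4 and 5 compare against the first threshold.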
Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination