CN113722543A - Video similarity comparison method, system and equipment

Info

Publication number
CN113722543A
Authority
CN
China
Prior art keywords
video
compared
audio
similar
similarity
Prior art date
Legal status
Pending
Application number
CN202111072794.9A
Other languages
Chinese (zh)
Inventor
白书占
Current Assignee
Turing Chuangzhi Beijing Technology Co ltd
Original Assignee
Turing Chuangzhi Beijing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Turing Chuangzhi Beijing Technology Co ltd
Priority to CN202111072794.9A
Publication of CN113722543A
Status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval using metadata automatically derived from the content
    • G06F16/7834 Retrieval using metadata automatically derived from the content, using audio features
    • G06F16/7837 Retrieval using metadata automatically derived from the content, using objects detected or recognised in the video content
    • G06F16/7844 Retrieval using metadata automatically derived from the content, using original textual content or text extracted from visual content or transcript of audio data
    • G06F16/7867 Retrieval using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention discloses a video similarity comparison method, system and device. Image files and audio files are obtained for both a video to be compared and a compared video, and the two are compared separately. For the image comparison, the key frames of the two videos are compared for similarity to obtain similar key frame groups of the video to be compared, and time stream information is synchronized to locate the similar image segments. For the audio comparison, the audio files of the video to be compared and the compared video are segmented and their features extracted, and the cosine similarity between audio segments of the two videos is calculated to determine the similar audio segments. The beneficial effects of the invention are: comparing both the images and the audio of the videos makes the comparison more comprehensive and accurate, and locating the positions of similar segments from the synchronized time stream information of similar key frames makes the comparison result more intuitive.

Description

Video similarity comparison method, system and equipment
Technical Field
The invention relates to the technical field of computer video comparison, in particular to a method, a system and equipment for comparing video similarity.
Background
With the rapid development of the video industry, video copyright infringement has become widespread. The main forms of infringement currently include content re-uploading (e.g., stolen excerpts, masking watermarks, picture-in-picture), secondary creation (e.g., unauthorized derivative works), citation of video material (e.g., re-editing, splitting long videos, splicing short clips), and video recomposition such as pairing the same pictures with different dubbing or the same dubbing with different pictures. As infringement becomes increasingly concealed, extracting evidence of infringement is particularly important for judging whether infringement is established.
In the prior art, suspected videos are compared mainly by image comparison to determine whether infringement exists. As infringement forms become more concealed and diversified, image comparison alone can neither accurately determine whether infringement exists nor locate the infringing position and lock in evidence.
Disclosure of Invention
In order to solve the above problems, an object of the present invention is to provide a video comparison method, system and device that compare the images and audio of videos simultaneously, making the video comparison more comprehensive and accurate.
The invention provides a video similarity comparison method, which comprises the following steps:
processing the video set to be compared and the compared video set, and respectively acquiring an image file and an audio file of the video to be compared and an image file and an audio file of the compared video;
comparing the image file and the audio file of the video to be compared with those of the compared video, wherein comparing the image files of the video to be compared and the compared video comprises the following steps:
extracting N key frames from the image file of the video to be compared, and extracting M key frames from the image file of the compared video;
sequentially comparing the N key frames of the video to be compared with each key frame of the compared video for similarity, obtaining similar key frame groups of the video to be compared from the key frame similarity comparison, determining similar image groups of the video to be compared from the similar key frame groups, and synchronizing time stream information to obtain similar image segments;
comparing the audio files of the video to be compared and the compared video comprises:
segmenting and extracting features from the audio file of the video to be compared and the audio file of the compared video respectively, calculating the cosine similarity between the audio segments of the video to be compared and the compared video, and determining the similar audio segments according to the calculated cosine similarity.
As a further improvement of the present invention, the performing similarity comparison between N key frames of a video to be compared and each key frame of a compared video sequentially includes:
calculating the hash value of each key frame of the video to be compared and the compared video according to a difference hash algorithm;
and calculating the Hamming distance between the hash value of the video to be compared and the hash value of the compared video, and judging whether similar image segments exist between the video to be compared and the compared video according to the calculated Hamming distance.
As a further improvement of the present invention, determining a similar image group of the video to be compared according to the similar key frame group and synchronizing time stream information to obtain similar image segments includes determining a starting point and an ending point of the similar image segments, wherein determining the starting point of the similar image segments comprises:
calculating forward and synchronizing time stream information, taking the current similar key frame of the video to be compared and the compared video as the end frame and the previous similar key frame as the start frame;
performing bisection on the video to be compared and the compared video simultaneously to obtain key frames, and calculating the similarity of the obtained key frames:
if the obtained key frames are similar, continuing the bisection to obtain key frames and calculating their similarity; if the obtained key frames are not similar, the currently obtained key frame is the starting point of the similar image segment;
determining the end point of the similar image segments comprises:
calculating backward and synchronizing time stream information, taking the current similar key frame of the video to be compared and the compared video as the start frame and the next similar key frame as the end frame;
performing bisection on the video to be compared and the compared video simultaneously to obtain key frames, and calculating the similarity of the obtained key frames:
if the obtained key frames are similar, continuing the bisection to obtain key frames and calculating their similarity; and if the obtained key frames are not similar, the currently obtained key frame is the end point of the similar image segment.
As a further improvement of the present invention, comparing the audio file of the video to be compared with the audio file of the compared video further comprises:
before segmenting the audio file of the video to be compared and the audio file of the compared video, calculating the cosine similarity of the two audio files; if the cosine similarity is greater than a preset first threshold, determining that the audio file of the video to be compared and the audio file of the compared video are similar audio files.
As a further improvement of the present invention, after similar image segments of the video to be compared and the compared video are obtained, the audio segments corresponding to the similar image segments are intercepted for similarity comparison, which includes:
calculating the cosine similarity of the audio segments of the video to be compared and the compared video, wherein if the cosine similarity is greater than or equal to a preset first threshold, the audio segments corresponding to the similar image segments are similar audio segments;
if the cosine similarity is smaller than the preset first threshold, segmenting the audio segments corresponding to the similar image segments and comparing the segmented audio segments for similarity, including determining a starting point and an ending point of the similar audio segments, wherein determining the starting point of the similar audio segments includes:
if the similarity of the segmented audio segments of the video to be compared and the compared video is greater than a set second threshold, taking the start time of the video segment as the end time of the audio segment similarity comparison and the start time of the comparison as that end time minus a time interval T1; segmenting the audio of the video to be compared and the compared video sequentially at interval T1 and comparing the segmented audio segments in turn; if the cosine similarity is greater than or equal to the set first threshold, comparing the next segmented audio segment, until the cosine similarity is less than the set first threshold; the start time of the previous similar audio segment before the current audio segment is then the starting point of the similar audio segment;
determining the end point of the similar audio segments comprises:
taking the end time of the video segment as the start time of the audio segment similarity comparison and the end time of the comparison as that start time plus a time interval T1; segmenting the audio of the video to be compared and the compared video sequentially at interval T1 and comparing the segmented audio segments in turn; if the cosine similarity is greater than or equal to the set first threshold, comparing the next segmented audio segment, until the cosine similarity is less than the set first threshold; the end time of the last similar audio segment is then the end point of the similar audio segment.
As a further improvement of the invention, the method respectively extracts features from the audio segment of the video to be compared and the audio segment of the compared video, comprising the following steps:
step S1: processing the audio clip to obtain audio data and a sampling rate;
step S2: calculating the maximum frequency of the audio samples, sampling and quantizing;
step S3: pre-emphasis is performed on the audio signal obtained in step S2;
step S4: framing and windowing the audio signal obtained in the step S3 to obtain a frame array;
step S5: calculating the power spectrum of each frame after Fourier transform;
step S6: calculating a Mel triangular filter bank to obtain a preliminary feature matrix;
step S7: carrying out logarithmic operation on the filtered matrix characteristics;
step S8: performing discrete cosine transform on the logarithm result obtained in step S7 to obtain the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video.
As a further improvement of the present invention, a cosine similarity SIM is calculated according to the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video, and the formula for calculating the cosine similarity SIM is as follows:
SIM = Σ(arr1 × arr2) / (sqrt(Σ arr1^2) × sqrt(Σ arr2^2))
wherein arr1 and arr2 are respectively the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video.
The invention also provides a video similarity comparison system, which comprises:
the acquisition module is used for processing the video set to be compared and the compared video set, and respectively acquiring an image file and an audio file of the video to be compared and an image file and an audio file of the compared video;
the image comparison module is used for comparing the image file of the video to be compared with the image file of the compared video, and comprises:
extracting N key frames from the image file of the video to be compared, and extracting M key frames from the image file of the compared video;
sequentially comparing the N key frames of the video to be compared with each key frame of the compared video for similarity, obtaining similar key frame groups of the video to be compared from the key frame similarity comparison, determining similar image groups of the video to be compared from the similar key frame groups, and synchronizing time stream information to obtain similar image segments;
the audio file comparison module is used for comparing the audio file of the video to be compared with the audio file of the compared video, and comprises:
segmenting and extracting features from the audio file of the video to be compared and the audio file of the compared video respectively, calculating the cosine similarity between the audio segments of the video to be compared and the compared video, and determining the similar audio segments according to the calculated cosine similarity.
As a further improvement of the present invention, the image comparison module sequentially and respectively compares the similarity of N key frames of the video to be compared with each key frame of the compared video, and the comparison comprises:
calculating the hash value of each key frame of the video to be compared and the compared video according to a difference hash algorithm;
and calculating the Hamming distance between the hash value of the video to be compared and the hash value of the compared video, and judging whether similar image segments exist between the video to be compared and the compared video according to the calculated Hamming distance.
As a further improvement of the present invention, the image comparison module determining a similar image group of the video to be compared according to the similar key frame group and synchronizing time stream information to obtain similar image segments includes determining a starting point and an ending point of the similar image segments, wherein determining the starting point of the similar image segments comprises:
calculating forward and synchronizing time stream information, taking the current similar key frame of the video to be compared and the compared video as the end frame and the previous similar key frame as the start frame;
performing bisection on the video to be compared and the compared video simultaneously to obtain key frames, and calculating the similarity of the obtained key frames:
if the obtained key frames are similar, continuing the bisection to obtain key frames and calculating their similarity; if the obtained key frames are not similar, the currently obtained key frame is the starting point of the similar image segment;
determining the end point of the similar image segments comprises:
calculating backward and synchronizing time stream information, taking the current similar key frame of the video to be compared and the compared video as the start frame and the next similar key frame as the end frame;
performing bisection on the video to be compared and the compared video simultaneously to obtain key frames, and calculating the similarity of the obtained key frames:
if the obtained key frames are similar, continuing the bisection to obtain key frames and calculating their similarity; and if the obtained key frames are not similar, the currently obtained key frame is the end point of the similar image segment.
As a further improvement of the present invention, the audio file comparison module comparing the audio file of the video to be compared with the audio file of the compared video further comprises:
before segmenting the audio file of the video to be compared and the audio file of the compared video, calculating the cosine similarity of the two audio files; if the cosine similarity is greater than a preset first threshold, determining that the audio file of the video to be compared and the audio file of the compared video are similar audio files.
As a further improvement of the present invention, the audio file comparison module performs similarity comparison on the acquired audio clips corresponding to the similar image clips of the video to be compared and the compared video, and the similarity comparison includes:
calculating the cosine similarity of the audio segments of the video to be compared and the compared video, wherein if the cosine similarity is greater than or equal to a preset first threshold, the audio segments corresponding to the similar image segments are similar audio segments;
if the cosine similarity is smaller than the preset first threshold, segmenting the audio segments corresponding to the similar image segments and comparing the segmented audio segments for similarity, including determining a starting point and an ending point of the similar audio segments, wherein determining the starting point of the similar audio segments includes:
if the similarity of the segmented audio segments of the video to be compared and the compared video is greater than a set second threshold, taking the start time of the video segment as the end time of the audio segment similarity comparison and the start time of the comparison as that end time minus a time interval T1; segmenting the audio of the video to be compared and the compared video sequentially at interval T1 and comparing the segmented audio segments in turn; if the cosine similarity is greater than or equal to the set first threshold, comparing the next segmented audio segment, until the cosine similarity is less than the set first threshold; the start time of the previous similar audio segment before the current audio segment is then the starting point of the similar audio segment;
determining the end point of the similar audio segments comprises:
taking the end time of the video segment as the start time of the audio segment similarity comparison and the end time of the comparison as that start time plus a time interval T1; segmenting the audio of the video to be compared and the compared video sequentially at interval T1 and comparing the segmented audio segments in turn; if the cosine similarity is greater than or equal to the set first threshold, comparing the next segmented audio segment, until the cosine similarity is less than the set first threshold; the end time of the last similar audio segment is then the end point of the similar audio segment.
As a further improvement of the present invention, the audio comparison module respectively extracts features from the audio segment of the video to be compared and the audio segment of the compared video, comprising the following steps:
step S1: processing the audio clip to obtain audio data and a sampling rate;
step S2: calculating the maximum frequency of the audio samples, sampling and quantizing;
step S3: pre-emphasis is performed on the audio signal obtained in step S2;
step S4: framing and windowing the audio signal obtained in the step S3 to obtain a frame array;
step S5: calculating the power spectrum of each frame after Fourier transform;
step S6: calculating a Mel triangular filter bank to obtain a preliminary feature matrix;
step S7: carrying out logarithmic operation on the filtered matrix characteristics;
step S8: performing discrete cosine transform on the logarithm result obtained in step S7 to obtain the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video.
As a further improvement of the present invention, the audio comparison module calculates the cosine similarity SIM according to the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video, and the formula for calculating the cosine similarity SIM is as follows:
SIM = Σ(arr1 × arr2) / (sqrt(Σ arr1^2) × sqrt(Σ arr2^2))
wherein arr1 and arr2 are respectively the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video.
The invention provides an electronic device comprising a memory and a processor, wherein the memory is used for storing one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the above video comparison method.
The invention provides a computer-readable storage medium on which a computer program is stored, wherein the computer program is executed by a processor to implement the above video comparison method.
The invention has the beneficial effects that: by comparing both the images and the audio of the videos, the video comparison method is more comprehensive and accurate, and the positions of similar videos are found from the synchronized time stream information of similar key frames, making the comparison result more intuitive.
Drawings
Fig. 1 is a flowchart of a method for comparing video similarity according to an embodiment of the present invention;
fig. 2 is a flowchart of calculating a hash value by using a difference hash algorithm of the video similarity comparison method according to the embodiment of the present invention;
fig. 3 is an audio comparison flowchart of a video similarity comparison method according to an embodiment of the present invention;
fig. 4 is a schematic system structure diagram of a video similarity comparison system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be noted that, if directional indications (such as up, down, left, right, front, back, etc.) are involved in the embodiments of the present invention, the directional indications are only used to explain the relative positional relationship between the components, the movement situation, and the like in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indications change accordingly.
In addition, in the description of the present invention, the terms used are for illustrative purposes only and are not intended to limit the scope of the present disclosure. The terms "comprises" and/or "comprising" are used to specify the presence of elements, steps, operations, and/or components, but do not preclude the presence or addition of one or more other elements, steps, operations, and/or components. The terms "first," "second," and the like may be used to describe various elements, not necessarily order, and not necessarily limit the elements. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified. These terms are only used to distinguish one element from another. These and/or other aspects will become apparent to those of ordinary skill in the art in view of the following drawings, and the description of the embodiments of the present invention will be more readily understood. The drawings are used for the purpose of illustrating embodiments of the disclosure only. One skilled in the art will readily recognize from the following description that alternative embodiments of the illustrated structures and methods of the present invention may be employed without departing from the principles of the present disclosure.
As shown in fig. 1, a method for comparing video similarity according to an embodiment of the present invention includes:
processing the video set to be compared and the compared video set, and respectively acquiring an image file and an audio file of the video to be compared and an image file and an audio file of the compared video;
comparing the image file and the audio file of the video to be compared with those of the compared video, wherein comparing the image files of the video to be compared and the compared video comprises the following steps:
extracting N key frames from the image file of the video to be compared, and extracting M key frames from the image file of the compared video;
sequentially comparing the N key frames of the video to be compared with each key frame of the compared video for similarity, obtaining similar key frame groups of the video to be compared from the key frame similarity comparison, determining similar image groups of the video to be compared from the similar key frame groups, and synchronizing time stream information to obtain similar image segments;
comparing the audio files of the video to be compared and the compared video comprises:
segmenting and extracting features from the audio file of the video to be compared and the audio file of the compared video respectively, calculating the cosine similarity between the audio segments of the video to be compared and the compared video, and determining the similar audio segments according to the calculated cosine similarity.
For example, if, within M minutes, any group of data in the key frame group of the video to be compared matches the video information in the corresponding N groups in the library with a similarity greater than threshold X, and the similarity of the audio originals of the two videos is greater than Y, where X and Y are both preset thresholds, the two videos can be judged to be highly similar.
In an optional embodiment, when extracting the key frames, the key frames of the compared video may, for example, be extracted according to the duration of the video to be compared; the formula for calculating the average time difference of the key frames is:
td=total/framenum/fps/60
wherein total is the total frame number of the compared video, framenum is the number of extracted frames, and fps is the frame rate of the compared video.
If the time difference is between (0, 1):
the starting frame calculation formula is:
starttime=[fps*(total/framenum/fps/framenum)]*mu*2+100
wherein total is the total frame number of the compared video, framenum is the number of extracted frames, fps is the frame rate of the compared video, and mu is 1.
End frame calculation formula:
endtime = (formula given only as an image in the original publication)
wherein total is the total frame number of the compared video, framenum is the number of extracted frames, fps is the frame rate of the compared video, mu is 1, and menu is 1.
When time stream information is synchronized, time can be located according to the key frames:
hour = frames / rate / 3600
minute = (frames / rate % 3600) / 60
second = frames / rate % 60
wherein, frames is the video frame number, and rate is the video frame rate.
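A minimal Python sketch of this frame-to-time conversion (an assumption based on the standard frames/rate relation, since the original formulas are published only as images):

```python
def frame_to_timestamp(frames: int, rate: float) -> str:
    """Convert a key-frame index to an hh:mm:ss timestamp."""
    total_seconds = frames / rate              # elapsed time in seconds
    hours = int(total_seconds // 3600)
    minutes = int(total_seconds % 3600 // 60)
    seconds = int(total_seconds % 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```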
In an optional implementation manner, the similarity comparison is sequentially performed between the N key frames of the video to be compared and each key frame of the compared video, including calculating the hash value of each key frame of the video to be compared and the compared video according to a difference hash algorithm; and calculating the Hamming distance between the hash values of the video to be compared and the compared video, and judging whether similar image segments exist between the two videos according to the calculated Hamming distance.
As shown in fig. 2, the method for calculating the hash value of each key frame of the video to be compared and the compared video with the difference hash algorithm includes the following steps:
1) Reduce the images to the same scale; this removes detail while keeping the basic structural features, and speeds up generation of the hash value;
2) Convert the image to grayscale:
graying (the image elements being width, height and depth) is achieved by reducing the three RGB channels to a single channel;
3) Calculate the difference values: subtract each pair of adjacent elements (the right element from the left element) to obtain the specified number N of difference values;
4) Generate the hash value: mark a difference value that is positive or 0 with one symbol, and a negative difference value with the other;
5) Principle of operation:
dA(i, j) = A(i, j) - A(i, j + 1)
wherein A is the pixel value matrix of a certain frame of the video to be compared, and B is the pixel value matrix of a certain frame of the compared video;
dB(i, j) = B(i, j) - B(i, j + 1)
Finally, matrix information with characteristics of (N x N) is obtained; the sign of each number is judged and marked with a 0 or 1, and the Hamming distance (namely, the number of identical characters in the two strings) is then calculated. For example, if the number of identical characters is 8 and the total length of the hash values is 16, the similarity coefficient between the two is 8/16 = 0.5. The similarity coefficient can be adjusted according to requirements, and the chosen coefficient is called the threshold; for example, with a threshold coefficient of 0.9, two images are determined to be similar if their similarity coefficient is greater than 0.9. When the method is applied to video infringement judgment, the compared video is a legitimate video: if the similarity of the two videos is greater than the set similarity coefficient threshold, the video to be compared can be judged as infringing; if the similarity is less than the threshold, it is judged as not infringing.
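As an illustration of the difference-hash comparison described above, a minimal sketch using OpenCV and numpy; the 8 x 9 reduction size and the 0.9 threshold are illustrative assumptions, not values fixed by the text:

```python
import cv2
import numpy as np

def dhash(image_bgr: np.ndarray, size: int = 8) -> np.ndarray:
    """Difference hash: shrink, gray, left-right differences -> 0/1 bits."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)   # single channel
    small = cv2.resize(gray, (size + 1, size))           # size rows, size+1 cols
    diff = small[:, :-1].astype(int) - small[:, 1:].astype(int)  # left - right
    return (diff > 0).astype(np.uint8).flatten()         # positive -> 1, else 0

def similarity(h1: np.ndarray, h2: np.ndarray) -> float:
    """Fraction of identical bits, as in the 8/16 = 0.5 example above."""
    return float(np.sum(h1 == h2)) / h1.size

# two key frames are treated as similar when, for a threshold of 0.9:
# similarity(dhash(frame_a), dhash(frame_b)) > 0.9
```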
In an alternative embodiment, determining a similar image group of the video to be compared according to the similar key frame group and synchronizing time stream information to obtain similar image segments includes determining a starting point and an ending point of the similar image segments, wherein determining the starting point of the similar image segments comprises:
calculating forward and synchronizing time stream information, taking the current similar key frame of the video to be compared and the compared video as the end frame and the previous similar key frame as the start frame (if the current frame is the start frame, the time is calculated from the beginning of the video); performing bisection on the video to be compared and the compared video simultaneously to obtain key frames, and calculating the similarity of the obtained key frames: if the obtained key frames are similar, continuing the bisection to obtain key frames and calculating their similarity; if the obtained key frames are not similar, the currently obtained key frame is the starting point of the similar image segment;
determining the end point of the similar image segment comprises:
calculating backward and synchronizing time stream information, taking the current similar key frame of the video to be compared and the compared video as the start frame and the next similar key frame as the end frame (if the current frame is the end frame, the time is calculated up to the end of the video); performing bisection on the video to be compared and the compared video simultaneously to obtain key frames, and calculating the similarity of the obtained key frames: if the obtained key frames are similar, continuing the bisection to obtain key frames and calculating their similarity; if the obtained key frames are not similar, the currently obtained key frame is the end point of the similar image segment.
When the method is applied to video infringement judgment, if the video to be compared is a pirated video and the compared video is a legitimate video, the start time and end time of the suspected image infringement can be found by the above method, and the suspected image infringement position is located between the starting point and the end point.
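A sketch of the bisection search for the segment boundary, under the simplifying assumption that the two videos have already been aligned via the synchronized time stream so that one frame index addresses both; get_frame_a, get_frame_b and is_similar are hypothetical helpers (is_similar could wrap the difference-hash comparison above):

```python
def find_segment_start(get_frame_a, get_frame_b, is_similar,
                       start: int, end: int) -> int:
    """Bisect between a dissimilar frame (start) and a similar frame (end).

    Returns the first index at which the two videos become similar,
    i.e. the starting point of the similar image segment.
    """
    lo, hi = start, end                  # invariant: lo dissimilar, hi similar
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_similar(get_frame_a(mid), get_frame_b(mid)):
            hi = mid                     # similarity begins at or before mid
        else:
            lo = mid                     # still dissimilar; search right half
    return hi
```

The end point is located symmetrically, bisecting between the last similar key frame and the next dissimilar one.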
In an optional embodiment, comparing the audio file of the video to be compared with the audio file of the compared video further includes: before segmenting the two audio files, calculating their cosine similarity; if the cosine similarity is greater than a preset first threshold, determining that the audio file of the video to be compared and the audio file of the compared video are similar audio files. When the method is applied to video infringement judgment, if the cosine similarity of the audio files corresponding to the two videos is greater than the set threshold, the dubbing is determined to be identical and a suspected infringement exists.
In an optional embodiment, after obtaining similar image segments of the video to be compared and the compared video, the audio segments corresponding to the similar image segments are intercepted for similarity comparison, which includes:
calculating the cosine similarity of the audio segments of the video to be compared and the compared video; if the cosine similarity is greater than or equal to a preset first threshold, the audio segments corresponding to the similar image segments are similar audio segments; if the cosine similarity is smaller than the preset first threshold, segmenting the audio segments corresponding to the similar image segments and comparing the segmented audio segments for similarity, including determining a starting point and an ending point of the similar audio segments, wherein determining the starting point of the similar audio segments includes:
if the similarity of the segmented audio segments of the video to be compared and the compared video is greater than a set second threshold, taking the start time of the video segment as the end time of the audio segment similarity comparison, and the start time of the comparison as that end time minus a time interval T1 (for example, 5 seconds); the audio of the video to be compared and the compared video is segmented sequentially at interval T1 and the segmented audio segments are compared in turn; if the cosine similarity is greater than or equal to the set first threshold, the next segmented audio segment is compared, until the cosine similarity is less than the set first threshold; the start time of the previous similar audio segment before the current audio segment is then the starting point of the similar audio segment;
determining the end point of the similar audio segments includes: taking the end time of the video segment as the start time of the audio segment similarity comparison, and the end time of the comparison as that start time plus a time interval T1 (for example, 5 seconds); the audio of the video to be compared and the compared video is segmented sequentially at interval T1 and the segmented audio segments are compared in turn; if the cosine similarity is greater than or equal to the set first threshold, the next segmented audio segment is compared, until the cosine similarity is less than the set first threshold; the end time of the last similar audio segment is then the end point of the similar audio segment.
When the method is applied to video infringement judgment, if the video to be compared is a pirated video and the compared video is a legitimate video, the start time and end time of the suspected audio infringement can be found by the above method, and the suspected audio infringement position is located between the starting point and the end point.
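A sketch of the backward walk in T1-second steps that locates the starting point of a similar audio segment; sim_at is a hypothetical helper returning the cosine similarity of the two T1-second audio segments starting at time t, and the 5-second interval and 0.9 threshold are illustrative:

```python
def locate_audio_start(sim_at, image_seg_start: float,
                       t1: float = 5.0, threshold: float = 0.9) -> float:
    """Step backwards from the image segment's start while audio stays similar."""
    t = image_seg_start - t1                 # first segment to test
    while t >= 0.0 and sim_at(t) >= threshold:
        t -= t1                              # previous segment is still similar
    return t + t1                            # start of earliest similar segment
```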
In an optional implementation manner, features are respectively extracted from the audio segment of the video to be compared and the audio segment of the compared video, and the two are compared for similarity, as shown in fig. 3, including:
processing the audio clip to obtain the audio data and the sampling rate; for example, if the uploaded file is an .mp3 file, it is converted into a lossless .wav file, and the signal data and sampling rate are then read with scipy.
The maximum frequency of the audio samples is calculated; the sampling frequency must be at least twice the highest frequency in the signal, so hf = sr/2, where sr is the sampling frequency and hf is the maximum frequency.
Pre-emphasis, for example, using a difference equation to implement pre-emphasis, the pre-emphasis equation being:
y(n)=x(n)-ax(n-1)
wherein a is 0.95, and x(n) is the original audio signal, represented as an n × n matrix.
Pre-emphasis mainly removes the influence of lip radiation and boosts the high-frequency resolution of the speech, making the audio comparison more accurate.
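The difference equation above maps directly onto a one-line numpy implementation (a sketch; the first sample is kept unchanged):

```python
import numpy as np

def pre_emphasis(x: np.ndarray, a: float = 0.95) -> np.ndarray:
    """y(n) = x(n) - a * x(n - 1), boosting the high-frequency content."""
    return np.append(x[0], x[1:] - a * x[:-1])
```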
Framing and windowing to obtain a frame array:
the main purposes of framing and windowing are as follows: speech signals are macroscopically unstable, microscopically short-term, and gibbs effects may occur after framing.
In this embodiment, the frame acquisition time length is: wl × sr (wl is the window length, value 25ms, sr is the sampling frequency), step size between adjacent frames: ws × sr (ws is a window interval, value 10ms, sr is the sampling frequency), calculate the total length of the frame:
Figure BDA0003261028370000141
where sl is the total length of the signal, fl is the frame time length, and fs is the step size between adjacent frames.
The sample positions of all frames are then extracted by matrix indexing to obtain a matrix of (total frame count × fl), and the final frame matrix signal is formed by applying the window function.
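A compact framing-and-windowing sketch following the 25 ms window and 10 ms step above; the Hamming window is an assumption, since the text only says "window function":

```python
import numpy as np

def frame_signal(signal: np.ndarray, sr: int,
                 wl: float = 0.025, ws: float = 0.010) -> np.ndarray:
    """Slice the signal into overlapping frames and apply a window."""
    fl = int(round(wl * sr))                  # frame length in samples
    fs = int(round(ws * sr))                  # step between adjacent frames
    total = 1 + int(np.ceil(max(len(signal) - fl, 0) / fs))
    padded = np.append(signal, np.zeros(total * fs + fl - len(signal)))
    idx = fs * np.arange(total)[:, None] + np.arange(fl)[None, :]
    return padded[idx] * np.hamming(fl)       # windowed frame matrix
```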
Calculating the power spectrum after the Fourier transform of each frame: for example, the Fourier transform sp may be computed with the existing numpy scientific tools (if the matrix shape of the frame data is N × L, the shape after numpy.fft.rfft is N × (nfft/2 + 1), with nfft taken as 512). The power spectrum is then calculated to obtain the summed power spectrum; the power spectrum formula is:
pow = |sp|^2 / NFFT
wherein, NFFT takes 512, sp is the value after Fourier transform.
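In numpy, this step is (a sketch matching the |sp|^2 / NFFT formula above):

```python
import numpy as np

def power_spectrum(frames: np.ndarray, nfft: int = 512) -> np.ndarray:
    """Per-frame power spectrum of the framed, windowed signal."""
    sp = np.fft.rfft(frames, nfft)            # shape: N x (nfft // 2 + 1)
    return (np.abs(sp) ** 2) / nfft
```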
A Mel triangular filter bank is calculated to obtain a preliminary feature matrix; by simulating human hearing, small frequency changes at low frequencies become easier to distinguish. The method specifically comprises the following steps:
First, the frequency is converted to the Mel scale: human perception of pitch is nonlinear in frequency, so converting to Mel frequency yields a scale that can be divided linearly; the formula is:
m = 2595 × log10(1 + hz/700.0), wherein hz is the frequency;
and the calculated Mel frequency is converted back to hz with the formula: hz = 700 × (10^(m/2595) - 1), wherein m is the Mel value calculated above,
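The two conversions as plain functions (the 2595 and 700 constants follow the standard mel-scale formulas quoted above):

```python
import numpy as np

def hz_to_mel(hz: float) -> float:
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```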
The converted frequencies are obtained, their corresponding positions in the FFT are found, the filters are established, and the filter matrix is calculated through the filters. The formula is as follows:
Hm(k) = 0, for k < f(m-1)
Hm(k) = (k - f(m-1)) / (f(m) - f(m-1)), for f(m-1) ≤ k ≤ f(m)
Hm(k) = (f(m+1) - k) / (f(m+1) - f(m)), for f(m) ≤ k ≤ f(m+1)
Hm(k) = 0, for k > f(m+1)
wherein m is the number of the filters,
f(m) = floor((N + 1) × fl / W)
where N is 512, fl is the mel-factor, and W is the sampling rate.
Each frame of the energy spectrum is then summed row-wise, with the formula:
E(i) = Σj sp(i, j)
wherein sp is the energy spectrum, i is the matrix row index, and j is the matrix column index.
And then calculating a filtered result by using the filter and the summed energy spectrum, wherein the formula is as follows:
log(sp*fb.T)
wherein sp is the summed energy spectrum and fb is the filter.
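Putting the pieces together, a sketch that builds the triangular filter bank and applies it as log(sp * fb.T), using the hz_to_mel and mel_to_hz helpers above; the even spacing on the mel scale and the bin mapping are standard assumptions rather than details fixed by the text:

```python
import numpy as np

def mel_filterbank(nfilt: int = 26, nfft: int = 512, sr: int = 16000) -> np.ndarray:
    """Triangular filters spaced evenly on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), nfilt + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((nfilt, nfft // 2 + 1))
    for m in range(1, nfilt + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)    # rising slope
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)  # falling slope
    return fb

# filtered = np.log(np.dot(pow_spec, fb.T))   # the log(sp * fb.T) step above
```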
Logarithmic operation is then carried out on the filtered matrix features.
Discrete cosine transform is performed on the obtained logarithm result to concentrate the energy.
For example, the scipy scientific computation package is used for the computation; the discrete cosine transform kernel is:
c(u, y) = cos((2y + 1) × u × π / (2N))
The final feature matrix is obtained after the forward DCT transform, with the specific formula:
F(x, u) = c(u) × Σy f(x, y) × cos((2y + 1) × u × π / (2N)), where c(0) = sqrt(1/N) and c(u) = sqrt(2/N) for u > 0
wherein f (x, y) is a feature matrix after logarithmic operation.
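With scipy, the DCT step reduces to a single call (a sketch; keeping the first 13 coefficients is a common choice, not one specified in the text):

```python
import numpy as np
from scipy.fftpack import dct

def cepstral_features(log_fbank: np.ndarray, ncep: int = 13) -> np.ndarray:
    """DCT-II along the filter axis; energy concentrates in low-order terms."""
    return dct(log_fbank, type=2, axis=1, norm="ortho")[:, :ncep]
```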
The cosine similarity SIM is calculated from the obtained feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video, with the following formula:
SIM = Σ(arr1 × arr2) / (sqrt(Σ arr1^2) × sqrt(Σ arr2^2))
wherein arr1 and arr2 are respectively the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video.
The obtained similarity SIM is compared with the required similarity threshold to judge whether the dubbing is suspected of infringement.
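A sketch of the SIM computation, assuming the two feature matrices have the same shape and are flattened to equal-length vectors before the dot product:

```python
import numpy as np

def cosine_sim(arr1: np.ndarray, arr2: np.ndarray) -> float:
    """Cosine similarity of two feature matrices, per the SIM formula above."""
    a, b = arr1.ravel(), arr2.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0
```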
The present invention also provides a video similarity comparison system, as shown in fig. 4, the system includes:
the acquisition module is used for processing the video set to be compared and the compared video set, and respectively acquiring an image file and an audio file of the video to be compared and an image file and an audio file of the compared video;
the image comparison module is used for comparing the image file of the video to be compared with the image file of the compared video, and comprises:
extracting N key frames from the image file of the video to be compared, and extracting M key frames from the image file of the compared video;
sequentially comparing the N key frames of the video to be compared with each key frame of the compared video for similarity, obtaining similar key frame groups of the video to be compared from the key frame similarity comparison, determining similar image groups of the video to be compared from the similar key frame groups, and synchronizing time stream information to obtain similar image segments;
the audio file comparison module is used for comparing the audio file of the video to be compared with the audio file of the compared video, and comprises:
segmenting and extracting features from the audio file of the video to be compared and the audio file of the compared video respectively, calculating the cosine similarity between the audio segments of the video to be compared and the compared video, and determining the similar audio segments according to the calculated cosine similarity.
After processing is finished, the processed data are stored in a library. When a comparison result needs to be retrieved again, or similar videos need to be compared again later, the video can be read from the video library and checked: it is judged whether the video is in the key frame database, and if so, the corresponding groups of key frames, the corresponding difference hash values, the corresponding time stream information and the audio originals of the library videos are retrieved. The deviation frames are calculated for synchronization with the library video, and the key frame group and audio original of the uploaded video are obtained through its time stream.
The key frame information is stored in a database: the deviation frame rate is calculated to determine the corresponding frames of the library video. N groups of key frames within M minutes, the corresponding difference hash values and the audio originals of the library videos are acquired and stored in the key frame library. By repeating the above comparison method on the key frame groups and key frames of video data uploaded by a user together with the audio files in the database, identical video segments can be searched for and the comparison time greatly shortened.
In suspected-infringement video comparison, the above method applies when the original and pirated video contents are unknown and the comparison is performed purely by computer. If the two video contents to be compared are known and only the suspected infringement evidence needs to be locked, the following methods can be used to increase working efficiency:
1) Dynamically dividing the number of threads: the processing data for which each thread is responsible is dynamically divided according to the number of data items and the maximum thread count.
2) For infringing-content locking, the following situations are handled:
One original video and one pirated video: N key frames are extracted according to the average of the total frame count over the whole video time stream, and corresponding difference hash values and time stream information are generated for the whole key frame group.
Two original videos merged into one video, and one pirated video: N key frames are extracted from each of the two original videos, and difference hash values and corresponding time stream information are generated.
Two pirated videos merged into one, and one original video: N key frames are extracted according to the average of the total frame count over the whole video time stream, and corresponding difference hash values and time stream information are generated for the whole key frame group.
3) Key frame groups within M time periods before and after the time of the pirated frame are extracted:
starttime=[timecover-60*fpscover*frametime]
endtime=[timecover+60*fpscover*frametime]
wherein timecover is the pirated frame number after the determined deviation value is applied, fpscover is the pirated video frame rate, and frametime is the time multiple.
The start and end times are determined and the pirated key frame group is generated:
One original video and one pirated video: the key frame group in the M time period is acquired with the above formulas according to the frame rates of the original and pirated videos.
One original video, two pirated videos merged into one: a key frame group of N × 2 image groups is taken from the original video according to the above formula (because the duration of the original video equals, or approximately equals, the sum of the durations of the two pirated videos); when the frame numbers of the next N video frames reach the corresponding images, the total frame count of the first infringing video is subtracted, thereby obtaining the key frame groups of the second infringing video within the N M-time periods.
Two original videos merged into one video, and one pirated video: N key frames are extracted from each of the two original videos, and the combination of the key frames of the two videos is dynamically allocated according to a mathematical formula. For the key frames of the first original video, the key frame groups within the N M-time periods are taken out; for the key frames of the second original video, the frame numbers are added to the total of the first original video, thereby obtaining its key frame groups within the N M-time periods.
4) The infringing key frame group corresponding to each key frame is compared, and the image with the highest similarity is acquired: the similarity is judged through the Hamming distance, namely the number of identical characters in two strings of equal length, specifically:
A hash value is generated and the similarity calculated: hash values are generated for all pirated key frame groups corresponding to each original key frame, and the Hamming distances are calculated and stored in a sqlite database.
The most similar key frame of the image group within M minutes is acquired: the sqlite database contents are sorted in descending order to obtain the final key frame and its corresponding time node, as well as the corresponding key frame and time node of the original video.
The final data are returned: after the current data are processed, the total durations of the two videos and the corresponding video names are finally obtained.
The invention also relates to an electronic device comprising the server, the terminal and the like. The electronic device includes: at least one processor; a memory communicatively coupled to the at least one processor; and a communication component communicatively coupled to the storage medium, the communication component receiving and transmitting data under control of the processor; wherein the memory stores instructions executable by the at least one processor to implement the method of the above embodiments.
In an alternative embodiment, the memory is used as a non-volatile computer-readable storage medium for storing non-volatile software programs, non-volatile computer-executable programs, and modules. The processor executes various functional applications of the device and data processing, i.e., implements the method, by executing nonvolatile software programs, instructions, and modules stored in the memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store a list of options, etc. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to the external device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory and, when executed by the one or more processors, perform the methods of any of the method embodiments described above.
This product can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to that method; technical details not described in this embodiment can be found in the method provided by the embodiments of the present application.
The present invention also relates to a computer-readable storage medium for storing a computer-readable program for causing a computer to perform some or all of the above-described method embodiments.
That is, as can be understood by those skilled in the art, all or part of the steps of the methods in the embodiments described above may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions for causing a device (which may be a microcontroller, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Furthermore, those of ordinary skill in the art will appreciate that, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
It will be understood by those skilled in the art that while the present invention has been described with reference to exemplary embodiments, various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (10)

1. A video similarity comparison method, characterized by comprising the following steps:
processing a video set to be compared and a compared video set, and respectively acquiring an image file and an audio file of the video to be compared and an image file and an audio file of the compared video;
comparing the image files and the audio files of the video to be compared and the compared video, wherein comparing the image files of the video to be compared and the compared video comprises:
extracting N key frames from the image file of the video to be compared, and extracting M key frames from the image file of the compared video;
sequentially comparing the similarity of the N key frames of the video to be compared with each key frame of the compared video, obtaining a similar key frame group of the videos according to the key frame similarity comparison, determining a similar image group of the videos according to the similar key frame group, and synchronizing time stream information to obtain similar image segments;
and comparing the audio files of the video to be compared and the compared video comprises:
segmenting and extracting features from the audio file of the video to be compared and the audio file of the compared video respectively, calculating the cosine similarity of the audio segments of the video to be compared and the compared video, and determining similar audio segments according to the calculated cosine similarity.
2. The method according to claim 1, wherein sequentially comparing the similarity of the N key frames of the video to be compared with each key frame of the compared video comprises:
calculating a hash value for each key frame of the video to be compared and the compared video according to a difference hash algorithm;
calculating the Hamming distance between the hash values of the video to be compared and the compared video, and judging whether similar image segments exist between the video to be compared and the compared video according to the calculated Hamming distance.
3. The method of claim 1, wherein determining a similar image group of the videos according to the similar key frame group and synchronizing the time stream information to obtain similar image segments comprises determining a starting point and an ending point of the similar image segments, wherein determining the starting point of the similar image segments comprises:
calculating forward and synchronizing time stream information with the current similar key frame of the video to be compared and the compared video as the end frame, wherein the similar key frame preceding the current similar key frame serves as the start frame;
performing bisection on the video to be compared and the compared video simultaneously to acquire key frames, and calculating the similarity of the acquired key frames:
if the acquired key frames are similar, continuing the bisection to acquire key frames and calculating their similarity; if the acquired key frames are not similar, the currently acquired key frame is the starting point of the similar image segment;
determining the ending point of the similar image segments comprises:
calculating backward and synchronizing time stream information with the current similar key frame of the video to be compared and the compared video as the start frame, wherein the similar key frame following the current key frame serves as the end frame;
performing bisection on the video to be compared and the compared video simultaneously to acquire key frames, and calculating the similarity of the acquired key frames:
if the acquired key frames are similar, continuing the bisection to acquire key frames and calculating their similarity; if the acquired key frames are not similar, the currently acquired key frame is the ending point of the similar image segment (an illustrative sketch of this bisection search follows the claims).
4. The method of claim 1, wherein comparing the audio files of the video to be compared and the compared video further comprises:
before the audio files of the video to be compared and the compared video are segmented, calculating the cosine similarity of the two audio files; if the cosine similarity is greater than a preset first threshold, the audio files of the video to be compared and the compared video are determined to be similar audio files.
5. The method of claim 1, wherein, after similar image segments of the video to be compared and the compared video are obtained, intercepting the audio segments corresponding to the similar image segments for similarity comparison comprises:
calculating the cosine similarity of the audio segments of the video to be compared and the compared video, wherein if the cosine similarity is greater than or equal to the preset first threshold, the audio segments corresponding to the similar image segments are similar audio segments;
if the cosine similarity is smaller than the preset first threshold, dividing the audio segments corresponding to the similar image segments and comparing the similarity of the divided audio segments, including determining a starting point and an ending point of the similar audio segments, wherein determining the starting point of the similar audio segments comprises:
if the similarity of the divided audio segments of the video to be compared and the compared video is greater than a set second threshold, taking the start time of the video segment as the end time of the audio segment similarity comparison, and that end time minus a time interval T1 as its start time; sequentially dividing the audio segments of the video to be compared and the compared video at the time interval T1 and comparing their similarity in turn; if the cosine similarity is greater than or equal to the set first threshold, comparing the next divided audio segment, until the cosine similarity is smaller than the set first threshold, whereupon the start time of the last similar audio segment before the current audio segment is taken as the starting point of the similar audio segments;
determining the ending point of the similar audio segments comprises:
taking the end time of the video segment as the start time of the audio segment similarity comparison, and that start time plus a time interval T1 as its end time; sequentially dividing the audio segments of the video to be compared and the compared video at the time interval T1 and comparing their similarity in turn; if the cosine similarity is greater than or equal to the set first threshold, comparing the next divided audio segment, until the cosine similarity is smaller than the set first threshold, whereupon the end time of the last similar audio segment before the current audio segment is taken as the ending point of the similar audio segments (an illustrative sketch of this T1-step search follows the claims).
6. The method according to claim 1, wherein the feature extraction performed respectively on the audio segments of the video to be compared and the compared video comprises the following steps:
step S1: processing the audio segment to obtain the audio data and the sampling rate;
step S2: calculating the maximum frequency of the audio samples, and sampling and quantizing;
step S3: pre-emphasizing the audio signal obtained in step S2;
step S4: framing and windowing the audio signal obtained in step S3 to obtain an array of frames;
step S5: calculating the power spectrum of each frame after Fourier transform;
step S6: applying a Mel triangular filter bank to obtain a preliminary feature matrix;
step S7: performing a logarithmic operation on the filtered matrix features;
step S8: performing a discrete cosine transform on the logarithm result of step S7 to obtain the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video.
7. The method of claim 6, wherein the cosine similarity SIM is calculated from the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video according to the following formula:
$$\mathrm{SIM} = \frac{\sum_{i} arr1_{i} \, arr2_{i}}{\sqrt{\sum_{i} arr1_{i}^{2}} \; \sqrt{\sum_{i} arr2_{i}^{2}}}$$
wherein arr1 and arr2 are respectively the feature matrix of the audio segment of the video to be compared and the feature matrix of the audio segment of the compared video (an illustrative sketch of steps S1 to S8 and this formula follows the claims).
8. A video similarity comparison system, comprising:
the acquisition module is used for processing a video set to be compared and a compared video set, and respectively acquiring an image file and an audio file of the video to be compared and an image file and an audio file of the compared video;
the image comparison module is used for comparing the image file of the video to be compared with the image file of the compared video, and comprises:
extracting N key frames from the image file of the video to be compared, and extracting M key frames from the image file of the compared video;
sequentially comparing the similarity of the N key frames of the video to be compared with each key frame of the compared video, obtaining a similar key frame group of the videos according to the key frame similarity comparison, determining a similar image group of the videos according to the similar key frame group, and synchronizing time stream information to obtain similar image segments;
the audio file comparison module is used for comparing the audio file of the video to be compared with the audio file of the compared video, and comprises:
segmenting and extracting features from the audio file of the video to be compared and the audio file of the compared video respectively, calculating the cosine similarity of the audio segments of the video to be compared and the compared video, and determining similar audio segments according to the calculated cosine similarity.
9. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, the computer program being executable by a processor for implementing the method according to any one of claims 1-7.
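The following sketches are editorial illustrations, not part of the claims. First, a minimal Python rendering of the boundary searches in claims 3 and 5; the helper predicates is_similar and audio_similarity, the function names, and all parameter values are assumptions rather than anything specified by the patent:

```python
# Illustrative sketch only; `is_similar` and `audio_similarity` are assumed
# helpers (e.g. dHash + Hamming distance for frames, cosine similarity of
# feature matrices for audio), not functions defined by the patent.

def find_image_boundary(is_similar, t_dissimilar, t_similar, tol=0.04):
    """Bisect between a dissimilar time point and a similar time point in
    both videos simultaneously; converges on the start (or end) of the
    similar image segment.  tol ~ one frame at 25 fps (an assumption)."""
    lo, hi = t_dissimilar, t_similar
    while abs(hi - lo) > tol:
        mid = (lo + hi) / 2.0
        if is_similar(mid):
            hi = mid  # mid is similar: boundary lies toward the dissimilar side
        else:
            lo = mid  # mid is dissimilar: boundary lies toward the similar side
    return hi

def extend_audio_start(audio_similarity, seg_start, t1, first_threshold):
    """Step backwards from the video segment's start time in windows of
    length T1; stop when a window falls below the first threshold and
    return the start of the last similar window (claim 5, starting point)."""
    end = seg_start
    while audio_similarity(end - t1, end) >= first_threshold:
        end -= t1
    return end
```

Second, steps S1 to S8 of claim 6 read as a standard MFCC-style pipeline, and claim 7 as the cosine similarity of the resulting matrices; the sketch below follows that reading, with conventional defaults (0.97 pre-emphasis, 25 ms frames, 10 ms step, 26 filters, 13 coefficients) that the patent does not itself specify:

```python
# Illustrative sketch of steps S1-S8 (MFCC-style features) and the claim-7
# cosine similarity.  Parameter values and helper names are assumptions.
import numpy as np
from scipy.fftpack import dct
from scipy.io import wavfile

def audio_features(path, n_filters=26, n_ceps=13, frame_len=0.025, frame_step=0.01, nfft=512):
    rate, signal = wavfile.read(path)                   # S1: audio data + sampling rate
    signal = signal.astype(np.float64)
    if signal.ndim > 1:
        signal = signal.mean(axis=1)                    # downmix to mono
    # S2: the maximum representable frequency is the Nyquist limit, rate / 2
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # S3: pre-emphasis
    flen, fstep = int(rate * frame_len), int(rate * frame_step)
    emphasized = np.pad(emphasized, (0, max(0, flen - len(emphasized))))
    n_frames = 1 + (len(emphasized) - flen) // fstep
    frames = np.stack([emphasized[i * fstep:i * fstep + flen] for i in range(n_frames)])
    frames = frames * np.hamming(flen)                  # S4: framing + Hamming window
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft  # S5: power spectrum per frame
    # S6: Mel triangular filter bank between 0 Hz and the Nyquist frequency
    high_mel = 2595 * np.log10(1 + (rate / 2) / 700)
    hz = 700 * (10 ** (np.linspace(0, high_mel, n_filters + 2) / 2595) - 1)
    bins = np.floor((nfft + 1) * hz / rate).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    feat = power @ fbank.T                              # preliminary feature matrix
    feat = np.log(np.where(feat == 0, np.finfo(float).eps, feat))  # S7: logarithm
    return dct(feat, type=2, axis=1, norm="ortho")[:, :n_ceps]     # S8: DCT

def cosine_similarity(arr1, arr2):
    """SIM = (arr1 . arr2) / (|arr1| |arr2|) on flattened feature matrices;
    truncating to a common length is an assumption for unequal durations."""
    a, b = arr1.ravel(), arr2.ravel()
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Under these assumptions, cosine_similarity(audio_features("a.wav"), audio_features("b.wav")) would produce the SIM value that claims 4 and 5 compare against the first threshold.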
Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination