CN117333790A - Similarity judging method and device for video events and electronic equipment - Google Patents

Similarity judging method and device for video events and electronic equipment

Info

Publication number
CN117333790A
CN117333790A (application CN202310441698.XA)
Authority
CN
China
Prior art keywords
video event
video
event
content
content frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310441698.XA
Other languages
Chinese (zh)
Inventor
汪昭辰 (Wang Zhaochen)
刘世章 (Liu Shizhang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Chenyuan Technology Information Co ltd
Original Assignee
Qingdao Chenyuan Technology Information Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Chenyuan Technology Information Co ltd filed Critical Qingdao Chenyuan Technology Information Co ltd
Priority to CN202310441698.XA priority Critical patent/CN117333790A/en
Publication of CN117333790A publication Critical patent/CN117333790A/en
Pending legal-status Critical Current


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 — Arrangements for image or video recognition or understanding
    • G06V10/40 — Extraction of image or video features
    • G06V10/70 — Arrangements using pattern recognition or machine learning
    • G06V10/74 — Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 — Proximity, similarity or dissimilarity measures
    • G06V20/00 — Scenes; Scene-specific elements
    • G06V20/40 — Scenes; Scene-specific elements in video content
    • G06V20/44 — Event detection

Abstract

The application provides a similarity judging method and device for video events, and an electronic device, and relates to the field of video processing. Feature data of a video event are calculated based on the content frames of the video event; the similarity rate of a first video event and a second video event is calculated from their respective feature data; and whether the two video events are similar is judged from the relation between this similarity rate and a first threshold. The method and device can improve the detection precision of video events and the efficiency of event-level similarity judgment in massive videos.

Description

Similarity judging method and device for video events and electronic equipment
Technical Field
The present invention relates to the field of video processing, and in particular, to a method and apparatus for determining similarity of video events, and an electronic device.
Background
With the development of Internet technology, unauthorized re-posting of videos on the Internet is increasingly common: an original video may be copied, in whole or in part, to a video platform and played without permission, or re-edited and recombined. To protect the rights and interests of original creators, the prior art compares the similarity of two videos by means of video content comparison.
The existing video content comparison methods rely on video fingerprint comparison, for example using image color histogram features as video fingerprints, or video fingerprints based on the two-dimensional discrete cosine transform of video images.
However, such fingerprints have poor noise immunity and low image detection accuracy, which degrades the similarity determination result. For example, after a video undergoes aspect-ratio conversion, or its frames are transformed by adding a watermark or the like, the substantive content of the video is unchanged, yet the resulting fingerprint differs from the original one; that is, the detection accuracy of the existing video event similarity determination methods is not high.
In addition, existing video similarity comparison based on deep learning and neural networks depends heavily on a sample library: a model must be trained on a large number of sample videos, so the training cost is high, the training time is long, the noise resistance is poor, and video content comparison is inefficient.
Disclosure of Invention
In view of this, the purpose of the present application is to provide a method, an apparatus and an electronic device for determining the similarity of video events, which address the low accuracy and low efficiency of existing video event similarity determination methods.
Based on the above object, in a first aspect, the present application proposes a method for determining the similarity of video events. The method includes: acquiring a video to be detected, and obtaining video events based on the video to be detected, the video events including a first video event and a second video event. A video event is the set of all content frames in a shot; a content frame is a frame that represents the shot content; the content frames comprise a first frame, a last frame and N intermediate frames, N being a natural number; an intermediate frame is obtained by calculating the difference rate between each subframe of the shot, other than the first frame and the last frame, and the previous content frame, and is selected when that difference rate is greater than a preset threshold. The method further includes: calculating feature data of a video event based on the content frames of the video event, and calculating the similarity rate of the first video event and the second video event according to the feature data of the first video event and the feature data of the second video event; and judging whether the similarity rate of the first video event and the second video event is greater than or equal to a first threshold; if so, the first video event and the second video event are determined to be similar, and if not, they are determined to be dissimilar.
Optionally, obtaining a video event based on the video to be detected includes: preprocessing the video to be detected to obtain a normalized video; granulating the normalized video to obtain a lens sequence and a content frame sequence; and obtaining a video event sequence according to the shot sequence and the content frame sequence, wherein the video event sequence comprises at least one video event.
Optionally, the feature data include the number of content frames of a video event and the content frame feature matrix of the video event, and the number of content frames of the second video event is greater than or equal to the number of content frames of the first video event. Calculating the similarity rate of the first video event and the second video event according to their feature data includes: calculating, in sequence starting from the first content frame of the first video event, the content frame difference rate between each content frame of the first video event and the content frames of the second video event, to obtain a target content frame corresponding one-to-one to each content frame of the first video event, the target content frames belonging to the second video event; and calculating the similarity rate of the first video event and the second video event according to the content frame difference rate between each content frame of the first video event and its corresponding target content frame and the number of content frames of the first video event.
Optionally, a similarity ratio calculation formula for calculating a similarity ratio of the first video event and the second video event is:
$$\mathrm{Sim}(q,p) \;=\; 1 - \frac{1}{n_q}\sum_{i=1}^{n_q} d_i\,, \qquad n_p \ge n_q$$

where q denotes the first video event and p the second video event, n_q is the number of content frames of the first video event q, n_p is the number of content frames of the second video event p, Sim(q,p) is the similarity rate of the first video event q and the second video event p, and d_i is the content frame difference rate between the i-th content frame of the first video event q and its corresponding target content frame, the target content frame belonging to the second video event p.
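As an illustration, the similarity rate calculation described here can be sketched in a few lines of Python. This is a hedged sketch: it assumes the similarity rate is one minus the mean content frame difference rate over the first event's content frames, and `event_similarity` is our name, not the patent's.

```python
def event_similarity(diff_rates):
    """Similarity rate of first video event q versus second video event p.

    diff_rates[i] is the content frame difference rate (in [0, 1]) between
    the i-th content frame of q and its target content frame in p; the list
    length equals the number of content frames of q.
    """
    n_q = len(diff_rates)
    return 1.0 - sum(diff_rates) / n_q
```

Under this reading, two events are judged similar when the result is greater than or equal to the first threshold.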
Optionally, there are a plurality of second video events, and the method includes: performing feature extraction on the plurality of second video events to obtain a video event feature database containing the feature data of each second video event, where the feature data of a second video event include the number of content frames of the second video event, the feature vector of the second video event and the content frame feature matrix of the second video event; and performing video event similarity determination between the first video event and each second video event in the video event feature database.
Optionally, performing video event similarity determination on the first video event and each second video event in the video event feature database includes: calculating an absolute value of a difference in the number of content frames of the first video event and the second video event; determining that the first video event and the second video event are dissimilar if the absolute value of the content frame number difference is greater than a second threshold; and judging whether the first video event and the second video event are similar or not according to the feature vector difference rate of the first video event and the second video event under the condition that the absolute value of the content frame number difference value is smaller than or equal to a second threshold value.
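The content frame count pre-filter above can be sketched as follows (a minimal illustration; the function and parameter names are ours):

```python
def passes_count_prefilter(n_q, n_p, second_threshold):
    """Cheap first stage of the cascade: events whose content frame counts
    differ by more than the second threshold are declared dissimilar
    without any further feature comparison."""
    return abs(n_q - n_p) <= second_threshold
```

Only pairs that pass this filter proceed to the feature vector comparison.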
Optionally, determining whether the first video event and the second video event are similar according to the feature vector difference rate of the first video event and the second video event includes: obtaining the feature vector of the first video event and the module of the feature vector of the first video event according to the feature vector of the content frame of the first video event, and obtaining the feature vector of the second video event and the module of the feature vector of the second video event according to the feature vector of the content frame of the second video event; obtaining a feature vector difference value of the first video event and the second video event according to the feature vector of the first video event and the feature vector of the second video event; obtaining the feature vector difference rates of the first video event and the second video event according to the feature vector difference values of the first video event and the second video event, the feature vector modulus of the first video event and the feature vector modulus of the second video event; determining that the first video event and the second video event are dissimilar if a feature vector difference rate of the first video event and the second video event is greater than a third threshold; and judging whether the first video event and the second video event are similar or not according to the content frame difference rate of the first video event and the second video event under the condition that the feature vector difference rate of the first video event and the second video event is smaller than or equal to the third threshold value.
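The feature vector stage might look like the sketch below. Two points are assumptions rather than the patent's stated method: the event feature vector is taken as the component-wise sum of its content frame feature vectors, and the difference rate is normalized by the mean of the two moduli.

```python
import math

def feature_vector(frame_vectors):
    """Event-level feature vector: here, the component-wise sum of the
    content frame feature vectors (one plausible aggregation)."""
    return [sum(c) for c in zip(*frame_vectors)]

def vector_diff_rate(vq, vp):
    """Difference rate built from the difference value (Euclidean distance)
    and the two moduli; normalising by the mean modulus is our choice."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(vq, vp)))
    norm = (math.sqrt(sum(a * a for a in vq)) +
            math.sqrt(sum(b * b for b in vp))) / 2
    return diff / norm if norm else 0.0
```

Pairs whose `vector_diff_rate` exceeds the third threshold are dissimilar; the rest go on to the per-frame comparison.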
Optionally, the number of content frames of the second video event is greater than or equal to the number of content frames of the first video event, and determining whether the first video event and the second video event are similar according to the content frame difference rate of the first video event and the second video event includes: judging whether any content frame of the first video event has a target content frame in the second video event, judging whether the content frame difference rate of each content frame of the first video event and the corresponding target content frame is smaller than or equal to a fourth threshold value, and if not, determining that the first video event is dissimilar to the second video event; if yes, executing the step of judging whether the similarity ratio of the first video event and the second video event is greater than or equal to a first threshold value; the judging condition for judging whether the content frames of the first video event and the second video event are similar or not according to the content frame difference rate of the first video event and the second video event is as follows:
$$d(p_j, q_i) \;=\; \begin{cases} d_0(p_j, q_i) - \varepsilon, & d_0(p_j, q_i) \le T_e \\ d_0(p_j, q_i), & \text{otherwise,} \end{cases} \qquad d(p_j, q_i) \le T_4$$

where d(p_j, q_i) is the content frame difference rate between the first video event q and the second video event p for the pair (p_j, q_i), p_j is the j-th content frame of event p with 1 ≤ j ≤ n_p, q_i is the i-th content frame of event q with 1 ≤ i ≤ n_q, d_0(p_j, q_i) is the original difference rate between the j-th content frame of event p and the i-th content frame of event q, ε is an inherent error, T_e is the preset threshold for calculating the error, and T_4 is the fourth threshold.
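One way to read the judging condition, where an inherent measurement error is subtracted from small raw difference rates before comparing against the fourth threshold, is sketched below. The case split is our interpretation of the (partially garbled) original formula, and all names are hypothetical.

```python
def compensated_diff_rate(d0, inherent_error, error_threshold):
    """Apply the inherent-error correction to a raw content frame
    difference rate d0 when d0 is within the error threshold; the exact
    form of the correction in the patent is unclear, so this is one
    plausible reading."""
    if d0 <= error_threshold:
        return max(d0 - inherent_error, 0.0)
    return d0

def content_frames_similar(d0, inherent_error, error_threshold, fourth_threshold):
    """Two content frames are similar when the compensated difference
    rate does not exceed the fourth threshold."""
    return compensated_diff_rate(d0, inherent_error, error_threshold) <= fourth_threshold
```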
In a second aspect, there is also provided a similarity determining apparatus for video events, the apparatus including: the image processing module is used for acquiring a video to be detected, and obtaining a video event based on the video to be detected, wherein the video event comprises a first video event and a second video event; the video event is a set of all content frames in a shot, the content frames are frames representing shot content, the frames comprise a first frame, a last frame and N intermediate frames, N is a natural number, the intermediate frames are obtained when the difference rate is larger than a preset threshold value through difference rate calculation of all subframes of the shot except the first frame and the last frame and the previous content frame; the computing module is used for computing the characteristic data of the video event based on the content frame of the video event, and computing the similarity rate of the first video event and the second video event according to the characteristic data of the first video event and the characteristic data of the second video event; the judging module is used for judging whether the similarity ratio of the first video event and the second video event is larger than or equal to a first threshold value, if so, determining that the first video event and the second video event are similar, and if not, determining that the first video event and the second video event are dissimilar.
In a third aspect, there is also provided an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor runs the computer program to implement the method of the first aspect.
In a fourth aspect, there is also provided a computer readable storage medium having stored thereon a computer program for execution by a processor to perform the method of any of the first aspects.
Overall, the present application has at least the following benefits:
the video event is obtained by normalizing and granulating the video, so that each video event can be calculated under the same coordinate system, and then the similarity of the first video event and the second video event is obtained by calculating the characteristic data of the video event, so that whether the two video events are similar or not can be judged according to the similarity of the first video event and the second video event, and the method has the characteristic of accurate calculation. In addition, when similarity judgment is carried out on the first video event and the massive second video event, a video event feature database is formed by carrying out normalization and feature extraction on the massive second video event, and further, when comparison is carried out, only feature data of the first video event are required to be extracted, and then similarity of the first video event feature data and the second video event feature data is compared according to the feature data of the first video event feature data and the feature data of the second video event feature data, so that comparison efficiency of massive images is improved.
Drawings
In the drawings, the same reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily drawn to scale. It is appreciated that these drawings depict only some embodiments according to the disclosure and are not therefore to be considered limiting of its scope. The exemplary embodiments of the present invention and the descriptions thereof are for explaining the present invention and do not constitute an undue limitation of the present invention. In the drawings:
FIG. 1 is a schematic diagram of an application environment of an alternative video event similarity determination method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an application environment of another alternative method for determining similarity of video events according to an embodiment of the present invention;
FIG. 3 is a flowchart showing steps of a method for determining similarity of video events according to an embodiment of the present invention;
FIG. 4 shows a schematic view of a granulating structure according to an embodiment of the invention;
FIG. 5 is a schematic diagram of content frame selection according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating steps for determining whether a first video event is similar to a second video event according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a video event similarity determination apparatus according to an exemplary embodiment of the present invention;
Fig. 8 shows a schematic diagram of an electronic device according to an embodiment of the application.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In one aspect of the embodiment of the present invention, a method for determining the similarity of video events is provided. As an alternative implementation, the method may be applied, but is not limited, to the application environment shown in fig. 1. The application environment includes: a terminal device 102 that interacts with a user, a network 104 and a server 106. Human-machine interaction can be performed between the user 108 and the terminal device 102, in which a similarity judging application for video events runs. The terminal device 102 includes a man-machine interaction screen 1022, a processor 1024 and a memory 1026. The man-machine interaction screen 1022 is used for displaying images; the processor 1024 is configured to acquire a video to be detected and perform similarity determination of video events on it. The memory 1026 is used to store the feature data of the video events.
In addition, the server 106 includes a database 1062 and a processing engine 1064, where the database 1062 is used to store the feature data of the video event. The processing engine 1064 is configured to: acquiring a video to be detected, and acquiring a video event based on the video to be detected, wherein the video event comprises a first video event and a second video event; calculating feature data of the video event based on the content frame of the video event, and calculating similarity rates of the first video event and the second video event according to the feature data of the first video event and the feature data of the second video event; judging whether the similarity ratio of the first video event and the second video event is larger than or equal to a first threshold value, if so, determining that the first video event and the second video event are similar, and if not, determining that the first video event and the second video event are dissimilar.
In one or more embodiments, the method for determining the similarity of video events described above may be applied to the application environment shown in fig. 2. As shown in fig. 2, a human-machine interaction may be performed between a user 202 and a user device 204. The user device 204 includes a memory 206 and a processor 208. The user equipment 204 in this embodiment may, but is not limited to, perform the similarity determination of the video event with reference to performing the operations performed by the terminal equipment 102.
Optionally, the terminal device 102 and the user device 204 include, but are not limited to, a mobile phone, a tablet computer, a notebook computer, a PC, a vehicle-mounted electronic device, a wearable device, and the like, and the network 104 may include, but is not limited to, a wireless network or a wired network. Wherein the wireless network comprises: WIFI and other networks that enable wireless communications. The wired network may include, but is not limited to: wide area network, metropolitan area network, local area network. The server 106 may include, but is not limited to, any hardware device that may perform calculations. The server may be a single server, a server cluster composed of a plurality of servers, or a cloud server. The above is merely an example, and is not limited in any way in the present embodiment.
In the related art, video fingerprint comparison is mostly adopted, for example using image color histogram features as video fingerprints, or video fingerprints based on the two-dimensional discrete cosine transform of video images.
However, such fingerprints have poor noise immunity and low video detection accuracy, which degrades the similarity determination result. For example, after a video undergoes aspect-ratio conversion, or its frames are transformed by adding a watermark or the like, the substantive content of the video is unchanged, yet the resulting fingerprint differs from the original one; that is, the detection accuracy of the existing video event similarity determination methods is not high.
In addition, existing video similarity comparison based on deep learning and neural networks depends heavily on a sample library: a model must be trained on a large number of sample videos, so the training cost is high, the training time is long, the noise resistance is poor, and video content comparison is inefficient.
In order to solve the above technical problems, as an optional implementation manner, the embodiment of the present invention provides a method for determining similarity of video events.
Fig. 3 shows a flowchart of steps of a method for determining similarity of video events according to an embodiment of the present application. As shown in FIG. 3, the method for judging the similarity of the video events comprises the following steps S301 to S303:
S301, acquiring a video to be detected, and acquiring a video event based on the video to be detected.
In this embodiment, the video to be detected may be a video from one or more resource libraries, may be a video specified by a user, and may be a video from the internet.
In this embodiment, obtaining a video event based on a video to be detected includes: preprocessing a video to be detected to obtain a normalized video; granulating the normalized video to obtain a shot sequence and a content frame sequence; and obtaining a video event sequence according to the shot sequence and the content frame sequence, wherein the video event sequence comprises at least one video event.
The preprocessing includes, but is not limited to, decomposing the video to be detected into frames, extracting picture-in-picture images, removing borders from images in the video, and the like, and normalizing the video images in resolution, aspect ratio, color space, etc., so that the resulting normalized video has the same dimensions, which facilitates granulation processing and content analysis.
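The resolution normalization step can be illustrated with a tiny nearest-neighbour resize. In practice a media library (e.g. OpenCV or FFmpeg) would do this; the following is only a self-contained sketch operating on a 2-D grayscale image stored as a list of rows.

```python
def normalize_resolution(img, out_w, out_h):
    """Nearest-neighbour resize of a 2-D grayscale image (list of rows),
    so that all videos share the same frame dimensions before comparison."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]
```

For example, `normalize_resolution([[1, 2], [3, 4]], 4, 4)` upsamples a 2x2 image to 4x4 by pixel repetition.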
Fig. 4 shows a schematic diagram of the granulation structure. Referring to fig. 4, the granulation structure of a video comprises the video, a frame sequence, shots and content frames. The frame sequence is all frames representing the video content; a shot is a continuous picture segment captured by a camera between a start time and a stop time, and is the basic unit of video composition; a content frame is a frame representing the shot content. The content frames comprise a first frame, a last frame and N intermediate frames, N being a natural number; an intermediate frame is obtained by calculating the difference rate between each subframe of the shot, other than the first frame and the last frame, and the previous content frame, and is selected when that difference rate is greater than a preset threshold.
In this embodiment, the granulation process refers to performing shot segmentation on the video to obtain the granulation structure of the video. The principle is as follows: the video content is composed of a continuous frame sequence, which can be divided into several groups according to the continuity of the content, and each group of consecutive frames is a shot. By analyzing the differences in content within a shot, a small number of frames are selected from the consecutive frame sequence to represent the content of the shot, i.e., the content frames. The content frames always include at least the first and last frames of the shot (the shot frames), so the number of content frames of a shot is greater than or equal to 2.
Fig. 5 is a schematic diagram of content frame extraction according to an embodiment of the present invention. As shown in fig. 5, the first frame is the first content frame, and the difference rates of the 2nd, 3rd and subsequent frames with respect to it are calculated until one exceeds the preset threshold; suppose the 4th frame does, so the 4th frame is the second content frame. The difference rates of the 5th, 6th and subsequent frames with respect to the 4th frame are then calculated: if the difference rates of the 5th, 6th and 7th frames are below the preset threshold and that of the 8th frame exceeds it, the 8th frame is the third content frame. By analogy, the content frames among all subframes between the first frame and the last frame are calculated. The last frame is selected directly as the final content frame, without calculating its difference rate against the previous content frame. The difference rate is the calculated difference between two frames of images.
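The selection procedure just described can be sketched as a single pass over the shot's frames. The `diff_rate` callable stands in for whatever inter-frame difference measure is used; here the frames themselves are opaque values.

```python
def select_content_frames(frames, diff_rate, threshold):
    """Keep the first frame, every subframe whose difference rate from the
    previous content frame exceeds the threshold, and the last frame."""
    content = [frames[0]]            # the first frame is always a content frame
    for f in frames[1:-1]:           # intermediate subframes only
        if diff_rate(content[-1], f) > threshold:
            content.append(f)
    content.append(frames[-1])       # the last frame is always kept
    return content
```

With frames represented by their mean gray level and a normalized absolute difference, `select_content_frames([0, 1, 2, 10, 11, 12, 20], lambda a, b: abs(a - b) / 100, 0.05)` keeps frames 0, 10 and 20.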
For example, consider a surveillance video. At night there are few people and cars, the picture changes little, and few content frames result; perhaps only a handful are extracted within 10 hours. In the daytime there are many people and vehicles, the people and objects in the picture change frequently, and far more content frames are calculated than at night. Content frames are thus guaranteed not to lose any of the shot's content information, unlike key frames, which may lose part of the shot content. Compared with schemes that compute over every frame of the video, selecting content frames uses only part of the video image frames, which greatly reduces the amount of image computation without losing content.
In this embodiment, the video event refers to a set of all content frames in a shot, and specifically, obtaining the video event based on the video to be detected includes: preprocessing a video to be detected to obtain a normalized video, performing shot segmentation on the normalized video to obtain at least one shot, generating shot data (shot sequence) according to the at least one shot, extracting a content frame of each shot, generating content frame data (content frame sequence) according to the content frame of each shot, and combining the shot data and the content frame data of each shot to obtain each video event to form a video event sequence.
In this embodiment, the video event includes a first video event and a second video event, where the first video event and the second video event may be different video events from the same video event sequence, or may be different video events from different videos to be detected.
S302, calculating feature data of the video event based on the content frame of the video event, and calculating the similarity ratio of the first video event and the second video event according to the feature data of the first video event and the feature data of the second video event.
In this embodiment, the feature data include the number of content frames of a video event and the content frame feature matrix of the video event, where the content frame feature matrix comprises the feature matrix of each content frame contained in the video event. The feature matrix of a content frame may be obtained from the uniform LBP features of the content frame, which reflect the content characteristics of the video event well; in an alternative example, the feature matrix of a content frame may also be obtained from other features of the content frame, such as histogram features, SIFT features, HOG features and Haar features, which are not enumerated here.
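As a brief aside on the uniform LBP feature mentioned above: an 8-bit local binary pattern is "uniform" when its circular bit string contains at most two 0/1 transitions, which yields 58 uniform patterns (all other codes usually share one histogram bin). The check is a few lines; this is background on the standard definition, not code from the patent.

```python
def is_uniform(pattern, bits=8):
    """True if the circular binary pattern has at most two 0/1 transitions,
    i.e. it is a 'uniform' local binary pattern."""
    transitions = sum(
        ((pattern >> i) & 1) != ((pattern >> ((i + 1) % bits)) & 1)
        for i in range(bits)
    )
    return transitions <= 2
```

For example, `0b00001111` is uniform (two transitions) while `0b01010101` is not (eight).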
In this embodiment, the number of content frames of the second video event is greater than or equal to that of the first video event, and calculating the similarity ratio of the first and second video events from their feature data includes: sequentially calculating, starting from the first content frame of the first video event, the content frame difference rate between each content frame of the first video event and the content frames of the second video event, to obtain target content frames in one-to-one correspondence with the content frames of the first video event, where the target content frames belong to the second video event; and then calculating the similarity ratio of the first and second video events from the content frame difference rate between each content frame of the first video event and its corresponding target content frame, together with the number of content frames of the first video event.
For example, if the first video event has 5 content frames and the second has 7, the first content frame of the first video event is compared in turn against the first, second, ..., sixth, and seventh content frames of the second video event. If its difference rate with the first content frame of the second video event is greater than the preset difference rate threshold (the fourth threshold), but its difference rate with the second content frame of the second video event is below that threshold, the second content frame of the second video event is taken as the target content frame of the first content frame of the first video event, i.e. the first target content frame. Matching then continues in order, so that the third, fourth, fifth, sixth, and seventh content frames of the second video event serve in turn as candidate targets for the second, third, fourth, and fifth content frames of the first video event, yielding 5 target content frames in total, all belonging to the second video event. Conversely, suppose some content frame of the first video event, say the first, has a difference rate greater than the preset threshold with the first, second, third, and every remaining content frame of the second video event; then 5 target content frames cannot be found in the second video event, i.e. not every content frame of the first video event has a target content frame there. In that case the first and second video events are determined to be dissimilar, indicating that there exist frames of distinct content.
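The sequential target frame matching in this example can be sketched as follows. This is a hedged illustration: the advance-only matching order is inferred from the text, and the function and variable names are ours.

```python
def find_target_frames(q_frames, p_frames, diff_rate, t4):
    """Match each content frame of the first event q, in order, to the next
    content frame of the second event p whose difference rate is at most t4.
    Return the list of matched indices in p, or None when some frame of q
    has no target (the events are then dissimilar)."""
    targets = []
    j = 0
    for qi in q_frames:
        while j < len(p_frames) and diff_rate(qi, p_frames[j]) > t4:
            j += 1
        if j == len(p_frames):
            return None           # no target for this frame of q
        targets.append(j)
        j += 1                    # matching continues from the next frame of p
    return targets

# Toy demo mirroring the text: q's first frame fails against p's first frame
# but matches p's second; the remaining frames then match in order.
q = [10, 20, 30, 40, 50]
p = [90, 11, 21, 29, 41, 49, 70]
print(find_target_frames(q, p, lambda a, b: abs(a - b), 3))  # [1, 2, 3, 4, 5]
```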
After obtaining the target content frames corresponding to each content frame of the first video event one by one according to the method, calculating the similarity ratio of the first video event and the second video event according to the content frame difference ratio of each content frame of the first video event and the corresponding target content frame and the content frame number of the first video event.
Specifically, the similarity ratio of the first video event q and the second video event p is calculated as:

$$S_{qp} = 1 - \frac{1}{n_q}\sum_{i=1}^{n_q} d_i, \qquad n_q \le n_p$$

where $n_q$ is the number of content frames of the first video event q, p denotes the second video event, $n_p$ is the number of content frames of the second video event, $S_{qp}$ denotes the similarity ratio of the first video event q and the second video event p, and $d_i$ is the content frame difference rate between the i-th content frame of the first video event q and its corresponding target content frame, the target content frame belonging to the second video event.
It can be understood that when video event A contains more content frames than video event B, event B cannot contain event A, so A cannot be judged similar to B on that basis; instead, whether A is similar to B is judged by checking whether A contains all the content frames of B. That is, in this embodiment, when two video events have the same number of content frames they are compared directly, and when the numbers differ, the event with fewer content frames is compared against the event with more. The words "first" and "second" in first video event and second video event are only for distinction and imply no ordering.
S303, judging whether the similarity ratio of the first video event and the second video event is greater than or equal to a first threshold value, if so, determining that the first video event and the second video event are similar, and if not, determining that the first video event and the second video event are dissimilar.
In the present embodiment, when the condition $S_{qp} \ge T_1$ is satisfied, the similarity of the first and second video events is high within the allowable error range; that is, the difference rate between each content frame of the first video event and its corresponding target content frame in the second video event is below the preset difference rate threshold, and the overall similarity ratio is at least the first threshold, so the first and second video events are determined to be similar. If $S_{qp} \ge T_1$ is not satisfied, the difference rate between the first and second video events is large and they are determined to be dissimilar, where $T_1$ is the first threshold.
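Under the reconstruction above, in which the similarity ratio is one minus the mean per-frame difference rate, the S303 decision can be illustrated as follows. The threshold value is illustrative, not a value from the patent.

```python
def similarity_ratio(diff_rates):
    # S_qp = 1 - mean of the difference rates between each content frame
    # of event q and its target content frame in event p (an assumed
    # reconstruction of the formula in the text).
    return 1.0 - sum(diff_rates) / len(diff_rates)

T1 = 0.8  # illustrative first threshold
drs = [0.1, 0.2, 0.0, 0.1]   # per-frame difference rates for the 4 matched pairs
s = similarity_ratio(drs)
print(s >= T1)  # True: the two events are judged similar
```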
The video event similarity judging method provided by this embodiment can accurately judge the similarity ratio between two video events from their feature data, and further judge whether the two events are similar according to that ratio; it has the characteristic of accurate calculation.
When there are many videos to be detected, for example when one video event must be compared for similarity against a massive number of video events, computing with the content frame feature matrices alone is too expensive. This embodiment therefore further provides a method for judging similarity between a small number of video events and a massive number of them. For example, when there are multiple second video events, the method of this embodiment compares a first video event against the massive set formed by the multiple second video events, which at this point includes: extracting features from the multiple second video events to obtain a video event feature database containing the feature data of each second video event, and performing video event similarity judgment between the first video event and each second video event in the database.
The feature data of a second video event includes the number of its content frames, its feature vector, and its content frame feature matrix. The feature data of the second video events can be extracted in advance, and directly retrieved from the video event feature database when judging similarity between the first video event and any second video event, which improves the speed of on-line calculation.
For example, aspect ratio, resolution, and color space normalization and granulation are performed on the massive videos one by one to obtain multiple second video events; feature data of each second video event are then extracted, and the feature data set of the multiple second video events serves as the video event feature database.
Further, when comparing a first video event with a massive number of second video events, comparing the first video event fully against every second video event would require an enormous amount of calculation, so this embodiment can exclude second video events dissimilar to the first through preset conditions in order to reduce the calculation load.
In this embodiment, performing video event similarity determination on the first video event and each second video event in the video event feature database includes: calculating an absolute value of a content frame number difference value of the first video event and the second video event; determining that the first video event and the second video event are dissimilar in the case that the absolute value of the content frame number difference is greater than a second threshold; and under the condition that the absolute value of the content frame number difference value is smaller than or equal to a second threshold value, judging whether the first video event and the second video event are similar or not according to the feature vector difference rate of the first video event and the second video event.
For example, let p denote a second video event and q the first video event, with $n_p$ and $n_q$ their respective numbers of content frames. When

$$|n_p - n_q| > T_2$$

the numbers of content frames contained in the first and second video events differ greatly, where $T_2$ is the second threshold. For example, if the first video event has 9 content frames and the second has 20, the counts differ greatly and the two are directly judged to be dissimilar.
When $|n_p - n_q| \le T_2$, the difference in content frame counts is small, and the first and second video events may be either similar or dissimilar. For example, if the first video event has 5 content frames and the second has 7, the 2 extra content frames of the second video event may be pictures repeating the 5 content frames of the first video event, or pictures different from all 5 of them. Therefore, whether the first and second video events are similar is judged further according to their feature vector difference rate.
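The frame-count pre-filter reduces to a single comparison; the threshold value below is illustrative.

```python
def count_prefilter_passes(n_q, n_p, t2):
    # |n_p - n_q| > t2 -> declare the events dissimilar with no further work;
    # otherwise fall through to the feature vector comparison.
    return abs(n_p - n_q) <= t2

T2 = 5  # illustrative second threshold
print(count_prefilter_passes(9, 20, T2))   # False: judged dissimilar directly
print(count_prefilter_passes(5, 7, T2))    # True: proceed to feature vectors
```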
Specifically, determining whether the first video event and the second video event are similar according to the feature vector difference rates of the first video event and the second video event includes: obtaining the feature vector of the first video event and the module of the feature vector of the first video event according to the feature vector of the content frame of the first video event, and obtaining the feature vector of the second video event and the module of the feature vector of the second video event according to the feature vector of the content frame of the second video event; according to the feature vector of the first video event and the feature vector of the second video event, obtaining a feature vector difference value of the first video event and the second video event; and obtaining the feature vector difference rates of the first video event and the second video event according to the feature vector difference values of the first video event and the second video event, the modulus of the feature vector of the first video event and the modulus of the feature vector of the second video event.
In this embodiment, the feature vector of the content frame is obtained by calculating the feature matrix of the content frame, and the feature vector of the video event is obtained according to the feature vector of the content frame.
In this embodiment, the feature vector of a video event is denoted EV, with dimension 3481, and each component $EV_k$ is calculated as:

$$EV_k = \frac{1}{n}\sum_{i=1}^{n} V_{i,k}, \qquad k = 1, \dots, 3481$$

where $n$ is the number of content frames in the video event and $V_{i,k}$ is the vector value in the k-th dimension of the i-th content frame of the video event.
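The per-dimension averaging that produces EV can be sketched as follows; the dimension is shortened from 3481 to 4 to keep the toy data readable.

```python
def event_feature_vector(frame_vectors):
    # EV_k = (1/n) * sum over the n content frames of their k-th component.
    n = len(frame_vectors)
    dim = len(frame_vectors[0])
    return [sum(fv[k] for fv in frame_vectors) / n for k in range(dim)]

frames = [[1.0, 0.0, 2.0, 1.0],   # content frame 1 feature vector
          [3.0, 0.0, 0.0, 1.0]]   # content frame 2 feature vector
print(event_feature_vector(frames))  # [2.0, 0.0, 1.0, 1.0]
```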
Further, the feature vector difference value $\Delta EV_{pq}$ of the first and second video events is:

$$\Delta EV_{pq} = \sum_{k=1}^{3481} \left| EV_{p,k} - EV_{q,k} \right|$$

where p denotes the second video event, q denotes the first video event, $EV_{p,k}$ is the value of the k-th dimension of the second video event, and $EV_{q,k}$ is the value of the k-th dimension of the first video event.
Further, the feature vector difference rate of the first and second video events is calculated as:

$$DR_{pq} = \frac{\Delta EV_{pq}}{\min(\mathrm{modEV}_p, \mathrm{modEV}_q)}$$

where $\Delta EV_{pq}$ is the feature vector difference value of the first and second video events, $\mathrm{modEV}_p$ is the modulus of the feature vector of the second video event, and $\mathrm{modEV}_q$ is the modulus of the feature vector of the first video event; the minimum of the two moduli is taken so that the denominator is not zero, and when $\mathrm{modEV}_p$ and $\mathrm{modEV}_q$ are both zero, $DR_{pq} = 0$.

The modulus modEV of an event feature vector is calculated as:

$$\mathrm{modEV} = \sqrt{\sum_{k=1}^{3481} EV_k^2}$$
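The difference-rate computation can be sketched as below. The Euclidean modulus is an assumption of this sketch; the both-zero case is defined as 0 per the text, while the text leaves the single-zero case unspecified.

```python
import math

def ev_difference_rate(ev_p, ev_q):
    # Difference value: L1 distance between the two event feature vectors.
    # Rate: that value over the smaller of the two vector moduli.
    delta = sum(abs(a - b) for a, b in zip(ev_p, ev_q))
    mod_p = math.sqrt(sum(a * a for a in ev_p))
    mod_q = math.sqrt(sum(b * b for b in ev_q))
    if mod_p == 0.0 and mod_q == 0.0:
        return 0.0                 # both-zero convention from the text
    return delta / min(mod_p, mod_q)

print(ev_difference_rate([3.0, 4.0], [0.0, 5.0]))  # 0.8
```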
After the feature vector difference rate of the first and second video events is obtained, whether the two events are similar is judged by comparing that difference rate against a third threshold.
When the feature vector difference rate of the first and second video events is greater than the third threshold, i.e. $DR_{pq} > T_3$, the feature vectors of the two events differ greatly, and the first and second video events are determined to be dissimilar, where $T_3$ is the third threshold.
When the feature vector difference rate is less than or equal to the third threshold, i.e. $DR_{pq} \le T_3$, the feature vector difference between the two events is small; however, $DR_{pq}$ cannot be used directly as the criterion for whether events p and q are similar, since the events may still be dissimilar even when $DR_{pq}$ is small. Therefore, whether the first and second video events are similar is further judged according to their content frame difference rates.
Specifically, in this embodiment, judging whether the first and second video events are similar according to their content frame difference rates, where the number of content frames of the second video event is greater than or equal to that of the first, includes: judging whether every content frame of the first video event has a target content frame in the second video event, i.e. whether the content frame difference rate between each content frame of the first video event and a corresponding target content frame is less than or equal to the fourth threshold; if not, the first and second video events are determined to be dissimilar.
The judging condition for content frame similarity between the first and second video events through their content frame difference rate is:

$$Dr(q_i, p_j) = Dr_0(q_i, p_j) - \Delta E \le T_4$$

where $Dr(q_i, p_j)$ is the content frame difference rate between content frame $q_i$ of the first video event q and content frame $p_j$ of the second video event p; $p_j$ is the j-th content frame of event p, $j \in [1, n_p]$; $q_i$ is the i-th content frame of event q, $i \in [1, n_q]$; $Dr_0(q_i, p_j)$ is the original difference rate between the two content frames; $\Delta E$ is the inherent error, obtained from a preset error-calculation threshold; and $T_4$ is the fourth threshold.
In the present embodiment, $Dr_0$ is calculated as:

$$Dr_0(q_i, p_j) = \frac{\Delta F_{ij}}{\min(\mathrm{modF}_{q_i}, \mathrm{modF}_{p_j})}$$

where $\Delta F_{ij}$ is the content frame difference value between the i-th content frame of the first video event and the j-th content frame of the second video event, $\mathrm{modF}_{p_j}$ is the modulus of the feature matrix of the j-th content frame of the second video event, and $\mathrm{modF}_{q_i}$ is the modulus of the feature matrix of the i-th content frame of the first video event; the minimum is taken so that the denominator is not 0, and when $\mathrm{modF}_{q_i}$ and $\mathrm{modF}_{p_j}$ are both 0, $Dr_0 = 0$.
It will be appreciated that when any one of the content frames of the first video event does not have a target content frame in the second video event, it is stated that the first video event and the second video event have distinct content frames, which are dissimilar. It should be noted that, the method for determining whether the second video event has the target content frame corresponding to each content frame of the first video event is consistent with the method for determining the target content frame in step S302, and the first content frame of the first video event and the content frame of the second video event are sequentially calculated to obtain the target content frame corresponding to each content frame of the first video event.
When some content frame of the first video event has a difference rate greater than the fourth threshold with every content frame of the second video event, no target content frame exists for it in the second video event; that content frame differs greatly from all content frames of the second video event, the two events contain distinct frames, and the first and second video events are determined to be dissimilar.
When target content frames in one-to-one correspondence with the content frames of the first video event exist in the second video event, and the content frame difference rate between each content frame of the first video event and its corresponding target content frame is less than or equal to the fourth threshold $T_4$, the single-frame comparisons all show small differences. However, a video event is formed by multiple content frames in continuous order, so single-frame difference rates cannot represent the difference rate of the whole event. Therefore, to obtain a more accurate judgment, when every content frame of the first video event has a target content frame in the second video event and each such difference rate is at most the fourth threshold, the step of judging whether the similarity ratio of the first and second video events is greater than or equal to the first threshold is executed; that is, step S303 judges whether the similarity ratio is at least the first threshold, determining the two events similar if so and dissimilar if not. To avoid repetition, the description is not repeated here.
Fig. 6 is a flowchart showing the steps for judging whether a first video event is similar to a second video event. Referring to Fig. 6, the steps include steps S601 to S604 as follows:
s601, judging whether the absolute value of the difference value of the number of the content frames of the first video event and the second video event is smaller than or equal to a second threshold value, if yes, executing step S602, and if not, determining that the first video event and the second video event are dissimilar.
S602, judging whether the vector difference rate of the video event feature vector of the first video event and the video event feature vector of the second video event is smaller than or equal to a third threshold value, if yes, executing step S603, and if not, determining that the first video event and the second video event are dissimilar.
S603, judging whether any content frame of the first video event has a target content frame in the second video event, wherein the difference rate of each content frame of the first video event and the content frame of the corresponding target content frame is smaller than or equal to a fourth threshold value, if yes, executing step S604, and if not, determining that the first video event and the second video event are dissimilar.
S604, judging whether the similarity ratio of the first video event and the second video event is greater than or equal to a first threshold value, if so, determining that the first video event and the second video event are similar, and if not, determining that the first video event and the second video event are dissimilar.
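The four-stage flow of steps S601 to S604 can be condensed into one decision function. To keep the sketch short, the results of the first three screens are passed in as precomputed values, and the similarity ratio uses the same assumed one-minus-mean reconstruction discussed earlier.

```python
def events_similar(count_ok, vector_ok, matched_diff_rates, t1):
    if not count_ok:                  # S601: content frame count screen
        return False
    if not vector_ok:                 # S602: event feature vector screen
        return False
    if matched_diff_rates is None:    # S603: some frame of q has no target in p
        return False
    s = 1.0 - sum(matched_diff_rates) / len(matched_diff_rates)
    return s >= t1                    # S604: final similarity ratio test

print(events_similar(True, True, [0.1, 0.1], 0.8))  # True  (ratio 0.9)
print(events_similar(True, True, None, 0.8))        # False (fails at S603)
```

The early returns mirror the cascade's purpose: each cheap screen removes dissimilar candidates before the more expensive per-frame comparison runs.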
According to the method provided by this embodiment for judging similarity between a first video event and massive second video events, the massive second video events are normalized and their features extracted to form a feature database, so that at comparison time only the feature data of the first video event need be extracted; the similarity of the first video event to the massive second video events is then compared from the feature data of both, improving the comparison efficiency for massive video events. Moreover, the massive second video events can be screened according to the different kinds of video event feature data, gradually eliminating second video events dissimilar to the first, which further improves the speed of similarity comparison over massive video events.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
The following is an embodiment of a device for determining similarity of video events according to the present invention, which may be used to execute the method embodiment of the present invention. For details not disclosed in the embodiment of the apparatus for determining similarity of video events according to the present invention, please refer to the embodiment of the method according to the present invention.
Fig. 7 is a schematic structural diagram of a video event similarity determination apparatus according to an exemplary embodiment of the present invention. The similarity determining means of the video event may be implemented as all or part of the terminal by software, hardware or a combination of both. The similarity determining device 700 for video events includes:
the image processing module 701 is configured to obtain a video to be detected, and obtain a video event based on the video to be detected, where the video event includes a first video event and a second video event; the video event is a set of all content frames in a shot, the content frames are frames representing the shot content, the frames comprise a first frame, a last frame and N intermediate frames, N is a natural number, the intermediate frames are obtained when the difference rate is larger than a preset threshold value through difference rate calculation of all sub-frames of the shot except the first frame and the last frame and the previous content frame.
The calculating module 702 is configured to calculate feature data of the video event based on the content frame of the video event, and calculate a similarity ratio of the first video event and the second video event according to the feature data of the first video event and the feature data of the second video event.
The judging module 703 is configured to judge whether a similarity ratio of the first video event to the second video event is greater than or equal to a first threshold, if so, determine that the first video event is similar to the second video event, and if not, determine that the first video event is dissimilar to the second video event.
In one example, there are a plurality of second video events, and the image processing module 701 is further configured to perform feature extraction on the plurality of second video events to obtain a video event feature database containing the feature data of each second video event, where the feature data of a second video event includes the number of its content frames, its feature vector, and its content frame feature matrix; the judging module 703 is further configured to perform video event similarity judgment on the first video event and each second video event in the video event feature database.
It should be noted that, when the similarity judging device for video events provided in the above embodiment executes the similarity judging method for video events, the division into the above functional modules is used only for illustration; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the similarity judging device for video events provided in the above embodiment belongs to the same concept as the method embodiments; its detailed implementation procedure is embodied in the method embodiments and is not described here again.
According to the present application, video events are obtained through normalization and granulation of the videos, so every video event can be calculated under the same coordinate system; the similarity ratio of the first and second video events is obtained by calculating the feature data of the video events, and whether two video events are similar can be judged from that similarity ratio, giving the characteristic of accurate calculation. In addition, when judging similarity between a first video event and massive second video events, a video event feature database is formed by normalizing the massive second video events and extracting their features, so that at comparison time only the feature data of the first video event need be extracted; similarity is then compared from the feature data of both, improving comparison efficiency for massive video events. Moreover, the massive second video events can be screened according to different video event features, gradually eliminating second video events dissimilar to the first and further improving the speed of similarity comparison over massive video events.
The embodiment of the application also provides an electronic device corresponding to the video event similarity judging method provided by the previous embodiment, so as to execute the video event similarity judging method.
Fig. 8 shows a schematic diagram of an electronic device according to an embodiment of the application. As shown in fig. 8, the electronic device 800 includes: a memory 801 and a processor 802, the memory 801 storing a computer program executable on the processor 802, the processor 802 executing the methods provided by any of the previous embodiments of the present application when the computer program is executed.
Alternatively, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of the computer network.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the steps of the above-described similarity determination method of video events by a computer program.
Alternatively, as will be appreciated by those skilled in the art, the structure shown in Fig. 8 is merely illustrative, and the electronic device may be a smart phone (such as an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile internet device (Mobile Internet Devices, MID), a PAD, or another terminal device. Fig. 8 does not limit the structure of the above electronic device. For example, the electronic device may also include more or fewer components (such as a network interface) than shown in Fig. 8, or have a different configuration from that shown in Fig. 8.
The memory 801 may be used to store software programs and modules, such as program instructions/modules corresponding to the method and apparatus for determining similarity of video events in the embodiments of the present invention, and the processor 802 executes the software programs and modules stored in the memory 801 to perform various functional applications and data processing, that is, implement the method for determining similarity of video events described above. The memory 801 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory. In some examples, the memory 801 may further include memory remotely located relative to the processor 802, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 801 may be used, among other things, for storing image characteristic data. As an example, the memory 801 may include, but is not limited to, various modules in the similarity determination device including the video event. In addition, other module units in the above-mentioned similarity determination device for video events may be included, but are not limited to, and are not described in detail in this example.
Optionally, the electronic device includes a transmission device 803 configured to receive or transmit data via a network. Specific examples of the network include wired and wireless networks. In one example, the transmission device 803 includes a Network Interface Controller (NIC) that can be connected to other network devices and routers via a network cable to communicate with the internet or a local area network. In another example, the transmission device 803 is a Radio Frequency (RF) module used to communicate with the internet wirelessly.
In addition, the electronic device further includes: a display 804, configured to display the first determination result or the second determination result; and a connection bus 805 for connecting the respective module parts in the above-described electronic apparatus.
The present embodiments provide a computer program product or a computer program comprising computer instructions stored in a computer-readable storage medium. When executed, the computer program performs the steps of any of the method embodiments described above.
Alternatively, in the present embodiment, the above-described computer-readable storage medium may be configured to store a computer program for executing the steps of the similarity determination method of video events.
Alternatively, in this embodiment, those skilled in the art will understand that all or part of the steps in the methods of the above embodiments may be performed by a program instructing terminal device hardware, where the program may be stored in a computer-readable storage medium, and the storage medium may include a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disc, and the like.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The integrated units in the above embodiments may be stored in the above-described computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the essence of the technical solution of the present invention, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to perform all or part of the steps of the methods of the various embodiments of the present invention.
In the foregoing embodiments of the present invention, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely exemplary; for example, the division of units is merely a logical functional division, and there may be other division manners in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be through some interfaces, units, or modules, and may be in electrical or other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations shall also fall within the protection scope of the present invention.

Claims (11)

1. A method for determining similarity of video events, the method comprising:
acquiring a video to be detected, and obtaining a video event based on the video to be detected, wherein the video event comprises a first video event and a second video event; the video event is a set of all content frames in a shot, a content frame is a frame representing shot content, the content frames comprise a first frame, a last frame, and N intermediate frames, N being a natural number; the intermediate frames are obtained by calculating, for each subframe of the shot other than the first frame and the last frame, the difference rate with respect to the previous content frame, a subframe being taken as an intermediate frame when its difference rate is greater than a preset threshold value;
calculating feature data of the video event based on the content frames of the video event, and calculating a similarity rate of the first video event and the second video event according to the feature data of the first video event and the feature data of the second video event;
judging whether the similarity rate of the first video event and the second video event is greater than or equal to a first threshold value; if so, determining that the first video event and the second video event are similar, and if not, determining that the first video event and the second video event are dissimilar.
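The content-frame definition in claim 1 can be sketched in code. This is an illustration only, not the patented implementation: the frame representation and the difference-rate function are hypothetical stand-ins.

```python
def extract_content_frames(frames, diff_rate, preset_threshold):
    """Sketch of the claim-1 content-frame definition: keep the first frame,
    the last frame, and every intermediate subframe whose difference rate
    with respect to the previous content frame exceeds a preset threshold."""
    if not frames:
        return []
    if len(frames) == 1:
        return [frames[0]]
    content = [frames[0]]  # the first frame is always a content frame
    for sub in frames[1:-1]:  # subframes other than the first and last frames
        if diff_rate(content[-1], sub) > preset_threshold:
            content.append(sub)  # becomes the next content frame
    content.append(frames[-1])  # the last frame is always a content frame
    return content

# Toy usage with scalar "frames" and an absolute-difference rate:
frames = [0.0, 0.05, 0.5, 0.55, 1.0]
content = extract_content_frames(frames, lambda a, b: abs(a - b), 0.3)
# content == [0.0, 0.5, 1.0]
```

Small intermediate changes are discarded, so the shot is summarized by a compact set of representative frames.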
2. The method of claim 1, wherein obtaining a video event based on the video to be detected comprises:
preprocessing the video to be detected to obtain a normalized video;
granulating the normalized video to obtain a shot sequence and a content frame sequence;
and obtaining a video event sequence according to the shot sequence and the content frame sequence, wherein the video event sequence comprises at least one video event.
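The three-stage pipeline of claim 2 can be sketched as plain function composition. The stage functions here are hypothetical placeholders, not the patent's concrete algorithms:

```python
def video_to_event_sequence(video, preprocess, granulate, assemble_events):
    """Claim-2 pipeline sketch: preprocess normalizes the video, granulate
    yields a shot sequence and a content frame sequence, and assemble_events
    groups them into a video event sequence (at least one event)."""
    normalized = preprocess(video)
    shots, content_frames = granulate(normalized)
    return assemble_events(shots, content_frames)

# Toy usage with trivial stand-in stages:
events = video_to_event_sequence(
    "raw",
    lambda v: v.upper(),                  # normalization placeholder
    lambda v: ([v], [v.lower()]),         # granulation placeholder
    lambda s, c: list(zip(s, c)),         # event assembly placeholder
)
# events == [("RAW", "raw")]
```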
3. The feature data comprises a content frame number of a video event and a content frame feature matrix of the video event, the content frame number of the second video event being greater than or equal to the content frame number of the first video event, and calculating the similarity rate of the first video event and the second video event according to the feature data of the first video event and the feature data of the second video event comprises:
calculating, starting from the first content frame of the first video event, content frame difference rates with the content frames of the second video event in sequence, to obtain target content frames corresponding one-to-one to the content frames of the first video event, wherein the target content frames belong to the second video event;
and calculating the similarity rate of the first video event and the second video event according to the content frame difference rate of each content frame of the first video event and its corresponding target content frame, and the number of content frames of the first video event.
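The target-frame pairing of claim 3 can be sketched as follows. The selection rule used here (smallest difference rate) is an assumption for illustration; the claim itself only requires a one-to-one correspondence:

```python
def match_target_frames(q_frames, p_frames, diff_rate):
    """For each content frame of the first event q, select a target content
    frame in the second event p. The matching rule (minimum difference rate)
    is an assumed stand-in, not the patent's exact procedure."""
    return [min(p_frames, key=lambda pf: diff_rate(qf, pf)) for qf in q_frames]

# Scalar toy frames: each q frame is paired with the nearest p frame.
targets = match_target_frames([0.1, 0.9], [0.0, 0.5, 1.0], lambda a, b: abs(a - b))
# targets == [0.0, 1.0]
```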
4. The method of claim 3, wherein a similarity ratio calculation formula for calculating a similarity ratio of the first video event and the second video event is:
Sim(q, p) = 1 - (1/n_q) * Σ(i=1..n_q) D_i,  with n_q ≤ n_p,

wherein q represents the first video event, p represents the second video event, n_q is the number of content frames of the first video event, n_p is the number of content frames of the second video event, Sim(q, p) represents the similarity rate of the first video event q and the second video event p, and D_i is the content frame difference rate of the i-th content frame of the first video event q and its corresponding target content frame, the target content frame belonging to the second video event.
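One plausible reading of the claim-4 similarity rate, assumed here to be one minus the average of the per-content-frame difference rates, can be sketched as:

```python
def similarity_rate(frame_diff_rates):
    """Assumed form of the claim-4 similarity rate: 1 minus the mean of the
    per-content-frame difference rates between the first event's content
    frames and their target frames in the second event."""
    if not frame_diff_rates:
        raise ValueError("the first video event must have at least one content frame")
    return 1.0 - sum(frame_diff_rates) / len(frame_diff_rates)

# Three content frames with difference rates 0.1, 0.3, 0.2 -> mean 0.2:
rate = similarity_rate([0.1, 0.3, 0.2])
# rate ≈ 0.8
```

Identical frame pairs (all difference rates zero) give a similarity rate of 1.0, the maximum.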
5. The method of claim 1, wherein there are a plurality of second video events, the method further comprising:
extracting features of the plurality of second video events to obtain a video event feature database containing the feature data of each second video event, wherein the feature data of a second video event comprises the number of content frames of the second video event, the feature vector of the second video event, and the content frame feature matrix of the second video event;
and performing video event similarity determination between the first video event and each second video event in the video event feature database.
6. The method of claim 5, wherein performing a video event similarity determination for the first video event and each second video event in the video event feature database comprises:
calculating an absolute value of a difference in the number of content frames of the first video event and the second video event;
determining that the first video event and the second video event are dissimilar if the absolute value of the content frame number difference is greater than a second threshold;
and judging whether the first video event and the second video event are similar according to the feature vector difference rate of the first video event and the second video event if the absolute value of the content frame number difference is less than or equal to the second threshold value.
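The first stage of the claim-6 cascade is a cheap count-based filter. A minimal sketch (names are illustrative, not the patent's):

```python
def passes_frame_count_filter(n_q, n_p, second_threshold):
    """Claim-6 coarse filter: if the content-frame counts of the two events
    differ by more than the second threshold, the events are declared
    dissimilar and no finer (more expensive) comparison is performed."""
    return abs(n_q - n_p) <= second_threshold

# Events with 10 and 12 content frames pass a threshold of 3; 10 vs 20 does not.
```

Running this test before any feature comparison prunes most candidates in a large feature database at negligible cost.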
7. The method of claim 6, wherein said determining whether the first video event and the second video event are similar based on feature vector difference rates of the first video event and the second video event comprises:
obtaining the feature vector of the first video event and the modulus of the feature vector of the first video event according to the feature vectors of the content frames of the first video event, and obtaining the feature vector of the second video event and the modulus of the feature vector of the second video event according to the feature vectors of the content frames of the second video event;
obtaining a feature vector difference value of the first video event and the second video event according to the feature vector of the first video event and the feature vector of the second video event;
obtaining the feature vector difference rate of the first video event and the second video event according to the feature vector difference value of the first video event and the second video event, the modulus of the feature vector of the first video event, and the modulus of the feature vector of the second video event;
determining that the first video event and the second video event are dissimilar if a feature vector difference rate of the first video event and the second video event is greater than a third threshold;
and judging whether the first video event and the second video event are similar according to the content frame difference rate of the first video event and the second video event if the feature vector difference rate of the first video event and the second video event is less than or equal to the third threshold value.
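The feature-vector stage of claim 7 can be sketched as below. Two details are assumptions not fixed by the claim: the event feature vector is taken as the component-wise sum of the content-frame feature vectors, and the difference rate is normalized by the mean of the two vector moduli.

```python
import math

def event_feature_vector(frame_vectors):
    """Assumed aggregation: component-wise sum of the content-frame
    feature vectors of the event."""
    return [sum(component) for component in zip(*frame_vectors)]

def feature_vector_difference_rate(vq, vp):
    """Assumed normalization: Euclidean norm of the difference vector
    divided by the mean of the two feature-vector moduli."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(vq, vp)))
    mod_q = math.sqrt(sum(a * a for a in vq))
    mod_p = math.sqrt(sum(b * b for b in vp))
    return diff / ((mod_q + mod_p) / 2.0)

# Identical vectors give a difference rate of 0; the rate is then compared
# with the third threshold to decide whether to proceed to frame-level checks.
```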
8. The method of claim 7, wherein the number of content frames of the second video event is greater than or equal to the number of content frames of the first video event, and wherein determining whether the first video event and the second video event are similar based on the content frame difference rates of the first video event and the second video event comprises:
judging whether each content frame of the first video event has a target content frame in the second video event, and judging whether the content frame difference rate of each content frame of the first video event and the corresponding target content frame is less than or equal to a fourth threshold value; if not, determining that the first video event and the second video event are dissimilar;
if yes, executing the step of judging whether the similarity rate of the first video event and the second video event is greater than or equal to the first threshold value;
the judging condition for judging the similarity of the content frames of the first video event and the second video event through the content frame difference rate of the first video event and the second video event is:

D(q_i, p_j) = D_0(q_i, p_j) - ε_1 - ε_2 ≤ T_4,

wherein D(q_i, p_j) is the content frame difference rate of the first video event q and the second video event p, p_j is the j-th content frame of event p with 1 ≤ j ≤ n_p, q_i is the i-th content frame of event q with 1 ≤ i ≤ n_q, D_0(q_i, p_j) is the original difference rate between the j-th content frame of event p and the i-th content frame of event q, ε_1 is an inherent error, ε_2 is a calculating error, and T_4 is the fourth threshold.
9. A similarity determining device for video events, the device comprising:
the image processing module is used for acquiring a video to be detected and obtaining a video event based on the video to be detected, wherein the video event comprises a first video event and a second video event; the video event is a set of all content frames in a shot, a content frame is a frame representing shot content, the content frames comprise a first frame, a last frame, and N intermediate frames, N being a natural number; the intermediate frames are obtained by calculating, for each subframe of the shot other than the first frame and the last frame, the difference rate with respect to the previous content frame, a subframe being taken as an intermediate frame when its difference rate is greater than a preset threshold value;
the computing module is used for calculating feature data of the video event based on the content frames of the video event, and calculating a similarity rate of the first video event and the second video event according to the feature data of the first video event and the feature data of the second video event;
the judging module is used for judging whether the similarity rate of the first video event and the second video event is greater than or equal to a first threshold value; if so, determining that the first video event and the second video event are similar, and if not, determining that the first video event and the second video event are dissimilar.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor runs the computer program to implement the method of any one of claims 1-8.
11. A computer readable storage medium having stored thereon a computer program, wherein the program is executed by a processor to implement the method of any of claims 1-8.

Publications (1)

Publication Number Publication Date
CN117333790A (en) 2024-01-02


Similar Documents

Publication Publication Date Title
CN111950653B (en) Video processing method and device, storage medium and electronic equipment
CN111062871B (en) Image processing method and device, computer equipment and readable storage medium
CN108229419B (en) Method and apparatus for clustering images
CN108280477B (en) Method and apparatus for clustering images
CN110853033B (en) Video detection method and device based on inter-frame similarity
US11042991B2 (en) Determining multiple camera positions from multiple videos
CN109376256B (en) Image searching method and device
CN111667001B (en) Target re-identification method, device, computer equipment and storage medium
CN111652331B (en) Image recognition method and device and computer readable storage medium
CN113344016A (en) Deep migration learning method and device, electronic equipment and storage medium
CN114330565A (en) Face recognition method and device
CN114612987A (en) Expression recognition method and device
CN110135428B (en) Image segmentation processing method and device
CN116188805B (en) Image content analysis method and device for massive images and image information network
CN110633630B (en) Behavior identification method and device and terminal equipment
CN116958267A (en) Pose processing method and device, electronic equipment and storage medium
CN117333790A (en) Similarity judging method and device for video events and electronic equipment
CN115393755A (en) Visual target tracking method, device, equipment and storage medium
CN112257666B (en) Target image content aggregation method, device, equipment and readable storage medium
CN113572983A (en) Cloud video processing method and system
CN116188821B (en) Copyright detection method, system, electronic device and storage medium
CN112214639A (en) Video screening method, video screening device and terminal equipment
CN116188822B (en) Image similarity judging method, device, electronic equipment and storage medium
CN117152650A (en) Video content analysis method and video event information network for massive videos
CN116091984B (en) Video object segmentation method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination