CN115103223B - Video content detection method, device, equipment and storage medium - Google Patents

Video content detection method, device, equipment and storage medium

Info

Publication number
CN115103223B
CN115103223B (application CN202210628167.7A)
Authority
CN
China
Prior art keywords
video
video frame
target
frame
playing time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210628167.7A
Other languages
Chinese (zh)
Other versions
CN115103223A (en)
Inventor
王佃勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Video Technology Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Video Technology Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Video Technology Co Ltd and MIGU Culture Technology Co Ltd
Priority to CN202210628167.7A
Publication of CN115103223A
Application granted
Publication of CN115103223B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs, involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a video content detection method, apparatus, device, and storage medium, belonging to the technical field of video processing. The method includes: obtaining a first target video and a second target video of the same video set, where the first target video includes a plurality of first video frames, the second target video includes a plurality of second video frames, and both kinds of frames carry playing timestamps; matching the first video frames with the second video frames one by one according to the playing timestamps to obtain a plurality of video frame groups; screening out, from the video frame groups, similar video frame groups whose similarity satisfies a preset condition, and obtaining the target playing timestamps of those groups; determining at least one continuous playing time period from the target playing timestamps; and determining the video frames in the same video set that correspond to the continuous playing time period as the similar video content of the video set. The application can identify similar content within the same video set more accurately.

Description

Video content detection method, device, equipment and storage medium
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for detecting video content.
Background
In the related art, a television episode collection is a video set whose episodes share fixed, uniformly edited similar content. The similar content may be a head (opening) portion, a tail (ending) portion, or an in-video advertising portion.
When identifying similar content of the same video set, there are two ways:
First, a video picture sample is compared with the video pictures to be identified; when the video pictures to be identified continuously match the video sample, the position of the corresponding time point is determined as the video head, and the time point corresponding to the last matching picture is determined as the head ending time. In this sample-based comparison the selection of the video picture sample is particularly important, but video samples have no uniform standard, because the pictures of the videos to be compared may be uneven in quality. This tends to result in relatively low accuracy of similar content identification.
Second, play data of the video object, that is, big-data statistics of users closing or skipping the video, is obtained to compute the rate of change of the close or skip operations over the playing time of the video object; the moment at which the rate of change exceeds a certain threshold is determined as the head ending moment, the tail beginning moment, or the boundary of an in-video advertising portion. However, recognition based on video play data requires the video object to be already released and available; for a newly launched video object, or one with few samples or no play data, similar content recognition accuracy is low.
Disclosure of Invention
The main purpose of the application is to provide a video content detection method, apparatus, device, and storage medium, aiming to solve the problem that existing video similar-content detection and identification accuracy is low.
To achieve the above object, in a first aspect, the present application provides a video content detection method, including:
acquiring a first target video and a second target video; the first target video and the second target video belong to the same video set, the first target video comprises a plurality of first video frames, the second target video comprises a plurality of second video frames, and the first video frames and the second video frames have play time stamps;
according to the playing time stamp, a plurality of first video frames and a plurality of second video frames are matched one by one to obtain a plurality of video frame groups;
screening out similar video frame groups with similarity meeting preset conditions from a plurality of video frame groups, and obtaining target playing time stamps of the similar video frame groups; the similarity is the picture similarity between the first video frame and the second video frame in each video frame group;
determining at least one continuous playing time period according to the plurality of target playing time stamps; the difference value of the playing time stamps between two adjacent target playing time stamps in the continuous playing time period is smaller than a preset time threshold;
and determining a plurality of video frames corresponding to the continuous playing time period in the same video set as similar video contents of the same video set.
In an embodiment, the selecting the group of similar video frames with similarity satisfying a preset condition from the plurality of groups of video frames includes:
determining a difference value hash of the first video frame and the second video frame in each video frame group respectively;
and determining the similarity of each video frame group according to the difference value hash.
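The similarity between two difference value hashes is typically evaluated as a normalized Hamming distance. A minimal sketch of this step (not taken verbatim from the patent; the bit-string hash representation and function names are illustrative assumptions):

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Number of bit positions at which the two hash strings differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("difference hashes must have equal length")
    return sum(1 for a, b in zip(hash_a, hash_b) if a != b)


def frame_similarity(hash_a: str, hash_b: str) -> float:
    """Fraction of matching bits; 1.0 means the hashes are identical."""
    return 1.0 - hamming_distance(hash_a, hash_b) / len(hash_a)
```

A video frame group would then satisfy the preset condition when, for example, frame_similarity exceeds a chosen threshold such as 0.9; the patent does not fix a specific threshold value.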
In an embodiment, the determining the hash of the difference value of the first video frame and the second video frame in each video frame group includes:
obtaining a difference value array according to the image difference values of adjacent pixels in each row of the first video frame and the second video frame;
determining preset binary data composed of the preset values corresponding to consecutive image difference values in the difference value array;
and taking the character string corresponding to the preset binary data as the difference value hash of the first video frame and the second video frame.
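The three steps above describe a standard difference hash (dHash): each pair of horizontally adjacent pixels contributes one bit ('1' if the left pixel is brighter, '0' otherwise), and the concatenated bits form the hash string. A minimal sketch under that reading (the grayscale frame is represented as a list of pixel rows; this is an illustration, not code from the patent):

```python
def difference_hash(gray: list[list[int]]) -> str:
    """Difference value hash of a grayscale video frame.

    For each row, each pixel is compared with its right-hand neighbour:
    the bit is '1' when the left pixel is brighter, '0' otherwise.
    """
    bits = []
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits.append("1" if left > right else "0")
    return "".join(bits)
```

With the commonly used 9x8 preset size for the reduced frame, each of the 8 rows yields 8 comparisons, giving a 64-bit hash string.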
In an embodiment, the determining the hash of the difference value of the first video frame and the second video frame in each video frame group includes:
reducing the first video frame into a first video frame with a preset size;
reducing the second video frame into a second video frame with a preset size;
and respectively determining difference value hash of the first video frame with the preset size and the second video frame with the preset size in each video frame group.
In an embodiment, the determining the hash of the difference value of the first video frame and the second video frame in each video frame group includes:
carrying out graying treatment on the first video frame with the preset size to obtain a first gray video frame;
carrying out graying treatment on the second video frame with the preset size to obtain a second gray video frame;
and respectively determining difference value hash of the first gray video frame and the second gray video frame.
In an embodiment, before the matching the plurality of first video frames with the plurality of second video frames one by one according to the play time stamp to obtain a plurality of video frame groups, the method further includes:
according to the specified sampling frame rate, video frame sampling is performed on the video within the specified playing time period of the first target video, and the plurality of first video frames are obtained; wherein the specified sampling frame rate corresponds to the preset time threshold;
and according to the specified sampling frame rate, video frame sampling is performed on the video within the specified playing time period of the second target video, and the plurality of second video frames are obtained.
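The sampling step can be sketched as follows (illustrative only; the patent only requires that the sampling interval correspond to the preset time threshold, and the function name is hypothetical):

```python
def sample_timestamps(start: float, end: float, sample_rate: float) -> list[float]:
    """Playing timestamps (in seconds) at which frames are sampled.

    sample_rate is in frames per second; its reciprocal is the sampling
    interval, which should not exceed the preset time threshold used later
    when merging target timestamps into continuous playing time periods.
    """
    interval = 1.0 / sample_rate
    timestamps = []
    ts = start
    while ts < end:
        timestamps.append(round(ts, 6))
        ts += interval
    return timestamps
```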
In an embodiment, the continuous playing time period includes at least one of a head time period, a tail time period, and an in-chip advertising time period.
In a second aspect, the present application also provides a video content detection apparatus, including:
the video frame acquisition module is used for acquiring a first target video and a second target video; the first target video and the second target video belong to the same video set, the first target video comprises a plurality of first video frames, the second target video comprises a plurality of second video frames, and the first video frames and the second video frames have play time stamps;
the video frame matching module is used for matching a plurality of first video frames with a plurality of second video frames one by one according to the playing time stamp to obtain a plurality of video frame groups;
the similar frame determining module is used for screening similar video frame groups with similarity meeting preset conditions from the plurality of video frame groups, and obtaining target playing time stamps of the similar video frame groups; the similarity is the picture similarity between the first video frame and the second video frame in each video frame group;
the time period determining module is used for determining at least one continuous playing time period according to the plurality of target playing time stamps; the playing time stamp difference value between two adjacent target playing time stamps in the continuous playing time period is smaller than a preset time threshold value;
and the similar content determining module is used for determining a plurality of video frames corresponding to the continuous playing time period in the same video set as similar video content of the same video set.
In a third aspect, the present application also provides a video content detection equipment, including: a memory, a processor, and a video content detection program stored in the memory and executable on the processor, the video content detection program being configured to implement the steps of the video content detection method as described above.
In a fourth aspect, the present application also provides a computer-readable storage medium having stored thereon a video content detection program which, when executed by a processor, implements the steps of the video content detection method as described above.
The application provides a video content detection method. Similarity comparison is performed between the first video frames and the second video frames that are matched by playing timestamp across multiple groups in a first target video and a second target video of the same video set, so as to determine whether each pair consists of similar frames; a continuous playing time period is then determined according to the obtained target playing timestamps of the similar video frame groups, and the video frames corresponding to that continuous playing time period are the similar video content of the same video set. This contrasts with the prior art, which determines similar video frames by comparing target video frames of the same video set with video samples, or detects similar content through big data on user access.
Drawings
Fig. 1 is a schematic structural diagram of a video content detection apparatus of the present application;
FIG. 2 is a flowchart of a video content detection method according to a first embodiment of the present application;
FIG. 3 is a flowchart of a video content detection method according to a second embodiment of the present application;
FIG. 4 is a detailed flowchart of step S301 in an embodiment of the video content detection method according to the present application;
FIG. 5 is a detailed flowchart of step S301 in another embodiment of the video content detection method according to the present application;
FIG. 6 is a flowchart of a third embodiment of a video content detection method according to the present application;
fig. 7 is a schematic diagram of a video content detection apparatus according to the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
For a video episode collection, each video of the collection has multiple video frames of similar content at its beginning and end, namely the head video and the tail video. In the related art, there are two methods for identifying the head video and the tail video:
First, a video picture sample is compared with the video pictures to be identified; when the video pictures to be identified continuously match the video sample, the position of the corresponding time point is determined as the video head, and the time point corresponding to the last matching picture is determined as the head ending time.
In this comparison between the video picture sample and the pictures to be identified, the selection of the video picture sample is particularly important; in fact, however, video samples have no uniform standard, because the pictures of the videos to be compared may be uneven in quality. This tends to result in relatively low matching efficiency.
Second, play data of the video object, i.e., big-data statistics of users closing the video, is obtained to compute the rate of change of the users' close operations over the playing time of the video object, and the moment at which the rate of change exceeds a certain threshold is determined as the head ending moment or the tail beginning moment.
The premise of recognition based on video play data is that the video object has already been released and is playable; a newly launched video object, or one without play data, cannot be identified in this way, and its similar content has to be annotated manually.
Therefore, the application provides a video content detection method. Two video frames with the same playing timestamp in any two target videos of the same video set, namely a first target video and a second target video, are compared by similarity to determine the continuous playing time periods that contain similar video frames, and the video frames corresponding to those periods are determined to be the similar video content of the video set. This overcomes detection errors caused by the quality of video samples themselves, avoids detection errors caused by an insufficient amount of user-access big data, and therefore identifies similar content in the same video set more accurately; the similar content frame set is then determined according to the playing time points of the identified similar frames.
The application concept of the present application is further described below in conjunction with some specific embodiments.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a video content detection apparatus of a hardware running environment according to an embodiment of the present application.
As shown in fig. 1, the video content detection apparatus may include: a processor 1001, such as a central processing unit (Central Processing Unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (Random Access Memory, RAM) or a stable nonvolatile memory (Non-Volatile Memory, NVM), such as a disk memory. The memory 1005 may optionally also be a storage device separate from the processor 1001.
It will be appreciated by those skilled in the art that the structure shown in fig. 1 does not constitute a limitation of the video content detection apparatus, and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
As shown in fig. 1, an operating system, a data storage module, a communication module, a user interface module, and a video content detection program may be included in the memory 1005 as one type of storage medium.
In the video content detection apparatus shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server; the user interface 1003 is mainly used for data interaction with a user; the processor 1001 and the memory 1005 in the video content detection apparatus of the present application may be provided in the video content detection apparatus, and the video content detection apparatus calls the video content detection program stored in the memory 1005 through the processor 1001 and executes the video content detection method provided by the embodiment of the present application.
Based on, but not limited to, the above hardware device, a first embodiment of the video content detection method of the present application is provided. Referring to fig. 2, fig. 2 is a flowchart illustrating the video content detection method according to the first embodiment of the present application.
In this embodiment, the method includes:
step S100, a first target video and a second target video are obtained; the first target video and the second target video belong to the same video set, the first target video comprises a plurality of first video frames, the second target video comprises a plurality of second video frames, and the first video frames and the second video frames have play time stamps.
Step S200, according to the playing time stamp, matching the plurality of first video frames with the plurality of second video frames one by one to obtain a plurality of video frame groups.
In this embodiment, the execution subject of the video content detection method is a video content detection device. The video content detection device is configured to detect similar video frames among the target videos of the same video set. The video content detection device may be a mobile terminal such as a mobile phone or a tablet, a local computer, or a cloud server, which is not limited in this embodiment.
The same video set is a television episode collection with the same or similar content, or a set of videos with the same feature tag published by the same producer. For example, for the same television episode collection, the head and tail portions of each episode are typically identical. Alternatively, for the same movie distribution company, the tail portion typically has video frames showing the company logo, i.e., video frames carrying the same feature tag. Alternatively, for the same individual producer, each target video in the video set it publishes has video frames carrying the producer's personal tag.
The same video set includes at least two target videos, i.e., a first target video and a second target video. The target videos may have an explicit logical relationship; for example, for the same television episode collection, the target videos are ordered according to the episode list. The at least two target videos may also be independent; for example, in a short-video set published by one producer, each target video reviews a certain game or animation, and the target videos are not logically related in content.
In particular, the video content detection device may download the same video set over a network, or may retrieve a stored video set from a local database; the application is not limited in this regard. It can be understood that the same video set includes a plurality of target videos, and this embodiment may acquire only part of the target videos rather than the whole set. It is worth mentioning, however, that any acquired target video is preferably complete.
Any target video includes a plurality of video frames, and any video frame includes image data, audio data, and a playing timestamp. It can be understood that, to avoid frame-rate-related audio-video desynchronization, the playing timestamp is used as a playing-sequence tag for the image data and the audio data, ensuring that the displayed image data and the played audio data stay synchronized during playback. Thus, within one target video, different video frames have different playing timestamps, while two different video frames from different videos may share the same playing timestamp; video frames with the same playing timestamp should be played at the same time point. For example, for a video episode collection, a video frame in the first episode with playing timestamp 03:00 should be played at the 03:00 point after the first episode starts, that is, its image data is displayed and its audio data is played; a video frame in the second episode with playing timestamp 03:00 should likewise be played at the 03:00 point after the second episode starts playing.
Therefore, based on the same playing time stamp, a plurality of first video frames and a plurality of second video frames can be matched one by one to obtain a plurality of video frame groups.
For example, a first video frame to be detected is extracted from the first target video at the playing time point 01:20 after the first target video starts to play; the second video frame to be detected, extracted from the second target video, is also at the 01:20 time point after the second target video starts playing. In this way, multiple video frame groups are obtained, and it can be understood that different groups have different playing time points. For example, the two video frames to be detected in the first group, namely its first video frame and second video frame, both correspond to the playing time point 00:29, while the two video frames to be detected in the second group both correspond to 01:20.
It should be noted that the number of the plurality of video frame groups acquired from the same video set may be more than 2, such as 30.
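The timestamp matching of step S200 can be sketched as follows (a minimal illustration, assuming each video is given as a mapping from playing timestamp to frame data; names are hypothetical):

```python
def match_frame_groups(first_frames: dict, second_frames: dict) -> dict:
    """Pair frames of two videos that share the same playing timestamp.

    Only timestamps present in both videos yield a video frame group,
    returned as playing timestamp -> (first frame, second frame).
    """
    common = sorted(first_frames.keys() & second_frames.keys())
    return {ts: (first_frames[ts], second_frames[ts]) for ts in common}
```

For example, matching {0: 'a0', 5: 'a5', 10: 'a10'} with {5: 'b5', 10: 'b10', 15: 'b15'} yields two video frame groups, at timestamps 5 and 10.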
Step S300, screening out similar video frame groups with similarity meeting preset conditions from a plurality of video frame groups, and obtaining target playing time stamps of the similar video frame groups; the similarity is the picture similarity between the first video frame and the second video frame in each video frame group.
In this embodiment, when the picture similarity of two video frames to be detected meets a preset condition, the video frame group where the two video frames to be detected are located is determined to be a similar video frame group.
For example, among 20 video frame groups to be detected, the groups corresponding to the 4 playing time points 00:05, 00:30, 01:00, and 01:30 have similarity meeting the preset condition and are determined to be similar video frame groups.
It can be appreciated that a person skilled in the art knows how to determine the picture similarity between two video frames, for example based on SSIM (structural similarity index measure), cosine similarity, or histogram comparison; this embodiment will not describe these in detail.
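As one of the alternatives mentioned above, histogram comparison can be performed with histogram intersection. This sketch is illustrative and not part of the patent; bin count and function names are assumptions:

```python
def histogram_similarity(pixels_a: list[int], pixels_b: list[int], bins: int = 16) -> float:
    """Histogram-intersection similarity of two grayscale pixel lists (0..255)."""
    def normalized_histogram(pixels: list[int]) -> list[float]:
        counts = [0] * bins
        for p in pixels:
            counts[min(p * bins // 256, bins - 1)] += 1
        return [c / len(pixels) for c in counts]

    hist_a = normalized_histogram(pixels_a)
    hist_b = normalized_histogram(pixels_b)
    # Sum of per-bin minima: 1.0 for identical histograms, 0.0 for disjoint ones.
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))
```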
After determining a plurality of similar video frame groups, the target playing time stamp of the similar video frame groups can be obtained according to the playing time stamps of the first video frame and the second video frame in the similar video frame groups.
Step S400, determining at least one continuous playing time period according to a plurality of target playing time stamps; and the difference value of the playing time stamps between two adjacent target playing time stamps in the continuous playing time period is smaller than a preset time threshold.
Step S500, determining a plurality of video frames corresponding to the continuous playing time period in the same video set as similar video contents of the same video set.
Since the target playing timestamps of the similar video frame groups have been determined, a number of scattered playing time points are marked on the playing time axis. The continuous playing time periods over which the corresponding similar video frame groups are distributed can then be obtained from these playing time points. After the continuous playing time periods are obtained, the video frames corresponding to them can be determined as the similar video content of the same video set.
It is worth mentioning that there is at least one continuous playing time period.
It will be appreciated that, for the same television episode collection, the similarity between the head and tail portions of each pair of target videos is high, because for most collections the head and tail portions of the episodes are fixed and uniformly edited. Thus, the head and tail portions of the episodes can be detected by comparing the episodes in pairs.
If the same video set only has the same similar head content, there is only one continuous playing time period. It can be appreciated that in this embodiment the similar content may be the head and tail portions of a television episode collection, or a head portion carrying a producer tag in a video set published by the same producer. Likewise, if the same video set only has the same similar tail content, there is only one continuous playing time period. Continuing the example of the 20 video frame groups, the groups corresponding to the 4 playing timestamps 00:05, 00:30, 01:00, and 01:30 have similarity meeting the preset condition and are determined to be similar video frame groups. The continuous playing time period is then determined to be 00:00-01:30 according to these 4 playing time points; that is, the first target video and the second target video of the same video set play similar video content during 00:00-01:30.
If the same video set has both the same similar head content and the same similar tail content, there are at least two continuous playing time periods. If it additionally has the same similar in-video advertising content, there are at least three. That is, the continuous playing time periods include at least one of a head time period, a tail time period, and an in-video advertising time period.
Specifically, the continuous playing time period of the head and the continuous playing time period of the tail can be determined according to the target playing timestamps of the plurality of similar video frame groups. The playing timestamps are divided into a first group of playing time points near the playing start time and a second group of playing time points near the playing end time. The playing start time and the last time point of the first group then form the continuous playing time period of the head, and the first time point of the second group and the playing end time form the continuous playing time period of the tail.
For example, for a television episode collection with an episode playing duration of 27 minutes, the playing time points of the similar video frame groups are 00:05, 00:30, 01:00, 01:30 and 25:30, 26:00, 26:30. The points 00:05, 00:30, 01:00, and 01:30 are divided into the first group of playing time points, and 25:30, 26:00, and 26:30 into the second group. According to 01:30 in the first group, the head similar-video time period is determined to be 00:00-01:30; according to 25:30 in the second group, the tail similar-video time period is determined to be 25:30-27:00.
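The head/tail grouping in this example can be sketched as follows (timestamps expressed in seconds; the midpoint split is an illustrative heuristic, not specified by the patent, and the function name is hypothetical):

```python
def head_tail_periods(timestamps: list[float], duration: float):
    """Split similar-frame timestamps into a head group (near the start)
    and a tail group (near the end), then form the head period (from the
    playing start) and the tail period (up to the playing end).

    The split point is taken as the episode midpoint.
    """
    split = duration / 2
    head = [t for t in timestamps if t < split]
    tail = [t for t in timestamps if t >= split]
    head_period = (0.0, max(head)) if head else None
    tail_period = (min(tail), duration) if tail else None
    return head_period, tail_period
```

For the 27-minute episode above (1620 seconds), the timestamps [5, 30, 60, 90, 1530, 1560, 1590] yield a head period of (0.0, 90), i.e. 00:00-01:30, and a tail period of (1530, 1620), i.e. 25:30-27:00.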
It will be appreciated that, as another implementation of this embodiment, the opening and ending similar-video time periods may also be determined directly from the playing time stamps alone; for example, the opening similar-video time period is 00:05-01:30 and the ending similar-video time period is 25:30-26:30.
In addition, since the determined target playing time stamps are scattered time points, two distinct continuous playing time periods must not be mistakenly merged into one. Therefore, the playing time stamp difference between two adjacent target playing time stamps within a continuous playing time period must be smaller than a preset time threshold. That is, if the difference between the target playing time stamps of two adjacent similar video frame groups is smaller than the preset time threshold, the two are considered to lie in the same continuous playing time period; if the difference is greater than or equal to the preset time threshold, they are considered not to lie in the same continuous playing time period.
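The threshold rule above can be sketched as a short Python illustration. This is a minimal, hypothetical implementation (the function name and the 31-second demo threshold are my own, not from the patent): sorted target playing time stamps are grouped into runs whose adjacent gaps stay below the preset time threshold, and each run becomes one continuous playing time period.

```python
def group_periods(timestamps, threshold):
    """Split sorted playing time stamps (in seconds) into continuous
    playing time periods: adjacent stamps whose difference is below
    `threshold` belong to the same period."""
    periods = []
    current = [timestamps[0]]
    for t in timestamps[1:]:
        if t - current[-1] < threshold:
            current.append(t)                          # same period
        else:
            periods.append((current[0], current[-1]))  # close the period
            current = [t]                              # start a new one
    periods.append((current[0], current[-1]))
    return periods

# Stamps from the example: 00:05, 00:30, 01:00, 01:30 and 25:30, 26:00, 26:30
stamps = [5, 30, 60, 90, 25 * 60 + 30, 26 * 60, 26 * 60 + 30]
print(group_periods(stamps, threshold=31))  # an opening run and an ending run
```

With a 31-second threshold the seven stamps split into an opening run (5-90 s) and an ending run (1530-1590 s); in the patent's own example the threshold is instead tied to the sampling rate (e.g. 2 s at 1 frame per second).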
In this embodiment, because the openings and endings of most television series are fixed and clipped as a whole, the similarity between the openings and endings of different episodes is high. Accordingly, for any two target videos in the same video set, namely the first target video and the second target video, the two video frames sharing each playing time stamp are compared for similarity using the difference value hash, so that the openings and endings within the same video set can be identified more accurately.
The related art determines similar video frames by comparing the target video frames of a video set with video samples, or detects similar content from big data on user viewing behavior. By contrast, this embodiment performs similarity comparison directly between any two target videos in the same video set, i.e., between the first video frames and second video frames whose playing time stamps match across the first target video and the second target video, to determine whether each pair is a similar video frame group. A continuous playing time period is then determined from the target playing time stamps of the similar video frame groups, and the video frames corresponding to that period constitute the similar video content of the same video set. This overcomes detection errors caused by the quality of video samples and avoids detection errors caused by an insufficient volume of user-access data, so similar content in the same video set can be identified more accurately.
Based on the above embodiments, a second embodiment of the video content detection method of the present application is presented. Referring to fig. 3, fig. 3 is a flowchart illustrating a video content detection method according to a second embodiment of the present application.
In this embodiment, screening out a group of similar video frames with similarity satisfying a preset condition from a plurality of groups of video frames includes:
step S301, determining a difference value hash of a first video frame and a second video frame in each video frame group respectively;
step S302, determining the similarity of each video frame group according to the difference value hash.
Specifically, a perceptual hash algorithm compares fingerprint information of different images to determine their similarity: the closer the fingerprints, the more similar the images. This embodiment adopts the difference value hash algorithm, i.e., the dHash algorithm, which balances precision and computational efficiency. Each acquired video frame to be detected is processed by the difference value hash algorithm to obtain its difference value hash, and the similarity of the first video frame and the second video frame in each video frame group is judged from these hashes.
In one embodiment, referring to fig. 4, step S301 includes:
step S3011, obtaining a difference value array from the image difference values of adjacent pixels in each row of the first video frame and the second video frame;
step S3012, determining preset-base data, each composed of a preset number of consecutive image difference values in the difference value array;
step S3013, taking the character string corresponding to the preset-base data as the difference value hash of the first video frame and the second video frame.
In this embodiment, the preset number may be 8 and the preset base may be hexadecimal. Specifically, for each video frame to be detected, the intensities of adjacent pixels in each row are compared in a predetermined direction, e.g., from left to right: if the color intensity of the first pixel is greater than that of the second, the difference value is set to 1; otherwise, to 0. This yields, for each video frame to be detected, a difference value array containing only 0s and 1s. Each value in the array is then treated as a bit, every 8 bits forming one hexadecimal value (two hexadecimal characters). Finally, the hexadecimal values are concatenated into a character string, which is the difference value hash.
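As a rough, dependency-free sketch of steps S3011 to S3013 (the helper names are mine, and the patent works on full video frames rather than toy grids): given a grayscale pixel grid whose rows are one pixel wider than the desired bits per row, compare each pair of horizontally adjacent pixels and pack the resulting bits into a hexadecimal string.

```python
def dhash_bits(pixels):
    """pixels: 2D list of grayscale values; a 9x8 grid yields 64 bits.
    Emit 1 when the left pixel is brighter than its right neighbour."""
    bits = []
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return bits

def bits_to_hex(bits):
    # Every 4 bits become one hexadecimal character (so 8 bits -> 2 chars).
    value = int("".join(map(str, bits)), 2)
    return f"{value:0{len(bits) // 4}x}"

# A strictly decreasing row gives eight 1-bits, an increasing row eight 0-bits.
print(bits_to_hex(dhash_bits([[9, 8, 7, 6, 5, 4, 3, 2, 1],
                              [1, 2, 3, 4, 5, 6, 7, 8, 9]])))  # prints ff00
```

In practice the grid would be the shrunken, grayed video frame described later in the embodiment; the hash of each frame in a group is computed independently and the two hashes are then compared.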
This embodiment adopts the difference value hash algorithm, i.e., the dHash algorithm. By judging the intensity differences of adjacent pixels in each video frame to be detected, the dHash algorithm balances accuracy and computational efficiency.
As one embodiment, step S302 includes:
step S3021, obtaining binary data of a hash of a difference value between a first video frame and a second video frame;
Step S3022, performing exclusive-or processing on the two binary data to obtain an exclusive-or result;
step S3023, determining the occurrence number of the preset character in the exclusive or result as the hamming distance between the first video frame and the second video frame, so as to obtain the similarity of the video frame set.
Specifically, the difference value hashes of the two video frames to be detected in a video frame group, namely the first video frame and the second video frame, are converted into binary data, and the two binary data are XORed to obtain an exclusive-or result. If a bit of the first video frame's binary data is a and the corresponding bit of the second video frame's binary data is b, the exclusive-or result is 1 when a and b differ, and 0 when they are the same.
At this time, the exclusive or result is a string of data consisting of 0 and 1, and the number of bits of "1" of the exclusive or result, that is, the different number of bits, is calculated, so as to obtain the hamming distance between the first video frame and the second video frame in the video frame group.
It is understood that the Hamming distance is named after Richard Wesley Hamming. In information theory, the Hamming distance between two equal-length character strings is the number of positions at which the corresponding characters differ; in other words, the number of substitutions required to transform one string into the other. For example, the distance between 1111101 and 1101001 is 2, since they differ at the third and fifth positions. In this embodiment, the number of characters that must be substituted to transform one string into the other has already been obtained through the exclusive-or operation, so it suffices to count the 1s in the exclusive-or result. If the exclusive-or result of two video frames to be detected is 00001001000010010000100, then 1 occurs 5 times, and 5 is the Hamming distance of the two video frames. When quantifying image similarity with the Hamming distance, a larger distance means lower similarity, and a smaller distance means higher similarity.
In this embodiment, the similar video frame groups may be determined as those video frame groups in which the Hamming distance between the first video frame and the second video frame satisfies the preset condition. For example, if the Hamming distance is smaller than 5, the two frames can be considered the same picture; that is, a video frame group whose Hamming distance satisfies the preset condition is determined to be a similar video frame group.
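A minimal sketch of steps S3021 to S3023 (the function names, the sample hashes, and the default threshold value are illustrative): interpret the two difference value hashes as integers, XOR them, and count the 1 bits to obtain the Hamming distance; a distance below the threshold of 5 mentioned above marks the pair as a similar video frame group.

```python
def hamming_distance(hash_a, hash_b):
    """XOR two hex-encoded hashes and count the differing bit positions."""
    xor = int(hash_a, 16) ^ int(hash_b, 16)  # exclusive-or of the two hashes
    return bin(xor).count("1")               # number of 1s = differing bits

def is_similar_group(hash_a, hash_b, threshold=5):
    # Smaller distance means more similar; below threshold -> similar group.
    return hamming_distance(hash_a, hash_b) < threshold

print(hamming_distance("ff00", "fd08"))  # the hashes differ at two bits -> 2
print(is_similar_group("ff00", "fd08"))  # prints True
```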
In one embodiment, referring to fig. 5, step S301 includes:
step A3011, reducing the first video frame to a preset size;
step A3012, reducing the second video frame to the preset size;
step A3013, determining the difference value hashes of the preset-size first video frame and the preset-size second video frame in each video frame group respectively.
It will be appreciated that the resolution of the original image of a video frame to be detected, i.e., a first video frame or a second video frame, is typically very high. A 200×200 picture already has 40,000 pixels, each storing an RGB value; 40,000 RGB values are a huge amount of information with many details to process. Processing the original image directly would therefore involve too large an amount of computation. In this embodiment, the picture may instead be scaled down to a small preset size, hiding the detail of the video frame to be detected and reducing the calculation amount. For example, the preset size may be width × height = 9×8, and the scaling can be performed with the resize method of the Python Imaging Library.
In this embodiment, since the openings and endings of most television series are fixed and uniformly clipped, the similarity across episodes is high. The detail in a video frame to be detected can therefore be ignored: the frame is reduced to the preset size, and whether two frames are similar pictures is judged from the image as a whole.
In this embodiment, the size of the video frame to be detected is reduced before the difference value hash processing, which reduces the calculation amount of the subsequent processing and thus improves its speed.
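The shrink-then-compare idea can be illustrated without any imaging library. In practice the Pillow resize call mentioned above would be used; this nearest-neighbour stand-in, and its name, are my own simplification:

```python
def shrink(pixels, new_w, new_h):
    """Nearest-neighbour downscale of a 2D grayscale grid to new_w x new_h,
    discarding detail so that only the overall picture is compared."""
    old_h, old_w = len(pixels), len(pixels[0])
    return [[pixels[r * old_h // new_h][c * old_w // new_w]
             for c in range(new_w)]
            for r in range(new_h)]

grid = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 demo grid
print(shrink(grid, 2, 2))  # keeps one sample per 2x2 block
```

Real resizing filters (e.g. Pillow's antialiased ones) average neighbourhoods instead of sampling single pixels, which further suppresses fine detail; either way, the output feeds the difference value hash.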
As one embodiment, step S301 includes:
step B3011, graying the first video frame of the preset size to obtain a first gray video frame;
step B3012, graying the second video frame of the preset size to obtain a second gray video frame;
step B3013, determining the difference value hashes of the first gray video frame and the second gray video frame respectively.
It can be understood that, since the openings and endings of most series are fixed and uniformly clipped, the similarity across episodes is high. To further reduce the workload, graying may therefore be performed after the video frame to be detected has been reduced in size. Specifically, after the color picture is grayed, the RGB value is reduced from three dimensions to a single integer in 0-255 representing the gray level, which removes the complexity of comparing color differences in RGB.
Based on the above embodiments, a third embodiment of the video content detection method of the present application is provided, and referring to fig. 6, fig. 6 is a flowchart of the third embodiment of the video content detection method of the present application.
In this embodiment, before step S200, the method further includes:
step C100, sampling video frames within a specified playing time period of the first target video at a specified sampling frame rate to obtain a plurality of first video frames;
step C200, sampling video frames within the specified playing time period of the second target video at the specified sampling frame rate to obtain a plurality of second video frames.
Wherein the specified sampling frame rate corresponds to a preset time threshold.
Specifically, the specified playing time period may be both the first 3 or 5 minutes and the last 3 or 5 minutes of each target video. It will be appreciated that the specified playing time period may also be only the first 3 or 5 minutes, or only the last 3 or 5 minutes, of each target video.
In this embodiment, the specified sampling frame rate may be 1 frame per second.
In one embodiment, the video content detection device first obtains all target videos within the same video set and splits each of them. Using ffmpeg, the first 5 minutes of the first target video are split at 1 frame per second into 300 pictures, named first target video 1 through first target video 300 in sequence. Similarly, the first 5 minutes of the second target video are split at 1 frame per second into 300 pictures, named second target video 1 through second target video 300. It will be appreciated that, since the specified sampling frame rate and the specified playing time period are the same, first target video 1 and second target video 1 share the same playing time point, and so on, up to first target video 300 and second target video 300.
Pairing first target video 1 through first target video 300 with second target video 1 through second target video 300 one by one yields the video frame groups to be detected: group 1: first target video 1 and second target video 1; group 2: first target video 2 and second target video 2; ...; group 300: first target video 300 and second target video 300.
In this embodiment, sampling is performed in a specified playing time period of each target video by specifying a sampling frame rate, so that a plurality of video frames to be detected corresponding to the playing time point can be obtained. It will be appreciated that in order to increase the accuracy of identification, a better accuracy of identification may be obtained by adjusting the specified sampling frame rate.
It will be appreciated that the aforementioned specified sampling frame rate corresponds to the preset time threshold, which improves the accuracy of determining the continuous playing time period. For example, if the specified sampling frame rate is 1 frame per second, the preset time threshold may be 2 s: two target playing time stamps whose difference is smaller than 2 s are considered to belong to the same continuous playing time period.
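The sampling and pairing described above can be sketched as follows. The file names, the output pattern, and both helper names are illustrative; the ffmpeg flags shown are one common way to extract 1 frame per second from the first 5 minutes of a video:

```python
def ffmpeg_sample_cmd(video_path, out_pattern, fps=1, duration_s=300):
    """Build an ffmpeg command that decodes the first `duration_s` seconds
    and writes `fps` frames per second to numbered image files."""
    return ["ffmpeg", "-i", video_path, "-t", str(duration_s),
            "-vf", f"fps={fps}", out_pattern]

def pair_frames(first_frames, second_frames):
    # Frame i of each video shares the same playing time point, so
    # same-index frames form one video frame group to be detected.
    return list(zip(first_frames, second_frames))

cmd = ffmpeg_sample_cmd("episode1.mp4", "ep1_%03d.png")
groups = pair_frames([f"ep1_{i:03d}.png" for i in range(1, 301)],
                     [f"ep2_{i:03d}.png" for i in range(1, 301)])
print(cmd)
print(len(groups), groups[0])
```

Each resulting group would then go through the shrink, gray, dHash and Hamming-distance steps of the embodiments above.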
Based on the same application conception, referring to fig. 7, the present application further provides a video content detection apparatus, including:
The video frame acquisition module is used for acquiring a first target video and a second target video; the first target video and the second target video belong to the same video set, the first target video comprises a plurality of first video frames, the second target video comprises a plurality of second video frames, and the first video frames and the second video frames have play time stamps;
the video frame matching module is used for matching the plurality of first video frames with the plurality of second video frames one by one according to the playing time stamp to obtain a plurality of video frame groups;
the similar frame determining module is used for screening out similar video frame groups with similarity meeting preset conditions from the plurality of video frame groups and obtaining target playing time stamps of the similar video frame groups; the similarity is the picture similarity between the first video frame and the second video frame in each video frame group;
the time period determining module is used for determining at least one continuous playing time period according to the plurality of target playing time stamps; the difference value of the playing time stamps between two adjacent target playing time stamps in the continuous playing time period is smaller than a preset time threshold value;
and the similar content determining module is used for determining a plurality of video frames corresponding to the continuous playing time period in the same video set as similar video contents of the same video set.
It should be noted that, in this embodiment, each implementation manner of the video content detection apparatus and the technical effects achieved by the implementation manner may refer to various implementation manners of the video content detection method in the foregoing embodiment, and are not described herein again.
In addition, the embodiment of the application also provides a computer storage medium, on which a video content detection program is stored; the video content detection program, when executed by a processor, implements the steps of the video content detection method described above, so a detailed description is not repeated here, and the description of the corresponding beneficial effects is likewise omitted. For technical details not disclosed in the embodiments of the computer-readable storage medium of the present application, please refer to the description of the method embodiments. As an example, the program instructions may be deployed to be executed on one computing device, or on multiple computing devices at one site, or distributed across multiple sites and interconnected by a communication network.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program, which may be stored on a computer-readable storage medium and which, when executed, may comprise the steps of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
It should be further noted that the above-described apparatus embodiments are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the application, the connection relations between modules indicate communication connections between them, which may be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art will understand and implement the present application without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software plus necessary general-purpose hardware, or of course by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memories, special-purpose components, and the like. Generally, functions performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function can vary, such as analog circuits, digital circuits, or dedicated circuits. However, for the present application a software implementation is, in many cases, the preferred embodiment. Based on such understanding, the technical solution of the present application, or the part contributing to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, a USB disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the embodiments of the present application.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the application, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (10)

1. A method for detecting video content, comprising:
acquiring a first target video and a second target video; the first target video and the second target video belong to the same video set, the first target video comprises a plurality of first video frames, the second target video comprises a plurality of second video frames, and the first video frames and the second video frames have play time stamps; the same video set comprises a plurality of target videos, and each target video has similar video content;
according to the playing time stamp, a plurality of first video frames and a plurality of second video frames are matched one by one to obtain a plurality of video frame groups;
screening out similar video frame groups with similarity meeting preset conditions from a plurality of video frame groups, and obtaining target playing time stamps of the similar video frame groups; the similarity is the picture similarity between the first video frame and the second video frame in each video frame group;
Determining at least one continuous playing time period according to the plurality of target playing time stamps; the difference value of the playing time stamps between two adjacent target playing time stamps in the continuous playing time period is smaller than a preset time threshold;
and determining a plurality of video frames corresponding to the continuous playing time period in the same video set as similar video contents of the same video set.
2. The method for detecting video content according to claim 1, wherein said selecting a group of similar video frames having a similarity satisfying a preset condition from a plurality of said groups of video frames comprises:
determining a difference value hash of the first video frame and the second video frame in each video frame group respectively;
and determining the similarity of each video frame group according to the difference value hash.
3. The method according to claim 2, wherein the determining the hash of the difference value of the first video frame and the second video frame in each of the video frame groups, respectively, includes:
obtaining a difference value array according to the image difference values of each row of adjacent pixels in the first video frame and the second video frame;
determining preset-base data, each composed of a preset number of consecutive image difference values in the difference value array;
and taking the character string corresponding to the preset-base data as the difference value hash of the first video frame and the second video frame.
4. The method according to claim 2, wherein the determining the hash of the difference value of the first video frame and the second video frame in each of the video frame groups, respectively, includes:
reducing the first video frame into a first video frame with a preset size;
reducing the second video frame into a second video frame with a preset size;
and respectively determining difference value hash of the first video frame with the preset size and the second video frame with the preset size in each video frame group.
5. The method according to claim 2, wherein the determining the hash of the difference value of the first video frame and the second video frame in each of the video frame groups, respectively, includes:
carrying out graying treatment on a first video frame with a preset size to obtain a first gray video frame;
carrying out graying treatment on a second video frame with a preset size to obtain a second gray video frame;
And respectively determining difference value hash of the first gray video frame and the second gray video frame.
6. The method for detecting video content according to claim 1, wherein said matching a plurality of said first video frames with a plurality of said second video frames one by one according to said play time stamp, before obtaining a plurality of video frame groups, said method further comprises:
sampling, at a specified sampling frame rate, video frames within a specified playing time period of the first target video to obtain the plurality of first video frames; wherein the specified sampling frame rate corresponds to the preset time threshold;
and sampling, at the specified sampling frame rate, video frames within the specified playing time period of the second target video to obtain the plurality of second video frames.
7. The video content detection method according to any one of claims 1 to 6, wherein the continuous playing time period includes at least one of an opening time period, an ending time period, and an in-video advertising time period.
8. A video content detection apparatus, comprising:
the video frame acquisition module is used for acquiring a first target video and a second target video; the first target video and the second target video belong to the same video set, the first target video comprises a plurality of first video frames, the second target video comprises a plurality of second video frames, and the first video frames and the second video frames have play time stamps; the same video set comprises a plurality of target videos, and each target video has similar video content;
The video frame matching module is used for matching a plurality of first video frames with a plurality of second video frames one by one according to the playing time stamp to obtain a plurality of video frame groups;
the similar frame determining module is used for screening similar video frame groups with similarity meeting preset conditions from the plurality of video frame groups, and obtaining target playing time stamps of the similar video frame groups; the similarity is the picture similarity between the first video frame and the second video frame in each video frame group;
the time period determining module is used for determining at least one continuous playing time period according to the plurality of target playing time stamps; the playing time stamp difference value between two adjacent target playing time stamps in the continuous playing time period is smaller than a preset time threshold value;
and the similar content determining module is used for determining a plurality of video frames corresponding to the continuous playing time period in the same video set as similar video content of the same video set.
9. A video content detection apparatus, comprising: a memory, a processor and a video content detection program stored on the memory and executable on the processor, the video content detection program being configured to implement the steps of the video content detection method of any one of claims 1 to 7.
10. A computer-readable storage medium, on which a video content detection program is stored, which when executed by a processor implements the steps of the video content detection method according to any one of claims 1 to 7.
CN202210628167.7A 2022-06-02 2022-06-02 Video content detection method, device, equipment and storage medium Active CN115103223B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210628167.7A CN115103223B (en) 2022-06-02 2022-06-02 Video content detection method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115103223A CN115103223A (en) 2022-09-23
CN115103223B true CN115103223B (en) 2023-11-10

Family

ID=83289615

Country Status (1)

Country Link
CN (1) CN115103223B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000069420A (en) * 1998-08-26 2000-03-03 Sharp Corp Video image processor
CA2593243A1 (en) * 2005-01-11 2006-07-20 Anthony I. Provitola Enhancement of visual perception
JP2010097246A (en) * 2008-10-14 2010-04-30 Nippon Hoso Kyokai <Nhk> Scene similarity determining device, program of the same, and summary video generating system
CN103995804A (en) * 2013-05-20 2014-08-20 中国科学院计算技术研究所 Cross-media topic detection method and device based on multimodal information fusion and graph clustering
CN110134829A (en) * 2019-04-28 2019-08-16 腾讯科技(深圳)有限公司 Video locating method and device, storage medium and electronic device
CN110166827A (en) * 2018-11-27 2019-08-23 深圳市腾讯信息技术有限公司 Determination method, apparatus, storage medium and the electronic device of video clip
CN110557683A (en) * 2019-09-19 2019-12-10 维沃移动通信有限公司 Video playing control method and electronic equipment
CN111651636A (en) * 2020-03-31 2020-09-11 易视腾科技股份有限公司 Video similar segment searching method and device
CN112699787A (en) * 2020-12-30 2021-04-23 湖南快乐阳光互动娱乐传媒有限公司 Method and device for detecting advertisement insertion time point
CN112861717A (en) * 2021-02-05 2021-05-28 深圳市英威诺科技有限公司 Video similarity detection method and device, terminal equipment and storage medium
CN114189711A (en) * 2021-11-16 2022-03-15 北京金山云网络技术有限公司 Video processing method and device, electronic equipment and storage medium
CN114245229A (en) * 2022-01-29 2022-03-25 北京百度网讯科技有限公司 Short video production method, device, equipment and storage medium
CN114449346A (en) * 2022-02-14 2022-05-06 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100804678B1 (en) * 2007-01-04 2008-02-20 삼성전자주식회사 Method for classifying scene by personal of video and system thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"A similarity analysis and clustering algorithm for video based on moving trajectory time series wavelet transform of moving object in video"; Yanmin Luo; 2010 International Conference on Image Analysis and Signal Processing; full text *
"Research on the technical architecture of a TV drama broadcast management platform"; Wang Yang; Radio & TV Broadcast Engineering (广播与电视技术); Vol. 45, No. 3; full text *
"Application of deep-learning-based TV drama title recognition in radio and television"; Luo Xiaosong; Jiangxi Communication Science & Technology (江西通信科技), No. 4; full text *

Also Published As

Publication number Publication date
CN115103223A (en) 2022-09-23

Similar Documents

Publication Publication Date Title
EP3271865B1 (en) Detecting segments of a video program
Gu et al. Blind quality assessment of tone-mapped images via analysis of information, naturalness, and structure
US10368123B2 (en) Information pushing method, terminal and server
US8671109B2 (en) Content-based video copy detection
CN109871490B (en) Media resource matching method and device, storage medium and computer equipment
CN109063611B (en) Face recognition result processing method and device based on video semantics
US11259029B2 (en) Method, device, apparatus for predicting video coding complexity and storage medium
CN111836118B (en) Video processing method, device, server and storage medium
CN110505513A (en) Video screenshot method and apparatus, electronic device and storage medium
CN113496208B (en) Video scene classification method and device, storage medium and terminal
CN115022675B (en) Video playing detection method and system
CN111199186A (en) Image quality scoring model training method, device, equipment and storage medium
CN110248235B (en) Software teaching method, device, terminal equipment and medium
US8165387B2 (en) Information processing apparatus and method, program, and recording medium for selecting data for learning
US10733453B2 (en) Method and system for supervised detection of televised video ads in live stream media content
Yang et al. No‐reference image quality assessment via structural information fluctuation
CN115103223B (en) Video content detection method, device, equipment and storage medium
CN112560552A (en) Video classification method and device
US20180359523A1 (en) Method and system for progressive penalty and reward based ad scoring for detection of ads
Cohendet et al. MediaEval 2018: Predicting media memorability
CN113420809A (en) Video quality evaluation method and device and electronic equipment
CN113569668A (en) Method, medium, apparatus and computing device for determining highlight segments in video
CN111899239A (en) Image processing method and device
CN112949494A (en) Fire extinguisher position detection method, device, equipment and storage medium
Zhang et al. Towards automatic image exposure level assessment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant