CN107431831B

CN107431831B - Apparatus and method for identifying video sequence using video frame

Info

Publication number: CN107431831B
Application number: CN201680017875.XA
Authority: CN
Inventors: 塔勒·马奥兹; 贾勒·莫施池; 阿莉扎·埃特兹克维茨; 泽埃夫·格泽尔; 瑞文·威彻福格尔
Original assignee: Cisco Technology Inc
Current assignee: Cisco Technology Inc
Priority date: 2015-03-25
Filing date: 2016-02-18
Publication date: 2020-08-11
Anticipated expiration: 2036-02-18
Also published as: WO2016151415A1; CN107431831A; EP3274868A1

Abstract

In one embodiment, a system includes a processor that: retrieving the value comprising X₀And Y₀A first data element of (a); providing a hash function for use with a hash table having buckets, the hash function having a first input and a second input that are combined to map to one of the buckets, wherein the first input is in a range of X values having an X value sub-range, the second input is in a range of Y values having a Y value sub-range, different combinations of the X value sub-range and the Y value sub-range map to different buckets using the hash function; and will take the value X₀And Y₀Input into a hash function, producing an output indicative of a first bucket of the hash table. Related apparatus and methods are also described.

Description

Apparatus and method for identifying video sequence using video frame

Technical Field

The present disclosure relates generally to the storage and retrieval of information.

Background

Video sequences may be identified for a number of reasons, including: identifying television program replays to associate existing metadata with a program, identifying when certain advertisements or content items are broadcast, identifying pirated content, and other data analysis tasks. Video signatures may be used to identify video sequences.

Drawings

The present disclosure will be understood and appreciated more fully from the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 is a partially pictorial, partially block diagram, view of a broadcast system including a stream analyzer, constructed and operative in accordance with an embodiment of the present invention;

FIG. 2A is a view of a video frame processed by the stream analyzer of FIG. 1 constructed and operative in accordance with an embodiment of the present invention;

2B-D are views of the video frame of FIG. 2A divided into smaller regions;

FIG. 3 shows a flowchart of exemplary steps performed in creating a video signature by the stream analyzer of FIG. 1, in accordance with an embodiment of the present invention;

FIG. 4 shows a flowchart of exemplary steps performed in matching video signatures by the stream analyzer of FIG. 1, in accordance with an embodiment of the present invention;

FIG. 5 is a diagram of a two-dimensional hash table used by the flow analyzer of FIG. 1, according to an embodiment of the invention;

FIG. 6 shows a flowchart of exemplary steps performed in populating the two-dimensional hash table of FIG. 5, in accordance with an embodiment of the present invention;

FIG. 7 shows a flowchart of exemplary steps performed in extracting data from the two-dimensional hash table of FIG. 5, in accordance with embodiments of the present invention;

FIG. 8 shows a flowchart of exemplary steps performed in a method for improving signature matching speed of the flow analyzer of FIG. 1, in accordance with an embodiment of the present invention;

FIG. 9 shows a flowchart of exemplary steps performed in a method for identifying video signatures for metadata tagging by the stream analyzer of FIG. 1, in accordance with an embodiment of the present invention;

FIG. 10 shows a flowchart of exemplary steps performed in a method for marking, by the stream analyzer of FIG. 1, video signatures associated with episodes of a series of programs, in accordance with an embodiment of the present invention;

FIG. 11 shows two video sequences being processed by the stream analyzer of FIG. 1; and

FIG. 12 shows a flowchart of exemplary steps performed in identifying content item boundaries within an unknown content item by the flow analyzer of FIG. 1, in accordance with an embodiment of the present invention.

Detailed Description

Overview

According to an embodiment of the invention, there is provided a system comprising a processor and a memory for storing data used by the processor, wherein the processor is operable to: retrieving a first data element from memory, the first data element comprising a value X₀Sum value Y₀(ii) a Providing a hash function for use with a hash table having a plurality of buckets (buckets), the hash function having a plurality of inputs including a first input and a second input, the first input and the second input combined mapping to one of the buckets, wherein (a) the first input is within a range of X values having a plurality of non-overlapping sub-ranges of X values, (b) the second input is within a range of Y values having a plurality of non-overlapping sub-ranges of Y values, (c) the hash function maps to the same bucket when the first input is any value within one of the sub-ranges of X values and the second input is any value within one of the sub-ranges of Y values, and (d) different combinations of the sub-ranges of X values and Y values map to different buckets using the hash function; and will take the value X₀Sum value Y₀Input into a hash function, an output is generated that indicates a first bucket of the buckets of the hash table.

There is also provided, in accordance with another embodiment of the present invention, a system, including a processor and a memory for storing data used by the processor, wherein the processor is operable to: retrieving a first video signature from memory, the first video signature being a video signature of a content item currently being broadcast; determining that the first video signature corresponds to a start of a content item currently being broadcast that is within the first five minutes of a beginning of the content item; and issuing a command to compare the first video signature with a database of video signatures, the comparison starting with a video signature corresponding to the beginning of the content item, followed by a search for other ones of the video signatures.

Description of examples

Referring now to fig. 1, fig. 1 is a partially pictorial, partially block diagram view of a broadcast system 10 having a stream analyzer 12 constructed and operative in accordance with an embodiment of the present invention.

In many cases, including checking pirated copies of movies or in broadcast scenes, video signatures (sometimes referred to as video fingerprints) can be used to identify video sequences. In a broadcast scenario, identifying a video sequence may be useful in associating metadata with the current broadcast content.

Although the video signature creation and matching method described herein is described with reference to broadcast system 10, it should be understood that the creation and matching method may be applied to any suitable scenario, such as, but not limited to, finding pirated copies of a movie.

The broadcast system 10 generally includes a head end 14, a plurality of end user receiver devices 16 (only one shown for simplicity), and a stream analyzer 12.

Content is typically provided (broadcast or multicast) by the headend 14 to end user receiver devices 16 or any other suitable receiving device, such as, but not limited to, a mobile device 20 having content reception and playback capabilities. Alternatively or additionally, by way of example only, the receiving device may retrieve/receive content from a content server delivering pay-per-view content.

The content may be delivered by the headend 14 using any suitable communication technology, such as, but not limited to, satellite, cable, Internet Protocol (IP), terrestrial, or wireless communication.

Ideally, the content items are typically transmitted/delivered from the headend 14 along with appropriate metadata regarding the content items. However, some content items may or may not be associated with little or no metadata.

The stream analyzer 12 is operable to receive/retrieve content transmitted/transmitted by the headend 14. Flow analysis fig. 12 attempts to identify content by comparing the video signature of the broadcast content to a database 22 of video signatures. It will be appreciated that the stream analyzer 12 may also analyze non-broadcast content items. It should also be noted that the content items may be any suitable content items, such as, but not limited to, television programs, movies, advertisements, trailers, and promotional videos. Once a content item has been identified based on an appropriate video signature match, appropriate metadata can be linked with the content item for use by other devices. The metadata for the content item may already be in the database 22 (associated with the existing video signature that matches the broadcast content item), or the metadata may be retrieved via an appropriate search of the information database based on the content ID (e.g., serial number or name) of the matching content

The headend 14 may retrieve the metadata or a link to the metadata of the broadcast content item from the stream analyzer 12 for transmission to the end user receiver device 16. Alternatively or additionally, the end user receiver device 16 and/or the mobile device 20 may retrieve metadata or a link to the broadcast content item from the flow analyzer 12 via IP or any other suitable wired or wireless link.

The flow analyzer 12 is typically implemented on at least one server, optionally in a cloud computing environment. The stream analyzer 12 generally includes a processor 24 and a memory 26 for storing data used by the processor 24.

Stream analyzer 12 may send certain video sequences to user interface module 28 for manual metadata tagging. User interface module 28 is also typically associated with display device 30 to display video sequences requiring manual metadata tagging. The user interface module 28 may implement the interface module on the same processor/server as the flow analyzer 12 or a different processor/server. The user interface module is described in more detail with reference to fig. 9 and 10.

An example embodiment of the flow analyzer 12 will be described in more detail below with reference to fig. 2A through 12. Signature generation and matching may be performed by the same processor/server or may be performed using different processors/servers. The matching process may be distributed over many processors/servers.

It will be appreciated that in a broadcast scenario, video signature generation and matching may need to be configured to index and detect over 100 channels 24 hours a day, 7 days a week, and operate in real time.

Signature generation

Referring now to fig. 2A, fig. 2A is a diagram of a video frame 32 processed by stream analyzer 12 of fig. 1, according to an embodiment of the invention.

Each digital signature is generated for a single video frame. This process is repeated and a video signature is generated for each frame sampled from the video stream. The sampling rate is a configurable parameter. The inventors have found that a sampling rate of 1 frame per second provides good results. However, those of ordinary skill in the art will appreciate that the sampling rate may be greater or less than 1 frame per second, depending on the particular application and the precision required for the application. As will be described in the matching phase with reference to fig. 4, there is a trade-off between performance and accuracy when selecting the sampling rate.

The process for generating a video signature for a single frame is as follows and is described with reference to video frame 32.

First, a weighted average luminance is calculated for the frame 32 as a whole. The result is a pair of floating point numbers representing the luminance "centroid" of the frame 32, which can be calculated using the following equation:

wherein:

li is the luminance value of pixel i;

xi is the column (based on 1) index of pixel i; and

yi is the row (based on 1) index of pixel i.

The luminance values are normalized such that 0 represents no luminance and 1 is the maximum luminance.

A single "centroid" point is generally not sufficient to characterize a single frame, and therefore, the frame 32 is divided into several sub-images or sub-regions, and the above-described luminance "centroid" calculation is determined for each sub-image, as will now be described in more detail below.

Referring now to fig. 2B-D, fig. 2B-D are views of the video frame 32 of fig. 2A subdivided into smaller regions 34 (only some labeled for simplicity).

The video frame 32 is subdivided into smaller regions 34 (e.g., 2 blocks by 2 blocks, i.e., the video frame 32 is divided into 4 parts, R1-R4 as shown in FIG. 2B), and the weighted average luminance of each sub-block (region 34) is calculated, resulting in 4 groups

The value is obtained.

The "centroid" of each region 34 is calculated in the same manner as the "centroid" is calculated for the entire video frame 32, by assuming that each region 34 has its own x and y axes (as shown by x1, y1, etc. in fig. 2B).

The video frame 32 is subdivided into smaller regions 34 (e.g., 4 blocks by 4 blocks, i.e., the video frame 32 is divided into 16 parts, R1-R16 as shown in FIG. 2C), and the weighted average luminance of each sub-block (region 34) is calculated, resulting in 16 groups

The value is obtained.

The pairs are also referred to as Lx and Ly in the following.

For higher precision, which may be desirable in many cases, discussed in more detail below, the video frame 32 may be subdivided into even smaller regions 34 (e.g., 8 blocks by 8 blocks, i.e., the video frame 32 is divided into 64 portions, as shown in fig. 2D), and the weighted average luminance of each sub-block (region 34) is calculated, resulting in 64 groups

The value is obtained.

It should be noted that the division of the video frame 32 into regions 34 is not fixed to the 2 by 2, 4 by 4, 8 by 8 divisions described above. The video frames 32 may be subdivided into any suitable arrangement of smaller or larger regions 34. For example, the frame 32 may be divided as follows: 3 by 4, 6 by 6, and 8 by 10.

In some cases, 3 different levels of video signatures may be sufficient, where the 3 levels include:

grade 1: the "centroid" of the entire frame 32. The result is a single

And (4) point.

Grade 2: the "centroid" of each sub-frame (region 34) when the frame 32 is divided into 2 by 2 blocks for a total of 4 sub-frames. The result is 4

A list of points.

Grade 3: when a frame is divided into 4 x4 blocks for a total of 16 sub-frames, the "centroid" of each sub-frame. The result is 16

A list of points.

The final signature of frame 32 is as described above starting with level 1, followed by level 2 and then level 3

For the aggregation, 42 floating point numbers are obtained.

The following is an example signature in the format of JavaScript object notation (JSON):

for many cases, a 4 th level may be added; for example, where the video frame 32 is split into 64 sub-frames of 8 by 8 regions, 64 additional sub-frames are generated in the video signature

Point, a total of 170 floating point numbers are obtained in the video signature.

The inventors have found that a CPU system without general purpose computing on the graphics processing unit (GPGPU) (e.g., a PC with the specifications core i 7X 990, 3.47GHz 12M, 24MB triple channel DDR3 RAM 64 GB, 1066GHz, 4 500GB 7200RPM, 3.5HDs, MDRaid5) can compute the above signatures at 60 frames per second for HD content without any compiler optimizations. Multithreading and GPGPU may be used to further optimize the computation, increasing the computation rate to hundreds of frames per second using a single GPGPU device. Assuming a sample rate of 1 frame per second, this translates to hundreds of videos being processed on a single server with a GPGPU device.

In the case of MPEG-based video (e.g., MPEG-2/4 or h.264), the above-described signature generation process can be highly optimized due to the nature of MPEG compression. In MPEG compression, a picture is divided into macroblocks from which luminance and chrominance components are sampled. During this process, the luminance values are completely sampled, meaning that no data is lost. However, the samples are transformed using a Discrete Cosine Transform (DCT) and quantized using a predefined quantization matrix, which is basically a weighting function for the DCT coefficients.

Although some information is lost during quantization, the commonly used quantization matrices make the lower DCT frequencies more emphasized, while the higher frequencies are more likely to be lost. This means that when the quantization is inverted, the resulting image will likely be slightly blurred compared to the original image. However, because the signature is generated for the weighted average luminance, the effect of creating the signature and being able to identify a match is generally unaffected by the DCT transform. The actual generated signatures may of course be slightly different when the number varies. However, such blurring is actually an average over the macroblock pixels, and thus the overall effect of the final signature is negligible.

The video signature can be quickly and efficiently computed from the transformed and quantized luminance values by reconstructing luminance macroblocks, resulting in a luminance value for each pixel in the frame. In the case of a simple I-frame, the quantization and DCT are inverted. In the case of P or B frames, or hybrid I/P/B frames (e.g., as used in h.264), the delta macroblocks are added to create the final macroblock before quantization and DCT are reversed. The above calculations are then performed to determine the weighted average luminance of the video frame 32 and all regions 34 of the video frame 32 at various different levels.

It should be noted that the original color frame does not need to be reconstructed at any time, since only the luminance value and position of each pixel is used in the signature calculation. Furthermore, because it is generally not necessary to compute a signature on every and every frame (because signatures are sample based), decoding of non-correlated luma frames from the stream can be skipped. What actually happens is that the selected luminance frame (the entire frame) is decoded and reconstructed.

Another method for determining a weighted average luminance is described below. If one would like to forego the ability to be robust to video scaling changes (i.e., to change the size of the video frame), the signature can be computed directly from the encoded macroblock without inverting the DCT. The image from which the signature is computed in this case is a map of DCT macroblocks. It will be noted that the DCT values are not luminance values of the pixels of the decoded frame 32, but the DCT values are measures of luminance in a frequency domain representation that gives a frequency representation of the luminance. For example, if a 16 by 16 pixel macroblock of a video frame is subjected to a DCT transform, the result is a 16 by 16 matrix of values in the frequency domain. The weighted average of the 16 by 16 matrix may then be determined in the same manner as the weighted average of the 16 by 16 pixels is determined. Thus, computing the signature can be done without having to fully decompress the video stream and without having to reconstruct individual frames, making the process more efficient.

Referring now to fig. 3, a flowchart illustrating exemplary steps performed in creating a video signature by the stream analyzer 12 of fig. 1 is shown, in accordance with an embodiment of the present invention.

The processor 24 is operable to retrieve data for the video frame 32 from the memory 26 (block 36). The data typically includes a plurality of luminance measures. Each brightness metric is associated with a different entry in the matrix. The video frame 32 includes a plurality of pixels. In one embodiment, each brightness metric provides a measure of the brightness of a different one of the pixels, such that the matrix is a matrix of brightness values of the pixels of the video frame 32. Typically, all pixels of the video frame 32 are used in the average weighted luminance calculation. However, samples of pixels may be used in the calculation. It is estimated that at least 60% of the pixels should be included in the luminance measure, such that the luminance measure collectively provides a measure of the luminance of at least 60% of the pixels.

In an alternative embodiment, each luminance measure may be a measure of luminance in a discrete cosine transformed version of the image of the video frame, as described above, such that the matrix is a matrix of DCT values.

The processor 24 is operable to calculate a weighted average luminance value Ly of the luminance metrics such that each luminance metric is weighted according to its row position (y-axis coordinate) in the matrix throughout the video frame 32. The processor 24 is operable to calculate a weighted average luminance value Lx of the luminance metrics such that each luminance metric is weighted according to its column position (x-axis coordinate) in the matrix throughout the video frame 32 (block 38).

Processor 24 is operable to divide the data of video frame 32 into a plurality of sets of luminance metrics corresponding to a plurality of different regions 34(R) of video frame 32, each set being associated with a different sub-matrix of the matrix (block 40).

The processor 24 is operable to divide the data of the video frame 32 into sets of luminance metrics such that some sets of luminance metrics correspond to a plurality of different regions 34 of the video frame 32 and some sets of luminance metrics correspond to sub-regions of the different regions 34 of the video frame.

In one embodiment, processor 24 is operable to divide the data of video frame 32 into sets of luminance metrics by dividing video frame 32 into four regions 34, each of the four regions 34 corresponding to a different set. The processor 24 is operable to further divide each of the four regions 34 into four sub-regions 34, each sub-region 34 corresponding to a different set.

For each set of luminance metrics corresponding to a region and a sub-region R, the processor 24 is operable to: (a) calculating a weighted average luminance value ly (r) of the luminance measures in the set such that each luminance measure is weighted according to its row position (y-axis coordinate) in the submatrix of the set; and (b) calculating a weighted average luminance value lx (r) of the luminance measures in the set such that each luminance measure is weighted according to the column position (x-axis coordinate) of the luminance measure in the submatrix of the set (block 42).

Processor 24 is operable to create a video signature S for video frame 32_N(block 44). Video signature S_NIncluding a weighted average luminance value Ly and a weighted average luminance value Lx at the frame level. For each set of luminance metrics corresponding to a region and sub-region R, the processor 24 is operable to append the weighted average luminance value ly (R) of the set and the weighted average luminance value lx (R) of the set to the video signature S of the video frame 32_N. Thus, the video signature S_NIncluding a plurality of weighted average luminance values calculated for three or more levels of frame division. The three or more levels include: (a) the frame 32 is not divided; (b) dividing the frame 32 into different regions 34; and (c) dividing each of the different regions 34 into sub-regions.

Signature matching

Signature matching is performed at two levels. The first level is matching a single frame and the second level is matching a plurality of frames in a video sequence. Matching single frames is now described.

Matching single frames

Referring now to fig. 4, a flowchart illustrating exemplary steps performed in matching video signatures by the stream analyzer 12 of fig. 1 is shown, in accordance with an embodiment of the present invention.

Single frame matching is now generally described.

The matching process begins with a coarse search, extracting a set of candidate signatures from the database 22, given the signature of the target frame for the lookup. A finer granularity search is then performed on the set of candidates. Using this 2-phase search to enhance search performance allows searches to be performed in real time and efficiently on large data volumes.

The coarse search looks at the weighted average luminance pairs computed at the frame level and extracts signatures that match (x, y) — (x ', y') +/-or less (i.e., have the largest difference).

Based on various factors discussed in more detail below. By way of example, may be at 2^-8To 2^-5Within a range of maximum luminance values.

(x, y) is a weighted average luminance pair of the signature of the target frame, and (x ', y') is a weighted average luminance pair of the signature in the database 22.

A finer search is then performed in the candidate set by treating the signatures as vectors having 42 or 170 elements (depending on the number of levels of division of the frame used to generate the signature), and a Pearson correlation or other suitable correlation between the signature of the target frame and the signature of each candidate frame is calculated. Possible matching methods are described in more detail below. The correlation coefficients below a certain level are filtered out and the candidate with the highest correlation coefficient, if any, among the remaining candidates is selected as the match. The relative levels at which filtering out occurs are discussed in more detail below.

The signature matching process is now described in more detail.

The video signature S is generally calculated for the target frame according to the method described with reference to fig. 2A-3₀(block 46). However, the video signature S may be generated according to any suitable video signature generation method₀. Video signature S₀Comprising at least one average (typically a weighted average) luminance value L (typically comprising a weighting) of a target video frameThe average luminance value Lx and the weighted average luminance value Ly), and a plurality of average (typically weighted average) luminance values of a plurality of different regions R of the target video frame, so that for each different region R of the target video frame there is: (a) weighted average luminance value lx (r); and (b) a weighted average luminance value ly (r).

Processor 24 is generally operable to retrieve video signature S from memory₀。

The database 22 includes a plurality of previously processed video signatures Si generated according to the method described with reference to fig. 2A-3 or other suitable method. Each video signature Si typically comprises a weighted average luminance value Lxi and a weighted average luminance value ly at the frame level. For signature S corresponding to video₀Each video signature Si also comprises (in other words, all video signatures are generated based on the same region and sub-region, so that all signatures can be compared equally): (a) a weighted average luminance value Lx (R) i; and (b) a weighted average luminance value ly (r) i.

Processor 24 is operative to determine a video signature S based on matching criteria₀At least one average luminance value L (typically comprising a weighted average luminance value Lx and a weighted average luminance value Ly) at the frame level of (a) is the best matching subset of the video signature Si.

The subset is typically determined by processor 24 comparing: a weighted average luminance value Lx of the weighted average luminance values Lxi of each of the plurality of video signatures Si; and a weighted average luminance value Ly of the weighted average luminance values Ly of each of the plurality of video signatures Si.

It should be noted that there may be differences in the weighted average luminance of two frames that should provide a match due to various factors, including: differences introduced by lossy compression, different flags, and other smaller artifacts in different versions of the content item, and timing differences due to sampling rates. The processor 24 is operable to determine the subset of video signatures Si to include video signatures Si having: (a) video signature S₀A weighted average luminance value Lxi within the first limit of the weighted average luminance value Lx of (a); and (b)) Video signature S₀The weighted average luminance value Ly i within the second limit value of the weighted average luminance value Ly. By way of example, the first and second limits may be at 2^-8To 2^-5Within a range of maximum luminance values. For high resolution content, it is expected that the range will shrink to 2^-8To 2^-6And for standard resolution content, the range is narrowed to 2^-7To 2^-5In the meantime. The inventors found that setting the limit to 2^-7(0.0078) is particularly useful for matching high-resolution content from various content sources in real-time. It will be understood that the actual values and ranges will depend on many factors, including by way of example only: the memory and processing power of the computer performing the comparison, the number of content items in the database and the time available for processing (e.g., online versus offline), and the nature of the content. The limit may be set to any suitable value in order to create a subset that does not preclude a match, but does not create a larger subset that may create a processing burden. For example, the first and second limits may be set to a percentage of the maximum brightness value, such as, but not limited to, 1% or 2% of the maximum brightness value, or when the brightness range is in the range of 0 to 255, the first and second limits may be set equal to a certain number, such as 0.5, 1, or 2.

Processor 24 is operative to sign at least S₀And the plurality of weighted average luminance values (and typically all weighted luminance values) of the regions and sub-regions R are compared with a subset of the video signatures Si in an attempt to find the best matching video signature Si among the subsets that meet the matching criteria (block 50). Typically by treating the value of each signature as a vector, and signing S₀Each value in the vector of (a) and the same value in the vector of video signatures Si of the subset are compared to perform the above comparison.

As part of the comparison, for each different region R of each video signature Si, the processor 24 is operable to calculate the video signature S based on the correlation of the measures of (a) the average weighted luminances Lx, Ly, Lx (R), and Ly (R) with respect to the measures of (b) the weighted average luminances Lxi, Lyi, Lx (R) i, and Ly (R) i, respectively₀And the vector of each video signature Si in the subset (or other suitable matching measure).

When a certain video signature Si and video signature S are calculated₀Has a correlation coefficient greater than some minimum correlation coefficient and is the highest correlation coefficient of all video signatures Si in the subset, the processor 24 is operable to select this one of the video signatures Si. The minimum correlation coefficient is configurable and is typically in the range of 0.5 to 0.9. If the value is too high, a match may be lost. If the value is too low, false positives may be generated. It should be noted that the actual coefficient values may depend on the type of video content being examined. Experiments by the inventors have shown that a coefficient of 0.708 is suitable for various video content types. Correlations are typically calculated based on Pearson correlations or any other suitable correlation method, such as, but not limited to, Spearman and Kendall correlation methods. Other methods of measuring the similarity of signatures can be used to find the subset of video signatures Si and video signatures S₀The best match between them, such as, but not limited to, calculating the euclidean distance.

It should be noted that in video signature S₀And a subset of video signatures Si, each signature S₀All portions of the vector of Si and Si may be given equal weight, or different portions of the vector may be given different weights, by way of example only, e.g., regions of a video frame in the center of the video frame are given higher weight and outer regions of the video frame that may contain subtitles or channel defects/logos are given lower weight, or portions of the video frame that may be associated with greater variation and/or motion are given higher weight based on analysis of motion vectors of the encoded video frame.

The next stage of processing depends on whether or not the video is signed by the signature S₀A match is found (box 52). If a match is not found (branch 54), the next sample frame is fetched (block 56) and processing continues with the step of block 46. If a match is found (branch 58), processing continues with matching more frames of the content item C from the matching signature Si, as will be described in more detail below.

Matching video sequences

Still referring to fig. 4.

Matching a single frame does not necessarily indicate a matching video content item or sequence. To match the video content, a "lock" mechanism is used. If several consecutive signatures belonging to the same content item match, it is considered that a "lock" to the content item has been made. Processor 24 continues to calculate and match signatures to identify when a lock may be lost, for example, at an advertisement break point or if there is an edited version of the original content. If several consecutive frames do not match, then the lock is lost. The number of consecutive frames required for locking or unlocking and the tolerance to non-matching frames (the number of non-matching frames allowed between matching frames) may depend on a variety of factors, such as, but not limited to, the sampling rate (as described above), how many levels of frame partitioning are included in the signature, and the kind of content being examined. The number of frames used for locking or unlocking may be in the range of 2 to 10 frames, but the range may be extended to above 10. The tolerance for non-matching frames may be in the range of 1 to 4 frames, but may be higher than 4.

Matching video sequences will now be described in more detail.

Processing a next sample frame from the broadcast stream to generate a video signature S₁. Sign video S₁With the next video signature C in the content item C₁A comparison is made (box 60).

This comparison process is typically based on the comparison used in the step of block 50 to see if a video signature S is present₁Vector sum video signature C₁Correlation of vectors (and thus matching)

If at decision block 62 there is no match, processing continues along branch 64 to block 56 and then to block 46. It should be noted that tolerance may be allowed for non-matching frames at this point, and processing may continue at block 60 in any event. If there is a match, processing continues along branch 68 at decision block 66.

At block 66, processor 24 checks whether a lock has been reached. If the lock has not been reached, processing continues along branch 70 to block 60. If the lock has been reached, processing continues along branch 72 at block 74.

At block 74, processor 24 is operable to increment a counter of the locked content item by 1. This counter may be used for various search heuristics (heiristics) described in more detail later. The processor 24 is further operable to designate a video Signature (SIG) generated from the broadcast stream prior to locking as an unknown content entry and store the video signature in the database 22. It should be noted that based on the locking and unlocking processes, all frames from the broadcast stream for which a video signature is generated by the processor 24 may be linked to a (known or unknown) content entry from the database 22.

Processing then continues with block 76, block 76 compares the next sample frame to the next signature in content item C, and processing continues at decision block 78. At decision block 78, if there is a match (branch 80), processing continues with block 76. If there is no match (branch 82), processing continues with decision block 84. It should be noted that tolerance may be allowed for non-matching frames at this point, and processing may continue with block 76.

At decision block 84, the processor checks to see if a lock has been lost. If a lock has not been lost (branch 86), processing continues with the step of block 76. If the lock has been lost (branch 88), processing continues with block 56 and then block 46.

Sampling

According to experiments performed by the inventors, sampling 1 frame per second may give a good balance between performance and exact matching. There is a trade-off between performance and accuracy when selecting the sampling rate. On the one hand, using a higher sampling rate allows matching video sequences with a higher level of precision. On the other hand, higher sampling rates require more resources, reducing overall system performance. It is also possible to sample 2, 3 or 4 or more frames per second or one frame more than one second, taking into account the balance between performance and exact matching.

Robustness

It should be noted that the video signature generation and inspection method is flexible to maintain aspect ratio, video frame rate, or bit rate variation resolution (video scaling) that may affect image quality (blocking or pixelation), and small OSD (on screen display) appearing on the video (e.g., channel defects or ticker tape or logos that may appear in different areas of the screen).

The checking method is particularly resilient to frame rate variations when the sampling is done per time unit.

It should be noted that the signature may be slightly affected by the above factors, but with proper calibration of the signature match and lock parameters, the matching method is able to overcome the above factors and match the content.

Elasticity

There may be a time shift at the start of the samples of the two video sequences being compared. For example, the original sequence may have sampled

frames

1, 26, 51, 76, 101,. and the repeated sequence may have sampled frames 5, 30, 55, 80, 105,. et. The "locking" mechanism is generally resilient to such time shifts. Even when the frames are shifted, the sampled frames will have very similar signatures as long as they are sampled from the same scene. A scene change may result in a single mismatch while subsequent frames should continue to match. The locking mechanism is also generally resilient to accidental mismatches.

Matching heuristic

Referring now to FIG. 5, FIG. 5 is a diagram of a two-dimensional hash table 90 used by the flow analyzer 12 of FIG. 1, according to an embodiment of the present invention.

For some applications (e.g., television broadcasts) that require indexing hundreds of channels 24 hours per day, 7 days per week, the amount of data in the database 22 may grow rapidly, which may adversely affect the search time when attempting to match content. This may become even more important if matching needs to be performed in real time.

The two-dimensional hash table 90 and its associated hash function are designed to speed up the search process.

The two-dimensional hash table 90 may operate over a range ([0, 0], [1, 1]) from 0 to 1 along the x-axis and from 0 to 1 along the y-axis. The range is divided into buckets 92 (only some are labeled for clarity). The X-axis is divided into sub-ranges, range-X1, range-X2, and so on. The Y-axis is divided into sub-ranges, range-Y1, range-Y2, and so on. The bucket size is the maximum difference () accepted between similar frames used in determining the subset described with reference to fig. 4. The signatures are mapped into blocks 92 based on the first two values (x, y pairs) in the vector of signatures, according to the range of x and y values covered by each bucket 92. The first two values represent the weighted average luminance of the video frame as a whole. Therefore, similar signatures will be mapped to the same bucket 92 or adjacent buckets.

In the example of fig. 5, video signatures with pairs of values X0 and Y0 are mapped into buckets with an X-range of range X4 and a Y-range of range Y4, where values X0 and Y0 fall within the X-range and Y-range, respectively.

It will be understood that the bucket size is defined by the hash function used by the hash table. The hash function is defined in more detail with reference to fig. 6 and 7.

At the time of the lookup, the signatures stored in the same bucket 96 as the lookup signature and the signatures stored in multiple buckets 94 in the surrounding neighborhood of the lookup signature are candidates for matching.

A finer search is then performed on the candidate set by calculating Pearson correlation or any suitable correlation method between the lookup vector and the candidate vector to find the candidate closest to the lookup signature and greater than a predefined threshold. If such a signature is found, it is considered a match to the sought signature.

The hash table may be further enhanced by adding a second level of hashing that will be based on the next four (x, y) tuples in the signature vector generated as a result of dividing the frame into 4(2 x 2) blocks. Thus, this second hash level would use 8-dimensional buckets, where each dimension is within the range [0, 1 ].

Search time may be further reduced by distributing the hash table across different processors/servers (e.g., one bucket per machine).

Referring now to FIG. 6, a flowchart illustrating exemplary steps performed in populating the two-dimensional hash table 90 of FIG. 5 is shown, according to an embodiment of the invention.

The two-dimensional hash table 90 is described with reference to storing and retrieving video signatures. It should be noted, however, that the two-dimensional hash table 90 may be used to store any suitable data element, whether or not a digital signature is included.

Processor 24 is operable to retrieve a video signature from memory 26, the video signature including a value X0 and a value Y0 (block 98). The value X0 and the value Y0 may be the average luminance value of the entire video frame. The video signature may have been previously generated by processor 24 or another processor from frames included in the broadcast stream or other video sequence and may now be stored.

Processor 24 is operable to provide a hash function for use with a two-dimensional hash table 90 having buckets 92 (block 100). The hash function has a plurality of inputs including a first input and a second input, the first and second inputs in combination mapping to one of the buckets 92. The first input is within a range of X values having a plurality of non-overlapping X value sub-ranges (e.g., range X1, range X2, etc. of fig. 5). The second input is within a range of Y values having a plurality of non-overlapping sub-ranges of Y values (e.g., range Y1, range Y2, etc. of fig. 5).

The hash function is provided such that the size of any sub-range of X values is equal to the first limit and the size of any sub-range of Y values is equal to the second limit. The first limit and the second limit are set according to a criterion used in determining a subset of candidate video signatures that match video signatures in the hash table 90. The first limit may be based on a maximum acceptable match difference criterion between the value X of each of the plurality of video signatures and the value X0 of the video signature. The second limit may be based on a maximum acceptable match difference criterion between the value of Y in each video signature and the value of Y0 for the video signature. The first limit value may be 2 in the range of the value of X^-8To 2^-5And the second limit may be 2 in the range of Y values^-8To 2^-5Or any other suitable value as described in more detail with reference to fig. 4.

An example of a suitable hash function is given below, which assumes that the bucket size E (range of buckets) is the same in the x-axis and y-axis and that the maximum x-value (Xmax) and the maximum y-value (Ymax) are also the same.

The bucket identification for a given x, y pair is given by the hash function:

barrel ID ═ [ (y (ru) -1) × Xmax/E ] + x (ru)

Wherein x (ru) roundup (x/E), wherein the roundup function rounds up (x/E) to the nearest integer;

wherein y (ru) roundup (y/E), wherein the roundup function rounds up (y/E) to the nearest integer; and

wherein the buckets are labeled 1, 2, 3, etc.

Another example of a hash function is to define each bucket in a hash table as having a particular row and column value in the hash table matrix. Row and column values may then be calculated based on: the column value of the bucket is floor (x/E) and the row value of the bucket is floor (y/E), where floor is rounded down to the nearest integer.

One of ordinary skill in the art of computer programming will appreciate that an appropriate hash function may be constructed based on the requirements described above.

The processor 24 typically provides the hash function by retrieving the hash function from the memory 26.

When the first input is any value in one of the X-value sub-ranges and the second input is any value in one of the Y-value sub-ranges, the hash function maps to the same one of the buckets 92. Different combinations of X-value sub-ranges and Y-value sub-ranges are mapped to different buckets 92 using a hash function.

Processor 24 is operable to input a value X0 and a value Y0 into a hash function, generating an output indicative of a bucket 96 of buckets 92 of hash table 90 (block 102). The bucket 96 is presented by way of example, and the exact bucket selected will depend on the input value of the hash function.

Processor 24 is operable to issue a command to store the video signature in bucket 96 (block 104). The processor 24 may store the video signature in the database 22 or may instruct another remote processor to store the video signature in an appropriate database.

Similarly, the hash table 90 is operable to store a plurality of video signatures Si in the bucket 92 according to the X and Y values of each video signature Si.

Referring now to FIG. 7, a flowchart illustrating exemplary steps performed in extracting data from the two-dimensional hash table 90 of FIG. 5 is shown, in accordance with an embodiment of the present invention.

The processor 24 is operable to retrieve the video signature S from the memory 26₀Video signature S₀Including the value X0 and the value Y0 (block 106). The value X0 and the value Y0 may be the average luminance value of the entire video frame. Video signature S₀May have been previously generated by processor 24 or another processor from frames included in a broadcast stream or other video sequence and is now provided to look up a signature in order to find an appropriate match in the two-dimensional hash table 90.

Processor 24 is operable to provide a hash function for use with a hash table as previously defined with reference to fig. 6 (block 108).

Processor 24 is operative to sign video S₀The value X0 and the value Y0 are input into a hash function, generating an output indicative of a bucket 96 of buckets 92 of hash table 90 (block 110). Bucket 96 is associated with one of the X value sub-ranges (range X4 of FIG. 5) and one of the Y value sub-ranges (range Y4 of FIG. 5).

Processor 24 is operable to issue commands to retrieve all video signatures Si stored in bucket 96 and all video signatures Si stored in eight different buckets 94 (fig. 5) that are sub-range adjacent to bucket 96 such that each of the eight buckets 94 is associated with: an X value sub-range of the X value sub-range that is adjacent to or the same as the X value sub-range of bucket 96; and a Y-value sub-range that is adjacent to or the same as the Y-value sub-range of bucket 96. It should be noted that if the two-dimensional hash table 90 is stored entirely in the database 22, the processor 24 is typically operable to retrieve the video signature Si from the

buckets

94, 96. However, if the

buckets

94, 96 are stored on another server or distributed over several servers, the processor 24 is operable to issue commands to cause retrieval of the video signature Si, and to cause at least part of the video signature S to be issued, typically by remote server (S), by issuing commands₀And video signatures Si extracted from bucket 96 and eight buckets 94 are compared to analyze video signatures Si.

Referring now to FIG. 8, a flowchart illustrating exemplary steps performed in a method for improving signature matching speed of the flow analyzer 12 of FIG. 1 is shown, in accordance with an embodiment of the present invention.

Additionally or alternatively, the hash table, other advanced search heuristics described above with reference to fig. 5-7 may be applied to further reduce the search space.

Assuming that the stream analyzer 12 has access to the broadcast schedule and has previously identified and tagged some of the contents of the database 22, the stream analyzer 12 may use the event name to check whether the database 22 includes a video signature associated with the name when the broadcast schedule indicates that a new event is about to begin. If included, the flow analyzer 12 may extract the first few signatures of these matching content names and may give higher priority to match the signatures with matching names.

If there is no previously indexed event in the database 22 that is tagged with the same name as indicated from the broadcast schedule, or if the broadcast schedule is not checked for any reason, the flow analyzer 12 may preferentially match the signature generated from the frame at the beginning of the content entry. For example, a 40 minute program sampled at 1 frame per second would produce 40 × 60 ═ 2400 signatures. Matching the first 5-10 signatures can reduce the search space to 0.02% of the original size.

The processor 24 is operable to retrieve the video signature S from the memory 26₀(block 112). Video signature S₀May be a video signature of a content item (e.g., a television program, an advertisement, a trailer, a promotional video) currently being broadcast. Video signature S₀May have been generated by processor 24 or another processor/server.

Processor 24 is optionally operative to receive program guide data.

Processor 24 is operative to determine a video signature S₀Whether corresponding to the beginning of the content item currently being broadcast (block 114). The beginning of a content item may be defined to be within five minutes of the beginning of the content item, but the time may be defined to be much shorter than five minutes, e.g., 5-20 seconds. The processor is optionally operable to analyze the video stream to identify a video stream indicative of a new oneScrolling and/or credits (Credit) of a content item to be started.

Processor 24 is operable to issue a command to compare video signatures S₀And a database of video signatures, compare starting with a video signature corresponding to the start of a content item (and optionally, a content item identified as a currently broadcast content item according to the program guide data), and then search for other ones of the video signatures (block 116).

Advertisements, promotional videos, and the like may not only be repetitive content, but also be repeated at high frequency. Thus, the stream analyzer 12 may give higher search priority to content that has appeared many times, as it is more likely to appear again recently.

Metadata aspects

Referring now to FIG. 9, a flowchart of exemplary steps performed in a method for identifying video signatures for metadata tagging by the stream analyzer 12 of FIG. 1 is shown, in accordance with an embodiment of the present invention.

Metadata generation is typically performed manually in real time. Manually generating metadata for many channels at once is tedious, and metadata is typically produced late and no longer useful.

The stream analyzer 12 is operable to determine which broadcast video content is being displayed high frequency and give priority to metadata generation efforts for those high frequency events. This is particularly useful to facilitate metadata generation for highly repetitive content (e.g., advertisements, trailers, promotional videos, by way of example only).

Processor 24 is operable to update a count of how many times the video sequence has been identified in the video stream(s) (block 118). The video sequence may be identified based on a positive match of the video signature from the video stream with the video signature of the video sequence in the database 22, e.g., when the video sequence has been "locked" as described above with reference to fig. 4.

Processor 24 is operable to check whether the count has exceeded a limit (block 120). The limit may depend on various factors, such as how long it is available for manual metadata tagging.

In response to the limit being exceeded, processor 24 is operable to send a message to user interface module 28 (fig. 1) that includes a content identifier for the video sequence (e.g., a sequence number or stream ID for the video sequence in database 22, and a time code for the video sequence in database 22) and/or the video sequence itself, the message indicating that the video sequence should be manually reviewed and metadata tagged (block 122).

User interface module 28 is operable to output the video sequence for display on display device 30 (fig. 1) (block 124).

User interface module 28 may be operable to receive metadata tags from a user (block 126).

Processor 24 is operable to receive the metadata tag from user interface module 28 (block 128).

Processor 24 is operable to link the metadata tag to the video signature(s) of the video sequence (block 130).

Referring now to fig. 10, a flowchart of exemplary steps performed in a method for marking a video signature associated with an episode of a series of programs by the stream analyzer 12 of fig. 1 is shown, in accordance with an embodiment of the present invention.

The video sequence may be marked by metadata as part of an episode of a series of programs, for example, the metadata marker described with reference to fig. 9 may indicate that the video sequence is part of an episode of a series of programs (block 132).

The processor 24 is operable to identify a new occurrence of the video sequence in the video stream(s) after the video sequence has been previously marked by metadata as part of an episode of a series of programs (block 134).

The processor 24 is operable, in response to identifying a new occurrence of the video sequence in the video stream(s) after the video sequence has been previously marked by the metadata as part of an episode of a series of programs, to search the broadcast data to determine an episode identification associated with the new occurrence of the video sequence (block 136).

The processor 24 is then operable to add the episode identification to the video sequence following the episode presentation video sequence.

Referring now to fig. 11, two video sequences 138 are shown being processed by the stream analyzer 12 of fig. 1.

Using the lock/unlock mechanism, unknown content may be automatically partitioned into different blocks of content. For example, at time t1 in FIG. 11, if there is an advertisement/promotion break 142 in the middle of a television program, the break may be identified by recording when the lock on the television program was lost (block 142) and regained (block 144) (140). However, the internal damage of the individual advertisements/promotions in the break 142 is still unknown. At least some of the boundaries of the advertisement/promotion in the break 142 may be identified when at least one of the advertisement/promotion entries 146 reappears (at time t2 in FIG. 11) in a different order than the initial advertisement/promotion break or without other entries in the break 142.

In the example of FIG. 11, the advertisement 146 is identified as being at the beginning of the advertisement break 148. The same advertisement 146 was also previously identified as being in the middle of the break 142. Thus, two boundaries between content items may be identified as being in the break 142.

Once the advertisements/promotions are repeated for a sufficient time, the method described with reference to FIG. 9 may be used to manually add metadata to those advertisements/promotions.

Referring now to FIG. 12, a flowchart illustrating exemplary steps performed in identifying content item boundaries within an unknown content item by the flow analyzer 12 of FIG. 1 is shown, in accordance with an embodiment of the present invention.

The processor 24 is operable to retrieve a plurality of video signatures S of a video sequence from the memory 26₀(block 150). Video signature S₀Previously generated by processor 24 or another server at time t0 from video frames in a video sequence (e.g., video frames from a broadcast stream).

Processor 24 is operable to compare video signatures S of video sequences₀And video signature S_NThe database 22 (block 152).

Processor 24 is operative to determine video signature S based on matching criteria (e.g., the matching criteria described above with reference to fig. 4)₀Not with any video tags in the database 22Name S_NMatch (box 154).

Processor 24 is operative to sign video S₀Adding to video signatures S as unknown content items_NThe database 22 (block 156).

The processor 24 is operable to retrieve a plurality of video signatures S of a video sequence from the memory 26₁(block 158). Video signature S₁Previously generated by processor 24 or another server at time t1 from video frames in a video sequence (e.g., video frames from a broadcast stream).

Processor 24 is operable to compare video signatures S₁And video signature S_NThe database 22 of (2) generating a video signature S₁And a video signature S already stored in the database 22₀Partial match therebetween (block 160).

Based on video signatures S₀And video signature S₁The processor 24 is operable to determine that the unknown content item includes at least two unknown content items (block 162).

In the middle part of the unknown content item, when the video is signed S₁And video signature S₀When there is a match, processor 24 is operable to base the video signature S on₀And video signature S₁The partial match between determines that the unknown content item includes at least three unknown content items as in the example in fig. 11.

Processor 24 is operative to update video signature S₀Tags in a database to enable signing S with video₁Matching video signatures S₀Marked as the first unknown content item and the remaining video signature S₀Marked as at least one other, different second unknown content item (block 164).

It should be noted that the methods described with reference to fig. 5-12 may be implemented using any suitable video signature generation and matching method and not just the video signature generation and matching methods of fig. 2A-5. It should also be noted that the video signature may be generated on a per frame basis, or may be a video signature that identifies multiple video frames in a single video signature. Similarly, by way of example only, video signatures may be based on motion vectors, chrominance values, and/or luminance values, where appropriate.

There is also provided, in accordance with yet another embodiment of the present disclosure, a system, including a processor and a memory for storing data used by the processor, wherein the processor is operable to update a count of how many times a video sequence has been identified in at least one video stream, check whether the count has exceeded a limit, and in response to the count exceeding the limit, send a message to a user interface module indicating that the video sequence should be manually reviewed and tagged with metadata.

Further, according to an embodiment of the present disclosure, the processor is operable to: receiving at least one video stream; calculating a plurality of first video signatures for different portions of the at least one video stream; comparing the first video signature to a database of second video signatures comprising a plurality of video signatures of the video sequence; and identifying that the video sequence is in the at least one video stream based on a positive match of at least some of the first video signatures with at least some of the video signatures of the video sequence.

Further, in accordance with an embodiment of the present disclosure, in response to the count exceeding the limit, the processor is further operable to send a content identifier of the video sequence to the user interface module.

Further, in accordance with an embodiment of the present disclosure, the processor is further operable to send the video sequence to the user interface module in response to the count exceeding the limit.

Further, in accordance with an embodiment of the present disclosure, a user interface module is included in the system, and the user interface module is operable to output the video sequence for display on a display device.

Further, according to an embodiment of the present disclosure, the processor is further operable to receive a metadata tag from the user interface module and link the metadata tag to at least one video signature of the video sequence.

Further, according to an embodiment of the present disclosure, the metadata tag indicates that the video sequence is part of a series of program episodes.

Further, according to an embodiment of the present disclosure, the processor is further operable to search the broadcast data to determine an episode identification associated with the new occurrence of the video sequence in response to identifying the new occurrence of the video sequence in the at least one video stream and after the video sequence has been previously marked with metadata as part of a series of program episodes.

Further, according to an embodiment of the present disclosure, the user interface module is operable to receive a metadata tag from a user.

According to yet another embodiment of the present disclosure, there is also provided a method including: the method includes updating a count of how many times the video sequence has been identified in the at least one video stream, checking whether the count has exceeded a limit, and in response to the count exceeding the limit, sending a message to a user interface module indicating that the video sequence should be manually reviewed and tagged with metadata.

Further, according to an embodiment of the present disclosure, the method includes: receiving at least one video stream; calculating a plurality of first video signatures for different portions of the at least one video stream; comparing the first video signature to a database of second video signatures comprising a plurality of video signatures of the video sequence; and identifying that the video sequence is in the at least one video stream based on a positive match of at least some of the first video signatures with at least some of the video signatures of the video sequence.

Further, according to an embodiment of the present disclosure, the method includes: in response to the count exceeding the limit, a content identifier of the video sequence is sent to a user interface module.

Further, according to an embodiment of the present disclosure, the method includes: in response to the count exceeding the limit, the video sequence is sent to a user interface module.

Further, according to an embodiment of the present disclosure, the method includes: the video sequence is output for display on a display device.

Further, according to an embodiment of the present disclosure, the method includes: a metadata tag is received from the user interface module and linked to at least one video signature of the video sequence.

Further, according to an embodiment of the present disclosure, the method includes: in response to identifying a new occurrence of a video sequence in at least one video stream and after the video sequence has been previously marked with metadata as part of a series of program episodes, broadcast data is searched to determine episode identifications associated with the new occurrence of the video sequence.

Further, according to an embodiment of the present disclosure, the method includes: a metadata tag is received from a user.

There is also provided, in accordance with yet another embodiment of the present disclosure, a system, including a processor and a memory for storing data used by the processor, wherein the processor is operable to: retrieving a plurality of first video signatures of a video sequence from a memory; comparing a first video signature of a video sequence to a database of video signatures; determining that the first video signature does not match any video signature in the database according to the matching criteria; adding the first video signature as an unknown content item to a database of video signatures; retrieving a plurality of second video signatures of the video sequence from the memory; comparing the second video signature to a database of video signatures to produce a partial match between the second video signature and the first video signature; determining that the unknown content item includes at least two unknown content items based on a partial match between the first video signature and the second video signature; and updating the tags of the first video signatures in the database such that the first video signature that matches the second video signature is tagged as a first unknown content item and the remaining first video signatures are tagged as at least one other, different, second unknown content item.

Further, in accordance with an embodiment of the present disclosure, in a middle portion of the unknown content item, the second video signature and the first video signature match, and the processor is operable to determine that the unknown content item includes at least three unknown content items based on the partial match between the first video signature and the second video signature.

According to yet another embodiment of the present disclosure, there is also provided a system including a processor and a memory for storing instructions for executing the methodA memory storing data for use by the processor, wherein the processor is operable to: retrieving data for a video frame from a memory, the data comprising a plurality of luminance metrics, each luminance metric associated with a different entry in a matrix; weighting each luminance measure by the row position of the luminance measure in the matrix; calculating a weighted average luminance value Ly of the weighted luminance metrics weighted by the row position; weighting each luminance measure by the column position of the luminance measure in the matrix; calculating a weighted average luminance value Lx of the weighted luminance metrics weighted by the column positions; and creating a video signature S of the video frame_NThe video signature S_NIncluding a weighted average luminance value Ly and a weighted average luminance value Lx.

Further, according to an embodiment of the present disclosure, the video frame comprises a plurality of pixels, each luminance measure providing a measure of the luminance of a different pixel.

Further, according to embodiments of the present disclosure, each luminance measure is a measure of the luminance of a discrete cosine transformed version of an image of a video frame.

Further, according to an embodiment of the present disclosure, the video frame comprises a plurality of pixels, and the luminance measure collectively provides a measure of luminance for at least 60% of the pixels.

Further, according to an embodiment of the present disclosure, the processor is operable to: dividing data of the video frame into a plurality of sets of luminance metrics corresponding to a plurality of different regions R of the video frame, each set being associated with a different sub-matrix of the matrix; and for each set of luminance metrics, calculating a weighted average luminance value lx (r) of the luminance metrics in the set such that each luminance metric is weighted by the row position of the luminance metric in the sub-matrix of the set, calculating a weighted average luminance value ly (r) of the luminance metrics in the set such that each luminance metric is weighted by the column position of the luminance metric in the sub-matrix of the set; and adding the weighted average luminance value ly (R) of the set and the weighted average luminance value Lx (R) of the set to the video signature S of the video frame_N。

Further, in accordance with an embodiment of the present disclosure, the processor is operable to partition data of the video frame into regions corresponding to a plurality of different regions RSets of luminance metrics such that some sets of luminance metrics correspond to a plurality of different first regions of the video frame, some sets of luminance metrics correspond to sub-regions of different first regions of the video frame, and a video signature S_NComprising a plurality of weighted average luminance values calculated for frame divisions of at least three levels comprising: (a) not dividing the frame, (b) dividing the frame into different first regions, and (c) dividing each of the different first regions into sub-regions.

Further, in accordance with an embodiment of the present disclosure, the processor is operable to divide the data of the video frame into sets of luminance metrics by dividing the video frame into four regions, each of the four regions corresponding to a different set.

Further, in accordance with an embodiment of the present disclosure, the processor is operable to further divide each of the four regions into four sub-regions, each sub-region corresponding to a different set.

Further, according to an embodiment of the present disclosure, the processor is operable to determine the video signature S according to a matching criterion_NA plurality of video signatures S partially matched_iAnd comparing the video signatures S_NAnd video signature S_iTo determine a video signature S_iWith the video signature S in the subset of_NThe best matching video signature that matches.

According to yet another embodiment of the present disclosure, there is also provided a method including: retrieving data for a video frame from a memory, the data comprising a plurality of luminance metrics, each luminance metric associated with a different entry in a matrix; weighting each luminance measure by the row position of the luminance measure in the matrix; calculating a weighted average luminance value Ly of the weighted luminance metrics weighted by the row position; weighting each luminance measure by the column position of the luminance measure in the matrix; calculating a weighted average luminance value Lx of the weighted luminance metrics weighted by the column positions; and creating a video signature S of the video frame_NThe video signature S_NIncluding a weighted average luminance value Ly and a weighted average luminance value Lx.

According to still another embodiment of the disclosureThere is also provided, in an embodiment, a system comprising a processor and a memory for storing data used by the processor, wherein the processor is operable to: retrieving video signatures from memory₀Video signature S₀Comprising at least one first average luminance value of the video frame and a plurality of second weighted average luminance values of a plurality of different regions R of the video frame; determining and video signing S according to matching standard₀A plurality of video signatures S matched to the at least one first average luminance value_iA subset of (a); and comparing the plurality of second weighted average luminance values with the video signature S_iIn an attempt to find a video signature S that meets the matching criteria among the video signatures of the subset_i。

Further, according to an embodiment of the present disclosure, video signature S₀Comprises a weighted average luminance value Lx and a weighted average luminance value Ly, and a plurality of video signatures S_iEach of which includes a weighted average luminance value Lx_iAnd weighted average luminance value Ly_i。

Further, according to an embodiment of the present disclosure, the processor is operable to compare the weighted average luminance value Lx with a plurality of video signatures S_iA weighted average luminance value Lx of each of_iAnd comparing the weighted average luminance value Ly with a plurality of video signatures S_iIs weighted average luminance value Ly of each of_iAnd the processor is operable to determine the video signature S_iIncludes a video signature S having_iVideo signature of (1): (a) video signature S₀Weighted average luminance value Lx within the first limit of the weighted average luminance value Lx of_iAnd (b) video signature S₀Weighted average luminance value Ly within the second limit of the weighted average luminance value Ly of_i。

Further, according to embodiments of the present disclosure, the first limit and the second limit are at 2^-8To 2^-5Within a range of maximum luminance values.

Further, according to an embodiment of the present disclosure, for each different region R of a video frame, a video signature S₀The second weighted average luminance value of (1) includes: (a) weighted averageAverage luminance value Lx (R), and (b) weighted average luminance value Ly (R) for a video signature S₀Of different regions R of the video frame, a plurality of video signatures S_iEach of which comprises: (a) a weighted average luminance value Lx (R) i, and (b) a weighted average luminance value Ly (R) i, and the processor is operable to, for each video signature S_iCalculates a video signature S based on the correlation of (a) the average weighted luminance metrics lx (r) and ly (r) with respect to (b) the average weighted luminance metrics lx (r) i and ly (r) i, respectively₀And each video signature S_iThe correlation between them.

Further, according to an embodiment of the present disclosure, the processor is operable to sign S for each video_iFurther based on the weighted average luminance measures Lx and Ly, respectively, relative to the weighted average luminance value Lx_iAnd Ly_iThe correlation is calculated.

Further, according to embodiments of the present disclosure, the processor is operable when a video signature S is calculated_iHas a correlation greater than a certain minimum correlation coefficient and is all video signatures S in the subset_iThe correlation coefficient of the highest correlation coefficient of the video signature S is selected_i。

Further, according to an embodiment of the present disclosure, the minimum correlation coefficient is 0.7.

Further, according to an embodiment of the present disclosure, the correlation is calculated based on Pearson correlation.

In practice, some or all of these functions may be combined in a single physical component, or alternatively, may be implemented using multiple physical components. These physical components may include hardwired or programmable devices, or a combination of both. In some embodiments, at least some of the functions of the processing circuitry may be performed by a programmable processor under the control of appropriate software. For example, the software may be downloaded to the device in electronic form over a network. Alternatively or additionally, the software may be stored in a tangible, non-transitory computer readable storage medium, such as an optical, magnetic, or electronic memory.

It is understood that the software components may be implemented in the form of Read Only Memory (ROM), if desired. Software components may typically be implemented in hardware, if desired, using conventional techniques. It is also understood that software components may be embodied as, for example: as a computer program product or on a tangible medium. In some cases, the software components may be embodied as signals that may be interpreted by an appropriate computer, although such an implementation may not be included in some embodiments of the invention,

it is to be understood that various features of the invention which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination.

It will be appreciated by persons skilled in the art that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. A system for identifying a video sequence using a video signature, the system comprising a processor; and a memory for storing data used by the processor, wherein the processor is operable to:

retrieving a first video signature of a video frame from the memory, the first video signature comprising a value X₀Sum value Y₀Wherein X is₀Is an average luminance value weighted according to column position, Y₀Is an average luminance value weighted according to the line position;

providing a hash function for use with a hash table having a plurality of buckets for storing video frames, the hash function having a plurality of inputs including a first input and a second input, the first and second inputs in combination mapping to one of the buckets, wherein:

(a) the first input is within a range of X values having a plurality of non-overlapping X value sub-ranges;

(b) the second input is within a range of Y values having a plurality of non-overlapping sub-ranges of Y values;

(c) when the first input is any value within one of the X-value sub-ranges and the second input is any value within one of the Y-value sub-ranges, the hash function maps to the same one of the buckets; and is

(d) Different combinations of the X-value sub-range and the Y-value sub-range are mapped to different ones of the buckets using the hash function;

signing the value X of the first video₀And the value Y₀Input into the hash function, generating an output indicative of a first one of the buckets of the hash table; and

issuing a command to store the first video signature in the first bucket, wherein

The size of any said sub-range of X values is equal to the first limit;

the size of any said sub-range of Y values is equal to the second limit;

the first and second limits are set according to criteria for determining a subset of candidate video signatures that match the first video signature in the hash table;

the first limit is based on an X value of each of a plurality of video signatures and the value X of the first video signature₀A maximum acceptable match difference criterion between; and is

The second limit is based on a Y value of each of the video signatures and the value Y of the first video signature₀The maximum acceptable match difference criterion between.

2. The system of claim 1, wherein the first video signature is included in a video signature S of a video frame_NIn (1).

3. The system of claim 1, wherein the first limit is 2 within the range of X values^-8To 2^-5Within the range of the maximum value ofAnd the second limit is 2 within the range of the Y value^-8To 2^-5Is within the range of the maximum value of (a).

4. A system for identifying a video sequence using a video signature, the system comprising a processor; and a memory for storing data used by the processor, wherein the processor is operable to:

signing the value X of the first video₀And the value Y₀Input into the hash function, generating an output indicative of a first one of the buckets of the hash table;

issuing commands to retrieve all video signatures stored in the first bucket and all video signatures stored in eight different ones of the buckets that are sub-range adjacent to the first bucket; and

issuing a command to compare at least a portion of the first video signature with video signatures retrieved from the first bucket and the eight buckets, wherein

The size of any said sub-range of X values is equal to the first limit;

the size of any said sub-range of Y values is equal to the second limit;

5. The system of claim 4, wherein: the first video signature is included in a video signature S of a video frame₀Performing the following steps; and each of the plurality of video signatures is included in a different data signature S_iIn (1).

6. The system of claim 4, wherein the first limit is 2 within the range of X values^-8To 2^-5And the second limit is 2 within the range of the Y value^-8To 2^-5Is within the range of the maximum value of (a).

7. A method of identifying a video sequence using a video signature, comprising:

retrieving a first video signature of a video frame from memory, the first video signature comprising a value X₀Sum value Y₀Wherein X is₀Is an average luminance value weighted according to column position, Y₀Is an average luminance value weighted according to the line position;

providing a hash function for use with a hash table having a plurality of buckets, the hash function having a plurality of inputs including a first input and a second input, the first and second inputs in combination mapping to one of the buckets, wherein:

The size of any said sub-range of X values is equal to the first limit;

the size of any said sub-range of Y values is equal to the second limit;

8. The method of claim 7, wherein the first video signature is included in a video signature S of a video frame_NIn (1).

9. The method of claim 7, wherein the first limit is 2 within the range of X values^-8To 2^-5And the second limit is 2 within the range of the Y value^-8To 2^-5Is within the range of the maximum value of (a).

10. A method of identifying a video sequence using a video signature, comprising:

signing the value X of the first video₀And the value Y₀Input to the hash functionGenerating an output indicating a first one of the buckets of the hash table;

The size of any said sub-range of X values is equal to the first limit;

the size of any said sub-range of Y values is equal to the second limit;

11. The method of claim 10, wherein the first video signature is included in a video signature S of a video frame₀Performing the following steps; and each of the plurality of video signatures is included in a different data signature S_iIn (1).

12. The method of claim 10, wherein the first limit is 2 within the range of X values^-8To 2^-5And the second limit is 2 within the range of the Y value^-8To 2^-5Is within the range of the maximum value of (a).