CA 02280163 1999-08-04
VIDEO SEQUENCE RECOGNITION
This invention relates to video processing, and more particularly to a method and apparatus for the automatic recognition of video sequences.
The reliable automatic recognition of video sequences has application in a number of areas. One application, for example, is the recognition of video commercials.
Marketing executives, for example, may wish to know when and how often a particular commercial has been broadcast.
Clearly, one possibility would be to store an entire video sequence and continually compare broadcast signals with a library of stored sequences. This solution is not very practical since it would require a prohibitive amount of memory and processing power to implement.
EP 283570 teaches the use of digital signatures to match video sequences.
However, such signatures, which represent, for example, the time between black scenes or color changes, are insufficient to permit fast and accurate matching of complex video sequences.
EP 367585 describes a method of video recognition that involves creating digital signatures for all the frames in a sequence. This method involves an undue amount of processing power.
An object of the invention is to alleviate this problem and provide a practical method of recognizing a predetermined video sequence that is reasonably accurate yet at the same time requires an acceptable amount of processing power.
According to the present invention there is provided a method of detecting video sequences comprising the steps of receiving an incoming video stream consisting of sequences of successive frames, creating on-the-fly digital signatures in accordance with a predetermined algorithm, comparing said digital signatures with a plurality of stored digital signatures created in accordance with said predetermined algorithm, and identifying a candidate sequence of said incoming video stream as corresponding to said stored sequence in the event of a positive match of said digital signatures in accordance with predetermined criteria, characterized in that selected special frames of predetermined type are identified in said sequences; unique digital signatures are created for said special frames, said unique digital signatures being uniquely dependent on the individual characteristics of said special frames, said digital signatures of said special frames of a sequence collectively form a signature file for the sequence, and the digital signatures of the special frames are matched with the digital signatures of frames of the same type in stored signature files to identify a candidate sequence.
The video sequence is in digital form, either as a result of analog-to-digital conversion of an analog broadcast signal or because the signal is already in digital format.
A video stream is composed of a sequence of frames, each consisting of a grid of pixels. Pixel values can be represented, for instance, by their red, green, and blue components, or by hue, luminance and saturation. In a preferred embodiment, the digital signature is derived from the pixel values of frames intelligently selected from those forming a video sequence.
The digital signature can be created from a live or recorded video sequence.
It can be stored in a file, which can, for example, contain the duration of the sequence, the number of hue or luminance categories considered, and for each selected frame in the sequence, a histogram representing the percentage of pixels in the defined categories, and scene information taken from a SmartFrame™ encoder, as described in our co-pending patent application number 2,190,785, filed on November 20, 1996 and entitled METHOD OF PROCESSING A VIDEO STREAM, which is herein incorporated by reference. This encoder extracts information relating to the difference in pixel values between pairs of frames in a sequence.
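The per-frame histogram described above can be illustrated with a minimal sketch. It assumes luminance values in the range 0 to 255 and equal-width categories; the function name `luminance_histogram`, its parameters and the choice of sixteen categories are hypothetical and are not taken from the specification.

```python
def luminance_histogram(pixels, num_categories=16, max_value=255):
    """Return the percentage of pixels falling in each luminance category.

    pixels is a flat list of luminance values in 0..max_value.
    Equal-width categories are an assumption made for illustration.
    """
    if not pixels:
        raise ValueError("frame has no pixels")
    counts = [0] * num_categories
    bucket_width = (max_value + 1) / num_categories
    for value in pixels:
        # Clamp to the last category so that max_value never overflows.
        index = min(int(value / bucket_width), num_categories - 1)
        counts[index] += 1
    total = len(pixels)
    return [100.0 * count / total for count in counts]
```

Because the result is expressed in percentages rather than raw counts, two histograms can be compared even when the frames they come from have different sizes.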
The processing engine can then compare one or more stored signatures to a candidate live or recorded video stream to find a match. The candidate signature is created in the same manner as the stored signatures, except that in the case of the candidate sequence, the signature is created "on-the-fly", i.e. on a real-time basis.
The candidate signature is a moving signature. This means that it is created from successive overlapping groups of frames.
The processing engine must continually slot the stored signature in time with the target video stream and generate an alert when there is a positive match. A signature must be re-slotted when it is determined that the target stream stops following or matching it.
This operation is performed using a slider routine to be described in detail below.
It should be noted that the stored video signature frame size and sample rate do not have to match the video frame size and sample rate of the candidate sequence, which may, for example, be a broadcast signal. The video signature matching will tolerate varying broadcast quality.
In order to reduce the amount of required processing and memory power, the signature file preferably only contains the signatures or histograms of a limited number of frames in a video sequence. These frames are preferably frames having a particular significance, namely "cuts", which represent sudden changes in scene, "time-outs", which are frames which occur after a predetermined period of time in the sequence when no "cuts" have been detected, and "start-frames", which are simply the first frame in a particular sequence. For example, if the signature file is to be created for a particular commercial, first a histogram is taken for the first frame in the sequence and this is stored as the first signature in the signature file. The system then waits for the first frame corresponding to a cut, and stores this as the next signature unless a certain time, say two seconds, has elapsed without a cut occurring, in which case the signature of the next frame is taken and this is tagged as a "time-out" frame. This is to ensure that histograms are stored even when a sequence does not include cuts.
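The selection of start-frames, cuts and time-outs described above can be sketched as follows. This is only an illustration under stated assumptions: cut detection is taken as given (each frame arrives with an `is_cut` flag), the time-out is measured from the most recently stored frame, and the names `build_signature_file`, `frame_rate` and `timeout_seconds` are hypothetical.

```python
def build_signature_file(frames, frame_rate=30.0, timeout_seconds=2.0):
    """Select the special frames of a sequence and tag each one.

    frames is an iterable of (histogram, is_cut) pairs in display order.
    Returns a list of (frame_index, tag, histogram) entries, where tag is
    "start", "cut" or "timeout".
    """
    timeout_frames = int(timeout_seconds * frame_rate)
    signatures = []
    last_stored = None
    for index, (histogram, is_cut) in enumerate(frames):
        if last_stored is None:
            tag = "start"      # the first frame is always stored
        elif is_cut:
            tag = "cut"        # sudden change of scene
        elif index - last_stored >= timeout_frames:
            tag = "timeout"    # no cut seen within the time-out period
        else:
            continue           # ordinary frame, nothing is stored
        signatures.append((index, tag, histogram))
        last_stored = index
    return signatures
```

For a 50-frame sequence at 10 frames per second with a single cut at frame 5, this sketch stores frames 0, 5, 25 and 45, so a signature file is produced even for long stretches without cuts.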
The present invention can offer a high percentage hit rate and a low percentage false hit rate.
The invention also provides an apparatus for detecting video sequences comprising means for receiving an incoming video stream consisting of sequences of successive frames, means for creating on-the-fly digital signatures in accordance with a predetermined algorithm, means for comparing said digital signatures with a plurality of stored digital signatures created in accordance with said predetermined algorithm, and means for identifying a candidate sequence of said incoming video stream as corresponding to said stored sequence in the event of a positive match of said digital signatures in accordance with predetermined criteria, characterized in that said means for creating on-the-fly digital signatures selects special frames of predetermined type in said sequences and creates unique digital signatures for said special frames, said unique digital signatures being uniquely dependent on the individual characteristics of said special frames, and said digital signatures of said special frames of a sequence collectively forming a signature file for the sequence, and said means for identifying a candidate sequence matches the digital signatures of the special frames with the digital signatures of frames of the same type in stored signature files to identify the candidate sequence.
The invention can be implemented in an IBM-compatible personal computer equipped, for example, with a Pentium microprocessor and a commercial video card.
The invention will now be described in more detail, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 is a block diagram of a video recognition apparatus in accordance with the invention;
Figure 2 shows the main routine of the memorize module of the apparatus shown in Figure 1;
Figure 3 shows the main routine of the search module of the apparatus shown in Figure 1;
Figure 4 shows the algorithm for obtaining a digital signature;
Yet another advantage of the invention is the ability to have multiple channels that transfer the information in surround sound.
Yet another advantage is elimination of the need for several cables to transfer the information from one receiver to the next in surround sound.
Brief Description of Drawings
Fig. 1 is a schematic block diagram of an MTS stereo and surround sound encoder embodying features of the present invention.
Fig. 2 is a schematic block diagram of a first section of the encoder of Fig. 1 shown in greater detail.
Fig. 3 is a schematic block diagram of a second section of the encoder of Fig. 1 shown in greater detail.
Fig. 4 is a schematic block diagram of a third section of the encoder of Fig. 1 shown in greater detail.
Fig. 5 is a graph illustrating the signal-to-noise ratio of the encoder of Fig. 1.
Detailed Description of Drawings
Fig. 1 is a schematic block diagram of a Multichannel Television Sound (MTS) stereo and surround sound encoder, generally designated by a reference numeral 10, having a left audio input 12, a right audio input 14, a video input 16, an output 18, an audio breakout matrix (ABM) 20, a surround sound conditioner (SSC) 22, a video stripper matrix (VSM) 24, an L+R low pass clamping filter 25, a mixer 26, an amplifier circuit 28, and a timing circuit 29.
The encoder 10 utilizes two pilot signal frequencies. Output of the VSM 24 is coupled to the timing circuit 29 to produce the two pilot signal frequencies as discussed below. One pilot signal is at 15.734 kHz, which is a television's horizontal rate, for synchronizing the encoder 10. The second pilot signal is at 31.468 kHz, which is two times the horizontal rate of the television, for synchronizing transfer of the L-R signal information.
WO 98/35492 PCT/CA98/00069
In Fig. 2, the ABM 20 is shown in greater detail. The ABM 20 receives a left input signal and a right input signal at the left audio input 12 and the right audio input 14, respectively. The input signals are matrixed by a resistor network 30. The resistor network 30, which has a plurality of resistors each having a value of approximately 100 kΩ, generates stereo information and surround sound information, collectively referred to as the L-R signal, and monaural information, referred to as the L+R signal. Use of large resistors in the resistor network 30 causes attenuation in the signals. Therefore, audio amplifiers for the L-R and the L+R signals, designated 32 and 34 respectively, return the levels of the signals to normal. Resistors 36 and 38 are selected in conjunction with the amplifiers 32 and 34, respectively, to produce the desired amplification of the signals. The L+R signal is transmitted through the L+R low pass clamping filter 25 and the amplifier circuit 28 to the output 18. The L+R low pass clamping filter 25 is a low pass filter (LPF) that will clamp the signal at 15.734 kHz and at 31.468 kHz to prevent interference with the pilot signals operating at 15.734 kHz and 31.468 kHz. The L-R signal output of the ABM 20 is transmitted to the SSC 22.
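The matrixing performed by the resistor network 30 amounts to forming the sum and the difference of the two input channels. A minimal numerical sketch is given below; it ignores the attenuation and re-amplification stages, and the function names are hypothetical. The inverse operation is not described in the text and is included only to show that no information is lost.

```python
def matrix_stereo(left, right):
    """Form the L+R (monaural) and L-R (stereo/surround) signals."""
    sum_signal = [l + r for l, r in zip(left, right)]
    diff_signal = [l - r for l, r in zip(left, right)]
    return sum_signal, diff_signal


def dematrix_stereo(sum_signal, diff_signal):
    """Recover left and right from L+R and L-R (illustration only)."""
    left = [(s + d) / 2.0 for s, d in zip(sum_signal, diff_signal)]
    right = [(s - d) / 2.0 for s, d in zip(sum_signal, diff_signal)]
    return left, right
```

When the two inputs are identical, the L-R signal is zero and only the monaural L+R signal carries information.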
As shown in Fig. 3, the SSC 22 includes a pre-emphasizer 39, a Dolby noise reduction (dbx) compander 40, a low pass (LP) filter network 42, a regeneration amplifier 44, an L-R clamping filter 46, and a diode circuit 48.
The L-R signal is received at the SSC 22. The pre-emphasizer 39 operates at 396 µs. The pre-emphasizer 39 is used to condition the L-R signal for the dbx compander 40. In operation, the pre-emphasizer 39 gives the higher frequencies of the L-R signal the same power as the lower frequencies of the L-R signal. The higher frequencies need a boost in power because the lower frequencies travel much easier due to Doppler effects. Thus, the surround sound information contained at the higher frequencies of the L-R signal now has more power.
The pre-emphasized signal is then sent to the dbx compander 40. The dbx compander 40 amplitude compresses the L-R signal according to the MTS standard. Amplitude compression is used to improve the signal-to-noise (S/N) ratio. Amplitude compression is performed by routing the output of the L-R clamping filter 46 through a transistor buffer stage 54 (Fig. 4), through a constant current circuit 50 and to the dbx compander 40. The constant current circuit 50 is a root-mean-square (RMS) stage of the dbx compander 40, which controls the amplitude of the L-R signal.
The output L-R signal of the compander 40 is passed through the LP filter network 42. The LP filter network 42 filters out any unwanted noise to produce a filtered L-R signal. The LP filter network 42 attenuates the original L-R signal during the filter process so that the filtered L-R signal will be slightly attenuated. Therefore, the filtered L-R signal is passed through the amplifier 44. The amplifier 44 returns the filtered L-R signal back to the proper signal level.
At this point, the filtered L-R signal must be clamped off at the 15.734 kHz and 31.468 kHz frequencies to prevent interference with the pilot signals. The L-R clamping filter 46 is used to clamp the signals at the 15.734 kHz and 31.468 kHz frequencies. The L-R clamping filter 46 will trap the signal to create about 45 dB roll-off at the 15.734 kHz and 31.468 kHz frequencies. The L-R clamping filter 46 effectively traps the L-R signal, at 15.734 kHz, to ground and prevents the 15.734 kHz pilot signal from taking hits. Likewise, the L-R clamping filter 46 traps any switching signal contained in the L-R signal at 31.468 kHz to ground to provide clean stereo/surround sound output. Thus, the information signal will be reduced to minimal levels and will not interfere with or allow the pilot signal to take hits. In addition to clamping the filtered L-R signal at the 15.734 kHz and 31.468 kHz frequencies, voltage spikes in the filtered L-R signal must be eliminated. The diode circuit 48 eliminates voltage spikes by leveling off voltage spikes so the peak-to-peak (P-P) voltage does not exceed 1.4 volts. The diode circuit 48 will take the filtered L-R signal and produce a leveled L-R signal. The leveled L-R signal will have the frequencies clamped off at the two frequencies 15.734 kHz and 31.468 kHz.
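The leveling action of the diode circuit 48 can be pictured as a simple limiter. The sketch below assumes an ideal, symmetric limiter centered on zero volts, so that a 1.4 volt peak-to-peak ceiling corresponds to limits of plus and minus 0.7 volts; a real diode circuit has a softer characteristic, and the function name is hypothetical.

```python
def limit_peak_to_peak(samples, max_peak_to_peak=1.4):
    """Clip samples so the peak-to-peak swing never exceeds the ceiling.

    Assumes a symmetric limiter centered on zero volts.
    """
    limit = max_peak_to_peak / 2.0
    return [max(-limit, min(limit, sample)) for sample in samples]
```

Samples already inside the limits pass through unchanged; only the spikes are leveled off.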
Fig. 4 shows the VSM 24, the L+R low pass clamping filter 25, the mixer 26, the amplifier circuit 28, the timing circuit 29, and the transistor buffer stage 54. As discussed above, the output of the L-R clamping filter 46 is sent to the transistor buffer stage 54. The output of the transistor buffer stage 54 is inputted to a balance modulator 56. The balance modulator 56 modulates the leveled L-R signal to produce an upper side band and a lower side band, around the pilot signal at the 31.468 kHz switching rate, as a reduced carrier amplitude modulated (AM) L-R signal. The switching rate of 31.468 kHz for the balance modulator 56 is produced by the timing circuit 29 as discussed below. A combining amplifier 58 blends the AM L-R signal output of the balance modulator 56 with the pilot signal at 15.734 kHz to produce a mixed L-R signal. Timing for the pilot signal at 15.734 kHz is produced by the modulator timing circuit 59. The modulator timing circuit 59 is synchronized to the 15.734 kHz rate of the television, which is produced by synchronization circuitry.
The synchronization circuitry synchronizes the switching rate at 31.468 kHz with the pilot signal at 15.734 kHz. The synchronization circuitry is made up of the VSM 24 and the timing circuit 29. The VSM 24 removes color or chroma information from a video signal to produce a luminous video pattern signal. The luminous video pattern signal is used to keep the encoder 10 (Fig. 1) in sync with the 15.734 kHz horizontal rate of the television. The luminous video pattern signal is sent to a synchronous separator 62. The synchronous separator 62 looks only at the 15.734 kHz horizontal rate to produce a clean horizontal sync signal. The sync signal is sent to a JK flip-flop 64. The JK flip-flop 64 produces a "saw" like signal pattern which drives a phase lock loop (PLL) 66 at a switching rate of 31.468 kHz. The PLL 66 in turn provides the 31.468 kHz switching rate to the balance modulator 56. Also, a JK flip-flop 65 provides the 15.734 kHz timing for the modulator timing circuit 59. Thus, the timing circuit 29 produces the sync signal that keeps the pilot signal at 15.734 kHz in sync with the pilot signal at 31.468 kHz switching rate. Accordingly, the balance modulator 56 is switched at 31.468 kHz in step with the pilot signal at 15.734 kHz to produce the AM L-R signal in step with the horizontal rate of the television.
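The frequency relationships described above can be checked with a short sketch. It assumes an idealized balance modulator that simply multiplies the audio by a cosine carrier at the switching rate (a real switching modulator also produces harmonics), and the function names are hypothetical. Multiplying two cosines yields components at the difference and sum frequencies, which are the lower and upper side bands.

```python
import math

HORIZONTAL_RATE_HZ = 15_734.0  # television horizontal rate quoted in the text


def pilot_frequencies(horizontal_rate_hz=HORIZONTAL_RATE_HZ):
    """Return the two pilot frequencies: the horizontal rate and twice it."""
    return horizontal_rate_hz, 2.0 * horizontal_rate_hz


def sideband_frequencies(carrier_hz, audio_hz):
    """Return the (lower, upper) side band frequencies for one audio tone."""
    return carrier_hz - audio_hz, carrier_hz + audio_hz


def balanced_modulate(audio_hz, carrier_hz, t):
    """Idealized balanced modulation of a cosine tone at time t (seconds)."""
    audio = math.cos(2.0 * math.pi * audio_hz * t)
    carrier = math.cos(2.0 * math.pi * carrier_hz * t)
    return audio * carrier
```

For a 1 kHz tone in the L-R signal, the side bands in this idealized model fall at 30.468 kHz and 32.468 kHz, on either side of the 31.468 kHz switching rate.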
whether the current frame represents a cut, a timeout or a Searchstarted frame, i.e. the first frame in a search sequence.
Step 52 sets the variable FirstDifference equal to the difference between the current histogram and the first signature histogram (0), i.e. the histogram for frame zero in the signature file.
Step 53 determines whether the frame represents a cut and the stored signature file contains at least one cut-tagged histogram. If so, block 57 sets the variable Index equal to Firstcutindex, and the variable SecondDifference equal to the result of comparing the histogram for the frame with the first cut-tagged histogram for the signature file. If not, the variable Index is set to 1 in block 56 and the variable SecondDifference is set to the result of the comparison between the current histogram (Histogram) and the second histogram of the signature file (corresponding to Index 1; the variable Index starts at 0).
Step 54 then determines whether either FirstDifference or SecondDifference is less than a threshold. If yes, which indicates a positive match of the histogram for the current frame with the selected histogram in the signature file, the program sets the variable Position to the Index value that corresponds to the smaller of the two (FirstDifference and SecondDifference), and calls an assign slider routine (block 63), which will be described in more detail below. If not, which indicates the absence of a match, the program calls a compare with previous sliders routine at block 60.
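Steps 52 to 54 can be sketched as follows. The text does not define how two histograms are compared, so the sum of absolute differences used here is an assumption, as are the function names and the representation of the signature file as a list of (tag, histogram) pairs. Returning None stands in for calling the compare with previous sliders routine.

```python
def histogram_difference(first, second):
    """Sum of absolute differences between two histograms (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(first, second))


def match_first_frames(histogram, is_cut, signature, threshold):
    """Return the matched Position in the signature file, or None.

    signature is a list of (tag, histogram) pairs, with index 0 holding
    the start-frame; it must contain at least two entries.
    """
    if len(signature) < 2:
        raise ValueError("signature file needs at least two histograms")
    # Step 52: compare with the histogram for frame zero.
    first_difference = histogram_difference(histogram, signature[0][1])
    # Step 53: pick the second histogram to compare against.
    first_cut_index = next(
        (i for i, (tag, _) in enumerate(signature) if tag == "cut"), None
    )
    if is_cut and first_cut_index is not None:
        index = first_cut_index
    else:
        index = 1
    second_difference = histogram_difference(histogram, signature[index][1])
    # Step 54: a match requires at least one difference below the threshold.
    if min(first_difference, second_difference) >= threshold:
        return None
    return 0 if first_difference <= second_difference else index
```

A returned position would then be handed to the assign slider routine, which starts tracking the signature file from that histogram.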
Both these routines will be described in more detail.
The assign slider routine is shown in Figure 7. The slider is essentially a pointer identifying a particular frame in the signature file whose histogram matches the current or most recently matched frame in the video sequence. Each time a first match occurs, a "slider" is assigned. The compare sliders routine to be described below with reference to Figure 8 then attempts to match the subsequent histograms in the signature file with the current video sequence, and moves the pointer (slider) by one histogram towards the last histogram in the file whenever a match occurs.
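The slider described above can be pictured as a small object holding a position in the signature file. The sketch below is not the routine of Figures 7 and 8, whose details are not reproduced here; the class name, the difference metric (sum of absolute differences) and the threshold handling are assumptions made for illustration.

```python
class Slider:
    """Pointer to the most recently matched histogram of a signature file."""

    def __init__(self, signature, position, threshold):
        self.signature = signature  # list of histograms, in sequence order
        self.position = position    # index of the last matched histogram
        self.threshold = threshold  # maximum difference counted as a match

    @property
    def at_end(self):
        """True once the last histogram in the file has been matched."""
        return self.position == len(self.signature) - 1

    def advance(self, histogram):
        """Move one histogram forward if the next one matches.

        Returns True when the pointer moved and False otherwise.
        """
        next_position = self.position + 1
        if next_position >= len(self.signature):
            return False
        expected = self.signature[next_position]
        difference = sum(abs(a - b) for a, b in zip(histogram, expected))
        if difference < self.threshold:
            self.position = next_position
            return True
        return False
```

In this sketch a slider that reaches the last histogram of the file has followed the whole stored sequence, which would correspond to a positive match.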
In the search routine, the system in effect builds one large file, adding one table or histogram every time a scene changes (cut) or a timeout occurs. An attempt is made to fit the stored signature of the candidate sequence by "sliding" it within the large signature