CA2312997A1 - Apparatus and methods for manipulating sequences of images - Google Patents


Info

Publication number
CA2312997A1
CA2312997A1
Authority
CA
Canada
Prior art keywords
image sequence
operative
video
frame
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002312997A
Other languages
French (fr)
Inventor
Asher Hershtik
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CONTENTWISE Ltd
Original Assignee
Contentwise Ltd.
Asher Hershtik
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Contentwise Ltd. and Asher Hershtik
Publication of CA2312997A1
Current legal status: Abandoned

Classifications

    • G: Physics
    • G11B: Information storage based on relative movement between record carrier and transducer
    • G11B 27/005: Reproducing at a different information rate from the information rate of recording
    • G11B 20/00086: Circuits for prevention of unauthorised reproduction or copying, e.g. piracy
    • G11B 20/00884: Copy-prevention circuits involving a watermark, i.e. a barely perceptible transformation of the original data which can nevertheless be recognised by an algorithm
    • G11B 27/28: Indexing; addressing; timing or synchronising; measuring tape travel by using information signals recorded by the same method as the main recording
    • G11B 27/34: Indicating arrangements
    • H04N: Pictorial communication, e.g. television
    • H04N 5/913: Television signal processing for scrambling; for copy protection
    • H04N 2005/91307: Copy protection by adding a copy protection signal to the video signal
    • H04N 2005/91335: Copy protection where the added copy protection signal is a watermark

Abstract

This invention discloses a video sequence viewing apparatus including an image sequence display unit (110) operative to display a sequence of images at a speed determined in accordance with a control signal, and an image sequence analyzer (100) operative to perform an analysis of the sequence of images and to generate the control signal in accordance with a result of the analysis. A
watermarking method including providing an image sequence to be watermarked and performing a predetermined alteration of the length of the image sequence is also disclosed.

Description

WO 99/30488 PCT/IL98/00596
APPARATUS AND METHODS FOR MANIPULATING SEQUENCES OF IMAGES
FIELD OF THE INVENTION
The present invention relates to apparatus and methods for manipulating sequences of images.
BACKGROUND OF THE INVENTION
Issued US Patent No. 5,790,236, entitled "Movie Processing System", inventors Asher Hershtik and Dani Rozenbaum, assignees ELOP Electronics Industries Ltd., Rehovot, Israel and Television Multilingue S.A., Geneva, Switzerland, date of issue Aug. 4, 1998, describes a movie processing system in which a plurality of versions of a movie are compared, including a movie version synchronizer and an output movie generator receiving a synchronization signal, representing the mutual synchronization of the movie versions, from the synchronizer and generating therefrom an output movie editing list.
Israel Patent Application No. 119504 describes a system and method for audio-visual content verification.
"Intro" is a known function in audio applications in which a user of a CD player can "scan" a CD by hearing a small portion of each audio segment (e.g. a song) on the CD.
The disclosures of all publications mentioned in the specification and of the publications cited therein are hereby incorporated by reference.
SUMMARY OF THE INVENTION
The present invention seeks to provide improved apparatus and methods for manipulating sequences of images.

There is thus provided in accordance with a preferred embodiment of the present invention a system for capturing the signature of video frames, using only small amounts of data. The video signature technology typically captures a small amount of data characterizing each frame. The applicability of the invention includes all uses that require video identification, without the necessity of viewing.
Preferably, the system of the present invention has a PC-based platform and is operative in real-time to analyze motion pictures, video and broadcasting, inter alia.
The system of the present invention typically uses small amounts of data, to capture a signature from a stream of video frames. The signature is then matched to a continuous stream of data.
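The signature scheme itself is not spelled out here (the preferred methods are those of issued US Patent No. 5,790,236 and Appendix A), but the idea of reducing each frame to a small amount of data can be sketched as follows. The block-mean reduction and the grid size are illustrative assumptions, not the patent's actual signature function:

```python
def frame_signature(frame, grid=4):
    """Reduce a frame (a list of rows of 0-255 luminance values) to a
    grid x grid tuple of block means: a compact, hypothetical stand-in
    for the patent's (unspecified) per-frame signature."""
    h, w = len(frame), len(frame[0])
    bh, bw = h // grid, w // grid
    sig = []
    for gy in range(grid):
        for gx in range(grid):
            # Average the luminance over one block of the frame.
            total = 0
            for y in range(gy * bh, (gy + 1) * bh):
                for x in range(gx * bw, (gx + 1) * bw):
                    total += frame[y][x]
            sig.append(total // (bh * bw))
    return tuple(sig)
```

With a 4 x 4 grid this captures only 16 values per frame, which is the kind of data reduction that makes matching a signature against a continuous stream practical.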
Preferably, the system of the present invention includes a watcher which synchronizes various versions of a motion picture for diverse multi-language needs including but not limited to satellite TV broadcasts, on-board film projections and DVD authoring. Another application for the system of the present invention is simplification of the restoration of damaged films by using the best footage from different versions. Yet another application is rapid adaptation of sound tracks for colorized movies.
The watcher subunit typically does not digitize video sources but rather fingerprints pictures. As a result, the watcher can process substantially any video source, such as a S-VHS video source or a 1" video source. Typically, a cassette is inserted, and a checklist is employed to choose the language to be used as a reference for matching. The user then presses PLAY and the watcher autonomously and typically without user intervention registers the fingerprint of each frame.
This procedure is repeated for the next language version of the film to be checked (cassette insertion, language selection, play). After the various versions have been fingerprinted, the versions are automatically matched, showing the differences that were detected.
The matcher preferably is operative to generate any of a variety of outputs. For example, if it is desired to broadcast multiple language versions of a film simultaneously on satellite TV, the versions must be synchronized; the matcher can generate an EDL (editing list) based on the shots common to all the versions. In multi-language DVD applications, the matcher may be operative to automatically generate a branching instruction list, based on 'holes' caused by missing data in the various versions.
The system of the present invention also preferably includes a synopter for efficient viewing of video sequences. Applications include stock footage, rushes and speed-viewing of selected (typically user-selected) items of interest.
The system of the present invention also preferably includes a storyboard application which displays the first frame of every shot in an image sequence, thereby to facilitate fast-tracking of shots from rushes or stock footage. This application can operate as a search option for professional and home use. The technology shown and described herein may be integrated into VCRs, thereby facilitating speed-searching.
For example, a user may press a first activating button and as a result, his VCR automatically adjusts search speed according to the amount of action in any given scene of a movie: slower for action-packed sequences and faster for less active moments. If the user presses a second activating button, the VCR automatically screens the first few seconds of every shot in a video, allowing the user to quickly preview the video's content.
Controlling and registering transmission of commercial spots is one of the broadcaster's most tedious jobs. The system of the present invention preferably includes a spot shotter which monitors the off-air signal, detecting the exact moment when specific portions of any given transmission are broadcast, and automatically logging relevant information such as time of transmission and duration.
For example, the spot shotter may be "told" to detect every appearance of commercials belonging to a particular manufacturer.
Another difficult, time-consuming function for which the system of the present invention preferably is suited is automatic checking of video dubs for uniformity of content.
There is thus provided, in accordance with a preferred embodiment of the present invention, video sequence viewing apparatus including an image sequence display unit operative to display a sequence of images at a speed determined in accordance with a control signal, and an image sequence analyzer operative to perform an analysis of the sequence of images and to generate the control signal in accordance with a result of the analysis.
Further in accordance with a preferred embodiment of the present invention, the analysis of the sequence of images includes an analysis of the amount of motion in different images within the sequence, and the control signal receives a value corresponding to relatively high speed for images in which there is a small amount of motion and a value corresponding to relatively low speed for images in which there is a large amount of motion.
Also provided, in accordance with another preferred embodiment of the present invention, is image sequence viewing apparatus including a shot identifier operative to perform an analysis of a sequence of images and to identify shots within the sequence of images, and an image sequence display unit operative to sequentially display at least one initial image of each identified shot.
Further in accordance with a preferred embodiment of the present invention, the image sequence display unit is operative to display the at least one initial image of each identified shot in response to a user request.
Still further in accordance with a preferred embodiment of the present invention, the image sequence display unit is operative to display the at least one initial image of all shots sequentially until stopped by the user.
Also provided, in accordance with another preferred embodiment of the present invention, is a display system for displaying a first image sequence as aligned relative to a second, related image sequence, the system including an image sequence analyzer operative to generate a representation of a first image sequence including at least one row of pixels of each image in the first image sequence, and an aligned image sequence display unit operative to display the rows generated by the analyzer, side by side, in a single screen, wherein gaps are provided between the rows, in order to denote images which are missing, relative to the second image sequence.
Further in accordance with a preferred embodiment of the present invention, the at least one row includes at least one horizontal row of pixels and at least one vertical row of pixels.
Still further in accordance with a preferred embodiment of the present invention, the display unit is operative to display an isometric view of a stack of the images in at least one of the first and second image sequences.
Additionally in accordance with a preferred embodiment of the present invention, the stack includes a horizontal stack.
Further in accordance with a preferred embodiment of the present invention, the analyzer also includes an image sequence aligner operative to align the first and second image sequences to one another and to provide an output denoting images which are missing from the first image sequence, relative to the second image sequence.
Additionally provided, in accordance with yet another preferred embodiment of the present invention, is a copyright monitoring system including an image sequence comparing unit operative to conduct a comparison between an original image sequence and a suspected pirate copy of the original image sequence and to generate copyright information describing infringement of copyright of the original image sequence by the suspected pirate copy, and a copyright infringement information generator operative to generate a display of the copyright information.
Further in accordance with a preferred embodi-ment of the present invention, at least a portion of the comparison is conducted at the shot level.
Still further in accordance with a preferred embodiment of the present invention, at least a portion of the comparison is conducted at the frame level.
Further in accordance with a preferred embodiment of the present invention, the copyright information quantifies the infringement of copyright of the original image sequence by the suspected pirate copy.
Also provided, in accordance with yet another preferred embodiment of the present invention, is a watermarking method including providing an image sequence to be watermarked, and performing a predetermined alteration of the length of the image sequence.
Further in accordance with a preferred embodiment of the present invention, the performing step includes duplicating at least one predetermined image (e.g. frame or field) in the image sequence.
Still further in accordance with a preferred embodiment of the present invention, the performing step includes omitting at least one predetermined image (e.g. frame or field) from the image sequence.
Further in accordance with a preferred embodiment of the present invention, the image sequence analyzer is operative to generate aligned representations of the first and second image sequences and the display unit is operative to display the aligned representations on a single screen.
Also provided, in accordance with yet another preferred embodiment of the present invention, is a video sequence viewing method including displaying a sequence of images at a speed determined in accordance with a control signal, and performing an analysis of the sequence of images and generating the control signal in accordance with a result of the analysis.
Further provided, in accordance with yet another preferred embodiment of the present invention, is an image sequence viewing method including performing an analysis of a sequence of images to identify shots within the sequence of images, and sequentially displaying at least one initial image of each identified shot.
Additionally provided, in accordance with yet another preferred embodiment of the present invention, is a method for displaying a first image sequence as aligned relative to a second, related image sequence, the method including generating a representation of a first image sequence including at least one row of pixels of each image in the first image sequence, and displaying the rows generated by the analyzer, side by side, in a single screen, wherein gaps are provided between the rows, in order to denote images which are missing, relative to the second image sequence.
Further provided, in accordance with yet another preferred embodiment of the present invention, is a copyright monitoring method including conducting a comparison between an original image sequence and a suspected pirate copy of the original image sequence to generate copyright information describing infringement of copyright of the original image sequence by the suspected pirate copy, and generating a display of the copyright information.
Still further provided, in accordance with yet another preferred embodiment of the present invention, is a watermarking system including an image sequence input device operative to input an image sequence to be watermarked, and an image sequence length alteration device operative to perform a predetermined alteration of the length of the image sequence.
BRIEF DESCRIPTION OF THE DRAWINGS AND APPENDIX
The present invention will be understood and appreciated from the following detailed description, taken in conjunction with the drawings and appendix in which:
Fig. 1 is a simplified block diagram illustration of a commercial verification system constructed and operative in accordance with a preferred embodiment of the present invention;
Fig. 2 is a simplified flowchart illustration of a preferred method of operation for the system of Fig. 1;
Fig. 3 is a simplified block diagram illustration of a system for viewing image sequences at variable speed, depending on temporally local characteristics of the image sequence such as the amount of action;
Fig. 4 is a simplified flowchart illustration of a preferred method of operation for the system of Fig. 3;
Fig. 5 is a simplified block diagram illustration of a system for finding and displaying shots in an image sequence;
Fig. 6 is a simplified flowchart illustration of a preferred method of operation for the system of Fig. 5;
Fig. 7 is a simplified block diagram illustration of a system for displaying alignment of two image sequences;
Fig. 8 is an isometric view of an image sequence;
Fig. 9 is an example of an isometric view of three different-language versions of the same motion picture, where gaps in the representation of a particular version indicate missing images, relative to other versions;
Fig. 10 is a simplified block diagram illustration of a copyright monitoring system constructed and operative in accordance with a preferred embodiment of the present invention;
Fig. 11 is a simplified block diagram of an electronic watermarking system constructed and operative in accordance with a preferred embodiment of the present invention; and
Appendix A is a copy of Israel Patent Application No. 119504.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Fig. 1 is a simplified block diagram illustration of a commercial verification system constructed and operative in accordance with a preferred embodiment of the present invention. Fig. 2 is a simplified flowchart illustration of a preferred method of operation for the system of Fig. 1. It is appreciated that the system of Figs. 1-2 is also useful for applications other than commercial verification, such as searching for illicit use of copyrighted sequences of images.
The apparatus of Fig. 1 includes a broadcasting system 10 which broadcasts commercials provided on a suitable receptacle 20 such as a CD or DVD or video cassette. A commercial verification workstation 30 is operative to receive broadcasts from the broadcasting system (either from the air or from a receptacle which was used to store broadcast material coming from the air) and to compare the broadcasts to an original commercial residing on the receptacle 20. The workstation attempts to identify some or all of the original commercial within the broadcasted material.
Any suitable method may be used to compare the broadcast with the original commercial. Preferably, the comparison is on the frame level, i.e. individual frames in the broadcast, or signatures thereof, are compared to individual frames in the original commercial, or signatures thereof. Shot-level comparison, in which entire shots in the broadcast are compared to entire shots in the original commercial, is typically not accurate enough. Preferred methods for comparing sequences of images, such as video images, including signature extraction and signature search (steps 60 and 70 of Fig. 2), are described in issued US Patent No. 5,790,236 and in Appendix A. Preferably, the broadcast and the original commercial are compared based only on the content of the advertisement and without requiring any special additions, e.g. without external indices, special information in vertical blanks and other special additions.
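As a rough illustration of the signature-search step (step 70 of Fig. 2), the sketch below scans a stream of per-frame signatures for a run matching the commercial's signatures within a tolerance. The integer signatures, the tolerance value and the linear scan are all simplifying assumptions; the preferred search methods are those of US Patent No. 5,790,236 and Appendix A:

```python
def find_sequence(broadcast_sigs, commercial_sigs, tol=4):
    """Return the start indices in `broadcast_sigs` at which every
    signature of `commercial_sigs` matches within `tol`.  Both
    arguments are lists of integer per-frame signatures."""
    hits = []
    n, m = len(broadcast_sigs), len(commercial_sigs)
    for start in range(n - m + 1):
        # Frame-level comparison: every frame of the commercial must
        # match the corresponding broadcast frame within tolerance.
        if all(abs(broadcast_sigs[start + i] - commercial_sigs[i]) <= tol
               for i in range(m)):
            hits.append(start)
    return hits
```

A hit's index, converted to a timestamp, would yield the time-of-transmission entry in the workstation's log; an absent or partial hit would flag incompleteness.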
The output of the workstation 30 typically includes a recording of the commercial as broadcast and an indication of the time or times at which the commercial was broadcast, plus an indication of any incompleteness in the commercial as broadcast. The output may be provided on a screen, in electronic form, as hard copy or in any other suitable format.
Figs. 1-2 illustrate a "cooperative" application in which the original commercial is available. It is appreciated that in some applications, in which the broadcaster and/or the advertiser are non-cooperative, the original commercial may not be available. For example, commercial monitoring of a competitor's commercials may be carried out, in which case the original commercial is, of course, not available. In these cases, a first appearance of a target commercial can be identified by a human being viewing the broadcast, and this appearance of the target commercial can then be treated as the original commercial. Alternatively, commercial monitoring can be carried out without having an original commercial, i.e. without having a model to which to compare the broadcast.
For example, the system may monitor recurrence of short image sequences (i.e. image sequences whose length corresponds to the known range of lengths which characterize a commercial) at time intervals which correspond to known intervals between commercial breaks.
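One way such model-free monitoring might be sketched, under the simplifying assumptions that each frame has already been reduced to an integer signature and that the break spacing is expressed in frames (the window length and gap bounds are illustrative parameters, not values from the text):

```python
from collections import defaultdict

def recurring_segments(sigs, seg_len, min_gap, max_gap):
    """Find short signature windows that recur at intervals consistent
    with commercial breaks.  `sigs` is the per-frame signature stream;
    windows of `seg_len` frames whose repeats fall between `min_gap`
    and `max_gap` frames apart are reported."""
    seen = defaultdict(list)  # window contents -> list of start frames
    for i in range(len(sigs) - seg_len + 1):
        seen[tuple(sigs[i:i + seg_len])].append(i)
    hits = []
    for window, starts in seen.items():
        # Report consecutive occurrences whose spacing fits a break.
        for a, b in zip(starts, starts[1:]):
            if min_gap <= b - a <= max_gap:
                hits.append((window, a, b))
    return hits
```

In practice the exact-repetition assumption would be relaxed to a tolerance match, as broadcast signatures of the same spot rarely repeat bit-for-bit.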
Fig. 3 is a simplified block diagram illustration of a system for viewing image sequences at variable speed, depending on temporally local characteristics of the image sequence such as the amount of action. Fig. 4 is a simplified flowchart illustration of a preferred method of operation for the system of Fig. 3.
The apparatus of Fig. 3 includes a receptacle 90 storing an image sequence and an image sequence analyzer 100 which is typically operative to derive from each image in the image sequence a signature representing at least one characteristic of the image. For example, a "span" signature may be employed, which represents the amount of action in the image. The amount of action in an image is typically defined as the rate of change between that image and adjacent images. Preferred methods for derivation of a "span" signature are described in issued US Patent No. 5,790,236 and in Appendix A.
The analyzer typically thresholds the signature (step 140) in order to obtain a control signal having a small number of possible values, such as 3 or 4 possible values. More generally, the control signal need not be a simple thresholded version of the signature (e.g. of the span). The control signal can have only as many values as the image sequence display unit 110 has viewing speeds.
However, any suitable function may be employed to assign values to the control signal as a function of the signature. For example, the values assigned to the control signal may depend in part on second or higher order derivatives of the signature variable.
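A possible form of the thresholding function of step 140, with illustrative threshold and speed values (the patent fixes neither; it only requires that low-action spans map to faster playback and high-action spans to slower playback):

```python
def speed_control(span, thresholds=(10, 30, 60), speeds=(8, 4, 2, 1)):
    """Map a per-frame 'span' (amount-of-action) value to one of the
    display unit's playback speeds: a small span (little action) yields
    a high speed multiplier, a large span yields normal speed.
    Threshold and speed values are hypothetical."""
    for t, s in zip(thresholds, speeds):
        if span < t:
            return s
    return speeds[-1]
```

Three thresholds give four control-signal values, matching the text's remark that the control signal needs only as many values as the display unit has viewing speeds.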
The control signal is fed to an image sequence display unit 110 such as a VCR which adjusts its speed accordingly.
Different viewing speeds can be provided by mechanical display units having motors with adjustable speed. Alternatively, if the display unit is electronic, different viewing speeds may be provided by varying the rate of display of images stored in the electronic unit.
Fig. 5 is a simplified block diagram illustration of a system for finding and displaying shots in an image sequence. Fig. 6 is a simplified flowchart illustration of a preferred method of operation for the system of Fig. 5.
The system of Fig. 5 includes a receptacle 160, such as a CD, DVD or video cassette, which stores an image sequence. An image sequence display unit, such as a VCR, is operative to display the image sequence as stored on the receptacle. The image sequence is also accessed by a shot identifier 170 which is operative, preferably on-line, to identify shots in the image sequence. Any suitable method may be used to identify the shots (step 200).

Preferred methods for identifying shots are described in issued US Patent No. 5,790,236 and in Appendix A.
The shot identifier provides a control signal, based on the locations of the shots within the image sequence, to the display unit 180. The control signal typically instructs the image sequence display unit to display a predetermined number of frames, such as one or a few frames, at each cut, i.e. at each interface between shots. In other words, the image sequence display unit typically displays the first one or few images in each shot.
If the receptacle storing the image sequence is a physical medium such as a video cassette, there is typically a time-gap between the display of the frames representing the i'th shot, and the display of frames representing the (i+1)'th shot. However, if the receptacle storing the image sequence is an electronic medium, there is typically no time-gap between the display of the frames representing subsequent shots.
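A minimal sketch of the shot-identification step, assuming per-frame integer signatures and using a simple difference threshold to declare a cut (the actual preferred methods are those of US Patent No. 5,790,236 and Appendix A; real shot detectors are considerably more robust):

```python
def shot_starts(sigs, cut_threshold=50):
    """Return the index of the first frame of each shot, declaring a
    cut wherever consecutive per-frame signatures differ by more than
    `cut_threshold` (an illustrative, hypothetical criterion)."""
    starts = [0]  # the sequence always opens a shot
    for i in range(1, len(sigs)):
        if abs(sigs[i] - sigs[i - 1]) > cut_threshold:
            starts.append(i)
    return starts
```

The returned indices are exactly what the control signal needs: the display unit shows one or a few frames starting at each index, yielding the storyboard/preview behaviour described above.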
It is appreciated that the image sequence display unit may display initial images for all of the shots in response to a single user command. Alternatively, the user may provide a "next shot" input each time s/he wishes to view the initial images of the next shot.
Fig. 7 is a simplified block diagram illustration of a system for displaying alignment of two image sequences. The system of Fig. 7 includes two image sequence receptacles 220 and 230, such as CDs, DVDs or video cassettes, storing two respective image sequences, such as two versions of the same motion picture. The two image sequences are aligned by an image sequence aligner 240. Image sequence aligner 240 may use any suitable image sequence aligning method to align the two sequences to one another. Preferred image sequence aligning methods are described in issued US Patent No. 5,790,236 and in Appendix A.

An isometric view generator 250 is operative to generate an isometric view of each of the image sequences. A simple isometric view of an image sequence, as illustrated in Fig. 8, may comprise an isometric view of a stack of the images in the sequence, wherein each image is regarded as a one-pixel thick rectangle, wherein all visible faces of each pixel have the color value of the pixel. It is appreciated that in the isometric view of Fig. 8, the top row of each image is visible along the top of the horizontal stack and the rightmost column of each image is visible along the side of the horizontal stack.
The isometric view generator 250 receives information regarding the alignment of the two sequences to one another from the image sequence aligner 240 and introduces gaps into the isometric view so as to illustrate the alignment. The output of the isometric view generator is typically an electronic representation 260 of an isometric view of the aligned image sequences. This representation 260 is provided to an image sequence display unit 270, such as a VCR, for display. Preferably, both aligned sequences are displayed, in isometric view, on a single screen.
Fig. 9 is an example of an isometric view of three different-language versions of the same motion picture, where gaps in the representation of a particular version indicate missing images, relative to other versions. As shown, the German version is most complete and includes no gaps, the French version has one large gap (a sequence of missing frames, relative to the German version) and two smaller subsequent gaps, and the English version has a total of four gaps which are not in the same locations as any of the 3 gaps of the French version.
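The gap display of Fig. 9 can be illustrated in miniature: each version is rendered as a strip of frames aligned against the union of all versions, and missing frames show up as gaps. The character rendering below is only a stand-in for the isometric pixel-row view; the frame counts and version contents are invented for the example:

```python
def alignment_strip(total_frames, present):
    """Render one version of an aligned sequence as a strip: '#' where
    the version contains the frame, '.' where the frame is missing
    relative to the union of all versions (a gap in Fig. 9's sense).
    `present` is the set of union-frame indices the version contains."""
    return "".join("#" if i in present else "."
                   for i in range(total_frames))
```

For instance, a complete 8-frame version renders as `########`, while a version missing frames 3 and 4 renders as `###..###`, making the relative gaps visible at a glance when the strips are stacked on one screen.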
Fig. 10 is a simplified block diagram illustration of a copyright monitoring system constructed and operative in accordance with a preferred embodiment of the present invention. The apparatus of Fig. 10 typically includes receptacles 300 and 310, which may comprise video cassettes, DVDs, CDs and the like, which respectively store an original motion picture and a suspect pirate copy thereof. The image sequences stored in receptacles 300 and 310 are accessed by an image sequence comparison unit 320 which typically operates either at shot level or at frame level, to compare the two image sequences. Any suitable method may be employed for comparison of the two image sequences, such as the methods described in issued US Patent No. 5,790,236 and in Appendix A.
The output of the image sequence comparison unit 320 typically comprises copyright monitoring information such as two aligned isometric views of the original movie and the suspect pirate copy, in which gaps denote missing frames and identical frames are placed opposite one another. Alternatively or in addition, quantitative copyright monitoring information may be provided, such as the number of frames in the original movie which appear in the suspect pirate copy.
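One plausible form of the quantitative measure, assuming each frame has already been reduced to a signature and letting signature equality stand in for the frame-level match of Fig. 10 (a simplification; the comparison unit's actual matching is that of US Patent No. 5,790,236 and Appendix A):

```python
def infringement_ratio(original_sigs, suspect_sigs):
    """Fraction of the original's frames whose signatures also occur
    in the suspect copy: one hypothetical quantitative measure of
    copyright infringement."""
    suspect = set(suspect_sigs)
    matched = sum(1 for s in original_sigs if s in suspect)
    return matched / len(original_sigs)
```

A ratio near 1.0 would indicate that the suspect copy reproduces essentially all of the original; multiplying by the original's frame count recovers the raw frames-in-common figure mentioned above.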
Fig. 11 is a simplified block diagram of an electronic watermarking system constructed and operative in accordance with a preferred embodiment of the present invention. According to a preferred embodiment of the present invention, image sequences such as motion pictures, news clips, commercials etc. are watermarked not by tampering in any way with any particular frame, since this tampering may impair viewing quality, but rather by either removing or adding a small number of frames from or to the image sequence. The watermark of each version of each image sequence is typically stored in an electronic databank.
In the illustrated embodiment, original and pirate copies 350 and 360 respectively of a motion pic-wo ~r~o4ss Pc~rn~9sroos~

ture are received by a frame-level image sequence aligner 370, in electronic form, from a video cassette (after digitization) or from a CD or DVD or other suitable image sequence receptacle. The frame-level image sequence aligner 370 is operative, according to a first embodiment of the present invention, to align the image sequence of the pirate copy to the image sequence of the original copy which preferably includes a "maximal", i.e. "union"
version of the motion picture, whose frames include the union of all frames in all versions of the motion picture. Any suitable method may be employed to align the two image sequences, preferably at frame level. Preferred methods for alignment of image sequences are described in issued US Patent No. 5,790,236 and in Appendix A.
Once the alignment has been determined, a watermark identifier 380 is operative to attempt to compare each of a plurality of watermarks to the aligned pirate copy. Preferably, each version of a motion picture is watermarked, including the post-production version and each subsequent version. The "post-production version" is the motion picture as originally produced, before subsequent versions are derived therefrom. Subsequent versions are typically characterized by at least one of the following:
a. Intended distribution (airline, cable TV, cinema, etc.);
b. Language;
c. Censorship (X-rated, PG-rated, R-rated, etc.).
The watermarks may be defined relative to the original copy 350. For example, "Frame #4974" is typically frame no. 4974 in image sequence 350. This is advantageous because each suspected pirate copy need only be aligned once, to the original copy 350 (e.g. the post-production copy).
Alternatively, the frame-level image sequence aligner 370 is operative, according to a second embodiment of the present invention, to align the image sequence of the pirate copy to the image sequences of each watermarked version separately, rather than aligning the pirate copy image sequence only once, to the "maximal" or "union" version of the motion picture. In this embodiment, the watermark of each version need not be defined relative to the original copy 350. For example, if every 500th field is duplicated in a PG-rated version of a motion picture, this simple rule is stored, rather than computing the fields in the maximal (complete) version which correspond to each 500th field in the PG-rated (incomplete) version.
As shown in the illustrated example, three watermarks are stored in this system, one for each of three versions of a motion picture: the post-production version, the airline version, and the cinema version. The airline and cinema versions are typically produced from the watermarked post-production version. Typically, the watermark of the post-production version is deleted when the airline, cinema, television versions, etc., are derived from the post-production version. The post-production watermark is replaced by the watermark of the version being generated. For example, if every 500th frame is duplicated in the post-production version, whereas the watermark of the airline version calls for deletion of every 1000th frame, then the airline version is generated from the post-production version as follows:
a. the duplications of each 500th frame are removed; and
b. each 1000th frame is deleted.
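A minimal sketch of this two-step derivation, using the illustrative 500/1000 rules above (the function names and the list representation of frames are assumptions for illustration):

```python
def remove_duplications(frames, every):
    """Undo a 'duplicate every N-th frame' watermark by dropping the
    extra copy that follows each N-th original frame."""
    out, count = [], 0
    i = 0
    while i < len(frames):
        out.append(frames[i])
        count += 1
        if count % every == 0:
            i += 1  # skip the watermark copy that follows this frame
        i += 1
    return out

def delete_frames(frames, every):
    """Apply a 'delete every N-th frame' watermark."""
    return [f for i, f in enumerate(frames, start=1) if i % every != 0]

# post-production master: 2000 frames with every 500th frame duplicated
master = []
for n in range(1, 2001):
    master.append(n)
    if n % 500 == 0:
        master.append(n)
assert len(master) == 2004

# derive the airline version: strip the old watermark, apply the new one
airline = delete_frames(remove_duplications(master, every=500), every=1000)
assert airline == [n for n in range(1, 2001) if n % 1000 != 0]
```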
As shown in the illustrated example, the post-production watermark comprises a duplication of four specific frames. The airline version watermark comprises a duplication of one frame and removal of 3 other specific frames. The cinema version watermark comprises removal of 3 specific frames.


The watermark identifier 380 is operative to indicate the version from which the pirate copy is derived. For example, if the watermark identifier 380 finds that frames 17, 479 and 19,999 in the original copy 350 are missing in the pirate copy 360, the watermark identifier puts out a suitable output indication that the pirate copy was derived from the cinema version of a film.
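The identification step can be sketched as a lookup of the missing-frame set against stored version signatures. The frame numbers reuse the cinema example above; the dictionary layout and the airline numbers are illustrative assumptions:

```python
# Hypothetical version signatures: frame numbers (counted in the original
# copy 350) that each version's watermark deletes.
SIGNATURES = {
    "airline": {1000, 2000, 3000},
    "cinema": {17, 479, 19999},
}

def identify_version(missing_frames, signatures):
    """Return the version whose deleted-frame signature is contained in
    the set of original frames found missing from the aligned pirate copy."""
    missing = set(missing_frames)
    for version, deleted in signatures.items():
        if deleted <= missing:
            return version
    return None

# frames 17, 479 and 19,999 missing -> derived from the cinema version
assert identify_version([17, 479, 19999], SIGNATURES) == "cinema"
```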
It is appreciated that the software components of the present invention may, if desired, be implemented in ROM (read-only memory) form. The software components may, generally, be implemented in hardware, if desired, using conventional techniques.
It is appreciated that various features of the invention which are, for clarity, described in the contexts of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment may also be provided separately or in any suitable subcombination.
It will be appreciated by persons skilled in the art that the present invention is not limited to what has been particularly shown and described hereinabove.
Rather, the scope of the present invention is defined only by the claims that follow:

APPENDIX A
COPY OF ISRAEL PATENT APPLICATION NO. 119504
(WO 99/30488, PCT/IL98/00596)
SYSTEM AND METHOD FOR AUDIO-VISUAL CONTENT VERIFICATION
Field of the Invention
The present invention relates to audio-visual test and measurement systems, and more particularly to a method and apparatus for comparing a given content stream with a reference content stream, for verifying the correctness of a given data stream and for detecting various content-related problems, such as missing or distorted content, as well as badly synchronized content streams, such as audio or sub-titles delayed with respect to the video stream.
"Audio-visual content" is herein defined as a stream or sequence of video, audio, graphics (sub-pictures) and other data where the semantics of the data stream is of value. The term "stream" or "sequence" is of particular importance, since it is assumed that the ordering of content elements along a time or space line constitutes part of the content.
Background of the Invention
Elementary content streams may be combined into a composite stream. Starting with a simple monophonic audio or video transmission, an application which involves two video streams (for stereoscopic display), six or eight surround audio channels and several sub-picture channels can be formed. Generally, the relative alignment of these streams is highly significant and should be verified.
In known systems, an analysis is made of the video signal for detecting disturbances of that signal, such as illegal colors. An "illegal color" is one that is outside the practical limit set for a particular format. Other types of video measurement involve injecting known signals at the source and evaluating certain properties thereof at the receiving end.
With the introduction of the serial digital interface (SDI) standard, now used as a carrier for video, audio and data, error detection schemes are designed for testing data integrity. Such a scheme has already been proposed.
The known video test and measurement systems are, however, generally not capable of detecting content-related problems, such as missing or surplus frames, program time shift, color or luminance distortions which are within the acceptable parameter range, mis-alignment of content streams such as audio or sub-pictures with respect to video, etc.
In many facilities, an observer will look at the display to detect quality problems. An experienced operator may detect and interpret a variety of problems in recording and transmission. An observer can do good rule-based or subjective evaluation of video content; however, human inspection of content is costly and unpredictable. Additionally, some content-related defects cannot be detected by an observer.
As state-of-the-art content delivery technologies such as multi-channel Digital TV, Digital Video Disk and the Internet provide more content and interactivity, content-related problems are more likely to occur, since the path from the content sources to the end-user becomes more complicated.
Additionally, the huge amounts of content generated, edited, recorded and transmitted in multiple channels and multiple distribution slots (such as video-on-demand) make human inspection almost impossible.

It is therefore a broad object of the invention to provide a computerized method and system for comparing a given content stream with a reference content stream, for verifying that the given stream is in fact the correct one, and to detect various content-related defects.
In many cases, the reference stream consists of the original program material and the actual stream consists of the broadcast or played content. In other cases, the designation of one stream as the reference stream is arbitrary; for example, comparing one content stream with a backup stream. However, for convenience of description hereinafter, the terms "reference content stream" and "actual content stream" will be used, without limiting the generality of the invention.
For illustrative purposes only, the invention will be described by two applications: broadcast automation and digital versatile disc (DVD) pre-mastering. This description, however, is not intended to limit the generality of the invention or its applicability to other domains.
Today's multi-channel, multi-program applications cannot be controlled manually. Including commercials and program trailers, a daily schedule may consist of hundreds of video segments, intended to play seamlessly. Such a schedule is usually implemented by an automation system. The schedule is loaded into the system as some form of a table (a "play-list") describing the program's name, start time, duration and source, e.g., storage media, unique identifier, time-code of first frame.

The storage media can be a tape or a digital file. Generally, the program source material is organized in a hierarchical manner, with most of the content stored off-line. The forthcoming programs are loaded on a tape machine and sometimes, as in the case of a commercial or trailer, digitized to a disk-based server. The complex paths of the various elements of content may further increase the content mismatch probability.
An example of such an automation system is the ADC-100 from Louth Automation. The ADC-100 can run up to 16 lists simultaneously, and control multiple devices including disk servers, video servers, tape machines, cart machines, VTRs, switchers, character generators and audio carts. The present invention can verify the identity and integrity of the broadcast content, providing important feedback for the automation system or facility manager.
DVD is a new generation of the compact disc format which provides increased storage capacity and performance, especially for video and multimedia applications. DVD for video is capable of storing eight audio tracks and thirty-two "sub-picture" tracks, which are used for subtitles, menus, etc. These can be used to put several selectable languages on each disc. The interactive capabilities of consumer DVD players include menus with a small set of navigation and control commands, with some functions for dynamic video stream control, such as seamless branching, which can be used for playing different "cuts" of the same video material for dramatic purposes, censorship, etc. DVD-ROM, which will be used for multi-media applications, will exhibit a higher level of interactivity.
Since DVD contains multiple content streams with many options for branching from one stream to the other or combining several streams, such as a menu or sub-titles overlaid on a video frame, one has to verify that a given set of initial settings, followed by a specific set of navigation commands, indeed produces the correct content. This step in DVD production is known as "emulation", currently designed to be performed by an observer. The present invention also allows automation of DVD emulation.
It is important to note that in DVD, the video image is composed of the motion picture stream overlaid by sub-pictures or graphics, such as sub-titling.
Although all video streams and all sub-picture bitmaps are available before emulation takes place, the composite image depends on the actual user's choices and the user's "navigation" in the content tree. It is impractical to generate all possible compositions prior to emulation and use these as the reference content.
Therefore, descriptors of the actual content must be compared against appropriate descriptors of the component streams.
In both broadcast and DVD applications, it may be necessary to detect video compression artifacts. While some of these are due to the mathematical compression itself, others may arise during transmission/playback, due to buffer overflow and other reasons. A common image compression artifact is "blockiness", or the visibility of edges between image blocks. Detecting artifacts in a completely rule-based manner, such as looking for these edges, may be misleading since such edges may be present in the original, uncompressed image.
An image-reference based approach, in which the compressed image is compared with the original image, provides a good tool for algorithm evaluation. However, in a practical situation, such an image will not be available at the receiving/playback end for real-time detection of compression artifacts. It is therefore necessary to compare compressed material with the original material, based on concise content descriptors computed from both streams.
It is an object of the present invention to provide a content verification system in which an audio-visual program, broadcast or recorded on storage media, can be compared with a reference program.
The audio-visual program comprises at least one video channel, or at least one audio channel, or at least one sub-picture channel comprising sub-titles, closed-captions and any kind of auxiliary graphics information which is timed synchronously with the video or audio. While in certain applications sub-pictures are embedded in the video image sequence, in other applications they are carried by a separate stream/file.
Summary of the Invention
The present invention therefore provides a method of comparing the content obtained by broadcast or playback with a reference content, including the steps of extracting frame characteristic data streams from said reference content and from actual received or playback content, aligning said streams and comparing said streams on a frame-by-frame basis.
U.S. Patent No. 5,339,166, entitled "Motion-Dependent Image Classification for Editing Purposes," describes a system for comparing two or more versions, typically of different dubbing languages, of the same feature film.
By identifying camera shot boundaries in both versions and comparing sequences of shot lengths, a common video version, comprising camera shots which exist in all versions, can be automatically generated. While the embodiment described in this patent allows, in principle, the location of content differences between versions at camera shot level, frame-by-frame alignment for all frames in the respective versions is not performed. Further, the differences detected are in the existence or absence of video frames as a whole. In contrast, the present invention allows frame-by-frame inspection of color properties, detection of compression artifacts, audio distortions, etc.
Furthermore, in the U.S. patent, the content of each frame is fixed and characteristic data are computed from the content. The present invention, on the other hand, addresses the on-line composition of a content stream from basic content streams, such that characteristic data are pre-computed only for these basic streams. Given the branching/navigation/editing commands, a composite reference characteristic data stream is predicted from the component characteristic data streams and then compared with the actual content stream.
Moreover, the present invention does not depend on the specific format/representation of the content sources and streams. In the same application, one stream may be analog and the other digital. Additionally, one stream may be compressed and the other may be of full bandwidth. Typically, in a broadcast environment, the input will be CCIR-601 digital video and AES digital audio. Multiple audio streams may be due to different dubbing languages, as well as stereo and surround sound channels.
Generally, the extraction of characteristic data will be done in real-time, thus saving intermediate storage and also enabling real-time error detection in a broadcasting environment. However, this is not a limitation, since the present invention can be used off line by recording both the reference and the actual audio-visual program. When working off line, processing can be slower than real-time or faster, depending on the computational resources. When verifying dubs or copies of video cassettes, a faster than real-time performance may be needed, depending, of course, on the availability of a suitable analog to digital converter which can cope with fast-forward video signals.
Brief Description of the Drawings
The invention will now be described in connection with certain preferred embodiments with reference to the following illustrative figures so that it may be more fully understood.
With specific reference now to the figures in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.
In the drawings:
Fig. 1 is a block diagram of a top level flow of processing of an audio-visual content verification system;
Fig. 2 is a block diagram of a circuit for storing detected content problems;
Fig. 3 schematically illustrates an array of video sequence characteristic data;
Fig. 4 schematically illustrates an array of video frame or still image spatial characteristic data;
Fig. 5 schematically illustrates a set of regions in a video frame;
Fig. 6 schematically illustrates the relative location of graphics sub-pictures with respect to the video frame;
Fig. 7 is a block diagram illustrating extraction of sub-title characteristic data;
Fig. 8 is a block diagram illustrating sub-title image sequence processing;
Fig. 9 schematically depicts a record of sub-pictures characteristic data;
Fig. 10 is a block diagram illustrating derivation of audio characteristic data;
Fig. 11 is a block diagram of a circuit for the selection of anchor frames for coarse alignment;
Fig. 12 is a block diagram of a circuit for alignment of a composite stream with the component reference streams;
Fig. 13 is a block diagram of a circuit for frame verification processing; and
Fig. 14 is a block diagram of a characteristic data design workstation.
Detailed Description of Preferred Embodiments
With reference now to the drawings, Fig. 1 shows a top level flow of processing of an audio-visual content verification system according to the present invention. Reference sub-picture stream 11, video stream 12 and audio stream 13 are stored in their respective stores 14, 15 and 16, to be eventually processed by processors 17, 18 and 19, respectively. The combination of sub-pictures with video, as well as transition/branching between program segments, is applied at characteristic data level by predictor 20, driven by navigation/playback commands 21.

Actual video stream 22 and audio stream 23 are stored in their respective stores 24 and 25, to be later processed by processors 26 and 27, respectively. The video stream 22 and the corresponding characteristic data are composed of video and sub-pictures.
Once in the characteristic data stores 28 and 29, the data streams are input to the characteristic data alignment processor 30, resulting in frame-aligned characteristic data. The alignment process also results in a program time-shift value, as well as indices or time-codes of missing or surplus frames. Once the data are frame-aligned, characteristic data are compared on a frame-by-frame basis in comparator 32, yielding a frame quality report.
Fig. 2 shows means for storing detected content problems. Recently played/received video from store 34 undergoes compression in engine 35 and is then stored in buffer 36. The recently played/received audio from store 25 is directly stored in buffer 36. Transfer controller 37 is activated by verification reports 38 to transfer the content into hard disk storage 39, where it can be later analyzed.
Fig. 3 shows an array of video sequence characteristic data 40. The list comprises image difference measures, as well as image motion vectors. These measures may include properties of the histogram of the difference image, obtained by subtracting two adjacent images, as is known per se. In particular, the "span" characteristic data, defined as the difference in gray levels between a high (e.g., 85th) percentile and a low (e.g., 15th) percentile of said histogram, was found to be useful. Alternatively, a measure of difference of intensity histograms of two adjacent images, also by a known technique, may be used.
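A minimal sketch of the "span" measure under the stated percentiles (the nearest-rank percentile rule and the flat gray-level lists are simplifying assumptions for illustration):

```python
def span(prev_frame, frame, lo=15, hi=85):
    """'Span' characteristic of two adjacent frames: the difference in
    gray levels between a high (85th) and a low (15th) percentile of the
    difference image's histogram. Frames are flat lists of gray levels."""
    diffs = sorted(abs(a - b) for a, b in zip(prev_frame, frame))

    def percentile(p):
        # nearest-rank percentile over the sorted differences
        k = min(len(diffs) - 1, int(round(p / 100.0 * (len(diffs) - 1))))
        return diffs[k]

    return percentile(hi) - percentile(lo)

# identical adjacent frames yield zero span; a large change, a large span
assert span([10] * 100, [10] * 100) == 0
assert span(list(range(100)), [0] * 100) > 0
```

Because the span collapses a whole difference image into one number per frame pair, it is cheap to store and well suited to the alignment step described later.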

Motion vector fields are computed at pre-determined locations using a block-matching motion estimation algorithm. Alternatively, a more concise representation may consist of camera motion parameters, preferably estimated from image motion vector fields.
Fig. 4 shows an array of video frame or still image spatial characteristic data. The list comprises color characteristic data 41, texture characteristic data 43 and statistics derived from image regions. Such statistics may include the mean, the variance and the median of luminance values. Useful color characteristic data include the first three moments, i.e. the average, variance and skewness of the color components:

$$\mu_i = \frac{1}{N}\sum_{j=1}^{N} p_{ij}$$

$$\sigma_i^2 = \frac{1}{N}\sum_{j=1}^{N} \left(p_{ij} - \mu_i\right)^2$$

$$s_i = \frac{1}{N}\sum_{j=1}^{N} \left(p_{ij} - \mu_i\right)^3$$

where $p_{ij}$ is the value of the i-th color space component of the j-th image pixel and $N$ is the number of pixels. Color spaces of convenience may include the (R,G,B) representation or (Y,U,V), which provides luminance characteristic data through the Y component.
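The three moments can be computed per color component as follows (a flat list of component values stands in for an image; the function name is an assumption):

```python
def color_moments(pixels):
    """First three moments of one color component over an image: the mean,
    the variance and the third central moment, per the formulas above."""
    n = float(len(pixels))
    mu = sum(pixels) / n
    var = sum((p - mu) ** 2 for p in pixels) / n
    skew = sum((p - mu) ** 3 for p in pixels) / n
    return mu, var, skew

# a uniform component has zero variance and zero skewness
mu, var, skew = color_moments([10.0, 10.0, 10.0, 10.0])
assert (mu, var, skew) == (10.0, 0.0, 0.0)
```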

Texture provides measures to describe the structural composition, as well as the distribution, of image gray-levels. Useful texture characteristic data are derived from spatial gray-level dependence matrices. These include measures such as energy, entropy and correlation.
The selection of characteristic data for a specific application of content verification is important. Texture and color data are important for matching still images. Video frame sequences with significant motion can be aligned by motion characteristic data. For more static sequences, color and texture data can facilitate the alignment process.
When computing color and texture characteristic data, the region of support, that is, the image region on which these data are computed, is significant. Using the entire image, or most of it, is preferred when robustness and reduced storage are required. On the other hand, deriving multiple characteristics at numerous relatively small image regions has two important advantages:
1) better spatial discrimination power (like a low resolution image); and
2) when overlaid by sub-pictures (graphics), those regions which do not intersect with graphics data can still be matched with corresponding characteristic data of the original video frame.
Fig. 5 shows a set of regions 42 in a video frame, such that color or texture characteristic data are computed for each such region. Fig. 6 illustrates the relative location of graphics sub-pictures with respect to the video frame, showing a sub-title sub-picture and a menu-item sub-picture.

Figs. 7 and 8 show the extraction of sub-title characteristic data. Sub-titles or closed captions in a movie are used to bring translated dialogues to the viewer. Generally, a sub-title will occupy several dozen frames. A suitable form for sub-title characteristic data is the time-code-in and time-code-out of that specific sub-title, with additional data describing the sub-title bitmap. The sub-title image sequence processor 46 analyses every video frame of the sequence to detect specific frames at which sub-title information is changed. The result is a sequence of sub-title bitmaps, with the frame interval each such bitmap occupies in a time-code-in, time-code-out representation. Characteristic data are then extracted by unit 47 from the sub-title bitmap.
Fig. 8 shows the sub-title image sequence processor 46. The video image passes through a character binarization processor 48, operative to identify pixels belonging to sub-title characters and paint them white, for example, where the background pixels are painted black. At every frame, the current frame bitmap 49 is compared, or matched, with the stored sub-title bitmap from the first instance of that bitmap. At the first mismatch event, the sub-title bitmap is reported with the corresponding time-code interval, and a new matching cycle begins.
The matching process can be implemented by a number of binary template-matching or correlation algorithms. The spatial search range of the template-matching should accommodate mis-registration of a sub-title and, additionally, the case of scrolling sub-titles.
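The report-on-first-mismatch cycle described above can be sketched as follows, with exact bitmap equality standing in for binary template matching (a simplifying assumption; frame indices stand in for time-codes):

```python
def subtitle_intervals(bitmaps):
    """Group a per-frame sequence of already-binarized sub-title bitmaps
    into (bitmap, time_code_in, time_code_out) records. A new record is
    opened at each mismatch with the stored bitmap; None marks frames
    with no sub-title."""
    records = []
    current, t_in = None, 0
    for t, bm in enumerate(bitmaps):
        if bm != current:  # first mismatch: report and start a new cycle
            if current is not None:
                records.append((current, t_in, t - 1))
            current, t_in = bm, t
    if current is not None:
        records.append((current, t_in, len(bitmaps) - 1))
    return records

seq = ["HELLO"] * 3 + ["WORLD"] * 2
assert subtitle_intervals(seq) == [("HELLO", 0, 2), ("WORLD", 3, 4)]
```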
The characteristic data of a single sub-title should be concise and allow for efficient matching. The sub-title bitmap, usually run-length coded, is a suitable representation. Alternatively, one could use shape features of individual characters and a sub-title text string, using OCR software.
In addition to text, sub-pictures consist of graphics elements such as bullets, highlight or shadow rectangles, etc. Useful characteristic data are obtained by using circle and rectangle detectors. Fig. 9 shows a record 50 of sub-pictures characteristic data.
Fig. 10 shows the derivation of audio characteristic data. In analog form, the signal is digitized by the arrangement comprising an analog anti-aliasing filter 51 and an A/D converter 52, and then filtered by the pre-emphasis filter 53. Spectral analysis uses a digital filter bank 54. Each filter output is squared and integrated by a power estimation unit 55. The set of characteristic data is computed for each video frame duration (40 msec for PAL, or 33.3 msec for NTSC) and stored in store 56. Window duration controls the amount of averaging or smoothing used in power computation. Typically, a 60 or 50 msec window, for an overlap of 33%, can be used.
The filter bank is a series of linear phase FIR filters, so that the group delay is equal for all filters and the output signals from the filters are synchronized in time. Each filter is specified by its center frequency and its bandwidth.
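A toy sketch of one branch of this square-and-integrate stage: an FIR filter applied to the samples, then one power value per video-frame window. The single-tap filter and the frame length are illustrative, not a real band-pass design:

```python
def frame_band_power(samples, fir, frame_len):
    """Filter `samples` with the FIR filter `fir` (a list of taps),
    square the output and integrate it over consecutive windows of
    `frame_len` samples: one power value per video frame duration."""
    # direct-form FIR convolution with zero initial state
    filtered = []
    for n in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(fir):
            if n - k >= 0:
                acc += tap * samples[n - k]
        filtered.append(acc)
    # mean squared output per frame-length window
    powers = []
    for start in range(0, len(filtered) - frame_len + 1, frame_len):
        window = filtered[start:start + frame_len]
        powers.append(sum(x * x for x in window) / frame_len)
    return powers

# identity filter, constant signal: power equals the squared amplitude
assert frame_band_power([2.0] * 8, fir=[1.0], frame_len=4) == [4.0, 4.0]
```

A real implementation would run one such branch per band-pass filter in the bank, with the window length matched to the 40/33.3 msec video frame duration.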
In many instances, the reference characteristic data stream is not available explicitly, but has to be derived from said source characteristic data and from playback commands such as denoted in Fig. 1. A simple case is when a program consists of consecutive multiple content segments. Each such segment is specified by a source content identifier, a beginning time-code and an ending time-code. Said reference characteristic data stream can be constructed or predicted from the corresponding segments of source characteristic data by means of concatenation. If content verification involves computing the actual content segment insertion points, these source characteristic data segments will be padded by characteristic data margins to allow for inaccuracies in insertion.
Sometimes the transitions involve not only cuts, but also dissolves or fades. When the composite image is a linear combination of two source images, some characteristic data can be predicted based on the original source data as well as the blending values. These data include, for example, color moments computed over some region of support. In alignment and verification, the predicted values are compared against the actual values.
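For example, the mean color moment of a dissolve frame is exactly the blend of the component means, since the mean is linear in the pixel values. This is a sketch of that one prediction only; variance and skewness do not blend this simply:

```python
def predict_mean(mu_a, mu_b, alpha):
    """Predicted mean color component of a dissolve frame that blends
    source A and source B as alpha*A + (1 - alpha)*B. Because the mean
    is linear in pixel values, it blends exactly like the pixels do."""
    return alpha * mu_a + (1.0 - alpha) * mu_b

pixels_a = [10.0, 20.0, 30.0]
pixels_b = [50.0, 60.0, 70.0]
alpha = 0.25
blended = [alpha * a + (1 - alpha) * b for a, b in zip(pixels_a, pixels_b)]
mean = lambda xs: sum(xs) / len(xs)
# the predicted composite mean matches the mean of the composed pixels
assert abs(mean(blended) - predict_mean(mean(pixels_a), mean(pixels_b), alpha)) < 1e-9
```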
An important step in the verification process is the frame-by-frame alignment of the characteristic data streams. The choice of the subset of characteristic data used for alignment is important to the success of that step. Specifically, frame difference measures, such as the span described above, are well suited to alignment. A coarse-fine strategy is employed, in which anchor frames are used to solve the major time-shift between the content streams. Once that shift is known, fine frame-by-frame alignment takes place.
An anchor frame is one with a unique structure of characteristic data in its neighborhood. Fig. 11 shows the selection of anchor frames for coarse alignment. Given the frame difference data, for example the span sequence, local variance estimation is effected in estimator 57 by means of a sliding window. Processors 58 and 59 produce a list of local variance maxima which are above a suitable threshold. A consecutive processing step in processor 60 estimates the auto-correlation of the candidate anchor frame with its frame difference data neighborhood. In the step of reference anchor frame selection, a further criterion may be used to increase the effectiveness of the alignment step. The anchor frames are graded by uniqueness, i.e., dissimilarity with other anchor frames, to reduce the probability of false matches in the next alignment step. Uniqueness is computed by means of cross-correlation between the anchor frame and other anchor frames. By associating the number of anchor frames with a cross-correlation value lower than a specified threshold with the specific anchor frame, those frames with highest uniqueness are selected.
Uniqueness pruning is applied only to the reference anchor frames.
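The sliding-window variance stage of Fig. 11 can be sketched as follows. The window size and threshold are illustrative, and the auto-correlation and uniqueness-grading stages are omitted:

```python
def select_anchor_frames(span_seq, window=5, threshold=1.0):
    """Pick candidate anchor frames: indices where the local variance of
    the frame difference data (e.g. the span sequence), estimated over a
    sliding window, is a local maximum above a threshold."""
    half = window // 2
    variances = []
    for i in range(len(span_seq)):
        w = span_seq[max(0, i - half):i + half + 1]
        m = sum(w) / len(w)
        variances.append(sum((x - m) ** 2 for x in w) / len(w))
    anchors = []
    for i in range(1, len(variances) - 1):
        if (variances[i] > threshold
                and variances[i] >= variances[i - 1]
                and variances[i] >= variances[i + 1]):
            anchors.append(i)
    return anchors

# a flat sequence with one sharp event yields anchors near the event
seq = [0.0] * 10 + [10.0] + [0.0] * 10
assert 10 in select_anchor_frames(seq)
```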
Given the anchor frames of the reference and actual streams, coarse alignment now begins. Each reference and actual anchor frame pair whose neighborhood cross-correlation is above threshold yields a plausible alignment offset, expressed in frame count. All pairs are tested and the offsets are stored in an offset histogram array. False matches passing the cross-correlation tests will be manifested as random offset values, or noise, in the histogram. A nominal case of time-shifted actual content, with few or no dropped frames, will yield a single peak in the histogram. In the case of a larger number of missing or surplus frames, such as a few missing frames at each transition, the voting process described above will produce several peaks, each corresponding to a significant shift.
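The offset voting can be sketched with a histogram of anchor-pair offsets. The label-based `match` predicate stands in for the neighborhood cross-correlation test, an assumption made for illustration:

```python
from collections import Counter

def coarse_offset(ref_anchors, act_anchors, match):
    """Vote on the global time shift: every reference/actual anchor pair
    accepted by `match` casts a vote for the offset between their frame
    indices; the histogram peak is the coarse alignment."""
    votes = Counter()
    for r in ref_anchors:
        for a in act_anchors:
            if match(r, a):
                votes[a - r] += 1
    return votes.most_common(1)[0][0] if votes else None

# toy anchors identified by a label; a copy shifted by +3 frames
ref = [(100, "x"), (250, "y"), (400, "z")]
act = [(103, "x"), (253, "y"), (403, "z")]
offset = coarse_offset([r[0] for r in ref], [a[0] for a in act],
                       match=lambda r, a: dict(ref)[r] == dict(act)[a])
assert offset == 3
```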

Having solved the time-shift between corresponding stream characteristic data intervals which are bounded by matched anchor frames, the respective intervals have to be matched. The matching process can be described as a sequence of edit operators which transform the first interval of frame characteristic data to the second interval. The sequence consists of three such operators:
1) deletion of a frame from the first stream;
2) insertion of a frame into the first stream; and
3) replacement of a frame from the first stream with a frame from the second stream.
Having associated a cost with each of these operations, the fine frame alignment problem has now been transformed to finding a minimum cost sequence of operators which implements the transformation. If m is the length of the first interval and n is the length of the second interval in frames, then the matching problem can be solved in space and time proportional to (m*n). All that remains is to set the respective costs. Deletion and insertion can be assigned a fixed cost each, based on a priori information on the probability of dropped or surplus frames. Replacement is a distance measure on the characteristic data vector, such as weighted Euclidean distance.
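This minimum-cost matching is the classic edit-distance dynamic program. A sketch with illustrative fixed costs and a trivial replacement distance in place of a weighted Euclidean distance on characteristic data vectors:

```python
def align_cost(ref, act, del_cost=1.0, ins_cost=1.0,
               dist=lambda a, b: 0.0 if a == b else 2.0):
    """Minimum-cost frame alignment between two characteristic data
    intervals, using the deletion, insertion and replacement operators:
    the O(m*n) edit-distance dynamic program."""
    m, n = len(ref), len(act)
    # d[i][j]: cost of aligning ref[:i] with act[:j]
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * del_cost
    for j in range(1, n + 1):
        d[0][j] = j * ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + del_cost,                    # dropped frame
                d[i][j - 1] + ins_cost,                    # surplus frame
                d[i - 1][j - 1] + dist(ref[i - 1], act[j - 1]))  # replace
    return d[m][n]

# one dropped frame costs exactly one deletion
assert align_cost(["a", "b", "c"], ["a", "c"]) == 1.0
```

Recovering the actual operator sequence (rather than only its cost) is a standard back-trace through the `d` table.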
Fig. 12 shows the alignment of a composite stream with the component reference streams by means of a processor 61 and geometric filter 62. In a simple case, sub-title graphics of the language of choice are combined with the video frame sequence. The location of sub-titles in the video frame can be specified either manually, in the characteristic data design workstation as described below, or can be automatically computed, based on analysis of the sub-title sub-picture stream. For that simple case, video frame verification is done in the image region free from sub-titles. Additionally, sub-title picture verification is done in the sub-title image region.
A more difficult case is when graphics are overlaid on the video frame, such as in the case of displaying a menu in a DVD player. The location of menu bullets and text may be, for example, as illustrated in Fig. 6. For that specific case, it is assumed that the graphics stream has been pre-processed to extract the graphics regions of support, in the form of bounding rectangles for text lines and graphics primitives. These regions are stored as auxiliary characteristic data. By comparing graphics stream characteristic data with composite video frame stream graphics characteristic data in the respective graphics regions, the streams can be aligned. Once aligned, the composite frame graphics regions are known to be those of the corresponding graphics stream. Then, based on these regions, only color and texture actual frame characteristic data which are not occluded by overlay graphics (see Fig. 6) are compared with the respective reference data.
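The geometric filtering step above can be sketched as follows, under the assumption (not stated in the source) that characteristic data are computed per rectangular frame region; the function and parameter names are illustrative.

```python
# Geometric filter sketch (illustrative): keep only the per-frame
# regions that are not occluded by any overlay-graphics bounding
# rectangle, so that colour/texture comparison skips overlaid areas.

def intersects(a, b):
    """Axis-aligned rectangles (x0, y0, x1, y1), half-open coordinates."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def visible_regions(regions, overlay_rects):
    """Return the regions untouched by the overlay graphics."""
    return [r for r in regions
            if not any(intersects(r, o) for o in overlay_rects)]
```

Only the characteristic data of the surviving regions would then be passed to the frame comparator.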
Fig. 13 depicts the frame verification processes performed by the frame characteristic data comparator 32 (Fig. 1), which start from aligned characteristic data streams. It is important to note that the characteristic data alignment processor 30 detects a variety of content problems. Failure in alignment may be due to the fact that a wrong content stream is playing, or the content stream is severely time-shifted, or the stream is distorted beyond recognition. A successful alignment yields the indices of missing or surplus frames. Once aligned, each actual content frame is compared with the corresponding reference frame, based on the characteristic data.

Then for the remaining data, frame-by-frame comparison can take place in processors 63, 64 and 65 and comparators 66 and 67. The distance between characteristic data of corresponding frames detects quality problems such as luminance or color change, as well as audio distortions. By comparing graphics characteristic data, errors in sub-picture content and overlay may be detected. Also, by comparing characteristic data sensitive to compression artifacts, such artifacts can be detected.
The comparison process requires the notions of distance and threshold. For vector characteristic data such as color, luminance and audio, a vector distance measure is used, such as the Mahalanobis distance:

D = (X^r - X^a)^T C^-1 (X^r - X^a)

where X^r, X^a are the reference and actual characteristic data vectors, and C is the co-variance matrix which models pairwise relationships among the individual characteristic data. The proper threshold may be computed at a training phase, using the characteristic data design workstation described hereinafter with reference to Fig. 14.
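Under the simplifying assumption of a diagonal covariance matrix (an assumption made here purely for illustration), C^-1 reduces to reciprocal variances and the Mahalanobis distance becomes a variance-weighted Euclidean distance; a full C would only add off-diagonal cross terms:

```python
# Vector comparison sketch (illustrative): squared Mahalanobis
# distance with an assumed diagonal covariance model, plus a
# threshold test whose threshold would come from the training phase.

def mahalanobis_sq(ref_vec, act_vec, variances):
    """Squared Mahalanobis distance under a diagonal covariance model."""
    return sum((r - a) ** 2 / v
               for r, a, v in zip(ref_vec, act_vec, variances))

def frame_matches(ref_vec, act_vec, variances, threshold):
    """Threshold assumed to be learned on the design workstation."""
    return mahalanobis_sq(ref_vec, act_vec, variances) <= threshold
```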
Comparator 68 compares blockiness characteristic data derived from the reference and actual video frames, respectively. Such data may include power estimates of a filter designed to enhance an edge grid structure in which, for example, the grid spacing equals the compression block size, which is usually 8 or 16. By comparing these estimates with the reference value, an increase in blockiness may be detected. As described above, absolute blockiness may be misleading, since it may originate from the original frame texture.
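One plausible realisation of such a blockiness estimate, offered only as a sketch (the grid-aligned step-energy ratio below is an assumption, not the patent's filter), is:

```python
# Blockiness sketch (illustrative): energy of horizontal luminance
# steps falling exactly on the compression grid (spacing 8 here),
# relative to steps elsewhere. As the text notes, only an *increase*
# over the reference value is meaningful, since the original frame
# texture can itself be grid-like.

def blockiness(rows, block=8):
    on_grid, off_grid = 0.0, 0.0
    for row in rows:
        for x in range(1, len(row)):
            step = (row[x] - row[x - 1]) ** 2
            if x % block == 0:
                on_grid += step   # step sits on a block boundary
            else:
                off_grid += step  # step inside a block
    return on_grid / (off_grid + 1e-9)
```

Comparator 68 would then flag frames whose estimate rises significantly above the corresponding reference estimate.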

Comparison of sub-pictures can be done at bitmap level, using the exclusive OR of the corresponding bitmaps, by computing the distance between corresponding shape characteristic data vectors, or by comparing recognized sub-title text strings, where applicable.
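The bitmap-level exclusive-OR comparison mentioned above can be sketched as follows; the mismatch-count threshold is an illustrative assumption.

```python
# Sub-picture comparison sketch (illustrative): XOR of corresponding
# binary bitmaps counts mismatching pixels, tested against a threshold.

def bitmap_mismatch(a, b):
    """a, b: equal-sized binary bitmaps as lists of rows of 0/1."""
    return sum(pa ^ pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def subpictures_match(a, b, max_mismatch=0):
    return bitmap_mismatch(a, b) <= max_mismatch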
The term "frame-by-frame", which is used in conjunction with the comparison process, relates to the fact that once the content streams are aligned, inspection of every frame against the corresponding frame can be done. Clearly, comparison may include all frames or a sub-set of the frames.
The efficiency, robustness and content verification could be enhanced by using features that have greater discriminating power over the full reference content. By designing a software-configurable characteristic data set, selection of the actual data of the full set which is implemented will be enabled.
Fig. 14 shows a characteristic data design workstation 69. The characteristic data acquisition part of the workstation replicates the reference content processing front-end of Fig. 1. In addition, workstation 69 has access, by network 70, to the actual content data and not just to the characteristic data, for display at 71 and further analysis at 73.
The development of the specific content verification application is conducted using a combination of manual, semi-automatic and automatic processes. For example, the user may specify the sub-titling type-face and its location in the video frame. Additionally, the user may select several representative content segments and the system then extracts a full characteristic data set, possibly in multiple passes or slower than real-time, ranking their discriminating power over the sample reference content and retaining their best features.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrated embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are, therefore, intended to be embraced therein.
The method of the invention may further comprise the step of computing actual characteristic data from at least part of the actual broadcast or playback content streams.
It may also comprise the step of computing reference characteristic data from at least part of said reference content streams.
Said reference characteristic data may be derived from video frame sequences, still images, audio and graphics, and said actual characteristic data may be derived from a video sequence and an audio channel. Also, said video image sequence characteristic data may include an image motion vector field, or data derived from an image difference signal, and said video frame or still image characteristic data may include luminance statistics in pre-defined regions of said frame or image.
Preferably, said video frame or still image characteristic data also include texture characteristic data and/or colour data, said colour characteristic data include colour moments, said video frame or still image characteristic data also include a low resolution or highly compressed version of the original image, said audio characteristic data include audio signal parameters, estimated at a window size which is comparable with video frame duration, said graphics characteristic data exhibit printed text, and said graphics characteristic data also exhibit common graphics elements, including bullets and highlighted rectangles.

In the method of the invention, said step of predicting may include generating a characteristic data stream from source streams and navigation commands or play-lists, branching from one source stream to another source stream. Said step of predicting may also include generating a characteristic data stream from source streams and transition commands such as cut, dissolve, fade to/from black, or said step may include computing characteristic data of graphics sub-pictures overlaid on a video image sequence or still.
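The prediction step from play-lists can be sketched as follows; the data shapes and names are illustrative assumptions, the point being that the predicted stream is assembled from per-source characteristic data stores without rendering any video.

```python
# Prediction sketch (illustrative): the reference characteristic data
# stream for a broadcast is assembled by following a play-list over
# per-source characteristic data stores.

def predict_stream(stores, playlist):
    """stores: {clip_id: [per-frame data]};
    playlist: [(clip_id, start_frame, end_frame)]."""
    predicted = []
    for clip_id, start, end in playlist:
        predicted.extend(stores[clip_id][start:end])  # branch to this source
    return predicted
```

Transition commands such as dissolves would, in the same spirit, blend the characteristic data of the two adjoining sources over the transition interval.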
The evaluation of the information content of a certain frame may be based on the temporal variation of characteristic data in said frame and in its adjacent frames.
The method may further comprise grading the information content of all frames in a sequence, denoting frames with locally maximal information content as anchor frames.
The method may still further comprise evaluating the similarity between two anchor points, based on a measure of temporal correlation between the respective sets of neighbouring characteristic data. Alternatively, the method may further comprise evaluating the similarity between all pairs of anchor frames, such that, for each pair, one frame is from the reference data and the other is from the actual data.
The method may further comprise reporting said alignment results, including the time shift between the designed and actual content broadcast-playback, as well as an indication of missing or surplus frames. The step of comparing may comprise first aligning the graphics of said composite frame sequence with said reference graphics streams, and the step of aligning may facilitate computing the location of all overlaid graphics in said composite frame sequence. The step of computing may facilitate filtering out colour and texture actual frame characteristic data which are occluded by said overlay graphics.
The method may further comprise comparing characteristic data of aligned frames to indicate quality or content problems, and said problems may be selected from the group comprising luminance or colour shifts, compression artifacts, audio artifacts, and audio or sub-pictures mismatch or mis-alignment.

CLAIMS
1. A method for video content verification, operative to compare and verify the content of a first audio-visual stream with the content of a second audio-visual stream, the method comprising the steps of:
extracting characteristic data from a first audio-visual stream;
extracting characteristic data from a second audio-visual stream; and
comparing the extracted characteristic data from said first and second audio-visual streams.
2. A method as claimed in claim 1, wherein the step of comparison comprises:
aligning said first and second audio-visual streams on a frame-by-frame basis; and
performing a frame-by-frame comparison of said aligned streams of frames.
3. A method as claimed in claim 1 or claim 2, wherein said first and second streams are selected from the group comprising the elementary content streams, including video image sequence, audio channel, and sub-picture streams.
4. A method as claimed in any one of claims 1 to 3, wherein said comparison of first and second streams yields at least one parameter, including time-shift between the desired and the actual timing of said second stream; list of missing frames in said second stream; list of surplus frames in said second stream; sub-title content error; graphics content error; colour distortion; and luminance shift.
5. A method for video content verification, operative to compare and verify the content of a first audio-visual stream with the content of a second audio-visual stream, wherein said second audio-visual content stream is defined by at least one source content stream and a set of editing instructions, the method comprising the steps of:
extracting characteristic data from said first audio-visual stream;
extracting characteristic data from said source content stream; and
computing characteristic data of said second content stream, based on characteristic data of said source content stream and on said editing instructions.
6. A method as claimed in claim 5, wherein said instructions are in the form of an Edit Decision List or Digital Video Disk branching instructions.
7. A method as claimed in any one of claims 1 to 6, wherein said first or second stream is a reference content stream.
8. A method as claimed in any one of claims 1 to 6, wherein said first and/or second streams are actual broadcast or playback content streams.
9. A method as claimed in claim 7, further comprising the step of predicting the reference characteristic data stream from said reference characteristic data and from playback instructions.
10. A method as claimed in any one of claims 1 to 9, wherein said characteristic data extraction is optionally augmented by user input facilitating the extraction/relative weighting of said data.
11. A method as claimed in claim 7, further comprising aligning the reference characteristic data stream with the actual characteristic data stream, on a frame-by-frame basis, and evaluating the information content of a certain frame.
12. A method as claimed in claim 11, further comprising computing the frame-index offset between the reference and actual frames, based on the most likely offsets derived from evaluation of the similarity between all anchor frames.
13. A method as claimed in claim 11, further comprising matching the reference frame sequence with the actual frame sequence, based on an identified frame-index offset, and further comprising the step of designating an actual frame as a surplus frame, or assigning to it a unique reference frame.
14. A method as claimed in any one of claims 1 to 13, further comprising comparing a composite video frame sequence including graphics overlaid on a video frame sequence, with component reference streams consisting of the original video frame sequence as well as the graphics streams.
15. A system for audio-visual content verification, operative to compare and verify the content of a first audio-visual data stream with the content of a second audio-visual data stream, the system comprising:
means for extracting characteristic data from a first audio-visual data stream;
means for extracting characteristic data from a second audio-visual data stream; and
means for comparing characteristic data of said first and second audio-visual data streams.
16. A system as claimed in claim 15, wherein said comparison means comprises:
means for aligning said audio-visual data streams on a frame-by-frame basis; and
means for frame-by-frame comparison of said aligned data streams.
17. A system as claimed in claim 15 or claim 16, wherein said first and second data streams are selected from the group comprising video image sequence, audio channel, and sub-picture data streams.
18. A system as claimed in any one of claims 15 to 17, wherein said means for comparison of said reference data streams yields at least one of the parameters including time-shift between the desired and the actual timing of said second data stream; list of missing frames in said second data stream; list of surplus frames in said second data stream; sub-title content error; graphics content error; colour distortion; and luminance shift.

19. A system for audio-visual content verification, operative to compare and verify the content of a first audio-visual data stream with the content of a second audio-visual data stream, wherein said second audio-visual data stream is defined by at least one source content data stream and a set of editing instructions, the system comprising:
means for extracting characteristic data from said first audio-visual data stream;
means for extracting characteristic data from said source content data stream; and
means for computing characteristic data of said second content data stream, based on characteristic data of said source content data stream and said editing instructions.
20. A system as claimed in claim 19, wherein said editing instructions are in the form of an Edit Decision List or Digital Video Disk branching instructions.


SYSTEM AND METHOD FOR
AUDIO-VISUAL CONTENT VERIFICATION
ABSTRACT
The invention provides a method for video content verification, operative to compare and verify the content of a first audio-visual stream with the content of a second audio-visual stream, comprising the steps of extracting characteristic data from a first audio-visual stream, extracting characteristic data from a second audio-visual stream, and comparing the extracted characteristic data from the first and second audio-visual streams. The invention also provides a system for carrying out the method.

[Drawing pages, Figs. 1 to 14. Fig. 1: content verification system block diagram (reference and actual stream stores, characteristic data processors, predictor, alignment processor and frame characteristic data comparator producing a frame quality report); Fig. 2: video compression and video buffer front-end producing a verification report; Fig. 3: image sequence characteristic data (image difference measures, image motion vector field, camera motion vector); Fig. 4: colour and texture characteristic data; Fig. 5: video frame data window; Fig. 6: overlaid menu graphics example; Fig. 7: sub-title characteristic data extraction; Fig. 8: sub-title bitmap extraction flowchart; Fig. 9: sub-title and graphics characteristic data; Fig. 10: audio filter bank; Fig. 11: anchor frame detection; Fig. 12: sub-picture geometry alignment processor and geometric filter; Fig. 13: frame characteristic data comparators and quality reports; Fig. 14: characteristic data design workstation.]
Claims (41)

1. Video sequence viewing apparatus comprising:
an image sequence display unit operative to display a sequence of images at a speed determined in accordance with a control signal; and an image sequence analyzer operative to perform an analysis of the sequence of images and to generate the control signal in accordance with a result of the analysis.
2. Apparatus according to claim 1 wherein the speed comprises a variable speed and the control signal has more than one value.
3. Apparatus according to claim 1 or claim 2 wherein the analysis of the sequence of images comprises an analysis of the amount of motion in different images within said sequence and said control signal receives a value corresponding to relatively high speed for images in which there is a small amount of motion and a value corresponding to relatively low speed for images in which there is a large amount of motion.
4. Image sequence viewing apparatus comprising:
a shot identifier operative to perform an analysis of a sequence of images and to identify shots within the sequence of images; and an image sequence display unit operative to sequentially display at least one initial image of each identified shot.
5. Apparatus according to claim 4 wherein the image sequence display unit is operative to display the at least one initial image of each identified shot in response to a user request.
6. Apparatus according to claim 4 or claim 5 wherein the image sequence display unit is operative to display the at least one initial image of all shots sequentially until stopped by the user.
7. A display system for displaying a first image sequence as aligned relative to a second, related image sequence, the system comprising:
an image sequence analyzer operative to generate a presentation of a first image sequence including at least one row of pixels of each image in the first image sequence; and an aligned image sequence display unit operative to display the rows generated by the analyzer, side by side, in a single screen, wherein gaps are provided between the rows, in order to denote images which are missing, relative to the second image sequence.
8. A system according to claim 7 wherein the at least one row comprises at least one horizontal row of pixels and at least one vertical row of pixels.
9. A system according to claim 7 wherein the display unit is operative to display an isometric view of a stack of the images in at least one of the first and second image sequences.
10. A system according to claim 9 wherein the stack comprises a horizontal stack.
11. A system according to claim 7 wherein the analyzer also comprises an image sequence aligner operative to align the first and second image sequences to one another and to provide an output denoting images which are missing from the first image sequence, relative to the second image sequence.
12. A copyright monitoring system comprising:
an image sequence comparing unit operative to conduct a comparison between an original image sequence and a suspected pirate copy of the original image sequence and to generate copyright information describing infringement of copyright of the original image sequence by the suspected pirate copy; and a copyright infringement information generator operative to generate a display of the copyright information.
13. A system according to claim 12 wherein at least a portion of said comparison is conducted at the shot level.
14. A system according to claim 12 or claim 13 wherein at least a portion of said comparison is conducted at the frame level.
15. A system according to claim 12 wherein the copyright information quantifies the infringement of copyright of the original image sequence by the suspected pirate copy.
16. A watermarking method comprising:
providing an image sequence to be watermarked;
and performing a predetermined alteration of the length of the image sequence.
17. A method according to claim 16 wherein said performing step comprises duplicating at least one predetermined image in the image sequence.
18. A method according to claim 16 wherein said performing step comprises omitting at least one predetermined image from the image sequence.
19. A system according to claim 7 wherein the image sequence analyzer is operative to generate aligned representations of the first and second image sequences and the display unit is operative to display the aligned representations on a single screen.
20. A video sequence viewing method comprising:
displaying a sequence of images at a speed determined in accordance with a control signal; and performing an analysis of the sequence of images and generating the control signal in accordance with a result of the analysis.
21. An image sequence viewing method comprising:
performing an analysis of a sequence of images and identifying shots within the sequence of images; and sequentially displaying at least one initial image of each identified shot.
22. A method for displaying a first image sequence as aligned relative to a second, related image sequence, the method comprising:
generating a representation of a first image sequence including at least one row of pixels of each image in the first image sequence; and displaying the rows generated by the analyzer, side by side, in a single screen, wherein gaps are provided between the rows, in order to denote images which are missing, relative to the second image sequence.
23. A copyright monitoring method comprising:
conducting a comparison between an original image sequence and a suspected pirate copy of the original image sequence and generating copyright information describing infringement of copyright of the original image sequence by the suspected pirate copy; and generating a display of the copyright information.
24. A watermarking system comprising:
an image sequence input device operative to input an image sequence to be watermarked; and an image sequence length alteration device operative to perform a predetermined alteration of the length of the image sequence.
25. A DVD authoring method comprising:
performing a DVD authoring operation on a plurality of versions of a motion picture, the performing step comprising:
synchronizing the plurality of versions of the motion picture, including:
capturing at least one signature of at least one corresponding video frame within the plurality of versions of the motion picture, using only small amounts of data to characterize each of said video frames; and matching said signatures to a continuous stream of data.
26. ~An advertisement verification method comprising:
comparing a broadcast of a commercial with an original commercial, at least partly on the frame level, including comparing individual frames of the broadcast to individual frames of the original commercial; and generating an output indicating at least one parameter of similarity between the broadcast and the original commercial.
27. A method according to claim 26 wherein the comparing step comprises at least one of the following steps:

signature extraction; and signature search.
28. A DVD authoring method comprising:
generating a generic version of a motion picture by comparing and combining a plurality of original video clips representing said motion picture, at the frame level; and creating branching instructions for playback of at least one subsequence of the generic version on a DVD
player.
29. A DVD authoring method comprising:
creating branching instructions for playback of at least one subsequence of a generic version of a motion picture on a DVD player, the generic version comprising a combination of a plurality of original video clips representing said motion picture; and employing said branching instructions to play back at least one subsequence and comparing said at least one subsequence, at the frame level, to at least a portion of at least one of the plurality of original video clips representing said motion picture.
30. An automated video duplication quality control method comprising:
comparing actual video content derived from a reference video content, with the reference content, thereby to obtain a measure of duplication quality control quantifying at least one aspect of similarity between the actual and reference video contents, the comparing step comprising:
extracting frame characteristic data streams from said reference content and from said actual content;
aligning at least a portion of said streams; and comparing at least a portion of said streams on a frame-by-frame basis.
31. A method for comparing a final DVD version of a video clip against an original clip from which the final DVD version was generated, the method comprising:
extracting characteristic data from a first audio-visual stream representing the final clip and from a second audio-visual stream representing the original clip; and comparing the extracted characteristic data from said first and second audio-visual streams.
32. A broadcast verification system comprising:
a signature extractor operative to extract a relatively small signature from a subject clip;
a real time video scanner operative to scan a broadcast video stream in real time in order to identify the subject clip within the broadcast video stream; and a comparison report generator operative to produce a comparison report including a frame-by-frame comparison of the subject clip and of the broadcast video stream.
33. A DVD authoring system comprising:
DVD authoring apparatus operative to perform DVD authoring on a plurality of versions of a motion picture, the apparatus comprising:
a synchronizer operative to synchronize the plurality of versions of the motion picture, including:

a signature capturer operative to capture at least one signature of at least one corresponding video frame within the plurality of versions of the motion picture, using only small amounts of data to characterize each of said video frames; and a signature matcher operative to match said signatures to a continuous stream of data.
34. An advertisement verification system comprising:

frame level broadcast evaluation apparatus operative to compare a broadcast of a commercial with an original commercial, at least partly on the frame level, including comparing individual frames of the broadcast to individual frames of the original commercial; and a similarity output generator operative to generate an output indicating at least one parameter of similarity between the broadcast and the original commercial.
35. A DVD authoring system comprising:
a generic version generator operative to generate a generic version of a motion picture by comparing and combining a plurality of original video clips representing said motion picture, at the frame level; and a brancher operative to create branching instructions for playback of at least one subsequence of the generic version on a DVD player.
36. A DVD authoring system comprising:
a brancher operative to create branching instructions for playback of at least one subsequence of a generic version of a motion picture on a DVD player, the generic version comprising a combination of a plurality of original video clips representing said motion picture;
and a frame level playback evaluator operative to employ said branching instructions to play back at least one subsequence and to compare said at least one subsequence, at the frame level, to at least a portion of at least one of the plurality of original video clips representing said motion picture.
37. An automated video duplication quality control system comprising:
a duplication quality controller operative to compare actual video content derived from a reference video content with the reference content, thereby to obtain a measure of duplication quality quantifying at least one aspect of similarity between the actual and reference video contents, the controller comprising:
a frame characteristic extractor operative to extract frame characteristic data streams from said reference content and from said actual content;
a stream aligner operative to align at least a portion of said streams; and
stream comparing apparatus operative to compare at least a portion of said streams on a frame-by-frame basis.
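The extract/align/compare pipeline of claim 37 can be sketched under simple assumptions: the "frame characteristic" is hypothetically mean luminance per frame, and alignment is a brute-force search for the offset minimizing total difference between the two characteristic streams. Names and tolerances are illustrative, not taken from the patent.

```python
def characteristics(frames):
    """Extract one scalar characteristic (mean luminance) per frame."""
    return [sum(map(sum, f)) / (len(f) * len(f[0])) for f in frames]

def align(reference, actual):
    """Find the offset of `actual` within `reference` that minimizes the
    total absolute difference between the characteristic streams."""
    n = len(actual)
    best_offset, best_cost = 0, float("inf")
    for offset in range(len(reference) - n + 1):
        cost = sum(abs(reference[offset + i] - actual[i]) for i in range(n))
        if cost < best_cost:
            best_offset, best_cost = offset, cost
    return best_offset

def frame_by_frame_report(reference, actual, offset, tolerance=1.0):
    """Per-frame pass/fail comparison of the aligned streams."""
    return [abs(reference[offset + i] - actual[i]) <= tolerance
            for i in range(len(actual))]
```

A production aligner would need to cope with dropped or repeated frames, which this single-offset search does not model.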
38. A system for comparing a final DVD version of a video clip against an original clip from which the final DVD version was generated, the system comprising:
a characteristic data extractor operative to extract characteristic data from a first audio-visual stream representing the final clip and from a second audio-visual stream representing the original clip; and
apparatus for comparing the extracted characteristic data from said first and second audio-visual streams.
39. A broadcast verification method comprising:
extracting a relatively small signature from a subject stream of video frames; and
producing a comparison report including a frame-by-frame comparison of the subject stream and of an additional video stream based on a signature-level match between the two streams.
40. A broadcast verification method comprising:
comparing a broadcast video sequence with an original video sequence, at least partly on the frame level, including comparing at least a derivation of individual frames of the broadcast to at least a derivation of individual frames of the original video sequence;
and generating an output indicating at least one parameter of similarity between the broadcast and the original video sequence.
41. A method according to claim 40 wherein said derivation of a first individual frame which is compared to a derivation of a second individual frame, in the course of said comparing step, comprises a signature of the first individual frame.
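Claims 34 and 40 leave the "parameter of similarity" open. One plausible choice, sketched below, is the fraction of frame pairs whose per-frame signatures agree within a tolerance; the tolerance value and the scalar-signature assumption are illustrative only.

```python
def frame_matches(sig_a, sig_b, tolerance=1.0):
    """True if two per-frame signatures agree within tolerance."""
    return abs(sig_a - sig_b) <= tolerance

def similarity(original_sigs, broadcast_sigs, tolerance=1.0):
    """Fraction of frame pairs whose signatures agree within tolerance,
    comparing the two streams up to the length of the shorter one."""
    pairs = zip(original_sigs, broadcast_sigs)  # zip stops at the shorter stream
    matches = sum(frame_matches(a, b, tolerance) for a, b in pairs)
    return matches / min(len(original_sigs), len(broadcast_sigs))
```

A similarity of 1.0 would indicate a frame-for-frame faithful broadcast at the signature level, while lower values flag dropped, substituted, or corrupted frames for closer inspection.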
CA002312997A 1997-12-07 1998-12-07 Apparatus and methods for manipulating sequences of images Abandoned CA2312997A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
IL122498 1997-12-07
IL12249897A IL122498A0 (en) 1997-12-07 1997-12-07 Apparatus and methods for manipulating sequences of images
PCT/IL1998/000596 WO1999030488A1 (en) 1997-12-07 1998-12-07 Apparatus and methods for manipulating sequences of images

Publications (1)

Publication Number Publication Date
CA2312997A1 true CA2312997A1 (en) 1999-06-17

Family

ID=11070933

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002312997A Abandoned CA2312997A1 (en) 1997-12-07 1998-12-07 Apparatus and methods for manipulating sequences of images

Country Status (5)

Country Link
EP (1) EP1046283A4 (en)
AU (1) AU1503599A (en)
CA (1) CA2312997A1 (en)
IL (1) IL122498A0 (en)
WO (1) WO1999030488A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7277766B1 (en) 2000-10-24 2007-10-02 Moodlogic, Inc. Method and system for analyzing digital audio files
US7567899B2 (en) 2004-12-30 2009-07-28 All Media Guide, Llc Methods and apparatus for audio recognition
US8620967B2 (en) 2009-06-11 2013-12-31 Rovi Technologies Corporation Managing metadata for occurrences of a recording
US8161071B2 (en) 2009-09-30 2012-04-17 United Video Properties, Inc. Systems and methods for audio asset storage and management
US8886531B2 (en) 2010-01-13 2014-11-11 Rovi Technologies Corporation Apparatus and method for generating an audio fingerprint and using a two-stage query
CA2795296C (en) 2010-03-31 2018-10-02 Thomson Licensing Trick playback of video data
US9020415B2 (en) 2010-05-04 2015-04-28 Project Oda, Inc. Bonus and experience enhancement system for receivers of broadcast media
KR101681176B1 (en) 2010-09-17 2016-11-30 톰슨 라이센싱 Method for semantics based trick mode play in video system

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5319453A (en) * 1989-06-22 1994-06-07 Airtrax Method and apparatus for video signal encoding, decoding and monitoring
US5436653A (en) * 1992-04-30 1995-07-25 The Arbitron Company Method and system for recognition of broadcast segments
US5659613A (en) * 1994-06-29 1997-08-19 Macrovision Corporation Method and apparatus for copy protection for various recording media using a video finger print
JPH08305662A (en) * 1995-05-02 1996-11-22 Fujitsu Ltd Method and system for client authentication
US5680454A (en) * 1995-08-04 1997-10-21 Hughes Electronics Method and system for anti-piracy using frame rate dithering
JPH09160899A (en) * 1995-12-06 1997-06-20 Matsushita Electric Ind Co Ltd Information service processor
JPH09261648A (en) * 1996-03-21 1997-10-03 Fujitsu Ltd Scene change detector
US5870754A (en) * 1996-04-25 1999-02-09 Philips Electronics North America Corporation Video retrieval of MPEG compressed sequences using DC and motion signatures
US5848155A (en) * 1996-09-04 1998-12-08 Nec Research Institute, Inc. Spread spectrum watermark for embedded signalling

Also Published As

Publication number Publication date
EP1046283A4 (en) 2001-04-25
AU1503599A (en) 1999-06-28
IL122498A0 (en) 1998-06-15
EP1046283A1 (en) 2000-10-25
WO1999030488A1 (en) 1999-06-17

Similar Documents

Publication Publication Date Title
EP0838960A2 (en) System and method for audio-visual content verification
Pan et al. Automatic detection of replay segments in broadcast sports programs by detection of logos in scene transitions
US8433108B2 (en) Video fingerprinting
US7376274B2 (en) Method and apparatus for use in video searching
US8311390B2 (en) Systems and methods for identifying pre-inserted and/or potential advertisement breaks in a video sequence
Li et al. A general framework for sports video summarization with its application to soccer
EP0690413A2 (en) A system for locating automatically video segment boundaries and for extraction of key-frames
KR20000009742A (en) Specific character appearing section detecting system
US6700994B2 (en) Embedding and detecting a watermark in images
Nam et al. Detection of gradual transitions in video sequences using b-spline interpolation
US20090196569A1 (en) Video trailer
US20110234900A1 (en) Method and apparatus for identifying video program material or content via closed caption data
EP3251053B1 (en) Detecting of graphical objects to identify video demarcations
US5790236A (en) Movie processing system
KR20070037579A (en) Searching for a scaling factor for watermark detection
US20200311898A1 (en) Method, apparatus and computer program product for storing images of a scene
CA2312997A1 (en) Apparatus and methods for manipulating sequences of images
US20110033115A1 (en) Method of detecting feature images
Schaber et al. Semi-automatic registration of videos for improved watermark detection
Kim et al. An efficient graphical shot verifier incorporating visual rhythm
JPH07111630A (en) Moving image editing device and cut integrating method
Zeppelzauer et al. Analysis of historical artistic documentaries
Hirzallah A Fast Method to Spot a Video Sequence within a Live Stream.
Aoyagi et al. Implementation of flexible-playtime video skimming
WO2007123916A2 (en) System and method for the identification of motional media of widely varying picture content

Legal Events

Date Code Title Description
FZDE Discontinued