JP2004509529A

JP2004509529A - How to use visual cues to highlight important information in video programs

Info

Publication number: JP2004509529A
Application number: JP2002527199A
Authority: JP
Inventors: アブデル−モッタレブ，モハメド
Original assignee: Koninklijke Philips Electronics NV
Current assignee: Koninklijke Philips NV
Priority date: 2000-09-13
Filing date: 2001-08-30
Publication date: 2004-03-25
Also published as: WO2002023891A3; EP1320992A2; WO2002023891A2

Abstract

A method for highlighting important developments in a video clip of a sporting event, such as a soccer game, infers the developments from cues provided by low-level features of the video clip. The method detects sequences of frames in the video clip that have a preselected visual or audio cue. The number of frames in each sequence having the cue is compared with a predetermined threshold. If the number of frames is greater than or equal to the predetermined threshold, an important development is declared in the frames immediately preceding the sequence that satisfies the threshold.

Description

[0001]
The present invention relates to content-based video extraction and retrieval, and more particularly, to a method for automatically identifying important information or development in a video clip of a sporting event.
[0002]
Many video applications require search methods that make it possible to search large amounts of video material for particular clips of interest. Such applications may include, for example, interactive TV and pay-per-view systems. Customers using such systems typically want to see portions of a program before renting it. Video browsers allow customers to find programs of interest.
[0003]
Most work in content-based video extraction and retrieval is based on low-level features such as color, texture, shape, and camera motion. Low-level features are useful for some applications, but many other interesting applications require higher-level semantic information. Bridging the gap between low-level features and high-level semantic information is not easy. In most cases where high-level semantic information is required, manual annotation with keywords is used.
[0004]
One important application of video archiving and extraction is sports such as soccer and football. There is therefore a need for a method that can automatically extract high-level information using low-level features.
[0005]
The present invention is directed in particular to a method for automatically identifying important developments in video clips of sporting events, such as soccer games. The method comprises:
detecting a sequence of frames of the video clip having a preselected cue, the cue indicating that the frames immediately preceding the sequence may contain an important development;
comparing the number of frames in the sequence having the cue with a predetermined threshold; and
determining that an important development occurred in the frames immediately preceding the sequence if the number of frames in the sequence is greater than or equal to the threshold.
[0006]
The method further includes obtaining the preselected cue from low-level features in the image of each frame of the sequence. In such an embodiment, the preselected cue is based on a change in where the camera is aimed. More specifically, when an important development occurs in a video clip, the camera is typically turned to the audience or the players, so that the images in the sequence of frames immediately following the frames containing the important development show little or no turf area.
[0007]
The advantages, characteristics, and various additional features of the present invention will become more apparent from consideration of the exemplary embodiments described in detail below in connection with the accompanying drawings.
[0008]
The method of the present invention extracts high-level information from video or from multiple images using low-level features, in order to facilitate content-based extraction and retrieval. It does so by fixing a particular target domain and using knowledge specific to that domain to extract high-level information automatically from low-level features. One particularly useful application of the present invention is highlighting segments containing important developments in video clips of sporting events, including soccer games and football games. Such video clips typically include video, audio, and textual information (closed captions).
[0009]
The method of the present invention highlights important developments in a video clip by inferring them from one or more cues provided by low-level features and textual information of the video clip. More specifically, the method detects sequences of frames in the video clip that have preselected visual, audible, and/or textual (closed-caption) cues. The number of frames in each sequence having the cue is then compared with a predetermined threshold. If the number of frames in a sequence is greater than or equal to the threshold, an important development is declared in the frames immediately preceding that sequence.
[0010]
It has been found that important developments in video clips of sporting events are typically characterized by visual cues associated with a change in what the camera is focused on. For example, after an important development has occurred in a sporting event such as a soccer game, the video camera is usually turned to the players or to the crowd in the stadium. When the camera is aimed at the players or the crowd, little or none of the turf of the playing field is visible in the camera's field of view.
[0011]
Using this change in the camera's focus of interest, the method of the present invention detects sequences of frames in the video clip whose images show little or no turf area of the playing field. The number of frames in each sequence is compared with a predetermined threshold. If the number of frames in a sequence is greater than or equal to the threshold, an important development is declared in the frames immediately preceding that sequence. The threshold rests on the assumption that if a sequence showing little or no turf area contains enough frames, the camera must have been aimed at the players or the crowd. Consequently, the frames immediately preceding the sequence are considered to contain an important development, such as a goal being scored in a soccer game.
[0012]
FIG. 1 is a flowchart outlining an exemplary embodiment of an algorithm for implementing the method of the present invention as applied to highlighting segments containing important events in a video clip of a soccer game. In step S1, the algorithm detects sequences of frames in the video clip that show little or no turf area. If, in step S2, the number of frames in a sequence is greater than a predetermined threshold, then in step S3 an important development is declared in the set of frames preceding that sequence.
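Steps S1 to S3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each frame is assumed to have already been reduced to a boolean flag (`True` when the frame shows little or no turf), and the names `find_developments`, `low_turf`, and `min_run` are hypothetical. The comparison here uses "greater than or equal to", as in the summary of the method, whereas the flowchart description says "greater than".

```python
from typing import List, Tuple


def find_developments(low_turf: List[bool], min_run: int) -> List[Tuple[int, int]]:
    """Return (start, length) for each run of low-turf frames of length >= min_run.

    The important development is taken to lie in the frames just before
    `start`; how many preceding frames to keep is left to the caller.
    """
    runs: List[Tuple[int, int]] = []
    start = None  # index where the current low-turf run began, if any
    for i, flag in enumerate(low_turf):
        if flag and start is None:
            start = i  # a new run begins (step S1)
        elif not flag and start is not None:
            if i - start >= min_run:  # threshold test (step S2)
                runs.append((start, i - start))  # declare development (step S3)
            start = None
    # Handle a run that extends to the last frame of the clip.
    if start is not None and len(low_turf) - start >= min_run:
        runs.append((start, len(low_turf) - start))
    return runs
```

Short runs, such as a brief close-up lasting a few frames, fall below `min_run` and are ignored, while a long run marks the frames before it as a candidate highlight.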
[0013]
In detection step S1, the algorithm detects green regions whose color is similar to that of turf. The algorithm is trained to distinguish this green from the other colors in each frame so that turf regions within the frame can be identified. Training uses patches from a training set of turf-region images extracted from one or more soccer games, or from the soccer game in the video clip itself. From these patches the algorithm learns which color values correspond to turf. Given the image in a frame of the video clip, the result of training is used to decide whether a given pixel in the frame is turf.
[0014]
The algorithm is trained by computing the normalized red and green colors (r, g) for each pixel in the training patches and building a normalized histogram for the class turf, where r = R / (R + G + B) and g = G / (R + G + B). A color histogram of an image is obtained by dividing a color space, such as red, green, and blue, into discrete colors (called bins) and counting the number of times each discrete color occurs by traversing every pixel in the image.
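The training step can be illustrated with a short Python sketch. It assumes training pixels are supplied as (R, G, B) integer tuples; the bin count of 32 per axis, the decision to skip pure-black pixels (whose normalized color is undefined), and the names `rg_bin` and `train_turf_histogram` are all assumptions made for this example and are not specified in the text.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)
Bin = Tuple[int, int]         # (r-bin index, g-bin index)


def rg_bin(pixel: Pixel, bins: int = 32) -> Optional[Bin]:
    """Map a pixel to its bin in normalized (r, g) space, or None if black."""
    R, G, B = pixel
    total = R + G + B
    if total == 0:
        return None  # r and g are undefined when R + G + B = 0
    r = R / total    # r = R / (R + G + B)
    g = G / total    # g = G / (R + G + B)
    # r and g lie in [0, 1]; clamp so that a value of exactly 1.0 stays in range.
    return (min(int(r * bins), bins - 1), min(int(g * bins), bins - 1))


def train_turf_histogram(patch_pixels: Iterable[Pixel], bins: int = 32) -> Dict[Bin, float]:
    """Build a normalized (r, g) histogram from turf training pixels."""
    counts: Counter = Counter()
    for pixel in patch_pixels:
        key = rg_bin(pixel, bins)
        if key is not None:
            counts[key] += 1
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no usable training pixels")
    # Dividing by the pixel count makes the bin values sum to 1.
    return {key: count / n for key, count in counts.items()}
```

Working in normalized (r, g) rather than raw RGB discards overall brightness, which is one plausible reason for the choice: turf in sun and in shadow then tends to fall into nearby bins.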
[0015]
The normalized histogram can be regarded as a probability distribution function for the class turf, p(pixel value | green). Detection step S1 is carried out by marking as turf pixels those pixels in each frame whose value of p(pixel value | green) exceeds a predetermined threshold.
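The marking step can be sketched as follows. The example assumes a trained histogram is available as a dictionary mapping (r-bin, g-bin) pairs to probabilities, and that a frame is a list of rows of (R, G, B) tuples; the function name `turf_mask`, the parameter `p_min`, and the bin count of 32 are hypothetical choices for illustration only.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)
Bin = Tuple[int, int]         # (r-bin index, g-bin index)


def turf_mask(frame: List[List[Pixel]], hist: Dict[Bin, float],
              p_min: float, bins: int = 32) -> List[List[bool]]:
    """Mark each pixel True when its histogram probability exceeds p_min."""
    mask: List[List[bool]] = []
    for row in frame:
        mask_row: List[bool] = []
        for R, G, B in row:
            total = R + G + B
            if total == 0:
                p = 0.0  # black pixel: normalized color undefined, treat as non-turf
            else:
                key = (min(int(R / total * bins), bins - 1),
                       min(int(G / total * bins), bins - 1))
                p = hist.get(key, 0.0)  # bins never seen in training get p = 0
            mask_row.append(p > p_min)
        mask.append(mask_row)
    return mask
```

The value of `p_min` trades missed turf pixels against false positives and would need to be tuned on real footage.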
[0016]
Based on the pixel classification described above, step S1 of the algorithm looks for connected components with turf-like color in the image of each frame. If the components are large enough, the camera is assumed to be aimed at the playing field. If the connected turf-colored components found in the image are small, the camera is assumed to be aimed at the audience or the players. If, in step S2, small turf-colored components are detected only for a short time, for example in only one to three or four frames, no important development is declared in step S3. If, however, small turf-colored components are detected for a relatively long time, for example in 200 to 300 frames, an important development is declared in step S3.
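The connected-component test on a single frame might look like the sketch below. It assumes the frame has already been converted to a boolean mask of turf pixels, uses 4-connectivity, and treats a frame as "low turf" when its largest component covers less than a fraction `min_fraction` of the image. The text only says the components must be "large enough", so the connectivity, the fraction of 0.2, and the function names are assumptions.

```python
from collections import deque
from typing import List


def largest_component(mask: List[List[bool]]) -> int:
    """Return the pixel count of the largest 4-connected region of True cells."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    best = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited turf pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            size = 0
            while queue:
                cy, cx = queue.popleft()
                size += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best


def is_low_turf(mask: List[List[bool]], min_fraction: float = 0.2) -> bool:
    """True when the largest turf component is small relative to the frame."""
    total = sum(len(row) for row in mask)
    return total == 0 or largest_component(mask) < min_fraction * total
```

Applying `is_low_turf` to every frame yields the per-frame flags whose run lengths are compared with the frame-count threshold in step S2.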
[0017]
The results obtained with this algorithm can be refined further using cues from other modalities, such as audio or closed captions, or additional cues from the same modality. Cues from the same or different modalities can be used to confirm the identification of a detected important event or activity and, more importantly, to classify detected important events or activities into meaningful classes, such as goals, goal attempts, penalties, injuries, and brawls between players, and to rank them by importance.
[0018]
In one embodiment, the method of FIG. 1 is implemented as computer-readable code executed by a data processing device. The code may be stored in a memory within the data processing device, or may be read or downloaded from a recording medium such as a CD-ROM or a floppy disk. In other embodiments, hardware circuitry may be used in combination with, or in place of, software instructions for carrying out the invention. The invention can also be implemented, for example, on the computer 30 shown in FIG. 2.
[0019]
Computer 30 includes a network connection 31 for connecting to a data network such as a variable-bandwidth network or the Internet, and a fax/modem connection 32 for connecting to other remote sources such as a video or digital camera (not shown). Computer 30 may also include a display for displaying information (including video data) to the user, a keyboard for entering text and user commands, a mouse for positioning a cursor on the display and entering user commands, a disk drive for reading from and writing to floppy disks inserted in it, and a CD-ROM drive for accessing information stored on a CD-ROM. Computer 30 may further have one or more add-on peripheral devices 38 for inputting images and the like, and a printer for outputting images, text, and the like.
[0020]
FIG. 3 shows the internal structure of computer 30, which includes a memory 40 that may comprise computer-readable media such as random access memory (RAM), read-only memory (ROM), and a hard disk. Items stored in memory 40 may include an operating system 41, data 42, and applications 43. Operating system 41 may be a windowing operating system such as UNIX (R), but the invention can equally be used with other operating systems such as Microsoft Windows 95 (R).
[0021]
In addition to the method of FIG. 1, applications stored in the memory 40 include a video coder 44, a video decoder 45, and a frame grabber 46. Video coder 44 encodes the video data in a conventional manner, and video decoder 45 decodes the video data encoded in a conventional manner. Frame grabber 46 allows capturing and processing a single frame from a video signal stream.
[0022]
Computer 30 also contains a central processing unit (CPU) 50, a communication interface 51, a memory interface 52, a CD-ROM drive interface 53, a video interface 54, and a bus 55. CPU 50 comprises a microprocessor or the like for executing computer-readable code, that is, applications such as those described above, from memory 40. Such applications may be stored in memory 40 (as described above) or, alternatively, on a floppy disk in disk drive 36 or on a CD-ROM in the CD-ROM drive. CPU 50 accesses applications (or other data) stored on a floppy disk via memory interface 52, and accesses applications (or other data) stored on a CD-ROM via CD-ROM drive interface 53.
[0023]
Input video data may be received through video interface 54 or communication interface 51. The input video data may be decoded by the video decoder 45. The output video data is encoded by the video coder 44 for transmission through the video interface 54 or the communication interface 51.
[0024]
Although the present invention described above has been described with reference to the above embodiments, various modifications and changes may be made without departing from the spirit of the invention. Accordingly, all such modifications and changes are considered to be within the scope of the following claims.
[Brief description of the drawings]
FIG. 1
FIG. 1 is a flowchart outlining an algorithm for performing an exemplary embodiment of the method of the present invention.
FIG. 2
FIG. 2 is a block diagram of a computer for implementing the present invention.
FIG. 3
FIG. 3 is a block diagram showing the internal structure of a computer for implementing the present invention.

Claims

1. A method of automatically identifying important events or activities in a video clip of a sporting event, comprising:
a) providing a video clip of a sporting event generated by a camera;
b) detecting a sequence of frames of the video clip having a preselected cue, the cue indicating that the frames immediately preceding the sequence may contain an important development;
c) comparing the number of frames in the sequence having the cue with a predetermined threshold; and
d) determining that an important development occurred in the frames immediately preceding the sequence if the number of frames in the sequence is greater than or equal to the threshold.

2. The method of claim 1, wherein the preselected cue is visual.

3. The method of claim 1, wherein the preselected cue is based on a change in where the camera is aimed.

4. The method of claim 1, wherein each frame of the sequence comprises an image, and wherein the preselected cue is obtained from the image.

5. The method of claim 4, wherein the preselected cue is an image having little or no turf area.

6. The method of claim 1, wherein the sporting event shown in the video clip is a soccer game.

7. The method of claim 1, wherein the preselected cue is provided by low-level features of the video clip.

8. The method of claim 1, wherein the preselected cue is provided by low-level visual features of the video clip.

9. The method of claim 8, wherein the low-level visual features include color.

10. The method of claim 1, wherein the preselected cue is provided by low-level audio features of the video clip.

11. The method of claim 1, wherein the preselected cue is provided by textual information of the video clip.

12. The method of claim 11, further comprising using the textual information of the video clip to confirm the identification of the important event or activity.

13. The method of claim 11, further comprising using the textual information of the video clip to classify the important event or activity into a meaningful class.

14. The method of claim 1, wherein the preselected cue comprises a plurality of preselected cues.

15. The method of claim 1, wherein the preselected cue includes low-level visual and audio features of the video clip and textual information of the video clip.

16. A device that automatically identifies important events or activities in a video clip of a sporting event, comprising:
a memory for storing executable code; and
a processor which, based on the code stored in the memory, performs the steps of:
a) providing a video clip of a sporting event generated by a camera;
b) detecting a sequence of frames of the video clip having a preselected cue, the cue indicating that the frames immediately preceding the sequence may contain an important development;
c) comparing the number of frames in the sequence having the cue with a predetermined threshold; and
d) determining that an important development occurred in the frames immediately preceding the sequence if the number of frames in the sequence is greater than or equal to the threshold.