JP2002074375A - Method for estimating desired contour of image object

Info

Publication number: JP2002074375A
Application number: JP2001175472A
Authority: JP
Inventors: Uuku Paaku Hyun; ヒュン・ウーク・パーク; Sheplin Todd; トッド・シェプリン; Sun Shijun; シジュン・スン; Kim Yonmin; ヨンミン・キム
Original assignee: University of Washington
Current assignee: University of Washington
Priority date: 2000-08-28
Filing date: 2001-06-11
Publication date: 2002-03-15

Abstract

PROBLEM TO BE SOLVED: To improve object segmentation and tracking. SOLUTION: Object segmentation (94) and tracking (86) are improved by incorporating directional information into the object boundary estimation to guide the placement of an active contour (112) (i.e., an elastic curve or "snake"). In estimating the object boundary, the active contour deforms from an initial shape to fit image features by means of an energy minimization function (90). The function is guided by external constraint forces and image forces so as to minimize the total energy of the active contour. In minimizing the contour energy of the active contour model, both the strength and the direction (λ) of the image gradient are analyzed.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

The present invention relates to object tracking and segmentation within a sequence of image frames, and more particularly to tracking and segmenting an object using an active contour model to estimate the optimal contour edge of the object being tracked.

[0002] When an object is tracked across multiple frames of a video sequence, the object boundary is identified in each frame. The object is the region within this boundary. Identifying the object boundary in a given frame becomes more difficult as the constraints on a trackable object are relaxed to allow tracking of an object that translates, rotates, or deforms. Once the object has been identified in one frame, template matching can be used in a subsequent frame to detect the movement of the object. Typically, the template is the object as identified in the preceding frame.

[0003] Active contour models, also known as snakes, have been used to adjust image features within particular image object boundaries. Conceptually, an active contour model involves overlaying an elastic curve on an image. The curve (i.e., the snake) deforms itself from its initial shape to fit the features of the image. An energy minimization function is used that fits the curve to image features such as lines and edges. The function is guided by external constraint forces and image forces. The best fit is obtained by minimizing the total energy computed for the curve. In practice, continuity and smoothness constraints are imposed to control deformation of the model. The model is the object from the preceding frame. A disadvantage of this active contour model is that even a slight change in the position or shape of the object from one frame to the next may cause boundary identification to fail. Specifically, rather than following the object, the estimated boundary latches onto a strong false edge in the background, and the object contour is distorted.
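The energy that such a model minimizes can be illustrated with a small sketch. The discrete form below (squared point spacing for continuity, a squared second difference for smoothness, and a subtracted edge-strength term for the image force) is a common textbook discretization and is only an assumption here; the weights `alpha`, `beta`, `gamma` and the function name `contour_energy` are hypothetical and do not come from this publication.

```python
def contour_energy(points, edge_strength, alpha=1.0, beta=1.0, gamma=1.0):
    """Total energy of a closed contour given as a list of (x, y) points.

    edge_strength(x, y) returns the image edge strength at a point;
    stronger edges lower the energy (assumed form, for illustration only).
    """
    n = len(points)
    total = 0.0
    for i in range(n):
        px, py = points[i - 1]            # previous point (wraps around)
        cx, cy = points[i]                # current point
        nx, ny = points[(i + 1) % n]      # next point (wraps around)
        # Continuity term: penalizes stretched spacing between neighbors.
        continuity = (cx - px) ** 2 + (cy - py) ** 2
        # Smoothness term: penalizes sharp bends (second difference).
        smoothness = (px - 2 * cx + nx) ** 2 + (py - 2 * cy + ny) ** 2
        # Image term: rewards points that sit on strong edges.
        total += alpha * continuity + beta * smoothness - gamma * edge_strength(cx, cy)
    return total
```

A minimizer would move each contour point within a small neighborhood and keep the position that lowers this total. With an image term based on edge strength alone, a strong background edge can lower the total as effectively as the true object edge, which is the failure described above.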

[0004] Yuille et al., in "Feature Extraction from Faces Using Deformable Templates" (International Journal of Computer Vision, Vol. 8, 1992), disclose a process in which eyes and mouths in an image are identified using a model with a few parameters. For example, an eye is modeled using two parabolas and a circle radius. The eye can be identified by varying the shape of the parabolas and the circle radius. Yuille et al. and other deformation models typically involve only highly constrained deformations. In particular, the object has a generally known shape that can deform in some generally known manner. Processes such as active contour models have relaxed constraints but are effective only over a very narrow spatial range of motion. Processes such as that disclosed by Yuille are effective over a wider spatial range of motion but track only highly constrained motion. Accordingly, there is a need for a more flexible and effective object tracker that can track more dynamic motion over a wider spatial range.

[0005]

SUMMARY OF THE INVENTION

According to the invention, object segmentation and tracking are improved by incorporating directional information into the object boundary estimation to guide the placement of an active contour (i.e., an elastic curve or "snake").

[0006] The problem addressed by the invention is that an active contour may converge on the wrong set of image points for an object that moves rapidly from one frame to the next. In estimating the object boundary, the active contour deforms from an initial shape to fit image features by means of an energy minimization function. The function is guided by external constraint forces and image forces so as to minimize the total energy of the active contour. Conventionally, continuity and smoothness constraints are imposed to control deformation of the active contour. According to one aspect of the invention, directional information is included to provide additional guidance and control. Specifically, in minimizing the active contour energy function, both the gradient strength and the gradient direction of the image are analyzed.
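As a rough illustration of analyzing both gradient strength and gradient direction, the sketch below computes a central-difference gradient at a pixel and scores an edge candidate only when its gradient direction agrees with an expected direction. The central-difference operator, the angular `tolerance`, and all function names are assumptions made for this example; this section does not give the actual energy term.

```python
import math

def strength_and_direction(image, x, y):
    """Gradient magnitude and angle (radians) at interior pixel (x, y).

    `image` is a list of rows of intensities; central differences are an
    assumed choice of gradient operator.
    """
    gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
    gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
    return math.hypot(gx, gy), math.atan2(gy, gx)

def directional_edge_score(image, x, y, expected_direction, tolerance=math.pi / 2):
    """Return the gradient strength if its direction matches, else 0.0."""
    strength, direction = strength_and_direction(image, x, y)
    # Smallest absolute angular difference, folded into [0, pi].
    diff = abs((direction - expected_direction + math.pi) % (2 * math.pi) - math.pi)
    return strength if diff <= tolerance else 0.0
```

For two equally strong edges, one dark-to-bright and one bright-to-dark, a strength-only score cannot tell them apart, whereas this score keeps only the one whose gradient points the expected way.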

[0007] One advantage of incorporating directional information is that a directional "snake" (i.e., an active contour model based on both gradient strength and gradient direction) segments more accurately when a given neighborhood contains multiple edge point candidates with different directions. Accordingly, directional information is now incorporated into digital video segmentation.

[0008] These and other aspects and advantages of the invention will be better understood by reference to the following detailed description taken in conjunction with the accompanying drawings.

[0009]

DESCRIPTION OF SPECIFIC EMBODIMENTS

Overview

FIG. 1 shows a block diagram of an exemplary host interactive processing environment 10 for locating, tracking, and encoding video objects. The processing environment 10 includes a user interface 12, a shell environment 14, and a plurality of functional software "plug-in" programs 16. The user interface 12 receives and distributes operator inputs from various input sources, such as a point-and-click device 26 (e.g., a mouse, touch pad, or trackball), a key entry device 24 (e.g., a keyboard), or a prerecorded scripted macro 13. The user interface 12 also controls formatting of the output to a display device 22. The shell environment 14 controls interaction between the plug-ins 16 and the user interface 12. An input video sequence 11 is input to the shell environment 14. The various plug-in programs 16a-16n can process all or part of the video sequence 11. One advantage of the shell 14 is that it insulates the plug-in programs from the various formats of potential video sequence inputs. Each plug-in program interfaces with the shell through an application program interface ("API") module 18.

[0010] In one embodiment, the interactive processing environment 10 is implemented on a programmed digital computer of a type well known in the art, an example of which is shown in FIG. 2. A computer system 20 has a display device 22, a key entry device 24, a pointing/clicking device 26, a processor 28, and random access memory (RAM) 30. In addition, there are commonly a communication or network interface 34 (e.g., a modem or an Ethernet(R) adapter), a non-volatile storage device such as a hard disk drive 32, and a transportable storage media drive 36 that reads transportable storage media 38. Various other storage devices 40 may be included, such as a floppy(R) disk drive, CD-ROM drive, zip drive, Bernoulli drive, or other magnetic, optical, or other storage media. The various components interface and exchange data and commands through one or more buses 42. The computer system 20 receives information entered through the key entry device 24, the pointing/clicking device 26, the network interface 34, or another input device or input port. The computer system 20 may be any type of computer system well known in the art, such as a mainframe computer, minicomputer, or microcomputer, and may serve as a network server computer, a networked client computer, or a stand-alone computer. The computer system 20 may also be configured as a workstation, a personal computer, or a reduced-feature network terminal device.

[0011] In another embodiment, the interactive processing environment 10 is implemented in an embedded system. The embedded system includes digital processing devices and peripherals similar to those of the programmed digital computer described above. In addition, one or more input devices or output devices are provided for a specific implementation, such as image capture.

[0012] The software code for implementing the user interface 12 and shell environment 14, including computer-executable instructions and computer-readable data, is stored on a digital-processor-readable storage medium, such as embedded memory, RAM, ROM, a hard disk, an optical disk, a floppy(R) disk, a magneto-optical disk, an electro-optical disk, or another known or later-developed transportable or non-transportable processor-readable storage medium. The plug-ins 16 (with their corresponding APIs 18) are packaged individually on separate storage media or together on a common storage medium. Further, one or more of the plug-ins 16 and corresponding APIs 18 may be bundled with the user interface 12 and shell environment 14, or the plug-ins 16 and corresponding APIs 18 may be packaged separately from them. In addition, the various software programs and plug-ins may be distributed or executed electronically over a network, such as a global computer network.

[0013] Depending on the computing model, the software programs making up the processing environment 10 are installed on an end-user computer or accessed remotely. For a stand-alone computing model, the executable instructions and data are loaded into volatile or non-volatile memory accessible to the stand-alone computer. For a non-resident computing model, the executable instructions and data are processed locally or at a remote computer, with the output routed to the local computer and operator inputs received from the local computer. One skilled in the art will appreciate that many computing configurations can be implemented. For a non-resident computing model, the software programs are stored locally or on a server computer on a public or private, local or wide area network, or on a global computer network. The executable instructions can be run on either the end-user computer or the server computer, and the data is displayed on the end user's display device.

[0014]

Shell Environment and User Interface

The shell environment 14 allows an operator to work in an interactive environment to test or use various video processing and common tools. In particular, plug-ins for video object segmentation, video object tracking, and video encoding (e.g., compression) are supported in a preferred embodiment. The interactive environment 10, together with the shell 14, provides a useful environment for creating video content, such as MPEG-4 video content or content for another video format. Pull-down menus or pop-up windows are implemented so that the operator can select a plug-in to process one or more video frames.

[0015] In a specific embodiment, the shell 14 includes a video object manager. A plug-in program 16, such as a segmentation program, accesses a frame of video data, along with a set of user inputs, through the shell environment 14. The segmentation plug-in program identifies a video object within the video frame. The video object data is routed to the shell 14, which stores the data within the video object manager module. Such video object data can then be accessed by the same plug-in 16 or by another plug-in 16, such as a tracking program. The tracking program identifies the video object in subsequent video frames. Data identifying the video object in each frame is routed to the video object manager module. In effect, video object data is extracted for each video frame in which the video object is tracked. When the operator has finished all video object extraction and editing or filtering of the video sequence, an encoder plug-in 16 may be activated to encode the final video sequence into the desired format. With such a plug-in architecture, the segmentation and tracking plug-ins do not need to interface with the encoder plug-in. Further, such plug-ins do not need to support reading of multiple video file formats or creation of video output formats. The shell handles video input compatibility issues, while the user interface handles display formatting issues. The encoder plug-in handles creation of the run-time video sequence.
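The role of the video object manager as shared storage between plug-ins can be sketched as follows. This is only an illustrative model under assumed names: the class `VideoObjectManager`, its methods, and the choice of a boundary point list as the stored data are hypothetical and are not taken from this publication.

```python
class VideoObjectManager:
    """Holds per-frame data for each video object so plug-ins can share it."""

    def __init__(self):
        # Maps object id -> {frame index -> list of boundary points}.
        self._records = {}

    def store(self, object_id, frame_index, boundary):
        """Record the boundary a plug-in found for an object in one frame."""
        self._records.setdefault(object_id, {})[frame_index] = list(boundary)

    def fetch(self, object_id, frame_index):
        """Return the stored boundary, or None if nothing was stored."""
        return self._records.get(object_id, {}).get(frame_index)

    def frames_for(self, object_id):
        """Return the sorted frame indices in which the object was stored."""
        return sorted(self._records.get(object_id, {}))
```

In this model a segmentation plug-in would call `store` for the first frame, a tracking plug-in would call `fetch` to obtain its template and `store` for each later frame, and an encoder would read the results back without ever calling the other plug-ins directly.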

[0016] In a Microsoft Windows(R) operating system environment, the plug-ins 16 are compiled as dynamic link libraries. At run time of the processing environment 10, the shell 14 scans a predefined directory for plug-in programs. When present, a plug-in program name is added to a list, which is displayed in a window or menu for selection by the user. When the operator selects a plug-in 16 to run, the corresponding dynamic link library is loaded into memory and the processor begins executing instructions from one of a set of predefined entry points for that plug-in. To access a video sequence and video object segmentations, a plug-in uses a set of callback functions. A plug-in interfaces with the shell program 14 through a corresponding application program interface module 18.
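The directory scan that builds the plug-in list can be sketched in a few lines. The function name `discover_plugins` and the `.dll` suffix default are assumptions for illustration; loading the library and jumping to an entry point are platform-specific and are deliberately left out.

```python
from pathlib import Path

def discover_plugins(directory, suffix=".dll"):
    """Return the sorted names of plug-in files found in `directory`.

    Only regular files whose extension matches `suffix` (case-insensitive)
    are listed; the name shown to the user is the file name without it.
    """
    return sorted(
        entry.stem
        for entry in Path(directory).iterdir()
        if entry.is_file() and entry.suffix.lower() == suffix
    )
```

The returned list corresponds to what the shell would present in a window or menu; selecting an entry would then trigger loading of the matching library.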

[0017] In addition, a segmentation interface 44 portion of the user interface 12 is provided, supported by the segmentation plug-in. The segmentation interface 44 calls the segmentation plug-in to support operator-selected segmentation commands (e.g., executing the segmentation plug-in, configuring the segmentation plug-in, or performing boundary selection and editing).

[0018] Typically, the API allows the corresponding plug-in to access specific data structures only on a linked, need-to-access basis. For example, an API serves to fetch a frame of video data, retrieve video object data from the video object manager, or store video object data with the video object manager. Because the plug-ins are separated and interface through APIs, a plug-in can be written in a different programming language and in a different programming environment from those used to create the user interface 12 and the shell. In one embodiment, the user interface 12 and shell 14 are written in C++. The plug-ins can also be written in a language such as the C programming language.

[0019] In a specific embodiment, each plug-in 16 is executed in a separate processing thread. As a result, the user interface 12 can display a dialog box that a plug-in can use to show its progress, and from which the user can make a selection to stop or pause execution of the plug-in.

[0020] Referring again to FIG. 1, the user interface 12 includes the segmentation interface 44, various display windows 54-62, dialog boxes 64, menus 66, and button bars 68, along with the supporting software code for formatting and maintaining these displays. In a preferred embodiment, the user interface is defined by a main window within which the user selects one or more subordinate windows. The windows may be active concurrently at a given time. The subordinate windows may be opened or closed, moved, and resized.

[0021] In a preferred embodiment, several subordinate windows 52 are provided, including a video window 54, a zoom window 56, a time-line window 58, one or more encoder display windows 60, and one or more data windows 62. The video window 54 displays a video frame or a sequence of frames. When viewing a sequence of frames, the frames may be stepped, viewed in real time, viewed in slow motion, or viewed in accelerated time. The window includes input controls that the operator accesses by pointing or clicking or by predefined key sequences. Stop, pause, play, rewind, fast forward, step, and other VCR-like (video cassette recorder) controls are provided for controlling the video presentation in the video window 54. In some embodiments, scaling and scrolling controls are provided for the video window 54.

[0022] The zoom window 56 displays an enlarged view of a portion of the video window 54 at a substantially larger magnification than the video window. The time-line window 58 includes an incremental time-line of video frames, along with zero or more thumbnail views of selected video frames. The time-line window 58 also includes a respective time-line for each video object defined for the input video sequence 11. A video object is defined by outlining the object.

[0023] The data window 62 includes user input fields for an object title, translucent mask color, encoding target bit rate, search range, and other parameters used in defining and encoding the corresponding video object.

[0024] During encoding, one of the encoder windows 60 is displayed. For example, an encoder progress window shows the encoding status of each defined video object in the input video sequence 11.

[0025]

Video Object Tracking and Segmentation

The first step in tracking an object is to define the template that corresponds to the object. FIG. 3 is a flow chart 70 for initially segmenting a video object to obtain an initial template according to an embodiment of the invention. In one embodiment, the operator loads an input video sequence at step 72 and, at step 74, selects points or line segments that approximate the boundary of the object. Segmentation is then performed at step 76 to define the boundary more accurately. The segmentation is performed using an active contour model, as described in a separate section below.

[0026] The edge points that define the object boundary are output at step 78. Such edge points are used as control points by another plug-in, for example to define an object mask (i.e., a template) and overlay it on the image frame so that the object being tracked can be visually distinguished. The operator may also adjust points on the boundary to refine it and re-run the segmentation algorithm with the refined boundary points to obtain the desired, accurate object. Such an object serves as the initial template for locating the object in another frame.

[0027] In a preferred embodiment, the object located in a given frame serves as the initial template when searching for that object in the next frame to be processed. The next frame may be the succeeding image frame in the video sequence, the next frame to be sampled in the video sequence, or any other frame, inside or outside the sequence, that is the next frame to be processed. Under this approach, the initial template changes with every frame that is processed.

[0028] FIG. 4 is a flow chart 80 for tracking an object in subsequent frames after the object has been identified and segmented in an initial frame. At step 81, the next image frame to be processed is input. At step 84, a test is performed to identify whether there has been a scene change. Although various approaches can be implemented, in one embodiment a modified adaptive resonance theory (M-ART2) method is performed, as described in commonly assigned U.S. patent application Ser. No. 09/323,501 of Sun et al., filed Jun. 10, 1999, "Video Object Segmentation Using Active Contour Model with Global Relaxation," which is incorporated herein by reference and made a part hereof.

[0029] If a scene change is detected at step 84, the process 80 ends, or is re-initialized to track another image object. If no scene change has occurred, then at step 86 the image object is coarsely identified in the image frame using any of various object tracking techniques. In one embodiment, a two-dimensional correlative auto-predictive search (2D-CAPS) process is performed. In another embodiment, a three-dimensional correlative auto-predictive search (3D-CAPS) process is performed. If, at step 88, the image object has not been found using the 2D-CAPS process, the process 80 ends or is re-initialized to track another object.

[0030] Once the object has been identified, the edge energy of the object boundary is derived at step 90. At step 94, an object segmentation process is performed based on an active contour model, as described in a separate section below. The active contour model is used to segment the image boundary and accurately model the object boundary. At step 96, the estimated image boundary is output. As described above for the initial image frame, in some embodiments the output is written to a buffer, a file, and/or a video screen. The process 80 is then repeated for another image frame. As a result, the image object is segmented and tracked across many image frames.
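The control flow of flow chart 80 can be summarized in a short sketch. The individual stages are passed in as callables because this section does not define them; `scene_changed`, `locate`, `edge_energy`, and `segment` are hypothetical placeholders for the scene-change test, the coarse search (e.g., a CAPS process), the edge energy derivation, and the active contour segmentation, respectively.

```python
def track_object(frames, template, scene_changed, locate, edge_energy, segment):
    """Track one object through `frames`, returning one boundary per frame.

    Tracking stops at the first scene change or when the object is not found.
    """
    boundaries = []
    for frame in frames:                       # step 81: input next frame
        if scene_changed(frame):               # step 84: scene change test
            break
        coarse = locate(frame, template)       # step 86: coarse identification
        if coarse is None:                     # step 88: object not found
            break
        energy = edge_energy(frame)            # step 90: derive edge energy
        boundary = segment(energy, coarse)     # step 94: active contour segmentation
        boundaries.append(boundary)            # step 96: output estimated boundary
        template = boundary                    # located object seeds the next search
    return boundaries
```

Updating `template` at the end of each iteration reflects the preferred embodiment in which the object located in one frame serves as the initial template for the next frame. Re-initialization to track another object, which the flow chart also allows, is omitted here for brevity.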

[0031]

Edge Energy

Edge energy is a measure of the potential energy of a set of image pixels that identifies edges based on the relative energy values of the pixels. Various measures of potential energy may be implemented. In one embodiment, a multi-level wavelet decomposition algorithm is used to extract the high-frequency components of an image. The high-frequency details are analyzed to identify the edges of image objects. For example, a Haar wavelet may be used.

[0032] The input processed to obtain the edge potential energy is an image. In one embodiment, the image is the entire image frame. In other embodiments, the image is an image object. The resulting edge potential energy is an array containing a potential energy value for each data point (pixel) of the image.

[0033] In one embodiment, the input image is decomposed by filtering it with a quadrature mirror filter (QMF) pair, which brings out the image details while simultaneously smoothing the image. The QMF pair includes a high-pass filter for bringing out the image details and a low-pass filter for smoothing the image. Referring to FIG. 5, a multi-level QMF decomposition 150 of an image frame 152 is shown. The image frame 152 is passed through a low-pass filter 154 and a high-pass filter 156 to obtain a low-pass component 158 and a high-pass component 160. These components are in turn filtered. The low-pass component 158 is passed through a low-pass filter 162 and a high-pass filter 164. The output of the low-pass filter 162 is the low-pass residue 166. The output of the high-pass filter 164 is the horizontal detail 165 of the image frame 152.

【００３４】In parallel, the high-pass component 160 is passed through a low-pass filter 168 and a high-pass filter 170. The output of low-pass filter 168 is the vertical detail 169 of the image frame 152. The output of high-pass filter 170 is the diagonal detail 171 of the image frame 152. The low-pass residue 166 and the three image details 165, 169 and 171 form the first-level QMF decomposition of the image frame 152. In some embodiments a second-level QMF decomposition 172 is also performed, in which the low-pass residue 166 is likewise passed through two stages of low-pass and high-pass filters to obtain a second-level low-pass residue and three image details (horizontal detail, vertical detail and diagonal detail). In some embodiments, the second-level decomposition may use the same filters as the first-level decomposition. For example, the low-pass residue 166 is simply input to filters 154 and 156 in place of the image frame 152.
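The two-stage filtering described above can be sketched in a few lines. This is a minimal illustration rather than the patented implementation: it assumes a Haar pair with averaging normalization, assumes the first stage filters along rows and the second along columns, and downsamples by two at each stage. The function names (`haar_pair`, `qmf_level`) are hypothetical, and the image is assumed to have even width and height.

```python
def haar_pair(seq):
    """Return (low, high) Haar outputs for a 1-D sequence of even length."""
    low = [(seq[i] + seq[i + 1]) / 2.0 for i in range(0, len(seq), 2)]
    high = [(seq[i] - seq[i + 1]) / 2.0 for i in range(0, len(seq), 2)]
    return low, high


def filter_rows(image):
    """Apply the Haar pair to every row; return (low image, high image)."""
    lows, highs = [], []
    for row in image:
        low, high = haar_pair(row)
        lows.append(low)
        highs.append(high)
    return lows, highs


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def filter_cols(image):
    """Apply the Haar pair along columns by transposing, filtering, transposing back."""
    low_t, high_t = filter_rows(transpose(image))
    return transpose(low_t), transpose(high_t)


def qmf_level(image):
    """One decomposition level: residue plus horizontal, vertical, diagonal details."""
    low, high = filter_rows(image)            # first stage (filters 154 / 156 in the text)
    residue, horizontal = filter_cols(low)    # second stage on the low-pass component
    vertical, diagonal = filter_cols(high)    # second stage on the high-pass component
    return residue, horizontal, vertical, diagonal
```

A further level would be obtained by calling `qmf_level` again on the returned residue, mirroring the reuse of the same filters mentioned in the text.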

【００３５】The high-pass filtering function is a wavelet transform (Ψ), and the low-pass filtering function is the scaling function (φ) corresponding to the wavelet. The scaling function performs the smoothing, while the three wavelets bring out the image details. The scaling function and the wavelet transform in one-dimensional space are given by the following equations.

【００３６】[0036]

【数１】 (Equation 1)

【００３７】The scaling may differ for each level of decomposition, although in one embodiment the scaling does not change.

【００３８】A first-level QMF decomposition is performed. For a second-level decomposition, the low-pass residue 166 of the first-level decomposition is analyzed without further downsampling. In some embodiments, additional levels of decomposition may be obtained by passing the low-pass residue of the preceding level through a two-stage filtering process (similar to that of the preceding level).

【００３９】For a given level of decomposition there are four images: the low-pass residue, the vertical detail, the horizontal detail and the diagonal detail. The horizontal and vertical details are the gradients of the image along the x and y axes. The magnitude of the image is taken at each level of decomposition. In one embodiment, the diagonal detail is omitted because it contributes little.

【００４０】In one embodiment, up to five levels of decomposition are used for each color component of the image frame. The low-pass residue from the preceding stage is input to filters 154 and 156 to generate the image details and residue for the current stage. Preferably, only data from the even levels (e.g., levels 2, 4 and 6) are used, so that no half-pixel shift arises in the edge energy. Integration of the multi-level and multi-channel (color component) data is guided by their principal components. In one implementation, the ratio of the multi-level edge gradients is selected as 1:2:4:8:16 for the five levels of decomposition. For the color components (Y, Cr, Cb), an edge gradient ratio of 1:1:1 is used.

【００４１】In one embodiment, the horizontal detail and the vertical detail of a given level (i) of the decomposition are combined to generate the edge potential energy (EPE) for that level as follows.

【００４２】[0042]

【数２】 (Equation 2)

【００４３】In one embodiment in which five levels of decomposition are performed, the total edge potential energy (EPE) for a given color component is summed as follows.

【００４４】[0044]

【数３】 (Equation 3)

【００４５】Here, c is the color component being processed. The total edge potential energy for the entire frame, including all color components, is the weighted sum of the energies from the different color components. For weighting factors of (1, 1, 1), the total potential energy is given as follows.

【００４６】[0046]

【数４】 (Equation 4)

【００４７】Here, y, cr and cb are the color components. In other embodiments, R, G and B color components, or the color components of another color component model, may be used. The weighting factors may vary depending on the color component model used.
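The combination of levels and color components described above can be sketched as follows. The level ratio 1:2:4:8:16 and the color weights (1, 1, 1) come from the text; the per-level formula is not reproduced in this copy, so the square root of the sum of squared details used in `level_epe` is an assumption, as are all function names. Every array is assumed to have the same dimensions, consistent with the statement that no further downsampling is applied.

```python
import math

LEVEL_WEIGHTS = (1, 2, 4, 8, 16)            # ratio given in the text for five levels
COLOR_WEIGHTS = {"y": 1, "cr": 1, "cb": 1}  # weighting factors (1, 1, 1) from the text


def level_epe(horizontal, vertical):
    """Per-pixel edge energy of one level (assumed: sqrt of sum of squares)."""
    return [[math.hypot(h, v) for h, v in zip(h_row, v_row)]
            for h_row, v_row in zip(horizontal, vertical)]


def component_epe(levels):
    """Weighted sum over levels; `levels` is a list of (horizontal, vertical) pairs."""
    total = None
    for weight, (horizontal, vertical) in zip(LEVEL_WEIGHTS, levels):
        energy = level_epe(horizontal, vertical)
        if total is None:
            total = [[weight * e for e in row] for row in energy]
        else:
            total = [[t + weight * e for t, e in zip(t_row, e_row)]
                     for t_row, e_row in zip(total, energy)]
    return total


def total_epe(components):
    """Weighted sum over color components; `components` maps name -> EPE array."""
    names = list(components)
    rows = len(components[names[0]])
    cols = len(components[names[0]][0])
    return [[sum(COLOR_WEIGHTS[n] * components[n][r][c] for n in names)
             for c in range(cols)]
            for r in range(rows)]
```

For a different color model, only `COLOR_WEIGHTS` would need to change, which matches the remark that the weighting factors depend on the model used.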

【００４８】The total edge potential energy is an array containing an energy value for each pixel of the image being processed. Although the edge energy derivation has been described in detail, alternative methods of deriving the edge energy may be used.

【００４９】Object Segmentation Using a Modified Active Contour Model
Once an object in an image has been identified, the boundary (i.e., edge) of the object is segmented to model the object edge more accurately. In a preferred embodiment, the object segmentation process 76/94 is based on an active contour model. For the initial segmentation (e.g., process 70), the image object boundary to be better estimated is the boundary input to the system with the points selected at step 74 (see FIG. 3). For subsequent image frames, the segmentation process 94 receives the derived total edge potential energy and the current image object boundary as inputs. The total edge potential energy is derived at step 90. The current image object boundary is obtained from the object tracking process at step 86.

【００５０】An active contour model is an energy-minimizing spline or curve that is guided by internal constraint forces and influenced by external image forces that pull the curve toward features such as lines and edges. Two alternative active contour model embodiments, described below, perform the object segmentation of steps 76 and 94. In one embodiment, a gradient descent search process is modified to take direction information into account (see FIGS. 6 and 7). In another embodiment, a dynamic programming search that includes consideration of direction information is performed (see FIGS. 8-11). The modified gradient descent embodiment is described next. The dynamic programming embodiment is described in a subsequent section.

【００５１】The conventional energy function in the gradient descent embodiment is defined in discrete form as follows.

【００５２】[0052]

【数５】 (Equation 5)

【００５３】In equation (1), N is the number of snaxels, and the snaxels are ordered counterclockwise along the closed contour. The weighting factors α_i and β_i are defined to control the relative importance of the membrane term and the thin-plate term, respectively, for the i-th snaxel, and s_i is the two-dimensional coordinate (x_i, y_i) of the i-th snaxel. Because the external energy is set to the negative magnitude of the image gradient, the snake is attracted to regions of low external energy, i.e., strong edges.
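As an illustration of the terms just described, the following sketch evaluates a discrete snake energy for a closed contour. Equation (1) is not reproduced in this copy, so the form used here (a first-difference membrane term, a second-difference thin-plate term and an external term per snaxel) is the commonly used discrete snake energy and is an assumption. For brevity the weights `alpha` and `beta` are single constants rather than per-snaxel values, and `external` is a hypothetical callable returning the external energy at a coordinate.

```python
def snake_energy(points, external, alpha=1.0, beta=1.0):
    """Sum membrane, thin-plate and external terms over a closed contour.

    points   -- list of (x, y) snaxel coordinates, ordered along the contour
    external -- callable (x, y) -> external energy (e.g. negative gradient magnitude)
    """
    n = len(points)
    total = 0.0
    for i in range(n):
        px, py = points[i - 1]            # previous snaxel (wraps around)
        cx, cy = points[i]                # current snaxel
        nx, ny = points[(i + 1) % n]      # next snaxel (wraps around)
        membrane = (cx - px) ** 2 + (cy - py) ** 2
        thin_plate = (px - 2 * cx + nx) ** 2 + (py - 2 * cy + ny) ** 2
        total += alpha * membrane + beta * thin_plate + external(cx, cy)
    return total
```

Lower values of this sum correspond to smoother contours lying on stronger edges, which is the quantity the search procedures below try to reduce.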

【００５４】The optimal positions of the snaxels are searched so that the snake energy E_snake can be minimized. In one embodiment, a gradient descent method attempts to minimize the snake energy by iteratively finding new snaxel positions.

【００５５】[0055]

【数６】 (Equation 6)

【００５６】In equation (4), I is an N×N identity matrix. Equation (4) is iterated to produce a local minimum of the snake energy. As these equations show, the external energy is determined from the magnitude of the gradient image.

【００５７】The gradient image is obtained using a gradient operator such as Sobel, Prewitt or derivative of Gaussian (DOG). The image is convolved with the gradient operator, which typically consists of horizontal and vertical gradient kernels.
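A minimal sketch of this step with the Sobel operator is given below. It applies the horizontal and vertical 3×3 kernels at each interior pixel and returns both the gradient magnitude and the gradient direction. The kernels are applied as a correlation (no kernel flip), which for Sobel only changes the sign convention; border pixels are simply left at zero. The function names are hypothetical.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))    # horizontal gradient kernel
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))    # vertical gradient kernel


def apply_kernel(image, kernel, r, c):
    """Weighted sum of the 3x3 neighborhood centered on (r, c)."""
    return sum(kernel[i][j] * image[r + i - 1][c + j - 1]
               for i in range(3) for j in range(3))


def sobel_gradient(image):
    """Return (magnitude, direction) arrays; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    magnitude = [[0.0] * cols for _ in range(rows)]
    direction = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = apply_kernel(image, SOBEL_X, r, c)
            gy = apply_kernel(image, SOBEL_Y, r, c)
            magnitude[r][c] = math.hypot(gx, gy)
            direction[r][c] = math.atan2(gy, gx)   # radians, in [-pi, pi]
    return magnitude, direction
```

The conventional model uses only `magnitude`; the directional model discussed later also consults `direction`.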

【００５８】[0058]

【数７】 (Equation 7)

【００５９】In the conventional contour defined by equations (1) through (3), only the gradient magnitude contributes to the contour energy. The contour may therefore be attracted to a strong edge whose gradient direction differs from that of the correct boundary.

【００６０】FIGS. 6A-6C illustrate this limitation in a conventional magnitude-only active contour model. FIG. 6A shows a black-and-white image 100. FIGS. 6B and 6C show a gradient magnitude image 102 of the black-and-white image. FIG. 6B also shows a manually defined initial contour 104 overlaid on the gradient magnitude image 102. FIG. 6C also shows an active contour 106, derived with a conventional active contour model, overlaid on the gradient magnitude image 102. In this example, the gradient magnitude image 102 is obtained using a 5×5 DOG kernel. As FIG. 6C shows, the conventional active contour 106 converges on neither the outer boundary 108 nor the inner boundary 110 of the gradient magnitude image 102. Referring to FIGS. 7A and 7B, the gradient magnitude image 102 is shown together with an active contour 112 obtained according to an embodiment of this invention. Specifically, the active contour 112 is based not merely on the gradient magnitude but also on the gradient direction. Because information about the gradient direction is included, the active contour 112 converges on either the inner boundary 110 (FIG. 7A) or the outer boundary 108 (FIG. 7B), rather than partly straddling both boundaries 108 and 110. The specific result depends on the direction information.

【００６１】To take the gradient direction information into account, the external energy of the image is redefined as follows.

【００６２】[0062]

【数８】 (Equation 8)

【００６３】When the gradient direction at s_i is inward, λ_i is 1. Otherwise, λ_i is -1. When the normal direction of the contour is opposite to the gradient direction, the external energy takes its maximum value, which is 0 according to equation (7). In that case, the resulting directional contour is not attracted to edges whose direction is opposite to the gradient direction. Consequently, only edges with the correct gradient direction participate in the active contour minimization process of equation (4).

【００６４】In applying the directional contour, the direction parameters {λ_i} are determined from the contour of the preceding frame as follows.

【００６５】[0065]

【数９】 (Equation 9)

【００６６】Thus, the direction parameters for the (k+1)-th frame are calculated from the contour direction θ_C and the gradient direction θ_G of the k-th frame, as in equation (10). The calculated direction parameters {λ_i} and the k-th frame contour (C_k) are used as the initial directional contour for segmenting the (k+1)-th frame. Because the direction parameters are determined from the segmented contour of the preceding frame, the segmentation result degrades if the direction parameters change frequently along the frame sequence, since the direction parameters of the preceding frame may then differ from those of the current frame.
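The directional rules described above can be sketched as follows. Equations (7) through (10) are not reproduced in this copy, so the exact tests used here are assumptions: the contour normal is taken as θ_C + λ·π/2, the external energy is the negative gradient magnitude when the gradient lies within π/2 of that normal and 0 otherwise, and λ is set to 1 when the gradient lies within π/2 of the left-hand normal of the contour direction. All function names are hypothetical.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians into the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def direction_parameter(theta_g, theta_c):
    """Return 1 if the gradient points to the left of the contour direction, else -1."""
    left_normal = theta_c + math.pi / 2
    return 1 if abs(wrap_angle(theta_g - left_normal)) <= math.pi / 2 else -1


def directional_external_energy(magnitude, theta_g, theta_c, lam):
    """Negative gradient magnitude if the gradient agrees with the contour normal, else 0."""
    normal = theta_c + lam * math.pi / 2
    if abs(wrap_angle(theta_g - normal)) <= math.pi / 2:
        return -magnitude
    return 0.0   # maximum external energy: this edge does not attract the snaxel
```

In a tracking loop, `direction_parameter` would be evaluated on the segmented contour of frame k and the resulting values passed as `lam` when segmenting frame k+1.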

【００６７】The segmentation result depends on the predefined direction parameters. FIGS. 7A and 7B show the segmentation results with direction parameters of 1 and -1, respectively, for all snaxels. When the direction parameter is 1 (FIG. 7A), the directional contour converges on the boundary edge whose gradient direction is inward, i.e., whose gradient direction points to the left of the counterclockwise contour direction. When the direction parameter is -1 (FIG. 7B), however, the contour moves toward the boundary edge with the outward gradient direction. In this example, the direction parameter is the same for all snaxels of the contour. However, in accordance with the directional gradient analysis, different snaxels along the contour may be assigned different direction parameters. In that case, snaxels with a direction parameter of 1 converge on the inner edge and snaxels with a direction parameter of -1 converge on the outer edge. It is therefore desirable to derive direction parameters for all snaxels of the initial contour. More specifically, in video segmentation and tracking, the direction parameters from the segmentation result of the preceding frame are used.

【００６８】Object Segmentation Using a Directional Active Contour Model
Referring to FIG. 8, a flowchart 192 of an alternative active contour model embodiment for implementing the object segmentation steps 76 and 94 (see FIG. 4) is shown. In this embodiment, a dynamic programming search is performed. In a first step 194, edge points are input. The number of input edge points may vary. At step 196, edge points that are too close together (i.e., spaced less than a first threshold distance apart) are eliminated. In one embodiment, points are considered too close together when they are less than 2.5 pixels apart. In other embodiments, the distance may be smaller or larger. At step 198, additional points are added by interpolation where adjacent points are too far apart (i.e., spaced more than a second threshold distance apart). In one embodiment, points are considered too far apart when they are more than 6.0 pixels apart. In other embodiments, the distance may be smaller or larger than 6.0, but is larger than the first threshold distance.
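Steps 196 and 198 can be sketched as a single pass over the contour. The thresholds 2.5 and 6.0 pixels are those of the embodiment described above; the function name, the choice of linear interpolation with evenly spaced points, and the treatment of the contour as closed are assumptions. As a simplification, the spacing between the last kept point and the first point is not checked during the deletion pass.

```python
import math


def respace_points(points, min_dist=2.5, max_dist=6.0):
    """Drop edge points that crowd their predecessor, then fill large gaps."""
    # Step 196 analogue: keep a point only if it is far enough from the last kept one.
    kept = [points[0]]
    for point in points[1:]:
        if math.dist(point, kept[-1]) >= min_dist:
            kept.append(point)

    # Step 198 analogue: linearly interpolate extra points across gaps > max_dist.
    result = []
    count = len(kept)
    for i in range(count):
        a, b = kept[i], kept[(i + 1) % count]   # closed contour: last connects to first
        result.append(a)
        gap = math.dist(a, b)
        if gap > max_dist:
            pieces = math.ceil(gap / max_dist)
            for k in range(1, pieces):
                t = k / pieces
                result.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return result
```

`math.dist` requires Python 3.8 or later.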

【００６９】At this stage of the process there is a given number of current edge points, as modified from the input edge points. Although the number of edge points may vary from contour to contour, the process is described below for N current edge points.

【００７０】At step 199, the direction parameters {λ_i} for the edge points are derived using equations (7) through (10). In one embodiment, at step 200, the active contour modeling process performs global relaxation on the N current edge points. To do so, for each current edge point, up to M candidate points are selected from a box around the current edge point. In one embodiment M equals 4, although the number of candidate points may vary in other embodiments. In one embodiment a 5×5 box is used, but the size of the box may also vary. A larger box makes the contour more flexible but increases computation time. The box may be square, rectangular or another shape.

【００７１】In a preferred embodiment, fewer than M potential candidate points are derived based on the direction information. Specifically, rather than identifying a candidate point in each potential direction for an edge point, the candidate points that conform to the directional gradient information are selected. Candidate points that do not satisfy the criterion of equation (10) are therefore eliminated.

【００７２】Referring to FIG. 9, a 5×5 box 174 of pixels surrounding the current edge point 176 is divided into four regions 178, 180, 182 and 184 so that candidates in all directions are considered. One or two of these regions are eliminated when the direction information is taken into account.

【００７３】Each of the remaining regions contains six pixels. One of these six pixels is selected in each region as a candidate pixel ("point"), which may potentially replace the current edge point 176 as an object boundary edge point. Thus, one, two or three of the candidate points 186-189 are selected for each current edge point 176. In alternative embodiments, a different number of candidate points may be used, or another candidate point selection method that conforms to the direction information may be used.

【００７４】For a given region, the candidate point is the pixel with the highest edge potential energy among the six potential points. For an image object boundary that currently has N edge points, where each of the N points has M (e.g., 4) alternative candidate points, there are (M+1)^N (e.g., 5^N) possible contours from which to select the modeled image object boundary. At step 202, a travel algorithm is applied to the current edge points together with the alternative candidate points to select an optimal contour path. FIG. 10 shows a travel path diagram for the possible contours. Each column contains (M+1) (e.g., 5) points. The five points correspond to the current edge point 176 and the four candidate edge points 186-189 for that current edge point 176. The number of points in each row (which also equals the number of columns) corresponds to N. Using the direction information simplifies the travel problem, because some of the path permutations do not conform to the directional gradient information (see equations (6) through (10) above).
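The candidate selection described above (one point per region, chosen by highest edge potential energy) can be sketched as follows. The text states that the 5×5 box is divided into four regions of six pixels each, but the actual layout is shown only in FIG. 9, so the pinwheel layout in `REGIONS` is a hypothetical choice that merely satisfies those counts. The `allowed` argument stands in for the direction test, which would remove one or two regions.

```python
# Offsets (row, col) relative to the current edge point. Hypothetical layout:
# four pinwheel regions of six pixels each, covering the 5x5 box minus its center.
REGIONS = {
    "A": [(dr, dc) for dr in (-2, -1) for dc in (-2, -1, 0)],
    "B": [(dr, dc) for dr in (-2, -1, 0) for dc in (1, 2)],
    "C": [(dr, dc) for dr in (1, 2) for dc in (0, 1, 2)],
    "D": [(dr, dc) for dr in (0, 1, 2) for dc in (-2, -1)],
}


def select_candidates(epe, row, col, allowed=("A", "B", "C", "D")):
    """Pick the highest-energy pixel in each allowed region around (row, col).

    epe     -- 2-D array of edge potential energy values
    allowed -- names of the regions that survive the direction test
    """
    rows, cols = len(epe), len(epe[0])
    candidates = []
    for name in allowed:
        inside = [(row + dr, col + dc) for dr, dc in REGIONS[name]
                  if 0 <= row + dr < rows and 0 <= col + dc < cols]
        if inside:
            candidates.append(max(inside, key=lambda rc: epe[rc[0]][rc[1]]))
    return candidates
```

Restricting `allowed` shrinks each column of the travel path diagram, which is how the direction information reduces the number of contours to examine.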

【００７５】To select the optimal image object boundary, a starting location 190 on the current contour is selected. Such a location 190 corresponds to a given current edge point 176 and its respective candidate edge points (e.g., a subset of points 186-189). Thus, the number of points at each decision point on the travel path is fewer than M+1=5. An optimal path is derived from the potential points. Of the potential paths, the most optimal path is selected as the modeled object boundary. The process for deriving an optimal path is the same for each path to be derived.

【００７６】Referring to FIG. 11, consider a path that starts at edge point 176s. A segment of the path is constructed by advancing to one of the points in the adjacent column s+1. Thus, one option is to advance to point 176(s+1). Another option is to advance to candidate point 186(s+1). The other options may include 187(s+1), 188(s+1) and 189(s+1). In this case, option 189(s+1) does not conform to the directional gradient information and is not considered. Only one option is selected. The selection is made by determining for which of the points in column (s+1) the resulting path has the smallest difference in energy (e.g., the greatest energy savings). The selected point is saved together with the distance of that point in column s+1 from the current point. Consider an example in which point 186(s+1) is selected. The selected point is saved along with a distance value (e.g., in pixels) indicating how many pixels it lies from point 176(s+1).

【００７７】Similarly, to construct the next segment of the path, one of the potential points in column s+2 is selected. For each segment along the path, only one of the potential segments is saved, along with the distance of the selected point from the current point 176 in the same column.

【００７８】The same process is performed to derive a path that starts at point 186s. The first segment of this path is constructed by advancing to one of the points in the adjacent column s+1. One option is to advance to point 176(s+1). Another option is to advance to candidate point 186(s+1). The other options include 187(s+1) and 188(s+1). Only one option is selected. The selection is made by determining for which of the points in column (s+1) the resulting path has the largest difference in energy relative to the current contour 173. The selected point is saved together with its distance from the current point in column s+1. Paths starting at points 187s and 188s are constructed in the same manner. Point 189s is not considered here because it was not selected as a candidate point based on the directional gradient analysis. The resulting paths (starting at 176s, 186s, 187s and 188s, respectively) are compared to determine which is the optimal path (e.g., the one with the largest difference in energy relative to the current contour 173).
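The segment-by-segment path construction described above can be sketched as a greedy search: one path is built from each starting point by keeping a single point per column, and the completed paths are then compared. The text expresses the selection criterion as an energy difference relative to the current contour; the sketch replaces it with a hypothetical per-segment cost to be minimized (squared segment length minus the edge energy at the destination), and it does not close the contour back onto the starting point. All names are illustrative.

```python
def segment_cost(p, q, epe):
    """Hypothetical cost of stepping from p to q: squared length minus edge energy at q."""
    return (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2 - epe(q)


def build_path(start, columns, epe):
    """Greedily extend a path from `start`, keeping one point per following column."""
    path, cost = [start], 0.0
    for column in columns:
        best = min(column, key=lambda q: segment_cost(path[-1], q, epe))
        cost += segment_cost(path[-1], best, epe)
        path.append(best)
    return path, cost


def best_contour(columns, epe):
    """Build one path per starting point in the first column and return the cheapest.

    columns -- columns[k] lists the current edge point and surviving candidates
               at contour position k
    epe     -- callable (x, y) point -> edge potential energy
    """
    paths = [build_path(start, columns[1:], epe) for start in columns[0]]
    return min(paths, key=lambda path_cost: path_cost[1])
```

Because only one point is kept per column for each starting point, the work grows with the number of columns times the column size, instead of with the (M+1)^N contours of an exhaustive search.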

【００７９】Exemplary implementations of the object segmentation method include, but are not limited to, video encoding, video editing and computer vision. For example, the segmentation and modeling may be performed in the context of MPEG-4 video encoding and content-based video editing, in which video objects from different video clips are grouped together to form a new video sequence. As an example of a computer vision application, the segmentation and modeling may be performed with limited user assistance to track a target (e.g., in military or surveillance contexts). Once the target is locked with the user's assistance, the tracking and segmentation methods automatically provide information for following the target.

【００８０】Meritorious and Advantageous Effects
According to an advantage of this invention, an accurate boundary of an object is tracked for objects that deform or that have rapidly moving sub-portions. The ability to track a wide variety of object shapes and differing object deformation patterns is particularly beneficial for use with MPEG-4 image processing systems.

【００８１】Although a preferred embodiment of the invention has been illustrated and described, various alternatives, modifications and equivalents may be used. Therefore, the foregoing description should not be taken as limiting the scope of the invention, which is defined by the appended claims.

[Brief description of the drawings]

【図１】FIG. 1 is a block diagram of an interactive processing environment for tracking video objects through a sequence of video frames.

【図２】FIG. 2 is a block diagram of an exemplary host computing system for the interactive processing environment of FIG. 1.

【図３】FIG. 3 is a flow chart of a segmentation process for initially selecting and segmenting an object to be tracked.

【図４】FIG. 4 is a flow chart of an object tracking and segmentation method according to an embodiment of this invention.

【図５】FIG. 5 is a diagram of a quadrature modeling filter for decomposing an image to obtain detail images and a low-pass residue.

【図６】FIG. 6A is a diagram of a black-and-white sample image; FIGS. 6B and 6C are diagrams of the gradient magnitude image of the black-and-white image of FIG. 6A, shown with a manually defined contour and with an active contour derived without direction information, respectively.

【図７】FIGS. 7A and 7B are diagrams of the gradient magnitude image of the black-and-white image of FIG. 6A, shown with active contours obtained using one value of the direction information and another value of the direction information, respectively.

【図８】FIG. 8 is a flow chart of an active contour modeling process.

【図９】FIG. 9 is a diagram of a 5×5 pixel domain about a current edge point (pixel), used for selecting other candidate points that may be used in place of the current edge point.

【図１０】FIG. 10 is a diagram of potential edge points processed to preserve one optimal path for an image object boundary.

【図１１】FIG. 11 is a partial travel path diagram of the process by which a contour is derived from the set of points of FIG. 10.

[Explanation of symbols]

28: processor; 30: memory; 112: active contour.

Continued from the front page
(72) Inventor: Todd Sheplin, 4131 Eleventh Avenue NE, No. 205, Seattle, Washington 98105, United States
(72) Inventor: Shijun Sun, 16900 SE Twenty-Sixth Drive, No. 47, Vancouver, Washington 98683, United States
(72) Inventor: Yongmin Kim, 4431 NE One Hundred Eighty-Ninth Place, Seattle, Washington 98155, United States
F-term (reference): 5L096 FA05 GA01 HA01

Claims

[Claims]

1. A method for estimating a desired contour (112) of an image object tracked over a plurality of image frames, the method comprising the steps of: receiving a first estimate (104) of the desired contour for a current image frame, the current image frame comprising image data and the first estimate comprising a plurality of data points; convolving the image data with a gradient operator in a first manner to derive a gradient magnitude (102); convolving the image data with the gradient operator in a second manner to derive a gradient direction (λ) (199); identifying a plurality of candidate data points (186-189) for each of the plurality of data points (200); and re-estimating the desired contour (112) by deriving an energy-minimizing spline progressing from a first point (190), the first point being either (i) one (176) of the plurality of data points comprising the first estimate or (ii) one of the plurality of candidate points (186-189) identified for said one of the plurality of data points, wherein the energy-minimizing spline is based on internal constraint forces for the image object and external image forces about the image object, the external image forces being derived as a function of the gradient magnitude and the gradient direction.

2. The method of claim 1, wherein the step of convolving in the second manner comprises convolving the image data with the gradient operator in the second manner, based on a gradient direction derived from a preceding image frame, to derive the gradient direction (λ) for the current image frame.

3. The method of claim 1, wherein the steps of receiving, convolving in the first manner, convolving in the second manner, identifying, and re-estimating are performed for a sequence of image frames, wherein the first estimate of the desired contour for one image frame is the re-estimated desired contour from a preceding image frame, and wherein the gradient direction derived for the one image frame is derived based in part on the gradient direction derived for the preceding image frame.

4. The method of claim 1, wherein the step of convolving in the first manner comprises, for each selected one of a plurality of image points of the current image frame (100), convolving the selected image point with the gradient operator in the first manner to derive a gradient magnitude corresponding to the selected point, wherein the gradient operator is based on a kernel of image data in the vicinity of the selected point.

5. The method of claim 1, wherein the step of convolving in the second manner comprises, for each selected one of a plurality of image points of the current image frame (100), convolving the selected image point with the gradient operator in the second manner to derive a gradient direction corresponding to the selected point, wherein the gradient operator is based on a kernel of image data in the vicinity of the selected point and on a corresponding gradient operator derived for a preceding image frame.

6. An apparatus (10) for estimating a desired contour (112) of an image object tracked over a plurality of image frames, the apparatus comprising: a processor (28) that receives a first estimate (104) of the desired contour for a current image frame and re-estimates the desired contour; and a memory (30) that stores the re-estimated desired contour, wherein the current image frame (100) includes image data, the first estimate includes a plurality of data points, and the processor performs a plurality of tasks comprising: (a) convolving the image data with a gradient operator in a first manner to derive a gradient magnitude (102); (b) convolving the image data with the gradient operator in a second manner to derive a gradient direction (λ) (199); (c) identifying a plurality of candidate data points (186-189) for each of the plurality of data points; and (d) re-estimating the desired contour (202) by deriving an energy-minimizing spline advancing from a first point (190), wherein the first point is either (i) one of the plurality of data points comprising the first estimate or (ii) one of the plurality of candidate points (186-189) for said one of the plurality of data points comprising the first estimate, wherein the energy-minimizing spline is based on an internal constraint force on the image object and an external image force on the image object, and wherein the external image force is derived as a function of the gradient magnitude and the gradient direction.

7. An apparatus (10) for estimating a desired contour (112) of an image object tracked over a plurality of image frames, the apparatus comprising: a first processor (28) that receives a first estimate (104) of the desired contour for a current image frame, the current image frame including image data and the first estimate including a plurality of data points; a second processor (28) that convolves the image data with a gradient operator in a first manner to derive a gradient magnitude (102); a third processor (28) that convolves the image data with the gradient operator in a second manner to derive a gradient direction (λ); a fourth processor (28) that identifies a plurality of candidate data points (186-189) for each of the plurality of data points; and a fifth processor (28) that re-estimates the desired contour by deriving an energy-minimizing spline advancing from a first point (190), wherein the first point is either (i) one (176) of the plurality of data points comprising the first estimate or (ii) one of the plurality of candidate points (186-189) for said one of the plurality of data points comprising the first estimate, wherein the energy-minimizing spline is based on an internal constraint force on the image object and an external image force on the image object, and wherein the external image force is derived as a function of the gradient magnitude and the gradient direction.

8. The apparatus of claim 7, wherein the first processor, the second processor, the third processor, the fourth processor, and the fifth processor are the same processor (28).

9. The apparatus of claim 7, wherein the third processor convolves the image data with the gradient operator in the second manner, based on a gradient direction derived from a preceding image frame, to derive the gradient direction (λ) for the current image frame.

10. The apparatus of claim 7, wherein the first processor, the second processor, the third processor, the fourth processor, and the fifth processor operate on a sequence of image frames, wherein the first estimate of the desired contour for one image frame is the re-estimated desired contour from a preceding image frame, and wherein the gradient direction derived for the one image frame is derived based in part on the gradient direction derived for the preceding image frame.

11. The apparatus of claim 7, wherein, for each selected one of a plurality of image points of the current image frame, the second processor convolves the selected image point with the gradient operator in the first manner to derive a gradient magnitude (104) corresponding to the selected point, wherein the gradient operator is based on a kernel of image data in the vicinity of the selected point.

12. The apparatus of claim 7, wherein, for each selected one of a plurality of image points of the current image frame, the third processor convolves the selected image point with the gradient operator in the second manner to derive a gradient direction corresponding to the selected point, wherein the gradient operator is based on a kernel of image data in the vicinity of the selected point and on a corresponding gradient operator derived for a preceding image frame.
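As an informal illustration of the external image force recited in the claims, the sketch below derives a gradient magnitude and a direction sign from one gradient operator and combines them into an external energy. It rests on assumptions that the claims do not state: the operator is a 3 × 3 Sobel kernel, the direction λ is reduced to a sign (+1 or -1) of the gradient along the contour normal, and an edge whose sign disagrees with λ contributes no energy. The names `sobel_gradients`, `direction_sign` and `external_energy` are hypothetical.

```python
import numpy as np

def sobel_gradients(img):
    """Return (gx, gy) from 3x3 Sobel kernels; border pixels stay at zero.

    Written in correlation form, so the sign convention is an assumption:
    gx is positive where intensity increases with the column index and
    gy is positive where it increases with the row index.
    """
    img = np.asarray(img, dtype=float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[1:-1, 1:-1] = (
        (img[:-2, 2:] + 2 * img[1:-1, 2:] + img[2:, 2:])
        - (img[:-2, :-2] + 2 * img[1:-1, :-2] + img[2:, :-2])
    )
    gy[1:-1, 1:-1] = (
        (img[2:, :-2] + 2 * img[2:, 1:-1] + img[2:, 2:])
        - (img[:-2, :-2] + 2 * img[:-2, 1:-1] + img[:-2, 2:])
    )
    return gx, gy

def direction_sign(gx, gy, point, normal):
    """Sign (+1 or -1) of the gradient along the contour normal at `point`.

    `normal` is a (row, col) unit vector. Evaluated on a preceding frame,
    the result can serve as the expected direction for the current frame.
    """
    r, c = point
    along = gy[r, c] * normal[0] + gx[r, c] * normal[1]
    return 1 if along >= 0 else -1

def external_energy(gx, gy, point, normal, lam):
    """Assumed directional energy: -|gradient| when the sign matches `lam`."""
    r, c = point
    magnitude = float(np.hypot(gx[r, c], gy[r, c]))
    if direction_sign(gx, gy, point, normal) == lam and magnitude > 0:
        return -magnitude
    return 0.0
```

With this form, a strong edge of the opposite polarity (for example the far side of a thin bright structure) does not attract the contour, which is the kind of behavior that taking direction into account is meant to provide.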