JP2011517226A

JP2011517226A - System and method for enhancing the sharpness of an object in a digital picture

Info

Publication number: JP2011517226A
Application number: JP2011503987A
Authority: JP
Inventors: Bhagavathy, Sitaram; Llach, Joan; Yu, Huang
Original assignee: Thomson Licensing SAS
Current assignee: Thomson Licensing SAS
Priority date: 2008-04-11
Filing date: 2009-04-07
Publication date: 2011-05-26
Also published as: WO2009126258A9; EP2277142A1; WO2009126258A1; CA2720947A1; BRPI0911189A2; CN101999138A

Abstract

The visibility of an object in a digital picture is enhanced by comparing the input video of the digital picture with stored information representing the nature and characteristics of the object. This identifies the object and generates object positioning information indicating its position. The visibility of the object and of the region in which it is located is then enhanced by image processing, and the input video with this enhanced visibility is encoded.

Description

The present invention relates generally to the transmission of digital pictures. In particular, it relates to enhancing the visibility of objects of interest in digital pictures, especially digital pictures displayed on units that use low-resolution, low-bit-rate video coding.

There is growing demand for delivering video content to portable devices such as mobile phones and PDAs. Such devices have small screens, limited bandwidth, and limited decoder-side processing power, so the video is encoded at a low bit rate and low resolution.

One of the main problems with low-resolution, low-bit-rate encoding is that objects important to perceived video quality are degraded or lost. For example, a video clip of a soccer or tennis match is frustrating to watch when the ball cannot be seen clearly.

It is therefore desirable to highlight objects of interest so as to improve the subjective visual quality of low-resolution, low-bit-rate video.

In various implementations of the present invention, the visibility of an object of interest in a digital image is enhanced given the approximate position and size of the object in the image. Alternatively, the visibility of the object is enhanced after its approximate position and size have been refined.

Object enhancement provides at least two advantages:
(1) It makes the object easier to see and follow, thereby improving the user experience.
(2) It lets the object pass through the encoding (i.e., compression) stage with little degradation.

One of the main applications of the invention is video delivery to portable devices such as mobile phones and PDAs. Its features, concepts and implementations are also useful in a variety of other applications, situations and environments, including, for example, video over Internet Protocol (low-bit-rate standard-definition content).

The present invention highlights objects of interest in video to improve the subjective visual quality of low-resolution, low-bit-rate video. The system and method can handle objects with different characteristics. They can operate in three modes:
・fully automatic,
・semi-automatic (i.e., with manual assistance),
・fully manual.
Object enhancement may be performed at a pre-processing stage (i.e., before or during video encoding) or at a post-processing stage (i.e., after video decoding).

In accordance with the present invention, the visibility of an object in a digital picture is enhanced through the following steps:
(1) An input video containing the object is provided.
(2) Information representing the nature and characteristics of the object is stored.
(3) In response to the input video and the stored information, the object is identified and object positioning information indicating its position is generated.
(4) In response to the object positioning information, an enhanced video is generated from the input video. It covers the portion of the input video that contains the object and the region in which the object is located.
(5) The resulting enhanced video is encoded.

FIG. 1 is a block diagram of a preferred embodiment of a system, constructed in accordance with the present invention, for enhancing the visibility of an object in a digital video.
FIG. 2 represents the approximate object position provided by the system of FIG. 1.
FIGS. 3A to 3D represent the workflow of object enhancement according to the present invention.
FIG. 4 is a flowchart of an object boundary estimation algorithm. The algorithm is used to refine object identification information and object position information according to the present invention.
FIGS. 5A, 5B, 5C and 5D represent an implementation of the concept of level-set estimation of the boundary of an arbitrarily shaped object according to the present invention.
FIG. 6 is a flowchart of an object enlargement algorithm according to the present invention.
FIGS. 7A to 7C represent three possible subdivisions of a 16×16 macroblock. They are useful for explaining the refinement of object identification information and object position information during the encoding stage.

Referring to FIG. 1, an object enhancing system constructed in accordance with the present invention may span all of the components in transmitter 10. Alternatively, the object enhancement components may reside in receiver 20.

Object highlighting can take place at three stages of the processing chain:
(1) Pre-processing: the object is accentuated at transmitter 10 before the encoding (i.e., compression) stage.
(2) Encoding: the region of interest containing the object receives special treatment at transmitter 10, through refinement of the information about the object and its position.
(3) Post-processing: the object is accentuated at receiver 20 after decoding. This uses side information about the object and its position, sent from transmitter 10 as metadata in the bitstream.

An object enhancing system constructed in accordance with the present invention may be arranged to highlight objects in only one of these stages, in two of them, or in all three.

The system of FIG. 1 for enhancing the visibility of an object in a digital picture has means for providing an input video containing the object of interest. The source of the digital picture containing the object may be a television camera of conventional construction and operation. It is represented by arrow 12.

The system of FIG. 1 further has means for storing information representing the nature and characteristics of the object of interest (e.g., an object template). The same means generate object positioning information, in response to the input video and that stored information, which identifies the object and indicates its position.

These means are shown in FIG. 1 as object positioning module 14. Module 14 scans the input video frame by frame. It finds the object in the picture that has the same nature and characteristics as the stored information (i.e., what the object is) and indicates the object's position (i.e., where the object is).

Object positioning module 14 may be a unit of conventional construction and operation. It works as follows:
(1) It scans the digital pictures of the input video frame by frame.
(2) It compares sectors of each scanned picture with the stored information representing the nature and characteristics of the object of interest.
(3) When the information generated from scanning a particular sector matches the stored information, it identifies the object of interest and indicates its position by grid coordinates in the digital picture.
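The sector-by-sector comparison described above can be illustrated with a minimal Python/NumPy sketch. It assumes two things: the stored object information is a grayscale template, and sectors are compared with the sum of squared differences (SSD). Both choices, and the function name `locate_by_template`, are illustrative assumptions, not the method the text specifies for module 14.

```python
import numpy as np


def locate_by_template(frame: np.ndarray, template: np.ndarray) -> tuple[int, int]:
    """Return the (row, col) grid coordinates of the best-matching sector.

    Hypothetical helper: every template-sized window ("sector") of the frame is
    compared with the stored template using the sum of squared differences;
    the window with the lowest SSD is taken as the object's position.
    """
    frame = frame.astype(np.float64)
    template = template.astype(np.float64)
    # All sectors at once, shape (H - h + 1, W - w + 1, h, w).
    windows = np.lib.stride_tricks.sliding_window_view(frame, template.shape)
    ssd = ((windows - template) ** 2).sum(axis=(2, 3))
    row, col = np.unravel_index(np.argmin(ssd), ssd.shape)
    return int(row), int(col)
```

SSD keeps the sketch short. A real locator would more likely use a lighting-robust score such as normalized cross-correlation, and would threshold the score before declaring a match.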

In general, object positioning module 14 uses one or more of the following methods to identify and locate the object of interest.

・Object tracking: The goal of an object tracker is to locate a moving object in the video. Typically, the tracker estimates the object parameters (e.g., position, size) in the current frame, given the history of the moving object obtained from previous frames. Tracking approaches may be based on, for example, template matching, optical flow, Kalman filters, mean-shift analysis, hidden Markov models, and particle filters.

・Object detection: The goal of object detection is to detect the presence and position of an object in an image or video frame, based on prior knowledge about the object. Object detection methods generally combine a top-down approach and a bottom-up approach. In the top-down approach, detection is based on rules derived from human knowledge of the object to be detected. In the bottom-up approach, the object is associated with low-level structural features or patterns, and it is located by searching for those features or patterns.

・Object segmentation: In this approach, an image or video is decomposed into its constituent "objects". These "objects" may be semantic entities or visual structures such as color patches. The decomposition is generally based on the motion, color and texture attributes of the objects. Object segmentation has several applications, including compact video coding, automatic and semi-automatic content-based description, film post-production, and scene interpretation. In particular, segmentation simplifies object positioning by providing an object-based description of the scene.
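To illustrate one of the tracking approaches listed above, here is a minimal constant-velocity Kalman filter for an object's center, written in Python with NumPy. The state layout, the noise values and the class name `ConstantVelocityKalman` are assumptions chosen for illustration. The text does not say which tracker module 14 uses.

```python
import numpy as np


class ConstantVelocityKalman:
    """Minimal 2-D constant-velocity Kalman filter for an object's centre.

    State vector is [x, y, vx, vy]; only the position (x, y) is observed.
    One call to predict() corresponds to advancing one frame.
    """

    def __init__(self, x0, y0, process_var=1.0, meas_var=4.0):
        self.state = np.array([x0, y0, 0.0, 0.0], dtype=float)
        self.P = np.eye(4) * 100.0                 # large initial uncertainty
        self.F = np.array([[1.0, 0.0, 1.0, 0.0],   # x <- x + vx
                           [0.0, 1.0, 0.0, 1.0],   # y <- y + vy
                           [0.0, 0.0, 1.0, 0.0],
                           [0.0, 0.0, 0.0, 1.0]])
        self.H = np.array([[1.0, 0.0, 0.0, 0.0],   # only x and y are measured
                           [0.0, 1.0, 0.0, 0.0]])
        self.Q = np.eye(4) * process_var           # process noise covariance
        self.R = np.eye(2) * meas_var              # measurement noise covariance

    def predict(self):
        """Propagate the state one frame ahead; return the predicted (x, y)."""
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[:2].copy()

    def update(self, zx, zy):
        """Fuse a position measured in the current frame; return filtered (x, y)."""
        z = np.array([zx, zy], dtype=float)
        innovation = z - self.H @ self.state
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.state = self.state + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.state[:2].copy()
```

In each frame, `predict()` gives the expected position before the new frame is searched. `update()` then folds in the position found in that frame, for example by template matching restricted to a window around the prediction.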

FIG. 2 represents the approximate object position provided by object positioning module 14. For example, the user draws an ellipse around the region in which the object is located, which roughly indicates the object's position. This approximate object positioning information (i.e., the ellipse's center point, major-axis and minor-axis parameters) is eventually refined.

Ideally, object positioning module 14 operates in fully automatic mode. In practice, however, some manual assistance may be needed. It can correct errors made by the system or, at a minimum, define the important objects that the system should locate. Enhancing non-object regions can distract viewers and cause them to miss the actual action.

To avoid or minimize this problem, the user can draw an ellipse around the object as described above. The system can then track the object from the specified position. When the object is successfully located in a frame, object positioning module 14 outputs the corresponding ellipse parameters (i.e., center point, major axis and minor axis). Ideally, the contour of this bounding ellipse coincides with the contour of the object.
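The ellipse parameters output by module 14 define the region that later enhancement operates on. The following hypothetical helper converts them to a pixel mask. It assumes an axis-aligned ellipse, with `a` as the horizontal semi-axis and `b` as the vertical semi-axis, because the text does not mention an orientation parameter.

```python
import numpy as np


def ellipse_mask(shape, cx, cy, a, b):
    """Boolean mask of the pixels inside an axis-aligned ellipse (hypothetical helper).

    shape: (rows, cols) of the frame; (cx, cy): centre in (column, row) pixel
    coordinates; a, b: horizontal and vertical semi-axes in pixels.
    """
    rows, cols = np.indices(shape)
    return ((cols - cx) / a) ** 2 + ((rows - cy) / b) ** 2 <= 1.0
```

For example, `ellipse_mask(frame.shape, 50, 50, 30, 20)` marks roughly π·30·20 ≈ 1885 pixels around (50, 50).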

However, the parameters may be only approximate, so the resulting ellipse may not tightly enclose the object. If object enhancement is applied in that case, two problems can arise:
(1) If the ellipse does not contain the whole object, the object is not enhanced in its entirety.
(2) Non-object regions may be enhanced.
Because these outcomes are undesirable, it is useful in such circumstances to refine the object region before enhancement. Refinement of the object positioning information is discussed in more detail below.

The system of FIG. 1 further has means for generating an enhanced video of the portion of the digital picture that contains the object of interest and the region in which the object is located. It does so in response to the input video and the object positioning information received from object positioning module 14.

These means are shown in FIG. 1 as object enhancement module 16, which may be a unit of conventional construction and operation. Module 16 enhances the visibility of the region of the digital picture containing the object of interest by applying conventional image processing operations to that region. For each frame, the object positioning information received from module 14 includes the grid coordinates of a region of predetermined size in which the object of interest is located.

In addition, as noted above, object enhancement helps reduce degradation of the object during the encoding stage (described below) that follows the enhancement stage. The operation of the system of FIG. 1 described so far corresponds to the pre-processing mode of operation mentioned above.

During object enhancement, the object's visibility is improved by applying image processing operations in the region in which the object of interest is located. These operations may be applied in three places:
・along the object boundary (e.g., edge sharpening),
・inside the object (e.g., texture enhancement),
・possibly even outside the object (e.g., contrast enhancement, blurring outside the object region).

For example, one of the most effective ways to draw attention to an object is to sharpen the edges inside the object and along its contour. This makes the object's details clearer and makes the object stand out from the background. Furthermore, sharper edges tend to survive encoding better. Another possibility is to enlarge the object, for example by iteratively applying smoothing, sharpening and object refinement operations (not necessarily in the order listed here).
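Edge sharpening restricted to the object region can be sketched with unsharp masking: add back the difference between the image and a blurred copy. This is one conventional choice among many. The 3×3 box blur, the `amount` parameter and the function names are assumptions made for illustration, not the patent's specified filter.

```python
import numpy as np


def box_blur3(img):
    """3x3 mean filter; borders are handled by replicating edge pixels."""
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    return sum(p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0


def sharpen_in_region(img, mask, amount=1.0):
    """Unsharp masking applied only where the boolean `mask` is True."""
    img = np.asarray(img, dtype=np.float64)
    detail = img - box_blur3(img)                    # high-frequency (edge) component
    sharpened = np.clip(img + amount * detail, 0.0, 255.0)
    return np.where(mask, sharpened, img)            # pixels outside the mask are untouched
```

Applied to a step edge inside the mask, the dark side gets darker and the bright side brighter, which is the steepened edge the text describes.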

FIGS. 3A to 3D represent the workflow of the object enhancement process:
・FIG. 3A is a single frame of a soccer video in which the object in focus is a soccer ball.
・FIG. 3B shows the output of object positioning module 14, i.e., the object positioning information for the soccer ball in that frame.
・FIG. 3C represents the region refinement step, discussed in more detail below. In this step, the approximate object position information of FIG. 3B is refined into a more accurate estimate of the object boundary, i.e., the light-colored line surrounding the ball.
・FIG. 3D shows the result of applying object enhancement, in this example edge sharpening.

Note that the soccer ball in FIG. 3D is sharper and more visible than in the original frame of FIG. 3A. The object also has higher contrast. Increasing contrast generally means making dark colors darker and light colors lighter.
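The description of contrast as "making dark colors darker and light colors lighter" corresponds to a linear stretch of intensities about a pivot value. The sketch below assumes the pivot is the mean intensity inside the object region and uses a hypothetical fixed `gain` of 1.3. Neither value comes from the text.

```python
import numpy as np


def boost_contrast_in_region(img, mask, gain=1.3):
    """Stretch intensities about the region's mean, only inside `mask`.

    Pixels darker than the mean become darker, brighter ones become brighter;
    results are clipped to the 8-bit range [0, 255].
    """
    img = np.asarray(img, dtype=np.float64)
    mean = img[mask].mean()
    stretched = np.clip(mean + gain * (img - mean), 0.0, 255.0)
    return np.where(mask, stretched, img)
```

With region values 50 and 150 (mean 100) and gain 1.3, they become 35 and 165.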

Including object enhancement in the system of FIG. 1 provides significant advantages. It overcomes problems associated with imperfect tracking and distorting enhancement.

Imperfect tracking can make object positioning difficult. From frame to frame, the object position may be slightly off, and each frame may be off in a slightly different way. This can cause flicker, for example because fragments of the background are enhanced in some frames and/or different parts of the object are enhanced in different frames. Furthermore, common enhancement techniques may introduce distortion under certain conditions.

As noted above, the object positioning information may only approximate the nature and position of the object in each frame. In that case it may need to be refined before enhancement, to avoid enhancing features that lie outside the boundary of the region in which the object is located.

As described above, the generation of object positioning information by object positioning module 14, and its transmission to object enhancement module 16, may be fully automatic. As each frame of the input video is received by module 14, the module updates the object positioning information and sends the updated information to module 16.

The generation of object positioning information by object positioning module 14, and its transmission to object enhancement module 16, may also be semi-automatic. In this mode the information is not sent directly from module 14 to module 16. Instead, once the object positioning information is available, the user manually adds a marking (e.g., a boundary line) to the digital picture of the input video. The marking defines the region of predetermined size in which the object is located.

The generation of object positioning information and its transmission to object enhancement module 16 may also be fully manual. In this mode, the user views the digital picture of the input video and manually adds a marking (e.g., a boundary line) to it. The marking defines the region of predetermined size in which the object is located. As a practical matter, fully manual operation is not recommended for coverage of live events.

When necessary or desirable, the object positioning information is refined through object boundary estimation, in which the exact boundary of the object is estimated. Accurate boundary estimation helps enhance the object's visibility without side effects such as an unnatural appearance or unnatural motion of the object. The estimate is based on several criteria. Three approaches to object boundary estimation are disclosed.

・Ellipse-based approach: this approach searches over a range of ellipse parameters to determine or identify the ellipse that most tightly bounds the object.
・Level-set-based search: a level-set representation of the object's neighborhood is obtained. The search then looks for the level-set contour that best approximates the object boundary.
・Curve evolution methods, such as active contours or snakes: these shrink or expand a curve under certain constraints so that it converges to the object boundary.

Only the first and second approaches are discussed in more detail below.

In the ellipse-based approach, object boundary estimation is equivalent to determining the parameters of the ellipse that most tightly bounds the object. The approach searches over a range of ellipse parameters around the initial values (i.e., the output of object positioning module 14). For each candidate ellipse, it determines how tightly the ellipse bounds the object. The output of the algorithm shown in FIG. 4 is the tightest bounding ellipse.

The tightness of an ellipse is measured as the average gradient of the image intensity along the ellipse's edge. The rationale is as follows:
・the tightest bounding ellipse should closely follow the object contour;
・the image gradient is usually high along the object contour (i.e., the edge between object and background).

A flowchart of the object boundary estimation algorithm is shown in FIG. 4. The search ranges ($\Delta_x$, $\Delta_y$, $\Delta_a$, $\Delta_b$) used to refine the parameters are specified by the user.

The flowchart of FIG. 4 proceeds as follows:
(1) The average gradient is computed.
(2) The variables are initialized.
(3) Four nested loops are entered: over the horizontal center position, the vertical center position, and the two axes.
(4) If the ellipse described by the current center point and two axes yields a better (i.e., larger) average gradient, that gradient value and that ellipse are recorded as the best so far.
(5) Execution continues through all four loops and ends with the best ellipse.
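The tightness measure and the four nested loops of FIG. 4 can be sketched in Python/NumPy as shown below. The sketch makes four assumptions:
・the ellipse is axis-aligned;
・the gradient magnitude comes from `np.gradient`;
・the ellipse edge is sampled at 180 points with nearest-pixel lookup;
・the loops step through integer pixels.
All of these, and the function names, are illustrative assumptions rather than the flowchart's exact details.

```python
import numpy as np


def mean_gradient_on_ellipse(grad_mag, cx, cy, a, b, n_samples=180):
    """Tightness measure: average gradient magnitude along an axis-aligned ellipse.

    Nearest-pixel sampling; sample points falling outside the image are ignored.
    """
    t = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    cols = np.rint(cx + a * np.cos(t)).astype(int)
    rows = np.rint(cy + b * np.sin(t)).astype(int)
    h, w = grad_mag.shape
    ok = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
    if not ok.any():
        return 0.0
    return float(grad_mag[rows[ok], cols[ok]].mean())


def refine_ellipse(image, init, ranges):
    """Exhaustive search for the tightest bounding ellipse (cf. FIG. 4).

    init   = (cx, cy, a, b): integer output of the object positioning stage.
    ranges = (dx, dy, da, db): user-specified search half-widths.
    """
    gy, gx = np.gradient(image.astype(np.float64))   # derivatives along rows, cols
    grad_mag = np.hypot(gx, gy)
    cx0, cy0, a0, b0 = init
    dx, dy, da, db = ranges
    best, best_score = tuple(init), mean_gradient_on_ellipse(grad_mag, *init)
    for cx in range(cx0 - dx, cx0 + dx + 1):                     # horizontal centre
        for cy in range(cy0 - dy, cy0 + dy + 1):                 # vertical centre
            for a in range(max(1, a0 - da), a0 + da + 1):        # horizontal semi-axis
                for b in range(max(1, b0 - db), b0 + db + 1):    # vertical semi-axis
                    score = mean_gradient_on_ellipse(grad_mag, cx, cy, a, b)
                    if score > best_score:                       # larger = tighter
                        best_score, best = score, (cx, cy, a, b)
    return best, best_score
```

The search costs (2Δ+1)^4 evaluations, so the user-specified ranges need to stay small. A coarse-to-fine search would be a natural extension.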

The ellipse-based approach may be applied where the boundary between object and background has a uniformly high gradient. It may also be applied where the boundary does not have a uniformly high gradient. For example, the approach remains useful even when the object and/or the background vary in intensity along the object/background boundary.

In a typical implementation, the ellipse-based approach produces a description of the best-fitting ellipse. The description usually includes the center point, the major axis and the minor axis.

An ellipse-based representation may be unsuitable for describing objects of arbitrary shape. Even elliptical objects can appear irregularly shaped when motion blur or partial occlusion occurs. A level-set representation makes it easier to estimate the boundaries of arbitrarily shaped objects.

FIGS. 5A to 5D represent the concepts underlying the level-set approach to object boundary estimation. Suppose the intensity image $I(x, y)$ is a continuous intensity surface, as shown in FIG. 5B, rather than a grid of discrete intensities, as shown in FIG. 5A.

The level set at intensity value $i$ is the set of closed contours defined by $I_l(i) = \{(x, y) \mid I(x, y) = i\}$. A closed contour may be represented by a continuous curve or by a sequence of discrete pixels that follows the curve. The level-set representation of image $I$ is the set of level sets at different intensity values, i.e., $L_l(M) = \{I_l(i) \mid i \in M\}$. For example, $M = \{0, \ldots, 255\}$ or $M = \{50.5, 100.5, 200.5\}$.

Level sets can be extracted from an image in several ways. One method works in two steps:
(1) Bilinear interpolation is applied across each group of four neighboring pixels. This converts the discrete intensity grid into an intensity surface that is continuous in both space and intensity value.
(2) A level set, such as that shown in FIG. 5D, is extracted by intersecting this surface with one or more level planes (i.e., horizontal planes at a particular level), such as that shown in FIG. 5C.
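Intersecting the bilinearly interpolated surface with a level plane is essentially what the marching-squares algorithm does. The minimal Python sketch below treats each 2×2 group of pixels as a bilinear patch and finds where the patch edges cross the level by linear interpolation. Inside each cell it approximates the contour with straight segments, and it resolves ambiguous saddle cells using the patch's center value. The function name is hypothetical, and chaining the segments into closed contours $C_j$ is not shown.

```python
import numpy as np


def level_segments(img, level):
    """Line segments of the iso-contour I(x, y) = level (marching squares)."""
    img = np.asarray(img, dtype=np.float64)
    segs = []
    h, w = img.shape
    for y in range(h - 1):
        for x in range(w - 1):
            # Cell corners as (x, y): top-left, top-right, bottom-right, bottom-left.
            corners = [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)]
            vals = [img[cy, cx] for cx, cy in corners]
            pts = []
            for k in range(4):
                (xa, ya), (xb, yb) = corners[k], corners[(k + 1) % 4]
                va, vb = vals[k], vals[(k + 1) % 4]
                if (va < level) != (vb < level):      # this edge straddles the level
                    t = (level - va) / (vb - va)      # linear interpolation on the edge
                    pts.append((xa + t * (xb - xa), ya + t * (yb - ya)))
            if len(pts) == 2:
                segs.append((pts[0], pts[1]))
            elif len(pts) == 4:
                # Saddle cell: the bilinear patch's centre value decides the pairing.
                centre = sum(vals) / 4.0
                if (centre < level) == (vals[0] < level):
                    segs += [(pts[0], pts[1]), (pts[2], pts[3])]
                else:
                    segs += [(pts[0], pts[3]), (pts[1], pts[2])]
    return segs
```

Running it for each value in $M$ gives the raw material for the level-set representation $L_l(M)$.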

The level-set representation resembles a topographic map in many respects. A topographic map typically has closed contours for various elevation values.

In practice, image $I$ may be a sub-image containing the object whose boundary is to be estimated. A level-set representation $L_l(M)$ with $M = \{i_1, i_2, \ldots, i_N\}$ is extracted. The set $M$ may be constructed from the estimated intensities of the object pixels. Alternatively, it may simply span the entire intensity range in fixed steps (e.g., $M = \{0.5, 1.5, \ldots, 254.5, 255.5\}$).

All level-set curves (i.e., closed contours) $C_j$ contained in $L_l(M)$ are then considered. Object boundary estimation thus becomes the problem of finding the level-set curve $C^*$ that best satisfies a number of criteria related to the object. These criteria may involve, among others, the following variables:
・the average gradient along $C_j$,
・the area enclosed by $C_j$,
・the length of $C_j$,
・the position of the center of $C_j$,
・the mean and/or variance of the intensities of the pixels enclosed by $C_j$.
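Several of the variables listed above can be computed directly from a closed contour stored as a polygon of $(x, y)$ vertices:
・the area, via the shoelace formula;
・the length;
・the centroid;
・the average gradient.
The sketch below makes three assumptions: the contour has already been chained into an ordered vertex list, the gradient is sampled at the vertices with nearest-pixel lookup, and the name `contour_features` is hypothetical. The intensity mean and variance inside $C_j$ would additionally need a point-in-polygon mask, which is not shown.

```python
import numpy as np


def contour_features(poly, grad_mag=None):
    """Features of a closed contour given as an (n, 2) sequence of (x, y) vertices.

    Returns the enclosed area (shoelace formula), the perimeter length, the
    polygon centroid and, if a gradient-magnitude image is supplied, the mean
    gradient sampled at the vertices.
    """
    p = np.asarray(poly, dtype=np.float64)
    x, y = p[:, 0], p[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)          # next vertex, wrapping round
    cross = x * yn - xn * y
    signed_area = 0.5 * cross.sum()                  # sign depends on orientation
    length = float(np.hypot(xn - x, yn - y).sum())
    cx = ((x + xn) * cross).sum() / (6.0 * signed_area)
    cy = ((y + yn) * cross).sum() / (6.0 * signed_area)
    feats = {"area": abs(signed_area), "length": length, "centre": (cx, cy)}
    if grad_mag is not None:
        cols = np.clip(np.rint(x).astype(int), 0, grad_mag.shape[1] - 1)
        rows = np.clip(np.rint(y).astype(int), 0, grad_mag.shape[0] - 1)
        feats["mean_gradient"] = float(grad_mag[rows, cols].mean())
    return feats
```

Dividing by the signed area makes the centroid formula work for both clockwise and counter-clockwise vertex orders.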

The criteria may impose constraints on these variables based on prior knowledge about the object. A specific implementation of object boundary estimation using level sets is described below.

Let $m_{ref}$, $s_{ref}$, $a_{ref}$ and the vector $\mathbf{x}_{ref} = (x_{ref}, y_{ref})$ be reference values for the mean intensity, the standard deviation of intensity, the area and the center of the object, respectively. These may be initialized from prior knowledge about the object (e.g., the object parameters supplied by the object positioning module 14, which are obtained from an ellipse). The set of levels is then constructed as $M = \{i_{min}, i_{min} + \Delta_l, i_{min} + 2\Delta_l, \ldots, i_{max}\}$, where:

[Equation defining $i_{min}$, $i_{max}$ and $\Delta_l$ not reproduced in the source text.]

Here, $N$ is a preset value (e.g., 10), and $\lfloor \cdot \rfloor$ denotes the integer flooring operation.
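Because the equation for $i_{min}$, $i_{max}$ and $\Delta_l$ is not reproduced here, the following minimal Python sketch takes all three as inputs and only shows how the set $M$ is enumerated; the name `build_levels` is hypothetical.

```python
import math
from typing import List


def build_levels(i_min: float, i_max: float, delta: float) -> List[float]:
    """M = {i_min, i_min + delta, i_min + 2*delta, ...}, stopping at or below i_max."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    # A small epsilon guards against (i_max - i_min) / delta landing just below an integer.
    count = math.floor((i_max - i_min) / delta + 1e-9) + 1
    # Each level is computed as i_min + k * delta to avoid accumulating rounding error.
    return [i_min + k * delta for k in range(count)]
```

For example, `build_levels(0.5, 255.5, 1.0)` reproduces the fixed-step set $\{0.5, 1.5, \ldots, 255.5\}$ mentioned earlier.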

For a particular level set curve $C_j$, let $m_j$, $s_j$, $a_j$ and the vector $\mathbf{x}_j = (x_j, y_j)$ be the measured mean intensity, standard deviation of intensity, area and center of the image region enclosed by $C_j$, respectively. The average gradient $G_{avg}(C_j)$ along $C_j$ is also computed; in other words, $G_{avg}(C_j)$ is the mean of the gradient magnitudes at the pixels on $C_j$. A score $S(C_j)$ is then computed for each $C_j$ as follows:

[Score equation not reproduced in the source text.]

Here, $S_a$ and $S_x$ are similarity functions whose output values lie in the range [0, 1], where a larger value indicates a better match between the reference value and the measured value. For example:

[Example similarity functions not reproduced in the source text.]

The object boundary $C^*$ is then estimated as the curve that maximizes this score, i.e., $C^* = \arg\max_{C_j} S(C_j)$.
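The score equation and the example similarity functions are not reproduced in this text, so the Python sketch below makes an explicit assumption: it scores each curve as the product $G_{avg}(C_j) \cdot S_a \cdot S_x$, using Gaussian-shaped similarities that equal 1 on a perfect match and decay toward 0. The functional forms, the `scale` parameter and all names are hypothetical; the sketch only illustrates selecting $C^*$ by argmax over candidate curves.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CurveStats:
    g_avg: float                 # G_avg(C_j), mean gradient along the curve
    area: float                  # a_j
    center: Tuple[float, float]  # x_j = (x_j, y_j)


def s_area(a: float, a_ref: float) -> float:
    """Hypothetical area similarity in [0, 1]; equals 1 when a == a_ref."""
    return math.exp(-((a - a_ref) / a_ref) ** 2)


def s_center(x: Tuple[float, float], x_ref: Tuple[float, float], scale: float) -> float:
    """Hypothetical center similarity in [0, 1]; decays with squared distance."""
    d2 = (x[0] - x_ref[0]) ** 2 + (x[1] - x_ref[1]) ** 2
    return math.exp(-d2 / (2.0 * scale ** 2))


def score(c: CurveStats, a_ref: float, x_ref: Tuple[float, float], scale: float) -> float:
    """Assumed score: favors strong edges whose area and center match the references."""
    return c.g_avg * s_area(c.area, a_ref) * s_center(c.center, x_ref, scale)


def best_curve(curves: List[CurveStats], a_ref: float,
               x_ref: Tuple[float, float], scale: float = 5.0) -> int:
    """Index j of C* = argmax_j S(C_j)."""
    return max(range(len(curves)), key=lambda j: score(curves[j], a_ref, x_ref, scale))
```

Multiplying the terms means a curve must do reasonably well on every criterion at once; a weighted sum would be an equally plausible design under different assumptions.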

After the object boundary has been estimated, the reference values $m_{ref}$, $s_{ref}$, $a_{ref}$ and the vector $\mathbf{x}_{ref}$ may be updated with a learning factor $\alpha \in [0, 1]$, e.g., $m_{ref}^{new} = \alpha m_j + (1 - \alpha) m_{ref}$, where $m_j$ is the value measured for the selected curve $C^*$. For a video sequence, $\alpha$ may be a function of time $t$ (e.g., the frame index) that starts at a high value, decreases from frame to frame, and eventually saturates at a fixed low value $\alpha_{min}$.
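The update rule above is a per-variable exponential moving average. A minimal Python sketch follows; the exponential-decay schedule for $\alpha(t)$ and its constants (`alpha0`, `decay`, `alpha_min`) are assumptions chosen only to reproduce the described behavior of starting high, falling each frame and saturating at $\alpha_{min}$.

```python
from typing import Dict


def alpha_schedule(t: int, alpha0: float = 0.9, decay: float = 0.8, alpha_min: float = 0.1) -> float:
    """Hypothetical alpha(t): starts at alpha0, shrinks every frame, saturates at alpha_min."""
    return max(alpha_min, alpha0 * decay ** t)


def update_reference(ref: Dict[str, float], measured: Dict[str, float], alpha: float) -> Dict[str, float]:
    """ref_new = alpha * measured + (1 - alpha) * ref for each reference value."""
    return {key: alpha * measured[key] + (1.0 - alpha) * ref[key] for key in ref}
```

The center vector can be handled by storing its components as separate keys, e.g. `{"m": ..., "s": ..., "a": ..., "x": ..., "y": ...}`. A large early $\alpha$ lets the references adapt quickly to the first frames, while the floor $\alpha_{min}$ keeps them responsive to slow changes later.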

In object enhancement, the visibility of the object is improved by applying image processing operations in the vicinity of the object. Such operations may be applied along the object boundary (e.g., edge sharpening), inside the object (e.g., texture enhancement), and possibly even outside the object (e.g., contrast enhancement). In the implementations described here, several methods of object enhancement are proposed. The first sharpens the edges inside the object along its contour. The second enlarges the object by iteratively applying smoothing, sharpening and boundary estimation operations (not necessarily in the order listed). Other possibilities include the use of morphological filters and object replacement.

One way to draw more attention to the object is to sharpen the edges inside the object along its contour. This makes the details of the object more visible and makes the object stand out from the background. Furthermore, sharper edges tend to survive compression better. The algorithm for object enhancement by sharpening operates on the object one frame at a time, taking as input the intensity image $I(x, y)$ and the object parameters (i.e., position, size, etc.) provided by the object positioning module 14. The algorithm has the following three steps:

- Estimate the boundary $O$ of the object.
- Apply a sharpening filter $F_\alpha$ to all pixels of image $I$ inside and on the object boundary. This gives a new sharpened value $I_{sharp}(x, y) = (I * F_\alpha)(x, y)$ for every pixel contained in $O$, where $I * F_\alpha$ denotes the convolution of image $I$ with the sharpening filter $F_\alpha$.
- Replace the pixel $I(x, y)$ with $I_{sharp}(x, y)$ for all $(x, y)$ inside and on $O$.
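Steps 2 and 3 can be sketched in Python as follows, assuming the region $O$ has already been estimated and is given as a set of `(x, y)` pixel coordinates inside and on the boundary. Filtering reads only the original image, so already-replaced pixels do not feed back into their neighbors. The helper names and the edge-replication border rule are assumptions, and the kernel is passed in by the caller.

```python
from typing import List, Set, Tuple

Image = List[List[float]]
Kernel = List[List[float]]


def filter_at(img: Image, kernel: Kernel, x: int, y: int) -> float:
    """3x3 filter response at (x, y), replicating border pixels.

    This is a correlation; for point-symmetric kernels such as F_alpha it equals
    the convolution (I * F)(x, y).
    """
    h, w = len(img), len(img[0])
    acc = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy = min(max(y + dy, 0), h - 1)
            xx = min(max(x + dx, 0), w - 1)
            acc += kernel[dy + 1][dx + 1] * img[yy][xx]
    return acc


def sharpen_inside(img: Image, kernel: Kernel, region: Set[Tuple[int, int]]) -> Image:
    """Replace I(x, y) by I_sharp(x, y) for every (x, y) in region O; other pixels are untouched."""
    out = [row[:] for row in img]
    for x, y in region:
        out[y][x] = filter_at(img, kernel, x, y)  # always read from the original image
    return out
```

Restricting the filter to $O$ is what distinguishes this from global sharpening: the background keeps its original appearance, which helps the object stand out.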

The sharpening filter $F_\alpha$ is defined as the difference between the Kronecker delta function and the discrete Laplacian operator $\nabla_\alpha^2$:

$F_\alpha(x, y) = \delta(x, y) - \nabla_\alpha^2(x, y)$.

The parameter $\alpha \in [0, 1]$ controls the shape of the Laplacian operator. In practice, a 3×3 filter kernel is constructed with its center at the origin (0, 0). An example of such a kernel is shown below:

[Kernel matrix not reproduced in the source text.]
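The example kernel is not reproduced in this text. The Python sketch below assumes one common α-parameterized 3×3 Laplacian, the form used by MATLAB's `fspecial('laplacian', alpha)`; whether the patent uses exactly this form is an assumption. With it, $F_\alpha = \delta - \nabla_\alpha^2$ has coefficients summing to 1, so flat regions pass through unchanged.

```python
from typing import List

Kernel = List[List[float]]


def laplacian_alpha(alpha: float) -> Kernel:
    """Assumed alpha-parameterized 3x3 Laplacian (MATLAB fspecial('laplacian', alpha) form)."""
    corner, edge = alpha / 4.0, (1.0 - alpha) / 4.0
    s = 4.0 / (alpha + 1.0)
    return [[s * corner, s * edge, s * corner],
            [s * edge, -s, s * edge],
            [s * corner, s * edge, s * corner]]


def sharpening_kernel(alpha: float) -> Kernel:
    """F_alpha = delta - Laplacian_alpha, with the kernel center at the origin (0, 0)."""
    lap = laplacian_alpha(alpha)
    return [[(1.0 if (i, j) == (1, 1) else 0.0) - lap[i][j] for j in range(3)]
            for i in range(3)]
```

For α = 0 this yields the familiar kernel with 5 at the center, -1 at the four edge neighbors and 0 at the corners; larger α shifts weight toward the diagonal neighbors.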

Object enhancement by enlargement attempts to enlarge the contour of the object by iteratively applying smoothing, sharpening and boundary estimation operations (not necessarily in the order listed). A flowchart of a specific embodiment of the object enlargement algorithm is shown in FIG. 6. The algorithm takes as input the intensity image $I(x, y)$ and the object parameters provided by the object positioning module 14. First, a region (sub-image $J$) containing the object with a sufficient margin around it is isolated and smoothed with a Gaussian filter. This operation spreads the object boundary outward by a few pixels. The sharpening operation described above is then applied to make the edges more distinct. Using the current estimate of the object boundary and the smoothed and sharpened sub-image $J_{smoothsharp}$, the boundary estimation algorithm is applied to obtain a new estimate of the object boundary $O$. Finally, every pixel of the image contained in $O$ is replaced by the corresponding pixel of $J_{smoothsharp}$.
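The flow of FIG. 6 can be sketched as a single pass in Python, with smoothing, sharpening and boundary estimation supplied as callables so that any of the filters in this section can be plugged in. The bounding-box convention `(x0, y0, x1, y1)` for the sub-image $J$ and the signature of `estimate_boundary` are hypothetical.

```python
from typing import Callable, List, Set, Tuple

Image = List[List[float]]
Region = Set[Tuple[int, int]]  # (x, y) pixel coordinates


def enlarge_object(
    img: Image,
    box: Tuple[int, int, int, int],
    current_region: Region,
    smooth: Callable[[Image], Image],
    sharpen: Callable[[Image], Image],
    estimate_boundary: Callable[[Image, Region], Region],
) -> Tuple[Image, Region]:
    """One smoothing -> sharpening -> boundary-estimation pass over sub-image J."""
    x0, y0, x1, y1 = box  # J = rows y0..y1-1, columns x0..x1-1 (object plus a margin)
    sub = [row[x0:x1] for row in img[y0:y1]]
    j_smoothsharp = sharpen(smooth(sub))
    # Re-estimate O on J_smoothsharp, starting from the current estimate in sub-image coordinates.
    local = {(x - x0, y - y0) for (x, y) in current_region}
    new_local = estimate_boundary(j_smoothsharp, local)
    out = [row[:] for row in img]
    for x, y in new_local:
        out[y0 + y][x0 + x] = j_smoothsharp[y][x]  # replace pixels in O with J_smoothsharp
    return out, {(x + x0, y + y0) for (x, y) in new_local}
```

Returning the new region in full-image coordinates allows the pass to be repeated, which matches the iterative character of the method.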

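The smoothing filter $G_\sigma$ is a two-dimensional Gaussian function:

$G_\sigma(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$

The parameter $\sigma > 0$ controls the shape of the Gaussian function; the larger its value, the stronger the smoothing. In practice, a 3×3 filter kernel is constructed with its center at the origin (0, 0). An example of such a kernel is shown below:

[Kernel matrix not reproduced in the source text.]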
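The example kernel is likewise not reproduced. A sampled and normalized Gaussian kernel can be built as in the Python sketch below; dividing by the sum of the samples (instead of using the analytic factor $1/(2\pi\sigma^2)$) keeps the mean brightness unchanged on a small grid. The `radius` parameter is an assumption; `radius=1` gives the 3×3 kernel described above.

```python
import math
from typing import List

Kernel = List[List[float]]


def gaussian_kernel(sigma: float, radius: int = 1) -> Kernel:
    """Sampled 2-D Gaussian G_sigma on a (2r+1)x(2r+1) grid centered at (0, 0), normalized to sum to 1."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    samples = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
                for x in range(-radius, radius + 1)]
               for y in range(-radius, radius + 1)]
    total = sum(map(sum, samples))
    return [[v / total for v in row] for row in samples]
```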

The system of FIG. 1 further includes means for encoding the enhanced video output from the object enhancement module 16. This means is shown in FIG. 1 as an object-aware encoder module 18, and may be a module of conventional construction and operation that compresses the enhanced video with minimal degradation of important objects, for example by giving special treatment to the region of interest containing the object of interest (such as allocating additional bits to that region), or by making mode decisions that better preserve the object. In this way, the object-aware encoder module 18 exploits the enhanced visibility of the object to encode it with high fidelity.
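One way to realize the "more bits for the region of interest" idea is a per-macroblock quantization offset. The Python sketch below is hypothetical (the offset of 4 and the function name are assumptions): it lowers the quantization parameter (QP), giving finer quantization and therefore more bits, for every 16×16 macroblock that contains at least one object pixel, and clamps the result to 0-51, the QP range of 8-bit H.264/AVC.

```python
from typing import List


def mb_qp_map(mask: List[List[int]], base_qp: int, roi_delta: int = 4, mb: int = 16) -> List[List[int]]:
    """Hypothetical per-macroblock QP map: MBs touching the object get base_qp - roi_delta."""
    h, w = len(mask), len(mask[0])
    rows, cols = (h + mb - 1) // mb, (w + mb - 1) // mb  # round up for partial MBs
    qp = [[base_qp] * cols for _ in range(rows)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                qp[y // mb][x // mb] = max(0, min(51, base_qp - roi_delta))
    return qp
```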

To optimize the enhancement of the input video, the object-aware encoder module 18 receives the object positioning information from the object positioning module 14 and uses it to better preserve the enhancement of the region in which the object is located, and therefore of the object itself. Whether or not the enhancement is preserved, the region in which the object is located is preserved better than it would be without encoding by the object-aware encoder 18. In addition, the enhancement itself helps minimize degradation of the object during compression. This optimized enhancement is achieved by appropriately managing encoding decisions and the allocation of resources (e.g., bits).

The object-aware encoder 18 may be arranged to make "object-friendly" macroblock (MB) mode decisions, i.e., MB mode decisions that cause little degradation of the object. Such an arrangement may include object-friendly partitioning of MBs for prediction, as shown for example in FIGS. 7A-7C. Another approach is to force finer quantization, i.e., more bits, on the MBs that contain the object, so that the object receives more bits. Yet another approach targets the additional bits at the object itself. Still another approach uses a weighted distortion metric during the rate-distortion optimization process, in which pixels belonging to the region of interest are given higher weights than pixels outside it.
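The weighted-distortion approach can be sketched with the usual Lagrangian cost $J = D_w + \lambda R$. In the Python example below, the weights (4 inside the region of interest, 1 outside), the candidate-tuple format and the function names are assumptions; a real encoder would obtain the rate $R$ from its entropy coder.

```python
from typing import List, Sequence, Tuple

Block = List[List[float]]


def weighted_ssd(orig: Block, recon: Block, roi: Block, w_in: float = 4.0, w_out: float = 1.0) -> float:
    """Sum of squared errors, with region-of-interest pixels weighted more heavily."""
    return sum(
        (w_in if roi[y][x] else w_out) * (orig[y][x] - recon[y][x]) ** 2
        for y in range(len(orig))
        for x in range(len(orig[0]))
    )


def choose_mode(orig: Block, roi: Block, candidates: Sequence[Tuple[str, Block, float]], lam: float) -> str:
    """Return the candidate name minimizing J = D_w + lam * R; candidates are (name, recon, bits)."""
    best = min(candidates, key=lambda c: weighted_ssd(orig, c[1], roi) + lam * c[2])
    return best[0]
```

With unweighted distortion, two modes that make the same-sized error at different positions tie; the weighting breaks the tie in favor of the mode that leaves the object intact.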

FIGS. 7A-7C show three possible partitions of a 16×16 macroblock. Such partitioning is part of the mode decision the encoder makes to determine how to encode the MB. One key observation is that when the object occupies a high percentage of a partition's area, the object suffers little degradation during encoding: degrading the object would then degrade the quality of a large part of the partition, so the encoder's distortion minimization protects it. Thus, in FIG. 7C the object makes up only a small portion of each 16×8 partition, and this is not considered a good partitioning. The object-aware encoder in various implementations knows where the object is located and incorporates this position information into its mode decisions. Such an object-aware encoder favors partitions in which the object occupies a larger portion of the partition. Overall, the goal of the object-aware encoder 18 is to keep degradation of the object as small as possible during the encoding process.
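The preference described for FIGS. 7A-7C can be expressed as a simple heuristic: for each candidate partitioning of a 16×16 macroblock, compute the fraction of each partition covered by the object, and favor the partitioning whose best-covered partition is most filled. The Python sketch below is a hypothetical illustration of that idea only; an actual mode decision would also weigh rate and distortion.

```python
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) within the 16x16 macroblock

PARTITIONS: Dict[str, List[Rect]] = {
    "16x16": [(0, 0, 16, 16)],
    "16x8": [(0, 0, 16, 8), (0, 8, 16, 8)],
    "8x16": [(0, 0, 8, 16), (8, 0, 8, 16)],
}


def coverage(mask: List[List[int]], x: int, y: int, w: int, h: int) -> float:
    """Fraction of the rectangle's pixels that belong to the object (mask value 1)."""
    return sum(mask[yy][xx] for yy in range(y, y + h) for xx in range(x, x + w)) / (w * h)


def object_friendly_partition(mask: List[List[int]]) -> str:
    """Hypothetical heuristic: pick the partitioning whose best partition is most filled by the object."""
    return max(PARTITIONS, key=lambda name: max(coverage(mask, *r) for r in PARTITIONS[name]))
```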

As shown in FIG. 1, the object positioning module 14, the object enhancement module 16 and the object-aware encoder module 18 are components of a transmitter 10 that receives input video of digital pictures containing an object of interest and transmits a compressed video stream in which the visibility of the object is enhanced. The transmitted compressed video stream is received by a receiver 20, such as a mobile phone or a PDA.

Accordingly, the system of FIG. 1 further includes means for decoding the enhanced video in the compressed video stream received by the receiver 20. This means is shown in FIG. 1 as a decoder module 22, and may be a module of conventional construction and operation that restores the enhanced video with minimal degradation of important objects, for example by giving special treatment to the region of interest containing the object of interest (such as allocating additional bits to that region), or by making mode decisions that better preserve the enhanced visibility of the object.

Temporarily ignoring the object-aware post-processing module 24, shown with dotted lines in FIG. 1, the decoded video output from the decoder module 22 is sent to a display component 26, such as the screen of a mobile phone or PDA, so that the digital pictures can be viewed with enhanced object visibility.

The mode of operation of the system of FIG. 1 described above is regarded as preprocessing, in that the object is enhanced by the object enhancement module 16 before the encoding operation: the sequence is modified before it is compressed.

Instead of enhancing the visibility of the object before encoding as described above, the input video may be sent directly to the object-aware encoder module 18, as represented by the dotted line 19, and encoded without object enhancement, with the enhancement performed instead by the object-aware post-processing module 24 at the receiver 20. This mode of operation of the system of FIG. 1 is regarded as post-processing, in that the visibility of the object is enhanced after the encoding and decoding stages; it may be carried out using side information about the object (e.g., its position and size) transmitted in the bitstream as metadata. Post-processing mode has the disadvantage of increasing receiver complexity. In post-processing mode, because the visibility of the object is enhanced at the receiver, the object-aware encoder 18 at the transmitter 10 uses only the object position information.
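Side information of this kind can be serialized compactly. The Python sketch below assumes a hypothetical fixed binary record (a 32-bit frame index followed by 16-bit x, y, width and height, little-endian); it is not a standardized bitstream syntax, and the names are illustrative only.

```python
import struct
from dataclasses import dataclass

# Hypothetical record layout: uint32 frame index, then four uint16 fields, little-endian (12 bytes).
_RECORD = struct.Struct("<I4H")


@dataclass(frozen=True)
class ObjectSideInfo:
    frame: int
    x: int
    y: int
    width: int
    height: int


def pack_side_info(info: ObjectSideInfo) -> bytes:
    """Serialize one object's position and size for transmission as metadata."""
    return _RECORD.pack(info.frame, info.x, info.y, info.width, info.height)


def unpack_side_info(data: bytes) -> ObjectSideInfo:
    """Parse a record produced by pack_side_info (e.g., at the post-processing module)."""
    return ObjectSideInfo(*_RECORD.unpack(data))
```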

As mentioned above, one advantage of an object highlighting system at the transmitter end (i.e., preprocessing mode) is that it avoids increasing the complexity of the receiver end, which is typically a low-power device. In addition, preprocessing mode allows a standard video decoder to be used, which eases deployment of the system.

The described implementations may be implemented, for example, as a method or process, an apparatus, or a software program. Even if discussed only in the context of a single form of implementation (e.g., only as a method), the implementations or features discussed may also be implemented in other forms (e.g., an apparatus or a program). An apparatus may be implemented, for example, in appropriate hardware, software and firmware. A method may be implemented, for example, in an apparatus such as a computer or other processing device. Further, a method may be implemented by instructions executed by a processing device or other apparatus, and such instructions may be stored on a computer-readable medium such as a CD, another computer-readable storage device, or an integrated circuit.

As will be apparent to those skilled in the art, implementations may also produce a signal formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry various types of object information (i.e., position, shape) as data, and/or to carry encoded image data as data.

Although the invention is illustrated and described herein with reference to specific embodiments, the invention is not intended to be limited to the details shown. Rather, various modifications may be made within the scope defined by the claims without departing from the invention.

[Cross-reference to related applications]
This application claims priority to U.S. Provisional Application No. 61/123844, entitled "PROCESSING IMAGES HAVING OBJECTS" (Attorney Docket No. PU080054), filed April 11, 2008, which is incorporated herein by reference in its entirety.

Claims

1. A system for enhancing the visibility of an object in a digital picture, comprising:
means for providing an input video including the object;
means for (a) storing information representing the nature and characteristics of the object and (b) generating, in accordance with the input video and the information representing the nature and characteristics of the object, object positioning information that identifies the object and indicates its position;
means for generating, in accordance with the input video and the object positioning information, an enhanced video of the portion of the input video that includes the object and the region of the digital picture in which the object is located; and
means for encoding the enhanced video.

2. The system according to claim 1, further comprising:
(a) means for transmitting the encoded enhanced video;
(b) means for decoding the encoded enhanced video; and
(c) means for displaying the enhanced video.

3. The system of claim 1, wherein the means for generating the object positioning information comprises:
(a) means for scanning sectors of the input video; and
(b) means for comparing the scanned sectors of the input video with the stored information representing the nature and characteristics of the object, identifying an object in the digital picture that has the same nature and characteristics as the stored information, and determining the position of that object.

4. The system according to claim 3, wherein:
(a) the object positioning information only approximates the identity and position of the object; and
(b) the means for encoding the enhanced video has means for (1) receiving the object positioning information and (2) refining the object positioning information.

5. The system of claim 4, wherein the means for refining the object positioning information has means for (a) estimating the boundary of the object and (b) enhancing the object.

6. The system according to claim 3, wherein:
(a) the object positioning information only approximates the identity and position of the object; and
(b) the means for generating an enhanced video of the portion of the input video that includes the object and the region of the digital picture in which the object is located has means for refining the object positioning information.

7. The system of claim 6, wherein the means for refining the object positioning information has means for (a) estimating the boundary of the object and (b) enhancing the object.

8. A method for enhancing the visibility of an object in a digital picture, comprising the steps of:
providing an input video including the object;
storing information representing the nature and characteristics of the object;
generating, in accordance with the input video and the information representing the nature and characteristics of the object, object positioning information that identifies the object and indicates its position;
generating, in accordance with the input video and the object positioning information, an enhanced video of the portion of the input video that includes the object and the region of the digital picture in which the object is located;
encoding the enhanced video; and
transmitting the encoded enhanced video.

9. The method of claim 8, further comprising the steps of:
(a) receiving the encoded enhanced video;
(b) decoding the encoded enhanced video; and
(c) displaying the enhanced video.

10. The method, wherein the step of generating the object positioning information comprises:
(a) scanning sectors of the input video; and
(b) comparing the scanned sectors of the input video with the stored information representing the nature and characteristics of the object, identifying an object in the digital picture that has the same nature and characteristics as the stored information, and determining the position of that object.

11. The method according to claim 10, wherein:
(a) the object positioning information only approximates the identity and position of the object; and
(b) the step of encoding the enhanced video comprises (1) receiving the object positioning information and (2) refining the object positioning information.

12. The method of claim 11, wherein the step of refining the object positioning information comprises:
(a) estimating the boundary of the object; and
(b) enhancing the object.

13. The method of claim 10, wherein:
(a) the object positioning information only approximates the identity and position of the object; and
(b) the step of generating an enhanced video of the portion of the input video that includes the object and the region of the digital picture in which the object is located comprises refining the object positioning information.

14. The method of claim 13, wherein the step of refining the object positioning information comprises:
(a) estimating the boundary of the object; and
(b) enhancing the object.

15. A system for enhancing the visibility of an object in a digital picture, comprising:
means for providing an input video including the object;
means for (a) storing information representing the nature and characteristics of the object and (b) generating, in accordance with the input video and the information representing the nature and characteristics of the object, object positioning information that identifies the object and indicates its position; and
means for encoding the input video in accordance with the input video and the object positioning information.