JP4998787B2

JP4998787B2 - Object region extraction processing program, object region extraction device, and object region extraction method

Info

Publication number: JP4998787B2
Application number: JP2007249606A
Authority: JP
Inventors: 理紀夫尾内; 貴宏林; 一帆柴田; 正弥森; 孝真竹中
Original assignee: Rakuten Inc
Current assignee: Rakuten Group Inc
Priority date: 2007-09-26
Filing date: 2007-09-26
Publication date: 2012-08-15
Anticipated expiration: 2027-09-26
Also published as: JP2009080660A

Description

The present invention relates to the technical field of systems, methods, and the like for tracking an object region in a moving image.

Techniques for extracting an object region from a moving image have long been actively studied as elemental technologies for processing and editing video. In such techniques, for example, the target object region is extracted in the first frame image of the moving image (or in several key frame images), and the object region is then tracked in the other frame images on that basis, so that the object region is extracted in every frame image of the video. As techniques for extracting an object region from a frame image (still image), the techniques disclosed in Non-Patent Documents 1 to 3, for example, are known.

Previous research on object region tracking, as disclosed for example in Non-Patent Documents 4 and 5 and Patent Document 1, has aimed at improving the accuracy of automatic tracking. For example, the technique disclosed in Patent Document 1 characterizes the tracking target (object region) so that it is not affected by non-tracking targets in the moving image, which is said to enable tracking with fewer failures and to reduce the effort of resetting the tracking frame.

Non-Patent Document 1: Y. Li, J. Sun, C. K. Tang and H. Y. Shum, “Lazy Snapping,” ACM Transactions on Graphics (TOG), Vol.23, Issue.3, pp.303-308, Aug.2004.
Non-Patent Document 2: Y. Y. Boykov and M. P. Jolly, “Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images,” Proc. ICCV'01, Vol.1, pp.105-112, Jul.2001.
Non-Patent Document 3: C. Rother, V. Kolmogorov and A. Blake, “GrabCut: interactive foreground extraction using iterated graph cuts,” ACM Transactions on Graphics (TOG), Vol.23, Issue.3, pp.309-314, Aug.2004.
Non-Patent Document 4: R. Okada, Y. Shirai, J. Miura and Y. Kuno, “Moving object tracking based on optical flow and distance information,” IEICE Transactions (D-II), Vol.J80-D-II, No.6, pp.1530-1538, Jun.1997.
Non-Patent Document 5: M. J. Black and A. D. Jepson, “EigenTracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation,” International Journal of Computer Vision, Vol.26, No.1, pp.63-84, Jan.1998.
Patent Document 1: JP 2003-6654 A

However, it is difficult to eliminate tracking errors (failures) across all frame images. As a result, manual object extraction (correction work) is needed again for each frame image in which a tracking error occurred. Given that a certain number of tracking errors will occur, two approaches to reducing the manual workload are conceivable: improving tracking accuracy to reduce the number of tracking errors, and reducing the burden of the correction work itself through the interface. Until now, no known technique has reduced the manual workload from both of these approaches.

The present invention has been made in view of the above, and one of its objects is to provide an object region extraction processing program, an object region extraction device, and an object region extraction method that can, among other things, reduce the manual workload.

To solve the above problem, the invention of the object region extraction processing program according to claim 1 causes a computer to function as: display control means for displaying at least one of a plurality of frame images constituting a moving image; accepting means for accepting from a user the designation of line segments that serve as clues for extracting an object region included in the displayed frame image, the line segments being drawn separately on the object region and on a region other than the object region; object region extraction means for extracting the object region from that frame image based on the designated clue line segments; and clue tracking means for tracking the movement of the designated clue line segments in frame images other than the frame image in which the clue line segments were designated, wherein the object region extraction means further extracts the object region from each such frame image based on the tracked clue line segments.

According to this invention, in the frame images other than the one in which the clue line segments were designated, the clue line segments are tracked rather than the object region, and the object region is extracted in each frame image based on the tracked clue line segments. The object region can therefore be extracted with higher accuracy, and even if object region extraction fails in one frame image, that failure does not affect extraction in the next frame image. Furthermore, compared with tracking the extracted object region itself, correction only requires adding or deleting clue line segments in the failed frame image, which reduces the workload on the user (manual labor).

In addition, because the clue line segments are tracked, tracking requires a smaller amount of data than tracking the object region, so the tracking process can be made faster.
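The control flow described above (track the clue line segments, then extract the object region afresh in every frame) can be illustrated with a minimal sketch. The names `extract_over_frames`, `track`, and `segment` are hypothetical and do not come from the patent; the tracker and the per-frame extractor are passed in as callables because the text here does not fix a particular algorithm for either.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[int, int]
Stroke = List[Point]             # one clue line segment as a polyline
Clues = Dict[str, List[Stroke]]  # e.g. {"object": [...], "background": [...]}


def extract_over_frames(
    frames: Sequence[object],
    initial_clues: Clues,
    track: Callable[[object, object, Clues], Clues],
    segment: Callable[[object, Clues], object],
) -> List[object]:
    """Track clue strokes from frame to frame and segment each frame from them.

    `track` and `segment` are placeholders (assumptions) for a stroke tracker
    and a stroke-driven still-image extractor.
    """
    if not frames:
        return []
    clues = initial_clues
    regions = [segment(frames[0], clues)]
    for previous, current in zip(frames, frames[1:]):
        # Only the strokes are carried forward. The extracted region is never
        # fed into the next frame, so a bad extraction cannot propagate.
        clues = track(previous, current, clues)
        regions.append(segment(current, clues))
    return regions
```

Because each region is computed only from the current frame and the current strokes, correcting one frame means editing its strokes and calling `segment` again for that frame alone.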

The invention according to claim 2 is the object region extraction processing program according to claim 1, wherein the object region extraction means continuously tracks the movement of the designated clue line segments in the plurality of other frame images.

According to this invention, the object region can be extracted from each frame image based on the continuously tracked clue line segments.

The invention according to claim 3 is the object region extraction processing program according to claim 1 or 2, further causing the computer to function as failure detection means for detecting a frame image in which extraction of the object region based on the tracked clue line segments has failed.

According to this invention, the correction work of adding and deleting clue line segments in a frame image in which extraction of the object region has failed can be made more efficient and faster.

The invention according to claim 4 is the object region extraction processing program according to claim 3, further causing the computer to function as information presenting means for presenting information indicating the detected frame image.

The invention according to claim 5 is the object region extraction processing program according to claim 3, wherein the display control means displays the detected frame image, the accepting means accepts from the user a re-designation of the clue line segments for the displayed frame image, and the object region extraction means extracts the object region from the frame image in which the clue line segments were re-designated.

According to this invention, re-designation is achieved simply by adding or deleting clue line segments in the frame image in which extraction of the object region failed, and the object region is then re-extracted. An extraction failure can therefore be corrected with a small amount of correction, reducing the workload on the user.

The invention according to claim 6 is the object region extraction processing program according to any one of claims 3 to 5, wherein the failure detection means detects a frame image in which extraction of the object region has failed by comparing the object region extracted in a reference frame image with the object region extracted in a frame image other than that frame image and calculating their degree of coincidence.

According to this invention, a frame image in which extraction of the object region has failed can be detected easily.
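As a rough illustration of failure detection by degree of coincidence, the sketch below compares each extracted region with the region from a reference frame and reports the frames whose score is at or below a threshold. It assumes, purely for illustration, that regions are sets of pixel coordinates and that the score is a ratio of region areas (area being one of the comparison features named in the claims). The function names and the default threshold of 0.5 are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Pixel = Tuple[int, int]


def area_coincidence(reference: Set[Pixel], candidate: Set[Pixel]) -> float:
    """Ratio of the smaller region area to the larger one, in [0, 1]."""
    a, b = len(reference), len(candidate)
    if a == 0 or b == 0:
        # Two empty regions agree; an empty and a non-empty one do not.
        return 1.0 if a == b else 0.0
    return min(a, b) / max(a, b)


def detect_failed_frames(
    reference: Set[Pixel],
    regions: Sequence[Set[Pixel]],
    threshold: float = 0.5,  # assumed value, not taken from the patent
) -> List[int]:
    """Indices of frames whose coincidence with the reference is too low."""
    return [
        index
        for index, region in enumerate(regions)
        if area_coincidence(reference, region) <= threshold
    ]
```

An area ratio alone cannot tell two different regions of equal size apart, so a practical score would likely combine several features.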

The invention according to claim 7 is the object region extraction processing program according to claim 1, wherein the object region extraction means continuously tracks the movement of the designated clue line segments in a plurality of frame images other than the frame image in which the clue line segments were designated, and extracts the object region from each frame image based on the respective tracked clue line segments; the program further causes the computer to function as coincidence degree calculating means for comparing the object region extracted in the frame image in which the clue line segments were designated with the object region extracted in each of the plurality of other frame images and calculating the respective degrees of coincidence; and the display control means displays the frame images whose calculated degree of coincidence is equal to or less than a predetermined threshold.

According to this invention, the user can identify at a glance the frame images to be corrected, making the user's correction work more efficient and faster.

The invention according to claim 8 is the object region extraction processing program according to claim 7, wherein the display control means displays, in association with each frame image whose calculated degree of coincidence is equal to or less than the predetermined threshold, a display object that lets the user visually grasp that degree of coincidence.

The invention according to claim 9 is the object region extraction processing program according to claim 8, wherein the display control means displays, in mutually different display modes, the display object corresponding to a frame image whose calculated degree of coincidence is equal to or less than a failure detection threshold that is smaller than the predetermined threshold, and the display object corresponding to a frame image whose calculated degree of coincidence is greater than the failure detection threshold.

According to this invention, the user can distinguish at a glance between frame images in which extraction of the object region has failed and those in which it has not (for example, frame images close to failure), making the user's correction work more efficient and faster.
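The two-threshold scheme of claims 7 to 9 can be sketched as a small classifier. Frames at or below the display threshold are shown, and among those, frames at or below the smaller failure detection threshold get a different display mode. The enum names and the numeric defaults are assumptions made only for this example.

```python
from enum import Enum


class DisplayMode(Enum):
    FAILED = "failed"          # at or below the failure detection threshold
    NEAR_FAILURE = "near"      # shown, but above the failure detection threshold
    NOT_SHOWN = "not_shown"    # above the display threshold


def display_mode(
    coincidence: float,
    display_threshold: float = 0.8,   # assumed value
    failure_threshold: float = 0.5,   # assumed value
) -> DisplayMode:
    """Pick a display mode for one frame from its degree of coincidence."""
    if failure_threshold >= display_threshold:
        raise ValueError("failure threshold must be below the display threshold")
    if coincidence <= failure_threshold:
        return DisplayMode.FAILED
    if coincidence <= display_threshold:
        return DisplayMode.NEAR_FAILURE
    return DisplayMode.NOT_SHOWN
```

A user interface could map `FAILED` and `NEAR_FAILURE` to, say, two marker colors. The claims only require that the two display modes differ.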

The invention according to claim 10 is the object region extraction processing program according to claim 1, wherein the object region extraction means continuously tracks the movement of the designated clue line segments in a plurality of frame images other than the frame image in which the clue line segments were designated, and extracts the object region from each frame image based on the respective tracked clue line segments; the program further causes the computer to function as coincidence degree calculating means for comparing the object region extracted in the frame image in which the clue line segments were designated with the object region extracted in each of the plurality of other frame images and calculating the respective degrees of coincidence; and the display control means displays a predetermined number of the frame images, selected in ascending order of the calculated degree of coincidence.
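Selecting a predetermined number of frames starting from the lowest degree of coincidence amounts to a partial sort. A minimal sketch, assuming the scores are held in a list indexed by frame (the function name `lowest_coincidence_frames` is hypothetical):

```python
import heapq
from typing import List, Sequence


def lowest_coincidence_frames(scores: Sequence[float], count: int) -> List[int]:
    """Frame indices of the `count` lowest scores, worst first."""
    ranked = heapq.nsmallest(count, enumerate(scores), key=lambda item: item[1])
    return [index for index, _ in ranked]
```

Unlike a fixed threshold, this always yields up to `count` candidates for review, even when every extraction scored well.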

The invention according to claim 11 is the object region extraction processing program according to claim 10, wherein the display control means displays, in association with each of the predetermined number of frame images, a display object that lets the user visually grasp the calculated degree of coincidence.

The invention according to claim 12 is the object region extraction processing program according to claim 11, wherein the display control means displays, in mutually different display modes, the display object corresponding to a frame image whose calculated degree of coincidence is equal to or less than a failure detection threshold and the display object corresponding to a frame image whose calculated degree of coincidence is greater than the failure detection threshold.

According to this invention, the user can distinguish at a glance between frame images in which extraction of the object region has failed and those in which it has not (for example, frame images close to failure), making the user's correction work more efficient and faster.

The invention according to claim 13 is the object region extraction processing program according to claim 1, wherein the object region extraction means continuously tracks the movement of the designated clue line segments in a plurality of frame images other than the frame image in which the clue line segments were designated, and extracts the object region from each frame image based on the respective tracked clue line segments; the program further causes the computer to function as coincidence degree calculating means for comparing the object region extracted in the frame image in which the clue line segments were designated with the object region extracted in each of the plurality of other frame images and calculating the respective degrees of coincidence; and the display control means displays each frame image for which the degree of coincidence was calculated and also displays, in association with each of those frame images, a display object that lets the user visually grasp the calculated degree of coincidence.

The invention according to claim 14 is the object region extraction processing program according to claim 13, wherein the display control means displays, in mutually different display modes, the display object corresponding to a frame image whose calculated degree of coincidence is equal to or less than a failure detection threshold and the display object corresponding to a frame image whose calculated degree of coincidence is greater than the failure detection threshold.

The invention according to claim 15 is the object region extraction processing program according to any one of claims 6 to 9, wherein at least one of the color histogram of the object region, the area of the object region, the shape of the object region, the texture of the object region, and the position of the center of gravity of the object region is used for the comparison.
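Of the features listed, the color histogram is perhaps the easiest to illustrate. The sketch below builds a coarse, normalized RGB histogram for the pixels of a region and scores two regions by histogram intersection. Histogram intersection is one common similarity measure chosen here as an assumption, since the text does not say how the histograms are compared. The bin count is likewise illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Color = Tuple[int, int, int]  # 8-bit RGB
Bin = Tuple[int, int, int]


def color_histogram(pixels: Iterable[Color], bins: int = 4) -> Dict[Bin, float]:
    """Normalized histogram over `bins` levels per channel."""
    counts = Counter(
        (r * bins // 256, g * bins // 256, b * bins // 256) for r, g, b in pixels
    )
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()} if total else {}


def histogram_intersection(h1: Dict[Bin, float], h2: Dict[Bin, float]) -> float:
    """Sum of bin-wise minima: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(value, h2.get(key, 0.0)) for key, value in h1.items())
```

The resulting value lies in [0, 1] and could serve directly as a degree of coincidence, alone or combined with area or center-of-gravity distance.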

The invention according to claim 16 is the object region extraction processing program according to any one of claims 1 to 15, further causing the computer to function as metadata assigning means for assigning, to the object region extracted based on the tracked clue line segments, metadata including information about the object represented in that object region.

According to this invention, metadata can be assigned directly to an object region contained in a moving image.

The invention according to claim 17 is the object region extraction processing program according to claim 16, wherein the metadata assigning means assigns the metadata to each of a plurality of the extracted object regions.

According to this invention, even when a plurality of object regions exist in a moving image, metadata can be assigned directly to each of them.

The invention according to claim 18 is the object region extraction processing program according to claim 16 or 17, wherein the metadata includes position information indicating the position of the extracted object region on the frame image.

According to this invention, the position of an object region to which metadata has been assigned can easily be identified later.

The invention according to claim 19 is the object region extraction processing program according to claim 18, wherein the position information is the coordinates of at least one point on a line segment drawn on or near the extracted object region.

According to this invention, the amount of data can be reduced compared with using the coordinates of the outer frame of the object region (or the coordinates of the entire object region) as the position information.
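To make the data-size argument concrete, the sketch below stores a single point taken from a line segment on the region as the position information, instead of an outline or a full pixel mask. The record layout (`RegionMetadata` and its fields) and the choice of the stroke's middle vertex are assumptions for illustration. The claim only requires at least one point on such a line segment.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[int, int]


@dataclass(frozen=True)
class RegionMetadata:
    """Hypothetical metadata record attached to one extracted object region."""
    frame_index: int
    description: str   # information about the object shown in the region
    position: Point    # one point on a line segment drawn on the region


def position_from_stroke(stroke: Sequence[Point]) -> Point:
    """Use the middle vertex of the stroke as the region's position."""
    if not stroke:
        raise ValueError("stroke must contain at least one point")
    return stroke[len(stroke) // 2]
```

For example, `RegionMetadata(3, "red bag", position_from_stroke(stroke))` holds two integers of position data per region per frame, whatever the size of the region.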

The invention according to claim 20 comprises: display control means for displaying at least one of a plurality of frame images constituting a moving image; accepting means for accepting from a user the designation of line segments that serve as clues for extracting an object region included in the displayed frame image, the line segments being drawn separately on the object region and on a region other than the object region; object region extraction means for extracting the object region from that frame image based on the designated clue line segments; and clue tracking means for tracking the movement of the designated clue line segments in frame images other than the frame image in which the clue line segments were designated, wherein the object region extraction means further extracts the object region from each such frame image based on the tracked clue line segments.

The invention according to claim 21 is an object region extraction method performed by a computer, comprising: a step of displaying at least one of a plurality of frame images constituting a moving image; a step of accepting from a user the designation of line segments that serve as clues for extracting an object region included in the displayed frame image, the line segments being drawn separately on the object region and on a region other than the object region; an object region extraction step of extracting the object region from that frame image based on the designated clue line segments; and a step of tracking the movement of the designated clue line segments in frame images other than the frame image in which the clue line segments were designated, wherein, in the object region extraction step, the object region is further extracted from each such frame image based on the tracked clue line segments.

According to the present invention, in the frame images other than the one in which the clue line segments were designated, the clue line segments are tracked rather than the object region, and the object region is extracted in each frame image based on the tracked clue line segments. The object region can therefore be extracted with higher accuracy, and even if object region extraction fails in one frame image, that failure does not affect extraction in the next frame image. Furthermore, compared with tracking the extracted object region itself, correction only requires adding or deleting clue line segments in the failed frame image, which reduces the workload on the user (manual labor). In addition, because the clue line segments are tracked, tracking requires a smaller amount of data than tracking the object region, so the tracking process can be made faster.

The best mode for carrying out the present invention is described in detail below with reference to the drawings. The embodiment described below applies the present invention to an image editing apparatus that extracts an object region based on user-designated clue regions (hereinafter referred to as "clues") that serve as clues for extracting an object region in a moving image, tracks the object region by tracking those clues, and assigns metadata to the extracted object region, among other operations.
[I. Configuration and Functions of the Image Editing Apparatus]
First, the configuration and functions of the image editing apparatus S according to the present embodiment are described with reference to FIG. 1 and other figures.

FIG. 1 is a diagram showing an example of the schematic configuration of the image editing apparatus S according to the present embodiment.

As shown in FIG. 1, the image editing apparatus S includes an operation unit 1, a display unit 2, a communication unit 3, a drive unit 4, a storage unit 5, an input/output interface unit 6, a system control unit 7, and so on. The system control unit 7 and the input/output interface unit 6 are connected via a system bus 8. A personal computer can be used as the image editing apparatus S.

The operation unit 1 consists of, for example, a keyboard and a mouse. It accepts operation instructions from the user and outputs their content to the system control unit 7 as instruction signals.

The display unit 2 consists of, for example, a CRT (Cathode Ray Tube) display or a liquid crystal display, and displays information such as text and images.

The communication unit 3 connects to a network NW such as the Internet or an intranet (made up of communication lines and relay devices such as routers) and controls the state of communication with, for example, an information providing server.

The drive unit 4 reads data and the like from a disk DK (recording medium) such as a flexible disk, a CD (Compact Disc), or a DVD (Digital Versatile Disc), and also records data and the like on the disk DK.

The input/output interface unit 6 performs interface processing between the system control unit 7 and each of the operation unit 1 through the storage unit 5.

The storage unit 5 consists of, for example, a hard disk drive, and stores an operating system (O/S), various programs, data, and the like. The programs stored in the storage unit 5 include a moving image editing application program (which contains the object region extraction processing program of the present invention). The storage unit 5 also stores moving image files in a format such as MPEG (Moving Picture Experts Group).

The moving image editing application program is provided either recorded on a disk DK such as a CD-ROM or by download from an information providing server connected to the network NW, and is installed before use.

The system control unit 7 includes a CPU (Central Processing Unit) 7a, a ROM (Read Only Memory) 7b, a RAM (Random Access Memory) 7c used as main memory and image memory, and so on. By executing the moving image editing application program, the system control unit 7 functions as the display control means, accepting means, object region extraction means, clue region tracking means, coincidence degree calculating means, failure detection means, information presenting means, metadata assigning means, and the like of the present invention, so that the image editing apparatus S has (1) an object region search support function, (2) an object region extraction function, (3) an object region tracking function based on clue tracking, (4) an object region extraction failure detection function, and (5) a metadata assignment support function.

Each of these functions will be described in detail below with reference to FIGS. 2 to 9.

FIGS. 2 to 9 show examples of display screens displayed on the display of the display unit 2 while the moving image editing application program is running.

(1) The object region search support function divides a moving image into scenes, each composed of a plurality of frame images, displays a list of the first frame image of each scene, and thereby supports the user's search for the object region display section.

In the object region search support function, the system control unit 7 reads a moving image file from the storage unit 5 and expands it in the image memory, after which scene division processing and scene list display processing are executed in that order. The display example of FIG. 2A shows the first frame image of each scene produced by the scene division processing, listed by the scene list display processing (the frame images enclosed by the broken line 51). When, in the display state of FIG. 2A, the user selects one of the first frame images (for example, the first frame image F4 in FIG. 2A) by, for example, clicking it with the mouse, that is, selects the scene containing that first frame image, a plurality of frame images included in the selected scene are listed as shown in FIG. 2B (the frame images enclosed by the broken line 52). Because the number of frames between, for example, frame image F4 and frame image F5 depends on the number of frames in the loaded moving image, the area within the broken line 52 may show not every frame image from F4 to F5 but only some of them, sampled at a predetermined frame interval.
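The scene division and list display steps can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how scene boundaries are detected, so the histogram-difference criterion, the `cut_threshold` value, and the function names `split_into_scenes` and `frames_between` are all hypothetical.

```python
def split_into_scenes(frame_signatures, cut_threshold=0.5):
    """Return the index of the first frame of each scene.

    frame_signatures holds one normalized histogram (floats summing to 1)
    per frame. A new scene is assumed to start when the L1 distance to the
    previous frame exceeds cut_threshold (an assumed criterion).
    """
    if not frame_signatures:
        return []
    heads = [0]
    for idx in range(1, len(frame_signatures)):
        prev, cur = frame_signatures[idx - 1], frame_signatures[idx]
        distance = sum(abs(a - b) for a, b in zip(prev, cur))
        if distance > cut_threshold:
            heads.append(idx)
    return heads


def frames_between(scene_head, next_scene_head, step=1):
    """Frames listed for a selected scene: every step-th frame lying
    strictly between the two first frame images."""
    return list(range(scene_head + 1, next_scene_head, step))
```

A larger `step` corresponds to the case where only some of the frame images between two first frame images are listed.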

In FIG. 2B, the frame images enclosed by the broken line 52 are those lying between the first frame image F4 and the first frame image F5. The enlarged frame image at the top of the display screen (reference numeral 50) is the frame image selected by the user from among the frame images enclosed by the broken line 51 and those enclosed by the broken line 52. When the user designates the new creation button 53a and the edit button 53b, the display changes to the screen shown in FIG. 3A.

(2) The object region extraction function extracts an object region from a frame image that the user has selected from the list display, based on clues that the user designates on that frame image.

The object region extraction function begins with the user specifying the range to be tracked. The display screen shown in FIG. 3A serves as an object region display section selection interface with which the user specifies the section of frame images in which the object to be extracted appears. By designating (for example, clicking with the mouse) the start frame image designation button 54a and the end frame image designation button 54b, the user can select the start and end frame images of the object region display section (that is, the tracking section), which specifies the range to be tracked. To find the start and end frame images, the user can use the frame search slide bar 55a, which allows any frame image in the scene to be located efficiently, or the frame advance and frame back buttons 55b. After the range to be tracked has been specified, when the user designates the decision button 54c (for example, by clicking it with the mouse), the region designation button 56 is displayed as shown in FIG. 3B.

When the user designates the region designation button 56 (for example, by clicking it with the mouse), a region extraction screen 61 showing the start frame image 61a selected by the user is displayed, as shown in FIG. 4A. The region extraction screen 61 serves as a clue input interface. When the user designates (inputs) clues on the start frame image 61a in the region extraction screen 61, the system control unit 7, having received the designation, executes object region extraction processing, and the object region is extracted automatically based on the designated clues (for example, using their position information). The user needs to designate at least two clues: one on the desired object region (broken line 61b in FIG. 4B) and one on the background region (the region other than the object region; dash-dot line 61c in FIG. 4B). In the display example of FIG. 4B, straight line segments are used as clues, but a clue may instead be a curved, circular, or rectangular line, or a point. Any number of line segments designating the object region and the background region can be drawn, in any order.

To designate a clue, the user operates the mouse to move the pointer to the desired position on the frame image 61a in the region extraction screen 61, then drags with the left mouse button held down to draw a clue 61b designating the object region, or drags with the right mouse button held down to draw a clue 61c designating the background region. If the user draws a wrong line segment, the user can delete it, whether it designates the object region or the background region, by tracing over it with the pointer while holding down the middle mouse button. Instead of using the left and right mouse buttons to distinguish the object region from the background region, the user can first choose object (for example, "foreground" in FIG. 4A), background (for example, "background" in FIG. 4A), or delete (for example, "delete" in FIG. 4A) from a menu provided in the clue input interface and then drag with any mouse button held down.

また、図４（Ｂ）の表示例において、実線６１ｄで囲まれた領域が抽出された物体領域である。物体領域の抽出は、当該物体領域を指定する手がかり６１ｂと背景領域を指定する手がかり６１ｃが夫々一本引かれた時点で行われ、実線６１ｄに示されるように、抽出結果が表示される。その後、ユーザが、物体領域を指定する手がかり、あるいは背景領域を指定する手がかりを指定する度に、物体領域の再抽出が行われ、その抽出結果が表示されることになる。こうして、ユーザは最終的な抽出結果を確認した後、完了ボタン６１ｅを指定（例えば、マウスによりクリック）すると、上記抽出結果が、図５に示す表示画面上に反映されることになる。 Further, in the display example of FIG. 4B, the region surrounded by the solid line 61d is the extracted object region. The extraction of the object area is performed when one clue 61b specifying the object area and one clue 61c specifying the background area are drawn, and the extraction result is displayed as indicated by a solid line 61d. Thereafter, each time the user designates a clue for designating the object region or a clue for designating the background region, the object region is re-extracted and the extraction result is displayed. In this way, after the user confirms the final extraction result and designates the completion button 61e (for example, clicks with the mouse), the extraction result is reflected on the display screen shown in FIG.

The clue-based object region extraction processing can use a known technique that extracts an object region from clues drawn on the object region and clues drawn on the background region, for example the logic of Lazy Snapping. Lazy Snapping consists of two steps, separating the object region from the background and then correcting the resulting boundary pixel by pixel; when Lazy Snapping is used in the present invention, only the separation step is adopted. A feature of Lazy Snapping is that object region extraction is achieved merely by drawing a few lines within the regions. Because the object region is re-extracted each time a line is added, the user can add and delete lines while watching the extraction result, which makes interactive object region extraction possible.

Object region extraction methods based on graph cuts, such as Lazy Snapping, replace the image with a graph G = (V, E) and solve the object region extraction problem by reducing it to a labeling problem over the vertices. Here, V is the set of all vertices and E is the set of all edges. When the image is replaced with a graph, each pixel corresponds to a vertex, and the adjacency of each pixel to its 4 or 8 neighbors corresponds to the edges. The labeling problem is the problem of assigning a label x_i ∈ {object region (= 1), background (= 0)} to each vertex i ∈ V of the graph G. The optimal solution X = {x_i} of this problem minimizes the energy function C(X) defined by the following equation.

Here, C_1(x_i) represents the likelihood that vertex i has label x_i, and C_2(x_i, x_j) represents the cost incurred when vertices i and j have labels x_i and x_j, respectively. A graph cut algorithm is used to minimize the energy. See Non-Patent Document 1 for the definition of the energy function used in Lazy Snapping.
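The equation defining C(X) is not reproduced in this text. Judging from the definitions of C_1 and C_2 given here and from the Lazy Snapping formulation cited as Non-Patent Document 1, it presumably has the form below. The weighting factor λ is an assumption borrowed from that formulation and is not confirmed by this document.

```latex
C(X) = \sum_{i \in V} C_1(x_i) + \lambda \sum_{(i,j) \in E} C_2(x_i, x_j)
```

The first sum runs over all vertices (pixels) and the second over all edges (pairs of adjacent pixels).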

This principle of object region extraction by energy minimization is used not only in Lazy Snapping but also in the methods of Non-Patent Documents 2 and 3. The object region extraction method applicable to the present invention is not limited to Lazy Snapping; any other method may be applied as long as it extracts the object region by energy calculation based on clues designated by the user.
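The labeling problem can be illustrated on a toy example. The sketch below is not the graph cut algorithm that the text names: it searches all labelings exhaustively, which is only feasible for a handful of vertices, and it treats the per-vertex term as a cost to be minimized. The cost values, the `pairwise_weight` parameter, and the function names are assumptions made for illustration.

```python
from itertools import product


def energy(labels, unary, edges, pairwise_weight=1.0):
    """Sum of per-vertex costs plus weighted costs on edges whose
    endpoints receive different labels."""
    data_term = sum(unary[i][x] for i, x in enumerate(labels))
    smooth_term = sum(w for (i, j, w) in edges if labels[i] != labels[j])
    return data_term + pairwise_weight * smooth_term


def best_labeling(unary, edges, pairwise_weight=1.0):
    """Exhaustive search over all 0/1 labelings (a stand-in for graph cut)."""
    n = len(unary)
    return min(product((0, 1), repeat=n),
               key=lambda labels: energy(labels, unary, edges, pairwise_weight))


# Four pixels in a row. unary[i] = (cost of background, cost of object).
# Pixel 0 carries an object clue and pixel 3 a background clue, modeled
# here as a very high cost for the opposite label.
unary = [(100, 0), (2, 1), (1, 2), (0, 100)]
edges = [(0, 1, 1.0), (1, 2, 0.5), (2, 3, 1.0)]
labels = best_labeling(unary, edges)
```

In this toy case the boundary falls on the weakest edge, between pixels 1 and 2, which mirrors how the pairwise term encourages cuts where neighboring pixels differ most.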

(3) The object region tracking function based on clue tracking tracks the movement of the designated clues in frame images other than the frame image on which the user designated them (that is, it tracks the position to which each designated clue moves in the other frame images).

In the object region tracking function based on clue tracking, the system control unit 7 executes clue tracking processing to track the positions to which the clues move and performs object region extraction processing in each frame image based on the tracked clues, thereby automating object region extraction for all frame images in the object region display section. For example, when the user operates the mouse to specify the tracking direction (playback direction or reverse (rewind) direction) in the tracking direction designation section 58 on the display screen shown in FIG. 5 and then designates the tracking start button 57, continuous tracking of the movement of the clues begins, for example from the start frame image of the object region display section through the plurality of other frame images that follow it, and an object region is extracted from each frame image based on the clues tracked in that frame image. The tracking counts as continuous even if it skips a predetermined number (for example, one) of frame images; that is, frame images in which the movement of the clues is not tracked may lie between frame images in which it is tracked. Clue tracking can also be started from an intermediate frame image located between the start frame image and the end frame image of the object region display section.

Because the movement of the clues is tracked and the object region is extracted from each frame image based on the tracked clues, a tracking failure, that is, a failure to extract the object region in some frame image, does not affect object region extraction in the next frame image. Compared with tracking the extracted object region itself, the correction therefore only requires adding or deleting clues in the failed frame image, which makes the correction work quick and efficient.

ところで、手がかり追跡処理には、画面上の各点の速度場を表すオプティカルフローのロジック等の公知技術を利用することができる。なお、オプティカルフローの計算手法の詳細については非特許文献４及び５を参照されたい。 By the way, for the clue tracking process, a known technique such as an optical flow logic representing the velocity field of each point on the screen can be used. Refer to Non-Patent Documents 4 and 5 for details of the optical flow calculation method.
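One way to picture clue tracking with a velocity field is the sketch below. It assumes the optical flow between two frames has already been computed by some method (the text defers to Non-Patent Documents 4 and 5 for that) and is available as a per-pixel grid. The data layout and the name `track_clue` are hypothetical.

```python
def track_clue(points, flow, width, height):
    """Move each clue point by the flow vector at its rounded position.

    points is a list of (x, y) coordinates; flow[y][x] is the (dx, dy)
    displacement toward the next frame. Results are clamped to the image.
    """
    moved = []
    for x, y in points:
        col = min(max(int(round(x)), 0), width - 1)
        row = min(max(int(round(y)), 0), height - 1)
        dx, dy = flow[row][col]
        new_x = min(max(x + dx, 0.0), width - 1.0)
        new_y = min(max(y + dy, 0.0), height - 1.0)
        moved.append((new_x, new_y))
    return moved
```

Applying the function frame after frame carries a clue drawn on the start frame image through the tracking section; the object region is then re-extracted in each frame from the moved clue.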

(4) The object region extraction failure detection function detects frame images in which tracking failed (a tracking error), that is, in which extraction of the object region based on the tracked clue region failed, and presents information indicating the detected frame images and the like to the user (for example, by displaying it as shown in FIG. 6 or by audio output).

In the object region extraction failure detection function, the system control unit 7 executes failure detection processing: the object region extracted in a reference frame image (for example, the start frame image on which the clues were designated) is compared with the object region extracted, based on the tracked clues, in each frame image that follows the reference frame image (that is, each frame image other than the reference frame image), and the degree of coincidence between them is calculated. Frame images in which extraction of the object region failed are detected from this degree of coincidence.

That is, the degree of coincidence is calculated for each frame image other than the reference frame image, and a frame image whose calculated degree of coincidence is equal to or less than a failure detection threshold, set in advance by the user for example, is detected as a failed frame image in which extraction of the object region failed. The comparison uses, for example, at least one of the color histogram, area, shape, texture, and centroid position of the object region, but other comparison methods may be applied.
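As a rough sketch of the comparison, the code below uses the color histogram, one of the features listed above, and measures the degree of coincidence by histogram intersection. The choice of histogram intersection, the bin count, and all function names are assumptions; the text does not fix a particular measure.

```python
def normalized_histogram(pixels, bins=4):
    """Quantize 8-bit (r, g, b) pixels into bins**3 colors and normalize."""
    counts = [0] * (bins ** 3)
    for r, g, b in pixels:
        index = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        counts[index] += 1
    total = len(pixels)
    return [c / total for c in counts] if total else counts


def coincidence_degree(reference_pixels, candidate_pixels, bins=4):
    """Histogram intersection in [0, 1]; 1 means identical distributions."""
    ref = normalized_histogram(reference_pixels, bins)
    cand = normalized_histogram(candidate_pixels, bins)
    return sum(min(a, b) for a, b in zip(ref, cand))


def failed_frames(reference_pixels, regions_by_frame, threshold):
    """Frame numbers whose degree of coincidence is at or below threshold."""
    return [frame for frame, pixels in sorted(regions_by_frame.items())
            if coincidence_degree(reference_pixels, pixels) <= threshold]
```

Here `reference_pixels` stands for the pixels of the object region extracted in the reference frame image, and `regions_by_frame` maps each other frame number to the pixels of the region extracted there.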

Furthermore, in the object region extraction failure detection function, the system control unit 7 executes tracking result display processing to display a predetermined number of frame images selected in ascending order of the calculated degree of coincidence (for example, the 10 frame images with the lowest degrees of coincidence). Consequently, if there are fewer failed frame images than the predetermined number, frame images that are not failed frame images are also displayed (of course, only the failed frame images may be displayed instead). Along with the predetermined number of frame images, the function displays, for each of them, a bar graph (an example of a display object) that lets the user grasp the degree of coincidence visually.
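Selecting the frame images to display amounts to taking the entries with the lowest degrees of coincidence. A minimal sketch, assuming the degrees are held in a dictionary keyed by frame number (the name `frames_to_display` is hypothetical):

```python
import heapq


def frames_to_display(degree_by_frame, count=10):
    """Return (frame, degree) pairs for the count lowest degrees of
    coincidence, lowest first; ties are broken by frame number."""
    return heapq.nsmallest(count, degree_by_frame.items(),
                           key=lambda item: (item[1], item[0]))
```

If fewer than `count` frames exist, all of them are returned, which matches the case where frame images that did not fail are shown alongside the failed ones.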

The display example of FIG. 6 shows bar graphs (displayed in the graph display area 59) for the 10 frame images with the lowest degrees of coincidence, each displayed in association with its frame image. Each bar graph is associated with its frame image by a straight line 60 connecting the two, so the user can see at a glance where the frame images with a high degree of coincidence are located. Furthermore, in the display example of FIG. 6, a bar graph 59a corresponding to a frame image whose degree of coincidence is equal to or less than the failure detection threshold and a bar graph 59b corresponding to a frame image whose degree of coincidence is greater than the failure detection threshold are displayed in different display modes (for example, in different colors or patterns).

For example, the bar graph 59a is displayed in a conspicuous color (for example, red) and the bar graph 59b in an inconspicuous color (for example, green). This allows the user to identify failed frame images at a glance.

In the above display example, the bar graph 59a (for example, red) corresponding to a frame image whose degree of coincidence is equal to or less than the failure detection threshold and the bar graph 59b (for example, green) corresponding to a frame image whose degree of coincidence is greater than the failure detection threshold are displayed in different display modes. However, the bar graph for a frame image whose degree of coincidence is greater than the failure detection threshold but close to that of a failed frame image (for example, a frame image whose degree of coincidence is equal to or less than "failure detection threshold + α") may also be displayed in the same display mode (for example, red) as the bar graphs for frame images at or below the failure detection threshold. This allows the user to identify at a glance not only failed frame images but also frame images that are close to failing.
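The choice of display mode can be written as a single comparison. In this sketch the `margin` parameter plays the role of α, and the mode names "alert" and "normal" are placeholders for whatever colors or patterns an implementation uses.

```python
def bar_style(degree, failure_threshold, margin=0.0):
    """Return "alert" for bars drawn like 59a and "normal" for bars like 59b.

    With margin=0.0 only failed frame images are highlighted; a positive
    margin also highlights frame images that are close to failing.
    """
    return "alert" if degree <= failure_threshold + margin else "normal"
```

For instance, with a failure detection threshold of 0.5, a frame image with a degree of coincidence of 0.55 is drawn as "normal" when `margin` is 0.0 and as "alert" when `margin` is 0.1.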

The frame images displayed together with the bar graphs indicating the degree of coincidence can be selected by the user, and the selected frame image is displayed enlarged at the top of the display screen (reference numeral 50). When, in this display state, the user selects, for example, a failed frame image and designates the region designation button 56, a region extraction screen 71 showing the failed frame image 71a selected by the user is displayed, as shown in FIG. 7A. In the display example of FIG. 7A, the extracted object region 71b is not the region the user intended. When the user therefore redesignates the clues on the failed frame image 71a (by adding or deleting clues), the system control unit 7, having received the redesignation, executes object region extraction processing, the object region is extracted based on the redesignated clues, and the extraction result is displayed as indicated by the solid line 71c in FIG. 7B. The clues added on this frame image can be carried over to the frame images that follow it by re-tracking from this frame image.

The frame images displayed by the system control unit 7 in the tracking result display processing (for example, in the graph display area 59 shown in FIG. 6) may instead be the frame images whose calculated degree of coincidence is equal to or less than a predetermined threshold that is greater than the failure detection threshold. In this case, the number of displayed frame images stays within the maximum number of displayable frame images, and if no frame image is at or below the predetermined threshold, no frame image is displayed. In this case too, the bar graph for each frame image at or below the predetermined threshold is displayed in association with that frame image. Bar graphs (for example, green ones) for frame images whose degree of coincidence exceeds the predetermined threshold (that is, frame images that match well) therefore need not be displayed.

また、上記閾値とは無関係に、最大表示可能フレーム画像数以内の全てのフレーム画像とこれに対応する棒グラフ（一致度の大きさに比例したグラフ）を表示させるようにしても良い。 Further, regardless of the threshold value, all frame images within the maximum number of displayable frame images and corresponding bar graphs (graphs proportional to the degree of coincidence) may be displayed.

そして、ユーザは最終的な抽出結果を確認した後、完了ボタン７１ｄを指定すると、上記抽出結果が反映されつつ図６に示す表示状態に戻ることになる。なお、このような再指定による物体領域の抽出は、失敗フレーム画像であるかどうかにかかわらず、各フレーム画像毎に行うことが可能である。 Then, after the user confirms the final extraction result and designates the completion button 71d, the display state shown in FIG. 6 is returned while the extraction result is reflected. Note that extraction of the object region by such re-designation can be performed for each frame image regardless of whether or not it is a failed frame image.

Thereafter, when the user operates the mouse to specify the tracking direction (playback direction or reverse (rewind) direction) in the tracking direction designation section 58 on the display screen shown in FIG. 6 and designates the tracking start button 57, continuous re-tracking of the movement of the clues begins in the plurality of other frame images that follow the frame image whose object region was extracted by the redesignation, and an object region is extracted from each frame image based on the clues re-tracked in that frame image.

By displaying frame images with a low degree of coincidence, including failed frame images in which tracking failed, and allowing them to be corrected in this way, the object region can be extracted more accurately and the user's correction workload can be reduced.

（５）メタデータ付与支援機能は、抽出された物体領域に対して付与すべき、当該物体領域に表される物体に関する情報を含むメタデータの入力を支援するための機能である。 (5) The metadata addition support function is a function for supporting the input of metadata including information on the object represented in the object area to be added to the extracted object area.

In the metadata assignment support function, when the user designates, for example, the text edit button 60a on the display screen shown in FIG. 6, a text information input screen 81 is displayed as shown in FIG. 8A. The text information input screen 81 serves as a metadata input interface. On the text information input screen 81, the user uses, for example, the keyboard to enter, as shown in FIG. 8B, tag name data such as text data identifying the object region (for example, "jacket") in the tag name input field 81a and tag content data consisting of arbitrary text data (for example, a "brand name", a "price", and the "URL of a shopping site" where the jacket can be purchased) in the tag content input field 81b. When the user then designates the completion button 81c, the system control unit 7 executes metadata assignment processing, and the entered tag name data and tag content data are assigned to the object region as metadata. The assignment is made, for example, by associating the metadata with object region position data, which is position information indicating the position of the object region on the frame image (including, for example, the number of the frame image and the coordinates of the object region on that frame image).

To represent the object region position data with less data, it is desirable to define it not by the coordinates of the outline of the object region (for example, the solid line 71c shown in FIG. 7B) but by, for example, the center coordinates and radius of a circle enclosing the extracted object region or of its circumscribed circle. For example, when the user designates the result display button 60b on the display screen shown in FIG. 6, a circle 50a enclosing the object region is displayed in the frame image 50 as shown in FIG. 9A, and the center coordinates and radius of the circle 50a become the object region position data. This reduces the amount of data compared with using the coordinates of the outline of the object region (or the coordinates of the entire object region) as the object region position data. Besides a circle enclosing the extracted object region or its circumscribed circle, the object region position data can also be represented with a small amount of data as the coordinates of at least one point (possibly all points) on a line (straight line, curve, rectangle, or the like) drawn on or near the extracted object region.
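Reducing a region to three numbers can be sketched as follows. The construction is an assumption chosen for simplicity: the center is the centroid of the region's pixels and the radius reaches the farthest pixel, so the circle encloses the region but is not necessarily the smallest enclosing circle. The name `enclosing_circle` is hypothetical.

```python
import math


def enclosing_circle(region_pixels):
    """Return (cx, cy, radius) of a circle containing every (x, y) pixel."""
    if not region_pixels:
        raise ValueError("region_pixels must not be empty")
    n = len(region_pixels)
    cx = sum(x for x, _ in region_pixels) / n
    cy = sum(y for _, y in region_pixels) / n
    radius = max(math.hypot(x - cx, y - cy) for x, y in region_pixels)
    return cx, cy, radius
```

However many pixels or outline points the region has, the stored position data stays at a center and a radius per frame image.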

The display example of FIG. 9B shows the screen after metadata has been assigned; the assigned metadata is displayed in the metadata display section 50b. When the user designates the save button 60c shown in FIG. 9B, the metadata is stored together with the object region position data in, for example, an XML file, and the XML file is saved in the directory containing the moving image file.
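The text does not give the layout of the XML file. The sketch below shows one possible layout, built with Python's standard `xml.etree.ElementTree` module; every element and attribute name (`object`, `tagName`, `tagContent`, `item`, `position`) is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET


def build_metadata_xml(tag_name, tag_content, positions):
    """Serialize one object region's metadata and position data.

    tag_content maps labels to text; positions is a list of
    (frame_number, cx, cy, radius) tuples.
    """
    root = ET.Element("object")
    ET.SubElement(root, "tagName").text = tag_name
    content = ET.SubElement(root, "tagContent")
    for key, value in tag_content.items():
        ET.SubElement(content, "item", name=key).text = value
    for frame, cx, cy, radius in positions:
        ET.SubElement(root, "position", frame=str(frame),
                      cx=str(cx), cy=str(cy), r=str(radius))
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to a file next to the moving image file would correspond to the save step described above.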

The saved XML file is associated with the moving image file and registered in, for example, a Web server that constitutes a given site on the Internet. The Web server provides the moving image data and the metadata to a client (Web browser) in response to a request from the client, so that the client displays the moving image and, when the user designates an object region in the moving image with, for example, the mouse, the metadata corresponding to that object region is displayed.

[II. Operation of image editing apparatus S]

Next, the operation of the image editing apparatus S according to the present embodiment will be described with reference to FIGS. 10 and 11 and other drawings.

図１０は、本実施形態に係る画像編集装置Ｓのシステム制御部７における処理を示すフローチャートであり、図１１は、図１０に示すメタデータ付与処理の詳細を示すフローチャートである。なお、図１０及び図１１に示す処理は、画像編集装置Ｓの動作の流れを簡略化して示す一例であり、当該処理に限定されるものではない。 FIG. 10 is a flowchart showing the processing in the system control unit 7 of the image editing apparatus S according to the present embodiment, and FIG. 11 is a flowchart showing the details of the metadata providing process shown in FIG. 10. The processing shown in FIGS. 10 and 11 is a simplified example of the operation flow of the image editing apparatus S, and the operation is not limited to this processing.

図１０に示す処理は、上記動画像編集アプリケーションプログラムの起動後、ユーザにより所望の動画像ファイルが指定された場合に開始される。 The processing shown in FIG. 10 is started when a desired moving image file is designated by the user after the moving image editing application program is started.

図１０において、先ず、システム制御部７は、ユーザにより指定された動画像ファイルを記憶部５から読み込み（ステップＳ１）、画像メモリ上に展開する。 In FIG. 10, first, the system control unit 7 reads a moving image file designated by the user from the storage unit 5 (step S1) and develops it on the image memory.

次いで、システム制御部７は、シーン分割処理を行い、読み込んだ動画像をシーン毎に分割する（ステップＳ２）。 Next, the system control unit 7 performs scene division processing and divides the read moving image for each scene (step S2).

次いで、システム制御部７は、シーン一覧表示処理を行い、分割された各シーンの先頭フレーム画像を、例えば図２（Ａ）に示すように一覧表示する（ステップＳ３）。 Next, the system control unit 7 performs a scene list display process to display a list of the first frame images of each divided scene as shown in FIG. 2A, for example (step S3).

次いで、システム制御部７は、メタデータ入力受付準備処理を行う（ステップＳ４）。このメタデータ入力受付準備処理では、例えば図２（Ａ）または（Ｂ）に示す表示画面上に表示された各種ボタンの選択に応じた処理が実行される。 Next, the system control unit 7 performs a metadata input acceptance preparation process (step S4). In this process, processing corresponding to the selection of the various buttons displayed on, for example, the display screen shown in FIG. 2(A) or 2(B) is executed.

次いで、システム制御部７は、物体領域表示区間の開始フレーム画像および終了フレーム画像の指定（例えば図３（Ａ）に示す表示画面上に表示された開始フレーム画像指定ボタン５４ａが指定され、さらに終了フレーム画像指定ボタン５４ｂが指定されることによる）を受け付ける（ステップＳ５）。 Next, the system control unit 7 accepts designation of the start frame image and the end frame image of the object region display section (for example, by the start frame image designation button 54a displayed on the display screen shown in FIG. 3(A) being designated, followed by the end frame image designation button 54b) (step S5).

次いで、システム制御部７は、上記指定された開始フレーム画像上における物体領域を抽出するための手がかりの指定（例えば図４（Ｂ）に示す領域抽出画面６１上に表示されたフレーム画像上で線分が引かれることによる）を受け付ける（ステップＳ６）。 Next, the system control unit 7 accepts designation of clues for extracting the object region on the designated start frame image (for example, by line segments being drawn on the frame image displayed on the region extraction screen 61 shown in FIG. 4(B)) (step S6).

次いで、システム制御部７は、物体領域抽出処理を行い、上記指定された手がかりに基づいて、上述したように、物体領域を抽出する（ステップＳ７）。 Next, the system control unit 7 performs an object region extraction process, and extracts an object region as described above based on the designated clue (step S7).

次いで、システム制御部７は、物体領域の抽出結果（例えば、図４（Ｂ）に示す領域抽出画面６１上に表示された実線６１ｄ）を表示する（ステップＳ８）。 Next, the system control unit 7 displays an object region extraction result (for example, a solid line 61d displayed on the region extraction screen 61 shown in FIG. 4B) (step S8).

次いで、システム制御部７は、ユーザから追跡開始指示があったか否かを判別し（ステップＳ９）、追跡開始指示がない場合には（ステップＳ９：ＮＯ）、ステップＳ６に戻り、追跡開始指示があった場合（例えば図４（Ｂ）に示す領域抽出画面６１上に表示された完了ボタン６１ｅが指定され、その後、図５に示す表示画面上に表示された追跡開始ボタン５７が指定されることによる）には（ステップＳ９：ＹＥＳ）、ステップＳ１０に進む。 Next, the system control unit 7 determines whether the user has given a tracking start instruction (step S9). If not (step S9: NO), the process returns to step S6; if so (for example, by the completion button 61e displayed on the region extraction screen 61 shown in FIG. 4(B) being designated, followed by the tracking start button 57 displayed on the display screen shown in FIG. 5) (step S9: YES), the process proceeds to step S10.

ステップＳ１０では、システム制御部７は、上記手がかりの追跡対象のフレーム画像を一つ特定する。例えば、１回目のステップＳ１０の処理では、ステップＳ６〜ステップＳ８の処理で物体領域が抽出されたフレーム画像の次のフレーム画像が特定される。 In step S10, the system control unit 7 specifies one frame image to be tracked by the clue. For example, in the first processing of step S10, the next frame image of the frame image from which the object region has been extracted by the processing of steps S6 to S8 is specified.

次いで、システム制御部７は、手がかり追跡処理を行い、上記特定されたフレーム画像において、上記開始フレーム画像で指定された手がかりの移動を追跡する（ステップＳ１１）。 Next, the system control unit 7 performs a clue tracking process to track the movement of the clue specified by the start frame image in the specified frame image (step S11).

次いで、システム制御部７は、物体領域抽出処理を行い、上記追跡された手がかりに基づいて、上述したように、物体領域を抽出する（ステップＳ１２）。 Next, the system control unit 7 performs an object region extraction process, and extracts an object region as described above based on the tracked clue (step S12).

次いで、システム制御部７は、上記ステップＳ１２で物体領域が抽出されたフレーム画像が終了フレーム画像であるか否かを判別し（ステップＳ１３）、終了フレーム画像でない場合には（ステップＳ１３：ＮＯ）、ステップＳ１０に戻り、次のフレーム画像を特定し上記と同様の処理を行い、終了フレーム画像である場合には（ステップＳ１３：ＹＥＳ）、ステップＳ１４に進む。こうして、開始フレーム画像から終了フレーム画像に向かって順に特定され、夫々のフレーム画像において物体領域が抽出されることになる。 Next, the system control unit 7 determines whether the frame image from which the object region was extracted in step S12 is the end frame image (step S13). If it is not the end frame image (step S13: NO), the process returns to step S10, the next frame image is specified, and the same processing is repeated. If it is the end frame image (step S13: YES), the process proceeds to step S14. In this way, frame images are specified in order from the start frame image toward the end frame image, and the object region is extracted in each of them.
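The loop formed by steps S10 to S13 can be sketched as follows. This is a structural sketch only: `track_clue` and `extract_region` are hypothetical callables standing in for the clue tracking process and the object region extraction process, whose internals are described elsewhere in the document, and the toy "frames" are plain numbers rather than images.

```python
def track_and_extract(frames, start, end, clue, track_clue, extract_region):
    """Track the clue frame by frame and extract a region from each frame.

    track_clue(prev_frame, next_frame, clue) -> clue in next_frame
    extract_region(frame, clue) -> object region for that frame
    Both callables are hypothetical stand-ins for the real processing.
    """
    regions = {start: extract_region(frames[start], clue)}   # S6 to S8
    for i in range(start + 1, end + 1):                       # S10: next frame
        clue = track_clue(frames[i - 1], frames[i], clue)     # S11: track clue
        regions[i] = extract_region(frames[i], clue)          # S12: extract
    return regions                                            # S13: YES at end frame

# Toy data: each "frame" is just the object's x offset.
def shift_clue(prev_frame, next_frame, clue):
    return [(x + next_frame - prev_frame, y) for x, y in clue]

def clue_as_region(frame, clue):
    return set(clue)

frames = [0, 2, 5, 9]
regions = track_and_extract(frames, 0, 3, [(1, 1), (2, 1)], shift_clue, clue_as_region)
```

Only the clue is carried from one frame image to the next; each region is computed afresh from the tracked clue.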

ステップＳ１４では、システム制御部７は、失敗検出処理を行い、上述したように、基準となるフレーム画像において抽出された物体領域と、手がかりが追跡された各フレーム画像において抽出された物体領域と、を比較して一致度を算出し、算出した一致度が失敗検出閾値以下であるフレーム画像を失敗フレーム画像として検出する。 In step S14, the system control unit 7 performs failure detection processing, and as described above, the object region extracted in the reference frame image, the object region extracted in each frame image in which the clue is tracked, Are compared with each other to calculate the degree of coincidence, and a frame image whose calculated degree of coincidence is equal to or less than the failure detection threshold is detected as a failure frame image.
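A minimal sketch of this failure detection follows. It assumes that the degree of coincidence is the ratio of the smaller region area to the larger one; the section ties the degree of coincidence to the change in region area but does not give a formula, so this ratio, the function names, and the threshold value 0.6 are all assumptions.

```python
def coincidence(reference_area, area):
    """Area-ratio coincidence in [0, 1]; 1.0 means identical areas.

    Assumption: min/max area ratio. The source does not fix the formula.
    """
    if reference_area <= 0 or area < 0:
        raise ValueError("reference area must be positive, area non-negative")
    return min(reference_area, area) / max(reference_area, area)

def detect_failures(reference_area, areas, threshold):
    """Return (scores, failed); failed lists frames with score <= threshold."""
    scores = {f: coincidence(reference_area, a) for f, a in areas.items()}
    failed = sorted(f for f, s in scores.items() if s <= threshold)
    return scores, failed

# Region areas (in pixels) per tracked frame; the reference area is 1000.
areas = {1: 980, 2: 1010, 3: 400, 4: 1000}
scores, failed = detect_failures(1000, areas, threshold=0.6)
```

In this toy data only frame 3, where the area collapses to 400 pixels, falls at or below the threshold.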

次いで、システム制御部７は、追跡結果表示処理を行い、例えば図６に示すように、算出された一致度が最も小さい方から選定した所定数（例えば、１０個）のフレーム画像を一覧表示させると共に各フレーム画像上に物体領域の抽出結果（物体領域の外枠）を表示し、更に、各フレーム画像の一致度を示す棒グラフを各フレーム画像に対応させて表示させる（ステップＳ１５）。 Next, the system control unit 7 performs a tracking result display process to display a list of a predetermined number (for example, 10) of frame images selected from the one with the smallest calculated degree of coincidence, for example, as shown in FIG. At the same time, the object region extraction result (outer frame of the object region) is displayed on each frame image, and a bar graph indicating the matching degree of each frame image is displayed corresponding to each frame image (step S15).
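The selection of the lowest-scoring frame images and the per-frame bar display can be sketched as below. The text bars, the marker characters, and the function names are hypothetical; the actual apparatus draws a bar graph on the screen of FIG. 6.

```python
import heapq

def frames_to_review(scores, count=10):
    """Return up to `count` (frame, score) pairs, lowest coincidence first."""
    return heapq.nsmallest(count, scores.items(), key=lambda item: item[1])

def bar(score, failure_threshold, width=20):
    """Render one text bar; failed frames use '!' instead of '#'."""
    mark = "!" if score <= failure_threshold else "#"
    return mark * round(score * width)

scores = {1: 0.98, 2: 0.99, 3: 0.40, 4: 1.00, 5: 0.75}
worst = frames_to_review(scores, count=3)
lines = [f"frame {f:>3} {bar(s, 0.6)} {s:.2f}" for f, s in worst]
```

Using a different marker for frames at or below the failure detection threshold mirrors the idea of showing failed frame images in a different display mode.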

次いで、システム制御部７は、ユーザから手がかりの修正（再指定）指示があったか否かを判別し（ステップＳ１６）、修正指示があった場合（ユーザが修正対象のフレーム画像を選択し、領域指定ボタン５６を指定することによる）には（ステップＳ１６：ＹＥＳ）、ステップＳ６に移行し、上記と同様の処理を行う。なお、この場合において、ステップＳ１０〜Ｓ１３では、当該手がかりが修正され物体領域が抽出されたフレーム画像の次のフレーム画像から終了フレーム画像まで手がかりの再追跡が行われることになるが、このとき、ユーザは、ステップＳ５で指定した終了フレーム画像とは異なる終了フレーム画像（例えば、追跡区間を狭めるため）を新たに指定することも可能である。 Next, the system control unit 7 determines whether or not there has been a clue correction (re-designation) instruction from the user (step S16), and if there is a correction instruction (the user selects a frame image to be corrected and designates a region). (By designating the button 56) (step S16: YES), the process proceeds to step S6 and the same processing as described above is performed. In this case, in steps S10 to S13, the clue is re-tracked from the next frame image to the end frame image of the frame image in which the clue is corrected and the object region is extracted. The user can also newly designate an end frame image (for example, for narrowing the tracking section) different from the end frame image designated in step S5.

一方、上記修正指示がない場合には（ステップＳ１６：ＮＯ）、システム制御部７は、ユーザからメタデータ付与指示があったか否かを判別し（ステップＳ１７）、当該メタデータ付与指示があった場合（例えば、図６に示す表示画面上に表示されたテキスト編集ボタン６０ａが指定されることによる）には（ステップＳ１７：ＹＥＳ）、ステップＳ１８に進み、当該メタデータ付与指示がない場合には（ステップＳ１７：ＮＯ）、ステップＳ１６に戻る。 On the other hand, when there is no correction instruction (step S16: NO), the system control unit 7 determines whether or not there is an instruction to give metadata (step S17), and when there is an instruction to give metadata. For example (for example, when the text editing button 60a displayed on the display screen shown in FIG. 6 is designated) (step S17: YES), the process proceeds to step S18. Step S17: NO), it returns to step S16.

ステップＳ１８では、システム制御部７は、メタデータ付与処理を行う。このメタデータ付与処理においては、システム制御部７は、図１１に示すように、メタデータ入力インターフェース（例えば、図８（Ａ）に示すテキスト情報入力画面８１）を表示し（ステップＳ１８１）、タグ名データ及びタグ内容データをメタデータとして入力受け付け（ステップＳ１８２）、当該メタデータをＲＡＭ７ｃに一時的に記憶（ステップＳ１８３）すると共に画面表示（ステップＳ１８３）し、図１０の処理に戻る。なお、メタデータ付与処理は、上記メタデータ入力受付準備処理以降、任意のタイミングで実行可能である。 In step S18, the system control unit 7 performs the metadata providing process. In this process, as shown in FIG. 11, the system control unit 7 displays a metadata input interface (for example, the text information input screen 81 shown in FIG. 8(A)) (step S181), accepts input of tag name data and tag content data as metadata (step S182), temporarily stores the metadata in the RAM 7c (step S183) and displays it on the screen (step S183), and then returns to the process of FIG. 10. The metadata providing process can be executed at any time after the metadata input acceptance preparation process.

図１０に示すステップＳ１９では、図９（Ｂ）に示す保存ボタン６０ｃがユーザにより指定されると、システム制御部７は、メタデータの書き出し処理を実行し、上記一時的に記憶されたメタデータと、上述した物体領域位置データとを対応付け、これらのデータ含むＸＭＬファイルを動画像ファイルのあるディレクトリに保存（ユーザにより指定された任意のディレクトリに保存されても良い）する。 In step S19 shown in FIG. 10, when the save button 60c shown in FIG. 9B is designated by the user, the system control unit 7 executes a metadata writing process, and the temporarily stored metadata. And the object region position data described above are associated with each other, and an XML file including these data is stored in a directory having a moving image file (may be stored in an arbitrary directory designated by the user).

なお、メタデータと物体領域位置データは、ＸＭＬファイル形式で格納されることに限定されるものではなく、その他のファイル形式で格納されるものであっても良い。また、メタデータの書き出し処理は、上記メタデータ入力受付準備処理以降、任意のタイミングで実行可能である。 The metadata and the object area position data are not limited to being stored in the XML file format, but may be stored in other file formats. Further, the metadata writing process can be executed at an arbitrary timing after the metadata input reception preparation process.

次いで、ステップＳ２０では、システム制御部７は、ユーザから当該動画像において別の物体領域の抽出指示があったか否かを判別し、当該抽出指示があった場合（例えば、図６に示す表示画面上に表示された保存ボタン６０ｃが指定された後に表示される図２（Ｂ）に示す表示画面上に表示された新規作成ボタン５３ａが指定されることによる）には（ステップＳ２０：ＹＥＳ）、ステップＳ４に移行し上記と同様の処理を行い、当該抽出指示がない場合には（ステップＳ２０：ＮＯ）、当該処理を終了する。 Next, in step S20, the system control unit 7 determines whether the user has instructed extraction of another object region in the moving image. If so (for example, by the new creation button 53a being designated on the display screen shown in FIG. 2(B), which is displayed after the save button 60c on the display screen shown in FIG. 6 is designated) (step S20: YES), the process moves to step S4 and the same processing is repeated. If not (step S20: NO), the process ends.
［III．評価実験例］
[III. Evaluation experiment example]
次に、物体領域追跡の性能評価として、上述した本発明による方法と、従来方法とを比較した評価実験例について、図１２を用いて説明する。なお、従来方法には、物体領域内の全ての画素をオプティカルフローにより追跡（つまり、物体領域自体を追跡）する基本的な手法を用いる。また、この性能評価では、動画像中の各フレーム画像で追跡（本発明では手がかり追跡による物体領域追跡）された物体領域（本発明による方法と従来方法で同じ物体領域）の正確さ（言い換えれば、精度）をprecision及びrecallにより評価する。なお、precision S_Pとrecall S_Rは次式で例えば定義される。 Next, as a performance evaluation of object region tracking, an evaluation experiment comparing the above-described method according to the present invention with a conventional method will be described with reference to FIG. 12. The conventional method is a basic technique that tracks every pixel in the object region by optical flow (that is, it tracks the object region itself). In this evaluation, the accuracy of the object region tracked in each frame image of the moving image (in the present invention, tracked by way of clue tracking; the same object region is used for both methods) is evaluated by precision and recall. Precision S_P and recall S_R are defined, for example, by the following equations.
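The equations themselves are not reproduced in this text. A reconstruction consistent with the stated meanings of S_P and S_R, and with the symbols N(R), R_h, and R_c defined in the next paragraph, is the usual pair of pixel-count ratios. This is an inference from those definitions, not a copy of the original formula.

```latex
S_P = \frac{N(R_h \wedge R_c)}{N(R_c)}, \qquad
S_R = \frac{N(R_h \wedge R_c)}{N(R_h)}
```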

ここで，N(R)は領域Rの画素数，R_hは真の物体領域，R_cは抽出された物体領域，R_h∧R_cはR_hとR_cの共通領域を表す。precision S_Pは、追跡された物体領域内で真の物体領域が占める割合を表し、recall S_Rは、真の物体領域内で追跡できた物体領域の割合を表す。真の物体領域とは各フレーム画像で予め人手により抽出しておいた物体領域である。 Here, N(R) is the number of pixels in region R, R_h is the true object region, R_c is the extracted object region, and R_h∧R_c is the region common to R_h and R_c. Precision S_P is the proportion of the tracked object region occupied by the true object region, and recall S_R is the proportion of the true object region that was tracked. The true object region is the object region extracted manually in advance in each frame image.

図１２（Ａ）は、動画像中のあるフレーム画像を示す図であり、図１２（Ｂ）は、本発明による方法の評価実験結果を表すグラフであり、図１２（Ｃ）は、従来方法の評価実験結果を表すグラフである。なお、図１２（Ａ）のフレーム画像において、符号１０１部の太線で表された物体領域（この例では、帽子）が追跡（抽出）対象である。また、図１２（Ｂ），（Ｃ）に示すグラフにおいて、横軸がフレーム数（フレーム画像の数であり、この例では、５０枚とした）、縦軸がprecision（図中、実線）とrecall（図中、点線）を表す。 FIG. 12(A) shows one frame image in the moving image, FIG. 12(B) is a graph of the evaluation results for the method according to the present invention, and FIG. 12(C) is a graph of the evaluation results for the conventional method. In the frame image of FIG. 12(A), the object region outlined with a thick line at reference numeral 101 (a hat in this example) is the tracking (extraction) target. In the graphs of FIGS. 12(B) and 12(C), the horizontal axis is the frame number (50 frame images in this example), and the vertical axis is precision (solid line) and recall (dotted line).

従来方法では、図１２（Ｃ）に示すように、フレーム数の増加に伴いprecision及びrecallが次第に低下することが確認された。従来方法において、recallが次第に且つ大きく低下しているのは、物体領域内の全画素を追跡する際、異なる複数の画素を同一の場所に追跡してしまうことにより、物体領域を示す画素数が減少してしまうことが原因であると考えられる。 In the conventional method, as shown in FIG. 12C, it has been confirmed that precision and recall gradually decrease as the number of frames increases. In the conventional method, recall is gradually and greatly reduced because when tracking all the pixels in the object region, the number of pixels indicating the object region is decreased by tracking different pixels to the same location. It is thought that the cause is that it decreases.
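The effect described here can be reproduced with a toy example. The sketch below computes precision and recall as pixel-count ratios over sets of (x, y) pixels and shows how one bad flow vector, which sends two pixels to the same location, lowers recall. The data and function names are hypothetical and do not come from the experiment.

```python
def precision_recall(true_region, extracted_region):
    """Return (S_P, S_R) for two sets of (x, y) pixels."""
    common = len(true_region & extracted_region)
    precision = common / len(extracted_region) if extracted_region else 0.0
    recall = common / len(true_region) if true_region else 0.0
    return precision, recall

def track_pixels(region, flow):
    """Move every pixel by its flow vector; colliding pixels merge in the set."""
    return {(x + flow[(x, y)][0], y + flow[(x, y)][1]) for x, y in region}

truth = {(x, 0) for x in range(4)}   # the object does not move in this toy
flow = {pixel: (0, 0) for pixel in truth}
flow[(1, 0)] = (-1, 0)               # one bad vector sends (1, 0) onto (0, 0)
tracked = track_pixels(truth, flow)
p, r = precision_recall(truth, tracked)
```

Four pixels become three after one step, so recall drops to 0.75 while precision stays at 1.0. Repeating such collisions over many frames shrinks the region further, in line with the explanation given for the conventional method.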

一方、本発明による方法では、図１２（Ｂ）に示すように、precisionが次第に低下するということが見られないが、４７枚目のフレーム画像においてのみprecisionが大きく低下（追跡に失敗）していることが確認された。precisionが次第に低下しないのは、手がかりを追跡してから、それをもとに物体領域を抽出するという２段階の処理を行っているため、手がかりの追跡の誤差が直接的にはprecisionに影響しないということが原因として考えられる。また、４７枚目のフレーム画像においてprecisionが大きく低下している原因は、物体領域を指定した手がかりが境界付近にあることとその境界のエッジ強度が小さいことにあると考えられる。同様の手がかりでありながら成功と失敗に分かれるのは、フレーム画像間で明度や彩度などに多少の違いが生じ、それによってエッジ強度が変化することが原因と考えられる。 On the other hand, in the method according to the present invention, as shown in FIG. 12B, it is not seen that the precision gradually decreases, but only in the 47th frame image, the precision greatly decreases (tracking fails). It was confirmed that The precision does not decrease gradually because the tracking of the clue and then extracting the object area based on it is performed in two steps, so the tracking error of the clue does not directly affect the precision. This is considered as a cause. Also, the reason why the precision is greatly reduced in the 47th frame image is considered to be that the cue specifying the object region is near the boundary and the edge strength of the boundary is small. Although it is the same clue, it is considered that the reason for being divided into success and failure is that there are some differences in lightness and saturation between the frame images, and the edge strength changes accordingly.

以上の評価実験より、本発明による方法は、追跡の誤差が直接物体領域抽出の誤差に影響しないため、従来方法による物体領域自体を追跡する手法と比べ、精度の良い追跡が可能となると言える。また、追跡に失敗した場合には、従来方法に比べて急激なprecision低下が生じることも確認できたが、追跡に失敗した場合に極端なprecisionの低下が生じることは、物体領域の面積に急激な変化があったことを示している。従って、上述したように、物体領域の面積変化を測定（一致度を算出）することで、コンピュータ（システム制御部７）が追跡に失敗したフレーム画像を自動検知でき、ユーザに対して修正すべきフレーム画像を容易に提示することも可能になり、より修正作業の負担の少ないシステムが実現可能である。 From the above evaluation experiment, it can be said that the method according to the present invention enables tracking with higher accuracy than the method of tracking the object region itself according to the conventional method because the tracking error does not directly affect the error of object region extraction. In addition, when tracking failed, it was confirmed that there was a sharp decrease in precision compared to the conventional method, but when tracking failed, an extreme decrease in precision occurred rapidly in the area of the object region. It shows that there was a change. Therefore, as described above, by measuring the area change of the object region (calculating the degree of coincidence), the computer (system control unit 7) can automatically detect the frame image that has failed to be tracked and should be corrected for the user. It is also possible to easily present a frame image, and a system with less burden of correction work can be realized.

また、本発明による方法では、局所的に追跡ミスが発生するのは、最初のフレーム画像で与えた手がかりとしての例えば線の情報が少ないことに起因すると考えられる。そこで、検証として最初のフレーム画像において、線を数本追加し、再度追跡を行った結果、最初のフレーム画像では物体領域の抽出結果は変わらないが、線の追加をした場合ではprecisionとrecallの局所的な低下が見られず、精度が向上した追跡結果であることが確認された。数本の線の追加により追跡結果を改善できることは、コンピュータによる追跡結果の自動修正を可能とする。上述のように追跡に失敗したフレーム画像を自動検知することができれば、修正できたかどうかをコンピュータで判断することができるため、例えばユーザにより指定された線から予め設定しておいた距離内に線の追加などを自動的に行うことで自動修正が可能となる。 Further, in the method according to the present invention, it is considered that the reason why the tracking error occurs locally is due to a small amount of information of, for example, a line as a clue given in the first frame image. Therefore, as a result of the verification, after adding several lines in the first frame image and tracking again, the extraction result of the object area does not change in the first frame image, but when adding lines, the precision and recall It was confirmed that the tracking results were improved in accuracy without any local degradation. The ability to improve tracking results by adding a few lines allows automatic correction of tracking results by a computer. If it is possible to automatically detect a frame image that has failed to be tracked as described above, it can be determined by a computer whether or not the frame image has been corrected. For example, a line within a distance set in advance from a line designated by the user. Automatic correction can be performed by automatically adding and the like.

また、上記自動修正の他の例として、誤っている手がかりを修正して変更するように構成しても良い。 Further, as another example of the automatic correction described above, it may be configured to correct and change an erroneous clue.

図１３は、誤っている手がかりが自動修正される例を示す図である。例えば、ｎフレーム画像での手がかりを追跡した結果であるｎ＋１フレーム画像での手がかりをもとに、ｎ＋１フレーム画像での物体領域の抽出で失敗したことが検出されたとき、手がかり自動修正が実行され、ｎ＋１フレーム画像での手がかりにおける曲率、座標分布などにより、特異である点、領域などが判別され、予め指定した閾値に収まるように、特異点あるいは特異領域の削除、追加、修正が行われる。例えば、図１３（右側）に示すように、手がかりが複数の直線からなるポリラインである場合は、システム制御部７は、曲率などにより特異点を判断し、その点を削除し、削除した点に接続している点を直線で結び、その中間に点を置くことで修正を行う。 FIG. 13 is a diagram illustrating an example in which an erroneous clue is automatically corrected. For example, when it is detected that the extraction of the object region in the n + 1 frame image has failed based on the clue in the n + 1 frame image, which is the result of tracking the clue in the n frame image, the clue automatic correction is executed. Specific points and regions are discriminated based on curvature, coordinate distribution, etc. in the clues in the n + 1 frame image, and singular points or singular regions are deleted, added, and corrected so as to fall within a predetermined threshold value. For example, as shown in FIG. 13 (right side), when the clue is a polyline composed of a plurality of straight lines, the system control unit 7 determines a singular point based on the curvature and the like, deletes the point, and deletes the point. Connect the connected points with a straight line, and fix the point by placing it in the middle.
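For the polyline case, the repair step can be sketched as follows. The turning angle at each interior vertex is used here as a simple stand-in for curvature, and the 100-degree limit is an arbitrary illustrative threshold; both are assumptions, as are the function names.

```python
import math

def turning_angle(a, b, c):
    """Angle in radians between segments a->b and b->c (0 means straight)."""
    v1 = (b[0] - a[0], b[1] - a[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.acos(max(-1.0, min(1.0, cos)))

def repair_polyline(points, max_angle):
    """Replace each interior vertex that turns more than max_angle with the
    midpoint of its two neighbours (delete, reconnect, insert midpoint)."""
    fixed = list(points)
    for i in range(1, len(fixed) - 1):
        a, b, c = fixed[i - 1], fixed[i], fixed[i + 1]
        if turning_angle(a, b, c) > max_angle:
            fixed[i] = ((a[0] + c[0]) / 2, (a[1] + c[1]) / 2)
    return fixed

# The vertex (20, 30) is a spike produced by a tracking error.
clue = [(0, 0), (10, 0), (20, 30), (30, 0), (40, 0)]
repaired = repair_polyline(clue, max_angle=math.radians(100))
```

Deleting the spike, joining its neighbours with a straight line, and placing a point midway is done here in a single replacement, so the number of vertices is unchanged.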

かかる自動修正後は、図示の如く、ｎ＋１フレーム画像から後のフレーム画像において手がかり追跡処理が継続されても良い。 After such automatic correction, as shown in the figure, the cue tracking process may be continued in the frame image after the n + 1 frame image.

なお、上記実験結果は、あくまでも一例であり、その結果は、物体領域の大きさや、物体領域と背景領域との境界のエッジ強度等により変わりうるものであるが、上記実験結果は、本発明による方法と従来方法の結果の違いがより顕著に現れた例を示したものである。 The above experimental results are merely an example and may vary with the size of the object region, the edge strength of the boundary between the object region and the background region, and so on; they show a case in which the difference between the method according to the present invention and the conventional method appears clearly.
［IV．効果］
[IV. Effects]
以上説明したように、上記実施形態によれば、手がかりが指定されたフレーム画像以外の他のフレーム画像において物体領域を追跡するのではなく、手がかりを追跡し、各フレーム画像において追跡された手がかりに基づき物体領域を抽出するように構成したので、上記評価実験からも分かるように追跡の誤差が直接物体領域抽出の誤差に影響せず、従来方法による物体領域自体を追跡する手法と比べ、より高い精度での物体領域の抽出を行うことができると共に、追跡失敗が発生することを前提とし、あるフレーム画像で物体領域抽出に失敗した（追跡に失敗した）としても、その失敗が次のフレーム画像での物体画像抽出には影響しないという効果を有する。さらに、抽出された物体領域自体を追跡する場合と比べて、失敗したフレーム画像内で手がかりを追加、削除する修正作業をすれば良く、ユーザ（人手）による作業負担を低減させることができる（追跡失敗のユーザによる修正作業が容易となる）。 As described above, according to the embodiment, in frame images other than the one in which the clue was designated, the clue is tracked instead of the object region, and the object region is extracted in each frame image based on the tracked clue. As the evaluation experiment shows, tracking errors therefore do not directly affect object region extraction errors, and the object region can be extracted more accurately than with the conventional approach of tracking the object region itself. In addition, on the premise that tracking failures will occur, even if object region extraction fails (tracking fails) in a certain frame image, that failure does not affect object image extraction in the next frame image. Furthermore, compared with tracking the extracted object region itself, correction only requires adding or deleting clues in the failed frame image, which reduces the user's (manual) workload (tracking failures become easier for the user to correct).

また、手がかりを追跡するので、物体領域を追跡するときより小さいデータ量で追跡できるため、追跡処理を高速化することができる。 In addition, because the clue is tracked, tracking involves less data than tracking the object region itself, so the tracking process can be made faster.

また、追跡に失敗した場合には、従来方法に比べて急激な精度低下が生じうるので、例えば、物体領域の面積を測定して一致度を算出すれば、容易に失敗フレーム画像を検出することができ、ユーザに対して修正すべきフレーム画像を容易に提示することも可能になり、ユーザによる手がかりを追加、削除する修正作業を効率化、迅速化することができると共に、修正作業の負担をより低減させることができる。 Also, when tracking fails, the accuracy can be drastically reduced compared to the conventional method.For example, if the degree of coincidence is calculated by measuring the area of the object region, the failure frame image can be easily detected. It is also possible to easily present a frame image to be corrected to the user, making it possible to improve and speed up the correction work for adding and deleting clues by the user and reducing the burden of the correction work. It can be further reduced.

また、追跡に失敗したフレーム画像において手がかりを追加、削除する修正作業を行うだけで再指定が実現され、物体領域の再抽出が行われるので、少ない修正量で物体領域の抽出失敗を修正することができ、ユーザによる作業負担をより低減させることができる。 In addition, it is possible to re-specify the object area by re-extracting the object area simply by adding or deleting clues in the frame image that has failed to be tracked, so that the object area extraction failure can be corrected with a small amount of correction. The work burden on the user can be further reduced.

また、図６に示すように、上述した一致度が失敗検出閾値以下である失敗フレーム画像を含む一致度が小さいフレーム画像と共に、夫々のフレーム画像の一致度を表す棒グラフを夫々のフレーム画像に対応付させて表示するようにしたので、ユーザはどの位置にあるフレーム画像の一致度が低いかを一目で把握することができ、ユーザによる修正作業を効率化、迅速化することができる。 Further, as shown in FIG. 6, a bar graph indicating the degree of coincidence of each frame image is associated with each frame image together with the frame image having a small degree of coincidence including the failure frame image whose coincidence degree is equal to or less than the failure detection threshold. Since the image is attached and displayed, the user can grasp at a glance at which position the matching degree of the frame image is low, and the correction work by the user can be made efficient and quick.

更に、一致度が失敗検出閾値以下である失敗フレーム画像に対応する棒グラフと、一致度が失敗検出閾値より大きいフレーム画像に対応する棒グラフと、を互いに異なる表示態様（例えば色や模様等が異なる）で表示するようにしたので、ユーザは一目で失敗フレーム画像を把握することができる。 Further, the bar graph corresponding to the failure frame image whose coincidence degree is equal to or lower than the failure detection threshold and the bar graph corresponding to the frame image having a coincidence degree greater than the failure detection threshold are different from each other in display mode (for example, different in color, pattern, etc.) The user can grasp the failed frame image at a glance.

また、動画像中のシーン全体に対してメタデータを付与するのではなく、シーンの中に含まれる物体領域に直接メタデータを付与することができ、シーンに複数の物体領域が存在する場合でも、複数の物体領域それぞれに対応するメタデータを直接付与することができる。 In addition, metadata is not assigned to the entire scene in the moving image, but metadata can be directly attached to the object area included in the scene, even when there are multiple object areas in the scene. The metadata corresponding to each of the plurality of object regions can be directly given.

更に、メタデータに付加される物体領域位置データとして、物体領域の外枠の座標とするのではなく、例えば抽出された物体領域を囲む円や外接円の中心の座標と半径により定義するように構成すれば、保存されるデータの量を減らすことができる。 Furthermore, the object area position data added to the metadata is not defined by the coordinates of the outer frame of the object area but, for example, by the coordinates and radius of the center of the circle or circumscribed circle surrounding the extracted object area. If configured, the amount of data stored can be reduced.

FIG. 1: 本実施形態に係る画像編集装置Ｓの概要構成例を示す図である。 A diagram showing an example of the schematic configuration of the image editing apparatus S according to the present embodiment.
FIGS. 2 to 9: 動画像編集アプリケーションプログラムの実行過程において表示部２におけるディスプレイ上に表示される表示画面例を示す図である。 Diagrams each showing an example of a display screen shown on the display of the display unit 2 during execution of the moving image editing application program.
FIG. 10: 本実施形態に係る画像編集装置Ｓのシステム制御部７における処理を示すフローチャートである。 A flowchart showing the processing in the system control unit 7 of the image editing apparatus S according to the present embodiment.
FIG. 11: 図１０に示すメタデータ付与処理の詳細を示すフローチャートである。 A flowchart showing the details of the metadata providing process shown in FIG. 10.
FIG. 12: （Ａ）は、動画像中のあるフレーム画像を示す図であり、（Ｂ）は、本発明による方法の評価実験結果を表すグラフであり、（Ｃ）は、従来方法の評価実験結果を表すグラフである。 (A) shows one frame image in the moving image, (B) is a graph of the evaluation results for the method according to the present invention, and (C) is a graph of the evaluation results for the conventional method.
FIG. 13: 誤っている手がかりが自動修正される例を示す図である。 A diagram showing an example in which an erroneous clue is automatically corrected.

Explanation of symbols

１ 操作部 (operation unit)
２ 表示部 (display unit)
３ 通信部 (communication unit)
４ ドライブ部 (drive unit)
５ 記憶部 (storage unit)
６ 入出力インターフェース部 (input/output interface unit)
７ システム制御部 (system control unit)
７ａ ＣＰＵ (CPU)
７ｂ ＲＯＭ (ROM)
７ｃ ＲＡＭ (RAM)
８ システムバス (system bus)
Ｓ 画像編集装置 (image editing apparatus)

Claims

An object region extraction processing program that causes a computer to function as:
display control means for displaying at least one of a plurality of frame images constituting a moving image;
reception means for receiving, from a user, designation of line segments serving as clues for extracting an object region included in the displayed frame image, the line segments being drawn separately on the object region and on a region other than the object region;
object region extraction means for extracting the object region from the frame image based on the designated line segments serving as clues; and
clue tracking means for tracking, in another frame image other than the frame image in which the line segments serving as clues were designated, the movement of the designated line segments,
wherein the object region extraction means further extracts an object region from the other frame image based on the tracked line segments serving as clues.

In the object region extraction processing program according to claim 1,
The object region extraction processing program that continuously tracks the movement of the designated line segment in the plurality of other frame images.

In the object region extraction processing program according to claim 1 or 2,
An object region extraction processing program that causes the computer to further function as failure detection means for detecting a frame image in which extraction of the object region based on the tracked line segment has failed.

In the object region extraction processing program according to claim 3,
An object area extraction processing program causing the computer to further function as information presenting means for presenting information indicating the detected frame image.

In the object region extraction processing program according to claim 3,
The display control means displays the detected frame image,
The accepting unit accepts re-designation of a line segment as a clue to the displayed frame image from a user,
The object region extraction processing program, wherein the object region extraction unit extracts an object region from a frame image in which a line segment as a clue is redesignated.

In the object region extraction processing program according to any one of claims 3 to 5,
The failure detection means compares the object region extracted in the reference frame image with the object region extracted in another frame image other than the frame image, and calculates the degree of coincidence thereof. An object area extraction processing program for detecting a frame image in which extraction of an object area has failed.

In the object region extraction processing program according to claim 1,
The object region extraction means continuously tracks, in a plurality of other frame images other than the frame image in which the line segment serving as the clue is designated, the movement of the designated line segment,
The object region extraction means extracts an object region from each of those frame images based on the tracked line segment,
The computer is caused to further function as degree-of-coincidence calculation means that compares the object region extracted in the frame image in which the line segment serving as the clue is designated with the object region extracted in each of the plurality of other frame images, and calculates the degree of coincidence for each,
The display control means displays the frame image in which the calculated degree of coincidence is equal to or less than a predetermined threshold value.

In the object region extraction processing program according to claim 7,
The object region extraction processing program, wherein the display control means displays, in association with each frame image whose calculated degree of coincidence is equal to or less than the predetermined threshold, a display object from which the user can visually grasp that degree of coincidence.

In the object region extraction processing program according to claim 8,
The object region extraction processing program, wherein the display control means displays, in mutually different display modes, the display object corresponding to a frame image whose calculated degree of coincidence is equal to or less than a failure detection threshold that is smaller than the predetermined threshold, and the display object corresponding to a frame image whose calculated degree of coincidence is greater than the failure detection threshold.

In the object region extraction processing program according to claim 1,
The object region extraction means continuously tracks, in a plurality of other frame images other than the frame image in which the line segment serving as the clue is designated, the movement of the designated line segment,
The object region extraction means extracts an object region from each of those frame images based on the tracked line segment,
The computer is caused to further function as degree-of-coincidence calculation means that compares the object region extracted in the frame image in which the line segment serving as the clue is designated with the object region extracted in each of the plurality of other frame images, and calculates the degree of coincidence for each,
The display control means displays a predetermined number of the frame images, selected in ascending order of calculated degree of coincidence starting from the smallest.

In the object region extraction processing program according to claim 10,
The object region extraction processing program, wherein the display control means displays, in association with each of the predetermined number of frame images, a display object from which the user can visually grasp the calculated degree of coincidence.

In the object region extraction processing program according to claim 11,
The object region extraction processing program, wherein the display control means displays, in mutually different display modes, the display object corresponding to a frame image whose calculated degree of coincidence is equal to or less than a failure detection threshold, and the display object corresponding to a frame image whose calculated degree of coincidence is greater than the failure detection threshold.
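
As an illustration only, the frame selection recited in claims 7 to 12 can be sketched as follows. The claims do not prescribe an implementation; the function names, the dictionary mapping frame numbers to degrees of coincidence, and the mode labels "failed" and "ok" are all hypothetical.

```python
import heapq

def frames_at_or_below(scores, threshold):
    """Frames whose degree of coincidence is <= threshold (cf. claim 7).

    `scores` is a hypothetical dict: frame number -> degree of coincidence.
    """
    return [frame for frame, score in sorted(scores.items()) if score <= threshold]

def lowest_n_frames(scores, n):
    """The n frames with the smallest degree of coincidence (cf. claim 10).

    Ties are broken by frame number so the result is deterministic.
    """
    ranked = heapq.nsmallest(n, scores.items(), key=lambda item: (item[1], item[0]))
    return [frame for frame, _ in ranked]

def display_mode(score, failure_threshold):
    """Pick a display mode for the display object (cf. claims 9 and 12).

    The labels are placeholders; a real UI might use colors or icons.
    """
    return "failed" if score <= failure_threshold else "ok"

if __name__ == "__main__":
    scores = {1: 0.9, 2: 0.4, 3: 0.7, 4: 0.2}
    print(frames_at_or_below(scores, 0.7))
    print(lowest_n_frames(scores, 2))
    print({f: display_mode(s, 0.3) for f, s in scores.items()})
```

Both selection rules operate on the same per-frame scores, so an implementation could offer either view to the user without recomputing the degrees of coincidence.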

In the object region extraction processing program according to claim 1,
The object region extraction means continuously tracks, in a plurality of other frame images other than the frame image in which the line segment serving as the clue is designated, the movement of the designated line segment,
The object region extraction means extracts an object region from each of those frame images based on the tracked line segment,
The computer is caused to further function as degree-of-coincidence calculation means that compares the object region extracted in the frame image in which the line segment serving as the clue is designated with the object region extracted in each of the plurality of other frame images, and calculates the degree of coincidence for each,
The object region extraction processing program, wherein the display control means displays each frame image for which the degree of coincidence has been calculated and, in association with each such frame image, a display object from which the calculated degree of coincidence can be visually grasped.

In the object region extraction processing program according to claim 13,
The object region extraction processing program, wherein the display control means displays, in mutually different display modes, the display object corresponding to a frame image whose calculated degree of coincidence is equal to or less than a failure detection threshold, and the display object corresponding to a frame image whose calculated degree of coincidence is greater than the failure detection threshold.

In the object region extraction processing program according to any one of claims 6 to 14,
The object region extraction processing program, wherein the comparison uses at least one of a color histogram of the object region, an area of the object region, a shape of the object region, a texture of the object region, and a centroid position of the object region.
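
One possible way to compute a degree of coincidence from two of the features listed in claim 15 (color histogram and area) is sketched below. The claim does not fix a formula: histogram intersection, the min/max area ratio, the equal weights, and every function name here are assumptions chosen for illustration.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Normalized coarse RGB histogram of (r, g, b) pixels with values 0-255."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()} if total else {}

def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; 1.0 for identical normalized histograms."""
    return sum(min(v, h2.get(bin_, 0.0)) for bin_, v in h1.items())

def area_ratio(area1, area2):
    """Smaller area divided by larger area; 0.0 if either region is empty."""
    if area1 == 0 or area2 == 0:
        return 0.0
    return min(area1, area2) / max(area1, area2)

def degree_of_coincidence(ref_pixels, other_pixels, w_color=0.5, w_area=0.5):
    """Weighted mix of color similarity and area similarity, in [0, 1].

    Each argument is the list of pixels inside one extracted object region;
    the pixel count is used as the region's area. The weights are assumed.
    """
    color = histogram_intersection(color_histogram(ref_pixels),
                                   color_histogram(other_pixels))
    area = area_ratio(len(ref_pixels), len(other_pixels))
    return w_color * color + w_area * area

if __name__ == "__main__":
    red_region = [(255, 0, 0)] * 10
    small_blue_region = [(0, 0, 255)] * 5
    print(degree_of_coincidence(red_region, red_region))
    print(degree_of_coincidence(red_region, small_blue_region))
```

A score near 1 suggests the tracked extraction still matches the reference region, while a low score marks a candidate for the failure detection described in the earlier claims.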

In the object region extraction processing program according to any one of claims 1 to 15,
The object region extraction processing program, wherein the computer is further caused to function as metadata adding means that adds, to the object region extracted based on the tracked line segment, metadata including information on the object represented in that object region.

In the object region extraction processing program according to claim 16,
The object region extraction processing program, wherein the metadata adding means adds the metadata to each of the plurality of extracted object regions.

In the object region extraction processing program according to claim 16 or 17,
The object region extraction processing program, wherein the metadata includes position information indicating the position of the extracted object region in the frame image.

In the object region extraction processing program according to claim 18,
The object region extraction processing program, wherein the position information is the coordinates of at least one point on a line segment drawn on or near the extracted object region.

Display control means for displaying at least one frame image among a plurality of frame images constituting the moving image;
Accepting means for accepting, from a user, designation of line segments serving as clues for extracting an object region included in the displayed frame image, the line segments being drawn separately on the object region and on a region other than the object region;
An object region extracting means for extracting an object region from the frame image based on the line segment serving as the designated clue;
Tracking means for tracking, in another frame image other than the frame image in which the line segment serving as the clue is designated, the movement of the designated line segment;
The object region extraction device comprising the above means,
Wherein an object region is further extracted from the other frame image based on the tracked line segment serving as a clue.

An object region extraction method performed by a computer,
Displaying at least one frame image of a plurality of frame images constituting the moving image;
A step of accepting, from a user, designation of line segments serving as clues for extracting an object region included in the displayed frame image, the line segments being drawn separately on the object region and on a region other than the object region;
An object region extracting step of extracting an object region from the frame image based on the line segment serving as the designated clue;
A step of tracking, in another frame image other than the frame image in which the line segment serving as the clue is designated, the movement of the designated line segment,
The method including the above steps,
Wherein, in the object region extracting step, an object region is further extracted from the other frame image based on the tracked line segment.