JPWO2014013690A1

JPWO2014013690A1 - Comment information generating apparatus and comment information generating method

Info

Publication number: JPWO2014013690A1
Application number: JP2013557995A
Authority: JP
Inventors: 亜矢子丸山; 登　一生; 一生登; 浩市堀田; 州平笹倉
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2012-07-17
Filing date: 2013-07-04
Publication date: 2016-06-30
Anticipated expiration: 2033-07-04
Also published as: CN103797783B; JP5659307B2; US20140196082A1; WO2014013690A1; CN103797783A; US20160309239A1; US9681201B2

Abstract

A comment information generating apparatus (100) includes: a moving image acquisition unit (101) that receives a moving image; a comment input receiving unit (102) that receives input of position information of an object in the moving image received by the moving image acquisition unit (101) and of a comment to be displayed following the object from a specific timing; a target time determination unit (106) that determines, based on the comment received by the comment input receiving unit (102), a target time, which is the target value of the length of time for which the comment remains displayed; an object motion determination unit (107) that determines the movement trajectory of the object, used to display the comment following the object indicated by the position information, so that the trajectory has a time length sufficiently close to the target time; an output comment information generation unit (104) that generates output comment information including the comment and the movement trajectory of the object determined by the object motion determination unit (107); and an output unit (105) that outputs the output comment information.

Description

The present invention relates to a comment information generating apparatus and a comment information generating method for generating comment information used to superimpose a comment on a moving image so that the comment follows an object in the moving image.

In recent years, with the development of networks and the spread of mobile terminals, it has become commonplace to communicate casually with people in remote locations through video or still-image content over a network. For example, there are services that let users write their individual preferences and thoughts as text at a chosen point on the time axis of existing content. Such content-mediated communication is expected to keep growing as display devices and communication technology advance.

To realize such communication, Patent Document 1 discloses a technique that generates a moving image in which comments added by users are composited onto the moving image and distributes it over the Internet.

Patent Document 1: JP 2008-148071 A
Patent Document 2: JP 2010-244437 A
Patent Document 3: International Publication No. 2010/116820
Patent Document 4: JP 2004-128614 A
Patent Document 5: JP 2009-81592 A
Patent Document 6: JP 2003-132047 A
Patent Document 7: Japanese Patent No. 4994525

Non-Patent Document 1: P. Anandan, "A Computational Framework and an Algorithm for the Measurement of Visual Motion", International Journal of Computer Vision, Vol. 2, pp. 283-310, 1989
Non-Patent Document 2: Vladimir Kolmogorov and Ramin Zabih, "Computing Visual Correspondence with Occlusions via Graph Cuts", International Conference on Computer Vision, 2001
Non-Patent Document 3: Jianbo Shi and Carlo Tomasi, "Good Features to Track", IEEE Conference on Computer Vision and Pattern Recognition, pp. 593-600, 1994
Non-Patent Document 4: Pedro F. Felzenszwalb and Daniel P. Huttenlocher, "Efficient Graph-Based Image Segmentation", International Journal of Computer Vision, Vol. 59, No. 2, pp. 167-181, Sept. 2004

However, with conventional comment superimposition (or commented moving image generation) methods, comments are displayed at positions unrelated to the objects in the moving image, and the comment display time is fixed. Viewers therefore find it hard to tell which object each commenter's comment refers to. In addition, a very long comment cannot be read to the end within the fixed display time. As a result, the visibility of comments to viewers is reduced.

The present invention has been made to solve the above problem, and its object is to provide a comment information generating apparatus and a comment information generating method that generate output comment information capable of improving the visibility of comments.

To achieve the above object, a comment information generating apparatus according to one aspect of the present invention includes: a moving image acquisition unit that receives a moving image; a comment input receiving unit that receives input of position information of an object in the moving image received by the moving image acquisition unit and of a comment to be displayed following the object from a specific timing; a target time determination unit that determines, based on the comment received by the comment input receiving unit, a target time, which is the target value of the length of time for which the comment remains displayed; an object motion determination unit that determines the movement trajectory of the object, used to display the comment following the object indicated by the position information, so that the trajectory has the time length of the target time; an output comment information generation unit that generates output comment information including the comment and the movement trajectory of the object determined by the object motion determination unit; and an output unit that outputs the output comment information generated by the output comment information generation unit.

These general or specific aspects may be implemented as a system, a method, an integrated circuit, a computer program, or a computer-readable recording medium such as a CD-ROM, or as any combination of systems, methods, integrated circuits, computer programs, and recording media.

According to the present invention, output comment information that can improve the visibility of comments can be generated.

FIG. 1 is a diagram showing the configuration of the commented moving image distribution system targeted by the embodiment.
FIG. 2A is a diagram showing the temporal flow of adding comments to a moving image.
FIG. 2B is a diagram showing the temporal flow of adding comments to a moving image and viewing the moving image.
FIG. 3 is a block diagram showing the functional configuration of the comment information generating apparatus according to the embodiment.
FIG. 4A is a diagram showing the hardware configuration of a comment information generating apparatus implemented by a computer.
FIG. 4B is a diagram showing the hardware configuration of a comment information generating apparatus implemented by a computer.
FIG. 5 is a flowchart showing the operating procedure of the comment information generating apparatus.
FIG. 6A is a diagram showing an example of a moving image.
FIG. 6B is a diagram showing subject regions in a moving image.
FIG. 7A is a diagram for explaining the procedure by which a user inputs a comment on a moving image.
FIG. 7B is a diagram showing an example of input comment information data.
FIG. 8A is a diagram for explaining how the object motion determination unit calculates a movement trajectory.
FIG. 8B is a diagram for explaining how the object motion determination unit calculates a movement trajectory.
FIG. 9A is a diagram showing a movement trajectory when the tolerance for noise or model error is lowered.
FIG. 9B is a diagram showing a movement trajectory when the tolerance for noise or model error is raised.
FIG. 10A is a diagram for explaining motion estimation processing when occlusion occurs.
FIG. 10B is a diagram for explaining motion estimation processing when a scene change occurs.
FIG. 11A is a diagram for explaining the extension of a movement trajectory by tracing back from the comment start time.
FIG. 11B is a diagram for explaining an example of calculating a movement trajectory that satisfies the target time from within a predetermined pixel range.
FIG. 12A is a diagram for explaining an example of calculating a movement trajectory that satisfies the target time from within a region designated by the user.
FIG. 12B is a diagram for explaining an example of calculating a movement trajectory that satisfies the target time from within the same region produced by the region dividing unit.
FIG. 13A is a diagram showing an example of region movement trajectories when coarse region division is performed.
FIG. 13B is a diagram showing an example of region movement trajectories when fine region division is performed.
FIG. 14 is a diagram showing an example of a database.
FIG. 15 is a block diagram showing the functional configuration of a comment information generating apparatus that includes a target time correction unit.

(Findings underlying the present invention)
The present inventors found that the following problems arise with the conventional comment superimposition methods described in the "Background Art" section.

The system described in Patent Document 1 includes a video server and a comment storage and distribution server. All comments that users write over the network are stored on the comment storage and distribution server, superimposed on the moving image using the playback time at which each comment was written as the reference point, and distributed (hereinafter, such a moving image is called a "commented moving image"). Each time the moving image or commented moving image is distributed and played back, comments newly written by users are associated with times on the moving image time axis and managed on the comment storage and distribution server. In later distributions they are basically delivered according to their time on the moving image time axis, regardless of how old or new they are. Each comment is displayed either flowing across the moving image or fixed at a certain position on it, irrespective of the object the user intended to comment on.

Patent Documents 2 and 3 disclose movie devices that add text information to moving images without going through the Internet. In particular, they propose displaying the additional data in a speech balloon and changing the balloon according to the movement of an object in the moving image, so that it is easy to tell which object the user actually intended the additional data for.

Patent Document 4 discloses a chat system that displays speech balloons. For a user's face image composited at a specific position on a common background, each balloon is displayed so that it does not hide the face image or other balloons.

Patent Document 5 discloses a technique for inputting text or picture information into a video recording apparatus from a mobile phone or mobile terminal. The mobile phone or mobile terminal has a comment tracking menu and uses motion vectors to make a tracking frame designated by the user follow its target.

It is generally said that the number of characters a person can recognize per second is limited. For this reason, for subtitles of movies and the like, voluntary guidelines such as the number of characters displayed per second or the number of words displayed per minute (WPM: Words Per Minute) have been established so that subtitles remain readable during viewing.

As described above, methods for inserting comments into moving images have been proposed. On commented moving image distribution sites that use the technique disclosed in Patent Document 1, the number of characters that can be displayed on one line from one edge of the screen to the other is limited. For this reason, a default setting is applied in which, for example, every comment scrolls from one edge of the screen to the other in a uniform 3 seconds. On the other hand, it is hard to tell which object the user actually intended the comment for.

The present disclosure therefore aims to realize "comments that move following an object". This makes it easier to see which object the user actually intended the comment for, which should improve visibility for viewers and deepen communication. To obtain a comment that moves following an object, however, the coordinates of the movement trajectory calculated by object motion estimation must be used as the coordinates of the comment. If, for example, object motion estimation fails early, the comment display time also becomes short, so the user may be unable to finish reading the comment while it is displayed.

Conversely, controlling the comment display time according to the number of characters in the comment, as is done for subtitles, cannot be realized in a straightforward way either.

Many of the moving images distributed by content providers, as well as personally owned content, contain multiple objects in a scene. Particularly when the objects differ in the magnitude of their motion, their color, and so on, the time length of the object motion estimation result mentioned above often varies with the object the user selects and with the parameters used in the motion estimation process when the trajectory of an object in the moving image is calculated.

In general, object motion estimation obtains motion by identifying the same object region across time-series images. For example, the object in one frame is taken as a model, the object in another frame that best satisfies the model (whose luminance difference from the model is closest to 0) is judged to be the same object, and the motion is calculated from the change in position between the two.

There are also cases in which motion estimation is impossible and should be interrupted, such as when the object no longer appears in the moving image. The luminance difference from the model can also serve as the criterion for this interruption decision.

In motion estimation based on such a luminance difference from a model, there is a trade-off between how much noise or luminance difference from the model is tolerated (that is, the accuracy of motion estimation) and how long the object can be tracked. This trade-off is not uniform across an image. For example, it frequently happens that motion estimation for one object in a moving image is interrupted earlier than the desired tracking time, while motion estimation for another object in the same moving image is not interrupted. In other words, under uniform motion estimation conditions, the time until tracking ends and the accuracy of motion estimation vary with the coordinate position and the time in the moving image that the user designates when adding a comment.
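
The luminance-difference matching and the interruption criterion described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the patent: frames are assumed to be 2-D lists of gray levels, the model is a fixed square block taken from the first frame, and the names `block_sad`, `track`, `size`, `search` and `max_error` are hypothetical.

```python
def block_sad(frame, template, top, left):
    """Sum of absolute luminance differences between the model and a frame block."""
    return sum(
        abs(frame[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def track(frames, start, size, search, max_error):
    """Follow the size x size block at `start` (top, left) through `frames`.

    Returns one block position per tracked frame. Tracking is interrupted as
    soon as the best match differs from the model by more than `max_error`
    (hypothetical error tolerance parameter).
    """
    top, left = start
    template = [row[left:left + size] for row in frames[0][top:top + size]]
    trajectory = [start]
    for frame in frames[1:]:
        height, width = len(frame), len(frame[0])
        best = None
        # Exhaustive search in a window of +/- `search` pixels around the last position.
        for cand_top in range(max(0, top - search), min(height - size, top + search) + 1):
            for cand_left in range(max(0, left - search), min(width - size, left + search) + 1):
                err = block_sad(frame, template, cand_top, cand_left)
                if best is None or err < best[0]:
                    best = (err, cand_top, cand_left)
        if best is None or best[0] > max_error:
            break  # motion estimation is interrupted
        _, top, left = best
        trajectory.append((top, left))
    return trajectory
```

Raising `max_error` in this sketch lets tracking continue through frames where the object is no longer visible, which lengthens the trajectory at the cost of accuracy; this is the trade-off the text describes.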

To solve this problem, a comment information generating apparatus according to one aspect of the present invention includes: a moving image acquisition unit that receives a moving image; a comment input receiving unit that receives input of position information of an object in the moving image received by the moving image acquisition unit and of a comment to be displayed following the object from a specific timing; a target time determination unit that determines, based on the comment received by the comment input receiving unit, a target time, which is the target value of the length of time for which the comment remains displayed; an object motion determination unit that determines the movement trajectory of the object, used to display the comment following the object indicated by the position information, so that the trajectory has the time length of the target time; an output comment information generation unit that generates output comment information including the comment and the movement trajectory of the object determined by the object motion determination unit; and an output unit that outputs the output comment information generated by the output comment information generation unit.

With this configuration, the target time for which the comment should be displayed is determined from the comment, and the movement trajectory of the object is determined so that the duration over which the trajectory continues approaches the target time.

The phrase "so that the trajectory has the time length of the target time" refers, in the broad sense, to a state in which the length of the movement trajectory from the start of object motion estimation until its interruption corresponds to a time length sufficiently close to the target time for displaying the comment. In the narrow sense, it refers to a state in which that length corresponds exactly to the target time. Hereinafter, "target time" means this target time for displaying a comment. How large a difference can be tolerated as "sufficiently close" may depend on the frame rate of the displayed moving image, on the coefficient used to determine the target time, or on the user. The tolerance may therefore be determined in advance by experiment, or the user may be allowed to select it beforehand. For example, a difference of ±0.25 seconds, which corresponds to the display time of one character, causes no sense of incongruity and can be regarded as "sufficiently close", so this value may be used.
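
As a small illustration of the "sufficiently close" criterion, the check below compares a trajectory duration with the target time using the ±0.25 second tolerance mentioned in the text as the default. The function names and the conversion from frame count to seconds are assumptions made for this sketch.

```python
def frames_to_seconds(frame_count, fps):
    """Duration covered by `frame_count` frames at `fps` frames per second."""
    return frame_count / fps


def is_close_enough(trajectory_seconds, target_seconds, tolerance=0.25):
    """True when the trajectory duration is within `tolerance` seconds of the target."""
    return abs(trajectory_seconds - target_seconds) <= tolerance
```

Because the acceptable difference may depend on the frame rate or the user, `tolerance` is left as a parameter rather than fixed.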

The determined movement trajectory of the object can be used as the trajectory along which the comment is displayed following the object. When the commented moving image is displayed, the user can therefore read the comment within its display time and can tell which object the comment was attached to. Output comment information that improves the visibility of comments can thus be generated.

For example, the target time determination unit may calculate the target time so that the longer the comment received by the comment input receiving unit, the longer the target time, and the output unit may output a movement trajectory with a longer time length when a longer comment is input to the comment input receiving unit.

The target time determination unit may calculate, as the target time, the product of a unit display time, which is a predetermined display time per character, and the number of characters in the comment received by the comment input receiving unit. When the comment is input to the comment input receiving unit, the output unit may then output a movement trajectory whose length is the number of characters in the comment multiplied by the unit display time.

Furthermore, when the calculated target time is less than a predetermined visual recognition time required to visually recognize characters, the target time determination unit may use the visual recognition time as the target time. In that case, when the comment is input to the comment input receiving unit, the output unit may output a movement trajectory whose length is the larger of (a) the number of characters in the comment multiplied by the unit display time, which is a predetermined display time per character, and (b) the visual recognition time, so that a trajectory at least as long as the visual recognition time is output however short the input comment is.

Here, the visual recognition time is the minimum time needed to recognize characters, regardless of how many there are.
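
The target time rule described above (characters multiplied by a unit display time, with the visual recognition time as a lower bound) can be written compactly. In this sketch the default of 0.25 seconds per character follows the per-character display time mentioned earlier in the text, while the 0.5 second visual recognition time is purely an assumed placeholder; the function name is hypothetical.

```python
def target_time(comment, unit_display_time=0.25, visual_recognition_time=0.5):
    """Target display time in seconds for `comment`.

    unit_display_time: display time per character.
    visual_recognition_time: assumed minimum time needed to recognize any text.
    """
    return max(len(comment) * unit_display_time, visual_recognition_time)
```

With these defaults a five-character comment yields 1.25 seconds, and a one-character comment is raised to the 0.5 second floor.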

The output unit may output different movement trajectories for a plurality of comments received by the comment input receiving unit when the comments differ in number of characters, even if they are attached to the same position in the same frame.

The object motion determination unit may calculate movement trajectories of the object in the moving image indicated by the position information received by the comment input receiving unit, using each of a plurality of motion estimation methods or each of a plurality of motion estimation parameters, and may determine the movement trajectory of the object by selecting, from the calculated trajectories, the one whose length is closest to the target time.

With this configuration, calculating trajectories with multiple motion estimation methods or multiple motion estimation parameters makes it easier to obtain a trajectory that continues for the target time.
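
The selection step can be sketched as below, assuming each candidate trajectory is a list with one position per frame, produced by some estimator run with a different method or parameter. The name `select_trajectory` and the representation of trajectories are assumptions for illustration.

```python
def select_trajectory(candidates, target_seconds, fps):
    """Pick the candidate trajectory whose duration is closest to the target time.

    candidates: list of trajectories, each a list of per-frame positions.
    """
    return min(candidates, key=lambda traj: abs(len(traj) / fps - target_seconds))
```

When two candidates are equally close, `min` returns the first one, so the order of `candidates` can encode a preference, for example for the stricter parameter setting.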

For example, the object motion determination unit may calculate movement trajectories of the object in the moving image indicated by the position information received by the comment input receiving unit, using as the plurality of motion estimation parameters one of (1) a plurality of error tolerance parameters with mutually different values, which affect how easily the object can be followed, (2) a plurality of search window regions of mutually different sizes, or (3) a plurality of feature values with mutually different values, and may determine the movement trajectory of the object by selecting, from the calculated trajectories, the one whose length is closest to the target time.

Tolerating more error lengthens the movement trajectory but lowers motion estimation accuracy; conversely, tolerating less error shortens the trajectory but raises accuracy. Likewise, a smaller search window region lengthens the trajectory but lowers accuracy, while a larger one shortens the trajectory but raises accuracy. Using fewer feature values lengthens the trajectory but lowers accuracy, while using more shortens the trajectory but raises accuracy.

Furthermore, when a "motion estimation impossible state" arises, in which a movement trajectory of the object with a length closest to the target time cannot be determined with any of the plurality of motion estimation methods or motion estimation parameters, the object motion determination unit may determine whether the state is caused by occlusion or by a scene change, and may switch the object motion determination method according to the result.

When the object motion determination unit determines that the "motion estimation impossible state" is caused by occlusion, it may determine a movement trajectory of the object with a length closest to the target time by extrapolating the trajectory in the frames after the frame where the occlusion occurred from the trajectory up to that frame.

With this configuration, the movement trajectory in frames where occlusion occurred can be extrapolated from the movement trajectory in frames where no occlusion occurred.
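The extrapolation described above can be sketched as follows. This is a minimal illustration, assuming constant-velocity (linear) extrapolation from the last few tracked points; the text does not specify the extrapolation model, and the function name and the `window` parameter are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def extrapolate_trajectory(observed: List[Point], target_frames: int,
                           window: int = 5) -> List[Point]:
    """Extend a trajectory to target_frames points at constant velocity.

    observed holds one (x, y) per frame up to the frame where occlusion began.
    The velocity is the mean displacement over the last `window` steps
    (an assumed model, not taken from the source).
    """
    trajectory = list(observed[:target_frames])
    if len(trajectory) < 2:
        return trajectory  # too little history to estimate a velocity
    recent = trajectory[-(window + 1):]
    steps = len(recent) - 1
    vx = (recent[-1][0] - recent[0][0]) / steps
    vy = (recent[-1][1] - recent[0][1]) / steps
    while len(trajectory) < target_frames:
        x, y = trajectory[-1]
        trajectory.append((x + vx, y + vy))
    return trajectory
```

A higher-order motion model could replace the constant-velocity assumption when the object accelerates before being occluded.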

When the object motion determination unit determines that the "motion-estimation-impossible state" is caused by a scene change, it may determine the movement trajectory of the object up to the frame in which the scene change occurred as the movement trajectory to be output.

When a scene change occurs, it is very difficult to obtain an accurate movement trajectory of the object across the scene change, and an incorrect trajectory is likely to be obtained for the frames after it. Deciding not to estimate the trajectory after the scene change can therefore result in better comment visibility.

The object motion determination unit may determine that the "motion-estimation-impossible state" is caused by a scene change when the amount of change in the luminance histogram between frames constituting the moving image is equal to or greater than a predetermined threshold, and that it is caused by occlusion when the amount of change is less than the predetermined threshold.
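The threshold test on the luminance histogram can be illustrated with a short sketch. It assumes frames given as flat sequences of 8-bit luminance values, measures the change as the sum of absolute differences between normalized histograms, and uses an arbitrary threshold of 0.5; the text fixes none of these choices, and all names are hypothetical.

```python
from typing import List, Sequence


def luminance_histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized histogram of 8-bit luminance values (0-255)."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame) or 1
    return [c / total for c in counts]


def histogram_change(prev_frame: Sequence[int], next_frame: Sequence[int],
                     bins: int = 16) -> float:
    """Sum of absolute bin differences; ranges from 0.0 to 2.0."""
    h1 = luminance_histogram(prev_frame, bins)
    h2 = luminance_histogram(next_frame, bins)
    return sum(abs(a - b) for a, b in zip(h1, h2))


def classify_failure(prev_frame: Sequence[int], next_frame: Sequence[int],
                     threshold: float = 0.5) -> str:
    """Return 'scene_change' when the change reaches the threshold, else 'occlusion'."""
    change = histogram_change(prev_frame, next_frame)
    return "scene_change" if change >= threshold else "occlusion"
```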

When the movement trajectory of the object obtained with each of the plurality of motion estimation methods or each of the plurality of motion estimation parameters is shorter than the target time by a certain time or more, the object motion determination unit may determine the movement trajectory of the object with a length closest to the target time by estimating a trajectory backward along the time axis, starting from the frame at which the comment input receiving unit received the position information and the comment and from the position of the object indicated by that position information, and joining this backward trajectory to the front of the object's movement trajectory.
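Joining a backward-estimated trajectory to the front of the forward one might look like the sketch below. It assumes each trajectory is a list of `(frame, x, y)` tuples, that both start at the frame where the comment was entered, and that lengths are counted in frames; these representations are assumptions made for illustration.

```python
from typing import List, Tuple

Sample = Tuple[int, float, float]  # (frame number, x, y)


def join_backward(forward: List[Sample], backward: List[Sample],
                  target_frames: int) -> List[Sample]:
    """Prepend backward-tracked samples until target_frames is reached.

    forward[0] and backward[0] both describe the comment frame, so the
    duplicate first backward sample is skipped.
    """
    missing = target_frames - len(forward)
    if missing <= 0:
        return list(forward)
    earlier = backward[1:missing + 1]
    return list(reversed(earlier)) + list(forward)
```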

With this configuration, when the movement trajectory of the object indicated by the position information does not reach the target time, the movement trajectory of the object closest to the target time can be determined using the result of tracking one of the regions obtained by region division.

When the time length of the movement trajectory of the object obtained with each of the plurality of motion estimation methods or each of the plurality of motion estimation parameters is shorter than the target time by a certain time or more, the object motion determination unit may determine, as the movement trajectory of the object indicated by the position information received by the comment input receiving unit, the trajectory whose time length is closest to the target time among trajectories of the object that start from positions within a certain distance of the position indicated by that position information.

When the time length of the movement trajectory of the object obtained with each of the plurality of motion estimation methods or each of the plurality of motion estimation parameters is shorter than the target time by a certain time or more, the object motion determination unit may determine, as the movement trajectory of the object indicated by the position information received by the comment input receiving unit, the trajectory whose time length is closest to the target time among trajectories of the object that start from positions within a user-specified range containing the position indicated by that position information.

With this configuration, when the movement trajectory of the object indicated by the position information is shorter than the target time by a certain time or more, a movement trajectory of the object that continues for the target time can be estimated based on the result of tracking the object with another trajectory that starts near the original one. The same processing may be applied not only to spatially neighboring coordinates but also to neighbors in the time direction.
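The neighborhood search can be sketched as follows, assuming a caller-supplied tracker `track_from(start)` that returns one point per tracked frame. The tracker, the circular neighborhood, and the integer pixel grid are hypothetical choices made only for this example.

```python
from typing import Callable, List, Tuple

Point = Tuple[int, int]


def best_nearby_trajectory(track_from: Callable[[Point], List[Point]],
                           seed: Point, radius: int,
                           target_frames: int) -> Tuple[Point, List[Point]]:
    """Track from every pixel within `radius` of `seed` and keep the
    trajectory whose length is closest to target_frames."""
    sx, sy = seed
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx * dx + dy * dy > radius * radius:
                continue  # outside the circular neighborhood
            start = (sx + dx, sy + dy)
            trajectory = track_from(start)
            gap = abs(len(trajectory) - target_frames)
            if best is None or gap < best[0]:
                best = (gap, start, trajectory)
    return best[1], best[2]
```

In practice the search would likely be subsampled, since tracking from every pixel in the neighborhood is costly.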

When the time length of the movement trajectory of the object obtained with each of the plurality of motion estimation methods or each of the plurality of motion estimation parameters is shorter than the target time by a certain time or more, the object motion determination unit may divide the object into a plurality of regions and determine, as the movement trajectory of the object, the trajectory of the region whose trajectory length is closest to the target time among the regions obtained by the division.

With this configuration, even when the movement trajectory of the object indicated by the position information is shorter than the target time by a certain time or more, a movement trajectory of the object that satisfies the target time can be determined by using another trajectory that starts near the original one.

Further, the object motion determination unit may determine a movement trajectory with a length closest to the target time for the centroid of the object indicated by the position information received by the comment input receiving unit and, based on the relative positional relationship between the position at which the received comment is attached and the centroid of the object, correct the determined trajectory so that it reads as a trajectory starting from the position at which the comment is attached, and output the corrected trajectory.

With this configuration, the movement trajectory of the object can be corrected so that, for example, the relative positional relationship between the centroid coordinates of the object and the coordinates indicated by the position information in the frame where the comment was attached is maintained in subsequent frames.
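The centroid-based correction amounts to shifting the centroid trajectory by a fixed offset. A minimal sketch, assuming the first sample of the centroid trajectory belongs to the frame where the comment was attached (the function name is hypothetical):

```python
from typing import List, Tuple

Point = Tuple[float, float]


def offset_trajectory(centroid_track: List[Point],
                      comment_position: Point) -> List[Point]:
    """Shift the centroid trajectory so that it starts at the comment position,
    keeping the comment-to-centroid offset constant in every frame."""
    if not centroid_track:
        return []
    cx0, cy0 = centroid_track[0]
    dx = comment_position[0] - cx0
    dy = comment_position[1] - cy0
    return [(x + dx, y + dy) for x, y in centroid_track]
```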

When the time length of the movement trajectory of the object obtained with each of the plurality of motion estimation methods or each of the plurality of motion estimation parameters is longer than the target time by a certain time or more, the object motion determination unit can easily obtain a trajectory with a length closest to the target time by taking the most accurate trajectory, which has the shortest time length, determining only the portion from the start frame up to the target time as the movement trajectory of the object indicated by the position information received by the comment input receiving unit, and discarding the portion corresponding to frames after the target time.

The comment information generation device described above may further include a target time correction unit that corrects the target time, based on the movement trajectory of the object determined by the object motion determination unit, so that the target time becomes longer as the moving speed of the object increases. The object motion determination unit may then determine the movement trajectory of the object again, so that the trajectory used to display the comment following the object indicated by the position information has the time length of the target time corrected by the target time correction unit.

These general or specific aspects may be realized as a system, a method, an integrated circuit, a computer program, or a recording medium such as a computer-readable CD-ROM, or as any combination of systems, methods, integrated circuits, computer programs, and recording media.

Hereinafter, a comment information generation device according to one aspect of the present invention is described in detail with reference to the drawings.

Each of the embodiments described below shows one specific example of the present invention. The numerical values, shapes, materials, components, arrangement and connection of components, steps, order of steps, and the like shown in the following embodiments are examples and are not intended to limit the present invention. Among the components in the following embodiments, those not recited in the independent claims, which represent the broadest concept, are described as optional components.

FIG. 1 shows the configuration of the commented moving image distribution system targeted by the present embodiment. As in Patent Document 1, the system includes a video server and a comment accumulation and distribution server. The servers may be managed on the same PC (personal computer) or on different PCs.

Users A to D shown in FIG. 1 view moving images distributed over a network to their respective terminals (smartphones, PCs, tablet PCs, and the like). The figure further shows an example in which user A and user B attach comments to a moving image through a keyboard, software keyboard, or the like provided on the terminal. Here a comment is text information, attached in association with a temporal position (a desired time or frame number) and a spatial position (coordinates) in the moving image specified by the user. In the following description, the term "moving image" may refer either to moving image data or to the moving image represented by that data.

A comment attached by a user is accumulated as needed in the comment accumulation and distribution server, together with information such as the moving image to which it was attached, the user ID of the user, the time and coordinates (in the moving image) with which the comment is associated, and the actual time at which the comment was posted. When other users C and D view a moving image after user A and user B have attached comments to it, the comments of user A and user B are distributed from the comment accumulation and distribution server in the same way as other comments associated with that moving image, and the moving image is displayed with the comments composited onto it based on the information associated with each comment (such as the in-video time, that is, the time elapsed in the video, and the coordinates).

FIGS. 2A and 2B show the temporal flow of attaching comments to a moving image.

FIG. 2A shows the relationship between the flow of time in the moving image and the displayed comments. Comments are accumulated in the server in association with the in-video time (the playback time of the moving image). Each comment is desirably displayed for a length of time suitable for reading it. The calculation of this time length is described in detail later.

According to FIG. 2A, the comment of user A is displayed from in-video time SA seconds to EA seconds, and the comment of user B is displayed from SB seconds to EB seconds. Although "seconds" is used as the unit of time below, the number of frames may be used instead.

FIG. 2B shows the flow of actual date and time. As shown in FIG. 2B, suppose that user C or user D views the commented moving image after user A and user B have attached (written) their comments. When user C is viewing the portion of the commented moving image whose in-video time is between SB seconds and EA seconds, user C sees a commented moving image in which the comment of user B is superimposed on the comment of user A, as shown in FIG. 2A. Suppose, on the other hand, that user D is viewing the portion whose in-video time is between EA seconds and EB seconds. User D sees a moving image to which only the comment of user B is attached, as shown in FIG. 2A.
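Which comments a viewer sees at a given in-video time follows from the display intervals alone. The sketch below assumes half-open intervals `[start, end)` in seconds and uses illustrative values for SA, EA, SB, and EB; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedComment:
    user: str
    text: str
    start: float  # in-video time at which display begins (e.g. SA)
    end: float    # in-video time at which display ends (e.g. EA)


def visible_comments(comments: List[TimedComment],
                     playback_time: float) -> List[TimedComment]:
    """Return the comments whose display interval contains playback_time."""
    return [c for c in comments if c.start <= playback_time < c.end]
```

With SA = 10, EA = 20, SB = 15, and EB = 25, a viewer at 17 seconds sees both comments and a viewer at 22 seconds sees only the comment of user B, matching the situations of user C and user D.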

The above is a conceptual description of the commented moving image targeted by the present embodiment.

FIG. 3 is a block diagram showing the functional configuration of the comment information generation device 100 according to the embodiment. As shown in FIG. 3, the comment information generation device 100 includes a moving image acquisition unit 101, a comment input receiving unit 102, an object motion generation unit 103, an output comment information generation unit 104, and an output unit 105.

The comment information generation device 100 receives as input a moving image 110 and input comment information 111 indicating the content of a comment (including comment coordinates 112, the coordinate values on the moving image corresponding to the comment). It calculates the motion of the pixel or subject in the moving image 110 that the user specified through the comment coordinates 112, generates output comment information (an object-following comment), and outputs it to the storage device 120.

The moving image acquisition unit 101 receives input of a moving image, or of a plurality of pictures (also called "images") constituting a moving image. The moving image acquisition unit 101 may be, for example, an interface that reads a moving image stored in a storage device such as a video server, either directly or via a communication path.

The comment input receiving unit 102 receives the input comment information 111 (including the comment coordinates 112) entered by the user. The comment input receiving unit 102 may be, for example, an interface that reads, directly or via a communication path, the time and coordinates on the moving image that the user specified with a mouse click, a touch on a touch panel, or the like.

Based on the moving image 110 received by the moving image acquisition unit 101 and the input comment information 111 (including the comment coordinates 112) received by the comment input receiving unit 102, the object motion generation unit 103 generates an object motion estimation result for a time length that depends on the length of the comment character string and on visibility.

The output comment information generation unit 104 generates output comment information from the object motion generated by the object motion generation unit 103 (a movement trajectory, that is, a series of coordinate values along the time axis for displaying the following comment) and the input comment information 111.

The output unit 105 outputs the output comment information generated by the output comment information generation unit 104 to the storage device 120 over a wired or wireless connection. The output comment information includes comment-related information such as the text of the comment attached to the moving image, the coordinates at which the comment was attached, the comment posting time, and the comment display time. The output comment information may further include the shape or size of the comment when it is displayed.

The object motion generation unit 103 includes a target time determination unit 106 and an object motion determination unit 107.

Based on the input comment information 111 received by the comment input receiving unit 102, the target time determination unit 106 determines the target time for which the comment is displayed following the object, that is, a target time appropriate for displaying the comment contained in the input comment information 111. The target time can be determined, for example, from the number of characters in the comment.
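As a rough illustration of a character-count rule, the sketch below derives a target time from an assumed reading rate. The text states only that the number of characters can be used; the rate of 4 characters per second, the 1-second minimum, and the function names are placeholders, not values from the source.

```python
import math


def target_time_seconds(comment: str, chars_per_second: float = 4.0,
                        min_seconds: float = 1.0) -> float:
    """Target display time proportional to the comment length.

    chars_per_second and min_seconds are assumed values for illustration.
    """
    return max(min_seconds, len(comment) / chars_per_second)


def target_time_frames(comment: str, fps: float = 30.0,
                       chars_per_second: float = 4.0,
                       min_seconds: float = 1.0) -> int:
    """The same target time expressed as a whole number of frames."""
    seconds = target_time_seconds(comment, chars_per_second, min_seconds)
    return math.ceil(seconds * fps)
```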

Based on the moving image 110 received by the moving image acquisition unit 101, the input comment information 111 (including the comment coordinates 112) received by the comment input receiving unit 102, and the target time determined by the target time determination unit 106, the object motion determination unit 107 determines, from among a plurality of motion estimation methods or a plurality of motion estimation parameters, an object motion whose length is sufficiently close to the target time, is greater than the target time, and does not lower the estimation accuracy more than necessary. The object motion determination unit 107 selects a motion estimation method or motion estimation parameter such that the time length of the result of tracking the object from the comment coordinates 112 is greater than the target time and sufficiently close to it, and determines the motion (movement trajectory) of the pixel (object) from the specified comment coordinates 112.
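One way to read this selection is as a sweep over parameter settings ordered from most to least accurate, stopping at the first setting whose trajectory reaches the target time. The sketch below assumes such an ordering is available and that a caller-supplied `track(params)` returns one point per tracked frame; both are assumptions for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

P = TypeVar("P")
Point = Tuple[float, float]


def choose_trajectory(track: Callable[[P], List[Point]],
                      params_by_accuracy: Sequence[P],
                      target_frames: int) -> Tuple[Optional[P], List[Point]]:
    """Return the first (most accurate) setting whose trajectory reaches
    target_frames; if none does, return the longest trajectory found."""
    best_params: Optional[P] = None
    best: List[Point] = []
    for params in params_by_accuracy:
        trajectory = track(params)
        if len(trajectory) >= target_frames:
            return params, trajectory
        if len(trajectory) > len(best):
            best_params, best = params, trajectory
    return best_params, best
```

Because looser settings tend to give longer but less accurate trajectories, stopping at the first sufficient setting keeps the accuracy loss no larger than needed.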

Each of the components constituting the comment information generation device 100 (the moving image acquisition unit 101, the comment input receiving unit 102, the object motion generation unit 103, the output comment information generation unit 104, and the output unit 105) may be realized by software such as a program executed on a computer, or by hardware such as an electronic circuit or an integrated circuit. FIGS. 4A and 4B are diagrams showing the hardware configuration of the comment information generation device of the present embodiment when it is implemented on a computer.

In FIGS. 4A and 4B, the storage device 210a outputs the stored moving image 110 to an I/F (interface) 201a. The input device 210b, which receives input from the user, outputs the input comment information 111 to an I/F 201b. The computer 200 acquires the moving image 110 and the input comment information 111, performs object-following comment generation processing, and generates the output comment information of the object-following comment.

The storage device 220 acquires and stores the output comment information generated by the computer 200.

The computer 200 includes the I/Fs 201a and 201b, a CPU 202, a ROM 203, a RAM 204, an HDD 205, and an I/F 206. The program that operates the computer 200 is held in advance in the ROM 203 or the HDD 205. The CPU 202, which is a processor, reads the program from the ROM 203 or the HDD 205 and loads it into the RAM 204. The CPU 202 executes each coded instruction in the program loaded into the RAM 204. In accordance with the execution of the program, the I/Fs 201a and 201b load the moving image 110 and the input comment information 111, respectively, into the RAM 204. The I/F 206 outputs the output comment information of the object-following comment generated by executing the program and stores it in the storage device 220.

The computer program is not limited to being stored in the ROM 203, which is a semiconductor memory, or in the HDD 205; it may be stored, for example, on a CD-ROM. It may also be transmitted via a wired or wireless network, broadcasting, or the like and loaded into the RAM 204 of the computer.

The operation of the comment information generation device 100 of the present embodiment is described below with reference to FIG. 5. FIG. 5 is a flowchart showing the operation of the comment information generation device 100 of the present embodiment.

In FIG. 5, the seven steps S301 to S307 correspond to the processing units 101 to 107 of FIG. 3, respectively. That is, the moving image acquisition unit 101 executes the moving image acquisition step S301, the comment input receiving unit 102 executes the comment input receiving step S302, the object motion generation unit 103 executes the object motion generation step S303, the output comment information generation unit 104 executes the output comment information generation step S304, and the output unit 105 executes the output step S305. The object motion generation step S303 consists of two steps, the target time determination step S306 and the object motion estimation step S307. The target time determination unit 106 executes the target time determination step S306, and the object motion determination unit 107 executes the object motion estimation step S307.

First, the moving image acquisition step S301 is executed by the moving image acquisition unit 101. The moving image acquisition unit 101 acquires the moving image 110.

In the present embodiment, the moving image 110 acquired by the moving image acquisition unit 101 is assumed to be any of various moving images, such as broadcast video or video shot by a user, or a plurality of pictures (images) constituting a moving image. These moving images 110 are accumulated in a video server or the like, and the moving image acquisition unit 101 acquires the moving image 110 via a wired or wireless network, broadcasting, or the like. In the present embodiment, the moving image is assumed to have 30 frames per second.

FIG. 6A shows an example of an acquired moving image. Music-related video content is shown here as an example, but the moving images targeted by the present invention are not limited to this. To simplify the description and the drawings, the example moving image is hereinafter shown with the subject regions drawn in simplified form, as in FIG. 6B.

Referring to FIG. 5, the comment input receiving step S302 is executed by the comment input receiving unit 102. The comment input receiving unit 102 acquires the input comment information 111. FIGS. 7A and 7B show an example of comment input by a user and an example of the acquired input comment information 111, respectively.

As shown in FIG. 7B, the input comment information 111 includes at least three items: the comment time (time), the comment target coordinates (position), and the comment character string (comment), which is the entered text. The comment time indicates the elapsed time (playback time) in the moving image 110; instead of time information, any other information that identifies the timing in the moving image 110 at which the user attached the comment, such as a frame number, may be used. Likewise, instead of pixel coordinate values, the comment target coordinates may be any other information that identifies the spatial position in the frame at which the user attached the comment, such as coordinate values normalized so that the vertical and horizontal ranges of the screen are each 1.
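The three items of input comment information, and the alternative representations mentioned above, can be modeled as in the following sketch. The field names follow the labels time, position, and comment from FIG. 7B; the class name and helper functions are hypothetical, and the 30 frames-per-second default is the rate assumed in the present embodiment.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class InputComment:
    time: float                    # in-video playback time, in seconds
    position: Tuple[float, float]  # comment target coordinates, in pixels
    comment: str                   # entered comment text


def time_to_frame(seconds: float, fps: float = 30.0) -> int:
    """Express a comment time as a frame number."""
    return round(seconds * fps)


def normalize_position(position: Tuple[float, float],
                       width: int, height: int) -> Tuple[float, float]:
    """Scale pixel coordinates so that both screen axes span 0 to 1."""
    x, y = position
    return (x / width, y / height)
```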

These comments may be entered through a user interface on a device, such as a PC or a mobile terminal, that includes the comment information generation device 100. Alternatively, a comment may be entered by receiving, over a communication line, the comment and the position information of the object that the comment is to follow, obtained from operations performed through a user interface on an ordinary device, such as a PC or a mobile terminal, that does not include the comment information generation device 100.

The input comment information 111 may also contain only two items, the comment time (time) and the comment character string (comment), which is the entered text, with the comment target coordinates (position) estimated by a separately provided position estimation unit.

The content of the comments displayed on a given frame is considered to be characteristic of the object each comment refers to. In other words, the comment text on a given frame can be said to be related to the position information of the comment. The comment target coordinates of a newly attached comment can therefore be estimated from the positions at which similar comments were attached, among the previously accumulated comments that pass through that frame.
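A position estimate of this kind could be sketched as a similarity-weighted average of the positions of existing comments on the frame. The similarity measure used here (Jaccard overlap of whitespace-separated words) is an assumption chosen for brevity; the text does not specify one, and Japanese text would need a morphological tokenizer instead of `split()`.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two comments, from 0.0 to 1.0."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def estimate_position(new_comment: str,
                      existing: List[Tuple[str, Point]]) -> Optional[Point]:
    """Average the positions of existing comments on the frame, weighted by
    their similarity to the new comment. Returns None if nothing is similar."""
    weight_sum = x_sum = y_sum = 0.0
    for text, (x, y) in existing:
        weight = jaccard(new_comment, text)
        weight_sum += weight
        x_sum += weight * x
        y_sum += weight * y
    if weight_sum == 0.0:
        return None
    return (x_sum / weight_sum, y_sum / weight_sum)
```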

A comment is posted, for example, while playback of the moving image 110 is paused automatically when the terminal on which the user is viewing the moving image 110 detects a mouse input, key input, or touch input by the user. Alternatively, a comment is posted while playback of the moving image 110 is paused by a user operation on the screen.

For example, on the display screen shown in FIG. 7A, the user designates certain coordinates on the screen ((a) in FIG. 7A), preferably while playback of the target moving image 110 is paused (for ease of operation). In response to this designation, a pop-up screen is superimposed on the display of the comment information generation device 100 ((b) in FIG. 7A). The user posts a comment by entering it in the pop-up screen. Here, the coordinates on the screen are designated, for example, by clicking the mouse on a PC with the mouse pointer placed at the coordinates where the user wants to add a comment, or by directly touching the screen of a touch panel display. Note that when a position estimation unit is separately provided as described above, the user does not need to designate a position, so usability does not suffer even if the moving image is not paused when the comment is entered.

Alternatively, a comment may be posted by voice input. In this case, a voice analysis unit may be provided to convert the input speech into comment text. However, input speech is normally spoken language, which strictly speaking differs from written language. It is therefore desirable for the voice analysis unit to convert the input speech into comment text in written language. A method for converting spoken language entered by voice into written language is disclosed, for example, in Patent Document 6. In that method, the input spoken-language data is morphologically analyzed, written-language candidates are obtained using a table that associates spoken expressions with written expressions, and word orders and words that are likely to appear in a written-language database are then selected. With this mechanism, expressions peculiar to spoken language, homonyms, and the like can be appropriately converted into comment text in written language.

Referring to FIG. 5, the object motion generation step S303 is executed by the object motion generation unit 103.

The target time determination unit 106 executes the target time determination step S306 based on the input comment information 111 received in the comment input reception step S302, and determines the target time. Next, based on the target time determined by the target time determination unit 106, the moving image 110 acquired in the moving image acquisition step S301, and the input comment information 111 received in the comment input reception step S302, the object motion determination unit 107 executes the object motion determination step S307 and determines a movement trajectory, which is a series of coordinate values along the time axis used to display the following comment. These processes are described in detail below.

The target time determination unit 106 determines the target time, which is the time needed for the user who entered the comment, or another user, to read the entered comment.

For example, for subtitles in movies and the like, there is a guideline for English text of displaying 12 characters per second. The target time may also be obtained on a per-word basis. For example, WPM (words per minute: the number of words that can be read in one minute) is sometimes used as a unit of reading speed. Adult Americans are said to read magazines, newspapers, or books at 250 to 300 WPM, so this figure can be applied when calculating the target time.

For example, suppose the target is set somewhat slower, at 200 WPM, for readability. The number of words W can be obtained by detecting the spaces in the input comment text, so the target time Nw (in seconds) can be calculated as Nw = W * 60 / 200.

Note that visual recognition by a person is said to take about 0.5 seconds. This time is called the visual recognition time. Taking the visual recognition time into account, it is desirable to obtain the target time so that it is at least a predetermined length (for example, 0.5 seconds).
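The word-based calculation and the lower bound given by the visual recognition time can be combined as in the following minimal sketch. The function name `target_time_seconds` and its parameter names are hypothetical; words are counted with `str.split()`, which treats runs of whitespace as one separator, a slight variation on counting spaces.

```python
def target_time_seconds(comment: str, wpm: float = 200.0, min_seconds: float = 0.5) -> float:
    """Target display time Nw = W * 60 / wpm, never shorter than min_seconds.

    W is the number of whitespace-separated words in the comment.
    The defaults (200 WPM, 0.5 s) are the example values from the description.
    """
    words = len(comment.split())
    return max(words * 60.0 / wpm, min_seconds)
```

With the default values, a ten-word comment receives 3 seconds, while a one-word comment, which would otherwise receive 0.3 seconds, is raised to the 0.5-second floor.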

When the user enters a comment on a frame in which another comment is already displayed, the target time of the newly entered comment may be calculated to be longer, so that both the text of the previously displayed comment and the text of the newly entered comment can be read. The target time may also be calculated after adding a predetermined number of characters or words to the number of characters or words in the new comment. This lets a user viewing the commented video read simultaneously displayed comments more reliably.

When a comment consisting of the same character string has already been added to the frame to which the user is adding a comment, the redundancy of duplicated content may be unpleasant. Therefore, the more comments with identical content exist on the same frame, the shorter the target time of a newly added comment with that content may be made, down to zero. The target time can be shortened, for example, by changing the target time obtained from the number of characters to that value divided by the number of identical comments. Even when identical comments are not attached to the same frame, the target time of a comment that appears multiple times in the same moving image may likewise be shortened, down to zero.
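The shortening for duplicated comments can be sketched as below. The description does not say whether the count of identical comments includes the new comment itself; this sketch assumes that it does, so that a single existing duplicate halves the target time. The function name `adjusted_target_time` is hypothetical.

```python
def adjusted_target_time(base_seconds, new_text, texts_on_frame):
    """Divide the base target time by the number of identical comments.

    texts_on_frame: comment strings already present on the frame.
    Assumption: the divisor counts the new comment as well, so it is
    (number of existing identical comments + 1) and is never zero.
    """
    same = sum(1 for text in texts_on_frame if text == new_text)
    return base_seconds / (same + 1)
```

Under this assumption, a 3-second comment that repeats two comments already on the frame is shown for 1 second, and a comment with no duplicates keeps its full target time.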

In the present disclosure, the target time determination unit 106 estimates the target time from the number of characters each time a comment is entered, but its operation is not limited to this. For example, the relationship between the number of characters in a comment and the target time may be obtained in advance and stored as a database or table in a storage device provided inside or outside the comment information generation device. On receiving a comment, the target time determination unit 106 may then obtain the target time by referring to the database or table over a wired or wireless connection, instead of estimating it from the number of characters. Estimating the target time in advance allows the target time determination process to run faster when a comment is entered.
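A precomputed table of this kind might look like the following sketch. It assumes the 12-characters-per-second guideline and the 0.5-second minimum mentioned earlier; the table size `MAX_CHARS` and all names are hypothetical, and an in-memory list stands in for the database or external storage.

```python
CHARS_PER_SECOND = 12.0   # subtitle guideline quoted in the description
MIN_SECONDS = 0.5         # visual recognition time
MAX_CHARS = 200           # hypothetical upper bound on comment length

# Built once in advance: TABLE[n] is the target time for an n-character comment.
TABLE = [max(n / CHARS_PER_SECOND, MIN_SECONDS) for n in range(MAX_CHARS + 1)]


def lookup_target_time(comment: str) -> float:
    """Look up the target time instead of computing it when the comment arrives."""
    n = min(len(comment), MAX_CHARS)  # longer comments reuse the last entry
    return TABLE[n]
```

For a formula this simple the lookup saves little; the arrangement matters more when the stored relationship comes from a costlier estimate.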

Note that the object motion generation unit 103, the output comment information generation unit 104, and the output unit 105 may be provided in each terminal on which the commented video is viewed.

Automatic translation between languages has recently become commonplace. It is therefore possible to view a commented video whose comments have been translated into a language different from the one in which they were originally added. In that case, it is desirable for the target language to be selected at each viewing terminal. In the simplest case, the viewing terminal may translate the comment into another language based on the language information held by the terminal, and the processing of the object motion generation unit 103, the output comment information generation unit 104, and the output unit 105 may be performed on the translated comment.

Alternatively, the object motion generation unit 103, the output comment information generation unit 104, and the output unit 105 may be provided in a server that translates commented moving images and distributes them to a specific language region, and the same processing may be performed there.

Next, the object motion determination unit 107 determines a movement trajectory, which is a series of coordinate values along the time axis used to display the following comment, based on the target time determined by the target time determination unit 106, the input comment information 111, and the moving image 110.

Specifically, the object motion determination unit 107 receives a plurality of pictures from the moving image acquisition unit 101, detects corresponding points between the pictures, and generates and outputs a movement trajectory. Hereinafter, the motion of a pixel, or of a block containing pixels, is detected between two temporally adjacent pictures of the moving image 110, and the series of coordinate values along the time axis obtained by linking the detected motion across the plurality of pictures is called a movement trajectory.

As shown in FIG. 8A, the object motion determination unit 107 tracks the motion of pixel i 503a and pixel j 503b of the input picture 501 at time t using the motion vector information 502 calculated between two pictures, and obtains the corresponding points of pixel i 503a and pixel j 503b. At this time, the object motion determination unit 107 calculates the movement trajectory x^i as in Equation 1 from the coordinate values (x_1^i, y_1^i) of a pixel i on the picture of frame 1 and the pixel coordinate values (x_t^i, y_t^i) of the corresponding point of pixel i at time t.

In the present embodiment, the movement trajectory x^i is the set of corresponding points across the T pictures from frame 1 to frame T.
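The body of Equation 1 is not reproduced in this text. Based on the surrounding definitions (the coordinates of pixel i in frame 1, the coordinates of its corresponding point at each time t, and a trajectory spanning frames 1 through T), a consistent form would be the following; it should be read as a reconstruction under those definitions rather than as the original expression.

```latex
x^{i} = \left( x_{1}^{i},\, y_{1}^{i},\, \ldots,\, x_{t}^{i},\, y_{t}^{i},\, \ldots,\, x_{T}^{i},\, y_{T}^{i} \right)
```

In this form the trajectory of one pixel is a vector of 2T elements, one coordinate pair per picture.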

FIG. 8B shows an example of movement trajectories. The moving image 110 input to the object motion determination unit 107 consists of T pictures 504. Here, the movement trajectories x^i 506a and x^j 506b are the sets of corresponding points on the pictures of frames 2 through T that correspond to pixel i 505a and pixel j 505b of frame 1, respectively. Each of the movement trajectories x^i 506a and x^j 506b is represented by a vector whose elements are the picture coordinate values in each picture. Here, taking all pixels (I pixels) on the picture of frame 1 as references, the corresponding pixels on the (T-1) pictures from frame 2 to frame T are obtained.

Note that when obtaining corresponding points between pictures, the object motion determination unit 107 may obtain a corresponding point for each group of adjacent pixels (block) in the picture instead of for every pixel. The present embodiment describes processing in units of pixels. When processing in units of blocks (regions) consisting of a plurality of pixels, data (a representative value) for each block may be obtained by (i) summing the pixel values in the block, (ii) averaging the pixel values in the block, or (iii) taking the median of the pixel values in the block, and the representative value may then be processed in the same way as in pixel-unit processing. The present embodiment does not distinguish between a corresponding point obtained for a single pixel and one obtained for a plurality of pixels. In addition, both the corresponding point in another picture of pixel i of a given picture and the corresponding point in another picture of block i of a given picture are referred to as the movement trajectory of pixel i. In the present embodiment, the movement trajectory obtained by the motion estimation procedure described above serves as the basis for the motion of the object-following comment. Further, the frames need not be consecutive when obtaining corresponding points between pictures. For example, a movement trajectory may be obtained from two pictures input at time t and time t+n, where n is an integer of 1 or more.
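The three ways of forming a block's representative value, (i) sum, (ii) mean, and (iii) median, can be illustrated with a short sketch. The picture is assumed here to be a list of rows of grayscale values, and the function name `block_representative` and its parameters are hypothetical.

```python
from statistics import mean, median


def block_representative(pixels, top, left, size, mode="mean"):
    """Representative value of the size x size block whose corner is (top, left).

    pixels: picture as a list of rows of numeric pixel values (an assumed layout).
    mode: "sum", "mean", or "median", matching options (i) to (iii).
    """
    values = [
        pixels[row][col]
        for row in range(top, top + size)
        for col in range(left, left + size)
    ]
    if mode == "sum":
        return sum(values)
    if mode == "mean":
        return mean(values)
    if mode == "median":
        return median(values)
    raise ValueError("mode must be 'sum', 'mean', or 'median'")
```

The median is less sensitive than the sum or mean to a single outlying pixel in the block, which may matter when the picture is noisy.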

As a specific method for calculating the corresponding points between pictures described above, the methods disclosed in Non-Patent Document 1 or Non-Patent Document 2 may be used. Both calculate motion vectors by computing optical flow. Non-Patent Document 1 computes optical flow based on hierarchical block matching. Because smoothness between pixels is used as a constraint, the resulting optical flow has motion vectors that vary smoothly between neighboring positions. Efficient and accurate corresponding points are obtained, particularly when there is no abrupt motion or occlusion. In addition, because the reliability of the estimate can be calculated, excluding corresponding points whose reliability is below a threshold from subsequent processing, as described later, reduces the proportion of erroneous motion vectors among all motion vectors.

In contrast, Non-Patent Document 2 discloses a graph-cut-based optical flow calculation method. This method is computationally expensive, but it yields accurate corresponding points densely over the picture. It also performs a bidirectional search and estimates that corresponding points whose mutual correlation is below a threshold are pixels in an occluded region. Corresponding points located in occluded regions can therefore be excluded from subsequent processing, which reduces the proportion of erroneous motion vectors among all motion vectors.

Here, motion information may be obtained for all pixels. When faster processing is desired, the picture may be divided into a grid and motion information obtained only for pixels on the grid at regular intervals, or, as described above, the picture may be divided into blocks and motion information obtained for each block.

In this case, a method that calculates motion vectors by assuming translational motion of the block can be used. For objects undergoing rotational motion in particular, pixel motion can be estimated more accurately by using the method that assumes affine deformation, disclosed in Non-Patent Document 3, than by assuming translation.

Note that when motion vectors are calculated using the technique disclosed in Non-Patent Document 1, reliability can be calculated, so only pixels with highly reliable motion information may be used. When motion vectors are calculated using the technique disclosed in Non-Patent Document 2, occlusion can be estimated, so only the motion information of pixels that are not occluded may be used.

In general, motion estimation methods that can be used to generate object-following comments rely on some model for identifying the same object region across time-series images. For example, such processing includes an assumption that the luminance difference between regions of the same object is 0. Because an actual moving image 110 also contains measurement noise and deviations from the model, the regions that best satisfy the model are judged to be the same object, and motion is calculated from the temporal change in position between them.

In practice, however, there are cases where the object no longer appears in the moving image 110, and motion estimation is inherently impossible in such cases. Even if the regions that best satisfy the model are found according to the above criterion, there is no actual motion between them, so the motion obtained is incorrect. To know in advance whether motion can be estimated for arbitrary comment coordinates at an arbitrary comment time included in the input comment information 111 received by the comment input reception unit 102, information would have to be given manually beforehand for every pixel of every frame of the moving image 110 (a person would check the motion and judge whether it can be estimated) and stored on a server or the like. Manually annotating a moving image 110 with a large number of frames and pixels is not realistic. Therefore, when performing motion estimation, the object motion determination unit 107 determines, for each comment coordinate at a given comment time included in the input comment information 111 received by the comment input reception unit 102, whether motion estimation is possible. As the determination criterion, a separate threshold is set on how well the motion estimation model is satisfied. That is, it is common to set a separate criterion under which the model is judged not to be satisfied to a sufficient degree.

Accordingly, the object motion determination unit 107 determines whether motion estimation is possible using a first criterion: whether the model used for motion estimation is satisfied to at least a certain degree. When motion estimation is possible, the object motion determination unit 107 then uses a second criterion, namely which regions best satisfy the model, to judge regions as the same object, and calculates motion from the temporal change in position between them.
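The two criteria can be illustrated with simple translational block matching. This is a sketch under assumptions, not the estimator of any of the cited non-patent documents: the model is "luminance difference between the same object regions is 0", the error measure is the sum of absolute differences (SAD), and the names `sad`, `match_block`, `search`, and `tolerance` are hypothetical.

```python
def sad(frame_a, frame_b, ya, xa, yb, xb, size):
    """Sum of absolute luminance differences between two size x size blocks."""
    return sum(
        abs(frame_a[ya + r][xa + c] - frame_b[yb + r][xb + c])
        for r in range(size)
        for c in range(size)
    )


def match_block(prev, curr, y, x, size, search, tolerance):
    """Return the (dy, dx) motion of the block at (y, x), or None if not estimable.

    Second criterion: choose the candidate that best satisfies the model (lowest SAD).
    First criterion: accept it only if the error is within `tolerance`.
    """
    height, width = len(curr), len(curr[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ny, nx = y + dy, x + dx
            if 0 <= ny <= height - size and 0 <= nx <= width - size:
                err = sad(prev, curr, y, x, ny, nx, size)
                if best is None or err < best[0]:
                    best = (err, dy, dx)
    if best is None or best[0] > tolerance:
        return None  # model not satisfied well enough: motion estimation impossible
    return (best[1], best[2])
```

Returning None corresponds to interrupting the trajectory at that frame; the value of `tolerance` is the error tolerance whose effect on trajectory length is discussed next.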

Motion estimation methods that work this way, however, involve a kind of trade-off.

If the criterion for determining whether motion estimation is possible is made stricter, a state in which motion estimation is impossible can be detected even when the deviation from the model is relatively small. Setting the criterion in this way, that is, reducing the error tolerance, allows more cases in which motion estimation is impossible to be detected correctly. On the other hand, cases in which motion estimation is actually possible but is affected by, for example, measurement noise or discrepancies between the model and the actual image are also misjudged as cases in which motion estimation is impossible. Motion estimation that should not be interrupted is then interrupted, and the resulting movement trajectory may become shorter. In other words, ease of tracking decreases.

Conversely, to increase ease of tracking, motion estimation must remain possible even when measurement noise or discrepancies between the model and the actual image are relatively large. If a criterion that tolerates excessive noise or model error is adopted in this way, that is, if the error tolerance is increased, then, contrary to the previous case, motion estimation that should be interrupted continues even when the object in the moving image 110 leaves the frame or otherwise disappears and motion estimation has actually become impossible. A movement trajectory that includes erroneous motion estimation results (one that is not robust to noise or model error) may therefore be generated.

As described above, in motion estimation there is a trade-off, illustrated in FIGS. 9A and 9B, between achieving motion estimation that is robust to noise or model error and accurately identifying cases in which motion estimation is impossible because of occlusion or scene changes.

In FIGS. 9A and 9B, solid arrows indicate movement trajectories estimated by the general motion estimation method described above, and dotted arrows indicate the actual (correct) movement trajectories. When the error tolerance parameter is set low, as in FIG. 9A, cases in which motion estimation is impossible can be identified more accurately and the noise in the movement trajectory is suppressed, but the trajectory tends to be shorter than the actual motion, and the time during which the comment can follow the object tends to be short. When the error tolerance parameter is set high, as in FIG. 9B, the comment can follow for longer and a longer movement trajectory than in FIG. 9A is obtained, but the motion estimation result may include motion that differs from the correct motion. For comment visibility, a longer movement trajectory as in FIG. 9B is desirable, but for a comment that follows the object, a more accurate movement trajectory as in FIG. 9A is desirable.

Given this trade-off, to obtain a movement trajectory with the time length and positional accuracy best suited to comment display, it is necessary to determine the minimum trajectory time length the user needs to read the comment character string, and to obtain a more accurate movement trajectory while keeping the trajectory time length to that minimum.

That is, for each set of comment coordinates at each comment time included in the input comment information 111, it is desirable to take the target time determined by the target time determination unit 106 as the minimum required trajectory time length and to obtain a movement trajectory whose time length is equal to, or closest to, that length.

Therefore, the object motion determination unit 107 resolves the above trade-off as follows. For motion estimation, it prepares in advance a plurality of thresholds for the criterion of whether the model is satisfied to at least a certain degree with respect to noise or luminance difference from the model, calculates a movement trajectory with each threshold, and, among the results, uses the coordinates of the movement trajectory whose time length is closest to the target time determined by the target time determination unit 106 as the coordinates of the object-following comment.
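The selection among several thresholds can be sketched as follows. To keep the example short, the tracker is replaced by a list of per-frame matching errors (`frame_errors`), which is a stand-in assumption: a trajectory is taken to continue until the first frame whose error exceeds the tolerance. All names and numbers are hypothetical.

```python
def trajectory_length(frame_errors, tolerance):
    """Number of consecutive frames tracked before the error exceeds the tolerance."""
    count = 0
    for err in frame_errors:
        if err > tolerance:
            break  # motion estimation judged impossible: the trajectory ends here
        count += 1
    return count


def pick_tolerance(frame_errors, tolerances, target_seconds, fps):
    """Choose the tolerance whose trajectory duration is closest to the target time.

    tolerances should be listed from strict to loose: min() keeps the first of
    any tied candidates, so ties are resolved in favor of the stricter setting.
    """
    def gap(tol):
        return abs(trajectory_length(frame_errors, tol) / fps - target_seconds)

    return min(tolerances, key=gap)


# Illustrative per-frame errors; tolerances 2, 5, 10 give 3, 5, and 7 frames.
errors = [1, 2, 2, 5, 3, 9, 4, 20]
```

At 2 frames per second, the three tolerances yield trajectories of 1.5, 2.5, and 3.5 seconds, so a target time of 2.4 seconds selects the middle tolerance. The same selection applies unchanged when the varied parameter is something other than the error tolerance.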

Note that the parameters that adjust the trade-off in general motion estimation and affect ease of tracking are not limited to the degree to which the model must be satisfied (the error tolerance) described above.

Another example is the size of the window region used for motion estimation. Increasing the window size makes motion estimation more robust to partial luminance changes and deformation, but makes it harder to detect that motion estimation is impossible when, for example, the designated location is occluded. The window size can therefore be used as a parameter for adjusting the time length of the movement trajectory. For example, the object motion determination unit 107 calculates movement trajectories using window regions of several sizes and, among the results, uses the coordinates of the movement trajectory whose time length is equal to, or closest to, the target time determined by the target time determination unit 106 as the coordinates of the object-following comment. This yields a motion estimation result that has the time length required for comment display and is as robust as possible to noise or model error.

Yet another example is the number of features (image features) used for motion estimation. Using more features makes the estimate relatively robust to changes in some of them. Movement trajectories are therefore calculated under conditions with different numbers of features and, among the results, the coordinates of the movement trajectory whose time length is closest to the target time determined by the target time determination unit 106 are used as the coordinates of the object-following comment. This yields a motion estimation result that has the time length required for comment display and is as robust as possible to noise or model error.

As the "window region size" parameter, the window region size in Non-Patent Document 1, the size of the block for which affine deformation is assumed in Non-Patent Document 3, and the like can be used in the same way. Adjusting these therefore amounts to adjusting the time length of the movement trajectory based on the "window region size" described above.

As the "error tolerance" parameter, the confidence measure range k3 of Non-Patent Document 1, the occlusion penalty of Non-Patent Document 2, the similarity of features such as texture (feature dissimilarity) of Non-Patent Document 3, and the like can be used in the same way. Setting the confidence measure range to 0 makes longer movement trajectories easier to obtain. Increasing the feature similarity value likewise makes longer movement trajectories easier to obtain. Adjusting these therefore amounts to adjusting the time length of the movement trajectory based on the "error tolerance" described above.

As the "number of features" parameter, the number of feature points used for estimation in Non-Patent Document 3 and the like can be used in the same way. Adjusting these makes it possible to adjust the time length of the movement trajectory based on the "number of features". Parameters other than those listed here may of course be used. For example, the time length of the movement trajectory can also be adjusted with the search range (the range over which pixel motion is assumed) in Non-Patent Document 2. Like the other motion parameters described above, the search range is tied to the trade-off between the time length of the movement trajectory and the motion estimation accuracy (the estimation accuracy of the trajectory): widening the search range yields a longer trajectory, but the motion estimation result may then include motion that differs from the true motion. Known motion detection methods other than those described so far also have parameters that relate the time length of the trajectory to the motion estimation accuracy, and these can be used in the same way.

As described above, the time length of the movement trajectory may be adjusted using only one of the parameters such as the model error threshold, the window region size, and the number of features, or using several of them in combination. In the simplest example, two values are preset for each of the model error threshold, the window region size, and the number of features, and among the results obtained from all eight combinations, the coordinates of the trajectory whose time length is closest to the target time determined by the target time determination unit 106 are adopted as the coordinates of the object-following comment.
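The eight-combination example can be sketched as a small grid search. The parameter names, the preset values, and `fake_track` are all assumptions made for illustration; a real tracker would estimate motion from the moving image under each parameter set.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Trajectory = List[Tuple[float, float]]

# Two preset values per parameter gives 2 x 2 x 2 = 8 combinations.
PARAM_GRID: Dict[str, list] = {
    "error_threshold": [0.1, 0.3],
    "window_size": [15, 31],
    "num_features": [50, 200],
}


def closest_over_grid(
    grid: Dict[str, list],
    track: Callable[..., Trajectory],
    target_frames: int,
) -> Tuple[dict, Trajectory]:
    """Run the tracker for every parameter combination and return the
    parameters and trajectory whose length is closest to the target."""
    names = list(grid)
    best = None
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        traj = track(**params)
        gap = abs(len(traj) - target_frames)
        if best is None or gap < best[0]:
            best = (gap, params, traj)
    return best[1], best[2]


def fake_track(error_threshold: float, window_size: int, num_features: int) -> Trajectory:
    # Hypothetical tracker: looser settings simply survive for more frames.
    frames = round(100 * error_threshold) + window_size + num_features // 50
    return [(float(t), float(t)) for t in range(frames)]


best_params, best_traj = closest_over_grid(PARAM_GRID, fake_track, target_frames=44)
```

The search is exhaustive, which is affordable for a handful of presets; a larger grid would multiply the number of tracking runs accordingly.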

When the motion of the tracked target is complex, as with a person, and it is difficult to assume a specific model, movement trajectories may be computed with several motion detection methods, and among the results from those methods, the coordinates of the trajectory whose time length is closest to the target time determined by the target time determination unit 106 may be adopted as the coordinates of the object-following comment. This makes it possible to obtain a trajectory close to the target time more robustly across a variety of motions.

The description so far has covered methods for obtaining a movement trajectory based on the target time, that is, the desired comment display time, in cases where motion estimation is inherently possible.

However, particularly in a general moving image 110 such as a TV program or video recorded with a camcorder, the designated pixel or region is often lost because of a scene change, occlusion by another object, self-occlusion, or movement of the object or camera that takes the designated pixel or region out of the shooting range. When the designated pixel or region no longer appears in the moving image, there are many cases in which further motion estimation becomes impossible at some point before the target time is reached. In such cases, adjusting the parameters yields only a short trajectory, or the estimation accuracy of the resulting trajectory drops markedly. To generate object-following comments that are easy for the user to view, it is desirable to correctly identify the "motion estimation impossible" cases mentioned above and to apply processing suited to the cause of each such state.

That is, when even the trajectory whose time length is closest to the target time is shorter than the target time (by a certain amount or more), the object motion determination unit 107 determines which of the causes of the "motion estimation impossible" state, "occlusion" or "scene change", has occurred, and switches processing according to the result.

Whether occlusion or a scene change has occurred can be determined from, for example, the temporal change in the luminance histogram of the whole image. If the luminance histogram changes greatly between frames (for example, if the amount of change is at or above a predetermined threshold), a scene change can be determined to have occurred; otherwise, occlusion can be determined to have occurred. Alternatively, if scene change time information accompanies the moving image 110 as metadata, that information can be used to determine whether a scene change has occurred.
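A minimal sketch of the histogram test follows. The text only requires "a predetermined threshold" on the histogram change; the 16-bin histogram, the L1 distance between normalized histograms, and the threshold of 0.5 are assumptions chosen for illustration.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # rows of 8-bit luminance values (0..255)


def luminance_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized luminance histogram of the whole frame."""
    hist = [0] * bins
    count = 0
    for row in frame:
        for value in row:
            hist[min(value * bins // 256, bins - 1)] += 1
            count += 1
    return [h / count for h in hist]


def histogram_change(a: List[float], b: List[float]) -> float:
    """L1 distance between two normalized histograms (ranges from 0 to 2)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify_interruption(prev_frame: Frame, next_frame: Frame, threshold: float = 0.5) -> str:
    """Return 'scene_change' for a large histogram change, else 'occlusion'."""
    change = histogram_change(luminance_histogram(prev_frame), luminance_histogram(next_frame))
    return "scene_change" if change >= threshold else "occlusion"


dark = [[20] * 8 for _ in range(8)]
occluded = [row[:] for row in dark]
for r in range(2):
    for c in range(2):
        occluded[r][c] = 200        # a small bright occluder enters the frame
bright = [[220] * 8 for _ in range(8)]  # an entirely different shot
```

Here `classify_interruption(dark, occluded)` sees only 4 of 64 pixels change bins, while `classify_interruption(dark, bright)` sees the whole histogram move.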

The processing performed by the object motion determination unit 107 when "occlusion" is determined is described with reference to FIG. 10A. When motion estimation is interrupted by occlusion, the object is no longer visible, but it is likely to keep moving behind the occluder in the same way as before; in other words, the object is likely to still be present within the frame of the moving image. The object motion determination unit 107 therefore estimates the region where the object is likely to be moving, extends the movement trajectory by, for example, extrapolating the object motion obtained so far, and generates a comment-following trajectory. Linear interpolation or a similar method can be used for the extrapolation.
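One way to sketch the extension for the occlusion case is a constant-velocity extrapolation. This is an assumption-laden illustration: the velocity is averaged over the last few tracked frames (the `history` parameter is hypothetical), and the trajectory is simply continued at that velocity until it reaches the target length.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def extend_linearly(trajectory: List[Point], target_frames: int, history: int = 5) -> List[Point]:
    """Extend a trajectory to target_frames by continuing its recent
    average velocity (constant-velocity extrapolation)."""
    if len(trajectory) >= target_frames or len(trajectory) < 2:
        return list(trajectory)
    k = min(history, len(trajectory) - 1)      # frames used to estimate velocity
    (x0, y0), (x1, y1) = trajectory[-1 - k], trajectory[-1]
    vx, vy = (x1 - x0) / k, (y1 - y0) / k
    extended = list(trajectory)
    for step in range(1, target_frames - len(trajectory) + 1):
        extended.append((x1 + vx * step, y1 + vy * step))
    return extended


tracked = [(0, 0), (2, 1), (4, 2)]             # tracking stopped after 3 frames
extended = extend_linearly(tracked, target_frames=5)
```

A practical version would probably also clamp the extrapolated points to the image bounds, which this sketch does not do.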

The processing performed by the object motion determination unit 107 when "scene change" is determined is described with reference to FIG. 10B. After a scene change, the tracked object is likely either to be outside the camera's view or, even if it is still in view, to appear at a different location than before. Extending the motion in the same way as for "occlusion" is therefore likely to produce a comment-following trajectory that looks unnatural to the user. For this reason, the movement trajectory is not extended when "scene change" is determined. As an exception, the time length of the trajectory computed by the object motion determination unit 107 is allowed to be shorter than the target time, and the trajectory ends at the frame where the scene change occurred. The case in which motion estimation becomes impossible because the object has moved to the edge of the image is likewise treated as a "scene change": the motion is not extended beyond that frame, and the trajectory up to that frame is output.

When "occlusion" is determined, the object motion determination unit 107 may instead extend the movement trajectory by the following procedure. Corresponding-point information is defined between two frames, so a trajectory can also be generated backward along the time axis. As shown in FIG. 11A, if the trajectory can be extended backward from the comment start time, the object motion determination unit 107 may, instead of using linear interpolation, obtain a trajectory covering the target time by moving the comment start time earlier. Because the trajectory is lengthened toward the period in which the object is more clearly visible, comment-following coordinates that give a natural-looking comment display can be obtained.
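The backward extension can be sketched as below, under the assumption that a backward trajectory from the comment start frame has already been computed. The function name and the data layout (`backward[i]` holding the position at frame `start_frame - 1 - i`) are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def shift_start_earlier(
    start_frame: int,
    forward: List[Point],
    backward: List[Point],
    target_frames: int,
) -> Tuple[int, List[Point]]:
    """Make up a shortfall in the forward trajectory by starting the
    comment earlier. forward[0] is the position at start_frame;
    backward[i] is the position at frame start_frame - 1 - i."""
    missing = max(0, target_frames - len(forward))
    borrowed = backward[:missing]              # nearest earlier frames first
    new_start = start_frame - len(borrowed)
    return new_start, list(reversed(borrowed)) + list(forward)


forward = [(10.0, 5.0), (11.0, 5.0), (12.0, 5.0)]            # frames 100..102
backward = [(9.0, 5.0), (8.0, 5.0), (7.0, 5.0), (6.0, 5.0)]  # frames 99..96
new_start, trajectory = shift_start_earlier(100, forward, backward, target_frames=5)
```

In this example two frames are missing, so the comment starts at frame 98 and the trajectory runs through frame 102 in chronological order.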

Alternatively, when "occlusion" is determined, the object motion determination unit 107 may extend the movement trajectory by the following procedure. Pixels in regions that are close to one another in the image can generally be assumed to move similarly. As shown in FIG. 11B, the object motion determination unit 107 therefore also obtains movement trajectories for the pixels located within a predetermined pixel range R of the comment coordinates at a given comment time in the input comment information 111 received by the comment input reception unit 102. From these trajectories, the object motion determination unit 107 may select the one whose time length is closest to the target time and take its motion as the motion of the designated pixel. Using the surrounding information yields comment-following coordinates that are more robust to noise and similar disturbances.

As another example, the comment coordinates 112 received by the comment input reception unit 102 may be specified as a region, as in FIG. 12A. In this case, the multiple comment coordinates 112 corresponding to the designated region at a given comment time in the received input comment information 111 can be used in place of the predetermined pixel range R.

As yet another example, the object motion determination unit 107 may include a region division unit that divides each picture into regions. Of the regions produced by the region division unit, the object motion determination unit 107 may use the region that contains the comment coordinates at a given comment time in the input comment information 111, as shown in FIG. 12B, in place of the predetermined pixel range R.

For example, the region division unit divides a picture into multiple regions based on the color similarity of the pixels or blocks to which each movement trajectory belongs. A method that divides the picture into multiple regions known as "superpixels" based on pixel color similarity may also be used. Superpixels can be computed with, for example, a graph-based method. The processing procedure is described in detail in Non-Patent Document 4 and elsewhere, and is therefore omitted here. This method estimates the boundaries between regions from a graph representation of the picture, and thereby divides the picture into small regions efficiently while preserving global features. Because it is resistant to occlusion, it gives a more robust division.

In particular, in scenes where the color of the moving object differs from that of the background, each region is more likely to consist of only the moving object or only the background (different colors are more likely to be separated into different subclasses), so the picture can be divided more accurately into regions that move similarly.

A motion-based region division method may also be applied. Specifically, a technique such as that of Patent Document 7 may be used. In this way, even when the moving object and the background have similar colors, the picture can be divided more accurately into subject regions that move similarly.

As yet another example, the object motion determination unit 107 may include a region division unit and, instead of obtaining multiple movement trajectories, prepare multiple region division results as shown in FIGS. 13A and 13B. The motion and tracking time length of each region are then obtained as the average of the trajectories contained in the region and the minimum time length among those trajectories, respectively. Among the regions into which the user-designated region has been divided, the object motion determination unit 107 may select the region whose tracking time length is closest to the target time and take the motion of that region as the motion of the designated pixel. In general, there is a trade-off between division granularity and tracking time length. For example, even if the tracking time length of the designated region is shorter than the target time under a coarse division such as that in FIG. 13A, a finer division such as that in FIG. 13B can bring the tracking time length closer to the target time. As in FIG. 13B, the head region may have a short tracking time length while the body region designated by the user can be tracked for longer. In that case, the result of FIG. 13B is used. Because information from surrounding pixels is used in addition to the single designated pixel, object-following comment coordinates that are more robust to noise and similar disturbances can be obtained. The color-similarity-based or motion-based methods described above can be used for region division.
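The region-level computation can be sketched as follows. This is an interpretation rather than a confirmed implementation: "average of the trajectories" is taken here as the per-frame mean position over the frames that all trajectories in the region share, and the region names and sample trajectories are made up.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def region_motion(trajectories: List[Trajectory]) -> Tuple[Trajectory, int]:
    """Region motion = per-frame mean position; region tracking length =
    the shortest trajectory length within the region."""
    duration = min(len(t) for t in trajectories)
    motion = []
    for f in range(duration):
        xs = [t[f][0] for t in trajectories]
        ys = [t[f][1] for t in trajectories]
        motion.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return motion, duration


def pick_region(regions: Dict[str, List[Trajectory]], target_frames: int) -> Tuple[str, Trajectory]:
    """Among candidate regions containing the user's point, choose the one
    whose tracking length is closest to the target."""
    summaries = {name: region_motion(trajs) for name, trajs in regions.items()}
    best = min(summaries, key=lambda n: abs(summaries[n][1] - target_frames))
    return best, summaries[best][0]


head = [(float(t), 0.0) for t in range(3)]     # lost after 3 frames
body_a = [(float(t), 10.0) for t in range(6)]
body_b = [(float(t), 12.0) for t in range(6)]
regions = {
    "person (coarse)": [head, body_a, body_b],  # limited by the head: 3 frames
    "body (fine)": [body_a, body_b],            # trackable for 6 frames
}
name, motion = pick_region(regions, target_frames=6)
```

The coarse region is capped at three frames by its shortest member, so the finer "body" region is selected for a six-frame target.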

Even when the position designated by the user lies near the edge of a divided region, as in FIGS. 13A and 13B, a comment that matches the position intended by the user can be displayed most simply by obtaining the centroid coordinates of the divided region in each frame. The relative position of the user-designated coordinates with respect to the centroid of the divided region in the designation start frame is stored, and the tracking result is always corrected with it. If the size of the divided region changes, the corrected coordinates become more likely to fall outside the actual divided region. In that case, the centroid may be obtained within a rectangular region found by, for example, enclosing the outer edge of the divided region in a rectangle; the positional relationship between the user-designated coordinates and the centroid of the divided region is then corrected based on the change in the size of the rectangular region or the change in the number of movement trajectories within it, and the tracking result is corrected using the corrected relationship. Instead of obtaining the centroid of the divided region in every frame, the centroid may be obtained only in the designation start frame; the trajectory computed from that centroid is then used, again with the relative position, to determine the trajectory from the user-designated position.
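The simplest variant, a fixed offset from the per-frame centroid, can be sketched as below. Regions are represented as lists of pixel coordinates, which is an assumption, and the size-dependent correction of the offset is not covered.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def centroid(pixels: List[Point]) -> Point:
    """Centroid (mean position) of the pixels that make up a region."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)


def follow_with_offset(user_point: Point, region_per_frame: List[List[Point]]) -> List[Point]:
    """Track the region centroid in each frame and re-apply the offset
    between the user's point and the centroid in the start frame."""
    c0 = centroid(region_per_frame[0])
    offset = (user_point[0] - c0[0], user_point[1] - c0[1])
    return [(cx + offset[0], cy + offset[1]) for cx, cy in map(centroid, region_per_frame)]


frame0 = [(0, 0), (2, 0), (0, 2), (2, 2)]      # region in the start frame
frame1 = [(x + 10, y) for x, y in frame0]      # same region, moved 10 px right
path = follow_with_offset((3, 1), [frame0, frame1])
```

The user's point sits 2 pixels to the right of the centroid in the start frame, and it keeps that offset as the region moves.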

In the present disclosure, the object motion determination unit 107 has been described as estimating object motion from the moving image, but its operation is not limited to this. For example, inter-frame motion may be estimated from the moving image in advance using multiple parameters and stored as a database or table in a storage device provided inside or outside the comment information generation device. On receiving a comment input, the object motion determination unit 107 may then obtain the movement trajectory by referring to the database or table over a wired or wireless connection instead of estimating motion from the moving image. FIG. 14 shows an example of such a database, which contains trajectory information for each time length. Performing motion estimation in advance allows the object motion determination processing to run faster when a comment is input.

As shown in FIG. 15, the object motion generation unit 103 may further include a target time correction unit 108, which corrects the target time determined by the target time determination unit 106 based on the movement trajectory computed by the object motion determination unit 107.

For example, even for comments with exactly the same text, a comment displayed following a fast-moving object is harder to read, and takes longer to finish reading, than one displayed following a slowly moving or stationary object. It is therefore more desirable to correct the target time according to the speed of the object. That is, for a comment attached to an object that moves a longer distance in the same time, the target time is corrected to be longer.

Specifically, the target time correction unit 108 receives the movement trajectory determined by the object motion determination unit 107 and calculates the speed of the object (the distance moved per unit time). Most simply, the speed of the object can be expressed as the average length of the per-frame motion vectors.
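The simplest speed measure named above, the mean length of the per-frame motion vectors, can be sketched like this. The trajectory is assumed to hold one (x, y) position per frame, so the result is in pixels per frame.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def mean_speed(trajectory: List[Point]) -> float:
    """Average length of the frame-to-frame motion vectors (pixels per frame)."""
    if len(trajectory) < 2:
        return 0.0
    steps = [
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:])
    ]
    return sum(steps) / len(steps)


speed = mean_speed([(0, 0), (3, 4), (6, 8)])   # two steps of length 5
```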

If the calculated speed of the object is zero, the object is stationary and the target time need not be corrected.

Otherwise, it is desirable to lengthen the target time according to the speed of the object. For example, the target time may be doubled at the maximum possible object speed, and the target time for the calculated object speed obtained by linear interpolation between that value and the target time at zero speed. A factor other than two may be used depending on the image size. As the maximum possible object speed, the length of the motion vector for a move from one end of the image diagonal to the other in a single frame may be used, for example. If the type of scene is known in advance, a value suited to the scene type may be used instead, such as the length of the motion vector for a move from the center of the image to a corner in a single frame.
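The linear correction can be written out as a short sketch. It assumes the maximum speed is the image diagonal per frame and a factor of two at that speed, as in the example in the text; clamping speeds above the maximum is an added assumption.

```python
import math


def corrected_target_time(
    target: float,
    speed: float,
    width: int,
    height: int,
    max_factor: float = 2.0,
) -> float:
    """Scale the target time linearly from 1x at zero speed to
    max_factor at the maximum speed (one image diagonal per frame)."""
    max_speed = math.hypot(width, height)
    ratio = min(max(speed / max_speed, 0.0), 1.0)  # clamp to [0, 1]
    return target * (1.0 + (max_factor - 1.0) * ratio)


# 640 x 480 image: the diagonal is 800 pixels.
slow = corrected_target_time(3.0, speed=0.0, width=640, height=480)
mid = corrected_target_time(3.0, speed=200.0, width=640, height=480)
fast = corrected_target_time(3.0, speed=800.0, width=640, height=480)
```

For a 3-second target this gives no change at rest, a 25% increase at a quarter of the maximum speed, and a doubling at the maximum.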

The criterion may also be set based on the human visual angle. When a person gazes at part of a screen, the region that is seen clearly (central vision) is said to span a visual angle of about 2 degrees. Assuming that, up to an object speed corresponding to this visual angle, the comment stays within the central vision region and can be read just as quickly as when the object is stationary, that object speed is easily obtained from the visual angle of central vision, the distance between the screen and the user's eyes during viewing, and the display size and resolution during viewing.

For example, assuming a viewing distance of about 40 cm and a 9.7-inch display (19.71 cm × 14.78 cm) with a resolution of 2048 × 1536 pixels, this object speed is about 145 pixels. The target time may therefore be lengthened when the object speed exceeds 145 pixels. For instance, so that the object's speed is in effect reduced to the number of pixels corresponding to the visual angle of central vision, the corrected target time may be obtained by multiplying the target time by the object speed divided by the object speed corresponding to central vision (145 pixels in this example).
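The 145-pixel figure can be reproduced with the following sketch. The formula used, an extent of 2·d·tan(θ/2) on the screen converted with the horizontal pixel density, is an assumption that happens to match the stated value; the text does not spell out the calculation.

```python
import math


def central_vision_pixels(
    view_deg: float,
    distance_cm: float,
    screen_width_cm: float,
    horizontal_pixels: int,
) -> float:
    """On-screen extent of the central vision region, in pixels."""
    extent_cm = 2.0 * distance_cm * math.tan(math.radians(view_deg) / 2.0)
    return extent_cm * horizontal_pixels / screen_width_cm


def scale_target_time(target: float, speed_px: float, central_px: float) -> float:
    """Lengthen the target time in proportion to speed once the object
    moves faster than the central vision extent; otherwise leave it."""
    return target * speed_px / central_px if speed_px > central_px else target


limit = central_vision_pixels(2.0, 40.0, 19.71, 2048)   # roughly 145 pixels
doubled = scale_target_time(2.0, speed_px=290.0, central_px=145.0)
```

An object moving at twice the central vision extent has its target time doubled, while slower objects keep the original target time.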

The object motion determination unit 107 receives the target time corrected in this way and, based on it, computes the movement trajectory again by the methods described above. This makes it possible to generate a trajectory that does not reduce visibility even for a comment attached to a fast-moving subject.

The target time correction loop between the target time correction unit 108 and the object motion determination unit 107 described above may be run only once, or several times to improve accuracy.

Referring to FIG. 5, the output comment information generation step S304 is executed by the output comment information generation unit 104. The output comment information generation unit 104 generates output comment information from the object motion estimation result generated by the object motion generation unit 103 and the input comment information 111.

At a minimum, the output comment information contains the comment time, the comment target coordinates, and the text information from the input comment information 111, together with the object-following comment coordinate values for multiple frames generated by the object motion generation unit 103. If the input comment information 111 also contains the comment color, the comment display (balloon) shape, the font of the comment text, and the like, that information may be included in the output comment information as well. This makes it possible to generate output comment information whose display better reflects the user's intent.
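As an illustration only, the contents listed above could be held in a record like the following. The class and field names are hypothetical; the text specifies what information is carried, not how it is laid out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class OutputCommentInfo:
    # Minimum contents taken from the input comment information.
    comment_time: float               # time at which the comment was attached
    target_xy: Point                  # comment target coordinates
    text: str                         # comment text
    # Object-following comment coordinates, one per frame.
    follow_xy: List[Point]
    # Optional presentation attributes, copied over when present.
    color: Optional[str] = None
    balloon_shape: Optional[str] = None
    font: Optional[str] = None


info = OutputCommentInfo(
    comment_time=12.5,
    target_xy=(320.0, 180.0),
    text="Nice catch!",
    follow_xy=[(320.0, 180.0), (324.0, 181.0), (329.0, 183.0)],
    color="yellow",
)
```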

Finally, the output step S305 is executed by the output unit 105. The output unit 105 outputs the output comment information generated by the output comment information generation unit 104 to the storage device 120 over a wired or wireless communication path. More specifically, the output unit 105 stores the posted comment text and the position information of the object that the comment is displayed following, both contained in the output comment information, in a comment database provided in a comment storage and distribution server.

When processing input comment information 111 that arrives continuously, the comment information generation device 100 may repeat the operations of steps S301 to S307 described above each time input comment information 111 is input.

Because the comment information generation device 100 generates different movement trajectories depending on the length of the comment, comments attached at the same frame and coordinate position but with different lengths can be given trajectories that make them move differently when the commented moving image is shown on a display device.

As described above, the comment information generation device 100 of the present embodiment adopts, as the coordinates of the object-following comment, the coordinates of the movement trajectory whose time length equals or is closest to the target time determined by the target time determination unit 106. It thereby obtains a motion estimation result that has the time length needed for comment display and is as robust as possible to noise or model error. As a result, coordinates for a natural-looking object-following comment can be generated as the outcome of temporally tracking the object that the user designated with the intention of attaching a comment.

In other words, the comment information generation device 100 determines, based on the comment, the target time for which the comment should be displayed, and estimates the movement trajectory of the object so that the continuous duration of the trajectory matches the target time. The estimated trajectory of the object can be used as the trajectory along which the comment is displayed following the object. When the commented moving image is displayed, the user can therefore read the comment within its display time and can tell which object the comment was attached to. Output comment information that improves the visibility of comments can thus be generated.

Industrial Applicability

The present invention is applicable to a comment information generating apparatus that, in communication through moving images over a network, generates output comment information for a comment that follows an object specified by a user. For example, the present invention can be used as a comment information generating apparatus built into AV equipment such as a tablet PC, a smartphone, a PC, a video camera, or a TV that can acquire or view moving images through a network connection.

Description of Symbols

100 Comment information generating apparatus
101 Moving image acquisition unit
102 Comment input reception unit
103 Object motion generation unit
104 Output comment information generation unit
105 Output unit
106 Target time determination unit
107 Object motion determination unit
108 Target time correction unit
110 Moving image
111 Input comment information
112 Comment coordinates
120 Storage device
200 Computer
201a, 201b, 206 I/F
202 CPU
203 ROM
204 RAM
205 HDD
210a, 220 Storage device
210b Input device

Claims

A comment information generating apparatus comprising:
a moving image acquisition unit that receives a moving image;
a comment input reception unit that receives position information of an object in the moving image received by the moving image acquisition unit, and an input of a comment to be displayed following the object from a specific timing;
a target time determination unit that determines, based on the comment received by the comment input reception unit, a target time that is a target value of the time length for which the comment continues to be displayed;
an object motion determination unit that determines a movement trajectory of the object, along which the comment is displayed following the object indicated by the position information, such that the movement trajectory has the time length of the target time;
an output comment information generation unit that generates output comment information including the comment and the movement trajectory of the object determined by the object motion determination unit; and
an output unit that outputs the output comment information generated by the output comment information generation unit.

The comment information generating apparatus according to claim 1, wherein
the target time determination unit calculates the target time such that the longer the comment received by the comment input reception unit, the longer the target time, and
the output unit outputs a movement trajectory having a longer time length when a longer comment is input to the comment input reception unit.

The comment information generating apparatus according to claim 2, wherein
the target time determination unit calculates, as the target time, the product of a unit display time, which is a predetermined display time per character, and the number of characters of the comment received by the comment input reception unit, and
when the comment is input to the comment input reception unit, the output unit outputs a movement trajectory having a time length equal to the product of the number of characters of the comment and the unit display time.

The comment information generating apparatus according to claim 2, wherein
the target time determination unit further sets, as the target time, a predetermined visual recognition time required for visually recognizing characters when the calculated target time is shorter than the visual recognition time, and
when the comment is input to the comment input reception unit, the output unit outputs a movement trajectory having the longer of (i) a time length equal to the product of the number of characters of the comment and a unit display time, which is a predetermined display time per character, and (ii) the visual recognition time, so that a movement trajectory having a time length equal to or longer than the visual recognition time is output no matter how short the input comment is.
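
By way of a non-limiting illustration, the target time calculation recited above may be sketched as follows. The function name compute_target_time and the default values of unit_display_time and visual_recognition_time are assumptions chosen only for this sketch; the claims state only that both values are predetermined.

```python
def compute_target_time(comment: str,
                        unit_display_time: float = 0.25,
                        visual_recognition_time: float = 1.0) -> float:
    """Return the target display time in seconds for a comment.

    unit_display_time: assumed display time per character (seconds).
    visual_recognition_time: assumed minimum time needed to recognize
    any text at all (seconds); used as a lower bound.
    """
    if unit_display_time <= 0 or visual_recognition_time < 0:
        raise ValueError("display times must be positive")
    proportional = len(comment) * unit_display_time
    # Use the longer of the per-character time and the minimum time.
    return max(proportional, visual_recognition_time)
```

With these assumed values, a two-character comment is still displayed for the minimum visual recognition time, while longer comments scale with their character count.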

The comment information generating apparatus according to claim 1, wherein, when a plurality of comments received by the comment input reception unit are attached to the same position in the same frame but differ in number of characters, the output unit outputs movement trajectories that cause the comments to move differently.

The comment information generating apparatus according to any one of the preceding claims, wherein the object motion determination unit calculates, using each of a plurality of motion estimation methods or each of a plurality of motion estimation parameters, a movement trajectory in the moving image of the object indicated by the position information received by the comment input reception unit, and determines the movement trajectory of the object by selecting, from among the calculated movement trajectories, the movement trajectory having a length closest to the target time.

The comment information generating apparatus according to claim 6, wherein the object motion determination unit uses, as the plurality of motion estimation parameters, any one of (1) a plurality of error tolerance parameters that have different values and affect how easily the object is tracked, (2) a plurality of search window regions of different sizes, and (3) a plurality of feature amounts having different values, to calculate movement trajectories in the moving image of the object indicated by the position information received by the comment input reception unit, and determines the movement trajectory of the object by selecting, from among the calculated movement trajectories, the movement trajectory having a length closest to the target time.

The comment information generating apparatus according to claim 6 or 7, wherein, when a “state where motion cannot be estimated” occurs, in which a movement trajectory of the object having a length closest to the target time cannot be determined even by using each of the plurality of motion estimation methods or each of the plurality of motion estimation parameters, the object motion determination unit further determines whether the state is caused by occlusion or by a scene change, and switches the object motion determination method based on the determination result.

The comment information generating apparatus according to claim 8, wherein, when the object motion determination unit determines that the “state where motion cannot be estimated” is caused by occlusion, it determines the movement trajectory of the object having a length closest to the target time by extrapolating the movement trajectory of the object in the frames after the frame in which the occlusion occurred, based on the movement trajectory of the object up to that frame.
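
By way of a non-limiting illustration, the extrapolation recited above may be sketched as follows. The claim does not specify the extrapolation model; this sketch assumes constant velocity estimated from the last few tracked frames, and the names extrapolate_trajectory, target_frames, and window are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def extrapolate_trajectory(points: List[Point],
                           target_frames: int,
                           window: int = 5) -> List[Point]:
    """Extend a trajectory to target_frames points by linear extrapolation.

    points: tracked (x, y) positions up to the frame where occlusion occurred.
    window: number of most recent points used to estimate velocity (assumed).
    """
    if not points:
        raise ValueError("trajectory must contain at least one point")
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = points[-window:]
    steps = len(recent) - 1
    if steps > 0:
        # Average per-frame displacement over the recent window.
        vx = (recent[-1][0] - recent[0][0]) / steps
        vy = (recent[-1][1] - recent[0][1]) / steps
    else:
        vx = vy = 0.0  # A single point gives no velocity; hold position.
    extended = list(points)
    x, y = extended[-1]
    while len(extended) < target_frames:
        x, y = x + vx, y + vy
        extended.append((x, y))
    return extended
```

A trajectory that already has target_frames or more points is returned unchanged, so the function only lengthens trajectories that fall short of the target time.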

The comment information generating apparatus according to claim 8, wherein, when the object motion determination unit determines that the “state where motion cannot be estimated” is caused by a scene change, it determines, as the movement trajectory to be output, the movement trajectory of the object up to the frame in which the scene change occurred.

The comment information generating apparatus according to claim 8, wherein the object motion determination unit determines that the “state where motion cannot be estimated” is caused by a scene change when the amount of change in the luminance histogram between frames constituting the moving image is equal to or greater than a predetermined threshold, and determines that the state is caused by occlusion when the amount of change in the luminance histogram is less than the predetermined threshold.
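
By way of a non-limiting illustration, the threshold determination recited above may be sketched as follows. The claim does not define how the amount of change in the luminance histogram is measured; this sketch assumes the sum of absolute differences between normalized histograms, and the bin count, the threshold value, and the function names are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # rows of 8-bit luminance values (0..255)


def luminance_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Return a normalized luminance histogram with the given number of bins."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        raise ValueError("frame must contain at least one pixel")
    return [c / total for c in counts]


def classify_estimation_failure(prev_frame: Frame,
                                next_frame: Frame,
                                threshold: float = 0.5) -> str:
    """Classify the cause of a tracking failure between two frames.

    Returns "scene_change" when the histogram change is at or above the
    threshold, and "occlusion" otherwise.
    """
    h_prev = luminance_histogram(prev_frame)
    h_next = luminance_histogram(next_frame)
    change = sum(abs(a - b) for a, b in zip(h_prev, h_next))
    return "scene_change" if change >= threshold else "occlusion"
```

The assumed change measure ranges from 0 (identical histograms) to 2 (no overlap), so a threshold of 0.5 treats a small local change, such as an object passing in front of the target, as occlusion.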

The comment information generating apparatus according to claim 6, wherein, when the movement trajectory of the object obtained using each of the plurality of motion estimation methods or each of the plurality of motion estimation parameters is shorter than the target time by a certain time or more, the object motion determination unit determines the movement trajectory of the object having a length closest to the target time by connecting, to the front of the movement trajectory of the object, a movement trajectory estimated toward earlier frames on the time axis from the frame to which the position information and the comment received by the comment input reception unit were attached and from the position of the object indicated by that position information.

The comment information generating apparatus according to any one of claims 6 to 11, wherein, when the time length of the movement trajectory of the object obtained using each of the plurality of motion estimation methods or each of the plurality of motion estimation parameters is shorter than the target time by a certain time or more, the object motion determination unit determines, as the movement trajectory of the object indicated by the position information received by the comment input reception unit, the movement trajectory whose time length is closest to the target time among movement trajectories of the object that are based on positions within a certain distance range of the position indicated by the position information.

The comment information generating apparatus according to any one of claims 6 to 11, wherein, when the time length of the movement trajectory of the object obtained using each of the plurality of motion estimation methods or each of the plurality of motion estimation parameters is shorter than the target time by a certain time or more, the object motion determination unit determines, as the movement trajectory of the object indicated by the position information received by the comment input reception unit, the movement trajectory whose time length is closest to the target time among movement trajectories of the object that are based on positions within a user-specified range that includes the position indicated by the position information.

The comment information generating apparatus according to any one of the preceding claims, wherein, when the time length of the movement trajectory of the object obtained using each of the plurality of motion estimation methods or each of the plurality of motion estimation parameters is shorter than the target time by a certain time or more, the object motion determination unit divides the region of the object into a plurality of regions and determines, as the movement trajectory of the object, the movement trajectory of the region, among the regions obtained by the division, whose movement trajectory has a length closest to the target time.

The comment information generating apparatus according to any one of claims 1 to 11, wherein the object motion determination unit further determines, for the center of gravity of the object indicated by the position information received by the comment input reception unit, a movement trajectory having a length closest to the target time, corrects the determined movement trajectory, based on the relative positional relationship between the position at which the comment received by the comment input reception unit was attached and the center of gravity of the object, so that it appears to be a movement trajectory starting from the position at which the comment was attached, and outputs the corrected movement trajectory.
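
By way of a non-limiting illustration, the correction recited above may be sketched as follows. This sketch assumes that the relative positional relationship is a fixed offset measured in the commented frame and applied unchanged to every subsequent frame; the function name shift_to_comment_position is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def shift_to_comment_position(centroid_trajectory: List[Point],
                              comment_position: Point) -> List[Point]:
    """Translate a centroid trajectory so that it starts at the comment position.

    centroid_trajectory: per-frame center of gravity of the object, where the
    first point corresponds to the frame in which the comment was attached.
    comment_position: the position the user specified in that frame.
    """
    if not centroid_trajectory:
        raise ValueError("trajectory must contain at least one point")
    cx0, cy0 = centroid_trajectory[0]
    # Offset between the commented position and the centroid (assumed constant).
    dx, dy = comment_position[0] - cx0, comment_position[1] - cy0
    return [(x + dx, y + dy) for x, y in centroid_trajectory]
```

A constant offset ignores rotation and scaling of the object; handling those would require additional information that the claim does not specify.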

The comment information generating apparatus according to claim 1, further comprising
a target time correction unit that corrects the target time, based on the movement trajectory of the object determined by the object motion determination unit, such that the target time becomes longer as the moving speed of the object increases,
wherein the object motion determination unit further re-determines the movement trajectory of the object, along which the comment is displayed following the object indicated by the position information, such that the movement trajectory has the time length of the target time corrected by the target time correction unit.
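
By way of a non-limiting illustration, the target time correction recited above may be sketched as follows. The claim requires only that the target time become longer as the object moves faster; the linear scaling, the gain value, and the function names in this sketch are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def mean_speed(points: List[Point], fps: float) -> float:
    """Average speed in pixels per second along a per-frame trajectory."""
    if len(points) < 2:
        return 0.0
    distance = sum(math.hypot(x1 - x0, y1 - y0)
                   for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return distance / (len(points) - 1) * fps


def correct_target_time(target_time: float,
                        points: List[Point],
                        fps: float,
                        gain: float = 0.01) -> float:
    """Lengthen the target time in proportion to the object's mean speed.

    gain is an assumed tuning constant (seconds of extra display time per
    second of target time, per pixel/second of speed).
    """
    return target_time * (1.0 + gain * mean_speed(points, fps))
```

The corrected value would then be passed back to the trajectory determination so that the movement trajectory is re-determined against the longer target time.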

A comment information generating method comprising:
a moving image acquisition step of receiving a moving image as input;
a comment input reception step of receiving position information of an object in the moving image received in the moving image acquisition step, and an input of a comment to be displayed following the object;
a target time determination step of determining, based on the comment received in the comment input reception step, a target time that is a target value of the time length for which the comment is displayed;
an object motion determination step of determining a movement trajectory of the object, along which the comment is displayed following the object indicated by the position information, such that the movement trajectory has a time length sufficiently close to the target time;
an output comment information generation step of generating output comment information including the comment and the movement trajectory of the object determined in the object motion determination step; and
an output step of outputting the output comment information generated in the output comment information generation step.

A program for causing a computer to execute the comment information generating method according to claim 18.