JP2009169644A - Image generating device, information terminal, image generating method, and program

Info

Publication number: JP2009169644A
Application number: JP2008006691A
Authority: JP
Inventors: Takamichi Miyata; Hidefumi Iwashita
Original assignee: Tokyo Institute of Technology NUC
Current assignee: Tokyo Institute of Technology NUC
Priority date: 2008-01-16
Filing date: 2008-01-16
Publication date: 2009-07-30

Abstract

PROBLEM TO BE SOLVED: To estimate the presentation region optimal for viewers by applying an active contour model to a viewing history.

SOLUTION: An image generating device 1 generates one image from a plurality of input images. The device includes a region estimating part 14 that determines an estimated region from the plurality of input images so as to minimize an energy function, using an active contour model whose energy function is defined to contain a narrowing energy that narrows a region down so that it contains the image most common to the plurality of input images.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an image generation device, an information terminal, an image generation method, and a program. In particular, it relates to an image generation device, an information terminal, an image generation method, and a program suited to generating an image by applying an active contour model to a viewing history, for example a set of partial regions selected from an original image by a plurality of viewers, and thereby estimating the presentation region that is optimal for viewers.

When visual content such as moving image content or text is viewed, there is a spatial cropping range (the optimal presentation region) that gives the viewer the highest satisfaction. For moving image content of a soccer match, for example, this corresponds to presenting a view that zooms in on the area around the ball and follows its movement, rather than showing the entire field.

In recent years, video viewing environments have diversified, as seen in larger home television screens and the release of mobile phones with television functions. As a result, viewers now sometimes watch high-resolution visual content on low-resolution displays. In such cases, the entire image of the high-resolution visual content is generally just downsampled.

Patent Document 1 therefore proposes an interface that lets a viewer freely perform spatial cropping when viewing moving image content on a mobile terminal. Non-Patent Document 1 proposes an interface that lets viewers easily perform cropping operations according to their own preferences while viewing moving image content, so that the histories of cropping operations performed by multiple viewers can be collected efficiently.

Meanwhile, Non-Patent Document 2 proposes Snakes, an active contour model, as a model for extracting contours from still images. An active contour model is an algorithm used, for example, to extract object regions in an image. It defines an energy function determined by a contour and makes the contour converge by minimizing that energy, thereby achieving a given objective.

Further, Non-Patent Document 3 proposes Active Tubes as a model for extracting the region corresponding to a moving object in a spatiotemporal image. Active Tubes can be regarded as Snakes applied to spatiotemporal images and, like Non-Patent Document 2, uses energy minimization to extract an object in a spatiotemporal image.

Patent Document 1: JP 2005-341398 A
Non-Patent Document 1: Hidefumi Iwashita, Takamichi Miyata, Yoshinori Sakai, "Proposal of video content trimming based on viewer operation history," IMPS2006, pp. 55-56, (Nov. 2006).
Non-Patent Document 2: M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: active contour models," International Journal of Computer Vision, vol. 1, no. 4, pp. 321-331, (1988).
Non-Patent Document 3: Ryo Furukawa, Masakazu Imai, Takeshi Kanno, "Elastic contour model using spatio-temporal image and its convergence method," IEICE Transactions, D-II, Vol. J79-D-II, No. 6, pp. 1054-1063, (Jun. 1996).

However, although the interface described in Patent Document 1 can display the region cropped by a viewer on the screen, it cannot estimate an optimal presentation region from the differing regions cropped by multiple viewers. The interface described in Non-Patent Document 1 can collect operation histories from multiple viewers, but it does not disclose a specific method for estimating the optimal presentation region from them.

Snakes, described in Non-Patent Document 2, extracts the contour of an object in the target image, but the extracted contour does not correspond to the format of the display screen and is not necessarily the optimal presentation region when displayed. Active Tubes, described in Non-Patent Document 3, can be applied to spatiotemporal images, unlike Snakes, but it too extracts contours from features of the original image itself (luminance information), so the optimal presentation region is not necessarily extracted.

The present invention has been made to solve these problems. Its object is to provide an image generation device, an information terminal, an image generation method, and a program that can estimate the presentation region optimal for viewers by applying an active contour model to a viewing history.

An image generation device according to the present invention generates one image from a plurality of input images. It includes a region estimation unit that obtains an estimated region from the plurality of input images so as to minimize an energy function, using an active contour model for which the energy function is defined to include a narrowing energy that narrows a region down so that it contains the image most common to the plurality of input images.

In the present invention, a region that contains the most common region is estimated from a plurality of selected regions for one original image. This makes it possible to estimate a presentation region containing many common regions, and thus a region that is average across the plurality of selected regions.

The narrowing energy preferably has an internal energy relating to the inside of the estimated region and an external energy relating to the outside of the estimated region. The internal energy becomes smaller the more the image within the estimated region is shared, and the external energy becomes smaller the fewer shared images lie outside the estimated region. This allows a more accurate region to be estimated.
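The split into internal and external energy can be illustrated with a minimal Python sketch. It assumes that "how shared" a pixel is can be measured by a per-pixel viewer count, as the embodiment's histogram later suggests. The function name `narrowing_energy`, the use of mean counts, and the inclusive `(left, top, right, bottom)` rectangle encoding are all assumptions for illustration, not the formulas of this publication.

```python
def narrowing_energy(hist, rect):
    """Toy narrowing energy for one frame (assumed form, not the patent's formula).

    hist: 2D list, hist[y][x] = number of viewers whose selection covers pixel (x, y).
    rect: (left, top, right, bottom), inclusive pixel coordinates.
    """
    left, top, right, bottom = rect
    inside, outside = [], []
    for y, row in enumerate(hist):
        for x, count in enumerate(row):
            if left <= x <= right and top <= y <= bottom:
                inside.append(count)
            else:
                outside.append(count)
    mean_in = sum(inside) / len(inside) if inside else 0.0
    mean_out = sum(outside) / len(outside) if outside else 0.0
    internal = -mean_in   # smaller when the inside is widely shared
    external = mean_out   # smaller when little shared content is left outside
    return internal + external


# Example viewer-count map for a 4x4 frame (hypothetical data).
hist = [
    [1, 1, 1, 0],
    [1, 2, 2, 1],
    [1, 2, 3, 2],
    [0, 1, 2, 2],
]
good = narrowing_energy(hist, (1, 1, 3, 3))  # covers the densely selected area
poor = narrowing_energy(hist, (0, 0, 1, 1))  # covers a sparsely selected corner
```

Under these assumptions, the rectangle over the densely selected area gets the lower energy, which is the behavior the paragraph above describes.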

The energy function preferably further includes a size adjustment energy that brings the size of the estimated region close to a target value. This makes it possible to estimate a presentation region whose size is close to the target value.

Preferably, each input image is a partial region cropped from an original image, and the energy function further includes an inter-image adjustment energy that becomes smaller the more closely the cropping positions agree between original images. This regulates the amount of positional movement between frames and improves the quality of the estimated region.

Further, the plurality of original images is preferably time-series data composed of a plurality of frames, and the inter-image adjustment energy connects the cropping positions of the preceding and following frames with a straight line and becomes smaller the closer the cropping position of the frame between them is to that line. This regulates the amount of positional movement between frames and further improves the quality of the estimated region.
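As a rough illustration of this inter-image adjustment energy, the Python sketch below assumes equally spaced frames, so that the straight line joining the cropping positions of the preceding and following frames passes through their midpoint at the time of the middle frame. The function name `inter_frame_energy` and the squared-distance form are assumptions for illustration. The text here only states that the energy decreases as the middle position approaches the line.

```python
def inter_frame_energy(prev_pos, cur_pos, next_pos):
    """Squared distance from the current cropping position to the straight line
    joining the previous and next positions, evaluated at the current frame time.

    Each position is an (x, y) tuple, e.g. the top-left corner of the crop.
    Assumes equally spaced frames, so the line passes through the midpoint.
    """
    mid_x = (prev_pos[0] + next_pos[0]) / 2
    mid_y = (prev_pos[1] + next_pos[1]) / 2
    dx = cur_pos[0] - mid_x
    dy = cur_pos[1] - mid_y
    return dx * dx + dy * dy


on_line = inter_frame_energy((0, 0), (5, 5), (10, 10))   # smooth motion
off_line = inter_frame_energy((0, 0), (5, 8), (10, 10))  # jitter in y
```

A position exactly on the line gives zero energy, and a position that jitters away from it is penalized, which favors smooth crop trajectories across frames.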

Preferably, the estimated region is rectangular, and the energy function further has an aspect ratio adjustment energy that keeps the aspect ratio of the estimated region constant. This improves the quality of the region with respect to its aspect ratio.

The plurality of input images may be images obtained by cropping part or all of one or more images. This makes it possible to estimate, in a moving image, a common region within part of the image.

The plurality of input images is preferably a viewing history, that is, a set of regions selected by a plurality of viewers viewing the original image. This makes it possible to estimate the presentation region that many viewers want.

The device preferably further includes a viewing history storage unit that stores the viewing history, and an aggregation processing unit that refers to the viewing history storage unit and generates a histogram counting the number of viewers for each region. This allows the number of viewers to be used as a density in defining the energy.

Preferably, the region estimation unit minimizes the energy function by a greedy method and, while the number of iterations of the greedy method is at or below a predetermined number, obtains the region that minimizes the energy function excluding the external energy and the inter-image adjustment energy. This improves convergence efficiency.
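The two-phase greedy minimization described above might be sketched as follows in Python. This is an illustrative sketch only. The function `greedy_minimize`, its parameters, the one-pixel edge moves, and the toy energies are all assumptions. The text specifies only that a greedy method is used and that some energy terms are left out during the first iterations.

```python
def greedy_minimize(rect, full_energy, early_energy, warmup=5, max_iters=100):
    """Greedily shrink or shift a rectangle to lower an energy.

    rect: (left, top, right, bottom), inclusive coordinates.
    early_energy: energy without the terms that are switched off at first.
    full_energy: energy with all terms, used after `warmup` iterations.
    """
    for iteration in range(max_iters):
        energy = early_energy if iteration < warmup else full_energy
        best, best_e = rect, energy(rect)
        # Try moving each of the four edges by one pixel in either direction.
        for i in range(4):
            for step in (-1, 1):
                cand = list(rect)
                cand[i] += step
                cand = tuple(cand)
                if cand[0] > cand[2] or cand[1] > cand[3]:
                    continue  # skip degenerate rectangles
                e = energy(cand)
                if e < best_e:
                    best, best_e = cand, e
        if best == rect and iteration >= warmup:
            break  # no neighbor improves the full energy: converged
        rect = best
    return rect


# Toy energies (hypothetical): the early phase only pulls the size toward 3x3,
# the full phase pulls every edge toward a target rectangle.
target = (2, 2, 5, 5)
early = lambda r: ((r[2] - r[0]) - 3) ** 2 + ((r[3] - r[1]) - 3) ** 2
full = lambda r: sum((a - b) ** 2 for a, b in zip(r, target))
result = greedy_minimize((0, 0, 9, 9), full, early)
```

In this toy setup the early phase quickly brings the rectangle to a reasonable size, and the full energy then settles it on the target rectangle.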

The device preferably further includes an input reception unit that accepts a region selected by a viewer as an input image and stores it in the viewing history storage unit. This lets viewers select ranges freely.

For the active contour model, an energy function that further includes an energy relating to layout information of the original image may be defined, and the region estimation unit preferably obtains the estimated region using that active contour model. This makes it possible to estimate a presentation region using the layout information of web content.

An information terminal according to the present invention displays the original image, determines the region selected by the viewer, and transmits information on that region to the image generation device according to the present invention. This enables selected regions to be collected automatically.

An image generation method according to the present invention generates one image from a plurality of input images. It includes a region estimation step of obtaining an estimated region from the plurality of input images so as to minimize an energy function, using an active contour model for which the energy function is defined to include a narrowing energy that narrows a region down so that it contains the image most common to the plurality of input images. This makes it possible to estimate a presentation region containing many common regions, and thus a region that is average across the plurality of selected regions.

The energy function preferably further includes a size adjustment energy that brings the size of the estimated region close to a target value. This makes it possible to estimate a presentation region whose size is close to the target value.

Preferably, each input image is a partial image region cropped from an original image, and the energy function further includes an inter-image adjustment energy that becomes smaller the more closely the cropping positions agree between original images. This regulates the amount of positional movement between frames and improves the quality of the estimated region.

The plurality of original images is preferably time-series data composed of a plurality of frames, and the inter-image adjustment energy connects the cropping positions of the preceding and following frames with a straight line and becomes smaller the closer the cropping position of the frame between them is to that line. This regulates the amount of positional movement between frames and further improves the quality of the estimated region.

Preferably, the estimated region is rectangular, and the energy function further has an aspect ratio adjustment energy that keeps the aspect ratio of the estimated region constant. This improves the quality of the region with respect to its aspect ratio.

The method preferably further includes an aggregation processing step of generating, from the viewing history, a histogram counting the number of viewers for each region. This allows the number of viewers to be used as a density in defining the energy.

Preferably, the region estimation step minimizes the energy function by a greedy method and, while the number of iterations of the greedy method is at or below a predetermined number, obtains the region that minimizes the energy function excluding the external energy and the inter-image adjustment energy. This improves convergence efficiency.

The method preferably further includes an input reception step of accepting a region selected by a viewer as an input image and storing it in the viewing history storage unit. This lets viewers select ranges freely.

For the active contour model, an energy function that further includes an energy relating to layout information of the original image may be defined, and the region estimation step preferably obtains the estimated region using that active contour model. This makes it possible to estimate a presentation region using the layout information of web content.

A program according to the present invention causes a computer to execute the processing of the image generation method described above.

According to the present invention, it is possible to provide an image generation device, an information terminal, an image generation method, and a program that can estimate the presentation region optimal for viewers by applying an active contour model to a viewing history.

Specific embodiments to which the present invention is applied are described in detail below with reference to the drawings. In the drawings, identical elements are given identical reference numerals, and redundant description is omitted where necessary for clarity.

Embodiment 1 of the Invention
The present invention relates to an image generation device that generates one image from a plurality of input images. The image generation device includes a region estimation unit that uses an active contour model to obtain an estimated region containing the image most common to the plurality of input images. For the active contour model, an energy function is defined that includes a narrowing energy, which narrows the region down so that it contains the image most common to the plurality of input images. The region estimation unit obtains the estimated region from the plurality of input images so as to minimize this energy function, and an image is generated based on the estimated region.

In Embodiment 1 of the present invention, the plurality of input images are selected regions that a plurality of viewers have cropped in advance from moving image content at the places they prefer, and the estimated region is obtained as the part of the selected regions that is shared by many of them. To obtain the estimated region, an energy function based on an active contour model is defined. This energy function treats the number of viewers in a selected region as a measure of viewer attention and includes a narrowing energy that narrows the estimated region down to the region with many viewers. In addition to the narrowing energy, the energy function defines an energy that brings the size of the estimated region close to a target value, an energy that adjusts the size and aspect ratio of the selected region, and an energy that smooths movement between the frames of the moving image content. The image generation device according to Embodiment 1 of the present invention is described in more detail next.

FIGS. 1 and 2 are diagrams showing a configuration example of the image generation device 1 according to Embodiment 1 of the present invention. FIG. 1 illustrates the flow in which the image generation device 1 distributes visual content to the information terminals 3a, 3b, and 3c and acquires the viewing history of that visual content for each viewer using the information terminals 3a, 3b, and 3c. FIG. 2 illustrates the flow in which the image generation device 1 performs region estimation processing based on the viewing history, generates estimated region content, and distributes the estimated region content to the information terminals 3d, 3e, and 3f and other information terminals. Viewers can thereby view the optimal presentation region of the visual content.

The image generation device 1 includes a content distribution unit 11, an input reception unit 12, an aggregation processing unit 13, a region estimation unit 14, a cropping processing unit 15, a content storage unit 21, a viewing history storage unit 22, a histogram storage unit 23, and an estimated region content storage unit 24. The image generation device 1 only needs to be a general-purpose computer system, for example a web server that distributes moving image content. The image generation device 1 is connected to the information terminals 3a, 3b, and 3c via a network (not shown) and can send and receive visual content and other information.

The content storage unit 21 stores the visual content distributed to the information terminals 3a, 3b, and 3c. Here, visual content is, for example, moving image content consisting of time-series data composed of a plurality of frames, or text. Each frame is rectangular and consists of a plurality of pixels, and the position of each pixel can be specified by coordinates within the frame.

In response to a content distribution request from the information terminal 3a, 3b, or 3c, the content distribution unit 11 refers to the content storage unit 21 and transmits the visual content to the requesting information terminal. The content distribution unit 11 may instead distribute the visual content stored in the content storage unit 21 to the information terminals 3a, 3b, and 3c simultaneously. The content distribution unit 11 also transmits the cropped visual content sent from the cropping processing unit 15 to the requesting information terminal. In addition, in response to an estimated region content distribution request from the information terminal 3d, 3e, or 3f, the content distribution unit 11 refers to the estimated region content storage unit 24 and transmits the estimated region content to the requesting information terminal.

The input reception unit 12 receives viewing history data from the information terminal 3a, 3b, or 3c and stores it in the viewing history storage unit 22. In doing so, the input reception unit 12 associates each viewer operating the information terminal 3a, 3b, or 3c with that viewer's selected region. The input reception unit 12 also sends the selected region to the cropping processing unit 15.

The cropping processing unit 15 acquires the selected region from the input reception unit 12, refers to the content storage unit 21, crops the visual content according to the selected region, and sends the cropped visual content to the content distribution unit 11.

The viewing history storage unit 22 stores the viewing history data of the visual content for each viewer operating the information terminals 3a, 3b, and 3c. Here, viewing history data is information representing the selected region, that is, the partial region of the visual content displayed on the screens of the information terminals 3a, 3b, and 3c that each viewer selected for each frame. The selected region is a rectangle and is expressed, for example, by the coordinates of two points (pixels) at the upper left and lower right of the rectangle, by the coordinates of the four corner points, by the coordinates of one corner together with the height and width of the rectangle, or by the coordinates of all pixels in the rectangle. In other words, viewing history data is information indicating which set of pixels in each frame was selected by each viewer. The selected region is rectangular in this embodiment, but it is not limited to a rectangle.
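The equivalent encodings of a rectangular selected region listed above can be illustrated with a small Python sketch. The class name `Selection`, its field names, and the choice of inclusive pixel coordinates are assumptions for illustration. The text does not prescribe a data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    """One viewer's selected rectangle in one frame (inclusive pixel coordinates)."""
    left: int
    top: int
    right: int
    bottom: int

    @classmethod
    def from_corner_size(cls, left, top, width, height):
        """Build from one corner plus the width and height of the rectangle."""
        return cls(left, top, left + width - 1, top + height - 1)

    def corners(self):
        """The four corner points as (x, y) tuples."""
        return [(self.left, self.top), (self.right, self.top),
                (self.left, self.bottom), (self.right, self.bottom)]

    def pixels(self):
        """Every pixel coordinate inside the rectangle."""
        return [(x, y)
                for y in range(self.top, self.bottom + 1)
                for x in range(self.left, self.right + 1)]


sel = Selection.from_corner_size(left=2, top=1, width=3, height=2)
```

All of these forms carry the same information. The two-corner form is the most compact, and the explicit pixel set generalizes to non-rectangular selections.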

The aggregation processing unit 13 refers to the viewing history storage unit 22, counts the number of viewers per pixel from the viewing history data, generates aggregate histogram information, and stores it in the histogram storage unit 23. Here, aggregate histogram information is information whose values are the number of viewers at each pixel of each frame.
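The per-pixel counting can be sketched in a few lines of Python. The function name `aggregate_histogram` and the inclusive `(left, top, right, bottom)` rectangle encoding are assumptions for illustration. The sketch handles a single frame, and a full implementation would repeat it for every frame.

```python
def aggregate_histogram(selections, width, height):
    """Count, for every pixel of one frame, how many viewers selected it.

    selections: one (left, top, right, bottom) rectangle per viewer,
                with inclusive pixel coordinates.
    Returns a 2D list where hist[y][x] is the viewer count at pixel (x, y).
    """
    hist = [[0] * width for _ in range(height)]
    for left, top, right, bottom in selections:
        # Clip each rectangle to the frame before counting.
        for y in range(max(top, 0), min(bottom, height - 1) + 1):
            for x in range(max(left, 0), min(right, width - 1) + 1):
                hist[y][x] += 1
    return hist


# Three viewers' selections on a 4x4 frame (hypothetical data).
selections = [(0, 0, 2, 2), (1, 1, 3, 3), (2, 2, 3, 3)]
hist = aggregate_histogram(selections, width=4, height=4)
```

Pixels covered by all three rectangles get the highest count, so peaks in the result mark the areas that the selected regions have in common.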

The histogram storage unit 23 stores the aggregate histogram information generated by the aggregation processing unit 13. FIG. 7 shows an example in which the aggregate histogram information is displayed as a graph. The graph in FIG. 7 is a three-dimensional plot whose axes are the frame height, the frame width, and the number of viewers for a given frame. Here, the frame height and frame width correspond to the per-pixel x and y coordinates within the frame. A region of the graph where the number of viewers is high is therefore a region selected by more viewers, in other words a region common to many selected regions.

The region estimation unit 14 refers to the histogram storage unit 23, finds the region that minimizes the energy given by the energy function described later, and takes it as the estimated region. The region estimation unit 14 then refers to the content storage unit 21, generates image data from the visual content based on the estimated region, and stores the result in the estimated region content storage unit 24 as estimated region content.

The estimated region content storage unit 24 stores the estimated region content generated from the visual content based on the region estimated by the region estimation unit 14.

The content storage unit 21, the viewing history storage unit 22, the histogram storage unit 23, and the estimated region content storage unit 24 may be nonvolatile storage devices such as hard disk drives or flash memory, or volatile storage devices such as DRAM (Dynamic Random Access Memory).

The configuration of the image generation device 1 is not limited to that shown in FIGS. 1 and 2. For example, the content distribution unit 11, the input reception unit 12, the aggregation processing unit 13, and the cropping processing unit 15 may each be implemented on separate computer systems, as long as they are connected to the image generation device 1 or can communicate with it over a network. Likewise, the content storage unit 21, the viewing history storage unit 22, the histogram storage unit 23, and the estimated region content storage unit 24 may each be implemented as separate storage devices connected to the image generation device 1.

The information terminals 3a, 3b, and 3c shown in FIG. 1 are connected to the image generation device 1 via a network. They transmit content distribution requests to the image generation device 1, receive the visual content distributed from it, and display that content on their screens. The information terminals 3a, 3b, and 3c also have an interface that lets the viewer crop the region to be viewed from the visual content displayed on the screen. For example, they only need to have the interface disclosed in Non-Patent Document 1, which allows images to be cropped (trimmed) while moving image content is being viewed. The information terminals 3a, 3b, and 3c also have an input device such as a mouse for specifying the crop. The number of information terminals 3a, 3b, and 3c is not limited to three, and their functions may be implemented within the image generation device 1.

The information terminals 3d, 3e, and 3f shown in FIG. 2 are connected to the image generation device 1 via a network. They transmit estimated region content distribution requests to the image generation device 1, receive the estimated region content distributed from it, and display that content on a screen. The information terminals 3d, 3e, and 3f may be general-purpose computer systems, for example computers equipped with a WEB browser. The number of information terminals is not limited to that shown. The information terminals that receive and display the estimated region content may also be the information terminals 3a, 3b, and 3c.

The cutout processing unit 15 may instead reside in the information terminals 3a, 3b, and 3c. For example, the information terminal 3a may acquire the selected region cut out by its viewer, cut out the received visual content based on that selected region, and display the cut-out visual content on its screen.

The outline of the processing in an application example of Embodiment 1 of the present invention is described below with reference to the flowchart of FIG. 3. FIG. 1 shows the processing flow of step S11 in FIG. 3 and illustrates the operation of collecting viewing histories from the information terminals 3a, 3b, and 3c. FIG. 2 shows the processing flow of steps S12 through S14 and illustrates the operation of generating the estimated region content from the collected viewing histories.

As shown in FIG. 1, the image generation device 1 first collects information on each viewer's selected region from the information terminals 3a, 3b, and 3c (S11). Specifically, the image generation device 1 receives a content transmission request from the information terminal 3a, and the content distribution unit 11 transmits the visual content stored in the content storage unit 21 to the requesting information terminal 3a. The information terminal 3a receives the visual content and displays it on its screen. The viewer at the information terminal 3a then selects the region to be viewed from the displayed visual content through the region selection screen 50 shown in FIG. 6, described later, and the information terminal 3a transmits the selected region to the image generation device 1. The input reception unit 12 of the image generation device 1 receives this viewing history data from the information terminal 3a and stores it in the viewing history storage unit 22. At this time, the cutout processing unit 15 generates cut-out visual content from the visual content stored in the content storage unit 21 based on the selected region acquired from the input reception unit 12, and the content distribution unit 11 transmits the cut-out visual content to the information terminal 3a. Step S11 is performed for all of the information terminals 3a, 3b, and 3c.

The collection of selected regions in step S11 is not limited to the processing described above. For example, information representing a plurality of selected regions for the visual content may be stored in the viewing history storage unit 22 directly from outside the image generation device 1.

Next, as shown in FIG. 2, the image generation device 1 generates aggregate histogram information (S12). Specifically, the aggregation processing unit 13 of the image generation device 1 refers to the viewing history storage unit 22, divides each viewer's selected region into pixels, counts the number of viewers per pixel to generate the aggregate histogram information, and stores it in the histogram storage unit 23. In other words, when the same pixel appears in the viewing history data of different viewers, those viewers are regarded as having selected the same pixel. Consequently, the value (number of viewers) in the aggregate histogram information is larger where more of the selected regions share a pixel (image).
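The aggregation in step S12 can be sketched as follows. This is a minimal illustration, assuming each viewer's selected region for a frame is an axis-aligned rectangle given as (left, top, right, bottom) in pixel coordinates with exclusive right and bottom edges. The function name `build_viewer_histogram` and this rectangle encoding are assumptions made for illustration and do not appear in the source.

```python
def build_viewer_histogram(width, height, selections):
    """Count, for every pixel of one frame, how many viewers selected it.

    selections: list of (left, top, right, bottom) rectangles, one per viewer,
    with right/bottom exclusive (an assumed encoding, not from the source).
    Returns a height x width list of lists of viewer counts.
    """
    hist = [[0] * width for _ in range(height)]
    for left, top, right, bottom in selections:
        # Clip each rectangle to the frame so out-of-range coordinates are ignored.
        x0, x1 = max(0, left), min(width, right)
        y0, y1 = max(0, top), min(height, bottom)
        for y in range(y0, y1):
            row = hist[y]
            for x in range(x0, x1):
                row[x] += 1
    return hist


# Three viewers select overlapping regions of a 6 x 4 frame.
hist = build_viewer_histogram(6, 4, [(0, 0, 4, 3), (2, 1, 6, 4), (2, 0, 5, 2)])
```

Pixels covered by all three rectangles end up with the largest count, which is the property the energy function later relies on.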

Subsequently, as shown in FIG. 2, the image generation device 1 estimates the region (S13). Specifically, the region estimation unit 14 of the image generation device 1 first refers to the histogram storage unit 23 and acquires the aggregate histogram information. The region estimation unit 14 then inputs the aggregate histogram information into the energy function described later and obtains the optimal solution by the energy minimization processing shown in FIG. 5. That is, the region estimation unit 14 computes the set of pixels indicating the region that minimizes the energy function. Next, based on the estimated region given by the optimal solution, the region estimation unit 14 refers to the content storage unit 21 and generates image data from the visual content. The region estimation unit 14 then stores the generated image data in the estimated region content storage unit 24 as the estimated region content.
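The last part of step S13, generating image data from the visual content according to the estimated region, amounts to a crop when the region is rectangular. The following is a minimal sketch, assuming a frame stored as a list of pixel rows and a region given as (left, top, right, bottom) with exclusive right and bottom edges; `crop_frame` is a hypothetical name, and the source does not specify how the image data is produced.

```python
def crop_frame(frame, region):
    """Return the part of `frame` inside `region`.

    frame: list of rows, each row a list of pixel values.
    region: (left, top, right, bottom), right/bottom exclusive (assumed encoding).
    """
    left, top, right, bottom = region
    return [row[left:right] for row in frame[top:bottom]]


# A 4 x 3 frame whose pixel values encode their own (x, y) position.
frame = [[(x, y) for x in range(4)] for y in range(3)]
cropped = crop_frame(frame, (1, 0, 3, 2))
```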

Thereafter, as shown in FIG. 2, the image generation device 1 distributes the estimated region content to the information terminals 3a, 3b, and 3c (S14). Specifically, the image generation device 1 receives an estimated region content transmission request from the information terminal 3a, and the content distribution unit 11 transmits the estimated region content stored in the estimated region content storage unit 24 to the requesting information terminal 3a. The information terminal 3a receives the estimated region content and displays it on its screen. The viewer at the information terminal 3a can thus view visual content for which the optimal presentation region has been estimated, rather than the original visual content. The image generation device 1 can distribute the estimated region content to the information terminals 3b and 3c and to other information terminals in the same way.

The active contour model employed in the region estimation processing according to Embodiment 1 of the present invention is now described. FIG. 4 is a diagram showing an example of applying the active contour model. FIG. 4(a) shows the initial state when the active contour model is applied to the target object 4 located at the center, and FIG. 4(b) shows the state after the model has converged to the contour of the target object 4. The target object 4 is one whose region is enclosed by edges. The active contour model extracts the contour of the target object 4 by extracting those edges as a closed curve.

In FIG. 4(a), the region 40 is the region enclosed by the representative points 41 to 48 and represents the contour for the target object 4 in the initial state. The number and positions of the representative points are given arbitrarily in the initial state and are not limited to those shown. The energy function is defined so that it becomes smaller as a point approaches the target object 4. The representative points 41 to 48 move in the directions of the arrows 401 to 408, respectively, as points that minimize this energy function are sought. The representative points 41 to 48 then stop at the positions shown in FIG. 4(b), and the region 40 has converged to the contour of the target object 4.

In a general active contour model, the edges are determined from, for example, differences in pixel values at the boundary between the target object 4 and the background image. In the active contour model according to Embodiment 1 of the present invention, the edges are instead determined from differences in the number of viewers.

Next, the energy function used in Embodiment 1 of the present invention is described. Here, the contour in the active contour model is represented as a finite number of representative points (nodes) v_{i,j} connected by a curve, where i is the node number within a frame and j is the frame number.

The energy function E_tube used in Embodiment 1 of the present invention is defined by equation (1).

Here, α, β, γ, δ, and ε are the weighting coefficients for the respective terms. E_hin and E_hout are energies that draw the region toward locations that many viewers are paying attention to, that is, toward popular positions. In other words, they make the region converge to one where the selected regions in the viewing history data have much in common, or to one that contains the most commonly selected image. E_frame is an energy that smooths the camera work. E_dis and E_asp are, respectively, an energy that keeps the estimated region at an appropriate size and an energy that brings it close to the aspect ratio of the screen.
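Equation (1) itself is not reproduced in this text. From the five weights and five terms described above, one plausible reading is a weighted sum of the terms. The form below is a reconstruction under that assumption. In particular, the summation over nodes i and frames j is a guess rather than the source's formula. The pairing of weights with terms follows the later description of the parameter schedule, in which β and γ weight equations (3) and (4) and δ and ε weight equations (5) and (6).

```latex
E_{tube} = \sum_{j} \sum_{i} \Bigl(
    \alpha \, E_{hin}(v_{i,j})
  + \beta  \, E_{hout}(v_{i,j})
  + \gamma \, E_{frame}(v_{i,j})
  + \delta \, E_{dis}(v_{i,j})
  + \varepsilon \, E_{asp}(v_{i,j})
\Bigr)
```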

Each term in equation (1) is expressed by equations (2) through (6) below. Here, v_{i,j} = (x_{i,j}, y_{i,j}) denotes the coordinates of node i in frame j. N is the total number of viewers of the content; P_IN and P_OUT are the numbers of pixels inside and outside the estimated region, respectively; and Σ_IN and Σ_OUT are the sums of the per-pixel viewer counts over the pixels inside and outside the estimated region, respectively.

As shown in equation (2), E_hin is the energy relating to the inside of the estimated region. It becomes smaller as the number of viewers inside the estimated region increases, that is, as more of the commonly selected image lies inside the estimated region.

As shown in equation (3), E_hout is the energy relating to the outside of the estimated region. It becomes smaller as the number of viewers outside the estimated region decreases, that is, as less of the commonly selected image lies outside the estimated region.
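Equations (2) and (3) are not reproduced in this text, so the sketch below uses assumed forms that merely satisfy the stated properties: E_hin = 1 − Σ_IN / (N · P_IN) falls as more viewers lie inside the region, and E_hout = Σ_OUT / (N · P_OUT) falls as fewer viewers lie outside it. The function name `histogram_energies`, the rectangle encoding, and both formulas are illustrative assumptions, not the patent's definitions.

```python
def histogram_energies(hist, region, n_viewers):
    """Return (E_hin, E_hout) for a rectangular region of one frame.

    hist: height x width viewer counts per pixel.
    region: (left, top, right, bottom), right/bottom exclusive (assumed encoding).
    Both formulas are hypothetical stand-ins for equations (2) and (3).
    """
    left, top, right, bottom = region
    sum_in = sum_out = 0
    p_in = p_out = 0
    for y, row in enumerate(hist):
        for x, count in enumerate(row):
            if left <= x < right and top <= y < bottom:
                sum_in += count
                p_in += 1
            else:
                sum_out += count
                p_out += 1
    # Mean viewer density inside/outside, normalised by the viewer total N.
    e_hin = 1.0 - sum_in / (n_viewers * p_in) if p_in else 1.0
    e_hout = sum_out / (n_viewers * p_out) if p_out else 0.0
    return e_hin, e_hout


# 4 x 4 frame in which both viewers selected the central 2 x 2 block.
hist = [[0, 0, 0, 0],
        [0, 2, 2, 0],
        [0, 2, 2, 0],
        [0, 0, 0, 0]]
tight = histogram_energies(hist, (1, 1, 3, 3), n_viewers=2)
loose = histogram_energies(hist, (0, 0, 4, 4), n_viewers=2)
```

Under these assumed forms, the region that tightly encloses the popular block has a lower E_hin than the region covering the whole frame, so minimization favors the tight region.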

As shown in equation (4), E_frame is an energy that becomes smaller the more closely the coordinates of the same node i coincide in the preceding and following frames. That is, E_frame is an energy that smooths movement between frames. Put differently, if the cutout positions in the frames before and after the target frame are joined by a straight line, E_frame becomes smaller as the cutout position in the target frame between them approaches that line.
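Equation (4) is not reproduced in this text. One form consistent with the description above is the squared second difference of the node position across three consecutive frames, which is proportional to the squared distance of the node from the midpoint of its positions in the preceding and following frames. The function name `frame_energy` and this specific formula are assumptions made for illustration.

```python
def frame_energy(prev_pt, cur_pt, next_pt):
    """Hypothetical stand-in for equation (4): squared second difference.

    Each argument is the (x, y) position of the same node i in frames
    j-1, j and j+1. The value is 0 when cur_pt is the midpoint of
    prev_pt and next_pt, and grows as the node deviates from it.
    """
    dx = prev_pt[0] - 2 * cur_pt[0] + next_pt[0]
    dy = prev_pt[1] - 2 * cur_pt[1] + next_pt[1]
    return dx * dx + dy * dy


smooth = frame_energy((10, 20), (12, 20), (14, 20))  # steady pan
jerky = frame_energy((10, 20), (30, 5), (14, 20))    # sudden jump in frame j
```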

As shown in equation (5), E_dis is an energy that becomes smaller as the size of the estimated region in frame j approaches the target size given by x_idea and y_idea. That is, E_dis is an energy for keeping the size of the estimated region uniform. Here, x_idea and y_idea are the target extents of the estimated region in the x and y directions, respectively.

As shown in equation (6), E_asp is an energy that becomes smaller as the aspect ratio of the estimated region in frame j approaches the target aspect ratio h/w. That is, E_asp is an energy for keeping the aspect ratio constant. Here, w and h are the width and height, respectively, of the screen on which the content is displayed.

The quantities x_size(j) and y_size(j) are given by equations (7) and (8) below.
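Equations (5) through (8) are not reproduced in this text. The sketch below assumes that each frame has two nodes (its upper-left and lower-right corners, as with the initial representative points used in the minimization), that x_size(j) and y_size(j) are the absolute coordinate differences between them, and that E_dis and E_asp are squared deviations from their targets. All function names and formulas here are illustrative assumptions; only the default parameter values come from the embodiment.

```python
def region_size(top_left, bottom_right):
    """Assumed form of equations (7) and (8): extent of the region in x and y."""
    x_size = abs(bottom_right[0] - top_left[0])
    y_size = abs(bottom_right[1] - top_left[1])
    return x_size, y_size


def size_energy(x_size, y_size, x_idea=360, y_idea=240):
    """Hypothetical E_dis: squared deviation from the target size."""
    return (x_size - x_idea) ** 2 + (y_size - y_idea) ** 2


def aspect_energy(x_size, y_size, w=180, h=120):
    """Hypothetical E_asp: squared deviation from the target ratio h / w."""
    if x_size == 0:
        return float("inf")  # a degenerate region has no aspect ratio
    return (y_size / x_size - h / w) ** 2


x_size, y_size = region_size((100, 50), (460, 290))
e_dis = size_energy(x_size, y_size)
e_asp = aspect_energy(x_size, y_size)
```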

To realize the region estimation processing according to Embodiment 1 of the present invention, at minimum E_hin must be defined in the energy function E_tube, because it is the term that makes use of the viewing history data. The specific form of equation (2), however, is arbitrary.

Needless to say, the forms of equations (3) through (6) are likewise not limited to those given here. Furthermore, other energy terms may be added to the energy function E_tube in addition to those in equation (1).

FIG. 5 is a flowchart showing the energy minimization processing according to Embodiment 1 of the present invention. A greedy algorithm is used for the energy minimization. Since the greedy algorithm is a well-known technique, a detailed description is omitted here.

Here, the parameters in equation (1) are set to N = 32, w = 180, h = 120, x_idea = 360, and y_idea = 240. The search neighborhood for the greedy algorithm is 7 × 7 pixels. That is, in the energy minimization processing according to Embodiment 1 of the present invention, the energy function value is computed, for each representative point, at the points in the 7 × 7 pixel neighborhood surrounding that representative point. These parameter values and the size of the search neighborhood are, however, not limiting.

The region estimation unit 14 of the image generation device 1 performs the energy minimization processing based on the energy function defined in equation (1), with the viewing history data as input. First, the region estimation unit 14 takes the initial representative points as input (S21). The initial representative points are two arbitrary points per frame, for example the upper-left and lower-right corners of each frame.

Next, the region estimation unit 14 computes the energy function values at the pixels neighboring each representative point (S22). Specifically, the region estimation unit 14 computes the energy function value at the representative point v_{i,j} and at each pixel in the neighborhood of v_{i,j}. This is done for all frames and all nodes.

The region estimation unit 14 then determines, for each representative point v_{i,j}, whether there is a neighboring pixel whose energy function value is smaller than that at the representative point (S23). If such a neighboring pixel exists for any representative point v_{i,j}, processing proceeds to step S24. If none exists, processing ends, and the region represented by the representative points at that time is taken as the estimated region.

When it is determined that there is a neighboring pixel with a smaller energy function value than the representative point, the region estimation unit 14 moves the corresponding representative point v_{i,j} to the neighboring pixel with the smallest energy function value, which becomes the new representative point v_{i,j} (S24). This is repeated until no representative point qualifies in step S23.

Although the processing above is repeated until no representative point qualifies in step S23, an upper limit may instead be imposed on this repetition. This shortens the convergence time.
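Steps S21 through S24, together with an optional cap on the repetition, can be sketched as follows. This is a minimal illustration of a greedy search over a 7 × 7 neighborhood, not the patent's implementation. The function name `greedy_minimize`, the `energy` callback signature, and the toy energy in the usage lines are assumptions. For brevity the sketch treats the points of a single frame and updates them one after another within each sweep.

```python
def greedy_minimize(points, energy, radius=3, max_iters=1000):
    """Greedy search for representative points that locally minimise `energy`.

    points: list of (x, y) representative points (S21).
    energy: callable taking the full list of points and returning a number.
    radius: 3 gives the 7 x 7 pixel search neighbourhood.
    max_iters: optional cap on the number of sweeps.
    """
    points = list(points)
    for _ in range(max_iters):
        moved = False
        for i, (px, py) in enumerate(points):
            best_pt, best_e = (px, py), energy(points)
            # S22: evaluate the energy with point i at each neighbouring pixel.
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    cand = (px + dx, py + dy)
                    trial = points[:i] + [cand] + points[i + 1:]
                    e = energy(trial)
                    if e < best_e:
                        best_pt, best_e = cand, e
            # S23/S24: move only if a strictly lower-energy neighbour exists.
            if best_pt != (px, py):
                points[i] = best_pt
                moved = True
        if not moved:  # no point can improve any further: converged
            break
    return points


# Toy energy: squared distance of two corner points from fixed targets.
targets = [(10, 8), (40, 28)]

def toy_energy(pts):
    return sum((x - tx) ** 2 + (y - ty) ** 2
               for (x, y), (tx, ty) in zip(pts, targets))

result = greedy_minimize([(0, 0), (63, 47)], toy_energy)
```

Requiring a strictly lower energy before moving guarantees that the loop terminates even without the cap, since the energy decreases on every move.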

Furthermore, to help equation (1) converge, it is desirable to vary the parameters α, β, γ, δ, and ε with the iteration count r, as shown in equation (9). This improves convergence efficiency. Specifically, while the iteration count of the energy minimization processing is below 150, the parameters β and γ are set to 0, so that equation (1) is an energy function defined only by equations (2), (5), and (6). Once the iteration count reaches 150, the parameters β and γ are set to nonzero values, which brings equations (3) and (4) into equation (1), and the parameters δ and ε are changed to adjust the weights of the energy values defined by equations (5) and (6). Accordingly, although the value of the parameter α itself is not changed, the relative contribution of equation (2) to equation (1) is in effect adjusted compared with when the iteration count is below 150.
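The schedule described above can be sketched as a function of the iteration count r. Equation (9) is not reproduced in this text, so the numeric weights below are placeholders chosen only to show the structure (β = γ = 0 before iteration 150, nonzero afterwards, δ and ε changed, α fixed). The function name `weights_for_iteration` and every numeric value except the threshold 150 are assumptions.

```python
SWITCH_ITERATION = 150  # threshold stated in the text


def weights_for_iteration(r, early=None, late=None):
    """Return the weights alpha, beta, gamma, delta, epsilon for iteration r.

    `early` and `late` let the caller supply real values; the defaults are
    placeholder numbers, not those of equation (9).
    """
    if early is None:
        early = {"alpha": 1.0, "beta": 0.0, "gamma": 0.0,
                 "delta": 1.0, "epsilon": 1.0}
    if late is None:
        late = {"alpha": 1.0, "beta": 1.0, "gamma": 1.0,
                "delta": 0.5, "epsilon": 0.5}
    if early["beta"] != 0.0 or early["gamma"] != 0.0:
        raise ValueError("beta and gamma must be 0 before the switch")
    if early["alpha"] != late["alpha"]:
        raise ValueError("alpha is kept fixed across the switch")
    return dict(early if r < SWITCH_ITERATION else late)


w_early = weights_for_iteration(0)
w_late = weights_for_iteration(150)
```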

The energy minimization processing described above is not limited to the greedy algorithm. For example, the variational method or dynamic programming can also be applied. For the energy minimization processing in Embodiment 1 of the present invention, however, the greedy algorithm is preferable to either. With the variational method, for example, an inverse matrix must be computed at every iteration of the energy minimization, which is not optimal in terms of computational cost and accuracy. Since the variational method and dynamic programming are well-known techniques, a detailed description is omitted here.

FIG. 6 is a diagram showing an example of the region selection screen 50 according to Embodiment 1 of the present invention. The region selection screen 50 is an interface provided by an application, running on the information terminals 3a, 3b, and 3c, for cutting out a selected region.

The region selection screen 50 is displayed on the screens of the information terminals 3a, 3b, and 3c. It comprises a downsampled original image 51 and a cutout image 53. The downsampled original image 51 is the area in which the visual content distributed from the image generation device 1 is displayed, at a lower resolution than the original visual content. The downsampled original image 51 contains a cutout region 52, whose position and size can be changed by the viewer using a mouse or similar device. The viewer can therefore change the position and size of the cutout region 52 within the downsampled original image 51.

The cutout image 53 is the area in which the image enclosed by the cutout region 52 is displayed, enlarged to the same size as the downsampled original image 51. The viewer can therefore check the selected cutout region 52 in the cutout image 53 before cutting out the image.
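Because the viewer draws the cutout region 52 on the downsampled original image 51, the selection would presumably need to be mapped back to the coordinates of the original content before it is used for aggregation or cutting out. The source does not describe this mapping. The sketch below assumes a uniform scale factor per axis and a rectangle given as (left, top, right, bottom), and the function name `to_original_coords` is hypothetical.

```python
def to_original_coords(rect, shown_size, original_size):
    """Map a rectangle drawn on the downsampled image to original pixels.

    rect: (left, top, right, bottom) on the downsampled image.
    shown_size / original_size: (width, height) of the two images.
    """
    sx = original_size[0] / shown_size[0]
    sy = original_size[1] / shown_size[1]
    left, top, right, bottom = rect
    return (round(left * sx), round(top * sy),
            round(right * sx), round(bottom * sy))


# A 90 x 60 selection on a 360 x 240 preview of a 1440 x 960 original.
selected = to_original_coords((100, 50, 190, 110), (360, 240), (1440, 960))
```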

FIG. 7 is a diagram showing a comparison between applying the conventional technique and applying Embodiment 1 of the present invention to moving image content. In this example, the moving image content contains a single object (a lion) in each frame, and the object moves from frame to frame. Frames 0, 30, 60, 90, and 120 of the moving image content are compared. Each viewer is assumed to have operated the region selection screen 50 shown in FIG. 6 and freely selected the vicinity of the object as the selected region.

FIG. 8 shows the group of images generated by simply downsampling the high-resolution moving image content that is the original image. FIG. 9 shows the group of images generated as the estimated region content when the original image is processed by the image generation device 1 according to Embodiment 1 of the present invention. It can be seen that the object is displayed more clearly in FIG. 9 than in FIG. 8. A similar result can also be obtained when the target moving image content contains multiple objects.

As described above, by using the image generation device 1 according to Embodiment 1 of the present invention, the presentation region that is optimal for viewers as a whole can be estimated from the large amount of history information collected from many viewers. That is, by using collective intelligence, the region selected by the largest number of viewers can be estimated from among the partial regions contained in the original image.

In addition, by generating aggregate histogram information on the number of viewers in the selected regions from the viewers' operation histories, the locations that viewers are paying attention to can be represented more accurately, and the optimal presentation region, that is, the region that maximizes the average satisfaction across all viewers, can be estimated.

Furthermore, by using the aggregate histogram information on the number of viewers in the selected regions and applying an active contour model in which the density of the number of viewers within the selected regions is defined as an energy, an estimated region that raises the subjective quality for viewers as a whole can be obtained.

In particular, the active contour model in the image generation method according to Embodiment 1 of the present invention defines an energy function that expresses various constraints as a one-dimensional energy, in order to estimate the optimal presentation region from the operation histories of multiple viewers. By incorporating the cumulative number of viewers in the selected regions and the consistency of the camera work into this energy function, a moving image of higher subjective quality can be generated than by simply shrinking the entire moving image.

In the description above, the region estimation unit 14 generates the estimated region content from the estimated region in advance and stores it in the estimated region content storage unit 24, but the invention is not limited to this. For example, the region estimation unit 14 may store the estimated region itself separately, and the content distribution unit 11 may generate the estimated region content from that estimated region each time the estimated region content is to be distributed.

Embodiment 2 of the Invention.
In recent years, the installation of so-called "full browsers" on mobile phones has made it possible to view WEB content that was originally created for viewing on a PC (personal computer) on a mobile phone without breaking its original layout. However, mobile phone displays still have low resolution, and the range that can be viewed at one time is inevitably a cut-out (trimmed) part of the original WEB content. As a result, a user (viewer) of a mobile phone must expend considerable effort before reaching the desired portion of the WEB content. To help with this, full browsers on mobile phones often include a function that displays a reduced image showing which part of the whole WEB content is currently being viewed. However, because it is difficult to grasp the content of each part of the WEB content from the reduced image, this solves the problem only partially.

Embodiment 2 of the present invention therefore provides, as a modification of the region estimation processing of Embodiment 1, a version that targets WEB content viewed on mobile phones. Specifically, by using the viewing histories of other users who browsed the same WEB content on similar mobile phones to quickly display particularly popular portions, the effort required for a user to bring a desired portion onto the mobile phone display can be greatly reduced.

FIG. 10 is a diagram outlining an application example of Embodiment 2 of the present invention. In FIG. 10, the image generation device 1 of FIGS. 1 and 2 is replaced by a carrier server 1a and a WEB server 1b, the information terminals 3a, 3b, and 3c of FIG. 1 are replaced by a mobile phone terminal 31, and the information terminals 3d, 3e, and 3f of FIG. 2 are replaced by a mobile phone terminal 33. In addition, the content distribution unit 11 of Embodiment 1 is replaced by a content distribution unit 11a, and the estimated region content storage unit 24 of Embodiment 1 is replaced by an estimated region information storage unit 25. Other elements in FIG. 10 that have functions equivalent to those in FIGS. 1 and 2 are given the same reference numerals, and their description is omitted.

The mobile phone terminal 31 is a mobile phone terminal that can communicate with the carrier server 1a via a network and is equipped with a full browser 32. The full browser 32 displays the WEB content received from the carrier server 1a and can enlarge part of that WEB content. The full browser 32 can also transmit the position of the enlarged part to the carrier server 1a. The enlarged region of the WEB content displayed in the full browser 32 can thus be treated as the viewer's viewing history data.

尚、携帯電話端末３３、及び携帯電話端末３３が備えるフルブラウザ３４は、携帯電話端末３１、及びフルブラウザ３２と同様の機能であればよいため、説明を省略する。 Note that the mobile phone terminal 33 and the full browser 34 included in the mobile phone terminal 33 may have the same functions as the mobile phone terminal 31 and the full browser 32, and thus description thereof is omitted.

キャリアサーバ１ａは、携帯電話端末３１、携帯電話端末３３、及びＷＥＢサーバ１ｂとネットワーク（不図示）を介して通信可能であり、携帯電話端末３１、及び携帯電話端末３３とＷＥＢサーバ１ｂとを中継するためのサーバである。ＷＥＢブラウジングを含む携帯電話端末による通信は、課金等の都合上、すべて携帯電話会社（キャリア）のサーバを経由する必要があるためである。 The carrier server 1a can communicate with the mobile phone terminal 31, the mobile phone terminal 33, and the WEB server 1b via a network (not shown), and is a server that relays between the mobile phone terminals 31 and 33 and the WEB server 1b. This is because all communication by a mobile phone terminal, including WEB browsing, must pass through a server of the mobile phone company (carrier) for reasons such as billing.

キャリアサーバ１ａの推定領域情報記憶部２５は、領域推定部１４により求められる推定領域情報を記憶する。ここで、推定領域情報は、ＷＥＢコンテンツにおける位置情報である。例えば、ＷＥＢコンテンツに含まれているレイアウト情報であればよい。 The estimated area information storage unit 25 of the carrier server 1a stores estimated area information obtained by the area estimating unit 14. Here, the estimated area information is position information in the WEB content. For example, it may be layout information included in the WEB content.

また、キャリアサーバ１ａのコンテンツ配信部１１ａは、携帯電話端末３１、又は携帯電話端末３３からＷＥＢサーバ１ｂへのＷＥＢコンテンツのリクエストを中継し、ＷＥＢサーバ１ｂからのＷＥＢコンテンツを携帯電話端末３１、又は携帯電話端末３３へ送信する。 In addition, the content distribution unit 11a of the carrier server 1a relays a WEB content request from the mobile phone terminal 31 or the mobile phone terminal 33 to the WEB server 1b, and transmits the WEB content from the WEB server 1b to the mobile phone terminal 31 or the mobile phone terminal 33.

キャリアサーバ１ａは、携帯電話端末３１、又は携帯電話端末３３から当該ＷＥＢコンテンツにおける視聴履歴データを受信し、視聴履歴記憶部２２に格納する。また、キャリアサーバ１ａは、画像生成装置１と同様に、集計処理部１３、及び領域推定部１４により領域推定情報を生成する。そして、領域推定部１４は、推定領域情報記憶部２５へ当該推定領域情報を格納する。さらに、コンテンツ配信部１１ａは、携帯電話端末３１、又は携帯電話端末３３からの推定領域コンテンツ要求時には、ＷＥＢコンテンツに推定領域情報を付加して送信する。 The carrier server 1a receives viewing history data for the WEB content from the mobile phone terminal 31 or the mobile phone terminal 33 and stores it in the viewing history storage unit 22. Like the image generation apparatus 1, the carrier server 1a generates estimated area information using the aggregation processing unit 13 and the region estimation unit 14. The region estimation unit 14 then stores the estimated area information in the estimated region information storage unit 25. Furthermore, when the mobile phone terminal 31 or the mobile phone terminal 33 requests estimated area content, the content distribution unit 11a adds the estimated area information to the WEB content and transmits it.

ＷＥＢサーバ１ｂは、キャリアサーバ１ａと通信可能であり、コンテンツ記憶部２１を備えた一般的なＷＥＢサーバである。コンテンツ記憶部２１は、ＰＣ向けのＷＥＢコンテンツを記憶する。 The WEB server 1 b is a general WEB server that can communicate with the carrier server 1 a and includes the content storage unit 21. The content storage unit 21 stores WEB content for PC.

続いて、本発明の実施の形態２における処理の流れを説明する。尚、本発明の実施の形態２の適用例の概略処理は、図３と同様のため、図示を省略する。 Next, the flow of processing in the second embodiment of the present invention will be described. The schematic processing of the application example of the second embodiment is the same as in FIG. 3, so its illustration is omitted.

まず、キャリアサーバ１ａは、携帯電話端末３１から視聴者の選択領域の情報を収集する（Ｓ１１）。具体的には、まず、携帯電話端末３１は、キャリアサーバ１ａへＷＥＢコンテンツのリクエストを送信する。次に、キャリアサーバ１ａは、携帯電話端末３１からのＷＥＢコンテンツのリクエストを受信し、当該ＷＥＢコンテンツを保有するＷＥＢサーバ１ｂへリクエストを送信する。そして、ＷＥＢサーバ１ｂは、コンテンツ記憶部２１を参照し、当該ＷＥＢコンテンツをキャリアサーバ１ａへ送信する。その後、キャリアサーバ１ａは、ＷＥＢサーバ１ｂから当該ＷＥＢコンテンツを受信し、要求元である携帯電話端末３１へ送信する。ここで、携帯電話端末３１は、キャリアサーバ１ａからＷＥＢコンテンツを受信し、フルブラウザ３２に表示する。視聴者は、フルブラウザ３２により、当該ＷＥＢコンテンツに対して、自身が所望する箇所を拡大表示させる。このとき、フルブラウザ３２は、当該ＷＥＢコンテンツにおける拡大表示された位置を視聴履歴データとして、キャリアサーバ１ａへ送信する。そして、キャリアサーバ１ａは、携帯電話端末３１から当該ＷＥＢコンテンツにおける視聴履歴データを受信し、視聴履歴記憶部２２に格納する。 First, the carrier server 1a collects information on the selected area of the viewer from the mobile phone terminal 31 (S11). Specifically, first, the mobile phone terminal 31 transmits a web content request to the carrier server 1a. Next, the carrier server 1a receives the web content request from the mobile phone terminal 31, and transmits the request to the web server 1b that holds the web content. Then, the WEB server 1b refers to the content storage unit 21, and transmits the WEB content to the carrier server 1a. Thereafter, the carrier server 1a receives the WEB content from the WEB server 1b and transmits it to the mobile phone terminal 31 that is the request source. Here, the mobile phone terminal 31 receives the WEB content from the carrier server 1 a and displays it on the full browser 32. The viewer uses the full browser 32 to enlarge and display a portion desired by the viewer on the WEB content. At this time, the full browser 32 transmits the enlarged display position in the WEB content as viewing history data to the carrier server 1a. Then, the carrier server 1 a receives the viewing history data in the WEB content from the mobile phone terminal 31 and stores it in the viewing history storage unit 22.
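The collection step (S11) can be pictured with a minimal Python sketch. It assumes that an enlarged-display position is reported as a rectangle in page pixels; the record fields, the class names `ViewingRecord` and `ViewingHistoryStore`, and the in-memory storage are all hypothetical, since the description does not fix a data format for the viewing history data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewingRecord:
    """One enlarged-display event reported by a full browser (hypothetical format)."""
    content_url: str  # identifies the WEB content that was viewed
    x: int            # left edge of the enlarged area, in page pixels
    y: int            # top edge of the enlarged area, in page pixels
    width: int
    height: int


class ViewingHistoryStore:
    """In-memory stand-in for the viewing history storage unit 22 (assumption)."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        # Group records by content so that each WEB content has its own history.
        self._records.setdefault(record.content_url, []).append(record)

    def records_for(self, content_url):
        # Return a copy so callers cannot mutate the stored history.
        return list(self._records.get(content_url, []))
```

Keying the store by content identifier mirrors the idea that the carrier server manages the viewing history of each WEB content in one place.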

次に、キャリアサーバ１ａは、集計処理部１３により集計ヒストグラム情報の生成（Ｓ１２）、領域推定部１４により領域推定処理（Ｓ１３）を行い、推定領域情報記憶部２５へ当該推定領域情報を格納する。このとき、領域推定処理において用いられるエネルギー関数は、例えば、式（２）、式（３）、式（５）、及び式（６）により定義することができる。さらに、ＷＥＢコンテンツに含まれているレイアウト情報を用いることでより領域推定の精度を上げることができる。ここで、レイアウト情報とは、ＨＴＭＬ（HyperText Markup Language）タグやＣＳＳ（Cascading Style Sheets）に記述されているものであればよい。 Next, the carrier server 1a generates aggregate histogram information with the aggregation processing unit 13 (S12), performs region estimation processing with the region estimation unit 14 (S13), and stores the resulting estimated area information in the estimated region information storage unit 25. The energy function used in the region estimation processing can be defined by, for example, Expressions (2), (3), (5), and (6). Furthermore, the accuracy of region estimation can be further improved by using the layout information included in the WEB content. Here, the layout information may be anything described in HTML (HyperText Markup Language) tags or CSS (Cascading Style Sheets).
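As an illustration of the aggregation step (S12), the following Python sketch counts, for each pixel, how many viewers' enlarged areas cover it, and then derives the per-region quantities named in the explanation of symbols (Σ_IN, P_IN, Σ_OUT, P_OUT) for one candidate rectangle. It assumes rectangular selections given as `(x, y, width, height)` tuples; the function names are hypothetical, and the sketch deliberately stops short of the energy function itself, which is defined by the expressions cited above.

```python
def build_histogram(page_w, page_h, rects):
    """Count, per pixel, how many selected rectangles cover that pixel."""
    hist = [[0] * page_w for _ in range(page_h)]
    for x, y, w, h in rects:
        # Clip each rectangle to the page before counting.
        for row in range(max(y, 0), min(y + h, page_h)):
            for col in range(max(x, 0), min(x + w, page_w)):
                hist[row][col] += 1
    return hist


def region_counts(hist, region):
    """Return (sigma_in, p_in, sigma_out, p_out) for a candidate rectangle."""
    page_h = len(hist)
    page_w = len(hist[0]) if hist else 0
    x, y, w, h = region
    # Clip the candidate region to the page; an empty overlap yields zero pixels.
    x0, y0 = max(x, 0), max(y, 0)
    x1 = max(min(x + w, page_w), x0)
    y1 = max(min(y + h, page_h), y0)
    sigma_in = sum(sum(row[x0:x1]) for row in hist[y0:y1])
    p_in = (x1 - x0) * (y1 - y0)
    sigma_total = sum(sum(row) for row in hist)
    return sigma_in, p_in, sigma_total - sigma_in, page_w * page_h - p_in
```

A region estimation step could evaluate these counts for each candidate region while searching for the one that minimizes the energy function.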

その後、キャリアサーバ１ａは、携帯電話端末３３へ推定領域コンテンツを配信する（Ｓ１４）。具体的には、まず、キャリアサーバ１ａは、携帯電話端末３３からの推定領域コンテンツ送信要求を受信し、ＷＥＢサーバ１ｂへＷＥＢコンテンツのリクエストを送信する。その後、キャリアサーバ１ａは、ＷＥＢサーバ１ｂから当該ＷＥＢコンテンツを受信し、また、コンテンツ配信部１１ａにより、推定領域情報記憶部２５に格納された推定領域情報を取得し、当該ＷＥＢコンテンツに推定領域情報を付加して、要求元である携帯電話端末３３へ送信する。次に、携帯電話端末３３は、ＷＥＢコンテンツ、及び推定領域情報を受信する。そして、フルブラウザ３４は、推定領域情報の位置情報を解釈し、ＷＥＢコンテンツの該当する位置を拡大表示する。 Thereafter, the carrier server 1a distributes the estimated area content to the mobile phone terminal 33 (S14). Specifically, the carrier server 1a first receives an estimated area content transmission request from the mobile phone terminal 33 and transmits a request for the WEB content to the WEB server 1b. The carrier server 1a then receives the WEB content from the WEB server 1b, acquires, through the content distribution unit 11a, the estimated area information stored in the estimated region information storage unit 25, adds the estimated area information to the WEB content, and transmits the result to the requesting mobile phone terminal 33. Next, the mobile phone terminal 33 receives the WEB content and the estimated area information. The full browser 34 then interprets the position information in the estimated area information and enlarges and displays the corresponding position of the WEB content.
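The distribution step (S14) can be sketched in Python as follows. The description does not specify how the estimated area information is attached to the WEB content, so the JSON envelope, the key names, and the functions `attach_estimated_region` and `zoom_for_region` are assumptions made purely for illustration.

```python
import json


def attach_estimated_region(web_content, region):
    """Bundle WEB content with estimated area information (hypothetical envelope)."""
    return json.dumps({"content": web_content, "estimated_region": region})


def zoom_for_region(region, screen_w, screen_h):
    """Largest magnification at which the whole region still fits on the screen."""
    return min(screen_w / region["width"], screen_h / region["height"])


# Carrier-server side: add the stored region to the relayed content.
region = {"x": 100, "y": 200, "width": 240, "height": 160}
payload = attach_estimated_region("<html>...</html>", region)

# Terminal side: interpret the position information and choose a magnification.
received = json.loads(payload)
scale = zoom_for_region(received["estimated_region"], screen_w=480, screen_h=640)
```

Because the region travels alongside the unmodified content, the WEB server itself needs no changes, which matches the relay role of the carrier server.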

これにより、携帯電話端末３３の視聴者は、当該ＷＥＢコンテンツの先頭から自身が所望する箇所へ移動させることなく、事前に携帯電話端末３１の視聴者が拡大表示した位置を閲覧することができる。そのため、キャリアサーバ１ａによって推定領域情報が求められた後には、当該ＷＥＢコンテンツを閲覧する視聴者に対してこの最適提示領域を推薦することが可能となり、これによってユーザがＷＥＢコンテンツ内部で適切な閲覧箇所を探す労力を大幅に軽減することができる。 Thereby, the viewer of the mobile phone terminal 33 can browse the position enlarged by the viewer of the mobile phone terminal 31 in advance without moving from the top of the WEB content to a desired location. For this reason, after the estimated area information is obtained by the carrier server 1a, it is possible to recommend the optimum presentation area to the viewer who browses the WEB content, so that the user can appropriately browse inside the WEB content. The effort to find a place can be greatly reduced.

尚、上述したフルブラウザ３２は、拡大表示された領域の表示時間により、視聴履歴データに重み付けをしてもよい。例えば、ＷＥＢコンテンツの内、表示時間が短い箇所は、視聴者の所望の箇所へ移動する際に通過した箇所である可能性が高いため、視聴者の興味が少ないとみなして重み付けを下げ、表示時間が長い箇所は、視聴者の所望の箇所である可能性が高いため、視聴者の興味が大きいとみなして重み付けを上げるようにするとよい。これにより、より精度の高い推定領域情報を求めることができる。 The full browser 32 described above may weight the viewing history data according to the display time of the enlarged display area. For example, a portion of the WEB content where the display time is short is likely to be a portion that has been passed when moving to a desired portion of the viewer. Therefore, it is assumed that the viewer is less interested and the weight is reduced and displayed. Since it is highly possible that a portion with a long time is a portion desired by the viewer, it is preferable that the weight is increased by regarding the viewer as having great interest. Thereby, estimated area information with higher accuracy can be obtained.
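The display-time weighting can be sketched in Python under stated assumptions. The linear ramp, the five-second saturation value, and the function names are hypothetical; the description says only that briefly displayed areas should count less and long-displayed areas more.

```python
def display_time_weight(seconds, full_weight_after=5.0):
    """Map display time to a weight in [0, 1] using a linear ramp (assumption)."""
    if seconds <= 0:
        return 0.0
    return min(seconds / full_weight_after, 1.0)


def weighted_histogram(page_w, page_h, events):
    """Accumulate weights per pixel; events are (x, y, width, height, seconds)."""
    hist = [[0.0] * page_w for _ in range(page_h)]
    for x, y, w, h, seconds in events:
        weight = display_time_weight(seconds)
        # Clip each enlarged area to the page before accumulating its weight.
        for row in range(max(y, 0), min(y + h, page_h)):
            for col in range(max(x, 0), min(x + w, page_w)):
                hist[row][col] += weight
    return hist
```

With this weighting, an area that was only passed over while scrolling contributes little to the histogram, while an area that was read at length contributes its full weight.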

このように、本発明の実施の形態２では、世界中の複数のＷＥＢサーバに分散して存在しているＷＥＢコンテンツに対し、各ＷＥＢコンテンツの視聴履歴データをキャリアサーバ１ａ上で一元的に管理することで、世界中に分散して存在する各ＷＥＢサーバに改変を加えることなく、それぞれのＷＥＢコンテンツにおける最適提示領域を推薦することができる。 As described above, in the second embodiment of the present invention, viewing history data of each WEB content is centrally managed on the carrier server 1a with respect to the WEB content distributed in a plurality of WEB servers all over the world. By doing so, it is possible to recommend the optimum presentation area in each WEB content without modifying each WEB server distributed in the world.

尚、本発明の実施の形態２は、携帯電話端末に限定されない。例えば、ＰＣなどの任意の情報端末においても、通信プロバイダなどの直近のサーバに視聴履歴データを格納し、領域推定処理を行うことで、実現可能である。 The second embodiment of the present invention is not limited to a mobile phone terminal. For example, even an arbitrary information terminal such as a PC can be realized by storing viewing history data in a nearest server such as a communication provider and performing region estimation processing.

その他の発明の実施の形態． Other Embodiments of the Invention
尚、本発明の実施の形態１において、ビジュアルコンテンツは、動画像コンテンツに限定されない。例えば、単一の画像データであっても適用可能である。 In the first embodiment of the present invention, the visual content is not limited to moving image content. For example, the invention is also applicable to a single piece of image data.

本発明の実施の形態１において、フレーム、及び選択領域は、矩形に限定されない。例えば、矩形以外の形状の選択領域である場合、式（５）、式（６）の定義を変形させることで実現可能である。また、矩形以外の形状の推定領域である場合、画面への表示において、推定領域の形状を維持したまま表示すればよい。 In the first embodiment of the present invention, the frame and the selection area are not limited to a rectangle. For example, in the case of a selection area having a shape other than a rectangle, this can be realized by modifying the definitions of Expressions (5) and (6). Further, in the case of an estimated area having a shape other than a rectangle, display on the screen may be performed while maintaining the shape of the estimated area.

尚、本発明の実施の形態１及び２において、動的輪郭モデルに対して、視聴履歴データを入力としたが、視聴履歴に限定されるものではない。例えば、複数の画像データを入力とすればよい。 In the first and second embodiments of the present invention, viewing history data is input to the active contour model, but the present invention is not limited to viewing history. For example, a plurality of image data may be input.

尚、本発明の実施の形態１における画像生成装置１の各機能は、情報端末３ａ、３ｂ、及び３ｃの内部で実現されても構わない。すなわち、情報端末３ａ、３ｂ、及び３ｃは、本発明の実施の形態１にかかるコンテンツ配信部１１、入力受付部１２、集計処理部１３、領域推定部１４、コンテンツ記憶部２１、視聴履歴記憶部２２、ヒストグラム記憶部２３、又は推定領域コンテンツ記憶部２４のいずれか又は全てを備えるものであって構わない。 Note that each function of the image generation apparatus 1 according to the first embodiment of the present invention may be realized inside the information terminals 3a, 3b, and 3c. That is, the information terminals 3a, 3b, and 3c may include any or all of the content distribution unit 11, the input reception unit 12, the aggregation processing unit 13, the region estimation unit 14, the content storage unit 21, the viewing history storage unit 22, the histogram storage unit 23, and the estimated region content storage unit 24 according to the first embodiment.

例えば、情報端末３ａを使用する視聴者は、情報端末３ａの内部に格納されたビジュアルコンテンツに対して、自身が嗜好する領域を選択し、情報端末３ａは、当該選択領域に基づき、領域推定を行うことができる。その後、視聴者は、当該ビジュアルコンテンツに対して、情報端末３ａ内に格納された推定領域コンテンツを視聴することができる。これにより、情報端末３ａを使用する視聴者は、当該当該ビジュアルコンテンツに対して、より自身の嗜好に合った推定領域コンテンツを視聴することができる。 For example, a viewer who uses the information terminal 3a selects an area that he / she likes for visual content stored inside the information terminal 3a, and the information terminal 3a performs area estimation based on the selected area. It can be carried out. Thereafter, the viewer can view the estimated area content stored in the information terminal 3a with respect to the visual content. Thereby, the viewer who uses the information terminal 3a can view the estimated area content that better matches his / her preference for the visual content.

さらに、本発明は上述した実施の形態のみに限定されるものではなく、既に述べた本発明の要旨を逸脱しない範囲において種々の変更が可能であることは勿論である。 Furthermore, the present invention is not limited to the above-described embodiments, and various modifications can be made without departing from the gist of the present invention described above.

本発明の実施の形態１にかかる画像生成装置１において視聴履歴を収集する動作を説明する図である。 A diagram explaining the operation of collecting viewing history in the image generation apparatus 1 according to the first embodiment of the present invention.
本発明の実施の形態１にかかる画像生成装置１において視聴履歴から推定領域コンテンツを生成する動作を説明する図である。 A diagram explaining the operation of generating estimated area content from viewing history in the image generation apparatus 1 according to the first embodiment of the present invention.
本発明の実施の形態１の適用例の概略処理を示すフローチャート図である。 A flowchart showing the schematic processing of an application example of the first embodiment of the present invention.
動的輪郭モデルの適用例を示す図である。 A diagram showing an application example of the active contour model.
本発明の実施の形態１にかかるエネルギー最小化処理を示すフローチャート図である。 A flowchart showing the energy minimization processing according to the first embodiment of the present invention.
本発明の実施の形態１にかかる領域選択画面の例を示す図である。 A diagram showing an example of the region selection screen according to the first embodiment of the present invention.
視聴履歴のヒストグラムの例を示す図である。 A diagram showing an example of a histogram of viewing history.
動画像コンテンツへ従来技術（ダウンサンプリング）を適用した結果、生成される画像群を示す図である。 A diagram showing a group of images generated as a result of applying the conventional technique (downsampling) to moving image content.
動画像コンテンツへ本発明の実施の形態１にかかる画像生成装置１を適用した結果、推定領域コンテンツとして生成される画像群を示す図である。 A diagram showing a group of images generated as estimated area content as a result of applying the image generation apparatus 1 according to the first embodiment of the present invention to moving image content.
本発明の実施の形態２における適用例の概略を示す図である。 A diagram showing an outline of an application example in the second embodiment of the present invention.

Explanation of symbols

１画像生成装置１ａキャリアサーバ１ｂＷＥＢサーバ
１１コンテンツ配信部１１ａコンテンツ配信部１２入力受付部
１３集計処理部１４領域推定部１５切り出し処理部
２１コンテンツ記憶部２２視聴履歴記憶部
２３ヒストグラム記憶部２４推定領域コンテンツ記憶部
２５推定領域情報記憶部
３ａ、３ｂ、３ｃ、３ｄ、３ｅ、３ｆ情報端末
３１携帯電話端末３２フルブラウザ
３３携帯電話端末３４フルブラウザ
４対象物体４０領域
４１乃至４８代表点４０１乃至４０８矢印
４０初期領域４１代表点４２収束エネルギー
４３代表点４４収束エネルギー
４５収束領域４６代表点４７代表点
５０領域選択画面５１ダウンサンプリング済原画像５２切り出し領域
５３切り出し画像
ｖ_ｉ,ｊ代表点（ノード）ｉノード番号ｊフレーム番号
Ｅ_ｔｕｂｅエネルギー関数 α、β、γ、δ、ε 重み付け係数
Ｅ_ｈｉｎ、Ｅ_ｈｏｕｔ、Ｅ_{ｆｒａｍｅ}、Ｅ_ｄｉｓ、Ｅ_ａｓｐエネルギー
Ｎそのコンテンツの視聴者全体の数
Ｐ_ＩＮ推定領域の内側のピクセル数Ｐ_ＯＵＴ推定領域の外側のピクセル数
Σ_ＩＮ推定領域の内側の各ピクセルにおける視聴者数の合計
Σ_ＯＵＴ推定領域の外側の各ピクセルにおける視聴者数の合計
ｘ_ｉｄｅａ、ｙ_ｉｄｅａ目標とする推定領域のサイズ
ｗ表示される画面の幅ｈ表示される画面高さ

1 Image generation apparatus
1a Carrier server
1b WEB server
11 Content distribution unit
11a Content distribution unit
12 Input reception unit
13 Aggregation processing unit
14 Region estimation unit
15 Cutout processing unit
21 Content storage unit
22 Viewing history storage unit
23 Histogram storage unit
24 Estimated region content storage unit
25 Estimated region information storage unit
3a, 3b, 3c, 3d, 3e, 3f Information terminal
31 Mobile phone terminal
32 Full browser
33 Mobile phone terminal
34 Full browser
4 Target object
40 Region
41 to 48 Representative points
401 to 408 Arrows
40 Initial region
41 Representative point
42 Convergence energy
43 Representative point
44 Convergence energy
45 Convergence region
46 Representative point
47 Representative point
50 Region selection screen
51 Downsampled original image
52 Cutout region
53 Cutout image
v_{i,j} Representative point (node)
i Node number
j Frame number
E_tube Energy function
α, β, γ, δ, ε Weighting coefficients
E_hin, E_hout, E_frame, E_dis, E_asp Energies
N Total number of viewers of the content
P_IN Number of pixels inside the estimated region
P_OUT Number of pixels outside the estimated region
Σ_IN Sum of the number of viewers at each pixel inside the estimated region
Σ_OUT Sum of the number of viewers at each pixel outside the estimated region
x_idea, y_idea Target size of the estimated region
w Width of the displayed screen
h Height of the displayed screen

Claims

An image generation apparatus that generates one image from a plurality of input images, comprising:
a region estimation unit that obtains an estimated region from the plurality of input images by minimizing an energy function using an active contour model, wherein the active contour model defines the energy function to include narrowing-down energy that narrows down a region so as to contain the image most common to the plurality of input images.

The image generation apparatus according to claim 1, wherein
the narrowing-down energy includes:
internal energy relating to the inside of the estimated region; and
external energy relating to the outside of the estimated region,
the internal energy decreases as more of the image inside the estimated region is common to the input images, and
the external energy decreases as less of the image outside the estimated region is common to the input images.

The image generation apparatus according to claim 2, wherein the energy function further includes size adjustment energy for bringing the size of the estimated region close to a target value.

The image generation apparatus according to claim 3, wherein
the input image is a part cut out of an original image, and
the energy function further includes inter-image adjustment energy that decreases as the positions cut out of the original images coincide more closely.

The image generation apparatus according to claim 4, wherein
the plurality of original images are time-series data composed of a plurality of frames, and
the inter-image adjustment energy decreases as the cutout position of a frame lying between a preceding frame and a following frame approaches the straight line connecting the cutout positions of the preceding and following frames.

The image generation apparatus according to claim 4, wherein
the estimated region is rectangular, and
the energy function further includes aspect ratio adjustment energy for keeping the aspect ratio of the estimated region constant.

The image generation apparatus according to claim 1, wherein the plurality of input images are images obtained by cutting out part or all of one or a plurality of images.

The image generation apparatus according to claim 1, wherein the plurality of input images are viewing histories that are a set of regions selected by a plurality of viewers viewing the original image.

The image generation apparatus according to claim 8, further comprising:
a viewing history storage unit that stores the viewing history; and
an aggregation processing unit that generates a histogram by counting the number of viewers for each region with reference to the viewing history storage unit.

The image generation apparatus according to claim 6, wherein
the region estimation unit minimizes the energy function by a greedy method and,
while the number of iterations of the greedy method is at or below a predetermined number, obtains a region that minimizes the energy function excluding the external energy and the inter-image adjustment energy.

The image generation apparatus according to claim 9, further comprising an input receiving unit that receives an area selected by the viewer as an input image and stores the input image in the viewing history storage unit.

The image generation apparatus according to claim 1, wherein
the active contour model defines an energy function that further includes energy relating to layout information of the original image, and
the region estimation unit obtains the estimated region using the active contour model.

An information terminal that displays the original image, determines the region selected by the viewer, and transmits information on the region to the image generation apparatus according to any one of claims 1 to 12.

An image generation method for generating one image from a plurality of input images, comprising:
a region estimation step of obtaining an estimated region from the plurality of input images by minimizing an energy function using an active contour model, wherein the active contour model defines the energy function to include narrowing-down energy that narrows down a region so as to contain the image most common to the plurality of input images.

The image generation method according to claim 14, wherein
the narrowing-down energy includes:
internal energy relating to the inside of the estimated region; and
external energy relating to the outside of the estimated region,
the internal energy decreases as more of the image inside the estimated region is common to the input images, and
the external energy decreases as less of the image outside the estimated region is common to the input images.

The image generation method according to claim 15, wherein the energy function further includes size adjustment energy for bringing the size of the estimated region close to a target value.

The image generation method according to claim 16, wherein
the input image is obtained by cutting out a part of the image area of an original image, and
the energy function further includes inter-image adjustment energy that decreases as the positions cut out of the original images coincide more closely.

The image generation method according to claim 17, wherein
the plurality of original images are time-series data composed of a plurality of frames, and
the inter-image adjustment energy decreases as the cutout position of a frame lying between a preceding frame and a following frame approaches the straight line connecting the cutout positions of the preceding and following frames.

The image generation method according to claim 17 or 18, wherein
the estimated region is rectangular, and
the energy function further includes aspect ratio adjustment energy for keeping the aspect ratio of the estimated region constant.

The image generation method according to any one of claims 14 to 19, wherein the plurality of input images are images obtained by cutting out part or all of one or a plurality of images.

21. The image generation method according to claim 14, wherein the plurality of input images are viewing histories that are collections of regions selected by a plurality of viewers viewing the original image.

The image generation method according to claim 21, further comprising: an aggregation processing step of generating a histogram in which the number of viewers is aggregated for each area from the viewing history.

The image generation method according to claim 19, wherein
the region estimation step minimizes the energy function by a greedy method and,
while the number of iterations of the greedy method is at or below a predetermined number, obtains a region that minimizes the energy function excluding the external energy and the inter-image adjustment energy.

24. The image generation method according to claim 22 or 23, further comprising an input receiving step of receiving an area selected by the viewer as an input image and storing it in the viewing history storage unit.

The image generation method according to claim 14, wherein
the active contour model defines an energy function that further includes energy relating to layout information of the original image, and
the region estimation step obtains the estimated region using the active contour model.

A program for causing a computer to execute the processing according to any one of claims 14 to 25.