JP6583996B2

JP6583996B2 - Video evaluation apparatus and program

Info

Publication number: JP6583996B2
Application number: JP2015143185A
Authority: JP
Inventor: 小峯　一晃
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2015-07-17
Filing date: 2015-07-17
Publication date: 2019-10-02
Anticipated expiration: 2035-07-17
Also published as: JP2017028402A

Description

The present invention relates to a video evaluation device and a program.

Video evaluation has mainly relied on indirect data such as audience ratings, or on follow-up surveys such as questionnaires. Neither provides a direct indicator of how video content, which changes over time, is evaluated.

One approach to video evaluation that accounts for temporal variation is to measure and analyze viewers' lines of sight while they watch the video. Gaze movement is closely related to the amount of information the video conveys and to the impression it leaves after viewing, and is therefore considered an effective indicator for evaluating production effects during video production. One gaze-based method evaluates the effect of staging and expression from the movement and distribution of the gaze points of a large number of viewers (see, for example, Patent Document 1).

In addition, models have been proposed that analyze image features based on a model of human visual information processing and estimate the regions likely to attract attention (a saliency map) (see, for example, Non-Patent Documents 1 to 3). Furthermore, a method has been proposed that evaluates a video by comparing its saliency map with the gaze distribution (see, for example, Patent Document 2).

Patent Document 1: JP 2007-310454 A
Patent Document 2: Japanese Patent No. 5306940

Non-Patent Document 1: L. Itti et al., "A Model of Saliency-Based Visual Attention for Rapid Scene Analysis", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254-1259, 1998.
Non-Patent Document 2: O. Le Meur et al., "Predicting visual fixations on video based on low-level visual features", Vision Research, vol. 47, pp. 2483-2498, 2007.
Non-Patent Document 3: N. D. B. Bruce et al., "Saliency, attention, and visual search: An information theoretic approach", Journal of Vision, vol. 9, no. 3, pp. 1-24, 2009.

The conventional techniques described above cannot quantitatively evaluate the degree to which viewers are attracted by the features of a video.
For example, the technique of Patent Document 1 does not take into account the attractiveness inherent in the video content itself. It therefore cannot determine whether a gaze shift was caused by the context of the video or by the viewer being drawn to a highly salient part of the image, which makes it difficult to evaluate the effect of the visual expression used in the production. The models obtained by the techniques of Non-Patent Documents 1 to 3 are validated by measuring viewers' gaze during actual viewing and examining its correlation with the gaze-point distribution and fixation duration, but they stop short of evaluating the video itself. The technique of Patent Document 2 has limited accuracy because it does not account for the temporal characteristics of eye movement (the delay of eye movement) when comparing the gaze distribution with the saliency map.

The present invention was made in view of these circumstances and provides a video evaluation device and a program that can accurately evaluate the degree to which a viewer is attracted by the features of a video.

One aspect of the present invention is a video evaluation device comprising: a saccade extraction unit that identifies the frames from the start to the end of a saccade based on the movement speed of the viewpoint obtained from line-of-sight data indicating, for each frame, the viewpoint during video viewing; a gaze area extraction unit that, based on the movement speed, identifies the frames that were gazed at among the frames from the end of a saccade to the start of the next saccade, and extracts a gaze area based on the viewpoints in the identified frames; a window function calculation unit that calculates an area window function representing weighting coefficients in the gaze area based on the distribution of viewpoints in the frames from which the gaze area was obtained; a target frame extraction unit that extracts, as a target frame, the frame that precedes the frame at which the saccade ended by the number of frames corresponding to the sum of the latency (the delay from the moment the gaze target is internally determined until the line of sight starts to move) and the time required for the saccade; a saliency map calculation unit that calculates a saliency map in which the degree of saliency is quantified for each pixel or each image block, based on maps of per-pixel feature values of the target frame; and an attraction degree calculation unit that calculates a degree of attraction, a value that quantitatively represents attractiveness, using values obtained by applying the area window function to the portion of the saliency map corresponding to the gaze area.
According to this aspect, the video evaluation device obtains the movement speed of the line of sight from line-of-sight data indicating, for each frame, the viewer's viewpoint while watching the video, and identifies the frames from the start to the end of each saccade based on that speed. Based on the movement speed of the viewpoint, the device also identifies the frames the viewer gazed at between the end of one saccade and the start of the next, extracts a gaze area from the viewpoint positions in those frames, and calculates an area window function from the distribution of viewpoints in the gaze area. The device then calculates the saliency map of the target frame, which precedes the frame at which the saccade ended by the number of frames corresponding to the latency plus the saccade duration, and calculates the degree of attraction using the values obtained by applying the area window function to the gaze-area portion of that saliency map.
As a result, the video evaluation device can take the temporal characteristics of eye movement into account and accurately evaluate, as a quantitative value, the degree to which the viewer is attracted by the features of the video.
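The target-frame rule is plain arithmetic on frame indices. The following is a minimal sketch, assuming frames are integer indices at a constant frame rate and that the latency has already been converted to a whole number of frames; the function name `target_frame` is hypothetical. It also makes explicit that stepping back by "latency + saccade duration" from the saccade end frame lands on the same frame as stepping back by the latency alone from the saccade start frame.

```python
def target_frame(saccade_start: int, saccade_end: int, latency_frames: int) -> int:
    """Frame at which the gaze target is assumed to have been chosen internally.

    Hypothetical helper: goes back from the saccade end frame by the saccade
    duration plus the latency (both in frames), clamped at frame 0.
    """
    if saccade_end < saccade_start or latency_frames < 0:
        raise ValueError("invalid saccade span or negative latency")
    saccade_frames = saccade_end - saccade_start  # time required for the saccade
    return max(0, saccade_end - (saccade_frames + latency_frames))


# Saccade over frames 100..103 with a 6-frame latency -> target frame 94,
# i.e. the saccade start frame (100) minus the latency (6).
print(target_frame(100, 103, 6))
```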

One aspect of the present invention is the video evaluation device described above, in which the window function calculation unit calculates the area window function as one of the following: a distribution obtained by fitting a two-dimensional Gaussian mixture to the viewpoints included in the gaze area; a distribution obtained by summing, over all viewpoints included in the gaze area, the regions within a fixed distance of each viewpoint and then normalizing the sum; or a distribution that takes a uniform value within a fixed distance of the centroid of the viewpoints included in the gaze area.
According to this aspect, the video evaluation device can calculate an area window function whose coefficient, applied to the saliency map of the target frame, is higher the closer a position within the gaze area is to the point the viewer was most intently gazing at.
As a result, the video evaluation device gives greater weight to saliency-map values in the parts of the frame image that caught the viewer's eye, so that those parts yield a high degree of attraction, and can therefore accurately evaluate the degree to which the viewer is attracted by the features of the video.
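The three window-function variants can be sketched on a small pixel grid as follows. This is an illustration under stated assumptions, not the patent's implementation: the Gaussian variant is a deliberate simplification that fits a single axis-aligned Gaussian (a one-component "mixture") instead of a full two-dimensional Gaussian mixture, its peak is scaled to 1, and "normalized" in the disk-sum variant is assumed to mean dividing by the maximum. The function names and the `radius` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Grid = List[List[float]]  # indexed as grid[y][x]


def _check(points: Sequence[Point]) -> None:
    if not points:
        raise ValueError("the gaze area must contain at least one viewpoint")


def window_gaussian(points: Sequence[Point], width: int, height: int) -> Grid:
    """Single axis-aligned Gaussian fitted to the viewpoints (peak scaled to 1).

    Simplification: one component instead of a 2-D Gaussian mixture.
    """
    _check(points)
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Floor the variances so identical viewpoints do not cause division by zero.
    vx = max(sum((x - mx) ** 2 for x, _ in points) / n, 1e-6)
    vy = max(sum((y - my) ** 2 for _, y in points) / n, 1e-6)
    return [[math.exp(-0.5 * ((x - mx) ** 2 / vx + (y - my) ** 2 / vy))
             for x in range(width)] for y in range(height)]


def window_disk_sum(points: Sequence[Point], width: int, height: int,
                    radius: float) -> Grid:
    """Sum of disks of the given radius around each viewpoint, divided by the max."""
    _check(points)
    w = [[0.0] * width for _ in range(height)]
    for px, py in points:
        for y in range(height):
            for x in range(width):
                if (x - px) ** 2 + (y - py) ** 2 <= radius ** 2:
                    w[y][x] += 1.0
    peak = max(v for row in w for v in row)
    return [[v / peak for v in row] for row in w] if peak > 0 else w


def window_uniform(points: Sequence[Point], width: int, height: int,
                   radius: float) -> Grid:
    """Value 1 within `radius` of the viewpoint centroid, 0 elsewhere."""
    _check(points)
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [[1.0 if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 else 0.0
             for x in range(width)] for y in range(height)]
```

Of the three, only the first two concentrate weight where viewpoints cluster; the uniform variant weights every pixel in the disk equally.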

One aspect of the present invention is the video evaluation device described above, further comprising a latency extraction unit that, based on viewpoint data measured while targets are presented at positions switched over time, calculates for each target a delay time, namely the difference between the time at which presentation of the target started and the time at which a saccade toward the target started, and calculates the latency by averaging the delay times calculated for the plurality of targets.
According to this aspect, the video evaluation device calculates, for each of a plurality of targets, the delay time from target presentation to saccade onset based on the viewpoint data recorded during personal calibration, and calculates the latency by averaging those delay times.
This allows the video evaluation device to use a latency specific to each viewer to identify the frame at which the gaze target was internally determined, and to use that frame in calculating the degree of attraction. The video evaluation device can therefore accurately evaluate the degree to which the viewer is attracted by the features of the video.

One aspect of the present invention is the video evaluation device described above, in which the gaze area extraction unit calculates a distribution in two-dimensional coordinates based on the viewpoints in the gazed frames and extracts, as the gaze area, the region in which the calculated distribution is at or above a threshold.
According to this aspect, the video evaluation device extracts the gaze area based on the viewpoint positions in the frames gazed at after the saccade.
This allows the video evaluation device to obtain, as the gaze area, the part of the frame image that induced the saccade.
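The "distribution in two-dimensional coordinates" is not specified further here. The following minimal sketch assumes it is a sum of isotropic Gaussian kernels centred on the gazed viewpoints, normalized to a peak of 1. The kernel width `sigma`, the `threshold` value, and the name `extract_gaze_area` are all hypothetical choices for illustration.

```python
import math
from typing import Sequence, Set, Tuple


def extract_gaze_area(points: Sequence[Tuple[float, float]], width: int, height: int,
                      sigma: float = 2.0, threshold: float = 0.5) -> Set[Tuple[int, int]]:
    """Return the (x, y) pixels whose normalized viewpoint density is >= threshold.

    Assumption: the 2-D distribution is a Gaussian kernel density (peak scaled to 1).
    """
    density = [[0.0] * width for _ in range(height)]
    for px, py in points:
        for y in range(height):
            for x in range(width):
                d2 = (x - px) ** 2 + (y - py) ** 2
                density[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))
    peak = max((v for row in density for v in row), default=0.0)
    if peak == 0.0:
        return set()  # no viewpoints: empty gaze area
    return {(x, y) for y in range(height) for x in range(width)
            if density[y][x] / peak >= threshold}


area = extract_gaze_area([(5, 5), (5, 6), (6, 5)], 20, 20)
print((5, 5) in area, (15, 15) in area)
```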

One aspect of the present invention is a program for causing a computer to function as a video evaluation device comprising: saccade extraction means for identifying the frames from the start to the end of a saccade based on the movement speed of the viewpoint obtained from line-of-sight data indicating, for each frame, the viewpoint during video viewing; gaze area extraction means for identifying, based on the movement speed, the frames that were gazed at among the frames from the end of a saccade to the start of the next saccade, and extracting a gaze area based on the viewpoints in the identified frames; window function calculation means for calculating an area window function representing weighting coefficients in the gaze area based on the distribution of viewpoints in the frames from which the gaze area was obtained; target frame extraction means for extracting, as a target frame, the frame that precedes the frame at which the saccade ended by the number of frames corresponding to the sum of the latency (the delay from the moment the gaze target is internally determined until the line of sight starts to move) and the time required for the saccade; saliency map calculation means for calculating a saliency map in which the degree of saliency is quantified for each pixel or each image block, based on maps of per-pixel feature values of the target frame; and attraction degree calculation means for calculating a degree of attraction, a value that quantitatively represents attractiveness, using values obtained by applying the area window function to the portion of the saliency map corresponding to the gaze area.

According to the present invention, the degree to which a viewer is attracted by the features of a video can be evaluated accurately.

FIG. 1 is a functional block diagram showing the configuration of a video evaluation system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the data structure of the line-of-sight data according to the embodiment.
FIG. 3 is a diagram showing the positions of the targets presented on the screen during personal calibration according to the embodiment.
FIG. 4 is a diagram showing an example of the temporal change of the targets presented on the screen during personal calibration according to the embodiment.
FIG. 5 is a diagram schematically showing the timing of viewpoint changes when the target changes during personal calibration according to the embodiment.
FIG. 6 is a processing flow showing the attraction degree calculation process of the video evaluation device according to the embodiment.
FIG. 7 is a diagram showing an example of saccade and gaze extraction according to the embodiment.
FIG. 8 is a diagram showing the method of determining the target frame according to the embodiment.
FIG. 9 is a diagram showing examples of saccades and area window functions for each scene according to the embodiment.
FIG. 10 is a diagram showing an example of evaluation result data according to the embodiment.

Embodiments of the present invention are described in detail below with reference to the drawings.
In the embodiment, the viewer's eye movements during video viewing are measured, and attention is focused on the saccade (rapid, jumping eye movement) components, in which the position of the gaze point changes greatly. The video evaluation device calculates how salient the surroundings of the saccade end point were at the moment the internal (mental) gaze target was determined, and uses this as an index of the attractiveness that caused the saccade (hereinafter, the "degree of attraction"). The degree of attraction is a value that quantitatively represents attractiveness. From the degrees of attraction calculated for all saccades obtained during viewing, the video evaluation device calculates the degree of attraction of each scene and of the video as a whole, and uses them as evaluation values of the video's attractiveness. A scene is the interval from one cut to the next.
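The text does not say how per-saccade values are combined into scene and whole-video values. A minimal sketch, assuming the arithmetic mean is used and that each saccade is assigned to the scene containing its end frame, could look like this; `scene_attraction` and the cut-list layout are hypothetical.

```python
from bisect import bisect_right
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple


def scene_attraction(cut_frames: Sequence[int],
                     saccades: Sequence[Tuple[int, float]]
                     ) -> Tuple[Dict[int, float], Optional[float]]:
    """Mean degree of attraction per scene and for the whole video.

    cut_frames: sorted first frame of each scene (must start at 0).
    saccades:   (saccade end frame, degree of attraction) pairs.
    Assumption: aggregation by arithmetic mean.
    """
    if not cut_frames or cut_frames[0] != 0:
        raise ValueError("cut_frames must start with frame 0")
    per_scene: Dict[int, List[float]] = {}
    for frame, degree in saccades:
        scene = bisect_right(cut_frames, frame) - 1  # index of the scene containing frame
        per_scene.setdefault(scene, []).append(degree)
    scene_means = {s: mean(v) for s, v in sorted(per_scene.items())}
    overall = mean(d for _, d in saccades) if saccades else None
    return scene_means, overall


means, overall = scene_attraction([0, 100, 250],
                                  [(10, 0.2), (50, 0.4), (120, 0.9), (300, 0.5)])
```

Here scene 0 averages the two saccades that end in frames 0 to 99, and the whole-video value averages all four saccades.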

FIG. 1 is a functional block diagram showing the configuration of a video evaluation system 10 according to an embodiment of the present invention. Only the functional blocks relevant to the present embodiment are shown. The video evaluation system 10 includes a video storage unit 1, a video playback unit 2, a line-of-sight measurement unit 3, a line-of-sight data storage unit 4, and a video evaluation device 5.

The video storage unit 1 stores video data, including video data used for latency extraction and video data to be evaluated. A saccade is a rapid movement of the eyeball that shifts the viewpoint, and the latency is the delay from the moment the viewer internally decides on a gaze target until a saccade toward that target begins.
The video playback unit 2 presents the video data stored in the video storage unit 1 to the viewer. That is, the video playback unit 2 plays back the video data and displays the image of each frame on a display or a projection screen (hereinafter, the "screen").
The line-of-sight measurement unit 3 measures the viewer's eye movements while the viewer watches the video data and obtains the viewpoint, that is, the position on the screen toward which the user's line of sight was directed while the video playback unit 2 was displaying each frame image. The viewpoint is represented, for example, by coordinates in which the vertical and horizontal directions of the screen are taken as the X axis and the Y axis, respectively.
The line-of-sight data storage unit 4 stores line-of-sight data indicating, for each frame, the viewpoint on the screen obtained by the line-of-sight measurement unit 3 during video viewing.

The video evaluation device 5 can be implemented by a computer. The video evaluation device 5 includes a latency extraction unit 51, a saccade extraction unit 52, a gaze area extraction unit 53, a window function calculation unit 54, a target area extraction unit 55, a target frame extraction unit 56, a saliency map calculation unit 57, an attraction degree calculation unit 58, and a storage unit 59.
The latency extraction unit 51 extracts the latency, i.e., the delay of the viewer's eye movement, using line-of-sight data obtained during personal calibration, which is performed before the video evaluation process. The saccade extraction unit 52 extracts saccade components using the line-of-sight data recorded while the viewer watched the video data to be evaluated. The gaze area extraction unit 53 extracts a gaze area based on the viewpoints from the frame containing the saccade end point up to the frame at which the next saccade starts. The window function calculation unit 54 calculates an area window function representing the weighting coefficients in the gaze area based on the distribution of viewpoints in the gaze area. For each saccade, the target area extraction unit 55 extracts the gaze area containing the saccade end point as the target area. The target frame extraction unit 56 extracts, as the target frame, the frame that precedes the saccade start frame by the number of frames corresponding to the saccade latency extracted by the latency extraction unit 51. The saccade start frame is the frame that precedes the saccade end frame by the number of frames corresponding to the saccade duration. In other words, the target frame precedes the frame containing the saccade end point by the number of frames corresponding to the latency plus the saccade duration. The saliency map calculation unit 57 calculates the saliency map of the target frame. The attraction degree calculation unit 58 applies the area window function to the portion of the saliency map corresponding to the target area to calculate a value representing attractiveness, and normalizes this value by the area window function to obtain the degree of attraction. The storage unit 59 stores various data, such as the degree of attraction obtained for each saccade.
The storage unit 59 also temporarily stores data used by each unit during processing.
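"Normalizes by the area window function" is not defined more precisely here. A minimal sketch, assuming it means dividing the window-weighted saliency sum by the sum of the window weights over the target area (a weighted average), follows. The saliency map is taken as a precomputed grid for the target frame; `attraction_degree` is a hypothetical name.

```python
from typing import Iterable, List, Tuple

Grid = List[List[float]]  # indexed as grid[y][x]


def attraction_degree(saliency: Grid, window: Grid,
                      area: Iterable[Tuple[int, int]]) -> float:
    """Window-weighted mean saliency over the target area.

    saliency: saliency map of the target frame.
    window:   area window function on the same grid.
    area:     (x, y) pixels of the target area.
    Assumption: normalization = division by the sum of window weights.
    """
    area = list(area)
    numerator = sum(saliency[y][x] * window[y][x] for x, y in area)
    denominator = sum(window[y][x] for x, y in area)
    if denominator == 0:
        raise ValueError("the window function is zero over the whole target area")
    return numerator / denominator


saliency = [[0.0, 1.0], [0.5, 0.5]]
window = [[1.0, 1.0], [0.0, 2.0]]
print(attraction_degree(saliency, window, {(0, 0), (1, 0), (1, 1)}))
```

With this normalization, the result stays on the scale of the saliency map regardless of how large the gaze area is.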

FIG. 2 shows an example of line-of-sight data. As shown in the figure, the line-of-sight data associates a frame ID (hereinafter, "FID"), a time code, and viewpoint position information. The FID is identification information that identifies a frame of the video data. The time code indicates the playback time of the frame identified by the FID. The viewpoint position information consists of the coordinates of the viewpoint, i.e., the position on the screen toward which the user's line of sight was directed while the image of the frame identified by the FID was presented.
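One way to hold the FIG. 2 records in code is a small immutable record type. This is a sketch only: the CSV layout `FID,timecode,x,y`, the time-code string format, and the names `GazeSample` and `parse_gaze_csv` are assumptions, not a format defined by the source.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GazeSample:
    fid: int        # frame ID of the displayed frame
    timecode: str   # playback time of that frame (format assumed, kept as text)
    x: float        # viewpoint X coordinate on the screen
    y: float        # viewpoint Y coordinate on the screen


def parse_gaze_csv(text: str) -> List[GazeSample]:
    """Parse hypothetical 'FID,timecode,x,y' lines, skipping blanks and '#' comments."""
    samples = []
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].lstrip().startswith("#"):
            continue
        fid, timecode, x, y = row
        samples.append(GazeSample(int(fid), timecode.strip(), float(x), float(y)))
    return samples


data = "1,00:00:00:00,512.0,384.0\n2,00:00:00:01,515.5,380.25\n"
samples = parse_gaze_csv(data)
```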

Next, the operation of the video evaluation system 10 is described.
First, before video evaluation, the video evaluation system 10 performs personal calibration to extract the latency.

FIG. 3 shows the positions of the targets presented on the screen during personal calibration. In the figure, there are three horizontal positions (right, center, left) and three vertical positions (top, center, bottom), and each of the nine targets D1 to D9 has a different combination of horizontal and vertical position.

FIG. 4 shows an example of how the target presented on the screen changes over time during personal calibration. The screen shown in the figure is displayed when the video playback unit 2 plays back the video data for latency extraction. In the figure, the targets are presented in the order D5, D1, D2, D3, D6, D9, D8, D7, D4, D5. Each target is presented for about 2 seconds before switching to the next. The viewer is asked to gaze at each target, so the viewpoint moves each time the target switches.
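For the latency computation that follows, what matters is when each target appears. A sketch that turns the FIG. 4 order into target onset frames is shown below. The 30 fps frame rate and the exact 2-second hold are assumptions (the source says "about 2 seconds"), and the function name is hypothetical. Target screen positions are not modeled.

```python
from typing import List, Sequence, Tuple

# Presentation order shown in FIG. 4.
TARGET_ORDER = ("D5", "D1", "D2", "D3", "D6", "D9", "D8", "D7", "D4", "D5")


def calibration_schedule(order: Sequence[str], hold_s: float = 2.0,
                         fps: float = 30.0) -> List[Tuple[str, int]]:
    """(target name, onset frame) for targets shown back to back.

    Assumptions: constant frame rate `fps` and identical hold time per target.
    """
    frames_per_target = round(hold_s * fps)
    return [(name, i * frames_per_target) for i, name in enumerate(order)]


schedule = calibration_schedule(TARGET_ORDER)
```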

FIG. 5 schematically shows the timing of viewpoint changes when the target changes during personal calibration. To focus on vertical changes in the line of sight, the figure shows the case in which targets D1, D2, D3, D1, and D3 are displayed in this order. Below the images of the time-series frames, the change in the vertical coordinate of the viewpoint is shown. In general, the viewpoint moves with an almost constant delay (the latency) after the gaze target is internally determined. The moment the gaze target is internally determined is the moment, after the target is presented, at which the viewer mentally directs attention to it. For example, when a target is displayed at time t1, the viewer internally selects it as the gaze target at time t2. After the latency, at time t3, the viewer starts moving the line of sight toward the gaze target (the target), which marks the start of the saccade. The saccade ends at time t4, when the movement of the line of sight to the gaze target is complete.

In personal calibration, only a single target is displayed in the video, so its saliency is very high. The target display time (t1) can therefore be regarded as approximately equal to the time at which the gaze target was internally determined (t2), and the latency is calculated as the time from target display to saccade onset (for example, t3 - t1). From the delay times obtained as each target is switched and presented during personal calibration (latency 1, latency 2, latency 3, latency 4, ...), the average latency and the number of frames corresponding to it (hereinafter, the "latency-equivalent frame count") are calculated and used in the target frame extraction process described later.
In actual video, by contrast, various potential gaze targets are displayed simultaneously. The moment a given subject appears in the video therefore does not necessarily coincide with the moment it is internally selected as the gaze target.
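The averaging step reduces to a few lines. This minimal sketch assumes target onset times (t1) and saccade onset times (t3) are already known in seconds and that the latency-equivalent frame count is obtained by rounding at a constant frame rate. The function name and the 30 fps value in the usage line are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def latency_from_calibration(target_onsets_s: Sequence[float],
                             saccade_onsets_s: Sequence[float],
                             fps: float) -> Tuple[float, int]:
    """Average delay (t3 - t1) over all targets and its frame equivalent.

    Assumption: the frame count is the latency times `fps`, rounded.
    """
    if not target_onsets_s or len(target_onsets_s) != len(saccade_onsets_s):
        raise ValueError("need one saccade onset per target onset")
    delays = [s - t for t, s in zip(target_onsets_s, saccade_onsets_s)]
    if any(d < 0 for d in delays):
        raise ValueError("a saccade cannot start before its target appears")
    latency_s = mean(delays)
    return latency_s, round(latency_s * fps)


latency_s, latency_frames = latency_from_calibration(
    [0.0, 2.0, 4.0], [0.2, 2.25, 4.15], fps=30.0)
```

In this usage example the three delays (0.2 s, 0.25 s, 0.15 s) average to 0.2 s, which corresponds to 6 frames at 30 fps.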

The operation of the video evaluation system 10 during personal calibration is as follows.
The video playback unit 2 reads and plays back the latency-extraction video data stored in the video storage unit 1, presenting a screen on which the displayed target switches about every 2 seconds. The line-of-sight measurement unit 3 measures the viewer's eye movements while the viewer gazes at the targets and records, in the line-of-sight data storage unit 4, line-of-sight data containing the FID and time code of the frame image at the time of measurement together with the viewpoint position information obtained by the measurement. The latency extraction unit 51 of the video evaluation device 5 reads the recorded line-of-sight data from the line-of-sight data storage unit 4 and calculates the movement speed of the viewpoint (for example, its angular velocity) from the viewpoint position information at each time code in the line-of-sight data. The latency extraction unit 51 judges that a saccade is occurring when the movement speed of the viewpoint is at or above a predetermined speed. For each target, it calculates the delay time from the time at which presentation of the target started to the time at which the saccade toward that target started. It then averages the delay times over all targets to obtain the latency, and further calculates the latency-equivalent frame count, i.e., the number of frames corresponding to the latency.
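Finding each per-target delay requires locating the first saccade onset after each target appears. A minimal sketch follows. It assumes viewpoints are time-stamped samples in degrees of visual angle, so that the speed is an angular velocity; that a saccade starts at the earlier sample of the first sample pair whose speed reaches the threshold; and that the search for each target stops when the next target appears. The 5 deg/s default mirrors the example value given for step S105. The name `saccade_delays` is hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[float, float, float]  # (time in s, x in deg, y in deg)


def saccade_delays(samples: Sequence[Sample], target_onsets: Sequence[float],
                   threshold_deg_s: float = 5.0) -> List[Optional[float]]:
    """Delay from each target onset to the first saccade onset before the next target.

    Returns None for a target if no sample pair reaches the speed threshold.
    """
    # Speed between consecutive samples, time-stamped at the earlier sample.
    velocity = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        if t1 > t0:
            velocity.append((t0, math.hypot(x1 - x0, y1 - y0) / (t1 - t0)))
    delays: List[Optional[float]] = []
    for k, onset in enumerate(target_onsets):
        stop = target_onsets[k + 1] if k + 1 < len(target_onsets) else math.inf
        start = next((t for t, v in velocity
                      if onset <= t < stop and v >= threshold_deg_s), None)
        delays.append(None if start is None else start - onset)
    return delays
```

The resulting delays can then be averaged, ignoring any `None` entries, to obtain the viewer's latency.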

Next, the operation of the video evaluation system 10 during video evaluation is described.
The video playback unit 2 reads and plays back the video content, i.e., the video data to be evaluated, stored in the video storage unit 1. The line-of-sight measurement unit 3 measures the viewer's eye movements while the viewer watches the video content and records, in the line-of-sight data storage unit 4, line-of-sight data containing the FID and time code of the frame image at the time of measurement together with the viewpoint position information obtained by the measurement.

FIG. 6 is a process flow showing the attractiveness calculation process performed by the video evaluation device 5.
Information associating each scene of the video content with the FIDs or playback times of that scene is registered in advance in the video storage unit 1.

The saccade extraction unit 52 of the video evaluation device 5 reads the line-of-sight data generated for the video content from the line-of-sight data storage unit 4. It calculates the moving speed of the viewpoint (for example, its angular velocity) from the viewpoint position information at each time code in the line-of-sight data. The saccade extraction unit 52 extracts each time zone in which the calculated moving speed is equal to or higher than a predetermined speed (for example, 5 deg/s) as a time zone in which a saccade occurred (step S105). The frames at the start and end of such a time zone are the saccade start frame and the saccade end frame, respectively. Hereinafter, the i-th saccade to appear (i being an integer of 1 or more) is referred to as saccade #i. The saccade extraction unit 52 assigns each saccade a saccade ID that identifies it.
The same predetermined speed is used during personal calibration and in step S105, and it is set in advance based on empirical values.
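The velocity-threshold extraction of step S105 can be sketched as follows. This is an illustrative sketch only, with the following assumptions:

- There is one viewpoint sample per frame.
- Coordinates are already expressed in degrees of visual angle.
- Timestamps are in seconds.
- The function name `extract_saccades` is hypothetical.

The speed assigned to a frame is the displacement since the previous frame divided by the elapsed time.

```python
import math


def extract_saccades(fids, xs, ys, times, speed_threshold):
    """Return (start_fid, end_fid) for each run of frames whose viewpoint speed
    is at or above speed_threshold (deg/s).

    Assumes one sample per frame; xs/ys in degrees of visual angle, times in seconds.
    The speed of frame k is the displacement from frame k-1 divided by the elapsed time.
    """
    saccades = []
    start = end = None
    for k in range(1, len(fids)):
        dt = times[k] - times[k - 1]
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = math.hypot(xs[k] - xs[k - 1], ys[k] - ys[k - 1]) / dt
        if speed >= speed_threshold:
            if start is None:
                start = fids[k]  # first fast frame opens a saccade
            end = fids[k]
        elif start is not None:
            saccades.append((start, end))  # first slow frame closes it
            start = end = None
    if start is not None:
        saccades.append((start, end))  # saccade still running at the end of the data
    return saccades
```

Usage example: take a 60 fps trace in which the viewpoint moves only during FIDs 3 to 5 and 12 to 15. With a 5 deg/s threshold the sketch returns `[(3, 5), (12, 15)]`, which matches the layout of FIG. 7.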

The gaze area extraction unit 53 identifies, among the frames in time zones not extracted as saccades, the frames in which the viewpoint moved at or below a predetermined speed. It then extracts a gaze area based on the viewpoints in the identified frames (step S110).
Specifically, the gaze area extraction unit 53 identifies, between the frame at the end of a saccade and the frame at the start of the next saccade, the frames in which the viewpoint moved at or below the predetermined speed. It reads the viewpoint position information of those frames from the line-of-sight data. From the coordinates given by this viewpoint position information, it obtains a normal distribution in the two-dimensional X-Y coordinate plane. The region where the value of this distribution is equal to or greater than a threshold becomes the gaze area. Hereinafter, the gaze area extracted from the frames between the end frame of saccade #i and the start frame of saccade #(i+1) is referred to as gaze area #i.
The predetermined speed and the threshold used in step S110 are set in advance based on empirical values.
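Step S110 fits a two-dimensional normal distribution to the gazed viewpoints and keeps the region where its density reaches a threshold. The NumPy sketch below shows one way to do this, under these assumptions:

- Viewpoints are given in pixel coordinates.
- At least two viewpoints are available.
- A small diagonal ridge is added to the covariance so that it stays invertible.

`gaze_area_mask` and the ridge value are illustrative choices, not part of the source.

```python
import numpy as np


def gaze_area_mask(points, width, height, threshold):
    """Fit a 2-D normal distribution to viewpoints; return (mask, density).

    points: (N, 2) viewpoint (x, y) pixel coordinates from the gazed frames, N >= 2.
    mask[y, x] is True where the fitted density is >= threshold.
    """
    pts = np.asarray(points, dtype=float)
    mean = pts.mean(axis=0)
    # Small ridge keeps the covariance invertible for nearly collinear points (assumption).
    cov = np.cov(pts, rowvar=False) + 1e-6 * np.eye(2)
    inv = np.linalg.inv(cov)
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
    xx, yy = np.meshgrid(np.arange(width), np.arange(height))
    d = np.stack([xx - mean[0], yy - mean[1]], axis=-1)  # (H, W, 2) offsets from the mean
    m = np.einsum("hwi,ij,hwj->hw", d, inv, d)  # squared Mahalanobis distance per pixel
    density = norm * np.exp(-0.5 * m)
    return density >= threshold, density
```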

For each gaze area, the window function calculation unit 54 calculates the distribution of weights over the gaze area. It derives this from the distribution of viewpoints in the frames from which that gaze area was obtained, and thereby obtains an area window function (step S115). Hereinafter, the area window function obtained from gaze area #i is referred to as area window function #i.

FIG. 7 is a diagram illustrating an example of saccade and gaze extraction.
Suppose, for example, that saccade #1 is extracted in the frames with FIDs “3” to “5” and saccade #2 in the frames with FIDs “12” to “15”. Suppose also that the frames with FIDs “6” to “11”, between saccade #1 and saccade #2, are gazed at, and gaze area #1 is obtained from the viewpoints in these frames. In this case, area window function #1 is calculated from the distribution of the viewpoints of the frames with FIDs “6” to “11” in gaze area #1.
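Saccade #i can be paired with the gaze frames that follow it, up to the start of saccade #(i+1). The sketch below assumes saccades are represented as hypothetical (start_fid, end_fid) pairs in temporal order, and `gaze_frame_ranges` is an illustrative name. It also assumes that consecutive saccades are separated by at least one non-saccade frame. This holds when saccades are extracted as maximal runs of fast frames.

```python
def gaze_frame_ranges(saccades):
    """Map i -> (first_fid, last_fid) of the frames between saccade #i and saccade #(i+1).

    saccades: (start_fid, end_fid) pairs in temporal order; saccades[0] is saccade #1.
    """
    return {
        i: (end_i + 1, next_start - 1)
        for i, ((_, end_i), (next_start, _)) in enumerate(zip(saccades, saccades[1:]), start=1)
    }
```

Usage example: for the FIG. 7 example, `gaze_frame_ranges([(3, 5), (12, 15)])` gives `{1: (6, 11)}`, i.e. gaze area #1 is built from FIDs 6 to 11.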

The window function calculation unit 54 calculates the distribution of weights over the gaze area from the distribution of viewpoints, for example by one of the following methods (1) to (3). The result is used as the area window function.

(1) A distribution obtained by fitting a two-dimensional Gaussian mixture (mixed normal distribution) to the viewpoints included in the gaze area.
(2) A distribution obtained by summing, over all viewpoints included in the gaze area, the region within a fixed distance of each viewpoint, and then normalizing. The fixed distance may be, for example, the observed measurement error.
(3) A distribution having a uniform value within a fixed distance of the centroid of the viewpoints included in the gaze area.

The window function calculation unit 54 normalizes the distribution calculated by any one of methods (1) to (3) above so that the weights sum to 1. The normalized distribution is used as the area window function.
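Options (2) and (3), together with the normalization to a total weight of 1, can be sketched with NumPy as follows. The sketch assumes viewpoints in pixel coordinates and uses a circular neighborhood of a caller-chosen `radius`. For option (2), the text suggests the observed measurement error as this distance. The function names are hypothetical. Option (1) would additionally require fitting a Gaussian mixture, for example with scikit-learn's `GaussianMixture`, and is not shown.

```python
import numpy as np


def disc_sum_window(points, width, height, radius):
    """Option (2): count, per pixel, the viewpoints within `radius`; normalize to sum 1."""
    xx, yy = np.meshgrid(np.arange(width), np.arange(height))
    w = np.zeros((height, width))
    for x, y in points:
        w += ((xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2)  # add one disc per viewpoint
    total = w.sum()
    if total == 0:
        raise ValueError("no pixel lies within radius of any viewpoint")
    return w / total


def centroid_window(points, width, height, radius):
    """Option (3): uniform weight within `radius` of the viewpoints' centroid; sum 1."""
    cx, cy = np.mean(np.asarray(points, dtype=float), axis=0)
    xx, yy = np.meshgrid(np.arange(width), np.arange(height))
    w = ((xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2).astype(float)
    total = w.sum()
    if total == 0:
        raise ValueError("no pixel lies within radius of the centroid")
    return w / total
```

In option (2), pixels covered by several viewpoints' discs receive proportionally more weight. In option (3), every pixel near the centroid is weighted equally.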

The target area extraction unit 55 selects one unselected saccade #i from the saccades extracted by the saccade extraction unit 52 in step S105 (step S120). For example, unselected saccades are selected in ascending order of i. The target area extraction unit 55 then extracts gaze area #i, calculated by the gaze area extraction unit 53, as the target area (step S125).

The target frame extraction unit 56 determines the target frame, i.e. the frame for which the saliency map is to be calculated (step S130). Starting from the frame at the end of saccade #i, it goes back by the number of frames the saccade took plus the latency-equivalent frame count obtained by the latency extraction unit 51. The target frame is the frame at which the viewer internally decided what to look at next. Going back from the end frame of saccade #i by the number of frames the saccade took gives the start frame of saccade #i. The target frame is therefore the frame that precedes the start frame of saccade #i by the latency-equivalent frame count.

FIG. 8 is a diagram illustrating the method of determining the target frame, using the target frame for saccade #1 in FIG. 7 as an example. The target frame extraction unit 56 goes back from the saccade end frame (FID “05”) by the number of frames the saccade took, which gives the saccade start frame (FID “03”). It then determines the target frame as FID “01”, the frame that precedes the saccade start frame by the latency-equivalent frame count of “2”. The target frame extraction unit 56 outputs the FID of the target frame to the saliency map calculation unit 57. This procedure rests on the view that the region containing the viewpoint at the end of a saccade holds the image that drew the viewer's eye.
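The frame arithmetic of step S130 subtracts the saccade duration and the latency-equivalent frame count from the saccade end FID. Here is a minimal sketch, assuming integer FIDs that increase by one per frame. The clamping at the first frame and the name `target_frame` are assumptions, not part of the source.

```python
def target_frame(saccade_start_fid, saccade_end_fid, latency_frames, first_fid=1):
    """FID of the frame at which the gaze target is assumed to have been chosen."""
    saccade_frames = saccade_end_fid - saccade_start_fid  # frames spent in the saccade
    # Going back from the end frame by the saccade duration lands on the start frame;
    # going back a further latency_frames gives the target frame.
    fid = saccade_end_fid - saccade_frames - latency_frames
    return max(fid, first_fid)  # never step before the first frame (assumption)
```

Usage example: for saccade #1 in FIG. 8, `target_frame(3, 5, 2)` returns 1, i.e. FID “01”.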

Returning to FIG. 6, the saliency map calculation unit 57 reads the target frame from the video content stored in the video storage unit 1. The target frame is the frame image identified by the FID output from the target frame extraction unit 56 in step S130. The saliency map calculation unit 57 then calculates the saliency map of the target frame (step S135). The saliency map may be calculated using, for example, the methods of Non-Patent Documents 1 to 3, but any other saliency map calculation method may also be used.

For example, the saliency map is calculated as follows. First, for each feature type (color, luminance, motion vector, etc.), a map of the per-pixel feature values of the input image is generated. Maps of the feature values are also generated for the input image reduced to scales of 1/2, 1/4, 1/8, and so on. Next, for each feature type, a feature map that emphasizes differences in feature values is created by taking differences between the maps at different scales. Finally, the feature maps of all feature types are linearly combined into a saliency map that quantifies the degree of saliency per pixel or per image block.
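The multi-scale procedure above follows the general shape of the Itti-Koch saliency model. The NumPy sketch below is a deliberately simplified variant, and none of its choices is prescribed by the source:

- Each feature map is supplied precomputed as a 2-D array.
- The pyramid is built by 2x2 block averaging rather than Gaussian filtering.
- The full-resolution map is compared with every coarser scale after nearest-neighbor upsampling.
- Each feature map is scaled to a peak of 1 before the (optionally weighted) linear combination.

```python
import numpy as np


def downsample(img):
    """Halve resolution by averaging 2x2 blocks (an odd last row/column is dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])


def upsample_to(img, shape):
    """Nearest-neighbour upsampling of img to the given (height, width)."""
    rows = np.arange(shape[0]) * img.shape[0] // shape[0]
    cols = np.arange(shape[1]) * img.shape[1] // shape[1]
    return img[np.ix_(rows, cols)]


def saliency_map(feature_maps, levels=3, weights=None):
    """Linearly combine per-feature across-scale difference maps into a saliency map.

    feature_maps: list of 2-D arrays of equal shape (e.g. luminance, colour, motion);
    each side must be at least 2**levels pixels.
    levels: number of coarser scales (1/2, 1/4, ...) compared with the full-resolution map.
    weights: optional per-feature coefficients for the linear combination.
    """
    combined = None
    for idx, fmap in enumerate(feature_maps):
        pyramid = [np.asarray(fmap, dtype=float)]
        for _ in range(levels):
            pyramid.append(downsample(pyramid[-1]))
        base = pyramid[0]
        feature = np.zeros_like(base)
        for coarse in pyramid[1:]:
            # Difference between the fine and a coarse scale highlights local contrast.
            feature += np.abs(base - upsample_to(coarse, base.shape))
        peak = feature.max()
        if peak > 0:
            feature /= peak  # put every feature map on a common [0, 1] scale
        w = 1.0 if weights is None else weights[idx]
        combined = w * feature if combined is None else combined + w * feature
    return combined
```

Usage example: for a dark 16x16 luminance map with a bright 2x2 patch, the resulting map peaks inside the patch. A per-block map can be obtained by averaging this per-pixel map over blocks.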

The attractiveness calculation unit 58 extracts the portion of the saliency map (calculated by the saliency map calculation unit 57) that corresponds to the target area, gaze area #i. It multiplies this portion pixel by pixel by the weighting coefficients given by area window function #i and sums the results. This weighted sum of the saliency map serves as a value representing attractiveness (step S140). The attractiveness calculation unit 58 then normalizes the value calculated in step S140 by the sum of the area window function to obtain the attractiveness (step S145). It writes the following to the storage unit 59 in association with one another: the scene ID of the scene containing saccade #i, the saccade ID of saccade #i, and the FIDs of the start and end frames of saccade #i. The scene ID of the scene containing saccade #i is read from the video storage unit 1.
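Steps S140 and S145 amount to a window-weighted average of saliency over the gaze area. Here is a minimal sketch, assuming the saliency map and the area window function are arrays of the same shape, with an optional boolean mask marking the gaze area. `attractiveness` is a hypothetical name.

```python
import numpy as np


def attractiveness(saliency, window, gaze_mask=None):
    """Window-weighted saliency over the gaze area, divided by the total window weight."""
    s = np.asarray(saliency, dtype=float)
    w = np.asarray(window, dtype=float)
    if gaze_mask is not None:
        w = np.where(gaze_mask, w, 0.0)  # keep only weights inside the target area
    total = w.sum()
    if total == 0:
        raise ValueError("window function has no weight inside the gaze area")
    return float((s * w).sum() / total)  # step S140 sum, normalized as in step S145
```

The window function is already normalized to a total of 1. The division therefore changes the result only when the window is restricted to part of its support, as the optional mask does here.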

The target area extraction unit 55 determines whether all saccades extracted by the saccade extraction unit 52 in step S105 have been processed (step S150). If an unprocessed saccade remains (step S150: NO), it increments i by 1 and repeats the processing from step S120.
When the target area extraction unit 55 determines that all saccades have been processed (step S150: YES), the process ends.

Performing the above attractiveness calculation process yields a time series of attractiveness values, one per saccade. The attractiveness calculation unit 58 calculates the average attractiveness for each scene or for the entire video content and uses it as the evaluation value of that scene or of the whole content. It then generates evaluation result data containing these per-scene or whole-video evaluation values and outputs the data, for example by writing it to the storage unit 59.
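The per-scene and whole-content averaging can be sketched as below. The sketch assumes records of the form (scene ID, saccade ID, attractiveness), matching the per-saccade fields of the evaluation result data. It takes the whole-content value as the mean over all saccades rather than the mean of the scene means; the source does not say which is intended. `evaluate` is a hypothetical name.

```python
from collections import defaultdict


def evaluate(records):
    """records: iterable of (scene_id, saccade_id, attractiveness) tuples.

    Returns (per-scene mean attractiveness, mean over all saccades).
    """
    per_scene = defaultdict(list)
    all_values = []
    for scene_id, _saccade_id, value in records:
        per_scene[scene_id].append(value)
        all_values.append(value)
    if not all_values:
        raise ValueError("no attractiveness values to evaluate")
    scene_scores = {sid: sum(v) / len(v) for sid, v in per_scene.items()}
    overall = sum(all_values) / len(all_values)
    return scene_scores, overall
```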

FIG. 9 is a diagram illustrating an example of the saccades and area window functions for each scene. It shows the four saccades with saccade IDs “01” to “04” in scene 1, together with the area window functions used to calculate the attractiveness when each of these saccades was selected. Saccades are likewise obtained for scene 2, scene 3, and so on, and their attractiveness values are calculated.

FIG. 10 is a diagram illustrating an example of the evaluation result data. For each saccade, it lists the scene ID, saccade ID, start frame FID, end frame FID, and attractiveness. It also gives the attractiveness for each scene and for the entire video content.

The embodiment described above is well suited, for example, to the following configuration: the line-of-sight measurement unit 3 is an eye movement measurement device using a high-speed camera (240 fps (frames per second), 500 fps, or the like), and the video playback unit 2 is a high-frame-rate video display device (60 fps, 120 fps, or the like). With an eye movement measurement device using a high-speed camera, the saccade start time t3 and the saccade end time t4 can be observed separately, as shown in FIG. 5.
However, saccades are generally very fast. Their duration may be shorter than one frame (33 ms) of an ordinary 30 fps video camera. Moreover, during calibration the target is moved to a relatively nearby position to induce eye movement, so saccade distances are short. For these reasons, an eye movement measurement device using an ordinary video camera may have difficulty capturing the saccade start time t3 and end time t4 separately; that is, t3 ≈ t4. In this case, the latency extraction unit 51 calculates a delay time for each target: the difference between the time at which presentation of the target started and the time at which the saccade toward that target ended (or started). This delay time includes both the latency and the saccade duration. The latency extraction unit 51 averages the delay times over all targets and calculates the number of frames corresponding to this average. Then, in step S130, the target frame extraction unit 56 steps back from the end frame of saccade #i by the number of frames corresponding to the average delay time. The resulting frame becomes the target frame, i.e. the frame at which the gaze target was internally decided.
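For this low-frame-rate variant, the averaged delay already includes the saccade duration. The target frame is therefore found by stepping back from the saccade end frame by that delay alone. The sketch below uses the same assumptions as before: times in seconds, rounding to the nearest frame, clamping at the first frame, and hypothetical function names.

```python
def average_delay_frames(presentation_times, saccade_times, frame_rate):
    """Mean delay from target onset to the detected saccade, in frames.

    saccade_times[k] is when the saccade toward target k was detected; at low
    sampling rates its start and end are indistinguishable (t3 ≈ t4), so the
    delay covers both the latency and the saccade duration.
    """
    if not presentation_times or len(presentation_times) != len(saccade_times):
        raise ValueError("need exactly one saccade time per presented target")
    delays = [s - p for p, s in zip(presentation_times, saccade_times)]
    return round(sum(delays) / len(delays) * frame_rate)


def target_frame_low_rate(saccade_end_fid, delay_frames, first_fid=1):
    """Step back from the saccade end frame by the average delay in frames."""
    return max(saccade_end_fid - delay_frames, first_fid)
```

Usage example: delays of 0.12 s and 0.14 s at 30 fps average to about 3.9 frames, which rounds to 4. A saccade ending at FID 5 then yields target frame FID 1.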

The embodiment described above quantifies the degree to which viewers' attention is drawn by the saliency of the visual presentation, which makes it possible to evaluate the quality of the video. The quantification combines two inputs: line-of-sight features measured while the video content is viewed (gaze point distribution, moving speed, and so on), and a saliency map that estimates highly attention-drawing regions in the image. The result can serve as an objective evaluation value for the various presentation techniques tried during video production.

The video evaluation device 5 described above contains a computer system. The operating steps of the video evaluation device 5 are stored as a program on a computer-readable recording medium, and the above processing is performed when the computer system reads and executes this program. The computer system here includes a CPU, various memories, an OS, and hardware such as peripheral devices.

Further, the “computer system” also includes a homepage providing environment (or display environment) when a WWW system is used.
The “computer-readable recording medium” refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into computer systems. The term also covers media that hold a program dynamically for a short time. An example is a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line. It further covers media that hold a program for a certain period, such as volatile memory inside a computer system acting as the server or client in that case. The program may implement only part of the functions described above. It may also implement those functions in combination with a program already recorded in the computer system.

DESCRIPTION OF SYMBOLS
1 Video storage unit
2 Video playback unit
3 Line-of-sight measurement unit
4 Line-of-sight data storage unit
5 Video evaluation device
10 Video evaluation system
51 Latency extraction unit
52 Saccade extraction unit
53 Gaze area extraction unit
54 Window function calculation unit
55 Target area extraction unit
56 Target frame extraction unit
57 Saliency map calculation unit
58 Attractiveness calculation unit
59 Storage unit

Claims

A video evaluation apparatus comprising:
a saccade extraction unit that identifies the frames from the start to the end of a saccade based on the moving speed of the viewpoint obtained from line-of-sight data indicating, for each frame, the viewpoint during video viewing;
a gaze area extraction unit that identifies, based on the moving speed, gazed frames among the frames from the end of the saccade to the start of the next saccade, and extracts a gaze area based on the viewpoints in the identified frames;
a window function calculation unit that calculates an area window function representing weighting coefficients in the gaze area based on the distribution of the viewpoints in the frames from which the gaze area was obtained;
a target frame extraction unit that extracts, as a target frame, the frame preceding the frame at the end of the saccade by the number of frames corresponding to the sum of the time required for the saccade and a latency, the latency being the time from the state in which the gaze target has been internally determined until the line of sight starts to move;
a saliency map calculation unit that calculates a saliency map quantifying the degree of saliency for each pixel or each image block, based on maps of feature values calculated for each pixel of the target frame; and
an attractiveness calculation unit that calculates an attractiveness, a value quantitatively representing how strongly attention is attracted, using a value obtained by applying the area window function to the portion of the saliency map corresponding to the gaze area.

The video evaluation apparatus according to claim 1, wherein the window function calculation unit calculates the area window function from one of the following distributions: a distribution obtained by fitting a two-dimensional Gaussian mixture to the viewpoints included in the gaze area; a distribution obtained by summing, over all viewpoints included in the gaze area, the region within a fixed distance of each viewpoint and then normalizing; or a distribution having a uniform value within a fixed distance of the centroid of the viewpoints included in the gaze area.

The video evaluation apparatus according to claim 1, further comprising a latency extraction unit that, based on viewpoint data measured while a target whose position is switched over time is displayed, calculates for each target a delay time, being the difference from the time at which presentation of the target started to the time at which the saccade toward the target started, and calculates the latency by averaging the delay times calculated for the plurality of targets.

The video evaluation apparatus according to claim 1, wherein the gaze area extraction unit calculates a distribution in two-dimensional coordinates based on the viewpoints in the gazed frames, and extracts, as the gaze area, a region in which the value of the calculated distribution is equal to or greater than a threshold.

A program for causing a computer to function as a video evaluation apparatus comprising:
saccade extraction means for identifying the frames from the start to the end of a saccade based on the moving speed of the viewpoint obtained from line-of-sight data indicating, for each frame, the viewpoint during video viewing;
gaze area extraction means for identifying, based on the moving speed, gazed frames among the frames from the end of the saccade to the start of the next saccade, and extracting a gaze area based on the viewpoints in the identified frames;
window function calculation means for calculating an area window function representing weighting coefficients in the gaze area based on the distribution of the viewpoints in the frames from which the gaze area was obtained;
target frame extraction means for extracting, as a target frame, the frame preceding the frame at the end of the saccade by the number of frames corresponding to the sum of the time required for the saccade and a latency, the latency being the time from the state in which the gaze target has been internally determined until the line of sight starts to move;
saliency map calculation means for calculating a saliency map quantifying the degree of saliency for each pixel or each image block, based on maps of feature values calculated for each pixel of the target frame; and
attractiveness calculation means for calculating an attractiveness, a value quantitatively representing how strongly attention is attracted, using a value obtained by applying the area window function to the portion of the saliency map corresponding to the gaze area.