JP3323842B2

JP3323842B2 - Video summarizing method and apparatus

Info

Publication number: JP3323842B2
Application number: JP30034999A
Authority: JP
Inventors: 富夫越後; アルベルト富田; 健益満
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1999-10-22
Filing date: 1999-10-22
Publication date: 2002-09-09
Anticipated expiration: 2019-10-22
Also published as: JP2001119649A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a computer system capable of processing moving-image data, and more particularly to a method and apparatus for summarizing moving-image data in an image processing apparatus.

[0002]

2. Description of the Related Art
Terrestrial digital broadcasting is about to begin in 2003, following BS digital television broadcasting in 2000. Once it does, viewing styles that differ from those of current analog broadcasting become possible.

[0003] With conventional analog broadcasting, a user can only watch a program live in real time, or record it on a video deck set to a specific time and then cue playback to the start of the recording.

[0004] The video that is the subject of the present invention is an image that changes over time. The term is synonymous with moving-image data, video data, and the like.

[0005] By the time digital broadcasting becomes widespread, however, low-cost, large-capacity hard disk drives (HDDs) will be built into set-top boxes (STBs), the receivers. Users will then be able to record many hours of video according to their preferences. A user's total viewing time hardly changes from era to era, so it becomes even more important to search video efficiently and present it in summarized form.

[0006] Here, set-top box (STB) is a general term for a device that connects to a television to provide additional functions, such as a cable TV (CATV) controller or an online karaoke terminal. The name comes from the fact that such devices are usually placed on top of the TV.

[0007] For example, when a user starts watching a sports broadcast partway through, one conceivable presentation summarizes the play up to that point and then catches up with the live video.

[0008] In conventional summarization methods, however, the summary is determined by the broadcasting station that distributes the content. This makes it difficult to generate a summary that follows an individual's preferences.

[0009] Here, summarization means generating, from a target video, a video compressed along the time axis. Sequences of no interest to the user are deleted from a long target video, and only the sequences of interest are kept.

[0010] Video summarization requires content analysis, so it has needed a high-performance personal computer (PC). In the digital broadcasting about to begin, however, data broadcasts can be delivered in the gaps of the video stream. A distributor can therefore send multiple prepared data items by data broadcast, and users can freely combine them to generate summaries suited to each individual.

[0011] Summarization using data processed by an STB with image-processing capability is also possible.

[0012] One conventional video summarization technique is the story board. A story board lines up the first frame of each scene change, where a scene change is a point in time at which the pixel values of consecutive frames differ greatly.

[0013] A story board by itself, however, does not convey the content of the scenes, so it cannot serve as a summary.

[0014] The Scene Transition Graph [1] computes the similarity between story-board images and aggregates a series of scenes into a relation graph (M. Yeung, B.L. Yeo, et al., "Extracting story units from long programs for video browsing and navigation," IEEE ICMCS, pp. 296-305, 1996). It can convey the structure of a video, but it is not a summary.

[0015] Among summarization techniques, Video Skimming [2] demonstrates its effectiveness using not only video but also audio and closed captions (M. Smith and T. Kanade, "Video skimming and characterization through the combination of image and language understanding techniques," IEEE CVPR, pp. 775-781, 1997). It has no learning function, however, to reflect personal preferences.

[0016] In the United States, STBs have begun to spread, and models with a hard disk drive (HDD) capable of long recording have been announced. MbTV, TiVo, and others are the leading makers. The Bayes method is used as the algorithm for accumulating video that matches personal preferences ([3] "MbTV, Preference Determination Engine").

[0017] These methods, however, accumulate video in units of program titles. They do not summarize content as the present invention, described later, does.

[0018] With the present invention, a user can draw on multiple data items sent by a distributor, or on data processed by the user's own STB. The user can then weight the data that emphasize the scenes he or she wants to see, according to personal preference.

[0019] Data representing video content come in two types. Semantic data directly express the meaning of a scene. Numerical data express the strength of a signal in the video.

[0020] Semantic data describe the content of each scene in the video in words. Numerical data express the content of each scene in numbers.

[0021] Here, a scene generally means a frame or a sequence of frames to which a semantic interpretation can be given.

[0022] With semantic data, the relationship between the user's input and the result is clear. With numerical data, it is hard for the user to see how the data take effect.

[0023] Semantic data require interpreting the video scenes. Summarization that relies solely on semantic data therefore requires fine-grained annotation.

[0024] Semantic data are normally entered by hand, so attaching fine-grained, detailed annotations to video takes time and money. Associating multiple semantic-data parameters with one another also requires precise definition statements.

[0025] Not every video comes with fine-grained semantic data. When semantic data are absent, the summary must be made from numerical data alone.

[0026] Which kinds of numerical data are effective can be weighted on the basis of the scenes adopted in the past.

[0027] With the present invention, the parameter potentials of the numerical data change according to whether each scene is adopted or skipped. Each user's video importance can thus be set individually, and the resulting summary video has a length close to the expected summary time.

[0028] The present invention extracts from a long video only the portions containing scenes the user likes, so the user can enjoy the video of interest in a short viewing time.

[0029] The data attached to a video are small compared with the video itself, small enough to be sent by data broadcasting. Summarization can therefore run on an inexpensive STB.

[0030] If the data attached to a video are insufficient, the STB must process the video itself. Once such processed data exist, however, summarization can be performed with the same configuration.

[0031]

PROBLEMS TO BE SOLVED BY THE INVENTION
One object of the present invention is to let each user change the importance used to summarize a long video into the short digest that user wants.

[0032]

MEANS FOR SOLVING THE PROBLEMS
The present invention is an apparatus for creating a summary from a video. It comprises:
- means for extracting numerical data from a target video;
- means for determining the importance of each scene based on the extracted numerical data; and
- means for creating a summary from the portions whose importance is equal to or greater than a predetermined threshold.
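The three means in [0032] can be pictured as a small pipeline. The sketch below covers only the last one. It assumes that importance has already been computed for each frame as a value in [0, 1]. The function name `summary_segments` is hypothetical and does not come from the patent. The function groups consecutive frames whose importance is at or above the threshold into kept segments.

```python
from typing import List, Optional, Tuple


def summary_segments(importance: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return half-open frame ranges [start, end) whose importance >= threshold."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for idx, value in enumerate(importance):
        if value >= threshold and start is None:
            start = idx                      # a kept run begins at this frame
        elif value < threshold and start is not None:
            segments.append((start, idx))    # the kept run ended on the previous frame
            start = None
    if start is not None:                    # a run that lasts to the final frame
        segments.append((start, len(importance)))
    return segments


print(summary_segments([0.1, 0.7, 0.9, 0.2, 0.6, 0.6], 0.5))  # [(1, 3), (4, 6)]
```

Half-open ranges make each segment's length simply `end - start`, which is convenient when totalling summary time.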

[0033]

DESCRIPTION OF THE PREFERRED EMBODIMENTS
As an embodiment of the present invention, summarization of a live soccer broadcast is described below.

[0034] At present, broadcasting stations do not attach information such as semantic data to the video. The image processing needed for summarization is therefore performed on the user's side.

[0035] While the user is away, an STB implementing the present invention records (stores) the soccer video. At the same time, it executes predefined image processing (tracking, motion classification, camera-motion detection, and player position estimation) and extracts image features (parameters).

[0036] At the same time, the STB determines the video importance of the broadcast. It does so using the user's personal parameter potentials and two-dimensional potential graphs created earlier.

[0037] If an uninteresting part of the summary appears during viewing, the user fast-forwards to skip that scene.

[0038] When the user has finished viewing, the STB learns from that day's viewing behavior. The parameters of scenes whose summary was accepted are voted into (added to) the parameter potentials and the two-dimensional potential graphs. The parameters of scenes that were skipped are subtracted from them.

[0039] The broadcasting station may transmit semantic information, such as semantic data, along with the video. In that case, before the summary is generated, the user may enter keywords for what he or she wants to see (player names, types of play, and so on) and add them to the video importance. This step is optional.

[0040] Initially, the user has no parameter potentials for determining personal importance. A sample must then be obtained in one of three ways:
- use the video importance sent by the video distributor as a sample;
- have the user enter keywords for the scenes he or she wants to see, and generate a summary from them; or
- generate a personal summary as a sample by fast-forwarding through the video.

[0041] In the sample, importance is given as a one-dimensional graph over time, with real values from 0 to 1. Time periods of high importance carry annotations describing the scene (FIG. 1).

[0042] Next, if semantic data are available, the user selects preferred annotations from them, which changes the sample video importance. The user then watches the digest video obtained automatically from the changed importance and decides whether to adopt or skip each part. Adopting requires no action; the user fast-forwards only to skip.

[0043] When the user finishes watching the video, the weights of the numerical data corresponding to the latest video importance are calculated.

[0044] Let k_i be the value of numerical data X_i in an adopted video segment. The values of a normal distribution over X_i, with mean k_i and variance σ², are voted into the parameter potential.
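The following is a minimal Python sketch of the voting in [0044] and [0045], written under several assumptions:
- The parameter potential is stored as samples on a fixed grid of X_i values. The example uses the pan angle of [0136], sampled every 10 degrees.
- The kernel is an unnormalized Gaussian with a peak of 1, and `sigma` is treated as a standard deviation. The normalizing constant is dropped because the potential is rescaled to the range 0 to 1 afterward anyway.
- The name `vote_accept` is hypothetical.

```python
import math
from typing import List


def gaussian_kernel(grid: List[float], mean: float, sigma: float) -> List[float]:
    # Unnormalized Gaussian (value 1 at the mean); an assumption, see the lead-in.
    return [math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2)) for x in grid]


def vote_accept(potential: List[float], grid: List[float], k: float, sigma: float) -> None:
    """Add a Gaussian centred on the observed value k to the potential, in place."""
    for idx, g in enumerate(gaussian_kernel(grid, k, sigma)):
        potential[idx] += g


# Hypothetical example: pan angle sampled every 10 degrees from -90 to +90.
grid = [float(a) for a in range(-90, 91, 10)]
potential = [0.0] * len(grid)
vote_accept(potential, grid, k=0.0, sigma=15.0)   # adopted scene with pan = 0 deg (k_i)
vote_accept(potential, grid, k=30.0, sigma=15.0)  # another adopted scene, pan = 30 deg (h_i)
```

Each adopted scene deposits a bump at its observed value. Repeated votes near the same angle pile up into a peak, as in FIG. 2.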

[0045] Next, when X_i takes the value h_i in another time interval, a vote is added in the same way. The parameter potential for X_i then takes the form shown in FIG. 2.

[0046] The number of additions S in the digest video is recorded. A threshold Th is then obtained at which the sum of the parameter potential over X_i becomes 0, and the parameter potential is normalized to the range 0 to 1.
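The following sketch shows one reading of [0046] and [0048], under two stated assumptions:
- The potential is min-max normalized to [0, 1]. The text does not say how normalization is done.
- Th is the value for which the deviations p(x) − Th sum to zero over the grid. That makes Th the mean of the normalized potential.

The example shows why a sharply peaked potential gets a smaller Th than a spread-out one. The function name is hypothetical.

```python
from typing import List, Tuple


def normalize_and_threshold(potential: List[float]) -> Tuple[List[float], float]:
    """Min-max normalize to [0, 1], then pick Th so that sum(p - Th) == 0."""
    lo, hi = min(potential), max(potential)
    span = hi - lo
    if span == 0:
        normalized = [0.0] * len(potential)  # a flat potential carries no preference
    else:
        normalized = [(v - lo) / span for v in potential]
    th = sum(normalized) / len(normalized)   # the mean makes the deviations sum to zero
    return normalized, th


# A sharp potential yields a lower Th than a spread-out one (cf. [0048]).
sharp, th_sharp = normalize_and_threshold([0, 0, 0, 5, 0, 0, 0])   # th_sharp == 1/7
flat, th_flat = normalize_and_threshold([1, 3, 2, 4, 3, 2, 1])     # th_flat == 3/7
```

Because Th is the mean, only grid points well above average contribute positively later. This is the "sharpening" effect described in [0049].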

[0047] Parameter potentials are calculated in the same way for the other numerical data (X_n).

[0048] When the number of additions S is small, the parameter potential values are scattered, so the threshold Th is large. When S is large and the selected scenes are strongly related to the parameter, the parameter potential becomes sharply peaked and Th becomes small.

[0049] As described later, importance is computed from the sum of the differences between potential values and thresholds. Sharply peaked potential values are therefore emphasized.

[0050] When multiple kinds of numerical data are used, two-dimensional potential graphs are generated to relate them to one another.

[0051] Let k_i and p_j be the values of numerical data X_i and X_j in the same frame. A two-dimensional normal distribution with mean (k_i, p_j) and variance σ² is added.

[0052] A threshold Th2 is obtained so that the sum of the potential becomes 0, and the two-dimensional potential graph is normalized to the range 0 to 1. For n kinds of numerical data, n(n−1)/2 two-dimensional potential graphs can be generated.
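The pairwise graphs of [0051] and [0052] can be sketched as follows, under these assumptions:
- Each graph is a 2-D grid.
- The single variance σ² applies to both axes, giving an isotropic Gaussian that is again unnormalized.
- The feature names are hypothetical stand-ins for the soccer features of [0133].

`itertools.combinations` confirms the count n(n−1)/2.

```python
import math
from itertools import combinations
from typing import List


def vote_accept_2d(graph: List[List[float]], xs: List[float], ys: List[float],
                   k: float, p: float, sigma: float) -> None:
    """Add an isotropic 2-D Gaussian centred on (k, p) to graph[row for y][col for x]."""
    for r, y in enumerate(ys):
        for c, x in enumerate(xs):
            d2 = (x - k) ** 2 + (y - p) ** 2
            graph[r][c] += math.exp(-d2 / (2.0 * sigma ** 2))


# Hypothetical feature names; n features give n*(n-1)/2 pair graphs.
features = ["pan", "tilt", "avg_color", "region_ratio"]
pairs = list(combinations(features, 2))

xs = [float(a) for a in range(-90, 91, 30)]  # pan axis (X_i)
ys = [float(a) for a in range(-30, 31, 10)]  # tilt axis (X_j)
graph = [[0.0] * len(xs) for _ in ys]
vote_accept_2d(graph, xs, ys, k=0.0, p=-10.0, sigma=20.0)  # frame with pan 0, tilt -10
```

Using a separate σ per axis would be a natural variation, but the text gives only one.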

[0053] When the user skips a video segment, the parameter potential is changed by reinforcement learning. The aim is that such video will not be actively adopted afterward.

[0054] Here, reinforcement learning generally means a technique that advances learning by giving rewards for successes, as judged by an external system.

[0055] Let k_i be the value of numerical data X_i in a skipped video segment, and let p(k_i) be the potential at k_i. A normal distribution over X_i with mean k_i and variance σ, scaled by p(k_i)/2, is subtracted.
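The skip update of [0055] can be sketched the same way, with these assumptions:
- p(k_i) is read at the grid point nearest k_i.
- The subtracted Gaussian has a peak of 1, so the potential at k_i drops to half its previous value.
- `sigma` is treated as a standard deviation. The text calls σ a variance here, while [0044] uses σ², and the sketch uses the same `sigma` for both updates.
- The function name is hypothetical.

```python
import math
from typing import List


def vote_skip(potential: List[float], grid: List[float], k: float, sigma: float) -> None:
    """Subtract p(k)/2 times an (unnormalized, peak-1) Gaussian centred on k, in place."""
    nearest = min(range(len(grid)), key=lambda i: abs(grid[i] - k))
    scale = potential[nearest] / 2.0  # p(k)/2, read at the nearest grid point
    for i, x in enumerate(grid):
        potential[i] -= scale * math.exp(-((x - k) ** 2) / (2.0 * sigma ** 2))


grid = [float(a) for a in range(-90, 91, 10)]
# Potential left by one earlier adoption at pan = 0 deg.
potential = [math.exp(-(x ** 2) / (2 * 15.0 ** 2)) for x in grid]
vote_skip(potential, grid, k=0.0, sigma=15.0)  # the user then skips a pan = 0 deg scene
```

Scaling by p(k)/2 rather than a fixed amount means a repeatedly skipped value decays geometrically instead of going sharply negative after one skip.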

[0056] The potentials of the other numerical data (X_n), and the two-dimensional potential graphs built from pairs of numerical data, are reduced by subtraction in the same way.

[0057] Next, importance is determined using the parameter potentials and the two-dimensional potential graphs.

[0058] For each frame, two kinds of terms are summed: (potential value − Th) for each kind of numerical data, and (potential value − Th2) for each two-dimensional potential graph. The importance is then determined from this sum by a sigmoid function.

[0059] The sigmoid function uses T = c/S, where c is a constant. The confidence of the importance therefore increases with the number of additions S.
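Equation 2 is not reproduced in this text, so the sigmoid below is an assumed standard logistic form: importance = 1 / (1 + exp(−sum_p / T)) with T = c/S. It is chosen only because it matches [0058] and [0059]: a larger number of additions S lowers T and pushes the importance toward 0 or 1. The dictionary layout and names are hypothetical.

```python
import math
from typing import Dict, Tuple


def frame_importance(p1: Dict[str, float], th1: Dict[str, float],
                     p2: Dict[Tuple[str, str], float], th2: Dict[Tuple[str, str], float],
                     num_additions: int, c: float = 1.0) -> float:
    """Assumed logistic of the summed (potential - threshold) terms, T = c / S."""
    sum_p = sum(p1[name] - th1[name] for name in p1)   # 1-D terms
    sum_p += sum(p2[pair] - th2[pair] for pair in p2)  # 2-D pair terms
    temperature = c / max(num_additions, 1)            # guard against S = 0
    z = sum_p / temperature
    if z >= 0:                                         # numerically stable logistic
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


p1 = {"pan": 0.9, "tilt": 0.4}
th1 = {"pan": 0.2, "tilt": 0.3}
p2 = {("pan", "tilt"): 0.6}
th2 = {("pan", "tilt"): 0.25}
low = frame_importance(p1, th1, p2, th2, num_additions=1)    # about 0.76
high = frame_importance(p1, th1, p2, th2, num_additions=10)  # close to 1
```

The two-branch form avoids `math.exp` overflow when S is large and the sum is strongly negative.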

[0060] A user who has parameter potentials can therefore, from the next session on, set personal video importance from the potential graphs instead of the sample importance. Selecting "Adopt" or "Skip" while watching further changes the potential graphs, so the user's preferences are reflected more strongly.

[0061] The user can also modify the importance obtained from the potential graphs by using semantic data.

[0062] When the user enters a keyword that matches a semantic-data annotation, the importance of the corresponding time interval is set to its maximum. Summarization based on the importance is then performed.
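The keyword override in [0062] can be sketched under three assumptions:
- Annotations are stored as hypothetical (start_frame, end_frame, text) tuples.
- "Maximum importance" means 1.0, the top of the 0 to 1 range from [0041].
- Matching is a simple case-insensitive substring test. The patent does not say how keywords are matched.

```python
from typing import List, Tuple


def apply_keyword(importance: List[float], annotations: List[Tuple[int, int, str]],
                  keyword: str) -> List[float]:
    """Return a copy with importance set to 1.0 inside every annotated interval
    [start, end) whose text contains the keyword (case-insensitive)."""
    boosted = list(importance)
    for start, end, text in annotations:
        if keyword.lower() in text.lower():
            for f in range(max(start, 0), min(end, len(boosted))):
                boosted[f] = 1.0
    return boosted


annotations = [(2, 4, "Goal by player A"), (6, 8, "Corner kick")]  # hypothetical
print(apply_keyword([0.1] * 10, annotations, "goal"))
# [0.1, 0.1, 1.0, 1.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
```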

[0063] This keyword input is also reflected in the potential graphs.

[0064] The above outlines the processing flow of the present invention. The flow is described in more detail below with reference to the drawings.

[0065] FIG. 3 is a flowchart showing the overall flow of the present invention.

[0066] In step 110, the processing of the present invention starts.

[0067] In step 120, the importance is calculated.

[0068] In step 130, screen display and user selection processing are performed.

[0069] In step 140, the process checks whether the parameter potentials are to be updated. If they are, the process proceeds to step 170; if not, it proceeds to step 150.

[0070] In step 170, the parameter potentials are updated using the reinforcement learning method.

[0071] The parameter potentials are updated at the end of the content, or whenever the user requests an update.

[0072] In step 180, the parameter potentials to be used are selected.

[0073] If the user has no parameter potentials, default values are used (for example, values set by the content distributor). If the user has parameter potentials, they are used.

[0074] In step 150, the process checks whether all frames have been processed.

[0075] For example, processing can end when an entire piece of content has been viewed.

[0076] In step 160, the next frame is fetched and the process returns to step 120.

[0077] In step 190, the processing ends.
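The overall flow of FIG. 3 (steps 110 to 190) reduces to a loop over frames. The sketch below assumes that each step is supplied as a callback, and all names are hypothetical. The return from step 180 to step 150 is folded into the loop.

```python
from typing import Callable, Dict, List


def run_flow(frames: List[Dict],
             compute_importance: Callable[[Dict], float],        # step 120
             display_and_select: Callable[[Dict, float], None],  # step 130
             update_requested: Callable[[int], bool],            # step 140
             update_potentials: Callable[[], None]) -> None:     # steps 170 and 180
    """Walk every frame once, following the step order of FIG. 3."""
    for index, frame in enumerate(frames):       # 110 start; 160 fetch the next frame
        importance = compute_importance(frame)   # 120
        display_and_select(frame, importance)    # 130
        if update_requested(index):              # 140
            update_potentials()                  # 170 update, then 180 select potentials
    # 150: all frames processed -> 190: end


trace: List = []
run_flow([{"id": 0}, {"id": 1}],
         compute_importance=lambda f: 0.5,
         display_and_select=lambda f, imp: trace.append(("show", f["id"])),
         update_requested=lambda i: i == 1,      # e.g. update at the end of the content
         update_potentials=lambda: trace.append("update"))
```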

[0078] FIG. 4 is a flowchart that shows the importance calculation (step 120) of FIG. 3 in more detail.

[0079] In step 210, the importance calculation starts.

[0080] In step 220, the value p(X_i) − Th_i is calculated for each item of numerical data X_i (i = 1, 2, ..., n) in the current frame.

[0081] In step 230, the value p(X_i, X_j) − Th2_ij is calculated for every combination (X_i, X_j) of numerical data in the current frame (1 ≤ i, j ≤ n).

[0082] In step 240, the sum (sum_p) of all the values obtained in steps 220 and 230 is computed.

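[0083]

(Equation 1)
$$\mathrm{sum\_p} = \sum_{i=1}^{n} \bigl(p(X_i) - Th_i\bigr) + \sum_{1 \le i < j \le n} \bigl(p(X_i, X_j) - Th2_{ij}\bigr)$$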

[0084] In step 250, the importance is obtained with a sigmoid function.

[0085] Here, the sigmoid function is given by the following equation.

[0086]

(Equation 2)

[0087] FIG. 5 is a flowchart that shows the image display and user selection/voting processing (step 130) of FIG. 3 in more detail.

[0088] In step 310, the image display and user selection/voting processing (step 130) starts.

[0089] In step 320, the process determines whether to summarize. If so, it proceeds to step 330; if not, it proceeds to step 340.

[0090] In step 330, if the importance is less than a predetermined threshold Th_Imp, the scene is considered unimportant. It is skipped, and the process proceeds to step 370.

[0091] If the importance is equal to or greater than the threshold Th_Imp, the scene is not skipped, and the process proceeds to step 340.

[0092] In step 340, the screen of the current frame is displayed.

[0093] In step 350, the user viewing the current screen makes a choice:
- wants to see this scene (Accept);
- does not want to see it (Reject); or
- neither (Neutral).

[0094] If the choice is Accept or Reject, the process proceeds to step 360. If it is Neutral, the process proceeds to step 370.

[0095] In step 360, the numerical data X_i (i = 1, 2, ..., n) of the current frame are voted and stored according to the Accept or Reject result.
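Steps 320 to 360 of FIG. 5 for a single frame can be sketched as below; all names are hypothetical. The user's choice is modeled as "accept", "reject", or None for Neutral. Votes are stored as (label, feature values) pairs for the later update of FIG. 6.

```python
from typing import Dict, List, Optional, Tuple

Vote = Tuple[str, Dict[str, float]]  # ("accept" | "reject", numerical data of the frame)


def display_and_vote(features: Dict[str, float], importance: float, summarize: bool,
                     th_imp: float, user_choice: Optional[str],
                     votes: List[Vote]) -> bool:
    """Return True if the frame is shown; store a vote for Accept or Reject (FIG. 5)."""
    if summarize and importance < th_imp:    # 320 -> 330: unimportant, skip to 370
        return False
    # 340: the current frame would be displayed here.
    if user_choice in ("accept", "reject"):  # 350 -> 360: vote and store
        votes.append((user_choice, dict(features)))
    return True                              # Neutral goes straight to 370
```

A skipped frame is never shown, so it cannot receive a vote. This mirrors the flowchart, where step 330 bypasses steps 340 to 360.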

[0096] FIG. 6 is a flowchart that shows the parameter potential update by reinforcement learning (step 170) of FIG. 3 in more detail.

[0097] In step 410, the parameter potential update by reinforcement learning starts.

[0098] In step 420, the voting data (numerical data X_i, i = 1, 2, ..., n) are retrieved.

[0099] In step 430, the process determines whether the retrieved voting data are Accept or Reject. For Accept, it proceeds to step 440; for Reject, it proceeds to step 450.

[0100] In step 440, let k_i be the value of X_i. The values of a normal distribution with mean k_i and variance σ² are added to the parameter potential of X_i. The same processing is repeated for i = 1 to n.

[0101] At the same time, let (k_i, p_j) be the values of (X_i, X_j). A two-dimensional normal distribution with mean (k_i, p_j) and variance σ² is added to the potential graph of (X_i, X_j). The same processing is repeated for every pair (i, j) (i = 1 to n, j = 1 to m).

[0102] In step 450, let k_i be the value of X_i, and let p(k_i) be the potential at k_i. A normal distribution with mean k_i and variance σ, scaled by p(k_i)/2, is subtracted from the parameter potential of X_i. The same processing is repeated for i = 1 to n.

[0103] At the same time, let (k_i, p_j) be the values of (X_i, X_j), and let p(k_i, p_j) be the potential at (k_i, p_j). A two-dimensional normal distribution with mean (k_i, p_j) and variance σ, scaled by p(k_i, p_j)/2, is subtracted from the potential graph of (X_i, X_j). The same processing is repeated for all (i, j) pairs.

[0104] In step 460, a threshold Th1_i is obtained for each resulting parameter potential so that its sum becomes 0, and each potential is normalized to the range 0 to 1.

[0105] Similarly, a threshold Th2_ij is obtained for each two-dimensional potential graph, and each graph is normalized to the range 0 to 1.

[0106] In step 470, the parameter potential update by reinforcement learning ends.
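The following is a combined sketch of FIG. 6 for the one-dimensional potentials only; the two-dimensional graphs follow the same pattern. It replays the stored votes:
- For Accept (step 440), it adds a Gaussian.
- For Reject (step 450), it subtracts p(k_i)/2 times a Gaussian.
- Finally (step 460), it renormalizes each potential and computes the zero-sum threshold.

The same assumptions as before apply: grid storage, peak-1 Gaussian kernels, nearest-grid lookup of p(k_i), and min-max normalization. All names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def update_potentials(potentials: Dict[str, List[float]], grids: Dict[str, List[float]],
                      votes: List[Tuple[str, Dict[str, float]]], sigma: float
                      ) -> Dict[str, float]:
    """Apply stored Accept/Reject votes (steps 420-450), then renormalize each
    potential to [0, 1] and return its zero-sum threshold Th1_i (step 460)."""
    for label, features in votes:                       # 420: take out each vote
        for name, k in features.items():
            pot, grid = potentials[name], grids[name]
            kernel = [math.exp(-((x - k) ** 2) / (2.0 * sigma ** 2)) for x in grid]
            if label == "accept":                       # 430 -> 440: add
                scale = 1.0
            else:                                       # 430 -> 450: subtract p(k)/2 times
                nearest = min(range(len(grid)), key=lambda i: abs(grid[i] - k))
                scale = -pot[nearest] / 2.0
            for i, g in enumerate(kernel):
                pot[i] += scale * g
    thresholds: Dict[str, float] = {}
    for name, pot in potentials.items():                # 460: normalize and threshold
        lo, hi = min(pot), max(pot)
        span = hi - lo
        pot[:] = [(v - lo) / span for v in pot] if span else [0.0] * len(pot)
        thresholds[name] = sum(pot) / len(pot)
    return thresholds


grid = [float(a) for a in range(-90, 91, 10)]
potentials = {"pan": [0.0] * len(grid)}
votes = [("accept", {"pan": 0.0}), ("accept", {"pan": 30.0}), ("reject", {"pan": 30.0})]
th = update_potentials(potentials, {"pan": grid}, votes, sigma=15.0)
```

After these three votes, the peak stays at 0 degrees, because the rejection at 30 degrees halves the bump there.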

[0107] FIG. 7 shows, in block form, the main configuration of a PVR (Personal Video Recorder) as an example of hardware suitable for implementing the present invention.

[0108] Block 510 is a video storage unit. It uses a large-capacity storage device that can hold many hours of video data, such as a hard disk drive or a video tape recorder.

[0109] Block 520 is software to which the present invention is applied.

[0110] Block 530 is a video buffer that temporarily stores the video summarized on the basis of importance. It uses a storage device such as a hard disk drive.

[0111] Video data 505 are input to the video storage unit 510 of the PVR and stored.

[0112] The video storage unit 510 sequentially stores the input video data 505.

[0113] The video data stored in the video storage unit 510 are processed by the software 520 to which the present invention is applied, and a summary is created according to importance.

[0114] The summarized video data are temporarily stored in the video buffer 530. They are output as video output 535 when the user wishes.

[0115] Alternatively, the video data 515 stored in the video storage unit can be output directly as video output, without being summarized.

[0116] FIGS. 8 to 10 show an example in which the present invention described above is applied to actual soccer video.

[0117] A window 800 such as the one shown in FIG. 8 is one possible user interface for the present invention.

[0118] The window 800 includes a video display area 850, a learning area 810, a summarization area 820, and other elements.

[0119] The window has three playback buttons:
- "Start" button 804: clicking it starts display of the target video in the video display area 850.
- "Stop" button 806: clicking it stops the displayed video.
- "Pause" button 808: clicking it pauses the displayed video.

[0120] First, learning proceeds as follows.

[0121] While watching the screen, the user clicks the "Accept" button 812 if the image is one to be emphasized, and the "Reject" button 814 if it is not.

[0122] Clicking the "Update" button 816 updates the parameter potentials so that they reflect the user's preferences.

[0123] The user cannot see the parameter potentials themselves. The user can, however, check whether his or her preferences are properly reflected by looking at the importance value 818 of each screen.

[0124] Next, summarization proceeds as follows.

[0125] Setting the radio button to "On" 822 causes importance to be calculated from the current frame through the last frame. The target video is then summarized on the basis of the calculated importance.

[0126] The display area 830 shows the summarization result: white portions represent scenes to be skipped, and black portions represent scenes to be included in the summary.

[0127] That is, when the summary is displayed, only the scenes in the black portions are shown.

[0128] Furthermore, even while the summary is being displayed, the user can click the "Accept" button 812 or the "Reject" button 814 as appropriate to reflect his or her preferences further.

[0129] The summary area 820 can also display a threshold 826 and a summary rate (%) 828.

[0130] The threshold 826 indicates the minimum importance a scene must have to be adopted into the summary.

[0131] The summary rate (%) is the ratio of the display time of the summary video to the display time of the target video, expressed as a percentage.
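As a minimal sketch, assuming the summary is represented as a list of hypothetical (start, end) time spans in seconds and the rate is taken as summary time over target-video time, the summary rate 828 could be computed like this:

```python
def summary_rate(segments, total_duration):
    """Summary rate (%) = total length of kept spans / length of the target video * 100."""
    if total_duration <= 0:
        raise ValueError("total_duration must be positive")
    # Each segment is a (start, end) pair in the same time unit as total_duration.
    kept = sum(end - start for start, end in segments)
    return 100.0 * kept / total_duration
```

For a 100-second video summarized to the spans (0, 10), (30, 50), and (70, 85), the kept time is 45 seconds, giving a summary rate of 45%.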

[0132] FIG. 9 shows examples of the numerical data used in the soccer application example shown in FIG. 8.

[0133] The numerical data used to represent a soccer game such as the one in FIG. 8 include camera motion (camera direction), region segmentation results, average color information, and the correlations between these parameters.

[0134] For camera motion, two values are used: pan (the horizontal direction, as an angle) and tilt (the vertical direction, as an angle).

[0135] FIG. 9 shows a parameter potential 910 for the pan (horizontal direction) of the camera motion.

[0136] The horizontal axis 912 represents the pan direction (angle), from -90 degrees at the left end to +90 degrees at the right end.

[0137] The vertical axis 914 represents the value of the parameter potential at each pan angle.

[0138] In the parameter potential 910, the dash-dot line 920 represents the parameter potential before voting, the dashed line 918 represents the user's voting data, and the solid line 916 represents the parameter potential after voting.
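The relationship between the before-voting curve, the voting data, and the after-voting curve in FIG. 9 can be illustrated with a simple additive rule. This rule is an assumption for illustration only: the patent's own update is the reinforcement-learning process of FIG. 6, which this section does not detail. In the sketch, each accepted frame casts a +1 vote and each rejected frame a -1 vote into the bin holding its parameter value, and the new potential is the old potential plus a hypothetical learning rate times the votes. All function names are hypothetical.

```python
def bin_index(x, lo, hi, bins):
    # Clamp x into [lo, hi] and map it to one of `bins` equal-width bins.
    x = min(max(x, lo), hi)
    return min(int((x - lo) / (hi - lo) * bins), bins - 1)


def vote(lo, hi, bins, accepted, rejected):
    """Voting histogram: +1 per accepted frame's value, -1 per rejected frame's value."""
    votes = [0.0] * bins
    for x in accepted:
        votes[bin_index(x, lo, hi, bins)] += 1.0
    for x in rejected:
        votes[bin_index(x, lo, hi, bins)] -= 1.0
    return votes


def update_potential(before, votes, rate=0.5):
    """Assumed additive update: after = before + rate * votes, bin by bin."""
    return [b + rate * v for b, v in zip(before, votes)]
```

With pan angles of 5, 20, and 25 degrees accepted and -80 degrees rejected (six 30-degree bins), the votes are `[-1.0, 0.0, 0.0, 3.0, 0.0, 0.0]`; starting from a flat potential of 1.0 and a rate of 0.5, the updated potential is `[0.5, 1.0, 1.0, 2.5, 1.0, 1.0]`.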

[0139] As region segmentation results, three values are used: the area (S) of the largest region, and the color of the largest region expressed as the two values (r-g, b-y).

[0140] FIG. 9 shows a parameter potential 930 for the area (S) of the largest region obtained by region segmentation.

[0141] The horizontal axis 932 represents the area (S) of the largest region; the area is 0 at the left end and increases toward the right.

[0142] The vertical axis 934 represents the value of the parameter potential for each area (S) of the largest region.

[0143] In the parameter potential 930, the dash-dot line 940 represents the parameter potential before voting, the dashed line 938 represents the user's voting data, and the solid line 936 represents the parameter potential after voting.

[0144] As average color information, three values are used: R (red), G (green), and B (blue).

[0145] FIG. 9 shows a parameter potential 950 for the average color information G (green).

[0146] The horizontal axis 952 represents the average color information G (green); G is 0 at the left end and increases toward the right.

[0147] The vertical axis 954 represents the value of the parameter potential for each value of the average color information G (green).

[0148] In the parameter potential 950, the dash-dot line 960 represents the parameter potential before voting, the dashed line 958 represents the user's voting data, and the solid line 956 represents the parameter potential after voting.

[0149] In addition, 28 correlation values among the above eight parameters in total are also used.

[0150] Accordingly, in the soccer example shown in FIG. 8, a total of 36 kinds of numerical data (8 parameters plus 28 correlation values) are used to determine the importance.
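The total of 36 is consistent with 8 base parameters (pan, tilt, area, r-g, b-y, R, G, B) plus one correlation term per unordered pair, since C(8, 2) = 28. Reading the 28 values as pairwise terms is an inference from the numbers, not something this section states. The sketch below further assumes that each per-frame correlation term is the product of the two parameters' z-scores, whose average over all frames equals the Pearson correlation. The labels in `PARAMS` and the function `expand_features` are hypothetical.

```python
import statistics
from itertools import combinations

# Hypothetical labels for the eight base parameters.
PARAMS = ["pan", "tilt", "area", "r_g", "b_y", "R", "G", "B"]


def expand_features(frames):
    """Turn per-frame dicts of the 8 base parameters into 36-entry feature dicts."""
    means = {p: statistics.fmean([f[p] for f in frames]) for p in PARAMS}
    # Population standard deviation; a constant parameter falls back to 1.0 to avoid /0.
    stds = {p: statistics.pstdev([f[p] for f in frames]) or 1.0 for p in PARAMS}
    rows = []
    for f in frames:
        z = {p: (f[p] - means[p]) / stds[p] for p in PARAMS}
        row = {p: f[p] for p in PARAMS}          # 8 raw parameter values
        for a, b in combinations(PARAMS, 2):     # 28 unordered pairs
            row[f"{a}*{b}"] = z[a] * z[b]        # assumed per-frame correlation term
        rows.append(row)
    return rows
```

The design keeps the raw values and the pairwise terms in one flat record per frame, so each of the 36 entries can be voted on like any other parameter.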

[0151] However, the kinds of numerical data used to calculate the importance differ depending on the target video, and changing the kinds of numerical data also changes the accuracy of the summary.

[0152] For the numerical data of camera motion (camera direction), region segmentation results, and average color information, either the original data can be used as is, or feature data obtained by applying principal component analysis to the data can be used.

[0153] However, using the original data as is ignores the correlations between the data, so contributing and non-contributing data are treated on the same scale, and highly correlated data can have the same effect as casting a double vote.

[0154] Therefore, principal component analysis is applied to the original data, and voting is performed using feature quantities with smaller mutual correlation in place of the original data.

[0155] Here, principal component analysis represents n-dimensional data with a small number of mutually independent indices.

[0156] In principal component analysis, a group of n parameters is represented by m (m < n) feature quantities arranged in order of least correlation.
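A minimal principal component analysis sketch with NumPy, assuming the frames are stacked as rows of an array with one column per parameter. It centers each column, takes the singular value decomposition, and keeps the first m principal axes. Note that standard PCA orders components by decreasing variance, and the resulting scores are mutually uncorrelated (not, in general, statistically independent). The function name `pca` and its return values are illustrative choices, not the patent's implementation.

```python
import numpy as np


def pca(X, m):
    """Project rows of X (frames x n parameters) onto the first m principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each parameter column
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:m]
    scores = Xc @ components.T                   # m decorrelated features per frame
    explained_var = S[:m] ** 2 / (len(X) - 1)    # sample variance of each score column
    return scores, components, explained_var
```

Feeding two nearly collinear parameters plus an unrelated one into `pca(X, 2)` yields two score columns with (numerically) zero covariance, so the redundant pair no longer casts what amounts to a double vote.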

[0157] Each set of data shown in FIG. 9 illustrates voting when the original data are used.

[0158] FIG. 10 shows the importance of each scene of the video, obtained from the numerical data described with reference to FIG. 9.

[0159] In FIG. 10, the horizontal axis 1002 represents time and the vertical axis 1004 represents importance.

[0160] The importance threshold suitable for creating the desired summary is indicated by reference numeral 1006.
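This section does not say how the threshold 1006 is chosen. One hypothetical way, assuming every frame has the same duration, is to derive the threshold from a desired summary rate by ranking the per-frame importance values. If several frames tie at the threshold value, more frames than requested may be kept. The function name is hypothetical.

```python
import math


def threshold_for_rate(importance, target_rate):
    """Lowest threshold that keeps roughly target_rate (0 to 1) of equal-length frames."""
    if not importance:
        raise ValueError("empty importance sequence")
    ranked = sorted(importance, reverse=True)
    k = max(1, math.ceil(target_rate * len(ranked)))   # number of frames to keep
    return ranked[min(k, len(ranked)) - 1]
```

With importance values `[0.1, 0.9, 0.4, 0.7, 0.2, 0.8, 0.3, 0.6]` and a target rate of 0.5, the threshold is 0.6 and four of the eight frames are kept.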

[0161] That is, the portions of the target soccer video whose importance is equal to or greater than the threshold are included in the summary (scenes A, B, C, and D), while the portions whose importance is below the threshold (the hatched portions) are excluded from the summary and cut.
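Turning a per-frame importance curve into the kept scenes (A, B, C, and D in FIG. 10) amounts to collecting the maximal runs of frames whose importance is at or above the threshold. A sketch, using a hypothetical function name and frame indices in place of time:

```python
def summary_segments(importance, threshold):
    """Return (start, end) frame-index pairs, end exclusive, where importance >= threshold."""
    segments = []
    start = None
    for i, value in enumerate(importance):
        if value >= threshold and start is None:
            start = i                        # a kept run begins
        elif value < threshold and start is not None:
            segments.append((start, i))      # the run ends just before frame i
            start = None
    if start is not None:
        segments.append((start, len(importance)))  # run continues to the last frame
    return segments
```

For example, `summary_segments([0.2, 0.7, 0.8, 0.1, 0.6, 0.3, 0.9, 0.9, 0.2, 0.5], 0.5)` returns four runs, `[(1, 3), (4, 5), (6, 8), (9, 10)]`; every frame outside them corresponds to a cut portion.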

[0162] As a result, in the example shown in FIG. 10, a summary is created whose duration is reduced to about half that of the original video.

[0163]

[Effects] The configuration of the present invention makes it possible to create a video summary that matches the user's preferences.

[0164] It also becomes possible to reflect the user's preferences in the summary further while the user is watching the video.

[0165] Other embodiments are summarized below.

[0166]
(1) A method of creating a summary from a video, comprising the steps of: extracting numerical data from a target video; determining the importance of each scene based on the extracted numerical data; and creating a summary from the portions whose importance is equal to or greater than a predetermined threshold.
(2) A method of creating a summary from a video, comprising the steps of: extracting numerical data from a target video; determining the importance of each scene based on the semantic data and the numerical data if the target video includes semantic data, and based on the numerical data if the target video does not include semantic data; and creating a summary from the portions whose importance is equal to or greater than a predetermined threshold.
(3) A method of creating a summary from a video, comprising the steps of: extracting a plurality of types of numerical data from a target video; determining the importance of each scene based on the extracted plurality of types of numerical data; and creating a summary from the portions whose importance is equal to or greater than a predetermined threshold.
(4) A method of creating a summary from a video, comprising the steps of: extracting numerical data from a target video; receiving data representing the user's preferences while the target video is being displayed; determining the importance of each scene based on the extracted numerical data and the data representing the user's preferences; and creating a summary from the portions whose importance is equal to or greater than a predetermined threshold.
(5) A method of creating a summary from a video, comprising the steps of: extracting numerical data from a target video; determining the importance of each scene based on the extracted numerical data; creating a summary from the portions whose importance is equal to or greater than a predetermined threshold; displaying the created summary; and changing the importance by user input while the summary is being displayed.
(6) An apparatus for creating a summary from a video, comprising: means for extracting numerical data from a target video; means for determining the importance of each scene based on the extracted numerical data; and means for creating a summary from the portions whose importance is equal to or greater than a predetermined threshold.
(7) An apparatus for creating a summary from a video, comprising: means for extracting numerical data from a target video; means for determining the importance of each scene based on the semantic data and the numerical data when the target video includes semantic data, and based on the numerical data when the target video does not include semantic data; and means for creating a summary from the portions whose importance is equal to or greater than a predetermined threshold.
(8) An apparatus for creating a summary from a video, comprising: means for extracting a plurality of types of numerical data from a target video; means for determining the importance of each scene based on the extracted plurality of types of numerical data; and means for creating a summary from the portions whose importance is equal to or greater than a predetermined threshold.
(9) An apparatus for creating a summary from a video, comprising: means for extracting numerical data from a target video; means for receiving data representing the user's preferences while the target video is being displayed; means for determining the importance of each scene based on the extracted numerical data and the data representing the user's preferences; and means for creating a summary from the portions whose importance is equal to or greater than a predetermined threshold.
(10) An apparatus for creating a summary from a video, comprising: means for extracting numerical data from a target video; means for determining the importance of each scene based on the extracted numerical data; means for creating a summary from the portions whose importance is equal to or greater than a predetermined threshold; means for displaying the created summary; and means for changing the importance by user input while the summary is being displayed.
(11) A system for creating a summary from a video, the system including a processor, a memory, a storage device for storing an input video, and a video summarizing device, wherein the video summarizing device includes: means for extracting numerical data from a target video; means for determining the importance of each scene based on the extracted numerical data; and means for creating a summary from the portions whose importance is equal to or greater than a predetermined threshold.
(12) A recording medium storing a program for creating a summary from a video, the program comprising the steps of: extracting numerical data from a target video; determining the importance of each scene based on the extracted numerical data; and creating a summary from the portions whose importance is equal to or greater than a predetermined threshold.

[Brief description of the drawings]

FIG. 1 is a diagram showing the relationship between importance and scenes in a sample video.

FIG. 2 is a diagram showing the voting of effective parameters into a parameter potential.

FIG. 3 is a flowchart showing the overall processing of the present invention.

FIG. 4 is a flowchart showing the importance calculation processing of the present invention.

FIG. 5 is a flowchart showing the screen display and user selection processing of the present invention.

FIG. 6 is a flowchart showing the parameter potential update processing by reinforcement learning according to the present invention.

FIG. 7 is a diagram showing the configuration of a PVR to which the present invention is applied.

FIG. 8 is a diagram showing an example of applying the present invention to soccer video.

FIG. 9 is a diagram showing examples of the numerical data used in the present invention.

FIG. 10 is a diagram showing the relationship between scenes and the summary when the present invention is applied.

[Explanation of symbols]

500 PVR
505 Video input
510 Video storage unit
520 Video summarization unit
530 Video buffer
535 Video output

Continued from the front page
(72) Inventor: Alberto Tomita, c/o IBM Japan, Ltd., Tokyo Research Laboratory, 1623-14 Shimotsuruma, Yamato-shi, Kanagawa
(72) Inventor: Ken Masumitsu, c/o IBM Japan, Ltd., Tokyo Research Laboratory, 1623-14 Shimotsuruma, Yamato-shi, Kanagawa
(56) References cited: JP-A-2000-350165 (JP, A); JP-A-2000-299829 (JP, A)
(58) Fields searched (Int. Cl.⁷, DB name): H04N 5/76 - 5/956; H04N 5/262 - 5/278

Claims

(57) [Claims]

1. A method for creating a summary from a video, comprising the steps of: extracting numerical data from a target video; if the target video includes semantic data, determining the importance of each scene based on the semantic data and the numerical data, and, if the target video does not include semantic data, determining the importance of each scene based on the numerical data; and creating a summary from the portions whose importance is equal to or greater than a predetermined threshold.

2. A method for creating a summary from a video, comprising the steps of: extracting numerical data from a target video; receiving data representing a user's preferences while the target video is being displayed; determining the importance of each scene based on the extracted numerical data and the data representing the user's preferences; and creating a summary from the portions whose importance is equal to or greater than a predetermined threshold.

3. A method for creating a summary from a video, comprising the steps of: extracting numerical data from a target video; determining the importance of each scene based on the extracted numerical data; creating a summary from the portions whose importance is equal to or greater than a predetermined threshold; displaying the created summary; and changing the importance by user input while the summary is being displayed.

4. An apparatus for creating a summary from a video, comprising: means for extracting numerical data from a target video; means for determining the importance of each scene based on the semantic data and the numerical data when the target video includes semantic data, and based on the numerical data when the target video does not include semantic data; and means for creating a summary from the portions whose importance is equal to or greater than a predetermined threshold.

5. An apparatus for creating a summary from a video, comprising: means for extracting numerical data from a target video; means for receiving data representing a user's preferences while the target video is being displayed; means for determining the importance of each scene based on the extracted numerical data and the data representing the user's preferences; and means for creating a summary from the portions whose importance is equal to or greater than a predetermined threshold.

6. An apparatus for creating a summary from a video, comprising: means for extracting numerical data from a target video; means for determining the importance of each scene based on the extracted numerical data; means for creating a summary from the portions whose importance is equal to or greater than a predetermined threshold; means for displaying the created summary; and means for changing the importance by user input while the summary is being displayed.

7. A system for creating a summary from a video, the system comprising a processor, a memory, a storage device for storing an input video, and a video summarizing device, wherein the video summarizing device comprises: means for extracting numerical data from a target video; means for determining the importance of each scene based on the semantic data and the numerical data when the target video includes semantic data, and based on the numerical data when the target video does not include semantic data; and means for creating a summary from the portions whose importance is equal to or greater than a predetermined threshold.

8. A computer-readable recording medium storing a program for creating a summary from a video, the program causing a computer to execute the steps of: extracting numerical data from a target video; receiving data representing a user's preferences while the target video is being displayed; determining the importance of each scene based on the extracted numerical data and the data representing the user's preferences; and creating a summary from the portions whose importance is equal to or greater than a predetermined threshold.