JP2003061038A

JP2003061038A - Video contents edit aid device and video contents edit aid method

Info

Publication number: JP2003061038A
Application number: JP2001249669A
Authority: JP
Inventors: Hideyoshi Tominaga; 英義富永; Akiyuki Kodate; 亮之小舘; Kentaro Dobashi; 健太郎土橋; Ryohei Ogushi; 亮平大串; Takeshi Hanamura; 剛花村
Original assignee: Waseda University; Media Glue Corp
Current assignee: Waseda University; Media Glue Corp
Priority date: 2001-08-20
Filing date: 2001-08-20
Publication date: 2003-02-28

Abstract

PROBLEM TO BE SOLVED: To allow video content to be edited freely and in a short time according to the end user's intent. SOLUTION: A video content analysis device 82 analyzes the content of an original video, and a content presentation device 83 presents specific scenes extracted from that content. When the end user, after checking the presented scenes, sends an edit command, a video composition device 89 presents an edited candidate video obtained by editing the original video in accordance with the command. When a change to the editing is received, a new edited candidate video reflecting the change is presented to the end user. When an editing completion command is received, the edited candidate video presented immediately before it is converted into the desired format and distributed as the edited video.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a video content editing support device and a video content editing support method that allow video content to be edited according to the end user's intent.

[0002]

[Problems to Be Solved by the Invention] With the recent spread of high-performance digital video cameras, even general users can easily shoot and own digital video content. Demand for video editing has grown accordingly, and some general-purpose personal computers now come with a video content editing environment (authoring tool) as a hardware and software function. However, these editing functions currently leave the user no choice but to watch the entire video content in chronological order, so editing takes longer than the recording itself. The environment therefore could not be called easy to use for general users.

[0003] Moving images shot with a digital video camera (for example, of trips, athletic meets, graduation albums, personal histories, and performances) are converted on a personal computer into compressed video content in the MPEG (Moving Picture Experts Group) format to reduce the data volume. Video content shot by general users in particular tends to be shaky and redundant, so it is desirable to cut the unnecessary portions and edit the material into short scenes. Editing, however, requires visually following and evaluating the video content itself, and the enormous effort and time this takes meant that, in practice, general users hardly ever edited their video.

[0004] In view of the above problems, an object of the present invention is to provide a video content editing support device and a video content editing support method with which video content can be edited freely and in a short time according to the end user's intent.

[0005]

[Means for Solving the Problems] The video content editing support device according to claim 1 of the present invention comprises: receiving means for receiving an original video to be edited; video content analysis means for analyzing the content of the original video; content presentation means for extracting a specific scene from the analysis result of the video content analysis means and presenting it; edit command receiving means for receiving an edit command based on the specific scene; edited candidate video presentation means for presenting an edited candidate video obtained by editing the original video in accordance with the edit command; edit processing change receiving means for, upon receiving a change to the editing, causing the edited candidate video presentation means to present a new edited candidate video produced by the changed editing; and edited video distribution means for, upon receiving an editing completion command, converting the edited candidate video presented immediately before it into a desired format and distributing it as the edited video.

[0006] The video content editing support method according to claim 4 of the present invention is characterized by: receiving an original video to be edited; presenting a specific scene extracted on the basis of an analysis of the original video; upon receiving an edit command based on the specific scene, presenting an edited candidate video obtained by editing the original video in accordance with the edit command; upon receiving a change to the editing, presenting a new edited candidate video produced by the changed editing; and, upon receiving an editing completion command, converting the edited candidate video presented immediately before it into a desired format and distributing it as the edited video.

[0007] In this case, the content of the acquired original video is analyzed, and a specific scene extracted from it is presented to the end user. While checking the presented scene, the end user sends an edit command specifying what editing should be applied to the original video. An edited candidate video obtained by editing the original video in accordance with that command is then presented to the end user. When a change to the editing is received from the end user, a new edited candidate video produced by the changed editing is presented to the end user, and when an editing completion command is received, the edited candidate video presented immediately before it is converted into a desired format and distributed as the edited video.

[0008] Thus, to edit the original video, the end user does not need to watch the original video itself. The end user only has to send the commands needed for editing as appropriate, while viewing the specific scenes obtained by analyzing the original video and the edited candidate videos produced by the editing. This removes the effort of visually following and evaluating the video content (original video) itself, so video content can be edited freely and in a short time according to the end user's intent.

[0009] In the above configuration, the video content editing support device according to claim 2 is characterized in that the video content analysis means divides the original video into a plurality of scenes and determines the ease of viewing of each scene from among a plurality of evaluation levels, and in that the content presentation means presents, as the specific scene, a scene whose evaluation level matches a requested condition.

[0010] In the above method, the video content editing support method according to claim 5 is characterized in that, when presenting the specific scene extracted on the basis of the analysis of the original video, the original video is divided into a plurality of scenes, the ease of viewing of each scene is determined from among a plurality of evaluation levels, and a scene whose evaluation level matches a requested condition is presented as the specific scene.

[0011] In this case, the received original video is divided into a plurality of scenes, and the ease of viewing of each scene is determined from among a plurality of evaluation levels. Scenes whose evaluation level matches the condition requested by the end user are then presented to the end user as the specific scenes. The end user can therefore check scenes of the required evaluation level before editing, without having to evaluate each scene of the original video one by one.
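
The selection described in this paragraph can be sketched as a simple filter. This is a minimal illustration, not the patented implementation: the `Scene` record, the integer level scale (higher meaning easier to view), and the "at least this level" condition are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    scene_id: int   # ID number stored alongside the scene
    level: int      # evaluation level (assumed scale: higher = easier to view)
    still: str      # reference to a representative still image (hypothetical)


def select_scenes(scenes, min_level):
    """Return the scenes whose evaluation level meets the requested condition."""
    return [s for s in scenes if s.level >= min_level]


scenes = [Scene(1, 3, "s1.jpg"), Scene(2, 1, "s2.jpg"), Scene(3, 2, "s3.jpg")]
picked = select_scenes(scenes, min_level=2)
print([s.scene_id for s in picked])
```

Only the filtered scenes would be shown to the end user, which is what lets the user skip evaluating every scene by eye.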

[0012] In the above configuration, the video content editing support device according to claim 3 is characterized in that the content presentation means presents a representative still image from within the specific scene.

[0013] In the above method, the video content editing support method according to claim 6 is characterized in that, when the specific scene is presented, a representative still image from within the specific scene is presented.

[0014] In this case, the end user only has to check the representative still images of the specific scenes that are needed, which reduces the amount of information presented as the specific scenes.

[0015]

[Embodiments of the Invention] An embodiment of the video content editing support device according to the present invention is described below with reference to the accompanying drawings.

[0016] FIG. 1 is a block diagram showing the overall configuration of the apparatus. The apparatus comprises a video content editing support system 71, which corresponds to the main unit installed in a center by a video content editing support provider or the like, and an information network 72 serving as communication means for exchanging information with end users over the Internet. Reference numeral 73 denotes a terminal such as a personal computer or a mobile phone, which is connected to an end of the information network 72. As is well known, the terminal 73 has an operation unit with a plurality of operation keys and a display unit such as a liquid crystal display or a CRT. By connecting the terminal 73 to the information network 72 as needed, the user can obtain various information from the video content editing support system 71, and conversely send various information to it, through a home page on the Internet managed by the video content editing support system 71.

[0017] Turning to the details of the support system main body (the video content editing support system 71), reference numeral 81 denotes a receiving device, corresponding to the receiving means, that receives the video content serving as the original video material in response to a request from a user. The receiving device 81 may be, for example, a video input interface that acquires the material video data (the original video) via the information network 72, or a reading device that reads the original video from a storage medium entrusted by the user (for example, a digital video tape or a CD-ROM). In either case, the receiving device 81 is configured to take in original video created in various formats and convert it into an output format that the video content editing support system 71 can use.

[0018] Reference numeral 82 denotes a video content analysis device, corresponding to the video content analysis means, that analyzes the content of the original video taken in from the receiving device 81. Reference numeral 83 denotes a content presentation device, corresponding to the content presentation means, that presents specific scenes obtained from the content analysis result to the terminal 73 through the information network 72. The video content analysis device 82 cuts the video content of an original video compressed in, for example, the MPEG2 format into individual scenes at scene transitions, which include scene changes and camera work, and stores them in a video storage device 84. It also stores the ID number corresponding to each cut scene in a video ID storage device 85, together with information such as an evaluation level expressing how easy the video is to view. Here the video storage device 84 and the video ID storage device 85 are configured as a common storage device 86, but they may instead be configured as separate storage devices. The video content stored in the video storage device 84 and the video ID storage device 85 is managed centrally by a content management device 87 provided in the video content editing support system 71. How the video content analysis device 82 analyzes the content of the original video is described in detail later. From among the specific scenes obtained by that analysis, representative still images are sent by the content presentation device 83 to the terminal 73 via the information network 72, so that the still images can be viewed side by side on the terminal 73.

[0019] Reference numeral 88 denotes an edit command receiving device that receives edit commands sent to the video content editing support system 71 via the information network 72 by input operations on the terminal 73, after the user has checked the video content from the content presentation means 83 on the terminal 73. An edit command here instructs, for example, the deletion of unnecessary scenes from the original video or the insertion of data to be added to the video content (other video, sound, subtitles, effects, and so on). When the edit command receiving device 88 receives an edit command, the content management device 87 retrieves the ID numbers of the required scenes and the corresponding original video from the storage device 86. Based on the retrieved ID numbers and original video and on the edit command from the edit command receiving means 88, a video composition device 89 generates an edited candidate video, which is sent via the information network 72 to the terminal 73 held by the user.

[0020] The edit command receiving means 88 functions not only as edit command receiving means that receives edit commands for the representative still images presented by the content presentation means 83, but also as edit processing change receiving means that receives change commands, such as additions or corrections to the editing, for an edited candidate video produced by the video composition device 89. Each time the edit command receiving means 88 receives such a change command, the video composition device 89 regenerates a new edited candidate video based on its content and sends it to the terminal 73 via the information network 72. The user checks the final edited candidate video on the output means of the terminal 73 and then sends, from the terminal 73 via the information network 72 to the video content editing support system 71, a confirmation command for completion of editing, indicating that there are no further additions or corrections. This confirmation command is passed from the edit command receiving means 88 to the content management device 87. In response, the video composition device 89, which also serves as the edited video distribution means, converts the confirmed edited candidate video into the format desired by the user and distributes it as the edited video from the information network 72 to the terminal 73.
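
The command, revise, and confirm cycle of paragraphs [0019] and [0020] can be sketched as a small session object. This is an illustrative model only: the class name `EditSession`, the dictionary-based command layout (`"delete"`, `"insert"`), and the format string are assumptions, and real video processing is replaced by lists of scene IDs.

```python
class EditSession:
    """Toy model of the edit loop: each command yields a new candidate."""

    def __init__(self, scene_ids):
        self.scene_ids = list(scene_ids)  # scenes of the original video
        self.candidate = None             # most recently presented candidate

    def apply(self, command):
        """Edit the ORIGINAL according to the command and present a candidate."""
        deleted = set(command.get("delete", []))
        kept = [s for s in self.scene_ids if s not in deleted]
        self.candidate = {"scenes": kept, "inserts": list(command.get("insert", []))}
        return self.candidate

    def finalize(self, fmt):
        """On the completion command, convert the last candidate and deliver it."""
        if self.candidate is None:
            raise ValueError("no candidate has been presented yet")
        return {"format": fmt, **self.candidate}


session = EditSession([1, 2, 3])
session.apply({"delete": [2]})                          # first edit command
session.apply({"delete": [2, 3], "insert": ["title"]})  # change command
print(session.finalize("mpeg2"))
```

Regenerating each candidate from the original rather than from the previous candidate is a design choice of this sketch; it keeps a change command from compounding earlier edits.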

[0021] Next, the internal configuration and operating procedure of the video content analysis means 82 are described. FIG. 2 is a schematic configuration diagram of the video content analysis means 82. In the figure, reference numeral 1 denotes camera work extraction means that extracts camera work scenes such as pans and zooms based on the motion information (motion vectors) contained in MPEG2 compressed video content (the original video). Reference numeral 2 denotes key frame extraction means. It includes section dividing means that, based on the color information (changes in the color histogram over time) likewise contained in the MPEG2 compressed video content, divides a shot, that is, a video section shot continuously between one power-on and power-off of a single camera, into a plurality of sections. From these sections it extracts meaningful frames as key frames. The more camera motion the video content contains, the more sections it is divided into. The camera work extraction means 1 therefore includes determination correction means 3: for a scene whose motion vector information shows no clear feature, it corrects the determination to "camera work scene" if the number of sections per unit time produced by the key frame extraction means 2 is at or above a predetermined value. The video content, with this determination result added as a camera work parameter, is output from the camera work extraction means 1 and hence from the video content analysis device 82.
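
The correction rule applied by the determination correction means 3 can be sketched as follows. The function and parameter names are hypothetical, and how "no clear feature" is decided is not specified in the text, so it is passed in as a flag.

```python
def corrected_is_camera_work(mv_says_camera_work, mv_feature_is_clear,
                             sections, duration_s, min_rate):
    """Return the (possibly corrected) camera work determination for a scene.

    sections   -- number of sections the key frame extraction produced
    duration_s -- scene duration in seconds (assumed unit of time)
    min_rate   -- predetermined sections-per-second threshold (assumed value)
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    # Correct only when the motion vectors are inconclusive and the color-based
    # section splitting is dense enough to suggest camera motion.
    if not mv_feature_is_clear and sections / duration_s >= min_rate:
        return True
    return mv_says_camera_work


print(corrected_is_camera_work(False, False, sections=12, duration_s=4.0, min_rate=2.0))
```

When the motion vector evidence is clear, the original determination is left untouched, whatever the section count.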

[0022] Reference numeral 4 denotes automatic video content structuring means. Based on the camera work detection results and the degree of shake extracted by the camera work extraction means 1, and on the intensity of video change, which can be obtained from the duration of each section detected by the key frame extraction means 2, it determines the ease of viewing of each scene in the video content from among a plurality of evaluation levels and automatically structures the video content according to those levels. The automatic video content structuring means 4 is described in detail later. The per-scene evaluation levels obtained here make it possible, for example, to extract representative still images from among scenes at the same evaluation level.

[0023] FIG. 3 is a flowchart showing the detailed procedure for the above configuration together with the components that implement it. In the figure, from the per-frame bit stream contained in the MPEG2 compressed video content of step S1, the DC components of the prediction residual coding coefficients (DCT coefficients) are decoded and extracted by a decoder 11 (step S2). At this time the motion vector information in each frame is also extracted (step S3), and based on this motion vector information a section feature extraction unit 12 detects the shake and camera work of each section in the next step S4.

[0024] FIG. 4 is a flowchart showing the detailed procedure of the section feature extraction unit 12, which performs frame-by-frame processing, together with the components that implement it. In step S21, motion vectors are first selected according to the camera work. Specifically, scenes in which the camera is moving are detected for each frame, and the use of P-picture or B-picture motion vectors is controlled according to the camera speed. B-picture motion vectors, for which the reference frame interval is short, are suited to fast camera motion, whereas P-picture motion vectors, for which the reference frame interval is long, are suited to slow camera motion. Accordingly, B-picture motion vectors are selected when the camera moves fast, and P-picture motion vectors when it moves slowly. Next, still in step S21, a motion vector usage rate calculator 21 calculates the motion vector usage rate RVi from the selected motion vectors. In a scene where the camera is moving, many of the motion vectors in a frame have a non-zero scalar value (magnitude). The motion vector usage rate calculator 21 calculates the proportion of macroblocks whose motion vector has a non-zero scalar value. If the motion vector usage rate RVi is below a predetermined value, processing moves to step S22 and the scene is judged to be a fixed scene. If RVi is at or above the predetermined value, processing moves to step S23 and the scene is judged to be a moving camera work scene, which includes camera shake. The two are thus distinguished. Because the fixed scene of step S22 contains no shake, that information is output from the section feature extraction unit 12 as it is.
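
The usage rate test of step S21 can be sketched in a few lines. Motion vectors are modeled here as `(dx, dy)` tuples, one per macroblock; the threshold of 0.5 is an arbitrary placeholder, since the text only speaks of "a predetermined value".

```python
def motion_vector_usage_rate(motion_vectors):
    """RVi: fraction of macroblocks whose motion vector is non-zero."""
    if not motion_vectors:
        return 0.0
    nonzero = sum(1 for v in motion_vectors if v != (0, 0))
    return nonzero / len(motion_vectors)


def fixed_or_moving(motion_vectors, threshold=0.5):
    """Below the threshold -> fixed scene (S22); otherwise moving scene (S23)."""
    rate = motion_vector_usage_rate(motion_vectors)
    return "moving" if rate >= threshold else "fixed"


frame = [(0, 0), (1, 0), (0, 2), (0, 0)]
print(motion_vector_usage_rate(frame), fixed_or_moving(frame))
```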

[0025] The moving scenes of step S23 contain, as shake, both severe shake scenes that are unwatchable, such as those recorded when the power was left on by mistake or while the camera was changing hands, and camera shake scenes. In the next step S24, an intra-frame motion vector smoothness calculator 22 therefore extracts the former, the severe shake scenes, from the moving scenes of step S23. The intra-frame motion vector smoothness calculator 22 calculates a continuity value for the motion vectors of the macroblocks in a frame: it divides the frame in the vertical direction and calculates the difference values of the vector directions. In a severe shake scene the directions of the motion vectors within a frame are scattered. If the intra-frame motion vector smoothness exceeds a threshold, the frame is therefore judged to be a severe shake scene (step S25). If it is within the threshold, the frame is judged to be a pan, zoom, or camera shake scene (step S26). In this way, severe shake scenes, whose motion need not be analyzed, are separated from the other scenes.
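
One possible reading of the smoothness measure in step S24 is sketched below. The text does not give the formula, so this is an assumption: the frame is treated as a grid of macroblock vectors, and the measure is the mean absolute direction difference between vertically adjacent macroblocks. A larger value means less uniform directions, which matches the rule that exceeding the threshold indicates severe shake.

```python
import math


def direction_difference(grid):
    """Mean absolute direction difference (radians) between vertical neighbors.

    grid -- rows of (dx, dy) motion vectors, one per macroblock (assumed layout)
    """
    diffs = []
    for upper_row, lower_row in zip(grid, grid[1:]):
        for (ux, uy), (lx, ly) in zip(upper_row, lower_row):
            if (ux, uy) == (0, 0) or (lx, ly) == (0, 0):
                continue  # a zero vector has no direction
            d = abs(math.atan2(uy, ux) - math.atan2(ly, lx))
            diffs.append(min(d, 2 * math.pi - d))  # wrap to [0, pi]
    return sum(diffs) / len(diffs) if diffs else 0.0


def is_severe_shake(grid, threshold=1.0):
    """Exceeding the (placeholder) threshold -> severe shake scene (S25)."""
    return direction_difference(grid) > threshold


uniform = [[(1, 0), (1, 0)], [(1, 0), (1, 0)]]
scattered = [[(1, 0), (0, 1)], [(-1, 0), (0, -1)]]
print(is_severe_shake(uniform), is_severe_shake(scattered))
```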

[0026] In the next step S27, for the moving-camera scenes of step S26, an intra-frame motion vector histogram bias calculator 23 separates pan scenes from zoom scenes. A pan scene here covers not only horizontal changes of camera direction but also tilt (vertical change of direction), track (horizontal change of camera position), boom (vertical change of camera position), and all combinations of these. A zoom scene also includes dolly, in which the camera position moves forward or backward. The intra-frame motion vector histogram bias calculator 23 clusters the directions of the motion vectors in a frame into, for example, eight classes, classes 1 to 8, as shown in FIG. 5, and obtains the histogram (frequency distribution) over the classes. It then calculates the ratio of the histogram's maximum frequency to the total as the intra-frame motion vector histogram bias. If this bias exceeds a threshold, the frame is judged to be a pan scene including camera shake (step S28). If it is at or below the threshold, the frame is judged to be a zoom scene including camera shake scenes (step S29).
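
The histogram bias of step S27 can be sketched as below. The class boundaries of FIG. 5 are not reproduced in the text, so equal 45-degree sectors starting at the positive x axis are assumed, and the threshold of 0.5 is a placeholder.

```python
import math


def direction_class(dx, dy, n_classes=8):
    """Map a vector direction to one of n equal angular sectors (assumed layout)."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return int(angle // (2 * math.pi / n_classes)) % n_classes


def histogram_bias(motion_vectors, n_classes=8):
    """Ratio of the most frequent direction class to all non-zero vectors."""
    hist = [0] * n_classes
    for dx, dy in motion_vectors:
        if (dx, dy) != (0, 0):
            hist[direction_class(dx, dy, n_classes)] += 1
    total = sum(hist)
    return max(hist) / total if total else 0.0


def pan_or_zoom(motion_vectors, threshold=0.5):
    """Bias above the threshold -> pan (S28); otherwise zoom (S29)."""
    return "pan" if histogram_bias(motion_vectors) > threshold else "zoom"


pan = [(3, 0.5)] * 9 + [(0, 2)]  # vectors mostly point the same way
zoom = [(math.cos((k + 0.5) * math.pi / 4), math.sin((k + 0.5) * math.pi / 4))
        for k in range(8)]       # vectors radiate in all directions
print(pan_or_zoom(pan), pan_or_zoom(zoom))
```

A pan moves every macroblock the same way, so one class dominates; a zoom spreads the vectors radially, so no class does.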

[0027] In a pan scene or a zoom scene, the motion vectors within a frame show the same properties throughout the section, whereas in a camera shake scene they change over a short period. Using this fact, in step S31 an intra-frame average motion vector difference value calculator 24 separates pan scenes from camera shake scenes, and in step S32 an intra-frame motion vector histogram difference value calculator 25 separates zoom scenes from camera shake scenes.

[0028] In a pan scene the average motion vector within a frame is constant, whereas in a camera shake scene it changes over a short period. The intra-frame average motion vector difference value calculator 24 calculates the difference value of the intra-frame average motion vector. If this difference value is at or below a threshold, the frame is judged to be a pan scene (step S33). If the difference value exceeds the threshold, it is judged to be a camera shake scene (step S34).

[0029] The intra-frame motion vector histogram difference value calculator 25 clusters the directions of the motion vectors in a frame into, for example, eight classes, classes 1 to 8, as shown in FIG. 5, and obtains the histogram (frequency distribution) over the classes. It then calculates the change in the histogram from frame to frame as the intra-frame motion vector histogram difference value. If this difference value is at or below a threshold, the frame is judged to be a zoom scene (step S35). If the difference value exceeds the threshold, it is judged to be a camera shake scene (step S36). Through this series of steps, fixed scenes, pan scenes, zoom scenes, camera shake scenes, and severe shake scenes are separated frame by frame.
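
The zoom versus camera shake test can be sketched in the same style. The distance between histograms is not specified in the text; the sum of absolute differences between normalized eight-class histograms is used here as an assumption, as are the equal 45-degree sectors and the threshold.

```python
import math


def direction_histogram(motion_vectors, n_classes=8):
    """Normalized histogram of vector directions over n equal sectors."""
    hist = [0] * n_classes
    for dx, dy in motion_vectors:
        if (dx, dy) == (0, 0):
            continue  # a zero vector has no direction
        angle = math.atan2(dy, dx) % (2 * math.pi)
        hist[int(angle // (2 * math.pi / n_classes)) % n_classes] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * n_classes


def histogram_difference(prev_frame, cur_frame):
    """Sum of absolute differences between the two frames' histograms."""
    prev_hist = direction_histogram(prev_frame)
    cur_hist = direction_histogram(cur_frame)
    return sum(abs(a - b) for a, b in zip(prev_hist, cur_hist))


def zoom_or_shake(prev_frame, cur_frame, threshold=0.5):
    """At or below the threshold -> zoom (S35); above it -> camera shake (S36)."""
    if histogram_difference(prev_frame, cur_frame) <= threshold:
        return "zoom"
    return "camera shake"


same = [(1, 0.1), (-1, 0.1), (0.1, 1), (0.1, -1)]
print(zoom_or_shake(same, same))
print(zoom_or_shake([(1, 0.1)] * 4, [(-1, 0.1)] * 4))
```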

【００３０】また、別のステップＳ41では、ＧＯＰ単位
でのカメラワークおよびぶれの検出を行なえるようにす
るために、Ｐピクチャ用およびＢピクチャ用フィルタの
ための各フレーム毎の重み関数を重み関数算出器27で算
出する。この重み関数を算出するに当たり、カメラの動
きを検出するステップＳ21の結果に関しては、フレーム
内の動きのある動きベクトルの数を利用し、それ以外の
結果に関しては、動きベクトルの滑らかさを利用する。
より具体的には、重み関数算出器27は、動きベクトルの
動きを検出する動きベクトル使用率ＲＶiと動きベクト
ル量ＱＶIのＧＯＰ内の平均を算出し、それぞれの値に
ＳＭＶ（滑らかさ）を割ることで重みづけを行なう。ま
た、動きベクトルヒストグラム偏り度と、動きベクトル
ヒストグラム分散値は、重み値を用いた平均値を算出す
る。ここでの重み値とは、ＧＯＰ内において、フレーム
内のカメラの動きとわかる部分で、スカラー値が０でな
い動きベクトルの数が閾値以上のフレーム値の割合を意
味する。さらに、動きベクトルヒストグラム変化と、平
均動きベクトル変化、動きベクトルヒストグラム分散値
変化について、その重み値をそれぞれのフレーム毎に乗
じて最大値を選択する。こうして得られた各フレームの
重み値（ステップＳ42）は、後述するステップＳ７にお
ける各ＧＯＰ毎の特徴検出に利用される。In a separate step S41, so that camerawork and blur can be detected in GOP units, the weighting function calculator 27 calculates a weighting function for each frame for the P-picture and B-picture filters. In calculating this weighting function, the number of moving motion vectors in the frame is used for the result of step S21, which detects camera motion, and the smoothness of the motion vectors is used for the other results.
More specifically, the weighting function calculator 27 calculates the within-GOP averages of the motion vector usage rate RVi, which detects the movement of the motion vectors, and the motion vector amount QVI, and weights each value by dividing it by SMV (smoothness). For the motion vector histogram bias degree and the motion vector histogram variance value, averages weighted by the weight values are calculated. The weight value here means the proportion of frames within the GOP in which, in the part of the frame recognized as camera motion, the number of motion vectors with a non-zero scalar value is at or above a threshold. Further, for the motion vector histogram change, the average motion vector change, and the motion vector histogram variance value change, the value of each frame is multiplied by its weight value and the maximum is selected. The weight value of each frame thus obtained (step S42) is used for the feature detection for each GOP in step S7, described later.

【００３１】再度図３に戻り、ステップＳ４で区間特徴
量抽出部12により各区間のぶれやカメラワークを検出す
ると、ステップＳ５の各ＧＯＰをバッファする手順を経
て、ステップＳ６の手順に至る。ここでは、ＧＯＰを構
成するフレーム数が15の時の例について記すが、フレー
ム番号を15で割ったときの余りが14に達したか否かが判
定され、余りが14に達していなければ再度ステップＳ２
の手順に戻るが、余りが14に達していればステップＳ７
において、ＧＯＰ検出のための各パラメータ算出器13に
より各ＧＯＰ毎の特徴を検出する。これにより、イント
ラマクロブロックだけで構成される１フレームのＩピク
チャと、フレーム間の動き予測符号化方式を採用した14
フレームのＢピクチャまたはＰピクチャとからなる15フ
レームを一つの単位とし、これらを１つのＧＯＰとして
構成することができる。Returning to FIG. 3, once the section feature amount extraction unit 12 has detected the blur and camerawork of each section in step S4, processing passes through step S5, in which each GOP is buffered, and reaches step S6. The example described here is one in which a GOP consists of 15 frames: it is determined whether the remainder of the frame number divided by 15 has reached 14. If the remainder has not reached 14, processing returns to step S2; if it has reached 14, the parameter calculator 13 for GOP detection detects the features of each GOP in step S7. In this way, 15 frames, consisting of one I picture composed only of intra macroblocks and 14 B or P pictures coded with inter-frame motion prediction, are treated as one unit and can be configured as one GOP.

【００３２】ステップＳ７における各パラメータ算出器
27は、前記ステップＳ41で導出した各フレームの重み値
を利用して、ＧＯＰ毎のカメラワークやぶれの検出を行
なうものである。具体的には、それぞれ１ＧＯＰの中に
おける動きベクトルの数と、動きベクトルの滑らかさの
比を重み値として、１フレーム毎の値にかけて１ＧＯＰ
で和を算出し、ＧＯＰにおけるカメラワークやぶれの特
徴量をパラメータとして出力する。Each parameter calculator 27 in step S7 detects camerawork and blur for each GOP by using the weight value of each frame derived in step S41. Specifically, the ratio of the number of motion vectors to the smoothness of the motion vectors within one GOP is used as a weight value; the value of each frame is multiplied by this weight, the products are summed over the GOP, and the resulting camerawork and blur feature amounts of the GOP are output as parameters.

【００３３】一方、ステップＳ２による復号化の際に得
られたＤＣ係数（ステップＳ11）は、次のステップ12に
てショット検出算出器14により、カット点を検出するこ
とによるショット検出が行なわれると共に、ステップ13
においてショット内区間検出器15により、一つのショッ
ト内での内容的な変化が検出される。なお、カット点と
は、カメラの電源オン／オフに伴うシーンの変わり目で
あり、ショットとはカット点の間の連続的に撮影された
映像区間を云う。ショット検出算出器14によるショット
検出は、従来知られているどのような手法を用いても構
わない。また、ショット内区間検出器15は、色のヒスト
グラム変化を用いてショット内の構造的変化を追うもの
で、これが前述の図２に示すキーフレーム抽出手段２に
相当する。そして、ショット検出算出器14により得られ
たショット番号ひいてはカット点（ステップＳ14）と、
ショット内区間検出器15により得られたショット内のキ
ーフレームに相当する区間点（ステップＳ15）と、前記
ステップＳ７からのカメラワークやぶれの特徴量を時系
列上にマッチングさせる。この作業については、後程説
明する。Meanwhile, the DC coefficients obtained during decoding in step S2 (step S11) are used in the next step S12 by the shot detection calculator 14, which performs shot detection by detecting cut points, and in step S13 by the intra-shot section detector 15, which detects changes in content within a single shot. A cut point is a scene boundary produced when the camera is powered on or off, and a shot is the continuously captured video section between cut points. The shot detection calculator 14 may use any conventionally known method. The intra-shot section detector 15 tracks structural changes within a shot by using changes in the color histogram, and corresponds to the key frame extraction means 2 shown in FIG. 2. The shot numbers, and hence the cut points, obtained by the shot detection calculator 14 (step S14), the section points corresponding to key frames within a shot obtained by the intra-shot section detector 15 (step S15), and the camerawork and blur feature amounts from step S7 are then matched on the time series. This operation is described later.

【００３４】ここで、キーフレーム抽出手段２によるキ
ーフレーム抽出の手法を説明する。一般ユーザーが撮影
した映像コンテンツは、その特徴としてショットが長い
ことが挙げられるため、前記ステップＳ13のようなショ
ット内の時間的変化を知ることが重要である。そこで、
ショットを複数の区分に分割し、その区間に対してクラ
スタリングを行なった後に、要素数の多いクラスタから
優先的にそのクラスタの重心に最も近いフレームをキー
フレームとして抽出するのが好ましい。これにより、一
般ユーザーが撮影した映像コンテンツであっても、動画
像へのランダムアクセスや、動画のインデキシングにか
かる処理コストの低減を図れる。The key frame extraction method used by the key frame extraction means 2 is now described. Video content shot by general users is characterized by long shots, so it is important to know the temporal changes within a shot, as in step S13. It is therefore preferable to divide a shot into a plurality of sections, cluster the frames within each section, and then extract as a key frame the frame closest to the centroid of each cluster, giving priority to clusters with many elements. This reduces the processing cost of random access to moving images and of video indexing, even for video content shot by general users.

【００３５】キーフレームを正しく抽出するには、撮影
者が有意と考える物を撮影するときの意図を反映してい
ると思われるフレームをショット内から抽出する必要が
ある。一般に、撮影者が重要と考えられるものは、そう
でないものよりも撮影時間が長くなると共に、重要な物
のフォーカスをフレームの中央に置いて撮影しようとす
る。また、ズームをして撮影しているところなども、撮
影者が重要と考えて撮影していると思われる。そこで本
実施例では、特に撮影時間が長く、フォーカスがフレー
ムの中央にあるものをキーフレームとして抽出する。To extract key frames correctly, it is necessary to extract from the shot the frames that are thought to reflect the photographer's intention when shooting something the photographer considers significant. In general, a photographer shoots what is considered important for longer than what is not, and tries to place the important object in focus at the center of the frame. Passages shot with zoom are also likely to be ones the photographer considers important. In this embodiment, therefore, frames with a particularly long shooting time and with the focus at the center of the frame are extracted as key frames.

【００３６】ショット内での画像の類似度を求めるに当
り、映像コンテンツは時間軸に沿って順次フレームが再
生されていくデータであると言えるので、同一ショット
内で連続する幾つかのフレーム間では、画像の構図や撮
影されている物の形状は略同一であると考えられる。そ
こで、類似度を求めるために用いられる特徴量として、
テクスチャや形状やオブジェクトの位置ではなく、ここ
では色情報だけを考慮した点が着目される。なお、色情
報をあらわすカラーモデルは幾つか存在するが、前述の
ようにＭＰＥＧ２方式の圧縮映像コンテンツを利用する
ことを考慮して、輝度値Ｙと、２つのクロミナンス値Ｃ
ｂＣｒとをあらわしたＹＣｂＣｒカラーモデルを用いる
のが好ましい。特にここでは、画像の類似性を表す特徴
量として、輝度ヒストグラムＨと、ＹＣｂＣｒ空間での
ユークリッド距離Ｄと、ＹＣｂＣｒ空間でのフレームの
輝度，色差平均（Ｙavr，Ｃｂavr，Ｃｒavr）を用い
る。In determining the similarity of images within a shot, video content can be regarded as data in which frames are played back sequentially along the time axis, so the composition of the image and the shapes of the photographed objects can be considered substantially the same across several consecutive frames of the same shot. Accordingly, only color information, rather than texture, shape, or object position, is used here as the feature amount for obtaining the similarity. Several color models exist for representing color information, but given that MPEG-2 compressed video content is used as described above, it is preferable to use the YCbCr color model, which represents a luminance value Y and two chrominance values Cb and Cr. Specifically, the feature amounts used here to represent image similarity are the luminance histogram H, the Euclidean distance D in YCbCr space, and the frame's luminance and color-difference averages in YCbCr space (Yavr, Cbavr, Cravr).

【００３７】輝度ヒストグラムＨは、画像内での色の空
間分布の情報（画像の構図情報や画像中のオブジェクト
の形）などが失われてしまうので、輝度ヒストグラムＨ
が類似しているからといって、その画像が類似している
とは限らない。しかし、動画では連続している幾つかの
フレームで共通する部分が撮影されるので、各フレーム
間で輝度ヒストグラムＨが類似している場合は、画像の
構図情報や画像中のオブジェクトの形がある程度類似し
ていると仮定できる。その仮定を前提とすれば、映像コ
ンテンツにおける輝度ヒストグラムＨを用いた類似度の
信頼性は高くなる。また、連続するフレームを比較する
特徴量としてフレーム間差分を用いることも考えられる
が、この場合はカメラ操作によって発生する画面全体に
渡る動きに対して、人間の視覚的類似度をよく反映でき
ない。その点、輝度ヒストグラムＨでは、人間の視覚的
類似度をよく反映することができる利点を有する。フレ
ームｉとフレームｊの類似度ｍは、それらの輝度ヒスト
グラムＨi(s)，Ｈj(s)（但し、sは輝度値）が重なった
部分の画素数とすると、次の数１にてあらわせる。The luminance histogram H discards information about the spatial distribution of color within the image (the composition of the image and the shapes of objects in it), so similar luminance histograms H do not necessarily mean similar images. In a moving image, however, several consecutive frames capture a common portion of the scene, so when the luminance histograms H of such frames are similar, the composition and object shapes can be assumed to be similar to some extent. Under that assumption, similarity based on the luminance histogram H is highly reliable for video content. The inter-frame difference could also be used as a feature amount for comparing consecutive frames, but it does not reflect human visual similarity well for motion across the whole screen caused by camera operation. The luminance histogram H has the advantage of reflecting human visual similarity well. The similarity m between frame i and frame j, defined as the number of pixels in the overlapping portion of their luminance histograms Hi(s) and Hj(s) (where s is the luminance value), is expressed by Equation 1 below.

【００３８】[0038]

【数１】 [Equation 1]
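Equation 1 itself is not reproduced in this text. From the definition above (the number of pixels in the overlapping portion of the two luminance histograms), the similarity would be the histogram intersection, that is, the sum over s of min(Hi(s), Hj(s)). The following is a minimal sketch under that reading; the function names and the 8-bit, 256-bin layout are assumptions.

```python
def luminance_histogram(luma, bins=256):
    """Histogram H(s) of integer luminance values given as a flat iterable."""
    hist = [0] * bins
    for s in luma:
        hist[s] += 1
    return hist

def similarity(hist_i, hist_j):
    """Overlap of two histograms: sum over s of min(Hi(s), Hj(s))."""
    return sum(min(a, b) for a, b in zip(hist_i, hist_j))
```

Under this reading, two identical frames give m equal to the number of pixels, and m falls toward zero as the luminance distributions diverge, which is why a low m is later treated as a section boundary.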

【００３９】ＹＣｂＣｒ空間でのユークリッド距離Ｄ
は、後述する単純クラスタリングに用いるものである。
ここで、フレームｉのＹＣｂＣｒ空間の値を（Ｙ，Ｃ
ｂ，Ｃｒ）、フレームｊのＹＣｂＣｒ空間の値を
（Ｙ’，Ｃｂ’，Ｃｒ’）とすると、フレームｉとフレ
ームｊのＹＣｂＣｒ空間でのユークリッド距離Ｄは、次
の数２にてあらわせる。The Euclidean distance D in YCbCr space is used for the simple clustering described later. If the value of frame i in YCbCr space is (Y, Cb, Cr) and the value of frame j in YCbCr space is (Y', Cb', Cr'), the Euclidean distance D between frame i and frame j in YCbCr space is expressed by Equation 2 below.

【００４０】[0040]

【数２】 [Equation 2]
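The equation itself is not reproduced in this text. Assuming the standard Euclidean distance between the two triples defined above, with no per-component weighting, it would read:

```latex
D = \sqrt{(Y - Y')^{2} + (Cb - Cb')^{2} + (Cr - Cr')^{2}}
```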

【００４１】ＹＣｂＣｒ空間でのフレーム中心の輝度値
平均および色差平均（Ｙavr，Ｃｂavr，Ｃｒavr）は、
前述の重要な物のフォーカスをフレームの中央に置いて
撮影しようとする仮定に基づき、フレームの中央の特徴
量を用いて算出される。また、フレームの中央に着目す
ることで、特に全体では両者が類似するフレームであっ
ても、フォーカスの違いによりそれらを区別することが
可能になる。ここでのフレームの中心とは、フレームを
５×５に分割して、その中央３×３に対応した部分を指
し、輝度，色差情報として、ＹＣｂＣｒ空間でのＹ成分
の輝度平均と、Ｃｂ成分とＣｒ成分の各色差平均とを用
いる。フレーム画像中の座標（ｘ，ｙ）の画素のＹ成分
の輝度をＹ（ｘ，ｙ）とし、Ｃｂ成分の色差をＣｂ
（ｘ，ｙ）とし、Ｃｒ成分の色差をＣｒ（ｘ，ｙ）とす
ると、ＹＣｂＣｒ空間でのＹ成分の輝度平均Ｙavrと、
Ｃｂ成分の色差平均Ｃｂavrと、Ｃｒ成分の色差平均Ｃ
ｒavrは、次の数３にてあらわせる。The luminance average and color-difference averages at the frame center in YCbCr space (Yavr, Cbavr, Cravr) are calculated from the feature amounts of the central part of the frame, based on the assumption stated above that the photographer tries to place the important object in focus at the center of the frame. Focusing on the center of the frame also makes it possible to distinguish, by the difference in focus, frames that are similar as a whole. Here the center of the frame means the central 3 × 3 portion when the frame is divided into 5 × 5, and the luminance average of the Y component and the color-difference averages of the Cb and Cr components in YCbCr space are used as the luminance and color-difference information. If the luminance of the Y component of the pixel at coordinates (x, y) in the frame image is Y(x, y), the color difference of the Cb component is Cb(x, y), and the color difference of the Cr component is Cr(x, y), then the luminance average Yavr of the Y component, the color-difference average Cbavr of the Cb component, and the color-difference average Cravr of the Cr component in YCbCr space are expressed by Equation 3 below.

【００４２】[0042]

【数３】 [Equation 3]

【００４３】但し、ｗは画像の幅で、ｈは画像の高さの
画素数である。Here, w is the width and h is the height of the image, in pixels.
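Equation 3 is not reproduced in this text. A minimal sketch of the frame-center averages follows, assuming the central 3 × 3 cells of the 5 × 5 division span columns w/5 to 4w/5 and rows h/5 to 4h/5 (integer division), and that the three planes have the same dimensions, that is, no chroma subsampling. The function name and the plane layout are hypothetical.

```python
def center_averages(y_plane, cb_plane, cr_plane):
    """Mean Y, Cb and Cr over the central 3x3 cells of a 5x5 grid.

    Each plane is a list of h rows with w samples per row.
    """
    h, w = len(y_plane), len(y_plane[0])
    x0, x1 = w // 5, (4 * w) // 5   # columns of the central region
    y0, y1 = h // 5, (4 * h) // 5   # rows of the central region
    count = (x1 - x0) * (y1 - y0)

    def mean(plane):
        return sum(sum(row[x0:x1]) for row in plane[y0:y1]) / count

    return mean(y_plane), mean(cb_plane), mean(cr_plane)
```

The returned triple corresponds to (Yavr, Cbavr, Cravr) and is the feature vector that the simple clustering described later would operate on.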

【００４４】ところで、ショットを複数の区間に分割
し、その区間に対してクラスタリングを行なう際に、時
間軸を全く考慮しないと、フレーム中心の特徴量は類似
しているが、実際には全く異なるものが同じクラスタに
属してしまう問題を生じる。例えば、人の顔を撮影した
後で、パンを行なって別の人の顔を撮影したとする。こ
のとき、二人の人物が移っているフレームの中心輝度と
色差は、人間の肌の色が似ていることから類似している
と判定され，別々の人物を撮影したフレーム群が同一の
クラスタに属する可能性が高くなる。こうした現象は、
特にカメラ操作を行なったときの映像や、ショットのフ
レーム数が多い映像で顕著となる。When a shot is divided into sections and those sections are clustered, a problem arises if the time axis is not considered at all: frames whose center feature amounts are similar but whose content is actually quite different end up in the same cluster. For example, suppose a person's face is shot and the camera then pans to shoot another person's face. Because human skin tones are alike, the center luminance and color difference of the frames showing the two people are judged to be similar, and the groups of frames showing different people are likely to fall into the same cluster. This is especially pronounced in footage shot while operating the camera and in shots with many frames.

【００４５】こうした現象を防ぐには、最初に人を撮影
している時の区間と、その次の人を撮影している時の区
間といったように区間を分割し、その区間をクラスタリ
ングするのが好ましい。つまり、予めショットを区間に
分割し、その分割した区間でクラスタリングを行なうよ
うにキーフレーム抽出手段２を構成すれば、ＹＣｂＣｒ
空間でのフレーム中心の輝度値平均や色差平均（Ｙav
r，Ｃｂavr，Ｃｒavr）が似ていても、全く異なるフレ
ームが同じクラスタに属する不具合を低減できる。ま
た、動画像をブラウジングする際に、その内容を把握す
るときにも時間という流れが重要になる。To prevent this, it is preferable to divide the shot into sections, such as the section in which the first person is shot and the section in which the next person is shot, and to cluster within each section. That is, if the key frame extraction means 2 is configured to divide the shot into sections in advance and to perform clustering within each section, the problem of quite different frames falling into the same cluster can be reduced even when their frame-center luminance and color-difference averages in YCbCr space (Yavr, Cbavr, Cravr) are similar. The flow of time is also important for grasping the content when browsing a moving image.

【００４６】キーフレーム抽出手段２によるショットの
区間分割に関する手順を、図６に基づき説明すると、先
ず、ショット中のフレーム群51の中で、最初のフレーム
を区間１の開始フレーム51Ａとする。この開始フレーム
51Ａと、それ以降のフレームとの間の類似度ｍを、前記
数５に基づき時間軸に沿って順に計算する。類似度ｍが
ある閾値Ｔｈよりも小さくなると、そのときのフレーム
が次の開始フレーム51Ｂとなる。以下、最前の開始フレ
ーム51Ｂとそれ以降のフレームとの間の類似度ｍを、同
様似に時間軸に沿って順に計算し、類似度ｍが閾値Ｔｈ
よりも小さくなったときのフレームを次の開始フレーム
51Ｃとする手順を繰り返す。こうして、各開始フレーム
51Ａ，51Ｂ，51Ｃを先頭として、三つの区間52Ａ，52
Ｂ，52Ｃが分割される。つまり、前記類似度ｍは、予め
ショット中の区間を分割するのに用いられる。The procedure by which the key frame extraction means 2 divides a shot into sections is described with reference to FIG. 6. First, the first frame of the frame group 51 in the shot is taken as the start frame 51A of section 1. The similarity m between this start frame 51A and each subsequent frame is calculated in order along the time axis based on Equation 1. When the similarity m falls below a threshold Th, the frame at that point becomes the next start frame 51B. The similarity m between the most recent start frame 51B and each subsequent frame is then calculated in the same way along the time axis, and the frame at which m falls below the threshold Th becomes the next start frame 51C; this procedure is repeated. In this way the shot is divided into three sections 52A, 52B, and 52C, headed by the start frames 51A, 51B, and 51C. That is, the similarity m is used to divide the shot into sections in advance.

【００４７】なお、本来であればフレーム全体の変化に
着目しているので、次の区間の開始フレームを設定する
際には、類似度をあらわす特徴量としてフレーム全体の
ヒストグラムだけを用いるのが理想であるが、背景が大
きな割合を占める映像コンテンツでは、全体の輝度ヒス
トグラムＨが背景の影響を大きく受けてしまい。本来区
間に分割すべきところを分割できなくなる。一方、フレ
ームの中心の輝度ヒストグラムＨでは、背景に占める映
像中のオブジェクトの割合が、全体の輝度ヒストグラム
での場合に比べて大きくなる。したがって、類似度ｍを
算出するに当り、フレーム中心の輝度ヒストグラムＨを
合わせて考慮するのが、背景の影響を小さくする上で好
ましい。また、フラッシュなどの影響で一瞬フレームの
輝度値が大きく変化すると、本来同一区間であるべきも
のが、区間の切れ目と誤検出する場合も考えられる。こ
うした誤検出を避けるために、最前の開始フレームとの
類似度ｍが複数フレーム（例えば２フレーム）連続して
閾値Ｔｈよりも小さくなったときに、区間を分割するこ
とにし、そのときの開始フレームを複数フレームの中
で、先に比較を行ったフレームとすればよい。Because the aim is to follow changes in the frame as a whole, it would be ideal to use only the histogram of the entire frame as the similarity feature amount when setting the start frame of the next section. In video content where the background occupies a large proportion of the frame, however, the whole-frame luminance histogram H is strongly affected by the background, and sections that should be split are not. In the luminance histogram H of the frame center, by contrast, objects in the image account for a larger proportion relative to the background than in the whole-frame luminance histogram. It is therefore preferable, in calculating the similarity m, to also take the frame-center luminance histogram H into account in order to reduce the influence of the background. Furthermore, if the luminance of a frame changes sharply for an instant, for example because of a flash, what should be a single section may be falsely detected as a section break. To avoid such false detection, the section may be divided only when the similarity m to the most recent start frame stays below the threshold Th for a plurality of consecutive frames (for example, two frames), with the earliest compared of those frames taken as the new start frame.
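The section-splitting procedure, including the guard against momentary luminance changes, can be sketched as follows. This is an illustration only: it assumes one luminance histogram per frame, uses histogram intersection as the similarity m, and omits the combination of whole-frame and frame-center histograms. The function name and the `min_run` parameter are hypothetical.

```python
def split_into_sections(histograms, threshold, min_run=2):
    """Return the indices of the start frames of the sections of one shot.

    A new section starts when the similarity to the current start frame stays
    below `threshold` for `min_run` consecutive frames; the first frame of
    that run becomes the new start frame.
    """
    if not histograms:
        return []

    def similarity(h_a, h_b):
        return sum(min(a, b) for a, b in zip(h_a, h_b))

    starts = [0]
    run_start = None  # first frame of the current below-threshold run
    i = 1
    while i < len(histograms):
        m = similarity(histograms[starts[-1]], histograms[i])
        if m < threshold:
            if run_start is None:
                run_start = i
            if i - run_start + 1 >= min_run:
                starts.append(run_start)
                i = run_start      # resume comparing against the new start
                run_start = None
        else:
            run_start = None       # isolated dip, such as a flash, is ignored
        i += 1
    return starts
```

With `min_run=1` the sketch reduces to the basic procedure of FIG. 6, in which the first frame whose similarity falls below Th immediately opens a new section.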

【００４８】次に、分割された各区間52Ａ〜52Ｃからキ
ーフレームを抽出する手順を、図７の概略図に基づき説
明する。先ず各区間52Ａ〜52Ｃ内で、前記数３で算出さ
れるフレーム中心のＹＣｂＣｒ空間での輝度値平均およ
び色差平均を用いて、単純クラスタリングを行なう。ク
ラスタリングの結果、例えば図７では、区間52Ａにおい
て３つのクラスタ、すなわち要素数10のクラスタ54Ａ
と、要素数５のクラスタ54Ｂと、要素数２のクラスタ54
Ｃが形成され、区間52Ｂにおいて２つのクラスタ、すな
わち要素数７のクラスタ54Ｄと、要素数10のクラスタ54
Ｅが形成され、区間52Ｃにおいて２つのクラスタ、すな
わち要素数７のクラスタ54Ｆと、要素数７のクラスタ54
Ｇが形成される。Next, the procedure for extracting key frames from the divided sections 52A to 52C is described with reference to the schematic diagram of FIG. 7. First, within each of the sections 52A to 52C, simple clustering is performed using the frame-center luminance average and color-difference averages in YCbCr space calculated by Equation 3. As a result of the clustering, in FIG. 7 for example, three clusters are formed in section 52A (cluster 54A with 10 elements, cluster 54B with 5 elements, and cluster 54C with 2 elements), two clusters are formed in section 52B (cluster 54D with 7 elements and cluster 54E with 10 elements), and two clusters are formed in section 52C (cluster 54F with 7 elements and cluster 54G with 7 elements).

【００４９】次に、各区間52Ａ〜52Ｃ毎に形成された複
数のクラスタ54Ａ〜54Ｇから、その各クラスタ54Ａ〜54
Ｇの重心に最も近いフレームが、キーフレーム候補55Ａ
〜55Ｇとして一つずつ抽出される。そして、自分の求め
ている詳細さによって、所望のキーフレームを抽出す
る。このとき、要素数が多いクラスタから抽出するキー
フレーム候補（例えば図７では、キーフレーム候補55
Ａ，55Ｅが最もクラスタの要素数が多い）ほど、その重
要度すなわち提示優先順位を高くする。その理由は、撮
影時間の長いものほど、撮影者が重要と考えて撮影して
いるとの前提に基づいている。すなわち、要素数（フレ
ーム数）の多いクラスタの方が、要素数の少ないクラス
タよりも撮影者にとって有意になるためである。Next, from the clusters 54A to 54G formed in the sections 52A to 52C, the frame closest to the centroid of each cluster is extracted as a key frame candidate 55A to 55G, one per cluster. The desired key frames are then extracted according to the level of detail the user wants. Key frame candidates extracted from clusters with more elements (in FIG. 7, for example, candidates 55A and 55E come from the clusters with the most elements) are given higher importance, that is, higher presentation priority. This rests on the premise that the longer something is shot, the more important the photographer considers it; a cluster with many elements (frames) is more significant to the photographer than a cluster with few.

【００５０】なお、前記単純クラスタリングを行なう際
に、区間内での時間軸を考慮すると、ビジュアル的に類
似した余分なフレームを抽出する可能性が大きくなるの
に対し、区間内での時間軸を考慮しない場合は、余分な
フレームを抽出する可能性を低減できる。これは、撮影
者が一度撮影したところを、再度撮影するケースが考え
られるからである。ここで、数４に示すｎ個の特徴ベク
トルを単純クラスタリングする手順を示す。If the time axis within a section is taken into account in the simple clustering, visually similar redundant frames are more likely to be extracted, whereas ignoring the time axis within a section reduces that likelihood. This is because the photographer may shoot again something that has already been shot once. The procedure for simple clustering of the n feature vectors shown in Equation 4 is described next.

【００５１】[0051]

【数４】 [Equation 4]

【００５２】先ず、任意のベクトルＸ_ｉをとり、これを
第１クラスタＣ_１の中心Ｙ_１（Ｙ_１＝Ｘ_ｉ）とする。次
に、別なベクトルＸ_ｊをとり、前記第１クラスタＣ_１の
中心Ｙ_１とベクトルＸ_ｊとのユークリッド距離Ｄ_１,ｊ
を求める。このとき、Ｄ_１,ｊ＞Ｔであるならば、ベク
トルＸ_ｊを第２クラスタＣ_２の中心Ｙ_２（Ｙ_２＝Ｘ_ｉ）
とし、Ｄ_１,ｊ≦Ｔであるならば、ベクトルＸ_ｊが第１
クラスタＣ_１に含まれる（Ｘ_ｊ∈Ｃ_１）ものとして、ク
ラスタの中心Ｙ_１を更新する。First, an arbitrary vector X_i is taken and set as the center Y_1 of the first cluster C_1 (Y_1 = X_i). Next, another vector X_j is taken, and the Euclidean distance D_{1,j} between the center Y_1 of the first cluster C_1 and the vector X_j is obtained. If D_{1,j} > T, the vector X_j is set as the center Y_2 of a second cluster C_2 (Y_2 = X_j); if D_{1,j} ≤ T, the vector X_j is taken to belong to the first cluster C_1 (X_j ∈ C_1) and the cluster center Y_1 is updated.

【００５３】その後で、ベクトルＸ_ｋをとり、第１クラ
スタＣ_１の中心Ｙ_１および第２クラスタＣ_２の中心Ｙ_２
とのユークリッド距離Ｄ_１,ｋ，Ｄ_２,ｋをそれぞれ求め
る。このとき、Ｄ_１,ｋ＞Ｔで、かつＤ_２,ｋ＞Ｔである
ならば、ベクトルＸ_ｋを第３クラスタＣ_３の中心Ｙ
_３（Ｙ_３＝Ｘ_ｉ）とし、Ｄ_１,ｋ≦ＴまたはＤ_２,ｋ≦Ｔ
ならば、ベクトルＸ_ｋは中心との距離の短い方のクラス
タに所属するものとする。Then a vector X_k is taken, and the Euclidean distances D_{1,k} and D_{2,k} to the center Y_1 of the first cluster C_1 and the center Y_2 of the second cluster C_2 are obtained. If D_{1,k} > T and D_{2,k} > T, the vector X_k is set as the center Y_3 of a third cluster C_3 (Y_3 = X_k); if D_{1,k} ≤ T or D_{2,k} ≤ T, the vector X_k is taken to belong to the cluster whose center is nearer.

【００５４】こうして、全てのベクトルについて上記の
手順を繰り返し行ない、クラスタリングを終了する。な
お、クラスタの中心は新しい要素が増える毎に更新す
る。ここでは、新しい要素が増える前のクラスタの中心
を（Ｙcen，Ｃｂcen，Ｃｒcen）とし、要素数をnumと
し、新しいクラスタに加わる要素を（Ｙnew，Ｃｂnew，
Ｃｒnew）とすると、新しいクラスタの中心（Ｙ’cen，
Ｃｂ’cen，Ｃｒ’cen）は次の数５にてあらわせる。The above procedure is repeated for all the vectors, which completes the clustering. The center of a cluster is updated each time a new element is added. If the center of the cluster before the new element is added is (Ycen, Cbcen, Crcen), the number of elements is num, and the element newly added to the cluster is (Ynew, Cbnew, Crnew), the new cluster center (Y'cen, Cb'cen, Cr'cen) is expressed by Equation 5 below.

【００５５】[0055]

【数５】 [Equation 5]

【００５６】上記クラスタリングの結果、クラスタの数
が多過ぎるか、あるいは少な過ぎた場合、前記Ｔの値を
変えて再度クラスタリングを行なえばよい。As a result of the above clustering, if the number of clusters is too large or too small, the value of T may be changed and clustering may be performed again.
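The simple clustering and the selection of key frame candidates can be sketched as below. Equation 5 is not reproduced in this text, so the center update is assumed to be the running mean, for example Y'cen = (num · Ycen + Ynew) / (num + 1), and likewise for Cb and Cr. The function names and the dictionary layout are hypothetical, and the vectors are visited in list order rather than in an arbitrary order.

```python
import math

def simple_clustering(vectors, T):
    """Threshold-based sequential clustering of (Y, Cb, Cr) feature vectors.

    Returns a list of clusters, each a dict with a running 'center' and the
    'members' (indices into `vectors`) assigned to it.
    """
    clusters = []
    for idx, x in enumerate(vectors):
        # distance from x to every existing cluster center
        dists = [math.dist(c["center"], x) for c in clusters]
        if not dists or min(dists) > T:
            # farther than T from all centers: x founds a new cluster
            clusters.append({"center": tuple(x), "members": [idx]})
            continue
        c = clusters[dists.index(min(dists))]  # nearest cluster
        num = len(c["members"])
        # assumed running-mean update of the center for one added element
        c["center"] = tuple((cen * num + new) / (num + 1)
                            for cen, new in zip(c["center"], x))
        c["members"].append(idx)
    return clusters

def key_frame_candidates(vectors, clusters):
    """One candidate per cluster (member nearest the center), largest first."""
    ranked = sorted(clusters, key=lambda c: len(c["members"]), reverse=True)
    return [min(c["members"], key=lambda i: math.dist(vectors[i], c["center"]))
            for c in ranked]
```

Ordering the candidates by cluster size reflects the presentation priority described earlier; if the number of clusters is unsuitable, the same function can simply be rerun with a different T.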

【００５７】次に、キーフレーム抽出処理の実験結果を
図８，図９および表１にて説明する。実験に使用した映
像コンテンツは、数人の人物を撮影した映像で、その条
件は720×240ピクセル，30フレーム／秒であり、ＭＰＥ
Ｇ−２形式で保存したものをｐｐｍ形式に変換したもの
で、撮影時間は31秒である。前記閾値Ｔｈは、フレーム
全体の輝度ヒストグラムＨの類似度ｍを求める際には、
Ｔｈall＝121000とし、フレーム中央の輝度ヒストグラ
ムＨに着目した場合には、Ｔｈcenter＝32000とした。
また、ユークリッド距離Ｄの閾値としてＴ＝30を用い
た。映像コンテンツが分割された区間の区間番号と、各
区間から精製されたクラスタを代表するキーフレーム候
補番号およびクラスタの要素数を、区間の範囲が５フレ
ーム以上であるものに限定して表１に示す。Next, experimental results of the key frame extraction process are described with reference to FIGS. 8 and 9 and Table 1. The video content used in the experiment was footage of several people, at 720 × 240 pixels and 30 frames per second, stored in MPEG-2 format and converted to ppm format, with a shooting time of 31 seconds. The threshold Th was set to Thall = 121000 when obtaining the similarity m of the luminance histogram H of the entire frame, and to Thcenter = 32000 when using the luminance histogram H of the frame center. T = 30 was used as the threshold for the Euclidean distance D. Table 1 shows the section number of each section into which the video content was divided, together with the key frame candidate number representing each cluster generated from that section and the number of elements in the cluster, limited to sections spanning five frames or more.

【００５８】[0058]

【表１】 [Table 1]

【００５９】また、図８は各フレーム毎のフレーム中央
の輝度平均（実線）および色差平均（波線）を示してお
り、また図９は、フレーム毎の開始フレームとの類似度
ｍを、フレームの全体（実線）とフレームの中央（破
線）でそれぞれ示している。その結果、過剰検出となる
ようにキーフレーム候補からキーフレームを抽出する
と、撮影者の必要とするフレームを提示することができ
る。また、キーフレームを抽出するときに、カメラワー
クや手ぶれなどの情報も用いれば、さらに効果的なキー
フレームの提示が可能となる。ここでは、ショットを分
割する際に輝度情報しか用いていないが、色差成分など
も考慮すると、より適切な区間に分割することが可能に
なる。FIG. 8 shows the luminance average (solid line) and the color-difference average (broken line) at the frame center for each frame, and FIG. 9 shows, for each frame, the similarity m to the start frame for the entire frame (solid line) and for the frame center (broken line). The results show that the frames the photographer needs can be presented if key frames are extracted from the key frame candidates so as to err on the side of over-detection. Using information such as camerawork and camera shake when extracting key frames would allow even more effective presentation of key frames. Only luminance information is used here to divide the shot, but also taking the color-difference components into account would allow division into more appropriate sections.

【００６０】続いて、キーフレーム抽出手段２で得られ
た色情報の変化を利用して、映像コンテンツの変化を見
ながら動きベクトル参照フレーム間隔を計算し、新たに
動きベクトルを生成してカメラワーク検出に利用する手
順を以下に説明する。前述のように、色情報（時系列上
の輝度ヒストグラムＨの変化）を用いて区間を検出する
と、動画像の変化点をロバストに求めることができる。
すなわち、フレーム全体の色の構図が同一なシーンを検
出することができるが、その一方で、パンやズームなど
のカメラワークシーンでは、区間の過剰検出が起こる。Next, a procedure is described in which the changes in color information obtained by the key frame extraction means 2 are used to calculate the motion vector reference frame interval while following the changes in the video content, and motion vectors are newly generated and used for camerawork detection. As described above, detecting sections by color information (the change over time of the luminance histogram H) finds the change points of a moving image robustly. That is, scenes in which the color composition of the whole frame is the same can be detected; on the other hand, sections are over-detected in camerawork scenes such as pans and zooms.

【００６１】これに対して、動きベクトルによる動き情
報では、フレーム全体の時系列変化を検出できると共
に、前述の手順により映像として内容理解が困難なひど
いぶれシーンを、画像処理不用フレームとして検出でき
るものの、動きベクトル情報に特徴がない場合は、カメ
ラワークを検出できなかったり，誤ったカメラワークと
して検出することがある。本実施例では、判定修正手段
３により色情報と動き情報をハイブリッドに用いてカメ
ラワークを検出する点が、新規な特徴として着目され
る。このカメラワークの具体的な検出方法を、図１０に
基づき説明する。図１０は検出結果の一例を概略的に示
したもので、上段の「ＣＷ」は、カメラワーク抽出手段
１が抽出したシーンを時系列的に並べたものであり、
「Ｐ」はパンシーン，「Ｆ」は固定シーン，「Ｓ」は手
ぶれシーン，「Ｎ」は動きベクトルの情報が明確な特徴
を有していないため、カメラワークの判別ができないシ
ーンを示している。また、ここには図示していないが、
カメラワーク抽出手段１によりズームシーンもパンシー
ンと同様にカメラワークの一つとして検出される。すな
わち動き情報により、パンシーンとズームシーンの２種
類のカメラワークが検出される。一方、下段の「ＫＦ」
は、キーフレーム抽出手段２で抽出される前記輝度ヒス
トグラムＨによる区間の検出を時系列的に並べたもので
ある。Motion information based on motion vectors, by contrast, can detect time-series changes of the entire frame and, through the procedure described above, can detect severely blurred scenes whose content is hard to understand as frames that need no image processing. When the motion vector information has no distinctive features, however, camerawork may go undetected or be detected as the wrong camerawork. A novel feature of this embodiment is that the determination correction means 3 detects camerawork by using color information and motion information in a hybrid manner. The specific detection method is described with reference to FIG. 10, which schematically shows an example of the detection results. "CW" in the upper row arranges the scenes extracted by the camerawork extraction means 1 in time series: "P" is a pan scene, "F" is a fixed scene, "S" is a camera shake scene, and "N" is a scene whose camerawork cannot be determined because the motion vector information has no clear features. Although not shown in the figure, the camerawork extraction means 1 also detects zoom scenes as a kind of camerawork, in the same way as pan scenes; that is, two types of camerawork, pan scenes and zoom scenes, are detected from the motion information. "KF" in the lower row arranges in time series the sections detected by the key frame extraction means 2 from the luminance histogram H.

【００６２】判定修正手段３は次のような手順で、カメ
ラワーク抽出手段１から得られたカメラワークの検出結
果や、キーフレーム抽出手段２から得られた区間の検出
結果を修正する。動きベクトル情報の存在しない固定シ
ーン（カメラ固定）「Ｆ」では、色情報の変化もないた
め、キーフレーム抽出手段２で抽出される区間が一つに
検出される。したがってこのような場合は、そのまま固
定シーンであると判定する。速いフレーム変化は、動き
情報が明確な特徴を有しているため、カメラワーク抽出
手段１によるカメラワーク検出の適合率が高い。したが
って判定修正手段３は、図１０のに示すパンシーン
「Ｐ」において、キーフレーム抽出手段２が過剰検出し
た区間をまとめると共に、カメラワーク抽出手段１は動
き補償のフレーム間隔を最小にして、カメラワークと手
ぶれの分離を行なう（最小フレーム間動き補償）。これ
により、パンシーンやズームシーンなどの速いフレーム
変化に対し、輝度ヒストグラムＨに基づく区間の過剰検
出を防ぐことができ、さらに精度良くカメラワークと手
ぶれとの判別を行なうことができる。The determination correction means 3 corrects the camerawork detection results obtained from the camerawork extraction means 1 and the section detection results obtained from the key frame extraction means 2 by the following procedure. In a fixed scene (camera fixed) "F", where there is no motion vector information, the color information does not change either, so the key frame extraction means 2 detects a single section. Such a case is determined to be a fixed scene as it stands. For fast frame changes, the motion information has clear features, so the precision of camerawork detection by the camerawork extraction means 1 is high. Accordingly, in the pan scene "P" shown in FIG. 10, the determination correction means 3 merges the sections over-detected by the key frame extraction means 2, and the camerawork extraction means 1 minimizes the frame interval for motion compensation to separate camerawork from camera shake (minimum inter-frame motion compensation). This prevents over-detection of sections based on the luminance histogram H for fast frame changes such as pan and zoom scenes, and allows camerawork and camera shake to be distinguished more accurately.

【００６３】一方、ゆっくりとフレームが変化する場合
は、その前後にある近隣のフレームとの変化が少ないた
め、動きベクトル情報にカメラワークとしての特徴が少
なく、カメラワーク検出の適合率が低くなる。そこで判
定修正手段３は、図１０のに示すようなカメラワーク
の判別ができないシーン「Ｎ」間において、キーフレー
ム抽出手段２が所定値以上の区間分割を行なっている場
合に、手ぶれを含むカメラワークシーンであると修正判
定し、これを受けてカメラワーク抽出手段１は、このシ
ーン内で動き補償フレームの間隔を広げる（可変フレー
ム間動き補償）ことで、パンまたはズームのカメラワー
クシーンであるか、手ぶれシーンであるかの判別を行な
う。これにより、動きベクトル情報単独ではカメラワー
クの判別ができないシーンであっても、色情報による区
間検出を考慮することにより、カメラワークシーンであ
るか否かの判別を行なうことが可能になる。On the other hand, when frames change slowly, there is little change relative to the neighboring frames before and after, so the motion vector information carries few features of camerawork and the precision of camerawork detection is low. Therefore, within a scene “N” in which the camerawork cannot be determined, as shown in FIG. 10, if the key frame extraction means 2 has divided the scene into a predetermined number of sections or more, the determination correction means 3 corrects the determination to a camerawork scene that contains camera shake. In response, the camerawork extraction means 1 widens the interval between motion compensation frames within this scene (variable inter-frame motion compensation) to determine whether it is a pan or zoom camerawork scene or a camera-shake scene. As a result, even for a scene whose camerawork cannot be determined from the motion vector information alone, taking the color-based section detection into account makes it possible to determine whether or not it is a camerawork scene.
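
As a non-limiting sketch, the correction rules of paragraphs [0062] and [0063] might be expressed as follows. The labels, the threshold value `SECTION_THRESHOLD`, the function name `correct`, and the choice to merge over-detected sections into a single section are all assumptions made for illustration; the specification only speaks of a "predetermined value" and of putting the sections together.

```python
from typing import NamedTuple

SECTION_THRESHOLD = 3  # hypothetical stand-in for the "predetermined value"


class Corrected(NamedTuple):
    label: str      # corrected scene label
    sections: int   # section count after correction
    mc_mode: str    # motion-compensation frame interval to use next


def correct(camerawork: str, sections: int) -> Corrected:
    """Apply the correction rules to one scene.

    camerawork: 'F' fixed, 'P' pan, 'Z' zoom, 'N' undetermined.
    sections:   number of sections found by color-based key frame extraction.
    """
    if camerawork == "F":
        # No motion vectors and no color change: accept as a fixed scene.
        return Corrected("fixed", sections, "unchanged")
    if camerawork in ("P", "Z"):
        # Fast change: trust the motion information and merge the
        # over-detected sections (merging into one is an assumption).
        return Corrected("camerawork", 1, "minimum-interval")
    if camerawork == "N" and sections >= SECTION_THRESHOLD:
        # Slow change, but the color information keeps splitting the scene.
        return Corrected("camerawork-with-shake", sections, "variable-interval")
    return Corrected("undetermined", sections, "unchanged")
```

For example, `correct("N", 3)` would yield the label `"camerawork-with-shake"` with the `"variable-interval"` mode, after which the camerawork extraction would be rerun with a wider frame interval.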

【００６４】キーフレーム抽出手段２による一区間当り
のフレーム変化量は、変化の速さに関係なく等しい。そ
こで、対象物（オブジェクト）が近くにある場合や、カ
メラワークシーンの検出が速く行なわれる場合に、キー
フレーム抽出手段２で検出した一区間の長さから、単位
時間当りのフレーム変化（映像変化）の激しさを検出す
ることも可能になる。この映像変化の激しさは、カメラ
ワーク抽出手段１で抽出されるカメラワークシーンの検
出やぶれの程度と共に、映像コンテンツ自動構造化手段
４において、各シーン毎の映像の見易さの判定を決める
基準となる。The amount of frame change per section detected by the key frame extraction means 2 is the same regardless of how fast the change occurs. Therefore, when the object is close to the camera or when a camerawork scene is detected quickly, the intensity of the frame change (video change) per unit time can also be detected from the length of one section detected by the key frame extraction means 2. Together with the camerawork scene detection and the degree of shake extracted by the camerawork extraction means 1, this intensity of video change serves as a criterion by which the video content automatic structuring means 4 judges the ease of viewing of the video in each scene.
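
Because each section represents the same amount of frame change, the intensity of change per unit time is inversely proportional to the section length. A minimal sketch of that relation follows; the frame rate `FPS`, the threshold, and both function names are hypothetical and do not appear in the specification.

```python
FPS = 30.0  # hypothetical frame rate of the source video


def change_intensity(section_frames: int, fps: float = FPS) -> float:
    """Return sections per second, the inverse of one section's duration.

    Each section stands for the same amount of frame change, so a shorter
    section means a more intense change per unit time.
    """
    if section_frames <= 0:
        raise ValueError("a section must contain at least one frame")
    return fps / section_frames


def is_fast_change(section_frames: int, threshold: float = 2.0,
                   fps: float = FPS) -> bool:
    """True when the change intensity exceeds a (hypothetical) threshold."""
    return change_intensity(section_frames, fps) > threshold
```

Under these assumptions, a 10-frame section counts as a fast change while a 30-frame section does not.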

【００６５】表２は、こうした映像の見易さの判定を行
なう際の評価基準を、レベル０〜レベル３の４段階に定
めた例を示している。Table 2 shows an example in which the evaluation criteria for determining the visibility of the image are set to four levels, level 0 to level 3.

【００６６】[0066]

【表２】 [Table 2]

【００６７】なお、上記表２において、手ぶれシーンと
ひどいぶれシーンは前記図４に示すようにカメラワーク
抽出手段１で区別される。すなわち、レベル０やレベル
１の評価基準にある「手ぶれ」とは、映像の内容は分か
るが見ずらい手ぶれシーンのことで、レベル３の評価基
準にある「ぶれ」とは、映像の内容理解が困難なほどぶ
れているひどいぶれシーンのことである。さらに、レベ
ル２にある「ある閾値を超えた速い映像変化」とは、前
記キーフレーム抽出手段２で分割される区間の時間長さ
により検出できる。In Table 2 above, camera-shake scenes and severe-shake scenes are distinguished by the camerawork extraction means 1, as shown in FIG. 4. That is, “camera shake” in the criteria for level 0 and level 1 refers to a camera-shake scene whose content is understandable but hard to watch, whereas “shake” in the criterion for level 3 refers to a severe-shake scene that is shaken so badly that the content of the video is difficult to understand. Furthermore, the “fast video change exceeding a certain threshold” in level 2 can be detected from the time length of the sections divided by the key frame extraction means 2.
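
Table 2 itself is not reproduced in this text, so the following is only a sketch of how a four-level judgment could look. It assumes that level 0 means no camera shake, level 1 means camera shake that is still watchable, level 2 means a fast video change, and level 3 means severe shake. The priority order of the checks, the threshold value, and the function name are assumptions and not the criteria of Table 2.

```python
def evaluation_level(shake: str, section_seconds: float,
                     fast_threshold: float = 0.5) -> int:
    """Return an assumed ease-of-viewing level from 0 (best) to 3 (worst).

    shake:           'none', 'hand' (understandable but hard to watch),
                     or 'severe' (content hard to understand).
    section_seconds: time length of a section from key frame extraction.
    fast_threshold:  hypothetical limit below which the change is "fast".
    """
    if shake not in ("none", "hand", "severe"):
        raise ValueError("unknown shake class: " + shake)
    if shake == "severe":
        return 3
    if section_seconds < fast_threshold:
        return 2
    if shake == "hand":
        return 1
    return 0
```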

【００６８】図１１は、上記レベル０〜レベル３の評価
基準により、映像の見易さに基づいて映像コンテンツを
自動的に構造化した例を示す概略図である。同図におい
て、「シーンチェンジ」とあるのは、いわゆるカット点
で、カメラの電源オン／オフに伴うシーンの変わり目で
ある。ここでは、カット点を挟んで３つのショットＡ，
Ｂ，Ｃが存在する。図１１の最上段に示すように、映像
コンテンツはシーンチェンジとシーン検出により分割さ
れ、分割された各シーン毎に、前記レベル０〜レベル３
の評価が行なわれる。こうすると、映像の見易さに基づ
く評価レベルで、各ショットＡ，Ｂ，Ｃ毎に映像コンテ
ンツを自動的に構造化することができる。FIG. 11 is a schematic diagram showing an example in which video content is automatically structured according to ease of viewing, using the evaluation criteria of level 0 to level 3 above. In the figure, a “scene change” is a so-called cut point, that is, a scene boundary caused by switching the camera power on or off. Here, three shots A, B, and C exist, separated by cut points. As shown in the top row of FIG. 11, the video content is divided by scene changes and scene detection, and each resulting scene is evaluated on the scale of level 0 to level 3. In this way, the video content can be automatically structured for each of the shots A, B, and C by evaluation level based on ease of viewing.

【００６９】映像コンテンツ自動構造化手段４は、外部
からの要求により、同じ評価レベルまたはある評価レベ
ルまでのシーンだけを抽出し、その中から代表静止画
（例えば、シーンの最初と最後、中間、あるいは最初だ
けの静止画）を並べて表示するものである。具体的に
は、レベル０のシーンを抽出すれば、構造化された映像
コンテンツの中から、カメラ固定または手ぶれを含まな
いカメラワークシーンの代表静止画だけを速やかに表示
することができる。逆にレベル１〜レベル３のシーンの
代表静止画を表示し、そこにある不要なシーンの幾つか
を選択して、元の映像コンテンツからカットすることも
簡単に行なえる。その場合に、映像コンテンツの全てを
時間的に追う必要がなく、映像コンテンツにおける編集
作業の短縮化を図ることができる。In response to an external request, the video content automatic structuring means 4 extracts only the scenes of a given evaluation level, or up to a given evaluation level, and displays representative still images from them side by side (for example, the first, last, and middle frames of each scene, or only the first). Specifically, extracting the level 0 scenes makes it possible to quickly display, from the structured video content, only the representative still images of fixed-camera scenes and of camerawork scenes without camera shake. Conversely, it is also easy to display the representative still images of the level 1 to level 3 scenes, select some of the unnecessary scenes among them, and cut them from the original video content. In either case there is no need to follow the entire video content in time order, so the editing work on the video content can be shortened.
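
The extraction described in paragraph [0069] can be sketched as a filter over scene records followed by a choice of representative frames. The `Scene` record, its field names, and the two functions are hypothetical illustrations rather than the structuring means 4 itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Scene:
    scene_id: int
    first: int   # index of the first frame
    last: int    # index of the last frame (inclusive)
    level: int   # evaluation level, 0 (easiest to watch) to 3


def select(scenes: List[Scene], level: Optional[int] = None,
           up_to: Optional[int] = None) -> List[Scene]:
    """Keep scenes of exactly `level`, or of any level <= `up_to`."""
    if level is not None:
        return [s for s in scenes if s.level == level]
    if up_to is not None:
        return [s for s in scenes if s.level <= up_to]
    return list(scenes)


def representative_frames(scene: Scene, first_only: bool = False) -> List[int]:
    """Return the first, middle, and last frame indices, or only the first."""
    if first_only:
        return [scene.first]
    middle = (scene.first + scene.last) // 2
    return [scene.first, middle, scene.last]
```

With such records, selecting `up_to=1` would return the level 0 and level 1 scenes in their original order, and only a handful of frame indices per scene would need to be rendered as still images.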

【００７０】次に、上記図１の構成について、映像コン
テンツの編集サービスを提供する一連の手順を、図１２
のフローチャートに基づき説明する。先ずステップＳ41
において、ビデオコンテンツ編集支援システム71が編集
対象となる素材映像を原映像として受取装置81より取得
すると、その原映像をビデオコンテンツ編集支援システ
ム71内で取扱えるＭＰＥＧ２方式のフォーマットに変換
する（ステップＳ42）。なお、原映像が予めＭＰＥＧ２
方式の圧縮映像コンテンツである場合は、ステップＳ42
の手順を省略できる。次にステップＳ43において、この
ＭＰＥＧ２方式に圧縮化された原映像の内容を、映像内
容解析装置82で解析する。具体的には、圧縮化された原
映像に含まれる動きベクトルに基づいて、シーンチェン
ジとカメラワークを含むシーンの切り替わりにより個々
のシーンを切り分け、各シーンについて、映像の見易さ
を例えば評価レベル０〜評価レベル３の４段階のいずれ
かに判定して行く。区画された各シーンは映像蓄積装置
84に順次蓄積されると共に、これらのシーンに対応した
ＩＤ番号と映像の見易さの判定結果（評価レベル）が、
映像ＩＤ蓄積装置85に記憶される。これにより、各シー
ンの動画像情報に、撮影者が意識的に行なった行為のパ
ラメータが評価レベルとして付加されるが、評価レベル
のデータが付加されること自体は、さほど情報量が増え
ない。むしろこの評価レベルのデータを利用して、原画
像の編集時にエンドユーザーに提示する情報量を減らす
ことの方が、エンドユーザーにとって膨大なデータを扱
わずに済み、利点が大きい。Next, for the configuration of FIG. 1 above, a series of steps for providing a video content editing service will be described with reference to the flowchart of FIG. 12. First, in step S41, the video content editing support system 71 acquires the material video to be edited from the receiving device 81 as the original video, and then converts it into the MPEG2 format that can be handled within the video content editing support system 71 (step S42). If the original video is already MPEG2-compressed video content, step S42 can be omitted. Next, in step S43, the content of the MPEG2-compressed original video is analyzed by the video content analysis device 82. Specifically, based on the motion vectors contained in the compressed original video, individual scenes are separated at scene transitions, which include scene changes and changes in camerawork, and the ease of viewing of each scene is judged as, for example, one of four levels from evaluation level 0 to evaluation level 3. The separated scenes are stored one after another in the video storage device 84, and the ID numbers corresponding to these scenes, together with the ease-of-viewing results (evaluation levels), are stored in the video ID storage device 85. As a result, a parameter reflecting what the photographer consciously did is attached to the moving image information of each scene as an evaluation level, yet attaching the evaluation level data does not in itself add much information. Rather, using the evaluation level data to reduce the amount of information presented to the end user when editing the original image is a major advantage, because the end user does not have to handle a huge amount of data.
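
The scene separation of step S43 can be illustrated with a short sketch. It assumes that a per-frame camerawork label has already been derived from the motion vectors and that the cut positions are known. Neither that derivation nor the function name `split_scenes` comes from the specification.

```python
from typing import FrozenSet, List, Sequence, Tuple


def split_scenes(labels: Sequence[str],
                 cuts: FrozenSet[int] = frozenset()) -> List[Tuple[int, int, str]]:
    """Split frames into scenes at label changes and at cut points.

    labels[i] is the (assumed, precomputed) camerawork label of frame i.
    cuts holds the frame indices at which a new shot starts.
    Returns (first_frame, last_frame, label) for each scene.
    """
    scenes = []
    start = 0
    for i in range(1, len(labels) + 1):
        at_end = i == len(labels)
        if at_end or labels[i] != labels[start] or i in cuts:
            scenes.append((start, i - 1, labels[start]))
            start = i
    return scenes
```

Each returned tuple could then be stored with an ID number and an evaluation level, which is the small per-scene record that the paragraph above contrasts with the much larger video data.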

【００７１】こうして原画像の各シーンが、映像の見易
さに応じた評価レベルに自動的に構造化されると、ビデ
オコンテンツ編集支援システム71は原画像を編集する上
でどのような映像を提示してほしいのかを、情報ネット
ワーク72を通じて端末73上で表示させる。その際、ビデ
オコンテンツ編集支援システム71は、各評価レベルのシ
ーンの時間を端末73に時間情報として提供し、エンドユ
ーザーがどの程度の評価レベルまでのシーンを見たらよ
いのかを、事前に判断できるようにする。これを受けて
端末73から必要な条件を入力すると、その条件に合致す
る評価レベルのシーンが映像蓄積手段84より抽出され、
この特定のシーンの代表的な静止画が内容提示手段83か
ら端末73に送り出される（ステップＳ44，Ｓ45）。Once each scene of the original image has been automatically structured into an evaluation level according to ease of viewing in this way, the video content editing support system 71 causes the terminal 73 to display, through the information network 72, a prompt asking what kind of video should be presented for editing the original image. At that time, the video content editing support system 71 provides the terminal 73 with the scene time of each evaluation level as time information, so that the end user can judge in advance up to which evaluation level the scenes should be viewed. When the necessary conditions are then input from the terminal 73, the scenes whose evaluation level matches those conditions are extracted from the video storage means 84, and representative still images of these specific scenes are sent from the content presenting means 83 to the terminal 73 (steps S44 and S45).
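
The time information provided to the terminal can be pictured as a total duration per evaluation level. The sketch below assumes scenes are given as frame ranges at a fixed frame rate; the tuple layout, the default `fps`, and the function names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def time_per_level(scenes: Iterable[Tuple[int, int, int]],
                   fps: float = 30.0) -> Dict[int, float]:
    """Sum scene durations in seconds for each evaluation level.

    scenes: (first_frame, last_frame, level) tuples, last_frame inclusive.
    """
    totals = defaultdict(float)
    for first, last, level in scenes:
        totals[level] += (last - first + 1) / fps
    return dict(sorted(totals.items()))


def time_up_to(totals: Dict[int, float], level: int) -> float:
    """Total viewing time if scenes up to `level` are all presented."""
    return sum(sec for lv, sec in totals.items() if lv <= level)
```

An end user who finds the level 0 total too short could compare it with `time_up_to(totals, 1)` before deciding which condition to enter.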

【００７２】例えば評価レベル０のシーンだけを見たい
という条件を端末73から入力すると、内容提示手段83は
評価レベル０のシーンを順に抽出して、各シーンにおけ
る代表的な静止画を端末73に提示する。また、例えば評
価レベル０のシーン時間が余りにも短かすぎる場合に
は、評価レベル０と評価レベル１のシーンだけを見たい
という条件を端末73から入力する。この場合、内容提示
手段83は評価レベル０や評価レベル１のシーンを順に抽
出して、各シーンにおける代表的な静止画を端末73に提
示する。なお、どのような順番で静止画を配列表示する
のかも、端末73から適宜入力することができる。こうし
て、エンドユーザーは予め不要なシーンを切り捨て、必
要となるシーンの代表的静止画だけを閲覧することで、
提示する情報量の削減を図ることが可能になる。For example, when the condition that only evaluation level 0 scenes are to be viewed is input from the terminal 73, the content presenting means 83 extracts the evaluation level 0 scenes in order and presents a representative still image of each scene to the terminal 73. If, for example, the scene time at evaluation level 0 is too short, the condition that only evaluation level 0 and evaluation level 1 scenes are to be viewed is input from the terminal 73. In this case, the content presenting means 83 extracts the evaluation level 0 and evaluation level 1 scenes in order and presents a representative still image of each scene to the terminal 73. The order in which the still images are arranged and displayed can also be specified from the terminal 73 as appropriate. In this way, the end user discards unnecessary scenes in advance and browses only the representative still images of the scenes that are needed, which makes it possible to reduce the amount of information presented.

【００７３】また別の手順として、端末73からの条件入
力に拘らず、予め安定したカメラワーク区間である評価
レベル０のシーンだけを、端末73に自動的に提示するよ
うにしてもよい。その際、内容提示装置83が特定のシー
ンを動画像としてそのまま提示してもよいが、端末73に
送出する情報量を極力減らすために、ここでは実施例の
ように特定のシーンを代表する画像（静止画）や、前記
時間情報を提示するのが好ましい。さらに、似たような
特定のシーンが抽出された場合は、各シーンを区別する
ためにインデックスを付加するのが好ましい。As another procedure, regardless of any condition input from the terminal 73, only the evaluation level 0 scenes, which are stable camerawork sections, may be presented to the terminal 73 automatically in advance. In doing so, the content presentation device 83 may present the specific scenes as moving images as they are, but in order to minimize the amount of information sent to the terminal 73, it is preferable to present images (still images) representing the specific scenes, as in this embodiment, together with the time information. Furthermore, when similar specific scenes are extracted, it is preferable to attach an index to distinguish the scenes from one another.

【００７４】特定のシーンにおける代表的な静止画が端
末73に表示されると、エンドユーザーは各評価レベル毎
のシーン時間や、インデックスなども参照して、原画像
に対してどのような編集を行なうのかを、編集命令とし
て入力する。すなわちステップＳ46において、ビデオコ
ンテンツ編集支援システム71を構成する編集コマンド受
付装置88が、端末73からの編集命令を受付けると、映像
構成装置89はこの編集命令に基づいて全ての原画像の中
から必要なシーンだけを映像蓄積装置84から抽出し、こ
れらの各シーンを適宜つなぎ合わせると共に、必要に応
じてここに付加データ（他の映像，音，字幕，エフェク
トなど）を挿入した編集候補映像を自動的に生成する。
そして、編集済候補映像は、情報ネットワーク72を経由
してユーザーの保有する端末73に送り出される（ステッ
プＳ47）。When the representative still images of the specific scenes are displayed on the terminal 73, the end user, also referring to the scene time for each evaluation level, the indexes, and so on, inputs as an editing command what kind of editing is to be performed on the original image. That is, in step S46, when the edit command receiving device 88 of the video content editing support system 71 receives the editing command from the terminal 73, the video composition device 89, based on this editing command, extracts only the necessary scenes out of the whole original image from the video storage device 84, joins these scenes together as appropriate, and automatically generates an edited candidate video into which additional data (other video, sound, subtitles, effects, and the like) is inserted as necessary. The edited candidate video is then sent to the terminal 73 held by the user via the information network 72 (step S47).
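
One way to picture steps S46 and S47 is as the construction of a timeline from an editing command. In this sketch the command format, the `EditCommand` record, and the dictionary standing in for the video storage device 84 are all hypothetical. The specification does not define a concrete command syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class EditCommand:
    scene_ids: List[int]   # scenes to keep, in playback order
    extras: Dict[int, str] = field(default_factory=dict)  # scene_id -> additional data


def build_candidate(store: Dict[int, Tuple[int, int]],
                    command: EditCommand) -> List[dict]:
    """Assemble an edited candidate as an ordered list of clips.

    store maps scene_id -> (first_frame, last_frame) and stands in for the
    stored scenes; no actual video data is handled here.
    """
    timeline = []
    for sid in command.scene_ids:
        clip = {"scene": sid, "frames": store[sid]}
        if sid in command.extras:
            clip["extra"] = command.extras[sid]   # e.g. a subtitle or effect
        timeline.append(clip)
    return timeline
```

A command such as `EditCommand([3, 1], {1: "subtitle: Arrival"})` would produce a two-clip timeline with scene 3 first and a subtitle attached to scene 1.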

【００７５】続くステップＳ48において、端末73側では
送られてきた編集済候補映像を確認すると共に、映像構
成装置89が行なった編集処理の内容について追加や修正
などの変更がある場合は、その旨を変更命令としてビデ
オコンテンツ編集支援システム71に送り出す。これを編
集コマンド受付手段88が受付けると、映像構成装置89は
変更命令に基づく新たな編集済候補映像を再生成する。
そして、この変更した編集済候補映像が情報ネットワー
ク72を経由して端末73に送り出される（ステップＳ4
9）。ステップＳ48〜ステップＳ49の手順を繰り返すこ
とにより、エンドユーザーは簡単なコマンドを端末73に
入力するだけで、所望の編集映像を得ることができる。In the following step S48, the edited candidate video that was sent is checked on the terminal 73 side, and if there are changes such as additions or corrections to the editing process performed by the video composition device 89, they are sent to the video content editing support system 71 as a change command. When the edit command receiving means 88 receives this command, the video composition device 89 regenerates a new edited candidate video based on the change command. The changed edited candidate video is then sent to the terminal 73 via the information network 72 (step S49). By repeating steps S48 and S49, the end user can obtain the desired edited video simply by entering simple commands at the terminal 73.

【００７６】こうして、所望の編集映像が得られたら、
ステップＳ48において、エンドユーザーはもはや編集処
理の変更の必要がない旨を、編集処理完了の確認命令と
して端末73側から情報ネットワーク72を経由してビデオ
コンテンツ編集支援システム71に送り出す。これを受け
て映像構成装置89は、編集処理の確認が得られた編集済
候補映像をユーザーが希望するフォーマットに変換し、
編集済映像として情報ネットワーク72から端末73に配信
する（ステップＳ50）。Once the desired edited video has been obtained in this way, in step S48 the end user sends from the terminal 73, via the information network 72, a command confirming completion of the editing process to the video content editing support system 71, indicating that no further change to the editing process is needed. In response, the video composition device 89 converts the confirmed edited candidate video into the format desired by the user and distributes it as the edited video from the information network 72 to the terminal 73 (step S50).
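
The exchange of steps S46 to S50 amounts to a small session loop: each edit or change command regenerates the candidate, and the completion command converts the latest candidate and ends the session. The class below is a sketch under that reading. The command kinds, the `render` and `convert` callables, and the class name are hypothetical.

```python
from typing import Callable, Optional


class EditSession:
    """Tracks the most recent edited candidate until completion is confirmed."""

    def __init__(self, render: Callable[[str], str],
                 convert: Callable[[str, str], str]) -> None:
        self.render = render      # builds a candidate from a command (S47, S49)
        self.convert = convert    # converts a candidate to a format (S50)
        self.candidate: Optional[str] = None
        self.done = False

    def handle(self, kind: str, payload: str) -> str:
        if self.done:
            raise RuntimeError("session already completed")
        if kind in ("edit", "change"):
            # Regenerate and present a new candidate.
            self.candidate = self.render(payload)
            return self.candidate
        if kind == "complete":
            if self.candidate is None:
                raise RuntimeError("no candidate has been presented yet")
            self.done = True
            # Deliver the candidate presented immediately before.
            return self.convert(self.candidate, payload)
        raise ValueError("unknown command kind: " + kind)
```

Only short commands travel from the terminal to the center, which is consistent with the point that the terminal needs no editing software of its own.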

【００７７】上記一連の手順において、エンドユーザー
は、映像の見易さに応じた特定の評価レベルのシーンか
ら代表的な静止画を確認し、予め提供した原画像をどの
ように編集するのかを端末73から操作入力すると共に、
送られてきた編集候補画像を確認して、どのように編集
処理を変更してほしいのかを、同じく端末から操作入力
するだけでよい。原画像を受付けてから編集済映像を配
信するまでの各種装置81〜89が、全てセンター装置とし
てのビデオコンテンツ編集支援システム71内に設けられ
ていると共に、このビデオコンテンツ編集支援システム
71と端末73が通信手段である情報ネットワークで接続さ
れているため、端末73には原画像を編集する際に必要な
ソフトウェア上の機能を一切保有する必要がなく、大掛
かりなソフトウェアを各端末73毎に組み込む必要がな
い。In the above series of steps, the end user only has to check the representative still images from the scenes of a particular evaluation level according to ease of viewing, enter at the terminal 73 how the original image provided beforehand is to be edited, then check the edited candidate video that is sent and likewise enter at the terminal how the editing process should be changed. The devices 81 to 89 that handle everything from receiving the original image to distributing the edited video are all provided in the video content editing support system 71, which serves as the center device, and this video content editing support system 71 and the terminal 73 are connected by an information network serving as communication means. The terminal 73 therefore does not need to have any of the software functions required for editing the original image, and there is no need to install large-scale software on each terminal 73.

【００７８】以上のように本実施例によれば、編集対象
となる原映像を受取る受取手段としての受取装置81と、
原映像の内容を解析する映像内容解析手段としての映像
内容解析装置82と、映像内容解析装置82の解析内容から
特定のシーンを抽出して提示する内容提示手段としての
内容提示装置83と、この特定のシーンに基づく編集命令
を受付ける編集命令受付手段（編集コマンド受付装置8
8）と、この編集命令に沿って前記原映像を編集処理し
た編集済候補映像を提示する編集済候補映像提示手段
（映像構成装置89）と、映像構成装置89が行なった原画
像の編集処理に関す変更を受付けると、この変更した編
集処理による新たな編集済候補映像を提示させる編集処
理変更受付手段（編集コマンド受付装置88）と、編集処
理の完了命令を受付けると、その直前に提示した編集済
候補映像を所望のフォーマットに変換して、編集済映像
として配信する編集済映像配信手段（映像構成装置89）
とを備えて構成される。As described above, this embodiment comprises: the receiving device 81 as receiving means for receiving the original video to be edited; the video content analysis device 82 as video content analyzing means for analyzing the content of the original video; the content presentation device 83 as content presenting means for extracting and presenting a specific scene from the analysis result of the video content analysis device 82; editing command receiving means (the edit command receiving device 88) for receiving an editing command based on the specific scene; edited candidate video presenting means (the video composition device 89) for presenting an edited candidate video obtained by editing the original video in accordance with the editing command; editing process change receiving means (the edit command receiving device 88) for, upon receiving a change to the editing process that the video composition device 89 performed on the original image, causing a new edited candidate video produced by the changed editing process to be presented; and edited video distribution means (the video composition device 89) for, upon receiving an editing process completion command, converting the edited candidate video presented immediately before into a desired format and distributing it as the edited video.

【００７９】この場合、取得した原映像の内容が解析さ
れ、その中から抽出した特定のシーンがエンドユーザー
に提示される。エンドユーザーは提示された特定のシー
ンを確認しながら、原画像に対しどのような編集を施せ
ばよいのかを編集命令として送り出すと、今度はこの編
集命令に沿って原画像を編集処理した編集済候補映像が
エンドユーザーに提示される。エンドユーザーからの編
集処理の変更を受付けると、この変更した編集処理によ
る新たな編集済候補映像をエンドユーザーに提示すると
共に、編集処理の完了命令を受付けると、その直前に提
示した編集済候補映像を所望のフォーマットに変換し
て、これを編集済映像として配信する。In this case, the content of the acquired original video is analyzed, and a specific scene extracted from it is presented to the end user. When the end user, while checking the presented specific scene, sends as an editing command what kind of editing should be applied to the original image, an edited candidate video obtained by editing the original image in accordance with the editing command is in turn presented to the end user. When a change to the editing process is received from the end user, a new edited candidate video produced by the changed editing process is presented to the end user, and when an editing process completion command is received, the edited candidate video presented immediately before is converted into a desired format and distributed as the edited video.

【００８０】このように、エンドユーザー側で原映像を
編集するに際しては、原映像そのものではなく、原映像
の内容を解析して得られた特定のシーンや、編集処理を
行なった編集済候補映像を見ながら、編集に必要なコマ
ンドを適宜送り出すだけでよい。そのため、映像コンテ
ンツ（原映像）そのもののを目で追いながらその内容を
評価する手間が省け、エンドユーザーの意志で映像コン
テンツを短時間かつ自由に編集することが可能になる。Thus, when editing the original video, the end user only has to send the commands needed for editing as appropriate while viewing not the original video itself but the specific scenes obtained by analyzing its content and the edited candidate video produced by the editing process. This eliminates the effort of following the video content (original video) itself by eye and evaluating it, and allows the end user to edit the video content freely and in a short time as intended.

【００８１】そしてこのような作用効果は、編集対象と
なる原映像を受取り、この原映像の解析内容により抽出
した特定のシーンを提示し、この特定のシーンに基づく
編集命令を受付けると、この編集命令に沿って原映像を
編集処理した編集済候補映像を提示し、編集処理の変更
を受付けると、この変更した編集処理による新たな編集
済候補映像を提示し、編集処理の完了命令を受付ける
と、その直前に提示した編集済候補映像を所望のフォー
マットに変換して、編集済映像として配信する方法でも
達成できる。The same effect can also be achieved by a method in which an original video to be edited is received; a specific scene extracted on the basis of the analysis of the original video is presented; when an editing command based on the specific scene is received, an edited candidate video obtained by editing the original video in accordance with the editing command is presented; when a change to the editing process is received, a new edited candidate video produced by the changed editing process is presented; and when an editing process completion command is received, the edited candidate video presented immediately before is converted into a desired format and distributed as the edited video.

【００８２】また本実施例では、原映像を複数のシーン
に区画して、各シーン毎の映像の見易さを複数の評価レ
ベルの中から判定するように映像内容解析装置82を構成
し、要求のあった条件に合致する評価レベルのシーンを
特定のシーンとして提示するように内容提示装置83を構
成している。Further, in the present embodiment, the original image is divided into a plurality of scenes, and the image content analysis device 82 is configured so as to judge the visibility of the image for each scene from a plurality of evaluation levels, The content presentation device 83 is configured to present a scene of an evaluation level that matches the requested condition as a specific scene.

【００８３】この場合、受取った原映像を複数のシーン
に区画し、各シーン毎の映像の見易さを複数の評価レベ
ルの中から判定する。そして、エンドユーザーから要求
した条件に合致する評価レベルのシーンが、特定のシー
ンとしてエンドユーザーに提示される。したがって、エ
ンドユーザーはいちいち原画像における各シーンの評価
を行なうことなく、必要な評価レベルのシーンを編集に
先立ち確認することができる。In this case, the received original video image is divided into a plurality of scenes, and the visibility of the video image for each scene is judged from a plurality of evaluation levels. Then, the scene of the evaluation level that matches the condition requested by the end user is presented to the end user as a specific scene. Therefore, the end user can confirm the scene of the required evaluation level before editing without evaluating each scene in the original image.

【００８４】そしてこのような作用効果は、原画像の解
析内容により抽出した特定のシーンを提示するに際し
て、原映像を複数のシーンに区画して、各シーン毎の映
像の見易さを複数の評価レベルの中から判定すると共
に、要求のあった条件に合致する評価レベルのシーンを
前記特定のシーンとして提示する方法でも達成できる。The same effect can also be achieved by a method in which, when presenting the specific scene extracted on the basis of the analysis of the original image, the original video is divided into a plurality of scenes, the ease of viewing of the video in each scene is judged from among a plurality of evaluation levels, and a scene whose evaluation level matches the requested condition is presented as the specific scene.

【００８５】さらに本実施例の内容提示装置83は、特定
のシーンの中の代表的な静止画を提示するように構成し
ている。こうすると、エンドユーザーは必要となる特定
のシーンの代表的静止画だけを確認すればよく、特定の
シーンとして提示する情報量の削減を図ることが可能に
なる。Further, the content presentation device 83 of this embodiment is configured to present a typical still image in a specific scene. In this case, the end user only needs to confirm the representative still image of the required specific scene, and it is possible to reduce the amount of information presented as the specific scene.

【００８６】そしてこのような作用効果は、特定のシー
ンの提示に際して、この特定のシーンの中の代表的な静
止画を提示することでも達成される。The above-described effects can also be achieved by presenting a typical still image in the specific scene when presenting the specific scene.

【００８７】因みに、本実施例におけるビデオコンテン
ツ編集支援システム71によれば、様々な使用環境の下
で、各種サービスの提供が可能になる。例えば、旅行代
理店などとの提携により、空港や駅などでユーザーであ
る旅行者からのビデオテープなどの映像記憶媒体を預か
り、そこに設置された入力装置（図示せず）から、情報
ネットワーク72を介してビデオコンテンツ編集支援シス
テム71の受取り手段81に素材ビデオデータ（原映像）を
転送する。原映像を取得したリモート編集センター側の
ビデオコンテンツ編集支援システム71は、映像内容解析
装置82にてその内容を解析し、例えばインターネット環
境を通じてユーザーに提示する。これによりユーザーは
端末73より好みの編集処理を行ない、最終的な編集済映
像を所望のフォーマットで受け取ることが可能になる。Incidentally, the video content editing support system 71 of this embodiment makes it possible to provide various services in a variety of usage environments. For example, in partnership with a travel agency or the like, a video storage medium such as a videotape is received from a traveler, who is the user, at an airport, a station, or the like, and the material video data (original video) is transferred from an input device (not shown) installed there to the receiving means 81 of the video content editing support system 71 via the information network 72. Having acquired the original video, the video content editing support system 71 on the remote editing center side analyzes its content with the video content analysis device 82 and presents the result to the user, for example over the Internet. The user can then perform the desired editing from the terminal 73 and receive the final edited video in a desired format.

【００８８】その他に、本実施例のビデオコンテンツ編
集支援システム71は、運動会，卒業式，結婚式，公演，
選挙用の演説などの各種イベントにおける撮影済原映像
の編集支援システムとしても利用できる。In addition, the video content editing support system 71 of this embodiment can also be used as an editing support system for original video shot at various events such as athletic meets, graduation ceremonies, weddings, performances, and election speeches.

【００８９】本発明は上記実施例に限定されるものでは
なく、本発明の要旨の範囲において種々の変形実施が可
能である。なお、上記説明にある閾値は全て同一の値で
ある必要はなく、各判定条件毎に最適な値が設定される
ものである。The present invention is not limited to the above embodiments, but various modifications can be made within the scope of the gist of the present invention. The thresholds described above do not all have to be the same value, and an optimum value is set for each determination condition.

【００９０】[0090]

【発明の効果】本発明の請求項１の映像コンテンツ編集
支援装置および請求項４の映像コンテンツ編集支援方法
によれば、エンドユーザーの意志で映像コンテンツを短
時間かつ自由に編集できる。According to the video content editing support device of the first aspect and the video content editing support method of the fourth aspect of the present invention, the video content can be freely edited in a short time at the will of the end user.

【００９１】本発明の請求項２の映像コンテンツ編集支
援装置および請求項５の映像コンテンツ編集支援方法に
よれば、エンドユーザー側で必要な評価レベルのシーン
を編集に先立ち確認することができる。According to the video content editing support device of the second aspect and the video content editing support method of the fifth aspect of the present invention, it is possible for the end user to confirm the scene of the required evaluation level prior to editing.

【００９２】本発明の請求項３の映像コンテンツ編集支
援装置および請求項６の映像コンテンツ編集支援方法に
よれば、特定のシーンとして提示する情報量の削減を図
ることが可能になる。According to the video content editing support device of claim 3 and the video content editing support method of claim 6, it is possible to reduce the amount of information presented as a specific scene.

[Brief description of drawings]

【図１】本発明の一実施例を示す映像コンテンツ編集支
援装置の全体構成をあらわした概略説明図である。FIG. 1 is a schematic explanatory view showing an overall configuration of a video content editing support device showing an embodiment of the present invention.

【図２】同上映像内容解析装置の全体構成をあらわした
概略説明図である。FIG. 2 is a schematic explanatory view showing the overall configuration of the video content analysis device of the above.

【図３】同上映像内容解析装置のより詳細な構成と手順
を示したフローチャートである。FIG. 3 is a flowchart showing a more detailed configuration and procedure of the video content analysis device of the above.

【図４】同上図３における区間特徴量抽出部のより詳細
な構成と手順を示したフローチャートである。FIG. 4 is a flowchart showing a more detailed configuration and procedure of the section feature amount extraction unit in FIG. 3 above.

【図５】同上動きベクトルの向きのクラスタリング方式
とフレーム内動きベクトルヒストグラム偏り度の算出方
式をあらわす概略図である。FIG. 5 is a schematic diagram showing a clustering method of motion vector directions and a calculation method of an intra-frame motion vector histogram bias degree.

【図６】同上ショットを複数の区間に分割に関する手順
を示す概略図である。FIG. 6 is a schematic diagram showing a procedure for dividing a shot into a plurality of sections.

【図７】同上分割された各区間からキーフレームを抽出
する手順を示す概略図である。FIG. 7 is a schematic diagram showing a procedure for extracting a key frame from each of the divided sections.

【図８】同上各フレーム毎のフレーム中央の輝度平均お
よび色差平均を示すグラフである。FIG. 8 is a graph showing an average luminance and an average color difference in the center of each frame of the same.

【図９】同上開始フレームとの類似度を、フレームの全
体とフレームの中央でそれぞれ示したグラフである。FIG. 9 is a graph showing the degree of similarity with the start frame of the same as above for the entire frame and the center of the frame.

【図１０】同上色情報を用いた区間検出と動き情報を用
いたカメラワーク検出との関係を示す概略図である。FIG. 10 is a schematic diagram showing the relationship between section detection using color information and camerawork detection using motion information.

【図１１】同上映像の見易さに基づいて映像コンテンツ
を自動的に構造化した例を示す概略図である。FIG. 11 is a schematic diagram showing an example in which video contents are automatically structured based on the visibility of the same as above.

【図１２】同上映像コンテンツ編集支援装置が行なう一
連の処理手順を示すフローチャートである。FIG. 12 is a flowchart showing a series of processing procedures performed by the same video content editing support device.

[Explanation of symbols]

81 受取装置（受取手段）
82 映像内容解析装置（映像内容解析手段）
83 内容提示装置（内容提示手段）
88 編集命令コマンド装置（編集命令受付手段，編集処理変更受付手段）
89 映像構成装置（編集済候補映像提示手段，編集済映像配信手段）
81 Receiving device (receiving means)
82 Video content analysis device (video content analyzing means)
83 Content presentation device (content presenting means)
88 Editing command device (editing command receiving means, editing process change receiving means)
89 Video composition device (edited candidate video presenting means, edited video distribution means)

フロントページの続き (72)発明者富永英義東京都新宿区大久保３−４−１早稲田大学理工学部内 (72)発明者小舘亮之東京都新宿区大久保３−４−１早稲田大学理工学部内 (72)発明者土橋健太郎東京都新宿区大久保３−４−１早稲田大学理工学部内 (72)発明者大串亮平東京都新宿区大久保３−４−１早稲田大学理工学部内 (72)発明者花村剛東京都新宿区大久保二丁目４番12号株式会社メディアグルー内Ｆターム(参考） 5B075 ND12 5C052 AA03 AB04 CC11 DD02 DD04 EE03 5C053 FA07 FA14 FA22 FA23 FA29 GB38 HA29 JA21 LA11 LA14 5C064 BA07 BB10 BC18 BC23 BD02 BD08 Continuation of front page (72) Inventor Hideyoshi Tominaga, Faculty of Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo (72) Inventor Akiyuki Kodate, Faculty of Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo (72) Inventor Kentaro Dobashi, Faculty of Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo (72) Inventor Ryohei Ogushi, Faculty of Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo (72) Inventor Takeshi Hanamura, Media Glue Corp, 2-4-12 Okubo, Shinjuku-ku, Tokyo F-term (reference) 5B075 ND12 5C052 AA03 AB04 CC11 DD02 DD04 EE03 5C053 FA07 FA14 FA22 FA23 FA29 GB38 HA29 JA21 LA11 LA14 5C064 BA07 BB10 BC18 BC23 BD02 BD08

Claims

[Claims]

1. A video content editing support device comprising: receiving means for receiving an original video to be edited; video content analyzing means for analyzing the content of the original video; content presenting means for extracting and presenting a specific scene from the analysis result of the video content analyzing means; editing command receiving means for receiving an editing command based on the specific scene; edited candidate video presenting means for presenting an edited candidate video obtained by editing the original video in accordance with the editing command; editing process change receiving means for, upon receiving a change to the editing process, causing the edited candidate video presenting means to present a new edited candidate video produced by the changed editing process; and edited video distribution means for, upon receiving an editing process completion command, converting the edited candidate video presented immediately before into a desired format and distributing it as an edited video.

2. The video content editing support device according to claim 1, wherein the video content analyzing means divides the original video into a plurality of scenes and judges the ease of viewing of the video in each scene from among a plurality of evaluation levels, and the content presenting means presents, as the specific scene, a scene whose evaluation level matches a requested condition.

3. The video content editing support device according to claim 1, wherein the content presenting means presents a representative still image in the specific scene.

4. A video content editing support method comprising: receiving an original video to be edited; presenting a specific scene extracted on the basis of the analysis of the original video; upon receiving an editing command based on the specific scene, presenting an edited candidate video obtained by editing the original video in accordance with the editing command; upon receiving a change to the editing process, presenting a new edited candidate video produced by the changed editing process; and upon receiving an editing process completion command, converting the edited candidate video presented immediately before into a desired format and distributing it as an edited video.

5. The video content editing support method according to claim 4, wherein, in presenting the specific scene extracted on the basis of the analysis of the original image, the original video is divided into a plurality of scenes, the ease of viewing of the video in each scene is judged from among a plurality of evaluation levels, and a scene whose evaluation level matches a requested condition is presented as the specific scene.

6. The video content editing support method according to claim 4, wherein a representative still image in the specific scene is presented when the specific scene is presented.