JPH0944639A

JPH0944639A - Method and device for classifying video blocks

Info

Publication number: JPH0944639A
Application number: JP7197416A
Authority: JP
Inventors: Yasumasa Niikura; 康巨新倉; Hiroshi Hamada; 洋浜田; Akito Akutsu; 明人阿久津; Yukinobu Taniguchi; 行信谷口
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1995-08-02
Filing date: 1995-08-02
Publication date: 1997-02-14
Anticipated expiration: 2015-08-02
Also published as: JP3358692B2

Abstract

PROBLEM TO BE SOLVED: To automatically classify a video into sets of similar shots without fragmenting it into individual shots and without disturbing temporal order. SOLUTION: First, a feature amount is calculated for each shot, and each shot is taken as a block in the initial state (step 111). The similarity between adjacent blocks is calculated (step 112), the pair of adjacent blocks showing the maximum similarity among the calculated similarities is merged into one block (step 113), and the feature amount of the merged block is calculated (step 114). Steps 112 to 114 are repeated until the classification is sufficient, so that the number of blocks gradually decreases.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to video data processing, and more particularly to a method and apparatus for classifying the image data sequence that constitutes a video and structuring it into blocks or scenes.

[0002]

[Prior Art] Video data is generally enormous in volume, yet the only way to learn its content has been to watch the entire video in chronological order. If a video is regarded as a set of temporally continuous image data, then classifying each piece of image data according to some measure should be useful for grasping an outline of the video or for skimming through it, and is expected to be very helpful for understanding the content in a short time. Hereinafter, classification of a video means dividing the video into a plurality of blocks along the time axis by detecting the hierarchical structural elements inherent in the video and bundling structural elements that belong to the same class. Viewed as the bundling of structural elements belonging to the same class, this can also be called structuring of the video.

[0003] In general, video is classified hierarchically into the units "frame", "take", "shot", "scene", and "story". This hierarchy covers everything from the physical level in hardware (lower level) to the semantic level produced by human creative work (upper level).

[0004] A "frame" is a physical unit corresponding to each individual frame of film when a video is shot. A "take" is a set of frames and is a unit denoting a temporally continuous video segment shot by the same camera; it reflects the camera being switched on and off (ON/OFF) during shooting.

[0005] A "shot", like a take, is a unit denoting a video segment shot by the same camera, but it refers to a segment selected from the takes when the video is edited, and is a semantic unit that reflects human intent more strongly than a take. When shots are combined through editing, they form the "story" of the video. That is, a take is a unit of video material, whereas a shot is a structural unit of a finished video work.

[0006] A "scene", on the other hand, is a set of shots in a video work that can be regarded as semantically the same situation, and serves as the unit for dividing the work into several blocks. Classifying scenes requires a great deal of human creative work, and the result varies with the interpretation. For example, one interpretation treats shots as the same scene if the characters are the same; another treats them as the same scene if the setting or location where they were shot is the same, no matter how far apart they are in time. Here, however, from the standpoint that scenes are classified in order to help the viewer grasp the content of the work, the following conditions are imposed on a scene.

Condition 1: A scene is a set of shots having similar image characteristics (hereinafter, the feature similarity condition).
Condition 2: A scene is a set of temporally adjacent, consecutive shots (hereinafter, the temporal continuity condition).

[0007] The most straightforward example of a scene satisfying these conditions is footage of the same subject shot from different viewpoints by several different cameras and assembled by switching among them in succession. This is a case in which a single scene is made up of multiple shots that are both similar and consecutive. By contrast, consider footage of the kind often seen in news programs, in which studio shots and on-location relay shots alternate. The studio shots are similar to one another and the on-location shots are similar to one another, but the similarity between studio and on-location footage is low, so they are not treated as the same scene. Because similar footage does not appear consecutively, such a case is regarded as a succession of entirely separate scenes: studio scene A, location scene A, studio scene B, location scene B, studio scene C, location scene C, and so on.

[0008] An ideal classification would have to take the story and similar factors into account, but at present this can only be done by hand, and the amount of work becomes so large that it is impractical except in special cases. Some form of automation is therefore needed for the classification of image data.

[0009] Several conventional techniques aim to structure video by classifying image data, based on the classification units described above, so that it is easy for users to use.

[0010] One example is the "video cut point detection method", which computes the rate of change between consecutive frames from the sum of the luminance differences at corresponding positions (x, y) in those frames and thereby detects cut points in the video. This can be regarded as a shot-level classification technique.
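The frame-difference idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the cited method itself: frames are assumed to be equal-sized 2-D lists of luminance values, the change rate is taken to be the mean absolute difference, and the names `frame_change_rate`, `detect_cuts`, and `threshold` are hypothetical.

```python
from typing import List, Sequence

# Assumed layout: a frame is a grid of luminance values, rows x columns.
Frame = Sequence[Sequence[int]]


def frame_change_rate(prev: Frame, curr: Frame) -> float:
    """Mean absolute luminance difference over corresponding (x, y) positions."""
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def detect_cuts(frames: List[Frame], threshold: float) -> List[int]:
    """Return each index i where a cut is assumed between frame i-1 and frame i."""
    return [
        i
        for i in range(1, len(frames))
        if frame_change_rate(frames[i - 1], frames[i]) > threshold
    ]
```

The threshold is a tuning parameter that the source text does not specify; a real detector would likely need a more robust criterion for gradual transitions.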

[0011] Techniques aimed at scene-level classification include the following:
(1) the "video feature processing method" (Japanese Patent Laid-Open No. 6-251147), which, on the premise that a scene consists of similar color information, converts the color information of the image data sequence into a feature space and clusters it in that space to classify the video;
(2) "Video Browsing Interface Using Sound Information" (Technical Report of the Institute of Television Engineers of Japan, Vol. 19, No. 7, 1995/12), which exploits the fact that, in video works, sound information, which together with images makes up the video data, is generally added per scene rather than per shot and through semantic work, and uses this sound information to classify scenes; and
(3) "Content-based Browsing of Video Sequences" (ACM, Multimedia 94, P.97-), which obtains a representative image for each shot, computes the similarity between representative images from luminance moment invariants and color information, and provides an interface that displays representative images and shots with high similarity while hiding those with low similarity, so that the video is in effect divided into blocks by showing the user similar representative images.

[0012]

[Problems to Be Solved by the Invention] When video is classified in order to grasp its content, it is desirable to classify it into blocks that reflect meaning, that is, into scenes.

[0013] Among the conventional techniques described above, the "video cut point detection method" aims to detect cut points in a video and enables classification at the shot level. However, a two-hour video work, for example, generally consists of several thousand shots, so shot-level units are too fine-grained for grasping the content. A technique that classifies at the scene level rather than the shot level is therefore required.

[0014] The conventional techniques other than the "video cut point detection method" aim to classify video into more semantic units in order to overcome the excessive fragmentation of shot-level classification. Of these, the "video feature processing method" assumes that a scene is composed of similar color information and uses a feature space based on color information to classify images having similar color combinations. This method satisfies the feature similarity condition (Condition 1) described above but does not take into account the temporal continuity condition (Condition 2). As a result, when it extracts a group of similar images, it may detect temporally discontinuous but similar images as a single block. In the news program example above, the temporally discontinuous studio segments would not be classified as separate scenes but would be treated as one block of similar images. In short, this method cannot reliably extract temporally continuous scenes.

[0015] "Video Browsing Interface Using Sound Expression" uses sound information such as background music (BGM) to merge fragmented shots into a single temporally continuous scene, and can thus perform classification that satisfies the temporal continuity condition. However, when scenes and sound information do not necessarily correspond, for example when the same BGM continues across scenes, when sound is inserted partway through a scene, or when the work contains no sound information at all, the video cannot be reliably classified into scenes. A further problem is that the feature similarity condition is not satisfied.

[0016] "Content-based Browsing of Video Sequences" first classifies the video into shots, selects a representative image for each shot, and judges similarity using both a comparison of moment invariants and a comparison of color information between representative images. By using similarity in shape and color, and displaying only the similar representative images when a representative image is selected, it enables simple display and retrieval of information that is similar and related to the image being viewed. Thus, like the "video feature processing method", it satisfies the feature similarity condition but not the temporal continuity condition.

[0017] To classify video in a way that allows its content to be grasped accurately, scene classification must satisfy both the feature similarity condition and the temporal continuity condition, but none of the conventional techniques described above satisfies both at once.

[0018] An object of the present invention is to provide a video block classification method and apparatus that classify video into sets of similar shots without fragmenting it into individual shots and without disturbing temporal order, that is, that satisfy the feature similarity condition and the temporal continuity condition simultaneously and classify video stably.

[0019]

[Means for Solving the Problems] To classify video while satisfying the feature similarity condition and the temporal continuity condition, the video block classification method and apparatus of the present invention have the following configuration.

[0020] That is, the video block classification method of the present invention classifies a video composed of an image data sequence into a plurality of video blocks, and comprises: a video input step of inputting a video that has been divided in advance into a large number of blocks; a feature amount calculation step of calculating a feature amount for each block from the image data sequence of that block; a similarity calculation step of calculating the similarity between adjacent blocks based on the feature amounts; and a block merging step of merging into one block the adjacent blocks that show the maximum similarity among the calculated similarities. The feature amount calculation step, the similarity calculation step, and the block merging step are executed repeatedly to obtain the plurality of classified video blocks.
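The repeated similarity calculation and merging can be sketched roughly as below. This is a minimal sketch, not the patented implementation: it assumes each initial block is one shot with a histogram feature, uses the component-wise minimum as the histogram AND and its sum as the similarity, and stops at a hypothetical `target_blocks` count, since the text only says that merging continues until classification is sufficient.

```python
from typing import List, Tuple

Histogram = List[float]
# (index of first shot, index of last shot, block feature histogram)
Block = Tuple[int, int, Histogram]


def hist_and(a: Histogram, b: Histogram) -> Histogram:
    """Histogram AND: component-wise minimum of two histograms."""
    return [min(x, y) for x, y in zip(a, b)]


def similarity(a: Histogram, b: Histogram) -> float:
    """Histogram cumulative AND: area of the histogram AND."""
    return sum(hist_and(a, b))


def merge_adjacent_blocks(
    shot_features: List[Histogram], target_blocks: int
) -> List[Tuple[int, int]]:
    """Greedily merge the most similar adjacent pair until target_blocks remain."""
    blocks: List[Block] = [(i, i, list(h)) for i, h in enumerate(shot_features)]
    while len(blocks) > max(target_blocks, 1):
        # Only temporally adjacent pairs are compared, so time order is kept.
        sims = [
            similarity(blocks[i][2], blocks[i + 1][2])
            for i in range(len(blocks) - 1)
        ]
        best = max(range(len(sims)), key=sims.__getitem__)
        first, _, left_hist = blocks[best]
        _, last, right_hist = blocks[best + 1]
        blocks[best:best + 2] = [(first, last, hist_and(left_hist, right_hist))]
    return [(first, last) for first, last, _ in blocks]
```

With five shots ordered studio, studio, location, location, studio, this sketch keeps the final studio shot as its own block, because only adjacent blocks are ever candidates for merging.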

[0021] In the video block classification method of the present invention, the video input step may include a step of classifying video input frame by frame into shots, which are one type of block. As the feature amount of each block, it is preferable to use a histogram obtained by extracting, from the frames included in the block, information consisting of any one of color, hue, saturation, and lightness, or a combination of these. In this case, when a block contains two or more frames, the feature amount is preferably the histogram obtained by performing the histogram AND operation over the histograms of all video frames in the block. The similarity is preferably determined based on the histogram cumulative AND of the histograms of the adjacent blocks. Furthermore, when adjacent blocks are merged in the block merging step, it is preferable to perform the histogram AND operation on the histograms of those blocks and to use the resulting histogram as the feature amount of the merged block.

[0022] The video block classification apparatus of the present invention classifies a video composed of an image data sequence into a plurality of video blocks, and comprises: an image data sequence memory that stores the input image data sequence; an image information conversion unit that reads frame data from the image data sequence memory and calculates a frame feature amount for each frame; a shot unit classification unit that reads frames from the image data sequence memory and classifies them into shots; a shot feature amount calculation unit that calculates the feature amount of each shot based on the frame feature amounts; and a similarity evaluation and image classification processing unit that, for blocks each consisting of one or more shots, calculates the similarity between adjacent blocks using the block feature amounts, merges into one block the adjacent blocks showing the maximum similarity among the calculated similarities, thereby generating blocks consisting of multiple shots, and repeats the similarity calculation and block merging.

[0023] In the video block classification apparatus of the present invention, it is preferable that the frame feature amounts and block feature amounts are represented as histograms; that the feature amount of each shot is calculated as the histogram AND of the frame feature amounts in that shot; that the similarity is calculated based on the histogram cumulative AND of the histograms of adjacent blocks; and that, when merging adjacent blocks, the similarity evaluation and image classification processing unit performs the histogram AND operation on the histograms of those blocks and uses the resulting histogram as the feature amount of the merged block.

[0024] In short, the video block classification method and apparatus of the present invention repeat a process of merging adjacent image blocks that have similar feature amounts, typically similar color information, starting with the pair of highest similarity. This makes it possible to extract image blocks that satisfy both scene conditions, the feature similarity condition and the temporal continuity condition, and that are therefore sets of similar images that are also temporally continuous. In addition, by presenting the classified video blocks to the user in various forms and providing a rich interface for receiving user input, it becomes possible to help the user grasp the video content.

[0025]

[Embodiments of the Invention] The present invention is described in more detail below by explaining an embodiment with reference to the drawings.

[0026] In this embodiment, for a video that has been divided into a large number of blocks, for example shot by shot, a histogram representing the characteristics of the video in each block is obtained, the similarity between each pair of temporally adjacent blocks is calculated from the histograms, and the two adjacent blocks with the maximum similarity are merged into one block. Repeating this process gradually reduces the number of blocks. A determination is made as to whether classification has proceeded far enough; once it has, the original video has been classified into scenes.

[0027] The histogram is obtained, for example, by extracting feature values such as color, hue, saturation, and lightness from the image data and arranging them in component order. A frequency distribution obtained by applying an orthogonal transform to the spatial distribution of color, lightness, and so on in the image data may also be used directly as the histogram. The histogram representing the video of a block is obtained by first computing the histogram of each frame belonging to the block and then computing the histogram AND over the histograms of all frames. As will become clear from the description below, the histogram AND operation finds, for each histogram component, the minimum of that component's values across the histograms being combined, and assembles these per-component minima into a new histogram. Accordingly, when blocks are merged, the histogram of the merged block can be obtained by performing the histogram AND operation on the histograms of the blocks being merged. As the similarity between adjacent blocks, the area of the histogram obtained by the histogram AND of the adjacent blocks' histograms is used; this is the histogram cumulative AND described later.
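The two operations described above can be illustrated with a short sketch, assuming histograms are equal-length lists of non-negative bin values. The function names `histogram_and` and `cumulative_and` are illustrative and do not come from the source.

```python
from typing import List, Sequence


def histogram_and(histograms: Sequence[Sequence[float]]) -> List[float]:
    """Histogram AND: for each component, the minimum across all histograms."""
    if not histograms:
        raise ValueError("at least one histogram is required")
    return [min(values) for values in zip(*histograms)]


def cumulative_and(a: Sequence[float], b: Sequence[float]) -> float:
    """Histogram cumulative AND: area (sum of bins) of the histogram AND."""
    return sum(histogram_and([a, b]))
```

Because the minimum is associative, applying `histogram_and` to two block histograms gives the same result as applying it to all of their frame histograms at once, which is consistent with the merged block feature being derived from the features of the blocks being merged.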

[0028] FIG. 1 is a block diagram showing the configuration of a video block classification apparatus according to an embodiment of the present invention. This apparatus classifies an input image data sequence 10 into scenes and displays the result on a video scene display unit 17 as scene-classified video 16. The degree of classification (for example, the number of scenes finally produced) can be adjusted in response to user requests entered through a user interface unit 18.

[0029] The image sampling rate, image data format, and image size of the input image data sequence 10 are arbitrary. For example, the input image data sequence 10 may be an NTSC standard video signal sampled at 30 frames per second, or one sampled at a coarser rate. It may be an analog signal such as NTSC or a digital signal, and it may be data input directly from a video camera or the like, or an image file stored on a storage device such as a hard disk or CD-ROM. In the example shown in FIG. 1, the input image data sequence 10 is an NTSC video signal consisting of t+1 frames I_0, I_1, ..., I_t.

[0030] An image data sequence memory 11 is provided to store the input image data sequence 10 frame by frame. The image data sequence memory 11 need not simply store the input image data sequence; it may hold data that has been processed to some extent, and it may also store additional information such as shots, the photographer's name, and the shooting location, as well as information obtained from the processing described below. Here, the image data sequence memory 11 is assumed to store the original signal of the input image data sequence.

[0031] A shot unit classification unit 12 and an image information conversion unit 13 are provided, each receiving the input image data sequence read from the image data sequence memory 11. The shot unit classification unit 12 executes shot unit classification processing, classifying the input image data into n+1 shots S_0, S_1, ..., S_n (where n < t) based on the data of the frames I_0, I_1, ..., I_t. This processing may use shot information added to the video signal in advance, may use an existing cut point detection technique, or may rely on a person classifying the video into shots beforehand. In this embodiment, an existing cut point detection technique is used. Since anything composed of one or more consecutive frames is generally called a block, shots and scenes are each a type of block.

[0032] The image information conversion unit 13 converts the image information of each frame into information such as color, hue, saturation, and lightness, and generates frame feature amounts H_0, H_1, ..., H_t. Each frame feature amount is represented as a histogram based on such information. The image information may also be converted into information other than color, hue, saturation, and lightness. Here, each of the image frames I_0, I_1, ..., I_t is converted into RGB information, and the frame feature amounts H_0, H_1, ..., H_t are output as RGB histograms.
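One possible form of an RGB histogram is sketched below. The source does not specify the bin layout, so this is an assumption: pixels are 8-bit (r, g, b) tuples, each channel is quantized into `bins_per_channel` bins, and the three per-channel histograms are concatenated. The name `rgb_histogram` is hypothetical.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]  # assumed 8-bit (r, g, b) values in 0..255


def rgb_histogram(pixels: Iterable[Pixel], bins_per_channel: int = 4) -> List[int]:
    """Concatenated per-channel histogram: R bins, then G bins, then B bins."""
    if not 1 <= bins_per_channel <= 256:
        raise ValueError("bins_per_channel must be between 1 and 256")
    hist = [0] * (3 * bins_per_channel)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Map 0..255 onto 0..bins_per_channel-1.
            bin_index = value * bins_per_channel // 256
            hist[channel * bins_per_channel + bin_index] += 1
    return hist
```

A joint three-dimensional color histogram would be an equally plausible reading; the concatenated form is chosen here only because it keeps the example short.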

[0033] A shot feature amount calculation unit 14 is also provided. Based on the information on the shots S_0, S_1, ..., S_n classified by the shot unit classification unit 12 and the frame feature amounts H_0, H_1, ..., H_t obtained by the image information conversion unit 13, it computes, for each shot, the histogram AND over all frames in that shot, yielding the per-shot feature amounts, that is, the shot feature amounts SH_0, SH_1, ..., SH_n. The calculated shot feature amounts are input to a similarity evaluation and image classification processing unit 15. Based on the block feature amounts, each represented as a histogram (block feature amounts include the shot feature amounts SH_0, SH_1, ..., SH_n and the feature amounts of blocks formed by merging shots), the unit 15 calculates and evaluates the similarity of adjacent blocks (where blocks include shots) from the histogram cumulative AND, and merges the adjacent pair with the maximum similarity to form a new block. When it generates a new block by merging blocks or shots, the unit 15 calculates the feature amount (histogram) of the merged block from the histogram AND of the feature amounts of the merged blocks or shots. In practice, in response to instructions from the user interface unit 18, the similarity evaluation and image classification processing unit 15 repeats this cycle of evaluation, merging, and feature amount calculation, and outputs the scene-classified video 16 to the video scene display unit 17. The user can have requests reflected in the displayed video, for example a request to show scenes in finer detail along the time axis, by entering them through the user interface unit 18 for the scene-classified video 16 shown on the video scene display unit 17.

[0034] The shot unit classification unit 12, the image information conversion unit 13, the shot feature amount calculation unit 14, and the similarity evaluation and image classification processing unit 15 may be implemented in software running on a CPU with computing capability, by a combination of multiple CPUs and software, partly in dedicated hardware, or entirely in dedicated hardware. In this embodiment, the processing is implemented in software running on a CPU with computing capability.

[0035] Next, the calculation of the shot feature amounts will be described with reference to FIG. 2. The image information conversion unit 13 receives color data (RGB data) from the data of each frame (step 101), generates the frame feature amount histograms H₀, H₁, …, H_t, one per frame (step 102), and inputs them to the shot feature amount calculation unit 14. Meanwhile, information on which frames belong to each of the shots S₀, S₁, …, S_n is also input from the shot unit classification unit 12 to the shot feature amount calculation unit 14. For each of the shots S₀, S₁, …, S_n, the shot feature amount calculation unit 14 then calculates the histogram logical product of the frame feature amounts (histograms) of the frames contained in that shot (step 103), and outputs the resulting histogram for each shot as the shot feature amounts SH₀, SH₁, …, SH_n (step 104).
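Steps 101 to 104 can be sketched as follows. This is only an illustrative sketch: the description does not specify how the color histogram is binned, so the joint RGB binning with `bins_per_channel` bins per channel, and the function names `frame_histogram` and `shot_feature`, are assumptions made for this example.

```python
def frame_histogram(pixels, bins_per_channel=4):
    """Frame feature amount (steps 101-102): a joint RGB color histogram.

    pixels: iterable of (r, g, b) tuples with values in 0..255.
    The binning scheme is an assumption; bins_per_channel must divide 256.
    """
    step = 256 // bins_per_channel
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each pixel to one bin of the quantized RGB cube.
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1
    return hist


def shot_feature(frame_hists):
    """Shot feature amount (steps 103-104): the component-wise minimum
    (histogram logical product) over all frame histograms of one shot."""
    return [min(values) for values in zip(*frame_hists)]


# Two tiny hypothetical "frames" of four pixels each.
frame_a = frame_histogram([(255, 0, 0)] * 3 + [(0, 0, 255)])
frame_b = frame_histogram([(255, 0, 0)] * 2 + [(0, 255, 0)] * 2)
feature = shot_feature([frame_a, frame_b])
```

Only the color mass that is present in every frame of the shot survives in `feature`, which is why the shot feature amount can be no larger than any single frame histogram in any component.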

[0036] The histogram logical product operation will now be described in detail. As mentioned above, the histogram logical product operation is used not only to calculate the shot feature amounts but also to calculate the similarity and the feature amounts of merged blocks.

[0037] First, let H_1, H_2, H_3, … be the histograms, and let H_1(j), H_2(j), H_3(j), … be the values of the histograms H_1, H_2, H_3, … at component j. Let H_new be the histogram obtained as the result of the histogram logical product calculation, and let H_new(j) be the value of this new histogram H_new at component j.

[0038] When the components j of each histogram run from 0 to m and there are k histograms, numbered 1 to k, over which the histogram logical product is to be calculated, the histogram logical product operation is given by the following equation.

【００３９】[0039]

[Equation 1] $H_{\mathrm{new}}(j) = \min\{H_1(j), H_2(j), \ldots, H_k(j)\}$ (where $0 \le j \le m$).

[0040] FIG. 3 illustrates how the histogram logical product of two adjacent blocks (blocks A and B) is calculated when the histograms corresponding to these blocks are given as shown in (a) and (b) of the figure. Part (c) of the figure shows the histogram logical product of blocks A and B. That is, for each feature component, the lower of the two blocks' values is adopted in the histogram that results from the histogram logical product operation. For example, for component K, block B has the smaller value, as indicated by the solid arrow in the figure, so the value of component K in the histogram logical product equals the value in block B. Similarly, for component L, the value of block A is adopted.
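The component-wise minimum of Equation 1 can be written in a few lines. The sketch below is illustrative only: the function name `histogram_and` and the sample values for blocks A and B are made up for this example and are not taken from FIG. 3.

```python
def histogram_and(*histograms):
    """Histogram logical product of k histograms (Equation 1):
    H_new(j) = min{H_1(j), ..., H_k(j)} for every component j."""
    if not histograms:
        raise ValueError("at least one histogram is required")
    length = len(histograms[0])
    if any(len(h) != length for h in histograms):
        raise ValueError("all histograms must have the same number of components")
    return [min(values) for values in zip(*histograms)]


# Hypothetical histograms for two adjacent blocks with four components each.
block_a = [5, 2, 0, 3]
block_b = [1, 4, 2, 3]
merged = histogram_and(block_a, block_b)  # lower value per component
```

In component 0 the value of block B is adopted, and in component 1 the value of block A, mirroring the treatment of components K and L in the figure.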

[0041] Next, the processing in the similarity evaluation and image classification processing unit 15 will be described with reference to FIG. 4. Put simply, this processing merges blocks, where a block may be a single shot. In the course of merging, the feature amounts of adjacent image blocks are compared to calculate their similarity, the video is merged from small blocks into larger ones based on that similarity, and the video is finally classified into blocks that correspond, for example, to scenes.

[0042] At the initial stage of the merging process in the similarity evaluation and image classification processing unit 15, each block consists of exactly one shot. First, therefore, the shot feature amounts SH₀, SH₁, …, SH_n of the shots S₀, S₁, …, S_n are input from the shot feature amount calculation unit 14 and taken as the blocks to be processed (step 111). Next, the similarity between each pair of adjacent blocks is calculated based on the histogram cumulative logical product (step 112). Let H_1 and H_2 be the histograms representing the feature amounts of two adjacent blocks (which may be shots), and let H_1(j) and H_2(j) be the values of component j in the histograms H_1 and H_2, respectively. Let V be the histogram cumulative logical product (that is, the similarity). If the components j of the histograms run from 0 to m, the histogram cumulative logical product V is given by the following equation.

【００４３】[0043]

[Equation 2] $V = \sum_{j=0}^{m} \min\{H_1(j), H_2(j)\}$ (where $0 \le j \le m$).

[0044] FIG. 5 illustrates how the histogram cumulative logical product of two adjacent blocks (blocks A and B) is calculated when the histograms corresponding to these blocks are given as shown in (a) and (b) of the figure. As in the histogram logical product operation, the histogram values are compared component by component, the lower value is taken as the value for that component, and these values are then accumulated. That is, the histogram cumulative logical product V is the area of the hatched portion in (c) of the figure, which shows the histogram logical product of the two blocks. Because the area is obtained as the cumulative sum over the components, it is called the histogram cumulative logical product here.
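As a minimal sketch of this similarity measure, the function below sums the component-wise minima of two histograms, which corresponds to the hatched area described for FIG. 5. The function name and the sample values are assumptions made for illustration.

```python
def cumulative_histogram_and(h1, h2):
    """Histogram cumulative logical product V: the sum over all components j
    of min(H_1(j), H_2(j)), i.e. the area of the histogram logical product."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of components")
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical histograms for two adjacent blocks.
block_a = [5, 2, 0, 3]
block_b = [1, 4, 2, 3]
v = cumulative_histogram_and(block_a, block_b)  # 1 + 2 + 0 + 3
```

Two histograms with no overlapping component give V = 0, and V can never exceed the total count of the smaller histogram.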

[0045] When the area of the histogram logical product of two adjacent blocks is large, that is, when the histogram cumulative logical product is large, both blocks have large values for the same components, and the images can be said to be similar at the level of each component of the feature amount. The histogram cumulative logical product can therefore be used to evaluate the similarity of adjacent image blocks.

[0046] As described above, the larger the similarity based on the histogram cumulative logical product, the more similar the two adjacent blocks are. The maximum among the similarities obtained between adjacent blocks is therefore searched for, and the two adjacent blocks showing this maximum similarity are merged into one block (step 113). Then the feature amount of the image block newly generated by the merging is calculated (step 114). Here, the histogram logical product of the two blocks before merging is used as the feature amount of the merged block. It is then determined whether the classification of the video has progressed sufficiently (step 115). If it has, the video 16 classified into scenes is output to the video scene display unit 17 and the processing ends. If it has not, the processing returns to step 112 and the similarity between adjacent blocks is calculated for the merged blocks. Since the similarity between blocks that were not merged does not change, the second and subsequent executions of step 112 need only calculate the similarity between the block generated by the immediately preceding merge and the blocks adjacent to it.
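The loop of steps 111 to 115, including the shortcut of recalculating only the similarities next to the newly merged block, might be sketched as follows. The function names, the `is_done` callback, and the sample histograms are hypothetical; as a simplification, the stopping test here is evaluated before each merge rather than after it.

```python
def similarity(h1, h2):
    """Histogram cumulative logical product of two block histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def classify(shot_features, is_done):
    """Merge adjacent blocks in time order until is_done(blocks, sims) is true.

    Returns the blocks as lists of original shot indices."""
    blocks = [[i] for i in range(len(shot_features))]        # step 111
    feats = [list(h) for h in shot_features]
    sims = [similarity(feats[i], feats[i + 1])               # step 112
            for i in range(len(feats) - 1)]
    while sims and not is_done(blocks, sims):                # step 115
        i = max(range(len(sims)), key=sims.__getitem__)      # step 113
        blocks[i:i + 2] = [blocks[i] + blocks[i + 1]]
        # Step 114: the merged feature is the histogram logical product.
        feats[i:i + 2] = [[min(a, b) for a, b in zip(feats[i], feats[i + 1])]]
        del sims[i]
        # Only the similarities adjacent to the merged block change.
        if i > 0:
            sims[i - 1] = similarity(feats[i - 1], feats[i])
        if i < len(sims):
            sims[i] = similarity(feats[i], feats[i + 1])
    return blocks


# Five hypothetical shots (indices 0..4) with three-component histograms.
shots = [[4, 0, 0], [2, 2, 0], [0, 2, 2], [0, 1, 3], [0, 1, 3]]
result = classify(shots, lambda blocks, sims: len(blocks) <= 3)
```

With these made-up values the last two shots merge first, and the third shot then joins them, so the time order of the shots is preserved in every block.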

[0047] Whether the classification has been sufficiently performed can be determined by referring to the similarities between blocks. In this embodiment, the histogram that is the feature amount of a block is calculated as the logical product of the histograms that are the feature amounts of the smaller blocks composing it. A block obtained after repeated merging therefore consists of many frames or shots, while the values of its feature histogram become small. Consequently, the logical product of the histograms of two very large blocks, which indicates their similarity, takes only a very small value, because the values in both histograms are extremely small. Using this property, a histogram cumulative logical product of 0 can be regarded as the state of no similarity at all, that is, the state in which the video has been sufficiently classified, while the existence of any nonzero histogram cumulative logical product can be regarded as the state in which classification is insufficient and the classification process should continue. In addition, because this embodiment is realized solely by simple iteration, the scene classification level can be adjusted in various ways. For example, the classification level, and hence the number of blocks finally obtained, can be changed according to a request entered by the user via the user interface unit 18, and whether the classification has been sufficiently performed can also be judged based on input from the user.
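The two stopping conditions mentioned here, all similarities equal to 0 or a block count requested by the user, could be combined in a small predicate such as the one below. The function name and the `target_blocks` parameter are assumptions for illustration, not part of the described device.

```python
def classification_done(similarities, num_blocks, target_blocks=None):
    """Return True when merging should stop.

    similarities: histogram cumulative logical products of all adjacent pairs.
    num_blocks: current number of blocks.
    target_blocks: optional block count requested by the user (hypothetical)."""
    if target_blocks is not None and num_blocks <= target_blocks:
        return True
    # No adjacent pair shares any histogram mass: nothing similar is left.
    return all(v == 0 for v in similarities)
```

For example, `classification_done([0, 2], 3)` is false because one adjacent pair still overlaps, whereas `classification_done([0, 2], 3, target_blocks=3)` is true because the requested number of blocks has been reached.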

[0048] The process by which shots are merged and classified into larger blocks in this embodiment will now be described in more detail with reference to FIG. 6.

[0049] (a) shows five shots A to E classified by the shot unit classification unit 12. Starting from these shots A to E, classification proceeds toward scene units, which are a classification unit suited to the user. The shot feature amounts (histograms) calculated for each of the shots A to E by the shot feature amount calculation unit 14 are shown in (b). Each shot feature amount is obtained by calculating the histogram logical product of the histograms (feature amounts) of the frames contained in the shot.

[0050] (c) shows the process of calculating the similarity between adjacent shots by showing the histogram logical product of adjacent shots. In the figure, the symbol

【００５１】[0051]

[External character 1] denotes the histogram logical product operation. In the actual evaluation of similarity, the area of the histogram logical product, that is, the histogram cumulative logical product, is used. As a result of the evaluation, the combination of shot D and shot E showed the maximum similarity, so shot D and shot E are merged to obtain a merged block DE, as shown in (d).

[0052] In this embodiment, processing continues until the video has been sufficiently classified into scenes, so further processing is performed on the group of shots that now includes the merged block DE. The feature amounts of the shots and the merged block at this point are shown in (e). Then, as shown in (f), the similarity between adjacent shots or blocks is calculated by the histogram cumulative logical product. In practice,

【００５３】[0053]

for [Equation 3] the previously calculated values are reused as they are, and

【００５４】[0054]

only [Equation 4] is newly calculated. As a result, the similarity between shot C and block DE was the maximum, so a merged block CDE is generated as shown in (g). The same processing is then repeated. For example, feature amount calculation, similarity calculation, and merged block determination may be repeated until the video is completely classified into scenes.

[0055] The embodiment of the present invention has been described above. In this embodiment, only one merge is performed per similarity evaluation. Therefore, when a request is received via the user interface unit 18, for example a request to see the video blocks in more detail such as "I want the video classified into n blocks," the request can be answered immediately. In the example of FIG. 6, if there is a request to view the video consisting of the five shots A to E in (a) classified into four blocks, it suffices to present the state shown in (d), which is the four-block state reached in the course of finding the scenes, that is, the video blocks classified as A, B, C, and DE.
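Because each evaluation removes exactly one block, every block count from the number of shots down to one is passed through once. One way to exploit this, sketched below, is to record the partition after every merge so that a request for n blocks becomes a dictionary lookup. Caching the levels is an assumption of this sketch, as are the function name and the sample histograms; for brevity it also recomputes all similarities on each pass and keeps merging even when the maximum similarity is 0.

```python
def build_levels(shot_features):
    """Return {number_of_blocks: partition}, where each partition is a list of
    blocks and each block is a list of original shot indices in time order."""
    blocks = [[i] for i in range(len(shot_features))]
    feats = [list(h) for h in shot_features]
    levels = {len(blocks): [list(b) for b in blocks]}
    while len(blocks) > 1:
        # Histogram cumulative logical product for every adjacent pair.
        sims = [sum(min(a, b) for a, b in zip(feats[i], feats[i + 1]))
                for i in range(len(feats) - 1)]
        i = max(range(len(sims)), key=sims.__getitem__)
        blocks[i:i + 2] = [blocks[i] + blocks[i + 1]]
        feats[i:i + 2] = [[min(a, b) for a, b in zip(feats[i], feats[i + 1])]]
        levels[len(blocks)] = [list(b) for b in blocks]
    return levels


# Five hypothetical shots A..E (indices 0..4).
shots = [[4, 0, 0], [2, 2, 0], [0, 2, 2], [0, 1, 3], [0, 1, 3]]
levels = build_levels(shots)
four_blocks = levels[4]  # partition to present for a "four blocks" request
```

With these made-up histograms, `levels[4]` groups the last two shots, which corresponds to the A, B, C, DE state of the example.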

[0056] The embodiment above describes an example in which the similarity is calculated based on the histogram cumulative logical product and the adjacent blocks with the maximum similarity are merged, but various other methods are conceivable for the similarity evaluation and the merging rule. In addition, although the case where the blocks in the initial state are shots has been described, the above processing can also be performed with frames as the initial blocks, for example when classifying the video within a single shot or a single take.

[0057] Applications of the present invention include database browsing interfaces, various kinds of video processing such as the production of video context, and user interfaces.

【００５８】[0058]

[Effects of the Invention] As described above, the present invention takes a video classified into shots or blocks and repeats the process of calculating the similarity between adjacent shots or image blocks, evaluating it, and merging. This makes it possible to gather image blocks that are both similar and temporally continuous, and thus to realize video classification that satisfies both the feature similarity condition and the temporal continuity condition. A video that has been classified into many finely divided shot units can therefore be classified into scene units.

[0059] Furthermore, because the present invention realizes video classification by simple iterative processing, the classification level can be adjusted in various ways, and video classified into scenes can be provided seamlessly.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a video block classification device according to an embodiment of the present invention.

FIG. 2 is a flowchart illustrating the process of calculating the shot feature amounts.

FIG. 3 is a schematic diagram showing the method of calculating the histogram logical product.

FIG. 4 is a flowchart illustrating the processing in the similarity evaluation and image classification processing unit.

FIG. 5 is a schematic diagram showing the method of calculating the histogram cumulative logical product.

FIG. 6 is a schematic diagram showing the process by which shots are gathered into blocks.

[Explanation of symbols]

10 Input image data sequence
11 Image data sequence memory
12 Shot unit classification unit
13 Image information conversion unit
14 Shot feature amount calculation unit
15 Similarity evaluation and image classification processing unit
16 Video classified into scenes
17 Video scene display unit
18 User interface unit
101 to 104, 111 to 116 Steps
A to E Shots
H₀, H₁, …, H_t Frame feature amounts
I₀, I₁, …, I_t Frames
S₀, S₁, …, S_n Shots
SH₀, SH₁, …, SH_n Shot feature amounts

Continuation of the front page: (72) Inventor Yukinobu Taniguchi, c/o Nippon Telegraph and Telephone Corporation, 1-1-6 Uchisaiwaicho, Chiyoda-ku, Tokyo

Claims

[Claims]

1. A video block classification method for classifying a video composed of an image data sequence into a plurality of video blocks, the method comprising: a video input step of inputting a video that has been classified in advance into a large number of blocks; a feature amount calculation step of calculating the feature amount of each block from the image data sequence of that block; a similarity calculation step of calculating the similarity between adjacent blocks based on the feature amounts; and a block merging step of merging the adjacent blocks that show the maximum similarity among the calculated similarities into one block, wherein the feature amount calculation step, the similarity calculation step, and the block merging step are executed repeatedly to obtain the plurality of classified video blocks.

2. The video block classification method according to claim 1, wherein the video input step includes a step of classifying a video input in frame units into shots which are one type of block.

3. The video block classification method according to claim 1 or 2, wherein the feature amount of each block is a histogram obtained by extracting, from the frames included in the block, information consisting of any one of color, hue, saturation, and lightness, or a combination thereof, and, when the block includes two or more frames, is represented as the histogram obtained by performing a histogram logical product operation on the histograms of all frames included in the block.

4. The video block classification method according to claim 3, wherein the similarity is determined based on the histogram cumulative logical product of the histograms of adjacent blocks.

5. The video block classification method according to claim 3 or 4, wherein, when adjacent blocks are merged in the block merging step, a histogram logical product operation is performed on the histograms of the adjacent blocks, and the histogram obtained as a result of the operation is used as the feature amount of the merged block.

6. A video block classification device for classifying a video composed of an image data sequence into a plurality of video blocks, the device comprising: an image data sequence memory that stores the input image data sequence; an image information conversion unit that reads frame data from the image data sequence memory and calculates a frame feature amount for each frame; a shot unit classification unit that reads frames from the image data sequence memory and classifies them into shot units; a shot feature amount calculation unit that calculates the feature amount of each shot based on the frame feature amounts; and a similarity evaluation and image classification processing unit that, for blocks each composed of one or more shots, calculates the similarity between adjacent blocks using the feature amounts of the blocks and merges the adjacent blocks showing the maximum similarity among the calculated similarities into one block, thereby generating a block composed of a plurality of shots, wherein the similarity calculation and the block merging are executed repeatedly.

7. The video block classification device according to claim 6, wherein the frame feature amounts and the block feature amounts are each represented by a histogram, the feature amount of each shot is calculated as the histogram logical product of the frame feature amounts of that shot, the similarity is calculated based on the histogram cumulative logical product of the histograms of adjacent blocks, and, when merging adjacent blocks, the similarity evaluation and image classification processing unit performs a histogram logical product operation on the histograms of the adjacent blocks and uses the histogram obtained as a result of the operation as the feature amount of the merged block.