JP2007200249A

JP2007200249A - Image search method, device, program, and computer readable storage medium

Info

Publication number: JP2007200249A
Application number: JP2006021155A
Authority: JP
Inventors: Isao Kondo; 功近藤; Satoshi Shimada; 聡嶌田; Masashi Morimoto; 正志森本
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2006-01-30
Filing date: 2006-01-30
Publication date: 2007-08-09

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of similar-video search when searching video in units called shots, and to speed up the search by processing shot label types alone.
SOLUTION: The input video is divided into shots at the points where its content changes. Composition information (the position and size of each main region, the complexity of its pattern, its color features, its amount of movement, and the like) is extracted from each shot as the feature quantities of its main regions, and the shots are classified on the basis of this composition information. The similarity between an input example video section and the shots is calculated from the resulting classification categories, and video sections are output according to that similarity.
COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a video search method, apparatus, program, and computer-readable recording medium. In particular, it relates to a method of describing the features of a video section that focuses on how video is produced, and to a video search method, apparatus, program, and computer-readable recording medium that use this description to search for video sections composed of multiple shots.

In recent years, with the spread of digital broadcasting and cable television, it has become easy even for individuals to obtain video data of many genres. In addition, with advances in HDDs and storage media, it has become common to store large amounts of video data on hard disks, DVDs, and the like, and to watch it at a time that suits the viewer.

On the other hand, the more recorded video data accumulates, the harder it becomes to find, within that enormous volume, a video section in which several shots together convey a single meaning (action), such as a specific event scene. Techniques that analyze large amounts of video content and search for the video section a user wants have therefore attracted attention.

Many conventional video section search methods target predetermined events and search for them with high accuracy by learning, in advance, the features contained in each defined event (see, for example, Non-Patent Document 1).

One approach that does not rely on learning is the example-based technique used in conventional image retrieval. Here the user prepares a video similar to the desired one, and the desired video is found using the feature quantities contained in that example. Because the example-based method does not fix the events in advance, it can respond flexibly to a wide range of search requests (various events).

One example-based technique is a video search method that focuses on the time-series characteristics of video feature quantities. For example, one method uses the DC component of each intra-coded frame of an MPEG stream to obtain the average color of the frame, plots it in a three-dimensional color space, projects the resulting trajectory onto the time axis to convert it into waveform information, and searches for video by comparing waveforms while stretching and shrinking them (see, for example, Non-Patent Document 2).

Another class of methods uses the image feature quantities of video frames as the features of the example video. For example, one method divides a video into shots, represents the video as the set of representative frames of its shots, and searches for videos by comparing these sets of representative frames (see Non-Patent Document 3).

Non-Patent Document 1: Takahiro Mochizuki, Makoto Tadenuma, Nobuyuki Yagi, "Baseball Indexing Using Scene Patterning and Hidden Markov Models", IEICE Technical Report, PRMU2005-28, pp.37-42, 2005
Non-Patent Document 2: Katsunao Takahashi, Hideyoshi Tominaga, Maki Sugiura, Mayu Yokoi, Nobuyoshi Terashima, "Highly Efficient Video Retrieval Method Using Characteristic Image Prints", Transactions of the Institute of Image Electronics Engineers of Japan, Vol.29, No.6, 2000
Non-Patent Document 3: Seiji Hotta, Kohei Inoue, Kiichi Urahama, "Similarity Search of Video Based on Distance Between Image Sets", Journal of the Institute of Image Information and Television Engineers, Vol.54, No.11, 2000

The conventional learning-based method (Non-Patent Document 1) requires training data to be prepared in advance. Collecting training data automatically is generally difficult, and in practice it requires manual work. As the number of video genres and searchable event types grows, the amount of work and the cost of hiring workers therefore become a problem. A further problem is that such a method cannot immediately serve the user's diverse search needs for anything other than the predetermined events.

The example-based method (Non-Patent Document 2), on the other hand, uses the time-series characteristics of average color. It is well suited to finding video that is exactly the same as the example (for example, a commercial). When searching for video that merely resembles the example (for example, a penalty-kick scene in soccer), however, the similarity of the time-series features drops, because the matching scenes are not necessarily shot with the same duration as the example. In addition, because each whole frame is represented by its average color, frames that look different to the human eye can be treated as identical, which limits discrimination accuracy.

In the method that uses the image feature quantities of video frames (Non-Patent Document 3), each shot is represented by a single representative frame. Information about object motion within the shot is therefore lost, and searches that exploit moving objects, which are characteristic of video, are not possible.

Furthermore, the more detailed the description of the moving-image features, the longer it takes to compare the feature quantities of videos, and the longer it takes to return search results.

The present invention has been made in view of the above points. Its object is to provide a video search method, apparatus, program, and computer-readable recording medium that improve the accuracy of similar-video search when searching video in units called shots, each about 10 seconds long, and that can be made faster by processing label types alone.

FIG. 1 is a diagram for explaining the principle of the present invention.

The present invention (claim 1) is a video search method for searching for a video section similar to a video section given as an example by a user, the method comprising:
a shot division step (step 1), in which shot division means divides the input video into partial video sections (hereinafter called shots) at the points where the content of the video changes, and stores them in storage means;
a composition information extraction step (step 2), in which composition information extraction means extracts from each shot composition information, which is the feature quantity of its main regions, and stores it in the storage means;
a shot classification step (step 3), in which shot classification means classifies the shots on the basis of the composition information and stores the classification category results in the storage means;
a similarity calculation step (step 4), in which similarity calculation means calculates the similarity between an input example video section and the shots on the basis of the classification category results obtained in the shot classification step; and
a search result display step (step 5), in which search result display means outputs video sections on the basis of the similarity.

In the present invention (claim 2), the shot classification step (step 3) compares an unclassified input shot with the already classified shots. If the result of the comparison is below a preset threshold, the unclassified input shot is assigned to the existing shot category; if it is at or above the threshold, the shot is assigned to a new classification category.

FIG. 2 is a diagram showing the principle configuration of the present invention.

The present invention (claim 3) is a video search apparatus for searching for a video section similar to a video section given as an example by a user, the apparatus comprising:
shot division means 102 that divides the input video into partial video sections (hereinafter called shots) at the points where the content of the video changes, and stores them in storage means 108;
composition information extraction means 103 that extracts from each shot composition information, which is the feature quantity of its main regions, and stores it in the storage means 108;
shot classification means 104 that classifies the shots stored in the storage means 108 on the basis of the composition information and stores the classification category results in the storage means;
similarity calculation means 106 that calculates the similarity between an input example video section and the shots (partial video sections) on the basis of the classification category results obtained by the shot classification means 104; and
search result display means 107 that outputs video sections on the basis of the similarity.

In the present invention (claim 4), the shot classification means 104 includes means that compares an unclassified input shot with the already classified shots, assigns the unclassified input shot to the existing shot category if the result of the comparison is below a preset threshold, and assigns it to a new classification category if the result is at or above the threshold.

The present invention (claim 5) is a video search program for searching for a video section similar to a video section given as an example by a user, the program causing a computer to function as the video search apparatus according to claim 3 or 4.

The present invention (claim 6) is a computer-readable recording medium storing a video search program for searching for a video section similar to a video section given as an example by a user, the stored program causing a computer to function as the video search apparatus according to claim 3 or 4.

Much broadcast video is produced according to established video production techniques, so within a given type of video the structure of a program (scene progression, subject composition, and so on) is reproducible. For example, for many regular programs broadcast in the same time slot every week or every day, a particular scriptwriter or director decides the scenes to be shot and their progression as a story, so the program structure remains highly similar across broadcast dates.

Sports programs are likewise reproducible, because the sport itself is played according to fixed rules. In any baseball broadcast, for example, similar pitching scenes appear repeatedly.

Furthermore, skilled camera operators are able to shoot appropriately for the scene and its development, so in programs with a similar structure the position and size of the subject on the screen, the camera operations, and so on are also reproducible.

Therefore, in broadcast video of the same kind produced by scriptwriters, directors, and skilled camera operators, similar scenes recur as a consequence of the program structure, and within those similar scenes the composition of the subject and the like is reproducible.

The present invention focuses on these video production techniques and models video, in units of shots bounded by changes in image content, by features such as the placement of the subject, the size at which it is shown, and the amount it moves on the screen. The resulting video section search method provides neither the strict similarity of Non-Patent Document 2, which presupposes specific conditions (same duration, same frames), nor the coarse similarity of Non-Patent Document 3, which relies on the image feature quantities of particular frames. Instead it provides a similarity in which video is modeled as a set of shot compositions grounded in shooting technique. Using the shot as the unit makes the method robust along the time axis, and the composition information raises the accuracy of image content analysis. Because the method focuses on the production techniques of TV programs, it requires no prior learning and can be applied to arbitrary video sections. These properties are not limited to a particular kind of video such as baseball.

In addition, because the present invention extracts composition information along both the time axis and the image-space axis, it analyzes shot content more accurately. Shots are classified into categories on the basis of this composition information and given labels, and the similarity of videos is defined as the similarity of their sets of shot labels. This greatly reduces the cost of comparing video sections, so the desired video section can be found quickly.

The present invention is a similar-video-section search technique based on the assumption that, if programs share the same story, the sets of camera operations and compositions contained in their video sections are similar. It can be expected to be particularly effective for video in which camera operations and compositions are similar, as in sports.

Unlike conventional event detection, it also requires no prior learning.

As described above, the present invention works in units of shots bounded by changes in image content. Rather than using every feature of the whole image, it selects the main regions, extracts them as the composition features of the shot, and applies them to video search. This makes the method tolerant of differences in the duration of video sections and raises discrimination accuracy through screen composition. Because it focuses on the production techniques of TV programs, it is not limited to a particular kind of video such as baseball.

In addition, the present invention classifies the shots of the videos in the video database into categories and labels them in advance. A comparison between videos carrying large amounts of information is thereby replaced by a similarity calculation over a small number of shot labels, so similar video sections can be found quickly. Because shots are classified on the basis of composition information, the retrieved video sections consist of similar shooting compositions and can be expected to match human perception more closely. Furthermore, filtering the search targets using the bibliographic data of the videos suppresses false matches.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 3 shows the configuration of a video search apparatus according to an embodiment of the present invention.

The video search apparatus shown in the figure divides video data into partial video sections called shots, classifies the shots into categories on the basis of composition information, defines the similarity of videos as the similarity of their shot classification labels, and searches for the video section the user wants.

In the present embodiment, the video search apparatus 100 comprises a video reception unit 101, a shot division unit 102, a composition information extraction unit 103, a shot classification unit 104, a search condition input unit 105, a similarity calculation unit 106, a search result display unit 107, and a video database 108.

An external information storage device 110 and a video playback device 111 are connected to the video search apparatus 100.

The video reception unit 101 acquires video from outside via the external information storage device 110, such as a video distribution server or a video receiver (for example, a tuner), passes it to the shot division unit 102, and stores it in the video database 108. At this time, bibliographic information such as the title and video genre may be acquired together with the video, using metadata already associated with the video or an EPG (electronic program guide) service, and stored in the video database 108.

The shot division unit 102 detects, in the video input from the video reception unit 101, the points at which the video content changes substantially (called cut points) and divides the video at each cut point. A shot is defined as the interval between temporally adjacent cuts. The shot division unit 102 passes the resulting shots to the composition information extraction unit 103 and stores them in the shot name (name), start time (S_Time), and end time (E_Time) columns of the video management table of the video database 108 shown in FIG. 4. The shot division unit 102 can use, for example, the shot division method disclosed in JP-A-11-018028.
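The text defers the actual shot division method to JP-A-11-018028 and does not describe it. As a rough illustration only, the sketch below uses a generic histogram-difference rule as a stand-in; the function names, the threshold, and the use of per-frame histograms are all assumptions, not the cited method.

```python
from typing import List, Sequence, Tuple


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of absolute bin differences between two frame histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_cuts(histograms: List[Sequence[float]], threshold: float) -> List[int]:
    """Return every frame index i at which frame i differs strongly from frame i-1.

    Hypothetical rule: a cut point is declared when the histogram distance
    between consecutive frames exceeds `threshold`.
    """
    return [
        i
        for i in range(1, len(histograms))
        if histogram_distance(histograms[i - 1], histograms[i]) > threshold
    ]


def split_into_shots(n_frames: int, cuts: List[int]) -> List[Tuple[int, int]]:
    """Convert cut points into (first_frame, last_frame) pairs, both inclusive."""
    if n_frames == 0:
        return []
    starts = [0] + cuts
    ends = [c - 1 for c in cuts] + [n_frames - 1]
    return list(zip(starts, ends))


# Five frames with a content change between frame 1 and frame 2.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
cuts = detect_cuts(frames, threshold=1.0)
shots = split_into_shots(len(frames), cuts)
```

Each resulting pair corresponds to one row of the video management table, with the first and last frame standing in for S_Time and E_Time.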

The composition information extraction unit 103 obtains, for each shot input from the shot division unit 102, the composition information that characterizes the shot. Specifically, it determines the main regions of the shot, that is, the principal regions that make up the screen. For example, each frame is divided into regions using a general image segmentation technique, and the regions that can be tracked from the first frame of the shot to the last are taken as the main regions. It then extracts feature quantities such as the position, size, pattern complexity, color features, and amount of movement of each main region as composition information. The computed composition information is passed to the shot classification unit 104 and stored in the composition information columns (region 1 to region i) of the video management table of the video database 108.
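One way to picture the composition information of a single main region is as a small record that flattens into a feature vector. The sketch below is illustrative only: the class name `Region`, the field types, the normalization, and the ordering of the vector are assumptions, since the text lists the feature kinds but not their encoding.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Region:
    """Composition information for one main region of a shot (hypothetical layout)."""

    position: Tuple[float, float]      # region centre (x, y), assumed normalised to [0, 1]
    size: float                        # region area as a fraction of the frame (assumed)
    complexity: float                  # pattern complexity, e.g. an edge-density score (assumed)
    color: Tuple[float, float, float]  # colour feature, here a mean colour triple (assumed)
    motion: float                      # amount of movement over the shot (assumed scalar)

    def to_vector(self) -> List[float]:
        """Flatten the fields into one feature vector in a fixed order."""
        return [*self.position, self.size, self.complexity, *self.color, self.motion]


pitcher = Region(position=(0.5, 0.5), size=0.2, complexity=0.1,
                 color=(0.1, 0.2, 0.3), motion=0.05)
vector = pitcher.to_vector()
```

A shot would then carry one such vector per main region, matching the "region 1 to region i" columns of the video management table.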

The shot classification unit 104 classifies shots into categories on the basis of the similarity of the composition information of each shot input from the composition information extraction unit 103. On the basis of the classification result, it stores the classification label in the shot label column (Label) of the video management table of the video database 108, and stores each classification label and its feature quantities in the label management table shown in FIG. 5 (details are given later). In this way, videos and their feature quantities accumulate in the video database 108.

The video playback device 111 lets the user view video stored in the video database 108, for example as shown in FIG. 6. When the user finds a video section to search for, pressing the "Search" button at the lower right of FIG. 6 moves to the search condition input unit 105 (for example, FIG. 7).

The search condition input unit 105 determines the example video section, which conveys the user's search intent to the system, and passes it to the similarity calculation unit 106. To specify the example video section, for example, as shown in FIG. 7, representative images (for example, the first frame) of several shots before and after the time at which the "Search" button was pressed on the video playback device 111 may be displayed, and the example video section may be determined by selecting the in (start point) and out (end point) radio buttons below each representative image.

If the video database 108 holds bibliographic information such as EPG data together with the video, the range of target videos can also be specified at this point using that bibliographic information, in order to improve search accuracy.
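Restricting the target videos by bibliographic information amounts to a plain metadata filter applied before any similarity calculation. The following is a minimal sketch; the `VideoInfo` record, its fields, and the three filter parameters are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class VideoInfo:
    """Bibliographic information stored with a video (hypothetical fields)."""

    title: str
    genre: str
    recorded_on: date


def filter_candidates(
    videos: List[VideoInfo],
    genre: Optional[str] = None,
    title_contains: Optional[str] = None,
    recorded_after: Optional[date] = None,
) -> List[VideoInfo]:
    """Keep only the videos that satisfy every condition that was supplied."""
    result = []
    for video in videos:
        if genre is not None and video.genre != genre:
            continue
        if title_contains is not None and title_contains not in video.title:
            continue
        if recorded_after is not None and video.recorded_on < recorded_after:
            continue
        result.append(video)
    return result


library = [
    VideoInfo("Evening Baseball", "sports", date(2006, 1, 10)),
    VideoInfo("Morning News", "news", date(2006, 1, 11)),
    VideoInfo("Weekend Soccer", "sports", date(2005, 12, 1)),
]
sports_2006 = filter_candidates(library, genre="sports", recorded_after=date(2006, 1, 1))
```

Only the videos that pass the filter would then be handed to the similarity calculation.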

The similarity calculation unit 106 treats the similarity between the example video section input from the search condition input unit 105 and a video section stored in the video database 108 as the similarity of their shot label sets, which are based on composition information. It calculates this similarity with a predetermined formula and passes the result to the search result display unit 107. For the similarity calculation, for example, the formula defined below may be used.
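The similarity formula itself does not appear in this text, so the sketch below substitutes a common measure, the overlap of two label multisets divided by their union (a multiset Jaccard index). This choice is an assumption made purely to show how cheap a label-set comparison is; it is not the formula of this publication.

```python
from collections import Counter
from typing import Sequence


def label_set_similarity(example: Sequence[str], candidate: Sequence[str]) -> float:
    """Multiset Jaccard similarity between two sequences of shot labels.

    Returns a value in [0, 1]; 1.0 means the two sections contain exactly
    the same labels with the same counts, regardless of order.
    """
    a, b = Counter(example), Counter(candidate)
    union = sum((a | b).values())  # per-label maximum count
    if union == 0:
        return 0.0
    overlap = sum((a & b).values())  # per-label minimum count
    return overlap / union


example_labels = ["A", "B", "A"]
candidate_labels = ["A", "B", "C"]
score = label_set_similarity(example_labels, candidate_labels)
```

Because only short label sequences are compared, the cost per candidate section is independent of the size of the underlying video data.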

The search result display unit 107 retrieves the matching similar video sections from the video database 108 in descending order of the similarity calculated by the similarity calculation unit 106 and presents the results to the user. For example, to make the results easier to scan, several representative frames, taken from the shots of the section, may be displayed for each video section.

Next, the shot classification unit 104 is described.

FIG. 8 is a flowchart of the operation of the shot classification unit according to an embodiment of the present invention.

Step 11) The composition information of an unclassified shot input from the composition information extraction unit 103 is received and stored in a temporary memory (not shown).

Step 12) The label management table of the video database 108 is acquired, and the shots in the label management table are stored in the temporary memory (not shown).

Step 13) The composition information of the shot received in step 11 and that of the shots acquired in step 12 are read from the temporary memory (not shown), and the distance is calculated for every pair of shots. The distance between two shots is, for example, the sum of the distances between the regions whose feature vectors are closest to each other. As the distance measure, the sum of absolute differences or the weighted sum of squared distances defined by the following formula can be used.

Step 14) Pairs of shots whose distance obtained in step 13 is at or below a predetermined threshold are classified into the same category.

Step 15) For each category formed in step 14, it is determined whether the category contains a shot from the label management table of the video database 108. If yes, processing proceeds to step 16; if no, it proceeds to step 18.

Step 16) For each category, every shot in the category that has not yet been given a label (referred to as shot Z in this embodiment) is given the label D of a shot in the same category that already has a label (referred to as shot D in this embodiment), and the classification label D is stored in the Label column for shot Z in the video management table of the video database 108.

Step 17) The average of the composition information of all shots in the video management table of the video database 108 that belong to label D is computed and replaces the composition information of shot D in the label management table.

Step 18) A new label is assigned and stored in the video management table and the label management table.
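Steps 11 to 18 can be sketched as an incremental classifier that keeps one averaged representative per label. The code below is a simplified reading, not the flowchart itself: it handles one shot at a time, takes the shot distance as the sum over regions of the distance to the nearest region of the other shot, uses a weighted squared distance whose weights are hypothetical (the formula is not reproduced in this text), and assumes every shot has the same number of regions in the same order when averaging.

```python
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]
Shot = List[Vector]  # one feature vector per main region


def weighted_sq_distance(a: Vector, b: Vector, w: Vector) -> float:
    """Weighted sum of squared differences between two region vectors."""
    return sum(wk * (ak - bk) ** 2 for ak, bk, wk in zip(a, b, w))


def shot_distance(shot_a: Shot, shot_b: Shot, w: Vector) -> float:
    """Step 13 (assumed reading): for each region of shot_a, add the distance
    to its nearest region in shot_b."""
    if not shot_a or not shot_b:
        return float("inf")
    return sum(min(weighted_sq_distance(ra, rb, w) for rb in shot_b) for ra in shot_a)


class ShotClassifier:
    def __init__(self, threshold: float, weights: Vector):
        self.threshold = threshold
        self.weights = weights
        self.members: Dict[str, List[Shot]] = {}  # shots assigned to each label
        self.prototypes: Dict[str, Shot] = {}     # stands in for the label management table

    @staticmethod
    def _mean_shot(shots: List[Shot]) -> Shot:
        """Step 17: average the composition information region by region."""
        n = len(shots)
        return [[sum(vals) / n for vals in zip(*regions)] for regions in zip(*shots)]

    def classify(self, shot: Shot) -> str:
        # Steps 13 to 15: distance to every labelled representative, if any exist.
        distances = {
            label: shot_distance(shot, proto, self.weights)
            for label, proto in self.prototypes.items()
        }
        best: Optional[str] = min(distances, key=distances.get, default=None)
        if best is not None and distances[best] <= self.threshold:
            label = best  # step 16: reuse the existing label
        else:
            label = f"L{len(self.prototypes) + 1}"  # step 18: new label
            self.members[label] = []
        self.members[label].append(shot)
        self.prototypes[label] = self._mean_shot(self.members[label])  # step 17
        return label


clf = ShotClassifier(threshold=0.5, weights=[1.0, 1.0])
labels = [clf.classify([[0.0, 0.0]]), clf.classify([[0.5, 0.0]]), clf.classify([[3.0, 3.0]])]
```

Note that this shot distance is not symmetric; a symmetric variant could average the two directions, which is a design choice the text leaves open.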

FIG. 9 is a flowchart of the procedure for generating the video feature information (video management table and label management table) that serves as the key for searching, according to an embodiment of the present invention.

Step 101) The video reception unit 101 acquires video from outside via an external information storage device, such as a video distribution server or a video receiver, and stores it in the video storage area of the video database 108.

Step 102) The shot division unit 102 divides the video data acquired in step 101 into shots. The times of the first and last frames of each resulting shot, and the shot labels, are stored in the video management table and label management table of the video database 108.

Step 103) For each shot obtained in step 102, the composition information extraction unit 103 extracts the composition information of the shot and stores it in the video management table and label management table of the video database 108.

Step 104) Using the composition information calculated in step 103, the shot classification unit 104 classifies the shots.

FIG. 4 shows an example of the video management table of the video database 108 obtained as described above, and FIG. 5 shows an example of the label management table. FIG. 10 illustrates conceptually how labels are assigned to the video by the above flowchart.
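The flow of steps 102 to 104 can be pictured with the following Python sketch. It is not the disclosed implementation: the row fields other than the label, the function name, and the stand-in extraction and classification callables are hypothetical, and shot boundaries are assumed to be already known.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ShotRow:
    """One row of a video management table (field names are hypothetical)."""
    start_time: float
    end_time: float
    composition: Optional[Tuple[float, ...]] = None
    label: Optional[str] = None


def build_video_table(boundaries: List[float],
                      extract_composition: Callable[[ShotRow], Tuple[float, ...]],
                      classify: Callable[[ShotRow], str]) -> List[ShotRow]:
    # Step 102: one row per shot, holding first and last frame times.
    rows = [ShotRow(start, end) for start, end in zip(boundaries, boundaries[1:])]
    # Step 103: attach composition information to each shot.
    for row in rows:
        row.composition = extract_composition(row)
    # Step 104: classify each shot from its composition information.
    for row in rows:
        row.label = classify(row)
    return rows


# Stand-ins for illustration: shot duration as the only "composition" feature.
rows = build_video_table(
    [0.0, 2.0, 5.0],
    extract_composition=lambda r: (r.end_time - r.start_time,),
    classify=lambda r: "A" if r.composition[0] < 2.5 else "B",
)
labels = [row.label for row in rows]
```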

Next, the processing by which the video search device 100 searches for video based on the shot classification results (shot classification labels) obtained in advance is described.

FIG. 11 is a flowchart of the processing for searching for similar videos in an embodiment of the present invention.

Step 201) An example video section representing the user's search intention is specified through the search condition input unit 105. At this point, detailed conditions on the search target video, such as video genre, title, or recording date and time, may also be specifiable.

Step 202) The similarity calculation unit 106 refers to the Label column of the video management table in the video database 108 for the example video section and obtains the set of shot labels. It also acquires from the video database 108, in time-series order, the label set of the candidate videos to be searched.

Step 203) The similarity calculation unit 106 determines the search window width based on the set of shot labels in the example video section. The search window is used to extract a video section to be compared from the search target video, so that video sections can be compared with each other. The window width may be determined, for example, from the number of shot labels in the example video section or from the duration of the example video section.

Step 204) As shown in FIG. 12, the similarity calculation unit 106 places the search window determined in step 203 over the label set obtained in step 202, starting from the beginning, and obtains the set of shot labels inside the search window.

Step 205) The similarity calculation unit 106 calculates the similarity from the shot label set of the example video section obtained in step 203 and the shot label set of the section being searched obtained in step 204.

Step 206) It is determined whether the input search target video is the last. If yes, processing proceeds to step 207; if no, it returns to step 204.

Step 207) The search result display unit 107 displays the video sections on the display means in descending order of calculated similarity.
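Steps 203 to 207 can be sketched in Python as below. The description does not state the similarity formula, so this sketch assumes a Jaccard coefficient over the label types in each window. It also assumes that the window width equals the number of shot labels in the example section (one of the options named in step 203) and that the window advances one shot at a time. The function name is hypothetical.

```python
def search_similar_sections(example_labels, candidate_labels):
    """Return (similarity, start_index) pairs, most similar window first."""
    if not example_labels:
        return []
    # Step 203: window width taken from the number of example shot labels.
    window = len(example_labels)
    example_set = set(example_labels)
    results = []
    # Steps 204 and 206: slide the window from the beginning until the
    # candidate label sequence is exhausted (one reading of step 206).
    for start in range(len(candidate_labels) - window + 1):
        window_set = set(candidate_labels[start:start + window])
        # Step 205: assumed Jaccard similarity over label types.
        score = len(example_set & window_set) / len(example_set | window_set)
        results.append((score, start))
    # Step 207: order by descending similarity, earlier windows first on ties.
    results.sort(key=lambda item: (-item[0], item[1]))
    return results


example = ["A", "B", "A"]
candidates = ["C", "A", "B", "A", "D"]
results = search_similar_sections(example, candidates)
best_score, best_start = results[0]
```

Because only label types are compared, each window costs a few set operations, independent of the underlying image data.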

In the present invention, the operation of the video search device described above can be built as a program, which can then be installed and executed on a computer used as the video search device, or distributed via a network.

The program so built can also be stored on a hard disk or on a portable storage medium such as a flexible disk or CD-ROM, and then installed on a computer or distributed.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible within the scope of the claims.

The present invention is applicable to techniques for detecting video sections. For example, it can be applied to searching for video sections in which the camera operation and composition are similar, as in sports events, where the structure of the program makes broadcast video reproducible.

FIG. 1 is a diagram for explaining the principle of the present invention.
FIG. 2 is a diagram of the principle configuration of the present invention.
FIG. 3 is a block diagram of the video search device in an embodiment of the present invention.
FIG. 4 is an example of the video management table of the video database in an embodiment of the present invention.
FIG. 5 is an example of the label management table of the video database in an embodiment of the present invention.
FIG. 6 is an example of video viewed on the video playback device in an embodiment of the present invention.
FIG. 7 is an example of a screen for selecting an example video section in an embodiment of the present invention.
FIG. 8 is a flowchart of the operation of the shot classification unit in an embodiment of the present invention.
FIG. 9 is a flowchart of the procedure for generating video feature information (video management table and label management table) in an embodiment of the present invention.
FIG. 10 is a conceptual diagram of assigning labels to video in an embodiment of the present invention.
FIG. 11 is a flowchart of the processing for searching for similar videos in an embodiment of the present invention.
FIG. 12 is an example of obtaining the shot label sets of the sections to be searched from the search target video in an embodiment of the present invention.

Explanation of symbols

100 Video search device
101 Video reception unit
102 Shot dividing means, shot division unit
103 Composition information extraction means, composition information extraction unit
104 Shot classification means, shot classification unit
105 Search condition input unit
106 Similarity calculation means, similarity calculation unit
107 Search result display means, search result display unit
108 Video database
110 External information storage device
111 Video playback device

Claims

A video search method for searching for a video section similar to a video section exemplified by a user, the method being characterized by performing:
a shot dividing step in which shot dividing means divides the input video into partial video sections (hereinafter referred to as shots) at points where the content of the video changes, and stores them in storage means;
a composition information extraction step in which composition information extraction means extracts, from each shot, composition information that is a feature quantity of a main area, and stores it in the storage means;
a shot classification step in which shot classification means classifies the shots based on the composition information and stores a classification category result in the storage means;
a similarity calculation step in which similarity calculation means calculates the similarity between an input example video section and the shots, based on the classification category result obtained in the shot classification step; and
a search result display step in which search result display means outputs a video section based on the similarity.

The video search method according to claim 1, wherein, in the shot classification step,
an unclassified input shot is compared with the classified shots; if the result is less than a preset threshold, the unclassified input shot is classified into an existing shot category, and if the result is equal to or greater than the threshold, it is classified into a new classification category.

A video search device for searching for a video section similar to a video section exemplified by a user, the device being characterized by comprising:
shot dividing means for dividing the input video into partial video sections (hereinafter referred to as shots) at points where the content of the video changes, and storing them in storage means;
composition information extraction means for extracting, from each shot, composition information that is a feature quantity of a main area, and storing it in the storage means;
shot classification means for classifying the shots stored in the storage means based on the composition information, and storing a classification category result in the storage means;
similarity calculation means for calculating the similarity between an input example video section and the shots, based on the classification category result obtained by the shot classification means; and
search result display means for outputting a video section based on the similarity.

The video search device according to claim 3, wherein the shot classification means includes
means for comparing an unclassified input shot with the classified shots, classifying the unclassified input shot into an existing shot category if the result is less than a preset threshold, and classifying it into a new classification category if the result is equal to or greater than the threshold.

A video search program for searching for a video section similar to a video section exemplified by a user,
the program causing a computer to function as the video search device according to claim 3 or 4.

A computer-readable recording medium storing a video search program for searching for a video section similar to a video section exemplified by a user,
the recording medium storing a program for causing a computer to function as the video search device according to claim 3.