JP5522790B2 - Template image generation apparatus and template image generation program

Info

Publication number: JP5522790B2
Application number: JP2010161921A
Authority: JP
Inventors: 雅規佐野; 真人藤井
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2010-07-16
Filing date: 2010-07-16
Publication date: 2014-06-18
Anticipated expiration: 2030-07-16
Also published as: JP2012022622A

Description

The present invention relates to a template image generation apparatus and a template image generation program that generate, from a plurality of program videos, a template image for use in template matching.

In recent years, environments have been established in which large amounts of content such as program videos are stored and can be accessed easily. At the same time, techniques are needed for efficiently retrieving a desired video from a large amount of content. In connection with such content retrieval, research is under way on analyzing content such as program videos and automatically attaching metadata to it. Some approaches to automatically attaching metadata to a program video detect and use the boundaries of the program's rough structure, and a specific video or image is sometimes used to detect these boundaries.

For example, a single news program conveys many news items, and these items can be roughly delimited by studio shots of the announcer who reads the news. Therefore, if typical, repeatedly used production scenes such as the announcer's studio shots in a news program could be extracted from a large amount of content, they could be used effectively when automatically attaching metadata to program videos.

Non-Patent Documents 1 and 2 propose techniques for segmenting content such as program videos by performing template matching with a template image. Patent Documents 1 and 2 propose techniques for generating a template image from continuously input images.

Patent Document 1: JP-A-11-284997
Patent Document 2: JP 2006-276948 A

Non-Patent Document 1: HongJiang Zhang, Shuang Yeo Tan, Stephen W. Smoliar, Gong Yihong: “Automatic parsing and indexing of news video”, Multimedia Systems, Vol.2, pp.256-266, (1995)
Non-Patent Document 2: Deborah Swanberg, Chiao-Fe Shu, Ramesh Jain: “Knowledge Guided Parsing in Video Databases”, SPIE, Vol.1908, pp.13-24, (1993)

However, the techniques proposed in Non-Patent Documents 1 and 2 do not describe in detail a specific method for generating the template image. In addition, in those techniques most of the template image generation is done manually, which is inefficient for segmenting a large amount of content.

The techniques proposed in Patent Documents 1 and 2 merely generate a template image from a video; they cannot generate a template image from a typical production scene that is used repeatedly in the video.

Furthermore, in an announcer's studio shot in a news program, for example, the studio set in the background basically does not change, but the announcer who appears may change from day to day. The techniques proposed in the documents cited above, however, generate a template image without distinguishing between fixed portions of the program video, which do not change, and variable portions, which do. As a result, the accuracy of template matching decreases as the proportion of variable portions in the template image increases. Moreover, template matching uses a threshold to decide whether a target image is similar to the template image, and this threshold varies when the proportion of variable portions changes, so the techniques proposed in those documents are not practical.

The present invention has been made in view of these points. Its object is to provide a template image generation apparatus and a template image generation program that can automatically generate a template image from a typical production scene used repeatedly across a plurality of program videos, and that can generate a template image that takes into account the fixed and variable portions of the program videos.

To solve the above problem, the template image generation apparatus according to claim 1 is a template image generation apparatus that generates, from a plurality of program videos, a template image for use in template matching, and comprises image feature extraction means, template candidate cluster extraction means, and template image generation means.

With this configuration, the template image generation apparatus uses the image feature extraction means to divide the plurality of program videos into shots, extract a representative still image from each shot, divide each representative still image into a predetermined number of blocks, and extract an image feature for each block. The template candidate cluster extraction means hierarchically clusters the representative still images according to the similarity of the image features extracted by the image feature extraction means. Then, starting from each intersection between the dendrogram representing the clustering result and a cutting line that cuts the dendrogram at a predetermined level, it descends one level at a time and extracts, as a template candidate cluster, each cluster that has a branch separated by at least a predetermined distance. The template image generation means generates a template image from the template candidate clusters extracted by the template candidate cluster extraction means.

In the template image generation apparatus according to claim 1, the template image generation means comprises a variance calculation unit, a mask information generation unit, and a template image selection unit.

With this configuration, the template image generation means uses the variance calculation unit to calculate, for each block, the variance of the image features across the plurality of template candidate images, that is, the candidates for the template image contained in a template candidate cluster extracted by the template candidate cluster extraction means. When a variance calculated by the variance calculation unit exceeds a preset threshold, the mask information generation unit generates mask information, which specifies, block by block, where a mask is to be formed on the template image. The template image selection unit selects as the template image, from among the template candidate images in the template candidate cluster, the candidate image whose image features are closest to the average.
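The three steps just described (per-block variance, thresholded mask, and selection of the candidate closest to the average) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function name `build_mask_and_select`, the use of one scalar feature per block, and the squared Euclidean distance over all blocks as the measure of "closest to the average" are assumptions made here for illustration.

```python
from statistics import fmean, pvariance


def build_mask_and_select(candidates, threshold):
    """candidates: list of candidate images, each a list of per-block scalar features.

    Returns (variances, mask, index_of_selected_template).
    Assumption: one scalar feature per block; distance is squared Euclidean.
    """
    n_blocks = len(candidates[0])
    # Variance of each block's feature across all candidate images.
    variances = [pvariance([img[b] for img in candidates]) for b in range(n_blocks)]
    # A block is masked when its variance exceeds the threshold (variable portion).
    mask = [v > threshold for v in variances]
    # Per-block mean feature of the cluster.
    mean_vec = [fmean([img[b] for img in candidates]) for b in range(n_blocks)]

    def distance_to_mean(img):
        return sum((x - m) ** 2 for x, m in zip(img, mean_vec))

    # The candidate whose features are closest to the mean becomes the template.
    best = min(range(len(candidates)), key=lambda i: distance_to_mean(candidates[i]))
    return variances, mask, best


# Three candidates with three blocks each; only the last block varies strongly.
cands = [[10, 10, 0], [10, 12, 50], [10, 11, 100]]
variances, mask, best = build_mask_and_select(cands, threshold=5)
```

In this toy input the third block would be masked as a variable portion, and the second candidate, which sits at the cluster mean in that block, would be selected.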

The template image generation apparatus according to claim 2 is the apparatus according to claim 1 in which the template candidate cluster extraction means comprises a hierarchical clustering unit and a candidate cluster extraction unit.

With this configuration, the hierarchical clustering unit arranges the per-block image features of each representative still image, extracted by the image feature extraction means, in a predetermined order to form the feature vector of that representative still image, and hierarchically clusters the representative still images according to the similarity of their feature vectors. The candidate cluster extraction unit then starts from each intersection between the dendrogram representing the result of the hierarchical clustering and a cutting line that cuts the dendrogram at a predetermined level, descends one level at a time, and extracts, as a template candidate cluster, each cluster that has a branch separated by at least a predetermined distance.

The template image generation apparatus according to claim 3 is the apparatus according to claim 2 in which the template candidate cluster extraction means comprises a candidate cluster narrowing unit.

With this configuration, the candidate cluster narrowing unit narrows down the template candidate clusters according to two conditions. Under the first condition, when a template candidate cluster extracted by the candidate cluster extraction unit contains representative still images extracted from adjacent shots, those representative still images are deleted from the cluster. Under the second condition, when the number of program videos from which the representative still images remaining in a template candidate cluster after the first condition were extracted is less than a preset number, that template candidate cluster is deleted.
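The two narrowing conditions can be sketched as follows. This is an illustrative reading rather than the claimed implementation: the function name `narrow_clusters`, the representation of a representative still image as a `(program, shot_number)` pair, and the choice to delete every member of an adjacent pair are assumptions.

```python
def narrow_clusters(clusters, min_programs):
    """clusters: list of clusters, each a list of (program, shot_number) pairs.

    Condition 1: drop stills whose neighbouring shot (same program) is in the same cluster.
    Condition 2: drop clusters whose remaining stills come from fewer than min_programs programs.
    """
    kept = []
    for cluster in clusters:
        members = set(cluster)
        # Condition 1: remove stills extracted from adjacent shots of one program.
        filtered = [
            (prog, shot)
            for (prog, shot) in cluster
            if (prog, shot - 1) not in members and (prog, shot + 1) not in members
        ]
        # Condition 2: require stills from at least min_programs distinct programs.
        programs = {prog for prog, _ in filtered}
        if len(programs) >= min_programs:
            kept.append(filtered)
    return kept


clusters = [
    [("A", 1), ("A", 2), ("B", 5)],   # A1 and A2 are adjacent shots
    [("A", 3), ("B", 7), ("C", 2)],   # spread over three programs
    [("A", 10), ("A", 20)],           # one program only
]
result = narrow_clusters(clusters, min_programs=2)
```

With these inputs only the second cluster should survive: the first is left with a single program after condition 1, and the third never spans more than one program.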

The template image generation program according to claim 4 causes a computer to function as image feature extraction means, template candidate cluster extraction means, variance calculation means, mask information generation means, and template image selection means in order to generate, from a plurality of program videos, a template image for use in template matching.

With this configuration, the template image generation program uses the image feature extraction means to divide the plurality of program videos into shots, extract a representative still image from each shot, divide each representative still image into a predetermined number of blocks, and extract an image feature for each block. The template candidate cluster extraction means hierarchically clusters the representative still images according to the similarity of the image features extracted by the image feature extraction means. Then, starting from each intersection between the dendrogram representing the clustering result and a cutting line that cuts the dendrogram at a predetermined level, it descends one level at a time and extracts, as a template candidate cluster, each cluster that has a branch separated by at least a predetermined distance. The variance calculation means calculates, for each block, the variance of the image features across the plurality of template candidate images contained in a template candidate cluster extracted by the template candidate cluster extraction means. When a variance calculated by the variance calculation means exceeds a preset threshold, the mask information generation means generates mask information, which specifies, block by block, where a mask is to be formed on the template image. The template image selection means selects as the template image, from among the template candidate images in the template candidate cluster, the candidate image whose image features are closest to the average.

According to the inventions of claims 1 and 4, the representative still images are hierarchically clustered according to the similarity of their image features, and clusters containing a plurality of partially similar representative still images are extracted from the result. This makes it possible to automatically generate a template image from a typical production scene used repeatedly across a plurality of program videos. In addition, the variable portions of the image are identified by calculating the variance of the image features across the template candidate images, and mask information covering those variable portions is generated. Compositing the mask specified by this mask information onto the template image improves the accuracy of template matching and prevents the matching threshold from varying.

According to the invention of claim 2, template candidate clusters can be extracted easily by descending the dendrogram representing the result of the hierarchical clustering one level at a time from each intersection with the cutting line and searching for clusters that have a branch separated by at least a predetermined distance.

According to the invention of claim 3, narrowing down the template candidate clusters in stages using two conditions makes it possible to accurately extract only the typical production scenes that are used repeatedly across a plurality of program videos.

FIG. 1 is a block diagram showing the overall configuration of a template image generation apparatus according to an embodiment of the present invention.
FIG. 2 is a schematic diagram showing processing by the shot division unit of the template image generation apparatus according to the embodiment, in which (a) shows the processing for program A and (b) shows the processing for program B.
FIG. 3 is a schematic diagram showing processing by the representative still image extraction unit of the template image generation apparatus according to the embodiment, in which (a) shows the processing for program A and (b) shows the processing for program B.
FIG. 4 is a schematic diagram showing processing by the block division unit of the template image generation apparatus according to the embodiment.
FIG. 5 is a schematic diagram showing an overview of hierarchical clustering.
FIG. 6 is a schematic diagram showing processing by the hierarchical clustering unit of the template image generation apparatus according to the embodiment, in which (a) is a dendrogram showing the result of hierarchical clustering and (b) is an enlarged view of portion A enclosed by a broken line in (a).
FIG. 7 is an enlarged view of portion B enclosed by a broken line in FIG. 6(a).
FIG. 8 is a flowchart showing processing by the candidate cluster extraction unit of the template image generation apparatus according to the embodiment.
FIG. 9 is a schematic diagram showing processing based on the first condition by the candidate cluster narrowing unit of the template image generation apparatus according to the embodiment, in which (a) shows a case where the template candidate images were extracted from adjacent shots and (b) shows a case where they were not.
FIG. 10 is a schematic diagram showing processing by the variance calculation unit of the template image generation apparatus according to the embodiment.
FIG. 11 is a schematic diagram showing an example of the mask information and template image output by the template image generation apparatus according to the embodiment, in which (a) shows the mask information generated by the mask information generation unit and the template image selected by the template image selection unit, and (b) shows the composited template image with mask information.

A template image generation apparatus and a template image generation program according to an embodiment of the present invention will be described with reference to the drawings.

[Template image generation apparatus]
The template image generation apparatus 1 generates, from a plurality of program videos, a template image for use in template matching. As shown in FIG. 1, its main components are image feature extraction means 10, template candidate cluster extraction means 20, and template image generation means 30. The plurality of program videos used in the present invention are the videos of two or more programs in the same series, for example a news program with the same title broadcast at the same time every week.

The image feature extraction means 10 divides the plurality of program videos into shots, extracts a representative still image from each shot, and extracts an image feature for each block of each representative still image. As shown in FIG. 1, the image feature extraction means 10 here comprises a shot division unit 11, a representative still image extraction unit 12, a block division unit 13, and a feature extraction unit 14.

The shot division unit 11 divides an input program video into shots, each of which is an uninterrupted run of video. For example, as shown in FIG. 2(a) and (b), when the program videos of two programs A and B are input, the shot division unit 11 detects breaks in each video, such as edit points, and divides the video into shots at those breaks. Because the breaks differ from program to program, the length of each shot also differs from program to program, as shown in FIG. 2(a) and (b).
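The text does not say how the breaks are detected. As one common possibility, and purely as an assumption not taken from this document, the sketch below cuts wherever the intensity histograms of consecutive frames differ by more than a threshold. The names `histogram` and `split_into_shots`, the bin count, and the threshold value are all hypothetical; frames are modelled as flat lists of 8-bit grayscale values.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalized intensity histogram of a frame given as a flat list of 0..255 values."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // max_value] += 1
    total = len(frame)
    return [c / total for c in counts]


def split_into_shots(frames, threshold=0.5):
    """Group consecutive frames into shots, cutting where the histogram change is large.

    Assumption: a shot boundary is declared when half the L1 distance between
    consecutive normalized histograms exceeds the threshold.
    """
    if not frames:
        return []
    shots = [[frames[0]]]
    prev = histogram(frames[0])
    for frame in frames[1:]:
        cur = histogram(frame)
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / 2
        if diff > threshold:
            shots.append([])  # start a new shot at the detected break
        shots[-1].append(frame)
        prev = cur
    return shots


dark, bright = [10] * 16, [240] * 16
shots = split_into_shots([dark, dark, bright, bright, bright, dark])
```

Any other cut detector could be substituted; the rest of the pipeline only needs the list of shots.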

As shown in FIG. 1, a plurality of program videos are input to the shot division unit 11 from program video storage means or the like (not shown). The shot division unit 11 divides each program video into shots by the method described above and outputs the shots to the representative still image extraction unit 12.

The representative still image extraction unit 12 extracts a representative still image from each shot of a program video. Here, a representative still image is a still image that represents the content of a shot. Which frame of a shot is extracted as the representative still image depends on the content of the program video. For example, when the program video is a news program, the video often changes little within a shot, so the representative still image extraction unit 12 extracts the first frame of each shot as the representative still image.

For example, as shown in FIG. 3(a) and (b), when the shots of two news programs A and B are input from the shot division unit 11, the representative still image extraction unit 12 extracts the first frame of each shot as the representative still image. When extracting representative still images from the shots, the representative still image extraction unit 12 attaches to each representative still image the name of the program from which it was extracted and a number indicating which shot of the program video it was extracted from.
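The first-frame rule and the attached program name and shot number can be sketched as below. The record type `RepresentativeStill`, the function name, and the 1-based shot numbering are illustrative assumptions; the text specifies only that a program name and a shot number are attached.

```python
from typing import Any, NamedTuple


class RepresentativeStill(NamedTuple):
    program: str       # name of the program the still was extracted from
    shot_number: int   # position of the source shot within that program (1-based here)
    frame: Any         # the frame data itself


def extract_representatives(program, shots):
    """Take the first frame of every shot and tag it with program name and shot number."""
    return [
        RepresentativeStill(program, number, shot[0])
        for number, shot in enumerate(shots, start=1)
    ]


stills = extract_representatives("A", [["f0", "f1"], ["f2"], ["f3", "f4", "f5"]])
```

Keeping the shot number with each still is what later allows stills from adjacent shots to be recognized.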

As shown in FIG. 1, the shots making up the plurality of program videos are input to the representative still image extraction unit 12 from the shot division unit 11. The representative still image extraction unit 12 extracts representative still images from these shots by the method described above and outputs them to the block division unit 13.

The block division unit 13 divides each representative still image into a preset number of blocks. For example, as shown in FIG. 4, when a plurality of representative still images are input from the representative still image extraction unit 12, the block division unit 13 divides each of them into a preset grid of 18 columns by 11 rows (198 blocks).
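A minimal sketch of the 18 by 11 division follows, assuming an image is a list of pixel rows. The function name `split_into_blocks` and the integer rounding of block boundaries for image sizes that are not exact multiples of the grid are assumptions.

```python
def split_into_blocks(image, cols=18, rows=11):
    """Split an image (list of pixel rows) into rows*cols blocks in raster order.

    Block boundaries use integer division, so images whose size is not an
    exact multiple of the grid still yield exactly rows*cols blocks.
    """
    height, width = len(image), len(image[0])
    blocks = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            blocks.append([row[x0:x1] for row in image[y0:y1]])
    return blocks


# A 36x22 test image whose "pixels" are their own (y, x) coordinates.
image = [[(y, x) for x in range(36)] for y in range(22)]
blocks = split_into_blocks(image)
```

The blocks are returned from upper left to lower right, which matches the ordering later used to build the feature vector.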

As shown in FIG. 1, the representative still images extracted from the shots of the plurality of programs are input to the block division unit 13 from the representative still image extraction unit 12. The block division unit 13 divides each representative still image into blocks by the method described above and outputs the result to the feature extraction unit 14.

The feature extraction unit 14 extracts an image feature for each block of a representative still image. The image feature extracted by the feature extraction unit 14 can be, for example, color information such as the average of the RGB components or the average of the L*a*b* components of each block, or spatial frequency information such as DCT coefficients.
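Of the features listed, the per-block RGB average is the simplest to illustrate. The sketch below assumes a block is a list of rows of `(r, g, b)` tuples; the function name `mean_rgb` is hypothetical.

```python
def mean_rgb(block):
    """Average R, G and B over all pixels of a block (list of rows of (r, g, b) tuples)."""
    pixels = [pixel for row in block for pixel in row]
    n = len(pixels)
    return tuple(sum(pixel[ch] for pixel in pixels) / n for ch in range(3))


block = [
    [(0, 0, 0), (255, 0, 10)],
    [(255, 0, 10), (0, 0, 20)],
]
feature = mean_rgb(block)
```

Applied to every block of an 18 by 11 grid, this would give 198 three-component features per representative still image.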

As shown in FIG. 1, the block-divided representative still images are input to the feature extraction unit 14 from the block division unit 13. The feature extraction unit 14 extracts an image feature for each block of each representative still image by the method described above and outputs the features to the hierarchical clustering unit 21 of the template candidate cluster extraction means 20.

The template candidate cluster extraction means 20 hierarchically clusters the representative still images according to the similarity of the image features extracted by the image feature extraction means 10. Then, starting from each intersection between the dendrogram representing the clustering result and a cutting line that cuts the dendrogram at a predetermined level, it descends one level at a time and extracts each cluster that has a branch separated by at least a predetermined distance as a template candidate cluster, that is, a cluster containing a plurality of images that are candidates for the template image. As shown in FIG. 1, the template candidate cluster extraction means 20 here comprises a hierarchical clustering unit 21, a candidate cluster extraction unit 22, a distance condition storage unit 23, a candidate cluster narrowing unit 24, and a ratio condition storage unit 25.

The hierarchical clustering unit 21 performs hierarchical clustering using the per-block image features of the representative still images. Hierarchical clustering is a technique that initially treats each item of data as a cluster of its own and then groups the clusters hierarchically according to their similarity. An overview of hierarchical clustering is given below with reference to FIG. 5.

As shown in FIG. 5, consider hierarchically clustering six items of data, data 1 to 6, each initially regarded as a cluster of its own. First, the pair of clusters with the most similar features is selected from among data 1 to 6. If, for example, the features of data 1 and data 4 are the most similar of all, data 1 and data 4 are joined by a line as shown in FIG. 5 to form a first merged cluster C₁. The first merged cluster C₁ contains the two clusters data 1 and data 4.

Next, the pair with the most similar features is selected from among the first merged cluster C₁ and the remaining clusters, data 2, 3, 5, and 6. The feature of the first merged cluster C₁ is the average of the features of data 1 and data 4. If, for example, the features of the first merged cluster C₁ and data 5 are the most similar among the remaining clusters, the first merged cluster C₁ and data 5 are joined by a line as shown in FIG. 5 to form a second merged cluster C₂. The second merged cluster C₂ contains the three clusters data 1, data 4, and data 5.

In hierarchical clustering, clusters with similar features are merged one after another in this way until, as shown in FIG. 5, all clusters finally form a single fourth merged cluster C₄. FIG. 5 is a dendrogram, a tree diagram that visualizes the result of hierarchical clustering. The vertical axis of the dendrogram in FIG. 5 shows the merge distance (dissimilarity) at which the clusters are joined: clusters joined at a lower level of the dendrogram have more similar features, and clusters joined at a higher level have less similar features.
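The merging procedure can be sketched in pure Python as follows. It follows the description in that a merged cluster is represented by the average of its members' features, but the Euclidean distance, the strictly pairwise merging, the function names, and the sample values are assumptions chosen for illustration; the sketch does not attempt to reproduce FIG. 5.

```python
import math


def centroid(points):
    """Component-wise average of a list of feature vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def agglomerate(data):
    """data: dict mapping label -> feature vector.

    Repeatedly merges the two clusters whose centroids are closest and
    returns the merges in order as (member_labels, merge_distance).
    """
    clusters = {(label,): [vec] for label, vec in data.items()}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                d = math.dist(centroid(clusters[keys[i]]), centroid(clusters[keys[j]]))
                if best is None or d < best[0]:
                    best = (d, keys[i], keys[j])
        d, a, b = best
        merged = tuple(sorted(a + b))
        clusters[merged] = clusters.pop(a) + clusters.pop(b)
        merges.append((merged, d))  # d is the height of this join in the dendrogram
    return merges


# Hypothetical one-dimensional features for six data items.
data = {1: [1.0], 2: [10.0], 3: [13.0], 4: [1.5], 5: [3.0], 6: [20.0]}
merges = agglomerate(data)
```

With these sample values, data 1 and data 4 merge first and data 5 joins them next, mirroring the order in the walkthrough above; the recorded distances are the heights that a dendrogram would plot on its vertical axis.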

階層クラスタリング部２１は、具体的には、画像特徴量抽出手段１０の特徴量抽出部１４によって抽出された代表静止画像のブロックごとの画像特徴量を、所定の順序、例えば代表静止画像のブロックの左上から右下に向って並べたものを代表静止画像の特徴ベクトルとする。そして、１枚の代表静止画像を１つのクラスタとして、それぞれの特徴ベクトルの類似度に基づいて階層クラスタリングを行う。 Specifically, the hierarchical clustering unit 21 takes the per-block image feature amounts of a representative still image, extracted by the feature amount extraction unit 14 of the image feature amount extraction means 10 and arranged in a predetermined order (for example, from the upper-left block to the lower-right block of the representative still image), as the feature vector of that representative still image. Then, treating each representative still image as one cluster, it performs hierarchical clustering based on the similarity of the feature vectors.

階層クラスタリング部２１による階層クラスタリングの結果を視覚的に表現したものが、図６（ａ）に示す樹形図である。図６（ａ）における樹形図において、縦軸は、それぞれのクラスタが結合した際におけるそれぞれの特徴ベクトルの結合距離（非類似度）を示しており、横軸（図６（ｂ））は、抽出された各代表静止画像を示している。 The dendrogram shown in FIG. 6(a) is a visual representation of the result of hierarchical clustering by the hierarchical clustering unit 21. In the dendrogram of FIG. 6(a), the vertical axis indicates the coupling distance (dissimilarity) between the feature vectors at which the clusters were merged, and the horizontal axis (FIG. 6(b)) shows the extracted representative still images.

階層クラスタリング部２１には、図１に示すように、特徴量抽出部１４から、代表静止画像のブロックごとの画像特徴量が入力される。そして、階層クラスタリング部２１は、前記した手法によって階層クラスタリングを行い、図６（ａ）の樹形図に示すような階層クラスタリング結果を候補クラスタ抽出部２２に出力する。 As shown in FIG. 1, the hierarchical clustering unit 21 receives the image feature amount for each block of the representative still image from the feature amount extraction unit 14. The hierarchical clustering unit 21 performs hierarchical clustering by the above-described method, and outputs a hierarchical clustering result as shown in the tree diagram of FIG. 6A to the candidate cluster extracting unit 22.

候補クラスタ抽出部２２は、階層クラスタリング部２１による階層クラスタリングの結果から、テンプレート候補クラスタを抽出するものである。ここで、テンプレート候補クラスタとは、テンプレート画像の候補となる代表静止画像が含まれたクラスタのことを意味している。候補クラスタ抽出部２２は、階層クラスタリング部２１から階層クラスタリングの結果が入力されると、次に示す所定のアルゴリズムでテンプレート候補クラスタを抽出する。説明のために図６（ａ）を拡大したものを図７に示す。 The candidate cluster extraction unit 22 extracts template candidate clusters from the result of hierarchical clustering by the hierarchical clustering unit 21. Here, the template candidate cluster means a cluster including a representative still image that is a template image candidate. When the result of the hierarchical clustering is input from the hierarchical clustering unit 21, the candidate cluster extracting unit 22 extracts a template candidate cluster using a predetermined algorithm shown below. FIG. 7 shows an enlarged view of FIG. 6A for explanation.

図７を参照すると、上の階層に階段状の領域が存在する。この階段状の領域は、様々なクラスタが結合して形成されたクラスタＣ_ｘに、その他のクラスタが次々と結合することによって形成された領域である。このクラスタＣ_ｘに結合するクラスタの中には、図７に示すように、代表静止画像が１枚のみ含まれたクラスタＣ_０と、代表静止画像が複数枚含まれたクラスタＣ_２と、が存在する。 Referring to FIG. 7, there is a staircase-shaped region at the upper levels. This staircase-shaped region is formed when other clusters merge one after another into a cluster C_x that was itself formed by merging various clusters. As shown in FIG. 7, the clusters that merge into cluster C_x include clusters C₀, which contain only one representative still image, and clusters C₂, which contain multiple representative still images.

ここで、図７に示すクラスタＣ_０の中でｒやｓの画像を含むものは、樹形図における下の階層で他のどのクラスタとも結合することなく、非類似度の高い上の階層でのみクラスタＣ_ｘと結合したクラスタである。従って、これらの代表静止画像は、画像全体においても、あるいは画像の一部分においても、他の代表静止画像とはあまり類似していない画像であることが推定される。一方、図７に示すクラスタＣ_２は、非類似度の高い上の階層でクラスタＣ_ｘと結合しているものの、クラスタＣ_ｘとの結合からある一定距離だけ離れた低い階層において、複数のクラスタにより形成されている。従って、クラスタＣ_２に含まれる代表静止画像は、画像全体においては他の代表静止画像と類似しているものの、画像の一部分においては他の代表静止画像と異なる画像であることが推定される。 Here, among the clusters C₀ shown in FIG. 7, those containing the images r and s are clusters that did not merge with any other cluster at the lower levels of the dendrogram and merged with cluster C_x only at an upper level where the dissimilarity is high. These representative still images are therefore presumed to be images that are not very similar to the other representative still images, either as a whole or in part. The cluster C₂ shown in FIG. 7, on the other hand, merges with cluster C_x at an upper level where the dissimilarity is high, but is itself formed from multiple clusters at a lower level that lies a certain distance below its merge with cluster C_x. The representative still images in cluster C₂ are therefore presumed to be images that resemble the other representative still images as a whole but differ from them in part.

候補クラスタ抽出部２２は、以上のような推定のもと、図７に示す階段状の領域部分を基準として、テンプレート候補クラスタを抽出する。以下、候補クラスタ抽出部２２によるテンプレート候補クラスタの抽出アルゴリズムについて、図７を参照しつつ、かつ、図８のフローチャートに沿って説明する。なお、図７では、説明の便宜上、各代表静止画像をａ〜ｓで示すこととする。また、後記する距離条件ｄ_ｃ（閾値）の一例を表したものを図中の左上に示す。 Based on the above estimation, the candidate cluster extraction unit 22 extracts a template candidate cluster on the basis of the stepped region shown in FIG. The template candidate cluster extraction algorithm by the candidate cluster extraction unit 22 will be described below with reference to FIG. 7 and along the flowchart of FIG. In FIG. 7, for convenience of explanation, each representative still image is denoted by a to s. An example of a distance condition d _c (threshold) described later is shown in the upper left of the drawing.

候補クラスタ抽出部２２は、階層クラスタリング部２１から階層クラスタリング結果が入力され、候補クラスタ抽出処理がスタートすると、まず図７に示すように、階層クラスタリングの分類結果である樹形図を所定の階層で切断する（ステップＳ１）。ここで、所定の階層で切断するとは、例えば、クラスタ同士の結合距離を全クラスタ分積算したものを全クラスタ数で除算して平均結合距離を求め、その平均結合距離の位置で樹形図を切断することを意味している。また他にも、平均結合距離の位置より、階段状の樹形図を上方に辿り、１つ上の階層への距離が予め定めた閾値を越えるところで切断する方法もある。 When the hierarchical clustering result is input from the hierarchical clustering unit 21 and the candidate cluster extraction process starts, the candidate cluster extraction unit 22 first cuts the dendrogram representing the hierarchical clustering result at a predetermined level, as shown in FIG. 7 (step S1). Here, cutting at a predetermined level means, for example, summing the coupling distances between clusters over all clusters, dividing the sum by the total number of clusters to obtain an average coupling distance, and cutting the dendrogram at the position of that average coupling distance. Alternatively, the staircase-shaped dendrogram may be traced upward from the position of the average coupling distance and cut where the distance to the next level up exceeds a predetermined threshold.
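
As a rough illustration of the first cutting rule in step S1, the cut height can be computed as the mean of the coupling distances. The function name `average_coupling_distance` is hypothetical, and the sketch assumes that the "total number of clusters" used as the divisor is the number of merges recorded in the dendrogram; the text does not define the divisor more precisely.

```python
def average_coupling_distance(coupling_distances):
    """Cut height for step S1: mean of the coupling distances of all merges.

    Assumption: the divisor is the number of merges in the dendrogram.
    """
    return sum(coupling_distances) / len(coupling_distances)
```

For instance, merges at distances 1.0, 2.5, 7.0, 9.0, and 12.0 would place the cutting line at about 6.3 under this assumption.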

次に、候補クラスタ抽出部２２は、図７における切断線と交わった交点Ｂ１〜Ｂ７を出発点リストに追加する（ステップＳ２）。次に、候補クラスタ抽出部２２は、出発点リストが空かどうかを判定する（ステップＳ３）。そして、出発点リストが空ではない場合（ステップＳ３でＮｏ）、候補クラスタ抽出部２２は、出発点リストから１つの出発点（交点）を選択し、当該出発点の下方向に向って処理を開始する（ステップＳ４）。一方、出発点リストが空である場合（ステップＳ３でＹｅｓ）、候補クラスタ抽出部２２は、処理を終了する。 Next, the candidate cluster extraction unit 22 adds the intersections B1 to B7 that intersect the cutting line in FIG. 7 to the starting point list (step S2). Next, the candidate cluster extraction unit 22 determines whether the starting point list is empty (step S3). If the departure point list is not empty (No in step S3), the candidate cluster extraction unit 22 selects one departure point (intersection) from the departure point list, and performs processing in the downward direction of the departure point. Start (step S4). On the other hand, when the starting point list is empty (Yes in step S3), the candidate cluster extraction unit 22 ends the process.

次に、候補クラスタ抽出部２２は、出発点の下方向、すなわち下の階層において、分岐点があるか否かを判定する（ステップＳ５）。そして、分岐点がある場合、候補クラスタ抽出部２２は、ステップＳ６に進む。一方、分岐点がない場合、すなわち図７に示す代表静止画像ｋ，ｌ，ｍ，ｎ，ｒ，ｓのように、クラスタが代表静止画像を一枚しか含まない場合、候補クラスタ抽出部２２は、出発点リストから現在の出発点を削除し（ステップＳ７）、ステップＳ４に戻る。 Next, the candidate cluster extraction unit 22 determines whether there is a branch point in the downward direction of the starting point, that is, in the lower hierarchy (step S5). If there is a branch point, the candidate cluster extraction unit 22 proceeds to step S6. On the other hand, when there is no branch point, that is, when the cluster includes only one representative still image as in the representative still images k, l, m, n, r, and s shown in FIG. The current starting point is deleted from the starting point list (step S7), and the process returns to step S4.

次に、候補クラスタ抽出部２２は、出発点と分岐点との間の距離が、予め設定された距離条件ｄ_ｃに規定された距離以上であるか否かを判定する（ステップＳ６）。そして、距離条件に規定された距離以上である場合（ステップＳ６でＹｅｓ）、候補クラスタ抽出部２２は、出発点以下のクラスタ（出発点より下の階層にあるクラスタ）をテンプレート候補クラスタとして抽出し、出発点リストから当該出発点を削除し（ステップＳ８）、ステップＳ４に戻る。 Next, the candidate cluster extraction unit 22 determines whether the distance between the starting point and the branch point is equal to or greater than the distance specified by the preset distance condition d_c (step S6). If it is (Yes in step S6), the candidate cluster extraction unit 22 extracts the cluster below the starting point (the cluster in the levels below the starting point) as a template candidate cluster, deletes that starting point from the starting point list (step S8), and returns to step S4.

例えば図７に示すように出発点をＢ５とした場合、出発点Ｂ５と分岐点Ｇ２との距離ｄ_２は距離条件ｄ_ｃに示された距離以上である。従って、候補クラスタ抽出部２２は、分岐点Ｇ２以下のクラスタをテンプレート候補クラスタとして抽出する。なお、このテンプレート候補クラスタに含まれる代表静止画像は、図７に示すように、代表静止画像ｏ，ｐ，ｑの３枚となる。 For example, when the starting point is B5 as shown in FIG. 7, the distance d₂ between the starting point B5 and the branch point G2 is equal to or greater than the distance specified by the distance condition d_c. The candidate cluster extraction unit 22 therefore extracts the cluster at and below the branch point G2 as a template candidate cluster. As shown in FIG. 7, this template candidate cluster contains three representative still images: o, p, and q.

一方、距離条件に規定された距離未満である場合（ステップＳ６でＮｏ）、候補クラスタ抽出部２２は、出発点リストから現在の出発点を削除して代わりに出発点リストに分岐点の両端の点を追加し（ステップＳ９）、ステップＳ３に戻る。 On the other hand, if the distance is less than the distance specified by the distance condition (No in step S6), the candidate cluster extraction unit 22 deletes the current starting point from the starting point list, adds the points at both ends of the branch point to the starting point list in its place (step S9), and returns to step S3.

例えば図７に示すように出発点をＢ１とした場合、出発点Ｂ１と分岐点Ｇ１との距離ｄ_１は距離条件ｄ_ｃに示された距離未満である。従って、候補クラスタ抽出部２２は、分岐点Ｇ１を出発点リストから削除し、分岐点Ｇ１の両端の点Ｒ１，Ｒ２を新たな出発点として出発点リストに追加し、ステップＳ３以下の処理を繰り返す。ここで、点Ｒ２は一枚の代表静止画像ｋしか含まないため、候補クラスタ抽出部２２は、出発点リストから点Ｒ２を削除する。一方、点Ｒ１は分岐点Ｐ１との距離が距離条件ｄ_ｃ以下であるため、候補クラスタ抽出部２２は、出発点リストから点Ｒ１を削除するとともに、出発点リストに点Ｐ１の両端の点Ｆ１，Ｆ２を追加し、ステップＳ３以下の処理を繰り返す。 For example, when the starting point is B1 as shown in FIG. 7, the distance d₁ between the starting point B1 and the branch point G1 is less than the distance specified by the distance condition d_c. The candidate cluster extraction unit 22 therefore deletes the branch point G1 from the starting point list, adds the points R1 and R2 at both ends of the branch point G1 to the starting point list as new starting points, and repeats the processing from step S3 onward. Here, since the point R2 contains only a single representative still image k, the candidate cluster extraction unit 22 deletes the point R2 from the starting point list. Since the distance between the point R1 and the branch point P1 is equal to or less than the distance condition d_c, the candidate cluster extraction unit 22 deletes the point R1 from the starting point list, adds the points F1 and F2 at both ends of the point P1 to the starting point list, and repeats the processing from step S3 onward.

このようにして、候補クラスタ抽出部２２は、例えば図７では、代表静止画像ｂ，ｃが含まれるクラスタ、代表静止画像ｉ，ｊが含まれるクラスタ、代表静止画像ｏ，ｐ，ｑが含まれるクラスタ、の３つのテンプレート候補クラスタを抽出する。なお、前記した距離条件に示された距離は、予め実験的に求めた値であり、図１に示す距離条件記憶部２３に予め記憶されている。 In this way, the candidate cluster extraction unit 22 includes, for example, the cluster including the representative still images b and c, the cluster including the representative still images i and j, and the representative still images o, p, and q in FIG. Three candidate template clusters are extracted. The distance indicated in the distance condition described above is a value obtained experimentally in advance, and is stored in advance in the distance condition storage unit 23 shown in FIG.
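
The flow of steps S1 to S9 can be sketched as follows. This is an illustrative reading of the flowchart, not the patented implementation: the `Node` class and the functions `leaves`, `starting_points`, and `extract_candidates` are hypothetical names, the dendrogram is assumed to be available as a tree whose internal nodes store their coupling distance as `height`, and the order in which starting points are taken from the list (here, last in first out) is an assumption because the text does not specify it.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    height: float = 0.0                            # coupling distance; 0.0 for a leaf
    children: list = field(default_factory=list)   # empty for a leaf
    label: str = ""                                # image name (leaves only)

def leaves(node):
    """Names of all representative still images under a node."""
    if not node.children:
        return [node.label]
    return [name for child in node.children for name in leaves(child)]

def starting_points(root, cut):
    """S1-S2: branches crossed by the cutting line, as (node below, point height)."""
    if root.height < cut:
        return [(root, cut)]
    points = []
    for child in root.children:
        points.extend(starting_points(child, cut))
    return points

def extract_candidates(root, cut, d_c):
    """Return the template candidate clusters as sorted lists of image names."""
    pending = starting_points(root, cut)
    candidates = []
    while pending:                                  # S3: stop when the list is empty
        node, start = pending.pop()                 # S4: take one starting point
        if not node.children:                       # S5: no branch point below
            continue                                # S7: discard single images
        if start - node.height >= d_c:              # S6: far enough from the branch
            candidates.append(sorted(leaves(node))) # S8: keep the whole cluster
        else:                                       # S9: descend to both ends
            pending.extend((child, node.height) for child in node.children)
    return sorted(candidates)
```

On a small tree in which images b and c merge at distance 1.0 and images o, p, and q merge by distance 3.0, a cut at 10.0 with `d_c = 3.0` yields the two clusters `["b", "c"]` and `["o", "p", "q"]`, while single images that reach the cutting line without branching are discarded.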

候補クラスタ抽出部２２には、図１に示すように、階層クラスタリング部２１から、階層クラスタリング結果が入力されるとともに、距離条件記憶部２３から距離条件が入力される。そして、候補クラスタ抽出部２２は、前記した手法によってテンプレート候補クラスタを抽出し、これを候補クラスタ絞り込み部２４に出力する。 As shown in FIG. 1, the candidate cluster extraction unit 22 receives the hierarchical clustering result from the hierarchical clustering unit 21 and the distance condition from the distance condition storage unit 23. Then, the candidate cluster extraction unit 22 extracts template candidate clusters by the above-described method, and outputs this to the candidate cluster narrowing unit 24.

距離条件記憶部２３は、前記したように、出発点と分岐点との距離が示された距離条件ｄ_ｃ（閾値）を予め記憶するものである。距離条件記憶部２３は、具体的には、データを記憶することができるメモリ、ハードディスク等で具現される。距離条件記憶部２３は、図１に示すように、距離条件ｄ_ｃを候補クラスタ抽出部２２に出力する。なお、距離条件記憶部２３は、候補クラスタ抽出部２２に距離条件を出力できる構成であれば、テンプレート画像生成装置１の外部に設けてもよい。 As described above, the distance condition storage unit 23 stores in advance the distance condition d_c (threshold) that specifies the distance between a starting point and a branch point. Specifically, the distance condition storage unit 23 is implemented by a memory, a hard disk, or the like that can store data. As shown in FIG. 1, the distance condition storage unit 23 outputs the distance condition d_c to the candidate cluster extraction unit 22. The distance condition storage unit 23 may be provided outside the template image generation apparatus 1 as long as it can output the distance condition to the candidate cluster extraction unit 22.

候補クラスタ絞り込み部２４は、所定の条件に基づいて、複数のテンプレート候補クラスタの数を絞り込むものである。候補クラスタ絞り込み部２４は、具体的には、以下の２つの条件に基づいて、候補クラスタ抽出部２２によって抽出されたテンプレート候補クラスタの数を段階的に絞り込む。なお、テンプレート画像生成装置１は、候補クラスタ絞り込み部２４による絞り込みを経ずに、後記するマスク情報の生成処理やテンプレート画像の選択処理等を行うこともできるが、候補クラスタ絞り込み部２４による絞り込みを行うことにより、より適切なテンプレート画像を生成することができ、テンプレートマッチングの精度を向上させることができる。 The candidate cluster narrowing unit 24 narrows down the number of template candidate clusters based on a predetermined condition. Specifically, the candidate cluster narrowing unit 24 narrows down the number of template candidate clusters extracted by the candidate cluster extracting unit 22 in stages based on the following two conditions. The template image generation apparatus 1 can perform a mask information generation process and a template image selection process, which will be described later, without performing the narrowing down by the candidate cluster narrowing unit 24. By doing so, a more appropriate template image can be generated, and the accuracy of template matching can be improved.

第１の条件は、候補クラスタ抽出部２２によって抽出されたテンプレート候補クラスタが、隣り合うショットから抽出された代表静止画像を含むクラスタである場合、該当する代表静止画像をテンプレート候補クラスタの中から削除するというものである。これは、番組映像を構成するショット内に、例えばカメラのフラッシュ等の映像が含まれている場合、ショット分割部１１が当該フラッシュを映像の切れ目であると誤検出し、本来１つであるショットを複数に分割してしまうおそれがあるため、このようなショットの過剰検出を抑制するための条件である。 Under the first condition, if a template candidate cluster extracted by the candidate cluster extraction unit 22 contains representative still images extracted from adjacent shots, those representative still images are deleted from the template candidate cluster. When a shot in a program video contains, for example, a camera flash, the shot division unit 11 may mistake the flash for a break in the video and split what is really a single shot into several; this condition is intended to suppress such over-detection of shots.

従って、候補クラスタ絞り込み部２４は、図９（ａ）に示すように、テンプレート候補クラスタ１の中に隣り合うショットから抽出された代表静止画像が含まれている場合、前記した第１の条件に従って、該当する代表静止画像をテンプレート候補クラスタの中から削除し、後記する第２の条件との照合を行う。一方、候補クラスタ絞り込み部２４は、図９（ｂ）に示すように、テンプレート候補クラスタ２の中に隣り合ったショットから抽出された代表静止画像が含まれていない場合、代表静止画像を削除することなく、後記する第２の条件による絞り込みを行う。 Accordingly, as shown in FIG. 9A, the candidate cluster narrowing unit 24, when the representative still image extracted from the adjacent shot is included in the template candidate cluster 1, according to the first condition described above. The corresponding representative still image is deleted from the template candidate cluster, and collation with a second condition described later is performed. On the other hand, as shown in FIG. 9B, the candidate cluster narrowing unit 24 deletes the representative still image when the template candidate cluster 2 does not include the representative still image extracted from the adjacent shot. Without narrowing down, the second condition described later is used for narrowing down.
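
A minimal sketch of the first condition is shown below, assuming each representative still image is identified by a (program name, shot number) pair as attached by the representative still image extraction unit 12. The function name `drop_adjacent_shots` is hypothetical. The sketch follows the literal reading that every image having a neighbour from an adjacent shot of the same program is deleted; keeping one image per run of adjacent shots would be another plausible reading that the text does not rule out.

```python
def drop_adjacent_shots(cluster):
    """Remove images whose previous or next shot is also in the cluster.

    cluster: list of (program_name, shot_number) pairs.
    """
    present = set(cluster)
    return [
        (program, shot)
        for program, shot in cluster
        if (program, shot - 1) not in present
        and (program, shot + 1) not in present
    ]
```

For example, a cluster holding shots 3, 4, and 9 of one program and shot 4 of another program would keep only shot 9 of the first program and shot 4 of the second.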

第２の条件は、第１の条件で絞り込んだテンプレート候補クラスタに含まれる代表静止画像の抽出元となる番組映像の数が、予め設定された数以上でない場合、該当するテンプレート候補クラスタを削除するというものである。これは、例えばテンプレート画像生成装置１に対して１００個の番組映像が入力されたにも関わらず、テンプレート候補クラスタに１つの番組映像から抽出された代表静止画像しか含まれていない場合、当該代表静止画像が複数の番組映像に共通する典型的な演出シーンを示すものではない可能性があるためである。 The second condition is to delete the corresponding template candidate cluster when the number of program videos from which the representative still images included in the template candidate clusters narrowed down by the first condition are not more than a preset number. That's it. This is because, for example, when 100 program videos are input to the template image generation apparatus 1 and the template candidate cluster includes only representative still images extracted from one program video, the representative This is because there is a possibility that a still image does not indicate a typical effect scene common to a plurality of program videos.

従って、候補クラスタ絞り込み部２４は、まず、前記した第１の条件によって絞り込んだテンプレート候補クラスタに含まれる代表静止画像がどの番組に含まれていたかを検出し、テンプレート画像生成装置１に入力された番組映像の数に対する代表静止画像の抽出元の番組数の割合が予め設定された割合条件に示された割合以上ではない場合、該当するテンプレート候補クラスタを削除する。なお、テンプレート候補クラスタに含まれる代表静止画像には、前記したように、代表静止画像抽出部１２において、抽出した番組名と、番組映像を構成する何番目のショットから抽出されたものであるかを示す番号と、が付与されている。 Accordingly, the candidate cluster narrowing unit 24 first determines which programs the representative still images in the template candidate cluster narrowed down by the first condition came from. If the ratio of the number of source programs of the representative still images to the number of program videos input to the template image generation apparatus 1 is not equal to or greater than the ratio specified by the preset ratio condition, the template candidate cluster is deleted. As described above, the representative still image extraction unit 12 attaches to each representative still image in a template candidate cluster the name of the program it was extracted from and a number indicating which shot of the program video it was extracted from.

ここで、前記した割合条件は、テンプレートマッチングの精度をどの程度のものにするのかによって適宜変更可能な条件である。すなわち、テンプレート画像生成装置１に例えば１０００個の番組映像を入力し、全ての番組映像で同じ演出を行っているテンプレート画像を生成したい場合は、割合条件を１００％に設定すればよい。この場合は、テンプレートマッチングの際の精度は向上するが、生成されるテンプレート画像の枚数が減少することになる。一方、テンプレート画像生成装置１に例えば１０００個の番組映像を入力し、１００個の番組映像で同じ演出を行っているテンプレート画像を生成したい場合は、割合条件を１０％に設定すればよい。この場合は、テンプレートマッチングの際の精度は低下するが、生成されるテンプレート画像の枚数は増加することになる。 Here, the above-described ratio condition is a condition that can be changed as appropriate depending on the accuracy of template matching. That is, for example, when 1000 program videos are input to the template image generation apparatus 1 and it is desired to generate a template image that performs the same effect on all the program videos, the ratio condition may be set to 100%. In this case, the accuracy at the time of template matching is improved, but the number of generated template images is reduced. On the other hand, if, for example, 1000 program videos are input to the template image generation apparatus 1 and it is desired to generate a template image that performs the same effect with 100 program videos, the ratio condition may be set to 10%. In this case, the accuracy at the time of template matching is lowered, but the number of generated template images is increased.
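
The second condition amounts to a simple ratio test, sketched below. The function name `passes_ratio_condition` and the representation of images as (program name, shot number) pairs are assumptions made for illustration; `min_ratio` stands for the ratio condition expressed as a fraction (0.1 for 10%).

```python
def passes_ratio_condition(cluster, total_programs, min_ratio):
    """True if enough distinct programs contribute images to the cluster.

    cluster: list of (program_name, shot_number) pairs.
    total_programs: number of program videos input to the apparatus.
    min_ratio: ratio condition as a fraction between 0 and 1.
    """
    source_programs = {program for program, _shot in cluster}
    return len(source_programs) / total_programs >= min_ratio
```

With 10 input programs, a cluster whose images come from 2 distinct programs passes a 10% condition but fails a 50% condition.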

候補クラスタ絞り込み部２４には、図１に示すように、候補クラスタ抽出部２２からテンプレート候補クラスタが入力されるとともに、割合条件記憶部２５から割合条件が入力される。そして、候補クラスタ絞り込み部２４は、前記した手法によってテンプレート候補クラスタを絞り込み、これをテンプレート画像生成手段３０の分散値算出部３１およびテンプレート画像選択部３４に出力する。 As shown in FIG. 1, the candidate cluster narrowing unit 24 receives a template candidate cluster from the candidate cluster extraction unit 22 and a ratio condition from the ratio condition storage unit 25. The candidate cluster narrowing unit 24 narrows down the template candidate clusters by the above-described method, and outputs this to the variance value calculating unit 31 and the template image selecting unit 34 of the template image generating unit 30.

割合条件記憶部２５は、前記したように、テンプレート画像生成装置１に入力された番組映像の数に対する代表静止画像の抽出元の番組数の割合を示す割合条件を予め記憶するものである。割合条件記憶部２５は、具体的には、データを記憶することができるメモリ、ハードディスク等で具現される。割合条件記憶部２５は、図１に示すように、割合条件を候補クラスタ絞り込み部２４に出力する。なお、割合条件記憶部２５は、候補クラスタ絞り込み部２４に割合条件を出力できる構成であれば、テンプレート画像生成装置１の外部に設けてもよい。 As described above, the ratio condition storage unit 25 stores in advance a ratio condition indicating the ratio of the number of programs from which the representative still image is extracted with respect to the number of program videos input to the template image generating apparatus 1. Specifically, the ratio condition storage unit 25 is implemented by a memory, a hard disk, or the like that can store data. The ratio condition storage unit 25 outputs the ratio condition to the candidate cluster narrowing unit 24 as shown in FIG. The ratio condition storage unit 25 may be provided outside the template image generation apparatus 1 as long as the ratio condition can be output to the candidate cluster narrowing unit 24.

テンプレート画像生成手段３０は、テンプレート候補クラスタ抽出手段２０によって抽出された絞り込み後のテンプレート候補クラスタから、マスク情報を生成するとともに、テンプレート候補画像からテンプレート画像を選択するものである。テンプレート画像生成手段３０は、図１に示すように、分散値算出部３１と、マスク情報生成部３２と、閾値記憶部３３と、テンプレート画像選択部３４と、を備えている。 The template image generation means 30 generates mask information from the template candidate clusters after narrowing down extracted by the template candidate cluster extraction means 20 and selects a template image from the template candidate images. As shown in FIG. 1, the template image generation unit 30 includes a variance value calculation unit 31, a mask information generation unit 32, a threshold storage unit 33, and a template image selection unit 34.

分散値算出部３１は、テンプレート候補クラスタに含まれるテンプレート候補画像（代表静止画像）のブロックごとの分散値を算出するものである。分散値算出部３１は、具体的には、候補クラスタ絞り込み部２４が絞り込んだテンプレート候補クラスタに含まれるテンプレート候補画像の画像特徴量をブロックごとに比較し、当該ブロックごとの画像特徴量の分散値を算出する。分散値算出部３１は、例えば図１０に示すように、テンプレート候補クラスタにテンプレート候補画像１〜３が含まれており、かつ、これらの画像が前記したブロック分割部１３によって、横１８マス×縦１１マスにブロック分割されたものである場合、１９８ブロック分の分散値を算出する。 The variance value calculation unit 31 calculates a variance value for each block of the template candidate images (representative still images) included in the template candidate cluster. Specifically, the variance value calculation unit 31 compares the image feature amounts of the template candidate images included in the template candidate clusters narrowed down by the candidate cluster narrowing unit 24 for each block, and the variance value of the image feature amount for each block. Is calculated. For example, as illustrated in FIG. 10, the variance value calculation unit 31 includes template candidate images 1 to 3 in a template candidate cluster, and these images are processed by the above-described block division unit 13 by 18 squares × vertical. If the block is divided into 11 squares, a variance value for 198 blocks is calculated.

ここで、画像特徴量の分散値が大きいということは、該当するブロックの画像特徴量の変化が大きいということを示している。従って、テンプレート候補画像において画像特徴量の分散値が大きいブロックは、複数の番組映像における可変部分であると考えることができる。一方、画像特徴量の分散値が小さいということは、該当するブロックの画像特徴量の変化が小さいということを示している。従って、テンプレート候補画像において画像特徴量の分散値が小さいブロックは、複数の番組映像における固定部分であると考えることができる。 Here, the fact that the variance value of the image feature value is large indicates that the change in the image feature value of the corresponding block is large. Therefore, a block having a large image feature amount variance value in the template candidate image can be considered as a variable portion in a plurality of program videos. On the other hand, a small dispersion value of the image feature amount indicates that a change in the image feature amount of the corresponding block is small. Therefore, it can be considered that a block having a small image feature amount variance value in a template candidate image is a fixed portion in a plurality of program videos.
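
The per-block variance can be sketched as follows. The function name `block_variances` is hypothetical, and two points are assumptions not fixed by the text: each block feature is taken to be a tuple of components (for example mean R, G, B), and the variance of such a tuple is taken to be the sum of the population variances of its components.

```python
from statistics import pvariance

def block_variances(images):
    """Variance of each block's feature across the candidate images.

    images: one entry per template candidate image; each entry is a list of
    per-block feature tuples in the same block order.
    """
    n_blocks = len(images[0])
    n_components = len(images[0][0])
    return [
        sum(pvariance([img[b][c] for img in images]) for c in range(n_components))
        for b in range(n_blocks)
    ]
```

A block whose feature is identical in every candidate image gets a variance of 0 (a fixed portion), while a block whose feature changes between images gets a positive value (a variable portion).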

分散値算出部３１には、図１に示すように、候補クラスタ絞り込み部２４から、絞り込み後のテンプレート候補クラスタが入力される。そして、分散値算出部３１は、前記した手法によってテンプレート候補クラスタに含まれるテンプレート候補画像のブロックごとの分散値を算出し、これらをマスク情報生成部３２に出力する。なお、分散値算出部３１は、候補クラスタ絞り込み部２４から、複数のテンプレート候補クラスタが入力された場合は、テンプレート候補クラスタごとに前記した分散値を算出する。 As shown in FIG. 1, the template candidate cluster after narrowing down is input to the variance value calculating unit 31 from the candidate cluster narrowing unit 24. Then, the variance value calculation unit 31 calculates the variance value for each block of the template candidate image included in the template candidate cluster by the above-described method, and outputs these to the mask information generation unit 32. When a plurality of template candidate clusters are input from the candidate cluster narrowing unit 24, the variance value calculation unit 31 calculates the above-described variance value for each template candidate cluster.

マスク情報生成部３２は、テンプレート画像に合成するマスク情報を生成するものである。ここで、マスク情報とは、テンプレート画像に対するマスクの形成位置に関する情報を意味している。マスク情報生成部３２は、具体的には、分散値算出部３１から入力されたテンプレート候補画像のブロックごとの分散値と、予め設定された閾値と、を比較し、当該分散値が閾値を超える場合、該当するブロックを覆うマスクの形成位置に関するマスク情報を生成する。そして、マスク情報生成部３２は、テンプレート画像の全てのブロックについて前記した処理を行い、例えば図１１（ａ）の左図に示すように、テンプレート画像全体のマスク情報を生成する。 The mask information generation unit 32 generates mask information to be combined with the template image. Here, the mask information means information related to the mask formation position with respect to the template image. Specifically, the mask information generation unit 32 compares the variance value for each block of the template candidate image input from the variance value calculation unit 31 with a preset threshold value, and the variance value exceeds the threshold value. In this case, mask information related to the formation position of the mask covering the corresponding block is generated. Then, the mask information generation unit 32 performs the above-described processing for all the blocks of the template image, and generates mask information for the entire template image, for example, as shown in the left diagram of FIG.
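
The thresholding step can be illustrated with a short sketch. The function name `mask_blocks` is hypothetical, and representing the mask information as a grid of booleans in raster order is an assumption; the text only says that the mask information describes where masks are formed.

```python
def mask_blocks(variances, threshold, cols=18):
    """Mark blocks whose variance exceeds the threshold.

    variances: per-block variance values in raster order (upper left first).
    Returns rows of booleans; True means the block is covered by a mask.
    """
    flags = [v > threshold for v in variances]
    return [flags[i:i + cols] for i in range(0, len(flags), cols)]
```

For a 2-column layout with variances `[0.0, 5.0, 1.0, 9.0]` and a threshold of 2.0, the right-hand block of each row is masked.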

ここで、前記した閾値は、テンプレートマッチングの精度をどの程度のものにするのかによって適宜変更可能な条件である。すなわち、テンプレート候補画像における些細な可変部分であっても全てマスクしたい場合は、閾値を下げればよい。この場合は、テンプレートマッチングの際の精度は向上するが、生成されるテンプレート画像の枚数は減少することになる。一方、テンプレート候補画像における大きな可変部分のみをマスクしたい場合は、閾値を上げればよい。この場合は、テンプレートマッチングの際の精度は低下するが、生成されるテンプレート画像の枚数は増加することになる。 Here, the threshold value described above is a condition that can be changed as appropriate depending on the accuracy of template matching. That is, if it is desired to mask even a small variable portion in the template candidate image, the threshold value may be lowered. In this case, the accuracy in template matching is improved, but the number of template images to be generated is reduced. On the other hand, if it is desired to mask only a large variable part in the template candidate image, the threshold value may be increased. In this case, the accuracy at the time of template matching is lowered, but the number of generated template images is increased.

マスク情報生成部３２には、図１に示すように、分散値算出部３１から、テンプレート候補クラスタに含まれるテンプレート候補画像のブロックごとの分散値が入力されるとともに、閾値記憶部３３から、閾値が入力される。そして、マスク情報生成部３２は、前記した手法によってテンプレート全体のマスク情報を生成し、これを出力する。 As shown in FIG. 1, the mask information generation unit 32 receives a variance value for each block of template candidate images included in the template candidate cluster from the variance value calculation unit 31, and receives a threshold value from the threshold storage unit 33. Is entered. Then, the mask information generation unit 32 generates mask information of the entire template by the above-described method, and outputs this.

閾値記憶部３３は、前記したように、テンプレート画像のそれぞれブロックを覆うマスク情報を生成するか否かを判定するための閾値を記憶するものである。閾値記憶部３３は、具体的には、データを記憶することができるメモリ、ハードディスク等で具現される。閾値記憶部３３は、図１に示すように、閾値をマスク情報生成部３２に出力する。なお、閾値記憶部３３は、マスク情報生成部３２に閾値を出力できる構成であれば、テンプレート画像生成装置１の外部に設けてもよい。 As described above, the threshold value storage unit 33 stores a threshold value for determining whether or not to generate mask information that covers each block of the template image. Specifically, the threshold storage unit 33 is implemented by a memory, a hard disk, or the like that can store data. As shown in FIG. 1, the threshold storage unit 33 outputs the threshold to the mask information generation unit 32. Note that the threshold storage unit 33 may be provided outside the template image generation device 1 as long as the threshold value can be output to the mask information generation unit 32.

テンプレート画像選択部３４は、テンプレート候補クラスタに含まれるテンプレート候補画像から、１枚のテンプレート画像を選択するものである。テンプレート画像選択部３４は、具体的には、候補クラスタ絞り込み部２４が絞り込んだテンプレート候補クラスタに含まれる複数のテンプレート候補画像の中で、画像特徴量の特徴ベクトルが最も中心に近い（特徴ベクトルの平均に最も近い）テンプレート候補画像をテンプレート画像として選択する。すなわち、テンプレート画像選択部３４は、クラスタに含まれる画像群の中で、最も平均に近いものを取り出すことになる。 The template image selection unit 34 selects one template image from the template candidate images included in a template candidate cluster. Specifically, from among the template candidate images in the template candidate cluster narrowed down by the candidate cluster narrowing unit 24, the template image selection unit 34 selects as the template image the candidate whose feature vector of image feature amounts is closest to the center (closest to the mean of the feature vectors). In other words, the template image selection unit 34 picks, from the group of images in the cluster, the one closest to the average.
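
Selecting the candidate closest to the mean can be sketched as below. The function name `select_template` is hypothetical, and the Euclidean distance is assumed as the measure of closeness because the text does not name one.

```python
import math

def select_template(vectors):
    """Index of the feature vector closest to the mean of all vectors."""
    n = len(vectors)
    mean = [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]
    return min(range(n), key=lambda k: math.dist(vectors[k], mean))
```

Given the vectors `[0, 0]`, `[4, 0]`, and `[10, 0]`, the mean lies near `[4.67, 0]`, so the second candidate (index 1) would be selected.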

テンプレート画像選択部３４には、図１に示すように、候補クラスタ絞り込み部２４から、絞り込み後のテンプレート候補クラスタが入力される。そして、テンプレート画像選択部３４は、前記した手法によってテンプレート画像を選択し、これを出力する。なお、テンプレート画像選択部３４は、候補クラスタ絞り込み部２４から、複数のテンプレート候補クラスタが入力された場合は、テンプレート候補クラスタごとに前記したテンプレート画像を選択する。 As shown in FIG. 1, the template candidate cluster after narrowing down is input to the template image selection unit 34 from the candidate cluster narrowing unit 24. And the template image selection part 34 selects a template image with an above-described method, and outputs this. When a plurality of template candidate clusters are input from the candidate cluster narrowing unit 24, the template image selection unit 34 selects the template image described above for each template candidate cluster.

マスク情報生成部３２によって生成されたマスク情報と、テンプレート画像選択部３４によって選択されたテンプレート画像は、例えば、図１１（ａ）に示すように合成され、図１１（ｂ）に示すようなマスク情報付きテンプレート画像が生成される。 The mask information generated by the mask information generation unit 32 and the template image selected by the template image selection unit 34 are combined as shown in FIG. 11A, for example, and a mask as shown in FIG. A template image with information is generated.

なお、図１１（ｂ）を参照すると、ニュース項目を伝えるはめ込み画像部分は、ニュースごとに変化するため、マスクがかかっていることがわかる。また、アナウンサの顔の左下部は、原稿を読む際に動いて変化するため、マスクがかかっていることがわかる。また、アナウンサの左腕は、原稿をめくる際に動いて変化するため、マスクがかかっていることがわかる。また、アナウンサのネクタイは、日によって変化するため、マスクがかかっていることがわかる。 Referring to FIG. 11(b), the inset image portion that conveys the news item is masked because it changes with each news item. The lower left part of the announcer's face is masked because it moves when the announcer reads the script. The announcer's left arm is masked because it moves when the announcer turns the pages of the script. The announcer's tie is masked because it changes from day to day.

以上のような構成を備えるテンプレート画像生成装置１は、画像特徴量の類似度に従って代表静止画像の階層クラスタリングを行い、その結果から、一部分が類似する複数の代表静止画像が含まれるクラスタを抽出することで、複数の番組映像の中で繰り返し用いられる典型的な演出シーンからテンプレート画像を自動的に生成することができる。また、複数のテンプレート候補画像における画像特徴量の分散値を算出することで画像内における可変部分を判別し、この可変部分を覆うマスク情報を生成するため、当該マスク情報で特定されるマスクをテンプレート画像に合成することにより、テンプレートマッチングの精度を向上させることができるとともに、テンプレートマッチングの際における閾値のばらつきを防止することができる。 The template image generating apparatus 1 having the above configuration performs hierarchical clustering of representative still images according to the similarity of image feature amounts, and extracts clusters including a plurality of representative still images that are partially similar from the result. Thus, a template image can be automatically generated from a typical effect scene repeatedly used in a plurality of program videos. In addition, in order to determine a variable portion in the image by calculating a variance value of image feature amounts in a plurality of template candidate images and generate mask information covering the variable portion, a mask specified by the mask information is used as a template. By synthesizing with an image, the accuracy of template matching can be improved, and variations in threshold values during template matching can be prevented.

In the dendrogram showing the result of the hierarchical clustering, the template image generation apparatus 1 also follows the hierarchy down one level at a time from each intersection with the cutting line and searches for a cluster having a branch separated by a predetermined distance or more, which makes template candidate clusters easy to extract. Furthermore, by narrowing down the template candidate clusters in stages using two conditions, it can accurately extract only the typical production scenes that are used repeatedly across a plurality of program videos.

[Template image generation program]
The template image generation apparatus 1 can be realized by running, on a general-purpose computer, a program that causes the computer to function as each of the means and units described above. This program can be distributed over a communication line, or written to a recording medium such as a CD-ROM and distributed.

[Operation of the template image generation apparatus]
An example of the operation of the template image generation apparatus 1 is briefly described below with reference to FIG. 1. First, when a plurality of program videos are input to the template image generation apparatus 1, the shot division unit 11 detects breaks in each video, such as edit points, and divides the video into shots at those breaks.

Next, the representative still image extraction unit 12 extracts, for example, the first frame image of each shot as the representative still image. The block division unit 13 then divides each representative still image into blocks, for example 18 columns by 11 rows (198 blocks). Next, the feature amount extraction unit 14 extracts, for each block of the representative still image, an image feature amount consisting of color information such as the mean of the RGB components or the mean of the L*a*b* components.
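The block division and per-block feature extraction can be sketched as follows. This is only a minimal illustration, not the apparatus's implementation: the function name `block_mean_features`, the representation of an image as nested lists of (R, G, B) tuples, and the integer rounding of block boundaries are all assumptions made for the example. It also assumes the image is at least as large as the block grid, so that no block is empty.

```python
from statistics import fmean


def block_mean_features(image, cols=18, rows=11):
    """Return the mean (R, G, B) of each block of an image.

    `image` is a list of pixel rows, each pixel an (R, G, B) tuple
    (a hypothetical representation chosen for this sketch).
    Blocks are visited row by row, from the upper left to the lower right.
    """
    height, width = len(image), len(image[0])
    features = []
    for by in range(rows):
        # Integer boundaries let the blocks tile the image even when the
        # image size is not an exact multiple of the grid size.
        y0, y1 = by * height // rows, (by + 1) * height // rows
        for bx in range(cols):
            x0, x1 = bx * width // cols, (bx + 1) * width // cols
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # One mean per color channel for this block.
            features.append(tuple(fmean(p[c] for p in pixels) for c in range(3)))
    return features
```

With the default grid, the result has 198 entries, one per block. Swapping RGB for L*a*b* would only require converting the pixels before averaging.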

Next, the hierarchical clustering unit 21 arranges the per-block image feature amounts of each representative still image in order from the upper left to the lower right of the image to form a feature vector, and performs hierarchical clustering according to the similarity between the feature vectors. The candidate cluster extraction unit 22 then extracts template candidate clusters from the result of the hierarchical clustering, using the extraction algorithm described above. Next, under the first condition, if a template candidate cluster contains representative still images extracted from adjacent shots, the candidate cluster narrowing unit 24 deletes those representative still images from the template candidate cluster. Under the second condition, the candidate cluster narrowing unit 24 then determines which programs contained the representative still images in each template candidate cluster narrowed down by the first condition. If the ratio of the number of source programs to the number of program videos input to the template image generation apparatus 1 is less than the preset ratio condition, it deletes that template candidate cluster.
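The two narrowing conditions can be illustrated with a short sketch. The data model is an assumption made for the example: each representative still image is identified by a hypothetical `(program_id, shot_index)` pair, and "adjacent shots" is read as consecutive shot indices within the same program. The text does not say whether every still in an adjacent pair is deleted or only some of them; this sketch deletes all of them.

```python
def narrow_candidates(clusters, num_programs, min_ratio):
    """Apply the two narrowing conditions to template candidate clusters.

    Each cluster is a list of (program_id, shot_index) pairs, a hypothetical
    way to record which shot a representative still image came from.
    """
    kept = []
    for cluster in clusters:
        members = set(cluster)
        # First condition (assumed reading): drop every still whose
        # neighbouring shot in the same program is also in the cluster.
        survivors = [
            (prog, shot)
            for prog, shot in cluster
            if (prog, shot - 1) not in members and (prog, shot + 1) not in members
        ]
        # Second condition: keep the cluster only if its stills come from a
        # large enough share of the input programs.
        source_programs = {prog for prog, _ in survivors}
        if len(source_programs) / num_programs >= min_ratio:
            kept.append(survivors)
    return kept
```

For example, with three input programs and `min_ratio=0.6`, a cluster whose surviving stills come from two programs is kept (2/3), while a cluster drawn from a single program is deleted (1/3).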

Next, the variance value calculation unit 31 calculates, for each block, the variance of the image feature amounts across the template candidate images in the template candidate cluster. The mask information generation unit 32 then compares the variance of each block, received from the variance value calculation unit 31, with a preset threshold and, if the variance is equal to or greater than the threshold, generates and outputs mask information for covering that block. The template image selection unit 34 selects, from the plurality of template candidate images in the template candidate cluster, the image closest to the average and outputs it as the template image.
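The variance calculation, mask generation, and template selection can be sketched together as follows. Several simplifications are assumptions of this example rather than details from the description: each block carries a single scalar feature instead of a color vector, the population variance is used, and "closest to the average" is measured by squared Euclidean distance over all blocks. The function name `build_mask_and_template` is hypothetical.

```python
from statistics import fmean, pvariance


def build_mask_and_template(candidates, threshold):
    """Return (mask, index of the selected template candidate).

    `candidates` is a list of feature vectors, one per template candidate
    image, each holding one scalar feature per block (a simplification).
    """
    num_blocks = len(candidates[0])
    # Gather the values of each block across all candidate images.
    columns = [[cand[b] for cand in candidates] for b in range(num_blocks)]

    # A block is masked when its variance is equal to or greater than the threshold.
    mask = [pvariance(col) >= threshold for col in columns]

    # Select the candidate nearest to the per-block mean (assumed metric:
    # squared Euclidean distance over all blocks).
    means = [fmean(col) for col in columns]

    def distance_to_mean(cand):
        return sum((x - m) ** 2 for x, m in zip(cand, means))

    best = min(range(len(candidates)), key=lambda i: distance_to_mean(candidates[i]))
    return mask, best
```

Given `[[10, 0], [10, 50], [10, 100]]` and a threshold of 100, the first block is stable and stays unmasked, the second block varies and is masked, and the middle candidate is selected because it coincides with the mean.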

[Description of symbols]
1 Template image generation apparatus
10 Image feature amount extraction means
11 Shot division unit
12 Representative still image extraction unit
13 Block division unit
14 Feature amount extraction unit
20 Template candidate cluster extraction means
21 Hierarchical clustering unit
22 Candidate cluster extraction unit
23 Distance condition storage unit
24 Candidate cluster narrowing unit
25 Ratio condition storage unit
30 Template image generation means
31 Variance value calculation unit
32 Mask information generation unit
33 Threshold storage unit
34 Template image selection unit
C_x Cluster
C_0 Independent cluster
C_1 First combined cluster
C_2 Second combined cluster
C_3 Third combined cluster
C_4 Fourth combined cluster

Claims

A template image generation apparatus that generates, from a plurality of program videos, a template image used for template matching, the apparatus comprising:
image feature amount extraction means that divides the plurality of program videos into shots, extracts a representative still image from each shot, divides the representative still image into a predetermined number of blocks, and extracts an image feature amount for each block;
template candidate cluster extraction means that hierarchically clusters the representative still images according to the similarity of the image feature amounts extracted by the image feature amount extraction means, follows the hierarchy down one level at a time from each intersection between a dendrogram showing the result of the hierarchical clustering and a cutting line obtained by cutting the dendrogram at a predetermined level, and extracts, as a template candidate cluster, a cluster having a branch separated by a predetermined distance or more; and
template image generation means that generates mask information and a template image from the template candidate cluster extracted by the template candidate cluster extraction means,
wherein the template image generation means includes:
a variance value calculation unit that calculates, for each block, a variance value of the image feature amounts of a plurality of template candidate images, which are candidates for the template image, included in the template candidate cluster extracted by the template candidate cluster extraction means;
a mask information generation unit that, when the variance value calculated by the variance value calculation unit exceeds a preset threshold, generates mask information, which is information on the mask formation position for each block of the template image; and
a template image selection unit that selects, as the template image, the template candidate image whose image feature amount is closest to the average among the plurality of template candidate images included in the template candidate cluster extracted by the template candidate cluster extraction means.

The template image generation apparatus according to claim 1, wherein the template candidate cluster extraction means includes:
a hierarchical clustering unit that forms a feature vector of each representative still image by arranging, in a predetermined order, the per-block image feature amounts of the representative still image extracted by the image feature amount extraction means, and hierarchically clusters the representative still images according to the similarity of the feature vectors; and
a candidate cluster extraction unit that, from each intersection between the dendrogram showing the result of the hierarchical clustering by the hierarchical clustering unit and the cutting line obtained by cutting the dendrogram at a predetermined level, extracts, as a template candidate cluster, a cluster having a branch separated by a predetermined distance or more.

The template image generation apparatus according to claim 2, wherein the template candidate cluster extraction means further includes a candidate cluster narrowing unit that narrows down the template candidate clusters according to a first condition and a second condition, the first condition being that, when a template candidate cluster extracted by the candidate cluster extraction unit contains representative still images extracted from adjacent shots, those representative still images are deleted from the template candidate cluster, and the second condition being that, when the number of program videos from which the representative still images in a template candidate cluster that has passed the first condition were extracted is less than a preset number, that template candidate cluster is deleted.

A template image generation program that, in order to generate, from a plurality of program videos, a template image used for template matching, causes a computer to function as:
image feature amount extraction means that divides the plurality of program videos into shots, extracts a representative still image from each shot, divides the representative still image into a predetermined number of blocks, and extracts an image feature amount for each block;
template candidate cluster extraction means that hierarchically clusters the representative still images according to the similarity of the image feature amounts extracted by the image feature amount extraction means, follows the hierarchy down one level at a time from each intersection between a dendrogram showing the result of the hierarchical clustering and a cutting line obtained by cutting the dendrogram at a predetermined level, and extracts, as a template candidate cluster, a cluster having a branch separated by a predetermined distance or more;
a variance value calculation unit that calculates, for each block, a variance value of the image feature amounts of a plurality of template candidate images, which are candidates for the template image, included in the template candidate cluster extracted by the template candidate cluster extraction means;
a mask information generation unit that, when the variance value calculated by the variance value calculation unit exceeds a preset threshold, generates mask information, which is information on the mask formation position for each block of the template image; and
template image selection means that selects, as the template image, the template candidate image whose image feature amount is closest to the average among the plurality of template candidate images included in the template candidate cluster extracted by the template candidate cluster extraction means.