JP5070179B2

JP5070179B2 - Scene similarity determination device, program thereof, and summary video generation system

Info

Publication number: JP5070179B2
Application number: JP2008265023A
Authority: JP
Inventors: 雅規佐野
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2008-10-14
Filing date: 2008-10-14
Publication date: 2012-11-07
Anticipated expiration: 2028-10-14
Also published as: JP2010097246A

Description

The present invention relates to a technique for determining similar scenes in material video that contains duplicated scenes, such as footage shot for dramas and movies.

In recent years, the amount of material video (raw footage) produced for a program such as a drama, or for a movie, can be as much as 100 times the length of the final program. To find out which scenes each material video contains, a producer may, for example, fast-forward through the entire video, which takes time and effort. Moreover, each material video contains the same scene reshot many times, and fast-forwarding through every video to find out whether those scenes are the same or different costs the producer a great deal of time and effort. Several methods have therefore been proposed for determining whether scenes shot in material video are the same scene or different scenes.

For example, a technique has been proposed that determines whether scenes are similar by comparing certain time-series feature data within each shot (see, for example, Patent Document 1). A technique has also been proposed that divides the target material video into scenes and then simply compares the first frame (still image) of each scene (see, for example, Non-Patent Document 1). Furthermore, a technique has been proposed that extracts a plurality of still images at regular intervals from each scene of the target material video, clusters all the extracted still images using certain image features, and determines that still images belonging to the same cluster come from the same scene (see, for example, Non-Patent Document 2).
Patent Document 1: JP 2000-339474 A
Non-Patent Document 1: Z. Liu, E. Zavesky, D. Gibbon, B. Shahraray, and P. Haffner, "AT&T Research at TRECVID 2007"
Non-Patent Document 2: A. Hauptmann, M. Christel, W. Lin, B. Maher, J. Yang, R. Baron, and G. Xiang, "Summarizing BBC Rushes the Informedia Way"

However, the techniques proposed in Patent Document 1 and Non-Patent Document 1 compare still images taken from the beginning, the middle, or a similar position of each scene. Because the beginning or middle of a scene may contain shooting preparation, and because takes of the same scene are offset in time, the accuracy of the similarity determination is very low. The technique proposed in Non-Patent Document 2 extracts a plurality of images and uses clustering to compensate for this low accuracy. However, clustering generally takes a long time, so the similarity determination becomes slow. The features of the compared images could also be treated as time-series data, but the number of image combinations becomes enormous and the time axis stretches and shrinks (drifts), so the processing time becomes very long and the accuracy of the similarity determination may not improve. This approach is therefore not practical.

Accordingly, an object of the present invention is to provide a scene similarity determination device and a program thereof that are simple and fast and that determine scene similarity with high accuracy, and a summary video generation system that can simply and quickly generate a summary video with very few duplicated scenes.

To solve the above problem, the scene similarity determination device according to claim 1 determines, for a material video containing duplicated scenes, whether two scenes extracted from the material video are similar, and comprises an input unit, a scene division unit, a still image extraction unit, a still image similarity determination unit, and a scene similarity determination unit.

With this configuration, the scene similarity determination device receives two scenes through the input unit. Using the scene division unit, the device calculates the amount of motion between the frames that make up a scene and divides the scene into large motion sections, in which the amount of motion is greater than or equal to a preset threshold, and small motion sections, in which the amount of motion is below the threshold. When shooting dramas, movies, and the like, the same scene is often reshot many times, and the camera and actors repeat the same actions, so the motion is similar from take to take. In other words, the large motion sections of duplicated scenes contain similar motion, so the device divides each scene into large motion sections and small motion sections in order to determine scene similarity.

Using the still image extraction unit, the device extracts one still image for each small motion section of a scene. Two candidate times are considered: a fixed time before the start time of the large motion section that immediately follows the small motion section, and the midpoint of the small motion section. The frame of the small motion section at whichever candidate is closer to the start time of the large motion section is extracted as the still image. Because the large motion sections extracted from the two scenes are slightly stretched or shifted relative to each other in time, the device takes still images from the small motion section immediately before a large motion section, where there is little motion and the composition (still image) is likely to be the same.

Using the still image similarity determination unit, the device calculates the feature amounts of the still images extracted from one of the two scenes and of the still images extracted from the other scene. For every combination of a still image of one scene and a still image of the other scene, it determines from these feature amounts whether the two still images are similar. Then, using the scene similarity determination unit, the device determines that the two scenes are similar when the still image similarity determination unit has found at least one combination of still images to be similar. The device thus needs only to find one or more similar pairs of still images to determine scene similarity, and performs no complicated computation such as clustering or time-series comparison of still images.

The scene similarity determination device according to claim 2 is characterized in that, when the still image similarity determination unit determines that any one combination of a still image of one scene and a still image of the other scene is similar, it stops determining whether the remaining combinations are similar.

With this configuration, the scene similarity determination device ends the still image similarity determination as soon as it determines that any one combination of a still image of one scene and a still image of the other scene is similar.

In the scene similarity determination device according to claim 3, the still image similarity determination unit divides the still image of one scene and the still image of the other scene into corresponding blocks and calculates, as the feature amounts, the Euclidean distance between the average values of the divided still images in a predetermined color space. When the Euclidean distance is less than or equal to a preset distance, the unit determines that the two still images are similar; when the Euclidean distance exceeds the preset distance, it determines that they are not similar.

With this configuration, the scene similarity determination device can determine still image similarity with a simple computation.
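The block-average comparison described above can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: images are plain nested lists of (R, G, B) tuples, the grid size `rows` x `cols` and the names `block_means` and `stills_similar` are hypothetical, and the block means are concatenated into a single vector before the Euclidean distance is taken (the text does not say whether the distance is computed per block or over all blocks). Each image is assumed to have at least `rows` pixel rows and `cols` pixel columns.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]      # assumed RGB; any 3-channel color space works
Image = Sequence[Sequence[Pixel]]       # image[y][x]


def block_means(img: Image, rows: int, cols: int) -> List[float]:
    """Return the mean color of each block of a rows x cols grid, flattened."""
    h, w = len(img), len(img[0])
    feats: List[float] = []
    for r in range(rows):
        for c in range(cols):
            ys = range(r * h // rows, (r + 1) * h // rows)
            xs = range(c * w // cols, (c + 1) * w // cols)
            pixels = [img[y][x] for y in ys for x in xs]
            for ch in range(3):
                feats.append(sum(p[ch] for p in pixels) / len(pixels))
    return feats


def stills_similar(a: Image, b: Image, max_dist: float,
                   rows: int = 2, cols: int = 2) -> bool:
    """Similar when the distance between block-mean vectors is <= max_dist."""
    fa = block_means(a, rows, cols)
    fb = block_means(b, rows, cols)
    return math.dist(fa, fb) <= max_dist
```

Because the feature is a short vector of averages, comparing two still images costs one pass over each image plus a single distance computation.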

To solve the above problem, the scene similarity determination program according to claim 4 causes a computer to function as an input unit, a scene division unit, a still image extraction unit, a still image similarity determination unit, and a scene similarity determination unit in order to determine, for a material video containing duplicated scenes, whether two scenes extracted from the material video are similar.

With this configuration, the scene similarity determination program receives two scenes through the input unit. Using the scene division unit, the program calculates the amount of motion between the frames that make up a scene and divides the scene into large motion sections, in which the amount of motion is greater than or equal to a preset threshold, and small motion sections, in which the amount of motion is below the threshold. In the production of programs such as dramas, the same scene is often reshot many times, and the camera and actors repeat the same actions, so the motion is similar from take to take. In other words, the large motion sections of duplicated scenes contain similar motion, so the program divides each scene into large motion sections and small motion sections in order to determine scene similarity.

Using the still image extraction unit, the program extracts one still image for each small motion section of a scene. Two candidate times are considered: a fixed time before the start time of the large motion section that immediately follows the small motion section, and the midpoint of the small motion section. The frame of the small motion section at whichever candidate is closer to the start time of the large motion section is extracted as the still image. Because the large motion sections extracted from the two scenes are slightly stretched or shifted relative to each other in time, the program takes still images from the small motion section immediately before a large motion section, where there is little motion and the composition (still image) is likely to be the same.

Using the still image similarity determination unit, the program calculates the feature amounts of the still images extracted from one of the two scenes and of the still images extracted from the other scene. For every combination of a still image of one scene and a still image of the other scene, it determines from these feature amounts whether the two still images are similar. Then, using the scene similarity determination unit, the program determines that the two scenes are similar when the still image similarity determination unit has found at least one combination of still images to be similar. The program thus needs only to find one or more similar pairs of still images to determine scene similarity, and performs no complicated computation such as clustering or time-series comparison of still images.

To solve the above problem, the summary video generation system according to claim 5 generates, from a material video containing duplicated scenes, a summary video composed of distinct scenes with the duplicates removed, and comprises a scene extraction unit, the scene similarity determination device according to claim 1, a representative scene selection unit, and a summary video generation unit.

With this configuration, the summary video generation system uses the scene extraction unit to detect scene change points and extract the plurality of scenes contained in the material video. The system then uses the scene similarity determination device to determine whether each combination of scenes extracted by the scene extraction unit is similar. The scene similarity determination device allows the system to determine scene similarity simply, quickly, and with high accuracy.

Using the representative scene selection unit, the system selects as representative scenes, from among the scenes extracted by the scene extraction unit, the scenes that the scene similarity determination device determined not to be similar to any other, together with the longest scene in each combination of scenes that the device determined to be similar. By including the longest scene of each similar combination among the representative scenes, the system can minimize the possibility that a reshot scene of a drama, movie, or the like is missing from the summary video. Finally, using the summary video generation unit, the system generates the summary video by concatenating part or all of each representative scene selected by the representative scene selection unit.
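The selection of representative scenes can be illustrated with a short sketch. It assumes each scene is a hypothetical `(identifier, duration_in_seconds)` tuple and that `similar` stands in for the scene similarity determination device. The greedy grouping below is a simplification chosen for brevity: it does not merge two existing groups when a later scene turns out to be similar to members of both.

```python
from typing import Callable, List, Tuple

Scene = Tuple[str, float]   # (hypothetical identifier, duration in seconds)


def representative_scenes(scenes: List[Scene],
                          similar: Callable[[Scene, Scene], bool]) -> List[Scene]:
    """Group similar scenes and keep the longest scene of each group.

    A scene that is similar to no other scene forms a group of its own
    and is therefore kept as is.
    """
    groups: List[List[Scene]] = []
    for scene in scenes:
        for group in groups:
            if any(similar(scene, member) for member in group):
                group.append(scene)
                break
        else:
            # No existing group matched: start a new one.
            groups.append([scene])
    return [max(group, key=lambda s: s[1]) for group in groups]
```

For the material video of FIG. 1, with takes Sc1-T1 and Sc1-T2 judged similar, this keeps the longer of the two takes of scene 1 and the single take of scene 2.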

The present invention provides the following advantageous effects.
According to the inventions of claims 1 and 4, still images are extracted from the small motion section immediately before a large motion section, where there is little motion and the composition is likely to be the same, so scene similarity can be determined with high accuracy. In addition, according to the inventions of claims 1 and 4, only one or more pairs of still images need to be found similar, and no complicated computation such as clustering or time-series comparison of still images is performed, so scene similarity can be determined simply and quickly.

According to the invention of claim 2, the still image similarity determination ends as soon as any one combination of a still image of one scene and a still image of the other scene is determined to be similar, so scene similarity can be determined even faster.
According to the invention of claim 3, still image similarity can be determined with a simple computation, so scene similarity can be determined even faster.

According to the invention of claim 5, scene similarity can be determined simply, quickly, and with high accuracy, and the possibility that a reshot scene or the like is missing from the summary video can be minimized, so a summary video with very few duplicated scenes can be generated simply and quickly.

A first embodiment of the present invention is described in detail below with reference to the drawings as appropriate. In each embodiment, units having the same function and identical members are given the same reference signs, and their description is omitted.

(First embodiment)
[Contents of material video]
Before describing the scene similarity determination device according to the first embodiment of the present invention, the contents of the material video in the present invention are described with reference to FIG. 1. FIG. 1 is a diagram illustrating the contents of a material video in the present invention.

A material video is, for example, video that contains duplicated scenes because the same scene was shot two or more times during the shooting of a drama, movie, or the like. In the example of FIG. 1, the material video consists of four shots St1 to St4. Shot St1 contains color bars C. Shot St2 contains junk video J at the head of the shot, scene Sc1-T1 (the first take of scene 1), junk video J between scenes Sc1-T1 and Sc1-T2, scene Sc1-T2 (the second take of scene 1), and junk video J at the end of the shot. Junk video J is video that is not used in the drama or movie, for example footage of actors or staff preparing. Junk video J also includes footage of a clapperboard (not shown).

Shot St3 contains blank video B, which carries no video signal. Shot St4 contains junk video J at its head, scene Sc2-T1 (the first take of scene 2), and junk video J at its end. In the example of FIG. 1, the material video thus contains scenes Sc1-T1 and Sc1-T2 as duplicated scenes. The material video is not limited to video for dramas and movies; any video shot according to the shooting rules described later may be used.

[Configuration of scene similarity determination device]
The scene similarity determination device according to the first embodiment of the present invention determines whether two scenes extracted in advance from a material video (for example, scenes Sc1-T1 and Sc1-T2 extracted from shot St2 in FIG. 1) are similar to each other. The configuration of the device is described below with reference to FIG. 2. FIG. 2 is a block diagram showing the configuration of the scene similarity determination device according to the first embodiment of the present invention. As shown in FIG. 2, the scene similarity determination device 1 includes an input unit 10, a scene division unit 11, a still image extraction unit 12, a still image similarity determination unit 13, a scene similarity determination unit 14, and a parameter storage unit 15.

The input unit 10 is an interface, such as a port or terminal, that receives two scenes extracted in advance from the material video. The first scene corresponds to one of the two scenes, and the second scene corresponds to the other.

The scene division unit 11 calculates the amount of motion between the frames that make up the first scene received by the input unit 10, and divides the first scene into large motion sections, in which the amount of motion is greater than or equal to a preset threshold, and small motion sections, in which the amount of motion is below the threshold. For example, the scene division unit 11 calculates the sum of the motion vectors between the frames of the first scene, treats each section in which this sum is greater than or equal to the preset threshold as a large motion section, and treats each section in which the sum is below the threshold as a small motion section. The threshold may be set in advance in the parameter information described later, in which case the scene division unit 11 obtains it by referring to the parameter information.

Alternatively, the scene division unit 11 may divide consecutive frames of the first scene into predetermined blocks and calculate the pixel value difference for each block between the consecutive frames. In that case, the scene division unit 11 treats each section in which the pixel value difference is greater than or equal to a difference set in advance in the parameter information as a large motion section, and each section in which it is below that difference as a small motion section. The scene division unit 11 divides the second scene into large motion sections and small motion sections in the same way as the first scene.
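The thresholding step of the scene division can be sketched as below. The sketch assumes the per-frame motion amounts (for instance, sums of motion-vector magnitudes or block pixel differences) have already been computed and are passed in as a list; how they are computed is outside the sketch. The function name `split_by_motion` and the `(label, start, end)` output format are hypothetical.

```python
from typing import List, Tuple


def split_by_motion(motion: List[float],
                    threshold: float) -> List[Tuple[str, int, int]]:
    """Group consecutive frames into 'large' and 'small' motion sections.

    motion[i] is the motion amount associated with frame i.
    Returns (label, start, end) tuples, with end exclusive.
    """
    sections: List[Tuple[str, int, int]] = []
    start = 0
    for i in range(1, len(motion) + 1):
        # Close the current run at the end of the input or when the
        # frame at i falls on the other side of the threshold.
        if i == len(motion) or \
                (motion[i] >= threshold) != (motion[start] >= threshold):
            label = "large" if motion[start] >= threshold else "small"
            sections.append((label, start, i))
            start = i
    return sections
```

A practical implementation would probably also smooth the motion signal or discard very short runs so that a single noisy frame does not split a section, but the text does not describe such a step.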

The still image extraction unit 12 extracts one still image for each small motion section of the first scene produced by the scene division unit 11. Two candidate times are considered: a fixed time before the start time of the large motion section that immediately follows the small motion section, and the midpoint of the small motion section. The frame of the small motion section at whichever candidate is closer to the start time of the large motion section is extracted as the still image. The still image extraction unit 12 extracts still images from the second scene in the same way. The fixed time (for example, 5 seconds) may be set in advance in the parameter information, in which case the still image extraction unit 12 obtains it by referring to the parameter information.
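The choice between the two candidate times reduces to taking the later one, since both candidates lie at or before the start of the following large motion section. A minimal sketch, assuming times in seconds and a hypothetical function name `still_time`:

```python
def still_time(small_start: float, small_end: float, lead: float = 5.0) -> float:
    """Return the time at which to extract a still from a small motion section.

    small_end is the start time of the large motion section that follows.
    lead is the fixed time before that start (5 seconds in the text's example).
    """
    before_large = small_end - lead
    midpoint = (small_start + small_end) / 2.0
    # Both candidates are <= small_end, so the one closer to the start
    # of the large motion section is simply the later of the two.
    return max(before_large, midpoint)
```

For a 20-second small motion section starting at 0, this returns 15 (5 seconds before the large motion section); for a 6-second section it returns the midpoint, 3, which keeps the extraction time inside the section when the section is shorter than twice the lead time.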

The still image similarity determination unit 13 calculates the feature amounts of the still images that the still image extraction unit 12 extracted from the first scene and of the still images it extracted from the second scene. For every combination of a still image of the first scene and a still image of the second scene, the still image similarity determination unit 13 then determines from these feature amounts whether the two still images are similar. The still image similarity determination unit 13 may output to the scene similarity determination unit 14 a signal indicating how many pairs of still images from the first and second scenes were found to be similar. The details of the still image similarity determination by the still image similarity determination unit 13 are described later.

The scene similarity determination unit 14 determines that the first scene and the second scene are similar when the still image similarity determination unit 13 has found at least a given number (one or more) of combinations of a still image of the first scene and a still image of the second scene to be similar. This number of pairs can be set in advance in the parameter information, and the scene similarity determination unit 14 obtains it by referring to the parameter information. For example, if the number of pairs is set to "1", the scene similarity determination unit 14 determines that the first and second scenes are similar when the signal from the still image similarity determination unit 13 indicates that one or more pairs of still images are similar. If the number of pairs is set to "2", it determines that the scenes are similar when the signal indicates that two or more pairs of still images are similar. In other words, the larger the number of pairs, the more accurate the scene similarity determination; the smaller the number of pairs, the faster the determination.
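The pair-count rule, combined with the early termination of claim 2, can be sketched as follows. The function name `scenes_similar`, the parameter name `min_pairs`, and the `is_similar` callback (standing in for the still image similarity determination) are assumptions made for illustration.

```python
from itertools import product
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")   # whatever type represents a still image


def scenes_similar(stills_a: Sequence[S], stills_b: Sequence[S],
                   is_similar: Callable[[S, S], bool],
                   min_pairs: int = 1) -> bool:
    """Return True once min_pairs similar still-image pairs are found."""
    matches = 0
    for a, b in product(stills_a, stills_b):
        if is_similar(a, b):
            matches += 1
            if matches >= min_pairs:
                # Remaining combinations need not be checked.
                return True
    return False
```

With `min_pairs=1` the loop stops at the first similar pair, which is the fastest setting; larger values require more agreement between the two scenes before they are declared similar.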

また、シーン類似判定手段１４は、前記した判定に応じて、シーンが類似する又はシーンが類似しないといった判定結果を示す信号を出力しても良い。また、シーン類似判定手段１４は、この判定結果を、図示しない液晶画面等の表示手段に出力しても良い。 In addition, the scene similarity determination unit 14 may output a signal indicating a determination result that the scene is similar or the scene is not similar in accordance with the above determination. Further, the scene similarity determination unit 14 may output the determination result to a display unit such as a liquid crystal screen (not shown).

パラメータ記憶手段１５は、前記したパラメータ情報を記憶するＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、ＨＤＤ（ＨａｒｄＤｉｓｋＤｒｉｖｅ）等の記憶手段である。このパラメータ情報は、シーン類似判定装置１の動作に必要となる各種パラメータを設定するものである。 The parameter storage unit 15 is a storage unit, such as a ROM (Read Only Memory) or an HDD (Hard Disk Drive), that stores the parameter information described above. This parameter information sets the various parameters required for the operation of the scene similarity determination apparatus 1.

＜静止画の抽出及び静止画の類似判定＞ <Still image extraction and still image similarity determination>
以下、図２の静止画抽出手段１２による静止画の抽出及び静止画類似判定手段１３による静止画の類似判定の詳細について、図３を参照して説明する（適宜図２参照）。図３（ａ）は、図２の静止画抽出手段１２による静止画の抽出を説明する図であり、図３（ｂ）は、静止画類似判定手段１３による静止画の類似判定を説明する図である。なお、図３（ａ）では、第１シーン及び第２シーンの動き大区間を斜線のハッチングで図示し、動き小区間の中間時刻を破線で図示した。また、図３（ａ）では、ｔは、時刻の経過を示す。 The details of still image extraction by the still image extraction unit 12 of FIG. 2 and of still image similarity determination by the still image similarity determination unit 13 are described below with reference to FIG. 3 (see FIG. 2 as appropriate). FIG. 3A is a diagram explaining still image extraction by the still image extraction unit 12 of FIG. 2, and FIG. 3B is a diagram explaining still image similarity determination by the still image similarity determination unit 13. In FIG. 3A, the large motion sections of the first and second scenes are shown with diagonal hatching, and the intermediate time of each small motion section is shown with a broken line. In FIG. 3A, t indicates the passage of time.

図３（ａ）の例では、シーン分割手段１１が、第１シーンを、３個の動き小区間ａ，ｃ，ｅと、２個の動き大区間ｂ，ｄとに分割している。また、シーン分割手段１１が、第２シーンを、２個の動き小区間ｆ，ｈと、２個の動き大区間ｇ，ｉとに分割している。 In the example of FIG. 3A, the scene dividing unit 11 divides the first scene into three small motion sections a, c, e and two large motion sections b, d. The scene dividing means 11 divides the second scene into two small motion sections f and h and two large motion sections g and i.

まず、静止画抽出手段１２は、第１シーンの動き小区間ａから静止画を抽出する。具体的には、静止画抽出手段１２は、動き小区間ａの中間時刻と、動き小区間ａの直後に位置する動き大区間ｂの開始時刻から一定時間前の時刻とのうち、どちらが動き大区間ｂの開始時刻に近いかを判定する。ここでは、動き小区間ａの中間時刻が動き大区間ｂの開始時刻に近いため、静止画抽出手段１２は、動き小区間ａの中間時刻におけるフレームを静止画１−１として抽出する。 First, the still image extraction unit 12 extracts a still image from the small motion section a of the first scene. Specifically, the still image extraction unit 12 determines which of two times is closer to the start time of the large motion section b located immediately after the small motion section a: the intermediate time of the small motion section a, or the time a fixed interval before the start time of the large motion section b. Here, the intermediate time of the small motion section a is the closer of the two, so the still image extraction unit 12 extracts the frame at that intermediate time as still image 1-1.

次に、静止画抽出手段１２は、第１シーンの動き小区間ｃから静止画を抽出する。具体的には、静止画抽出手段１２は、動き小区間ｃの中間時刻と、動き小区間ｃの直後に位置する動き大区間ｄの開始時刻から一定時間前の時刻とのうち、どちらが動き大区間ｄの開始時刻に近いかを判定する。ここでは、動き小区間ｃの直後に位置する動き大区間ｄの開始時刻から一定時間前の時刻が、動き大区間ｄの開始時刻に近いため、静止画抽出手段１２は、動き小区間ｃの直後に位置する動き大区間ｄの開始時刻から一定時間前の時刻におけるフレームを静止画１−２として抽出する。 Next, the still image extraction unit 12 extracts a still image from the small motion section c of the first scene. Specifically, the still image extraction unit 12 determines which of two times is closer to the start time of the large motion section d located immediately after the small motion section c: the intermediate time of the small motion section c, or the time a fixed interval before the start time of the large motion section d. Here, the time a fixed interval before the start time of the large motion section d is the closer of the two, so the still image extraction unit 12 extracts the frame at that time as still image 1-2.

また、静止画抽出手段１２は、第１シーンと同様に、第２シーンの動き小区間ｆ，ｈからそれぞれ静止画２−１，２−２を抽出する。なお、図３（ａ）に示すように、第１シーンの動き小区間ｅは、第１シーンの最後であって直後の動き大区間が存在しないため、静止画抽出手段１２は、動き小区間ｅから静止画を抽出しなくとも良い。 In the same way as for the first scene, the still image extraction unit 12 extracts still images 2-1 and 2-2 from the small motion sections f and h of the second scene, respectively. As shown in FIG. 3A, the small motion section e is the last section of the first scene and no large motion section follows it, so the still image extraction unit 12 need not extract a still image from the small motion section e.
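
The choice of extraction time described above can be sketched as follows. This is a minimal sketch under stated assumptions: sections are given as contiguous `(kind, start, end)` tuples in seconds, `lead_time` stands in for the fixed interval, and the function name and the sample times are hypothetical (they are not taken from FIG. 3).

```python
from typing import List, Tuple

# (kind, start_time, end_time); kind is "small" or "large" (hypothetical encoding)
Section = Tuple[str, float, float]


def still_times(sections: List[Section], lead_time: float) -> List[float]:
    """Pick one extraction time for each small motion section that is
    immediately followed by a large motion section."""
    times = []
    for current, following in zip(sections, sections[1:]):
        kind, start, end = current
        if kind != "small" or following[0] != "large":
            continue
        midpoint = (start + end) / 2.0
        before_large = following[1] - lead_time
        # Both candidates lie at or before the large section's start,
        # so the later candidate is the one closer to that start.
        times.append(max(midpoint, before_large))
    return times


# Hypothetical first scene: small a, large b, small c, large d, small e.
first_scene = [
    ("small", 0.0, 4.0),
    ("large", 4.0, 6.0),
    ("small", 6.0, 16.0),
    ("large", 16.0, 18.0),
    ("small", 18.0, 20.0),
]
print(still_times(first_scene, lead_time=3.0))  # [2.0, 13.0]
```

With these sample values, section a yields its intermediate time, section c yields the time a fixed interval before section d, and the trailing section e is skipped because no large motion section follows it.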

一般的に、ドラマ、映画等の素材映像は、同じシーンを何度も撮影し直すことが多く、この場合、役者が同じ振る舞いをすること、及び、撮影開始前ではどの役者も静止し、「アクション」又は「スタート」のかけ声とともに演技を開始するといった特定の撮影ルールに従って撮影されている。つまり、この撮影ルールの性質から、同じシーンの動き大区間であれば同様の動きが含まれ、その動き大区間の直前（一定時間前）で同じ構図（静止画）となる確率が高くなると考えられる。そこで、シーン類似判定装置１は、この撮影ルールの性質を利用して、シーン分割手段１１によって、シーンを動き大区間と動き小区間とに分割し、静止画抽出手段１２によって、前記したような手法で同じ構図（静止画）である確率が高い２枚の静止画を抽出する。 In general, for material video of dramas, movies, and the like, the same scene is often reshot many times. In that case the footage is shot according to particular shooting rules: the actors behave in the same way in each take, and before shooting starts every actor stands still and begins performing at the call of “action” or “start”. Given these rules, large motion sections of the same scene can be expected to contain similar motion, and the frames immediately before those large motion sections (a fixed interval before) are likely to have the same composition (still image). The scene similarity determination apparatus 1 exploits this property: the scene dividing unit 11 divides each scene into large motion sections and small motion sections, and the still image extraction unit 12 extracts, by the method described above, two still images that are likely to have the same composition (still image).

なお、静止画抽出手段１２は、入力手段に入力されたシーンがＭＰＥＧ（ＭｏｖｉｎｇＰｉｃｔｕｒｅＥｘｐｅｒｔｓＧｒｏｕｐ）２形式の場合、動き大区間の先頭のフレームに対応するタイムコードを、動き大区間の開始時刻としても良い。また、静止画抽出手段１２は、動き小区間の先頭と最後のフレームにそれぞれ対応するタイムコードの中間となる時刻を算出し、これを中間時刻としても良い。 When the scene input to the input unit is in the MPEG (Moving Picture Experts Group) 2 format, the still image extraction unit 12 may use the time code corresponding to the first frame of a large motion section as the start time of that large motion section. The still image extraction unit 12 may also calculate the time midway between the time codes corresponding to the first and last frames of a small motion section and use it as the intermediate time.

図３（ｂ）に示すように、静止画類似判定手段１３は、第１シーンの静止画１−１，１−２と、第２シーンの静止画２−１，２−２との全ての組み合わせが類似するか否かを判定する。つまり、静止画類似判定手段１３は、静止画１−１と静止画２−１との組み合わせ、静止画１−１と静止画２−２との組み合わせ、静止画１−２と静止画２−１との組み合わせ、及び、静止画１−２と静止画２−２との組み合わせが類似するか否かを判定する。 As shown in FIG. 3B, the still image similarity determination unit 13 determines whether each combination of the first-scene still images 1-1 and 1-2 with the second-scene still images 2-1 and 2-2 is similar. That is, it determines whether each of the following four pairs is similar: still images 1-1 and 2-1, still images 1-1 and 2-2, still images 1-2 and 2-1, and still images 1-2 and 2-2.

具体的には、静止画類似判定手段１３は、例えば、第１シーンの静止画１−１，１−２と第２シーンの静止画２−１，２−２とのそれぞれを対応するように、縦２×横２で４分割する。そして、静止画類似判定手段１３は、第１シーンの静止画の特徴量及び第２シーンの静止画の特徴量として、４分割した第１シーンの静止画１−１，１−２と第２シーンの静止画２−１，２−２とのＨＳＢ（ＨｕｅＳａｔｕｒａｔｉｏｎＢｒｉｇｈｔｎｅｓｓ）色空間の平均値のユークリッド距離を算出する。なお、静止画類似判定手段１３は、第１シーンの静止画と第２シーンの静止画とを４分割すると説明したが、２分割以上であれば分割する数は限定されない。また、静止画類似判定手段１３は、ＨＳＢ色空間を用いると説明したが、これに限定されず、例えば、ＲＧＢ（ＲｅｄＧｒｅｅｎＢｌｕｅ）色空間又はＬ＊ａ＊ｂ色空間等の任意の色空間を用いても良い。 Specifically, the still image similarity determination unit 13 divides, for example, each of the first-scene still images 1-1 and 1-2 and the second-scene still images 2-1 and 2-2 into four blocks (2 vertical × 2 horizontal) so that the blocks correspond to one another. Then, as the feature amounts of the first-scene and second-scene still images, it calculates the Euclidean distance between the average values, in the HSB (Hue Saturation Brightness) color space, of the four-way divided still images 1-1 and 1-2 of the first scene and 2-1 and 2-2 of the second scene. Although the still images are described here as being divided into four, the number of divisions is not limited as long as it is two or more. Likewise, although the HSB color space is described here, the color space is not limited to it; any color space, such as the RGB (Red Green Blue) color space or the L*a*b* color space, may be used.

そして、静止画類似判定手段１３は、このユークリッド距離が、予め設定した距離以下のとき、第１シーンの静止画１−１と第２シーンの静止画２−１とが類似すると判定する。一方、静止画類似判定手段１３は、このユークリッド距離が、前記した距離を超えるとき、第１シーンの静止画１−１と第２シーンの静止画２−１とが類似しないと判定する。このように、静止画類似判定手段１３は、第１シーンの静止画１−２と、第２シーンの静止画２−１，２−２との残りの組み合わせについても、ユークリッド距離が前記した距離以下であるか否かの判定を行う。 Then, the still image similarity determination unit 13 determines that still image 1-1 of the first scene and still image 2-1 of the second scene are similar when this Euclidean distance is equal to or less than a preset distance. Conversely, it determines that still image 1-1 of the first scene and still image 2-1 of the second scene are not similar when this Euclidean distance exceeds that distance. In the same way, the still image similarity determination unit 13 determines, for the remaining combinations of still image 1-2 of the first scene with still images 2-1 and 2-2 of the second scene, whether the Euclidean distance is equal to or less than that distance.
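
The block-wise color comparison can be sketched as follows. This is a minimal sketch under several assumptions that are not in the text: images are nested sequences of 8-bit RGB tuples, HSB is computed with the standard-library `colorsys.rgb_to_hsv` (HSB and HSV name the same model), hue is averaged arithmetically even though it is a circular quantity, and two still images count as similar when the distance is at or below the preset distance, in line with the paragraph's closing test of whether the distance is equal to or less than that distance. All function and parameter names are hypothetical.

```python
import colorsys
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]        # 8-bit (R, G, B)
Image = Sequence[Sequence[Pixel]]   # rows of pixels


def block_means_hsb(image: Image, rows: int = 2, cols: int = 2) -> List[float]:
    """Mean H, S, B of each block, flattened into one feature vector.

    Assumes the image has at least `rows` x `cols` pixels.
    """
    height, width = len(image), len(image[0])
    features: List[float] = []
    for r in range(rows):
        for c in range(cols):
            sums = [0.0, 0.0, 0.0]
            count = 0
            for y in range(r * height // rows, (r + 1) * height // rows):
                for x in range(c * width // cols, (c + 1) * width // cols):
                    red, green, blue = image[y][x]
                    hsb = colorsys.rgb_to_hsv(red / 255.0, green / 255.0, blue / 255.0)
                    for i in range(3):
                        sums[i] += hsb[i]
                    count += 1
            features.extend(total / count for total in sums)
    return features


def stills_are_similar(a: Image, b: Image, max_distance: float) -> bool:
    """Similar when the Euclidean distance between feature vectors is small enough."""
    distance = math.dist(block_means_hsb(a), block_means_hsb(b))
    return distance <= max_distance


red = [[(255, 0, 0)] * 4 for _ in range(4)]
blue = [[(0, 0, 255)] * 4 for _ in range(4)]
print(stills_are_similar(red, red, max_distance=0.1))
print(stills_are_similar(red, blue, max_distance=0.1))
```

Changing `rows` and `cols` corresponds to changing the number of divisions, and swapping the `colorsys` call for another conversion corresponds to choosing a different color space.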

ここで、静止画類似判定手段１３は、第１シーンの静止画と第２シーンの静止画との何れか１の組み合わせが類似すると判定した場合、第１シーンの静止画と第２シーンの静止画との他の組み合わせが類似するか否かの判定を終了しても良い。これによって、静止画類似判定手段１３は、静止画の類似判定をより高速に行うことができる。なお、パラメータ情報に、静止画を分割する数を示す情報、ＨＳＢ色空間とＲＧＢ色空間とＬ＊ａ＊ｂ色空間の何れを用いるかを示す情報、前記した距離を示す情報、及び、静止画が１組類似したときに類似判定を終了するか否かを示す情報を予め設定しても良い。この場合、静止画類似判定手段１３は、このパラメータ情報を参照して、何れの色空間を用いるか、類似判定を終了するか否かといった制御を行っても良い。 Here, once the still image similarity determination unit 13 has determined that any one combination of a first-scene still image and a second-scene still image is similar, it may stop determining whether the other combinations are similar. This lets the still image similarity determination unit 13 perform the still image similarity determination faster. The parameter information may be set in advance with information indicating the number of divisions of a still image, information indicating which of the HSB, RGB, and L*a*b* color spaces to use, information indicating the distance described above, and information indicating whether to end the similarity determination once one pair of still images is found to be similar. In that case, the still image similarity determination unit 13 may refer to this parameter information to control, for example, which color space to use and whether to end the similarity determination.

なお、図３では、静止画類似判定手段１３が、第１シーンの静止画の特徴量及び第２シーンの静止画の特徴量として、色空間を用いる例で説明したが、第１シーンの静止画の特徴量及び第２シーンの静止画の特徴量は、これに限定されない。例えば、静止画類似判定手段１３は、第１シーンの静止画の特徴量及び第２シーンの静止画の特徴量として、エッジのヒストグラム又はＤＣＴ（離散コサイン変換）係数等の空間周波数を用いることができる。具体的には、静止画類似判定手段１３は、第１シーンの静止画の特徴量及び第２シーンの静止画のそれぞれについて、エッジのヒストグラム又はＤＣＴ係数を算出する。そして、静止画類似判定手段１３は、第１シーンの静止画の特徴量及び第２シーンの静止画のそれぞれについて、算出したエッジのヒストグラム又はＤＣＴ係数を比較し、これらが予め設定した範囲内にあるか否かによって、静止画の類似判定を行うことができる。このように、第１シーンの静止画の特徴量及び第２シーンの静止画の特徴量として、エッジのヒストグラムやＤＣＴ係数を用いても、静止画類似判定手段１３は、簡易で高速に静止画の類似判定を行うことができる。 Although FIG. 3 was described using an example in which the still image similarity determination unit 13 uses a color space for the feature amounts of the first-scene and second-scene still images, the feature amounts are not limited to this. For example, the still image similarity determination unit 13 can use spatial-frequency features such as an edge histogram or DCT (discrete cosine transform) coefficients as the feature amounts. Specifically, it calculates an edge histogram or DCT coefficients for each of the first-scene and second-scene still images, compares the calculated values, and determines whether the still images are similar according to whether the values fall within a preset range. Even when edge histograms or DCT coefficients are used as the feature amounts in this way, the still image similarity determination unit 13 can perform the still image similarity determination simply and quickly.

なお、図３では、第１シーンと第２シーンとで同じ枚数（例えば、２枚）の静止画が抽出される例を説明したが、第１シーンと第２シーンとで異なる枚数の静止画が抽出される場合もある。この場合も、静止画類似判定手段１３は、図３と同様に、第１シーンの静止画と、第２シーンの静止画との全ての組み合わせが類似するか否かを判定する。例えば、第１シーンから３枚の静止画が抽出され、第２シーンから２枚の静止画が抽出された場合、静止画類似判定手段１３は、第１シーンの静止画と第２シーンの静止画とを、３枚×２枚の合計６通りの組み合わせ、各組み合わせが類似するか否かを判定する。 FIG. 3 described an example in which the same number of still images (for example, two) is extracted from the first scene and the second scene, but different numbers of still images may be extracted from the two scenes. In that case too, as in FIG. 3, the still image similarity determination unit 13 determines whether each combination of a first-scene still image and a second-scene still image is similar. For example, when three still images are extracted from the first scene and two from the second scene, the still image similarity determination unit 13 forms 3 × 2 = 6 combinations in total and determines whether each combination is similar.
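
The pairwise comparison over all combinations, together with the optional early termination mentioned earlier, can be sketched as follows. This is an illustrative sketch only; `count_similar_pairs`, `is_similar`, and `stop_after` are hypothetical names, and `is_similar` stands in for whichever still image comparison is configured.

```python
from itertools import product
from typing import Callable, Sequence, TypeVar

Still = TypeVar("Still")


def count_similar_pairs(
    first: Sequence[Still],
    second: Sequence[Still],
    is_similar: Callable[[Still, Still], bool],
    stop_after: int = 0,
) -> int:
    """Count similar pairs over every combination of first-scene and
    second-scene still images.

    When `stop_after` is positive, stop as soon as that many similar pairs
    have been found (0 means compare every combination).
    """
    found = 0
    for a, b in product(first, second):
        if is_similar(a, b):
            found += 1
            if stop_after and found >= stop_after:
                break
    return found


# Toy stand-in for a real comparison: stills are "similar" when equal.
calls = []


def same(a, b):
    calls.append((a, b))
    return a == b


print(count_similar_pairs([1, 2, 3], [3, 4], same), len(calls))  # 1 6
```

Three first-scene still images and two second-scene still images give the six comparisons of the example; passing `stop_after=1` ends the loop at the first similar pair.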

［シーン類似判定装置の動作］ [Operation of the scene similarity determination apparatus]
以下、図２のシーン類似判定装置の動作について、図４を参照して説明する（適宜図２参照）。図４は、図２のシーン類似判定装置の動作を示すフローチャートである。 The operation of the scene similarity determination apparatus of FIG. 2 is described below with reference to FIG. 4 (see FIG. 2 as appropriate). FIG. 4 is a flowchart showing the operation of the scene similarity determination apparatus of FIG. 2.

まず、シーン類似判定装置１は、入力手段１０によって、素材映像から予め抽出した２個のシーン（第１シーン，第２シーン）を入力する（ステップＳ１）。 First, the scene similarity determination apparatus 1 inputs two scenes (first scene and second scene) extracted in advance from a material video by the input means 10 (step S1).

ステップＳ１の処理に続いて、シーン類似判定装置１は、シーン分割手段１１によって、入力手段１０に入力された第１シーンを構成するフレーム間の動き量を算出して、第１シーンを、動き量が予め設定した閾値以上である動き大区間と、動き量が閾値未満である動き小区間とに分割する（ステップＳ２）。 Following step S1, the scene similarity determination apparatus 1 uses the scene dividing unit 11 to calculate the amount of motion between the frames constituting the first scene input to the input unit 10, and divides the first scene into large motion sections, in which the amount of motion is equal to or greater than a preset threshold, and small motion sections, in which the amount of motion is less than the threshold (step S2).

ステップＳ２の処理に続いて、シーン類似判定装置１は、静止画抽出手段１２によって、シーン分割手段１１が分割した第１シーンの動き小区間毎に、動き小区間の直後に位置する動き大区間の開始時刻から一定時間前又は動き小区間の中間時刻のうち、動き大区間の開始時刻に近い一方の時刻における動き小区間のフレームを静止画として抽出する（ステップＳ３）。 Following step S2, the scene similarity determination apparatus 1 uses the still image extraction unit 12 to extract a still image from each small motion section of the first scene divided by the scene dividing unit 11. The still image is the frame of the small motion section at whichever of two times is closer to the start time of the large motion section located immediately after that small motion section: the time a fixed interval before that start time, or the intermediate time of the small motion section (step S3).

ステップＳ３の処理に続いて、シーン類似判定装置１は、シーン分割手段１１によって、入力手段１０に入力された第２シーンを構成するフレーム間の動き量を算出して、第２シーンを、動き量が予め設定した閾値以上である動き大区間と、動き量が閾値未満である動き小区間とに分割する（ステップＳ４）。 Following step S3, the scene similarity determination apparatus 1 uses the scene dividing unit 11 to calculate the amount of motion between the frames constituting the second scene input to the input unit 10, and divides the second scene into large motion sections, in which the amount of motion is equal to or greater than a preset threshold, and small motion sections, in which the amount of motion is less than the threshold (step S4).

ステップＳ４の処理に続いて、シーン類似判定装置１は、静止画抽出手段１２によって、シーン分割手段１１が分割した第２シーンの動き小区間毎に、動き小区間の直後に位置する動き大区間の開始時刻から一定時間前又は動き小区間の中間時刻のうち、動き大区間の開始時刻に近い一方の時刻における動き小区間のフレームを静止画として抽出する（ステップＳ５）。 Following step S4, the scene similarity determination apparatus 1 uses the still image extraction unit 12 to extract a still image from each small motion section of the second scene divided by the scene dividing unit 11. The still image is the frame of the small motion section at whichever of two times is closer to the start time of the large motion section located immediately after that small motion section: the time a fixed interval before that start time, or the intermediate time of the small motion section (step S5).

ステップＳ５の処理に続いて、シーン類似判定装置１は、静止画類似判定手段１３によって、ステップＳ３の処理で抽出した第１シーンの静止画の特徴量を算出すると共に、ステップＳ５の処理で抽出した第２シーンの静止画の特徴量を算出する（ステップＳ６）。 Following step S5, the scene similarity determination apparatus 1 uses the still image similarity determination unit 13 to calculate the feature amounts of the first-scene still images extracted in step S3 and the feature amounts of the second-scene still images extracted in step S5 (step S6).

ステップＳ６の処理に続いて、シーン類似判定装置１は、静止画類似判定手段１３によって、第１シーンの静止画と第２シーンの静止画との全ての組み合わせについて、第１シーンの静止画の特徴量と第２シーンの静止画の特徴量とに基づいて、第１シーンの静止画と第２シーンの静止画とが類似するか否かを判定する（ステップＳ７）。 Following the processing of step S6, the scene similarity determination apparatus 1 uses the still image similarity determination unit 13 to determine the first scene still image for all combinations of the first scene still image and the second scene still image. Based on the feature amount and the feature amount of the still image of the second scene, it is determined whether or not the still image of the first scene and the still image of the second scene are similar (step S7).

第１シーンの静止画と第２シーンの静止画との何れか１の組み合わせについて、静止画類似判定手段１３が類似すると判定した場合（ステップＳ７：あり）、シーン類似判定装置１は、シーン類似判定手段１４によって、第１シーンと第２シーンとが類似すると判定する（ステップＳ８）。そして、シーン類似判定装置１は、シーン類似判定手段１４によって、この判定に応じて、シーンが類似するという判定結果を出力する。 When the still image similarity determination unit 13 has determined that any one combination of a first-scene still image and a second-scene still image is similar (Yes in step S7), the scene similarity determination apparatus 1 uses the scene similarity determination unit 14 to determine that the first scene and the second scene are similar (step S8). In accordance with this determination, the scene similarity determination unit 14 then outputs a determination result indicating that the scenes are similar.

一方、第１シーンの静止画と第２シーンの静止画との全ての組み合わせについて、静止画類似判定手段１３が類似しないと判定した場合（ステップＳ７：なし）、シーン類似判定装置１は、シーン類似判定手段１４によって、第１シーンと第２シーンとが類似しないと判定する（ステップＳ９）。そして、シーン類似判定装置１は、シーン類似判定手段１４によって、この判定に応じて、シーンが類似しないという判定結果を出力する。 Conversely, when the still image similarity determination unit 13 has determined that none of the combinations of a first-scene still image and a second-scene still image is similar (No in step S7), the scene similarity determination apparatus 1 uses the scene similarity determination unit 14 to determine that the first scene and the second scene are not similar (step S9). In accordance with this determination, the scene similarity determination unit 14 then outputs a determination result indicating that the scenes are not similar.
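
The division performed in steps S2 and S4 can be sketched as follows. This is a minimal sketch that assumes the inter-frame motion amounts have already been computed (the text does not say how) and are supplied as one number per frame; the function name and the frame-index output format are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def split_by_motion(motion: Sequence[float], threshold: float) -> List[Tuple[str, int, int]]:
    """Group consecutive frames into ("large" | "small", first_index, last_index) runs.

    A frame belongs to a large motion section when its motion amount is
    at or above `threshold`, and to a small motion section otherwise.
    """
    sections = []
    index = 0
    for is_large, run in groupby(motion, key=lambda amount: amount >= threshold):
        length = len(list(run))
        sections.append(("large" if is_large else "small", index, index + length - 1))
        index += length
    return sections


print(split_by_motion([0.1, 0.2, 0.9, 0.8, 0.1], threshold=0.5))
# [('small', 0, 1), ('large', 2, 3), ('small', 4, 4)]
```

The resulting runs are the sections from which steps S3 and S5 pick their still images.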

以上のように、シーン類似判定装置１は、シーンの類似判定の精度を高くすると共に、簡易で高速に、シーンの類似判定を行うことができる。なお、各実施形態では、本発明に係るシーン類似判定装置を独立した装置として説明したが、本発明では、一般的なコンピュータの演算手段及び記憶手段といったハードウェア資源を、前記した各手段として協調動作させるプログラムとしても良い。このプログラムは、通信回線を介して配布しても良く、ＣＤ−ＲＯＭやフラッシュメモリ等の記録媒体に書き込んで配布しても良い。 As described above, the scene similarity determination apparatus 1 can increase the accuracy of scene similarity determination and can perform scene similarity determination easily and at high speed. In each embodiment, the scene similarity determination apparatus according to the present invention has been described as an independent apparatus. However, in the present invention, hardware resources such as general computer calculation means and storage means are coordinated as the above-described means. It may be a program to be operated. This program may be distributed via a communication line, or may be distributed by writing in a recording medium such as a CD-ROM or a flash memory.

（第２実施形態） (Second Embodiment)
［サマリ映像生成システムの構成］ [Configuration of the summary video generation system]
以下、本発明の第２実施形態に係るサマリ映像生成システムの構成について、図５を参照して説明する。図５は、本発明の第２実施形態に係るサマリ映像生成システムの構成を示すブロック図である。図５に示すように、サマリ映像生成システム２は、重複したシーンを含む素材映像から、この重複したシーンを除いた異なるシーンで構成されるサマリ映像を生成するものであり、シーン抽出手段２１と、図２のシーン類似判定装置１と、代表シーン選択手段２２と、不要映像削除手段２３と、サマリ映像生成手段２４と、を備える。 The configuration of a summary video generation system according to the second embodiment of the present invention is described below with reference to FIG. 5. FIG. 5 is a block diagram showing the configuration of the summary video generation system according to the second embodiment of the present invention. As shown in FIG. 5, the summary video generation system 2 generates, from material video that includes duplicate scenes, a summary video composed of the distinct scenes that remain after the duplicate scenes are removed. It includes a scene extraction unit 21, the scene similarity determination apparatus 1 of FIG. 2, a representative scene selection unit 22, an unnecessary video deletion unit 23, and a summary video generation unit 24.

シーン抽出手段２１は、素材映像を入力すると共に、公知の手法で、素材映像のシーンチェンジ点を検出して、素材映像に含まれる複数のシーンを抽出するものである。例えば、シーン抽出手段２１は、素材映像の各フレームを所定のブロック数（例えば、４×４の１６ブロック）に区切り、各ブロック単位で色変化量を算出して、この色変化量が一定の値以上となったフレームをシーンチェンジ点として検出する（例えば、特開２００６−２７０３０１号公報参照）。 The scene extraction unit 21 receives the material video, detects scene change points in the material video by a known method, and extracts the plurality of scenes contained in the material video. For example, the scene extraction unit 21 divides each frame of the material video into a predetermined number of blocks (for example, 4 × 4 = 16 blocks), calculates the amount of color change for each block, and detects as a scene change point any frame in which this amount of color change is equal to or greater than a fixed value (see, for example, JP-A-2006-270301).

また、シーン抽出手段２１は、素材映像に含まれる肌色領域を人物として検出し、この人物（肌色領域）の動きベクトルの変化に基づいて、シーンチェンジ点を検出しても良い。さらに、シーン抽出手段２１は、素材映像に音声情報が付加されている場合、一定時間声部がない部分、又は、音声が急激に変化した部分をシーンチェンジ点として検出しても良い。 The scene extraction means 21 may detect a skin color area included in the material video as a person and detect a scene change point based on a change in the motion vector of the person (skin color area). Further, when audio information is added to the material video, the scene extraction unit 21 may detect a part where there is no voice part for a certain period of time or a part where the audio has changed suddenly as a scene change point.

ここで、シーン抽出手段２１は、素材映像の先頭シーンから順に２個のシーンを組み合わせてシーン類似判定装置１に出力する。例えば、素材映像から５個のシーンを抽出した場合、シーン抽出手段２１は、シーン１とシーン２との組み合わせ、シーン２とシーン３との組み合わせ、シーン３とシーン４との組み合わせ、及び、シーン４とシーン５との組み合わせをそれぞれシーン類似判定装置１に出力する。なお、シーン抽出手段２１は、５個のシーンを全て組み合わせてシーン類似判定装置１に出力し、シーン類似判定装置１は、５個のシーン全ての組み合わせについて、類似するか否かを判定しても良い。 Here, the scene extraction unit 21 pairs the scenes two at a time, in order from the first scene of the material video, and outputs each pair to the scene similarity determination apparatus 1. For example, when five scenes are extracted from the material video, the scene extraction unit 21 outputs each of the following pairs to the scene similarity determination apparatus 1: scene 1 and scene 2, scene 2 and scene 3, scene 3 and scene 4, and scene 4 and scene 5. Alternatively, the scene extraction unit 21 may output every combination of the five scenes to the scene similarity determination apparatus 1, and the scene similarity determination apparatus 1 may determine whether each of those combinations is similar.
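
The two pairing strategies can be sketched as follows. This is an illustrative sketch; the function names are hypothetical and scenes are represented simply by their numbers.

```python
from itertools import combinations
from typing import List, Sequence, Tuple, TypeVar

Scene = TypeVar("Scene")


def adjacent_pairs(scenes: Sequence[Scene]) -> List[Tuple[Scene, Scene]]:
    """Pair each scene with the scene that follows it."""
    return list(zip(scenes, scenes[1:]))


def all_pairs(scenes: Sequence[Scene]) -> List[Tuple[Scene, Scene]]:
    """Pair every scene with every later scene."""
    return list(combinations(scenes, 2))


scenes = [1, 2, 3, 4, 5]
print(adjacent_pairs(scenes))   # [(1, 2), (2, 3), (3, 4), (4, 5)]
print(len(all_pairs(scenes)))   # 10
```

For five scenes, adjacent pairing needs four similarity determinations, while exhaustive pairing needs ten, so the second option costs more comparisons in exchange for catching similar scenes that are not next to each other.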

シーン類似判定装置１は、前記したように、シーン抽出手段２１から入力された２個のシーンが類似するか否かを判定するものである。なお、シーン類似判定装置１は、図２と同様のものであるため、その説明を省略する。 As described above, the scene similarity determination apparatus 1 determines whether the two scenes input from the scene extraction unit 21 are similar. The scene similarity determination apparatus 1 is the same as that shown in FIG. 2, so its description is omitted.

代表シーン選択手段２２は、シーン抽出手段２１が抽出した全てのシーンのうち、シーン類似判定装置１が類似しないと判定したシーンと、シーン類似判定装置が類似すると判定したシーンの組み合わせのうちの最も長いシーンとを代表シーンとして選択するものである。 From among all the scenes extracted by the scene extraction unit 21, the representative scene selection unit 22 selects as representative scenes both the scenes that the scene similarity determination apparatus 1 determined not to be similar and, for each combination of scenes that the scene similarity determination apparatus 1 determined to be similar, the longest scene in that combination.

以下、図６を参照して、図５の代表シーン選択手段２２による代表シーンの選択について説明する（適宜図５参照）。図６は、図５の代表シーン選択手段２２による代表シーンの選択を説明する図である。なお、図６では、長いシーン程、左右方向に長い長方形として図示した。 Hereinafter, selection of a representative scene by the representative scene selection unit 22 of FIG. 5 will be described with reference to FIG. 6 (see FIG. 5 as appropriate). FIG. 6 is a diagram for explaining selection of a representative scene by the representative scene selection means 22 of FIG. In FIG. 6, a longer scene is illustrated as a rectangle that is longer in the left-right direction.

また、図６の例では、シーン類似判定装置１が、シーン１とシーン２との組み合わせを類似すると判定し、シーン２とシーン３との組み合わせを類似しないと判定し、シーン３とシーン４との組み合わせを類似しないと判定し、シーン４とシーン５との組み合わせを類似すると判定している。 In the example of FIG. 6, the scene similarity determination apparatus 1 has determined that scenes 1 and 2 are similar, that scenes 2 and 3 are not similar, that scenes 3 and 4 are not similar, and that scenes 4 and 5 are similar.

まず、代表シーン選択手段２２は、シーン１とシーン２との組み合わせが類似するため、シーン１とシーン２との長さを比較し、シーン２より長いシーン１を１個目の代表シーンとして選択する。また、代表シーン選択手段２２は、シーン２とシーン３との組み合わせが類似せず、及び、シーン３とシーン４との組み合わせが類似しないため、シーン３を２個目の代表シーンとして選択する。さらに、代表シーン選択手段２２は、シーン４とシーン５との組み合わせが類似するため、シーン４とシーン５との長さを比較し、シーン５を３個目の代表シーンとして選択する。なお、代表シーン選択手段２２は、各シーンの長さを、例えば、各シーンの先頭から終了までのタイムコードの差を算出して求めることができる。 First, since the combination of scene 1 and scene 2 is similar, representative scene selection means 22 compares the lengths of scene 1 and scene 2 and selects scene 1 longer than scene 2 as the first representative scene. To do. The representative scene selection unit 22 selects the scene 3 as the second representative scene because the combination of the scene 2 and the scene 3 is not similar and the combination of the scene 3 and the scene 4 is not similar. Furthermore, since the combination of the scene 4 and the scene 5 is similar, the representative scene selection unit 22 compares the lengths of the scene 4 and the scene 5 and selects the scene 5 as the third representative scene. Note that the representative scene selection unit 22 can obtain the length of each scene by, for example, calculating the difference in time codes from the beginning to the end of each scene.

なお、代表シーン選択手段２２は、３個以上のシーンが類似すると判定された場合、これらシーンの組み合わせのうち、最も長いシーンを代表シーンとして選択しても良い。例えば、図６において、シーン１とシーン２との組み合わせを類似すると判定され、かつ、シーン２とシーン３との組み合わせを類似すると判定された場合を例に説明する。このとき、シーン１とシーン３との組み合わせも類似すると考えられるため、代表シーン選択手段２２は、シーン１からシーン３までのうちの最も長いシーン、例えば、シーン１の長さが１０秒、シーン２の長さが１０秒、及び、シーン３の長さが１７秒である場合、もっとも長いシーン３を代表シーンとして選択する。 When three or more scenes are determined to be similar, the representative scene selection unit 22 may select the longest of those scenes as the representative scene. As an example, suppose that in FIG. 6 scenes 1 and 2 are determined to be similar and scenes 2 and 3 are also determined to be similar. Scenes 1 and 3 can then also be considered similar, so the representative scene selection unit 22 selects the longest of scenes 1 to 3. For example, if scene 1 is 10 seconds long, scene 2 is 10 seconds long, and scene 3 is 17 seconds long, it selects scene 3, the longest, as the representative scene.
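
Representative scene selection over adjacent-pair results can be sketched as follows. This is a minimal sketch assuming that only adjacent pairs were compared and that similarity chains through neighbors, as in the three-scene example above; the function name is hypothetical, and the scene lengths used for the FIG. 6 case are invented for illustration (the text gives only their ordering).

```python
from typing import List, Sequence


def select_representatives(lengths: Sequence[float], adjacent_similar: Sequence[bool]) -> List[int]:
    """Return 0-based indices of the representative scenes.

    `adjacent_similar[i]` is True when scene i and scene i + 1 were judged
    similar. Runs of similar scenes form a group, and the longest scene of
    each group is chosen (the earliest one on a tie).
    """
    if not lengths:
        return []
    if len(adjacent_similar) != len(lengths) - 1:
        raise ValueError("need one similarity result per adjacent pair")
    representatives = []
    group = [0]
    for i, similar in enumerate(adjacent_similar):
        if similar:
            group.append(i + 1)
        else:
            representatives.append(max(group, key=lambda k: lengths[k]))
            group = [i + 1]
    representatives.append(max(group, key=lambda k: lengths[k]))
    return representatives


# FIG. 6 pattern with hypothetical lengths: scenes 1, 3 and 5 are selected.
print(select_representatives([12, 8, 15, 9, 14], [True, False, False, True]))  # [0, 2, 4]
# Three mutually similar scenes of 10 s, 10 s and 17 s: scene 3 is selected.
print(select_representatives([10, 10, 17], [True, True]))  # [2]
```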

以下、図５に戻り説明を続ける。不要映像削除手段２３は、公知の手法で、代表シーン選択手段２２が選択した代表シーンに含まれる不要映像を検出して削除するものである。この不要映像とは、各シーンに含まれる映像のうち、視聴者による視聴の対象とならない映像又は製作者による編集の対象とならない映像であり、例えば、図１のカラーバーＣ、ブランク映像Ｂ及びジャンク映像Ｊである。例えば、不要映像削除手段２３は、代表シーンを構成する各フレームについて、原色の画素が占める割合を算出し、この割合が予め設定した閾値以上であれば、そのフレームがカラーバーであるとして検出し、そのフレームを削除する。また、例えば、不要映像削除手段２３は、映像信号が含まれていないフレームをブランク映像として検出し、そのフレームを削除する。 The description now returns to FIG. 5. The unnecessary video deletion unit 23 detects and deletes, by a known method, unnecessary video contained in the representative scenes selected by the representative scene selection unit 22. Unnecessary video is video within a scene that is not intended for viewing by viewers or is not subject to editing by the producer, for example the color bars C, blank video B, and junk video J of FIG. 1. For example, the unnecessary video deletion unit 23 calculates, for each frame of a representative scene, the proportion of pixels that are primary colors; if this proportion is equal to or greater than a preset threshold, it detects the frame as color bars and deletes it. As another example, the unnecessary video deletion unit 23 detects frames that contain no video signal as blank video and deletes them.

また、ドラマ等の制作において、「アクション」で始まり「カット」で終了する各シーンが一定の長さ（例えば、１０秒）以上であることが多いため、不要映像削除手段２３は、代表シーンの長さが一定の長さ未満の場合、その代表シーンそのものを削除しても良い。さらに、不要映像削除手段２３は、そのドラマ等を制作する監督の「アクション」、「カット」といったシーンの開始及び終了を識別可能な音声を予め登録しておき、代表シーンに含まれる「アクション」及び「カット」を音声認識する。そして、不要映像削除手段２３は、各代表シーンにおいて、「アクション」を検出する以前の映像、及び「カット」を検出した以後の映像を不要映像として削除しても良い。以上のように、サマリ映像生成システム２は、不要映像削除手段２３によって、代表シーン選択手段２２が選択した代表シーンに含まれる不要映像を削除するため、素材映像の内容をより把握しやすいサマリ映像を生成できる。 In the production of dramas and the like, each scene that begins with “action” and ends with “cut” usually lasts at least a certain length (for example, 10 seconds), so the unnecessary video deletion unit 23 may delete a representative scene altogether when it is shorter than that length. Furthermore, the unnecessary video deletion unit 23 may register in advance the utterances of the director producing the drama that mark the start and end of a scene, such as “action” and “cut”, and apply speech recognition to detect “action” and “cut” within each representative scene. It may then delete, as unnecessary video, the video before “action” is detected and the video after “cut” is detected in each representative scene. Because the unnecessary video deletion unit 23 removes unnecessary video from the representative scenes selected by the representative scene selection unit 22 in this way, the summary video generation system 2 can generate a summary video that makes the content of the material video easier to grasp.

The summary video generation means 24 generates a summary video by concatenating portions of the representative scenes from which the unnecessary video deletion means 23 has deleted the unnecessary video. The summary video generation means 24 may display the generated summary video on a display device (not shown) or record it on a recording means (not shown).

For example, for all the representative scenes from which the unnecessary video deletion means 23 has deleted the unnecessary video, the summary video generation means 24 calculates the motion vectors between the frames of each representative scene and generates a summary video by concatenating the frames whose motion vector amount is equal to or greater than a preset threshold. Alternatively, the summary video generation means 24 may generate a summary video by concatenating the video of a predetermined duration from the beginning of each representative scene (for example, the first 3 seconds). The summary video generation means 24 may also generate a summary video by concatenating the portions of each representative scene in which the audio volume changes sharply (the climax portions). Needless to say, the summary video generation means 24 may also generate a summary video by concatenating the representative scenes in their entirety after the unnecessary video deletion means 23 has deleted the unnecessary video.
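Two of the selection strategies above can be sketched as follows. The sketch assumes that a motion vector amount has already been computed for each frame of each representative scene (the motion estimation itself is not shown), and the function names and the `(scene index, frame index)` output format are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def select_high_motion_frames(motion: Sequence[float], threshold: float) -> List[int]:
    """Indices of frames whose motion vector amount is at or above the threshold."""
    return [i for i, m in enumerate(motion) if m >= threshold]


def head_frames(num_frames: int, fps: float, seconds: float = 3.0) -> List[int]:
    """Indices of the frames in the first `seconds` of a scene."""
    return list(range(min(num_frames, int(round(fps * seconds)))))


def build_summary(scenes_motion: Sequence[Sequence[float]],
                  threshold: float) -> List[Tuple[int, int]]:
    """Concatenate the high-motion frames of every representative scene, in order.

    Each entry of the result is (scene index, frame index).
    """
    summary: List[Tuple[int, int]] = []
    for scene_id, motion in enumerate(scenes_motion):
        summary.extend((scene_id, i)
                       for i in select_high_motion_frames(motion, threshold))
    return summary
```

The volume-based strategy would follow the same pattern, with a per-frame measure of volume change in place of the motion amount.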

[Operation of the summary video generation system]
The operation of the summary video generation system of FIG. 5 is described below with reference to FIG. 7 (see also FIG. 5 as appropriate). FIG. 7 is a flowchart showing the operation of the summary video generation system of FIG. 5.

First, the summary video generation system 2 uses the scene extraction means 21 to input the material video, detect the scene change points of the material video by a known method, and extract all the scenes included in the material video (step S11). Following step S11, the summary video generation system 2 uses the scene similarity determination device 1 to determine whether two scenes input from the scene extraction means 21 are similar (step S12).

Following step S12, the summary video generation system 2 uses the representative scene selection means 22 to select, as representative scenes from among the plurality of scenes extracted by the scene extraction means 21, the scenes that the scene similarity determination device 1 determined not to be similar and the longest scene among the combinations of scenes that the scene similarity determination device 1 determined to be similar (step S13).
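The selection in step S13 can be sketched as follows. This is an illustration under stated assumptions: pairwise similarity results are treated as forming groups transitively (if A is similar to B and B to C, all three are grouped), which this passage does not spell out, and `is_similar` is a hypothetical callback standing in for the scene similarity determination device 1.

```python
from typing import Callable, Dict, List, Sequence


def select_representatives(lengths: Sequence[float],
                           is_similar: Callable[[int, int], bool]) -> List[int]:
    """Return indices of representative scenes.

    Scenes similar to no other scene are kept as they are; from each group
    of similar scenes, only the longest scene is kept.
    """
    n = len(lengths)
    parent = list(range(n))  # union-find forest over scene indices

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge every pair of scenes judged similar into one group.
    for i in range(n):
        for j in range(i + 1, n):
            if is_similar(i, j):
                parent[find(j)] = find(i)

    # Keep the longest scene of each group (ties keep the earlier scene).
    best: Dict[int, int] = {}
    for i in range(n):
        root = find(i)
        if root not in best or lengths[i] > lengths[best[root]]:
            best[root] = i
    return sorted(best.values())
```

For example, with four scenes of lengths 12, 20, 8, and 15 seconds, where scenes 0 and 1 and scenes 1 and 3 are judged similar, the function keeps scene 1 (the longest of the group) and scene 2 (similar to nothing).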

Following step S13, the summary video generation system 2 uses the unnecessary video deletion means 23 to delete, by a known method, the unnecessary video from the representative scenes selected by the representative scene selection means 22 (step S14). Following step S14, the summary video generation system 2 uses the summary video generation means 24 to generate a summary video by concatenating portions of the representative scenes from which the unnecessary video deletion means 23 has deleted the unnecessary video (step S15).

As described above, the summary video generation system can easily generate, from material video containing many retakes, a summary video from which the duplicated scenes have been removed. By referring to this summary video, the producer can grasp the gist of the material video in a short time, which greatly reduces time and labor.

The summary video generation system according to the present invention is not limited to the production of dramas and movies. Home shooting devices such as digital cameras and digital video cameras, and home recording devices that record the large amounts of material video shot with them, have now become widespread in ordinary households, so anyone can easily create video content. There is therefore likely to be demand for simple video editing in ordinary households as well. In that case, selecting the video the user needs from a large amount of material video is considered an essential task; by applying the summary video generation system according to the present invention, the user does not need to watch all of the material video and can work efficiently. In other words, the summary video generation system according to the present invention, when incorporated in a home shooting device or a home recording device, can serve the function of providing a summary video to the user.

In this case, video shot in an ordinary household is unlikely to contain audio such as "action" or images of a clapperboard. When the summary video generation system according to the present invention is incorporated in a home shooting device or a home recording device, it may therefore be configured without the unnecessary video deletion means described above.

FIG. 1 is a diagram explaining the content of the material video in the present invention.
FIG. 2 is a block diagram showing the configuration of the scene similarity determination device according to the first embodiment of the present invention.
FIG. 3(a) is a diagram explaining the extraction of still images by the still image extraction means 12 of FIG. 2, and FIG. 3(b) is a diagram explaining the still image similarity determination by the still image similarity determination means 13.
FIG. 4 is a flowchart showing the operation of the scene similarity determination device of FIG. 2.
FIG. 5 is a block diagram showing the configuration of the summary video generation system according to the second embodiment of the present invention.
FIG. 6 is a diagram explaining the selection of representative scenes by the representative scene selection means 22 of FIG. 5.
FIG. 7 is a flowchart showing the operation of the summary video generation system of FIG. 5.

Explanation of symbols

1 Scene similarity determination device
10 Input means
11 Scene division means
12 Still image extraction means
13 Still image similarity determination means
14 Scene similarity determination means
15 Parameter storage means
2 Summary video generation system
21 Scene extraction means
22 Representative scene selection means
23 Unnecessary video deletion means
24 Summary video generation means

Claims

A scene similarity determination device that determines, for a material video including duplicated scenes, whether two scenes extracted from the material video are similar, the device comprising:
input means for inputting the two scenes;
scene division means for calculating the amount of motion between the frames constituting each scene and dividing the scene into large-motion sections, in which the amount of motion is equal to or greater than a preset threshold, and small-motion sections, in which the amount of motion is less than the threshold;
still image extraction means for extracting as a still image, for each small-motion section of the scene, the frame of the small-motion section at a time a fixed period before the start time of the large-motion section located immediately after the small-motion section, or at an intermediate time of the small-motion section close to the start time of that large-motion section;
still image similarity determination means for calculating the feature amounts of the still images extracted from one of the two scenes, calculating the feature amounts of the still images extracted from the other of the two scenes, and determining, for all combinations of a still image of the one scene and a still image of the other scene, whether the still image of the one scene and the still image of the other scene are similar based on the feature amount of the still image of the one scene and the feature amount of the still image of the other scene; and
scene similarity determination means for determining that the one scene and the other scene are similar when the still image similarity determination means determines that one or more of the combinations of a still image of the one scene and a still image of the other scene are similar.

The scene similarity determination device according to claim 1, wherein, when the still image similarity determination means determines that any one combination of a still image of the one scene and a still image of the other scene is similar, it terminates the determination of whether the other combinations of a still image of the one scene and a still image of the other scene are similar.

The scene similarity determination device according to claim 1, wherein the still image similarity determination means calculates, as the feature amount of the still image of the one scene and the feature amount of the still image of the other scene, the average value of each still image in a predetermined color space, and calculates the Euclidean distance between the average value for the still image of the one scene and the average value for the still image of the other scene; determines that the still image of the one scene and the still image of the other scene are similar when the Euclidean distance is equal to or less than a preset distance; and determines that the still image of the one scene and the still image of the other scene are not similar when the Euclidean distance exceeds the preset distance.

A scene similarity determination program that, in order to determine, for a material video including duplicated scenes, whether two scenes extracted from the material video are similar, causes a computer to function as:
input means for inputting the two scenes;
scene division means for calculating the amount of motion between the frames constituting each scene and dividing the scene into large-motion sections, in which the amount of motion is equal to or greater than a preset threshold, and small-motion sections, in which the amount of motion is less than the threshold;
still image extraction means for extracting as a still image, for each small-motion section of the scene, the frame of the small-motion section at a time a fixed period before the start time of the large-motion section located immediately after the small-motion section, or at an intermediate time of the small-motion section close to the start time of that large-motion section;
still image similarity determination means for calculating the feature amounts of the still images extracted from one of the two scenes, calculating the feature amounts of the still images extracted from the other of the two scenes, and determining, for all combinations of a still image of the one scene and a still image of the other scene, whether the still image of the one scene and the still image of the other scene are similar based on the feature amount of the still image of the one scene and the feature amount of the still image of the other scene; and
scene similarity determination means for determining that the one scene and the other scene are similar when the still image similarity determination means determines that one or more of the combinations of a still image of the one scene and a still image of the other scene are similar.

A summary video generation system that generates, from a material video including duplicated scenes, a summary video composed of mutually different scenes with the duplicated scenes removed, the system comprising:
scene extraction means for detecting scene change points and extracting a plurality of the scenes included in the material video;
the scene similarity determination device according to claim 1, which determines whether the combinations of the scenes extracted by the scene extraction means are similar;
representative scene selection means for selecting, as representative scenes from among the scenes extracted by the scene extraction means, the scenes determined by the scene similarity determination device not to be similar and the longest scene among the combinations of scenes determined by the scene similarity determination device to be similar; and
summary video generation means for generating a summary video by concatenating part or all of the representative scenes selected by the representative scene selection means.