JP2005080209A

JP2005080209A - Moving image division method, moving image dividing apparatus, multimedia search index imparting apparatus, and moving image dividing program

Info

Publication number: JP2005080209A
Application number: JP2003311636A
Authority: JP
Inventors: Masayuki Ishikawa; 石川　　雅之
Original assignee: NTT Comware Corp
Current assignee: NTT Comware Corp
Priority date: 2003-09-03
Filing date: 2003-09-03
Publication date: 2005-03-24

Abstract

PROBLEM TO BE SOLVED: To provide a moving image dividing method, a moving image dividing apparatus and a moving image dividing program that divide a moving image file into scenes matching its content and make moving image editing more efficient.
SOLUTION: The moving image dividing apparatus 100 comprises a cut point recognition section 101, a local variation recognition section 102, a high-level sound recognition section 103, a divided file number adjusting section 104, an original moving image storage section 111, a parameter information storage section 112 and a divided moving image storage section 113. Candidate points for scene division are determined by applying cut point recognition, local variation recognition and high-level sound recognition in that order, and the input moving image file 10 is divided at those candidate points into a number of divisions that does not exceed a preset upper limit.
COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a moving image dividing method, a moving image dividing apparatus, a multimedia search index assigning apparatus and a moving image dividing program for dividing moving image files.

When moving image content (a moving image file) is edited, it is sometimes divided at each change of video scene or each break in the music, on the assumption that each resulting segment forms a semantically coherent unit. For example, a multimedia search system divides moving image content in order to assign search indexes: each scene-level segment is given, as its search index, a text describing its content. Likewise, a system that attaches accompanying information, such as related content, to received broadcast content (moving image content) divides the broadcast content into a plurality of broadcast data fragments.

Such division is generally performed by detecting cut points from the similarity between the frames that make up the moving image file, or by detecting silent portions.

The following prior art documents are related to this application:
JP 2002-262225 A
Japanese Patent Laid-Open No. H9-214879

In practice, however, moving image files vary widely in content, so dividing every file into scenes with one uniform conventional method does not always yield a division that suits the content. In other words, scene division cannot be adapted flexibly to the content of each moving image file. A singer's promotional video, for example, contains many scene changes, so dividing it by detecting cut points works well; in a news program or a martial arts demonstration video, however, the scene rarely changes, so cut point detection is ineffective. No single criterion is therefore effective for dividing all moving image content.

Furthermore, because such division is performed automatically according to a predetermined algorithm, some moving image files may be split into dozens or even hundreds of scene files. Subsequent editing must then work from this large number of scene files, which places a heavy manual burden on the editor. In the multimedia search system described above in particular, assigning search indexes to the scene files becomes cumbersome once more scene files are created than can be displayed on a single screen.

The present invention has been made to solve the above problems, and its object is to provide a moving image dividing method, a moving image dividing apparatus, a multimedia search index assigning apparatus and a moving image dividing program that divide a moving image file into scenes according to its content and make moving image editing more efficient.

To achieve the above object, the invention according to claim 1 is a moving image dividing method for dividing a moving image file into scenes according to the content of the moving image, wherein a computer having storage means that stores first and second thresholds for moving image division and a division upper limit number, which is the maximum number of scene divisions, executes:
a cut point detection step of calculating, for the input moving image file that is the original moving image before division, a first numerical value relating to the similarity between every pair of temporally adjacent frames of the moving image file, and detecting candidate points for scene division based on a comparison of each first numerical value with the first threshold stored in the storage means;
a local variation point detection step of, when no candidate point is detected in the cut point detection step, calculating, in each of a plurality of blocks into which the frames are divided, a second numerical value relating to the similarity and the luminance difference between every pair of temporally adjacent frames, and detecting candidate points for scene division based on a comparison of each second numerical value with the second threshold stored in the storage means;
a high-level sound detection step of, when no candidate point is detected in the local variation point detection step, calculating the average audio level of all audio data of the moving image file, and detecting candidate points for scene division based on a comparison of the average value with the audio level of the audio data contained in the moving image file; and
a divided file number adjusting step of dividing the moving image file at the candidate points for scene division detected in the cut point detection step, the local variation point detection step or the high-level sound detection step when their number is equal to or less than the division upper limit number stored in the storage means, and, when their number exceeds the division upper limit number, selecting a number of candidate points equal to the division upper limit number based on the magnitude of the first numerical value, the second numerical value or the audio level calculated at each candidate point, and dividing the moving image file at the selected candidate points.
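The claimed control flow can be sketched in Python as a fallback cascade followed by a cap on the number of split points. This is a minimal illustration under stated assumptions, not the patented implementation: the detectors are hypothetical stand-in callables that each return (frame index, score) pairs, and a larger score is assumed to mean a stronger boundary, so the cap keeps the highest-scoring candidates.

```python
from typing import Callable, List, Sequence, Tuple

# A candidate split point: (frame index, score). Assumption: a larger score
# means a stronger scene boundary.
Candidate = Tuple[int, float]
Detector = Callable[[], List[Candidate]]


def select_split_points(detectors: Sequence[Detector], max_splits: int) -> List[int]:
    """Run detectors in priority order and keep at most max_splits points.

    The first detector that returns any candidate is used; later detectors
    serve only as fallbacks (cut points -> local variation -> loud audio).
    """
    candidates: List[Candidate] = []
    for detect in detectors:
        candidates = detect()
        if candidates:
            break
    if len(candidates) > max_splits:
        # Too many candidates: keep the max_splits highest-scoring ones.
        candidates = sorted(candidates, key=lambda c: c[1], reverse=True)[:max_splits]
    # Split positions are returned in temporal order.
    return sorted(frame for frame, _ in candidates)


if __name__ == "__main__":
    detectors = [
        lambda: [],                                     # cut points: none found
        lambda: [(120, 0.4), (300, 0.9), (450, 0.6)],   # local variations
        lambda: [(50, 1.0)],                            # loud audio: not consulted
    ]
    print(select_split_points(detectors, max_splits=2))  # [300, 450]
```

Only the first detector that yields any candidate is used; the remaining ones act purely as fallbacks, which mirrors the order the claim prescribes.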

The invention according to claim 2 is a moving image dividing method for a multimedia search index assigning apparatus that creates divided moving images by dividing a moving image into scenes according to its content and assigns a search index to each divided moving image in response to a request from a client terminal, wherein the multimedia search index assigning apparatus, which has storage means that stores first and second thresholds for moving image division and a division upper limit number, which is the maximum number of scene divisions, executes:
an original moving image receiving step of receiving a moving image file that is the original moving image selected at the client terminal;
a cut point detection step of calculating, for the moving image file, a first numerical value relating to the similarity between every pair of temporally adjacent frames of the moving image file, and detecting candidate points for scene division based on a comparison of each first numerical value with the first threshold stored in the storage means;
a local variation point detection step of, when no candidate point is detected in the cut point detection step, calculating, in each of a plurality of blocks into which the frames are divided, a second numerical value relating to the similarity and the luminance difference between every pair of temporally adjacent frames, and detecting candidate points for scene division based on a comparison of each second numerical value with the second threshold stored in the storage means;
a high-level sound detection step of, when no candidate point is detected in the local variation point detection step, calculating the average audio level of all audio data of the moving image file, and detecting candidate points for scene division based on a comparison of the average value with the audio level of the audio data contained in the moving image file;
a divided file number adjusting step of dividing the moving image file at the candidate points for scene division detected in the cut point detection step, the local variation point detection step or the high-level sound detection step when their number is equal to or less than the division upper limit number stored in the storage means, and, when their number exceeds the division upper limit number, selecting a number of candidate points equal to the division upper limit number based on the magnitude of the first numerical value, the second numerical value or the audio level calculated at each candidate point, and dividing the moving image file at the selected candidate points; and
a display step of transmitting each of the divided moving images produced in the divided file number adjusting step to the client terminal for display.

The invention according to claim 3 is a moving image dividing apparatus for dividing a moving image file into scenes according to the content of the moving image, comprising:
storage means for storing first and second thresholds for moving image division and a division upper limit number, which is the maximum number of scene divisions;
cut point detection means for calculating, for the input moving image file that is the original moving image before division, a first numerical value relating to the similarity between every pair of temporally adjacent frames of the moving image file, and detecting candidate points for scene division based on a comparison of each first numerical value with the first threshold stored in the storage means;
local variation point detection means for, when the cut point detection means detects no candidate point, calculating, in each of a plurality of blocks into which the frames are divided, a second numerical value relating to the similarity and the luminance difference between every pair of temporally adjacent frames, and detecting candidate points for scene division based on a comparison of each second numerical value with the second threshold stored in the storage means;
high-level sound detection means for, when the local variation point detection means detects no candidate point, calculating the average audio level of all audio data of the moving image file, and detecting candidate points for scene division based on a comparison of the average value with the audio level of the audio data contained in the moving image file; and
divided file number adjusting means for dividing the moving image file at the candidate points for scene division detected by the cut point detection means, the local variation point detection means or the high-level sound detection means when their number is equal to or less than the division upper limit number stored in the storage means, and, when their number exceeds the division upper limit number, selecting a number of candidate points equal to the division upper limit number based on the magnitude of the first numerical value, the second numerical value or the audio level calculated at each candidate point, and dividing the moving image file at the selected candidate points.

The invention according to claim 4 is a multimedia search index assigning apparatus that creates divided moving images by dividing a moving image into scenes according to its content and assigns a search index to each divided moving image in response to a request from a client terminal, comprising:
storage means for storing first and second thresholds for moving image division and a division upper limit number, which is the maximum number of scene divisions;
original moving image receiving means for receiving a moving image file that is the original moving image selected at the client terminal;
cut point detection means for calculating, for the moving image file, a first numerical value relating to the similarity between every pair of temporally adjacent frames of the moving image file, and detecting candidate points for scene division based on a comparison of each first numerical value with the first threshold stored in the storage means;
local variation point detection means for, when the cut point detection means detects no candidate point, calculating, in each of a plurality of blocks into which the frames are divided, a second numerical value relating to the similarity and the luminance difference between every pair of temporally adjacent frames, and detecting candidate points for scene division based on a comparison of each second numerical value with the second threshold stored in the storage means;
high-level sound detection means for, when the local variation point detection means detects no candidate point, calculating the average audio level of all audio data of the moving image file, and detecting candidate points for scene division based on a comparison of the average value with the audio level of the audio data contained in the moving image file;
divided file number adjusting means for dividing the moving image file at the candidate points for scene division detected by the cut point detection means, the local variation point detection means or the high-level sound detection means when their number is equal to or less than the division upper limit number stored in the storage means, and, when their number exceeds the division upper limit number, selecting a number of candidate points equal to the division upper limit number based on the magnitude of the first numerical value, the second numerical value or the audio level calculated at each candidate point, and dividing the moving image file at the selected candidate points; and
display means for transmitting each of the divided moving images produced by the divided file number adjusting means to the client terminal for display.

The invention according to claim 5 is a moving image dividing program for dividing a moving image file into scenes according to the content of the moving image, which causes a computer having storage means that stores first and second thresholds for moving image division and a division upper limit number, which is the maximum number of scene divisions, to execute:
a cut point detection step of calculating, for the input moving image file that is the original moving image before division, a first numerical value relating to the similarity between every pair of temporally adjacent frames of the moving image file, and detecting candidate points for scene division based on a comparison of each first numerical value with the first threshold stored in the storage means;
a local variation point detection step of, when no candidate point is detected in the cut point detection step, calculating, in each of a plurality of blocks into which the frames are divided, a second numerical value relating to the similarity and the luminance difference between every pair of temporally adjacent frames, and detecting candidate points for scene division based on a comparison of each second numerical value with the second threshold stored in the storage means;
a high-level sound detection step of, when no candidate point is detected in the local variation point detection step, calculating the average audio level of all audio data of the moving image file, and detecting candidate points for scene division based on a comparison of the average value with the audio level of the audio data contained in the moving image file; and
a divided file number adjusting step of dividing the moving image file at the candidate points for scene division detected in the cut point detection step, the local variation point detection step or the high-level sound detection step when their number is equal to or less than the division upper limit number stored in the storage means, and, when their number exceeds the division upper limit number, selecting a number of candidate points equal to the division upper limit number based on the magnitude of the first numerical value, the second numerical value or the audio level calculated at each candidate point, and dividing the moving image file at the selected candidate points.

The invention according to claim 6 is a moving image dividing program for a multimedia search index assigning apparatus that creates divided moving images by dividing a moving image into scenes according to its content and assigns a search index to each divided moving image in response to a request from a client terminal, which causes the multimedia search index assigning apparatus, having storage means that stores first and second thresholds for moving image division and a division upper limit number, which is the maximum number of scene divisions, to execute:
an original moving image receiving step of receiving a moving image file that is the original moving image selected at the client terminal;
a cut point detection step of calculating, for the moving image file, a first numerical value relating to the similarity between every pair of temporally adjacent frames of the moving image file, and detecting candidate points for scene division based on a comparison of each first numerical value with the first threshold stored in the storage means;
a local variation point detection step of, when no candidate point is detected in the cut point detection step, calculating, in each of a plurality of blocks into which the frames are divided, a second numerical value relating to the similarity and the luminance difference between every pair of temporally adjacent frames, and detecting candidate points for scene division based on a comparison of each second numerical value with the second threshold stored in the storage means;
a high-level sound detection step of, when no candidate point is detected in the local variation point detection step, calculating the average audio level of all audio data of the moving image file, and detecting candidate points for scene division based on a comparison of the average value with the audio level of the audio data contained in the moving image file;
a divided file number adjusting step of dividing the moving image file at the candidate points for scene division detected in the cut point detection step, the local variation point detection step or the high-level sound detection step when their number is equal to or less than the division upper limit number stored in the storage means, and, when their number exceeds the division upper limit number, selecting a number of candidate points equal to the division upper limit number based on the magnitude of the first numerical value, the second numerical value or the audio level calculated at each candidate point, and dividing the moving image file at the selected candidate points; and
a display step of transmitting each of the divided moving images produced in the divided file number adjusting step to the client terminal for display.

According to the present invention, a moving image file can be divided into scenes according to its content, and moving image editing can be made more efficient.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a schematic configuration diagram of a moving image dividing apparatus 100 according to an embodiment of the present invention. The moving image dividing apparatus 100 shown in FIG. 1 includes a cut point recognition unit 101, a local variation location recognition unit 102, a high-level sound recognition unit 103, a divided file number adjustment unit 104, an original moving image storage unit 111, a parameter information storage unit 112 and a divided moving image storage unit 113, and divides an input moving image file 10 appropriately, according to its content, into a predetermined number of divisions.

The moving image dividing apparatus 100 is an electronic device having a central processing unit (CPU) with at least arithmetic and control functions, and a main storage device (memory), such as RAM, that stores programs and data.

The programs that execute the various processes of this embodiment are stored in the main storage device described above or on a hard disk.

Of these components, the cut point recognition unit 101, the local variation location recognition unit 102, the high-level sound recognition unit 103 and the divided file number adjustment unit 104 are concrete representations of the arithmetic and control functions performed by the CPU, while the original moving image storage unit 111, the parameter information storage unit 112 and the divided moving image storage unit 113 are provided by the main storage device.

The original moving image storage unit 111 stores the moving image file 10 input to the moving image dividing apparatus 100 (the file to be divided). The parameter information storage unit 112 stores the parameter values needed to divide the moving image file 10; specifically, it stores, for example, the cut point recognition threshold P1, the local variation location recognition threshold P2, the average audio level P3 and the division upper limit number P4 used in the recognition processes of the units described below.

The cut point recognition unit 101 compares the similarity between adjacent frames among the frames constituting the moving image file 10 and calculates their difference (quantified by a predetermined method). When the difference is greater than the cut point recognition threshold P1, it recognizes that position as a cut point and stores it as a candidate point for scene division.
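The patent leaves the frame similarity measure unspecified ("quantified by a predetermined method"). The sketch below assumes one common choice, the distance between grey-level histograms of consecutive frames, with frames given as lists of pixel rows. `detect_cut_points` and its helpers are hypothetical names, and P1 is passed in directly rather than read from the parameter information storage unit 112.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # greyscale frame: rows of 0-255 pixel values


def grey_histogram(frame: Frame, bins: int = 64) -> List[float]:
    """Normalised grey-level histogram of one frame."""
    counts = [0] * bins
    for row in frame:
        for v in row:
            counts[v * bins // 256] += 1
    total = sum(counts)
    return [c / total for c in counts]


def frame_difference(a: Frame, b: Frame) -> float:
    """Half the L1 distance between histograms: 0 = identical, 1 = disjoint."""
    ha, hb = grey_histogram(a), grey_histogram(b)
    return 0.5 * sum(abs(x - y) for x, y in zip(ha, hb))


def detect_cut_points(frames: Sequence[Frame], p1: float) -> List[Tuple[int, float]]:
    """(frame index, difference) for every frame that differs from its predecessor by more than P1."""
    candidates = []
    for i in range(1, len(frames)):
        d = frame_difference(frames[i - 1], frames[i])
        if d > p1:
            candidates.append((i, d))
    return candidates


if __name__ == "__main__":
    dark = [[20] * 64 for _ in range(48)]
    bright = [[200] * 64 for _ in range(48)]
    print(detect_cut_points([dark, dark, dark, bright, bright], p1=0.5))  # [(3, 1.0)]
```

Returning the difference alongside the index keeps the "first numerical value" available for the later step that trims the candidates to the division upper limit number.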

The function of the cut point recognition unit 101 is now described concretely with reference to FIG. 2(a). In FIG. 2(a), the scene changes at points A and B, and the cut point recognition unit 101 detects points A and B through the frame similarity comparison described above. The cut point recognition unit 101 is thus effective for moving images that contain multiple scenes, which are very common among moving image files.

The local variation location recognition unit 102 divides each frame of the moving image file into a plurality of logical blocks. For each block, it compares the similarity and the luminance values of adjacent frames and quantifies the difference by a predetermined method. When the difference exceeds the local variation location recognition threshold P2, it stores that position as a candidate point for scene division.

The function of the local variation location recognition unit 102 is illustrated in FIGS. 2(b) and 3. FIG. 3 shows one image frame divided into nine blocks. If the difference in similarity and luminance exceeds the local variation location recognition threshold P2 in at least one of these blocks, the position is detected as a candidate point for scene division. In FIG. 2(b), part of the image changes locally at points C and D, immediately before and after the frames on which a caption (telop) is displayed. The local variation location recognition unit 102 therefore detects points C and D by comparing the per-block inter-frame differences in similarity and luminance. Local variation location recognition is effective for videos such as news programs, in which the scene does not switch but the topic changes, for example when a caption is displayed.
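A sketch of the block-wise check follows. It assumes the 3 x 3 grid of FIG. 3 and combines a per-block histogram difference with the change in mean luminance into one value. The equal weighting is an assumption, because the patent does not specify one. A frame becomes a candidate when any block exceeds P2, and the largest block value is kept as the candidate's value. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # 2-D grid of 8-bit luminance values (0-255)


def split_blocks(frame: Frame, rows: int = 3, cols: int = 3) -> List[List[int]]:
    """Pixel values of each cell of a rows x cols grid, in row-major order."""
    h, w = len(frame), len(frame[0])
    blocks = []
    for r in range(rows):
        for c in range(cols):
            ys = range(r * h // rows, (r + 1) * h // rows)
            xs = range(c * w // cols, (c + 1) * w // cols)
            blocks.append([frame[y][x] for y in ys for x in xs])
    return blocks


def block_difference(a: List[int], b: List[int], bins: int = 16) -> float:
    """Value in [0, 1]: mean of the histogram dissimilarity and the
    normalized change in mean luminance (equal weighting is an assumption)."""
    ha, hb = [0.0] * bins, [0.0] * bins
    for v in a:
        ha[v * bins // 256] += 1 / len(a)
    for v in b:
        hb[v * bins // 256] += 1 / len(b)
    hist_diff = sum(abs(x - y) for x, y in zip(ha, hb)) / 2
    luma_diff = abs(sum(a) / len(a) - sum(b) / len(b)) / 255
    return (hist_diff + luma_diff) / 2


def detect_local_variation(frames: Sequence[Frame], p2: float) -> List[Tuple[int, float]]:
    """A frame is a candidate if at least one block differs from the same block
    of the previous frame by more than P2; the largest block value is kept."""
    candidates = []
    for i in range(1, len(frames)):
        pairs = zip(split_blocks(frames[i - 1]), split_blocks(frames[i]))
        value = max(block_difference(a, b) for a, b in pairs)
        if value > p2:
            candidates.append((i, value))
    return candidates


plain = [[50] * 6 for _ in range(6)]
captioned = [row[:] for row in plain]
for y in range(4, 6):
    for x in range(4, 6):
        captioned[y][x] = 250  # bright "caption" in the bottom-right block only
hits = detect_local_variation([plain, captioned, captioned, plain], p2=0.5)
print([i for i, _ in hits])  # [1, 3]
```

Only one of the nine blocks needs to change. A small caption region that barely moves a whole-frame histogram can therefore still exceed P2, and it is detected both when the caption appears and when it disappears, as with points C and D.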

The high-level sound recognition unit 103 treats positions where the audio level of the moving image file is relatively high (judged by quantifying the audio level) as places where a comparatively loud voice or sound occurs. It stores those positions as candidate points for scene division. More specifically, it calculates the average audio level P3 over all the audio data of the moving image file 10 and stores every position where the audio level exceeds P3 as a candidate point for scene division.

The function of the high-level sound recognition unit 103 is illustrated in FIGS. 2(c) and 4. In FIG. 2(c), a shout is uttered at point E and the audio level is relatively high there, so the high-level sound recognition unit 103 detects point E. This function is therefore effective for videos, such as martial arts demonstration videos, that move on to the next content through sound even when the scene changes little. FIG. 4 plots the audio level of audio data as a waveform. In this example, if the horizontal portion (the silent level) is taken as the average audio level, points F1, ..., F12 are detected as candidate points.
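As a minimal sketch, the following computes P3 and the high-level candidates. It assumes mono PCM samples given as floats and uses the RMS of fixed, non-overlapping windows as the "audio level". The window size and function names are hypothetical, and the patent does not say how the level is quantified.

```python
import math
from typing import List, Sequence, Tuple


def window_levels(samples: Sequence[float], window: int) -> List[float]:
    """RMS level of each full, non-overlapping window (a trailing partial window is ignored)."""
    levels = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        levels.append(math.sqrt(sum(s * s for s in chunk) / window))
    return levels


def detect_loud_points(samples: Sequence[float], window: int = 4) -> Tuple[float, List[Tuple[int, float]]]:
    """Return (P3, candidates): P3 is the average level over the whole file, and each
    candidate is (start sample, level) for a window whose level exceeds P3."""
    levels = window_levels(samples, window)
    if not levels:
        return 0.0, []
    p3 = sum(levels) / len(levels)
    return p3, [(i * window, lvl) for i, lvl in enumerate(levels) if lvl > p3]


quiet = [0.1, -0.1] * 2   # one window at level 0.1
shout = [0.9, -0.9] * 2   # one window at level 0.9
p3, points = detect_loud_points(quiet + quiet + shout + quiet)
print(round(p3, 3), [pos for pos, _ in points])  # 0.3 [8]
```

Here every loud window becomes its own candidate, as in the literal description. Merging runs of adjacent loud windows into a single point would be a further design choice.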

The divided file number adjustment unit 104 divides the moving image file 10 at the scene division candidate points produced by the cut point recognition unit 101, the local variation location recognition unit 102, or the high-level sound recognition unit 103, whichever performed the processing. It stores the resulting divided moving images 20a, ..., 20n in the divided moving image storage unit 113.
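Dividing at candidate points turns N points into N + 1 contiguous pieces. The helper below is hypothetical and assumes that each candidate point is the index of the first frame of a new scene.

```python
from typing import Iterable, List, Tuple


def segments_from_points(num_frames: int, points: Iterable[int]) -> List[Tuple[int, int]]:
    """Turn division points (indices of the first frame of each new scene) into
    half-open frame ranges [start, end) that together cover the whole video."""
    cuts = sorted({p for p in points if 0 < p < num_frames})  # drop duplicates and out-of-range points
    bounds = [0] + cuts + [num_frames]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


print(segments_from_points(100, [70, 30]))  # [(0, 30), (30, 70), (70, 100)]
print(segments_from_points(100, []))        # [(0, 100)]
```

With no candidate points, the helper returns a single range covering the whole file. This corresponds to storing an undivided copy of the moving image file 10.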

Suppose the number of scene division candidate points produced by the cut point recognition unit 101, the local variation location recognition unit 102, or the high-level sound recognition unit 103 exceeds the division upper limit number P4 stored in the parameter information storage unit 112. In that case, the divided file number adjustment unit 104 selects P4 candidate points according to a predetermined rule and divides scenes only at the selected points. One such rule, shown in FIG. 5, selects the P4 candidate points with the highest scores (in FIG. 5, P4 is 3). The score of each candidate point is, for example, the numerical value assigned to it during processing by the cut point recognition unit 101, the local variation location recognition unit 102, or the high-level sound recognition unit 103.
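The FIG. 5 rule (keep the P4 highest-scoring candidates) can be sketched as below. The sketch assumes candidates are (position, score) pairs in which a higher score means a stronger boundary. Re-sorting the survivors by position is our addition, so that the resulting pieces come out in playback order. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[int, float]  # (position, score); a higher score is a stronger boundary


def limit_candidates(candidates: Sequence[Candidate], p4: int) -> List[Candidate]:
    """Keep at most P4 candidates, preferring higher scores, returned in position order."""
    if len(candidates) <= p4:
        return sorted(candidates)
    strongest = sorted(candidates, key=lambda c: c[1], reverse=True)[:p4]
    return sorted(strongest)


cands = [(10, 0.4), (25, 0.9), (40, 0.6), (55, 0.8), (70, 0.2)]
print(limit_candidates(cands, p4=3))  # [(25, 0.9), (40, 0.6), (55, 0.8)]
```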

Next, the operation of the moving image dividing apparatus 100 according to this embodiment is described with reference to FIGS. 6 to 8. FIG. 6 is a flowchart of its operation when scenes are divided by the cut point recognition unit 101. FIG. 7 shows the operation when scenes are divided by the local variation location recognition unit 102, and FIG. 8 shows the operation when they are divided by the high-level sound recognition unit 103.

First, the operation of the cut point recognition unit 101 is described. The cut point recognition unit 101 of the moving image dividing apparatus 100 reads the image data of the moving image file 10 to be divided from the original moving image storage unit 111 one frame at a time (step S10). It calculates a numerical value for the frame's similarity to the previous frame (step S20). It then compares this value with the cut point recognition threshold P1 obtained from the parameter information storage unit 112 (step S30). If the difference exceeds P1, it saves the difference information (the scene division candidate point and its value) to a temporary file (step S40). In this embodiment, the value decreases as the similarity increases. This processing is repeated for all the image data of the moving image file 10 (step S50): reading a frame (step S10), calculating the similarity (step S20), comparing with P1 (step S30), and saving the difference information when the difference exceeds P1 (step S40).

Next, the divided file number adjustment unit 104 acts once all the image data of the moving image file 10 has been read and at least one item of difference information exists (step S60). It checks whether the number of items exceeds the division upper limit number P4 obtained from the parameter information storage unit 112 (step S70). If it does, the unit selects P4 items of difference information in descending order of value (step S80). It then divides the scenes based on the selected items, creates the divided files 20a, ..., 20n, and stores them in the divided moving image storage unit 113 (step S90). If it does not, the unit divides the scenes based on all the difference information, creates the divided files 20a, ..., 20n, and stores them in the divided moving image storage unit 113 (step S90).

If, on the other hand, there is no difference information, the apparatus determines that cut point recognition has found no scene division point and proceeds to the operation of the local variation location recognition unit 102 (step S60).

The local variation location recognition unit 102 reads the image data of the moving image file 10 one frame at a time (step S100). For each block, it calculates a numerical value for the similarity to the previous frame and the difference in luminance (step S110). It then compares this value with the local variation location recognition threshold P2 obtained from the parameter information storage unit 112. If the difference exceeds P2 (step S120), it saves the difference information (the scene division candidate point and its value) to a temporary file (step S130). In this embodiment, the value increases as the similarity decreases and as the luminance difference increases. This processing is repeated for all the image data of the moving image file 10 (step S140): reading a frame (step S100), calculating the per-block similarity and luminance difference (step S110), comparing with P2 (step S120), and saving the difference information when the difference exceeds P2 (step S130).

Next, the divided file number adjustment unit 104 acts once all the image data of the moving image file 10 has been read and at least one item of difference information exists (step S150). It checks whether the number of items exceeds the division upper limit number P4 obtained from the parameter information storage unit 112 (step S160). If it does, the unit selects P4 items of difference information in descending order of value (step S170). It then divides the scenes based on the selected items, creates the divided files 20a, ..., 20n, and stores them in the divided moving image storage unit 113 (step S180). If it does not, the unit divides the scenes based on all the difference information, creates the divided files 20a, ..., 20n, and stores them in the divided moving image storage unit 113 (step S180).

If, on the other hand, there is no difference information, the apparatus determines that local variation location recognition has found no scene division point and proceeds to the operation of the high-level sound recognition unit 103 (step S150).

The high-level sound recognition unit 103 reads all the audio data of the moving image file 10 and calculates the average audio level P3 (step S190). It then reads the audio data of the moving image file 10 one item at a time (step S200) and compares each item with P3 (step S210). If the audio level of the item exceeds P3, it saves the difference information (the scene division candidate point and its value) to a temporary file (step S220). In this embodiment, the value increases as the audio level increases. This processing is repeated for all the audio data of the moving image file 10 (step S230): reading an item (step S200), comparing it with P3 (step S210), and saving the difference information when the level exceeds P3 (step S220).

Next, the divided file number adjustment unit 104 acts once all the audio data of the moving image file 10 has been processed and at least one item of difference information exists (step S240). It checks whether the number of items exceeds the division upper limit number P4 obtained from the parameter information storage unit 112 (step S250). If it does, the unit selects P4 items of difference information in descending order of value (step S260). It then divides the scenes based on the selected items, creates the divided files 20a, ..., 20n, and stores them in the divided moving image storage unit 113 (step S270). If it does not, the unit divides the scenes based on all the difference information, creates the divided files 20a, ..., 20n, and stores them in the divided moving image storage unit 113 (step S270).

If, on the other hand, there is no difference information, the apparatus determines that high-level sound recognition has also found no scene division point, meaning the file has no scene division point at all. It therefore copies the moving image file 10 and stores the copy in the divided moving image storage unit 113 (step S280).
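The three-stage fallback of FIGS. 6 to 8 can be summarized in a short driver. This is a sketch that assumes each recognition stage is a zero-argument callable returning (position, value) pairs. The stages run lazily in priority order, so a later stage is evaluated only when every earlier stage found nothing. An empty result means the file is copied undivided, as in step S280. The names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[int, float]               # (position, value)
Detector = Callable[[], List[Candidate]]    # runs one recognition stage


def choose_split_points(stages: Sequence[Tuple[str, Detector]], p4: int) -> Tuple[str, List[int]]:
    """Run stages in priority order. The first stage that yields any candidate decides
    the division points (at most P4, highest values first); later stages are not run.
    If no stage yields a candidate, no division points are returned (copy the file)."""
    for name, detect in stages:
        candidates = detect()
        if candidates:
            strongest = sorted(candidates, key=lambda c: c[1], reverse=True)[:p4]
            return name, sorted(pos for pos, _ in strongest)
    return "none", []


def never_called() -> List[Candidate]:
    raise AssertionError("lower-priority stage should not run")


stages = [
    ("cut point", lambda: []),                               # no cut found
    ("local variation", lambda: [(120, 0.7), (300, 0.9)]),
    ("high-level sound", never_called),
]
print(choose_split_points(stages, p4=1))  # ('local variation', [300])
```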

The priority order of cut point recognition, local variation location recognition, and high-level sound recognition described above is based mainly on the following empirical reasoning.

A scene change (that is, a change of camera angle) in a video strongly suggests that what is being filmed, and hence what the video is meant to convey, has changed. This tendency is particularly strong in videos shot with home video cameras, so cut point recognition is given first priority. Next, even when the scene has not changed (the camera angle is the same), a partial change within the picture also suggests that the content being conveyed has changed, so local variation location recognition is given second priority. Finally, changes in the content being conveyed are considered to depend more on changes in the picture than on changes in the sound, so high-level sound recognition is given third priority.

The moving image dividing apparatus 100 of this embodiment therefore divides scenes in three stages. It first divides scenes by cut point recognition. If no cut point is recognized, it divides scenes by local variation location recognition. If no local variation location is recognized either, it divides scenes by high-level sound recognition. Dividing the moving image file in this way makes scene division suited to the file's content possible. In addition, setting the division upper limit number allows the number of divided files to be controlled, which makes video editing more efficient.

Next, a case in which the moving image dividing apparatus 100 of this embodiment is applied to a moving image search index system is described. FIG. 9 is a schematic configuration diagram of a moving image search index assigning system 1 in which a moving image editing server 11 provides the functions of the moving image dividing apparatus 100 described above.

The moving image search index assigning system 1 shown in FIG. 9 assigns a search index to each divided moving image making up a moving image, in response to requests from a client terminal 3 connected via the Internet 5. The system consists of three servers:
- a moving image editing server 11, which divides an original moving image into a plurality of divided moving images (using the moving image dividing method described above), generates and manages user information, search indexes, and divided moving images, and performs user authentication;
- an application server 13, which generates text sentences and relates content items to one another;
- a stream server 15, which distributes pages such as the thumbnail display page for the divided moving images.

A general WWW server 7 having a home page 71 is also connected to the system 1 via the Internet 5.

An outline of the moving image search index assigning method of this embodiment is described with reference to FIG. 10. As shown in FIG. 10, the user first selects a text containing the character strings to be assigned as search indexes to an original moving image held on the client terminal, for example from the Internet. The text is displayed by a WWW browser on the display of the client terminal 3. In FIG. 10, the text 「ラッコは北海道北部から千島……半周したほどでした。」 (a passage about sea otters that begins "Sea otters, from northern Hokkaido to the Kurils ...") is shown as an example under the item "search index (text sentence)".

The original moving image is then transferred to the moving image search index assigning system 1, which divides it into scenes as divided moving images 1, 2, 3, ..., N. The client terminal 3 receives these divided moving images and displays them on its display with the WWW browser.

FIG. 11 shows the flow of this moving image division. As shown there, the moving image editing server 11 applies the processing of FIGS. 6 to 8 to the original moving image uploaded from the client terminal 3. The divided moving images are then displayed on the browser screen 300 of the client terminal 3.

FIG. 12 shows the browser screen 300 in more detail. The thumbnail function of the stream server 15 displays the divided moving images 1, 2, 3, ..., N in a frame 305, as indicated by reference numeral 310. These thumbnails are not still images. Each one is displayed as a moving clip in which the images actually play.

The user of the client terminal 3 views the displayed divided moving images 1, 2, 3, ..., N. Suppose divided moving image 1 shows a sea otter. The user then selects, from the text displayed on the client terminal 3, the character string best suited as a search index for divided moving image 1. For example, as in FIG. 10, the user highlights the string 「ラッコ」 ("sea otter") with the mouse.

Once the search index string 「ラッコ」 has been selected and highlighted, the user presses the mouse button on divided moving image 1 (the clip showing the sea otter) and drags it to where 「ラッコ」 is displayed. The user then releases the button there. This assigns the search index string 「ラッコ」 to divided moving image 1 and associates the two.

The search index string 「ラッコ」 and the divided moving image associated in this way are then registered in a moving image index database. What is registered is the correspondence between the character string and the divided moving image's name, which serves as the search index for moving images.
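The class below is a toy stand-in for the moving image index database. It assumes an in-memory mapping from search-index strings to divided moving image names. The class and file names are hypothetical, and the patent does not describe the database schema.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class VideoSearchIndex:
    """Hypothetical in-memory stand-in for the moving image index database:
    maps each search-index string to the names of the divided moving images."""

    def __init__(self) -> None:
        self._index: DefaultDict[str, Set[str]] = defaultdict(set)

    def associate(self, term: str, video_name: str) -> None:
        # Corresponds to dropping a divided moving image onto a selected string.
        self._index[term].add(video_name)

    def search(self, term: str) -> List[str]:
        # .get avoids creating an empty entry for unknown terms.
        return sorted(self._index.get(term, set()))


index = VideoSearchIndex()
index.associate("ラッコ", "otter_clip_01.mp4")   # hypothetical file name
print(index.search("ラッコ"))    # ['otter_clip_01.mp4']
print(index.search("イルカ"))    # []
```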

The moving image search index assigning system 1 therefore achieves the same effects as the moving image dividing apparatus 100 described above. In addition, each divided moving image can be associated with a search index through simple mouse operations on the client terminal 3. A character string can thus be associated with each divided moving image as its search index. The correspondence between character strings and divided moving image names can then be generated efficiently and accurately as the search index for moving images.

Moreover, during this association the divided moving images are displayed as thumbnails while the text is displayed at the same time in a different frame on the same screen. The user can therefore browse both the text and the divided moving images and link strings in the text to divided moving images with simple mouse operations, in a visually clear and efficient way.

Furthermore, different content items can be linked to one another using a moving image as the key. The association between divided moving images and character strings can be made from anywhere via the Internet 5 from the client terminal 3. The user needs only the client terminal 3 and no special equipment. Moving images can also easily be incorporated into existing content.

Although embodiments of the present invention have been described above, various modifications and changes can be made to them without departing from the spirit of the invention. For example, the moving image dividing apparatus 100 of this embodiment can be used as a stand-alone apparatus or as part of the moving image search index assigning system 1 described above. It can also be applied to various other computer systems that involve moving image division.

FIG. 1 is a schematic configuration diagram of the moving image dividing apparatus according to an embodiment of the present invention.
FIG. 2 illustrates the divisions produced by the moving image dividing apparatus according to the embodiment.
FIG. 3 shows an example of the blocks used in local variation location recognition by the moving image dividing apparatus according to the embodiment.
FIG. 4 shows an example of high-level sound recognition by the moving image dividing apparatus according to the embodiment.
FIG. 5 illustrates the method of adjusting the number of scene divisions in the moving image dividing apparatus according to the embodiment.
FIG. 6 is a flowchart explaining the operation of the moving image dividing apparatus according to the embodiment.
FIG. 7 is a flowchart explaining the operation of the moving image dividing apparatus according to the embodiment.
FIG. 8 is a flowchart explaining the operation of the moving image dividing apparatus according to the embodiment.
FIG. 9 is a schematic configuration diagram of a moving image search index assigning system to which the moving image dividing apparatus according to the embodiment is applied.
FIG. 10 illustrates an outline of the moving image search index assigning method.
FIG. 11 illustrates an outline of the moving image division processing in the moving image search index assigning system.
FIG. 12 shows a browser screen on the client terminal displaying a plurality of divided moving images.

Explanation of symbols

1 Moving image search index assigning system
3 Client terminal
5 Internet
7 General WWW server
10 Moving image file
11 Moving image editing server
13 Application server
15 Stream server
20a to 20n Divided moving image files
71 Home page
100 Moving image dividing apparatus
101 Cut point recognition unit
102 Local variation location recognition unit
103 High-level sound recognition unit
104 Divided file number adjustment unit
111 Original moving image storage unit
112 Parameter information storage unit
113 Divided moving image storage unit

Claims

A video splitting method that splits a video file into scenes according to the content of the video,
A computer comprising storage means for storing first and second threshold values relating to video segmentation and a segment upper limit number that is the maximum number of scene segments,
A cut point detection step of calculating a first numerical value relating to the similarity of each pair of temporally adjacent frames of the moving image file that is the original moving image before division, and detecting candidate points for scene division on the basis of a comparison between each of the first numerical values and the first threshold value stored in the storage means;
A local variation location detection step of, when no candidate point is detected in the cut point detection step, calculating, in each of the blocks into which a frame is divided, a second numerical value relating to the similarity and luminance difference of each pair of temporally adjacent frames, and detecting candidate points for scene division on the basis of a comparison between each of the second numerical values and the second threshold value stored in the storage means;
A high-level sound detection step of, when no candidate point is detected in the local variation location detection step, calculating the average of the audio levels of all audio data of the moving image file, and detecting candidate points for scene division on the basis of a comparison between that average and the audio levels of the audio data contained in the moving image file;
A divided file number adjusting step of, when the number of candidate points for scene division detected in the cut point detection step, the local variation location detection step, or the high-level sound detection step is equal to or less than the division upper limit number stored in the storage means, dividing the moving image file at those candidate points, and, when that number exceeds the division upper limit number, selecting as many candidate points as the division upper limit number on the basis of the magnitude of the first numerical value, the second numerical value, or the audio level calculated at each candidate point, and dividing the moving image file at the selected candidate points;
A moving image dividing method characterized by executing the above steps.
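The cut point detection step recited above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes grayscale frames given as lists of 0-255 luminance rows, takes the first numerical value to be a histogram-intersection similarity (an assumed measure; the claim says only that the value relates to inter-frame similarity), and treats a pair whose similarity falls below the first threshold as a cut. The names `histogram`, `frame_similarity`, and `detect_cut_points` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed frame format: rows of 0-255 luminance values.
Frame = Sequence[Sequence[int]]


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized luminance histogram of one frame."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for v in row:
            counts[min(v * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


def frame_similarity(a: Frame, b: Frame) -> float:
    """Histogram intersection in [0, 1]; 1.0 means identical histograms."""
    ha, hb = histogram(a), histogram(b)
    return sum(min(x, y) for x, y in zip(ha, hb))


def detect_cut_points(frames: Sequence[Frame],
                      first_threshold: float) -> List[Tuple[int, float]]:
    """Return (index, similarity) for every adjacent pair whose similarity
    is below the first threshold; index is the first frame of the new scene."""
    candidates = []
    for i in range(1, len(frames)):
        s = frame_similarity(frames[i - 1], frames[i])
        if s < first_threshold:
            candidates.append((i, s))
    return candidates
```

For example, with two all-black 4x4 frames followed by two all-white ones, `detect_cut_points(frames, 0.5)` returns a single candidate, `(2, 0.0)`, at the frame where the picture changes.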

A moving image dividing method in a multimedia search index assigning apparatus that creates divided moving images by dividing a moving image into scenes according to its content and, in response to a request from a client terminal, assigns a search index to each divided moving image,
the multimedia search index assigning apparatus comprising storage means for storing first and second threshold values relating to moving image division and a division upper limit number that is the maximum number of scene divisions, the method comprising:
An original moving image reception step of receiving a moving image file that is the original moving image selected on the client terminal;
A cut point detection step of calculating a first numerical value relating to the similarity of each pair of temporally adjacent frames of the moving image file, and detecting candidate points for scene division on the basis of a comparison between each of the first numerical values and the first threshold value stored in the storage means;
A local variation location detection step of, when no candidate point is detected in the cut point detection step, calculating, in each of the blocks into which a frame is divided, a second numerical value relating to the similarity and luminance difference of each pair of temporally adjacent frames, and detecting candidate points for scene division on the basis of a comparison between each of the second numerical values and the second threshold value stored in the storage means;
A high-level sound detection step of, when no candidate point is detected in the local variation location detection step, calculating the average of the audio levels of all audio data of the moving image file, and detecting candidate points for scene division on the basis of a comparison between that average and the audio levels of the audio data contained in the moving image file;
A divided file number adjusting step of, when the number of candidate points for scene division detected in the cut point detection step, the local variation location detection step, or the high-level sound detection step is equal to or less than the division upper limit number stored in the storage means, dividing the moving image file at those candidate points, and, when that number exceeds the division upper limit number, selecting as many candidate points as the division upper limit number on the basis of the magnitude of the first numerical value, the second numerical value, or the audio level calculated at each candidate point, and dividing the moving image file at the selected candidate points;
A display step of transmitting each of the divided moving images produced in the divided file number adjusting step to the client terminal and displaying them there;
A moving image dividing method characterized by the multimedia search index assigning apparatus executing the above steps.
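The local variation location detection step targets changes confined to part of the frame, such as a caption appearing, which barely move a whole-frame measure. The following is a minimal sketch under stated assumptions: each frame is split into a hypothetical `grid` x `grid` partition, and the second numerical value is taken to be the largest per-block change in mean luminance, scaled to [0, 1]. The claim combines block similarity with luminance difference; this sketch uses only the luminance part, and every name in it is illustrative.

```python
from typing import List, Sequence, Tuple

# Assumed frame format: rows of 0-255 luminance values.
Frame = Sequence[Sequence[int]]


def block_means(frame: Frame, grid: int) -> List[float]:
    """Mean luminance of each cell of a grid x grid partition.
    Assumes the frame is at least grid pixels high and wide."""
    h, w = len(frame), len(frame[0])
    means = []
    for by in range(grid):
        for bx in range(grid):
            rows = range(by * h // grid, (by + 1) * h // grid)
            cols = range(bx * w // grid, (bx + 1) * w // grid)
            vals = [frame[y][x] for y in rows for x in cols]
            means.append(sum(vals) / len(vals))
    return means


def local_variation(a: Frame, b: Frame, grid: int = 4) -> float:
    """Assumed second value: largest per-block luminance change in [0, 1]."""
    ma, mb = block_means(a, grid), block_means(b, grid)
    return max(abs(x - y) for x, y in zip(ma, mb)) / 255.0


def detect_local_variations(frames: Sequence[Frame], second_threshold: float,
                            grid: int = 4) -> List[Tuple[int, float]]:
    """Return (index, value) where some block changed more than the threshold."""
    candidates = []
    for i in range(1, len(frames)):
        v = local_variation(frames[i - 1], frames[i], grid)
        if v > second_threshold:
            candidates.append((i, v))
    return candidates
```

For an 8x8 frame of luminance 50 whose top-left 2x2 block jumps to 250, the largest block change is 200/255 (about 0.78), so a second threshold of 0.5 flags that frame even though 60 of its 64 pixels are unchanged. Taking the maximum over blocks, rather than the mean, is what keeps such small regional changes from being averaged away.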

A moving image dividing apparatus that divides a moving image file into scenes according to the content of the moving image, comprising:
Storage means for storing first and second threshold values relating to moving image division and a division upper limit number that is the maximum number of scene divisions;
Cut point detection means for calculating a first numerical value relating to the similarity of each pair of temporally adjacent frames of the moving image file that is the original moving image before division, and detecting candidate points for scene division on the basis of a comparison between each of the first numerical values and the first threshold value stored in the storage means;
Local variation location detection means for, when no candidate point is detected by the cut point detection means, calculating, in each of the blocks into which a frame is divided, a second numerical value relating to the similarity and luminance difference of each pair of temporally adjacent frames, and detecting candidate points for scene division on the basis of a comparison between each of the second numerical values and the second threshold value stored in the storage means;
High-level sound detection means for, when no candidate point is detected by the local variation location detection means, calculating the average of the audio levels of all audio data of the moving image file, and detecting candidate points for scene division on the basis of a comparison between that average and the audio levels of the audio data contained in the moving image file;
Divided file number adjusting means for, when the number of candidate points for scene division detected by the cut point detection means, the local variation location detection means, or the high-level sound detection means is equal to or less than the division upper limit number stored in the storage means, dividing the moving image file at those candidate points, and, when that number exceeds the division upper limit number, selecting as many candidate points as the division upper limit number on the basis of the magnitude of the first numerical value, the second numerical value, or the audio level calculated at each candidate point, and dividing the moving image file at the selected candidate points;
A moving image dividing apparatus characterized by comprising the above means.
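The high-level sound detection means compares audio levels against the file-wide average. The sketch below assumes the audio has already been reduced to one level value per analysis window (for example an RMS value per frame-aligned window; the windowing is an assumption, not stated in the claims) and flags each window where the level rises above `ratio` times the average. `detect_high_level_sound` and `ratio` are hypothetical; `ratio=1.0` corresponds to a plain comparison with the average.

```python
from typing import List, Sequence, Tuple


def detect_high_level_sound(levels: Sequence[float],
                            ratio: float = 1.0) -> List[Tuple[int, float]]:
    """Return (window_index, level) for each window where the audio level
    rises above ratio * (average level over the whole file)."""
    if not levels:
        return []
    average = sum(levels) / len(levels)  # average over all audio data
    limit = ratio * average
    candidates = []
    # Start at 1: a rise at the very first window is not a useful split point.
    for i in range(1, len(levels)):
        if levels[i] > limit >= levels[i - 1]:  # rising edge above the limit
            candidates.append((i, levels[i]))
    return candidates
```

For levels `[1, 1, 1, 9, 9, 1, 1, 8, 1, 1]` the average is 3.3; with `ratio=1.5` the limit is 4.95, and the rising edges at windows 3 and 7 become candidates. Reporting only rising edges keeps a sustained loud passage from producing a candidate at every window.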

A multimedia search index assigning apparatus that creates divided moving images by dividing a moving image into scenes according to its content and, in response to a request from a client terminal, assigns a search index to each divided moving image, comprising:
Storage means for storing first and second threshold values relating to moving image division and a division upper limit number that is the maximum number of scene divisions;
Original moving image reception means for receiving a moving image file that is the original moving image selected on the client terminal;
Cut point detection means for calculating a first numerical value relating to the similarity of each pair of temporally adjacent frames of the moving image file, and detecting candidate points for scene division on the basis of a comparison between each of the first numerical values and the first threshold value stored in the storage means;
Local variation location detection means for, when no candidate point is detected by the cut point detection means, calculating, in each of the blocks into which a frame is divided, a second numerical value relating to the similarity and luminance difference of each pair of temporally adjacent frames, and detecting candidate points for scene division on the basis of a comparison between each of the second numerical values and the second threshold value stored in the storage means;
High-level sound detection means for, when no candidate point is detected by the local variation location detection means, calculating the average of the audio levels of all audio data of the moving image file, and detecting candidate points for scene division on the basis of a comparison between that average and the audio levels of the audio data contained in the moving image file;
Divided file number adjusting means for, when the number of candidate points for scene division detected by the cut point detection means, the local variation location detection means, or the high-level sound detection means is equal to or less than the division upper limit number stored in the storage means, dividing the moving image file at those candidate points, and, when that number exceeds the division upper limit number, selecting as many candidate points as the division upper limit number on the basis of the magnitude of the first numerical value, the second numerical value, or the audio level calculated at each candidate point, and dividing the moving image file at the selected candidate points;
Display means for transmitting each of the divided moving images produced by the divided file number adjusting means to the client terminal and displaying them there;
A multimedia search index assigning apparatus characterized by comprising the above means.
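The divided file number adjusting means can be sketched as a top-N selection followed by cutting the file into ranges. Following the claim wording, the candidate count is compared directly with the division upper limit number, so at most that many split points are kept (and therefore at most one more divided file than the limit). Which direction counts as "stronger" depends on the detector: for a similarity-based first value, lower similarity presumably indicates a sharper cut, hence the hypothetical `higher_is_stronger` flag. Both function names are illustrative.

```python
from typing import List, Sequence, Tuple


def adjust_division_count(candidates: Sequence[Tuple[int, float]],
                          upper_limit: int,
                          higher_is_stronger: bool = True) -> List[int]:
    """Keep every candidate if within the limit; otherwise keep the
    upper_limit strongest. Returned split points are in time order."""
    if len(candidates) <= upper_limit:
        return sorted(i for i, _ in candidates)
    ranked = sorted(candidates, key=lambda c: c[1], reverse=higher_is_stronger)
    return sorted(i for i, _ in ranked[:upper_limit])


def split_ranges(points: Sequence[int], n_frames: int) -> List[Tuple[int, int]]:
    """Turn sorted split points into half-open [start, end) frame ranges."""
    bounds = [0, *points, n_frames]
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]
```

For instance, `adjust_division_count([(10, 0.2), (25, 0.05), (40, 0.3), (70, 0.1)], 2, higher_is_stronger=False)` keeps frames 25 and 70, and `split_ranges([25, 70], 100)` gives `[(0, 25), (25, 70), (70, 100)]`. Re-sorting by index after ranking matters because the selection is made by strength but the file must be cut in time order.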

A moving image dividing program for dividing a moving image file into scenes according to the content of the moving image,
for a computer comprising storage means for storing first and second threshold values relating to moving image division and a division upper limit number that is the maximum number of scene divisions, the program comprising:
A cut point detection step of calculating a first numerical value relating to the similarity of each pair of temporally adjacent frames of the moving image file that is the original moving image before division, and detecting candidate points for scene division on the basis of a comparison between each of the first numerical values and the first threshold value stored in the storage means;
A local variation location detection step of, when no candidate point is detected in the cut point detection step, calculating, in each of the blocks into which a frame is divided, a second numerical value relating to the similarity and luminance difference of each pair of temporally adjacent frames, and detecting candidate points for scene division on the basis of a comparison between each of the second numerical values and the second threshold value stored in the storage means;
A high-level sound detection step of, when no candidate point is detected in the local variation location detection step, calculating the average of the audio levels of all audio data of the moving image file, and detecting candidate points for scene division on the basis of a comparison between that average and the audio levels of the audio data contained in the moving image file;
A divided file number adjusting step of, when the number of candidate points for scene division detected in the cut point detection step, the local variation location detection step, or the high-level sound detection step is equal to or less than the division upper limit number stored in the storage means, dividing the moving image file at those candidate points, and, when that number exceeds the division upper limit number, selecting as many candidate points as the division upper limit number on the basis of the magnitude of the first numerical value, the second numerical value, or the audio level calculated at each candidate point, and dividing the moving image file at the selected candidate points;
A moving image dividing program characterized by causing the computer to execute the above steps.
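The three detection steps form a fallback chain: each later detector runs only when the earlier one found nothing. A minimal sketch, assuming each detector is wrapped as a zero-argument callable that returns `(index, value)` pairs; `find_candidates`, `Candidates`, and `Detector` are hypothetical names. Returning the detector name alongside the candidates is useful because the divided file number adjustment ranks candidates by whichever value produced them (first value, second value, or audio level).

```python
from typing import Callable, List, Sequence, Tuple

# (frame or window index, value that triggered the detection)
Candidates = List[Tuple[int, float]]
Detector = Callable[[], Candidates]


def find_candidates(detectors: Sequence[Tuple[str, Detector]]) -> Tuple[str, Candidates]:
    """Run detectors in order (e.g. cut point, local variation, high-level
    sound) and stop at the first one that returns any candidate."""
    for name, detect in detectors:
        found = detect()
        if found:
            return name, found  # later detectors are never invoked
    # Nothing found by any detector: no split points (assumed behavior).
    return "none", []
```

Wrapping each detector in a callable, rather than passing precomputed results, keeps the costlier later stages (block-wise analysis, audio decoding) from running at all once an earlier stage has produced candidates.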

A moving image dividing program for a multimedia search index assigning apparatus that creates divided moving images by dividing a moving image into scenes according to its content and, in response to a request from a client terminal, assigns a search index to each divided moving image,
for the multimedia search index assigning apparatus comprising storage means for storing first and second threshold values relating to moving image division and a division upper limit number that is the maximum number of scene divisions, the program comprising:
An original moving image reception step of receiving a moving image file that is the original moving image selected on the client terminal;
A cut point detection step of calculating a first numerical value relating to the similarity of each pair of temporally adjacent frames of the moving image file, and detecting candidate points for scene division on the basis of a comparison between each of the first numerical values and the first threshold value stored in the storage means;
A local variation location detection step of, when no candidate point is detected in the cut point detection step, calculating, in each of the blocks into which a frame is divided, a second numerical value relating to the similarity and luminance difference of each pair of temporally adjacent frames, and detecting candidate points for scene division on the basis of a comparison between each of the second numerical values and the second threshold value stored in the storage means;
A high-level sound detection step of, when no candidate point is detected in the local variation location detection step, calculating the average of the audio levels of all audio data of the moving image file, and detecting candidate points for scene division on the basis of a comparison between that average and the audio levels of the audio data contained in the moving image file;
A divided file number adjusting step of, when the number of candidate points for scene division detected in the cut point detection step, the local variation location detection step, or the high-level sound detection step is equal to or less than the division upper limit number stored in the storage means, dividing the moving image file at those candidate points, and, when that number exceeds the division upper limit number, selecting as many candidate points as the division upper limit number on the basis of the magnitude of the first numerical value, the second numerical value, or the audio level calculated at each candidate point, and dividing the moving image file at the selected candidate points;
A display step of transmitting each of the divided moving images produced in the divided file number adjusting step to the client terminal and displaying them there;
A moving image dividing program characterized by causing the multimedia search index assigning apparatus to execute the above steps.