JP2008092594A - Moving image processing apparatus

Info

Publication number: JP2008092594A
Application number: JP2007282311A
Authority: JP
Inventors: Hisashi Aoki (青木 恒)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2003-09-30
Filing date: 2007-10-30
Publication date: 2008-04-17
Anticipated expiration: 2023-11-06
Also published as: JP4491009B2

Abstract

PROBLEM TO BE SOLVED: To provide a moving image processing apparatus capable of dividing a moving image into appropriate metashots irrespective of program type.

SOLUTION: The apparatus includes a similarity measuring means 104 for measuring the similarity between partial moving images obtained by dividing a moving image, a similar-shot specifying means 104 for specifying similar partial moving images, a grouping means 110 for assigning the same group attribute to the similar partial moving images, a metashot generating means 107 for generating a metashot headed by the partial moving image selected by a lead shot selecting means 106, and a program type determining means 130.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a moving image processing apparatus.

With the spread of high-performance personal computers (PCs) and hard disk recorders, technology for digitizing and storing video and moving images has advanced. This technology is implemented in both hardware and software, and in consumer equipment as well as in professional equipment.

Specifically, video is recorded electromagnetically on, for example, a hard disk drive (HDD) in a PC or a recorder. This offers advantages that conventional videotape lacks: playback of a desired program can start with little waiting time, and unwanted programs can easily be deleted selectively. With this improved convenience, operations such as recording have become easier to perform.

On the other hand, when a large amount of video is recorded, it becomes difficult to find a desired scene. This problem can be addressed by shortening the search time through so-called "skip viewing" of a program using a fast-forward function or the like.

However, such skip viewing thins out the displayed frames in physical units unrelated to the structure of the program content, for example one frame every few seconds. This creates a new problem: the viewer overshoots the scene of interest.

To solve this problem, research and product development have been directed at techniques that use image processing to divide a moving image into partial moving images at image change points (hereinafter, "cut points"), where the picture in the moving image switches, and that enable skip viewing shot by shot, that is, one partial moving image at a time.

Such video division can follow the program content, for example at boundaries between commercials and the main program or at topic changes in a news program. However, video also contains many images that this division technique cannot divide appropriately, such as switches between pieces of file footage.

In addition, many of the generated shots are short, with a playback duration of only a few seconds. When individual shots are this short, no reduction in search time can be expected.

To solve this problem, the present applicant has proposed a method that improves the legibility of a list display by omitting the icons of similar shots (see Patent Document 1). A method has also been proposed that structures video in units closer to the actual program content by grouping repeating units of the video, that is, by forming sets of consecutive shots (metashots).

Patent Document 1: JP-A-9-270006
Aoki et al., "Video Interface Using Hierarchical Icons by Integration of Repeated Shots," Transactions of the Information Processing Society of Japan, Vol. 39, No. 5, pp. 1317-1324, 1998

With the methods of the above documents, the appropriate way to divide a program into metashots differs depending on the type of program. A technique is therefore desired that automatically divides video into appropriate metashots regardless of program type.

The present invention has been made in view of the above, and its object is to provide a moving image processing apparatus, a moving image processing method, and a moving image processing program capable of dividing a moving image into appropriate shots regardless of the type of program.

To achieve the above object, a moving image processing apparatus according to the present invention comprises: cut detection means for detecting, in a moving image, image change points at which the image content switches; similarity measuring means for measuring the similarity between partial moving images, which are the pieces of the moving image divided at the image change points detected by the cut detection means; similar shot specifying means for specifying similar partial moving images based on the similarity measured by the similarity measuring means; grouping means for assigning the same group attribute to a plurality of similar partial moving images specified by the similar shot specifying means; and moving image type determining means for determining the type of the moving image based on the appearance pattern in which the similar partial moving images grouped by the grouping means appear in the moving image.

In the present invention, the apparatus further comprises shot number comparing means for comparing the number of partial moving images belonging to the same group with a predetermined reference number, and the moving image type determining means determines the moving image type based on the comparison result of the shot number comparing means.

In the present invention, the apparatus further comprises shortest time length comparing means for comparing the playback time length of the shortest partial moving image among the partial moving images belonging to the same group with a predetermined reference shortest time length, and the moving image type determining means determines the moving image type based on the comparison result of the shortest time length comparing means.

In the present invention, the apparatus further comprises longest time length comparing means for comparing the playback time length of the longest partial moving image among the partial moving images belonging to the same group with a predetermined reference longest time length, and the moving image type determining means determines the moving image type based on the comparison result of the longest time length comparing means.

In the present invention, the apparatus further comprises: time length average value calculating means for calculating the average playback time length of the partial moving images belonging to the same group; average value determining means for determining whether the average calculated by the time length average value calculating means is within a predetermined reference average time length range; and reference average time group number measuring means for counting the groups that the average value determining means has determined to be within the reference average time length range. The moving image type determining means determines the moving image type based on the count obtained by the reference average time group number measuring means.
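The group counting described above can be illustrated with a minimal Python sketch. The function names, the `(duration, label)` tuple layout, and the inclusive range check are assumptions made for illustration; the specification does not fix any of them.

```python
from collections import defaultdict

def group_durations(shots):
    """Collect playback durations per group label.

    `shots` is a list of (duration_seconds, label) pairs; a label of None
    marks a shot that belongs to no group (hypothetical convention).
    """
    durations = defaultdict(list)
    for duration, label in shots:
        if label is not None:
            durations[label].append(duration)
    return dict(durations)

def count_groups_with_mean_in_range(shots, low, high):
    """Count groups whose mean shot duration lies in [low, high].

    Treating the reference range as inclusive is an assumption.
    """
    count = 0
    for durations in group_durations(shots).values():
        mean = sum(durations) / len(durations)
        if low <= mean <= high:
            count += 1
    return count
```

A type determining step could then compare this count with a threshold chosen per program type; the threshold itself is not given in this section.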

In the present invention, the apparatus further comprises: inter-shot time length measuring means for measuring, among the partial moving images belonging to the same group, the playback time length between the partial moving image placed first in the moving image and the partial moving image placed last in the moving image; inter-shot time length determining means for determining whether the time length measured by the inter-shot time length measuring means is within a predetermined reference inter-shot time length range; and group number measuring means for counting the groups that the inter-shot time length determining means has determined to be within the reference range. The moving image type determining means determines the moving image type based on the count obtained by the group number measuring means.

In the present invention, the apparatus further comprises: group existence range specifying means for specifying, as a group existence range, the span between the partial moving image placed first in the moving image and the partial moving image placed last in the moving image among the partial moving images belonging to the same group; and metashot specifying means for taking the union of the group existence ranges that the group existence range specifying means has specified for each of the plurality of groups in the moving image, and specifying the partial moving images contained in each range obtained by the union as one metashot. The moving image type determining means determines the type of the moving image based on the appearance pattern of the metashots specified by the metashot specifying means.
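The union of group existence ranges can be sketched as an interval merge over shot indices. This is a minimal illustration, not the patented implementation: the function names and the representation of groups as a per-shot label list (with None for ungrouped shots) are assumptions, and only overlapping ranges are merged.

```python
def group_existence_ranges(labels):
    """Map each group label to (first_shot_index, last_shot_index)."""
    ranges = {}
    for index, label in enumerate(labels):
        if label is None:
            continue
        first, _ = ranges.get(label, (index, index))
        ranges[label] = (first, index)
    return ranges

def metashots_from_union(labels):
    """Merge overlapping group existence ranges into metashot ranges.

    Returns a list of (first_shot_index, last_shot_index) pairs, inclusive.
    """
    merged = []
    for first, last in sorted(group_existence_ranges(labels).values()):
        if merged and first <= merged[-1][1]:
            # This range overlaps the previous one: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged
```

For the label sequence `[0, 1, 0, None, 1, None, 2, 2]`, groups 0 and 1 interleave, so their ranges merge into one metashot covering shots 0 to 4, while group 2 forms a separate metashot covering shots 6 to 7.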

In the present invention, the apparatus further comprises: interaction frequency calculating means for calculating, based on the appearance pattern of a plurality of partial moving images belonging to the same group, an interaction frequency indicating the probability that a target partial moving image belonging to the group shows a dialogue scene; and metashot specifying means for specifying, when the interaction frequency calculated by the interaction frequency calculating means is within a predetermined reference ratio range, the partial moving images contained in the range over which the interaction frequency is within that reference range as one metashot. The moving image type determining means determines the type of the moving image based on the appearance pattern of the metashots specified by the metashot specifying means.

In the present invention, the moving image type determining means determines the moving image type based on the number of metashots contained in the moving image.

In the present invention, the moving image type determining means determines the moving image type based on the total playback time length of the metashots contained in the moving image.

In the present invention, the moving image type determining means determines the moving image type based on the playback time length of the shortest metashot among the metashots contained in the moving image.

In the present invention, the moving image type determining means determines the moving image type based on the playback time length of the longest metashot among the metashots contained in the moving image.

In the present invention, the moving image type determining means determines the moving image type based on the average playback time length of the metashots contained in the moving image.

In the present invention, the apparatus further comprises program interaction frequency calculating means for calculating how active the dialogue is over the entire moving image, and the moving image type determining means determines the moving image type based on the interaction frequency of the entire moving image calculated by the program interaction frequency calculating means.

In the present invention, the apparatus further comprises shot time length calculating means for calculating the total time length of the partial moving images that belong to any group in the moving image, and the program interaction frequency calculating means calculates, as the program interaction frequency, the ratio of the total time length calculated by the shot time length calculating means to the time length of the moving image.

In the present invention, the apparatus further comprises shot number measuring means for counting the partial moving images that belong to any group in the moving image, and the program interaction frequency calculating means calculates, as the program interaction frequency, the ratio of the number of partial moving images counted by the shot number measuring means to the time length of the moving image.

In the present invention, the apparatus further comprises shot time length calculating means for calculating the total time length of the partial moving images that belong to any group in the moving image, and shot number measuring means for counting the partial moving images that belong to any group in the moving image. The program interaction frequency calculating means calculates, as the program interaction frequency, the product of the ratio of the total time length calculated by the shot time length calculating means to the time length of the moving image and the ratio of the number of partial moving images counted by the shot number measuring means to the time length of the moving image.
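The three variants of the program interaction frequency described above (time ratio, count ratio, and their product) can be written out as a short Python sketch. The function name, the `(duration, label)` tuple layout, and the use of seconds as the time unit are assumptions for illustration.

```python
def program_interaction_frequency(shots, program_duration):
    """Return (time_ratio, count_ratio, product) for one program.

    `shots` is a list of (duration_seconds, label) pairs, where a label of
    None marks a shot that belongs to no group (hypothetical convention).
    `program_duration` is the time length of the whole moving image.
    """
    grouped = [duration for duration, label in shots if label is not None]
    # Total time of grouped shots relative to the program length.
    time_ratio = sum(grouped) / program_duration
    # Number of grouped shots relative to the program length.
    count_ratio = len(grouped) / program_duration
    return time_ratio, count_ratio, time_ratio * count_ratio
```

Note that the count ratio has units of shots per unit time, so its numerical value depends on the time unit chosen; any threshold applied to it would need to use the same unit.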

In the present invention, the moving image type determining means determines the moving image type according to whether the start time of the metashot appearing at a prescribed ordinal position from the beginning of the moving image is earlier or later than a predetermined time from the beginning of the moving image, or than a predetermined fraction of the length of the moving image.

In the present invention, the apparatus further comprises dividing means for dividing the moving image into a plurality of temporal sections, and the moving image type determining means determines the moving image type for each section produced by the dividing means.

In the present invention, the apparatus further comprises analysis parameter receiving means for receiving analysis processing conditions corresponding to a moving image, and at least one of the cut detection means, the similar shot specifying means, and the moving image type determining means detects cuts, specifies similar shots, or determines the moving image type, respectively, based on the reference conditions received by the analysis parameter receiving means.

According to the present invention, it is possible to provide a moving image processing apparatus, a moving image processing method, and a moving image processing program capable of dividing a moving image into appropriate shots regardless of the type of program.

Embodiments of a moving image processing apparatus, a moving image processing method, and a moving image processing program according to the present invention are described below in detail with reference to the drawings. The present invention is not limited to these embodiments. The embodiments describe, as an example, the processing performed when video of a news program is acquired as the moving image. The term "shot" used below corresponds to the partial moving image recited in the claims.

FIG. 1 is a block diagram showing the functional configuration of the moving image processing apparatus 10 according to the first embodiment. The moving image processing apparatus 10 includes a moving image acquisition unit 101, a cut detection unit 102, a shot section definition unit 103, a similar shot detection unit 104, a grouping unit 110, a metashot head group determination unit 105, a metashot head time determination unit 106, a metashot generation unit 107, a moving image output unit 108, a statistical processing unit 120, a statistical value holding unit 122, and an interaction frequency calculation unit 124.

The moving image acquisition unit 101 acquires a moving image from outside, for example via a broadcast program receiver (tuner) connected to the moving image processing apparatus 10. The moving image acquisition unit 101 may acquire an uncompressed moving image. It may also acquire a moving image that has been converted into digital data in the DV format or in MPEG-1, 2, or 4, which are standard formats for moving image compression.

The moving image acquisition unit 101 converts the acquired moving image into a format suitable for processing by the cut detection unit 102 and passes the converted moving image to the cut detection unit 102. Conversion to a suitable format means, for example, decompressing (decoding) a compressed (encoded) moving image, or resizing the images to a size that is necessary and sufficient for the processing performed by the cut detection unit 102.

The cut detection unit 102 computes, for each image frame as it is input, the similarity to the immediately preceding frame, and detects image change points where the image content switches, that is, cut points. When the acquired moving image uses predictive coding for compression, as in MPEG-2, cut points may instead be detected from fluctuations in the amount of prediction code.
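The frame-to-frame comparison can be sketched in Python as follows. This is only one possible realization under stated assumptions: frames are flat sequences of 8-bit grayscale values, similarity is measured by histogram intersection, and the bin count and threshold are hypothetical values. The specification itself leaves the similarity measure open.

```python
from typing import List, Sequence

def gray_histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized histogram of 8-bit pixel values (0 to 255)."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame) or 1
    return [c / total for c in counts]

def frame_similarity(a: Sequence[int], b: Sequence[int], bins: int = 16) -> float:
    """Histogram intersection in [0, 1]; 1.0 means identical histograms."""
    hist_a = gray_histogram(a, bins)
    hist_b = gray_histogram(b, bins)
    return sum(min(x, y) for x, y in zip(hist_a, hist_b))

def detect_cut_points(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Return every index i such that a cut lies immediately before frame i."""
    cuts = []
    for i in range(1, len(frames)):
        # A low similarity to the preceding frame is treated as a cut.
        if frame_similarity(frames[i - 1], frames[i]) < threshold:
            cuts.append(i)
    return cuts
```

Reporting a cut as "the index of the frame it precedes" matches the way the shot section definition below expresses cut positions.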

The method by which the cut detection unit 102 detects cut points is not limited to that of the embodiment and may be any of various known techniques. Such techniques are described, for example, in JP-A-9-93588, filed by the present applicant.

The shot section definition unit 103 defines as a shot the set of image frames belonging to the time interval bounded by two temporally adjacent cut points detected by the cut detection unit 102. For example, if a cut point is detected immediately before the frame at 3 min 15 s, frame 20 (the frame played back 3 minutes 15 seconds and 20 frames after the start of playback), and the next cut point is detected immediately before the frame at 3 min 21 s, frame 12, then the frames from 3 min 15 s, frame 20 through 3 min 21 s, frame 11 are defined as one shot. Here, the playback time is the time required, when the video is played back, from the start of the video until a given frame is played.
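The worked example above can be reproduced with a short Python sketch. The function names are hypothetical, and the frame rate of 30 frames per second is an assumption; the specification does not state one.

```python
def to_frame_index(minutes, seconds, frames, fps=30):
    """Convert a minutes/seconds/frames timecode to a zero-based frame index.

    The default of 30 frames per second is an assumption.
    """
    return (minutes * 60 + seconds) * fps + frames

def define_shots(cut_points, frame_count):
    """Split frames 0..frame_count-1 into shots at the given cut points.

    Each cut point is the index of the frame that a cut immediately precedes.
    Returns a list of (first_frame, last_frame) pairs, both inclusive.
    """
    if frame_count <= 0:
        return []
    inner = sorted({c for c in cut_points if 0 < c < frame_count})
    boundaries = [0] + inner + [frame_count]
    return [(start, end - 1) for start, end in zip(boundaries, boundaries[1:])]
```

With cut points before 3 min 15 s frame 20 and before 3 min 21 s frame 12, the shot between them ends at 3 min 21 s frame 11, as in the text.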

The similar shot detection unit 104 detects similar shots, treating each shot defined by the shot section definition unit 103 as one unit. Specifically, it selects one or more frames from each of the shots to be compared and measures the similarity by comparing these frames with each other.

For the similarity comparison of the shots themselves, the method of JP-A-9-270006, filed by the present applicant, can be used, for example. In this method, a feature quantity is calculated for each of the two frames being compared, and the distance between the two feature quantities is then calculated. For example, when a 36-bin histogram is computed from the hue values of the pixels in each frame and the vector whose elements are the frequencies of the 36 bins is used as the feature quantity, the distance between the two feature points in the 36-dimensional space is calculated. This distance corresponds to the similarity: the smaller the distance, the higher the similarity.
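The 36-bin hue feature and its distance can be sketched in Python as follows. Two points are assumptions rather than statements from the specification: the histogram is normalized so that frames of different sizes are comparable, and the distance is Euclidean (the text only says "distance"). The function names are hypothetical.

```python
import math

def hue_histogram(hues, bins=36):
    """Normalized histogram of hue angles given in degrees.

    With 36 bins, each bin covers 10 degrees of hue.
    """
    counts = [0] * bins
    for hue in hues:
        index = int((hue % 360.0) * bins / 360.0)
        counts[min(index, bins - 1)] += 1
    total = len(hues) or 1
    return [c / total for c in counts]

def feature_distance(hues_a, hues_b, bins=36):
    """Euclidean distance between two hue-histogram feature vectors.

    A smaller distance corresponds to a higher similarity.
    """
    hist_a = hue_histogram(hues_a, bins)
    hist_b = hue_histogram(hues_b, bins)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(hist_a, hist_b)))
```

Two shots would then be treated as similar when this distance falls at or below a chosen threshold, which is the distance-side counterpart of a similarity at or above a predetermined value.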

When the similarity measured in this way is equal to or greater than a predetermined value, that is, when the feature distance is sufficiently small, the two shots are detected as mutually similar shots. In this way, similar shots are detected based on the similarity between shots.

The similar shot detection unit 104 measures, for each shot in a moving image, the similarity to every other shot in that moving image. As another example, it may measure the similarity only against a predetermined number of shots that are temporally close to the shot in question. The similar shot detection unit 104 constitutes the similarity measuring means and the similar shot specifying means of the present invention.

The grouping unit 110 groups similar shots by assigning the same label to the similar shots detected by the similar shot detection unit 104. The statistical processing unit 120 generates statistical information about the groups, such as the number of appearances of each group in one moving image, based on the information acquired from the grouping unit 110. The statistical value holding unit 122 holds the statistical information generated by the statistical processing unit 120.
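Label assignment and a simple per-group statistic can be sketched in Python with a union-find structure. This is an illustration under assumptions: similarity is treated as transitive (if A resembles B and B resembles C, all three share a label), shots with no similar partner receive the label None, and the function names are hypothetical.

```python
from collections import Counter

def group_similar_shots(shot_count, similar_pairs):
    """Assign one label per group of mutually similar shots.

    `similar_pairs` holds (i, j) shot-index pairs detected as similar.
    Returns a list with one entry per shot: a group label, or None for a
    shot that is similar to no other shot.
    """
    parent = list(range(shot_count))

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    roots = [find(i) for i in range(shot_count)]
    sizes = Counter(roots)
    label_of_root = {}
    labels = []
    for root in roots:
        if sizes[root] < 2:
            labels.append(None)
        else:
            # Labels are numbered in order of first appearance.
            labels.append(label_of_root.setdefault(root, len(label_of_root)))
    return labels

def appearance_counts(labels):
    """Number of shots in each group, one example of group statistics."""
    return Counter(label for label in labels if label is not None)
```

For six shots with similar pairs (0, 2), (2, 4), and (1, 5), the labels come out as `[0, 1, 0, None, 0, 1]`, giving group 0 three appearances and group 1 two.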

Based on the statistical information held in the statistical value holding unit 122, the metashot head group determination unit 105 selects, from among the groups generated by the similar shot detection unit 104, the group of characteristic shots that should serve as the head shots of metashots.

In a news program, it is often appropriate to treat one news item as one metashot, so it is desirable to detect the beginning of each news item as a head shot. An anchorperson (newscaster or announcer) often appears at the beginning of a news item. Therefore, if the scenes in which the anchorperson appears can be detected as the metashot head group, appropriate metashots can be generated.

To select the anchorperson shots, that is, to determine the metashot head group, the metashot head group determination unit 105 evaluates the shots belonging to each similar shot group against one or more conditions, such as the number of appearances in the news program, how widely the appearances are distributed in time across the whole program, and the time lengths of the shots belonging to the group. The method of determining the head group is described in detail later.

The metashot head group determination unit 105 according to the embodiment constitutes the shot number comparing means, shortest time length comparing means, longest time length comparing means, time length average value calculating means, average time length comparing means, inter-shot time length measuring means, inter-shot time length comparing means, and shot position judging means of the present invention.

The metashot head time determination unit 106 acquires the head group determination result from the metashot head group determination unit 105. Based on this result, it identifies, from among the shots belonging to the group determined to be the head group, the shots that should actually become the heads of metashots, and defines the playback time corresponding to the start position of each identified head shot as a metashot head time.

Specifically, a shot whose playback time length is equal to or longer than a predetermined length is identified as a head shot. For example, the anchorperson may appear within a single news item during a conversation with a guest. In that case, the anchorperson's appearance during the conversation is often shorter than the appearance at the beginning of the news item. Restricting head shots to those with a playback time length equal to or longer than the predetermined length therefore excludes the anchorperson shots that occur in the middle of a news item from the head shot candidates.
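The duration filter described above can be sketched in a few lines of Python. The function name, the `(start_time, duration, label)` tuple layout, and the sample values are assumptions for illustration.

```python
def metashot_head_times(shots, head_group, min_duration):
    """Return the start times of head-group shots that are long enough.

    `shots` is a list of (start_time, duration, label) tuples in seconds.
    Shots of the head group shorter than `min_duration` are skipped, which
    drops brief anchorperson cutaways in the middle of a news item.
    """
    return [start for start, duration, label in shots
            if label == head_group and duration >= min_duration]
```

For example, with head group 0 and a minimum duration of 10 seconds, a 4-second anchorperson shot in the middle of an item is ignored, while the 20-second and 18-second anchorperson shots become metashot head times.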

As another example, the metashot head time determination unit 106 may identify as a head shot a shot whose playback time length is equal to or shorter than a predetermined length. As yet another example, it may identify head shots based on conditions such as the time interval to other shots belonging to the same similar shot group, or the temporal order, distribution, and inclusion relationships with respect to shots belonging to other similar shot groups.

Head shots may be identified based on only one condition selected from these conditions, or based on all of them or on two or more conditions selected from them. The metashot head time determination unit 106 constitutes the head shot selecting means of the present invention.

メタショット生成部１０７は、メタショット先頭時刻判定部１０６が特定した先頭ショットを先頭とするメタショットを生成する。具体的には、メタショット先頭時刻から次のメタショット先頭時刻までの間に連続して配置された複数のショットそれぞれに対して、同一のメタショットであることを示す同一のラベルを付与する。 The meta shot generation unit 107 generates a meta shot starting from the head shot specified by the meta shot head time determination unit 106. Specifically, the same label indicating the same metashot is assigned to each of a plurality of shots continuously arranged from the metashot start time to the next metashot start time.

ニュース番組の映像においては、ニュース番組の先頭から、最初にメタショット先頭時刻が登場する直前までの時刻を番組オープニングと判定し、オープニングのメタショットとしてラベル付けしてもよい。また、最後のメタショット先頭時刻からニュース番組の終了までのショットをメタショットとしてもよい。 In the video of a news program, the time from the beginning of the news program to the time immediately before the first metashot start time appears may be determined as the program opening and labeled as the opening metashot. A shot from the last metashot start time to the end of the news program may be used as a metashot.
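The labeling described above can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation: the function name `label_metashots`, the use of start times in seconds, and the convention that label 0 marks the opening are all assumptions made for the example.

```python
from bisect import bisect_right

def label_metashots(shot_starts, head_starts):
    """Return one metashot label per shot.

    shot_starts: sorted start times of all shots (seconds, hypothetical unit).
    head_starts: sorted metashot head times, a subset of shot_starts.
    Shots before the first head time get label 0 (the program opening);
    shots from the k-th head time onward get label k.
    """
    # bisect_right counts how many head times are <= the shot's start time.
    return [bisect_right(head_starts, t) for t in shot_starts]

shots = [0, 5, 12, 30, 41, 55, 70]   # hypothetical shot start times
heads = [12, 55]                     # hypothetical metashot head times
print(label_metashots(shots, heads))
```

Every shot from the last head time to the end of the program shares the final label, which corresponds to treating the tail of the program as one metashot.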

以上のように、動画像がメタショットによって分割されると、その分割結果を動画像出力部１０８から出力する。こうして出力されたデータは、例えば表示装置に送られる。そして、表示装置では、メタショットに基づいて動画像の映像内容の一覧が表示される。またはメタショット単位で再生表示される。 As described above, when the moving image is divided by the meta shot, the division result is output from the moving image output unit 108. The data output in this way is sent to a display device, for example. The display device displays a list of video contents of the moving image based on the meta shot. Or, playback is displayed in meta shot units.

動画像出力部１０８は、メタショット生成部１０７によってメタショットに分割された動画像を出力する。動画像は、例えば表示装置に向けて出力されてもよい。この場合には、表示装置において、映像（番組）内容を一覧表示する。または動画像がメタショット単位で視聴される。 The moving image output unit 108 outputs the moving image divided into meta shots by the meta shot generation unit 107. The moving image may be output to a display device, for example. In this case, a list of video (program) contents is displayed on the display device. Or, a moving image is viewed in meta shot units.

このように、メタショットの区間に対応させて動画像を表示することにより、例えばニュース項目毎の画面一覧を作成することができる。またリモコンの「スキップ」ボタンを操作することにより、あるニュース項目を視聴しているときでも次のニュース項目の先頭のショットを視聴することができる。 In this way, by displaying a moving image corresponding to the section of the meta shot, for example, a screen list for each news item can be created. Further, by operating the “Skip” button on the remote controller, it is possible to view the first shot of the next news item even while viewing a certain news item.

図２は、統計値保持部１２２が保持する統計情報を模式的に示している。図２に示すように、類似ショットグループを識別するグループＩＤに対応付けて、各グループに関する情報を保持している。各グループに関する情報は、本実施例においてはそのグループに属するショットの数（以下「回数」と称す）、グループに属するショットの中で再生時間が最短であるものの再生時間長（最短）、グループに属するショットの中で再生時間が最長の再生時間長（最長）、グループに属するショットの再生時間長の平均値（平均長）、そのグループに属する最初のショットの開始時刻から最後のショットの終了時刻までの再生時間長（分布）、およびそのショットグループがほかのショットグループによって包含されているかどうか（被包含）である。 FIG. 2 schematically shows the statistical information held by the statistical value holding unit 122. As shown in FIG. 2, information on each group is held in association with a group ID that identifies a similar shot group. In this embodiment, the information on each group consists of: the number of shots belonging to the group (hereinafter referred to as the “number of times”); the playback time length of the shortest shot in the group (shortest); the playback time length of the longest shot in the group (longest); the average playback time length of the shots in the group (average length); the playback time length from the start time of the first shot in the group to the end time of the last shot in the group (distribution); and whether the shot group is included by another shot group (included).
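As a rough sketch of how such a table could be built, the following computes the numeric fields per group from a list of shots. The tuple layout `(group_id, start, end)`, the `GroupStats` class, and the field name `spread` (standing in for "distribution") are assumptions for illustration; the "included" flag is left out here.

```python
from dataclasses import dataclass

@dataclass
class GroupStats:
    count: int       # number of shots in the group ("number of times")
    shortest: float  # shortest shot length
    longest: float   # longest shot length
    mean: float      # average shot length
    spread: float    # first start to last end ("distribution")

def group_stats(shots):
    """shots: list of (group_id, start, end); group_id None means ungrouped."""
    by_group = {}
    for gid, start, end in shots:
        if gid is not None:
            by_group.setdefault(gid, []).append((start, end))
    stats = {}
    for gid, spans in by_group.items():
        lengths = [e - s for s, e in spans]
        stats[gid] = GroupStats(
            count=len(spans),
            shortest=min(lengths),
            longest=max(lengths),
            mean=sum(lengths) / len(lengths),
            spread=max(e for _, e in spans) - min(s for s, _ in spans),
        )
    return stats

# Hypothetical shot list: times in seconds.
demo = [("A", 0, 10), ("B", 10, 14), ("A", 14, 35), (None, 35, 40), ("B", 40, 46)]
print(group_stats(demo)["A"])
```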

ここで、被包含の概念について説明する。ニュース番組においては、先頭ショットとすべきアンカーパーソンのショットは、動画像の全体に渡って点在している。そして、例えばアンカーパーソンのショットのうち、各ニュース項目の間に配置されたショットはアンカーパーソン以外の所定のグループに属する２つのショットに挟まれていないような関係を、被包含の関係と定義する。 Here, the concept of being included will be described. In a news program, the anchor person shots that should become head shots are scattered throughout the entire moving image. For example, among the anchor person shots, a shot placed between news items is not sandwiched between two shots belonging to a given group other than the anchor person group; such a relationship is defined as the included relationship.

一方、各ニュース項目の映像に対応するショットは、動画像における所定の再生時刻の範囲にのみ偏在している。そして、ニュース項目の前後には、アンカーパーソンのショットが配置されている。このように、所定のグループに属する２つのショットに挟まれているような関係を包含の関係と定義する。 On the other hand, the shots corresponding to the video of each news item are concentrated within a limited playback time range of the moving image, and anchor person shots are placed before and after each news item. A relationship in which shots are sandwiched in this way between two shots belonging to a given group is defined as the inclusion relationship.

以上のことから、ニュース番組においては、被包含のグループであるか否かに基づいて、アンカーパーソンのショットのグループ、すなわち先頭グループか否かを判定することができる。 From the above, in a news program, it is possible to determine whether or not it is a group of anchor person shots, that is, a leading group, based on whether or not it is an included group.

以下、図３を参照しつつ被包含の概念についてより具体的に説明する。図３は動画像を模式的に示す図である。各長方形は１つのショットを表し、例えばＡと付された長方形はＡグループに属するショットを示している。すなわち同じ記号が付与されたショットはそれぞれ同じグループに属する。また、横軸は時間軸である。すなわち、ショット７０１、ショット７０６の順に再生される。また、ＢショットおよびＣショットは、動画像において図３に示す以外の位置には存在しないこととする。 Hereinafter, the concept of inclusion will be described more specifically with reference to FIG. FIG. 3 is a diagram schematically showing a moving image. Each rectangle represents one shot. For example, a rectangle attached with A indicates a shot belonging to the A group. That is, shots assigned the same symbol belong to the same group. The horizontal axis is the time axis. That is, the shots 701 and 706 are reproduced in this order. Further, it is assumed that the B shot and the C shot do not exist in positions other than those shown in FIG. 3 in the moving image.

図３において、グループＣに属するＣショット７０８，７０９は、いずれもＡショット７０４とＡショット７０５の間に配置されている。また、Ａショット７０４およびＡショット７０５は、時間軸方向において連続して出現する２つのショットである。この場合、グループＣは包含のグループである。 In FIG. 3, C shots 708 and 709 belonging to group C are both arranged between A shot 704 and A shot 705. The A shot 704 and the A shot 705 are two shots that appear continuously in the time axis direction. In this case, group C is an inclusion group.

このように、同一のグループに属し、時間軸方向に連続して出現する２つのショットの間に、同一のグループに属する全てのショットが存在する場合に、２つのショット間に存在するショットのグループは、被包含のグループとなる。 In this way, when all the shots belonging to one group lie between two shots that belong to the same group as each other and appear consecutively along the time axis, the group of the shots lying between those two shots is an included group.
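The rule stated in the preceding paragraph can be sketched as a small check over the sequence of group labels in playback order. This is one possible reading of that rule only; the function name `is_included` and the list-of-labels representation are assumptions for the example.

```python
def is_included(groups, target):
    """Check whether every shot of `target` lies between two consecutive
    shots of some other group.

    groups: group labels in playback order (None for ungrouped shots).
    """
    positions = [i for i, g in enumerate(groups) if g == target]
    if not positions:
        return False
    first, last = positions[0], positions[-1]
    others = {g for g in groups if g is not None and g != target}
    for other in others:
        # A shot of `other` inside the target's span means the enclosing
        # shots of `other` would not be consecutive.
        if any(g == other for g in groups[first:last + 1]):
            continue
        # Otherwise the nearest `other` shot on each side encloses the span.
        if other in groups[:first] and other in groups[last + 1:]:
            return True
    return False

sequence = ["A", "B", "A", "B", "A", "C", "C", "A"]  # hypothetical labels
print(is_included(sequence, "C"), is_included(sequence, "B"))
```

In this hypothetical sequence the two C shots sit between two consecutive A shots, while the B shots are separated by an A shot and therefore do not satisfy the rule as sketched.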

一方、Ｂショット７０６およびＢショット７０７は、いずれもＡショットに挟まれているが、Ｂショット７０６とＢショット７０７の間にＡショット７０２が挟まれており、Ｂショット７０６およびＢショット７０７は、連続する２つのＡショットの間には存在していない。従って、グループＢは被包含のグループである。 On the other hand, B shot 706 and B shot 707 are both sandwiched between A shots, but A shot 702 is sandwiched between B shot 706 and B shot 707, and B shot 706 and B shot 707 are It does not exist between two consecutive A shots. Therefore, group B is an included group.

一方、Ａグループにおいては、例えば、Ａショット７０３は、Ｂショット７１２に続いて配置されているが、Ａショット７０３の後にＢショットが配置されていない。従って、被包含のグループではないと判断される。 On the other hand, in the A group, for example, the A shot 703 is arranged following the B shot 712, but the B shot is not arranged after the A shot 703. Therefore, it is determined that the group is not included.

ここで、メタショット先頭グループ判定部１０５が統計値保持部１２２に格納されている統計情報に基づいて先頭グループを選択する処理について、図４および図５を参照しつつ説明する。 Here, a process in which the metashot head group determination unit 105 selects a head group based on the statistical information stored in the statistical value holding unit 122 will be described with reference to FIGS. 4 and 5.

図４は、ニュース番組を模式的に示している。横軸は時間軸である。ニュース番組は、上段から下段へと再生される。図５は、各グループに属するショットの内容を示している。Ａグループはアンカーパーソンのショットである。ＢグループはワシントンＤＣ駐在の特派員が登場するショットである。Ｃグループは答弁する首相が登場するショットである。Ｄグループは県庁舎の映像ショットである。Ｅグループはアンカーパーソンを別のカメラ構図で捕らえたショットである。 FIG. 4 schematically shows a news program. The horizontal axis is the time axis. News programs are played from the top to the bottom. FIG. 5 shows the contents of shots belonging to each group. Group A is an anchor person shot. Group B is a shot of a correspondent stationed in Washington, DC. Group C is a shot where the prime minister who answers is appearing. Group D is a video shot of the prefectural office building. Group E is a shot of an anchor person captured by another camera composition.

ニュース番組の開始から始まるメタショットは、ニュース番組の概要を紹介するニュース概要ショットである。ニュース概要ショットには、後述する２番目のニュース項目である「混迷する予算委員会討議」という内容のヘッドラインのＣショット７１０が含まれている。Ｃショット７１０は、答弁する首相の横顔の映像である。 A meta shot starting from the start of a news program is a news summary shot that introduces an outline of the news program. The news summary shot includes a C-shot 710 of the headline with the content of “discussion of the confusing budget committee”, which is the second news item described later. The C shot 710 is an image of the profile of the prime minister who answers.

これに続いて、ショット７０２からショット７０７の間は、１番目のニュース項目であるアメリカ議会の話題に対応するメタショットである。 Following this, the span from shot 702 to shot 707 is the meta shot corresponding to the first news item, a story on the US Congress.

Ａショット７１２は、アンカーパーソンが挨拶し、最初のニュース項目のリード部分をアナウンスする映像である。そして、ＢショットとＡショットが交互に配置されるシーンが続く（７１３から７１６）。これは、ワシントンＤＣ駐在の特派員とアンカーパーソンが中継で対話するシーンである。そして、この対話のあと、アメリカ国会の映像が２ショット入り（７１７、７１８）、このニュース項目が終了する。 A shot 712 is video in which the anchor person greets the audience and announces the lead of the first news item. A scene in which B shots and A shots alternate follows (713 to 716). This is a scene in which the correspondent stationed in Washington, DC and the anchor person talk via a live relay. After this dialogue, two shots of the US Congress follow (717, 718), and the news item ends.

続いて、Ａショット７２０からショット７２２の間は、２番目のニュース項目に対応するメタショットである。２番目のニュース項目に対応するメタショットにおいては、Ａショット７２０の次に国会議事堂のショットおよび予算委員会討議室のショットが配置されている。 Subsequently, a portion between the A shot 720 and the shot 722 is a meta shot corresponding to the second news item. In the meta shot corresponding to the second news item, the A shot 720 is followed by a shot of the Diet Building and a shot of the budget committee discussion room.

さらにこれに続いてＣショット７２１も配置されている。Ｃショット７２１は、ニュース概要のメタショットに含まれているＣショット７１０と同一のショットである。 Following this, a C shot 721 is also arranged. The C shot 721 is the same shot as the C shot 710 included in the news summary meta shot.

続くＡショット７３１からショット７３２の間は、３番目のニュース項目に対応するメタショットである。３番目のニュース項目は、ある地方自治体の歳入不足を報じるものである。県庁舎のＤショット７３１，７３２を含む報道シーンで構成されている。 The following span, from A shot 731 to shot 732, is the meta shot corresponding to the third news item. The third news item reports a revenue shortfall at a local government. It consists of news scenes that include the D shots 731 and 732 of the prefectural office building.

続くＡショット７３４からショット７３５の間は、４番目のニュース項目に対応するメタショットである。また、Ｅショット７４０，７４２，７４４から始まるニュース項目は、それぞれ為替と株価、天気予報、エンディングである。 The following span, from A shot 734 to shot 735, is the meta shot corresponding to the fourth news item. The news items that start from E shots 740, 742, and 744 are, respectively, exchange rates and stock prices, the weather forecast, and the ending.

以上のようなニュース番組において、メタショット先頭グループ判定部１０５は例えば、グループに属するショットの数、すなわち回数に基づいて先頭グループを特定する。具体的には「登場回数が３回以上」という条件に合致するグループを先頭グループとして特定する。このように、所定の回数以上登場するグループを先頭グループとして特定する。これにより、図４および図５を参照しつつ説明したニュース番組においては、Ａグループが特定される。このように、望ましいグループを特定することができる。 In the news program as described above, the meta shot head group determination unit 105 specifies the head group based on, for example, the number of shots belonging to the group, that is, the number of times. Specifically, a group that matches the condition “appearance count is 3 or more” is identified as the first group. In this way, a group that appears more than a predetermined number of times is identified as the first group. Thereby, the A group is specified in the news program described with reference to FIGS. 4 and 5. Thus, a desirable group can be specified.

または、アンカーパーソンが極端に多く登場することはないので、所定の回数以下の登場回数である場合に、当該グループを先頭グループとして特定してもよい。 Alternatively, since the anchor person does not appear an extremely large number of times, a group may be specified as the head group when its number of appearances is equal to or less than a predetermined number.

また、「同一グループに属するショットの最短の長さが１０秒以上」、すなわち同一グループに属するショットの最短の長さが所定の値以上であることを条件として先頭グループを特定してもよい。さらにまた、「同一グループに属するショットの最長の長さが２１秒以上」、すなわち同一グループに属するショットの最長の長さが所定の値以上であることを条件として先頭グループを特定してもよい。 The head group may also be specified on the condition that “the shortest shot in the group is 10 seconds or longer”, that is, that the shortest length of the shots belonging to the same group is equal to or greater than a predetermined value. Likewise, the head group may be specified on the condition that “the longest shot in the group is 21 seconds or longer”, that is, that the longest length of the shots belonging to the same group is equal to or greater than a predetermined value.

また、「同一グループに属するショットの長さの平均値が１２秒以上」、すなわち同一グループに属するショットの長さの平均値が所定の値以上であることを条件として先頭グループを特定してもよい。さらにまた、同一グループに属するショットの長さの平均値が所定の値以上であることを条件として先頭グループを特定してもよい。 The head group may also be specified on the condition that “the average length of the shots in the group is 12 seconds or longer”, that is, that the average length of the shots belonging to the same group is equal to or greater than a predetermined value. More generally, the head group may be specified on the condition that the average length of the shots belonging to the same group is equal to or greater than a predetermined value.

図４および図５を参照しつつ説明したニュース番組においては、「同一グループに属するショットの最長の長さが２１秒以上」という条件により図２を参照しつつ説明した統計情報からＡグループが特定される。 In the news program described with reference to FIGS. 4 and 5, group A is specified from the statistical information described with reference to FIG. 2 by the condition “the longest shot in the group is 21 seconds or longer”.

また、そのグループに属する最初のショットの開始時刻から最後のショットの終了時刻までの長さ、すなわち分布に基づいてもよい。具体的には、「分布が３分以上」という条件に合致するグループを先頭グループとして特定する。この条件によりＡグループとＣグループが特定される。 Further, it may be based on the length from the start time of the first shot belonging to the group to the end time of the last shot, that is, the distribution. Specifically, a group that satisfies the condition “distribution is 3 minutes or more” is specified as the first group. With this condition, the A group and the C group are specified.

この場合、さらに、冒頭に登場するＣグループのような、特別な登場を含むグループを除外するために、「分布」としていた条件を「同一ショットグループ中で２番目以降に登場するショットの『分布』」とすることにより正確に先頭グループを特定することができる。 In this case, to exclude a group that includes a special appearance, such as group C appearing at the opening, the “distribution” condition can be changed to “the ‘distribution’ of the shots that appear second and later within the same shot group”, so that the head group is specified more accurately.

他の例としては、「同一ショットグループに属するショットが再生される位置、すなわち配置の分散を計算し、配置の平均的な再生時間から、その分散に一定係数を積算した時間以上離れているショットを除外した『分布』」などを条件としてもよい。 As another example, the condition may be “the ‘distribution’ obtained by computing the variance of the playback positions (placement) of the shots in the same shot group and excluding any shot that lies farther from the mean playback position than that variance multiplied by a fixed coefficient”.

また「被包含」のグループであることを条件にしてもよい。これにより、ＡグループとＣグループを特定することができる。 Alternatively, it may be a condition that the group is “included”. Thereby, A group and C group can be specified.

以上、メタショット先頭グループ判定部１０５が先頭グループを選択する条件について説明したが、上記条件のうちから選択した１又は２以上の条件に基づいて先頭グループを特定してもよい。 The conditions for the metashot head group determination unit 105 to select the head group have been described above. However, the head group may be specified based on one or more conditions selected from the above conditions.
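Combining one or more of these conditions can be sketched as a list of predicates that a group's statistics must all satisfy. The statistics below are hypothetical values chosen for illustration, not the contents of FIG. 2, and the dictionary keys and the function name `select_head_groups` are assumptions.

```python
def select_head_groups(stats, conditions):
    """Return the IDs of groups whose statistics satisfy every condition.

    stats: {group_id: dict of statistics}; conditions: list of predicates.
    """
    return sorted(g for g, s in stats.items() if all(c(s) for c in conditions))

# Hypothetical statistics (lengths and spread in seconds).
stats = {
    "A": {"count": 9, "longest": 25.0, "spread": 1500.0},
    "B": {"count": 2, "longest": 8.0, "spread": 40.0},
    "C": {"count": 2, "longest": 6.0, "spread": 400.0},
}

appears_3_times = lambda s: s["count"] >= 3      # "appears three or more times"
longest_21s = lambda s: s["longest"] >= 21.0     # "longest shot is 21 s or longer"
spread_3min = lambda s: s["spread"] >= 180.0     # "distribution is 3 min or more"

print(select_head_groups(stats, [appears_3_times, longest_21s]))
print(select_head_groups(stats, [spread_3min]))
```

Keeping each condition as a separate predicate makes it straightforward to swap in thresholds supplied from outside, for instance per-program parameter sets.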

また、動画像が動画像取得部１０１に入力されるのに先立ち、または入力された際に解析パラメータ受信部１９０が上記のカット検出部１０２、類似ショット検出部１０４、メタショット先頭グループ判定部１０５、メタショット先頭時刻判定部１０６の各処理に必要な条件（パラメータ）を受信し、これら検出部、判定部に供給してもよい。 Also, prior to or when a moving image is input to the moving image acquisition unit 101, the analysis parameter receiving unit 190 performs the cut detection unit 102, the similar shot detection unit 104, and the meta shot head group determination unit 105 described above. The conditions (parameters) necessary for each process of the meta shot head time determination unit 106 may be received and supplied to the detection unit and the determination unit.

例えば、ＥＰＧまたはｉＥＰＧと呼ばれる電子番組表サービスでは、インターネット上で番組内容や放送チャンネル、開始・終了時刻などを提供している。これと同様に、あるいはＥＰＧまたはｉＥＰＧ情報の一部として、解析パラメータをインターネット上に提供するサービスがあった場合には、本発明の動画像処理装置は録画番組に応じて検出、判定のパラメータを変えることができる。 For example, an electronic program guide service called EPG or iEPG provides program contents, broadcast channels, start / end times, and the like on the Internet. Similarly, when there is a service that provides analysis parameters on the Internet as part of EPG or iEPG information, the moving image processing apparatus of the present invention sets parameters for detection and determination according to the recorded program. Can be changed.

具体的には、話題の変わり目ごとに必ず類似したタイトル画面が挿入されるような、特定のバラエティ番組が入力される場合、本発明の動画像処理装置はその番組特有のパラメータ設定を録画前、あるいは録画中にインターネットからダウンロードする。本発明の動画像処理装置は、ダウンロードされた「３回以上登場する類似ショットであって、登場の最小間隔が２分以上」などという条件を用いて、より高精度に話題ごとのメタショットを作成できる。 Specifically, when a particular variety program is input in which a similar title screen is always inserted at each change of topic, the moving image processing apparatus of the present invention downloads the parameter settings specific to that program from the Internet before or during recording. Using a downloaded condition such as “similar shots that appear three or more times, with a minimum interval between appearances of two minutes or more”, the apparatus can create a meta shot for each topic with higher accuracy.

解析パラメータのダウンロード手段はインターネットに限定されない。例えば、４月中旬、１０月中旬など、新番組が出揃った時期に、ＣＤ−ＲＯＭやメモリカードなどの形態で番組ごとの最適な解析パラメータ設定が供給されてもよい。解析パラメータ受信部１９０は番組が本装置に入力された際に、その番組に対応する最適パラメータ設定を記録メディアから読み取り、それを各検出、判定部に供給してもよい。また、記録メディアに記録された最適パラメータを一旦本装置内の記録領域（図示せず）にコピーし、解析パラメータ受信部１９０は本装置に番組が入力された際に、この記録領域から最適パラメータを読み取り、各検出、判定部に供給してもよい。 The means for downloading analysis parameters is not limited to the Internet. For example, the optimal analysis parameter setting for each program may be supplied in the form of a CD-ROM, a memory card, or the like when new programs are available, such as mid-April and mid-October. When a program is input to the apparatus, the analysis parameter receiving unit 190 may read an optimum parameter setting corresponding to the program from a recording medium and supply it to each detection and determination unit. Also, the optimum parameters recorded on the recording medium are temporarily copied to a recording area (not shown) in the apparatus, and the analysis parameter receiving unit 190 receives the optimum parameters from the recording area when a program is input to the apparatus. May be read and supplied to each detection and determination unit.

図６は、動画像処理装置１０における動画像処理を示すフローチャートである。動画像処理は、主に、ショット区間定義処理、グループ化処理およびメタショット生成処理の３つの処理を含んでいる。 FIG. 6 is a flowchart showing moving image processing in the moving image processing apparatus 10. The moving image processing mainly includes three processes of a shot section definition process, a grouping process, and a metashot generation process.

まず、ショット区間定義処理が行われる。すなわち、カット検出部１０２は、画像フレームを１フレームずつ取得する（ステップＳ２０２）。そして、カット検出部１０２は、ステップＳ２０２において取得した画像フレームの直前に取得した画像フレームと、ステップＳ２０２において取得した画像フレームとの類似度を計算し、類似度に基づいてカット点を検出する。 First, the shot section definition process is performed. The cut detection unit 102 acquires image frames one frame at a time (step S202). The cut detection unit 102 then calculates the similarity between the image frame acquired in step S202 and the image frame acquired immediately before it, and detects a cut point based on that similarity.

取得した画像フレームがカット点である場合（ステップＳ２０３，Ｙｅｓ）、ショット区間定義部１０３は、当該カット点から直前のカット点までの間をショット区間として定義する（ステップＳ２０４）。 When the acquired image frame is a cut point (step S203, Yes), the shot section definition unit 103 defines a section from the cut point to the immediately preceding cut point as a shot section (step S204).

以上ステップＳ２０２からステップＳ２０４の処理を繰り返す。映像（番組）全体についてのショット区間の定義が完了すると（ステップＳ２０１，Ｙｅｓ）、ショット区間定義処理が完了し、グループ化処理に進む。 The processing from step S202 to step S204 is repeated as described above. When the definition of the shot section for the entire video (program) is completed (step S201, Yes), the shot section definition process is completed and the process proceeds to the grouping process.
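The loop over steps S202 to S204 can be sketched as follows. The text here does not fix a particular similarity measure, so this example assumes each frame is summarized by a color histogram and uses histogram intersection with a fixed threshold; the function names and the threshold value are likewise illustrative assumptions.

```python
def histogram_similarity(h1, h2):
    """Histogram intersection normalised to [0, 1] (assumed measure)."""
    total = sum(h1)
    return sum(min(a, b) for a, b in zip(h1, h2)) / total if total else 1.0

def shot_sections(histograms, threshold=0.5):
    """Return (first_frame, last_frame) index pairs, one per shot.

    A cut point is declared wherever consecutive frames are less similar
    than `threshold`; the frames between two cut points form one shot.
    """
    sections, start = [], 0
    for i in range(1, len(histograms)):
        if histogram_similarity(histograms[i - 1], histograms[i]) < threshold:
            sections.append((start, i - 1))   # close the shot at the cut
            start = i
    if histograms:
        sections.append((start, len(histograms) - 1))  # final shot
    return sections

# Five hypothetical two-bin histograms with one abrupt change.
frames = [[10, 0], [9, 1], [0, 10], [1, 9], [1, 9]]
print(shot_sections(frames))
```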

類似ショット検出部１０４は、所定のショットを基準ショットとして選択し、当該ショットと比較すべき対象ショットとの類似度を判定する（ステップＳ２０７）。そして、対象ショットが基準ショットと類似していると判断した場合には（ステップＳ２０８，Ｙｅｓ）、グループ化部１１０は、当該対象ショットと基準ショットに対して同一のグループを識別するラベルを付与する。すなわち、対象ショットと基準ショットとをグループ化する（ステップＳ２０９）。 The similar shot detection unit 104 selects a given shot as the reference shot and determines the similarity between it and a target shot to be compared (step S207). If the target shot is judged to be similar to the reference shot (step S208, Yes), the grouping unit 110 assigns the target shot and the reference shot a label identifying the same group. That is, the target shot and the reference shot are grouped (step S209).

以上のステップＳ２０７およびステップＳ２０８の処理を、１つの基準ショットに対する全ての対象ショットについて繰り返す。全ての対象ショットに対して処理が完了すると（ステップＳ２０６，Ｙｅｓ）、基準ショットを替えて、再度ステップＳ２０７およびステップＳ２０８の処理を繰り返す。 The processes in steps S207 and S208 are repeated for all target shots for one reference shot. When the processing is completed for all target shots (step S206, Yes), the reference shot is changed, and the processing of step S207 and step S208 is repeated again.

そして、映像全体について基準ショットと対象ショットとの類似度判定処理が完了すると（ステップＳ２０５，Ｙｅｓ）、グループ化処理が完了し、次のメタショット生成処理に進む。 When the similarity determination process between the reference shot and the target shot is completed for the entire video (step S205, Yes), the grouping process is completed and the process proceeds to the next metashot generation process.
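The grouping loop (steps S205 to S209) can be sketched as a pairwise comparison that assigns a shared label to similar shots. Representing each shot by a single feature value, passing the similarity test in as a function, and merging two existing groups when a similar pair links them are assumptions made for this illustration.

```python
def group_shots(features, similar):
    """Assign a group label to each shot; shots with no similar partner get None.

    features: one feature per shot (hypothetical representation).
    similar: function(feature_a, feature_b) -> bool.
    """
    labels = [None] * len(features)
    next_label = 0
    for ref in range(len(features)):              # reference shot
        for tgt in range(ref + 1, len(features)):  # target shots
            if not similar(features[ref], features[tgt]):
                continue
            if labels[ref] is None and labels[tgt] is None:
                labels[ref] = labels[tgt] = next_label
                next_label += 1
            elif labels[ref] is None:
                labels[ref] = labels[tgt]
            elif labels[tgt] is None:
                labels[tgt] = labels[ref]
            elif labels[ref] != labels[tgt]:
                # Assumed policy: a similar pair merges the two groups.
                old, new = labels[tgt], labels[ref]
                labels = [new if l == old else l for l in labels]
    return labels

feats = [1.0, 5.0, 1.1, 9.0, 5.2, 1.05]          # hypothetical features
close = lambda a, b: abs(a - b) < 0.3
print(group_shots(feats, close))
```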

メタショット先頭グループ判定部１０５は、統計値保持部１２２に保持される統計情報に基づいて、先頭グループを特定する。そして、メタショット先頭時刻判定部１０６は、メタショット先頭グループ判定部１０５が特定した先頭グループに基づいて、メタショット先頭時刻を定義する。処理対象となっているグループが先頭グループの条件に合致すると（ステップＳ２１１）、メタショット生成部１０７は、当該グループを先頭ショットとするメタショットを生成する（ステップＳ２１２）。 The meta shot head group determination unit 105 specifies the head group based on the statistical information held in the statistical value holding unit 122. The meta shot head time determination unit 106 then defines the meta shot head times based on the head group specified by the meta shot head group determination unit 105. When the group being processed satisfies the head group condition (step S211), the meta shot generation unit 107 generates a meta shot whose head shot belongs to that group (step S212).

以上ステップＳ２１１およびステップＳ２１２を繰り返す。映像全体についてメタショットの生成が完了すると（ステップＳ２１０，Ｙｅｓ）、メタショット生成処理が完了し、動画像処理が完了する。 Steps S211 and S212 are repeated. When meta shot generation is finished for the entire video (step S210, Yes), the meta shot generation process is complete and the moving image processing ends.

なお、既出のように解析パラメータ受信ステップ（図示せず）が存在し、本処理前あるいは本処理中に解析パラメータ受信ステップによってインターネットなどから受信された番組ごとの最適パラメータ設定を用いてステップＳ２０３，Ｓ２０７，Ｓ２１１が検出、判定処理を行ってもよい。 Note that there is an analysis parameter reception step (not shown) as described above, and using the optimal parameter setting for each program received from the Internet or the like by the analysis parameter reception step before or during this processing, step S203, S207 and S211 may perform detection and determination processing.

以上のように、実施例１にかかる動画像処理装置１０は、同一のグループに属するショットの出現パターンに基づいて先頭ショットを特定するので、必要以上に細かいメタショットを生成することを避けることができる。これにより、ユーザによる所定のシーンの検索等を容易にすることができる。 As described above, since the moving image processing apparatus 10 according to the first embodiment specifies the first shot based on the appearance pattern of shots belonging to the same group, it is possible to avoid generating a metashot that is finer than necessary. it can. As a result, the user can easily search for a predetermined scene.

動画像処理装置１０における動画像処理は、（１）ショット区間定義処理、（２）グループ化処理、（３）メタショット生成処理の３つの処理（図２の破線で囲まれた部分）で構成されている。実施例においては、動画像に含まれる全てのショットに対して（１）ショット区間定義処理が完了した後に、（２）グループ化処理に移行した。同様に、動画像に含まれる全てのショットに対して（２）グループ化処理が完了した後に、（３）メタショット生成処理に移行した。これにかえて、他の例としては、動画像処理装置に一時記憶領域（図示せず）を設けることにより、映像の入力を行いながら上記３つの処理を並行して実行してもよい。 The moving image processing in the moving image processing apparatus 10 includes three processes (a part surrounded by a broken line in FIG. 2): (1) shot section definition processing, (2) grouping processing, and (3) metashot generation processing. Has been. In the embodiment, after (1) shot section definition processing is completed for all shots included in the moving image, the processing proceeds to (2) grouping processing. Similarly, after (2) grouping processing is completed for all shots included in the moving image, the process proceeds to (3) metashot generation processing. Instead of this, as another example, by providing a temporary storage area (not shown) in the moving image processing apparatus, the above three processes may be executed in parallel while inputting the video.

例えば、新しいカットが検出され、ショット区間が定義されるたびに、そのショット区間と過去のショット区間に対する類似ショットの判定を行い、そこまでの類似ショット判定結果に基づいて当座のメタショット生成を行ってもよい。このように、並列に処理を実行することによりニュース番組の終了後、きわめて短い時間で処理結果を得ることができる。 For example, each time a new cut is detected and a shot section is defined, similar shots are determined for the shot section and the past shot section, and a current meta shot is generated based on the previous similar shot determination results. May be. As described above, by executing the processing in parallel, the processing result can be obtained in a very short time after the news program ends.

図７は、動画像処理装置１０のハードウェア構成を示す図である。動画像処理装置１０は、ハードウェア構成として、動画像処理装置１０における動画像処理を実行するプログラムなどが格納されているＲＯＭ５２、ＲＯＭ５２内のプログラムに従って動画像処理装置１０の各部を制御し、動画像処理等を実行するＣＰＵ５１、ワークエリアが形成され、動画像処理装置１０の制御に必要な種々のデータが記憶されているＲＡＭ５３、ネットワークに接続して、通信を行う通信I／Ｆ５７、および各部を接続するバス６２を備えている。 FIG. 7 is a diagram showing the hardware configuration of the moving image processing apparatus 10. As its hardware configuration, the moving image processing apparatus 10 includes: a ROM 52 that stores, among other things, the program for executing the moving image processing of the apparatus; a CPU 51 that controls each unit of the apparatus according to the program in the ROM 52 and executes the moving image processing and the like; a RAM 53 in which a work area is formed and which stores various data needed to control the apparatus; a communication I/F 57 that connects to a network and performs communication; and a bus 62 that connects these units.

先に述べた動画像処理装置１０における動画像処理を実行する動画像処理プログラムは、インストール可能な形式又は実行可能な形式のファイルでＣＤ−ＲＯＭ、フロッピー（Ｒ）ディスク（ＦＤ）、ＤＶＤ等のコンピュータで読み取り可能な記録媒体に記録されて提供される。 The moving image processing program for executing the moving image processing in the moving image processing apparatus 10 described above is an installable or executable file, such as a CD-ROM, floppy (R) disk (FD), DVD, or the like. The program is provided by being recorded on a computer-readable recording medium.

また、本実施例の動画像処理プログラムを、インターネット等のネットワークに接続されたコンピュータ上に格納し、ネットワーク経由でダウンロードさせることにより提供するように構成しても良い。 Further, the moving image processing program of the present embodiment may be provided by being stored on a computer connected to a network such as the Internet and downloaded via the network.

この場合には、動画像処理プログラムは、動画像処理装置１０において上記記録媒体から読み出して実行することにより主記憶装置上にロードされ、上記ソフトウェア構成で説明した各部が主記憶装置上に生成されるようになっている。 In this case, the moving image processing program is loaded onto the main storage device by being read from the recording medium and executed by the moving image processing device 10, and each unit described in the software configuration is generated on the main storage device. It has become so.

次に、実施例２にかかる動画像処理装置１０について説明する。実施例２に係る動画像処理装置１０は、例えば、図４に示すニュース番組におけるニュース項目１におけるＡショット７１４，７１６のように、メタショットの先頭ショットとならないショットを、対話度数という指標に基づいて特定する。ここで、対話度数とは、先頭ショットとすべきか否か問題となっているショットを含む所定の時間範囲内における当該ショットの出現頻度を示す値である。なお、対話度数については後に詳述する。 Next, the moving image processing apparatus 10 according to the second embodiment will be described. The moving image processing apparatus 10 according to the second embodiment uses, for example, a shot that does not become the first shot of the meta shot, such as the A shots 714 and 716 in the news item 1 in the news program shown in FIG. To identify. Here, the interaction frequency is a value indicating the appearance frequency of the shot within a predetermined time range including the shot in question whether it should be the first shot. The dialogue frequency will be described in detail later.

図８は、実施例２にかかる動画像処理装置１０の機能構成を示すブロック図である。実施例２にかかる動画像処理装置１０は、実施例１にかかる動画像処理装置１０の機能構成に加えて、対話度数算出部１２４をさらに備えている。対話度数算出部１２４は、ショット区間定義部１０３、類似ショット検出部１０４、グループ化部１１０から取得した情報に基づいて、対話度数を算出する。また、メタショット先頭時刻判定部１０６は、対話度数算出部１２４が算出した対話度数に基づいてメタショットの先頭とすべきショットを特定する。 FIG. 8 is a block diagram of a functional configuration of the moving image processing apparatus 10 according to the second embodiment. In addition to the functional configuration of the moving image processing apparatus 10 according to the first embodiment, the moving image processing apparatus 10 according to the second embodiment further includes an interaction frequency calculation unit 124. The interaction frequency calculation unit 124 calculates the interaction frequency based on the information acquired from the shot section definition unit 103, the similar shot detection unit 104, and the grouping unit 110. Further, the meta shot head time determination unit 106 specifies a shot to be the head of the meta shot based on the conversation frequency calculated by the interaction frequency calculation unit 124.

なお、実施例にかかる対話度数算出部１２４は、本発明にかかる対話度数算出手段、基準範囲特定手段、ショット時間長算出手段、第１のショット個数算出手段、第２のショット個数算出手段を構成する。 The interaction frequency calculation unit 124 according to the embodiment includes an interaction frequency calculation unit, a reference range specifying unit, a shot time length calculation unit, a first shot number calculation unit, and a second shot number calculation unit according to the present invention. To do.

図９は、に示したショット７１１からショット７２０、すなわちアンカーパーソンと特派員の対話シーンを示している。図９を参照しつつ、対話度数算出部１２４が、Ａショット７１４が先頭メタショットか否かを検討する際に利用される対話度数を算出する処理について説明する。 FIG. 9 shows the shots 711 to 720 described earlier, that is, the dialogue scene between the anchor person and the correspondent. With reference to FIG. 9, the following describes how the interaction frequency calculation unit 124 calculates the interaction frequency that is used when examining whether the A shot 714 is the head of a meta shot.

まず、対象となるＡショット７１４の中央の時刻から前後に規定時間（たとえば１分、３０秒など）だけ離れた時刻までの範囲を基準範囲とし、基準範囲に存在するショットを抜き出す。図９に示す動画像においては、ショット７１１からショット７１７を抜き出す。本実施例においては、ショット７１１およびショット７１７のように、基準範囲にショットの一部が含まれているようなショットも抜き出す。さらにショット７１１からショット７１７までの間から、いずれかのグループに属するショットを抜き出す。 First, the range extending a specified time (for example, 1 minute or 30 seconds) before and after the central time of the target A shot 714 is taken as the reference range, and the shots that lie in the reference range are extracted. In the moving image shown in FIG. 9, shots 711 through 717 are extracted. In this embodiment, shots only partly contained in the reference range, such as shots 711 and 717, are also extracted. Then, from shots 711 through 717, the shots that belong to any group are extracted.

そして、ショット７１１からショット７１７までの合計時間に対する、グループに属するショットの合計時間の割合を算出する。このグループに属するショットの合計時間の割合が対話度数である。 Then, the ratio of the total time of the shots belonging to the group to the total time from the shot 711 to the shot 717 is calculated. The ratio of the total time of shots belonging to this group is the interaction frequency.

図９に示す動画像においては、いずれかのグループに属するショットは、ショット７１２〜７１６である。従って、ショット７１２からショット７１６までの合計時間をショット７１１〜ショット７１７までの合計時間で除算した結果がショット７１４に対する対話度数であり、図９に示す動画像においては、対話度数は、０．８８となる。 In the moving image shown in FIG. 9, shots belonging to any group are shots 712 to 716. Therefore, the result of dividing the total time from the shot 712 to the shot 716 by the total time from the shot 711 to the shot 717 is the interaction frequency for the shot 714. In the moving image shown in FIG. 9, the interaction frequency is 0.88. It becomes.
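The calculation just described can be sketched as below. The shot timings are hypothetical and are not those of FIG. 9, so the result differs from the 0.88 quoted above; the tuple layout `(group_id, start, end)` and the function name are assumptions.

```python
def interaction_frequency(shots, target_index, window):
    """Ratio of grouped shot time to total shot time around the target shot.

    shots: list of (group_id, start, end); group_id None means ungrouped.
    window: the specified time taken before and after the target's centre.
    """
    _, t_start, t_end = shots[target_index]
    centre = (t_start + t_end) / 2
    lo, hi = centre - window, centre + window
    # Keep every shot that overlaps the reference range, even partially,
    # and count its full length.
    picked = [(g, s, e) for g, s, e in shots if s < hi and e > lo]
    total = sum(e - s for _, s, e in picked)
    grouped = sum(e - s for g, s, e in picked if g is not None)
    return grouped / total if total else 0.0

# Hypothetical dialogue scene: ungrouped shots at both ends.
scene = [(None, 5, 20), ("A", 20, 30), ("B", 30, 40), ("A", 40, 50),
         ("B", 50, 60), ("A", 60, 70), (None, 70, 80)]
print(interaction_frequency(scene, target_index=3, window=30))
```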

図９に示したアンカーパーソンと特派員の対話シーンのように、ニュース項目中に配置されたショットＡに対しては、高い対話度数が算出される。従って、対話度数が一定値以上となるショットは、先頭ショットとして選択しないことにより、先頭ショット以外のショットを除外することができる。すなわち、メタショット先頭時刻判定部１０６は、対話度数算出部１２４から対話度数を取得し、取得した対話度数が一定値よりも小さい場合に、当該ショットを先頭ショットとして特定する。 A high interaction frequency is calculated for an A shot placed within a news item, as in the dialogue scene between the anchor person and the correspondent shown in FIG. 9. Therefore, by not selecting as a head shot any shot whose interaction frequency is equal to or greater than a certain value, shots that are not head shots can be excluded. That is, the meta shot head time determination unit 106 acquires the interaction frequency from the interaction frequency calculation unit 124 and specifies the shot as a head shot when the acquired value is smaller than the certain value.

The calculation of the interaction frequency is not limited to the procedure described above. In this embodiment, the ratio of grouped shots is calculated over a reference range extending a specified time before and after the center time of the target shot, but the specified time need not be measured from the center of the shot. For example, on the earlier side (toward the beginning of the program) the range may extend the specified time back from the start time of the target shot, and on the later side (toward the end of the program) it may extend the specified time forward from the end time of the target shot.

Also, in the embodiment a shot is extracted when any part of it lies within the reference range; alternatively, only shots that lie entirely within the reference range may be extracted. As another example, when only part of a shot lies within the reference range, only the playback duration of that part may be counted toward the interaction frequency.

Also, in the embodiment the "ratio of shots belonging to any group" is calculated; alternatively, the "ratio of shots belonging to groups all of whose shots lie within the reference range" may be calculated. That is, if a group also has shots located outside the reference range, the shots of that group are excluded from the ratio calculation.

Also, in the embodiment the ratio of playback durations is calculated; alternatively, the ratio of shot counts may be calculated. Specifically, the number of all shots in the reference range is counted, and the number of shots that are in the reference range and belong to any group is counted. The ratio of the number of grouped shots to the number of all shots is then calculated, and this ratio is the interaction frequency. In the example of FIG. 9, 5 shots belong to a group and 7 shots are in the reference range, so the interaction frequency is 5/7.
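The count-based variant reduces to a simple fraction. The sketch below is illustrative only; the function name and the group labels assigned to shots 711 to 717 are assumptions (the text states only that shot 714 is an A shot and that shots 712 to 716 are grouped).

```python
from fractions import Fraction
from typing import Optional, Sequence

def interaction_frequency_by_count(groups_in_range: Sequence[Optional[str]]) -> Fraction:
    """Number of grouped shots divided by the number of all shots in the range."""
    total = len(groups_in_range)
    if total == 0:
        return Fraction(0)
    grouped = sum(1 for g in groups_in_range if g is not None)
    return Fraction(grouped, total)

# Hypothetical labels for shots 711..717; None marks an ungrouped shot.
labels = [None, "A", "B", "A", "B", "A", None]
result = interaction_frequency_by_count(labels)  # Fraction(5, 7)
```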

As another way of calculating the interaction frequency, the number of shots that are in the reference range and belong to any group may be divided by the length of the reference range. This value represents the number of dialogue-involved shots per unit time. In the example of FIG. 9, it is 5 shots / 50 seconds = 0.1 shots per second.

As yet another way of calculating the interaction frequency, the product of the playback-duration ratio and the above number of dialogue-involved shots per unit time may be used. For FIG. 9, this is 0.88 × 0.1 = 0.088. An interaction frequency defined this way grows with the number of shots participating in the dialogue within the reference range and with the time those shots occupy. In other words, the more rapidly and the more often similar shots recur within the reference range, the larger the value, so it can be expected to serve as an indicator of how lively the dialogue is.
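The per-unit-time measure and the product measure can be checked with the figures quoted for FIG. 9 (5 grouped shots, a 50-second reference range, and a time ratio of 0.88). The function names below are illustrative assumptions.

```python
import math

def shots_per_second(n_grouped_shots: int, range_seconds: float) -> float:
    """Dialogue-involved shots per unit time."""
    return n_grouped_shots / range_seconds

def combined_frequency(time_ratio: float, n_grouped_shots: int, range_seconds: float) -> float:
    """Product of the playback-duration ratio and the shots-per-second rate."""
    return time_ratio * shots_per_second(n_grouped_shots, range_seconds)

rate = shots_per_second(5, 50.0)               # 0.1 shots per second
combined = combined_frequency(0.88, 5, 50.0)   # approximately 0.088
```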

Also, in the embodiment a reference range centered on a given shot is specified; alternatively, a reference range centered on a given time may be specified.

FIG. 10 shows the interaction frequency calculated by the method described with reference to FIG. 9. The horizontal axis of the graph indicates time and the vertical axis indicates the interaction frequency. The graph was calculated for an actual news program. 1001 corresponds to a part where the performers briefly discuss a news item in the studio. 1002 corresponds to a part where a guest appears in the studio and the newscaster interviews and debates with the guest. 1003 corresponds to the sports segment, in which the sports newscaster appears repeatedly. In addition, in the part reporting on baseball, pitcher shots, batter shots, and the like appear repeatedly as similar shots. This is because shots of different batters or different pitchers are judged to be similar when the camera angle is the same.

In the scene corresponding to 1002, the newscaster acting as interviewer appears repeatedly within a short time, so there is a concern that all of these shots would be identified as head shots. The metashot head time determination unit 106 may therefore, in a section where the interaction frequency is at or above a specified value, identify only the first of the metashot head group shots in that section as a head shot. This keeps a dialogue scene from producing a large number of metashot head shots. In the graph of FIG. 10, the specified value may be set to an interaction frequency of 0.2, for example.
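One way to sketch this suppression is shown below. It is a simplification under stated assumptions: the candidates are the head-group shots in time order, each paired with its interaction frequency, and a run of consecutive candidates at or above the threshold is treated as one dialogue section. The function name and sample values are hypothetical; only the threshold of 0.2 comes from the text.

```python
from typing import List, Tuple

def select_head_shots(candidates: List[Tuple[float, float]],
                      threshold: float = 0.2) -> List[float]:
    """candidates: (start_time, interaction_frequency) of head-group shots, in time order.
    Returns the start times of the shots kept as head shots."""
    heads: List[float] = []
    in_dialogue = False
    for start, freq in candidates:
        if freq >= threshold:
            if not in_dialogue:
                heads.append(start)   # only the first shot of a dialogue section is kept
            in_dialogue = True
        else:
            heads.append(start)       # outside dialogue sections every candidate is kept
            in_dialogue = False
    return heads

# Hypothetical candidates: an interview between 100 s and 125 s.
candidates = [(0.0, 0.05), (100.0, 0.5), (110.0, 0.6), (125.0, 0.4), (300.0, 0.1)]
heads = select_head_shots(candidates)
```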

When the interaction frequency is calculated per time instant rather than per shot, the time at which it enters or leaves the specified range may not coincide with a shot boundary. In that case, the metashot boundary need not coincide with a shot boundary.

For example, when a head shot is detected, the start time may be set to the time closest to that head shot among the times at which the interaction frequency rises to or above the specified value. Alternatively, it may be set to the time closest to the head shot among the times at which the interaction frequency falls to or below the specified value.
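The first of these two options can be sketched as follows, assuming the interaction frequency is available as a time-ordered list of samples. The sampling scheme, the function name, and the sample values are assumptions made for illustration; the text does not specify how the per-time frequency is represented.

```python
from typing import List, Optional, Tuple

def nearest_rising_crossing(samples: List[Tuple[float, float]],
                            threshold: float,
                            head_time: float) -> Optional[float]:
    """samples: (time, interaction_frequency) pairs in time order.
    Returns the sample time closest to head_time at which the frequency
    rises from below the threshold to at or above it, or None if none exists."""
    crossings = [t1 for (t0, v0), (t1, v1) in zip(samples, samples[1:])
                 if v0 < threshold <= v1]
    if not crossings:
        return None
    return min(crossings, key=lambda t: abs(t - head_time))

# Hypothetical samples every 5 seconds; the head shot starts at 12 s.
samples = [(0, 0.0), (5, 0.1), (10, 0.3), (15, 0.4), (20, 0.1), (25, 0.25)]
start = nearest_rising_crossing(samples, threshold=0.2, head_time=12.0)
```

The second option (times at which the frequency falls to or below the value) would use the mirrored condition on consecutive samples.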

The method of determining the metashot boundary time is described more concretely with reference to FIG. 11, which schematically shows a head shot and the interaction frequency around it.

1301 is the metashot head shot determined on a per-shot basis. 1302 indicates the interaction frequency. 1303 is the specified value: a section in which the interaction frequency is at or above this value is judged to be a dialogue section.

For the head shot shown in FIG. 11, the metashot head time is 1304 when metashots are defined per shot, whereas it is 1305 when metashots are defined per time instant. Thus, when the interaction frequency is calculated per time instant, a position that differs from a shot boundary may be used as the metashot head time.

Another way of defining metashots is illustrated next. FIG. 25 shows, for the 61st through 74th shots of a moving image, the duration of each shot and the name of the similar-shot group to which it belongs. The "interaction frequency" row shows the product, described above, of the playback-duration ratio over the reference range and the number of dialogue-involved shots per unit time. Here the reference range is the section from shot 61 to shot 61 for the column of shot number 61, the section from shot 61 to shot 62 for the column of shot number 62, and so on up to the section from shot 61 to shot 74 for the column of shot number 74. For convenience, the values are given in units of reciprocal minutes; dividing a value by 60 gives the value in reciprocal seconds.

For example, looking only at the section from shot 61 to shot 64, the only shots that recur within it are shots 62 and 64, which belong to group B, and their total duration is 9 seconds. Since the total duration of shots 61 through 64 is 27 seconds, the interaction frequency is (9 s ÷ 27 s) × (2 shots ÷ 27 s × 60 s/min) ≈ 1.5.

When the reference range is extended in this way with shot 61 as the start shot, the interaction frequency reaches its maximum for the section ending at shot 69. If the minimum interaction frequency for a section to be regarded as a dialogue section is set in advance to, say, 2, the interaction frequency of 8.1 for shots 61 through 69 exceeds it. The metashot head time determination unit 106 therefore sets shots 61 through 69 as a dialogue section.
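The range-extension procedure can be sketched as below. The shot durations and group labels are hypothetical, since the contents of FIG. 25 are not reproduced in the text; they are chosen only so that shots 61 to 64 total 27 seconds with 9 seconds of recurring group B shots, matching the worked example. Treating a shot as "dialogue-involved" when its group occurs at least twice inside the range is an interpretation of the text, and the function names are illustrative.

```python
from collections import Counter
from typing import List, Optional, Tuple

ShotInfo = Tuple[float, Optional[str]]  # (duration in seconds, group label or None)

def frequency_per_minute(shots: List[ShotInfo]) -> float:
    """Time ratio of recurring shots multiplied by recurring shots per minute."""
    total = sum(d for d, _ in shots)
    if total == 0:
        return 0.0
    counts = Counter(g for _, g in shots if g is not None)
    involved = [d for d, g in shots if g is not None and counts[g] >= 2]
    return (sum(involved) / total) * (len(involved) / total * 60.0)

def best_dialogue_end(shots: List[ShotInfo], start: int,
                      minimum: float = 2.0) -> Optional[Tuple[int, float]]:
    """Extend the range from `start` one shot at a time and return the end index
    with the highest frequency, or None if the peak stays below `minimum`."""
    best_end, best_value = start, 0.0
    for end in range(start, len(shots)):
        value = frequency_per_minute(shots[start:end + 1])
        if value > best_value:
            best_end, best_value = end, value
    return (best_end, best_value) if best_value >= minimum else None

# Hypothetical data; index 0 stands for shot 61.
shots = [(10, "A"), (4, "B"), (8, "C"), (5, "B"), (3, "A"), (4, "B"),
         (3, "A"), (4, "B"), (3, "A"), (30, None), (25, None)]
first_four = frequency_per_minute(shots[:4])   # (9/27) * (2/27 * 60), about 1.48
best = best_dialogue_end(shots, start=0)
```

With this data the peak falls at index 8, which corresponds to shot 69; the peak value differs from the 8.1 quoted in the text because the durations are invented.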

A dialogue section set in this way may be used directly as a metashot, with the metashot head time determination unit 106 taking the start time of the metashot's first shot as the metashot head time. Alternatively, the shot at the beginning or the end of the dialogue section may be used as the head of the metashot. Further, the first or last shot in the dialogue section that belongs to any similar-shot group may be used as the head of the metashot. Likewise, the first or last shot in the dialogue section that belongs to a group judged by the metashot head group determination unit 105 to be a metashot head group may be used as the head of the metashot.

FIG. 12 is a flowchart of the moving image processing performed by the moving image processing apparatus 10 according to the second embodiment. In this apparatus, once the grouping process is complete, the interaction frequency calculation unit 124 calculates the interaction frequency of the shot to be processed among the shots in the head group (step S220), and the process then proceeds to step S210. In the process of identifying the head shot (steps S211 and S212), the metashot head time determination unit 106 identifies the head shot based on the interaction frequency calculated by the interaction frequency calculation unit 124 and defines the start time of the identified head shot as the metashot head time.

The remaining configuration and processing of the moving image processing apparatus 10 according to the second embodiment are the same as those of the moving image processing apparatus 10 according to the first embodiment.

Note that when head shots are identified on the condition that similar shots appear repeatedly, a screen 756 that lists news items as bullet points, as shown in FIG. 5, may be identified as a head shot. Like the anchorperson, a screen such as screen 756 often marks a change of news item in a news program, so there is no problem in treating such a screen as a head shot. Shots identified as head shots are therefore not limited to anchorperson shots.

The dialogue section definition described above can also be used to divide a variety program or the like into its segments. An example of this method is described below. FIG. 26 schematically shows the pattern in which similar shots appear in a program that introduces trivia. In this program, talk among the performers in the studio alternates with trivia introduction videos, and before each trivia introduction video the name of the person who submitted that piece of trivia to the program is presented.

In the studio talk scenes, shots of the performers appear alternately. In the trivia introduction videos, on the other hand, similar shots tend either to be almost absent or to recur only within a single video. Consequently, when dialogue sections are defined by the method described above, the studio talk reacting to each piece of trivia and some parts of the trivia introduction videos become dialogue sections.

Therefore, by selecting a similar-shot group that was never included in a dialogue section (or was included only a very small number of times), a shot that is characteristic of each segment, such as "M" in FIG. 26, can be picked out. Similar shot M is the shot that presents the name of the person who submitted the trivia, and it follows similar shot A, in which the host announces, "Now for the next piece of trivia."

Similar shot M does not appear within dialogue section 2501, so extending dialogue section 2501 by one shot to take in the similar shot M that follows it would lower the interaction frequency. Dialogue section 2501 is therefore defined as shown in FIG. 26, and as a result similar shot M does not belong to any dialogue section.

By treating the shots of a similar-shot group that belongs to no dialogue section as metashot head shots in this way, a variety program can also be divided into its segments. When several similar-shot groups belong to no dialogue section, they may be narrowed down further using their temporal distribution, average length, and so on.
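Selecting the groups that fall outside every dialogue section can be sketched as follows. The representation of dialogue sections as inclusive shot-index ranges, the function name, and the sample sequence are assumptions made for illustration; the `max_hits` parameter models the "very small number of times" allowance.

```python
from typing import Dict, List, Optional, Tuple

def groups_outside_dialogue(shot_groups: List[Optional[str]],
                            sections: List[Tuple[int, int]],
                            max_hits: int = 0) -> List[str]:
    """Return the groups whose shots fall inside dialogue sections at most max_hits times.
    sections: inclusive (first_index, last_index) shot ranges."""
    hits: Dict[str, int] = {}
    for index, group in enumerate(shot_groups):
        if group is None:
            continue
        inside = any(first <= index <= last for first, last in sections)
        hits[group] = hits.get(group, 0) + (1 if inside else 0)
    return sorted(g for g, n in hits.items() if n <= max_hits)

# Hypothetical sequence: studio talk (A/B), name card (M), trivia video (ungrouped).
shot_groups = ["A", "B", "A", "B", "A", "M", None, None, "A", "B", "A", "M", None]
sections = [(0, 4), (8, 10)]
segment_markers = groups_outside_dialogue(shot_groups, sections)
```

In this example only group M never appears inside a dialogue section, so its shots would serve as the segment heads.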

This method of dividing a variety program into segments can also be applied to certain quiz programs. Consider, for example, a program in which the studio contestants discuss after watching a question video and then an answer video is played, and in which a fixed full-screen pattern such as "Question" is displayed before each question video and "Answer" before each answer video. In such a case, the "Question" and "Answer" shots are also likely to belong to no dialogue section.

As described above, the moving image processing apparatus 10 according to the second embodiment identifies head shots based on the interaction frequency and can therefore generate more appropriate metashots.

Next, the moving image processing apparatus 10 according to the third embodiment is described. This apparatus determines the program type of the acquired moving image. Here, the program type is a category such as news program, drama, or sports program.

FIG. 13 is a block diagram showing the functional configuration of the moving image processing apparatus 10 according to the third embodiment. This apparatus includes a program type determination unit 130 in place of the metashot head group determination unit 105 and the metashot head time determination unit 106 of the apparatus according to the first embodiment. Once similar shots have been grouped, the program type determination unit 130 determines the type of the input program based on the temporal distribution of the similar shots. Program type information indicating the determined program type is output from the moving image output unit 108 to an external device.

An external device that has acquired the program type information can perform processing suited to the program type. When the external device is a recording device such as a hard disk recorder, the information may be used to change the recording bit rate or to display the determined program type in the list of recorded programs. It may also be used to set the determination parameters for cut detection and similar-shot detection automatically.

The program type determination unit 130 according to the embodiment constitutes the shot number comparison means, shortest time length comparison means, longest time length comparison means, time length average value calculation means, average value determination means, reference average time group number measuring means, inter-shot time length measuring means, inter-shot time length determination means, group measuring means, group existence range specifying means, and metashot specifying means of the present invention.

FIG. 14 schematically shows the statistical information held by the statistical value holding unit 122 according to the third embodiment. The statistical information shown in FIG. 14 was generated by the statistical processing unit 120 for the fictional drama program described below.

FIG. 15 schematically shows a drama program. As with the news program shown schematically in FIG. 4, the horizontal axis is the time axis, and the program plays from the top row to the bottom row. The algorithm by which the program type determination unit 130 determines the program type is described in detail with reference to FIG. 15.

The program type determination unit 130 determines whether a program is a news program based on, for example, whether a "group that serves as metashot head shots" exists. That is, if one or more such groups exist, the program is judged to be a news program; if no such group exists, the program is judged to be something other than a news program.

The process of determining whether a "group that serves as metashot head shots" exists is the same as the process, described in the first embodiment, by which the metashot head group determination unit 105 selects the metashot head group.

More specifically, a process for picking out the anchorperson shots in a news program is performed. That is, metashot head shots are determined based on conditions such as whether the number of appearances of the shots in a similar-shot group, the duration of its shortest shot, the duration of its longest shot, the average shot duration, and the time span over which its shots are distributed fall within certain ranges.

The process of defining shots 601 through 602 in FIG. 15 as a single metashot is now described with reference to FIG. 16. This procedure is also described in the literature cited earlier (Aoki et al., "Video Interface Using Hierarchical Icons by Integration of Repeated Shots," IPSJ Journal, Vol. 39, No. 5, pp. 1317-1324, 1998).

In FIG. 16, the shots belonging to similar-shot group A appear within time range 1101. Likewise, groups B and C appear within time ranges 1102 and 1103. Regarding these time ranges as sets on the time axis and taking their union yields time range 1104. Through this process, time range 1104 (that is, from 601 to 602 in FIG. 16) can be defined as a metashot.
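The union step can be sketched as an interval merge: each group's time range runs from its first shot to its last shot, and overlapping group ranges are fused into one metashot. The data layout, function names, and timings below are illustrative assumptions, not values from FIG. 16.

```python
from typing import Dict, List, Optional, Tuple

Shot = Tuple[float, float, Optional[str]]  # (start, end, group label or None)

def group_ranges(shots: List[Shot]) -> Dict[str, Tuple[float, float]]:
    """Time range spanned by each group, from its earliest start to its latest end."""
    ranges: Dict[str, Tuple[float, float]] = {}
    for start, end, group in shots:
        if group is None:
            continue
        lo, hi = ranges.get(group, (start, end))
        ranges[group] = (min(lo, start), max(hi, end))
    return ranges

def merge_ranges(ranges: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Union of time ranges; each merged range is one metashot."""
    merged: List[Tuple[float, float]] = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

# Hypothetical shots: groups A, B, C overlap in time, group D stands apart.
shots = [(0, 5, "A"), (10, 15, "B"), (20, 25, "A"), (30, 35, "B"),
         (33, 38, "C"), (45, 50, "C"), (80, 85, "D"), (90, 95, "D")]
metashots = merge_ranges(list(group_ranges(shots).values()))
```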

After metashots have been defined by the above process, the program type is determined based on the pattern in which the metashots appear.

Specifically, groups are selected in which the shortest shot, that is, the shot with the shortest playback duration among the shots belonging to the group, has a playback duration of 10 seconds or more. If the number of selected groups is 50% or more of the number of all groups in the entire program, the program is judged to be a drama.

In general terms, groups whose shortest shot has a playback duration at or above a predetermined value are selected, and the program is judged to be a drama when the ratio of the number of selected groups to the number of all groups in the entire program is at or above a predetermined value.
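A minimal sketch of this drama test is given below, using the 10-second and 50% figures from the text as default parameters. The dictionary layout, the function name, and the sample durations are assumptions made for illustration.

```python
from typing import Dict, List

def is_drama(group_durations: Dict[str, List[float]],
             min_shortest: float = 10.0,
             min_ratio: float = 0.5) -> bool:
    """group_durations maps each similar-shot group to its shot durations in seconds.
    A group is selected when even its shortest shot lasts at least min_shortest."""
    if not group_durations:
        return False
    selected = [g for g, ds in group_durations.items() if ds and min(ds) >= min_shortest]
    return len(selected) / len(group_durations) >= min_ratio

# Hypothetical programs.
drama_like = {"A": [12, 15, 30], "B": [11, 20], "C": [3, 40], "D": [10, 10]}
news_like = {"A": [8, 9], "B": [2, 3], "C": [12, 14]}
```

Here `is_drama(drama_like)` selects three of four groups and returns `True`, while `is_drama(news_like)` selects one of three and returns `False`.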

As another example of the determination condition, instead of requiring the duration of the shortest shot to be at or above a predetermined value, the condition may require the duration of the shortest shot to be at or above a predetermined fraction of the duration of the entire program.

Also, instead of requiring the ratio of the number of selected groups to the number of all groups in the entire program to be at or above a predetermined value, the condition may require the ratio of the total duration of the shots belonging to the selected groups to the duration of the entire program to be at or above a predetermined value. Alternatively, the condition may require the ratio of the number of appearances of the shots belonging to the selected groups to the number of shots in the entire program to be at or above a predetermined value. The metashot referred to here is the metashot described with reference to FIG. 16.

The program type may also be determined based on the number of times metashots appear in the entire program, the total playback time of the metashots appearing in the entire program, the playback duration of the shortest metashot, the playback duration of the longest metashot, the average playback duration of the metashots in the moving image, and the like. In this way, the program type can be determined based on the pattern in which metashots appear.

Alternatively, the average playback duration of the shots belonging to the same group may be calculated, the number of groups whose average falls within a predetermined reference average duration range may be counted, and the program type may be determined based on that count. In this way, the program type may be determined based on the pattern in which the shots contained in metashots appear across the entire program.

As yet another example, for each group the inter-shot playback duration between the shot of that group located first in the program and the shot located last may be measured, the number of groups whose inter-shot playback duration falls within a predetermined reference range may be counted, and the program type may be determined based on that count. In this case too, it can be determined, for example, whether the program is a news program.
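The two group-counting criteria (average shot duration and first-to-last span) can be sketched together as follows. Measuring the span from the start of a group's first shot to the end of its last shot is an assumption, as are the function name, the reference ranges, and the sample timings; the text does not fix these values.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Shot = Tuple[float, float, Optional[str]]  # (start, end, group label or None)

def count_groups_in_ranges(shots: List[Shot],
                           avg_range: Tuple[float, float],
                           span_range: Tuple[float, float]) -> Tuple[int, int]:
    """Return (groups whose average shot duration lies in avg_range,
               groups whose first-to-last span lies in span_range)."""
    by_group: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for start, end, group in shots:
        if group is not None:
            by_group[group].append((start, end))
    n_avg = n_span = 0
    for spans in by_group.values():
        average = sum(e - s for s, e in spans) / len(spans)
        span = max(e for _, e in spans) - min(s for s, _ in spans)
        n_avg += int(avg_range[0] <= average <= avg_range[1])
        n_span += int(span_range[0] <= span <= span_range[1])
    return n_avg, n_span

# Hypothetical shots: group A recurs across the program, group B is a brief exchange.
shots = [(0, 20, "A"), (40, 44, "B"), (50, 53, "B"),
         (300, 330, "A"), (900, 910, "A"), (910, 960, None)]
counts = count_groups_in_ranges(shots, avg_range=(10, 60), span_range=(600, 3600))
```

How the resulting counts map to a program type (for example, news versus non-news) is left open here, as the text only says the determination is based on the counts.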

The program type may be determined based on any one of these conditions, or based on a combination of several conditions selected from among them.

In addition, before or when a moving image is input to the moving image acquisition unit 101, the analysis parameter receiving unit 190 may receive the conditions (parameters) needed for the processing of the cut detection unit 102, the similar shot detection unit 104, and the program type determination unit 130, and supply them to these detection and determination units. As described in the first embodiment, the analysis parameter receiving unit 190 is assumed to receive the parameters from sources such as the Internet or recording media.

FIG. 17 is a flowchart of the moving image processing in the moving image processing apparatus 10 according to the third embodiment. In this processing, (1) the shot section definition process and (2) the grouping process are followed by the program type determination process (step S230), in which the program type determination unit 130 determines the program type of the moving image by the process described above.

As noted earlier, an analysis parameter receiving step (not shown) may be provided, and steps S203, S207, and S230 may perform their detection and determination processing using the optimum per-program parameter settings received from the Internet or the like in that step, before or during this processing.

The remaining configuration and processing of the moving image processing apparatus 10 according to the third embodiment are the same as those of the moving image processing apparatus 10 according to the first embodiment.

As described above, the moving image processing apparatus 10 according to the third embodiment can determine the program type based on the appearance pattern of similar shots, and can therefore improve the efficiency of viewing, searching, and editing video. In addition, when the moving image of the program is processed, processing appropriate to the determined program type can be performed.

Next, the moving image processing apparatus 10 according to the fourth embodiment will be described. The moving image processing apparatus 10 according to the fourth embodiment defines metashots using the interaction frequency. FIG. 18 is a block diagram showing the functional configuration of the moving image processing apparatus 10 according to the fourth embodiment. In addition to the functional configuration of the moving image processing apparatus 10 according to the third embodiment, it further includes an interaction frequency calculation unit 124. The program type determination unit 130 determines the program type based on the interaction frequency calculated by the interaction frequency calculation unit 124.

FIG. 19 is a graph, similar to FIG. 10, showing the interaction frequency calculated for an actual quiz program. Some modifications have been made for the sake of explanation. The black bars 1201 to 1208 on the horizontal axis mark scenes in which the host and the respondents are talking in the studio.

In this program, the opening and the video presenting the first question are shown before black bar 1201. The program then proceeds in the following order: a scene in which the performers answer in the studio, a scene showing the correct-answer video, a scene in which comments are made in the studio, and a scene showing the video presenting the next question.

The graph of FIG. 19 shows that the interaction frequency is high in the studio scenes and low in the opening, the question videos, and the correct-answer videos. This is because similar shots tend not to appear in the opening, question video, and correct-answer video sections. Accordingly, by setting a threshold 1210, a section in which the interaction frequency is equal to or greater than the threshold can be determined to be a metashot of a studio scene.
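The thresholding step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the interaction frequency is available as one value per shot; the function name `sections_at_or_above` and the sample values are hypothetical and do not come from the specification.

```python
from typing import List, Tuple


def sections_at_or_above(values: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (end exclusive) of the maximal runs
    in which the value is equal to or greater than the threshold."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(values):
        if value >= threshold:
            if start is None:
                start = i  # a new candidate section begins here
        elif start is not None:
            runs.append((start, i))  # the section ended at the previous shot
            start = None
    if start is not None:
        runs.append((start, len(values)))  # section runs to the end of the program
    return runs


# Hypothetical per-shot interaction frequencies and threshold.
frequencies = [0.1, 0.2, 0.9, 0.8, 0.1, 0.7, 0.7, 0.7]
studio_sections = sections_at_or_above(frequencies, threshold=0.5)
print(studio_sections)  # [(2, 4), (5, 8)]
```

Each returned pair would correspond to one candidate studio-scene metashot under these assumptions.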

Furthermore, the program may be estimated to be a "drama/movie" on the condition that a predetermined number of sections in which the interaction frequency is equal to or greater than the threshold exist.

As another example, the total time length of the metashots in which the interaction frequency is equal to or greater than the threshold may be calculated, and the program may be estimated to be a "drama/movie" when the calculated value falls within a predetermined range.

As further examples, the condition that the time length of the longest metashot is within a specified range, or the condition that the average time length of the metashots is within a specified range, may be used.
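The alternative "drama/movie" conditions described above could be evaluated as in the following sketch. All numeric defaults are hypothetical placeholders, since the specification only says "predetermined" or "specified"; reading "a predetermined number exist" as "at least that many" is likewise an assumption. The checks are returned separately because the text presents them as alternatives rather than as a combined rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Range = Tuple[float, float]  # inclusive (low, high) bounds in seconds


@dataclass
class DramaCriteria:
    # Hypothetical values; the specification does not give concrete numbers.
    min_section_count: int = 3
    total_length_range: Range = (600.0, 3000.0)
    longest_length_range: Range = (60.0, 900.0)
    average_length_range: Range = (30.0, 600.0)


def _within(value: float, bounds: Range) -> bool:
    low, high = bounds
    return low <= value <= high


def drama_movie_checks(lengths: List[float], criteria: DramaCriteria) -> Dict[str, bool]:
    """Evaluate each alternative condition on the time lengths (seconds) of the
    metashots whose interaction frequency is at or above the threshold."""
    if not lengths:
        return {"count": False, "total": False, "longest": False, "average": False}
    return {
        "count": len(lengths) >= criteria.min_section_count,
        "total": _within(sum(lengths), criteria.total_length_range),
        "longest": _within(max(lengths), criteria.longest_length_range),
        "average": _within(sum(lengths) / len(lengths), criteria.average_length_range),
    }


checks = drama_movie_checks([200.0, 300.0, 400.0], DramaCriteria())
```

A caller would pick whichever condition, or combination of conditions, suits the programs being analyzed.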

In live sports broadcasts such as sumo, baseball, and tennis, footage shot by multiple fixed cameras tends to be combined for broadcast. Consequently, similar shots appear throughout the program.

Therefore, for example, when a program does not meet the conditions described in the third embodiment for estimating that it is a news program, and similar shots appear over a time range covering at least half of the program, the program may be estimated to be a sports program.
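A rough sketch of this sports estimate follows. The specification does not define how the "time range in which similar shots appear" is measured, so the sketch assumes it is the total duration of shots whose group contains two or more shots. The `Shot` tuple layout and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Shot = Tuple[float, float, int]  # (start_sec, end_sec, group_id); hypothetical layout


def similar_shot_coverage(shots: List[Shot], program_length: float) -> float:
    """Fraction of the program occupied by shots that have at least one
    similar shot, i.e. whose group has two or more members (an assumption)."""
    if program_length <= 0:
        raise ValueError("program_length must be positive")
    group_sizes = Counter(group for _, _, group in shots)
    covered = sum(end - start for start, end, group in shots if group_sizes[group] >= 2)
    return covered / program_length


def looks_like_sports(shots: List[Shot], program_length: float, is_news: bool) -> bool:
    """Sports estimate: not classified as news, and similar shots cover at
    least half of the program."""
    return (not is_news) and similar_shot_coverage(shots, program_length) >= 0.5


# Hypothetical shots: group 1 appears twice, groups 2 and 3 once each.
example_shots = [(0.0, 10.0, 1), (10.0, 20.0, 2), (20.0, 30.0, 1), (30.0, 40.0, 3)]
```

The `is_news` flag stands in for the news-program conditions of the third embodiment, which are outside this sketch.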

The method for determining the program type is not limited to the methods above. An example of a method that determines the program type by obtaining the interaction frequency described above for the entire program is explained below.

FIG. 27 plots the interaction frequency of the entire program for 9 news programs and 12 variety programs. White squares (□) represent variety programs, and black squares represent news programs. The horizontal axis represents the interaction frequency. The vertical axis is the time at which the first corner title appears, divided by the length of the entire program, when the method of dividing a variety program into corners described in the second embodiment is applied to all programs (including news). The interaction frequency is plotted on a logarithmic scale for readability.

As FIG. 27 shows, news programs and variety programs exhibit clearly different tendencies in the interaction frequency of the entire program. Therefore, news programs (region A) and variety programs (regions B and C) can be distinguished by using an appropriate interaction frequency as a threshold.

In addition, the three variety programs in region C are programs in which a corner title screen appears repeatedly, once for each corner, and they tend to appear lower on the vertical axis. Therefore, by setting an appropriate threshold on the time at which the corner title appears, variety programs that are suitable for division into corners can be distinguished from those that are not.
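The two-threshold discrimination into regions A, B, and C might look like the sketch below. The text does not state on which side of the interaction frequency threshold news programs fall, so the caller must supply that direction explicitly through the hypothetical `news_below_threshold` argument. The threshold values in the example are also hypothetical.

```python
def classify_region(
    interaction_frequency: float,
    first_title_ratio: float,
    *,
    freq_threshold: float,
    title_ratio_threshold: float,
    news_below_threshold: bool,
) -> str:
    """Return "A" (news), "B" (variety, not suited to corner division) or
    "C" (variety, suited to corner division).

    first_title_ratio is the time of the first corner title divided by the
    program length. news_below_threshold is an assumption supplied by the
    caller, because the source text does not give the direction.
    """
    below = interaction_frequency < freq_threshold
    if below == news_below_threshold:
        return "A"
    # Region C programs appear lower on the vertical axis of FIG. 27.
    return "C" if first_title_ratio < title_ratio_threshold else "B"


# Hypothetical thresholds, assuming news programs fall below the frequency threshold.
region = classify_region(
    1.0, 0.05, freq_threshold=0.1, title_ratio_threshold=0.2, news_below_threshold=True
)
```

Programs classified as region C would then be candidates for the corner-division processing of the second embodiment.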

In this way, it suffices to determine the program type based on the temporal distribution of similar shots, and the specific conditions are not limited to those of the embodiments.

FIG. 20 is a flowchart of the moving image processing in the moving image processing apparatus 10 according to the fourth embodiment. In the moving image processing according to the fourth embodiment, the interaction frequency calculation process (step S220) is performed immediately before the program type determination process (step S230) described in the third embodiment. The processing performed by the interaction frequency calculation unit 124 in the interaction frequency calculation process is as described above.

The remaining configuration and processing of the moving image processing apparatus 10 according to the fourth embodiment are the same as those of the moving image processing apparatus 10 according to the third embodiment.

Next, the moving image processing apparatus 10 according to the fifth embodiment will be described. The moving image processing apparatus 10 according to the fifth embodiment divides a moving image into a plurality of small sections and determines a program type for each small section. Here, a small section is simply a section delimited by a fixed time length, such as every 3 minutes or every 30 seconds from the start of the moving image. As another example, a small section may be delimited by a fixed number of shots, such as 3 or 10 consecutive shots.
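Both ways of defining small sections can be sketched briefly. The function names are hypothetical, and shots are represented by arbitrary list items for illustration.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_time(total_length: float, section_length: float) -> List[Tuple[float, float]]:
    """Divide [0, total_length) into consecutive sections of a fixed time
    length; the last section may be shorter."""
    if section_length <= 0:
        raise ValueError("section_length must be positive")
    sections = []
    start = 0.0
    while start < total_length:
        end = min(start + section_length, total_length)
        sections.append((start, end))
        start = end
    return sections


def split_by_shot_count(shots: Sequence[T], shots_per_section: int) -> List[List[T]]:
    """Group consecutive shots into sections of a fixed number of shots;
    the last section may contain fewer."""
    if shots_per_section <= 0:
        raise ValueError("shots_per_section must be positive")
    return [
        list(shots[i:i + shots_per_section])
        for i in range(0, len(shots), shots_per_section)
    ]


# A 100-second moving image in 30-second sections, and 7 shots in groups of 3.
time_sections = split_by_time(100, 30)
shot_sections = split_by_shot_count(list(range(7)), 3)
```

The program type determination would then run once per returned section instead of once for the whole moving image.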

FIG. 21 is a block diagram showing the functional configuration of the moving image processing apparatus 10 according to the fifth embodiment. In addition to the functional configuration of the moving image processing apparatus 10 according to the third embodiment, the moving image processing apparatus 10 according to the fifth embodiment further includes a small section definition unit 240. The small section definition unit 240 divides the moving image into a plurality of small sections. The small section definition unit 240 according to this embodiment constitutes the dividing means according to the present invention.

In addition, before or when a moving image is input to the moving image acquisition unit 101, the analysis parameter receiving unit 190 may receive the conditions (parameters) required for the processing performed by the cut detection unit 102, the similar shot detection unit 104, and the program type determination unit 130, and supply them to these detection and determination units.

As described in the first embodiment, the analysis parameter receiving unit 190 may receive the parameters from sources such as the Internet or a recording medium.

FIG. 22 is a flowchart of the moving image processing in the moving image processing apparatus 10 according to the fifth embodiment. In the moving image processing according to the fifth embodiment, the small section definition process (step S240) is performed following the shot section definition process (step S204) described in the third embodiment. The processing performed by the small section definition unit 240 in the small section definition process is as described above.

When the program type is detected for each small section as described above, a type attribute can be automatically assigned to each corner, for example "14:00 to 15:00: interview" or "7:11 to 9:49: singing", even for a single program that combines multiple components (talk, interview, singing, news announcements, and so on).

Thus, when the program itself has multiple corners, an attribute indicating the type of each corner (news, interview, talk, and so on) can be assigned automatically, and the user can use these attributes as clues to easily find the scenes they want to watch.
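As an illustration of how per-section results could be turned into labels of the form quoted above, the sketch below merges adjacent small sections that received the same type and formats the result. The merging step is an assumption made for this example; the specification only states that a type attribute is assigned. All names and sample data are hypothetical.

```python
from typing import List, Tuple

Labeled = Tuple[int, int, str]  # (start_sec, end_sec, type); hypothetical layout


def format_time(seconds: int) -> str:
    """Format a time in seconds as minutes:seconds, e.g. 431 -> "7:11"."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


def merge_labeled_sections(sections: List[Labeled]) -> List[Labeled]:
    """Merge consecutive sections that touch and share the same type."""
    merged: List[Labeled] = []
    for start, end, label in sections:
        if merged and merged[-1][2] == label and merged[-1][1] == start:
            merged[-1] = (merged[-1][0], end, label)  # extend the previous section
        else:
            merged.append((start, end, label))
    return merged


def describe(sections: List[Labeled]) -> List[str]:
    return [
        f"{format_time(start)} to {format_time(end)}: {label}"
        for start, end, label in merge_labeled_sections(sections)
    ]


# Hypothetical per-section results for 30-second small sections.
labels = describe([(840, 870, "interview"), (870, 900, "interview"), (900, 930, "talk")])
```

The resulting strings could be shown in a program guide or used as search keys for the scenes a user wants to watch.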

The remaining configuration and processing of the moving image processing apparatus 10 according to the fifth embodiment are the same as those of the moving image processing apparatus 10 according to the third embodiment.

Next, the moving image processing apparatus 10 according to the sixth embodiment will be described. The moving image processing apparatus 10 according to the sixth embodiment determines a program type for each small section based on the interaction frequency. FIG. 23 is a block diagram showing the functional configuration of the moving image processing apparatus 10 according to the sixth embodiment. FIG. 24 is a flowchart of the moving image processing in the moving image processing apparatus 10 according to the sixth embodiment. In the moving image processing according to the sixth embodiment, the interaction frequency calculation process (step S220) is performed immediately before the program type determination process (step S230) of the fifth embodiment. The processing performed by the interaction frequency calculation unit 124 in the interaction frequency calculation process is as described above.

The remaining configuration and processing of the moving image processing apparatus 10 according to the sixth embodiment are the same as those of the moving image processing apparatus 10 according to the fifth embodiment.

Because the head shot of a metashot is selected based on the appearance pattern in which similar shots appear, metashots of appropriate units can be generated for video such as news programs.

FIG. 1 is a block diagram showing the functional configuration of the moving image processing apparatus 10 according to the first embodiment.
FIG. 2 schematically shows the statistical information held by the statistical value holding unit 122.
FIG. 3 schematically shows a moving image.
FIG. 4 schematically shows a news program.
FIG. 5 shows the contents of the shots belonging to each group.
FIG. 6 is a flowchart of the moving image processing in the moving image processing apparatus 10.
FIG. 7 shows the hardware configuration of the moving image processing apparatus 10.
FIG. 8 is a block diagram showing the functional configuration of the moving image processing apparatus 10 according to the second embodiment.
FIG. 9 schematically shows a dialogue scene between an anchorperson and a correspondent.
FIG. 10 is a graph of the interaction frequency calculated by the calculation method described with reference to FIG. 9.
FIG. 11 schematically shows head shots and the interaction frequency for each head shot.
FIG. 12 is a flowchart of the moving image processing of the moving image processing apparatus 10 according to the second embodiment.
FIG. 13 is a block diagram showing the functional configuration of the moving image processing apparatus 10 according to the third embodiment.
FIG. 14 schematically shows the statistical information held by the statistical value holding unit 122 according to the third embodiment.
FIG. 15 schematically shows a drama program.
FIG. 16 is a diagram for explaining the process of defining a metashot.
FIG. 17 is a flowchart of the moving image processing in the moving image processing apparatus 10 according to the third embodiment.
FIG. 18 is a block diagram showing the functional configuration of the moving image processing apparatus 10 according to the fourth embodiment.
FIG. 19 is a graph of the interaction frequency calculated for an actual quiz program.
FIG. 20 is a flowchart of the moving image processing in the moving image processing apparatus 10 according to the fourth embodiment.
FIG. 21 is a block diagram showing the functional configuration of the moving image processing apparatus 10 according to the fifth embodiment.
FIG. 22 is a flowchart of the moving image processing in the moving image processing apparatus 10 according to the fifth embodiment.
FIG. 23 is a block diagram showing the functional configuration of the moving image processing apparatus 10 according to the sixth embodiment.
FIG. 24 is a flowchart of the moving image processing in the moving image processing apparatus 10 according to the sixth embodiment.
FIG. 25 schematically shows the setting of dialogue sections using the interaction frequency.
FIG. 26 schematically shows the appearance pattern of similar shots in a variety program.
FIG. 27 schematically shows program type determination using the interaction frequency and the time at which a corner title appears.

Explanation of symbols

10 Moving image processing apparatus
101 Moving image acquisition unit
102 Cut detection unit
103 Shot section definition unit
104 Similar shot detection unit
105 Metashot head group determination unit
106 Metashot head time determination unit
107 Metashot generation unit
108 Moving image output unit
110 Grouping unit
120 Statistical processing unit
122 Statistical value holding unit
124 Interaction frequency calculation unit
130 Program type determination unit
190 Analysis parameter receiving unit
240 Small section definition unit

Claims

A moving image processing apparatus comprising:
cut detection means for detecting, from a moving image, image change points at which the content of the image switches;
similarity measuring means for measuring the similarity between partial moving images delimited by the image change points detected by the cut detection means;
similar shot specifying means for specifying similar partial moving images based on the similarity measured by the similarity measuring means;
grouping means for assigning the same group attribute to a plurality of similar partial moving images specified by the similar shot specifying means; and
moving image type determining means for determining a type of the moving image based on an appearance pattern in which the similar partial moving images grouped by the grouping means appear in the moving image.

The moving image processing apparatus according to claim 1, further comprising program interaction frequency calculating means for calculating an interaction frequency indicating the activity of dialogue over the entire moving image,
wherein the moving image type determining means determines the type of the moving image based on the interaction frequency of the entire moving image calculated by the program interaction frequency calculating means.

The moving image processing apparatus according to claim 1, further comprising dividing means for dividing the moving image into a plurality of time sections,
wherein the moving image type determining means determines the type of the moving image for each section divided by the dividing means.

The moving image processing apparatus according to claim 1, further comprising at least one of:
head group selection means for selecting, from the groups generated by the grouping means and according to a predetermined condition, a head group to which a head partial moving image belongs, the head partial moving image being the head of a metashot consisting of a plurality of temporally continuous partial moving images;
head shot selection means for selecting the head partial moving image from the partial moving images included in the head group, based on an appearance pattern in which the partial moving images belonging to the head group selected by the head group selection means appear in the moving image;
metashot generation means for generating a metashot having the head partial moving image at its head;
shot number comparison means for comparing the number of partial moving images belonging to the same group with a predetermined reference number;
shortest time length comparison means for comparing the reproduction time length of the partial moving image having the shortest reproduction time length among the partial moving images belonging to the same group with a predetermined reference shortest time length;
longest time length selection means for comparing the reproduction time length of the partial moving image having the longest reproduction time length among the partial moving images belonging to the same group with a predetermined reference longest time length;
average time length comparison means for calculating an average value of the reproduction time lengths of the partial moving images belonging to the same group;
inter-shot time length comparison means for measuring the reproduction time length between the partial moving image arranged first in the moving image and the partial moving image arranged last in the moving image, among the partial moving images belonging to the same group;
shot position determination means for determining whether or not all partial moving images belonging to the head group are arranged between two partial moving images belonging to a group other than the head group;
reference range specifying means for specifying a reference range having a predetermined time length and including a target partial moving image belonging to a predetermined group; or
interaction frequency calculation means for calculating, based on an appearance pattern of a plurality of partial moving images belonging to the same group, an interaction frequency indicating the probability that the target partial moving image is a partial moving image showing a conversation scene,
wherein, based on the determination result of the moving image type determining means, an operation reference value of at least one of the similarity measuring means, the grouping means, the head group selection means, the head shot selection means, the metashot generation means, the shot number comparison means, the shortest time length comparison means, the longest time length selection means, the average time length comparison means, the inter-shot time length comparison means, the shot position determination means, the reference range specifying means, or the interaction frequency calculation means is changed.