JP2006053802A

JP2006053802A - Image type determining method, image type determining device, and image type determining program

Info

Publication number: JP2006053802A
Application number: JP2004235627A
Authority: JP
Inventors: Kiyomi Ito; 清美伊藤; Yasumasa Niikura; 康巨新倉
Original assignee: NTT Communications Corp
Current assignee: NTT Communications Corp
Priority date: 2004-08-12
Filing date: 2004-08-12
Publication date: 2006-02-23
Anticipated expiration: 2024-08-12
Also published as: JP4671640B2

Abstract

PROBLEM TO BE SOLVED: To provide an image type determining method, an image type determining device, and an image type determining program that support classification and selection of an image type on the basis of the font of a telop character (including modification information).

SOLUTION: The image type determining method comprises a frame extracting step (S1) of extracting a frame in which a telop character is displayed from an input image composed of frames, a character recognizing step (S6) of recognizing the telop character in the extracted frame, a font recognizing step (S10) of recognizing the font of the recognized character, and an image determining step (S16) of determining the image type of the input image by referring to a font-image type associating database that associates character fonts with image types.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a video type determination method, a video type determination device, and a video type determination program.

With advances in computer technology, telop characters are now inserted into video in a wide variety of fonts. The font of a telop character and its modification information, such as color and size, often reflect the atmosphere of the program.

Telop character recognition techniques, which extract telop characters from video and identify them, are conventionally known. They are used to help understand video content and to organize video automatically. Specifically, they are used for video indexing, in which the point at which a telop character appears is treated as a video boundary, and for attaching the characters identified by character string recognition to the video as metadata.

For example, Patent Document 1 discloses a technique for detecting, with high accuracy, frames in which telop characters are displayed in a video.
FIG. 1, taken from Patent Document 1, shows an example of the telop character display frame detection unit of a telop character display frame detection device. The detection unit comprises a partial block division unit 11, an unprocessed partial block determination unit 12, an edge detection unit 13, a scan control unit 14, an edge gradient direction determination unit 15, an inter-edge luminance change calculation unit 16, an edge pair count unit 17, an edge pair total calculation unit 18, an edge pair number determination unit 19, a flag information setting unit 20, and a flag information determination unit 21. The partial block division unit 11 divides the input frame into partial blocks; FIG. 2 shows an example in which an input frame is divided into eight partial blocks. The unprocessed partial block determination unit 12 determines whether flag information setting has been completed for the partial block of interest. The edge detection unit 13 uses a predetermined method to detect edge pixels, that is, pixels at which the luminance value changes locally and discontinuously, together with their direction information. The scan control unit 14 designates the scanning line (horizontal or vertical) along which edge pairs are calculated. The edge gradient direction determination unit 15 determines the gradient direction (rising or falling) of each detected edge pixel along the scanning line, as shown in FIG. 2. The inter-edge luminance change calculation unit 16 calculates the change in luminance between adjacent edge pixels, as shown in FIG. 3. The edge pair count unit 17 counts two adjacent edges as an edge pair when their gradient directions are opposite (rising then falling, or falling then rising) and the luminance change between them is small. In the example of FIG. 2, the number of edge pairs is 13. The edge pair total calculation unit 18 sums the edge pairs calculated for each scanning direction. The edge pair number determination unit 19 compares this total with a preset value. When the number of detected edge pairs is equal to or greater than the predetermined value, the flag information setting unit 20 judges that the partial block contains telop characters and sets flag information for it. The flag information determination unit 21 outputs the frame as a telop character display frame.
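The edge pair counting described for Patent Document 1 can be illustrated with a minimal Python sketch for a single scanning line. The function name, the two thresholds, and the way an edge is detected (a large luminance jump between neighboring pixels) are assumptions made for illustration; the source does not give concrete values or the actual edge detection method.

```python
def count_edge_pairs(luma, edge_thresh=50, flat_thresh=10):
    """Count edge pairs along one scanning line of luminance values.

    edge_thresh and flat_thresh are hypothetical values, not taken from
    the patent documents.
    """
    # An edge lies between pixel i and i + 1 when the luminance jump is large.
    # Direction is +1 for a rising edge and -1 for a falling edge.
    edges = []
    for i in range(len(luma) - 1):
        diff = luma[i + 1] - luma[i]
        if abs(diff) >= edge_thresh:
            edges.append((i, 1 if diff > 0 else -1))

    pairs = 0
    for (i, d1), (j, d2) in zip(edges, edges[1:]):
        if d1 == d2:
            continue  # adjacent edges must have opposite gradient directions
        between = luma[i + 1 : j + 1]  # pixels lying between the two edges
        if max(between) - min(between) <= flat_thresh:
            pairs += 1  # luminance is nearly constant between the edges
    return pairs


def is_telop_block(scan_lines, min_pairs):
    """Flag a partial block when its total edge pair count reaches min_pairs."""
    return sum(count_edge_pairs(line) for line in scan_lines) >= min_pairs
```

Calling `is_telop_block` with the horizontal and vertical scanning lines of one partial block corresponds to the comparison made by the edge pair number determination unit 19.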

Patent Document 2 discloses a technique for extracting character regions from an image containing telop characters or the like.

In FIG. 4, taken from Patent Document 2, reference numeral 2-10 denotes a horizontal line unit binarization unit, which binarizes luminance within each horizontal line of the image. Reference numeral 2-11 denotes a vertical line unit binarization unit, which binarizes luminance within each vertical line of the image. Reference numeral 2-12 denotes a binarization result integration unit, which compares the two binary images obtained by units 2-10 and 2-11, judges connected components that appear at the same position with the same area in both results to be character regions, and judges all other connected components to be background noise and removes them from the image. Reference numeral 2-13 denotes a processing control unit, which controls the execution order of units 2-10 to 2-12. FIG. 5 is a block diagram of the horizontal line unit binarization unit 2-10 in FIG. 4. In FIG. 5, reference numeral 2-14 denotes a horizontal line luminance maximum region detection unit, which examines the luminance distribution within a horizontal line of the image and detects connected pixel regions whose luminance is locally higher by a preset value. Reference numeral 2-15 denotes a high-luminance character region extraction unit. For each connected pixel region obtained by unit 2-14 on a horizontal line, it finds, among the pixels near the left end and among those near the right end, the pixel at which the absolute value of the horizontal luminance gradient is largest, and extracts the range bounded by these two pixels as a high-luminance character region. Reference numeral 2-16 denotes a horizontal line luminance minimum region detection unit, which examines the luminance distribution within a horizontal line of the image and detects connected pixel regions whose luminance is locally lower by a preset value. Reference numeral 2-17 denotes a low-luminance character region extraction unit. For each connected pixel region obtained by unit 2-16 on a horizontal line, it finds, among the pixels near the left end and among those near the right end, the pixel at which the absolute value of the horizontal luminance gradient is largest, and extracts the range bounded by these two pixels as a low-luminance character region. Reference numeral 2-18 denotes a per-horizontal-line binarization result integration unit, which integrates the character regions on each horizontal line obtained by units 2-15 and 2-17 and creates a character region image for the whole image. The vertical line unit binarization unit 2-11 similarly integrates the character regions on each vertical line obtained by its own high-luminance and low-luminance character region extraction units and creates a character region image for the whole image.

In this way, character regions are extracted from an image containing telop characters or the like, using the outputs of the per-horizontal-line binarization result integration unit and the corresponding integration unit for the vertical lines.
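The integration rule attributed to unit 2-12 (keep only connected components that appear at the same position with the same area in both binary images) might be sketched as follows. This is a simplified illustration, not the method of Patent Document 2: it assumes non-empty binary images given as lists of 0/1 rows, uses 4-connectivity, and takes a component's "position" to be its first pixel in raster order. All function names are hypothetical.

```python
def connected_components(image):
    """Return the 4-connected components of a binary image as frozensets of (row, col)."""
    height, width = len(image), len(image[0])
    seen = set()
    components = []
    for r in range(height):
        for c in range(width):
            if not image[r][c] or (r, c) in seen:
                continue
            # Flood fill from this unvisited foreground pixel.
            stack = [(r, c)]
            seen.add((r, c))
            pixels = []
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            components.append(frozenset(pixels))
    return components


def integrate(horizontal, vertical):
    """Keep components of `horizontal` that match a component of `vertical`
    in position (first pixel in raster order) and area (pixel count)."""
    def key(component):
        return (min(component), len(component))

    vertical_keys = {key(comp) for comp in connected_components(vertical)}
    result = [[0] * len(row) for row in horizontal]
    for comp in connected_components(horizontal):
        if key(comp) in vertical_keys:
            for y, x in comp:
                result[y][x] = 1
    return result
```

Components that fail the match are left as 0 in the result, which corresponds to treating them as background noise.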

A technique is also known for replacing characters that have been read in as image data with an approximate font built into a personal computer, a printing apparatus, or the like.

For example, Patent Document 3 discloses a technique that pattern-compares each of several images created with several built-in fonts against the image data of a read character, detects the differences, and selects the font with the fewest differences as the approximate font.

In FIG. 6, taken from Patent Document 3, the blocking means 3-21 divides the image data of a document read by the scanner device 3-3 into blocks, one per character string. The in-block character separation means 3-22 separates the read characters within each blocked character string into individual read characters. The read character recognition means 3-23 recognizes each separated read character using existing OCR (Optical Character Reader) technology. Specifically, the read character recognition means 3-23, for example, converts the image data of each read character into a vector quantity, collates the converted vector quantity against a built-in database organized by character category, and recognizes the read character as the entry with the most similarities. Each read character recognized by the read character recognition means 3-23 is associated with code data such as a JIS code and temporarily stored in, for example, the RAM 3-12. The read character size determination means 3-24 determines the size of each read character recognized by the read character recognition means 3-23. The image data creation means 3-25 uses the several built-in fonts stored in the HDD 3-14 to create image data of the same character as the recognized read character, in each font, at substantially the same size as that determined by the read character size determination means 3-24. That is, as described above, the recognized read character is temporarily stored in the RAM 3-12 in association with code data such as a JIS code, while the data of the built-in fonts is normally stored in the HDD 3-14 as vector data. Based on the code data corresponding to the read character, the image data creation means 3-25 extracts the vector data of the same character from each built-in font, converts each set of vector data into image data, such as bitmap data, of substantially the same size as the read character, and expands it in the RAM 3-12. The approximate font selection means 3-26 pattern-compares each of the images created by the image data creation means 3-25 from the built-in fonts against the image data of the read character, detects the differences, and selects the built-in font whose image differs least from the read character as the approximate font for forming an image corresponding to the read character.
Patent Document 1: JP-A-11-178007; Patent Document 2: JP-A-2000-194851; Patent Document 3: JP-A-2003-50971

However, conventional telop recognition techniques cannot determine the type of a program even when they recognize its characters. For example, when a telop reading "Unidentified creature discovered" appears, the text alone does not reveal whether the program is a news broadcast or a variety show using the phrase as a joke.

The font and modification information of telop characters often reflect the atmosphere (or content) of a program, but conventional telop character recognition does not take the font or modification information into account. Consequently, even when the characters are recognized, the characteristics of the program cannot be inferred from the telop font.

Although it is all referred to collectively as "video," video spans many types and genres: news, sports, drama, documentary, variety, movies, and so on. If video content of these various types is accurately classified by genre and managed accordingly, it becomes easy to search.

Although the type or genre of a video is an important classification item, classifying a video requires understanding its content, so the judgment is ultimately made by hand, which raises operating costs.

The problem to be solved by the present invention is to provide a video type determination method, a video type determination device, and a video type determination program that support the classification or selection of video types based on the font of telop characters (including character modification information).

To solve the above problem, the present invention adopts means having the following features.

The invention described in claim 1 is a video type determination device comprising: frame extraction means for extracting, from an input video composed of frames, a frame in which characters are displayed; character recognition means for recognizing the characters in that frame; font recognition means for recognizing the font of the recognized characters; and video determination means for determining the video type of the input video by referring to a font/video type correspondence database that associates character fonts with video types.

Note that "font" here means not only a character font in the narrow sense but also a character font together with character modification information. Character modification information is information that modifies a character, such as its color and size. This broad usage is adopted because a font, broadly understood, can itself be regarded as one kind of character modification information, and because, strictly speaking, a character font and character modification information cannot be separated for complex fonts. Accordingly, "font" here covers both a character font alone and a character font combined with character modification information. It does not, however, cover character modification information alone, without a character font.

The invention described in claim 2 is the video type determination device of claim 1, further comprising characteristic font determination means for determining that the font most frequently recognized by the font recognition means is the characteristic font of the video, wherein the video determination means determines the video type based on the characteristic font so determined and the font/video type correspondence database.
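The characteristic font rule of claim 2 can be illustrated with a minimal Python sketch. The table contents below are hypothetical: the source does not state which fonts correspond to which video types, and the function names are chosen only for this example.

```python
from collections import Counter

# Hypothetical font/video type correspondence table (not taken from the source).
FONT_TO_VIDEO_TYPE = {
    "Mincho": "news",
    "Gothic": "documentary",
    "Round Gothic": "variety",
}


def characteristic_font(recognized_fonts):
    """Return the most frequently recognized font, or None if there are none."""
    if not recognized_fonts:
        return None
    return Counter(recognized_fonts).most_common(1)[0][0]


def determine_video_type(recognized_fonts, table=FONT_TO_VIDEO_TYPE):
    """Look up the video type associated with the characteristic font."""
    return table.get(characteristic_font(recognized_fonts))
```

A font that is missing from the table yields `None`, which leaves the decision to a human operator, in keeping with the stated aim of supporting classification.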

The invention described in claim 3 is the video type determination device of claim 1, further comprising font ratio calculation means for tallying the fonts recognized by the font recognition means for each font and calculating the usage ratio of each font in the input video, wherein the video determination means determines the video type based on the font ratios calculated by the font ratio calculation means and the font/video type correspondence database.

The invention described in claim 4 is the video type determination device of claim 3, wherein the font/video type correspondence database holds, in advance, font usage ratio data for each video genre.
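Claims 3 and 4 can be sketched as a comparison between the measured font usage ratios and per-genre ratios stored in advance. The source does not say how the two are compared; the sum of absolute differences used below is an assumption, as are the genre profiles and all names.

```python
from collections import Counter

# Hypothetical per-genre font usage ratios stored in advance (claim 4).
GENRE_FONT_RATIOS = {
    "news": {"Mincho": 0.7, "Gothic": 0.3},
    "variety": {"Round Gothic": 0.6, "Gothic": 0.4},
}


def font_usage_ratios(recognized_fonts):
    """Tally recognized fonts and convert the counts to usage ratios."""
    counts = Counter(recognized_fonts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {font: count / total for font, count in counts.items()}


def closest_genre(ratios, profiles=GENRE_FONT_RATIOS):
    """Return the genre whose stored ratios are nearest to the measured ones.

    Distance is the sum of absolute differences over all fonts, which is an
    assumed measure, not one specified in the source.
    """
    if not ratios:
        return None

    def distance(profile):
        fonts = set(ratios) | set(profile)
        return sum(abs(ratios.get(f, 0.0) - profile.get(f, 0.0)) for f in fonts)

    return min(profiles, key=lambda genre: distance(profiles[genre]))
```

Using ratios rather than a single dominant font lets a genre be characterized by a mix of fonts.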

The invention described in claim 5 is a video type determination method comprising: a frame extraction step of extracting, from an input video composed of frames, a frame in which characters are displayed; a character recognition step of recognizing the characters in that frame; a font recognition step of recognizing the font of the recognized characters; and a video determination step of determining the video type of the input video by referring to a font/video type correspondence database that associates character fonts with video types.

The invention described in claim 6 is a video type determination program for causing a computer to function as: frame extraction means for extracting, from an input video composed of frames, a frame in which characters are displayed; character recognition means for recognizing the characters in that frame; font recognition means for recognizing the font of the recognized characters; and video determination means for determining the video type of the input video by referring to a font/video type correspondence database that associates character fonts with video types.

The invention described in claim 7 is the video type determination program of claim 6, which further causes the computer to function as characteristic font determination means for determining that the font most frequently recognized by the font recognition means is the characteristic font of the video, wherein the video determination means determines the video type based on the characteristic font so determined and the font/video type correspondence database.

The invention described in claim 8 is the video type determination program of claim 6, which further causes the computer to function as font ratio calculation means for tallying the fonts recognized by the font recognition means for each font and calculating the usage ratio of each font in the input video, wherein the video determination means determines the video type based on the font ratios calculated by the font ratio calculation means and the font/video type correspondence database.

The present invention can provide a video type determination method, a video type determination device, and a video type determination program that support the classification or selection of video types based on the font of telop characters (including modification information).

FIG. 7 shows an example of the video type determination device of the present invention. The device of FIG. 7 comprises: a video input device 42 that inputs video material 41 consisting of a plurality of frames; a frame extraction device 43 that extracts, from the input video, frames in which telop characters are displayed; a character recognition device 44 that recognizes the telop characters in those frames; a character recognition information database 45 used for character recognition; a font determination device 46 that determines the font of the recognized characters; a font information database 47 that stores font information such as font images; a video type determination device 48 that determines the video type of the input video; a font/video type correspondence database 49 that associates fonts with video types; a metadata generation device 50; and a metadata management device 51.

The video material 41 may be video stored on any of various media such as HDCAM, Digital Beta Cam, DVCAM, Beta Cam, DV, VHS, or DVD, or video taken directly from broadcast signals such as terrestrial or digital broadcasts. It may also be a single image corresponding to one frame.

The video input device 42 acquires video content from a wide variety of media. It may also provide functions such as VTR device control and video digitization.

The frame extraction device 43 automatically extracts, from the input video, frames in which telop characters are displayed. The character recognition device 44 refers to the character recognition information database 45 and automatically recognizes the telop characters in the frames extracted by the frame extraction device 43.

The font recognition device 46 (the font determination device 46 of FIG. 7) refers to the font information database 47 and recognizes the font and/or the character modification information of the characters recognized by the character recognition device 44. The video type determination device 48 automatically determines the type of the video (genre, etc.) from the recognition results of the character recognition device 44 and the font recognition device 46 and from the font/video type correspondence database 49, which associates fonts with video types.

The metadata generation device 50 generates the video type (genre, etc.) determined by the video type determination device 48 as one item of metadata and attaches it to the target video.

A video type may be assigned once per input video or once per fixed time interval. When it is assigned per interval, the font usage ratios are likewise calculated for each interval.
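Per-interval calculation of font usage ratios could look like the following Python sketch. The input format (pairs of a timestamp in seconds and a recognized font name) and the function name are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def ratios_per_window(observations, window_seconds):
    """Group (timestamp_seconds, font_name) pairs into fixed-length windows
    and return the font usage ratios for each window index."""
    buckets = defaultdict(Counter)
    for timestamp, font in observations:
        buckets[int(timestamp // window_seconds)][font] += 1

    result = {}
    for index in sorted(buckets):
        counts = buckets[index]
        total = sum(counts.values())
        result[index] = {font: count / total for font, count in counts.items()}
    return result
```

Each window's ratios can then be matched against the correspondence database separately, so that a long recording containing several programs receives one video type per interval.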

Metadata for a video includes, for example, its genre, title, file name, producer, performers' names, copyright, time, and number of scenes. Such metadata is described using, for example, MPEG-7 (MPEG: Moving Picture Experts Group).

The metadata management device 51 is a disk that stores and manages the metadata. It issues a content ID to manage the correspondence between video content and its metadata.

Managing metadata in this way makes it possible to search by metadata. For example, a user can search the metadata management device for a desired video and obtain the specific video data.

Next, the video type determination method is described with reference to the flowchart of FIG. 8. First, the video data to be recognized is input from the video input means (S1). In step S1, the frame extraction means (frame extraction device) detects the frames of the video in which a telop is displayed. Because the same telop characters are displayed across a run of consecutive frames, the section in which the same telop characters are displayed is detected. The frame detection in step S1 can be performed by the telop character display frame detection device described in Patent Document 1 above.

In step S2, the frame region extraction means extracts the telop region displayed in the frame. In step S3, the character separation means (part of the character recognition device) separates the extracted telop region into individual characters. The image data of the separated characters are referred to here as read characters (n in total). The telop region extraction in step S2 can be performed by the character region extraction technique described in Patent Document 2 above.

In step S4, the character information determination means (part of the character recognition device) determines the size and color of the nth read character. In step S5, the character size determination means determines whether the size of the nth read character is at least a fixed size (m × k). If it is smaller than m × k, the character is considered too small for its font to be determined, font determination is skipped for that character, and the process proceeds to step S11.

If the nth read character is at least the fixed size (m × k), then in step S6 the character recognition means (part of the character recognition device) recognizes the character by referring to the character recognition information database built into the system. The character recognition in step S6 can be performed by the read character recognition means described in Patent Document 3 above.

Next, in step S7, the font data recognition means (font data recognition device) acquires the font data for the n-th recognized character from the font information database built into the system. For example, if the n-th read character is recognized as 「あ」 in step S6, the font data for 「あ」 in every available font, such as Mincho, Gothic, and Round Gothic, is acquired from the font information database. Next, in step S8, the font recognition means generates, from the acquired font data, image data matching the size and color of the n-th read character, based on the size and color determined in step S4. Next, in step S9, the font data recognition means pattern-compares each generated image with the image data of the read character and detects the differences. Next, in step S10, the font of the generated image with the fewest differences is selected as the font of that character. The font selection in step S10 can be performed by, for example, the approximate font selection means described in Patent Document 3 mentioned above.
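The comparison in steps S9 and S10 can be illustrated with a minimal sketch. It assumes binary bitmaps of equal size and a plain pixel-mismatch count as the difference measure; the text does not specify the measure, and the names `count_differences` and `select_font` as well as the 3 × 3 bitmaps are hypothetical.

```python
def count_differences(a, b):
    """Count the pixels that differ between two equally sized binary bitmaps."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def select_font(read_char, candidates):
    """Step S10 sketch: pick the font whose rendered bitmap differs least."""
    return min(candidates,
               key=lambda font: count_differences(read_char, candidates[font]))


# Hypothetical 3x3 bitmaps standing in for the images generated in step S8.
candidates = {
    "Mincho": [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
    "Gothic": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}
# Hypothetical bitmap of the read character cut out of the telop area.
read_char = [[1, 1, 1], [0, 1, 0], [0, 1, 1]]

selected = select_font(read_char, candidates)
print(selected)
```

A real system would presumably normalize position and scale before comparing; that is outside what the description states and is left out here.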

In step S11, it is determined whether font recognition has been performed for all of the read characters. If it has not, the value of n is decremented in step S12 and the process returns to step S4, where the processing from step S4 onward is performed for the (n−1)-th read character.
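The per-character loop of steps S4 to S12 might be sketched as follows. This is an illustration only: it assumes that "at least m × k" means width of at least m and height of at least k, that the loop starts from the last character because n is decremented, and that a caller-supplied function stands in for steps S6 to S10. The names `recognize_fonts` and `recognize_font` are hypothetical.

```python
def recognize_fonts(chars, min_w, min_h, recognize_font):
    """Return the font recognized for each sufficiently large character.

    chars is a list of (width, height, bitmap) tuples from step S3.
    """
    fonts = []
    n = len(chars)
    while n > 0:                      # S11: stop once every character is handled
        width, height, bitmap = chars[n - 1]
        # S5: skip characters that are too small for font determination.
        if width >= min_w and height >= min_h:
            fonts.append(recognize_font(bitmap))   # S6 to S10
        n -= 1                        # S12: move on to the (n-1)-th character
    return fonts


# Hypothetical data: the "bitmap" is just a label looked up by the stub below.
chars = [(20, 20, "a"), (5, 5, "b"), (30, 25, "c")]
stub = {"a": "Gothic", "c": "Pop"}
result = recognize_fonts(chars, 16, 16, lambda bitmap: stub[bitmap])
print(result)
```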

If, on the other hand, it is determined in step S11 that font recognition has been performed for all of the read characters, then in step S13 the font recognized most often within the telop as a whole is selected as the font of that telop.

For example, FIG. 9 shows a case in which a single telop contains more than one font. The character fonts recognized in the telop of FIG. 9 are:
・Gothic: 3
・Pop: 1
so the font of this telop is determined to be Gothic.
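The majority selection of step S13 can be sketched with a simple counter. The function name `telop_font` is hypothetical, and the handling of ties is an assumption, since the description does not say what happens when two fonts are recognized equally often.

```python
from collections import Counter


def telop_font(char_fonts):
    """Step S13 sketch: return the font recognized most often in one telop."""
    counts = Counter(char_fonts)
    # most_common(1) returns [(font, count)]; on a tie the first-seen font wins,
    # which is an arbitrary choice not taken from the description.
    font, _count = counts.most_common(1)[0]
    return font


# The FIG. 9 example: three Gothic characters and one Pop character.
fig9 = ["Gothic", "Gothic", "Pop", "Gothic"]
print(telop_font(fig9))
```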

In step S14, it is determined whether telop detection has been completed for all frames of the video. If it has not, the process advances in step S15 to the frame in which the next, different telop is displayed and returns to step S1, and the subsequent processing is repeated.

If, on the other hand, it is determined in step S14 that telop detection has been completed for all frames of the video, the process proceeds to step S16.

In step S16, the video type determination means (video type determination device) determines the video type of the input video by referring to the font / video type correspondence database, which associates character fonts with video types.

Two examples of the video type determination method used in step S16 are described here.

The first method takes the font that was selected most often in step S13 (the step that picks the most frequently recognized font within each telop) as the characteristic font of the video as a whole, and determines the video type of the input video by referring to the font / video type correspondence database of FIG. 11, described below.

Consider, for example, a video made up of three scenes that contain telops, as shown in FIG. 10. In FIG. 10, the font of telop 1 is Gothic, the font of telop 2 is Pop, and the font of telop 3 is Gothic.

In this case, the telop fonts are:
・Gothic: 2
・Pop: 1
so the font recognized most often as a telop font over the whole video is Gothic, and Gothic is judged to be the characteristic font.

Once the characteristic font is known, the video type of the input video can be determined by referring to the font / video type correspondence database of FIG. 11.

As shown in FIG. 11, the font / video type correspondence database associates fonts with video types. In FIG. 11, for example, Round Gothic, POP, logo characters, design typefaces, and so on are associated with the variety type; Mincho, Gothic, the textbook typeface, Kaisho, and so on with the news type; and the subtitle typeface with the movie type.

Accordingly, if the font recognized most often in the telop in step S13 is Round Gothic, the video in which the telop appeared is judged to be of the variety type; if it is the textbook typeface, the news type; and if it is the subtitle typeface, the movie type.

For the video shown in FIG. 10, the characteristic font is Gothic, so the input video is judged to be of the news type.
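The first method can be sketched as a lookup after a count. The table below copies only the fonts named for FIG. 11; the dictionary name `FONT_TO_TYPE`, the English font labels, and the function name are hypothetical, and returning `None` for an unlisted font is an assumption.

```python
from collections import Counter

# Fonts and video types as listed for FIG. 11 (the figure elides further entries).
FONT_TO_TYPE = {
    "Round Gothic": "variety", "POP": "variety",
    "Logo": "variety", "Design": "variety",
    "Mincho": "news", "Gothic": "news",
    "Textbook": "news", "Kaisho": "news",
    "Subtitle": "movie",
}


def video_type_by_characteristic_font(telop_fonts):
    """First method sketch: map the most frequent telop font to a video type."""
    characteristic, _count = Counter(telop_fonts).most_common(1)[0]
    # None is returned when the font is not in the table (an assumption).
    return FONT_TO_TYPE.get(characteristic)


# The FIG. 10 example: telops 1 and 3 are Gothic, telop 2 is POP.
fig10 = ["Gothic", "POP", "Gothic"]
print(video_type_by_characteristic_font(fig10))
```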

The second method determines the font of each telop over the whole video and, from the proportions of the telop fonts used across the video, derives the most likely video type.

The second method uses the font / video type correspondence database of FIG. 12.

The font / video type correspondence database of FIG. 12 stores, for each video type, a likelihood value (%) for each font. In FIG. 12, for the variety type, Round Gothic is 100, POP is 100, round characters are 100, logo characters are 90, design typeface is 80, Mincho is 0, Gothic is 20, textbook typeface is 0, subtitle typeface is 0, and so on; for the news type, Round Gothic is 0, POP is 0, round characters are 0, logo characters are 10, design typeface is 20, Mincho is 100, Gothic is 80, textbook typeface is 100, subtitle typeface is 0, and so on.

This database is used to derive the most likely video type. Suppose, for example, that the proportions of the telop fonts used across the whole video have been calculated as shown in FIG. 13, with the following result.

・Gothic: 50%
・Design typeface: 10%
・Mincho: 40%
From the font / video type correspondence database of FIG. 12, the likelihood value for each video type is obtained as follows.

The likelihood value α that the video is of the variety type is
α = 50% × 20% (Gothic)
 + 10% × 80% (design typeface)
 + 40% × 0% (Mincho)
 = 18%
The likelihood value β that the video is of the news type is
β = 50% × 80% (Gothic)
 + 10% × 20% (design typeface)
 + 40% × 100% (Mincho)
 = 82%
Since β is larger than α, the video type in this case is found to be the news type.
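The weighted sum of the second method can be checked with a short sketch. It uses only the FIG. 12 values for the three fonts in the FIG. 13 example; the names `DATABASE`, `usage`, and `score` are hypothetical, and treating a font missing from the table as 0 is an assumption.

```python
# Likelihood values (%) from FIG. 12 for the three fonts in the example.
DATABASE = {
    "variety": {"Gothic": 20, "Design": 80, "Mincho": 0},
    "news": {"Gothic": 80, "Design": 20, "Mincho": 100},
}

# Font usage proportions (%) over the whole video, as in FIG. 13.
usage = {"Gothic": 50, "Design": 10, "Mincho": 40}


def score(usage, weights):
    """Sum of usage share times likelihood, expressed in percent."""
    # Fonts absent from the table contribute 0 (an assumption).
    return sum(share * weights.get(font, 0)
               for font, share in usage.items()) / 100


scores = {video_type: score(usage, weights)
          for video_type, weights in DATABASE.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Keeping the shares and likelihoods as integer percentages and dividing once at the end avoids rounding noise in the intermediate products.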

FIG. 14 shows an application example of the present invention. The application example consists mainly of a content input device, a metadata input device, a search device, a playback device, and a user terminal.

The content input device consists of a video input unit 62 that inputs video material 61 having a plurality of frames, a format conversion unit 63 that converts the format of the input video content, and a content management unit 64 that manages the content.

The metadata input device consists of a telop recognition unit 65, a font recognition unit 66, a genre selection unit 68, a metadata generation unit 69, and a metadata management unit 70.

The video material 61 may be video material stored on any of various video media such as HDCAM, Digital Beta Cam, DVCAM, Beta Cam, DV, VHS, and DVD, or video taken directly from broadcast signals such as terrestrial or digital broadcasts.

The video input unit 62 is a device that acquires video content from a wide variety of media. It may also have functions such as VTR device control and video digitization.

The format conversion unit 63 converts the format of the acquired video. Video data currently exists in various formats such as MPEG, MPEG2, MPEG4, AVI, and WMV. The format conversion unit 63 performs processing such as converting captured video data in these various formats into a single format used throughout the system.

The content management unit 64 manages the acquired video. It is, for example, media storage or a disk.

The telop recognition unit 65 automatically recognizes the telop characters displayed in the video. The font recognition unit 66 recognizes the font and/or modification information of the telop characters displayed in the video. The genre selection unit 68 automatically selects a video genre (video type) based on the characters recognized by the telop recognition unit 65 and the font and/or character modification information recognized by the font recognition unit 66.

The metadata generation unit 69 generates the video type (genre or the like) selected by the genre selection unit 68 as one item of metadata and attaches it to the target video.

The metadata management unit 70 is a disk on which metadata is managed. It manages the correspondence between video content and metadata by issuing content IDs.

The content in the content management unit 64 and the metadata in the metadata management unit 70 (content ID, file name, start time, end time, title, producer name, and so on) are linked to each other within the database.

The user terminal 73 may be an ordinary personal computer, a PDA (Personal Digital Assistant), or a mobile terminal, as long as it can connect to a network and play back video content.

When the user terminal 73 sends a query by video genre, keyword, or the like, the search unit 71 searches the metadata management unit 70. The playback unit 72 plays back the video so that it can be viewed on the user terminal 73.

In FIG. 14, a user of the system accesses the metadata management device 70 from the user terminal 73. When the user selects the genre of video they want to watch, the metadata management device 70 searches the stored metadata, and the video data matching the selected genre is picked out from the content management unit 64.

The content playback device 72 delivers the video data that was picked out to the user terminal 73.

The user terminal 73 receives the video data delivered by the playback device 72 and displays it for viewing.
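The genre search described for FIG. 14 can be pictured with a small in-memory sketch in which the content ID links a metadata record to its content. The `Metadata` fields, the store layout, the sample IDs and file names, and the function `search_by_genre` are all hypothetical; the description only states that content and metadata are tied together through a content ID.

```python
from dataclasses import dataclass


@dataclass
class Metadata:
    content_id: str   # ID issued to link metadata and content
    title: str
    genre: str        # video type produced by the genre selection


def search_by_genre(metadata_store, content_store, genre):
    """Return the content whose metadata carries the requested genre."""
    return [content_store[m.content_id]
            for m in metadata_store
            if m.genre == genre and m.content_id in content_store]


# Hypothetical stores standing in for units 70 (metadata) and 64 (content).
metadata_store = [
    Metadata("c1", "Evening report", "news"),
    Metadata("c2", "Quiz night", "variety"),
    Metadata("c3", "Morning report", "news"),
]
content_store = {"c1": "c1.mpg", "c2": "c2.mpg", "c3": "c3.mpg"}

hits = search_by_genre(metadata_store, content_store, "news")
print(hits)
```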

Although a detailed description is omitted, FIG. 15 shows the relationship between content and metadata. Managing content and metadata as shown in FIG. 15 makes it possible, from the user terminal 73, to perform keyword searches, skim contents quickly, watch previews, and search by physical information amount.

In FIGS. 7 and 14, each block may be implemented in hardware (alone or in combination). Alternatively, each block may represent, as a block, a function provided by a video type determination program installed on a computer.

The processing of the video type determination method shown in FIG. 8 may also be implemented as a program, and the program installed on a computer to perform the video type determination.

The best mode for carrying out the invention described above determines the video type based on the font, but the determination may also take into account character modification information other than the font. In that case, the font recognition also recognizes the character modification information other than the font, and the font / video type correspondence database stores information associating character fonts and character modification information with video types.

FIG. 1 is a diagram (part 1) for explaining a prior art example.
FIG. 2 is a diagram (part 2) for explaining a prior art example.
FIG. 3 is a diagram (part 3) for explaining a prior art example.
FIG. 4 is a diagram (part 4) for explaining a prior art example.
FIG. 5 is a diagram (part 5) for explaining a prior art example.
FIG. 6 is a diagram (part 6) for explaining a prior art example.
FIG. 7 is a diagram for explaining an example of the video type determination device of the present invention.
FIG. 8 is an example of a flowchart for explaining the video type determination method.
FIG. 9 shows a case in which a single telop contains a plurality of fonts.
FIG. 10 shows a case in which the video is made up of three scenes that contain telops.
FIG. 11 is a diagram for explaining an example (part 1) of the font / video type correspondence database.
FIG. 12 is a diagram for explaining an example (part 2) of the font / video type correspondence database.
FIG. 13 is a diagram for explaining the telop fonts used in the video as a whole.
FIG. 14 is a diagram showing an application example of the present invention.
FIG. 15 is a diagram showing the relationship between content and metadata.

Explanation of symbols

41, 61 Video material
42 Video input device
43 Frame extraction device
44 Character recognition device
45 Character recognition information database
46 Font determination device
47 Font information database
48 Video type determination device
49 Font / video type correspondence database
50 Metadata generation device
51 Metadata management device
62 Video input unit
63 Format conversion unit
64 Content management unit
65 Telop recognition unit
66 Font recognition unit
67 Font dictionary
68 Genre selection unit (video type determination unit)
69 Metadata generation unit
70 Metadata management device
71 Search unit
72 Playback unit
73 User terminal

Claims

1. A video type determination apparatus comprising:
frame extracting means for extracting a frame in which characters are displayed from an input video made up of frames;
character recognition means for recognizing the characters of the frame in which the characters are displayed;
font recognition means for recognizing the font of the recognized characters; and
video determination means for determining the video type of the input video by referring to a font / video type correspondence database in which character fonts and video types are associated with each other.

2. The video type determination apparatus according to claim 1, further comprising characteristic font determination means for determining that the font recognized most often by the font recognition means is a characteristic font of the video,
wherein the video determination means determines the video type based on the characteristic font determined by the characteristic font determination means and the font / video type correspondence database.

3. The video type determination apparatus according to claim 1, further comprising font ratio calculating means for accumulating, for each font, the fonts recognized by the font recognition means and calculating the font usage ratio of the input video,
wherein the video determination means determines the video type based on the font ratio calculated by the font ratio calculating means and the font / video type correspondence database.

4. The video type determination apparatus according to claim 3, wherein the font / video type correspondence database includes data on a font use ratio for each video genre in advance.

5. A video type determination method comprising:
a frame extracting step of extracting a frame in which characters are displayed from an input video made up of frames;
a character recognition step of recognizing the characters of the frame in which the characters are displayed;
a font recognition step of recognizing the font of the recognized characters; and
a video determination step of determining the video type of the input video by referring to a font / video type correspondence database in which character fonts and video types are associated with each other.

6. A video type determination program for causing a computer to function as:
frame extracting means for extracting a frame in which characters are displayed from an input video made up of frames;
character recognition means for recognizing the characters of the frame in which the characters are displayed;
font recognition means for recognizing the font of the recognized characters; and
video determination means for determining the video type of the input video by referring to a font / video type correspondence database in which character fonts and video types are associated with each other.

7. The video type determination program according to claim 6, further causing the computer to function as characteristic font determination means for determining that the font recognized most often by the font recognition means is a characteristic font of the video,
wherein the video determination means determines the video type based on the characteristic font determined by the characteristic font determination means and the font / video type correspondence database.

8. The video type determination program according to claim 6, further causing the computer to function as font ratio calculating means for accumulating, for each font, the fonts recognized by the font recognition means and calculating the font usage ratio of the input video,
wherein the video determination means determines the video type based on the font ratio calculated by the font ratio calculating means and the font / video type correspondence database.