JP4473813B2

JP4473813B2 - Metadata automatic generation device, metadata automatic generation method, metadata automatic generation program, and recording medium recording the program

Info

Publication number: JP4473813B2
Application number: JP2005356105A
Authority: JP
Inventors: 秀豪桑野; 裕子紺家; 智一山田; 雄彦川添
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2005-12-09
Filing date: 2005-12-09
Publication date: 2010-06-02
Anticipated expiration: 2025-12-09
Also published as: JP2007165983A

Description

The present invention relates to an automatic metadata generation method that automatically creates metadata describing the content of video and images, and in particular to a technique for generating that metadata on the basis of automatic recognition of the character information that appears in video and images.

There is strong demand for techniques that efficiently produce the text information (hereinafter, metadata) describing the content of video scenes and images, which is needed to realize services such as video scene and image search. To meet this demand, many techniques have been studied that automatically recognize the character information displayed in video and images and define the recognition result as metadata; one example is the function described in Non-Patent Document 1 below.

Character information displayed in video and images includes, for example, telop (caption) characters that explain the content of a news item in a news program, and characters on outdoor signboards that give information such as a store's business hours. Such characters serve to describe their subject precisely. Character information in video and images is therefore very useful as key information for searching video and images.

The technique used in the system of Non-Patent Document 1 recognizes the character information in video and images by character recognition, defines the resulting text as metadata of the video or image, and uses it as keyword information for video and image search.

Patent Document 1 below describes a "character string information extracting apparatus and method and medium recording the method".
Non-Patent Document 1: 山本奏, 萬本正信, 黒川清, 宇高宏明, 山吉一成, 佐々木大義, NTT技術ジャーナル (NTT Technical Journal), Vol. 17, pp. 40-43, electrical engineering edition, 2005.1.
Patent Document 1: Japanese Patent Laid-Open No. 2001-76094, "Character String Information Extracting Apparatus and Method and Medium Recording the Method"

The prior art in the system of Non-Patent Document 1 defines the text of a single character recognition result for the whole image as metadata as-is, even when the video or image contains several character displays with different meanings.

Specifically, suppose that several kinds of characters, such as person names, place names, and event names, are displayed in an image as shown in FIG. 1. FIG. 1 is an example of a video/image containing character displays: a live typhoon report scene in a news video, in which three character strings are displayed, namely the date information "20th (Tue)", the event name "Typhoon No. 15, approaching", and the person name "Reporter Sato". In this case, as shown in FIG. 2, the single character string "20th (Tue) Typhoon No. 15, approaching Reporter Sato", in which the date, event name, and person name are mixed, is set as metadata as-is. FIG. 2 shows an example in which this single character string is set in one item (the telop recognition result tag) of metadata expressed in XML format. ... (A)

On the other hand, when searching for information by keyword, users often specify not only the keyword string but also the item name of the information they want, such as a person name or place name, so that unwanted information does not appear in the results (see FIG. 3). Because unwanted information is excluded, the user can reach the desired information in less time than when no item is specified. ... (B)

To realize (B), the text database used for search must store not only the text to be searched but also, as a set with it, the name of the item that the text represents. To perform a search like that of FIG. 3, the text "typhoon" must be stored in the database together with the information that it belongs to the item "news event".

However, when text data for search is created by automatically recognizing the characters displayed in video and images, the prior art achieves only the level described in (A). The character strings of various items, such as person names and place names, are stored in the database only as one mixed character string, so the search benefit described in (B) cannot be offered to the user.

The present invention has been made in view of the above. Its object is that, even when a video or image contains a mixture of character strings belonging to several items, the result of automatic character recognition is stored in the database not as a mixed character string like that of FIG. 2 but as sets of an item name and its corresponding character string, so that users can retrieve the desired information more efficiently than with the prior art.

To solve the above problem, the automatic metadata generation device according to claim 1 of the present invention comprises: a video/image data reading unit that reads video/image data in which one or more character strings are displayed on the screen; a character string rectangle extraction unit that binarizes the video/image data read by the video/image data reading unit to extract character regions, further extracts luminance edges from the read video/image data, counts the extracted character regions and luminance edges for each horizontal line in the image, and judges that a horizontal line whose count is equal to or greater than a threshold lies within a character string rectangle, thereby acquiring character string rectangle information, namely the vertical coordinate values of the vertices and the size of the circumscribed rectangle of every character string displayed in the video/image data; a character recognition unit that performs character recognition on the character patterns within the rectangles extracted from the image by the character string rectangle extraction unit and outputs text data as the character recognition result; a metadata item name search unit that compares the character string rectangle information extracted by the character string rectangle extraction unit with the character string rectangle conditions in a metadata item name database, in which one or more metadata item names are defined together with corresponding conditions on the upper and lower limits of the vertical coordinate values of the vertices and on the size of a character string rectangle in the image, and that, when a condition is met, outputs the metadata item name corresponding to that condition; and a metadata output unit that outputs, as a set, the metadata item name output by the metadata item name search unit and the character recognition result obtained by the character recognition unit; wherein the metadata item name search unit compares the vertical coordinate values of the vertices of the circumscribed rectangle with the upper and lower limits of the vertical coordinate values of the vertices of the character string rectangle defined in the character string rectangle condition, and judges that the condition is met when the values fall within the range between the upper and lower limits.

Specifically, the device comprises: a video/image data reading unit that reads video/image data in which one or more character strings are displayed on the screen; a character string rectangle extraction unit that applies predetermined character region extraction processing and luminance edge information extraction processing to the video/image data read by the video/image data reading unit, counts the extracted character regions and luminance edges for each horizontal line in the image, and judges that a horizontal line whose count is equal to or greater than a threshold lies within a character string rectangle, thereby acquiring character string rectangle information, namely the vertical coordinate values of the vertices and the size of the circumscribed rectangle of every character string displayed in the video/image data; a character recognition unit that performs character recognition, by a predetermined method, on the character patterns within the rectangles extracted from the image by the character string rectangle extraction unit and outputs text data as the character recognition result; a metadata item name database in which one or more metadata item names are defined together with corresponding conditions on the upper and lower limits of the vertical coordinate values of the vertices and on the size of a character string rectangle in the image; a metadata item name search unit that compares the character string rectangle information extracted by the character string rectangle extraction unit with the character string rectangle conditions in the metadata item name database and, when a condition is met, outputs the metadata item name corresponding to that condition; and a metadata output unit that outputs, as a set, the metadata item name output by the metadata item name search unit and the character recognition result obtained by the character recognition unit; wherein the metadata item name search unit compares the vertical coordinate values of the vertices of the circumscribed rectangle with the upper and lower limits of the vertical coordinate values of the vertices of the character string rectangle defined in the character string rectangle condition, and judges that the condition is met when the values fall within the range between the upper and lower limits.

To solve the above problem, the automatic metadata generation method according to claim 2 of the present invention comprises: a video/image data reading step of reading video/image data in which one or more character strings are displayed on the screen; a character string rectangle extraction step of binarizing the video/image data read in the video/image data reading step to extract character regions, further extracting luminance edges from the read video/image data, counting the extracted character regions and luminance edges for each horizontal line in the image, and judging that a horizontal line whose count is equal to or greater than a threshold lies within a character string rectangle, thereby acquiring character string rectangle information, namely the vertical coordinate values of the vertices and the size of the circumscribed rectangle of every character string displayed in the video/image data; a character recognition step of performing character recognition on the character patterns within the rectangles extracted from the image in the character string rectangle extraction step and outputting text data as the character recognition result; a metadata item name search step of comparing the character string rectangle information extracted in the character string rectangle extraction step with the character string rectangle conditions in a metadata item name database, in which one or more metadata item names are defined together with corresponding conditions on the upper and lower limits of the vertical coordinate values of the vertices and on the size of a character string rectangle in the image, and, when a condition is met, outputting the metadata item name corresponding to that condition; and a metadata output step of outputting, as a set, the metadata item name output in the metadata item name search step and the character recognition result obtained in the character recognition step; wherein the metadata item name search step compares the vertical coordinate values of the vertices of the circumscribed rectangle with the upper and lower limits of the vertical coordinate values of the vertices of the character string rectangle defined in the character string rectangle condition, and judges that the condition is met when the values fall within the range between the upper and lower limits.

Specifically, the method comprises: a video/image data reading step of reading video/image data in which one or more character strings are displayed on the screen; a character string rectangle extraction step of applying predetermined character region extraction processing and luminance edge information extraction processing to the video/image data read in the video/image data reading step, counting the extracted character regions and luminance edges for each horizontal line in the image, and judging that a horizontal line whose count is equal to or greater than a threshold lies within a character string rectangle, thereby acquiring character string rectangle information, namely the vertical coordinate values of the vertices and the size of the circumscribed rectangle of every character string displayed in the video/image data; a character recognition step of performing character recognition, by a predetermined method, on the character patterns within the rectangles extracted from the image in the character string rectangle extraction step and outputting text data as the character recognition result; a metadata item name search step of comparing the character string rectangle information extracted in the character string rectangle extraction step with the character string rectangle conditions in a metadata item name database, in which one or more metadata item names are defined together with corresponding conditions on the upper and lower limits of the vertical coordinate values of the vertices and on the size of a character string rectangle in the image, and, when a condition is met, outputting the metadata item name corresponding to that condition; and a metadata output step of outputting, as a set, the metadata item name output in the metadata item name search step and the character recognition result obtained in the character recognition step; wherein the metadata item name search step compares the vertical coordinate values of the vertices of the circumscribed rectangle with the upper and lower limits of the vertical coordinate values of the vertices of the character string rectangle defined in the character string rectangle condition, and judges that the condition is met when the values fall within the range between the upper and lower limits.

To solve the above problem, the automatic metadata generation program according to claim 3 of the present invention is configured as a program that causes a computer to execute each step of the automatic metadata generation method according to claim 2.

To solve the above problem, the recording medium according to claim 4 of the present invention is a computer-readable recording medium on which the automatic metadata generation program according to claim 3 is recorded.

Among video and images in which characters are displayed, some generally have a fixed display layout pattern that depends on the content of the displayed characters. FIG. 4 shows examples. FIG. 4(a) shows the telop characters of a news headline displayed at the beginning of each news topic in a news program. In Japan, within the same news program, telop characters with the same position, size, and design in the image are displayed at the beginning of each news topic.

As another example, FIG. 4(b) shows a credit display shown at the beginning of a video produced by a video production company or the like. The credit display contains characters that describe the video content, such as the production date, program title, producer, and broadcast date. For videos produced by the same production company, the characters in the credit display of every video appear at the same position, size, and design in the image. The present invention exploits this property: from layout information such as the position, size, and design of the characters in the image, it also recognizes the item name that the character content refers to, and stores both the item name and the text of the character recognition result in the database.

That is, as set out in claim 1, the present invention not only recognizes the character patterns displayed in the input video/image and obtains the text of the recognition result, but also extracts layout information such as the position and size of the circumscribed rectangle of each character string, which makes it possible to obtain the corresponding metadata item name from the metadata item name database. Consequently, instead of outputting the recognition result of the characters in an image as a single piece of text data as in the prior art, the present invention can output, for each character string, a set consisting of a metadata item name and the text data of the character recognition result, which solves the problem of the prior art.

According to the present invention, even when a video or image contains a mixture of character strings belonging to several items, the result of automatic character recognition is stored in the database as sets of a metadata item name and its corresponding character string. This allows users to retrieve the desired information more efficiently than with the prior art and makes video and image search more efficient.

Embodiments of the present invention are described below with reference to the drawings; the present invention is not limited to these embodiments. FIG. 5 shows an example of a specific configuration of the automatic metadata generation device according to claim 1, as a first embodiment of the present invention.

The video/image data reading unit 1 in FIG. 5 reads video/image data in which one or more character strings are displayed on the screen. Specifically, this unit can be realized by capturing the output video signal of a VTR (video tape recorder) with a personal computer equipped with a video capture board, digitizing it, and writing it to memory or to a hard disk.

The character string rectangle extraction unit 2 in FIG. 5 applies predetermined character region extraction processing and luminance edge information extraction processing to the video/image data read by the video/image data reading unit 1 and, based on the distribution density within the image of the resulting character regions and luminance edges, acquires the position and size of the circumscribed rectangle of every character string displayed in the video/image data.

Specifically, this unit can be realized by implementing a suitable method as a software program on a personal computer. One usable method is, for example, the one proposed in Patent Document 1, "Character String Information Extracting Apparatus and Method and Medium Recording the Method".

FIG. 6 is a supplementary diagram explaining how the character string rectangle extraction unit 2 can be realized. FIG. 6(a) is an input image containing character displays. FIG. 6(b) is the result of extracting character regions from FIG. 6(a) by binarizing the image; the white parts of the image are the character regions. FIG. 6(c) is the result of extracting luminance edges from FIG. 6(a). FIG. 6(d) is a histogram obtained by counting the character region and luminance edge information of FIGS. 6(b) and 6(c) for each horizontal line in the image. Extracting the peaks of this histogram by threshold processing makes it possible to calculate the position and size of each character string rectangle.
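The per-line counting and thresholding described for FIG. 6(d) can be sketched as follows. This is a minimal illustration, not the method of Patent Document 1: it assumes the binarized character mask and the edge mask are already available as lists of 0/1 rows, and it assumes the two counts are simply added per horizontal line (the text does not say how they are combined). The names `row_counts`, `text_bands`, and the sample masks are hypothetical.

```python
def row_counts(binary, edges):
    """Count, per horizontal line, the pixels marked as character region or luminance edge.

    Assumption: the two counts are summed; the source only says both are counted per line.
    """
    return [sum(b_row) + sum(e_row) for b_row, e_row in zip(binary, edges)]


def text_bands(counts, threshold):
    """Return (top, height) for each run of consecutive lines whose count >= threshold."""
    bands = []
    start = None
    for y, count in enumerate(counts):
        if count >= threshold and start is None:
            start = y                      # a histogram peak begins on this line
        elif count < threshold and start is not None:
            bands.append((start, y - start))
            start = None
    if start is not None:                  # a peak that runs to the last line
        bands.append((start, len(counts) - start))
    return bands


# Tiny hand-made masks (6 lines x 4 pixels) standing in for FIG. 6(b) and 6(c).
binary = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
edges = [
    [0, 0, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
counts = row_counts(binary, edges)
bands = text_bands(counts, threshold=3)
print(bands)  # [(1, 2), (4, 1)]
```

Each band gives the vertical coordinate of the top of a character string rectangle and its height. The horizontal extent could be obtained in the same way by counting per column inside each band, which is likewise an assumption rather than something the text specifies.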

The character recognition unit 3 in FIG. 5 performs character recognition, by a predetermined method, on the character patterns within the rectangles extracted from the image by the character string rectangle extraction unit 2 and outputs text data as the character recognition result. It can be realized by software on a computer.

The metadata item name database 4 in FIG. 5 defines one or more metadata item names together with the corresponding conditions on the position and size of a character string rectangle in the image. Specifically, it can be realized as information stored on a computer's hard disk or in memory. FIG. 7 shows an example of the contents of the metadata item name database.

In the table of FIG. 7, the left column lists example metadata item names: program name, broadcast date and time, genre, and producer. The right column gives, for each metadata item name in the left column, the condition on the character string rectangle in the image. For the metadata item "program name", the example sets the following conditions on the character string rectangle in the corresponding input image:
・Horizontal coordinate of the rectangle's upper-left vertex: 50 to 60 inclusive
・Vertical coordinate of the rectangle's upper-left vertex: 250 to 260 inclusive
・Rectangle width: 300 to 350 inclusive
・Rectangle height: 40 to 50 inclusive
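One possible in-memory form of a single row of the FIG. 7 table is sketched below. The dictionary layout, the key names, and the helper `satisfies` are assumptions made for illustration; only the numeric ranges for "program name" come from the text.

```python
# Hypothetical representation of the "program name" row of FIG. 7.
# Each entry maps a rectangle attribute to its inclusive (lower, upper) limits.
PROGRAM_NAME_CONDITION = {
    "x": (50, 60),         # horizontal coordinate of the upper-left vertex
    "y": (250, 260),       # vertical coordinate of the upper-left vertex
    "width": (300, 350),
    "height": (40, 50),
}


def satisfies(rect, condition):
    """True when every attribute of rect lies within its inclusive limits."""
    return all(lower <= rect[key] <= upper
               for key, (lower, upper) in condition.items())


rect = {"x": 55, "y": 250, "width": 320, "height": 45}
print(satisfies(rect, PROGRAM_NAME_CONDITION))  # True
```

The limits are treated as inclusive because the table states them as "50 or more and 60 or less".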

The metadata item name search unit 5 in FIG. 5 compares the character string rectangle information extracted by the character string rectangle extraction unit 2 with the character string rectangle conditions in the metadata item name database 4 and, when a condition is met, outputs the metadata item name corresponding to that condition. It can be realized by implementing this method as software on a computer. FIG. 8 shows an example of a specific processing flow for this unit.

Step S21 in FIG. 8 takes as input the coordinates of the upper-left vertex, the width, and the height of a character string rectangle extracted by the character string rectangle extraction unit 2 in FIG. 5, and the process proceeds to step S22.

Step S22 in FIG. 8 sets to 1 the counter variable N used to refer to the metadata item names stored in the metadata item name database 4 in FIG. 5, and the process proceeds to step S23. Here, M denotes the total number of metadata item names stored in the metadata item name database.

Step S23 in FIG. 8 judges whether the coordinates of the upper-left vertex, the width, and the height of the character string rectangle input in step S21 satisfy the character string rectangle condition of the N-th metadata item in the metadata item name database. If they do, the process proceeds to step S26; if not, it proceeds to step S24.

As a concrete example of this step, suppose the character string rectangle input in step S21 has an upper-left vertex with horizontal coordinate 55 and vertical coordinate 250, a width of 320, and a height of 45, and the metadata item name database is set as in FIG. 7. The input values then satisfy the character string rectangle conditions for the metadata item name "program name" (N = 1):
・Horizontal coordinate of the rectangle's upper-left vertex: 50 to 60 inclusive
・Vertical coordinate of the rectangle's upper-left vertex: 250 to 260 inclusive
・Rectangle width: 300 to 350 inclusive
・Rectangle height: 40 to 50 inclusive
In such a case, the process proceeds to step S26. If the conditions are not satisfied, it proceeds to step S24.

Step S24 in FIG. 8 determines whether the counter variable N, which indexes the metadata item names stored in the metadata item name database 4 in FIG. 5, is less than or equal to the total number M of metadata item names. If N <= M, the process proceeds to step S25; otherwise, this processing unit terminates without outputting a metadata item name search result.

Step S25 in FIG. 8 increments by 1 the counter variable N, which indexes the metadata item names stored in the metadata item name database 4 in FIG. 5, and the process returns to step S23.

Step S26 in FIG. 8 outputs the N-th metadata item in the metadata item name database. For example, in the processing example of step S23 above, the metadata item name "program name" is output. When this step ends, the processing of the entire processing unit also ends.

The metadata output unit 6 in FIG. 5 outputs, as a set, the metadata item name output from the metadata item name search unit 5 and the character recognition result obtained by the character recognition unit 3. For example, when the metadata item name output from the metadata item name search unit 5 is "program name" and the character recognition result obtained by the character recognition unit 3 is "Fresh News", the pair of "program name" and "Fresh News" is the output of the metadata output unit.
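The search flow of steps S21 to S26 and the pairing done by the metadata output unit 6 might be sketched in Python as follows. The names `ItemCondition`, `search_item_name`, and `output_metadata` are hypothetical, and the second database entry is invented purely for illustration. The loop visits each of the M entries exactly once, which is one interpretation of the counter handling in steps S22, S24, and S25.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

Range = Tuple[int, int]            # inclusive (lower, upper) bounds
Rect = Tuple[int, int, int, int]   # (x, y, width, height) of a character string rectangle


class ItemCondition(NamedTuple):
    """One database entry: an item name and its rectangle condition (hypothetical)."""
    item_name: str
    x: Range
    y: Range
    width: Range
    height: Range


def search_item_name(rect: Rect, database: Sequence[ItemCondition]) -> Optional[str]:
    """Return the first item name whose condition the rectangle satisfies, else None."""
    for cond in database:                                  # N = 1 .. M (S22, S24, S25)
        bounds = (cond.x, cond.y, cond.width, cond.height)
        if all(lo <= v <= hi for v, (lo, hi) in zip(rect, bounds)):   # S23
            return cond.item_name                          # S26
    return None                                            # no search result is output


def output_metadata(rect: Rect, recognized_text: str,
                    database: Sequence[ItemCondition]) -> Optional[Tuple[str, str]]:
    """Pair the found item name with the character recognition result."""
    name = search_item_name(rect, database)
    return None if name is None else (name, recognized_text)


# Only the "program name" bounds come from the text; the second entry is invented.
DATABASE = [
    ItemCondition("program name", (50, 60), (250, 260), (300, 350), (40, 50)),
    ItemCondition("reporter name", (400, 420), (300, 320), (100, 150), (30, 40)),
]

print(output_metadata((55, 250, 320, 45), "Fresh News", DATABASE))
print(output_metadata((0, 0, 10, 10), "unmatched text", DATABASE))
```

Returning `None` for an unmatched rectangle mirrors the case in which the search ends without outputting an item name. How such unmatched strings should be handled downstream is not stated in the text.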

Another example is shown in FIG. 9, in which the conventional character recognition result shown in FIG. 2 is converted into metadata for each metadata item name according to the present invention. With the prior art, as shown in FIG. 2, the multiple character strings in the image of FIG. 1 are merged and used as metadata in the form of a single character string, "20th (Tue) Typhoon No. 15, approaching Reporter Sato". With the present invention, by contrast, that character string is divided into three character strings, "20th (Tue)", "Typhoon No. 15, approaching", and "Reporter Sato", which can be output together with the metadata item names "date", "news title", and "reporter name", as shown in FIG. 9.

Thus, according to the present invention, even when character strings belonging to multiple items are mixed in a video or image, the result of automatic character recognition can be stored in a database as sets of a metadata item name and its corresponding character string, so that users can retrieve desired information more efficiently than with the prior art.
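To illustrate why item-name and character-string pairs help retrieval, the following Python sketch contrasts an item-scoped search with a search over one merged string. The `records` list uses the three pairs from the FIG. 9 example in English rendering. The function `find_by_item` and the in-memory list standing in for the database are assumptions made for illustration only.

```python
from typing import List, Tuple

# (metadata item name, recognized character string) pairs, as in the FIG. 9 example.
records: List[Tuple[str, str]] = [
    ("date", "20th (Tue)"),
    ("news title", "Typhoon No. 15, approaching"),
    ("reporter name", "Reporter Sato"),
]


def find_by_item(rows: List[Tuple[str, str]], item_name: str, keyword: str) -> List[str]:
    """Return the strings stored under item_name that contain keyword."""
    return [text for name, text in rows if name == item_name and keyword in text]


# Prior-art style: all strings merged into one, so the role of "Sato" is lost.
flat = " ".join(text for _, text in records)
print("Sato" in flat)

# Item-scoped search: "Sato" is found as a reporter name but not as a news title.
print(find_by_item(records, "reporter name", "Sato"))
print(find_by_item(records, "news title", "Sato"))
```

The merged string can only answer whether a keyword appears at all, while the paired form lets a query restrict the keyword to a specific item.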

Note that the metadata item name database 4 may be provided externally and configured to be accessible from the automatic metadata generation device.

Alternatively, a recording medium storing the program code of software that implements the functions and processing of the automatic metadata generation device of FIGS. 5 and 8 may be supplied to a system or apparatus, and the CPU (MPU) of that system or apparatus may read and execute the program code stored on the recording medium. In this case, examples of the recording medium on which the program code is recorded include a CD-ROM, DVD-ROM, CD-R, CD-RW, MO, and HDD.

FIG. 1 is an explanatory diagram showing an example of a video or image containing displayed characters: a scene of a live typhoon report in a news video.
FIG. 2 is an explanatory diagram showing an example of using a character recognition result as metadata according to the prior art.
FIG. 3 is an explanatory diagram showing an example of a keyword search screen in which item names are specified.
FIG. 4 shows examples of standard layout patterns of character display in an image; (a) is an explanatory diagram of a screen containing telop characters, and (b) is an explanatory diagram of a credit screen.
FIG. 5 is a block diagram showing the configuration of the automatic metadata generation device in one embodiment of the present invention.
FIG. 6 is an explanatory diagram illustrating the processing of the character string rectangle extraction unit in one embodiment of the present invention.
FIG. 7 is an explanatory diagram showing an example of the contents of the metadata item name database in one embodiment of the present invention.
FIG. 8 is a flowchart showing the specific processing flow of the metadata item name search unit in one embodiment of the present invention.
FIG. 9 is an explanatory diagram showing an example of a metadata output result according to one embodiment of the present invention.

Explanation of symbols

1 … video/image data reading unit, 2 … character string rectangle extraction unit, 3 … character recognition unit, 4 … metadata item name database, 5 … metadata item name search unit, 6 … metadata output unit.

Claims

An automatic metadata generation apparatus comprising:
a video/image data reading unit that reads video/image data in which one or more character strings are displayed on the screen;
a character string rectangle extraction unit that binarizes the video/image data read by the video/image data reading unit to extract character areas, extracts luminance edges from the read video/image data, counts the extracted character areas and luminance edges for each horizontal line in the image, and determines that a horizontal line whose count is equal to or greater than a threshold lies within a character string rectangle, thereby acquiring, for every displayed character string in the video/image data, character string rectangle information consisting of the vertical coordinate value of a vertex of its circumscribed rectangle and size information;
a character recognition unit that performs character recognition processing on the character pattern within each rectangle in the image extracted by the character string rectangle extraction unit, and outputs text data as a character recognition result;
a metadata item name search unit that compares the character string rectangle information extracted by the character string rectangle extraction unit against the character string rectangle conditions in a metadata item name database, in which one or more metadata item names are defined together with the corresponding conditions on the upper and lower limits of the vertical coordinate value of the vertex of the character string rectangle in the image and on its size, and that, when a condition is met, outputs the metadata item name corresponding to that condition; and
a metadata output unit that outputs, as a set, the metadata item name output from the metadata item name search unit and the character recognition result obtained by the character recognition unit,
wherein the metadata item name search unit compares the vertical coordinate value of the vertex of the circumscribed rectangle with the upper and lower limits of the vertical coordinate value of the vertex of the character string rectangle defined by the character string rectangle condition, and determines that the condition is met when the value is within the range between the lower limit and the upper limit.

An automatic metadata generation method comprising:
a video/image data reading step of reading video/image data in which one or more character strings are displayed on the screen;
a character string rectangle extraction step of binarizing the video/image data read in the video/image data reading step to extract character areas, extracting luminance edges from the read video/image data, counting the extracted character areas and luminance edges for each horizontal line in the image, and determining that a horizontal line whose count is equal to or greater than a threshold lies within a character string rectangle, thereby acquiring, for every displayed character string in the video/image data, character string rectangle information consisting of the vertical coordinate value of a vertex of its circumscribed rectangle and size information;
a character recognition step of performing character recognition processing on the character pattern within each rectangle in the image extracted in the character string rectangle extraction step, and outputting text data as a character recognition result;
a metadata item name search step of comparing the character string rectangle information extracted in the character string rectangle extraction step against the character string rectangle conditions in a metadata item name database, in which one or more metadata item names are defined together with the corresponding conditions on the upper and lower limits of the vertical coordinate value of the vertex of the character string rectangle in the image and on its size, and, when a condition is met, outputting the metadata item name corresponding to that condition; and
a metadata output step of outputting, as a set, the metadata item name output in the metadata item name search step and the character recognition result obtained in the character recognition step,
wherein the metadata item name search step compares the vertical coordinate value of the vertex of the circumscribed rectangle with the upper and lower limits of the vertical coordinate value of the vertex of the character string rectangle defined by the character string rectangle condition, and determines that the condition is met when the value is within the range between the lower limit and the upper limit.

An automatic metadata generation program configured as a program for causing a computer to execute each step in the automatic metadata generation method according to claim 2.

A computer-readable recording medium on which the metadata automatic generation program according to claim 3 is recorded.