JP2011128924A

JP2011128924A - Comic image analysis apparatus, program, and search apparatus and method for extracting text from comic image

Info

Publication number: JP2011128924A
Application number: JP2009287145A
Authority: JP
Inventors: Keiichiro Hoashi; 啓一郎帆足; Yasuhiro Takishima; 康弘滝嶋
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2009-12-18
Filing date: 2009-12-18
Publication date: 2011-06-30
Anticipated expiration: 2029-12-18
Also published as: JP5433396B2

Abstract

PROBLEM TO BE SOLVED: To provide a comic image analysis apparatus that extracts the text written in comic images by analyzing them as images.
SOLUTION: The comic image analysis apparatus includes: a target feature point extraction means for extracting a large number of feature points from a comic image to be analyzed; a positional clustering means for clustering the feature points according to their distribution density on the image; a sub-region image extraction means for extracting a sub-region image from the image using a frame surrounding the feature points contained in each positional cluster; a sub-region vector calculation means for calculating a feature vector of the sub-region image; a sub-region image classification means for comparing the feature vector of the sub-region image with feature vectors of character regions obtained from learning data and classifying it according to whether or not it is a character region; and a text extraction means for extracting text from a sub-region image determined to be a character region.
COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to an image analysis technique for extracting text from an image, and to an image search technique for searching for such images by keyword.

In recent years, many sites that distribute and sell electronic books over the Internet have been established. Such sites offer rich search functions that allow electronic books to be searched on the basis of various elements. Electronic books are not limited to text-based books such as novels; they also include image-based books such as manga. Meta information (title, author name, genre, and so on) is attached to these electronic books, and the user can search using that meta information as a search element. When an electronic book is text-based, for example, the text within it can also be searched by keyword.

Separately, there are conventional techniques for extracting a character region from an image (see, for example, Patent Documents 1 to 3). With such a technique, given for example an image of characters printed on paper, the regions in which characters appear can be extracted automatically. To detect characters, the technique considers the edge features of the character font, the spacing between the rectangular regions enclosing the character outlines, and so on.

Patent Document 1: JP-A-8-293003
Patent Document 2: JP 2005-275854 A
Patent Document 3: JP 2009-130899 A

Non-Patent Document 1: D. Lowe, "Distinctive image features from scale-invariant keypoints", IEEE Trans. Pattern Analysis Machine Intelligence, 20: 91-110, 2004.
Non-Patent Document 2: M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise", in Proceedings from 2nd International Conference on Knowledge Discovery and Data Mining, 1996, pp. 226-230.
Non-Patent Document 3: Hironobu Fujiyoshi, "Gradient-based feature extraction", Department of Information Engineering, Faculty of Engineering, Chubu University, [online], [retrieved December 3, 2009], Internet <URL: http://ci.nii.ac.jp/naid/110006423317/>
Non-Patent Document 4: Daisuke Ishii, Kei Kawamura, Hiroshi Watanabe, Graduate School of International Information and Communication Studies, Waseda University, "A Study on Comic's Frame Division Processing", IEICE Transactions D, Vol. J90-D, No. 7, pp. 1667-1670, 2007, special section of letters on image coding and video media processing, [online], [retrieved December 9, 2009], Internet <URL: http://www.ams.giti.waseda.ac.jp/pdf-files/j90-d_7_1667.pdf>

Among electronic books, manga content is important content sought by many users. However, because manga content is image-based, it can generally be searched only at the level of meta information. Although a manga image contains text written in speech balloons, that text is treated as part of the image. Consequently, a manga image cannot be searched using, for example, a specific line of dialogue in the image as a keyword.

FIG. 1 is an example of a manga image.

As FIG. 1 shows, a manga image is often divided into frames (panels), and text such as dialogue appears inside speech balloons in the image. Manga images are also generally monochrome, and in many cases not only the artwork but also the text itself is drawn by hand with a pen or brush. As a result, unlike an image such as a landscape photograph, a manga image characteristically contains many edges throughout.

In contrast, techniques such as those described in Patent Documents 1 to 3 attempt to extract a character region from an image by using the edge features of character fonts. It is therefore extremely difficult for them to extract a character region from a character string drawn by hand.

Accordingly, an object of the present invention is to provide a manga image analysis apparatus, program, search apparatus, and method that appropriately identify character regions by analyzing a manga image as an image and extract the text written there.

According to the present invention, there is provided a manga image analysis apparatus for extracting text written in a manga image, comprising:
target feature point extraction means for extracting a large number of feature points from a manga image to be analyzed;
positional clustering means for clustering the feature points based on their distribution density on the image;
sub-region image extraction means for extracting, for each positional cluster, a sub-region image on the image from an outer frame enclosing the feature points included in that cluster;
sub-region vector calculation means for calculating a feature vector of the sub-region image from all feature points included in the sub-region image;
sub-region image classification means for comparing the feature vector of the sub-region image with feature vectors of character regions obtained from learning data, and classifying the sub-region image according to whether or not it is a character region; and
text extraction means for extracting text from a sub-region image determined to be a character region.

The manga image analysis apparatus of the present invention preferably further comprises:
learning feature point extraction means for extracting a large number of feature points from learning images;
elemental clustering means for clustering the feature points into k learning clusters based on their feature values;
learning cluster vector calculation means for calculating, for each elemental cluster, a feature vector of the cluster from all feature points included in that cluster; and
image classification learning means that has learned, for each elemental cluster, whether or not the feature points included in the cluster belong to a character region,
wherein the learning data used by the sub-region image classification means is the learning data learned by the image classification learning means.

In the manga image analysis apparatus of the present invention, it is also preferable that:
the elemental clustering means classifies the feature points into k learning clusters by k-means clustering;
the sub-region vector calculation means and the learning cluster vector calculation means calculate k-dimensional feature vectors corresponding to the k learning clusters; and
the image classification learning means generates learning data consisting of k-dimensional feature vectors determined to be character regions.

In the manga image analysis apparatus of the present invention, it is also preferable that the target feature point extraction means or the learning feature point extraction means detects feature points by SIFT (Scale-Invariant Feature Transform).

In the manga image analysis apparatus of the present invention, it is also preferable that the image classification learning means uses a support vector machine (SVM).

In the manga image analysis apparatus of the present invention, it is also preferable that the text extraction means is an OCR (Optical Character Recognition) means that identifies characters by matching a sub-region image determined to be a character region against character patterns stored in advance.

It is also preferable that the manga image analysis apparatus of the present invention further comprises frame image extraction means for dividing the manga image to be analyzed into frame images, each enclosed by straight lines on the image, and that the target feature point extraction means extracts a large number of feature points from each frame image.

According to the present invention, there is also provided a manga image search apparatus that includes all the functions of the manga image analysis apparatus described above, further comprising:
index storage means for storing the text extracted by the text extraction means as an index, in association with identification information of the manga image;
search keyword input means for inputting a search keyword;
manga image search means for searching, using the index storage means, for identification information of manga images whose index matches the search keyword; and
search result output means for outputting the identification information of the retrieved manga images.

In the manga image search apparatus of the present invention, it is also preferable that:
the text extraction means is an OCR means that identifies characters by matching a sub-region image determined to be a character region against character patterns stored in advance, and that outputs the character size obtained by the OCR to the index storage means;
the index storage means stores a priority in association with each text, assigning a higher priority to text for which the character size output from the text extraction means is larger; and
the manga image search means outputs search results preferentially for text that has a higher priority stored in the index storage means.
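The size-based prioritization described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `rank_by_character_size`, the `(image_id, text, char_size)` tuple layout, and the use of the character size itself as the priority are hypothetical choices, not details given in this document.

```python
def rank_by_character_size(index_entries, keyword):
    """Return image ids whose indexed text contains `keyword`,
    ordered so that text drawn in larger characters ranks first.

    index_entries: iterable of (image_id, text, char_size) tuples
                   (hypothetical layout; char_size stands in for the priority).
    """
    best = {}  # image_id -> largest character size among matching texts
    for image_id, text, char_size in index_entries:
        if keyword in text:
            best[image_id] = max(best.get(image_id, 0), char_size)
    # Higher priority (larger characters) first; ties broken by id for determinism.
    return sorted(best, key=lambda image_id: (-best[image_id], image_id))


entries = [
    ("p1", "hello world", 12),
    ("p2", "HELLO hello", 30),
    ("p3", "goodbye", 40),
    ("p1", "hello!", 18),
]
ranking = rank_by_character_size(entries, "hello")
```

Using the largest matching character size per image is one simple way to express the intent that emphasized dialogue should surface first; other priority functions would fit the same description.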

According to the present invention, there is also provided a program that causes a computer installed in an apparatus for extracting text written in a manga image to function as:
target feature point extraction means for extracting a large number of feature points from a manga image to be analyzed;
positional clustering means for clustering the feature points based on their distribution density on the image;
sub-region image extraction means for extracting, for each positional cluster, a sub-region image on the image from an outer frame enclosing the feature points included in that cluster;
sub-region vector calculation means for calculating a feature vector of the sub-region image from all feature points included in the sub-region image;
sub-region image classification means for comparing the feature vector of the sub-region image with feature vectors of character regions obtained from learning data, and classifying the sub-region image according to whether or not it is a character region; and
text extraction means for extracting text from a sub-region image determined to be a character region.

According to the present invention, there is also provided a manga image analysis method for use in an apparatus that extracts text written in a manga image, comprising:
a first step of extracting a large number of feature points from a manga image to be analyzed;
a second step of clustering the feature points based on their distribution density on the image;
a third step of extracting, for each positional cluster, a sub-region image on the image from an outer frame enclosing the feature points included in that cluster;
a fourth step of calculating a feature vector of the sub-region image from all feature points included in the sub-region image;
a fifth step of comparing the feature vector of the sub-region image with feature vectors of character regions obtained from learning data, and classifying the sub-region image according to whether or not it is a character region; and
a sixth step of extracting text from a sub-region image determined to be a character region.

According to the manga image analysis apparatus, program, search apparatus, and method of the present invention, character regions can be appropriately identified by analyzing a manga image as an image, and the text written there can be extracted. In addition, by associating the extracted text with the manga image as an index, manga images can be searched by keyword.

FIG. 1 is an example of a manga image.
FIG. 2 is a basic functional block diagram of the manga image analysis apparatus according to the present invention.
FIG. 3 is a conceptual diagram showing image feature points as manga image analysis progresses.
FIG. 4 is a functional block diagram of the learning processing unit in the manga image analysis apparatus of the present invention.
FIG. 5 is a conceptual diagram showing image feature points as the learning process progresses.
FIG. 6 is a functional block diagram of the manga image search apparatus according to the present invention.
FIG. 7 is a system configuration diagram including the manga search apparatus of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 2 is a basic functional block diagram of the manga image analysis apparatus according to the present invention. FIG. 3 is a conceptual diagram showing image feature points as manga image analysis progresses.

The manga image analysis apparatus 1 extracts text written in a manga image. One use of this is to associate the extracted text with the manga image as an index, so that the manga image can be searched by keyword.

As shown in FIG. 2, the manga image analysis apparatus 1 includes a manga image storage unit 10 and a manga image analysis unit 11.

The manga image storage unit 10 stores the manga images to be analyzed and outputs them to the manga image analysis unit 11. A manga image to be analyzed is, for example, an image like the one in FIG. 1 described above.

The manga image analysis unit 11 extracts the text written in the manga image to be analyzed. It includes a frame image extraction unit 111, a target feature point extraction unit 112, a positional clustering unit 113, a sub-region image extraction unit 114, a sub-region vector calculation unit 115, a sub-region image classification unit 116, and a text extraction unit 117. These functional components are realized by executing a manga image analysis program that causes a computer installed in the apparatus to function. The processing flow formed by these functional components can also be understood as a manga image analysis method performed in the apparatus.

The frame image extraction unit 111 is provided when manga images are to be analyzed frame by frame. It divides the manga image to be analyzed into frame images, each enclosed by straight lines on the image. FIG. 3(a) is a conceptual diagram of identifying frame images. One technique for dividing an image into frames detects dividing-line candidates by band-based straight-line detection and then decides the dividing lines by a dividing-line conformity check (see, for example, Non-Patent Document 4). With this technique, dividing-line candidates are found by detecting detection lines one pixel wide. If the angle of a detection line is within ±45° of the horizontal axis, two detection lines adjacent in the vertical direction are treated as a "detection band"; otherwise, detection lines adjacent in the horizontal direction are treated as a detection band. The detection bands are then subjected to a density-gradient direction check and a frame inside/outside check to divide the image into frame images. Each resulting frame image is output to the positional clustering unit 113 as a manga image to be analyzed.

When the manga image has already been divided into frame images, the frame image extraction unit 111 is naturally unnecessary. Manga images distributed for mobile phones, for example, are divided into frames manually in advance. The frame image extraction unit 111 is strictly optional and is therefore drawn with a broken line in FIG. 2.

The target feature point extraction unit 112 extracts a large number of feature points from the manga image to be analyzed. FIG. 3(b) is a conceptual diagram showing feature points detected in a manga image. A "feature point" is a point representing a visual feature and can be detected using, for example, SIFT (Scale-Invariant Feature Transform) (see, for example, Non-Patent Document 1 or 3). SIFT is a technique that analyzes image structure using scale space and describes features that are invariant to changes in image scale and to rotation. With SIFT, feature points are detected in the following two steps.
(S1) Keypoints and their scales are determined by searching for extrema in scale space.
(S2) The keypoints so determined are narrowed down to stable keypoints on the basis of principal curvature and contrast.
The feature points extracted in this way are output to the positional clustering unit 113.
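Step (S2) can be illustrated with a small sketch of the contrast and principal-curvature tests applied to one candidate keypoint. It is a simplified illustration, not the implementation used here: the function name `is_stable_keypoint`, the nested-list representation of a difference-of-Gaussians (DoG) layer with values scaled to [0, 1], and the default thresholds (contrast 0.03 and curvature ratio 10, the values commonly quoted from Non-Patent Document 1) are all assumptions. Step (S1) and sub-pixel refinement are omitted.

```python
def is_stable_keypoint(dog, x, y, contrast_thresh=0.03, edge_ratio=10.0):
    """Return True if the candidate at column x, row y of a DoG layer
    passes the contrast test and the principal-curvature (edge) test.

    dog: 2-D list of floats (hypothetical representation of one DoG layer);
         (x, y) must not lie on the image border.
    """
    value = dog[y][x]
    # Contrast test: discard weak extrema, which are sensitive to noise.
    if abs(value) < contrast_thresh:
        return False
    # 2x2 Hessian estimated by finite differences around (x, y).
    dxx = dog[y][x + 1] + dog[y][x - 1] - 2.0 * value
    dyy = dog[y + 1][x] + dog[y - 1][x] - 2.0 * value
    dxy = (dog[y + 1][x + 1] - dog[y + 1][x - 1]
           - dog[y - 1][x + 1] + dog[y - 1][x - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    # Curvatures of opposite sign (or a flat direction) are rejected.
    if det <= 0:
        return False
    # Edge test: the ratio of principal curvatures must stay below edge_ratio.
    return trace * trace / det < (edge_ratio + 1.0) ** 2 / edge_ratio


blob = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
ridge = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
blob_ok = is_stable_keypoint(blob, 1, 1)    # isolated peak
ridge_ok = is_stable_keypoint(ridge, 1, 1)  # point on a straight edge
```

The edge test matters for manga in particular, because line art produces many responses along strokes that are poorly localized in the stroke direction.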

The positional clustering unit 113 clusters the feature points based on their distribution density on the image. The clustering here is based not on the features of the feature points but on their "positions" on the image. For example, by applying the DBSCAN algorithm (see, for example, Non-Patent Document 2), feature points that are densely distributed in position on the image are extracted as clusters.
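The density-based grouping of keypoint positions can be sketched with a minimal DBSCAN over 2-D coordinates. The function name `dbscan` and the parameter values `eps` and `min_pts` are assumptions for illustration; no parameter values are specified in this document, and a practical system would likely use a spatial index rather than the quadratic neighbor search shown here.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each 2-D point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # All points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point of this cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
        cluster_id += 1
    return labels


keypoints = [(0, 0), (0, 1), (1, 0), (1, 1),
             (10, 10), (10, 11), (11, 10), (11, 11),
             (50, 50)]
labels = dbscan(keypoints, eps=1.5, min_pts=3)
```

DBSCAN suits this step because the number of text regions per page is not known in advance and isolated keypoints can simply be left as noise.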

For each positional cluster, the sub-region image extraction unit 114 extracts a sub-region image on the image from an outer frame (for example, a circumscribed polygon) enclosing the feature points included in that cluster. FIG. 3(c) is a conceptual diagram showing sub-regions detected using feature points.
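As a sketch of this step, the simplest outer frame is an axis-aligned bounding rectangle around a cluster's points. This is an assumed simplification of the "circumscribed polygon" mentioned above; the helper names `bounding_box` and `crop`, the optional `margin`, and the list-of-rows image representation are hypothetical.

```python
def bounding_box(points, margin=0):
    """Axis-aligned rectangle (left, top, right, bottom) enclosing the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)


def crop(image, box):
    """Cut the sub-region out of an image given as a list of pixel rows.

    The box is inclusive on all sides and is clipped to the image bounds.
    """
    left, top, right, bottom = box
    height, width = len(image), len(image[0])
    left, top = max(0, left), max(0, top)
    right, bottom = min(width - 1, right), min(height - 1, bottom)
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# 5x5 test image whose pixel value encodes its position as row * 10 + column.
image = [[r * 10 + c for c in range(5)] for r in range(5)]
cluster_points = [(1, 1), (3, 2), (2, 3)]  # (x, y) keypoint positions
sub_image = crop(image, bounding_box(cluster_points))
```

A small positive `margin` may help keep whole characters inside the crop, since keypoints tend to sit on strokes rather than on the outer edge of a text block.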

The sub-region vector calculation unit 115 calculates a feature vector of the sub-region image from all feature points included in the sub-region image. Here, the feature vector is a k-dimensional vector (v_1, v_2, ..., v_k) corresponding to k learning clusters. The feature value of each feature point included in the sub-region image is expressed as an index relative to the feature values of the learning clusters. The k learning clusters may be generated in advance or may be those described later with reference to FIG. 4.

The sub-region image classification unit 116 compares the feature vector of the sub-region image with feature vectors of character regions obtained from learning data, and classifies the sub-region image according to whether or not it is a character region. Here, "learning data" means feature vectors of region images in which the character regions have been identified in advance. The learning data is not limited to "positive examples" labeled as character regions; it may also include "negative examples" labeled as not being character regions. The learning data may be generated in advance or may be that described later with reference to FIG. 4.

The text extraction unit 117 extracts text from a sub-region image determined to be a character region. The text extraction unit 117 may use, for example, existing OCR (Optical Character Recognition) that identifies characters by matching a sub-region image determined to be a character region against character patterns stored in advance.

FIG. 4 is a functional block diagram of the learning processing unit in the manga image analysis apparatus of the present invention. FIG. 5 is a conceptual diagram showing image feature points as the learning process progresses.

In addition to the functional components of FIG. 2, the manga image analysis apparatus 1 of FIG. 4 further includes a learning feature point extraction unit 121, an elemental clustering unit 122, a learning cluster vector calculation unit 123, and an image classification learning unit 124. These functional components are likewise realized by executing a program that causes a computer installed in the apparatus to function.

The learning feature point extraction unit 121 extracts a large number of feature points from learning images. FIG. 5(a) is a conceptual diagram showing feature points detected in a learning image. As with the target feature point extraction unit 112 described above, the feature points can be detected using, for example, SIFT. The extracted feature points are output to the elemental clustering unit 122.

The elemental clustering unit 122 clusters the feature points into k learning clusters based on their feature values. The k-means method may be used for this clustering. FIG. 5(b) is a conceptual diagram showing k-means clustering of feature points. A representative feature value is then calculated for each of the k learning clusters. For example, for each cluster consisting of many feature values, the centroid of those feature values may be taken as the representative feature value.
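The clustering of feature values into k learning clusters, with the centroid as each cluster's representative value, can be sketched as a plain k-means loop. The function name `kmeans`, the fixed iteration count, the seeded random initialization, and the short 2-D vectors (real SIFT descriptors are 128-dimensional) are assumptions for illustration only.

```python
import math
import random


def kmeans(vectors, k, iterations=20, seed=0):
    """Cluster feature vectors into k groups.

    Returns (centroids, assignment), where centroids[c] is the representative
    (mean) vector of cluster c and assignment[i] is the cluster of vectors[i].
    """
    rng = random.Random(seed)
    # Initialize centroids with k distinct input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach every vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


descriptors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = kmeans(descriptors, k=2)
```

The resulting centroids play the role of a visual vocabulary: each later feature point is described by which centroid it falls closest to.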

For each elemental cluster, the learning cluster vector calculation unit 123 calculates a feature vector (v_1, v_2, ..., v_k) of the cluster from all feature points included in that cluster. Like the sub-region vector calculation unit 115 described above, the learning cluster vector calculation unit 123 calculates a k-dimensional vector corresponding to the k learning clusters.

The image classification learning unit 124 learns, for each elemental cluster, whether or not the feature points included in the cluster belong to a character region. Specifically, for the learning images, a k-dimensional vector is calculated for each sub-region image given in advance. To do so, the distance between each feature point distributed in a sub-region image and the representative points of the k learning clusters obtained by the elemental clustering unit 122 is calculated, and the cluster to which each feature point belongs is determined. FIG. 5(c) is a conceptual diagram showing how the vector of a sub-region image corresponds to the elemental clusters. As a result, a k-dimensional vector can be generated whose components are the number (or proportion) of feature points belonging to each cluster.
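The construction of the k-dimensional vector can be sketched as a nearest-centroid histogram (a bag-of-features representation). The function name `bag_of_features`, the Euclidean distance, and the toy 2-D descriptors are assumptions; the text above only specifies that each feature point is assigned to a cluster by distance and that counts or proportions form the vector.

```python
import math


def bag_of_features(descriptors, centroids, normalize=True):
    """Build a k-dimensional vector for one sub-region image.

    Each descriptor votes for its nearest cluster representative; the vector
    holds the proportion (or raw count) of descriptors per cluster.
    """
    counts = [0] * len(centroids)
    for d in descriptors:
        nearest = min(range(len(centroids)),
                      key=lambda c: math.dist(d, centroids[c]))
        counts[nearest] += 1
    if normalize and descriptors:
        return [n / len(descriptors) for n in counts]
    return counts


centroids = [(0, 0), (10, 10), (0, 10)]            # k = 3 representative values
sub_region = [(1, 0), (0, 1), (9, 9), (1, 1)]      # descriptors in one sub-region
vector = bag_of_features(sub_region, centroids)
```

Using proportions rather than raw counts makes vectors from large and small sub-regions comparable, which is one reason the "(or proportion)" variant may be preferable for classification.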

The image classification learning unit 124 may use a support vector machine (SVM). A support vector machine is a classification algorithm based on supervised learning and is applied to pattern recognition. It constructs a two-class pattern classifier using linear input elements, and learns the parameters of those elements by calculating distances to the feature values of the learning samples. Specifically, among the sub-regions in the learning images, the vectors of sub-regions designated as character regions are treated as "positive examples" and all other vectors as "negative examples", and learning data for image classification is generated from them.
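To make the positive/negative training concrete, the following is a minimal sketch of a linear two-class SVM trained by subgradient descent on the hinge loss. It is a hand-rolled stand-in, not the learner used here and not a library API: the function names, the learning rate, regularization strength, epoch count, and the toy 2-D vectors are all assumptions, and no kernel is used.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit weights w and bias b so that sign(w.x + b) separates the classes.

    samples: list of feature vectors; labels: +1 (character region) or
    -1 (not a character region). Hyperparameters are illustrative guesses.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularization shrinks the weights a little at every step.
            w = [wi * (1.0 - lr * reg) for wi in w]
            if margin < 1.0:
                # Sample is inside the margin or misclassified: push it out.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 for 'character region', -1 otherwise."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy 2-D vectors standing in for k-dimensional sub-region vectors.
positives = [[0.8, 0.2], [0.7, 0.3], [0.9, 0.1]]
negatives = [[0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
w, b = train_linear_svm(positives + negatives, [1, 1, 1, -1, -1, -1])
label = predict(w, b, [0.95, 0.05])
```

In practice an established SVM library would normally replace this loop; the sketch only shows how labeled sub-region vectors turn into a decision function.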

Accordingly, the sub-region vector calculation unit 115 of the manga image analysis unit 11 can use the k-dimensional vector (v_1, v_2, ..., v_k) corresponding to the k learning clusters obtained through the image classification learning unit 124. In addition, the sub-region image classification unit 116 of the manga image analysis unit 11 uses the learning data for image classification generated by the image classification learning unit 124 to classify whether or not a sub-region image extracted from a manga image is a character region.

FIG. 6 is a functional configuration diagram of the search processing unit in the manga image analysis apparatus of the present invention.

FIG. 6 shows a manga image search apparatus 2 that includes all the functions of the manga image analysis apparatus of FIG. 2 or FIG. 4. The manga image search apparatus 2 further includes an index storage unit 13, an interface unit 14, a search keyword input unit 131, a manga image search unit 132, and a search result output unit 133. The search keyword input unit 131, the manga image search unit 132, and the search result output unit 133 are realized by executing a program that causes a computer installed in the apparatus to function as these units.

The index storage unit 13 stores the text extracted by the manga image analysis unit 11 as an index, in association with the identification information of the manga image. When a manga image is divided into frame images, the index is associated with the identification information of the frame image. The identification information of a frame image is specified by, for example, the identification information of the manga image together with the frame number or coordinates within that image.

The text may be associated with the identification number of the manga image as is, or it may be divided into morphemes by morphological analysis and associated in that form.
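One way to picture the index storage is as a mapping from image (or frame) identification information to the extracted texts. The Python sketch below is a simplified, in-memory stand-in: the class name `TextIndex`, the key layout `(image ID, frame number)`, and the sample IDs are assumptions for illustration. Texts are stored as is; splitting into morphemes is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Key: (manga image ID, frame number). The frame number is None when the
# image is not divided into frames.
FrameKey = Tuple[str, Optional[int]]


@dataclass
class TextIndex:
    """In-memory stand-in for the index storage unit (hypothetical structure)."""
    entries: Dict[FrameKey, List[str]]

    def add(self, image_id: str, text: str, frame_no: Optional[int] = None) -> None:
        """Associate an extracted text with a manga image or one of its frames."""
        self.entries.setdefault((image_id, frame_no), []).append(text)

    def lookup(self, keyword: str) -> List[FrameKey]:
        """Return the keys that have a stored text exactly matching the keyword."""
        return [key for key, texts in self.entries.items() if keyword in texts]


index = TextIndex(entries={})
index.add("manga-001", "hello", frame_no=3)
index.add("manga-001", "goodbye", frame_no=4)
index.add("manga-002", "hello")
hits = index.lookup("hello")
```

Keying by frame as well as image lets a search result point to the specific frame in which the text appears.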

The interface unit 14 may be a user interface or a communication interface. As a user interface, it may, for example, accept a search keyword entered from a keyboard and show the search result (an identification number or the manga image itself) on a display. As a communication interface, it receives a search keyword from a terminal via a network and returns the search result to that terminal.

The search keyword input unit 131 receives a search keyword (query) from the interface unit 14 and outputs it to the manga image search unit 132.

The manga image search unit 132 uses the index storage unit 13 to search for the identification information of manga images whose index matches (or is similar to) the input search keyword. An index whose edit distance from the search keyword is at or below a fixed threshold may be treated as similar. The search result is output to the search result output unit 133.
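The similarity search by edit distance can be illustrated as follows. This is a minimal sketch that assumes the Levenshtein distance as the edit distance and a plain dictionary as the index; the function names, the threshold value, and the sample texts are hypothetical.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def search_similar(keyword: str, index: Dict[str, List[str]], max_distance: int) -> List[str]:
    """Return IDs of images with at least one index text within max_distance of the keyword."""
    return [
        image_id
        for image_id, texts in index.items()
        if any(edit_distance(keyword, text) <= max_distance for text in texts)
    ]


# Toy index (assumed): image ID -> extracted texts, possibly containing OCR errors.
index = {
    "manga-001": ["kitten"],
    "manga-002": ["sitting"],
    "manga-003": ["kitchen"],
}
result = search_similar("kitten", index, max_distance=2)
```

Allowing a small nonzero distance lets a query still find text in which character recognition misread one or two characters.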

The search result output unit 133 outputs the identification information of the manga images found by the search, or the manga images themselves, to the interface unit 14.

As another embodiment, it is also preferable that the text extraction unit 117 of the manga image analysis unit 11 outputs the character size obtained by OCR to the index storage unit 13. This allows the index storage unit 13 to store a priority in association with each text, with a higher priority assigned to text with a larger character size.

The manga image search unit 132 then preferentially outputs, as search results, the texts stored in the index storage unit 13 with higher priority. In this way, text with a larger character size in a manga image is retrieved as a higher-priority index.
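The priority ordering can be sketched as a sort on the stored character size. In the Python sketch below, the `IndexEntry` structure, the use of the character size itself as the priority, the pixel unit, and the sample data are all assumptions for illustration.

```python
from typing import List, NamedTuple


class IndexEntry(NamedTuple):
    image_id: str
    text: str
    char_size: float  # character size reported by OCR (unit assumed: pixels)


def ranked_search(keyword: str, entries: List[IndexEntry]) -> List[str]:
    """Return matching image IDs, largest character size (highest priority) first."""
    matches = [e for e in entries if e.text == keyword]
    matches.sort(key=lambda e: e.char_size, reverse=True)
    return [e.image_id for e in matches]


# Toy index entries (assumed).
entries = [
    IndexEntry("manga-001", "fire", 12.0),
    IndexEntry("manga-002", "fire", 48.0),
    IndexEntry("manga-003", "water", 30.0),
    IndexEntry("manga-004", "fire", 24.0),
]
ranking = ranked_search("fire", entries)
```

Here the image in which the keyword is drawn largest is listed first.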

FIG. 7 is a configuration diagram of a system including the manga image search apparatus of the present invention.

In FIG. 7, the manga image search apparatus 2 is connected to the Internet and can communicate with a manga image database 3, a mobile phone 4, and a terminal 5. The manga image database 3 stores manga images and transmits manga content (manga images) to the manga image search apparatus 2, which can then generate an index for that content. The mobile phone 4 and the terminal 5 connect to the Internet via an access network and transmit search keywords entered by the user to the manga image search apparatus 2. In response, the manga image search apparatus 2 returns the search results to the mobile phone 4 and the terminal 5.

As described in detail above, the manga image analysis apparatus, program, search apparatus, and method of the present invention analyze a manga image as an image, which makes it possible to identify character regions appropriately and extract the text written in them. Furthermore, by associating the extracted text with the manga image as an index, manga images can be searched by keyword.

Those skilled in the art can readily make various changes, modifications, and omissions to the embodiments described above within the scope of the technical idea and viewpoint of the present invention. The foregoing description is merely an example and is not intended to be limiting. The present invention is limited only as defined by the claims and their equivalents.

DESCRIPTION OF SYMBOLS
1 Manga image analysis apparatus
10 Manga image storage unit
111 Frame image extraction unit
112 Target feature point extraction unit
113 Positional clustering unit
114 Sub-region image extraction unit
115 Sub-region vector calculation unit
116 Sub-region image classification unit
117 Text extraction unit
121 Learning feature point extraction unit
122 Elemental clustering unit
123 Learning cluster vector calculation unit
124 Image classification learning unit
13 Index storage unit
131 Search keyword input unit
132 Manga image search unit
133 Search result output unit
14 Interface unit
2 Manga image search apparatus
3 Manga image database
4 Mobile phone
5 Terminal

Claims

1. A manga image analysis apparatus that extracts text written in a manga image, comprising:
target feature point extraction means for extracting a large number of feature points from the manga image to be analyzed;
positional clustering means for clustering the feature points based on their distribution density on the image;
sub-region image extraction means for extracting, for each positional cluster, a sub-region image on the image bounded by an outer frame surrounding the feature points included in that cluster;
sub-region vector calculation means for calculating a feature vector of the sub-region image from all feature points included in the sub-region image;
sub-region image classification means for comparing the feature vector of the sub-region image with the feature vector of character regions obtained from learning data, and classifying the sub-region image according to whether or not it is a character region; and
text extraction means for extracting text from a sub-region image determined to be a character region.

2. The manga image analysis apparatus according to claim 1, further comprising:
learning feature point extraction means for extracting a large number of feature points from learning images;
elemental clustering means for clustering the feature points into k learning clusters based on their feature values;
learning cluster vector calculation means for calculating, for each elemental cluster, a feature vector of the cluster from all feature points included in the cluster; and
image classification learning means for learning, for each elemental cluster, whether or not the feature points included in the cluster belong to a character region,
wherein the learning data used by the sub-region image classification means is the learning data learned by the image classification learning means.

3. The manga image analysis apparatus according to claim 1, wherein
the elemental clustering means classifies the feature points into k learning clusters by k-means clustering,
the sub-region vector calculation means and the learning cluster vector calculation means calculate k-dimensional feature vectors corresponding to the k learning clusters, and
the image classification learning means generates learning data consisting of k-dimensional feature vectors determined to be character regions.

4. The manga image analysis apparatus according to any one of claims 1 to 3, wherein the target feature point extraction means or the learning feature point extraction means detects the feature points by SIFT (Scale-Invariant Feature Transform).

5. The manga image analysis apparatus according to claim 1, wherein the image classification learning means uses a support vector machine.

6. The manga image analysis apparatus according to any one of the above claims, wherein the text extraction means is an OCR (Optical Character Recognition) means that identifies characters by collating the sub-region image determined to be a character region with pre-stored character patterns.

7. The manga image analysis apparatus according to claim 1, further comprising frame image extraction means for dividing the manga image to be analyzed into frame images, each bounded by straight lines on the image,
wherein the target feature point extraction means extracts a large number of feature points from each frame image.

8. A manga image search apparatus including all the functions of the manga image analysis apparatus according to any one of claims 1 to 7, further comprising:
index storage means for storing the text extracted by the text extraction means as an index, in association with identification information of the manga image;
search keyword input means for inputting a search keyword;
manga image search means for searching, using the index storage means, for identification information of a manga image whose index matches the search keyword; and
search result output means for outputting the identification information of the manga image found by the search.

9. The manga image search apparatus according to claim 8, wherein
the text extraction means is an OCR means that identifies characters by collating the sub-region image determined to be a character region with pre-stored character patterns, and outputs the character size obtained by the OCR to the index storage means,
the index storage means stores a priority in association with each text, assigning a higher priority to text with a larger character size output from the text extraction means, and
the manga image search means preferentially outputs, as search results, texts stored in the index storage means with higher priority.

10. A manga image analysis program for causing a computer installed in an apparatus that extracts text written in a manga image to function as:
target feature point extraction means for extracting a large number of feature points from the manga image to be analyzed;
positional clustering means for clustering the feature points based on their distribution density on the image;
sub-region image extraction means for extracting, for each positional cluster, a sub-region image on the image bounded by an outer frame surrounding the feature points included in that cluster;
sub-region vector calculation means for calculating a feature vector of the sub-region image from all feature points included in the sub-region image;
sub-region image classification means for comparing the feature vector of the sub-region image with the feature vector of character regions obtained from learning data, and classifying the sub-region image according to whether or not it is a character region; and
text extraction means for extracting text from a sub-region image determined to be a character region.

11. A manga image analysis method in an apparatus that extracts text written in a manga image, comprising:
a first step of extracting a large number of feature points from the manga image to be analyzed;
a second step of clustering the feature points based on their distribution density on the image;
a third step of extracting, for each positional cluster, a sub-region image on the image bounded by an outer frame surrounding the feature points included in that cluster;
a fourth step of calculating a feature vector of the sub-region image from all feature points included in the sub-region image;
a fifth step of comparing the feature vector of the sub-region image with the feature vector of character regions obtained from learning data, and classifying the sub-region image according to whether or not it is a character region; and
a sixth step of extracting text from a sub-region image determined to be a character region.