JP5492666B2

JP5492666B2 - Judgment device, method and program

Info

Publication number: JP5492666B2
Application number: JP2010131356A
Authority: JP
Inventors: 章裕宮田; 寿子塩原; 考藤村
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-06-08
Filing date: 2010-06-08
Publication date: 2014-05-14
Anticipated expiration: 2030-06-08
Also published as: JP2011257952A

Description

The present invention relates to a determination device, method, and program, and in particular to a determination device, method, and program that support the creation of an index of documents and of positions within those documents. The index serves search requests that use a captured image of a partial region of a document, whose page breaks and line breaks are fixed, as the search query, and that return the document in which the region appears together with its position in that document.

More specifically, the invention relates to a determination device, method, and program applied when the position is to be identified uniquely, rather than when every document and in-document position that might contain the region is to be retrieved exhaustively.

There are many situations in which one needs to identify uniquely, from a partial region of a document, which document contains that region, or where in which document it appears. For example, someone holding a magazine clipping may want to find the original magazine and read the rest of the article. In that case it must be possible to identify uniquely which magazine the clipping came from.

The above example can be regarded as a search system that takes a partial region of a document as the query and looks up, among a huge collection of documents, the name of the document containing the region, or the document name together with the position within the document.

To build a system that answers search requests for information from a document collection, the collection must be analyzed in advance to create a search index.

For example, nouns can be extracted from the text of each document using morphological analysis, a common technique. As shown in FIG. 23, each noun then becomes a key of the search index, and the positions at which the noun appears (document name, page) become the corresponding value.
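The index described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the nouns have already been extracted by a morphological analyzer, and the function name `build_noun_index` and the sample data are hypothetical.

```python
from collections import defaultdict


def build_noun_index(pages):
    """Map each noun to the sorted list of (document name, page) where it appears.

    `pages` is an iterable of (document_name, page_number, nouns) tuples; the
    nouns are assumed to have been extracted already by a morphological analyzer.
    """
    index = defaultdict(set)
    for document_name, page_number, nouns in pages:
        for noun in nouns:
            index[noun].add((document_name, page_number))
    return {noun: sorted(positions) for noun, positions in index.items()}


# Hypothetical sample data, not taken from the patent's figures.
sample_pages = [
    ("Book A", 1, ["antenna", "radio"]),
    ("Book A", 2, ["antenna"]),
    ("Book B", 5, ["radio", "camera"]),
]
noun_index = build_noun_index(sample_pages)
```

Looking up `noun_index["antenna"]` then returns every (document name, page) pair at which that noun appears.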

As shown in FIG. 24, this scheme can be applied to a system in which, when a region of a book that contains text (hereinafter a "character region") is photographed with a camera-equipped mobile phone, the system identifies the position (book title and page) of the photographed region and presents content defined in advance for that region.

The present invention applies this system to a support system for visually impaired people (people who are totally blind or who, because of low vision or the like, cannot perceive character regions in a book; hereinafter "users"). That is, when a user photographs a character region in a book with a camera-equipped mobile phone, the system presents an audio file in which that region is read aloud.

Because the user cannot perceive where the text is in the book, the user may photograph a non-character region instead of a character region, that is, a region of the book containing a figure or photograph, or a region where nothing is printed. In that case the system must determine that the photographed region is a non-character region and notify the user that "the photographed region is a non-character region".

However, it is not easy to determine whether an image of a partial region of a book is a character region when the kinds of non-character regions the book contains are not known in advance.

When the subject is a car or a bookshelf, the character region can be estimated by exploiting the fact that text appears inside a rectangle such as a license plate or a book spine (see, for example, Non-Patent Documents 1 and 2). Character regions in a book, however, are rarely enclosed in rectangles, so it is difficult to estimate them this way.

When the subject is a signboard or the like in a landscape, there is a technique that identifies the character region by exploiting the large contrast between the text color and the background color (see, for example, Non-Patent Document 3). Books, however, also contain non-character regions with high contrast against the background, such as line drawings and ruled lines.

Non-Patent Document 1: Katsuyoshi Tanabe, Harumi Kawashima, Eisaku Marubayashi, Tadashi Nakanishi, Akio Shio, Sakuichi Ohtsuka, "License plate region extraction considering the arrangement rules of partial character strings," IEICE Transactions D-II (Information and Systems, II: Information Processing), J81-D-2(10), pp. 2280-2287.
Non-Patent Document 2: Minako Sawaki, Hiroshi Murase, Norihiro Hagita, "Character recognition in bookshelf images by automatic dictionary selection based on degradation estimation," Journal of the Institute of Image Information and Television Engineers, 54(6), pp. 881-886.
Non-Patent Document 3: Liu Yongmei, Tsuyoshi Yamamura, Noboru Ohnishi, Noboru Sugie, "On the extraction of character string regions in scenes," IEICE Transactions D-II (Information and Systems, II: Information Processing), J81-D-2(4), pp. 641-650.

As described above, books come in many varieties and so do the non-character regions that appear in them, so it is difficult to settle in advance on a single character-region extraction method. If a method that handles only one kind of non-character region is used, determination accuracy drops for books containing kinds of non-character regions that were not anticipated. Combining several extraction methods is conceivable, but each of these methods is costly to run even on its own, so using several together is hardly practical in terms of processing speed. In the first place, the present task only requires knowing whether the image contains a character region; the conventional methods above, which aim to extract character regions from an image precisely, are over-specified for it.

The present invention has been made in view of the above, and its object is to provide a determination device, method, and program capable of determining whether an image of a partial region of a book is a character region, even when the kinds of non-character regions the book contains are not known in advance.

To solve the above problem, the present invention (claim 1) is a determination device that takes as input the whole or a partial region of a document containing characters, photographs, figures, tables, ruled lines, or other non-character matter, and determines whether the region is a character region, that is, a region containing at least a certain proportion of characters, the device comprising:
region input means for accepting input of the region to be determined;
character region determination means for performing optical character recognition on the region on the assumption that it contains characters, and determining whether the region is a character region or a non-character region based at least on the number of unknown words detected and the proportion of the number of words contained in the region; and
determination result output means for outputting either "character region" or "non-character region" based on the determination result of the character region determination means.

Further, in the present invention (claim 2), the character region determination means of claim 1 includes
means for determining that the region is a character region when the number of detected unknown words is less than a predetermined value A and the number of words contained in the region is equal to or greater than a predetermined value B.

Further, in the present invention (claim 3), the character region determination means of claim 1 includes
means for determining that the region is a character region when the number of detected unknown words is less than a predetermined value A, the proportion of words in the region whose detected length is equal to or less than a predetermined value C is less than a predetermined value D, and the number of words contained in the region is equal to or greater than a predetermined value B.

Further, the present invention (claim 4) is a determination device that takes as input the whole or a partial region of a document containing characters, photographs, figures, tables, ruled lines, or other non-character matter, and determines whether the region is a character region containing at least a certain proportion of characters, the device comprising:
region input means for accepting input of the region to be determined;
character region determination means for performing optical character recognition on the region on the assumption that it contains characters, and determining whether the region is a character region or a non-character region based at least on the length of the detected words and the proportion of the number of words contained in the region; and
determination result output means for outputting either "character region" or "non-character region" based on the determination result of the character region determination means.

Further, in the present invention (claim 5), the character region determination means of claim 4 includes
means for determining that the region is a character region when the number of words whose detected length is equal to or less than a predetermined value C is less than a predetermined value D and the number of words contained in the region is equal to or greater than a predetermined value B.

Further, in the present invention (claim 6), the predetermined value C for word length in the character region determination means of claim 3 or claim 5 is one character.
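The conditions of claims 2 and 3 can be sketched as a single predicate. This is only an illustration under stated assumptions: the `Word` type, the function name, and the parameter names `a`, `b`, `c`, `d` (standing for the predetermined values A to D) are hypothetical, and D is read as a ratio, following the wording of claim 3. Passing `d=None` gives the claim 2 rule; the default `c=1` reflects claim 6.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    unknown: bool  # True when the word was not found in the dictionary


def is_character_region(words, a, b, c=1, d=None):
    """Return True when the region is judged to be a character region.

    a: the unknown-word count must be below this value (value A).
    b: the region must contain at least this many words (value B).
    c, d: when d is given, the share of words of length <= c (value C)
          must be below d (value D), as in claim 3.
    """
    total = len(words)
    if total == 0 or total < b:
        return False
    unknown_count = sum(1 for w in words if w.unknown)
    if unknown_count >= a:
        return False
    if d is not None:
        short_count = sum(1 for w in words if len(w.text) <= c)
        if short_count / total >= d:
            return False
    return True


# Hypothetical OCR output: four known words, one of them a single character.
text_words = [Word("antenna", False), Word("type", False),
              Word("radio", False), Word("x", False)]
print(is_character_region(text_words, a=2, b=3))
print(is_character_region(text_words, a=2, b=3, c=1, d=0.25))
```

The actual values of A to D are not given in this part of the description, so the numbers in the usage lines are placeholders.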

Further, the present invention (claim 7) is a determination method that takes as input the whole or a partial region of a document containing characters, photographs, figures, tables, ruled lines, or other non-character matter, and determines whether the region is a character region containing at least a certain proportion of characters, the method performing:
a region input step of accepting input of the region to be determined;
a character region determination step of performing optical character recognition on the region on the assumption that it contains characters, and determining whether the region is a character region or a non-character region based at least on the number of unknown words detected and the proportion of the number of words contained in the region; and
a determination result output step of outputting either "character region" or "non-character region" based on the determination result of the character region determination step.

Further, in the present invention (claim 8), in the character region determination step of claim 7,
the region is determined to be a character region when the number of detected unknown words is less than a predetermined value A and the number of words contained in the region is equal to or greater than a predetermined value B.

Further, in the present invention (claim 9), in the character region determination step of claim 7,
the region is determined to be a character region when the number of detected unknown words is less than a predetermined value A, the proportion of words in the region whose detected length is equal to or less than a predetermined value C is less than a predetermined value D, and the number of words contained in the region is equal to or greater than a predetermined value B.

Further, the present invention (claim 10) is a determination method that takes as input the whole or a partial region of a document containing characters, photographs, figures, tables, ruled lines, or other non-character matter, and determines whether the region is a character region containing at least a certain proportion of characters, the method performing:
a region input step of accepting input of the region to be determined;
a character region determination step of performing optical character recognition on the region on the assumption that it contains characters, and determining whether the region is a character region or a non-character region based at least on the length of the detected words and the proportion of the number of words contained in the region; and
a determination result output step of outputting either "character region" or "non-character region" based on the determination result of the character region determination step.

Further, in the present invention (claim 11), in the character region determination step of claim 10,
the region is determined to be a character region when the number of words whose detected length is equal to or less than a predetermined value C is less than a predetermined value D and the number of words contained in the region is equal to or greater than a predetermined value B.

Further, in the present invention (claim 12), the predetermined value C for word length in the character region determination step of claim 9 or claim 11 is one character.

Further, the present invention (claim 13) is a program for causing a computer to function as each of the means constituting the determination device according to any one of claims 1 to 6.

Conventional methods for determining whether an image is a character region usually presuppose the kind of image (an image of a car or a bookshelf, a landscape containing a signboard, and so on) and apply a character-region extraction method suited to it. It has therefore been difficult to analyze images of partial regions of a book whose non-character regions are not known in advance. Presupposing only one kind of image lowers the accuracy of the determination, while presupposing several kinds and using several extraction methods greatly increases processing cost.

In contrast, the present invention simply runs optical character recognition without regard to the kind of image and determines whether the region is a character region by analyzing the result, which makes it a simple method with very low processing cost.

The present invention makes it possible, in particular, to determine by a simple method whether an image of a partial region of a book is a character region, even when the kinds of non-character regions in the book are not known in advance.

FIG. 1 is a system configuration diagram of the first embodiment of the present invention.
FIG. 2 is an example of an input document in the first embodiment of the present invention.
FIG. 3 is an example of the word DB in the first embodiment of the present invention.
FIG. 4 is a sequence chart of the operation in the first embodiment of the present invention.
FIG. 5 is an example of character region extraction from a document in the first embodiment of the present invention.
FIG. 6 is an example of conversion from an image file to text data in the first embodiment of the present invention.
FIG. 7 is an example of morphological analysis in the word extraction unit in the first embodiment of the present invention.
FIG. 8 is a flowchart of character region determination in the first embodiment of the present invention.
FIG. 9 is a system configuration diagram of the second embodiment of the present invention.
FIG. 10 is an example of the position DB in the second embodiment of the present invention.
FIG. 11 is an example of the content DB in the second embodiment of the present invention.
FIG. 12 is a sequence chart of the operation in the second embodiment of the present invention.
FIG. 13 is an example of character region extraction from a document in the second embodiment of the present invention.
FIG. 14 is an example of conversion from an image file to text data in the second embodiment of the present invention.
FIG. 15 is an example of morphological analysis in the word extraction unit in the second embodiment of the present invention.
FIG. 16 is a flowchart of character region determination in the second embodiment of the present invention.
FIG. 17 is a flowchart of content inquiry processing in the second embodiment of the present invention.
FIG. 18 is a sequence chart of the operation in the third embodiment of the present invention.
FIG. 19 is a tally of the positions of each noun in the third embodiment of the present invention.
FIG. 20 is a system configuration diagram of the fourth embodiment of the present invention.
FIG. 21 is a sequence chart of the operation in the fourth embodiment of the present invention.
FIG. 22 is a diagram of photographing a wide character region in the fourth embodiment of the present invention.
FIG. 23 is an example of indexing the positions at which nouns appear.
FIG. 24 is an example of providing content based on a character region in a book.

Embodiments of the present invention are described below with reference to the drawings.

[First Embodiment]
In this embodiment, character recognition is performed on the photographed region, and whether the region is a character region is determined based on the number of unknown words it contains.

FIG. 1 shows the system configuration of the first embodiment of the present invention.

The system shown in FIG. 1 comprises a server unit 300, a client unit 400, and a word DB 201 and an optical character recognition device 200 that are connected to the server unit 300.

The server unit 300 in FIG. 1 can be implemented on equipment such as a PC server and comprises a server-side data transmission/reception unit 301, a word extraction unit 302, and a character region determination unit 303. The word DB 201 is connected to the word extraction unit 302, and the optical character recognition device 200 is connected to the server-side data transmission/reception unit 301.

The client unit 400 in FIG. 1 can be implemented on a camera-equipped mobile phone or the like and comprises a document photographing unit 401, a client-side data transmission/reception unit 402, and a determination result presentation unit 403. The document 100 is the input to the document photographing unit 401.

The document 100 in FIG. 1 is one page of a printed book containing character regions and non-character regions, as in FIG. 2.

The optical character recognition device 200 in FIG. 1 is an external device such as general OCR software. It takes as input an image file in which characters appear and outputs those characters converted into electronic text data. The word DB 201 is a word dictionary of the kind used by general OCR software or morphological analysis software; it is assumed to cover almost all words in everyday use, stored in the format of FIG. 3. Like general OCR software, the optical character recognition device 200 can recognize individual characters without the word DB 201, but using the word DB 201 enables recognition at the word level and thereby improves overall recognition accuracy.

Next, the processing of this embodiment in the above configuration is described.

FIG. 4 is a sequence chart of the processing in the first embodiment of the present invention.

Step 100 (input step): The document photographing unit 401 of the client unit 400 photographs a partial image of the document to be determined and sends it to the server unit 300 via the client-side data transmission/reception unit 402. The detailed processing is described in steps 101 to 103 below.

Step 110 (word extraction step): The server-side data transmission/reception unit 301 of the server unit 300 receives the partial image of the document from the client unit 400, and the word extraction unit 302 extracts words from the photographed region. The detailed processing is described in step 104 below.

Step 120 (character region determination step): The character region determination unit 303 determines whether the photographed region is a character region. The detailed processing is described in step 105 below.

Step 130 (output step): The character region determination unit 303 outputs the result of determining whether the photographed region is a character region. The detailed processing is described in steps 106 to 108 below.
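The flow of steps 100 to 130 can be sketched as a small pipeline. This is a sketch under assumptions, not the patent's implementation: the function name and the three callables (standing in for the optical character recognition device 200, the word extraction unit 302, and the character region determination unit 303) are hypothetical, and the stubs in the usage lines exist only so that the example runs.

```python
def judge_photographed_region(image, recognize_text, extract_words, is_character_region):
    """Run steps 110 to 130 on an image received in step 100.

    The three callables stand in for the optical character recognition
    device 200, the word extraction unit 302 and the character region
    determination unit 303; their signatures are assumptions.
    """
    text = recognize_text(image)      # OCR, applied without checking the image type
    words = extract_words(text)       # step 110: word extraction
    if is_character_region(words):    # step 120: determination
        return "character region"     # step 130: output
    return "non-character region"


# Stub components for illustration only.
result = judge_photographed_region(
    image=b"fake image bytes",
    recognize_text=lambda image: "antenna type radio",
    extract_words=lambda text: text.split(),
    is_character_region=lambda words: len(words) >= 2,
)
print(result)
```

Passing the components in as callables keeps the sketch independent of any particular OCR or morphological analysis software.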

The details of the above processing are as follows.

Step 101) The document photographing unit 401 photographs a character region of the document 100 and outputs it as an image file, as in FIG. 5.

Step 102) The client-side data transmission/reception unit 402 receives the output of step 101 as input and sends it, still as an image file, to the server unit 300 over a network or the like.

Step 103) The server-side data transmission/reception unit 301 receives the output of step 102 as input, uses the optical character recognition device 200 to recognize the text in the image file, and outputs the result converted to text data, as in FIG. 6. As noted above, the optical character recognition device 200 uses the word DB 201 during recognition: words registered in the word DB 201 are recognized accurately, whereas recognition accuracy drops for words and characters that are not registered. Suppose also that, because of poor shooting conditions, the string that should read "タイプ" ("type") is misrecognized as "タイフ". In this example the image file happens to show a character region, but the processing takes no account of whether the photographed subject is a character region or a non-character region.

Step 104) The word extraction unit 302 receives the output of step 103 as input, extracts words, and outputs them. Word extraction here uses morphological analysis, a common technique. Morphological analysis is a general natural language processing technique that uses a word dictionary as its information source to split a sentence into morphemes and determine the part of speech of each morpheme. Words that are not in the dictionary, such as new words and misspellings, cannot be assigned a part of speech and therefore become unknown words. In this embodiment the word DB 201 serves as the word dictionary, and any word that is not stored in it, and whose part of speech therefore cannot be determined, is judged to be an unknown word. Extracting words from the output of step 103 gives the result shown in FIG. 7, which is then output. For example, "タイフ", misrecognized in step 103, and "テナ", produced when the word "アンテナ" ("antenna") was cut off at the edge of the photographed region, are not real words and therefore become unknown words.
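The unknown-word judgment in step 104 can be illustrated with a plain dictionary lookup. This is a deliberately simplified sketch: real morphological analysis also segments the text into morphemes, whereas the tokens here are assumed to be segmented already, and the function name and the tiny dictionary standing in for the word DB 201 are hypothetical.

```python
def tag_words(tokens, word_db):
    """Return (token, part_of_speech) pairs; tokens missing from word_db get "unknown"."""
    return [(token, word_db.get(token, "unknown")) for token in tokens]


# Hypothetical dictionary standing in for the word DB 201.
word_db = {"アンテナ": "noun", "タイプ": "noun", "の": "particle"}

# "タイフ" (misrecognized) and "テナ" (cut off) are not in the dictionary.
tagged = tag_words(["タイフ", "の", "テナ", "アンテナ"], word_db)
for token, part_of_speech in tagged:
    print(token, part_of_speech)
```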

Step 105) The character region determination unit 303 receives the output of step 104 as input and outputs the result of determining whether the photographed region is a character region or a non-character region.

As shown in FIG. 8, the determination checks whether the proportion of words judged to be unknown among the words received from the word extraction unit 302 is less than a specified value X (step 1051). If it is less than X (step 1051, Yes), the result is "character region" (step 1052); if it is equal to or greater than X (step 1051, No), the result is "non-character region" (step 1053). In other words, the region is judged to be a character region when the proportion of unknown words in the input is below the specified value, and a non-character region otherwise. In this embodiment the specified value is 50%; the proportion of words judged unknown in FIG. 7 is below that value, so the region is judged to be a "character region".
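The decision in steps 1051 to 1053 can be sketched as follows. The function name and the input format, a list of (word, is_unknown) pairs, are assumptions made for illustration; the default `x=0.5` corresponds to the 50% value used in this embodiment. Treating an empty word list as a non-character region is also an assumption, since the description does not cover that case.

```python
def judge_by_unknown_ratio(tagged_words, x=0.5):
    """tagged_words: list of (word, is_unknown) pairs from word extraction."""
    if not tagged_words:
        return "non-character region"  # assumption: no words means no text
    unknown = sum(1 for _, is_unknown in tagged_words if is_unknown)
    ratio = unknown / len(tagged_words)
    # Step 1051: compare the unknown-word ratio with the specified value X.
    return "character region" if ratio < x else "non-character region"


# Hypothetical extraction result: 2 unknown words out of 6.
sample = [("タイフ", True), ("テナ", True), ("アンテナ", False),
          ("の", False), ("電波", False), ("受信", False)]
print(judge_by_unknown_ratio(sample))
```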

Step 106) The server-side data transmission/reception unit 301 receives the determination result of step 105 as input and sends it to the client unit 400 over the network.

Step 107) The client-side data transmission/reception unit 402 receives the determination result of step 105, sent in step 106, as input and outputs it.

Step 108) The determination result presentation unit 403 announces the result by voice: "This is a character region" if the result is a character region, or "This is a non-character region" if it is a non-character region.

［第２の実施の形態］ [Second Embodiment]
本実施の形態は、文字領域を判定する際に、撮影された領域について文字認識を行い、認識結果の未知語の数と１文字の単語の数が規定値未満の場合に文字領域と判定する。 In this embodiment, when determining a character area, character recognition is performed on the captured area, and the area is judged to be a character area when the number of unknown words and the number of one-character words in the recognition result are below their specified values.

本実施の形態は、第１の実施の形態と背景技術で述べた方式を用いて、書籍にカメラ付き携帯電話をかざすとその位置を音声で読み上げる視覚障がい者支援システムの例を示す。 This embodiment shows an example of a support system for visually impaired users that, using the first embodiment and the method described in the background art, reads a location aloud when a camera-equipped mobile phone is held over that location in a book.

図９は、本発明の第２の実施の形態におけるシステム構成を示す。同図において、図１と同一構成部分には同一符号を付し、その説明を省略する。 FIG. 9 shows the system configuration of the second embodiment of the present invention. In the figure, components identical to those in FIG. 1 are given the same reference numerals, and their description is omitted.

図９におけるサーバ部３００は、ＰＣサーバ等の機器で実現でき、サーバ側データ送受信部３０１、単語抽出部３０２、文字領域判定部３０３、検索問い合わせ部３０４、コンテンツ検索部305、位置ＤＢ３０６、コンテンツＤＢ３０７から構成される。なお、位置ＤＢ３０６内には背景技術で述べた方法で図１０に示すようなデータが作成・格納されている。また、コンテンツＤＢ３０７内には、書籍内の各領域に対して、各領域を読み上げた音声ファイルが図１１のように格納されているとする。 The server unit 300 in FIG. 9 can be implemented on a device such as a PC server and consists of a server-side data transmission/reception unit 301, a word extraction unit 302, a character area determination unit 303, a search inquiry unit 304, a content search unit 305, a position DB 306, and a content DB 307. The position DB 306 holds data such as that shown in FIG. 10, created and stored by the method described in the background art. The content DB 307 is assumed to hold, for each area in a book, an audio file that reads that area aloud, as shown in FIG. 11.

図９におけるクライアント部４００はカメラ付き携帯電話等で実現でき、ドキュメント撮影部４０１、クライアント側データ送受信部４０２、コンテンツ提示部４０３から構成される。 The client unit 400 in FIG. 9 can be realized by a camera-equipped mobile phone or the like, and includes a document photographing unit 401, a client-side data transmission / reception unit 402, and a content presentation unit 403.

同図におけるドキュメント１００、光学文字認識装置２００、単語ＤＢ２０１は第１の実施の形態と同様である。 The document 100, the optical character recognition device 200, and the word DB 201 in the figure are the same as those in the first embodiment.

以下、上記の構成における動作を説明する。 The operation in the above configuration will be described below.

図１２は、本発明の第２の実施の形態における動作のシーケンスチャートである。 FIG. 12 is a sequence chart of the operation in the second embodiment of the present invention.

ステップ２００:入力ステップ）クライアント部４００のドキュメント撮影部４０１は、判定対象となるドキュメントの部分画像を撮影し、クライアント側データ送受信部４０２を介してサーバ部３００に送信する。詳細な処理については、以下のステップ２０１〜２０３で説明する。 Step 200: Input Step) The document photographing unit 401 of the client unit 400 photographs a partial image of a document to be determined and transmits the partial image to the server unit 300 via the client side data transmitting / receiving unit 402. Detailed processing will be described in steps 201 to 203 below.

ステップ２１０:単語抽出ステップ）サーバ部３００のサーバ側データ送受信部３０１は、クライアント部４００からのドキュメントの部分画像を取得し、文字領域判定部３０３において、撮影領域から単語を抽出する。詳細な処理については、以下のステップ２０４で説明する。 Step 210: Word Extraction Step) The server-side data transmission / reception unit 301 of the server unit 300 acquires a partial image of a document from the client unit 400, and the character region determination unit 303 extracts words from the shooting region. Detailed processing will be described in step 204 below.

ステップ２２０:文字領域判定ステップ）文字領域判定部３０３は、撮影領域が文字領域であるかどうか判定する。詳細な処理については、以下のステップ２０５で説明する。 Step 220: Character Area Determination Step) The character area determination unit 303 determines whether or not the shooting area is a character area. Detailed processing will be described in step 205 below.

ステップ２３０:出力ステップ）文字領域判定部３０３は、撮影領域が文字領域であるかどうかの判定結果に基づいて、コンテンツを出力する。詳細な処理については、以下のステップ２０６〜２０９で説明する。 Step 230: Output Step) The character area determination unit 303 outputs content based on the result of determining whether the shooting area is a character area. The detailed processing is described in steps 206 to 209 below.

ステップ２０１）前述のステップ１０１と同様の処理を行い、図１３のように画像ファイルとして出力する。 Step 201) The same processing as in step 101 above is performed, and the result is output as an image file as shown in FIG. 13.

ステップ２０２）前述のステップ１０２と同様の処理を行う。 Step 202) The same processing as in Step 102 described above is performed.

ステップ２０３）ステップ１０３と同様の処理を行う。写っている対象が文字領域であるか非文字領域であるかは一切考慮しないため、出力は図１４のようになる。これは、一般的なOCRソフトウェアで文字が記載されていない図領域を文字とみなして認識処理すると、未知語と認識してしまったり、「。」や「・」や「１」といった1文字から構成される単語の羅列として認識してしまったりするためである。 Step 203) The same processing as in step 103 is performed. Because no account is taken of whether the captured subject is a character area or a non-character area, the output is as shown in FIG. 14. This happens because, when general OCR software treats a figure area containing no characters as text and runs recognition on it, it tends to produce unknown words or a string of one-character words such as "。", "・", or "１".

ステップ２０４）単語抽出部３０２は、ステップ２０３の認識を入力として受け付け、単語を抽出して出力する。ここではステップ１０３と同様に形態素解析を用いて単語抽出を行う。本実施の形態においても単語辞書は図３の形式で単語ＤＢ２０１に格納されており、ここに格納されておらず品詞を判別できない語は未知語と判定される。ステップ２０３の入力から単語を抽出すると図１５のようになり、これを文字領域判定部３０３に出力する。 Step 204) The word extraction unit 302 receives the recognition result of step 203 as input, extracts words, and outputs them. As in step 103, words are extracted by morphological analysis. In this embodiment too, the word dictionary is stored in the word DB 201 in the format of FIG. 3, and any word that is not stored there, and whose part of speech therefore cannot be determined, is judged to be an unknown word. Extracting words from the input received from step 203 yields the result shown in FIG. 15, which is output to the character area determination unit 303.
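The dictionary lookup that marks unknown words can be illustrated roughly as below. The sketch assumes the text has already been segmented into tokens by a morphological analyzer and models the word DB 201 as a plain mapping from surface form to part of speech. The names `WORD_DB` and `tag_words` and the sample entries are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for word DB 201: surface form -> part of speech.
WORD_DB: Dict[str, str] = {
    "書籍": "noun",
    "を": "particle",
    "読む": "verb",
}


def tag_words(tokens: List[str], word_db: Dict[str, str] = WORD_DB) -> List[Tuple[str, str]]:
    """Attach a part of speech to each token.

    A token missing from the dictionary cannot be given a part of speech,
    so it is tagged "unknown" (the unknown-word case of step 204).
    """
    return [(token, word_db.get(token, "unknown")) for token in tokens]
```

A real system would obtain the tokens from the morphological analyzer itself; the segmentation step is deliberately left out of this sketch.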

ステップ２０５）文字領域判定部３０３は、ステップ２０４の出力を入力として受け付け、撮影領域が文字領域か非文字領域か判定した結果とステップ２０４で抽出した単語群を出力する。判定の際は図１６のフローチャートに従い決定する。まず、入力された単語のうち、未知語と判定された単語の割合が規定値X未満である場合には（ステップ２０５１、Ｙｅｓ）文字領域（暫定）、規定値X以上である場合には（ステップ２０５１、Ｎｏ）非文字領域と判定する。本実施の形態では規定値Xを50％とし、図１５において未知語と判定されている語は50％未満である。次に、１文字からなる単語の割合が規定値Y未満である場合は（ステップ２０５２、Ｙｅｓ）文字領域とし（ステップ２０５３）、規定値Y以上の場合は（ステップ２０５２、Ｎｏ）非文字領域と判定する（ステップ２０５４）。本実施の形態では規定値Yを80％とし、図１５において１文字からなる単語と判定されている語は80％以上であるため、「非文字領域」と判定される。 Step 205) The character area determination unit 303 receives the output of step 204 as input, and outputs both the result of determining whether the shooting area is a character area or a non-character area and the word group extracted in step 204. The determination follows the flowchart in FIG. 16. First, if the proportion of input words judged to be unknown is less than a specified value X (step 2051, Yes), the area is provisionally a character area; if it is X or more (step 2051, No), the area is judged to be a non-character area. In this embodiment X is 50%, and fewer than 50% of the words in FIG. 15 are judged to be unknown. Next, if the proportion of one-character words is less than a specified value Y (step 2052, Yes), the area is a character area (step 2053); if it is Y or more (step 2052, No), the area is judged to be a non-character area (step 2054). In this embodiment Y is 80%, and since 80% or more of the words in FIG. 15 are judged to be one-character words, the area is judged to be a "non-character area".
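The two-stage decision of FIG. 16 might be expressed as in the following sketch. The function name, the `(word, is_unknown)` pair representation, and the treatment of an empty word list are assumptions made for illustration.

```python
from typing import List, Tuple


def classify_area(words: List[Tuple[str, bool]], x: float = 0.5, y: float = 0.8) -> str:
    """Two-stage judgment following steps 2051 to 2054.

    words: (surface form, is_unknown) pairs from the word extraction step.
    x: specified value X for the unknown-word ratio (50% here).
    y: specified value Y for the one-character-word ratio (80% here).
    """
    if not words:
        # Assumption: no extracted words means non-character.
        return "non-character"
    total = len(words)

    # Step 2051: the unknown-word ratio must be less than X.
    unknown_ratio = sum(1 for _, unknown in words if unknown) / total
    if unknown_ratio >= x:
        return "non-character"

    # Step 2052: the one-character-word ratio must be less than Y.
    one_char_ratio = sum(1 for word, _ in words if len(word) == 1) / total
    return "character" if one_char_ratio < y else "non-character"
```

A figure area that OCR turns into mostly single symbols passes the first test (few unknown words) but fails the second, which is the case described for FIG. 15.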

ステップ２０６）検索問い合わせ部３０４は、ステップ２０５の判定結果を入力として受け付け、図１７のフローチャートに従いコンテンツの問い合わせを行う。入力された判定結果が「文字領域」であれば（ステップ２０６１、Ｙｅｓ）、同じく入力された単語群をコンテンツ検索部３０５に入力してコンテンツ（撮影領域を読み上げた音声ファイル）を取得し、該コンテンツを出力する（ステップ２０６２）。一方、入力された判定結果が「非文字領域」であれば（ステップ２０６１、Ｎｏ）、「そこには文字はありません」と読み上げた音声ファイル（コンテンツ）を出力する（ステップ２０６３）。ここでは、ステップ２０６の入力の判定結果は「非文字領域」なので、コンテンツとして、「そこには文字はありません」と読み上げた音声ファイルが出力されることになる。 Step 206) The search inquiry unit 304 receives the determination result of step 205 as input and queries for content according to the flowchart in FIG. 17. If the determination result is "character area" (step 2061, Yes), it passes the word group received along with it to the content search unit 305, obtains the content (an audio file that reads the captured area aloud), and outputs it (step 2062). If the determination result is "non-character area" (step 2061, No), it outputs an audio file (content) that says "There is no character there" (step 2063). Here the determination result input to step 206 is "non-character area", so the audio file saying "There is no character there" is output as the content.
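The branch in FIG. 17 amounts to a simple dispatch, sketched below. The file name `no_text_here.wav`, the function name, and the idea of passing the content search as a callable are assumptions for illustration only.

```python
from typing import Callable, List

# Hypothetical name for the fixed audio file saying "There is no character there".
NO_TEXT_AUDIO = "no_text_here.wav"


def inquire_content(
    judgment: str,
    words: List[str],
    search: Callable[[List[str]], str],
) -> str:
    """Return the content to send back to the client (steps 2061 to 2063).

    search stands in for the content search unit 305 and maps a word
    group to the audio file for the captured area.
    """
    if judgment == "character":
        # Step 2062: look up the audio that reads the captured area aloud.
        return search(words)
    # Step 2063: fixed announcement for a non-character area.
    return NO_TEXT_AUDIO
```

Keeping the search behind a callable means the non-character path never touches the databases, which matches the flow in which the search is only issued for character areas.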

ステップ２０７）サーバ側データ送受信部３０１は、ステップ２０６の検索されたコンテンツまたは非文字領域を示すコンテンツを受け付け、ネットワークを通じてクライアント部４００に出力する。 Step 207) The server-side data transmission / reception unit 301 receives the searched content or the content indicating the non-character area in Step 206, and outputs it to the client unit 400 through the network.

ステップ２０８）クライアント側データ送受信部４０２は、ステップ２０７のコンテンツを受け付け、コンテンツ提示部４０３に出力する。 Step 208) The client-side data transmission / reception unit 402 receives the content of step 207 and outputs it to the content presentation unit 403.

ステップ２０９）コンテンツ提示部４０３は、ステップ２０８の出力を入力として受け付け、コンテンツである音声ファイルを再生する。ここでは、「そこには文字はありません」という音声ファイルが再生されるので、ユーザは他の領域を撮影するという判断を行うことができる。 Step 209) The content presentation unit 403 receives the output of step 208 as an input, and reproduces the audio file that is the content. Here, since an audio file “There is no character there” is reproduced, the user can make a determination to shoot another area.

［第３の実施の形態］ [Third Embodiment]
本実施の形態では、第２の実施の形態で文字領域と判定された場合に、当該文字領域に含まれる単語（名詞）を用いてコンテンツ検索を行う。 In this embodiment, when the area is judged to be a character area as in the second embodiment, a content search is performed using the words (nouns) contained in that character area.

本実施の形態は第２の実施の形態と同一の構成で、文字領域を撮影した場合を説明する。 In this embodiment, a case where a character area is photographed with the same configuration as that of the second embodiment will be described.

図１８は、本発明の第３の実施の形態における動作のシーケンスチャートである。 FIG. 18 is a sequence chart of operations in the third embodiment of the present invention.

ステップ３０１）前述の第２の実施の形態におけるステップ２０１と同様の処理を行うが、ここでは文字領域を撮影したため図５のような出力を行ったとする。 Step 301) The same processing as in step 201 in the second embodiment described above is performed, but here, since the character area is photographed, it is assumed that the output as shown in FIG. 5 is performed.

ステップ３０２）ステップ２０２と同様の処理を行う。 Step 302) The same processing as step 202 is performed.

ステップ３０３）ステップ２０３と同様の処理を行う。出力は図６のようになる。 Step 303) The same processing as in step 203 is performed. The output is as shown in FIG. 6.

ステップ３０４）ステップ２０４と同様の処理を行う。出力は図７のようになる。 Step 304) The same processing as in step 204 is performed. The output is as shown in FIG. 7.

ステップ３０５）ステップ２０５と同様の処理を行う。本実施の形態では規定値Xを50％、規定値Yを80％とし、図７において未知語と判定されている語は50％未満、１文字からなる単語と判定されている語は80％未満であるため、「文字領域」と判定される。 Step 305) The same processing as in step 205 is performed. In this embodiment X is 50% and Y is 80%. In FIG. 7, fewer than 50% of the words are judged to be unknown and fewer than 80% are judged to be one-character words, so the area is judged to be a "character area".

ステップ３０６）ステップ３０５の入力の判定結果は「文字領域」なので、同じく入力された単語群をコンテンツ検索部３０５に入力する。コンテンツ検索部３０５は、入力された単語群に含まれる各名詞をキーとして位置ＤＢ３０６に問い合わせを行い、得られた結果（各名詞の位置）を図１８のように集計し、件数が最多の位置を特定する（この場合は書籍Ａ３ページ）。 Step 306) Because the determination result input from step 305 is "character area", the word group received along with it is passed to the content search unit 305. The content search unit 305 queries the position DB 306 using each noun in the word group as a key, tallies the results (the positions of each noun) as shown in FIG. 18, and identifies the position with the most hits (in this case, page 3 of book A).

ステップ３０７）次に、特定した位置をキーとしてコンテンツＤＢ３０７に問い合わせを行い、得られた結果（この場合は「書籍Ａ３ページを読み上げた音声ファイル」）を出力する。 Step 307) Next, the content DB 307 is queried using the identified position as a key, and the result (in this case, the audio file that reads page 3 of book A aloud) is output.
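Steps 306 and 307 describe a vote over noun positions followed by a content lookup. A rough sketch is given below, with the position DB 306 and content DB 307 modeled as in-memory dictionaries. All names, the `(book, page)` position format, and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Position = Tuple[str, int]  # assumed format: (book title, page number)


def locate(nouns: List[str], position_db: Dict[str, List[Position]]) -> Optional[Position]:
    """Tally the positions of every noun and return the most frequent one."""
    votes: Counter = Counter()
    for noun in nouns:
        # Nouns missing from the position DB contribute no votes.
        votes.update(position_db.get(noun, []))
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def fetch_audio(position: Position, content_db: Dict[Position, str]) -> Optional[str]:
    """Look up the audio file registered for a position (step 307)."""
    return content_db.get(position)


position_db = {
    "カメラ": [("Book A", 3), ("Book B", 7)],
    "携帯電話": [("Book A", 3)],
    "音声": [("Book A", 3), ("Book A", 5)],
}
content_db = {("Book A", 3): "book_A_p3.wav"}

best = locate(["カメラ", "携帯電話", "音声"], position_db)
audio = fetch_audio(best, content_db) if best is not None else None
```

The specification does not say how ties between positions are resolved; in this sketch `Counter.most_common` simply returns the tied position that was counted first.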

ステップ３０８）ステップ２０７と同様の処理を行う。 Step 308) The same processing as step 207 is performed.

ステップ３０９）ステップ２０８と同様の処理を行う。ここでは、撮影した領域を読み上げた音声ファイルが再生されるので、ユーザはその領域の内容を理解することができる。 Step 309) The same processing as step 208 is performed. Here, an audio file that reads out the captured area is reproduced, so that the user can understand the contents of the area.

［第４の実施の形態］ [Fourth Embodiment]
本実施の形態は、第２の実施の形態を一部変更し、文献１「間野一則，水野秀之，中嶋秀治，宮崎昇，吉田明弘：顧客へのリアルな音声応答を実現するテキスト音声合成技術「Cralinet」電気通信協会 NTT技術ジャーナル 18(11)，pp.19-22，2006年11月．」等の技術を用いて、書籍にカメラ付き携帯電話をかざすとその位置を音声で読み上げる視覚障がい者支援システムである。 This embodiment partially modifies the second embodiment. It is a support system for visually impaired users that reads a location aloud when a camera-equipped mobile phone is held over that location in a book, using technology such as that of Reference 1 (Kazunori Mano, Hideyuki Mizuno, Hideharu Nakajima, Noboru Miyazaki, Akihiro Yoshida: "Cralinet, a text-to-speech synthesis technology that realizes realistic voice responses to customers", Telecommunications Association, NTT Technical Journal 18(11), pp. 19-22, November 2006).

図２０は、本発明の第４の実施の形態におけるシステム構成を示す。同図において、図９と同一構成部分には、同一符号を付し、その説明を省略する。 FIG. 20 shows a system configuration in the fourth embodiment of the present invention. In the figure, the same components as those in FIG. 9 are denoted by the same reference numerals, and the description thereof is omitted.

図２１は、本発明の第４の実施の形態における動作のシーケンスチャートである。 FIG. 21 is an operation sequence chart according to the fourth embodiment of the present invention.

ステップ４０１）前述の第２の実施の形態におけるステップ２０１と同様の処理を行うが、ここでは広域の文字領域を撮影したため図２２のような出力を行ったとする。 Step 401) The same processing as in Step 201 in the second embodiment described above is performed, but here, since a wide character area is photographed, it is assumed that an output as shown in FIG. 22 is performed.

ステップ４０２）ステップ２０２と同様の処理を行う。 Step 402) The same processing as step 202 is performed.

ステップ４０３）ステップ２０３と同様の処理を行う。 Step 403) The same processing as step 203 is performed.

ステップ４０４）ステップ２０４と同様の処理を行う。 Step 404) The same processing as step 204 is performed.

ステップ４０５）ステップ２０５と同様の処理を行う。本実施の形態では「文字領域」と判定される。 Step 405) The same processing as step 205 is performed. In this embodiment, it is determined as a “character area”.

ステップ４０６）コンテンツ作成部３０６は、ステップ４０５の出力を入力として受け付け、該入力に基づいて作成したコンテンツを出力する。入力された判定結果が「文字領域」であれば、同じく入力された単語群を連結したテキストの内容を、上記の文献１等の技術を用いて音声として読み上げた音声ファイルに変換する。一方、入力された判定結果が「非文字領域」であれば、「そこには文字はありません」と読み上げた音声ファイルを出力する。 Step 406) The content creation unit 306 receives the output of step 405 as an input, and outputs the content created based on the input. If the input determination result is “character region”, the content of the text obtained by concatenating the input word groups is converted into an audio file read out as speech using the technique of the above-mentioned document 1 or the like. On the other hand, if the input determination result is “non-character area”, an audio file that reads out “There is no character there” is output.
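The behavior of the content creation unit in step 406 can be sketched as below. The speech synthesizer is passed in as a callable because the specification only refers to technology such as that of Reference 1; no real text-to-speech API is implied. Synthesizing the fixed announcement on the fly, rather than returning a prepared file, is likewise an assumption of this sketch.

```python
from typing import Callable, List

NO_TEXT_MESSAGE = "そこには文字はありません"  # "There is no character there"


def create_content(
    judgment: str,
    words: List[str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Build the audio content for step 406.

    synthesize is a hypothetical text-to-speech function that turns a
    string into audio data.
    """
    if judgment == "character":
        # Concatenate the extracted words back into running text.
        # Japanese text is joined without separators.
        text = "".join(words)
        return synthesize(text)
    # Non-character area: announce that there is nothing to read.
    return synthesize(NO_TEXT_MESSAGE)
```

Because the audio is generated from the recognized words, this variant needs neither the position DB nor a content DB of prepared recordings.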

ステップ４０７）ステップ２０７と同様の処理を行う。 Step 407) The same processing as step 207 is performed.

ステップ４０８）ステップ２０８と同様の処理を行う。ここでは、撮影した領域を読み上げた音声ファイルが再生されるので、ユーザはその領域の内容を理解することができる。 Step 408) The same processing as step 208 is performed. Here, an audio file that reads out the captured area is reproduced, so that the user can understand the contents of the area.

なお、上記の第１〜第４の実施の形態における各構成要素の動作をプログラムとして構築し、判定装置として利用されるコンピュータにインストールする、または、ネットワークを介して流通させることが可能である。 The operation of each component in the first to fourth embodiments above can be implemented as a program, which can be installed on a computer used as the determination apparatus or distributed over a network.

また、構築されたプログラムをハードディスクや、フレキシブルディスク、ＣＤ−ＲＯＭ等の可搬記憶媒体に格納し、コンピュータにインストールする、または、配布することが可能である。 The program can also be stored on a hard disk or on a portable storage medium such as a flexible disk or CD-ROM, and then installed on a computer or distributed.

なお、本発明は、上記の実施の形態に限定されることなく、特許請求の範囲内において種々変更・応用が可能である。 The present invention is not limited to the above-described embodiment, and various modifications and applications can be made within the scope of the claims.

１００ドキュメント 100 Document
２００光学文字認識装置 200 Optical character recognition device
２０１単語ＤＢ 201 Word DB
３００サーバ部 300 Server unit
３０１サーバ側データ送受信部 301 Server-side data transmission/reception unit
３０２単語抽出部 302 Word extraction unit
３０３文字認識判定部 303 Character recognition determination unit
３０４コンテンツ問い合わせ部 304 Content inquiry unit
３０６コンテンツ作成部 306 Content creation unit
３０５コンテンツ検索部 305 Content search unit
４００クライアント部 400 Client unit
４０１ドキュメント撮影部 401 Document photographing unit
４０２クライアント側データ送受信部 402 Client-side data transmission/reception unit
４０３判定結果提示部 403 Determination result presentation unit

Claims

1. A determination apparatus that receives as input all or part of a document in which text is mixed with non-character content such as photographs, figures, tables, and ruled lines, and determines whether the area contains a character area, the apparatus comprising:
area input means for receiving input of an area to be determined;
character area determination means for determining whether the area is a character area or a non-character area, based at least on the ratio between the number of unknown words detected and the number of words contained in the area, as a result of performing optical character recognition on the area on the assumption that characters are written in it; and
determination result output means for outputting either a character area or a non-character area based on the determination result of the character area determination means.

2. The determination apparatus according to claim 1, wherein the character area determination means includes means for determining that the area is a character area when the number of detected unknown words is less than a predetermined value A and the area contains a number of words equal to or greater than a predetermined value B.

3. The determination apparatus according to claim 1, wherein the character area determination means includes means for determining that the area is a character area when the ratio of the number of detected unknown words is less than a predetermined value A, the number of words whose detected length is less than or equal to a predetermined value C is less than a predetermined value D, and the area contains a number of words equal to or greater than a predetermined value B.

4. A determination apparatus that receives as input all or part of a document in which text is mixed with non-character content such as photographs, figures, tables, and ruled lines, and determines whether the area contains a character area, the apparatus comprising:
area input means for receiving input of an area to be determined;
character area determination means for determining whether the area is a character area or a non-character area, based at least on the length of the detected words and the ratio of the number of words contained in the area, as a result of performing optical character recognition on the area on the assumption that characters are written in it; and
determination result output means for outputting either a character area or a non-character area based on the determination result of the character area determination means.

5. The determination apparatus according to claim 4, wherein the character area determination means includes means for determining that the area is a character area when the number of words whose detected length is less than or equal to a predetermined value C is less than a predetermined value D and the area contains a number of words equal to or greater than a predetermined value B.

6. The determination apparatus according to claim 3, wherein the predetermined value C of the word length is one character.

7. A determination method that receives as input all or part of a document in which text is mixed with non-character content such as photographs, figures, tables, and ruled lines, and determines whether the area contains a character area, the method comprising:
an area input step of receiving input of an area to be determined;
a character area determination step of determining whether the area is a character area or a non-character area, based at least on the ratio between the number of unknown words detected and the number of words contained in the area, as a result of performing optical character recognition on the area on the assumption that characters are written in it; and
a determination result output step of outputting either a character area or a non-character area based on the determination result of the character area determination step.

8. The determination method according to claim 7, wherein, in the character area determination step, the area is determined to be a character area when the number of detected unknown words is less than a predetermined value A and the area contains a number of words equal to or greater than a predetermined value B.

9. The determination method according to claim 7, wherein, in the character area determination step, the area is determined to be a character area when the ratio of the number of detected unknown words is less than a predetermined value A, the number of words whose detected length is less than or equal to a predetermined value C is less than a predetermined value D, and the area contains a number of words equal to or greater than a predetermined value B.

10. A determination method that receives as input all or part of a document in which text is mixed with non-character content such as photographs, figures, tables, and ruled lines, and determines whether the area contains a character area, the method comprising:
an area input step of receiving input of an area to be determined;
a character area determination step of determining whether the area is a character area or a non-character area, based at least on the length of the detected words and the ratio of the number of words contained in the area, as a result of performing optical character recognition on the area on the assumption that characters are written in it; and
a determination result output step of outputting either a character area or a non-character area based on the determination result of the character area determination step.

11. The determination method according to claim 10, wherein, in the character area determination step, the area is determined to be a character area when the number of words whose detected length is less than or equal to a predetermined value C is less than a predetermined value D and the area contains a number of words equal to or greater than a predetermined value B.

12. The determination method according to claim 9 or 11, wherein the predetermined value C of the word length is one character.

13. A program for causing a computer to function as each of the means constituting the determination apparatus according to any one of claims 1 to 6.