JP2004355184A

JP2004355184A - Extraction apparatus and extraction method for character/graphic region, program for the method, and recording medium for recording the program

Info

Publication number: JP2004355184A
Application number: JP2003150249A
Authority: JP
Inventors: Tetsuya Kinebuchi; 哲也杵渕; Naoki Ito; 直己伊藤; Yoshinori Kusachi; 良規草地; Akira Suzuki; 章鈴木; Kenichi Arakawa; 賢一荒川; Tomohiko Arikawa; 知彦有川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-05-28
Filing date: 2003-05-28
Publication date: 2004-12-16

Abstract

PROBLEM TO BE SOLVED: To efficiently and precisely extract the enclosing rectangles of characters and graphics in an image.
SOLUTION: An image 100 is an image for which correct-answer data is to be created, such as an input landscape photo or street photo. Net-like grid construction processing 200 constructs a net-like grid in the image. Enclosing rectangle extraction processing 300 extracts the enclosing rectangle of each character and each graphic. Supplementary information input processing 500 then inputs supplementary information, such as image information, character fonts, character codes, and graphic names. Determination processing 700 determines whether processing is complete for all characters and graphics in the same net-like grid, and determination processing 800 determines whether processing is complete for all net-like grids in the image.
COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、画像内に存在する文字や図形を包含する四角形状の外接矩形領域（包含矩形）を、マウスまたはキーボード等の入力デバイスを用いて抽出する装置および方法に関するものである。
【０００２】
【従来の技術】
文字認識や図形認識技術の開発には、認識対象画像内に存在する文字や図形の文字コード、文字フォント、図形名称、文字や図形の位置、文字や図形の包含矩形形状、画像情報など、各種情報が集約されたデータ（以下、正解データと呼称）が必要となる。この正解データは、認識用のテンプレート（以下、辞書と呼称）を作成する際に学習データとして使用される場合や、認識結果の正誤性能を測定する際に用いられる場合があり、文字認識・図形認識技術の開発を進める際に重要なものである。
【０００３】
正解データの中で、位置と包含矩形形状は該矩形の４つの頂点座標により記述され、従来は市販の画像処理ソフトなどを用い（例えば、非特許文献１、２参照）、以下の手順によりこの座標を抽出していた。
【０００４】
Ｓ１．認識対象画像を市販画像処理ソフトに読み込み、表示する。
【０００５】
Ｓ２．市販画像処理ソフトに標準装備されていることの多い矩形領域抽出ツールを使用し、包含矩形の４つの頂点を人手でマウスクリックすることにより該矩形を抽出する。
【０００６】
Ｓ３．市販画像処理ソフトに標準装備されていることの多いマウスポインタ位置表示機能を使用し、４つの頂点の座標を人手で読み取り、記録する。
【０００７】
【非特許文献１】
画像処理ソフトウェア：発売元：ＰａｉｎｔＳｈｏｐＰｒｏ，ＪａｓｃＳｏｆｔｗａｒｅ社
【０００８】
【非特許文献２】
画像処理ソフトウェア：発売元：Ｐｈｏｔｓｈｏｐ，Ａｄｏｂｅ社
【０００９】
【発明が解決しようとする課題】
画像内の文字や図形各々に矩形形状の外枠があらかじめ付いている場合、従来方法により包含矩形の４つの頂点座標を精度良く抽出することが可能であるが、文字や図形の大部分には外枠が付いていない。
【００１０】
このような場合、まず文字や図形がその上に書かれている看板など、矩形形状を持ち外枠が明確な物体の４辺をはじめに抽出し、次にこの４辺を基準として、すなわちこれらを平行移動させることにより、文字や図形の包含矩形の４辺を定め、各辺の交わった頂点座標を抽出することになる。この作業には非常に大きな手間がかかることになる。
【００１１】
また、画像は原理的に透視投影変換されているため、文字や図形の包含矩形の４辺と、これらが書かれている看板などの４辺の対応する辺同士は完全に平行ではなく、直線の傾きは若干異なっているため、精度の高い包含矩形形状の抽出も困難であった。
【００１２】
本発明の目的は、上記の課題を解決した文字・図形領域抽出装置、方法、プログラムおよび記録媒体を提供することにある。
【００１３】
【課題を解決するための手段】
本発明は、文字や図形の包含矩形を効率的かつ精度良く抽出するため、以下の機能を備えた装置、方法、プログラムおよび記録媒体を特徴とする。
【００１４】
・画像読み込みと表示機能。
【００１５】
・ネット状グリッドを自動構築する機能。
【００１６】
・ネット状グリッドの微調整機能。
【００１７】
・マウスポインタにネット状グリッドに沿ったガイド線を付与する機能。
【００１８】
・４つの頂点の指定により、包含矩形形状を決定する機能。
【００１９】
・矩形形状情報の自動出力機能。
【００２０】
・上記１〜６の基本機能に加え、使用目的による拡張機能としての画像情報、文字フォント、文字コード、図形名称などの付帯情報入力機能。
【００２１】
（１）画像内に存在する文字や図形を包含する四角形状の包含矩形を抽出する文字・図形領域抽出装置であって、
文字や図形が描かれている看板の枠線など、画像内の直線または特徴的な点をハフ変換やパターンマッチングなどの画像処理技術により取得し、該直線または特徴的な点に基づき文字や図形の包含矩形を抽出する際の基準となるネット状グリッドを構築する手段と、
マウスまたはキーボード等の入力デバイスによって操作されるポインタに前記ネット状グリッドに沿ったガイド線を重畳させて表示し、該ポインタを用いて指定された包含矩形の頂点に基づき文字や図形の包含矩形を抽出する手段と、
を備えたことを特徴とする文字・図形領域抽出装置。
【００２２】
（２）上記の（１）に記載の文字・図形領域抽出装置において、
前記入力デバイスを用いて複数の点が入力され、該点に基づいて前記ネット状グリッドを構築する手段を備えたことを特徴とする文字・図形領域抽出装置。
【００２３】
（３）上記の（１）または（２）に記載の文字・図形領域抽出装置において、
画像内に複数の異なるネット状グリッドを構築し、各ネット状グリッドの領域内では該ネット状グリッドに沿ったガイド線をポインタに重畳させて表示する手段を備えたことを特徴とする文字・図形領域抽出装置。
【００２４】
（４）上記の（１）から（３）のいずれかに記載の文字・図形領域抽出装置において、
各文字や図形の文字コード、文字フォント、図形名称、または画像情報などの付帯情報をマウスまたはキーボード等の入力デバイスを用いて入力する手段を備えたことを特徴とする文字・図形領域抽出装置。
【００２５】
（５）画像内に存在する文字や図形を包含する四角形状の包含矩形を抽出する文字・図形領域抽出方法であって、
文字や図形が描かれている看板の枠線など、画像内の直線または特徴的な点をハフ変換やパターンマッチングなどの画像処理技術により取得し、該直線または特徴的な点に基づき文字や図形の包含矩形を抽出する際の基準となるネット状グリッドを構築する過程と、
マウスまたはキーボード等の入力デバイスによって操作されるポインタに前記ネット状グリッドに沿ったガイド線を重畳させて表示し、該ポインタを用いて指定された包含矩形の頂点に基づき文字や図形の包含矩形を抽出する過程と、
を備えたことを特徴とする文字・図形領域抽出方法。
【００２６】
（６）上記の（５）に記載の文字・図形領域抽出方法において、
前記入力デバイスを用いて複数の点が入力され、該点に基づいて前記ネット状グリッドを構築する過程を備えたことを特徴とする文字・図形領域抽出方法。
【００２７】
（７）上記の（５）または（６）に記載の文字・図形領域抽出方法において、
画像内に複数の異なるネット状グリッドを構築し、各ネット状グリッドの領域内では該ネット状グリッドに沿ったガイド線をポインタに重畳させて表示する過程を備えたことを特徴とする文字・図形領域抽出方法。
【００２８】
（８）上記の（５）から（７）のいずれかに記載の文字・図形領域抽出方法において、
各文字や図形の文字コード、文字フォント、図形名称、または画像情報などの付帯情報をマウスまたはキーボード等の入力デバイスを用いて入力する過程を備えたことを特徴とする文字・図形領域抽出方法。
【００２９】
（９）上記の（５）〜（８）のいずれかに記載の文字・図形領域抽出方法における処理手順をコンピュータで実行可能に構成したことを特徴とする文字・図形領域抽出プログラム。
【００３０】
（１０）上記の（５）〜（８）のいずれかに記載の文字・図形領域抽出方法における処理手順をコンピュータに実行させるためのプログラムを、該コンピュータが読み取り可能な記録媒体に記録したことを特徴とする記録媒体。
【００３１】
【発明の実施の形態】
本発明の実施の形態を、図を参照して詳細に説明する。
【００３２】
図１に本実施形態の処理手順を示す。１００は撮影装置または画像読取装置で入力される風景写真や街中の写真などの正解データ作成対象画像、２００は画像１００にネット状グリッド（画像に含まれる文字や図形の包含矩形を抽出する際の基準となる座標系）を構築する処理、３００は各文字や各図形の包含矩形を抽出する処理、４００は包含矩形抽出処理３００により抽出された包含矩形をディスプレイや記憶装置に出力された包含矩形形状情報、５００は画像情報、文字フォント、文字コード、図形名称などの付帯情報の入力処理、６００は付帯情報入力処理５００により入力された付帯情報、７００は同一ネット状グリッド内のすべての文字や図形について処理が終了したか否かを判定する処理、８００は画像内のすべてのネット状グリッドについて処理が終了したか否かを判定する処理である。
【００３３】
画像１００は、その中に文字や図形を含む画像であり、例えば風景写真や街中の写真などである。画像１００の例として、「神奈川県横須賀市」という文字列が書かれた看板写真画像を図２に示す。
【００３４】
ネット状グリッド構築処理２００は、画像処理技術によって取得された文字や図形が描かれている看板の枠線などの直線または特徴的な点、あるいはマウスまたはキーボード等の入力デバイスを用いて人手により入力された基準点にもとづき、ネット状グリッドを自動構築する処理である。
【００３５】
ここで、看板の枠線などの直線を画像内から検出し、これを基にしてグリッド状ネットを作成する方法を説明する。直線を検出する方法としては、従来から良く知られている方法として、ハフ（Ｈｏｕｇｈ）変換や、パターンマッチング（参考文献［１］）がある。
【００３６】
参考文献１「画像処理標準テキストブック、画像情報教育振興協会、ｐｐ．１８７−１９０」
Ｈｏｕｇｈ変換により抽出された複数の垂直方向の直線の交点と、複数の水平方向の直線の交点をそれぞれ消失点とすることにより、ネット状グリッドを作成することができる。図３に例を示す。図３の２０１，２０２，２０３，２０４は抽出された直線であり、垂直方向の直線２０１と２０２の交点から消失点２０５が、水平方向の直線２０３と２０４の交点からもう１つの消失点２０６が求まる。
【００３７】
また、本実施形態では人手によりネット状グリッド構築用の基準点を入力する方法を以下に示す。図４に基準点の例を示す。図４の例では、看板に存在する枠線の左上、左下、右下、右上の４つの頂点を基準点２０７，２０８，２０９，２１０としている。図４の枠線のような物がない場合、すなわち、基準点が明確でない場合、例えば「県」という文字に含まれる「目」の部分の左上、左下、右下、右上の４つの頂点を基準点とすることが可能である。この基準点は、お互いの距離が大きいほど精度良くネット状グリッドを構築できるため好ましい。また、一度基準点を入力した後でも、入力デバイスを用いてその位置を微修正することが可能である。また、基準点は４点だけでなく、より精度の高いネット状グリッドを構築するために５点以上入力することも可能であるが、本実施形態では基準点を４点として説明する。また、写真画像内に複数の看板が写っているような場合、ネット状グリッドも複数存在することとなり、各ネット状グリッドについてそれぞれ異なる基準点を入力する。
【００３８】
次に、入力された基準点に基づき、ネット状グリッドを自動構築する。この処理例を図５に示す。図５の２０７，２０８，２０９，２１０は４つの基準点であり、これらの点の座標から、簡単な幾何計算により、ネット状グリッドの消失点２１１，２１２を求めることが可能である。
【００３９】
求めた消失点を始点とする等角度間隔のグリッド線を、図２の看板写真画像に重畳させて表示した例を図６に示す。このグリッド線は任意に非表示とすることも可能である。また、一度消失点を算出してネット状グリッドを構築した後も、入力デバイスを用いて基準点位置を微修正し、グリッドを構築し直すことも可能である。
【００４０】
次に、図１の包含矩形抽出処理３００は、文字や図形の包含矩形の４頂点を、入力デバイスを用いて人手により指定することにより、該包含矩形を抽出する処理である。効率的かつ精度良く包含矩形を抽出するため、ネット状グリッドに沿った２本のガイド線をマウスポインタに発生させる。該ガイド線を参考にしてマウスポインタを包含矩形の４つの頂点に移動させることにより、効率的かつ精度良く該頂点を指定することが可能である。ガイド線付きマウスポインタの例を図７に示す。
【００４１】
この場合、包含矩形形状情報４００は、包含矩形抽出処理３００により抽出された包含矩形の４つの頂点座標である。
【００４２】
付帯情報入力処理５００は、必要に応じ、画像サイズや色数などの画像情報、文字フォント、文字コード、図形名称、各文字や図形が属するネット状グリッド、各文字や図形が属する文字列や図形列、写真撮影場所、写真撮影時間などに関する付帯情報を人手により入力する処理である。これら情報の入力は包含矩形の抽出処理と同時に入力デバイスにより行わる。
【００４３】
付帯情報６００は、付帯情報入力処理５００により入力された各付帯情報であり、テキストファイル、ＣＳＶファイルなど任意に指定した形式のファイルで出力される。
【００４４】
同一ネット状グリッド内処理終了判定処理７００は、１つのネット状グリッド内すべての文字や図形が処理されたか否か、人手による判定結果入力を実行する処理である。まだ処理されていない文字や図形が存在する場合、包含矩形抽出処理３００へ戻って包含矩形抽出処理３００と付帯情報入力処理５００が繰り返し実行される。
【００４５】
処理終了判定処理８００は、画像内のすべてのネット状グリッドについて処理が終了したか否か人手による判定結果入力を実行する処理である。まだ処理が終わっていないネット状グリッドが存在する場合、ネット状グリッド基準点入力処理２００へ戻り、ネット状グリッド構築処理２００と包含矩形抽出処理３００と付帯情報入力処理５００が繰り返し実行される。
【００４６】
以上に説明してきた処理により、画像内の文字や図形の包含矩形を効率的かつ精度良く抽出することが可能となる。
【００４７】
図８は、本発明の実施の形態としての、文字・図形領域抽出装置の構成を示し、各部は図１の処理機能をソフトウェア構成で実現するコンピュータ構成とする。図８において、画像入力部１は風景写真や街中の写真などの正解データ作成対象画像を入力する撮影装置または画像読取装置。ネット状グリッド構築部２は画像入力部１から入力された画像に対してネット状グリッドを構築する。包含矩形抽出部３は画像に含まれる各文字や各図形の包含矩形形状情報を抽出する。付帯情報入力部４は抽出された包含矩形形状情報に対する画像情報、文字フォント、文字コード、図形名称などの付帯情報の入力処理をする。情報出力部５は包含矩形抽出部３により抽出された包含矩形情報および付帯情報入力部４で入力された付帯情報をディスプレイや記憶装置に出力する。判定入力部６は同一ネット状グリッド内のすべての文字や図形について処理が終了したか否かの判定入力、および画像内のすべてのネット状グリッドについて処理が終了したか否かの判定入力に応じて、ネット状グリッド構築部２および包含矩形抽出部３での実行を指令する。
【００４８】
なお、本発明は、２次元画像を用いて画像に写った３次元物体の形状や運動を求める３次元情報復元技術における、幾何学的キャリブレーションに利用することができる。３次元情報復元は、カメラやビデオによる３次元情報から２次元情報への投影の過程を正確に求める必要があり、カメラモデルやその各種パラメータのキャリブレーションが行われる。このキャリブレーション方法の１つに、消失点の性質を利用した方法があり、以下に挙げるような様々な従来研究が存在する。
【００４９】
（ａ）新技術コミュニケーションズ、コンピュータビジョン、ｐｐ．４２
（ｂ）ＣａｐｒｉｌｅａｎｄＶ．Ｔｏｒｒｅ，Ｕｓｉｎｇｖａｎｉｓｈｉｎｇｐｏｉｎｔｓｆｏｒｃａｍｅｒａｃａｌｉｂｒａｔｉｏｎ，ＩＪＣＶ，Ｖｂｌ．４，ｐｐ．１２７−１４０．１９９０。
【００５０】
（ｃ）Ｗ．ＣｈｅｎａｎｄＢ．Ｃ．Ｊｉａｎｇ，３−Ｄｃａｍｅｒａｃａｌｉｂｒａｔｉｏｎｕｓｉｎｇｖａｎｉｓｈｉｎｇｐｏｉｎｔｃｏｎｓｅｐｔ，ＰＲ，Ｖｏｌ．２４，ｐｐ．５７−６７．１９９１。
【００５１】
これら従来手法は、平行線を描いた物体を用意し、これを撮影した画像を解析することにより消失点を自動抽出しているが、本発明では人手による判定と入力で、看板の外枠や文字内部の矩形などさまざまな手がかりを用いて消失点を抽出することができる。本発明により抽出した消失点を従来手法に適用することで、簡易かつ効率的なキャリブレーションが可能となる。
【００５２】
さらに、本発明は、カメラやビデオで撮影された画像や映像に対し、歪み補正や回転補正などの編集に用いることもできる。
【００５３】
また、本発明は、図１等に示した方法の一部又は全部の処理機能をプログラムとして構成してコンピュータを用いて実行させることができる。また、コンピュータでその各部の処理機能を実現するためのプログラム、あるいはコンピュータにその処理手順を実行させるためのプログラムを、そのコンピュータが読み取り可能な記録媒体、例えば、フレキシブルディスク、ＭＯ、ＲＯＭ、メモリカード、ＣＤ、ＤＶＤ、リムーバブルディスクなどに記録して、保存したり、提供したりすることが可能であり、また、インターネットのような通信ネットワークを介して配布したりすることが可能である。
【００５４】
【発明の効果】
以上説明してきたように、本発明は、文字や図形の包含矩形を抽出する際、包含矩形の基準となる座標系（ネット状グリッド）を構築し、これに沿ったガイド線を付与したマウスポインタを移動させて包含矩形の頂点を指定することとしたため、画像内の文字や図形の包含矩形を効率的かつ精度良く抽出することができる。これにより文字認識や図形認識技術開発のための正解データを効率良く作成することができる。
【００５５】
また、本発明では人手により抽出される消失点は、２次元画像からの３次元情報復元におけるカメラモデルとそのパラメータのキャリブレーション手法に利用することができる。
【００５６】
さらに、本発明は、カメラやビデオで撮影された画像や映像に対し、歪み補正や回転補正などの編集に利用することができる。
【図面の簡単な説明】
【図１】本発明の実施形態を示す処理手順。
【図２】実施形態における看板写真画像の例。
【図３】実施形態におけるネット状グリッドの作成例。
【図４】実施形態におけるネット状グリッド基準点の例。
【図５】実施形態におけるネット状グリッド構築処理例。
【図６】実施形態におけるネット状グリッドの例。
【図７】実施形態におけるガイド線付きマウスポインタの例。
【図８】本発明の実施形態を示す装置構成図。
【符号の説明】
１…画像入力部
２…ネット状グリッド構築部
３…包含矩形抽出部
４…付帯情報入力部
５…情報出力部
６…判定入力部
[0001]
[Technical field of the invention]
The present invention relates to an apparatus and a method for extracting, using an input device such as a mouse or a keyboard, a quadrilateral circumscribing region (enclosing rectangle) that contains a character or figure present in an image.
[0002]
[Prior art]
Developing character recognition and figure recognition technology requires data (hereinafter referred to as correct-answer data) that aggregates various kinds of information about the characters and figures in the image to be recognized, such as their character codes, character fonts, figure names, positions, enclosing rectangle shapes, and image information. This correct-answer data is used as training data when creating recognition templates (hereinafter referred to as dictionaries) and when measuring how accurate recognition results are, so it is important for advancing the development of character recognition and figure recognition technology.
[0003]
Within the correct-answer data, the position and the enclosing rectangle shape are described by the coordinates of the four vertices of the rectangle. Conventionally, these coordinates were extracted by the following procedure using commercially available image processing software or the like (for example, see Non-Patent Documents 1 and 2).
[0004]
S1. The recognition target image is read into commercially available image processing software and displayed.
[0005]
S2. The rectangle is extracted by manually clicking the four vertices of the enclosing rectangle with a mouse using a rectangular region extraction tool often provided as a standard feature in commercially available image processing software.
[0006]
S3. The coordinates of the four vertices are manually read and recorded using a mouse pointer position display function often provided as a standard feature in commercially available image processing software.
[0007]
[Non-patent document 1]
Image processing software: Paint Shop Pro; publisher: Jasc Software Inc.
[0008]
[Non-patent document 2]
Image processing software: Photoshop; publisher: Adobe
[0009]
[Problems to be solved by the invention]
If each character or figure in the image already has a rectangular outer frame, the conventional method can accurately extract the coordinates of the four vertices of the enclosing rectangle. Most characters and figures, however, have no outer frame.
[0010]
In such a case, the four sides of an object that has a rectangular shape and a clear outer frame, such as the signboard on which the characters and figures are written, are extracted first. Next, using these four sides as a reference, that is, by translating them in parallel, the four sides of the enclosing rectangle of the character or figure are determined, and the coordinates of the vertices where the sides intersect are extracted. This work requires a great deal of effort.
[0011]
In addition, because an image is in principle a perspective projection, the four sides of the enclosing rectangle of a character or figure are not exactly parallel to the corresponding sides of the signboard on which it is written; the slopes of the lines differ slightly. It has therefore also been difficult to extract the enclosing rectangle shape with high accuracy.
[0012]
An object of the present invention is to provide a character / graphic region extracting apparatus, method, program, and recording medium that solve the above-mentioned problems.
[0013]
[Means for Solving the Problems]
To extract the enclosing rectangles of characters and figures efficiently and accurately, the present invention is characterized by an apparatus, a method, a program, and a recording medium having the following functions.
[0014]
- Image reading and display function.
[0015]
- A function for automatically constructing a net-like grid.
[0016]
- A function for fine adjustment of the net-like grid.
[0017]
- A function for attaching guide lines along the net-like grid to the mouse pointer.
[0018]
- A function for determining the enclosing rectangle shape by designating four vertices.
[0019]
- A function for automatically outputting the rectangle shape information.
[0020]
- In addition to the six basic functions above, a function for inputting supplementary information such as image information, character fonts, character codes, and figure names, provided as an extension depending on the purpose of use.
[0021]
(1) A character/graphic region extraction apparatus for extracting a quadrilateral enclosing rectangle that contains a character or figure present in an image, comprising:
means for acquiring straight lines or characteristic points in the image, such as the frame lines of a signboard on which characters or figures are drawn, by an image processing technique such as the Hough transform or pattern matching, and for constructing, based on the straight lines or characteristic points, a net-like grid that serves as a reference when extracting the enclosing rectangle of a character or figure; and
means for displaying guide lines along the net-like grid superimposed on a pointer operated by an input device such as a mouse or a keyboard, and for extracting the enclosing rectangle of the character or figure based on the vertices of the enclosing rectangle specified with the pointer.
[0022]
(2) The character/graphic region extraction apparatus according to (1), further comprising means for receiving a plurality of points input using the input device and constructing the net-like grid based on the points.
[0023]
(3) The character/graphic region extraction apparatus according to (1) or (2), further comprising means for constructing a plurality of different net-like grids in the image and, within the region of each net-like grid, displaying guide lines along that net-like grid superimposed on the pointer.
[0024]
(4) In the character / graphic region extraction device according to any one of (1) to (3),
A character / graphic region extracting apparatus comprising means for inputting supplementary information such as a character code of each character or graphic, a character font, a graphic name, or image information using an input device such as a mouse or a keyboard.
[0025]
(5) A character/graphic region extraction method for extracting a quadrilateral enclosing rectangle that contains a character or figure present in an image, comprising:
a step of acquiring straight lines or characteristic points in the image, such as the frame lines of a signboard on which characters or figures are drawn, by an image processing technique such as the Hough transform or pattern matching, and constructing, based on the straight lines or characteristic points, a net-like grid that serves as a reference when extracting the enclosing rectangle of a character or figure; and
a step of displaying guide lines along the net-like grid superimposed on a pointer operated by an input device such as a mouse or a keyboard, and extracting the enclosing rectangle of the character or figure based on the vertices of the enclosing rectangle specified with the pointer.
[0026]
(6) In the character / graphic area extraction method according to (5),
A method for extracting a character / graphic region, comprising a step of inputting a plurality of points using the input device and constructing the net-like grid based on the points.
[0027]
(7) The character/graphic region extraction method according to (5) or (6), further comprising a step of constructing a plurality of different net-like grids in the image and, within the region of each net-like grid, displaying guide lines along that net-like grid superimposed on the pointer.
[0028]
(8) In the character / graphic area extraction method according to any one of (5) to (7),
A character / graphic area extraction method, comprising a step of inputting supplementary information such as a character code of each character or graphic, a character font, a graphic name, or image information using an input device such as a mouse or a keyboard.
[0029]
(9) A character / graphic region extraction program, characterized in that the processing procedure in the character / graphic region extraction method according to any one of (5) to (8) is configured to be executable by a computer.
[0030]
(10) A recording medium characterized in that a program for causing a computer to execute the processing procedure of the character/graphic region extraction method according to any one of (5) to (8) is recorded on a recording medium readable by the computer.
[0031]
[Embodiments of the invention]
Embodiments of the present invention will be described in detail with reference to the drawings.
[0032]
FIG. 1 shows the processing procedure of the present embodiment. Reference numeral 100 denotes an image for which correct-answer data is to be created, such as a landscape photograph or street photograph, input from a photographing device or an image reading device. 200 denotes processing that constructs a net-like grid (a coordinate system serving as a reference when extracting the enclosing rectangles of characters and figures in the image) on the image 100. 300 denotes processing that extracts the enclosing rectangle of each character or figure. 400 denotes the enclosing rectangle shape information extracted by the enclosing rectangle extraction processing 300 and output to a display or a storage device. 500 denotes input processing for supplementary information such as image information, character fonts, character codes, and figure names. 600 denotes the supplementary information input by the supplementary information input processing 500. 700 denotes processing that determines whether all characters and figures in the same net-like grid have been processed. 800 denotes processing that determines whether all net-like grids in the image have been processed.
[0033]
The image 100 is an image including characters and figures therein, and is, for example, a landscape photograph or a photograph of a city. As an example of the image 100, FIG. 2 shows a signboard photograph image in which a character string “Yokosuka City, Kanagawa Prefecture” is written.
[0034]
The net-like grid construction processing 200 automatically constructs a net-like grid based on either straight lines or characteristic points acquired by image processing, such as the frame lines of a signboard on which characters or figures are drawn, or reference points entered manually with an input device such as a mouse or a keyboard.
[0035]
First, a method is described for detecting straight lines such as the frame lines of a signboard in the image and creating the net-like grid from them. Well-known methods for detecting straight lines include the Hough transform and pattern matching (Reference [1]).
[0036]
Reference 1: "Image Processing Standard Textbook, Image Information Education Promotion Association, pp. 187-190"
A net-like grid can be created by taking the intersection of the vertical straight lines extracted by the Hough transform and the intersection of the horizontal straight lines as the two vanishing points. FIG. 3 shows an example. In FIG. 3, 201, 202, 203, and 204 are the extracted straight lines. Vanishing point 205 is obtained from the intersection of the vertical lines 201 and 202, and the other vanishing point 206 is obtained from the intersection of the horizontal lines 203 and 204.
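As an illustration of the intersection step, the following is a minimal sketch that assumes each detected line is given in the usual Hough normal form (rho, theta), where x*cos(theta) + y*sin(theta) = rho. The function name and the parallel-line tolerance are hypothetical and do not come from the patent.

```python
import math


def intersect_hough_lines(line_a, line_b, eps=1e-9):
    """Intersect two lines given in Hough normal form (rho, theta).

    Each line satisfies x*cos(theta) + y*sin(theta) = rho.
    Returns (x, y), or None when the lines are (nearly) parallel in the
    image, in which case the vanishing point lies at infinity.
    """
    rho_a, theta_a = line_a
    rho_b, theta_b = line_b
    a1, b1 = math.cos(theta_a), math.sin(theta_a)
    a2, b2 = math.cos(theta_b), math.sin(theta_b)
    det = a1 * b2 - a2 * b1  # determinant of the 2x2 linear system
    if abs(det) < eps:
        return None
    # Cramer's rule for the system [a1 b1; a2 b2] [x y]^T = [rho_a rho_b]^T
    x = (rho_a * b2 - rho_b * b1) / det
    y = (a1 * rho_b - a2 * rho_a) / det
    return (x, y)
```

Applied to lines 201 and 202 this would give a candidate for vanishing point 205, and applied to lines 203 and 204 a candidate for vanishing point 206. With more than two lines per direction, a least-squares intersection would be a natural extension.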
[0037]
Next, the method used in this embodiment for manually inputting reference points for constructing the net-like grid is described. FIG. 4 shows an example of the reference points. In the example of FIG. 4, the four vertices at the upper left, lower left, lower right, and upper right of the frame line on the signboard are used as reference points 207, 208, 209, and 210. If there is nothing like the frame line of FIG. 4, that is, if no reference points are obvious, the four vertices at the upper left, lower left, lower right, and upper right of, for example, the "目" component contained in the character "県" can be used as reference points. The farther apart the reference points are, the more accurately the net-like grid can be constructed, so widely spaced points are preferable. Even after a reference point has been input, its position can be finely adjusted using the input device. The number of reference points is not limited to four; five or more can be input to construct a more accurate net-like grid, but this embodiment is described using four. Further, when several signboards appear in a photographic image, there are several net-like grids, and different reference points are input for each of them.
[0038]
Next, a net-like grid is automatically constructed based on the input reference points. FIG. 5 shows an example of this processing. In FIG. 5, 207, 208, 209, and 210 are the four reference points, and the vanishing points 211 and 212 of the net-like grid can be obtained from the coordinates of these points by a simple geometric calculation.
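The patent does not spell out the geometric calculation. One common way to do it, shown here as a sketch under that assumption, is to intersect opposite sides of the quadrilateral using homogeneous coordinates. The function names, the ordering of the arguments, and the tolerance are illustrative only.

```python
def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def line_through(p, q):
    """Homogeneous line through two image points (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def meet(l1, l2, eps=1e-9):
    """Intersection of two homogeneous lines, or None if parallel."""
    x, y, w = cross(l1, l2)
    if abs(w) < eps:  # absolute tolerance; adequate for a sketch only
        return None
    return (x / w, y / w)


def vanishing_points(upper_left, lower_left, lower_right, upper_right):
    """Return (vp_of_left_right_sides, vp_of_top_bottom_sides).

    The left and right sides meet at one vanishing point, the top and
    bottom sides at the other. Either may be None when the two sides are
    parallel in the image.
    """
    left = line_through(upper_left, lower_left)
    right = line_through(upper_right, lower_right)
    top = line_through(upper_left, upper_right)
    bottom = line_through(lower_left, lower_right)
    return meet(left, right), meet(top, bottom)
```

For example, `vanishing_points((0, 0), (0, 100), (100, 80), (100, 20))` describes a signboard whose left and right sides are parallel in the image while the top and bottom sides converge to the right.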
[0039]
FIG. 6 shows an example in which grid lines at equal angular intervals starting from the obtained vanishing point are superimposed on the signboard photograph image of FIG. 2 and displayed. This grid line can be arbitrarily hidden. Further, even after the vanishing point is calculated once and the net-like grid is constructed, the reference point position can be finely corrected using the input device and the grid can be constructed again.
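A minimal sketch of how equally spaced grid-line directions might be generated from one finite vanishing point is given below. It assumes the grid should span the angular range between the rays toward two chosen image points (for instance two reference points); that choice, and the function name, are assumptions rather than details from the patent.

```python
import math


def grid_line_angles(vp, p_first, p_last, count):
    """Angles (radians) of `count` rays starting at vanishing point `vp`.

    The rays are equally spaced in angle between the ray toward p_first
    and the ray toward p_last, sweeping the shorter way around.
    """
    a0 = math.atan2(p_first[1] - vp[1], p_first[0] - vp[0])
    a1 = math.atan2(p_last[1] - vp[1], p_last[0] - vp[0])
    # Wrap the difference into [-pi, pi) so the sweep takes the short way.
    delta = (a1 - a0 + math.pi) % (2 * math.pi) - math.pi
    if count == 1:
        return [a0]
    return [a0 + delta * i / (count - 1) for i in range(count)]
```

Each returned angle defines one grid line as the ray from `vp` in direction `(cos(angle), sin(angle))`. A vanishing point at infinity would need separate handling, for example with equally spaced parallel lines.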
[0040]
Next, the enclosing rectangle extraction processing 300 in FIG. 1 extracts the enclosing rectangle of a character or figure when its four vertices are specified manually using an input device. To extract the enclosing rectangle efficiently and accurately, two guide lines that follow the net-like grid are attached to the mouse pointer. By moving the mouse pointer to each of the four vertices of the enclosing rectangle with these guide lines as a reference, the vertices can be specified efficiently and accurately. FIG. 7 shows an example of a mouse pointer with guide lines.
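One plausible reading of "guide lines along the net-like grid" is that each guide line passes through the pointer and one of the two vanishing points. The sketch below computes the two guide-line directions under that assumption. The fallback directions used when a vanishing point is at infinity are hypothetical defaults.

```python
import math


def guide_direction(pointer, vp, fallback):
    """Unit direction of the guide line through `pointer` toward `vp`.

    When `vp` is None (the grid lines are parallel in the image), the
    given fallback direction is used instead.
    """
    if vp is None:
        dx, dy = fallback
    else:
        dx, dy = vp[0] - pointer[0], vp[1] - pointer[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("pointer coincides with the vanishing point")
    return (dx / norm, dy / norm)


def guide_lines(pointer, vp_a, vp_b,
                fallback_a=(0.0, 1.0), fallback_b=(1.0, 0.0)):
    """Directions of the two guide lines to draw through the pointer."""
    return (guide_direction(pointer, vp_a, fallback_a),
            guide_direction(pointer, vp_b, fallback_b))
```

A drawing routine would then render, through the pointer position, one line along each returned direction, clipped to the image or to the current grid region.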
[0041]
In this case, the inclusion rectangle shape information 400 is the coordinates of the four vertices of the inclusion rectangle extracted by the inclusion rectangle extraction processing 300.
[0042]
The supplementary information input processing 500 is processing in which supplementary information is input manually as needed, such as image information (image size, number of colors, and so on), character font, character code, figure name, the net-like grid to which each character or figure belongs, the character string or figure sequence to which each character or figure belongs, the photographing location, and the photographing time. This information is input with the input device at the same time as the enclosing rectangle extraction processing.
[0043]
The supplementary information 600 is each supplementary information input by the supplementary information input processing 500 and is output as a file of an arbitrary designated format such as a text file or a CSV file.
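As an illustration of the CSV output, the following sketch writes one row per character or figure, combining the four vertex coordinates with a few supplementary fields. The column names and record layout are hypothetical; the patent only states that a text or CSV file is produced.

```python
import csv
import io

# Hypothetical column layout: supplementary fields, then four vertices.
FIELDS = ["grid_id", "char_code", "font",
          "x1", "y1", "x2", "y2", "x3", "y3", "x4", "y4"]


def write_records(records, stream):
    """Write annotation records to `stream` as CSV with a header row.

    Each record is a dict with keys "grid_id", "char_code", "font" and
    "vertices" (a list of four (x, y) tuples).
    """
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for rec in records:
        row = {"grid_id": rec["grid_id"],
               "char_code": rec["char_code"],
               "font": rec["font"]}
        for i, (x, y) in enumerate(rec["vertices"], start=1):
            row[f"x{i}"] = x
            row[f"y{i}"] = y
        writer.writerow(row)


def records_to_csv_text(records):
    """Convenience wrapper returning the CSV as a string."""
    buffer = io.StringIO()
    write_records(records, buffer)
    return buffer.getvalue()
```

To write to disk instead, pass a file opened with `newline=""` to `write_records`, as the `csv` module documentation recommends.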
[0044]
The same-grid completion determination processing 700 accepts a manually input determination of whether all characters and figures in one net-like grid have been processed. If any character or figure remains unprocessed, the procedure returns to the enclosing rectangle extraction processing 300, and the enclosing rectangle extraction processing 300 and the supplementary information input processing 500 are executed again.
[0045]
The process end determination process 800 is a process of manually inputting a determination result as to whether or not the process has been completed for all net-like grids in an image. If there is a net-like grid that has not been processed yet, the process returns to the net-like grid reference point input processing 200, and the net-like grid construction processing 200, the inclusion rectangle extraction processing 300, and the supplementary information input processing 500 are repeatedly executed.
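The overall control flow of processes 200 through 800 can be sketched as two nested loops. The callback names below are hypothetical stand-ins for the interactive steps; in the apparatus described, each would be driven by operator input.

```python
def annotate_image(build_grid, extract_rectangle, input_info,
                   grid_done, image_done):
    """Run the annotation loop and return (grid, rectangle, info) tuples.

    build_grid()            -> grid       (processing 200)
    extract_rectangle(grid) -> rectangle  (processing 300, information 400)
    input_info(rectangle)   -> info       (processing 500, information 600)
    grid_done()             -> bool       (determination 700)
    image_done()            -> bool       (determination 800)
    """
    results = []
    while True:
        grid = build_grid()
        while True:
            rectangle = extract_rectangle(grid)
            info = input_info(rectangle)
            results.append((grid, rectangle, info))
            if grid_done():   # all characters/figures in this grid done?
                break
        if image_done():      # all grids in the image done?
            break
    return results
```

Passing the interactive steps in as callables keeps the loop structure separate from any particular user interface.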
[0046]
By the processing described above, it is possible to efficiently and accurately extract the inclusion rectangles of the characters and figures in the image.
[0047]
FIG. 8 shows the configuration of a character/graphic region extraction apparatus according to an embodiment of the present invention. Each unit is realized on a computer that implements the processing functions of FIG. 1 in software. In FIG. 8, the image input unit 1 is a photographing device or an image reading device that inputs an image for which correct-answer data is to be created, such as a landscape photograph or a street photograph. The net-like grid construction unit 2 constructs a net-like grid for the image input from the image input unit 1. The enclosing rectangle extraction unit 3 extracts the enclosing rectangle shape information of each character and each figure in the image. The supplementary information input unit 4 handles input of supplementary information, such as image information, character font, character code, and figure name, for the extracted enclosing rectangle shape information. The information output unit 5 outputs the enclosing rectangle information extracted by the enclosing rectangle extraction unit 3 and the supplementary information input through the supplementary information input unit 4 to a display or a storage device. The determination input unit 6 instructs the net-like grid construction unit 2 and the enclosing rectangle extraction unit 3 to execute, in response to a determination input as to whether all characters and figures in the same net-like grid have been processed and a determination input as to whether all net-like grids in the image have been processed.
[0048]
The present invention can be used for geometric calibration in a three-dimensional information restoration technique for obtaining the shape and motion of a three-dimensional object reflected on an image using a two-dimensional image. In the three-dimensional information restoration, it is necessary to accurately obtain a process of projecting three-dimensional information into two-dimensional information by a camera or a video, and a camera model and various parameters thereof are calibrated. As one of the calibration methods, there is a method using a property of a vanishing point, and various conventional studies described below exist.
[0049]
(a) New Technology Communications, Computer Vision, pp. 42
(b) B. Caprile and V. Torre, Using vanishing points for camera calibration, IJCV, Vol. 4, pp. 127-140, 1990.
[0050]
(c) W. Chen and B. C. Jiang, 3-D camera calibration using vanishing point concept, PR, Vol. 24, pp. 57-67, 1991.
[0051]
These conventional methods prepare an object on which parallel lines are drawn and automatically extract vanishing points by analyzing a photograph of it. In the present invention, by contrast, vanishing points can be extracted through manual judgment and input, using a variety of cues such as the outer frame of a signboard or a rectangle inside a character. Applying the vanishing points extracted by the present invention to the conventional methods enables simple and efficient calibration.
[0052]
Further, the present invention can also be used for editing, such as distortion correction and rotation correction, of images and video captured with a camera or video camera.
[0053]
In the present invention, some or all of the processing functions of the method shown in FIG. 1 and elsewhere can be configured as a program and executed on a computer. A program for realizing the processing functions of each unit on a computer, or a program for causing a computer to execute the processing procedure, can be recorded on a computer-readable recording medium, such as a flexible disk, MO, ROM, memory card, CD, DVD, or removable disk, and stored or provided in that form. It can also be distributed via a communication network such as the Internet.
[0054]
[Effects of the invention]
As described above, when extracting the enclosing rectangle of a character or figure, the present invention constructs a coordinate system (net-like grid) that serves as a reference for the enclosing rectangle, and the vertices of the enclosing rectangle are specified by moving a mouse pointer to which guide lines along this grid are attached. The enclosing rectangles of characters and figures in an image can therefore be extracted efficiently and accurately. As a result, correct-answer data for developing character recognition and figure recognition technology can be created efficiently.
[0055]
Further, in the present invention, the vanishing point extracted by hand can be used for a calibration method of a camera model and its parameters in restoring three-dimensional information from a two-dimensional image.
[0056]
Further, the present invention can be used for editing such as distortion correction and rotation correction for images and videos shot by a camera or video.
[Brief description of the drawings]
FIG. 1 is a processing procedure showing an embodiment of the present invention.
FIG. 2 is an example of a signboard photograph image in the embodiment.
FIG. 3 is an example of creating a net-like grid in the embodiment.
FIG. 4 is an example of a net-like grid reference point in the embodiment.
FIG. 5 is an example of a net-like grid construction process in the embodiment.
FIG. 6 is an example of a net-like grid in the embodiment.
FIG. 7 is an example of a mouse pointer with a guide line in the embodiment.
FIG. 8 is an apparatus configuration diagram showing an embodiment of the present invention.
[Explanation of symbols]
1 ... Image input unit
2 ... Net-like grid construction unit
3 ... Enclosing rectangle extraction unit
4 ... Supplementary information input unit
5 ... Information output unit
6 ... Determination input unit

Claims

What is claimed is:
1. A character/graphic region extraction apparatus for extracting a quadrilateral enclosing rectangle that contains a character or figure present in an image, comprising:
means for acquiring straight lines or characteristic points in the image, such as the frame lines of a signboard on which characters or figures are drawn, by an image processing technique such as the Hough transform or pattern matching, and for constructing, based on the straight lines or characteristic points, a net-like grid that serves as a reference when extracting the enclosing rectangle of a character or figure; and
means for displaying guide lines along the net-like grid superimposed on a pointer operated by an input device such as a mouse or a keyboard, and for extracting the enclosing rectangle of the character or figure based on the vertices of the enclosing rectangle specified with the pointer.

2. The character/graphic region extraction apparatus according to claim 1,
comprising means for inputting a plurality of points using the input device and constructing the net-like grid based on the points.

3. The character/graphic region extraction apparatus according to claim 1,
comprising means for constructing a plurality of different net-like grids in an image and, within the area of each net-like grid, superimposing a guide line along that net-like grid on the pointer.

4. The character/graphic region extraction apparatus according to claim 1,
comprising means for inputting, using an input device such as a mouse or a keyboard, supplementary information for each character or graphic, such as a character code, a character font, a graphic name, or image information.

5. A character/graphic region extraction method for extracting a rectangular inclusion rectangle that contains characters and/or graphics present in an image, the method comprising:
a step of acquiring a straight line or characteristic point in the image, such as the frame of a signboard on which a character or figure is drawn, by an image processing technique such as the Hough transform or pattern matching, and constructing, based on the straight line or characteristic point, a net-like grid that serves as a reference when extracting the inclusion rectangle of a character or figure; and
a step of superimposing a guide line along the net-like grid on a pointer operated by an input device such as a mouse or a keyboard, and extracting the inclusion rectangle of a character or figure based on the vertices of the inclusion rectangle specified with the pointer.

6. The character/graphic region extraction method according to claim 5,
comprising a step of inputting a plurality of points using the input device and constructing the net-like grid based on the points.

7. The character/graphic region extraction method according to claim 5 or 6,
comprising a step of constructing a plurality of different net-like grids in an image and, within the area of each net-like grid, superimposing a guide line along that net-like grid on the pointer.

8. The character/graphic region extraction method according to claim 5,
comprising a step of inputting, using an input device such as a mouse or a keyboard, supplementary information for each character or graphic, such as a character code, a character font, a graphic name, or image information.

9. A character/graphic region extraction program configured so that a computer can execute the processing procedure of the character/graphic region extraction method according to any one of claims 5 to 8.

10. A computer-readable recording medium on which is recorded a program for causing a computer to execute the processing procedure of the character/graphic region extraction method according to any one of claims 5 to 8.