JP2002318998A

JP2002318998A - Pattern input method, program, and storage medium with the program stored therein

Info

Publication number: JP2002318998A
Application number: JP2001292731A
Authority: JP
Inventors: Tetsuya Kinebuchi; 哲也杵渕; Akira Suzuki; 章鈴木; Akio Shio; 昭夫塩; Sakuichi Otsuka; 作一大塚
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2001-02-15
Filing date: 2001-09-26
Publication date: 2002-10-31

Abstract

PROBLEM TO BE SOLVED: To solve the problem that, when patterns in an input image are recognized and converted into character codes, a pattern not defined as a dictionary pattern cannot be retrieved. SOLUTION: This method comprises a process 200 for cutting out, as a dictionary pattern 300, an undefined pattern in the input image 100 to which no character code has been assigned; a process 400 for measuring feature quantities of the dictionary pattern and the input image; a process 500 for searching for and extracting patterns 600 similar to the dictionary pattern on the basis of the feature quantities; and a process 700 for assigning the same character code to all the extracted similar patterns. These processes are repeated until no undefined pattern remains in the image. When cutting out undefined patterns, the method may also cut out the patterns appearing in the input image in descending order of appearance frequency.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a pattern input method that converts the various patterns (characters, figures, and the like) present in an input image into character codes and inputs them.

[0002]

[Prior Art] Conventionally, methods for converting the various patterns present in an input image into character codes and inputting them fall into two groups: manual input, and input based on pattern recognition by computer.

[0003] In the former method, a person looks at and recognizes every pattern in the image and enters the character code assigned to each pattern in turn, using an input device such as a keyboard.

[0004] The latter method is generally called pattern recognition (Yoshiaki Shirai, "Pattern Understanding", Ohmsha, 1987). In this method, the patterns expected to occur in an image are prepared in advance as dictionary patterns, and the various patterns present in the image are classified on the basis of feature quantities of those dictionary patterns, such as gray-level information and contour information.

[0005]

[Problems to Be Solved by the Invention] The former method described under "Prior Art" above is enormously costly in both money and time. The latter, computer-based method has been studied as a way to solve this problem, but it has the following problems of its own.

[0006] Because the latter method searches on the basis of dictionary patterns prepared in advance, it cannot recognize patterns whose appearance was not anticipated. Dictionary patterns must therefore be prepared beforehand for every pattern that may appear. When the variations of the patterns that appear are unknown, however, it is impossible to prepare all the dictionary patterns in advance.

[0007] Moreover, when the number of pattern variations is very large, creating the dictionary patterns is costly.

[0008] Thus, when the variations of the patterns that appear are unknown or very numerous, the conventional framework of preparing dictionary patterns in advance cannot cope.

[0009] An object of the present invention is to provide a pattern input method that enables efficient pattern input even when the variations of the patterns that appear are unknown or very numerous, together with a program for the method and a recording medium on which the program is recorded.

[0010]

[Means for Solving the Problems] To solve the above problems, the present invention searches the entire image using an undefined pattern cut out of the image as a dictionary pattern, extracts the patterns similar to it, and then repeats the cutting out of the next undefined pattern and the similar-pattern search. This reduces the undefined patterns step by step, so that all patterns can be input with at most as many cut-out operations as there are pattern variations. The invention is characterized by the following method, program, and recording medium on which the program is recorded.

[0011] (1) A pattern input method for converting the various patterns present in an input image into character codes and inputting them, comprising: a first step of cutting out, as a dictionary pattern, an undefined pattern in the input image to which no character code has been assigned; a second step of measuring feature quantities of the dictionary pattern obtained by the cutting out and of the input image; a third step of searching the input image for, and extracting, patterns similar to the dictionary pattern on the basis of the measured feature quantities; and a fourth step of assigning the same character code to all of the extracted similar patterns; wherein the first through fourth steps are repeated until no undefined pattern remains in the image.

[0012] (2) The pattern input method according to (1) above, wherein, in the step of cutting out an undefined pattern from the input image, patterns are cut out in order starting from those that appear most frequently in the input image.

[0013] (3) The pattern input method according to (1) or (2) above, wherein, in the step of assigning a character code to the similar patterns extracted by the search, the judgment as to whether the dictionary pattern and a similar pattern are identical is obtained as an operation input.

[0014] (4) The pattern input method according to any one of (1) to (3) above, wherein, when the input image is a black-and-white binary image, contour features of the pattern are used as the feature quantity for the search.

[0015] (5) The pattern input method according to any one of (1) to (4) above, wherein, in the step of extracting patterns similar to the dictionary pattern from the image, the distance value between feature quantities is used as the measure for judging similarity.

[0016] (6) The pattern input method according to any one of (1) to (5) above, wherein, when a weighted average histogram, which is one kind of contour feature, is measured from both the dictionary pattern and the input image, a small region centered on the black pixel of interest is created, and within that small region a contour direction histogram is acquired and blurring with a two-dimensional Gaussian filter is performed.

[0017] (7) The pattern input method according to any one of (1) to (6) above, wherein, when a character code is assigned to the similar patterns extracted by the search, the step of manually confirming identity lists the dictionary pattern and a plurality of similar patterns on the same screen.

[0018] (8) The pattern input method according to any one of (1) to (7) above, wherein, when identity is confirmed manually in the step of assigning a character code to the similar patterns and the dictionary pattern and a plurality of similar patterns are listed on the same screen, the similar patterns are arranged in descending or ascending order of similarity to the dictionary pattern.

[0019] (9) The pattern input method according to any one of (1) to (8) above, wherein, when identity is confirmed manually in the step of assigning a character code to the similar patterns and the dictionary pattern and a plurality of similar patterns are listed on the same screen, each similar pattern is cut out and displayed as a region large enough to include the patterns in its neighborhood.

[0020] (10) A program for causing a computer to execute the pattern input method according to any one of (1) to (9) above.

[0021] (11) A recording medium on which is recorded a program for causing a computer to execute the pattern input method according to any one of (1) to (9) above.

[0022]

[Embodiments of the Invention] As embodiments of the present invention, the following two embodiments are described in detail with reference to the drawings. Embodiment 1 cuts out undefined patterns from a black-and-white binary input image in order of appearance and searches for and extracts similar patterns using the contour features of the pattern as the feature quantity. Embodiment 2 cuts out undefined patterns in descending order of appearance frequency.

[0023] (Embodiment 1) FIG. 1 shows the processing procedure of Embodiment 1. 100 is a binary image, represented by the two values white and black, input by an image input device; 200 is the process of cutting out an undefined pattern as a dictionary pattern; 300 is the dictionary pattern that has been cut out; 400 is the process of measuring feature quantities of the dictionary pattern and the input image; 500 is the process of searching the input image for similar patterns and extracting them; 600 is the extracted similar patterns; 700 is the process of assigning the same character code to the extracted similar patterns; and 800 is the process of determining whether all patterns have been input.
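The loop formed by processes 200 through 800 can be illustrated with a toy sketch. It is not the patented implementation: the "image" is assumed to be a list of glyph labels standing in for image regions, similarity is reduced to equality, and the names `input_patterns` and `ask_code` are hypothetical.

```python
def input_patterns(image, ask_code):
    """Toy version of the 200-800 loop.

    image    -- list of hashable glyphs standing in for patterns in the image
    ask_code -- callable returning the character code the operator assigns
                to a dictionary pattern (hypothetical stand-in for manual entry)
    """
    codes = [None] * len(image)            # None marks an undefined pattern
    rounds = 0
    while None in codes:                   # 800: does any undefined pattern remain?
        first = codes.index(None)          # 200: first undefined pattern, in order of appearance
        template = image[first]            # 300: the dictionary pattern
        # 400-600: stand-in for feature measurement, search and extraction
        matches = [i for i, glyph in enumerate(image)
                   if codes[i] is None and glyph == template]
        code = ask_code(template)
        for i in matches:                  # 700: same code for every similar pattern
            codes[i] = code
        rounds += 1
    return codes, rounds


print(input_patterns(["a", "b", "a", "c", "b", "a"], str.upper))
# -> (['A', 'B', 'A', 'C', 'B', 'A'], 3)
```

In this toy run the number of rounds equals the number of distinct glyphs, which matches the statement in [0010] that the number of cut-out operations is at most the number of pattern variations.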

[0024] The input image 100 is an image input as a black-and-white binary image by an image input device such as a scanner, facsimile, or camera. An example of an input image is shown at 110 in FIG. 2.

[0025] The dictionary pattern cut-out process 200 cuts out of the input image the first pattern that has not yet been converted into a character code and creates a dictionary pattern image. The cutting out may be done, as in general image processing software, by manipulating the input image loaded in the computer's memory with a device such as a mouse; alternatively, only the dictionary pattern portion may be input again from the image input device.

[0026] The dictionary pattern 300 is the binary image cut out of the input image by the dictionary pattern cut-out process 200. An example of a created dictionary pattern is shown at 310 in FIG. 3.

[0027] The feature quantity measurement process 400 measures, from the dictionary pattern and from the input image, the feature quantities used for matching between the images. Many feature quantities are available, such as density values and contour features. Described here is a method using the weighted direction histogram feature, a contour feature with a proven record in handwritten kanji recognition (Tsuruoka et al., "Handwritten Kanji and Hiragana Recognition by the Weighted Direction Index Histogram Method", IEICE Transactions (D), J70-D, No. 7, pp. 1390-1397, 1987). Here, however, the image is not divided into small-region blocks in advance; instead, a small region centered on the black pixel of interest is created, and within that small region the contour direction histogram is acquired and blurred with a two-dimensional Gaussian filter.

[0028] When, for example, a weighted average histogram feature with the contour direction quantized into four directions in steps of π/4 is computed for both the dictionary pattern and the input image, each pixel of the dictionary pattern and of the input image is given a four-dimensional vector representing information on the surrounding contour directions.
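A minimal sketch of such a per-pixel four-dimensional feature follows. It is an illustration under stated assumptions, not the cited weighted direction index histogram method: the contour direction of a pixel is estimated simply from which of its neighbours are also contour pixels, the window `radius` and Gaussian `sigma` are arbitrary, and white pixels are given zero vectors. All function names are hypothetical.

```python
import math

# Neighbour offsets (d_row, d_col) for the four quantized directions.
DIRS = [((0, 1), (0, -1)),    # 0      (horizontal)
        ((-1, 1), (1, -1)),   # pi/4
        ((-1, 0), (1, 0)),    # pi/2   (vertical)
        ((-1, -1), (1, 1))]   # 3*pi/4


def black(img, r, c):
    return 0 <= r < len(img) and 0 <= c < len(img[0]) and img[r][c] == 1


def is_contour(img, r, c):
    """A black pixel with at least one white (or out-of-image) 4-neighbour."""
    return black(img, r, c) and any(
        not black(img, r + dr, c + dc)
        for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)))


def direction_votes(img):
    """votes[r][c][k] is 1.0 if contour pixel (r, c) continues along direction k."""
    h, w = len(img), len(img[0])
    votes = [[[0.0] * 4 for _ in range(w)] for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if is_contour(img, r, c):
                for k, pair in enumerate(DIRS):
                    if any(is_contour(img, r + dr, c + dc) for dr, dc in pair):
                        votes[r][c][k] = 1.0
    return votes


def features(img, radius=2, sigma=1.0):
    """Gaussian-weighted direction histogram in a window centred on each black pixel."""
    h, w = len(img), len(img[0])
    votes = direction_votes(img)
    feats = [[[0.0] * 4 for _ in range(w)] for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if img[r][c] != 1:
                continue                      # assumption: white pixels keep a zero vector
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        g = math.exp(-(dr * dr + dc * dc) / (2 * sigma * sigma))
                        for k in range(4):
                            feats[r][c][k] += g * votes[rr][cc][k]
    return feats
```

For a one-pixel-wide horizontal stroke, only the first component of the vector at a stroke pixel is non-zero; for a vertical stroke, only the third.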

[0029] The similar pattern search and extraction process 500 moves the dictionary pattern over the input image, searches for portions where the two differ little, and extracts those portions as similar patterns. Various measures of the difference between the dictionary pattern and the input image can be used, such as the Euclidean distance, the Mahalanobis distance, or the cross-correlation coefficient between their feature quantities. The method using the Euclidean distance is briefly described here.

[0030] The components of the four-dimensional vectors given to the pixels of a dictionary pattern of size M × N pixels are concatenated to form a 4 × M × N-dimensional vector. A 4 × M × N-dimensional vector is formed in the same way for the portion of the input image that the dictionary pattern overlaps. With the displacement of the dictionary pattern at that time denoted (X, Y), the Euclidean distance R(X, Y) between the feature quantities is expressed by

[0031]

(Equation 1)

[0032] where T_i is a component of the dictionary pattern vector and I_i is a component of the input image vector. The smaller the Euclidean distance R, the smaller the difference between the images. When the processing time for computing R is a problem, it can be shortened by increasing the displacement of the dictionary pattern per step or by lowering the image resolution.
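The image of Equation 1 is not reproduced in this text. Assuming the standard definition of the Euclidean distance over the two 4 × M × N-dimensional vectors of [0030], it would take the following form, where I_i is read from the region of the input image at displacement (X, Y). Whether the original expression includes the square root is an assumption.

```latex
R(X, Y) = \sqrt{\sum_{i=1}^{4MN} \left( T_i - I_i \right)^{2}}
```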

[0033] The result of searching the input image 110 with the dictionary pattern 310 and computing the Euclidean distance between the feature quantities at each point is shown at 510 in FIG. 4. Darker points indicate smaller distance values. Patterns similar to the dictionary pattern can be extracted by cutting images of the dictionary pattern's size out of the input image, using the coordinates of the dark points in 510 as reference points.
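The search that produces a distance map like 510 can be sketched as follows. This is a brute-force illustration, assuming feature grids whose cells are equal-length vectors (four-dimensional in the text); the function names and the fixed `threshold` used to pick out low-distance points are hypothetical, since the text does not say how those points are selected.

```python
import math


def distance_map(tmpl, img):
    """Euclidean distance between the template features and every same-sized
    window of the image features. tmpl and img are 2-D grids of feature vectors."""
    m, n = len(tmpl), len(tmpl[0])
    rows, cols = len(img) - m + 1, len(img[0]) - n + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            s = 0.0
            for r in range(m):
                for c in range(n):
                    for t, i in zip(tmpl[r][c], img[y + r][x + c]):
                        s += (t - i) ** 2
            dist[y][x] = math.sqrt(s)
    return dist


def extract_similar(dist, threshold):
    """Offsets (y, x) whose distance is at or below the threshold, closest first."""
    hits = [(d, y, x)
            for y, row in enumerate(dist)
            for x, d in enumerate(row)
            if d <= threshold]
    return [(y, x) for d, y, x in sorted(hits)]
```

Stepping `y` and `x` by more than one pixel, or running the same code on a reduced-resolution image, corresponds to the speed-ups mentioned in [0032].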

[0034] The similar patterns 600 are the patterns extracted by the search and extraction process 500. The similar patterns extracted from the input image 110 using the dictionary pattern 310 are shown at 610 in FIG. 5.

[0035] The character code assignment process 700 assigns the same character code to the similar patterns 600. Two approaches are possible: treating all extracted similar patterns as identical to the dictionary pattern and assigning the character code automatically, or assigning the character code while manually confirming that each similar pattern is identical to the dictionary pattern. Manual confirmation is quick when it is done with a simple action such as a mouse click.

[0036] The pattern input determination process 800 determines whether input processing has been performed for all patterns in the image. If undefined patterns still remain in the input image, the procedure returns to the dictionary pattern cut-out process 200, in which an undefined pattern is again cut out manually, and the whole procedure is repeated. Whether undefined patterns remain in the input image can be determined either automatically by the computer or manually.

[0037] One automatic method is as follows. Because a pattern is a set of black pixels, a portion where black pixels are reasonably dense within a given range can be regarded as part of some pattern. Portions containing at least a given number of black pixels within a given range are therefore extracted, and checking whether every one of them is part of an already-input pattern determines whether input of all patterns is complete. If there is a dense black-pixel portion that is not part of an already-input pattern, undefined patterns still remain in the input image.
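One possible reading of this automatic check is sketched below. It assumes a boolean mask `labeled` marking pixels that belong to already-input patterns, and it counts, per window, the black pixels not covered by that mask; the window size, the `min_black` threshold, and the function name are all hypothetical.

```python
def find_undefined_regions(img, labeled, window=3, min_black=4):
    """Top-left corners of windows that contain at least min_black black pixels
    not covered by any already-input pattern.

    img     -- 2-D list, 1 for black and 0 for white
    labeled -- 2-D list of booleans, True where a pixel belongs to an input pattern
    """
    h, w = len(img), len(img[0])
    found = []
    for y in range(h - window + 1):
        for x in range(w - window + 1):
            uncovered = sum(
                1
                for r in range(y, y + window)
                for c in range(x, x + window)
                if img[r][c] == 1 and not labeled[r][c])
            if uncovered >= min_black:
                found.append((y, x))
    return found


def all_patterns_input(img, labeled, window=3, min_black=4):
    """Process 800: True when no dense, uncovered black region remains."""
    return not find_undefined_regions(img, labeled, window, min_black)
```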

[0038] In the manual method, for example, the already-input patterns are displayed in color and the entire input image is viewed at once, so that uncolored, undefined patterns are easy to find.

[0039] (Embodiment 2) As Embodiment 2 of the present invention, a method is described in which undefined patterns are cut out starting from the patterns that appear most frequently in the image.

[0040] The appearance frequencies of the patterns in an input image are uneven. Cutting out patterns in descending order of appearance frequency therefore reduces the undefined patterns faster than in Embodiment 1. As a result, the next undefined pattern to cut out can be found efficiently. Also, when the manual effort that can be spent is limited, cutting out the most frequent patterns first is the most efficient approach.
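The effect of the cut-out order can be checked with a small worked example. The glyph sequence below is invented for illustration, and the frequencies are assumed to be known, whereas in practice the operator would have to estimate them; the function name is hypothetical.

```python
from collections import Counter


def remaining_after_each_cut(patterns, order):
    """Number of still-undefined pattern instances after each cut-out,
    when dictionary patterns are cut out in the given order."""
    counts = Counter(patterns)
    remaining = len(patterns)
    trace = []
    for p in order:
        remaining -= counts[p]
        trace.append(remaining)
    return trace


patterns = list("cbaaabaaaa")                       # hypothetical image content
by_appearance = list(dict.fromkeys(patterns))       # Embodiment 1: c, b, a
by_frequency = [p for p, _ in Counter(patterns).most_common()]  # Embodiment 2: a, b, c

print(remaining_after_each_cut(patterns, by_appearance))  # -> [9, 7, 0]
print(remaining_after_each_cut(patterns, by_frequency))   # -> [3, 1, 0]
```

Both orders finish after three cut-outs, but with the frequency order seven of the ten instances are already defined after the first one, which is what matters when manual effort is limited.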

[0041] (Embodiment 3) As Embodiment 3 of the present invention, a method is described that makes it easy to determine whether the dictionary pattern and a similar pattern are identical when a character code is assigned to the similar patterns extracted by the search.

[0042] 710 in FIG. 6 is an example of a window display for manually confirming whether the dictionary pattern and the similar patterns are identical. The dictionary pattern and a plurality of similar patterns are listed on the same screen. In this example, ten similar patterns are displayed in total, arranged in rows of five.

[0043] 720 is the dictionary pattern cut out of the input image and is the same as 310 in FIG. 3.

[0044] Listing the dictionary pattern and the similar patterns together in this way allows not only comparison of the dictionary pattern with each similar pattern but also comparison among the similar patterns, which makes it more efficient to determine whether the dictionary pattern and a similar pattern are identical.

[0045] 730 denotes the extracted similar patterns, arranged here in descending order of similarity to the dictionary pattern. Ordering them this way makes correctly extracted patterns and erroneously extracted patterns more likely to cluster into separate groups. In addition, by cutting out and displaying not just the similar pattern itself but a region that also includes the patterns in its neighborhood (above and below, in this example), the neighboring patterns can serve as a reference when confirming identity.
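The ordering and context cropping described here can be sketched as below. The sketch assumes each search hit is a `(distance, y, x)` tuple, where a smaller distance means higher similarity, and that context is added only above and below, as in the example in the text; the `margin` parameter and the function name are hypothetical.

```python
def ranked_crops(img, hits, tmpl_h, tmpl_w, margin=1):
    """Crops for the confirmation screen, most similar first.

    img    -- 2-D list of pixel values
    hits   -- list of (distance, y, x); (y, x) is the top-left corner of a match
    margin -- extra rows shown above and below each match for context
    """
    h = len(img)
    crops = []
    for dist, y, x in sorted(hits):            # smallest distance first
        top = max(0, y - margin)               # clamp at the image border
        bottom = min(h, y + tmpl_h + margin)
        crops.append((dist, [row[x:x + tmpl_w] for row in img[top:bottom]]))
    return crops
```

Sorting with `reverse=True` would give the ascending-similarity order that item (8) also allows.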

[0046] 740 is a command button for confirming (or cancelling) a plurality of similar patterns with a single input operation. Because the similar patterns are arranged in descending order of similarity, correctly extracted patterns tend strongly to be grouped together, so a command button that processes a certain number of similar patterns at once is efficient. In this example, the five patterns in the upper row are all correct and can therefore be processed in one batch, which is five times as efficient as operating once for each of the five patterns.

[0047] When only some of the similar patterns to be processed together have been extracted correctly, only the correct patterns can be selected, for example by clicking with the mouse button inside the frame in which each correctly extracted pattern is displayed.
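The two confirmation modes can be modelled in a few lines. This is only a bookkeeping sketch: the function name is hypothetical, and counting one operation per individually clicked pattern is an assumption made to mirror the five-to-one comparison in [0046].

```python
def confirm(batch, selected=None):
    """Return (accepted patterns, number of operator actions).

    batch    -- similar patterns shown together, e.g. one row of five
    selected -- None when the whole batch is confirmed with one button press,
                otherwise the patterns the operator clicked individually
    """
    if selected is None:
        return list(batch), 1                      # one press of the batch button
    accepted = [p for p in batch if p in selected]
    return accepted, len(accepted)                 # assumed: one click per pattern


row = ["p1", "p2", "p3", "p4", "p5"]
print(confirm(row))                  # -> (['p1', 'p2', 'p3', 'p4', 'p5'], 1)
print(confirm(row, {"p1", "p3"}))    # -> (['p1', 'p3'], 2)
```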

[0048] Some or all of the processing functions of the pattern input method shown in FIG. 1 and elsewhere can be configured as a program and realized on a computer, or the processing procedure shown in FIG. 1 and elsewhere can be configured as a program and executed by a computer. The program for realizing the processing functions on a computer, or for causing a computer to execute the processing procedure, can be recorded on a computer-readable recording medium such as an FD (floppy (registered trademark) disk), MO, ROM, memory card, CD, DVD, or removable disk, and stored or provided in that form. It can also be distributed over a communication network such as the Internet.

[0049]

[Effects of the Invention] As described above, the present invention repeats the cutting out of an undefined pattern and the extraction of its similar patterns, reducing the undefined patterns step by step, and thereby enables efficient pattern input with the minimum necessary manual intervention even when the variations of the patterns that appear are unknown or very numerous. Furthermore, taking into account that pattern appearance frequencies are uneven, cutting out patterns in descending order of appearance frequency enables still more efficient pattern input.

[Brief description of the drawings]

FIG. 1 is a processing procedure diagram showing an embodiment of the present invention.

FIG. 2 is an example of a pattern input image.

FIG. 3 is an example of a dictionary pattern.

FIG. 4 is an example of an image of the distance values computed between the dictionary pattern and the input image.

FIG. 5 is an example of similar patterns in an image.

FIG. 6 is an example of a confirmation screen for similar patterns.

[Explanation of symbols]

100: binary image
200: dictionary pattern cut-out process
300: dictionary pattern
400: feature quantity measurement process
500: similar pattern search and extraction process
600: similar patterns
700: character code assignment process
800: determination process

Continued from the front page

(72) Inventor: Akio Shio, c/o Nippon Telegraph and Telephone Corporation, 2-3-1 Otemachi, Chiyoda-ku, Tokyo
(72) Inventor: Sakuichi Otsuka, c/o Nippon Telegraph and Telephone Corporation, 2-3-1 Otemachi, Chiyoda-ku, Tokyo
F-term (reference): 5B064 AA01 AB02 AB03 DA03 DA34 DC00 EA08

Claims

1. A pattern input method for converting various patterns present in an input image into character codes and inputting them, comprising: a first step of cutting out, as a dictionary pattern, an undefined pattern in the input image to which no character code has been assigned; a second step of measuring feature quantities of the dictionary pattern obtained by the cutting out and of the input image; a third step of searching the input image for, and extracting, patterns similar to the dictionary pattern on the basis of the feature quantities obtained by the measurement; and a fourth step of assigning the same character code to all of the similar patterns extracted in the third step; wherein the first through fourth steps are repeated until no undefined pattern remains in the image.

2. The pattern input method according to claim 1, wherein, in the step of cutting out an undefined pattern from the input image, patterns are cut out in order starting from those that appear most frequently in the input image.

3. The pattern input method according to claim 1, wherein, in the step of assigning a character code to the similar patterns extracted by the search, the judgment as to whether the dictionary pattern and a similar pattern are identical is obtained as an operation input.

4. The pattern input method according to claim 1, wherein, when the input image is a black-and-white binary image, contour features of the pattern are used as the feature quantity for the search.

5. The pattern input method according to claim 1, wherein, in the step of extracting patterns similar to the dictionary pattern from the image, the distance value between feature quantities is used as the measure for judging similarity.

6. The pattern input method according to claim 1, wherein, when a weighted average histogram, which is one kind of contour feature, is measured from both the dictionary pattern and the input image, a small region centered on the black pixel of interest is created, and within that small region a contour direction histogram is acquired and blurring with a two-dimensional Gaussian filter is performed.

7. The pattern input method according to claim 1, wherein, when a character code is assigned to the similar patterns extracted by the search, the step of manually confirming identity lists the dictionary pattern and a plurality of similar patterns on the same screen.

8. The pattern input method according to claim 1, wherein, when identity is confirmed manually in the step of assigning a character code to the similar patterns and the dictionary pattern and a plurality of similar patterns are listed on the same screen, the similar patterns are arranged in descending or ascending order of similarity to the dictionary pattern.

9. The pattern input method according to any one of claims 1 to 8, wherein, when identity is confirmed manually in the step of assigning a character code to the similar patterns and the dictionary pattern and a plurality of similar patterns are listed on the same screen, each similar pattern is cut out and displayed as a region large enough to include the patterns in its neighborhood.

10. A program for causing a computer to execute the pattern input method according to any one of claims 1 to 9.

11. A recording medium on which a program for causing a computer to execute the pattern input method according to claim 1 is recorded.