JPH0812683B2 - High speed extraction method for specific character strings - Google Patents

High speed extraction method for specific character strings

Info

Publication number
JPH0812683B2
Authority
JP
Japan
Prior art keywords
character
character string
feature amount
specific character
specific
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP61288799A
Other languages
Japanese (ja)
Other versions
JPS63142487A (en)
Inventor
弘一 本間
文伸 古村
文男 和歌森
晃 加賀美
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Priority to JP61288799A
Publication of JPS63142487A
Publication of JPH0812683B2
Anticipated expiration
Expired - Lifetime

Landscapes

  • Character Discrimination (AREA)

Description

[Detailed Description of the Invention]

[Industrial Field of Application]

The present invention relates to a method for extracting character strings from a document image, and in particular to a high-speed specific character string extraction method suited to efficiently extracting from a document image only the specific character strings registered in the system in advance.

[Prior Art]

A character recognition scheme that uses linguistic information about the input character string to improve reading accuracy is discussed in Sugimura, "Error correction method for kanji recognition using candidate character completion and morphological analysis," IEICE National Convention on Information and Systems (1985), pp. 1-307 to 1-308. The character recognition system described there has a word dictionary and recognizes the full text. It then performs morphological analysis on the sequence of candidate characters produced by conventional character-by-character recognition, and corrects errors in cases where the correct character is ranked second or lower. Compared with recognizing each character in isolation, this scheme improves recognition accuracy.

[Problems to Be Solved by the Invention]

The prior art described above uses a priori character string information from a word dictionary for character recognition. Because it recognizes the full text, however, it gives no consideration to efficiently extracting only specific character strings, and it therefore requires a long processing time.

An object of the present invention is to provide a high-speed specific character string extraction method suited to efficiently extracting from a document image only the specific character strings registered in the system in advance.

[Means for Solving the Problems]

The above object is achieved as follows. Characters are cut out from the character image on a document (hereinafter, the document image), and a feature amount is calculated for each cut-out character. When the calculated feature amounts are collated with character feature amounts registered in advance, the calculated feature amounts are quantized. How each specific character string to be extracted from the document image corresponds to a quantized character feature amount sequence is determined beforehand and stored as a table. This table is consulted to obtain candidate specific character strings from the quantized character feature amount sequence of the input character image, and character pattern match/mismatch recognition is performed only against those candidates.

[Operation]

In pattern classification in general, when the number of target classes is large, deciding only whether a pattern belongs to one of a few specific classes takes far fewer operations than fully identifying the class to which it belongs. This tendency becomes even more pronounced when a specific sequence of patterns is to be picked out from a sequence of patterns, because sequences normally have regularity. Extracting specific character strings from a document image is exactly such a case.

Fig. 2 illustrates this. Let x denote the feature parameter of a character pattern, and let P_i(x), i = 1, ..., (number of character types), denote the probability distribution of character i in the feature space. As the figure shows, the distributions P_i(x) overlap. In ordinary printed character recognition, the recognition result for an input pattern x is the character i that maximizes P_i(x). In practice, however, the character whose mean pattern has the smallest distance d(i, x) from the input is usually chosen instead. Recognizing one character therefore requires as many distance calculations as there are character types. By contrast, deciding whether the input pattern x could be a specific character X requires only one distance calculation, or as many as the small number of specific characters. Moreover, a character string is a combination of characters in which completely random combinations are not permitted. Even when the probability distributions of individual characters overlap, the distributions of character strings in the product space of the character feature spaces overlap less. Consequently, even a coarse feature amount whose distributions overlap heavily yields relatively high accuracy when used to recognize character strings.
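The difference in cost between full recognition and a membership test can be made concrete with a small sketch. It assumes two-dimensional feature vectors, Euclidean distance, and a hypothetical acceptance radius; none of these are specified in the text, and the function names are illustrative only.

```python
import math

def recognize(x, means):
    """Full recognition: one distance per character type.
    Returns (index of nearest mean pattern, number of distances computed)."""
    distances = [math.dist(x, m) for m in means]
    best = min(range(len(means)), key=distances.__getitem__)
    return best, len(distances)

def could_be_specific(x, means, specific, radius):
    """Membership test: distances only to the few specific characters.
    `radius` is an assumed acceptance threshold, not taken from the patent."""
    distances = [math.dist(x, means[i]) for i in specific]
    return any(d <= radius for d in distances), len(distances)

# 1000 hypothetical character types with mean patterns spaced along one axis.
means = [(float(i), 0.0) for i in range(1000)]
x = (3.2, 0.0)
print(recognize(x, means))                       # needs 1000 distance calculations
print(could_be_specific(x, means, [3, 7], 0.5))  # needs only 2
```

The membership test costs as many distance calculations as there are specific characters, independent of the size of the full character set.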

[Embodiment]

An embodiment of the present invention is described below with reference to Fig. 1. Document image data stored on an optical disc or similar medium is read out from the optical disc device 1 and input line by line to the character frame cutting device 2. For each character, the device outputs the minimum circumscribed rectangle information, that is, the coordinates of the upper-left and lower-right vertices, which is stored in the circumscribed rectangle information section 4 of the character information table 3. The character frame cutting device used here is the one described in detail in Japanese Patent Application No. Sho 60-184242, "Document character cutting image processing method."

Once the circumscribed rectangle information has been extracted for the whole document image, it is read out from the character information table 3, and the coarse feature amount calculation device 5 calculates a coarse feature amount for each character. The coarse feature amount used is the complexity index, which is explained in detail in the reference Introduction to Character Recognition (1982), pp. 78-79. The complexity index is the total length of the vertical and horizontal components of the contour of the character pattern. Fig. 3 shows the 2 × 2 mesh element patterns used to obtain the total contour length. The element patterns fall into four types: the vertical pattern V (Fig. 3(a)), the horizontal pattern H (Fig. 3(b)), the one-sided diagonal pattern L (Fig. 3(c)), and the two-sided diagonal pattern T (Fig. 3(d)). The thick lines in the figure are a polyline approximation of the contour of the character pattern. The vertical and horizontal components of the contour are obtained from the total counts n(V), n(H), n(L), and n(T) of each element pattern over the whole character. The horizontal and vertical complexity indices l_x and l_y are accordingly given by the following equations.

[Equations for l_x and l_y missing from this text]
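As a rough illustration of the quantity being measured, the sketch below counts the vertical and horizontal boundary segments between black and white pixels in a binary character image. It is not the 2 × 2 mesh formula of the embodiment (the weights applied to the diagonal patterns L and T are not given in this text), and which of the two totals corresponds to l_x and which to l_y is left open. The function name and the image representation are assumptions.

```python
def contour_lengths(img):
    """Return (vertical_length, horizontal_length) of the black/white boundary.

    img is a list of rows of 0/1 pixels; pixels outside the image count as white.
    A vertical segment separates two horizontally adjacent pixels of different
    colour; a horizontal segment separates two vertically adjacent ones.
    """
    h, w = len(img), len(img[0])

    def px(r, c):
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0

    vertical = sum(
        1 for r in range(h) for c in range(w + 1) if px(r, c - 1) != px(r, c)
    )
    horizontal = sum(
        1 for r in range(h + 1) for c in range(w) if px(r - 1, c) != px(r, c)
    )
    return vertical, horizontal

# A solid block 2 pixels high and 3 pixels wide.
block = [[1, 1, 1],
         [1, 1, 1]]
print(contour_lengths(block))
```

Characters with many strokes produce long contours in both directions, which is what makes such totals usable as a cheap, coarse discriminator.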

The horizontal and vertical complexity indices obtained for each character by the coarse feature amount calculation device 5 are stored in the coarse feature amount section 6 of the character information table 3, in correspondence with the character frame information in the circumscribed rectangle information section 4.

Next, the coarse feature amounts of the characters are read out from the character information table 3 in character string order, encoded by the quantizer 7 into a code m from 0 to 15 (4 bits), and stored in the shift register 8. The quantizer 7 divides each of the two complexity indices into four intervals using three thresholds, which defines 16 sections, and determines into which section the character falls. The shift register 8 holds the quantized coarse feature codes of three consecutive characters. The address calculator 9 treats the codes of the three characters as 12-bit data and looks up a table of about 4 K words to obtain the corresponding address in the character string table 10.

Each specific character string to be extracted is assigned in advance, on the basis of its coarse feature amounts, to one of the roughly 4 K sections (the triple product of the 16 sections). The strings are collected section by section as code strings and stored in the character string table 10. The sections used for this assignment are allowed to overlap one another, so a single character string may correspond to several sections. The address output by the address calculator 9 is the start address of the code string data for the section, and the word at the start address itself holds the number of code strings in that section. Thus, for the three characters whose quantized coarse feature codes are in the shift register 8, the content of the character string table 10 at the address output by the address calculator 9 gives the number of candidate specific character strings. If that number is 0, the determination device 11 judges that there is no candidate specific character string; the next character's coarse feature amount is read out from the character information table 3, and the character string table 10 is consulted again in the same way.
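The quantization and table lookup described above can be sketched as follows. The threshold values, the bit order within the 4-bit code, and the use of a dictionary in place of the address table and count-prefixed character string table are all assumptions made for illustration. Overlapping sections are modelled by letting each character of a registered string list several allowed codes.

```python
from bisect import bisect_right
from collections import defaultdict

def quantize(lx, ly, tx, ty):
    """Map two complexity indices to a 4-bit code (0..15).
    tx and ty are sorted lists of three thresholds (assumed values)."""
    return (bisect_right(tx, lx) << 2) | bisect_right(ty, ly)

def key12(codes):
    """Pack three 4-bit codes into one 12-bit index (0..4095)."""
    a, b, c = codes
    return (a << 8) | (b << 4) | c

def build_table(registered):
    """registered maps each string to three lists of allowed codes,
    one list per character. A string is entered under every combination,
    so it may appear in several sections."""
    table = defaultdict(list)
    for string, allowed in registered.items():
        for a in allowed[0]:
            for b in allowed[1]:
                for c in allowed[2]:
                    table[key12((a, b, c))].append(string)
    return table

tx, ty = [10, 20, 30], [5, 10, 15]           # hypothetical thresholds
table = build_table({"ABC": [[1, 2], [5], [9]]})
print(quantize(12, 3, tx, ty))               # code of one character
print(table.get(key12((1, 5, 9)), []))       # candidates for one window
print(table.get(key12((0, 0, 0)), []))       # empty: window is skipped
```

An empty lookup result plays the role of the count 0 in the character string table: the window is rejected without any pixel-level comparison.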

If the lookup in the character string table 10 returns a nonzero count, the character code strings of the candidate character strings are read out one after another from the character string table 10 and converted by the character fine feature amount table 12 into the fine feature amounts of the character patterns corresponding to the character codes. The character pattern K(x, y) itself is used as the fine feature amount, where x and y are the position coordinates in the horizontal and vertical directions, respectively. [Coordinate ranges missing from this text.] The fine feature amount sequence register 13 holds the fine feature amounts {K^i_1(x, y), K^i_2(x, y), K^i_3(x, y)} of a candidate character string, where the index i indicates the i-th candidate.

Meanwhile, when the number of candidate character strings is not 0, the determination device 11 starts the fine feature amount calculation device 14, which calculates the fine feature amounts for the character string in the shift register 8. That is, the corresponding character patterns are cut out from the optical disc device 1 and stored in the register 15 for the fine feature amount sequence under test. The distance calculation device 16 computes the distance, that is, the dissimilarity, between the fine feature amount sequences in registers 13 and 15 using the following equation, and the threshold determiner 17 evaluates the dissimilarity.

[Equation for the dissimilarity missing from this text]

If the dissimilarity exceeds the threshold θ, the character string under test in the input document image is considered unable to be that candidate character string. The candidates whose dissimilarity does not exceed θ are selected, and their character code strings are output together with the dissimilarity. If several candidates have a dissimilarity not exceeding θ, either the character code string with the smallest dissimilarity is output, or all of them are output in ranked order. If the dissimilarity exceeds θ for every candidate, the character string under test is judged to be none of the specific character strings to be extracted. The character information table 3 is then activated, the next character's coarse feature amount is read out, and the above determination is performed on the three-character string shifted by one character.
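The fine verification and the one-character shift can be sketched as below. The dissimilarity equation is not available in this text, so the sketch assumes a sum of squared pixel differences over the three character patterns, with all patterns already normalized to the same size. The names `dissimilarity`, `verify`, and `scan`, and the dictionary-based table, are illustrative only.

```python
def dissimilarity(a, b):
    """Assumed measure: sum of squared pixel differences over a sequence
    of equally sized character patterns (lists of rows of 0/1)."""
    return sum((p - q) ** 2
               for ka, kb in zip(a, b)
               for ra, rb in zip(ka, kb)
               for p, q in zip(ra, rb))

def verify(window, candidates, theta):
    """Return [(dissimilarity, string)] for candidates not exceeding theta,
    smallest dissimilarity first."""
    hits = [(dissimilarity(window, pats), s) for s, pats in candidates.items()]
    return sorted(h for h in hits if h[0] <= theta)

def scan(patterns, codes, table, templates, theta):
    """Slide a three-character window one character at a time.
    Returns [(position, best string, dissimilarity)]."""
    results = []
    for i in range(len(codes) - 2):
        key = (codes[i] << 8) | (codes[i + 1] << 4) | codes[i + 2]
        candidates = {s: templates[s] for s in table.get(key, [])}
        if not candidates:
            continue  # coarse lookup found nothing; skip fine matching
        hits = verify(patterns[i:i + 3], candidates, theta)
        if hits:
            results.append((i, hits[0][1], hits[0][0]))
    return results

# Tiny 1 x 2 "patterns" standing in for character images.
A, B, C, D = [[1, 0]], [[0, 1]], [[1, 1]], [[0, 0]]
templates = {"ABC": [A, B, C]}
table = {0x123: ["ABC"]}
print(scan([D, A, B, C], [0, 1, 2, 3], table, templates, theta=0))
```

Only windows that survive the coarse lookup reach `verify`, which is where the pixel-level cost is incurred.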

The embodiment described above uses the complexity index as the coarse feature amount of a character pattern, but other coarse feature amounts can also be used, such as marginal distributions or the reduced pattern itself.

[Effects of the Invention]

According to the present invention, the specific character strings to be extracted from a document image are classified in advance by their sequences of coarse character feature amounts. The registered specific character strings that are candidates can therefore be narrowed down from the character strings of the input document image using coarse feature amounts that are cheap to compute. As a result, specific character strings can be extracted at high speed, and only the specific character strings registered in the system are efficiently extracted from the document image.

[Brief Description of the Drawings]

Fig. 1 is an overall system configuration diagram of an embodiment of the present invention. Fig. 2 is an explanatory diagram of the probability distributions of character patterns in the feature space. Fig. 3 shows an example of the element patterns used to calculate the complexity index, which serves as the coarse feature amount of a character pattern.

Claims (3)

[Claims]

1. A high-speed specific character string extraction method comprising: inputting a character image; cutting out a character string from the input character image; calculating a sequence of feature amounts for the cut-out character string; collating the calculated sequence of feature amounts with sequences of feature amounts prepared in advance for specific character strings while shifting the character position; and outputting only those collation positions and character strings for which the collation results of all the characters constituting the character string are equal to or greater than a predetermined threshold.

2. The high-speed specific character string extraction method according to claim 1, wherein the collating comprises: a process of collating with a coarse feature amount to eliminate character strings in the input image that cannot be a specific character string to be extracted; and a process of collating with a fine feature amount only those character strings in the input image for which the coarse collation did not fail.

3. The high-speed specific character string extraction method according to claim 2, wherein the collating comprises: quantizing the coarse feature amounts of the input character string to obtain a quantized character feature amount sequence; determining in advance, and storing as a table, how each specific character string to be extracted from the document image corresponds to the quantized character feature amount sequence; consulting the table with the quantized character feature amount sequence to obtain candidate specific character strings; and collating the character patterns for match or mismatch against those candidate specific character strings using the fine feature amount.
JP61288799A 1986-12-05 1986-12-05 High speed extraction method for specific character strings Expired - Lifetime JPH0812683B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP61288799A JPH0812683B2 (en) 1986-12-05 1986-12-05 High speed extraction method for specific character strings

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP61288799A JPH0812683B2 (en) 1986-12-05 1986-12-05 High speed extraction method for specific character strings

Publications (2)

Publication Number Publication Date
JPS63142487A JPS63142487A (en) 1988-06-14
JPH0812683B2 true JPH0812683B2 (en) 1996-02-07

Family

ID=17734879

Family Applications (1)

Application Number Title Priority Date Filing Date
JP61288799A Expired - Lifetime JPH0812683B2 (en) 1986-12-05 1986-12-05 High speed extraction method for specific character strings

Country Status (1)

Country Link
JP (1) JPH0812683B2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102006055796A1 (en) 2006-11-27 2008-05-29 Robert Bosch Gmbh Pressure control valve
JP5857704B2 (en) * 2011-12-13 2016-02-10 富士ゼロックス株式会社 Image processing apparatus and program

Also Published As

Publication number Publication date
JPS63142487A (en) 1988-06-14
