JPH05298489A - System for recognizing character - Google Patents

System for recognizing character

Info

Publication number
JPH05298489A
Authority
JP
Japan
Prior art keywords
character
category
character category
recognition target
collation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP4099716A
Other languages
Japanese (ja)
Inventor
Toshio Tsutsumida
敏夫 堤田
Kyoichi Sumiya
恭一 角谷
Yumi Nakayama
由美 中山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
N T T DATA TSUSHIN KK
NTT Data Corp
Original Assignee
N T T DATA TSUSHIN KK
NTT Data Communications Systems Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by N T T DATA TSUSHIN KK, NTT Data Communications Systems Corp
Priority to JP4099716A
Publication of JPH05298489A

Abstract

PURPOSE: To improve recognition accuracy by correcting the collation value of each character category before outputting the character category candidates for a character pattern read from a recognition target range. CONSTITUTION: When a form contains a plurality of recognition target ranges, character category frequency memory units 7-1 and 7-2 are provided in correspondence with the respective recognition target ranges. A memory selection unit 8 selects the one character category frequency memory unit corresponding to the current recognition target range. A collation value correction unit 9 corrects the collation value of each character category, obtained by an identification and collation unit 3, on the basis of the frequency information for that character category, and the corrected collation value is input to a candidate character category sorting unit 6. When the collation values for the respective character categories are input, the collation value correction unit 9 controls the memory selection unit so that it selects the character category frequency memory unit (for example, 7-1) corresponding to the current recognition target range, and reads the frequency information for each character category.

Description

【発明の詳細な説明】Detailed Description of the Invention

【0001】[0001]

【産業上の利用分野】本発明は、OCR等に適用して好
適な文字認識方式に係り、特に対象となる文字カテゴリ
を認識過程で少数に絞り込み、誤認識確率を向上させる
ようにした文字認識方式に関する。
BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a character recognition method suitable for application to OCR and the like, and in particular to a character recognition method that narrows the target character categories down to a small number during the recognition process, thereby reducing the probability of misrecognition.

【0002】[0002]

【従来の技術】図4は従来の文字認識方法を用いた文字
認識装置の主要部の構成を示すブロック図である。同図
において、1は帳票上の認識対象範囲に記述された文字
列をスキャナまたは文字切り出し等の手段によって文字
単位に2値量子化してなる文字パターンの特徴を解析
し、識別用の特徴ベクトルを作成する特徴抽出部、2は
認識対象の多数の文字パターンについてその標準的な特
徴を表わす標準的特徴ベクトルを照合情報として文字カ
テゴリ毎に格納した辞書メモリ部、3はこの辞書メモリ
部2に格納した照合情報を文字カテゴリ単位に順次読出
し、前記特徴ベクトルと照合し、照合情報に対する特徴
ベクトルの類似度を表わす照合値を文字カテゴリ単位に
出力する識別照合部、4−1および4−2は識別照合部
3が照合処理に使用する照合情報に関し、その使用可否
(可=1,否=0)の情報を文字カテゴリ別に格納した
文字カテゴリ限定メモリ部であり、この文字カテゴリ限
定メモリ部4−1,4−2は帳票上に複数個所の認識対
象範囲が存在する場合、各認識対象範囲にそれぞれ対応
して設けられる。但し、共通する文字カテゴリの認識対
象範囲については兼用するようになっている。なお、図
においては、2つの認識対象範囲に対応するものについ
てのみ図示し、他の認識対象範囲の文字カテゴリ限定メ
モリ部については省略している。図5に文字カテゴリが
3000種である場合の格納内容の一例を示している。
2. Description of the Related Art FIG. 4 is a block diagram showing the configuration of the main part of a character recognition apparatus using a conventional character recognition method. In the figure, reference numeral 1 denotes a feature extraction unit that analyzes the features of a character pattern, obtained by binary-quantizing a character string written in a recognition target range on a form character by character by means such as a scanner or character segmentation, and creates a feature vector for identification. Reference numeral 2 denotes a dictionary memory unit that stores, for each character category, a standard feature vector representing the standard features of the many character patterns to be recognized, as collation information. Reference numeral 3 denotes an identification and collation unit that sequentially reads the collation information stored in the dictionary memory unit 2 one character category at a time, collates it with the feature vector, and outputs, for each character category, a collation value representing the similarity of the feature vector to the collation information. Reference numerals 4-1 and 4-2 denote character category limited memory units that store, for each character category, usability information (usable = 1, not usable = 0) indicating whether the identification and collation unit 3 may use the corresponding collation information in the collation process. When a form contains a plurality of recognition target ranges, the character category limited memory units 4-1 and 4-2 are provided in correspondence with the respective recognition target ranges. However, recognition target ranges that have the same character categories share one memory unit. The figure shows only the units corresponding to two recognition target ranges; the character category limited memory units for the other recognition target ranges are omitted. FIG. 5 shows an example of the stored contents when there are 3000 character categories.

【0003】5は文字カテゴリ限定メモリ部4−1,4
−2のうち現在の認識対象範囲に対応したメモリ部を選
択し、その選択したメモリ部から文字カテゴリ単位に順
次読み出された使用可否の情報を識別照合部3に入力
し、識別照合部3に使用可否の情報が「1」となってい
る文字カテゴリに限って照合動作を行わせるメモリ選択
部、6は識別照合部3における照合動作によって得た全
ての文字カテゴリの照合値をソートし、現在の認識対象
範囲から読み取った文字パターンの文字カテゴリ候補を
出力する候補文字カテゴリソート部であり、この候補文
字カテゴリソート部6から出力される文字カテゴリ候補
は図示しない単語照合部や認識結果表示部等に入力さ
れ、ここで存在し得る単語文字列に合致した候補文字カ
テゴリ列の選択、文字読み取り装置操作者による修正等
の実現に供せられる。
Reference numeral 5 denotes a memory selection unit that selects, from the character category limited memory units 4-1 and 4-2, the one corresponding to the current recognition target range, inputs the usability information read sequentially from the selected memory unit, one character category at a time, to the identification and collation unit 3, and causes the identification and collation unit 3 to perform the collation operation only for character categories whose usability information is “1”. Reference numeral 6 denotes a candidate character category sorting unit that sorts the collation values of all character categories obtained by the collation operation in the identification and collation unit 3 and outputs character category candidates for the character pattern read from the current recognition target range. The character category candidates output from the candidate character category sorting unit 6 are input to a word collation unit, a recognition result display unit, and the like (not shown), where they are used to select a candidate character category string that matches a word character string that can actually occur, to allow correction by the operator of the character reading device, and so on.

【0004】このような構成において、帳票上の認識対
象範囲に記述された文字列を認識する場合、その文字列
をスキャナまたは文字切り出し等の手段によって文字単
位に2値量子化し、文字パターンとして特徴抽出部1に
入力する。
In this configuration, when a character string written in a recognition target range on a form is to be recognized, the character string is binary-quantized character by character by means such as a scanner or character segmentation and input to the feature extraction unit 1 as a character pattern.

【0005】すると、特徴抽出部1は入力された文字パ
ターンの特徴を解析し、識別用の特徴ベクトルを作成す
る。この特徴ベクトルは識別照合部3に入力される。
Then, the feature extraction unit 1 analyzes the features of the input character pattern and creates a feature vector for identification. This feature vector is input to the identification and collation unit 3.

【0006】識別照合部3は、文字パターンの特徴ベク
トルが入力されると、辞書メモリ部2から文字カテゴリ
単位に照合情報を順次読み出す。また、メモリ選択部5
を制御し、メモリ選択部5に現在の認識対象範囲に応じ
た文字カテゴリ限定メモリ部(例えば4−1)を選択さ
せ、この文字カテゴリ限定メモリ部4−1に格納した使
用可否の情報のうち辞書メモリ部2から読出した照合情
報の文字カテゴリに対応した使用可否の情報を同期して
読出す。そして、使用可否の情報が「1」となっている
文字カテゴリに限って、特徴抽出部1から入力された特
徴ベクトルと辞書メモリ部2から読出した照合情報とを
照合し、その照合情報に対する特徴ベクトルの類似度を
表わす照合値を出力する。
When the feature vector of the character pattern is input, the identification and collation unit 3 sequentially reads the collation information from the dictionary memory unit 2 one character category at a time. It also controls the memory selection unit 5 so that the memory selection unit 5 selects the character category limited memory unit (for example, 4-1) corresponding to the current recognition target range and, from the usability information stored in the character category limited memory unit 4-1, reads in synchronization the usability information for the character category of the collation information read from the dictionary memory unit 2. Then, only for character categories whose usability information is “1”, it collates the feature vector input from the feature extraction unit 1 with the collation information read from the dictionary memory unit 2 and outputs a collation value representing the similarity of the feature vector to that collation information.

【0007】例えば、平仮名に限って使用可否の情報が
「1」となっている場合は、平仮名の照合情報のみの照
合が行われ、平仮名の各文字に対する照合値が出力され
る。
For example, when the usability information is “1” only for hiragana, collation is performed only against the hiragana collation information, and a collation value is output for each hiragana character.
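The conventional restriction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the Euclidean distance, the two-element feature vectors, and the names `match_limited`, `dictionary`, and `usable` are all assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Distance between two feature vectors (assumed metric; smaller = closer)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_limited(feature, dictionary, usable):
    """Collate only against categories whose usability flag is 1."""
    values = {}
    for category, reference in dictionary.items():
        if usable.get(category, 0) != 1:
            continue  # category disabled for this recognition target range
        values[category] = euclidean(feature, reference)
    return values


# Hypothetical standard feature vectors and usability flags (hiragana only).
dictionary = {"あ": [0.0, 1.0], "い": [1.0, 1.0], "A": [0.0, 0.9]}
usable = {"あ": 1, "い": 1, "A": 0}

values = match_limited([0.0, 0.8], dictionary, usable)
best = min(values, key=values.get)
```

In this toy data the letter "A" would be the closest match if it were allowed, but the flag removes it from consideration, so the hiragana "あ" becomes the best candidate.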

【0008】この照合値は候補文字カテゴリソート部6
に入力される。候補文字カテゴリソート部6は識別照合
部3における照合動作によって得た全ての文字カテゴリ
の照合値をソートし、現在の認識対象範囲から読み取っ
た文字パターンの文字カテゴリ候補を出力する。
This collation value is input to the candidate character category sorting unit 6. The candidate character category sorting unit 6 sorts the collation values of all character categories obtained by the collation operation in the identification and collation unit 3 and outputs character category candidates for the character pattern read from the current recognition target range.
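The sorting step might look like the sketch below. It assumes that a smaller collation value means a closer match, which is what paragraph [0018] states for the embodiment; the text does not say so explicitly for the conventional device. The function name `top_candidates` and the sample values are hypothetical.

```python
import heapq


def top_candidates(collation_values, n=3):
    """Return the n best (category, value) pairs, smallest value first."""
    return heapq.nsmallest(n, collation_values.items(), key=lambda item: item[1])


# Hypothetical collation values for four hiragana categories.
values = {"あ": 4.0, "い": 9.5, "う": 2.5, "え": 7.0}
candidates = top_candidates(values, n=2)
```

Returning a ranked list rather than a single winner matches the description, in which later stages such as word collation or an operator choose among the candidates.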

【0009】このような構成となっているため、認識対
象範囲に記述される文字カテゴリが例えば平仮名のみに
限定されるような場合、平仮名に限って使用可否の情報
を「1」としておけば、当該認識対象範囲の文字列を認
識する際に、平仮名以外の文字カテゴリを除いた照合が
可能になり、誤読確率を低減することができる。
Because of this configuration, when the character categories written in a recognition target range are limited to, for example, hiragana only, setting the usability information to “1” only for hiragana makes it possible to exclude all non-hiragana character categories from collation when recognizing the character string in that range, which reduces the probability of misreading.

【0010】[0010]

【発明が解決しようとする課題】しかしながら、上記従
来技術にあっては、文字カテゴリの使用可否を指定可能
であるが、漢字を認識する場合のように、認識対象とな
る文字カテゴリが多数存在する場合には、文字カテゴリ
の範囲絞り込みの程度が小さいため、誤読確率の低減効
果は大きくならないという問題があった。
However, although the above conventional technique allows the use of each character category to be enabled or disabled, when there are many character categories to be recognized, as when recognizing kanji, the range of character categories can be narrowed only slightly, so the reduction in misreading probability is small.

【0011】本発明はこのような問題を解決すべくなさ
れたもので、認識対象となる文字カテゴリが多数存在す
る場合でも文字カテゴリの範囲絞り込みの程度を大きく
し、実効的な認識精度を向上させることができる文字認
識方式を提供することを目的とするものである。
The present invention has been made to solve this problem. Its object is to provide a character recognition method that can narrow the range of character categories substantially even when there are many character categories to be recognized, thereby improving the effective recognition accuracy.

【0012】[0012]

【課題を解決するための手段】本発明は上記目的を達成
するために、認識対象範囲から読み取った文字パターン
の特徴を解析し、識別用の特徴ベクトルを作成する特徴
抽出手段と、認識対象の多数の文字パターンについてそ
の標準的特徴を表わす照合情報を文字カテゴリ毎に格納
した辞書メモリ手段と、この辞書メモリ手段に格納した
照合情報を文字カテゴリ単位に順次読出し、前記特徴ベ
クトルと照合し、照合情報に対する特徴ベクトルの類似
度を表わす照合値を文字カテゴリ単位に出力する識別照
合手段と、全ての文字カテゴリにおいて得た照合値をソ
ートし、認識対象範囲から読み取った文字パターンの文
字カテゴリ候補を出力する候補文字カテゴリソート手段
とを備えた文字認識方式において、前記認識対象範囲内
で出現する文字カテゴリ毎の出現確率に関する頻度情報
を格納した文字カテゴリ頻度メモリ手段と、前記識別照
合手段で得た文字カテゴリ毎の照合値を前記文字カテゴ
リ毎の頻度情報に基づき補正する照合値補正手段とを設
け、この照合値補正手段で補正した照合値を前記候補文
字カテゴリソート手段へ入力し、認識対象範囲から読み
取った文字パターンの文字カテゴリ候補を出力するよう
にした。
To achieve the above object, the present invention provides a character recognition method comprising: feature extraction means that analyzes the features of a character pattern read from a recognition target range and creates a feature vector for identification; dictionary memory means that stores, for each character category, collation information representing the standard features of the many character patterns to be recognized; identification and collation means that sequentially reads the collation information stored in the dictionary memory means one character category at a time, collates it with the feature vector, and outputs, for each character category, a collation value representing the similarity of the feature vector to the collation information; and candidate character category sorting means that sorts the collation values obtained for all character categories and outputs character category candidates for the character pattern read from the recognition target range. The method further provides character category frequency memory means that stores frequency information on the appearance probability of each character category appearing in the recognition target range, and collation value correction means that corrects the collation value of each character category obtained by the identification and collation means on the basis of the frequency information for that character category. The collation value corrected by the collation value correction means is input to the candidate character category sorting means, and character category candidates for the character pattern read from the recognition target range are output.
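The chain of means listed above (collation, correction, sorting) can be sketched end to end as below. This is an illustration under stated assumptions: the squared-distance metric, the division-based correction (borrowed from the example formula in paragraph [0021]), the function name `recognize`, and all sample data are hypothetical and are not given in this paragraph.

```python
def squared_distance(a, b):
    """Assumed dissimilarity between feature vectors (smaller = closer)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recognize(feature, dictionary, frequency):
    """Collate, correct by frequency, then sort; returns ranked (category, value)."""
    # Identification and collation: one collation value per character category.
    collation = {cat: squared_distance(feature, ref) for cat, ref in dictionary.items()}
    # Collation value correction: divide by the frequency information, so that
    # rare categories get larger (worse) values. Frequencies must be positive.
    corrected = {cat: value / frequency[cat] for cat, value in collation.items()}
    # Candidate character category sorting: best (smallest) value first.
    return sorted(corrected.items(), key=lambda item: item[1])


# Hypothetical dictionary and frequency table for one recognition target range.
dictionary = {"2": [1.0, 0.0], "乙": [0.9, 0.1]}
frequency = {"2": 1.0, "乙": 0.01}

ranked = recognize([0.92, 0.08], dictionary, frequency)
```

With these toy numbers the input is geometrically closer to "乙", yet "2" is ranked first after correction because "乙" is assumed to be rare in this range.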

【0013】[0013]

【作用】上記手段によれば、認識対象範囲内で出現する
文字カテゴリ毎の出現確率に関する頻度情報により、文
字カテゴリ毎の照合値を補正し、この補正した照合値を
候補文字カテゴリソート手段へ入力し、認識対象範囲か
ら読み取った文字パターンの文字カテゴリ候補を出力す
るので、出現確率の高い文字カテゴリ候補を優先させ、
絞り込み程度を大きくすることができる。この結果、実
効的な認識精度を向上させることができる。
With the above means, the collation value of each character category is corrected using frequency information on the appearance probability of each character category appearing in the recognition target range, the corrected collation value is input to the candidate character category sorting means, and character category candidates for the character pattern read from the recognition target range are output. Character category candidates with a high appearance probability are therefore given priority, and the degree of narrowing can be increased. As a result, the effective recognition accuracy can be improved.

【0014】[0014]

【実施例】以下、本発明を図示する実施例によって詳細
に説明する。
EXAMPLES The present invention will be described in detail below with reference to illustrated examples.

【0015】図1は本発明の文字認識方式を用いた文字
認識装置の主要部の構成の一実施例を示すブロック図で
ある。同図において、図4と同一部分は同一記号で示
し、その説明は省略する。図1において、7−1,7−
2は認識対象範囲内で出現する文字カテゴリ毎の出現確
率に関する頻度情報を格納した文字カテゴリ頻度メモリ
部であり、この文字カテゴリ頻度メモリ部7−1,7−
2は帳票上に複数個所の認識対象範囲が存在する場合、
各認識対象範囲にそれぞれ対応して設けられる。但し、
共通する文字カテゴリの認識対象範囲については兼用す
るようになっている。なお、図においては、2つの認識
対象範囲に対応するものについてのみ図示し、他の認識
対象範囲の文字カテゴリ頻度メモリ部については省略し
ている。
FIG. 1 is a block diagram showing one embodiment of the configuration of the main part of a character recognition device using the character recognition system of the present invention. In the figure, parts identical to those in FIG. 4 are denoted by the same reference numerals, and their description is omitted. In FIG. 1, reference numerals 7-1 and 7-2 denote character category frequency memory units that store frequency information on the appearance probability of each character category appearing in a recognition target range. When a form contains a plurality of recognition target ranges, the character category frequency memory units 7-1 and 7-2 are provided in correspondence with the respective recognition target ranges. However, recognition target ranges that have the same character categories share one memory unit. The figure shows only the units corresponding to two recognition target ranges; the character category frequency memory units for the other recognition target ranges are omitted.

【0016】8は現在の認識対象範囲に応じた1つの文
字カテゴリ頻度メモリ部を選択するメモリ選択部、9は
識別照合部3で得た文字カテゴリ毎の照合値を前記文字
カテゴリ毎の頻度情報に基づき補正する照合値補正部で
あり、この照合値補正部9で補正した照合値は候補文字
カテゴリソート部6に入力される。
Reference numeral 8 denotes a memory selection unit that selects the one character category frequency memory unit corresponding to the current recognition target range. Reference numeral 9 denotes a collation value correction unit that corrects the collation value of each character category obtained by the identification and collation unit 3 on the basis of the frequency information for that character category. The collation value corrected by the collation value correction unit 9 is input to the candidate character category sorting unit 6.
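The role of the memory selection unit can be sketched as a lookup from recognition target range to frequency table. The range names, the table contents for "7-2", and the decision to let two ranges share table "7-1" are assumptions made purely for illustration; only the idea of one table per range, with sharing among ranges that use the same categories, comes from the description.

```python
# Frequency tables standing in for memory units 7-1 and 7-2.
# The numbers in "7-2" are hypothetical.
FREQUENCY_MEMORIES = {
    "7-1": {"2": 1.0, "Z": 0.2, "乙": 0.1},
    "7-2": {"2": 0.1, "Z": 0.1, "乙": 1.0},
}

# Which memory unit serves which recognition target range (assumed mapping).
# Ranges with the same character categories point at the same unit.
RANGE_TO_MEMORY = {
    "address": "7-1",
    "format_code": "7-1",
    "name": "7-2",
}


def select_frequency_memory(range_name):
    """Return the frequency table for the current recognition target range."""
    return FREQUENCY_MEMORIES[RANGE_TO_MEMORY[range_name]]


address_table = select_frequency_memory("address")
name_table = select_frequency_memory("name")
```

Keeping the range-to-memory mapping separate from the tables is one simple way to let several ranges share a table without duplicating it.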

【0017】このような構成において、認識対象の文字
列を記述した帳票10が図2に示すような構成であり、
この帳票10の中の形式コード欄11、地名欄12、番
地欄13、姓名欄14に記述される文字列を読み取って
認識する場合を仮定する。
In this configuration, assume that the form 10 on which the character strings to be recognized are written is laid out as shown in FIG. 2, and that the character strings written in the format code column 11, the place name column 12, the address column 13, and the name column 14 of the form 10 are read and recognized.

【0018】このうち、番地欄13の文字列「66番地
2」の「2」を認識する場合、識別照合部3において
は、図3(a)に示すような照合値が得られる。すなわ
ち、「2」に類似した文字として「Z」や「乙」という
文字があるが、数字「2」、英字「Z」、漢字「乙」の
標準的特徴ベクトルに対し、番地欄13から読み取った
数字「2」の類似度を表わす照合値が例えば「2」=2
0,「Z」=15,「乙」=10といったような数値で
得られる。ここで、この数値が小さいほど標準的特徴ベ
クトルに近いことを示している。
When the “2” in the character string “66番地2” (address number 66-2) written in the address column 13 is recognized, the identification and collation unit 3 obtains collation values such as those shown in FIG. 3(a). That is, the characters “Z” and “乙” resemble “2”, and the collation values representing the similarity of the numeral “2” read from the address column 13 to the standard feature vectors of the numeral “2”, the letter “Z”, and the kanji “乙” are obtained as numerical values such as “2” = 20, “Z” = 15, and “乙” = 10. Here, a smaller value indicates a closer match to the standard feature vector.

【0019】このような数値表現された照合値は照合値
補正部9に入力される。照合値補正部9は、文字カテゴ
リ別の照合値が入力されると、メモリ選択部5を制御
し、メモリ選択部5に現在の認識対象範囲である番地欄
13に応じた文字カテゴリ頻度メモリ部(例えば7−
1)を選択させ、この文字カテゴリ頻度メモリ部7−1
に格納した文字カテゴリ別の頻度情報を読出す。
The collation values expressed numerically in this way are input to the collation value correction unit 9. When the collation values for the respective character categories are input, the collation value correction unit 9 controls the memory selection unit 5 so that it selects the character category frequency memory unit (for example, 7-1) corresponding to the address column 13, which is the current recognition target range, and reads the frequency information for each character category stored in the character category frequency memory unit 7-1.

【0020】例えば、図3(b)に示すように、数字
「2」、英字「Z」、漢字「乙」の文字カテゴリに対し
て、「2」=1.0,「Z」=0.2,「乙」=0.1
といったように数値表現された頻度情報を読み出す。こ
こで、数値が大きいほど出現確率が大きいことを示して
いる。
For example, as shown in FIG. 3(b), frequency information expressed numerically, such as “2” = 1.0, “Z” = 0.2, and “乙” = 0.1, is read for the character categories of the numeral “2”, the letter “Z”, and the kanji “乙”. Here, a larger value indicates a higher appearance probability.

【0021】このような頻度情報を読出したならば、例
えば「照合値A÷頻度情報B=補正照合値C」という補
正演算式を用い、照合値Aを頻度情報Bによって補正す
る。
Once this frequency information has been read, the collation value A is corrected by the frequency information B using a correction formula such as “collation value A ÷ frequency information B = corrected collation value C”.

【0022】この結果、図3(c)に示すような補正照
合値Cが得られる。すなわち、識別照合部3から出力さ
れる照合値によれば、番地欄13から読み取った数字
「2」は漢字「乙」に類似していることを表わしている
が、番地欄13には漢字「乙」が出現する確率は小さい
筈であるので、この出現確率を表わす頻度情報Bによっ
て補正されて候補順位が下げられ、これに代えて数字
「2」の候補順位が繰り上げられる。
As a result, corrected collation values C such as those shown in FIG. 3(c) are obtained. That is, according to the collation values output from the identification and collation unit 3, the numeral “2” read from the address column 13 appears most similar to the kanji “乙”. However, the kanji “乙” should rarely appear in the address column 13, so the correction by the frequency information B, which represents this appearance probability, lowers its candidate rank, and the numeral “2” moves up in its place.

【0023】従って、この例では数字「2」、英字
「Z」、漢字「乙」の順位で文字カテゴリ候補が出力さ
れる。
Therefore, in this example, the character category candidates are output in the order numeral “2”, letter “Z”, kanji “乙”.
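The arithmetic of this example can be checked with a short script. The collation values (20, 15, 10), the frequency values (1.0, 0.2, 0.1), and the division formula are taken from the description; the function names `correct` and `rank` are hypothetical helpers introduced only for this sketch.

```python
# Values from the example: a smaller collation value means a closer match,
# and a larger frequency value means a higher appearance probability.
collation = {"2": 20.0, "Z": 15.0, "乙": 10.0}
frequency = {"2": 1.0, "Z": 0.2, "乙": 0.1}


def correct(collation, frequency):
    """Corrected collation value C = collation value A / frequency information B."""
    return {cat: collation[cat] / frequency[cat] for cat in collation}


def rank(values):
    """Categories ordered from best (smallest value) to worst."""
    return [cat for cat, _ in sorted(values.items(), key=lambda item: item[1])]


before = rank(collation)
corrected = correct(collation, frequency)
after = rank(corrected)
```

Dividing gives roughly 20 for "2", 75 for "Z", and 100 for "乙", so the order changes from 乙, Z, 2 before correction to 2, Z, 乙 after it.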

【0024】従って、図2のような帳票10にあって
は、地名欄12に対応する頻度情報は数字に関する頻度
情報を小さくし、漢字に関する頻度情報を大きくすれ
ば、候補文字のカテゴリ範囲を小さくすることができ
る。同様に、姓名欄14にあっては、「県」、「市」、
「町」といった漢字に関する頻度情報を小さくし、これ
に代えて姓名に多く使用される漢字「夫」、「子」等の
漢字に関する頻度情報を大きくすることにより、候補文
字のカテゴリ範囲を小さくすることができる。
Accordingly, for a form 10 such as that in FIG. 2, the category range of candidate characters for the place name column 12 can be narrowed by making the frequency information for numerals small and the frequency information for kanji large. Similarly, for the name column 14, the category range of candidate characters can be narrowed by making the frequency information small for kanji such as “県” (prefecture), “市” (city), and “町” (town) and instead making it large for kanji often used in personal names, such as “夫” and “子”.
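A per-column setup along these lines could be sketched as follows. The weights 1.0, 0.5, and 0.1, the small category list, the helper `make_table`, and the collation values for “市” and “夫” are all hypothetical; only the direction of the adjustments (which categories are made large or small in which column) follows the description.

```python
CATEGORIES = ["1", "2", "県", "市", "町", "夫", "子"]


def make_table(high, low, default=0.5):
    """Build a frequency table: 1.0 for likely categories, 0.1 for unlikely ones."""
    table = {cat: default for cat in CATEGORIES}
    table.update({cat: 1.0 for cat in high})
    table.update({cat: 0.1 for cat in low})
    return table


# Place name column: kanji likely, numerals unlikely.
place_name_table = make_table(high=["県", "市", "町", "夫", "子"], low=["1", "2"])
# Name column: name kanji likely, administrative-unit kanji unlikely.
name_table = make_table(high=["夫", "子"], low=["県", "市", "町"])


def best_candidate(collation, table):
    """Category with the smallest corrected value (collation value / frequency)."""
    return min(collation, key=lambda cat: collation[cat] / table[cat])


# The same hypothetical collation values, evaluated in two different columns.
collation = {"市": 10.0, "夫": 12.0}
in_place_name = best_candidate(collation, place_name_table)
in_name = best_candidate(collation, name_table)
```

The same raw collation values yield “市” in the place name column and “夫” in the name column, which is the kind of column-dependent narrowing the paragraph describes.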

【0025】[0025]

【発明の効果】以上説明したように本発明によれば、認
識対象範囲内で出現する文字カテゴリ毎の出現確率に関
する頻度情報により、文字カテゴリ毎の照合値を補正
し、この補正した照合値を候補文字カテゴリソート手段
へ入力するので、字形が類似していることが原因で他の
文字カテゴリより候補順位が低くなった場合であって
も、この順位を出現確率に応じて補正し、優先順位の高
い文字カテゴリ候補として出力することができる。この
結果、文字カテゴリ候補の絞り込み程度が大きくなり、
実効的な認識精度を向上させることができるといった効
果がある。
As described above, according to the present invention, the collation value of each character category is corrected using frequency information on the appearance probability of each character category appearing in the recognition target range, and the corrected collation value is input to the candidate character category sorting means. Therefore, even when a character category is ranked below other character categories because of similar character shapes, its rank is corrected according to its appearance probability, and it can be output as a high-priority character category candidate. As a result, the character category candidates are narrowed down more strongly, and the effective recognition accuracy can be improved.

【図面の簡単な説明】[Brief description of drawings]

【図1】 本発明の文字認識方式を用いた文字認識装置
の主要部の構成の一実施例を示すブロック図である。
FIG. 1 is a block diagram showing an example of a configuration of a main part of a character recognition device using a character recognition system of the present invention.

【図2】 認識対象の文字列を記述した帳票の例を示す
図である。
FIG. 2 is a diagram showing an example of a form in which a character string to be recognized is described.

【図3】 照合値、頻度情報および補正照合値の例を示
す説明図である。
FIG. 3 is an explanatory diagram showing examples of matching values, frequency information, and corrected matching values.

【図4】 従来の文字認識方式を用いた文字認識装置の
主要部の構成を示すブロック図である。
FIG. 4 is a block diagram showing a configuration of a main part of a character recognition device using a conventional character recognition method.

【図5】 従来の文字認識装置における文字カテゴリ別
の使用可否情報の例を示す説明図である。
FIG. 5 is an explanatory diagram showing an example of usability information for each character category in a conventional character recognition device.

【符号の説明】[Explanation of symbols]

1…特徴抽出部、2…辞書メモリ部、3…識別照合部、
6…候補文字カテゴリソート部、7−1,7−2…文字
カテゴリ頻度メモリ部、8…メモリ選択部、9…照合値
補正部。
1 ... Feature extraction unit, 2 ... Dictionary memory unit, 3 ... Identification and collation unit,
6 ... Candidate character category sorting section, 7-1, 7-2 ... Character category frequency memory section, 8 ... Memory selecting section, 9 ... Collation value correcting section.

Claims (3)

【特許請求の範囲】[Claims] 【請求項1】 認識対象範囲から読み取った文字パター
ンの特徴を解析し、識別用の特徴ベクトルを作成する特
徴抽出手段と、認識対象の多数の文字パターンについて
その標準的特徴を表わす照合情報を文字カテゴリ毎に格
納した辞書メモリ手段と、この辞書メモリ手段に格納し
た照合情報を文字カテゴリ単位に順次読出し、前記特徴
ベクトルと照合し、照合情報に対する特徴ベクトルの類
似度を表わす照合値を文字カテゴリ単位に出力する識別
照合手段と、全ての文字カテゴリにおいて得た照合値を
ソートし、認識対象範囲から読み取った文字パターンの
文字カテゴリ候補を出力する候補文字カテゴリソート手
段とを備えた文字認識方式において、前記認識対象範囲
内で出現する文字カテゴリ毎の出現確率に関する頻度情
報を格納した文字カテゴリ頻度メモリ手段と、前記識別
照合手段で得た文字カテゴリ毎の照合値を前記文字カテ
ゴリ毎の頻度情報に基づき補正する照合値補正手段とを
設け、この照合値補正手段で補正した照合値を前記候補
文字カテゴリソート手段へ入力し、認識対象範囲から読
み取った文字パターンの文字カテゴリ候補を出力するこ
とを特徴とする文字認識方式。
1. A character recognition method comprising: feature extraction means that analyzes the features of a character pattern read from a recognition target range and creates a feature vector for identification; dictionary memory means that stores, for each character category, collation information representing the standard features of the many character patterns to be recognized; identification and collation means that sequentially reads the collation information stored in the dictionary memory means one character category at a time, collates it with the feature vector, and outputs, for each character category, a collation value representing the similarity of the feature vector to the collation information; and candidate character category sorting means that sorts the collation values obtained for all character categories and outputs character category candidates for the character pattern read from the recognition target range, characterized in that there are provided character category frequency memory means that stores frequency information on the appearance probability of each character category appearing in the recognition target range, and collation value correction means that corrects the collation value of each character category obtained by the identification and collation means on the basis of the frequency information for that character category, wherein the collation value corrected by the collation value correction means is input to the candidate character category sorting means, and character category candidates for the character pattern read from the recognition target range are output.
【請求項2】 前記文字カテゴリ頻度メモリ手段は、複
数の認識対象範囲のそれぞれに対応して設け、このうち
現在の認識対象範囲に応じた1つの文字カテゴリ頻度メ
モリ手段に格納した頻度情報を読出し、前記識別照合手
段で得た文字カテゴリ毎の照合値を補正することを特徴
とする請求項1記載の文字認識方式。
2. The character recognition method according to claim 1, wherein the character category frequency memory means is provided for each of a plurality of recognition target ranges, the frequency information stored in the one character category frequency memory means corresponding to the current recognition target range is read, and the collation value of each character category obtained by the identification and collation means is corrected.
【請求項3】 前記文字カテゴリ頻度メモリ手段は、複
数の認識対象範囲のうち出現する文字カテゴリが異なる
認識対象範囲別に設け、このうち現在の認識対象に応じ
た1つの文字カテゴリ頻度メモリ手段に格納した頻度情
報を読出し、前記識別照合手段で得た文字カテゴリ毎の
照合値を補正することを特徴とする請求項1記載の文字
認識方式。
3. The character recognition method according to claim 1, wherein the character category frequency memory means is provided separately for each recognition target range, among a plurality of recognition target ranges, in which different character categories appear, the frequency information stored in the one character category frequency memory means corresponding to the current recognition target is read, and the collation value of each character category obtained by the identification and collation means is corrected.
JP4099716A 1992-04-20 1992-04-20 System for recognizing character Pending JPH05298489A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP4099716A JPH05298489A (en) 1992-04-20 1992-04-20 System for recognizing character

Publications (1)

Publication Number Publication Date
JPH05298489A true JPH05298489A (en) 1993-11-12

Family

ID=14254803

Family Applications (1)

Application Number Title Priority Date Filing Date
JP4099716A Pending JPH05298489A (en) 1992-04-20 1992-04-20 System for recognizing character

Country Status (1)

Country Link
JP (1) JPH05298489A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002366898A (en) * 2001-06-07 2002-12-20 Toshiba Corp Device and method for location information recognition and sectioning device
JP4733859B2 (en) * 2001-06-07 2011-07-27 株式会社東芝 Location information recognition device and sorting device
JP2010003182A (en) * 2008-06-20 2010-01-07 Sharp Corp Device and method for generating character string, character string generating program, and computer-readable recording medium with the character string generating program recorded thereon

Similar Documents

Publication Publication Date Title
JPH05298489A (en) System for recognizing character
WO2000036530A1 (en) Searching method, searching device, and recorded medium
JPH0157837B2 (en)
JPH11120294A (en) Character recognition device and medium
JPS6224382A (en) Method for recognizing handwritten character
KR100285426B1 (en) Method of distributing gap of letter and gap of word
JP2894305B2 (en) Recognition device candidate correction method
JP2875678B2 (en) Post-processing method of character recognition result
JPH0520490A (en) Optical character read and correction system
JP2639314B2 (en) Character recognition method
JPH0355874B2 (en)
JP3245415B2 (en) Character recognition method
KR100356503B1 (en) Device for recognizing learning character
JP4143148B2 (en) Character recognition device
JPS58163072A (en) Character correcting system
JPS61133487A (en) Character recognizing device
JPH0347553B2 (en)
JPH0338631B2 (en)
JPH0573027A (en) Individual penmanship dictionary generation device and character output processor using individual penmanship dictionary
JPH0620087A (en) Kanji address data processing method for ocr processing system
JPH0981690A (en) Handwritten character on-line recognition device and character style registering and learning method
JPH0827819B2 (en) Handwritten character recognition method for handwritten character recognition and handwritten character recognition apparatus using the same
JPH0540854A (en) Post-processing method for character recognizing result
JPH10261049A (en) Character recognizing device
JPH07141370A (en) English morpheme analyzer