JPH0135386B2 - - Google Patents

Info

Publication number
JPH0135386B2
Authority
JP
Japan
Prior art keywords
character
projection
character pattern
complementary
circuit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
JP56048068A
Other languages
Japanese (ja)
Other versions
JPS57162087A (en)
Inventor
Takashi Akimoto
Masaki Komya
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Tokyo Shibaura Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tokyo Shibaura Electric Co Ltd
Priority to JP56048068A
Publication of JPS57162087A
Publication of JPH0135386B2
Granted

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/26 Techniques for post-processing, e.g. correcting the recognition result
    • G06V30/262 Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to an optical character reader with a reduced misreading rate.

Some conventional optical character readers (hereinafter, OCRs) use the similarity method for character recognition. In this method, the characters written on a form are photoelectrically converted, and the similarity between each resulting character pattern and the standard character patterns stored in a recognition dictionary prepared in advance is calculated. The first-ranked candidate character, that is, the one with the highest similarity value, is output as the answer when its similarity value is at or above a predetermined value and its difference from the similarity value of the second-ranked candidate character is also at or above a predetermined value.
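
The conventional acceptance rule described above can be sketched as follows. This is only an illustration: the function name, the two threshold values, and the dictionary of similarity values are assumptions, since the text says no more than that both thresholds are "predetermined".

```python
from typing import Dict, Optional


def accept_top_candidate(similarities: Dict[str, float],
                         min_similarity: float = 0.80,   # assumed threshold
                         min_margin: float = 0.05        # assumed threshold
                         ) -> Optional[str]:
    """Return the first-ranked candidate, or None when it is not accepted."""
    # Rank the standard character patterns by similarity value, highest first.
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    best_char, best_score = ranked[0]
    # With a single candidate there is no runner-up to compare against.
    second_score = ranked[1][1] if len(ranked) > 1 else float("-inf")
    if best_score >= min_similarity and best_score - second_score >= min_margin:
        return best_char
    return None


print(accept_top_candidate({"a": 0.95, "o": 0.70}))  # a
print(accept_top_candidate({"i": 0.93, "j": 0.91}))  # None (margin too small)
```

The second call shows the case the rule handles well: two similar shapes score almost equally, so nothing is output. The rule offers no protection when a wrong character wins by a clear margin.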

Consequently, the similarity method has the drawback that characters of similar shape, such as the lowercase letters j and i, may be misread for one another even though their sizes (width and height) clearly differ.

The present invention was made in view of these circumstances. Its object is to provide an optical character reader that can reduce the misreading rate by checking whether the answer obtained by the similarity method is valid, based on the relative relationship between its projection values and those of the preceding and following characters.

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a schematic configuration diagram of an OCR according to one embodiment of the invention. In the figure, 1 is a photoelectric conversion unit, which optically scans the surface of a form (not shown) on which characters are written and converts the reflected light into an analog electrical signal. 2 is a quantization circuit, which converts the analog signal sent from the photoelectric conversion unit 1 into a digital signal. 3 is a segmentation circuit, which stores the digital signal sent from the quantization circuit 2 and detects and cuts out individual character patterns, one character at a time, from the stored (photoelectrically converted) character pattern. 4 is a normalization circuit, which normalizes the position, line width, and so on of each character pattern cut out by the segmentation circuit 3. 5 is a preprocessing circuit, which performs preprocessing such as adjusting the data amount and data format of the character pattern normalized by the normalization circuit 4. 6 is a recognition dictionary, in which standard character patterns for reference are stored. 7 is a similarity calculation circuit, which calculates the similarity value between the character pattern preprocessed by the preprocessing circuit 5 and each standard character pattern stored in the recognition dictionary 6. 8 is a projection memory, which stores the height and width projections of each character pattern cut out by the segmentation circuit 3, together with the position information of that character pattern. 9 is a correction table, which stores, for each combination of two standard character patterns, height and width correction values set on the basis of the difference in height and width between the two patterns. 10 is a determination circuit, which checks whether each first-ranked candidate character in one line sent from the similarity calculation circuit 7 is appropriate as an answer, by referring to the contents of the projection memory 8 and the correction table 9 and performing a predetermined calculation.

Next, the operation of the above embodiment is described. The characters written on the form (not shown) are photoelectrically converted into a character pattern by the photoelectric conversion unit 1. The quantization circuit 2 binarizes this pattern into black and white, yielding a quantized character pattern. The segmentation circuit 3 cuts the quantized character pattern into individual character patterns. At this time, the height and width projections of each cut-out character pattern are taken, and the resulting height and width projection values are stored in the projection memory 8 for each cut-out character pattern, together with the position information of that pattern. Meanwhile, each cut-out character pattern is normalized by the normalization circuit 4, for example by aligning its position and correcting its line width to a uniform value. The normalized character pattern is preprocessed by the preprocessing circuit 5. The similarity calculation circuit 7 then calculates the similarity between the preprocessed character pattern and each standard character pattern stored in the recognition dictionary 6, and the standard character pattern with the highest similarity value is sent to the determination circuit 10 as the first-ranked candidate character. Once the same operation has produced the first-ranked candidate character for every character pattern in a line, and the height and width projection values of every character pattern in the line have been stored in the projection memory 8, the determination circuit 10 checks whether the first-ranked candidate character for each character pattern in the line is appropriate as an answer, as follows.
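
One plausible reading of "taking the height and width projections" of a cut-out character pattern is sketched below. The text does not define the projection value precisely, so the choice made here (the extent of the inked rows and columns of a binary pattern, with 1 meaning black) is an assumption, as are the function names.

```python
from typing import List, Sequence, Tuple


def _span(profile: Sequence[int]) -> int:
    """Extent from the first to the last non-empty bin of a projection profile."""
    inked = [i for i, value in enumerate(profile) if value > 0]
    return inked[-1] - inked[0] + 1 if inked else 0


def projection_extent(pattern: List[List[int]]) -> Tuple[int, int]:
    """Return (width X, height Y) of a binary character pattern (1 = black)."""
    # Projection onto the vertical axis: number of black pixels in each row.
    row_profile = [sum(row) for row in pattern]
    # Projection onto the horizontal axis: number of black pixels in each column.
    col_profile = [sum(col) for col in zip(*pattern)]
    return _span(col_profile), _span(row_profile)


# A crude "i": a dot, a gap, then a two-pixel stem.
dotted_i = [
    [0, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
]
print(projection_extent(dotted_i))  # (1, 4)
```

Measuring the extent rather than counting inked rows keeps the gap under the dot of an "i" or "j" inside the height, which matters for the height comparison that follows.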

Suppose, for example, that the first-ranked candidate characters for the character patterns in one line have been obtained as shown in FIG. 2A, and that the projection memory 8 holds the corresponding width X and height Y projection values for each character pattern in the line, as shown in FIG. 2B. Whether the first-ranked candidate character a for the first character in the line is appropriate as an answer is determined from the relationship between its projection values and those of the first-ranked candidate character b for the second character. Specifically, the determination circuit 10 uses the height projection value Y_a of candidate character a, the height projection value Y_b of candidate character b, and the height correction value YΔ_ab for the pair a, b. The latter is extracted from the correction table 9, which stores width correction values XΔ and height correction values YΔ set in advance for each combination of two characters on the basis of the difference in height and width between them. If the expression Y_a < Y_b − YΔ_ab is satisfied, candidate character a is judged to be suitable as the answer. In this case, because the difference in width between characters a and b is small and the width correction value XΔ_ab is therefore also small, the width projection values X_a and X_b are not checked. The second character is checked in the same way against the first and third characters. In this manner, every first-ranked candidate character in the line is checked against the projection values of the preceding and following characters, and for each candidate character it is determined whether to output it as the answer.
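
The height check for a single pair can be worked through with numbers. The sketch below evaluates Y_a < Y_b − YΔ_ab; the correction value of 4 and the projection heights are hypothetical figures chosen for illustration, and the table layout (a dictionary keyed by the ordered pair of candidate characters) is an assumption.

```python
from typing import Dict, Tuple

# Hypothetical height correction values YΔ, keyed by (checked, neighbor) pair.
HEIGHT_CORRECTION: Dict[Tuple[str, str], int] = {("a", "b"): 4}


def height_check(y_checked: int, y_neighbor: int, pair: Tuple[str, str],
                 table: Dict[Tuple[str, str], int] = HEIGHT_CORRECTION) -> bool:
    """True when Y_a < Y_b - YΔ_ab holds for the given candidate pair."""
    # Raises KeyError when the table has no entry for the pair.
    return y_checked < y_neighbor - table[pair]


# Y_a = 20, Y_b = 30, YΔ_ab = 4: 20 < 26, so "a" is accepted.
print(height_check(20, 30, ("a", "b")))  # True
# Y_a = 28, Y_b = 30, YΔ_ab = 4: 28 < 26 is false, so "a" fails the check.
print(height_check(28, 30, ("a", "b")))  # False
```

In the second case the pattern recognized as "a" is nearly as tall as its neighbor "b", which is inconsistent with the expected height difference between the two letters.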

The width and height projection values and the position information of each character pattern are stored in the projection memory 8 so that they can be looked up by the corresponding first-ranked candidate or its character code. As for the correction values stored in advance in the correction table 9, taking the JIS-OCR-B font as an example, lowercase letters such as p, q, and j differ greatly in vertical position from other characters, and this information remains useful even when the permissible vertical variation between characters is taken into account. Because such characters require different values depending on the combination with the preceding and following characters, the values are stored in the correction table 9 so that they can be looked up by character combination. If the check in the determination circuit 10 finds that a candidate character fails the projection-value check against either the preceding or the following character, that character is rejected and reading is performed again.
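
The per-line check with rejection might be organized as in the sketch below. Several points are assumptions rather than statements of the patent: the table is keyed by the ordered pair (checked character, neighbor), pairs without an entry are simply not checked, the same inequality form Y_a < Y_b − YΔ_ab is used for every listed pair, and a rejected position is represented by None (the rereading step itself is not modeled).

```python
from typing import Dict, List, Optional, Tuple


def check_line(candidates: List[str], heights: List[int],
               table: Dict[Tuple[str, str], int]) -> List[Optional[str]]:
    """Check each first-ranked candidate against its neighbors.

    Returns the candidate where every applicable check passes, else None.
    """
    results: List[Optional[str]] = []
    for i, char in enumerate(candidates):
        accepted = True
        for j in (i - 1, i + 1):              # preceding and following character
            if not 0 <= j < len(candidates):
                continue                      # no neighbor at the line edge
            delta = table.get((char, candidates[j]))
            if delta is None:
                continue                      # assumed: no entry means no check
            if not heights[i] < heights[j] - delta:
                accepted = False              # fails against either neighbor
        results.append(char if accepted else None)
    return results


table = {("a", "b"): 4}                       # hypothetical correction value
print(check_line(["a", "b", "a"], [20, 30, 29], table))  # ['a', 'b', None]
```

The third pattern was recognized as "a" but is almost as tall as the neighboring "b", so it is rejected while the other two candidates are output.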

Furthermore, the quantized character pattern string is sometimes traced in order to find the skew (inclination) of the character string. Taking the projection center of a period, comma, or similar mark as its character center introduces a large error, so correcting it with the character pattern position information stored in the projection memory 8 is more effective still. In this case, the position information of every character pattern in one line must be stored. As analog position information, one method uses the height of the character pattern combined with its center of gravity, and another uses the distances between the respective upper edges and lower edges of two adjacent character patterns. Taking the JIS-OCR-B font as an example, treating the numerals other than "1" and the uppercase letters other than "I" and "J" as one group each reduces the storage capacity needed for the position information used in skew correction.
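
The effect of small marks on skew estimation can be illustrated with a simple line fit. This sketch is not the patent's correction method: it fits a least-squares line through the character centers and, as a stand-in for the position-based correction, simply leaves out characters recognized as a period or comma. The function name, the coordinate convention, and the exclusion rule are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def estimate_skew(centers: List[Tuple[float, float]], candidates: List[str],
                  small_marks: Sequence[str] = (".", ",")) -> float:
    """Estimate line skew in radians from per-character center points."""
    # Drop marks whose projection center is a poor proxy for the line position.
    points = [p for p, ch in zip(centers, candidates) if ch not in small_marks]
    if len(points) < 2:
        return 0.0
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return 0.0                            # all points share one x position
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return math.atan2(sxy, sxx)               # angle of the fitted slope


centers = [(0, 0), (10, 1), (20, 2), (30, 20)]   # the last point is a period
skew = estimate_skew(centers, ["a", "b", "c", "."])
print(round(math.degrees(skew), 2))
```

With the period excluded, the fit follows the three letters and gives a slope of 0.1; including the period's center would pull the estimate far from the true inclination of the line.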

In such an OCR, a character that has already been recognized is checked relatively, on the basis of the projection shapes of the characters before and after it, so the check remains stable even if the state of the optical system changes.

As described above, the present invention provides an optical character reader that can reduce the misreading rate by checking whether the answer obtained by the similarity method is valid, based on the relative relationship between its projection values and those of the preceding and following characters.

[Brief explanation of drawings]

FIG. 1 is a schematic configuration diagram of an OCR according to one embodiment of the present invention; FIG. 2A shows an example of the recognition-result characters for one line in that embodiment; and FIG. 2B shows the width and height projection values obtained from the original character patterns for the recognition-result characters of FIG. 2A.

1...photoelectric conversion unit, 2...quantization circuit, 3...segmentation circuit, 4...normalization circuit, 5...preprocessing circuit, 6...recognition dictionary, 7...similarity calculation circuit, 8...projection memory, 9...correction table, 10...determination circuit.

Claims (1)

1. An optical character reader comprising: a photoelectric conversion unit that photoelectrically converts a character string on a form; a quantization circuit that quantizes the photoelectrically converted character pattern string; a segmentation circuit that cuts the quantized character pattern string into character patterns of one character each; a projection memory that stores one line's worth of projection values for each cut-out character pattern; a similarity calculation circuit that obtains a candidate character for each cut-out character pattern by matching it against standard character patterns stored in a dictionary; a correction table in which a projection correction value is stored in advance for each combination of characters; and a determination circuit that obtains the difference between the projection value, read from the projection memory, of a candidate character to be checked and that of the preceding and/or following candidate character, and determines whether to output the candidate character to be checked as an answer on the basis of the relationship between this difference and the projection correction value, read from the correction table, that corresponds to the combination of the candidate character to be checked and the preceding and/or following candidate character.
JP56048068A 1981-03-31 1981-03-31 Optical character reader Granted JPS57162087A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP56048068A JPS57162087A (en) 1981-03-31 1981-03-31 Optical character reader

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP56048068A JPS57162087A (en) 1981-03-31 1981-03-31 Optical character reader

Publications (2)

Publication Number Publication Date
JPS57162087A (en) 1982-10-05
JPH0135386B2 (en) 1989-07-25

Family

ID=12793027

Family Applications (1)

Application Number Title Priority Date Filing Date
JP56048068A Granted JPS57162087A (en) 1981-03-31 1981-03-31 Optical character reader

Country Status (1)

Country Link
JP (1) JPS57162087A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5048113A (en) * 1989-02-23 1991-09-10 Ricoh Company, Ltd. Character recognition post-processing method

Also Published As

Publication number Publication date
JPS57162087A (en) 1982-10-05

Similar Documents

Publication Publication Date Title
JP3139521B2 (en) Automatic language determination device
US5664027A (en) Methods and apparatus for inferring orientation of lines of text
US5550934A (en) Apparatus and method for syntactic signal analysis
US20120257834A1 (en) Computer vision-based methods for enhanced jbig2 and generic bitonal compression
CN114742039A (en) Chinese spelling error correction method and system, storage medium and terminal
JPH0682403B2 (en) Optical character reader
JPH0135386B2 (en)
Saiga et al. An OCR system for business cards
JP4194020B2 (en) Character recognition method, program used for executing the method, and character recognition apparatus
JP2681663B2 (en) Japanese sentence correction candidate character extraction method
JP2902097B2 (en) Information processing device and character recognition device
JPS6095689A (en) Optical character reader
JPH0223490A (en) Character reading system
JP2615834B2 (en) Word reader
CN115410207A (en) Detection method and device for vertical texts
JPS5914078A (en) Reader of business form
JPH0135384B2 (en)
JPS6111886A (en) Character recognition system
JPH01171080A (en) Recognizing device for error automatically correcting character
JPH0475557B2 (en)
JPS60138689A (en) Character recognizing method
JP3476872B2 (en) Character recognition device
JPH0614375B2 (en) Character input device
JPS60144886A (en) Post-processing system of character recognizer
JPS6182275A (en) Automatic translating device