JPS6290781A

JPS6290781A - Character recognition system

Info

Publication number: JPS6290781A
Application number: JP60231946A
Authority: JP
Inventors: Michiaki Nakanishi; 道明中西
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1985-10-16
Filing date: 1985-10-16
Publication date: 1987-04-25

Abstract

PURPOSE: To shorten processing time by obtaining the true positional deviation of each printed character over multiple lines of characters, calculating a statistic for each same column across the lines, and, if the statistic is within a prescribed quantitative range, performing template matching with the average value used as the correction value for the matching start position. CONSTITUTION: The amount by which the center position 2 of a printed character deviates from the center position 3 of the ideal print line is the sum of the skew amount 4 and the positional deviation amount 5 peculiar to the printed character. Accordingly, the skew amount 4 of each line is obtained, the positional deviation amount 5 peculiar to the printed character is added to it, and a statistic is obtained for each same column across the lines. If the statistic is within a proper quantitative range determined from the character line width forming the character pattern and the number of scanning lines, and is within a standard deviation of 2σ, the average value can be used as the correction value for deciding the matching start position of template matching. Processing time is thereby shortened and the rejection rate is reduced.

Description

[Detailed Description of the Invention] [Summary] A character recognition method is disclosed that quantifies the printing condition by determining statistics of the character positional deviation in the same column across multiple printed lines of characters and, based on this data, determines the optimal position for template matching, thereby shortening the matching time and substantially reducing the rejection rate.

[Industrial application field]

The present invention relates to a character recognition method for a character recognition device that can efficiently recognize even low-quality printed characters. More specifically, in printing, skew (tilt) may arise in the print line direction, or the positions of the printed characters may become irregular, so that parts of the printed characters are missing or crushed. The invention relates to a character recognition method that shortens the processing time needed for template matching against such character patterns and substantially reduces the rejection rate.

As information processing has become computerized and all processing has become faster, most data input still depends on manual work. Research and development of character reading technology has progressed with the aim of removing this bottleneck and achieving high-speed data input.

OCR (optical character recognition) technology has developed to a high level in response to these needs. Its well-known results include postal code readers, telephone call-meter readers, and the practical use of barcode readers in the supermarket industry.

However, with the spread of information processing by electronic computers, low-cost mass-market printers have now entered the market, and there is a demand for a character recognition method that can optically recognize characters at high speed even from the low-quality printed matter these printers produce.

[Conventional technology]

FIG. 4 shows the preprocessing of a conventional character recognition method.

To cut out characters while allowing for vertical positional deviation of each printed character line, or for skew (tilt) caused by relative misalignment of the paper, a mask whose width covers the horizontal search range 41 and the vertical search range 42 is moved over the image to obtain a histogram 43 of the black pixels of the character portion. The coordinates of the top, bottom, left, and right edges of the character are detected from this histogram, the position of the character's circumscribing frame is determined, and the character is cut out.
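The histogram-based cut-out described above can be illustrated with a minimal sketch. It assumes the mask window has already been extracted as a binary image (1 for a black pixel, 0 for white); the function name `bounding_box` and the list-of-rows representation are illustrative choices, not taken from the patent.

```python
def bounding_box(window):
    """Return (top, bottom, left, right) of the black pixels in a binary window.

    `window` is a list of rows, each a list of 0 (white) / 1 (black) pixels.
    Returns None when the window contains no black pixel.
    """
    # Projection histograms: number of black pixels per row and per column.
    row_hist = [sum(row) for row in window]
    col_hist = [sum(col) for col in zip(*window)]

    rows = [i for i, n in enumerate(row_hist) if n > 0]
    cols = [j for j, n in enumerate(col_hist) if n > 0]
    if not rows or not cols:
        return None

    # First and last non-empty rows/columns give the circumscribing frame.
    return rows[0], rows[-1], cols[0], cols[-1]


window = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
frame = bounding_box(window)  # top, bottom, left, right of the character
```

The frame is taken from the first and last non-empty histogram bins, so a character whose top or bottom is chipped yields a frame that is shifted relative to the intact character.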

In general, the lines of printed matter output by a line printer all show the same tendency, so the tendency can be assessed from several lines and the result reflected in character cutting.

However, many of the low-cost printers on the mass market print characters in irregular alignment, so that chipping of the tops or bottoms of characters appears at random. Handling such character patterns has unavoidably increased processing time.

[Problem that the invention seeks to solve]

Thus, the conventional character recognition method cannot cut out characters in a way that accounts for the missing parts when chipping of printed characters appears at random.

Consequently, taking template matching as an example, the matching process has had to cope by widening the shift range, and this inevitably increases processing time.

In addition to the conventional method of cutting out characters based on the per-line skew amount, the present invention adds per-column data for the columns in which defective characters are located, in order to solve the above problem.

The aim is to calculate a more accurate matching position even for printed characters that are partly missing, and thereby to shorten character recognition processing time.

[Means for solving problems]

The present invention determines the true positional deviation of the character at each column position of each printed line, and calculates the average of this deviation for each same column.

By determining the matching start position with this average as the reference, matching can begin at a more accurate position, which solves the above problem.

[Operation]

In FIG. 1, the amount by which the center position 2 of a printed character deviates from the center position 3 of the ideal print line is the sum of the skew amount 4 of the printed character line and the intrinsic positional deviation amount 5 of the printed character.

Therefore, the skew amount 4 of each line is determined in the same way as in the conventional method, the intrinsic positional deviation amount 5 of the printed character is added to it, and a statistic is calculated for each same column across the lines.

As long as this statistic is within an appropriate quantitative range determined from the character line width that forms the character pattern and the number of scanning lines, and is within a standard deviation of 2σ, the average value can be used as the correction value that determines the matching start position for template matching.
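The per-column statistic and its acceptance test can be sketched as follows. The text does not give the exact form of the range check, so this is only one possible reading: the column mean is accepted as the correction value when its magnitude stays within `max_shift` and twice the standard deviation stays within `max_spread`. Both thresholds, and the function names, are hypothetical; in practice they would be derived from the stroke width and the number of scanning lines.

```python
from statistics import mean, pstdev


def column_correction(deviations, max_shift, max_spread):
    """Return the mean deviation of one column, or None if it is not usable.

    `deviations` holds the true positional deviation of the same column
    in each line. `max_shift` and `max_spread` are hypothetical thresholds.
    """
    m = mean(deviations)
    two_sigma = 2 * pstdev(deviations)  # spread of the column across lines
    if abs(m) <= max_shift and two_sigma <= max_spread:
        return m
    return None


def corrections_per_column(lines, max_shift, max_spread):
    """`lines` is a list of lines, each a list of per-column deviations."""
    # zip(*lines) regroups the data so that each tuple is one column.
    return [column_correction(col, max_shift, max_spread) for col in zip(*lines)]


lines = [
    [2, 0],
    [3, 6],
    [2, -5],
    [3, 7],
]
corrections = corrections_per_column(lines, max_shift=4, max_spread=2)
```

In this example the first column deviates consistently and its mean is accepted, while the second column scatters too widely and is rejected, so matching for that column would fall back to the uncorrected search.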

By determining an accurate and reasonable matching start position in this way, processing time can be shortened and the rejection rate reduced.

[Example]

FIG. 2 is an enlarged view, for explanation, of printed characters often seen in output from a low-cost held printer.

In these printed characters, the tops and bottoms are chipped because of vertical misalignment and uneven ink ribbon density.

As in this example, when the printer output is examined along the line direction (left to right), it is irregular and no common tendency is apparent. When each column is examined in the vertical direction across the lines, however, the positional deviation of the characters in each column shows regularity, even though the character type differs from line to line.

Therefore, the true positional deviation amount 21 at each character position is calculated from tanθ of the skew amount 22 of the printed character line and the intrinsic positional deviation amount 23 of the character in each column, and statistics of the positional deviation are obtained for each column over multiple lines.

Note that tanθ can be obtained, for example, by printing skew detection marks (the minus sign "-" is generally used) at the beginning and end of each character line and examining their relative positions in the image.
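The calculation of tanθ from the two skew detection marks, and of the true positional deviation as the sum of the skew contribution and the intrinsic deviation, can be sketched as below. The coordinate convention (x along the line, y perpendicular to it, the start mark as origin of the skew contribution) and the function names are assumptions made for illustration.

```python
def skew_tangent(start_mark, end_mark):
    """tan(theta) of the line skew from the (x, y) centers of the two marks."""
    (x0, y0), (x1, y1) = start_mark, end_mark
    if x1 == x0:
        raise ValueError("marks must be at different horizontal positions")
    return (y1 - y0) / (x1 - x0)


def true_deviation(column_x, tan_theta, intrinsic_deviation, start_x=0.0):
    """Skew contribution at this column plus the character's own deviation."""
    skew_amount = (column_x - start_x) * tan_theta
    return skew_amount + intrinsic_deviation


# Marks found at the start and end of one line (illustrative pixel coordinates).
tan_theta = skew_tangent((0, 10), (200, 14))
deviation = true_deviation(column_x=100, tan_theta=tan_theta, intrinsic_deviation=1.5)
```

Because the skew term grows linearly with the horizontal position, separating it out leaves a per-column remainder that can be compared across lines.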

As long as the statistics obtained in this way are within the appropriate quantitative range described above, the cut-out pattern is corrected by the average value, and the matching start position for template matching is determined.
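To show why a better start position shortens matching, here is a minimal sketch of template matching over vertical shifts. It assumes binary images of equal width and a simple pixel-mismatch count as the matching score; the patent does not specify the score, and the names `mismatch` and `best_shift` are hypothetical.

```python
def pixel(img, r, c):
    """Pixel value, treating rows outside the image as white (0)."""
    return img[r][c] if 0 <= r < len(img) else 0


def mismatch(pattern, template, dy):
    """Count differing pixels with the template placed dy rows down the pattern."""
    return sum(
        template[r][c] != pixel(pattern, r + dy, c)
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def best_shift(pattern, template, start, radius):
    """Search shifts in [start - radius, start + radius] for the lowest mismatch.

    Ties are resolved in favor of the shift closest to `start`.
    """
    candidates = range(start - radius, start + radius + 1)
    return min(candidates, key=lambda dy: (mismatch(pattern, template, dy), abs(dy - start)))


template = [[1, 1],
            [1, 0]]
pattern = [[0, 0],
           [0, 0],
           [1, 1],
           [1, 0],
           [0, 0]]
# With the column average (here 2) as the start, a radius of 1 is enough.
shift = best_shift(pattern, template, start=2, radius=1)
```

The number of score evaluations is 2 * radius + 1, so starting from the corrected position with a small radius costs less than searching a wide range around an uncorrected position.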

FIG. 3 shows an example of a cut-out pattern whose matching position has been determined from the above average value: 31 is the pattern as cut out, 32 is the position corrected by the average value (the optimal matching position), and 33 is the correction amount.

Thus, even when a printed character is considerably chipped because of misalignment, as in FIG. 3, the character can still be identified with reasonable alignment, provided the missing part contains no distinguishing feature.

Low-quality printed characters can therefore be handled as well, so that matching time is shortened and the rejection rate, that is, the rate at which characters are judged unmatchable, is substantially reduced.

In addition, in output from an ordinary line printer, each line as a whole is skewed in the same way. For such output, the function described above, which collects and processes per-column data over multiple lines, is used to accumulate optimal position information each time the printed characters of a line are matched, so that matching can start from a point closer to the optimal position the further down the page processing proceeds.
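The line-by-line accumulation of optimal position information can be sketched with a running mean per column. This is only an illustration under the assumption that the accumulated value is a plain average of the best offsets found so far; the class name `ColumnOffsetTracker` and its methods are hypothetical.

```python
class ColumnOffsetTracker:
    """Keeps a running mean of the best matching offset for each column."""

    def __init__(self, n_columns):
        self.count = [0] * n_columns
        self.mean = [0.0] * n_columns

    def start_positions(self):
        """Start offsets to use for the next line, one per column."""
        return list(self.mean)

    def update(self, best_offsets):
        """Fold in the best offsets found while matching one line."""
        for col, offset in enumerate(best_offsets):
            self.count[col] += 1
            # Incremental mean: no need to store the offsets of earlier lines.
            self.mean[col] += (offset - self.mean[col]) / self.count[col]


tracker = ColumnOffsetTracker(n_columns=2)
tracker.update([2, -1])   # best offsets found on the first line
tracker.update([4, -3])   # best offsets found on the second line
starts = tracker.start_positions()
```

After the last line, the same accumulated values could be used as start positions for a second pass over the first lines, which were matched before much information had been collected.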

It is also possible to retry the first line using the optimal position information available at the last line, achieving character recognition through still more accurate matching.

[Effect of the invention]

According to the present invention, the optimal matching position can be obtained even for low-quality printed characters, so that template matching time is shortened and the rejection rate due to failed matching is substantially reduced, making this a highly effective character recognition method.

[Brief explanation of drawings]

FIG. 1 is a diagram explaining the method of the present invention, FIG. 2 is a diagram explaining an embodiment of the present invention, FIG. 3 is a diagram illustrating the optimal matching position, and FIG. 4 is an example of a conventional character cut-out process. In FIG. 1, 1 is the true positional deviation amount of a printed character, 2 is the center position of the printed character, 3 is the center position of the ideal print line, and 4 is the skew amount of the printed character line.

Claims

[Claims] A character recognition method characterized in that, for multiple lines of characters, the true positional deviation amount (1) of each printed character is determined, a statistic of the positional deviation amount (1) is calculated for each same column across the lines, and, as long as this statistic is within a predetermined quantitative range, template matching is performed using the average value of the positional deviation amount as the correction value for the matching start position for each same column of each line.