JPS60153574A - Character reading system

Info

Publication number: JPS60153574A
Application number: JP59009831A
Authority: JP
Inventors: Sueji Miyahara; 末治宮原
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1984-01-23
Filing date: 1984-01-23
Publication date: 1985-08-13
Anticipated expiration: 2009-02-23
Also published as: JPH0614372B2

Abstract

PURPOSE: To read characters accurately even when characters of different sizes touch each other, by determining the number of divisions according to the size of a lump of a black string on a character line when that lump is larger than a fixed section, and by applying a plurality of mutually different character segmenting methods.

CONSTITUTION: Characters on a form are converted into binary pattern data by a photoelectric conversion circuit and stored temporarily in a pattern memory 12 via an input terminal 11. A character segmenting part 13 takes from the pattern memory 12 a row pattern containing one line of characters and, while moving a point of interest in the row direction, scans in the column direction to obtain data (black string data) in which each position containing the pattern is represented by its number of black picture elements. The character segmenting part 13 then segments individual patterns or forced separating patterns from the row pattern as discrimination patterns on the basis of the black string data and sends them to a feature extracting part 14. The feature extracting part 14 extracts the features of each character, a discrimination part 15 collates the extracted features with a discrimination dictionary part 16, and a character decision part 17 processes the data sent to it and outputs the selected character as the character reading result.

Description

[Detailed Description of the Invention] (Technical Field) The present invention relates to a character reading method capable of reading, with high accuracy and at high speed, the characters of a document that contains many touching characters, such as a document in which the character pitch is equal to the character size.

(Prior Art) The present inventor previously invented a character reading method in which text on a form is scanned and photoelectrically converted, and characters are segmented one by one from the resulting character-line pattern and recognized. That method has a segmentation step and a character determination step. The segmentation step counts the clusters of dot sequences lying within a predetermined fixed interval on the character line. If there is one cluster, the interval is segmented as a single-character pattern. If there are several, the clusters are combined successively in appropriate ways, and each of the resulting combination patterns is segmented as a single-character pattern. The segmented patterns are output together with information on their segmentation. The character determination step uses the identification result of each segmented pattern and its segmentation information: if the pattern was treated as a single-character pattern, its identification result is output as is; if it was treated as several patterns, the identification result corresponding to the combination pattern with the longest pattern width is output from among the identification results of the combination patterns. That invention is the subject of a pending patent application by the present applicant (Japanese Patent Application No. 57-222489). The prior invention has the advantage of reading accurately and at high speed documents whose character pitch is not constant and documents that mix full-width and half-width characters. However, there was a risk that the intended reading result would not be obtained in some cases, for example when characters of different sizes touch each other or when one of the touching characters is partly missing.

(Object of the Invention) In view of the problems described above, the object of the present invention is to provide a character reading method that can read with still higher accuracy and at high speed even when characters of different sizes touch each other or when one of the touching characters is partly missing.

(Structure of the Invention) To achieve the above object, the first invention is a character reading method in which characters on a form are scanned and photoelectrically converted, and characters are segmented one by one from the resulting black-and-white binary character-line pattern and recognized. It is characterized by comprising a character segmentation step and a character determination step. The character segmentation step counts the clusters of dot sequences lying within a predetermined fixed interval on the character line and, when a cluster of dot sequences on the character line is larger than the predetermined fixed interval, determines the number of divisions according to the size of the cluster, applies a plurality of mutually different character segmentation methods, and outputs information on the segmentation together with the forcibly separated segmented patterns. The character determination step uses the identification results of the segmented patterns and the information on their segmentation, regards the segmentation method showing the highest certainty value among the plurality of segmentation methods as the optimal one, and outputs the identification results obtained with that method as the character reading result.

The second invention is a character reading method of the same kind, characterized by comprising a character segmentation step and a character determination step. The character segmentation step counts the clusters of dot sequences lying within a predetermined fixed interval on the character line and, when a cluster of dot sequences on the character line is larger than the predetermined fixed interval, sets a plurality of division numbers for the cluster according to its size, applies a plurality of mutually different character segmentation methods, and outputs information on the segmentation together with the forcibly separated segmented patterns. The character determination step uses the identification results of the segmented patterns and the information on their segmentation, regards the segmentation method showing the highest certainty value among the plurality of segmentation methods as the optimal one, and outputs the identification results obtained with that method as the character reading result.

(Embodiment) The drawings show an embodiment of the present invention. In the figures, 11 is an input terminal, 12 is a pattern memory, 13 is a character segmentation unit, 14 is a feature extraction unit, 15 is an identification unit, 16 is an identification dictionary unit, 17 is a character determination unit, and 18 is an output terminal.

The operation of each part in the above configuration is described below. First, the characters on the form are converted into black-and-white binary pattern data by a photoelectric conversion device (not shown), and the data are stored temporarily in the pattern memory 12 via the input terminal 11. The character segmentation unit 13 takes from the pattern memory 12 a line pattern 20 containing one line of characters, as shown in Fig. 2. It then moves the point of interest in the row direction (arrow X in the figure) while scanning in the column direction (arrow Y in the figure), and obtains data 30 in which each position where the pattern exists is represented by its number of black pixels and each position where it does not exist is represented by 0 (hereinafter called dot-sequence data).

Based on the dot-sequence data 30, the character segmentation unit 13 executes the character segmentation processing described later. It segments individual patterns 21, forced-separation patterns 22, and the like from the line pattern 20 as patterns for identification, and sends each of them in sequence to the feature extraction unit 14 as character pattern data for identification, paired with its segmentation information. The segmentation information consists of the segmentation position in the line pattern 20; the number N of dot-sequence clusters within the fixed interval α; the operation number DNO, which indicates how many times the operation for detecting dot-sequence clusters has been repeated; the pattern number PNO of a pattern created by combining the clusters within the fixed interval α; the number M of kinds of forced separation; and the number of separations L for each kind of forced separation.

The feature extraction unit 14 extracts character features from the received character pattern and sends the feature data and the segmentation information to the identification unit 15. The identification unit 15 matches the features against the identification dictionary unit 16, identifies the character patterns in sequence, and sends each identification result (for example, a character code and a similarity) to the character determination unit 17, paired with the segmentation information. The character determination unit 17 applies the processing described later to the received data and outputs the selected result to the output terminal 18 as the character reading result.
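The dot-sequence data described above amounts to a column-wise count of black pixels, with a cluster being a run of columns whose count is nonzero. The following is a minimal sketch under that reading, not the patent's implementation; the function names `dot_sequence` and `find_clusters` and the tiny sample image are hypothetical.

```python
from typing import List, Tuple


def dot_sequence(line_pattern: List[List[int]]) -> List[int]:
    """Count black pixels (value 1) in each column of a binary line image."""
    if not line_pattern:
        return []
    width = len(line_pattern[0])
    # Scan each column (Y direction) while stepping along the row (X direction).
    return [sum(row[x] for row in line_pattern) for x in range(width)]


def find_clusters(dots: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, of nonzero runs."""
    clusters: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(dots):
        if count > 0 and start is None:
            start = x  # a cluster begins at the first nonzero column
        elif count == 0 and start is not None:
            clusters.append((start, x))  # a zero column closes the cluster
            start = None
    if start is not None:
        clusters.append((start, len(dots)))  # cluster runs to the right edge
    return clusters


# Invented 3 x 7 sample image, used only to exercise the functions.
sample = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 0, 1],
]
sample_dots = dot_sequence(sample)
sample_clusters = find_clusters(sample_dots)
```

The width of a cluster, `end - start`, is the quantity that the segmentation unit would compare with the fixed interval α.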

The character segmentation processing by which the character segmentation unit 13 creates the forced-separation patterns 22 is shown in Fig. 3. The cases in which the fixed interval α of the line pattern 20 contains no cluster of dot sequences, or contains one or several clusters, are described in detail in Japanese Patent Application No. 57-222489 and are not discussed here. When a cluster of dot sequences is larger than the fixed interval α, the number L of character patterns to be segmented is determined from the size of the cluster, and segmentation is performed using this value. For example, when the width BL of the cluster satisfies (L − 1/2)·MA ≤ BL < (L + 1/2)·MA, where MA is the average character width, L is taken as the number of characters to segment. Likewise, when the relation to the average width MA′ of half-width characters is (L′ − 1/2)·MA′ ≤ BL < (L′ + 1/2)·MA′, L′ is also taken as a number of characters to segment.

Several kinds (M kinds) of segmentation method are applied. For example, when M = 3, the methods are: (1) cutting in units of the average character width MA from the start position of the cluster; (2) cutting in units of the average character width MA from the end position of the cluster; and (3) treating the points that divide the span between the start position and the end position of the cluster into L equal parts as candidate cut positions.

The segmentation information is sent, together with the segmented patterns, to the subsequent identification unit 15. This information includes the number L of character patterns segmented, the number M of segmentation kinds, the pattern number PNO (PNO = 1 to L) assigned in sequence to the character patterns cut out by one segmentation method, and the operation number DNO (DNO = 1 to M) assigned to each segmentation method, starting from 1 and increasing by 1. The identification unit 15 matches the features of the character pattern extracted by the feature extraction unit 14 against the character features held in the identification dictionary unit 16, selects those whose similarity is at or above a fixed value as the identification result, and outputs the character code, the similarity, and so on to the character determination unit 17 together with the segmentation information. The character determination unit 17 performs the character determination processing shown in Fig. 4, using the segmentation information and the identification results sent from the identification unit 15.
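The division-count rule and the three example segmentation methods can be sketched as follows. This is an illustration under stated assumptions: the inequality is read as (L − 1/2)·MA ≤ BL < (L + 1/2)·MA, the text does not say how the leftover width is handled in methods (1) and (2), so here the last or first piece simply absorbs it, and the names `division_count` and `cut_segments` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # (start, end) columns, end exclusive


def division_count(bl: float, ma: float) -> int:
    """Return L with (L - 1/2) * ma <= bl < (L + 1/2) * ma."""
    return math.floor(bl / ma + 0.5)


def _to_segments(start: int, cuts: List[int], end: int) -> List[Segment]:
    """Turn interior cut positions into consecutive segments."""
    bounds = [start] + cuts + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def cut_segments(start: int, end: int, ma: int, count: int) -> Dict[int, List[Segment]]:
    """Candidate segments keyed by operation number DNO (1 to 3)."""
    # Method 1: step by MA from the start; the last piece takes the remainder.
    from_start = [start + i * ma for i in range(1, count)]
    # Method 2: step by MA back from the end; the first piece takes the remainder.
    from_end = sorted(end - i * ma for i in range(1, count))
    # Method 3: divide the span into `count` equal parts.
    equal = [start + round(i * (end - start) / count) for i in range(1, count)]
    return {
        1: _to_segments(start, from_start, end),
        2: _to_segments(start, from_end, end),
        3: _to_segments(start, equal, end),
    }


# Invented numbers: a cluster spanning columns 10..52 with MA = 20.
L_example = division_count(52 - 10, 20)
segments_example = cut_segments(10, 52, 20, L_example)
```

Each list in the returned dictionary holds the patterns that would carry pattern numbers PNO = 1 to L for that operation number.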

In Fig. 4, the segmentation information sent from the identification unit 15 is used to judge whether an identification result is that of a forced-separation pattern. If it is, the identification result is stored temporarily in a buffer. When the final identification result of the consecutive forced-separation patterns has arrived, a selection process chooses the segmentation method with the highest certainty from the results stored in the buffer and outputs its results as the reading result.
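The buffering flow of Fig. 4 might be sketched as below. It is only an outline under assumptions: the end of a run of forced-separation results is detected here by DNO = M and PNO = L, which the text does not state explicitly, and the `Result` record, the `decide` generator, and the stand-in selector `pick_last_method` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Result:
    char: str            # identified character
    similarity: float    # similarity reported by the identification unit
    forced: bool         # True if the pattern is a forced-separation pattern
    dno: int = 1         # operation number DNO (1 to m_kinds)
    pno: int = 1         # pattern number PNO (1 to l_count)
    m_kinds: int = 1     # number M of segmentation kinds
    l_count: int = 1     # number L of patterns per kind


Selector = Callable[[List[Result]], List[Result]]


def decide(results: Iterable[Result], select: Selector) -> Iterator[str]:
    """Pass ordinary results through; buffer forced-separation results."""
    buffer: List[Result] = []
    for r in results:
        if not r.forced:
            yield r.char  # ordinary pattern: output as is
            continue
        buffer.append(r)
        # Assumed end-of-run test: last pattern of the last method.
        if r.dno == r.m_kinds and r.pno == r.l_count:
            for chosen in select(buffer):
                yield chosen.char
            buffer = []


def pick_last_method(buf: List[Result]) -> List[Result]:
    """Stand-in selector used only to exercise the flow."""
    last = max(r.dno for r in buf)
    return [r for r in buf if r.dno == last]
```

In practice the selector passed to `decide` would be the certainty-based choice among segmentation methods rather than this placeholder.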

Next, the process of character segmentation and character determination is explained using the line pattern 20 of Fig. 2 as an example.

The selection processing in character determination could use the similarity of the identification results or their priority (rank); similarity is used in this explanation. In the line pattern 20, the pattern "方定" in target interval 2 has dot-sequence data 30 larger than the predetermined fixed interval α, so the character segmentation unit 13 performs forced-separation processing. Because the cluster width BL is approximately equal to 2MA, the number of divisions is L = 2. If the M = 3 example above is adopted as the segmentation methods, the output of the character segmentation unit 13 consists, as shown in Fig. 5, of six patterns (two for each of the three methods; those legible in the source text include "方", "定", "右", and "宗"), each with its segmentation information. The character determination unit 17 detects that this interval is an interval of forced-separation patterns and selects the result with the highest certainty from among the identification results. That is, for each operation number, it computes the average similarity of the identification results of the patterns that carry pattern numbers, adopts the segmentation method with the highest average (item 2 in the example of Fig. 5) as the optimal segmentation method, and sends the corresponding identification results "方" and "定" to the output terminal 18 as the character reading result.
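The selection by average similarity can be illustrated with a short sketch. The similarity values and the placeholder characters for methods 1 and 3 are invented for the example, since the text does not give the contents of Fig. 5; the tuple layout and the name `select_best` are likewise assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# (DNO, PNO, identified character, similarity)
Candidate = Tuple[int, int, str, float]


def select_best(candidates: List[Candidate]) -> Tuple[int, str]:
    """Pick the operation number with the highest mean similarity."""
    groups: Dict[int, List[Candidate]] = defaultdict(list)
    for cand in candidates:
        groups[cand[0]].append(cand)  # group results by operation number DNO
    best_dno = max(groups, key=lambda d: mean(c[3] for c in groups[d]))
    # Emit the winning method's characters in pattern-number order.
    ordered = sorted(groups[best_dno], key=lambda c: c[1])
    return best_dno, "".join(c[2] for c in ordered)


# Invented similarities; "?" marks patterns whose identification is unknown.
example = [
    (1, 1, "?", 0.42), (1, 2, "?", 0.55),
    (2, 1, "方", 0.93), (2, 2, "定", 0.90),
    (3, 1, "?", 0.61), (3, 2, "?", 0.58),
]
best = select_best(example)
```

With these invented values the second method wins, which mirrors the outcome described for item 2 of Fig. 5. A rank-based criterion, which the text mentions as an alternative, would replace only the `key` function.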

Thus, according to the above embodiment, the size of a cluster of dot sequences in the character line is used to distinguish a single character pattern from several touching character patterns, so an interval to be segmented as one character can be told apart from an interval to be segmented as several characters. In addition, because several segmentation counts and several kinds of segmentation method are applied, touching half-width characters as well as touching full-width characters can be segmented, which raises the character reading accuracy. Furthermore, since the character segmentation unit 13 need only cut out patterns mechanically according to the size of the cluster, the whole character reading process can be organized as a pipeline, which speeds up processing.

In the character segmentation step of the above embodiment, a plurality of division numbers may be set for a cluster of dot sequences according to the size of the cluster, with a plurality of mutually different character segmentation methods applied. This improves the reading accuracy still further.

(Effects of the Invention)

As described above, the present invention concerns a character reading method that segments characters one by one from the character-line pattern obtained by scanning and photoelectrically converting a document on a form, and recognizes them. The size of each cluster of dot sequences on the character line is examined. When a cluster is larger than a predetermined fixed interval, the number of divisions is determined according to the size of the cluster, and several mutually different segmentation methods are applied, each piece being forcibly cut out as a single-character pattern. The segmented patterns are output with their segmentation information, and that information is used to judge that a pattern is a forced-separation pattern. The segmentation method with the highest certainty is found from the identification results of the forced-separation patterns, and its identification results are output as the character reading result. Documents containing touching characters can therefore be read unambiguously without complicated processing, processing is faster, and high accuracy is obtained. Moreover, when several division numbers are set for a cluster according to its size and several mutually different segmentation methods are executed, a large number of varied forced-separation patterns can be obtained, so the reading accuracy can be improved still further.

[Brief Description of the Drawings]

Fig. 1 is a block diagram showing an embodiment of a character reading apparatus to which the method of the present invention is applied. Fig. 2 is an explanatory diagram showing an example of a line pattern and its dot-sequence data. Fig. 3 is a flowchart of the character segmentation unit 13. Fig. 4 is a flowchart of the character determination unit 17. Fig. 5 is an explanatory diagram showing how character segmentation, identification, and character determination are carried out for a line pattern.

11: input terminal; 12: pattern memory; 13: character segmentation unit; 14: feature extraction unit; 15: identification unit; 16: identification dictionary unit; 17: character determination unit; 18: output terminal.

Patent applicant: Nippon Telegraph and Telephone Public Corporation. Agent: patent attorney 吉　１）　精　、孝

Claims

[Claims]

(1) A character reading method in which characters on a form are scanned and photoelectrically converted, and characters are segmented one by one from the resulting black-and-white binary character-line pattern and recognized, the method being characterized by comprising: a character segmentation step of counting the clusters of dot sequences lying within a predetermined fixed interval on the character line and, when a cluster of dot sequences on the character line is larger than the predetermined fixed interval, determining the number of divisions according to the size of the cluster, applying a plurality of mutually different character segmentation methods, and outputting information on the segmentation together with the forcibly separated segmented patterns; and a character determination step of using the identification results of the segmented patterns and the information on their segmentation to regard the segmentation method showing the highest certainty value among the plurality of segmentation methods as the optimal segmentation method, and outputting the identification results obtained with that method as the character reading result.

(2) A character reading method in which characters on a form are scanned and photoelectrically converted, and characters are segmented one by one from the resulting black-and-white binary character-line pattern and recognized, the method being characterized by comprising: a character segmentation step of counting the clusters of dot sequences lying within a predetermined fixed interval on the character line and, when a cluster of dot sequences on the character line is larger than the predetermined fixed interval, setting a plurality of division numbers for the cluster according to its size, applying a plurality of mutually different character segmentation methods, and outputting information on the segmentation together with the forcibly separated segmented patterns; and a character determination step of using the identification results of the segmented patterns and the information on their segmentation to regard the segmentation method showing the highest certainty value among the plurality of segmentation methods as the optimal segmentation method, and outputting the identification results obtained with that method as the character reading result.