JPH06195521A - Character recognizing method

Info

Publication number: JPH06195521A
Application number: JP4345945A
Authority: JP
Inventors: Yukiya Sugiyama; 幸也杉山
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1992-12-25
Filing date: 1992-12-25
Publication date: 1994-07-15

Abstract

PURPOSE: To recognize Japanese documents accurately by judging that a recognition result is not the result of recognizing a Japanese document when a search restricted to the first candidate characters in one line yields fewer candidate words than a threshold value. CONSTITUTION: Language processing is performed on the recognition result of a character recognition part 13, which recognizes character image data and converts it into character codes; the character image data is obtained by a character segmentation part 12 that segments image data read by an image reader, character by character. At this time, a post-processing necessity decision part 15 judges that the recognition result is not the result of recognizing a Japanese document, and the language processing is not performed, if the number of characters in one line exceeds a threshold value, the first candidate characters in the line include a run of consecutive characters of a single character type longer than a threshold value, and a search restricted to the first candidate characters in the line yields fewer candidate words than a threshold value.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character recognition method for converting printed characters, dot-matrix characters, and handwritten character patterns in newspapers, magazines, novels, and the like into code information such as JIS codes.

[0002]

[Prior Art] The original text of a document to be recognized is shown in Table 1.

[0003]

[Table 1]

[0004] The result of recognizing the original text in Table 1 is shown in Table 2.

[0005]

[Table 2]

[0006] When post-processing using language processing is applied to the above recognition result, the post-processing combines characters within the candidate character groups, detects combinations that form a Japanese word, and adopts that word as the correct characters. As a result, the '位' at row 0, column 1 and the '相' at row 1, column 2 form the word "位相", so the correct character '移' is replaced by the incorrect character '相', and the recognition rate drops.
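The failure mode described in this paragraph can be reproduced with a minimal Python sketch. Everything in it is an assumption made for illustration: the function name `naive_postprocess`, the restriction to two-character words, and the candidate lists and one-word dictionary, which are hypothetical stand-ins and do not reproduce Tables 2 and 3.

```python
from itertools import product
from typing import List, Set


def naive_postprocess(candidates: List[List[str]], dictionary: Set[str]) -> str:
    """Adopt any two-character dictionary word that can be formed from
    the candidate groups at adjacent positions (simplified illustration)."""
    # Start from the first (highest-ranked) candidate at every position.
    result = [group[0] for group in candidates]
    for pos in range(len(candidates) - 1):
        # Try every pairing of candidates at positions pos and pos + 1.
        for a, b in product(candidates[pos], candidates[pos + 1]):
            if a + b in dictionary:
                result[pos], result[pos + 1] = a, b
                break
    return "".join(result)


# Hypothetical data: the correct text is a bare list of kanji, 亜 位 移.
# At the last position the correct '移' is ranked first and '相' second.
candidates = [["亜"], ["位"], ["移", "相"]]
dictionary = {"位相"}

# The word "位相" is found across candidate groups, so the correct '移'
# is overwritten by the lower-ranked '相'.
corrected = naive_postprocess(candidates, dictionary)
```

Because the input is a list of characters rather than a sentence, any word the dictionary happens to contain is a false match, which is why word-based correction lowers accuracy here.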

[0007] Table 3 shows the recognition result after it has been adversely affected in this way.

[0008]

[Table 3]

[0009]

[Problem to Be Solved by the Invention] Conventional character recognition methods apply post-processing using language processing to every recognition result, so the above problem arises in documents that consist of a list of characters of a single character type. Even if the user can specify whether post-processing is required, the problem cannot be completely prevented, because the user may specify it incorrectly.

[0010] In view of this point, an object of the present invention is to provide a character recognition method capable of accurately recognizing Japanese text.

[0011]

[Means for Solving the Problem] Language processing is performed on the character recognition result produced by a character recognition unit, which recognizes character image data and converts it into character codes; the character image data is cut out by a character segmentation unit that segments image data read by an image reading device into image data for each character. In doing so, the recognition result is judged not to be the result of recognizing Japanese text, and the language processing is not performed, if all of the following hold for the recognition result: the number of characters in one line is at or above a threshold; among the first candidate characters in the line, characters of a single character type occur consecutively in a number at or above a threshold; and a search for candidate words using only the first candidate characters in the line yields no more than a threshold number of candidate words. This prevents the recognition rate and the recognition speed from decreasing.
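The three conditions in this paragraph combine into a single predicate, sketched below in Python. The names `Thresholds` and `skip_language_processing` are hypothetical; the threshold names thr1 to thr3 and their default values (4, 5, 3) are the ones the embodiment uses. The comparisons follow the wording of this paragraph ("at or above", "no more than"); the embodiment's flowchart describes the run-length check as "exceeds", so the exact boundary handling is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    thr1: int = 4  # characters per line
    thr2: int = 5  # consecutive characters of one character type
    thr3: int = 3  # candidate words found from first candidates only


def skip_language_processing(length: int, longest_run: int, word_count: int,
                             t: Thresholds = Thresholds()) -> bool:
    """Return True when the line is judged NOT to be Japanese text,
    i.e. when all three conditions hold and post-processing is skipped."""
    return (
        length >= t.thr1            # the line is long enough to judge
        and longest_run >= t.thr2   # a long run of a single character type
        and word_count <= t.thr3    # hardly any dictionary words are found
    )


# A six-character line of kanji only, with no dictionary words: skipped.
kanji_list_line = skip_language_processing(length=6, longest_run=6, word_count=0)
# A six-character line of mixed character types: post-processing is applied.
mixed_line = skip_language_processing(length=6, longest_run=2, word_count=0)
```

Keeping the measurements separate from the decision makes it easy to change the boundary conditions without touching the code that counts characters, runs, or words.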

[0012]

[Operation] With the above configuration, the present invention can correctly recognize Japanese text.

[0013]

[Embodiment] A character recognition method according to an embodiment of the present invention is described below with reference to the drawings.

[0014] FIG. 1 is a block diagram of a character recognition apparatus using the character recognition method according to an embodiment of the present invention.

[0015] Reference numeral 11 denotes an image reading unit that photoelectrically converts the document to be recognized; 12, a character segmentation unit that cuts out image data for each character from the image data of the document; 13, a character recognition unit that recognizes the image data of a character region and converts it into a character code; 14, a line division unit that detects line feed codes in the candidate character groups and obtains the recognition result one line at a time; 15, a post-processing necessity determination unit that determines whether post-processing using language processing is required for a line; 16, a post-processing unit that performs post-processing using language processing; 17, an output unit that outputs the final recognition result; and 18, a word dictionary.

[0016] The operation of the character recognition method of this embodiment, configured as described above, is explained below with reference to the flowcharts of FIGS. 2 to 6.

[0017] First, in step s21, the image reading unit 11 photoelectrically converts the document to be recognized and obtains image data.

[0018] Next, in step s22, the character segmentation unit 12 cuts out image data for each character from the image data.

[0019] Next, in step s23, the character recognition unit 13 performs character recognition on the character image data. The recognition result is shown in Table 4.

[0020]

[Table 4]

[0021] Next, line division is performed. In step s31, i, which represents the character position, is set to an initial value of -1. In step s32, i + 1 (= 0) is assigned to head, which represents the start position of the line. In step s33, i + 1 (= 0) is assigned to i, advancing the character position by one.

[0022] In step s34, it is checked whether i exceeds the number of recognized characters. If it does not, the process proceeds to step s35, where it is checked whether the first candidate character at position i (= 0) is a line feed code. If it does, the process proceeds to step s37 and the recognition result is output. If the character is not a line feed code, the process returns to step s33.

[0023] In the same way, i is advanced one position at a time. When i reaches 6, the first candidate character is a line feed code, so i is assigned to tail in step s36. tail represents the end position of the line.

[0024] The above operation yields the range of the recognition result for one line. In step s40, the number of characters in the line is obtained: in step s41, (tail - head) is assigned to length.
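The line division of steps s31 to s41 can be sketched in Python as follows. This is an illustration under stated assumptions, not the flowchart itself: the function name `split_lines`, the use of "\n" as the line feed code, and the sample text are hypothetical, and each position is modeled as a list of candidate characters with the first candidate at index 0. Characters after the last line feed code are left unexamined in this sketch.

```python
from typing import List, Tuple

LINE_FEED = "\n"  # assumption: the line feed code is modeled as "\n"


def split_lines(candidates: List[List[str]]) -> List[Tuple[int, int]]:
    """Return (head, tail) for each line, where head is the first character
    position of the line and tail is the position of its line feed code."""
    spans = []
    head = 0
    for i, group in enumerate(candidates):
        # Only the first candidate at each position is inspected (step s35).
        if group and group[0] == LINE_FEED:
            spans.append((head, i))  # tail = i (step s36)
            head = i + 1             # next line starts after the line feed
    return spans


# Hypothetical recognition result: one candidate per position, two lines.
candidates = [[ch] for ch in "亜以宇絵尾可\n本日は晴れだ\n"]
spans = split_lines(candidates)

# length = tail - head for each line (step s41).
lengths = [tail - head for head, tail in spans]
```

With this input the first line spans head = 0 to tail = 6 and the second spans head = 7 to tail = 13, the same positions the walkthrough reports for its two lines.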

[0025] Next, it is checked whether length is thr1 or less. thr1 is a threshold for the number of characters per line and is one of the factors used to decide whether post-processing is required. Here it is assumed to be set to 4.

[0026] In step s42, head is assigned to j as its initial value, so j is 0. j represents the character position. In step s43, the character type of the first candidate at position j is set in kind; the character type is kanji. In step s44, cnt is set to an initial value of 1. cnt is used to count consecutive characters of the same character type. In step s45, 1 is added to j.

[0027] In step s46, it is checked whether j exceeds tail. If it does not, the process proceeds to step s47, where the character type of the first candidate at position j (= 1) is compared with kind. Both are kanji and therefore match, so the process proceeds to step s48 and 1 is added to cnt. With cnt = 2, step s49 checks whether cnt exceeds thr2. thr2 is the permitted number of consecutive occurrences of the same character type; if this value is exceeded, post-processing is not performed. Here it is set to 5.

[0028] Since cnt (= 2) does not exceed thr2 (= 5), the process returns to step s45.

[0029] In the same way, j is advanced one position at a time. When j reaches 5, cnt exceeds thr2.
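The run-length check of steps s42 to s51 can be sketched in Python as follows. The function names `char_kind` and `has_long_run` are hypothetical, and classifying character types by Unicode block is an assumption made for this sketch; the method itself works on JIS codes and does not say how character types are determined. The strict "exceeds thr2" comparison follows the walkthrough.

```python
def char_kind(ch: str) -> str:
    """Classify one character by Unicode block (assumed classification)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "alpha"
    return "other"


def has_long_run(first_candidates: str, thr2: int) -> bool:
    """Return True if some run of one character type is longer than thr2."""
    if not first_candidates:
        return False
    kind = char_kind(first_candidates[0])  # step s43
    cnt = 1                                # step s44
    for ch in first_candidates[1:]:        # steps s45 and s46
        current = char_kind(ch)
        if current == kind:                # step s47: same type as before
            cnt += 1                       # step s48
            if cnt > thr2:                 # step s49
                return True
        else:
            kind, cnt = current, 1         # steps s50 and s51: restart count
    return False


# Six consecutive kanji exceed thr2 = 5, as in the walkthrough (hypothetical text).
kanji_only = has_long_run("亜以宇絵尾可", thr2=5)
# Mixed kanji and hiragana never build a run longer than 5.
mixed = has_long_run("本日は晴れだ", thr2=5)
```

Resetting `cnt` whenever the character type changes means only uninterrupted runs count, so ordinary Japanese text, which alternates between kanji and kana, rarely triggers the check.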

[0030] head is assigned to key (s60), so key = 0. key represents the starting point for candidate word matching. In step s61, the words beginning with the character at position key are read from the word dictionary 18.

[0031] The character at position key = 0 is '亜', so reading the words that begin with '亜' from the dictionary yields Table 5.

[0032]

[Table 5]

[0033] The words in Table 5 are matched against the candidate characters (s62). No matching word is found, so nothing is counted.

[0034] It is checked whether key equals tail (s64). It does not. In the same way, key is advanced one position at a time and candidate words are extracted up to tail, but no candidate word is obtained.

[0035] The number of matches is compared with thr3 (s66). thr3 is a threshold for the number of candidate words and is set to 3 here. The number of candidate words is 0. Therefore, post-processing is not performed on this line.
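The candidate word search of steps s60 to s66 can be sketched in Python as follows. The function name `count_candidate_words`, the dictionary contents, and the sample lines are hypothetical; the sketch assumes that a candidate word is a dictionary word whose characters all appear, in order, among the first candidates starting at position key.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def count_candidate_words(first_candidates: str, dictionary: Iterable[str]) -> int:
    """Count dictionary words that match the first candidates at any
    starting position key within the line."""
    # Index the dictionary by first character, mirroring step s61, which
    # reads only the words beginning with the character at position key.
    by_first: Dict[str, List[str]] = defaultdict(list)
    for word in dictionary:
        if word:
            by_first[word[0]].append(word)

    count = 0
    for key in range(len(first_candidates)):             # key runs head..tail
        for word in by_first.get(first_candidates[key], ()):
            if first_candidates.startswith(word, key):   # step s62
                count += 1
    return count


# Hypothetical dictionary; it contains words starting with '亜', but none
# of them continues with the next first candidate of the kanji-only line.
dictionary = ["亜鉛", "亜流", "本日", "晴れ"]
thr3 = 3

kanji_list_words = count_candidate_words("亜以宇絵尾可", dictionary)
sentence_words = count_candidate_words("本日は晴れだ", dictionary)

# Step s66: few matches (no more than thr3) suggest the line is not Japanese text.
few_words = kanji_list_words <= thr3
```

Only the first candidate at each position is consulted here, unlike the post-processing itself, which combines lower-ranked candidates; that keeps the check cheap and avoids the false matches it is meant to prevent.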

[0036] The next line division is performed. In step s32, i + 1 is assigned to head, so head = 7.

[0037] In step s33, i + 1 is assigned to i, advancing the character position by one. i becomes 8.

[0038] In step s34, it is checked whether i exceeds the number of recognized characters. It does not, so the process proceeds to step s35, where it is checked whether the first candidate character at position i is a line feed code. It is not, so the process returns to step s33.

[0039] In the same way, i is advanced one position at a time. When i reaches 13, the first candidate character is a line feed code, so i is assigned to tail in step s36.

[0040] The above operation yields the range of the recognition result for one line. In step s40, the number of characters in the line is obtained: in step s41, (tail - head) is assigned to length.

[0041] Next, it is checked whether length is thr1 or less. Since length is thr1 or more, the process proceeds to step s42.

[0042] In step s42, head is assigned to j as its initial value, so j is 7. In step s43, the character type of the first candidate at position j is set in kind; the character type is kanji.

[0043] In step s44, cnt is set to an initial value of 1. In step s45, 1 is added to j, so j is 8.

[0044] In step s46, it is checked whether j exceeds tail. It does not, so the process proceeds to step s47.

[0045] In step s47, the character type of the first candidate at position j is compared with kind. Both are kanji, so the process proceeds to step s48 and 1 is added to cnt.

[0046] In step s49, cnt does not exceed thr2, so the process returns to step s45.

[0047] In step s45, 1 is added to j, so j is 9. In step s46, it is checked whether j exceeds tail. It does not, so the process proceeds to step s47.

[0048] In step s47, the character type of the first candidate at position j is compared with kind. They do not match, so the process proceeds to step s50.

[0049] In step s50, the character type of the first candidate at position j is assigned to kind. In step s51, cnt is reset to its initial value of 1.

[0050] In step s45, 1 is added to j, so j is 10. In step s46, it is checked whether j exceeds tail. It does not, so the process proceeds to step s47.

[0051] In step s47, the character type of the first candidate at position j is compared with kind. They do not match, so the process proceeds to step s50.

[0052] In step s50, the character type of the first candidate at position j is assigned to kind. In step s51, cnt is reset to its initial value of 1.

[0053] In step s45, 1 is added to j, so j is 11. In step s46, it is checked whether j exceeds tail. It does not, so the process proceeds to step s47.

[0054] In step s47, the character type of the first candidate at position j is compared with kind. They do not match, so the process proceeds to step s50.

[0055] In step s50, the character type of the first candidate at position j is assigned to kind. In step s51, cnt is reset to its initial value of 1.

[0056] In step s45, 1 is added to j, so j is 12. In step s46, it is checked whether j exceeds tail. It does not, so the process proceeds to step s47.

[0057] In step s47, the character type of the first candidate at position j is compared with kind. They do not match, so the process proceeds to step s50.

[0058] In step s50, the character type of the first candidate at position j is assigned to kind. In step s51, cnt is reset to its initial value of 1.

[0059] In step s45, 1 is added to j, so j is 13. In step s46, it is checked whether j exceeds tail. It does not, so the process proceeds to step s47.

[0060] In step s47, the character type of the first candidate at position j is compared with kind. They do not match, so the process proceeds to step s50.

[0061] In step s50, the character type of the first candidate at position j is assigned to kind. In step s51, cnt is reset to its initial value of 1.

[0062] In step s45, 1 is added to j, so j is 14. In step s46, it is checked whether j exceeds tail. It does, so the process proceeds to step s70. That is, in the range of character positions 7 to 13 there was no run of the same character type exceeding str1. Therefore, this line is judged to be Japanese text, and processing by the post-processing unit 16 is performed.

[0063] Therefore, this line is judged to be Japanese text, and processing by the post-processing unit 16 is performed.

[0064]

[Effect of the Invention] With this configuration, the present invention can correctly recognize Japanese text.

[Brief description of drawings]

[FIG. 1] A block diagram of a character recognition apparatus using a character recognition method according to an embodiment of the present invention.

[FIG. 2] A flowchart showing a control procedure of the character recognition apparatus using the character recognition method according to an embodiment of the present invention.

[FIG. 3] A flowchart showing a control procedure of the character recognition apparatus using the character recognition method according to an embodiment of the present invention.

[FIG. 4] A flowchart showing a control procedure of the character recognition apparatus using the character recognition method according to an embodiment of the present invention.

[FIG. 5] A flowchart showing a control procedure of the character recognition apparatus using the character recognition method according to an embodiment of the present invention.

[FIG. 6] A flowchart showing a control procedure of the character recognition apparatus using the character recognition method according to an embodiment of the present invention.

[Explanation of symbols]

11 image reading unit
12 character segmentation unit
13 character recognition unit
14 line division unit
15 post-processing necessity determination unit
16 post-processing unit
17 output unit
18 word dictionary

Claims

[Claims]

1. A character recognition method in which, when post-processing using language processing is performed on a character recognition result produced by a character recognition unit that recognizes character image data and converts it into character codes, the character image data being cut out by a character segmentation unit that segments image data read from an image reading device into image data for each character, the recognition result is judged not to be the result of recognizing Japanese text and the language processing is not performed if, in the recognition result, the number of characters in one line is at or above a threshold, and among the first candidate characters in the line, characters of a single character type occur consecutively in a number at or above a threshold, and a search for candidate words using only the first candidate characters in the line yields no more than a threshold number of candidate words.