JP2002230481A

JP2002230481A - Optical character reader

Info

Publication number: JP2002230481A
Application number: JP2001022150A
Authority: JP
Inventors: Yasuhiko Shimizu; 保彦清水
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2001-01-30
Filing date: 2001-01-30
Publication date: 2002-08-16

Abstract

PROBLEM TO BE SOLVED: To provide an OCR (optical character reader) which correctly recognizes a character superposed on a character frame. SOLUTION: An image of a form 1, read by an image acquisition part 10, is stored in an image memory 20. The image is then segmented by a character area segmenting part 31, and the character frame in the image is detected by a frame line detecting part 32. A frame line removal part 33 removes the character frame from the image segmented by the character area segmenting part 31 to obtain a character pattern. The character pattern is given to a character line width calculating part 35, which calculates the width of the lines constituting the character. A character line complementing part 36 determines whether the character has been partially removed together with the character frame and, when it has, complements the character with lines of the width calculated by the character line width calculating part 35. The complemented character pattern is given to a character recognition part 50 and is converted into a character code on the basis of a recognition dictionary 60.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a character extraction technique for an optical character reader (hereinafter referred to as "OCR").

[0002]

[Description of the Related Art] FIG. 2 is a block diagram showing an example of a conventional OCR. This OCR has an image acquisition unit 10 that optically reads an image of a form 1, on which the characters to be read are written, by decomposing the image into pixels. The image acquisition unit 10 is connected to an image memory 20 that temporarily stores the image of the read form. A character cutout unit 30 is connected to the image memory 20.

[0003] The character cutout unit 30 comprises a character area cutout unit 31, a frame line detection unit 32, a frame line removal unit 33, and a character extraction unit 34.

[0004] The character area cutout unit 31 cuts out an image of a character area from the image memory 20 based on information in a form format registration unit 40, in which format items such as the dimensions, line positions, number of characters, and character types of the form 1 are registered. The frame line detection unit 32 detects frame lines in the image of the character area.

[0005] The frame line removal unit 33 removes the frame lines detected by the frame line detection unit 32 from the image cut out by the character area cutout unit 31, and extracts only the character pattern. The character extraction unit 34 extracts the individual character patterns in the character area, and a character recognition unit 50 is connected to its output side.

[0006] The character recognition unit 50 extracts the features of each character pattern cut out by the character cutout unit 30, compares these features with the standard patterns registered in a recognition dictionary 60, and outputs the most similar character code. The recognition dictionary 60 is supplied with information such as the character types of the form 1 from the form format registration unit 40. Further, the output side of the character recognition unit 50 is connected to an operation unit 70, where knowledge processing such as word matching and processing such as confirmation and correction by an operator are performed.

[0007] FIGS. 3(a) to 3(d) are explanatory diagrams showing an example of processing by the OCR of FIG. 2. The operation of FIG. 2 is described below with reference to FIGS. 3(a) to 3(d). The form 1 to be read is read by the image acquisition unit 10, and a form image such as that shown in FIG. 3(a) is stored in the image memory 20.

[0008] Next, the character area cutout unit 31 is activated and, based on the information in the form format registration unit 40, cuts out from the image memory 20 an image of a character area containing the character string to be read and the frame lines for writing characters, as shown in FIG. 3(b). The frame line detection unit 32 is then activated; the frequency distributions of black pixels projected in the horizontal and vertical directions are calculated, and the vertical and horizontal frame lines are detected. The image of the character area cut out by the character area cutout unit 31 and the frame line information detected by the frame line detection unit 32 are supplied to the frame line removal unit 33.
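The projection-based frame line detection described above can be sketched as follows. This is only an illustration: the function name `detect_frame_lines`, the binary list-of-lists image format (1 = black, 0 = white), and the `ratio` threshold are assumptions, since the text does not say how the frequency distributions are evaluated.

```python
def detect_frame_lines(image, ratio=0.8):
    """Return (rows, cols) whose black-pixel count suggests a frame line.

    image: list of equal-length rows, 1 = black pixel, 0 = white pixel.
    ratio: hypothetical threshold; a row (column) is treated as a frame
           line when at least this fraction of its pixels is black.
    """
    height, width = len(image), len(image[0])
    # Projection onto the vertical axis: black pixels per row.
    row_hist = [sum(row) for row in image]
    # Projection onto the horizontal axis: black pixels per column.
    col_hist = [sum(image[y][x] for y in range(height)) for x in range(width)]
    rows = [y for y, n in enumerate(row_hist) if n >= ratio * width]
    cols = [x for x, n in enumerate(col_hist) if n >= ratio * height]
    return rows, cols


# A 6 x 8 character area: a box outline with a short character stroke inside.
area = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
rows, cols = detect_frame_lines(area)
print(rows, cols)  # [0, 5] [0, 7]
```

A fixed ratio is the simplest possible decision rule; a real form reader would likely also have to tolerate skew and frame lines thicker than one pixel.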

[0009] In the frame line removal unit 33, the frame line portions are removed from the image of the character area, as shown in FIG. 3(c). The character pattern obtained by removing the frame lines is supplied to the character extraction unit 34.

[0010] In the character extraction unit 34, the individual character patterns in the character area from which the frame lines have been removed are extracted, as shown in FIG. 3(d), and output to the character recognition unit 50.

[0011] In the character recognition unit 50, the features of each character pattern are extracted and compared with the standard patterns registered in the recognition dictionary 60. The character code of the most similar standard pattern is then output as the recognition result.

[0012]

[Problems to Be Solved by the Invention] However, the conventional OCR has the following problem. As shown in FIG. 3(a), when a character written on the form 1 (the numeral "2" in this figure) overlaps a frame line, part of the character is removed together with the frame line during the frame line removal processing in the character cutout unit 30, and the character is split apart as shown in FIG. 3(d). As a result, the character recognition unit 50 cannot recognize the character correctly, which leads to misreading or a failure to read.
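The splitting effect can be reproduced on a toy image. The sketch below is not part of the described apparatus; `remove_frame_lines` and `count_components` are hypothetical helpers, and the image is assumed to be a binary list of lists (1 = black). It shows that a stroke written across a frame line falls into two disconnected pieces once the frame line is erased.

```python
def remove_frame_lines(image, rows, cols):
    """Return a copy of image with the given frame rows/columns set to white."""
    row_set, col_set = set(rows), set(cols)
    return [
        [0 if (y in row_set or x in col_set) else px for x, px in enumerate(row)]
        for y, row in enumerate(image)
    ]


def count_components(image):
    """Count 8-connected groups of black pixels with an iterative flood fill."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            count += 1
            stack = [(y, x)]
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


# A vertical stroke (column 3) written across a horizontal frame line (row 2).
area = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
]
pattern = remove_frame_lines(area, rows=[2], cols=[])
print(count_components(area), count_components(pattern))  # 1 2
```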

[0013] The present invention solves this problem of the prior art and provides an OCR that can correctly cut out and recognize a character pattern that overlaps a frame line.

[0014]

[Means for Solving the Problems] To solve the above problem, the present invention provides an OCR comprising: an image acquisition unit that optically reads, by decomposing it into pixels, an image of a form on which a character frame for character entry and characters to be read are written; a frame line removal unit that removes the character frame from the image of the form read by the image acquisition unit to generate a character pattern; a character line width calculation unit that calculates the line width of a character in the character pattern; a character line complementing unit that determines whether part of a character in the character pattern has been removed together with the character frame and, if so, complements the removed portion with a line of the line width calculated by the character line width calculation unit; and a character recognition unit that recognizes the character pattern complemented by the character line complementing unit based on a recognition dictionary and converts it into a character code.

[0015] With the OCR configured as described above, the present invention operates as follows. The image of the form read by the image acquisition unit is supplied to the frame line removal unit, where the character frame portion is removed and a character pattern is generated. The character line width calculation unit calculates the line width of the characters in the character pattern. The character pattern is also supplied to the character line complementing unit, which determines whether part of a character has been removed together with the character frame. If part of a character has been removed, the removed portion is complemented with a line of the line width calculated by the character line width calculation unit. The complemented character pattern is supplied from the character line complementing unit to the character recognition unit and converted into a character code based on the recognition dictionary.

[0016]

[Embodiments of the Invention] FIG. 1 is a block diagram of an OCR showing an embodiment of the present invention. Elements common to those in FIG. 2 are denoted by the same reference numerals.

[0017] As in FIG. 2, this OCR has an image acquisition unit 10 that optically reads an image of the form 1, on which the characters to be read are written, by decomposing the image into pixels. The image acquisition unit 10 is connected to an image memory 20 that temporarily stores the image of the read form. A character cutout unit 30A, which differs from that of FIG. 2, is connected to the image memory 20.

[0018] The character cutout unit 30A has a character line width calculation unit 35 and a character line complementing unit 36 in addition to the character area cutout unit 31, the frame line detection unit 32, and the frame line removal unit 33, which are the same as those in FIG. 2.

[0019] That is, the character area cutout unit 31 cuts out an image of a character area from the image memory 20 based on information in the form format registration unit 40, in which format items such as the dimensions, line positions, number of characters, and character types of the form 1 to be read are registered in advance. The frame line detection unit 32 detects frame lines in the image of the character area. The frame line removal unit 33 removes the frame lines detected by the frame line detection unit 32 from the image of the character area cut out by the character area cutout unit 31, and extracts only the character pattern.

[0020] The character line width calculation unit 35, on the other hand, calculates the width (in pixels) of the lines that make up a character, based on the character pattern output from the frame line removal unit 33. The character line complementing unit 36 connects the parts of a character pattern that were separated by the frame line removal processing of the frame line removal unit 33 with a line of the width calculated by the character line width calculation unit 35, thereby complementing the character line and restoring a normal character pattern. The restored character patterns are extracted as individual character patterns by the character line complementing unit 36 and supplied to the character recognition unit 50.

[0021] The character recognition unit 50 extracts the features of each character pattern cut out by the character cutout unit 30A, compares these features with the standard patterns registered in advance in the recognition dictionary 60, and outputs the character code closest to the features. The recognition dictionary 60 is supplied with information such as the character types of the form 1 to be read from the form format registration unit 40. Further, an operation unit 70 is connected to the output side of the character recognition unit 50, where knowledge processing such as word matching and processing such as confirmation and correction by an operator are performed.

[0022] FIGS. 4(a) and 4(b) are explanatory diagrams showing an example of the operation of the character cutout unit 30A in FIG. 1. The operation of FIG. 1 is described below with reference to FIGS. 4(a) and 4(b).

[0023] The form 1 to be read is read by the image acquisition unit 10, and the form image is stored in the image memory 20. Next, the character area cutout unit 31 of the character cutout unit 30A is activated and, based on the information in the form format registration unit 40, cuts out from the image memory 20 an image of a character area containing the character string to be read and the frame lines for writing characters. The frame line detection unit 32 is then activated; the frequency distributions of black pixels projected in the horizontal and vertical directions are calculated, and the vertical and horizontal frame lines are detected. The image of the character area cut out by the character area cutout unit 31 and the frame line information detected by the frame line detection unit 32 are supplied to the frame line removal unit 33.

[0024] In the frame line removal unit 33, the frame line portions are removed from the image of the character area, and the character pattern with the frame lines removed, as shown in FIG. 4(a), is supplied to the character line width calculation unit 35.

[0025] The character line width calculation unit 35 calculates the character line width W for each character as follows. First, the black pixels that make up the character are counted, and their total is denoted A. Next, the entire character pattern is scanned with a window of 2 × 2 pixels, as shown at the upper left of FIG. 4(a). The number of window positions at which all four pixels in the window are black is counted and denoted Q. The character line width W is then calculated by the following equation (1).

W = A / (A - Q) ... (1)
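A short sketch of equation (1) follows. The function name `stroke_width` and the binary list-of-lists input (1 = black) are assumptions made for illustration. As a sanity check on the formula: for an idealized straight stroke w pixels wide and l pixels long, A = w·l and Q = (w − 1)(l − 1), so A − Q = w + l − 1 and W = w·l / (w + l − 1), which approaches w when the stroke is much longer than it is wide.

```python
def stroke_width(pattern):
    """Estimate the character line width W = A / (A - Q) of equation (1).

    A: number of black pixels in the pattern.
    Q: number of 2 x 2 window positions whose four pixels are all black.
    """
    height, width = len(pattern), len(pattern[0])
    a = sum(sum(row) for row in pattern)
    q = 0
    for y in range(height - 1):
        for x in range(width - 1):
            if (pattern[y][x] and pattern[y][x + 1]
                    and pattern[y + 1][x] and pattern[y + 1][x + 1]):
                q += 1
    if a == q:  # only happens for an empty pattern (A = Q = 0)
        return 0.0
    return a / (a - q)


# A horizontal bar 3 pixels thick and 20 pixels long: A = 60, Q = 2 * 19 = 38.
bar = [[1] * 20 for _ in range(3)]
print(round(stroke_width(bar), 2))  # 2.73, close to the true width of 3

# A one-pixel-wide line has no all-black 2 x 2 window, so W is exactly 1.
print(stroke_width([[1] * 10]))  # 1.0
```

The estimate slightly underestimates the width of short strokes, so rounding W to the nearest pixel before using it is a reasonable, though assumed, convention.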

[0026] The character line width W calculated by the character line width calculation unit 35 is supplied to the character line complementing unit 36.

[0027] The character line complementing unit 36 determines whether part of a character line was deleted together with a frame line by checking whether black pixels exist on both sides of the removed frame line. If black pixels exist on both sides, the gap between them is filled with black pixels based on the character line width W supplied from the character line width calculation unit 35. The star-shaped symbols in FIG. 4(b) indicate the black pixels filled into the missing portion of the character pattern.
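One possible reading of this complementing step, for a removed horizontal frame line, is sketched below. Everything beyond the text is an assumption: the names `black_runs` and `complement_gap`, the binary list-of-lists pattern (1 = black), the rule that pairs each stroke end above the gap with the nearest stroke end below it, and the straight-line interpolation between the two.

```python
def black_runs(row):
    """Return (start, end) index pairs (inclusive) of consecutive black pixels."""
    runs, start = [], None
    for x, px in enumerate(row + [0]):  # sentinel closes a trailing run
        if px and start is None:
            start = x
        elif not px and start is not None:
            runs.append((start, x - 1))
            start = None
    return runs


def complement_gap(pattern, top, bottom, width):
    """Fill rows top..bottom (a removed horizontal frame line) where strokes cross.

    width: character line width W in pixels, e.g. obtained from equation (1).
    """
    result = [row[:] for row in pattern]
    if top == 0 or bottom == len(pattern) - 1:
        return result  # no pixels on one side, so nothing can be bridged
    above = black_runs(pattern[top - 1])
    below = black_runs(pattern[bottom + 1])
    if not above or not below:
        return result  # black pixels are not present on both sides
    w = max(1, round(width))
    n_cols = len(pattern[0])
    steps = bottom - top + 2  # row distance between the two stroke ends
    for a_start, a_end in above:
        a_mid = (a_start + a_end) / 2
        # Pair this stroke end with the nearest stroke end below the gap.
        b_start, b_end = min(below, key=lambda r: abs((r[0] + r[1]) / 2 - a_mid))
        b_mid = (b_start + b_end) / 2
        for i, y in enumerate(range(top, bottom + 1), start=1):
            mid = a_mid + (b_mid - a_mid) * i / steps  # linear interpolation
            left = round(mid - (w - 1) / 2)
            for x in range(left, left + w):
                if 0 <= x < n_cols:
                    result[y][x] = 1
    return result


# A 2-pixel-wide vertical stroke whose middle row was erased with the frame.
pattern = [
    [0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0, 0],
]
fixed = complement_gap(pattern, top=2, bottom=2, width=2)
print(fixed[2])  # [0, 0, 0, 1, 1, 0, 0]
```

Because every stroke end above the gap is handled in turn, the same sketch also bridges several crossings of one frame line. The nearest-neighbor pairing has no distance limit, so it could join unrelated strokes; a practical version would need some bound on how far apart the paired ends may be.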

[0028] The individual character patterns in the character area obtained through complementing by the character line complementing unit 36 are output to the character recognition unit 50. In the character recognition unit 50, the features of each character pattern are extracted and compared with the standard patterns registered in advance in the recognition dictionary 60, and the character code closest to the features is output.
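The comparison against standard patterns can be illustrated with a nearest-pattern lookup. The text does not specify which features or similarity measure are used, so the choices here are purely illustrative assumptions: the raw pixel grid serves as the feature, the number of differing pixels as the distance, and `recognize` and the tiny 3 × 3 dictionary are hypothetical.

```python
def recognize(pattern, dictionary):
    """Return the character code whose standard pattern is closest to pattern.

    dictionary: maps a character code to a standard pattern of the same size.
    The feature is the raw pixel grid and the distance is the number of
    differing pixels; both are illustrative choices, not taken from the text.
    """
    def distance(p, q):
        return sum(a != b for row_p, row_q in zip(p, q) for a, b in zip(row_p, row_q))

    return min(dictionary, key=lambda code: distance(pattern, dictionary[code]))


# Tiny hypothetical "standard patterns".
dictionary = {
    "1": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "7": [[1, 1, 1], [0, 0, 1], [0, 0, 1]],
}
restored = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]  # a complemented character pattern
print(recognize(restored, dictionary))  # 1
```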

[0029] As described above, the OCR of this embodiment has the character line width calculation unit 35, which calculates the character line width W for each character pattern, and the character line complementing unit 36, which complements, based on the calculated character line width W, character patterns damaged by the frame line removal processing. This makes it possible to restore a nearly complete character pattern from a character pattern that overlapped a frame line, which has the advantage of improving the recognition rate.

[0030] The present invention is not limited to the above embodiment, and various modifications are possible. Examples of such modifications include the following.

[0031] (a) The method of calculating the character line width W is not limited to equation (1). For example, it may be calculated based on the distribution of the numbers of consecutive black pixels in the vertical and horizontal directions, the proportion of black pixels in the character pattern, and so on.
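As one hedged illustration of the run-length alternative mentioned in (a), the sketch below takes the most frequent length of consecutive black pixels, measured along both rows and columns, as the line width. The function names and the choice of the mode (rather than, say, a mean or median) are assumptions; the text only says that the distribution of run lengths may be used.

```python
from collections import Counter


def run_lengths(lines):
    """Yield the length of every run of consecutive black pixels in each line."""
    for line in lines:
        run = 0
        for px in line:
            if px:
                run += 1
            elif run:
                yield run
                run = 0
        if run:
            yield run


def stroke_width_by_runs(pattern):
    """Estimate line width as the most frequent horizontal or vertical run length."""
    columns = [list(col) for col in zip(*pattern)]
    counts = Counter(run_lengths(pattern))
    counts.update(run_lengths(columns))
    if not counts:
        return 0
    # Most frequent run length; ties are broken toward the shorter run.
    return min(counts, key=lambda length: (-counts[length], length))


# A horizontal bar 3 pixels thick and 20 pixels long: three horizontal runs
# of length 20 and twenty vertical runs of length 3.
bar = [[1] * 20 for _ in range(3)]
print(stroke_width_by_runs(bar))  # 3
```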

[0032] (b) The number of missing portions of a character pattern complemented by the character line complementing unit 36 is not limited to one. For example, when the numeral "0" overlaps a horizontal frame line, there are two missing portions. In such a case, black pixels may be added so that the ends of the nearest black pixels are connected to each other.

[0033]

[Effects of the Invention] As described in detail above, the present invention has a character line width calculation unit that calculates the line width of a character from the character pattern, and a character line complementing unit that complements the missing portions of a character pattern, removed together with the character frame by the frame line removal unit, with a line of the line width calculated by the character line width calculation unit. This makes it possible to restore a nearly complete character pattern from a character pattern that overlapped a frame line, and the recognition rate can be improved.

[Brief description of the drawings]

FIG. 1 is a block diagram of an OCR showing an embodiment of the present invention.

FIG. 2 is a block diagram showing an example of a conventional OCR.

FIG. 3 is an explanatory diagram showing an example of processing by the OCR of FIG. 2.

FIG. 4 is an explanatory diagram showing an example of the operation of the character cutout unit 30A in FIG. 1.

[Explanation of symbols]

1 Form
10 Image acquisition unit
20 Image memory
30 Character cutout unit
31 Character area cutout unit
32 Frame line detection unit
33 Frame line removal unit
35 Character line width calculation unit
36 Character line complementing unit
40 Form format registration unit
50 Character recognition unit
60 Recognition dictionary
70 Operation unit

Claims

[Claims]

An optical character reader comprising: an image acquisition unit that optically reads, by decomposing it into pixels, an image of a form on which a character frame for character entry and characters to be read are written; a frame line removal unit that removes the character frame from the image of the form read by the image acquisition unit to generate a character pattern; a character line width calculation unit that calculates the line width of a character in the character pattern; a character line complementing unit that determines whether part of a character in the character pattern has been removed together with the character frame and, if so, complements the removed portion with a line of the line width calculated by the character line width calculation unit; and a character recognition unit that recognizes the character pattern complemented by the character line complementing unit based on a recognition dictionary and converts it into a character code.