JP2000187704A - Character recognition device, its method and storage medium

Info

Publication number: JP2000187704A
Application number: JP10365509A
Authority: JP
Inventors: Tadanori Nakatsuka; 忠則中塚
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1998-12-22
Filing date: 1998-12-22
Publication date: 2000-07-04

Abstract

PROBLEM TO BE SOLVED: To let a user identify more accurately the characters whose similarity is uncertain, and thereby to improve the usability of a character recognition device. SOLUTION: An image is input in step S31, and characters are segmented from the input image in step S32. In step S33, character recognition processing is applied to each character segmented in step S32 to obtain a similarity for each. In step S34, the similarity obtained in step S33 is corrected based on the size (segmentation size) of each character segmented in step S32. When the recognition result is displayed in step S35, results whose similarity is below a prescribed threshold are made distinguishable to the user, for example by changing the character color.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a character recognition apparatus and method for recognizing characters based on patterns present in input image data.

[0002]

[Description of the Related Art] In general, a character recognition device divides a character image into several regions, extracts the directional components of the character in each region, compares them with the stored directional components of the characters to be recognized to calculate a similarity, and outputs the recognition result.

[0003]

[Problems to Be Solved by the Invention] In character recognition, a character pattern generally needs a certain size for the similarity to be calculated accurately. That is, in some cases the character is large and the similarity can be determined accurately, while in others the character is small and crushed, and the similarity cannot be determined very accurately. Superscripts, punctuation marks, and the like in particular are small and easily crushed, so their similarity often cannot be determined accurately.

[0004] In conventional character recognition processing, no distinction is made between the case where a character is large enough for its similarity to be determined accurately and the case where it is not. It has therefore been difficult to distinguish probably correct characters from uncertain, possibly erroneous characters and report them to the user, or to apply special processing to the uncertain characters.

[0005] The present invention has been made in view of the above conventional example, and its object is to provide a character recognition apparatus and method that can inform a user of characters whose similarity is uncertain.

[0006] Another object of the present invention is to apply processing for reducing erroneous recognition efficiently and thereby improve recognition accuracy.

[0007]

[Means for Solving the Problems] To achieve the above objects, a character recognition device according to the present invention has, for example, the following configuration. That is, a character recognition device that performs character recognition based on an input image comprises: a cutout unit that cuts out characters from the input image; a recognition unit that performs character recognition processing on the characters cut out by the cutout unit and obtains a similarity for each; and a correction unit that corrects the similarity obtained by the recognition unit based on the size of the characters cut out by the cutout unit.

[0008] A character recognition device according to the present invention for achieving the above objects may also have, for example, the following configuration. That is, a character recognition device that performs character recognition based on an input image comprises: a cutout unit that cuts out characters from the input image; a recognition unit that performs character recognition processing on the characters cut out by the cutout unit and obtains a similarity for each; and a correction unit that corrects the similarity obtained by the recognition unit based on the position of the characters cut out by the cutout unit.

[0009] Preferably, the above character recognition device further comprises a modification unit that modifies a recognition result whose similarity after correction by the correction unit is low, based on the recognition results surrounding it.

[0010]

[Embodiments of the Invention] Several preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

[0011] <Description of the Character Recognition Apparatus (FIG. 1)> FIG. 1 is a block diagram showing the schematic configuration of a character recognition device according to an embodiment of the present invention. In FIG. 1, reference numeral 1 denotes the character recognition device of this embodiment as a whole. Reference numeral 2 denotes an input unit that inputs an image, for example an image stored on a hard disk or the like. Alternatively, the input unit 2 may be a scanner that optically reads a document image. Reference numeral 3 denotes a central processing unit (hereinafter, CPU) for arithmetic processing, which controls the entire character recognition device 1.

[0012] Reference numeral 4 denotes a ROM, which stores the control programs shown in the flowcharts of FIG. 3 and the subsequent figures, executed by the CPU 3, together with various data. The ROM 4 also stores a recognition dictionary 9 used in character recognition. Reference numeral 5 denotes a memory (RAM), which is used as a work area for the CPU 3 and also provides an area for temporarily storing image data input from the input unit 2, an area for storing the position and size of each character extracted by the character cutting unit 7, and an area for storing the candidate characters and similarity calculated for each character by the recognition unit 8.

[0013] By executing the various control programs stored in the ROM 4, the CPU 3 implements a similarity correction unit 6, a character cutting unit 7, a recognition unit 8, and a display unit 10. The similarity correction unit 6 corrects the similarity using character size information. The character cutting unit 7 extracts the characters of the text in the image stored in the memory 5 and stores the position and size of each character in the memory 5. The recognition unit 8 performs character recognition using the recognition dictionary 9. The display unit 10 controls display on a display 14 based on the input image and the character recognition result.

[0014] Reference numeral 11 denotes a system bus, which includes a data bus, an address bus, a control signal bus, and the like from the CPU 3. Reference numeral 12 denotes an interface unit that controls the interface with external output devices such as a printer 13 and the display 14.

[0015] <First Embodiment> FIG. 2 shows an example of a text window presenting the result of character recognition processing by the character recognition device 1 of the first embodiment. In FIG. 2, reference numeral 21 denotes one character of an ordinary recognition result, displayed in black. Reference numeral 22 denotes one character of an uncertain recognition result, that is, a result likely to be erroneous, displayed in red. The control procedure for realizing this display is described below.

[0016] FIG. 3 is a flowchart illustrating the character recognition procedure according to the first embodiment. As described above, the control program that implements this processing is stored in the ROM 4 and executed by the CPU 3.

[0017] First, in step S31, an image is read from the input unit 2 (a hard disk, a scanner, or the like) and stored in the memory 5.

[0018] Next, in step S32, the character cutting unit 7 cuts out the characters to be recognized. Any known method may be used to cut out the characters. For horizontally written text, for example, a horizontal histogram of the black pixels of the document image is taken and each region where black pixels are continuously present is extracted as a line; then, for each line, a vertical histogram is taken and each region where black pixels are continuously present is extracted as a character.
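As an illustration only, the histogram-based cutting described above might be sketched in Python as follows. The sketch assumes a binarized image given as a list of rows in which 1 marks a black pixel; the function names `runs` and `segment_characters` and the box format are hypothetical and do not appear in the specification. Note that each box takes the height of its line, not the tight height of the individual character.

```python
from typing import List, Tuple

def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of consecutive non-zero entries."""
    spans, start = [], None
    for i, count in enumerate(profile):
        if count and start is None:
            start = i                      # a run of black pixels begins
        elif not count and start is not None:
            spans.append((start, i))       # the run ended just before index i
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

def segment_characters(image: List[List[int]]) -> List[Tuple[int, int, int, int]]:
    """Cut characters out of a binary image of horizontally written text.

    Returns one (top, left, height, width) box per character.
    """
    boxes = []
    row_profile = [sum(row) for row in image]           # black pixels per row
    for top, bottom in runs(row_profile):               # each run of rows is a text line
        line = image[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]  # black pixels per column
        for left, right in runs(col_profile):           # each run of columns is a character
            boxes.append((top, left, bottom - top, right - left))
    return boxes
```

The stored position and size of each box correspond to the information that the later similarity correction relies on.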

[0019] Next, in step S33, the recognition unit 8 performs character recognition on the characters extracted in step S32.

[0020] Next, in step S34, the similarity correction unit 6 corrects the similarity obtained in step S33 using the character size information obtained in step S32. The similarity is corrected, for example, as follows.

[0021] In this embodiment, a correction rate is set according to the character size, and the similarity is multiplied by that rate. For example, if a similarity of 8000, a height of 32 dots, and a width of 14 dots are obtained for the character 22 (the comma "，") in FIG. 2, the correction rate for that size is 0.6 and the similarity of the character 22 is corrected to 8000 × 0.6 = 4800.
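The size-based correction can be sketched as below. This is a minimal sketch under stated assumptions: the text gives only one data point (a 32 × 14 dot box receives a rate of 0.6), so the rank boundaries in `SIZE_RANKS`, the 0.8 rate, and the choice of the smaller box side as the size measure are all hypothetical, as are the function names.

```python
from typing import List, Tuple

# Hypothetical size ranks: (upper bound on the smaller box side in dots, correction rate).
# Only the 0.6 rate for a 32 x 14 dot box comes from the text; everything else is assumed.
SIZE_RANKS: List[Tuple[int, float]] = [(16, 0.6), (24, 0.8)]

def size_correction_rate(height: int, width: int) -> float:
    """Pick the correction rate for the rank that the character size falls into."""
    small_side = min(height, width)
    for upper_bound, rate in SIZE_RANKS:
        if small_side < upper_bound:
            return rate
    return 1.0  # large characters keep their similarity unchanged

def correct_similarity(similarity: int, height: int, width: int) -> int:
    """Multiply the similarity by the size-dependent rate and round to an integer."""
    return round(similarity * size_correction_rate(height, width))
```

With these assumed ranks, `correct_similarity(8000, 32, 14)` reproduces the corrected value of 4800 from the example in the text.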

[0022] This correction is performed for all characters obtained in step S33, and the results are stored in the memory 5.

[0023] Next, in step S35, the display unit 10 displays the recognition result as shown in FIG. 2.

[0024] Here, for example, characters with a similarity of less than 7000 are treated as uncertain recognition results, that is, results likely to be erroneous, and are displayed in red. As a result, the character 22 is displayed in red as shown in FIG. 2. Characters with a similarity of 7000 or more are displayed in black, like the character 21 in FIG. 2.
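The display rule can be expressed in a few lines. The following sketch assumes that each recognition result is a (character, corrected similarity) pair; the names `UNCERTAIN_THRESHOLD`, `display_color`, and `colorize` are illustrative only, and the value 7000 is the example threshold given in the text.

```python
from typing import List, Tuple

UNCERTAIN_THRESHOLD = 7000  # example threshold from the text

def display_color(similarity: int) -> str:
    """Uncertain results (similarity below the threshold) are shown in red."""
    return "red" if similarity < UNCERTAIN_THRESHOLD else "black"

def colorize(results: List[Tuple[str, int]]) -> List[Tuple[str, str]]:
    """Attach a display color to every recognized character."""
    return [(ch, display_color(sim)) for ch, sim in results]
```

A character whose similarity is exactly 7000 is shown in black, since only values below the threshold are treated as uncertain.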

[0025] A description of the processing for outputting images and the like from the printer 13 is omitted.

[0026] As described above, according to the first embodiment, correcting the calculated similarity based on the size of the recognized character makes it possible to inform the user accurately of uncertain characters. This yields a character recognition device that is easy to use and in which erroneous recognition is easy to correct.

[0027] <Second Embodiment> In the first embodiment, the similarity is corrected in step S34 of FIG. 3 and the result is then displayed in step S35. In the second embodiment, recognition results whose similarity does not exceed the threshold are additionally modified in order to reduce erroneous recognition.

[0028] FIG. 4 is a flowchart illustrating the character recognition procedure according to the second embodiment. The control program that implements this processing is stored in the ROM 4 and executed by the CPU 3. In FIG. 4, steps S31 to S35 perform the same processing as in the first embodiment (FIG. 3). In the second embodiment, after the similarity correction in step S34, step S41 applies individual processing to recognition results with low similarity in order to reduce erroneous recognition. The processing in step S41 is described below.

[0029] As in the first embodiment, characters (recognition results) with a similarity of less than 7000 are treated as low-similarity recognition results, and individual processing is applied to them.

[0030] For example, the character 22 (the comma "，") in FIG. 2, whose similarity was corrected in the first embodiment, has a similarity of 4800 and is therefore subject to individual processing.

[0031] Examples of the individual processing are shown below. The similarity itself is not changed and is left as it is, so the display processing in step S35 still uses the unchanged similarity.

Character: Individual processing
"，": if the sentence ends with "。", change to "、"
"、": if the sentence ends with "．", change to "，"
"。": if "，" appears in the middle of the sentence, change to "．"
"．": if "、" appears in the middle of the sentence, change to "。"
...

As an example of the individual processing above, in the text "本日は，晴天なり。" ("Today is fine weather.") before individual processing, the similarity of "，" is below the predetermined value, so individual processing is executed and the "，" is changed to "、", giving "本日は、晴天なり。".
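The punctuation rules listed above might be sketched as follows. This is a simplified illustration, not the specification's implementation: it assumes one sentence at a time, given as (character, similarity) pairs, treats every position except the last as "the middle of the sentence", and evaluates all rules against the original characters. The function name `fix_punctuation` is hypothetical; the threshold of 7000 is the example value from the text.

```python
from typing import List, Tuple

THRESHOLD = 7000  # example threshold from the text

def fix_punctuation(sentence: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Make low-similarity punctuation consistent with the rest of the sentence.

    The similarity values are deliberately left unchanged.
    """
    if not sentence:
        return []
    last_char = sentence[-1][0]
    middle_chars = {ch for ch, _ in sentence[:-1]}
    fixed = []
    for ch, sim in sentence:
        if sim < THRESHOLD:  # only uncertain results are modified
            if ch == "，" and last_char == "。":
                ch = "、"
            elif ch == "、" and last_char == "．":
                ch = "，"
            elif ch == "。" and "，" in middle_chars:
                ch = "．"
            elif ch == "．" and "、" in middle_chars:
                ch = "。"
        fixed.append((ch, sim))
    return fixed
```

Because the similarity is carried through untouched, a modified character is still below the threshold and would still be flagged at display time.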

[0032] In step S35, characters with a similarity of less than 7000 are displayed in red, as in the first embodiment.

[0033] As described above, according to this embodiment, uncertain characters can be reported accurately to the user and individual processing for reducing erroneous recognition can be applied efficiently, which improves recognition accuracy and usability.

[0034] <Third Embodiment> Next, a third embodiment is described. FIG. 5 shows the display state of a recognition result according to the third embodiment. In the first and second embodiments, the similarity is corrected based on the size of each character to be recognized (the size of the cut-out character); in the third embodiment, the similarity is corrected based on the position of the character to be recognized.

[0035] FIG. 6 is a flowchart illustrating the character recognition procedure according to the third embodiment. The control program that implements this processing is stored in the ROM 4 and executed by the CPU 3. Steps S31, S32, S33, and S35 are the same as in the first embodiment (FIG. 3), and their description is omitted here.

[0036] In step S61, the similarity obtained in step S33 is corrected as follows, using the character position information obtained in step S32.

[0037]

Position information: Correction rate
Superscript: 0.8
Subscript: 0.8
Ruby: 0.6
Tate-chu-yoko: 0.9

* Tate-chu-yoko refers to characters, such as a two-digit number, that are set side by side horizontally within vertically written text.

[0038] For example, suppose that character recognition in step S33 (recognition unit 8) yields a similarity of 8000 for the character 52 shown in FIG. 5 (a superscript parenthesis "（"). From its cut-out position, the character 52 is determined to be a superscript, so its similarity is corrected to 8000 × 0.8 = 6400.
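The position-based correction can be sketched with a lookup table. The rates below are the ones listed in the text; how a cut-out position is classified as superscript, subscript, ruby, or tate-chu-yoko is not described, so this sketch assumes that a position label is already available. The dictionary keys and the function name `correct_by_position` are hypothetical.

```python
# Correction rates per position class, as listed in the text.
POSITION_RATES = {
    "superscript": 0.8,
    "subscript": 0.8,
    "ruby": 0.6,
    "tate_chu_yoko": 0.9,
}

def correct_by_position(similarity: int, position: str) -> int:
    """Scale the similarity by the rate for the character's position class.

    Positions that are not listed (ordinary body text) keep their similarity.
    """
    return round(similarity * POSITION_RATES.get(position, 1.0))
```

For the superscript parenthesis in the example, `correct_by_position(8000, "superscript")` yields 6400, which then falls below the example display threshold of 7000.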

[0039] This processing is performed for all characters obtained in step S33, and the results are stored in the memory 5.

[0040] Next, in step S35, the recognition result is displayed.

[0041] Here, for example, characters with a similarity of less than 7000 are treated as uncertain recognition results, that is, results likely to be erroneous, and are displayed in red in the recognition result. For example, the character 52 in FIG. 5 is displayed in red. Characters with a similarity of 7000 or more are displayed in black; in the recognition result of FIG. 5, for example, the character 21 is displayed in black.

[0042] A description of the processing for outputting images and the like from the printer 13 is omitted.

[0043] As described above, according to the third embodiment, the user can be informed more accurately of uncertain characters, which yields a character recognition device that is easy to use and in which erroneous recognition is easy to correct.

[0044] <Fourth Embodiment> Next, a fourth embodiment is described. In the fourth embodiment, individual processing for reducing erroneous recognition is performed on the basis of the similarity corrected according to character position in step S61.

[0045] FIG. 7 is a flowchart illustrating the character recognition procedure according to the fourth embodiment. The control program that implements this processing is stored in the ROM 4 and executed by the CPU 3.

[0046] In FIG. 7, after the similarity is corrected in step S61 as in the third embodiment, step S71 applies individual processing that modifies recognition results in order to reduce erroneous recognition. In step S71, for example, characters with a similarity of less than 7000 are subjected to individual processing according to their recognition result. The similarity itself is left unchanged.

[0047] For example, the superscript parenthesis "（" whose similarity was corrected in the third embodiment has a similarity of 6400 and is therefore subject to individual processing. Some examples of the individual processing are shown below.

[0048]

Character: Individual processing
"（": if the following text contains a "］" with higher similarity, change to "［"; a "｝" with higher similarity, change to "｛"; a "〉" with higher similarity, change to "〈"
"［": if the following text contains a "）" with higher similarity, change to "（"; a "｝" with higher similarity, change to "｛"; a "〉" with higher similarity, change to "〈"
"｛": if the following text contains a "）" with higher similarity, change to "（"; a "〉" with higher similarity, change to "〈"; a "］" with higher similarity, change to "［"
"〈": if the following text contains a "］" with higher similarity, change to "［"; a "）" with higher similarity, change to "（"; a "｝" with higher similarity, change to "｛"
...

[0049] With this processing, suppose for example that the recognition result before individual processing is "本日(2]は、晴天なり。" ("Today (2] is fine weather.") and that the similarity of "(" (6400) is lower than that of "]" (7100). Individual processing is then executed, and "(" is changed to "[", giving "本日[2]は、晴天なり。".

[0050] In step S35, characters with a similarity of less than 7000 are displayed in red, as in the third embodiment.

[0051] The processing described in the second embodiment may also be applied as the individual processing here. Likewise, the processing described in the fourth embodiment may be used in the second embodiment. The processing described in the second and fourth embodiments may of course be used together.

[0052] As described above, according to this embodiment, uncertain characters can be reported accurately to the user and individual processing for reducing erroneous recognition can be applied efficiently, which improves recognition accuracy and usability.

[0053] The present invention may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, a reader, and a printer) or to an apparatus consisting of a single device (for example, a copier or a facsimile machine).

[0054] It goes without saying that the object of the present invention is also achieved by supplying a system or apparatus with a storage medium storing the program code of software that implements the functions of the above embodiments, and having a computer (or a CPU or MPU) of that system or apparatus read and execute the program code stored in the storage medium.

[0055] In this case, the program code read from the storage medium itself implements the functions of the above embodiments, and the storage medium storing the program code constitutes the present invention.

[0056] Storage media that can be used to supply the program code include, for example, a floppy disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a CD-R, a magnetic tape, a nonvolatile memory card, and a ROM.

[0057] It also goes without saying that the invention covers not only the case where the functions of the above embodiments are implemented by the computer executing the program code it has read, but also the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and the functions of the above embodiments are implemented by that processing.

[0058] It further goes without saying that the invention covers the case where the program code read from the storage medium is written into a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on that board or in that unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above embodiments are implemented by that processing.

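[0059]

[Effects of the Invention] As described above, according to the present invention, the user can be informed more accurately of characters whose similarity is uncertain. In addition, processing for reducing erroneous recognition can be applied efficiently to improve recognition accuracy. The usability of the character recognition device is therefore improved.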
It is possible to more accurately notify the user of a character whose similarity is uncertain. Further, according to the present invention, it is possible to efficiently perform processing for reducing erroneous recognition and improve recognition accuracy. Therefore, usability of the character recognition device is improved.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the schematic configuration of a character recognition device according to an embodiment of the present invention.

FIG. 2 shows an example of a text window presenting the result of character recognition processing by the character recognition device 1 of the first embodiment.

FIG. 3 is a flowchart illustrating the character recognition procedure according to the first embodiment.

FIG. 4 is a flowchart illustrating the character recognition procedure according to the second embodiment.

FIG. 5 shows the display state of a recognition result according to the third embodiment.

FIG. 6 is a flowchart illustrating the character recognition procedure according to the third embodiment.

FIG. 7 is a flowchart illustrating the character recognition procedure according to the fourth embodiment.

[Explanation of symbols]

1 character recognition device
2 input unit
3 CPU
4 ROM
5 memory
6 similarity correction unit
7 character cutting unit
8 recognition unit
9 recognition dictionary
10 display unit
11 system bus
12 interface unit
13 printer
14 display

Claims

[Claims]

1. A character recognition device for performing character recognition based on an input image, comprising: a cutout unit that cuts out characters from the input image; a recognition unit that performs character recognition processing on the characters cut out by the cutout unit and obtains a similarity for each character; and a correction unit that corrects the similarity obtained by the recognition unit based on the size of the characters cut out by the cutout unit.

2. The character recognition device according to claim 1, further comprising a presentation unit that presents a character recognition result based on the similarity after the correction by the correction unit.

3. The character recognition device according to claim 2, wherein the presentation unit displays a recognition result having a similarity smaller than a predetermined value so that the user can identify the recognition result.

4. The character recognition device according to claim 1, wherein the correction unit corrects the similarity so as to reduce it for a character that is cut out by the cutout unit with a small size.

5. The character recognition device according to claim 1, wherein the correction unit provides a plurality of ranks for the size of the characters cut out by the cutout unit, and the ratio by which the similarity is changed differs for each rank.

6. The character recognition device according to claim 1, further comprising a modification unit that modifies a recognition result whose similarity after correction by the correction unit is low, based on the recognition results surrounding it.

7. The character recognition device according to claim 6, wherein the modification unit performs the modification based on consistency of punctuation marks.

8. The character recognition device according to claim 6, wherein the modification unit performs the modification based on matching of parentheses.

9. A character recognition device for performing character recognition based on an input image, comprising: a cutout unit that cuts out characters from the input image; a recognition unit that performs character recognition processing on the characters cut out by the cutout unit and obtains a similarity for each character; and a correction unit that corrects the similarity obtained by the recognition unit based on the position of the characters cut out by the cutout unit.

10. The character recognition apparatus according to claim 9, further comprising a presentation unit that presents a character recognition result based on the similarity after the correction by the correction unit.

11. The character recognition apparatus according to claim 10, wherein the presentation unit displays a recognition result having a similarity smaller than a predetermined value so that the user can identify the recognition result.

12. The character recognition device according to claim 9, wherein the correction unit corrects the similarity so as to reduce it for a character cut out by the cutout unit at a specific position where the character size is smaller than usual.

13. The character recognition device according to claim 9, wherein the positions determined by the correction unit include at least one of a superscript, a subscript, ruby, and tate-chu-yoko, and a ratio by which the similarity is changed is associated with each determined position.

14. The character recognition apparatus according to claim 9, further comprising a second correction unit that corrects a recognition result whose similarity is low after the correction by the correction unit, based on the recognition results surrounding that result.

15. The character recognition device according to claim 14, wherein the second correction unit performs its correction based on consistency of punctuation marks.

16. The character recognition device according to claim 14, wherein the second correction unit performs its correction based on matching of parentheses.

17. A character recognition method for performing character recognition based on an input image, comprising: a cutout step of cutting out characters from the input image; a recognition step of performing character recognition processing on the characters cut out in the cutout step to obtain a similarity for each character; and a correction step of correcting the similarity obtained in the recognition step based on the size of the characters cut out in the cutout step.

18. A character recognition method for performing character recognition based on an input image, comprising: a cutout step of cutting out characters from the input image; a recognition step of performing character recognition processing on the characters cut out in the cutout step to obtain a similarity for each character; and a correction step of correcting the similarity obtained in the recognition step based on the position of the characters cut out in the cutout step.

19. The character recognition method according to claim 17 or 18, further comprising a second correction step of correcting a recognition result whose similarity is low after the correction in the correction step, based on the recognition results surrounding that result.

20. A storage medium storing a control program for causing a computer to perform character recognition based on an input image, the control program comprising: code for a cutout step of cutting out characters from the input image; code for a recognition step of performing character recognition processing on the characters cut out in the cutout step to obtain a similarity for each character; and code for a correction step of correcting the similarity obtained in the recognition step based on the size of the characters cut out in the cutout step.

21. A storage medium storing a control program for causing a computer to perform character recognition based on an input image, the control program comprising: code for a cutout step of cutting out characters from the input image; code for a recognition step of performing character recognition processing on the characters cut out in the cutout step to obtain a similarity for each character; and code for a correction step of correcting the similarity obtained in the recognition step based on the position of the characters cut out in the cutout step.

22. The storage medium according to claim 20, wherein the control program further comprises code for a second correction step of correcting a recognition result whose similarity is low after the correction in the correction step, based on the recognition results surrounding that result.
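
For illustration only, the similarity correction and low-similarity presentation recited in the claims above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the rank boundaries, the ratios, the threshold, the use of cutout height as the "size", the multiplicative form of the correction, and every function and variable name are hypothetical and do not appear in the claims.

```python
from typing import List, Optional, Tuple

# Hypothetical values; the claims do not specify any numbers.
# Size ranks: (maximum cutout height in pixels, ratio applied to the similarity).
SIZE_RANKS: List[Tuple[int, float]] = [(8, 0.6), (16, 0.8)]
# Ratios associated with each determined position (claim 13 names these positions).
POSITION_RATIOS = {
    "superscript": 0.8,
    "subscript": 0.8,
    "ruby": 0.7,
    "tate-chu-yoko": 0.9,
}
THRESHOLD = 0.5  # assumed "predetermined value" for the presentation unit


def correct_by_size(similarity: float, height: int) -> float:
    """Reduce the similarity of small cutouts, with a different ratio per rank."""
    for max_height, ratio in SIZE_RANKS:
        if height <= max_height:
            return similarity * ratio
    return similarity  # larger than every rank: left unchanged


def correct_by_position(similarity: float, position: Optional[str]) -> float:
    """Reduce the similarity of characters at positions that tend to be small."""
    if position is None:  # ordinary body text
        return similarity
    return similarity * POSITION_RATIOS.get(position, 1.0)


def flag_uncertain(results: List[Tuple[str, float, int]]) -> List[Tuple[str, bool]]:
    """Mark each (char, similarity, height) result whose size-corrected
    similarity falls below THRESHOLD, so a display could highlight it."""
    return [
        (char, correct_by_size(sim, height) < THRESHOLD)
        for char, sim, height in results
    ]


if __name__ == "__main__":
    print(flag_uncertain([("A", 0.9, 20), (".", 0.7, 6)]))
```

In this sketch a 6-pixel-high period with a raw similarity of 0.7 is flagged after correction, while a 20-pixel-high letter with a raw similarity of 0.9 is not; a display routine could then render the flagged characters in a different color.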