JPH06266906A - Character recognition system - Google Patents

Character recognition system

Info

Publication number
JPH06266906A
Authority
JP
Japan
Prior art keywords
character
candidate
unit
phrase
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP5051918A
Other languages
Japanese (ja)
Other versions
JP3350127B2 (en)
Inventor
Toshio Niwa
寿男 丹羽
Kazuhiro Kayashima
一弘 萱嶋
泰治 〆木
Taiji Shimeki
Hidetsugu Maekawa
英嗣 前川
Satoru Ito
哲 伊藤
Yoshihiro Kojima
良宏 小島
Koji Yamamoto
浩司 山本
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP05191893A (patent JP3350127B2)
Publication of JPH06266906A
Priority to US08/513,294 (patent US6041141A)
Priority to US08/965,534 (patent US5987170A)
Application granted
Publication of JP3350127B2
Anticipated expiration
Expired - Fee Related (current legal status)

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To reduce recognition errors and improve the character recognition rate by detecting recognition errors through character correction based on knowledge processing, and by reconstructing the recognition dictionary of the character recognition unit. CONSTITUTION: A character recognition unit 1 recognizes a document image 10 and outputs a plurality of candidate characters for each character. Candidate phrases are derived from the candidate character set 11 using a word dictionary 6 and a grammar dictionary 7. A phrase evaluation value calculation unit 4 calculates the lexical and grammatical correctness of each phrase, a phrase selection unit 5 selects phrases using the phrase evaluation value as the criterion, and a corrected character string 14 is output. A candidate character comparison unit 9 compares the candidate character set 11 with the corrected character string 14 and determines the additional learning characters 15. Based on the additional learning characters 15, the recognition dictionary 16 of the character recognition unit 1 is reconstructed.

Description

Detailed Description of the Invention

[0001] [Field of Industrial Application] The present invention relates to a character recognition device for reading characters.

[0002] [Prior Art] In recent years, with the development of databases, demand has grown for character recognition devices that are fast and have a high recognition rate.

[0003] As a conventional character recognition device, the device shown in FIG. 9 has been proposed, for example in Japanese Unexamined Patent Publication No. H2-214990. The character correction unit 8 receives as input N candidate characters per character from the character recognition unit 1. The automatic correction unit 61 compares the candidate characters with the correction rule table 63 and corrects characters according to the correction rules. The corrected output of the automatic correction unit 61 is displayed to the operator, who corrects any characters that were recognized incorrectly. Based on the information from this correction operation, the manual correction control unit 62 creates a correction rule and registers it in the correction rule table 63; the rule is then applied to subsequent recognition results so that the recognition error is corrected automatically. In this way, character recognition adapted to the font of the document can be performed on the basis of the corrections made by the operator.

[0004] [Problems to be Solved by the Invention] However, in the character recognition device described above, the correction rule table is created from corrections made by the operator, so correction rules cannot be created automatically without human effort.

[0005] The present invention solves this conventional problem. Its object is to automatically reconstruct the recognition dictionary of the character recognition unit on the basis of character strings corrected by knowledge processing, thereby automatically adapting recognition to the font of the document and raising the character recognition rate.

[0006] [Means for Solving the Problems] To achieve the above object, the present invention compares the characters corrected by the character correction unit with the candidate characters from the character recognition unit, and thereby extracts the characters that the character recognition unit is prone to misrecognize. The extracted characters are sent to the character recognition unit, and the recognition dictionary in the character recognition unit is reconstructed on the basis of these characters, so that the processing of the character recognition unit is adapted to the characters of the document being recognized and recognition errors are eliminated.

[0007] [Operation] With the configuration described above, the present invention extracts the characters that the character recognition unit is prone to misrecognize from the information on which characters the character correction unit has corrected, and reconstructs the recognition dictionary of the character recognition unit on the basis of these characters. This reduces recognition errors in the character recognition unit and improves the character recognition rate.

[0008] [Embodiments] An embodiment of the first invention is described below. FIG. 1 shows the overall configuration of the character recognition device of this embodiment. The character recognition unit 1 performs character recognition on the document image 10 using the recognition dictionary 16 and outputs a candidate character set that holds, for each character, n candidate characters from the first candidate character to the n-th candidate character.

[0009] The word search unit 2 searches the word dictionary 6 and, from among the combinations of characters in the candidate character set 11, selects the candidate word set 12, that is, the combinations of candidate characters that match words in the word dictionary 6. The phrase search unit 3 refers to the grammar dictionary 7 and selects from the candidate word set 12 the candidate phrase set 13, that is, the combinations of words that can form a phrase. The phrase evaluation value calculation unit 4 calculates an evaluation value for the lexical and grammatical correctness of each phrase found by the phrase search unit 3, using criteria such as the length and frequency of the words in the phrase. The phrase selection unit 5 selects the phrase with the largest evaluation value among the candidate phrases and outputs the corrected character string 14.
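The word search step can be illustrated with a minimal sketch. The function name `search_candidate_words`, the `max_len` limit, and the sample candidates and dictionary below are assumptions made for illustration; the source does not specify how the word search unit 2 enumerates combinations.

```python
from itertools import product

def search_candidate_words(candidate_sets, word_dictionary, max_len=4):
    """Return (start, word) pairs whose characters can all be drawn from
    the candidate characters at consecutive positions."""
    found = []
    for start in range(len(candidate_sets)):
        for length in range(1, max_len + 1):
            window = candidate_sets[start:start + length]
            if len(window) < length:
                break  # ran past the end of the text
            # Try every combination of one candidate per position.
            for combo in product(*window):
                word = "".join(combo)
                if word in word_dictionary:
                    found.append((start, word))
    return found

# Hypothetical candidate character set with n = 2 candidates per position.
candidates = [["丈", "文"], ["字", "宇"], ["認", "誌"], ["識", "織"]]
dictionary = {"文字", "認識", "文"}
words = search_candidate_words(candidates, dictionary)
```

Exhaustive enumeration grows quickly with n and word length, so a practical device would more likely walk a trie of the dictionary; the brute-force form is used here only because it is easy to read.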

[0010] The candidate character comparison unit 9 compares the corrected character string 14 with the candidate character set 11, extracts each character for which the corrected character string differs from the first candidate character of the candidate character set, and sends it to the character recognition unit 1 as an additional learning character 15.

[0011] The character recognition device configured as above performs character recognition as follows. First, the character recognition unit 1 recognizes the document image 10 using the recognition dictionary 16 and outputs a candidate character set that holds, for each character, n candidate characters from the first candidate character to the n-th candidate character.

[0012] Next, the word search unit 2 searches the word dictionary 6 and, from among the combinations of characters in the candidate character set 11, selects the candidate word set 12, that is, the combinations of candidate characters that match words in the word dictionary 6. The phrase search unit 3 then refers to the grammar dictionary 7 and selects from the candidate word set 12 the candidate phrase set 13, that is, the combinations of words that can form a phrase. A phrase evaluation value is calculated for the lexical and grammatical correctness of each phrase found by the phrase search unit 3, using criteria such as the length and frequency of the words in the phrase. Using the phrase evaluation values of the candidate phrases as the criterion, the phrase selection unit 5 selects the correct combination of phrases and outputs the corrected character string 14.
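The evaluation and selection steps might look like the following sketch. The source only says that word length and frequency are among the criteria, so the scoring formula here (sum of squared word lengths plus the mean log frequency) is an assumption chosen for illustration, as are the names `phrase_score` and `select_phrase` and the frequency counts.

```python
import math

def phrase_score(words, frequency):
    """Assumed scoring: favour segmentations made of longer words, with
    the average log frequency of the words as a secondary term."""
    length_term = sum(len(w) ** 2 for w in words)
    freq_term = sum(math.log(1 + frequency.get(w, 0)) for w in words) / len(words)
    return length_term + freq_term

def select_phrase(candidate_phrases, frequency):
    """Pick the candidate phrase with the largest evaluation value."""
    return max(candidate_phrases, key=lambda p: phrase_score(p, frequency))

# Hypothetical word frequencies and two competing segmentations.
frequency = {"文字": 120, "文": 300, "字": 80, "を": 1000}
phrases = [["文字", "を"], ["文", "字", "を"]]
best = select_phrase(phrases, frequency)
```

Squaring the length is one simple way to make a single two-character word outscore two one-character words; any monotone preference for longer words would serve the same purpose.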

[0013] The candidate character comparison unit 9 compares the corrected character string 14 with the candidate character set 11. For each character position, it compares the character in the corrected character string with the first candidate character in the candidate character set, and if the two differ, it outputs the character as an additional learning character 15.
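The position-by-position comparison can be sketched as below. The function name `find_additional_learning_chars` and the sample data are hypothetical; the sketch assumes the corrected string and the candidate character set are aligned one character per position.

```python
def find_additional_learning_chars(candidate_sets, corrected):
    """Return (position, corrected_char) for every position where the
    corrected character differs from the first candidate character."""
    return [
        (pos, ch)
        for pos, (cands, ch) in enumerate(zip(candidate_sets, corrected))
        if cands[0] != ch
    ]

# Hypothetical data: the recognizer ranked 丈 first, knowledge processing chose 文.
candidates = [["丈", "文"], ["字", "宇"]]
learning_chars = find_additional_learning_chars(candidates, "文字")
```

Returning the position along with the character lets the recognizer look up the corresponding character image when it later updates its dictionary.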

[0014] The character recognition unit 1 receives the additional learning character 15 and, using the character image of the additional learning character and the corresponding character of the corrected character string, adds a dictionary entry for the additional learning character 15 to the recognition dictionary 16 so that the additional learning character can be recognized.

[0015] As a result, characters that the character recognition unit 1 could not recognize in its initial recognition pass become recognizable, because dictionary entries for the additional characters have been added to the recognition dictionary.

[0016] The addition of the additional learning character 15 to the recognition dictionary 16 in the character recognition unit 1 may instead be carried out by configuring the character recognition unit 1 as a neural network and changing the network weights through additional learning.

[0017] In this embodiment, the candidate character comparison unit 9 compares the corrected character string with the first candidate character. Alternatively, the corrected character may be compared with the m-th candidate characters (1 ≤ m ≤ i < n), and if the corrected character is not among these i candidate characters, the corrected character may be output as the additional learning character 15.
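This variant, which checks the top i candidates rather than only the first, can be sketched as follows. The name `find_missing_from_top_i` and the sample candidates are assumptions for illustration.

```python
def find_missing_from_top_i(candidate_sets, corrected, i):
    """Return (position, corrected_char) where the corrected character is
    absent from the first i candidates (1 <= i < n) at that position."""
    return [
        (pos, ch)
        for pos, (cands, ch) in enumerate(zip(candidate_sets, corrected))
        if ch not in cands[:i]
    ]

# Hypothetical candidate character set with n = 3 candidates per position.
candidates = [["丈", "文", "又"], ["宇", "守", "字"]]
top2_misses = find_missing_from_top_i(candidates, "文字", i=2)
top1_misses = find_missing_from_top_i(candidates, "文字", i=1)
```

With i = 1 this reduces to the first-candidate comparison; a larger i selects only the characters the recognizer ranked poorly, so fewer characters are sent for additional learning.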

[0018] Next, an embodiment of the second invention is described. FIG. 2 shows the overall configuration of the character recognition device of this embodiment.

[0019] The character recognition unit 1, word search unit 2, phrase search unit 3, phrase evaluation value calculation unit 4, phrase selection unit 5, and candidate character comparison unit 9 are the same as in the embodiment of the first invention.

[0020] The same-character extraction unit 21 extracts identical characters from among the characters output by the candidate character comparison unit and, when such a character is contained in different words, outputs it as an additional learning character 15.

[0021] The character recognition device configured as above performs character recognition as follows. First, the character recognition unit 1 recognizes the document image 10 using the recognition dictionary 16 and outputs a candidate character set that holds, for each character, n candidate characters from the first candidate character to the n-th candidate character.

[0022] Next, the word search unit 2 searches the word dictionary 6 and, from among the combinations of characters in the candidate character set 11, selects the candidate word set 12, that is, the combinations of candidate characters that match words in the word dictionary 6. The phrase search unit 3 then refers to the grammar dictionary 7 and selects from the candidate word set 12 the candidate phrase set 13, that is, the combinations of words that can form a phrase. A phrase evaluation value is calculated for the lexical and grammatical correctness of each phrase found by the phrase search unit 3, using criteria such as the length and frequency of the words in the phrase. Using the phrase evaluation values of the candidate phrases as the criterion, the phrase selection unit 5 selects the correct combination of phrases and outputs the corrected character string 14.

[0023] The candidate character comparison unit 9 compares the corrected character string 14 with the candidate character set 11. For each character position, it compares the character in the corrected character string with the first candidate character in the candidate character set, and outputs the characters for which the two differ.

[0024] The same-character extraction unit 21 extracts identical characters from among the characters output by the candidate character comparison unit 9 and, when such a character is contained in different words, outputs it as an additional learning character 15. For example, when the character correction unit 8 outputs the corrected character string shown in FIG. 3, the candidate character comparison unit 9 outputs 『文』 of 文章 (sentence), 『認』 of 認識 (recognition), 『文』 of 文法 (grammar), and 『正』 of 訂正 (correction). For these characters, the same-character extraction unit finds that the 『文』 of 文章 and the 『文』 of 文法 are the same character and are contained in different words, and therefore outputs 『文』 as the additional learning character 15.
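The same-character extraction in this example can be sketched as follows, using the four characters and words named in the text (文章, 認識, 文法, 訂正). The function name `extract_same_chars` and the (character, word) pair representation are assumptions made for illustration.

```python
from collections import defaultdict

def extract_same_chars(mismatches):
    """mismatches: (character, containing_word) pairs output by the
    candidate character comparison. Return the characters that occur in
    at least two different words."""
    words_by_char = defaultdict(set)
    for ch, word in mismatches:
        words_by_char[ch].add(word)
    return sorted(ch for ch, words in words_by_char.items() if len(words) >= 2)

# The example from the text: 文 is corrected in both 文章 and 文法.
mismatches = [("文", "文章"), ("認", "認識"), ("文", "文法"), ("正", "訂正")]
same_chars = extract_same_chars(mismatches)
```

Requiring the character to appear in different words filters out corrections that may themselves be wrong: a character corrected the same way in two independent contexts is more likely to be a genuine recognition error.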

[0025] The character recognition unit 1 receives the additional learning character 15 and, using the character image of the additional learning character and the corresponding character of the corrected character string, adds a dictionary entry for the additional learning character 15 to the recognition dictionary 16 so that the additional learning character can be recognized.

[0026] As a result, characters that the character recognition unit 1 could not recognize in its initial recognition pass become recognizable, because dictionary entries for the additional characters have been added to the recognition dictionary.

[0027] The addition of the additional learning character 15 to the recognition dictionary 16 in the character recognition unit 1 may instead be carried out by configuring the character recognition unit 1 as a neural network and changing the network weights through additional learning.

[0028] In this embodiment, the candidate character comparison unit 9 compares the corrected character string with the first candidate character. Alternatively, the corrected character may be compared with the m-th candidate characters (1 ≤ m ≤ i < n), and if the corrected character is not among these i candidate characters, the corrected character may be output as the additional learning character 15.

[0029] Next, an embodiment of the third invention is described. FIG. 4 shows the overall configuration of the character recognition device of this embodiment.

[0030] FIG. 5 shows the configuration of the character recognition unit 1 of FIG. 4. The character recognition unit 1 recognizes candidate characters using a neural network. The similarity calculation unit 36 calculates the similarity to each character from the character image and the weighting coefficients 37, and outputs the candidate characters. The weighting coefficient update unit 38 can also update the weighting coefficients on the basis of the error between the candidate characters and the additional learning character.
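The similarity calculation and weight update can be illustrated with a deliberately small sketch. The source does not describe the network architecture, so the single linear layer, the delta-rule update, the learning rate, and all names and numbers below are assumptions, not the patented design.

```python
def similarities(image, weights):
    """Similarity of the image to each character: dot product of the
    feature vector with that character's weighting coefficients."""
    return {ch: sum(w * x for w, x in zip(ws, image)) for ch, ws in weights.items()}

def ranked_candidates(image, weights, n):
    """Return the n characters with the highest similarity, best first."""
    sims = similarities(image, weights)
    return sorted(sims, key=sims.get, reverse=True)[:n]

def update_weights(image, weights, target, rate=0.5):
    """One delta-rule step: push the target's similarity toward 1 and
    every other character's similarity toward 0."""
    sims = similarities(image, weights)
    for ch, ws in weights.items():
        error = (1.0 if ch == target else 0.0) - sims[ch]
        for k, x in enumerate(image):
            ws[k] += rate * error * x

# Hypothetical 3-element feature vector and weights for two characters.
image = [1.0, 0.0, 1.0]
weights = {"文": [0.2, 0.5, 0.1], "丈": [0.4, 0.1, 0.3]}
before = ranked_candidates(image, weights, n=2)
update_weights(image, weights, target="文")  # 文 is the additional learning character
after = ranked_candidates(image, weights, n=2)
```

In this toy case the misrecognized image ranks 丈 first before the update and 文 first afterwards, which is the effect additional learning is meant to achieve.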

[0031] The word search unit 2, phrase search unit 3, phrase evaluation value calculation unit 4, and phrase selection unit 5 are the same as in the embodiment of the first invention.

[0032] The keyword extraction unit 31 of FIG. 4 extracts the keywords of the document being recognized from the corrected character string 14 output by the phrase selection unit 5, and creates the keyword set 35. Keywords are extracted, for example, from the difference between the frequency of a word in the document and its frequency in general documents. The keyword partial match search unit 32 performs a partial match search between the resulting keyword set 35 and the candidate character set 11. For example, if 「認識」 (recognition) has been extracted as a keyword, 「認*」 and 「*識」 in the corrected character string 14 are extracted as partial matches. The candidate word addition unit 33 adds the partially matched keywords to the candidate word set. In the example above, the partially matched 「認*」 and 「*識」 are added to the candidate word set 12 as 「認識」. This makes it possible to use, for character correction, characters that the character recognition unit 1 did not output.
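A minimal sketch of keyword extraction and partial matching follows. The threshold, the relative-frequency measure, the rule that exactly one character may be missing, and the names `extract_keywords` and `partial_matches` are all assumptions for illustration; the source gives only the frequency-difference idea and the 「認*」/「*識」 example.

```python
from collections import Counter

def extract_keywords(document_words, general_frequency, threshold=0.1):
    """Keep words whose relative frequency in this document exceeds their
    relative frequency in general text by more than the threshold."""
    counts = Counter(document_words)
    total = sum(counts.values())
    return {
        w for w, c in counts.items()
        if c / total - general_frequency.get(w, 0.0) > threshold
    }

def partial_matches(keywords, candidate_sets):
    """Return (start, keyword) where exactly one character of the keyword
    is missing from the candidates at the aligned positions."""
    hits = []
    for kw in sorted(keywords):
        if len(kw) < 2:
            continue  # a one-character keyword has no partial match
        for start in range(len(candidate_sets) - len(kw) + 1):
            misses = sum(
                ch not in candidate_sets[start + k] for k, ch in enumerate(kw)
            )
            if misses == 1:
                hits.append((start, kw))
    return hits

# Hypothetical corrected-text words and general-text relative frequencies.
doc_words = ["認識", "文字", "認識", "辞書", "認識"]
general = {"認識": 0.01, "文字": 0.15, "辞書": 0.15}
keywords = extract_keywords(doc_words, general)

# 識 is absent from the candidates at position 3, so 認識 matches as 認*.
candidates = [["文"], ["字"], ["認", "誌"], ["織", "繊"]]
hits = partial_matches(keywords, candidates)
```

Each hit would then be added to the candidate word set as the full keyword, which is how a character the recognizer never proposed can still reach the corrected character string.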

[0033] The non-candidate character detection unit 34 detects, in the corrected character string 14 output by the phrase selection unit 5, the non-candidate characters that were added by the candidate word addition unit 33, and outputs those characters as additional learning characters 15.

[0034] The character recognition device configured as above performs character recognition as follows. First, in the character recognition unit 1, the similarity calculation unit 36 processes the document image 10 with reference to the weighting coefficients 37 shown in FIG. 5 and outputs a candidate character set that holds, for each character, n candidate characters from the first candidate character to the n-th candidate character.

[0035] Next, the word search unit 2 searches the word dictionary 6 and, from among the combinations of characters in the candidate character set 11, selects the candidate word set 12, that is, the combinations of candidate characters that match words in the word dictionary 6. The phrase search unit 3 then refers to the grammar dictionary 7 and selects from the candidate word set 12 the candidate phrase set 13, that is, the combinations of words that can form a phrase. A phrase evaluation value is calculated for the lexical and grammatical correctness of each phrase found by the phrase search unit 3, using criteria such as the length and frequency of the words in the phrase. Using the phrase evaluation values of the candidate phrases as the criterion, the phrase selection unit 5 selects the correct combination of phrases and outputs the corrected character string 14.

[0036] Next, the keyword extraction unit 31 extracts the keyword set 35 from the corrected character string 14.

[0037] The keyword partial match search unit 32 performs a partial match search between the keyword set 35 and the candidate character set 11. The candidate word addition unit 33 then adds the words output by the keyword partial match search unit 32 to the candidate word set 12.

[0038] The phrase search unit 3 and the phrase evaluation value calculation unit 4 then search again for candidate phrases, this time from the candidate words that were added, and obtain their phrase evaluation values.

[0039] The phrase selection unit 5 then selects, from among the candidate phrases, a phrase with a large evaluation value and outputs the corrected character string 14.

[0040] The non-candidate character detection unit 34 detects the characters in the corrected character string 14 that were added by the candidate word addition unit 33, and outputs those characters as additional learning characters 15.

[0041] The weighting coefficient update unit 38 updates the weighting coefficients 37 on the basis of the error between the additional learning character and the candidate characters, thereby performing additional learning.

[0042] As a result, characters that the character recognition unit 1 could not recognize in its initial recognition pass become recognizable, because dictionary entries for the additional characters have been added to the recognition dictionary.

[0043] The character recognition unit 1 may use a character recognition method that does not rely on a neural network. For example, it may hold the mean vector of each character as the recognition dictionary and recognize characters by comparing these vectors with the image. When a recognition dictionary is used, additional learning is performed by rebuilding the recognition dictionary on the basis of the added characters.
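The mean-vector alternative can be sketched as a nearest-mean classifier. The two-element feature vectors, the squared Euclidean distance, and the function names are assumptions for illustration; the source states only that per-character mean vectors are compared with the image and that the dictionary is rebuilt from the added characters.

```python
def mean_vector(samples):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def build_dictionary(training):
    """Recognition dictionary: one mean vector per character."""
    return {ch: mean_vector(samples) for ch, samples in training.items()}

def recognize(image, dictionary):
    """Return the character whose mean vector is closest to the image
    (squared Euclidean distance is an assumed comparison measure)."""
    def dist2(v):
        return sum((a - b) ** 2 for a, b in zip(image, v))
    return min(dictionary, key=lambda ch: dist2(dictionary[ch]))

# Hypothetical training samples for two characters.
training = {"文": [[1.0, 0.0], [0.8, 0.2]], "丈": [[0.0, 1.0], [0.2, 0.8]]}
dictionary = build_dictionary(training)

sample = [0.45, 0.55]                  # image of a 文 in the document's font
before = recognize(sample, dictionary)

# Additional learning: add the sample under its corrected label and rebuild.
training["文"].append(sample)
dictionary = build_dictionary(training)
after = recognize(sample, dictionary)
```

Rebuilding shifts the mean for 文 toward the document's font, so the same image that was first read as 丈 is read as 文 afterwards.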

[0044] Next, an embodiment of the fourth invention is described. FIG. 6 shows the overall configuration of the character recognition device of this embodiment.

[0045] The character recognition unit 1, word search unit 2, phrase search unit 3, phrase evaluation value calculation unit 4, and phrase selection unit 5 are the same as in the embodiment of the first invention.

[0046] The keyword extraction unit 31, keyword partial match search unit 32, and non-candidate character detection unit 34 are the same as in the embodiment of the third invention.

[0047] The word miscorrection degree calculation unit 41 calculates, from the corrected character string 14, the likelihood that a corrected word is a miscorrection, that is, the word miscorrection degree. The reject character determination unit 42 determines the reject characters on the basis of the word miscorrection degree output by the word miscorrection degree calculation unit 41. The candidate word addition unit 43 adds to the candidate word set those characters, among the keywords found by the keyword partial match search unit 32, that are reject characters.

[0048] The character recognition device configured as above performs character recognition as follows. First, the character recognition unit 1 recognizes the document image 10 using the recognition dictionary 16 and outputs a candidate character set that holds, for each character, n candidate characters from the first candidate character to the n-th candidate character.

[0049] Next, the word search unit 2 searches the word dictionary 6 and, from among the combinations of characters in the candidate character set 11, selects the candidate word set 12, that is, the combinations of candidate characters that match words in the word dictionary 6. The phrase search unit 3 then refers to the grammar dictionary 7 and selects from the candidate word set 12 the candidate phrase set 13, that is, the combinations of words that can form a phrase. A phrase evaluation value is calculated for the lexical and grammatical correctness of each phrase found by the phrase search unit 3, using criteria such as the length and frequency of the words in the phrase. Using the phrase evaluation values of the candidate phrases as the criterion, the phrase selection unit 5 selects the correct combination of phrases and outputs the corrected character string 14.

[0050] Next, the keyword extraction unit 31 extracts the keyword set 35 from the corrected character string 14.

[0051] The keyword partial match search unit 32 performs a partial match search between the keyword set 35 and the candidate character set 11.

[0052] Next, the word miscorrection degree calculation unit 41 calculates the word miscorrection degree from factors such as the length of the corrected word, the evaluation values that the character recognition unit 1 assigned to the characters in the word, the difference between the evaluation values that the character recognition unit 1 assigned to the corrected character and to the first candidate character, the types of the characters making up the word, and the statistical probability that the corrected word is correct. The reject character determination unit 42 determines the reject characters from factors such as the word miscorrection degrees of the corrected word and of the words before and after it.

[0053] Next, the candidate word addition unit 43 compares the words output by the keyword partial match search unit 32 with the characters output by the reject character determination unit 42, and adds the words for which the two match to the candidate word set 12.
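One possible reading of this filtering step is sketched below: a partially matched keyword is added only when the position it would overwrite has been marked as a reject character. The triple representation of a partial match, the set of reject positions, and the name `add_candidate_words` are assumptions; the source does not define how the match between the two outputs is tested.

```python
def add_candidate_words(partial_hits, reject_positions, candidate_words):
    """partial_hits: (start, keyword, mismatch_pos) triples from the
    keyword partial match search. A keyword is added only when the
    position it would overwrite is a reject position."""
    added = []
    for start, keyword, mismatch_pos in partial_hits:
        if mismatch_pos in reject_positions:
            candidate_words.append((start, keyword))
            added.append((start, keyword))
    return added

# Hypothetical data: position 3 was flagged as a likely miscorrection.
hits = [(2, "認識", 3), (6, "文字", 6)]
reject_positions = {3}
candidate_words = [(0, "文字")]
added = add_candidate_words(hits, reject_positions, candidate_words)
```

Compared with the third embodiment, where every partial match is added, this restriction keeps keywords from overriding characters that the miscorrection measure considers reliable.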

[0054] The phrase search unit 3 and the phrase evaluation value calculation unit 4 then search again for candidate phrases, this time from the candidate words that were added, and obtain their phrase evaluation values.

[0055] The phrase selection unit 5 then selects, from among the candidate phrases, a phrase with a large evaluation value and outputs the corrected character string 14.

[0056] The non-candidate character detection unit 34 detects the characters in the corrected character string 14 that were added by the candidate word addition unit 43, and outputs those characters as additional learning characters 15.

【0057】文字認識部1は、追加学習文字15を受け
取り、追加学習文字の文字画像と修正文字列の文字か
ら、追加学習文字が認識できるように認識辞書16に追
加学習文字15の辞書を追加する。
The character recognition unit 1 receives the additional learning character 15 and, using its character image and the corresponding character in the corrected character string, adds a dictionary entry for the additional learning character 15 to the recognition dictionary 16 so that the character can be recognized.

【0058】これにより、文字認識部1における初めの
文字認識で認識できなかった文字も認識辞書に追加文字
の辞書が追加されたことにより認識可能になる。
As a result, characters that could not be recognized in the initial recognition pass of the character recognition unit 1 become recognizable, because dictionary entries for the additional characters have been added to the recognition dictionary.

【0059】なお、文字認識部1における認識辞書16
への追加学習文字15の辞書への追加は、文字認識部1
をニューラルネットワークで構成してネットワークの重
みを追加学習によって変化させて処理を行っても良い。
Note that, instead of adding an entry for the additional learning character 15 to the recognition dictionary 16 of the character recognition unit 1, the character recognition unit 1 may be configured as a neural network whose weights are changed by additional learning.
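
The weight-update variant can be illustrated with a very small sketch. The text does not specify the network or the learning rule, so everything here is an assumption chosen for brevity: a single linear layer whose output per character is a dot product of weights and image features, trained with the delta (Widrow-Hoff) rule. All names and the two-dimensional toy features are hypothetical.

```python
def similarities(weights, features):
    """Similarity of the input to each character: dot(weights, features)."""
    return {
        c: sum(w * x for w, x in zip(ws, features))
        for c, ws in weights.items()
    }


def additional_learning(weights, features, target, lr=0.1, epochs=50):
    """Nudge the weights so that `target` scores highest for `features`.

    Delta rule (an assumed choice): w += lr * (desired - actual) * x,
    with desired = 1 for the target character and 0 for all others.
    """
    for _ in range(epochs):
        sims = similarities(weights, features)
        for c, ws in weights.items():
            error = (1.0 if c == target else 0.0) - sims[c]
            weights[c] = [w + lr * error * x for w, x in zip(ws, features)]
    return weights


weights = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
image = [0.2, 0.9]                       # initially looks like "B"
additional_learning(weights, image, "A")
scores = similarities(weights, image)
print(max(scores, key=scores.get))       # A
```

A real system would retrain on many samples so that earlier knowledge is not overwritten; this sketch updates on a single image only to show the direction of the weight change.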

【0060】次に、本発明の第5の発明の実施例につい
て説明する。図7にこの実施例の文字認識装置の全体の
構成を示す。
Next, an embodiment of the fifth invention of the present invention will be described. FIG. 7 shows the overall configuration of the character recognition device of this embodiment.

【0061】なお、文字認識部1、単語検索部2、文節
検索部3、文節評価値演算部4、文節選択部5は、第1
の発明の実施例と同じである。
The character recognition unit 1, the word search unit 2, the phrase search unit 3, the phrase evaluation value calculation unit 4, and the phrase selection unit 5 are the same as in the embodiment of the first invention.

【0062】文字種決定部51は、修正文字列14から
文字種辞書52を参照して各文字の文字種を決定し出力
する。
The character type determination unit 51 refers to the character type dictionary 52 to determine the character type of each character in the corrected character string 14 and outputs the result.

【0063】上記の構成の文字認識装置において次のよ
うにして文字認識を行う。まず、認識対象の文書画像1
0を文字認識部1で認識辞書16を用いて文字認識し
て、1文字につき第1候補文字から第n候補文字までの
n個の候補文字を持つ候補文字集合を出力する。
The character recognition device configured as above performs character recognition as follows. First, the character recognition unit 1 recognizes the document image 10 to be recognized using the recognition dictionary 16 and outputs a candidate character set containing n candidate characters, from the first candidate character to the n-th candidate character, for each character.

【0064】さらに、単語検索部2で、単語辞書6を検
索することにより候補文字集合11の組み合せの中か
ら、単語辞書6に存在する単語と一致する候補文字の組
み合せである候補単語集合12を選び出す。さらに、文
節検索部3で、文法辞書7を参照して候補単語集合12
から文節となりえる単語の組み合せである候補文節集合
13を選び出す。文節検索部3で検索された文節の語彙
的および文法的な正しさを文節中の単語の長さや頻度な
どを基準として文節評価値を計算する。文節評価値を求
めた候補文節に対して文節評価値を基準にして、文節選
択部5で正しい文節の組み合せを選択し修正文字列14
を出力する。
Next, the word search unit 2 searches the word dictionary 6 and, from among the combinations of characters in the candidate character set 11, selects the candidate word set 12, that is, the combinations of candidate characters that match words in the word dictionary 6. The phrase search unit 3 then refers to the grammar dictionary 7 and selects, from the candidate word set 12, the candidate phrase set 13, that is, the combinations of words that can form a phrase. For each phrase found by the phrase search unit 3, a phrase evaluation value expressing its lexical and grammatical correctness is calculated on the basis of factors such as the length and frequency of the words in the phrase. For the candidate phrases whose evaluation values have been obtained, the phrase selection unit 5 selects the correct combination of phrases on the basis of the phrase evaluation values and outputs the corrected character string 14.
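
The word search step above can be sketched as a brute-force enumeration. This is a simplified illustration rather than the patented implementation: it assumes the candidate character set is a list with one candidate list per character position and the word dictionary is a plain set of strings. The function name, the `max_len` bound and the English toy data are hypothetical.

```python
from itertools import product


def search_candidate_words(candidate_chars, word_dict, max_len=4):
    """Find dictionary words that can be spelled from the candidates.

    Returns (start, word) pairs where each character of `word` is one of
    the candidates at the corresponding consecutive position.
    """
    found = []
    for start in range(len(candidate_chars)):
        for length in range(1, max_len + 1):
            span = candidate_chars[start:start + length]
            if len(span) < length:
                break  # ran past the end of the text
            # Try every combination of candidates across the span.
            for combo in product(*span):
                word = "".join(combo)
                if word in word_dict:
                    found.append((start, word))
    return found


candidates = [["c", "e"], ["a", "o"], ["t", "l"]]
print(search_candidate_words(candidates, {"cat", "eat", "at"}))
# [(0, 'cat'), (0, 'eat'), (1, 'at')]
```

Enumerating every combination grows exponentially with word length, so a practical system would more likely walk a trie of the dictionary; the exhaustive loop is used here only because it makes the idea easy to follow.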

【0065】文字種決定部51で、修正文字列14から
文字種辞書52に従って文字種を決定する。文字種辞書
52には、例えば、数詞前接頭語(第、約など)と数助
詞(本、個など)の間に挟まれる文字は数字である、片
仮名と片仮名に挟まれている文字は片仮名である確率が
高い、英字と英字に挟まれている文字は英字である確率
が高いなどというルールが登録されており、このルール
に従って文字種を決定する。図8は、修正文字列から文
字種を決定した例である。図8で修正文字列の中の「ニ
ューラねネットワーク」の文字列を片仮名、「9G.
5」の文字列を数字であると決定した。それ以外の部分
は、文字種の決定はされなかった。文字種が決定した文
字は文字認識部1に送られ、文字認識部1では、文字種
決定部の出力に従って文字種を限定して文字認識を再度
行う。
The character type determination unit 51 determines character types from the corrected character string 14 according to the character type dictionary 52. The character type dictionary 52 registers rules such as the following: a character sandwiched between a numeral prefix (第 "No.", 約 "about", etc.) and a counter suffix (本, 個, etc.) is a numeral; a character sandwiched between katakana characters is likely to be katakana; and a character sandwiched between alphabetic characters is likely to be alphabetic. Character types are determined according to these rules. FIG. 8 shows an example in which character types were determined from a corrected character string. In FIG. 8, the string 「ニューラねネットワーク」 in the corrected character string was determined to be katakana and the string 「9G.5」 was determined to be numeric. No character type was determined for the remaining parts. The characters whose type has been determined are sent to the character recognition unit 1, which performs character recognition again with the character type restricted according to the output of the character type determination unit.
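
The rules listed above can be sketched as follows. This is a rough illustration under stated assumptions: the rule set is hard-coded rather than read from a dictionary, the "likely" rules are applied deterministically, katakana is detected by the Unicode block U+30A0 to U+30FF, and `max_gap` (the longest numeral run considered) is an invented parameter. All names are hypothetical.

```python
NUM_PREFIXES = {"第", "約"}   # numeral prefixes named in the text
COUNTERS = {"本", "個"}       # counter suffixes named in the text


def char_type(ch):
    """Classify one character by its script."""
    if "\u30a0" <= ch <= "\u30ff":
        return "katakana"
    if ch.isascii() and ch.isalpha():
        return "alpha"
    if ch.isdigit():
        return "digit"
    return "other"


def infer_types(text, max_gap=6):
    """Return {position: inferred type} for characters fixed by a rule."""
    types = [char_type(c) for c in text]
    inferred = {}
    # Rule 1: characters between a numeral prefix and a counter are numerals.
    for i, ch in enumerate(text):
        if ch in NUM_PREFIXES:
            for j in range(i + 1, min(i + 2 + max_gap, len(text))):
                if text[j] in COUNTERS:
                    for k in range(i + 1, j):
                        inferred[k] = "digit"
                    break
    # Rules 2 and 3: a character between two katakana (or two alphabetic)
    # characters is assumed to share their type.
    for i in range(1, len(text) - 1):
        left, right = types[i - 1], types[i + 1]
        if left == right and left in ("katakana", "alpha") and types[i] != left:
            inferred.setdefault(i, left)
    return inferred


print(infer_types("ニューラねネットワーク"))  # {4: 'katakana'}
```

For the katakana string from FIG. 8, only the stray hiragana ね at index 4 is flagged, which is the character the second recognition pass would re-read as katakana.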

【0066】単語検索部2では、文字認識部1によって
出力された文字を候補単語集合12に付加し、再度候補
単語を検索する。さらに、文節検索部3と文節評価値演
算部4で、付加された候補単語から候補文節を検索し、
文節評価値を求める。
The word search unit 2 adds the characters output by the character recognition unit 1 to the candidate word set 12 and searches for candidate words again. The phrase search unit 3 and the phrase evaluation value calculation unit 4 then search for candidate phrases from the added candidate words and obtain their phrase evaluation values.

【0067】さらに、文節選択部5で、文節の候補の中
で評価値の大きい文節を選択し、修正文字列14を出力
する。
Further, the phrase selection unit 5 selects the phrases with high evaluation values from among the candidate phrases and outputs the corrected character string 14.

【0068】このように、文字種を限定すると、文字認
識部1は文字種が限定されたことにより認識率が向上す
る。その結果文字修正を行った結果も認識率が向上す
る。
Restricting the character type in this way improves the recognition rate of the character recognition unit 1. Consequently, the recognition rate after character correction also improves.

【0069】なお、本実施例では、文字種決定部51に
おいて文字種を決定し文字種の限定を行ったが、文字種
決定部51である一定の数の文字に限定することを行っ
ても良い。例えば、修正文字列14から文字列が都市名
を表していることが決定できたら、都市名に使われてい
る文字だけに限定を行えば良い。
In this embodiment, the character type determination unit 51 determines the character type and recognition is restricted to that type; however, the character type determination unit 51 may instead restrict recognition to a fixed set of characters. For example, if it can be determined from the corrected character string 14 that a character string represents a city name, recognition can be restricted to the characters used in city names.
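
Restricting recognition to a character type or to a fixed character set can be sketched as a filter over the recognizer's scores. This is an assumption about how such a restriction might be applied; the text does not say how the character recognition unit 1 implements it. The function name, the score dictionary and its values are hypothetical.

```python
def restrict_candidates(scores, allowed, n=3):
    """Re-rank recognition scores using only characters in `allowed`.

    scores:  {character: similarity score} from the recognizer.
    allowed: set of characters permitted at this position, for example
             all digits, or the characters that occur in city names.
    """
    kept = {c: s for c, s in scores.items() if c in allowed}
    return sorted(kept, key=kept.get, reverse=True)[:n]


scores = {"G": 0.50, "6": 0.45, "9": 0.40, "g": 0.30}
print(restrict_candidates(scores, set("0123456789")))  # ['6', '9']
```

Without the restriction the top candidate would be "G"; once the position is known to be numeric, the best remaining candidate "6" is returned first.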

【0070】[0070]

【発明の効果】以上の実施例から明らかなように、本発
明の構成の文字認識装置を使用することにより、文字修
正部の訂正結果の情報を用いて追加学習文字を決定し、
認識対象の文書のフォントに合った認識辞書を自動的に
構成することができる。このため、認識対象の文書に合
った文字認識を行うために認識率が向上し、その実用的
効果は大きい。また、文字修正部の訂正結果の情報から
文字認識部における認識文字の文字種を限定することに
より、文字認識部の認識率を向上させることができる。
As is apparent from the above embodiments, using the character recognition device configured according to the present invention makes it possible to determine additional learning characters from the correction results of the character correction unit and to automatically build a recognition dictionary suited to the font of the document being recognized. Because character recognition is thereby adapted to the document being recognized, the recognition rate improves, which is of great practical value. In addition, the recognition rate of the character recognition unit can be improved by restricting the character types of the recognized characters on the basis of the correction results of the character correction unit.

【図面の簡単な説明】[Brief description of drawings]

【図1】本発明の文字認識装置の一実施例の構成図FIG. 1 is a configuration diagram of an embodiment of a character recognition device of the present invention.

【図2】本発明の文字認識装置の他の実施例の構成図FIG. 2 is a configuration diagram of another embodiment of the character recognition device of the present invention.

【図3】本発明の同文字抽出部の出力図FIG. 3 is an output diagram of the same character extraction unit of the present invention.

【図4】本発明の文字認識装置の別の実施例の構成図FIG. 4 is a configuration diagram of another embodiment of the character recognition device of the present invention.

【図5】本発明の文字認識部の構成図FIG. 5 is a configuration diagram of a character recognition unit of the present invention.

【図6】本発明の文字認識装置の他の実施例の構成図FIG. 6 is a configuration diagram of another embodiment of the character recognition device of the present invention.

【図7】本発明の文字認識装置の別の実施例の構成図FIG. 7 is a configuration diagram of another embodiment of the character recognition device of the present invention.

【図8】本発明の文字種決定部の出力図FIG. 8 is an output diagram of the character type determination unit of the present invention.

【図9】従来の文字認識装置の構成図FIG. 9 is a configuration diagram of a conventional character recognition device.

【符号の説明】[Explanation of symbols]

1 文字認識部 2 単語検索部 3 文節検索部 4 文節評価値演算部 5 文節選択部 6 単語辞書 7 文法辞書 8 文字修正部 9 候補文字比較部 10 文字画像 11 候補文字集合 12 候補単語集合 13 候補文節集合 14 修正文字列 15 追加学習文字 16 認識辞書 21 同文字抽出部 31 キーワード抽出部 32 キーワード部分一致検索部 33 候補単語付加部 34 候補外文字検出部 35 キーワード集合 36 類似度計算部 37 重み係数 38 重み係数更新部 41 単語誤訂正度演算部 42 リジェクト文字決定部 43 候補単語付加部 51 文字種決定部 52 文字種辞書 61 自動訂正部 62 手動訂正制御部 63 訂正規則テーブル
1 Character recognition unit 2 Word search unit 3 Phrase search unit 4 Phrase evaluation value calculation unit 5 Phrase selection unit 6 Word dictionary 7 Grammar dictionary 8 Character correction unit 9 Candidate character comparison unit 10 Character image 11 Candidate character set 12 Candidate word set 13 Candidate phrase set 14 Corrected character string 15 Additional learning character 16 Recognition dictionary 21 Same character extraction unit 31 Keyword extraction unit 32 Keyword partial match search unit 33 Candidate word addition unit 34 Non-candidate character detection unit 35 Keyword set 36 Similarity calculation unit 37 Weighting coefficient 38 Weighting coefficient update unit 41 Word error correction degree calculation unit 42 Reject character determination unit 43 Candidate word addition unit 51 Character type determination unit 52 Character type dictionary 61 Automatic correction unit 62 Manual correction control unit 63 Correction rule table

フロントページの続き (72)発明者 前川 英嗣 大阪府門真市大字門真1006番地 松下電器 産業株式会社内 (72)発明者 伊藤 哲 大阪府門真市大字門真1006番地 松下電器 産業株式会社内 (72)発明者 小島 良宏 大阪府門真市大字門真1006番地 松下電器 産業株式会社内 (72)発明者 山本 浩司 大阪府門真市大字門真1006番地 松下電器 産業株式会社内
Continuation of front page (72) Inventor: Hidetsugu Maekawa, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture (72) Inventor: Satoru Ito, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture (72) Inventor: Yoshihiro Kojima, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture (72) Inventor: Koji Yamamoto, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture

Claims (6)

【特許請求の範囲】[Claims] 【請求項1】文書画像を認識して1文字に付きN個の候
補文字を出力する文字認識部と、 候補文字集合から単語辞書を用いて候補単語集合を求め
る単語検索部と、 候補単語集合から文法辞書を用いて候補文節を求める文
節検索部と、 文節の語彙的及び文法的な正しさを計算する文節評価値
演算部と、 文節の評価値を基準にして文節を選択し修正文字列を出
力する文節選択部と、 候補文字集合と修正文字列を比較し追加学習文字を出力
する候補文字比較部とからなることを特徴とする文字認
識装置。
1. A character recognition device comprising: a character recognition unit that recognizes a document image and outputs N candidate characters per character; a word search unit that obtains a candidate word set from the candidate character set using a word dictionary; a phrase search unit that obtains candidate phrases from the candidate word set using a grammar dictionary; a phrase evaluation value calculation unit that calculates the lexical and grammatical correctness of each phrase; a phrase selection unit that selects phrases on the basis of the phrase evaluation values and outputs a corrected character string; and a candidate character comparison unit that compares the candidate character set with the corrected character string and outputs additional learning characters.
【請求項2】文書画像を認識して1文字に付きN個の候
補文字を出力する文字認識部と、 候補文字集合から単語辞書を用いて候補単語集合を求め
る単語検索部と、 候補単語集合から文法辞書を用いて候補文節を求める文
節検索部と、 文節の語彙的及び文法的な正しさを計算する文節評価値
演算部と、 文節の評価値を基準にして文節を選択し修正文字列を出
力する文節選択部と、 候補文字集合と修正文字列を比較する候補文字比較部と
異なる単語中の同文字を抽出し追加学習文字を出力する
同文字抽出部とからなることを特徴とする文字認識装
置。
2. A character recognition device comprising: a character recognition unit that recognizes a document image and outputs N candidate characters per character; a word search unit that obtains a candidate word set from the candidate character set using a word dictionary; a phrase search unit that obtains candidate phrases from the candidate word set using a grammar dictionary; a phrase evaluation value calculation unit that calculates the lexical and grammatical correctness of each phrase; a phrase selection unit that selects phrases on the basis of the phrase evaluation values and outputs a corrected character string; a candidate character comparison unit that compares the candidate character set with the corrected character string; and a same character extraction unit that extracts identical characters occurring in different words and outputs additional learning characters.
【請求項3】文書画像を認識して1文字に付きN個の候
補文字を出力する文字認識部と、 候補文字集合から単語辞書を用いて候補単語集合を求め
る単語検索部と、 候補単語集合から文法辞書を用いて候補文節を求める文
節検索部と、 文節の語彙的及び文法的な正しさを計算する文節評価値
演算部と、 文節の評価値を基準にして文節を選択し修正文字列を出
力する文節選択部と、 修正文字列から認識対象の文書のキーワードを抽出する
キーワード抽出部と、 前記候補文字列集合の中で前記キーワードとの部分一致
検索を行うキーワード部分一致検索部と、 部分一致したキーワードを候補単語集合に付加する候補
単語付加部と、 修正文字列から候補単語付加部によって付加された文字
を検出し追加学習文字を出力する候補外文字検出部とか
らなることを特徴とする文字認識装置。
3. A character recognition device comprising: a character recognition unit that recognizes a document image and outputs N candidate characters per character; a word search unit that obtains a candidate word set from the candidate character set using a word dictionary; a phrase search unit that obtains candidate phrases from the candidate word set using a grammar dictionary; a phrase evaluation value calculation unit that calculates the lexical and grammatical correctness of each phrase; a phrase selection unit that selects phrases on the basis of the phrase evaluation values and outputs a corrected character string; a keyword extraction unit that extracts keywords of the document being recognized from the corrected character string; a keyword partial match search unit that performs a partial match search against the keywords within the candidate character string set; a candidate word addition unit that adds the partially matched keywords to the candidate word set; and a non-candidate character detection unit that detects, in the corrected character string, the characters added by the candidate word addition unit and outputs additional learning characters.
【請求項4】文書画像を認識して1文字に付きN個の候
補文字を出力する文字認識部と、 候補文字集合から単語辞書を用いて候補単語集合を求め
る単語検索部と、 候補単語集合から文法辞書を用いて候補文節を求める文
節検索部と、 文節の語彙的及び文法的な正しさを計算する文節評価値
演算部と、 文節の評価値を基準にして文節を選択し修正文字列を出
力する文節選択部と、 修正文字列から認識対象の文書のキーワードを抽出する
キーワード抽出部と、 前記候補文字列集合の中で前記キーワードとの部分一致
検索を行うキーワード部分一致検索部と、 修正文字列中の単語の誤訂正度を求める単語誤訂正度演
算部と、 単語誤訂正度からリジェクト文字を決定するリジェクト
文字決定部と、 部分一致したキーワードとリジェクト文字を比較し一致
する文字を候補単語集合に付加する候補単語付加部と、 修正文字列から候補単語付加部によって付加された文字
を検出し追加学習文字を出力する候補外文字検出部とか
らなることを特徴とする文字認識装置。
4. A character recognition device comprising: a character recognition unit that recognizes a document image and outputs N candidate characters per character; a word search unit that obtains a candidate word set from the candidate character set using a word dictionary; a phrase search unit that obtains candidate phrases from the candidate word set using a grammar dictionary; a phrase evaluation value calculation unit that calculates the lexical and grammatical correctness of each phrase; a phrase selection unit that selects phrases on the basis of the phrase evaluation values and outputs a corrected character string; a keyword extraction unit that extracts keywords of the document being recognized from the corrected character string; a keyword partial match search unit that performs a partial match search against the keywords within the candidate character string set; a word error correction degree calculation unit that obtains the error correction degree of each word in the corrected character string; a reject character determination unit that determines reject characters from the word error correction degrees; a candidate word addition unit that compares the partially matched keywords with the reject characters and adds the matching characters to the candidate word set; and a non-candidate character detection unit that detects, in the corrected character string, the characters added by the candidate word addition unit and outputs additional learning characters.
【請求項5】文字認識部が、文字画像と重み係数とから
文字の類似度を計算して候補文字を出力する類似度計算
部と、 追加学習文字と候補文字の誤差とから重み係数を更新す
る重み係数更新部とからなることを特徴とする、請求項
4記載の文字認識装置。
5. The character recognition device according to claim 4, wherein the character recognition unit comprises: a similarity calculation unit that calculates character similarities from a character image and weighting coefficients and outputs candidate characters; and a weighting coefficient update unit that updates the weighting coefficients from the error between the additional learning character and the candidate characters.
【請求項6】文書画像を認識して1文字に付きN個の候
補文字を出力する文字認識部と、 候補文字集合から単語辞書を用いて候補単語集合を求め
る単語検索部と、 候補単語集合から文法辞書を用いて候補文節を求める文
節検索部と、 文節の語彙的及び文法的な正しさを計算する文節評価値
演算部と、 文節の評価値を基準にして文節を選択し修正文字列を出
力する文節選択部と、 修正文字列から文字種辞書を用いて文字種を決定する文
字種決定部とからなることを特徴とする文字認識装置。
6. A character recognition device comprising: a character recognition unit that recognizes a document image and outputs N candidate characters per character; a word search unit that obtains a candidate word set from the candidate character set using a word dictionary; a phrase search unit that obtains candidate phrases from the candidate word set using a grammar dictionary; a phrase evaluation value calculation unit that calculates the lexical and grammatical correctness of each phrase; a phrase selection unit that selects phrases on the basis of the phrase evaluation values and outputs a corrected character string; and a character type determination unit that determines character types from the corrected character string using a character type dictionary.
JP05191893A 1992-09-28 1993-03-12 Character recognition device Expired - Fee Related JP3350127B2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP05191893A JP3350127B2 (en) 1993-03-12 1993-03-12 Character recognition device
US08/513,294 US6041141A (en) 1992-09-28 1995-08-10 Character recognition machine utilizing language processing
US08/965,534 US5987170A (en) 1992-09-28 1997-11-06 Character recognition machine utilizing language processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP05191893A JP3350127B2 (en) 1993-03-12 1993-03-12 Character recognition device

Publications (2)

Publication Number Publication Date
JPH06266906A true JPH06266906A (en) 1994-09-22
JP3350127B2 JP3350127B2 (en) 2002-11-25

Family

ID=12900260

Family Applications (1)

Application Number Title Priority Date Filing Date
JP05191893A Expired - Fee Related JP3350127B2 (en) 1992-09-28 1993-03-12 Character recognition device

Country Status (1)

Country Link
JP (1) JP3350127B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7680331B2 (en) 2004-05-25 2010-03-16 Fuji Xerox Co., Ltd. Document processing device and document processing method
JP2020204855A (en) * 2019-06-17 2020-12-24 株式会社日立製作所 Keyword detection device and keyword detection method

Also Published As

Publication number Publication date
JP3350127B2 (en) 2002-11-25

Similar Documents

Publication Publication Date Title
CN111859921A (en) Text error correction method and device, computer equipment and storage medium
Chaudhuri et al. OCR error detection and correction of an inflectional indian language script
JP3350127B2 (en) Character recognition device
JP4278011B2 (en) Document proofing apparatus and program storage medium
JP3255816B2 (en) Character recognition device
JP3274014B2 (en) Character recognition device and character recognition method
JP4047895B2 (en) Document proofing apparatus and program storage medium
JPH11328316A (en) Device and method for character recognition and storage medium
JPH0757059A (en) Character recognition device
JP3085107B2 (en) Character recognition device
JP3123181B2 (en) Character recognition device
JP4318223B2 (en) Document proofing apparatus and program storage medium
JP2908460B2 (en) Error recognition correction method and apparatus
JPH0290384A (en) Post-processing system for character recognizing device
JP3924899B2 (en) Text search apparatus and text search method
JP4047894B2 (en) Document proofing apparatus and program storage medium
JP2827066B2 (en) Post-processing method for character recognition of documents with mixed digit strings
JP3339879B2 (en) Character recognition device
JPH10240736A (en) Morphemic analyzing device
JP3725206B2 (en) Character recognition device
JPS646514B2 (en)
JPS62285189A (en) Character recognition post processing system
JPH03163663A (en) Unregistered word processing system
JPH04278664A (en) Address analysis processor
JPH03198180A (en) Post-processing method for character recognition

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20080913

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090913

Year of fee payment: 7

LAPS Cancellation because of no payment of annual fees