JPH06215198A - Character recognition post-processing system - Google Patents

Character recognition post-processing system

Info

Publication number
JPH06215198A
JPH06215198A JP5003614A JP361493A JPH06215198A JP H06215198 A JPH06215198 A JP H06215198A JP 5003614 A JP5003614 A JP 5003614A JP 361493 A JP361493 A JP 361493A JP H06215198 A JPH06215198 A JP H06215198A
Authority
JP
Japan
Prior art keywords
character
dictionary
information
character recognition
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP5003614A
Other languages
Japanese (ja)
Other versions
JP2560959B2 (en)
Inventor
Shinji Sase
慎治 佐瀬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Priority to JP5003614A (patent JP2560959B2)
Publication of JPH06215198A
Application granted
Publication of JP2560959B2
Anticipated expiration
Legal status: Expired - Lifetime

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To handle Japanese text and alphabetic character strings together in character recognition post-processing, and to enable collation in units of meaning as well, by aligning the unit of collation with the unit of a word. CONSTITUTION: Using the character recognition result 20 as key characters, dictionary reading processing 11 reads character arrangement information from a dictionary 21. Dictionary generation processing 12 analyzes the read character arrangement information, detects character delimiters, and divides the character arrangement information either at the delimiters or into segments of a prescribed length. A collation part 13 collates the character recognition result 20 with the divided character arrangement information; during collation, delimiter characters are collated specially. The results for the divided segments are accumulated and taken as the collation result for one piece of character arrangement information in the dictionary.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device and, in particular, to a character recognition post-processing system that confirms and corrects the reading result.

[0002]

[Prior Art] Post-processing of character recognition is widely used as a means of compensating for the imperfection of character recognition.

[0003] As to the nature of input character strings, Western text is written with words separated from one another, whereas Japanese text is customarily written continuously, without spaces between words. Post-processing techniques therefore differ according to the text being processed.

[0004] Post-processing for Western text divides the character recognition result into words on the basis of delimiter information such as blanks, and collates word by word. Processing for continuously written Japanese, by contrast, collates without assuming that a character string set off by delimiters such as blanks is a single word.

[0005]

[Problems to Be Solved by the Invention] When the character recognition post-processing systems described above are applied to reading mixed Japanese and Western text, then depending on the technique adopted, either the Japanese must also be written with spaces between words, or Western words may be extracted as a continuous run without delimiter information.

[0006] There are also character strings, such as book titles, that are structurally a sequence of words but are better handled semantically as a single word. To deal with such a string in the conventional post-processing described above, one can either regard it as a sequence of words and register each word separately in the dictionary, or register the whole sequence in the dictionary as a single word. With the former, however, it becomes difficult to treat the whole as one semantic unit; with the latter, the word becomes long and the processing time becomes enormous.

[0007] Furthermore, once semantically coherent word strings can be identified, the relationships among them (for example, between a book title and its author's name) can be expressed with a simple structure. Conventional techniques, however, either can handle only a single semantically coherent word string or cannot adequately segment word strings into semantic units.

[0008] Moreover, in English text and the like, letter case often varies with the format (uppercase ↔ lowercase) and names are often abbreviated to initials, so such variations must be handled. Handling them consistently at collation time makes the processing very complicated.

[0009] The present invention has been made in view of the above circumstances. Its object is to provide a novel character recognition post-processing system capable of solving the above problems inherent in the prior art.

[0010]

[Means for Solving the Problems] To achieve the above object, the character recognition post-processing system according to the present invention comprises: means for reading information to be collated from a dictionary according to the result of character recognition; means for dividing the read character string according to the character recognition result and the read character arrangement information to generate a hierarchical divided dictionary; means for collating the input character string with the read character arrangement information and obtaining their similarity sequentially in hierarchical order; and means for finally determining the post-processing result corresponding to the input character string.

[0011]

[Embodiments] Next, preferred embodiments of the present invention are described in detail with reference to the drawings.

[0012] FIG. 1 is a block diagram showing an embodiment of the present invention (the invention of claim 1). The processing flow is outlined below.

[0013] Referring to FIG. 1, when a character recognition result 20 is given, processing start 10 activates dictionary reading 11.

[0014] FIG. 2 shows an example of the character recognition result 20. Here, three character recognition candidates are output for each written character.

[0015] Dictionary reading 11 searches the dictionary 21 using the character recognition result 20 as key characters, and selects and reads the word information to be collated. The dictionary generation unit 12 sets the collation range of the input character string around the delimiter information, analyzes the read word information on that basis, generates collation information divided into shorter parts where necessary, and passes it piece by piece, together with the collation position, to the collation unit 13.
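
The key-character lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not specify the index layout, so the sketch assumes the key is the first character of each dictionary entry and that any candidate for the first written character may serve as a key. The names `build_index` and `read_candidates` and the sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical recognition result: one ranked candidate list per written character.
recognition = [["N", "M", "H"], ["E", "F", "B"], ["C", "G", "O"]]


def build_index(entries):
    """Index dictionary entries by their first character (assumed key character)."""
    index = defaultdict(list)
    for entry in entries:
        if entry:
            index[entry[0]].append(entry)
    return index


def read_candidates(index, recognition):
    """Return entries whose first character is a candidate for position 0."""
    if not recognition:
        return []
    seen, result = set(), []
    for ch in recognition[0]:            # candidates in rank order
        for entry in index.get(ch, []):
            if entry not in seen:        # avoid reading the same entry twice
                seen.add(entry)
                result.append(entry)
    return result
```

With the sample data, entries beginning with "N", "M", or "H" are read for collation and all others are skipped.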

[0016] After all of the word information has been sent to the collation unit 13, end-of-transmission information is sent to the collation unit 13, and the next dictionary read is executed.

[0017] When the division information of a word does not agree with the delimiter information of the input character string, dictionary generation is aborted and the next dictionary read is executed.

[0018] The collation unit 13 collates the successively received division information of the character arrangement with the character recognition result 20 and accumulates the results. When it receives end-of-transmission information from dictionary generation 12, the collation unit 13 compares the accumulated result with the collation intermediate results 22 and, if the accumulated result could remain in the final determination result, adds it to the collation intermediate results 22.

[0019] When dictionary reading 11 has no more dictionary entries to read, determination 14 is activated; it refers to the collation intermediate results 22, creates the determination result 23, and ends the processing.

[0020] A character recognition post-processing system configured as above can be implemented with a CPU (central processing unit) and storage media (RAM, hard disk, floppy disk, etc.) that store the various processing programs and data.

[0021] Processes 11 to 13 are described in detail below.

[0022] In dictionary reading processing 11, character arrangement information is read from the dictionary 21 using the character recognition result 20 as key characters. FIG. 3 shows an example of character arrangement information. FIG. 3 is one item of character arrangement information, divided by semantic unit. The character storage position indicates where the referenced character code string is stored, and the character count indicates the number of character codes referenced in that area. The concatenation information indicates the connections to the character arrangement information of other semantic units, and the content information indicates semantic information about this item. The symbol Δ in the character code string denotes a blank code.
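
One way to picture the record of FIG. 3 is the sketch below. The four fields follow the description above, but the field names, their types, and the `characters` helper are assumptions made for illustration; the patent does not give a concrete storage layout.

```python
from dataclasses import dataclass, field

BLANK = "Δ"  # blank code, written as Δ in the description


@dataclass
class ArrangementInfo:
    storage_position: int   # where the referenced character code string starts
    char_count: int         # number of character codes referenced in that area
    connections: list = field(default_factory=list)  # links to other semantic units
    content: str = ""       # semantic label, e.g. "title" or "full name"


def characters(info, code_area):
    """Return the character code string that this record refers to."""
    end = info.storage_position + info.char_count
    return code_area[info.storage_position:end]
```

Keeping the character codes in a shared area and storing only a position and a count means several records can refer to the same stored text.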

[0023] In dictionary generation processing 12, the character recognition result 20 is searched for delimiters and their positions are recorded. Two kinds of delimiter are extracted: blanks and other symbols (“,”, “;”). Next, the read character arrangement information is searched for delimiters, and each delimiter position is compared with the delimiter positions in the character recognition result; where they match, the character arrangement information is divided. If it contains no delimiter, the character arrangement information is divided into segments of a prescribed length. The delimiter comparison covers all delimiters except at the final character, where only blanks are compared.
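
The division step can be sketched as follows. This is a simplified illustration, not the patented procedure: it assumes a position in the recognition result counts as a delimiter if any of its candidates is a delimiter, it drops the delimiter itself from the segments, and it omits the special rule for the final character. `split_entry`, `delimiter_positions`, and the default `chunk` length are hypothetical.

```python
BLANK = " "
DELIMITERS = {BLANK, ",", ";"}


def delimiter_positions(recognition):
    """Positions where any recognition candidate is a delimiter (assumption)."""
    return {i for i, cands in enumerate(recognition)
            if any(c in DELIMITERS for c in cands)}


def split_entry(entry, recognition, chunk=4):
    """Split a dictionary entry into (start, segment) pairs.

    Returns None when the entry's delimiters do not line up with the
    recognition result, which corresponds to aborting dictionary generation.
    """
    entry_delims = [i for i, ch in enumerate(entry) if ch in DELIMITERS]
    if not entry_delims:
        # No delimiter: fall back to fixed-length segments.
        return [(i, entry[i:i + chunk]) for i in range(0, len(entry), chunk)]
    rec_delims = delimiter_positions(recognition)
    if any(i not in rec_delims for i in entry_delims):
        return None
    parts, start = [], 0
    for i in entry_delims + [len(entry)]:
        if i > start:
            parts.append((start, entry[start:i]))
        start = i + 1
    return parts
```

Each segment carries its start position so that the collation step knows where in the recognition result to compare it.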

[0024] FIG. 4 shows an example in which the character arrangement information of FIG. 3 has been divided for the character recognition result of FIG. 2.

[0025] The divided character arrangement information is sent to the collation unit 13 piece by piece, and end-of-transmission information is sent after the final piece.

[0026] The delimiter comparison described above is carried out by checking that the delimiter positions in the character arrangement information agree with the delimiter positions in the character recognition result, and that the non-delimiter information in the character arrangement information agrees with the non-blank information in the character recognition result.

[0027] The collation unit 13 is described with reference to FIG. 5.

[0028] Referring to FIG. 5, the collation unit 13 calculates the similarity between the divided character arrangement information and the character recognition result 20. Similarity calculation 132 checks whether the candidate characters at the specified position in the character recognition result include the character specified by the character arrangement information. The candidate rank from character recognition may be used as the similarity value, for example. For delimiter characters, however, a pre-registered delimiter table is used to detect the best delimiter candidate, and that character is substituted into the divided character arrangement information; this detection is not included in the similarity calculation.
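
A rank-based similarity of this kind can be sketched as below. The text only says that the candidate rank may be used as the similarity value; converting rank into a score where a better rank scores higher is an assumption of this sketch, as are the name `segment_similarity` and the sample data. Delimiter handling is left out.

```python
def segment_similarity(segment, start, recognition):
    """Score one divided segment against the recognition result.

    Returns (score, matched): score sums a rank-derived value for every
    character found among the candidates at its position; matched counts
    how many characters were found at all.
    """
    score, matched = 0, 0
    for offset, ch in enumerate(segment):
        pos = start + offset
        if pos >= len(recognition):
            continue                      # segment runs past the input
        cands = recognition[pos]
        if ch in cands:
            # First candidate scores len(cands), the last one scores 1.
            score += len(cands) - cands.index(ch)
            matched += 1
    return score, matched


# Hypothetical recognition result with three candidates per character.
recognition = [["N", "M", "H"], ["E", "F", "B"], ["C", "G", "O"]]
```

A `matched` count of zero corresponds to the case in which no character of the segment appears at its position, so the segment cannot support the entry.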

[0029] If none of the characters in the divided character arrangement information, excluding delimiters, is present at the corresponding position in the character recognition result, reject flag 135 is set. Otherwise, the similarity is accumulated (133) and stored in the primary storage buffer 136 together with the collation position and the character arrangement information.

[0030] On receiving the information that transmission of the division information has ended, addition of collation candidates 134 is performed. First, reject flag 135 is checked. If the reject flag 135 is not set, the primary storage buffer 136 is compared with the collation intermediate results 22; if the candidate could remain among the final determination candidates, it is added to the collation intermediate results 22, and if the number of candidates in the collation intermediate results 22 then exceeds a prescribed number, the least likely candidate is deleted.
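
The candidate-addition step can be sketched as a bounded list of the best candidates so far. This is a minimal illustration: representing a candidate as a `(score, text)` pair, treating a higher score as more likely, and the default `limit` are all assumptions, and `add_candidate` is a hypothetical name.

```python
def add_candidate(intermediate, candidate, rejected, limit=3):
    """Return the updated list of intermediate results.

    intermediate: list of (score, text) pairs, best first.
    candidate:    the (score, text) pair accumulated for the current entry.
    rejected:     True when the reject flag was set during collation.
    """
    if rejected:
        return list(intermediate)         # rejected entries are never added
    updated = sorted(intermediate + [candidate],
                     key=lambda c: c[0], reverse=True)
    return updated[:limit]                # drop the least likely beyond the limit
```

Because the list is capped, a new candidate that scores below every stored candidate is discarded immediately when the list is already full.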

[0031] FIG. 6 is a processing flow showing the character recognition post-processing system of claim 2. Dictionary reading 31 performs a dictionary search using the candidate characters of the character recognition result 20 as key characters and, in addition, performs a dictionary search based on the collation intermediate results 22. At this point, the dictionary search information is used as the concatenation information of FIG. 4.

[0032] Determination 32 creates the determination result 23 by combining collation candidates from the collation intermediate results 22 on the basis of their concatenation information and collation positions.
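
Combining candidates by concatenation information and collation position might look like the sketch below. The patent does not describe the combination rule in detail, so this sketch assumes that two candidates may be joined when the first lists the second as a permitted successor and the second starts at or after the end of the first. The `Candidate` fields and the `combine` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    entry_id: str
    start: int            # collation position in the input (inclusive)
    end: int              # collation position in the input (exclusive)
    score: int
    next_ids: tuple = ()  # entries this one may be followed by


def combine(candidates):
    """Return (total_score, first_id, second_id) tuples, best first."""
    pairs = []
    for a in candidates:
        for b in candidates:
            if b.entry_id in a.next_ids and b.start >= a.end:
                pairs.append((a.score + b.score, a.entry_id, b.entry_id))
    return sorted(pairs, reverse=True)
```

For a book title followed by an author name, only the author entries linked to that title are considered, even if an unrelated author entry scored higher on its own.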

[0033] In the invention of claim 3, when the dictionary generation of claim 1 or 2 detects that the divided character arrangement information consists of alphabetic characters, or of alphabetic characters together with digits, it creates three kinds of character arrangement information by uppercase ↔ lowercase conversion. FIG. 7 shows the division information created from FIG. 3.
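
The three case variants can be sketched as follows, assuming a segment qualifies when it is pure ASCII, contains only letters and digits, and has at least one letter. The function name `case_variants` is hypothetical.

```python
def case_variants(segment):
    """Return the case variants to collate for one divided segment.

    Alphabetic or alphanumeric ASCII segments yield three variants:
    all uppercase, all lowercase, and first letter uppercase with the
    rest lowercase. Any other segment is returned unchanged.
    """
    is_alnum_ascii = segment.isascii() and segment.isalnum()
    if not is_alnum_ascii or not any(c.isalpha() for c in segment):
        return [segment]
    capitalized = segment[:1].upper() + segment[1:].lower()
    return [segment.upper(), segment.lower(), capitalized]
```

Each variant can then be scored against the recognition result and the best-scoring one kept, so a single registered entry covers all three ways of writing it.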

[0034] In the invention of claim 4, in the dictionary generation of claim 1, 2, or 3, when the divided character arrangement information consists of alphabetic characters, the number of divisions is two or more, and the content information field of the character arrangement information is marked as a full name, every division except the final one is replaced with its initial. FIG. 8 shows an example of this generation.
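
The initial-substitution rule can be sketched as below. The label string `"full name"` stands in for the content information value, and the function name `initial_variant` is hypothetical; whether a period follows each initial is not stated in the text, so none is added here.

```python
def initial_variant(parts, content):
    """Return the divided segments with all but the last reduced to initials.

    Returns None when the rule does not apply: the content label is not a
    full name, there are fewer than two segments, or a segment is not
    purely alphabetic ASCII.
    """
    if content != "full name" or len(parts) < 2:
        return None
    if not all(p.isascii() and p.isalpha() for p in parts):
        return None
    return [p[0].upper() for p in parts[:-1]] + [parts[-1]]
```

The generated variant is collated alongside the unabbreviated segments, so one registered name matches both the full and the abbreviated written form.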

[0035]

[Effects of the Invention] As described above, according to the present invention, character arrangement information in which Japanese and alphabetic notation are mixed can be handled in the same semantic units, and alphabetic strings, which generally contain more characters than their Japanese counterparts, can be processed at about the same speed as Japanese.

[0036] Furthermore, the invention of claim 2 also enables collation based on concatenation relationships between semantic units, such as a book title and its author's name.

[0037] The invention of claim 3 can handle inconsistent use of uppercase and lowercase letters in alphabetic entries with a single registered item, and can also handle English embedded in Japanese text.

[0038] The invention of claim 4 can, in addition, handle variations in how a full name is written with a single registered item.

[0039] Accordingly, the present invention is highly effective as character recognition post-processing for entry forms whose content is typified by book title, author name, and publisher name.

[Brief Description of the Drawings]

FIG. 1 is a functional block diagram showing an embodiment of the invention of claim 1.

FIG. 2 is a diagram showing an example of a character recognition result.

FIG. 3 is a diagram showing an example of the storage format of one item of character arrangement information in the dictionary.

FIG. 4 is a diagram showing the divided character arrangement information generated from the character arrangement information of FIG. 3.

FIG. 5 is a block diagram showing the detailed functions of the collation function 13.

FIG. 6 is a functional block diagram showing an embodiment of the invention of claim 2.

FIG. 7 is a diagram showing the character arrangement information generated from the character arrangement information of FIG. 3 by the invention of claim 3.

FIG. 8 is a diagram showing an example of the generation of divided character arrangement information by the invention of claim 4.

[Explanation of Reference Signs]

10 ... Start of processing
11, 31 ... Dictionary reading processing
12 ... Dictionary generation processing
13 ... Collation processing
14, 32 ... Determination processing
15 ... End of processing
20 ... Character recognition result
21 ... Dictionary
22 ... Collation intermediate results
23 ... Determination result
131 ... Division end
132 ... Similarity calculation
133 ... Result accumulation
134 ... Addition of collation candidates
135 ... Reject flag
136 ... Primary storage buffer

Claims (4)

[Claims]

[Claim 1] A character recognition post-processing system that confirms and corrects the result of character recognition on the basis of a dictionary containing information on character arrangements, the system comprising:
means for reading information to be collated from the dictionary according to the result of character recognition;
means for dividing a character string according to the character recognition result and the read character arrangement information to create hierarchical divided character arrangement information;
means for collating an input character string with the read character arrangement information and obtaining their similarity sequentially in hierarchical order; and
means for finally determining a post-processing result corresponding to the input character string.
[Claim 2] A character recognition post-processing system that confirms and corrects the result of character recognition on the basis of a dictionary containing information on character arrangements, the system comprising:
means for reading information to be collated from the dictionary according to the result of character recognition;
means for dividing a character string according to the character recognition result and the read character arrangement information to create hierarchical divided character arrangement information;
means for collating an input character string with the read character arrangement information and obtaining their similarity sequentially in hierarchical order; and
means for finally determining a post-processing result corresponding to the input character string,
wherein, when dictionary information to be collated is selected and the dictionary is read, the result of collating the input character string with dictionary information already read is used as a read condition together with the result of character recognition.
[Claim 3] The character recognition post-processing system according to claim 1 or 2, further characterized in that:
when the dictionary is read according to the character recognition result and the character recognition result contains alphabetic characters, the dictionary is read without distinguishing uppercase from lowercase;
when a divided dictionary is created, only if the divided dictionary consists solely of alphabetic characters and digits, three kinds of divided dictionary are generated, namely a divided dictionary consisting only of uppercase letters, a divided dictionary consisting only of lowercase letters, and a dictionary whose first letter is uppercase and whose remaining letters are lowercase; and
the dictionary with the highest similarity among the three dictionaries is selected in advance.
[Claim 4] The character recognition post-processing system according to any one of claims 1, 2, and 3, further characterized in that, when a divided dictionary is created, a dictionary composed of initials is generated if the divided dictionary consists solely of alphabetic characters and digits and the dictionary information restricts the content to a full name.
JP5003614A 1993-01-12 1993-01-12 Post-processing method for character recognition Expired - Lifetime JP2560959B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP5003614A JP2560959B2 (en) 1993-01-12 1993-01-12 Post-processing method for character recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP5003614A JP2560959B2 (en) 1993-01-12 1993-01-12 Post-processing method for character recognition

Publications (2)

Publication Number Publication Date
JPH06215198A true JPH06215198A (en) 1994-08-05
JP2560959B2 JP2560959B2 (en) 1996-12-04

Family

ID=11562373

Family Applications (1)

Application Number Title Priority Date Filing Date
JP5003614A Expired - Lifetime JP2560959B2 (en) 1993-01-12 1993-01-12 Post-processing method for character recognition

Country Status (1)

Country Link
JP (1) JP2560959B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11515153A (en) * 1995-11-02 1999-12-21 インターナショナル モービル サテライト オーガニゼイション Image communication

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5839377A (en) * 1981-09-02 1983-03-08 Toshiba Corp Character recognizing device
JPS5856189A (en) * 1981-09-30 1983-04-02 Comput Basic Mach Technol Res Assoc Character recognition postprocessing system
JPS58169679A (en) * 1982-03-31 1983-10-06 Comput Basic Mach Technol Res Assoc After-processing system of sentence reader

Also Published As

Publication number Publication date
JP2560959B2 (en) 1996-12-04

Similar Documents

Publication Publication Date Title
US7756871B2 (en) Article extraction
US9875254B2 (en) Method for searching for, recognizing and locating a term in ink, and a corresponding device, program and language
US8266169B2 (en) Complex queries for corpus indexing and search
Grefenstette Tokenization
EP1331574B1 (en) Named entity interface for multiple client application programs
WO2010044123A1 (en) Search device, search index creating device, and search system
JP2007122403A (en) Device, method, and program for automatically extracting document title and relevant information
JP3544749B2 (en) Keyword automatic extraction device
JP2001175661A (en) Device and method for full-text retrieval
JP3589007B2 (en) Document filing system and document filing method
JP2560959B2 (en) Post-processing method for character recognition
JPH06215184A (en) Labeling device for extracted area
JPS6239793B2 (en)
Eutamene et al. Ontologies and Bigram-based Approach for Isolated Non-word Errors Correction in OCR System.
JPH0441388B2 (en)
EP1072986A2 (en) System and method for extracting data from semi-structured text
JPH0748217B2 (en) Document summarization device
JPH07296005A (en) Japanese text registration/retrieval device
JP2000090193A (en) Character recognition device and item classifying method
JP2599973B2 (en) Japanese sentence correction candidate character extraction device
JP3924899B2 (en) Text search apparatus and text search method
JPH0256086A (en) Method for postprocessing for character recognition
KR101663521B1 (en) Method and program for proofreading word spacing
JPS63282586A (en) Character recognition device
JP2827066B2 (en) Post-processing method for character recognition of documents with mixed digit strings