JPS60147888A - Postprocessing of character recognition and device - Google Patents

Postprocessing of character recognition and device

Info

Publication number
JPS60147888A
JPS60147888A JP59004198A JP419884A JPS60147888A JP S60147888 A JPS60147888 A JP S60147888A JP 59004198 A JP59004198 A JP 59004198A JP 419884 A JP419884 A JP 419884A JP S60147888 A JPS60147888 A JP S60147888A
Authority
JP
Japan
Prior art keywords
word
character
characters
candidate
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP59004198A
Other languages
Japanese (ja)
Other versions
JPH0438026B2 (en)
Inventor
Eiichiro Yamamoto
山本 栄一郎
Yukikazu Kaburayama
蕪山 幸和
Yoshihisa Fujii
敬久 藤井
▲はい▼ 東善
Touzen Hai
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Basic Technology Research Association Corp
Original Assignee
Computer Basic Technology Research Association Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.) 1984-01-12
Filing date 1984-01-12
Publication date 1985-08-03
Application filed by Computer Basic Technology Research Association Corp
Priority to JP59004198A
Publication of JPS60147888A
Publication of JPH0438026B2
Granted (current legal status)


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Character Discrimination (AREA)

Abstract

PURPOSE: To shorten the search time of a word dictionary. An optimum index character position is preset according to the number of characters of the words to be registered in the word dictionary. The character at the optimum index character position of a candidate word string is then used as an index character, and only words with the same number of characters as the candidate word are retrieved from the word dictionary. CONSTITUTION: Assume that the input word is ''KEISANKI'' (計算機), with n=3 characters. The candidate characters supplied are ''tei'', ''kei'', ''tou'', ''kyo'' and ''so'' (訂, 計, 討, 許, 訴) for ''kei''; ''san'', ''toku'', ''tou'', ''ga'' and ''bo'' (算, 篤, 等, 賀, 簿) for ''san''; and ''ki'', ''iso'', ''ou'', ''ran'' and ''sei'' (機, 磯, 横, 欄, 精) for ''ki''. In this state, a word character count register 7a stores the value corresponding to the number of characters of the candidate word (n=3). This value is transmitted to an index character reading circuit 7c. The circuit refers to an index character table 7b and sequentially reads from a candidate word buffer 5, as index characters, the candidate characters ''san'', ''toku'', ''tou'', ''ga'' and ''bo'' at the optimum index character position m=2.

Description

Detailed Description of the Invention

(1) Technical Field of the Invention
The present invention relates to a character recognition post-processing method and apparatus. The post-processing is applied to characters that have already been recognized, in order to prevent misrecognition in character recognition.

(2) Background of the Technology
In general, a character recognition device observes characters written on paper or the like and extracts their features. It then identifies each character by comparing those features with the contents of a recognition dictionary.

However, when the characters on the paper are sloppily written or follow idiosyncratic handwriting habits, they may be misrecognized.

(3) Prior Art and Its Problems
To address this, one conventional post-processing method assumes that the recognition target is a word made up of several characters. It re-recognizes the already-recognized characters word by word, which keeps character misrecognition to a minimum.

In this method, a word dictionary is prepared in advance. It holds many words, including the input word, arranged so that they can be searched by their first character. Each character of the input word is recognized, producing candidate character strings ranked in descending order of similarity. These strings are partitioned into word units to form candidate word strings. The candidate word strings are then compared with the contents of the word dictionary to select the best-matching word.

Depending on the number of characters in a word, however, the number of words that begin with a particular first character can become very large. The conventional method always searches the word dictionary by the first character of the word. In such cases, the dictionary search therefore takes a long time, and the efficiency of character recognition post-processing drops accordingly.
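The following minimal Python sketch makes the cost of first-character indexing concrete. Everything in it is an illustrative assumption. The word list is a tiny hypothetical stand-in for the word dictionary, and the function names are invented. The first-character candidates are borrowed from the worked example in section (6). The source does not say how the conventional dictionary handles word length, so the sketch checks length only after a word has been retrieved.

```python
from collections import defaultdict

# Tiny hypothetical dictionary of three-character words (illustration only).
WORDS = ["計算機", "計画書", "計測器", "計量器", "測定器", "加湿器",
         "請求書", "精算書", "予算案", "許可証", "訂正印"]

def build_first_char_index(words):
    """Conventional scheme (as sketched here): bucket every word by its first character."""
    index = defaultdict(list)
    for w in words:
        index[w[0]].append(w)
    return index

def conventional_lookup(index, first_candidates, n):
    """Scan the bucket of every first-position candidate character.

    Returns (words of length n found, number of dictionary entries examined).
    """
    hits, scanned = [], 0
    for c in first_candidates:
        for w in index.get(c, []):   # .get does not create empty buckets
            scanned += 1             # every entry in a bucket costs one examination
            if len(w) == n:          # length is checked only after retrieval
                hits.append(w)
    return hits, scanned

hits, scanned = conventional_lookup(
    build_first_char_index(WORDS), ["訂", "計", "討", "許", "訴"], 3)
print(scanned)  # 6
```

With a realistic dictionary, the bucket for a common first character such as 「計」 would be far larger than the four entries in this toy list.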

(4) Object of the Invention
The present invention was made in view of the above. Its object is to provide a character recognition post-processing method and apparatus that shorten the word dictionary search time in character recognition post-processing and thereby improve processing efficiency.

(5) Constitution of the Invention
The basic constitution of the character recognition post-processing method according to the present invention is as follows. Each character of an input word consisting of a plurality of characters is recognized, and the resulting candidate character strings are partitioned into word units to form candidate word strings. These are compared with the contents of a word dictionary to select the best word from it. To do so:
- an optimum index character position is preset according to the number of characters of the words to be registered in the word dictionary;
- the registered words are arranged in the word dictionary so that they can be searched by the character at the optimum index character position;
- the character at the optimum index character position of the candidate word strings is used as an index character to retrieve, from the word dictionary, words with the same number of characters as the candidate word.

The basic constitution of the apparatus for carrying out this method comprises:
- storage means for storing the candidate character strings, obtained by recognizing each character of an input word consisting of a plurality of characters, as candidate word strings partitioned into word units;
- a word dictionary in which the registered words are arranged so that they can be searched by the character at an optimum index character position, preset according to the number of characters of the words to be registered;
- search means for retrieving from the word dictionary words with the same number of characters as the candidate word, using as an index character the character at the optimum index character position of the candidate word strings held in the storage means;
- selection means for selecting the best of the words retrieved from the word dictionary.
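The proposed lookup can be sketched the same way. This is a minimal illustration under stated assumptions, not the patented circuit. The index is a Python dict keyed by (word length, character at the optimum position). OPTIMUM_POSITION plays the role of the index character table, and only its entry for n = 3 (m = 2) comes from the embodiment. The word list and function names are hypothetical. The candidate columns are those of the worked example in section (6).

```python
from collections import defaultdict

# Hypothetical stand-in for the word dictionary.
WORDS = ["計算機", "計画書", "計測器", "計量器", "測定器", "加湿器",
         "請求書", "精算書", "予算案", "許可証", "訂正印"]

# Index character table as a dict: word length n -> optimum position m (1-based).
# Only n = 3 -> m = 2 is taken from the embodiment; a real table covers every length.
OPTIMUM_POSITION = {3: 2}

def build_position_index(words, optimum_position):
    """Bucket each word under (length, character at that length's optimum position)."""
    index = defaultdict(list)
    for w in words:
        m = optimum_position.get(len(w))
        if m is not None:  # this sketch skips lengths with no preset position
            index[(len(w), w[m - 1])].append(w)
    return index

def position_lookup(index, columns, optimum_position):
    """columns[i] holds the ranked candidates for character i + 1 of the input word.

    Every candidate at position m serves as an index character, and only
    dictionary words of the same length n are retrieved.
    """
    n = len(columns)
    m = optimum_position[n]
    hits = []
    for c in columns[m - 1]:
        hits.extend(index.get((n, c), []))
    return hits

COLUMNS = [["訂", "計", "討", "許", "訴"],
           ["算", "篤", "等", "賀", "簿"],
           ["機", "磯", "横", "欄", "精"]]
index = build_position_index(WORDS, OPTIMUM_POSITION)
print(position_lookup(index, COLUMNS, OPTIMUM_POSITION))  # ['計算機', '精算書', '予算案']
```

Only the bucket for (3, 「算」) contains anything here, so three entries are examined. How much this saves in practice depends on how characters are distributed in the real dictionary, which is why the position is chosen separately for each word length.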

(6) Embodiments of the Invention
The character recognition post-processing method and apparatus according to the present invention will now be described in detail with reference to the embodiment shown in the accompanying drawings.

Fig. 1 is a block diagram of a character recognition device fitted with the character recognition post-processing device according to the present invention. In the figure, the character recognition device comprises:
- an observation unit 1, which optically reads each character of an input word written on paper or the like, photoelectrically converts the optical signal, and outputs it;
- a feature extraction unit [...], which extracts the features of the input character from the electrical signal representing the character pattern supplied by the observation unit 1;
- an identification unit 4, which compares these features with the standard features [...] and identifies characters in the recognition dictionary as the input character in descending order of similarity.

The character recognition post-processing device comprises:
- a candidate word buffer 5, which stores the candidate character strings identified by the identification unit 4 for each character of the input word as candidate word strings partitioned into word units;
- a word dictionary 6, which stores in advance words including the input word;
- a word matching unit 7, which compares the candidate word strings with the contents of the word dictionary 6 and selects the best word from it.

In this embodiment, the candidate word buffer 5 is shown in Fig. 2. It has as many registers as the number of candidate characters supplied by the identification unit 4 (for example, 5), and each register has a bit width corresponding to the maximum number of characters in an input word. Take an input word of n characters (for example, 3). The candidate characters for its i-th character (i = 1, 2, 3) are stored at the i-th address of registers 5a through 5e, in descending order of similarity from the top register 5a to the bottom register 5e. Each of registers 5a through 5e therefore holds one candidate word.
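As a rough model of this buffer layout, the Python sketch below stores each register as a string rather than a fixed-width bit register. The function names are hypothetical, and the candidates are those of the worked example in section (6). Reading one column of the buffer yields the index characters that the index character reading circuit would fetch.

```python
def build_candidate_buffer(columns):
    """Lay out ranked candidates the way registers 5a-5e are described.

    columns[i] lists the candidates for input character i + 1, best first.
    Row r of the result corresponds to register 5a, 5b, ... (rank r + 1),
    and its i-th character is the rank-(r + 1) candidate for character i + 1.
    """
    depth = len(columns[0])
    if any(len(col) != depth for col in columns):
        raise ValueError("every character needs the same number of candidates")
    return ["".join(col[r] for col in columns) for r in range(depth)]

def read_index_characters(buffer, m):
    """Read the character at 1-based position m from every row of the buffer."""
    return [row[m - 1] for row in buffer]

COLUMNS = [["訂", "計", "討", "許", "訴"],
           ["算", "篤", "等", "賀", "簿"],
           ["機", "磯", "横", "欄", "精"]]
buffer = build_candidate_buffer(COLUMNS)
print(buffer)                            # ['訂算機', '計篤磯', '討等横', '許賀欄', '訴簿精']
print(read_index_characters(buffer, 2))  # ['算', '篤', '等', '賀', '簿']
```

Note that the top row is the concatenation of each position's best candidate, 「訂算機」, and not necessarily a real word. This is why the later matching step is needed.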

As shown in Fig. 2, the words registered in the word dictionary 6 are arranged so that they can be searched by the character at the optimum index character position. This position is preset according to the number of characters of the words to be registered. It indicates the index character position that keeps the number of words to be searched to a minimum when words of a given length are retrieved from the word dictionary 6. In this embodiment, the optimum index character position is set to m = 2 for input words of n = 3 characters.
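The source does not say how the optimum position is determined, only that it minimizes the number of words to be searched. One plausible reading, offered here purely as an assumption, is to pick, for each word length, the position whose largest bucket is smallest (a worst-case criterion). The sketch below applies that hypothetical rule to a toy word list. On this list it happens to reproduce the embodiment's m = 2 for n = 3, but that agreement is a property of the toy data only.

```python
from collections import Counter, defaultdict

# Hypothetical dictionary used only to exercise the selection rule.
WORDS = ["計算機", "計画書", "計測器", "計量器", "測定器", "加湿器",
         "請求書", "精算書", "予算案", "許可証", "訂正印"]

def choose_optimum_positions(words):
    """For each word length n, return the 1-based position whose largest bucket is smallest.

    'Largest bucket' is an assumed cost; an average-case cost would be equally plausible.
    """
    by_length = defaultdict(list)
    for w in words:
        by_length[len(w)].append(w)
    table = {}
    for n, group in by_length.items():
        best_pos, best_cost = None, None
        for p in range(n):
            # Worst-case number of words sharing one index character at position p + 1.
            cost = max(Counter(w[p] for w in group).values())
            if best_cost is None or cost < best_cost:  # ties keep the earlier position
                best_pos, best_cost = p + 1, cost
        table[n] = best_pos
    return table

print(choose_optimum_positions(WORDS))  # {3: 2}
```

On this list the largest buckets are 4 (「計」) at position 1, 3 (「算」) at position 2, and 4 (「器」) at position 3, so position 2 is chosen.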

As shown in Fig. 2, the word matching unit 7 comprises search means and selection means. The search means retrieves from the word dictionary 6 the words with the same number of characters as the candidate word, using as index characters the characters at the optimum index character position of the candidate word strings. The selection means selects the best of the words retrieved from the word dictionary 6.

In Fig. 2, the following components constitute the search means:
- 7a is a word character count register, which stores the number of characters of the candidate words in the candidate word buffer 5;
- 7b is an index character table, which stores the optimum index character position m for each number of characters n of the words to be registered;
- 7c is an index character reading circuit, which reads the contents of the word character count register 7a and, referring to the index character table 7b, reads the index characters from the candidate word buffer 5;
- 7d is a read address table, which stores the addresses of the words registered in the word dictionary 6;
- 7e is a word dictionary read control circuit, which determines the addresses in the read address table 7d from the contents of the word character count register 7a and the index character reading circuit 7c, and reads words sequentially from the word dictionary 6.

The following components, the word register 7f through the comparator 7k, constitute the selection means:
- 7f is a word register, which stores each word read from the word dictionary 6;
- 7g is a similarity calculation circuit. For each character of the word in the word register 7f, it determines the rank at which that character appears in the corresponding candidate character string of the candidate word buffer 5, and it sums these ranks over all characters. If a character of the word does not appear in the corresponding candidate character string, that character is given a rank larger than the number of candidate characters. Because the similarity is a sum of ranks, a smaller value indicates a closer match;
- 7h is a similarity register, which stores the similarity calculated by the similarity calculation circuit 7g;
- 7i is a minimum similarity register, which keeps the smallest of the similarities successively stored in the similarity register 7h, updating it as it goes;
- 7j is an optimum word register, which keeps the word whose similarity is held in the minimum similarity register 7i, updating it as it goes;
- 7k is a comparator. When the value in the similarity register 7h is smaller than the value in the minimum similarity register 7i, it stores the value of register 7h in register 7i and the contents of the word register 7f in the optimum word register 7j.
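The selection means can be modelled in a few lines of Python. This is a sketch, not the circuit. rank_sum stands in for the similarity calculation circuit 7g, and select_best stands in for registers 7h to 7j and the comparator. The penalty rank len(column) + 1 for a missing character is an assumption: the source only requires a rank larger than the number of candidates. The retrieved word list is hypothetical, and the candidate columns come from the worked example in section (6).

```python
def rank_sum(word, columns):
    """Sum, over the characters of word, of each character's 1-based rank in its column.

    A character absent from its column gets rank len(column) + 1 (assumed value).
    Smaller totals mean closer matches.
    """
    total = 0
    for ch, col in zip(word, columns):
        total += (col.index(ch) + 1) if ch in col else (len(col) + 1)
    return total

def select_best(retrieved, columns):
    """Keep the word with the smallest rank sum, replacing only on a strictly smaller value."""
    best_word, best_score = None, None   # roles of the optimum word / minimum similarity registers
    for w in retrieved:                  # each word passes through the word register in turn
        score = rank_sum(w, columns)     # value placed in the similarity register
        if best_score is None or score < best_score:
            best_word, best_score = w, score
    return best_word, best_score

COLUMNS = [["訂", "計", "討", "許", "訴"],
           ["算", "篤", "等", "賀", "簿"],
           ["機", "磯", "横", "欄", "精"]]
print(select_best(["計算機", "精算書", "予算案"], COLUMNS))  # ('計算機', 4)
```

Because a replacement happens only on a strictly smaller value, the first word reaching the minimum is kept when scores tie. The source does not say how ties are resolved, so this choice is also part of the sketch.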

Next, the operation of the character recognition post-processing device according to this embodiment will be described. Suppose the input word is 「計算機」 ("calculator"), with n = 3 characters. The identification unit 4 supplies the following candidates:
- 「訂」「計」「討」「許」「訴」 for the input character 「計」;
- 「算」「篤」「等」「賀」「簿」 for the input character 「算」;
- 「機」「磯」「横」「欄」「精」 for the input character 「機」.

The candidates for each character are stored in the candidate word buffer 5 in order of rank, from the top register 5a to the bottom register 5e, so that each of registers 5a through 5e holds one candidate word. In this state, the word character count register 7a stores the value corresponding to the number of characters of the candidate word (3 in this case). This value is sent to the index character reading circuit 7c. Referring to the index character table 7b, the circuit sequentially reads from the candidate word buffer 5 the characters 「算」「篤」「等」「賀」「簿」 at the optimum index character position m = 2 as index characters.

The index characters and the contents of the word character count register 7a are then sent to the word dictionary read control circuit 7e. Referring to the read address table 7d, this circuit sequentially reads from the word dictionary 6 the words that have three characters and whose second character is 「算」, 「篤」, 「等」, 「賀」, or 「簿」. Because the index character position has been set optimally, the search of the word dictionary 6 is faster than it would be if the first characters of the candidate words (「訂」, ...) or their third characters (「機」, ...) were used as index characters.

Suppose, for example, that the word read from the word dictionary 6 is 「計算機」. It is stored in the word register 7f, and the similarity calculation circuit 7g calculates its similarity to the candidate words. The first character of the word, 「計」, is at the second position of its candidate character string, while the second and third characters, 「算」 and 「機」, are each at the first position of theirs. The similarity calculation circuit 7g therefore computes 2 + 1 + 1 and stores the result, 4, as the similarity in the similarity register 7h.

At this point, no word matching as closely as 「計算機」 has been read. The value previously held in the minimum similarity register 7i is therefore larger than the contents of the similarity register 7h. The comparator 7k accordingly stores the contents of register 7h in the minimum similarity register 7i and the contents of the word register 7f in the optimum word register 7j.

Thereafter, the remaining words read from the word dictionary 6 are stored one by one in the word register 7f, and the similarity of each is calculated and stored in the similarity register 7h. Each of these similarities is larger than that of 「計算機」, so the minimum similarity register 7i and the optimum word register 7j are not updated and keep their contents. When all retrieved words have been examined, the minimum similarity register 7i holds the minimum similarity and the optimum word register 7j holds the corresponding word, 「計算機」. The word 「計算機」 is then read out of the optimum word register 7j as the final recognition result for the input word.
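The similarity computed by circuit 7g can be restated compactly as a formula. Here K is the number of candidate characters per position (5 in this example), and w_1 ... w_n are the characters of a dictionary word w. The rank K + 1 for a missing character is an assumed value, since the source requires only a rank larger than K.

```latex
S(w) = \sum_{i=1}^{n} r_i(w_i), \qquad
r_i(c) = \begin{cases}
  \text{rank of } c \text{ in candidate string } i, & \text{if } c \text{ is present},\\
  K + 1, & \text{otherwise (assumed)},
\end{cases}

S(\text{計算機}) = r_1(\text{計}) + r_2(\text{算}) + r_3(\text{機}) = 2 + 1 + 1 = 4.
```

Selection keeps the retrieved word with the smallest S(w).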

The specific configuration of the character recognition post-processing device is not limited to the embodiment above. Design changes may be made as appropriate, for example implementing the word matching unit 7 with a microprocessor.

(7) Effects of the Invention
As described above, the character recognition post-processing method and apparatus according to the present invention shorten the word dictionary search time in character recognition post-processing compared with the conventional approach, and improve processing efficiency accordingly.

Brief Description of the Drawings

Fig. 1 is a block diagram showing an example of a system in which the character recognition post-processing device according to the present invention is attached to a character recognition device. Fig. 2 is a block diagram showing an embodiment of the character recognition post-processing device according to the present invention.
5 ... candidate word buffer (storage means); 6 ... word dictionary

Claims (1)

1) A character recognition post-processing method in which candidate character strings, obtained by recognizing each character of an input word consisting of a plurality of characters, are partitioned into word units to form candidate word strings, and the candidate word strings are compared with the contents of a word dictionary to select the best word from the word dictionary, characterized in that:
- an optimum index character position is preset according to the number of characters of the words to be registered in the word dictionary;
- the registered words are arranged in the word dictionary so that they can be searched by the character at the optimum index character position;
- the character at the optimum index character position of the candidate word strings is used as an index character to search the word dictionary for words having the same number of characters as the candidate word.

2) A character recognition post-processing device comprising:
- storage means for storing candidate character strings, obtained by recognizing each character of an input word consisting of a plurality of characters, as candidate word strings partitioned into word units;
- a word dictionary in which registered words are arranged so that they can be searched by the character at an optimum index character position, preset according to the number of characters of the words to be registered;
- search means for searching the word dictionary for words having the same number of characters as the candidate word, using as an index character the character at the optimum index character position of the candidate word strings stored in the storage means;
- selection means for selecting the best word from among the words retrieved from the word dictionary.
JP59004198A 1984-01-12 1984-01-12 Postprocessing of character recognition and device Granted JPS60147888A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP59004198A JPS60147888A (en) 1984-01-12 1984-01-12 Postprocessing of character recognition and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP59004198A JPS60147888A (en) 1984-01-12 1984-01-12 Postprocessing of character recognition and device

Publications (2)

Publication Number Publication Date
JPS60147888A true JPS60147888A (en) 1985-08-03
JPH0438026B2 JPH0438026B2 (en) 1992-06-23

Family

ID=11577965

Family Applications (1)

Application Number Title Priority Date Filing Date
JP59004198A Granted JPS60147888A (en) 1984-01-12 1984-01-12 Postprocessing of character recognition and device

Country Status (1)

Country Link
JP (1) JPS60147888A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0337766A (en) * 1989-07-04 1991-02-19 Nec Corp Word dictionary retrieving device
JPH03110676A (en) * 1989-09-25 1991-05-10 Nec Corp Word dictionary retrieval device
JPH03110675A (en) * 1989-09-25 1991-05-10 Nec Corp Word dictionary retrieving device


Also Published As

Publication number Publication date
JPH0438026B2 (en) 1992-06-23

Similar Documents

Publication Publication Date Title
US5774588A (en) Method and system for comparing strings with entries of a lexicon
CN101021850B (en) Word search apparatus, word search method
CN109885641B (en) Method and system for searching Chinese full text in database
EP0241717A2 (en) Linguistic analysis method and apparatus
JPS60147888A (en) Postprocessing of character recognition and device
JP3825829B2 (en) Registration information retrieval apparatus and method
JP2588261B2 (en) Address database search device by OCR
JPH113401A (en) Information processor and its method
JPH0766423B2 (en) Character recognition device
JP2961888B2 (en) Document search system using term dictionary
JPS63138479A (en) Character recognizing device
JP2996823B2 (en) Character recognition device
JP2680311B2 (en) Character recognition method
JPH04215181A (en) Information retrieval processing system
JP2005189955A (en) Document processing method, document processor, control program, and recording medium
JPS58163072A (en) Character correcting system
JPH0345431B2 (en)
JPH0340079A (en) Post-processing method for character recognition in character reader
Lee et al. Key Expression driven Record Mining for Event Calendar Search.
JPH05258100A (en) Character recognizing device
JPH07117983B2 (en) Character correction method in character recognition device
JPS58222386A (en) Correcting system of character recognizer
JPS59188783A (en) Character discriminating and processing system
Grauman et al. Matching Local Features
JPH01194088A (en) Collating device for character string and word