JPH01183795A - Dictionary consulting system in post-processing for document reader - Google Patents

Dictionary consulting system in post-processing for document reader

Info

Publication number
JPH01183795A
Authority
JP
Japan
Prior art keywords
word
dictionary
character
words
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP63007709A
Other languages
Japanese (ja)
Other versions
JP2570784B2 (en)
Inventor
Noriyasu Takao
高尾 哲康
Fumito Nishino
文人 西野
Yuji Uchida
裕士 内田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to JP63007709A (patent JP2570784B2)
Publication of JPH01183795A
Application granted
Publication of JP2570784B2
Anticipated expiration
Expired - Lifetime (current legal status)

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To reduce the number of word dictionary searches and improve processing speed by retrieving from the word dictionary, for every character in an input candidate character table, all words starting with that character in a single dictionary lookup. CONSTITUTION: The word dictionary 12 stores word information such as word notation, part-of-speech information, and information on connections with other words. The dictionary 12 is sorted in dictionary order, so words starting with the same leading character are located in one place in the dictionary. A dictionary search means 111 retrieves from the dictionary, for every character in the input candidate character table, all words starting with that character in a single lookup. From the retrieved words, only those whose characters are all present in the candidate character table are selected. This reduces the amount of processing and the time required for word matching in post-processing.

Description

[Detailed Description of the Invention] [Summary] The invention relates to a post-processing device for a document reader that reads Japanese documents and converts them into a machine-readable format, and aims to improve processing speed by reducing the number of word dictionary searches. The post-processing device takes the candidate character table output by the character recognition device of the document reader and uses word information, grammar information, context information, related-word information, and the like to uniquely determine characters that the character recognition device could not determine uniquely. Its word matching unit, which matches words formed by combining characters in the candidate character table against a word dictionary storing word information such as word notation and part-of-speech information, comprises a dictionary search means that, for every character in the input candidate character table, retrieves all words starting with that character from the word dictionary in a single dictionary lookup, and a word selection means that selects, from the retrieved words, only those whose characters are all in the candidate character table.

[Industrial Field of Application]

The present invention relates to a document reading and recognition device (document reader), and in particular to a document reader post-processing device for Japanese text.

Document readers are in increasing demand as devices that convert large volumes of printed matter, publications, paperwork, handwritten documents, and the like, written in human-readable characters, into machine-readable form. Depending on the volume of documents to be processed, such a device should process them as fast as possible.

The document reader post-processing device takes the candidate character strings output by the character recognition device of the document reader and uses word information, grammar information, context information, related-word information, and the like to uniquely determine characters that the character recognition device could not determine uniquely.

Among the various kinds of information used by the post-processing device, the present invention concerns a method for performing at high speed the matching against a word dictionary that is required when word information is used.

[Prior Art]

Word matching in a conventional document reader post-processing device checks every word that can be formed by combining the candidate characters obtained from the character recognition device against the word dictionary, to see whether it exists in the dictionary.

[Problem to Be Solved by the Invention]

In the document reader post-processing device, words formed by combining the candidate characters obtained from the character recognition device are matched against a word dictionary. The character recognition device usually outputs several candidates for each character: a first candidate, a second candidate, ..., an n-th candidate (n is an arbitrary positive integer). The number of words that can be formed by simply combining candidate characters is

$$n^1 + n^2 + \cdots + n^k$$

where k is the number of characters in the word.

However, k cannot in practice be allowed to grow without bound. In general, therefore, the maximum value of k is fixed when word candidates are determined, based on the empirical observation that a change of character type (hiragana, katakana, symbols, mathematical expressions, kanji) often marks a word boundary.

For example, as shown in FIG. 6, suppose the above segmentation yields six kanji characters and four candidates are obtained for each character. Conventionally, every combination was looked up in the word dictionary. That is, the dictionary was searched, one string at a time, for the one-character words 価, 廊, 晒, 版; for the two-character words 価格, 価栢, 価捲, 価椅, 廊格, 廊栢, ...; for the three-character words 価格対, 価格栢, 価格捲, 価格椅, 価栢対, ...; for the four-character words 価格対性, ...; for the five-character words 価格対性能, ...; and for the six-character words 価格対性熊比, 価格対性熊此, .... The number of lookups is

$$4^1 + 4^2 + 4^3 + 4^4 + 4^5 + 4^6 = 5460.$$
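The count for the conventional method can be checked with a short calculation. The following is only an illustrative sketch; the function name `brute_force_lookups` is an assumption introduced here and does not appear in the patent.

```python
def brute_force_lookups(n: int, k: int) -> int:
    """Number of candidate strings of length 1..k when each of the
    k character positions has n candidate characters."""
    return sum(n ** length for length in range(1, k + 1))


# Six character positions with four candidates each, as in the FIG. 6 example:
# 4 + 16 + 64 + 256 + 1024 + 4096
print(brute_force_lookups(4, 6))
```

Because the total is dominated by the n^k term, each additional candidate per position or each additional character in the word multiplies the number of lookups.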

Thus, when the maximum word length is large and there are many candidate characters, the number of words that can be formed by combination grows, and post-processing throughput suffers.

Moreover, the empirical rule that a change of character type often marks a word boundary has exceptions, such as 「ろ過」 (filtration), 「お手伝い」 (help), and 「A級」 (class A), and in such cases word matching itself may fail.

The problem to be solved by the present invention is to provide a dictionary lookup method for document reader post-processing that eliminates these problems of the prior art.

[Means for Solving the Problem]

FIG. 1 is a block diagram showing the principle of the dictionary lookup method in document reader post-processing according to the present invention.

In the figure, 11 is a word matching unit, which matches words formed by combining characters in the input candidate character table against the word dictionary.

12 is the word dictionary, which stores word information such as word notation, part-of-speech information, and information on connections with other words.

111 is a dictionary search means, which, for every character in the input candidate character table, retrieves all words starting with that character from the word dictionary 12 in a single dictionary lookup.

112 is a word selection means, which selects, from the retrieved words, only those whose characters are all in the candidate character table.

[Operation]

With the configuration of the present invention, all words starting with a character candidate are retrieved in a single dictionary lookup, which reduces accesses to the word dictionary. Because the word dictionary is sorted in dictionary order, words starting with the same leading character are stored together in one place inside the dictionary. Even when the dictionary being searched resides on secondary storage (a magnetic disk device or the like), the number of input/output operations is therefore much smaller than with the conventional method.

This greatly reduces the amount of processing and the time required for word matching in post-processing.
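The property relied on here, that words sharing a leading character are contiguous in a sorted dictionary, can be illustrated with a small in-memory sketch. The patent describes reading a block from secondary storage; the version below only assumes a sorted Python list, and the names `words_starting_with` and `toy_dictionary` are hypothetical.

```python
from bisect import bisect_left


def words_starting_with(sorted_words: list[str], first_char: str) -> list[str]:
    """Return the contiguous run of entries that begin with first_char.

    sorted_words must be sorted; entries sharing a leading character are
    then adjacent, so one binary search plus a forward scan finds them all.
    """
    lo = bisect_left(sorted_words, first_char)
    hi = lo
    while hi < len(sorted_words) and sorted_words[hi].startswith(first_char):
        hi += 1
    return sorted_words[lo:hi]


# Toy dictionary (an assumption for illustration). sorted() orders by code
# point, which is enough to group entries by their leading character.
toy_dictionary = sorted(["価", "価格", "価値", "廊下", "格", "格安", "対", "性能"])
print(words_starting_with(toy_dictionary, "価"))
```

In a disk-based dictionary the same contiguity means the matching entries can be fetched with a single read of the region that holds them, rather than one read per candidate string.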

[Embodiment]

The present invention is described more concretely below with reference to the embodiment shown in FIGS. 2 to 5.

FIG. 2 shows the configuration of a document reader as the system configuration of one embodiment of the present invention.

In the figure, 6 is an image scanner that reads ordinary printed documents and handwritten documents as images.

5 is a character block extraction device, which extracts character blocks containing text from the image obtained from the image scanner 6.

4 is a line extraction device, which extracts line blocks from the character blocks.

3 is a character extraction device, which extracts characters one at a time from the line blocks.

2 is a character recognition device. It performs character recognition and, for each character, outputs a group of candidate characters, each assigned a distance value (a weighting quantity; the smaller the value, the closer the candidate is judged to be to the correct character), as the first candidate, second candidate, ..., n-th candidate (n is an arbitrary number).

1 is a document reader post-processing device using the dictionary lookup method of the present invention, which determines the characters considered correct from the candidate character strings obtained from the character recognition device 2.

FIG. 3 shows the configuration of the document reader post-processing device of one embodiment of the present invention.

In the figure, 16 is a candidate character string input section, which receives the candidate character strings output by the character recognition device 2 and stores them in the work area of the post-processing main body 15. The stored data is called the candidate character table.

15 is the post-processing main body, the main control section for the post-processing functions.

17 is the after-processing section of the post-processing device, which handles, among other things, words that the post-processing main body 15 could not determine.

11 is the word matching unit using the dictionary lookup method of the present invention, which matches words formed by combining characters in the candidate character table against the word dictionary 12. The word dictionary 12 stores word notation and part-of-speech information, together with part-of-speech information for the words that may be adjacent to each word (called adjacency information).

13 is a grammar matching unit, which, based on the adjacency information obtained from the word matching unit 11, refers to the grammar dictionary 14 to check whether words can be adjacent to each other. The grammar dictionary 14 stores, for each part of speech, the part-of-speech information of words that may be adjacent to it.
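One possible way to picture the grammar dictionary 14 is as a table from a part of speech to the parts of speech allowed to follow it. The sketch below rests on that assumption; the part-of-speech labels, the table contents, and the function name `can_adjoin` are all hypothetical and are not taken from the patent.

```python
# Hypothetical grammar dictionary: part of speech -> parts of speech that
# may directly follow it. The entries are invented for illustration only.
GRAMMAR = {
    "noun": {"noun", "particle", "suffix"},
    "particle": {"noun", "verb"},
    "verb": {"particle"},
}


def can_adjoin(left_pos: str, right_pos: str, grammar=GRAMMAR) -> bool:
    """Return True if a word with part of speech right_pos may directly
    follow a word with part of speech left_pos."""
    return right_pos in grammar.get(left_pos, set())


print(can_adjoin("noun", "particle"))
print(can_adjoin("verb", "noun"))
```

A check of this kind would let the grammar matching unit discard word sequences whose neighbouring parts of speech cannot occur together.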

FIG. 4 is a flowchart showing the processing of the word matching unit of the document reader post-processing device according to one embodiment of the present invention.

(1) First, the word list is reset.

(2) Steps (3) and (4) are performed for every candidate character at the current determination position in the candidate character table (the position in the table of the word to be processed next).

(3) The word dictionary is searched using the key character, that is, the one candidate character currently being processed, and all words starting with the key character are retrieved. Specifically, one block (usually 512 to 4096 bytes) of the dictionary region containing the words that start with that leading character is read.

(4) Of all the words retrieved in step (3), only those whose characters are all in the candidate character table are selected, added to the word list, and output.
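The four steps of the flowchart can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that "all characters are in the candidate character table" means each character of the word appears among the candidates at the corresponding position, and the names `match_words`, `lookup`, `table`, and `dictionary` are hypothetical.

```python
def match_words(candidate_table, position, lookup):
    """Word matching for one start position.

    candidate_table: list of candidate-character lists, one per position.
    lookup: function returning every dictionary word starting with a character.
    """
    word_list = []                                  # step (1): reset the word list
    for key_char in candidate_table[position]:      # step (2): each candidate character
        for word in lookup(key_char):               # step (3): one lookup per key character
            span = candidate_table[position:position + len(word)]
            # step (4): keep the word only if it fits in the table and every
            # character is a candidate at its own position (assumed reading)
            if len(span) == len(word) and all(
                ch in cands for ch, cands in zip(word, span)
            ):
                word_list.append(word)
    return word_list


# Toy data based on the words named in the FIG. 5 walkthrough.
table = [["価", "廊", "晒", "版"], ["格", "栢", "捲", "椅"]]
dictionary = {
    "価": ["価", "価格", "価値", "価値づけ"],
    "廊": ["廊下"],
    "晒": ["晒", "晒し"],
    "版": ["版", "版下", "版画", "版権", "版元", "版数"],
}
print(match_words(table, 0, lambda ch: dictionary.get(ch, [])))
```

With this toy table only four dictionary lookups are issued for position 1, one per candidate character, and the filtering in step (4) is done in memory.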

The word list obtained by this word matching is narrowed down further by the check performed by the grammar matching unit 13.

FIG. 5 shows an example of the word matching process according to one embodiment of the present invention.

In this example, six kanji characters have been determined as a word by character-type boundaries or the like, four candidates have been obtained for each character, and these are input as the candidate character table.

First, all words whose leading character is 「価」, the first candidate at character position 1, are retrieved at once. This yields four words: 価, 価格, 価値, 価値づけ. For the second candidate 「廊」, only one word, 廊下, is retrieved. For the third candidate 「晒」, two words are retrieved: 晒 and 晒し.

For the fourth candidate 「版」, six words are retrieved: 版, 版下, 版画, 版権, 版元, 版数. In total, 13 candidates are retrieved for words whose leading character is a candidate at character position 1.

Similarly, for the candidate characters 「格, 栢, 捲, 椅」 at character position 2, a total of 15 candidates such as 格 and 格安 are retrieved.

For the candidate characters 「対, 封, 村, 材」 at character position 3, 80 candidates such as 対 and 対ソ are retrieved.

For the candidate characters 「性, 住, 佐, 牲」 at character position 4, 30 candidates such as 性 and 性格 are retrieved.

For the candidate characters at character position 5, 6 candidates are retrieved, and for those at character position 6, 40 candidates, giving a total of 184 candidates (13 + 15 + 80 + 30 + 6 + 40).

These figures are for a general-purpose word dictionary of about 70,000 words.

This is far fewer than the 5,460 candidates of the conventional example shown in FIG. 6. In addition, because all words beginning with a given character are retrieved in a single access, only 16 accesses to secondary storage are needed, which is a substantial reduction.

In this example the candidate characters input to the document reader post-processing are limited to the top four; if this limit is relaxed, the difference is expected to grow further.

Next, of all the retrieved words, only those whose characters are all in the candidate character table are selected. In the figure, rejected words are marked with × and retained words with ○. The ○ words are then connected in order from character position 1 to character position 2, character position 3, and so on. A word retrieved at character position 1 that includes a candidate character at character position 2 is connected to a word retrieved at character position 3.

In this way, the words underlined in the figure, being relatively long words that use relatively high-ranked character candidates and connect to a candidate character at the following character position, are finally output as the word list.
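The connection step can be pictured as chaining each selected word to the words that start at the position immediately after it. The sketch below is one assumed reading of that step; the function `connect_words`, the toy data in `words_at`, and the use of "fewest words" as a stand-in for the preference for relatively long words are all assumptions rather than details given in the patent.

```python
def connect_words(words_at, total_len, start=0):
    """Enumerate word sequences covering positions start..total_len-1.

    words_at[p] holds the selected words that begin at position p.
    A word of length L starting at p connects to words starting at p + L.
    """
    if start == total_len:
        return [[]]
    sequences = []
    for word in words_at.get(start, []):
        end = start + len(word)
        if end <= total_len:
            for rest in connect_words(words_at, total_len, end):
                sequences.append([word] + rest)
    return sequences


# Hypothetical selected words per start position (0-based) for six characters.
words_at = {
    0: ["価", "価格"],
    1: ["格"],
    2: ["対"],
    3: ["性", "性能"],
    4: ["能"],
    5: ["比"],
}
sequences = connect_words(words_at, 6)
# Prefer the sequence made of the fewest, and therefore longest, words.
print(min(sequences, key=len))
```

In the device described, the grammar matching unit would further prune such sequences using part-of-speech adjacency.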

[Effect of the Invention]

As described above, according to the present invention, the word matching unit of the document reader post-processing device can greatly reduce the number of dictionary lookups and perform fast, efficient word matching, which contributes substantially to improving the processing capacity of document reader post-processing.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of the principle of the present invention.
FIG. 2 is a diagram showing the system configuration of one embodiment of the present invention.
FIG. 3 is a diagram showing the configuration of the document reader post-processing device of one embodiment of the present invention.
FIG. 4 is a flowchart showing the processing according to one embodiment of the present invention.
FIG. 5 is a diagram showing an example of the word matching process according to one embodiment of the present invention.
FIG. 6 is a diagram showing an example of word dictionary lookup according to the conventional example.

In the drawings, 1 is the post-processing device, 2 is the character recognition device, 3 is the character extraction device, 4 is the line extraction device, 5 is the character block extraction device, 6 is the image scanner, 11 is the word matching unit, 111 is the dictionary search means, 112 is the word selection means, 12 is the word dictionary, 13 is the grammar matching unit, 14 is the grammar dictionary, 15 is the post-processing main body, 16 is the candidate character string input section, and 17 is the after-processing section.

[FIG. 1: Block diagram of the principle of the present invention (candidate character table)]
[FIG. 4: Flowchart showing the processing according to one embodiment of the present invention]
[FIG. 5: Candidate character table]

Claims (1)

[Claims] A dictionary lookup method in document reader post-processing, for a post-processing device that uses word information, grammar information, context information, related-word information, and the like to uniquely determine, from a candidate character table output by the character recognition device of a document reader, characters that the character recognition device could not determine uniquely, characterized in that a word matching unit (11), which matches words formed by combining characters in the candidate character table against a word dictionary (12) storing word information such as word notation and part-of-speech information, comprises: a dictionary search means (111) that, for every character in the input candidate character table, retrieves all words starting with that character from the word dictionary (12) in a single dictionary lookup; and a word selection means (112) that selects, from the retrieved words, only those whose characters are all in the candidate character table.
JP63007709A 1988-01-18 1988-01-18 Document reader post-processing device Expired - Lifetime JP2570784B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP63007709A JP2570784B2 (en) 1988-01-18 1988-01-18 Document reader post-processing device

Publications (2)

Publication Number Publication Date
JPH01183795A true JPH01183795A (en) 1989-07-21
JP2570784B2 JP2570784B2 (en) 1997-01-16

Family

ID=11673268

Family Applications (1)

Application Number Title Priority Date Filing Date
JP63007709A Expired - Lifetime JP2570784B2 (en) 1988-01-18 1988-01-18 Document reader post-processing device

Country Status (1)

Country Link
JP (1) JP2570784B2 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS60134992A (en) * 1983-12-23 1985-07-18 Hitachi Ltd Input device of character
JPS6174086A (en) * 1984-09-18 1986-04-16 Fujitsu Ltd Word recognizing device
JPS61161588A (en) * 1985-01-11 1986-07-22 Hitachi Ltd Postprocessing system of character recognition

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5754711A (en) * 1992-01-09 1998-05-19 Fuji Xerox Co., Ltd. Document recognizing system and method
US5757958A (en) * 1992-01-09 1998-05-26 Fuji Xerox Co., Ltd. Document recognizing system and method
