JPS6356756A

JPS6356756A - Western language preparing device with correcting function

Info

Publication number: JPS6356756A
Application number: JP61202871A
Authority: JP
Inventors: Yoshizo Saito; 齋藤　佳三
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1986-08-28
Filing date: 1986-08-28
Publication date: 1988-03-11

Abstract

PURPOSE: To supply more accurate correct-answer candidate words by referring to a segmentation table that stores correct-answer candidates in association with input data of characters or character strings that are misread. CONSTITUTION: The Western-language preparation device has a main dictionary and a user dictionary, and checks whether the spelling of an input word is present in them. When the spelling is not in the dictionaries, candidate words are generated by one-character replacement, one-character deletion, one-character addition, and adjacent-character transposition, and each is checked against the main dictionary and the user dictionary. When no candidate is found, a segmentation table is consulted. It supplies correct-answer candidate words for data misread because two characters touch or one character is broken apart, and each candidate is confirmed against the main dictionary and the user dictionary.

Description

[Detailed Description of the Invention] <Industrial Field of Application> This invention relates to a Western-language preparation device with a correction function.

<Prior Art> Conventionally, Western-language preparation devices that handle English word information, such as computer systems, word processors, and typewriters, are usually equipped with a dictionary in order to check whether an input English word is spelled correctly. More recently, in addition to simply pointing out misspelled words, such systems have gained a function by which the system itself supplies correct-answer candidate words. The method in common use today combines the following four processes, but even so, statistically, the correct word is obtained with an accuracy of only around 80%.

1) One-character replacement. Example: stataon → station (replace a with i)
2) One-character deletion. Example: statioon → station (delete one o)
3) One-character addition. Example: staton → station (add i between t and o)
4) Adjacent-character transposition. Example: statre → stater (swap re to er)

In the past, most English words were entered from a keyboard while the operator looked at the manuscript. Today, however, data entry methods are changing; for example, printed text is input directly using an OCR (optical character reader) or similar equipment.

As a result, the breakdown of spelling errors has also changed: errors caused by misrecognition of characters that look very similar now account for a larger share of all errors than simple keying mistakes. Consequently, the conventional processes of one-character replacement, one-character deletion, one-character addition, and adjacent-character transposition yield correct-answer candidate words with only very low accuracy.

<Object of the Invention> Accordingly, the object of this invention is to make it possible to supply correct-answer candidate words with higher accuracy by referring to a segmentation table that stores correct-answer candidates in association with input data of characters or character strings that are misread.

<Constitution of the Invention> To achieve the above object, the Western-language preparation device with a correction function of this invention is characterized by comprising: a storage device that stores character information input from an input device; a dictionary device having a main dictionary area and a segmentation table that stores correct-answer candidates in association with input data of characters or character strings that are misread; and a control device. The control device determines whether the character string stored in the storage device is in the main dictionary and, if it is, outputs the character string to an output device. If it is not, the control device performs any one or a combination of the four processes of one-character replacement, one-character deletion, one-character addition, and adjacent-character transposition, and determines whether the processed character string is in the main dictionary. If the processed character string is in the main dictionary, it is output to the output device. If it is not, the control device decomposes the character string and, when a character or character string obtained by the decomposition matches a misread character or character string in the segmentation table, replaces that character or character string with the correct-answer candidate, determines whether the character string after replacement is in the main dictionary, and outputs that character string to the output device if it is.
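The fall-through order described above (exact lookup, then the four processes, then the segmentation table) can be sketched as follows. This is only an illustrative sketch: the function name `correct`, the stage names, and the two stand-in candidate generators are hypothetical and are not taken from the patent, and a Python set stands in for the main dictionary.

```python
def correct(word, dictionary, stages):
    """Try an exact lookup, then each candidate generator in order.

    Returns (stage_name, candidates). The first stage that yields at least
    one dictionary word wins, following the fall-through order in the text.
    """
    if word in dictionary:
        return "dictionary", [word]
    for name, generate in stages:
        hits = sorted({c for c in generate(word) if c in dictionary})
        if hits:
            return name, hits
    return "none", []


# Hypothetical stand-ins for the two candidate generators.
def swap_first_two(word):
    # Simplified placeholder for the four single-edit processes.
    return [word[1] + word[0] + word[2:]] if len(word) > 1 else []


def m_to_rn(word):
    # Simplified placeholder for a segmentation-table substitution.
    return [word.replace("m", "rn")]


STAGES = [("four-process", swap_first_two), ("segmentation", m_to_rn)]
DICTIONARY = {"the", "corner"}  # toy word list, assumed for illustration
```

With these toy definitions, `correct("hte", DICTIONARY, STAGES)` is resolved by the first stage, while `correct("comer", DICTIONARY, STAGES)` falls through to the segmentation stage.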

<Embodiment> The invention is described in detail below with reference to the illustrated embodiment.

In Fig. 1, 1 is an input device, such as a keyboard, tablet device, OCR, or magnetic tape, for inputting word information data in English characters; 2 is a storage device, such as core memory, IC memory, or a magnetic disk, that stores the character information input from input device 1; 3 is an output device, such as a printer, display device, magnetic tape, or magnetic disk, that outputs the edited character and word data held in storage device 2; 4 is a dictionary device that has a main dictionary area, a user dictionary area, and a segmentation table, and supplies useful information in response to queries about the spelling of the character and word data held in the storage device; and 5 is a control device, for example a computer, that controls the exchange of signals among input device 1, storage device 2, output device 3, and dictionary device 4, and performs the correction processing on the character string of an input word described later.

The segmentation table stores correct-answer candidate characters or character strings in association with input data of characters or character strings that are misread. Specifically, Fig. 3 shows the segmentation table that associates correct-answer candidates with misread data (input data) arising from two characters touching, and Fig. 4 shows the segmentation table that associates correct-answer candidate characters with misread data (input data) arising from one character being broken apart.

Fig. 2 is a flowchart of the correction processing of this Western-language preparation device with a correction function. First, in steps S1 and S2, the character string of the input word is checked against the main dictionary and the user dictionary. Any dictionary structure may be used here: a method using hashing, a method using a linked-branch structure, the dictionary organization of the old WordStar from MicroPro (a method based on a two-dimensional index in which initial-letter data is arranged horizontally in A-Z order and the number of characters in the word is arranged vertically), and so on. If the character string is in the main dictionary or the user dictionary, it is treated as correct, stored in storage device 2, and output from output device 3. If the character string is not registered in the main dictionary or the user dictionary, the process proceeds to steps S3, S4, S5, and S6, and the four processes described above are performed. That is, for a word of length ℓ with constituent characters C = C1 ... Ci ... Cℓ, candidate correct spellings are derived as follows:

1) One-character replacement: the character Ci at position i, 1 ≤ i ≤ ℓ, is replaced with some other character C*.

2) One-character deletion: the character Ci at position i, 1 ≤ i ≤ ℓ, is removed.

3) One-character addition: a hypothesized character C* is inserted between positions i and i+1, 0 ≤ i ≤ ℓ.

4) Adjacent-character transposition: the characters Ci and Ci+1 at positions i and i+1, 1 ≤ i ≤ ℓ−1, are swapped.

Each new word derived by these processes is checked against the main dictionary and the user dictionary, and if it is found, the matching word is stored in storage device 2.
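The four processes and the subsequent dictionary check can be sketched in a few lines of Python. This is a minimal sketch rather than the device's actual implementation: the names `edit_candidates` and `corrections`, the lowercase a-z alphabet, and the use of a Python set as the dictionary are all assumptions made for illustration.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: lowercase English letters only


def edit_candidates(word):
    """Return every string reachable from `word` by one of the four processes."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # 1) One-character replacement: swap Ci for some other character C*.
    replaces = {a + c + b[1:] for a, b in splits if b
                for c in ALPHABET if c != b[0]}
    # 2) One-character deletion: remove Ci.
    deletes = {a + b[1:] for a, b in splits if b}
    # 3) One-character addition: insert C* at any of the len(word)+1 gaps.
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    # 4) Adjacent-character transposition: swap Ci and Ci+1.
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    return replaces | deletes | inserts | swaps


def corrections(word, dictionary):
    """Return the word itself if known, else the candidates found in the dictionary."""
    if word in dictionary:
        return [word]
    return sorted(c for c in edit_candidates(word) if c in dictionary)
```

Using the examples from the prior-art section with a toy dictionary `{"station", "stater"}`, the inputs `stataon`, `statioon`, and `staton` each yield `station`, and `statre` yields `stater`.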

Statistics show that the processing up to this point corrects misspellings at a rate in the 80% range. However, an error rate of ten-odd percent still remains. The process therefore proceeds to step S7, where, in addition to the four processes above, the segmentation tables of Figs. 3 and 4 are consulted so that more accurate correct-answer candidate words can be supplied. In this step the character string making up the word is broken down, and when a part of it is identical to a character string registered in the segmentation table, that part is replaced with the candidate character string listed after it to produce a new correct-word candidate. The candidate is checked against the words in the main dictionary and the user dictionary, and if it matches, it is supplied to storage device 2 and output device 3 as a correct-word candidate. In this way, correct words can be supplied with high accuracy.
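The substitution step can be sketched as below. The actual contents of the tables in Figs. 3 and 4 are not reproduced in this text, so the entries here (`m` for a touching `rn`, `cl` for a broken `d`, and so on) are purely hypothetical examples of OCR confusions, and the function names and the set-based dictionary are likewise assumptions.

```python
# Hypothetical entries; the real tables of Figs. 3 and 4 are not available here.
TOUCHING = {"m": ["rn"], "d": ["cl"]}  # two touching characters read as one
BROKEN = {"rn": ["m"], "cl": ["d"]}    # one broken character read as two


def segmentation_candidates(word, table):
    """Replace each occurrence of a misread string with its listed candidates."""
    results = []
    for start in range(len(word)):
        for misread, replacements in table.items():
            if word.startswith(misread, start):
                for rep in replacements:
                    end = start + len(misread)
                    results.append(word[:start] + rep + word[end:])
    return results


def correct_with_segmentation(word, dictionary, tables=(TOUCHING, BROKEN)):
    """Return substituted words that are present in the dictionary."""
    found = []
    for table in tables:
        for cand in segmentation_candidates(word, table):
            if cand in dictionary and cand not in found:
                found.append(cand)
    return found
```

For instance, with a toy dictionary `{"modern", "clear"}`, the input `modem` produces `modern` and the input `dear` produces `clear`. Note that these one-to-two character substitutions cannot be reached by any single one of the four processes, which is consistent with the tables omitting overlapping patterns.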

Note that the segmentation tables shown in Figs. 3 and 4 omit patterns that overlap with the four conventional processes.

<Effects of the Invention> As is clear from the above, the Western-language preparation device with a correction function of this invention, in addition to the four conventional processes of one-character replacement, one-character deletion, one-character addition, and adjacent-character transposition, refers to a segmentation table that stores correct-answer candidates in association with misread characters or character strings in order to find and supply correct-answer candidate words. The accuracy of the words supplied from the dictionary can therefore be greatly improved.

[Brief explanation of drawings]

Fig. 1 is a block diagram of a Western-language preparation device with a correction function according to one embodiment of this invention, Fig. 2 is a flowchart of the correction processing, and Figs. 3 and 4 are explanatory diagrams of the segmentation tables. 1 ... input device, 2 ... storage device, 3 ... output device, 4 ... dictionary device, 5 ... control device. Patent applicant: Sharp Corporation. Agent: Patent attorney 前出 葆 and two others.

Claims

[Claims]

(1) A Western-language preparation device with a correction function, characterized by comprising: a storage device that stores character information input from an input device; a dictionary device having a main dictionary area and a segmentation table that stores correct-answer candidates in association with input data of characters or character strings that are misread; and a control device that determines whether the character string stored in the storage device is in the main dictionary; outputs the character string to an output device if it is; if it is not, performs any one or a combination of the four processes of one-character replacement, one-character deletion, one-character addition, and adjacent-character transposition, and determines whether the processed character string is in the main dictionary; outputs the processed character string to the output device if it is; and if it is not, decomposes the character string and, when a character or character string obtained by the decomposition matches a misread character or character string in the segmentation table, replaces that character or character string with the correct-answer candidate, determines whether the character string after replacement is in the main dictionary, and outputs that character string to the output device if it is.