JPH0523458B2


Info

Publication number: JPH0523458B2
Application number: JP61015342A
Authority: JP
Inventors: Yoshizo Saito
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1986-01-27
Filing date: 1986-01-27
Publication date: 1993-04-02
Also published as: JPS62173568A

Description

DETAILED DESCRIPTION OF THE INVENTION

<Field of Industrial Application> The present invention relates to electronic dictionaries, and in particular to improvements to the user dictionary they contain.

<Prior Art> Computer systems, word processors, electronic typewriters, and similar equipment that handle word information are equipped with an electronic dictionary for checking whether input words are spelled correctly. This electronic dictionary consists of a main dictionary and a user dictionary: the main dictionary holds word information supplied in advance by the system, and the user dictionary holds word information registered by the user.

As shown in FIG. 6, a normal spell check first searches the main dictionary for the same word information as the input word. If it is not found there, the user dictionary is searched. If the same word information is found in either the main dictionary or the user dictionary, the spelling is judged correct; if it is found in neither, the spelling is judged incorrect.
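The FIG. 6 decision order can be sketched in a few lines of Python. This is a minimal illustration that assumes each dictionary is an in-memory set of correctly spelled words; the function name `check_spelling` is hypothetical, and the patent does not specify how either dictionary is searched internally.

```python
def check_spelling(word: str, main_dict: set[str], user_dict: set[str]) -> bool:
    """Return True if the spelling is judged correct, following the FIG. 6 order."""
    # First search the system-supplied main dictionary.
    if word in main_dict:
        return True
    # Only when the main dictionary has no match, search the user dictionary.
    if word in user_dict:
        return True
    # Found in neither dictionary: the spelling is judged incorrect.
    return False


main = {"apple", "banana"}
user = {"sharp"}
print(check_spelling("sharp", main, user))  # True (found in the user dictionary)
print(check_spelling("aple", main, user))   # False
```

Because the user dictionary is consulted for every word the main dictionary does not know, its lookup speed directly limits the speed of the whole check.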

<Problems to be Solved by the Invention> In the spell check described above, even if the main dictionary has an efficient data structure and is processed quickly, the processing as a whole is slow if the user dictionary takes a long time to process. Unlike the main dictionary, the user dictionary is not supplied by the system; users can register words in it freely, so it must be correspondingly flexible. Specifically, it must support processing as fast as the main dictionary, and it must also be possible to output a list of the word information registered in it at any time. Because the user dictionary area is limited, once the data area becomes full the user cannot decide which word information to delete without such a list.

FIG. 7 shows the structure of a user dictionary that uses only the Hash method. In the Hash method, the word information registered by the user is stored as a combination of numerical data and a hash, that is, information that carries no meaning in itself but keeps the length of each data block constant. In this user dictionary the hash distributes entries evenly across the index, so searches are fast. However, when the data section becomes full, or when the user wants to check which words are currently registered, it is difficult to reconstruct the original words. A further problem is how to handle two registered words that happen to have the same hash value. If the two words share a single hash entry, then when one of them is later deleted, it cannot be determined whether that entry should be deleted or kept.
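Both weaknesses of a FIG. 7-style store can be reproduced with a small sketch. It assumes a hypothetical 27-bit hash built from `zlib.crc32` (the source names no hash function) and lets the caller inject a deliberately weak hash so that a collision is easy to trigger. Only integers are kept, so there is nothing from which to print a word list, and a shared entry cannot be deleted safely.

```python
import zlib
from typing import Callable


def crc27(word: str) -> int:
    # Hypothetical hash: CRC-32 truncated to 27 bits (the source names no function).
    return zlib.crc32(word.encode("utf-8")) & ((1 << 27) - 1)


class HashOnlyUserDict:
    """FIG. 7-style user dictionary: only hash values are stored, never spellings."""

    def __init__(self, hash_fn: Callable[[str], int] = crc27) -> None:
        self.hash_fn = hash_fn
        self.entries: set[int] = set()

    def register(self, word: str) -> None:
        # Two words with equal hashes silently end up sharing one entry.
        self.entries.add(self.hash_fn(word))

    def contains(self, word: str) -> bool:
        return self.hash_fn(word) in self.entries

    def delete(self, word: str) -> None:
        # Removing the entry also "deletes" any other word sharing the hash;
        # keeping it would leave this word registered. Neither choice is safe.
        self.entries.discard(self.hash_fn(word))


# A deliberately weak hash (word length) forces "cat" and "dog" to collide.
d = HashOnlyUserDict(hash_fn=len)
d.register("cat")
d.register("dog")
d.delete("cat")
print(d.contains("dog"))  # False: "dog" was lost together with "cat"
```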

FIG. 8 shows an example that stores character strings instead of hash values. When strings are simply listed like this, characters are compared for each word length. It is then possible to see which words are registered in the user area, but a spell check has to compare whole words, so processing is inevitably slow. To compensate, the words in the user area are sometimes arranged in order of frequency or of most recent use; however, when a misspelled word is actually entered, the search still runs to the end of the user area and takes time. Furthermore, as shown in FIG. 9, even if the index is built without hashing, for example from the initial letter and the length of each word, the data are not distributed evenly as they are with a hash-based index (there are many 8-letter words beginning with c or s, but almost none beginning with z or x). Depending on the word, the search can therefore take longer than with the Hash method.
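The costs described for FIGS. 8 and 9 can be illustrated with two small helpers. Both are hypothetical: `linear_lookup` counts string comparisons in a plain list, and `initial_length_buckets` groups words by the (initial letter, length) key mentioned for FIG. 9 so that uneven bucket sizes become visible.

```python
from collections import Counter


def linear_lookup(word: str, entries: list[str]) -> tuple[bool, int]:
    """FIG. 8-style search of a plain string list.

    Returns (found, number of string comparisons). A misspelled word is never
    found, so it always costs one comparison per registered entry.
    """
    comparisons = 0
    for entry in entries:
        comparisons += 1
        if entry == word:
            return True, comparisons
    return False, comparisons


def initial_length_buckets(words: list[str]) -> Counter:
    """FIG. 9-style index: count words per (initial letter, length) key."""
    return Counter((w[0].lower(), len(w)) for w in words)


entries = ["sharp", "typewriter", "processor"]
print(linear_lookup("procesor", entries))  # (False, 3): every entry was compared
```

Reordering `entries` by frequency or recency changes how quickly correct words are found, but the misspelled case still scans the whole list.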

<Means for Solving the Problems> The electronic dictionary according to the present invention comprises a main dictionary that holds word information supplied in advance by the system and a user dictionary that holds word information registered by the user, and it checks the spelling of words entered by the operator by referring to both dictionaries. It is characterized in that the user dictionary consists of a first area that stores the words registered by the operator as numerical data obtained with a hashing method and a second area that stores the same words as character data, the first area being used for spell-check reference and the second area for confirmation by the operator.

<Embodiment> FIG. 1 shows the structure of the user dictionary of this embodiment. User dictionary 1 consists of an index section 2; a data section 3, which is an area that stores the words registered by the user as numerical data obtained with the Hash method; and a data section 4, which is an area that stores the same words as character data.
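The three sections of FIG. 1 can be modeled with plain Python containers. This is a sketch under assumptions the patent does not fix: 32 index buckets (one per value of the 5 index bits described for registration), a list of counters standing in for index section 2, per-bucket integer lists for data section 3, and a list of strings for data section 4. No concrete memory layout is given in the source.

```python
from dataclasses import dataclass, field

NUM_BUCKETS = 32  # assumption: 2**5 buckets, one per value of the 5 index bits


@dataclass
class UserDictionary:
    """User dictionary 1 of FIG. 1, modeled with Python containers (a sketch)."""

    # Index section 2: one counter per bucket of data section 3.
    index: list[int] = field(default_factory=lambda: [0] * NUM_BUCKETS)
    # Data section 3: hashed numerical data per bucket, consulted by spell checks.
    hashed: list[list[int]] = field(
        default_factory=lambda: [[] for _ in range(NUM_BUCKETS)]
    )
    # Data section 4: the same words as character data, for the operator.
    words: list[str] = field(default_factory=list)


d = UserDictionary()
print(len(d.index), len(d.hashed), d.words)  # 32 32 []
```

Keeping both representations doubles the storage per word, which is the trade the invention accepts in exchange for fast lookups plus a readable list.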

FIG. 2 shows the configuration of a computer system that uses user dictionary 1. Input device 5 enters character and word information into storage device 6; it consists, for example, of a keyboard, a tablet, an OCR device, or a magnetic tape unit. Storage device 6 is the area that stores the character and word information entered from input device 5, and consists, for example, of core memory, IC memory, or a magnetic disk. Output device 7 outputs the information stored and edited in storage device 6, and consists, for example, of a printer, a display, a magnetic tape unit, or a magnetic disk unit. Spell-check dictionary device 8 is a dictionary area that supplies the appropriate information in response to queries about the spelling of characters and words stored in storage device 6. It consists of a main dictionary (not shown) and the user dictionary 1 described above, and is implemented, for example, in core memory, IC memory, ROM, or a magnetic disk. Control device 9 is a computer that controls the exchange of signals and data among the components above.

The operation is described below.

When a spell check is performed, the hash value of the word is calculated with the Hash method, and the corresponding data are searched at high speed through index section 2.
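The hash-side lookup can be sketched as follows, assuming the 27-bit hash layout given for registration: the upper 5 bits select one of 32 index buckets and the lower 22 bits are the stored data part. The CRC-based `word_hash` and the other names are hypothetical. Spare bits such as the double flag are masked off before comparing, so a flagged entry still matches.

```python
import zlib

HASH_MASK = (1 << 27) - 1  # 27-bit hash values
DATA_BITS = 22             # low 22 bits: data part; high 5 bits: index bucket
DATA_MASK = (1 << DATA_BITS) - 1


def word_hash(word: str) -> int:
    # Hypothetical hash: CRC-32 truncated to 27 bits (the patent names no function).
    return zlib.crc32(word.encode("utf-8")) & HASH_MASK


def split_hash(h: int) -> tuple[int, int]:
    """Split a 27-bit hash into (bucket 0..31, 22-bit data part)."""
    return h >> DATA_BITS, h & DATA_MASK


def user_dict_has(word: str, hashed: list[list[int]]) -> bool:
    """Look a word up in data section 3 through its index bucket."""
    bucket, data = split_hash(word_hash(word))
    # Only the entries of one bucket are compared; spare bits are masked off.
    return any((entry & DATA_MASK) == data for entry in hashed[bucket])


hashed = [[] for _ in range(32)]
bucket, data = split_hash(word_hash("sharp"))
hashed[bucket].append(data)
print(user_dict_has("sharp", hashed))  # True
```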

Data section 4 is used mainly for registering and deleting word information in user dictionary 1 and for outputting it as a list.

To register a word in user dictionary 1, as shown in FIG. 3, the main dictionary is first searched for the same word information (step #11); if it is not there, data section 4 of user dictionary 1 is searched for it (step #12). If the same word information exists in the main dictionary or in data section 4, an error results. Otherwise, a hash value is created for the word (step #13), and data section 3 is searched for the same hash value (step #14). If the same hash value exists, the double flag is turned ON (step #15). A hash value is 27 bits long and its first 5 bits form the index, so the data part is represented by 22 bits. In practice, however, each entry in data section 3 of user dictionary 1 is stored in 3 bytes, which leaves 2 spare bits. If one of these two bits is ON, it indicates that the hash value is shared by more than one word. After this processing, the index is counted up, the hash value is set, and the character data are added to data section 4, completing the registration of the word information (step #16).
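Steps #11 to #16 can be sketched as a small class. The following are assumptions, none of which the source fixes: `word_hash` is a hypothetical CRC-32 value truncated to 27 bits, the double flag is taken to be bit 22 (the first spare bit above the 22 data bits), the index counters record how many words fall in each bucket, and a colliding word shares the existing data-section-3 entry instead of adding a second one. `hash_fn` can be replaced to force collisions when experimenting.

```python
import bisect
import zlib

HASH_MASK = (1 << 27) - 1        # 27-bit hash value
DATA_BITS = 22                   # low 22 bits: data part; high 5 bits: index
DATA_MASK = (1 << DATA_BITS) - 1
DOUBLE_FLAG = 1 << 22            # assumption: first spare bit of the 3-byte entry


def word_hash(word: str) -> int:
    # Hypothetical hash: CRC-32 truncated to 27 bits.
    return zlib.crc32(word.encode("utf-8")) & HASH_MASK


class UserDictionary:
    def __init__(self, main_dict: set[str], hash_fn=word_hash) -> None:
        self.main_dict = main_dict
        self.hash_fn = hash_fn
        self.index = [0] * 32                  # section 2: words per bucket
        self.hashed = [[] for _ in range(32)]  # section 3: 3-byte entries
        self.words: list[str] = []             # section 4: sorted character data

    def register(self, word: str) -> None:
        # Steps #11-#12: the word must be in neither the main dictionary nor section 4.
        if word in self.main_dict or word in self.words:
            raise ValueError(f"{word!r} is already registered")
        # Step #13: create the hash value and split it into bucket and data part.
        h = self.hash_fn(word) & HASH_MASK
        bucket, data = h >> DATA_BITS, h & DATA_MASK
        entries = self.hashed[bucket]
        # Step #14: search section 3 for the same hash value.
        for i, entry in enumerate(entries):
            if (entry & DATA_MASK) == data:
                # Step #15: the value is already in use, so turn the double flag ON.
                entries[i] = entry | DOUBLE_FLAG
                break
        else:
            entries.append(data)  # new entry with the double flag OFF
        # Step #16: count up the index and add the character data to section 4.
        self.index[bucket] += 1
        bisect.insort(self.words, word)

    def section3_bytes(self, bucket: int) -> bytes:
        # Each entry fits in 3 bytes: 22 data bits, the double flag, one unused bit.
        return b"".join(e.to_bytes(3, "big") for e in self.hashed[bucket])


# A constant hash function forces every word into the same entry.
d = UserDictionary(main_dict={"apple"}, hash_fn=lambda w: 5)
d.register("dog")
d.register("cat")
print(d.hashed[0] == [5 | DOUBLE_FLAG], d.index[0], d.words)  # True 2 ['cat', 'dog']
```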

To delete a word from user dictionary 1, as shown in FIG. 4, user dictionary 1 is first searched for the character data of the word (step #21); if they are found, they are deleted from data section 4 (step #22). Next, a hash value is created for the word (step #23), and data section 3 is checked for whether the double flag of that hash value is ON (step #24). If the double flag is not ON, the hash value is deleted (step #25) and the index is counted down (step #29). If the double flag is ON, hash values are calculated with the Hash method for all words in user dictionary 1 to check whether the same hash value is still shared by more than one word (steps #26, #27). If it is, the index is simply counted down (step #29); if only one word remains, the double flag is turned OFF (step #28) and the index is counted down (step #29).
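Steps #21 to #29 can be sketched as follows. The class repeats a compact registration routine so that it runs on its own, and it relies on these assumptions, none of which the source fixes: a hypothetical CRC-based 27-bit `word_hash`, the double flag in bit 22, index counters that count words per bucket, and one shared data-section-3 entry per hash value. Because step #22 has already removed the word from data section 4, steps #26 and #27 count only the remaining words that share the hash value.

```python
import zlib

HASH_MASK = (1 << 27) - 1
DATA_BITS = 22
DATA_MASK = (1 << DATA_BITS) - 1
DOUBLE_FLAG = 1 << 22  # assumption: which spare bit holds the flag


def word_hash(word: str) -> int:
    # Hypothetical hash: CRC-32 truncated to 27 bits.
    return zlib.crc32(word.encode("utf-8")) & HASH_MASK


class UserDictionary:
    def __init__(self, hash_fn=word_hash) -> None:
        self.hash_fn = hash_fn
        self.index = [0] * 32                  # section 2
        self.hashed = [[] for _ in range(32)]  # section 3
        self.words: list[str] = []             # section 4

    def _split(self, word: str) -> tuple[int, int]:
        h = self.hash_fn(word) & HASH_MASK
        return h >> DATA_BITS, h & DATA_MASK

    def _find(self, bucket: int, data: int) -> int:
        # Position of the section-3 entry whose data part matches, or -1.
        for i, entry in enumerate(self.hashed[bucket]):
            if (entry & DATA_MASK) == data:
                return i
        return -1

    def register(self, word: str) -> None:
        # Compact FIG. 3 registration (duplicate and main-dictionary checks omitted).
        bucket, data = self._split(word)
        i = self._find(bucket, data)
        if i >= 0:
            self.hashed[bucket][i] |= DOUBLE_FLAG
        else:
            self.hashed[bucket].append(data)
        self.index[bucket] += 1
        self.words.append(word)
        self.words.sort()

    def delete(self, word: str) -> bool:
        # Steps #21-#22: find and remove the character data in section 4.
        if word not in self.words:
            return False
        self.words.remove(word)
        # Step #23: create the hash value; step #24: inspect the double flag.
        bucket, data = self._split(word)
        i = self._find(bucket, data)  # exists, since the word was registered
        if not self.hashed[bucket][i] & DOUBLE_FLAG:
            # Step #25: the entry belonged to this word only, so delete it.
            del self.hashed[bucket][i]
        else:
            # Steps #26-#27: rehash every remaining word to count the sharers.
            sharers = sum(1 for w in self.words if self._split(w) == (bucket, data))
            if sharers == 1:
                # Step #28: only one word still uses the value, so flag OFF.
                self.hashed[bucket][i] &= ~DOUBLE_FLAG
        # Step #29: count down the index.
        self.index[bucket] -= 1
        return True


d = UserDictionary(hash_fn=lambda w: 5)  # constant hash forces sharing
for w in ("cat", "dog", "ant"):
    d.register(w)
d.delete("cat")
print(d.hashed[0] == [5 | DOUBLE_FLAG])  # True: "ant" and "dog" still share it
d.delete("dog")
print(d.hashed[0])                       # [5]: flag turned OFF (step #28)
```

The full rescan in steps #26 and #27 is slow, but it runs only on deletion, which the patent treats as an operation that does not need high speed.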

To output a list of the word information registered in user dictionary 1, as shown in FIG. 5, data section 4 of user dictionary 1 is searched for character data (step #31), and any character data found are sent to output device 7 (step #32). Because data section 4 holds the word information in alphabetical order of initial letter, the list can be output in an easy-to-read form.
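Because data section 4 is kept in alphabetical order, list output reduces to a sequential walk. In this minimal sketch, `insert_word` and `list_output` are hypothetical names, the output device is modeled as any callable (here `print`), and entries are assumed to be lowercase; mixed-case words would sort uppercase first unless a case-folding key is applied.

```python
import bisect
from typing import Callable


def insert_word(words: list[str], word: str) -> None:
    # Section 4 stays in alphabetical order, so insertion uses binary search.
    bisect.insort(words, word)


def list_output(words: list[str], out: Callable[[str], None] = print) -> int:
    """Send every word in data section 4 to the output device, in stored order."""
    count = 0
    for word in words:  # step #31: take the next character data in section 4
        out(word)       # step #32: send it to the output device
        count += 1
    return count


section4: list[str] = []
for w in ("sharp", "kana", "apple"):
    insert_word(section4, w)
list_output(section4)  # prints apple, kana, sharp on separate lines
```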

<Effects of the Invention> As explained above, in the present invention the user dictionary consists of a first area that stores registered words as numerical data obtained with the hashing method and a second area that stores the same words as character data, with the first area used for spell-check reference and the second area for confirmation by the operator. A spell check against the user dictionary can therefore use the hash values for high-speed processing, while operations that do not need high speed, such as registration, deletion, and list output, can use the character data. This enhances the functionality of the electronic dictionary.

[Brief Description of the Drawings]

FIG. 1 shows the structure of the user dictionary of an embodiment of the present invention; FIG. 2 is a block diagram showing the configuration of a computer system to which the present invention is applied; FIGS. 3, 4, and 5 are flowcharts showing the processing procedures of the embodiment; FIG. 6 is a flowchart showing the spell-check procedure; and FIGS. 7, 8, and 9 show the structures of conventional user dictionaries.

1 ... user dictionary, 2 ... index, 3, 4 ... data sections.

Claims


1. An electronic dictionary comprising a main dictionary that holds word information supplied in advance by a system and a user dictionary that holds word information registered by a user, the electronic dictionary having a function of checking the spelling of a word entered by an operator by referring to both dictionaries, characterized in that the user dictionary consists of a first area that stores the words registered by the operator as numerical data obtained with a hashing method and a second area that stores the same words as character data, the first area being used for spell-check reference and the second area for confirmation by the operator.