JPH05233701A - Dictionary storage device - Google Patents

Dictionary storage device

Info

Publication number
JPH05233701A
Authority
JP
Japan
Prior art keywords
character
storage device
characters
element value
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP3345081A
Other languages
Japanese (ja)
Other versions
JP3127969B2 (en)
Inventor
Min Gu Ton
グ トン−ミン
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial Technology Research Institute ITRI
Original Assignee
Industrial Technology Research Institute ITRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial Technology Research Institute ITRI
Priority to JP03345081A
Publication of JPH05233701A
Application granted
Publication of JP3127969B2
Anticipated expiration
Expired - Lifetime

Landscapes

  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

PURPOSE: To enable high-speed character lookup by passing the internal codes of two corresponding Chinese characters through an indexing device, which shifts them into index values that occupy a contiguous storage space and serve as two-dimensional coordinates. CONSTITUTION: The indexing device 103 applies a shift operation to the internal code of an input Chinese character (kanji) to produce a Chinese index value occupying one contiguous space. The first and second characters of a compound word are read by a reader 101 from a dictionary file (archive) on a magnetic disk and converted by the indexing device 103 into index values X and Y, and these two index values are used as the X-axis and Y-axis coordinates. The element value at that coordinate, which is one to several bits long, represents the connection relation between the two characters. The element value is recorded in a lexical connection relation storage device 104, then compressed by a compression device and stored in a compressed storage device 105.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a Chinese dictionary storage device, or to a method of compressing and storing a Chinese dictionary.

[0002]

[Prior Art] Products that process Chinese text by computer usually need to keep a Chinese dictionary in storage so that it can be queried. Each Chinese character takes 2 bytes, a word is made up of several characters, and a Chinese dictionary contains tens of thousands of Chinese characters. Storing such a dictionary on disk by conventional methods therefore requires a large amount of storage space, places a heavy load on the software, and slows down lookups.

[0003] The methods conventionally used are binary search and hashing. Hashing makes lookups fast but requires a large amount of storage for the hash table, whereas binary search finds an entry only after many probes. Moreover, a large dictionary does not fit in RAM and has to be kept as a file on a hard disk, and reading the hard disk several times during a binary search is very time-consuming. In general, a computer implementation has to trade space against speed.

[0004]

[Problem to Be Solved by the Invention] The present invention stores the dictionary in a special form and compresses it to a very small size so that it fits in RAM. A lookup therefore never has to access the magnetic disk. In addition, by using "lead lines" together with stored reference positions, the lookup speed of the compressed dictionary can be raised to the level of hashing.

[0005] The invention is applicable to systems ranging from detection of missing and wrong characters in Chinese text (spell checking) to post-processing for Chinese OCR or speech recognition.

[0006]

[Means for Solving the Problem] The internal code of an input Chinese character is passed through an indexing device, where a shift operation turns it into a Chinese index value that occupies a contiguous space. The first character A of a word gives the X-axis coordinate and the second character B gives the Y-axis coordinate. Whether these two characters form a word is set, one by one, as an element value in the lexical connection relation storage device and stored there. If an element value shows that the combination of the first character A and the second character B does not form a word (602), the occurrence is counted in a counter. If the counter has already reached 2^14 − 1, the counter value is merged with the element value and written into one compressed word in the compressed storage device. If the first character A and the second character B corresponding to the element value form a two-character word and are not regarded as part of a longer word, the counter value and the element value are merged and written into one compressed word in the compressed storage device. If the first character A and the second character B form a two-character word and are also part of a multi-character word, the counter value and the element value are merged and written into one compressed word in the compressed storage device; the index values of the characters after the second character of the word headed by A and B are then written into the next compressed word, and the connection relation between those later characters is written into the next element value. For the second through n-th words headed by the first character A and the second character B, the characters after the second character and their connection relations are likewise written, in sequence, into the next element values.

[0007] In the present invention, the position of each element value in the lexical connection relation storage device is put in correspondence with its position in the compressed storage device, and for roughly every 10,000 element values the pair of corresponding positions is written to a storage device, thereby achieving high-speed word lookup.

[0008]

[Embodiment] FIG. 1 shows the configuration of the present invention. A reader 101 reads the first character A and the second character B of a word from a dictionary file (archive) 102 on a magnetic disk. An indexing device 103 converts the two characters into index values X and Y, which are used as the X-axis and Y-axis coordinates. The element value at that coordinate stores the connection relation of the two characters (the element value is one to several bits long). The element value is recorded in the lexical connection relation storage device 104, then compressed by a compression device and stored in the compressed storage device 105.

[0009] Indexing device: The indexing device applies a shift operation to the internal code of an input Chinese character to produce a Chinese index value occupying one contiguous space. A Chinese character is formed of 2 bytes. In the BIG5 code the high-order byte ranges from A4 to F9 (hexadecimal), and the low-order byte falls into two separate ranges, 40 to 7E and A1 to FE. For the code (X, Y) of any Chinese character (X is the high-order byte, Y the low-order byte), if Y lies in A1 to FE, the operation Y = Y − 22 makes the low-order byte range contiguous. FIG. 2(A) shows the memory space layout of the original BIG5 Chinese internal code. After Y = Y − 22 has been applied to Y values with A1 ≤ Y ≤ FE, the operations X = X − A4 and Y = Y − 40 bring X into the range 0 to 55 and Y into the range 0 to 9C. The operation X * (length of the Y range, 9D) + Y then maps the code (X, Y) of any Chinese character to a single value in the range 0 to 13501 (decimal). FIG. 2(B) shows the structure of the contiguous space occupied by the Chinese characters after indexing. The same indexing method can be applied to other Chinese internal code systems by using different shift conversions to obtain a contiguous space.
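The byte arithmetic of the indexing device can be sketched in Python as follows. This is an illustrative sketch rather than code from the patent: the function name `big5_index` and the error handling are assumptions, while the hexadecimal constants are the ones stated in the text.

```python
def big5_index(code: int) -> int:
    """Map a 2-byte BIG5 internal code to a contiguous index in 0..13501."""
    hi, lo = code >> 8, code & 0xFF      # X = high-order byte, Y = low-order byte
    if not 0xA4 <= hi <= 0xF9:
        raise ValueError("high-order byte outside A4..F9")
    if 0xA1 <= lo <= 0xFE:
        lo -= 0x22                       # close the gap between 7E and A1
    elif not 0x40 <= lo <= 0x7E:
        raise ValueError("low-order byte outside 40..7E and A1..FE")
    x = hi - 0xA4                        # X now in 0..0x55
    y = lo - 0x40                        # Y now in 0..0x9C
    return x * 0x9D + y                  # 0x9D = 157 low-order values per row


if __name__ == "__main__":
    print(big5_index(0xA440))            # first code in the range
    print(big5_index(0xF9FE))            # last code in the range
```

The first code A440 maps to 0 and the last code F9FE maps to 13501, which matches the range stated in the paragraph.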

[0010] Lexical connection relation storage device: A conventional Chinese dictionary (see FIG. 3(A)) occupies a very large space and has to be placed on a magnetic disk. When lookups are frequent, the disk is accessed frequently as well, and processing inevitably becomes slow.

[0011] FIG. 3(B) shows the structure of the lexical connection relation storage device of the present invention. This embodiment uses a 2-bit element value to store the connection relation of the first two characters of a Chinese word. If the Chinese word is a two-character word or a single character, the second bit of the element value in FIG. 3(B) is set to "1". If the Chinese word has three or more characters, the first bit of the element value in FIG. 3(B) is set to "1".

[0012] FIG. 4 shows the flow for building the lexical connection relations.

[0013] One Chinese word is read (401). If it is judged to be a single character (402), the indexing device (103 in FIG. 1) computes the index value of the character (403), which becomes the X-axis coordinate. The Y-axis coordinate is set to 5401, and the bit at position (X + 1) * 5402 * 2 − 1 of the lexical connection relation storage device, that is, the second bit of the element value at (X, 5401), is set to "1". If the word read is not a single character, the indexing device (103 in FIG. 1) computes the index values of its first and second characters (405), giving X and Y respectively. If the word has two characters, the bit at position (X * 5402 * 2) + Y * 2 + 1 is set to "1" (406). Otherwise the bit at position (X * 5402 * 2) + Y * 2 is set to "1" (407). In this way every Chinese word can be written into the lexical connection relation storage device.
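The construction flow of FIG. 4 can be sketched in Python as a bit matrix with two bits per cell. This is a sketch under stated assumptions: the class name `ConnectionMatrix`, the packing of bits most-significant-first into a `bytearray`, and the input shape (a list of index values per word) are all illustrative choices, and a single character is taken to occupy the cell (X, 5401) named in the text.

```python
N = 5402        # row length: character indices 0..5400 plus column 5401
SINGLE = 5401   # Y coordinate used for single-character entries


class ConnectionMatrix:
    """2-bit element values addressed by (X, Y); hypothetical helper class."""

    def __init__(self):
        self.bits = bytearray((N * N * 2 + 7) // 8)

    def _set(self, pos):
        self.bits[pos >> 3] |= 0x80 >> (pos & 7)

    def _get(self, pos):
        return (self.bits[pos >> 3] >> (7 - (pos & 7))) & 1

    def add_word(self, indices):
        """indices: index values of the characters of one dictionary word."""
        x = indices[0]
        y = SINGLE if len(indices) == 1 else indices[1]
        base = (x * N + y) * 2            # bit position of the first bit of the cell
        if len(indices) <= 2:
            self._set(base + 1)           # second bit: single character or two-character word
        else:
            self._set(base)               # first bit: word of three or more characters

    def element(self, x, y):
        """Return the element value as an integer 0b00..0b11 (first bit is the high bit)."""
        base = (x * N + y) * 2
        return (self._get(base) << 1) | self._get(base + 1)
```

With this layout a cell that receives both a two-character word and a longer word starting with the same two characters ends up with the value 0b11.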

[0014] Compression device: The element values that represent the lexical connection relations form one large sparse matrix, which occupies a relatively large memory space. The element values that do not form a word can be compressed by a compression device to reduce the memory space. The compression device used in this embodiment is the

[0015]

[Equation 3]

[0016] compression device.

[0017] Compressed storage device: The compressed element values are stored in the compressed storage device in units of 2 bytes (16 bits), each unit forming one compressed word. The leading 14 bits hold the number of compressed element values that do not form a word, and the trailing 2 bits hold the element value itself. FIG. 5(A) shows this layout. An element value of '00' means that the two corresponding Chinese characters do not form a word. An element value of '01' means that the two characters form exactly a two-character word. An element value of '10' means that the two characters form a word of three or more characters. An element value of '11' means that the two characters are themselves a two-character word and also form a word of three or more characters.
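The 16-bit layout described above (a 14-bit count followed by a 2-bit element value) can be illustrated with a small pack and unpack pair. The function names and the `MEANING` table are illustrative assumptions; only the bit widths and the four element values come from the text.

```python
COUNT_BITS = 14
MAX_COUNT = (1 << COUNT_BITS) - 1    # 2^14 - 1, the largest count that fits

MEANING = {
    0b00: "no word",
    0b01: "two-character word",
    0b10: "word of three or more characters",
    0b11: "two-character word and word of three or more characters",
}


def pack_word(count: int, element: int) -> int:
    """Combine a count of preceding '00' elements with one element value."""
    if not 0 <= count <= MAX_COUNT:
        raise ValueError("count does not fit in 14 bits")
    if not 0 <= element <= 0b11:
        raise ValueError("element value must be 2 bits")
    return (count << 2) | element     # count in the leading 14 bits


def unpack_word(word: int):
    """Split a 16-bit compressed word into (count, element)."""
    return word >> 2, word & 0b11
```

For instance, five skipped '00' elements followed by a two-character word pack into the value 21, and unpacking 21 returns the pair (5, 1).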

[0018] When the last 2 bits of such a word (that is, the element value) are "10" or "11", the corresponding first character (A) and second character (B) can form a word of three or more characters. In that case the index value of the third character is loaded into the leading 13 bits of the next word, and the relation between the third character and the characters after it is loaded into the 3-bit next element value that follows.

[0019] FIG. 5(B) shows how words of more than two characters are stored in the compressed storage device. When the second bit of the next element value is set to "1", the character is the last character of a Chinese word, and the following word holds the third character of another Chinese word that begins with the index values (X, Y). When the third bit of the next element value is set to "1", the following word belongs to the same Chinese word as the present word. The first bit of the next element value is reserved and is not used in this embodiment.
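One way to read the layout of FIG. 5(B) is sketched below: each following word carries a 13-bit character index and a 3-bit next element value. Several details here are assumptions rather than statements of the patent: the names `encode_tails` and `decode_tails`, the choice to encode every tail separately, and the use of an all-zero word as the "000" terminator.

```python
END = 0b010    # second bit: this character ends a word
CONT = 0b001   # third bit: the following word continues the same word
               # first bit (0b100) is reserved and left at 0


def encode_tails(tails):
    """tails: for one pair (A, B), the index values of the characters
    after the second character of each longer word."""
    words = []
    for tail in tails:
        for i, idx in enumerate(tail):
            if not 0 <= idx < (1 << 13):
                raise ValueError("index does not fit in 13 bits")
            flag = END if i == len(tail) - 1 else CONT
            words.append((idx << 3) | flag)
    words.append(0)                   # assumed terminator: next element value '000'
    return words


def decode_tails(words):
    """Inverse of encode_tails; stops at the first word whose flags are '000'."""
    tails, current = [], []
    for w in words:
        idx, flag = w >> 3, w & 0b111
        if flag == 0:
            break
        current.append(idx)
        if flag & END:
            tails.append(list(current))
            if not flag & CONT:       # both bits set: the word ends and also continues
                current = []
    return tails
```

The decoder also accepts a word with both flags set, which would let a word such as ABC and its extension ABCD share one stored character.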

[0020] FIG. 6 shows the process by which the stored element values are compressed and written to the compressed storage device.

[0021] One element value is read from the lexical connection relation storage device (601). If the first character A and the second character B corresponding to this element value are judged not to form a word (602) and the counter has already reached 2^14 − 1 (605 in FIG. 7(A)), the counter value is shifted by 2 bits, combined with the element value, and written into one compressed word in the compressed storage device (606), and the counter is reset to "0" (607). If the first character A and the second character B are judged to be a two-character word that is not part of a multi-character word (603), the counter value is shifted by 2 bits, combined with the element value, and written into one compressed word in the compressed storage device (608), and the counter becomes 0 (609).

[0022] If the first character A and the second character B corresponding to this element value are judged to be a two-character word that is also part of a multi-character word (604), the counter value is shifted by 2 bits, combined with the element value, and written into one compressed word in the compressed storage device (610), and at the same time the counter returns to 0 (611). The index value of each character after the second character of the word that begins with A and B is shifted by 3 bits and written into the next compressed word (612), and the connection relation between those later characters is written into the next element value (613). FIG. 8 shows the memory layout of words of more than two characters in the compressed storage device.
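The compression flow of FIG. 6 can be sketched as a run-length pass over the element values. This is a sketch under assumptions, not the patent's implementation: the function name `compress`, the `tails_at` input shape, the all-zero terminator word, and the decision to drop trailing '00' elements are illustrative choices. A flushed word is read here as "2^14 − 1 skipped elements plus one more '00' element".

```python
MAX_COUNT = (1 << 14) - 1      # counter limit, 2^14 - 1
END, CONT = 0b010, 0b001       # assumed placement of the 3-bit next element value flags


def compress(elements, tails_at):
    """elements: 2-bit element values in storage order.
    tails_at: {position: [tail, ...]}, each tail listing the index values
    of the characters after the second one (hypothetical input shape)."""
    out, zeros = [], 0
    for pos, e in enumerate(elements):
        if e == 0b00:
            if zeros == MAX_COUNT:
                out.append(MAX_COUNT << 2)    # counter full: flush with element '00'
                zeros = 0
            else:
                zeros += 1
            continue
        out.append((zeros << 2) | e)          # counter shifted by 2 bits, then the element
        zeros = 0
        if e & 0b10:                          # '10' or '11': longer words follow
            for tail in tails_at[pos]:
                for i, idx in enumerate(tail):
                    flag = END if i == len(tail) - 1 else CONT
                    out.append((idx << 3) | flag)   # index shifted by 3 bits, then the flags
            out.append(0)                     # assumed terminator, next element value '000'
    return out                                # trailing '00' elements are not emitted
```

Under these assumptions every compressed word advances the position in the original matrix by its count plus one, which is the bookkeeping that a decoder would rely on.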

[0023] Word lookup device: reference position storage device. To make high-speed word search and lookup possible (that is, to avoid having to decompress from the very beginning of the compressed storage device until the searched word is found), the position of each element value in the lexical connection relation storage device is tied to its position in the compressed storage device. In the present invention, for roughly every 10,000 element values, the pair of corresponding positions is written to a storage device.

[0024] FIG. 9 shows how the reference positions are set up and stored.

[0025] The element-value position index, the compression position index, and the lead-line pointer value are initialized (801), and one integer is read from the compressed storage device (802). This number is the count of element values that do not form a word. It is added to the element-value position index (803), and the compression position index is incremented (804). If the element value that follows the integer represents a two-character word that also extends to more than two characters, the subsequent integers are read one after another until the next element value is "000" (805). It is then tested whether the element-value position index is greater than the lead-line pointer value (806). If it is, the element-value position index and the compression position index are written to the reference position storage device (807), and the lead-line pointer is advanced by 1000 (808). The element-value position index is then incremented by 1 (809), and the remaining integers are processed in the same way until all have been handled.
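The reference position table of FIG. 9 can be sketched as a single pass over the compressed words that records, at regular intervals, a pair of (position in the original matrix, position in the compressed list). The sketch makes several assumptions: the function name `build_checkpoints`, the `step` parameter, the initial entry (0, 0), and the convention that each recorded element position is the number of elements consumed before the recorded compressed word. The word format is the hypothetical one used above (14-bit count plus 2-bit element, followed where needed by index words ending in an all-zero terminator).

```python
def build_checkpoints(words, step=10000):
    """Return a list of (elements_consumed, index_into_words) pairs."""
    checkpoints = [(0, 0)]
    next_mark = step                 # plays the role of the lead-line pointer value
    elem_pos, i = 0, 0
    while i < len(words):
        if elem_pos >= next_mark:    # passed a mark: record both positions
            checkpoints.append((elem_pos, i))
            while next_mark <= elem_pos:
                next_mark += step
        count, element = words[i] >> 2, words[i] & 0b11
        i += 1
        elem_pos += count + 1        # skipped '00' elements plus this element
        if element & 0b10:           # skip the index words up to the '000' terminator
            while words[i] & 0b111:
                i += 1
            i += 1
    return checkpoints
```

A small `step` makes the behaviour easy to check by hand: for the eight-word example list `[9, 7, 74, 0, 2, 33, 42, 0]` and `step=3` the table holds the two entries `(0, 0)` and `(3, 1)`.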

[0026] Word search (lookup) process: To ask whether (AB[CD…]) is a word (901), the index values (X, Y) of (AB) are computed. [Note: for a single character the index values are (X, 5401).] The position (LOC) of the element value at (X, Y) is computed (903), and entry number [element value position / 10000] of the reference position storage device is looked up to obtain the lead line closest to LOC, that is, an element-value position index and the compression position index that corresponds to it (904). Starting from the location given by the compression position index, integers are read one at a time (905) and accumulated into the "lexical connection table" index (906). When LOC > the element-value position index (907), the Chinese word corresponding to the element value being searched lies in the part that has not yet been decompressed, so decompression continues. It is tested whether the element value of this integer is "10" or "11"; if it is, integers continue to be read from the compressed storage device until the next element value that follows an integer is "000" (908). The element-value position index is then increased by one and decompression of the dictionary continues (909). If LOC < the element-value position index, the characters (A, B) do not exist, that is, there is no word (AB[CD…]) (910). If LOC = the index, the position of the word (A, B) has been found. In that case, if the query consists only of the two characters (A, B), it is tested whether the element value of the corresponding integer is "01" or "11"; if it is neither, the word (A, B) does not exist. For a query of three or more characters (ABCD…), it is tested whether the element value of the integer is "10" or "11", and the input characters after the second character are compared one by one with the Chinese characters that follow in the compressed storage device. The comparison continues until it succeeds or until the next element value is "000" (911).
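The lookup flow can be sketched as follows. This is an illustrative reading, not the patent's code: the names `lookup` and `_match_tail` are hypothetical, LOC is counted in element units (X * 5402 + Y) rather than bit units, the nearest earlier checkpoint is found with `bisect` instead of direct indexing by LOC / 10000, and the compressed word format is the assumed one (14-bit count plus 2-bit element, index words with 3-bit flags, all-zero terminator).

```python
from bisect import bisect_right


def _match_tail(words, i, tail):
    """Compare the query's later characters with the stored index words."""
    current = []
    while words[i] & 0b111:                  # stop at next element value '000'
        idx, flag = words[i] >> 3, words[i] & 0b111
        i += 1
        current.append(idx)
        if flag & 0b010:                     # a stored word ends here
            if current == tail:
                return True
            if not flag & 0b001:
                current = []
    return False


def lookup(words, checkpoints, loc, tail=()):
    """loc: element position of (A, B); tail: index values of later characters."""
    k = bisect_right(checkpoints, (loc, float("inf"))) - 1
    elem_pos, i = checkpoints[k]             # nearest lead line at or before loc
    while i < len(words):
        count, element = words[i] >> 2, words[i] & 0b11
        i += 1
        pos = elem_pos + count               # position of this word's element
        if pos > loc:
            return False                     # loc falls inside a run of '00'
        if pos == loc:
            if not tail:
                return bool(element & 0b01)  # '01' or '11'
            if not element & 0b10:           # neither '10' nor '11'
                return False
            return _match_tail(words, i, list(tail))
        elem_pos = pos + 1
        if element & 0b10:                   # skip index words and terminator
            while words[i] & 0b111:
                i += 1
            i += 1
    return False
```

Because decoding starts at the nearest recorded position, the number of compressed words scanned per query is bounded by the checkpoint spacing rather than by the size of the dictionary.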

[Brief Description of the Drawings]

FIG. 1 is a configuration diagram of the present invention.

FIG. 2(A) and FIG. 2(B) show the memory space layout of the Chinese internal code and the contiguous memory space of the Chinese internal code after indexing.

FIG. 3(A) and FIG. 3(B) illustrate the conventional dictionary storage format and the storage scheme for lexical connection relations in the present invention.

FIG. 4 is a flowchart illustrating how the lexical connection relations are built and stored.

FIG. 5(A) and FIG. 5(B) illustrate how element values are stored in the compressed storage device and how words of more than two characters are stored in it.

FIG. 6 is a flowchart illustrating how the stored element values are compressed and written to the compressed dictionary storage device.

FIG. 7(A), FIG. 7(B), and FIG. 7(C) are diagrams that likewise illustrate how the stored element values are compressed and written to the compressed dictionary storage device.

FIG. 8 is a memory layout diagram for words of more than two characters in the compressed storage device.

FIG. 9 is a flowchart illustrating the method of storing reference positions.

FIG. 10 is a flowchart illustrating the word lookup method.

[Explanation of Symbols]

101 Reader
103 Indexing device
104 Lexical connection relation storage device
105 Compressed storage device

Claims (10)

[Claims]

[Claim 1] A dictionary storage device that stores connection relations between Chinese characters, wherein each connection relation between characters is expressed by an element value of one to several bits, and the internal codes of the two corresponding Chinese characters are shifted by an indexing device into index values that occupy a contiguous storage space and serve as two-dimensional coordinate values.
[Claim 2] A compressed dictionary storage device comprising: a lexical connection relation storage device that stores the connection relations between Chinese characters as element values of one to several bits; a compression device that compresses, by the coding of [Equation 1], those element values that do not form a word; and a compressed storage device that stores the number of element values that do not form a word, together with the two-character and multi-character words.
[Claim 3] The compressed dictionary storage device according to claim 2, including an indexing device that performs an operation on the internal codes of Chinese characters to convert the memory space of the characters into a contiguous one.
[Claim 4] The compressed dictionary storage device according to claim 2, wherein a two-character word is stored in the compressed storage device by storing the element value of its connection relation.
[Claim 5] The compressed dictionary storage device according to claim 2, wherein a multi-character word is stored in the compressed storage device by storing the index values of the characters after the second character, together with next element values that express the connection relations between the characters of the multi-character word.
[Claim 6] The compressed dictionary storage device according to claim 1 or 2, wherein the element value is 2 bits.
[Claim 7] The compressed dictionary storage device according to claim 1 or 2, wherein the secondary element value is 3 bits.
[Claim 8] The compressed dictionary storage device according to claim 2, further including a word search method comprising the steps of: reading a word; taking the first and second characters of the word; computing the internal-code index values of the two characters; searching the reference position device to obtain a compression position value and a lexical connection position value; starting in the compressed storage device from the location given by those position values and finding the closest element value; obtaining the connection relation between the two characters; and reading the following element values in order to obtain the connection relations of a word of three or more characters.
[Claim 9] A method of compressing and storing a Chinese dictionary, comprising the steps of: inputting a word; converting the internal codes of its first and second characters into index values in a contiguous space; setting the connection relation element value corresponding to the first and second characters; compressing the connection relation element values, that is, compressing the portion that does not form a word and recording its count; and storing the count of the element values that do not form a word, the index values of the characters after the second character, and the next element values that indicate the connection relations between those characters.
[Claim 10] The method of compressing and storing a Chinese dictionary according to claim 9, wherein the coding of [Equation 2] is used.
JP03345081A 1991-12-26 1991-12-26 Dictionary storage device Expired - Lifetime JP3127969B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP03345081A JP3127969B2 (en) 1991-12-26 1991-12-26 Dictionary storage device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP03345081A JP3127969B2 (en) 1991-12-26 1991-12-26 Dictionary storage device

Publications (2)

Publication Number Publication Date
JPH05233701A true JPH05233701A (en) 1993-09-10
JP3127969B2 JP3127969B2 (en) 2001-01-29

Family

ID=18374156

Family Applications (1)

Application Number Title Priority Date Filing Date
JP03345081A Expired - Lifetime JP3127969B2 (en) 1991-12-26 1991-12-26 Dictionary storage device

Country Status (1)

Country Link
JP (1) JP3127969B2 (en)

Also Published As

Publication number Publication date
JP3127969B2 (en) 2001-01-29

Similar Documents

Publication Publication Date Title
US4653100A (en) Audio response terminal for use with data processing systems
KR100330801B1 (en) Language identifiers and language identification methods
US4990903A (en) Method for storing Chinese character description information in a character generating apparatus
JPS5892035A (en) Data base processing system
US5802482A (en) System and method for processing graphic language characters
EP3385860A1 (en) Compression of text using multiple dynamic dictionaries
JPH05233701A (en) Dictionary storage device
US6731229B2 (en) Method to reduce storage requirements when storing semi-redundant information in a database
WO2001016863A2 (en) Method and apparatus for symbol storage and display
JPH0612548B2 (en) Document processor
EP0539965A2 (en) An electronic dictionary including a pointer file and a word information correction file
JP3045886B2 (en) Character processing device with handwriting input function
JP3585944B2 (en) Data processing method and apparatus
JPS6246029B2 (en)
JP3021224B2 (en) Dictionary search device
JP2634926B2 (en) Kana-Kanji conversion device
JPH03137768A (en) Document processor
JPH0456350B2 (en)
JP3273778B2 (en) Kana-kanji conversion device and kana-kanji conversion method
JPH0721182A (en) Character processor and its method
JPH0521264B2 (en)
JPS61128367A (en) 'kana'/'kanji' converter
JP3155600B2 (en) Information retrieval device
JP2865446B2 (en) Sentence processing equipment
JPH0385670A (en) Two-step display system document processor

Legal Events

Date Code Title Description
R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

S111 Request for change of ownership or part of ownership

Free format text: JAPANESE INTERMEDIATE CODE: R313113

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081110

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081110

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091110

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20101110

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111110

Year of fee payment: 11

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111110

Year of fee payment: 11

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20121110

Year of fee payment: 12

EXPY Cancellation because of completion of term
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20121110

Year of fee payment: 12