JPH05233701A

JPH05233701A - Dictionary storage device

Info

Publication number: JPH05233701A
Application number: JP3345081A
Authority: JP
Inventors: Min Gu Ton; グトン−ミン
Original assignee: Industrial Technology Research Institute ITRI
Current assignee: Industrial Technology Research Institute ITRI
Priority date: 1991-12-26
Filing date: 1991-12-26
Publication date: 1993-09-10
Anticipated expiration: 2016-01-29
Also published as: JP3127969B2

Abstract

PURPOSE: To allow high-speed character lookup. An indexing device shifts the internal codes of two corresponding Chinese characters into index values that occupy a continuous storage space, and these index values are used as two-dimensional coordinates. CONSTITUTION: The indexing device 103 applies a shift operation to the internal code of each input Chinese character (kanji) to produce a Chinese index value; the index values occupy one continuous space. A reader 101 reads the first and second characters of each compound word from the dictionary archive on a magnetic disk 7. The indexing device 103 converts the two characters into index values X and Y, which serve as coordinates on the X axis and the Y axis. The element value at those coordinates represents the connection relationship between the two characters (an element value has one or more bits). The element values are recorded in a lexical connection relation storage device 104, then compressed by a compression device and stored in a compressed storage device 105.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a Chinese dictionary storage device and to a method for compressing and storing a Chinese dictionary.

[0002]

2. Description of the Related Art: Products that process Chinese data on a computer usually need to keep a Chinese dictionary in storage so that it can be queried. Each Chinese character occupies 2 bytes, a compound word consists of several characters, and a Chinese dictionary contains tens of thousands of Chinese characters. A dictionary stored on disk in the traditional way therefore needs a large amount of storage space and places a heavy burden on the software. It also slows down character lookup.

[0003] The conventional methods are binary search and hashing. Hashing makes character search fast, but it needs a hash table that occupies a large amount of storage. Binary search finds an entry only after many probes. Moreover, a large dictionary does not fit in RAM and must be kept on the hard disk as a stored file, and reading the hard disk several times during a binary search takes a very long time. From the computer's point of view, one generally has to trade space against speed.

[0004]

DISCLOSURE OF THE INVENTION [Problems to be Solved by the Invention]: The present invention stores the dictionary in a special format and compresses it small enough to be held in RAM, so character lookups do not need to access the magnetic disk. In addition, "lead lines" pair corresponding (reference) positions in the uncompressed and the compressed memory. This raises the lookup speed of the compressed dictionary to the level of a hash method.

[0005] Applications of the present invention range from detecting missing and wrong characters in Chinese text (spell checking) to post-processing systems for Chinese OCR and speech recognition.

[0006]

[Means for Solving the Problems]: The internal code of each input Chinese character passes through an indexing device, where a single shift operation turns it into a Chinese index value. The index values occupy a continuous space. The index of the first character A of a word is used as the X-axis coordinate and the index of the second character B as the Y-axis coordinate. Whether the two characters form a word is recorded, pair by pair, as an element value in the lexical connection relation storage device. Each element value is then processed as follows:

- If the first character A and the second character B do not form a word (602), the element is counted in a counter. Once the counter has reached $2^{14} - 1$, the counter value is combined with the element value and written into one compressed word in the compressed storage device.
- If A and B form a two-character word that is not part of any longer word, the counter value and the element value are combined and written into one compressed word.
- If A and B form a two-character word and are also part of longer words, the counter value and the element value are combined and written into one compressed word. Next, the index value of each character after B in a word beginning with A and B is written into the following compressed word, and the connection relationship between that character and the next one is written into the next element value.

The characters after B in the second through n-th words beginning with A and B, together with the connection relationships between them, are likewise written into subsequent element values.

[0007] In the present invention, the position of each element value in the lexical connection relation storage device is linked to its position in the compressed storage device. For roughly every 10,000 element values, the two corresponding positions are written into one storage device, which makes high-speed character lookup possible.

[0008]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT: FIG. 1 shows the configuration of the present invention. A reader 101 reads the first character A and the second character B of a Chinese word from the dictionary archive 102 on a magnetic disk. One indexing device 103 converts the two characters into index values X and Y, which are used as the X- and Y-axis coordinates. The corresponding element value stores the connection relationship between the two characters (an element value has one or more bits). The element values are recorded in the lexical connection relation storage device 104, then compressed by a compression device and saved in the compressed storage device 105.

[0009] Indexing device: The indexing device applies shift operations to the internal code of an input Chinese character to produce a Chinese index value. The index values occupy one continuous space.

A Chinese character is encoded in two bytes (all byte values below are hexadecimal). In BIG5, the high-order byte ranges from A4 to F9. The low-order byte falls in two separate ranges, 40–7E and A1–FE. Let X be the high-order byte and Y the low-order byte of any character. If Y lies in A1–FE, the shift Y = Y − 22 makes the low-order range continuous. FIG. 2(A) shows the memory-space layout of the original BIG5 Chinese internal code.

After the shift Y = Y − 22 (for A1 ≤ Y ≤ FE) and the offsets X = X − A4 and Y = Y − 40, X ranges over 0–55 and Y over 0–9C. The index is then computed as

$\mathrm{index} = X \times \mathrm{9D} + Y$,

where 9D is the length of the Y range. The index of any Chinese character therefore falls within the decimal range 0–13501. FIG. 2(B) shows the structure of the continuous space that the characters occupy after indexing. The same indexing method can also be applied to other Chinese internal-code systems, using different shift values to obtain a continuous space.
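The BIG5 shifts above can be sketched in Python as follows. This is an illustrative sketch, not the patent's own implementation. The function name `big5_index` is hypothetical, and the function raises an error for byte values outside the stated ranges. The arithmetic gives a useful check: F9FE maps to 13501. C67E, the last BIG5 level-1 character, maps to 5400, so the 5401 level-1 characters occupy indices 0–5400. This appears consistent with the constants 5401 and 5402 used in paragraph [0013].

```python
def big5_index(code: int) -> int:
    """Map a two-byte BIG5 code such as 0xA440 to a contiguous index (sketch).

    Follows the shifts described in the text (constants are hexadecimal):
    a low byte in A1-FE is moved down by 0x22 so that it directly follows
    the 40-7E range, then both bytes are rebased to zero and combined
    row-major with 0x9D (157) low-byte values per row.
    """
    hi, lo = code >> 8, code & 0xFF
    if not 0xA4 <= hi <= 0xF9:
        raise ValueError(f"high byte {hi:#04x} outside A4-F9")
    if 0xA1 <= lo <= 0xFE:
        lo -= 0x22              # A1-FE -> 7F-DC, contiguous with 40-7E
    elif not 0x40 <= lo <= 0x7E:
        raise ValueError(f"low byte {lo:#04x} outside 40-7E and A1-FE")
    x = hi - 0xA4               # 0x00-0x55
    y = lo - 0x40               # 0x00-0x9C
    return x * 0x9D + y         # 0..13501
```

For example, `big5_index(int.from_bytes("一".encode("big5"), "big"))` returns 0, because 一 is encoded as A440 in BIG5.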

[0010] Lexical connection relation storage device: A traditional Chinese dictionary (see FIG. 3(A)) occupies so much space that it must be kept on a magnetic disk. When characters are looked up frequently, the number of disk accesses grows and processing inevitably slows down.

[0011] FIG. 3(B) shows the structure of the lexical connection relation storage device of the present invention. In this embodiment, each element value has 2 bits and stores the connection relationship between the first two characters of a Chinese word. If the word is a two-character word or a single character, the second bit of the element value in FIG. 3(B) is set to "1". If the word has three or more characters, the first bit of the element value is set to "1".

[0012] FIG. 4 shows the flow for building the lexical connection relationships.

[0013] A Chinese word 401 is read.

If the word is a single character (402), an indexing device (103 in FIG. 1) computes the index value of the character (403), which is used as the X-axis coordinate, and the Y-axis coordinate is set to 5401. The bit at position $(X \cdot 5402 + 5401) \cdot 2 + 1$ of the lexical connection relation storage device, that is, the second bit of element (X, 5401), is set to "1".

If the word is not a single character (402), the indexing device (103 in FIG. 1) computes the index values of its first and second characters as X and Y (405). If the word has two characters, the bit at position $(X \cdot 5402 \cdot 2) + Y \cdot 2 + 1$ is set to "1" (406). Otherwise, the bit at position $(X \cdot 5402 \cdot 2) + Y \cdot 2$ is set to "1" (407).

In this way, every Chinese word can be written into the lexical connection relation storage device.
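The bit layout in [0013] can be sketched as a flat bit array with 2 bits per element and a row width of 5402 (character indices 0–5400 plus column 5401 for single characters). Element (X, Y) uses two bits:

- Bit (X·5402 + Y)·2 is the first bit, meaning "three or more characters".
- Bit (X·5402 + Y)·2 + 1 is the second bit, meaning "single character or two-character word".

Several details are illustrative assumptions: the class name, the `rows` parameter, and reading an element back as a 2-bit integer with the first bit as the high bit (so '10' means "three or more characters"). Inputs are character index values that the indexing device has already produced.

```python
WIDTH = 5402    # row width: character indices 0-5400 plus the single-character column
SINGLE = 5401   # Y coordinate reserved for single-character entries


class ConnectionMatrix:
    """Lexical connection relation storage: 2 bits per (X, Y) element (sketch)."""

    def __init__(self, rows: int = 5401):
        self.bits = bytearray((rows * WIDTH * 2 + 7) // 8)

    def _set(self, offset: int) -> None:
        self.bits[offset >> 3] |= 1 << (offset & 7)

    def _get(self, offset: int) -> int:
        return (self.bits[offset >> 3] >> (offset & 7)) & 1

    def add_word(self, indices: list[int]) -> None:
        """Record a word given as a list of character index values."""
        x = indices[0]
        if len(indices) == 1:                       # single character: second bit of (X, 5401)
            self._set((x * WIDTH + SINGLE) * 2 + 1)
        elif len(indices) == 2:                     # two-character word: second bit of (X, Y)
            self._set((x * WIDTH + indices[1]) * 2 + 1)
        else:                                       # three or more characters: first bit of (X, Y)
            self._set((x * WIDTH + indices[1]) * 2)

    def element(self, x: int, y: int) -> int:
        """Return the element as 0b(first bit)(second bit)."""
        base = (x * WIDTH + y) * 2
        return (self._get(base) << 1) | self._get(base + 1)
```

With the default 5401 rows, the array holds 5401 × 5402 × 2 bits, roughly 7.3 MB. This full matrix is what the compression step below reduces.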

[0014] Compression device: The element values that record lexical connections form one large sparse matrix, which occupies a relatively large memory space. The compression device can compress the element values that do not correspond to any word, reducing the memory required. The compression device used in this embodiment is given by the following equation.

[0015]

[Equation 3]

[0016] This is the compression device.

[0017] Compressed storage device: The compressed element values are stored in the compressed storage device in units of two bytes (16 bits). In each unit, the first 14 bits hold the number of non-word element values that were compressed away, and the last 2 bits hold the element value itself. Together these bits form one compressed word. FIG. 5(A) shows this layout. The element value has the following meanings:

- '00': the two Chinese characters corresponding to this element value do not form a word in the table.
- '01': the two characters form exactly a two-character word.
- '10': the two characters are the first two characters of a word of three or more characters.
- '11': the two characters are themselves a two-character word and are also the first two characters of a word of three or more characters.
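The 16-bit word layout of [0017] can be written as a pair of helpers. This sketch assumes that the count occupies the high 14 bits (the "front part" in the text) and that the element value occupies the low 2 bits, with the element's first bit as the more significant one. The names `pack_word`, `unpack_word`, and `MEANING` are hypothetical.

```python
MAX_RUN = (1 << 14) - 1   # largest count the 14-bit field can hold

MEANING = {
    0b00: "no word",
    0b01: "exactly a two-character word",
    0b10: "first two characters of a word of three or more characters",
    0b11: "a two-character word and the start of a longer word",
}


def pack_word(skipped: int, element: int) -> int:
    """Pack a 14-bit skip count and a 2-bit element value into one 16-bit word."""
    if not 0 <= skipped <= MAX_RUN:
        raise ValueError("skip count does not fit in 14 bits")
    if not 0 <= element <= 0b11:
        raise ValueError("element value must be 2 bits")
    return (skipped << 2) | element


def unpack_word(word: int) -> tuple[int, int]:
    """Split a 16-bit compressed word into (skip count, element value)."""
    return word >> 2, word & 0b11
```

For example, `pack_word(3, 0b01)` is 13, and `MEANING[unpack_word(13)[1]]` is the two-character-word case.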

[0018] When the last 2 bits of such a word (that is, the element value) are "10" or "11", the corresponding first character (A) and second character (B) begin a word of three or more characters. In that case, the index value of the third character is loaded into the first 13 bits of the next word. The relationship between the third character and the characters that follow it is loaded into the remaining 3 bits as the next element value.

[0019] FIG. 5(B) shows how words of two or more characters are stored in the compressed storage device. The bits of the next element value have the following meanings:

- Second bit set to "1": the character is the last character of the Chinese word, and the following word holds the third character of another Chinese word that begins with the index values (X, Y).
- Third bit set to "1": the following word belongs to the same Chinese word as this one.
- First bit: reserved and not used in this embodiment.
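The following sketch shows one possible reading of [0018] and [0019]. Each character after the second is stored as a 16-bit word that holds a 13-bit index and 3 flag bits:

- Second flag bit set: this character ends a word, and another word with the same first two characters follows.
- Third flag bit set: the next character belongs to the same word.
- "000": the list ends.

The exact bit order, the constant names, and both helper functions are assumptions for illustration. Thirteen bits hold values up to 8191. That is enough for the character indices 0–5400 used in [0013], but not for the full 0–13501 range produced by the indexing device.

```python
INDEX_BITS = 13
LAST_IN_WORD = 0b010   # second bit: this character ends a word; another word with the same prefix follows
SAME_WORD = 0b001      # third bit: the next continuation word belongs to the same word
# The first bit (0b100) is reserved and left at 0.


def encode_suffixes(suffixes: list[list[int]]) -> list[int]:
    """Encode the characters after A and B for every word that starts with A, B."""
    words = []
    for i, tail in enumerate(suffixes):
        if not tail:
            raise ValueError("each suffix needs at least one character")
        for j, idx in enumerate(tail):
            if not 0 <= idx < 1 << INDEX_BITS:
                raise ValueError("index does not fit in 13 bits")
            if j < len(tail) - 1:
                flags = SAME_WORD
            elif i < len(suffixes) - 1:
                flags = LAST_IN_WORD
            else:
                flags = 0b000          # end of the list for this (A, B)
            words.append((idx << 3) | flags)
    return words


def decode_suffixes(words: list[int], start: int = 0) -> tuple[list[list[int]], int]:
    """Inverse of encode_suffixes; returns the suffixes and the next unread position."""
    suffixes, current, pos = [], [], start
    while True:
        word = words[pos]
        pos += 1
        current.append(word >> 3)
        flags = word & 0b011
        if flags & SAME_WORD:
            continue
        suffixes.append(current)
        current = []
        if not flags & LAST_IN_WORD:
            return suffixes, pos
```

Under this reading, the words ABC and ABDE become three continuation words: C with flags 010, D with 001, and E with 000.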

[0020] FIG. 6 shows the process by which the stored element values are compressed into the compressed storage device.

[0021] An element value 601 is read from the lexical connection relation storage device.

If it is determined (602) that the corresponding first character A and second character B do not form a word, the element is counted in the counter. If the counter has already reached $2^{14} - 1$ (605 in FIG. 7(A)), the counter value is shifted left by 2 bits, combined with the element value, and written into one compressed word 606 in the compressed storage device. The counter is then reset to "0" (607).

If it is determined that A and B form a two-character word that is not part of a longer word (603), the counter value is shifted left by 2 bits, combined with the element value, and written into one compressed word 608 in the compressed storage device. The counter is then reset to 0 (609).

[0022] Suppose it is determined that A and B form a two-character word and are also part of a longer word (604). The counter value is shifted left by 2 bits, combined with the element value, and written into one compressed word 610 in the compressed storage device, and the counter is reset to 0 (611). The index value of each character after B in a word beginning with A and B is then shifted left by 3 bits and written into the next compressed word (612). The connection relationship between that character and the following one is written into the next element value (613). FIG. 8 shows the memory layout of words of two or more characters in the compressed storage device.
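The flow of FIG. 6 can be sketched as a run-length encoder. This is an illustrative sketch. The function names, the `suffixes_at` mapping, and dropping a trailing run of zeros are all assumptions.

Continuation words hold a 13-bit index followed by 3 flag bits. Under one reading of [0019], the flags are 001 (the same word continues), 010 (another word with the same first two characters follows), or 000 (end of the list).

Each primary word stands for `counter` skipped non-word elements followed by one element. A decoder therefore advances its element position by `counter + 1` for each primary word.

```python
MAX_RUN = (1 << 14) - 1


def encode_suffixes(suffixes):
    """Continuation words: 13-bit index, then flags 001 (same word) / 010 (another word follows) / 000 (end)."""
    out = []
    for i, tail in enumerate(suffixes):
        for j, idx in enumerate(tail):
            if j < len(tail) - 1:
                flags = 0b001
            elif i < len(suffixes) - 1:
                flags = 0b010
            else:
                flags = 0b000
            out.append((idx << 3) | flags)
    return out


def compress(elements, suffixes_at):
    """Run-length compress 2-bit element values into 16-bit words (sketch of FIG. 6).

    elements: 2-bit element values in row-major (X * width + Y) order.
    suffixes_at: element position -> list of character-index lists that follow
                 A and B; required whenever the element's first bit is set.
    """
    words, counter = [], 0
    for pos, elem in enumerate(elements):
        if elem == 0 and counter < MAX_RUN:
            counter += 1                       # no word: just count it
            continue
        # Either a word (01/10/11) or a zero element arriving with a full counter:
        # write counter and element together as one primary word, then reset.
        words.append((counter << 2) | elem)
        counter = 0
        if elem & 0b10:                        # longer words: append continuation words
            words.extend(encode_suffixes(suffixes_at[pos]))
    return words                               # a trailing run of zeros needs no word
```

For example, the elements `[0, 0, 1, 0, 2, 0]` with one three-character word at position 4 compress to three words. The first is a primary word skipping 2 zeros for the "01" element, the second skips 1 zero for the "10" element, and the third is one continuation word.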

[0023] Word lookup device (reference position storage device): To search for and look up characters at high speed, the position of each element value in the lexical connection relation storage device is linked to its position in the compressed storage device. This avoids decompressing the compressed storage device from its very beginning every time a word is searched for. The present invention writes both kinds of corresponding positions into one storage device for roughly every 10,000 element values.

[0024] FIG. 9 shows how the reference positions are set up and stored.

[0025] The element position index, the compressed position index, and the lead-line point value are initialized (801).

One integer is read from the compressed storage device (802). This number is the count of element values that do not form a word. It is added to the element position index (803), and the compressed position index is incremented (804). If the element value following this integer represents a word of more than two characters, the subsequent integers are also read, until the next element value is "000" (805).

Next, it is determined whether the element position index is larger than the lead-line point value (806). If it is, the element position index and the compressed position index are written into the reference position storage device (807), and the lead-line point value is advanced by 1000 (808). The element position index is then incremented by 1 (809).

The remaining integers are processed in the same way until all of them have been handled.
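The reference-position table of [0023] to [0025] can be sketched as follows. The sketch assumes 16-bit primary words (a 14-bit skip count and a 2-bit element value). It also assumes continuation words whose low 3 bits are 001 or 010 when more continuation words follow, and 000 at the end. For every `interval` element positions, the table records where decoding can start. The interval defaults to 10,000 as in [0023]. The function names and the exact bookkeeping are assumptions, not a transcription of FIG. 9.

```python
def skip_continuation(words, w):
    """Skip continuation words starting at index w; return the index after the '000' word."""
    while words[w] & 0b011:        # 001 or 010: more continuation words follow
        w += 1
    return w + 1


def build_reference_positions(words, interval=10_000):
    """Record (element position, compressed position) pairs every `interval` elements.

    Entry k is (element_pos, w): decoding may start at compressed word w, whose
    first covered element has position element_pos <= k * interval, and the
    element that word w stores lies at or beyond k * interval.
    """
    table = []
    element_pos, w = 0, 0                                 # both indices start at 0
    while w < len(words):
        skipped, elem = words[w] >> 2, words[w] & 0b11
        elem_at = element_pos + skipped                   # position of the stored element
        while len(table) * interval <= elem_at:           # passed the next lead-line point
            table.append((element_pos, w))
        element_pos = elem_at + 1                         # the stored element itself
        w += 1
        if elem & 0b10:                                   # jump over continuation words
            w = skip_continuation(words, w)
    return table
```

Each entry costs only two integers, so the table stays small compared with the compressed words while bounding how far a lookup must decode.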

[0026] Word search (lookup) process: To check whether (AB[CD…]) is a word (901), the index values (X, Y) of (AB) are computed. [Note: for a single character, the index values are (X, 5401).]

The position LOC of the element value for (X, Y) is computed (903). Entry number $\lfloor \mathrm{LOC} / 10000 \rfloor$ of the reference position storage device is looked up to obtain the lead line closest to LOC, that is, an element position index and the corresponding compressed position index (904). Starting from the compressed position index, integers are read one at a time (905) and accumulated into the element position index, the index into the lexical connection table (906).

The accumulated index is then compared with LOC:

- If LOC > element position index (907), the word corresponding to the element value being searched for lies in the part not yet decompressed, so decompression continues. If the element value of this integer is "10" or "11", further integers are read from the compressed storage device until the element value following an integer is "000" (908). The element position index is then incremented by 1 and decompression of the dictionary continues (909).
- If LOC < element position index, the pair (A, B) does not occur, that is, there is no word (AB[CD…]) (910).
- If LOC = element position index, the position of (A, B) has been found.

When the position has been found, the check depends on the length of the query:

- If the query consists only of the two characters (A, B), the corresponding element value is checked for "01" or "11". If it is neither, the word (A, B) does not exist.
- For a query of three or more characters (ABCD…), the element value is checked for "10" or "11". The input characters after the second are then compared one by one with the characters that follow in the compressed storage device, until a match is found or the next element value is "000" (911).
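The lookup of [0026] can be sketched as follows. The sketch assumes 16-bit primary words (a 14-bit skip count and a 2-bit element value) and continuation words whose low 3 bits are 001 (the same word continues), 010 (another word with the same first two characters follows), or 000 (end). `element_at` starts from entry ⌊LOC / interval⌋ of the reference table and decodes forward from there only. It compares the accumulated position with LOC as in steps 907 to 910. `contains` then checks the 2-bit element. For longer queries, it walks the continuation words until a match is found or a "000" flag is reached (911). All names and parameters are hypothetical, and the defaults assume the 5402-column matrix of [0013] and an interval of 10,000.

```python
WIDTH, SINGLE = 5402, 5401


def skip_continuation(words, w):
    """Return the index just past the continuation words that start at w."""
    while words[w] & 0b011:
        w += 1
    return w + 1


def element_at(words, table, loc, interval=10_000):
    """Decode only from the nearest reference position; return (element, next word index)."""
    k = loc // interval
    if k >= len(table):
        return 0, None                       # beyond the last stored word: no entry
    element_pos, w = table[k]
    while w < len(words):
        skipped, elem = words[w] >> 2, words[w] & 0b11
        elem_at = element_pos + skipped
        if elem_at > loc:
            return 0, None                   # LOC < position: (A, B) does not occur
        if elem_at == loc:
            return elem, w + 1               # LOC = position: found
        element_pos = elem_at + 1            # LOC > position: keep decompressing
        w += 1
        if elem & 0b10:
            w = skip_continuation(words, w)
    return 0, None


def contains(words, table, word, width=WIDTH, single=SINGLE, interval=10_000):
    """word: list of character index values, e.g. from the indexing device."""
    x = word[0]
    y = word[1] if len(word) > 1 else single
    elem, w = element_at(words, table, x * width + y, interval)
    if len(word) <= 2:
        return bool(elem & 0b01)             # '01' or '11'
    if not elem & 0b10:                      # needs '10' or '11'
        return False
    tail, current = word[2:], []
    while True:                              # compare with the stored continuations
        entry = words[w]
        w += 1
        current.append(entry >> 3)
        flags = entry & 0b011
        if flags & 0b001:
            continue
        if current == tail:
            return True
        current = []
        if not flags & 0b010:                # '000': no more words start with (A, B)
            return False
```

Because the table entry bounds where decoding starts, a lookup decodes at most about one interval's worth of words rather than the whole dictionary.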

[Brief description of drawings]

FIG. 1 is a configuration diagram of the present invention.

FIGS. 2(A) and 2(B) are, respectively, a diagram of the storage-space layout of the original Chinese internal code and a diagram of the continuous storage space of the Chinese internal code after indexing.

FIGS. 3(A) and 3(B) are diagrams illustrating, respectively, a traditional dictionary storage format and the storage scheme for lexical connections in the present invention.

FIG. 4 is a flowchart illustrating how lexical connections are established and stored.

FIGS. 5(A) and 5(B) are diagrams explaining, respectively, how element values are stored in the compressed storage device and how words of two or more characters are stored in the compressed storage device.

FIG. 6 is a flowchart illustrating the method of compressing the stored element values into the compressed dictionary storage device.

FIGS. 7(A), 7(B), and 7(C) are diagrams further explaining the method of compressing the stored element values into the compressed dictionary storage device.

FIG. 8 is an exploded memory diagram for words of two or more characters in the compressed storage device.

FIG. 9 is a flowchart illustrating how the reference positions are stored.

FIG. 10 is a flowchart illustrating the character lookup method.

[Explanation of symbols]

101 Reading device
103 Indexing device
104 Lexical connection relation storage device
105 Compressed storage device

Claims

[Claims]

1. A dictionary storage device that stores connection relationships between Chinese characters, wherein an element value of one or more bits represents the connection relationship between two characters, and the internal codes of the two corresponding Chinese characters are shifted by an indexing device into index values that occupy a continuous storage space and serve as two-dimensional coordinates.

2. A compressed dictionary storage device comprising: a lexical connection relation storage device that stores element values of one or more bits, namely the connection relationships between Chinese characters; a compression device that compresses the element values that do not form Chinese words; and a compressed storage device that stores the number of element values that do not form words, and words of two or more characters.

3. The compressed dictionary storage device according to claim 2, further comprising an indexing device that converts the internal codes of Chinese characters so that the characters occupy a continuous memory space.

4. The compressed dictionary storage device according to claim 2, wherein the storage of a two-character word in the compressed storage device stores the element value of the connection relationship.

5. The compressed dictionary storage device according to claim 2, wherein the storage of a word of several characters in the compressed storage device stores the index values of the characters after the second character and a next element value representing the connection relationship between the characters of that word.

6. The compressed dictionary storage device according to claim 1 or 2, wherein the element value is 2 bits.

7. The compressed dictionary storage device according to claim 1, wherein the next (secondary) element value is 3 bits.

8. The compressed dictionary storage device according to claim 2, further comprising a character search method comprising the steps of:
- reading one word;
- taking the first and second characters of the word and computing the internal-code index values of the two characters;
- searching the reference position device to obtain a compressed position value and a lexical connection position value;
- starting from that position in the compressed storage device, searching for the closest element value to obtain the connection relationship between the two characters; and
- reading the following element values in order to obtain the connection relationship for words of three or more characters.

9. A method of compressing and storing a Chinese dictionary, comprising the steps of:
- inputting a word and shifting the internal codes of its first and second characters to convert them into index values in a continuous space;
- setting an element value representing the connection relationship between the first and second characters;
- compressing the connection element values, that is, compressing the portions that do not form words and recording their number; and
- storing the number of element values that do not form words, the index values of the characters after the second character, and a next element value indicating the connection relationship between those characters.

10. The method of compressing and storing a Chinese dictionary according to claim 9, wherein the coding method of Equation 2 is used.