JP2005135086A

JP2005135086A - Dictionary data compression apparatus, electronic dictionary device, compressed dictionary data production method and program

Info

Publication number: JP2005135086A
Application number: JP2003369027A
Authority: JP
Inventors: Shinichi Matsui; 紳一松井
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2003-10-29
Filing date: 2003-10-29
Publication date: 2005-05-26

Abstract

PROBLEM TO BE SOLVED: To increase the compression efficiency of a dictionary coding method by discretely selecting entry unit data to create a reference part and changing the coding method in dependence on a character string.

SOLUTION: A CPU 10 discretely selects entry unit data from original English-Japanese dictionary data 200 as a processed English-Japanese reference part 302, and handles the remaining entry unit data as main data part intermediate data 306. The CPU 10, for character strings of a predetermined word length or longer, executes a global search program 214 to decide character strings to be coded, and for the other character strings, executes a local search program 216 to decide character strings to be coded. A coding program 218 is then executed to code the main data part intermediate data 306 into a coded English-Japanese main data part 304.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a dictionary data compression apparatus that compresses dictionary data, an electronic dictionary device that decompresses and uses compressed dictionary data, and a compressed dictionary data production method and program for producing compressed dictionary data.

Various data compression algorithms are known, but compressing dictionary data for use in an electronic dictionary device or the like differs from compressing general text data in two major respects: random access (decoding, that is, decompression) for each headword is required, and encoding (also called compression) is performed only once, at product development time, so it does not matter if it takes a long time.

“Random access for each headword” means decoding and decompressing the dictionary data in headword units. Dictionary data has the same content as a paper dictionary. Before encoding, it consists of the characters printed in the paper dictionary expressed as character codes, and it generally forms one continuous run of text data; that is, the printed text follows in sequence, starting from the first headword. If this dictionary data were simply compressed as a whole, the information for an arbitrary headword (the explanatory text for that headword) could not be decoded on its own. The continuous text data must therefore be divided at each headword (into headword units) and compressed.

“Encoding is performed only once, at product development time” means literally that the manufacturer can take ample time to encode the dictionary data. The electronic dictionary device only decompresses the compressed dictionary data and never compresses (encodes) it. This has the advantage that a fast computer can try every possibility and the most efficient compression method can then be adopted.

Given these characteristics, dictionary data compression methods such as those of Patent Document 1, Patent Document 2, and Patent Document 3 have been proposed.

For example, the whole of the dictionary data is divided into a part amounting to about 10% (hereinafter called the “reference part” where appropriate) and the remainder (hereinafter called the “main data part” where appropriate). A known method then determines whether a character string in the main data part is contained in the reference part and, if it is, compresses the main data part by encoding the string based on its position in the reference part and its word length.

Specifically, the method determines which position in the reference part the character string to be encoded in the main data part corresponds to, and encodes it accordingly. Encoding by referring to a reference part that contains such character strings is known as a dictionary-type encoding method. Dictionary-type encoding methods include the LZ77 and LZ78 methods; a distinguishing feature of applying one to dictionary data is that the reference part is fixed, so as to allow “random access for each headword”.
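The fixed-reference scheme described above can be sketched as follows. This is a minimal illustration rather than the method of the cited documents: the greedy longest-match search, the `min_len` threshold, and the function names `encode_entry` and `decode_entry` are assumptions introduced here. Each headword entry is encoded against the same unchanging reference string, so any entry can be decoded without touching the others.

```python
from typing import List, Tuple, Union

# A token is either a (position, length) copy from the reference part
# or a single literal character that had no usable match.
Token = Union[Tuple[int, int], str]


def encode_entry(entry: str, reference: str, min_len: int = 3) -> List[Token]:
    """Greedily encode one headword entry against a fixed reference part."""
    tokens: List[Token] = []
    i = 0
    while i < len(entry):
        match = None
        # Try the longest candidate first and stop at the first hit.
        for length in range(len(entry) - i, min_len - 1, -1):
            pos = reference.find(entry[i:i + length])
            if pos >= 0:
                match = (pos, length)
                break
        if match is not None:
            tokens.append(match)
            i += match[1]
        else:
            tokens.append(entry[i])  # literal, to be coded some other way
            i += 1
    return tokens


def decode_entry(tokens: List[Token], reference: str) -> str:
    """Rebuild one entry using only its own tokens and the reference part."""
    return "".join(
        reference[t[0]:t[0] + t[1]] if isinstance(t, tuple) else t
        for t in tokens
    )


reference = "a domesticated animal; to move quickly"
entry = "cat: a domesticated animal"
tokens = encode_entry(entry, reference)
print(tokens)
print(decode_entry(tokens, reference) == entry)
```

Because the reference part never changes while entries are encoded, decoding one entry needs only that entry's tokens and the reference part, which is what makes per-headword random access possible.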

If no matching character string exists in the reference part, the character string to be encoded is compressed by encoding it directly with a variable-length code such as a Huffman code. Encoding thus uses two methods: the dictionary-type encoding method, which encodes by referring to the position in the reference part that contains the character string, and direct encoding with a variable-length code such as a Huffman code.

Patent documents: JP-A-6-251070, JP-A-8-314960, JP-A-11-96186
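The direct variable-length fallback mentioned above can be illustrated with a textbook Huffman code over single characters. This is a generic sketch, not the code construction of the cited documents; the function name `huffman_codes` and the choice of single characters as symbols are assumptions.

```python
import heapq
from collections import Counter
from typing import Dict


def huffman_codes(text: str) -> Dict[str, str]:
    """Build a prefix-free bit string for every character in text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap items are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


codes = huffman_codes("aaaaaaaabbbc")
for symbol, code in sorted(codes.items()):
    print(symbol, code)
```

Frequent characters receive short codes and rare ones long codes, so the cost of a literal depends on how common it is rather than on where it sits in any reference part.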

Of the two compression methods described above, direct variable-length encoding does not use the reference part and must assign a unique code to every character string in use, so its compression ratio (also called compression efficiency) is worse than that of the dictionary-type encoding method. One conceivable approach is therefore to raise the proportion of the dictionary data occupied by the reference part, so that more character strings are encoded by the dictionary-type encoding method.

However, when the proportion occupied by the reference part is raised, the addresses that indicate the positions of character strings in the reference part grow with the size of the reference part. As the amount of information (code) needed to represent an address grows, the amount of information in the encoded main data part grows too, and the compression efficiency of the dictionary data as a whole deteriorates.
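The trade-off above can be made concrete with a little arithmetic. The sketch below assumes fixed-width addresses and a fixed 4-bit length field; both are illustrative assumptions, since the text does not specify the address format, and the helper names are hypothetical.

```python
def address_bits(reference_size: int) -> int:
    """Bits needed for a fixed-width address into a reference part of this size."""
    if reference_size < 1:
        raise ValueError("reference_size must be positive")
    return (reference_size - 1).bit_length()


def copy_token_bits(reference_size: int, length_bits: int = 4) -> int:
    """Cost of one (position, length) token under the stated assumptions."""
    return address_bits(reference_size) + length_bits


for size in (64 * 1024, 256 * 1024, 1024 * 1024):
    print(size, address_bits(size), copy_token_bits(size))
```

Under these assumptions, growing the reference part from 64 KiB to 1 MiB raises every address from 16 to 20 bits, so each copy token in the main data part costs 4 bits more even though more strings find a match.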

In addition, the longer the matched character string, the better the compression efficiency. For a short character string, moreover, direct variable-length encoding can compress more efficiently than the dictionary-type encoding method. Conventionally, however, the main data part was merely checked against the reference part sequentially from its beginning, so the character strings that were compressed were not necessarily the ones that compress most efficiently.

Furthermore, the conventional approach takes a predetermined proportion of the dictionary data, counted from the beginning, as the reference part. Dictionary data, however, is normally stored in sorted order, such as alphabetical order or Japanese syllabary (a-i-u-e-o) order. When the reference part is cut from the beginning by a predetermined proportion, the character strings it contains therefore often duplicate one another, and it does not efficiently contain character strings that match the main data part.

The present invention has been made in view of the above problems. Its object is to raise compression efficiency in a dictionary-type encoding method by discretely selecting headword unit data to create the reference part and by changing the encoding method according to the character string. A further object is to raise the compression efficiency of dictionary data by representing the position of a character string in the reference part with a code when the character string in the main data part appears frequently in the reference part.

To solve the above problems, the dictionary data compression apparatus of the invention according to claim 1 comprises:
classifying means (for example, the hard disk 20 in FIG. 2; original English-Japanese dictionary data 200) for dividing dictionary data, in which character strings are written consecutively in headword units, into a reference part and a main data part;
position storage means (for example, the RAM 30 in FIG. 2; sub-table 314) for selecting, based on the appearance frequency of characters, a plurality of positions of characters that appear at a predetermined frequency in the reference part as frequent reference positions, and storing each frequent reference position in association with a code that identifies it; and
main data part encoding means (for example, the CPU 10 in FIG. 2; step A18 in FIG. 5) for encoding the main data part so that it can be decoded in headword units, by a dictionary-type encoding method that uses, when a reference position is a frequent reference position, the code of that reference position stored in the position storage means and the word length from that reference position, and, when the reference position is not a frequent reference position, the reference position itself and the word length from that reference position.

The compressed dictionary data production method of the invention according to claim 6 causes a computer to perform:
a classifying step (for example, the hard disk 20 in FIG. 2; original English-Japanese dictionary data 200) of dividing dictionary data, in which character strings are written consecutively in headword units, into a reference part and a main data part;
a position storage step (for example, the RAM 30 in FIG. 2; sub-table 314) of selecting, based on the appearance frequency of characters, a plurality of positions of characters that appear at a predetermined frequency in the reference part as frequent reference positions, and storing each frequent reference position in association with a code that identifies it; and
a main data part output step (for example, the CPU 10 in FIG. 2; step A18 in FIG. 5) of encoding the main data part so that it can be decoded in headword units, and outputting it, by a dictionary-type encoding method that uses, when a reference position is a frequent reference position, the code of that reference position stored in the position storage step and the word length from that reference position, and, when the reference position is not a frequent reference position, the reference position itself and the word length from that reference position.

The program of the invention according to claim 10 causes a computer to realize:
a classifying function (for example, the hard disk 20 in FIG. 2; original English-Japanese dictionary data 200) of dividing dictionary data, in which character strings are written consecutively in headword units, into a reference part and a main data part;
a position storage function (for example, the RAM 30 in FIG. 2; sub-table 314) of selecting, based on the appearance frequency of characters, a plurality of positions of characters that appear at a predetermined frequency in the reference part as frequent reference positions, and storing each frequent reference position in association with a code that identifies it; and
a main data part encoding function (for example, the CPU 10 in FIG. 2; step A18 in FIG. 5) of encoding the main data part so that it can be decoded in headword units, by a dictionary-type encoding method that uses, when a reference position is a frequent reference position, the code of that reference position stored by the position storage function and the word length from that reference position, and, when the reference position is not a frequent reference position, the reference position itself and the word length from that reference position.

According to the invention of claim 1, 6, or 10, when the main data part is encoded by the dictionary-type encoding method and a referenced character string appears frequently, the position of that frequently appearing character string can be represented by a code. Encoding is therefore possible with less information than when the position of the character string is used directly.
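The idea of giving frequent reference positions their own short codes can be sketched as below. This is a simplified illustration under stated assumptions: it ranks positions by how often copy tokens point at them, whereas the claims select positions based on character appearance frequency in the reference part, and the names `build_sub_table` and `encode_positions` and the `table_size` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

CopyToken = Tuple[int, int]  # (position in reference part, word length)


def build_sub_table(tokens: List[CopyToken], table_size: int = 4) -> Dict[int, int]:
    """Map the most frequently referenced positions to small integer codes."""
    counts = Counter(pos for pos, _ in tokens)
    return {pos: code for code, (pos, _) in enumerate(counts.most_common(table_size))}


def encode_positions(tokens: List[CopyToken], sub_table: Dict[int, int]):
    """Emit a short table code for frequent positions, the full position otherwise."""
    out = []
    for pos, length in tokens:
        if pos in sub_table:
            out.append(("short", sub_table[pos], length))
        else:
            out.append(("full", pos, length))
    return out


tokens = [(100, 5), (100, 4), (7, 3), (100, 6), (250, 3), (7, 8)]
sub_table = build_sub_table(tokens, table_size=2)
print(sub_table)
print(encode_positions(tokens, sub_table))
```

A code into a small table needs only enough bits to index the table, while a full position needs enough bits to address the whole reference part, so each frequent position saves the difference every time it is referenced.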

The dictionary data compression apparatus of the invention according to claim 2 comprises:
classifying means (for example, the CPU 10 in FIG. 2; step A10 in FIG. 5) for discretely extracting a plurality of pieces of headword unit data from the whole of dictionary data, in which character strings are written consecutively in headword units, to form a reference part (for example, the RAM 30 in FIG. 2; processed English-Japanese reference part 302), and classifying the remainder as a main data part (for example, the RAM 30 in FIG. 2; main data part intermediate data 306); and
main data part encoding means (for example, the CPU 10 in FIG. 2; step A18 in FIG. 5) for encoding the main data part so that it can be decoded in headword units, by a dictionary-type encoding method that uses the reference part as the reference source.

The compressed dictionary data production method of the invention according to claim 7 causes a computer to perform:
a classifying step (for example, the CPU 10 in FIG. 2; step A10 in FIG. 5) of discretely extracting a plurality of pieces of headword unit data from the whole of dictionary data, in which character strings are written consecutively in headword units, to form a reference part, and classifying the remainder as a main data part; and
a main data part output step (for example, the CPU 10 in FIG. 2; step A18 in FIG. 5) of encoding the main data part so that it can be decoded in headword units, by a dictionary-type encoding method that uses the reference part as the reference source, and outputting it.

The program according to claim 11 causes a computer to realize:
a classifying function (for example, the CPU 10 in FIG. 2; step A10 in FIG. 5) of discretely extracting a plurality of pieces of headword unit data from the whole of dictionary data, in which character strings are written consecutively in headword units, to form a reference part, and classifying the remainder as a main data part; and
a main data part encoding function (for example, the CPU 10 in FIG. 2; step A18 in FIG. 5) of encoding the main data part so that it can be decoded in headword units, by a dictionary-type encoding method that uses the reference part as the reference source.

According to the invention of claim 2, 7, or 11, headword unit data can be extracted discretely from the whole of the dictionary data to form the reference part. The reference part can therefore be created without bias toward one portion of the dictionary data.
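One simple way to pick entries discretely is to take every n-th headword entry. The sketch below assumes that rule purely for illustration; the text above says only that entries are extracted discretely, and the function name `split_discretely` and the `stride` parameter are hypothetical.

```python
from typing import List, Tuple


def split_discretely(entries: List[str], stride: int = 10) -> Tuple[List[str], List[str]]:
    """Take every stride-th entry as the reference part; the rest is the main data part."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    reference = entries[::stride]
    main = [e for i, e in enumerate(entries) if i % stride != 0]
    return reference, main


entries = [f"entry{i:02d}" for i in range(25)]  # stands in for sorted headword entries
reference, main = split_discretely(entries, stride=10)
print(reference)
print(len(main))
```

With a stride of 10 the reference part is roughly 10% of the entries, as in the conventional split, but it samples the whole alphabetical range instead of only the entries at the beginning.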

The dictionary data compression apparatus of the invention according to claim 3 is a dictionary data compression apparatus that divides dictionary data, in which character strings are written consecutively in headword units, into a reference part and a main data part, and compresses the dictionary data by a dictionary-type encoding method which, when a matching character string that matches a character string in the main data part exists in the reference part, encodes the character string of the main data part with a code representing a copy of the matching character string, the apparatus comprising:
long-string compression means (for example, the CPU 10 in FIG. 2; step A14 in FIG. 5) for compressing, by the dictionary-type encoding method, those character strings in the main data part that have at least a predetermined word length;
storage means (for example, the hard disk 20 in FIG. 2; predicted encoding efficiency table 206) for storing a plurality of character strings, each in association with its predicted encoding efficiency value; and
evaluation-order compression means (for example, the CPU 10 in FIG. 2; step A16 in FIG. 5) for compressing, by the dictionary-type encoding method, the character strings contained in the main data part after compression by the long-string compression means, in descending order of the predicted encoding efficiency values stored in the storage means.

The compressed dictionary data production method of the invention according to claim 8 is a method that causes a computer to divide dictionary data, in which character strings are written consecutively in headword units, into a reference part and a main data part, and to compress the dictionary data by a dictionary-type encoding method which, when a matching character string that matches a character string in the main data part exists in the reference part, encodes the character string of the main data part with a code representing a copy of the matching character string, the method causing the computer to perform:
a long-string compression step (for example, the CPU 10 in FIG. 2; step A14 in FIG. 5) of compressing, by the dictionary-type encoding method, those character strings in the main data part that have at least a predetermined word length;
an evaluation-order compression step (for example, the CPU 10 in FIG. 2; step A16 in FIG. 5) of compressing, by the dictionary-type encoding method, the character strings contained in the main data part after compression by the long-string compression step, in descending order of predicted encoding efficiency value; and
an output step (for example, the CPU 10 in FIG. 2; step A18 in FIG. 5) of outputting the character string data compressed in the evaluation-order compression step as compressed character string data.

The program of the invention according to claim 12 is for a computer that divides dictionary data, in which character strings are written consecutively in headword units, into a reference part and a main data part, and compresses the dictionary data by a dictionary-type encoding method which, when a matching character string that matches a character string in the main data part exists in the reference part, encodes the character string of the main data part with a code representing a copy of the matching character string, the program causing the computer to realize:
a long-string compression function (for example, the CPU 10 in FIG. 2; step A14 in FIG. 5) of compressing, by the dictionary-type encoding method, those character strings in the main data part that have at least a predetermined word length;
a storage function (for example, the hard disk 20 in FIG. 2; predicted encoding efficiency table 206) of storing a plurality of character strings, each in association with its predicted encoding efficiency value; and
an evaluation-order compression function (for example, the CPU 10 in FIG. 2; step A16 in FIG. 5) of compressing, by the dictionary-type encoding method, the character strings contained in the main data part after compression by the long-string compression function, in descending order of the predicted encoding efficiency values stored by the storage function.

According to the invention of claim 3, 8, or 12, character strings of at least the predetermined word length are compressed by the dictionary-type encoding method, and character strings shorter than the predetermined word length are compressed by the dictionary-type encoding method in descending order of predicted encoding efficiency value. Compression can therefore be performed efficiently according to the character string.
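The two-stage order described above (long matches first, then remaining strings ranked by predicted efficiency) can be sketched as follows. This is an illustration under assumptions, not the search programs of the embodiment: the greedy search in `long_pass`, the `predicted` table passed to `ranked_pass`, and the sample values are all hypothetical.

```python
from typing import Dict, List, Tuple, Union

Segment = Union[str, Tuple[int, int]]  # literal text or (position, length) copy


def long_pass(text: str, reference: str, min_long: int) -> List[Segment]:
    """Pass 1: copy only matches of at least min_long characters."""
    out: List[Segment] = []
    literal, i = "", 0
    while i < len(text):
        match = None
        for length in range(len(text) - i, min_long - 1, -1):
            pos = reference.find(text[i:i + length])
            if pos >= 0:
                match = (pos, length)
                break
        if match is not None:
            if literal:
                out.append(literal)
                literal = ""
            out.append(match)
            i += match[1]
        else:
            literal += text[i]
            i += 1
    if literal:
        out.append(literal)
    return out


def ranked_pass(segments: List[Segment], reference: str,
                predicted: Dict[str, float]) -> List[Segment]:
    """Pass 2: replace leftover strings, highest predicted efficiency first."""
    for s in sorted(predicted, key=predicted.get, reverse=True):
        pos = reference.find(s) if s else -1
        if pos < 0:
            continue
        rebuilt: List[Segment] = []
        for seg in segments:
            if isinstance(seg, tuple):
                rebuilt.append(seg)  # already compressed in an earlier step
                continue
            for k, part in enumerate(seg.split(s)):
                if k:
                    rebuilt.append((pos, len(s)))
                if part:
                    rebuilt.append(part)
        segments = rebuilt
    return segments


def decode(segments: List[Segment], reference: str) -> str:
    return "".join(reference[s[0]:s[0] + s[1]] if isinstance(s, tuple) else s
                   for s in segments)


reference = "used in the plural; a small domesticated animal"
entry = "kitten: a small domesticated animal, in the wild"
stage1 = long_pass(entry, reference, min_long=10)
stage2 = ranked_pass(stage1, reference, {"in the": 2.5, " a ": 0.4})
print(stage2)
print(decode(stage2, reference) == entry)
```

Handling long matches first keeps a short, high-scoring string from splitting a long match that would have saved more, and the ranking in the second pass decides which short strings get replaced first.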

The dictionary data compression apparatus of the invention according to claim 4 comprises:
storage means (for example, the hard disk 20 in FIG. 2) for storing dictionary data consisting of character string data;
selecting means (for example, the CPU 10 in FIG. 2; step B10 in FIG. 7) for selecting any one 1-byte character code from among standardized 1-byte character codes;
conversion table creation means (for example, the CPU 10 in FIG. 2; step B14 in FIG. 7) for creating a 2-byte character code conversion table whose first byte is a blank 1-byte character code, among the standardized 1-byte character codes, to which no character is assigned, and setting the selected 1-byte character code in an unset 2-byte character code in the 2-byte character code conversion table;
detecting means (for example, the CPU 10 in FIG. 2; step B16 in FIG. 7) for detecting, in the character string data, a 2-byte character code whose appearance frequency is a predetermined frequency;
corresponding character changing means (for example, the CPU 10 in FIG. 2; step B18 in FIG. 7) for changing the character represented by the selected 1-byte character code to the character represented by the detected 2-byte character code;
first replacement means (for example, the CPU 10 in FIG. 2; step B22 in FIG. 7) for replacing, in the character string data, the selected 1-byte character code with the corresponding 2-byte character code set in the 2-byte character code conversion table; and
second replacement means (for example, the CPU 10 in FIG. 2; step B24 in FIG. 7) for replacing, in the character string data, the detected 2-byte character code with the selected 1-byte character code.

The compressed dictionary data production method of the invention according to claim 9 is a method that causes a computer to produce compressed dictionary data by compressing dictionary data consisting of character string data, the method causing the computer to perform:
a selecting step (for example, the CPU 10 in FIG. 2; step B10 in FIG. 7) of selecting any one 1-byte character code from among standardized 1-byte character codes;
a conversion table creation step (for example, the CPU 10 in FIG. 2; step B14 in FIG. 7) of creating a 2-byte character code conversion table whose first byte is a blank 1-byte character code, among the standardized 1-byte character codes, to which no character is assigned, and setting the selected 1-byte character code in an unset 2-byte character code in the 2-byte character code conversion table;
a detecting step (for example, the CPU 10 in FIG. 2; step B16 in FIG. 7) of detecting, in the character string data, a 2-byte character code whose appearance frequency is a predetermined frequency;
a corresponding character changing step (for example, the CPU 10 in FIG. 2; step B18 in FIG. 7) of changing the character represented by the selected 1-byte character code to the character represented by the detected 2-byte character code;
a first replacement step (for example, the CPU 10 in FIG. 2; step B22 in FIG. 7) of replacing, in the character string data, the selected 1-byte character code with the corresponding 2-byte character code set in the 2-byte character code conversion table;
a second replacement step (for example, the CPU 10 in FIG. 2; step B24 in FIG. 7) of replacing, in the character string data resulting from the first replacement step, the detected 2-byte character code with the selected 1-byte character code; and
an output step (for example, the CPU 10 in FIG. 2; step B24 in FIG. 7) of outputting the character string data resulting from the second replacement step as compressed character string data.

The program of the invention according to claim 13 causes a computer to realize:
a storage function (for example, the hard disk 20 in FIG. 2) of storing dictionary data consisting of character string data;
a selecting function (for example, the CPU 10 in FIG. 2; step B10 in FIG. 7) of selecting any one 1-byte character code from among standardized 1-byte character codes;
a conversion table creation function (for example, the CPU 10 in FIG. 2; step B14 in FIG. 7) of creating a 2-byte character code conversion table whose first byte is a blank 1-byte character code, among the standardized 1-byte character codes, to which no character is assigned, and setting the selected 1-byte character code in an unset 2-byte character code in the 2-byte character code conversion table;
a detecting function (for example, the CPU 10 in FIG. 2; step B16 in FIG. 7) of detecting, in the character string data, a 2-byte character code whose appearance frequency is a predetermined frequency;
a corresponding character changing function (for example, the CPU 10 in FIG. 2; step B18 in FIG. 7) of changing the character represented by the selected 1-byte character code to the character represented by the detected 2-byte character code;
a first replacement function (for example, the CPU 10 in FIG. 2; step B22 in FIG. 7) of replacing, in the character string data, the selected 1-byte character code with the corresponding 2-byte character code set in the 2-byte character code conversion table; and
a second replacement function (for example, the CPU 10 in FIG. 2; step B24 in FIG. 7) of replacing, in the character string data, the detected 2-byte character code with the selected 1-byte character code.

According to the invention described in claim 4, 9, or 13, a two-byte character whose appearance frequency is a predetermined frequency can be replaced with a one-byte character, and a one-byte character can be replaced with a two-byte character. Representing frequently appearing two-byte characters as one-byte characters therefore allows the dictionary data to be expressed with less information, that is, to be compressed.

The electronic dictionary device of the invention according to claim 5 comprises:
storage means (for example, EEPROM 140 in FIG. 14; compressed English-Japanese dictionary data 1400) for storing dictionary data composed of character string data;
character code storage means (for example, EEPROM 140 in FIG. 14; character code conversion table 1408) for storing character codes in association with standardized character codes;
extraction means (for example, CPU 110 in FIG. 14; step G10 in FIG. 16) for extracting the first byte of a character code representing a character included in the character string data;
character determination means (for example, CPU 110 in FIG. 14; step G14 in FIG. 16) for determining, based on the first byte extracted by the extraction means, whether the character code is a one-byte character code or a two-byte character code; and
replacement means (for example, CPU 110 in FIG. 14; step G26 in FIG. 16) for, when the character determination means determines that the character code is a one-byte character code, reading the standardized character code corresponding to that one-byte character code from the character code storage means and replacing the one-byte character code with the standardized character code.

The program of the invention according to claim 14 causes a computer to realize:
a storage function (for example, EEPROM 140 in FIG. 14; compressed English-Japanese dictionary data 1400) for storing dictionary data composed of character string data;
a character code storage function (for example, EEPROM 140 in FIG. 14; character code conversion table 1408) for storing character codes in association with standardized character codes;
an extraction function (for example, CPU 110 in FIG. 14; step G10 in FIG. 16) for extracting the first byte of a character code representing a character included in the character string data;
a character determination function (for example, CPU 110 in FIG. 14; step G14 in FIG. 16) for determining, based on the first byte extracted by the extraction function, whether the character code is a one-byte character code or a two-byte character code; and
a replacement function (for example, CPU 110 in FIG. 14; step G26 in FIG. 16) for, when the character determination function determines that the character code is a one-byte character code, reading the standardized character code corresponding to that one-byte character code from the character code storage function and replacing the one-byte character code with the standardized character code.

According to the invention of claim 5 or 14, even when characters in the character string data have been converted, the first byte of each character code can be extracted and the character can be converted back to its original character according to the extracted byte.
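The restoration described for claims 5 and 14 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: the lead-byte ranges are those of Shift JIS, the escape byte "13" and the table entries follow the examples of FIG. 8, and the names `CODE_TABLE`, `SAVED_TABLE`, `ESCAPE_BYTE`, and `restore` are hypothetical.

```python
# Hypothetical tables, filled with the example values quoted from FIG. 8.
CODE_TABLE = {0x21: bytes([0x89, 0xBD])}  # reassigned 1-byte code -> general-purpose 2-byte code
SAVED_TABLE = {0x00: 0x21}                # second byte after the escape -> saved 1-byte code
ESCAPE_BYTE = 0x13                        # assumed empty 1-byte code used as a new lead byte


def is_two_byte_lead(first: int) -> bool:
    """Assumption: Shift JIS lead-byte ranges mark ordinary 2-byte characters."""
    return 0x81 <= first <= 0x9F or 0xE0 <= first <= 0xFC


def restore(data: bytes) -> bytes:
    """Convert compressed character codes back to general-purpose codes."""
    out = bytearray()
    i = 0
    while i < len(data):
        first = data[i]  # extract the first byte of the character code
        if first == ESCAPE_BYTE:
            # New 2-byte code: restore the saved 1-byte character.
            out.append(SAVED_TABLE[data[i + 1]])
            i += 2
        elif is_two_byte_lead(first):
            # Ordinary 2-byte character: copy unchanged.
            out += data[i:i + 2]
            i += 2
        else:
            # 1-byte code: replace it if the conversion table reassigned it.
            out += CODE_TABLE.get(first, bytes([first]))
            i += 1
    return bytes(out)
```

The only decision made per character is based on its first byte, which is what allows decoding to start at an arbitrary headword.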

[1. Overall configuration]
FIG. 1 is an overview of a computer 1 and an electronic dictionary device 100 to which the present invention is applied. The computer 1 is typically installed at the manufacturer of the electronic dictionary device 100 and serves as a dictionary compression apparatus used to compress dictionary data. The dictionary data compressed by the computer 1 is stored in an EEPROM 107, and the electronic dictionary device 100 is manufactured with this EEPROM 107 mounted. In the electronic dictionary device 100, the compressed dictionary data is expanded and its contents (headwords, explanatory information, and so on) are displayed.

Dictionary data consists of headwords and explanatory information that explains each headword, and comes in several types, such as Japanese dictionaries, English-Japanese dictionaries, Japanese-English dictionaries, English-English dictionaries, and katakana-word dictionaries. For simplicity, however, this embodiment assumes that the only dictionary data compressed by the computer 1 and stored in the electronic dictionary device 100 is that of an English-Japanese dictionary. To distinguish dictionary data before compression (encoding) from dictionary data after compression, the former is hereinafter called "original dictionary data" and the latter "compressed dictionary data".

As shown in FIG. 1, the computer 1 is built from hardware such as a general-purpose server computer and includes a display 3 such as a CRT (Cathode Ray Tube), a keyboard 5, and a memory 7 such as a RAM or hard disk. The electronic dictionary device 100 includes a display 103 such as an LCD (Liquid Crystal Display), a key group 105 including character input keys and dictionary type selection keys, and the EEPROM 107.

The basic function of the electronic dictionary device 100 is as follows. When the user selects a dictionary and enters the characters of a search word (hereinafter, "input characters"), the electronic dictionary device 100 searches the dictionary data for headwords that match the input characters and lists them as headword candidates. It then displays the explanatory information corresponding to the retrieved headword.

[2. Dictionary data compression device]
[2.1 Configuration]
First, the processing performed when the computer 1 compresses dictionary data is described. FIG. 2 is a block diagram of the computer 1. As shown in the figure, the computer 1 includes a CPU (Central Processing Unit) 10, a hard disk 20, a RAM (Random Access Memory) 30, a ROM (Read Only Memory) 40, an input unit 50, and a display unit 60.

[2.1.1 Storage area]
The hard disk 20 stores the operating system and the necessary programs and data files. It also stores the original English-Japanese dictionary data 200, an encoding efficiency prediction value table 206, a dictionary compression program 210, a character code compression program 212, a global search program 214, a local search program 216, and an encoding program 218.

The original English-Japanese dictionary data 200 is dictionary data holding the uncompressed content of an English-Japanese dictionary. FIG. 3(a) outlines the original English-Japanese dictionary data 200. In FIG. 3(a), the parts shown as "○○○○" represent headwords, and the parts shown as "・・・・・" represent the explanatory text of the headword (the characters making up the text that explains it). As FIG. 3(a) shows, the original English-Japanese dictionary data 200 is a continuous run of text data in which the characters printed in a paper dictionary are expressed as character codes.

FIG. 3(b) is a conceptual diagram that, for convenience of explanation, shows the original English-Japanese dictionary data 202 divided into headword units. According to FIG. 3(b), for example, the headword "CD-R" and its explanatory information (hereinafter, one headword together with its explanatory information is called "headword unit data") are written starting at byte "100", counting the head of the original English-Japanese dictionary data 200 as byte "1". The headword unit data of the headword "CD-RW" is written starting at byte "500" of the original English-Japanese dictionary data 200.

The encoding efficiency prediction value table 206 stores, as predicted values, the sizes that character strings would have after being encoded and compressed by the dictionary-type encoding method. FIG. 4(a) shows an example of its data structure. The table associates a character string (for example, "[noun") with the predicted number of bits (for example, "28") that results when that character string is encoded by the dictionary-type encoding method.

The RAM 30 provides a memory area that temporarily holds the various programs executed by the CPU 10 and the data used in executing them. In this embodiment it holds compressed English-Japanese dictionary data 300, main data part intermediate data 306, a 1-byte character conversion table 308, a character code conversion table 310, a character saving table 312, a sub-table 314, a Huffman code table 316, a reserve array storage area 318, and a headword table 320. The RAM 30 corresponds to the memory 7 in FIG. 1.

The compressed English-Japanese dictionary data 300 is dictionary data obtained when the CPU 10 compresses the original English-Japanese dictionary data 200 by executing the dictionary compression process based on the dictionary compression program 210. As described in detail later, the compressed English-Japanese dictionary data 300 is divided into a post-processing English-Japanese reference part 302 and an encoded English-Japanese main data part 304.

The main data part intermediate data 306 is dictionary data created from the original English-Japanese dictionary data 200 when the CPU 10 executes the post-processing English-Japanese reference part creation process. The CPU 10 stores in the RAM 30, as the main data part intermediate data, the dictionary data remaining in the original English-Japanese dictionary data 200 that was not saved as the post-processing English-Japanese reference part 302.

Character codes are explained here. Text characters used on computers and similar devices are generally identified by character codes. For example, a computer does not handle the half-width alphanumeric character "0" as the character itself but as the data "30h", the character code of "0" (h denotes hexadecimal and is omitted below). Character codes include 1-byte character codes and 2-byte character codes, and various standards exist. Any standard may be used; in this embodiment, character codes according to, for example, the Shift JIS standard are called general-purpose character codes. The original English-Japanese dictionary data 200 is written in general-purpose character codes, but the character code compression process changes and rewrites some of them. The 1-byte character conversion table 308, the character code conversion table 310, and the character saving table 312 used in the character code compression process are described below.

The 1-byte character conversion table 308 is a table of character codes for 1-byte characters; it associates 256 characters with their character codes. Because each code in the 1-byte character conversion table 308 is one byte, it is written as two hexadecimal digits, an upper digit from "0" to "F" and a lower digit from "0" to "F". FIG. 8(a) shows an example of the 1-byte character conversion table 308. For example, the character code of the character "#" is "23".

The character code conversion table 310 associates the character codes of 1-byte characters with general-purpose character codes. FIG. 8(b) shows an example of the character code conversion table 310. For example, it stores "89BD" as the general-purpose character code corresponding to the 1-byte character code "21".

The character saving table 312 defines the second-byte codes of new 2-byte characters that do not exist among the general-purpose character codes. The characters defined as new 2-byte characters in the character saving table 312 are characters that originally had general-purpose character codes. For example, the character "!" is assigned the general-purpose character code "21" as a 1-byte character, but in FIG. 8 it is treated as a new 2-byte character and assigned the character code "1300", with "13" as the first byte and "00" as the second byte.

The sub-table 314 stores codes that correspond to addresses in the post-processing English-Japanese reference part 302. FIG. 4(b) shows an example of its data structure. For example, the address "00000000000000000111" is stored in association with the code "01". With the sub-table 314, the position of a referenced character string can be expressed by a code instead of an address when encoding by the dictionary-type encoding method described later.

The Huffman code table 316 stores character strings in association with the Huffman codes obtained when those character strings are Huffman-coded. A unique code is assigned to each character string contained in the original English-Japanese dictionary data 200.

FIG. 4(c) shows an example of the data structure of the Huffman code table 316. For example, the Huffman code table 316 stores the character string "[noun]" in association with the Huffman code "101100…".
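As a rough illustration of how a table like the Huffman code table 316 could be produced, the following sketch builds prefix-free codes from symbol frequencies. It is the textbook Huffman construction, not necessarily the procedure used in the embodiment; the function name `build_huffman_table` and the choice of symbols are assumptions made for the example.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_table(symbols):
    """Return {symbol: bit string} using the standard Huffman construction."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(n, next(tie), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


table = build_huffman_table(["[noun]"] * 5 + ["a"] * 2 + ["b"])
```

Frequent symbols such as "[noun]" receive the shortest codes, which is the property the table relies on.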

The reserve array storage area 318 stores the reserve array, an array used when encoding the main data part intermediate data 306. The subscript of the reserve array indicates the position of a character counted from the head of the main data part intermediate data 306, and the element value stores the word length of the character string that contains that character (hereinafter, the value of the reserve array element corresponding to a character is called the "character-corresponding reserve array value" where appropriate).

FIG. 10(b) shows an example of the reserve array stored in the reserve array storage area 318. Each character-corresponding reserve array value holds either "0" or a positive integer. When the value is "0", the CPU 10 encodes the character string of that element by the Huffman encoding method. When the value is other than "0", the CPU 10 encodes the character string by the dictionary-type encoding method. For example, in FIG. 10(b)(ロ), "10" is assigned to reserve array [54], so the CPU 10 encodes the 10 characters starting from the 54th character by the dictionary-type encoding method.

The headword table 320 stores the start position (start byte), within the compressed English-Japanese dictionary data 300, of each headword contained in that data. After executing the dictionary compression process, the CPU 10 saves the start positions of the headword unit data stored in the compressed English-Japanese dictionary data 300 in the headword table 320.

FIG. 4(d) shows an example of the data structure of the headword table 320. The headword table 320 stores, in ascending order, the start byte positions (for example, "49") of the encoded headword unit data contained in the compressed English-Japanese dictionary data 300.
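A table of ascending start positions is what makes per-headword random access possible. The following minimal sketch shows one way such a table could be used; the 0-based offsets and the function names `entry_slice` and `entry_index_for_offset` are assumptions for illustration and do not come from the embodiment.

```python
from bisect import bisect_right


def entry_slice(compressed: bytes, start_positions, index: int) -> bytes:
    """Return the bytes of the index-th headword unit (0-based offsets assumed)."""
    start = start_positions[index]
    if index + 1 < len(start_positions):
        end = start_positions[index + 1]  # next entry begins where this one ends
    else:
        end = len(compressed)             # last entry runs to the end of the data
    return compressed[start:end]


def entry_index_for_offset(start_positions, offset: int) -> int:
    """Find which entry contains a byte offset by binary search on the sorted table."""
    return bisect_right(start_positions, offset) - 1


data = b"AAAABBBCC"
starts = [0, 4, 7]
second = entry_slice(data, starts, 1)  # b"BBB"
```

Because the positions are sorted, locating an entry takes a binary search rather than a scan of the compressed data.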

The ROM 40 stores an initial program (for example, a BIOS (Basic Input/Output System)) for performing various initial settings, checking the hardware, and loading the necessary programs. The CPU 10 sets up the operating environment of the computer 1 by executing this initial program when the computer 1 is powered on.

[2.1.2 CPU]
The CPU 10 executes processing based on a predetermined program in accordance with input instructions, and issues instructions and transfers data to each functional unit. Specifically, the CPU 10 reads a program stored on the hard disk 20 in response to an operation signal from the input unit 50 and executes processing according to that program. It then outputs display control signals to the display unit 60 as appropriate to display the processing results.

In this embodiment, the CPU 10 also executes the dictionary compression process (see FIG. 5) according to the dictionary compression program 210 on the hard disk 20. Within this dictionary compression process it executes, as subroutines, the character code compression process according to the character code compression program 212, the global search process according to the global search program 214, the local search process according to the local search program 216, and the encoding process according to the encoding program 218.

Specifically, in the post-processing English-Japanese reference part creation process of the dictionary compression process, the CPU 10 discretely selects headword unit data from the whole of the original English-Japanese dictionary data 200 and gathers the selected headword unit data to create the post-processing English-Japanese reference part 302. The CPU 10 gathers the headword unit data that was not selected to create the main data part intermediate data 306. The CPU 10 then executes the character code compression process on the characters contained in the post-processing English-Japanese reference part 302 and the main data part intermediate data 306. Next, the CPU 10 executes the global search process and the local search process on the main data part intermediate data to create the reserve array that is referred to when encoding by the dictionary-type encoding method. Finally, the CPU 10 executes the encoding process with reference to the created reserve array, thereby creating the encoded English-Japanese main data part 304. Each process is outlined below and described in detail later.

First, in the character code compression process, the CPU 10 chooses one empty code from the 1-byte character conversion table 308. The CPU 10 also chooses the 1-byte characters to be saved from the 1-byte character conversion table 308 and assigns them to the character saving table 312. Next, the CPU 10 determines as many frequently occurring 2-byte characters in the original English-Japanese dictionary data 200 as there are saved characters, and changes the 1-byte character conversion table 308 so that the character codes of the saved 1-byte characters are assigned to those 2-byte characters. The CPU 10 then saves, in the character code conversion table 310, the general-purpose character codes of the 2-byte characters determined to be frequent together with their newly assigned character codes. Finally, the CPU 10 replaces the character code of each character contained in the post-processing English-Japanese reference part 302 and the main data part intermediate data 306 with the converted character code.
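The code swap outlined above can be sketched as follows. This is an illustrative approximation only: the input is assumed to be Shift JIS bytes, the escape code and the list of saved 1-byte codes are passed in by the caller rather than chosen automatically, the saved characters are numbered by their position in that list, and the names `tokenize` and `compress_codes` are hypothetical.

```python
from collections import Counter


def tokenize(data: bytes):
    """Split bytes into 1- and 2-byte character codes (Shift JIS lead bytes assumed)."""
    i = 0
    while i < len(data):
        n = 2 if (0x81 <= data[i] <= 0x9F or 0xE0 <= data[i] <= 0xFC) else 1
        yield data[i:i + n]
        i += n


def compress_codes(data: bytes, escape: int, saved: list):
    """Swap the most frequent 2-byte characters with the saved 1-byte codes.

    escape: an unused 1-byte code that becomes the lead byte of new 2-byte codes.
    saved:  1-byte codes whose characters are moved behind the escape byte.
    Returns the rewritten bytes and a {1-byte code: original 2-byte code} table.
    """
    freq = Counter(t for t in tokenize(data) if len(t) == 2)
    frequent = [t for t, _ in freq.most_common(len(saved))]
    # Frequent 2-byte character -> saved 1-byte code.
    to_one = {two: bytes([one]) for two, one in zip(frequent, saved)}
    # Saved 1-byte character -> new 2-byte code (escape byte + index).
    to_two = {bytes([one]): bytes([escape, idx]) for idx, one in enumerate(saved)}
    out = bytearray()
    for t in tokenize(data):
        out += to_one.get(t) or to_two.get(t) or t
    code_table = {one[0]: two for two, one in to_one.items()}
    return bytes(out), code_table


text = "名名名詞!a".encode("shift_jis")
packed, table = compress_codes(text, escape=0x13, saved=[0x21])
```

In this example the three occurrences of the frequent 2-byte character shrink to one byte each, while the rarely used "!" grows to two bytes, so the 10-byte input becomes 8 bytes.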

Next, in the global search process, the CPU 10 searches whether each character string in the main data part intermediate data 306 that is longer than a predetermined word length is also contained in the post-processing English-Japanese reference part 302, and assigns the word length to the character-corresponding reserve array values.

Next, in the local search process, the CPU 10 searches whether each character string in the main data part intermediate data 306 that is no longer than the predetermined word length is contained in the post-processing English-Japanese reference part 302. Among the character strings no longer than the predetermined word length, it assigns to the character-corresponding reserve array values the word length of the character string predicted to give the best encoding efficiency.
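The two searches can be pictured with the simplified sketch below, which fills a reserve array in two passes. Many details are assumptions rather than facts from this section: the left-to-right greedy scan, the `max_len` cap, the fixed cost `bits_per_char` for an uncoded character, and the use of a `predicted_bits` dictionary standing in for the encoding efficiency prediction value table 206. The function name `build_reserve` is hypothetical.

```python
def build_reserve(main, reference, threshold, predicted_bits,
                  bits_per_char=8, max_len=32):
    """Return a reserve array: 0 = no reference, n = part of an n-character match."""
    reserve = [0] * len(main)

    # Global search: strings longer than `threshold`, longest match first.
    i = 0
    while i < len(main):
        best = 0
        for length in range(min(max_len, len(main) - i), threshold, -1):
            if main[i:i + length] in reference:
                best = length
                break
        if best:
            reserve[i:i + best] = [best] * best
            i += best
        else:
            i += 1

    # Local search: strings of `threshold` characters or fewer, chosen by
    # the largest predicted saving among the candidates that match.
    i = 0
    while i < len(main):
        if reserve[i]:
            i += reserve[i]  # skip a run already reserved by the global search
            continue
        best, best_saving = 0, 0
        for length in range(2, threshold + 1):
            s = main[i:i + length]
            if len(s) < length or any(reserve[i:i + length]):
                break  # ran off the end or into a reserved run
            if s in reference and s in predicted_bits:
                saving = length * bits_per_char - predicted_bits[s]
                if saving > best_saving:
                    best, best_saving = length, saving
        if best:
            reserve[i:i + best] = [best] * best
            i += best
        else:
            i += 1
    return reserve


reserve = build_reserve("xx[noun]yyabzz", "....[noun]...ab..", 3, {"ab": 10})
```

Long strings are reserved whenever they match, whereas short strings are reserved only when the predicted size of the reference is smaller than the cost of leaving the characters as they are.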

Next, in the encoding process, the CPU 10 extracts the character-corresponding reserve array values. If the value is "0", the CPU 10 encodes the character string with a Huffman code. If the value is other than "0", the CPU 10 extracts the address of the referenced character string in the post-processing English-Japanese reference part 302. If the extracted address is stored in the sub-table 314, the CPU 10 encodes based on the sub-table 314; if it is not, the CPU 10 encodes based on the address in the post-processing English-Japanese reference part 302. In this way all character strings of the main data part intermediate data are encoded and saved in the encoded English-Japanese main data part 304.
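The branching in the encoding process can be summarized by the sketch below. To avoid inventing a bit layout that this section does not specify, it emits symbolic tokens instead of bits. The per-character Huffman step, the 0-based positions, the use of `str.find` to obtain the address, and the names `plan_encoding`, `"huffman"`, `"ref_code"`, and `"ref_addr"` are all assumptions made for illustration.

```python
def plan_encoding(text, reserve, reference, sub_table):
    """Walk the reserve array and decide how each part of `text` is encoded."""
    tokens = []
    i = 0
    while i < len(text):
        length = reserve[i]
        if length == 0:
            # Reserve value 0: Huffman-code the character.
            tokens.append(("huffman", text[i]))
            i += 1
            continue
        # Non-zero reserve value: refer to the same string in the reference part.
        address = reference.find(text[i:i + length])
        if address < 0:
            raise ValueError("reserve array points at a string missing from the reference")
        if address in sub_table:
            # Address has a short code in the sub-table.
            tokens.append(("ref_code", sub_table[address], length))
        else:
            # Fall back to the full address.
            tokens.append(("ref_addr", address, length))
        i += length
    return tokens


tokens = plan_encoding("abcXYZd", [0, 0, 0, 3, 3, 3, 0], "..XYZ..", {2: "01"})
```

With the sub-table entry present, the reference to "XYZ" is emitted as the short code "01"; without it, the same reference carries the address 2.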

[2.1.3 Input/output unit]
The input unit 50 is an input device with the keys needed for entering characters such as kana and letters and for selecting functions; it outputs the signal of each pressed key to the CPU 10. Key input on the input unit 50 provides the means for entering control commands, such as instructions to execute processing. The input unit 50 corresponds to the keyboard 5 shown in FIG. 1, but it is not limited to a keyboard and may be, for example, a mouse.

The display unit 60 displays various screens based on display signals output from the CPU 10 and consists of a CRT (Cathode Ray Tube) or the like. The display unit 60 corresponds to the display 3 shown in FIG. 1.

[2.2 Operation]
[2.2.1 Dictionary compression process]
First, the dictionary compression process executed by the CPU 10 in this embodiment is described. FIG. 5 is a flowchart explaining the operation of the computer 1 in the dictionary compression process. The dictionary compression process is realized by the CPU 10 executing the dictionary compression program 210 stored on the hard disk 20.

In the dictionary compression process, the CPU 10 executes various processes to compress the original English-Japanese dictionary data 200 and create the compressed English-Japanese dictionary data 300. Each process included in the dictionary compression process is described below.

[2.2.2 Post-processing English-Japanese reference part creation process]
First, the CPU 10 reads the original English-Japanese dictionary data 200 and creates the post-processing English-Japanese reference part 302 and the main data part intermediate data 306. Specifically, it discretely selects headword unit data from the headword unit data contained in the original English-Japanese dictionary data 200 and appends the selected data in order to form the post-processing English-Japanese reference part 302. The headword unit data that was not selected is saved as the main data part intermediate data 306.

図６を用いて具体的に説明する。図６（ａ）は、元英和辞典データ２００を表した図である。図に示すように、元英和辞典データ２００には、見出語単位データが複数保存されている。図では、便宜的に見出語単位データにおける見出語部分のみを表示した。 This will be specifically described with reference to FIG. FIG. 6A shows the original English-Japanese dictionary data 200. As shown in the figure, the original English-Japanese dictionary data 200 stores a plurality of headword unit data. In the figure, only the headword portion in the headword unit data is displayed for convenience.

図６（ｂ）は、処理後英和参照部作成処理を実行したときの図である。ＣＰＵ１０は、元英和辞典データ２００中から、離散的に見出語単位データを選択する。そしてＣＰＵ１０は、選択された見出語単位データを、処理後英和参照部３０２として、ＲＡＭ３０に記憶する。また、選択されていない見出語単位データを、主データ部中間データ３０６として、ＲＡＭ３０に記憶する。 FIG. 6B is a diagram when the post-processing English-Japanese reference section creation process is executed. The CPU 10 discretely selects headword unit data from the original English-Japanese dictionary data 200. Then, the CPU 10 stores the selected headword unit data in the RAM 30 as the processed English-Japanese reference unit 302. Further, the entry unit data not selected is stored in the RAM 30 as the main data portion intermediate data 306.

このように、処理後英和参照部作成処理によれば、元英和辞典データ２００の中から見出語単位のデータを離散的に抽出して処理後英和参照部３０２を作成することができる。従って、元英和辞典データ２００中の一部分に偏ることなく、処理後英和参照部３０２を作成することが可能となる。 As described above, according to the post-process English-Japanese reference section creation process, the post-process English-Japanese reference section 302 can be created by discretely extracting data in units of headwords from the original English-Japanese dictionary data 200. Therefore, the post-processing English-Japanese reference unit 302 can be created without being biased to a part of the original English-Japanese dictionary data 200.
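
The split described in this section can be sketched in Python as follows. This is a minimal illustration only: the text says entries are selected "discretely" but does not state the selection rule, so the fixed `stride` and the name `split_entries` are assumptions, not part of the described embodiment.

```python
def split_entries(entries, stride=4):
    """Split headword entries into a reference part and a main data part.

    Assumption: "discrete" selection is modelled as taking every
    `stride`-th entry, so the reference part is spread evenly over the
    whole dictionary instead of coming from one region.
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    # Selected entries are appended in their original order.
    reference = [e for k, e in enumerate(entries) if k % stride == 0]
    # Every entry that was not selected becomes main data.
    main = [e for k, e in enumerate(entries) if k % stride != 0]
    return reference, main
```

With nine entries and `stride=4`, the first, fifth, and ninth entries form the reference part and the remaining six form the main data, so no region of the dictionary is over-represented.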

［２．２．３文字コード圧縮処理］
［２．２．３（ａ）処理の流れ］
図７は、文字コード圧縮処理に係るコンピュータ１の動作を説明するためのフローチャートである。この文字コード圧縮処理は、ＣＰＵ１０が、ハードディスク２０に記憶された文字コード圧縮プログラム２１２を実行することによって実現される。 [2.2.3 Character code compression processing]
[2.2.3 (a) Process flow]
FIG. 7 is a flowchart for explaining the operation of the computer 1 related to the character code compression processing. This character code compression processing is realized by the CPU 10 executing the character code compression program 212 stored in the hard disk 20.

まず、ＣＰＵ１０は、１バイト文字変換表３０８中において使用されていない文字コード（以下、適宜「空きコード」という。）を１つ決定する（ステップＢ１０）。次に、ＣＰＵ１０は、１バイト文字変換表３０８の中から、文字待避表３１２に待避する１バイトコードの文字（以下、適宜「待避文字」という。）を決定する（ステップＢ１２）。ここで、待避文字としては、元英和辞典データ２００中に使用されている文字の中で、使用頻度が少ないものから順に所定の数だけ決定される。 First, the CPU 10 determines one character code that is not used in the 1-byte character conversion table 308 (hereinafter referred to as “empty code” as appropriate) (step B10). Next, the CPU 10 determines, from the 1-byte character conversion table 308, a 1-byte code character to be saved in the character save table 312 (hereinafter, referred to as “a save character” as appropriate) (step B12). Here, as the escape character, a predetermined number is determined in order from the least frequently used characters in the original English-Japanese dictionary data 200.

次に、ＣＰＵ１０は、決定した待避文字を文字待避表３１２に割り当てる（ステップＢ１４）。そして、ＣＰＵ１０は、待避文字の数分の頻出文字を決定し（ステップＢ１６）、待避文字が割り当てられていた１バイト文字変換表３０８の１バイトコードに、その頻出文字を割り当てる（ステップＢ１８）。そして、ＣＰＵ１０は、割り当てた１バイト文字コードと頻出文字の汎用文字コードとを対応づけて保存して、文字コード変換表３１０を作成する（ステップＢ２０）。そして、ＣＰＵ１０は、処理後英和参照部３０２及び主データ部中間データ３０６に含まれる待避文字の文字コードを２バイト文字コードに置換し（ステップＢ２２）、頻出文字の文字コードを１バイト文字コードに置換する（ステップＢ２４）。 Next, the CPU 10 assigns the determined save characters to the character save table 312 (step B14). The CPU 10 then determines as many frequent characters as there are save characters (step B16), and assigns the frequent characters to the 1-byte codes of the 1-byte character conversion table 308 to which the save characters had been assigned (step B18). The CPU 10 then stores each assigned 1-byte character code in association with the general-purpose character code of the corresponding frequent character, thereby creating the character code conversion table 310 (step B20). Finally, the CPU 10 replaces the character codes of the save characters included in the post-processing English-Japanese reference unit 302 and the main data portion intermediate data 306 with 2-byte character codes (step B22), and replaces the character codes of the frequent characters with 1-byte character codes (step B24).

［２．２．３（ｂ）具体例］
文字コード圧縮処理について、図８を用いて具体的に説明する。図８（ａ）は、１バイト文字変換表３０８の一部について示した一例である。この１バイト文字変換表３０８の中から、空きコード「１３」を１つ決定する（ステップＢ１０）。次に、ＣＰＵ１０は、１バイト文字変換表３０８から、待避文字を１つ決定する。ここでは、待避文字として「！」が決定されたとする。そして、ＣＰＵ１０は、文字コード「２１」に割り当てられていた文字「！」を待避文字として文字待避表３１２の文字コード「００」の位置に割り当てる。この場合の文字「！」の文字コードは「１３００」となる。 [2.2.3 (b) Specific example]
The character code compression process will be specifically described with reference to FIG. 8. FIG. 8(a) shows an example of part of the 1-byte character conversion table 308. One empty code, “13”, is determined from the 1-byte character conversion table 308 (step B10). Next, the CPU 10 determines one save character from the 1-byte character conversion table 308. Here, it is assumed that “!” is determined as the save character. The CPU 10 then assigns the character “!”, which had been assigned to the character code “21”, to the position of the character code “00” in the character save table 312 as a save character. In this case, the character code of the character “!” becomes “1300”.

次に、ＣＰＵ１０は、頻出文字を待避文字と同数の１文字決定する（ステップＢ１６）。ここでは、頻出文字として「何」が決定されたとする。すると、ＣＰＵ１０は、文字「何」をステップＢ１４で決定した待避文字「！」が割り当てられていた場所（１バイト文字コード）に割り当てる（ステップＢ１８）。そして、ＣＰＵ１０は、文字「何」の汎用文字コード「８９ＢＤ」を文字コード変換表３１０に、コード「２１」に対応づけて保存する（ステップＢ２０）。この場合の、文字「何」の文字コードは「２１」となる。 Next, the CPU 10 determines one frequent character, the same number as the save characters (step B16). Here, it is assumed that “何” is determined as the frequent character. The CPU 10 then assigns the character “何” to the place (1-byte character code) to which the save character “!” determined in step B14 had been assigned (step B18). The CPU 10 then stores the general-purpose character code “89BD” of the character “何” in the character code conversion table 310 in association with the code “21” (step B20). In this case, the character code of the character “何” becomes “21”.

図８（ｄ）は、文字コードを置換した図である。ここで、上段「（元）」は汎用文字コードであり、下段「（後）」は文字コード圧縮処理を実行して置換した後の文字コードである。文字「何」の文字コードが汎用文字コード「８９ＢＤ」から「２１」に、文字「！」の文字コードが汎用文字コード「２１」から「１３００」に置換されている。 FIG. 8(d) shows the character codes after replacement. Here, the upper row “(original)” shows the general-purpose character codes, and the lower row “(after)” shows the character codes after replacement by the character code compression process. The character code of the character “何” has been replaced from the general-purpose character code “89BD” to “21”, and the character code of the character “!” has been replaced from the general-purpose character code “21” to “1300”.

このように、文字コード圧縮処理によれば、出現頻度が高い２バイト文字を１バイト文字として置換し、出現頻度の低い１バイト文字を２バイト文字として置換して、処理後英和参照部３０２及び主データ部中間データ３０６を作成することが出来る。従って、出現頻度が高い２バイト文字を１バイト文字として表すことにより、辞書データを圧縮（データ量を削減）することが可能となる。 As described above, according to the character code compression process, 2-byte characters with a high appearance frequency are replaced with 1-byte characters and 1-byte characters with a low appearance frequency are replaced with 2-byte characters when creating the post-processing English-Japanese reference unit 302 and the main data portion intermediate data 306. Representing frequently appearing 2-byte characters as 1-byte characters therefore makes it possible to compress the dictionary data (reduce the amount of data).
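
The code swap of steps B12 to B20 can be illustrated with the following Python sketch. It is a simplified model under stated assumptions: the 1-byte table is a plain `dict` from code to character, frequencies are counted over a single string, and the 2-byte code of a saved character is formed as the empty code followed by its index in the save table (as in the “1300” example). The function name `compress_char_codes` and its parameters are hypothetical.

```python
from collections import Counter


def compress_char_codes(one_byte_table, text, empty_code, n):
    """Swap the n rarest 1-byte characters with the n most frequent others.

    one_byte_table: dict mapping a 1-byte code (int) to its character.
    text:           sample of the dictionary data used to count frequencies.
    empty_code:     an unused 1-byte code that prefixes saved characters.
    Returns (new 1-byte table, save table keyed by 2-byte code).
    """
    freq = Counter(text)
    # Rarest characters that currently occupy 1-byte codes (save characters).
    saved = sorted(one_byte_table.items(), key=lambda kv: freq[kv[1]])[:n]
    # Most frequent characters that do not yet have a 1-byte code.
    in_table = set(one_byte_table.values())
    frequent = sorted((c for c in freq if c not in in_table),
                      key=lambda c: -freq[c])[:n]
    new_table = dict(one_byte_table)
    save_table = {}
    for index, ((code, ch), hot) in enumerate(zip(saved, frequent)):
        save_table[(empty_code << 8) | index] = ch  # e.g. 0x13, 0x00 -> 0x1300
        new_table[code] = hot                       # frequent char takes the old slot
    return new_table, save_table
```

Called with a table in which “!” holds code 0x21, an empty code of 0x13, and text in which “何” is frequent and “!” is rare, the sketch moves “!” to 0x1300 and gives “何” the code 0x21, matching the example in FIG. 8.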

［２．２．４グローバルサーチ処理］
［２．２．４（ａ）処理の流れ］
図９は、グローバルサーチ処理に係るコンピュータ１の動作を説明するためのフローチャートである。このグローバルサーチ処理は、ＣＰＵ１０が、ハードディスク２０に記載されたグローバルサーチプログラム２１４を実行することによって実現される。 [2.2.4 Global search processing]
[2.2.4 (a) Process flow]
FIG. 9 is a flowchart for explaining the operation of the computer 1 related to the global search process. This global search process is realized by the CPU 10 executing the global search program 214 stored in the hard disk 20.

まず、変数ｉに検索文字長を代入する（ステップＣ１０）。ここで、検索文字長とは、グローバルサーチにおいて、処理後英和参照部３０２に含まれている文字列と、主データ部中間データ３０６に含まれている文字列とを一致させる文字列の長さのうち、最大の文字列の長さをいう。本実施形態においては、「１０」文字である。 First, the search character length is assigned to the variable i (step C10). Here, the search character length is the maximum length of the character strings that are matched, in the global search, between the post-processing English-Japanese reference unit 302 and the main data portion intermediate data 306. In this embodiment, it is “10” characters.

次に、ＣＰＵ１０は、変数ｊの値に初期値として「０」を代入する（ステップＣ１２）。そして、ＣＰＵ１０は、ｊを添え字として、リザーブ配列のｊ番目となるリザーブ配列［ｊ］の値が０であるか否かを判定する（ステップＣ１４）。もし、リザーブ配列［ｊ］の値が「０」以外のときは（ステップＣ１４；Ｎｏ）、ＣＰＵ１０は、変数ｊの値に「１」を加算する（ステップＣ２４）。 Next, the CPU 10 substitutes “0” as an initial value for the value of the variable j (step C12). Then, the CPU 10 determines whether or not the value of the reserve array [j], which is the j-th reserved array, is 0 with j as a subscript (step C14). If the value of the reserve array [j] is other than “0” (step C14; No), the CPU 10 adds “1” to the value of the variable j (step C24).

また、リザーブ配列［ｊ］の値が「０」のとき（ステップＣ１４；Ｙｅｓ）、現在の位置（ｊ文字目）から始まる語長ｉの文字列を抽出する（ステップＣ１６）。そして、抽出した文字列と同じ文字列が処理後英和参照部３０２にあるか否かを、ＣＰＵ１０は、判定する（ステップＣ１８）。もし、抽出した文字列と同じ文字列が処理後英和参照部３０２に含まれていないと判定した場合には（ステップＣ１８；Ｎｏ）、ＣＰＵ１０は、変数ｊの値に「１」を加えて処理を継続する（ステップＣ２４）。 When the value of the reserve array [j] is “0” (step C14; Yes), a character string having a word length i starting from the current position (jth character) is extracted (step C16). Then, the CPU 10 determines whether or not the same character string as the extracted character string exists in the post-processing English-Japanese reference unit 302 (step C18). If it is determined that the same character string as the extracted character string is not included in the processed English-Japanese reference unit 302 (step C18; No), the CPU 10 adds “1” to the value of the variable j and performs processing. (Step C24).

ここで、ＣＰＵ１０は、文字列が処理後英和参照部３０２に含まれていると判定した場合には（ステップＣ１８；Ｙｅｓ）、リザーブ配列［ｊ］の値からリザーブ配列［ｊ＋ｉ−１］の値を、変数ｉの値として代入する。そして、ＣＰＵ１０は、変数ｊの値に、変数ｉの値を加える（ステップＣ２２）。 Here, if the CPU 10 determines that the character string is included in the post-processing English-Japanese reference unit 302 (step C18; Yes), it assigns the value of the variable i to each element from reserve array [j] through reserve array [j+i-1]. The CPU 10 then adds the value of the variable i to the value of the variable j (step C22).

そして、ＣＰＵ１０は、主データ部中間データ３０６に含まれる文字列について、総て調べたか否かを判定する。もし、まだ判定し終わっていない場合には（ステップＣ２６；Ｎｏ）、ＣＰＵ１０は、ステップＣ１４からステップＣ２６までの処理について同様に繰り返す。 Then, the CPU 10 determines whether or not all character strings included in the main data portion intermediate data 306 have been examined. If the determination has not been completed yet (step C26; No), the CPU 10 repeats the processing from step C14 to step C26 in the same manner.

また、判定し終わっている場合には（ステップＣ２６；Ｙｅｓ）、ＣＰＵ１０は、変数ｉが最低文字長より大きいか否かを判定する（ステップＣ２８）。ここで、最低文字長とは、グローバルサーチの対象となる語長の閾値となる値となり、本実施形態では「５」と設定されている。従って、変数ｉの値が「５」より大きい場合については（ステップＣ２８；Ｙｅｓ）、変数ｉの値から１を減算（ステップＣ３０）し、ステップＣ１２から処理を実行する。 If the determination has been completed (step C26; Yes), the CPU 10 determines whether or not the variable i is greater than the minimum character length (step C28). Here, the minimum character length is a value serving as a threshold for the word length to be subjected to the global search, and is set to “5” in the present embodiment. Therefore, when the value of the variable i is larger than “5” (step C28; Yes), 1 is subtracted from the value of the variable i (step C30), and the processing is executed from step C12.

［２．２．４（ｂ）具体例］
グローバルサーチ処理について、図１０を用いて具体的に説明する。図１０（ａ）は、処理後英和参照部３０２の一部について示した一例である。「［名詞］〔コンピュータ…」が、１０文字目から始まっていることを表している。また、図１０（ｂ）は、グローバルサーチの処理手順を示すために、主データ部中間データ３０６の一部と、対応するリザーブ配列の一部を表している。また、「［名詞…」は、主データ部中間データ３０６において５０文字目から始まっていることを示している。 [2.2.4 (b) Specific example]
The global search process will be specifically described with reference to FIG. 10. FIG. 10(a) shows an example of part of the post-processing English-Japanese reference unit 302. It indicates that “[noun] [computer...” starts at the 10th character. FIG. 10(b) shows part of the main data portion intermediate data 306 and the corresponding part of the reserve array in order to illustrate the global search procedure. It indicates that “[noun...” starts at the 50th character in the main data portion intermediate data 306.

例えば、現在の変数ｉの値が「１０」、変数ｊの値が「５０」としてグローバルサーチ処理を実行する。まず、図１０（ｂ）（イ）は、グローバルサーチ処理が実行されていないときの、リザーブ配列の状態である。リザーブ配列の値には、総て「０」が代入されている。 For example, suppose the global search process is executed with the current value of the variable i being “10” and the value of the variable j being “50”. First, FIG. 10(b)(イ) shows the state of the reserve array before the global search process has been executed. All the values of the reserve array are “0”.

次に、ＣＰＵ１０は、変数ｊの値が「５０」のとき、リザーブ配列［ｊ］は０であることから（図９のステップＣ１４；Ｙｅｓ）、「５０」文字目から始まる語長「１０」の文字列「［名詞］〔コンピュー」を抽出する（ステップＣ１６）。そして、抽出された文字列が、処理後英和参照部３０２に含まれているかを判定する（図９のステップＣ１８）。ここで、文字列「［名詞］〔コンピュー」は、処理後英和参照部３０２には含まれていないため（ステップＣ１８；Ｎｏ）、ＣＰＵ１０は、変数ｊの値に「１」加算し、変数ｊの値を「５１」とする（ステップＣ２４）。同様にして、ＣＰＵ１０は、ｊの値が５１〜５３のときについても、処理後英和参照部３０２に含まれていないと判定する。 Next, when the value of the variable j is “50”, the reserve array [j] is 0 (step C14 in FIG. 9; Yes), so the CPU 10 extracts the character string “[noun] [computer” of word length “10” starting from the 50th character (step C16). It then determines whether the extracted character string is included in the post-processing English-Japanese reference unit 302 (step C18 in FIG. 9). Here, since the character string “[noun] [computer” is not included in the post-processing English-Japanese reference unit 302 (step C18; No), the CPU 10 adds “1” to the value of the variable j, making it “51” (step C24). Similarly, when the value of j is 51 to 53, the CPU 10 determines that the extracted character string is not included in the post-processing English-Japanese reference unit 302.

次に、変数ｊの値が「５４」のとき、ＣＰＵ１０が文字列「〔コンピュータ用語〕」を抽出する（ステップＣ１６）。そして、ＣＰＵ１０は、抽出した文字列が、処理後英和参照部３０２に含まれていると判定する（ステップＣ１８；Ｙｅｓ）。そして、ＣＰＵ１０は、リザーブ配列［５４］から、リザーブ配列［６３］まで、変数ｉの値である「１０」を代入する（図１０（ｂ）（ロ））。そして、ＣＰＵ１０は、変数ｊの値「５４」に、変数ｉの値「１０」を加えることにより（ステップＣ２２）、リザーブ配列［６４］から処理を実行する。 Next, when the value of the variable j is “54”, the CPU 10 extracts the character string “[computer term]” (step C16). The CPU 10 then determines that the extracted character string is included in the post-processing English-Japanese reference unit 302 (step C18; Yes). The CPU 10 assigns “10”, the value of the variable i, to reserve array [54] through reserve array [63] (FIG. 10(b)(ロ)). By adding the value “10” of the variable i to the value “54” of the variable j (step C22), the CPU 10 then continues processing from reserve array [64].

また、同様にグローバルサーチ処理を実行し、主データ部中間データ３０６に含まれる総ての文字列が終了すると、今度は、変数ｉの値を「９」として処理を実行する。このときリザーブ配列の各要素値は、変化しない（図１０（ｂ）（ハ））。また、変数ｉの値が「８」のとき、文字列「シーディーアール」が処理後英和参照部３０２に含まれているため、ＣＰＵ１０は、リザーブ配列［６４］からリザーブ配列［７１］までに、「８」を代入する（図１０（ｂ）（ニ））。 The global search process continues in the same manner, and when all the character strings included in the main data portion intermediate data 306 have been examined, the process is executed again with the value of the variable i set to “9”. At this time, the element values of the reserve array do not change (FIG. 10(b)(ハ)). When the value of the variable i is “8”, the character string “シーディーアール” (“CDR”) is included in the post-processing English-Japanese reference unit 302, so the CPU 10 assigns “8” to reserve array [64] through reserve array [71] (FIG. 10(b)(ニ)).

このように、グローバルサーチ処理によれば、語長の長い文字列から優先して辞書型符号化方法により符号化し、圧縮することが出来る。従って、語長の短い文字列に基づいて符号化するのに比べて、より効率的に符号化することが可能となる。 As described above, according to the global search process, it is possible to preferentially encode a character string having a long word length by using the dictionary-type encoding method and compress it. Therefore, encoding can be performed more efficiently than encoding based on a character string having a short word length.
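
The longest-first pass over the reserve array can be sketched in Python as below. This is an illustrative sketch, not the embodiment itself: strings stand in for the main data portion intermediate data and the reference unit, substring lookup uses Python's `in` operator, and the check that the whole window is still unreserved is an added assumption (the flowchart description only tests reserve array [j]). The name `global_search` is hypothetical.

```python
def global_search(main, reference, max_len=10, min_len=5):
    """Mark substrings of `main` that also occur in `reference`.

    Returns a reserve array: reserve[k] is 0 if character k is not
    covered, otherwise the length of the matched string covering it.
    Longer lengths are tried first, so long matches take priority.
    """
    reserve = [0] * len(main)
    for i in range(max_len, min_len - 1, -1):    # i = 10, 9, ..., 5
        j = 0
        while j + i <= len(main):
            window_free = all(v == 0 for v in reserve[j:j + i])
            if window_free and main[j:j + i] in reference:
                reserve[j:j + i] = [i] * i       # reserve the whole match
                j += i                           # continue after the match
            else:
                j += 1
    return reserve
```

Positions left at 0 after this pass are the ones handed to the local search and, failing that, to Huffman coding.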

［２．２．５ローカルサーチ処理］
［２．２．５（ａ）処理の流れ］
図１１は、ローカルサーチ処理に係るコンピュータ１の動作を説明するためのフローチャートである。このローカルサーチ処理は、ＣＰＵ１０が、ハードディスク２０に記憶されたローカルサーチプログラム２１６を実行することによって実現される。 [2.2.5 Local search processing]
[2.2.5 (a) Process flow]
FIG. 11 is a flowchart for explaining the operation of the computer 1 related to the local search process. This local search process is realized by the CPU 10 executing the local search program 216 stored in the hard disk 20.

まず、ＣＰＵ１０は変数ｊの値に「０」を代入する（ステップＤ１０）。また、ＣＰＵ１０は、変数ｍｉｎに取ることが出来る最大値を代入する（ステップＤ１２）。 First, the CPU 10 substitutes “0” for the value of the variable j (step D10). Further, the CPU 10 substitutes the maximum value that can be taken for the variable min (step D12).

次に、ＣＰＵ１０は、リザーブ配列［ｊ］の値が「０」か否か判定する（ステップＤ１４）。ここで、リザーブ配列［ｊ］の値が「０」以外の場合は（ステップＤ１４；Ｎｏ）、ＣＰＵ１０は、変数ｊの値に「１」加算して、ステップＤ４０に処理を移行する（ステップＤ３８）。 Next, the CPU 10 determines whether or not the value of the reserve array [j] is “0” (step D14). Here, when the value of the reserve array [j] is other than “0” (step D14; No), the CPU 10 adds “1” to the value of the variable j and shifts the processing to step D40 (step D38). ).

また、リザーブ配列［ｊ］の値が「０」の場合は（ステップＤ１４；Ｙｅｓ）、ＣＰＵ１０は、ｊ文字目から始まる文字列と、処理後英和参照部３０２中に含まれる文字列との中で最も語長が長い文字列ｍを抽出する（ステップＤ１６）。具体的には、ＣＰＵ１０は、ｊ文字目から始まる文字列であって、当該文字列の各要素（リザーブ配列）の値が「０」となる最長の文字列を抽出する。そして、抽出した文字列が、処理後英和参照部３０２中に含まれているか否か検索し、含まれていない場合は最後の１文字を削除して、同様に処理後英和参照部３０２を検索する。この処理を繰り返すことにより、文字列ｍを抽出する。 When the value of the reserve array [j] is “0” (step D14; Yes), the CPU 10 extracts the character string m, which is the longest character string that starts from the j-th character and is also included in the post-processing English-Japanese reference unit 302 (step D16). Specifically, the CPU 10 first extracts the longest character string that starts from the j-th character and for which the reserve array value of every character is “0”. It then searches whether the extracted character string is included in the post-processing English-Japanese reference unit 302; if it is not, the CPU 10 deletes the last character and searches the post-processing English-Japanese reference unit 302 again. The character string m is obtained by repeating this process.

次に、ＣＰＵ１０は、抽出した文字列に対応する符号化効率予測値ｅｓｔ（ｍ）を符号化効率予測値テーブル２０６から読み出す（ステップＤ１８）。そして、読み出された符号化効率予測値ｅｓｔ（ｍ）の値と、変数ｍｉｎの値とを比較する（ステップＤ２０）。 Next, the CPU 10 reads the predicted encoding efficiency value est (m) corresponding to the extracted character string from the predicted encoding efficiency value table 206 (step D18). Then, the read encoding efficiency predicted value est (m) is compared with the value of the variable min (step D20).

ここで、符号化効率予測値ｅｓｔ（ｍ）の値の方が、変数ｍｉｎの値より小さいとＣＰＵ１０が判定した場合には（ステップＤ２０；Ｙｅｓ）、文字列ｍの長さｌｅｎ（ｍ）と文字列ｍの位置ｌｏｃ（ｍ）とを算出する（ステップＤ２２）。そして、変数ｍｉｎにｅｓｔ（ｍ）の値を代入し、変数ｌｅｎにｌｅｎ（ｍ）の値を代入し、変数ｌｏｃにｌｏｃ（ｍ）の値を代入する（ステップＤ２４）。さらに、文字列ｍを待避文字列としてＲＡＭ３０に記憶する（ステップＤ２６）。 Here, when the CPU 10 determines that the predicted encoding efficiency value est(m) is smaller than the value of the variable min (step D20; Yes), it calculates the length len(m) and the position loc(m) of the character string m (step D22). It then assigns the value of est(m) to the variable min, the value of len(m) to the variable len, and the value of loc(m) to the variable loc (step D24). Further, the character string m is stored in the RAM 30 as the saved character string (step D26).

次に、ＣＰＵ１０は、ｌｅｎ（ｍ）の値が１より大きいと判定した場合は（ステップＤ２８；Ｙｅｓ）、文字列ｍの最終文字を削除して文字列ｍを更新し（ステップＤ３０）、ステップＤ１８からステップＤ２６までを同様に繰り返し処理する。 Next, when the CPU 10 determines that the value of len(m) is greater than 1 (step D28; Yes), it deletes the final character of the character string m to update the character string m (step D30), and repeats steps D18 through D26 in the same manner.

さらに、ＣＰＵ１０は、変数ｍｉｎがハフマン符号長より小さいか否かを判定する（ステップＤ３２）。ここで、ハフマン符号長は、待避文字列に対応するハフマン符号をハフマン符号テーブル３１６より読み出し、読み出されたハフマン符号の符号長を算出することによって求める。そして、ＣＰＵ１０は、変数ｍｉｎの値とハフマン符号長とを比較し、ハフマン符号長が変数ｍｉｎの値以下の場合には（ステップＤ３２；Ｎｏ）、変数ｊの値に１加算する（ステップＤ３８）。また、変数ｍｉｎの値が、ハフマン符号長より小さい場合は（ステップＤ３２；Ｙｅｓ）、ＣＰＵ１０は、リザーブ配列［ｊ］〜リザーブ配列［ｊ＋ｌｅｎ−１］までに変数ｌｅｎの値を代入する（ステップＤ３４）。そして、変数ｊに変数ｌｅｎの値を加算する（ステップＤ３６）。 Further, the CPU 10 determines whether or not the variable min is smaller than the Huffman code length (step D32). Here, the Huffman code length is obtained by reading the Huffman code corresponding to the saved character string from the Huffman code table 316 and calculating the code length of the read Huffman code. The CPU 10 compares the value of the variable min with the Huffman code length, and when the Huffman code length is equal to or less than the value of the variable min (step D32; No), it adds 1 to the value of the variable j (step D38). When the value of the variable min is smaller than the Huffman code length (step D32; Yes), the CPU 10 assigns the value of the variable len to reserve array [j] through reserve array [j+len-1] (step D34), and then adds the value of the variable len to the variable j (step D36).

そして、ＣＰＵ１０は、主データ部中間データに含まれている総ての文字列に対して処理を終了したか否かを判定する（ステップＤ４０）。まだ、総ての文字列に対して処理を終了していないとＣＰＵ１０が判定した場合には、再びステップＤ１２より処理を行う。 Then, the CPU 10 determines whether or not processing has been completed for all character strings included in the main data portion intermediate data (step D40). If the CPU 10 determines that the processing has not been completed for all the character strings, the processing is performed again from step D12.

［２．２．５（ｂ）具体例］
ローカルサーチ処理について、図１０を用いて具体的に説明する。図１０（ｂ）の（ホ）及び（ヘ）の下２段は、ローカルサーチ処理を実行した場合のリザーブ配列を表した図である。まず、変数ｊの値が「４９」のときに、ローカルサーチ処理を実行したときのリザーブ配列を表したのが図１０（ｂ）（ホ）であり、変数ｊの値が「５０」のときに、ローカルサーチ処理を実行したときのリザーブ配列を表したのが図１０（ｂ）（ヘ）である。 [2.2.5 (b) Specific example]
The local search process will be specifically described with reference to FIG. 10. The lower two rows of FIG. 10(b), (ホ) and (ヘ), show the reserve array when the local search process is executed. FIG. 10(b)(ホ) shows the reserve array when the local search process is executed with the value of the variable j being “49”, and FIG. 10(b)(ヘ) shows the reserve array when it is executed with the value of the variable j being “50”.

まず、変数ｊの値が「４９」のときは、５０文字目から５３文字目までに対応するリザーブ配列の値は「０」となっている。次に、ＣＰＵ１０は、変数ｊの値が「５０」のときにリザーブ配列［５０］の値が０であることから（ステップＤ１４；Ｙｅｓ）、５０文字目から始まる文字列で参照部と一致する最も長い文字列「［名詞］」を抽出する（ステップＤ１６）。そして、文字列「［名詞］」の符号化効率予測値ｅｓｔ（［名詞］）を、符号化効率予測値テーブル２０６より読み出す。すると、符号化効率予測値ｅｓｔ（［名詞］）は「２０」となる。次に、現在最小値ｍｉｎとしては「∞」が代入されており、符号化効率予測値ｅｓｔ（［名詞］）の方が小さいことから、ＣＰＵ１０は、それぞれの値を代入する。具体的には、符号化効率予測値「２０」を変数ｍｉｎに代入し、文字列「［名詞］」の長さ「４」を変数ｌｅｎに代入し、文字列「［名詞］」の位置「５０」を変数ｌｏｃに代入する。また、文字列「［名詞］」を待避文字列として、ＲＡＭ３０に記憶する。 First, when the value of the variable j is “49”, the value of the reserve array corresponding to the 50th to 53rd characters is “0”. Next, since the value of the reserve array [50] is 0 when the value of the variable j is “50” (step D14; Yes), the CPU 10 matches the reference portion with the character string starting from the 50th character. The longest character string “[noun]” is extracted (step D16). Then, the encoding efficiency prediction value est ([noun]) of the character string “[noun]” is read from the encoding efficiency prediction value table 206. Then, the encoding efficiency prediction value est ([noun]) is “20”. Next, “∞” is assigned as the current minimum value min, and the encoding efficiency predicted value est ([noun]) is smaller, so the CPU 10 substitutes each value. Specifically, the encoding efficiency prediction value “20” is substituted into the variable min, the length “4” of the character string “[noun]” is substituted into the variable len, and the position “ 50 ”is substituted into the variable loc. Further, the character string “[noun]” is stored in the RAM 30 as a saved character string.

次に、ＣＰＵ１０は、文字列「［名詞］」の最終文字を削除して、文字列「［名詞」としてステップＤ１８から実行する。この場合、文字列「［名詞」の符号化効率予測値は「２８」となり、先ほどの「２０」より大きい。同様に、更に文字列「［名詞」の最終文字を削除した「［名」について同様に符号化効率予測値を算出し、変数ｍｉｎの値と比較する。 Next, the CPU 10 deletes the final character of the character string “[noun]” and executes the processing from step D18 on the resulting character string “[noun”. In this case, the predicted encoding efficiency value of the character string “[noun” is “28”, which is larger than the earlier “20”. Similarly, the predicted encoding efficiency value is calculated for “［名”, obtained by further deleting the final character of the character string “[noun”, and is compared with the value of the variable min.

そして、文字列ｍの長さｌｅｎ（ｍ）が１になると（ステップＤ２８；Ｎｏ）、変数ｍｉｎと待避文字列のハフマン符号長とを比較する。このとき、文字列「［名詞］」の場合が最も符号化効率予測値が小さいとすると、変数ｍｉｎの値は「２０」となる。また、ＣＰＵ１０は、文字列「［名詞］」のハフマン符号長を、ハフマン符号テーブル３１６より求める。この際、ハフマン符号長は「３２」ビットであるとすると、変数ｍｉｎの方が小さいため、リザーブ配列［５０］からリザーブ配列［５３］の値に「４」を代入する。 Then, when the length len(m) of the character string m becomes 1 (step D28; No), the variable min is compared with the Huffman code length of the saved character string. At this time, assuming that the character string “[noun]” has the smallest predicted encoding efficiency value, the value of the variable min is “20”. The CPU 10 also obtains the Huffman code length of the character string “[noun]” from the Huffman code table 316. Assuming that this Huffman code length is “32” bits, the variable min is smaller, so “4” is assigned to reserve array [50] through reserve array [53].

このように、ローカルサーチ処理によれば、文字列を符号化する際に、より符号数が少なくなる文字列で符号化することが出来るようになる。また、短い文字列については、ハフマン符号で符号化する場合を比較することにより、より適切な符号化方法に基づいて、主データ部中間データ３０６を圧縮することが出来るようになる。 Thus, according to the local search process, when a character string is encoded, it is possible to encode with a character string having a smaller number of codes. Further, by comparing the case of encoding a short character string with the Huffman code, the main data part intermediate data 306 can be compressed based on a more appropriate encoding method.
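
A compact Python sketch of the local search loop follows. It is a simplification under stated assumptions: `est` and `huffman_bits` are hypothetical callables standing in for lookups in the predicted encoding efficiency value table 206 and the Huffman code table 316, strings stand in for the dictionary data, and `float("inf")` plays the role of the maximum value initially assigned to min.

```python
def local_search(main, reference, reserve, est, huffman_bits):
    """Reserve short matches only when they are predicted to beat Huffman.

    reserve is updated in place and returned. est(s) gives the predicted
    cost in bits of dictionary-coding s; huffman_bits(s) gives the cost
    in bits of Huffman-coding s. Both are assumed helpers.
    """
    j = 0
    while j < len(main):
        if reserve[j] != 0:                 # already taken by an earlier match
            j += 1
            continue
        end = j                             # longest unreserved run from j
        while end < len(main) and reserve[end] == 0:
            end += 1
        m = main[j:end]
        while m and m not in reference:     # trim until found in the reference
            m = m[:-1]
        best, best_cost = None, float("inf")
        while m:                            # try m and each shorter prefix
            cost = est(m)
            if cost < best_cost:
                best, best_cost = m, cost
            m = m[:-1]
        if best is not None and best_cost < huffman_bits(best):
            reserve[j:j + len(best)] = [len(best)] * len(best)
            j += len(best)
        else:
            j += 1
    return reserve
```

With a predicted cost of 20 bits for a 4-character match and a Huffman cost of 32 bits for the same string, the four positions are reserved with the value 4, as in the worked example.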

［２．２．６符号化処理］
［２．２．６（ａ）処理の流れ］
図１２は、符号化処理に係るコンピュータ１の動作を説明するためのフローチャートである。この符号化処理は、ＣＰＵ１０が、ハードディスク２０に記憶された符号化プログラム２１８を実行することによって実現される。 [2.2.6 Encoding process]
[2.2.6 (a) Process flow]
FIG. 12 is a flowchart for explaining the operation of the computer 1 related to the encoding process. This encoding process is realized by the CPU 10 executing the encoding program 218 stored in the hard disk 20.

まず、ＣＰＵ１０は、主データ部中間データ３０６から文字を選択する（ステップＥ１０）。次に、ＣＰＵ１０は、選択した文字に対応する文字対応リザーブ配列値を抽出する（ステップＥ１２）。そして、抽出した値が「０」の場合には（ステップＥ１４；Ｙｅｓ）、ＣＰＵ１０は、選択した文字を始めとする文字対応リザーブ配列値が「０」の文字列をハフマン符号で符号化する（ステップＥ２４）。 First, the CPU 10 selects a character from the main data portion intermediate data 306 (step E10). Next, the CPU 10 extracts a character correspondence reserved array value corresponding to the selected character (step E12). If the extracted value is “0” (step E14; Yes), the CPU 10 encodes the character string having the character correspondence reserved array value “0” including the selected character with the Huffman code ( Step E24).

また、文字対応リザーブ配列値が「０」以外の場合には（ステップＥ１４；Ｎｏ）、参照する文字列の参照部のアドレスを抽出する（ステップＥ１６）。具体的には、文字対応リザーブ配列値より抽出した値の分の文字列を、主データ部中間データ３０６より抽出する。そして、ＣＰＵ１０は、抽出した文字列と同じ文字列が含まれているアドレス（位置）を処理後英和参照部３０２より抽出する。 If the character correspondence reserved array value is other than “0” (step E14; No), the address of the reference portion of the character string to be referenced is extracted (step E16). Specifically, the character string corresponding to the value extracted from the character correspondence reserved array value is extracted from the main data part intermediate data 306. Then, the CPU 10 extracts an address (position) including the same character string as the extracted character string from the processed English-Japanese reference unit 302.

次に、ＣＰＵ１０は、抽出されたアドレスが、サブテーブル３１４に含まれているか否かを判定する。ＣＰＵ１０は、サブテーブル３１４に含まれていると判定した場合には、サブテーブルに基づいて符号化を実行する（ステップＥ２０）。また、ＣＰＵ１０は、サブテーブル３１４に含まれていないと判定した場合には、処理後英和参照部３０２のアドレスに基づいて文字列を符号化する（ステップＥ２２）。 Next, the CPU 10 determines whether or not the extracted address is included in the sub-table 314. If the CPU 10 determines that the address is included in the sub-table 314, it performs encoding based on the sub-table (step E20). If the CPU 10 determines that the address is not included in the sub-table 314, it encodes the character string based on the address in the post-processing English-Japanese reference unit 302 (step E22).

そして、ＣＰＵ１０は、主データ部中間データ３０６に含まれている総ての文字列について、符号化が終了しているか否かを判定し、符号化が終了していない文字列があると判定した場合には（ステップＥ２６；Ｎｏ）、符号化が終了していない文字列を選択し（ステップＥ２８）、ステップＥ１２より処理を実行する。また、符号化が終了していると判定した場合には（ステップＥ２６；Ｙｅｓ）、符号化処理を終了する。 Then, the CPU 10 determines whether or not encoding has been completed for all character strings included in the main data part intermediate data 306, and determines that there is a character string that has not been encoded. In such a case (step E26; No), a character string that has not been encoded is selected (step E28), and the process is executed from step E12. If it is determined that the encoding is complete (step E26; Yes), the encoding process is terminated.

［２．２．６（ｂ）具体例］
符号化処理について、図１０及び図１３を用いて具体的に説明する。まず、ＣＰＵ１０が、文字列「［名詞］」について符号化処理を実行する場合について説明する。 [2.2.6 (b) Specific example]
The encoding process will be specifically described with reference to FIGS. 10 and 13. First, the case where the CPU 10 executes the encoding process for the character string “[noun]” will be described.

まず、文字「［名詞］」を選択して対応するリザーブ配列の値をリザーブ配列から抽出すると「４」となっている。そこで、ＣＰＵ１０は、処理後英和参照部３０２における文字列「［名詞］」の位置を抽出する（ステップＥ１６）。今、処理後英和参照部３０２における文字列「［名詞］」のアドレスは「00000000000000010000」とする。次に、このアドレスがサブテーブル３１４に保存されているか否かを判定する。すると、アドレス「00000000000000010000」は、サブテーブル３１４の符号「１０」に対応づけて保存されている。そして、ＣＰＵ１０は、サブテーブル及び語長に基づいて文字列「［名詞］」を符号化する。 First, when the character “[noun]” is selected and the value of the corresponding reserved array is extracted from the reserved array, “4” is obtained. Therefore, the CPU 10 extracts the position of the character string “[noun]” in the processed English-Japanese reference unit 302 (step E16). Now, the address of the character string “[noun]” in the post-processing English-Japanese reference unit 302 is “00000000000000010000”. Next, it is determined whether or not this address is stored in the subtable 314. Then, the address “00000000000000010000” is stored in association with the code “10” of the sub table 314. Then, the CPU 10 encodes the character string “[noun]” based on the sub-table and the word length.

図１３（ａ）は、「［名詞］」の符号化を説明した図である。ＣＰＵ１０は、辞書型符号化方法によって符号化されていることを示す「１」と、処理後英和参照部３０２において、文字列が含まれているアドレス値がサブテーブル３１４に保存されていることを示す「１」と、処理後英和参照部３０２のアドレスに対応するサブテーブル３１４の符号「１０」と、文字列「［名詞］」の語長「４」を表す「００１００」とからなる符号列「１１１０００１００」に符号化する。従って、本来「［名詞］」は、６４ビットで表現されていたが、本実施形態によれば、９ビットで表現することが出来る。 FIG. 13A illustrates the encoding of “[noun]”. The CPU 10 indicates that “1” indicating that the data is encoded by the dictionary-type encoding method and that the address value including the character string is stored in the sub-table 314 in the post-processing English-Japanese reference unit 302. A code string consisting of “1” shown, the code “10” of the sub-table 314 corresponding to the address of the post-processing English-Japanese reference unit 302, and “00100” representing the word length “4” of the character string “[noun]” It encodes to “111000100”. Therefore, “[noun]” is originally expressed in 64 bits, but according to the present embodiment, it can be expressed in 9 bits.

また、文字列「シーディーアール」について符号化処理を実行することにより、符号化する様子を示したのが図１３（ｂ）である。文字「シ」に対応する文字対応リザーブ配列値は「８」である。また、処理後英和参照部３０２において、文字列「シーディーアール」が含まれている処理後英和参照部３０２におけるアドレスは、サブテーブル３１４に保存されていないとする。ＣＰＵ１０は符号化処理を実行すると、文字列「シーディーアール」は、辞書型符号化方法により符号化されたことを示す「１」と、文字列「シーディーアール」が含まれている処理後英和参照部３０２におけるアドレスがサブテーブルに保存されていないことを示す「０」と、処理後英和参照部３０２において、文字列「シーディーアール」の先頭の文字「シ」が、処理後英和参照部３０２に含まれているアドレス「00000000000000011110」と、語長「８」を表す「０１０００」とを併せた符号列「100000000000000001111001000」に符号化する。従って、本来「シーディーアール」は１２８ビットで表現されていたが、本実施形態によれば、２７ビットで表現することが出来る。 FIG. 13(b) shows how the character string “シーディーアール” (“CDR”) is encoded by the encoding process. The character correspondence reserve array value corresponding to the character “シ” is “8”. It is also assumed that the address at which the character string is found in the post-processing English-Japanese reference unit 302 is not stored in the sub-table 314. When the CPU 10 executes the encoding process, the character string is encoded into the code string “100000000000000001111001000”, which combines “1”, indicating that the string is encoded by the dictionary-type encoding method; “0”, indicating that the address in the post-processing English-Japanese reference unit 302 is not stored in the sub-table; the address “00000000000000011110” at which the first character “シ” of the character string is located in the post-processing English-Japanese reference unit 302; and “01000”, representing the word length “8”. Therefore, whereas “シーディーアール” was originally expressed in 128 bits, according to the present embodiment it can be expressed in 27 bits.

Thus, when the main data part intermediate data 306 is encoded by the dictionary-type encoding method, the position of a frequently occurring character string can be represented by a short code, provided that the position is stored in the sub-table 314. Encoding therefore requires less information than when the position of the character string is encoded directly.
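The two code strings of FIG. 13 can be reproduced with a short sketch. The field widths (20-bit address, 5-bit word length, 2-bit sub-table code) are read off the examples above; the function name `encode_reference` and the one-entry sub-table are illustrative assumptions, not part of the disclosed programs.

```python
# Field widths inferred from the FIG. 13 examples (assumptions of this sketch).
ADDRESS_BITS = 20  # width of a direct address, as in FIG. 13(b)
LENGTH_BITS = 5    # width of the word-length field in both examples


def encode_reference(address, length, subtable):
    """Return the bit string for a copy of `length` characters at `address`.

    `subtable` maps a frequently referenced address to its short code.
    """
    length_field = format(length, f"0{LENGTH_BITS}b")
    code = subtable.get(address)
    if code is not None:
        # "1": dictionary-type code, "1": address is given as a sub-table code
        return "11" + code + length_field
    # "1": dictionary-type code, "0": address follows directly
    return "10" + format(address, f"0{ADDRESS_BITS}b") + length_field


# Hypothetical sub-table with one entry: address 16 is assigned code "10".
subtable = {0b00000000000000010000: "10"}

noun = encode_reference(16, 4, subtable)      # "[名詞]", 4 characters
cdr = encode_reference(0b11110, 8, subtable)  # "シーディーアール", 8 characters

print(noun, len(noun))  # 9 bits instead of 4 * 16 = 64
print(cdr, len(cdr))    # 27 bits instead of 8 * 16 = 128
```

With the sub-table hit the reference costs 2 + 2 + 5 = 9 bits, while the direct form costs 2 + 20 + 5 = 27 bits, which is where the saving described above comes from.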

[3. Electronic dictionary device]
[3.1 Configuration]
FIG. 14 is a block diagram showing the configuration of the electronic dictionary device 100. As shown in the figure, the electronic dictionary device 100 includes a CPU (Central Processing Unit) 110, a ROM (Read Only Memory) 120, a RAM (Random Access Memory) 130, an EEPROM (Electrically Erasable Programmable Read Only Memory) 140, an input unit 150, and a display unit 160.

[3.1.1 Storage area]
The ROM 120 stores an initial program for performing various initial settings, hardware checks, loading of necessary programs, and the like. The CPU 110 sets up the operating environment of the electronic dictionary device 100 by executing this initial program when the device is powered on.

The ROM 120 also stores various programs related to the operation of the electronic dictionary device 100, such as menu display processing, various setting processing, and various search processing, as well as programs for realizing the various functions of the device. In addition, it holds a dictionary expansion program 1200 and a character code expansion program 1202.

The RAM 130 provides a memory area that temporarily holds the various programs executed by the CPU 110 and the data used in executing them.

The EEPROM 140 is a memory that retains the various dictionary data referred to by the CPU 110, various settings, and the like even after the electronic dictionary device 100 is powered off. In this embodiment, it holds post-compression English-Japanese dictionary data 1400, a 1-byte character conversion table 1406, a character code conversion table 1408, a character save table 1410, a sub-table 1412, a Huffman code table 1414, and a headword table 1416. The post-compression English-Japanese dictionary data 1400 is the same dictionary data as the post-compression English-Japanese dictionary data 300. The 1-byte character conversion table 1406 is the same table as the 1-byte character conversion table 308. The character code conversion table 1408 is the same table as the character code conversion table 310. The character save table 1410 is the same table as the character save table 312. The sub-table 1412 is the same table as the sub-table 314. The Huffman code table 1414 is the same table as the Huffman code table 316. The headword table 1416 is the same table as the headword table 320.

[3.1.2 CPU]
The CPU 110 executes processing based on a predetermined program in response to input instructions, and issues instructions and transfers data to each functional unit. Specifically, the CPU 110 reads a program stored in the ROM 120 in response to an operation signal input from the input unit 150 and executes processing according to that program. The CPU 110 then outputs display control signals to the display unit 160 as appropriate to display the processing result.

In this embodiment, the CPU 110 also executes the dictionary expansion process (see FIG. 15) according to the dictionary expansion program 1200 stored in the ROM 120. During the dictionary expansion process, it reads the character code expansion program 1202 and executes the character code expansion process as a subroutine.

Specifically, in the dictionary expansion process, the CPU 110 searches for the headword corresponding to the input characters and extracts a code string. If the first bit of the extracted code string is 1, the CPU 110 determines that the code string was encoded by the dictionary-type encoding method. If the next bit of such a code string is also 1, the CPU 110 determines that the position of the referenced character string is stored in the sub-table, reads that position from the sub-table, and decodes the code string. If the next bit is 0, the CPU 110 extracts the position of the referenced character string from the code string itself and decodes the code string. The CPU 110 then executes the character code expansion process on the decoded character string and displays the character string on the display unit 160.

In the character code expansion process, the CPU 110 extracts the first byte of a character code. Next, the CPU 110 looks up the character type of that first byte in the 1-byte character conversion table 1406. If the code is a 2-byte character, it is displayed as a 2-byte character. If it is a 1-byte character, the CPU 110 determines whether it is a 1-byte character produced by the character code compression process (hereinafter referred to as a "1-byte converted character" as appropriate). If it is a 1-byte converted character, the CPU 110 reads the corresponding character code from the character code conversion table 1408 and displays the corresponding character.

[3.1.3 Input/output unit]
The input unit 150 is an input device provided with the keys needed for entering characters such as kana and alphabetic letters and for selecting functions, and it outputs the signal of each pressed key to the CPU 110. Key input on the input unit 150 implements the input means for entering characters, selecting a dictionary mode, instructing search execution, starting the jump function, and so on. The input unit 150 corresponds to the key group 105 in FIG. 1, but it is not limited to the key group 105 and may be a touch panel or the like.

The display unit 160 displays various screens based on display signals input from the CPU 110 and is composed of an LCD or the like. The display unit 160 corresponds to the display 103 shown in FIG. 1.

[3.2 Operation]
[3.2.1 Dictionary expansion process]
[3.2.1 (a) Process flow]
FIG. 15 is a flowchart explaining the operation of the electronic dictionary device 100 in the dictionary expansion process. The dictionary expansion process is realized by the CPU 110 executing the dictionary expansion program 1200 stored in the ROM 120.

First, when characters are input (step F10), the CPU 110 searches for the headword corresponding to the input characters (step F12). Specifically, the CPU 110 selects one of the start positions stored in the headword table 1416 and expands the post-compression English-Japanese dictionary data 1400 from that position. Because the headword table 1416 stores the start positions of the headword unit data in the order in which they are stored in the post-compression English-Japanese dictionary data 1400, the search can be carried out by repeating three operations (selecting a start position, expanding the headword, and determining whether the headword matches), for example with a well-known search method using a binary tree.
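The select, expand, and compare loop of step F12 can be sketched as follows. This is a minimal illustration that substitutes a plain binary search over the table for the "search method using a binary tree" mentioned above, and it assumes that headwords are stored in sorted order. The names `find_headword` and `expand_headword` and the toy data are hypothetical.

```python
def find_headword(query, start_positions, expand_headword):
    """Binary search over the headword table.

    `start_positions` lists the start of each headword unit in storage order.
    `expand_headword(pos)` is a hypothetical callback that decodes only the
    headword stored at `pos`. Headwords are assumed to be in sorted order.
    """
    lo, hi = 0, len(start_positions) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        word = expand_headword(start_positions[mid])  # expand the candidate
        if word == query:                             # matching headword?
            return start_positions[mid]
        if word < query:
            lo = mid + 1
        else:
            hi = mid - 1
    return None  # no headword matches the input


# Toy stand-in for the compressed data: start position -> expanded headword.
toy = {0: "apple", 40: "book", 95: "cat", 130: "dog"}
positions = sorted(toy)

print(find_headword("cat", positions, toy.__getitem__))
```

Only the candidate headwords visited by the search are expanded, which is why per-headword random access to the compressed data matters for lookup speed.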

Next, the CPU 110 extracts the code string of the headword unit data for the headword corresponding to the input characters (step F14). From the extracted code string, the CPU 110 first extracts the code string to be decoded (hereinafter referred to as the "decoding target code string" as appropriate). The CPU 110 then determines whether the first bit of the decoding target code string is "1" (step F16). If the first bit is "0" (step F16; No), the CPU 110 treats the following code string as a Huffman code, reads the character string corresponding to that Huffman code from the Huffman code table 1414, and decodes the decoding target code string (step F18).

If the first bit of the decoding target code string is "1" (step F16; Yes), the CPU 110 determines whether the second bit is "1" (step F20). If the second bit is "1", the CPU 110 extracts the sub-table number and the word length from the following code string (step F22). The CPU 110 then reads the address value corresponding to the sub-table number from the sub-table 1412 and decodes the decoding target code string based on the read address value and the word length.

If, on the other hand, the second bit is "0", the CPU 110 extracts the address value of the referenced character string and the word length from the following code string (step F24). The CPU 110 then decodes the decoding target code string based on the extracted address value and word length.

The CPU 110 then determines whether the entire code string of the read headword unit data has been decoded. If it has not (step F26; No), the CPU 110 determines the next decoding target code string (step F28) and repeats the same processing from step F16.

If the entire code string of the headword unit data has been decoded (step F26; Yes), the CPU 110 executes the character code expansion process on the character codes of the decoded characters (step F30).

[3.2.1 (b) Specific example]
A specific example is described with reference to FIG. 13. Assume that the code string of FIG. 13(a) is contained in the encoded English-Japanese main data part 1404 and that "111000100" has been extracted from it as the decoding target code string.

First, the CPU 110 extracts the first bit of the decoding target code string. Because the first bit is "1" (step F16; Yes), the CPU 110 next extracts the second bit. Because the second bit is also "1" (step F20; Yes), the CPU 110 reads the address from the sub-table 1412 and the word length from the following code string. Specifically, it reads the address "00000000000000010000", which corresponds to the next two bits "10", from the sub-table 1412. The CPU 110 then extracts the following code string "00100" as the word length. Taking the address as the start position, the CPU 110 reads "4" characters (the word length) from the post-processing English-Japanese reference part 1402 and decodes them into the corresponding character string "[名詞]".

Turning to FIG. 13(b), assume that "100000000000000001111001000" has been extracted from the encoded English-Japanese main data part 1404 as the decoding target code string.

First, the CPU 110 extracts the first bit of the decoding target code string. Because the first bit is "1" (step F16; Yes), the CPU 110 next extracts the second bit. Because the second bit is "0" (step F20; No), the CPU 110 extracts the next 20 bits, "00000000000000011110", as the address. The CPU 110 then extracts the following code string "01000" as the word length. Taking the address as the start position, the CPU 110 reads "8" characters (the word length) from the post-processing English-Japanese reference part 1402 and decodes them into the corresponding character string "シーディーアール".
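Steps F16 to F28 can be summarized in a short decoding sketch. It assumes the field widths seen in FIG. 13, treats addresses as character offsets into the reference part, and uses a made-up prefix-free Huffman table; `decode_units` and all table contents are hypothetical and serve only to replay the two worked examples.

```python
ADDRESS_BITS, LENGTH_BITS, SUBTABLE_CODE_BITS = 20, 5, 2  # from FIG. 13


def decode_units(bits, reference, subtable, huffman):
    """Decode the bit string of one headword unit (steps F16 to F28)."""
    out, i = [], 0
    while i < len(bits):
        if bits[i] == "0":
            # Step F18: Huffman code; grow the slice until it matches a code.
            i += 1
            j = i + 1
            while bits[i:j] not in huffman:
                if j >= len(bits):
                    raise ValueError("truncated Huffman code")
                j += 1
            out.append(huffman[bits[i:j]])
            i = j
            continue
        if bits[i + 1] == "1":
            # Step F22: the address is looked up in the sub-table.
            i += 2
            address = subtable[bits[i:i + SUBTABLE_CODE_BITS]]
            i += SUBTABLE_CODE_BITS
        else:
            # Step F24: the address is embedded in the code string.
            i += 2
            address = int(bits[i:i + ADDRESS_BITS], 2)
            i += ADDRESS_BITS
        length = int(bits[i:i + LENGTH_BITS], 2)
        i += LENGTH_BITS
        out.append(reference[address:address + length])
    return "".join(out)


# Toy reference part: "[名詞]" at offset 16, "シーディーアール" at offset 30.
reference = "." * 16 + "[名詞]" + "." * 10 + "シーディーアール"
subtable = {"10": 16}             # sub-table code -> address
huffman = {"0": "a", "10": "b"}   # made-up prefix-free table

noun_bits = "111000100"
cdr_bits = "10" + format(30, "020b") + "01000"
print(decode_units(noun_bits, reference, subtable, huffman))
print(decode_units(cdr_bits, reference, subtable, huffman))
```

Because every unit starts with its own flag bits, the decoder never needs state from earlier headwords, which is consistent with decoding in units of headwords.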

[3.2.2 Character code expansion process]
[3.2.2 (a) Process flow]
FIG. 16 is a flowchart explaining the operation of the electronic dictionary device 100 in the character code expansion process. The character code expansion process is realized by the CPU 110 executing the character code expansion program 1202 stored in the ROM 120.

First, the CPU 110 selects the first character of the character string decoded by the dictionary expansion process (hereinafter referred to as the "decoded character string" as appropriate) and extracts the first byte of that character's code (step G10). Next, the CPU 110 uses the 1-byte character conversion table 1406 to determine whether the extracted code is a 1-byte character or a 2-byte character (step G12). For example, the character type may be obtained from a flag in the 1-byte character conversion table 1406 that marks each code as a 1-byte or a 2-byte character, or the code may be identified as a 2-byte character when it falls within a predetermined range of the 1-byte character conversion table 1406.

Next, the CPU 110 determines whether the extracted character type is a 1-byte character or a 2-byte character (step G14). If it is a 2-byte character (step G14; 2-byte character), the CPU 110 extracts the next byte (step G28) and displays the result on the display unit 160 as a 2-byte character (step G30).

If, on the other hand, the extracted character type is a 1-byte character, the CPU 110 determines whether it is a 1-byte converted character (step G16). If the extracted character is not a 1-byte converted character, the CPU 110 displays it on the display unit 160 as an ordinary 1-byte character (step G31).

If the extracted 1-byte character is a 1-byte converted character (step G16; Yes), the CPU 110 extracts the first byte of the corresponding character code from the character code conversion table 1408 (step G18). The CPU 110 then uses the 1-byte character conversion table to extract whether that byte denotes a 1-byte character or a 2-byte character (step G20) and makes the determination (step G22). If the character type is a 1-byte character (step G22; 1-byte character), the CPU 110 displays it on the display unit 160 as a 1-byte character (step G31). If the character type is a 2-byte character (step G22; 2-byte character), the CPU 110 extracts the next byte of the corresponding character code (step G24) and displays the result on the display unit 160 as a 2-byte character (step G26).

If not all characters have been expanded (step G32; No), the CPU 110 extracts the next character and repeats the same processing from step G10.

[3.2.2 (b) Specific example]
A specific example is described with reference to FIG. 8. From the character codes in FIG. 8(d), the CPU 110 extracts "21" (step G10). When the character type is determined using the 1-byte character conversion table 1406, "21" is found to be a 1-byte converted character. The CPU 110 therefore extracts the first byte of the corresponding character code from the character code conversion table 1408 (step G18). Because this first byte is "89", the CPU 110 determines that the character is a 2-byte character (step G22; 2-byte character). The CPU 110 then extracts the following byte "BD" and, treating the 2-byte character code "89BD" as a general-purpose character code, displays the corresponding character "何" (step G26).

Next, consider the last character code "1300" of the character string in FIG. 8(d). First, the CPU 110 extracts the first byte "13" of the character code (step G10). When the CPU 110 determines the type of the character code "13", it is found to be a 2-byte character associated with the character save table 1410 (step G14). The CPU 110 therefore reads the next byte "00". It then extracts the character corresponding to "00" from the character save table 1410 and obtains the character "!" (step G30).

Thus, even when characters in the post-compression English-Japanese dictionary data 1400 have been converted, the character code expansion process can restore each original character by extracting the first byte of its code and converting it according to the extracted value.
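The branching of FIG. 16 can be illustrated with a small sketch that replays the FIG. 8(d) example. Several points are assumptions rather than facts from the text: the general-purpose character code is taken to be Shift_JIS (under which "89BD" does decode to "何"), the lead byte "13" is treated as the only marker for the character save table, and the table contents and the name `expand` are invented for illustration.

```python
# Hypothetical table contents modeled on the FIG. 8 example.
CONVERSION = {0x21: b"\x89\xbd"}  # 1-byte converted code -> general-purpose code
SAVE_LEAD = 0x13                  # lead byte assumed to mark a save-table entry
SAVE_TABLE = {0x00: "!"}          # second byte -> displaced 1-byte character


def is_two_byte_lead(b):
    # Shift_JIS lead-byte ranges; Shift_JIS itself is an assumption here.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def expand(data):
    """Restore the original characters from converted character codes."""
    out, i = [], 0
    while i < len(data):
        b = data[i]                      # step G10: first byte of the code
        if b == SAVE_LEAD:               # 2-byte code tied to the save table
            out.append(SAVE_TABLE[data[i + 1]])
            i += 2
        elif is_two_byte_lead(b):        # ordinary 2-byte character
            out.append(data[i:i + 2].decode("shift_jis"))
            i += 2
        elif b in CONVERSION:            # step G16: 1-byte converted character
            out.append(CONVERSION[b].decode("shift_jis"))
            i += 1
        else:                            # ordinary 1-byte character
            out.append(chr(b))
            i += 1
    return "".join(out)


print(expand(b"\x21abc\x13\x00"))
```

In this toy stream the frequent character "何" costs one byte, while the displaced "!" costs two, which reflects the trade made by the character code compression process.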

[4. Modifications]
In the embodiment described above, the electronic dictionary or electronic encyclopedia was described as a dedicated electronic dictionary device. The invention is not limited to such products, however, and may be built into a mobile phone, a PDA (Personal Digital Assistant), a personal computer, or the like as an electronic dictionary device (including being incorporated as software).

FIG. 1: Overview of the computer and the electronic dictionary device.
FIG. 2: Block diagram of the computer.
FIG. 3: An example of the data structure of the original English-Japanese dictionary data.
FIG. 4: Examples of the data structures of (a) the encoding efficiency prediction value table, (b) the sub-table, (c) the Huffman code table, and (d) the headword table.
FIG. 5: Operation flow of the dictionary compression process.
FIG. 6: Operation of the post-processing English-Japanese reference part creation process.
FIG. 7: Operation flow of the character code compression process.
FIG. 8: (a) An example of the data structure of the 1-byte character conversion table, (b) an example of the data structure of the character code conversion table, (c) an example of the data structure of the character save table, and (d) operation of the character code compression process.
FIG. 9: Operation flow of the global search process.
FIG. 10: Operation of the global search process and the local search process.
FIG. 11: Operation flow of the local search process.
FIG. 12: Operation flow of the encoding process.
FIG. 13: Operation of the encoding process.
FIG. 14: Block diagram of the electronic dictionary device.
FIG. 15: Operation flow of the dictionary expansion process.
FIG. 16: Operation flow of the character code expansion process.

Explanation of symbols

1 Computer
10 CPU
20 Hard disk
200 Original English-Japanese dictionary data
206 Encoding efficiency prediction value table
210 Dictionary compression program
212 Character code compression program
214 Global search program
216 Local search program
218 Encoding program
30 RAM
300 Post-compression English-Japanese dictionary data
302 Post-processing English-Japanese reference part
304 Encoded English-Japanese main data part
306 Main data part intermediate data
308 1-byte character conversion table
310 Character code conversion table
312 Character save table
314 Sub-table
316 Huffman code table
318 Reserve array storage area
320 Headword table
40 ROM
50 Input unit
60 Display unit
100 Electronic dictionary device
110 CPU
120 ROM
1200 Dictionary expansion program
1202 Character code expansion program
130 RAM
140 EEPROM
1400 Post-compression English-Japanese dictionary data
1402 Post-processing English-Japanese reference part
1404 Encoded English-Japanese main data part
1406 1-byte character conversion table
1408 Character code conversion table
1410 Character save table
1412 Sub-table
1414 Huffman code table
1416 Headword table
150 Input unit
160 Display unit

Claims

A dictionary data compression apparatus comprising:
classifying means for classifying dictionary data, in which character strings are described in series in units of headwords, into a reference part and a main data part;
position storage means for selecting from the reference part, based on the appearance frequency of characters, a plurality of positions of characters that appear with a predetermined frequency as frequent reference positions, and for storing the plurality of positions in association with codes that identify each frequent reference position; and
main data part encoding means for encoding the main data part so as to be decodable in units of headwords by a dictionary-type encoding method that uses, when a reference position is a frequent reference position, the code of that reference position stored in the position storage means and the word length from the reference position, and, when the reference position is not a frequent reference position, the reference position itself and the word length from the reference position.

A dictionary data compression apparatus comprising:
classifying means for discretely extracting a plurality of headword-unit data items from the whole of dictionary data in which character strings are described in series in units of headwords, using the extracted items as a reference part, and classifying the remainder as a main data part; and
main data part encoding means for encoding the main data part so as to be decodable in units of headwords by a dictionary-type encoding method that uses the reference part as a reference source.

A dictionary data compression apparatus that divides dictionary data, in which character strings are described in series in units of headwords, into a reference part and a main data part, and that compresses the dictionary data by a dictionary-type encoding method which, when a matching character string that matches a character string in the main data part exists in the reference part, encodes the character string of the main data part with a code representing a copy of the matching character string, the apparatus comprising:
predetermined-word-length-or-longer compression means for compressing, by the dictionary-type encoding method, character strings of a predetermined word length or longer among the character strings in the main data part;
storage means for storing a plurality of character strings in association with predicted encoding efficiency values of those character strings; and
evaluation-value-order compression means for compressing, by the dictionary-type encoding method, the character strings contained in the main data part after compression by the predetermined-word-length-or-longer compression means, in descending order of the predicted encoding efficiency values stored in the storage means.

A dictionary data compression apparatus comprising:
storage means for storing dictionary data composed of character string data;
selecting means for selecting one of the standardized one-byte character codes;
conversion table creating means for creating a two-byte character code conversion table whose first byte is a vacant one-byte character code to which no character is assigned among the standardized one-byte character codes, and for assigning the selected one-byte character code to an unassigned two-byte character code in that table;
detecting means for detecting, from the character string data, a two-byte character code having a predetermined appearance frequency;
corresponding character changing means for changing the character represented by the selected one-byte character code to the character represented by the detected two-byte character code;
first replacement means for replacing the selected one-byte character code in the character string data with the corresponding two-byte character code set in the two-byte character code conversion table; and
second replacement means for replacing the detected two-byte character code in the character string data with the selected one-byte character code.

An electronic dictionary device comprising:
storage means for storing dictionary data composed of character string data;
character code storage means for storing character codes in association with standardized character codes;
extracting means for extracting the first byte of a character code representing a character included in the character string data;
character determination means for determining, based on the first byte extracted by the extracting means, whether the character code is a one-byte character code or a two-byte character code; and
replacement means for, when the character determination means determines that the character code is a one-byte character code, reading the standardized character code corresponding to that one-byte character code from the character code storage means and replacing the one-byte character code with the standardized character code.

On the computer,
A classification step of classifying dictionary data in which a character string is described in a series of headword units into a reference part and a main data part;
A position memory for selecting a plurality of positions of characters appearing at a predetermined frequency as the frequent reference positions from the reference section based on the appearance frequency of the characters, and storing the plurality of positions in association with codes for identifying each frequent reference position Process,
When the reference position is the frequent reference position, the code of the reference position stored in the position storage step and the word length from the reference position are used. When the reference position is not the frequent reference position, the reference position and the reference position A main data portion output step for encoding and outputting the main data portion so that it can be decoded in units of headwords by a dictionary-type encoding method using a word length from a reference position;
A compressed dictionary data manufacturing method comprising:
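
The position storage and the two token forms in this claim can be sketched as follows. This is only an illustration under stated assumptions: the first occurrence of each frequent character is taken as its frequent reference position, codes are assigned in order of position, and the names `build_frequent_positions`, `encode_reference` and `decode_reference` are hypothetical.

```python
from collections import Counter


def build_frequent_positions(reference: str, min_count: int) -> dict[int, int]:
    """Map frequent reference positions to short identifying codes.

    Assumption: a character is "frequent" if it occurs at least min_count
    times, and its first occurrence is used as the frequent position.
    """
    counts = Counter(reference)
    table: dict[int, int] = {}
    seen: set[str] = set()
    for pos, ch in enumerate(reference):
        if counts[ch] >= min_count and ch not in seen:
            seen.add(ch)
            table[pos] = len(table)          # next unused short code
    return table


def encode_reference(pos: int, length: int, table: dict[int, int]) -> tuple:
    """Emit (short code, length) for frequent positions, (position, length) otherwise."""
    if pos in table:
        return ("F", table[pos], length)
    return ("P", pos, length)


def decode_reference(token: tuple, table: dict[int, int], reference: str) -> str:
    """Recover the copied string from either token form."""
    kind, value, length = token
    if kind == "F":
        pos = next(p for p, code in table.items() if code == value)
    else:
        pos = value
    return reference[pos:pos + length]
```

The short code is useful only when it takes fewer bits than a full position, so a real encoder would size the table accordingly; the sketch leaves bit-level packing out.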

On the computer,
A classification step of discretely extracting a plurality of headword-unit data items from the entire dictionary data, in which character strings are described in series in units of headwords, as a reference part, and classifying the remainder as a main data part;
A main data part output step for encoding and outputting the main data part so that it can be decoded in units of headwords by a dictionary-type encoding method using the reference part as a reference source;
A compressed dictionary data manufacturing method comprising:
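
As a rough illustration of this claim, the sketch below takes every `step`-th entry as the reference part and encodes each remaining entry independently against it, so that any single headword can be decoded on its own. The even spacing, the greedy longest-match search and all names are assumptions made for the example.

```python
def split_entries(entries: list[str], step: int) -> tuple[list[str], list[str]]:
    """Discretely pick every step-th entry as reference; the rest is main data."""
    reference = [e for i, e in enumerate(entries) if i % step == 0]
    main = [e for i, e in enumerate(entries) if i % step != 0]
    return reference, main


def encode_entry(entry: str, ref: str, min_len: int = 3) -> list:
    """Encode one entry as literals and (position, length) copies from ref."""
    tokens = []
    i = 0
    while i < len(entry):
        match = None
        # Try the longest candidate first and stop at the first hit.
        for length in range(len(entry) - i, min_len - 1, -1):
            pos = ref.find(entry[i:i + length])
            if pos >= 0:
                match = (pos, length)
                break
        if match:
            tokens.append(match)
            i += match[1]
        else:
            tokens.append(entry[i])          # no usable match: emit a literal
            i += 1
    return tokens


def decode_entry(tokens: list, ref: str) -> str:
    """Decode one entry using only the reference part."""
    return "".join(
        t if isinstance(t, str) else ref[t[0]:t[0] + t[1]] for t in tokens
    )
```

Because every copy points into the reference part and never into another main-data entry, decoding one headword needs no other encoded entry.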

On the computer,
In a compressed dictionary data manufacturing method in which dictionary data, in which character strings are described in series in units of headwords, is divided into a reference part and a main data part, and the dictionary data is compressed by a dictionary-type encoding method that, when a character string matching a character string in the main data part exists in the reference part, encodes the character string of the main data part with a code representing a copy of the matching character string,
In the computer,
A predetermined word length or longer compression step of compressing, among the character strings in the main data part, character strings of the predetermined word length or longer by the dictionary-type encoding method;
An evaluation value order compression step of compressing, by the dictionary-type encoding method, the character strings included in the main data part after compression by the predetermined word length or longer compression step, in descending order of encoding efficiency prediction value;
An output step of outputting the character string data after compression in the evaluation value order compression step as compressed character string data;
A compressed dictionary data manufacturing method comprising:
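
The two-phase order described in this claim can be sketched as below. The claim does not define how the encoding efficiency prediction value is computed, so the sketch assumes a simple estimate (occurrence count times the characters saved per copy); `token_cost`, the function names and the segment representation are likewise hypothetical.

```python
from collections import Counter


def compress_long(text: str, ref: str, long_len: int) -> list:
    """Phase 1: copy only matches of at least long_len characters."""
    segs, lit, i = [], "", 0
    while i < len(text):
        match = None
        for length in range(len(text) - i, long_len - 1, -1):
            pos = ref.find(text[i:i + length])
            if pos >= 0:
                match = (pos, length)
                break
        if match:
            if lit:
                segs.append(lit)
                lit = ""
            segs.append(match)
            i += match[1]
        else:
            lit += text[i]
            i += 1
    if lit:
        segs.append(lit)
    return segs


def compress_by_score(segs: list, ref: str, short_len: int, long_len: int,
                      token_cost: int = 2) -> list:
    """Phase 2: compress shorter strings, best predicted saving first."""
    counts = Counter()
    for s in segs:
        if isinstance(s, str):
            for n in range(short_len, long_len):
                for i in range(len(s) - n + 1):
                    if s[i:i + n] in ref:
                        counts[s[i:i + n]] += 1
    # Assumed prediction value: occurrences * (length - cost of one copy token).
    ranked = sorted(counts, key=lambda c: (-counts[c] * (len(c) - token_cost), c))
    for cand in ranked:
        pos, out = ref.find(cand), []
        for s in segs:
            if isinstance(s, str) and cand in s:
                for k, part in enumerate(s.split(cand)):
                    if k:
                        out.append((pos, len(cand)))
                    if part:
                        out.append(part)
            else:
                out.append(s)
        segs = out
    return segs


def decode(segs: list, ref: str) -> str:
    return "".join(s if isinstance(s, str) else ref[s[0]:s[0] + s[1]] for s in segs)
```

Running the long matches first keeps short candidates from breaking up strings that could have been copied as a whole, which appears to be the motivation for the ordering in the claim.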

On the computer,
A compressed dictionary data production method for producing compressed dictionary data obtained by compressing dictionary data composed of character string data,
In the computer,
A selection step for selecting any one-byte character code from the standardized one-byte character codes;
A conversion table creation step of creating a 2-byte character code conversion table whose first byte is a blank 1-byte character code, that is, a standardized 1-byte character code for which no corresponding character is defined, and assigning the selected 1-byte character code to an unset 2-byte character code in the table;
A detection step of detecting a 2-byte character code having a predetermined frequency of appearance from the character string data;
A corresponding character changing step for changing the corresponding character represented by the selected one-byte character code to the corresponding character represented by the detected two-byte character code;
A first replacement step of replacing the selected 1-byte character code of the character string data with a corresponding 2-byte character code set in the 2-byte character code conversion table;
A second replacement step of replacing the detected 2-byte character code with the selected 1-byte character code in the character string data after the replacement in the first replacement step;
An output step of outputting the character string data after replacement in the second replacement step as compressed character string data;
A compressed dictionary data manufacturing method comprising:
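
The sequence of steps in this claim amounts to swapping a frequent 2-byte character into a 1-byte code and moving the displaced 1-byte character to an escape code. The following sketch shows one way this could work. It assumes Shift_JIS lead-byte ranges, promotes only the single most frequent 2-byte character, and uses a hypothetical escape layout (blank lead byte followed by the selected byte).

```python
from collections import Counter


def tokenize(data: bytes, extra_leads: frozenset = frozenset()) -> list[bytes]:
    """Split bytes into 1-byte and 2-byte characters (Shift_JIS ranges assumed)."""
    chars, i = [], 0
    while i < len(data):
        b = data[i]
        two = 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC or b in extra_leads
        n = 2 if two else 1
        chars.append(data[i:i + n])
        i += n
    return chars


def compress(data: bytes, selected: int, blank_lead: int) -> tuple[bytes, bytes]:
    """Return (compressed data, the 2-byte code now represented by `selected`)."""
    chars = tokenize(data)
    freq = Counter(c for c in chars if len(c) == 2)
    if not freq:
        raise ValueError("no 2-byte character to promote")
    detected = freq.most_common(1)[0][0]          # detection step
    sel = bytes([selected])
    escape = bytes([blank_lead, selected])        # conversion table entry
    chars = [escape if c == sel else c for c in chars]      # first replacement
    chars = [sel if c == detected else c for c in chars]    # second replacement
    return b"".join(chars), detected


def decompress(packed: bytes, selected: int, blank_lead: int, detected: bytes) -> bytes:
    """Undo both replacements; blank_lead is treated as a 2-byte lead byte."""
    sel, escape = bytes([selected]), bytes([blank_lead, selected])
    out = []
    for c in tokenize(packed, frozenset({blank_lead})):
        out.append(detected if c == sel else sel if c == escape else c)
    return b"".join(out)
```

The swap pays off when the detected 2-byte character occurs more often than the selected 1-byte character, since each occurrence of the former shrinks by one byte while each occurrence of the latter grows by one.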

On the computer,
A classification function of classifying dictionary data, in which character strings are described in series in units of headwords, into a reference part and a main data part;
A position storage function of selecting from the reference part, based on the appearance frequency of characters, a plurality of positions of characters that appear at a predetermined frequency as frequent reference positions, and storing the plurality of positions in association with codes that identify each frequent reference position;
A main data part encoding function of encoding the main data part so that it can be decoded in units of headwords by a dictionary-type encoding method that, when the reference position is a frequent reference position, uses the code of that reference position stored by the position storage function and the word length from the reference position, and, when the reference position is not a frequent reference position, uses the reference position itself and the word length from the reference position;
A program for causing the computer to realize the above functions.

On the computer,
A classification function of discretely extracting a plurality of headword-unit data items from the entire dictionary data, in which character strings are described in series in units of headwords, as a reference part, and classifying the remainder as a main data part;
A main data part encoding function of encoding the main data part so that it can be decoded in units of headwords by a dictionary-type encoding method using the reference part as a reference source;
A program for causing the computer to realize the above functions.

On a computer that divides dictionary data, in which character strings are described in series in units of headwords, into a reference part and a main data part, and that compresses the dictionary data by a dictionary-type encoding method which, when a character string matching a character string in the main data part exists in the reference part, encodes the character string of the main data part with a code representing a copy of the matching character string,
A predetermined word length or longer compression function of compressing, among the character strings in the main data part, character strings of the predetermined word length or longer by the dictionary-type encoding method;
A storage function of storing a plurality of character strings and encoding efficiency prediction values of the character strings in association with each other;
An evaluation value order compression function of compressing, by the dictionary-type encoding method, the character strings included in the main data part after compression by the predetermined word length or longer compression function, in descending order of the encoding efficiency prediction value stored by the storage function;
A program for causing the computer to realize the above functions.

On the computer,
A storage function for storing dictionary data composed of character string data;
A selection function for selecting any one-byte character code from standardized one-byte character codes;
A conversion table creation function of creating a 2-byte character code conversion table whose first byte is a blank 1-byte character code, that is, a standardized 1-byte character code for which no corresponding character is defined, and assigning the selected 1-byte character code to an unset 2-byte character code in the table;
A detection function for detecting a 2-byte character code having an appearance frequency of a predetermined frequency from the character string data;
A corresponding character changing function for changing the corresponding character represented by the selected one-byte character code to the corresponding character represented by the detected two-byte character code;
A first replacement function for replacing the selected one-byte character code in the character string data with a corresponding two-byte character code set in the two-byte character code conversion table;
A second replacement function for replacing the detected 2-byte character code in the character string data with the selected 1-byte character code;
A program for causing the computer to realize the above functions.

On the computer,
A storage function for storing dictionary data composed of character string data;
A character code storage function for storing a character code and a standardized character code in association with each other;
An extraction function for extracting the first byte of a character code representing a character included in the character string data;
A character determination function for determining whether the character code is a 1-byte character code or a 2-byte character code based on the first byte extracted by the extraction function;
A replacement function of, when the character determination function determines that the character code is a 1-byte character code, reading the standardized character code corresponding to the determined 1-byte character code from the character code storage function and replacing the 1-byte character code with the standardized character code;
A program for causing the computer to realize the above functions.