JP2016063475A

JP2016063475A - Encoding apparatus, encoding method, decoding apparatus, decoding method, and program

Info

Publication number: JP2016063475A
Application number: JP2014191516A
Authority: JP
Inventors: Hiroyasu Ide (井手 博康)
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2014-09-19
Filing date: 2014-09-19
Publication date: 2016-04-25
Anticipated expiration: 2034-09-19
Also published as: JP6511752B2

Abstract

PROBLEM TO BE SOLVED: To provide an encoding apparatus and an encoding method that encode data into partially decodable data using a method with high encoding efficiency, a decoding apparatus and a decoding method that partially decode data encoded by such a method, and a program. SOLUTION: A conversion unit 107 converts each character included in the data to be encoded into a character identification number that identifies the character. An encoding unit 108 encodes each character identification number into binary data. A bit string identification number associating unit 111 converts each bit string that appears at high frequency in the data encoded by the encoding unit 108 into a bit string identification number. A reference flag-attached encoding unit 114 encodes each bit string identification number into binary data. SELECTED DRAWING: Figure 2

Description

The present invention relates to an encoding device, an encoding method, a decoding device, a decoding method, and a program.

Encoding methods that encode (compress) target data to reduce its size, and decoding methods that restore encoded data to the original data, are known.

For example, Non-Patent Document 1 discloses an LZ encoding method. For a character string that appears repeatedly in the data to be encoded, this method converts the second and subsequent occurrences into data indicating the position within the data and the length of the first occurrence (hereinafter referred to as frequent character string metadata). Non-Patent Document 1 also discloses a decoding method that decodes data encoded by the LZ encoding method by replacing each piece of frequent character string metadata with the first occurrence of the corresponding repeated character string in the original data.

Daisuke Okanohara, "World of High-Speed String Analysis" (高速文字列解析の世界), Iwanami Publishing, 2012.

However, conventional encoding and decoding methods have low encoding efficiency when applied to data that requires partial decoding. For example, dictionary data must be decoded one headword at a time, for the headword the user has searched for. To achieve such partial decoding with a conventional method, the dictionary data must be encoded headword by headword. Take the LZ encoding method described above as an example. When the entire dictionary is encoded with the LZ encoding method, a given portion of the encoded dictionary data most likely does not contain the first occurrences of the repeated character strings, which are needed to replace the frequent character string metadata. Partial decoding of dictionary data encoded as a whole with the LZ encoding method is therefore almost impossible, and decoding in units of headwords requires encoding in units of headwords. For the same reason, other encoding and decoding methods also require the original data to be partially encoded if the encoded data is to be partially decoded. Such partial encoding significantly reduces encoding efficiency.
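The efficiency loss described above can be illustrated with a small experiment. The sketch below uses Python's zlib (DEFLATE, which is LZ77-based) purely as a stand-in for the LZ encoding method of Non-Patent Document 1; the entries are hypothetical sample data, not taken from the source.

```python
import zlib

# Hypothetical dictionary entries; real dictionary data would differ.
entries = [
    f"word{i}: the definition of the thing number {i}\n".encode("utf-8")
    for i in range(200)
]

# Compress the whole dictionary at once: efficient, but an entry in the
# middle cannot be decoded without processing the stream from the start.
whole_size = len(zlib.compress(b"".join(entries)))

# Compress each entry separately: every entry is independently decodable,
# but repeats shared between entries can no longer be exploited.
per_entry_size = sum(len(zlib.compress(e)) for e in entries)

print(whole_size, per_entry_size)
```

For data like this, the per-entry total comes out larger than the whole-file size, which is the trade-off the invention sets out to avoid.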

The present invention has been made to solve the above problem. Its object is to provide an encoding device and an encoding method that encode data into partially decodable data using a method with high encoding efficiency, a decoding device and a decoding method that partially decode data encoded by such a method, and a program.

To achieve the above object, an encoding device according to a first aspect of the present invention comprises:
a character identification number storage unit that stores each character included in the data to be encoded in association with a character identification number identifying the character;
a conversion unit that refers to the character identification number storage unit and converts each character included in the data to be encoded into the character identification number associated with the character;
an encoding unit that encodes the character identification numbers converted by the conversion unit into binary data;
a bit string identification number associating unit that, for each identical bit string appearing in the binary data of the character identification numbers encoded by the encoding unit, associates a bit string identification number identifying the bit string and stores it in a bit string identification number storage unit;
a bit string identification number conversion unit that refers to the bit string identification number storage unit and converts each identical bit string appearing in the binary data of the character identification numbers into the bit string identification number associated with the bit string; and
a reference flag-attached encoding unit that associates, with the bit string identification number converted by the bit string identification number conversion unit, a reference flag indicating that the bit string identification number storage unit is to be referred to at the time of decoding, and encodes the bit string identification number into binary data.

To achieve the above object, a decoding device according to a second aspect of the present invention comprises:
a compressed data storage unit that stores compressed data composed of binary data of character identification numbers, each identifying a character included in the data to be encoded, and binary data of bit string identification numbers, each identifying an identical bit string appearing in the binary data of the character identification numbers;
a character identification number storage unit that stores each character included in the data to be encoded in association with the character identification number identifying the character;
a bit string identification number storage unit that stores, for each identical bit string appearing in the binary data of the character identification numbers, the bit string in association with the bit string identification number identifying it;
an input unit for inputting a condition for specifying data to be decoded;
a bit string identification number decoding unit that specifies, as the data to be decoded, the data satisfying the condition among the compressed data stored in the compressed data storage unit and, of the binary data of the character identification numbers and the binary data of the bit string identification numbers constituting the specified data, decodes the binary data of the bit string identification numbers into the bit string identification numbers;
a bit string identification number conversion unit that refers to the bit string identification number storage unit and converts each bit string identification number decoded by the bit string identification number decoding unit into the bit string associated with it;
a decoding unit that decodes the binary data of the character identification numbers constituting the data to be decoded, together with the bit strings converted by the bit string identification number conversion unit, into the character identification numbers; and
a conversion unit that refers to the character identification number storage unit and converts each character identification number decoded by the decoding unit into the character associated with it.

According to the present invention, data can be encoded into partially decodable data using a method with high encoding efficiency, and data encoded using such a method can be partially decoded.

A diagram showing the physical configuration of the encoding device according to the embodiment of the present invention.
A diagram showing the functional configuration of the encoding device according to the embodiment of the present invention.
A flowchart showing the flow of the encoding process according to the embodiment of the present invention.
A diagram showing dictionary data (data to be encoded).
A diagram showing the appearance frequency of each character in the dictionary data.
A diagram showing the association between characters and character identification numbers.
A diagram showing the binary data before final encoding.
A diagram showing the appearance frequency of each bit string in the binary data before final encoding.
A diagram showing the association between bit strings and bit string identification numbers.
A diagram showing the physical configuration of the decoding device according to the embodiment of the present invention.
A diagram showing the functional configuration of the decoding device according to the embodiment of the present invention.
A flowchart showing the flow of the decoding process according to the embodiment of the present invention.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings.

With conventional encoding and decoding methods, data that is to be partially decoded must be partially encoded beforehand. Encoding therefore takes more effort and encoding efficiency is low. In contrast, the encoding device 100 and the decoding device 200 according to the present invention enable partial decoding even when the original data is encoded as a whole. The physical and functional configurations of the encoding device 100 and the decoding device 200 are described below.

The encoding device 100 according to the present embodiment is physically configured as shown in FIG. 1. That is, the encoding device 100 includes a ROM (Read Only Memory) 10, a RAM (Random Access Memory) 11, an external storage device 12, an input device 13, a display device 14, and a CPU (Central Processing Unit) 15.

The ROM 10 stores an initial program for performing various initial settings, hardware checks, program loading, and the like. The RAM 11 temporarily stores various software programs executed by the CPU 15 and the data needed to execute them.

The external storage device 12 is, for example, a hard disk, and stores various software programs, data, and the like. These software programs include application software programs and basic software programs such as an OS (Operating System).

The input device 13 includes a keyboard, a mouse, a track pad, and the like, and receives input from the user. The input device 13 generates a signal based on that input and supplies it to the CPU 15.

The display device 14 includes a screen such as a liquid crystal display and displays text data and image data supplied from the CPU 15.

The CPU 15 reads a software program stored in the external storage device 12 into the RAM 11 and controls its execution, thereby realizing the functional configuration described below.

The encoding device 100 is functionally configured as shown in FIG. 2. That is, the encoding device 100 includes an encoding candidate data storage unit 101, a display unit 102, an input unit 103, a character appearance frequency acquisition unit 104, a character identification number associating unit 105, a character identification number storage unit 106, a conversion unit 107, an encoding unit 108, a start position storage unit 109, a bit string appearance frequency acquisition unit 110, a bit string identification number associating unit 111, a bit string identification number storage unit 112, a bit string identification number conversion unit 113, a reference flag-attached encoding unit 114, a non-reference flag-attached encoding unit 115, and a compressed data storage unit 116. The encoding candidate data storage unit 101, the character identification number storage unit 106, the start position storage unit 109, the bit string identification number storage unit 112, and the compressed data storage unit 116 are built in the external storage device 12 shown in FIG. 1.

The encoding candidate data storage unit 101 stores a plurality of pieces of encoding candidate data. The encoding candidate data is text data such as dictionary data.

The display unit 102 displays, on the display device 14, the names (file names) of the files in which the encoding candidate data is recorded. The user refers to the file names displayed on the display device 14 and designates which encoding candidate data to encode.

The input unit 103 receives the signal generated by the input device 13. This signal indicates the file name designated by the user via the input device 13. Based on the signal, the input unit 103 identifies the data to be encoded and inputs an instruction to encode that data to the character appearance frequency acquisition unit 104.

Based on the instruction input from the input unit 103, the character appearance frequency acquisition unit 104 acquires the data to be encoded from the encoding candidate data stored in the encoding candidate data storage unit 101.

The character appearance frequency acquisition unit 104 acquires, for each character included in the data to be encoded, its appearance frequency in that data. For example, when the data to be encoded is the dictionary data 1 shown in FIG. 4, the appearance frequency of each character in the dictionary data 1 is as shown in FIG. 5.
The appearance frequency is acquired for all characters included in the data to be encoded.

The character identification number associating unit 105 associates, with each character included in the data to be encoded, a character identification number that identifies the character, and stores the association in the character identification number storage unit 106. In doing so, it assigns smaller character identification numbers to characters that appear more frequently in the data to be encoded. For example, when the data to be encoded is the dictionary data 1 shown in FIG. 4, the characters "t", "h", "e", ..., which have the first, second, third, ... highest appearance frequencies, are associated with the character identification numbers 0, 1, 2, ..., respectively, as shown in FIG. 6.
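The frequency-ranked assignment can be sketched in a few lines of Python. This is only an illustration: the function name `assign_char_ids` and the tie-breaking rule are assumptions, since the source does not say how characters with equal frequency are ordered.

```python
from collections import Counter

def assign_char_ids(text: str) -> dict[str, int]:
    """Map each character to an ID; more frequent characters get smaller IDs."""
    counts = Counter(text)
    # Ties are broken by character order here; the source does not say how.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {ch: i for i, ch in enumerate(ranked)}

# Toy input in which "t" is most frequent, then "h", then "e".
char_ids = assign_char_ids("tttthhhee")
print(char_ids)
```

With this toy input, "t", "h", and "e" receive the identification numbers 0, 1, and 2, mirroring the ordering described for FIG. 6.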

The following explains why character identification numbers are assigned based on appearance frequency, and why characters with high appearance frequency receive small character identification numbers.
In text compression, the character code system can itself lower the compression rate. UTF-8 in particular is designed to cover multilingual text, so for some languages its code length is redundant and the compression rate drops. The present invention therefore does not encode characters as character codes. Instead, it associates a character identification number with each character and encodes the character as the binary data of that number. The compression rate is thus unaffected by the character code system. In addition, encoding frequent characters into binary data with a long code length would enlarge the compressed data, so the present invention is designed to avoid this. Specifically, as described above, frequent characters are associated with small character identification numbers. Frequent characters are then encoded into binary data with a short code length by using an encoding method that yields shorter codes for smaller values, such as the delta encoding method, the Variable Byte Code encoding method, or the Huffman encoding method. As a result, the compressed data is significantly smaller than when characters are encoded as character codes.
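To make the "smaller value, shorter code" property concrete, here is a minimal sketch of one common variable byte code layout (7 payload bits per byte, high bit set on all but the last byte). The source names the Variable Byte Code method but does not specify its byte layout, so this particular format and the function names are assumptions.

```python
def vbyte_encode(n: int) -> bytes:
    """Encode a non-negative integer using 7 payload bits per byte.

    The high bit is set on every byte except the last one.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        n >>= 7
    out.append(n)  # final byte, continuation bit clear
    return bytes(out)

def vbyte_decode(data: bytes) -> int:
    """Decode a single integer produced by vbyte_encode."""
    n = 0
    shift = 0
    for b in data:
        n |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    return n

# Small IDs (frequent characters) fit in one byte; larger IDs need more.
for value in (0, 127, 128, 300):
    print(value, vbyte_encode(value).hex())
```

A byte-oriented code is coarse compared with bit-level codes, but it shows why assigning the smallest identification numbers to the most frequent characters shortens the output.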

As shown in FIG. 6, the character identification number storage unit 106 stores each character included in the data to be encoded in association with its character identification number.

The conversion unit 107 refers to the character identification number storage unit 106 and converts each character included in the data to be encoded into the character identification number associated with the character.

The encoding unit 108 encodes the character identification numbers converted by the conversion unit 107 into binary data. In this specification, the binary data produced by the encoding unit 108 is referred to as binary data before final encoding. FIG. 7 shows the binary data 2 before final encoding obtained when the dictionary data 1 shown in FIG. 4 is encoded by the conversion unit 107. The binary data 2 before final encoding is actually a set of bits, each representing 0 or 1. Note, however, that for convenience of explanation it is written in hexadecimal, 8 bits at a time. For example, the bit string "12A5B8CA" shown in FIG. 7 is the hexadecimal notation of "00010010", "10100101", "10111000", and "11001010", written as "12", "A5", "B8", and "CA", respectively.
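The hexadecimal notation can be checked mechanically. The helper below, whose name `hex_to_bits` is hypothetical, expands a hexadecimal string into the underlying bits.

```python
def hex_to_bits(hex_str: str) -> str:
    """Expand a hexadecimal string into its bit string, 4 bits per digit."""
    return "".join(f"{int(d, 16):04b}" for d in hex_str)

bits = hex_to_bits("12A5B8CA")
# Print the result in groups of 8 bits, matching the description of FIG. 7.
print(" ".join(bits[i:i + 8] for i in range(0, len(bits), 8)))
# prints "00010010 10100101 10111000 11001010"
```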

The start position storage unit 109 stores the start position of each headword included in the data to be encoded. In this specification, this position is also called a compression boundary; it indicates a position from which decoding of the compressed data can start.
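One plausible way to obtain the compression boundaries is to record the bit offset at which each encoded entry begins. The sketch below assumes the encoder has the encoded bits of each headword entry available separately as strings of "0" and "1"; that representation and the function name are assumptions, not details from the source.

```python
def entry_start_positions(encoded_entries: list[str]) -> list[int]:
    """Return the bit offset at which each encoded entry begins."""
    starts = []
    pos = 0
    for bits in encoded_entries:
        starts.append(pos)  # this entry starts where the previous one ended
        pos += len(bits)
    return starts

starts = entry_start_positions(["0101", "111", "00"])
print(starts)  # prints [0, 4, 7]
```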

The bit string appearance frequency acquisition unit 110 acquires, for each identical bit string included in the binary data before final encoding, its appearance frequency in that data. For example, in the binary data 2 before final encoding shown in FIG. 7, the appearance frequency of each bit string is as shown in FIG. 8. Occurrences of a bit string that straddle a compression boundary are not counted toward its appearance frequency.
The appropriate number of digits of the bit string depends on the encoding method, but about 32 digits is suitable for natural language. In this embodiment, therefore, the bit string has 32 digits. The number of digits is not limited to 32, however, and may be any number.
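The boundary-aware counting can be sketched as follows. The source does not state whether candidate bit strings are taken at every bit offset or only at aligned positions, so the sliding window at every offset is an assumption, as are the function and parameter names. A width of 4 is used in the example only to keep it readable; the embodiment uses 32.

```python
from collections import Counter

def count_bit_strings(bits: str, starts: list[int], width: int = 32) -> Counter:
    """Count every width-bit window that lies inside a single entry."""
    counts = Counter()
    ends = starts[1:] + [len(bits)]
    for start, end in zip(starts, ends):
        # Windows are taken only within [start, end), so none straddles
        # a compression boundary.
        for i in range(start, end - width + 1):
            counts[bits[i:i + width]] += 1
    return counts

# Two entries: bits 0-7 and bits 8-11. Windows crossing offset 8 are skipped.
counts = count_bit_strings("10101010" + "1010", starts=[0, 8], width=4)
print(counts)
```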

The bit string identification number associating unit 111 associates, with each identical bit string appearing in the binary data before final encoding, a bit string identification number that identifies the bit string, and stores the association in the bit string identification number storage unit 112. In doing so, it assigns bit string identification numbers to the bit strings with the first to fifth highest appearance frequencies, giving smaller numbers to bit strings that appear more frequently. For example, for the binary data 2 before final encoding shown in FIG. 7, the bit strings "12A5B8CA", "DF43A68C", "385C65F9", "935AD6CD", and "B58CEEA5", which have the first through fifth highest appearance frequencies, are associated with the bit string identification numbers 0, 1, 2, 3, and 4, respectively, as shown in FIG. 9.
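A minimal sketch of this top-five selection is shown below. The bit strings are those named in the text, written in hexadecimal for brevity, but the counts are made-up placeholders because the values of FIG. 8 are not given here. The function name and tie-breaking rule are likewise assumptions.

```python
from collections import Counter

def assign_bit_string_ids(counts: Counter, top_n: int = 5) -> dict[str, int]:
    """Give IDs 0..top_n-1 to the most frequent bit strings."""
    ranked = sorted(counts, key=lambda s: (-counts[s], s))[:top_n]
    return {s: i for i, s in enumerate(ranked)}

# Hypothetical frequencies; only the ranking order follows the text.
counts = Counter({"12A5B8CA": 9, "DF43A68C": 7, "385C65F9": 6,
                  "935AD6CD": 4, "B58CEEA5": 3, "00000000": 1})
table = assign_bit_string_ids(counts)
print(table)
```

Bit strings outside the top five, such as the placeholder "00000000", receive no identification number and are later handled as unreferenced data.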

As shown in FIG. 9, the bit string identification number storage unit 112 stores each identical bit string appearing in the binary data before final encoding in association with the bit string identification number that identifies it.

The bit string identification number conversion unit 113 refers to the bit string identification number storage unit 112 and converts each identical bit string appearing in the binary data before final encoding into the bit string identification number associated with it. However, the bit string identification number conversion unit 113 does not convert a bit string that straddles a compression boundary. This makes it possible to generate encoded data (compressed data) that can be decoded headword by headword.

The reference flag-attached encoding unit 114 encodes the bit string identification numbers converted by the bit string identification number conversion unit 113 into binary data using an encoding method such as the delta encoding method, the Variable Byte Code encoding method, or the Huffman encoding method. It then associates a reference flag (a flag indicating that the bit string identification number storage unit 112 is to be referred to at the time of decoding) with the binary data of each bit string identification number.
Encoding frequently occurring bit strings into the binary data of bit string identification numbers, which have a short code length, reduces the size of the encoded data (compressed data).

The non-reference flag-attached encoding unit 115 handles the bit strings in the binary data before final encoding that have no associated bit string identification number. For such a bit string, it encodes into binary data the number of bits from the beginning of the bit string to the point where a bit string with an associated bit string identification number appears. It associates a non-reference flag (a flag indicating that the bit string identification number storage unit 112 is not to be referred to at the time of decoding) with the binary data of this bit count.
If such a bit string straddles a compression boundary, the unit encodes the number of bits from the beginning of the bit string to the compression boundary into binary data and associates the non-reference flag with that binary data.
Bit strings that have an associated bit string identification number but were not converted because they straddle a compression boundary are processed in the same way, according to whether or not a compression boundary is crossed.
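The interplay of the conversion unit 113 and the two flag-attached encoding units can be sketched as a tokenizer. This is a simplified illustration under stated assumptions, not the patented bit layout: tokens are Python tuples rather than packed bits, the tag "ref" stands for the reference flag and "raw" for the non-reference flag, the raw bits are carried in the token so that their length plays the role of the encoded bit count, and matching is greedy from left to right. A width of 4 keeps the example short.

```python
def tokenize(bits: str, starts: list[int], table: dict[str, int], width: int = 32):
    """Split bits into ("ref", id) and ("raw", literal_bits) tokens.

    A table entry is replaced only when it lies entirely inside one
    entry, so no token crosses a compression boundary.
    """
    tokens = []
    ends = starts[1:] + [len(bits)]
    for start, end in zip(starts, ends):
        i = lit_start = start
        while i < end:
            window = bits[i:i + width]
            if i + width <= end and window in table:
                if lit_start < i:
                    # Flush the unreferenced bits that precede the match.
                    tokens.append(("raw", bits[lit_start:i]))
                tokens.append(("ref", table[window]))
                i += width
                lit_start = i
            else:
                i += 1
        if lit_start < end:
            # Unreferenced bits run up to the compression boundary.
            tokens.append(("raw", bits[lit_start:end]))
    return tokens

def detokenize(tokens, table: dict[str, int]) -> str:
    """Rebuild the bit string by looking up each referenced ID."""
    inverse = {v: k for k, v in table.items()}
    return "".join(inverse[t[1]] if t[0] == "ref" else t[1] for t in tokens)

table = {"1010": 0}
bits = "111010" + "101001"          # two entries of 6 bits each
tokens = tokenize(bits, [0, 6], table, width=4)
print(tokens)
```

Because every token ends at or before a compression boundary, decoding can begin at any entry's start position without looking at earlier entries.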

圧縮データ記憶部１１６は、符号化対象のデータが符号化されたバイナリデータ（圧縮データ）を記憶する。 The compressed data storage unit 116 stores binary data (compressed data) obtained by encoding data to be encoded.

以上のような符号化装置１００が実行する符号化処理の流れについて、図３に示すフローチャートを参照して説明する。 The flow of the encoding process performed by the encoding apparatus 100 as described above will be described with reference to the flowchart shown in FIG.

［符号化処理］
表示部１０２が、符号化候補のデータを記録したファイルの名称（ファイル名）を表示装置１４に表示しているとする。ユーザは、表示装置１４に表示されたファイル名を参照して、これら符号化候補の中から符号化対象とするもののファイル名を入力装置１３に入力する。入力装置１３は、入力されたファイル名を示す信号を生成し、入力部１０３に供給する。入力部１０３は、入力装置１３から供給された信号を受け付け、ユーザが符号化対象としたファイルを特定する。そして、ユーザが符号化対象としたファイルのデータを符号化する旨の指示を文字出現頻度取得部１０４に供給する。文字出現頻度取得部１０４は、この指示を受け付け、図３に示す符号化処理を開始する。 [Encoding process]
It is assumed that the display unit 102 displays the name (file name) of the file in which the encoding candidate data is recorded on the display device 14. The user refers to the file name displayed on the display device 14 and inputs the file name of an encoding target among these encoding candidates to the input device 13. The input device 13 generates a signal indicating the input file name and supplies it to the input unit 103. The input unit 103 receives a signal supplied from the input device 13 and specifies a file to be encoded by the user. Then, the user supplies an instruction to the character appearance frequency acquisition unit 104 to encode the file data to be encoded. The character appearance frequency acquisition unit 104 receives this instruction and starts the encoding process shown in FIG.

まず、文字出現頻度取得部１０４は、指示に基づいて、符号化候補データ記憶部１０１が記憶する複数の符号化候補のデータの中から符号化対象のデータを取得する（ステップＳ１０）。以下、理解を容易にするために、符号化対象のデータとして図４に示す辞書データ１が取得されたものとする。 First, the character appearance frequency acquisition unit 104 acquires data to be encoded from among a plurality of encoding candidate data stored in the encoding candidate data storage unit 101 based on the instruction (step S10). Hereinafter, in order to facilitate understanding, it is assumed that dictionary data 1 shown in FIG. 4 is acquired as data to be encoded.

文字出現頻度取得部１０４は、辞書データ１に含まれる文字ごとに、辞書データ１における出現頻度を取得する（ステップＳ１１）。なお、図５に示すように、辞書データ１において文字「ｔ」の出現頻度が９２０４１回で最も高い。続いて、文字「ｈ」、文字「ｅ」、文字「ｓ」、文字「ｒ」・・・の出現頻度がそれぞれ８３８９０回、８０９８４回、７６１８９回、６８６０７回・・・で２番目、３番目、４番目、５番目・・・に高い。 The character appearance frequency acquisition unit 104 acquires, for each character included in the dictionary data 1, its appearance frequency in the dictionary data 1 (step S11). As shown in FIG. 5, the character “t” has the highest appearance frequency in the dictionary data 1, at 92041 occurrences. The characters “h”, “e”, “s”, “r”, ... follow with the second, third, fourth, fifth, ... highest appearance frequencies, at 83890, 80984, 76189, 68607, ... occurrences, respectively.

文字識別番号関連付部１０５は、辞書データ１において出現頻度が高い文字から順に、値が小さい文字識別番号を関連付けて文字識別番号記憶部１０６に記憶する（ステップＳ１２）。具体的には、図６に示すように、出現頻度が１番目、２番目、３番目、４番目、５番目・・・に高い文字「ｔ」、文字「ｈ」、文字「ｅ」、文字「ｓ」、文字「ｒ」・・・に、それぞれ文字識別番号０、１、２、３、４・・・を関連付ける。 The character identification number associating unit 105 associates character identification numbers with the characters in the dictionary data 1, assigning smaller values to characters with higher appearance frequencies, and stores them in the character identification number storage unit 106 (step S12). Specifically, as shown in FIG. 6, the characters “t”, “h”, “e”, “s”, “r”, ..., which have the first, second, third, fourth, fifth, ... highest appearance frequencies, are associated with character identification numbers 0, 1, 2, 3, 4, ..., respectively.
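Steps S11 and S12 can be sketched as follows. This is a minimal illustration, not the implementation in the specification: the function name `assign_char_ids` and the tie-breaking rule (ties broken by character order) are assumptions made here so that the result is deterministic.

```python
from collections import Counter


def assign_char_ids(text):
    """Map each character to an ID; more frequent characters get smaller IDs."""
    counts = Counter(text)  # step S11: appearance frequency per character
    # Step S12: rank by descending frequency. Breaking ties by character
    # order is an assumption; the source does not say how ties are handled.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {ch: rank for rank, ch in enumerate(ranked)}


table = assign_char_ids("tttthhhee")
print(table)  # {'t': 0, 'h': 1, 'e': 2}
```

Giving the smallest numbers to the most frequent characters pays off only if the later binary encoding uses shorter codes for smaller numbers.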

変換部１０７は、文字識別番号記憶部１０６を参照して、辞書データ１に含まれる文字を、文字に関連付けられた文字識別番号に変換する（ステップＳ１３）。具体的には、辞書データ１に含まれる文字「ｔ」、文字「ｈ」、文字「ｅ」・・・をそれぞれ文字識別番号０、１、２・・・に変換する。 The conversion unit 107 refers to the character identification number storage unit 106 and converts each character included in the dictionary data 1 into the character identification number associated with that character (step S13). Specifically, the characters “t”, “h”, “e”, ... included in the dictionary data 1 are converted into character identification numbers 0, 1, 2, ..., respectively.

符号化部１０８は、変換部１０７が変換した文字識別番号をバイナリデータに符号化する（ステップＳ１４）。具体的には、辞書データ１が図７に示す最終符号化前バイナリデータ２に符号化される。なお、上述したように、説明の都合上、図７に示す最終符号化前バイナリデータ２を１６進数表記の複数の数値で表していることに留意されたい。実際には、最終符号化前バイナリデータ２は、０又は１を表すビットの集合である。 The encoding unit 108 encodes the character identification numbers converted by the conversion unit 107 into binary data (step S14). Specifically, the dictionary data 1 is encoded into the binary data 2 before final encoding shown in FIG. 7. Note that, as described above, the binary data 2 before final encoding in FIG. 7 is written as a series of hexadecimal values only for convenience of explanation. In practice, the binary data 2 before final encoding is a set of bits, each representing 0 or 1.
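As a hedged illustration of step S14, the sketch below encodes character identification numbers with Elias delta coding. The description only mentions a “delta encoding method” as one example, so reading it as Elias delta coding is an assumption, as are the function names and the `+ 1` offset (Elias delta codes are defined for positive integers, while the identification numbers start at 0).

```python
def elias_delta(n):
    """Return the Elias delta code of a positive integer as a '0'/'1' string."""
    if n < 1:
        raise ValueError("Elias delta codes are defined for n >= 1")
    nbits = n.bit_length()      # length of n in binary
    lbits = nbits.bit_length()  # length of that length in binary
    # (lbits - 1) zeros, then nbits in binary, then n without its leading 1.
    return "0" * (lbits - 1) + format(nbits, "b") + format(n, "b")[1:]


def encode_ids(char_ids):
    """Concatenate the codes of all IDs (shifted by 1 so that ID 0 is codable)."""
    return "".join(elias_delta(i + 1) for i in char_ids)


bits = encode_ids([0, 1, 2])  # IDs of "t", "h", "e" in the running example
print(bits)  # 101000101
```

Under this assumption the most frequent character costs a single bit, which is why the frequency-ordered numbering of step S12 matters.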

ビット列出現頻度取得部１１０は、最終符号化前バイナリデータ２に出現する同一のビット列ごとに、最終符号化前バイナリデータ２における出現頻度を取得する（ステップＳ１５）。ただし、辞書データ１に含まれる見出し語の開始位置（圧縮境界）を跨ぐビット列の出現数はビット列の出現頻度に含めない。 The bit string appearance frequency acquisition unit 110 acquires, for each identical bit string that appears in the binary data 2 before final encoding, its appearance frequency in the binary data 2 before final encoding (step S15). However, occurrences of a bit string that straddle the start position of a headword included in the dictionary data 1 (a compression boundary) are not counted in the appearance frequency of that bit string.
なお、図８に示すように、最終符号化前バイナリデータ２において、ビット列「１２Ａ５Ｂ８ＣＡ」の出現頻度が１５０回で最も高い。また、ビット列「ＤＦ４３Ａ６８Ｃ」、ビット列「３８５Ｃ６５Ｆ９」、ビット列「９３５ＡＤ６ＣＤ」、ビット列「Ｄ５８ＣＥＥＡ５」、ビット列「１Ｂ３Ｃ２Ａ０９」・・・の出現頻度がそれぞれ１３０回、１００回、８０回、７０回、４０回・・・で２番目、３番目、４番目、５番目、６番目・・・に高い。 As shown in FIG. 8, the bit string “12A5B8CA” has the highest appearance frequency in the binary data 2 before final encoding, at 150 occurrences. The bit strings “DF43A68C”, “385C65F9”, “935AD6CD”, “D58CEEA5”, “1B3C2A09”, ... follow with the second, third, fourth, fifth, sixth, ... highest appearance frequencies, at 130, 100, 80, 70, 40, ... occurrences, respectively.

ビット列識別番号関連付部１１１は、出現頻度が１から５番目に高いビット列に対して、出現頻度が高いビット列から順に、値が小さいビット列識別番号を関連付けてビット列識別番号記憶部１１２に記憶する（ステップＳ１６）。具体的には、図９に示すように、出現頻度が１番目、２番目、３番目、４番目、５番目に高いビット列「１２Ａ５Ｂ８ＣＡ」、ビット列「ＤＦ４３Ａ６８Ｃ」、ビット列「３８５Ｃ６５Ｆ９」、ビット列「９３５ＡＤ６ＣＤ」、ビット列「Ｂ５８ＣＥＥＡ５」に、それぞれビット列識別番号０、１、２、３、４を関連付ける。 For the bit strings with the first to fifth highest appearance frequencies, the bit string identification number associating unit 111 associates bit string identification numbers, assigning smaller values to bit strings with higher appearance frequencies, and stores them in the bit string identification number storage unit 112 (step S16). Specifically, as shown in FIG. 9, the bit strings “12A5B8CA”, “DF43A68C”, “385C65F9”, “935AD6CD”, and “B58CEEA5”, which have the first, second, third, fourth, and fifth highest appearance frequencies, are associated with bit string identification numbers 0, 1, 2, 3, and 4, respectively.
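Steps S15 and S16 can be sketched as below. This is an illustrative simplification under stated assumptions: the data is given as a list of segments already split at the compression boundaries, candidate bit strings are all windows of one fixed `width` (the source does not say how candidates are chosen), and overlapping windows are counted. The function name `top_bit_strings` is hypothetical.

```python
from collections import Counter


def top_bit_strings(segments, width, k=5):
    """Assign IDs 0..k-1 to the k most frequent fixed-width bit strings."""
    counts = Counter()
    for seg in segments:
        # Windows are taken inside one segment only, so an occurrence that
        # straddles a compression boundary is never counted (step S15).
        for i in range(len(seg) - width + 1):
            counts[seg[i:i + width]] += 1
    # Step S16: smaller IDs for more frequent bit strings (ties by value).
    ranked = sorted(counts, key=lambda s: (-counts[s], s))[:k]
    return {s: rank for rank, s in enumerate(ranked)}


# "10" also appears across the boundary between the two segments,
# but that occurrence is ignored.
print(top_bit_strings(["0101", "01"], width=2))  # {'01': 0, '10': 1}
```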

ビット列識別番号変換部１１３は、ビット列識別番号記憶部１１２を参照して、最終符号化前バイナリデータ２に出現する同一のビット列を、ビット列に関連付けられたビット列識別番号に変換する（ステップＳ１７）。具体的には、ビット列「１２Ａ５Ｂ８ＣＡ」、ビット列「ＤＦ４３Ａ６８Ｃ」、ビット列「３８５Ｃ６５Ｆ９」・・・を、それぞれビット列識別番号０、１、２・・・に変換する。ただし、ビット列識別番号変換部１１３は、圧縮境界を跨ぐビット列については、ビット列識別番号への変換を行わない。 The bit string identification number conversion unit 113 refers to the bit string identification number storage unit 112 and converts each such bit string appearing in the binary data 2 before final encoding into the bit string identification number associated with it (step S17). Specifically, the bit strings “12A5B8CA”, “DF43A68C”, “385C65F9”, ... are converted into bit string identification numbers 0, 1, 2, ..., respectively. However, the bit string identification number conversion unit 113 does not convert a bit string that straddles a compression boundary into a bit string identification number.

参照フラグ付き符号化部１１４は、ビット列識別番号変換部１１３が変換したビット列識別番号をバイナリデータに符号化する。そして、ビット列識別番号のバイナリデータに参照フラグ（復号時にビット列識別番号記憶部１１２を参照することを示すフラグ）を関連付ける（ステップＳ１８）。 The encoding unit with reference flag 114 encodes the bit string identification number converted by the bit string identification number conversion unit 113 into binary data. Then, a reference flag (a flag indicating that the bit string identification number storage unit 112 is referred to at the time of decoding) is associated with the binary data of the bit string identification number (step S18).

次に、非参照フラグ付き符号化部１１５は、最終符号化前バイナリデータ２に出現するビット列のうちビット列識別番号が関連付けられていないビット列について、ビット列の先頭からビット列識別番号が関連付けられたビット列が出現するまでのビットの桁数をバイナリデータに符号化する。この際、ビットの桁数のバイナリデータに、非参照フラグ（復号時にビット列識別番号記憶部１１２を参照しないことを示すフラグ）を関連付ける。また、上記ビット列が圧縮境界を跨ぐ場合は、ビット列の先頭から圧縮境界までのビットの桁数をバイナリデータに符号化し、このバイナリデータに上記非参照フラグを関連付ける。（ステップＳ１９）。 Next, for each bit string appearing in the binary data 2 before final encoding that has no associated bit string identification number, the encoding unit 115 with the non-reference flag encodes into binary data the number of bits from the beginning of that bit string up to the point where a bit string associated with a bit string identification number appears. A non-reference flag (a flag indicating that the bit string identification number storage unit 112 is not to be referred to at the time of decoding) is associated with the binary data representing this number of bits. If the bit string straddles a compression boundary, the number of bits from the beginning of the bit string to the compression boundary is encoded into binary data, and the non-reference flag is associated with that binary data (step S19).
さらに、非参照フラグ付き符号化部１１５は、ビット列識別番号が関連付けられたビット列のうち圧縮境界を跨ぐためビット列識別番号に変換されなかったビット列についても、圧縮境界を跨ぐ場合と跨がない場合に応じて同様の処理を行う（ステップＳ２０）。 Furthermore, the encoding unit 115 with the non-reference flag performs the same processing on bit strings that are associated with a bit string identification number but were not converted into it because they straddle a compression boundary, depending on whether or not the compression boundary is straddled (step S20).
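The interplay of steps S17 to S20 can be sketched as a tokenizer. This is a simplified model, not the bit-level format of the specification: flags are represented by the tags `"ref"` and `"raw"`, the bit count is kept as a plain integer instead of being encoded, and matching is greedy from left to right. The name `tokenize` and the token layout are assumptions.

```python
def tokenize(segments, table):
    """Turn each segment into ("ref", id) and ("raw", bit_count, bits) tokens.

    segments: bit strings already split at the compression boundaries.
    table:    mapping from frequent bit string to its identification number.
    """
    result = []
    for seg in segments:
        tokens, i, raw_start = [], 0, 0
        while i < len(seg):
            match = next((s for s in table if seg.startswith(s, i)), None)
            if match is None:
                i += 1
                continue
            if raw_start < i:  # flush the unmatched run before the match
                tokens.append(("raw", i - raw_start, seg[raw_start:i]))
            tokens.append(("ref", table[match]))  # reference-flag token
            i += len(match)
            raw_start = i
        if raw_start < len(seg):  # unmatched run ends at the boundary
            tokens.append(("raw", len(seg) - raw_start, seg[raw_start:]))
        result.append(tokens)
    return result


print(tokenize(["110101", "01", "01"], {"0101": 0}))
# [[('raw', 2, '11'), ('ref', 0)], [('raw', 2, '01')], [('raw', 2, '01')]]
```

Because each segment is tokenized on its own, the occurrence of `0101` that spans the last two segments stays as raw bits, which is what lets any one segment be decoded without reading its neighbours.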

そして、非参照フラグ付き符号化部１１５は、ステップＳ１０からＳ２０までの処理によって辞書データ１が符号化されたバイナリデータ（圧縮データ）を圧縮データ記憶部１１６に記憶する（ステップＳ２１）。 Then, the encoding unit 115 with the non-reference flag stores the binary data (compressed data) obtained by encoding the dictionary data 1 by the processing from steps S10 to S20 in the compressed data storage unit 116 (step S21).

次に上記符号化処理によって符号化された圧縮データを部分的に復号する復号装置２００の物理構成及び機能構成を説明する。 Next, the physical configuration and functional configuration of the decoding apparatus 200 that partially decodes the compressed data encoded by the above encoding process will be described.

本実施形態に係る復号装置２００は、物理的には図１０に示すように構成される。即ち、復号装置２００は、ＲＯＭ２０と、ＲＡＭ２１と、外部記憶装置２２と、入力装置２３と、表示装置２４と、ＣＰＵ２５と、を備える。 The decoding device 200 according to the present embodiment is physically configured as shown in FIG. 10. That is, the decoding device 200 includes a ROM 20, a RAM 21, an external storage device 22, an input device 23, a display device 24, and a CPU 25.

ＲＯＭ２０は、各種初期設定、ハードウェアの検査、プログラムのロード等を行うための初期プログラムを記憶する。ＲＡＭ２１は、ＣＰＵ２５が実行する各種ソフトウェアプログラム、これらのソフトウェアプログラムの実行に必要なデータ等を一時的に記憶する。 The ROM 20 stores an initial program for performing various initial settings, hardware inspection, program loading, and the like. The RAM 21 temporarily stores various software programs executed by the CPU 25, data necessary for executing these software programs, and the like.

外部記憶装置２２は、例えば、ハードディスクであって、各種ソフトウェアプログラム、データ等を記憶する。これらソフトウェアプログラムの中には、アプリケーションソフトウェアプログラム、ＯＳのような基本ソフトウェアプログラム等が含まれている。 The external storage device 22 is a hard disk, for example, and stores various software programs, data, and the like. These software programs include application software programs, basic software programs such as an OS, and the like.

入力装置２３は、キーボード、マウス、トラックパッド等を備え、ユーザからの入力を受け付ける。入力装置２３は、キーボード、マウス、トラックパッド等からの入力に基づいて信号を生成し、ＣＰＵ２５に供給する。 The input device 23 includes a keyboard, a mouse, a track pad, and the like, and receives input from the user. The input device 23 generates a signal based on input from a keyboard, mouse, track pad, etc., and supplies the signal to the CPU 25.

表示装置２４は、液晶ディスプレイ等の画面を備え、ＣＰＵ２５から供給されたテキストデータや画像データを画面に表示する。 The display device 24 includes a screen such as a liquid crystal display, and displays text data and image data supplied from the CPU 25 on the screen.

ＣＰＵ２５は、外部記憶装置２２に記憶されたソフトウェアプログラムをＲＡＭ２１に読み出して、そのソフトウェアプログラムを実行制御することにより、以下の機能構成を実現する。 The CPU 25 implements the following functional configuration by reading the software program stored in the external storage device 22 into the RAM 21 and executing and controlling the software program.

復号装置２００は、機能的には図１１に示すように構成される。即ち、復号装置２００は、ビット列識別番号記憶部２０１と、圧縮データ記憶部２０２と、開始位置記憶部２０３と、ビット列識別番号復号方法記憶部２０４と、入力部２０５と、ビット列識別番号復号部２０６と、ビット列識別番号変換部２０７と、文字列識別番号復号方法記憶部２０８と、文字識別番号記憶部２０９と、復号部２１０と、変換部２１１と、表示部２１２と、を備える。なお、ビット列識別番号記憶部２０１、圧縮データ記憶部２０２、開始位置記憶部２０３、文字識別番号記憶部２０９は、それぞれ符号化装置１００が備えるビット列識別番号記憶部１１２、圧縮データ記憶部１１６、開始位置記憶部１０９、文字識別番号記憶部１０６が記憶するデータと同じデータを記憶している。ビット列識別番号記憶部２０１と、圧縮データ記憶部２０２と、開始位置記憶部２０３と、ビット列識別番号復号方法記憶部２０４と、文字識別番号復号方法記憶部２０８と、文字識別番号記憶部２０９と、は図１０に示す外部記憶装置２２に構築されている。 The decoding device 200 is functionally configured as shown in FIG. 11. That is, the decoding device 200 includes a bit string identification number storage unit 201, a compressed data storage unit 202, a start position storage unit 203, a bit string identification number decoding method storage unit 204, an input unit 205, a bit string identification number decoding unit 206, a bit string identification number conversion unit 207, a character identification number decoding method storage unit 208, a character identification number storage unit 209, a decoding unit 210, a conversion unit 211, and a display unit 212. The bit string identification number storage unit 201, the compressed data storage unit 202, the start position storage unit 203, and the character identification number storage unit 209 store the same data as the bit string identification number storage unit 112, the compressed data storage unit 116, the start position storage unit 109, and the character identification number storage unit 106 of the encoding device 100, respectively. The bit string identification number storage unit 201, the compressed data storage unit 202, the start position storage unit 203, the bit string identification number decoding method storage unit 204, the character identification number decoding method storage unit 208, and the character identification number storage unit 209 are constructed in the external storage device 22 shown in FIG. 10.

ビット列識別番号記憶部２０１は、最終符号化前バイナリデータに出現する同一のビット列ごとに、ビット列を識別するビット列識別番号を関連付けて記憶する。 The bit string identification number storage unit 201 stores a bit string identification number for identifying a bit string in association with each bit string that appears in the binary data before final encoding.

圧縮データ記憶部２０２は、符号化対象のデータが上記符号化処理によって符号化されたバイナリデータ（圧縮データ）を記憶する。 The compressed data storage unit 202 stores binary data (compressed data) in which data to be encoded is encoded by the encoding process.

開始位置記憶部２０３は、符号化対象のデータに含まれる見出し語の開始位置（圧縮境界）を記憶する。 The start position storage unit 203 stores the start position (compression boundary) of the headword included in the data to be encoded.

ビット列識別番号復号方法記憶部２０４は、圧縮データに含まれるビット列識別番号のバイナリデータをビット列識別番号に復号する方法を記憶する。具体例を挙げると、ビット列識別番号のバイナリデータがデルタ符号化方法を用いて符号化されている場合には、ビット列識別番号復号方法記憶部２０４は、デルタ符号化方法で符号化されたバイナリデータを符号化前の元データに戻すロジックをビット列識別番号復号方法として記憶する。 The bit string identification number decoding method storage unit 204 stores a method for decoding the binary data of a bit string identification number included in the compressed data into the bit string identification number. As a specific example, when the binary data of the bit string identification numbers has been encoded using a delta encoding method, the bit string identification number decoding method storage unit 204 stores, as the bit string identification number decoding method, the logic that restores binary data encoded by the delta encoding method to the original data before encoding.

入力部２０５は、入力装置２３が生成した信号を受け付ける。この信号は、ユーザが入力装置２３を介して指定した見出し語を示す。入力部２０５は、ユーザが指定した見出し語及びその見出し語の例文であることを復号対象のデータを特定するための条件に設定する。そして、ビット列識別番号復号部２０６に設定した条件を入力する。 The input unit 205 receives the signal generated by the input device 23. This signal indicates the headword that the user specified via the input device 23. The input unit 205 sets, as the condition for identifying the data to be decoded, that the data be the headword specified by the user or an example sentence of that headword. It then inputs the condition it has set to the bit string identification number decoding unit 206.

ビット列識別番号復号部２０６は、圧縮データのうち入力部２０５から入力された条件を満たすデータを復号対象のデータとして特定する。例えば、見出し語「ｔｈｅ」とその例文であることが条件であれば、圧縮データに含まれる見出し語「ｔｈｅ」とその例文を復号対象のデータとして特定する。なお、圧縮データにおける見出し語及び例文の位置は、開始位置記憶部２０３が記憶する各見出し語の開始位置に基づいて特定される。 The bit string identification number decoding unit 206 specifies data among the compressed data that satisfies the condition input from the input unit 205 as data to be decoded. For example, if the condition is the headword “the” and its example sentence, the headword “the” and its example sentence included in the compressed data are specified as data to be decoded. Note that the positions of headwords and example sentences in the compressed data are specified based on the start positions of the headwords stored in the start position storage unit 203.
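How the stored start positions can delimit the data to be decoded may be sketched as follows. The layout is an assumption made for illustration: start positions are bit offsets into the compressed data keyed by headword, and an entry (the headword and its example sentences) runs up to the next start position. The function name `entry_span` is hypothetical.

```python
def entry_span(start_positions, headword, total_bits):
    """Return the (begin, end) bit offsets of the entry for a headword."""
    ordered = sorted(start_positions.values())
    begin = start_positions[headword]
    idx = ordered.index(begin)
    # The entry ends where the next entry starts, or at the end of the data.
    end = ordered[idx + 1] if idx + 1 < len(ordered) else total_bits
    return begin, end


starts = {"a": 0, "the": 120, "zoo": 200}  # made-up offsets
print(entry_span(starts, "the", total_bits=260))  # (120, 200)
```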

また、ビット列識別番号復号部２０６は、見出し語の開始位置に参照フラグと非参照フラグのどちらが存在するか判別する。参照フラグが存在すると判別した場合、参照フラグに関連付けられているビット列識別番号のバイナリデータをビット列識別番号に復号する。復号方法は、ビット列識別番号復号方法記憶部２０４が記憶する復号方法が採用される。
一方、非参照フラグが存在すると判別した場合、復号部２１０に復号処理の制御を移す。 Also, the bit string identification number decoding unit 206 determines whether a reference flag or a non-reference flag exists at the start position of the headword. When it is determined that the reference flag exists, the binary data of the bit string identification number associated with the reference flag is decoded into the bit string identification number. As a decoding method, a decoding method stored in the bit string identification number decoding method storage unit 204 is employed.
On the other hand, when it is determined that the non-reference flag exists, the control of the decoding process is transferred to the decoding unit 210.

ビット列識別番号変換部２０７は、ビット列識別番号記憶部２０１を参照し、ビット列識別番号復号部２０６が復号したビット列識別番号を、ビット列識別番号に関連付けられたビット列に変換する。そして、変換したビット列をキューにコピーする。 The bit string identification number conversion unit 207 refers to the bit string identification number storage unit 201 and converts the bit string identification number decoded by the bit string identification number decoding unit 206 into a bit string associated with the bit string identification number. Then, the converted bit string is copied to the queue.

文字識別番号復号方法記憶部２０８は、圧縮データに含まれる文字識別番号のバイナリデータを文字識別番号に復号する方法を記憶している。具体例を挙げると、文字識別番号のバイナリデータがデルタ符号化方法を用いて符号化されている場合には、文字識別番号復号方法記憶部２０８は、デルタ符号化方法で符号化されたバイナリデータを符号化前の元データに戻すロジックを文字列識別番号復号方法として記憶する。 The character identification number decoding method storage unit 208 stores a method for decoding the binary data of a character identification number included in the compressed data into the character identification number. As a specific example, when the binary data of the character identification numbers has been encoded using a delta encoding method, the character identification number decoding method storage unit 208 stores, as the character identification number decoding method, the logic that restores binary data encoded by the delta encoding method to the original data before encoding.

文字識別番号記憶部２０９は、圧縮データに含まれる文字と、文字を識別する文字識別番号と、を関連付けて記憶している。 The character identification number storage unit 209 stores a character included in the compressed data and a character identification number for identifying the character in association with each other.

復号部２１０は、ビット列識別番号変換部２０７がキューにコピーしたビット列を文字識別番号に復号する。
一方、復号部２１０は、ビット列識別番号復号部２０６から復号処理の制御を移された場合、非参照フラグに関連付けられているビットの桁数を示すデータを読み込む。そして、上記桁数分のバイナリデータをさらに読み込み、キューにコピーする。そして、キューにコピーしたバイナリデータを文字識別番号に復号する。
なお、復号方法は、文字識別番号復号方法記憶部２０８が記憶する復号方法が採用される。 The decoding unit 210 decodes the bit string copied to the queue by the bit string identification number conversion unit 207 into a character identification number.
On the other hand, when the decoding process is transferred from the bit string identification number decoding unit 206, the decoding unit 210 reads data indicating the number of digits of the bit associated with the non-reference flag. Then, the binary data corresponding to the number of digits is further read and copied to the queue. Then, the binary data copied to the queue is decoded into a character identification number.
Note that the decoding method stored in the character identification number decoding method storage unit 208 is employed.

変換部２１１は、文字識別番号記憶部２０９を参照し、復号部２１０が復号した文字識別番号を文字に変換する。 The conversion unit 211 refers to the character identification number storage unit 209 and converts the character identification number decoded by the decoding unit 210 into a character.

表示部２１２は、圧縮データが復号されたデータ（符号化前の元データ）を表示装置２４に表示する。 The display unit 212 displays data obtained by decoding the compressed data (original data before encoding) on the display device 24.

以上のような復号装置２００が実行する復号処理の流れについて、図１２に示すフローチャートを参照して説明する。 The flow of the decoding process executed by the decoding device 200 described above will be explained with reference to the flowchart shown in FIG. 12.

［復号処理］ [Decoding process]
ここで、圧縮データ記憶部２０２が記憶する圧縮データは、図４に示す辞書データ１のバイナリデータであるとする。そして、ユーザが、「ｔｈｅ」の定義や「ｔｈｅ」の使用例を調べるにために、入力装置２３に見出し語「ｔｈｅ」を入力したとする。この場合、入力装置２３は、入力された見出し語「ｔｈｅ」を示す信号を生成し、入力部２０５に供給する。入力部２０５は、入力装置２３から供給された信号を受け付け、入力された見出し語が「ｔｈｅ」であることを特定する。そして、見出し語「ｔｈｅ」とその例文であることを復号対象の条件に設定し、ビット列識別番号復号部２０６に設定した条件を入力する。ビット列識別番号復号部２０６は、入力部２０５から上記条件を受け付け、図１２に示す復号処理を開始する。 Here, assume that the compressed data stored in the compressed data storage unit 202 is the binary data of the dictionary data 1 shown in FIG. 4, and that the user has input the headword “the” to the input device 23 in order to look up the definition and usage examples of “the”. In this case, the input device 23 generates a signal indicating the input headword “the” and supplies it to the input unit 205. The input unit 205 receives the signal supplied from the input device 23 and identifies the input headword as “the”. It then sets, as the condition for the data to be decoded, that the data be the headword “the” or its example sentences, and inputs this condition to the bit string identification number decoding unit 206. The bit string identification number decoding unit 206 receives the condition from the input unit 205 and starts the decoding process shown in FIG. 12.

ビット列識別番号復号部２０６は、圧縮データのうち上記条件を満たすデータを復号対象のデータとして特定する（ステップＳ３０）。ここで上記条件を満たすのは見出し語「ｔｈｅ」とその例文である。従って、見出し語「ｔｈｅ」とその例文が復号対象のデータとして特定される。 The bit string identification number decoding unit 206 identifies data among the compressed data that satisfies the above conditions as data to be decoded (step S30). Here, the headword “the” and its example sentences satisfy the above conditions. Therefore, the headword “the” and its example sentence are specified as data to be decoded.

次に、ビット列識別番号復号部２０６は、見出し語「ｔｈｅ」の開始位置に参照フラグと非参照フラグのどちらが存在するか判別する（ステップＳ３１）。参照フラグが存在すると判別した場合、参照フラグに関連付けられているビット列識別番号のバイナリデータを読み込む。そして、ビット列識別番号復号方法記憶部２０４が記憶する復号方法を用いて、ビット列識別番号のバイナリデータをビット列識別番号に復号する（ステップＳ３２）。なお、ここではステップＳ３２において復号されたビット列識別番号を「０」とする。 Next, the bit string identification number decoding unit 206 determines whether a reference flag or a non-reference flag exists at the start position of the headword “the” (step S31). If it is determined that the reference flag exists, the binary data of the bit string identification number associated with the reference flag is read. Then, using the decoding method stored in the bit string identification number decoding method storage unit 204, the binary data of the bit string identification number is decoded into the bit string identification number (step S32). Here, the bit string identification number decoded in step S32 is “0”.

ビット列識別番号変換部２０７は、ビット列識別番号記憶部２０１を参照して、ビット列識別番号復号部２０６が復号したビット列識別番号を、ビット列識別番号に関連付けられたビット列に変換する（ステップＳ３３）。そして、変換したビット列をキューにコピーする（ステップＳ３４）。具体的には、ステップＳ３２において復号されたビット列識別番号「０」を、ビット列識別番号「０」に関連付けられたビット列「１２Ａ５Ｂ８ＣＡ」（図９参照）に変換し、ビット列「１２Ａ５Ｂ８ＣＡ」をキューにコピーする。 The bit string identification number conversion unit 207 refers to the bit string identification number storage unit 201 and converts the bit string identification number decoded by the bit string identification number decoding unit 206 into the bit string associated with it (step S33). It then copies the converted bit string to the queue (step S34). Specifically, the bit string identification number “0” decoded in step S32 is converted into the bit string “12A5B8CA” associated with the bit string identification number “0” (see FIG. 9), and the bit string “12A5B8CA” is copied to the queue.

復号部２１０は、キューに存在するビット列を、文字識別番号復号方法記憶部２０８が記憶する復号方法を用いて文字識別番号に復号する（ステップＳ３５）。ここでは、ビット列「１２Ａ５Ｂ８ＣＡ」が文字識別番号「０」、「１」、「２」、「３」に復号されたとする。 The decoding unit 210 decodes the bit string existing in the queue into the character identification number using the decoding method stored in the character identification number decoding method storage unit 208 (step S35). Here, it is assumed that the bit string “12A5B8CA” is decoded into character identification numbers “0”, “1”, “2”, and “3”.

変換部２１１は、文字識別番号記憶部２０９を参照し、復号部２１０が復号した文字識別番号を文字に変換する（ステップＳ３６）。具体的には、ステップＳ３５で復号された文字識別番号「０」、「１」、「２」、「３」を、それぞれの文字識別番号に関連付けられた文字「ｔ」、「ｈ」、「ｅ」、「ｓ」（図６参照）に変換する。 The conversion unit 211 refers to the character identification number storage unit 209 and converts the character identification numbers decoded by the decoding unit 210 into characters (step S36). Specifically, the character identification numbers “0”, “1”, “2”, and “3” decoded in step S35 are converted into the characters “t”, “h”, “e”, and “s” associated with those character identification numbers (see FIG. 6).

文字への変換を終えると、変換部２１１は、全ての復号対象のデータを文字に変換したか否かを判別する（ステップＳ３７）。 When the conversion to characters is completed, the conversion unit 211 determines whether all the data to be decoded has been converted to characters (step S37).

変換部２１１は、復号対象のデータに変換されていない部分が存在すると判別した場合（ステップＳ３７；Ｎｏ）、復号処理の制御をビット列識別番号復号部２０６に移す。この場合、ビット列識別番号復号部２０６は、復号済みであるビット列識別番号のバイナリデータの後尾に参照フラグと非参照フラグのどちらが存在するか判別する（ステップＳ３１）。ビット列識別番号復号部２０６は、非参照フラグが存在すると判別した場合、復号処理の制御を復号部２１０に移す。この場合、復号部２１０は、非参照フラグに関連付けられているビットの桁数のバイナリデータを読み込む（ステップＳ３８）。そして、復号済みであるビット列識別番号のバイナリデータの後尾から上記桁数分のバイナリデータを読み込み（ステップＳ３９）、キューにコピーする（ステップＳ４０）。 When the conversion unit 211 determines that part of the data to be decoded has not yet been converted (step S37; No), it transfers control of the decoding process to the bit string identification number decoding unit 206. In this case, the bit string identification number decoding unit 206 determines whether a reference flag or a non-reference flag follows the already decoded binary data of the bit string identification number (step S31). If it determines that a non-reference flag is present, it transfers control of the decoding process to the decoding unit 210. The decoding unit 210 then reads the binary data representing the number of bits associated with the non-reference flag (step S38), reads that number of bits of binary data starting from the end of the already decoded binary data of the bit string identification number (step S39), and copies it to the queue (step S40).

復号部２１０は、キューに存在するバイナリデータを、文字識別番号復号方法記憶部２０８が記憶する復号方法を用いて文字識別番号に復号する（ステップＳ３５）。 The decoding unit 210 decodes the binary data present in the queue into a character identification number using the decoding method stored in the character identification number decoding method storage unit 208 (step S35).

変換部２１１は、文字識別番号記憶部２０９を参照し、復号部２１０が復号した文字識別番号を文字に変換する（ステップＳ３６）。 The conversion unit 211 refers to the character identification number storage unit 209 and converts the character identification number decoded by the decoding unit 210 into a character (step S36).

文字への変換を終えると、変換部２１１は、全ての復号対象のデータを文字に変換したか否かを判別する（ステップＳ３７）。復号対象のデータに変換されていない部分が存在すると判別した場合（ステップＳ３７；Ｎｏ）、復号処理の制御をビット列識別番号復号部２０６に移す。そして、ステップＳ３１からＳ３７までの処理が、全ての復号対象のデータが文字に変換されるまで繰り返し実行される。 When the conversion to characters is completed, the conversion unit 211 determines whether all of the data to be decoded has been converted into characters (step S37). If it determines that part of the data to be decoded has not yet been converted (step S37; No), it transfers control of the decoding process to the bit string identification number decoding unit 206. The processing from step S31 to step S37 is then repeated until all of the data to be decoded has been converted into characters.

ステップＳ３７において、変換部２１１が全ての復号対象のデータを文字に変換したと判別した場合（ステップＳ３７；Ｙｅｓ）、表示部２１２は、変換された文字を表示装置２４に表示する（ステップＳ４１）。具体的には、見出し語「ｔｈｅ」とその例文を表示装置２４に表示する。 If the conversion unit 211 determines in step S37 that all of the data to be decoded has been converted into characters (step S37; Yes), the display unit 212 displays the converted characters on the display device 24 (step S41). Specifically, the headword “the” and its example sentences are displayed on the display device 24.
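The decoding loop of steps S31 to S37 can be sketched as below, under several assumptions that are not taken from the specification: tokens are modeled as `("ref", id)` or `("raw", bit_count, bits)` tuples instead of flagged binary data, character identification numbers are Elias delta coded with an offset of 1, and the queue is decoded once after all tokens of the entry have been copied rather than after each copy. All names are hypothetical.

```python
def elias_delta_decode(bits, pos):
    """Decode one Elias delta code starting at pos; return (value, new_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    pos += zeros
    nbits = int(bits[pos:pos + zeros + 1], 2)  # bit length of the value
    pos += zeros + 1
    value = int("1" + bits[pos:pos + nbits - 1], 2)
    return value, pos + nbits - 1


def decode_entry(tokens, bit_strings, chars):
    """Decode the tokens of one entry back into text."""
    queue = ""
    for token in tokens:
        if token[0] == "ref":   # reference flag: look up the stored bit string
            queue += bit_strings[token[1]]
        else:                   # non-reference flag: copy the raw bits
            queue += token[2]
    text, pos = [], 0
    while pos < len(queue):     # bits -> character identification numbers
        value, pos = elias_delta_decode(queue, pos)
        text.append(chars[value - 1])  # identification number -> character
    return "".join(text)


tokens = [("ref", 0), ("raw", 4, "0101")]
print(decode_entry(tokens, {0: "10100"}, ["t", "h", "e"]))  # the
```

Only the two lookup tables and the tokens of the requested entry are touched, which mirrors the point of the description that an entry can be decoded without decoding the data before it.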

このように、上記復号処理では、圧縮データの一部がユーザに指定された見出し語の開始位置から逐次的に復号される。その際、参照フラグと非参照フラグのどちらが存在するかが復号前に判別され、復号しようとしているバイナリデータが文字識別番号のバイナリデータかビット列識別番号のバイナリデータかが特定される。ここで、参照フラグが存在すると判別された場合は、ビット列識別番号のバイナリデータであると特定され、非参照フラグが存在すると判別された場合は、文字識別番号のバイナリデータであると特定される。そして、特定されたバイナリデータの種別に応じた復号方法で、文字識別番号のバイナリデータは文字識別番号に復号され、ビット列識別番号のバイナリデータはビット列に復号される。
ここで注目すべき点は、文字識別番号を文字に変換するために必要となる変換前の元データ、ビット列を文字識別番号に変換するために必要となる変換前の元データを文字識別番号記憶部２０９とビット列識別番号記憶部２０１から自在に取得できることである。これは、圧縮データの一部である復号対象のデータに変換前の元データが存在しなくても、元データに変換可能ということを意味する。従って、従来の符号化方法・復号方法であれば、符号化対象のデータを予め部分的に符号化しておかなければ部分的復号を実行できなかったが、本願発明によればそのようなことをしなくても部分的復号を実行できる。 Thus, in the decoding process, a part of the compressed data is sequentially decoded from the start position of the headword designated by the user. At this time, it is determined before decoding whether a reference flag or a non-reference flag exists, and it is specified whether the binary data to be decoded is binary data of a character identification number or binary data of a bit string identification number. Here, when it is determined that the reference flag exists, it is specified as binary data of the bit string identification number, and when it is determined that the non-reference flag exists, it is specified as binary data of the character identification number. . Then, the binary data of the character identification number is decoded into the character identification number and the binary data of the bit string identification number is decoded into the bit string by a decoding method corresponding to the type of the specified binary data.
What should be noted here is that the pre-conversion original data needed to convert character identification numbers into characters, and the pre-conversion original data needed to convert bit strings into character identification numbers, can be freely obtained from the character identification number storage unit 209 and the bit string identification number storage unit 201. This means that the data to be decoded, which is a part of the compressed data, can be converted back into the original data even if the pre-conversion original data is not present in it. Therefore, whereas conventional encoding and decoding methods could not perform partial decoding unless the data to be encoded had been partially encoded in advance, the present invention can perform partial decoding without doing so.

以上説明したように、本実施形態に係る符号化装置１００は、符号化対象のデータを文字列識別番号及びビット列識別番号のバイナリデータに符号化する。そして、復号装置２００は、文字識別番号記憶部２０９とビット列識別番号記憶部２０１を参照し、符号化装置１００が符号化したバイナリデータ（圧縮データ）を部分的に復号する。これらは、上述したように、符号化対象のデータを部分的に符号化しなくても圧縮データの部分的復号を可能とする。従って、従来の符号化方法・復号方法に比べて符号化に手間がかからず、符号化効率が高い。 As described above, the encoding apparatus 100 according to the present embodiment encodes the data to be encoded into binary data of character identification numbers and bit string identification numbers. The decoding device 200 then refers to the character identification number storage unit 209 and the bit string identification number storage unit 201 and partially decodes the binary data (compressed data) encoded by the encoding device 100. As described above, this makes partial decoding of the compressed data possible without partially encoding the data to be encoded. Therefore, compared with conventional encoding and decoding methods, encoding takes less effort and the encoding efficiency is higher.

（変形例） (Modification)
以上に本発明の実施形態について説明したが、上記実施形態は一例であり、本発明の適用範囲はこれに限られない。すなわち、本発明の実施形態は種々の応用が可能であり、あらゆる実施の形態が本発明の範囲に含まれる。 Although the embodiment of the present invention has been described above, the above embodiment is an example, and the scope of application of the present invention is not limited to this. That is, the embodiments of the present invention can be applied in various ways, and all the embodiments are included in the scope of the present invention.

例えば、上記実施形態では、出現頻度が１から５番目に高いビット列にビット列識別番号を関連付けたが、ビット列識別番号を関連付ける対象が、出現頻度が１から５番目に高いビット列に限定されるわけではない。例えば、出現頻度が１から１０番目に高いビット列にビット列識別番号を関連付けてもよいし、全てのビット列にビット列識別番号を関連付けてもよい。ただし、出現頻度が低いビット列をコード長が長いビット列識別番号のバイナリデータに符号化するとかえって圧縮率が低下するため、ビット列識別番号を関連付ける対象を出現頻度が高いビット列に限定したほうが好ましい。 For example, in the above embodiment, bit string identification numbers are associated with the bit strings having the first to fifth highest appearance frequencies, but the bit strings to which identification numbers are assigned are not limited to these. For example, bit string identification numbers may be associated with the bit strings having the first to tenth highest appearance frequencies, or with all bit strings. However, encoding a bit string with a low appearance frequency into the binary data of a bit string identification number with a long code length actually lowers the compression ratio, so it is preferable to limit the bit strings associated with identification numbers to those with high appearance frequencies.
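The trade-off described here can be made concrete with a rough cost model. The model is an assumption introduced for illustration: each replaced occurrence saves the difference between the bit string length and the length of its code (flag plus encoded identification number), and each table entry costs its own length once because the table must be available to the decoder. The function name and the code lengths are hypothetical; only the 150 occurrences and the 8 hexadecimal digits (32 bits) come from the example in the description.

```python
def net_saving_bits(freq, length_bits, code_bits, table_entry_bits):
    """Bits saved by replacing every occurrence of one bit string."""
    return freq * (length_bits - code_bits) - table_entry_bits


# Most frequent bit string: 150 occurrences of a 32-bit string, 4-bit code.
print(net_saving_bits(150, 32, 4, 32))  # 4168
# A string that occurs once and gets a 12-bit code costs more than it saves.
print(net_saving_bits(1, 32, 12, 32))   # -12
```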

また、上記実施形態では、見出し語の開始位置を圧縮境界としたが、見出し語の開始位置と見出し語の例文の開始位置を圧縮境界としてもよい。 In the above embodiment, the start position of the headword is used as the compression boundary. However, the start position of the headword and the start position of the example sentence of the headword may be used as the compression boundary.

In the above embodiment, the encoding device 100 and the decoding device 200 are described as devices independent of each other. However, the encoding device 100 may have all the functions of the decoding device 200 and operate as a decoding device, or the decoding device 200 may have all the functions of the encoding device 100 and operate as an encoding device.

The decoding device 200 may also include a specifying unit that specifies the data to be decoded from the compressed data. In that case, in step S30 of the decoding process shown in FIG. 12, the specifying unit may specify the data to be decoded from the compressed data, and the bit string identification number decoding unit 206 decodes the data to be decoded specified by the specifying unit.

The present invention can be provided not only as an encoding device and a decoding device that already have the configuration for realizing the functions according to the present invention. By applying a program, an existing personal computer, information terminal device, or the like can also be made to function as the encoding device or the decoding device according to the present invention. That is, by applying a program for realizing each functional configuration of the encoding device and the decoding device exemplified in the above embodiment so that it can be executed by a CPU or the like that controls an existing personal computer, information terminal device, or the like, that computer or device can be made to function as the encoding device or the decoding device according to the present invention. The encoding method and the decoding method according to the present invention can be carried out using the encoding device and the decoding device.

The method of applying such a program is arbitrary. For example, the program can be applied by storing it on a computer-readable recording medium [CD-ROM (Compact Disc Read-Only Memory), DVD (Digital Versatile Disc), MO (Magneto Optical disc), etc.], or by storing it in storage on a network such as the Internet and having it downloaded.

The preferred embodiment of the present invention has been described above. However, the present invention is not limited to that specific embodiment, and the present invention includes the invention described in the claims and its equivalents. The inventions described in the claims as originally filed in the present application are appended below.

(Appendix 1)
An encoding device comprising:
a character identification number storage unit that stores a character included in data to be encoded and a character identification number for identifying the character in association with each other;
a conversion unit that refers to the character identification number storage unit and converts a character included in the data to be encoded into the character identification number associated with the character;
an encoding unit that encodes the character identification number converted by the conversion unit into binary data;
a bit string identification number association unit that, for each identical bit string appearing in the binary data of the character identification number encoded by the encoding unit, associates a bit string identification number that identifies the bit string and stores it in a bit string identification number storage unit;
a bit string identification number conversion unit that refers to the bit string identification number storage unit and converts each identical bit string appearing in the binary data of the character identification number into the bit string identification number associated with the bit string; and
an encoding unit with a reference flag that associates, with the bit string identification number converted by the bit string identification number conversion unit, a reference flag indicating that the bit string identification number storage unit is referred to at the time of decoding, and encodes the bit string identification number into binary data.

(Appendix 2)
The encoding device according to appendix 1, wherein
the character identification number is smaller the more frequently the character with which it is associated appears in the data to be encoded, and
the encoding unit encodes the character identification number into binary data using an encoding method in which the smaller the character identification number is, the smaller the amount of binary data obtained by encoding it.
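The frequency-ranked numbering combined with a code that favours small numbers can be sketched as follows. The appendix does not name a specific code in this passage, so Elias gamma coding is used here purely as an assumed stand-in for "an encoding method in which smaller numbers yield less data"; the function names are hypothetical.

```python
from collections import Counter


def assign_character_ids(text):
    """Give ID 1 to the most frequent character, 2 to the next, and so on."""
    ranked = Counter(text).most_common()
    return {ch: rank for rank, (ch, _count) in enumerate(ranked, start=1)}


def elias_gamma(n):
    """Encode a positive integer; smaller values get shorter codewords.

    Assumption: Elias gamma is only an example of such a code.
    """
    if n < 1:
        raise ValueError("Elias gamma encodes positive integers only")
    binary = bin(n)[2:]
    # (len - 1) zeros announce the codeword length, then the binary digits.
    return "0" * (len(binary) - 1) + binary


def encode_text(text, char_ids):
    """Concatenate the codewords of each character's identification number."""
    return "".join(elias_gamma(char_ids[ch]) for ch in text)


char_ids = assign_character_ids("banana")
encoded = encode_text("banana", char_ids)
```

In this toy input the most frequent character receives a one-bit codeword while the rarest receives three bits, which is the effect the appendix relies on.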

(Appendix 3)
The encoding device according to appendix 1 or 2, further comprising:
an encoding unit with a non-reference flag that, for a bit string that appears in the binary data of the character identification number and is not associated with a bit string identification number, associates a non-reference flag, which indicates that the bit string identification number storage unit is not referred to at the time of decoding, with the number of bits from the beginning of that bit string until a bit string associated with a bit string identification number appears, and encodes the number of bits into binary data.
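The interplay of reference and non-reference flags can be pictured with a small tokenizer. This is a minimal sketch under several assumptions: tokens are kept as Python tuples rather than packed binary, a literal token carries its raw bits alongside the bit count, and registered bit strings are matched greedily from left to right. None of these details is stated in the appendix.

```python
def tokenize(bits, table, width):
    """Split bits into ('reference', id) and ('literal', n_bits, raw_bits) tokens.

    table maps registered width-bit strings to identification numbers.
    """
    tokens = []
    i = 0
    literal_start = 0
    while i < len(bits):
        chunk = bits[i:i + width]
        if chunk in table:
            if literal_start < i:
                # Unregistered bits seen before this registered bit string.
                tokens.append(("literal", i - literal_start, bits[literal_start:i]))
            tokens.append(("reference", table[chunk]))
            i += width
            literal_start = i
        else:
            i += 1
    if literal_start < len(bits):
        tokens.append(("literal", len(bits) - literal_start, bits[literal_start:]))
    return tokens


def detokenize(tokens, table):
    """Rebuild the original bits; only 'reference' tokens consult the table."""
    inverse = {number: bit_string for bit_string, number in table.items()}
    return "".join(inverse[t[1]] if t[0] == "reference" else t[2] for t in tokens)


table = {"0101": 1}
tokens = tokenize("110101001", table, width=4)
```

The bit count stored in each literal token tells a decoder how far to read before the next flag, so the table lookup is skipped for those bits.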

(Appendix 4)
The encoding device according to any one of appendices 1 to 3, wherein
the bit string identification number is smaller the more frequently the identical bit string with which it is associated appears in the binary data of the character identification number, and
the encoding unit with a reference flag encodes the bit string identification number into binary data using an encoding method in which the smaller the bit string identification number is, the smaller the amount of binary data obtained by encoding it.

(Appendix 5)
The encoding device according to appendix 4, wherein
the data to be encoded includes headwords, and
the appearance frequency of the identical bit string does not include occurrences of the identical bit string that straddle the start position of a headword in the data to be encoded.

(Appendix 6)
The encoding device according to appendix 3, wherein
the data to be encoded includes headwords,
the bit string identification number conversion unit does not convert a bit string associated with a bit string identification number into the bit string identification number when the bit string straddles the start position of a headword, and
the encoding unit with a non-reference flag, for a bit string that is associated with a bit string identification number but was not converted into the bit string identification number by the bit string identification number conversion unit, associates the non-reference flag with the number of bits from the beginning of the bit string to the start position of the headword that the bit string straddles, and encodes the number of bits into binary data.

(Appendix 7)
The encoding device according to appendix 6, wherein
when a bit string that is not associated with a bit string identification number straddles the start position of a headword, the encoding unit with a non-reference flag associates the non-reference flag with the number of bits from the beginning of the bit string to the start position, and encodes the number of bits into binary data.
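The boundary rules of appendices 6 and 7 can be sketched by cutting the bit stream at every headword start position before tokenizing. In this hypothetical example, `boundaries` is a list of bit offsets at which headwords start, tokens are tuples rather than packed binary, and literal tokens carry their raw bits; these are all assumptions made for illustration.

```python
def tokenize_with_boundaries(bits, table, width, boundaries):
    """Tokenize so that no token straddles a headword start position."""
    cuts = sorted(b for b in set(boundaries) if 0 < b < len(bits)) + [len(bits)]
    tokens = []
    segment_start = 0
    for segment_end in cuts:
        i = literal_start = segment_start
        while i < segment_end:
            chunk = bits[i:i + width]
            # A registered bit string is converted only if it fits
            # entirely before the next headword start position.
            if i + width <= segment_end and chunk in table:
                if literal_start < i:
                    tokens.append(("literal", i - literal_start, bits[literal_start:i]))
                tokens.append(("reference", table[chunk]))
                i += width
                literal_start = i
            else:
                i += 1
        if literal_start < segment_end:
            # The literal run ends at the boundary, never beyond it.
            tokens.append(("literal", segment_end - literal_start,
                           bits[literal_start:segment_end]))
        segment_start = segment_end
    return tokens


table = {"0101": 1}
with_boundary = tokenize_with_boundaries("01010101", table, 4, boundaries=[6])
without_boundary = tokenize_with_boundaries("01010101", table, 4, boundaries=[])
```

Because every token ends at or before a headword start, a decoder can begin reading at any headword start position without needing bits from the preceding entry.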

(Appendix 8)
The encoding device according to any one of appendices 1 to 7, wherein
each identical bit string stored in the bit string identification number storage unit is 32 bits long.

(Appendix 9)
A decoding device comprising:
a compressed data storage unit that stores compressed data composed of binary data of character identification numbers, each identifying a character included in data to be encoded, and binary data of bit string identification numbers, each identifying an identical bit string that appears in the binary data of the character identification numbers;
a character identification number storage unit that stores a character included in the data to be encoded and the character identification number for identifying the character in association with each other;
a bit string identification number storage unit that stores, for each identical bit string appearing in the binary data of the character identification numbers, the bit string identification number that identifies the bit string in association with the bit string;
an input unit for inputting a condition for specifying data to be decoded;
a bit string identification number decoding unit that specifies, from the compressed data stored in the compressed data storage unit, data satisfying the condition as the data to be decoded, and that, of the binary data of the character identification numbers and the binary data of the bit string identification numbers constituting the specified data to be decoded, decodes the binary data of the bit string identification numbers into the bit string identification numbers;
a bit string identification number conversion unit that refers to the bit string identification number storage unit and converts each bit string identification number decoded by the bit string identification number decoding unit into the bit string associated with it;
a decoding unit that decodes the binary data of the character identification numbers constituting the data to be decoded and the bit strings converted by the bit string identification number conversion unit into the character identification numbers; and
a conversion unit that refers to the character identification number storage unit and converts each character identification number decoded by the decoding unit into the character associated with it.
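The decoding chain of appendix 9 (bit string identification numbers back to bit strings, bit strings to character identification numbers, numbers to characters) can be sketched as below. The token tuples, the use of Elias gamma as the character-number code, and the function name `decode_entry` are assumptions made for this example; the input is assumed to be well formed.

```python
def decode_entry(tokens, bit_string_table, char_table):
    """Decode one entry's tokens into text.

    bit_string_table: bit string identification number -> bit string
    char_table: character identification number -> character
    Assumption: character numbers were written as Elias gamma codewords.
    """
    # Step 1: expand reference tokens via the table; literals pass through.
    bits = "".join(
        bit_string_table[t[1]] if t[0] == "reference" else t[2] for t in tokens
    )
    # Step 2: read Elias gamma codewords to recover character numbers.
    chars = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        number = int(bits[i:i + zeros + 1], 2)
        i += zeros + 1
        # Step 3: map each character number back to its character.
        chars.append(char_table[number])
    return "".join(chars)


char_table = {1: "a", 2: "n", 3: "b"}
bit_string_table = {1: "0101"}
tokens = [("literal", 4, "0111"), ("reference", 1), ("reference", 1)]
text = decode_entry(tokens, bit_string_table, char_table)
```

Only the tokens of the requested entry are passed in, which mirrors the idea of decoding just the data that satisfies the input condition rather than the whole compressed file.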

(Appendix 10)
An encoding method comprising:
a conversion step of referring to a character identification number storage unit, which stores a character included in data to be encoded and a character identification number for identifying the character in association with each other, and converting a character included in the data to be encoded into the character identification number associated with the character;
an encoding step of encoding the character identification number converted in the conversion step into binary data;
a bit string identification number association step of, for each identical bit string appearing in the binary data of the character identification number encoded in the encoding step, associating a bit string identification number that identifies the bit string and storing it in a bit string identification number storage unit;
a bit string identification number conversion step of referring to the bit string identification number storage unit and converting each identical bit string appearing in the binary data of the character identification number into the bit string identification number associated with the bit string; and
an encoding step with a reference flag of associating, with the bit string identification number converted in the bit string identification number conversion step, a reference flag indicating that the bit string identification number storage unit is referred to at the time of decoding, and encoding the bit string identification number into binary data.

(Appendix 11)
A decoding method comprising:
an input step of inputting a condition for specifying data to be decoded;
a specifying step of specifying, as the data to be decoded, data that satisfies the condition from compressed data composed of binary data of character identification numbers, each identifying a character included in data to be encoded, and binary data of bit string identification numbers, each identifying an identical bit string that appears in the binary data of the character identification numbers;
a bit string identification number decoding step of decoding, of the binary data of the character identification numbers and the binary data of the bit string identification numbers constituting the data to be decoded specified in the specifying step, the binary data of the bit string identification numbers into the bit string identification numbers;
a bit string identification number conversion step of referring to a bit string identification number storage unit, which stores, for each identical bit string appearing in the binary data of the character identification numbers, the bit string identification number that identifies the bit string in association with the bit string, and converting each bit string identification number decoded in the bit string identification number decoding step into the bit string associated with it;
a decoding step of decoding the binary data of the character identification numbers constituting the data to be decoded and the bit strings converted in the bit string identification number conversion step into the character identification numbers; and
a conversion step of referring to a character identification number storage unit, which stores a character included in the data to be encoded and the character identification number for identifying the character in association with each other, and converting each character identification number decoded in the decoding step into the character associated with it.

(Appendix 12)
A program for causing a computer to function as:
a conversion unit that refers to a character identification number storage unit, which stores a character included in data to be encoded and a character identification number for identifying the character in association with each other, and converts a character included in the data to be encoded into the character identification number associated with the character;
an encoding unit that encodes the character identification number converted by the conversion unit into binary data;
a bit string identification number association unit that, for each identical bit string appearing in the binary data of the character identification number encoded by the encoding unit, associates a bit string identification number that identifies the bit string and stores it in a bit string identification number storage unit;
a bit string identification number conversion unit that refers to the bit string identification number storage unit and converts each identical bit string appearing in the binary data of the character identification number into the bit string identification number associated with the bit string; and
an encoding unit with a reference flag that associates, with the bit string identification number converted by the bit string identification number conversion unit, a reference flag indicating that the bit string identification number storage unit is referred to at the time of decoding, and encodes the bit string identification number into binary data.

(Appendix 13)
A program for causing a computer to function as:
an input unit for inputting a condition for specifying data to be decoded;
a specifying unit that specifies, as the data to be decoded, data that satisfies the condition from compressed data composed of binary data of character identification numbers, each identifying a character included in data to be encoded, and binary data of bit string identification numbers, each identifying an identical bit string that appears in the binary data of the character identification numbers;
a bit string identification number decoding unit that, of the binary data of the character identification numbers and the binary data of the bit string identification numbers constituting the data to be decoded specified by the specifying unit, decodes the binary data of the bit string identification numbers into the bit string identification numbers;
a bit string identification number conversion unit that refers to a bit string identification number storage unit, which stores, for each identical bit string appearing in the binary data of the character identification numbers, the bit string identification number that identifies the bit string in association with the bit string, and converts each bit string identification number decoded by the bit string identification number decoding unit into the bit string associated with it;
a decoding unit that decodes the binary data of the character identification numbers constituting the data to be decoded and the bit strings converted by the bit string identification number conversion unit into the character identification numbers; and
a conversion unit that refers to a character identification number storage unit, which stores a character included in the data to be encoded and the character identification number for identifying the character in association with each other, and converts each character identification number decoded by the decoding unit into the character associated with it.

DESCRIPTION OF SYMBOLS
1 ... Dictionary data, 2 ... Binary data before final encoding, 10 ... ROM, 11 ... RAM, 12 ... External storage device, 13 ... Input device, 14 ... Display device, 15 ... CPU, 100 ... Encoding device, 101 ... Encoding candidate data storage unit, 102 ... Display unit, 103 ... Input unit, 104 ... Character appearance frequency acquisition unit, 105 ... Character identification number association unit, 106 ... Character identification number storage unit, 107 ... Conversion unit, 108 ... Encoding unit, 109 ... Start position storage unit, 110 ... Bit string appearance frequency acquisition unit, 111 ... Bit string identification number association unit, 112 ... Bit string identification number storage unit, 113 ... Bit string identification number conversion unit, 114 ... Encoding unit with reference flag, 115 ... Encoding unit with non-reference flag, 116 ... Compressed data storage unit, 200 ... Decoding device, 20 ... ROM, 21 ... RAM, 22 ... External storage device, 23 ... Input device, 24 ... Display device, 25 ... CPU, 201 ... Bit string identification number storage unit, 202 ... Compressed data storage unit, 203 ... Start position storage unit, 204 ... Bit string identification number decoding method storage unit, 205 ... Input unit, 206 ... Bit string identification number decoding unit, 207 ... Bit string identification number conversion unit, 208 ... Character identification number decoding method storage unit, 209 ... Character identification number storage unit, 210 ... Decoding unit, 211 ... Conversion unit, 212 ... Display unit

Claims

An encoding device comprising:
a character identification number storage unit that stores a character included in data to be encoded and a character identification number for identifying the character in association with each other;
a conversion unit that refers to the character identification number storage unit and converts a character included in the data to be encoded into the character identification number associated with the character;
an encoding unit that encodes the character identification number converted by the conversion unit into binary data;
a bit string identification number association unit that, for each identical bit string appearing in the binary data of the character identification number encoded by the encoding unit, associates a bit string identification number that identifies the bit string and stores it in a bit string identification number storage unit;
a bit string identification number conversion unit that refers to the bit string identification number storage unit and converts each identical bit string appearing in the binary data of the character identification number into the bit string identification number associated with the bit string; and
an encoding unit with a reference flag that associates, with the bit string identification number converted by the bit string identification number conversion unit, a reference flag indicating that the bit string identification number storage unit is referred to at the time of decoding, and encodes the bit string identification number into binary data.

The encoding device according to claim 1, wherein
the character identification number is smaller the more frequently the character with which it is associated appears in the data to be encoded, and
the encoding unit encodes the character identification number into binary data using an encoding method in which the smaller the character identification number is, the smaller the amount of binary data obtained by encoding it.

The encoding device according to claim 1, further comprising:
an encoding unit with a non-reference flag that, for a bit string that appears in the binary data of the character identification number and is not associated with a bit string identification number, associates a non-reference flag, which indicates that the bit string identification number storage unit is not referred to at the time of decoding, with the number of bits from the beginning of that bit string until a bit string associated with a bit string identification number appears, and encodes the number of bits into binary data.

The encoding device according to any one of claims 1 to 3, wherein
the bit string identification number is smaller the more frequently the identical bit string with which it is associated appears in the binary data of the character identification number, and
the encoding unit with a reference flag encodes the bit string identification number into binary data using an encoding method in which the smaller the bit string identification number is, the smaller the amount of binary data obtained by encoding it.

The encoding device according to claim 4, wherein
the data to be encoded includes headwords, and
the appearance frequency of the identical bit string does not include occurrences of the identical bit string that straddle the start position of a headword in the data to be encoded.

The encoding device according to claim 3, wherein
the data to be encoded includes headwords,
the bit string identification number conversion unit does not convert a bit string associated with a bit string identification number into the bit string identification number when the bit string straddles the start position of a headword, and
the encoding unit with a non-reference flag, for a bit string that is associated with a bit string identification number but was not converted into the bit string identification number by the bit string identification number conversion unit, associates the non-reference flag with the number of bits from the beginning of the bit string to the start position of the headword that the bit string straddles, and encodes the number of bits into binary data.

The encoding device according to claim 6, wherein
when a bit string that is not associated with a bit string identification number straddles the start position of a headword, the encoding unit with a non-reference flag associates the non-reference flag with the number of bits from the beginning of the bit string to the start position, and encodes the number of bits into binary data.

The encoding device according to any one of claims 1 to 7, wherein
each identical bit string stored in the bit string identification number storage unit is 32 bits long.

A decoding device comprising:
a compressed data storage unit that stores compressed data composed of binary data of character identification numbers, each identifying a character included in data to be encoded, and binary data of bit string identification numbers, each identifying an identical bit string that appears in the binary data of the character identification numbers;
a character identification number storage unit that stores a character included in the data to be encoded and the character identification number for identifying the character in association with each other;
a bit string identification number storage unit that stores, for each identical bit string appearing in the binary data of the character identification numbers, the bit string identification number that identifies the bit string in association with the bit string;
an input unit for inputting a condition for specifying data to be decoded;
a bit string identification number decoding unit that specifies, from the compressed data stored in the compressed data storage unit, data satisfying the condition as the data to be decoded, and that, of the binary data of the character identification numbers and the binary data of the bit string identification numbers constituting the specified data to be decoded, decodes the binary data of the bit string identification numbers into the bit string identification numbers;
a bit string identification number conversion unit that refers to the bit string identification number storage unit and converts each bit string identification number decoded by the bit string identification number decoding unit into the bit string associated with it;
a decoding unit that decodes the binary data of the character identification numbers constituting the data to be decoded and the bit strings converted by the bit string identification number conversion unit into the character identification numbers; and
a conversion unit that refers to the character identification number storage unit and converts each character identification number decoded by the decoding unit into the character associated with it.

The character included in the data to be encoded is referred to the character identification number storage unit that stores the character included in the data to be encoded and the character identification number that identifies the character in association with each other. A conversion step for converting to a character identification number associated with
An encoding step of encoding the character identification number converted in the conversion step into binary data;
A bit string identification number associating step for associating a bit string identification number for identifying the bit string and storing it in the bit string identification number storage unit for each identical bit string appearing in the binary data of the character identification number encoded in the encoding step;
A bit string identification number conversion step for converting the same bit string appearing in the binary data of the character identification number into a bit string identification number associated with the bit string with reference to the bit string identification number storage unit;
A reference flag-added encoding step of associating, with the bit string identification number converted in the bit string identification number conversion step, a reference flag indicating that the bit string identification number storage unit is to be referred to at the time of decoding, and encoding the bit string identification number into binary data;
An encoding method comprising:
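The steps of the encoding method can be illustrated with a minimal Python sketch. It is not the claimed implementation: the fixed code widths `CHAR_BITS` and `ID_BITS`, the choice of "the codes of two adjacent characters" as the candidate bit strings, the greedy replacement, and the use of a flag bit `0` for literal character codes are all assumptions made only for illustration.

```python
from collections import Counter

CHAR_BITS = 8   # assumed fixed width of one character identification number
ID_BITS = 4     # assumed fixed width of one bit string identification number


def encode(text):
    # Conversion step: character -> character identification number.
    char_ids = {ch: i for i, ch in enumerate(sorted(set(text)))}
    # Encoding step: character identification number -> binary data.
    codes = [format(char_ids[ch], f"0{CHAR_BITS}b") for ch in text]

    # Association step (assumed rule): every bit string formed by the codes of
    # two adjacent characters that appears more than once gets an id.
    pairs = Counter(codes[i] + codes[i + 1] for i in range(len(codes) - 1))
    frequent = sorted(p for p, n in pairs.items() if n > 1)[: 2 ** ID_BITS]
    bit_ids = {bits: i for i, bits in enumerate(frequent)}

    # Bit string id conversion step + reference flag-added encoding step.
    out, i = [], 0
    while i < len(codes):
        pair = codes[i] + codes[i + 1] if i + 1 < len(codes) else None
        if pair in bit_ids:
            # Flag 1: the decoder must consult the bit string id table.
            out.append("1" + format(bit_ids[pair], f"0{ID_BITS}b"))
            i += 2
        else:
            # Flag 0 (assumption): a literal character code follows.
            out.append("0" + codes[i])
            i += 1
    return "".join(out), char_ids, bit_ids
```

With these assumed widths, `encode("abcabcabd")` returns a 29-bit string in place of the 72 bits of the plain character codes, not counting the two tables that a decoder would also need.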

An input step of inputting a condition for specifying data to be decoded;
A specifying step of specifying, as the data to be decoded, data satisfying the condition from among compressed data composed of binary data of character identification numbers, each identifying a character included in the data to be encoded, and binary data of bit string identification numbers, each identifying an identical bit string appearing in the binary data of the character identification numbers;
A bit string identification number decoding step of decoding, out of the binary data of the character identification number and the binary data of the bit string identification number constituting the data to be decoded specified in the specifying step, the binary data of the bit string identification number into the bit string identification number;
A bit string identification number conversion step of converting the bit string identification number decoded in the bit string identification number decoding step into the bit string associated with that bit string identification number, with reference to a bit string identification number storage unit that stores, for each identical bit string appearing in the binary data of the character identification numbers, the bit string in association with the bit string identification number identifying it;
A decoding step for decoding the character identification number binary data constituting the decoding target data and the bit string converted in the bit string identification number conversion step into the character identification number;
A conversion step of converting the character identification number decoded in the decoding step into the character associated with that character identification number, with reference to a character identification number storage unit that stores each character included in the data to be encoded in association with the character identification number identifying the character;
A decoding method comprising:
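The partial decoding described by the decoding method can be sketched in Python as follows. This is an illustrative assumption, not the claimed implementation: the tables `CHAR_TABLE`, `BIT_TABLE` and `COMPRESSED` are hypothetical hand-filled stand-ins for the storage units, the condition is assumed to be a simple record key, and the bit layout (a flag bit, then a 4-bit bit string identification number or an 8-bit character code) is assumed.

```python
CHAR_BITS = 8   # assumed width of one character identification number
ID_BITS = 4     # assumed width of one bit string identification number

# Hypothetical storage units, filled by hand for this example.
CHAR_TABLE = {0: "a", 1: "b", 2: "c", 3: "d"}   # character id -> character
BIT_TABLE = {
    0: "0000000000000001",                      # bit string id 0 -> codes of "ab"
    1: "0000001000000000",                      # bit string id 1 -> codes of "ca"
}
COMPRESSED = {                                  # record key -> compressed bits
    "first": "10000" + "000000010",             # id 0, then literal "c"
    "second": "10001" + "000000011",            # id 1, then literal "d"
}


def decode(condition):
    # Specifying step: only the record matching the condition is decoded.
    bits = COMPRESSED[condition]
    restored, i = [], 0
    while i < len(bits):
        flag, i = bits[i], i + 1
        if flag == "1":
            # Reference flag set: decode the bit string id and look it up.
            bit_id = int(bits[i:i + ID_BITS], 2)
            restored.append(BIT_TABLE[bit_id])
            i += ID_BITS
        else:
            # Flag 0 (assumption): a literal character code follows.
            restored.append(bits[i:i + CHAR_BITS])
            i += CHAR_BITS
    stream = "".join(restored)
    # Decoding step: binary data -> character identification numbers.
    ids = [int(stream[j:j + CHAR_BITS], 2)
           for j in range(0, len(stream), CHAR_BITS)]
    # Conversion step: character identification numbers -> characters.
    return "".join(CHAR_TABLE[n] for n in ids)
```

Calling `decode("second")` touches only the second record, which is the point of partial decoding: the rest of the compressed data never has to be expanded.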

A conversion unit that converts each character included in the data to be encoded into the character identification number associated with that character, with reference to a character identification number storage unit that stores each character included in the data to be encoded in association with the character identification number identifying the character;
An encoding unit that encodes the character identification number converted by the conversion unit into binary data;
A bit string identification number associating unit that, for each identical bit string appearing in the binary data of the character identification number encoded by the encoding unit, associates the bit string with a bit string identification number identifying it and stores them in a bit string identification number storage unit;
A bit string identification number conversion unit that, with reference to the bit string identification number storage unit, converts each identical bit string appearing in the binary data of the character identification number into the bit string identification number associated with that bit string;
A reference flag-added encoding unit that associates, with the bit string identification number converted by the bit string identification number conversion unit, a reference flag indicating that the bit string identification number storage unit is to be referred to at the time of decoding, and encodes the bit string identification number into binary data;
A program for causing a computer to function as:

An input unit that inputs a condition for specifying data to be decoded;
A specifying unit that specifies, as the data to be decoded, data satisfying the condition from among compressed data composed of binary data of character identification numbers, each identifying a character included in the data to be encoded, and binary data of bit string identification numbers, each identifying an identical bit string appearing in the binary data of the character identification numbers;
A bit string identification number decoding unit that decodes, out of the binary data of the character identification number and the binary data of the bit string identification number constituting the data to be decoded specified by the specifying unit, the binary data of the bit string identification number into the bit string identification number;
A bit string identification number conversion unit that converts the bit string identification number decoded by the bit string identification number decoding unit into the bit string associated with that bit string identification number, with reference to a bit string identification number storage unit that stores, for each identical bit string appearing in the binary data of the character identification numbers, the bit string in association with the bit string identification number identifying it;
A decoding unit that decodes the binary data of the character identification number constituting the data to be decoded and the bit string converted by the bit string identification number conversion unit into the character identification number;
A conversion unit that converts the character identification number decoded by the decoding unit into the character associated with that character identification number, with reference to a character identification number storage unit that stores each character included in the data to be encoded in association with the character identification number identifying the character;
A program for causing a computer to function as: