JPH06332666A

JPH06332666A - Compressing method for data

Info

Publication number: JPH06332666A
Application number: JP5118504A
Authority: JP
Inventors: Kazuo Suzuki; 和夫鈴木; Katsuhiro Utsunomiya; 勝広宇都宮
Original assignee: Hitachi Software Engineering Co Ltd
Current assignee: Hitachi Software Engineering Co Ltd
Priority date: 1993-05-20
Filing date: 1993-05-20
Publication date: 1994-12-02

Abstract

PURPOSE: To make it easy to compress text data for displaying Japanese-language documents. The characters are rearranged in descending order of appearance frequency and registered in a file, and each registered character's location from the head of the file is output as the compressed text data. CONSTITUTION: In the character string of the text data to be compressed, characters are sorted in descending order of appearance frequency, and each character is registered in a file 7 in the sorted order. Next, the location of each registered character in file 7 is obtained, and each character of the text data to be compressed is represented by that location value. That is, each character is converted into a location value. The converted characters are output to a compressed data text file 8 as the compressed text data. Compressing data in this way reduces the memory capacity needed to store the compressed text data.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a data compression method. More particularly, it relates to a data compression method suitable for displaying Japanese text on portable electronic devices for information processing and the like.

[0002]

[Prior Art] In recent years, rapid progress in semiconductor device manufacturing technology has led to the development of portable electronic devices capable of various kinds of information processing.

[0003] In such portable information-processing devices, for example electronic notebooks, the ability to handle a large amount of text data such as display messages is one of the factors that determine usability.

[0004] Handling a large amount of text data such as display messages requires a large memory capacity. However, the memory capacity of devices such as the electronic notebook is limited to about 4 megabytes.

[0005] To store a large amount of text data within this limited memory capacity, the text data must be compressed as efficiently as possible. Various data compression methods have therefore been proposed.

[0006] For example, Japanese Laid-Open Patent Publication No. 63-310018 discloses a compression method for data strings in which data of a predetermined bit pattern is repeated over a plurality of consecutive bytes. The method attaches, to the leading byte of each repeated portion, information giving the length over which the repetition continues.

[0007] Kanji font data in Japanese word processors and similar equipment is an example of data that contains many such repeated portions. When one character is represented by, for example, 32 × 32 dots, its Kanji font data is a 1024-bit data string with one bit per dot.

[0008] Within the 1024-bit data string representing one character, many repeated portions appear. Runs of "1" occur where a stroke of the character continues, and runs of "0" occur in blank areas with no stroke.

[0009] The above data compression method is therefore effective for compressing data, such as Kanji font data, in which many repeated portions appear.

[0010]

[Problems to Be Solved by the Invention] However, in sentences (character strings) such as display messages, the same character almost never appears several times in succession. Applying the above data compression method to such text data therefore achieves too little compression to be practical.
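The exact data format of Japanese Laid-Open Patent Publication No. 63-310018 is not given here. The Python sketch below therefore substitutes a generic run-length encoding, in which each run is stored as a character plus its length, to illustrate the point of [0007] to [0010]. Runs collapse well in bitmap-like data but not in display text, where each character typically forms a run of one. The function name is hypothetical.

```python
from itertools import groupby


def run_length_encode(text: str) -> list[tuple[str, int]]:
    """Generic run-length encoding (a stand-in, not the exact format of
    JP 63-310018): each maximal run becomes (character, run length)."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


# Bitmap-like row: long runs of "0" and "1", as in Kanji font data.
font_row = "0" * 8 + "1" * 16 + "0" * 10
# Display message: no character repeats consecutively.
message = "データを圧縮します"

print(len(font_row), len(run_length_encode(font_row)))  # 34 3
print(len(message), len(run_length_encode(message)))    # 9 9
```

The 34 symbols of the bitmap row shrink to 3 runs. In contrast, every character of the 9-character message becomes its own run, so storing a length with each run would enlarge the message rather than compress it.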

[0011] An object of the present invention is to provide a data compression method that can easily compress text data for displaying Japanese text on portable information-processing electronic devices and the like.

[0012]

[Means for Solving the Problems] To achieve the above object, the present invention first determines how many times each character of the text data to be compressed appears in that text data. The characters are then rearranged in descending order of the number of appearances and registered in a file. Finally, the location of each registered character, counted from the beginning of the file, is output as the compressed form of the text data.

[0013]

[Operation] With the above means, the characters in the character string of the text data to be compressed are sorted in descending order of the number of appearances. Each character is then registered in the file in the sorted order.

[0014] Next, the location of each registered character in the file is obtained, and each character of the text data to be compressed is represented by this location value. In other words, each character of the text data to be compressed is converted into a location value.

[0015] The characters converted into location values are output as the compressed text data.

[0016] Compressing data in this way reduces the memory capacity needed to store the compressed text data. Conversely, when the memory capacity is limited, more text data can be stored.

[0017]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings.

[0018] FIG. 1 is a block diagram showing the overall configuration of an embodiment of the present invention. In FIG. 1:
- 1 is an input device for entering commands and the like.
- 2 is a display device for displaying data processing results and the like.
- 3 is a processing device (CPU) that performs data processing.
- 4 is a main memory for storing programs and data.

[0019] 5 is an external storage device that stores a data compression program 51, a data decompression program 52, and the like.

[0020] The remaining elements are as follows:
- 6 is a text file storing the text to be compressed.
- 7 is a file used for data compression and decompression.
- 8 is a compressed data text file storing the compressed data.
- 9 is a bus line connecting the various devices.

[0021] FIG. 2 is an explanatory diagram of the tables required for data compression and decompression. In FIG. 2, 41 is a table area in main memory 4 that collects the various tables created during data compression and decompression.

[0022] 42 is the text to be compressed, i.e. the sentence that is the compression target. 43 is an appearance-count sort table, in which the distinct characters of text 42 are sorted in descending order of the number of appearances. Table 43 consists of three columns: the character string 431 in descending order of appearance count, the number of appearances 432 of each character, and the location 433 of each character from the beginning of file 7.

[0023] In the example of FIG. 2, the character string of the text 42 to be compressed contains the following characters:
- 「、」 (the Japanese comma), 「文」, 「字」, 「を」 and 「し」 each appear three times.
- 「圧」 and 「縮」 each appear twice.
- 「列」, 「こ」, 「の」 and 「は」 each appear once.

The corresponding counts "3", "2" and "1" are written in the number-of-appearances column 432 of the appearance-count sort table 43.

[0024] Each character is registered in file 7 only once, no matter how many times it appears. At registration, the character's location from the beginning of file 7 is written in the location column 433 of the appearance-count sort table 43.
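A minimal Python sketch of building the appearance-count sort table 43 follows. It assumes that file 7 can be modeled as an ordered in-memory list and that a character's location is its zero-based index in that list. This is consistent with [0037], where 「こ」 has location 8 and is preceded by the eight characters listed before it in [0023]. The patent does not say how equally frequent characters are ordered; keeping first-appearance order is an assumption. The class and function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SortTableRow:
    """One row of the appearance-count sort table 43 (hypothetical names)."""
    char: str      # column 431: the character
    count: int     # column 432: number of appearances in the source text
    location: int  # column 433: zero-based location from the head of file 7


def build_sort_table(text: str) -> list[SortTableRow]:
    """Count each character, sort by descending count, and assign locations.

    Counter keeps first-appearance order and sorted() is stable, so characters
    with equal counts stay in order of first appearance (an assumed tie-break).
    """
    counts = Counter(text)
    ordered = sorted(counts.items(), key=lambda item: -item[1])
    # Each distinct character is registered exactly once; its position in the
    # sorted order is its location in file 7.
    return [SortTableRow(ch, n, loc) for loc, (ch, n) in enumerate(ordered)]


for row in build_sort_table("しんぶんし"):
    print(row.char, row.count, row.location)
```

For 「しんぶんし」 this prints the rows し 2 0, ん 2 1 and ぶ 1 2.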

[0025] 44 is a location table consisting of three columns: the character string 441 of the text 42 to be compressed, the location A 442 of each character from the beginning of file 7, and a location B 443. Location B 443 is newly set by adding "9" to the value of location A 442.
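The location table 44 can be sketched in a few lines of Python. The sketch assumes file 7 is available as an in-memory list whose index is the location. The names `location_table` and `file7` are hypothetical. The order of the equally frequent characters in the sample list is taken from the order in which [0023] lists them, which is itself an assumption about FIG. 2.

```python
OFFSET = 9  # value added to location A in the embodiment ([0025])


def location_table(text: str, file7: list[str]) -> list[tuple[str, int, int]]:
    """Build rows of (character 441, location A 442, location B 443).

    `file7` is a hypothetical in-memory copy of file 7: the distinct
    characters in descending order of appearance count.
    """
    location_a = {ch: loc for loc, ch in enumerate(file7)}
    return [(ch, location_a[ch], location_a[ch] + OFFSET) for ch in text]


# The first eleven entries of file 7 in the FIG. 2 example, in the order
# listed in [0023] (3 appearances, then 2, then 1).
file7 = ["、", "文", "字", "を", "し", "圧", "縮", "列", "こ", "の", "は"]
print(location_table("こ", file7))  # [('こ', 8, 17)]
```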

[0026] 45 is a compressed data text table that expresses the values of location B 443 in hexadecimal. The value corresponding to each character of the character string 431 occupies either 1 byte or 2 bytes.

[0027] For example, the compressed data value "11" 451 in the compressed data text table 45 is the hexadecimal representation of the location value "17" 444 in the location B column 443. It can be expressed in 1 byte.

[0028] Similarly, the compressed data value "0109" 452 in the compressed data text table 45 is the hexadecimal representation of the location value "265" 445 in the location B column 443. It requires 2 bytes.
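The 1-byte/2-byte split of table 45 can be sketched as follows. Writing the high-order byte first is an assumption. It reproduces "0109" for location B = 265 and lets the decoder decide from the first byte it reads. The range check reflects the decoder rule of [0048] to [0051]: the high byte of a 2-byte code must stay below the offset. The function name is hypothetical.

```python
def encode_location_b(b: int, offset: int = 9) -> bytes:
    """Write location B (= location A + offset) as 1 or 2 bytes.

    B values up to 0xFF fit in one byte. Larger values are written as two
    bytes, high-order byte first (an assumed byte order); their high byte
    must stay below `offset` so the decoder can tell the two forms apart.
    """
    if not offset <= b < offset * 256:
        raise ValueError(f"location B {b} cannot be encoded with offset {offset}")
    if b <= 0xFF:
        return bytes([b])
    return b.to_bytes(2, "big")


print(encode_location_b(17).hex())   # 11
print(encode_location_b(265).hex())  # 0109
```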

[0029] 46 is a table used for decompressing the compressed data. It consists of a work area M₁ 461, into which the first byte of data is read, and a work area M₂ 462, into which the second byte is read.

[0030] FIG. 3 is a flowchart of the data compression procedure, which is described below with reference to FIG. 3.

[0031] First, a command to start the data compression program 51 is entered from the input device 1 of FIG. 1. The processing device (CPU) 3 then loads the data compression program 51 from the external storage device 5 into the main memory 4 and begins executing it.

[0032] In FIG. 3, the data compression program 51 first reads the text 42 to be compressed from the text file 6 into the main memory 4 (step 301).

[0033] Next, the program scans the loaded text 42 one character at a time from the beginning and counts the number of appearances of each character. It then sorts the characters in descending order of appearance count (step 302).

[0034] At this stage, the sorted character string 431 is registered in file 7 (step 303).

[0035] Next, the location of each character of the character string 431 registered in file 7, counted from the beginning of file 7, is obtained (step 304).

[0036] Each character of the text 42 loaded into the main memory 4 is then represented by the location value obtained in step 304 (step 305).

[0037] For example, the first character 「こ」 of the text 42 is represented as "8", because its value in the location column 433 is "8". Likewise, the 「プ」 of 「プログラム」 ("program") is represented as "256", because its location value is "256".

[0038] Once the values of location A 442 have been obtained in this way, "9" is added to each value of location A 442 to obtain the value of location B 443 (step 306). This step is needed because some location values fit in 1 byte while others require 2 bytes. The added offset lets the decompression process tell whether a location value occupies 1 byte or 2 bytes.

[0039] For example, in FIG. 2, the first character 「こ」 in the character string column 441 of the location table 44 has a location A 442 value of "8". The character 「プ」 in the same column has a location A 442 value of "256". The location value of 「こ」 therefore fits in 1 byte, whereas that of 「プ」 requires 2 bytes.

[0040] When the above processing is finished, the values of location B 443 are registered in the compressed data text file 8, which completes the data compression (step 307).
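Steps 301 to 307 can be combined into a short Python sketch. It keeps file 7 and file 8 in memory instead of writing files. The following details are assumptions rather than details given in the patent: the function name, the tie-break among equally frequent characters (first appearance), and the high-byte-first order of 2-byte codes.

```python
from collections import Counter


def compress(text: str, offset: int = 9) -> tuple[list[str], bytes]:
    """Sketch of steps 301-307; returns (file 7 contents, file 8 contents)."""
    # Step 302: count appearances and sort in descending order of count.
    counts = Counter(text)
    file7 = sorted(counts, key=lambda ch: -counts[ch])
    # A 2-byte code must keep its high byte below `offset` to stay decodable,
    # which limits file 7 to 255 * offset distinct characters.
    if len(file7) > 255 * offset:
        raise ValueError("too many distinct characters for this offset")
    # Steps 303-304: register the characters; location A is the index in file 7.
    location_a = {ch: loc for loc, ch in enumerate(file7)}
    file8 = bytearray()
    for ch in text:
        b = location_a[ch] + offset          # steps 305-306: A, then B = A + offset
        if b <= 0xFF:
            file8.append(b)                  # step 307: 1-byte code
        else:
            file8 += b.to_bytes(2, "big")    # step 307: 2-byte code
    return file7, bytes(file8)


file7, file8 = compress("しんぶんし")
print(file7, file8.hex())  # ['し', 'ん', 'ぶ'] 090a0b0a09
```

Because the most frequent characters receive the smallest locations, they are the ones that receive 1-byte codes.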

[0041] The compressed data text table 45 in FIG. 2 shows the same contents as the data registered in the compressed data text file 8.

[0042] That is, table 45 shows the values of location B 443 in hexadecimal. For example, the value "11" 451 in table 45 corresponds to 「こ」 in the character string 441 and is represented in 1 byte.

[0043] The value "0109" 452 in table 45 corresponds to 「プ」 in the character string 441 and is represented in 2 bytes.

[0044] FIG. 4 is a flowchart of the data decompression procedure, which is described below with reference to FIG. 4.

[0045] First, a command to start the data decompression program 52 is entered from the input device 1 of FIG. 1. The processing device (CPU) 3 then loads the data decompression program 52 from the external storage device 5 into the main memory 4 and begins executing it.

[0046] In FIG. 4, the data decompression program 52 reads 1 byte of data from the compressed data text file 8 into the work area M₁ 461 (step 401).

[0047] After reading the byte, the program checks whether any data was obtained (step 402). If there is no data, it ends the processing.

[0048] If data was obtained in step 402, the program determines whether the value read (the value of M₁) is hexadecimal "09" or greater (step 403).

[0049] If the value read (the value of M₁) is "09" or greater, the compressed data is a location value expressed in 1 byte. In that case, "9" is subtracted from the value in the work area M₁ 461 (step 404). The subtraction undoes the "9" that was added to the location value during compression.

[0050] Next, the program looks up the character in file 7 whose location from the beginning of the file equals the value in the work area M₁ 461. It reads this character into the main memory 4, which converts the data back into the original character (step 405).

[0051] If, on the other hand, the value read (the value of M₁) is less than hexadecimal "09", the compressed data is a location value expressed in 2 bytes. The next byte of data is therefore read into the work area M₂ 462 (step 407). Then "9" is subtracted from the 2-byte value held in the work areas M₁ 461 and M₂ 462 (step 408).

[0052] Next, the program looks up the character in file 7 whose location from the beginning of the file equals the value held in the work areas M₁ 461 and M₂ 462. It reads this character into the main memory 4, which converts the data back into the original character (step 409).

[0053] The above processing is repeated until no data remains in the compressed data text file 8. In this way, the compressed data is decompressed.
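Steps 401 to 409 can be sketched as the following Python loop. The sketch assumes the same conventions as the compression side: zero-based locations, an offset of 9, and 2-byte codes stored high-order byte first. Following the step 403 test, a first byte below the offset is taken to start a 2-byte code. The function name is hypothetical, and truncated input simply raises IndexError.

```python
def decompress(data: bytes, file7: list[str], offset: int = 9) -> str:
    """Sketch of steps 401-409: rebuild the text from file 8 and file 7."""
    chars = []
    pos = 0
    while pos < len(data):                 # step 402: stop when no data remains
        m1 = data[pos]                     # step 401: read one byte into M1
        pos += 1
        if m1 >= offset:                   # step 403: 1-byte location value
            location_a = m1 - offset       # step 404
        else:                              # 2-byte location value
            m2 = data[pos]                 # step 407: read the next byte into M2
            pos += 1
            location_a = ((m1 << 8) | m2) - offset  # step 408
        chars.append(file7[location_a])    # steps 405 / 409: look up file 7
    return "".join(chars)


print(decompress(bytes([9, 10, 11, 10, 9]), ["し", "ん", "ぶ"]))  # しんぶんし
```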

[0054] The data compression and decompression processing described above is generally performed on a processing device of comparatively large capacity, such as a personal computer.

[0055] After the compression is complete, file 7, the compressed data text file 8, and the data decompression program 52 are downloaded to a portable information-processing electronic device or the like. The device can then decompress the compressed text data as needed for display and other processing.

[0056] As a result, even a device such as the above electronic notebook, with a memory capacity of about 4 megabytes, can display text data (sentences, i.e. character strings) using up to about 2000 character types. This holds when the font size of one character is 40 × 40 or 48 × 48 dots.
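For scale, the arithmetic below estimates the memory needed for a font covering 2000 character types. This is our own arithmetic and assumes uncompressed bitmaps at 1 bit per dot; the patent does not break down memory use.

```latex
\begin{aligned}
\frac{40 \times 40}{8} &= 200\ \text{bytes/glyph}, &\qquad 2000 \times 200 &= 400{,}000\ \text{bytes}\\
\frac{48 \times 48}{8} &= 288\ \text{bytes/glyph}, &\qquad 2000 \times 288 &= 576{,}000\ \text{bytes}
\end{aligned}
```

Either font occupies well under a quarter of a 4-megabyte memory.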

[0057] In this embodiment, to raise the compression ratio, compressed data with a location value of "255" or less is represented in 1 byte, and data with a location value of "256" or more is represented in 2 bytes. However, other bit configurations may also be used.

[0058] In this embodiment, "9" is added to the value of location 433 so that 1-byte and 2-byte values can be distinguished. However, the added value is not limited to "9"; an optimum value may be set by estimating the number of characters after compression.
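The value added to location A also limits how many distinct characters file 7 can hold. The derivation below assumes the decoder rule of [0048] to [0051]: a first byte of at least the offset $k$ marks a 1-byte code, and anything smaller is the high byte of a 2-byte code. It is our own derivation and is not stated in the patent.

```latex
\begin{aligned}
N_{\max}(k) &= \underbrace{(256 - k)}_{\text{1-byte: } k \le B \le 255}
             + \underbrace{256\,(k - 1)}_{\text{2-byte: } 256 \le B \le 256k - 1}
             = 255\,k \\
N_{\max}(9) &= 247 + 2048 = 2295, \qquad
k_{\min}(N) = \left\lceil N / 255 \right\rceil
\end{aligned}
```

With $k = 9$, the limit of 2295 characters is in line with the roughly 2000 character types mentioned in [0056]. A smaller $k$ leaves more of the most frequent characters ($256 - k$ of them) with 1-byte codes. Under these assumptions, the smallest $k$ satisfying $255k \ge N$ is therefore a natural choice.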

[0059] Data and programs can be downloaded from a personal computer or the like to the portable information-processing electronic device by commonly used methods. Their description is therefore omitted in this embodiment.

[0060]

[Effects of the Invention] As described above, the present invention first determines how many times each character of the text data to be compressed appears in that text data. It then rearranges the characters in descending order of the number of appearances and registers them in a file. Finally, it outputs the location of each registered character, counted from the beginning of the file, as the compressed text data. Text data for displaying Japanese sentences (character strings) can therefore be compressed easily.

[0061] The compressed text data also allows devices with a small memory capacity, such as portable information-processing devices like electronic notebooks, to handle a large amount of text data.

[0062] Moreover, each location value consists of 1 byte or 2 bytes according to the character's location in the file. Decompression can therefore be performed efficiently with comparatively simple processing, by reversing the compression procedure.

[Brief description of drawings]

FIG. 1 is a block diagram showing the overall configuration of an embodiment of the present invention.

FIG. 2 is an explanatory diagram of the tables required for data compression and decompression in the embodiment.

FIG. 3 is a flowchart showing the data compression procedure of the embodiment.

FIG. 4 is a flowchart showing the data decompression procedure of the embodiment.

[Explanation of symbols]

1 ... input device, 2 ... display device, 3 ... processing device (CPU), 4 ... main memory, 41 ... table in main memory, 42 ... text to be compressed, 43 ... appearance-count sort table, 44 ... location table, 45 ... compressed data text table, 5 ... external storage device, 6 ... text file, 7 ... file, 8 ... compressed data text file, 9 ... bus line.

Claims

[Claims]

1. A data compression method comprising: determining, for each character constituting text data to be compressed, the number of times the character appears in that text data; rearranging the characters in descending order of the number of appearances and registering them in a file; and outputting, as the compressed text data, the value of each registered character's location counted from the beginning of the file.

2. The method of compressing data according to claim 1, wherein the value of the location is composed of 1 byte or 2 bytes depending on the location on the file.