TW201322649A - Method for data compression

Info

Publication number: TW201322649A
Application number: TW100142040A
Authority: TW
Inventors: Hsu-Chuan Wei; Tuan-Hao Chen; Li-Der Chou; Hua-Rong Chu
Original assignee: Proscend Comm Inc; Univ Nat Central
Priority date: 2011-11-17
Filing date: 2011-11-17
Publication date: 2013-06-01

Abstract

A method for data compression is provided, including the steps of: dividing source data into a plurality of blocks, the division being based on consecutive repetitive data; compressing each block and storing each compressed block in a segment format having a first length indicating the length of the consecutive repetitive data, a second length indicating the length of the non-consecutive repetitive data, and a data area; grouping related segments into a section; and constructing the complete compressed data.

Description

Data compression method

The present invention relates to a data compression method, and more particularly to a compression method for consecutive repetitive data.

Although network bandwidth and the capacity of digital storage devices keep increasing, the growth in the number of users and in the volume of data produced and transmitted means that bandwidth and capacity demands still cannot be fully met. Data compression therefore remains as important as ever and continues to play a decisive role. In particular, as compact, lightweight consumer electronics become widespread, making the most effective use of the limited storage capacity of portable devices, and balancing capacity against efficiency of use, is an important issue.

For example, Figure 1 is a schematic diagram of a conventional access architecture for compressed data. As shown in Figure 1, the CPU 110 reads compressed data from the flash memory 120, passes it over the Serial Peripheral Interface bus (SPI bus) to the FPGA or DSP 130 for processing, and then writes it to the memory 140. A conventional field-programmable gate array (FPGA) or digital signal processor (DSP) loads data by reading it one byte at a time and then writing it. The larger the data, the more reads and writes are needed, and the longer the data takes to load. Speeding up loading therefore means reducing the number of reads. If the data is compressed with the commonly used zip method, a very high compression ratio can be achieved and less memory is used, but the data must still be decompressed into memory before it is loaded. Memory space is saved, but the number of memory reads is not reduced. In other words, the load time is not shortened.

Another common data compression method is run-length encoding (RLE), which is particularly suited to data containing runs of the same value; that is, each run is replaced by a single data value and a length. For example, if the original data is:

WWWWWWWBWWWWWWWWWBBBWWWWWWWW

then its RLE encoding is:

7W1B9W3B8W

This encoding reads as: 7 W, 1 B, 9 W, 3 B, 8 W. The original data is 28 characters long and the compressed data is 10 characters. The more consecutive repetition the data contains, the higher the compression ratio, that is, the better the compression.
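The RLE example above can be reproduced with a minimal Python sketch. The function name rle_encode is illustrative only and does not appear in the patent; the count is written as decimal text, as in the example.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Encode each run as <count><symbol>, e.g. 'WWWB' -> '3W1B'."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(text))


# The 28-character example: 7 W, 1 B, 9 W, 3 B, 8 W
original = "W" * 7 + "B" + "W" * 9 + "B" * 3 + "W" * 8
encoded = rle_encode(original)
print(len(original), encoded, len(encoded))  # 28 7W1B9W3B8W 10
```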

However, the worst case for RLE arises when the original data contains no consecutive repeated strings at all, for example when the original data is:

WBWBWBWBWBWBWBWBWB

Its RLE encoding is:

1W1B1W1B1W1B1W1B1W1B1W1B1W1B1W1B1W1B

In other words, the original data is 18 characters long but 36 characters after compression, twice the size. For this example, other similar compression methods, such as the common LZ77 and LZ78 (also referred to as sliding-window methods), offer an improvement by treating WB as the repeating unit. However, if the original data contains no regularly repeating data at all, that improvement still cannot raise the low compression ratio of irregular data. A compression method that compresses consecutive repetitive data effectively and also copes with data containing no consecutive repetitive strings is therefore both an important research question and a pressing need for the industry.

In view of the above shortcomings of the prior art, the main object of the present invention is to provide a data compression method that compresses consecutive repetitive data effectively and achieves a high compression ratio, saving memory space while also reducing data loading time.

Another object of the present invention is to provide a data compression method that copes effectively with data containing no consecutive repetitive strings by leaving it uncompressed, avoiding wasted time and increased data volume.

A further object of the present invention is to provide a data compression method that allows data to be used while it is being decompressed; that is, the portion already decompressed can be used while decompression continues, which improves efficiency, rather than the data being usable only after all of it has been decompressed.

To achieve the above objects, the present invention provides a data compression method comprising: dividing original data into a plurality of blocks; compressing each block separately and storing the compressed data in a segment format; grouping all consecutive related segments into a section; and constructing the complete compressed data.

The above and other advantages of the present disclosure are described in detail below with reference to the drawings, the detailed description of the embodiments, and the claims.

Figure 2 is a schematic diagram of the compression principle used by the data compression method of the present invention. In the example of Figure 2, original data 201 contains consecutive repetitive data S together with non-consecutive repetitive data, denoted d1, d2, d3, d4, d5, d6 and d1', d2', d3', d4', d5', d6', d7', d8', d9'. The compression method takes a consecutive repetitive data string together with the non-consecutive repetitive data string that follows it as one unit of compression, and stores the compressed result in a segment format. Each segment contains a first length, a second length, and a data area: the first length uses one byte to represent the length of the consecutive repetitive data string, the second length uses one byte to represent the length of the non-consecutive repetitive data string, and the data area stores the non-consecutive repetitive data itself. In other words, the compression scheme can be described as LLD, that is, Length + Length + Data. As shown in Figure 2, the original data 201 is therefore compressed into two segments 202 and 203. The first segment 202 contains: first length 7, second length 6, data area d1, d2, d3, d4, d5, d6. The second segment 203 contains: first length 5, second length 9, data area d1', d2', d3', d4', d5', d6', d7', d8', d9'.
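As a rough illustration of the LLD segment layout, the following Python sketch packs the two segments of the Figure 2 example. The function name pack_segment and the placeholder byte values standing in for d1 to d6 and d1' to d9' are assumptions made for this example; the patent does not specify concrete values.

```python
def pack_segment(run_length: int, literals: bytes) -> bytes:
    """Pack one LLD segment: [first length][second length][data area]."""
    if not 0 <= run_length <= 255 or len(literals) > 255:
        raise ValueError("each length must fit in one byte")
    return bytes([run_length, len(literals)]) + literals


# Figure 2 example; the literal bytes below are placeholders for d1..d6 and d1'..d9'
seg_202 = pack_segment(7, b"abcdef")      # 7 x S followed by 6 literal bytes
seg_203 = pack_segment(5, b"ghijklmno")   # 5 x S followed by 9 literal bytes

print(seg_202)  # b'\x07\x06abcdef'
print(len(seg_202) + len(seg_203))  # 19 bytes for 27 bytes of original data
```

The repeated byte S itself is not stored in the segment; per the description it is kept once per section, in the S-byte field.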

It is worth noting that because the invention uses only one byte to represent the length of a data string, the longest representable string length is 255. While compressing the original data, the consecutive repetitive data string may exceed 255 in length, or the non-consecutive repetitive data string may exceed 255. A second possibility is that the original data contains more than one consecutive repetitive data string whose repeated data differ, for example S1 and S2; in other words, the data repeated in each consecutive repetitive string is not the same. A third case is that the original data contains no consecutive repetitive data string at all that could be compressed. The data compression method of the present invention takes all three of these cases into account in its design.

For the case where a length exceeds the limit, the method first divides the original data into a plurality of blocks. Figure 3 is a schematic diagram of dividing original data into blocks. As shown in Figure 3, the original data 301 consists of 500 bytes of consecutive repetitive data S followed by 500 bytes of non-consecutive repetitive data d1-d500. Because each segment can represent a string length of at most 255, the original data 301 is first divided into several blocks: block 302 contains 255 bytes of S; block 303 contains 245 bytes of S and the 10 non-consecutive repetitive bytes d1-d10; block 304 contains the 255 bytes d11-d265; and block 305 contains the 235 bytes d266-d500. The corresponding compressed segments are therefore: segment 3021 with first length 255, second length 0, and no data area; segment 3031 with first length 245, second length 10, and data area d1-d10; segment 3041 with first length 0, second length 255, and data area d11-d265; and segment 3051 with first length 0, second length 235, and data area d266-d500.
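The block division in the Figure 3 example can be sketched in Python as follows. This is only one possible reading: it assumes each block covers at most 255 bytes of original data (run plus literals together), which is what the 245 + 10 split in the example suggests, and it treats every occurrence of S as part of a run. The function name split_into_segments and the test data are hypothetical.

```python
def split_into_segments(data: bytes, s: int, block_limit: int = 255):
    """Split data into (run_length, literals) pairs, one pair per block.

    Each block is a run of the repeated byte `s` followed by bytes other
    than `s`, and covers at most `block_limit` bytes of the original data.
    """
    segments = []
    i, n = 0, len(data)
    while i < n:
        block_start = i
        # Count the run of S at the start of the block
        while i < n and data[i] == s and i - block_start < block_limit:
            i += 1
        run = i - block_start
        # Collect the literals that follow, within the same block budget
        literal_start = i
        while i < n and data[i] != s and i - block_start < block_limit:
            i += 1
        segments.append((run, data[literal_start:i]))
    return segments


S = 0x00
literals = bytes(1 + k % 255 for k in range(500))  # 500 bytes, none equal to S
data = bytes([S]) * 500 + literals

print([(run, len(lit)) for run, lit in split_into_segments(data, S)])
# [(255, 0), (245, 10), (0, 255), (0, 235)]
```

The printed pairs correspond to the first and second lengths of segments 3021, 3031, 3041, and 3051.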

Similarly, for the case where the original data contains more than one consecutive repetitive data string and the repeated data differ, the present invention adds a type field that indicates the compression type used for the compressed data. For example, a type of 0x00 indicates that the compressed data contains only one kind of consecutive repetitive data S, and a type of 0x01 indicates that the compressed data contains at least two kinds of consecutive repetitive data S1 and S2. The complete compressed data format of the present invention is described in detail later.

In addition, the type field described above also provides a solution for the case where the original data contains no consecutive repetitive data string at all that could be compressed. For example, a type of 0xFF indicates that the compressed data retains the original data in full, entirely uncompressed.
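One way the type value might be chosen is sketched below in Python. The patent gives the three type values but does not say how a consecutive repetitive string is detected, so the minimum run length of 2 and the function name choose_type are assumptions made purely for illustration.

```python
from itertools import groupby

TYPE_SINGLE = 0x00  # one kind of consecutive repetitive data
TYPE_MULTI = 0x01   # at least two kinds of consecutive repetitive data
TYPE_RAW = 0xFF     # original data kept uncompressed


def choose_type(data: bytes, min_run: int = 2) -> int:
    """Pick a type value from the distinct byte values that form runs.

    `min_run` is an assumed threshold for what counts as a run.
    """
    repeated = {value for value, run in groupby(data) if len(list(run)) >= min_run}
    if not repeated:
        return TYPE_RAW
    return TYPE_SINGLE if len(repeated) == 1 else TYPE_MULTI


print(hex(choose_type(b"WBWBWBWB")))   # 0xff
print(hex(choose_type(b"WWWWBWWWW")))  # 0x0
print(hex(choose_type(b"WWWBBBW")))    # 0x1
```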

Figures 4A to 4C are schematic diagrams of the complete compressed data format of the present invention. As shown in Figure 4, the compressed data format falls into one of three compression types: (A) compression of only one kind of consecutive repetitive data, (B) compression of at least two kinds of consecutive repetitive data, and (C) the original data retained in full, entirely uncompressed.

Figure 4A shows the format for one kind of consecutive repetitive data. It comprises a check field (checksum), a version field (version), a type field (type), and one section (section). The check field is used to verify that the compressed data is correct, the version field indicates the version of the software used by the compression algorithm, and the type field is set to 0x00, indicating that only one kind of consecutive repetitive data is compressed, so only one section follows. The section in turn comprises an S-byte field that stores the consecutive repetitive data S, a compressed length field (compressed length) that stores the length of the section, and a plurality of segments, each in the LLD format described above.

Figure 4B shows the format for at least two kinds of consecutive repetitive data. It comprises a check field, a version field, a type field, and at least two sections. The check field and version field are as described above, and the type field is set to 0x01, indicating that at least two kinds of consecutive repetitive data are compressed, so at least two sections follow. Each section comprises an S-byte field that stores the consecutive repetitive data S for that section, a section length field (section length), a compressed length field (compressed length), and a plurality of segments, each in the LLD format described above. In other words, each section compresses only one kind of consecutive repetitive data.

Figure 4C shows the format in which the original data is retained in full, entirely uncompressed. It comprises a check field, a version field, a type field, a reserved field (reserved), a compressed length field, and a data area. The check field and version field are as described above, and the type field is set to 0xFF, indicating that the original data is retained in full and entirely uncompressed. The reserved field is kept to correspond to the S-byte field, and the data area is a complete copy of the original data block. The type field therefore determines the format and number of the sections in the complete compressed data.
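The layouts of Figures 4A and 4C might be assembled as in the following Python sketch. The patent names the fields but not their sizes or encodings, so everything concrete here is an assumption: one byte each for the check, version, type, S-byte, and reserved fields, a two-byte big-endian compressed length counting the bytes that follow it, and a checksum computed as the sum of all following bytes modulo 256. The function names are hypothetical.

```python
import struct

TYPE_SINGLE, TYPE_RAW = 0x00, 0xFF  # type values given in the description


def build_single_section(s: int, segments: list, version: int = 1) -> bytes:
    """Figure 4A layout: [check][version][type][S-byte][length][segments...]."""
    body = b"".join(segments)
    # Assumed: the compressed length counts the segment bytes that follow it
    section = bytes([s]) + struct.pack(">H", len(body)) + body
    payload = bytes([version, TYPE_SINGLE]) + section
    checksum = sum(payload) & 0xFF  # assumed checksum: byte sum modulo 256
    return bytes([checksum]) + payload


def build_raw(data: bytes, version: int = 1) -> bytes:
    """Figure 4C layout: [check][version][type][reserved][length][data]."""
    payload = bytes([version, TYPE_RAW, 0x00]) + struct.pack(">H", len(data)) + data
    return bytes([sum(payload) & 0xFF]) + payload


# Two LLD segments as in the Figure 2 example, with placeholder literal bytes
blob = build_single_section(ord("S"), [b"\x07\x06abcdef", b"\x05\x09ghijklmno"])
print(len(blob))  # 25: 6 header bytes plus 19 segment bytes
```

A type 0x01 container would repeat the section part once per repeated byte value, with the additional section length field described for Figure 4B.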

Figure 5 is a flowchart of the data compression method of the present invention. As shown in Figure 5, step 501 divides original data into a plurality of blocks. The division is based on consecutive repetitive data: each block may be a consecutive repetitive data string followed by a non-consecutive repetitive data string, a consecutive repetitive data string alone, or a non-consecutive repetitive data string alone, and the total length of each block is less than 255 bytes. Step 502 compresses each block separately and stores the compressed data in the segment format. The compression stores the consecutive repetitive data in one byte, represents the length of the consecutive repetitive data string in one byte, represents the length of the non-consecutive repetitive data string in one byte, and copies the non-consecutive repetitive data into the data area. Step 503 groups all consecutive related segments into a section. Step 504 constructs the complete compressed data, for example by setting the check field, version field, and type field and filling in at least one section or data area.

Note that during decompression, the sections that follow can be interpreted according to the type field, and each segment or data area within the at least one section is then restored to the original data. For example, a segment with first length 50 and second length 30 decompresses to its original data string: 50 bytes of the consecutive repetitive data S (stored in the S-byte field), followed by the 30 non-consecutive repetitive bytes stored in the segment's data area.
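Segment-by-segment decompression, including the use of data while decompression is still in progress, can be sketched with a Python generator. The function name iter_decompressed is hypothetical, the input is assumed to be the well-formed, concatenated LLD segments of a single section, and the literal byte values are placeholders.

```python
from typing import Iterator


def iter_decompressed(s: int, segment_bytes: bytes) -> Iterator[bytes]:
    """Yield the original bytes of each LLD segment as soon as it is decoded."""
    i = 0
    while i < len(segment_bytes):
        run_length = segment_bytes[i]          # first length
        literal_length = segment_bytes[i + 1]  # second length
        literals = segment_bytes[i + 2 : i + 2 + literal_length]
        yield bytes([s]) * run_length + literals
        i += 2 + literal_length


# The example above: first length 50, second length 30
segment = bytes([50, 30]) + bytes(range(1, 31))
for chunk in iter_decompressed(ord("S"), segment):
    # Each chunk can be consumed here before later segments are decoded
    print(len(chunk))  # 80
```

Because each segment is independent once S is known, a consumer can process one chunk at a time without holding the whole decompressed output in memory.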

Compared with the prior art, the embodiments of the present invention described above have the following advantages:

1. The structure is simple, so it can be used, for example, in a system boot program.

2. It compresses consecutive repetitive data effectively and achieves a high compression ratio, saving memory space while also reducing data loading time. It also copes effectively with data containing no consecutive repetitive strings by leaving it uncompressed, avoiding wasted time and increased data volume.

3. Data can be used while it is being decompressed; that is, the portion already decompressed can be used while decompression continues, which improves efficiency, rather than the data being usable only after all of it has been decompressed.

Therefore, the data compression method of the present invention can, through the disclosed techniques, achieve the intended objects and effects, and satisfies the patentability requirements of novelty, inventive step, and industrial applicability.

The foregoing describes only embodiments of the present disclosure and does not limit the scope of its implementation. All equivalent changes and modifications made within the scope of the claims of the present invention fall within the scope covered by the patent.

110 ... CPU

120 ... Flash memory

130 ... FPGA/DSP

140 ... Memory

201 ... Original data

202 ... Segment

203 ... Segment

301 ... Original data

302 ... Block

303 ... Block

304 ... Block

305 ... Block

3021 ... Segment

3031 ... Segment

3041 ... Segment

3051 ... Segment

501 ... Step

502 ... Step

503 ... Step

504 ... Step

Figure 1 is a schematic diagram of a conventional access architecture for compressed data.

Figure 2 is a schematic diagram of the compression principle used by the data compression method of the present invention.

Figure 3 is a schematic diagram of dividing original data into a plurality of blocks according to the invention.

Figures 4A to 4C are schematic diagrams of the complete compressed data format of the present invention.

Figure 5 is a flowchart of the data compression method of the present invention.

Claims

1. A data compression method, comprising the steps of: dividing original data into a plurality of blocks, the division being based on consecutive repetitive data; compressing each block separately, the compression comprising storing the consecutive repetitive data in one byte, representing the length of the consecutive repetitive data string in one byte, representing the length of the non-consecutive repetitive data string in one byte, and copying the non-consecutive repetitive data; grouping consecutive related segments into a section; and constructing the complete compressed data by filling in at least one section.

2. The method of claim 1, wherein each block produced by the division is one of the following three: a consecutive repetitive data string followed by a non-consecutive repetitive data string, a consecutive repetitive data string, or a non-consecutive repetitive data string.

3. The method of claim 1, wherein the total length of each block is less than 255 bytes.

4. The method of claim 1, wherein the compressed data is stored in a segment format, the segment format comprising a first length, a second length, and a data area, the first length representing the length of the consecutive repetitive data string, the second length representing the length of the non-consecutive repetitive data string, and the data area holding a copy of the non-consecutive repetitive data.

5. The method of claim 1, wherein the format of each section comprises an S-byte field, a compressed total length field, and at least one segment, the S-byte field storing the consecutive repetitive data and the compressed total length field storing the length of the section.

6. The method of claim 1, wherein the format of each section comprises a reserved field, a compressed total length field, and a data area, this section format being used when the original data contains no consecutive repetitive data, so that the original data is retained without any compression.

7. The method of claim 1, wherein the complete compressed data further comprises a check field, a version field, and a type field, the type field being used to determine the format and number of the sections in the complete compressed data.