TWI287362B

TWI287362B - Compressing method for statistical data characteristics by finite exhaustive optimization

Info

Publication number: TWI287362B
Application number: TW93141518A
Authority: TW
Inventors: Fred Chen; White Zhang; Harley Yan
Original assignee: Inventec Besta Co Ltd
Priority date: 2004-12-30
Filing date: 2004-12-30
Publication date: 2007-09-21
Also published as: TW200623657A

Abstract

A compression method that applies finite exhaustive optimization to the statistical characteristics of data is disclosed. First, the repetition frequency of identical character strings of specific lengths in the data is counted. Next, a finite exhaustive search is used to find a substitution length range, and the repeated language units within that range are replaced with serial numbers. The non-repeated units are encoded with a Huffman compression algorithm. When the data are dictionary data, the block structure of the dictionary database is used to divide the large data set into small blocks that are compressed separately, which speeds up data lookup and improves compression efficiency without increasing the time complexity of decompression.

Description

IX. Description of the invention:

[Technical Field]

The present invention relates to a data compression method, and more particularly to a compression method that performs finite exhaustive optimization on the statistical characteristics of data.

[Prior Art]

The electronics industry is developing rapidly, and high-tech products such as computers, mobile phones, and personal digital assistants (PDAs) change with each passing day. As handheld consumer electronic products become widely used, users demand more and more of them; future handheld products will have to provide very large bodies of knowledge and offer increasingly diverse services. However, current handheld consumer electronic products, and embedded devices in particular, have limited resources because of their small size, namely limited random access memory (RAM) and central processing unit (CPU) capability, which makes storing and quickly reading large volumes of data a problem. This is especially true in processing such as data compression.

A conventional approach is to use the Huffman compression algorithm, that is, to encode the characters uniformly, together with substitution of language units of a given length. For data with a high repetition frequency in particular, such as dictionary data, storage space is wasted unless an optimal compression scheme is devised around the characteristics of the data itself.

[Summary of the Invention]

In view of the above problems, the main objective of the present invention is to provide a compression method that performs finite exhaustive optimization on the statistical characteristics of data, so as to improve the efficiency of compression and decompression.

The compression method disclosed by the present invention uses a finite exhaustive search to fully compress the repeated language units in the data, thereby improving the data compression ratio and ensuring the adaptability of the compression algorithm.
The compression method disclosed by the present invention divides the data into blocks and compresses them so as to identify and eliminate the correlation between block units, which makes random decompression of individual block units possible.

The compression method disclosed by the present invention can correctly compress and decompress large volumes of data while requiring little memory, which makes it suitable for resource-limited embedded systems.

To achieve the above objectives, the compression method disclosed by the present invention includes the following steps. First, the data to be compressed is preprocessed to obtain preprocessed data. Next, character statistics are gathered on the preprocessed data to obtain a table of language units and their frequencies. A finite exhaustive search is then performed on this table to obtain a substitution length range. Repeated language units in the preprocessed data are then substituted according to the substitution length range to obtain substitution information data. After that, the frequencies of the non-substituted characters in the preprocessed data are counted according to the substitution information data to obtain a Huffman tree. Finally, the preprocessed data is encoded according to the substitution information data and the Huffman tree, using the Lempel-Ziv-Storer-Szymanski (LZSS) compression algorithm and the Huffman compression algorithm, to obtain compressed data.

When the data is a Unicode file, the preprocessing step performs long-bytecode replacement: codes whose value is less than 0x80 first have their high byte removed; the remaining codes are sorted by frequency of use; a predetermined number of the most frequently used code values are replaced with the code values 0x80 to 0xFE; and each remaining code value is replaced with 0xFF followed by a two-byte code. The predetermined number may be 127.
When the data is a large-capacity database, such as dictionary data, the preprocessing step ends by recording block marks in the data to obtain block information. During the final encoding, the data is first divided into blocks according to the block information, and the resulting small blocks are then compressed.

The features and implementation of the present invention are described in detail below through a preferred embodiment, with reference to the drawings.

[Embodiments]

Specific embodiments are given below to explain the invention in detail, with the drawings as an aid. The reference symbols mentioned in the description refer to the symbols in the drawings.

The present invention can be applied to embedded devices such as e-books, portable global positioning devices, Internet-capable mobile phones, personal digital assistants (PDAs) with wireless transmission capability, wearable computers, and other portable electronic devices. There, the optimized compression algorithm delivers high compression efficiency with very low space complexity (that is, low memory usage, which can be as little as one part in tens of thousands or even hundreds of millions of the database size), and with lower time complexity than algorithms of the same type (that is, it is fast, and even on a low-speed processor it can decompress any data that needs to be browsed in near real time).
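The idea of compressing dictionary data block by block, so that any one block can be decompressed on its own, can be sketched as follows. This is an illustration only: Python's standard `zlib` (DEFLATE) is used as a stand-in for the hybrid LZSS and Huffman coder of the invention, and the `(offset, size)` index layout and the function names are assumptions rather than details from the patent.

```python
import zlib

def compress_blocks(blocks):
    """Compress each block independently and build an address index."""
    index, payload = [], bytearray()
    for block in blocks:
        packed = zlib.compress(block)
        index.append((len(payload), len(packed)))   # (offset, size) of this block
        payload += packed
    return index, bytes(payload)

def read_block(index, payload, n):
    """Decompress only block n, without touching any other block."""
    offset, size = index[n]
    return zlib.decompress(payload[offset:offset + size])

# Usage: look up one dictionary entry without decompressing the rest.
entries = [b"apple: a fruit", b"banana: another fruit", b"cherry: a small fruit"]
index, payload = compress_blocks(entries)
print(read_block(index, payload, 1))
```

Because no block depends on data from another block, memory use during lookup is bounded by the size of a single block, which is the property that matters on a resource-limited device.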
Referring to Fig. 1, the data to be compressed is first preprocessed (step 110). Character statistics are then gathered on the preprocessed data to obtain the table of language units and their frequencies (step 120). A finite exhaustive search is performed on this table to obtain the substitution length range (step 130). Repeated language units in the preprocessed data are substituted according to the substitution length range to obtain the substitution information data (step 140). The frequencies of the non-substituted characters in the preprocessed data are counted according to the substitution information data to obtain a Huffman tree (step 150). Finally, the preprocessed data is encoded according to the substitution information data and the Huffman tree, using the LZSS compression algorithm and the Huffman compression algorithm, to obtain the compressed data (step 160).

Referring to Fig. 2, the preprocessing step (step 110) includes the following steps. First, it is determined whether the data to be compressed is file data in the two-byte (16-bit) international standard encoding, that is, Unicode, or file data in another encoding, such as a local code page (ANSI) (step 111). In other words, the encoding type of the data is checked for Unicode.

When the data to be compressed is Unicode file data, long-bytecode replacement is performed as follows. Codes in the Unicode file data whose value is less than 0x80 have their high byte removed (step 112). The remaining codes are sorted by frequency of use (step 113). A predetermined number of the most frequently used code values are replaced with the code values 0x80 to 0xFE (step 114). Each remaining code value is replaced with a 0xFF marker followed by a two-byte code (step 115). Once the long-bytecode replacement of the Unicode file data is complete, the data type is checked, that is, it is determined whether the data to be compressed is dictionary data (step 116).
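Step 150 of Fig. 1 builds a Huffman tree from the frequencies of the characters that were not substituted. The sketch below shows the standard Huffman construction on a symbol sequence; it returns the code length of each symbol instead of an explicit tree, and the function name and that return format are illustrative choices, not taken from the patent. The input is assumed to be whatever symbols remain after the substitution of step 140.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over symbols."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): 1}       # a lone symbol still needs one bit
    # Heap entries: (weight, unique tiebreak, {symbol: depth so far}).
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two lightest subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

print(huffman_code_lengths("aaaabbc"))      # {'c': 2, 'b': 2, 'a': 1}
```

Frequent symbols receive short codes, so the seven symbols in the example cost 4 × 1 + 2 × 2 + 1 × 2 = 10 bits instead of 56 bits at one byte each.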
The predetermined number in step 114 may be 127.

When the data to be compressed is not Unicode file data, the data type check is performed directly, that is, it is determined whether the data to be compressed is dictionary data (step 116).

When the data to be compressed is dictionary data, block marks are recorded in the data to obtain block information (step 117), which completes the preprocessing step. When the data to be compressed is not dictionary data, no further processing is performed and the preprocessing step is complete.

Referring to Fig. 3, in step 120 the positions of all identical characters in the preprocessed data are first counted (step 122), and all identical characters are then sorted by the characters that follow them to obtain the table of language units and their frequencies (step 124).

Referring to Fig. 4, the finite exhaustive search (step 130) proceeds as follows. First, all repeated language units above a reference length range are found in the table of language units and their frequencies and recorded by length, which yields a plurality of specific length ranges; the span from the reference length range to the largest specific length range is the optimization range (step 132). For each specific length range within the optimization range, the language units are substituted into the text successively from long to short, and the length saved is recorded (step 134). The largest saved length is then looked up (step 136), and the specific length range that corresponds to it is taken as the substitution length range (step 138). The optimization range may run from the reference value to the length of the longest repeated language unit.

Step 140 generates the substitution file for the repeated language units according to the substitution length range. That is, the repeated language units within the substitution length range are substituted and numbered in order from long to short, the encoded result is output to the substitution information data, and the repeated language units are recorded.
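The search of steps 132 to 138 can be sketched as below. The patent text does not spell out the cost model, so this sketch makes several labeled assumptions: each serial-number reference costs a fixed `TOKEN_COST` characters, each substituted unit is stored once in a side table, candidate ranges share the longest repeated length as their upper bound and differ only in their lower bound, and a sentinel character that never occurs in the input marks substituted spans. All names are hypothetical.

```python
from collections import Counter

TOKEN_COST = 2      # assumed size, in characters, of one serial-number reference
SENTINEL = "\0"     # assumed absent from the input; marks substituted spans

def repeated_units(text, length):
    """Substrings of the given length that occur at least twice."""
    counts = Counter(text[i:i + length] for i in range(len(text) - length + 1))
    return [u for u, c in counts.items() if c >= 2 and SENTINEL not in u]

def saving_for_range(text, lo, hi):
    """Characters saved by substituting every repeated unit of length hi down to lo."""
    saved = 0
    for length in range(hi, lo - 1, -1):                 # long to short
        for unit in repeated_units(text, length):
            n = text.count(unit)                         # non-overlapping occurrences left
            if n < 2:
                continue
            saved += n * (length - TOKEN_COST) - length  # the unit itself is stored once
            text = text.replace(unit, SENTINEL * TOKEN_COST)
    return saved

def find_substitution_range(text, base):
    """Try every lower bound from base upward and keep the range that saves the most."""
    if not repeated_units(text, base):
        return None
    hi = base
    while repeated_units(text, hi + 1):                  # length of the longest repeated unit
        hi += 1
    best = None
    for lo in range(base, hi + 1):
        saved = saving_for_range(text, lo, hi)
        if best is None or saved > best[2]:
            best = (lo, hi, saved)
    return best

print(find_substitution_range("abcdeXabcdeYzzQzz", 2))   # (3, 5, 1)
```

In the example, substituting the five-character unit `abcde` saves one character, while also substituting the two-character unit `zz` would cost more than it saves, so the search settles on the range 3 to 5 instead of 2 to 5. The search is finite because the candidate lengths are bounded by the longest repeated unit.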
The final data compression step (step 160) first applies mixed LZSS and Huffman coding to the preprocessed data according to the substitution information data and the Huffman tree, and after encoding stores a plurality of items of information to obtain the compressed data. When the data is a large-capacity database, such as dictionary data, the data is divided according to the block information during encoding to obtain block data, and the small blocks are then compressed. Here, the stored information includes the repeated language units, the block information, the Huffman tree, and so on.

A specific example is described in detail below. Consider a Britannica English-Chinese-Japanese-Korean dictionary whose original data length is 45,776,158 bytes. First, the data length after preprocessing is [illegible]. Next, the repeated language units within the specific length range (that is, 0x7F) and their frequencies are counted and stored in a file. The finite exhaustive search then yields a table relating each specific length to the length saved, as shown in Fig. 5, from which the largest saved length is found to be 16,287,[illegible] bytes, along with the corresponding specific range [illegible]. The repeated language units within the substitution length range are numbered from long to short; the resulting repeated language units occupy 491,862 bytes, that is, the substitution information data is 491,862 bytes. In addition, to handle dictionary-type large-capacity databases, the data is divided into blocks and an address index is built at the head of each block; the address index is stored in an .ldx file whose length is [illegible]. After this work is done, the data is compressed, giving a compressed result of 12,115,479 bytes, whereas compressing the same dictionary data with the conventional algorithm gives a data length of [illegible].
Fig. 6 shows the comparison table of compression results for the Britannica English-Chinese-Japanese-Korean dictionary. The data is divided into 24,862 blocks and the compression ratio is 26.47%, whereas the conventional compression method gives a compression ratio of [illegible].5%. Compressing data with the present invention therefore improves the compression ratio significantly.

For the Oxford dictionary data built into a handheld electronic product, the comparison table of compression results is shown in Fig. 7. The original data length is 22,580,376 bytes and the compressed data length is 4,505,792 bytes, whereas the conventional compression method produces 5,[illegible],223 bytes. Here the data is divided into 146,292 blocks and the compression ratio is 19.95%, against 22.54% for the conventional compression method. Applying the present invention therefore gives the better compression result.

As the above examples show, an embodiment of the present invention is carried out as follows: the repetition frequency of identical character strings of specific lengths in the data is first counted; a finite exhaustive search then finds a substitution length range, and the repeated language units within that range are replaced with serial numbers; the non-repeated language units are encoded with the Huffman compression algorithm.
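The two compression ratios quoted for the present method can be checked directly from the byte counts given in the examples, taking the ratio to mean compressed size divided by original size. The helper name below is illustrative.

```python
def compression_ratio_percent(compressed_bytes, original_bytes):
    """Compressed size as a percentage of the original size, to two decimals."""
    return round(100 * compressed_bytes / original_bytes, 2)

# Byte counts taken from the two dictionary examples.
britannica = compression_ratio_percent(12_115_479, 45_776_158)
oxford = compression_ratio_percent(4_505_792, 22_580_376)
print(britannica, oxford)   # 26.47 19.95
```

Both values agree with the ratios stated in the text, so a lower percentage here means better compression.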
Because the compressed data still retains the block structure of the dictionary-type database, and the data in different blocks are mutually independent, data lookup is faster, which achieves the objective of improving compression efficiency without increasing the time complexity of decompression. Using the method of the present invention thus improves on the efficiency of the conventional compression algorithm and, for very large data sets, especially dictionary data in which character strings repeat with high frequency, compresses the data faster and more efficiently.

Although the present invention is disclosed above by way of preferred embodiments, they are not intended to limit the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of patent protection is defined by the claims appended to this specification.

[Brief Description of the Drawings]

Fig. 1 is a flowchart of a compression method that performs finite exhaustive optimization on the statistical characteristics of data according to an embodiment of the present invention;
Fig. 2 is a detailed flowchart of step 110 in Fig. 1;
Fig. 3 is a detailed flowchart of step 120 in Fig. 1;
Fig. 4 is a detailed flowchart of step 130 in Fig. 1;
Fig. 5 is a table of the relationship between the maximum length and the length saved, produced when an embodiment of the present invention is applied to compress the Britannica English-Chinese-Japanese-Korean dictionary data;
Fig. 6 is a comparison table of the compression results obtained when an embodiment of the present invention and a conventional compression method are applied to the Britannica English-Chinese-Japanese-Korean dictionary data; and
Fig. 7 is a comparison table of the compression results obtained when an embodiment of the present invention and a conventional compression method are applied to the Oxford dictionary data.
[Description of Main Reference Symbols]

Step 110: preprocessing
Step 120: character statistics, to obtain the table of language units and their frequencies
Step 130: finite exhaustive search, to obtain the substitution length range
Step 140: substitution, to obtain the substitution information data
Step 150: statistics, to obtain the Huffman tree
Step 160: encoding, to obtain the compressed data
Step 111: is the data Unicode file data?
Step 112: remove the high byte of codes whose value is less than 0x80
Step 113: sort the remaining codes by frequency of use
Step 114: replace a predetermined number of the most frequently used code values with the code values 0x80 to 0xFE
Step 115: replace each remaining code value with a 0xFF marker followed by a two-byte code
Step 116: is the data dictionary data?
Step 117: record block marks
Step 122: count the positions of all identical characters
Step 124: sort by subsequent characters, to obtain the table of language units and their frequencies
Step 132: determine the optimization range
Step 134: substitute successively from long to short and record the length saved
Step 136: find the largest saved length
Step 138: take the corresponding specific length range as the substitution length range


Claims

X. Scope of the patent application:

1. A compression method that performs finite exhaustive optimization on the statistical characteristics of data, comprising the following steps:
preprocessing a data;
gathering character statistics on the preprocessed data to obtain a table of language units and their frequencies;
performing a finite exhaustive search according to the table of language units and their frequencies to obtain a substitution length range;
substituting more than one repeated language unit in the preprocessed data according to the substitution length range to obtain substitution information data;
counting the frequencies of the non-substituted characters in the preprocessed data according to the substitution information data to obtain a Huffman tree; and
encoding the preprocessed data according to the substitution information data and the Huffman tree, using the LZSS compression algorithm and the Huffman compression algorithm, to obtain compressed data.

2. The compression method that performs finite exhaustive optimization on the statistical characteristics of data as described in claim 1, wherein when the data is file data in a two-byte international standard encoding and is dictionary data, the step of preprocessing the data includes the following steps:
removing the high byte of the codes in the data whose code value is less than 0x80;
sorting the codes in the data, other than those whose high byte was removed, by frequency of use;
replacing a predetermined number of the most frequently used code values among the sorted codes with the code values 0x80 to 0xFE;
replacing the sorted codes other than the replaced ones with a marker followed by a two-byte code to obtain long-bytecode replacement data; and
recording block marks in the long-bytecode replacement data to obtain the long-bytecode replacement data and more than one piece of block information, wherein the long-bytecode replacement data is the preprocessed data.

3. The compression method that performs finite exhaustive optimization on the statistical characteristics of data as described in claim 2, wherein the predetermined number is 127.

4. The compression method that performs finite exhaustive optimization on the statistical characteristics of data as described in claim 1, wherein when the data is file data in a two-byte international standard encoding and is dictionary data, the step of preprocessing the data yields more than one piece of block information and the preprocessed data, the preprocessed data being long-bytecode replacement data, and the step of encoding the preprocessed data according to the substitution information data and the Huffman tree, using the LZSS compression algorithm and the Huffman compression algorithm, to obtain the compressed data includes the following steps:
dividing the long-bytecode replacement data according to the block information to obtain a plurality of block data;
applying mixed coding replacement of the LZSS compression algorithm and the Huffman compression algorithm to the block data according to the substitution information data and the Huffman tree; and
after encoding, storing a plurality of pieces of information to obtain the compressed data.

5. The compression method that performs finite exhaustive optimization on the statistical characteristics of data as described in claim 4, wherein the information includes the repeated language units, the block information, and the Huffman tree.

6. The compression method that performs finite exhaustive optimization on the statistical characteristics of data as described in claim 1, wherein when the data is dictionary data, the step of preprocessing the data includes recording block marks in the data to obtain more than one piece of block information and the preprocessed data.

7. The compression method that performs finite exhaustive optimization on the statistical characteristics of data as described in claim 1, wherein when the data is dictionary data, the step of preprocessing the data yields more than one piece of block information and the preprocessed data, and the step of encoding the preprocessed data according to the substitution information data and the Huffman tree, using the LZSS compression algorithm and the Huffman compression algorithm, to obtain the compressed data includes the following steps:
dividing the preprocessed data according to the block information to obtain a plurality of block data;
applying mixed coding replacement of the LZSS compression algorithm and the Huffman compression algorithm to the block data according to the substitution information data and the Huffman tree; and
after encoding, storing a plurality of pieces of information to obtain the compressed data.

8. The compression method that performs finite exhaustive optimization on the statistical characteristics of data as described in claim 7, wherein the information includes the repeated language units, the block information, and the Huffman tree.

9. The compression method that performs finite exhaustive optimization on the statistical characteristics of data as described in claim [illegible], [text missing in the source]

[claim number illegible] [...] wherein the step of performing the finite exhaustive search includes the following steps:
finding, in the table of language units and their frequencies, a plurality of repeated language units above a reference length range and recording them by length to obtain a plurality of specific length ranges;
obtaining an optimization range according to the reference length range and the largest of the specific length ranges;
substituting the language units of each specific length range within the optimization range successively from long to short and recording the length saved;
finding the largest saved length according to the saved lengths; and
taking the specific length range that corresponds to the largest saved length as the substitution length range.

13. The compression method that performs finite exhaustive optimization on the statistical characteristics of data as described in claim [illegible], wherein the step of substituting more than one repeated language unit in the preprocessed data according to the substitution length range to obtain the substitution information data includes numbering the repeated language units within the substitution length range from long to short and outputting the encoded result to the substitution information data.

14. The compression method that performs finite exhaustive optimization on the statistical characteristics of data as described in claim 1, wherein the step of encoding the preprocessed data according to the substitution information data and the Huffman tree, using the LZSS compression algorithm and the Huffman compression algorithm, to obtain the compressed data includes the following steps:
applying mixed coding replacement of the LZSS compression algorithm and the Huffman compression algorithm to the preprocessed data according to the substitution information data and the Huffman tree; and
after encoding, storing a plurality of pieces of information to obtain the compressed data.

15. The compression method that performs finite exhaustive optimization on the statistical characteristics of data as described in claim 12, wherein the information includes the repeated language units and the Huffman tree.
