CN101478311B - Hardware accelerated implementation process for bzip2 compression algorithm - Google Patents

Publication number
CN101478311B
Authority
CN
China
Prior art keywords
input
hardware accelerator
register
byte
output
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009100955967A
Other languages
Chinese (zh)
Other versions
CN101478311A (en)
Inventor
陈天洲
严力科
胡威
王罡
冯德贵
吴斌斌
陈度
王勇刚
刘敬伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2009-01-22
Filing date
2009-01-22
Publication date
2010-10-20
Application filed by Zhejiang University ZJU
Priority to CN2009100955967A
Publication of CN101478311A
Application granted
Publication of CN101478311B
Current legal status: Expired - Fee Related

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a hardware-accelerated implementation method for the bzip2 compression algorithm, in which a hardware accelerator performs the move-to-front transform and run-length encoding, the two stages that consume a large share of the run time, so as to speed up compression. The method has two characteristics. First, the input and output buffers of the hardware accelerator serve as the communication interface with a general-purpose computing system; software prepares the input data for the hardware accelerator and collects and reads its output data, which simplifies the design of the accelerator. Second, the move-to-front transform and run-length encoding are implemented in hardware with a fully unrolled 2048-bit parallel comparator and a shifter, which speeds up program execution, raises the data compression speed of the bzip2 algorithm, and effectively improves program performance.

Description

Hardware-accelerated implementation method for the bzip2 compression algorithm
Technical field
The present invention relates to the fields of software-hardware co-design and data compression, and in particular to a hardware-accelerated implementation method for the bzip2 compression algorithm.
Background art
With the application of new materials and the development of new technologies, VLSI technology has made great progress, laying a good foundation for the development of chip multiprocessors (Chip Multi-Processor, CMP). A CMP integrates multiple computing cores into one processor chip, thereby improving computing capability. Depending on whether the computing cores are equivalent, CMPs can be divided into homogeneous multi-core and heterogeneous multi-core designs.
In the coming years the number of processing cores will keep growing. However, as more and more cores are integrated on a single chip, adding further cores brings diminishing performance gains, and general-purpose processors increasingly struggle to meet the demands of converged applications. More and more multi-core processors are therefore turning to the SoC architecture, that is, the heterogeneous multi-core architecture. A growing number of research institutions are studying heterogeneous multi-core processors, and this research covers every aspect of such systems, for example optimizing the structure of the processing cores, thread allocation and migration on heterogeneous multi-core processors, and CPU+DSP multi-core structures for video and audio processing. Some commercial processors have also begun to adopt heterogeneous architectures, or to customize dedicated accelerators for specific applications.
Bzip2 achieves higher compression efficiency than the traditional gzip or ZIP, but it compresses more slowly. In this respect it closely resembles some other recently introduced compression algorithms. Unlike RAR or ZIP, bzip2 is a data compression tool rather than an archiving tool, and in this it is similar to gzip. The program itself contains no facilities for handling multiple files, encryption, or archive splitting; following UNIX tradition, external tools such as tar or GnuPG are used for those tasks.
Bzip2 uses the Burrows-Wheeler transform to convert recurring character sequences into strings of identical letters, then processes the result with the move-to-front transform, and finally compresses it with Huffman coding. In bzip2 all data blocks are plaintext blocks of equal size, which can be selected with a command-line option, and the blocks are then marked in the compressed text by a bit sequence taken from the decimal representation of π.
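As a point of reference for the move-to-front stage named above, the following is a minimal software sketch in Python. It is illustrative only and is not taken from the bzip2 sources; the function name move_to_front is hypothetical, and the symbol list is assumed to start with the byte values 0 to 255 in order.

```python
def move_to_front(data: bytes) -> list[int]:
    """Return, for each input byte, its position in a self-adjusting symbol list."""
    table = list(range(256))       # assumed initial order: 0, 1, ..., 255
    positions = []
    for byte in data:
        idx = table.index(byte)    # where the byte currently sits in the list
        positions.append(idx)
        table.pop(idx)             # remove it from that position ...
        table.insert(0, byte)      # ... and move it to the front
    return positions


print(move_to_front(b"aaabbb"))
```

Repeated bytes map to position 0 after their first occurrence, which is why a run-length stage for zeros follows naturally.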
Although bzip2 compresses more efficiently than gzip or zip, its slower compression speed limits its range of application. With the development of VLSI technology and the growing number of transistors on a chip, a dedicated accelerator customized for bzip2 can speed up its compression process.
Summary of the invention
To meet the ever-rising demand for computing performance, and to raise the compression speed of the bzip2 algorithm by having a customized dedicated accelerator execute the algorithm's hotspot functions, the object of the present invention is to provide a hardware-accelerated implementation method for the bzip2 compression algorithm.
The technical solution adopted by the present invention to solve its technical problem is:
A hardware-accelerated implementation method for the bzip2 compression algorithm:
1) Software manages the input and output of the hardware accelerator:
The hardware accelerator uses its input and output buffers as the communication interface with the general-purpose computing system;
Software directly accesses the input and output buffers of the hardware accelerator, prepares the input data for the hardware accelerator, and collects and reads the output data:
1. Before the hardware accelerator begins computing, software organizes the accelerator's input data and writes it to the accelerator's input buffer;
2. After the hardware accelerator finishes computing, software retrieves the accelerator's output data from the buffer and writes it back to system memory;
2) The hardware accelerator implements the move-to-front transform and run-length encoding
The hardware accelerator mainly comprises a register group, a 2048-bit parallel comparator, a 2048-bit shifter, a 256-to-8 encoder, and a length encoder;
The register group comprises the local storage, the local cache, a current byte register, a current address register, an output address register, a consecutive identical byte counter, and a 2048-bit character list register;
The specific implementation steps are as follows:
1. Read the content at the current address from the input buffer into the current byte register, and increment the current address by 1;
2. Feed the contents of the current byte register and the character list register into the 2048-bit parallel comparator and compare them in parallel;
3. Feed the output of the 2048-bit parallel comparator into the 256-to-8 encoder and encode it;
I. When the encoding result is 00000000, increment the consecutive identical byte counter by 1 and continue with step 1;
II. When the encoding result is not 00000000 and the consecutive identical byte counter is 0, continue with step 4;
III. When the encoding result is not 00000000 and the consecutive identical byte counter is not 0, continue with step 5;
4. Feed the output of the 2048-bit parallel comparator and the character list register into the 2048-bit shifter. Each bit of the comparator output corresponds to one byte of the character list register. The byte pointed to by the '1' in the comparator output is moved to the first byte of the character list register, and the bytes of the character list register that correspond to the '0' bits to the left of that '1' are shifted back by 8 bits; continue with step 6;
5. Feed the count value of the consecutive identical byte counter into the length encoder and perform run-length encoding, then continue with step 4;
6. Write the encoding result of the 256-to-8 encoder to the location in the local storage pointed to by the output address register; if the input data has not yet been fully processed, continue with step 1;
If all the input data has been processed, the hardware accelerator suspends and notifies software to fetch the result data.
The beneficial effects of the present invention are:
First, the input and output buffers of the hardware accelerator serve as the communication interface with the general-purpose computing system, and software prepares the input data for the hardware accelerator and collects and reads its output data, which simplifies the design of the accelerator. Second, the move-to-front transform and run-length encoding, which take up the most time in the whole program, are implemented in hardware, which speeds up program execution, raises the data compression speed of the bzip2 algorithm, and effectively improves program performance.
Description of the drawings
Fig. 1 is an overall flow chart of the present invention.
Fig. 2 is a module diagram of the hardware accelerator of the present invention.
Embodiment
The specific implementation flow, based on a hardware thread execution method for a hybrid processor-and-FPGA architecture, is as follows:
A hardware-accelerated implementation method for the bzip2 compression algorithm, whose concrete steps are shown in Fig. 1:
1) Software manages the input and output of the accelerator
The hardware accelerator uses its input and output buffers as the communication interface with the general-purpose computing system, where a general-purpose computing system means a general-purpose computer typified by the traditional desktop computer. The general-purpose computing system accesses the input and output buffers of the hardware accelerator over the PCI-E bus. In the present invention the input buffer and the output buffer are separate: the input buffer, called the local cache, buffers the input data of the hardware accelerator, and the output buffer, called the local storage, stores the computation results of the hardware accelerator.
Software directly accesses the input and output buffers of the hardware accelerator over the PCI-E bus, prepares the input data for the hardware accelerator, and collects and reads the output data:
1. Before the hardware accelerator begins computing, software organizes the accelerator's input data in system memory, transfers the organized data from system memory to the accelerator's local cache over PCI-E, and then notifies the hardware accelerator to begin computing;
2. After the hardware accelerator finishes computing, it raises an interrupt that notifies software to retrieve the accelerator's output data from the local storage and write it back to system memory.
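The software side of this exchange can be sketched as follows. This is only an illustration of the two steps above, not a PCI-E driver: the class FakeAccelerator, its attributes local_cache and local_storage, the start method, and the callback standing in for the interrupt are all hypothetical names introduced for the example.

```python
class FakeAccelerator:
    """Hypothetical in-process stand-in for the device (no real bus access)."""

    def __init__(self, compute):
        self.local_cache = bytearray()    # input buffer
        self.local_storage = bytearray()  # output buffer
        self._compute = compute           # models the accelerator's datapath

    def start(self, on_done):
        # The device consumes the local cache and fills the local storage.
        self.local_storage = bytearray(self._compute(bytes(self.local_cache)))
        on_done()                         # models the completion interrupt


def run_on_accelerator(acc: FakeAccelerator, block: bytes) -> bytes:
    system_memory = bytearray()

    # Step 1: software copies the prepared input into the local cache.
    acc.local_cache[:] = block

    # Step 2: on the interrupt, software copies the output to system memory.
    def on_interrupt():
        system_memory.extend(acc.local_storage)

    acc.start(on_interrupt)               # notify the accelerator to begin
    return bytes(system_memory)


acc = FakeAccelerator(lambda data: data[::-1])  # placeholder computation
print(run_on_accelerator(acc, b"abc"))
```

Keeping all buffer preparation and collection in software is what lets the accelerator itself stay a simple datapath with two buffers.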
2) The hardware accelerator implements the move-to-front transform and run-length encoding
As shown in the module diagram of Fig. 2, the hardware accelerator comprises the local storage, the local cache, a register group, a 2048-bit parallel comparator, a 2048-bit shifter, a 256-to-8 encoder, and a length encoder;
The 2048-bit parallel comparator has two inputs: one 8-bit input and one 2048-bit input. Its output is 256 bits wide; each output bit represents the result of comparing the 8-bit input with one 8-bit group of the 2048-bit input, and is '1' if the two are identical and '0' otherwise.
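The comparator's behavior can be modeled in software as below. This is a behavioral sketch, not a hardware description: the 2048-bit input is assumed to be represented as 256 bytes with the leftmost byte first, the 256-bit output as a list of 0/1 values, and the name parallel_compare is hypothetical.

```python
def parallel_compare(current_byte: int, char_list: bytes) -> list[int]:
    """Compare one byte against all 256 list entries; 1 marks an equal entry."""
    if len(char_list) != 256:
        raise ValueError("the 2048-bit input must hold exactly 256 bytes")
    # In hardware all 256 comparisons happen at once; here they are a loop.
    return [1 if entry == current_byte else 0 for entry in char_list]


match = parallel_compare(5, bytes(range(256)))
print(match.index(1), sum(match))
```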
The 2048-bit shifter also has two inputs: one 256-bit input and one 2048-bit input. Its output is 2048 bits wide. Each bit of the 256-bit input corresponds to 8 bits of the 2048-bit input. The shifter moves the byte of the 2048-bit input that corresponds to the '1' in the 256-bit input to the first byte, and shifts the bytes of the 2048-bit input that correspond to the '0' bits to the left of that '1' back by 8 bits, producing the 2048-bit output.
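A behavioral sketch of the shifter follows, under the same assumed representation (256 bytes for the 2048-bit value, a 0/1 list for the 256-bit value). The name shift_to_front is hypothetical, and the sketch assumes exactly one bit of the 256-bit input is '1'.

```python
def shift_to_front(match: list[int], char_list: bytes) -> bytes:
    """Move the byte selected by the single '1' in match to the first position."""
    pos = match.index(1)               # position of the selected byte
    selected = char_list[pos:pos + 1]
    # Bytes left of the selected one slide back by one byte (8 bits);
    # bytes to its right stay where they are.
    return selected + char_list[:pos] + char_list[pos + 1:]


match = [0] * 256
match[3] = 1
print(list(shift_to_front(match, bytes(range(256)))[:5]))
```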
The 256-to-8 encoder produces an 8-bit output from the position of the '1' in its 256-bit input; the output value is the position of that '1' within the 256 bits.
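In the same behavioral style, the encoder reduces a one-hot 256-bit vector to an 8-bit position. The name encode_256_to_8 is hypothetical, and rejecting inputs that are not one-hot is an assumption of this sketch rather than something the text specifies.

```python
def encode_256_to_8(match: list[int]) -> int:
    """Return the position (0 to 255) of the single '1' in a 256-bit vector."""
    if len(match) != 256 or sum(match) != 1:
        raise ValueError("expected a 256-bit vector with exactly one '1'")
    return match.index(1)


match = [0] * 256
match[200] = 1
print(format(encode_256_to_8(match), "08b"))
```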
The register group comprises an 8-bit current byte register, a 16-bit current address register, a 16-bit output address register, a 16-bit consecutive identical byte counter, and a 2048-bit character list register. The initial value of the current address register, the output address register, and the consecutive identical byte counter is 0; the character list register initially stores, from left to right in order, the bytes with values 0 to 255.
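The register group and its reset state can be written down as a small data structure. The class and field names below are hypothetical. Because the character list register holds 2048 / 8 = 256 one-byte entries, the sketch assumes the initial contents are the byte values 0 through 255.

```python
from dataclasses import dataclass, field


@dataclass
class RegisterGroup:
    current_byte: int = 0      # 8-bit current byte register
    current_address: int = 0   # 16-bit current address register, starts at 0
    output_address: int = 0    # 16-bit output address register, starts at 0
    run_counter: int = 0       # 16-bit consecutive identical byte counter
    # 2048-bit character list register: 256 bytes, leftmost byte first.
    char_list: bytearray = field(default_factory=lambda: bytearray(range(256)))


regs = RegisterGroup()
print(len(regs.char_list) * 8, regs.char_list[0], regs.char_list[255])
```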
The specific implementation steps are as follows:
1. Read the content at the current address from the local cache into the current byte register, and increment the current address by 1;
2. Feed the contents of the current byte register and the character list register into the 2048-bit parallel comparator and compare them in parallel;
3. Feed the output of the 2048-bit parallel comparator into the 256-to-8 encoder and encode it;
I. When the encoding result is 00000000, increment the consecutive identical byte counter by 1 and continue with step 1;
II. When the encoding result is not 00000000 and the consecutive identical byte counter is 0, continue with step 4;
III. When the encoding result is not 00000000 and the consecutive identical byte counter is not 0, continue with step 5;
4. Feed the output of the 2048-bit parallel comparator and the character list register into the 2048-bit shifter. Each bit of the comparator output corresponds to one byte of the character list register. The byte pointed to by the '1' in the comparator output is moved to the first byte of the character list register, and the bytes of the character list register that correspond to the '0' bits to the left of that '1' are shifted back by 8 bits; continue with step 6;
5. Feed the count value of the consecutive identical byte counter into the length encoder and perform run-length encoding, specifically as follows:
I. If the lowest bit of the consecutive identical byte counter is '1', write 1 to the location in the local storage pointed to by the output address register, and increment the input address register by 1;
II. If the lowest bit of the consecutive identical byte counter is '0', write 0 to the location in the local storage pointed to by the output address register, and increment the input address register by 1;
III. If the value of the consecutive identical byte counter is less than 2, continue with step 4; otherwise subtract 2 from the consecutive identical byte counter, shift it right by 1 bit, and continue with step 5;
6. Write the encoding result of the 256-to-8 encoder to the location in the local storage pointed to by the output address register, and increment the content of the output address register by 1; if the input data has not yet been fully processed, continue with step 1;
If all the input data has been processed, the hardware accelerator suspends and notifies software to fetch the result data.
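The steps above can be traced with a small behavioral model in Python. It is a sketch under stated assumptions, not the patented circuit: the names encode_run and accelerate are hypothetical, the local storage is modeled as a list whose append stands in for "write, then advance the address", the counter is assumed to be cleared after step 5, and a run still pending when the input ends is assumed to be flushed. Run symbols and position codes go into the same output list, as the steps describe.

```python
def encode_run(count: int) -> list[int]:
    """Step 5: emit one symbol per pass, taken from the counter's lowest bit."""
    symbols = []
    while True:
        symbols.append(count & 1)      # 5.I / 5.II: write 1 or 0
        if count < 2:                  # 5.III: finished, go on to step 4
            return symbols
        count = (count - 2) >> 1       # subtract 2, then shift right by 1 bit


def accelerate(block: bytes) -> list[int]:
    char_list = bytearray(range(256))  # character list register, reset state
    run_counter = 0                    # consecutive identical byte counter
    output = []                        # models the local storage
    for current_byte in block:         # step 1: read byte, advance address
        match = [int(e == current_byte) for e in char_list]  # step 2
        code = match.index(1)          # step 3: 256-to-8 encoder
        if code == 0:                  # 3.I: byte is already at the front
            run_counter += 1
            continue
        if run_counter:                # 3.III: encode the pending run first
            output.extend(encode_run(run_counter))
            run_counter = 0            # assumption: counter cleared after use
        del char_list[code]            # step 4: move the byte to the front
        char_list.insert(0, current_byte)
        output.append(code)            # step 6: write the position code
    if run_counter:                    # assumption: flush a trailing run
        output.extend(encode_run(run_counter))
    return output


print(accelerate(b"\x01\x01\x01\x02"))
```

For the four-byte input shown, the first byte yields position 1, the next two bytes form a run of length 2 that is encoded as two 0 symbols, and the last byte yields position 2.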

Claims (1)

1. A hardware-accelerated implementation method for the bzip2 compression algorithm, characterized in that:
1) Software manages the input and output of the hardware accelerator:
The hardware accelerator uses its input and output buffers as the communication interface with the general-purpose computing system;
Software directly accesses the input and output buffers of the hardware accelerator, prepares the input data for the hardware accelerator, and collects and reads the output data:
1. Before the hardware accelerator begins computing, software organizes the accelerator's input data and writes it to the accelerator's input buffer;
2. After the hardware accelerator finishes computing, software retrieves the accelerator's output data from the output buffer and writes it back to system memory;
2) The hardware accelerator implements the move-to-front transform and run-length encoding:
The input buffer and the output buffer are separate; the input buffer, called the local cache, buffers the input data of the hardware accelerator, and the output buffer, called the local storage, stores the computation results of the hardware accelerator; the hardware accelerator comprises the local storage, the local cache, a register group, a 2048-bit parallel comparator, a 2048-bit shifter, a 256-to-8 encoder, and a length encoder;
The register group comprises a current byte register, a current address register, an output address register, a consecutive identical byte counter, and a 2048-bit character list register; wherein the character list register initially stores, from left to right in order, the bytes with values 0 to 255;
The specific implementation steps are as follows:
1. Read the content at the current address from the input buffer into the current byte register, and increment the current address by 1;
2. Feed the contents of the current byte register and the character list register into the 2048-bit parallel comparator and compare them in parallel as follows: the 2048-bit parallel comparator has two inputs, the 8-bit current byte register input and the 2048-bit character list register input; its output is 256 bits wide, each output bit representing the result of comparing the 8-bit input with one 8-bit group of the 2048-bit input, being '1' if the two are identical and '0' otherwise;
3. Feed the output of the 2048-bit parallel comparator into the 256-to-8 encoder and encode it;
I. When the encoding result is 00000000, increment the consecutive identical byte counter by 1 and continue with step 1;
II. When the encoding result is not 00000000 and the consecutive identical byte counter is 0, continue with step 4;
III. When the encoding result is not 00000000 and the consecutive identical byte counter is not 0, continue with step 5;
4. Feed the output of the 2048-bit parallel comparator and the character list register into the 2048-bit shifter, each bit of the comparator output corresponding to one byte of the character list register; the byte pointed to by the '1' in the comparator output is moved to the first byte of the character list register, and the bytes of the character list register that correspond to the '0' bits to the left of that '1' are shifted back by 8 bits; continue with step 6;
5. Feed the count value of the consecutive identical byte counter into the length encoder and perform run-length encoding, then continue with step 4;
6. Write the encoding result of the 256-to-8 encoder to the location in the local storage pointed to by the output address register; if the input data has not yet been fully processed, continue with step 1;
If all the input data has been processed, the hardware accelerator suspends and notifies software to fetch the result data.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100955967A CN101478311B (en) 2009-01-22 2009-01-22 Hardware accelerated implementation process for bzip2 compression algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009100955967A CN101478311B (en) 2009-01-22 2009-01-22 Hardware accelerated implementation process for bzip2 compression algorithm

Publications (2)

Publication Number Publication Date
CN101478311A CN101478311A (en) 2009-07-08
CN101478311B true CN101478311B (en) 2010-10-20

Family

ID=40838949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100955967A Expired - Fee Related CN101478311B (en) 2009-01-22 2009-01-22 Hardware accelerated implementation process for bzip2 compression algorithm

Country Status (1)

Country Link
CN (1) CN101478311B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102065288B (en) * 2010-06-30 2013-07-24 美商威睿电通公司 Video processing system and method realized by combining software with hardware and device thereof
CN103020205B (en) * 2012-12-05 2018-07-31 中科天玑数据科技股份有限公司 Compression/decompression method based on hardware accelerator card in a kind of distributed file system
KR101992274B1 (en) * 2013-01-02 2019-09-30 삼성전자주식회사 Method for compressing data and devices using the method
CN107204776A (en) * 2016-03-18 2017-09-26 余海箭 A kind of Web3D data compression algorithms based on floating number situation
US10783279B2 (en) * 2016-09-01 2020-09-22 Atmel Corporation Low cost cryptographic accelerator
CN107220028B (en) * 2017-05-24 2020-05-29 上海兆芯集成电路有限公司 Accelerated compression method and apparatus using the same
CN109639285B (en) * 2018-12-05 2023-06-13 北京安华金和科技有限公司 Method for improving BZIP2 compression algorithm speed based on finite block ordering compression
CN111211787A (en) * 2019-10-09 2020-05-29 华中科技大学 Industrial data compression method, system, storage medium and terminal

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1779716A (en) * 2005-05-26 2006-05-31 智多微电子(上海)有限公司 Realization of rapid coding-decoding circuit with run-length
CN101116342A (en) * 2005-03-30 2008-01-30 英特尔公司 Multistandard variable length decoder with hardware accelerator
US20080201718A1 (en) * 2007-02-16 2008-08-21 Ofir Zohar Method, an apparatus and a system for managing a distributed compression system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101116342A (en) * 2005-03-30 2008-01-30 英特尔公司 Multistandard variable length decoder with hardware accelerator
CN1779716A (en) * 2005-05-26 2006-05-31 智多微电子(上海)有限公司 Realization of rapid coding-decoding circuit with run-length
US20080201718A1 (en) * 2007-02-16 2008-08-21 Ofir Zohar Method, an apparatus and a system for managing a distributed compression system

Also Published As

Publication number Publication date
CN101478311A (en) 2009-07-08

Similar Documents

Publication Publication Date Title
CN101478311B (en) Hardware accelerated implementation process for bzip2 compression algorithm
Fang et al. In-memory database acceleration on FPGAs: a survey
US9304898B2 (en) Hardware-based array compression
TWI517031B (en) Vector instruction for presenting complex conjugates of respective complex numbers
TWI617978B (en) Method and apparatus for vector index load and store
CN108028665B (en) Systems, methods, and apparatus for compression using hardware and software
TWI737651B (en) Processor, method and system for accelerating graph analytics
CN103023509A (en) Hardware LZ77 compression implementation system and implementation method thereof
CN1402843A (en) Processing multiply-accumulate operations in single cycle
CN107925419B (en) System, method and apparatus for decompression using hardware and software
TW201346739A (en) Super multiply ADD (super MADD) instruction
Lal et al. E^ 2MC: Entropy Encoding Based Memory Compression for GPUs
CN111030702A (en) Text compression method
CN103268299B (en) A kind of generic data compression IP kernel being applied to PXI Express bus testing system
Zu et al. GLZSS: LZSS lossless data compression can be faster
Ozsoy et al. Optimizing LZSS compression on GPGPUs
Choi et al. Design of FPGA-based LZ77 compressor with runtime configurable compression ratio and throughput
Jun et al. Zip-io: Architecture for application-specific compression of big data
Li et al. HODS: Hardware object deserialization inside SSD storage
CN209496362U (en) Three n binary adders of input
CN116097212A (en) Apparatus, method, and system for a 16-bit floating point matrix dot product instruction
CN113849770A (en) Matrix data is dispersed and collected by rows
US11416960B2 (en) Shader accessible configurable binning subsystem
TWI799221B (en) Method and apparatus for programming data into flash memory
Dias et al. An Approach for Code Compression in Run Time for Embedded Systems–A Preliminary Results

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20101020

Termination date: 20120122