CN101478311B - Hardware accelerated implementation process for bzip2 compression algorithm - Google Patents
Hardware accelerated implementation process for bzip2 compression algorithm
- Publication number
- CN101478311B
- Authority
- CN
- China
- Prior art keywords
- input
- hardware accelerator
- register
- byte
- output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses a hardware-accelerated implementation method for the bzip2 compression algorithm, in which a hardware accelerator performs the move-to-front transform and run-length encoding, the stages that consume most of the run time, in order to speed up compression. The method has two characteristics. First, the input/output buffers of the hardware accelerator serve as the communication interface to a general-purpose computing system; software prepares the input data for the accelerator and collects and reads back its output data, which simplifies the design of the accelerator. Second, the move-to-front transform and run-length encoding are implemented in hardware using a fully unrolled 2048-bit parallel comparator and a shifter, which speeds up program execution, raises the data compression speed of the bzip2 algorithm, and effectively improves program performance.
Description
Technical field
The present invention relates to the fields of hardware/software co-design and data compression, and in particular to a hardware-accelerated implementation method for the bzip2 compression algorithm.
Background art
With the application of new materials and the development of new technologies, VLSI technology has made great progress, laying a solid foundation for the development of multi-core processors (Chip Multi-Processor, CMP). A CMP integrates multiple computing cores in a single processor chip to increase computing capability. Depending on whether the cores are identical, CMPs are divided into homogeneous multi-core and heterogeneous multi-core designs.
In the coming years the number of processing cores will keep growing. However, as more cores are integrated on a single chip, adding further cores yields diminishing performance gains, and general-purpose processors increasingly struggle to meet the demands of converged applications. More and more multi-core processors are therefore moving to an SoC architecture, that is, a heterogeneous multi-core architecture. A growing number of research institutions study heterogeneous multi-core processors, covering every aspect of such systems: optimization of the core structure, thread allocation and migration on heterogeneous multi-core processors, and CPU+DSP multi-core structures for video and audio processing. Some commercial processors have already adopted heterogeneous architectures or customized dedicated accelerators for specific applications.
bzip2 achieves higher compression efficiency than traditional gzip or ZIP, but compresses more slowly. In this respect it resembles some other recently introduced compression algorithms. Unlike RAR or ZIP, bzip2 is purely a data compression tool rather than an archiving tool, and in this it is similar to gzip. The program itself has no facilities for handling multiple files, encryption, or archive splitting; following UNIX tradition, these tasks are left to external tools such as tar and GnuPG.
bzip2 uses the Burrows-Wheeler transform to turn recurring character sequences into strings of identical letters, then applies the move-to-front transform, and finally compresses the result with Huffman coding. All blocks in bzip2 are plain-text blocks of equal size, which can be selected with a command-line option; each block is then marked as compressed text by an arbitrary bit sequence taken from the decimal representation of π.
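The move-to-front stage named above is the part this patent moves into hardware, so a small software reference may help fix ideas. The following is a minimal Python sketch of a plain move-to-front transform over bytes; the function name `move_to_front` is illustrative only, and the sketch covers neither the Burrows-Wheeler transform nor Huffman coding.

```python
def move_to_front(data: bytes) -> list[int]:
    """Return the move-to-front index of every byte in `data`."""
    table = list(range(256))      # symbol list, initially 0..255 in order
    indices = []
    for byte in data:
        pos = table.index(byte)   # where the byte currently sits in the list
        indices.append(pos)
        table.pop(pos)            # take the byte out of its old position...
        table.insert(0, byte)     # ...and put it at the front
    return indices
```

Repeated bytes map to index 0 after their first occurrence, which is why a run-length stage directly after this transform is effective.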
Although bzip2 compresses more efficiently than gzip or zip, its slower compression speed limits its range of application. As VLSI technology advances and the number of transistors on a chip grows, a dedicated accelerator customized for bzip2 can speed up its compression process.
Summary of the invention
To meet the ever-growing demand for computing performance, the hot-spot functions of the bzip2 program are carried out by a customized dedicated accelerator, which raises the compression speed of the bzip2 algorithm. The object of the present invention is to provide a hardware-accelerated implementation method for the bzip2 compression algorithm.
The technical solution adopted by the present invention to solve its technical problem is:
A hardware-accelerated implementation method for the bzip2 compression algorithm:
1) Software manages the input and output of the hardware accelerator:
The hardware accelerator uses its input and output buffers as the communication interface to the general-purpose computing system;
Software accesses the input and output buffers of the hardware accelerator directly, prepares the input data for the accelerator, and collects and reads back the output data:
1. Before the hardware accelerator starts computing, software organizes the accelerator's input data and writes it to the accelerator's input buffer;
2. After the hardware accelerator finishes computing, software takes the accelerator's output data from the buffer and writes it back to system memory;
2) The hardware accelerator implements the move-to-front transform and run-length encoding
The hardware accelerator mainly comprises a register group, a 2048-bit parallel comparator, a 2048-bit shifter, a 256-to-8 encoder, and a length encoder;
The register group comprises local storage, a local cache, a current byte register, a current address register, an output address register, a consecutive identical byte counter, and a 2048-bit character list register;
The specific implementation steps are as follows:
1. Read the content at the current address from the input buffer into the current byte register, and increment the current address by 1;
2. Feed the contents of the current byte register and the character list register to the 2048-bit parallel comparator and compare them in parallel;
3. Feed the output of the 2048-bit parallel comparator to the 256-to-8 encoder and encode it;
I. If the coding result is 00000000, increment the consecutive identical byte counter by 1 and go to step 1;
II. If the coding result is not 00000000 and the consecutive identical byte counter is 0, go to step 4;
III. If the coding result is not 00000000 and the consecutive identical byte counter is not 0, go to step 5;
4. Feed the output of the 2048-bit parallel comparator and the character list register to the 2048-bit shifter. Each bit of the comparator output corresponds to one byte of the character list register. The byte indicated by the '1' in the comparator output is moved to the first byte of the character list register, and the bytes of the character list register corresponding to the '0's to the left of that '1' are shifted back by 8 bits. Go to step 6;
5. Feed the count value of the consecutive identical byte counter to the length encoder and perform run-length encoding, then go to step 4;
6. Write the coding result of the 256-to-8 encoder to the location in local storage pointed to by the output address register. If the input data has not been fully processed, go to step 1;
If all input data has been processed, the hardware accelerator suspends and notifies software to fetch the result data.
The beneficial effects of the present invention are:
First, the input and output buffers of the hardware accelerator serve as the communication interface to the general-purpose computing system, and software prepares the accelerator's input data and collects and reads back its output data, which simplifies the design of the hardware accelerator. Second, the move-to-front transform and run-length encoding, which take the most time in the whole program, are implemented in hardware, which speeds up program execution, raises the data compression speed of the bzip2 algorithm, and effectively improves program performance.
Description of drawings
Fig. 1 is an overview flow chart of the present invention.
Fig. 2 is the module diagram of hardware accelerator of the present invention.
Embodiment
The specific implementation flow, based on a hardware thread execution method for a hybrid processor and FPGA architecture, is as follows:
A hardware-accelerated implementation method for the bzip2 compression algorithm, whose concrete steps are shown in Fig. 1:
1) Software manages the input and output of the accelerator
The hardware accelerator uses its input and output buffers as the communication interface to the general-purpose computing system, which here means a general-purpose computer as typified by a conventional desktop machine. The general-purpose computing system accesses the accelerator's input and output buffers over the PCI-E bus. In the present invention the input buffer and the output buffer are separate: the input buffer, called the local cache, holds the accelerator's input data, and the output buffer, called the local storage, holds the accelerator's computation results.
Software accesses the input and output buffers of the hardware accelerator directly over the PCI-E bus, prepares the input data for the accelerator, and collects and reads back the output data:
1. Before the hardware accelerator starts computing, software organizes the accelerator's input data in system memory, transfers the organized data from system memory to the accelerator's local cache over PCI-E, and then notifies the accelerator to start computing;
2. After the hardware accelerator finishes computing, it raises an interrupt to notify software, which takes the accelerator's output data from the local storage and writes it back to system memory.
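The software side of this hand-off can be pictured with a small simulation. The sketch below is purely illustrative: `SimulatedAccelerator`, `offload`, and the `kernel` callback are hypothetical names, the PCI-E transfer is modelled as a plain buffer copy, and the completion interrupt is modelled as a `done` flag. None of this is a real driver interface.

```python
class SimulatedAccelerator:
    """Stand-in for the device: an input buffer, an output buffer, a start signal."""

    def __init__(self, kernel):
        self.local_cache = bytearray()   # input buffer ("local cache")
        self.local_store = []            # output buffer ("local storage")
        self.kernel = kernel             # hypothetical: what the hardware computes
        self.done = False                # stands in for the completion interrupt

    def start(self):
        self.local_store = self.kernel(bytes(self.local_cache))
        self.done = True


def offload(acc: SimulatedAccelerator, block: bytes) -> list:
    """Run one block through the accelerator, following the two steps above."""
    acc.done = False
    acc.local_cache[:] = block           # step 1: copy input from system memory
    acc.start()                          # step 1: notify the accelerator
    if not acc.done:                     # step 2: real software would wait for the interrupt
        raise RuntimeError("accelerator did not finish")
    return list(acc.local_store)         # step 2: copy results back to system memory
```

Because the accelerator only ever sees its two buffers, everything about data layout stays in software, which is the simplification the text attributes to this design.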
2) The hardware accelerator implements the move-to-front transform and run-length encoding
The module diagram of the hardware accelerator is shown in Fig. 2. The accelerator comprises local storage, a local cache, a register group, a 2048-bit parallel comparator, a 2048-bit shifter, a 256-to-8 encoder, and a length encoder;
The 2048-bit parallel comparator has two inputs: one 8-bit input and one 2048-bit input. Its output is 256 bits wide; each output bit is the result of comparing the 8-bit input with one 8-bit group of the 2048-bit input, '1' if they are equal and '0' otherwise.
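A behavioural model of the comparator can make the widths concrete. In this sketch, assuming the 2048-bit input is represented as 256 bytes and the 256-bit output as a list of 0/1 values in left-to-right order, the hardware's 256 simultaneous comparisons become a single list comprehension; the name `parallel_compare` is illustrative.

```python
def parallel_compare(current_byte: int, char_list: bytes) -> list[int]:
    """Compare one byte against all 256 entries; return a 256-entry 0/1 mask."""
    if len(char_list) != 256:
        raise ValueError("character list must hold 256 bytes (2048 bits)")
    # In hardware all 256 byte comparisons happen at once; here they are sequential.
    return [1 if entry == current_byte else 0 for entry in char_list]
```

When the character list holds each byte value exactly once, exactly one bit of the mask is set.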
The 2048-bit shifter also has two inputs: one 256-bit input and one 2048-bit input. Its output is 2048 bits wide. Each bit of the 256-bit input corresponds to 8 bits of the 2048-bit input. The shifter moves the byte of the 2048-bit input that corresponds to the '1' in the 256-bit input to the first byte, and shifts the bytes of the 2048-bit input that correspond to the '0's to the left of that '1' back by 8 bits, producing the 2048-bit output.
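The shifter's effect on the character list can be sketched in the same style. This is a behavioural model only, assuming the 256-bit input is a list of 0/1 values with a single '1' and the 2048-bit input is 256 bytes; `shift_to_front` is a hypothetical name.

```python
def shift_to_front(mask: list[int], char_list: bytes) -> bytes:
    """Move the byte selected by the '1' in `mask` to position 0.

    Bytes to the left of the selected byte move back one position (8 bits);
    bytes to its right stay where they are.
    """
    pos = mask.index(1)                   # position of the selected byte
    return bytes([char_list[pos]]) + char_list[:pos] + char_list[pos + 1:]
```

For example, selecting position 2 in the list 0, 1, 2, 3, ... yields 2, 0, 1, 3, ..., with the total length unchanged.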
The 256-to-8 encoder produces an 8-bit output from the position of the '1' in its 256-bit input; the output value is the position of that '1' among the 256 bits.
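The encoder is a one-hot to binary conversion. The sketch below assumes positions are numbered from 0 at the left, so that a match on the first byte of the character list encodes as 00000000, which is what the step list relies on; `encode_256_to_8` is an illustrative name.

```python
def encode_256_to_8(mask: list[int]) -> int:
    """Return the position (0..255) of the single '1' in a 256-entry mask."""
    if len(mask) != 256 or mask.count(1) != 1:
        raise ValueError("expected a 256-entry mask with exactly one '1'")
    return mask.index(1)
```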
The register group comprises an 8-bit current byte register, a 16-bit current address register, a 16-bit output address register, a 16-bit consecutive identical byte counter, and a 2048-bit character list register. The current address register, the output address register, and the consecutive identical byte counter are initialized to 0; the character list register initially stores the byte values 0 to 255 in order from left to right.
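For simulation purposes the register group maps naturally onto a small record type. The following dataclass is a sketch under that assumption; the field names are illustrative, and the 16-bit wrap-around in `advance_input` is an assumption about overflow behaviour that the text does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class RegisterGroup:
    current_byte: int = 0       # 8-bit current byte register
    current_address: int = 0    # 16-bit current (input) address register
    output_address: int = 0     # 16-bit output address register
    run_counter: int = 0        # 16-bit consecutive identical byte counter
    # 2048-bit character list register: 256 bytes, initially 0..255 in order
    char_list: bytearray = field(default_factory=lambda: bytearray(range(256)))

    def advance_input(self) -> None:
        """Increment the current address, wrapping at 16 bits (assumed)."""
        self.current_address = (self.current_address + 1) & 0xFFFF
```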
The specific implementation steps are as follows:
1. Read the content at the current address from the local cache into the current byte register, and increment the current address by 1;
2. Feed the contents of the current byte register and the character list register to the 2048-bit parallel comparator and compare them in parallel;
3. Feed the output of the 2048-bit parallel comparator to the 256-to-8 encoder and encode it;
I. If the coding result is 00000000, increment the consecutive identical byte counter by 1 and go to step 1;
II. If the coding result is not 00000000 and the consecutive identical byte counter is 0, go to step 4;
III. If the coding result is not 00000000 and the consecutive identical byte counter is not 0, go to step 5;
4. Feed the output of the 2048-bit parallel comparator and the character list register to the 2048-bit shifter. Each bit of the comparator output corresponds to one byte of the character list register. The byte indicated by the '1' in the comparator output is moved to the first byte of the character list register, and the bytes of the character list register corresponding to the '0's to the left of that '1' are shifted back by 8 bits. Go to step 6;
5. Feed the count value of the consecutive identical byte counter to the length encoder and perform run-length encoding, as follows:
I. If the lowest bit of the consecutive identical byte counter is '1', write 1 to the location in local storage pointed to by the output address register, and increment the input address register by 1;
II. If the lowest bit of the consecutive identical byte counter is '0', write 0 to the location in local storage pointed to by the output address register, and increment the input address register by 1;
III. If the value of the consecutive identical byte counter is less than 2, go to step 4; otherwise subtract 2 from the counter, shift it right by 1 bit, and go to step 5 again;
6. Write the coding result of the 256-to-8 encoder to the location in local storage pointed to by the output address register, and increment the output address register by 1. If the input data has not been fully processed, go to step 1;
If all input data has been processed, the hardware accelerator suspends and notifies software to fetch the result data.
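The steps above can be traced in software. The following Python sketch is a behavioural model of the loop, not a description of the actual circuit, and it rests on several assumptions that the text leaves open: the output position advances after every write (modelled by appending to a list), the consecutive identical byte counter is cleared once its run has been emitted, and a run still pending when the input ends is flushed. The names `length_encode` and `accelerate` are illustrative.

```python
def length_encode(count: int) -> list[int]:
    """Step 5: emit the counter's lowest bit, then reduce the counter, until done."""
    symbols = []
    while True:
        symbols.append(count & 1)      # 5-I / 5-II: write 1 or 0 for the lowest bit
        if count < 2:                  # 5-III: nothing left to encode
            return symbols
        count = (count - 2) >> 1       # 5-III: subtract 2, shift right by 1


def accelerate(data: bytes) -> list[int]:
    """Model steps 1 to 6 for one input block and return the output symbols."""
    char_list = bytearray(range(256))  # character list register
    run = 0                            # consecutive identical byte counter
    out = []                           # local storage
    for current in data:               # step 1: fetch the next byte
        index = char_list.index(current)   # steps 2-3: compare and encode
        if index == 0:                 # 3-I: same byte as the list head
            run += 1
            continue
        if run:                        # 3-III: emit the pending run first
            out.extend(length_encode(run))
            run = 0                    # assumption: counter cleared after use
        del char_list[index]           # step 4: move the byte to the front
        char_list.insert(0, current)
        out.append(index)              # step 6: store the encoder result
    if run:                            # assumption: flush a trailing run
        out.extend(length_encode(run))
    return out
```

In this model, `accelerate(b"ab")` yields `[97, 98]`, since neither byte is at the head of the list when it is read, while an input of three zero bytes never leaves step 3-I and is emitted as a single encoded run.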
Claims (1)
1. A hardware-accelerated implementation method for the bzip2 compression algorithm, characterized in that:
1) Software manages the input and output of the hardware accelerator:
The hardware accelerator uses its input and output buffers as the communication interface to the general-purpose computing system;
Software accesses the input and output buffers of the hardware accelerator directly, prepares the input data for the accelerator, and collects and reads back the output data:
1. Before the hardware accelerator starts computing, software organizes the accelerator's input data and writes it to the accelerator's input buffer;
2. After the hardware accelerator finishes computing, software takes the accelerator's output data from the output buffer and writes it back to system memory;
2) The hardware accelerator implements the move-to-front transform and run-length encoding:
The input buffer and the output buffer are separate: the input buffer, called the local cache, holds the accelerator's input data, and the output buffer, called the local storage, holds the accelerator's computation results; the hardware accelerator comprises local storage, a local cache, a register group, a 2048-bit parallel comparator, a 2048-bit shifter, a 256-to-8 encoder, and a length encoder;
The register group comprises a current byte register, a current address register, an output address register, a consecutive identical byte counter, and a 2048-bit character list register; the character list register initially stores the byte values 0 to 255 in order from left to right;
The specific implementation steps are as follows:
1. Read the content at the current address from the input buffer into the current byte register, and increment the current address by 1;
2. Feed the contents of the current byte register and the character list register to the 2048-bit parallel comparator and compare them in parallel as follows: the 2048-bit parallel comparator has two inputs, the 8-bit current byte register input and the 2048-bit character list register input; its output is 256 bits wide, each output bit being the result of comparing the 8-bit input with one 8-bit group of the 2048-bit input, '1' if they are equal and '0' otherwise;
3. Feed the output of the 2048-bit parallel comparator to the 256-to-8 encoder and encode it;
I. If the coding result is 00000000, increment the consecutive identical byte counter by 1 and go to step 1;
II. If the coding result is not 00000000 and the consecutive identical byte counter is 0, go to step 4;
III. If the coding result is not 00000000 and the consecutive identical byte counter is not 0, go to step 5;
4. Feed the output of the 2048-bit parallel comparator and the character list register to the 2048-bit shifter. Each bit of the comparator output corresponds to one byte of the character list register. The byte indicated by the '1' in the comparator output is moved to the first byte of the character list register, and the bytes of the character list register corresponding to the '0's to the left of that '1' are shifted back by 8 bits. Go to step 6;
5. Feed the count value of the consecutive identical byte counter to the length encoder and perform run-length encoding, then go to step 4;
6. Write the coding result of the 256-to-8 encoder to the location in local storage pointed to by the output address register. If the input data has not been fully processed, go to step 1;
If all input data has been processed, the hardware accelerator suspends and notifies software to fetch the result data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009100955967A CN101478311B (en) | 2009-01-22 | 2009-01-22 | Hardware accelerated implementation process for bzip2 compression algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009100955967A CN101478311B (en) | 2009-01-22 | 2009-01-22 | Hardware accelerated implementation process for bzip2 compression algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101478311A (en) | 2009-07-08 |
CN101478311B (en) | 2010-10-20 |
Family
ID=40838949
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009100955967A Expired - Fee Related CN101478311B (en) | 2009-01-22 | 2009-01-22 | Hardware accelerated implementation process for bzip2 compression algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101478311B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102065288B (en) * | 2010-06-30 | 2013-07-24 | 美商威睿电通公司 | Video processing system and method realized by combining software with hardware and device thereof |
CN103020205B (en) * | 2012-12-05 | 2018-07-31 | 中科天玑数据科技股份有限公司 | Compression/decompression method based on hardware accelerator card in a kind of distributed file system |
KR101992274B1 (en) * | 2013-01-02 | 2019-09-30 | 삼성전자주식회사 | Method for compressing data and devices using the method |
CN107204776A (en) * | 2016-03-18 | 2017-09-26 | 余海箭 | A kind of Web3D data compression algorithms based on floating number situation |
US10783279B2 (en) * | 2016-09-01 | 2020-09-22 | Atmel Corporation | Low cost cryptographic accelerator |
CN107220028B (en) * | 2017-05-24 | 2020-05-29 | 上海兆芯集成电路有限公司 | Accelerated compression method and apparatus using the same |
CN109639285B (en) * | 2018-12-05 | 2023-06-13 | 北京安华金和科技有限公司 | Method for improving BZIP2 compression algorithm speed based on finite block ordering compression |
CN111211787A (en) * | 2019-10-09 | 2020-05-29 | 华中科技大学 | Industrial data compression method, system, storage medium and terminal |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1779716A (en) * | 2005-05-26 | 2006-05-31 | 智多微电子(上海)有限公司 | Realization of rapid coding-decoding circuit with run-length |
CN101116342A (en) * | 2005-03-30 | 2008-01-30 | 英特尔公司 | Multistandard variable length decoder with hardware accelerator |
US20080201718A1 (en) * | 2007-02-16 | 2008-08-21 | Ofir Zohar | Method, an apparatus and a system for managing a distributed compression system |
Also Published As
Publication number | Publication date |
---|---|
CN101478311A (en) | 2009-07-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20101020 Termination date: 20120122 |