Encoding Device, Decoding Device and System Based on Huffman Coding
Background
Huffman coding is a lossless entropy coding scheme that is widely used in information compression, for example for digital audio and video.
Huffman coding classifies input symbols by probability. High-probability symbols are encoded with fewer bits and low-probability symbols with more bits. Decoding requires a code mapping table: each codeword is looked up in the table to restore the original data sequence. When the data sequence contains a lot of redundancy, this scheme achieves a good compression ratio.
Take binary Huffman coding as an example, and suppose the stored sequence contains many identical values. Following the Huffman principle, the data can be split into two classes. The first class is the most frequently occurring value, MFV (most frequent value), usually found by scanning the storage matrix for the most common value; it is mapped to the codeword "0" and occupies only 1 bit. The second class is every non-MFV value, which is kept as is and prefixed with a leading "1" as its class flag. Some implementations use the opposite polarity; this is only a matter of convention.
Take every 16 bits of the input sequence as one coding unit, and assume 0x1010 is the most common value and is mapped to the single bit 0. The following sequence is then mapped as:
0x1010----1'b0
0x4567----{1'b1,16'h4567}
0x1010----1'b0
0x1234----{1'b1,16'h1234}
0x1010----1'b0
0x1010----1'b0
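A minimal Python sketch of the mapping above. It assumes codewords are modeled as strings of '0'/'1' characters and that the MFV is found by a simple frequency count; the names find_mfv and encode_value are illustrative only, not part of the original scheme.

```python
from collections import Counter

S = 16  # coding unit width in bits, as in the example above


def find_mfv(values):
    """Return the most frequently occurring value (MFV) by scanning the data."""
    return Counter(values).most_common(1)[0][0]


def encode_value(v, mfv, s=S):
    """Binary Huffman mapping: MFV -> '0'; any other value -> '1' + its s-bit literal."""
    if v == mfv:
        return "0"
    return "1" + format(v, f"0{s}b")  # MSB-first, like {1'b1, 16'h4567}


seq = [0x1010, 0x4567, 0x1010, 0x1234, 0x1010, 0x1010]
mfv = find_mfv(seq)
codes = [encode_value(v, mfv) for v in seq]
```

Here the six 16-bit inputs (96 bits) become codewords of lengths 1, 17, 1, 17, 1 and 1, i.e. 38 bits in total.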
Huffman decoding reverses this process. For an input sequence of length W, the decoder starts from the first bit, matches it against the code mapping table and obtains the first output. At the same time it updates the input sequence W by discarding the bits already decoded, and then searches for the next word.
Because Huffman coding is a variable-length code, successive decoded values normally depend on each other: the starting point of the next codeword is known only after the previous codeword has been decoded. Huffman decoding is therefore serial. It is generally unsuited to parallel decoding, cannot reach very high throughput, and is hard to implement efficiently in hardware.
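The serial dependency shows up clearly in a small decoder sketch. It uses Python and the same string-of-bits assumption as before, and decode_serial is a hypothetical name. The key point is that the position of the next codeword is only known after the current one has been decoded.

```python
def decode_serial(bits, mfv, s=16):
    """Decode a binary-Huffman bit string one codeword at a time."""
    out = []
    pos = 0
    while pos < len(bits):
        if bits[pos] == "0":
            # 1-bit codeword: the MFV
            out.append(mfv)
            pos += 1
        else:
            # '1' flag followed by an s-bit literal; the next start depends on this branch
            out.append(int(bits[pos + 1:pos + 1 + s], 2))
            pos += 1 + s
    return out
```

Because pos advances by 1 or s + 1 depending on the bit just read, no codeword after the first can be located until all earlier ones have been decoded.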
In the prior art, Huffman boundary codes have been proposed to split an original Huffman-coded stream into several mutually independent data sequences. During decoding, each data sequence is located by searching for the boundary codes and is fed to a separate decoder, so that the data sequences can be decoded in parallel.
However, high-speed transfer and processing systems are pipelined, and however fast the boundary-code search is, it can hardly keep up with the speed of the processor or master device. In addition, when the timing of a single decode layer of a Huffman decoder cannot meet the system bandwidth requirement, a pipeline is usually introduced. The resulting latency means that whenever the address jumps, the pipeline is emptied and the address must "roll back", and without predictable boundaries, address rollback or recomputation is almost impossible. Even with the prior-art encoding and decoding, Huffman coding remains variable-length, so the length of each independent data sequence is still unpredictable. Even with boundary codes, the boundaries are still found by searching and are not known in advance.
Summary of the invention
The invention aims to provide an encoding device that makes the boundaries of Huffman-coded sequences intended for parallel decoding predictable, so that fast and efficient parallel decoding can be implemented in hardware.
An encoding device based on Huffman coding comprises a Huffman encoding module, configured to Huffman-encode an input sequence using a first designated length as the coding unit;
and a length-padding module, configured to obtain the Huffman-coded sequence of codewords from the Huffman encoding module and to use a second designated length as the boundary. If the last codeword cannot fit within the current independent sequence of the second designated length, that codeword is placed in the next independent sequence of the second designated length, and the remaining length of the current independent sequence is entirely filled with a placeholder code that does not affect the decoding result. The module finally outputs a coded sequence formed by sequentially concatenating multiple independent sequences, each of the second designated length.
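The length-padding rule can be sketched in Python as follows. This is an illustrative model, not the device itself, and several details are assumptions. It uses binary Huffman coding with '1', the leading code of non-MFV values, as the placeholder code (as in the refinement described further on). Bits are modeled as '0'/'1' strings. It also pads the final block to the full second designated length, which the text does not spell out. In the code, s stands for the first designated length and w for the second; encode_fixed_blocks is a hypothetical name.

```python
def encode_fixed_blocks(values, mfv, s=16, w=256):
    """Binary Huffman encoding with fixed w-bit block boundaries.

    A codeword that would straddle a boundary is moved to the next block and the
    rest of the current block is filled with '1' bits (the placeholder code).
    Returns the list of w-bit independent sequences.
    """
    if w < s + 1:
        raise ValueError("a block must hold at least one non-MFV codeword")
    blocks = []
    cur = ""
    for v in values:
        code = "0" if v == mfv else "1" + format(v, f"0{s}b")
        if len(cur) + len(code) > w:
            # Codeword does not fit: pad the current block and start a new one
            blocks.append(cur + "1" * (w - len(cur)))
            cur = ""
        cur += code
    if cur:
        # Pad the final, partially filled block (assumption of this sketch)
        blocks.append(cur + "1" * (w - len(cur)))
    return blocks
```

Inside a block the padding is never longer than s bits, because a codeword is only pushed to the next block when it needs more bits than remain, and the longest codeword is s + 1 bits. The final block can carry more padding, so a decoder for this sketch also needs the total value count, which is another assumption.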
With this solution, the boundaries of the coded sequence are fixed at multiples of the second designated length. Because the boundaries are fixed and the independent sequences they delimit do not depend on each other, the decoder can split the coded sequence directly at the second designated length and decode the independent sequences in parallel without searching for boundaries. This greatly improves decoding efficiency.
Because fixed boundaries are used, when a pipeline is introduced the address can be rolled back or recomputed in fixed steps of the second designated length, which better meets the system bandwidth requirement.
Further, the Huffman encoding module encodes the input sequence using binary Huffman coding.
Binary Huffman coding compresses only the most frequently occurring value, MFV (most frequent value), and leaves all other values in the sequence unchanged. As a result, the code mapping table needed for decoding is very simple, lookups are fast, and the table occupies very little memory.
Further, the placeholder code is the leading code value that marks a non-MFV value.
In binary Huffman coding, a leading code value distinguishes the MFV from non-MFV values. Because the MFV is represented with a single bit, its leading code value is also its entire codeword. The placeholder padding is at most the first designated length long, i.e. shorter than the first designated length plus one bit that a non-MFV codeword occupies. It therefore cannot satisfy the coding rule for non-MFV values and will not be decoded. Padding with the leading code value of non-MFV values therefore prevents the placeholder code from being mistaken for a meaningful codeword. The original coding rule alone identifies the placeholder code, which avoids decoding errors.
The invention also discloses a decoding device based on Huffman coding, comprising multiple parallel decoders, each configured to Huffman-decode a sequence of the second designated length according to the coding rule used by the Huffman encoding module of the encoding device described above;
and further comprising an input unit, configured to divide the input coded sequence into multiple independent sequences of the second designated length and to feed the independent sequences to the decoders in parallel.
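Below is a rough Python model of the input unit and the parallel decoders, built on several assumptions. It expects the block format produced by the padding rule of the encoding device: codewords '0' and '1' plus an s-bit literal, with '1' as the placeholder. It also assumes the total number of values is known, so that filler decoded from the final block can be dropped. Python threads stand in for independent hardware decoders. The names split_blocks, decode_block and decode_parallel are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_blocks(stream, w):
    """Input unit: cut the coded stream every w bits (no boundary search)."""
    return [stream[i:i + w] for i in range(0, len(stream), w)]


def decode_block(block, mfv, s=16):
    """Decode one w-bit independent sequence using only its own bits."""
    w = len(block)
    out, pos = [], 0
    while pos < w:
        if block[pos] == "0":
            out.append(mfv)
            pos += 1
        elif w - pos <= s:
            # A '1' with too few bits left for a literal: placeholder padding
            break
        else:
            out.append(int(block[pos + 1:pos + 1 + s], 2))
            pos += 1 + s
    return out


def decode_parallel(stream, mfv, count, s=16, w=256, lanes=4):
    blocks = split_blocks(stream, w)
    with ThreadPoolExecutor(max_workers=lanes) as pool:
        parts = list(pool.map(lambda b: decode_block(b, mfv, s), blocks))
    out = [v for part in parts for v in part]
    return out[:count]  # drop filler decoded from final-block padding (assumption)
```

Each decode_block call sees only its own w-bit slice, and the split points are plain multiples of w. That independence is what allows the blocks to be decoded concurrently.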
Because the boundaries are fixed and the independent sequences they delimit do not depend on each other, the coded sequence can be divided directly at the second designated length during decoding, and the independent sequences can be decoded in parallel without any boundary search. This greatly improves decoding efficiency.
Further, each decoder comprises multiple cascaded unit decoding modules;
each unit decoding module is configured to find, in the input coded sequence and according to the code mapping table, the first decodable code value or code value sequence. It restores this to a sequence of the first designated length from the input sequence as the decoded output, and at the same time outputs the not-yet-decoded coded sequence for use by the next unit decoding module.
With cascaded unit decoding modules, each module only has to determine the length of input consumed by its own decode before passing the remaining input to the next module. Several code values or code value sequences can therefore be decoded in one cycle, improving decoding efficiency.
Further, the decoder also comprises a bit control module;
the unit decoding module is further configured to output, as a shift length, the length of the code value or code value sequence consumed by its decode to the bit control module;
the bit control module is configured to determine, from the shift lengths sent by the unit decoding modules, whether the input coded sequence has been completely decoded.
Further, the bit control module determines that the input coded sequence has been completely decoded when it has been shifted to the position given by the second designated length minus the first designated length or beyond, so that at most the first designated length of bits remains, and the remaining bits are placeholder code.
Because the end of each input coded sequence is placeholder code, which can no longer be decoded, whether the current input coded sequence has been completely decoded can be determined quickly.
In another aspect, the invention also provides a compression and decompression system comprising the encoding device and the decoding device described above;
the encoding device compresses the input data by encoding the input sequence into a coded sequence;
the decoding device decompresses the data by restoring the input coded sequence to the original input sequence.
This solution exploits the property of Huffman coding that input values with a high probability of occurrence receive short codewords. This makes it particularly effective for compressing input sequences with many repeated values. In addition, parallel decoding without boundary search greatly improves decompression efficiency.
In a further aspect, the invention also provides a convolutional neural network that uses the compression and decompression system described above to compress and decompress its weight parameters.
The weight parameters of a convolutional neural network form a large sparse matrix containing many repeated values (0). The weights are updated after each round of learning and must be reloaded, so moving the data is a technical problem in its own right, and data transfer bandwidth becomes the bottleneck that limits system performance. The compression and decompression system of the invention provides a relatively high compression ratio, which greatly helps to save transfer bandwidth. It also decompresses efficiently enough for fast real-time decompression, which greatly helps to improve the performance of convolutional neural networks.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the unit decoding module in an embodiment of the invention.
Fig. 2 is a schematic structural diagram of the cascaded unit decoding module circuits in an embodiment of the invention.
Fig. 3 is a schematic structural diagram of the decoding device based on Huffman coding in an embodiment of the invention.
Fig. 4 is a schematic structural diagram of the CNN acceleration circuit in an embodiment of the invention.
Fig. 5 is a schematic structural diagram of the encoding device based on Huffman coding in an embodiment of the invention.
Detailed description of the embodiments
The invention is further described below by way of a specific embodiment:
This embodiment is based on the binary Huffman coding described in the background. Unless otherwise stated, "Huffman coding" below refers to this binary Huffman coding.
In this embodiment, the input sequence is first Huffman-encoded (mapped) using every S bits as a coding unit, which yields the original coded sequence. During this process, W bits are used as the boundary. If the last S-bit input cannot be encoded within the current W bits, its codeword is placed in the next W-bit block, and the remaining length of the current W-bit block is entirely filled with the leading code value 1 that marks non-MFV values.
The resulting coded sequence is therefore a sequential concatenation of multiple independent W-bit sequences.
The encoding device implementing this process, shown in Fig. 5, comprises a Huffman encoding module that Huffman-encodes the input sequence using S bits as the coding unit;
and a length-padding module that obtains the Huffman-coded sequence of codewords from the Huffman encoding module and uses W bits as the boundary. If the last codeword cannot fit within the current independent W-bit sequence, it is placed in the next independent W-bit sequence, and the remaining length of the current independent sequence is entirely filled with 1s. The module finally outputs a coded sequence formed by sequentially concatenating multiple independent sequences of length W.
Correspondingly, this embodiment provides a decoding device comprising multiple parallel decoders, each built from a cascade of unit decoding modules (layers).
The unit decoding module is designed as follows.
The unit decoding module is the basic decoding function block. From a W-bit sequence it finds one S-bit value and at the same time outputs the updated W-bit sequence for use by the next unit decoding module. In addition, each unit decoding module outputs a shift amount (shift_number), which peripheral control logic uses to determine whether the W-bit sequence has been completely searched and decoded.
Take W = 256 bits and S = 16 bits as an example; Fig. 1 shows one logic circuit for such a unit decoding module. Each time, it receives a 256-bit coded sequence input as the code sequence. Starting from the beginning of the code sequence, it determines the first decoded output value (value_dec) according to whether the first bit is 0 or 1. At the same time it outputs the shift amount (shift_number) of the code sequence for this decode, which is 1 or 17 depending on the decoding result. After one coding unit has been decoded, it updates the output sequence (value_shift_out), which is the coded sequence input (value_in) shifted right by 1 bit or 17 bits.
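Below is a behavioral Python model of one unit decoding module for S = 16, using the signal names from Fig. 1 (value_in, value_dec, shift_number, value_shift_out). It assumes the MFV is 0x1010 as in the background example, that the first coded bit is at index 0 of a '0'/'1' string, and that the literal is stored MSB-first. The right shift is modeled by dropping the consumed bits rather than by shifting fill bits into a 256-bit register. unit_decode is a hypothetical name.

```python
MFV = 0x1010  # assumed MFV, taken from the background example
S = 16        # coding unit width in bits


def unit_decode(value_in, mfv=MFV, s=S):
    """One unit decoding module (layer).

    Returns (value_dec, shift_number, value_shift_out).
    """
    if value_in[0] == "0":
        # MFV codeword: 1 bit consumed
        value_dec, shift_number = mfv, 1
    else:
        # '1' flag plus an s-bit literal: s + 1 bits consumed (17 for s = 16)
        value_dec, shift_number = int(value_in[1:1 + s], 2), 1 + s
    value_shift_out = value_in[shift_number:]  # remaining bits for the next layer
    return value_dec, shift_number, value_shift_out
```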
The value of shift_number is used to determine whether the W-bit sequence has been completely decoded. This check must be combined with the boundary padding and alignment applied during encoding.
Cascading the unit decoding module circuits
A unit decoding module has very shallow logic depth: each invocation decodes at most one S-bit unit. N unit decoding modules can therefore be connected in series so that N data values are output in parallel in one cycle. The cascade is shown in Fig. 2: the value_shift_out output of each unit decoding module is the input of the next. In one cycle, N value_dec outputs are thus obtained, and for these N outputs the W-bit sequence advances by between N and (S+1)*N bits.
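The cascade can be modeled in Python by chaining the single-layer step N times within one "cycle". The sketch assumes value_in holds at least N complete codewords, since end-of-block handling is left to the bit control logic. It repeats a compact version of the unit step so that it runs on its own; cascade_cycle is a hypothetical name.

```python
def unit_decode(value_in, mfv, s):
    """Single layer: returns (value_dec, shift_number, value_shift_out)."""
    if value_in[0] == "0":
        return mfv, 1, value_in[1:]
    return int(value_in[1:1 + s], 2), 1 + s, value_in[1 + s:]


def cascade_cycle(value_in, mfv, s, n):
    """Chain n unit decoding modules: layer i's value_shift_out feeds layer i + 1.

    Returns the n value_dec outputs and the total shift for this cycle.
    """
    outputs, total_shift = [], 0
    for _ in range(n):
        value_dec, shift_number, value_in = unit_decode(value_in, mfv, s)
        outputs.append(value_dec)
        total_shift += shift_number
    # total_shift lies between n (all MFV) and n * (s + 1) (all literals)
    return outputs, total_shift
```

In hardware all n layers settle within one clock period; the Python loop only models the data flow between layers.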
For a W-bit sequence with an S-bit coding unit, the number of stages N should be no less than W/S. Conversely, N is determined by the logic depth of each unit decoding module layer and by the process technology, so with S fixed, the W sequence length is chosen as:
W = S*N
Taking W = 256 bits as an example, if one byte (S = 8 bits) is chosen as the decoding unit, then N = 32. In a typical deep-submicron process at a clock frequency of about 200 MHz, timing closure for 32 decoding layers is relatively easy to achieve. In this case, up to 32 bytes of data can be decoded in parallel per cycle.
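As a worked check of the numbers above: the throughput line is plain arithmetic at the quoted clock rate, assuming every one of the N stages outputs a full S-bit value each cycle. It is not a measured result.

```latex
\begin{aligned}
N &= \frac{W}{S} = \frac{256\ \text{bit}}{8\ \text{bit}} = 32 \\
\text{peak output per cycle} &= N \cdot S = 32 \times 8\ \text{bit} = 256\ \text{bit} = 32\ \text{byte} \\
\text{peak throughput at } 200\ \text{MHz} &= 32\ \text{byte} \times 2 \times 10^{8}\ \text{s}^{-1} = 6.4 \times 10^{9}\ \text{byte/s}
\end{aligned}
```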
Encoding alignment and high-speed parallel decoding of W-bit sequences
The longest sequence that one W-bit code can decode to is W*S bits, which occurs when every codeword is the 1-bit MFV. Suppose the bit control module (shift control) in Fig. 2 finds that the code sequence has been shifted to position (W-S) or beyond, and that the remaining code of at most S bits is all 1s. Then the remaining 1s contain no valid codeword, and decoding of the W-bit sequence is complete.
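The bit control check can be written as a small predicate. This Python sketch assumes the block is a '0'/'1' string and that pos is the accumulated shift_number; block_done is a hypothetical name.

```python
def block_done(block, pos, s):
    """Bit control: is the W-bit block fully decoded at shift position pos?

    Done when at most s bits remain and they are all '1' (placeholder padding):
    a non-MFV codeword needs s + 1 bits, and an MFV codeword would be a '0'.
    """
    remaining = block[pos:]
    return len(remaining) <= s and set(remaining) <= {"1"}
```

The all-1s test is needed in addition to the position test, because a '0' among the last bits is still a valid MFV codeword.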
Fig. 3 shows the structure of such a high-speed parallel Huffman decoding device. In the figure, four parallel decoding modules decode an input sequence of 4W bits. If the logic depth of each decoder (Huffman_Dec) is N, a peak output of 4*N*S bits of data per cycle can be achieved. The outputs then pass through buffering and synchronization (FIFO/buffers in the figure), which allows very high system throughput.
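The following Python sketch shows one way the four decoders and their output FIFOs can keep data in order. The round-robin assignment of blocks to lanes and the round-robin drain are assumptions about the buffering in Fig. 3, which the text does not detail. The lanes run one after another here, whereas in hardware they run concurrently; decode_block and four_lane_decode are hypothetical names.

```python
from collections import deque


def decode_block(block, mfv, s):
    """Per-lane Huffman_Dec: decode one W-bit independent sequence."""
    out, pos, w = [], 0, len(block)
    while pos < w:
        if block[pos] == "0":
            out.append(mfv)
            pos += 1
        elif w - pos <= s:
            # Placeholder padding reached
            break
        else:
            out.append(int(block[pos + 1:pos + 1 + s], 2))
            pos += 1 + s
    return out


def four_lane_decode(blocks, mfv, s, lanes=4):
    """Deal blocks round-robin to the lanes, one FIFO per lane, then drain in the
    same round-robin order so that the original block order is restored."""
    fifos = [deque() for _ in range(lanes)]
    for i, block in enumerate(blocks):
        fifos[i % lanes].append(decode_block(block, mfv, s))
    out = []
    for i in range(len(blocks)):
        out.extend(fifos[i % lanes].popleft())
    return out
```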
High-speed Huffman encoding and decoding in a CNN
A convolutional neural network (CNN) involves a large number of multiply-add operations. To accelerate CNN computation, dedicated DSPs or ASICs are developed for hardware-level acceleration. Such circuits typically contain hundreds or even thousands of multiply-accumulate units. Distributing CNN computations of various configurations across hundreds or thousands of multipliers for parallel execution is one challenge. Another is that these multipliers need a large amount of input data, so data bandwidth becomes the bottleneck that limits system performance.
Exploiting the statistical properties of the data to compress its volume, and decoding it at high speed in real time, therefore greatly helps to reduce bandwidth requirements and improve CNN efficiency.
In many cases the trained weight parameters (weight data) of a CNN form a four-dimensional (Co, Ci, Y, X) sparse matrix containing many repeated constant values. The weight parameter matrix is known before CNN processing, so it can be encoded after training by the encoding device of this embodiment. In this embodiment, the values of the weight parameter matrix are concatenated in a fixed order to form a weight parameter value sequence. Encoding this sequence with the encoding device provided in this embodiment achieves higher coding efficiency. The sequence is then decoded in real time by hardware while the CNN runs. Real-time decoding both saves storage for the weight parameters and greatly reduces the bandwidth needed to transfer them into the internal CNN hardware accelerator. Because decoding is high-speed and parallel, the effect of Huffman encoding and decoding on the transfer efficiency of CNN acceleration is negligible.
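The following Python sketch prepares CNN weights for this encoder. It flattens a 4-D (Co, Ci, Y, X) weight tensor into one value sequence and estimates its binary-Huffman size. The row-major traversal order and the 16-bit weight width are assumptions, since the text only says the order is fixed, and block padding is ignored in the size estimate. flatten_weights and encoded_bits are hypothetical names.

```python
from collections import Counter
from itertools import product


def flatten_weights(w, co, ci, ky, kx):
    """Flatten w[co][ci][y][x] in a fixed row-major (Co, Ci, Y, X) order."""
    return [w[a][b][c][d]
            for a, b, c, d in product(range(co), range(ci), range(ky), range(kx))]


def encoded_bits(values, s=16):
    """Return (mfv, size in bits) of the binary Huffman encoding, padding ignored.

    The MFV costs 1 bit; every other value costs s + 1 bits.
    """
    mfv = Counter(values).most_common(1)[0][0]
    n_mfv = values.count(mfv)
    return mfv, n_mfv * 1 + (len(values) - n_mfv) * (s + 1)
```

For a tiny 2x1x2x2 tensor with six zeros and two nonzero weights, the 128 bits of raw 16-bit weights shrink to 40 bits.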
Fig. 4 is a schematic diagram of a CNN acceleration circuit that uses the encoding and decoding devices of this embodiment to form a compression and decompression system. A high-speed CNN engine schedules and controls CNN computation. To balance the bandwidth gap between the inside of the CNN and off-chip memory, several large internal buffers (internal RAM) store the weight parameters (weight data), the input data and the output data. In most cases, however, the internal buffers are far too small to hold all the data. Several intelligent data-management (DMA) engines therefore monitor internal and external data, and whenever data runs short, CNN computation must pause while data, such as new input, is transferred. As shown in the figure, the decoding device (Huffman decoder) provided in this embodiment is added outside the intelligent DMA engines to perform high-speed parallel Huffman decoding of the encoded weight data (Encoded Weight data).
Tests show that with the encoding and decoding devices of this embodiment, the many zeros in the matrix, which would otherwise take as many bits as the nonzero entries, are compressed to 1 bit each. Depending on the sparsity (number of zeros) of the weight parameter matrix, the transfer bandwidth is reduced to varying degrees, by up to nearly 70% when 70% or more of the entries are 0. At the same time, decoding is parallel and needs only a very small mapping table. The binary Huffman mapping table holds just two mappings: the constant entry maps to 0, and all other values keep their original values. The added decoding latency is therefore essentially negligible, which greatly improves the overall computational performance of the CNN.
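The link between sparsity and bandwidth saving can be made explicit with a short derivation. Assume a fraction p of the weights equal the MFV (0), each weight is S bits, and block padding (at most S bits per W-bit block) is ignored. The compressed size as a fraction of the original size, and the resulting saving, are then:

```latex
\frac{p \cdot 1 + (1-p)(S+1)}{S} \;=\; \frac{S+1}{S} - p,
\qquad
\text{bandwidth saving} \;=\; 1 - \left(\frac{S+1}{S} - p\right) \;=\; p - \frac{1}{S}
```

The saving therefore tracks the zero fraction p, less a 1/S overhead for the flag bit on nonzero entries, and compression only pays off when p > 1/S.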
What has been described above is only an embodiment of the invention. Common knowledge, such as well-known specific structures and characteristics in the scheme, is not described in detail here. A person skilled in the art knows all the common technical knowledge in the field before the filing date or priority date, has access to all the prior art in the field, and is able to apply conventional experimental means before that date. Under the teaching of this application, such a person can improve and implement this scheme using their own abilities, and typical known structures or known methods should not be an obstacle to implementing this application. It should be noted that a person skilled in the art can also make several modifications and improvements without departing from the structure of the invention. These should also be regarded as falling within the scope of protection of the invention, and they do not affect the effect of implementing the invention or the practical applicability of the patent. The scope of protection claimed by this application is determined by the content of the claims, and the specific embodiments and other records in the description may be used to interpret the content of the claims.