CN1560728A - Superlong command word processor and its command compression method - Google Patents
- Publication number
- CN1560728A (application number CN200410016755A)
- Authority
- CN
- China
- Prior art keywords
- instruction
- compression
- command
- operational code
- word processor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Executing Machine-Instructions (AREA)
Abstract
The invention discloses a very long instruction word (VLIW) processor comprising an instruction stage, an instruction decompression unit, an instruction dispatch and decode unit, and several data paths, where each data path contains several functional units and a multi-port register file. The instruction compression method of the processor combines compression in the horizontal direction with compression in the vertical direction. The method reduces the instruction bandwidth, improves the utilization of the instruction bandwidth, and reduces the size of the on-chip program memory, and thereby achieves the aim of reducing hardware overhead.
Description
Technical field:
The present invention relates to an instruction processor and an instruction compression method for it.
Background technology:
The very long instruction word (VLIW) architecture is widely used in recent high-performance processor designs in many fields, for example DSP processors such as the TMS320C6x from TI and the StarCore from Agere, the TriMedia media processor from TriMedia Technologies, and general-purpose processors from Intel. A VLIW architecture bundles many standard instructions into one long instruction word; this word contains instructions that can execute simultaneously in different functional units, whether on different chips or on the same chip. Such a long instruction word, however, requires a very wide program memory bandwidth and a large program space. As the number of functional units grows, instruction bandwidth and storage space become even scarcer. In embedded applications, system power consumption and chip area are the most important factors, and the larger the program memory and the instruction bandwidth, the larger the chip area. Reducing the instruction bandwidth and the program memory is therefore critical to reducing system cost.
Instruction compression is the usual way to reduce instruction bandwidth, and compression methods fall into two main classes. The first class uses lossless data compression, that is, it compresses instructions by Huffman coding. The second class, which we call pattern-based compression, works on the format of individual instructions. Compressing individual instructions has the advantage that the decompression unit can be much simpler. Both classes have their own strengths and weaknesses, but methods based on Huffman coding are comparatively difficult to implement. Although VLIW processors have become very popular in recent years, instruction compression for the VLIW architecture has not been studied as extensively as for other architectures. General VLIW processors all have the following problems or weaknesses: 1) the instruction bandwidth is too wide; 2) the utilization of the instruction bandwidth is low; 3) the on-chip program memory footprint is large.
Summary of the invention:
To address the technical problems in the prior art, the invention provides a very long instruction word processor, and an instruction compression method for it, that are based on the pattern of individual instructions, novel, and able to reduce the instruction bandwidth.
To achieve this, the invention adopts the following technical scheme. A very long instruction word processor comprises an instruction stage, an instruction decompression unit, an instruction dispatch and decode unit, and several data paths; each data path comprises several functional units and a multi-port register file. The instruction stage passes information to the instruction decompression unit, which passes it on to the instruction dispatch and decode unit, which in turn passes it to the functional units. Within each data path, information can pass in both directions between the functional units and the multi-port register file, and information can also pass in both directions between the multi-port register files.
The instruction compression method of this very long instruction word processor consists of horizontal compression and vertical compression.
Horizontal compression compresses the internal structure of an instruction and consists of register address compression and opcode compression. Register address compression takes one register address in the instruction as a base address and expresses the other register addresses in the instruction as the base address plus an offset. Opcode compression splits instructions into virtual sub-instruction A and virtual sub-instruction B: frequently used opcodes are placed in A and the remaining opcodes in B, and the opcode in A has fewer bits than the opcode in B. Vertical compression compresses across instructions: the instruction packet is divided into a fetch packet and execute packets; several instructions that need to be executed are placed together in the fetch packet, and the execute packets pick out the instructions that actually need to be executed.
With the very long instruction word processor and instruction compression method of the invention, instructions can be compressed effectively. Horizontal compression removes a certain number of bits, and vertical compression filters out no-operation (NOP) instructions. The benefit of this method is that the instruction set, the compiler, and the hardware soft core do not need to be modified; it is enough to add a processing step at assembly time and to insert a decompression module before decoding on chip. The method reduces the instruction bandwidth, improves its utilization, reduces the size of the on-chip program memory, and so achieves the aim of reducing hardware overhead.
Description of drawings:
Fig. 1 is a structural diagram of the very long instruction word processor of the invention;
Fig. 2 is the software flow chart of the very long instruction word processor of the invention;
Fig. 3 shows the instruction structure of a general VLIW;
Fig. 4 is a schematic diagram of instruction opcode compression;
Fig. 5 is a schematic diagram of vertical instruction compression;
Fig. 6 compares instructions before and after compression when only the horizontal compression method is used;
Fig. 7 is a structural diagram of the new fetch packet.
Embodiment:
Embodiment 1: The very long instruction word processor of the invention and its instruction compression method are described in detail below with reference to the drawings.
Fig. 1 shows one structure of the very long instruction word processor of the invention. The processor comprises an instruction stage, an instruction decompression unit, an instruction dispatch and decode unit, and several data paths; each data path comprises several functional units and a multi-port register file. The instruction stage passes information to the instruction decompression unit, which passes it on to the instruction dispatch and decode unit, which in turn passes it to the functional units. Within each data path, information can pass in both directions between the functional units and the multi-port register file, and information can also pass in both directions between the multi-port register files.
The instruction compression method of this processor consists of horizontal compression and vertical compression. Horizontal compression compresses the internal structure of an instruction and consists of register address compression and opcode compression. Register address compression takes one register address in the instruction as a base address and expresses the other register addresses in the instruction as the base address plus an offset. Opcode compression splits instructions into virtual sub-instruction A and virtual sub-instruction B: frequently used opcodes are placed in A and the remaining opcodes in B, and the opcode in A has fewer bits than the opcode in B. Vertical compression compresses across instructions: the instruction packet is divided into a fetch packet and execute packets; several instructions that need to be executed are placed together in the fetch packet, and the execute packets pick out the instructions that actually need to be executed.
For standard VLIW instructions, the instruction format determines how often register addresses are used. As shown in Fig. 3, almost every operation instruction contains three register addresses (two source operand register addresses and one destination operand register address) and an operation field. The three register addresses take up a large share of the bits in the instruction, but if the three addresses are close to one another, that many bits are not needed to represent them. Statistics show that in most compiler-generated code the three register addresses of an instruction are close together, so the number of instruction bits can be reduced. For an instruction in which three register addresses appear together, one of them can serve as the base address and the other two can be expressed as offsets from it. In other words, a form of indirect addressing can be used here, which saves a certain number of bits.
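The base-plus-offset idea can be sketched as follows. This is a minimal illustration rather than the patent's actual encoding: the field widths (`REG_BITS`, `OFF_BITS`), the choice of the destination register as the base, the signed two's-complement offsets, and all function names are assumptions made for the example.

```python
REG_BITS = 5   # assumed width of a full register address
OFF_BITS = 3   # assumed width of a signed offset field


def to_signed_field(value, bits):
    """Encode a signed integer as a two's-complement field, or None if it does not fit."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        return None
    return value & ((1 << bits) - 1)


def from_signed_field(field, bits):
    """Decode a two's-complement field back into a signed integer."""
    sign = 1 << (bits - 1)
    return (field ^ sign) - sign


def compress_regs(dst, src1, src2):
    """Return (base, off1, off2) with dst as the assumed base, or None if an offset is too large."""
    off1 = to_signed_field(src1 - dst, OFF_BITS)
    off2 = to_signed_field(src2 - dst, OFF_BITS)
    if off1 is None or off2 is None:
        return None  # registers too far apart: keep the uncompressed form
    return dst, off1, off2


def decompress_regs(base, off1, off2):
    """Recover (dst, src1, src2) from the base address and the two offset fields."""
    return (base,
            base + from_signed_field(off1, OFF_BITS),
            base + from_signed_field(off2, OFF_BITS))
```

Under these assumed widths the three register fields shrink from 3 × 5 = 15 bits to 5 + 3 + 3 = 11 bits whenever both sources lie within −4..+3 of the destination; instructions that do not fit would have to keep their full-width form.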
Next comes the handling of the opcode in a general instruction. A statistical analysis of a large number of application programs gives the usage distribution of the opcodes, and the opcode is then shortened on that basis. Fig. 4 shows the principle of opcode compression. The original instruction is split: at assembly time an instruction is divided into virtual sub-instructions, which are sent to the instruction fetch packet for execution. In the figure, the original instructions are split according to the statistics, the splitting can be done at assembly time, and the opcodes are divided between two virtual sub-instructions, A and B, both of which are virtual instructions. Frequently used opcodes are assigned to the A instruction and the rest to the B instruction; the opcode widths of the two classes are m and n bits respectively, with m less than n. The opcode is thus shortened for frequently used instructions, while instructions that are not very common are simply sent with their original opcode length. Shortening opcodes according to usage statistics in this way avoids the usual drawback that reducing the opcode width loses instructions.
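A rough sketch of the frequency-based opcode split is given below. The widths `M_BITS` and `N_BITS`, the way the short table is built from a trace, and the `"A"`/`"B"` tag returned by the encoder are all assumptions for illustration; the text does not say how the decoder tells the two classes apart.

```python
from collections import Counter

M_BITS = 3   # assumed width of the short (class A) opcode
N_BITS = 7   # assumed width of the original (class B) opcode


def build_short_table(opcode_trace):
    """Map the 2**M_BITS most frequent opcodes in a trace to short codes."""
    counts = Counter(opcode_trace)
    # Most frequent first; ties broken by opcode value so the table is deterministic.
    ranked = sorted(counts, key=lambda op: (-counts[op], op))
    return {op: code for code, op in enumerate(ranked[: 1 << M_BITS])}


def encode_opcode(op, short_table):
    """Return (kind, value, width_in_bits) for one opcode."""
    if op in short_table:
        return ("A", short_table[op], M_BITS)
    return ("B", op, N_BITS)  # uncommon opcode keeps its original length


def decode_opcode(kind, value, short_table):
    """Recover the original opcode from its class and encoded value."""
    if kind == "A":
        inverse = {code: op for op, code in short_table.items()}
        return inverse[value]
    return value
```

In this sketch no opcode is dropped from the instruction set: rare opcodes simply travel at full width, which is the property the paragraph above relies on.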
In general, the largest possible saving in instruction length comes from reducing the number of bits used to represent immediates. A limited number of bits can be used to represent the high-order bits of the immediate, and a virtual extension instruction can then be added at assembly time; the extension instruction supplies the remaining bits of the immediate that were not given.
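One possible reading of the immediate scheme is sketched below. The widths `IMM_BITS` and `FIELD_BITS`, the function names, and the rule that the extension is emitted only when the remaining bits are non-zero are assumptions for the example, not details given in the text.

```python
IMM_BITS = 16                     # assumed full immediate width
FIELD_BITS = 8                    # assumed bits kept inside the compressed instruction
EXT_BITS = IMM_BITS - FIELD_BITS  # bits carried by the virtual extension instruction


def split_immediate(imm):
    """Return (field, ext): the high-order bits, and the remaining bits or None."""
    if not 0 <= imm < (1 << IMM_BITS):
        raise ValueError("immediate out of range")
    field = imm >> EXT_BITS
    low = imm & ((1 << EXT_BITS) - 1)
    # Assumption: no extension instruction is needed when the remaining bits are zero.
    return field, (low if low else None)


def join_immediate(field, ext):
    """Rebuild the full immediate from the in-instruction field and optional extension."""
    return (field << EXT_BITS) | (ext or 0)
```

For example, `split_immediate(0x1200)` needs no extension, while `split_immediate(0x1234)` yields an extension carrying `0x34`.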
One characteristic of VLIW processing is the parallel execution of multiple instructions. As Fig. 1 shows, several functional units can execute at the same time. In many cases, however, a VLIW processor cannot reach this ideal state, because of usage restrictions and resource limits, because of dependences between the registers used by successive instructions within a data path, and because of the limited ability of the compiler. As a result, the number of instructions that need to execute simultaneously does not necessarily equal the number of functional units. Fig. 5 shows the principle of vertical instruction compression. Because the compiler cannot produce an instruction for every functional unit each time, the instruction packets that are issued are very likely to contain empty instructions. To deal with this, the instructions that execute together can be placed in one instruction packet along with instructions that execute later. The instruction packet is thus divided into a fetch packet and execute packets: one fetch packet can hold several execute packets, and an execute packet contains exactly the instructions that execute in parallel. In this way some no-operation (NOP) instructions can be filtered out.
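The fetch packet / execute packet split can be modelled in a few lines. This is only a sketch under stated assumptions: each instruction is tagged with the index of its functional unit, an execute packet is never split across fetch packets, and a cycle with no instructions becomes an empty execute packet (real hardware would need some explicit marker). The function names and the `capacity` parameter are hypothetical.

```python
def to_execute_packets(cycles):
    """Drop empty (None) slots; keep (unit_index, instruction) pairs per cycle."""
    return [[(unit, ins) for unit, ins in enumerate(slots) if ins is not None]
            for slots in cycles]


def pack_fetch_packets(execute_packets, capacity):
    """Greedily place whole execute packets into fetch packets of a fixed capacity."""
    fetch_packets, current, used = [], [], 0
    for ep in execute_packets:
        if len(ep) > capacity:
            raise ValueError("execute packet larger than a fetch packet")
        if used + len(ep) > capacity:
            fetch_packets.append(current)
            current, used = [], 0
        current.append(ep)
        used += len(ep)
    if current:
        fetch_packets.append(current)
    return fetch_packets


def expand(fetch_packets, n_units):
    """Rebuild the per-cycle slot lists, reinserting None for idle functional units."""
    cycles = []
    for fp in fetch_packets:
        for ep in fp:
            slots = [None] * n_units
            for unit, ins in ep:
                slots[unit] = ins
            cycles.append(slots)
    return cycles
```

With four functional units and the schedule `[["add", None, "mul", None], [None, "ld", None, None], ["sub", "st", None, "shl"]]`, twelve issue slots reduce to six stored instructions, and `expand` restores the original schedule.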
The software flow of the processor of the invention is shown in Fig. 2: C code is processed by the C compiler/optimizer and converted into assembly code; it then enters the assembly optimizer and is converted into scheduled assembly; it then enters the assembler/linker and is converted into object code; finally, the object code is compressed with the compression method of the invention to obtain the compressed object code.
Fig. 6 shows the comparison before and after instruction compression. Take TI's TMS320C6x VLIW architecture as an example: it has two data paths, each with four functional units and one register file. Taking one of these functional units, the L functional unit, as an example, and using only part of the compression method of the invention, namely horizontal compression, to compress individual instructions, the final instruction bandwidth is greatly reduced.
As shown in Fig. 7, eight bits of the fetch packet are flag bits: the first bit indicates whether the decompression structure needs to buffer new instructions, the second bit indicates whether a virtual extension instruction is needed, and the remaining bits are reserved for the continuation and extension of follow-up work. The rest of the fetch packet consists of seven horizontally compressed instructions, currently 24 bits each.
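The layout just described (eight flag bits followed by seven 24-bit instructions) can be sketched as a bit-packing routine. The bit ordering, with the flags in the most significant byte and the first flag as the top bit, and all names are assumptions; the text gives only the field sizes.

```python
FLAG_BITS = 8
INSTR_BITS = 24
INSTR_COUNT = 7
PACKET_BITS = FLAG_BITS + INSTR_COUNT * INSTR_BITS  # 8 + 7 * 24 = 176


def pack_fetch_packet(needs_new_fetch, has_extension, instrs):
    """Pack two flags and seven 24-bit compressed instructions into one integer."""
    if len(instrs) != INSTR_COUNT:
        raise ValueError("a fetch packet holds exactly seven instructions")
    # Assumed order: first flag in bit 7, second in bit 6, six reserved bits left at zero.
    word = (int(needs_new_fetch) << 7) | (int(has_extension) << 6)
    for ins in instrs:
        if not 0 <= ins < (1 << INSTR_BITS):
            raise ValueError("instruction does not fit in 24 bits")
        word = (word << INSTR_BITS) | ins
    return word


def unpack_fetch_packet(word):
    """Inverse of pack_fetch_packet: return (needs_new_fetch, has_extension, instrs)."""
    mask = (1 << INSTR_BITS) - 1
    instrs = [(word >> (INSTR_BITS * (INSTR_COUNT - 1 - i))) & mask
              for i in range(INSTR_COUNT)]
    flags = word >> (INSTR_BITS * INSTR_COUNT)
    return bool((flags >> 7) & 1), bool((flags >> 6) & 1), instrs
```

Packing and unpacking round-trip, and the total width of 176 bits matches the compressed instruction word size quoted in this embodiment.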
The hardware of the decompression structure is described in Verilog HDL, and the decompression core is synthesized into a gate-level netlist with Synopsys Design Compiler (DC); the UMC 0.25 µm standard cell library is used here. The final instruction decompression core has about 320 logic cells. The hardware figures before and after compression are compared in Table 1. The table takes the on-chip memory size before compression as the program size, and 1 bit of static RAM (SRAM) is taken here to be approximately 1.5.
Table 1: Hardware overhead results
Using the compression method of the invention, for some specific applications a VLIW instruction word with a code structure similar to that of the TMS320C6x can be compressed from 256 bits to 176 bits, so that the compressed word is 68.75% of its original size.
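The quoted figures can be checked with a line of arithmetic, using the fetch packet layout described for Fig. 7 (eight flag bits plus seven 24-bit instructions):

```latex
\[
\frac{8 + 7 \times 24}{256} = \frac{176}{256} = 0.6875 = 68.75\%,
\qquad
1 - 0.6875 = 0.3125 = 31.25\%
\]
```

That is, the compressed word occupies 68.75% of the original 256 bits, a saving of 31.25%.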
The above is only a specific embodiment of the invention. It should be noted that a person of ordinary skill in the art can make many further modifications and improvements, and all such modifications and improvements shall be regarded as falling within the scope of protection of the invention.
Claims (4)
1. A very long instruction word processor, characterized in that it comprises an instruction stage, an instruction decompression unit, an instruction dispatch and decode unit, and several data paths, each data path comprising several functional units and a multi-port register file; the instruction stage passes information to the instruction decompression unit, which passes it on to the instruction dispatch and decode unit, which passes it to the functional units; within each data path information can pass in both directions between the functional units and the multi-port register file, and information can pass in both directions between the multi-port register files.
2. An instruction compression method for a very long instruction word processor, characterized in that the compression method consists of horizontal compression and vertical compression.
3. The instruction compression method for a very long instruction word processor according to claim 2, characterized in that the horizontal compression compresses the internal structure of an instruction and consists of register address compression and opcode compression; the register address compression takes one register address in the instruction as a base address and expresses the other register addresses in the instruction as the base address plus an offset; the opcode compression splits instructions into virtual sub-instruction A and virtual sub-instruction B, places frequently used opcodes in A and the remaining opcodes in B, and the opcode in A has fewer bits than the opcode in B.
4. The instruction compression method for a very long instruction word processor according to claim 2, characterized in that the vertical compression compresses across instructions: the instruction packet is divided into a fetch packet and execute packets, several instructions that need to be executed are placed together in the fetch packet, and the execute packets pick out the instructions that actually need to be executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 200410016755 CN1275140C (en) | 2004-03-03 | 2004-03-03 | Superlong command word processor and its command compression method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 200410016755 CN1275140C (en) | 2004-03-03 | 2004-03-03 | Superlong command word processor and its command compression method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1560728A (en) | 2005-01-05 |
CN1275140C (en) | 2006-09-13 |
Family
ID=34440635
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 200410016755 Expired - Fee Related CN1275140C (en) | 2004-03-03 | 2004-03-03 | Superlong command word processor and its command compression method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1275140C (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102436781A (en) * | 2011-11-04 | 2012-05-02 | 杭州中天微系统有限公司 | Microprocessor order split device based on implicit relevance and implicit bypass |
CN102436781B (en) * | 2011-11-04 | 2014-02-12 | 杭州中天微系统有限公司 | Microprocessor order split device based on implicit relevance and implicit bypass |
Also Published As
Publication number | Publication date |
---|---|
CN1275140C (en) | 2006-09-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20060913 Termination date: 20100303 |