CN1560728A - Superlong command word processor and its command compression method - Google Patents

Publication number
CN1560728A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2004100167557A
Other languages
Chinese (zh)
Other versions
CN1275140C (en)
Inventor
刘鹏
姚庆栋
洪享
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN 200410016755 (CN1275140C)
Publication of CN1560728A
Application granted
Publication of CN1275140C
Anticipated expiration
Expired - Fee Related

Landscapes

  • Executing Machine-Instructions (AREA)

Abstract

The invention discloses a very long instruction word (VLIW) processor comprising an instruction stage, an instruction decompression unit, an instruction dispatch and decode unit, and several data paths, each data path comprising several functional units and a multi-port register file. The instruction compression method of the processor consists of horizontal compression and vertical compression. The method reduces the instruction bandwidth, improves instruction bandwidth utilization, and reduces the size of the on-chip program memory, thereby reducing hardware overhead.

Description

Very long instruction word processor and instruction compression method thereof
Technical field:
The present invention relates to an instruction processor and an instruction compression method thereof.
Background art:
The very long instruction word (VLIW) architecture is widely used in recent high-performance processor designs in many fields, for example DSP processors such as the TMS320C6x from TI and the StarCore from Agere, the TriMedia media processor from TriMedia Technologies, and general-purpose processors from Intel. A VLIW architecture bundles many standard instructions into one long instruction word, which contains instructions that can execute simultaneously in different functional units, either on different chips or on the same chip. However, such a long instruction word requires a very wide program memory bandwidth and a large program memory space. As the number of functional units grows, instruction bandwidth and storage space become even scarcer. In embedded applications, system power consumption and chip area are the most important factors, and the larger the program memory and the instruction bandwidth, the larger the chip area. Reducing the instruction bandwidth and the program memory is therefore critical to reducing system cost.
Instruction bandwidth is usually reduced by compressing instructions, and instruction compression methods fall into two main classes. The first class uses lossless data compression, that is, compression based on Huffman coding. The second class, which we call pattern-based compression, works on the format of individual instructions. Compressing individual instructions makes it possible to implement a simpler decompression unit. Both classes have their own advantages and disadvantages, but methods based on Huffman coding are relatively difficult to implement. Although VLIW processors have become very popular in recent years, instruction compression for VLIW architectures has not been studied as extensively as for other architectures. General VLIW processors all have the following problems or weaknesses: 1) the instruction bandwidth is too wide; 2) instruction bandwidth utilization is low; 3) the program occupies a large amount of on-chip memory.
Summary of the invention:
To address the technical problems in the prior art, the invention provides a very long instruction word processor, and an instruction compression method thereof, that is based on novel patterns of individual instructions and can reduce the instruction bandwidth.
To achieve the above purpose, the invention adopts the following technical scheme: a very long instruction word processor is provided, comprising an instruction stage, an instruction decompression unit, an instruction dispatch and decode unit, and several data paths, each data path comprising several functional units and a multi-port register file. The instruction stage passes information to the instruction decompression unit, which passes it on to the instruction dispatch and decode unit, which in turn passes it to the functional units. Within each data path, information can be transferred in both directions between the functional units and the multi-port register file, and information can also be transferred in both directions between the multi-port register files.
The instruction compression method of this very long instruction word processor consists of horizontal compression and vertical compression.
Horizontal compression compresses the internal structure of an instruction and consists of register address compression and opcode compression. Register address compression takes one register address in the instruction as a base address and expresses the remaining register addresses in the instruction as the base address plus an offset. Opcode compression splits instructions into virtual sub-instruction A and virtual sub-instruction B: frequently used opcodes are placed in A and the remaining opcodes in B, and opcodes in A have fewer bits than opcodes in B. Vertical compression compresses across instructions: the instruction packet is divided into fetch packets and execute packets; several instructions to be executed are placed together in a fetch packet, and the execute packet extracts the instructions that actually need to be executed.
With the very long instruction word processor and instruction compression method of the invention, instructions can be compressed effectively. Horizontal compression reduces an instruction by a certain number of bits, and vertical compression filters out no-operation (NOP) instructions. The benefit of this approach is that the instruction set, the compiler, and the hardware soft core need not be modified; it is enough to process the code at assembly time and to insert a decompression module before on-chip decoding. The method reduces the instruction bandwidth, improves instruction bandwidth utilization, and reduces the size of the on-chip program memory, thereby reducing hardware overhead.
Description of drawings:
Fig. 1 is a structural diagram of the very long instruction word processor of the invention;
Fig. 2 is the software flow chart of the very long instruction word processor of the invention;
Fig. 3 is the instruction structure of a general VLIW;
Fig. 4 is a schematic diagram of instruction opcode compression;
Fig. 5 is a schematic diagram of vertical instruction compression;
Fig. 6 compares instructions before and after compression when only the horizontal compression method is used;
Fig. 7 is a structural diagram of the new fetch packet.
Embodiment:
Embodiment 1: the very long instruction word processor of the invention and its instruction compression method are described in detail below with reference to the accompanying drawings.
Fig. 1 shows one structure of the very long instruction word processor of the invention. The processor comprises an instruction stage, an instruction decompression unit, an instruction dispatch and decode unit, and several data paths, each data path comprising several functional units and a multi-port register file. The instruction stage passes information to the instruction decompression unit, which passes it on to the instruction dispatch and decode unit, which in turn passes it to the functional units. Within each data path, information can be transferred in both directions between the functional units and the multi-port register file, and information can also be transferred in both directions between the multi-port register files.
The instruction compression method of this processor consists of horizontal compression and vertical compression. Horizontal compression compresses the internal structure of an instruction and consists of register address compression and opcode compression. Register address compression takes one register address in the instruction as a base address and expresses the remaining register addresses in the instruction as the base address plus an offset. Opcode compression splits instructions into virtual sub-instruction A and virtual sub-instruction B: frequently used opcodes are placed in A and the remaining opcodes in B, and opcodes in A have fewer bits than opcodes in B. Vertical compression compresses across instructions: the instruction packet is divided into fetch packets and execute packets; several instructions to be executed are placed together in a fetch packet, and the execute packet extracts the instructions that actually need to be executed.
For standard VLIW instructions, the structure of the instruction format determines how often register addresses are used. As shown in Fig. 3, almost every operation instruction contains three register addresses (two source operand register addresses and one destination operand register address) and one operation field. The three register addresses take up a large number of bits in the instruction, but if they are close to one another, fewer bits suffice to represent them. Statistics show that in most compiler-generated code the three register addresses of an instruction are close, so the number of instruction bits can be reduced. For instructions that contain three register addresses, one of them can be taken as the base address and the other two expressed as offsets from it; in other words, an indirect addressing scheme can be used. This reduces the instruction by a certain number of bits.
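The base-plus-offset idea can be illustrated with a minimal Python sketch. It is not taken from the patent: the 5-bit register field, the 3-bit signed offsets, the choice of the destination register as the base, and the function names are all assumptions made for illustration.

```python
REG_BITS = 5      # assumed: 32 registers per register file
OFFSET_BITS = 3   # assumed width of each signed offset field


def compress_registers(dst, src1, src2, offset_bits=OFFSET_BITS):
    """Encode three register addresses as (base, offset1, offset2).

    The destination register is used as the base (an assumption; the
    text only says "one of them"). Returns None when an offset does not
    fit, in which case the instruction would stay uncompressed.
    """
    lo = -(1 << (offset_bits - 1))
    hi = (1 << (offset_bits - 1)) - 1
    off1, off2 = src1 - dst, src2 - dst
    if not (lo <= off1 <= hi and lo <= off2 <= hi):
        return None
    return dst, off1, off2


def decompress_registers(base, off1, off2):
    """Recover (dst, src1, src2) from the base and the two offsets."""
    return base, base + off1, base + off2


def bits_saved(offset_bits=OFFSET_BITS, reg_bits=REG_BITS):
    """Bits saved per instruction: three full fields versus base + two offsets."""
    return 3 * reg_bits - (reg_bits + 2 * offset_bits)
```

Under these assumed widths, an instruction whose registers are close together (for example 10, 8 and 12) shrinks from 15 to 11 register bits, while an instruction whose registers are far apart cannot be encoded this way.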
Next is the handling of the opcode in a general instruction. Statistical analysis of a large number of application programs gives the usage distribution of the opcodes, and the opcode is shortened on that basis. Fig. 4 shows the principle of opcode compression. The original instructions are separated: at assembly time an instruction is turned into a virtual sub-instruction, which is then sent in the instruction fetch packet for execution. In the figure, the original instructions are counted and separated (the separation can be done at assembly time), and all opcodes of the instruction set are divided between two virtual sub-instructions, A and B, both of which are virtual instructions. Frequently used opcodes are assigned to A and the rest are placed in B; the opcode widths of the two classes are m and n bits respectively, with m less than n. The opcode is thus shortened for frequently used instructions, while less common instructions are sent with the original opcode length. This overcomes the drawback that shortening the opcode purely on the basis of statistics would otherwise lose instructions.
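A minimal Python sketch of the frequency-based split follows. The mnemonics, the sample counts, and the function names are hypothetical, and the sketch leaves out how a decoder tells class A from class B, which the text does not specify.

```python
from collections import Counter


def split_opcodes(program_opcodes, m_bits):
    """Assign the 2**m_bits most frequent opcodes to class A, the rest to class B.

    Each class maps an opcode to its index within that class.
    """
    ranked = [op for op, _ in Counter(program_opcodes).most_common()]
    capacity = 1 << m_bits
    class_a = {op: i for i, op in enumerate(ranked[:capacity])}
    class_b = {op: i for i, op in enumerate(ranked[capacity:])}
    return class_a, class_b


def average_opcode_bits(program_opcodes, class_a, m_bits, n_bits):
    """Average opcode width when class A uses m_bits and class B uses n_bits."""
    total = sum(m_bits if op in class_a else n_bits for op in program_opcodes)
    return total / len(program_opcodes)


# Hypothetical opcode trace: four opcodes dominate, two are rare.
program = (["ADD"] * 8 + ["SUB"] * 5 + ["LDW"] * 4 + ["STW"] * 3
           + ["MPY"] + ["SHL"])
class_a, class_b = split_opcodes(program, m_bits=2)
avg = average_opcode_bits(program, class_a, m_bits=2, n_bits=6)
```

With m = 2 and n = 6 in this made-up trace, 20 of the 22 opcodes use the short form, so the average opcode width falls well below the original n bits.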
In general, the largest potential saving in instruction length comes from reducing the number of bits used to represent immediates. A limited number of bits can be used to represent the high-order bits of the immediate, and a virtual extension instruction can then be added at assembly time to supply the remaining immediate bits.
One characteristic of VLIW processing is the parallel execution of multiple instructions. As Fig. 1 shows, several functional units can execute simultaneously. However, because of application and resource constraints, because of dependencies between the registers used by successive instructions within a data path, and because of the limited capability of the compiler, a VLIW processor often cannot reach the ideal state. The number of instructions that need to execute simultaneously therefore does not necessarily equal the number of functional units. Fig. 5 shows the principle of vertical instruction compression. Because the compiler cannot always produce an instruction for every functional unit, an instruction packet is very likely to contain empty instructions. To take this into account, instructions that execute later can be placed in the same instruction packet as the instructions that execute together now. The instruction packet is thus divided into fetch packets and execute packets: one fetch packet can contain several execute packets, and the instructions in one execute packet are those executed in parallel. In this way some no-operation (NOP) instructions can be filtered out.
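The following Python sketch shows one way the filtering could work. The text does not say how execute-packet boundaries are encoded, so the boolean end-of-group marker, the seven-slot fetch packet used for grouping, and the function names are assumptions. The sketch also assumes each instruction identifies its own functional unit, and it drops cycles that consist only of empty slots.

```python
NOP = None
SLOTS_PER_FETCH_PACKET = 7  # assumed: seven instruction slots per fetch packet


def pack(execute_packets):
    """Drop empty slots and regroup the remaining instructions into fetch packets.

    Each kept instruction is paired with a flag that is True when it is
    the last instruction of its execute packet (an assumed marker).
    A packet made only of NOPs emits nothing in this simplified sketch.
    """
    stream = []
    for packet in execute_packets:
        real = [ins for ins in packet if ins is not NOP]
        for i, ins in enumerate(real):
            stream.append((ins, i == len(real) - 1))
    return [stream[i:i + SLOTS_PER_FETCH_PACKET]
            for i in range(0, len(stream), SLOTS_PER_FETCH_PACKET)]


def unpack(fetch_packets):
    """Rebuild the execute packets (without NOPs) from the fetch packets."""
    execute_packets, current = [], []
    for fetch_packet in fetch_packets:
        for ins, ends_group in fetch_packet:
            current.append(ins)
            if ends_group:
                execute_packets.append(current)
                current = []
    return execute_packets


# Three sparse 8-slot instruction words: 24 slots, only 6 real instructions.
wide = [
    ["ADD", NOP, NOP, "LDW", NOP, NOP, NOP, NOP],
    [NOP, "MPY", NOP, NOP, NOP, NOP, NOP, NOP],
    ["SUB", "SHL", NOP, NOP, "STW", NOP, NOP, NOP],
]
packed = pack(wide)
```

In this example the 24 original slots collapse into a single fetch packet holding 6 instructions, and `unpack(packed)` recovers the three parallel groups.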
The software flow of the processor is shown in Fig. 2: C code is processed by the C compiler/optimizer into assembly code; the assembly optimizer turns it into scheduled assembly; the assembler/linker turns that into object code; and the object code is then compressed with the compression method of the invention to yield compressed object code.
Fig. 6 compares instructions before and after compression. Take TI's TMS320C6x VLIW architecture as an example: it has two data paths, each with four functional units and one register file. Taking one of these functional units, the L unit, as an example, and applying only part of the compression method of the invention (horizontal compression) to individual instructions, the final instruction bandwidth is greatly reduced.
As shown in Fig. 7, eight bits of the fetch packet are flag bits: the first bit indicates whether the decompression structure needs to buffer new instructions, the second bit indicates whether an extended virtual instruction is needed, and the remaining bits are reserved for later extension. The rest of the packet consists of seven horizontally compressed instructions, currently 24 bits each.
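The packet layout described here (8 flag bits followed by seven 24-bit instructions) can be sketched in Python as a 176-bit integer. The bit ordering (flags in the most significant byte, first flag in the top bit), the flag names, and the function names are assumptions; the text gives only the field widths.

```python
FLAG_BITS = 8
INSTR_BITS = 24
INSTR_COUNT = 7
PACKET_BITS = FLAG_BITS + INSTR_COUNT * INSTR_BITS  # 8 + 7 * 24 = 176


def build_fetch_packet(needs_new_instructions, has_extension, instructions):
    """Pack two flags and seven 24-bit instructions into one integer.

    Assumed layout: flags occupy the top 8 bits, with the first flag in
    the most significant bit; the six reserved flag bits are left at 0.
    """
    if len(instructions) != INSTR_COUNT:
        raise ValueError("a fetch packet holds exactly seven instructions")
    word = (int(needs_new_instructions) << 7) | (int(has_extension) << 6)
    for ins in instructions:
        if not 0 <= ins < (1 << INSTR_BITS):
            raise ValueError("instruction does not fit in 24 bits")
        word = (word << INSTR_BITS) | ins
    return word


def parse_fetch_packet(word):
    """Split a packet back into (first flag, second flag, instruction list)."""
    mask = (1 << INSTR_BITS) - 1
    instructions = []
    for _ in range(INSTR_COUNT):
        instructions.append(word & mask)
        word >>= INSTR_BITS
    instructions.reverse()
    flags = word
    return bool((flags >> 7) & 1), bool((flags >> 6) & 1), instructions
```

For example, `build_fetch_packet(True, False, [1, 2, 3, 4, 5, 6, 7])` produces a value that fits in `PACKET_BITS` bits and that `parse_fetch_packet` splits back into the same flags and instructions.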
The hardware of the decompression structure is described in Verilog HDL, and the decompression core is synthesized into a gate-level netlist with Synopsys DC; we use the UMC 0.25 um standard cell library. The final instruction decompression core has approximately 320 logic cells. The hardware figures before and after compression are compared in Table 1. In this table the program size is taken to be the size of the on-chip memory before compression, and we take 1 bit of static RAM (SRAM) to be approximately equal to 1.5.
Table 1: Hardware overhead results
Using the compression method of the invention, for some specific applications a VLIW instruction word with a code structure similar to that of the TMS320C6x can be compressed from 256 bits to 176 bits, that is, to 68.75% of its original size.
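The quoted figure can be checked with a few lines of Python. The 256 bits correspond to eight 32-bit instructions, as in a TMS320C6x fetch packet, and the 176 bits to the 8 flag bits plus seven 24-bit instructions of the new fetch packet; the variable names are illustrative only. Note that the comparison is between packet widths, and the compressed packet carries seven instruction slots.

```python
from fractions import Fraction

ORIGINAL_BITS = 8 * 32        # eight 32-bit instructions per original fetch packet
COMPRESSED_BITS = 8 + 7 * 24  # 8 flag bits + seven 24-bit compressed instructions

# Compressed size as a fraction of the original size.
ratio = Fraction(COMPRESSED_BITS, ORIGINAL_BITS)
# Fraction of the original width that is saved.
saving = 1 - ratio

print(f"{COMPRESSED_BITS}/{ORIGINAL_BITS} = {float(ratio):.4%}")
```

The ratio works out to 11/16, which matches the 68.75% stated in the text and corresponds to a saving of 31.25% of the original packet width.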
The above is only a specific embodiment of the invention. It should be noted that a person of ordinary skill in the art can make many modifications and improvements, and all such modifications and improvements shall be regarded as falling within the scope of protection of the invention.

Claims (4)

1. A very long instruction word processor, characterized in that it comprises an instruction stage, an instruction decompression unit, an instruction dispatch and decode unit, and several data paths, each data path comprising several functional units and a multi-port register file; the instruction stage passes information to the instruction decompression unit, which passes it on to the instruction dispatch and decode unit, which in turn passes it to the functional units; within each data path, information can be transferred in both directions between the functional units and the multi-port register file, and information can be transferred in both directions between the multi-port register files.
2. An instruction compression method for a very long instruction word processor, characterized in that the compression method consists of horizontal compression and vertical compression.
3. The instruction compression method for a very long instruction word processor according to claim 2, characterized in that the horizontal compression compresses the internal structure of an instruction and consists of register address compression and opcode compression; the register address compression takes one register address in the instruction as a base address and expresses the remaining register addresses in the instruction as the base address plus an offset; the opcode compression splits instructions into virtual sub-instruction A and virtual sub-instruction B, with frequently used opcodes placed in A and the remaining opcodes placed in B, and opcodes in A have fewer bits than opcodes in B.
4. The instruction compression method for a very long instruction word processor according to claim 2, characterized in that the vertical compression compresses across instructions; the instruction packet is divided into fetch packets and execute packets; several instructions to be executed are placed together in a fetch packet, and the execute packet extracts the instructions that actually need to be executed.
CN 200410016755 2004-03-03 2004-03-03 Superlong command word processor and its command compression method Expired - Fee Related CN1275140C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200410016755 CN1275140C (en) 2004-03-03 2004-03-03 Superlong command word processor and its command compression method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200410016755 CN1275140C (en) 2004-03-03 2004-03-03 Superlong command word processor and its command compression method

Publications (2)

Publication Number Publication Date
CN1560728A true CN1560728A (en) 2005-01-05
CN1275140C CN1275140C (en) 2006-09-13

Family

ID=34440635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200410016755 Expired - Fee Related CN1275140C (en) 2004-03-03 2004-03-03 Superlong command word processor and its command compression method

Country Status (1)

Country Link
CN (1) CN1275140C (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102436781A (en) * 2011-11-04 2012-05-02 杭州中天微系统有限公司 Microprocessor order split device based on implicit relevance and implicit bypass
CN102436781B (en) * 2011-11-04 2014-02-12 杭州中天微系统有限公司 Microprocessor order split device based on implicit relevance and implicit bypass

Also Published As

Publication number Publication date
CN1275140C (en) 2006-09-13

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20060913

Termination date: 20100303