CN201548950U

CN201548950U - Byte code buffering device for improving instruction access bandwidth of Java processor

Info

Publication number: CN201548950U
Application number: CN2009202323651U
Authority: CN
Inventors: 柴志雷; 张平; 梁久祯; 任小龙
Original assignee: Jiangnan University
Current assignee: Jiangnan University
Priority date: 2009-09-30
Filing date: 2009-09-30
Publication date: 2010-08-11
Anticipated expiration: 2019-09-30

Abstract

The utility model relates to a byte code buffering device for improving the instruction access bandwidth of a Java processor. A byte code register, a multichannel selection module and a byte code buffer are connected in sequence; the input end of the byte code register is connected with an instruction memory, and the output end of the byte code buffer is connected with the decoding section of the Java processor; the input end of a control module is connected with the decoding section of the Java processor, and the output end of the control module is connected with the byte code register, the multichannel selection module and the byte code buffer respectively; the byte code register is 32 bits wide, the byte code buffer is 64 bits wide, and the four high bytes of the buffer are connected with the decoding section of the Java processor. When the space available in the byte code buffer is not less than four bytes, four bytes are read from the register and sent to the correct position in the buffer by the multichannel selection module, so that the byte code to be executed is always completely stored in the four high bytes. The number of memory accesses is reduced, and instruction access bandwidth is improved.

Description

Bytecode buffer device for improving the instruction fetch bandwidth of a Java processor

(1) Technical field

The utility model relates to Java processors, and in particular to a bytecode buffer device for improving the instruction fetch bandwidth of a Java processor.

(2) Background art

Computer instruction sets generally fall into two classes. The first is the fixed-length instruction set, in which every instruction has the same length regardless of its type. Its advantage is that a complete instruction can be identified and fetched in the instruction fetch stage, which facilitates pipelined execution; its drawback is that every instruction occupies the same length, which wastes instruction storage space. Fixed-length instruction sets are mostly used in reduced instruction set computer (RISC) systems. The second is the variable-length instruction set, in which different instructions have different lengths. Its advantage is that instruction length is kept as short as possible, which saves instruction storage space; its drawback is that the full length of an instruction cannot be determined in the instruction fetch stage, and the remaining bytes can be identified and read only after the opcode has been decoded, which hinders pipelined execution. Variable-length instruction sets are mostly used in complex instruction set computer (CISC) systems.

A Java processor is a processor that executes the Java Virtual Machine instruction set directly in hardware. The Java Virtual Machine instruction set (that is, bytecode) is a variable-length instruction set; with very few exceptions, its instructions are no longer than 4 bytes. A common approach in Java processors is to read bytecode from the instruction memory through a 1-byte-wide interface: only the first byte (the byte holding the opcode) is read in the instruction fetch stage, and the subsequent bytes are read after the decoding section has determined the full bytecode length. Repeatedly accessing the instruction memory 1 byte at a time limits processor performance.
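To illustrate the cost described above, the following Python sketch compares the number of instruction memory accesses under a 1-byte-wide interface with the number needed when the same stream is read in 4-byte words. It is only an illustration: the function names and the sample list of bytecode lengths are hypothetical, and the sketch assumes a contiguous, word-aligned instruction stream.

```python
def accesses_byte_wide(lengths):
    """1-byte interface: one access for the opcode and one per following byte."""
    return sum(lengths)


def accesses_word_wide(lengths, word_bytes=4):
    """Word-wide interface: one access per aligned word of the stream."""
    total_bytes = sum(lengths)
    return -(-total_bytes // word_bytes)  # ceiling division


# Hypothetical stream of six bytecodes, lengths given in bytes.
sample_lengths = [1, 2, 3, 1, 3, 2]
byte_wide = accesses_byte_wide(sample_lengths)
word_wide = accesses_word_wide(sample_lengths)
```

For this sample the 12 bytes of bytecode need 12 accesses through the 1-byte interface but only 3 word-wide accesses.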

There is also an existing method that improves instruction fetch bandwidth through an instruction buffer: bytecode is read from the instruction memory and written, in units of 4 bytes, into a buffer built from registers, and the correct bytecode is then read through an output multiplexer of the register buffer according to the actual instruction length; if the instructions read exceed one word, the data in the register buffer is shifted forward once. With this method a large register group provides good instruction prefetch performance, but waiting for instructions to shift lowers its utilization, so a suitable register group size must be chosen that both satisfies the fetch demand of the decoding unit and limits the amount of hardware.

(3) Summary of the utility model

In view of the above problems, the utility model provides a bytecode buffer device for improving the instruction fetch bandwidth of a Java processor; using this device improves processor performance.

The technical solution of the device is as follows:

The device comprises a bytecode register, a multichannel selection module, a bytecode buffer and a control module. The bytecode register, the multichannel selection module and the bytecode buffer are connected in sequence. The input end of the bytecode register is connected with the instruction memory, and the output end of the bytecode buffer is connected with the decoding section of the Java processor. The input end of the control module is connected with the decoding section of the Java processor, and the output end of the control module is connected with the bytecode register, the multichannel selection module and the bytecode buffer respectively, to perform logic control over them. The bytecode register is 32 bits wide and stores the bytecode read from the instruction memory in units of 32 bits. The bytecode buffer is 64 bits wide, and its 4 high bytes are connected with the decoding section of the Java processor to supply it with one complete bytecode.

A further technical solution is:

The multichannel selection module comprises first-stage multiplexers and second-stage multiplexers. The first-stage multiplexers place the valid bytes of the bytecode register, in byte order, at the correct positions in an 8-byte sequence; the second-stage multiplexers order the bytes output by the first-stage multiplexers together with the bytes remaining in the previous bytecode buffer contents and deliver them to the correct positions in the bytecode buffer;

The bytecode buffer adopts a cache with a prefetch function.

The bytecode buffer proposed by the utility model has a fixed length of 64 bits, and its output is fixed at the 4 highest bytes, which avoids the need for a multiplexer at the output. Its main purpose is "store whole, fetch piecemeal": bytecode is fed into the bytecode buffer in units of 32 bits at the same operating frequency as the processor, is read out in varying lengths, and the subsequent bytecode is pushed forward, so that the available bytecode always remains at a fixed position.

When the free space of the bytecode buffer is not less than 4 bytes, the utility model reads 4 bytes from the register and sends them through the multichannel selection module to the correct position in the buffer, so that the bytecode to be executed is always present in full in the high bytes. Because the bytecode to be executed can be fetched in full within one cycle, the number of memory accesses is reduced and instruction fetch bandwidth is improved.

The utility model improves processor performance in two respects. First, any operands are fetched together with the opcode during instruction fetch, which avoids repeated memory accesses. Second, each fetch brings in 4 bytes while most bytecodes are shorter than 4 bytes, which leaves time that can be used to prefetch instructions into the bytecode buffer, thereby hiding memory access time.
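The "store whole, fetch piecemeal" behavior can be sketched as a small behavioral model in Python. This is a software illustration only, not the hardware described here: the class and method names (`BytecodeBuffer`, `step`, `window`) are hypothetical, a deque stands in for the 64-bit register, and the sketch assumes that the refill decision is taken on the free space left after the current bytecode has been retired.

```python
from collections import deque


class BytecodeBuffer:
    """Behavioral model of an 8-byte buffer refilled 4 bytes at a time."""

    CAPACITY = 8  # 64-bit bytecode buffer
    FETCH = 4     # 32-bit bytecode register

    def __init__(self, memory: bytes):
        self.memory = memory
        self.fetch_addr = 0  # next word-aligned address to read
        self.buf = deque()   # index 0 models the highest byte

    def free_space(self) -> int:
        return self.CAPACITY - len(self.buf)

    def window(self) -> bytes:
        """The 4 high bytes presented to the decoding section."""
        return bytes(list(self.buf)[: self.FETCH])

    def step(self, consumed: int) -> None:
        """One clock: retire `consumed` bytes, then refill if 4 bytes fit."""
        if consumed > len(self.buf):
            raise ValueError("cannot consume more bytes than are buffered")
        for _ in range(consumed):
            self.buf.popleft()
        if self.free_space() >= self.FETCH and self.fetch_addr < len(self.memory):
            word = self.memory[self.fetch_addr : self.fetch_addr + self.FETCH]
            self.buf.extend(word)
            self.fetch_addr += len(word)


demo = BytecodeBuffer(bytes(range(16)))
demo.step(0)  # first word arrives
demo.step(0)  # second word arrives, buffer is full
demo.step(3)  # a 3-byte bytecode retires; only 3 bytes free, no refill
demo.step(2)  # a 2-byte bytecode retires; 5 bytes free, so a word is fetched
```

In this model the next bytecode always starts at index 0 of the buffer, so the decoding side needs no selection logic of its own.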

(4) Description of the drawings

Fig. 1 is a structural block diagram of the device of the utility model;

Fig. 2 is a block diagram of the data path of the multichannel selection module of the utility model;

Fig. 3 is a block diagram of the control module of the utility model;

Fig. 4 is a block diagram of the output interface of the bytecode buffer of the utility model.

(5) Embodiment

As shown in Fig. 1, the device comprises a bytecode register 2, a multichannel selection module 3, a bytecode buffer 4 and a control module 1; the bytecode register 2, the multichannel selection module 3 and the bytecode buffer 4 are connected in sequence. The input end of the bytecode register 2 is connected with the instruction memory, and the output end of the bytecode buffer 4 is connected with the decoding section of the Java processor. The input end of the control module 1 is connected with the decoding section of the Java processor, and the output end of the control module 1 is connected with the bytecode register 2, the multichannel selection module 3 and the bytecode buffer 4 respectively, to perform logic control over them.

The bytecode register 2 is 32 bits wide and stores the bytecode read from the instruction memory in units of 32 bits. The bytecode buffer 4 is 64 bits wide, and its 4 high bytes are connected with the decoding section of the Java processor to supply it with one complete bytecode, whose length can vary from 1 to 4 bytes.

As shown in Fig. 2, the multichannel selection module 3 comprises first-stage multiplexers 5 and second-stage multiplexers 6. The first-stage multiplexers 5 place the valid bytes of the bytecode register 2, in byte order, at the correct positions in an 8-byte sequence; the second-stage multiplexers 6 order the bytes output by the first-stage multiplexers 5 together with the bytes remaining in the previous bytecode buffer contents and deliver them to the correct positions in the bytecode buffer 4.

The 4 bytes read from the bytecode register 2 are connected to all of the first-stage multiplexers 5 simultaneously, and each first-stage multiplexer 5 can select one of the bytes newly read from memory as its output; each byte can therefore be delivered by the first-stage multiplexers 5 to the correct one of the 8 byte positions, arranged in order. In the second-stage multiplexers 6, the inputs of each multiplexer include every byte that may appear at that position; that is, the second-stage multiplexers 6 order the 4 newly read bytes together with the bytes of the previous bytecode buffer 4 contents that the currently executing bytecode has not consumed. After one clock cycle, the ordered bytecode is written into the bytecode buffer 4. Because the longest bytecode does not exceed 4 bytes, an 8-byte bytecode buffer 4 is sufficient.
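The two selection stages can be modeled in Python as follows. This is a hedged sketch of the data movement only: the function names are hypothetical, lists of integers stand in for byte lanes, `None` marks an empty lane, and the real device would realize each position with a hardware multiplexer rather than a loop.

```python
BUFFER_BYTES = 8


def first_stage(new_bytes, start):
    """Place the valid new bytes, in order, on lanes start, start+1, ..."""
    lanes = [None] * BUFFER_BYTES
    for i, value in enumerate(new_bytes):
        lanes[start + i] = value
    return lanes


def second_stage(old_buffer, consumed, lanes):
    """Per position, pick the shifted leftover byte if one exists, else the new byte."""
    leftover = old_buffer[consumed:]
    merged = []
    for pos in range(BUFFER_BYTES):
        merged.append(leftover[pos] if pos < len(leftover) else lanes[pos])
    return merged


def next_buffer(old_buffer, consumed, new_bytes):
    """Buffer contents after one clock; index 0 models the highest byte."""
    start = len(old_buffer) - consumed  # first lane not held by leftovers
    if start + len(new_bytes) > BUFFER_BYTES:
        raise ValueError("new bytes do not fit in the buffer")
    merged = second_stage(old_buffer, consumed, first_stage(new_bytes, start))
    return [value for value in merged if value is not None]


# A 2-byte bytecode retires while a 4-byte word arrives.
old = [0x10, 0x05, 0x60, 0x11, 0x01]
new = [0x00, 0xA7, 0x00, 0x03]
result = next_buffer(old, consumed=2, new_bytes=new)
```

Here the three unconsumed bytes move to the front and the four new bytes follow them, so the next opcode again sits in the highest byte.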

Note the following for a specific implementation. Because the Java Virtual Machine instruction set contains a few bytecodes longer than 4 bytes, these bytecodes must be converted before the Java processor executes them directly. Because the width of the bytecode buffer is fixed at 64 bits, that is 8 bytes, its prefetch capability is relatively weak: only when the buffer has enough free space can bytecode be read from the instruction cache, at the same frequency as the processor's execution frequency. It is therefore recommended to use a cache with a prefetch function, so that the time left over when bytecodes shorter than 4 bytes execute is used for prefetching, reducing the performance loss caused by memory access.

Fig. 3 is a schematic diagram of the control module 1, showing the input conditions needed to control the bytecode buffer device and the control signals it outputs. Signal 21 is output by the decoding section of the processor and is used to select the actual number of valid bytes read in from the instruction memory. Signal 13 consists of the lowest two bits of the PC; in cases such as instruction jumps, interrupts, method calls and returns, it indicates the address alignment of the bytecode and hence the number of truly valid bytes read from memory. Signal 14 is the byte count of 4 read from memory when the instruction stream executes sequentially. Signal 15 is the bytecode length, that is, the buffer space that will be freed once this instruction has executed. Signal 16 is the number of bytes to be fetched when signal 9 is valid. Signal 18 is the number of bytes to be fetched regardless of whether signal 9 is valid. Signal 19 is the free space that the bytecode buffer will have after one clock cycle. Signal 20 is the free space that the bytecode buffer has in the current cycle; to it must be added the space 15 that the currently executing bytecode will free, and from it subtracted the number of newly fetched bytes 18. Signal 17 is the free space the bytecode buffer will have after the currently executing bytecode has executed; comparing it with the number of bytes about to be read in produces the control signal 9, which indicates whether fetching can continue. Signal 7 indicates whether the instruction memory or instruction cache currently has bytecode available to read. Hence, if either signal 7 or signal 9 is not satisfied, signal 8 switches the bytecode read operation into a wait state.

Signals 10 and 11 are used to generate, from the space that will be freed, the number of newly read bytes and similar conditions, the control signals for the first-stage multiplexers 5 and the second-stage multiplexers 6, so that the newly read bytecode is delivered to the correct position. Signal 12 controls whether the bytecode buffer 4 is updated.
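The free-space bookkeeping of the control module can be sketched in Python as below. This is one reading of the Fig. 3 description under stated assumptions, not the actual logic: the function and parameter names are hypothetical, and the sketch assumes that no bytes enter the buffer in a cycle in which fetching has to wait.

```python
def control_step(free_now, bytecode_len, bytes_to_fetch, memory_ready):
    """Return (do_fetch, free_next) for one clock cycle.

    free_now       -- free bytes in the buffer this cycle
    bytecode_len   -- bytes the executing bytecode will free (1 to 4)
    bytes_to_fetch -- valid bytes the next fetch would deliver (1 to 4)
    memory_ready   -- whether the instruction memory has bytecode to read
    """
    # Free space once the current bytecode has executed.
    free_after_exec = free_now + bytecode_len
    # Fetching may continue only if the new bytes fit.
    can_fetch = free_after_exec >= bytes_to_fetch
    # If either condition fails, the read operation waits.
    do_fetch = memory_ready and can_fetch
    # Assumption: a waiting cycle brings in no bytes.
    fetched = bytes_to_fetch if do_fetch else 0
    free_next = free_after_exec - fetched
    return do_fetch, free_next


# 2 bytes free, a 1-byte bytecode retires: 3 bytes free, a 4-byte fetch waits.
waiting = control_step(free_now=2, bytecode_len=1, bytes_to_fetch=4, memory_ready=True)
# 2 bytes free, a 3-byte bytecode retires: 5 bytes free, the fetch proceeds.
fetching = control_step(free_now=2, bytecode_len=3, bytes_to_fetch=4, memory_ready=True)
```

The comparison uses the space available after the current bytecode has executed, which is what lets a refill overlap with execution instead of waiting for the following cycle.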

Fig. 4 is a schematic diagram of the output interface of the bytecode buffer. The first byte is always the byte holding the opcode of the bytecode to be executed, from which the Byte2mov generation logic 26 obtains the number of bytes 15 that will be consumed once this bytecode has executed. The input of the Byte2mov generation logic 26 is the opcode of the bytecode, and its output is the length of the current bytecode, which represents the number of bytes that will be consumed after the current bytecode has executed. The 4 highest bytes 22, 23, 24 and 25 are delivered to the decoding section of the processor simultaneously; they contain the opcode and any operands that may be present, and are always fetched together because fetching bytes that end up unused does no harm.
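The opcode-to-length mapping performed by the Byte2mov logic can be illustrated with a lookup table in Python. The opcodes and lengths below are standard Java Virtual Machine instructions, but the table is deliberately partial, and the function names and the dictionary representation are hypothetical; the hardware would use combinational logic rather than a table lookup.

```python
# Total length in bytes (opcode plus operands) of a few JVM instructions.
BYTECODE_LENGTH = {
    0x00: 1,  # nop
    0x10: 2,  # bipush
    0x11: 3,  # sipush
    0x60: 1,  # iadd
    0x84: 3,  # iinc
    0xA7: 3,  # goto
    0xB6: 3,  # invokevirtual
    0xC5: 4,  # multianewarray
}


def byte2mov(window: bytes) -> int:
    """Bytes consumed by the bytecode whose opcode is the first window byte."""
    return BYTECODE_LENGTH[window[0]]


def split_instruction(window: bytes):
    """Split the 4-byte window into the opcode and its operand bytes."""
    length = byte2mov(window)
    return window[0], window[1:length]


# sipush 0x0102 followed by the first byte of the next bytecode (iadd).
window = bytes([0x11, 0x01, 0x02, 0x60])
opcode, operands = split_instruction(window)
```

The trailing `0x60` in the window belongs to the next bytecode; it is delivered along with the rest but ignored, which mirrors the remark that fetching unused bytes does no harm.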

The device of the utility model is used as follows:

The bytecode register 2 reads 4 bytes at a time from a 32-bit-aligned instruction memory address and delivers them through the multichannel selection module 3 to the bytecode buffer 4. The 4 high bytes of the bytecode buffer 4 supply the decoding section of the Java processor with one complete bytecode, which is 1 to 4 bytes long.

The control module 1 takes the number of bytes actually consumed by the currently executing bytecode, subtracts it from the total space of the bytecode buffer 4, determines whether the bytecode buffer 4 can provide space not less than the number of valid bytes to be sent in, and thereby decides whether new bytecode is sent into the bytecode buffer 4.

According to the alignment of the bytecode address, the control module 1 selects the truly valid bytes from the bytecode register 2 and sends them to the multichannel selection module 3.
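The alignment-based selection can be sketched in Python. The text states only that the lowest two bits of the PC indicate the alignment; the rule used here (discard the bytes that precede the target address inside the aligned word) is an assumption made for illustration, as are the function name and the convention that the word's bytes are listed in ascending address order.

```python
WORD_BYTES = 4


def valid_bytes_after_jump(word: bytes, pc: int) -> bytes:
    """Bytes of the aligned word at or after address `pc`.

    Assumes word[0] holds the byte at the aligned address (pc with its two
    lowest bits cleared) and that earlier bytes in the word are not needed.
    """
    offset = pc & 0b11  # lowest two bits of the PC
    return word[offset:]


word = bytes([0xAA, 0xBB, 0xCC, 0xDD])
aligned = valid_bytes_after_jump(word, pc=0x100)    # all four bytes are valid
unaligned = valid_bytes_after_jump(word, pc=0x102)  # only the last two are valid
```

During sequential execution the fetch address stays word-aligned, so under this assumption all 4 bytes are valid; only a jump, interrupt, method call or return can land in the middle of a word.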

Claims

1. A bytecode buffer device for improving the instruction fetch bandwidth of a Java processor, comprising a bytecode register (2), a multichannel selection module (3), a bytecode buffer (4) and a control module (1), characterized in that:

the bytecode register (2), the multichannel selection module (3) and the bytecode buffer (4) are connected in sequence;

the input end of the bytecode register (2) is connected with an instruction memory, and the output end of the bytecode buffer (4) is connected with the decoding section of the Java processor;

the input end of the control module (1) is connected with the decoding section of the Java processor, and the output end of the control module (1) is connected with the bytecode register (2), the multichannel selection module (3) and the bytecode buffer (4) respectively, to perform logic control over them;

the bytecode register (2) is 32 bits wide and stores the bytecode read from the instruction memory in units of 32 bits;

the bytecode buffer (4) is 64 bits wide, and its 4 high bytes are connected with the decoding section of the Java processor to supply it with one complete bytecode.

2. The bytecode buffer device for improving the instruction fetch bandwidth of a Java processor according to claim 1, characterized in that: the multichannel selection module (3) comprises first-stage multiplexers (5) and second-stage multiplexers (6); the first-stage multiplexers (5) place the valid bytes of the bytecode register (2), in byte order, at the correct positions in an 8-byte sequence, and the second-stage multiplexers (6) order the bytes output by the first-stage multiplexers (5) together with the bytes remaining in the previous bytecode buffer contents and deliver them to the correct positions in the bytecode buffer (4).

3. The bytecode buffer device for improving the instruction fetch bandwidth of a Java processor according to claim 1, characterized in that: the bytecode buffer (4) adopts a cache with a prefetch function.