CN201556199U - Byte code high-speed cache device for real-time Java processor

Info

Publication number: CN201556199U
Application number: CN2009202323632U
Authority: CN
Inventors: 柴志雷; 涂时亮; 吴小俊; 须文波; 孙俊; 叶新栋
Original assignee: Jiangnan University
Current assignee: Jiangnan University
Priority date: 2009-09-30
Filing date: 2009-09-30
Publication date: 2010-08-18
Anticipated expiration: 2019-09-30

Abstract

The utility model relates to a bytecode cache device for a real-time Java processor and belongs to the field of computers. The device comprises a bytecode counter, a cache read address multiplexer, a cache write address multiplexer, a cache read address register, a cache write address register, a cache read address adder, a cache write address adder, a bytecode-readable comparator, a bytecode-writable comparator, a main memory, a cache and a bytecode register. The device exploits the facts that Java bytecodes vary in length and that most bytecodes are shorter than 4 bytes: it fetches four bytes at a time and consumes them over several accesses, which leaves time for prefetching. The device has an automatic instruction prefetch mechanism, which reduces the CPU stalls caused by instruction fetch, and it makes the positions of the bytecodes at which misses occur determinable, so that their access time can be predicted statically. The device can greatly improve the performance of the Java processor while keeping program execution predictable.

Description

Bytecode cache device for a real-time Java processor

(1) Technical field

The utility model relates to the cache mechanism of Java processors in the computer field, and specifically to a bytecode (instruction) cache device for a real-time Java processor.

(2) Background art

To match memory access speed to the execution speed of the CPU as closely as possible, computer systems today generally use a multi-level memory hierarchy to balance the demands of access speed, storage capacity and manufacturing cost. Memory that is fast but expensive to manufacture is usually placed close to the CPU with a small capacity, and it holds a subset of the data in a larger memory farther from the CPU. By the principle of program locality, the CPU finds the data it needs in the nearby fast memory in most cases; only in a few cases does it have to access the farther memory and copy a batch of data into the fast memory. When the memory hierarchy is well designed, its overall access speed approaches that of the nearest level, while its capacity and cost approach those of the farthest level. The cache is the level of the memory hierarchy closest to the CPU and plays a crucial role in CPU performance. Because most current CPU designs use pipelining to improve overall performance, the cache is usually split into an instruction cache and a data cache to reduce pipeline stalls; the utility model addresses the instruction cache. In a Java processor the hardware executes Java bytecode directly, so the instruction cache is the bytecode cache referred to in the utility model.

Existing instruction caches usually improve average memory access performance by raising the hit rate, reducing the hit time and reducing the miss penalty. Because it cannot be determined in advance which instruction access will miss during program execution, the worst-case execution time of the program cannot be predicted statically. Some designs also improve instruction fetch performance through instruction prefetching, but their main goal is likewise to improve average performance, and they are not tailored to the characteristics of Java processors.

The prior art also includes a bytecode cache mechanism for real-time Java processors. On a method call or return, this mechanism loads the entire method to be used into the cache at once, so that no miss occurs during execution. Its advantage is that execution time is easy to predict statically; its drawbacks are that the utilization of the loaded method is low and that the processor waits a long time on method calls and returns.

(3) Summary of the utility model

In view of the above problems, the utility model provides a bytecode cache device for a real-time Java processor; using this device improves the performance of the Java processor.

The technical solution of the device of the utility model is as follows:

It comprises a bytecode counter, a cache read address multiplexer, a cache write address multiplexer, a cache read address register, a cache write address register, a cache read address adder, a cache write address adder, a bytecode-readable comparator, a bytecode-writable comparator, a main memory, a cache and a bytecode register;

The bytecode counter holds the byte-addressed address of the bytecode to be executed; after an address alignment operation, this address is fed to the cache write address multiplexer;

The outputs of the cache read address multiplexer and the cache write address multiplexer are connected to the two inputs of the bytecode-readable comparator, and the output of the bytecode-readable comparator is connected to the enable inputs of the bytecode register and the cache read address register;

The bytecode-readable comparator determines whether a bytecode is available in the cache: a bytecode is available when the write address is greater than the read address. In that case the cache read address adder adds 1 to the current address, the result is latched into the cache read address register on the next clock, and the control signal selects this address as the output;

The read address input of the main memory is connected to the output of the cache write address multiplexer, and its data output is connected to the data input of the cache;

The read address input of the cache is connected to the output of the cache read address multiplexer, its write address input is connected to the output of the cache write address multiplexer, its data input is connected to the data output of the main memory, and its data output is connected to the input of the bytecode register;

The output of the bytecode register is connected to the subsequent stage of the Java processor;

The read address input and the write address input of the cache are connected to the two inputs of the bytecode-writable comparator, and the output of the bytecode-writable comparator is connected to the enable input of the cache write address register;

The bytecode-writable comparator determines whether the cache has space for more bytecodes: space is available when the write address is not equal to the read address. In that case the cache write address adder adds 4 to the current address, the result is latched into the cache write address register on the next clock, and the control signal selects this address as the output;

When the control flow changes, the control signal drives the cache read address multiplexer and the cache write address multiplexer to select the address of the bytecode counter as the output address.

A further technical solution is:

The address range of the cache is 0 to (2^m - 1) and the address range of the main memory is 0 to (2^n - 1); the read address input of the cache is connected to the low m bits of the output of the cache read address multiplexer, and the write address input of the cache is connected to the low m bits of the output of the cache write address multiplexer; the inputs of the bytecode-readable comparator are bits (n-1) to 2 of the n-bit output of the cache read address multiplexer and bits (n-1) to 2 of the n-bit output of the cache write address multiplexer, and the inputs of the bytecode-writable comparator are bits (m-1) to 2 of the cache read address input and bits (m-1) to 2 of the cache write address input;

The cache is a dual-port ring memory;

The bus width of the data input of the cache is 32 bits, and the width of the data output of the cache is 8 bits.

The technical effects of the utility model are as follows. The utility model exploits the facts that Java bytecodes vary in length and that most bytecodes are shorter than 4 bytes: it fetches 4 bytes at a time and consumes them over several accesses, which leaves time for prefetching. Because some complex instructions in Java bytecode take several clock cycles to execute, a large amount of prefetching can be carried out in the meantime. By improving method call and return, the new bytecode address is obtained as early as possible so that correct prefetching can begin. Because the first read of the main memory has a larger latency and subsequent reads have a smaller latency, the waiting caused by memory access is concentrated at the beginning of a basic block, which facilitates static analysis.

The device of the utility model has an automatic instruction prefetch mechanism, which reduces the CPU stalls caused by instruction fetch. It also makes the bytecode positions at which misses occur determinable, so that their access time can be predicted statically; the performance of the Java processor is greatly improved while program execution remains predictable.

(4) Description of the drawings

Fig. 1 is a schematic diagram of the internal structure of the device of the utility model;

Fig. 2 is a schematic diagram of the address alignment operation in the utility model;

Fig. 3 is a schematic diagram of the worst-case analysis method for a basic block in the utility model;

Fig. 4 is a flow chart of the static analysis method for the worst-case execution time of a program in the utility model.

(5) Embodiment

As shown in Fig. 1, the device proposed by the utility model comprises a bytecode counter 1, a cache read address multiplexer 3, a cache write address multiplexer 4, a cache read address register 5, a cache write address register 6, a cache read address adder 7, a cache write address adder 8, a bytecode-readable comparator 9, a bytecode-writable comparator 10, a main memory 13, a cache 14 and a bytecode register 15.

The bytecode counter 1 holds the address of the bytecode to be executed; it is byte-addressed (the byte is the smallest addressing unit) and is adjusted by the decode stage of the CPU according to the actual length of the current bytecode. The width of the bytecode counter 1 is determined by the address width of the main memory 13. The address of the bytecode counter 1 is fed to the cache write address multiplexer 4 after the address alignment operation 2.

Fig. 2 is a schematic diagram of the address alignment operation. Because the alignment is to 32 bits, it can be done simply by tying the lowest 2 bits to ground.
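The alignment step can be illustrated in software. The following is a minimal sketch in Python; the function name `align_to_word` is hypothetical and does not appear in the utility model. Clearing the two lowest address bits is the software counterpart of tying them to ground.

```python
def align_to_word(byte_address: int) -> int:
    """Round a byte address down to a 32-bit (4-byte) boundary.

    Masking off the two least-significant bits mirrors tying those
    two address lines to ground in hardware.
    """
    return byte_address & ~0x3


if __name__ == "__main__":
    # A byte address in the middle of a word maps to the start of that word.
    print(hex(align_to_word(0x1236)))
```

An address that is already word-aligned is returned unchanged, so the operation can be applied unconditionally to the bytecode counter value.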

The cache 14 is a dual-port ring memory whose address range is 0 to (2^m - 1). The read address input 11 of the cache 14 is connected to the low m bits of the output of the cache read address multiplexer 3, and the write address input 12 of the cache 14 is connected to the low m bits of the output of the cache write address multiplexer 4.

The outputs of the bytecode-readable comparator 9 and the bytecode-writable comparator 10 control the enable inputs of the cache read address register 5 and the cache write address register 6, respectively.

The address range of the main memory 13 is 0 to (2^n - 1). The inputs of the bytecode-readable comparator 9 are bits (n-1) to 2 of the n-bit output of the cache read address multiplexer 3 and bits (n-1) to 2 of the n-bit output of the cache write address multiplexer 4. The output of the bytecode-readable comparator 9 is connected to the enable inputs of the bytecode register 15 and the cache read address register 5. The bytecode-readable comparator 9 determines whether a bytecode is available in the cache 14. Because the addresses compared are main memory 13 addresses, a bytecode is readable whenever the read address lags behind the write address. The read address is then advanced by one byte: the cache read address adder 7 adds 1 to the current address, the result is latched into the cache read address register 5 on the next clock, and the control signal 19 selects this address as the output. This completes the sequential increment of the output of the cache read address multiplexer 3 and, with it, the increment of the read address input 11 of the cache 14.
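The readable check described above can be sketched in Python as follows. This is an illustrative model only: the function names are hypothetical, and the sketch assumes that the write address always points at the next word still to be fetched, so that every word below it is already in the cache.

```python
def bytecode_readable(read_addr: int, write_addr: int) -> bool:
    """Compare bits (n-1)..2 of the two main-memory addresses.

    Dropping the two lowest bits turns byte addresses into word numbers;
    a bytecode is readable when the read word lags behind the write word.
    """
    return (write_addr >> 2) > (read_addr >> 2)


def next_read_addr(read_addr: int, write_addr: int) -> int:
    """Advance the read address by one byte only when the comparator enables it."""
    if bytecode_readable(read_addr, write_addr):
        return read_addr + 1  # read address adder: current address + 1
    return read_addr          # register not enabled: hold the address


if __name__ == "__main__":
    print(bytecode_readable(3, 4))  # read is in word 0, write is at word 1
    print(bytecode_readable(4, 4))  # both in word 1: nothing to read yet
```

Because only word numbers are compared, all four bytes of a word become readable together once the write address has moved past that word.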

The inputs of the bytecode-writable comparator 10 are bits (m-1) to 2 of the cache read address input 11 and bits (m-1) to 2 of the cache write address input 12. The output of the bytecode-writable comparator 10 is connected to the enable input of the cache write address register 6. The bytecode-writable comparator 10 determines whether the cache 14 has space for more bytecodes. It compares cache 14 addresses; because the cache 14 is a ring buffer, a write address either greater than or less than the read address indicates that space is available, and the write address can then be advanced by one word. The cache write address adder 8 adds 4 to the current address, the result is latched into the cache write address register 6 on the next clock, and the control signal 19 selects this address as the output. This completes the sequential increment of the output of the cache write address multiplexer 4 and, with it, the increment of the write address input 12 of the cache 14.
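A minimal Python sketch of the writable check follows. The cache size exponent `M = 6` (a 64-byte cache) and all function names are assumptions chosen for illustration. The sketch mirrors only the comparison stated in the text; how the device treats equal word indices immediately after a control flow change is not described there, so it is not modelled here.

```python
M = 6  # hypothetical: cache addresses run from 0 to 2**M - 1 (64 bytes)


def cache_word_index(addr: int, m: int = M) -> int:
    """Keep the low m bits (the cache address) and drop bits 1..0."""
    return (addr & ((1 << m) - 1)) >> 2


def bytecode_writable(read_addr: int, write_addr: int, m: int = M) -> bool:
    """Ring buffer check: space is available when the word indices differ."""
    return cache_word_index(write_addr, m) != cache_word_index(read_addr, m)


def next_write_addr(read_addr: int, write_addr: int, m: int = M) -> int:
    """Advance the write address by one word only when the comparator enables it."""
    if bytecode_writable(read_addr, write_addr, m):
        return write_addr + 4  # write address adder: current address + 4
    return write_addr          # register not enabled: prefetching waits


if __name__ == "__main__":
    print(bytecode_writable(0x10, 0x14))  # write is one word ahead of read
    print(bytecode_writable(0x10, 0x50))  # write has wrapped onto the read word
```

In the second call the write address is exactly one cache size (64 bytes) ahead of the read address, so both map to the same cache word and prefetching has to wait until the read address moves on.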

The read address input of the main memory 13 is connected to the output of the cache write address multiplexer 4, because prefetching consists of reading bytecodes from the main memory 13 in 32-bit units and writing them to the corresponding address of the cache 14. The data output port of the main memory is the 32-bit port 17, which is also the data input 17 of the cache 14.

The data input 17 of the cache 14 is a 32-bit data write port connected to the data output of the main memory 13. The data output 18 of the cache 14 is an 8-bit data read port connected to the input of the bytecode register 15. The bytecode cache automatically prefetches bytecodes in 32-bit units from the address specified by the cache write address multiplexer 4 and writes them through the data input 17 to the location of the cache 14 specified by the write address input 12. When needed, the CPU reads a bytecode from the address specified by the read address input 11 and delivers it to the bytecode register 15 through the data output 18.
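The asymmetric port widths can be modelled with a small Python class. This is a sketch under stated assumptions: the class name `RingCache`, the default size exponent and the byte order within a 32-bit word (lowest address first) are all hypothetical and are not specified in the text.

```python
class RingCache:
    """Toy model of a dual-port ring memory: 32-bit write port, 8-bit read port."""

    def __init__(self, m: int = 6) -> None:
        self.mask = (1 << m) - 1        # cache addresses are the low m bits
        self.data = bytearray(1 << m)   # 2**m bytes of storage

    def write_word(self, addr: int, word: bytes) -> None:
        """Write port: store 4 bytes at the word-aligned cache address."""
        if len(word) != 4:
            raise ValueError("the write port is 32 bits wide")
        base = addr & self.mask & ~0x3  # low m bits, aligned to a word
        self.data[base:base + 4] = word

    def read_byte(self, addr: int) -> int:
        """Read port: return the single byte at the cache address."""
        return self.data[addr & self.mask]


if __name__ == "__main__":
    cache = RingCache(m=4)                       # 16-byte cache for the demo
    cache.write_word(0x14, bytes([0x2A, 0x1B, 0x60, 0xAC]))
    print([hex(cache.read_byte(0x14 + i)) for i in range(4)])
```

One write fills four byte locations that are then drained by up to four separate reads, which is the source of the slack that prefetching relies on.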

The bytecode register 15 stores the bytecode read from the cache 14 and supplies it through the output 16 to the subsequent stage of the Java processor. This register supplies bytecode in 8-bit units every clock cycle, which leaves time for automatic prefetching.

The bytecode counter 1 is updated in the decode or execute stage of the Java processor. If execution continues inside a basic block, the increment of the bytecode counter 1 can be determined in the decode stage from the number of bytes of the bytecode being decoded. If execution leaves the basic block, that is, the control flow changes, the new target address can only be computed in the execute stage. For complex instructions such as method call and return, even more time may be needed to compute the new value. The signal 19 in Fig. 1 indicates whether the bytecode currently being decoded changes the control flow; it is generated by the decode unit.

When the bytecode lies inside a basic block, control flow is sequential and the cache read address multiplexer 3 and the cache write address multiplexer 4 change independently: the control signal 19 makes the cache read address multiplexer 3 select the address plus 1 as its output, and makes the cache write address multiplexer 4 select the address plus 4 as its output. The read and write addresses are updated independently.

When the control flow changes (for example on a method call, return, branch, jump or interrupt), the control signal 19 from the decode stage drives the cache read address multiplexer 3 and the cache write address multiplexer 4. The cache write address multiplexer 4 selects the value of the bytecode counter 1, aligned to 32 bits by the address alignment operation 2, as the new prefetch start address; the device then automatically reads the main memory 13 from this start address and writes the data into the cache 14, and the bytecodes prefetched earlier are discarded. At the same time, the cache read address multiplexer 3 selects the byte-aligned value supplied by the bytecode counter 1 as the new bytecode read address.
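The two multiplexer selections can be summarised in one Python function. This is a hedged sketch: the function and parameter names are hypothetical, `flow_change` stands in for control signal 19, and `readable` and `writable` stand in for the two comparator outputs.

```python
from typing import Tuple


def next_addresses(read_addr: int, write_addr: int, bytecode_counter: int,
                   flow_change: bool, readable: bool,
                   writable: bool) -> Tuple[int, int]:
    """Return the (read, write) addresses selected for the next clock."""
    if flow_change:
        # Control flow changed: restart from the bytecode counter.
        # The read side uses the byte address, the write side the
        # 32-bit aligned address (lowest two bits cleared).
        return bytecode_counter, bytecode_counter & ~0x3
    # Inside a basic block the two addresses advance independently.
    new_read = read_addr + 1 if readable else read_addr
    new_write = write_addr + 4 if writable else write_addr
    return new_read, new_write


if __name__ == "__main__":
    # A jump to byte address 0x1236 restarts prefetching at word 0x1234.
    print([hex(a) for a in next_addresses(0x40, 0x48, 0x1236, True, True, True)])
```

After a redirect, the bytes of the first word that lie below the jump target are fetched but never read, because the read address starts at the byte address itself.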

The cache is usually implemented in SRAM, whose access time is the same for every access, while the main memory is usually implemented in DRAM, for which the time to access the first word differs from the time to access subsequent words. Taking PC-100 DRAM as an example, accessing the first word takes 20 ns and each subsequent word takes 10 ns. The clock frequency of embedded real-time Java processors is generally not very high; most current implementations do not exceed 100 MHz. At 100 MHz, a 20 ns delay occurs when the first word of each new basic block is prefetched, and each subsequent prefetch completes in 10 ns, that is, within one processor clock cycle. In addition, statistics show that each 32 bits of bytecode take 2.41 cycles on average to execute. The processor therefore accesses the cache at an average frequency of about 40 MHz and a maximum of 100 MHz, while the prefetch frequency of the cache can reach 100 MHz, so no further misses occur inside a basic block.
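The arithmetic behind these figures can be checked with a short Python sketch. The function names are hypothetical; the numbers (20 ns, 10 ns, 100 MHz, 2.41 cycles per 32-bit word) are the ones quoted in the paragraph above.

```python
def latency_cycles(latency_ns: int, clock_mhz: int) -> int:
    """Convert a memory latency in ns to whole clock cycles, rounding up."""
    # One cycle lasts 1000 / clock_mhz ns, so cycles = latency_ns * clock_mhz / 1000.
    return (latency_ns * clock_mhz + 999) // 1000


def average_word_rate_mhz(clock_mhz: float, cycles_per_word: float) -> float:
    """Average rate at which the processor consumes 32-bit words."""
    return clock_mhz / cycles_per_word


if __name__ == "__main__":
    print(latency_cycles(20, 100))            # first word of a basic block
    print(latency_cycles(10, 100))            # each subsequent word
    print(round(average_word_rate_mhz(100, 2.41), 1))
```

At 100 MHz the first word costs 2 cycles and each later word 1 cycle, while the processor consumes a word only about every 2.41 cycles (roughly 41.5 million words per second), so on average the prefetcher stays ahead of the processor once the first word has arrived.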

The access method of the utility model is as follows:

The CPU reads bytecodes only from the cache 14 and never reads the main memory 13 directly. When the cache 14 has free space, it is automatically filled, clock by clock, with bytecodes read from the main memory 13 in units of 4 bytes. When the bytecode-writable comparator 10 indicates that the cache 14 has no space for more bytecodes, automatic prefetching waits. When bytecodes are available in the cache 14, the CPU reads the cache 14 to obtain the bytecode it needs. When the bytecode-readable comparator 9 indicates that no bytecode is available in the cache 14, the Java processor waits.

When the control flow changes, the prefetched instructions that have not been used are discarded. The cache read address multiplexer 3 and the cache write address multiplexer 4 select the updated value of the bytecode counter 1 as the new prefetch start address; the bytecode cache device automatically reads the main memory 13 from this start address and writes the data into the cache 14, and the Java processor uses this address as the new bytecode read address, reading bytecodes from the cache 14 into the bytecode register 15 for subsequent use.

If the clock frequency of the Java processor is raised, misses may occur even inside a basic block. The access method of the utility model still allows the WCET (worst-case execution time) of a basic block to be computed easily, and the WCET of the whole program can be obtained by organizing the basic blocks into a control flow graph and summing along the worst-case execution path.

Fig. 3 illustrates the worst-case analysis method for a basic block. Item 20 denotes the bytecodes; N indicates that the basic block contains N bytecodes. Item 21 is the number of bytes of each bytecode: for example, the 1st bytecode has 2 bytes, the 3rd bytecode has 4 bytes, and the Nth bytecode has X bytes (X being a number from 1 to 4). Item 22 is the time point (clock cycle) at which each bytecode needs to be accessed; it equals the sum of the execution times of all bytecodes before it. Item 23 is the number of clock cycles each bytecode takes to execute, excluding memory access delay. Item 24 is the time point (clock cycle) at which each bytecode is prefetched from memory into the cache. Item 25 indicates whether a miss occurs when each bytecode is accessed.

As Fig. 3 shows, the time at which each bytecode is accessed can be computed from the bytecode execution times 23: it equals the sum of the execution times of all bytecodes before it. For example, the 4th bytecode is accessed in clock cycle 8, because the instructions before it take 2, 1 and 5 clock cycles respectively. The time 24 at which each bytecode is prefetched can be derived from the byte counts 21 and the latency figures of the DRAM. Suppose the DRAM takes 2 cycles to access the first word and 1 cycle for each subsequent word, and bytecodes are read from the DRAM in 32-bit units. The 1st and 2nd bytecodes are then both prefetched at time 2, because they belong to the 1st 32-bit word, and the 4th bytecode is prefetched at time 3, because it belongs to the 2nd 32-bit word. If the access time 22 is later than the prefetch time 24, the bytecode has already been prefetched and no miss occurs; otherwise a miss occurs. For example, the 1st bytecode needs to be accessed in cycle 0 but is not fetched until cycle 2, so a miss occurs. The 4th instruction is accessed at time 8 but was prefetched as early as cycle 3, so no miss occurs. The time at which the last bytecode in the basic block is prefetched, plus the execution time of that last bytecode, is the worst-case execution time of the basic block. For the basic block in the figure: when Y >= M, the worst-case execution time is Y + 3; when Y < M, the worst-case execution time is (M - Y) + 3.
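The per-bytecode bookkeeping of Fig. 3 can be reproduced with a short Python sketch. Several things here are assumptions made for illustration and are not taken from the figure: the function name, the sizes of the 2nd and 4th bytecodes (1 byte each), the execution time of the 4th bytecode (3 cycles), a basic block that starts on a word boundary, and the rule that a bytecode arriving in the same cycle it is needed counts as a hit. The block total is computed as the later of the last bytecode's access time and prefetch time, plus its execution time, which is one reading of the text rather than a restatement of the figure.

```python
from itertools import accumulate
from typing import List, Tuple


def analyse_basic_block(sizes: List[int], exec_cycles: List[int],
                        first_latency: int = 2,
                        next_latency: int = 1
                        ) -> Tuple[List[int], List[int], List[bool], int]:
    """Return (access times, prefetch times, miss flags, block total)."""
    # Access time (item 22): sum of execution times of all earlier bytecodes.
    access = [0] + list(accumulate(exec_cycles))[:-1]

    # Prefetch time (item 24): arrival of the 32-bit word holding the
    # last byte of the bytecode; word k arrives at first + k * next.
    prefetch = []
    end = 0
    for size in sizes:
        end += size
        word = (end - 1) // 4
        prefetch.append(first_latency + word * next_latency)

    # Miss (item 25): the bytecode is needed before it has arrived.
    # Assumption: arrival in the same cycle is treated as a hit.
    miss = [a < p for a, p in zip(access, prefetch)]

    # Assumed block total: later of access/prefetch for the last bytecode,
    # plus that bytecode's own execution time.
    total = max(access[-1], prefetch[-1]) + exec_cycles[-1]
    return access, prefetch, miss, total


if __name__ == "__main__":
    sizes = [2, 1, 4, 1]        # 1st and 3rd from the text, others assumed
    cycles = [2, 1, 5, 3]       # first three from the text, last assumed
    print(analyse_basic_block(sizes, cycles))
```

With these inputs the sketch reproduces the values quoted in the text: the 4th bytecode is accessed in cycle 8, the 1st and 2nd bytecodes arrive in cycle 2, the 4th arrives in cycle 3, the 1st bytecode misses and the 4th does not.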

Every program is composed of basic blocks. Taking the program composed of basic blocks B1 to B12 in Fig. 4 as an example, the static analysis method for the worst-case execution time of a whole program is described below. The number in parentheses is the worst-case execution time of the basic block; for example, the worst-case execution time of basic block B1 is 64. In the static analysis of the whole program, a path with a larger worst-case execution time replaces a path with a smaller one; for example, B4-B5-B6-B7-B11-B4 replaces B4-B12. This yields the path with the maximum worst-case execution time, and the worst-case execution time of the whole program is: 64+3+5+n(3+3+3+5+5)+m(15+36+6+5)+15+36, where n is the worst-case number of executions of the path B4-B5-B6-B7-B11-B4 and m is the worst-case number of executions of the path B7-B8-B9-B10-B7.
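The closing formula can be evaluated directly. The Python sketch below only restates the expression given in the text; the function name is hypothetical, and the loop bounds used in the example call (n = 10, m = 20) are made-up values, since the text does not give any.

```python
def program_wcet(n: int, m: int) -> int:
    """Evaluate 64+3+5 + n(3+3+3+5+5) + m(15+36+6+5) + 15+36."""
    straight_line = 64 + 3 + 5          # blocks executed once before the loops
    path_n = 3 + 3 + 3 + 5 + 5          # one pass of B4-B5-B6-B7-B11-B4
    path_m = 15 + 36 + 6 + 5            # one pass of B7-B8-B9-B10-B7
    tail = 15 + 36                      # remaining terms of the formula
    return straight_line + n * path_n + m * path_m + tail


if __name__ == "__main__":
    # Hypothetical loop bounds, for illustration only.
    print(program_wcet(n=10, m=20))
```

The result grows linearly in n and m (19 and 62 cycles per pass respectively), so a static bound on the program requires a static bound on both loop counts.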

Claims

1. A bytecode cache device for a real-time Java processor, characterized in that: it comprises a bytecode counter (1), a cache read address multiplexer (3), a cache write address multiplexer (4), a cache read address register (5), a cache write address register (6), a cache read address adder (7), a cache write address adder (8), a bytecode-readable comparator (9), a bytecode-writable comparator (10), a main memory (13), a cache (14) and a bytecode register (15);

The bytecode counter (1) holds the byte-addressed address of the bytecode to be executed, and this address is fed to the cache write address multiplexer (4) after an address alignment operation (2);

The outputs of the cache read address multiplexer (3) and the cache write address multiplexer (4) are connected to the two inputs of the bytecode-readable comparator (9), and the output of the bytecode-readable comparator (9) is connected to the enable inputs of the bytecode register (15) and the cache read address register (5);

The bytecode-readable comparator (9) determines whether a bytecode is available in the cache (14): a bytecode is available when the write address is greater than the read address, in which case the cache read address adder (7) adds 1 to the current address, the result is latched into the cache read address register (5) on the next clock, and the control signal (19) selects this address as the output;

The read address input of the main memory (13) is connected to the output of the cache write address multiplexer (4), and its data output is connected to the data input (17) of the cache (14);

The read address input (11) of the cache (14) is connected to the output of the cache read address multiplexer (3), its write address input (12) is connected to the output of the cache write address multiplexer (4), its data input (17) is connected to the data output of the main memory (13), and its data output (18) is connected to the input of the bytecode register (15);

The output (16) of the bytecode register (15) is connected to the subsequent stage of the Java processor;

The read address input (11) and the write address input (12) of the cache (14) are connected to the two inputs of the bytecode-writable comparator (10), and the output of the bytecode-writable comparator (10) is connected to the enable input of the cache write address register (6);

The bytecode-writable comparator (10) determines whether the cache (14) has space for more bytecodes: space is available when the write address is not equal to the read address, in which case the cache write address adder (8) adds 4 to the current address, the result is latched into the cache write address register (6) on the next clock, and the control signal (19) selects this address as the output;

When the control flow changes, the control signal (19) drives the cache read address multiplexer (3) and the cache write address multiplexer (4) to select the address of the bytecode counter (1) as the output address.

2. The bytecode cache device for a real-time Java processor according to claim 1, characterized in that: the address range of the cache (14) is 0 to (2^m - 1) and the address range of the main memory (13) is 0 to (2^n - 1); the read address input (11) of the cache (14) is connected to the low m bits of the output of the cache read address multiplexer (3), and the write address input (12) of the cache (14) is connected to the low m bits of the output of the cache write address multiplexer (4); the inputs of the bytecode-readable comparator (9) are bits (n-1) to 2 of the n-bit output of the cache read address multiplexer (3) and bits (n-1) to 2 of the n-bit output of the cache write address multiplexer (4), and the inputs of the bytecode-writable comparator (10) are bits (m-1) to 2 of the cache read address input (11) and bits (m-1) to 2 of the cache write address input (12).

3. The bytecode cache device for a real-time Java processor according to claim 1, characterized in that: the cache (14) is a dual-port ring memory.

4. The bytecode cache device for a real-time Java processor according to claim 1, characterized in that: the bus width of the data input (17) of the cache (14) is 32 bits, and the width of the data output (18) of the cache (14) is 8 bits.