Summary of the Invention
Therefore, an objective of the invention is to solve the problems of the above technologies by providing a dynamic set-associative caching device for a processor, which can reduce the overall power consumption of the processor without degrading processor performance.
The objective of the invention is achieved through the following technical solutions:
In one aspect, the invention provides a dynamic set-associative caching device for a processor, the device comprising:
a plurality of cache ways, each cache way containing an equal number of cache blocks, each cache block comprising a tag block and a data block; and
a list unit that records the valid bit of each cache block, the valid bit of a cache block indicating whether the cache block is valid or invalid.
In the above technical solution, the list unit is independent of the SRAM that implements the tag block array or the data block array. The list unit may be implemented using a register file.
In the above technical solution, when a read access is performed on the device, the processor first checks the list unit to obtain the valid bit of each cache block in the cache set to be accessed, and sets the enable bit of the cache way in which each cache block resides according to its valid bit; it then reads the valid cache blocks in the cache set, and reads data from the data block of the valid cache block whose tag block matches the tag field of the memory access address.
In the above technical solution, for an invalid cache block, the enable bit of the cache way in which it resides is set to disabled; for a valid cache block, the enable bit of the cache way in which it resides is set to enabled.
In another aspect, the invention provides a method of performing a read access on the device of the above technical solution, the method comprising:
(1) locating the cache set to be accessed according to the index field of the memory access address;
(2) checking the list unit to obtain the valid bit of each cache block in the cache set;
(3) setting the enable bit of the cache way in which each cache block resides according to its valid bit;
(4) reading the valid cache blocks in the cache set, and reading data from the data block of the valid cache block whose tag block matches the tag field of the memory access address.
In the method, if in step (2) all cache blocks in the cache set to be accessed are found to be invalid, a miss message may be sent directly.
In the method, in step (3), for a cache block in the invalid state, the enable bit of the cache way in which it resides is set to disabled; for a cache block in the valid state, the enable bit of the cache way in which it resides is set to enabled.
In the method, step (4) may comprise the following steps:
reading the valid cache blocks in the cache set;
comparing the tag field of the memory access address with the tag block of each cache block read;
if there is a hit, selecting data from the data block of the corresponding cache block according to the offset field of the memory access address and writing it back; if there is a miss, sending a miss message.
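The claimed steps (1) to (4) can be sketched in software. The following is a minimal Python sketch, not the hardware implementation: the cache and the list unit are modeled as plain Python structures, and the 128-set, 32-byte-block geometry used for the address split is an assumption for illustration.

```python
def read_access(cache, valid_table, addr):
    """Steps (1)-(4) of the claimed method, sketched over Python structures.
    cache[way][index] holds {"tag": ..., "data": [...]} for a cache block;
    valid_table[way][index] holds the valid bit from the list unit."""
    tag = addr >> 12                # illustrative split: 20-bit tag,
    index = (addr >> 5) & 0x7F     # (1) 7-bit index (128 sets assumed),
    offset = addr & 0x1F           # 5-bit offset (32-byte block assumed)
    valid = [valid_table[w][index] for w in range(len(cache))]  # (2)
    if not any(valid):             # all blocks invalid: miss sent directly
        return ("miss", None)
    enables = valid                # (3) way enable bit := valid bit
    for way, en in enumerate(enables):
        if not en:
            continue               # a disabled way is not read at all
        block = cache[way][index]  # (4) read only the valid cache blocks
        if block["tag"] == tag:    # compare tag field with tag block
            return ("hit", block["data"][offset])  # select by offset field
    return ("miss", None)
```

Note that a disabled way is never indexed in the loop, which mirrors the claim that the corresponding SRAM block is not read.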
Compared with the prior art, the invention has the advantage that the associativity of the caching device is changed dynamically on each read access: a cache way in which an invalid cache block resides is not enabled when a read operation occurs on the cache set, thereby effectively reducing the dynamic power consumption caused by read operations during read/write accesses. Therefore, the power consumption of the processor cache can be effectively reduced without substantially increasing design complexity or affecting processor performance.
Embodiments
To make the objectives, technical solutions, and advantages of the invention clearer, the invention is further explained below through specific embodiments in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
The power consumption of a cache is divided into dynamic power consumption and static power consumption. Dynamic power consumption refers to capacitor charge/discharge power and short-circuit power, and is mainly caused by circuit switching when the cache performs read/write operations. Static power consumption refers to leakage-current power, i.e., the power consumed while the circuit state is stable.
According to statistical analysis of the SPEC (Standard Performance Evaluation Corporation) CPU2000 benchmark programs, nearly one third of the instructions in a program are memory access instructions or involve memory access operations, and read operations are about twice as frequent as write operations. Moreover, when a write operation occurs, the corresponding cache block must first be read to determine whether it hits, and whether the write proceeds depends on the hit result, so each write operation is also accompanied by a read operation. It follows that read operations are the most important source of dynamic power consumption in a cache.
Existing cache structures can be divided into three kinds: direct-mapped caches, fully associative caches, and set-associative caches. Because a set-associative cache effectively reduces conflict misses compared with a direct-mapped cache, thereby improving the hit rate, and has a shorter lookup time than a fully associative cache, it is the most widely used. Fig. 1 shows a schematic of an existing set-associative cache structure: a 4-way set-associative cache comprising four cache ways (cache way 0, cache way 1, cache way 2, cache way 3). Each cache way contains an equal number of cache blocks (for example, 128), and each cache block consists of a tag field and a data field (which may also be called the tag block and the data block). The memory access address (a 32-bit address) is divided into a tag field, an index field, and an offset field. A cache set is formed by the cache blocks at the same index in each cache way. In the 4-way set-associative structure of Fig. 1, a cache set consists of 4 cache blocks. As shown by the shaded portion of Fig. 1, the associative set formed by the cache blocks at the same index in way 0, way 1, way 2, and way 3 is the cache set; for example, tag0 and data0 in cache way 0, tag1 and data1 in cache way 1, tag2 and data2 in cache way 2, and tag3 and data3 in cache way 3 together form one cache set.
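The division of the 32-bit address into tag, index, and offset fields can be illustrated with a short Python sketch. The field widths below (5-bit offset for a 32-byte block, 7-bit index for 128 sets) are assumptions consistent with the examples in the text, not fixed parameters of the invention.

```python
OFFSET_BITS = 5                   # assumed 32-byte cache block -> 5 offset bits
INDEX_BITS = 7                    # assumed 128 cache sets -> 7 index bits
TAG_BITS = 32 - INDEX_BITS - OFFSET_BITS  # remaining 20 bits form the tag

def split_address(addr):
    """Split a 32-bit memory access address into (tag, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

The index field selects one cache set; the tag field is what gets compared against the tag block of each way in that set.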
Fig. 2 illustrates the access flow of the cache structure shown in Fig. 1. Accesses to the caching device take the cache set as the unit; that is, both read and write accesses occur on a cache set, operating on the cache blocks of the different cache ways indexed by the same index. As shown in Fig. 2, each cache way comprises a tag array (which may, for example, contain 128 tag blocks) and a data array (which may, for example, contain 128 data blocks). In a particular hardware implementation, the tag array or data array can be implemented as an SRAM block that has an enable control bit: when the enable bit is 1 (driven high), the SRAM block can be read and written; when no access occurs, the enable bit is 0 (driven low).
The traditional memory access flow mainly comprises the following steps:
a) after the memory access address is generated, the enable bits of all cache ways are set (the potential is driven high);
b) a particular cache set is indexed by the index field of the memory access address (for example, with 128 cache sets, the index is 7 bits);
c) each cache block in the corresponding cache set (comprising the tag block and the data block) and its valid flag bit (valid/invalid bit) are read;
d) the tag field of the access address is compared with each tag block read, and the valid flag bit (which may be abbreviated as the valid bit) of each cache block is checked; if there is a hit (i.e., the tags match) and the valid bit is valid, data is selected from the corresponding data block according to the offset field of the access address; if there is a miss, a miss message is sent.
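Steps a) to d) of the traditional flow can be sketched as follows. This is a minimal Python model under the same assumptions as before (the per-block dictionary fields are illustrative); its point is that every way in the set is read regardless of validity.

```python
def traditional_read(sets, tag, index, offset):
    """Traditional flow: all ways are enabled, so every block at `index`
    is read (steps a-c); tags and valid bits are then checked (step d)."""
    ways_read = 0
    result = None
    for block in sets[index]:        # every cache way is read
        ways_read += 1
        if block["valid"] and block["tag"] == tag:
            result = block["data"][offset]  # select data by offset field
    return result, ways_read
```

Even when only one block in the set holds valid data, `ways_read` equals the full associativity, which is the source of the wasted dynamic power discussed below.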
For example, when a read operation occurs, the index field of the memory access address is used to index the cache set to be accessed, and the tag field of the memory access address is compared with the tag of each cache block read from the cache set; if there is a hit and the data of the cache block is valid, the required data (e.g., 8 bytes of it) is selected from the corresponding data block according to the offset field of the memory access address and written back to a register.
In the above memory access process, the cache set is the basic unit of indexing, and all cache ways must be accessed on every access. In the cache structure of Fig. 1, a cache set is formed from cache blocks in 4 cache ways, so each access reads 4 cache ways; that is, the associativity of this cache structure is 4. Here, the associativity refers to the number of cache ways that must be accessed on each access. The greater the associativity, the more cache blocks must be read or matched (e.g., by tag comparison), and therefore the greater the power consumption.
In fact, however, when a read operation occurs, the data of the cache ways in the accessed cache set are not all valid. Fig. 3 shows an example state of a set-associative cache during program execution: in each cache way, black blocks represent valid data and blank blocks represent invalid data. For example, in a 4-way set-associative cache structure, when a read access occurs, in the accessed cache set all 4 ways of data may be valid, or 3 ways, or 2 ways, or only 1 way, or even all ways may be invalid. The data in a cache block may be invalid because, for example: a) the cache block has not yet been filled, but will be filled soon; b) the temporal and spatial locality of the program are good, so some cache blocks are not filled for a long time; or c) the cache block was filled, but its data has since been invalidated. For instance, a DMA (Direct Memory Access) operation in a single-core environment, or an invalidate message sent by another processor in a multi-core environment, can invalidate the data of a cache block.
Through analysis of randomly selected SPEC CPU2000 benchmark programs, the inventors found that, across the execution of different programs, roughly 30%-70% of read operations are reads of invalid cache ways, which produces a great deal of unnecessary dynamic power consumption.
Fig. 4 shows the structure of a dynamic set-associative caching device for a processor according to an embodiment of the invention. The valid bit of each cache block is recorded by a list unit (such as the valid table in Fig. 4) and indicates whether the cache block is valid or invalid. When a cache block is filled with data for the first time, the valid flag bit (abbreviated as the valid bit) corresponding to that cache block in the valid table is set (for example, set to 1). When a cache block holds no data, the valid flag bit is invalid (for example, set to 0). The valid bit may also be cleared to invalid after the data in the cache block is written back to memory by the processor core, or set to invalid by an invalidate message from another processor core or from DMA. In this embodiment, the list unit (abbreviated as the valid table) is implemented using a register file and is independent of the SRAM (static random access memory) that implements the tag array (which may also be called the tag block array) or the data array (which may also be called the data block array). That is, the list unit is neither implemented together with the SRAM of the tag array or data array nor shares the same SRAM with them.
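The behavior of the valid table can be sketched as a small Python class. This is a software stand-in for the register file described above (class and method names are illustrative): one bit per (way, set) pair, set on first fill and cleared on write-back or on an invalidate from DMA or another core.

```python
class ValidTable:
    """One valid bit per (way, set) pair, kept apart from the tag/data
    SRAM; a plain bit array stands in for the register file here."""
    def __init__(self, num_ways, num_sets):
        self.bits = [[0] * num_sets for _ in range(num_ways)]

    def fill(self, way, index):
        self.bits[way][index] = 1   # block filled for the first time

    def invalidate(self, way, index):
        self.bits[way][index] = 0   # write-back, or invalidate message
                                    # from DMA / another processor core

    def valid_ways(self, index):
        """Ways whose block at `index` is valid (step (2) of the flow)."""
        return [w for w in range(len(self.bits)) if self.bits[w][index]]
```

On a read access, `valid_ways(index)` is exactly the set of ways whose enable bits will be driven high.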
As shown in Figure 4, when read access took place, this device at first, was checked the significance bit of each cache blocks in this buffer memory group after the index section through the memory access address navigates to certain concrete buffer memory group.For example through checking that valid shown in Figure 4 shows to judge whether the data of each cache blocks are effective.Then; The enable bit (enable/disable position) of the residing cache way of cache blocks is set according to judged result; When finding certain cache blocks invalid (invalid), the enable bit of the residing cache way of this cache blocks is set to not enable (disable) (just current potential being dragged down or place 0); On the contrary, when cache blocks effectively when (valid), the enable bit of the residing cache way of this cache blocks is set to enable (enable) (just current potential being drawn high or place 1).
Then; This device just can only read effective cache blocks when reading each cache blocks of buffer memory group; And the tag section of memory access address and the tag piece of being read compared, if any hitting (being the tag coupling), then according to the offset section of memory access address; Select (like 8bytes wherein) to write back to respective data blocks (like 32bytes).If do not hit, then send disappearance message.Like this;, each read access dynamically changes the degree of association of caching device when taking place; The cache way at invalid cache piece place will not be enabled when read operation takes place, thereby effectively reduce the dynamic power consumption that buffer storage causes owing to read operation in the read-write process.
Fig. 5 shows the read access flow of the dynamic set-associative caching device for a processor according to an embodiment of the invention. The read access flow mainly comprises the following steps:
(1) after the memory access address is calculated, locating the cache set to be accessed according to the index field of the memory access address;
(2) checking in the valid table the valid/invalid state of each cache block in the cache set to be accessed; if, when checking the valid table, the valid bits of all cache blocks are found to be invalid, a miss message is sent directly;
(3) setting the enable bit of the cache way in which each cache block resides according to its valid bit; for example, for a cache block in the invalid state, the enable bit of its cache way is set to disable, thereby masking the access to the invalid cache block; for a cache block in the valid state, the enable bit of its cache way is set to enable;
(4) reading the valid cache blocks in the cache set (comprising the tag blocks and the data blocks);
(5) comparing the tag field of the memory access address with the tag block of each cache block read; if there is a hit, selecting data from the corresponding data block according to the offset field of the memory access address and writing it back; if there is a miss, sending a miss message.
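The power-saving effect of the flow above can be quantified with a one-line sketch: under the gated flow the number of ways read equals the number of valid bits set, whereas the traditional flow always reads the full associativity. This is an illustrative model only; actual savings depend on the SRAM implementation.

```python
def ways_accessed(valid_bits):
    """For one cache set with the given per-way valid bits, return the
    number of ways read by the traditional flow vs. the gated flow."""
    traditional = len(valid_bits)   # all ways enabled on every access
    gated = sum(valid_bits)         # only ways with valid blocks enabled
    return traditional, gated
```

For example, a set in which only 2 of 4 blocks are valid reads 2 ways instead of 4, halving the per-access read activity of the SRAM blocks in that set.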
It can thus be seen that, in embodiments of the invention, during a read operation on the caching device, invalid cache blocks are pre-checked, and at access time the cache way in which an invalid cache block resides is disabled, so as to reduce the number of cache ways accessed and thereby reduce power consumption.
It should be pointed out that the set-associative caching device of the above embodiments serves an illustrative purpose only and is not limiting. That is, the caching device of the invention places no restriction on the number of cache ways and may be any multi-way set-associative cache; likewise, there is no particular restriction on the cache block size, nor on the size of the caching device itself.
Although the present invention has been described through preferred embodiments, the invention is not limited to the embodiments described here, and also covers various changes and variations made without departing from the invention.