KR101442494B1 - Control method of sequential selective word reading drowsy cache with word filter - Google Patents

Control method of sequential selective word reading drowsy cache with word filter

Info

Publication number
KR101442494B1
Authority
KR
South Korea
Prior art keywords
cache
word
hit
filter
array
Prior art date
Application number
KR1020130058371A
Other languages
Korean (ko)
Inventor
장성태
조윤교
Original Assignee
수원대학교산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 수원대학교산학협력단
Priority to KR1020130058371A
Application granted
Publication of KR101442494B1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 - Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 - Power supply means, e.g. regulation thereof
    • G06F1/32 - Means for saving power
    • G06F1/3203 - Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 - Power saving characterised by the action undertaken
    • G06F1/325 - Power saving in peripheral device
    • G06F1/3275 - Power saving in memory, e.g. RAM, cache

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The present invention relates to a method of controlling a sequential selective word reading (SWR) drowsy cache using a word filter. The cache comprises a word filter cache whose storage unit is one word, a tag array, and an SWR data array, with the sequential caching scheme and the drowsy caching scheme applied to the tag array and the SWR data array. The control method comprises: when a memory request is delivered, concurrently performing request delivery to the word filter cache, request delivery to the tag array, and wake-up signal delivery to the SWR data array; determining whether the word filter cache is hit; if not hit in the word filter cache, determining whether the tag array is hit; and if not hit in the tag array, transmitting the request to the lower-level storage device. When the SWR data array receives the wake-up signal, it determines whether it is in drowsy mode and, if so, performs a wake-up operation.


Description

CONTROL METHOD OF SEQUENTIAL SELECTIVE WORD READING DROWSY CACHE WITH WORD FILTER

The present invention relates to a method of controlling a sequential selective word reading (SWR) drowsy cache using a word filter, and to a technique that fuses the filter cache, the sequential cache, and the drowsy cache into a selective word reading cache so as to reduce both the dynamic and static power consumption of the cache while minimizing the overhead and thereby maximizing cache performance.

Cache memory was introduced to overcome the speed difference between the processor and main memory. By exploiting temporal and spatial locality, it reduces memory access time and greatly improves overall system performance. Because of this performance benefit, cache memory is increasingly used in embedded processors with limited resources. However, as caches grow in capacity and structural complexity, their share of the total chip power consumption is also increasing.

The power consumption of cache memory can be divided into static consumption and dynamic consumption. Static consumption refers to the energy dissipated by the small leakage current flowing through each transistor cell of the cache SRAM, while dynamic consumption refers to the energy consumed when the transistors switch.

Various attempts have been made to reduce the power consumption of the cache memory as follows.

A typical structure for reducing dynamic consumption is the filter cache, which reduces the dynamic power consumption of the L1 cache by placing an additional storage device, corresponding to an L0 cache, between the processor registers and the L1 cache. The filter cache is much smaller than the L1 cache, so a hit in the filter cache saves L1 dynamic power. However, because the filter cache lies in the critical path, performance degrades on a filter cache miss. In particular, for a data cache, which exhibits relatively little temporal/spatial locality, the benefit is small because the filter cache hit rate is low.

In addition, the phase access cache (sequential cache) separates accesses to the tag array and the data array to reduce dynamic power consumption. In an n-way set-associative cache, the phase cache first accesses the tag array, which has relatively low operating power, to determine the hit way, and then activates only the data array of that way. Thus, on a hit, only 1/n of the data array dynamic power is used compared with the conventional cache model. However, because the tag access lies in the critical path, as with the filter cache, performance decreases; the performance loss is the price paid for the dynamic power saving.

Meanwhile, the drowsy cache is a technique for reducing static consumption. The drowsy cache provides a normal mode, in which a normal voltage is supplied to a cache line, and a drowsy mode, in which a low voltage is supplied to the cache line so as to retain the stored data while reducing leakage power. When data in a cache line in drowsy mode is requested, the line must first be restored to the normal voltage by a wake-up operation before it can be accessed. This wake-up consumes additional cycles, which results in performance degradation.

As such, conventional techniques for reducing cache power consumption share the limitation of sacrificing performance in order to reduce dynamic and static consumption. Moreover, these techniques are not optimized to work together, so their benefits are not mutually reinforcing.

Thus, there is a need in the art to apply the filter cache, the sequential cache, and the drowsy cache, which are conventional techniques for reducing cache power consumption, together, maximizing the advantages of each technique while minimizing the overhead so as to preserve cache performance.

According to an aspect of the present invention, there is provided a method of controlling a sequential selective word reading (SWR) drowsy cache using a word filter, the cache comprising a word filter cache whose storage unit is one word, a tag array, and an SWR data array, with the sequential caching scheme and the drowsy caching scheme applied to the tag array and the SWR data array. The method comprises: when a memory request is delivered, concurrently performing request delivery to the word filter cache, request delivery to the tag array, and wake-up signal delivery to the SWR data array; determining whether the word filter cache is hit; if not hit in the word filter cache, determining whether the tag array is hit; and if not hit in the tag array, transmitting the request to the lower-level storage device. When the SWR data array receives the wake-up signal, it determines whether it is in drowsy mode and, if so, performs a wake-up operation.

The above-described means for solving the problems do not enumerate all the features of the present invention. The various features of the present invention, and the advantages and effects thereof, will be understood more fully by reference to the specific embodiments below.

By fusing the filter cache, the sequential cache, and the drowsy cache, which are conventional techniques for reducing cache power consumption, into the proposed selective word reading cache, there can be provided a control method of a sequential selective word reading drowsy cache using a word filter that reduces both the dynamic and static power consumption of the cache while minimizing the overhead and maximizing cache performance.

FIG. 1 is a diagram showing the structure of a conventional L1 reference cache;
FIG. 2 is a diagram showing a structure in which a conventional filter cache is applied to the L1 reference cache shown in FIG. 1;
FIG. 3 is a diagram showing the structure of a conventional sequential cache;
FIG. 4 is a diagram showing the structure of a conventional drowsy cache;
FIG. 5 is a diagram showing the structure of a cache implemented by combining a conventional filter cache, sequential cache, and drowsy cache;
FIG. 6 is a memory request flowchart of the cache shown in FIG. 5;
FIG. 7 is a diagram illustrating the structure of the selective word reading cache proposed by the present invention;
FIG. 8 is a diagram illustrating the structure of a cache that combines a conventional filter cache, sequential cache, and drowsy cache with the selective word reading cache shown in FIG. 7 according to the present invention;
FIG. 9 is a diagram illustrating the index transfer process for the wake-up operation of the cache shown in FIG. 8; and
FIG. 10 is a memory request flowchart of the cache shown in FIG. 8.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings, so that those skilled in the art can easily carry out the present invention. In the following description, detailed descriptions of known functions and configurations will be omitted where they might obscure the subject matter of the present invention. Like reference numerals are used throughout the drawings.

Throughout the specification, when a part is said to be 'connected' to another part, this includes not only being 'directly connected' but also being 'indirectly connected' with another element in between. Also, for a part to 'include' an element means that it may further include other elements, not that it excludes them, unless specifically stated otherwise.

Prior to describing the method of controlling a sequential selective word reading drowsy cache using a word filter according to the present invention, the conventional cache structure and conventional techniques for reducing cache power consumption will be described in detail.

FIG. 1 is a diagram showing the structure of a conventional L1 reference cache.

Referring to FIG. 1, when the address 110 of an instruction is delivered, the portion corresponding to the virtual tag is transferred to the translation lookaside buffer (TLB) 120 and converted into a physical address tag. At the same time, the portion corresponding to the index is transferred to the L1 tag array 130 and the L1 data array 140, and the converted tag is compared in the L1 tag array 130 to determine a hit. In the event of a hit, the block is transferred to a register; the longest path in this process runs through the L1 data array 140, which typically takes two cycles.

For example, assuming a 32 kB, 4-way cache with a 32 B block size in a 45 nm process technology, Table 1 shows the characteristics of the conventional L1 reference cache obtained with CACTI.

Table 1

    Total                 Access time (ns)                 0.453075
                          Dynamic energy per access (nJ)   0.047674595
                          Static power (mW)                20.1058
    Access time analysis  Tag array (ns)                   0.285265
                          Data array (ns)                  0.453075
    Energy analysis       Tag array: dynamic energy (nJ)   0.001697645
                          Tag array: static power (mW)     2.93087
                          Data array: dynamic energy (nJ)  0.04597695
                          Data array: static power (mW)    18.6582
    Area analysis         Tag array (mm²)                  0.0163746
                          Data array (mm²)                 0.158728

Assuming a 3 GHz processor, the data array access time corresponds to two cycles, which equals the cache access time.
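
For illustration only (this code is not part of the patent disclosure), a minimal C sketch of the cycle conversion used here, assuming the 3 GHz clock stated above:

```c
#include <math.h>
#include <stdio.h>

/* Convert a CACTI access time in ns to whole processor cycles. */
static int access_cycles(double access_time_ns, double clock_ghz)
{
    return (int)ceil(access_time_ns * clock_ghz);  /* round up partial cycles */
}

int main(void)
{
    /* 0.453075 ns from Table 1 at 3 GHz -> 2 cycles */
    printf("%d cycles\n", access_cycles(0.453075, 3.0));
    return 0;
}
```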

FIG. 2 is a diagram showing a structure in which a conventional filter cache is applied to the L1 reference cache shown in FIG. 1.

Referring to FIG. 2, the filter cache 250 is placed at the L0 position in the conventional L1 reference cache structure and is always accessed before the L1 cache. If the filter cache 250 hits, the memory request is resolved in one cycle, one cycle faster than the L1 reference cache. However, because the filter cache 250 lengthens the critical path, a filter cache miss is one cycle slower than the L1 reference cache. Thus, cache performance is determined by the hit ratio of the filter cache 250.
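
This dependence on the hit ratio can be expressed as a simple average-latency model; the following C sketch is an illustrative assumption on my part, not a formula from the patent:

```c
/* Average access latency with a filter cache in the critical path:
 * a hit resolves in 1 cycle; a miss spends that cycle and then pays
 * the full L1 latency on top (hence "one cycle slower" on a miss). */
static double avg_latency_cycles(double filter_hit_ratio, int l1_cycles)
{
    return filter_hit_ratio * 1.0
         + (1.0 - filter_hit_ratio) * (1.0 + (double)l1_cycles);
}
```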

In terms of performance, the L1 reference cache measures 2.2 to 21.8 depending on the application program, while the filter cache measures 1.5 to 21.5.

In terms of energy, a filter cache hit saves as much as the L1 power consumption, whereas every other case costs as much as the filter cache power consumption.

FIG. 3 is a diagram showing the structure of a conventional sequential cache.

Referring to FIG. 3, the sequential cache accesses the L1 tag array 330 first and only then accesses the L1 data array 340. The sequential cache loses one cycle on every access, but, assuming an n-way set-associative cache, a hit uses only 1/n of the dynamic energy of the L1 data array 340, and on an L1 tag array 330 miss the L1 data array 340 is not accessed at all, which is effective for energy reduction.
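
The energy behavior just described can be summarized in a small per-access model; the sketch below is illustrative only, with the 4-way figures of Table 1 usable as example inputs:

```c
/* Per-access dynamic energy of an n-way sequential cache: the tag
 * array is always read, but the data array is read only on a hit,
 * and then only the single hit way (1/n of a full parallel read). */
static double seq_access_energy_nj(int tag_hit, double e_tag_nj,
                                   double e_data_nj, int n_ways)
{
    return e_tag_nj + (tag_hit ? e_data_nj / n_ways : 0.0);
}
/* Example (Table 1, n = 4): 0.001697645 + 0.04597695 / 4 nJ on a hit. */
```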

Sequential caches are often used in lower-level memories, which are large, highly set-associative, and relatively tolerant of latency. For the L1 cache, however, the set associativity is small and the timing constraint is tight, so the sequential cache is difficult to apply without a special algorithm.

FIG. 4 is a diagram showing the structure of a conventional drowsy cache.

In the drowsy cache, a voltage control device selects between a normal voltage of 1 V and a low-power voltage of 0.3 V and supplies it to the SRAM. A set in drowsy mode (D mode), supplied with the low-power voltage, consumes only about 2% of the static energy of the normal-voltage mode (N mode). However, an additional cycle is required for the wake-up process when accessing a set in drowsy mode. Thus, the drowsy cache performs well in applications whose memory requests are concentrated on part of the whole working set.
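
A minimal C sketch of the per-line mode switching described above (the type and field names are hypothetical, for illustration only):

```c
/* Two power states of a drowsy cache line: normal mode (~1 V) is
 * directly accessible; drowsy mode (~0.3 V) retains data but must
 * first be woken, which costs one additional cycle. */
enum power_mode { MODE_NORMAL, MODE_DROWSY };

struct drowsy_line {
    enum power_mode mode;
    /* tag, data, and valid bits omitted for brevity */
};

/* Returns the number of extra cycles spent waking the line. */
static int wake_if_drowsy(struct drowsy_line *line)
{
    if (line->mode == MODE_DROWSY) {
        line->mode = MODE_NORMAL;  /* raise supply voltage to 1 V */
        return 1;                  /* one wake-up cycle */
    }
    return 0;
}
```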

Running SPEC 2000 on the drowsy cache shown in FIG. 4 yielded a performance reduction of about 5% to 10% and a static power reduction of about 85%. Considering the time delay due to the performance degradation, the real power reduction is around 80%.

FIG. 5 is a diagram showing the structure of a cache implemented by combining a conventional filter cache, a sequential cache, and a drowsy cache, and FIG. 6 is a memory request flowchart of the cache shown in FIG. 5.

Referring to FIGS. 5 and 6, when a memory request is initiated, the request is first delivered to the filter cache 550 (S610), and it is determined whether the filter cache 550 is hit (S620); if it is hit, the memory request is completed.

On the other hand, if the filter cache 550 misses, the request is passed to the L1 tag array 530 (S630), and it is determined whether the L1 tag array 530 is hit (S640); if not, the request is transmitted to the lower-level storage device (S645).

On the other hand, when hit in the L1 tag array 530, a request is delivered to the hit way of the L1 data array 540 (S650).

It is then determined whether the line of the L1 data array 540 to which the request was delivered is in drowsy mode (S660). If it is not, the memory request is completed in four cycles. If it is in drowsy mode, a wake-up operation is performed (S670) and the memory request is completed in five cycles.

As shown in FIG. 5, when the conventional filter cache, sequential cache, and drowsy cache are all combined, a large power reduction can be obtained: the filter cache filters L1 accesses, the sequential cache filters the ways of the data array, the drowsy cache reduces static power, and the sequential filter cache extends the time lines spend in drowsy mode. However, on a filter cache miss, each technique loses one cycle, for a total of three, and the cache access time increases to four or five cycles.
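
The serial flow of FIG. 6 reduces to simple cycle counting; the following C sketch restates steps S610 to S670 using the cycle figures given above (the boolean inputs are hypothetical predicates, not part of the patent):

```c
/* Serial flow of FIG. 6: filter cache, then L1 tag array, then the
 * hit way of the L1 data array, with an extra wake-up cycle if that
 * line is drowsy. Returns total cycles, or -1 on an L1 miss (the
 * request is forwarded to lower-level storage). */
static int serial_flow_cycles(int filter_hit, int tag_hit, int drowsy)
{
    if (filter_hit) return 1;   /* S620: resolved in the filter cache */
    if (!tag_hit)   return -1;  /* S645: to lower-level storage       */
    return drowsy ? 5 : 4;      /* S660/S670: +1 cycle for wake-up    */
}
```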

Accordingly, the present invention proposes a technique that combines the conventional filter cache, sequential cache, and drowsy cache while preserving their energy reduction mechanisms and minimizing the performance loss.

FIG. 7 is a diagram illustrating the structure of the selective word reading cache proposed by the present invention.

The selective word reading (SWR) cache proposed by the present invention moves the word selection, conventionally performed in the cache controller, into the cache access itself, reducing dynamic power consumption by deactivating all parts other than the necessary word.

In the selective word reading cache, access to the tag array, which is relatively small in size and low in dynamic power consumption, is performed in the same manner as in the conventional cache, while access to the data array is performed on a word basis using the upper bits of the block offset.

Comparing the structure of the cache shown in FIG. 7 with the prior art, the hardware difference lies in the data output bus portion. The data output bus is used when a read instruction hits in the cache, and this is where the power consumption gain occurs.

Specifically, when a read hit occurs, the shaded portion in FIG. 7 is deactivated, which accounts for about 25% of the dynamic power of the L1 cache and about 50% of that of the L2 cache. In general, since the cache hit ratio exceeds 90% and read accesses also make up over 90% of all memory requests, the selective word reading cache can effectively reduce dynamic power consumption.

To deactivate the shaded portion in FIG. 7, a portion of the address must accompany the memory request upon cache access: the upper two bits of the block offset for the L1 cache, and the upper one bit of the block offset for the L2 cache.
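
A minimal sketch of extracting those bits, assuming a 32 B block (5-bit offset) and 8 B words; the exact address layout is an assumption for illustration, not taken from the patent:

```c
/* The upper two bits of a 5-bit block offset select one of four 8 B
 * words in a 32 B L1 block; only that word's portion of the data
 * array output path needs to stay active. */
static unsigned l1_word_select(unsigned address)
{
    unsigned block_offset = address & 0x1Fu;  /* low 5 bits: offset in block */
    return block_offset >> 3;                 /* upper 2 bits: word index 0..3 */
}
```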

When a cache miss occurs, the data input bus is used; in this case the operation is the same as in a conventional cache, and no overhead occurs.

FIG. 8 is a diagram showing the structure of a cache that combines a conventional filter cache, sequential cache, and drowsy cache with the selective word reading cache shown in FIG. 7 according to the present invention.

When the selective word reading cache technique described above is combined with the cache of FIG. 5, which merges the conventional filter cache, sequential cache, and drowsy cache, the storage unit of the filter cache is reduced to one 8 B word, yielding the word filter cache 850 shown in FIG. 8. Accordingly, the drowsy cache scheme can be applied at fine grain, that is, the wake-up operation can be performed on a word-by-word basis.

Thus, if the storage unit of the filter cache is reduced from 32 B to 8 B, the number of entries increases by a factor of four at the same capacity, which yields a better hit ratio. Conversely, for the same number of entries, the capacity can be reduced to one quarter.

FIG. 9 is a diagram illustrating the index transfer process for the wake-up operation of the cache shown in FIG. 8.

As shown in FIG. 9, the upper two bits of the block offset are additionally transmitted to the L1 data arrays 941 and 942 to perform the wake-up operation word by word. In addition, the drowsy bit, conventionally kept one per cache line, must now be managed one per word.

As described above, when the drowsy cache technique manages the power mode at this finer granularity, the fraction of the cache kept in drowsy mode is higher than under the conventional per-block management, further reducing power consumption.
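
A sketch of the per-word drowsy bookkeeping this implies (the names and sizes are illustrative assumptions: four 8 B words per 32 B line):

```c
/* Per-word drowsy bits instead of the conventional one bit per line:
 * each word of a line can be in drowsy mode independently. */
#define WORDS_PER_LINE 4

struct swr_data_line {
    unsigned char drowsy[WORDS_PER_LINE];  /* 1 = word is in drowsy mode */
};

/* Wake only the addressed word; returns 1 if a wake-up occurred. */
static int wake_word(struct swr_data_line *line, unsigned word_idx)
{
    if (line->drowsy[word_idx]) {
        line->drowsy[word_idx] = 0;  /* raise only this word to normal mode */
        return 1;
    }
    return 0;
}
```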

FIG. 10 is a memory request flowchart of the cache shown in FIG. 8.

Referring to FIGS. 8 and 10, when a memory request is initiated, request delivery to the word filter cache 850 (S1010), request delivery to the L1 tag array 830 (S1020), and wake-up signal delivery to the L1 SWR data array 840 (S1030) are performed concurrently.

It is determined whether the word filter cache 850 is hit (S1011); if so, the memory request is completed in one cycle.

On the other hand, if the word filter cache 850 misses, it is determined whether the L1 tag array 830 is hit (S1021); if not, the request is transmitted to the lower-level storage device (S1023).

On the other hand, when the L1 tag array 830 hits, the request is delivered to the hit word of the hit way of the L1 SWR data array 840 (S1022), and the memory request is completed in three cycles.

In step S1031, it is determined whether the word of the L1 SWR data array 840 to which the wake-up signal was delivered is in drowsy mode. If it is not, the process proceeds to step S1021; if it is, a wake-up operation is performed (S1032) before proceeding to step S1021. Note that even after the corresponding line of the L1 SWR data array 840 has been woken, the request may already have hit in the word filter cache 850, or may miss in the L1 tag array 830, in which case the wake-up turns out to be unnecessary.

As such, according to the present invention, request delivery to the word filter cache 850, request delivery to the L1 tag array 830, and wake-up signal delivery to the L1 SWR data array 840 are performed concurrently, allowing the word filter cache 850, the L1 tag array 830, and the L1 SWR data array 840 to operate in parallel. This eliminates the consumption of additional cycles even though the filter cache, the sequential cache, and the drowsy cache are all combined.
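
The parallel flow of FIG. 10 then reduces to the following cycle counting (an illustrative sketch; the boolean inputs are hypothetical predicates):

```c
/* Parallel flow of FIG. 10: the word filter lookup, the L1 tag
 * lookup, and the wake-up signal are all issued in the same cycle,
 * so the wake-up (S1031/S1032) is hidden behind the lookups rather
 * than adding a cycle as in the serial flow of FIG. 6. */
static int parallel_flow_cycles(int filter_hit, int tag_hit)
{
    if (filter_hit) return 1;   /* S1011: word filter cache hit   */
    if (!tag_hit)   return -1;  /* S1023: to lower-level storage  */
    return 3;                   /* S1022: hit word of the hit way */
}
```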

By fusing the conventional filter cache, sequential cache, and drowsy cache into the selective word reading cache as described above, an average dynamic energy reduction of 33.28% can be achieved. In addition, the static power reduction of the drowsy cache is obtained, and managing the drowsy state at word granularity yields a further benefit.

This results in a 73.4% dynamic energy reduction in the L1 cache, an 83.2% static energy reduction, and a total energy saving of 71.7%.

The present invention is not limited to the above-described embodiments and the accompanying drawings. It will be apparent to those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

110, 210, 310, 410, 510, 810: Instruction address
120, 220, 320, 420, 520, 820: TLB
130, 230, 330, 430, 530, 830: L1 tag array
140, 240, 340, 440, 540: L1 data array
250, 550: filter cache
840: L1 SWR data array
850: Word Filter Cache

Claims (5)

A method of controlling a sequential selective word reading drowsy cache using a word filter, the cache comprising a word filter cache whose storage unit is one word, a tag array, and a selective word reading (SWR) data array, wherein a sequential caching scheme and a drowsy caching scheme are applied to the tag array and the SWR data array, the method comprising:
Concurrently performing a transfer of a request to the word filter cache, a transfer of a request to the tag array, and a wake-up signal transfer to the SWR data array when a memory request is delivered;
Determining whether the word filter cache has been hit;
Determining, if not hit in the word filter cache, whether the tag is hit in the tag array; And
And forwarding the request to the lower-level storage device if it is not hit in the tag array,
Wherein the SWR data array, upon receiving the wake-up signal, determines whether it is in drowsy mode and performs a wake-up operation if it is.
The method according to claim 1,
Further comprising forwarding the request to the hit word of the hit way of the SWR data array when hit in the tag array.
The method according to claim 1,
Wherein the SWR data array performs the wake-up operation on a word-by-word basis, so that each word is selectively accessible.
The method of claim 3,
Wherein the upper two bits of the block offset are transferred to the SWR data array to perform the wake-up operation on a word-by-word basis.
The method according to claim 1,
Wherein, after the SWR data array performs the wake-up operation, the request may hit in the word filter cache or miss in the tag array.
KR1020130058371A 2013-05-23 2013-05-23 Control method of sequential selective word reading drowsy cache with word filter KR101442494B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020130058371A KR101442494B1 (en) 2013-05-23 2013-05-23 Control method of sequential selective word reading drowsy cache with word filter

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020130058371A KR101442494B1 (en) 2013-05-23 2013-05-23 Control method of sequential selective word reading drowsy cache with word filter

Publications (1)

Publication Number Publication Date
KR101442494B1 true KR101442494B1 (en) 2014-09-26

Family

ID=51760643

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020130058371A KR101442494B1 (en) 2013-05-23 2013-05-23 Control method of sequential selective word reading drowsy cache with word filter

Country Status (1)

Country Link
KR (1) KR101442494B1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20050095107A (en) * 2004-03-25 2005-09-29 삼성전자주식회사 Cache device and cache control method reducing power consumption
KR20120063312A (en) * 2010-12-07 2012-06-15 전남대학교산학협력단 Processor system having data filter cache and modified victim cache and driving method thereof
KR20120101761A (en) * 2011-03-07 2012-09-17 삼성전자주식회사 Cache phase detector and processor core

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101831226B1 (en) * 2015-11-09 2018-02-23 경북대학교 산학협력단 Apparatus for controlling cache using next-generation memory and method thereof

Similar Documents

Publication Publication Date Title
US9274592B2 (en) Technique for preserving cached information during a low power mode
TWI454909B (en) Memory device, method and system to reduce the power consumption of a memory device
US20110113198A1 (en) Selective searching in shared cache
US20070011421A1 (en) Method and system for decreasing power consumption in memory arrays having usage-driven power management
EP3602310B1 (en) Power-conserving cache memory usage
US20120284475A1 (en) Memory On-Demand, Managing Power In Memory
US20180336143A1 (en) Concurrent cache memory access
US11221665B2 (en) Static power reduction in caches using deterministic naps
US9990293B2 (en) Energy-efficient dynamic dram cache sizing via selective refresh of a cache in a dram
US9767041B2 (en) Managing sectored cache
US8484418B2 (en) Methods and apparatuses for idle-prioritized memory ranks
US9396122B2 (en) Cache allocation scheme optimized for browsing applications
JP5791529B2 (en) MEMORY CONTROL DEVICE, CONTROL METHOD, AND INFORMATION PROCESSING DEVICE
EP2808758B1 (en) Reduced Power Mode of a Cache Unit
KR101442494B1 (en) Control method of sequential selective word reading drowsy cache with word filter
Hameed et al. Architecting STT last-level-cache for performance and energy improvement
US20180074964A1 (en) Power aware hash function for cache memory mapping
Ryoo et al. i-mirror: A software managed die-stacked dram-based memory subsystem
US20140156941A1 (en) Tracking Non-Native Content in Caches
US20030145171A1 (en) Simplified cache hierarchy by using multiple tags and entries into a large subdivided array
He et al. Optimizing energy in a DRAM based hybrid cache
DeMara et al. Non-volatile memory trends: Toward improving density and energy profiles across the system stack
He et al. Tcache: An energy-efficient dram cache design
DeMara et al. SPOTLIGHT ON TRANSACTIONS
US20130124822A1 (en) Central processing unit (cpu) architecture and hybrid memory storage system

Legal Events

Date Code Title Description
GRNT Written decision to grant
FPAY Annual fee payment

Payment date: 20181025

Year of fee payment: 5