WO2006038258A1 - Data processor - Google Patents
Data processor
- Publication number
- WO2006038258A1 (PCT/JP2004/014353)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- address translation
- address
- way
- data
- cache
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0864—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
- G06F12/1027—Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
- G06F12/1045—Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] associated with a data cache
- G06F12/1054—Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] associated with a data cache the data cache being concurrently physically addressed
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present invention relates to a data processor having a cache memory and an address translation buffer.
- In a cache memory, the mapping methods for associating data in an external memory with data in the cache memory in units of blocks of a certain size include the direct mapping method, the set associative method, and the fully associative method.
- If the block size is B bytes and the number of blocks in the cache memory is c, the number m of the block containing the byte at address a in the external memory is the integer part of a/B.
- In the direct mapping method, the block of the external memory with number m is uniquely mapped to the block numbered m mod c in the cache memory.
- In the fully associative method, any block in the external memory can be mapped to any block in the cache memory.
- However, all blocks in the cache memory must then be associatively searched on each access, which is difficult to achieve with a practical cache capacity. Therefore, in practice, the set associative method, which lies between the two, is generally used.
- The set associative method divides the cache into n blocks (ways), applying direct mapping to select a set and fully associative mapping among the n ways within the set, thereby taking advantage of both; it is called an n-way set associative method according to the value of n.
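As an illustrative sketch (not part of the patent disclosure), the mapping arithmetic described above can be written out directly; the function names are assumptions for illustration.

```python
# Block-mapping arithmetic from the description above.
# B: block size in bytes, c: number of blocks in the cache.

def block_number(a: int, B: int) -> int:
    """Number m of the block containing the byte at address a: integer part of a/B."""
    return a // B

def direct_mapped_block(m: int, c: int) -> int:
    """Direct mapping: external block m maps uniquely to cache block m mod c."""
    return m % c

def set_assoc_set(m: int, num_sets: int) -> int:
    """n-way set associative: m selects a set; any of the n ways in that set may hold it."""
    return m % num_sets
```

For example, with 32-byte blocks, byte address 100 lies in external block 3, which a direct-mapped cache of 8 blocks places in cache block 3.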
- For example, in a 4-way set associative cache, the tag, valid bit, and data of the cache line of each of the four ways indexed by the index bits of the virtual address are read.
- In a physical address tag cache, which is the practical cache method, the physical address obtained by translating the virtual address through the address translation buffer (TLB) is compared with the tag of each way. A way whose tag matches and whose valid bit is 1 is a cache hit.
- The data requested by the CPU can be supplied by selecting the data from the data array of the way that hit.
- A cache miss occurs when no way hits. In this case, valid data must be obtained by accessing the lower-level cache memory or the external memory.
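The conventional lookup just described can be sketched as follows (an illustrative model, not the patent's implementation; the data layout is an assumption). Note that every way's tag and data are read on every access, which is the power cost the invention later targets.

```python
# Conventional physically tagged set-associative lookup: all ways are read
# "in parallel" (modeled here as a loop), then the hit way is selected.

def lookup(ways, index, phys_tag):
    """ways: list of dicts mapping index -> (tag, valid, data).
    Returns the data on a cache hit, or None on a cache miss."""
    hit_data = None
    for way in ways:                      # every way's tag+data array is accessed
        tag, valid, data = way[index]
        if valid and tag == phys_tag:     # matching tag with valid bit 1 -> hit
            hit_data = data
    return hit_data                       # None: miss; fetch from the lower level
```

A miss (no matching valid tag) returns None, signalling that the lower-level memory must be accessed.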
- the concept of full associative, set associative, and direct mapping can be applied to the TLB configuration independently of the cache.
- Patent Document 1 was obtained in a prior art search after completion of the present invention.
- Patent Document 1 describes an invention for efficiently performing TLB hit judgment and cache hit judgment in a microprocessor having a TLB and a cache memory.
- a TLB/cache that serves as both TLB and cache memory is arranged; when the virtual address is translated to a physical address, the TLB/cache is indexed by the virtual address, the tag is read, the tag bits are compared, and a cache hit signal is generated from the comparison result signal and the valid flag CV.
- This technology is characterized in that the judgment of the cache hit and the judgment of the TLB hit are performed together in one comparison operation, and a direct map is shown as an example.
- In that configuration, the data for one cache line must equal the page size, which is the address translation unit, so the read/write unit of a cache line per index becomes 1 kilobyte or 4 kilobytes, dozens of times larger than a normal line size such as 32 bytes.
- Patent Document 1 Japanese Unexamined Patent Publication No. 2003-196157
- The present inventor examined the power consumption of the set associative cache memory. For example, in the case of a 4-way set associative cache memory, a 4-way tag read and a cache hit determination are required each time a memory access occurs. The data of the four ways is read out at the same time, and the data of the way that hit is selected by the cache hit determination signal. For this reason, the present inventor found that a read operation must be performed for all four ways of the tag memory and all four ways of the data memory, so that the power consumption is large.
- The focus of Patent Document 1 is to use the physical page number of the TLB and the cache tag jointly in order to perform the TLB hit judgment and the cache hit judgment efficiently.
- An object of the present invention is to reduce power consumption by the set associative cache memory in a data processor having a set associative cache memory and an address translation buffer.
- Each way in the set associative cache memory, which has as many ways as the number of entries in the TLB, is provided with a data part whose storage capacity corresponds to the page size that is the unit of address translation by the TLB, and no tag memory or tag is provided as an address part.
- Each entry in the TLB and each way in the cache memory have a one-to-one correspondence, and only the data in the area mapped to the physical address specified by the TLB address translation pair can be cached in the corresponding way.
- Only one way of cache data array operation is selected by the TLB hit signal obtained by the logical product of the TLB virtual page address comparison result and the TLB valid bit.
- the cache valid bit of the way selected for operation is used as the cache hit signal.
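The means above can be summarized in a minimal behavioral model (an illustrative sketch under assumed structure, not the disclosed circuit): each TLB entry corresponds one-to-one to a tag-less cache way, only the way whose TLB entry hits is activated, and that way's valid bit serves directly as the cache hit signal.

```python
# Behavioral sketch of the TLB-coupled set-associative cache: 8 entries/ways,
# 4 KB pages, 32-byte lines (hence index bits [11:5], 128 lines per way).
PAGE_BITS = 12

class TLBCoupledCache:
    def __init__(self, n_entries=8):
        self.vpn = [None] * n_entries           # virtual page numbers (VPN)
        self.valid_entry = [False] * n_entries  # TLB entry valid bits
        self.ways = [dict() for _ in range(n_entries)]  # index -> (valid, data)

    def access(self, vaddr):
        vpage = vaddr >> PAGE_BITS
        hits = [i for i in range(len(self.vpn))
                if self.valid_entry[i] and self.vpn[i] == vpage]
        if not hits:
            return 'tlb_miss'                   # replace an entry, invalidate its way
        i = hits[0]                             # only this one way is activated
        index = (vaddr >> 5) & 0x7F             # bits [11:5] select the cache line
        valid, data = self.ways[i].get(index, (False, None))
        return data if valid else 'cache_miss'  # the way's valid bit = cache hit
```

No tag comparison appears anywhere: the one-to-one entry/way correspondence guarantees that whatever is in the activated way belongs to the translated page.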
- the present invention will be further described below by dividing it into a plurality of items.
- the data processor has an address translation buffer and a set-associative cache memory, and the address translation buffer has n entry fields for storing address translation pairs,
- the cache memory has n ways one-to-one corresponding to the entry field, and each way includes a data field having a page size storage capacity as an address conversion unit.
- the address translation buffer outputs an associative comparison result for each entry field to a corresponding way, and the way starts a memory operation in response to an associative hit of the input associative comparison result. According to the above means, only one way corresponding to the associative hit of TLB is activated, so that the tag array and the data array of all the ways are read in parallel in the set associative cache memory and operated. This can be avoided and can contribute to low power consumption.
- the address translation pair has information that makes a pair of a virtual page address and a physical page address corresponding to the virtual page address, and the physical page address of the data held in the data field is Equal to the physical page address held by the address translation pair of the corresponding entry field.
- the cache memory does not need to have an address tag field paired with the data field.
- The address translation buffer compares the input translation target address with the virtual page address of each entry field and, on the condition that the entry field whose comparison result matches is valid, notifies a way hit to the way corresponding to that entry field; this way hit notification indicates an associative hit of the associative comparison result.
- A control unit (2, 24) is provided for replacing an entry of the address translation buffer when all the associative comparison results by the address translation buffer are associative misses.
- When the control unit replaces an entry of the address translation buffer, it invalidates the data field of the cache memory way corresponding to the replaced entry.
- When invalidating the data field of the cache memory way corresponding to the entry to be replaced, the control unit further checks whether the data field holds data to be copied back, and if so, writes it back to the lower-level memory.
- a data processor includes an address translation buffer and a set-associative cache memory, and the address translation buffer includes n entry fields for storing address translation pairs.
- the cache memory has n ways one-to-one corresponding to the entry field, and each way is assigned to store data of a physical page address held in the corresponding entry field. The way starts the memory operation on condition that the associative comparison result for the corresponding entry field becomes an associative hit. Therefore, it is possible to avoid reading the tag array and data array of all the ways in parallel in the set associative cache memory, thereby contributing to low power consumption.
- a control unit that replaces an entry in the address translation buffer when the associative comparison result for all entry fields is an associative miss.
- When replacing an entry, the control unit invalidates the cache data of the cache memory way corresponding to the entry to be replaced. Further, when invalidating the data of that way, the control unit checks whether there is data to be copied back, and if so, writes it back to the lower-level memory.
- A data processor includes an address translation buffer and a set associative cache memory; the address translation buffer has n entry fields for storing address translation pairs and a prediction circuit that predicts the entry field that will become the translation hit of the address translation, and the cache memory has n ways in one-to-one correspondence with the entry fields, each way being allocated to store the data located at the physical page address held by the corresponding entry field. A way starts the memory operation on the condition that its corresponding entry field is predicted to be the address translation hit, and the cache memory generates a cache hit on the condition that the prediction of the address translation hit matches the actual address translation result.
- In the control mode that activates the corresponding way in response to an associative hit in the TLB, one way starts operating only after the TLB associative search result is output, so the time required for the index operation of the cache memory becomes longer than in a control mode in which the cache memory is indexed in parallel with the TLB associative search. The prediction circuit is provided to avoid this penalty.
- A data processor includes an address translation buffer and a set associative cache memory having a plurality of ways; the address translation buffer stores an address translation pair that holds virtual page address information and physical page address information, the tag of the cache memory is shared with the physical page address information held by the address translation pair of the address translation buffer, and the operation of the corresponding cache way is selected according to the hit signal of the address translation buffer.
- a data processor includes an address translation buffer and a set associative cache memory having a plurality of ways, the address translation buffer including virtual page address information and The address translation pair having physical page address information is stored, and the physical address space data specified by the physical page address information held by the translation pair of the address translation buffer is stored in the corresponding way of the cache memory. The operation of the corresponding way is selected according to the way hit signal of the address translation buffer.
- A data processor using a prediction circuit includes an address translation buffer and a set associative cache memory having a plurality of ways; the address translation buffer includes an address translation pair that holds virtual page address information and physical page address information, and a prediction circuit that predicts a translation hit of the address translation buffer. The address translation pair of the address translation buffer is shared with the tag of the cache memory as the physical page address information, the operation of the corresponding cache way is selected according to the prediction by the prediction circuit, and a cache hit is generated on the condition that the prediction matches the actual address translation result.
- A data processor using a prediction circuit includes an address translation buffer and a set associative cache memory having a plurality of ways; the address translation buffer includes an address translation pair that holds virtual page address information and physical page address information, and a prediction circuit that predicts a translation hit of the address translation buffer. The data of the physical address space specified by the physical page address information held by the translation pair of the address translation buffer is stored in the corresponding way of the cache memory, the operation of the corresponding cache way is selected according to the prediction by the prediction circuit, and a cache hit is generated on the condition that the prediction matches the actual address translation result.
- In the data processor having the set associative cache memory and the address translation buffer, the power consumption of the set associative cache memory can be reduced.
- FIG. 1 is a block diagram showing a detailed example of ITLB and ICACHE.
- FIG. 2 is a block diagram of a data processor according to an example of the present invention.
- FIG. 3 is an address map illustrating the relationship between main memory data and cache memory data in a configuration in which the address translation buffer and cache memory are linked and operated, as typified by FIG. 1.
- FIG. 4 is a flowchart showing the operation flow of ITLB and ICACHE.
- FIG. 5 is a flowchart showing a TLB rewrite control flow.
- FIG. 6 is a flowchart showing a cache rewrite control flow.
- FIG. 7 is a block diagram showing a detailed example of ICACHE and ITLB using the prediction result of address translation hits.
- FIG. 8 is a block diagram showing, as a comparative example, a cache memory that indexes all ways in parallel.
- FIG. 9 is an address map illustrating the relationship between the data in the cache memory in FIG. 8 and the data in the main memory.
- VPN virtual page address (entry field)
- PPN physical page address (entry field)
- FIG. 2 shows a data processor according to an example of the present invention.
- the data processor (MPU) 1 shown in the figure is not particularly limited, but is formed on a single semiconductor substrate (semiconductor chip) such as single crystal silicon by a known semiconductor integrated circuit manufacturing technique.
- the data processor 1 includes, for example, a central processing unit (CPU) 2 as a data processing unit.
- The central processing unit 2 is connected to an internal bus (IBUS) 4 via an address translation buffer and cache unit (TLB·CACHE) 3.
- The internal bus 4 employs a split-transaction bus protocol.
- the internal bus 4 is connected to a bus controller (BSC) 5 that performs external bus control or external memory interface control.
- a bus controller 5 is connected to a main memory (MMRY) 6 composed of synchronous DRAM or the like.
- The external circuit connected to the bus controller is not limited to a memory; the configuration may connect other LSIs (e.g., an LCDC or peripheral circuits).
- A peripheral bus (PBUS) 8 is connected to the internal bus 4 via a bus bridge circuit (BBRG) 7.
- Peripheral circuits such as an interrupt controller (INTC) 10 and a clock pulse generator (CPG) 11 are connected to the peripheral bus 8.
- A direct memory access controller (DMAC) 12 is connected to the peripheral bus 8 and the internal bus 4 and controls data transfer between modules.
- The CPU 2 includes, but is not limited to, an execution unit that includes general-purpose registers and an arithmetic logic unit, and an instruction control unit that includes a program counter, an instruction decoder, etc., and controls instruction fetch, instruction decoding, instruction execution procedures, and arithmetic control.
- the address translation buffer and cache unit 3 includes an instruction address translation buffer (IT LB) 20, an instruction cache memory (ICACHE) 21, a data address translation buffer (DTLB) 22, a data cache memory (DCACHE) 23, and a control It has a circuit 24.
- The ITLB 20 has, as translation pairs, information on pairs of a virtual instruction address and the physical instruction address corresponding to the virtual instruction address.
- The DTLB 22 has, as translation pairs, information on pairs of a virtual data address and the physical data address corresponding to the virtual data address. These translation pairs are a copy of part of the page management information on the main memory 6.
- The ICACHE 21 holds a copy of some of the instructions of the program held in the program area on the main memory.
- The DCACHE 23 holds a copy of part of the data held in the work area on the main memory.
- When performing an instruction fetch, the CPU 2 asserts an instruction fetch signal 25 to the ITLB 20 and ICACHE 21 and outputs a virtual instruction address 26. The ITLB 20 outputs a virtual address translation hit signal 27 to the ICACHE 21 when there is a translation hit for the virtual address. The ICACHE 21 outputs the instruction 28 corresponding to the virtual instruction address to the CPU 2.
- When performing a data fetch, the CPU 2 asserts a data fetch signal 30 to the DTLB 22 and DCACHE 23 and outputs a virtual data address 31.
- the DTLB 22 outputs a virtual address translation hit signal 32 to the DCACHE 23 when there is a translation hit for the virtual address.
- In the case of a read access, the DCACHE 23 outputs the data 33 corresponding to the virtual data address to the CPU 2; in the case of a write access, the DCACHE 23 writes the data 33 from the CPU 2 to the cache line corresponding to the virtual data address.
- the control circuit 24 performs control such as notifying the CPU 2 of a TLB exception handling request in response to the occurrence of a conversion miss in the ITLB 20 and DTLB 22. Further, the control circuit 24 performs replacement control of a cache entry in response to occurrence of a cache miss in the ICACHE 21 and DCACHE 23.
- The address translation buffer and cache unit 3 exchanges the physical instruction address 40 (output), the instruction 41 (input), the data address 42 (output), the data 43 (input/output), and so on with the internal bus 4.
- Figure 1 shows a detailed example of ITLB and ICACHE.
- The ITLB 20 has, for example, an 8-entry fully associative configuration, and the ICACHE 21 has, for example, an 8-way set associative configuration.
- Two entries, ETY0 and ETY7, are representatively shown in the ITLB 20.
- An entry could also be called a way, but here it is called an entry to distinguish it from a way of the cache memory.
- Each entry has an entry field for holding a virtual page address (VPN), a valid bit (V) of the entry, and a physical page address (PPN).
- VPN and PPN constitute a conversion pair.
- The page size, which is the unit of address translation by the ITLB 20, is, for example, 4 kilobytes, and the virtual address space is a 32-bit address space.
- Accordingly, the bit width of the VPN and PPN is the 20 bits from the 13th bit to the 32nd bit ([31:12]).
- CMP is a comparison means, and AND is a logical AND gate.
- a memory cell having a comparison function in units of bits can be employed in a memory having a fully associative configuration.
- the comparison function and the logical product function may be assigned to the memory cells in units of bits.
- The virtual page address [31:12] is compared with the VPN ([31:12]) by the comparison means CMP; when they match and the valid bit V is 1, the entry translation hit signal 50[0] of the entry ETY0 becomes the logical value 1, which means a hit.
- A TLB multi-hit state, in which two or more of the entry translation hit signals 50[7:0] from the entries become the logical value 1 at the same time, does not normally occur. When a TLB multi-hit state is generated, it is detected and a multi-hit exception handling request is notified to the CPU 2 to deal with it.
- A logical OR circuit (OR) 51 takes the logical OR of the eight signals 50[7:0] to generate a translation hit signal 53.
- The control circuit 24 receives the translation hit signal 53 and generates a TLB miss exception request to the CPU 2 when a TLB miss is indicated.
- The PPN of one entry is selected by the selector 52 according to the entry translation hit signal 50[7:0] and output as the physical page address. This physical page address is output to the internal bus 4 as the physical page address constituting the physical address 40 in FIG. 2.
- The entry translation hit signal 50[7:0] is ANDed with the instruction fetch signal 25 in the AND gate 54, and the resulting virtual address translation hit signal 27[7:0] is supplied to the instruction cache memory 21.
- The instruction cache memory 21 has eight ways, WAY0 to WAY7.
- When designating all or any one of the ways WAY0 to WAY7, they are simply written as way WAY.
- Each way WAY0-WAY7 has a data field DAT and a valid bit field V.
- the cache capacity of the data field of each way WAY matches the page size and is 4 kilobytes.
- The cache line size of the data field DAT is, for example, 32 bytes, and the lower bits [11:5] of the virtual address are given to the instruction cache memory 21 as the index address 60.
- The lowest bits [4:0] of the virtual address form the in-line offset address 61 and are used to select the byte position within the 32 bytes of one line.
- the selector 63 is used for the selection.
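The address split described above (4 KB page, 32-byte lines, 128 lines per way) can be sketched with simple bit-slice helpers; the function names are illustrative assumptions.

```python
# Address bit-slicing for the embodiment: index = vaddr[11:5], offset = vaddr[4:0].

def index_bits(vaddr: int) -> int:
    """Index address 60: selects one of the 128 cache lines of the activated way."""
    return (vaddr >> 5) & 0x7F

def offset_bits(vaddr: int) -> int:
    """In-line offset address 61: byte position within the 32-byte line."""
    return vaddr & 0x1F
```

Both slices come from the page-offset part of the virtual address ([11:0]), so they need no address translation and can be applied as soon as the way is activated.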
- each way WAY0-WAY7 is selected for memory operation when the corresponding virtual address translation hit signal 27 [7: 0] is a translation hit.
- The way WAY whose memory operation has been selected can be addressed by the index address and so on, so that a memory cell is selected and the storage information of the selected memory cell can be read, or information can be written into the selected memory cell. Therefore, even when there is an instruction access request, a way WAY does not start operating unless the corresponding virtual address translation hit signal 27[7:0] indicates a hit.
- Since the virtual address translation hit signal 27[7:0] is a translation hit signal for each virtual page, at most one of its bits becomes the logical value 1 (the translation hit value), so only one way can operate at a time.
- only one way WAY corresponding to the virtual page related to the address translation hit by TLB is operated. All ways are not operated in parallel. Thereby, useless power consumption can be suppressed.
- the cache line corresponding to the index address 60 is selected from the data field DAT and the valid bit field V, and the data and the valid bit are read out.
- the read data is selected by the selector 63 by the offset address 61.
- The data output from the selector 63 and the valid bit read from the way are selected and output by the selector 64, which performs the selection operation according to the virtual address translation hit signal 27[7:0].
- The valid bit selected by the selector 64 is supplied to the control circuit 24.
- The control circuit 24 regards the valid bit as the cache hit signal 65; if it is a cache hit (if the valid bit has the logical value indicating validity), the data selected by the selector 64 is supplied to the CPU 2 as the cache data 28.
- If it is a cache miss, the main memory 6 is accessed via the bus controller 5 to fetch the corresponding instruction into the cache line, and the fetched instruction is supplied to the CPU 2.
- Although the description has been given for the instruction-side ITLB and ICACHE, the data-side DTLB and DCACHE can be configured similarly.
- For data, there is no need to perform operations different from a conventional cache memory, except that a write access also selects the way in the same manner.
- cache memory operations are required in connection with TLB misses.
- FIG. 3 illustrates the relationship between the data in the main memory and the data in the cache memory in a configuration in which the address translation buffer and the cache memory are linked and operated, as typified by FIG. 1.
- For simplicity, the PPN is 2 bits and the in-page address is 3 bits.
- The cache memory way has 8 cache lines, and the index address Aidx is 3 bits.
- If the PPN of the TLB entry corresponding to way WAY0 is the page number 00 and the PPN of the TLB entry corresponding to way WAY1 is the page number 10, the cache memory way WAY0 can store the range RNG0 of the main memory from 00000 to 00111, and the way WAY1 can store the range RNG1 of the main memory from 10000 to 10111.
- the activation of the memory operation can be determined for each way of the cache memory by the virtual address translation hit signal for each entry in the TLB.
- Data is registered in the cache memory in line-size units, and each line has a valid bit. When valid data is registered in the cache, the valid bit is set to the logical value 1, indicating that the data is valid.
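The simplified FIG. 3 example (2-bit PPN, 3-bit in-page address) reduces to a single range check; this sketch and its helper name are illustrative, not from the patent.

```python
# FIG. 3's simplified mapping: a way may hold only addresses inside the page
# that its one-to-one TLB entry points to (upper 2 bits of a 5-bit address).

def in_way_range(addr5: int, ppn2: int) -> bool:
    """addr5: 5-bit physical address; ppn2: 2-bit PPN of the way's TLB entry."""
    return (addr5 >> 3) == ppn2
```

With WAY0's entry holding PPN 00 and WAY1's holding PPN 10, WAY0 covers 00000-00111 (range RNG0) and WAY1 covers 10000-10111 (range RNG1), matching the figure.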
- FIG. 4 illustrates the operation flow of the ITLB and ICACHE.
- The upper bits [31:12] of the instruction virtual address issued from the CPU 2 are compared with the VPN of each entry in the instruction TLB, and the logical product of each comparison result and the valid bit of each entry is taken to generate the virtual address translation hit signal 27[7:0] (S1). It is then determined how many logical 1s are present in the virtual address translation hit signal 27[7:0] (S2). If there are two or more, the TLB multi-hit state is notified to the CPU 2 (S3). If there is exactly one logical 1, the memory operation of the way related to the hit is selected, and the indexed data and valid bit are read from that way (S4). The logical value of the read valid bit is determined (S5).
- If valid, the read data is supplied to the CPU 2 (S6). If not valid, a cache line fill operation for the cache miss is performed by the cache rewrite control (S7). If all the bits are the logical value 0 in step S2, it is a TLB miss; a TLB miss exception handling request for adding or replacing a TLB entry is issued to the CPU 2, and the TLB rewrite control is performed (S8). At this time, the control unit 24 rewrites all the valid bits of the cache memory way corresponding to the rewritten TLB entry to the invalid level (S9). The flow then returns to comparing the virtual page address VPN of each TLB entry (S1).
- In other words, when the control circuit 24 invalidates the data field of the cache memory way corresponding to the entry to be replaced (S9), if there is data in the data field to be copied back, it is written back to the main memory.
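The S1-S9 decision flow of FIG. 4 can be sketched as follows (an illustrative model; the function names and return markers are assumptions, not the patent's signals).

```python
# Decision flow of FIG. 4: count translation hits, then act on the single
# selected way's valid bit.

def icache_access(hit_bits, read_way):
    """hit_bits: the eight 0/1 bits of signal 27[7:0];
    read_way(i) -> (valid, data) reads the indexed line of way i."""
    ones = [i for i, b in enumerate(hit_bits) if b]   # S2: count logical 1s
    if len(ones) >= 2:
        return 'tlb_multi_hit'                        # S3: notify the CPU 2
    if not ones:
        return 'tlb_miss'                             # S8/S9: TLB rewrite, way invalidated
    valid, data = read_way(ones[0])                   # S4: read only the hit way
    return data if valid else 'cache_miss'            # S5 -> S6 (supply) or S7 (line fill)
```

Note that at most one way is ever read: the multi-hit and miss cases are resolved before any data array is touched.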
- FIG. 5 illustrates a TLB rewrite control flow.
- The rewrite control flow differs depending on whether a lower-level TLB exists in the data processor (S11). If a lower-level TLB exists, the lower-level TLB is searched (S12). It is determined whether the searched lower-level TLB is a translation hit (TLB hit) for the virtual page address related to the TLB miss (S13). In the case of a TLB hit, the VPN and PPN of the translation pair of the lower-level TLB are registered in the TLB entry related to the miss (S14).
- In step S13, if the lower-level TLB also misses (a lower-level TLB exists but it too has a TLB miss), the TLB miss is notified to the CPU, and under software control the page management information managed in the main memory is registered in both the upper- and lower-level TLBs (VPN, PPN) related to the miss and validated (S15).
- If there is no lower-level TLB, a TLB miss exception is notified to the CPU, and under software control the page management information managed in the main memory 6 is registered in the TLB (VPN, PPN) related to the miss and validated.
- FIG. 6 illustrates a cache rewrite control flow. If the TLB hits but the valid bit of the corresponding cache way is the logical value 0 (invalid level), it is a cache miss. At this time, the cache rewrite control is performed as described in step S7 of FIG. 4. A cache rewrite is an update of only the one line that caused the cache miss.
- The control differs depending on whether or not a lower-level cache memory exists in the data processor (S21). If there is a lower-level cache memory, the lower-level cache memory is searched (S22). If the lower-level cache memory hits, the cache data related to the hit is registered in the upper-level cache memory, and the valid bit is set to the logical value 1 (S24). If there is a lower-level cache but a cache miss occurs there, the cache controller is notified of the cache miss and the main memory 6 is accessed. As a result, the data acquired from the main memory 6 is registered in both the upper- and lower-level cache memories, and the valid bit is set to the logical value 1 (S25).
- FIG. 8 shows, as a comparative example, a cache memory in which all ways are indexed in parallel.
- ICACHE has an address tag field TAG.
- The ICACHE selects the operation of all the ways WAY0 to WAY7, and the index operation is started for all of them.
- The tag of each indexed cache line is compared with the physical page address supplied from the ITLB, and the cache data of the matching way is the data related to the cache hit.
- FIG. 9 illustrates the relationship between the cache memory data of FIG. 8 and the main memory data.
- For simplicity, the PPN is 2 bits and the in-page address is 3 bits.
- The cache memory way has 8 cache lines, and the index address Aidx is 3 bits.
- in the data processor 1, the memory operation is started only for the cache way corresponding to the address translation hit signal generated for each TLB entry, represented by the virtual address translation hit signal 27[7:0]; not all cache ways start the index operation in parallel. Since the ICACH and DCACH require no tag memory in the cache, no power is spent accessing a tag memory at all. Low power consumption can therefore be realized relative to a cache memory with the conventional set-associative configuration. In estimating this effect, the ratio of the power consumption of the tag field to that of the data field within one cache way is assumed to be 1:2, considering the bit widths of the tag field and the data field of the cache memory.
- under this assumption, the ratio of the power consumption of the conventional set-associative cache memory to that of the way-selection cache tightly coupled with the TLB, represented by the ICACH, is about 12:2, so it can be estimated that the power consumption of the cache memory is reduced by about 83%.
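The 83% estimate follows directly from the stated 12:2 power ratio, as the following arithmetic sketch confirms; the decomposition of the conventional figure is taken as given in the text:

```python
# Power-consumption estimate from the ratios stated in the text.
TAG_POWER, DATA_POWER = 1, 2        # assumed tag:data power ratio within one way

conventional = 12                   # relative power, conventional set-associative cache
tlb_coupled = DATA_POWER * 1        # only one way's data field is read; no tag memory

reduction = 1 - tlb_coupled / conventional
assert abs(reduction - 5 / 6) < 1e-9    # about 83% lower power consumption
```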
- FIG. 7 shows a detailed example of the ICACH and ITLB that uses address translation hit prediction.
- ITLB 20 has an 8-entry associative configuration
- ICACH 21 has an 8-way set associative configuration, for example, as in FIG.
- a prediction circuit 70 and a prediction match confirmation circuit 71 are added to the configuration of FIG. 1; the address translation hit is predicted, and the way WAY to be operated is selected according to the virtual address translation hit prediction signal 72[7:0].
- the difference is that the cache hit signal 65 is generated on the condition that the prediction matches the actual address translation result.
- the prediction circuit 70 holds the previous address translation result and outputs it as the prediction signal 73[7:0].
- the prediction signal 73[7:0] is ANDed with the instruction fetch signal 25 by the AND gate 54, and the resulting logical product signal serves as the virtual address translation hit prediction signal 72[7:0].
- each way WAY0-WAY7 of the ICACH 21 is started when the corresponding bit of the virtual address translation hit prediction signal 72[7:0] is at logical value 1.
- the virtual address translation hit prediction signal 72[7:0] thus takes over the function of the virtual address translation hit signal 27[7:0] in FIG. 1.
- the prediction match confirmation circuit 71 receives the entry translation hit signal 50[7:0], which is the actual address translation result for each entry ETY0-ETY7.
- the prediction match confirmation circuit 71 determines whether the value of the prediction signal 73[7:0] held by the prediction circuit 70 matches the newly received entry translation hit signal 50[7:0].
- the determination result signal 75 is output, and the value of the entry translation hit signal 50[7:0] is held in the prediction circuit 70 as the new prediction so that it can be used for the next cache operation.
- the determination result signal 75, which indicates whether the prediction was correct, is ANDed by the AND gate 76 with the valid bit selected by the selector 77.
- the resulting logical product signal is used as the cache hit signal 65.
- with this scheme, the cache memory 21 can be started without waiting for the translation hit signal 50[7:0] to be determined by the ITLB 20, so high-speed operation is possible. Even in this case, the VPN comparison on the ITLB 20 side is still performed, and once the actual address translation hit signal 50[7:0] is determined, it is confirmed whether the prediction was correct.
- the prediction match confirmation result is supplied to the prediction circuit 70 to be reflected in the next prediction.
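The predict-and-confirm sequence of the prediction circuit 70 and the prediction match confirmation circuit 71 can be sketched, as an illustrative model only, as follows; the real circuits operate on 8-bit one-hot hardware signals, not Python lists, and the class name is hypothetical:

```python
class WayPredictor:
    """Illustrative model of the predict-and-confirm flow: the previous
    translation result is reused as the prediction, then checked against
    the actual per-entry translation hit signal."""

    def __init__(self, num_ways=8):
        self.prediction = [0] * num_ways   # signal 73[7:0]: last translation result

    def predict(self, fetch):
        # Signal 72[7:0]: the held prediction ANDed with the fetch signal 25,
        # used to start only the predicted way of the cache.
        return [bit & fetch for bit in self.prediction]

    def confirm(self, actual_hit):
        # Signal 75: does the held prediction match signal 50[7:0]?
        correct = (self.prediction == actual_hit)
        # The actual result becomes the prediction for the next access.
        self.prediction = list(actual_hit)
        return correct
```

On a correct confirmation the data already read from the predicted way is used; on a mispredict the access is simply retried, and because confirm() has already stored the actual result, the retry reads the correct way.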
- if the prediction is correct, the data output from the ICACH 21 and the cache hit signal are valid and are used in the same way as in FIG. 1. If the prediction is incorrect, the correct translation result has already been obtained and held in the prediction circuit 70 as the new prediction signal 73[7:0], so using the output of the prediction circuit 70 on the retry no longer mispredicts; the access can therefore be resumed from reading the way WAY of the corresponding cache memory 21. Of course, it is also possible to start over from the VPN comparison of each entry ETY of the ITLB 20. In this application example, in addition to the feature that valid data can be obtained from the cache memory at high speed, only one cache way is activated, so the same low-power-consumption effect is obtained. [0049] Although the invention made by the present inventors has been specifically described based on the embodiments, it goes without saying that the present invention is not limited thereto and can be variously modified without departing from the gist thereof.
- the data processor may include a data processing unit such as a floating-point arithmetic unit or a product-sum arithmetic unit, and may also include other circuit modules.
- the data processor is not limited to a single chip; it may have a multi-chip configuration, or a multi-CPU configuration with a plurality of central processing units.
- the present invention can be widely applied to microcomputers, microprocessors, and the like that have an address translation buffer and a cache memory.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2006539089A JPWO2006038258A1 (ja) | 2004-09-30 | 2004-09-30 | データプロセッサ |
US11/663,592 US20080114940A1 (en) | 2004-09-30 | 2004-09-30 | Data Processor |
PCT/JP2004/014353 WO2006038258A1 (ja) | 2004-09-30 | 2004-09-30 | データプロセッサ |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2004/014353 WO2006038258A1 (ja) | 2004-09-30 | 2004-09-30 | データプロセッサ |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006038258A1 true WO2006038258A1 (ja) | 2006-04-13 |
Family
ID=36142349
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2004/014353 WO2006038258A1 (ja) | 2004-09-30 | 2004-09-30 | データプロセッサ |
Country Status (3)
Country | Link |
---|---|
US (1) | US20080114940A1 (ja) |
JP (1) | JPWO2006038258A1 (ja) |
WO (1) | WO2006038258A1 (ja) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010237770A (ja) * | 2009-03-30 | 2010-10-21 | Toshiba Corp | 情報処理装置、ブリッジ装置および情報処理方法 |
WO2014057546A1 (ja) * | 2012-10-10 | 2014-04-17 | 富士通株式会社 | マルチヒット検出回路、処理装置およびマルチヒット検出方法 |
JP2015060571A (ja) * | 2013-09-20 | 2015-03-30 | 株式会社東芝 | キャッシュメモリシステムおよびプロセッサシステム |
US10025719B2 (en) | 2014-02-24 | 2018-07-17 | Kabushiki Kaisha Toshiba | Cache memory system and processor system |
JP2022517318A (ja) * | 2019-01-24 | 2022-03-08 | アドバンスト・マイクロ・ディバイシズ・インコーポレイテッド | トランスレーションルックアサイドバッファエビクションに基づくキャッシュ置換 |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8667258B2 (en) | 2010-06-23 | 2014-03-04 | International Business Machines Corporation | High performance cache translation look-aside buffer (TLB) lookups using multiple page size prediction |
US8788789B2 (en) * | 2010-12-15 | 2014-07-22 | Advanced Micro Devices, Inc. | Power filter in data translation look-aside buffer based on an input linear address |
TWI587136B (zh) * | 2011-05-06 | 2017-06-11 | 創惟科技股份有限公司 | 快閃記憶體系統及其快閃記憶體無效資料頁資訊之管理方法與回收方法 |
US9367456B1 (en) * | 2013-06-14 | 2016-06-14 | Marvell International Ltd. | Integrated circuit and method for accessing segments of a cache line in arrays of storage elements of a folded cache |
GB2547189A (en) * | 2016-02-03 | 2017-08-16 | Swarm64 As | Cache and method |
US10037283B2 (en) * | 2016-08-12 | 2018-07-31 | Advanced Micro Devices, Inc. | Updating least-recently-used data for greater persistence of higher generality cache entries |
US10489305B1 (en) | 2018-08-14 | 2019-11-26 | Texas Instruments Incorporated | Prefetch kill and revival in an instruction cache |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS61271572A (ja) * | 1985-05-28 | 1986-12-01 | Toshiba Corp | キヤツシユメモリ |
JPH05108486A (ja) * | 1991-10-12 | 1993-04-30 | Fujitsu Ltd | キヤツシユ・メモリの制御方式 |
JPH07334423A (ja) * | 1994-06-07 | 1995-12-22 | Hitachi Ltd | セットアソシアティブ方式のメモリ装置 |
JPH086857A (ja) * | 1994-06-17 | 1996-01-12 | Mitsubishi Electric Corp | キャッシュメモリ |
JP2000293437A (ja) * | 1999-04-02 | 2000-10-20 | Nec Corp | キャッシュメモリ装置及びキャッシュメモリ制御方法 |
JP2003196157A (ja) * | 2001-12-25 | 2003-07-11 | Mitsubishi Electric Corp | プロセッサ装置及びメモリ管理方法 |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5148538A (en) * | 1989-10-20 | 1992-09-15 | International Business Machines Corporation | Translation look ahead based cache access |
- 2004
- 2004-09-30 US US11/663,592 patent/US20080114940A1/en not_active Abandoned
- 2004-09-30 JP JP2006539089A patent/JPWO2006038258A1/ja active Pending
- 2004-09-30 WO PCT/JP2004/014353 patent/WO2006038258A1/ja active Application Filing
Non-Patent Citations (2)
Title |
---|
LEE Y. ET AL: "SHARED TAG FOR MMU AND CACHE MEMORY", PROCEEDINGS OF 1997 INTERNATIONAL SEMICONDUCTOR CONFERENCE (CAS'97), vol. 1, 7 October 1997 (1997-10-07), pages 77 - 80, XP010266859 * |
SUZUKI K. ET AL: "Tagless Cache", THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, vol. 94, no. 481, CPSY94-98, 30 January 1995 (1995-01-30), pages 25 - 32, XP003005651 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010237770A (ja) * | 2009-03-30 | 2010-10-21 | Toshiba Corp | 情報処理装置、ブリッジ装置および情報処理方法 |
WO2014057546A1 (ja) * | 2012-10-10 | 2014-04-17 | 富士通株式会社 | マルチヒット検出回路、処理装置およびマルチヒット検出方法 |
JP2015060571A (ja) * | 2013-09-20 | 2015-03-30 | 株式会社東芝 | キャッシュメモリシステムおよびプロセッサシステム |
US9740613B2 (en) | 2013-09-20 | 2017-08-22 | Kabushiki Kaisha Toshiba | Cache memory system and processor system |
US10025719B2 (en) | 2014-02-24 | 2018-07-17 | Kabushiki Kaisha Toshiba | Cache memory system and processor system |
JP2022517318A (ja) * | 2019-01-24 | 2022-03-08 | アドバンスト・マイクロ・ディバイシズ・インコーポレイテッド | トランスレーションルックアサイドバッファエビクションに基づくキャッシュ置換 |
JP7337173B2 (ja) | 2019-01-24 | 2023-09-01 | アドバンスト・マイクロ・ディバイシズ・インコーポレイテッド | トランスレーションルックアサイドバッファエビクションに基づくキャッシュ置換 |
Also Published As
Publication number | Publication date |
---|---|
US20080114940A1 (en) | 2008-05-15 |
JPWO2006038258A1 (ja) | 2008-05-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP3713312B2 (ja) | データ処理装置 | |
US9208084B2 (en) | Extended main memory hierarchy having flash memory for page fault handling | |
EP3238074B1 (en) | Cache accessed using virtual addresses | |
US9098284B2 (en) | Method and apparatus for saving power by efficiently disabling ways for a set-associative cache | |
US9112537B2 (en) | Content-aware caches for reliability | |
US8291168B2 (en) | Disabling cache portions during low voltage operations | |
US6006311A (en) | Dynamic updating of repair mask used for cache defect avoidance | |
US7516275B2 (en) | Pseudo-LRU virtual counter for a locking cache | |
US6023746A (en) | Dual associative-cache directories allowing simultaneous read operation using two buses with multiplexors, address tags, memory block control signals, single clock cycle operation and error correction | |
US5715427A (en) | Semi-associative cache with MRU/LRU replacement | |
US5958068A (en) | Cache array defect functional bypassing using repair mask | |
JPH036757A (ja) | ライトバツクデータキヤツシユメモリ装置 | |
JP2001195303A (ja) | 機能が並列に分散された変換索引バッファ | |
US6085288A (en) | Dual cache directories with respective queue independently executing its content and allowing staggered write operations | |
US10528473B2 (en) | Disabling cache portions during low voltage operations | |
US5883904A (en) | Method for recoverability via redundant cache arrays | |
CN111164581A (zh) | 用于修补页的系统、方法和装置 | |
WO2006038258A1 (ja) | データプロセッサ | |
JPH07295889A (ja) | アドレス変換回路 | |
JP4071942B2 (ja) | データ処理装置及びデータプロセッサ | |
US5943686A (en) | Multiple cache directories for non-arbitration concurrent accessing of a cache memory | |
US5867511A (en) | Method for high-speed recoverable directory access | |
TWI407306B (zh) | 快取記憶體系統及其存取方法與電腦程式產品 | |
EP2866148B1 (en) | Storage system having tag storage device with multiple tag entries associated with same data storage line for data recycling and related tag storage device | |
JP2000259498A (ja) | マルチスレッド・プロセッサの命令キャッシュ |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
DPEN | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101) | ||
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2006539089 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 11663592 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase | ||
WWP | Wipo information: published in national office |
Ref document number: 11663592 Country of ref document: US |