US9158697B2 - Method for cleaning cache of processor and associated processor - Google Patents

Info

Publication number
US9158697B2
Authority
US
United States
Prior art keywords
cache
field
command
offset
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US13/691,841
Other versions
US20130173862A1 (en)
Inventor
Yen-Ju Lu
Ching-Yeh Yu
Chen-Tung Lin
Chao-Wei Huang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Realtek Semiconductor Corp
Original Assignee
Realtek Semiconductor Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Realtek Semiconductor Corp filed Critical Realtek Semiconductor Corp
Assigned to REALTEK SEMICONDUCTOR CORP. reassignment REALTEK SEMICONDUCTOR CORP. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HUANG, CHAO-WEI, LIN, CHEN-TUNG, LU, YEN-JU, YU, CHING-YEH
Publication of US20130173862A1
Application granted
Publication of US9158697B2
Legal status: Active (adjusted expiration)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0891: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, using clearing, invalidating or resetting means
    • G06F 12/0804: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with main memory updating

Abstract

A method for cleaning a cache of a processor includes: generating a specific command according to a request, wherein the specific command includes an operation command, a first field and a second field; obtaining an offset and a starting address according to the first field and the second field; selecting a specific segment from the cache according to the starting address and the offset; and cleaning data stored in the specific segment.

Description

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method for cleaning a cache, and more particularly, to a method for cleaning a specific segment of a cache of a processor.
2. Description of the Prior Art
A cache is memory whose access speed is faster than that of general random access memory. Generally, the cache is made of high-speed and expensive static random access memory (SRAM) instead of the slower and cheaper dynamic random access memory (DRAM) used for system main memory. Referring to FIG. 1, because the operating speed of a central processing unit (CPU) 10 is faster than the reading speed of a main memory 12, if the CPU 10 needs to access data stored in the main memory 12, the CPU 10 requires several clock periods to complete the access operation, resulting in inefficient operation. Therefore, when the CPU 10 accesses data, a core 102 first checks whether the required data is in a cache 104. When the required data has been temporarily stored in the cache 104 due to a previous operation, the CPU 10 can access the required data directly from the cache 104 instead of from the main memory 12. Therefore, the access speed of the CPU 10 is faster, and the operations of the CPU 10 are more efficient.
In the past, the CPU cache was an advanced technique used only in supercomputers, but nowadays an instruction cache and a data cache are integrated into the microprocessor of an ordinary computer, and such internal caches are often called L1 caches (Level 1 on-die cache). In addition, an L2 cache, whose size is greater than that of the L1 cache, used to be positioned outside the CPU, for example on a main board or a CPU interface; now, however, the L2 cache is a standard component inside the CPU. Furthermore, an advanced or workstation CPU may have an L3 cache (Level 3 on-die cache).
The cache is used to speed up the access speed of the CPU. To fully exploit the cache, it is used not only to temporarily store data that was accessed before, but also to prefetch data that will be used in the future from the main memory, with instruction prediction and data pre-access techniques implemented by hardware. Therefore, the likelihood that the CPU finds the required data in the cache is increased. In addition, because the size/capacity of the cache is limited, how to clean the data stored in the cache is an important topic. The CPU may issue a write-back command or an invalidate command according to requirements of the system and software. Referring to FIG. 1, when the core 102 performs the write-back operation upon the cache 104, the data stored in the cache 104 is written back to the main memory 12; and when the core 102 performs the invalidate operation upon the cache 104, the core 102 cleans the data stored in the cache 104. Generally, the write-back command is sent together with the invalidate command, so that the cache is cleaned after the data is written back to the main memory 12. Because early caches were very small (several kilobytes, KB), there was no need to consider how to clean only a portion of the cache; however, as current caches have grown to several megabytes (MB), how to clean a specific segment of the cache has become a new topic.
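As an illustration only, the following C sketch shows what the write-back and invalidate operations mean for a single cache line; the struct layout, the 8-byte line size and the write_to_main_memory() interface are assumptions made for illustration, not details taken from the patent.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical cache-line bookkeeping, for illustration only. */
    struct cache_line {
        bool     valid;      /* line holds usable data                 */
        bool     dirty;      /* line was modified since it was filled  */
        uint32_t tag;        /* address tag                            */
        uint8_t  data[8];    /* assumed 8-byte line                    */
    };

    /* Assumed interface to the main memory 12. */
    extern void write_to_main_memory(uint32_t addr, const uint8_t *buf, unsigned len);

    /* "Write-back": copy a modified line back to the main memory. */
    static void write_back_line(struct cache_line *l, uint32_t addr)
    {
        if (l->valid && l->dirty) {
            write_to_main_memory(addr, l->data, sizeof l->data);
            l->dirty = false;
        }
    }

    /* "Invalidate": clean the line so its data is no longer used. */
    static void invalidate_line(struct cache_line *l)
    {
        l->valid = false;
    }
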
In U.S. Pat. No. 6,978,357, Hacking et al. provide a solution to this problem. However, Hacking's method has two restrictions: one is that the selected segment must be a multiple of two, and the other is that the size of the segment to be cleaned is fixed.
SUMMARY OF THE INVENTION
It is therefore an objective of the present invention to provide a method for cleaning a selected segment of a cache of a processor by referring to a command whose format carries the selected segment information.
According to one embodiment of the present invention, a method for cleaning a cache of a processor comprises: generating a specific command according to a request, wherein the specific command comprises an operation command, a first field and a second field; obtaining an offset and a starting address according to the first field and the second field; selecting a specific segment from the cache according to the starting address and the offset; and cleaning data stored in the specific segment.
According to another embodiment of the present invention, a processor comprises: a cache system comprising a cache memory; and a core, where the core is used for generating a specific command according to a request, where the specific command comprises an operation command, a first field and a second field, and the core further obtains an offset and a starting address according to the first field and the second field. The core transmits the offset and the starting address to the cache system, the cache system selects a specific segment from a cache memory according to the starting address and the offset, and the cache system cleans data stored in the specific segment.
By using the command format provided by the present invention, the starting address and the size of the segment to be cleaned can be adjusted.
These and other objectives of the present invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram illustrating a prior art processor.
FIG. 2 shows a command format according to one embodiment of the present invention.
FIG. 3 is a flowchart according to one embodiment of the present invention.
FIG. 4 is a diagram illustrating a processor of the embodiment shown in FIG. 3.
DETAILED DESCRIPTION
The present invention provides a method for cleaning a cache of a processor, and FIG. 2 shows a command format according to one embodiment of the present invention. Referring to a command 20 shown in FIG. 2, an operation (OP) field 22 carries a specific command such as "write-back", "invalidate" or "write-back and invalidate". The offset field 24 carries an offset. The register field 26, marked as "rS", indicates a register that holds a starting address. Generally, the processor has 32 registers, collectively called the register file. In this embodiment, the register field 26 indicates one of the 32 registers, whose value is 0x80000000; therefore, 0x80000000 serves as the starting address, and the end address is "0x80000000 + offset", where the "offset" marked in the offset field 24 can be a quantity of cache lines by which to offset.
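As a minimal sketch only: the patent names the OP field 22, the offset field 24 and the register field 26 ("rS") but does not fix their bit positions or widths, so the 32-bit layout, the field values and the decode helpers below are purely assumed for illustration.

    #include <stdint.h>

    /* Possible contents of the OP field 22 (encodings are assumed). */
    enum cache_op {
        OP_WRITE_BACK            = 0x1,
        OP_INVALIDATE            = 0x2,
        OP_WRITE_BACK_INVALIDATE = 0x3
    };

    /* Assumed 32-bit layout: [31:26] OP | [25:10] offset | [9:5] rS | [4:0] unused */
    static inline uint32_t decode_op(uint32_t cmd)     { return (cmd >> 26) & 0x3F;   }
    static inline uint32_t decode_offset(uint32_t cmd) { return (cmd >> 10) & 0xFFFF; }
    static inline uint32_t decode_rs(uint32_t cmd)     { return (cmd >>  5) & 0x1F;   } /* one of 32 registers */
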
For example, when the cache line size is 8 bytes, the value of the register indicated by the register field 26 is "0000" and the offset is "0001", the end address is "rS + offset = 0 bytes + (1 << 3) bytes = 8". That is, the starting address is "0000" and the end address is "0008". According to the command in the OP field 22, the CPU either writes the data stored between the addresses "0000" and "0008" of the cache back to the main memory, or cleans the data stored between those addresses. By changing the values of the offset field 24 and the register field 26, the size and the starting address of the selected segment can be adjusted.
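The arithmetic of this example can be written out as a short sketch; the 8-byte cache line (hence the shift by 3) comes from the example above, and the helper name is merely illustrative.

    #include <stdint.h>

    #define CACHE_LINE_SHIFT 3   /* log2(8-byte cache line), per the example above */

    /* End address = starting address + offset (in cache lines) converted to bytes. */
    static inline uint32_t end_address(uint32_t start, uint32_t offset_lines)
    {
        return start + (offset_lines << CACHE_LINE_SHIFT);
    }

    /* start = 0x0000, offset = 1  ->  end_address() = 0x0008, matching the text. */
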
Please refer to FIG. 3 and FIG. 4 together, where FIG. 3 is a flowchart according to one embodiment of the present invention, and FIG. 4 is a diagram illustrating a processor. The processor includes a core 40 and a cache system 42, where the operation of the core 40 includes several stages, such as the instruction fetch (IF) stage 402, instruction decode (ID) stage 404, execution stage 406, memory access stage 408 and writeback stage 410. In this embodiment, after the flow starts in Step 301, the core 40 receives a request issued by software in the instruction fetch stage 402 and performs a decoding operation to obtain information from the offset field 24 and the register field 26 (Step 302). Then, in Step 303, the core 40 obtains the starting address according to the register indicated by the register field 26. In Step 304, the core 40 generates the end address by using the starting address and the offset. In Step 305, the core 40 sends the operation command, the starting address and the end address to the cache system 42. In Step 306, the cache system 42 performs the specific operation corresponding to the operation command, such as write-back, invalidate, or write-back and invalidate, upon the segment between the starting address and the end address. In Step 307, the flow is finished. In addition, the cache memory of the cache system 42 includes a data cache 424 and an instruction cache 422, and the method of the present invention can be applied to both caches, although the instruction cache 422 generally does not need to perform the "write-back" operation.
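Steps 302 to 306 of FIG. 3 can be paraphrased as the following sketch, reusing the hypothetical decode helpers and end_address() from the sketches above; the register_file array and the cache_system_op() interface are likewise assumed names, not part of the patent.

    #include <stdint.h>

    extern uint32_t register_file[32];   /* the processor's 32 registers (assumed) */

    /* Assumed interface of the cache system 42. */
    void cache_system_op(enum cache_op op, uint32_t start, uint32_t end);

    void handle_clean_request(uint32_t cmd)                /* cmd: the command 20 of FIG. 2 */
    {
        enum cache_op op = (enum cache_op)decode_op(cmd);  /* Step 302: decode the request   */
        uint32_t offset  = decode_offset(cmd);
        uint32_t start   = register_file[decode_rs(cmd)];  /* Step 303: read register rS     */
        uint32_t end     = end_address(start, offset);     /* Step 304: compute end address  */
        cache_system_op(op, start, end);                   /* Steps 305-306: hand off to the
                                                              cache system, which operates on
                                                              the segment [start, end)       */
    }                                                      /* Step 307: flow finished        */
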
In the embodiment shown in FIG. 3, the core 40 provides the starting address, the offset and the end address to the cache system 42. In another embodiment, however, the core 40 merely provides the operation command, the starting address and the offset to the cache system 42, and the cache system 42 generates the end address by using the received starting address and the offset.
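For this alternative embodiment, a corresponding sketch (continuing the same hypothetical helpers and names as above) would have the core pass only the operation command, the starting address and the offset, and let the cache system 42 derive the end address and walk the segment line by line:

    #include <stdint.h>

    /* Assumed per-line primitive inside the cache system 42. */
    void cache_line_op(enum cache_op op, uint32_t line_addr);

    void cache_system_range_op(enum cache_op op, uint32_t start, uint32_t offset_lines)
    {
        uint32_t end = end_address(start, offset_lines);    /* end address derived in the cache system */
        for (uint32_t addr = start; addr < end; addr += (1u << CACHE_LINE_SHIFT))
            cache_line_op(op, addr);                        /* write-back and/or invalidate the line   */
    }
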
Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.

Claims (14)

What is claimed is:
1. A method for cleaning a cache of a processor, wherein the cache comprises a plurality of cache lines, each cache line contains a plurality of segments, and the method comprises:
generating a specific command according to a request, wherein the specific command comprises an operation command, a first field and a second field, and the operation command comprises a “write-back” command;
obtaining an offset and a starting address according to the first field and the second field;
selecting a specific segment from the cache according to the starting address and the offset; and
writing data stored in the specific segment back to a memory in response to the “write-back” command.
2. The method of claim 1, wherein the step of selecting the specific segment from the cache according to the starting address and the offset comprises:
generating an end address by using the first field and the second field; and
determining the specific segment according to the starting address and the end address.
3. The method of claim 1, wherein the request is from software.
4. The method of claim 1, wherein the operation command comprises an “invalidate” command.
5. The method of claim 1, wherein the first field and the second field follows the operation command, and the second field indicates to a register.
6. The method of claim 1, wherein the step of generating the specific command according to the request comprises:
decoding the request to generate the specific command.
7. The method of claim 1, wherein the offset is a quantity of cache lines.
8. A processor, comprising:
a cache system, comprising a cache memory, wherein the cache memory comprises a plurality of cache lines, each cache line contains a plurality of segments; and
a core, for generating a specific command according to a request, wherein the specific command comprises an operation command, a first field and a second field, the operation command comprises a “write-back” command, and the core further obtains an offset and a starting address according to the first field and the second field;
wherein the core transmits the offset and the starting address to the cache system, the cache system selects a specific segment from the cache memory according to the starting address and the offset, and the cache system writes data stored in the specific segment back to a memory in response to the “write-back” command.
9. The processor of claim 8, wherein the core further generates an end address according to the offset and the starting address, and transmits the starting address, the offset and the end address to the cache system.
10. The processor of claim 8, wherein the cache system generates an end address according to the offset and the starting address to determine the specific segment.
11. The processor of claim 8, wherein the request is from software.
12. The processor of claim 8, wherein the operation command comprises an “invalidate” command.
13. The processor of claim 8, further comprising a plurality of registers, wherein the first field and the second field follows the operation command, and the second field indicates to one of the registers.
14. The processor of claim 8, wherein the offset is a quantity of cache lines.
US13/691,841 2011-12-28 2012-12-02 Method for cleaning cache of processor and associated processor Active 2033-11-06 US9158697B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
TW100149208A TWI579695B (en) 2011-12-28 2011-12-28 Method for cleaning cache of processor and associated processor
TW100149208A 2011-12-28
TW100149208 2011-12-28

Publications (2)

Publication Number Publication Date
US20130173862A1 US20130173862A1 (en) 2013-07-04
US9158697B2 true US9158697B2 (en) 2015-10-13

Family

ID=48695906

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/691,841 Active 2033-11-06 US9158697B2 (en) 2011-12-28 2012-12-02 Method for cleaning cache of processor and associated processor

Country Status (2)

Country Link
US (1) US9158697B2 (en)
TW (1) TWI579695B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103714016B (en) * 2014-01-14 2017-10-27 北京猎豹移动科技有限公司 Method for cleaning, device and the client of caching
US11226741B2 (en) * 2018-10-31 2022-01-18 EMC IP Holding Company LLC I/O behavior prediction based on long-term pattern recognition

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6978357B1 (en) * 1998-07-24 2005-12-20 Intel Corporation Method and apparatus for performing cache segment flush and cache segment invalidation operations
CN100388187C (en) 2004-08-04 2008-05-14 威盛电子股份有限公司 Apparatus for predicting multiple branch target addresses
CN100527094C (en) 2007-07-25 2009-08-12 威盛电子股份有限公司 Method and apparatus for obtaining scratch memory data
CN101833437A (en) 2009-05-19 2010-09-15 威盛电子股份有限公司 Device and method for a microprocessor
US20110153952A1 (en) * 2009-12-22 2011-06-23 Dixon Martin G System, method, and apparatus for a cache flush of a range of pages and tlb invalidation of a range of entries

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10901908B2 (en) 2019-01-16 2021-01-26 International Business Machines Corporation Storing data into a memory

Also Published As

Publication number Publication date
US20130173862A1 (en) 2013-07-04
TW201327165A (en) 2013-07-01
TWI579695B (en) 2017-04-21

Legal Events

Date Code Title Description
AS Assignment

Owner name: REALTEK SEMICONDUCTOR CORP., TAIWAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LU, YEN-JU;YU, CHING-YEH;LIN, CHEN-TUNG;AND OTHERS;REEL/FRAME:029389/0622

Effective date: 20111229

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8