CN103186474B - Method for cleaning the cache of a processor and associated processor - Google Patents

Method for cleaning the cache of a processor and associated processor

Info

Publication number
CN103186474B
CN103186474B (application CN201110448085.6A)
Authority
CN
China
Prior art keywords
cache
field
offset value
processor
instruction
Prior art date
Legal status
Active
Application number
CN201110448085.6A
Other languages
Chinese (zh)
Other versions
CN103186474A (en)
Inventor
卢彦儒
虞敬业
林振东
黄朝玮
Current Assignee
Realtek Semiconductor Corp
Original Assignee
Realtek Semiconductor Corp
Application filed by Realtek Semiconductor Corp filed Critical Realtek Semiconductor Corp
Priority to CN201110448085.6A
Publication of CN103186474A
Application granted
Publication of CN103186474B
Legal status: Active


Abstract

The present invention relates to a method for clearing the cache of a processor, and to the processor itself. The method includes generating a specific instruction according to a request, the specific instruction comprising an operation code, a first field and a second field; obtaining an offset value and a start address according to the first field and the second field; selecting a designated section of the cache according to the start address and the offset value; and clearing the data stored in the designated section. The specific instruction may be a Writeback, an Invalidate, or a Writeback+Invalidate instruction.

Description

Method for cleaning the cache of a processor and associated processor
Technical field
The present invention relates to a method for clearing a cache, and in particular to a method for clearing a specified section of a processor's cache.
Background art
A cache is a type of memory whose access speed is higher than that of general random-access memory. Unlike the main system memory (main memory), it is generally implemented not with DRAM but with the more expensive yet faster SRAM technology. Referring to Fig. 1, because the execution speed of the processor (CPU) 10 is much higher than the read speed of the main memory 12, the processor 10 must wait several clock cycles when accessing data in the main memory 12, wasting processing efficiency. Therefore, when accessing data, the core 102 of the processor 10 first looks in the cache 104; if the required data has been kept in the cache 104 as a result of a previous operation, the processor 10 need not read it from the main memory 12 and can obtain the desired data directly from the cache 104, thereby raising the access speed and achieving better performance.
The CPU cache was once a high-end technique used only on supercomputers, but the microprocessors used in modern computers all integrate on-chip data caches and instruction caches of various sizes, commonly called the L1 cache (Level 1 on-die cache). The L2 cache, larger than the L1, was once placed outside the CPU, for example on the motherboard or the CPU adapter, but has now become a standard component inside the CPU; high-end desktop and workstation CPUs may even be equipped with a level-3 on-die cache (L3 cache) larger than the L2 cache.
The purpose of providing a cache is to match the data access speed to the processing speed of the CPU. To fully exploit the cache, modern caches no longer rely only on keeping recently accessed data: they also employ hardware branch prediction and data prefetching, fetching data that is about to be used from the main memory into the cache in advance, so as to raise the probability that the CPU finds the required data in the cache. Because cache capacity is limited, besides effectively prestoring the data the CPU needs, clearing the data stored in the cache at the right time is also particularly important. According to the needs of the system or software, the CPU can issue Writeback or Invalidate instructions to the cache. Referring to Fig. 1, when the core 102 performs a writeback operation on the cache 104, data originally stored in the cache 104 is written back to the main memory 12; when an invalidate operation is performed, the core 102 cleans all data in the cache 104. Usually a writeback instruction is issued together with an invalidate instruction, so that the whole cache is cleared after its data has been written back to the main memory 12. Early caches were tiny, only a few KB, so there was no need to consider how to clear a partial section; but caches today have grown to several MB, and how to clear the data of a particular section of the cache has become a new problem.
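The writeback and invalidate operations described above can be modeled in a few lines of C. This is only an illustrative sketch, not the patent's hardware: the line layout, sizes, and all names below are assumptions. Writeback copies a dirty line back to main memory; invalidate simply discards the cached copy.

```c
#include <string.h>
#include <stdint.h>

/* Toy model of one cache line and a small main memory.
 * Sizes and field names are illustrative assumptions. */
#define LINE_SIZE 8
#define MEM_SIZE  64

typedef struct {
    uint8_t  data[LINE_SIZE];
    uint32_t addr;    /* main-memory address this line mirrors */
    int      valid;
    int      dirty;
} cache_line_t;

static uint8_t      main_memory[MEM_SIZE];
static cache_line_t line;

/* Writeback: copy a dirty line back to main memory (core 102 -> memory 12) */
static void writeback(void) {
    if (line.valid && line.dirty) {
        memcpy(&main_memory[line.addr], line.data, LINE_SIZE);
        line.dirty = 0;
    }
}

/* Invalidate: discard the cached copy without touching main memory */
static void invalidate(void) {
    line.valid = 0;
}
```

As in the description, a combined Writeback+Invalidate would simply call the two in sequence.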
Hacking et al. proposed a solution in U.S. Patent No. 6,978,357; however, that clearing scheme has several restrictions: first, the size of the selected section must be a multiple of 2; second, only sections of fixed length can be cleared.
Summary of the invention
An object of the present invention is to propose an instruction format for selecting a section of a cache, and a method of selecting and clearing a section of a processor's cache accordingly.
Another object of the present invention is to propose a processor that executes an instruction format for selecting a section of its cache, and clears the selected section accordingly.
According to the present invention, a method for clearing the cache of a processor includes: generating a specific instruction according to a request, the specific instruction comprising an operation code, a first field and a second field; obtaining an offset value and a start address according to the first field and the second field; selecting a designated section of the cache according to the start address and the offset value; and clearing the data stored in the designated section.
According to the present invention, a processor includes: a cache system, including a cache and a cache controller; and a processor core, which generates a specific instruction according to a request, the specific instruction comprising an operation code, a first field and a second field, and which obtains an offset value and a start address according to the first field and the second field; wherein the processor core sends the start address and the offset value to the cache controller, and the cache controller selects a designated section of the cache according to the start address and the offset value and clears the data stored in the designated section.
With the instruction format proposed by the present invention, both the start address and the size of the section to be cleared are adjustable.
Accompanying drawing explanation
Fig. 1 is a schematic diagram of a prior-art processor architecture;
Fig. 2 is the instruction format proposed by the present invention;
Fig. 3 is a flowchart according to an embodiment of the invention; and
Fig. 4 is a schematic diagram of the processor architecture of the embodiment of Fig. 3.
Description of main component symbols
10 processor; 102 core
104 cache; 12 main memory
20 instruction; 22 OP field
24 offset field; 26 register field
40 core; 402 instruction fetch stage
404 instruction decode stage; 406 address-command generation and issue stage
42 cache system; 422 cache controller
424 data cache; 426 instruction cache
Detailed description of the invention
The present invention proposes a method for clearing the cache of a processor. Fig. 2 shows the proposed instruction format. In the instruction 20, the OP field 22 holds the specific operation, for example Writeback, Invalidate, or Writeback+Invalidate; the offset field 24 is for writing an offset value (offset); and the register field 26, denoted rS, points to a register representing a start address. Generally, a processor is provided with 32 registers, collectively called the register file. In the present embodiment, the register field 26 points to one of these 32 registers, whose stored value is 0x8000_0000; the core therefore takes 0x8000_0000 as the start address (starting address), and the end address is 0x8000_0000 + offset. The offset represented by the offset field 24 can be the number of cache lines offset.
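The three fields of the Fig. 2 instruction can be sketched as a bit layout in C. The patent defines the fields (OP, offset, rS) but not their widths, so the widths and positions below are purely illustrative assumptions:

```c
#include <stdint.h>

/* Hypothetical encoding of the Fig. 2 instruction.  OP selects Writeback,
 * Invalidate, or Writeback+Invalidate; offset counts cache lines; rS indexes
 * one of 32 registers holding the start address.  Field widths are assumed. */
enum cache_op { OP_WRITEBACK = 0, OP_INVALIDATE = 1, OP_WB_INV = 2 };

typedef struct {
    enum cache_op op;      /* OP field 22 */
    uint32_t      offset;  /* offset field 24 */
    unsigned      rs;      /* register field 26 */
} cache_insn_t;

/* assumed 32-bit word: op in bits [31:26], offset in [25:5], rs in [4:0] */
static uint32_t encode(enum cache_op op, uint32_t offset, unsigned rs) {
    return ((uint32_t)op << 26) | ((offset & 0x1FFFFFu) << 5) | (rs & 0x1Fu);
}

static cache_insn_t decode(uint32_t word) {
    cache_insn_t i;
    i.op     = (enum cache_op)((word >> 26) & 0x3F);
    i.offset = (word >> 5) & 0x1FFFFF;
    i.rs     = word & 0x1F;
    return i;
}
```

A round trip through `encode`/`decode` recovers the three fields, mirroring what the instruction decode stage would extract.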
For example, in the case where the cache line size is 8 bytes, when the register pointed to by the register field 26 holds the value 0000 and the offset is 0001, the end address is rS + offset = 0 byte + 1 (<<3) bytes = 8; the start address is then 0000 and the end address is 0008. According to the instruction in the OP field 22, the CPU writes the data stored at cache addresses 0000 to 0008 back to the main memory, or clears it. Thus, by changing the values of the offset field 24 and the register field 26, both the size and the start address of the selected section are adjustable.
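The arithmetic of this example can be checked with a one-line helper. This is a sketch: `line_shift` is an assumed parameterization of the line size (3 for the 8-byte lines of the example).

```c
#include <stdint.h>

/* End address = start + (offset << line_shift), where line_shift encodes
 * the cache line size: 3 means 8-byte lines, as in the example above. */
static uint32_t end_address(uint32_t start, uint32_t offset_lines,
                            unsigned line_shift) {
    return start + (offset_lines << line_shift);
}
```

With the example's values (start 0x0000, offset 1, 8-byte lines) this yields 0x0008; with the earlier register value 0x8000_0000 the same formula gives 0x8000_0000 plus the offset in bytes.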
Fig. 3 is a flowchart according to an embodiment of the invention, explained in conjunction with the processor architecture diagram of Fig. 4. As mentioned above, the processor consists of two parts, the core 40 and the cache system 42, and the processing of the core 40 is further divided into several stages, such as the instruction fetch stage (Instruction Fetch; IF) 402, the instruction decode stage (Instruction Decode; ID) 404, and finally the address-command generation and issue stage (Address-Command Generation & Issue) 406. In the present embodiment, after the start 301, step 302 is performed: the core 40 fetches an instruction in the instruction fetch stage 402 according to a request from software, decodes it in the instruction decode stage 404 to obtain the information of the offset value and the start address, determining the operation code and the values of the register field and the offset field that follow it, and then generates the complete command in the address-command generation and issue stage 406. In step 303, the core 40 obtains the start address from the register pointed to by the register field; then, in step 304, it computes the end address from the start address and the offset value. In step 305, the core 40 sends the operation code, the start address and the end address to the cache controller 422 of the cache system 42. In step 306, the cache controller 422 performs the operation corresponding to the operation code, for example writeback, invalidate, or writeback plus invalidate, on the section between the start address and the end address, and the procedure then ends at 307. The cache in the cache system 42 can be further divided into two parts, the data cache 424 and the instruction cache 426; the method proposed by the present invention is applicable to both kinds of cache at the same time, although the instruction cache 426 generally has no need to execute writeback instructions.
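The flow of steps 302 to 306 can be sketched end to end. Everything here is an illustrative assumption rather than the patent's hardware: the names, the line size, and a controller that merely counts the lines it would touch.

```c
#include <stdint.h>

#define LINE_SHIFT 3          /* assume 8-byte cache lines, as in the example */

enum cache_op { OP_WRITEBACK, OP_INVALIDATE, OP_WB_INV };

static uint32_t regs[32];     /* register file, read in step 303 */

/* Cache controller side (step 306): walk the range one cache line at a time
 * and return how many lines were covered; a real controller would write back
 * and/or invalidate each line instead of just counting. */
static unsigned flush_range(enum cache_op op, uint32_t start, uint32_t end) {
    (void)op;
    unsigned n = 0;
    for (uint32_t a = start; a < end; a += 1u << LINE_SHIFT)
        n++;
    return n;
}

/* Core side: read rS (step 303), compute the end address (step 304), and
 * issue the command to the controller (steps 305-306). */
static unsigned issue_cache_insn(enum cache_op op, unsigned rs,
                                 uint32_t offset) {
    uint32_t start = regs[rs];
    uint32_t end   = start + (offset << LINE_SHIFT);
    return flush_range(op, start, end);
}
```

An offset of 4 with 8-byte lines covers a 32-byte range, i.e. four cache lines, matching the start/end arithmetic of the embodiment.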
In the embodiment of Fig. 3, the core 40 provides the start address, the offset value and the end address to the cache system; in other embodiments, however, the end address need not be computed by the core 40: the core provides only the operation code, the start address and the offset value to the cache system 42, and the cache controller 422 in the cache system then computes the end address.
The foregoing are merely preferred embodiments of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope of the present invention.

Claims (14)

1. A method for clearing the cache of a processor, wherein the cache includes a plurality of cache lines and each cache line includes a plurality of sections, the method comprising:
generating a specific instruction according to a request, the specific instruction comprising an operation code, a first field and a second field, the operation code including a writeback instruction;
obtaining an offset value and a start address according to the first field and the second field;
selecting a designated section of the cache according to the start address and the offset value; and
in response to the writeback instruction, writing the data stored in the designated section back to a memory.
2. The method according to claim 1, wherein selecting the designated section according to the start address and the offset value comprises:
computing an end address from the first field and the second field; and
determining the designated section according to the start address, the offset value and the end address.
3. The method according to claim 1, wherein the request comes from software.
4. The method according to claim 1, wherein the specific instruction comprises an invalidate instruction.
5. The method according to claim 1, wherein the first field and the second field sequentially follow the operation code, and the second field points to a register.
6. The method according to claim 1, wherein generating the specific instruction according to the request comprises decoding the request to produce the specific instruction.
7. The method according to claim 1, wherein the offset value is the number of cache lines offset.
8. A processor, comprising:
a cache system, including a cache memory and a cache controller, wherein the cache memory includes a plurality of cache lines and each cache line includes a plurality of sections; and
a processor core, which generates a specific instruction according to a request, the specific instruction comprising an operation code, a first field and a second field, the operation code including a writeback instruction, the processor core obtaining an offset value and a start address according to the first field and the second field;
wherein the processor core sends the start address and the offset value to the cache controller; the cache controller selects a designated section of the cache memory according to the start address and the offset value; and, in response to the writeback instruction, the cache controller writes the data stored in the designated section back to a memory.
9. The processor according to claim 8, wherein the processor core further computes an end address from the offset value and the start address, and supplies the start address, the offset value and the end address to the cache controller.
10. The processor according to claim 8, wherein the cache controller computes an end address from the start address and the offset value, in order to determine the designated section.
11. The processor according to claim 8, wherein the request comes from software.
12. The processor according to claim 8, wherein the specific instruction comprises an invalidate instruction.
13. The processor according to claim 8, further comprising a plurality of registers, wherein the first field and the second field sequentially follow the operation code, and the second field points to one of the plurality of registers.
14. The processor according to claim 8, wherein the offset value is the number of cache lines offset.
CN201110448085.6A 2011-12-28 2011-12-28 Method for cleaning the cache of a processor and associated processor Active CN103186474B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110448085.6A CN103186474B (en) 2011-12-28 2011-12-28 Method for cleaning the cache of a processor and associated processor


Publications (2)

Publication Number Publication Date
CN103186474A CN103186474A (en) 2013-07-03
CN103186474B true CN103186474B (en) 2016-09-07

Family

ID=48677650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110448085.6A Active CN103186474B (en) 2011-12-28 2011-12-28 Method for cleaning the cache of a processor and associated processor

Country Status (1)

Country Link
CN (1) CN103186474B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107479860B (en) * 2016-06-07 2020-10-09 华为技术有限公司 Processor chip and instruction cache prefetching method
CN114385528A * 2020-10-16 2022-04-22 Realtek Semiconductor Corp Direct memory access controller, electronic device using the same, and method of operating the same

Citations (1)

Publication number Priority date Publication date Assignee Title
CN101533371A * 2008-03-12 2009-09-16 ARM Ltd Cache accessing using a micro tag

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US6734867B1 (en) * 2000-06-28 2004-05-11 Micron Technology, Inc. Cache invalidation method and apparatus for a graphics processing system
US8214598B2 (en) * 2009-12-22 2012-07-03 Intel Corporation System, method, and apparatus for a cache flush of a range of pages and TLB invalidation of a range of entries




Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant