CN103186474B - Method for cleaning the cache of a processor and associated processor - Google Patents

Method for cleaning the cache of a processor and associated processor

Info

Publication number
CN103186474B
CN103186474B (application CN201110448085.6A)
Authority
CN
China
Prior art keywords
cache
field
offset value
processor
instruction
Prior art date
Legal status
Active
Application number
CN201110448085.6A
Other languages
Chinese (zh)
Other versions
CN103186474A (en)
Inventor
卢彦儒
虞敬业
林振东
黄朝玮
Current Assignee
Realtek Semiconductor Corp
Original Assignee
Realtek Semiconductor Corp
Application filed by Realtek Semiconductor Corp filed Critical Realtek Semiconductor Corp
Priority to CN201110448085.6A
Publication of CN103186474A
Application granted
Publication of CN103186474B
Legal status: Active


Abstract

The present invention relates to a method for clearing the cache of a processor, and to the processor itself. The method includes generating a specific instruction according to a request, the specific instruction comprising an operation code, a first field and a second field; obtaining an offset value and a start address according to the first field and the second field; selecting a designated section of the cache according to the start address and the offset value; and clearing the data stored in the designated section. The specific instruction may be a Writeback, an Invalidate, or a Writeback+Invalidate instruction.

Description

Method for cleaning the cache of a processor and associated processor
Technical field
The present invention relates to a method for clearing a cache, and in particular to a method for clearing a specified section of a processor's cache.
Background art
A cache is a type of memory whose access speed is higher than that of general random-access memory. Unlike the main system memory (main memory), it is generally implemented not with DRAM but with the more expensive yet faster SRAM technology. Referring to Fig. 1, because the execution speed of the processor (CPU) 10 is much higher than the read speed of the main memory 12, the processor 10 must wait several clock cycles when accessing data in the main memory 12, wasting processing efficiency. Therefore, when accessing data, the core 102 of the processor 10 first looks in the cache 104; if the required data has been kept in the cache 104 as a result of a previous operation, the processor 10 need not read it from the main memory 12 and can obtain the desired data directly from the cache 104, thereby raising the access speed and achieving better performance.
The CPU cache was once a high-end technique used only on supercomputers, but the microprocessors used in modern computers all integrate on-chip data caches and instruction caches of various sizes, commonly called the L1 cache (Level 1 on-die cache). The L2 cache, larger than the L1, was once placed outside the CPU, for example on the motherboard or the CPU adapter, but has now become a standard component inside the CPU; high-end desktop and workstation CPUs may even be equipped with a level-3 on-die cache (L3 cache) larger than the L2 cache.
The purpose of providing a cache is to match the data access speed to the processing speed of the CPU. To fully exploit the cache, modern caches no longer rely only on keeping recently accessed data: they also employ hardware branch prediction and data prefetching, fetching data that is about to be used from the main memory into the cache in advance, so as to raise the probability that the CPU finds the required data in the cache. Because cache capacity is limited, besides effectively prestoring the data the CPU needs, clearing the data stored in the cache at the right time is also particularly important. According to the needs of the system or software, the CPU can issue Writeback or Invalidate instructions to the cache. Referring to Fig. 1, when the core 102 performs a writeback operation on the cache 104, data originally stored in the cache 104 is written back to the main memory 12; when an invalidate operation is performed, the core 102 cleans all data in the cache 104. Usually a writeback instruction is issued together with an invalidate instruction, so that the whole cache is cleared after its data has been written back to the main memory 12. Early caches were tiny, only a few KB, so there was no need to consider how to clear a partial section; but caches today have grown to several MB, and how to clear the data of a particular section of the cache has become a new problem.
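The writeback and invalidate operations described above can be modeled in a few lines of C. This is only an illustrative sketch, not the patent's hardware: the line layout, sizes, and all names below are assumptions. Writeback copies a dirty line back to main memory; invalidate simply discards the cached copy.

```c
#include <string.h>
#include <stdint.h>

/* Toy model of one cache line and a small main memory.
 * Sizes and field names are illustrative assumptions. */
#define LINE_SIZE 8
#define MEM_SIZE  64

typedef struct {
    uint8_t  data[LINE_SIZE];
    uint32_t addr;    /* main-memory address this line mirrors */
    int      valid;
    int      dirty;
} cache_line_t;

static uint8_t      main_memory[MEM_SIZE];
static cache_line_t line;

/* Writeback: copy a dirty line back to main memory (core 102 -> memory 12) */
static void writeback(void) {
    if (line.valid && line.dirty) {
        memcpy(&main_memory[line.addr], line.data, LINE_SIZE);
        line.dirty = 0;
    }
}

/* Invalidate: discard the cached copy without touching main memory */
static void invalidate(void) {
    line.valid = 0;
}
```

As in the description, a combined Writeback+Invalidate would simply call the two in sequence.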
Hacking et al. proposed a solution in U.S. Patent No. 6,978,357; however, that clearing scheme has several restrictions: first, the size of the selected section must be a multiple of 2; second, only sections of fixed length can be cleared.
Summary of the invention
An object of the present invention is to propose an instruction format for selecting a section of a cache, and a method of selecting and clearing a section of a processor's cache accordingly.
Another object of the present invention is to propose a processor that executes an instruction format for selecting a section of its cache, and clears the selected section accordingly.
According to the present invention, a method for clearing the cache of a processor includes: generating a specific instruction according to a request, the specific instruction comprising an operation code, a first field and a second field; obtaining an offset value and a start address according to the first field and the second field; selecting a designated section of the cache according to the start address and the offset value; and clearing the data stored in the designated section.
According to the present invention, a processor includes: a cache system, including a cache and a cache controller; and a processor core, which generates a specific instruction according to a request, the specific instruction comprising an operation code, a first field and a second field, and which obtains an offset value and a start address according to the first field and the second field; wherein the processor core sends the start address and the offset value to the cache controller, and the cache controller selects a designated section of the cache according to the start address and the offset value and clears the data stored in the designated section.
With the instruction format proposed by the present invention, both the start address and the size of the section to be cleared are adjustable.
Accompanying drawing explanation
Fig. 1 is a schematic diagram of a prior-art processor architecture;
Fig. 2 is the instruction format proposed by the present invention;
Fig. 3 is a flowchart according to an embodiment of the invention; and
Fig. 4 is a schematic diagram of the processor architecture of the embodiment of Fig. 3.
Description of main component symbols
10 processor; 102 core
104 cache; 12 main memory
20 instruction; 22 OP field
24 offset field; 26 register field
40 core; 402 instruction fetch stage
404 instruction decode stage; 406 address-command generation and issue stage
42 cache system; 422 cache controller
424 data cache; 426 instruction cache
Detailed description of the invention
The present invention proposes a method for clearing the cache of a processor. Fig. 2 shows the proposed instruction format. In the instruction 20, the OP field 22 holds the specific operation, for example Writeback, Invalidate, or Writeback+Invalidate; the offset field 24 is for writing an offset value (offset); and the register field 26, denoted rS, points to a register representing a start address. Generally, a processor is provided with 32 registers, collectively called the register file. In the present embodiment, the register field 26 points to one of these 32 registers, whose stored value is 0x8000_0000; the core therefore takes 0x8000_0000 as the start address (starting address), and the end address is 0x8000_0000 + offset. The offset represented by the offset field 24 can be the number of cache lines offset.
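The three fields of the Fig. 2 instruction can be sketched as a bit layout in C. The patent defines the fields (OP, offset, rS) but not their widths, so the widths and positions below are purely illustrative assumptions:

```c
#include <stdint.h>

/* Hypothetical encoding of the Fig. 2 instruction.  OP selects Writeback,
 * Invalidate, or Writeback+Invalidate; offset counts cache lines; rS indexes
 * one of 32 registers holding the start address.  Field widths are assumed. */
enum cache_op { OP_WRITEBACK = 0, OP_INVALIDATE = 1, OP_WB_INV = 2 };

typedef struct {
    enum cache_op op;      /* OP field 22 */
    uint32_t      offset;  /* offset field 24 */
    unsigned      rs;      /* register field 26 */
} cache_insn_t;

/* assumed 32-bit word: op in bits [31:26], offset in [25:5], rs in [4:0] */
static uint32_t encode(enum cache_op op, uint32_t offset, unsigned rs) {
    return ((uint32_t)op << 26) | ((offset & 0x1FFFFFu) << 5) | (rs & 0x1Fu);
}

static cache_insn_t decode(uint32_t word) {
    cache_insn_t i;
    i.op     = (enum cache_op)((word >> 26) & 0x3F);
    i.offset = (word >> 5) & 0x1FFFFF;
    i.rs     = word & 0x1F;
    return i;
}
```

A round trip through `encode`/`decode` recovers the three fields, mirroring what the instruction decode stage would extract.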
For example, in the case where the cache line size is 8 bytes, when the register pointed to by the register field 26 holds the value 0000 and the offset is 0001, the end address is rS + offset = 0 byte + 1 (<<3) bytes = 8; the start address is then 0000 and the end address is 0008. According to the instruction in the OP field 22, the CPU writes the data stored at cache addresses 0000 to 0008 back to the main memory, or clears it. Thus, by changing the values of the offset field 24 and the register field 26, both the size and the start address of the selected section are adjustable.
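The arithmetic of this example can be checked with a one-line helper. This is a sketch: `line_shift` is an assumed parameterization of the line size (3 for the 8-byte lines of the example).

```c
#include <stdint.h>

/* End address = start + (offset << line_shift), where line_shift encodes
 * the cache line size: 3 means 8-byte lines, as in the example above. */
static uint32_t end_address(uint32_t start, uint32_t offset_lines,
                            unsigned line_shift) {
    return start + (offset_lines << line_shift);
}
```

With the example's values (start 0x0000, offset 1, 8-byte lines) this yields 0x0008; with the earlier register value 0x8000_0000 the same formula gives 0x8000_0000 plus the offset in bytes.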
Fig. 3 is a flowchart according to an embodiment of the invention, explained in conjunction with the processor architecture diagram of Fig. 4. As mentioned above, the processor consists of two parts, the core 40 and the cache system 42, and the processing of the core 40 is further divided into several stages, such as the instruction fetch stage (Instruction Fetch; IF) 402, the instruction decode stage (Instruction Decode; ID) 404, and finally the address-command generation and issue stage (Address-Command Generation & Issue) 406. In the present embodiment, after the start 301, step 302 is performed: the core 40 fetches an instruction in the instruction fetch stage 402 according to a request from software, decodes it in the instruction decode stage 404 to obtain the information of the offset value and the start address, determining the operation code and the values of the register field and the offset field that follow it, and then generates the complete command in the address-command generation and issue stage 406. In step 303, the core 40 obtains the start address from the register pointed to by the register field; then, in step 304, it computes the end address from the start address and the offset value. In step 305, the core 40 sends the operation code, the start address and the end address to the cache controller 422 of the cache system 42. In step 306, the cache controller 422 performs the operation corresponding to the operation code, for example writeback, invalidate, or writeback plus invalidate, on the section between the start address and the end address, and the procedure then ends at 307. The cache in the cache system 42 can be further divided into two parts, the data cache 424 and the instruction cache 426; the method proposed by the present invention is applicable to both kinds of cache at the same time, although the instruction cache 426 generally has no need to execute writeback instructions.
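The flow of steps 302 to 306 can be sketched end to end. Everything here is an illustrative assumption rather than the patent's hardware: the names, the line size, and a controller that merely counts the lines it would touch.

```c
#include <stdint.h>

#define LINE_SHIFT 3          /* assume 8-byte cache lines, as in the example */

enum cache_op { OP_WRITEBACK, OP_INVALIDATE, OP_WB_INV };

static uint32_t regs[32];     /* register file, read in step 303 */

/* Cache controller side (step 306): walk the range one cache line at a time
 * and return how many lines were covered; a real controller would write back
 * and/or invalidate each line instead of just counting. */
static unsigned flush_range(enum cache_op op, uint32_t start, uint32_t end) {
    (void)op;
    unsigned n = 0;
    for (uint32_t a = start; a < end; a += 1u << LINE_SHIFT)
        n++;
    return n;
}

/* Core side: read rS (step 303), compute the end address (step 304), and
 * issue the command to the controller (steps 305-306). */
static unsigned issue_cache_insn(enum cache_op op, unsigned rs,
                                 uint32_t offset) {
    uint32_t start = regs[rs];
    uint32_t end   = start + (offset << LINE_SHIFT);
    return flush_range(op, start, end);
}
```

An offset of 4 with 8-byte lines covers a 32-byte range, i.e. four cache lines, matching the start/end arithmetic of the embodiment.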
In the embodiment of Fig. 3, the core 40 provides the start address, the offset value and the end address to the cache system; in other embodiments, however, the end address need not be computed by the core 40: the core provides only the operation code, the start address and the offset value to the cache system 42, and the cache controller 422 in the cache system then computes the end address.
The foregoing are merely preferred embodiments of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope of the present invention.

Claims (14)

1. A method for clearing the cache of a processor, wherein the cache includes a plurality of cache lines and each cache line includes a plurality of sections, the method comprising:
generating a specific instruction according to a request, the specific instruction comprising an operation code, a first field and a second field, the operation code including a writeback instruction;
obtaining an offset value and a start address according to the first field and the second field;
selecting a designated section of the cache according to the start address and the offset value; and
in response to the writeback instruction, writing the data stored in the designated section back to a memory.
2. The method according to claim 1, wherein selecting the designated section according to the start address and the offset value comprises:
computing an end address from the first field and the second field; and
determining the designated section according to the start address, the offset value and the end address.
3. The method according to claim 1, wherein the request comes from software.
4. The method according to claim 1, wherein the specific instruction comprises an invalidate instruction.
5. The method according to claim 1, wherein the first field and the second field sequentially follow the operation code, and the second field points to a register.
6. The method according to claim 1, wherein generating the specific instruction according to the request comprises decoding the request to produce the specific instruction.
7. The method according to claim 1, wherein the offset value is the number of cache lines offset.
8. A processor, comprising:
a cache system, including a cache memory and a cache controller, wherein the cache memory includes a plurality of cache lines and each cache line includes a plurality of sections; and
a processor core, which generates a specific instruction according to a request, the specific instruction comprising an operation code, a first field and a second field, the operation code including a writeback instruction, the processor core obtaining an offset value and a start address according to the first field and the second field;
wherein the processor core sends the start address and the offset value to the cache controller; the cache controller selects a designated section of the cache memory according to the start address and the offset value; and, in response to the writeback instruction, the cache controller writes the data stored in the designated section back to a memory.
9. The processor according to claim 8, wherein the processor core further computes an end address from the offset value and the start address, and supplies the start address, the offset value and the end address to the cache controller.
10. The processor according to claim 8, wherein the cache controller computes an end address from the start address and the offset value, in order to determine the designated section.
11. The processor according to claim 8, wherein the request comes from software.
12. The processor according to claim 8, wherein the specific instruction comprises an invalidate instruction.
13. The processor according to claim 8, further comprising a plurality of registers, wherein the first field and the second field sequentially follow the operation code, and the second field points to one of the plurality of registers.
14. The processor according to claim 8, wherein the offset value is the number of cache lines offset.
CN201110448085.6A 2011-12-28 2011-12-28 Method for cleaning the cache of a processor and associated processor Active CN103186474B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110448085.6A CN103186474B (en) 2011-12-28 2011-12-28 Method for cleaning the cache of a processor and associated processor


Publications (2)

Publication Number Publication Date
CN103186474A CN103186474A (en) 2013-07-03
CN103186474B true CN103186474B (en) 2016-09-07

Family

ID=48677650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110448085.6A Active CN103186474B (en) 2011-12-28 2011-12-28 Method for cleaning the cache of a processor and associated processor

Country Status (1)

Country Link
CN (1) CN103186474B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107479860B (en) * 2016-06-07 2020-10-09 华为技术有限公司 Processor chip and instruction cache prefetching method
CN114385528A * 2020-10-16 2022-04-22 Realtek Semiconductor Corp Direct memory access controller, electronic device using the same, and method of operating the same

Citations (1)

Publication number Priority date Publication date Assignee Title
CN101533371A * 2008-03-12 2009-09-16 ARM Ltd Cache accessing using a micro tag

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US6734867B1 (en) * 2000-06-28 2004-05-11 Micron Technology, Inc. Cache invalidation method and apparatus for a graphics processing system
US8214598B2 (en) * 2009-12-22 2012-07-03 Intel Corporation System, method, and apparatus for a cache flush of a range of pages and TLB invalidation of a range of entries




Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant