CN101887360A - Data prefetcher and data prefetching method for a microprocessor - Google Patents

Publication number
CN101887360A
Authority
CN
China
Prior art keywords
line taking
stride
fast line
address
written
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201010220151XA
Other languages
Chinese (zh)
Inventor
约翰·M·吉尔
罗德尼·E·虎克
艾伯特·J·娄坡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/573,462 (US20110010506A1)
Application filed by Via Technologies Inc
Publication of CN101887360A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/34: Addressing or accessing the instruction operand or the result; Formation of operand address; Addressing modes
    • G06F9/345: Addressing or accessing the instruction operand or the result; Formation of operand address; Addressing modes of multiple operands or results
    • G06F9/3455: Addressing or accessing the instruction operand or the result; Formation of operand address; Addressing modes of multiple operands or results using stride

Abstract

A data prefetcher for a microprocessor, and a corresponding method. The data prefetcher comprises a table of entries that maintains a history of load operations. Each entry stores a tag and a corresponding next stride. The tag comprises a concatenated first stride and second stride, and the next stride comprises the first stride. The first stride is obtained by subtracting a first cache line address from a second cache line address. The second stride is obtained by subtracting the second cache line address from a third cache line address. The first, second and third cache line addresses are each the memory address of a cache line specified by a first, second and third preceding load operation, respectively. Control logic calculates a current stride by subtracting a previous cache line address from the cache line address of a new load; looks up the concatenated previous stride and current stride in the table; and prefetches a cache line using the next stride of the table entry that hits.

Description

Data prefetcher and data prefetching method for a microprocessor
Technical field
The present invention relates to the field of microprocessors, and in particular to prefetching mechanisms in microprocessors.
Background
Data prefetching in microprocessors is a well-known concept. Briefly, a microprocessor detects that a program is streaming through consecutive memory addresses and prefetches ahead of the stream. However, a program's accesses are not always to consecutive memory locations; often a fixed amount of data is skipped between accesses. This fixed distance is commonly called the "stride", and the program loads data at intervals of the stride. Stride-detecting prefetchers are also well known. Conventional stride-detecting prefetchers are based on a single stride distance, but the inventors have observed important programs that access data in a regular pattern that does not follow a single stride. Conventional stride-detecting prefetchers therefore cannot accurately predict the load addresses of such programs.
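To make the limitation concrete, the following minimal sketch models a single-stride predictor that guesses the next cache line as the current line plus the most recent stride. This model is an assumption chosen for illustration; the source does not describe how any particular conventional prefetcher is built. The function name `last_stride_predictions` and the example sequence are likewise hypothetical.

```python
def last_stride_predictions(lines):
    """Guess each next cache line as current + (current - previous).

    Illustrative single-stride model only (an assumption, not taken
    from the patent). lines is a list of cache line numbers.
    """
    predictions = []
    for previous, current in zip(lines, lines[1:]):
        predictions.append(current + (current - previous))
    return predictions


# A regular access pattern whose strides alternate 1, 3, 1, 3, 1.
lines = [0, 1, 4, 5, 8, 9]
predictions = last_stride_predictions(lines)

# predictions[i] is the guess for lines[i + 2]; count correct guesses.
hits = sum(p == actual for p, actual in zip(predictions, lines[2:]))
```

Under this model every guess for the alternating pattern is wrong, even though the pattern itself is perfectly regular.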
Summary of the invention
One feature of embodiments of the present invention is a data prefetcher. The data prefetcher comprises a table having a plurality of entries that maintains a history of load operations. Each entry stores a tag and a corresponding next stride. The tag comprises a concatenated first stride and second stride, and the next stride comprises the first stride. The first stride is obtained by subtracting a first cache line address from a second cache line address. The second stride is obtained by subtracting the second cache line address from a third cache line address. The first, second and third cache line addresses are each the memory address of a cache line specified by a first, second and third preceding load operation, respectively. The data prefetcher also comprises control logic coupled to the table. The control logic calculates a current stride by subtracting a previous cache line address from the cache line address of a new load, looks up the concatenated previous stride and current stride in the table, and prefetches a cache line at a prefetch cache line address. The prefetch cache line address is the sum of the new load cache line address and the next stride of the table entry hit by the concatenated previous stride and current stride. The new load cache line address is the memory address of the cache line specified by the new load operation. The previous cache line address is the memory address of the cache line specified by the load operation preceding the new load operation. The previous stride is obtained by subtracting the previous cache line address from the previous-previous cache line address. The previous-previous cache line address is the memory address of the cache line specified by the load operation two before the new load operation.
Another feature of the present invention is a method of prefetching data in a microprocessor. The method comprises maintaining the entries of a table according to a history of load operations. Each entry stores a tag and a corresponding next stride. The tag comprises a concatenated first stride and second stride, and the next stride comprises the first stride. The first stride is obtained by subtracting a first cache line address from a second cache line address. The second stride is obtained by subtracting the second cache line address from a third cache line address. The first, second and third cache line addresses are each the memory address of a cache line specified by a first, second and third preceding load operation, respectively. The method also comprises subtracting a previous cache line address from the cache line address of a new load to calculate a current stride. The new load cache line address is the memory address of the cache line specified by the new load operation. The previous cache line address is the memory address of the cache line specified by the load operation preceding the new load operation. The method also comprises looking up the concatenated previous stride and current stride in the table. The previous stride is obtained by subtracting the previous cache line address from the previous-previous cache line address. The previous-previous cache line address is the memory address of the cache line specified by the load operation two before the new load operation. The method also comprises prefetching a cache line at a prefetch cache line address. The prefetch cache line address is the sum of the new load cache line address and the next stride of the table entry hit by the concatenated previous stride and current stride.
Description of drawings
Fig. 1 is a block diagram of a microprocessor according to the present invention.
Fig. 2 is a block diagram of the data prefetch engine of Fig. 1.
Fig. 3 is a block diagram of a stream hardware set of Fig. 2.
Fig. 4a and Fig. 4b are a flowchart of the operation of the data prefetch engine of Fig. 2.
Fig. 5 is a table illustrating the operation of the data prefetch engine of Fig. 2.
[Description of main reference numerals]
100 microprocessor
102 instruction fetch stage
104 instruction decode stage
106 operand fetch stage
108 execution stage
112 result write-back/instruction retire stage
114 memory subsystem
122 cache memory
124 data prefetch engine
126 load unit
128 store unit
130 bus interface
202 stream hardware set
204 stream base address
206 control logic
208 load address
212 set select signal
216 final stride prediction
218 prefetch address
222 adder
224 multiplexer
228 stride prediction
302 table
304 stream base address register
306 previous cache line address register
308 current cache line address register
312 previous stride register
314 current stride register
316 load counter
322 previous stride subfield
324 current stride subfield
326 next stride field
332 hit signal
402-436 steps
500 table
Embodiments
The present embodiment provides a two-level table for stride prediction. When a program accesses data in a regular pattern that does not follow a single stride, the table improves the accuracy with which the microprocessor predicts load addresses.
Referring to Fig. 1, a block diagram of a microprocessor 100 according to the present invention is shown. The microprocessor 100 comprises an instruction fetch stage 102, an instruction decode stage 104, an operand fetch stage 106, an execution stage 108 and a result write-back/instruction retire stage 112. Each of these stages may itself comprise multiple stages. In one embodiment, the microprocessor 100 is a superscalar out-of-order execution, in-order retirement microprocessor. The microprocessor 100 also comprises a bus interface 130 that connects the microprocessor 100 to an external bus in order to access system memory and peripheral devices. The microprocessor 100 also comprises a memory subsystem 114, which comprises one or more cache memories 122, a data prefetch engine 124, a load unit 126 and a store unit 128.
Referring to Fig. 2, a block diagram of the data prefetch engine 124 of Fig. 1 is shown. The data prefetch engine 124 comprises a plurality of stream hardware sets 202 coupled to control logic 206. The stream hardware sets 202 receive the load address 208 specified by a load operation generated by other elements of the microprocessor. In one embodiment, the load address 208 is a 36-bit physical address, the stream or memory region is a 4 KB page, and the cache line size is 64 bytes. Thus bits [35:12] identify the page, bits [11:6] identify the cache line within the page, and bits [5:0] are the offset within the cache line. Accordingly, the stream base address (SBA) 304 (shown in Fig. 3) corresponds to bits [35:12] of the physical address, while the previous cache line address (PCLA) 306, current cache line address (CCLA) 308, previous stride (PS) 312 and current stride (CS) 314 (shown in Fig. 3) all correspond to bits [11:6] of the physical address. In other embodiments, depending on practical considerations, streams or memory regions of sizes other than a 4 KB page may be used (for example 2 MB pages, or arbitrary regions defined by microcode according to the memory type range registers (MTRR) or the page attribute table (PAT)), and cache lines of a different size may be used.
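The bit-field split described for the 36-bit, 4 KB page, 64-byte line embodiment can be sketched as follows. The function name `split_load_address` and the constant names are illustrative assumptions; only the bit ranges come from the text.

```python
PAGE_SHIFT = 12   # 4 KB page: bits [35:12] identify the page
LINE_SHIFT = 6    # 64-byte cache line: bits [11:6] identify the line
LINE_MASK = 0x3F  # six bits for the line within the page
OFFSET_MASK = 0x3F  # six bits for the byte offset within the line


def split_load_address(address):
    """Split a 36-bit physical load address into (page, line, offset)."""
    if not 0 <= address < (1 << 36):
        raise ValueError("expected a 36-bit physical address")
    page = address >> PAGE_SHIFT                 # bits [35:12]
    line = (address >> LINE_SHIFT) & LINE_MASK   # bits [11:6]
    offset = address & OFFSET_MASK               # bits [5:0]
    return page, line, offset


page, line, offset = split_load_address(0x123456789)
```

In this sketch the page value plays the role of the stream base address 204, and the line value is the quantity held in registers 306 and 308 and used to form strides.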
Each stream hardware set 202 provides its stream base address 204 to the control logic 206. The control logic 206 compares each stream base address 204 with the load address 208 to generate the value of a set select signal (S) 212, which indicates the stream hardware set 202 whose stream base address 204 matches the load address 208. The set select signal 212 is provided to a multiplexer 224, which receives a stride prediction 228 (shown in Fig. 3) from each stream hardware set 202. According to the set select signal 212, the multiplexer 224 selects one of the stride predictions 228 as the final stride prediction 216. An adder 222 adds the final stride prediction 216 to the load address 208 to produce the prefetch address 218.
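The comparator, multiplexer 224 and adder 222 path can be sketched as below. This is a minimal software model under stated assumptions: each stream hardware set is represented by a plain dictionary with hypothetical keys `base` and `stride_prediction`, and addresses are expressed in cache line units (the load address with its low six bits dropped) so that the stride can be added directly.

```python
def select_and_prefetch(load_page, load_line, stream_sets):
    """Return (set_select, prefetch_line) for one load.

    stream_sets is a list of dicts, each with a 'base' (page number,
    modelling stream base address 204) and a 'stride_prediction'
    (modelling signal 228, or None when the set has no prediction).
    """
    for set_select, stream in enumerate(stream_sets):
        if stream["base"] == load_page:          # comparator in control logic 206
            final_stride = stream["stride_prediction"]  # multiplexer 224
            if final_stride is None:
                return set_select, None
            return set_select, load_line + final_stride  # adder 222
    return None, None                            # no matching stream hardware set


stream_sets = [
    {"base": 0x10, "stride_prediction": 3},
    {"base": 0x20, "stride_prediction": 1},
]
set_select, prefetch_line = select_and_prefetch(0x20, 8, stream_sets)
```

A load that matches no stream base address returns no selection here; in the text that case leads to allocating a stream hardware set, which this sketch leaves out.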
Referring to Fig. 3, a block diagram of a stream hardware set 202 of Fig. 2 is shown. The stream hardware set 202 comprises a stream base address register 304, a previous cache line address register 306, a current cache line address register 308, a previous stride register 312, a current stride register 314, a load counter 316 and a table 302. The table 302 is a content-addressable memory (CAM). Each entry of the table 302 comprises a tag field and a data field. The tag field is the concatenated previous stride subfield 322 and current stride subfield 324. The data field is the next stride (NS) field 326. When the stream hardware set 202 is ready to make a stride prediction, it looks up the concatenated previous stride and current stride in the table 302. If a matching valid tag is found, the hit signal 332 is output with a true value; otherwise it is output with a false value. On a hit, the table 302 outputs the value of the next stride field 326 of the matching entry as the stride prediction 228.
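A software stand-in for the lookup in table 302 might look like the sketch below. A real CAM compares all tags in parallel; the linear scan here is only a functional model. The `Entry` class and `lookup` function are illustrative names, not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One entry of table 302: a two-stride tag plus the next stride."""
    valid: bool = False
    prev_stride: int = 0   # previous stride subfield 322
    cur_stride: int = 0    # current stride subfield 324
    next_stride: int = 0   # next stride field 326


def lookup(table, prev_stride, cur_stride):
    """Return (hit, stride_prediction) for the concatenated tag."""
    for entry in table:
        tag_matches = (entry.prev_stride, entry.cur_stride) == (prev_stride, cur_stride)
        if entry.valid and tag_matches:
            return True, entry.next_stride    # hit signal 332 true, prediction 228
    return False, None                        # hit signal 332 false


table = [Entry(True, 1, 1, 1), Entry(True, 1, 3, 1), Entry()]
hit, prediction = lookup(table, 1, 3)
```

The valid bit matters: an unused entry whose fields happen to equal the looked-up strides must not produce a hit.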
Referring to Fig. 4a and Fig. 4b, a flowchart of the operation of the data prefetch engine 124 of Fig. 2 is shown. The flow begins at block 402.
At block 402, the data prefetch engine 124 receives the load address 208 specified by a load operation, as shown in Fig. 2. The flow then proceeds to decision block 404.
At decision block 404, comparators of the control logic 206 compare bits [35:12] of the load address 208 with the stream base address 204 provided by the stream base address register 304 of each stream hardware set 202. A match means that a stream hardware set 202 has already been allocated to the stream (that is, the memory region, such as a page) indicated by the load address 208, and the flow proceeds to block 406; otherwise, the flow proceeds to block 408.
At block 406, the control logic 206 outputs the index (signal S) of the matching stream hardware set 202, which is used to predict the stride of the next load from this memory region. In addition, the control logic 206 increments the load counter 316 of the already allocated stream hardware set 202. The flow then proceeds to block 412.
At block 408, the control logic 206 allocates a stream hardware set 202 (in one embodiment, the least recently used one) and outputs the index (signal S) of the newly allocated stream hardware set 202, which is used to predict the stride of the next load from the memory region. In addition, the control logic 206 clears the load counter 316 of the newly allocated stream hardware set 202. The flow then proceeds to block 412.
At block 412, the stream hardware set 202 loads the value of the current cache line address register 308 into the previous cache line address register 306. The flow then proceeds to block 414.
At block 414, the stream hardware set 202 loads the load address 208 into the current cache line address register 308. The flow then proceeds to decision block 416.
At decision block 416, the stream hardware set 202 determines whether the value of the load counter 316 is 1, that is, whether this is the second load from this memory region seen by the stream hardware set 202. (Blocks 416 and 422 are an optimization that lets the data prefetch engine 124 predict the stride accurately after fewer loads when the program loads with a constant stride, for example 3, 3, 3. Other embodiments may omit this optimization.) If the value of the load counter 316 equals 1, the flow proceeds to block 422; otherwise, the flow proceeds to block 418.
At block 418, the stream hardware set 202 loads the value of the current stride register 314 into the previous stride register 312, and loads the difference between the current cache line address register 308 and the previous cache line address register 306 into the current stride register 314. The flow then proceeds to block 424.
At block 422, the stream hardware set 202 loads the difference between the current cache line address register 308 and the previous cache line address register 306 into both the current stride register 314 and the previous stride register 312. The flow then proceeds to block 424.
At block 424, the stream hardware set 202 looks up the concatenated values of the previous stride register 312 and the current stride register 314 in the table 302. The flow then proceeds to decision block 426.
At decision block 426, the control logic 206 examines the hit signal 332 to determine whether the lookup performed at block 424 hit. If so, the flow proceeds to block 428; otherwise, the flow proceeds to block 432.
At block 428, the stream hardware set 202 outputs, as the stride prediction 228, the value of the next stride field 326 of the table 302 entry found to hit at decision block 426. The flow ends at block 428.
At block 432, the stream hardware set 202 allocates a new entry in the table 302. In one embodiment, the entries of the table 302 are allocated in first-in-first-out (FIFO) order. The flow then proceeds to block 434.
At block 434, the stream hardware set 202 loads the concatenated values of the previous stride register 312 and the current stride register 314 into the tag field (that is, the previous stride subfield 322 and the current stride subfield 324) of the newly allocated entry. The flow then proceeds to block 436.
At block 436, the stream hardware set 202 fills the data field (that is, the next stride field 326) of the newly allocated entry with the value of the previous stride register 312. The flow ends at block 436.
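Blocks 406 through 436 can be modelled for a single stream hardware set 202 as in the sketch below. It is an illustration under several assumptions: addresses are cache line numbers (bits [11:6]), the table size of four entries is invented, the table is an ordered dictionary standing in for a FIFO-replaced CAM, and the table is left untouched on the very first load of a region, following the description of Fig. 5 rather than a literal reading of the flowchart. Class and method names are hypothetical.

```python
from collections import OrderedDict


class StreamHardwareSet:
    """Software model of one stream hardware set 202 (illustrative only)."""

    def __init__(self, table_entries=4):      # table size is an assumption
        self.table_entries = table_entries
        self.pcla = 0          # previous cache line address register 306
        self.ccla = 0          # current cache line address register 308
        self.ps = 0            # previous stride register 312
        self.cs = 0            # current stride register 314
        self.load_counter = 0  # load counter 316
        self.table = OrderedDict()  # table 302: (ps, cs) -> next stride

    def load(self, line, newly_allocated=False):
        """Process one load; return the stride prediction or None on a miss."""
        if newly_allocated:
            self.load_counter = 0          # block 408
        else:
            self.load_counter += 1         # block 406
        self.pcla = self.ccla              # block 412
        self.ccla = line                   # block 414
        stride = self.ccla - self.pcla
        if self.load_counter == 1:         # decision block 416
            self.ps = self.cs = stride     # block 422
        else:
            self.ps, self.cs = self.cs, stride   # block 418
        if newly_allocated:
            return None                    # first load: no valid stride yet
        tag = (self.ps, self.cs)           # block 424
        if tag in self.table:              # decision block 426
            return self.table[tag]         # block 428
        if len(self.table) >= self.table_entries:
            self.table.popitem(last=False)       # block 432: FIFO replacement
        self.table[tag] = self.ps          # blocks 434 and 436
        return None


stream = StreamHardwareSet()
predictions = [stream.load(line, newly_allocated=(i == 0))
               for i, line in enumerate([0, 1, 4, 5, 8])]
```

Feeding the cache line numbers 00, 01, 04, 05 and 08 through this model yields a miss for the first four loads and a predicted stride of 1 for the fifth, so the line to prefetch would be 09.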
Referring to table 500 of Fig. 5, the operation of the data prefetch engine 124 of Fig. 2 is illustrated for an example sequence of loads. Each row of table 500 indicates, in time order, the next incoming load address 208 (only the cache line number, that is, bits [11:6], is shown rather than the entire load address 208). In this example, the cache line numbers indicated by the load addresses 208 are, in order, 00, 01, 04, 05 and 08, which exhibit a two-level stride pattern of 01, 03, 01, 03. The present invention can predict load accesses with such a multi-level stride pattern. Each row of table 500 shows the contents of the previous cache line address register 306, the current cache line address register 308, the previous stride register 312, the current stride register 314 and the table 302 after the stream hardware set 202 has processed the load address 208. Each row of table 500 also shows the values of the hit signal 332 and the stride prediction 228. To simplify the illustration, the sequence of Fig. 5 assumes that all load addresses 208 fall in the same memory region and therefore select the same stream hardware set 202.
The first row of table 500 indicates the initial values of the stream hardware set 202. The previous cache line address register 306, current cache line address register 308, previous stride register 312 and current stride register 314 are all initialized to 0, and all entries of the table 302 are marked invalid.
The second row of table 500 indicates a load address 208 with the value 00. Block 408 is performed to allocate a new stream hardware set 202, and blocks 412, 414 and 418 are performed to update the previous cache line address register 306, current cache line address register 308, previous stride register 312 and current stride register 314, each to 0. Because this is the first load from the memory region, the lookup performed at block 424 misses. On the first load from a memory region the table 302 is not updated, because there is no valid value in the previous cache line address register 306 from which to compute a current stride.
The third row of table 500 indicates a load address 208 with the value 01. Blocks 412, 414 and 422 are performed to update the previous cache line address register 306, current cache line address register 308, previous stride register 312 and current stride register 314 to 00, 01, 01 and 01, respectively (block 422 loads the same difference into both stride registers). The lookup of [01:01] at block 424 misses. The stream hardware set 202 therefore performs blocks 432, 434 and 436 to allocate an entry in the table 302 and fill its previous stride subfield 322, current stride subfield 324 and next stride field 326 with 01, 01 and 01, respectively.
The fourth row of table 500 indicates a load address 208 with the value 04. Blocks 412, 414 and 418 are performed to update the previous cache line address register 306, current cache line address register 308, previous stride register 312 and current stride register 314 to 01, 04, 01 and 03, respectively. The lookup of [01:03] at block 424 misses. The stream hardware set 202 therefore performs blocks 432, 434 and 436 to allocate an entry in the table 302 and fill its previous stride subfield 322, current stride subfield 324 and next stride field 326 with 01, 03 and 01, respectively.
The fifth row of table 500 indicates a load address 208 with the value 05. Blocks 412, 414 and 418 are performed to update the previous cache line address register 306, current cache line address register 308, previous stride register 312 and current stride register 314 to 04, 05, 03 and 01, respectively. The lookup of [03:01] at block 424 misses. The stream hardware set 202 therefore performs blocks 432, 434 and 436 to allocate an entry in the table 302 and fill its previous stride subfield 322, current stride subfield 324 and next stride field 326 with 03, 01 and 03, respectively.
The sixth row of table 500 indicates a load address 208 with the value 08. Blocks 412, 414 and 418 are performed to update the previous cache line address register 306, current cache line address register 308, previous stride register 312 and current stride register 314 to 05, 08, 01 and 03, respectively. The lookup of [01:03] at block 424 hits, because it matches the second entry of the table 302. The stream hardware set 202 therefore performs block 428 and outputs the value of the next stride field 326 of the hitting table 302 entry (01 in this example) as the stride prediction 228. The data prefetch engine 124 then prefetches the cache line specified by the prefetch address 218, which equals the load address 208 plus the final stride prediction 216 (01 in this example). Prefetching in this way can reduce or eliminate the time later spent loading that cache line, saving considerable time.
In other embodiments, a hit in the table 302 may trigger the prefetching of multiple cache lines by following the sequence indicated by the matching entries of the table 302. For example, the hit in the sixth row of Fig. 5 would trigger not only a prefetch of the cache line at stride 01, but also, in turn, prefetches of the cache lines at strides 03, 01, 03 and so on. The number of cache line prefetches triggered may vary according to the capacity of the cache memory 122 of Fig. 1, the capacity of the stream hardware set 202 and the table 302, or other factors.
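One way to picture this multi-line variant is to keep re-querying the table with the stride pair shifted forward by each predicted stride. The sketch below assumes the table is a plain dictionary mapping (previous stride, current stride) to next stride, that addresses are cache line numbers, and that the `depth` limit is a free parameter; the function name `follow_chain` is hypothetical and the source does not specify how the sequence is walked in hardware.

```python
def follow_chain(table, prev_stride, cur_stride, load_line, depth):
    """Return up to `depth` cache line numbers to prefetch after a hit.

    table maps (previous stride, current stride) -> next stride.
    The walk stops early at the first lookup that misses.
    """
    prefetch_lines = []
    line = load_line
    for _ in range(depth):
        next_stride = table.get((prev_stride, cur_stride))
        if next_stride is None:
            break
        line += next_stride
        prefetch_lines.append(line)
        # Shift the stride history as if the predicted load had occurred.
        prev_stride, cur_stride = cur_stride, next_stride
    return prefetch_lines


# Table contents after the fifth load of the Fig. 5 example.
table = {(1, 1): 1, (1, 3): 1, (3, 1): 3}
lines_to_prefetch = follow_chain(table, 1, 3, 8, depth=4)
```

With the table state from the Fig. 5 example, the walk starting at cache line 08 produces successive strides of 01, 03, 01 and 03.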
Although the foregoing embodiment maintains and compares only two strides in the history table, other embodiments may maintain and compare more strides in order to handle more complex program access patterns.
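A longer stride history could be sketched as follows. This is a speculative generalization, not something the source spells out: the tag is the tuple of the last `levels` strides, and on a miss the entry's next stride is filled with the oldest stride in the tag, by analogy with the two-stride case. All names are hypothetical, and table capacity and replacement are ignored.

```python
from collections import deque


class MultiStrideHistory:
    """Stride predictor keyed on the last `levels` strides (illustrative)."""

    def __init__(self, levels):
        self.strides = deque(maxlen=levels)  # most recent strides, oldest first
        self.table = {}                      # tuple of strides -> next stride
        self.prev_line = None

    def load(self, line):
        """Process one cache line number; return the line to prefetch or None."""
        if self.prev_line is not None:
            self.strides.append(line - self.prev_line)
        self.prev_line = line
        if len(self.strides) < self.strides.maxlen:
            return None                      # not enough history yet
        tag = tuple(self.strides)
        if tag in self.table:
            return line + self.table[tag]    # hit: prefetch line
        self.table[tag] = tag[0]             # miss: assume the pattern repeats
        return None


history = MultiStrideHistory(levels=3)
results = [history.load(line) for line in [0, 1, 3, 8, 9, 11, 16]]
```

For a pattern whose strides repeat with period three (1, 2, 5, 1, 2, 5), this model first predicts on the seventh load, where it proposes cache line 17.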
Those skilled in the art will appreciate that the various embodiments disclosed above are presented by way of example and not limitation, and that changes in form and detail may be made without departing from the spirit of the present invention. For example, software may be used to implement the function, fabrication, modeling, simulation, description and/or testing of the disclosed apparatus and methods. This may be done using general programming languages (such as C or C++), hardware description languages (HDL, including Verilog HDL, VHDL and so on) or other suitable programming languages. Such software may be stored on any known computer-usable storage medium, such as semiconductor memory, magnetic tape or optical disc (for example CD-ROM, DVD-ROM and so on). The disclosed apparatus and methods may be included in a semiconductor intellectual property core (IP core), such as a microprocessor core described in HDL, and converted into hardware when integrated circuits are manufactured. In addition, the disclosed apparatus and methods may be implemented as a combination of hardware and software. Accordingly, the present invention is not limited to any of the exemplary embodiments in this specification and should be defined only by the following claims. More specifically, the present invention may be implemented in a microprocessor device that may be used in a general-purpose computer. Modifications made by those skilled in the art on the basis of the disclosed concepts and embodiments fall within the scope defined by the claims.

Claims (12)

1. A data prefetcher for a microprocessor, comprising:
a table having a plurality of entries for maintaining a history of load operations, each entry storing a tag and a corresponding next stride, wherein the tag comprises a concatenated first stride and second stride, wherein the next stride comprises the first stride, wherein the first stride is obtained by subtracting a first cache line address from a second cache line address, wherein the second stride is obtained by subtracting the second cache line address from a third cache line address, and wherein the first, second and third cache line addresses each comprise the memory address of a cache line specified by a first, second and third preceding load operation, respectively; and
control logic, coupled to the table, configured to: calculate a current stride by subtracting a previous cache line address from a new load cache line address; look up a concatenated previous stride and the current stride in the table; and prefetch a cache line at a prefetch cache line address, the prefetch cache line address being the sum of the new load cache line address and the next stride of the table entry hit by the concatenated previous stride and current stride;
wherein the new load cache line address comprises the memory address of a cache line specified by a new load operation, wherein the previous cache line address comprises the memory address of a cache line specified by the load operation preceding the new load operation, wherein the previous stride is obtained by subtracting the previous cache line address from a previous-previous cache line address, and wherein the previous-previous cache line address comprises the memory address of a cache line specified by the load operation two before the new load operation.
2. The data prefetcher for a microprocessor according to claim 1, wherein, if the lookup of the concatenated previous stride and current stride misses in the table, the control logic allocates an entry in the table, fills the tag of the allocated entry with the concatenated previous stride and current stride, and fills the next stride of the allocated entry with the previous stride.
3. The data prefetcher for a microprocessor according to claim 1, further comprising:
a plurality of tables for maintaining histories of load operations corresponding to a plurality of memory regions;
wherein, when the cache line indicated by the new load address is determined not to be within the memory region corresponding to any of the plurality of tables, the control logic allocates one of the plurality of tables; and
wherein, when one of the plurality of tables is determined to be allocated to a memory region that contains the cache line indicated by the new load address, the control logic uses the allocated table to perform the lookup.
4. The data prefetcher for a microprocessor according to claim 1, further comprising:
a counter that is incremented on each new load operation;
wherein, when the value of the counter is 1 after being incremented, the control logic calculates the previous stride by subtracting the previous cache line address from the new load cache line address, rather than from the previous-previous cache line address; and
wherein, when the table is first used, the control logic clears the counter to zero.
5. The data prefetcher for a microprocessor according to claim 1, wherein the control logic is further configured to:
prefetch a second cache line at a cache line address, wherein the cache line address is the sum of the prefetch cache line address and the current stride of the table entry.
6. The data prefetcher of a microprocessor according to claim 1, wherein the tag further comprises a third stride, obtained by subtracting the third cache line address from a fourth cache line address, wherein the fourth cache line address comprises the memory address of a cache line implicated by a fourth load operation temporally preceding the third load operation, wherein the control logic circuit is configured to look up in the table a concatenation of a second-previous stride, the previous stride and the current stride, wherein the second-previous stride is obtained by subtracting the second-previous cache line address from a third-previous cache line address, wherein the third-previous cache line address is implicated by a third-previous load operation of the new load operation.
7. A data prefetching method for a microprocessor, comprising:
Maintaining entries of a table according to a history of load operations, wherein each entry stores a tag and a corresponding next stride, wherein the tag comprises a concatenation of a first stride and a second stride, wherein the next stride comprises the first stride, wherein the first stride is obtained by subtracting a first cache line address from a second cache line address, wherein the second stride is obtained by subtracting the second cache line address from a third cache line address, wherein the first, second and third cache line addresses each comprise the memory address of a cache line implicated by respective first, second and third preceding load operations;
Calculating a current stride by subtracting a previous cache line address from a new load cache line address, wherein the new load cache line address comprises the memory address of a cache line implicated by a new load operation, and wherein the previous cache line address comprises the memory address of a cache line implicated by the load operation immediately preceding the new load operation;
Looking up in the table a concatenation of a previous stride and the current stride, wherein the previous stride is obtained by subtracting the previous cache line address from a second-previous cache line address, wherein the second-previous cache line address comprises the memory address of a cache line implicated by a second-previous load operation of the new load operation; and
Prefetching a cache line at a prefetch cache line address, wherein the prefetch cache line address is the sum of the new load cache line address and the next stride of the table entry hit by the concatenation of the previous stride and the current stride.
8. The data prefetching method for a microprocessor according to claim 7, further comprising:
If the concatenation of the previous stride and the current stride misses in the table, allocating an entry in the table;
Populating the tag of the allocated entry with the concatenation of the previous stride and the current stride; and
Populating the next stride of the allocated entry with the previous stride.
9. The data prefetching method for a microprocessor according to claim 7, wherein the microprocessor comprises a plurality of tables configured to maintain histories of load operations corresponding to a plurality of memory regions, the method further comprising:
When the cache line implicated by the new load address is determined not to lie within a memory region corresponding to any of the plurality of tables, allocating one of the plurality of tables; and
When one of the plurality of tables is determined to be allocated to a memory region that contains the cache line implicated by the new load address, using the allocated table to perform the lookup.
10. The data prefetching method for a microprocessor according to claim 7, further comprising:
Incrementing a counter for each new load operation;
When the value of the counter is 1 after being incremented, calculating the previous stride by subtracting the previous cache line address from the new load cache line address, rather than by subtracting the previous cache line address from the second-previous cache line address; and
When the table is first used, clearing the counter to zero.
11. The data prefetching method for a microprocessor according to claim 7, further comprising:
Prefetching a second cache line at a cache line address, wherein the cache line address is the sum of the prefetch cache line address and the current stride of the table entry.
12. The data prefetching method for a microprocessor according to claim 7, wherein the tag further comprises a third stride, obtained by subtracting the third cache line address from a fourth cache line address, wherein the fourth cache line address comprises the memory address of a cache line implicated by a fourth load operation temporally preceding the third load operation, wherein the control logic circuit is configured to look up in the table a concatenation of a second-previous stride, the previous stride and the current stride, wherein the second-previous stride is obtained by subtracting the second-previous cache line address from a third-previous cache line address, wherein the third-previous cache line address is implicated by a third-previous load operation of the new load operation.
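
The behaviour recited in claims 7 to 10 can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the claimed hardware: the class and method names (StridePrefetcher, RegionPrefetcher, on_load, region_lines) are hypothetical, every stride is computed as the later cache line address minus the earlier one, the table is an unbounded dictionary rather than a fixed number of entries, and the memory region size of 64 cache lines is an arbitrary choice.

```python
from typing import Dict, Optional, Tuple


class StridePrefetcher:
    """Toy model of one stride-history table (all names are illustrative)."""

    def __init__(self) -> None:
        # tag (previous stride, current stride) -> predicted next stride
        self.table: Dict[Tuple[int, int], int] = {}
        self.prev_addr: Optional[int] = None    # cache line address of the previous load
        self.last_stride: Optional[int] = None  # stride computed for the previous load
        self.count = 0                          # loads seen since the table was first used

    def on_load(self, line_addr: int) -> Optional[int]:
        """Record a load; return the cache line address to prefetch, or None."""
        if self.prev_addr is None:
            # First use of the table: clear the counter and remember the address.
            self.count = 0
            self.prev_addr = line_addr
            return None

        self.count += 1
        current = line_addr - self.prev_addr
        # With only two addresses seen so far, the current stride also serves
        # as the previous stride (the counter == 1 case).
        previous = current if self.count == 1 else self.last_stride

        tag = (previous, current)
        prefetch_addr = None
        if tag in self.table:
            # Hit: prefetch at the new load address plus the stored next stride.
            prefetch_addr = line_addr + self.table[tag]
        else:
            # Miss: allocate an entry whose next stride is the previous stride.
            self.table[tag] = previous

        self.prev_addr = line_addr
        self.last_stride = current
        return prefetch_addr


class RegionPrefetcher:
    """One table per memory region; the region size is an assumption."""

    def __init__(self, region_lines: int = 64) -> None:
        self.region_lines = region_lines
        self.tables: Dict[int, StridePrefetcher] = {}

    def on_load(self, line_addr: int) -> Optional[int]:
        region = line_addr // self.region_lines
        # Allocate a table for a region seen for the first time, else reuse it.
        table = self.tables.setdefault(region, StridePrefetcher())
        return table.on_load(line_addr)
```

In this model, loads to cache line addresses 100, 101, 103, 104, 106, 107 (strides alternating between 1 and 2) produce no prefetch for the first four loads while entries are allocated, then prefetch addresses 107 and 109 for the last two. Storing the previous stride as the next stride on a miss is what lets a two-stride alternating pattern be predicted after a single repetition.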
CN201010220151XA 2009-07-10 2010-06-25 The data pre-acquisition machine of microprocessor and method Pending CN101887360A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US22478109P 2009-07-10 2009-07-10
US61/224,781 2009-07-10
US12/573,462 US20110010506A1 (en) 2009-07-10 2009-10-05 Data prefetcher with multi-level table for predicting stride patterns
US12/573,462 2009-10-05

Publications (1)

Publication Number Publication Date
CN101887360A true CN101887360A (en) 2010-11-17

Family

ID=43073290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010220151XA Pending CN101887360A (en) 2009-07-10 2010-06-25 The data pre-acquisition machine of microprocessor and method

Country Status (1)

Country Link
CN (1) CN101887360A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102385622A (en) * 2011-10-25 2012-03-21 曙光信息产业(北京)有限公司 Pre-reading method for stride access mode of file system
CN105183663A (en) * 2010-03-29 2015-12-23 威盛电子股份有限公司 Prefetch Unit And Data Prefetch Method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000322256A (en) * 1999-05-14 2000-11-24 Nec Ic Microcomput Syst Ltd Information processor
US20020065574A1 (en) * 2000-11-07 2002-05-30 Kunihiro Nakada Data processor, semiconductor integrated circuit and CPU
US6976147B1 (en) * 2003-01-21 2005-12-13 Advanced Micro Devices, Inc. Stride-based prefetch mechanism using a prediction confidence value
CN101078979A (en) * 2007-06-29 2007-11-28 东南大学 Storage control circuit with multiple-passage instruction pre-fetching function
CN101149704A (en) * 2007-10-31 2008-03-26 中国人民解放军国防科学技术大学 Segmental high speed cache design method in microprocessor and segmental high speed cache

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105183663A (en) * 2010-03-29 2015-12-23 威盛电子股份有限公司 Prefetch Unit And Data Prefetch Method
CN105183663B (en) * 2010-03-29 2018-11-27 威盛电子股份有限公司 Pre-fetch unit and data prefetching method
CN102385622A (en) * 2011-10-25 2012-03-21 曙光信息产业(北京)有限公司 Pre-reading method for stride access mode of file system
CN102385622B (en) * 2011-10-25 2013-03-13 曙光信息产业(北京)有限公司 Pre-reading method for stride access mode of file system

Similar Documents

Publication Publication Date Title
TWI564718B (en) Multi-mode set associative cache memory dynamically configurable to selectively allocate into all or a subset of its ways depending on the mode
CN105701033B (en) The cache memory dynamically configurable depending on mode
CN104615548B (en) Data prefetching method and microprocessor
TWI599882B (en) Cache memory and operating method thereof, and method for operating a set of associative chahe memory
CN105701022B (en) Set associative cache
US7562191B2 (en) Microprocessor having a power-saving instruction cache way predictor and instruction replacement scheme
CN101694613B (en) Unaligned memory access prediction
US10671535B2 (en) Stride prefetching across memory pages
CN100495325C (en) Method and system for on-demand scratch register renaming
CN104317791A (en) Gathering and scattering multiple data elements
CN101158925B (en) Apparatus and method for supporting simultaneous storage of trace and standard cache lines
US9304919B2 (en) Detecting multiple stride sequences for prefetching
CN103238133A (en) Vector gather buffer for multiple address vector loads
CN101467135A (en) Apparatus and method of prefetching data
CN103793202A (en) Microprocessor and method for prefetching data to the microprocessor
CN102236541A (en) Preload instruction control
CN109643237A (en) Branch target buffer compression
JP5625809B2 (en) Arithmetic processing apparatus, information processing apparatus and control method
CN101887360A (en) The data pre-acquisition machine of microprocessor and method
CN100388187C (en) Apparatus for predicting multiple branch target addresses
US20140115257A1 (en) Prefetching using branch information from an instruction cache
CN101882063A (en) Microprocessor and prefetch data are to the method for microprocessor
JP2000215104A (en) Hit/miss-by-way counter and its counting method
CN115964309A (en) Prefetching
CN117743210A (en) Selective control flow predictor insertion

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20101117