CN101211257A - Method and processor for solving access dependence based on local associative lookup - Google Patents

Info

Publication number
CN101211257A
Authority
CN
China
Prior art keywords
memory access
number storage
storage order
instruction
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2006101715219A
Other languages
Chinese (zh)
Other versions
CN100545806C (en)
Inventor
龙国平
范东睿
袁楠
张�浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
G Cloud Technology Co Ltd
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CNB2006101715219A (patent CN100545806C)
Publication of CN101211257A
Application granted
Publication of CN100545806C
Legal status: Expired - Fee Related

Landscapes

  • Advance Control (AREA)

Abstract

The invention relates to a new method for resolving memory access dependences based on partial associative lookup. The method comprises two mechanisms. The first is a partial associative lookup mechanism: when a load instruction enters the memory access queue, it needs to search only a subset of the queue entries ahead of it to determine whether it can obtain the latest value from one of the store instructions found there; similarly, when a store instruction enters the memory access queue, it needs to search only a subset of the entries behind it to determine whether any load instruction has already executed ahead of it and written back. The second is a memory dependence predictor: when a load instruction is renamed, the predictor is indexed to obtain a memory distance, and if the memory distance is valid, the issue module must ensure that the earlier store instruction corresponding to that memory distance has finished executing before it issues the load instruction.

Description

Method and processor for resolving memory access dependences based on local (partial) associative lookup
Technical field
The present invention relates to processor architecture and, more specifically, to a method for resolving address dependences between memory access operations in a processor, and to a processor implementing the method.
Background art
Every method for resolving memory address dependences is built on a particular processor microarchitecture. Fig. 1 is a block diagram of a modern superscalar microprocessor.
Most current superscalar processors adopt a basic structure similar to that of Fig. 1. In addition, most existing microprocessors implement a memory access queue in the memory access unit to preserve program order among memory access operations that execute out of order. To resolve address dependences between memory access operations, they all perform a fully associative lookup of the memory access queue.
The fully associative lookup of the memory access queue works as follows. When a load instruction (load) is issued and enters the memory access queue, it must associatively search all store instructions (store). If a store instruction precedes the load and the two have an address dependence, then all or part of the value the load needs resides in store instructions ahead of it in the queue; in that case, after the associative lookup, the value of the address-matching store must be forwarded to the load that has just entered the queue. Because current processors support stores of 8, 16, 32 and even 64 bits, a single load may need data forwarded (Forward) from several earlier stores at once, which further complicates the control of the memory access queue. The situation for stores is similar: when a store instruction enters the memory access queue, it must associatively search all load instructions in the queue. If a later load has an address dependence on the store and has already written back ahead of it, that load must be marked with an exception, and all instructions from that load onward must be flushed.
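The full-queue search and multi-store forwarding described above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the hardware of any real processor: the `Store` record, byte-granular addressing, and the function name `forward_full_search` are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Store:
    addr: int      # byte address of the first byte written (assumed model)
    data: bytes    # bytes written: 1, 2, 4 or 8 of them


def forward_full_search(older_stores: List[Store], load_addr: int,
                        load_size: int) -> List[Optional[int]]:
    """Search every older store for bytes that the load needs.

    older_stores is in program order, oldest first. The result has one
    entry per loaded byte: the forwarded byte value, or None when no older
    store covers that byte and it must be read from the cache instead.
    """
    result: List[Optional[int]] = [None] * load_size
    # Walk oldest to youngest so a younger store overrides an older one.
    for store in older_stores:
        for offset, value in enumerate(store.data):
            index = store.addr + offset - load_addr
            if 0 <= index < load_size:
                result[index] = value
    return result
```

The sketch shows why a single load may draw bytes from several stores: a 4-byte load can take three bytes from an older 32-bit store and one byte from a younger 8-bit store, and every queue entry has to be examined to find them.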
U.S. Patent US6108770 discloses a method for scheduling instruction execution in a computer processor. The method fetches instructions from an instruction store and executes the fetched instructions out of program order. When a load/store ordering conflict is detected, the result of the load instruction and of the instructions that depend on that result is discarded, and those instructions are re-executed. The load instruction is then associated with the store instructions that produce the data it depends on. The set of all such store instructions is called a store set. On subsequent issues of the load, its execution is delayed until all store instructions in its store set have issued. Two load instructions can share one store set; when a load belonging to one store set is found to depend on a store belonging to another store set, the two store sets are merged. US6108770 discloses a preferred embodiment with two tables. The first is a store set ID table (SSIT), which is indexed by part of, or a hash of, an instruction's PC (program counter); an SSIT entry supplies the store set ID used to index the second table. For each store set, the second table holds a pointer to the most recently fetched store instruction that has not yet executed.
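The two-table arrangement summarised above can be illustrated with a small model. This is a simplified sketch based only on the description in this paragraph, not on the text of US6108770 itself: the class name `StoreSets`, the modulo hash, the table size, and the integer store tags are assumptions, and the merge rule is reduced to its simplest form.

```python
from typing import Dict, Optional

SSIT_SIZE = 1024  # assumed table size, chosen only for illustration


class StoreSets:
    def __init__(self) -> None:
        self.ssit: Dict[int, int] = {}  # hashed PC -> store set ID
        self.lfst: Dict[int, int] = {}  # store set ID -> tag of last fetched, unexecuted store
        self.next_id = 0

    @staticmethod
    def _index(pc: int) -> int:
        return pc % SSIT_SIZE           # assumed hash of the PC

    def record_conflict(self, load_pc: int, store_pc: int) -> None:
        """After an ordering conflict, place the load and store in one set."""
        li, si = self._index(load_pc), self._index(store_pc)
        set_id = self.ssit.get(li, self.ssit.get(si))
        if set_id is None:
            set_id = self.next_id
            self.next_id += 1
        self.ssit[li] = set_id
        self.ssit[si] = set_id

    def fetch_store(self, store_pc: int, tag: int) -> None:
        """Remember this store as the last fetched store of its set."""
        set_id = self.ssit.get(self._index(store_pc))
        if set_id is not None:
            self.lfst[set_id] = tag

    def store_executed(self, store_pc: int, tag: int) -> None:
        """Clear the pointer once the remembered store has executed."""
        set_id = self.ssit.get(self._index(store_pc))
        if set_id is not None and self.lfst.get(set_id) == tag:
            del self.lfst[set_id]

    def store_to_wait_for(self, load_pc: int) -> Optional[int]:
        """Return the tag of the store the load should wait for, if any."""
        set_id = self.ssit.get(self._index(load_pc))
        return None if set_id is None else self.lfst.get(set_id)
```

A load whose PC has no SSIT entry issues freely; once a conflict has been recorded, later instances of that load wait for the pending store of the same set.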
U.S. Pat 5999727 discloses a kind of relevant method of memory access that solves, and promptly is recorded among the Icache together by historical information and load instruction/poke (Store) instruction that memory access is relied on, so that reference when carrying out instruction scheduling.
Load instruction number storage order number storage order number storage order load instruction number storage order number storage order number storage order load instruction load instruction this shows, in the disposal route of being correlated with in the existing address that solves between the accessing operation, need that not only each accessing operation is carried out complete association and search, and this complete association delay meeting of searching is because formation elongated and rapid deterioration.The complete association of access queue searched also will bring very high dynamic power consumption.
Summary of the invention
The object of the present invention is to provide a method and a processor that resolve address dependences between memory access operations, that reduce processor power consumption without loss of processor performance, and that give the memory access queue a degree of scalability.
To achieve this object, the invention provides a processor comprising:
a fetch and decode unit, which fetches the instruction stream from memory, decodes it, and sends it to the register renaming unit;
a register renaming unit, which resolves the write-after-write (WAW) and write-after-read (WAR) dependences between instructions or micro-ops, all instructions or micro-ops being sent to the issue unit after register renaming; the register renaming unit further comprises a memory dependence prediction table (MDP), which is queried for every load (LOAD) instruction to determine whether it contains a matching entry;
an issue unit, which tracks the operands of all instructions or micro-ops and, as soon as the required operands of an instruction or micro-op are ready, issues it to the back-end execution units for execution;
back-end execution units, comprising several fixed-point arithmetic logic units (ALUs), floating-point arithmetic logic units (ALUs), and several memory access units, each memory access unit having a memory access queue that maintains the program order of all memory access operations;
an instruction reorder queue (ROQ), which maintains the program order of all instructions or micro-ops in the processor pipeline and removes an instruction or micro-op from the reorder queue once it has finished executing.
Another aspect of the invention provides a method for resolving memory access dependences in a processor, the processor comprising: a fetch and decode unit; a register renaming unit that further comprises a memory dependence prediction table (MDP); an issue unit; back-end execution units comprising several fixed-point arithmetic logic units (ALUs), floating-point arithmetic logic units (ALUs), and several memory access units; and an instruction reorder queue (ROQ). The method comprises the following steps:
1) receiving a memory access instruction from the fetch and decode unit;
2) determining whether the memory access instruction is a load instruction or a store instruction;
3) if it is a load instruction, querying the memory dependence prediction table to determine whether there is an entry matching the program counter address of the load instruction;
4) if the memory dependence prediction table contains an entry matching the program counter address of the load instruction, the issue unit withholds the load instruction from the memory access unit until the store operation on which it has an address dependence has finished executing and its value has been written to the data cache (DCACHE);
5) if no matching entry is found in the memory dependence prediction table, the issue unit issues the load instruction to the memory access unit;
6) searching forward through the memory access operations in the memory access queue to determine whether the value to be loaded can be obtained from a store instruction found there.
Description of the drawings
The above objects, advantages and features of the invention will become apparent from the following detailed description of the preferred embodiments, taken in conjunction with the accompanying drawings, in which:
Fig. 1 shows a basic block diagram of a prior-art microprocessor;
Fig. 2 shows the prior-art process of performing a fully associative lookup of the memory access queue;
Fig. 3 shows a basic block diagram of a microprocessor implementing the invention;
Fig. 4 shows the basic structure of the memory dependence predictor of the invention;
Fig. 5 shows the basic control flow of the issue control logic of the invention;
Fig. 6 shows the process of resolving memory access dependences based on partial associative lookup according to the invention;
Fig. 7 shows a block diagram of the memory access unit in the preferred embodiment of the invention.
Detailed description of the embodiments
The preferred embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a block diagram of a modern superscalar microprocessor. In a complex instruction set (CISC) processor, the basic unit handled by each functional unit is a micro-op; in a reduced instruction set (RISC) processor, it is an instruction. The invention applies to both CISC and RISC processors. To avoid unnecessary repetition, unless stated otherwise, this specification and the claims use "instruction" to denote the basic unit, whether instruction or micro-op, handled by each functional unit of the processor. The parts of the figure are briefly described below:
Fetch and decode (101): fetches the instruction stream from memory, decodes it, and sends it to the register renaming module. To improve performance, an instruction cache (CACHE) and an instruction TLB are usually also placed here; because they have little bearing on the invention, they are not shown in the figure.
Register renaming (102): register renaming mainly resolves the write-after-write (WAW) and write-after-read (WAR) dependences between instructions or micro-ops. After register renaming, every instruction or micro-op is sent both to the issue module and to the instruction reorder queue ROQ, and the ROQ maintains the program order of all in-flight instructions or micro-ops.
Instruction issue unit (103): tracks the operands of all instructions or micro-ops and, as soon as the required operands of an instruction or micro-op are ready, issues it to the back-end execution units. The issue unit snoops the result bus to learn that the value of a register has become ready, and then issues to the functional units the instructions or micro-ops that were held waiting for that register.
The back-end execution units comprise several fixed-point ALUs and floating-point ALUs and, in particular, several memory access units. Existing processors usually provide a memory access queue in the memory access unit to maintain the program order (Program Order) of all memory access operations. Once an instruction or micro-op finishes executing in a functional unit, its result is written back to the result bus.
The instruction reorder queue (Reorder Queue, ROQ) maintains the program order of all instructions or micro-ops in the processor pipeline. Once an instruction or micro-op has executed correctly, the ROQ notifies the register renaming unit to update the rename table and at the same time removes the completed instruction or micro-op from the ROQ.
Fig. 2 illustrates the fully associative lookup of the memory access queue that is used to resolve memory access dependences. The example is a load instruction that fully associatively searches the queue holding store instructions to find the nearest address-dependent store. The address content-addressable memory (Address CAM (201)) in the figure holds the addresses of all store instructions. To find the nearest store with an address dependence, the address of the load is first compared with the address of every store in the address CAM (201), producing a bit vector that indicates which entries have an address dependence. The bit vector is then fed to the priority encoder (Priority Encoder (202)), which selects the nearest address-dependent store. When the queue has many entries, the bit vector produced by the address comparison in the address CAM (201) is correspondingly long, and the priority encoder (202) then has a large delay. Worse, the delay of the priority encoder (202) grows with the number of queue entries, which limits the scalability of the processor. A store instruction that searches the queue holding load instructions, to find the nearest address-dependent load that executed ahead of it, must go through the same fully associative lookup shown in Fig. 2.
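The two stages of Fig. 2, address comparison followed by priority encoding, can be modelled in a few lines. This is a behavioural sketch only, with assumed conventions: the function names are hypothetical, entry 0 is taken to be the oldest store, and an empty queue slot is represented by `None`.

```python
from typing import List, Optional


def cam_match(store_addrs: List[Optional[int]], load_addr: int) -> int:
    """Compare the load address with every entry; bit i is set on a match."""
    vector = 0
    for i, addr in enumerate(store_addrs):
        if addr is not None and addr == load_addr:
            vector |= 1 << i
    return vector


def priority_encode(vector: int) -> Optional[int]:
    """Return the index of the highest set bit, or None if no bit is set.

    Assumption: entries are ordered oldest (index 0) to youngest, so the
    highest set bit identifies the nearest matching store.
    """
    if vector == 0:
        return None
    return vector.bit_length() - 1
```

The bit vector has one bit per queue entry, so both the comparison and the encoding widen as the queue grows, which is the scaling problem the paragraph describes.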
Fig. 3 shows a basic block diagram of a microprocessor implementing the invention. Compared with Fig. 1, the key differences lie in register renaming, the instruction/micro-op issue unit, and the memory access unit. The register renaming module (302) implements the memory dependence predictor MDP of the invention; the instruction/micro-op issue module (303) implements the issue control logic of the invention; and the memory access unit (305) implements the invention's approach of resolving memory access dependences by partial associative lookup.
Fig. 4 shows the basic structure of the memory dependence predictor (Memory Dependence Predictor, MDP). It is a fully associative table in which each entry has two basic fields: the load PC and the memory distance (Memory Distance). Only load instructions/micro-ops can occupy an entry. The load PC is the program counter value of the load instruction/micro-op. In this patent, the memory distance is the number of dynamic memory access operations between an address-dependent load and store pair in the processor. Whenever a load instruction/micro-op passes through the register renaming module, its PC is used to index the MDP table fully associatively. If a matching entry is found, there is an address-dependent store ahead of the load. The issue queue must then hold the load until the store ahead of it at the indexed memory distance has finished executing (that is, until the store's value has been written to the data cache (DCACHE)).
In this patent application, the memory dependence prediction table MDP is searched fully associatively, so it cannot be very large. Experiments show that, for a memory access queue of length 16, an MDP of about 8 entries is sufficient. The number of MDP entries in an actual implementation depends on the circumstances, but an MDP of any size falls within the spirit and scope of the invention.
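A software model of the MDP table described above might look like the following sketch. The two fields (load PC and memory distance) and the default of 8 entries come from the text; the class and method names are hypothetical, and the oldest-first replacement policy is an assumption, since the text does not say how an entry is evicted.

```python
from collections import OrderedDict
from typing import Optional


class MemoryDependencePredictor:
    """Small fully associative table: load PC -> memory distance."""

    def __init__(self, entries: int = 8) -> None:
        self.entries = entries
        self.table: "OrderedDict[int, int]" = OrderedDict()

    def lookup(self, load_pc: int) -> Optional[int]:
        """Queried at rename time; None means no valid memory distance."""
        return self.table.get(load_pc)

    def allocate(self, load_pc: int, distance: int) -> None:
        """Record a load PC and its memory distance in a new entry."""
        if load_pc in self.table:
            del self.table[load_pc]
        elif len(self.table) >= self.entries:
            # Assumed policy: evict the oldest entry when the table is full.
            self.table.popitem(last=False)
        self.table[load_pc] = distance
```

A lookup that returns a distance tells the issue logic which earlier store the load must wait for; a lookup that returns `None` lets the load issue as soon as its address is ready.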
Fig. 5 shows the basic control flow of the issue control logic. The logic first checks whether the instruction/micro-op to be issued is a memory access instruction/micro-op (502). If it is not, the logic waits until the required operands are ready (509) and then issues it to an ALU (510) for execution. Note that the ALUs here include both the fixed-point and the floating-point arithmetic units. If the instruction/micro-op to be issued is a memory access instruction/micro-op, loads and stores are handled separately. For a load instruction (503), the logic first checks whether the earlier query of the memory dependence prediction table MDP at register renaming found a matching entry, that is, whether the memory distance obtained is valid (504). If no entry was found, the memory distance is invalid, and the load is issued to the memory access unit (513) once its address is ready (508). If the memory distance is valid, there is an address-dependent store ahead of the load; the load is issued to the memory access unit (513) only after that store has finished executing and its value has been written to the data cache (DCACHE) (507).
Issue control for store instructions in Fig. 5 is different. The logic first checks whether the address is ready (505); if it is not, the store waits in the issue queue. If the address is ready, the logic then checks whether the value is ready (506). If the value is not ready, a first-stage issue (511) is performed early: the store is issued to the memory access queue so that memory access dependences can be resolved. When the store's value becomes ready, a second-stage issue (512) delivers the value to the memory access queue. Note that if the value is already ready when the address becomes ready, two-stage issue is unnecessary, and a single issue delivers both the address and the value to the memory access queue. In addition, although some stores are issued in two stages, each is treated as a single store instruction in both the issue queue and the memory access queue.
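The control flow of Fig. 5 can be condensed into one decision function. This is a sketch under stated assumptions rather than the patented logic: the `Op` record, its field names, and the returned action strings are hypothetical, and requiring a ready address on the predicted-dependent load path is an assumption not spelled out in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Op:
    kind: str                        # "alu", "load" or "store"
    operands_ready: bool = False     # ALU ops: all source operands ready
    addr_ready: bool = False         # memory ops: address operand ready
    value_ready: bool = False        # stores: data to be stored is ready
    distance: Optional[int] = None   # loads: MDP distance, None if invalid
    dep_store_done: bool = False     # loads: predicted store is in the DCACHE
    stage1_issued: bool = False      # stores: address already in the queue


def issue_decision(op: Op) -> str:
    """Return the action the issue logic takes for one waiting operation."""
    if op.kind == "alu":                              # 502 -> 509 -> 510
        return "issue_alu" if op.operands_ready else "wait"
    if op.kind == "load":                             # 503
        if not op.addr_ready:                         # 508 (assumed on both paths)
            return "wait"
        if op.distance is not None and not op.dep_store_done:
            return "wait"                             # 504 valid -> 507
        return "issue_mem"                            # 513
    # Store path.
    if not op.addr_ready:                             # 505
        return "wait"
    if not op.stage1_issued:                          # 506
        return "issue_store_full" if op.value_ready else "issue_store_stage1"
    return "issue_store_stage2" if op.value_ready else "wait"   # 512
```

Splitting store issue into two stages lets the store's address enter the memory access queue early, so that dependence checks against younger loads do not have to wait for the store data.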
Fig. 6 illustrates resolving memory access dependences by partial associative lookup, which is the improvement provided by the invention. The example is again a load instruction searching the store instructions. Unlike the conventional implementation of Fig. 2, which searches the entire queue, here only a few entries (L) ahead of the load are searched, and the qualifying store is selected from among them. Analysis shows that even when only a few entries ahead are searched, more than 99% of load instructions obtain valid data (a load that needs no forwarding (Forward) reads directly from the cache (Cache), and its lookup is also counted as correct). Because L is generally much smaller than the queue length N, and L does not grow as N grows, the lookup latency does not increase with N.
Similarly, when a store instruction enters the memory access queue, it must search backward for load instructions that have already written back ahead of it. Existing implementations search the entire queue of load instructions (Fig. 2); as analysed above, this is unnecessary, and it suffices to check whether the few adjacent load instructions have written back early (Fig. 6).
Experiments show that, for a memory access queue of length 16, L = 8 ensures that for most programs more than 99% of load instructions obtain correct data; it also ensures that 100% of store instructions can determine, by searching fewer than 8 memory access operations, whether a load has executed or written back ahead of them. The value of L in an actual implementation depends on the circumstances, but any value of L falls within the spirit and scope of the invention.
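The windowed search in both directions can be sketched as follows. This is an illustrative model only: the `Entry` record, the function names, and the exact-address match are assumptions, and the queue is taken to be a list in program order with index 0 oldest.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    is_store: bool
    addr: int
    written_back: bool = False   # loads only: result already written back


def nearest_store_within(queue: List[Entry], load_idx: int,
                         L: int) -> Optional[int]:
    """Search at most L entries ahead of (older than) the load.

    Returns the index of the nearest older store to the same address,
    or None if the window contains no such store.
    """
    lowest = max(0, load_idx - L)
    for i in range(load_idx - 1, lowest - 1, -1):
        if queue[i].is_store and queue[i].addr == queue[load_idx].addr:
            return i
    return None


def early_loads_within(queue: List[Entry], store_idx: int,
                       L: int) -> List[int]:
    """Search at most L entries behind (younger than) the store.

    Returns the indices of loads to the same address that have already
    written back, that is, loads that ran ahead of the store.
    """
    highest = min(len(queue), store_idx + 1 + L)
    return [i for i in range(store_idx + 1, highest)
            if not queue[i].is_store and queue[i].written_back
            and queue[i].addr == queue[store_idx].addr]
```

Because each search touches at most L entries regardless of the queue length N, the comparison width stays fixed when the queue is enlarged. A dependence that lies outside the window is missed by these functions and has to be caught by a separate mechanism.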
Although the vast majority of forwarding (Forwarding) occurs between loads and stores separated by a very small memory distance, a minority of loads need data forwarded from stores at a large memory distance. Note that forwarding (Forwarding or Forward) in this invention means the process by which a load instruction/micro-op, when it executes, obtains the latest data from a store instruction ahead of it in the memory access queue. Loads that need forwarding but whose memory distance is large (>8) are called mis-forwarded loads (mis-forward loads). Careful analysis of these mis-forwarded loads shows that they are highly predictable. The invention therefore provides a memory dependence prediction table MDP to record the history of mis-forwarded loads.
When the ROQ notifies the memory access unit that a store instruction has completed, the address of that store is used to search the memory access queue. If an address-dependent load is found, the memory distance between the store and the load is greater than L, and the load did not obtain the latest value from a store at memory distance <= L, then the load has read a wrong value from the data cache. Because the load may already have written back, and later instructions/micro-ops may have used its erroneous result, the processor pipeline must be flushed to avoid incorrect execution. At the same time, a new entry is allocated in the prediction table MDP, and the PC of the faulting load, together with the memory distance between the store that caused the error and the faulting load, is recorded in the new entry.
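The check performed when a store completes can be sketched as below. This is a minimal model under stated assumptions: the `Load` record and its fields are hypothetical, the MDP is represented by a plain dictionary from load PC to memory distance, and the pipeline flush itself is left to the caller.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Load:
    pc: int
    addr: int
    distance: int          # memory distance from the completing store
    forwarded_near: bool   # got its value from a store at distance <= L


def check_store_commit(store_addr: int, younger_loads: List[Load], L: int,
                       mdp: Dict[int, int]) -> Optional[int]:
    """Return the PC of the first mis-forwarded load, or None.

    On a hit, a new MDP entry (load PC -> memory distance) is recorded and
    the caller is expected to flush the pipeline from that load onward.
    """
    for load in younger_loads:
        if (load.addr == store_addr and load.distance > L
                and not load.forwarded_near):
            mdp[load.pc] = load.distance
            return load.pc
    return None
```

After the flush, the next dynamic instance of the same load finds its PC in the MDP at rename time and is held at issue until the distant store has written the data cache, so the same error is not repeated.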
Fig. 7 shows a block diagram of the memory access unit in the preferred embodiment of the invention. When a memory access operation is issued from the issue queue (701), it is first sent to MemAddr (702), which computes the memory address. After address computation, each memory access operation enters the LD/ST queue (703); this memory access queue is the control centre of the memory access unit and maintains the order among all memory access operations. In the preferred embodiment of Fig. 7, the Dcache access (704) and the tag comparison (706) take two cycles in total, and the DTLB is queried in parallel with the Dcache access to translate the virtual address to a physical address (705). If the tag comparison (TAGCMP) finds a cache hit, the result is written back directly to the ROQ (707) over the mmres bus; on a miss, the cache interface (Cache Interface) (708) is notified over the memread bus and sends a read request to the second-level cache (Cache) (not shown in the figure). When an X86 instruction that accesses memory commits (Commit), the ROQ notifies the LD/ST queue over Cmtbus to remove the corresponding micro-op or instruction from the queue.
Although the invention has been shown above in connection with its preferred embodiments, those skilled in the art will appreciate that various modifications, substitutions and changes can be made without departing from the spirit and scope of the invention. The invention should therefore not be limited by the embodiments described above, but should be defined by the claims and their equivalents.

Claims (24)

1. A processor, comprising:
a fetch and decode unit, which fetches the instruction stream from memory, decodes it, and sends it to the register renaming unit;
a register renaming unit, which resolves the write-after-write (WAW) and write-after-read (WAR) dependences between instructions or micro-ops, all instructions or micro-ops being sent to the issue unit after register renaming; the register renaming unit further comprises a memory dependence prediction table (MDP), which is queried for every load (LOAD) instruction to determine whether it contains a matching entry;
an issue unit, which tracks the operands of all instructions or micro-ops and, as soon as the required operands of an instruction or micro-op are ready, issues it to the back-end execution units for execution;
back-end execution units, comprising several fixed-point arithmetic logic units (ALUs), floating-point arithmetic logic units (ALUs), and several memory access units, each memory access unit having a memory access queue that maintains the program order of all memory access operations;
an instruction reorder queue (ROQ), which maintains the program order of all instructions or micro-ops in the processor pipeline and removes an instruction or micro-op from the reorder queue once it has finished executing.
2. according to the processor of claim 1, wherein said memory access correlation predictive table comprises two at least: the value of the programmable counter of load instruction correspondence (PC) and internal memory distance, described internal memory distance are a pair of address is relevant in the processor the load instruction and the number of the dynamic accessing operation between the number storage order.
3. according to the processor of claim 2, if wherein in memory access correlation predictive table, find the item of coupling, then emission element stops this load instruction is transmitted into the memory access parts, is finished and value is write in the data cache (DCACHE) up to the number storage order relevant with its address.
4. according to the processor of claim 2, if wherein do not find the item of coupling in memory access correlation predictive table, emission element just is transmitted into the memory access parts with this load instruction.
5. according to the processor of claim 1, if wherein described access instruction is a number storage order, then the register renaming parts are not retrieved described memory access correlation predictive table, directly number storage order are delivered to emission element.
6. according to the processor of claim 5, if wherein the address of this number storage order is ready to value, then emission element directly is transmitted into number storage order in the memory access parts.
7. according to the processor of claim 5, if wherein the address of this number storage order is unripe, then emission element is waited for described number storage order in the emission formation.
8. according to the processor of claim 5, if wherein the address of this number storage order is ready to and is worth unripely, then emission element carries out the phase one emission, and number storage order is transmitted in the memory access parts.
9. processor according to Claim 8, if wherein the value of this number storage order is ready to, then emission element carries out the subordinate phase emission, and the value of number storage order is transmitted in the memory access parts.
10. according to the processor of one of claim 1-9, wherein said memory access parts are inquired about the length N of the item number L of access queue less than described access queue forward when carrying out load instruction.
11. according to the processor of one of claim 1-9, wherein said memory access parts are inquired about the length N of the item number L of access queue less than described access queue backward when carrying out number storage order.
12. The processor of any one of claims 10-11, wherein, when a store instruction completes, the instruction reorder queue notifies the memory access unit to search the memory access queue using the address of the store instruction.
13. The processor of claim 12, wherein, if the memory access unit finds in its memory access queue a load instruction whose address depends on that of the store instruction, the memory access distance between the store instruction and the load instruction is greater than L, and the load instruction has not obtained the latest value from a store instruction at memory access distance <= L, then the processor flushes its pipeline and at the same time allocates a new entry in the memory dependence prediction table, recording in the newly allocated entry the value of the program counter of the load instruction and the memory access distance between the store instruction and the load instruction.
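The check in claims 12 and 13 can be sketched as follows. This Python example is an illustration under stated assumptions, not the patented implementation: the memory access distance is taken to be the difference of queue positions, the prediction table is modelled as a dictionary from load program counter to distance, and `LoadRecord` and `on_store_complete` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LoadRecord:
    pc: int                            # program counter of the load
    queue_index: int                   # position in the memory access queue
    address: int
    forwarded_distance: Optional[int]  # distance to the store that supplied
                                       # its value, or None if none did


def on_store_complete(store_index: int, store_address: int,
                      executed_loads: List[LoadRecord], window: int,
                      mdp: Dict[int, int]) -> bool:
    """Return True if the pipeline must be flushed; record violating
    loads in the prediction table `mdp` (pc -> memory access distance)."""
    flush = False
    for load in executed_loads:
        distance = load.queue_index - store_index
        if distance <= 0 or load.address != store_address:
            continue  # load is not younger, or touches another address
        served_nearby = (load.forwarded_distance is not None
                         and load.forwarded_distance <= window)
        if distance > window and not served_nearby:
            # The windowed search could not have seen this store, and no
            # nearer store supplied the value: the load read stale data.
            mdp[load.pc] = distance
            flush = True
    return flush
```

A load that received its value from a store inside the window is left alone, since that nearer store is the more recent writer in program order.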
14. A method for resolving memory access dependences in a processor, the processor comprising: an instruction fetch and decode unit; a register renaming unit, the register renaming unit further comprising a memory dependence prediction table (MDP); an issue unit; a back-end execution unit comprising several fixed-point arithmetic logic units (ALU), floating-point arithmetic logic units (ALU), and several memory access units; and an instruction reorder queue (ROQ); the method comprising the following steps:
1) receiving a memory access instruction from the instruction fetch and decode unit;
2) determining whether the memory access instruction is a load instruction or a store instruction;
3) if it is a load instruction, looking up the memory dependence prediction table to determine whether there is an entry matching the value of the program counter of the load instruction;
4) if the memory dependence predictor contains an entry matching the value of the program counter of the load instruction, the issue unit withholds the load instruction from the memory access unit until the store operation on which its address depends has completed and written its value into the data cache (DCACHE);
5) if no matching entry is found in the memory dependence prediction table, the issue unit issues the load instruction to the memory access unit;
6) searching forward through the memory access operations in the memory access queue to determine whether the value to be loaded can be obtained from a store instruction found by the search.
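Steps 3) to 5) amount to a gate on load issue. The Python sketch below is illustrative only. It assumes that each memory access instruction carries a sequence number in program order, that the prediction table maps a load's program counter to a memory access distance, and that the store a load must wait for is the one located that distance earlier; `may_issue_load` and its parameters are hypothetical.

```python
from typing import Dict, Set


def may_issue_load(load_pc: int, load_seq: int, mdp: Dict[int, int],
                   completed_store_seqs: Set[int]) -> bool:
    """Decide whether the issue unit may send a load to the memory
    access unit, given the prediction table `mdp` (pc -> distance)."""
    distance = mdp.get(load_pc)
    if distance is None:
        # Step 5: no matching entry, so the load issues normally.
        return True
    # Step 4: hold the load until the store `distance` positions
    # earlier has completed and written its value to the data cache.
    return (load_seq - distance) in completed_store_seqs
```

Once the load issues, step 6) still applies: the memory access unit searches the queue for a nearby store that can supply the value.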
15. according to the method for claim 14, if wherein in step 2) in judge that described access instruction is a number storage order, described method also comprises step
7) the register renaming parts are not retrieved described memory access correlation predictive table, directly number storage order are delivered to emission element.
16., also comprise step according to the method for claim 15:
8) if the address of this number storage order and value are ready to, then emission element directly is transmitted into number storage order in the memory access parts.
17., also comprise step according to the method for claim 15:
9) if the address of this number storage order is unripe, then emission element is waited for described number storage order in the emission formation.
18., also comprise step according to the method for claim 15:
10) be worth unripely if the address of this number storage order is ready to, then emission element carries out phase one emission, and number storage order is transmitted in the memory access parts.
19., also comprise step according to the method for claim 15:
11) if the value of this number storage order is ready to, then emission element carries out subordinate phase emission, and the value of number storage order is transmitted in the memory access parts.
20. according to the method for one of claim 14-19, wherein said memory access parts are inquired about the length N of the item number L of access queue less than described access queue forward when carrying out load instruction.
21. according to the method for one of claim 14-19, wherein said memory access parts are inquired about the length N of the item number L of access queue less than described access queue backward when carrying out number storage order.
22., also comprise step according to the method for one of claim 16-21:
12) when a number storage order is finished, the instruction reorder queue is notified the address lookup access queue of memory access parts with this number storage order.
23., also comprise step according to the method for claim 22:
13) if the memory access parts find the load instruction relevant with its address in its access queue, and the distance of the memory access between this number storage order and the load instruction is greater than L, and this load instruction is not obtained up-to-date value there from the number storage order of memory access distance<=L, then processor refreshes the streamline of described processor, in memory access correlation predictive table, distribute simultaneously a new list item, with the value of the programmable counter of described load instruction and the memory access between described number storage order and the described load instruction apart from being recorded in the newly assigned memory access correlation predictive list item.
24. A computer device employing the processor of any one of claims 1-15.
CNB2006101715219A 2006-12-30 2006-12-30 Method and processor for solving access dependence based on local associative lookup Expired - Fee Related CN100545806C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006101715219A CN100545806C (en) 2006-12-30 2006-12-30 Method and processor for solving access dependence based on local associative lookup

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2006101715219A CN100545806C (en) 2006-12-30 2006-12-30 Method and processor for solving access dependence based on local associative lookup

Publications (2)

Publication Number Publication Date
CN101211257A true CN101211257A (en) 2008-07-02
CN100545806C CN100545806C (en) 2009-09-30

Family

ID=39611316

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006101715219A Expired - Fee Related CN100545806C (en) 2006-12-30 2006-12-30 Method and processor for solving access dependence based on local associative lookup

Country Status (1)

Country Link
CN (1) CN100545806C (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853150A (en) * 2009-05-29 2010-10-06 威盛电子股份有限公司 Out-of-order execution microprocessor and operating method therefor
WO2015149662A1 (en) * 2014-04-04 2015-10-08 上海芯豪微电子有限公司 Cache system and method
CN105242905A (en) * 2015-10-29 2016-01-13 华为技术有限公司 Data false correlation processing method and device
WO2018107331A1 (en) * 2016-12-12 2018-06-21 华为技术有限公司 Computer system and memory access technology
CN109087682A (en) * 2017-06-14 2018-12-25 展讯通信(上海)有限公司 Global storage sequence detection system and method
CN112540794A (en) * 2019-09-20 2021-03-23 阿里巴巴集团控股有限公司 Processor core, processor, device and instruction processing method
CN116841614A (en) * 2023-05-29 2023-10-03 进迭时空(杭州)科技有限公司 Sequential vector scheduling method under disordered access mechanism

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6065103A (en) * 1997-12-16 2000-05-16 Advanced Micro Devices, Inc. Speculative store buffer
US6108770A (en) * 1998-06-24 2000-08-22 Digital Equipment Corporation Method and apparatus for predicting memory dependence using store sets
US6591342B1 (en) * 1999-12-14 2003-07-08 Intel Corporation Memory disambiguation for large instruction windows

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853150A (en) * 2009-05-29 2010-10-06 威盛电子股份有限公司 Out-of-order execution microprocessor and operating method therefor
CN101853150B (en) * 2009-05-29 2013-05-22 威盛电子股份有限公司 Out-of-order execution microprocessor and operating method therefor
CN104978282B (en) * 2014-04-04 2019-10-01 上海芯豪微电子有限公司 A kind of caching system and method
CN104978282A (en) * 2014-04-04 2015-10-14 上海芯豪微电子有限公司 Cache system and method
US10324853B2 (en) 2014-04-04 2019-06-18 Shanghai Xinhao Microelectronics Co., Ltd. Cache system and method using track table and branch information
WO2015149662A1 (en) * 2014-04-04 2015-10-08 上海芯豪微电子有限公司 Cache system and method
CN105242905A (en) * 2015-10-29 2016-01-13 华为技术有限公司 Data false correlation processing method and device
CN105242905B (en) * 2015-10-29 2018-03-09 华为技术有限公司 Data false correlation processing method and device
WO2018107331A1 (en) * 2016-12-12 2018-06-21 华为技术有限公司 Computer system and memory access technology
US11093245B2 (en) 2016-12-12 2021-08-17 Huawei Technologies Co., Ltd. Computer system and memory access technology
CN109087682A (en) * 2017-06-14 2018-12-25 展讯通信(上海)有限公司 Global storage sequence detection system and method
CN109087682B (en) * 2017-06-14 2020-09-01 展讯通信(上海)有限公司 Global memory sequence detection system and method
CN112540794A (en) * 2019-09-20 2021-03-23 阿里巴巴集团控股有限公司 Processor core, processor, device and instruction processing method
CN116841614A (en) * 2023-05-29 2023-10-03 进迭时空(杭州)科技有限公司 Sequential vector scheduling method under disordered access mechanism
CN116841614B (en) * 2023-05-29 2024-03-15 进迭时空(杭州)科技有限公司 Sequential vector scheduling method under disordered access mechanism

Also Published As

Publication number Publication date
CN100545806C (en) 2009-09-30

Similar Documents

Publication Publication Date Title
US10534616B2 (en) Load-hit-load detection in an out-of-order processor
US10977047B2 (en) Hazard detection of out-of-order execution of load and store instructions in processors without using real addresses
US10776113B2 (en) Executing load-store operations without address translation hardware per load-store unit port
US10963248B2 (en) Handling effective address synonyms in a load-store unit that operates without address translation
US5651125A (en) High performance superscalar microprocessor including a common reorder buffer and common register file for both integer and floating point operations
CN100545806C (en) Method and processor for solving access dependence based on local associative lookup
US7506139B2 (en) Method and apparatus for register renaming using multiple physical register files and avoiding associative search
US6880073B2 (en) Speculative execution of instructions and processes before completion of preceding barrier operations
US10324856B2 (en) Address translation for sending real address to memory subsystem in effective address based load-store unit
US11175925B2 (en) Load-store unit with partitioned reorder queues with single cam port
US10606593B2 (en) Effective address based load store unit in out of order processors
US7600098B1 (en) Method and system for efficient implementation of very large store buffer
US20190108031A1 (en) Efficient store-forwarding with partitioned fifo store-reorder queue in out-of-order processor
US10572257B2 (en) Handling effective address synonyms in a load-store unit that operates without address translation
CA2260541C (en) Apparatus and method for tracking out of order load instructions to avoid data coherency violations in a processor
US20200142702A1 (en) Splitting load hit store table for out-of-order processor
CN111133413B (en) Load-store unit with partition reorder queue using a single CAM port

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: G-CLOUD TECHNOLOGY CO., LTD.

Free format text: FORMER OWNER: INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES

Effective date: 20140423

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 100080 HAIDIAN, BEIJING TO: 523808 DONGGUAN, GUANGDONG PROVINCE

TR01 Transfer of patent right

Effective date of registration: 20140423

Address after: 523808 Guangdong province Dongguan City Songshan Lake Science and Technology Industrial Park Building No. 14 Keyuan pine

Patentee after: G-CLOUD TECHNOLOGY Co.,Ltd.

Address before: 100080 Haidian District, Zhongguancun Academy of Sciences, South Road, No. 6, No.

Patentee before: Institute of Computing Technology, Chinese Academy of Sciences

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20151130

Address after: 028021, the Inner Mongolia Autonomous Region, Tongliao Tongliao economic and Technological Development Zone, the former building of the former armed police

Patentee after: Inner Mongolia state cloud Technology Co.,Ltd.

Address before: 523808 Guangdong province Dongguan City Songshan Lake Science and Technology Industrial Park Building No. 14 Keyuan pine

Patentee before: G-CLOUD TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20200107

Address after: 523808 19th Floor, Cloud Computing Center, Chinese Academy of Sciences, No. 1 Kehui Road, Songshan Lake Hi-tech Industrial Development Zone, Dongguan City, Guangdong Province

Patentee after: G-CLOUD TECHNOLOGY Co.,Ltd.

Address before: 028021, the Inner Mongolia Autonomous Region, Tongliao Tongliao economic and Technological Development Zone, the former building of the former armed police

Patentee before: Inner Mongolia state cloud Technology Co.,Ltd.

TR01 Transfer of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090930