CN102495724A - Data processor for improving storage instruction execution efficiency - Google Patents
Data processor for improving storage instruction execution efficiency
- Publication number
- CN102495724A
- Authority
- CN
- China
- Prior art keywords
- instruction
- storage instruction
- storage
- data
- operand
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Advance Control (AREA)
Abstract
A data processor for improving storage instruction execution efficiency comprises a register file, an instruction decoding unit, an instruction scheduling unit, a storage instruction queue, and instruction execution units. The instruction scheduling unit completes the forwarding (feedforward) of the address operand of each storage instruction and of all operands of other instructions according to the dependence information of the instruction operands, and issues each instruction whose operand forwarding is complete to the corresponding instruction execution unit. The storage instruction queue receives storage instructions from the instruction decoding unit, stores the write-back data and dependence information of each storage instruction, monitors the output data of all execution units, and completes the forwarding of the write-back data according to the dependence information of the storage instruction's data operand. The instruction execution units receive the instructions issued by the instruction scheduling unit and are divided into different execution units according to instruction type. The data processor provided by the invention effectively reduces pipeline stalls caused by true read-after-write (RAW) data dependences, improves the execution efficiency of storage instructions, and thereby improves performance.
Description
Technical field
The present invention relates to a data processor.
Background technology
In processor pipelining, pipeline hazards mainly comprise structural hazards, data hazards, and control hazards. As pipelines grow deeper, these hazards stall the pipeline and seriously degrade processor performance. Structural hazards can be resolved by adding resources. For control hazards, techniques such as branch predictors and branch target caches are used to reduce the performance loss. For data hazards, particularly true read-after-write (RAW) dependences, hardware usually adopts a forwarding (feedforward) mechanism to alleviate the performance loss.
For a storage instruction, the operands are divided into an address operand and a data operand, and either may come from the execution result of various instructions, including arithmetic instructions and memory load instructions. The simplest way to resolve the data hazards of a storage instruction is to forward the execution results of each execution unit at the instruction issue stage, and to issue the instruction to its execution unit only once all of its operand dependences have been eliminated. The advantage of this method is that it is simple and uniform with the dependence handling of other instructions; the disadvantage is that a storage instruction blocked at the issue stage may block the issue of subsequent instructions that have no dependence, which hurts performance. From the characteristics of the storage instruction itself, the write-back data is needed only when the storage instruction actually writes memory, and there is a time gap between issuing the storage instruction and writing back to memory. It has therefore been proposed that, once the memory address dependence has been eliminated and the data operand depends only on a load instruction and on no other type of instruction, the storage instruction be issued to the execution unit, where it completes the forwarding of the load result internally. The advantage of this method is that it speeds up memory copying; the disadvantage is that the storage instruction still stalls at the issue stage whenever its data operand depends on an instruction other than a load, and every place inside the execution unit that may hold a storage instruction needs storage for the write-back data and logic to forward load results, which complicates back-end routing.
Summary of the invention
To overcome the shortcomings of existing data processors when resolving true read-after-write (RAW) dependences on storage data, namely long pipeline stalls, low storage instruction execution efficiency, and limited processor performance, the present invention proposes a data processor for improving storage instruction execution efficiency that effectively reduces the pipeline stalls caused by true RAW data dependences, speeds up the execution of storage instructions, and improves processor performance.
The technical solution adopted by the present invention to solve this technical problem is:
A data processor for improving storage instruction execution efficiency, said data processor comprising:
A register file;
An instruction decoding unit, which decodes the type information and operand information of each instruction from its opcode, accesses the register file, and detects the dependences of the instruction operands;
An instruction scheduling unit, which receives all instructions from the instruction decoding unit, monitors the output data of each execution unit, completes the forwarding of storage instruction address operands and of all operands of other instructions according to the dependence information of the instruction operands, and issues each instruction whose operand forwarding is complete to the corresponding instruction execution unit;
A storage instruction queue, which receives storage instructions from the instruction decoding unit, holds the write-back data and dependence information of each storage instruction, monitors the output data of each execution unit, and completes the forwarding of the storage instruction write-back data according to the dependence information of the storage instruction's data operand;
Instruction execution units, which receive the instructions issued by the instruction scheduling unit and are divided into different execution units according to instruction type; storage instructions are issued to the storage instruction execution unit for execution, and other instructions are completed in the execution unit corresponding to their type. The storage instruction execution unit obtains the address operand of a storage instruction from the output of the instruction scheduling unit and obtains its write-back data from the output of the storage instruction queue.
"Other instructions" in the present invention refers to all kinds of arithmetic instructions, logical operation instructions, load instructions, jump instructions, and other miscellaneous instructions.
Further, when the instruction decoding unit decodes a storage instruction, the storage instruction is sent to the instruction scheduling unit; the information sent comprises the address operand read from the register file and the address operand dependence information produced by the instruction decoding unit. At the same time, an entry for the storage instruction is created in the storage instruction queue; the information created comprises the write-back data read from the register file and the data operand dependence information produced by the instruction decoding unit. The depth of the storage instruction queue is the sum of the maximum number of storage instructions that the instruction scheduling unit can schedule and the maximum number of storage instructions that the storage instruction execution unit can process simultaneously.
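As a rough illustration of the queue depth rule, the following Python sketch computes the depth from two capacities. The function name and the example capacities (4 and 2) are hypothetical; the patent gives no concrete numbers.

```python
def store_queue_depth(max_schedulable_stores: int, max_in_flight_stores: int) -> int:
    """Depth = stores the scheduling unit can hold
             + stores the storage instruction execution unit can process at once."""
    if max_schedulable_stores < 0 or max_in_flight_stores < 0:
        raise ValueError("capacities must be non-negative")
    return max_schedulable_stores + max_in_flight_stores


# Hypothetical sizing: 4 scheduler slots usable by stores, 2 stores in flight.
DEPTH = store_queue_depth(4, 2)
```

With this sizing, every storage instruction that is waiting in the scheduling unit or being processed in the execution unit has a queue entry available to it.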
Further, the dependences of a storage instruction are of two kinds: the dependence of the address operand and the dependence of the data operand.
In the instruction decoding unit, if an operand has no dependence, it is obtained directly from the register file; if it has a dependence, it must be obtained by forwarding from the output of an execution unit, performed by the instruction scheduling unit or the storage instruction queue.
In the instruction scheduling unit, when the address operand of a storage instruction has no dependence, or its dependence has been eliminated through forwarding, the storage instruction can be issued to the storage instruction execution unit.
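The issue condition in the scheduling unit can be sketched as a small predicate. This is only an illustrative model: the `PendingInstr` record and its field names are assumptions introduced here, not structures defined by the patent.

```python
from dataclasses import dataclass


@dataclass
class PendingInstr:
    """Hypothetical scheduler entry; field names are illustrative only."""
    is_store: bool
    addr_ready: bool = True   # store: address operand has no outstanding dependence
    data_ready: bool = True   # store: data operand has no outstanding dependence
    srcs_ready: bool = True   # non-store: all source operands are available


def can_issue(instr: PendingInstr) -> bool:
    """A store needs only its address operand; any other instruction needs all operands."""
    if instr.is_store:
        # The data operand is resolved separately in the storage instruction queue,
        # so it does not gate issue.
        return instr.addr_ready
    return instr.srcs_ready
```

For example, `can_issue(PendingInstr(is_store=True, addr_ready=True, data_ready=False))` is true, whereas a non-store with `srcs_ready=False` has to wait.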
Further, the forwarding of the storage instruction data operand in the storage instruction queue proceeds in parallel with the scheduling of the storage instruction in the instruction scheduling unit, and also in parallel with the processing of the storage instruction in the storage instruction execution unit. That is, before the data operand dependence is eliminated, the storage instruction can already perform preparatory operations such as address generation, physical address translation, and cache access.
Preferably, in the storage instruction execution unit, when a storage instruction is determined to be ready to write back to memory or to on-chip memory, the dependence information of the storage instruction is read from the storage instruction queue. If the data dependence has been eliminated, the data is written back to memory or on-chip memory; if it has not, the storage instruction cannot write back and continues to wait until the data dependence is eliminated.
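The write-back check on the execution unit side might look like the sketch below. The `QueueEntry` record, the `dep_cleared` flag, and the dictionary standing in for memory are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class QueueEntry:
    """Hypothetical view of one storage instruction queue entry."""
    dep_cleared: bool            # True once the data operand dependence is eliminated
    data: Optional[int] = None   # write-back data, meaningful only when dep_cleared


def try_write_back(entry: QueueEntry, address: int, memory: Dict[int, int]) -> bool:
    """Write the store's data to memory only if its data dependence is cleared.

    Returns False (and leaves memory untouched) while the store must keep waiting.
    """
    if not entry.dep_cleared:
        return False
    memory[address] = entry.data
    return True
```

Address generation and cache lookup are assumed to have finished before `try_write_back` is called; only the final update is gated on the queue entry.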
A storage instruction in the storage instruction queue retires from the queue when its data has been written back to memory or to on-chip memory.
The technical concept of the present invention is as follows: the forwarding of the address operand and of the data operand of a storage instruction is carried out separately. A storage instruction is issued to the execution unit as soon as its address operand dependence is eliminated; the forwarding of its data operand is concentrated in the storage instruction queue, and the data is fetched from this queue only when the storage instruction writes back to memory.
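To make the effect of separating the two operands concrete, here is a toy timing model. It assumes an in-order, single-issue scheduler and made-up ready cycles; none of these numbers or names come from the patent, and a real pipeline would be considerably more involved.

```python
from typing import List, NamedTuple


class Instr(NamedTuple):
    name: str
    is_store: bool
    addr_ready: int  # cycle when the address operand (or all operands of a non-store) is ready
    data_ready: int  # cycle when a store's data operand is ready (ignored for non-stores)


def issue_cycles(program: List[Instr], split: bool) -> List[int]:
    """Issue cycle of each instruction under in-order, one-per-cycle issue.

    split=False: a store waits for both operands before issue (baseline).
    split=True:  a store waits only for its address operand (scheme described above).
    """
    cycles: List[int] = []
    next_free = 0
    for ins in program:
        ready = ins.addr_ready
        if ins.is_store and not split:
            ready = max(ready, ins.data_ready)
        cycle = max(next_free, ready)
        cycles.append(cycle)
        next_free = cycle + 1
    return cycles


# A store whose data arrives late (cycle 5), followed by an independent add.
program = [
    Instr("store", True, addr_ready=0, data_ready=5),
    Instr("add", False, addr_ready=0, data_ready=0),
]
baseline = issue_cycles(program, split=False)  # [5, 6]
proposed = issue_cycles(program, split=True)   # [0, 1]
```

In both cases the store's data cannot reach memory before cycle 5; the difference in this model is that the independent `add` is no longer held behind the store at the issue stage.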
The beneficial effects of the present invention are mainly the following: pipeline stalls caused by true read-after-write (RAW) data dependences are reduced and the execution of storage instructions is sped up, thereby improving the performance of the data processor.
Description of drawings
Fig. 1 is an exemplary diagram of the data processor;
Fig. 2 is an exemplary diagram of part of the processor core;
Fig. 3 is an exemplary diagram of the storage instruction queue.
Embodiment
The present invention is further described below with reference to the accompanying drawings.
With reference to Fig. 1 and Fig. 2, a data processor for improving storage instruction execution efficiency comprises:
A register file;
An instruction decoding unit, which decodes the type information and operand information of each instruction from its opcode, accesses the register file, and detects the dependences of the instruction operands;
An instruction scheduling unit, which receives all instructions from the instruction decoding unit, monitors the output data of each execution unit, completes the forwarding of storage instruction address operands and of all operands of other instructions according to the dependence information of the instruction operands, and issues each instruction whose operand forwarding is complete to the corresponding instruction execution unit;
A storage instruction queue, which receives storage instructions from the instruction decoding unit, holds the write-back data and dependence information of each storage instruction, monitors the output data of each execution unit, and completes the forwarding of the storage instruction write-back data according to the dependence information of the storage instruction's data operand;
Instruction execution units, which receive the instructions issued by the instruction scheduling unit and are divided into different execution units according to instruction type; storage instructions are issued to the storage instruction execution unit for execution, and other instructions are completed in the execution unit corresponding to their type;
A storage instruction execution unit, which obtains the address operand of a storage instruction from the output of the instruction scheduling unit and obtains its write-back data from the output of the storage instruction queue.
Fig. 1 illustrates data processor 10. In one embodiment, data processor 10 comprises processor core 12, memory 14, bus interface unit 18, and other units 16, which are bidirectionally interconnected through bus 20; bus interface unit 18 connects external devices to data processor 10 through external bus 22.
Fig. 2 illustrates part of processor core 12 of Fig. 1. In one embodiment, the processor core comprises instruction fetch unit 30, register file 32, instruction decoding unit 34, instruction scheduling unit 36, storage instruction queue 38, storage instruction execution unit 40, other instruction execution units 42, instruction cache 44, and data cache 46. Instruction fetch unit 30 accesses instruction cache 44 to obtain the required instructions and sends them to instruction decoding unit 34. The instruction decoding unit decodes the type information and operand information from the opcode of each instruction, detects the dependences of each operand, and at the same time accesses register file 32 to obtain the instruction operands. Instruction scheduling unit 36 receives all instructions from instruction decoding unit 34, monitors the output data of each execution unit 42, and completes the forwarding of storage instruction address operands and of all operands of other instructions according to the dependence information of the instruction operands: for an operand with no dependence, its value is obtained from register file 32; for an operand with a dependence, the execution result is obtained from the output of the relevant instruction execution unit 42. A storage instruction can be issued to storage instruction execution unit 40 as soon as its address operand dependence is eliminated, whereas an instruction of any other type can be sent to its corresponding instruction execution unit 42 only when the dependences of all its operands have been eliminated. Meanwhile, storage instruction queue 38 receives each storage instruction and the dependence information of its data operand from instruction decoding unit 34, together with data from register file 32, monitors the output of each instruction execution unit 42, and obtains the write-back data according to the dependence information of the write-back data. The storage instruction completes address calculation and the access to data cache 46 in storage instruction execution unit 40. When the storage instruction needs to update data cache 46 or off-chip memory 14, it obtains the write-back data from storage instruction queue 38; while the dependence of the write-back data has not been eliminated, the storage instruction cannot write back to data cache 46 or off-chip memory 14. A storage instruction in storage instruction queue 38 can retire from the queue only when its data has been written back to memory or to on-chip memory. Therefore, the depth of storage instruction queue 38 is the sum of the maximum number of storage instructions that instruction scheduling unit 36 can schedule and the maximum number of storage instructions that storage instruction execution unit 40 can process simultaneously.
Fig. 3 illustrates storage instruction queue 38 of Fig. 2. In one embodiment, storage instruction queue 38 comprises entry content module 50, creation control module 52, forwarding control module 54, and retirement control module 56. Entry content module 50 consists of several entries; each entry corresponds to one storage instruction and holds the dependence information of its data operand and its write-back data. The number of entries is the depth of storage instruction queue 38. Creation control module 52 is responsible for creating each entry of entry content module 50: when creation control module 52 receives a storage instruction from instruction decoding unit 34, it sends data creation signal 60 to entry content module 50, which saves the dependence information and data of the storage instruction's data operand in program order. Once entry content module 50 has no available entry, it sends entry-blocking signal 62 to creation control module 52, which is used to block instruction issue from instruction decoding unit 34. Forwarding control module 54 monitors the output of each execution unit 42 and matches data dependence information 64 of each entry in entry content module 50 against the result information at the outputs of the other execution units 42. Once a match succeeds, forwarding control module 54 sends data update signal 66 to entry content module 50, and the matched result is used to update the write-back data of the storage instruction. Retirement control module 56 is responsible for passing write-back data 68 from entry content module 50 to storage instruction execution unit 40 in program order. Once storage instruction execution unit 40 has successfully received the write-back data, it sends a reception-success signal to retirement control module 56, which then sends entry retirement signal 70 to entry content module 50, and the entry is removed from entry content module 50.
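The behaviour of the four modules of Fig. 3 can be approximated in software as follows. This is a minimal sketch under several assumptions: dependences are represented by a hypothetical integer producer tag (`wait_tag`), the handshake between retirement control module 56 and the execution unit is collapsed into a single `retire` call, and all class and method names are illustrative rather than taken from the patent.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class Entry:
    data: Optional[int]       # write-back data; valid once wait_tag is None
    wait_tag: Optional[int]   # hypothetical tag of the producer the data operand depends on


class StoreQueue:
    """Software stand-in for storage instruction queue 38 (entries kept in program order)."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.entries: Deque[Entry] = deque()

    @property
    def full(self) -> bool:
        # Plays the role of entry-blocking signal 62: decode must stall when True.
        return len(self.entries) >= self.depth

    def create(self, data: Optional[int], wait_tag: Optional[int]) -> bool:
        # Creation control (52): allocate the next entry in program order.
        if self.full:
            return False
        self.entries.append(Entry(data, wait_tag))
        return True

    def forward(self, tag: int, result: int) -> int:
        # Forwarding control (54): match an execution unit result against waiting entries.
        updated = 0
        for entry in self.entries:
            if entry.wait_tag == tag:
                entry.data, entry.wait_tag = result, None
                updated += 1
        return updated

    def retire(self) -> Optional[int]:
        # Retirement control (56): hand the oldest entry's data to the store execution
        # unit and remove it; returns None if the queue is empty or the head still waits.
        if not self.entries or self.entries[0].wait_tag is not None:
            return None
        return self.entries.popleft().data
```

A typical sequence would create an entry with `wait_tag=5`, see `retire()` return `None` while the producer is still executing, then call `forward(5, 42)` when the result appears at an execution unit output, after which `retire()` yields 42.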
Claims (10)
1. A data processor for improving storage instruction execution efficiency, characterized in that said data processor comprises:
A register file;
An instruction decoding unit, which decodes the type information and operand information of each instruction from its opcode, accesses the register file, and detects the dependences of the instruction operands;
An instruction scheduling unit, which receives all instructions from the instruction decoding unit, monitors the output data of each execution unit, completes the forwarding of storage instruction address operands and of all operands of other instructions according to the dependence information of the instruction operands, and issues each instruction whose operand forwarding is complete to the corresponding instruction execution unit;
A storage instruction queue, which receives storage instructions from the instruction decoding unit, holds the write-back data and dependence information of each storage instruction, monitors the output data of each execution unit, and completes the forwarding of the storage instruction write-back data according to the dependence information of the storage instruction's data operand;
Instruction execution units, which receive the instructions issued by the instruction scheduling unit and are divided into different execution units according to instruction type; storage instructions are issued to the storage instruction execution unit for execution, and other instructions are completed in the execution unit corresponding to their type; said storage instruction execution unit obtains the address operand of a storage instruction from the output of the instruction scheduling unit and obtains its write-back data from the output of the storage instruction queue.
2. The data processor for improving storage instruction execution efficiency according to claim 1, characterized in that: when the instruction decoding unit decodes a storage instruction, the storage instruction is sent to the instruction scheduling unit, and the information sent comprises the address operand read from the register file and the address operand dependence information produced by the instruction decoding unit; at the same time, an entry for the storage instruction is created in the storage instruction queue, and the information created comprises the write-back data read from the register file and the data operand dependence information produced by the instruction decoding unit.
3. The data processor for improving storage instruction execution efficiency according to claim 1 or 2, characterized in that: the depth of the storage instruction queue is the sum of the maximum number of storage instructions that the instruction scheduling unit can schedule and the maximum number of storage instructions that the storage instruction execution unit can process simultaneously.
4. The data processor for improving storage instruction execution efficiency according to claim 1 or 2, characterized in that: said dependence means that the current instruction needs to use data produced by a preceding instruction, and the dependences of a storage instruction are of two kinds: the dependence of the address operand and the dependence of the data operand.
5. The data processor for improving storage instruction execution efficiency according to claim 1 or 2, characterized in that: in said instruction decoding unit, if an operand has no dependence, it is obtained directly from the register file; if it has a dependence, it must be obtained by forwarding from the output of an execution unit, performed by the instruction scheduling unit or the storage instruction queue.
6. The data processor for improving storage instruction execution efficiency according to claim 1 or 2, characterized in that: in said instruction scheduling unit, when the address operand of a storage instruction has no dependence, or its dependence has been eliminated through forwarding, the storage instruction is issued to the storage instruction execution unit.
7. The data processor for improving storage instruction execution efficiency according to claim 1 or 2, characterized in that: the forwarding of the data operand of said storage instruction in the storage instruction queue proceeds in parallel with the scheduling of the storage instruction in the instruction scheduling unit.
8. The data processor for improving storage instruction execution efficiency according to claim 7, characterized in that: the forwarding of the data operand of said storage instruction in the storage instruction queue proceeds in parallel with the processing of the storage instruction in the storage instruction execution unit.
9. The data processor for improving storage instruction execution efficiency according to claim 1 or 2, characterized in that: in the storage instruction execution unit, when a storage instruction is determined to be ready to write back to memory or to on-chip memory, the dependence information of the storage instruction is read from the storage instruction queue; if the data dependence has been eliminated, the data is written back to memory or on-chip memory; if it has not, the storage instruction cannot write back and continues to wait until the data dependence is eliminated.
10. The data processor for improving storage instruction execution efficiency according to claim 1 or 2, characterized in that: a storage instruction in the storage instruction queue retires from the queue when its data has been written back to memory or to on-chip memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011103463410A CN102495724A (en) | 2011-11-04 | 2011-11-04 | Data processor for improving storage instruction execution efficiency |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011103463410A CN102495724A (en) | 2011-11-04 | 2011-11-04 | Data processor for improving storage instruction execution efficiency |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102495724A true CN102495724A (en) | 2012-06-13 |
Family
ID=46187552
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011103463410A Pending CN102495724A (en) | 2011-11-04 | 2011-11-04 | Data processor for improving storage instruction execution efficiency |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102495724A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017185385A1 (en) * | 2016-04-26 | 2017-11-02 | 北京中科寒武纪科技有限公司 | Apparatus and method for executing vector merging operation |
WO2017185384A1 (en) * | 2016-04-26 | 2017-11-02 | 北京中科寒武纪科技有限公司 | Apparatus and method for executing vector circular shift operation |
CN108228242A (en) * | 2018-02-06 | 2018-06-29 | 江苏华存电子科技有限公司 | A kind of configurable and tool elasticity instruction scheduler |
CN108614736A (en) * | 2018-04-13 | 2018-10-02 | 杭州中天微系统有限公司 | Realize device and processor that resource index is replaced |
CN115629806A (en) * | 2022-12-19 | 2023-01-20 | 苏州浪潮智能科技有限公司 | Method, system, equipment and storage medium for processing instruction |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101719055A (en) * | 2009-12-03 | 2010-06-02 | 杭州中天微系统有限公司 | Quick implementation, loading and storage command module |
CN102141904A (en) * | 2011-03-31 | 2011-08-03 | 杭州中天微系统有限公司 | Data processor supporting interrupt shielding instruction |
US20110238964A1 (en) * | 2010-03-29 | 2011-09-29 | Renesas Electronics Corporation | Data processor |
- 2011-11-04 CN CN2011103463410A patent/CN102495724A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101719055A (en) * | 2009-12-03 | 2010-06-02 | 杭州中天微系统有限公司 | Quick implementation, loading and storage command module |
US20110238964A1 (en) * | 2010-03-29 | 2011-09-29 | Renesas Electronics Corporation | Data processor |
CN102141904A (en) * | 2011-03-31 | 2011-08-03 | 杭州中天微系统有限公司 | Data processor supporting interrupt shielding instruction |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017185385A1 (en) * | 2016-04-26 | 2017-11-02 | 北京中科寒武纪科技有限公司 | Apparatus and method for executing vector merging operation |
WO2017185384A1 (en) * | 2016-04-26 | 2017-11-02 | 北京中科寒武纪科技有限公司 | Apparatus and method for executing vector circular shift operation |
US10761991B2 (en) | 2016-04-26 | 2020-09-01 | Cambricon Technologies Corporation Limited | Apparatus and methods for circular shift operations |
US11157593B2 (en) | 2016-04-26 | 2021-10-26 | Cambricon Technologies Corporation Limited | Apparatus and methods for combining vectors |
CN108228242A (en) * | 2018-02-06 | 2018-06-29 | 江苏华存电子科技有限公司 | A kind of configurable and tool elasticity instruction scheduler |
WO2019153683A1 (en) * | 2018-02-06 | 2019-08-15 | 江苏华存电子科技有限公司 | Configurable and flexible instruction scheduler |
CN108614736A (en) * | 2018-04-13 | 2018-10-02 | 杭州中天微系统有限公司 | Realize device and processor that resource index is replaced |
US11340905B2 (en) | 2018-04-13 | 2022-05-24 | C-Sky Microsystems Co., Ltd. | Device and processor for implementing resource index replacement |
US11734014B2 (en) | 2018-04-13 | 2023-08-22 | C-Sky Microsystems Co., Ltd. | Device and processor for implementing resource index replacement |
CN115629806A (en) * | 2022-12-19 | 2023-01-20 | 苏州浪潮智能科技有限公司 | Method, system, equipment and storage medium for processing instruction |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR101817397B1 (en) | Inter-architecture compatability module to allow code module of one architecture to use library module of another architecture | |
CN100573446C (en) | The technology of execute store disambiguation | |
CN104204990B (en) | Accelerate the apparatus and method of operation in the processor using shared virtual memory | |
KR880002660B1 (en) | Central processor | |
JP6006247B2 (en) | Processor, method, system, and program for relaxing synchronization of access to shared memory | |
KR101804908B1 (en) | Method and apparatus for cache occupancy determination and instruction scheduling | |
CN102662634B (en) | Memory access and execution device for non-blocking transmission and execution | |
US9092346B2 (en) | Speculative cache modification | |
CN102495724A (en) | Data processor for improving storage instruction execution efficiency | |
CN101201811B (en) | Encryption-decryption coprocessor for SOC | |
US9904553B2 (en) | Method and apparatus for implementing dynamic portbinding within a reservation station | |
TWI658407B (en) | Managing instruction order in a processor pipeline | |
CN102640226A (en) | Memory having internal processors and methods of controlling memory access | |
WO2016100142A2 (en) | Advanced processor architecture | |
CN101477454A (en) | Out-of-order execution control device of built-in processor | |
CN109196485A (en) | Method and apparatus for maintaining the data consistency in non-homogeneous computing device | |
US9940139B2 (en) | Split-level history buffer in a computer processing unit | |
EP3716055A1 (en) | System, apparatus and method for symbolic store address generation for data-parallel processor | |
CN110908716B (en) | Method for implementing vector aggregation loading instruction | |
CN104221005A (en) | Mechanism for issuing requests to accelerator from multiple threads | |
CN105247479A (en) | Instruction order enforcement pairs of instructions, processors, methods, and systems | |
CN106445472B (en) | A kind of character manipulation accelerated method, device, chip, processor | |
CN110515659B (en) | Atomic instruction execution method and device | |
WO2024131071A1 (en) | Instruction processing method and system, device, and non-volatile readable storage medium | |
US11586462B2 (en) | Memory access request for a memory protocol |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20120613 |