CN102495724A - Data processor for improving storage instruction execution efficiency - Google Patents

Data processor for improving storage instruction execution efficiency

Info

Publication number
CN102495724A
Authority
CN
China
Prior art keywords
instruction
storage instruction
storage
data
operand
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011103463410A
Other languages
Chinese (zh)
Inventor
葛海通
项晓燕
杨军
陈志坚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou C Sky Microsystems Co Ltd
Original Assignee
Hangzhou C Sky Microsystems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou C Sky Microsystems Co Ltd filed Critical Hangzhou C Sky Microsystems Co Ltd
Priority to CN2011103463410A priority Critical patent/CN102495724A/en
Publication of CN102495724A publication Critical patent/CN102495724A/en
Pending legal-status Critical Current

Landscapes

  • Advance Control (AREA)

Abstract

A data processor that improves the execution efficiency of store instructions comprises a register file, an instruction decoding unit, an instruction scheduling unit, a store instruction queue, and instruction execution units. The instruction scheduling unit forwards the address operand of each store instruction and all operands of other instructions according to the dependency information of the instruction operands, and issues each instruction whose operand forwarding is complete to the corresponding instruction execution unit. The store instruction queue receives store instructions from the instruction decoding unit, holds the write-back data and dependency information of each store instruction, monitors the output data of all execution units, and forwards the write-back data of each store instruction according to the dependency information of its data operand. The instruction execution units receive the instructions issued by the instruction scheduling unit and are divided into different execution units according to instruction type. The data processor effectively reduces pipeline stalls caused by true read-after-write (RAW) data dependencies, improves the execution efficiency of store instructions, and thereby improves performance.

Description

A data processor for improving the execution efficiency of store instructions
Technical field
The present invention relates to a data processor.
Background
In pipelined processors, pipeline hazards fall into three main classes: structural hazards, data hazards, and control hazards. As pipelines grow deeper, these hazards stall the pipeline and seriously degrade processor performance. Structural hazards can be resolved by adding resources. For control hazards, techniques such as branch predictors and branch target buffers are used to reduce the performance loss. For data hazards, and in particular true read-after-write (RAW) dependencies, hardware usually relies on a forwarding (bypass) mechanism to mitigate the performance loss.
The operands of a store instruction are divided into an address operand and a data operand, and either may come from the execution result of another instruction, including arithmetic instructions and memory load instructions. The simplest way to resolve the data hazards of a store instruction is to forward the execution results of each execution unit at the issue stage and to issue the store to its execution unit only once the data dependencies of all its operands have been eliminated. The advantage of this method is that it is simple and uniform with the dependency handling of other instructions. The drawback is that a store blocked at the issue stage may block the issue of subsequent instructions that have no dependencies, which hurts performance. From the nature of the store instruction itself, the write-back data is needed only when the store actually writes memory, and there is a time gap between the issue of a store and its write-back to memory. It has therefore been proposed that, once the address dependency has been eliminated and the data operand depends only on a load instruction and not on any other type of instruction, the store be issued to the execution unit, where it completes the forwarding from the load instruction internally. The advantage of this method is that it speeds up memory copying. The drawbacks are that the store still stalls at the issue stage when its data operand depends on an instruction of any type other than a load, and that every place inside the execution unit that can hold a store instruction needs storage for the write-back data and logic to forward load results, which complicates back-end routing.
Summary of the invention
Existing data processors, when handling true read-after-write dependencies on store data, suffer from long pipeline stalls and low store execution efficiency, which limits processor performance. To overcome these shortcomings, the present invention provides a data processor that effectively reduces pipeline stalls caused by true read-after-write data dependencies, speeds up the execution of store instructions, and improves processor performance.
The technical solution adopted by the present invention to solve this technical problem is:
A data processor for improving the execution efficiency of store instructions, the data processor comprising:
A register file;
An instruction decoding unit, which decodes the type information and operand information of an instruction from its opcode, accesses the register file, and detects the dependencies of the instruction operands;
An instruction scheduling unit, which receives all instructions from the instruction decoding unit, monitors the output data of each execution unit, forwards the address operand of each store instruction and all operands of other instructions according to the dependency information of the instruction operands, and issues each instruction whose operand forwarding is complete to the corresponding instruction execution unit;
A store instruction queue, which receives store instructions from the instruction decoding unit, holds the write-back data and dependency information of each store instruction, monitors the output data of each execution unit, and forwards the write-back data of each store instruction according to the dependency information of its data operand;
Instruction execution units, which receive the instructions issued by the instruction scheduling unit and are divided into different execution units according to instruction type: store instructions are issued to the store instruction execution unit, and other instructions are executed in the execution unit corresponding to their type. The store instruction execution unit obtains the address operand of a store instruction from the output of the instruction scheduling unit and obtains its write-back data from the output of the store instruction queue.
"Other instructions" in the present invention means arithmetic instructions, logical instructions, load instructions, jump instructions, and other miscellaneous instructions.
Further, when the instruction decoding unit decodes a store instruction, the store instruction is sent to the instruction scheduling unit; the information sent comprises the address operand read from the register file and the dependency information of the address operand generated by the instruction decoding unit. At the same time, an entry for the store instruction is created in the store instruction queue; the information created comprises the write-back data read from the register file and the dependency information of the data operand generated by the instruction decoding unit. The depth of the store instruction queue is the sum of the maximum number of store instructions that the instruction scheduling unit can schedule and the maximum number of store instructions that the store instruction execution unit can process simultaneously.
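The sizing rule for the queue depth can be read as a small calculation. The following is a minimal sketch in Python and is not part of the patent; the function name and the example capacities (4 and 2) are assumptions chosen only for illustration.

```python
def store_queue_depth(max_stores_in_scheduler: int,
                      max_stores_in_exec_unit: int) -> int:
    """Depth rule stated in the text: the sum of the stores the scheduling
    unit can hold and the stores the store execution unit can process."""
    if max_stores_in_scheduler < 0 or max_stores_in_exec_unit < 0:
        raise ValueError("capacities must be non-negative")
    return max_stores_in_scheduler + max_stores_in_exec_unit


# Hypothetical capacities: 4 stores schedulable, 2 stores executing at once.
depth = store_queue_depth(4, 2)
print(depth)  # prints 6
```

Under this rule, a store that sits in the scheduling unit or in the store execution unit should always find a queue entry, since the text creates the entry at decode and frees it only at retirement.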
Further, the dependencies of a store instruction are of two kinds: dependencies of the address operand and dependencies of the data operand.
In the instruction decoding unit, if an operand has no dependency, it is read directly from the register file; if it has a dependency, it must be obtained by forwarding from the output of an execution unit, through the instruction scheduling unit or the store instruction queue.
In the instruction scheduling unit, once the address operand of a store instruction has no dependency, or its dependency has been eliminated by forwarding, the store instruction can be issued to the store instruction execution unit.
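The issue condition in the scheduling unit can be sketched as a single predicate. This is an illustrative Python model, not the patent's logic design; the `Instr` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    kind: str                 # "store" or any other instruction type
    addr_ready: bool = True   # store only: address operand has no pending dependency
    data_ready: bool = True   # store only: data operand has no pending dependency
    srcs_ready: bool = True   # non-store: all source operands are available


def can_issue(instr: Instr) -> bool:
    """A store waits only for its address operand; every other
    instruction waits for all of its operands."""
    if instr.kind == "store":
        return instr.addr_ready   # data_ready is deliberately ignored here
    return instr.srcs_ready


store = Instr(kind="store", addr_ready=True, data_ready=False)
add = Instr(kind="add", srcs_ready=False)
print(can_issue(store), can_issue(add))  # prints: True False
```

The store issues even though its data operand is still pending, because that operand is tracked in the store instruction queue rather than in the scheduling unit.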
Further, the forwarding of the store data operand in the store instruction queue proceeds in parallel with the scheduling of the store instruction in the instruction scheduling unit, and also in parallel with the processing of the store instruction in the store instruction execution unit. That is, before the data operand dependency is eliminated, the store instruction can already perform preparatory operations such as address generation, physical address translation, and cache access.
Preferably, in the store instruction execution unit, when a store instruction is determined to be ready to write back to memory or to on-chip memory, the dependency information of the store instruction is read from the store instruction queue. If the data dependency has been eliminated, the store writes back to memory or on-chip memory; if not, the store cannot write back and continues to wait until the data dependency is eliminated.
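The write-back check described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: memory is modeled as a dictionary, and `QueueEntry`, `data_pending`, and `try_write_back` are hypothetical names that do not appear in the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class QueueEntry:
    data: Optional[int]   # write-back data, None until it is known
    data_pending: bool    # True while the data operand still has a dependency


def try_write_back(entry: QueueEntry, memory: Dict[int, int], address: int) -> bool:
    """Write the store data only if its dependency is resolved;
    otherwise leave memory untouched and report that the store must wait."""
    if entry.data_pending:
        return False
    memory[address] = entry.data
    return True


memory: Dict[int, int] = {}
entry = QueueEntry(data=None, data_pending=True)
print(try_write_back(entry, memory, 0x100))  # prints False: still waiting

# Forwarding in the store queue later supplies the value.
entry.data, entry.data_pending = 7, False
print(try_write_back(entry, memory, 0x100))  # prints True
print(memory[0x100])                         # prints 7
```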
A store instruction in the store instruction queue retires from the queue when its data is written back to memory or to on-chip memory.
The technical concept of the present invention is that the forwarding of the address operand and of the data operand of a store instruction is carried out separately. A store instruction is issued to the execution unit as soon as its address operand dependency is eliminated; the forwarding of its data operand is concentrated in the store instruction queue, and the data is fetched from this queue only when the store writes back to memory.
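The effect of separating the two operands can be illustrated with a toy timing model. It is only a sketch under simplifying assumptions that are not in the patent: cycles are integers, the preparatory work in the store execution unit (address generation, translation, cache access) takes a fixed `prep_cycles`, and all structural limits are ignored. The cycle numbers in the example are hypothetical.

```python
from typing import Tuple


def issue_all_operands(addr_ready: int, data_ready: int,
                       prep_cycles: int) -> Tuple[int, int]:
    """Scheme from the background: the store issues only after both
    operands are available. Returns (issue cycle, write-back cycle)."""
    issue = max(addr_ready, data_ready)
    return issue, issue + prep_cycles


def issue_on_address(addr_ready: int, data_ready: int,
                     prep_cycles: int) -> Tuple[int, int]:
    """Separated scheme: the store issues when the address operand is
    available; write-back waits for both the preparatory work and the data."""
    issue = addr_ready
    return issue, max(issue + prep_cycles, data_ready)


# Hypothetical case: address known at cycle 1, data produced at cycle 5,
# three cycles of preparatory work before write-back is possible.
print(issue_all_operands(1, 5, 3))  # prints (5, 8)
print(issue_on_address(1, 5, 3))    # prints (1, 5)
```

In this example the store leaves the issue stage four cycles earlier, so later independent instructions are not held behind it, and the preparatory work overlaps with the wait for the data.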
The beneficial effects of the present invention are mainly that pipeline stalls caused by true read-after-write data dependencies are reduced and the execution of store instructions is sped up, thereby improving the performance of the data processor.
Brief description of the drawings
Fig. 1 is an example diagram of a data processor;
Fig. 2 is an example diagram of part of the processor core;
Fig. 3 is an example diagram of the store instruction queue.
Embodiment
The present invention is further described below with reference to the accompanying drawings.
With reference to Fig. 1 and Fig. 2, a data processor for improving the execution efficiency of store instructions comprises:
A register file;
An instruction decoding unit, which decodes the type information and operand information of an instruction from its opcode, accesses the register file, and detects the dependencies of the instruction operands;
An instruction scheduling unit, which receives all instructions from the instruction decoding unit, monitors the output data of each execution unit, forwards the address operand of each store instruction and all operands of other instructions according to the dependency information of the instruction operands, and issues each instruction whose operand forwarding is complete to the corresponding instruction execution unit;
A store instruction queue, which receives store instructions from the instruction decoding unit, holds the write-back data and dependency information of each store instruction, monitors the output data of each execution unit, and forwards the write-back data of each store instruction according to the dependency information of its data operand;
Instruction execution units, which receive the instructions issued by the instruction scheduling unit and are divided into different execution units according to instruction type: store instructions are issued to the store instruction execution unit, and other instructions are executed in the execution unit corresponding to their type;
A store instruction execution unit, which obtains the address operand of a store instruction from the output of the instruction scheduling unit and obtains its write-back data from the output of the store instruction queue.
Fig. 1 illustrates a data processor 10. In one embodiment, the data processor 10 comprises a processor core 12, a memory 14, a bus interface unit 18, and other units 16, which are bidirectionally connected to one another through a bus 20; the bus interface unit 18 connects external devices to the data processor 10 through an external bus 22.
Fig. 2 illustrates part of the processor core 12 of Fig. 1. In one embodiment, the processor core comprises an instruction fetch unit 30, a register file 32, an instruction decoding unit 34, an instruction scheduling unit 36, a store instruction queue 38, a store instruction execution unit 40, other instruction execution units 42, an instruction cache 44, and a data cache 46. The instruction fetch unit 30 accesses the instruction cache 44 to obtain the required instructions and sends them to the instruction decoding unit 34. The instruction decoding unit decodes the type information and operand information of each instruction from its opcode, detects the dependency of each operand, and at the same time accesses the register file 32 to obtain the instruction's operands. The instruction scheduling unit 36 receives all instructions from the instruction decoding unit 34, monitors the output data of each execution unit 42, and forwards the address operand of each store instruction and all operands of other instructions according to the dependency information of the instruction operands: for an operand with no dependency, the value is obtained from the register file 32; for an operand with a dependency, the execution result is obtained from the output of the relevant instruction execution unit 42. A store instruction can be issued to the store instruction execution unit 40 as soon as its address operand dependency is eliminated, whereas an instruction of any other type can be sent to the corresponding instruction execution unit 42 only when the dependencies of all its operands have been eliminated. Meanwhile, the store instruction queue 38 receives the store instruction from the instruction decoding unit 34, together with the dependency information of its data operand and the data from the register file 32, monitors the output of each instruction execution unit 42, and obtains the write-back data according to the dependency information of the write-back data. The store instruction completes address calculation and the access to the data cache 46 in the store instruction execution unit 40. When the store needs to update the data cache 46 or the off-chip memory 14, it obtains the write-back data from the store instruction queue 38; while the dependency of the write-back data has not been eliminated, the store cannot write back to the data cache 46 or the off-chip memory 14. A store instruction in the store instruction queue 38 can retire from the queue only when its data has been written back to memory or to on-chip memory. Therefore, the depth of the store instruction queue 38 is the sum of the maximum number of store instructions that the instruction scheduling unit 36 can schedule and the maximum number of store instructions that the store instruction execution unit 40 can process simultaneously.
Fig. 3 illustrates the store instruction queue 38 of Fig. 2. In one embodiment, the store instruction queue 38 comprises an entry content module 50, a creation control module 52, a forwarding control module 54, and a retirement control module 56. The entry content module 50 consists of a number of entries; each entry holds the dependency information of the data operand of one store instruction and its write-back data. The number of entries is the depth of the store instruction queue 38. The creation control module 52 is responsible for creating each entry of the entry content module 50: when it receives a store instruction from the instruction decoding unit 34, it sends a data creation signal 60 to the entry content module 50, which saves the dependency information and data of the store's data operand in program order. Once the entry content module 50 has no free entry, it sends an entry-full signal 62 to the creation control module 52, which is used to block instruction issue from the instruction decoding unit 34. The forwarding control module 54 monitors the output of each execution unit 42 and matches the data dependency information 64 of each entry in the entry content module 50 against the result information at the outputs of the other execution units 42. On a match, the forwarding control module 54 sends a data update signal 66 to the entry content module 50, updating the store's write-back data with the matched result. The retirement control module 56 is responsible for delivering the write-back data 68 of the entry content module 50 to the store instruction execution unit 40 in program order. Once the store instruction execution unit 40 has successfully received the write-back data, it sends a reception-success signal to the retirement control module 56, which then sends an entry retirement signal 70 to the entry content module 50 to remove the entry from the entry content module 50.
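The behavior of the four modules of Fig. 3 can be approximated in software. The following Python sketch is a behavioral illustration only, not the patent's hardware design. The class and method names are hypothetical, and the dependency information is modeled as a string tag naming the producing instruction, which is an assumption: the text does not specify how dependency information is encoded.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class Entry:
    data: Optional[int]      # write-back data, valid once dep_tag is None
    dep_tag: Optional[str]   # hypothetical tag of the producer, None if resolved


class StoreQueue:
    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.entries: Deque[Entry] = deque()   # kept in program order

    def full(self) -> bool:
        """Plays the role of the entry-full signal 62."""
        return len(self.entries) >= self.depth

    def create(self, data: Optional[int], dep_tag: Optional[str]) -> bool:
        """Creation control: allocate an entry, or refuse when full."""
        if self.full():
            return False
        self.entries.append(Entry(data, dep_tag))
        return True

    def snoop(self, tag: str, value: int) -> int:
        """Forwarding control: match an execution-unit result against every
        entry and update those that were waiting for it."""
        updated = 0
        for entry in self.entries:
            if entry.dep_tag == tag:
                entry.data, entry.dep_tag = value, None
                updated += 1
        return updated

    def retire_head(self) -> Optional[int]:
        """Retirement control: hand over the oldest entry's data in order,
        or return None while that data is still pending."""
        if not self.entries or self.entries[0].dep_tag is not None:
            return None
        return self.entries.popleft().data


q = StoreQueue(depth=2)
q.create(11, None)           # store whose data came from the register file
q.create(None, "mul#7")      # store waiting on a hypothetical producer
print(q.create(0, None))     # prints False: queue full, decode must stall
print(q.retire_head())       # prints 11
print(q.retire_head())       # prints None: data still pending
q.snoop("mul#7", 42)         # result appears at an execution-unit output
print(q.retire_head())       # prints 42
```

Returning `None` from `retire_head` stands in for a store that must keep waiting; a hardware implementation would use a separate valid or ready signal instead.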

Claims (10)

1. A data processor for improving the execution efficiency of store instructions, characterized in that the data processor comprises:
A register file;
An instruction decoding unit, configured to decode the type information and operand information of an instruction from its opcode, access the register file, and detect the dependencies of the instruction operands;
An instruction scheduling unit, configured to receive all instructions from the instruction decoding unit, monitor the output data of each execution unit, forward the address operand of each store instruction and all operands of other instructions according to the dependency information of the instruction operands, and issue each instruction whose operand forwarding is complete to the corresponding instruction execution unit;
A store instruction queue, configured to receive store instructions from the instruction decoding unit, hold the write-back data and dependency information of each store instruction, monitor the output data of each execution unit, and forward the write-back data of each store instruction according to the dependency information of its data operand;
Instruction execution units, configured to receive the instructions issued by the instruction scheduling unit and divided into different execution units according to instruction type, wherein store instructions are issued to the store instruction execution unit and other instructions are executed in the execution unit corresponding to their type; the store instruction execution unit obtains the address operand of a store instruction from the output of the instruction scheduling unit and obtains its write-back data from the output of the store instruction queue.
2. The data processor for improving the execution efficiency of store instructions according to claim 1, characterized in that: when the instruction decoding unit decodes a store instruction, the store instruction is sent to the instruction scheduling unit, the information sent comprising the address operand read from the register file and the dependency information of the address operand generated by the instruction decoding unit; and at the same time an entry for the store instruction is created in the store instruction queue, the information created comprising the write-back data read from the register file and the dependency information of the data operand generated by the instruction decoding unit.
3. The data processor for improving the execution efficiency of store instructions according to claim 1 or 2, characterized in that: the depth of the store instruction queue is the sum of the maximum number of store instructions that the instruction scheduling unit can schedule and the maximum number of store instructions that the store instruction execution unit can process simultaneously.
4. The data processor for improving the execution efficiency of store instructions according to claim 1 or 2, characterized in that: the dependency means that the current instruction needs to use data produced by a preceding instruction, and the dependencies of a store instruction are of two kinds: dependencies of the address operand and dependencies of the data operand.
5. The data processor for improving the execution efficiency of store instructions according to claim 1 or 2, characterized in that: in the instruction decoding unit, if an operand has no dependency, it is read directly from the register file; if it has a dependency, it must be obtained by forwarding from the output of an execution unit, through the instruction scheduling unit or the store instruction queue.
6. The data processor for improving the execution efficiency of store instructions according to claim 1 or 2, characterized in that: in the instruction scheduling unit, when the address operand of a store instruction has no dependency, or its dependency has been eliminated by forwarding, the store instruction is issued to the store instruction execution unit.
7. The data processor for improving the execution efficiency of store instructions according to claim 1 or 2, characterized in that: the forwarding of the data operand of the store instruction in the store instruction queue proceeds in parallel with the scheduling of the store instruction in the instruction scheduling unit.
8. The data processor for improving the execution efficiency of store instructions according to claim 7, characterized in that: the forwarding of the data operand of the store instruction in the store instruction queue proceeds in parallel with the processing of the store instruction in the store instruction execution unit.
9. The data processor for improving the execution efficiency of store instructions according to claim 1 or 2, characterized in that: in the store instruction execution unit, when a store instruction is determined to be ready to write back to memory or to on-chip memory, the dependency information of the store instruction is read from the store instruction queue; if the data dependency has been eliminated, the store writes back to memory or on-chip memory; if the dependency has not been eliminated, the store cannot write back to memory or on-chip memory and continues to wait until the data dependency is eliminated.
10. The data processor for improving the execution efficiency of store instructions according to claim 1 or 2, characterized in that: a store instruction in the store instruction queue retires from the queue when its data is written back to memory or to on-chip memory.
CN2011103463410A 2011-11-04 2011-11-04 Data processor for improving storage instruction execution efficiency Pending CN102495724A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011103463410A CN102495724A (en) 2011-11-04 2011-11-04 Data processor for improving storage instruction execution efficiency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011103463410A CN102495724A (en) 2011-11-04 2011-11-04 Data processor for improving storage instruction execution efficiency

Publications (1)

Publication Number Publication Date
CN102495724A true CN102495724A (en) 2012-06-13

Family

ID=46187552

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011103463410A Pending CN102495724A (en) 2011-11-04 2011-11-04 Data processor for improving storage instruction execution efficiency

Country Status (1)

Country Link
CN (1) CN102495724A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719055A (en) * 2009-12-03 2010-06-02 杭州中天微系统有限公司 Quick implementation, loading and storage command module
CN102141904A (en) * 2011-03-31 2011-08-03 杭州中天微系统有限公司 Data processor supporting interrupt shielding instruction
US20110238964A1 (en) * 2010-03-29 2011-09-29 Renesas Electronics Corporation Data processor

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017185385A1 (en) * 2016-04-26 2017-11-02 北京中科寒武纪科技有限公司 Apparatus and method for executing vector merging operation
WO2017185384A1 (en) * 2016-04-26 2017-11-02 北京中科寒武纪科技有限公司 Apparatus and method for executing vector circular shift operation
US10761991B2 (en) 2016-04-26 2020-09-01 Cambricon Technologies Corporation Limited Apparatus and methods for circular shift operations
US11157593B2 (en) 2016-04-26 2021-10-26 Cambricon Technologies Corporation Limited Apparatus and methods for combining vectors
CN108228242A (en) * 2018-02-06 2018-06-29 江苏华存电子科技有限公司 A kind of configurable and tool elasticity instruction scheduler
WO2019153683A1 (en) * 2018-02-06 2019-08-15 江苏华存电子科技有限公司 Configurable and flexible instruction scheduler
CN108614736A (en) * 2018-04-13 2018-10-02 杭州中天微系统有限公司 Realize device and processor that resource index is replaced
US11340905B2 (en) 2018-04-13 2022-05-24 C-Sky Microsystems Co., Ltd. Device and processor for implementing resource index replacement
US11734014B2 (en) 2018-04-13 2023-08-22 C-Sky Microsystems Co., Ltd. Device and processor for implementing resource index replacement
CN115629806A (en) * 2022-12-19 2023-01-20 苏州浪潮智能科技有限公司 Method, system, equipment and storage medium for processing instruction

Similar Documents

Publication Publication Date Title
KR101817397B1 (en) Inter-architecture compatability module to allow code module of one architecture to use library module of another architecture
CN100573446C (en) The technology of execute store disambiguation
CN104204990B (en) Accelerate the apparatus and method of operation in the processor using shared virtual memory
KR880002660B1 (en) Central processor
JP6006247B2 (en) Processor, method, system, and program for relaxing synchronization of access to shared memory
KR101804908B1 (en) Method and apparatus for cache occupancy determination and instruction scheduling
CN102662634B (en) Memory access and execution device for non-blocking transmission and execution
US9092346B2 (en) Speculative cache modification
CN102495724A (en) Data processor for improving storage instruction execution efficiency
CN101201811B (en) Encryption-decryption coprocessor for SOC
US9904553B2 (en) Method and apparatus for implementing dynamic portbinding within a reservation station
TWI658407B (en) Managing instruction order in a processor pipeline
CN102640226A (en) Memory having internal processors and methods of controlling memory access
WO2016100142A2 (en) Advanced processor architecture
CN101477454A (en) Out-of-order execution control device of built-in processor
CN109196485A (en) Method and apparatus for maintaining the data consistency in non-homogeneous computing device
US9940139B2 (en) Split-level history buffer in a computer processing unit
EP3716055A1 (en) System, apparatus and method for symbolic store address generation for data-parallel processor
CN110908716B (en) Method for implementing vector aggregation loading instruction
CN104221005A (en) Mechanism for issuing requests to accelerator from multiple threads
CN105247479A (en) Instruction order enforcement pairs of instructions, processors, methods, and systems
CN106445472B (en) A kind of character manipulation accelerated method, device, chip, processor
CN110515659B (en) Atomic instruction execution method and device
WO2024131071A1 (en) Instruction processing method and system, device, and non-volatile readable storage medium
US11586462B2 (en) Memory access request for a memory protocol

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20120613