CN101042641A - Digital signal processor with dynamic submitting pipeline function - Google Patents

Info

Publication number
CN101042641A
Authority
CN
China
Prior art keywords
instruction
dynamic
digital signal
signal processor
submission
Legal status
Granted
Application number
CN 200710039864
Other languages
Chinese (zh)
Other versions
CN101042641B (en)
Inventor
金荣伟
李兴仁
刘春晖
林锦麟
张达文
杨一茜
Current Assignee
Shanghai Hualong Information Technology Development Center
Original Assignee
SHANGHAI HUALONG INFORMATION TECHNOLOGY DEVELOPMENT CENTER
Priority date
2007-04-24
Filing date
2007-04-24
Publication date
2007-09-26
Application filed by SHANGHAI HUALONG INFORMATION TECHNOLOGY DEVELOPMENT CENTER
Priority to CN200710039864A (patent CN101042641B)
Publication of CN101042641A
Application granted
Publication of CN101042641B
Expired - Fee Related (current legal status)
Anticipated expiration

Abstract

This invention discloses a digital signal processor with a dynamic commit pipeline. Its core architecture comprises a program control unit, an instruction decode unit, a register file, an execution unit, and a commit console that manages instruction commit. The program control unit is connected to the decode unit and to the commit console; the commit console is connected to the register file; the decode unit is connected to the register file and to the execution unit; and the execution unit is connected to the register file. The invention adds an instruction commit console to the processor architecture, and the decode unit identifies whether the current instruction is conditional or may raise an exception.

Description

Digital signal processor with dynamic submitting pipeline function
Technical field:
The present invention relates to general-purpose digital signal processors, and in particular to a digital signal processor with a dynamic commit pipeline.
Background art:
In the design of high-performance digital signal processors, pipelining plays a central role in performance. Popular processors (such as the PowerPC 620 and the MIPS R10000) widely use an approach that combines dynamic scheduling based on the Tomasulo algorithm with speculative execution.
The key idea of this approach is to let instructions execute out of order but commit in order, and to prevent any irrecoverable action from taking place before an instruction commits. Unlike the standard DLX pipeline, the approach requires an additional stage: instruction commit. Adding this commit stage changes the instruction execution sequence and requires an additional set of hardware buffers that hold instruction results until the instructions commit.
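The in-order commit rule can be written as a simple recurrence: an instruction commits no earlier than the cycle after its write-back, and no earlier than the instruction before it. The following is a minimal sketch, not taken from the patent; it assumes an unlimited commit width (any number of instructions may commit in one cycle), and the function name is hypothetical.

```python
def in_order_commit_cycles(writeback_cycles):
    """Return the commit cycle of each instruction under in-order commit.

    writeback_cycles[i] is the cycle in which instruction i writes back.
    Assumption: commit width is unlimited, so several instructions may
    commit in the same cycle.
    """
    commits = []
    previous = 0
    for wb in writeback_cycles:
        # Earliest commit is the cycle after write-back, but never before
        # the preceding instruction has committed.
        cycle = max(wb + 1, previous)
        commits.append(cycle)
        previous = cycle
    return commits


# Illustrative write-back cycles: a long divide followed by three short moves.
print(in_order_commit_cycles([11, 5, 6, 7]))
```

This prints `[12, 12, 12, 12]`: the three short instructions finish early but cannot commit until the long first instruction does.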
This approach greatly improves performance over an ordinary pipeline, which is why current processors generally adopt it. However, because one of its central principles is in-order commit, it limits further performance gains to some extent and also lowers resource utilization.
The cause of this limitation is that the approach produces commit-wait states. For example, Fig. 1 shows the pipeline timing of program 1 executed with the approach that combines Tomasulo-based dynamic scheduling with speculative execution.
Program 1 in Fig. 1 consists of 5 instructions:
DIV AR1, AR2; Instruction 1
MOV *AR3+, AR4; Instruction 2
MOV *AR3+, AR5; Instruction 3
MOV *AR3+, AR6; Instruction 4
MOV *AR3+, AR7; Instruction 5
In the figure, IS denotes the instruction issue cycle, ID the decode cycle, EX the execute cycle, WB the result write-back cycle, CT the commit cycle, and WT a wait-for-commit cycle.
We assume that instruction 1 of program 1 is a division. Division generally takes relatively many execute cycles, the exact number depending on the operand width. The execution unit of a pipeline that supports dynamic execution contains several functional units that can work in parallel (e.g., a multiply/divide unit, an ALU, and a load/store unit), so instruction 2 can issue, decode, and execute while instruction 1 is still executing, and it writes back before instruction 1 finishes. However, the approach that combines Tomasulo-based dynamic scheduling with speculative execution is built on dynamic execution with in-order commit, so instruction 2, although its result has already been written back, can commit only after instruction 1 commits. Moreover, because the system's physical resources are limited, instruction 5 lacks physical registers and must wait until an earlier instruction commits and releases physical registers before it can issue, so its wait to issue is correspondingly long.
Now assume that the division needs 8 execute cycles and that the hardware has 8 free virtual registers. Executing this program then takes 17 instruction cycles, and instructions 2, 3, 4 and 5 spend many cycles in wait states.
If other instructions were allowed to commit before instruction 1 commits, the many wait states seen in instructions 2, 3, 4 and 5 would be reduced, which would greatly improve execution efficiency and shorten program run time. The present invention was developed from this idea.
Summary of the invention:
The object of the present invention is to provide a digital signal processor with a dynamic commit pipeline that shortens waiting time and improves instruction efficiency.
As shown in Fig. 2, taking the same example as in the background section: while this processor is executing instruction 1, instruction 2 finishes executing and writes back in cycle 5 and is allowed to commit in cycle 6. Instruction 5 therefore waits only 2 cycles, and the whole program takes 12 instruction cycles. Compared with the 17 cycles of the approach that combines Tomasulo-based dynamic scheduling with speculative execution, this saves 5 cycles and so improves instruction efficiency.
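The 17-cycle and 12-cycle figures for program 1 can be reproduced with a toy timing model. This is an illustrative sketch, not the patented hardware, and every parameter is an assumption read off the text: one instruction issues per cycle in program order; the stages are IS, ID, EX, WB, CT; DIV executes for 8 cycles and MOV for 1; each instruction holds 2 of the 8 free physical registers until it commits; registers released by a commit become usable in the next cycle; and any number of instructions may commit per cycle. The names `simulate`, `Timing`, and the constants are hypothetical.

```python
from dataclasses import dataclass

# All parameters are assumptions chosen to match the cycle numbers in the text.
EX_CYCLES = {"DIV": 8, "MOV": 1}  # execute latency per opcode
REGS_PER_INSTR = 2                # physical registers held until commit
FREE_REGS = 8                     # physical registers free at the start


@dataclass
class Timing:
    issue: int
    writeback: int
    commit: int


def simulate(program, dynamic_commit):
    """Return per-instruction timing for a list of opcodes.

    Issue is in order, at most one instruction per cycle, and stalls while
    fewer than REGS_PER_INSTR registers are free. Registers released by a
    commit in cycle c are usable by an instruction issuing in cycle c + 1.
    With dynamic_commit=False commit is in order; with True each instruction
    commits in the cycle after write-back (valid only for programs, like
    program 1, with no conditional or exception-capable instructions).
    """
    timings = []
    last_issue = 0
    last_commit = 0
    for op in program:
        cycle = last_issue + 1
        while True:
            # Registers still held by instructions not committed before `cycle`.
            held = sum(REGS_PER_INSTR for t in timings if t.commit >= cycle)
            if FREE_REGS - held >= REGS_PER_INSTR:
                break
            cycle += 1
        writeback = cycle + 2 + EX_CYCLES[op]  # IS, ID, EX cycles, then WB
        commit = writeback + 1
        if not dynamic_commit:
            commit = max(commit, last_commit)  # wait for older instructions
        timings.append(Timing(cycle, writeback, commit))
        last_issue = cycle
        last_commit = commit
    return timings


PROGRAM_1 = ["DIV", "MOV", "MOV", "MOV", "MOV"]
for dynamic in (False, True):
    t = simulate(PROGRAM_1, dynamic_commit=dynamic)
    # Total cycles = last commit cycle; t[4] is instruction 5.
    print(dynamic, max(x.commit for x in t), t[4].issue)
```

Running it prints `False 17 13` and `True 12 7`: under in-order commit instruction 5 issues in cycle 13 and the program ends in cycle 17, while under dynamic commit instruction 2 commits in cycle 6, instruction 5 issues in cycle 7, and the program ends in cycle 12.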
To achieve this effect, the digital signal processor with a dynamic commit pipeline of the present invention has a core architecture comprising a program control unit, an instruction decode unit, a register file, and an execution unit, and is characterized in that the architecture further comprises a commit console for managing the dynamic commit of instructions. In the core architecture, the program control unit is connected to the decode unit and to the commit console; the commit console is connected to the register file; the decode unit is connected to the execution unit and to the register file; and the execution unit is connected to the register file.
While the processor runs, the commit console records information including the conditional-execution mark, the condition, the prediction, and the numbers of the registers to be committed.
Note also that the execution unit of this processor comprises functional units such as a multiply/divide unit, an ALU, and a load/store unit, and several of them can execute simultaneously.
Within this core architecture, the program control unit handles instruction prefetch, instruction issue, branch prediction, interrupt management, and debug control. The instruction decode unit handles instruction decoding and hardware resource scheduling. The register file uses a multi-level mapping and manages and schedules logical and physical registers. The commit console, which is the core of the invention, manages the dynamic commit of instructions.
The principle of the invention is as follows. For a processor with a fixed instruction set, the instructions that can involve conditional branch prediction or raise exceptions are known in advance, so the decode unit can identify them; all other instructions leave the program flow unchanged. Therefore every instruction before the next conditional-branch or exception-capable instruction is certain to execute, and the results of these instructions can be committed dynamically. Local dynamic commit means that the instructions lying between two such conditional instructions are committed dynamically.
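The partition implied by local dynamic commit can be sketched as follows. This is an illustrative model, not the patent's logic: the labels `"cond"` (conditional, branch-predicted), `"exc"` (may raise an exception), and `"plain"` are hypothetical, and each marked instruction is placed in a region of its own because it commits only after everything before it. The example uses the instruction mix of program 2 from the embodiment, in which only instruction 5 is conditional.

```python
# Hypothetical labels: "cond" = conditional (branch-predicted) instruction,
# "exc" = instruction that may raise an exception, "plain" = anything else.
BARRIER_KINDS = {"cond", "exc"}


def commit_regions(kinds):
    """Split a program into regions whose members may commit in any order.

    Instructions strictly between two barrier instructions form one region;
    each barrier forms a region of its own, since it may only commit once
    everything before it has committed. Numbers are 1-based to match the
    instruction numbering in the text.
    """
    regions = []
    current = []
    for number, kind in enumerate(kinds, start=1):
        if kind in BARRIER_KINDS:
            if current:
                regions.append(current)
                current = []
            regions.append([number])
        else:
            current.append(number)
    if current:
        regions.append(current)
    return regions


PROGRAM_2_KINDS = ["plain"] * 4 + ["cond"] + ["plain"] * 4
print(commit_regions(PROGRAM_2_KINDS))
```

For program 2 this prints `[[1, 2, 3, 4], [5], [6, 7, 8, 9]]`: instructions 1 to 4 may commit in any order among themselves, then instruction 5, then instructions 6 to 9 in any order.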
The present invention adds an instruction commit console to a general processor architecture. After an instruction is issued it enters the decode unit. If the decode unit identifies the current instruction as a conditional-execution instruction or one that may raise an exception, it adds an extra mark when sending the commit command to the commit console, and also sends the instruction type to the commit console, where it is kept until the instruction commits.
At commit time, the commit console considers all instructions ahead of the first marked commit command; any of them that has satisfied its commit condition may commit early. This both improves execution efficiency and guarantees correct execution.
Description of drawings:
Fig. 1 is the execution diagram of pipeline program 1 under the Tomasulo-based approach.
Fig. 2 is the execution diagram of pipeline program 1 under the present invention.
Fig. 3 is the internal structure diagram of the digital signal processor provided by the present invention.
Fig. 4 is a block diagram of the execution unit of the digital signal processor provided by the present invention.
Fig. 5 is the execution diagram of pipeline program 2 of the embodiment under the Tomasulo-based approach.
Fig. 6 is the execution diagram of pipeline program 2 of the embodiment under the present invention.
Embodiment:
The present invention is further described below by way of an embodiment, with reference to the drawings.
As shown in Fig. 3, in the digital signal processor provided by the present invention, the program control unit is connected to the decode unit and to the commit console; the decode unit is closely connected to all four other parts; the execution unit is connected to the register file and to the decode unit; and the commit console can influence the program control unit and the register file.
During operation, the program control unit fetches instructions from instruction memory and analyzes them; for a conditional-execution instruction it performs branch prediction and sends the prediction together with the instruction to the decode unit. The decode unit takes the instruction provided by the program control unit, analyzes it, and requests the resources it needs from the functional units of the execution unit, from the register file, and from the commit console. If the remaining resources meet the instruction's needs, it dispatches the instruction to the appropriate functional unit and at the same time sends the instruction information to the commit console. The commit console entry contains the conditional-execution mark (the key execution condition of the invention), the condition, the prediction, the numbers of the registers to be committed, and so on. During decoding, the instruction requests the registers it needs from the register file, including the registers it reads and, with a mark, the registers it will modify, and these are written into the functional unit. A conditional-execution instruction that modifies registers additionally needs a special mark.
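The decode-stage bookkeeping above can be sketched as a register-availability check followed by a write to the commit console. This models only that check and the console record, not the dispatch to functional units. The class `ConsoleEntry`, the function `decode_and_dispatch`, the dictionary keys, and the register names chosen for instruction 2 are hypothetical; the patent lists the kinds of information stored but not their layout.

```python
from dataclasses import dataclass, field


@dataclass
class ConsoleEntry:
    """One commit-console record (hypothetical field names).

    Mirrors what the text says the console stores: the conditional-execution
    mark, the condition, the prediction, and the registers to be committed.
    """
    seq: int
    conditional: bool = False
    condition: str = ""
    predicted_taken: bool = False
    commit_regs: list = field(default_factory=list)


def decode_and_dispatch(seq, instr, free_regs, console):
    """Accept one decoded instruction if enough physical registers are free.

    `instr` is a dict with hypothetical keys: "writes" (registers the
    instruction will modify) and, optionally, "conditional", "condition"
    and "predicted". On success a console entry is appended and the number
    of registers still free is returned; on a stall, None is returned and
    the console is left unchanged.
    """
    needed = len(instr["writes"])
    if needed > free_regs:
        return None  # stall until an earlier commit releases registers
    console.append(ConsoleEntry(
        seq=seq,
        conditional=instr.get("conditional", False),
        condition=instr.get("condition", ""),
        predicted_taken=instr.get("predicted", False),
        commit_regs=list(instr["writes"]),
    ))
    return free_regs - needed


console = []
# Instruction 2 of program 1, assumed to write AR3 (post-increment) and AR4.
left = decode_and_dispatch(2, {"writes": ["AR3", "AR4"]}, 8, console)
# Instruction 5 of program 2: a conditional branch, assumed predicted not taken.
bcc = {"writes": [], "conditional": True, "condition": "AR6==#0", "predicted": False}
left = decode_and_dispatch(5, bcc, left, console)
print(left, [(e.seq, e.conditional) for e in console])
```

This prints `6 [(2, False), (5, True)]`: the move consumes two registers, and the branch enters the console carrying its conditional mark.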
As shown in Fig. 4, the execution unit comprises functional units such as a multiply/divide unit, an ALU, and a load/store unit, and several of them can execute simultaneously. An instruction in a functional unit actually executes only once its execution condition is met (its required operands are ready). A functional unit can buffer several instructions of the same type, and any one of them may execute first once its execution condition is met, independent of the order in which they entered the unit; this realizes dynamic execution.
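Dynamic execution inside one functional unit can be sketched as a small buffer from which any instruction with ready operands may be selected. The class, its capacity parameter, and the representation of operands as sets of register names are assumptions for illustration. The patent does not specify a selection policy; this sketch simply picks the oldest ready entry.

```python
class FunctionalUnit:
    """A functional unit that buffers several instructions of one type.

    Each buffered instruction lists the source registers it needs; it may
    start as soon as all of them are ready, regardless of the order in
    which it entered the unit.
    """

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.buffer = []  # list of (seq, set of source registers)

    def accept(self, seq, sources):
        """Buffer an instruction; return False if the unit is full."""
        if len(self.buffer) >= self.capacity:
            return False  # decode must stall this instruction
        self.buffer.append((seq, set(sources)))
        return True

    def select(self, ready_regs):
        """Remove and return the seq of the oldest instruction whose operands are ready."""
        for i, (seq, sources) in enumerate(self.buffer):
            if sources <= ready_regs:
                del self.buffer[i]
                return seq
        return None


alu = FunctionalUnit("ALU", capacity=4)
alu.accept(3, ["AR1"])  # waits for AR1
alu.accept(4, ["AR5"])  # operand already available
print(alu.select({"AR5"}))
```

This prints `4`: the instruction that entered the unit later starts first because its operand is ready.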
Once a functional unit produces an instruction's result, the result is written back to a physical register to await commit. If the commit console finds that no conditional-prediction instruction precedes a given instruction, that instruction is allowed to commit. Local out-of-order commit exploits the fact that instructions before the first conditional-prediction instruction can commit out of order.
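The commit rule can be sketched as a check that the commit console might run each cycle. This is an illustrative model under stated assumptions, not the patented circuit: entries are tuples in program order, committed entries are removed from the list, and a marked instruction is treated conservatively (nothing after it commits in the same cycle, even when it commits itself). The function name and entry layout are hypothetical. The snapshots use program 2 from the embodiment.

```python
def committable(entries):
    """Return the sequence numbers allowed to commit in this cycle.

    entries: uncommitted commit-console entries in program order, each a
    (seq, marked, written_back) tuple; `marked` means the instruction is
    conditional (branch-predicted) or may raise an exception.
    """
    allowed = []
    for position, (seq, marked, written_back) in enumerate(entries):
        if marked:
            # A marked instruction commits only once everything before it
            # has committed, i.e. when it is the oldest entry.
            if position == 0 and written_back:
                allowed.append(seq)
            # Stop here: later instructions wait until it has committed.
            break
        if written_back:
            # Unmarked and written back, with no marked instruction ahead.
            allowed.append(seq)
    return allowed


# Program 2 around cycle 6: only instruction 2 has written back.
print(committable([(1, False, False), (2, False, True), (3, False, False),
                   (4, False, False), (5, True, False), (6, False, False)]))
# Later: 2-4 have committed; 1, 5 and 7 have written back.
print(committable([(1, False, True), (5, True, True), (6, False, False),
                   (7, False, True)]))
```

The first call returns `[2]`. The second returns `[1]`: instruction 5 has written back but must wait for instruction 1, and instruction 7 must wait for instruction 5.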
A concrete example follows.
As shown in Fig. 2, take program 1 (listed in the background section). With the present invention the execution time of each individual instruction is unchanged, so instruction 1 still takes 8 cycles. Instruction 2 issues in cycle 2 and writes back in cycle 5; since no conditional-prediction instruction precedes it, it commits in cycle 6, releasing two physical registers. Instructions 3 and 4 behave in essentially the same way. With sufficient resources, instruction 5 would issue in cycle 5, but because the processor's register resources are limited it must wait until other instructions release registers. With the approach that combines Tomasulo-based dynamic scheduling and speculative execution (Fig. 1), registers are released only in cycle 12, so instruction 5 issues in cycle 13. With the present invention, instruction 2 commits and releases its registers in cycle 6, so instruction 5 issues in cycle 7. Program 1 therefore completes in 12 instruction cycles, much faster than with the Tomasulo-based approach.
Because program 1 contains no conditional-prediction instruction, it shows only the advantage of dynamic commit and not its local character. Program 2 below illustrates this in detail.
Program 2 consists of 9 instructions:
DIV AR1, AR2; Instruction 1
MOV *AR3+, AR4; Instruction 2
MOV *AR3+, AR5; Instruction 3
MOV *AR3+, AR6; Instruction 4
BCC L1, AR6==#0; Instruction 5 (assume the condition is false)
DIV AR1, AR2; Instruction 6
MOV *AR3+, AR4; Instruction 7
MOV *AR3+, AR5; Instruction 8
MOV *AR3+, AR6; Instruction 9
As Figs. 5 and 6 show, the approach that combines Tomasulo-based dynamic scheduling with speculative execution takes 26 instruction cycles, whereas the method of the present invention takes only 17, still a clear advantage.
The reason is as follows. In program 2, instructions 1, 2, 3 and 4 are decoded normally. Instruction 5 is a conditional-prediction instruction, so during decoding its mark and instruction type are written into the commit console. The commit console continuously checks whether the instructions it holds can commit; it finds that instruction 2 can commit in cycle 6 and that no conditional instruction precedes it, so instruction 2 commits in cycle 6. Likewise, instructions 3, 4 and 1 commit out of order. Although instruction 5 writes back before instruction 1, it is a conditional-prediction instruction: to guarantee correct execution it may commit only after all instructions before it have committed, so it waits for instruction 1 to commit before committing its result. Likewise, instructions 6, 7, 8 and 9 can commit out of order only after instruction 5 has committed. Out-of-order commit is thus bounded by conditional-prediction instructions, which realizes local out-of-order commit.
The above is one embodiment based on the principle of the invention. Those skilled in the art can make many variations of this embodiment without creative effort and still achieve the object of the invention; such variations clearly fall within the scope of the claims of the present invention.

Claims (4)

1. A digital signal processor with a dynamic commit pipeline function, whose core architecture comprises a program control unit, an instruction decode unit, a register file, and an execution unit, characterized in that the architecture further comprises a commit console for managing the dynamic commit of instructions; in the core architecture, the program control unit is connected to the decode unit and to the commit console; the commit console is connected to the register file; the decode unit is connected to the execution unit and to the register file; and the execution unit is connected to the register file.
2. The digital signal processor with a dynamic commit pipeline function according to claim 1, characterized in that instructions in the pipeline issue in order, execute out of order, and commit locally out of order.
3. The digital signal processor with a dynamic commit pipeline function according to claim 1, characterized in that the commit console of the processor records information including the conditional-execution mark, the condition, the prediction, and the numbers of the registers to be committed.
4. The digital signal processor with a dynamic commit pipeline function according to claim 1, characterized in that the execution unit of the processor comprises functional units such as a multiply/divide unit, an ALU, and a load/store unit, and allows several of them to execute simultaneously.
CN200710039864A 2007-04-24 2007-04-24 Digital signal processor with dynamic submitting pipeline function Expired - Fee Related CN101042641B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200710039864A CN101042641B (en) 2007-04-24 2007-04-24 Digital signal processor with dynamic submitting pipeline function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200710039864A CN101042641B (en) 2007-04-24 2007-04-24 Digital signal processor with dynamic submitting pipeline function

Publications (2)

Publication Number Publication Date
CN101042641A true CN101042641A (en) 2007-09-26
CN101042641B CN101042641B (en) 2010-05-19

Family

ID=38808178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200710039864A Expired - Fee Related CN101042641B (en) 2007-04-24 2007-04-24 Digital signal processor with dynamic submitting pipeline function

Country Status (1)

Country Link
CN (1) CN101042641B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105164637A (en) * 2013-05-30 2015-12-16 英特尔公司 Dynamic optimization of pipelined software
CN110780925A (en) * 2019-09-02 2020-02-11 芯创智(北京)微电子有限公司 Pre-decoding system and method of instruction pipeline
CN110780925B (en) * 2019-09-02 2021-11-16 芯创智(北京)微电子有限公司 Pre-decoding system and method of instruction pipeline

Also Published As

Publication number Publication date
CN101042641B (en) 2010-05-19

Similar Documents

Publication Publication Date Title
KR102311010B1 (en) A vector processor configured to operate on variable length vectors using one or more complex arithmetic instructions.
US7937559B1 (en) System and method for generating a configurable processor supporting a user-defined plurality of instruction sizes
JP3797471B2 (en) Method and apparatus for identifying divisible packets in a multi-threaded VLIW processor
EP1368732B1 (en) Digital signal processing apparatus
KR101730282B1 (en) Select logic using delayed reconstructed program order
CN1013067B (en) Tightly coupled multiprocessor instruction synchronization
Parsons et al. A++/P++ array classes for architecture independent finite difference computations
US9747216B2 (en) Computer processor employing byte-addressable dedicated memory for operand storage
EP3343360A1 (en) Apparatus and methods of decomposing loops to improve performance and power efficiency
US20030005261A1 (en) Method and apparatus for attaching accelerator hardware containing internal state to a processing core
CN110928577B (en) Execution method of vector storage instruction with exception return
EP2466452A1 (en) Register file and computing device using same
CN101042641A (en) Digital signal processor with dynamic submitting pipeline function
US9747238B2 (en) Computer processor employing split crossbar circuit for operand routing and slot-based organization of functional units
CN114528248A (en) Array reconstruction method, device, equipment and storage medium
US9513921B2 (en) Computer processor employing temporal addressing for storage of transient operands
CN111279308B (en) Barrier reduction during transcoding
CN112463218A (en) Instruction emission control method and circuit, data processing method and circuit
Jungeblut et al. A systematic approach for optimized bypass configurations for application-specific embedded processors
Sangireddy Reducing rename logic complexity for high-speed and low-power front-end architectures
CN113703841B (en) Optimization method, device and medium for register data reading
US8898433B2 (en) Efficient extraction of execution sets from fetch sets
Lozano et al. A deeply embedded processor for smart devices
CN115454506A (en) Instruction scheduling apparatus, method, chip, and computer-readable storage medium
JP6307975B2 (en) Arithmetic processing device and control method of arithmetic processing device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: SHANGHAI FUKONG HUALONG MICROSYSTEMS TECHNOLOGY C

Free format text: FORMER OWNER: SHANGHAI HUALONG INFORMATION TECHNOLOGY DEVELOPMENT CENTER

Effective date: 20080425

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20080425

Address after: Building 11, No. 439 Chunxiao Road, Pudong New Area, Shanghai; postal code: 201203

Applicant after: Shanghai Fukong Hualong Microsystem Technology Co., Ltd.

Address before: Building 11, No. 439 Chunxiao Road, Pudong New Area, Shanghai; postal code: 201203

Applicant before: Shanghai Hualong Information Technology Development Center

ASS Succession or assignment of patent right

Owner name: SHANGHAI HUALONG INFORMATION TECHNOLOGY DEVELOPME

Free format text: FORMER OWNER: SHANGHAI FUKONG HUALONG MICROSYSTEMS TECHNOLOGY CO., LTD.

Effective date: 20080926

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20080926

Address after: Building 11, No. 439 Chunxiao Road, Pudong New Area, Shanghai; postal code: 201203

Applicant after: Shanghai Hualong Information Technology Development Center

Address before: Building 11, No. 439 Chunxiao Road, Pudong New Area, Shanghai; postal code: 201203

Applicant before: Shanghai Fukong Hualong Microsystem Technology Co., Ltd.

C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100519

Termination date: 20140424