CN104361182B - Microprocessor microarchitecture parameter optimization method based on Petri nets

Publication number
CN104361182B
Authority
CN
China
Prior art keywords
node
critical path
weights
enter
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410673302.5A
Other languages
Chinese (zh)
Other versions
CN104361182A (en)
Inventor
王蕾
王永文
窦强
邓宇
赵天磊
孙彩霞
张承义
高军
倪小强
隋兵才
陈微
黄立波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201410673302.5A
Publication of CN104361182A
Application granted
Publication of CN104361182B
Legal status: Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a microprocessor microarchitecture parameter optimization method based on Petri nets. The steps include: constructing a pipeline model template based on a colored Petri net; obtaining the instruction sequence of the target application; obtaining the dependence information between instructions and the functional unit type each instruction uses; generating the colored Petri net model of the target program running under the current parameter configuration; simulating the model with a Petri net simulation tool to produce a simulation report; converting the colored Petri net model into the corresponding directed acyclic graph according to the simulation report; computing the critical path of the directed acyclic graph and the nodes it passes through; computing the slack time of each incoming edge of each node; analyzing the performance bottleneck or power consumption bottleneck of the microprocessor running the target application under the current microarchitecture parameter configuration; and adjusting the microarchitecture parameters if optimization is needed. The invention offers high prediction credibility and accuracy, wide coverage of the explored design space, relatively low algorithmic complexity, and fast, efficient optimization.

Description

Microprocessor microarchitecture parameter optimization method based on Petri nets
Technical field
The present invention relates to methods for analyzing and optimizing microprocessor performance bottlenecks in the field of microprocessor design, and in particular to a microprocessor microarchitecture parameter optimization method based on colored Petri nets.
Background
In the early stage of microprocessor design, every detail of the microarchitecture must first be determined, including: the number of pipeline stages; the number of registers between pipeline stages; the quantity of each key resource, such as the size of the reorder buffer and the size of the commit queue; the number and latency of execution units; the fetch width and issue width; and whether execution is in order or out of order. Selecting these parameters in the initial design phase is very challenging. The choice depends on the characteristics of the applications the processor targets, so it requires detailed performance analysis of a prototype system and a comparison of processor performance under different parameter configurations. At present, the usual way to evaluate microprocessor performance quickly and accurately in the early design stage, when none of the microarchitecture parameters has been fixed, and thereby to determine the best parameters, is to use a simulator.
However, to be broadly applicable, simulators generally model a fairly generic structure. Simulating the microarchitecture of a specific processor requires modifying the simulator, which usually takes a long time. Simulators also run slowly: for a single microarchitecture parameter configuration, running a representative benchmark (such as SPEC 2000) on a cycle-accurate simulator to obtain performance figures often takes several weeks, while microprocessor design schedules are usually tight. The applicability of this approach is therefore very limited. Lacking an efficient method, designers find it hard to explore the microarchitecture design space effectively in the initial design phase, so the choice of microarchitecture parameters relies heavily on experience and guesswork and lacks supporting evidence.
If the microarchitecture could be modeled early in the design, with the model also taking into account the programs run on the microprocessor, then simulation and theoretical analysis of the model would yield the critical path and performance bottleneck of the system and guide the balanced allocation of each resource in the microarchitecture. Optimal parameter values could then be selected early in the design, improving microprocessor performance and speeding up the design. Traditional microprocessor design space exploration uses large-scale, cycle-accurate architecture simulation. As described above, the drawback of such simulators is that they are too slow. To address this, several fast simulation methods have been proposed that reduce the number of instructions that must be simulated for each parameter configuration. However, the configuration space of microarchitecture parameters to be simulated remains very large. To reduce the number of simulation runs and their cost, academia has proposed predictive models. A predictive model involves two phases: a training phase and a prediction phase. In the training phase, some parameter configurations are selected and fed to the simulator, the simulation results (performance or power consumption) are obtained, and the model is trained on these labeled configurations so that it can predict the relationship between input (parameter configuration) and output (performance or power consumption). In the prediction phase, the trained model is used to predict the performance or power consumption of new parameter configurations. Such methods effectively reduce simulation cost, but there is no effective evidence of the accuracy of their predictions.
A colored Petri net is a variant that remains compatible with the Petri net: it retains the useful properties of Petri nets and extends them. A colored Petri net (abbreviated CP-net or CPN) can be regarded as a graphical language for building models of concurrent systems and analyzing their properties. A colored Petri net allows its tokens to carry a value, which is called the token color. The color type of each place is called its color set.
In summary, design space exploration in the early stage of microprocessor design faces the following problems: 1) Existing cycle-accurate architecture simulators are too slow; running representative benchmarks takes so long that many parameter configurations cannot be simulated in the available time, and only a small part of the design space can be explored. 2) Existing design space exploration methods based on predictive models have low prediction credibility and accuracy; they are fast but rarely produce trustworthy results. Given these challenges in optimizing microarchitecture parameters early in the design, how to perform fast microarchitecture design space exploration in the early design stage, using a method that combines colored Petri net modeling, simulation, and theoretical analysis, has become a key technical problem in urgent need of a solution.
Summary of the invention
The technical problem to be solved by the present invention is to provide, in view of the above problems of the prior art, a Petri-net-based microprocessor microarchitecture parameter optimization method with high prediction credibility and accuracy, wide coverage of the explored design space, relatively low algorithmic complexity, and fast, efficient optimization.
To solve the above technical problem, the present invention adopts the following technical solution:
A microprocessor microarchitecture parameter optimization method based on Petri nets, implemented in the following steps:
1) According to the microarchitecture of the microprocessor, construct a pipeline model template based on a colored Petri net;
2) Obtain the instruction sequence of the target application run on the microprocessor;
3) Analyze the instruction sequence to obtain the data dependence, control dependence, and structural dependence information between instructions, and obtain the functional unit type used by each instruction;
4) Based on the pipeline model template, the instruction sequence, the data, control, and structural dependence information between instructions, and the functional unit types used by the instruction sequence, generate a colored Petri net model describing how the target program runs on the microprocessor under the current microarchitecture parameter configuration;
5) Simulate the colored Petri net model with a Petri net simulation tool and generate a simulation report, whose content lists, in order of simulation step, the transition triggered in each simulation step and the time at which it was triggered;
6) Convert the colored Petri net model into the corresponding directed acyclic graph according to the simulation report;
7) Compute the critical path of the directed acyclic graph to obtain the critical path matrix, where the critical path depth represents the time the current program segment takes to execute on the processor; list the nodes the critical path passes through according to the critical path matrix; generate a critical path report from the nodes and edges that make up the critical path and the count and total delay of each type of edge on it; compute the slack time of each incoming edge of each node and generate a slack time report;
8) Based on the critical path report and the slack time report, analyze the performance bottleneck or power consumption bottleneck of the microprocessor running the target application under the current microarchitecture parameter configuration;
9) From the result of that bottleneck analysis, judge whether the microarchitecture parameters of the microprocessor need optimizing; if so, adjust the microarchitecture parameters and jump back to step 1); if not, end and exit.
Preferably, in step 3), data dependence means that a source operand of a later-issued instruction in the pipeline is the destination operand of an earlier-issued instruction, so the later instruction must wait until the earlier, data-dependent instruction has computed its result before it can obtain the source operand and continue executing; control dependence means that the branch condition of a later-issued jump instruction is the result of an earlier-issued instruction, so the jump instruction must wait until that earlier instruction has computed its result before the correct direction of the instruction stream can be resolved; structural dependence means that an earlier-issued instruction and a later-issued instruction in the pipeline use the same functional unit or the same limited system resource.
Preferably, when step 4) generates the colored Petri net model describing how the target program runs on the microprocessor under the current microarchitecture parameter configuration, data dependence and control dependence are represented by edges in the colored Petri net, and structural dependence is realized through resource-constraint places of the colored Petri net.
Preferably, the detailed steps of step 6) are as follows:
6.1) Convert the colored Petri net model into a directed acyclic graph: the nodes of the directed acyclic graph are the transitions of the Petri net; the edges are the connections between two transitions that are linked through a place in the Petri net; the weight of each outgoing edge of a node is the delay of the transition corresponding to that node; delete the places that represent shared resources together with their outgoing and incoming edges to other transitions;
6.2) According to the transition triggered in each simulation step of the simulation report and the time at which it was triggered, add an attribute to the corresponding node in the directed acyclic graph, whose value is the time at which the transition was triggered;
6.3) According to the functional unit type used by each instruction, update the delay of the outgoing edges of the nodes corresponding to the "execute" pipeline stage in the directed acyclic graph to the latency or number of pipeline stages of the functional unit, and set the delay of all other edges to 1 clock cycle;
6.4) Correct the weights of the directed acyclic graph;
6.5) Generate the description matrix of the directed acyclic graph with the corrected weights: the rows and columns of the description matrix are all the nodes of the directed acyclic graph, and each element indicates whether a directed edge exists between the two nodes; if no directed edge exists between the nodes, the element is zero; if a directed edge exists, the element is the weight of that edge.
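As a minimal sketch of step 6.5), the description matrix can be built as a plain adjacency matrix. The node names, the edge dictionary, and the function name `description_matrix` are illustrative assumptions, not part of the patent; the sketch also assumes every real edge has a non-zero weight, since zero is reserved for "no edge".

```python
def description_matrix(nodes, edges):
    """Build the description matrix of a directed acyclic graph.

    nodes: list of node names (rows and columns follow this order).
    edges: dict mapping (source, target) -> edge weight (assumed non-zero).
    An element is 0 when no directed edge exists between the two nodes.
    """
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for (src, dst), weight in edges.items():
        matrix[index[src]][index[dst]] = weight
    return matrix


# Hypothetical three-node fragment: fetch -> decode -> execute of one instruction.
nodes = ["F1", "D1", "E1"]
edges = {("F1", "D1"): 1, ("D1", "E1"): 3}
matrix = description_matrix(nodes, edges)
```

Row i, column j holds the weight of the edge from node i to node j, so the matrix of a directed acyclic graph listed in topological order is strictly upper triangular.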
Preferably, the detailed steps of step 6.4) are as follows:
6.4.1) Take a node from the node set of the directed acyclic graph as the current node;
6.4.2) Obtain all incoming edges of the current node to form the incoming edge set of the current node;
6.4.3) Take an edge from the incoming edge set of the current node as the current edge and find its start node; if the trigger time of the current node is greater than or equal to the sum of the trigger time of the current edge's start node and the weight of the current edge, set the arrival time of the current edge to that sum; finally, select the minimum arrival time among the arrival times of all incoming edges of the current node;
6.4.4) Compute the difference between the trigger time of the current node and the minimum arrival time;
6.4.5) Find all outgoing edges of the current node, and increase the weight of each outgoing edge from its initial value by the difference between the trigger time of the current node and the minimum arrival time;
6.4.6) Judge whether all nodes in the node set of the directed acyclic graph have been traversed; if not, take the next node from the node set as the current node and jump to step 6.4.2); if so, the correction of the weights of the directed acyclic graph ends.
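The weight correction of steps 6.4.1) to 6.4.6) can be sketched roughly as follows. This follows the text literally, and two points are assumptions of the sketch rather than statements of the patent: arrival times are computed from the original (uncorrected) weights, and nodes without any qualifying incoming edge are left unchanged. The function name and data layout are hypothetical.

```python
def correct_weights(trigger, edges):
    """Return a corrected copy of the edge weights.

    trigger: dict mapping node -> trigger time taken from the simulation report.
    edges:   dict mapping (source, target) -> initial edge weight.
    """
    corrected = dict(edges)
    for node in trigger:
        # Arrival time over each incoming edge: start-node trigger time + weight.
        arrivals = [
            trigger[src] + w
            for (src, dst), w in edges.items()
            if dst == node and trigger[node] >= trigger[src] + w
        ]
        if not arrivals:
            continue  # assumption: nothing to correct for such a node
        gap = trigger[node] - min(arrivals)
        # Add the gap to the initial weight of every outgoing edge of the node.
        for (src, dst), w in edges.items():
            if src == node:
                corrected[(src, dst)] = w + gap
    return corrected


# Hypothetical example: E1 was triggered one cycle later than its input arrived.
trigger = {"I1": 3, "E1": 5, "C1": 6}
edges = {("I1", "E1"): 1, ("E1", "C1"): 1}
corrected = correct_weights(trigger, edges)
```

Here the one-cycle gap observed at `E1` is added to its outgoing edge, which is how the text folds waiting time that is no longer visible after the shared-resource places are deleted back into the graph.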
Preferably, the detailed steps of step 7) are as follows:
7.1) Negate all weights of the description matrix of the directed acyclic graph;
7.2) Solve for path lengths on the description matrix with negated weights, and take the shortest path of the negative-weight graph among the path lengths obtained as the critical path; the path length of the critical path represents the time the current program segment takes to execute on the processor; generate the critical path report from the nodes and edges that make up the critical path and the count and total delay of each type of edge on it;
7.3) List the nodes the critical path passes through according to the critical path matrix;
7.4) Compute the slack time of each incoming edge of each node and generate the slack time report.
Preferably, the detailed steps for solving path lengths in step 7.2) are as follows:
7.2.1) Push every node of the description matrix with negated weights onto a stack, and initialize the distance from each node to the source node of the critical path computation to infinity;
7.2.2) Set the distance from the source node to itself to 0;
7.2.3) Judge whether the stack is empty; if it is empty, the shortest path computation on the negative-weight directed acyclic graph ends and exits, the shortest path of the negative-weight directed acyclic graph is taken as the critical path, and the critical path depth represents the time the current program segment takes to execute on the processor; otherwise jump to step 7.2.4);
7.2.4) Pop a node from the stack and judge whether the condition that its distance to the source node is infinite holds; if it does not hold, pop another node from the stack, until the condition that the popped node's distance to the source node is infinite holds; if it holds, jump to step 7.2.5);
7.2.5) Obtain all incoming edges of the popped node to form the incoming edge set;
7.2.6) Take an incoming edge from the incoming edge set as the current incoming edge, and obtain the start node of the current incoming edge;
7.2.7) Judge whether the distance from the start node of the current incoming edge to the source node is greater than the sum of the distance from the popped node to the source node and the weight of the current incoming edge; if so, jump to step 7.2.8); otherwise, jump to step 7.2.9);
7.2.8) Set the distance from the start node of the current incoming edge to the source node to the sum of the distance from the popped node to the source node and the weight of the current incoming edge, then jump to step 7.2.10);
7.2.9) Judge whether the start node of the current incoming edge is in the stack; if it is not, push it onto the stack and jump to step 7.2.10); if it is, jump directly to step 7.2.10);
7.2.10) Judge whether all incoming edges in the incoming edge set have been traversed; if not, jump to step 7.2.6); otherwise, jump to step 7.2.3).
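The idea behind steps 7.1) and 7.2), finding the longest path as a shortest path over negated weights, can be illustrated with a standard worklist-based label-correcting search. This is a swapped-in textbook technique close in spirit to steps 7.2.1) to 7.2.10), not a literal transcription of them: the sketch starts from the last node only, skips nodes that have not yet been reached, and re-queues a node whenever its distance improves. All names are hypothetical.

```python
import math


def critical_path(nodes, edges, sink):
    """Longest path ending at `sink` in a DAG, via negated edge weights.

    nodes: list of node names.
    edges: dict mapping (source, target) -> positive weight.
    Returns (length, path) where path is a list of nodes from start to sink.
    """
    incoming = {n: [] for n in nodes}
    for (src, dst), w in edges.items():
        incoming[dst].append((src, -w))  # step 7.1: negate every weight

    dist = {n: math.inf for n in nodes}  # distance to the sink, negated weights
    succ = {n: None for n in nodes}      # next node on the best path found so far
    dist[sink] = 0
    stack = [sink]
    while stack:
        node = stack.pop()
        for src, w in incoming[node]:
            if dist[src] > dist[node] + w:   # relax over the incoming edge
                dist[src] = dist[node] + w
                succ[src] = node
                if src not in stack:
                    stack.append(src)

    start = min(dist, key=dist.get)      # most negative distance = longest path
    path = [start]
    while succ[path[-1]] is not None:
        path.append(succ[path[-1]])
    return -dist[start], path


# Hypothetical graph: F -> D -> E -> C with a shortcut D -> C.
nodes = ["F", "D", "E", "C"]
edges = {("F", "D"): 1, ("D", "E"): 1, ("E", "C"): 3, ("D", "C"): 1}
length, path = critical_path(nodes, edges, "C")
```

Because the graph is acyclic, the negated weights cannot form a negative cycle, so the worklist always empties and the search terminates.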
Preferably, the detailed steps of step 7.4) are as follows:
7.4.1) Take the last node of the critical path as the current node;
7.4.2) Obtain all incoming edges of the current node;
7.4.3) Judge whether the current node is the start node of the critical path; if so, end and exit; if not, jump to step 7.4.4);
7.4.4) Take each incoming edge of the current node in turn as the current incoming edge, and obtain the start node of the current incoming edge;
7.4.5) Judge whether the trigger time of the current node equals the sum of the trigger time of the start node of the current incoming edge and the weight of the current incoming edge; if so, jump to step 7.4.6); otherwise, jump to step 7.4.7);
7.4.6) Take the start node of the current incoming edge as the current node and jump to step 7.4.2);
7.4.7) Sum the trigger time of the start node of the current incoming edge and the weight of the current incoming edge, and take the trigger time of the current node minus this sum as the slack time of the current incoming edge of the current node; jump to step 7.4.4) until the slack time of every incoming edge of the current node has been computed;
7.4.8) Generate the slack time report containing the slack time of each incoming edge of each node.
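The slack formula of step 7.4.7) can be sketched in a few lines. As a simplifying assumption, this sketch evaluates the formula for every edge of the graph instead of walking backwards along the critical path as in steps 7.4.1) to 7.4.6); edges that satisfy the equality of step 7.4.5) then simply come out with slack 0. The function name and the example values are hypothetical.

```python
def slack_times(trigger, edges):
    """Slack of each edge: trigger(target) - (trigger(source) + weight).

    trigger: dict mapping node -> trigger time.
    edges:   dict mapping (source, target) -> corrected edge weight.
    """
    return {
        (src, dst): trigger[dst] - (trigger[src] + w)
        for (src, dst), w in edges.items()
    }


# Hypothetical trigger times for the graph F -> D -> E -> C with a shortcut D -> C.
trigger = {"F": 0, "D": 1, "E": 2, "C": 5}
edges = {("F", "D"): 1, ("D", "E"): 1, ("E", "C"): 3, ("D", "C"): 1}
slack = slack_times(trigger, edges)
```

In this example the shortcut `D -> C` has three cycles of slack: its delay could grow by up to three cycles before it started to delay node `C`, whereas any increase on a zero-slack edge lengthens the critical path.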
The Petri-net-based microprocessor microarchitecture parameter optimization method of the present invention has the following advantages:
1) The present invention combines modeling, simulation, and theoretical analysis to explore the microprocessor design space. The model takes into account both the microarchitecture of the microprocessor and the target application. Simulation and theoretical calculation are used to analyze the performance bottleneck (or power consumption bottleneck) of the microprocessor running the target application and to produce an analysis report; the microarchitecture parameters in the model are then changed according to the analysis result, and simulation and theoretical analysis are repeated, and this cycle continues until the optimal microarchitecture parameters are found. Microarchitecture parameters such as the pipeline structure and system resources can be modeled rapidly in the early stage of microarchitecture design, and because the model of the microarchitecture takes into account the characteristics of the target application run on the microprocessor, prediction credibility and accuracy are high.
2) The present invention obtains the performance or power consumption bottleneck of the system through colored Petri net simulation and graph-based theoretical analysis, which accelerates design space exploration and widens the coverage of the explored design space.
3) The algorithm that solves the critical path on the directed acyclic graph has complexity O(n^2), where n is the number of instructions of the target application, so the complexity of the optimization algorithm is relatively low.
4) With the colored Petri net model used by this method, microarchitecture parameter values can be changed by modifying the pipeline template, after which all remaining work is completed automatically; the method is therefore very convenient for fast design space exploration and offers fast, efficient optimization.
Brief description of the drawings
Fig. 1 is a flow diagram of the method of the embodiment of the present invention; rectangles represent operations and parallelograms represent intermediate data.
Fig. 2 is a schematic diagram of the microarchitecture of a prior-art microprocessor pipeline.
Fig. 3 is a structural diagram of the colored Petri net model generated in the embodiment of the present invention.
Fig. 4 is a structural diagram of the directed acyclic graph generated in the embodiment of the present invention.
Fig. 5 is a flow diagram of correcting the weights of the directed acyclic graph in the embodiment of the present invention.
Fig. 6 is a flow diagram of solving the critical path depth in the embodiment of the present invention.
Fig. 7 is a flow diagram of computing the slack time of each incoming edge of each node in the embodiment of the present invention.
Embodiment
As shown in Fig. 1, the implementation steps of the Petri-net-based microprocessor microarchitecture parameter optimization method of this embodiment are as follows:
1) According to the microarchitecture of the microprocessor, construct a pipeline model template based on a colored Petri net.
In the abstract diagram of a superscalar pipeline shown in Fig. 2, the pipeline has 6 stages: fetch, decode, rename, issue, execute (exe), and write back (commit). The execute stage includes 2 arithmetic logic units (ALU), 1 floating-point unit (FP), and 1 memory access unit (load/store). A pipeline model describes the structure of a pipeline of this kind, and the pipeline model template describes the structure of the pipeline, the quantities of the various resources in the pipeline, and the competition for and arbitration of those resources. In this embodiment, places of the colored Petri net model the number and types of storage spaces and resources in the pipeline, such as the buffer registers between pipeline stages, the fetch queue, and the reorder buffer; transitions of the colored Petri net model the pipeline stages, such as fetch, decode, dispatch, execute, and commit. Factors such as in-order versus out-of-order execution are taken into account during modeling.
2) Obtain the instruction sequence of the target application run on the microprocessor.
This embodiment specifically runs the target application on a real processor or an acceleration model to obtain the instruction sequence that the target application executes on the target processor. The format of the instruction sequence can be defined by the user as needed, for example: "instruction number: instruction assembly code".
3) Analyze the instruction sequence to obtain the data dependence, control dependence, and structural dependence information between instructions, and obtain the functional unit type used by each instruction (such as the integer pipeline, floating-point pipeline, multiply/divide pipeline, or memory access unit (load/store)).
In this embodiment, data dependence in step 3) means that a source operand of a later-issued instruction in the pipeline is the destination operand of an earlier-issued instruction, so the later instruction must wait until the earlier, data-dependent instruction has computed its result before it can obtain the source operand and continue executing. Control dependence means that the branch condition of a later-issued jump instruction is the result of an earlier-issued instruction, so the jump instruction must wait until that earlier instruction has computed its result before the correct direction of the instruction stream can be resolved. Structural dependence means that an earlier-issued instruction and a later-issued instruction in the pipeline use the same functional unit or the same limited system resource. Taking the use of the same functional unit as an example: if the functional unit is pipelined, the later-issued instruction must wait until the earlier-issued dependent instruction has entered the functional unit before it can enter the unit and execute; if the functional unit is not pipelined, the later-issued instruction must wait until the earlier-issued dependent instruction has finished using the functional unit before it can use it. The same holds for a limited system resource: a mutual exclusion relation exists between the instructions, and the later-issued instruction must wait until the earlier-issued dependent instruction has finished using the limited system resource before it can use it.
4) Based on the pipeline model template, the instruction sequence, the data, control, and structural dependence information between instructions, and the functional unit types used by the instruction sequence, generate a colored Petri net model describing how the target program runs on the microprocessor under the current microarchitecture parameter configuration.
In this embodiment, when step 4) generates the colored Petri net model describing how the target program runs on the microprocessor under the current microarchitecture parameter configuration, data dependence and control dependence are represented by edges in the colored Petri net, and structural dependence is realized through resource-constraint places of the colored Petri net.
Assume the following code sequence is executed on the pipeline model shown in Fig. 2:
Instruction 1: add R2, R1, R0
Instruction 2: bz TARGET, R2
Instruction 3: mul R7, R6, #0x2 (not executed)
Instruction 4: TARGET: add R5, R6, R7
As the code sequence above shows, there is a data dependence between instruction 2 and instruction 1, and a control dependence between instruction 4 and instruction 2 (it is assumed here that R2 is zero after instruction 1 has executed, so instruction 3 is not executed). Instructions 1 and 4 both use an arithmetic logic unit (ALU), so a resource conflict exists. In addition, every instruction to be issued must request an issue queue resource. Modeling the execution of this instruction sequence on the 6-stage pipeline yields the colored Petri net model shown in Fig. 3.
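The dependences named above can be derived mechanically from the executed instruction trace. The following is a minimal sketch under stated assumptions: the first operand of `add` is taken to be the destination register, `bz` is the only branch opcode, the branch is assigned to a hypothetical "BRANCH" unit, and each dependence is recorded only against the most recent earlier instruction. The `Instr` record and the function name are illustrative, not defined by the patent.

```python
from collections import namedtuple

# num: instruction number, op: opcode, dest: destination register or None,
# srcs: tuple of source registers, unit: functional unit type used.
Instr = namedtuple("Instr", "num op dest srcs unit")

BRANCH_OPS = {"bz"}  # assumption: only this branch opcode occurs in the trace


def find_dependences(trace):
    """Return (data, control, structural) lists of (earlier, later) pairs."""
    data, control, structural = [], [], []
    last_writer = {}    # register -> number of the last instruction writing it
    last_user = {}      # functional unit -> number of the last instruction using it
    last_branch = None  # most recent branch whose successor is still pending
    for ins in trace:
        for reg in ins.srcs:
            if reg in last_writer:           # source produced by an earlier instruction
                data.append((last_writer[reg], ins.num))
        if last_branch is not None:          # first instruction after a branch
            control.append((last_branch, ins.num))
            last_branch = None
        if ins.unit in last_user:            # same functional unit as an earlier one
            structural.append((last_user[ins.unit], ins.num))
        last_user[ins.unit] = ins.num
        if ins.dest is not None:
            last_writer[ins.dest] = ins.num
        if ins.op in BRANCH_OPS:
            last_branch = ins.num
    return data, control, structural


# Executed trace of the example (instruction 3 is skipped by the taken branch).
trace = [
    Instr(1, "add", "R2", ("R1", "R0"), "ALU"),
    Instr(2, "bz", None, ("R2",), "BRANCH"),
    Instr(4, "add", "R5", ("R6", "R7"), "ALU"),
]
data, control, structural = find_dependences(trace)
```

For this trace the sketch reports the data dependence of instruction 2 on instruction 1, the control dependence of instruction 4 on instruction 2, and the shared use of the ALU by instructions 1 and 4, matching the relations listed in the text.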
In the colored Petri net model shown in Fig. 3, the vertical direction represents the execution of one instruction through the pipeline, with each instruction (Instr1~Instr3) occupying one column, and the horizontal direction represents the issue order of the instructions. The horizontal width depends on the size of the instruction sequence obtained from the target application and on the maximum size of Petri net that the colored Petri net analysis tool can analyze. Black rectangles represent transitions of the colored Petri net, circles represent places, and the black dots in the places represent tokens. Horizontally, the figure shows the order in which instructions execute; vertically, it shows the sequence of micro-operations each instruction performs in the pipeline. Because instruction 3 (mul R7, R6, #0x2) is not executed in the actual instruction sequence, it is not modeled in the Petri net model. Solid arrows represent the ordering of micro-operations. Long-dashed arrows represent data dependence; here an edge from the execute stage of instruction 1 to the execute stage of instruction 2 represents the data dependence. Dash-dot arrows represent structural dependence (resource competition): the dash-dot arrows attached to the issue queue and to the execution unit pool represent the request and release of issue queue entries and the sharing of and competition for execution units. Short-dashed arrows represent control dependence; an edge between the execute stage of instruction 2 and the fetch stage of instruction 4 represents the control dependence. Edges between transitions and places represent the triggering relationship between them.
5) The colored Petri net model is simulated with a Petri net simulation tool, which generates a simulation report. The report lists, ordered by simulation step, the transition fired at each step and the time at which that transition fired.
In this embodiment, each line of the simulation report has the form: <simulation step; transition name; firing time of the transition>.
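By way of illustration only, a report in this three-field form could be read as sketched below. The semicolon delimiter, the angle brackets, the integer firing times, and the transition names such as Instr1_F are assumptions made for the example; the actual output format depends on the simulation tool used.

```python
from typing import NamedTuple


class FiringRecord(NamedTuple):
    step: int          # simulation step
    transition: str    # name of the transition fired at this step
    fire_time: int     # time at which the transition fired


def parse_report(text: str) -> list[FiringRecord]:
    """Parse lines of the assumed form '<step; transition name; firing time>'."""
    records = []
    for line in text.splitlines():
        line = line.strip().strip("<>")
        if not line:
            continue  # skip blank lines
        step, name, fire_time = (field.strip() for field in line.split(";"))
        records.append(FiringRecord(int(step), name, int(fire_time)))
    return records


# Hypothetical report fragment; the names and times are invented for the example.
sample = """
<1; Instr1_F; 0>
<2; Instr1_D; 1>
<3; Instr2_F; 1>
"""
records = parse_report(sample)
fire_time_by_transition = {r.transition: r.fire_time for r in records}
```

The resulting mapping from transition name to firing time is the information that step 6.2) attaches to the nodes of the directed acyclic graph.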
6) A corresponding directed acyclic graph is generated from the colored Petri net model according to the simulation report.
The colored Petri net built in step 4) is itself a directed graph that contains cycles, so the cycles must be eliminated first; a cycle arises wherever there is contention for a shared resource. The colored Petri net is simulated in step 5), which produces the simulation report. For every transition that competes for a shared resource, the simulation report gives the firing time. Step 6) then uses these firing times to remove the cycles from the generated directed graph. The resulting directed acyclic graph is shown in Fig. 4: its three columns correspond to the three instructions in Fig. 3 (Instr1~Instr3), and the letter in each circle denotes the pipeline stage, where F is fetch, D is decode, R is rename, I is issue, E is execute, and C is write-back.
In this embodiment, the detailed steps of step 6) are as follows:
6.1) Convert the colored Petri net model into a directed acyclic graph: the nodes of the graph are the transitions of the Petri net, and an edge joins two transitions that are connected through a place in the Petri net; the weight of each node's outgoing edges is the delay of the transition corresponding to that node; the places that represent shared resources are deleted together with their incoming and outgoing edges to other transitions;
6.2) According to the transition fired at each simulation step and its firing time in the simulation report, add an attribute to the corresponding node of the directed acyclic graph; the attribute value is the firing time of the transition;
6.3) According to the functional unit type used by each instruction, set the delay of the outgoing edges of the node corresponding to the "execute" pipeline stage to the delay of the functional unit or its number of pipeline stages; the delay of all remaining edges is set to 1 clock cycle;
6.4) Correct the weights of the directed acyclic graph;
6.5) Generate the description matrix of the directed acyclic graph with the corrected weights. Both the rows and the columns of the description matrix are indexed by the nodes of the directed acyclic graph, and each element indicates whether a directed edge exists between the two nodes: the element is zero if no directed edge exists, and equals the weight of the edge if one does.
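The description matrix of step 6.5) can be sketched as a plain adjacency matrix. The node numbering and the four-node graph below are hypothetical and serve only to show the layout, in which row i and column j hold the weight of the edge from node i to node j.

```python
def build_description_matrix(num_nodes, edges):
    """Return a num_nodes x num_nodes matrix; 0 means 'no directed edge'.

    edges is an iterable of (source, destination, weight) triples.
    """
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for src, dst, weight in edges:
        matrix[src][dst] = weight
    return matrix


# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
# Edge 1 -> 3 carries a corrected weight of 3 cycles; all others take 1 cycle.
edges = [(0, 1, 1), (0, 2, 1), (1, 3, 3), (2, 3, 1)]
matrix = build_description_matrix(4, edges)
```

Because zero is reserved for "no edge", this encoding presumes that every real edge has a non-zero weight, which holds here since each edge delay is at least 1 clock cycle.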
As shown in Fig. 5, the detailed steps of step 6.4) are as follows:
6.4.1) Take a node Vj from the node set of the directed acyclic graph as the current node;
6.4.2) Obtain all incoming edges of the current node Vj, giving the incoming edge set of the current node;
6.4.3) Take an edge Eij from the incoming edge set of the current node as the current edge and find its start point Vi. If the firing time Vj.firetime of the current node Vj is greater than or equal to the sum of the firing time Vi.firetime of the start point of the current edge and the weight Eij.weight of the current edge (Vj.firetime >= Vi.firetime + Eij.weight), set the arrival time slack[i,j] of the current edge to that sum (slack[i,j] = Vi.firetime + Eij.weight). Finally, select the minimum arrival time T among the arrival times of all incoming edges of the current node (T = min(slack[j,0], slack[j,1], slack[j,2], ..., slack[j,k-1]));
6.4.4) Compute the difference X between the firing time Vj.firetime of the current node and the minimum arrival time T;
6.4.5) Find all outgoing edges of the current node and increase the weight of each outgoing edge, starting from its initial value, by the difference X between the firing time Vj.firetime of the current node and the minimum arrival time T;
6.4.6) Determine whether all nodes in the node set of the directed acyclic graph have been traversed. If not, take the next node from the node set as the current node Vj and jump to step 6.4.2); if so, the weight correction of the directed acyclic graph is complete.
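Steps 6.4.1) to 6.4.6) can be sketched as follows. Several points are assumptions made for the example and are not specified in the text: nodes are visited in the order given, arrival times are computed from the weights as they stand when a node is visited, and a node with no qualifying incoming edge is left unchanged. The node names and firing times are hypothetical.

```python
def correct_weights(nodes, fire_time, edges):
    """Fold observed stalls into outgoing edge weights (steps 6.4.1 to 6.4.6).

    nodes:     node names, in the (assumed) traversal order
    fire_time: dict node -> firing time taken from the simulation report
    edges:     dict (start, end) -> initial weight
    Returns a new dict of corrected weights; the input is not modified.
    """
    weights = dict(edges)
    for j in nodes:
        # 6.4.2 / 6.4.3: arrival times over the qualifying incoming edges
        arrivals = [
            fire_time[i] + w
            for (i, dst), w in weights.items()
            if dst == j and fire_time[j] >= fire_time[i] + w
        ]
        if not arrivals:
            continue  # assumption: nothing to correct for this node
        t_min = min(arrivals)          # minimum arrival time T
        x = fire_time[j] - t_min       # 6.4.4: difference X
        for (src, dst) in list(weights):
            if src == j:               # 6.4.5: add X to every outgoing edge
                weights[(src, dst)] += x
    return weights


# Hypothetical chain A -> B -> C -> D. Node C fired at time 4 although its
# input arrived at time 2, for example because it waited for a shared resource.
nodes = ["A", "B", "C", "D"]
fire_time = {"A": 0, "B": 1, "C": 4, "D": 5}
edges = {("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 1}
corrected = correct_weights(nodes, fire_time, edges)
```

In this example the 2-cycle stall at C is moved onto the edge from C to D, so the corrected path length from A to D equals the firing time of D observed in the simulation.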
7) Compute the critical path of the directed acyclic graph to obtain the critical path matrix; the length of the critical path represents the time the current program segment takes to execute on the processor. List the nodes the critical path passes through according to the critical path matrix, and generate a critical path report from the nodes and edges that make up the critical path and, for each type of edge on the critical path, the number of edges and the sum of their delays. Compute the slack time of every incoming edge of each node and generate a slack time report.
In this embodiment, the detailed steps of step 7) are as follows:
7.1) Negate all weights of the description matrix of the directed acyclic graph;
7.2) Solve for path lengths using the description matrix with negated weights, and take the shortest path under the negative weights (that is, the longest path of the directed acyclic graph under the positive weights) as the critical path. The path length of the critical path represents the time the current program segment takes to execute on the processor. Generate the critical path report from the nodes and edges that make up the critical path and, for each type of edge on the critical path, the number of edges and the sum of their delays;
7.3) List the nodes the critical path passes through according to the critical path matrix;
7.4) Compute the slack time of every incoming edge of each node and generate the slack time report.
As shown in Fig. 6, the detailed steps for solving path lengths in step 7.2) are as follows:
7.2.1) Push each node Vj of the description matrix with negated weights onto stack Q, and initialize the distance dist[j] from each node to the source node used for computing the critical path to infinity;
7.2.2) Set the distance from the source node to itself to 0 (dist[s] = 0);
7.2.3) Determine whether stack Q is empty. If it is, the computation of the shortest path of the directed acyclic graph with negative weights ends and exits; that shortest path is the critical path, and its length represents the time the current program segment takes to execute on the processor. Otherwise, jump to step 7.2.4);
7.2.4) Pop a node Vj from stack Q and determine whether its distance dist[j] to the source node is infinity. If the condition does not hold, pop another node Vj from stack Q, and repeat until the distance dist[j] of the popped node to the source node is infinity; once the condition holds, jump to step 7.2.5);
7.2.5) Obtain all incoming edges of the popped node Vj, giving the incoming edge set neis;
7.2.6) Take an incoming edge from the set neis as the current incoming edge Ek and obtain its start point targetk;
7.2.7) Determine whether the distance dist[targetk] from the start point targetk of the current incoming edge Ek to the source node is greater than the sum of the distance dist[j] from the popped node to the source node and the weight weights[k] of the current incoming edge (dist[targetk] > dist[j] + weights[k]). If so, jump to step 7.2.8); otherwise, jump to step 7.2.9);
7.2.8) Set the distance dist[targetk] from the start point of the current incoming edge to the source node to the sum of the distance dist[j] from the popped node to the source node and the weight weights[k] of the current incoming edge (dist[targetk] = dist[j] + weights[k]), then jump to step 7.2.10);
7.2.9) Determine whether the start point targetk of the current incoming edge is in stack Q. If it is not, push targetk onto stack Q and jump to step 7.2.10); if it is already in stack Q, jump directly to step 7.2.10);
7.2.10) Determine whether all incoming edges in the set neis have been traversed. If not, jump to step 7.2.6); otherwise, jump to step 7.2.3).
In steps 7.2.1) to 7.2.10) above, s is the source node used for computing the critical path; dist[] holds the distance from each node Vj to s; weights[] holds the weight of each edge; target is the node connected to each node Vj by an edge; Q is the stack that stores nodes; infinity denotes an infinitely large value.
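The idea behind steps 7.1) and 7.2), namely that the shortest path under negated weights is the longest path under the original weights, can be illustrated as below. This sketch does not reproduce the stack-driven traversal of steps 7.2.1) to 7.2.10); it substitutes a standard technique, edge relaxation in topological order, using graphlib from the Python standard library (Python 3.9 or later). The function name and the example matrix are hypothetical.

```python
import math
from graphlib import TopologicalSorter


def critical_path(matrix, source):
    """Longest path from source in a DAG given as a description matrix.

    matrix[i][j] is the weight of edge i -> j, or 0 if there is no edge.
    Returns (length, list of nodes on the path).
    """
    n = len(matrix)
    neg = [[-w for w in row] for row in matrix]          # step 7.1: negate
    preds = {j: {i for i in range(n) if matrix[i][j] != 0} for j in range(n)}
    dist = [math.inf] * n
    parent = [None] * n
    dist[source] = 0
    # Substituted technique: relax incoming edges in topological order.
    for j in TopologicalSorter(preds).static_order():
        for i in preds[j]:
            if dist[i] + neg[i][j] < dist[j]:
                dist[j] = dist[i] + neg[i][j]
                parent[j] = i
    # The reachable node with the most negative distance ends the critical path.
    end = min((j for j in range(n) if dist[j] != math.inf), key=lambda j: dist[j])
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    return -dist[end], path


# Hypothetical graph: 0 -> 1 (1), 0 -> 2 (1), 1 -> 3 (3), 2 -> 3 (1).
matrix = [
    [0, 1, 1, 0],
    [0, 0, 0, 3],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
length, path = critical_path(matrix, source=0)
```

Here the path 0, 1, 3 has length 4, while the alternative through node 2 has length 2, so the first is reported as the critical path.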
As shown in Fig. 7, the detailed steps of step 7.4) are as follows:
7.4.1) Obtain the last node Vj of the critical path as the current node;
7.4.2) Obtain all incoming edges of the current node Vj;
7.4.3) Determine whether the current node Vj is the start point s of the critical path. If so, end and exit; if not, jump to step 7.4.4);
7.4.4) Take each incoming edge of the current node Vj in turn as the current incoming edge Ek and obtain its start point targetk;
7.4.5) Determine whether the firing time res[0,j] of the current node equals the sum of the firing time res[0,targetk] of the start point of the current incoming edge and the weight weights[k] of the current incoming edge (res[0,j] == res[0,targetk] + weights[k]). If so, jump to step 7.4.6); otherwise, jump to step 7.4.7);
7.4.6) Print j, take the start point of the current incoming edge as the current node (j = targetk), and jump to step 7.4.2);
7.4.7) Sum the firing time res[0,targetk] of the start point of the current incoming edge and the weight weights[k] of the current incoming edge, and subtract the sum (res[0,targetk] + weights[k]) from the firing time res[0,j] of the current node to obtain the slack time slacktime[j,k] of the current incoming edge of the current node (that is, slacktime[j,k] = res[0,j] - (res[0,targetk] + weights[k])); jump to step 7.4.4) until the slack time of every incoming edge of the current node has been computed;
7.4.8) Generate the slack time report containing the slack time of every incoming edge of each node.
In steps 7.4.1) to 7.4.7) above, res[i,j] is the firing time of node Vj; slacktime[j,k] is the slack time of incoming edge Ek of node Vj; weights[] holds the weight of each edge; s is the start point of the critical path.
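The slack formula of step 7.4.7) can be illustrated with the sketch below, which evaluates slacktime = res[j] - (res[start] + weight) for every incoming edge of every node on the critical path. It is a simplification under stated assumptions: the control flow of Fig. 7 is replaced by a plain loop over the path, res is taken as a one-dimensional list of node times relative to the source, and the example values are hypothetical.

```python
def slack_times(matrix, res, path):
    """Slack of each incoming edge of each node on the critical path.

    matrix[i][j]: weight of edge i -> j, or 0 if there is no edge
    res[j]:       time of node j measured from the source node
    path:         nodes on the critical path
    Returns a dict (start, end) -> slack; a slack of 0 marks a critical edge.
    """
    slack = {}
    for j in path:
        for i in range(len(matrix)):
            w = matrix[i][j]
            if w == 0:
                continue  # no edge i -> j
            slack[(i, j)] = res[j] - (res[i] + w)
    return slack


# Hypothetical graph: 0 -> 1 (1), 0 -> 2 (1), 1 -> 3 (3), 2 -> 3 (1).
matrix = [
    [0, 1, 1, 0],
    [0, 0, 0, 3],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
res = [0, 1, 1, 4]      # node times along the longest paths from node 0
path = [0, 1, 3]        # critical path
slack = slack_times(matrix, res, path)
```

In the example the edge from node 2 to node 3 has a slack of 2 cycles, meaning it could be delayed by that much without lengthening the critical path.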
8) Based on the critical path report and the slack time report, analyze the performance bottleneck or power bottleneck of the microprocessor when it runs the target application under the current microarchitecture parameter configuration; for example, the analysis may show that the current microarchitecture has too few rename buffers. In this embodiment, the critical path report contains the nodes and edges that make up the critical path and, for each type of edge on the critical path, the number of edges and the sum of their delays. The slack time report contains the slack time of every incoming edge of each node on the critical path. From this information the performance bottleneck of the system can be identified. For example, if the critical path contains many edges from the decode stage to the rename stage and the sum of their delays is large, the performance bottleneck of the system may be a lack of rename resources. The slack time report provides a reference for the optimization space of the microarchitecture.
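One possible way to derive the per-type edge counts and delay sums of the critical path report is sketched below. Labeling an edge type by the pipeline stages of its two end nodes, representing a node as an (instruction, stage) pair, and picking the type with the largest delay sum as the bottleneck candidate are all assumptions made for the example; the path and delays are invented.

```python
from collections import defaultdict


def critical_path_report(path, delay):
    """Count edges and sum delays per edge type along the critical path.

    path:  list of (instruction number, stage letter) nodes
    delay: dict (node_a, node_b) -> delay of the edge between them
    """
    report = defaultdict(lambda: {"count": 0, "delay": 0})
    for a, b in zip(path, path[1:]):
        edge_type = f"{a[1]}->{b[1]}"      # e.g. "D->R" for decode to rename
        report[edge_type]["count"] += 1
        report[edge_type]["delay"] += delay[(a, b)]
    return dict(report)


# Hypothetical critical path with a 4-cycle decode-to-rename edge.
path = [(1, "D"), (1, "R"), (1, "I"), (1, "E"), (2, "E"), (2, "C")]
delay = {
    ((1, "D"), (1, "R")): 4,
    ((1, "R"), (1, "I")): 1,
    ((1, "I"), (1, "E")): 1,
    ((1, "E"), (2, "E")): 2,   # data dependence between two execute stages
    ((2, "E"), (2, "C")): 1,
}
report = critical_path_report(path, delay)
bottleneck = max(report, key=lambda t: report[t]["delay"])
```

With these invented numbers the decode-to-rename edge type dominates the delay, which would point toward rename resources as the candidate bottleneck.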
9) Based on the analysis of the performance bottleneck or power bottleneck of the microprocessor running the target application under the current microarchitecture parameter configuration, determine whether the microarchitecture parameters of the microprocessor need to be optimized. If so, adjust the microarchitecture parameters and jump to step 1); if not, end and exit. In step 9) of this embodiment, the microarchitecture parameter configuration of the microprocessor is changed according to the analysis result of step 8), for example by changing the pipeline structure, the number of resources, or the functional unit delays. The method then returns to step 1) and starts over to obtain the critical path length, the critical path composition, and the slack time report of the new configuration, which are compared with those of the previous design. This completes the parameter optimization of the microprocessor microarchitecture and thereby the exploration of the design space.
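The iteration between steps 1) and 9) can be sketched as a simple loop. The functions evaluate and adjust are placeholders for steps 1) to 8) and for the parameter change of step 9); the toy cost model, the rename_regs parameter, and the iteration limit are invented for the example and do not come from the method itself.

```python
def explore(config, evaluate, adjust, max_iterations=10):
    """Repeat 'evaluate, then adjust' until no bottleneck is reported.

    evaluate(config) -> (critical path length, bottleneck name or None)
    adjust(config, bottleneck) -> new configuration
    Returns (best (config, length) pair, full history).
    """
    history = []
    for _ in range(max_iterations):
        length, bottleneck = evaluate(config)      # stands in for steps 1) to 8)
        history.append((dict(config), length))
        if bottleneck is None:                     # step 9): no optimization needed
            break
        config = adjust(config, bottleneck)        # step 9): change parameters
    best = min(history, key=lambda item: item[1])
    return best, history


def toy_evaluate(config):
    # Invented cost model: too few rename registers add stall cycles.
    stall = max(0, 32 - config["rename_regs"])
    return 20 + stall, ("rename" if stall > 0 else None)


def toy_adjust(config, bottleneck):
    # Invented policy: double the resource named by the bottleneck.
    return {**config, "rename_regs": config["rename_regs"] * 2}


best, history = explore({"rename_regs": 8}, toy_evaluate, toy_adjust)
```

Keeping the full history makes it possible to compare each new configuration with the previous designs, as the step describes.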
In summary, this embodiment combines modeling, simulation, and theoretical analysis to explore the microprocessor design space. The model takes into account both the microarchitecture of the microprocessor and the target application. Through simulation and theoretical calculation, the performance bottleneck (or power bottleneck) of the microprocessor running the target application is analyzed and an analysis report is produced; the microarchitecture parameters in the model are then changed according to the analysis result, and simulation and theoretical analysis are carried out again. Iterating in this way searches for the optimal microarchitecture parameters. The method has the advantages of reliable and accurate prediction, wide coverage of the design space, relatively low algorithmic complexity, and fast, efficient optimization.
The above is only a preferred embodiment of the present invention. The scope of protection of the present invention is not limited to the above embodiment; all technical solutions that fall within the concept of the present invention belong to its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principles of the present invention shall also be regarded as falling within the scope of protection of the present invention.

Claims (7)

1. A microprocessor microarchitecture parameter optimization method based on Petri nets, characterized in that the implementation steps are as follows:
1) Construct a template of the pipeline model based on a colored Petri net according to the microarchitecture of the microprocessor;
2) Obtain the instruction sequence of the target application running on the microprocessor;
3) Analyze the instruction sequence to obtain the data dependence, control dependence, and structural dependence information between instructions, and obtain the functional unit type used by each instruction;
4) Based on the template of the pipeline model, the instruction sequence, the data dependence, control dependence, and structural dependence information between instructions, and the functional unit types used by the instruction sequence, generate a colored Petri net model that describes how the target program runs on the microprocessor under the current microarchitecture parameter configuration;
5) Simulate the colored Petri net model with a Petri net simulation tool and generate a simulation report, the simulation report listing, ordered by simulation step, the transition fired at each step and the time at which that transition fired;
6) Generate a corresponding directed acyclic graph from the colored Petri net model according to the simulation report;
7) Compute the critical path of the directed acyclic graph to obtain the critical path matrix, the length of the critical path representing the time the current program segment takes to execute on the processor; list the nodes the critical path passes through according to the critical path matrix, and generate a critical path report from the nodes and edges that make up the critical path and, for each type of edge on the critical path, the number of edges and the sum of their delays; compute the slack time of every incoming edge of each node and generate a slack time report;
8) Based on the critical path report and the slack time report, analyze the performance bottleneck or power bottleneck of the microprocessor when it runs the target application under the current microarchitecture parameter configuration;
9) Based on the analysis of the performance bottleneck or power bottleneck of the microprocessor running the target application under the current microarchitecture parameter configuration, determine whether the microarchitecture parameters of the microprocessor need to be optimized; if so, adjust the microarchitecture parameters and jump to step 1); if not, end and exit;
wherein, when step 4) generates the colored Petri net model that describes how the target program runs on the microprocessor under the current microarchitecture parameter configuration, data dependences and control dependences are represented by edges in the colored Petri net, and structural dependences are realized by resource-constraint places of the colored Petri net.
2. The microprocessor microarchitecture parameter optimization method based on Petri nets according to claim 1, characterized in that: in step 3), the data dependence means that a source operand of a later-issued instruction in the pipeline is the destination operand of an earlier-issued instruction, so that the later-issued instruction must wait until the earlier-issued instruction on which it depends has computed its result before it can obtain the source operand and continue executing; the control dependence means that the branch condition of a later-issued jump instruction in the pipeline is the computation result of an earlier-issued instruction, so that the jump instruction must wait until the earlier-issued instruction on which it depends has computed its result before the correct direction of the instruction stream can be resolved; the structural dependence means that an earlier-issued instruction and a later-issued instruction in the pipeline use the same functional unit or a limited system resource.
3. The microprocessor microarchitecture parameter optimization method based on Petri nets according to claim 1 or 2, characterized in that the detailed steps of step 6) are as follows:
6.1) Convert the colored Petri net model into a directed acyclic graph, wherein the nodes of the directed acyclic graph are the transitions of the Petri net, an edge joins two transitions that are connected through a place in the Petri net, and the weight of each node's outgoing edges is the delay of the transition corresponding to that node; delete the places that represent shared resources together with their incoming and outgoing edges to other transitions;
6.2) According to the transition fired at each simulation step and its firing time in the simulation report, add an attribute to the corresponding node of the directed acyclic graph, the attribute value being the firing time of the transition;
6.3) According to the functional unit type used by each instruction, set the delay of the outgoing edges of the node corresponding to the "execute" pipeline stage in the directed acyclic graph to the delay of the functional unit or its number of pipeline stages, and set the delay of all remaining edges to 1 clock cycle;
6.4) Correct the weights of the directed acyclic graph;
6.5) Generate the description matrix of the directed acyclic graph with the corrected weights, wherein both the rows and the columns of the description matrix are indexed by the nodes of the directed acyclic graph and each element of the description matrix indicates whether a directed edge exists between the two nodes: the element is zero if no directed edge exists, and equals the weight of the edge if one does.
4. The microprocessor microarchitecture parameter optimization method based on Petri nets according to claim 3, characterized in that the detailed steps of step 6.4) are as follows:
6.4.1) Take a node from the node set of the directed acyclic graph as the current node;
6.4.2) Obtain all incoming edges of the current node, giving the incoming edge set of the current node;
6.4.3) Take an edge from the incoming edge set of the current node as the current edge and find the start point of the current edge; if the firing time of the current node is greater than or equal to the sum of the firing time of the start point of the current edge and the weight of the current edge, set the arrival time of the current edge to the sum of the firing time of the start point of the current edge and the weight of the current edge; finally, select the minimum arrival time among the arrival times of all incoming edges of the current node;
6.4.4) Compute the difference between the firing time of the current node and the minimum arrival time;
6.4.5) Find all outgoing edges of the current node and increase the weight of each outgoing edge, starting from its initial value, by the difference between the firing time of the current node and the minimum arrival time;
6.4.6) Determine whether all nodes in the node set of the directed acyclic graph have been traversed; if not, take the next node from the node set of the directed acyclic graph as the current node and jump to step 6.4.2); if so, the weight correction of the directed acyclic graph is complete.
5. The microprocessor microarchitecture parameter optimization method based on Petri nets according to claim 4, characterized in that the detailed steps of step 7) are as follows:
7.1) Negate all weights of the description matrix of the directed acyclic graph;
7.2) Solve for path lengths using the description matrix with negated weights, and take the shortest path under the negative weights as the critical path, the path length of the critical path representing the time the current program segment takes to execute on the processor; generate the critical path report from the nodes and edges that make up the critical path and, for each type of edge on the critical path, the number of edges and the sum of their delays;
7.3) List the nodes the critical path passes through according to the critical path matrix;
7.4) Compute the slack time of every incoming edge of each node and generate the slack time report.
6. The microprocessor microarchitecture parameter optimization method based on Petri nets according to claim 5, characterized in that the detailed steps for solving path lengths in step 7.2) are as follows:
7.2.1) Push each node of the description matrix with negated weights onto a stack, and initialize the distance from each node to the source node used for computing the critical path to infinity;
7.2.2) Set the distance from the source node to itself to 0;
7.2.3) Determine whether the stack is empty; if it is, the computation of the shortest path of the directed acyclic graph with negative weights ends and exits, that shortest path being the critical path and its length representing the time the current program segment takes to execute on the processor; otherwise, jump to step 7.2.4);
7.2.4) Pop a node from the stack and determine whether the distance from the popped node to the source node is infinity; if the condition does not hold, pop another node from the stack, and repeat until the distance from the popped node to the source node is infinity; once the condition holds, jump to step 7.2.5);
7.2.5) Obtain all incoming edges of the popped node, giving the incoming edge set;
7.2.6) Take an incoming edge from the incoming edge set as the current incoming edge and obtain the start point of the current incoming edge;
7.2.7) Determine whether the distance from the start point of the current incoming edge to the source node is greater than the sum of the distance from the popped node to the source node and the weight of the current incoming edge; if so, jump to step 7.2.8); otherwise, jump to step 7.2.9);
7.2.8) Set the distance from the start point of the current incoming edge to the source node to the sum of the distance from the popped node to the source node and the weight of the current incoming edge, then jump to step 7.2.10);
7.2.9) Determine whether the start point of the current incoming edge is in the stack; if it is not, push the start point of the current incoming edge onto the stack and jump to step 7.2.10); if it is already in the stack, jump directly to step 7.2.10);
7.2.10) Determine whether all incoming edges in the incoming edge set have been traversed; if not, jump to step 7.2.6); otherwise, jump to step 7.2.3).
7. The microprocessor microarchitecture parameter optimization method based on Petri nets according to claim 6, characterized in that the detailed steps of step 7.4) are as follows:
7.4.1) Obtain the last node of the critical path as the current node;
7.4.2) Obtain all incoming edges of the current node;
7.4.3) Determine whether the current node is the start point of the critical path; if so, end and exit; if not, jump to step 7.4.4);
7.4.4) Take each incoming edge of the current node in turn as the current incoming edge and obtain the start point of the current incoming edge;
7.4.5) Determine whether the firing time of the current node equals the sum of the firing time of the start point of the current incoming edge and the weight of the current incoming edge; if so, jump to step 7.4.6); otherwise, jump to step 7.4.7);
7.4.6) Take the start point of the current incoming edge as the current node and jump to step 7.4.2);
7.4.7) Sum the firing time of the start point of the current incoming edge and the weight of the current incoming edge, and subtract the sum from the firing time of the current node to obtain the slack time of the current incoming edge of the current node; jump to step 7.4.4) until the slack time of every incoming edge of the current node has been computed;
7.4.8) Generate the slack time report containing the slack time of every incoming edge of each node.
CN201410673302.5A 2014-11-21 2014-11-21 Microprocessor microarchitecture parameter optimization method based on Petri network Active CN104361182B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410673302.5A CN104361182B (en) 2014-11-21 2014-11-21 Microprocessor microarchitecture parameter optimization method based on Petri network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410673302.5A CN104361182B (en) 2014-11-21 2014-11-21 Microprocessor microarchitecture parameter optimization method based on Petri network

Publications (2)

Publication Number Publication Date
CN104361182A CN104361182A (en) 2015-02-18
CN104361182B true CN104361182B (en) 2017-08-25

Family

ID=52528442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410673302.5A Active CN104361182B (en) 2014-11-21 2014-11-21 Microprocessor microarchitecture parameter optimization method based on Petri network

Country Status (1)

Country Link
CN (1) CN104361182B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105260166B (en) * 2015-10-15 2018-02-27 西安交通大学 A kind of manual sample set creation method applied to machine learning thread dividing
CN105233362A (en) * 2015-10-29 2016-01-13 陈文香 Ward nursing alarm system
DE102018217581A1 (en) * 2018-10-15 2020-04-16 Robert Bosch Gmbh Process for controlling a production plant, computer program, machine-readable storage medium, electronic control unit and production plant
CN109684421B (en) * 2018-12-25 2023-05-02 四川效率源信息安全技术股份有限公司 Graph theory-based method for engraving and restoring Oracle database file
CN111209178A (en) * 2020-01-13 2020-05-29 中信银行股份有限公司 Full link bottleneck testing method and system
CN112148460B (en) * 2020-10-12 2023-11-03 中国农业银行股份有限公司 Determination method and device for critical path, electronic equipment and computer storage medium
CN113515348B (en) * 2021-07-16 2023-11-14 江苏师范大学 Simulator modeling method and device based on opportunity action flow

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1508684A (en) * 2002-12-20 2004-06-30 中国科学院计算技术研究所 Method for evaluating credibility of computer group system
CN101853201A (en) * 2010-05-24 2010-10-06 南京航空航天大学 Software parallel test method and tool based on coloring petri net
CN102999654A (en) * 2012-08-15 2013-03-27 同济大学 Product modeling method based on behavior flow spatial sequence coloring Petri network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
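Workflow modeling and soundness analysis based on colored Petri nets (基于着色 Petri 网的工作流建模和合理性分析); Chen Xiang et al.; Computer Integrated Manufacturing Systems (《计算机集成制造系统》); 20040430; Vol. 10, No. 4; Section 3 of the text *
Research on key technologies for the design and analysis of asynchronous embedded microprocessors (异步嵌入式微处理器设计与分析关键技术研究); Wang Lei; China Doctoral Dissertations Full-text Database, Information Science and Technology (《中国博士学位论文全文数据库 信息科技辑》); 20071115; Chapters 4 and 6 of the text *
Static data flow graph model of asynchronous circuits and its performance analysis (异步电路的静态数据流图模型及其性能分析); Jin Gang et al.; Computer Science (《计算机科学》); 20091231; Vol. 36, No. 12; pp. 231-234 *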

Also Published As

Publication number Publication date
CN104361182A (en) 2015-02-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant