CN104516770A

CN104516770A - Program calculation cost estimation technology based on high speed simulation

Info

Publication number: CN104516770A
Application number: CN201410853868.6A
Authority: CN
Inventors: 李尚杰; 程胜; 周志军; 魏明; 卓保特
Original assignee: BEIJING SHENZHOU AEROSPACE SOFTWARE TECHNOLOGY Co Ltd
Current assignee: BEIJING SHENZHOU AEROSPACE SOFTWARE TECHNOLOGY Co Ltd
Priority date: 2014-12-31
Filing date: 2014-12-31
Publication date: 2015-04-15

Abstract

The invention discloses a technique for estimating the computation cost of a program based on high-speed simulation. The technique first analyzes the program, then annotates it with estimated time delays, and then executes the annotated program natively on the host, thereby collecting performance information about the software. Instead of interpreting the target program, the technique converts it into a natively executable file, which yields higher simulation efficiency; the annotated delays run on the development host and produce the performance estimate. By analyzing the most critical factor limiting performance in instruction set simulation, the invention provides a hybrid technique that combines virtualization and simulation and increases simulation speed, so that the speed and accuracy required for multi-core SoC design space exploration can be met.

Description

A program computation cost estimation technique based on high-speed simulation

Technical field

The invention relates to a program computation cost estimation technique based on high-speed simulation.

Background art

A multi-core SoC is often composed of multiple heterogeneous processing units. The key to evaluating a multi-core SoC design is to assess the performance of a given application on a multi-core SoC platform with a particular configuration. The architecture of a multi-core SoC, however, has a considerable influence on the performance of the whole system: a piece of code that achieves high performance on one processor will not necessarily achieve the same result on another. The application therefore has to be simulated with an instruction set simulator.

Instruction set simulation, however, requires building a simulator that describes the target architecture in detail, and the simulator interprets every instruction of the target code. This makes simulation very slow and makes it difficult to meet the needs of multi-core SoC design space exploration, which uses a simulator to simulate as many design alternatives as possible in order to compare them and select an optimized design.

Summary of the invention

The technical problem to be solved by the invention is to overcome the above defect by providing a program computation cost estimation technique based on high-speed simulation: a hybrid simulation technique that combines virtualization and simulation, increases simulation speed, and can meet the speed and accuracy required for multi-core SoC design space exploration.

To solve the above problem, the invention adopts the following technical solution:

A program computation cost estimation technique based on high-speed simulation, characterized in that its principle is as follows: the program is first analyzed, the estimated time delays are then annotated, and the annotated program is then executed natively, thereby collecting performance information about the software. The technique does not interpret the target program but converts it into a natively executable file, and therefore achieves higher simulation efficiency. The annotated delays run on the development host and yield the performance estimate.

As a preferred technical solution, the technique comprises the following steps: step 1, program segment partitioning; step 2, code delay estimation; step 3, building an independent simulation model; step 4, program instrumentation.

As a preferred technical solution, in step 1 the instruction sequence of the program is divided into basic blocks, and the instructions within a basic block are the objects of instruction-level estimation. A basic block is a special program unit: it has exactly one entry and one exit, and the instructions inside it execute sequentially. Once the entry instruction of a basic block is executed, every instruction in the block is executed once, in order, until the exit instruction is executed. The instructions inside a basic block are usually simple instructions whose execution time is either one clock cycle or the number of clock cycles explicitly stated in the processor documentation. Because the block contains no branch instruction, there is no branch overhead inside it.

As a preferred technical solution, in step 2 a basic block is taken as one estimation unit, and its execution time consists mainly of three parts: the execution time of the code in the basic block, the branch delay incurred on entering the basic block, and the delay caused by pipeline interlocks inside the basic block.

As a preferred technical solution, in step 3 the simulation model is represented as the timed transaction-level model (TLM/T) provided by SystemC. A software TLM/T model is in essence a behavioral-level description of the program annotated with estimated execution times. In SystemC, the model is implemented as an SC_MODULE with one SC_THREAD that executes the main program. The expected execution delay of the program on the target processor is represented explicitly in the model. The technique is based on the software TLM/T model and annotates execution delays with consume().

As a preferred technical solution, in step 4 ProgPart(P) is first used to instrument the whole program and initialize the program performance estimation environment; then, according to the basic-block partitioning, analysis, and performance estimation described above, a call to the consume() method is inserted at the head of each basic block.

By adopting the above technical solution, and compared with the prior art, the invention addresses the low speed of instruction set simulation, which cannot meet the needs of multi-core SoC design space exploration. By analyzing the most critical factor limiting performance in instruction set simulation, it proposes a hybrid simulation technique that combines virtualization and simulation, increases simulation speed, and can meet the speed and accuracy required for multi-core SoC design space exploration.

Embodiment

Embodiment:

The invention uses compile-time code annotation to provide a high-speed simulation method: a program computation cost estimation technique that improves the efficiency of multi-core SoC simulation. The basic principle of the technique is as follows: the program is first analyzed, the estimated time delays are then annotated, and the annotated program is then executed natively, thereby collecting performance information about the software. Because this simulation mode does not interpret the target program but converts it into a natively executable file, it achieves higher simulation efficiency. The annotated timing delays run on the development host and yield the performance estimate.

The technique instruments the program so that the execution counts of its instructions can be obtained and the execution time of the program can be estimated from them. A description of the program in terms of information such as instruction execution times and delays is called a simulation model; compiled-code simulation therefore needs an independent simulation model. The simulation model carries the estimated execution times, so executing the model simulates the behavior of the software without an instruction set simulator. Because the simulation model runs directly on the development host, simulation is very fast. Compared with an instruction set simulator, however, compiled-code simulation estimates program performance somewhat less accurately, because it does not account for specific compiler optimizations or memory accesses, both of which have a large effect on performance.
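The relationship between execution counts and estimated execution time can be illustrated with a minimal sketch. It assumes that each instrumented code segment has a fixed estimated delay in target clock cycles and that the instrumented run reports how often each segment executed; the segment names, delays, counts, and the clock frequency are hypothetical values chosen only for illustration.

```python
# Hypothetical per-segment data (not from the patent): estimated delay in
# target clock cycles, and how many times the instrumented run executed it.
segment_delay_cycles = {"S0": 5, "S1": 12, "S2": 3}
segment_exec_counts = {"S0": 1, "S1": 100, "S2": 1}


def estimate_total_cycles(delays, counts):
    """Sum (delay * execution count) over all code segments."""
    return sum(delays[name] * counts.get(name, 0) for name in delays)


def cycles_to_seconds(cycles, clock_hz):
    """Convert target clock cycles to seconds for an assumed clock rate."""
    return cycles / clock_hz


total_cycles = estimate_total_cycles(segment_delay_cycles, segment_exec_counts)
print(total_cycles)                                  # 5*1 + 12*100 + 3*1
print(cycles_to_seconds(total_cycles, 100_000_000))  # assumed 100 MHz target
```

The estimate depends only on static per-segment delays and dynamic counts, which is why no instruction set simulator is needed at run time.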

The technique comprises the following steps: step 1, program segment partitioning; step 2, code delay estimation; step 3, building an independent simulation model; step 4, program instrumentation.

Step 1, program segment partitioning.

The instruction sequence of the program is divided into basic blocks, and the instructions within a basic block are the objects of instruction-level estimation. A basic block is a special program unit: it has exactly one entry and one exit, and the instructions inside it execute sequentially. Once the entry instruction of a basic block is executed, every instruction in the block is executed once, in order, until the exit instruction is executed. The instructions inside a basic block are usually simple instructions whose execution time is either one clock cycle or the number of clock cycles explicitly stated in the processor documentation. Because the block contains no branch instruction, there is no branch overhead inside it. In a multi-core SoC, a basic block also contains no communication or interaction between programs.

The invention uses the GCC compiler front end to convert C source code into an intermediate representation, from which the basic blocks of the program are generated.
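As an illustration of basic-block partitioning, the following minimal sketch applies the textbook "leader" rule to a toy instruction list. It is not the GCC-based flow described above: the instruction encoding as `(opcode, branch_target)` tuples and the opcode names `br` and `jmp` are assumptions made for this example, and a terminating branch is kept as the last instruction of its block.

```python
def find_leaders(instrs):
    """Return the sorted indices of instructions that start a basic block."""
    leaders = {0} if instrs else set()
    for idx, (op, target) in enumerate(instrs):
        if op in ("br", "jmp"):
            leaders.add(target)           # a branch target starts a block
            if idx + 1 < len(instrs):
                leaders.add(idx + 1)      # so does the instruction after a branch
    return sorted(leaders)


def split_basic_blocks(instrs):
    """Group instruction indices into basic blocks (one entry, one exit)."""
    leaders = find_leaders(instrs)
    bounds = leaders + [len(instrs)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(len(leaders))]


# Toy program: (opcode, branch target index or None)
program = [
    ("mov", None),  # 0
    ("cmp", None),  # 1
    ("br", 5),      # 2: conditional branch to instruction 5
    ("add", None),  # 3
    ("jmp", 6),     # 4: unconditional jump to instruction 6
    ("sub", None),  # 5
    ("ret", None),  # 6
]
blocks = split_basic_blocks(program)
print(blocks)
```

Each resulting block is entered only at its first instruction and left only at its last, which is the property the delay estimation in step 2 relies on.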

Step 2, code delay estimation.

In the invention, a basic block is taken as one estimation unit, and its execution time consists mainly of three parts: the execution time of the code in the basic block, the branch delay incurred on entering the basic block, and the delay caused by pipeline interlocks inside the basic block.
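If the three parts are taken to be additive (an assumption; the text says only that the execution time consists mainly of these parts), the estimate for one basic block B can be written with the symbols DelayI, DelayB, and DelayP that subsections A) to C) use for the instruction, branch, and pipeline interlock components:

```latex
\mathrm{Delay}_B = \mathrm{DelayI}_B + \mathrm{DelayB}_B + \mathrm{DelayP}_B
```

For example, with hypothetical values of 7 cycles for the instructions, 2 cycles for the branch, and 1 cycle for one interlock, the block would be charged 10 cycles each time it executes.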

A) Instruction time estimation.

The performance estimate of a single instruction is based on the number of cycles it takes under ideal execution conditions, commonly called cycles per instruction (CPI). Compilation techniques already make use of this kind of information, fixing the data flow and control flow according to instruction-level information. With a fixed, in-order pipeline, most instructions of an embedded processor take only one clock cycle. A few special instructions may take several clock cycles, but such instructions are clearly marked and described in the processor documentation. Branch instructions and pipeline interlock delays need to be considered separately. On this basis, the execution time of the selected instructions can be estimated accurately.

On this basis, the instruction delay of a basic block is the sum of the delays of all its instructions, namely

$$\mathrm{DelayI}_B = \sum_{i \in B} d_i$$

Here, B is a basic block, i is an instruction in B, and $d_i$ denotes the execution delay of instruction i.
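The sum over instruction delays can be sketched as a table lookup. The cycle table below is hypothetical: it assumes one cycle per instruction by default and a few multi-cycle opcodes of the kind a processor manual would list; the opcode names and cycle counts are not taken from the patent.

```python
# Hypothetical cycle table: opcodes not listed are assumed to take one cycle.
MULTI_CYCLE = {"mul": 3, "div": 12, "ld": 2}
DEFAULT_CYCLES = 1


def instruction_delay(block):
    """DelayI for one basic block: the sum of d_i over its instructions."""
    return sum(MULTI_CYCLE.get(op, DEFAULT_CYCLES) for op in block)


block = ["ld", "add", "mul", "st"]
print(instruction_delay(block))  # 2 + 1 + 3 + 1
```

Because the instructions of a basic block always execute together, this sum can be computed once at compile time and reused on every execution of the block.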

B) Branch overhead calculation.

The overhead of a branch instruction arises because the outcome of the branch condition may invalidate the prefetched instructions, which then have to be fetched again; a branch also interrupts the instruction pipeline. Compilers usually perform branch prediction based on the execution characteristics of the program in order to reduce the overhead of branch redirection. For example, basic blocks are laid out according to the execution frequency of the control-flow paths, so that frequently executed basic blocks immediately follow the branch statement and instruction prefetching is effective most of the time.

Branch overhead must also be annotated on the control flow. According to tests on different processor pipelines, the branch overhead of an embedded processor is two clock cycles:

$$\mathrm{DelayB}_B = 2$$

C) Pipeline interlock delay calculation.

Pipeline interlocks are caused by data dependences in the data flow. When the output produced by one instruction is an input (operand) of the next instruction, the result of the first instruction must be available before the second instruction executes. This constraint restricts instruction ordering and stalls the pipeline. A pipeline interlock has two preconditions: (1) there is a data dependence between two instructions; (2) the instructions are consecutive, or the distance between them is short enough.

Because one precondition for a pipeline interlock is that the distance between the dependent instructions is short enough, data dependence analysis does not have to be global; it suffices to analyze the data dependences within a short stretch of code. Using an instruction window mechanism, data dependences can be analyzed both inside a basic block and between basic blocks, and the extra cycles caused by pipeline interlocks can then be estimated.

$$\mathrm{DelayP}_B = \sum_{w \in 2^{B} \,\land\, |w| = 2 \,\land\, \mathrm{dep}(w) = \mathrm{true}} d_w$$

Here the instruction window size is 2. If a pipeline interlock exists in window w, the pipeline delay it causes is $d_w$; $d_w$ is usually determined by the number of pipeline stages and is either 2 cycles or 1 cycle.
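A window of size 2 can be sketched as a scan over adjacent instruction pairs. This is one reading of the formula, assuming that the window covers consecutive instructions and that a dependence means the first instruction writes a register the second one reads; the `Instr` record and the register names are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]      # register written, or None
    srcs: Tuple[str, ...]    # registers read


def depends(first, second):
    """dep(w): the second instruction reads what the first one writes."""
    return first.dest is not None and first.dest in second.srcs


def interlock_delay(block, stall_cycles=1):
    """DelayP: add d_w for every adjacent pair (window of 2) with a dependence."""
    return sum(stall_cycles for a, b in zip(block, block[1:]) if depends(a, b))


block = [
    Instr("ld", "r1", ("r0",)),
    Instr("add", "r2", ("r1", "r3")),  # reads r1 written by ld: interlock
    Instr("sub", "r4", ("r5", "r6")),  # independent of add
    Instr("mul", "r7", ("r4", "r2")),  # reads r4 written by sub: interlock
]
print(interlock_delay(block))                  # two interlocks, 1 cycle each
print(interlock_delay(block, stall_cycles=2))  # same pairs, 2 cycles each
```

The `stall_cycles` parameter plays the role of $d_w$, which the text sets to 1 or 2 cycles depending on the number of pipeline stages.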

Algorithm 1:

Step 3, building an independent simulation model.

In this technique, building and representing the simulation model is the key. At present, the most suitable representation of the simulation model is the timed transaction-level model (Transaction Level Model with Time, TLM/T) provided by SystemC. A software TLM/T model is in essence a behavioral-level description of the program annotated with estimated execution times. In SystemC, the model is implemented as an SC_MODULE with one SC_THREAD that executes the main program. The expected execution delay of the program on the target processor is represented explicitly in the model. The technique is based on the software TLM/T model and annotates execution delays with consume().
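The idea of a behavioral model annotated with explicit delays can be sketched without SystemC. The class below is a plain Python stand-in, not an SC_MODULE: it assumes that consume() takes a cycle count and advances a simulated clock, whereas the text does not specify the signature of consume(). The class name, the cycle values, and the clock frequency are hypothetical.

```python
class TimedSoftwareModel:
    """Plain-Python stand-in for a timed software model (not SystemC)."""

    def __init__(self, clock_hz):
        self.clock_hz = clock_hz
        self.elapsed_cycles = 0

    def consume(self, cycles):
        """Advance simulated target time by an estimated block delay."""
        self.elapsed_cycles += cycles

    def elapsed_seconds(self):
        return self.elapsed_cycles / self.clock_hz

    def main(self, n):
        """Program behavior executed natively, annotated with block delays."""
        self.consume(4)            # entry block (assumed 4 cycles)
        total = 0
        for i in range(n):
            self.consume(6)        # loop body block (assumed 6 cycles)
            total += i * i
        self.consume(2)            # exit block (assumed 2 cycles)
        return total


model = TimedSoftwareModel(clock_hz=50_000_000)
result = model.main(10)
print(result, model.elapsed_cycles)
```

The functional result comes from running the program logic on the host, while the simulated time comes only from the annotations, so the two concerns stay separate.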

Algorithm 2:

Step 4, program instrumentation.

ProgPart(P) is first used to instrument the whole program and initialize the program performance estimation environment. Then, according to the basic-block partitioning, analysis, and performance estimation described above, a call to the consume() method is inserted at the head of each basic block; that is, the following instruction is inserted before each basic block: Call consume();

The invention is not limited to the preferred embodiment described above. Any structural change made by anyone under the teaching of the invention, and any technical solution identical or similar to that of the invention, falls within the scope of protection of the invention.
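The insertion step can be sketched on a toy program representation. The sketch assumes each basic block is a named list of statement strings with a precomputed delay, and that consume() receives that delay as an argument; the text shows only a bare `Call consume();`, and `init_perf_env()` is a hypothetical stand-in for the environment setup attributed to ProgPart(P).

```python
def instrument(blocks, delays):
    """Return program text with a consume() call at the head of each block."""
    out = ["init_perf_env()"]  # hypothetical setup call, not from the patent
    for name, body in blocks:
        out.append(f"{name}:")
        out.append(f"    consume({delays[name]})")  # inserted at block head
        out.extend(f"    {stmt}" for stmt in body)
    return out


# Toy basic blocks and hypothetical per-block delays in cycles.
blocks = [
    ("B0", ["i = 0"]),
    ("B1", ["s = s + i", "i = i + 1"]),
]
delays = {"B0": 3, "B1": 8}

instrumented = instrument(blocks, delays)
print("\n".join(instrumented))
```

Placing the call at the block head means the delay is charged exactly once per execution of the block, regardless of how the block is reached.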

Claims

1. A program computation cost estimation technique based on high-speed simulation, characterized in that its principle is as follows: the program is first analyzed, the estimated time delays are then annotated, and the annotated program is then executed natively, thereby collecting performance information about the software; the technique does not interpret the target program but converts it into a natively executable file, and therefore achieves higher simulation efficiency; the annotated delays run on the development host and yield the performance estimate.

2. The program computation cost estimation technique based on high-speed simulation according to claim 1, characterized in that the technique comprises the following steps:

Step 1, program segment partitioning;

Step 2, code delay estimation;

Step 3, building an independent simulation model;

Step 4, program instrumentation.

3. The program computation cost estimation technique based on high-speed simulation according to claim 2, characterized in that: in step 1, the instruction sequence of the program is divided into basic blocks, and the instructions within a basic block are the objects of instruction-level estimation; a basic block is a special program unit that has exactly one entry and one exit, and the instructions inside it execute sequentially; once the entry instruction of a basic block is executed, every instruction in the block is executed once, in order, until the exit instruction is executed; the instructions inside a basic block are usually simple instructions whose execution time is either one clock cycle or the number of clock cycles explicitly stated in the processor documentation; because the block contains no branch instruction, there is no branch overhead inside it.

4. The program computation cost estimation technique based on high-speed simulation according to claim 2, characterized in that: in step 2, a basic block is taken as one estimation unit, and its execution time consists mainly of three parts: the execution time of the code in the basic block, the branch delay incurred on entering the basic block, and the delay caused by pipeline interlocks inside the basic block.

5. The program computation cost estimation technique based on high-speed simulation according to claim 2, characterized in that: in step 3, the simulation model is represented as the timed transaction-level model (TLM/T) provided by SystemC; a software TLM/T model is in essence a behavioral-level description of the program annotated with estimated execution times; in SystemC, the model is implemented as an SC_MODULE with one SC_THREAD that executes the main program; the expected execution delay of the program on the target processor is represented explicitly in the model; the technique is based on the software TLM/T model and annotates execution delays with consume().

6. The program computation cost estimation technique based on high-speed simulation according to claim 2, characterized in that: in step 4, ProgPart(P) is first used to instrument the whole program and initialize the program performance estimation environment; then, according to the basic-block partitioning, analysis, and performance estimation described above, a call to the consume() method is inserted at the head of each basic block.