CN104516770A - Program calculation cost estimation technology based on high speed simulation - Google Patents
- Publication number: CN104516770A
- Application number: CN201410853868.6A
- Authority: CN (China)
- Prior art keywords: program, basic block, instruction, simulation, model
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a technique for estimating program computation cost based on high-speed simulation. Its principle is as follows: the program is first analyzed, the estimated delays are then annotated, and the simulated program is executed natively, thereby collecting performance information about the software. The target program is not executed interpretively; instead it is converted into a natively executable file, which yields higher simulation efficiency. The annotated delays are evaluated as the program runs on the development host, producing the performance estimate. By analyzing the most critical factors that limit performance in instruction set simulation, the invention proposes a hybrid simulation technique that combines virtualization and simulation and raises simulation speed, so that the speed and accuracy required for multi-core SoC design exploration can be met.
Description
Technical field
The invention relates to a technique for estimating program computation cost based on high-speed simulation.
Background technology
A multi-core SoC is often composed of multiple heterogeneous processing units. The key to evaluating a multi-core SoC design is assessing the performance of a given application on a multi-core SoC platform with a particular configuration. However, the architecture of a multi-core SoC strongly affects the performance of the whole system: a piece of code that achieves high performance on one processor will not necessarily achieve the same result on other processors. An instruction set simulator (ISS) is therefore needed to simulate the application.
However, instruction set simulation requires building a simulator that describes the target architecture in detail, and the simulator interprets and executes every target instruction. This makes simulation very slow and makes it hard to meet the needs of multi-core SoC design space exploration, which uses the simulator to simulate as many design options as possible in order to compare them and select an optimized design.
Summary of the invention
The technical problem to be solved by the invention is to overcome the above drawback by providing a program computation cost estimation technique based on high-speed simulation: a hybrid simulation technique that combines virtualization and simulation, raises simulation speed, and can meet the speed and accuracy required for multi-core SoC design exploration.
To solve the above problem, the invention adopts the following technical solution:
A program computation cost estimation technique based on high-speed simulation, characterized in that its principle is as follows: the program is first analyzed, the estimated delays are then annotated, and the simulated program is executed natively, thereby collecting performance information about the software. The technique does not execute the target program interpretively; instead it converts the target program into a natively executable file, which yields higher simulation efficiency. The annotated delays are evaluated as the program runs on the development host, producing the performance estimate.
In a preferred solution, the technique comprises the following steps: step 1, program segment partitioning; step 2, code delay estimation; step 3, building an independent simulation model; step 4, program instrumentation.
In a preferred solution, in step 1 the instruction sequence of the program is divided into basic blocks, and instruction-level estimation operates on the instructions within a basic block. A basic block is a special program unit with exactly one entry and one exit, whose instructions execute sequentially. Once the entry instruction of a basic block is executed, all instructions in the block execute once, in order, until the exit instruction is executed. Instructions inside a basic block are usually simple instructions whose execution time is either one clock cycle or the cycle count explicitly stated in the processor documentation. Because the block contains no branch instructions, there is no branch overhead inside it.
In a preferred solution, in step 2 each basic block is treated as one estimation unit, and its execution time consists mainly of three parts: the execution time of the code in the basic block, the branch delay incurred on entering the basic block, and the delay caused by pipeline interlocks inside the basic block.
In a preferred solution, in step 3 the simulation model is represented as the timed transaction-level model (TLM/T) provided by SystemC. A software TLM/T model is essentially a behavioral-level description of the program, annotated with estimated execution times. In SystemC, the model is implemented as an SC_MODULE with one SC_THREAD that executes the main program. The expected execution delay of the program on the target processor is represented explicitly in the model; based on the software TLM/T model, the technique annotates execution delays with consume().
In a preferred solution, in step 4 ProgPart(P) is first used to instrument the whole program; it initializes the program performance evaluation environment. Then, based on the basic block partitioning, analysis, and performance estimates described above, a call to consume() is inserted at the head of each basic block.
With the above technical solution, the invention improves on the prior art as follows. Instruction set simulation is too slow to support multi-core SoC design space exploration. By analyzing the most critical factors that limit performance in instruction set simulation, the invention proposes a hybrid simulation technique that combines virtualization and simulation and raises simulation speed, so that the speed and accuracy required for multi-core SoC design exploration can be met.
Embodiment
Embodiment:
The invention uses compile-time code annotation to provide a high-speed simulation technique for estimating program computation cost, which improves multi-core SoC simulation efficiency. Its basic principle is as follows: the program is first analyzed, the estimated delays are then annotated, and the simulated program is executed natively, thereby collecting performance information about the software. Because this simulation mode does not execute the target program interpretively but converts it into a natively executable file, it achieves higher simulation efficiency. The annotated timing delays are evaluated as the program runs on the development host, producing the performance estimate.
The technique instruments the program so that the number of times each program instruction is executed can be obtained, from which the program's execution time is estimated. A description of the program in terms of information such as instruction execution times and delays is called a simulation model. Compiled-code simulation therefore needs an independent simulation model. The simulation model carries the estimated execution times; executing it simulates the behavior of the software without an instruction set simulator. Moreover, because the simulation model runs directly on the development host, simulation is very fast. Compared with an instruction set simulator, however, compiled-code simulation estimates program performance somewhat less accurately, because it does not account for specific compiler optimizations or memory accesses, both of which have a considerable effect on performance.
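The idea of estimating execution time from instrumented execution counts can be illustrated with a minimal Python sketch. The block names, the per-block delay values, and the helper names `enter_block` and `estimated_cycles` are assumptions made for illustration; they do not come from the patent.

```python
from collections import Counter

# Hypothetical per-block delay estimates, in target clock cycles (assumed values).
BLOCK_DELAY = {"B0": 5, "B1": 3, "B2": 7}

# Execution counts gathered by the instrumentation hooks.
exec_count = Counter()


def enter_block(name):
    """Instrumentation hook: record one execution of basic block `name`."""
    exec_count[name] += 1


def estimated_cycles(counts, delays):
    """Estimated execution time: sum of (execution count x block delay)."""
    return sum(delays[block] * n for block, n in counts.items())


def run_example():
    """Run a tiny 'instrumented program' natively and return its estimate."""
    exec_count.clear()
    enter_block("B0")          # entry block runs once
    for _ in range(4):
        enter_block("B1")      # loop body runs four times
    enter_block("B2")          # exit block runs once
    return estimated_cycles(exec_count, BLOCK_DELAY)
```

The program itself runs at host speed; only the counters and the delay table carry information about the target processor.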
The technique comprises the following steps: step 1, program segment partitioning; step 2, code delay estimation; step 3, building an independent simulation model; step 4, program instrumentation.
Step 1: program segment partitioning.
The instruction sequence of the program is divided into basic blocks, and instruction-level estimation operates on the instructions within a basic block. A basic block is a special program unit with exactly one entry and one exit, whose instructions execute sequentially. Once the entry instruction of a basic block is executed, all instructions in the block execute once, in order, until the exit instruction is executed. Instructions inside a basic block are usually simple instructions whose execution time is either one clock cycle or the cycle count explicitly stated in the processor documentation. Because the block contains no branch instructions, there is no branch overhead inside it. In a multi-core SoC, a basic block also contains no inter-program communication or interaction.
The invention uses the GCC compiler front end to convert C source code into an intermediate representation, from which the program's basic blocks are generated.
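As an illustration of basic block partitioning, the following Python sketch applies the textbook "leader" rule to a toy instruction list. It does not use GCC or its intermediate representation; the instruction encoding, the `BRANCH_OPS` set, and the sample program are assumptions. In this sketch a branch instruction ends the block it appears in.

```python
# Opcodes treated as branches in this toy instruction set (assumed).
BRANCH_OPS = {"beq", "jmp"}

# Toy program: (opcode, branch target index or None).
PROGRAM = [
    ("load", None),   # 0
    ("cmp", None),    # 1
    ("beq", 5),       # 2: conditional branch to instruction 5
    ("add", None),    # 3
    ("jmp", 6),       # 4: unconditional jump to instruction 6
    ("sub", None),    # 5
    ("store", None),  # 6
]


def split_basic_blocks(instrs):
    """Return basic blocks as lists of instruction indices.

    Leaders are: the first instruction, every branch target, and every
    instruction that directly follows a branch.
    """
    if not instrs:
        return []
    leaders = {0}
    for idx, (op, target) in enumerate(instrs):
        if op in BRANCH_OPS:
            if target is not None and 0 <= target < len(instrs):
                leaders.add(target)
            if idx + 1 < len(instrs):
                leaders.add(idx + 1)
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]
```

Each resulting block has a single entry (its leader) and a single exit (its last instruction), which is the property the delay estimation in step 2 relies on.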
Step 2: code delay estimation.
In the invention, each basic block is treated as one estimation unit, and its execution time consists mainly of three parts: the execution time of the code in the basic block, the branch delay incurred on entering the basic block, and the delay caused by pipeline interlocks inside the basic block.
A) Instruction time estimation.
The performance estimate for an instruction is based on the number of cycles the instruction takes under ideal execution conditions, commonly called cycles per instruction (CPI). Compilation techniques already use this kind of information, and they fix the data flow and control flow according to instruction-level information. With a fixed, in-order pipeline, most instructions of an embedded processor take only one clock cycle. A few special instructions may take several clock cycles, but such instructions are clearly identified and documented in the processor documentation. Branch instructions and pipeline interlock delays need to be considered separately. On this basis, the execution time of the selected instructions can be estimated accurately.
Given these considerations, the performance estimate of a basic block is the sum of the delays of all its instructions, namely
$$\mathrm{Delay}^{I}_{B} = \sum_{i \in B} d_i$$
Here, $B$ is a basic block, $i$ is an instruction in basic block $B$, and $d_i$ is the execution delay of instruction $i$.
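The sum of per-instruction delays over a basic block can be sketched in Python as follows. The cycle table `CYCLES` holds made-up values standing in for the figures a processor manual would give; the default of one cycle reflects the statement that most instructions take a single clock cycle.

```python
# Hypothetical cycle counts for multi-cycle instructions (assumed values,
# standing in for what a processor manual would document).
CYCLES = {"mul": 3, "div": 12}


def instruction_delay(block, cycles=CYCLES, default=1):
    """Sum d_i over all instructions i in basic block B.

    `block` is a list of opcode names; opcodes missing from `cycles`
    are assumed to take `default` (one) clock cycle.
    """
    return sum(cycles.get(op, default) for op in block)
```

For example, a block containing `load`, `mul`, `add`, and `store` is estimated at 1 + 3 + 1 + 1 = 6 cycles under these assumed values.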
B) Branch overhead calculation.
The overhead of a branch instruction arises because the outcome of the branch condition may invalidate the prefetched instructions, which must then be fetched again; a taken branch also interrupts the instruction pipeline. Compilers usually perform branch prediction based on program execution characteristics to reduce the overhead of branch redirection. For example, basic blocks are placed according to the execution frequency of control-flow paths so that frequently executed basic blocks immediately follow the branch statement, which keeps instruction prefetch effective most of the time.
Branch overhead must also be annotated on the control flow. Based on tests of different processor pipelines, the branch overhead of an embedded processor is taken to be two clock cycles.
$$\mathrm{Delay}^{B}_{B} = 2$$
C) Pipeline interlock delay calculation.
A pipeline interlock is caused by a data dependence in the data flow. When the output of one instruction is an input (operand) of the next instruction, the result of the first instruction must be available before the second instruction executes. This constraint restricts the ordering of instructions and stalls the pipeline. A pipeline interlock has two basic conditions: (1) a data dependence exists between two instructions; (2) the instructions are consecutive, or the distance between them is short enough.
Because one condition for a pipeline interlock is that the dependent instructions are close together, analyzing data dependences does not require a global dependence analysis; it suffices to analyze the dependences within a short stretch of code. Using an instruction window mechanism, data dependences can be analyzed both inside a basic block and between basic blocks, and the extra cycles introduced by pipeline interlocks can then be estimated.
Here the instruction window size is 2. If a pipeline interlock exists, the pipeline delay it produces is $d_w$; $d_w$ is usually determined by the number of pipeline stages and is either 2 cycles or 1 cycle.
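The window-based interlock check can be sketched in Python as follows. This is not the patent's Algorithm 1, whose listing is not reproduced in this text; it is a minimal illustration that only looks inside one basic block, with an assumed instruction encoding of (destination register, source registers). The helper `total_block_delay` adds the three parts named in step 2, which is our reading of how they combine.

```python
# Toy basic block: (destination register, tuple of source registers).
BLOCK = [
    ("r1", ("r2", "r3")),
    ("r4", ("r1", "r5")),  # reads r1 written by the previous instruction
    ("r6", ("r7",)),
    ("r8", ("r6", "r4")),  # reads r6 written by the previous instruction
]


def interlock_delay(block, d_w=1):
    """Extra cycles from interlocks, using an instruction window of size 2.

    An interlock is counted whenever an instruction reads the register
    written by the instruction immediately before it.
    """
    stalls = 0
    for (prev_dest, _), (_, cur_srcs) in zip(block, block[1:]):
        if prev_dest is not None and prev_dest in cur_srcs:
            stalls += 1
    return stalls * d_w


def total_block_delay(instr_cycles, block, d_w=1, branch_cycles=2):
    """Instruction delay + branch overhead + interlock delay for one block."""
    return instr_cycles + branch_cycles + interlock_delay(block, d_w)
```

With the toy block above and an assumed `d_w` of 2 cycles, two interlocks are found, giving 4 extra cycles.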
Algorithm 1:
Step 3: building an independent simulation model.
In this technique, building and representing the simulation model is the key. At present, the most suitable representation is the timed transaction-level model (Transaction Level Model with Time, TLM/T) provided by SystemC. A software TLM/T model is essentially a behavioral-level description of the program, annotated with estimated execution times. In SystemC, the model is implemented as an SC_MODULE with one SC_THREAD that executes the main program. The expected execution delay of the program on the target processor is represented explicitly in the model; based on the software TLM/T model, the technique annotates execution delays with consume().
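The structure of such a timed model can be illustrated with a plain-Python stand-in. This is not SystemC code and does not use SC_MODULE or SC_THREAD; the class name `TimedSoftwareModel`, the `cycle_ns` parameter, and the delay values passed to `consume()` are assumptions chosen for illustration.

```python
class TimedSoftwareModel:
    """Behavioral description of a program with annotated execution delays."""

    def __init__(self, cycle_ns=10):
        self.cycle_ns = cycle_ns      # assumed target clock period
        self.elapsed_cycles = 0       # simulated time on the target

    def consume(self, cycles):
        """Advance simulated target time by an estimated delay."""
        if cycles < 0:
            raise ValueError("delay must be non-negative")
        self.elapsed_cycles += cycles

    def elapsed_ns(self):
        """Simulated target time in nanoseconds."""
        return self.elapsed_cycles * self.cycle_ns

    def main(self, data):
        """The 'main program': native behavior plus explicit delays."""
        self.consume(4)               # setup block (assumed 4 cycles)
        total = 0
        for x in data:
            self.consume(3)           # loop body block (assumed 3 cycles)
            total += x
        self.consume(2)               # exit block (assumed 2 cycles)
        return total
```

The functional result is computed natively, while `elapsed_cycles` tracks the time the same code is expected to take on the target processor.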
Algorithm 2:
Step 4: program instrumentation.
ProgPart(P) is first used to instrument the whole program; it initializes the program performance evaluation environment. Then, based on the basic block partitioning, analysis, and performance estimates described above, a call to consume() is inserted at the head of each basic block; that is, the following instruction is inserted before each basic block: Call consume();
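The insertion step can be sketched in Python as a simple source-to-source pass. The block representation, the label syntax, and the choice to pass the estimated delay as an argument to `consume()` are assumptions; the patent text shows only `Call consume();` and does not specify ProgPart(P), which is therefore not modeled here.

```python
def instrument(blocks, delays):
    """Insert a consume() call at the head of every basic block.

    `blocks` is a list of (label, list of source lines);
    `delays` maps each label to its estimated delay in cycles.
    Returns the instrumented program as a list of text lines.
    """
    out = []
    for label, body in blocks:
        out.append(f"{label}:")
        out.append(f"    consume({delays[label]});")   # annotation goes first
        out.extend(f"    {line}" for line in body)
    return out


# Toy input: two basic blocks with assumed delay estimates.
BLOCKS = [("B0", ["x = a + b;"]), ("B1", ["y = x * 2;", "return y;"])]
DELAYS = {"B0": 3, "B1": 5}
```

Because each basic block has a single entry, placing the call at the block head means the delay is charged exactly once per execution of the block.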
The invention is not limited to the preferred embodiment described above. Any structural change made in light of the invention, and any technical solution identical or similar to that of the invention, falls within the scope of protection of the invention.
Claims (6)
1. A program computation cost estimation technique based on high-speed simulation, characterized in that its principle is as follows: the program is first analyzed, the estimated delays are then annotated, and the simulated program is executed natively, thereby collecting performance information about the software; the technique does not execute the target program interpretively, but converts the target program into a natively executable file, which yields higher simulation efficiency; the annotated delays are evaluated as the program runs on the development host, producing the performance estimate.
2. The program computation cost estimation technique based on high-speed simulation according to claim 1, characterized in that the technique comprises the following steps:
Step 1: program segment partitioning;
Step 2: code delay estimation;
Step 3: building an independent simulation model;
Step 4: program instrumentation.
3. The program computation cost estimation technique based on high-speed simulation according to claim 2, characterized in that: in step 1, the instruction sequence of the program is divided into basic blocks, and instruction-level estimation operates on the instructions within a basic block; a basic block is a special program unit with exactly one entry and one exit, whose instructions execute sequentially; once the entry instruction of a basic block is executed, all instructions in the block execute once, in order, until the exit instruction is executed; instructions inside a basic block are usually simple instructions whose execution time is either one clock cycle or the cycle count explicitly stated in the processor documentation; because the block contains no branch instructions, there is no branch overhead inside it.
4. The program computation cost estimation technique based on high-speed simulation according to claim 2, characterized in that: in step 2, each basic block is treated as one estimation unit, and its execution time consists mainly of three parts: the execution time of the code in the basic block, the branch delay incurred on entering the basic block, and the delay caused by pipeline interlocks inside the basic block.
5. The program computation cost estimation technique based on high-speed simulation according to claim 2, characterized in that: in step 3, the simulation model is represented as the timed transaction-level model (TLM/T) provided by SystemC; a software TLM/T model is essentially a behavioral-level description of the program, annotated with estimated execution times; in SystemC, the model is implemented as an SC_MODULE with one SC_THREAD that executes the main program; the expected execution delay of the program on the target processor is represented explicitly in the model; based on the software TLM/T model, the technique annotates execution delays with consume().
6. The program computation cost estimation technique based on high-speed simulation according to claim 2, characterized in that: in step 4, ProgPart(P) is first used to instrument the whole program, and it initializes the program performance evaluation environment; then, based on the basic block partitioning, analysis, and performance estimates described above, a call to consume() is inserted at the head of each basic block.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410853868.6A CN104516770A (en) | 2014-12-31 | 2014-12-31 | Program calculation cost estimation technology based on high speed simulation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410853868.6A CN104516770A (en) | 2014-12-31 | 2014-12-31 | Program calculation cost estimation technology based on high speed simulation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104516770A true CN104516770A (en) | 2015-04-15 |
Family
ID=52792129
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410853868.6A Pending CN104516770A (en) | 2014-12-31 | 2014-12-31 | Program calculation cost estimation technology based on high speed simulation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104516770A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20150415 |