CN104516770A - Program calculation cost estimation technology based on high speed simulation - Google Patents


Info

Publication number
CN104516770A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410853868.6A
Other languages
Chinese (zh)
Inventor
李尚杰
程胜
周志军
魏明
卓保特
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING SHENZHOU AEROSPACE SOFTWARE TECHNOLOGY Co Ltd
Original Assignee
BEIJING SHENZHOU AEROSPACE SOFTWARE TECHNOLOGY Co Ltd


Abstract

The invention discloses a program computation cost estimation technique based on high-speed simulation. Its principle is as follows: the program is first analyzed, the estimated delays are then annotated, and the annotated program is executed natively to collect software performance information. The target program is not executed by interpretation; instead it is converted into a natively executable file, which yields higher simulation efficiency. The annotated delays are evaluated while the program runs on the development host, producing the performance estimate. By identifying and analyzing the most critical factor that limits performance in instruction set simulation, the invention proposes a hybrid simulation technique that combines virtualization and simulation and increases simulation speed, so that the speed and accuracy required for multi-core SoC design space exploration can be met.

Description

A program computation cost estimation technique based on high-speed simulation
Technical field
The invention relates to a program computation cost estimation technique based on high-speed simulation.
Background
Multi-core SoCs are often composed of multiple heterogeneous processing units. The key to evaluating a multi-core SoC design is to assess the performance of a given application on a multi-core SoC platform with a particular configuration. However, the architecture of a multi-core SoC has a considerable impact on the performance of the whole system: a piece of code that achieves high performance on one processor will not necessarily achieve the same result on other processors. An instruction set simulator is therefore needed to simulate the application.
Instruction set simulation, however, requires building a simulator that describes the target architecture in detail, and the simulator executes every target instruction by interpretation. This makes simulation very slow and makes it difficult to meet the needs of multi-core SoC design space exploration, in which as many design alternatives as possible are simulated so that they can be compared and the design optimized.
Summary of the invention
The technical problem to be solved by the invention is to overcome the above drawback by providing a program computation cost estimation technique based on high-speed simulation: a hybrid simulation technique that combines virtualization and simulation, increases simulation speed, and can meet the speed and accuracy requirements of multi-core SoC design space exploration.
To solve this problem, the invention adopts the following technical solution:
A program computation cost estimation technique based on high-speed simulation, characterized in that its principle is as follows: the program is first analyzed, the estimated delays are then annotated, and the annotated program is executed natively to collect software performance information. The technique does not execute the target program by interpretation; instead it converts the target program into a natively executable file, so higher simulation efficiency is obtained. The annotated delays are evaluated while the program runs on the development host, producing the performance estimate.
As a preferred technical solution, the technique comprises the following steps: step 1, program segment partitioning; step 2, code delay estimation; step 3, building an independent simulation model; step 4, program instrumentation.
As a preferred technical solution, in step 1 the instruction sequence of the program is divided into basic blocks, and the instructions inside a basic block are the units on which instruction-level estimation is performed. A basic block is a special program unit: it has exactly one entry and one exit, and the instructions inside it execute sequentially. Once the entry instruction of a basic block is executed, every instruction in the block is executed exactly once, in order, until the exit instruction is executed. The instructions inside a basic block are usually simple instructions whose execution time is either one clock cycle or a number of clock cycles stated explicitly in the processor documentation. Because the block contains no branch instructions, there is no branch overhead inside it.
As a preferred technical solution, in step 2 each basic block is treated as one estimation unit, and its execution time consists mainly of three parts: the execution time of the code in the basic block, the branch delay incurred on entering the basic block, and the delay caused by pipeline interlocks inside the basic block.
As a preferred technical solution, in step 3 the simulation model is represented as a transaction-level model with time (TLM/T) as provided by SystemC. A software TLM/T model is essentially a behavioral description of the program annotated with estimated execution times. In SystemC the model is implemented as an SC_MODULE with an SC_THREAD that executes the main program, and the expected execution delay of the program on the target processor is represented explicitly in the model. The technique is based on the software TLM/T model and uses consume() to annotate execution delays.
As a preferred technical solution, in step 4 the whole program is first instrumented using ProgPart (P), which initializes the program performance evaluation environment; then, based on the basic block partitioning, analysis, and performance estimation described above, a call to consume() is inserted at the head of each basic block.
With the above technical solution, and in contrast to the prior art, the invention addresses the shortcoming that instruction set simulation is too slow to support multi-core SoC design space exploration. By identifying and analyzing the most critical factor that limits performance in instruction set simulation, it proposes a hybrid simulation technique that combines virtualization and simulation, increases simulation speed, and can meet the speed and accuracy requirements of multi-core SoC design space exploration.
Detailed description
Embodiment:
The invention uses compile-time code annotation to provide a program computation cost estimation technique based on high-speed simulation that improves multi-core SoC simulation efficiency. The basic principle is as follows: the program is first analyzed, the estimated delays are then annotated, and the annotated program is executed natively to collect software performance information. Because this simulation mode does not interpret the target program but converts it into a natively executable file, higher simulation efficiency is obtained. The annotated timing delays are evaluated while the program runs on the development host, producing the performance estimate.
The technique requires the program to be instrumented so that the number of times its instructions are executed can be obtained and its execution time estimated from those counts. A description of the program in terms of information such as instruction execution times and delays is called a simulation model; compiled-code simulation therefore needs an independent simulation model. Because the simulation model carries the estimated execution times, executing the model simulates the behavior of the software without an instruction set simulator. And because the simulation model runs directly on the development host, simulation is very fast. Compared with an instruction set simulator, however, compiled-code simulation is somewhat less accurate in estimating program performance, because it does not account for specific compiler optimizations or memory accesses, both of which have a large impact on performance.
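By way of a non-limiting illustration, the relationship between execution counts and estimated execution time can be sketched as follows. The block names, per-block delays, and execution trace are hypothetical values chosen only for the example; they do not come from the specification.

```python
from collections import Counter


def estimate_cycles(block_delays, executed_blocks):
    """Estimate total cycles as (execution count x estimated delay) per block."""
    counts = Counter(executed_blocks)
    return sum(block_delays[name] * n for name, n in counts.items())


# Hypothetical estimated delay of each basic block, in clock cycles.
block_delays = {"entry": 5, "loop_body": 8, "exit": 3}

# Hypothetical trace recorded by the instrumented program on the host.
trace = ["entry"] + ["loop_body"] * 10 + ["exit"]

total = estimate_cycles(block_delays, trace)
print(total)
```

The estimate depends only on how often each block runs and on its annotated delay, which is why no instruction set simulator is needed at run time.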
The technique comprises the following steps: step 1, program segment partitioning; step 2, code delay estimation; step 3, building an independent simulation model; step 4, program instrumentation.
Step 1, program segment partitioning.
The instruction sequence of the program is divided into basic blocks, and the instructions inside a basic block are the units on which instruction-level estimation is performed. A basic block is a special program unit: it has exactly one entry and one exit, and the instructions inside it execute sequentially. Once the entry instruction of a basic block is executed, every instruction in the block is executed exactly once, in order, until the exit instruction is executed. The instructions inside a basic block are usually simple instructions whose execution time is either one clock cycle or a number of clock cycles stated explicitly in the processor documentation. Because the block contains no branch instructions, there is no branch overhead inside it. In a multi-core SoC, a basic block also contains no inter-program communication or interaction functions.
The invention uses the GCC compiler front end to convert C source code into an intermediate representation, from which the program's basic blocks are generated.
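By way of a non-limiting illustration, basic block partitioning can be sketched with the classic "leader" method. This sketch does not use GCC or its intermediate representation; it operates on a hypothetical instruction list in which each entry is an (opcode, branch target index or None) pair, and it assumes a branch closes the block it appears in.

```python
def split_basic_blocks(instrs):
    """Partition instructions into basic blocks (lists of instruction indices).

    instrs: list of (opcode, branch_target_index_or_None) pairs.
    A leader is the first instruction, any branch target, and any
    instruction that immediately follows a branch.
    """
    if not instrs:
        return []
    leaders = {0}
    for idx, (_, target) in enumerate(instrs):
        if target is not None:
            leaders.add(target)
            if idx + 1 < len(instrs):
                leaders.add(idx + 1)
    ordered = sorted(leaders)
    bounds = ordered + [len(instrs)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(len(ordered))]


# Hypothetical program: a conditional branch at index 2 and a jump at index 4.
program = [
    ("mov", None),  # 0
    ("cmp", None),  # 1
    ("beq", 5),     # 2: branch to index 5
    ("add", None),  # 3
    ("jmp", 6),     # 4: jump to index 6
    ("sub", None),  # 5
    ("ret", None),  # 6
]

blocks = split_basic_blocks(program)
print(blocks)
```

Each resulting block has a single entry (its leader) and a single exit (its last instruction), which matches the definition given above.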
Step 2, code delay estimation.
In the invention, each basic block is treated as one estimation unit, and its execution time consists mainly of three parts: the execution time of the code in the basic block, the branch delay incurred on entering the basic block, and the delay caused by pipeline interlocks inside the basic block.
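As a non-limiting reading of this three-part decomposition, and assuming the three components simply add, the estimated execution time of a basic block B may be written as shown below. The symbol for the total, $\mathrm{Delay}_B$, is introduced here for illustration only; the three terms on the right correspond to the instruction time, branch overhead, and pipeline interlock delay defined in parts a), b), and c) of this step.

```latex
\mathrm{Delay}_B = \mathrm{DelayI}_B + \mathrm{DelayB}_B + \mathrm{DelayP}_B
```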
a) Instruction time estimation.
The performance estimate of an instruction is based on its execution cycles under ideal execution conditions, commonly called cycles per instruction (CPI). Compilation techniques already use this kind of information, and the data flow and control flow are fixed according to instruction-level information. In a fixed, in-order pipeline, most instructions of an embedded processor take only one clock cycle. A small number of special instructions may take several clock cycles, but such instructions are clearly identified and described in the processor documentation. Branch instructions and pipeline interlock delays must be considered separately. On this basis, the execution time of the selected instructions can be estimated accurately.
Accordingly, the instruction-time estimate of a basic block is the sum of the delays of all its instructions, namely
$\mathrm{DelayI}_B = \sum_{i \in B} d_i$
Here B is a basic block, i is an instruction in basic block B, and $d_i$ is the execution delay of instruction i.
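By way of a non-limiting illustration, the sum above can be computed from a per-opcode cycle table. The opcodes and cycle counts below are hypothetical; in practice they would be taken from the target processor documentation.

```python
# Hypothetical cycle table: most instructions take one cycle,
# a few special instructions take more.
CYCLES = {"add": 1, "sub": 1, "mov": 1, "mul": 3, "ldr": 2}


def instruction_delay(block, cycles=CYCLES, default=1):
    """Return the sum of d_i over all instructions i in the basic block."""
    return sum(cycles.get(opcode, default) for opcode in block)


block = ["mov", "ldr", "mul", "add"]
delay_i = instruction_delay(block)
print(delay_i)
```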
b) Branch overhead calculation.
The overhead of a branch instruction arises because the outcome of the branch condition may invalidate the prefetched instructions, which must then be fetched again; a branch also interrupts the instruction pipeline. Compilers usually perform branch prediction based on program execution characteristics to reduce the overhead of branch redirection. For example, basic blocks can be laid out according to the execution frequency of the control-flow paths, so that frequently executed basic blocks immediately follow the branch statement and instruction prefetch remains valid most of the time.
Branch overhead must also be annotated on the control flow. Tests on different processor pipelines show the branch overhead of an embedded processor to be two clock cycles.
$\mathrm{DelayB}_B = 2$
c) Pipeline interlock delay calculation.
Pipeline interlocks are caused by data dependences in the data flow. When the output of one instruction is an input (operand) of the next, the result of the first instruction must be available before the second instruction executes. This constraint restricts the ordering of instructions and stalls the pipeline. A pipeline interlock has two preconditions: (1) there is a data dependence between two instructions; (2) the instructions are consecutive, or the distance between them is short enough.
Because one precondition of a pipeline interlock is that the dependent instructions are close together, a global dependence analysis is unnecessary; it suffices to analyze the data dependences within a short section of code. Using an instruction window mechanism, data dependences can be analyzed within and between basic blocks, and the extra cycles introduced by pipeline interlocks can then be estimated.
$\mathrm{DelayP}_B = \sum_{w \in 2^B \,\wedge\, |w| = 2 \,\wedge\, \mathrm{dep}(w) = \mathrm{true}} d_w$
Here the instruction window size is 2; if a pipeline interlock exists for a window w, the pipeline delay it causes is $d_w$. $d_w$ is usually determined by the number of pipeline stages and is either 2 cycles or 1 cycle.
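By way of a non-limiting illustration, the window-based interlock estimate can be sketched as follows. The sketch assumes that a window consists of two adjacent instructions, that dep(w) holds when the second instruction reads the register written by the first, and that every interlock costs the same number of stall cycles. The instruction encoding (destination register, source registers) is hypothetical, and only dependences inside one block are examined.

```python
def interlock_delay(block, stall_cycles=1):
    """Sum d_w over all adjacent instruction pairs w with a data dependence.

    block: list of (dest_register, source_registers) pairs.
    """
    total = 0
    for (dest, _), (_, sources) in zip(block, block[1:]):
        # dep(w) is true when the next instruction reads what this one writes.
        if dest is not None and dest in sources:
            total += stall_cycles
    return total


# Hypothetical basic block of four register-to-register instructions.
block = [
    ("r1", ("r2", "r3")),
    ("r4", ("r1", "r5")),  # reads r1 written by the previous instruction
    ("r6", ("r7",)),
    ("r8", ("r6", "r4")),  # reads r6 written by the previous instruction
]

delay_p = interlock_delay(block, stall_cycles=2)
print(delay_p)
```

Restricting the analysis to a sliding window keeps the cost linear in the block length, which reflects the observation above that a global dependence analysis is unnecessary.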
Algorithm 1:
Step 3, building an independent simulation model.
Building and representing the simulation model is the key to this technique. At present, the most suitable representation is the transaction-level model with time (TLM/T) provided by SystemC. A software TLM/T model is essentially a behavioral description of the program annotated with estimated execution times. In SystemC the model is implemented as an SC_MODULE with an SC_THREAD that executes the main program. The expected execution delay of the program on the target processor is represented explicitly in the model. The technique is based on the software TLM/T model and uses consume() to annotate execution delays.
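By way of a non-limiting illustration, the idea of a behavioral model annotated with explicit delays can be sketched without SystemC. The class below is a plain Python stand-in, not an SC_MODULE or SC_THREAD; the class name, the cycle-count argument of consume(), the clock period, and the per-block delays are all assumptions made for the example.

```python
class SoftwareTimedModel:
    """Behavioral model of a program with explicitly annotated delays."""

    def __init__(self, cycle_ns=10):
        self.cycle_ns = cycle_ns  # assumed target clock period
        self.elapsed_cycles = 0   # simulated target time so far

    def consume(self, cycles):
        """Advance simulated target time by the estimated block delay."""
        self.elapsed_cycles += cycles

    def elapsed_ns(self):
        return self.elapsed_cycles * self.cycle_ns

    def main(self, n):
        """Natively executed behavior: sum of 0..n-1, with delay annotations."""
        self.consume(4)      # hypothetical delay of the entry block
        acc = 0
        for i in range(n):
            self.consume(6)  # hypothetical delay of the loop-body block
            acc += i
        self.consume(3)      # hypothetical delay of the exit block
        return acc


model = SoftwareTimedModel()
result = model.main(5)
print(result, model.elapsed_cycles, model.elapsed_ns())
```

The functional result is computed at host speed, while the simulated target time is accumulated separately through the consume() annotations.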
Algorithm 2:
Step 4, program instrumentation.
First, the whole program is instrumented using ProgPart (P), which initializes the program performance evaluation environment. Then, based on the basic block partitioning, analysis, and performance estimation described above, a call to consume() is inserted at the head of each basic block; that is, the following instruction is inserted before each basic block: Call consume ();
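By way of a non-limiting illustration, inserting a consume() call at the head of each basic block can be sketched as a simple source-to-source rewrite. The block labels, statements, and delays below are hypothetical, as is passing the delay to consume() as an argument; the sketch does not reproduce ProgPart (P).

```python
def instrument(blocks, delays):
    """Return program lines with a consume() call at the head of each block.

    blocks: dict mapping block label -> list of statement strings.
    delays: dict mapping block label -> estimated delay in cycles.
    """
    out = []
    for label, statements in blocks.items():
        out.append(f"{label}:")
        out.append(f"    consume({delays[label]});")
        out.extend(f"    {stmt}" for stmt in statements)
    return out


# Hypothetical basic blocks of a small C-like program.
blocks = {
    "B0": ["i = 0;", "sum = 0;"],
    "B1": ["sum += i;", "i++;"],
}
delays = {"B0": 4, "B1": 6}

instrumented = instrument(blocks, delays)
print("\n".join(instrumented))
```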
The invention is not limited to the preferred embodiment described above. Any structural variation made in light of the invention, and any technical solution identical or similar to that of the invention, falls within its scope of protection.

Claims (6)

1. A program computation cost estimation technique based on high-speed simulation, characterized in that its principle is as follows: the program is first analyzed, the estimated delays are then annotated, and the annotated program is executed natively to collect software performance information; the technique does not execute the target program by interpretation but converts the target program into a natively executable file, so that higher simulation efficiency is obtained; the annotated delays are evaluated while the program runs on the development host, producing the performance estimate.
2. The program computation cost estimation technique based on high-speed simulation according to claim 1, characterized in that the technique comprises the following steps:
Step 1, program segment partitioning;
Step 2, code delay estimation;
Step 3, building an independent simulation model;
Step 4, program instrumentation.
3. The program computation cost estimation technique based on high-speed simulation according to claim 2, characterized in that: in step 1, the instruction sequence of the program is divided into basic blocks, and the instructions inside a basic block are the units on which instruction-level estimation is performed; a basic block is a special program unit that has exactly one entry and one exit, and the instructions inside it execute sequentially; once the entry instruction of a basic block is executed, every instruction in the block is executed exactly once, in order, until the exit instruction is executed; the instructions inside a basic block are usually simple instructions whose execution time is either one clock cycle or a number of clock cycles stated explicitly in the processor documentation; because the block contains no branch instructions, there is no branch overhead inside it.
4. The program computation cost estimation technique based on high-speed simulation according to claim 2, characterized in that: in step 2, each basic block is treated as one estimation unit, and its execution time consists mainly of three parts: the execution time of the code in the basic block, the branch delay incurred on entering the basic block, and the delay caused by pipeline interlocks inside the basic block.
5. The program computation cost estimation technique based on high-speed simulation according to claim 2, characterized in that: in step 3, the simulation model is represented as a transaction-level model with time (TLM/T) as provided by SystemC; a software TLM/T model is essentially a behavioral description of the program annotated with estimated execution times; in SystemC the model is implemented as an SC_MODULE with an SC_THREAD that executes the main program, and the expected execution delay of the program on the target processor is represented explicitly in the model; the technique is based on the software TLM/T model and uses consume() to annotate execution delays.
6. The program computation cost estimation technique based on high-speed simulation according to claim 2, characterized in that: in step 4, the whole program is first instrumented using ProgPart (P), which initializes the program performance evaluation environment; then, based on the basic block partitioning, analysis, and performance estimation described above, a call to consume() is inserted at the head of each basic block.
CN201410853868.6A 2014-12-31 2014-12-31 Program calculation cost estimation technology based on high speed simulation Pending CN104516770A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410853868.6A CN104516770A (en) 2014-12-31 2014-12-31 Program calculation cost estimation technology based on high speed simulation


Publications (1)

Publication Number Publication Date
CN104516770A true CN104516770A (en) 2015-04-15

Family

ID=52792129


Country Status (1)

Country Link
CN (1) CN104516770A (en)


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150415