CN103677965A

CN103677965A - All-digital high-speed simulation technique

Info

Publication number: CN103677965A
Application number: CN201410003561.7A
Authority: CN
Inventors: 任永青; 魏明; 王金龙
Original assignee: BEIJING SHENZHOU AEROSPACE SOFTWARE TECHNOLOGY Co Ltd
Current assignee: Beijing Shenzhou Aerospace Software Technology Co.,Ltd.
Priority date: 2014-01-03
Filing date: 2014-01-03
Publication date: 2014-03-26
Anticipated expiration: 2034-01-03
Also published as: CN103677965B

Abstract

The invention discloses an all-digital high-speed simulation technique comprising the following technical parts: 1) a reverse-order pipeline technique; 2) a register backup technique; 3) a memory-system allocate-on-write technique; 4) a functional-unit register latency buffer technique; 5) an instruction decode cache technique; and 6) a technique for separating cache data simulation from latency simulation. Applied in combination, these techniques achieve all-digital high-speed simulation and support embedded software development, debugging, and testing in an all-digital high-speed simulation environment, which significantly improves research and development efficiency, lowers cost, improves quality, and shortens time to market.

Description

An all-digital high-speed simulation technique

Technical field

The invention belongs to the field of computer simulation and, specifically, relates to an all-digital high-speed simulation technique.

Background technology

The C67X series DSP is a high-performance digital signal processing chip widely used in embedded systems. It has a 16-stage pipeline and an 8-issue very long instruction word (VLIW) architecture, and its high throughput, low latency, and high stability have made it the first choice for many system designs. Because the instruction set and pipeline structure of this DSP are highly complex, however, all-digital simulation faces many challenges, and all-digital high-speed simulation in particular is the basis for making such simulation usable. At present the only all-digital simulator for the C67X DSP is the simulation software from TI, a foreign company; it is a trade secret and its simulation performance is very low, so it cannot meet application needs.

Summary of the invention

To solve the problems described in the background, the invention provides an all-digital high-speed simulation technique. Applied in combination, its parts achieve all-digital high-speed simulation and support embedded software development, debugging, and testing in an all-digital simulation environment, which significantly improves research and development efficiency, reduces cost, improves quality, and shortens time to market.

The technical solution of the invention is:

An all-digital high-speed simulation technique, characterized in that it comprises the following technical parts: 1) a reverse-order pipeline technique; 2) a register backup technique; 3) a memory-system allocate-on-write technique; 4) a functional-unit register latency buffer technique; 5) an instruction decode cache technique; 6) a technique for separating cache data simulation from latency simulation.

In the reverse-order pipeline technique, simulating each stage of the virtual machine pipeline requires storing the instruction currently being simulated; simulating the stages in reverse order effectively reduces the program's memory consumption.

In the register backup technique, a register backup operation is performed before the pipeline is simulated and a register restore operation is performed afterwards.

The memory-system allocate-on-write technique allocates space on use: for each segment of the address space, no space is allocated unless the user program accesses it, so the allocation of storage space is completed while the program runs.

The functional units in the functional-unit register latency buffer technique include arithmetic/logic operations, floating-point operations, and memory access. They comprise at most 10 pipeline stages, which would require 10 function calls under the reverse-order pipeline technique.

In the instruction decode cache technique, the virtual core allocates a region of memory dedicated to storing the decode information of loop instructions. When the address of the next instruction packet matches an instruction packet address in the instruction decode cache, the first six pipeline stages need not perform any operation, and execution uses the decode information of each instruction held in the instruction decode cache.

In the technique for separating cache data simulation from latency simulation, the processor core, when performing a data access, can store data directly through a fast path to the storage space, which guarantees that the program runs correctly. When system-level accurate simulation is performed and performance information such as timing is needed, memory-hierarchy latency simulation can optionally be added.

With the technical solution above, and compared with the prior art, the invention achieves all-digital high-speed simulation through combined application of its parts and supports embedded software development, debugging, and testing in an all-digital simulation environment, which significantly improves research and development efficiency, reduces cost, improves quality, and shortens time to market.

Embodiment

An all-digital high-speed simulation technique comprises the following technical parts: 1) a reverse-order pipeline technique; 2) a register backup technique; 3) a memory-system allocate-on-write technique; 4) a functional-unit register latency buffer technique; 5) an instruction decode cache technique; 6) a technique for separating cache data simulation from latency simulation.

Technique 1: reverse-order pipeline

Simulating each stage of the C67XX virtual machine pipeline requires storing the instruction currently being simulated, and simulating the stages in reverse order effectively reduces the program's memory consumption. On real hardware all pipeline stages execute simultaneously within the same clock cycle, so whether the stages are simulated in order or in reverse order makes no difference to the result. Under reverse-order simulation, suppose that in the previous clock cycle the E1 stage executed instruction N while DC simulated instruction N+1. In the current clock cycle E1 runs before DC, and DC still holds instruction N+1, so E1 only needs to copy the instruction from the DC stage, overwriting the original instruction N. Under in-order simulation (on the right), DC runs before E1, so DC must store not only the instruction it is simulating but also instruction N+1 for the E1 stage to simulate. Hence, in the pipeline simulation stage, reverse-order simulation saves nearly 50% of the memory space compared with in-order simulation.

Reverse-order pipeline simulation also better guarantees the correct execution of branch instructions. Consider each stage under reverse-order and in-order simulation at the moment the E1 stage is simulating branch instruction N. In the reverse-order simulation (on the left), PS is simulating instruction N+5, and the branch instruction in E1 can directly affect the simulation of the PG stage within the same clock cycle. The branch therefore has 5 delay slots, which matches the execution of C67X branch instructions.
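
The reverse-order stage update and the resulting delay-slot count can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function `run`, the `program` dictionary format, and the simplification to one address per stage (with E2 to E10 omitted) are assumptions made for the example.

```python
# Illustrative sketch only: one address per stage, E2..E10 omitted.
STAGES = ["PG", "PS", "PW", "PR", "DP", "DC", "E1"]
E1 = len(STAGES) - 1


def run(program, cycles):
    """Simulate `cycles` clocks; return the addresses executed in E1, in order.

    `program` maps an address to ("b", target) for a branch; any other
    address is treated as a non-branch instruction (hypothetical format).
    """
    latch = [None] * len(STAGES)   # one slot per stage, no second copy
    pc = 0
    executed = []
    for _ in range(cycles):
        # Visit stages from E1 back to PS: each copies its predecessor's
        # latch, which has not yet been overwritten in this cycle.
        for i in range(E1, 0, -1):
            latch[i] = latch[i - 1]
            if i == E1 and latch[i] is not None:
                executed.append(latch[i])
                op = program.get(latch[i])
                if op is not None and op[0] == "b":
                    pc = op[1]     # visible to PG later in this same cycle
        latch[0] = pc              # PG runs last and generates the address
        pc += 1
    return executed
```

With a branch at address 2, `run({2: ("b", 100)}, 16)` executes addresses 3 to 7 after the branch, that is, five delay slots, before reaching the target 100. An in-order loop would need a second copy of every latch to obtain the same result.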

Technique 2: register backup

Register backup and register restore operations are performed before and after simulating the 16-stage pipeline, respectively. This is because the C67X DSP uses 8-issue VLIW: up to 8 instructions can execute in each clock cycle, and several parallel instructions may read and write the same register. The software simulation platform, however, is written in a high-level language and can only simulate instructions one at a time, so parallel register reads and writes by several instructions cannot be simulated directly and register values may be read or written incorrectly.

The C67X simulation platform therefore uses register backup. Before the instructions of each VLIW are executed, the registers are backed up, so that the system maintains two register files. While the parallel instructions execute, reads come from one file and writes go to the other, which guarantees that no read is affected by an overwrite and preserves the semantic correctness of program execution.
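
A minimal sketch of the two-register-file idea, assuming a hypothetical instruction format `(dst, op, src_a, src_b)` and a hypothetical helper `execute_packet`; neither comes from the patent.

```python
import operator


def execute_packet(regs, packet):
    """Execute one VLIW packet serially while preserving parallel semantics.

    `packet` is a list of (dst, op, src_a, src_b) tuples (assumed format).
    """
    backup = list(regs)                  # register backup: all reads use this copy
    for dst, op, src_a, src_b in packet:
        regs[dst] = op(backup[src_a], backup[src_b])   # writes go to the live file
    return regs


# Two parallel instructions that swap registers 1 and 2 (register 0 holds 0).
regs = execute_packet([0, 5, 7], [
    (1, operator.add, 2, 0),   # r1 = r2 + r0
    (2, operator.add, 1, 0),   # r2 = r1 + r0, must see the old r1
])
```

Without the backup copy, the second instruction would read the value just written by the first, and both registers would end up holding 7.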

Technique 3: memory-system allocate-on-write

The C67X DSP supports a 32-bit address space and therefore needs at most 4 GB of storage. The simulation environment cannot set aside 4 GB for every user program during simulation, nor is that necessary. The C67X simulation platform allocates space on use: for each segment of the address space, no space is allocated unless the user program accesses it, so the allocation of storage space is completed while the program runs. As for the management mechanism, when the simulation platform is initialized the storage space consists only of a series of null memory-block pointers. When a memory read or write actually reaches a given block, space is allocated for that block pointer and the user program then proceeds with the operation, which avoids wasting space on up-front allocation.
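
The block-pointer scheme can be sketched as below. The class name `LazyMemory`, its methods, and the 64 KiB block size are assumptions chosen for illustration; the patent does not state a block size.

```python
BLOCK_BITS = 16                          # assumed block size: 64 KiB
BLOCK_SIZE = 1 << BLOCK_BITS
NUM_BLOCKS = 1 << (32 - BLOCK_BITS)      # blocks covering a 32-bit address space


class LazyMemory:
    """32-bit address space backed by blocks allocated on first access."""

    def __init__(self):
        # At initialization the space is only a table of null block pointers.
        self.blocks = [None] * NUM_BLOCKS

    def _block(self, addr):
        index = (addr & 0xFFFFFFFF) >> BLOCK_BITS
        if self.blocks[index] is None:
            self.blocks[index] = bytearray(BLOCK_SIZE)   # allocate on first use
        return self.blocks[index]

    def write8(self, addr, value):
        self._block(addr)[addr & (BLOCK_SIZE - 1)] = value & 0xFF

    def read8(self, addr):
        return self._block(addr)[addr & (BLOCK_SIZE - 1)]

    def allocated_blocks(self):
        return sum(block is not None for block in self.blocks)
```

A program that touches only a few regions of the 4 GB space pays only for the blocks it actually reaches.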

Technique 4: functional-unit register latency buffer

The functional units, which include arithmetic/logic operations, floating-point operations, and memory access, comprise at most 10 pipeline stages, so the reverse-order pipeline technique needs 10 function calls. Only a few instructions, however, use five or more execute stages, so given the serial nature of a high-level language this incurs a large overhead of useless function calls. Moreover, when instructions are simulated by interpretation, each instruction is simulated by one function; for a multi-cycle instruction that function would have to be split and distributed over several pipeline-stage simulation functions.

In view of this analysis, the implementation completes the simulated execution of an instruction in Pipeline_e1, while E2 to E10 only delay the simulated instruction so that the timing remains cycle-accurate. This can be realized with a single delay buffer, Register_Latency_Buffer, which buffers the results computed in Pipeline_e1 and, according to the instruction latencies defined by the DSP C67XX microarchitecture, writes each result back to the register file after the specified number of clock cycles.

The functional-unit latency buffer effectively reduces function calls, concentrates the instruction simulation process, increases the utilization of the on-chip storage system, and improves simulation efficiency.
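
A minimal sketch of such a delay buffer follows. Only the name Register_Latency_Buffer appears in the text; the class layout, the `push` and `tick` methods, and the latency value used in the example are assumptions for illustration and do not represent actual C67XX instruction latencies.

```python
class RegisterLatencyBuffer:
    """Holds results computed in E1 until their write-back cycle arrives."""

    def __init__(self):
        self.pending = []                # entries: [cycles_left, dst, value]

    def push(self, dst, value, latency):
        """Queue a result to be written back after `latency` clock ticks."""
        self.pending.append([latency, dst, value])

    def tick(self, regs):
        """Advance one clock and write back every result that is now due."""
        still_waiting = []
        for entry in self.pending:
            entry[0] -= 1
            if entry[0] <= 0:
                regs[entry[1]] = entry[2]
            else:
                still_waiting.append(entry)
        self.pending = still_waiting


regs = [0, 0, 0, 0]
buf = RegisterLatencyBuffer()
buf.push(dst=1, value=42, latency=3)     # hypothetical 3-cycle result
history = []
for _ in range(3):
    buf.tick(regs)
    history.append(regs[1])
```

One `tick` call per clock replaces the separate stage functions for E2 to E10, while the result still becomes visible in the register file at the intended cycle.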

Technique 5: instruction decode cache

The DSP C67XX processor pipeline is divided into 16 stages in total: instruction packet address generation (PG), instruction packet address send (PS), instruction packet wait (PW), instruction packet receive (PR), instruction dispatch (DP), instruction decode (DC), and ten execute stages (E1 to E10).

Every instruction on the virtual core must be simulated through the first six stages, and through the remaining ten stages as its characteristics require. When a loop iterates many times, the first six stages are therefore simulated repeatedly. These operations need not be carried out in full every time: instruction fetch and decode are really needed only once, namely on the first iteration of the loop, which yields the fetched instruction and its decode information.

For the instruction decode cache, the virtual core allocates a region of memory dedicated to storing the decode information of these loop instructions. When the address of the next instruction packet matches an instruction packet address in the instruction decode cache, the first six pipeline stages need not perform any operation, and the E1 stage uses the decode information of each instruction held in the instruction decode cache when simulating execution.
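
The lookup can be sketched as follows, assuming a cache keyed by instruction packet address. The class `DecodeCache`, the `fetch` and `decode` callables, and the counter `front_end_runs` are hypothetical names introduced for the example.

```python
class DecodeCache:
    """Caches decode information by instruction packet address."""

    def __init__(self, fetch, decode):
        self.fetch = fetch               # stands in for the fetch stages
        self.decode = decode             # stands in for dispatch and decode
        self.cache = {}
        self.front_end_runs = 0          # how often the first six stages ran

    def get(self, addr):
        if addr not in self.cache:
            # Miss: simulate the front end once and remember the result.
            self.front_end_runs += 1
            self.cache[addr] = self.decode(self.fetch(addr))
        return self.cache[addr]          # hit: the front end does no work


program_memory = {0: 0x11, 4: 0x22}      # toy two-packet loop body
dc = DecodeCache(fetch=program_memory.__getitem__,
                 decode=lambda word: ("op", word))
for _ in range(100):                     # 100 loop iterations
    for addr in (0, 4):
        decoded = dc.get(addr)
```

After 100 iterations of the two-packet loop, the front end has been simulated only twice; every other iteration is served from the cache.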

Technique 6: separating cache data simulation from latency simulation

According to the von Neumann architecture, the storage space holds all computation state, including instructions and data, and in theory the processor core can execute a program purely through memory accesses. Semiconductor development, however, has produced a huge gap between the speed of digital logic and that of storage cells, so memory systems have grown ever more complex. To the program, though, both the hardware cache hierarchy and virtual memory management are transparent; the only thing the program itself sees is the address space.

The cache hierarchy significantly affects only the performance of the hardware memory-system design; for a software simulator, its main use is to provide latency information. In the actual design, therefore, the original function of the memory system, data storage, is handled as one independent module, and performance simulation of the memory hierarchy is handled as another independent module.

When the processor core performs a data access, it can store data directly through a fast path to the storage space, which guarantees that the program runs correctly. When system-level accurate simulation is performed and performance information such as timing is needed, memory-hierarchy latency simulation can optionally be added. Separating function from performance simulation in this way gives the virtual core greater flexibility in use.
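
A minimal sketch of this separation, assuming a dictionary as the functional store and an optional tag-only, direct-mapped timing model. The class names, the cache geometry, and the hit and miss latencies are illustrative assumptions, not C67X parameters.

```python
class TagOnlyCache:
    """Timing model only: tracks tags and returns a latency, stores no data."""

    def __init__(self, lines=4, line_bytes=32, hit=1, miss=10):
        self.tags = [None] * lines
        self.line_bytes, self.hit, self.miss = line_bytes, hit, miss

    def access(self, addr):
        line = addr // self.line_bytes
        index, tag = line % len(self.tags), line // len(self.tags)
        if self.tags[index] == tag:
            return self.hit
        self.tags[index] = tag
        return self.miss


class MemorySystem:
    """Functional storage on a fast path, with an optional timing module."""

    def __init__(self, timing=None):
        self.data = {}                   # functional module: values only
        self.timing = timing             # performance module, may be None
        self.cycles = 0

    def store(self, addr, value):
        self.data[addr] = value
        if self.timing is not None:
            self.cycles += self.timing.access(addr)

    def load(self, addr):
        if self.timing is not None:
            self.cycles += self.timing.access(addr)
        return self.data.get(addr, 0)


fast = MemorySystem()                    # correctness only
timed = MemorySystem(TagOnlyCache())     # same values, plus cycle counts
for mem in (fast, timed):
    mem.store(0x100, 7)
results = [fast.load(0x100), timed.load(0x100)]
```

Both configurations return the same data; only the one with the timing module attached accumulates latency, so the cost of the memory-hierarchy model is paid only when timing information is wanted.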

The invention is not limited to the preferred embodiment described above. Anyone may arrive at variations in light of the invention, and any technical solution identical or similar to that of the invention falls within its scope of protection.

Claims

1. An all-digital high-speed simulation technique, characterized in that it comprises the following technical parts:

1) a reverse-order pipeline technique; 2) a register backup technique; 3) a memory-system allocate-on-write technique; 4) a functional-unit register latency buffer technique; 5) an instruction decode cache technique; 6) a technique for separating cache data simulation from latency simulation.

2. The all-digital high-speed simulation technique as claimed in claim 1, characterized in that: in the reverse-order pipeline technique, simulating each stage of the virtual machine pipeline requires storing the instruction currently being simulated, and simulating the stages in reverse order effectively reduces the program's memory consumption.

3. The all-digital high-speed simulation technique as claimed in claim 1, characterized in that: in the register backup technique, a register backup operation is performed before the pipeline is simulated and a register restore operation is performed afterwards.

4. The all-digital high-speed simulation technique as claimed in claim 1, characterized in that: the memory-system allocate-on-write technique allocates space on use, that is, for each segment of the address space no space is allocated unless the user program accesses it, and the allocation of storage space is completed while the program runs.

5. The all-digital high-speed simulation technique as claimed in claim 1, characterized in that: the functional units in the functional-unit register latency buffer technique include arithmetic/logic operations, floating-point operations, and memory access, and comprise at most 10 pipeline stages, which would require 10 function calls under the reverse-order pipeline technique.

6. The all-digital high-speed simulation technique as claimed in claim 1, characterized in that: in the instruction decode cache technique, the virtual core allocates a region of memory dedicated to storing the decode information of loop instructions; when the address of the next instruction packet matches an instruction packet address in the instruction decode cache, the first six pipeline stages need not perform any operation, and execution uses the decode information of each instruction held in the instruction decode cache.

7. The all-digital high-speed simulation technique as claimed in claim 1, characterized in that: in the technique for separating cache data simulation from latency simulation, the processor core, when performing a data access, can store data directly through a fast path to the storage space, which guarantees that the program runs correctly; when system-level accurate simulation is performed and performance information such as timing is needed, memory-hierarchy latency simulation can optionally be added.