CN103677965A - All-digital high-speed simulating technology - Google Patents
- Publication number
- CN103677965A (application number CN201410003561.7A)
- Authority
- CN
- China
- Prior art keywords
- technology
- simulation
- instruction
- technique
- register
- Legal status
- Granted
Landscapes
- Debugging And Monitoring (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The invention discloses an all-digital high-speed simulation technique comprising the following technical parts: (1) a reverse-order pipeline simulation technique; (2) a register backup technique; (3) a storage-system allocate-on-write technique; (4) a functional-unit register delay-buffer technique; (5) an instruction-decode buffering technique; and (6) a technique that separates cache data simulation from delay simulation. Applying these techniques in combination achieves all-digital high-speed simulation and meets the needs of work such as embedded software development, debugging and testing in an all-digital high-speed simulation environment, markedly improving research and development efficiency, lowering cost, improving quality and shortening time to market.
Description
Technical field
The invention belongs to the field of computer simulation technology and specifically relates to an all-digital high-speed simulation technique.
Background
DSPs such as the C67X series are high-performance digital signal processing chips widely used in embedded systems. They use a 16-stage pipeline and 8-issue very long instruction words (VLIW), and their high throughput, low latency and high stability make them the first choice for many system designs. However, because the instruction set and pipeline structure of this DSP are highly complex, all-digital simulation faces many challenges, and fast all-digital simulation in particular is the basis of its usability. At present the only all-digital simulator for the C67X DSP is the simulation software from the foreign company TI, which is a trade secret and whose simulation performance is very low, so it cannot meet application needs.
Summary of the invention
To solve the problems described in the background, the invention provides an all-digital high-speed simulation technique. By applying its parts in combination, it achieves all-digital high-speed simulation, meets the needs of work such as embedded software development, debugging and testing in an all-digital simulation environment, markedly improves research and development efficiency, reduces cost, improves quality and shortens time to market.
The technical solution of the present invention is as follows:
An all-digital high-speed simulation technique, characterized in that it comprises the following technical parts: 1) a reverse-order pipeline technique; 2) a register redundancy (backup) technique; 3) a storage-system allocate-on-write technique; 4) a functional-unit register delay-buffer technique; 5) an instruction-decode buffering technique; 6) a technique separating cache data simulation from delay simulation.
In the reverse-order pipeline technique, simulating each stage of the virtual machine's pipeline requires storing the instruction currently being simulated; simulating the stages in reverse order effectively reduces the program's memory consumption.
The register redundancy technique performs a register backup before the pipeline is simulated and a register restore afterwards.
The storage-system allocate-on-write technique allocates storage on use: no space is allocated for a segment of the address space unless the user program accesses it, so storage is allocated while the program runs.
The functional units covered by the functional-unit register delay-buffer technique, including arithmetic/logic, floating-point and memory-access operations, have at most 10 pipeline stages, so simulating them with the reverse-order pipeline technique would require 10 function calls.
In the instruction-decode buffering technique, the virtual core allocates a dedicated region of memory to store the decode information of recurring (loop) instructions. When the address of the next fetch packet matches a fetch-packet address held in the decode cache, the first six pipeline stages need not perform any operation, and at execution time the decode information of each instruction in the decode cache is used.
In the technique separating cache data simulation from delay simulation, the processor core stores and loads data directly through a fast path to the storage space when performing data accesses, which guarantees correct program execution; when accurate system-level simulation is needed to obtain performance information such as timing, memory-hierarchy delay simulation can optionally be added.
Owing to the technical solution above, compared with the prior art, the present invention applies these techniques in combination to achieve all-digital high-speed simulation, meets the needs of work such as embedded software development, debugging and testing in an all-digital simulation environment, markedly improves research and development efficiency, reduces cost, improves quality and shortens time to market.
Embodiment
An all-digital high-speed simulation technique comprises the following technical parts: 1) a reverse-order pipeline technique; 2) a register redundancy (backup) technique; 3) a storage-system allocate-on-write technique; 4) a functional-unit register delay-buffer technique; 5) an instruction-decode buffering technique; 6) a technique separating cache data simulation from delay simulation.
Technique 1: reverse-order pipeline simulation
Simulating each stage of the C67XX virtual machine pipeline requires storing the instruction currently being simulated, and simulating the stages in reverse order effectively reduces the program's memory consumption. Because all pipeline stages actually operate simultaneously within one clock cycle, simulating them in forward or reverse order makes no difference to the result. With reverse-order simulation, however, consider that in the previous clock cycle the E1 stage executed instruction N and the DC stage simulated instruction N+1. In the current cycle E1 is simulated before DC, while DC still holds instruction N+1, so E1 only needs to copy the DC stage's instruction, overwriting the old instruction N. If instead DC is simulated before E1, as in forward-order simulation (shown on the right in the original figure), DC must store not only the instruction it is now simulating but also instruction N+1 so that the E1 stage can simulate it. Hence, in pipeline simulation, reverse-order simulation can save nearly 50% of the memory space compared with forward-order simulation.
Reverse-order pipeline simulation also better guarantees correct execution of branch instructions. Consider the cycle in which the E1 stage is simulating branch instruction N, under both reverse-order and forward-order simulation. In reverse-order simulation (left side of the figure), the PS stage is simulating instruction N+5 at that moment, and the branch executed in E1 directly affects the simulation of the PG stage within the same clock cycle. The branch therefore has 5 delay slots, which matches how branches execute on the C67X.
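The memory argument above can be illustrated with a minimal Python sketch. It assumes each pipeline stage is modelled as one latch slot in a list (index 0 is PG, the last index is E1, later execute stages omitted); the stage list and function names are hypothetical and are not the platform's code. In reverse order each stage copies from its predecessor before that predecessor is overwritten, so no second buffer is needed; in forward order the previous cycle's contents must be saved first.

```python
from typing import List, Optional

# Illustrative stage order: the six front-end stages followed by E1.
STAGES = ["PG", "PS", "PW", "PR", "DP", "DC", "E1"]

def step_reverse(latches: List[Optional[int]], fetched: int) -> None:
    """Advance one clock by simulating E1 first and PG last.

    When stage i runs, stage i-1 has not been simulated yet in this cycle,
    so it still holds last cycle's instruction and can be copied directly.
    """
    for i in range(len(latches) - 1, 0, -1):
        latches[i] = latches[i - 1]
    latches[0] = fetched

def step_forward(latches: List[Optional[int]], fetched: int) -> None:
    """Advance one clock by simulating PG first and E1 last.

    Each stage overwrites its latch before the next stage reads it, so the
    previous cycle's contents must be saved in a second buffer first.
    """
    previous = list(latches)  # the extra copy that reverse order avoids
    latches[0] = fetched
    for i in range(1, len(latches)):
        latches[i] = previous[i - 1]
```

Both functions produce the same pipeline state; the difference is only the `previous` copy, which doubles the latch storage in forward order and corresponds to the roughly 50% saving described above.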
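The delay-slot effect can be checked with a small Python model. It is a sketch under simplifying assumptions: one instruction per address, a single branch at address `branch_at` that jumps to `target` and resolves in E1, and only the stages PG through E1. The function name and parameters are hypothetical.

```python
from typing import List, Optional

NUM_STAGES = 7  # PG, PS, PW, PR, DP, DC, E1 (later execute stages omitted)

def executed_addresses(branch_at: int, target: int, cycles: int) -> List[int]:
    """Return, in order, the addresses that reach E1 when stages run in reverse.

    Simplified model (an assumption): one instruction per address and a
    single branch at `branch_at` that jumps to `target` and resolves in E1.
    """
    latches: List[Optional[int]] = [None] * NUM_STAGES
    pc = 0
    executed: List[int] = []
    for _ in range(cycles):
        # Reverse order: E1 (index NUM_STAGES - 1) is simulated first.
        for i in range(NUM_STAGES - 1, 0, -1):
            latches[i] = latches[i - 1]
            instr = latches[i]
            if i == NUM_STAGES - 1 and instr is not None:
                executed.append(instr)
                if instr == branch_at:
                    pc = target  # redirect fetch within the same cycle
        # PG is simulated last, so it already fetches from the branch target.
        latches[0] = pc
        pc += 1
    return executed
```

With a branch at address 0 targeting address 100, `executed_addresses(0, 100, 13)` yields addresses 0 through 5 and then 100: exactly five sequential instructions follow the branch. If PG were simulated before E1 (forward order), one more sequential instruction would be fetched before the redirect, giving six slots instead of five.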
Technique 2: register redundancy (backup)
Register backup and register restore are performed before and after simulating the 16-stage pipeline, respectively. This is because the C67X DSP uses 8-issue VLIW: up to 8 instructions can execute in each clock cycle, and several parallel instructions may read and write the same register. A software simulation platform, however, is written in a high-level language and can only execute simulated instructions one at a time, so it cannot directly reproduce several instructions reading and writing registers in parallel, which may lead to incorrect register values.
The C67X simulation platform therefore uses register redundancy: before the instructions of each VLIW packet are executed, the registers are backed up, so the system maintains two copies of the register file. While the instructions execute, they read from one copy and write to the other, which ensures that no read observes a value overwritten within the same packet and preserves the program's execution semantics.
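A minimal Python sketch of the two-copy scheme, assuming instructions are modelled as callables that read from one register file and write to another. The `mv` helper and `execute_packet` are hypothetical illustrations, not the platform's API; the example uses two parallel moves that swap registers.

```python
from typing import Callable, List

# A simulated instruction reads from one register file and writes to another.
Instr = Callable[[List[int], List[int]], None]

def mv(dst: int, src: int) -> Instr:
    """Hypothetical MV instruction: copy register `src` into register `dst`."""
    def run(read_file: List[int], write_file: List[int]) -> None:
        write_file[dst] = read_file[src]
    return run

def execute_packet(regs: List[int], packet: List[Instr]) -> None:
    """Execute one VLIW packet so that all its instructions see pre-packet values."""
    backup = list(regs)      # register backup taken before the packet
    for instr in packet:     # the simulator still runs them one by one
        instr(backup, regs)  # read the backup copy, write the live copy
```

For `regs = [1, 2]`, executing the packet `[mv(0, 1), mv(1, 0)]` gives `[2, 1]`, a true parallel swap. Running the same two instructions sequentially against a single register file would give `[2, 2]`, which is exactly the read-after-overwrite error the backup prevents.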
Technique 3: storage-system allocate-on-write
The DSP C67X supports a 32-bit address space, which needs up to 4 GB of storage. The simulation environment cannot set aside 4 GB for every user program it simulates, and doing so is unnecessary. The C67X simulation platform therefore allocates storage on use: no space is allocated for a segment of the address space unless the user program accesses it, so storage is allocated while the program runs. In this storage management mechanism, when the simulation platform initializes it holds only a series of null pointers for its memory blocks; when an actual memory read or write reaches a block, space is allocated for that block pointer and the user program's operation then proceeds, which avoids wasting allocated space.
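The mechanism above can be sketched in Python as a table of null block pointers that are filled on first access. The 64 KB block size and the class and method names are assumptions for illustration; following the description, both reads and writes trigger allocation.

```python
from typing import List, Optional

BLOCK_BITS = 16                       # hypothetical 64 KB block size
BLOCK_SIZE = 1 << BLOCK_BITS
NUM_BLOCKS = 1 << (32 - BLOCK_BITS)   # enough block pointers to cover 4 GB

class LazyMemory:
    """32-bit byte-addressed memory whose blocks are allocated on first access."""

    def __init__(self) -> None:
        # At initialization only a table of null block pointers exists.
        self.blocks: List[Optional[bytearray]] = [None] * NUM_BLOCKS

    def _block(self, addr: int) -> bytearray:
        index = (addr & 0xFFFFFFFF) >> BLOCK_BITS
        block = self.blocks[index]
        if block is None:
            # First read or write touching this block: allocate it now, zero-filled.
            block = bytearray(BLOCK_SIZE)
            self.blocks[index] = block
        return block

    def read8(self, addr: int) -> int:
        return self._block(addr)[addr & (BLOCK_SIZE - 1)]

    def write8(self, addr: int, value: int) -> None:
        self._block(addr)[addr & (BLOCK_SIZE - 1)] = value & 0xFF

    def allocated_blocks(self) -> int:
        return sum(b is not None for b in self.blocks)
```

A program that touches only a few addresses allocates only a few blocks, however far apart those addresses are in the 4 GB space. A variant could return zero for reads of unallocated blocks without allocating them; the sketch follows the text, where any access allocates.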
Technique 4: functional-unit register delay buffer
Functional units, including arithmetic/logic, floating-point and memory-access units, have at most 10 pipeline stages. Simulating them with the reverse-order pipeline technique would require 10 function calls, yet only a few instructions occupy more than five stages; given the sequential nature of high-level languages, this would cause a large amount of useless function-call overhead. When instructions are simulated by interpretation, each instruction is simulated by one function, and the function of a multi-cycle instruction would have to be split up and distributed across several pipeline-stage simulation functions. Based on this analysis, the implementation completes the simulated execution of each instruction in Pipeline_e1, while E2 to E10 only model the instruction's delay so that the timing remains cycle-accurate. This can therefore be implemented with a delay buffer, Register_Latency_Buffer, which buffers the results computed by Pipeline_e1 and, according to the instruction latencies defined by the DSP C67XX microarchitecture, writes each result back to the register file after the specified number of clock cycles.
The functional-unit delay buffer effectively reduces function calls, keeps the simulation of each instruction in one place, improves utilization of the on-chip storage system, and improves simulation accuracy.
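A minimal Python sketch of a delay buffer in the spirit of the Register_Latency_Buffer named above. The class, the `pipeline_e1_mpy` handler and the `cycles` parameter are hypothetical; the actual per-instruction latencies come from the C67XX microarchitecture and are simply passed in here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PendingWrite:
    cycles_left: int  # clock ticks until the result reaches the register file
    reg: int          # destination register index
    value: int        # result already computed in E1

class RegisterLatencyBuffer:
    """Delays register write-back; E2 to E10 are not simulated individually."""

    def __init__(self) -> None:
        self.pending: List[PendingWrite] = []

    def push(self, reg: int, value: int, cycles: int) -> None:
        """Queue a result computed in E1 for write-back after `cycles` ticks."""
        self.pending.append(PendingWrite(cycles, reg, value))

    def tick(self, regs: List[int]) -> None:
        """Advance one clock: count down and write back every result that is due."""
        remaining: List[PendingWrite] = []
        for w in self.pending:
            w.cycles_left -= 1
            if w.cycles_left <= 0:
                regs[w.reg] = w.value
            else:
                remaining.append(w)
        self.pending = remaining

def pipeline_e1_mpy(regs: List[int], buf: RegisterLatencyBuffer,
                    dst: int, src1: int, src2: int, cycles: int) -> None:
    """Hypothetical E1 handler: compute the full result now, defer the write-back."""
    buf.push(dst, regs[src1] * regs[src2], cycles)
```

Each instruction is simulated by a single call in E1, and the only per-cycle work for the later execute stages is one `tick` over the pending writes. For example, with `regs = [0, 6, 7]` and a multiply pushed with `cycles=4`, `regs[0]` stays 0 for three ticks and becomes 42 on the fourth.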
Technique 5: instruction-decode buffering
The DSP C67XX processor pipeline is divided into 16 stages in total: computing the fetch-packet address (PG), sending the fetch-packet address (PS), waiting for the fetch packet (PW), receiving the fetch packet (PR), instruction dispatch (DP), instruction decode (DC), and ten execute stages (E1 to E10).
Every instruction in the virtual core must go through simulation of the first six stages, while the remaining ten stages are simulated according to the characteristics of the instruction. Consequently, when a loop runs many iterations, the first six stages are simulated over and over again. These operations need not be performed in full every time: fetching and decoding an instruction really only has to happen once, and the decode information obtained on the first iteration of the loop can be reused.
The instruction-decode cache is a region of memory that the virtual core allocates specifically to store the decode information of these loop instructions. When the address of the next fetch packet matches a fetch-packet address held in the decode cache, the first six pipeline stages need not perform any operation, and when the E1 stage simulates execution it uses the decode information of each instruction stored in the decode cache.
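A minimal Python sketch of a decode cache keyed by fetch-packet address. The `fetch_and_decode` callback stands in for simulating PG through DC, and the decoded representation (a list of opcode and operand tuples) is an assumption for illustration only.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical decoded form: one (opcode, operands) tuple per instruction.
DecodedPacket = List[Tuple[str, Tuple[int, ...]]]

class DecodeCache:
    """Caches decode results keyed by fetch-packet address."""

    def __init__(self, fetch_and_decode: Callable[[int], DecodedPacket]) -> None:
        self.fetch_and_decode = fetch_and_decode  # simulates PG..DC for one packet
        self.entries: Dict[int, DecodedPacket] = {}
        self.misses = 0

    def lookup(self, packet_addr: int) -> DecodedPacket:
        decoded = self.entries.get(packet_addr)
        if decoded is None:
            # Miss: run the six front-end stages once and remember the result.
            self.misses += 1
            decoded = self.fetch_and_decode(packet_addr)
            self.entries[packet_addr] = decoded
        # Hit: PG..DC are skipped entirely; E1 uses the cached decode information.
        return decoded
```

For a loop of three fetch packets (C67X fetch packets are 32 bytes, so addresses such as 0x100, 0x120 and 0x140) executed 100 times, the front end is simulated only 3 times instead of 300.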
Technique 6: separating cache data simulation from delay simulation
According to the von Neumann architecture, the storage space holds all computational state, including instructions and data, so in theory the processor core could execute a program purely through memory accesses. However, the development of semiconductors has opened a large gap between the speed of digital logic and the access speed of memory cells, so memory systems have grown increasingly complex. Yet to the program, both the hardware cache hierarchy and virtual memory management are transparent: all the program itself sees is an address space.
The cache hierarchy mainly affects the performance of the memory-system hardware design; for a software simulator its chief use is to provide latency information. Therefore, in the actual design, the original function of the memory system, data storage, is implemented as an independent module, and the performance simulation of the memory hierarchy is handled as a separate independent module.
When performing data accesses, the processor core can store and load data directly through a fast path to the storage space, which guarantees that the program runs correctly; when accurate system-level simulation is required to obtain performance information such as timing, memory-hierarchy delay simulation can optionally be added. Separating function from performance simulation in this way gives users of the virtual core greater flexibility.
The present invention is not limited to the preferred embodiments above. Anyone may derive variations under the inspiration of the present invention, and every technical solution identical or similar to that of the present invention falls within its scope of protection.
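A minimal Python sketch of the split between a functional data store and an optional timing model. All class names are hypothetical, and the direct-mapped cache with fixed hit and miss latencies is an assumed stand-in for whatever memory-hierarchy model is plugged in; it tracks tags only and never holds data values.

```python
from typing import Dict, List, Optional

class FunctionalMemory:
    """Fast path: stores data only, which is all program correctness needs."""

    def __init__(self) -> None:
        self.data: Dict[int, int] = {}

    def read(self, addr: int) -> int:
        return self.data.get(addr, 0)

    def write(self, addr: int, value: int) -> None:
        self.data[addr] = value

class DirectMappedCacheTiming:
    """Hypothetical optional timing model: tracks tags for latency, never data."""

    def __init__(self, lines: int = 64, line_bytes: int = 32,
                 hit_cycles: int = 1, miss_cycles: int = 10) -> None:
        self.tags: List[Optional[int]] = [None] * lines
        self.line_bytes = line_bytes
        self.hit_cycles = hit_cycles
        self.miss_cycles = miss_cycles

    def latency(self, addr: int) -> int:
        line = addr // self.line_bytes
        index = line % len(self.tags)
        if self.tags[index] == line:
            return self.hit_cycles
        self.tags[index] = line  # allocate the line on a miss
        return self.miss_cycles

class MemoryPort:
    """Interface used by the simulated core; the timing model is optional."""

    def __init__(self, mem: FunctionalMemory,
                 timing: Optional[DirectMappedCacheTiming] = None) -> None:
        self.mem = mem
        self.timing = timing
        self.stall_cycles = 0

    def _account(self, addr: int) -> None:
        if self.timing is not None:
            self.stall_cycles += self.timing.latency(addr)

    def load(self, addr: int) -> int:
        self._account(addr)
        return self.mem.read(addr)

    def store(self, addr: int, value: int) -> None:
        self._account(addr)
        self.mem.write(addr, value)
```

A port built without a timing model returns the same values as one built with it; only `stall_cycles` differs. With the defaults above, a store to 0x40 followed by a load from 0x40 costs one miss and one hit, 11 cycles in total, while the fast path records none.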
Claims (7)
1. An all-digital high-speed simulation technique, characterized in that it comprises the following technical parts:
1) a reverse-order pipeline technique; 2) a register redundancy technique; 3) a storage-system allocate-on-write technique; 4) a functional-unit register delay-buffer technique; 5) an instruction-decode buffering technique; 6) a technique separating cache data simulation from delay simulation.
2. The all-digital high-speed simulation technique according to claim 1, characterized in that: in the reverse-order pipeline technique, simulating each stage of the virtual machine pipeline requires storing the instruction currently being simulated, and reverse-order simulation effectively reduces the program's memory consumption.
3. The all-digital high-speed simulation technique according to claim 1, characterized in that: the register redundancy technique performs a register backup before the pipeline is simulated and a register restore afterwards.
4. The all-digital high-speed simulation technique according to claim 1, characterized in that: the storage-system allocate-on-write technique allocates storage on use, that is, no space is allocated for a segment of the address space unless the user program accesses it, and storage is allocated while the program runs.
5. The all-digital high-speed simulation technique according to claim 1, characterized in that: the functional units in the functional-unit register delay-buffer technique, including arithmetic/logic, floating-point and memory-access operations, have at most 10 pipeline stages, so simulating them with the reverse-order pipeline technique would require 10 function calls.
6. The all-digital high-speed simulation technique according to claim 1, characterized in that: in the instruction-decode buffering technique, the virtual core allocates a dedicated region of memory to store the decode information of recurring (loop) instructions; when the address of the next fetch packet matches a fetch-packet address held in the decode cache, the first six pipeline stages need not perform any operation, and at execution time the decode information of each instruction in the decode cache is used.
7. The all-digital high-speed simulation technique according to claim 1, characterized in that: in the technique separating cache data simulation from delay simulation, the processor core stores and loads data directly through a fast path to the storage space when performing data accesses, which guarantees correct program execution; when accurate system-level simulation is needed to obtain performance information such as timing, memory-hierarchy delay simulation can optionally be added.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410003561.7A CN103677965B (en) | 2014-01-03 | 2014-01-03 | All-digital high-speed simulating method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410003561.7A CN103677965B (en) | 2014-01-03 | 2014-01-03 | All-digital high-speed simulating method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103677965A | 2014-03-26 |
CN103677965B | 2017-03-22 |
Family
ID=50315622
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410003561.7A Active CN103677965B (en) | 2014-01-03 | 2014-01-03 | All-digital high-speed simulating method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103677965B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1153933A (en) * | 1995-09-29 | 1997-07-09 | Matsushita Electric Works, Ltd. (松下电工株式会社) | Programmable controller |
US20030093594A1 (en) * | 2001-11-14 | 2003-05-15 | Smith Patrick J. | Apparatus and method for controlling block signal flow in a multi digital signal processor configuration from a shared peripheral direct memory controller to high level data link controller |
CN101788919A (en) * | 2010-01-29 | 2010-07-28 | Suzhou Institute for Advanced Study, University of Science and Technology of China (中国科学技术大学苏州研究院) | Chip multi-core processor clock precision parallel simulation system and simulation method thereof |
CN103440373A (en) * | 2013-08-25 | 2013-12-11 | Zhejiang University (浙江大学) | Interconnected configuration simulation method of multi-DSP system |
Non-Patent Citations (2)
Title |
---|
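Zhang Fuxin (张福新) et al.: "Sim-Godson: A Godson CPU Simulator Based on SimpleScalar" (《基于SimpleScalar的龙芯CPU模拟器Sim-Godson》), Chinese Journal of Computers (《计算机学报》) * |
Pan Fengfeng (潘烽锋): "Research on High-Performance, Cycle-Accurate C67X DSP Instruction Simulation Techniques" (《高性能、时钟精确C67X DSP指令模拟技术研究》), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》) * |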
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109101276A (en) * | 2018-08-14 | 2018-12-28 | Alibaba Group Holding Limited (阿里巴巴集团控股有限公司) | Method for executing instructions in a CPU |
US11579885B2 (en) | 2018-08-14 | 2023-02-14 | Advanced New Technologies Co., Ltd. | Method for replenishing a thread queue with a target instruction of a jump instruction |
Also Published As
Publication number | Publication date |
---|---|
CN103677965B (en) | 2017-03-22 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
CP01 | Change in the name or title of a patent holder | Address after: 100094 No. 28, Yongfeng Road, Beijing, Haidian District. Patentee after: Beijing Shenzhou Aerospace Software Technology Co.,Ltd. Address before: 100094 No. 28, Yongfeng Road, Beijing, Haidian District. Patentee before: BEIJING SHENZHOU AEROSPACE SOFTWARE TECHNOLOGY Co.,Ltd. |
CP01 | Change in the name or title of a patent holder | |