CN103677965A - All-digital high-speed simulating technology - Google Patents

Info

Publication number
CN103677965A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410003561.7A
Other languages
Chinese (zh)
Other versions
CN103677965B (en)
Inventor
任永青
魏明
王金龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shenzhou Aerospace Software Technology Co.,Ltd.
Original Assignee
BEIJING SHENZHOU AEROSPACE SOFTWARE TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING SHENZHOU AEROSPACE SOFTWARE TECHNOLOGY Co Ltd
Priority to CN201410003561.7A
Publication of CN103677965A
Application granted
Publication of CN103677965B
Legal status: Active
Anticipated expiration

Landscapes

  • Debugging And Monitoring (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses an all-digital high-speed simulation technique comprising six component techniques: 1) reverse-order pipeline simulation; 2) register backup; 3) allocate-on-write storage allocation; 4) functional-unit register latency buffering; 5) instruction decode caching; and 6) separation of cache data simulation from latency simulation. Applied in combination, these techniques achieve all-digital high-speed simulation and meet the needs of embedded software development, debugging, and testing in an all-digital simulation environment, significantly improving R&D efficiency, lowering cost, improving quality, and shortening time to market.

Description

An all-digital high-speed simulation technique
Technical field
The invention belongs to the field of computer simulation and, specifically, relates to an all-digital high-speed simulation technique.
Background
The C67X series DSP is a high-performance digital signal processing chip widely used in embedded systems. It has a 16-stage pipeline and an 8-issue very long instruction word (VLIW) architecture, and its high throughput, low latency, and high stability have made it the first choice for many system designs. However, because the instruction set and pipeline structure of this DSP are highly complex, all-digital simulation of it faces many challenges, and high-speed all-digital simulation in particular is the basis for making such simulation usable. At present the only all-digital simulator for the C67X DSP is the simulation software from the foreign company TI; it is a trade secret and its simulation performance is very low, so it cannot meet application needs.
Summary of the invention
To solve the problems described in the background, the invention provides an all-digital high-speed simulation technique. Applied in combination, its component techniques achieve all-digital high-speed simulation and support embedded software development, debugging, and testing in an all-digital simulation environment, significantly improving R&D efficiency, lowering cost, improving quality, and shortening time to market.
The technical solution of the invention is:
An all-digital high-speed simulation technique, characterized in that it comprises the following component techniques: 1) reverse-order pipeline simulation; 2) register backup; 3) allocate-on-write storage allocation; 4) functional-unit register latency buffering; 5) instruction decode caching; 6) separation of cache data simulation from latency simulation.
In the reverse-order pipeline simulation technique, simulating each stage of the virtual machine pipeline requires storing the instruction currently being simulated in that stage; simulating the stages in reverse order effectively reduces the program's memory consumption.
In the register backup technique, a register backup operation is performed before the pipeline is simulated and a register restore operation is performed afterwards.
The allocate-on-write storage allocation technique allocates storage on use: for each segment of the address space, no space is allocated unless the user program accesses it, and storage allocation is completed while the program executes.
The functional units in the functional-unit register latency buffering technique include arithmetic and logic operations, floating-point operations, and memory access. They comprise at most 10 pipeline stages, which under reverse-order pipeline simulation would require 10 function calls.
In the instruction decode caching technique, the virtual core allocates a region of memory dedicated to storing the decode information of loop instructions. When the address of the next instruction packet matches an instruction packet address in the instruction decode cache, the first six pipeline stages need not perform any operation, and execution uses the decode information of each instruction held in the instruction decode cache.
In the technique of separating cache data simulation from latency simulation, when the processor core accesses data it can store and fetch the data directly through a fast path to the storage space, which guarantees correct program execution. When system-level accurate simulation is performed and performance information such as timing is needed, memory hierarchy latency simulation can optionally be added.
By adopting the above technical solution, the invention, compared with the prior art, achieves all-digital high-speed simulation through the combined application of these techniques and supports embedded software development, debugging, and testing in an all-digital simulation environment, significantly improving R&D efficiency, lowering cost, improving quality, and shortening time to market.
Embodiment
An all-digital high-speed simulation technique comprises the following component techniques: 1) reverse-order pipeline simulation; 2) register backup; 3) allocate-on-write storage allocation; 4) functional-unit register latency buffering; 5) instruction decode caching; 6) separation of cache data simulation from latency simulation.
Technique 1: reverse-order pipeline simulation
Simulating each pipeline stage of the C67XX virtual machine requires storing the instruction currently being simulated in that stage, and simulating the stages in reverse order effectively reduces the program's memory consumption. On real hardware all pipeline stages execute simultaneously within one clock cycle, so simulating them in forward or reverse order makes no difference to the result. Consider reverse-order simulation: in the previous clock cycle the E1 stage executed instruction N and the DC stage simulated instruction N+1. In the current clock cycle E1 is simulated before DC, and DC still holds instruction N+1, so E1 only needs to copy the instruction from the DC stage, overwriting the original instruction N. In forward-order simulation, by contrast, DC is simulated before E1, so DC must store not only the instruction it is simulating itself but also instruction N+1 for the E1 stage to simulate. In the pipeline simulation phase, reverse-order simulation therefore saves nearly 50% of the memory used by forward-order simulation.
Reverse-order pipeline simulation also better guarantees the correct execution of branch instructions. Suppose pipeline stage E1 is simulating branch instruction N. Under reverse-order simulation, PS is at that moment simulating instruction N+5, and the branch in E1 can directly affect the simulation of pipeline stage PG within the same clock cycle. The branch instruction therefore has 5 delay slots, which matches the way C67X branch instructions execute.
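The reverse-order update can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patent's implementation: the stage list uses the PG, PS, DC and E1 names from the text, the PW, PR and DP abbreviations are assumed from the stage descriptions, only one execute stage is modeled, and `step_reverse`, `step_forward` and `fetch` are hypothetical names.

```python
from itertools import count

# One latch per simulated stage. PG, PS, DC and E1 appear in the text;
# PW, PR and DP are assumed abbreviations for the stages in between.
STAGES = ["PG", "PS", "PW", "PR", "DP", "DC", "E1"]


def step_reverse(latches, fetch):
    """Advance one clock by visiting the stages from last to first."""
    for i in range(len(latches) - 1, 0, -1):
        # The predecessor has not been updated yet in this clock, so it
        # still holds the instruction this stage must take over.
        latches[i] = latches[i - 1]
    latches[0] = fetch()
    return latches


def step_forward(latches, fetch):
    """Advance one clock in forward order; needs a copy of every latch."""
    previous = list(latches)  # the extra storage that reverse order avoids
    latches[0] = fetch()
    for i in range(1, len(latches)):
        latches[i] = previous[i - 1]
    return latches


if __name__ == "__main__":
    numbers = count()
    latches = [None] * len(STAGES)
    for _ in range(len(STAGES)):
        step_reverse(latches, lambda: next(numbers))
    print(dict(zip(STAGES, latches)))
```

Both functions produce the same pipeline contents; the reverse-order version simply does so without the second copy of the latches. Once the pipeline is full, the instruction in PS is five positions behind the one in E1, in line with the five delay slots described in the text.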
Technique 2: register backup
A register backup operation is performed before the 16-stage pipeline is simulated, and a register restore operation is performed afterwards. This is needed because the C67X DSP uses 8-issue VLIW: up to 8 instructions can execute in each clock cycle, and several parallel instructions may read and write the same register. The software simulation platform, however, is written in a high-level language and can only execute simulated instructions one at a time, so the concurrent register reads and writes of multiple instructions cannot be simulated directly, and register values may be read or written incorrectly.
The C67X simulation platform therefore uses register backup: before the instructions of each VLIW packet are executed, the registers are backed up. The system maintains two copies of the register file. While the parallel instructions execute, reads come from one copy and writes go to the other, which guarantees that no read is clobbered by a write and keeps the program's execution semantics correct.
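A minimal Python sketch of the two-copy register file follows. The tuple encoding of an instruction and the name `execute_packet` are assumptions made for illustration; the text does not specify how the platform represents instructions or registers.

```python
import operator


def execute_packet(regs, packet):
    """Run all instructions of one packet as if they executed in parallel.

    regs:   list of register values, updated in place
    packet: list of (dst, op, src1, src2) tuples (hypothetical encoding)
    """
    backup = list(regs)  # register backup taken before the packet runs
    for dst, op, src1, src2 in packet:
        # Reads come from the backup, writes go to the live file, so an
        # earlier write in this packet cannot leak into a later read.
        regs[dst] = op(backup[src1], backup[src2])
    return regs


if __name__ == "__main__":
    regs = [1, 2]
    packet = [(0, operator.add, 0, 1),   # r0 = r0 + r1
              (1, operator.sub, 0, 1)]   # r1 = r0 - r1, using the old r0
    print(execute_packet(regs, packet))
```

Without the backup, the second instruction would see the new value of r0 written by the first, which is exactly the read/write overlap the text warns about.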
Technique 3: allocate-on-write storage allocation
The C67X DSP supports a 32-bit address space, which would require up to 4 GB of storage. The simulation environment cannot set aside 4 GB for every user program it simulates, and doing so is unnecessary. The C67X simulation platform instead allocates storage on use: for each segment of the address space, no space is allocated unless the user program accesses it, and storage allocation is completed while the program executes. Under this storage management mechanism, the platform holds only a series of null storage-block pointers when it is initialized. When an actual memory read or write reaches a given storage block, space is allocated for that block's pointer and the user program's operation then proceeds, which avoids wasting space on allocations that are never used.
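The block-pointer scheme can be sketched in Python as follows. The 64 KB block size, the class name `LazyMemory` and its byte-wide accessors are assumptions chosen for illustration; the text states only that the platform starts with null block pointers and allocates a block on its first read or write.

```python
BLOCK_BITS = 16  # assumed block size of 64 KB; not given in the text


class LazyMemory:
    """32-bit address space backed by blocks allocated on first access."""

    def __init__(self, addr_bits=32):
        # At start-up there is only a table of null block pointers.
        self.blocks = [None] * (1 << (addr_bits - BLOCK_BITS))

    def _block(self, addr):
        index = addr >> BLOCK_BITS
        if self.blocks[index] is None:
            # First read or write to this block: allocate it now.
            self.blocks[index] = bytearray(1 << BLOCK_BITS)
        return self.blocks[index]

    def write8(self, addr, value):
        self._block(addr)[addr & ((1 << BLOCK_BITS) - 1)] = value & 0xFF

    def read8(self, addr):
        return self._block(addr)[addr & ((1 << BLOCK_BITS) - 1)]

    def allocated_blocks(self):
        return sum(block is not None for block in self.blocks)


if __name__ == "__main__":
    mem = LazyMemory()
    mem.write8(0xFFFF0000, 0xAB)
    print(hex(mem.read8(0xFFFF0000)), mem.allocated_blocks())
```

With these assumed sizes the pointer table has 65,536 entries, and host memory grows only with the number of blocks the simulated program actually touches.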
Technique 4: functional-unit register latency buffering
The functional units, which include arithmetic and logic operations, floating-point operations, and memory access, comprise at most 10 pipeline stages. Under reverse-order pipeline simulation this would require 10 function calls, yet only a few instructions use more than five pipeline stages. Given the serial nature of high-level languages, this would cause a large overhead of useless function calls. When instructions are simulated by interpretation, each instruction is simulated by one function, and for a multi-cycle instruction that function would have to be split across several pipeline-stage simulation functions. In view of this analysis, the implementation completes the simulated execution of an instruction in Pipeline_e1, while E2~E10 only delay the simulated instruction so that the timing is simulated cycle-accurately. The delay can therefore be implemented with a single delay buffer, Register_Latency_Buffer, which buffers the results computed in Pipeline_e1 and, following the instruction latencies defined by the DSP C67XX microarchitecture, writes each result back to the register file after the specified number of clock cycles.
The functional-unit latency buffer effectively reduces the number of function calls, concentrates the instruction simulation process, increases the utilization of on-chip storage, and improves simulation efficiency.
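A minimal Python sketch of such a delay buffer is shown below. The class is named after the `Register_Latency_Buffer` mentioned in the text, but its methods (`push`, `tick`), its data layout, and the latency used in the example are hypothetical; actual per-instruction latencies would come from the processor documentation.

```python
class RegisterLatencyBuffer:
    """Holds results computed in E1 until their write-back cycle arrives."""

    def __init__(self):
        self.pending = []  # entries are (due_cycle, register, value)

    def push(self, now, latency, reg, value):
        # Called by the E1 stage once the instruction's result is known.
        self.pending.append((now + latency, reg, value))

    def tick(self, now, regs):
        # Called once per simulated clock; commits results that are due.
        waiting = []
        for due, reg, value in self.pending:
            if due <= now:
                regs[reg] = value
            else:
                waiting.append((due, reg, value))
        self.pending = waiting


if __name__ == "__main__":
    regs = [0] * 4
    buf = RegisterLatencyBuffer()
    buf.push(now=0, latency=4, reg=2, value=99)  # latency is illustrative
    for cycle in range(1, 5):
        buf.tick(cycle, regs)
        print(cycle, regs[2])
```

The result stays invisible to other instructions until its due cycle, so timing remains cycle-accurate even though the computation itself happened in a single function call.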
Technique 5: instruction decode caching
The DSP C67XX processor pipeline is divided into 16 stages in total: compute instruction packet address, send instruction packet address, wait for the instruction packet, receive the instruction packet, instruction dispatch, instruction decode, and ten execute stages.
Every instruction in the virtual core must be simulated through the first six stages and is then simulated through the remaining ten stages as its characteristics require. When a loop runs for many iterations, the first six stages are therefore simulated repeatedly a large number of times. These operations do not need to be carried out in full each time. Instruction fetch and decode are in fact needed only once, on the first iteration of the loop, which yields the decode information.
The instruction decode cache is a region of memory that the virtual core allocates specifically to store the decode information of these loop instructions. When the address of the next instruction packet matches an instruction packet address in the instruction decode cache, the first six pipeline stages need not perform any operation, and the E1 stage uses the decode information of each instruction held in the instruction decode cache when it simulates execution.
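The lookup by instruction packet address can be sketched in Python as follows. `DecodeCache`, `lookup`, and the `fetch`/`decode` callables are hypothetical names, and the string "decoding" in the example merely stands in for real decode work. The sketch also assumes program memory does not change while cached, since the text does not describe invalidation.

```python
class DecodeCache:
    """Maps an instruction packet address to its decoded form."""

    def __init__(self, fetch, decode):
        self.fetch = fetch      # callable: packet address -> raw packet
        self.decode = decode    # callable: raw packet -> decode information
        self.entries = {}
        self.misses = 0

    def lookup(self, packet_addr):
        info = self.entries.get(packet_addr)
        if info is None:
            # First encounter: run the front-end work once and keep it.
            self.misses += 1
            info = self.decode(self.fetch(packet_addr))
            self.entries[packet_addr] = info
        return info


if __name__ == "__main__":
    program = {0: "add", 32: "mpy"}          # toy program memory
    cache = DecodeCache(program.__getitem__, str.upper)
    for _ in range(100):                     # a loop over two packets
        for addr in (0, 32):
            cache.lookup(addr)
    print(cache.misses)
```

In this toy loop the fetch and decode work runs once per packet rather than once per iteration, which is the saving the text attributes to the decode cache.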
Technique 6: separation of cache data simulation from latency simulation
Under the von Neumann architecture, the storage space holds all computation state, including instructions and data, and in theory the processor core can execute a program entirely through direct memory accesses. Semiconductor development, however, has produced a very large gap between the speed of digital logic and the access speed of storage cells, so storage systems have become increasingly complex. For the program, though, both the hardware cache hierarchy and virtual memory management are transparent: the only thing the program itself can see is the address space.
The cache hierarchy mainly affects the performance of the storage system's hardware design. For a software simulator, its main use is to obtain latency information. In the actual design, therefore, the original function of the storage system, namely data storage, is implemented as one independent module, and performance simulation of the memory hierarchy is handled as another independent module.
When the processor core accesses data, it can store and fetch the data directly through a fast path to the storage space, which guarantees correct program execution. When system-level accurate simulation is performed and performance information such as timing is needed, memory hierarchy latency simulation can optionally be added. Separating functional simulation from performance simulation in this way gives users of the virtual core greater flexibility.
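The split between a functional fast path and an optional timing module can be sketched in Python as follows. All names (`MemorySystem`, `CacheLatencyModel`) and all parameters (a direct-mapped cache with 4 lines of 32 bytes, 1-cycle hits, 10-cycle misses) are assumptions made for illustration, not values from the patent or from the C67X.

```python
class CacheLatencyModel:
    """Direct-mapped cache timing model; tracks tags only, never data."""

    def __init__(self, lines=4, line_bytes=32, hit_cycles=1, miss_cycles=10):
        self.tags = [None] * lines
        self.line_bytes = line_bytes
        self.hit_cycles = hit_cycles
        self.miss_cycles = miss_cycles

    def access(self, addr):
        line = addr // self.line_bytes
        index = line % len(self.tags)
        if self.tags[index] == line:
            return self.hit_cycles
        self.tags[index] = line  # fill the line on a miss
        return self.miss_cycles


class MemorySystem:
    """Functional storage with an optional, separate latency module."""

    def __init__(self, timing=None):
        self.data = {}          # fast path: values keyed by address
        self.timing = timing    # None means purely functional simulation

    def _cycles(self, addr):
        return self.timing.access(addr) if self.timing is not None else 0

    def store(self, addr, value):
        self.data[addr] = value
        return self._cycles(addr)

    def load(self, addr):
        return self.data.get(addr, 0), self._cycles(addr)


if __name__ == "__main__":
    fast = MemorySystem()
    timed = MemorySystem(CacheLatencyModel())
    for mem in (fast, timed):
        mem.store(0x40, 7)
        print(mem.load(0x40))
```

Both configurations return the same data; only the reported cycle count differs, so the timing module can be attached or left out without touching the functional path.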
The invention is not limited to the preferred embodiment described above. Any variation made under the teaching of the invention, and any technical solution identical or similar to that of the invention, falls within the scope of protection of the invention.

Claims (7)

1. An all-digital high-speed simulation technique, characterized in that it comprises the following component techniques:
1) reverse-order pipeline simulation; 2) register backup; 3) allocate-on-write storage allocation; 4) functional-unit register latency buffering; 5) instruction decode caching; 6) separation of cache data simulation from latency simulation.
2. The all-digital high-speed simulation technique as claimed in claim 1, characterized in that: in the reverse-order pipeline simulation technique, simulating each stage of the virtual machine pipeline requires storing the instruction currently being simulated, and simulating the stages in reverse order effectively reduces the program's memory consumption.
3. The all-digital high-speed simulation technique as claimed in claim 1, characterized in that: in the register backup technique, a register backup operation is performed before the pipeline is simulated and a register restore operation is performed afterwards.
4. The all-digital high-speed simulation technique as claimed in claim 1, characterized in that: the allocate-on-write storage allocation technique allocates storage on use, that is, for each segment of the address space, no space is allocated unless the user program accesses it, and storage allocation is completed while the program executes.
5. The all-digital high-speed simulation technique as claimed in claim 1, characterized in that: the functional units in the functional-unit register latency buffering technique include arithmetic and logic operations, floating-point operations, and memory access, and comprise at most 10 pipeline stages, which under reverse-order pipeline simulation requires 10 function calls.
6. The all-digital high-speed simulation technique as claimed in claim 1, characterized in that: in the instruction decode caching technique, the virtual core allocates a region of memory dedicated to storing the decode information of loop instructions; when the address of the next instruction packet matches an instruction packet address in the instruction decode cache, the first six pipeline stages need not perform any operation, and execution uses the decode information of each instruction held in the instruction decode cache.
7. The all-digital high-speed simulation technique as claimed in claim 1, characterized in that: in the technique of separating cache data simulation from latency simulation, when the processor core accesses data it can store and fetch the data directly through a fast path to the storage space, which guarantees correct program execution; when system-level accurate simulation is performed and performance information such as timing is needed, memory hierarchy latency simulation can optionally be added.
CN201410003561.7A 2014-01-03 2014-01-03 All-digital high-speed simulating method Active CN103677965B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410003561.7A CN103677965B (en) 2014-01-03 2014-01-03 All-digital high-speed simulating method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410003561.7A CN103677965B (en) 2014-01-03 2014-01-03 All-digital high-speed simulating method

Publications (2)

Publication Number Publication Date
CN103677965A (en) 2014-03-26
CN103677965B (en) 2017-03-22

Family

ID=50315622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410003561.7A Active CN103677965B (en) 2014-01-03 2014-01-03 All-digital high-speed simulating method

Country Status (1)

Country Link
CN (1) CN103677965B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1153933A (en) * 1995-09-29 1997-07-09 松下电工株式会社 Programmable controller
US20030093594A1 (en) * 2001-11-14 2003-05-15 Smith Patrick J. Apparatus and method for controlling block signal flow in a multi digital signal processor configuration from a shared peripheral direct memory controller to high level data link controller
CN101788919A (en) * 2010-01-29 2010-07-28 中国科学技术大学苏州研究院 Chip multi-core processor clock precision parallel simulation system and simulation method thereof
CN103440373A (en) * 2013-08-25 2013-12-11 浙江大学 Interconnected configuration simulation method of multi-DSP system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
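Zhang Fuxin (张福新) et al.: "Sim-Godson: a Godson CPU simulator based on SimpleScalar", Chinese Journal of Computers (《计算机学报》) *
Pan Fengfeng (潘烽锋): "Research on high-performance, clock-accurate C67X DSP instruction simulation techniques", China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》) *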

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109101276A (en) * 2018-08-14 2018-12-28 阿里巴巴集团控股有限公司 The method executed instruction in CPU
US11579885B2 (en) 2018-08-14 2023-02-14 Advanced New Technologies Co., Ltd. Method for replenishing a thread queue with a target instruction of a jump instruction

Also Published As

Publication number Publication date
CN103677965B (en) 2017-03-22

Similar Documents

Publication Publication Date Title
US7444499B2 (en) Method and system for trace generation using memory index hashing
CN105051680B (en) The processor and method of process instruction on road are executed for the hardware concurrent inside processor
US8549468B2 (en) Method, system and computer readable storage device for generating software transaction-level modeling (TLM) model
CN105408859B (en) For instructing the method and system of scheduling
CN104937568B (en) Apparatus and method for multipage size conversion look-aside buffer (TLB)
US8694955B2 (en) Flow and methodology to find TDP power efficiency
CN102207904B (en) Device and method for being emulated to reconfigurable processor
CN109508206A (en) Processor, the method and system loaded dependent on the partial width of mode is carried out to wider register
CN102750133A (en) 32-Bit triple-emission digital signal processor supporting SIMD
CN104246694B (en) Assemble page mistake signaling and processing
CN102073480B (en) Method for simulating cores of multi-core processor by adopting time division multiplex
CN105074657B (en) The hardware and software solution of diverging branch in parallel pipeline
CN104008021A (en) Precision exception signaling for multiple data architecture
Bouchhima et al. Using abstract CPU subsystem simulation model for high level HW/SW architecture exploration
CN103440373A (en) Interconnected configuration simulation method of multi-DSP system
Pilato et al. System-level memory optimization for high-level synthesis of component-based SoCs
CN100530103C (en) Simulator and method
CN103677965A (en) All-digital high-speed simulating technology
Jungeblut et al. Design space exploration for memory subsystems of VLIW architectures
Biancolin et al. Accessible, FPGA resource-optimized simulation of multiclock systems in firesim
Fu et al. A simulation platform for reconfigurable computing research
Rekik et al. Virtual prototyping of multiprocessor architectures using the open virtual platform
US20140282320A1 (en) Analyzing timing requirements of a hierarchical integrated circuit design
Lee et al. Design and implementation of a virtual platform of solid-state disks
CN103645936B (en) A kind of data card virtualization implementation method based on equipment simulating

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100094 No. 28, Yongfeng Road, Beijing, Haidian District

Patentee after: Beijing Shenzhou Aerospace Software Technology Co.,Ltd.

Address before: 100094 No. 28, Yongfeng Road, Beijing, Haidian District

Patentee before: BEIJING SHENZHOU AEROSPACE SOFTWARE TECHNOLOGY Co.,Ltd.