CN115858092A - Time sequence simulation method, device and system - Google Patents


Info

Publication number
CN115858092A
Authority
CN
China
Prior art keywords
simulation
code
time
processing
host
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211575933.4A
Other languages
Chinese (zh)
Inventor
郭旸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zeku Technology Beijing Corp Ltd
Original Assignee
Zeku Technology Beijing Corp Ltd
Application filed by Zeku Technology Beijing Corp Ltd filed Critical Zeku Technology Beijing Corp Ltd
Priority to CN202211575933.4A priority Critical patent/CN115858092A/en
Publication of CN115858092A publication Critical patent/CN115858092A/en
Pending legal-status Critical Current

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The application relates to a time sequence simulation method, a time sequence simulation device and a time sequence simulation system. The method comprises the following steps: converting the target machine code into an executable code corresponding to the simulation host machine; responding to the parallel execution of the executable codes by the plurality of simulation processes, recording the processing time of the simulation processes for processing the codes, generating a processing time sequence, and determining the virtual simulation time required by the simulation processes for processing the codes according to the processing time sequence; updating each virtual simulation time to keep time synchronization among each simulation process; outputting a time sequence simulation result under the condition that the executable code is executed; and the time sequence simulation result is used for representing the execution time sequence of the simulation host computer aiming at the target machine code. The method and the device can maintain globally uniform simulation time, realize the generation of software processing time sequence, ensure the simulation speed and accurately reflect the software execution time sequence.

Description

Time sequence simulation method, device and system
Technical Field
The present application relates to the field of software simulation technologies, and in particular, to a timing simulation method, apparatus, and system.
Background
In the field of software performance simulation, a traditional CA (Cycle Accurate) simulation scheme performs signal-level timing modeling of the microarchitecture pipeline inside a microprocessor and updates the behavior of each module clock cycle by clock cycle during simulation. Conventional functional simulation schemes simulate the instruction-level behavior of the processor architecture. However, current software simulation schemes either cannot guarantee simulation speed or cannot accurately reflect the software execution timing.
Disclosure of Invention
In view of the above, it is necessary to provide a timing simulation method, device and system capable of ensuring simulation speed and accurately representing software execution timing.
In a first aspect, the present application provides a timing simulation method, including:
converting the target machine code into an executable code corresponding to the simulation host machine;
responding to the parallel execution of the executable codes by the plurality of simulation processes, recording the processing time of the simulation processes for processing the codes, generating a processing time sequence, and determining the virtual simulation time required by the simulation processes for processing the codes according to the processing time sequence;
updating each virtual simulation time to keep time synchronization among the simulation processes;
outputting a time sequence simulation result under the condition that the executable code is executed; and the time sequence simulation result is used for representing the execution time sequence of the simulation host computer aiming at the target machine code.
In one embodiment, the executable code includes host code carrying latency information; the virtual simulation time includes an actual processing delay obtained by the simulation process processing the corresponding host code based on the delay information.
In one embodiment, the time delay information comprises at least one code segment identification determined according to simulation precision requirements and at least one time delay data determined according to time delay characteristics of the target machine code; the actual processing time delay comprises the actual execution time of the simulation host computer aiming at the target machine code;
wherein, at least one code segment identification corresponds to at least one time delay data one by one; the code segment identifier is used for indicating the simulation process to jump out of the current processing aiming at the host code at the identifier position so as to obtain corresponding time delay data; the time delay data is used for representing the actual execution time of the target machine code corresponding to the host machine code positioned in front of the identification position.
In one embodiment, the processing time includes an execution starting time and an execution ending time of the host code, and the simulation process acquires an actual execution time.
In one embodiment, the processing timing is in clock cycles; the virtual simulation time is the difference between the execution starting time and the execution ending time.
In one embodiment, the host code is a block of code executable by the instruction level emulator ISS;
the step of converting the target machine code into the executable code corresponding to the simulation host comprises the following steps:
analyzing the target machine code and confirming the time delay type of the target machine code;
determining time delay data based on the time delay type of the target machine code;
segmenting the target machine code according to the quantity of the time delay data according to the requirement of the simulation precision to obtain each segmented code, and inserting a code segment identifier at the end of each segmented code;
carrying out binary conversion on the target machine code inserted with the code segment identifier to obtain a code block;
and adding the time delay data corresponding to each segmented code to the code block to obtain the host code.
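The conversion steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the tuple encoding of instructions, the `ALU_CYCLES` table, the `TRAP` marker, and the `max_alu_per_segment` parameter are all hypothetical, and the binary conversion step is replaced by copying the instruction unchanged.

```python
TRAP = "TRAP"                       # hypothetical stand-in for the Trap instruction
ALU_CYCLES = {"add": 1, "mul": 3}   # assumed per-instruction cycle counts


def convert(target_code, max_alu_per_segment):
    """Return (code with TRAP markers, suspend data), one delay record per TRAP."""
    code, suspend = [], []
    alu_cycles = alu_count = 0
    for insn in target_code:
        op = insn[0]
        if op in ("load", "store"):
            # Data storage type: close any open arithmetic segment first,
            # then give the access its own segment and triplet.
            if alu_count:
                code.append(TRAP)
                suspend.append(("cycles", alu_cycles))
                alu_cycles = alu_count = 0
            code.append(insn)  # stand-in for binary conversion
            code.append(TRAP)
            suspend.append(("mem", insn[1], op, insn[2]))
        else:
            # Arithmetic logic type: sum cycles until the segment is full.
            code.append(insn)  # stand-in for binary conversion
            alu_cycles += ALU_CYCLES.get(op, 1)
            alu_count += 1
            if alu_count == max_alu_per_segment:
                code.append(TRAP)
                suspend.append(("cycles", alu_cycles))
                alu_cycles = alu_count = 0
    if alu_count:  # close the final, shorter segment
        code.append(TRAP)
        suspend.append(("cycles", alu_cycles))
    return code, suspend


program = [("add",), ("mul",), ("load", 0x1000, 4), ("add",)]
code, suspend = convert(program, max_alu_per_segment=2)
```

In this sketch a smaller `max_alu_per_segment` yields more segment identifiers and therefore finer timing granularity, at the cost of leaving the translated code more often.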
In one embodiment, the code segment identifier is a Trap instruction; the time delay data is a Suspend data segment appended at the end of the code block.
In one embodiment, the emulation process is used to encapsulate the processing tasks of the instruction level emulator ISS in process form;
the step of updating each virtual simulation time and keeping time synchronization among the simulation processes comprises the following steps:
and updating the virtual simulation time corresponding to the simulation process to which the processing task belongs based on the execution frequency of the processing task.
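One possible reading of this step is sketched below. It assumes that each simulation process has its own execution frequency, that cycle counts are converted to nanoseconds using that frequency, and that synchronization is kept by always advancing the process whose virtual time is earliest. The scheduling policy and the names `SimProcess` and `run` are assumptions for illustration; the application does not prescribe them.

```python
import heapq


class SimProcess:
    def __init__(self, name, freq_hz, segments):
        self.name = name
        self.freq_hz = freq_hz           # execution frequency of the task
        self.segments = list(segments)   # cycle count of each code segment
        self.time_ns = 0.0               # virtual simulation time

    def step(self):
        """Run one segment and advance virtual time by cycles / frequency."""
        cycles = self.segments.pop(0)
        self.time_ns += cycles * 1e9 / self.freq_hz


def run(processes):
    """Always advance the process that is furthest behind in virtual time."""
    trace = []
    queue = [(p.time_ns, i, p) for i, p in enumerate(processes) if p.segments]
    heapq.heapify(queue)
    while queue:
        _, i, p = heapq.heappop(queue)
        p.step()
        trace.append((p.name, p.time_ns))
        if p.segments:
            heapq.heappush(queue, (p.time_ns, i, p))
    return trace


cpu = SimProcess("cpu", 1e9, [10, 10])   # 1 GHz, two 10-cycle segments
dsp = SimProcess("dsp", 5e8, [10])       # 500 MHz, one 10-cycle segment
trace = run([cpu, dsp])
```

With these inputs the same 10-cycle segment costs 10 ns on the faster process and 20 ns on the slower one, so both processes finish at the same virtual time.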
In one embodiment, the latency types include an arithmetic logic type and a data storage type; the latency data includes a number of cycles corresponding to an arithmetic logic type, and triplet information corresponding to a data storage type.
In one embodiment, the triplet information includes a target address, an access type, and a data volume; the step of determining, according to the processing time sequence, the virtual simulation time required by the simulation process to process the code comprises:
in response to the simulation process executing to the identification position, if the number of cycles corresponding to the current segmented code is read, accumulating the number of cycles to the virtual simulation time;
and in response to the simulation process executing to the identification position, if the triplet information corresponding to the current segmented code is read, acquiring the access time of the simulation process for the data volume based on the target address, and accumulating the access time to the virtual simulation time.
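The two accumulation cases above can be illustrated with a short sketch. The memory map, the bus width, and the per-word access costs below are invented for illustration only; in a real system the access time would come from the memory or bus model reached through the target address.

```python
WORD_BYTES = 4  # assumed bus width in bytes

# Hypothetical memory map: (start, end, load cycles per word, store cycles per word)
MEMORY_MAP = [
    (0x0000_0000, 0x0000_FFFF, 1, 1),     # assumed on-chip memory
    (0x8000_0000, 0x8FFF_FFFF, 20, 25),   # assumed external memory
]


def access_cycles(address, access_type, size):
    """Return the access time, in cycles, for `size` bytes at `address`."""
    words = -(-size // WORD_BYTES)  # ceiling division
    for start, end, load_cost, store_cost in MEMORY_MAP:
        if start <= address <= end:
            per_word = load_cost if access_type == "load" else store_cost
            return words * per_word
    raise ValueError(f"unmapped address {address:#x}")


def accumulate(virtual_time, delay):
    """Add the delay of one code segment to the virtual simulation time."""
    if delay[0] == "cycles":          # arithmetic logic type: cycle count
        return virtual_time + delay[1]
    if delay[0] == "mem":             # data storage type: triplet
        _, address, access_type, size = delay
        return virtual_time + access_cycles(address, access_type, size)
    raise ValueError(f"unknown delay record {delay!r}")


t = 0
t = accumulate(t, ("cycles", 5))
t = accumulate(t, ("mem", 0x8000_0000, "load", 8))
t = accumulate(t, ("mem", 0x0000_0010, "store", 4))
```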
In a second aspect, the present application further provides a timing simulation apparatus, including:
the code conversion module is used for converting the target machine code into an executable code corresponding to the simulation host;
the simulation time acquisition module is used for responding to the parallel execution of the executable codes by the plurality of simulation processes, recording the processing time of the simulation processes for processing the codes, generating a processing time sequence and determining the virtual simulation time required by the simulation processes for processing the codes according to the processing time sequence;
the simulation time updating module is used for updating each virtual simulation time to keep time synchronization among the simulation processes;
the simulation result output module is used for outputting a time sequence simulation result under the condition that the executable code is executed; and the time sequence simulation result is used for representing the execution time sequence of the simulation host computer aiming at the target machine code.
In a third aspect, the present application further provides a timing simulation system, where a simulation engine of a baseband chip runs on the timing simulation system, and the simulation engine is configured to drive a plurality of simulation processes to execute instruction-level simulation in parallel;
the time sequence simulation system comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of the method when executing the computer program.
In one embodiment, the simulation engine comprises at least one module modeled based on System C language; the emulation process is used to encapsulate the module in process form.
In a fourth aspect, the present application further provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the method when executing the computer program.
In a fifth aspect, the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method described above.
In a sixth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the method described above.
According to the timing simulation method, device and system described above, the target machine code is converted into executable code corresponding to the simulation host; in response to a plurality of simulation processes executing the executable code in parallel, the processing time of each simulation process for processing the code is recorded to obtain the processing time sequence; each virtual simulation time is updated so that the simulation processes remain time-synchronized; and when the executable code has been executed, a timing simulation result representing the execution timing of the simulation host for the target machine code is output. The embodiments of the present application drive a plurality of simulation processes to simulate in parallel, maintain a globally uniform simulation time, and generate the software processing timing. This ensures simulation speed while accurately reflecting the software execution timing, supports software performance analysis, and serves as a unified function development and performance analysis platform throughout the software life cycle.
Drawings
FIG. 1 is a schematic diagram of a conventional functional simulation scheme;
FIG. 2 is a flow diagram illustrating a timing simulation method according to one embodiment;
FIG. 3 is a flowchart illustrating the step of converting the target machine code in one embodiment;
FIG. 4 is a flowchart illustrating a timing simulation method according to another embodiment;
FIG. 5 is a block diagram of a timing simulation apparatus according to an embodiment;
FIG. 6 is a diagram illustrating an internal structure of a computer device according to an embodiment;
FIG. 7 is a diagram illustrating an internal structure of a computer device in another embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The traditional CA simulation scheme performs signal-level timing modeling of the microarchitecture pipeline (Pipeline) inside the microprocessor, including instruction fetch (Instruction Fetch), instruction decode (Instruction Decode), execution (Execution), memory access (Memory Access), and write-back (Write Back), as well as the instruction cache (Instruction Cache), data cache (Data Cache), registers (Registers), forwarding path (Forwarding Path), and interlock logic (Interlock Logic), and updates the behavior of these modules clock cycle by clock cycle during simulation. For the execution stage of the microprocessor, modules such as the arithmetic logic unit ALU (Arithmetic Logic Unit), the shifter (Shifter), the multiplier (Multiplier), and the auto-addressing unit (Auto-indexed Addressing) are likewise modeled at the signal level, and their behavior is updated every clock cycle during simulation.
In conventional CA simulation, a model is first built for each module defined in the RTL (Register Transfer Level) implementation according to the SystemC standard, with the clock/reset inputs and external signal definitions of each model kept consistent with the RTL implementation of the processor. Second, the signal interfaces are connected in the SystemC simulation environment to form a complete processor system. Third, the compiled executable program is loaded into the instruction and data memory models. Fourth, the periodic clock signal is activated, the reset signal is released, and the SystemC simulation engine is started. In each cycle defined by the clock, the SystemC simulation engine then traverses all external signals and the input/output signals of the methods/threads inside the models. When a signal to which a model is sensitive changes, the engine executes that model's behavior for the current clock cycle and updates its output signals, then checks again whether any model is sensitive to the signals that just changed and executes it, repeating until no sensitive signal changes. The engine then moves on to the next cycle and repeats this update, sensitivity check, and execute iteration until the simulation is finished.
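The per-cycle iteration described above can be illustrated with a much simplified sketch. It is written in Python rather than SystemC and only mimics the idea of re-executing sensitive models until the signals settle; the `Signal`, `Module`, `commit`, and `simulate_cycle` names are hypothetical and do not correspond to the SystemC API.

```python
class Signal:
    def __init__(self, value=0):
        self.value = value   # value visible to readers in the current step
        self.next = value    # value written during evaluation

    def write(self, value):
        self.next = value


class Module:
    def __init__(self, sensitive, behavior):
        self.sensitive = sensitive   # signals that trigger this module
        self.behavior = behavior     # callable executed when triggered


def commit(signals):
    """Update phase: publish written values and report which signals changed."""
    changed = set()
    for s in signals:
        if s.next != s.value:
            s.value = s.next
            changed.add(s)
    return changed


def simulate_cycle(modules, signals):
    """Within one clock cycle, execute sensitive modules until nothing changes."""
    evaluations = 0
    changed = commit(signals)
    while changed:
        for m in modules:
            if any(s in changed for s in m.sensitive):
                m.behavior()
                evaluations += 1
        changed = commit(signals)
    return evaluations


a, b, c = Signal(), Signal(), Signal()
adder = Module([a], lambda: b.write(a.value + 1))
doubler = Module([b], lambda: c.write(b.value * 2))
a.write(3)   # an input changes at the start of the cycle
steps = simulate_cycle([adder, doubler], [a, b, c])
```

Even this two-module chain needs two evaluation passes to settle a single cycle, which hints at why stepping a full processor model one cycle at a time is slow. The sketch does not guard against combinational loops, which would never settle.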
The traditional CA simulation scheme can accurately simulate the behavior and timing of every module of the processor, providing an execution environment and results consistent with the hardware for software function development, and timing simulation almost identical to the hardware for software performance analysis. However, it has at least the following problems. (1) The simulation speed is slow: the time-step granularity is one cycle, so each iteration completes the simulation of only one cycle, and module-level sensitive-signal simulation makes each iteration computationally expensive. A large amount of computation per iteration combined with a small time step results in slow simulation. (2) Verification and maintenance are difficult: on the one hand, the model is a signal-level module model that must strictly reproduce the exact RTL signal behavior; on the other hand, it is implemented in SystemC, with the internal functions of each module realized through Method, Thread, and Function constructs. The same module therefore exists in two implementations, RTL and SystemC, together with their system integration. Whenever the RTL is verified, the SystemC model must be verified as well, and whenever the RTL is upgraded or modified, the SystemC model must be upgraded or modified too.
Further, in the conventional functional simulation scheme, an instruction level simulator (ISS) simulates the instruction-level behavior of a processor architecture, and DBT (Dynamic Binary Translation) is used to convert platform code blocks of the target processor (Blocks of Guest Platform Code) into executable code of the simulation host at runtime. Once translated, these code blocks can be executed directly on the host. Because the simulation of the processor RTL is skipped and the translated host code is executed directly, the simulation speed is greatly improved.
Taking an X86-64 simulation host and an ARM (Advanced RISC Machines) target machine as an example, as shown in FIG. 1, the instruction level emulator ISS maintains the CPU (Central Processing Unit) state of a virtual target machine, GCS (Guest CPU State), which includes the target machine program counter (GPC, Guest Program Counter, which maintains the instruction execution pointer of the emulated target machine), stack pointers (Stack Pointers), control flags (Control Flags), and general-purpose registers (General-Purpose Registers). The instruction level emulator ISS converts the binary code of the target machine (Guest Binary) into host code and then operates on the CPU State. It should be noted that the instruction level emulator ISS does not translate all of the target machine code at once; instead, it uses the microcode generator TCG (Tiny Code Generator) to quickly generate host code blocks TBs (Translation Blocks) at execution time, stores them in the translation cache (Translation Cache), and then executes them. The execution of the TCG is divided into two phases: the first phase converts the target machine instruction set (ISA) into TCG code, and the second phase converts the TCG code into the host ISA.
The core of the ISS is the cpu_exec() unit (referred to as the ISS processing unit), which controls the translation process and the execution of translated TBs. During simulation, cpu_exec() proceeds as follows: (1) When a piece of target machine code needs to be executed, the program counter (GPC) of the target machine is read from the CPU State. (2) The GPC is looked up in the mapping table (Map Table). There are two cases: in the first case, the target machine code starting at the GPC has already been translated and is stored in the Translation Cache; in the second case, the target machine code corresponding to the GPC has not been translated, or has been translated but is no longer kept in the Translation Cache.
When the TB is found, the Map Table returns the location of the TB in the Translation Cache, and control is transferred from cpu_exec() to the preprocessing module Prologue. Prologue saves the host register state of the ISS, jumps into the found TB, and executes it. In most simulation scenarios, when execution reaches the end of a TB it does not jump back to cpu_exec(); instead, depending on the logic of the target machine code, it jumps to the same TB again or to another TB. As long as the current TB or the jump-target TB is still in the Translation Cache, there is no need to return to cpu_exec(). Finally, when execution leaves the TBs, the Epilogue module updates the GPC register in the CPU State, restores the host register state of the ISS, and returns to cpu_exec(). The GPC register in the CPU State belongs to the processor state of the virtual target machine being emulated.
Further, when no TB is found, that is, when the GPC cannot be found in the Translation Cache, the target machine code is sent to the TCG for translation, the translated code is placed in a newly created TB, the TB is inserted into the Translation Cache, and the TB is indexed in the Map Table. Once these operations are complete, execution continues in the same way as when the TB is found.
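The lookup, translate, and execute loop described above can be sketched as follows. This is a toy model, not the code of any real emulator: guest blocks are represented as Python tuples, `translate` stands in for the TCG by returning a Python closure as the TB, and every TB returns to the loop instead of jumping directly to the next TB, so the Prologue/Epilogue handling and TB-to-TB jumps are not modeled.

```python
def translate(block):
    """Stand-in for the TCG: turn a guest block into a host callable (a TB)."""
    ops, next_pc = block

    def tb(state):
        for reg, delta in ops:
            state[reg] = state.get(reg, 0) + delta
        # The successor may depend on the guest state (a conditional branch).
        return next_pc(state) if callable(next_pc) else next_pc

    return tb


def cpu_exec(guest_blocks, entry_pc):
    state = {"gpc": entry_pc}   # simplified guest CPU state
    map_table = {}              # GPC -> TB held in the translation cache
    translations = 0
    while state["gpc"] is not None:
        gpc = state["gpc"]
        tb = map_table.get(gpc)
        if tb is None:
            # Miss: translate the block, cache it, and index it by GPC.
            tb = translate(guest_blocks[gpc])
            map_table[gpc] = tb
            translations += 1
        # Hit (or freshly translated): execute the TB and update the GPC.
        state["gpc"] = tb(state)
    return state, translations


guest = {
    0x100: ([("r0", 1)], lambda s: 0x100 if s["r0"] < 3 else 0x200),  # loop body
    0x200: ([("r1", 10)], None),                                      # exit block
}
final_state, translations = cpu_exec(guest, 0x100)
```

The loop block runs three times but is translated only once, which is the source of the speed advantage of cached translation.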
The traditional functional simulation scheme can execute target machine code blocks at high speed and provides an efficient platform for most software development and debugging. However, it cannot accurately simulate the execution time of software. It therefore cannot provide a valid timing simulation result when the design of a software function depends on the software processing timing and execution speed, when the software function itself is tightly bound to timing (as in a real-time control system), or when the software enters the optimization stage and its execution efficiency must be determined; as a result, timing-related software functions cannot be developed and debugged. In particular, the software of a 5G (5th Generation Mobile Communication Technology) baseband chip depends strongly on execution timing, so the conventional functional simulation scheme cannot support software development for 5G baseband chips.
In the conventional schemes described above, signal-level simulation of the CPU can reproduce the execution timing of the software, but it involves a large amount of processing and is slow; functional simulation of CPU instructions is fast, but it cannot simulate and output timing information for the core.
The time sequence simulation scheme of the embodiment of the application not only ensures the simulation speed, but also can accurately reflect the software execution time sequence. The two capabilities of the functional simulation scheme and the CA simulation scheme are realized in one simulation scheme, and the simulation scheme plays a role as a unified functional development and performance analysis platform in the whole software life cycle.
The timing simulation method provided by the embodiments of the present application can be applied to software development for a baseband chip. A baseband chip is a highly integrated and complex System on Chip (SOC). At present, most baseband chips are built around a microprocessor and a digital signal processor: the microprocessor, in most cases an ARM core, is the control center of the whole chip, and the DSP (Digital Signal Processor) subsystem is responsible for baseband processing.
For example, the baseband chip in a smartphone is an SOC with a complex structure that provides multiple functions, and the normal operation of each function is configured and coordinated by the microprocessor. Illustratively, such a chip is centered on an ARM microprocessor, which controls and configures the surrounding peripheral functional modules through its dedicated bus. The functional modules may be, for example, Global System for Mobile Communications (GSM), Wi-Fi (wireless network system), Global Positioning System (GPS), Bluetooth, DSP, memory, and the like; each functional module has its own memory and address space, and the modules function independently of one another.
Alternatively, the baseband chip may be divided into a plurality of sub-blocks, such as a CPU, a channel encoder, a digital signal processor, a modem, an interface module, and the like. For example, the baseband chip in the embodiments of the present application may be a modem (Modem) baseband processing chip. It should be noted that the software development process of a baseband chip involves various kinds of simulation; the embodiments of the present application may be extended to other processor simulation systems to provide a timing simulation capability, and may be used in implementing EDA (Electronic Design Automation) tools such as a Virtual Platform (VP) or a performance simulation platform. In addition, the embodiments of the present application may also involve Instruction Accurate (IA) simulation.
An execution main body of the timing simulation method provided by the embodiment of the present application is generally a computer device with certain computing capability, and the computer device includes: a terminal device or a server or other processing device, where the terminal device may be a User Equipment (UE), a mobile device, a User terminal, a Personal Digital Assistant (PDA), or the like; the server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers. In some possible implementations, the timing simulation method may be implemented by a processor calling computer readable instructions stored in a memory.
In one embodiment, as shown in fig. 2, a timing simulation method is provided, which is described by taking the software development of the method applied to a baseband chip as an example, and includes the following steps:
step 202, converting the target machine code into an executable code corresponding to the simulation host machine;
the target machine code may refer to a code corresponding to a target machine program, that is, a code at a target machine end. Executable code may refer to code that an emulated host is capable of executing, such as host code, i.e., code on the host side; further, the target machine code may be a target machine instruction set ISA, and the corresponding executable code of the emulated host may be a host instruction set ISA.
In some examples, emulating the host's corresponding executable code may include code blocks TBs of the host that are generated quickly when executed using a microcode generator TCG. In other examples, the executable code may also include code blocks TBs carrying latency information. Further, the executable code may include code blocks TBs executable by the instruction level emulator ISS.
Illustratively, the process of converting target machine code into executable code may involve binary translation (e.g., dynamic binary translation, DBT). Optionally, the process of converting the target machine code into the executable code may also involve a microcode generator TCG, and the execution of the microcode generator TCG may include two stages, where the first stage is to convert the target machine instruction set ISA into the TCG code (the TCG code is a general instruction description form and is not related to the ISA of the host); the second phase is to translate the TCG code to the host ISA. By the two-stage conversion mode, the conversion of the target terminal and the host terminal is decoupled, and the portability is enhanced.
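The decoupling achieved by the two-stage conversion can be illustrated with a toy example. The mnemonics, the tuple-based intermediate form, and both back ends below are invented for illustration; they are not real TCG operations and do not produce real host machine code.

```python
def guest_to_ir(insn):
    """Stage 1 (front end): parse a toy guest instruction such as 'ADD r0, r1, r2'."""
    mnemonic, operands = insn.split(maxsplit=1)
    dst, a, b = [x.strip() for x in operands.split(",")]
    return (mnemonic.lower(), dst, a, b)


def ir_to_two_operand_host(ir):
    """Stage 2 (back end A): a host whose instructions take two operands."""
    op, dst, a, b = ir
    if dst == b:
        # Copying a into dst would overwrite b; a real back end would
        # allocate a temporary register here.
        raise NotImplementedError("destination aliases second source")
    return [f"mov {dst}, {a}", f"{op} {dst}, {b}"]


def ir_to_three_operand_host(ir):
    """Stage 2 (back end B): a host whose instructions take three operands."""
    op, dst, a, b = ir
    return [f"{op} {dst}, {a}, {b}"]


def translate(guest_code, backend):
    """Run every guest instruction through the shared front end and one back end."""
    host_code = []
    for insn in guest_code:
        host_code.extend(backend(guest_to_ir(insn)))
    return host_code


guest_code = ["ADD r0, r1, r2", "SUB r3, r0, r1"]
host_a = translate(guest_code, ir_to_two_operand_host)
host_b = translate(guest_code, ir_to_three_operand_host)
```

Supporting another host only requires another back end, and supporting another target only requires another front end, because both sides meet at the intermediate form.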
In one embodiment, the executable code may include host code carrying latency information; the virtual simulation time may include an actual processing delay resulting from the simulation process processing the corresponding host code based on the delay information.
Specifically, the embodiments of the present application can convert the target machine code into host code carrying time delay information. The time delay information allows the simulation process to obtain the actual processing delay for processing the corresponding host code; that is, the virtual simulation time may include the actual processing delay, for example by accumulating the actual processing delay into the virtual simulation time. Furthermore, based on the host code carrying the time delay information, the simulation process can obtain the actual processing delay by actually simulating the interaction between the target code and the outside world, so that the software processing timing obtained through simulation is more accurate.
In one embodiment, the time delay information may include at least one code segment identification determined according to the simulation accuracy requirement and at least one time delay data determined according to the time delay characteristic of the target machine code; the actual processing latency may include the actual execution time of the emulated host for the target machine code;
wherein, at least one code segment identification corresponds to at least one time delay data one by one; the code segment identifier is used for indicating the simulation process to jump out of the current processing aiming at the host code at the identifier position so as to acquire corresponding time delay data; the time delay data is used for representing the actual execution time of the target machine code corresponding to the host machine code positioned in front of the identification position.
Specifically, the embodiment of the application provides that a target machine code is converted into an executable code corresponding to a simulation host, wherein the executable code comprises a host code carrying time delay information; further, the latency information may include at least one code segment identification determined according to simulation accuracy requirements, and optionally, the latency information may further include at least one latency data determined according to latency characteristics of the target machine code.
Taking the case where the host code is a code block TB (Translation Block) as an example, while the target machine code is being converted into the code block TB, the target machine code may be analyzed synchronously by a micro-delay generator TDG (Tiny Delay Generator) to generate the time delay information corresponding to the code block TB. In some examples, generating the time delay information may refer to generating a suspend trap ST (Suspend Trap) corresponding to the code block TB. It should be noted that the micro-delay generator TDG may run in parallel with the microcode generator TCG.
In the embodiments of the present application, the micro-delay generator TDG may be invoked. First, the TDG analyzes the target machine code instruction by instruction to obtain the delay (Delay) characteristic of each instruction and thereby determine the time delay data (for example, Suspend data). Second, according to the simulation accuracy requirement, the target machine code is segmented according to the amount of time delay data each segment contains, and a code segment identifier (for example, a Trap instruction) is inserted after the end of each segment.
Furthermore, the code segment identification corresponds to the time delay data one by one; in the embodiment of the application, the code segment identifier may be used to indicate that the simulation process jumps out of the current processing for the host code at the identifier position to obtain corresponding delay data; and the time delay data can be used for representing the actual execution time of the target machine code corresponding to the host machine code positioned in front of the identification position.
In one embodiment, the code segment identification may be a Trap instruction; the latency data may be a pending Suspend data segment appended at the end of the code block.
Illustratively, taking the host code as a code block TB and the delay information as a Suspend Trap ST, the delay information may include two parts. One part is a Trap instruction embedded in the host code, so that when execution of the host code reaches the Trap position, it can jump out of the TB code and enter the intermediate processing module. The other part may be a Suspend data segment appended to the end of the code block TB, which describes the actual execution time (for example, the actual execution delay) of the target machine code corresponding to the host code located before the Trap position. Suspend and Trap are in one-to-one correspondence. Furthermore, the actual processing delay is obtained by having the simulation process actually simulate the interaction with the outside in the intermediate processing module, so that the software processing timing obtained through simulation is more accurate.
It should be noted that the intermediate processing module in the embodiment of the present application is configured to implement the delay of the simulated code and to update the virtual simulation time of the simulation process (for example, an SC_THREAD, that is, a SystemC process). Illustratively, the intermediate processing module may be configured to update the Clock Counter count during program simulation and to use that count for time marking, thereby generating the software processing timing. In addition, for code that depends on external interaction (for example, code whose delay type is the data storage type), the actual processing delay is obtained by having the simulation process actually simulate the interaction with the outside in the intermediate processing module, so that the software processing timing obtained through simulation is more accurate.
As described above, according to the requirement for precision of the simulation timing sequence, the embodiment of the present application generates Suspend trap (Suspend trap) data bound to the code block TB, and further can record the processing time of the simulation process for processing the host code, thereby realizing generation of the software processing timing sequence. Meanwhile, the actual processing time delay can be obtained based on the time delay information, so that the software processing time sequence obtained by simulation is more accurate.
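The two-part layout of the Suspend Trap described above can be pictured with a minimal sketch. The class and field names (`Suspend`, `TranslationBlock`, `host_ops`) and the string marker `"TRAP"` are assumptions made for illustration only; the application does not specify a concrete data format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Suspend:
    # Hypothetical layout of one Suspend entry.
    cycles: int = 0  # cycle count for the code before the matching Trap
    # Externally dependent accesses as (target address, access type, data amount).
    accesses: List[Tuple[int, str, int]] = field(default_factory=list)


@dataclass
class TranslationBlock:
    host_ops: List[str]      # translated host code with embedded "TRAP" markers
    suspends: List[Suspend]  # Suspend data appended at the end of the block

    def check(self) -> bool:
        # Trap and Suspend must be in one-to-one correspondence.
        return self.host_ops.count("TRAP") == len(self.suspends)


tb = TranslationBlock(
    host_ops=["add", "load", "TRAP", "mul", "store", "TRAP"],
    suspends=[Suspend(1, [(0x1000, "r", 4)]), Suspend(3, [(0x2000, "w", 4)])],
)
```

Keeping the Suspend entries in a list whose order matches the order of the Trap markers is one simple way to maintain the one-to-one relationship; other encodings are equally possible.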
Step 204: in response to a plurality of simulation processes executing the executable code in parallel, record the processing times at which the simulation processes process the code, generate a processing timing sequence, and determine, according to the processing timing sequence, the virtual simulation time required by each simulation process to process the code.
Specifically, after the executable code is obtained, and in response to the plurality of simulation processes executing the executable code in parallel, the processing timing sequence of each simulation process may be obtained, and from it the virtual simulation time required by the simulation process to process the code. In the embodiment of the present application, the parallel execution of the executable code by the plurality of simulation processes may mean that the simulation engine drives a plurality of task modules (SC_THREAD) to simulate in parallel.
In one embodiment, the simulation process is used to encapsulate, in process form, the processing task of the instruction-level simulator ISS. Illustratively, the processing task may refer to cpu_exec(). The simulation engine in the embodiment of the present application may be configured to drive a plurality of simulation processes to perform instruction-level simulation in parallel, where instruction-level simulation may refer to simulating the instruction-level behavior (ISS) of a processor architecture.
Optionally, the simulation process in this embodiment may be an SC_THREAD (a SystemC process) that takes cpu_exec() as its main task. The simulation engine in the embodiment of the present application may refer to a SystemC time simulation engine (SystemC Time Engine), for example, the underlying SystemC simulation engine, which thereby encapsulates cpu_exec() in SC_THREAD form; when cpu_exec() completes one round of processing, the SystemC time simulation engine may be notified to update the virtual simulation time of the current SC_THREAD.
Unlike a traditional plain cpu_exec() loop, the embodiment of the present application introduces the underlying SystemC simulation engine and executes the cpu_exec() loop inside one task module (SC_THREAD). The underlying SystemC simulation engine can then drive a plurality of task modules (SC_THREAD) to simulate in parallel while maintaining a globally uniform simulation time, so that the CPU model and other models form a co-simulation platform that supports simulating the interaction between the CPU and other modules. In this way, the embodiment of the present application embeds the SystemC simulation engine and performs co-simulation with other modules.
It should be noted that the simulation engine may include at least one module modeled in the SystemC language, and the simulation process in the embodiment of the present application is used to encapsulate such a module in process form. For example, the simulation process encapsulates the cpu_exec() function of the instruction-level simulator ISS in SC_THREAD form; that is, the embodiment of the present application introduces the SystemC simulation engine and the SC_THREAD task to encapsulate the core processing procedure of cpu_exec(). For example, one simulation may simulate a plurality of modules in parallel, including a CPU and possibly also a memory, a GPU (Graphics Processing Unit), and the like. In this case, the CPU can run as one SystemC task module concurrently with the other task modules.
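To illustrate what "a plurality of task modules sharing one globally uniform simulation time" means, the following is a minimal sketch of a cooperative scheduler written with Python generators. It is not SystemC and does not reproduce the SystemC kernel; the function `run` and the toy processes `cpu` and `memory` are hypothetical stand-ins, with each `yield` loosely playing the role of a timed wait inside an SC_THREAD.

```python
import heapq


def run(processes):
    """Drive generator-based processes against one global simulated clock.

    Each process yields the number of time units it wants to wait.
    Returns a log of (wake_time, process_name) in scheduling order.
    """
    log = []
    # Heap entries: (wake_time, sequence_number, name, generator).
    queue = [(0, i, name, gen) for i, (name, gen) in enumerate(processes)]
    heapq.heapify(queue)
    seq = len(queue)
    while queue:
        now, _, name, gen = heapq.heappop(queue)
        log.append((now, name))
        try:
            delay = next(gen)
        except StopIteration:
            continue  # this process has finished
        heapq.heappush(queue, (now + delay, seq, name, gen))
        seq += 1
    return log


def cpu():
    # Stand-in for a cpu_exec() loop: each value is the time one TB consumed.
    for tb_time in (5, 3):
        yield tb_time


def memory():
    # Stand-in for another module taking part in the co-simulation.
    for _ in range(2):
        yield 4


trace = run([("cpu", cpu()), ("mem", memory())])
```

Because every process is woken in order of its wake-up time, the log is ordered by a single clock, which is the property the co-simulation platform relies on.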
The modules, task modules, tasks, models, CPU models, and the like in the embodiments of the present application may be understood as being implemented wholly or partially by software, hardware, and a combination thereof. The modules and the like can be embedded in a hardware form or independent from a processor in the computer equipment, and can also be stored in a memory in the computer equipment in a software form, so that the processor can call and execute the corresponding operations of the modules and the like.
Further, in the case where a plurality of simulation processes execute executable code in parallel, the processing timing can be generated by recording the processing timing at which the simulation processes process the code. Wherein, the processing time can refer to the time point of the target machine code execution; according to the embodiment of the application, the time point of execution of the target machine code is recorded, so that a complete time sequence is formed.
In one embodiment, the processing time may include an execution start time and an execution end time of the host code, and a obtaining time when the simulation process obtains the actual execution time.
Specifically, based on the parallel execution of the executable code by the plurality of simulation processes proposed in the embodiment of the present application (for example, introducing the SystemC simulation engine and the SC_THREAD task to encapsulate the core processing procedure of cpu_exec()), when cpu_exec() starts to execute, in addition to maintaining the target machine CPU state GCS (Guest CPU State), a Clock Counter may be initialized to record the time points at which the target machine code is executed in subsequent steps, so as to form a complete timing sequence.
The execution start time of the host code may be a marking of the time of entering the host code (e.g., code block TB) execution, alternatively, the execution start time may refer to the time when the host code (e.g., code block TB) enters the pre-processing module (Prologue); for example, the execution start time of the host code may be recorded by marking a current PC (Program Counter) and a Clock Counter corresponding to the current code, and writing the marked current PC and Clock Counter into a Timing Database (Timing Database). The marking in the embodiment of the present application may refer to recording an instantaneous value of a Clock Counter which is stepped forward all the time at a certain code position. Writing to the time-series database may refer to writing the instantaneous value to the database for reproducing the execution process with the time information in future analysis.
The execution end time of the host code may be understood as the time at which a given host code (for example, a code block TB) finishes executing and exits, for example, the time at which cpu_exec() enters the post-processing module (Epilogue), saves the GCS state, restores the host register state, and enters the compensation module (Suspend Compensation). The compensation module (Suspend Compensation) is used to compensate the virtual simulation time of all the TBs in the mapping table (Map Table) that were simulated between the most recent pre-processing module (Prologue) and the post-processing module (Epilogue).
The time at which the simulation process obtains the actual execution time may refer to the time at which the actual execution time has been obtained by executing according to the delay data; at this time, the current PC and the Clock Counter value corresponding to the current code may be marked and written into the timing database (Timing Database).
In the above, the Clock Counter count is updated in the program simulation process, and the marking is performed by using the count, so that the generation of the software processing time sequence is realized.
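The marking described above amounts to storing snapshots of a counter that keeps advancing. A minimal sketch follows; the class `TimingDatabase`, the event labels, and the concrete PC and cycle values are all made up for illustration and are not taken from the application.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TimingDatabase:
    # Each record is (event label, PC, Clock Counter snapshot).
    records: List[Tuple[str, int, int]] = field(default_factory=list)

    def mark(self, event: str, pc: int, clock_counter: int) -> None:
        # Store an instantaneous value; the counter itself keeps running.
        self.records.append((event, pc, clock_counter))


db = TimingDatabase()
clock_counter = 0

db.mark("tb_start", pc=0x8000, clock_counter=clock_counter)  # entering Prologue
clock_counter += 12                                          # cycles up to the Trap
db.mark("trap", pc=0x8010, clock_counter=clock_counter)      # delay data obtained
clock_counter += 7                                           # rest of the TB
db.mark("tb_end", pc=0x8020, clock_counter=clock_counter)    # entering Epilogue
```

Replaying `db.records` in order reproduces the execution with its time information, which is the purpose the text gives for writing the marks to the database.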
In one embodiment, the processing timing is in clock cycles; the virtual simulation time may be a difference between the execution start time and the execution end time.
In particular, the Clock cycle may refer to the Clock cycle number, and the virtual simulation time may also be in Clock cycles, and alternatively, the virtual simulation time may be understood as the Clock cycle consumed by the current CPU simulation.
Taking a code block TB as an example of the host code, when the current code block TB has no remaining part, execution enters the post-processing module (Epilogue), saves the GCS state, restores the host register state, and enters the compensation module (Suspend Compensation). There, the time mark recorded in the timing database (Timing Database) when execution of the TB began is read, the current value of the Clock Counter is read, and the former is subtracted from the latter to obtain the total number of clock cycles (the virtual simulation time) required by the TB simulation that has just finished.
As described above, the embodiment of the present application introduces the SystemC simulation engine and the SC_THREAD task to encapsulate the core processing procedure of cpu_exec(), and designs a Clock Counter to record the software processing timing; combining the two realizes the time update of the SystemC simulation engine.
Step 206: update each virtual simulation time so that the simulation processes remain time-synchronized.
Specifically, once the virtual simulation time corresponding to each simulation process has been obtained, the simulation processes can be kept time-synchronized by updating the virtual simulation times. Taking as an example a simulation process that encapsulates the processing task cpu_exec() of the instruction-level simulator ISS in process form, the CPU model can thus be kept consistent in time with the other modules taking part in the co-simulation.
Optionally, updating the virtual simulation time may refer to compensating the virtual simulation time, for example, the virtual simulation time of all TBs existing in the mapping Table (Map Table) between the last pre-processing module (Prologue) to the post-processing module (Epilogue) for performing simulation may be compensated by the Compensation module (Suspend Compensation) on the basis of exiting the post-processing module (Epilogue).
In one embodiment, the emulation process is used to encapsulate the processing tasks of the instruction level emulator ISS in process form;
the step of updating each virtual simulation time to keep time synchronization between each simulation process may include:
and updating the virtual simulation time corresponding to the simulation process to which the processing task belongs based on the execution frequency of the processing task.
Specifically, a processing task may refer to a processing logic function of the instruction-level simulator ISS, for example, cpu_exec(); the virtual simulation time corresponding to the simulation process to which the processing task belongs may then be updated based on the execution frequency of the processing task. Taking the host code as a code block TB and the virtual simulation time as a total number of clock cycles as an example, after the total number of clock cycles required by the TB simulation that has just finished is obtained, the SystemC simulation time of the CPU model to which the SC_THREAD belongs is updated according to the execution frequency of the current CPU, so that the CPU model remains consistent in time with the other modules taking part in the co-simulation.
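As a worked example of this update, the sketch below first computes the cycles consumed by a TB as the difference between two Clock Counter readings and then converts them to simulated time at a given CPU frequency. The function names and the numbers (a start mark of 100, a current count of 340, an 800 MHz clock) are assumptions chosen for illustration.

```python
def tb_cycles(start_mark: int, clock_counter_now: int) -> int:
    """Total clock cycles consumed by the TB that has just finished."""
    return clock_counter_now - start_mark


def cycles_to_ns(cycles: int, cpu_freq_hz: int) -> float:
    """Convert a cycle count to nanoseconds at the given CPU frequency."""
    return cycles * 1e9 / cpu_freq_hz


cycles = tb_cycles(start_mark=100, clock_counter_now=340)
elapsed_ns = cycles_to_ns(cycles, cpu_freq_hz=800_000_000)
```

With these values the TB consumed 240 cycles, which at 800 MHz corresponds to 300 ns of simulated time by which the CPU model's process would be advanced.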
Step 208, outputting a time sequence simulation result under the condition that the executable code is executed; and the time sequence simulation result is used for representing the execution time sequence of the simulation host computer aiming at the target machine code.
Specifically, in the case of completion of execution of the executable code, the timing simulation result may be output; the timing simulation result can be used for representing the execution timing of the simulation host machine aiming at the target machine code. Further, the execution Timing of the simulation host for the target machine code may refer to a software execution Timing of the simulation host for the target machine program, and based on the embodiment of the present application, if the current target machine program is completely executed, the simulation may be stopped, and the software execution Timing in the Timing Database (Timing Database) may be output.
Therefore, the timing simulation method of the embodiment of the present application not only preserves simulation speed but also accurately reflects the software execution timing and supports software performance analysis. By embedding the SystemC simulation engine, it performs co-simulation with other modules and supports the verification of systems with strict real-time interaction requirements (for example, a Modem baseband processing chip). The embodiment of the present application can provide the capabilities of both a functional simulation scheme and a CA simulation scheme within a single simulation scheme, and can serve as a unified platform for function development and performance analysis throughout the software life cycle.
In one embodiment, the host code is a block of code executable by the instruction level emulator ISS; as shown in fig. 3, the step of converting the target machine code into the executable code corresponding to the emulated host may include:
step 302, analyzing a target machine code, and confirming a time delay type of the target machine code; determining time delay data based on the time delay type of the target machine code;
step 304, segmenting the target machine code according to the quantity of the time delay data according to the requirement of the simulation precision to obtain each segmented code, and inserting a code segment identifier at the end of each segmented code;
step 306, performing binary conversion on the target machine code inserted with the code segment identifier to obtain a code block;
and step 308, adding the time delay data corresponding to each segmented code to the code block to obtain the host code.
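Steps 302, 304 and 308 can be sketched as follows; the binary conversion of step 306 is omitted. The instruction encoding (tuples tagged `"alu"` or `"mem"`), the function `segment`, and the parameter `max_ct_per_segment` are hypothetical: the application only states that segmentation follows the number of delay data items and the simulation precision requirement, so treating the precision as a cap on accesses per segment is an assumption.

```python
from typing import List, Tuple, Union

Alu = Tuple[str, int]            # ("alu", cycles)
Mem = Tuple[str, int, str, int]  # ("mem", target address, access type, data amount)


def segment(code: List[Union[Alu, Mem]], max_ct_per_segment: int):
    """Return (ops_with_traps, suspend_data).

    ops_with_traps: instruction kinds with "TRAP" appended after each segment.
    suspend_data:   one (cycle count, [access triplets]) entry per TRAP.
    """
    ops, suspends = [], []
    cn, cts = 0, []
    pending = False  # True while the open segment holds untrapped instructions

    def close():
        nonlocal cn, cts, pending
        ops.append("TRAP")
        suspends.append((cn, cts))
        cn, cts, pending = 0, [], False

    for insn in code:
        ops.append(insn[0])
        pending = True
        if insn[0] == "alu":
            cn += insn[1]          # fixed-latency code: accumulate cycles
        else:
            cts.append(insn[1:])   # externally dependent access: keep the triplet
            if len(cts) == max_ct_per_segment:
                close()
    if pending:
        close()                    # trailing instructions form a final segment
    return ops, suspends


program = [("alu", 1), ("mem", 0x1000, "r", 4), ("alu", 2),
           ("mem", 0x2000, "w", 8), ("alu", 1)]
ops, suspends = segment(program, max_ct_per_segment=1)
```

A smaller `max_ct_per_segment` yields more Trap points and therefore finer-grained timing at the cost of more exits from the translated code; a larger value trades precision for speed.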
In particular, the host code may be a code block, i.e. a code block TB, executable by the instruction level emulator ISS. The process of converting the target machine code into the executable code corresponding to the simulation host in the embodiment of the application can be understood as a process of analyzing and confirming the time delay information item by item for the target machine code, further performing corresponding conversion on the target machine code carrying corresponding time delay information (for example, a code segment identifier) to obtain a code block TB, and then writing the corresponding time delay information (for example, time delay data) and the code block TB into a cache corresponding to a mapping Table (Map Table). The software processing time sequence is generated, and the actual processing time delay can be obtained based on the time delay information, so that the software processing time sequence obtained through simulation is more accurate.
Taking the host code as a code block TB and the delay information as a Suspend Trap ST as an example, the corresponding procedure may first be started to call the micro-delay generator TDG. The TDG can analyze the target machine code item by item to obtain the delay characteristic of each code item and thereby determine the delay type to which the target machine code belongs; that is, the delay of software processing is handled as one of two types according to whether the code depends on external interaction. The delay data is then determined according to the delay type. Next, according to the simulation precision requirement, the target machine code is segmented by the number of delay data items it contains, and a code segment identifier (for example, a Trap instruction) is inserted at the end of each segment. The microcode generator TCG is then called to perform the target machine ISA to TCG code to host ISA conversion on the code into which the TDG has inserted the Trap instructions. Finally, the delay data corresponding to each segmented code is combined into Suspend Trap information and written, together with the code block TB, into the Translation Cache.
In one embodiment, the latency type may include an arithmetic logic type and a data storage type; the latency data may include a number of cycles corresponding to an arithmetic logic type, and a triplet of information corresponding to a data storage type. In one embodiment, the triplet information may include the target address, the access type, and the amount of data.
Specifically, target machine code whose delay type is the arithmetic logic type can be understood as arithmetic logic code, and target machine code whose delay type is the data storage type can be understood as data storage code that interacts with the outside. Taking a CPU as an example, its instructions can broadly be divided into two kinds: arithmetic logic operations, and memory accesses for instructions and data. Arithmetic logic operations are performed inside the CPU, whereas instruction and data accesses need to interact with memory outside the CPU.
Further, the TDG analyzes the target machine code item by item to obtain the delay characteristic of each code item. For arithmetic logic code, the delay characteristic is a determined cycle count Cn; for data storage code that interacts with the outside, it may be a triplet Ct. According to the simulation precision requirement, the target machine code can then be segmented by the number of triplets Ct it contains, and a Trap instruction is inserted at the end of each segment. The triplet information may refer to a triplet Ct <Addr, rw, load> consisting of <target address, access type, data amount>.
Then, the microcode generator TCG can be called to perform the target machine ISA to TCG code to host ISA conversion on the code into which the TDG has inserted the Trap instructions. The Cn and Ct corresponding to each segmented code are combined into Suspend Trap information and written, together with the code block TB, into the Translation Cache. In the embodiment of the present application, the delay of software processing is handled as one of two types; for Ct-type instructions that depend on external interaction, the actual processing delay can be obtained by having the simulation process actually simulate the interaction with the outside, so that the software processing timing obtained through simulation is more accurate.
In one embodiment, the triplet information may include the target address, the access type, and the data amount; the step of determining, according to the processing timing sequence, the virtual simulation time required by the simulation process to process the code may include:
responding to the simulation process executed to the identification position, and if the period number corresponding to the current section code is read, accumulating the period number to the virtual simulation time;
and responding to the execution of the simulation process to the identification position, if the triple information corresponding to the current section code is read, acquiring the access time of the simulation process for the data volume based on the target address, and accumulating the access time to the virtual simulation time.
Specifically, while the host code is being executed, if execution reaches an inserted code segment identifier (for example, the position of an inserted Trap instruction), it can jump out to the procedure that obtains the corresponding delay data, for example, to the intermediate processing module, where the simulation process actually simulates the interaction with the outside to obtain the actual processing delay.
Taking the host code as a code block TB and the delay information as a Suspend Trap ST as an example, the triplet information may include the target address, the access type, and the data amount, that is, the triplet Ct <Addr, rw, load>. When execution of the host code reaches the position of an inserted Trap instruction, the GCS state is saved, the host register state is restored, and execution jumps to the intermediate processing module. The execution of the intermediate processing module may include the following processes:
(1) Read the Suspend Trap information and accumulate the Cn of the current segment code into the Clock Counter, that is, into the virtual simulation time. (2) Read the triplet Ct, generate the payload data for the port access according to the data amount described in Ct, and issue a read or write access of the corresponding rw type from the corresponding port of the SC_THREAD to the target address described in Ct. The time Ctn required for the access to complete is accumulated into the Clock Counter, that is, into the virtual simulation time.
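Processes (1) and (2) can be sketched as a single function that services one Trap. The class `FakeMemoryPort` and its latency formula are invented purely so that the example runs; in the described scheme the access time Ctn would come from the external module being simulated, not from a fixed formula.

```python
class FakeMemoryPort:
    """Hypothetical external model whose latency depends on access type and size."""

    def access(self, addr: int, rw: str, size: int) -> int:
        payload = bytes(size)          # payload sized from the triplet's data amount
        base = 10 if rw == "r" else 6  # made-up base latencies, in cycles
        return base + len(payload) // 4


def handle_trap(clock_counter: int, cn: int, cts, port) -> int:
    """Return the updated Clock Counter after one Trap has been serviced."""
    clock_counter += cn                               # (1) accumulate Cn
    for addr, rw, size in cts:                        # (2) replay each triplet Ct
        clock_counter += port.access(addr, rw, size)  # accumulate Ctn
    return clock_counter


counter = handle_trap(0, cn=3,
                      cts=[(0x1000, "r", 8), (0x2000, "w", 4)],
                      port=FakeMemoryPort())
```

Here the counter ends at 22 cycles: 3 for the fixed-latency code, 12 for the 8-byte read, and 7 for the 4-byte write. Swapping in a different port model changes only the Ctn contributions, which is the point of separating the two delay types.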
Furthermore, the virtual simulation time in this embodiment includes an actual processing delay (for example, an actual execution time of the simulation host for the target machine code) obtained by processing the corresponding host machine code by the simulation process based on the delay information. According to the method and the device, the interaction with the outside can be truly simulated to obtain the actual processing time delay, so that the software processing time sequence obtained through simulation is more accurate.
To further explain the solution of the embodiment of the present application, a specific example is described below. As shown in fig. 4, taking as an example the introduction of the SystemC simulation engine and the SC_THREAD task to encapsulate the core processing procedure of cpu_exec(), the embodiment of the present application proposes to extend the target machine CPU state GCS (Guest CPU State) by adding a Clock Counter module that records the clock cycles consumed by the current CPU simulation. Further, in parallel with the microcode generator TCG, the micro-delay generator TDG analyzes the target machine code synchronously while it is being converted into the code block TB, and generates the Suspend Trap ST corresponding to the code block TB.
The ST may include two parts. One part is a Trap instruction embedded in the host code of the TB, so that when execution of the host code reaches the Trap position, it can jump out of the TB code and enter the intermediate processing module. The other part is a Suspend data segment appended to the end of the TB, which describes the actual execution delay of the target machine code corresponding to the host code located before the Trap position. Suspend and Trap are in one-to-one correspondence. In addition to the pre-processing module (Prologue) executed on entering a code block TB and the post-processing module (Epilogue) executed on exiting a TB, an intermediate processing module is added to implement the delay of the simulated code and to update the virtual simulation time of the SC_THREAD. Furthermore, on exiting the post-processing module (Epilogue), a compensation module (Suspend Compensation) is added to compensate the virtual simulation time of all the TBs in the Map Table that were simulated between the most recent pre-processing module (Prologue) and the post-processing module (Epilogue).
As shown in fig. 4, the simulation execution process of the embodiment of the present application is represented by a thin line with an arrow, the transcoding process is represented by a thick dotted line with an arrow, and the newly added modules participating in the simulation execution and transcoding are represented by black boxes. The time sequence simulation method of the embodiment of the application can comprise the following steps:
Step 1: when the ISS processing unit cpu_exec() starts executing, in addition to maintaining the target machine CPU state GCS, a Clock Counter may be initialized, which may be used to record the time points at which the target machine code is executed in subsequent steps, thereby forming a complete timing sequence.
Step 2: after system initialization is completed, the mapping table (Map Table) is looked up according to the registers in the current GCS to determine whether a corresponding code block TB is available for execution in the Translation Cache. If not, the following procedure is started. (a) Call the micro-delay generator TDG. The TDG first analyzes the target machine code item by item to obtain the delay characteristic of each code item. For arithmetic logic code, the delay characteristic is a determined cycle count Cn; for data storage code that interacts with the outside, it is a triplet Ct <Addr, rw, load> consisting of <target address, access type, data amount>. Next, according to the simulation precision requirement, the TDG can segment the target machine code by the number of triplets Ct it contains and insert a Trap instruction at the end of each segment. (b) Call the microcode generator TCG to perform the target machine ISA to TCG code to host ISA conversion on the code into which the TDG has inserted the Trap instructions. The Cn and Ct corresponding to each segmented code are combined into Suspend Trap information and written, together with the code block TB, into the Translation Cache.
Step 3: enter the pre-processing module (Prologue), mark the current PC (Program Counter) and the Clock Counter value corresponding to the current code, and write them into the timing database (Timing Database). Save the host register state of the instruction-level simulator ISS, jump to the TB pointed to by the current PC, and execute the converted target machine code.
Step 4: when execution of the target machine code reaches the position of an inserted Trap instruction, save the GCS state, restore the host register state, and jump to the intermediate processing module. The intermediate processing module may perform the following processes: (a) Read the Suspend Trap information and accumulate the Cn of the current code segment into the Clock Counter. (b) Read the triplet Ct, generate the payload data for the port access according to the data amount described in Ct, and issue a read or write access of the corresponding rw type from the corresponding port of the SC_THREAD to the target address described in Ct. The time Ctn required for the access to complete is accumulated into the Clock Counter. (c) After all the Ct have been executed, mark the current PC and the Clock Counter value corresponding to the current code and write them into the timing database (Timing Database). Then jump to the remaining part of the TB and continue execution; if the current TB has no remaining part, proceed to step 5.
Step 5: enter the post-processing module (Epilogue), save the GCS state, and restore the host register state. Saving the GCS state and restoring the host register state ensure that switching between the host environment and the target machine works correctly: if the host state were not saved when switching to the target machine, or vice versa, it would be impossible to switch back afterwards. The saving and restoring in the embodiment of the present application may apply to the status bits, that is, the values, of the registers. It should be noted that the host register state of the ISS refers to the register state of the physical CPU of the server environment that executes the simulation.
Step 6: enter the compensation module (Suspend Compensation), read the time mark recorded in the timing database (Timing Database) when execution of the code block TB began, read the current value of the Clock Counter, and subtract the former from the latter to obtain the total number of clock cycles required by the TB simulation that has just finished. Then update the SystemC simulation time of the CPU model to which the simulation process (SC_THREAD) belongs according to the execution frequency of the current CPU, so that the CPU model remains consistent in time with the other modules taking part in the co-simulation.
Step 7: return to cpu_exec() of the SC_THREAD task. If the current target machine program has not finished executing, return to step 1; if it has finished executing, stop the simulation and output the software execution timing stored in the timing database (Timing Database).
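The control flow of steps 1 to 7 can be condensed into a short sketch that focuses on the translate-on-miss lookup and the loop exit. Everything here is a simplifying assumption: `run_program`, `guest_blocks`, and `pc_trace` are hypothetical names, the Map Table and Translation Cache are collapsed into one dictionary, `pc_trace` stands in for the control flow the target program would produce, and steps 4 to 6 are reduced to adding a precomputed cycle count.

```python
def run_program(guest_blocks, pc_trace, translate):
    """Return (timing_db, number_of_translations) for a toy guest program.

    guest_blocks: dict mapping a block's entry PC to its guest code.
    pc_trace:     sequence of block entry PCs, in execution order.
    translate:    function turning guest code into a cycle cost for the TB.
    """
    translation_cache = {}
    timing_db = []
    clock_counter = 0                      # step 1: initialise the Clock Counter
    translations = 0
    for pc in pc_trace:                    # step 7: repeat until the program ends
        if pc not in translation_cache:    # step 2: translate only on a miss
            translation_cache[pc] = translate(guest_blocks[pc])
            translations += 1
        timing_db.append((pc, clock_counter))   # step 3: mark PC and counter
        clock_counter += translation_cache[pc]  # steps 4 to 6, collapsed
    timing_db.append((None, clock_counter))     # final mark when simulation stops
    return timing_db, translations


# Toy input: each block is a list of per-instruction cycle costs.
guest_blocks = {0x0: [1, 2], 0x10: [4]}
timing_db, translations = run_program(guest_blocks, [0x0, 0x10, 0x0], translate=sum)
```

The block at `0x0` runs twice but is translated once, which is the reason the lookup in step 2 precedes the call to the TDG and TCG.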
Therefore, the time sequence simulation method of the embodiment of the application not only ensures the simulation speed, but also can accurately reflect the software execution time sequence and support the performance analysis of the software. Embedding a SystemC simulation engine, and carrying out joint simulation with other modules. And the verification of a system (such as a Modem baseband processing chip) with strict real-time interactivity requirements is supported.
Further, the embodiment of the application encapsulates the cpu _ exec () core processing process by introducing a SystemC simulation engine and an SC _ THREAD task; and designing a Clock Counter to record the software processing time sequence with the Clock period as a unit. The two are combined to realize the time update of the systemC simulation engine. According to the embodiment of the application, a micro-delay generator TDG is designed, and suspended Trap (Suspend Trap) data bound with a code block TB is generated according to the requirement on the precision of a simulation time sequence; and designing an interrupt module to update Clock Counter count in the program simulation process, and carrying out practical marking by using the count, thereby realizing the generation of software processing time sequence. According to the method and the device, the time delay of software processing is divided into two types of processing, and for Ct type instructions depending on external interaction, the actual processing time delay is obtained through the interaction with the outside through SC _ THREAD real simulation in an interaction module, so that the software processing time sequence obtained through simulation is more accurate.
It should be understood that, although the steps in the flowcharts of the embodiments described above are displayed in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in those flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; these sub-steps or stages are not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiments of the present application also provide a timing simulation apparatus for implementing the timing simulation method described above. The apparatus solves the problem in a manner similar to that described for the method, so for the specific limitations in the one or more embodiments of the timing simulation apparatus provided below, reference may be made to the limitations of the timing simulation method above; they are not repeated here.
In one embodiment, as shown in fig. 5, there is provided a timing simulation apparatus, including:
a code conversion module 510, configured to convert target machine code into executable code corresponding to the simulation host;
a simulation time obtaining module 520, configured to, in response to a plurality of simulation processes executing the executable code in parallel, record the processing time at which the simulation processes process the code, generate a processing timing, and determine, according to the processing timing, the virtual simulation time required by the simulation processes to process the code;
a simulation time updating module 530, configured to update each virtual simulation time so as to keep the simulation processes synchronized in time;
a simulation result output module 540, configured to output a timing simulation result once the executable code has finished executing, wherein the timing simulation result represents the execution timing of the simulation host for the target machine code.
In one embodiment, the executable code includes host code carrying delay information; the virtual simulation time includes the actual processing delay obtained when the simulation process processes the corresponding host code based on the delay information.
In one embodiment, the delay information comprises at least one code segment identifier determined according to the simulation precision requirement and at least one item of delay data determined according to the delay characteristics of the target machine code; the actual processing delay comprises the actual execution time of the simulation host for the target machine code;
wherein the at least one code segment identifier corresponds one-to-one to the at least one item of delay data; the code segment identifier instructs the simulation process to break out of its current processing of the host code at the identifier position so as to obtain the corresponding delay data; and the delay data represents the actual execution time of the target machine code corresponding to the host code located before the identifier position.
In one embodiment, the processing time includes an execution start time of the host code and an execution end time at which the simulation process obtains the actual execution time.
In one embodiment, the processing timing is in clock cycles; the virtual simulation time is the difference between the execution starting time and the execution ending time.
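As a worked illustration of this embodiment, the sketch below reads a cycle counter when a host code block starts and again once the delay for that block has been obtained, and takes the difference as the virtual simulation time. The `ClockCounter` class and the numbers are illustrative assumptions, not part of the application.

```python
class ClockCounter:
    """Hypothetical counter that records timing in clock cycles."""

    def __init__(self):
        self.cycles = 0

    def advance(self, n):
        if n < 0:
            raise ValueError("cycle count cannot be negative")
        self.cycles += n

def virtual_simulation_time(start_cycle, end_cycle):
    """Virtual simulation time in cycles: end reading minus start reading."""
    return end_cycle - start_cycle

counter = ClockCounter()
counter.advance(100)              # earlier activity in the simulation
start = counter.cycles            # execution start time of the block
counter.advance(12)               # delay obtained for the block
end = counter.cycles              # execution end time of the block
elapsed = virtual_simulation_time(start, end)
```

Keeping both readings in cycles means the difference can be accumulated directly, without any unit conversion at this stage.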
In one embodiment, the host code is a code block executable by the instruction-level simulator (ISS);
the code conversion module is configured to: analyze the target machine code and determine the delay type of the target machine code; determine the delay data based on the delay type of the target machine code; segment the target machine code according to the simulation precision requirement and the number of items of delay data to obtain segmented codes, and insert a code segment identifier at the end of each segmented code; perform binary conversion on the target machine code into which the code segment identifiers have been inserted to obtain a code block; and add the delay data corresponding to each segmented code to the code block to obtain the host code.
In one embodiment, the code segment identifier is a Trap instruction; the delay data is a Suspend data segment attached to the end of the code block.
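The conversion steps above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the converter of this application: the instruction names, the cycle table `ALU_CYCLES`, the `HostCode` container and the rule that every memory access forms its own segment are all hypothetical, and the `host:` prefix merely stands in for real binary translation. The sketch also performs identifier insertion and translation in a single pass for brevity.

```python
from dataclasses import dataclass, field

# Assumed per-instruction cycle costs for arithmetic/logic instructions.
ALU_CYCLES = {"add": 1, "sub": 1, "mul": 3}
# Assumed memory instructions and the access type each one implies.
MEMORY_OPS = {"load": "read", "store": "write"}

@dataclass
class HostCode:
    ops: list = field(default_factory=list)           # translated block with TRAP markers
    suspend_data: list = field(default_factory=list)  # one entry per TRAP, in order

def convert(target_code, max_alu_per_segment):
    """Segment target instructions, insert a TRAP after each segment and
    append the matching delay data.

    target_code: list of tuples, ("add",) for ALU instructions or
    ("load", address, size) for memory instructions.
    """
    host = HostCode()
    pending_cycles = 0
    pending_count = 0

    def close_alu_segment():
        nonlocal pending_cycles, pending_count
        if pending_count:
            host.ops.append("TRAP")
            host.suspend_data.append(("cycles", pending_cycles))
            pending_cycles = 0
            pending_count = 0

    for instr in target_code:
        name = instr[0]
        if name in MEMORY_OPS:
            close_alu_segment()               # a memory access gets its own segment
            _, address, size = instr
            host.ops.append(f"host:{name}")   # placeholder for binary translation
            host.ops.append("TRAP")
            host.suspend_data.append(("access", address, MEMORY_OPS[name], size))
        else:
            host.ops.append(f"host:{name}")
            pending_cycles += ALU_CYCLES[name]
            pending_count += 1
            if pending_count == max_alu_per_segment:
                close_alu_segment()
    close_alu_segment()
    return host

program = [("add",), ("mul",), ("load", 0x1000, 4), ("sub",)]
host_code = convert(program, max_alu_per_segment=2)
```

A smaller `max_alu_per_segment` yields more TRAP markers and therefore finer timing resolution at the cost of more interruptions, which is one way to read the trade-off behind the simulation precision requirement.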
In one embodiment, the simulation process encapsulates the processing task of the instruction-level simulator (ISS) in process form; and the simulation time updating module is configured to update, based on the execution frequency of the processing task, the virtual simulation time corresponding to the simulation process to which the processing task belongs.
In one embodiment, the delay types include an arithmetic logic type and a data storage type; the delay data includes a number of cycles corresponding to the arithmetic logic type, and triplet information corresponding to the data storage type.
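One possible reading of this embodiment, offered only as an assumption, is that "execution frequency" means the clock frequency at which the processing task runs, so that a cycle count is converted to time by dividing by that frequency. The sketch below follows that reading; `SimProcess` and `cycles_to_ns` are hypothetical names, and `Fraction` is used to avoid floating-point rounding.

```python
from fractions import Fraction

def cycles_to_ns(cycles, frequency_hz):
    """Convert a cycle count to nanoseconds for a task clocked at frequency_hz."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return Fraction(cycles * 1_000_000_000, frequency_hz)

class SimProcess:
    """Hypothetical simulation process holding its own virtual time."""

    def __init__(self, frequency_hz):
        self.frequency_hz = frequency_hz
        self.virtual_time_ns = Fraction(0)

    def consume(self, cycles):
        # The same cycle count costs more time on a slower clock.
        self.virtual_time_ns += cycles_to_ns(cycles, self.frequency_hz)

fast = SimProcess(frequency_hz=1_000_000_000)   # assumed 1 GHz task
slow = SimProcess(frequency_hz=500_000_000)     # assumed 500 MHz task
fast.consume(10)
slow.consume(10)
```

Expressing every process's virtual time in the same unit is what lets tasks with different clocks be compared on one time axis.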
In one embodiment, the triplet information includes a target address, an access type, and a data volume;
the simulation time obtaining module is configured to: in response to the simulation process executing to the identifier position, if the number of cycles corresponding to the current segmented code is read, accumulate the number of cycles into the virtual simulation time; and in response to the simulation process executing to the identifier position, if the triplet information corresponding to the current segmented code is read, obtain, based on the target address, the access time of the simulation process for the data volume, and accumulate the access time into the virtual simulation time.
The modules in the above timing simulation apparatus can be implemented wholly or partially by software, hardware, or a combination thereof. The modules can be embedded in, or be independent of, a processor in the computer device in hardware form, or can be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
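The two accumulation paths can be sketched as below. The tagged tuples `("cycles", n)` and `("access", address, access_type, size)`, the 4-byte word size and the static `MEMORY_MODEL` table are assumptions made for illustration. In the application, the access time for the data storage type comes from simulated interaction with other modules; the lookup table here only stands in for that interaction.

```python
def access_latency(address, access_type, size, memory_model):
    """Look up an access time in cycles from an assumed memory model.

    memory_model: list of (start, end, read_cycles_per_word, write_cycles_per_word);
    a 4-byte word is assumed.
    """
    words = (size + 3) // 4
    for start, end, read_cost, write_cost in memory_model:
        if start <= address < end:
            per_word = read_cost if access_type == "read" else write_cost
            return per_word * words
    raise LookupError(f"address {address:#x} is not mapped")

def on_trap(virtual_time, delay_data, memory_model):
    """Accumulate the delay of one segment into the virtual simulation time."""
    if delay_data[0] == "cycles":
        # Arithmetic logic type: the cycle count is known in advance.
        return virtual_time + delay_data[1]
    # Data storage type: the triplet is resolved into an access time.
    _, address, access_type, size = delay_data
    return virtual_time + access_latency(address, access_type, size, memory_model)

MEMORY_MODEL = [
    (0x0000, 0x1000, 1, 1),     # assumed tightly coupled memory
    (0x1000, 0x9000, 20, 30),   # assumed external memory
]

t = 0
for entry in [("cycles", 4), ("access", 0x1000, "read", 8), ("cycles", 1)]:
    t = on_trap(t, entry, MEMORY_MODEL)
```

Separating the two paths keeps the common case (a precomputed cycle count) cheap, while only memory accesses pay for a lookup or a simulated transaction.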
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 6. The computer device includes a processor, a memory, an Input/Output interface (I/O for short), and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data such as processing time sequence and the like. The input/output interface of the computer device is used for exchanging information between the processor and an external device. The communication interface of the computer device is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement a timing simulation method.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 7. The computer apparatus includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory and the input/output interface are connected by a system bus, and the communication interface, the display unit and the input device are connected by the input/output interface to the system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operating system and the computer program to run on the non-volatile storage medium. The input/output interface of the computer device is used for exchanging information between the processor and an external device. The communication interface of the computer device is used for communicating with an external terminal in a wired or wireless manner, and the wireless manner can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a timing simulation method. The display unit of the computer device is used for forming a visual picture and can be a display screen, a projection device or a virtual reality imaging device. The display screen can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the configurations shown in fig. 6 and 7 are block diagrams of only some of the configurations relevant to the present disclosure and do not limit the computer devices to which the present disclosure may be applied; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the above timing simulation method when executing the computer program.
In one embodiment, a timing simulation system is provided, on which a simulation engine of a baseband chip runs; the simulation engine drives a plurality of simulation processes to execute instruction-level simulation in parallel;
the timing simulation system comprises a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the above timing simulation method when executing the computer program.
In one embodiment, the simulation engine comprises at least one module modeled in the SystemC language; the simulation process encapsulates the module in process form.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the above-mentioned timing simulation method.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the above-described timing simulation method.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, databases, or other media used in the embodiments provided herein can include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others. The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processors referred to in the various embodiments provided herein may be, without limitation, general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, or the like.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered to fall within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the appended claims.

Claims (16)

1. A method of timing simulation, the method comprising:
converting target machine code into executable code corresponding to a simulation host;
in response to a plurality of simulation processes executing the executable code in parallel, recording the processing time at which the simulation processes process the code, generating a processing timing, and determining, according to the processing timing, the virtual simulation time required by the simulation processes to process the code;
updating each virtual simulation time so as to keep the simulation processes synchronized in time;
once the executable code has finished executing, outputting a timing simulation result, wherein the timing simulation result represents the execution timing of the simulation host for the target machine code.
2. The method of claim 1, wherein the executable code comprises host code carrying delay information; the virtual simulation time includes the actual processing delay obtained when the simulation process processes the corresponding host code based on the delay information.
3. The method of claim 2, wherein the delay information comprises at least one code segment identifier determined according to a simulation precision requirement and at least one item of delay data determined according to delay characteristics of the target machine code; the actual processing delay comprises the actual execution time of the simulation host for the target machine code;
wherein the at least one code segment identifier corresponds one-to-one to the at least one item of delay data; the code segment identifier instructs the simulation process to break out of its current processing of the host code at an identifier position so as to obtain the corresponding delay data; and the delay data represents the actual execution time of the target machine code corresponding to the host code located before the identifier position.
4. The method according to claim 3, wherein the processing time includes an execution start time of the host code and an execution end time at which the simulation process obtains the actual execution time.
5. The method of claim 4, wherein the processing timing is in clock cycles; the virtual simulation time is the difference between the execution starting time and the execution ending time.
6. The method of claim 3, wherein the host code is a code block executable by an instruction-level simulator (ISS);
the step of converting the target machine code into the executable code corresponding to the simulation host comprises:
analyzing the target machine code and determining the delay type of the target machine code;
determining the delay data based on the delay type of the target machine code;
segmenting the target machine code according to the simulation precision requirement and the number of items of delay data to obtain segmented codes, and inserting the code segment identifier at the end of each segmented code;
performing binary conversion on the target machine code into which the code segment identifiers have been inserted to obtain the code block;
and adding the delay data corresponding to each segmented code to the code block to obtain the host code.
7. The method of claim 6, wherein the code segment identifier is a Trap instruction; the delay data is a Suspend data segment attached to the end of the code block.
8. The method according to claim 6, wherein the simulation process encapsulates the processing task of the instruction-level simulator (ISS) in process form;
the step of updating each virtual simulation time so as to keep the simulation processes synchronized in time comprises:
updating, based on the execution frequency of the processing task, the virtual simulation time corresponding to the simulation process to which the processing task belongs.
9. The method of any of claims 3 to 8, wherein the delay types include an arithmetic logic type and a data storage type; the delay data includes a number of cycles corresponding to the arithmetic logic type and triplet information corresponding to the data storage type.
10. The method of claim 9, wherein the triplet information includes a target address, an access type, and a data volume; the step of determining, according to the processing timing, the virtual simulation time required by the simulation process to process the code comprises:
in response to the simulation process executing to the identifier position, if the number of cycles corresponding to the current segmented code is read, accumulating the number of cycles into the virtual simulation time;
in response to the simulation process executing to the identifier position, if the triplet information corresponding to the current segmented code is read, obtaining, based on the target address, the access time of the simulation process for the data volume, and accumulating the access time into the virtual simulation time.
11. A timing simulation apparatus, the apparatus comprising:
a code conversion module, configured to convert target machine code into executable code corresponding to a simulation host;
a simulation time obtaining module, configured to, in response to a plurality of simulation processes executing the executable code in parallel, record the processing time at which the simulation processes process the code, generate a processing timing, and determine, according to the processing timing, the virtual simulation time required by the simulation processes to process the code;
a simulation time updating module, configured to update each virtual simulation time so as to keep the simulation processes synchronized in time;
a simulation result output module, configured to output a timing simulation result once the executable code has finished executing, wherein the timing simulation result represents the execution timing of the simulation host for the target machine code.
12. A timing simulation system, characterized in that a simulation engine of a baseband chip runs on the timing simulation system, the simulation engine being used to drive a plurality of simulation processes to execute instruction-level simulation in parallel;
the timing simulation system comprises a memory storing a computer program and a processor implementing the steps of the method of any one of claims 1 to 10 when executing the computer program.
13. The system of claim 12, wherein the simulation engine comprises at least one module modeled in the SystemC language; the simulation process encapsulates the module in process form.
14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any one of claims 1 to 10 when executing the computer program.
15. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 10.
16. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 10 when executed by a processor.
CN202211575933.4A 2022-12-09 2022-12-09 Time sequence simulation method, device and system Pending CN115858092A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211575933.4A CN115858092A (en) 2022-12-09 2022-12-09 Time sequence simulation method, device and system

Publications (1)

Publication Number Publication Date
CN115858092A true CN115858092A (en) 2023-03-28

Family

ID=85671334

Country Status (1)

Country Link
CN (1) CN115858092A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116521576A (en) * 2023-05-11 2023-08-01 上海合见工业软件集团有限公司 EDA software data processing system
CN116521576B (en) * 2023-05-11 2024-03-08 上海合见工业软件集团有限公司 EDA software data processing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination