CN118468798A - Method and device for generating check point, electronic equipment and storage medium - Google Patents
- Publication number
- CN118468798A (application CN202410931327.4A)
- Authority
- CN
- China
- Prior art keywords
- processor
- core
- synchronization
- processor core
- check point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Debugging And Monitoring (AREA)
Abstract
Embodiments of the invention provide a method and a device for generating a checkpoint, an electronic device, and a storage medium, and relate to the technical field of computers. The method comprises: acquiring a synchronization threshold corresponding to each processor core in a multi-core processor, where the synchronization threshold indicates the number of instructions that the processor core needs to execute in order to synchronize its state with the other processor cores; controlling each processor core to enter a synchronized state according to its synchronization threshold; after every processor core in the multi-core processor has entered the synchronized state, controlling each processor core to continue executing instructions; and, after each processor core reaches a checkpoint generation period, generating a checkpoint and storing it in a peripheral address space for a software simulator to read and simulate. Embodiments of the invention apply checkpoint technology to multi-core processors, which helps improve the efficiency of pre-silicon correctness evaluation of multi-core systems.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for generating a checkpoint, an electronic device, and a storage medium.
Background
In the chip design process, pre-silicon correctness verification and performance evaluation are very important steps. Current pre-silicon evaluation is mostly based on various simulation schemes. A simulation consists of two parts: the simulator and the workload. Common simulators include software-based hardware description language (Hardware Description Language, HDL) simulation, hardware simulation based on a field-programmable gate array (Field-Programmable Gate Array, FPGA), and hardware simulation based on a simulation accelerator. The workload is typically a benchmark suite.
Each simulation technique has its own limitations. For example, HDL simulation is inevitably extremely slow because it simulates the complete hardware implementation. FPGA-based simulation is faster than HDL simulation, but FPGAs are not very general-purpose and have limited capacity, and the cost of purchasing FPGAs keeps growing as the complexity and scale of the Xiangshan processor continue to rise. A simulation accelerator can speed up simulation and has sufficient capacity, but it is expensive, so very few units are typically available. Reducing the size (number of instructions) of the simulated workload therefore becomes an economically viable solution.
The currently popular schemes for reducing the workload are: 1) reducing the input set of the workload; 2) sampling the workload; and 3) fast-forwarding the simulated workload. Among these, workload sampling is currently the most common approach; it yields checkpoints that represent the behavior of the program. In some cases, the program can be simulated using only the single checkpoint with the largest weight (40M instructions), which greatly speeds up the simulation.
However, most current checkpoint generation schemes support only single-core workloads, and support for multi-core workloads is very limited, which limits the efficiency of pre-silicon correctness evaluation of multi-core systems.
Disclosure of Invention
The embodiment of the invention provides a method and a device for generating a check point, electronic equipment and a storage medium, which can solve the problem that the method for generating the check point in the related technology does not support multi-core load.
In order to solve the above problems, an embodiment of the present invention discloses a method for generating a check point, which is applied to a simulator, and the method includes:
Acquiring a synchronization threshold corresponding to each processor core in the multi-core processor, where the synchronization threshold indicates the number of instructions that the processor core needs to execute in order to synchronize its state with the other processor cores;
controlling each processor core to enter a synchronous state respectively according to the synchronous threshold value;
After each processor core in the multi-core processor enters a synchronous state, controlling each processor core to continue executing instructions;
After each processor core reaches a checkpoint generation period, generating a checkpoint, and storing the checkpoint into a peripheral address space for a software simulator to read and simulate.
Optionally, the method further comprises:
reading the memory layout information of the check point from the peripheral address space by using a software restorer;
reading a register value corresponding to each processor core according to the memory layout information;
and carrying out state recovery on the register corresponding to each processor core according to the register value so as to recover the register into the state corresponding to the check point.
Optionally, the acquiring the synchronization threshold corresponding to each processor core in the multi-core processor includes:
Generating an initial check point during the running process of the simulator;
Sending the initial checkpoint to a software simulator to enable the software simulator to predict a synchronization threshold according to the initial checkpoint;
And receiving a synchronization threshold predicted by the software simulator.
Optionally, the method further comprises:
and sending the check point to the software simulator so that the software simulator predicts a synchronization threshold corresponding to the next round of multi-core synchronization according to the check point.
Optionally, the acquiring the synchronization threshold corresponding to each processor core in the multi-core processor includes:
Acquiring a preset initial synchronization threshold value, and performing first-round multi-core synchronization based on the initial synchronization threshold value;
and receiving a synchronization threshold predicted by the software simulator, and performing next round of multi-core synchronization based on the synchronization threshold.
Optionally, the controlling each processor core to enter the synchronization state according to the synchronization threshold includes:
calculating a first target instruction number which needs to be executed when each processor core reaches a synchronization point according to the synchronization threshold value and a first preset period;
And counting the number of the executed instructions for each processor core respectively, and controlling the processor cores to enter a synchronous state under the condition that the number of the executed instructions of the processor cores is equal to the first target instruction number corresponding to the processor cores.
On the other hand, the embodiment of the invention discloses a device for generating a check point, which is applied to a simulator, and comprises the following components:
The acquisition module is used for acquiring the synchronization threshold corresponding to each processor core in the multi-core processor, where the synchronization threshold indicates the number of instructions that the processor core needs to execute in order to synchronize its state with the other processor cores;
the synchronization module is used for controlling each processor core to enter a synchronization state respectively according to the synchronization threshold value;
The control module is used for controlling each processor core in the multi-core processor to continue executing instructions after the processor cores enter a synchronous state;
And the generation module is used for generating a check point after each processor core reaches a check point generation period, and storing the check point into a peripheral address space for a software simulator to read and simulate.
Optionally, the apparatus further comprises:
the first reading module is used for reading the memory layout information of the check point from the peripheral address space by using a software restorer;
the second reading module is used for reading the register value corresponding to each processor core according to the memory layout information;
and the state recovery module is used for carrying out state recovery on the register corresponding to each processor core according to the register value so as to recover the register into the state corresponding to the check point.
Optionally, the acquiring module includes:
an initial check point generating sub-module, configured to generate an initial check point during the running process of the simulator;
an initial checkpoint transmitting sub-module, configured to transmit the initial checkpoint to a software simulator, so that the software simulator predicts a synchronization threshold according to the initial checkpoint;
and the first receiving submodule is used for receiving the synchronization threshold value predicted by the software simulator.
Optionally, the apparatus further comprises:
And the check point sending module is used for sending the check point to the software simulator so that the software simulator predicts a synchronization threshold corresponding to the next round of multi-core synchronization according to the check point.
Optionally, the acquiring module includes:
the initial threshold value acquisition sub-module is used for acquiring a preset initial synchronization threshold value and carrying out first-round multi-core synchronization based on the initial synchronization threshold value;
And the second receiving sub-module is used for receiving the synchronization threshold value predicted by the software simulator and carrying out the next round of multi-core synchronization based on the synchronization threshold value.
Optionally, the synchronization module includes:
The calculation sub-module is used for calculating a first target instruction number which needs to be executed when each processor core reaches a synchronization point according to the synchronization threshold value and a first preset period;
And the control sub-module is used for counting the number of the executed instructions for each processor core respectively and controlling the processor cores to enter a synchronous state under the condition that the number of the executed instructions of the processor cores is equal to the first target instruction number corresponding to the processor cores.
In still another aspect, the embodiment of the invention also discloses an electronic device, which comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus; the memory is used for storing executable instructions which enable the processor to execute the method for generating the check point.
The embodiment of the invention also discloses a readable storage medium, which enables the electronic equipment to execute the method for generating the check point when the instructions in the readable storage medium are executed by the processor of the electronic equipment.
The embodiment of the invention has the following advantages:
The embodiment of the invention provides a method for generating a check point, which can synchronize the progress of different processor cores on a simulator according to the synchronization threshold value of each processor core in a multi-core processor, generate the check point after each processor core reaches the check point generation period, store the check point in a peripheral address space, and not occupy the memory space of a kernel, thereby realizing the application of the check point technology on the multi-core processor and being beneficial to improving the efficiency of evaluating the correctness of a multi-core system before silicon.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of steps of an embodiment of a method of generating a checkpoint of the present invention;
FIG. 2 is a schematic diagram of a checkpoint generation and restoration process of the present invention;
FIG. 3 is a schematic flow diagram of a multi-core synchronization and checkpointing process of the present invention;
FIG. 4 is a block diagram illustrating an embodiment of a checkpoint generating device in accordance with the present invention;
Fig. 5 is a block diagram of an electronic device according to an example of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms "first", "second" and the like in the description and in the claims are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, as appropriate, such that embodiments of the present invention may be implemented in sequences other than those illustrated or described herein, and that the objects identified by "first," "second," etc. are generally of one type, and the number of objects is not limited; for example, the first object may be one or more. Furthermore, the term "and/or" as used in the specification and claims to describe an association of associated objects means that there may be three relationships; e.g., A and/or B may mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. The term "plurality" in embodiments of the present invention means two or more, and other quantifiers are similar.
Method embodiment
First, some terms related to embodiments of the present invention will be explained.
HDL: a hardware description language for describing the behavior of a chip circuit. HDL simulation can simulate the behavior of a circuit in software to help evaluate the function and performance of the circuit.
FPGA: a field-programmable gate array (FPGA) is an integrated circuit that is configured by a customer or designer after manufacture, and is therefore referred to as field programmable. FPGA configuration is typically specified using a hardware description language (HDL). Unlike conventional fixed-function integrated circuits (ASICs), FPGAs can be flexibly reprogrammed and reconfigured to accommodate different applications and functions as desired by the user. An FPGA consists of a large number of programmable logic blocks and programmable interconnect resources. Programmable logic blocks are typically composed of look-up tables (LUTs), registers, and other logic elements that can perform various logic functions. The programmable interconnect resources are used to connect the logic blocks to form the desired circuit structure. Using FPGAs, design engineers can describe the required circuit functions in a hardware description language and convert them, by means of programming tools, into bitstreams compatible with the FPGA chip. The bitstream contains the information used to program and configure the internal logic and interconnect resources of the FPGA. One of the main advantages of an FPGA is its programmability and flexibility. It allows design engineers to implement custom functions and algorithms at the hardware level without the need for traditional custom integrated circuit design and manufacturing processes. This makes FPGAs very useful in prototype development and fast design iterations.
Simulation accelerator: a field-programmable development platform specially designed for large chips. Compared with an FPGA, it has larger capacity and is easier to debug, but it is expensive to use and runs more slowly than an FPGA (500 kHz vs. 50 MHz).
Checkpoint: a data format that records the state of a program at a certain moment during its execution and can be loaded by a specific simulator or emulator, so that the simulator or emulator can start running directly from an intermediate point of the program's execution. A checkpoint typically contains at least the architecturally visible registers and the memory.
Electronic system level (Electronic System Level, ESL) platform: a platform for modeling the behavior of the entire circuit system using a high-level language (e.g., C, C++) or a graphical design tool. A typical ESL platform is GEM5.
Benchmark test suite: the test program for evaluating the performance, power consumption and other indexes of the processor comprises a program and an input set.
Software restorer: a program that restores register state from a file into hardware; it runs in the M mode (machine mode) of RISC-V.
Functional simulator: a simulator that simulates only the functional implementation of the target instruction set architecture (Instruction Set Architecture, ISA), focuses on functional consistency, and contains no microarchitectural information.
Referring to fig. 1, there is shown a flowchart of steps of an embodiment of a method for generating a checkpoint of the present invention, the method may specifically include the steps of:
Step 101, acquiring a synchronization threshold corresponding to each processor core in the multi-core processor, where the synchronization threshold indicates the number of instructions that the processor core needs to execute in order to synchronize its state with the other processor cores;
Step 102, controlling each processor core to enter a synchronized state according to the synchronization threshold;
Step 103, after each processor core in the multi-core processor enters the synchronized state, controlling each processor core to continue executing instructions;
Step 104, after each processor core reaches a checkpoint generation period, generating a checkpoint, and storing the checkpoint in a peripheral address space for a software simulator to read and simulate.
The method for generating a checkpoint provided by the embodiment of the invention can be applied to a simulator. A simulator is a software tool that simulates the behavior or functionality of a hardware device, system, or other software; it creates a virtual environment in which a user can perform and test various operations without actual physical hardware. The simulator in the embodiment of the invention can simulate functions of the multi-core processor such as instruction execution and data transfer. By way of example, the simulator in embodiments of the present invention may be a functional simulator, such as QEMU or Spike.
A checkpoint is a snapshot of the program state saved at a specific point in time during program execution, and can be used to back up and restore the program state. For example, after a slice of a program is extracted, a checkpoint corresponding to the start of the program slice needs to be saved so that execution of the slice can be resumed in a simulator or a hardware environment. Program slicing is a technique that selectively extracts subsets of a program; it splits the program into smaller parts so that the program can be analyzed and evaluated more accurately and efficiently.
With the method for generating a checkpoint provided by the embodiment of the invention, checkpoints can be generated in the simulator: a user can pause the simulation at a specific point, save information such as the current processor state, memory data, and system configuration, and later reload the checkpoint and continue the simulation from that state.
In particular, before checkpoints are generated, progress between different cores needs to be synchronized on the simulator. The embodiment of the invention sets a separate synchronization threshold value for each processor core, wherein the synchronization threshold value is used for indicating the number of instructions which need to be executed for the processor core to synchronize with the states of other processor cores. For example, the synchronization threshold may be predicted after a checkpoint or test program slice is simulated using a software simulator.
After the synchronization threshold corresponding to each processor core is determined, each processor core is controlled to enter the synchronized state based on its synchronization threshold. Specifically, when the number of instructions executed by a processor core Pi equals the number of instructions indicated by its synchronization threshold, the processor core Pi is controlled to enter the synchronized state and wait for the other processor cores to enter the synchronized state.
After all the processor cores of the multi-core processor enter the synchronous state, each processor core is controlled to continue executing instructions. The progress among the processor cores can be synchronized in the running process of the simulator by circulating the above steps, and multi-core synchronization is realized.
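As a non-limiting illustration, the synchronization described above can be sketched in Python as follows. The names Core, step and run_sync_round, the round-robin scheduling of cores, and the resetting of the counters after release are assumptions made for this sketch and are not specified by the embodiment.

```python
from dataclasses import dataclass


@dataclass
class Core:
    hart_id: int
    sync_threshold: int  # instructions to execute before entering the synchronized state (assumed >= 1)
    executed: int = 0
    synced: bool = False


def step(core: Core) -> None:
    """Execute one instruction unless the core is already waiting at its sync point."""
    if core.synced:
        return  # the core waits for the other cores
    core.executed += 1
    if core.executed == core.sync_threshold:
        core.synced = True


def run_sync_round(cores: list[Core]) -> list[int]:
    """Step the cores round-robin until all are synchronized, then release them.

    Returns the number of instructions each core had executed at the sync point.
    """
    while not all(c.synced for c in cores):
        for c in cores:
            step(c)
    counts = [c.executed for c in cores]
    # All cores are synchronized: release them so they can continue executing.
    for c in cores:
        c.executed = 0
        c.synced = False
    return counts
```

In this sketch a core with a smaller threshold simply idles until the slowest core arrives, which mirrors the waiting behavior of processor core Pi described above.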
And after all processor cores of the multi-core processor reach a check point generation period, generating a check point, and storing the check point into a peripheral address space. The software emulator may read and emulate checkpoints from the peripheral address space.
The checkpoint generation period may be a preset period, and the checkpoint generation period of each processor core may be the same or different.
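The condition "all processor cores reach a checkpoint generation period" can be illustrated with the following minimal Python sketch. It assumes, purely for illustration, that the period is measured in executed instructions and that the peripheral address space is modeled as a dictionary; CoreState, maybe_generate_checkpoint and the "checkpoint" key are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class CoreState:
    hart_id: int
    period: int  # assumed: checkpoint generation period, in instructions
    executed_since_checkpoint: int = 0
    registers: dict = field(default_factory=dict)


def maybe_generate_checkpoint(cores, memory, peripheral_space) -> bool:
    """Generate a checkpoint only once every core has reached its own period."""
    if not all(c.executed_since_checkpoint >= c.period for c in cores):
        return False
    # Snapshot the register state of each core and the memory image, and
    # store the result in the (modeled) peripheral address space.
    peripheral_space["checkpoint"] = {
        "registers": {c.hart_id: dict(c.registers) for c in cores},
        "memory": bytes(memory),
    }
    for c in cores:
        c.executed_since_checkpoint = 0
    return True
```

Because each core carries its own period, the sketch covers both the case where the periods are the same and the case where they differ.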
A checkpoint is a file that contains the hardware state and a memory image, and can be used to save the state of a program at any time. Common checkpoint techniques have two implementations: an operating-system-level checkpoint, implemented by saving the process control block and memory image of a process; and a full-system-level checkpoint, implemented by saving the complete register state and memory image. The embodiment of the invention does not specifically limit the manner in which the checkpoint is generated.
A peripheral is a device external to the cores of the multi-core processor; it may be mounted on the FPGA chip and belongs to neither the PL side nor the PS side. For example, the external device in the embodiment of the present invention may be an external memory device, such as double data rate synchronous dynamic random access memory (Double Data Rate Synchronous Dynamic Random Access Memory, DDR).
The peripheral address space refers to a virtual address space allocated for the external device based on the memory map. It should be noted that, in the embodiment of the present invention, the peripheral address space does not include the memory space occupied by the processor core. According to the embodiment of the invention, the check points are stored in the peripheral address space, so that the situation that a large number of hardware states need to be stored in the memory to occupy the memory space of the kernel when the check points are generated due to the excessive number of processor cores in the multi-core processor can be avoided, the check point technology is used in the multi-core processor, and the memory space of the kernel is not occupied.
With the method for generating a checkpoint provided by the embodiment of the invention, the progress of the different processor cores can be synchronized on the simulator according to the synchronization threshold of each processor core in the multi-core processor. After each processor core reaches the checkpoint generation period, a checkpoint is generated and stored in the peripheral address space, so the memory space of the processor cores is not occupied. Checkpoint technology is thereby applied to multi-core processors, which improves the efficiency of pre-silicon correctness evaluation of multi-core systems.
Optionally, the method further comprises:
Step S11, reading the memory layout information of the checkpoint from the peripheral address space by using a software restorer;
Step S12, reading a register value corresponding to each processor core according to the memory layout information;
Step S13, performing state recovery on the register corresponding to each processor core according to the register value, so as to restore the register to the state corresponding to the checkpoint.
In embodiments of the present invention, a software restorer may be utilized to restore hardware state in the emulator from the peripheral address space based on checkpoints.
Specifically, the software restorer may first read the memory layout information of the checkpoint from the peripheral address space, and then read, according to the memory layout information, the register values recorded in the checkpoint for each processor core. Finally, the registers corresponding to each processor core in the simulator are restored based on the read register values, that is, the registers are restored to the state corresponding to the checkpoint. After the checkpoint has been restored, the simulator runs, and each processor core continues to execute instructions from the position corresponding to the checkpoint.
It should be noted that the registers to be restored may include privileged (CSR) registers, integer (int) registers, floating-point (fp) registers, vector registers, and various peripheral registers (e.g., the mtime register and the mtimecmp register). Each core reads register values from the peripheral address space and restores them to its own registers, except for mtime, which is restored by only one of the cores (typically the last core to reach the mtime restore point during execution).
Referring to fig. 2, a schematic diagram of a checkpoint generation and restoration process according to an embodiment of the present invention is shown. As shown in FIG. 2, after each processor core reaches a checkpoint generation period, the simulator (e.g., QEMU) generates a checkpoint. The checkpoint is a compressed file whose structure, shown in the part of the figure labeled CHECKPOINT, consists of three parts: first, the checkpoint header (HEADER), which contains the basic information and memory layout of the checkpoint; second, the register state of each core; and third, the memory image at the time the checkpoint was saved. The checkpoint is decompressed into memory and the peripheral address space by a software emulator (a detailed model, e.g., GEM5/XS), and the software restorer, read from the software restorer path given in the parameters, is overlaid onto the address range 0x80000000 to 0x80100000. The software emulator then starts emulation at address 0x80000000. It should be noted that the software emulator may read the memory part of the checkpoint directly into memory and read the other contents into the peripheral address space. The register states of the different cores may be placed after the HEADER in the checkpoint at offsets of hartid × 1M, and are then placed into the peripheral address space by the software emulator.
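The address arithmetic in the preceding paragraph can be made concrete with a short Python sketch. It assumes that "1M" means 1 MiB (0x100000 bytes) and that the per-core register areas begin immediately after a header of known size; the header size passed in and the name register_area_offset are hypothetical, since the actual header format is not given here.

```python
MIB = 1 << 20  # assumed meaning of "1M": 1 MiB = 0x100000 bytes

# Address range onto which the software restorer is overlaid (from the text).
RESTORER_BASE = 0x80000000
RESTORER_END = 0x80100000


def register_area_offset(hart_id: int, header_size: int) -> int:
    """Offset of a core's register state: after the header, at hartid x 1 MiB."""
    return header_size + hart_id * MIB


# The restorer window happens to be exactly 1 MiB wide.
restorer_window_size = RESTORER_END - RESTORER_BASE
```

For example, with a hypothetical 4096-byte header, the register state of hart 2 would start at offset 4096 + 2 × 0x100000.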
The simulator uses a software restorer (e.g., GCPT_restore) to read the HEADER from the peripheral address space and determine the memory layout of the checkpoint, then reads the register values corresponding to each core from the peripheral address space according to the memory layout, and finally restores them to the corresponding core. For example, the software restorer may restore privileged registers using privileged instructions, general registers using load instructions, and vector registers using vector instructions.
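The header-then-registers restore flow (steps S11 to S13) can be illustrated with the following Python sketch, which models the peripheral address space as a byte buffer. The header fields (magic value, hart count, registers per hart, register-area base, per-hart stride), the 64-bit little-endian register encoding, and the function names are all assumptions made for illustration; they are not the actual checkpoint format.

```python
import struct

# Hypothetical header: magic, hart count, registers per hart,
# register-area base offset, per-hart stride (all little-endian).
HEADER_FMT = "<4sIIII"
MAGIC = b"CKPT"


def write_checkpoint(space: bytearray, per_hart_regs, reg_base: int, stride: int) -> None:
    """Store a header and each hart's 64-bit register values in the buffer."""
    n_regs = len(per_hart_regs[0])
    struct.pack_into(HEADER_FMT, space, 0, MAGIC, len(per_hart_regs), n_regs, reg_base, stride)
    for hart_id, regs in enumerate(per_hart_regs):
        struct.pack_into(f"<{n_regs}Q", space, reg_base + hart_id * stride, *regs)


def restore_registers(space: bytearray) -> list[list[int]]:
    """S11: read the layout from the header; S12: read each hart's register
    values; S13 is modeled by returning the values to load into each hart."""
    magic, n_harts, n_regs, reg_base, stride = struct.unpack_from(HEADER_FMT, space, 0)
    if magic != MAGIC:
        raise ValueError("not a checkpoint header")
    return [
        list(struct.unpack_from(f"<{n_regs}Q", space, reg_base + hart_id * stride))
        for hart_id in range(n_harts)
    ]
```

A real restorer would write the values into hardware registers with the appropriate instructions rather than return them; the sketch only shows how the layout information drives where each core's values are read from.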
Optionally, acquiring the synchronization threshold corresponding to each processor core in the multi-core processor includes:
Step S21, generating an initial checkpoint while the simulator is running;
Step S22, sending the initial checkpoint to a software simulator, so that the software simulator predicts a synchronization threshold according to the initial checkpoint;
Step S23, receiving the synchronization threshold predicted by the software simulator.
In the embodiment of the invention, the software simulator can simulate the check point generated in the previous period to predict the synchronization threshold. The simulator performs a new round of state synchronization for each processor core based on the synchronization threshold.
For the first round of multi-core synchronization, an initial checkpoint can be generated after the simulator starts running and sent to the software simulator. The software simulator predicts the synchronization threshold of each processor core for the first round of multi-core synchronization based on the initial checkpoint.
Optionally, the method further comprises:
sending the checkpoint to the software simulator, so that the software simulator predicts the synchronization threshold for the next round of multi-core synchronization according to the checkpoint.
The simulator controls each processor core to enter the synchronization state based on its synchronization threshold, thereby achieving the first round of multi-core synchronization. After all processor cores reach the checkpoint generation period, the simulator generates a checkpoint and sends it to the software simulator, which predicts the synchronization threshold of each processor core for the next round of multi-core synchronization. This cycle repeats until the run finishes.
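The round structure described above (initial checkpoint, threshold prediction, synchronization, new checkpoint) can be sketched as a simple loop. The three callables are hypothetical stand-ins for the simulator and the software simulator, and running for a fixed number of rounds is an assumption made so that the example terminates.

```python
# Sketch of the generate -> predict -> synchronize loop. The callables stand
# in for the simulator and the software simulator; all of them are hypothetical.
from typing import Callable, Dict, List

Checkpoint = Dict[str, int]          # placeholder for a real checkpoint file
Thresholds = Dict[int, float]        # core id -> synchronization threshold


def run_rounds(
    make_initial_checkpoint: Callable[[], Checkpoint],
    predict_thresholds: Callable[[Checkpoint], Thresholds],
    sync_and_checkpoint: Callable[[Thresholds], Checkpoint],
    rounds: int,
) -> List[Thresholds]:
    """Run a fixed number of rounds and return the thresholds used in each."""
    history: List[Thresholds] = []
    checkpoint = make_initial_checkpoint()           # step S21
    for _ in range(rounds):
        thresholds = predict_thresholds(checkpoint)  # steps S22 and S23
        history.append(thresholds)
        # Synchronize the cores, run to the checkpoint generation period,
        # and produce the checkpoint that feeds the next prediction.
        checkpoint = sync_and_checkpoint(thresholds)
    return history


# Toy stand-ins: the "prediction" just derives numbers from a round counter.
def toy_initial() -> Checkpoint:
    return {"round": 0}


def toy_predict(cp: Checkpoint) -> Thresholds:
    return {0: 1.0, 1: 1.0 + cp["round"]}


state = {"round": 0}


def toy_sync(thresholds: Thresholds) -> Checkpoint:
    state["round"] += 1
    return {"round": state["round"]}


history = run_rounds(toy_initial, toy_predict, toy_sync, rounds=3)
```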
Optionally, the acquiring the synchronization threshold corresponding to each processor core in the multi-core processor includes:
Step S31, acquiring a preset initial synchronization threshold, and performing the first round of multi-core synchronization based on the initial synchronization threshold;
Step S32, receiving a synchronization threshold predicted by the software simulator, and performing the next round of multi-core synchronization based on that synchronization threshold.
In another alternative embodiment of the present invention, an initial synchronization threshold may be set in advance for each processor core, and the simulator performs a first round of multi-core synchronization based on the initial synchronization threshold, and generates and stores a checkpoint into the peripheral address space when each processor core reaches a checkpoint generation period. The software simulator obtains a check point from the peripheral address space, simulates the check point, predicts a new synchronization threshold value, and sends the synchronization threshold value to the simulator. The simulator performs the next round of multi-core synchronization based on the received synchronization threshold.
Optionally, the controlling each processor core to enter the synchronization state according to the synchronization threshold includes:
Step S41, calculating a first target instruction number which needs to be executed when each processor core reaches a synchronization point according to the synchronization threshold value and a first preset period;
Step S42, counting the number of instructions executed for each processor core, and controlling the processor cores to enter a synchronous state when the number of instructions executed by the processor cores is equal to the first target number of instructions corresponding to the processor cores.
The first preset period may be a fixed value that does not change from one round of multi-core synchronization to the next.
For each processor core, the number of instructions required to reach the synchronization point may be calculated from the synchronization threshold and the first preset period. Illustratively, the first target instruction number that a processor core needs to execute to reach the synchronization point = first preset period / synchronization threshold.
In the running process of the simulator, the number of instructions executed by each processor core can be counted respectively, and the processor cores are controlled to enter a synchronous state under the condition that the number of instructions executed by any processor core is equal to the first target number of instructions corresponding to the processor core.
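Steps S41 and S42 can be sketched as follows. The division follows the formula given above. Rounding the quotient to an integer, the `CoreSync` class, and the period and threshold values are assumptions made for illustration.

```python
# Sketch of steps S41 and S42: derive each core's first target instruction
# number, then park a core once its executed-instruction count reaches it.
from dataclasses import dataclass


def first_target_instructions(first_preset_period: int, threshold: float) -> int:
    """First target instruction number = first preset period / threshold.

    Rounding to the nearest integer is an assumption; the text does not say
    how non-integer quotients are handled.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return round(first_preset_period / threshold)


@dataclass
class CoreSync:
    target: int
    executed: int = 0
    synchronized: bool = False

    def step(self) -> None:
        """Execute one instruction unless the core is already waiting."""
        if self.synchronized:
            return
        self.executed += 1
        if self.executed == self.target:
            self.synchronized = True   # core enters the synchronization state


period = 600                           # illustrative first preset period
thresholds = {0: 1.0, 1: 2.0}          # illustrative per-core thresholds
cores = {cid: CoreSync(first_target_instructions(period, t)) for cid, t in thresholds.items()}

steps = 0
while not all(c.synchronized for c in cores.values()):
    for core in cores.values():
        core.step()
    steps += 1
```

Here core 1 stops after its 300 instructions and waits while core 0 continues to 600, which is the sense in which the progress of the cores is synchronized.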
Similarly, in the embodiment of the present invention, a checkpoint generating interval may be preset, and based on the synchronization threshold value and the checkpoint generating interval corresponding to each processor core, the second target instruction number that needs to be executed when each processor core reaches the checkpoint generating period is calculated, and when the number of instructions executed by any processor core is equal to the second target instruction number corresponding to the processor core, it is determined that the processor core reaches the checkpoint generating period. When all processor cores reach a checkpoint generation cycle, a checkpoint is generated.
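The checkpoint-period test can be sketched in the same style. The text does not spell out the formula for the second target instruction number, so the sketch assumes it mirrors the first target (checkpoint generation interval / synchronization threshold). The function names and the interval value are likewise hypothetical.

```python
# Sketch of the checkpoint-period check: a checkpoint is generated only when
# every core has executed its second target instruction number.

def second_target_instructions(checkpoint_interval: int, threshold: float) -> int:
    """Assumed to mirror the first target: checkpoint interval / threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return round(checkpoint_interval / threshold)


def should_generate_checkpoint(executed: dict[int, int], targets: dict[int, int]) -> bool:
    """True once every core's executed count equals its second target."""
    return all(executed[cid] == targets[cid] for cid in targets)


interval = 1200                        # hypothetical checkpoint generation interval
thresholds = {0: 1.0, 1: 2.0}          # illustrative per-core thresholds
targets = {cid: second_target_instructions(interval, t) for cid, t in thresholds.items()}

# Core 1 has not yet reached its target, so no checkpoint is generated.
partial = should_generate_checkpoint({0: 1200, 1: 450}, targets)
# Both cores have reached their targets, so a checkpoint is generated.
complete = should_generate_checkpoint({0: 1200, 1: 600}, targets)
```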
Referring to fig. 3, a schematic flow chart of multi-core synchronization and checkpoint generation provided by an embodiment of the present invention is shown. As shown in fig. 3, the simulator generates a checkpoint and submits it to the software simulator. The software simulator simulates the checkpoint, predicts the synchronization thresholds, and sends them to the simulator. The simulator calculates, from the synchronization threshold of each processor core and a first preset period (for example, 600 clock cycles), the first target instruction number that each processor core needs to execute to reach a synchronization point, and synchronizes the progress of the different processor cores according to the first target instruction numbers. For example, as shown in fig. 3, assuming that the first preset period is 600 cycles: the synchronization threshold of core 0 is 1, so core 0 needs to execute 600 instructions to reach the synchronization point; the synchronization threshold of core 1 is 2, so core 1 needs to execute 300 instructions; the synchronization threshold of core 2 is 3, so core 2 needs to execute 200 instructions; and the synchronization threshold of core 3 is 1.5, so core 3 needs to execute 400 instructions. The simulator then calculates, from the synchronization threshold and the checkpoint generation interval, the second target instruction number that each processor core needs to execute to reach the checkpoint generation period, and determines on that basis whether the processor core has reached the checkpoint generation period. As shown in fig. 3, in the embodiment of the present invention, the checkpoint generation interval may be an integer multiple of the first preset period. The second target instruction number of each processor core is then an integer multiple of its first target instruction number, so that the states of all processor cores are synchronized at the same checkpoint generation period. When all processor cores have reached the checkpoint generation period, a checkpoint is generated and submitted to the software simulator, which predicts the synchronization thresholds for the next round. This cycle repeats until the run finishes.
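The fig. 3 numbers can be checked with a short calculation. The period and thresholds below are the ones given in the example. The multiple of 3 for the checkpoint generation interval is a hypothetical choice, and the second target is again assumed to be the interval divided by the threshold.

```python
# Worked check of the fig. 3 numbers using exact rational arithmetic.
from fractions import Fraction

FIRST_PRESET_PERIOD = 600                       # cycles, from the fig. 3 example
THRESHOLDS = {0: Fraction(1), 1: Fraction(2), 2: Fraction(3), 3: Fraction(3, 2)}
MULTIPLE = 3                                    # hypothetical integer multiple
CHECKPOINT_INTERVAL = MULTIPLE * FIRST_PRESET_PERIOD

first_targets = {c: FIRST_PRESET_PERIOD / t for c, t in THRESHOLDS.items()}
second_targets = {c: CHECKPOINT_INTERVAL / t for c, t in THRESHOLDS.items()}

for core in sorted(THRESHOLDS):
    # Each second target is the same integer multiple of the first target,
    # so every core hits a synchronization point exactly at the checkpoint.
    assert second_targets[core] == MULTIPLE * first_targets[core]
    print(core, int(first_targets[core]), int(second_targets[core]))
```

With these inputs the first targets come out as 600, 300, 200, and 400 instructions for cores 0 to 3, matching the values stated for fig. 3.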
In summary, the embodiment of the invention provides a method for generating a check point, which can synchronize the progress of different processor cores on a simulator according to the synchronization threshold value of each processor core in a multi-core processor, generate the check point after each processor core reaches the check point generation period, store the check point in a peripheral address space, and not occupy the memory space of a kernel, thereby realizing the application of the check point technology on the multi-core processor and being beneficial to improving the efficiency of evaluating the correctness of a multi-core system before silicon.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Device embodiment
Referring to fig. 4, there is shown a block diagram of a checkpoint generating device of the present invention, applied to a simulator, and specifically may include:
An obtaining module 401, configured to obtain a synchronization threshold corresponding to each processor core in the multi-core processor; the synchronization threshold is used to indicate the number of instructions that the processor core needs to execute in order to synchronize its state with the other processor cores;
A synchronization module 402, configured to control each processor core to enter a synchronization state according to the synchronization threshold;
A control module 403, configured to control each processor core in the multi-core processor to continue executing instructions after each processor core enters a synchronization state;
and the generating module 404 is configured to generate a checkpoint after each processor core reaches a checkpoint generating period, and store the checkpoint in a peripheral address space, so that the software simulator can read and simulate the checkpoint.
Optionally, the apparatus further comprises:
the first reading module is used for reading the memory layout information of the check point from the peripheral address space by using a software restorer;
the second reading module is used for reading the register value corresponding to each processor core according to the memory layout information;
and the state recovery module is used for carrying out state recovery on the register corresponding to each processor core according to the register value so as to recover the register into the state corresponding to the check point.
Optionally, the acquiring module includes:
an initial check point generating sub-module, configured to generate an initial check point during the running process of the simulator;
an initial checkpoint transmitting sub-module, configured to transmit the initial checkpoint to a software simulator, so that the software simulator predicts a synchronization threshold according to the initial checkpoint;
and the first receiving submodule is used for receiving the synchronization threshold value predicted by the software simulator.
Optionally, the apparatus further comprises:
And the check point sending module is used for sending the check point to the software simulator so that the software simulator predicts a synchronization threshold corresponding to the next round of multi-core synchronization according to the check point.
Optionally, the acquiring module includes:
the initial threshold value acquisition sub-module is used for acquiring a preset initial synchronization threshold value and carrying out first-round multi-core synchronization based on the initial synchronization threshold value;
And the second receiving sub-module is used for receiving the synchronization threshold value predicted by the software simulator and carrying out the next round of multi-core synchronization based on the synchronization threshold value.
Optionally, the synchronization module includes:
The calculation sub-module is used for calculating a first target instruction number which needs to be executed when each processor core reaches a synchronization point according to the synchronization threshold value and a first preset period;
And the control sub-module is used for counting the number of the executed instructions for each processor core respectively and controlling the processor cores to enter a synchronous state under the condition that the number of the executed instructions of the processor cores is equal to the first target instruction number corresponding to the processor cores.
In summary, the embodiment of the invention provides a device for generating a check point, which can synchronize the progress of different processor cores on a simulator according to the synchronization threshold value of each processor core in a multi-core processor, generate the check point after each processor core reaches the check point generation period, store the check point in a peripheral address space, and not occupy the memory space of a kernel, thereby realizing the application of the check point technology on the multi-core processor and being beneficial to improving the efficiency of evaluating the correctness of a multi-core system before silicon.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
In this specification, the embodiments are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are identical or similar, the embodiments may be referred to one another.
The specific manner in which each module of the device in the above embodiments performs its operations has been described in detail in the method embodiments and will not be repeated here.
Referring to fig. 5, a block diagram of an electronic device for generating a checkpoint is provided in an embodiment of the present invention. As shown in fig. 5, the electronic device includes: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus; the memory is used for storing executable instructions, and the executable instructions enable the processor to execute the checkpoint generation method of the previous embodiment.
The processor may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor may also be a combination that implements computing functions, e.g., a combination of one or more microprocessors, a combination of a DSP and a microprocessor, etc.
The communication bus may include a path for transferring information between the memory and the communication interface. The communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one line is shown in fig. 5, but this does not mean that there is only one bus or one type of bus.
The memory may be a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory), a magnetic tape, a floppy disk, an optical data storage device, and the like.
Embodiments of the present invention also provide a non-transitory computer-readable storage medium; when the instructions in the storage medium are executed by a processor of an electronic device (a server or a terminal), the processor is enabled to perform the checkpoint generation method shown in fig. 1.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems) and computer program products according to embodiments of the invention. It will be understood that each flowchart and/or block of the flowchart illustrations and/or block diagrams, and combinations of flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or terminal device that comprises the element.
The method, the device, the electronic equipment and the storage medium for generating a checkpoint provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. Meanwhile, those skilled in the art may, in accordance with the ideas of the present invention, make changes to the specific implementations and the scope of application. In view of the above, the contents of this description should not be construed as limiting the present invention.
Claims (10)
1. A method of generating a checkpoint, applied to a simulator, the method comprising:
Acquiring a synchronization threshold corresponding to each processor core in the multi-core processor; the synchronization threshold is used to indicate the number of instructions that the processor core needs to execute in order to synchronize its state with the other processor cores;
controlling each processor core to enter a synchronous state respectively according to the synchronous threshold value;
After each processor core in the multi-core processor enters a synchronous state, controlling each processor core to continue executing instructions;
After each processor core reaches a checkpoint generation period, generating a checkpoint, and storing the checkpoint into a peripheral address space for a software simulator to read and simulate.
2. The method according to claim 1, wherein the method further comprises:
reading the memory layout information of the check point from the peripheral address space by using a software restorer;
reading a register value corresponding to each processor core according to the memory layout information;
and carrying out state recovery on the register corresponding to each processor core according to the register value so as to recover the register into the state corresponding to the check point.
3. The method of claim 1, wherein the obtaining a synchronization threshold for each processor core in the multi-core processor comprises:
Generating an initial check point during the running process of the simulator;
Sending the initial checkpoint to a software simulator to enable the software simulator to predict a synchronization threshold according to the initial checkpoint;
And receiving a synchronization threshold predicted by the software simulator.
4. The method according to claim 1, wherein the method further comprises:
and sending the check point to the software simulator so that the software simulator predicts a synchronization threshold corresponding to the next round of multi-core synchronization according to the check point.
5. The method of claim 4, wherein the obtaining a synchronization threshold for each processor core in a multi-core processor comprises:
Acquiring a preset initial synchronization threshold value, and performing first-round multi-core synchronization based on the initial synchronization threshold value;
and receiving a synchronization threshold predicted by the software simulator, and performing next round of multi-core synchronization based on the synchronization threshold.
6. The method of claim 1, wherein said controlling each processor core to enter a respective synchronization state in accordance with said synchronization threshold comprises:
calculating a first target instruction number which needs to be executed when each processor core reaches a synchronization point according to the synchronization threshold value and a first preset period;
And counting the number of the executed instructions for each processor core respectively, and controlling the processor cores to enter a synchronous state under the condition that the number of the executed instructions of the processor cores is equal to the first target instruction number corresponding to the processor cores.
7. A checkpoint generating device, for use in a simulator, the device comprising:
The acquisition module is used for acquiring the synchronization threshold corresponding to each processor core in the multi-core processor; the synchronization threshold is used to indicate the number of instructions that the processor core needs to execute in order to synchronize its state with the other processor cores;
the synchronization module is used for controlling each processor core to enter a synchronization state respectively according to the synchronization threshold value;
The control module is used for controlling each processor core in the multi-core processor to continue executing instructions after the processor cores enter a synchronous state;
And the generation module is used for generating a check point after each processor core reaches a check point generation period, and storing the check point into a peripheral address space for a software simulator to read and simulate.
8. The apparatus of claim 7, wherein the apparatus further comprises:
the first reading module is used for reading the memory layout information of the check point from the peripheral address space by using a software restorer;
the second reading module is used for reading the register value corresponding to each processor core according to the memory layout information;
and the state recovery module is used for carrying out state recovery on the register corresponding to each processor core according to the register value so as to recover the register into the state corresponding to the check point.
9. An electronic device, comprising a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with each other via the communication bus; the memory is configured to store executable instructions that cause the processor to perform the checkpoint generation method as claimed in any one of claims 1 to 6.
10. A readable storage medium, wherein instructions in the readable storage medium, when executed by a processor of an electronic device, enable the processor to perform the method of generating a checkpoint as claimed in any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410931327.4A CN118468798B (en) | 2024-07-11 | 2024-07-11 | Method and device for generating check point, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410931327.4A CN118468798B (en) | 2024-07-11 | 2024-07-11 | Method and device for generating check point, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118468798A (en) | 2024-08-09 |
CN118468798B (en) | 2024-10-01 |
Family
ID=92154446
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410931327.4A Active CN118468798B (en) | 2024-07-11 | 2024-07-11 | Method and device for generating check point, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118468798B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102906728A (en) * | 2010-05-19 | 2013-01-30 | 埃克森美孚上游研究公司 | Method and system for checkpointing during simulations |
US20130346058A1 (en) * | 2012-06-22 | 2013-12-26 | Bradford M. Beckmann | Simulating vector execution |
US20140281243A1 (en) * | 2011-10-28 | 2014-09-18 | The Regents Of The University Of California | Multiple-core computer processor |
CN104657229A (en) * | 2015-03-19 | 2015-05-27 | 哈尔滨工业大学 | Multi-core processor rollback recovering system and method based on high-availability hardware checking point |
CN104750603A (en) * | 2013-12-30 | 2015-07-01 | 联芯科技有限公司 | Multi-core DSP (Digital Signal Processor) software emulator and physical layer software testing method thereof |
CN109891393A (en) * | 2016-11-04 | 2019-06-14 | Arm有限公司 | Use the primary processor error detection of detector processor |
CN111651341A (en) * | 2020-07-14 | 2020-09-11 | 中国人民解放军国防科技大学 | Performance evaluation method of general processor |
US10922146B1 (en) * | 2018-12-13 | 2021-02-16 | Amazon Technologies, Inc. | Synchronization of concurrent computation engines |
CN115729627A (en) * | 2022-11-16 | 2023-03-03 | 海光信息技术股份有限公司 | Thread synchronization method and device, chip simulation method and platform and related equipment |
CN117215963A (en) * | 2023-11-08 | 2023-12-12 | 睿思芯科(深圳)技术有限公司 | Simulator program checkpoint saving and restoring method, system and related equipment |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102906728A (en) * | 2010-05-19 | 2013-01-30 | 埃克森美孚上游研究公司 | Method and system for checkpointing during simulations |
US20140281243A1 (en) * | 2011-10-28 | 2014-09-18 | The Regents Of The University Of California | Multiple-core computer processor |
US20130346058A1 (en) * | 2012-06-22 | 2013-12-26 | Bradford M. Beckmann | Simulating vector execution |
CN104750603A (en) * | 2013-12-30 | 2015-07-01 | 联芯科技有限公司 | Multi-core DSP (Digital Signal Processor) software emulator and physical layer software testing method thereof |
CN104657229A (en) * | 2015-03-19 | 2015-05-27 | 哈尔滨工业大学 | Multi-core processor rollback recovering system and method based on high-availability hardware checking point |
CN109891393A (en) * | 2016-11-04 | 2019-06-14 | Arm有限公司 | Use the primary processor error detection of detector processor |
US10922146B1 (en) * | 2018-12-13 | 2021-02-16 | Amazon Technologies, Inc. | Synchronization of concurrent computation engines |
CN111651341A (en) * | 2020-07-14 | 2020-09-11 | 中国人民解放军国防科技大学 | Performance evaluation method of general processor |
CN115729627A (en) * | 2022-11-16 | 2023-03-03 | 海光信息技术股份有限公司 | Thread synchronization method and device, chip simulation method and platform and related equipment |
CN117215963A (en) * | 2023-11-08 | 2023-12-12 | 睿思芯科(深圳)技术有限公司 | Simulator program checkpoint saving and restoring method, system and related equipment |
Non-Patent Citations (1)
Title |
---|
高岚;王锐;钱德沛;: "多核处理器并行程序的确定性重放研究", 软件学报, no. 06, 15 June 2013 (2013-06-15) * |
Also Published As
Publication number | Publication date |
---|---|
CN118468798B (en) | 2024-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12001317B2 (en) | Waveform based reconstruction for emulation | |
JP4994393B2 (en) | System and method for generating multiple models at different levels of abstraction from a single master model | |
CN112433819A (en) | Heterogeneous cluster scheduling simulation method and device, computer equipment and storage medium | |
Muttillo et al. | Hepsycode-RT: A real-time extension for an ESL HW/SW co-design methodology | |
CN110941934A (en) | FPGA prototype verification development board segmentation simulation system, method, medium and terminal | |
US10162915B2 (en) | Method and system for emulation of multiple electronic designs in a single testbench environment | |
CN117910398A (en) | Method for simulating logic system design, electronic device and storage medium | |
US20070266355A1 (en) | Distributed simultaneous simulation | |
Yasudo et al. | Performance estimation for exascale reconfigurable dataflow platforms | |
CN118468798B (en) | Method and device for generating check point, electronic equipment and storage medium | |
Abdurohman et al. | Software for Simplifying Embedded System Design Based on Event-Driven Method | |
US20100057429A1 (en) | Method and apparatus for parallelization of sequential power simulation | |
CN115935870A (en) | Power consumption analysis method and device, electronic equipment and storage medium | |
CN115309502A (en) | Container scheduling method and device | |
US7865348B1 (en) | Performance of circuit simulation with multiple combinations of input stimuli | |
George et al. | An Integrated Simulation Environment for Parallel and Distributed System Prototyping | |
Banerjee et al. | Design aware scheduling of dynamic testbench controlled design element accesses in FPGA-based HW/SW co-simulation systems for fast functional verification | |
US11275875B2 (en) | Co-simulation repeater with former trace data | |
Uddin et al. | Analytical-based high-level simulation of the microthreaded many-core architectures | |
CN113627107A (en) | Method, apparatus, electronic device, and medium for determining power supply voltage data | |
CN118069374B (en) | Method, device, equipment and medium for accelerating intelligent training simulation transaction of data center | |
CN116451625B (en) | Apparatus and method for joint simulation of RTL and netlist with SDF | |
CN117933155B (en) | Multi-process joint simulation system and method, electronic equipment and storage medium | |
JP5390464B2 (en) | Simulation apparatus, simulation apparatus control method, and program | |
CN108604205B (en) | Test point creating method, device and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||