CN115146582A - Simulation method, simulation device, electronic apparatus, and computer-readable storage medium - Google Patents


Info

Publication number
CN115146582A
Authority
CN
China
Prior art keywords
simulation
component
slices
computing device
processes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210921285.7A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Biren Intelligent Technology Co Ltd
Original Assignee
Shanghai Biren Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Biren Intelligent Technology Co Ltd
Priority to CN202210921285.7A
Publication of CN115146582A
Legal status: Pending (current)



Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/39 Circuit design at the physical level
    • G06F30/398 Design verification or optimisation, e.g. using design rule check [DRC], layout versus schematics [LVS] or finite element methods [FEM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2115/00 Details relating to the type of the circuit
    • G06F2115/02 System on chip [SoC] design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2115/00 Details relating to the type of the circuit
    • G06F2115/10 Processors

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A simulation method for a computing device, a simulation device for a computing device, an electronic apparatus, and a computer-readable storage medium are provided. The simulation method for the computing device includes: obtaining a plurality of code simulation slices for the computing device, where the plurality of code simulation slices include a plurality of component simulation slices, each of which simulates a portion of the functional components in the computing device, and the plurality of code simulation slices are compiled into a plurality of different executable programs; and executing the plurality of different executable programs in parallel to obtain a plurality of simulation processes, which interact with each other to verify the computing device. The method can improve simulation efficiency, effectively shorten the simulation time of large chips, and enable simulation verification of the chip's functions as a whole.

Description

Simulation method, simulation device, electronic apparatus, and computer-readable storage medium
Technical Field
Embodiments of the present disclosure relate to a simulation method for a computing device, a simulation device for a computing device, an electronic apparatus, and a computer-readable storage medium.
Background
With the rapid development of the integrated circuit industry, chip complexity has increased greatly and the requirements for chip functional verification have become increasingly stringent. At present, chip design verification is mainly performed with simulation tools such as VCS (Verilog Compiler Simulator, a compiled Verilog simulator) or Cadence tools.
Disclosure of Invention
At least one embodiment of the present disclosure provides a simulation method for a computing device, including: obtaining a plurality of code simulation slices for the computing device, where the plurality of code simulation slices include a plurality of component simulation slices, each component simulation slice simulates a portion of the functional components in the computing device, and the plurality of code simulation slices are compiled into a plurality of different executable programs; and executing the plurality of different executable programs in parallel to obtain a plurality of simulation processes, where the plurality of simulation processes interact with each other to verify the computing device.
For example, in a simulation method for a computing device provided by an embodiment of the present disclosure, the plurality of simulation processes communicate through an inter-process communication channel.
For example, in a simulation method for a computing device provided by an embodiment of the present disclosure, the simulation process corresponding to each component simulation slice includes a process communication interface for communicating with other simulation processes among the plurality of simulation processes through the inter-process communication channel.
For example, in a simulation method for a computing device provided by an embodiment of the present disclosure, the plurality of simulation processes are distributed for execution on at least one simulation computing device, and each simulation computing device executes at least one simulation process.
For example, a simulation method for a computing device provided by an embodiment of the present disclosure further includes: in response to a first code simulation slice among the plurality of code simulation slices needing modification, obtaining the modified first code simulation slice and a modified executable program compiled from it, to replace the original first code simulation slice and its executable program.
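The replacement step above implies that only the modified slice has to be recompiled, while the other executables stay as they are. A minimal sketch of detecting which slices are stale, assuming (for illustration only) that each slice's source is available as text and that a SHA-256 digest of the last-built source is recorded per slice; `source_digest` and `slices_to_rebuild` are hypothetical helpers, not part of the disclosure:

```python
import hashlib


def source_digest(text):
    """SHA-256 digest of a slice's source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def slices_to_rebuild(sources, built_digests):
    """Names of slices whose source changed (or was never built) since the last build.

    sources:       {slice name: current source text}
    built_digests: {slice name: digest of the source used for its current executable}
    """
    return sorted(
        name
        for name, text in sources.items()
        if built_digests.get(name) != source_digest(text)
    )
```

Only the names returned would be recompiled and their executables swapped in; every other slice keeps its existing executable.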
For example, in a simulation method for a computing device provided by an embodiment of the present disclosure, the plurality of code simulation slices further include a test case simulation slice; executing the plurality of different executable programs in parallel to obtain a plurality of simulation processes that interact with each other includes: executing the executable program corresponding to the test case simulation slice and the executable programs corresponding to the component simulation slices in parallel to obtain a test case simulation process and a plurality of component simulation processes, and having the test case simulation process interact with the component simulation processes, so that the test case simulation process sends test commands and/or data to the component simulation processes according to the test case.
For example, in a simulation method for a computing device provided by an embodiment of the present disclosure, the functional components of the computing device include a computing component and a support component, where the support component includes at least one of a storage component, a control component, and an interconnect component; the plurality of component simulation slices include at least one compute component slice and at least one top-level slice, each compute component slice including a compute module for simulating the computing component, and each top-level slice including a support module for simulating the support component.
For example, in a simulation method for a computing device provided by an embodiment of the present disclosure, the top-level slice further includes a first control module, and each compute component slice further includes a second control module; the first control module and the second control module are used to control the simulation behavior of their respective simulation processes.
For example, in a simulation method for a computing device provided by an embodiment of the present disclosure, the computing device includes a multi-core processor including a plurality of cores, each of the plurality of cores serving as one computing component; each compute component slice individually simulates one or more cores of the multi-core processor, or each core of the multi-core processor is simulated by one or more of the compute component slices.
For example, in a simulation method for a computing device provided by an embodiment of the present disclosure, the multi-core processor further includes a plurality of support components; each top-level slice simulates one or more of the support components of the multi-core processor.
For example, in a simulation method for a computing device provided by an embodiment of the present disclosure, the computing device includes a multi-die device including a plurality of dies, at least one of which includes one or more cores, each core serving as one computing component; each compute component slice simulates one or more cores of the multi-die device, or each core of the multi-die device is simulated by one or more of the compute component slices.
For example, in a simulation method for a computing device provided by an embodiment of the present disclosure, at least one of the plurality of dies further includes a plurality of support components; each top-level slice simulates one or more of the support components of the multi-die device.
For example, in a simulation method for a computing device provided by an embodiment of the present disclosure, the computing device includes a heterogeneous integrated chip including a plurality of chiplets, at least one of which includes one or more cores, each core serving as one computing component; each compute component slice simulates one or more cores of the heterogeneous integrated chip, or each core of the heterogeneous integrated chip is simulated by one or more of the compute component slices.
For example, in a simulation method for a computing device provided by an embodiment of the present disclosure, at least one of the plurality of chiplets further includes a plurality of support components; each top-level slice simulates one or more of the support components of the heterogeneous integrated chip.
For example, in a simulation method for a computing device provided by an embodiment of the present disclosure, the heterogeneous integrated chip includes at least two chiplets that implement different types of processors.
At least one embodiment of the present disclosure provides a simulation device for a computing device, including a plurality of code simulation slices and a process execution unit. The plurality of code simulation slices include a plurality of component simulation slices, each of which simulates a portion of the functional components in the computing device, and the plurality of code simulation slices are compiled into a plurality of different executable programs; the process execution unit is configured to execute the plurality of different executable programs in parallel to obtain a plurality of simulation processes, which interact with each other to verify the computing device.
At least one embodiment of the present disclosure provides an electronic apparatus, including a processor and a memory including one or more computer program modules, where the one or more computer program modules are stored in the memory and configured to be executed by the processor, and include instructions for implementing the simulation method for a computing device provided by any embodiment of the present disclosure.
At least one embodiment of the present disclosure provides a computer-readable storage medium storing non-transitory computer-readable instructions that, when executed by a computer, implement a simulation method for a computing device provided by any of the embodiments of the present disclosure.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly introduced below, and it is apparent that the drawings in the following description only relate to some embodiments of the present disclosure and do not limit the present disclosure.
FIG. 1 illustrates a flowchart of a simulation method for a computing device according to at least one embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating the deployment and operation of simulation processes provided by at least one embodiment of the present disclosure;
FIG. 3 illustrates a schematic diagram of a distributed simulation platform for a multi-core processor provided by at least one embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating the deployment and operation of the simulation processes of a multi-core processor provided by at least one embodiment of the present disclosure;
FIG. 5 illustrates a schematic diagram of a distributed simulation platform for a multi-die device provided by at least one embodiment of the present disclosure;
FIG. 6 is a schematic diagram illustrating the deployment and operation of the simulation processes of a multi-die processor provided by at least one embodiment of the present disclosure;
FIG. 7 is a schematic diagram illustrating a distributed simulation platform for a heterogeneous integrated chip provided by at least one embodiment of the present disclosure;
FIG. 8 is a schematic diagram illustrating the deployment and operation of the simulation processes of a multi-kernel processor provided by at least one embodiment of the present disclosure;
FIG. 9 illustrates a schematic diagram of a plurality of simulation slices provided by at least one embodiment of the present disclosure;
FIG. 10 illustrates a schematic diagram of a simulation deployment and operation provided by at least one embodiment of the present disclosure;
FIG. 11 illustrates a schematic diagram of another simulation deployment and operation provided by at least one embodiment of the present disclosure;
FIG. 12 illustrates a schematic block diagram of a simulation device for a computing device according to at least one embodiment of the present disclosure;
FIG. 13 illustrates a schematic block diagram of an electronic apparatus provided by at least one embodiment of the present disclosure;
FIG. 14 illustrates a schematic block diagram of another electronic apparatus provided by at least one embodiment of the present disclosure; and
FIG. 15 illustrates a schematic diagram of a computer-readable storage medium provided by at least one embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings of the embodiments of the present disclosure. It is to be understood that the described embodiments are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the disclosure without inventive step, are within the scope of protection of the disclosure.
Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. Also, the use of the terms "a," "an," or "the" and similar referents do not denote a limitation of quantity, but rather denote the presence of at least one. The word "comprising" or "comprises", and the like, means that the element or item preceding the word comprises the element or item listed after the word and its equivalent, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
For large chips, such as GPU (Graphics Processing Unit) chips, multi-die GPU chips, and very large chips based on Chiplet designs, the growing complexity of the system exposes many shortcomings of simulators in terms of simulation efficiency, compile time, dependence on server resources, and the like. For example, in conventional simulation technology, no matter how large the chip's code is, the whole design must be loaded into one simulator for compilation and simulation, and the simulation is limited to single-machine, single-process execution. With the existing simulation platform architecture, the larger the chip, the longer its simulation takes; it often takes a week, a month, or even longer to finish simulating a single test case, which makes development inefficient.
In addition, for a very large chip (particularly a GPU), the code is so large that it easily reaches the upper limit of the compiler, so that the compiler cannot compile it. One solution to this problem is to divide the chip into several parts and compile and simulate each part independently. However, this approach can only verify each functional module of the chip separately; because the parts (functional modules) do not interact, the functions of the whole chip cannot be verified by simulation as a whole, which is a significant limitation.
At least one embodiment of the present disclosure provides a simulation method for a computing device, a simulation device for a computing device, an electronic apparatus, and a computer-readable storage medium. The simulation method for the computing device includes: obtaining a plurality of code simulation slices for the computing device, where the plurality of code simulation slices include a plurality of component simulation slices, each of which simulates a portion of the functional components in the computing device, and the plurality of code simulation slices are compiled into a plurality of different executable programs; and executing the plurality of different executable programs in parallel to obtain a plurality of simulation processes, which interact with each other to verify the computing device as a whole. In this simulation method, the chip is divided into a plurality of parts that are simulated in different processes, which improves simulation efficiency, effectively shortens the simulation time of large chips, and improves development efficiency; moreover, inter-process communication makes it possible to verify the functions of the chip as a whole.
FIG. 1 illustrates a flowchart of a simulation method for a computing device according to at least one embodiment of the present disclosure.
As shown in FIG. 1, the simulation method may include steps S110 and S120.
Step S110: obtain a plurality of code simulation slices for the computing device, where the plurality of code simulation slices include a plurality of component simulation slices, each component simulation slice simulates a portion of the functional components in the computing device, and the plurality of code simulation slices are compiled into a plurality of different executable programs.
Step S120: execute the plurality of different executable programs in parallel to obtain a plurality of simulation processes, where the plurality of simulation processes interact with each other to verify the computing device.
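Steps S110 and S120 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each compute component slice has already been turned into a stand-alone program (a tiny Python script, `CORE_SLICE_SRC`, stands in for a compiled simulator executable) and that the launching process plays the role of a top-level or test case process exchanging commands with the slices over pipes. The names and the `op a b` command format are hypothetical.

```python
import subprocess
import sys

# Hypothetical stand-in for one compiled compute component slice: it reads
# "op a b" commands from stdin until EOF and writes one result line per command.
CORE_SLICE_SRC = r"""
import sys
core_id = sys.argv[1]
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
while True:
    line = sys.stdin.readline()
    if not line:  # EOF: the driving process closed the channel
        break
    op, a, b = line.split()
    print(f"core{core_id} {OPS[op](int(a), int(b))}", flush=True)
"""


def run_parallel_simulation(num_slices, commands):
    """Start one OS process per slice, send commands round-robin, collect the replies."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", CORE_SLICE_SRC, str(i)],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
        for i in range(num_slices)
    ]
    replies = []
    try:
        for k, cmd in enumerate(commands):
            proc = procs[k % num_slices]
            proc.stdin.write(cmd + "\n")
            proc.stdin.flush()
            replies.append(proc.stdout.readline().strip())
    finally:
        for proc in procs:
            proc.stdin.close()  # EOF tells the slice process to exit
            proc.wait()
            proc.stdout.close()
    return replies
```

For clarity the sketch issues one command at a time; the slice processes nevertheless run concurrently as separate OS processes, and a real platform would let them advance simultaneously.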
For example, in at least one embodiment of the present disclosure, the computing device may include, but is not limited to, at least one type of chip such as a GPU, a CPU (Central Processing Unit), a TPU (Tensor Processing Unit), a DPU (Deep learning Processing Unit), or an AI accelerator; for another example, in at least one embodiment of the present disclosure, the computing device may be a multi-core chip, a multi-die chip, a chiplet-based heterogeneous integrated chip, or the like. In at least one embodiment of the present disclosure, the computing device may also be a single-core chip, a single-die chip, or the like. It should be noted that, in practical applications, the type of the computing device may be chosen according to actual requirements; the computing device may be a chip of any architecture and is not limited to the types described in the embodiments of the present disclosure.
For example, the computing device may include various functional components, such as one or more of a computing component, a storage component, an interconnect component, and the like, and there may be one or more of each functional component.
For example, the functional components included in the computing device may be described in a programming language (e.g., C++, Verilog, SystemVerilog, etc.) to obtain a plurality of component simulation slices, where each component simulation slice simulates, in the form of code, a portion of the functional components in the computing device. For example, in some embodiments, the functional components of the computing device may be divided into a plurality of groups, each containing one or more functional components, and one component simulation slice may be written for each group, so that each component simulation slice includes the code simulating the functional components in one group. Different component simulation slices may correspond to the same or different numbers of functional components, and the division into component simulation slices may be determined according to the type of the computing device and/or actual needs.
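A minimal sketch of the grouping step, assuming that functional components are identified by name strings and that each group is a run of consecutive components (both illustrative assumptions; `ComponentSimSlice` and `group_into_slices` are hypothetical names). Group sizes may differ, matching the text above.

```python
from dataclasses import dataclass


@dataclass
class ComponentSimSlice:
    name: str
    components: list  # names of the functional components simulated by this slice


def group_into_slices(components, group_sizes):
    """Split an ordered list of functional components into consecutive groups,
    producing one component simulation slice per group."""
    if any(size < 1 for size in group_sizes):
        raise ValueError("every slice must simulate at least one component")
    if sum(group_sizes) != len(components):
        raise ValueError("group sizes must cover every component exactly once")
    slices, start = [], 0
    for idx, size in enumerate(group_sizes):
        slices.append(ComponentSimSlice(f"slice{idx}", components[start:start + size]))
        start += size
    return slices
```

For example, `group_into_slices(["core0", "core1", "core2", "l2_cache", "noc"], [1, 2, 2])` puts one core in the first slice, two cores in the second, and the cache plus interconnect in the third.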
For example, after the plurality of code simulation slices are obtained, a corresponding executable program may be generated from the code of each code simulation slice. For example, the code may be converted into an executable program through four phases: preprocessing, compiling, assembling, and linking. For the specific way of compiling code into an executable program, reference may be made to the related art, and details are not repeated here.
For example, when simulation verification of the computing device is required, the plurality of executable programs are executed in parallel to obtain a plurality of simulation processes; that is, the executable programs are executed in a multi-process manner. Each executable program may include an interface for calling the simulator, so that each simulation process can be started independently. A process is a basic unit that can run independently, the basic unit of system resource allocation and scheduling, and the execution entity of a program. For example, a process may contain one or more threads, and multiple thread tasks may be executed in parallel within one process.
For example, while the computing device is being verified, the simulation processes may communicate directly or indirectly; for example, they may communicate through an inter-process communication channel to pass messages such as a clock signal and simulation behavior signals between them. Note, however, that at least one embodiment of the present disclosure does not require every pair of simulation processes to communicate. Because the simulation processes run in parallel and can exchange messages, the functional components of the simulated chip are interconnected and jointly take part in the simulation verification, so that the functions of the whole chip are verified as a whole.
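Assuming a slice is written in C++ and built with GCC (an assumption; Verilog or SystemVerilog slices would go through the simulator's own compile flow instead), the four phases map onto separate `g++` invocations using the `-E`, `-S`, and `-c` options and a final link. The sketch below only builds the command lines; the file naming (`<slice>.cpp`, `sim_<slice>`) is hypothetical.

```python
def four_phase_commands(slice_name, compiler="g++"):
    """Return the preprocess, compile, assemble and link commands for one slice."""
    src = f"{slice_name}.cpp"
    return [
        [compiler, "-E", src, "-o", f"{slice_name}.ii"],                 # preprocessing
        [compiler, "-S", f"{slice_name}.ii", "-o", f"{slice_name}.s"],  # compiling to assembly
        [compiler, "-c", f"{slice_name}.s", "-o", f"{slice_name}.o"],   # assembling
        [compiler, f"{slice_name}.o", "-o", f"sim_{slice_name}"],       # linking
    ]


def build_plan(slice_names, compiler="g++"):
    """One independent four-phase build per code simulation slice."""
    return {name: four_phase_commands(name, compiler) for name in slice_names}
```

Each command list could be run with `subprocess.run(cmd, check=True)`; in practice a single `g++ slice0.cpp -o sim_slice0` performs all four phases at once. Because every slice builds independently, the builds can also run in parallel.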
For example, the inter-process communication channel may use the IPC (Inter-Process Communication) mechanisms of the operating system itself (e.g., Unix, Linux, or Windows), including pipes (FIFOs), shared memory, semaphores, message queues, and sockets. In addition, the inter-process communication channel may use the communication mechanisms of an open-source or in-house framework, including inproc (intra-process communication), ipc (multi-process communication on a single machine), and TCP (TCP-based socket or message-queue communication) transports.
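The disclosure leaves the channel open, so the following is only a minimal sketch of one option, assumed for illustration: length-prefixed JSON messages over a socket. `socket.socketpair()` returns two connected endpoints; in a real deployment each end would belong to a different simulation process, or be replaced by a TCP connection between servers. The message fields such as `type` and `cycle` are hypothetical.

```python
import json
import socket
import struct


def send_msg(sock, msg):
    """Send one message as a 4-byte big-endian length prefix followed by a JSON body."""
    body = json.dumps(msg).encode("utf-8")
    sock.sendall(struct.pack(">I", len(body)) + body)


def _recv_exact(sock, n):
    """Read exactly n bytes, since recv() may return fewer bytes than requested."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the channel")
        buf += chunk
    return buf


def recv_msg(sock):
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))
```

The length prefix matters because a stream socket does not preserve message boundaries; without it, two back-to-back messages could be read as one.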
For example, embodiments of the present disclosure replace signal transmission between modules inside the chip with inter-process message passing, so that during simulation the chip can be cut into a plurality of slices, without breaking its architecture or connectivity, and simulated in a distributed manner in different processes. Current mainstream inter-process communication technologies offer large message capacity and high transmission efficiency, which meets the real-time requirements of distributed simulation.
According to the simulation method of at least one embodiment of the present disclosure, the chip is divided into a plurality of parts that are simulated in different processes, which improves simulation efficiency and effectively shortens the simulation time of large chips; moreover, inter-process communication makes it possible to verify the functions of the chip as a whole.
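One way to picture replacing wires with messages, as a sketch under assumed names: the signals crossing a slice boundary are sampled once per clock cycle, only the signals whose value changed are packed into a message, and the receiving slice updates its local mirror of those signals. The signal names and the per-cycle delta encoding are illustrative assumptions, not the disclosed message format.

```python
def pack_boundary_signals(cycle, current, previous):
    """Build one message per clock cycle holding only boundary signals whose value changed."""
    changed = {name: value for name, value in current.items() if previous.get(name) != value}
    return {"cycle": cycle, "signals": changed}


def apply_boundary_message(mirror, msg):
    """Return the receiver's updated copy of the boundary signals (the input is not mutated)."""
    updated = dict(mirror)
    updated.update(msg["signals"])
    return updated
```

Sending only changed signals keeps messages small when most boundary wires are idle; a design that needs strict per-cycle synchronization would still send a message every cycle, even an empty one, so that the receiver knows the cycle has completed.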
For example, the plurality of simulation processes may be distributed for execution on at least one simulation computing device, each simulation computing device executing at least one simulation process.
For example, in some embodiments, the plurality of simulation processes may run on the same simulation computing device, such as a server or a standalone host, so that the distributed simulation platform is deployed on a single server or host.
For example, in other embodiments, the plurality of simulation processes may be distributed for execution on two or more simulation computing devices, each executing at least one simulation process. FIG. 2 is a schematic diagram illustrating the deployment and operation of simulation processes according to at least one embodiment of the present disclosure. As shown in FIG. 2, the simulation processes can be deployed across two or more servers or clients according to each process's demand for simulation resources. For example, the chip slice 1 simulation process through the chip slice s simulation process shown in FIG. 2 (s is an integer greater than 1) may be distributed across s servers or clients, each running one simulation process.
As described above, in conventional simulation technology the resources available for simulation are capped by the hardware of a single simulation computing device (e.g., a server), so the simulation rate is limited by that device's highest hardware configuration and cannot be improved further; the peak resource consumption during simulation may even exceed the server's highest configuration, causing the simulation to fail. With the multi-process parallel simulation technology of at least one embodiment of the present disclosure, simulation processes can be distributed across a plurality of servers or clients and run in parallel. The hardware resources of multiple servers or clients can therefore be used to overcome the resource ceiling of a single simulation computing device, which would otherwise constrain the simulation rate. Thus, at least one embodiment of the present disclosure solves the problem in the related art of excessively long simulation time or simulation failure caused by the simulation being executable only on a single server, reduces the dependence on server resources, and further improves simulation efficiency. For example, with a suitable chip slicing scheme, the number of chip slices can be increased further and the slices deployed to more servers, using more server hardware resources to further increase the simulation rate.
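The disclosure does not specify how processes are assigned to servers "according to the occupation requirement of each simulation process". As an assumed illustration only, the sketch below uses a simple greedy heuristic with memory as the single resource: place the most demanding process first, always on the server with the most free memory, and fail if it does not fit. The process names, server names, and GB figures are hypothetical.

```python
def assign_processes(demands_gb, capacities_gb):
    """Greedy placement: largest process first, onto the server with the most free memory.

    demands_gb:    {simulation process name: memory it needs}
    capacities_gb: {server name: memory available}
    Returns {simulation process name: server name}.
    """
    free = dict(capacities_gb)
    placement = {}
    for proc, need in sorted(demands_gb.items(), key=lambda kv: kv[1], reverse=True):
        host = max(free, key=free.get)  # ties go to the server listed first
        if free[host] < need:
            raise RuntimeError(f"no server can fit {proc} ({need} GB)")
        free[host] -= need
        placement[proc] = host
    return placement
```

With demands of 64, 32, 32, and 32 GB, two 96 GB servers can host all four processes, while a single 128 GB server cannot. This mirrors the point above that spreading slices across machines lifts the single-server resource ceiling.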
For example, the functional components of the computing device include computing components and support components, where the support components include at least one of storage components (e.g., one or more levels of cache), control components, and interconnect components (e.g., a bus or a network-on-chip). For example, a computing core (core) or computing unit (element) in a GPU that performs operations on image data may be called a computing component. The support components are used, for example, for data storage, data transmission, and the like.
For example, the plurality of component simulation slices in step S110 include at least one compute component slice and at least one top-level slice. The compute component slices are used to simulate the computing components, each compute component slice including a compute module for simulating a computing component. The top-level slices are used to simulate the support components, each top-level slice including a support module for simulating a support component.
For example, the computing device includes a multi-core processor that includes a plurality of cores, each of which serves as one computing component. Each compute component slice individually simulates one or more cores of the multi-core processor, or each core of the multi-core processor is simulated by one or more of the compute component slices. For example, the multi-core processor also includes a plurality of support components, and each top-level slice simulates one or more support components of the multi-core processor.
Fig. 3 shows a schematic diagram of a distributed simulation platform for a multicore processor provided by at least one embodiment of the present disclosure.
As shown in fig. 3, the GPU top-level simulation process and the GPU core 0 to GPU core n simulation processes are all component simulation processes. The GPU top-level simulation process corresponds, for example, to a top-level slice, and the GPU core 0 to GPU core n simulation processes correspond, respectively, to a plurality of compute component slices. The multi-core processor may be, for example, a multi-core GPU or a multi-core CPU; the embodiments of the present disclosure take a multi-core GPU as an example. The multi-core GPU includes GPU cores 0 to n, each of which may serve as a computing component. Depending on actual requirements, the GPU cores can be partitioned in different combinations and integrated into a plurality of simulation processes. For example, as shown in fig. 3, one component simulation slice may be formed for each of GPU core 0 to GPU core n, giving n+1 component simulation slices, each including a computing module that simulates one GPU core; the GPU core 0 to GPU core n modules in fig. 3 represent these computing modules. Running the n+1 compute component slices yields n+1 component simulation processes (the GPU core 0 simulation process to the GPU core n simulation process).
For example, in some embodiments the GPU cores may be grouped according to actual needs (e.g., two or four cores per group), with the GPU cores of each group integrated into the same simulation process. For example, one compute component slice may be formed for every two of GPU core 0 to GPU core n.
For example, in other embodiments a plurality of compute component slices may be formed for one GPU core, each slice simulating a portion of that GPU core.
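The three slicing options above (one core per slice, several cores per slice, or one core split across several slices) can be captured in a small planning helper. The sketch below is a hypothetical illustration, not part of the disclosure: `ComputeSlice` and `plan_compute_slices` are invented names, and a real flow would map each plan entry onto actual chip code modules.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComputeSlice:
    """Hypothetical description of one compute component slice."""
    name: str
    cores: tuple[int, ...]          # GPU core indices covered by this slice
    part: Optional[int] = None      # set only when one core is split across slices


def plan_compute_slices(n_cores: int, cores_per_slice: int = 1,
                        parts_per_core: int = 1) -> list[ComputeSlice]:
    """Plan compute component slices for GPU core 0 .. GPU core n_cores-1."""
    if n_cores < 1 or cores_per_slice < 1 or parts_per_core < 1:
        raise ValueError("all counts must be positive")
    if cores_per_slice > 1 and parts_per_core > 1:
        raise ValueError("either group cores or split cores, not both")

    slices: list[ComputeSlice] = []
    if parts_per_core > 1:
        # Each core is simulated by several slices, each covering a portion of it.
        for core in range(n_cores):
            for part in range(parts_per_core):
                slices.append(ComputeSlice(f"core{core}.part{part}", (core,), part))
        return slices

    # One slice per group of cores; the last group may be smaller.
    for start in range(0, n_cores, cores_per_slice):
        group = tuple(range(start, min(start + cores_per_slice, n_cores)))
        slices.append(ComputeSlice("core" + "_".join(map(str, group)), group))
    return slices
```

For example, `plan_compute_slices(8, cores_per_slice=2)` yields four slices, so eight GPU cores would run as four compute component simulation processes.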
For example, the multi-core GPU further includes a plurality of support components, such as a DMA (Direct Memory Access) module, an SMMU (System Memory Management Unit), connections (various interconnection modules), a video processing module, a CP (Command Processor) module, and a NOC (network-on-chip) module. A top-level slice can be formed that includes a top-level module simulating these support components. The top module shown in fig. 3 is the main framework of the GPU chip code with all GPU core modules stripped out, while the NOC and the other functional modules (e.g., DMA and SMMU) are retained. Running the executable program corresponding to the top-level slice yields the GPU top-level simulation process.
For example, in some embodiments the support components may be grouped according to actual needs (e.g., one or two support components per group), with each group integrated into its own simulation process. For example, a top-level slice may be formed for each support component, with a corresponding simulation process obtained from it.
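The top-level slicing decision (strip every core module, keep the support modules, and optionally split them into groups) can be sketched as follows. The module inventory format and the function name `make_top_level_slices` are hypothetical; in practice the top-level slice is produced by editing the chip code itself, and this only models which modules end up in which slice.

```python
from __future__ import annotations

from typing import Optional


def make_top_level_slices(modules: dict[str, str],
                          group_size: Optional[int] = None) -> list[list[str]]:
    """Strip core modules and group the remaining support modules into top-level slices.

    `modules` maps a module name to its kind ("core" or "support"), in netlist order.
    With group_size=None all support modules stay in one top-level slice.
    """
    support = [name for name, kind in modules.items() if kind == "support"]
    if group_size is None:
        return [support]
    if group_size < 1:
        raise ValueError("group_size must be positive")
    return [support[i:i + group_size] for i in range(0, len(support), group_size)]


gpu = {"core0": "core", "core1": "core", "DMA": "support", "SMMU": "support",
       "CP": "support", "NOC": "support"}
print(make_top_level_slices(gpu))      # [['DMA', 'SMMU', 'CP', 'NOC']]
print(make_top_level_slices(gpu, 2))   # [['DMA', 'SMMU'], ['CP', 'NOC']]
```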
For example, the plurality of code simulation slices in step S110 may include a test case simulation slice in addition to the plurality of component simulation slices. The test case simulation slice may provide a test case interface, for example, through which a user may edit a test case.
For example, step S120 includes: executing, in parallel, the executable program corresponding to the test case simulation slice and the executable programs corresponding to the component simulation slices to obtain a test case simulation process and a plurality of component simulation processes, and having the test case simulation process interact with the component simulation processes so that it sends test commands and/or data to them according to the test case.
For example, as shown in fig. 3, the test case simulation process runs in parallel with the GPU top-level simulation process and the GPU core 0 to GPU core n simulation processes. The test case simulation process can communicate with the GPU top-level simulation process through an inter-process communication channel, and with the GPU core 0 to GPU core n simulation processes via the GPU top-level simulation process. For example, the test case simulation process also includes a process communication interface through which it exchanges signals with the GPU top-level simulation process and each GPU core simulation process, and which may also be used for mutual control among the simulation processes.
For example, suppose the test case requires each GPU core to perform an operation (e.g., binarization or rendering) on a test image. The test case simulation process may send the test image data and an operation command to the GPU top-level simulation process according to the test case. The GPU top-level simulation process then directs the GPU core 0 to GPU core n simulation processes to execute the test case. The processing result may be fed back to the test case simulation process, or to another simulation process (e.g., one dedicated to result comparison), to verify whether it meets expectations. In some embodiments, the test case simulation process may also communicate directly with the GPU core 0 to GPU core n simulation processes.
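To make this message flow concrete, here is a minimal sketch in which Python threads and `queue.Queue` objects stand in for the simulation processes and the inter-process communication channels. A real platform would use separate OS processes, possibly on different servers. All function names are hypothetical, and pixel binarization stands in for the operation under test.

```python
import queue
import threading


def core_process(core_id, inbox, results):
    """Stand-in for one GPU core simulation process."""
    while True:
        msg = inbox.get()
        if msg is None:                      # shutdown request from the top level
            return
        cmd, pixels, threshold = msg
        if cmd == "binarize":
            out = [1 if p >= threshold else 0 for p in pixels]
        else:
            out = None                       # unsupported command in this sketch
        results.put((core_id, out))


def top_level_process(n_cores, from_test, to_test):
    """Stand-in for the GPU top-level simulation process: fans work out to the cores."""
    inboxes = [queue.Queue() for _ in range(n_cores)]
    results = queue.Queue()
    cores = [threading.Thread(target=core_process, args=(i, inboxes[i], results))
             for i in range(n_cores)]
    for t in cores:
        t.start()

    cmd, image, threshold = from_test.get()
    chunk = -(-len(image) // n_cores)        # ceiling division
    for i, inbox in enumerate(inboxes):
        inbox.put((cmd, image[i * chunk:(i + 1) * chunk], threshold))
    parts = dict(results.get() for _ in range(n_cores))

    for inbox in inboxes:
        inbox.put(None)
    for t in cores:
        t.join()
    # Reassemble in core order and feed the result back to the test case process.
    if any(parts[i] is None for i in range(n_cores)):
        to_test.put(None)
    else:
        to_test.put([p for i in range(n_cores) for p in parts[i]])


def run_test_case(image, threshold, n_cores=4):
    """Stand-in for the test case simulation process: sends data and a command, checks the result."""
    to_top, from_top = queue.Queue(), queue.Queue()
    top = threading.Thread(target=top_level_process, args=(n_cores, to_top, from_top))
    top.start()
    to_top.put(("binarize", image, threshold))
    got = from_top.get()
    top.join()
    expected = [1 if p >= threshold else 0 for p in image]   # golden reference model
    return got, got == expected
```

Keeping the golden reference inside the test case process mirrors the separation described above: the test author only needs the command format, not the internals of the top-level or core processes.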
The simulation method of at least one embodiment of the disclosure runs the test case as a separate process in which testers write test commands. On the one hand, testers can write test commands without knowing the architecture of the whole simulation platform, which reduces labor cost and improves efficiency. On the other hand, testers can modify and upgrade the test case code on its own without corresponding changes to the code of other parts, and the test case can be reused in different scenarios, which improves its generality.
For example, the simulation process corresponding to each component simulation slice includes a process communication interface for communicating with other simulation processes of the plurality of simulation processes via an inter-process communication channel.
For example, as shown in fig. 3, each of the GPU core 0 to GPU core n simulation processes includes a process communication interface that handles message transmission between that simulation process and the others, for example signal transmission with the GPU top-level simulation process, so as to keep clock and other signals, as well as simulation time, synchronized with the GPU top-level simulation process. The GPU top-level simulation process also includes a process communication interface, which handles its message transmission with other simulation processes, including signal transmission with the GPU core simulation processes and test request interaction with the test case simulation process.
For example, the top-level slice further includes a first control module, and each compute component slice further includes a second control module; the first and second control modules control the simulation behavior of their respective simulation processes. For example, as shown in fig. 3, the GPU top-level simulation control module is the first control module. It controls the GPU top-level simulation process and may also control the overall timing of the simulation, including but not limited to its start, its end, and behavior during the run (e.g., pausing); for example, it may start or pause each GPU core simulation process. The GPU core 0 to GPU core n simulation control modules are second control modules, each controlling the simulation timing of its own simulation process. Each second control module may communicate with the first control module to synchronize the simulation timing of the simulation processes. The test case simulation process may also achieve timing synchronization with the other simulation processes, for example by sampling key signals (e.g., timing signals). Furthermore, the first and second control modules keep clock and other signals synchronized among the processes.
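One simple way a first control module could keep second control modules aligned is lockstep advancement with a two-phase barrier. The sketch below assumes that scheme; the disclosure does not specify the synchronization protocol. Threads and `threading.Barrier` stand in for processes and IPC messages, and `run_lockstep` is a hypothetical name.

```python
import threading


def run_lockstep(n_components, n_cycles):
    """Advance n_components simulated processes in lockstep under one controller.

    The controller plays the first control module; each component thread plays a
    second control module. A two-phase barrier keeps every component on the same
    simulated cycle. Returns the simulation-time snapshot seen after each cycle.
    """
    barrier = threading.Barrier(n_components + 1)    # +1 for the controller
    sim_time = [0] * n_components
    snapshots = []

    def component(idx):
        for _ in range(n_cycles):
            barrier.wait()            # phase 1: wait for the controller's "advance" grant
            sim_time[idx] += 1        # evaluate one cycle of this slice
            barrier.wait()            # phase 2: report that the cycle is done

    workers = [threading.Thread(target=component, args=(i,)) for i in range(n_components)]
    for w in workers:
        w.start()
    for _ in range(n_cycles):
        barrier.wait()                # release all components into the next cycle
        barrier.wait()                # wait until every component has finished it
        snapshots.append(list(sim_time))   # no component can run ahead here
    for w in workers:
        w.join()
    return snapshots
```

In this scheme, pausing the simulation amounts to the controller withholding the next phase-1 grant, and every snapshot shows all components at the same simulated cycle.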
For example, the GPU top-level simulation process may further include a communication interface BFM (bus functional model), a memory BFM, and a clock generator BFM. The communication interface BFM may be, for example, an AXI/PCIE BFM: an AXI or PCIE interface model implemented in a programming language (e.g., C++) that converts the signal access requests of a simulation process into bus requests conforming to the chip interface timing. The memory BFM is a memory model implemented in a programming language (e.g., C++). The clock generator BFM is a clock generator model implemented in a programming language (e.g., C++) that generates the chip system clock signals required for simulation.
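As an illustration of what converting an access request into bus requests can involve, the hypothetical helper below splits one memory access into AXI4 INCR bursts. It relies on two AXI4 rules: an INCR burst carries at most 256 beats, and a burst must not cross a 4 KB address boundary. For simplicity it assumes a beat-aligned address and length and a power-of-two data width of at most 128 bytes. It does not model handshakes or signal timing and is not the BFM of the disclosure.

```python
AXI_MAX_BEATS = 256       # AXI4 INCR bursts carry at most 256 beats
AXI_BOUNDARY = 4096       # an AXI burst must not cross a 4 KB address boundary


def split_into_axi_bursts(addr, nbytes, bytes_per_beat=32):
    """Split one access into (start_address, AxLEN) pairs, where AxLEN = beats - 1."""
    if bytes_per_beat not in (1, 2, 4, 8, 16, 32, 64, 128):
        raise ValueError("bytes_per_beat must be a power of two up to 128 (1024-bit bus)")
    if addr % bytes_per_beat or nbytes % bytes_per_beat:
        raise ValueError("this sketch assumes beat-aligned address and length")
    bursts = []
    while nbytes > 0:
        to_boundary = AXI_BOUNDARY - addr % AXI_BOUNDARY
        size = min(nbytes, to_boundary, AXI_MAX_BEATS * bytes_per_beat)
        bursts.append((addr, size // bytes_per_beat - 1))
        addr += size
        nbytes -= size
    return bursts
```

For example, a 128-byte access at 0x0FC0 on a 32-byte bus becomes two 2-beat bursts, at 0x0FC0 and 0x1000, because the access straddles the 4 KB boundary at 0x1000.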
For example, in the example shown in fig. 2, the GPU top-level simulation process may serve as the master process of the distributed simulation platform, with the GPU core simulation processes and the test case simulation process as slave processes coordinated by the master process. The master process interconnects with the main body of the chip and controls the overall timing of the simulation platform.
Fig. 4 shows a schematic diagram of the deployment and operation of the simulation processes of a multicore processor provided by at least one embodiment of the present disclosure. It shows how the distributed simulation platform is deployed and run when the simulation method of the embodiments of the present disclosure is applied to a chip with a single-chip multi-core architecture (taking a single GPU as an example). Besides running multiple processes on a single server, in some embodiments the simulation processes (the GPU core 0 to GPU core n simulation processes, the GPU top-level simulation process, and the test case simulation process) may be deployed on a plurality of servers and interconnected through a high-speed network. For example, each simulation process can be deployed to a remote server whose performance suits that process's characteristics.
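Deploying each simulation process to a server that suits its characteristics is a placement problem. The sketch below uses a common greedy heuristic (largest demand first, onto the server with the most free capacity); the disclosure does not prescribe a placement algorithm. The single resource figure per process (e.g., GB of memory), the process and server names, and `assign_processes` are all hypothetical. A real scheduler might also weigh CPU cores and the network distance between processes that communicate heavily.

```python
from __future__ import annotations


def assign_processes(demand: dict[str, int], capacity: dict[str, int]) -> dict[str, str]:
    """Greedy placement: largest demand first, onto the server with the most free capacity."""
    free = dict(capacity)
    placement = {}
    # sorted() is stable, so processes with equal demand keep their input order.
    for proc, need in sorted(demand.items(), key=lambda kv: kv[1], reverse=True):
        server = max(free, key=free.get)
        if free[server] < need:
            raise RuntimeError(f"no server can host {proc} (needs {need})")
        placement[proc] = server
        free[server] -= need
    return placement
```

For example, with `{"top": 64, "core0": 24, "core1": 24, "testcase": 4}` on servers `{"srvA": 128, "srvB": 64}`, the top-level process lands on `srvA` and the core processes spread across both servers.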
For example, the computing device includes a multi-die device with a plurality of dies, at least one of which includes one or more cores (e.g., CPU cores or GPU cores), each core serving as a computing component. Each compute component slice simulates one or more cores of the multi-die device, or each core of the multi-die device is simulated by one or more of the plurality of compute component slices. For example, at least one of the dies further includes a plurality of support components, and each top-level slice simulates one or more support components of the multi-die device.
Fig. 5 is a schematic diagram illustrating a distributed simulation platform for a multi-die device according to at least one embodiment of the present disclosure.
As shown in fig. 5, a die (DIE) is also referred to as a bare die. A multi-DIE GPU is, for example, a higher-performance GPU chip formed by integrating a plurality of GPU dies in the same package through die-to-die (D2D) interconnection technology. The embodiments of the present disclosure take a multi-DIE multi-core GPU (MultiGPU) as an example of the multi-die device. GPU1 to GPUm in the figure represent m dies (m is an integer greater than 1), each of which includes, for example, a plurality of cores. The cores of the m dies can be partitioned in different combinations according to actual requirements and integrated into a plurality of GPU core simulation processes. For example, the GPU cores of the same DIE may be grouped (e.g., two or four cores per group), with each group in the same simulation process. The cores of some dies (e.g., GPU1 in the figure) may be grouped in pairs, with one compute component slice, and hence one simulation process, formed for every two cores. The cores of other dies (e.g., GPUm in the figure) may be grouped in fours, with one compute component slice, and hence one simulation process, formed for every four cores. The GPU1-core 1×2 simulation process represents the simulation process for the 2 cores of the 1st group of GPU1, the GPUm-core 2×4 simulation process represents the simulation process for the 4 cores of the 2nd group of GPUm, and so on.
For example, in some embodiments one compute component slice may be formed for each core of a DIE. In other embodiments, a plurality of compute component slices may be formed for one core of a DIE, each simulating a portion of that core, so that the simulation verification of one core is performed by a plurality of simulation processes.
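The per-DIE grouping and the naming convention of fig. 5 (e.g., GPU1-core 1×2, GPUm-core 2×4) can be reproduced with a short hypothetical helper. It uses an ASCII `x` in place of `×`, treats core indices as local to each DIE, and defaults to one core per process for any DIE without an explicit group size; none of these choices come from the disclosure.

```python
from __future__ import annotations


def plan_multi_die(cores_per_die: dict[str, int],
                   group_size: dict[str, int]) -> dict[str, list[int]]:
    """Map process names like 'GPU1-core 1x2' to the die-local core indices they simulate.

    Dies missing from group_size default to one core per process.
    """
    processes: dict[str, list[int]] = {}
    for die, n_cores in cores_per_die.items():
        g = group_size.get(die, 1)
        if g < 1:
            raise ValueError(f"group size for {die} must be positive")
        for k, start in enumerate(range(0, n_cores, g), start=1):
            cores = list(range(start, min(start + g, n_cores)))
            processes[f"{die}-core {k}x{len(cores)}"] = cores
    return processes
```

For example, `plan_multi_die({"GPU1": 4, "GPUm": 8}, {"GPU1": 2, "GPUm": 4})` yields the processes GPU1-core 1x2, GPU1-core 2x2, GPUm-core 1x4, and GPUm-core 2x4, the last covering cores 4 to 7 of GPUm.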
For example, the support components of the plurality of DIEs may be integrated into one simulation process (e.g., the MultiGPU top-level simulation process shown in fig. 5). This process serves, for example, as the master process of the distributed simulation platform, interconnects the GPU DIEs, and controls the overall timing of the simulation platform. The MultiGPU top-level module includes a GPU1 top-level module, a GPU2 top-level module, …, and a GPUm top-level module. Each of these is the main framework of the code of the corresponding GPU DIE (GPU1, GPU2, …, GPUm) with all GPU core modules stripped out and the NOC and other functional modules retained, and the GPU DIEs are connected through a high-speed protocol (e.g., a D2D protocol). In other embodiments, the simulation of the support components of the GPU DIEs may be distributed among a plurality of simulation processes; for example, one top-level slice may be formed for each GPU DIE, each simulating the support components of that DIE.
For example, the MultiGPU top-level simulation process further includes AXI/PCIE BFMs, memory BFMs, and a clock generator BFM. Each GPU DIE may have its own set of AXI/PCIE BFM and memory BFM, while the simulation clocks of all GPU DIEs may be generated by the same clock generator BFM; that is, the GPU DIEs may share one clock generator BFM. In some embodiments, at least one clock generator BFM may instead be provided for each GPU DIE, in which case the clock generator BFMs of the GPU DIEs perform cross-domain clock coordination to synchronize the clock signals of the GPU DIEs. The functions of the AXI/PCIE BFM, memory BFM, and clock generator BFM are as described above for the respective BFMs.
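When several GPU DIEs share one clock generator BFM, every DIE clock can be derived from a single simulation time base. The hypothetical sketch below merges the rising edges of each DIE's clock in time order. It assumes integer picosecond periods, ideal edges without jitter, and a first rising edge at t = 0 for every clock. Because all clocks come from one generator, coinciding edges line up exactly; for 1000 ps and 1500 ps periods this happens every 3000 ps.

```python
from __future__ import annotations

import heapq


def shared_clock_edges(periods_ps: dict[str, int], until_ps: int) -> list[tuple[int, str]]:
    """Return (time_ps, die) rising edges up to until_ps, generated from one time base."""
    if any(p <= 0 for p in periods_ps.values()):
        raise ValueError("clock periods must be positive")
    heap = [(0, die) for die in periods_ps]     # first rising edge of every clock at t = 0
    heapq.heapify(heap)
    edges = []
    while heap and heap[0][0] <= until_ps:
        t, die = heapq.heappop(heap)
        edges.append((t, die))
        heapq.heappush(heap, (t + periods_ps[die], die))   # schedule this clock's next edge
    return edges
```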
For example, each simulation process includes a simulation control module and a process communication interface. The GPU1-core 1×2 simulation process includes a GPU1-core 1×2 simulation control module and its process communication interface, and the MultiGPU top-level simulation process includes a MultiGPU top-level simulation control module and its process communication interface. The simulation control module controls the simulation behavior of its simulation process, and the process communication interface handles message transmission between that process and the other simulation processes. For example, the MultiGPU top-level simulation control module also controls the overall timing of the simulation of the individual DIEs, including but not limited to its start, its end, and behavior control during the run. The process communication interface of the MultiGPU top-level simulation process also handles, for example, test request interaction with the test case simulation process. For the simulation control modules and process communication interfaces, see fig. 3 and the description above of the second control module and the process communication interface.
For example, the test case simulation process may exchange signals with the MultiGPU top-level simulation process and with the core simulation processes of each GPU DIE through its process communication interface, which may also be used for mutual control among the simulation processes. The test case simulation process can also achieve timing synchronization with the other simulation processes.
Fig. 6 is a schematic diagram illustrating the deployment and operation of the simulation processes of a multi-die processor according to at least one embodiment of the present disclosure. It shows how the distributed simulation platform is deployed and run when the simulation method of the embodiments of the present disclosure is applied to a chip with a multi-DIE multi-core architecture. For example, the simulation processes may be deployed on a plurality of servers and communicate with one another through a high-speed network. The GPU1-core a×2 simulation process represents the simulation process for the 2 cores of the a-th group (a is a positive integer) of GPU1, and the GPU core b×4 simulation process represents the simulation process for the 4 cores of the b-th group (b is a positive integer) of GPU1. For example, each simulation process can be deployed to a remote server whose performance suits that process's characteristics.
For example, the computing device includes a heterogeneous integrated chip with a plurality of core particles (chiplets), at least one of which includes one or more cores (e.g., GPU cores or CPU cores), each core serving as a computing component. Each compute component slice simulates one or more cores of the heterogeneous integrated chip, or each core of the heterogeneous integrated chip is simulated by one or more of the plurality of compute component slices. For example, at least one of the core particles further includes a plurality of support components, and each top-level slice simulates one or more support components of the heterogeneous integrated chip.
For example, a heterogeneous integrated chip contains at least two core particles that implement different types of processors, such as CPUs, GPUs, DPUs, TPUs, and AI accelerators. For example, the heterogeneous integrated chip is an integrated chip based on Chiplet technology: processors fabricated in various process technologies are integrated into the same package through Chiplet technology. The result is, for example, a super-heterogeneous computing chip containing multiple GPUs, CPUs, DPUs, and other processors. As demand for computing power grows, the design scale of heterogeneous integrated chips may become increasingly large.
Fig. 7 is a schematic diagram illustrating a distributed simulation platform of a heterogeneous integrated chip according to at least one embodiment of the present disclosure.
As shown in fig. 7, in a heterogeneous integrated chip based on Chiplet technology, a core particle is also referred to as a Chiplet. The embodiments of the present disclosure take a chip with a multi-Chiplet multi-core architecture as an example. Chiplet GPU1 to Chiplet GPUp in the figure represent p GPU-type Chiplets (p is an integer greater than 1). Chiplet CPU1 to Chiplet CPUq represent q CPU-type Chiplets (q is an integer greater than 1), and Chiplet DPU1 to Chiplet DPUr represent r DPU-type Chiplets (r is an integer greater than 1). Chiplets of other processor types may also be included.
For example, each Chiplet may include a plurality of cores, which may be partitioned in different combinations according to actual requirements and integrated into several core simulation processes. For example, the GPU cores in the same Chiplet may be divided into groups of one core or of several cores (e.g., 2, 4, or 8 cores), yielding a plurality of simulation processes; different types of Chiplets may use the same or different grouping schemes. Taking the GPU-type Chiplets as an example, one component simulation slice may be formed for each core, with one component simulation process obtained from each slice. For example, the GPU1-core 1 simulation process shown in the figure may represent the simulation process for the 1st core of Chiplet GPU1, and the GPUp-core x simulation process may represent the simulation process for the x-th core (x is a positive integer) of Chiplet GPUp. Similarly, each of the other Chiplet types (Chiplet CPU1 to Chiplet CPUq, Chiplet DPU1 to Chiplet DPUr, and so on) may group its cores singly or in groups (e.g., 2, 4, or 8 cores), with one component simulation slice formed per group. This yields a plurality of CPU core simulation processes, DPU core simulation processes, and other processor simulation processes. For example, as shown in fig. 7, the GPU1-core 1 simulation process further includes a simulation control module and a process communication interface. The other core simulation processes (e.g., the CPU core, DPU core, and other processor simulation processes) likewise include their own simulation control modules and process communication interfaces, through which they control their simulation behavior and communicate with other simulation processes.
In other embodiments, a plurality of compute component slices may be formed for one core of a Chiplet, each simulating a portion of that core, so that the simulation verification of one core is completed by a plurality of simulation processes.
For example, the support components of different Chiplets may be distributed among different top-level simulation processes. For example, p top-level simulation processes may be formed for Chiplet GPU1 to Chiplet GPUp, q top-level simulation processes for Chiplet CPU1 to Chiplet CPUq, and so on. Each top-level simulation process may include a top-level module, a process communication interface, a top-level simulation control module, BFM modules, and the like. The simulation processes communicate through inter-process communication channels; for example, the top-level simulation processes can communicate with one another, and each core simulation process can communicate with its corresponding top-level simulation process. Note that at least one embodiment of the present disclosure does not require every pair of simulation processes to communicate directly.
For example, in other embodiments, the support components of one or more chiplets may also be integrated in the same top-level simulation process, for example, the support components of multiple chiplets of the same type may be integrated in the same top-level simulation process.
For example, the test case simulation process may be connected to any one component simulation process and can reach all component simulation processes through the communication channels between the component simulation processes.
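Because not every pair of processes shares a channel, a message from the test case to a given core process may need to be relayed. The hypothetical sketch below builds one possible topology consistent with the text: top-level processes fully interconnected, each core process linked only to its own top-level process, and the test case linked to a single chosen process. It then finds a relay path with breadth-first search. Names such as `build_channels` and `route` are invented for illustration.

```python
from __future__ import annotations

from collections import deque


def build_channels(top_levels: dict[str, list[str]], testcase_peer: str) -> dict[str, set[str]]:
    """Hypothetical channel topology: tops interconnected, each core linked to its own top."""
    graph: dict[str, set[str]] = {}

    def link(a: str, b: str) -> None:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    tops = list(top_levels)
    for i, top in enumerate(tops):
        for other in tops[i + 1:]:
            link(top, other)
        for core in top_levels[top]:
            link(top, core)
    link("testcase", testcase_peer)
    return graph


def route(graph: dict[str, set[str]], src: str, dst: str) -> list[str] | None:
    """Breadth-first search for the shortest relay path from src to dst."""
    prev: dict[str, str | None] = {src: None}
    pending = deque([src])
    while pending:
        node = pending.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in prev:
                prev[nxt] = node
                pending.append(nxt)
    return None
```

For example, with the test case attached to a GPU top-level process, a command for a CPU core travels testcase → GPU1-top → CPU1-top → CPU1-core1.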
Fig. 8 illustrates a schematic diagram of the deployment and operation of the simulation processes of a multicore processor according to at least one embodiment of the present disclosure. It shows how the distributed simulation platform is deployed and run when the simulation method of the embodiments of the present disclosure is applied to a chip with a multi-Chiplet multi-core architecture. For example, the simulation processes may be deployed on a plurality of servers and communicate with one another through a high-speed network. The naming in the figure is as follows:
- The GPUi top-level simulation process is the top-level simulation process of the i-th GPU-type Chiplet.
- The GPUi core a×8 simulation process is the simulation process for the 8 cores of the a-th group (a is a positive integer) of the i-th GPU-type Chiplet.
- The CPUj top-level simulation process is the top-level simulation process of the j-th CPU-type Chiplet.
- The CPUj core b×4 simulation process is the simulation process for the 4 cores of the b-th group (b is a positive integer) of the j-th CPU-type Chiplet.
- The other simulation processes in the figure are named in the same way.

For example, each simulation process can be deployed to a remote server whose performance suits that process's characteristics.
For example, the simulation method further includes: in response to determining that a first code simulation slice of the plurality of code simulation slices needs to be modified, obtaining the modified first code simulation slice and a modified executable program compiled from it, to replace the first code simulation slice and its executable program.
Fig. 9 illustrates a schematic diagram of a plurality of simulation slices provided by at least one embodiment of the present disclosure. As shown in fig. 9, GOOD denotes a code simulation slice that runs normally, and BAD denotes a code simulation slice that needs to be modified (e.g., a failing one). For a very large scale chip, the schedule is often very tight in the late stage of a project. Yet in the traditional simulation mode, any code change caused by a defect requires the whole chip project to be recompiled and re-verified by simulation. As a result, the later the project stage, the slower quality converges and the more likely the schedule is to slip. With the multi-process parallel simulation technology of the embodiments of the disclosure, modifying a given slice module only requires recompiling the project code of the corresponding simulation process rather than the whole project. This greatly reduces the cost of each compile-and-debug iteration and accelerates quality convergence of the whole chip in the late stage of the project.
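Deciding which slices must be rebuilt after a fix can be as simple as comparing content digests. The sketch below is a hypothetical illustration with invented function names. It hashes each slice's source text with SHA-256 and reports the slices whose digest changed, or that were never built. A production flow would also hash included files and compile options, and would reuse the existing executables of all unchanged slices.

```python
from __future__ import annotations

import hashlib


def digest_all(sources: dict[str, str]) -> dict[str, str]:
    """Record a SHA-256 digest of each slice's source text, e.g. after a successful build."""
    return {name: hashlib.sha256(code.encode("utf-8")).hexdigest()
            for name, code in sources.items()}


def slices_to_recompile(sources: dict[str, str], last_built: dict[str, str]) -> list[str]:
    """Return the slices whose source changed (or that were never built) since last_built."""
    current = digest_all(sources)
    return [name for name, digest in current.items() if last_built.get(name) != digest]
```

For example, after a defect fix touches only `core0`, the function returns `["core0"]`, and only that slice's executable is rebuilt and swapped in.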
For example, some terms used in the above embodiments are explained below.
- D2D: Die-to-Die, an interconnection technology between dies inside a chip package.
- Chiplet: literally a "small chip" or "module chip"; it can be understood as a DIE that provides a specific function.
- Chiplet architecture: a multifunctional chip design architecture in which a plurality of module chips are packaged together on an underlying base chip through Die-to-Die interconnection technology to form a heterogeneous chip.
- AXI: Advanced eXtensible Interface (the AXI protocol for short), a general-purpose parallel bus protocol.
- AXI BFM: a bus-level functional module developed in the distributed simulation platform architecture of the embodiments of the present disclosure. It carries signals between the simulation platform and the chip modules, and the signals conform to the AXI protocol specification.
- PCIe: also written PCI-Express, a high-speed serial computer expansion bus standard.
- PCIE BFM: a bus-level functional module developed in the distributed simulation platform architecture of the embodiments of the present disclosure. It carries signals between the simulation platform and the chip modules, and the signals conform to the PCIE protocol specification.
- Memory BFM: a bus-level functional module developed in the distributed simulation platform architecture of the embodiments of the present disclosure to simulate the memory behavior of the chip, making the simulated behavior of the whole chip more realistic.
- Clock generator BFM: a bus-level functional module developed in the distributed simulation platform architecture of the embodiments of the present disclosure to generate the chip system clock signals required for simulation.
For example, simulation of a chip comprises three stages: compilation, simulation, and debugging. The simulation method of at least one embodiment of the disclosure addresses the limitations and difficulties of the conventional simulation method in each stage and improves the efficiency of each.
For example, in the compilation stage, the compilation problem of very large scale chips can be solved. In traditional simulation technology, however large the chip code is, the whole design must be loaded into the simulator for compilation and simulation. For a very large chip (particularly a GPU), the code is so large that the compiler easily fails to compile it. One workaround is to divide the chip into several parts that are compiled and simulated independently. However, this only allows each functional module of the chip to be verified on its own; the function of the whole chip cannot be verified by a full simulation, which is a major limitation. In the simulation method provided by at least one embodiment of the present disclosure, the modules obtained by slicing the chip are compiled into a plurality of different executable programs. The resulting simulation processes communicate through inter-process communication channels at run time, so the simulated modules remain interconnected and the function of the whole chip can still be simulated as a whole. Moreover, the method avoids the problem of chip code size exceeding the capacity of the compiler. For any large-scale chip, even a very large chip containing multiple GPUs, CPUs, DPUs, and so on, the compilation limit can be overcome as long as the chip is cut into enough slices. The simulation method provided by at least one embodiment of the disclosure thus solves, for example, the problem of compiling very large chip code, a breakthrough from 0 to 1.
For example, in the simulation stage, on the one hand, the method lowers the barrier to simulation and reduces dependence on server hardware resources. Because traditional simulation runs as a single process, it can only run on a single server or machine. The upper limit of resources available for simulation therefore depends on the hardware performance of a single server, generally on its number of CPU cores, and server cost rises linearly (or even exponentially) with the number of CPU cores. In other words, the simulation of very large chips depends heavily on server hardware and requires companies to purchase large numbers of very expensive high-performance servers. When the hardware resources required by the simulation exceed the top configuration of servers on the market, the simulation cannot be carried out at all. Generally, the hardware resources required by a simulator are proportional to the code size of the simulated object. After the chip is sliced, the signal count and logic complexity of each slice module are greatly reduced, so each individual simulation process depends far less on server hardware. Therefore, the simulation method of the embodiments of the disclosure deploys the simulation processes across a plurality of servers and simulates in parallel. This effectively removes the dependence on the hardware resources of any single server and avoids the problem of chip simulation requiring more resources than one server provides.
For any large-scale chip, even a very-large-scale chip built with chiplet technology, the resource limit on simulation can be overcome as long as the chip is sliced into enough pieces and enough servers are available for simulation.
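How the simulation processes are spread over servers is left open by the text. As one possible sketch, assuming each slice's resource demand can be estimated as a single number (an assumed metric such as CPU cores), the classic longest-processing-time-first greedy heuristic places the largest slices first, each on the currently least-loaded server. The `assign_to_servers` helper and the load figures are hypothetical.

```python
import heapq


def assign_to_servers(slice_loads, num_servers):
    """Greedy longest-processing-time-first placement of simulation processes.

    slice_loads: slice name -> estimated resource demand (assumed metric)
    Returns server index -> list of slice names placed on that server.
    """
    if num_servers < 1:
        raise ValueError("need at least one server")
    heap = [(0, server) for server in range(num_servers)]  # (current load, server index)
    heapq.heapify(heap)
    placement = {server: [] for server in range(num_servers)}
    # Largest slices first, each onto the currently least-loaded server.
    for name, load in sorted(slice_loads.items(), key=lambda kv: kv[1], reverse=True):
        current, server = heapq.heappop(heap)
        placement[server].append(name)
        heapq.heappush(heap, (current + load, server))
    return placement


if __name__ == "__main__":
    loads = {"top": 8, "core0": 6, "core1": 6, "core2": 5, "core3": 5}
    print(assign_to_servers(loads, 2))  # {0: ['top', 'core2'], 1: ['core0', 'core1', 'core3']}
```

The heuristic is not optimal in general, but it is simple and keeps the most demanding slices from piling up on one machine, which is the property the paragraph above relies on.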
For example, in the simulation phase, on the other hand, simulation efficiency can be improved. With the simulation method of the embodiments of the disclosure, all simulation processes can be deployed across a plurality of servers and run in parallel, so the hardware resources of multiple servers can be used to lift the cap that the resources of a single server place on the simulation rate. With a suitable chip slicing technique, the number of slices can be increased further and the slices deployed on more servers, making use of more server hardware and further raising the simulation rate. The simulation speed can also be improved by increasing the number of simulation servers and the communication speed between processes.
For example, in the debugging stage, on the one hand, the compilation time of the chip can be greatly shortened. For very-large-scale chips, each compilation of the chip-level project code often takes a week, a month, or even longer. With the simulation method of at least one embodiment of the disclosure, the sliced chip code can be compiled in parallel, which greatly shortens the total compilation time and improves engineering efficiency.
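Because each slice becomes its own executable, the slices can be built concurrently. The sketch below assumes each slice has a self-contained build command; the real flow would invoke whatever simulator compiler the project uses, which the text does not name, so the commands here are placeholders that merely run the Python interpreter. `compile_slice` and `compile_all` are hypothetical helpers. Threads are sufficient because the actual work happens in child processes.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def compile_slice(name, command):
    """Run one slice's (placeholder) build command and report its exit status."""
    result = subprocess.run(command, capture_output=True, text=True)
    return name, result.returncode


def compile_all(commands, max_workers=4):
    """Build all slices concurrently and return slice name -> exit status."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(compile_slice, name, cmd) for name, cmd in commands.items()]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    # Placeholder "compilers": a real flow would call the project's simulator build here.
    demo = {
        "top": [sys.executable, "-c", "pass"],
        "core0": [sys.executable, "-c", "import sys; sys.exit(3)"],
    }
    print(compile_all(demo))  # {'top': 0, 'core0': 3}
```

The wall-clock time of a full build then approaches that of the slowest slice (given enough workers) rather than the sum over all slices.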
For example, in the debugging stage, on the other hand, the quality convergence of the chip can be accelerated. With the simulation method of the embodiments of the disclosure, each module of the chip is simulated relatively independently. If a slice has a defect or needs to be iterated, only the project code of that slice needs to be recompiled rather than the code of the whole project, which greatly reduces the cost of each compile-and-debug cycle and speeds up the quality convergence of the whole chip late in the project. In particular, in the middle and later stages of design verification of a very-large-scale chip (the subsystem-level or chip-level stage), the quality convergence of the chip can be accelerated and unnecessary verification preparation work can be largely avoided.
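Deciding which slices need recompiling can be automated with content hashes. This is a minimal sketch under the assumption that each slice's source can be read as one text blob; the helper names are hypothetical, and a real flow would also have to hash shared headers and include files, which this sketch ignores.

```python
import hashlib


def source_digest(source: str) -> str:
    """SHA-256 content hash of one slice's source code."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def slices_to_rebuild(sources, last_built):
    """Names of slices whose source differs from the digest recorded at their last build.

    sources: slice name -> current source text
    last_built: slice name -> digest recorded after that slice last compiled successfully
    """
    return sorted(name for name, text in sources.items()
                  if last_built.get(name) != source_digest(text))


if __name__ == "__main__":
    sources = {"top": "module top; endmodule", "core0": "module core0; endmodule"}
    built = {name: source_digest(text) for name, text in sources.items()}  # after a full build
    sources["core0"] = "module core0; /* bug fix */ endmodule"
    print(slices_to_rebuild(sources, built))  # ['core0']
```

Recording a digest only after a slice compiles successfully keeps a failed build from being mistaken for an up-to-date one.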
For example, in the debugging phase, in another aspect, the cost of developing and debugging test cases can be reduced. For a very-large-scale chip, the software and hardware configuration of the servers that run the simulation platform is often complex. In the later stages of a project, a large number of verification and test engineers would each need to learn and maintain their own simulation environment, which is a huge workload and greatly increases the cost of test case development and debugging.
In addition, with suitable monitoring, performance bottlenecks in the simulation can be identified more easily. By re-slicing the chip, or by selectively upgrading the server that runs a particular simulation process, such resource bottlenecks can be resolved and the simulation speed further improved.
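The text does not say what the monitoring measures. One plausible sketch, assuming the processes advance in lockstep and each reports its wall-clock time per synchronization step, is to attribute every step to the process that finished it last; the process that is blamed most often is the candidate for re-slicing or a faster server. The `bottleneck_report` helper and the timing data are hypothetical.

```python
from collections import Counter


def bottleneck_report(step_times):
    """Attribute each lockstep synchronization step to the process that made it slowest.

    step_times: process name -> list of wall-clock seconds spent on each step
    (all lists must have the same length). Returns (blame, total), where blame
    counts how many steps each process was slowest in, and total is the summed
    step duration, assuming each step ends only when its slowest process does.
    """
    lengths = {len(times) for times in step_times.values()}
    if len(lengths) != 1:
        raise ValueError("all processes must report the same, non-empty set of steps")
    blame = Counter()
    total = 0.0
    for step in range(lengths.pop()):
        slowest = max(step_times, key=lambda name: step_times[name][step])
        blame[slowest] += 1
        total += step_times[slowest][step]
    return blame, total


if __name__ == "__main__":
    timings = {"top": [1.0, 1.0, 3.0], "core0": [2.0, 2.0, 1.0]}
    print(bottleneck_report(timings))  # (Counter({'core0': 2, 'top': 1}), 7.0)
```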
FIG. 10 is a schematic diagram of a simulation deployment provided by at least one embodiment of the present disclosure. As shown in FIG. 10, each computing device that executes simulation processes may form a distributed simulation environment. For example, if s computing devices (e.g., servers, where s is a positive integer) execute a plurality of component simulation processes in parallel, s distributed simulation environments may be formed (distributed simulation environment 1 to distributed simulation environment s in the figure), and a distributed simulation environment that executes the test case simulation process may also be included. In some embodiments, a user-side device may run the test case simulation process, and the user-side software development and test environment may serve as a distributed simulation environment that interacts with the other distributed simulation environments (distributed simulation environments 1 to s in the figure). The simulation method of the embodiments of the disclosure therefore allows the test case simulation process to run separately from the operating environment of the other simulation processes. In other words, testers (especially software stack testers) do not need to know how the simulation platform is built and deployed; they only need to develop and remotely debug test cases according to the inter-process communication protocol, which greatly reduces the cost of test case development and debugging.
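The disclosure does not specify the inter-process communication protocol. As an illustration only, the sketch below assumes a simple length-prefixed JSON framing over a stream socket: each message is a 4-byte big-endian length followed by a UTF-8 JSON payload. The message fields (`cmd`, `addr`, `data`) and the helper names are hypothetical. A test case process written against such a protocol needs nothing from the simulation platform except the socket address.

```python
import json
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian payload length


def send_msg(sock, msg):
    """Serialize one command or data message and send it with a length prefix."""
    payload = json.dumps(msg).encode("utf-8")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock, n):
    """Read exactly n bytes, since a stream socket may deliver data in pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock):
    """Receive one length-prefixed JSON message."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


if __name__ == "__main__":
    testcase_end, component_end = socket.socketpair()  # stands in for a remote TCP link
    send_msg(testcase_end, {"cmd": "write", "addr": 4096, "data": [1, 2, 3]})
    print(recv_msg(component_end))  # {'cmd': 'write', 'addr': 4096, 'data': [1, 2, 3]}
    testcase_end.close()
    component_end.close()
```

Explicit framing matters because a stream socket preserves byte order but not message boundaries; without the length prefix, the receiver could not tell where one test command ends and the next begins.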
For example, in the debugging phase, in yet another aspect, the reusability of the simulation platform can be improved. For a very-large-scale chip, the development and quality convergence speeds of different modules often differ greatly, which makes integrating the simulation platform very challenging; the problem is especially pronounced for chips built on a chiplet architecture. Small-scale chiplets, such as DPUs, converge quickly in quality and could finish well ahead of the other chiplets, but they are held back by the progress of the whole project: the maintainers of their simulation platform must stay on the project to maintain the overall simulation platform and to support the later integration of the other chiplets' simulation platforms.
FIG. 11 is a schematic diagram of another simulation deployment provided by at least one embodiment of the present disclosure. As shown in FIG. 11, with the simulation method of at least one embodiment of the disclosure, the different types of processor cores of a heterogeneous integrated chip are deployed on different simulation platforms. For example, a plurality of GPU simulation processes (GPU simulation process 1 to GPU simulation process n) run on a GPU simulation platform, a plurality of DPU simulation processes (DPU simulation process 1 to DPU simulation process n) run on a DPU simulation platform, a plurality of CPU simulation processes (CPU simulation process 1 to CPU simulation process n) run on a CPU simulation platform, and a plurality of other processor simulation processes (other processor simulation process 1 to other processor simulation process n) run on other processor simulation platforms. Each simulation platform runs one or more simulation processes; the simulation processes on one platform can interact with each other, and different simulation platforms can also interact with each other. The GPU, CPU, and DPU simulation platforms can therefore each be deployed as a separate service on different servers or groups of servers; for example, each simulation platform may include one or more servers or clients that run its simulation processes. Simulation platforms for chiplets completed later can then be brought in through inter-process communication and integrated into the overall simulation environment. As a whole, this achieves an incremental, building-block style of integration and deployment and saves the cost of repeated large-scale integration and maintenance.
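The incremental integration described above implies that platforms can join a running environment one at a time. A minimal sketch of that idea, assuming a central registry that maps each platform name to a network endpoint, is shown below. The `SimulationEnvironment` class and the endpoint addresses are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class SimulationEnvironment:
    """Hypothetical registry of simulation platforms (GPU, CPU, DPU, ...) that join over time."""
    platforms: dict = field(default_factory=dict)  # platform name -> endpoint address

    def attach(self, name, endpoint):
        """Register a platform once its simulation processes are up and reachable."""
        if name in self.platforms:
            raise ValueError(f"platform {name!r} is already attached")
        self.platforms[name] = endpoint

    def route(self, destination):
        """Return the endpoint that messages for `destination` should be sent to."""
        if destination not in self.platforms:
            raise LookupError(f"platform {destination!r} has not joined yet")
        return self.platforms[destination]


if __name__ == "__main__":
    env = SimulationEnvironment()
    env.attach("dpu", "tcp://10.0.0.3:7000")  # the DPU platform converges first
    env.attach("cpu", "tcp://10.0.0.4:7000")
    print(env.route("dpu"))
    env.attach("gpu", "tcp://10.0.0.5:7000")  # joins later without rebuilding the others
    print(sorted(env.platforms))
```

Because existing platforms only look up peers by name, attaching a new platform does not require recompiling or redeploying the ones already running.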
Embodiments of the present disclosure also provide a simulation apparatus for a computing device. FIG. 12 is a schematic block diagram of a simulation apparatus 200 for a computing device according to at least one embodiment of the present disclosure.
For example, as shown in FIG. 12, the simulation apparatus 200 for a computing device includes a plurality of code simulation slices 210 and a process execution unit 220.
The plurality of code simulation slices 210 includes a plurality of component simulation slices; each component simulation slice of the plurality of component simulation slices simulates a portion of a functional component in the computing device, and the plurality of code simulation slices are compiled into a plurality of different executable programs.
The process execution unit 220 is configured to execute the plurality of different executable programs in parallel to obtain a plurality of simulation processes, and the plurality of simulation processes interact with each other to verify the computing device.
For example, for the code simulation slices 210, reference may be made to the description of code simulation slices in the above embodiments, which is not repeated here.
For example, the process execution unit 220 may be hardware, software, firmware, or any feasible combination thereof. For example, the process execution unit 220 may be a dedicated or general-purpose circuit, chip, or device, or a combination of a processor and a memory. The embodiments of the present disclosure do not limit the specific implementation forms of the above units.
It should be noted that, in the embodiment of the present disclosure, reference may be made to the description related to the simulation method for a computing device for specific functions of the simulation apparatus 200 for a computing device, and details are not described herein again. The components and configuration of the simulation apparatus 200 for a computing device shown in FIG. 12 are exemplary only, and not limiting, and the simulation apparatus 200 for a computing device may also include other components and configurations as desired.
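The role of the process execution unit 220 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: small Python programs launched through `subprocess` stand in for the compiled slice executables, their stdin/stdout pipes stand in for the inter-process communication channel, and a per-cycle barrier (every slice must answer before the next cycle starts) is an assumed synchronization scheme. The slice behavior (a running sum offset by a core id) is hypothetical.

```python
import subprocess
import sys

# Stand-in for one compiled compute-component slice (hypothetical behavior):
# each cycle it reads an integer stimulus, adds it plus its core id to an
# accumulator, and writes the running value back.
CORE_SLICE = r"""
import sys
core_id = int(sys.argv[1])
acc = 0
while True:
    line = sys.stdin.readline()
    if not line:  # driver closed the pipe: simulation finished
        break
    acc += int(line) + core_id
    print(acc, flush=True)
"""


def run_cosim(num_cores, cycles):
    """Start one OS process per slice and advance all of them in lockstep, cycle by cycle."""
    procs = [subprocess.Popen([sys.executable, "-c", CORE_SLICE, str(i)],
                              stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
             for i in range(num_cores)]
    trace = []
    try:
        for cycle in range(cycles):
            for p in procs:  # broadcast this cycle's stimulus to every slice
                p.stdin.write(f"{cycle}\n")
                p.stdin.flush()
            # Barrier: collect one response from every slice before the next cycle.
            trace.append([int(p.stdout.readline()) for p in procs])
    finally:
        for p in procs:
            p.stdin.close()
            p.wait()
    return trace


if __name__ == "__main__":
    print(run_cosim(num_cores=2, cycles=3))  # [[0, 1], [1, 3], [3, 6]]
```

The slices run as independent OS processes and compute concurrently within each cycle; only the exchange of signal values is synchronized, which mirrors how separately compiled executables can still be verified together as one design.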
At least one embodiment of the present disclosure also provides an electronic device comprising a processor and a memory, the memory including one or more computer program modules. The one or more computer program modules are stored in the memory and configured to be executed by the processor, and comprise instructions for implementing the simulation method for a computing device described above. The electronic device divides the chip into a plurality of parts that are simulated in different processes, which improves simulation efficiency, effectively shortens the simulation time of large chips, and enables the function of the chip to be verified as a whole through inter-process communication.
Fig. 13 is a schematic block diagram of an electronic device provided in some embodiments of the present disclosure. As shown in fig. 13, the electronic device 300 includes a processor 310 and a memory 320. Memory 320 stores non-transitory computer-readable instructions (e.g., one or more computer program modules). The processor 310 is configured to execute non-transitory computer readable instructions which, when executed by the processor 310, perform one or more of the steps in the simulation method for a computing device described above. The memory 320 and the processor 310 may be interconnected by a bus system and/or other form of connection mechanism (not shown).
It should be noted that the components of the electronic device 300 shown in fig. 13 are only exemplary and not limiting, and the electronic device 300 may have other components according to the actual application.
For example, the processor 310 and the memory 320 may be in direct or indirect communication with each other.
For example, the processor 310 and the memory 320 may communicate over a network. The network may include a wireless network, a wired network, and/or any combination of wireless and wired networks. The processor 310 and the memory 320 may also communicate with each other via a system bus, which is not limited by the present disclosure.
For example, the processor 310 and the memory 320 may be located on the server side (or cloud side).
For example, the processor 310 may control other components in the electronic device 300 to perform desired functions. For example, the processor 310 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or another form of processing unit having data processing and/or program execution capabilities. For example, the Central Processing Unit (CPU) may be of an X86, ARM, or RISC-V architecture, or the like. The processor 310 may be a general-purpose processor or a special-purpose processor.
For example, the memory 320 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. Non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, Erasable Programmable Read Only Memory (EPROM), portable compact disc read only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer program modules may be stored on the computer-readable storage medium and executed by the processor 310 to implement the various functions of the electronic device 300. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
It should be noted that, in the embodiment of the present disclosure, reference may be made to the above description on the simulation method for the computing apparatus for specific functions and technical effects of the electronic apparatus 300, and details are not described herein again.
Fig. 14 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure. The electronic device 400 is, for example, suitable for implementing the simulation method for a computing device provided by the embodiments of the present disclosure. The electronic device 400 may be a terminal device or the like. It should be noted that the electronic device 400 shown in fig. 14 is only an example and imposes no limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in fig. 14, the electronic device 400 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 410 that may perform various appropriate actions and processes in accordance with programs stored in a Read Only Memory (ROM) 420 or loaded from a storage device 480 into a Random Access Memory (RAM) 430. The RAM 430 also stores various programs and data necessary for the operation of the electronic device 400. The processing device 410, the ROM 420, and the RAM 430 are connected to each other by a bus 440. An input/output (I/O) interface 450 is also connected to the bus 440.
Generally, the following devices may be connected to the I/O interface 450: input devices 460 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 470 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, or the like; a storage device 480 including, for example, magnetic tape, a hard disk, etc.; and a communication device 490. The communication device 490 may allow the electronic device 400 to communicate wirelessly or by wire with other electronic devices to exchange data. While fig. 14 illustrates the electronic device 400 with various devices, it is to be understood that not all illustrated devices are required to be implemented or provided, and the electronic device 400 may alternatively be implemented or provided with more or fewer devices.
For example, according to an embodiment of the present disclosure, the above-described simulation method for a computing device may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program comprising program code for performing the above-described simulation method for a computing device. In such embodiments, the computer program may be downloaded and installed from a network through communication device 490, or installed from storage device 480, or installed from ROM 420. When executed by the processing device 410, the computer program may implement the functions defined in the simulation method for a computing device provided by the embodiments of the present disclosure.
At least one embodiment of the present disclosure also provides a computer-readable storage medium storing non-transitory computer-readable instructions that, when executed by a computer, may implement the above-described simulation method for a computing device.
Fig. 15 is a schematic diagram of a storage medium according to some embodiments of the present disclosure. As shown in fig. 15, the storage medium 500 stores non-transitory computer readable instructions 510. For example, the non-transitory computer readable instructions 510, when executed by a computer, perform one or more steps in a simulation method for a computing device according to the description above.
For example, the storage medium 500 may be applied to the electronic device 300 described above. The storage medium 500 may be, for example, the memory 320 in the electronic device 300 shown in fig. 13. For example, the related description about the storage medium 500 may refer to the corresponding description of the memory 320 in the electronic device 300 shown in fig. 13, and is not repeated here.
The foregoing description is merely illustrative of the preferred embodiments of the disclosure and of the technical principles employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the particular combination of the features described above, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
The following points need to be explained:
(1) The drawings of the embodiments of the disclosure only relate to the structures related to the embodiments of the disclosure, and other structures can refer to common designs.
(2) Without conflict, embodiments of the present disclosure and features of the embodiments may be combined with each other to arrive at new embodiments.
The above description is only a specific embodiment of the present disclosure, but the scope of the present disclosure is not limited thereto, and the scope of the present disclosure should be subject to the scope of the claims.

Claims (18)

1. A simulation method for a computing device, comprising:
obtaining a plurality of code simulation slices for the computing device, wherein the plurality of code simulation slices comprises a plurality of component simulation slices, each component simulation slice in the plurality of component simulation slices simulates a portion of a functional component in the computing device, and the plurality of code simulation slices are compiled into a plurality of different executable programs;
executing the plurality of different executable programs in parallel to obtain a plurality of simulation processes, wherein the plurality of simulation processes interact with each other to verify the computing device.
2. The simulation method of claim 1, wherein the plurality of simulation processes communicate over an interprocess communication channel.
3. The simulation method of claim 2, wherein the simulation process corresponding to each component simulation slice comprises a process communication interface for communicating with other simulation processes of the plurality of simulation processes via the inter-process communication channel.
4. The simulation method of claim 1, wherein the plurality of simulation processes are distributed for execution on at least one simulation computing device, each simulation computing device executing at least one of the simulation processes.
5. The simulation method of claim 1, further comprising:
in response to a first code simulation slice of the plurality of code simulation slices needing to be modified, obtaining the modified first code simulation slice and a modified executable program compiled from the modified first code simulation slice, to replace the first code simulation slice and its executable program.
6. The simulation method of claim 1, wherein the plurality of code simulation slices further comprises a test case simulation slice;
wherein executing the plurality of different executable programs in parallel to obtain a plurality of simulation processes, and the plurality of simulation processes interact with each other, comprises:
executing the executable program corresponding to the test case simulation slice and the executable programs corresponding to the component simulation slices in parallel to obtain a test case simulation process and a plurality of component simulation processes, wherein the test case simulation process interacts with the component simulation processes so that the test case simulation process sends test commands and/or data to the component simulation processes according to the test cases.
7. The simulation method according to any one of claims 1 to 6,
the functional components of the computing device comprise a computing component and a supporting component, wherein the supporting component comprises at least one of a storage component, a control component and an interconnection component;
the plurality of component simulation slices includes at least one compute component slice, each of the compute component slices including a compute module for simulating the compute component, and at least one top-level slice, each of the top-level slices including a support module for simulating the support component.
8. The simulation method of claim 7,
the top-level slice further comprises a first control module, each of the computing component slices further comprises a second control module;
the first control module and the second control module are used for controlling the simulation behaviors of the respective simulation processes.
9. The simulation method of claim 7, wherein the computing device comprises a multi-core processor comprising a plurality of cores, each of the plurality of cores being one of the computing components;
each computing component slice individually simulates one or more of the cores of the multi-core processor, or each core of the multi-core processor is simulated by one or more of the computing component slices.
10. The simulation method of claim 9, wherein the multi-core processor further comprises a plurality of support components;
each of the top-level slices simulates one or more of the support components of the multi-core processor.
11. The simulation method of claim 7, wherein said computing device comprises a multi-die device comprising a plurality of dies, at least one die of said plurality of dies comprising one or more cores, each said core being one of said computing components;
each of the computing component slices simulates one or more of the cores of the multi-die device, or each core of the multi-die device is simulated by one or more of a plurality of the computing component slices.
12. The simulation method of claim 11, wherein at least one of the plurality of dies further comprises a plurality of support components;
each of the top-level slices simulates one or more of the support components of the multi-die device.
13. The simulation method of claim 7, wherein the computing device comprises a heterogeneous integrated chip comprising a plurality of chiplets, at least one of the plurality of chiplets comprising one or more cores, each of the cores being one of the computing components;
each of the computing component slices simulates one or more of the cores of the heterogeneous integrated chip, or each core of the heterogeneous integrated chip is simulated by one or more of the plurality of computing component slices.
14. The simulation method of claim 13, wherein at least one of the plurality of chiplets further comprises a plurality of support components;
each of the top-level slices simulates one or more of the support components of the heterogeneous integrated chip.
15. The simulation method of claim 13, wherein the heterogeneous integrated chip contains at least two chiplets implementing different types of processors.
16. A simulation device for a computing device, comprising:
a plurality of code simulation slices, wherein the plurality of code simulation slices comprises a plurality of component simulation slices, each component simulation slice in the plurality of component simulation slices simulates a portion of a functional component in the computing device, and the plurality of code simulation slices are compiled into a plurality of different executable programs;
a process execution unit configured to execute the plurality of different executable programs in parallel to obtain a plurality of simulation processes, and the plurality of simulation processes interact with each other to verify the computing device.
17. An electronic device, comprising:
a processor;
a memory including one or more computer program modules;
wherein the one or more computer program modules are stored in the memory and configured to be executed by the processor, the one or more computer program modules comprising instructions for implementing the simulation method for a computing device of any one of claims 1 to 15.
18. A computer-readable storage medium storing non-transitory computer-readable instructions which, when executed by a computer, implement the simulation method for a computing device of any one of claims 1 to 15.
CN202210921285.7A 2022-08-02 2022-08-02 Simulation method, simulation device, electronic apparatus, and computer-readable storage medium Pending CN115146582A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210921285.7A CN115146582A (en) 2022-08-02 2022-08-02 Simulation method, simulation device, electronic apparatus, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN115146582A true CN115146582A (en) 2022-10-04

Family

ID=83415046

Country Status (1)

Country Link
CN (1) CN115146582A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117236263A (en) * 2023-11-15 2023-12-15 之江实验室 Multi-core interconnection simulation method and device, storage medium and electronic equipment
CN117408060A (en) * 2023-10-13 2024-01-16 上海同星智能科技有限公司 Whole vehicle model simulation performance optimization method, storage medium and electronic equipment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117408060A (en) * 2023-10-13 2024-01-16 上海同星智能科技有限公司 Whole vehicle model simulation performance optimization method, storage medium and electronic equipment
CN117408060B (en) * 2023-10-13 2024-05-14 上海同星智能科技有限公司 Whole vehicle model simulation performance optimization method, storage medium and electronic equipment
CN117236263A (en) * 2023-11-15 2023-12-15 之江实验室 Multi-core interconnection simulation method and device, storage medium and electronic equipment
CN117236263B (en) * 2023-11-15 2024-02-06 之江实验室 Multi-core interconnection simulation method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: 201100 room 1302, 13 / F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai

Applicant after: Shanghai Bi Ren Technology Co.,Ltd.

Address before: 201100 room 1302, 13 / F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai

Applicant before: Shanghai Bilin Intelligent Technology Co.,Ltd.

Country or region before: China
