WO2023232006A1 - Simulation device, simulation system and simulation method thereof, and storage medium - Google Patents

Simulation device, simulation system and simulation method thereof, and storage medium

Info

Publication number
WO2023232006A1
Authority
WO
WIPO (PCT)
Prior art keywords
simulation device
space
simulation
object model
host
Prior art date
Application number
PCT/CN2023/097006
Other languages
English (en)
French (fr)
Inventor
李涛
罗海钊
袁航剑
施云峰
王剑
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司
Publication of WO2023232006A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • Embodiments of the present disclosure relate to a simulation device, a simulation system and a simulation method thereof, and non-transitory computer-readable storage media.
  • DSA Domain-Specific Accelerator
  • At least one embodiment of the present disclosure provides a simulation device, which is used to simulate a neural network processor and includes: an agent module configured to communicate with an object model simulating the neural network processor and to serve as an agent of the object model in the simulation device; a management module configured to manage the simulation device; and an interconnection module configured to communicatively connect the agent module and the management module.
  • the simulation device receives the task sent by the host, sends work information related to the task to the object model through the agent module, receives, through the agent module, feedback information returned by the object model after processing the work information, and provides the feedback information to the host.
  • At least one embodiment of the present disclosure also provides a simulation system, including: a simulation device according to any embodiment of the present disclosure, an object model, and a host, where the host is configured to obtain the task and send the task to the simulation device; the object model is configured to process the work information to obtain the feedback information.
  • At least one embodiment of the present disclosure also provides a simulation method applied to the simulation system according to any embodiment of the present disclosure, including: sending a task through the host; receiving and parsing the task through the simulation device to send work information related to the task to the object model; processing the work information by the object model to obtain the feedback information; and providing the feedback information to the host through the simulation device.
  • At least one embodiment of the present disclosure also provides a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions, when executed by a processor, implement the simulation method according to any of the above embodiments.
  • FIG. 1A is a schematic diagram of the hardware architecture of a simulation device provided by at least one embodiment of the present disclosure.
  • FIG. 1B is a schematic diagram of the hardware architecture of another simulation device provided by at least one embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram of the hardware architecture of a simulation system provided by at least one embodiment of the present disclosure.
  • FIG. 3 is a schematic flow chart of a simulation method provided by at least one embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of a non-transitory computer-readable storage medium provided by at least one embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of the hardware structure of an electronic device provided by at least one embodiment of the present disclosure.
  • the term “include” and its variations are open-ended, i.e., “including but not limited to.”
  • the term “based on” means “based at least in part on.”
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; and the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • VLIW Very Long Instruction Word
  • ISA Integrated and complete instruction set architecture
  • Master-Slave data block
  • Data-Streaming data flow
  • the optimization strategies involved in AI compilers mainly include operator fusion, splitting, IO/computing parallel scheduling, etc. These optimization strategies constitute a huge optimization search space.
  • the final execution efficiency of each optimization strategy on the underlying accelerator is different, and a cost-model is needed to evaluate each optimization strategy and reduce the optimization search space.
  • the cost model needs to be accurate enough without introducing excessive compilation stage overhead.
  • the open source full-system simulation platform includes the gem5 simulator.
  • the shortcomings of the gem5 simulator mainly include: the system is complex, it is not designed for AI accelerator modeling, and its execution efficiency is low. Moreover, as a cost model, the gem5 simulator has too much overhead. Therefore, current AI compilers require high-performance cost models for evaluation.
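The role of a cost model in pruning the optimization search space can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Strategy` fields, the `estimate_cycles` formula and its constants are hypothetical and do not come from the disclosure; in the described system the estimate would come from the object model rather than from a closed-form formula.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Strategy:
    fuse_ops: bool    # operator fusion on or off
    tile: int         # splitting (tiling) factor
    overlap_io: bool  # schedule IO in parallel with computation


def estimate_cycles(s: Strategy) -> int:
    """Hypothetical cost model with made-up constants."""
    compute = 10_000 // s.tile + 50 * s.tile  # tiling adds per-tile overhead
    io = 4_000 if s.fuse_ops else 6_000       # fusion avoids intermediate traffic
    # Overlapped IO hides the shorter of the two phases.
    return max(compute, io) if s.overlap_io else compute + io


def best_strategy(space):
    """Pick the candidate with the lowest estimated cost."""
    return min(space, key=estimate_cycles)


space = [Strategy(f, t, o)
         for f, t, o in product([False, True], [1, 2, 4, 8], [False, True])]
best = best_strategy(space)
print(best, estimate_cycles(best))
```

The compiler only needs a ranking of candidates, which is why a fast, approximately accurate estimate can be more useful here than a slow full-system simulation.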
  • the simulation device is used to simulate a neural network processor and includes: an agent module, a management module and an interconnection module.
  • the agent module is configured to communicate with the object model of the simulated neural network processor and serve as an agent of the object model in the simulation device;
  • the management module is configured to manage the simulation device;
  • the interconnection module is configured to communicatively connect the agent module and the management module.
  • the simulation device receives the task sent by the host, sends the work information related to the task to the object model through the agent module, receives the feedback information returned by the object model after processing the work information through the agent module, and provides the feedback information to the host.
  • the simulation device models a full-system simulation platform for the neural network processor, which can effectively support the architectural design, exploration and functional verification of the neural network processor and accelerate its structural and functional design process, thereby avoiding hardware limitations in the development of neural network processors.
  • the simulation device has a simple structure and is easy to implement.
  • At least one embodiment of the present disclosure also provides a simulation system, a simulation method thereof, and a non-transitory computer-readable storage medium.
  • FIG. 1A is a schematic diagram of the hardware architecture of a simulation device provided by at least one embodiment of the present disclosure.
  • FIG. 1B is a schematic diagram of the hardware architecture of another simulation device provided by at least one embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram of the hardware architecture of a simulation system provided by at least one embodiment of the present disclosure.
  • the simulation device can be used to simulate a neural network processor.
  • the neural network processor can be implemented in the form of hardware, for example, as a chip implemented based on a programming language.
  • the neural network processor can be used to implement convolution operations, matrix operations, etc.
  • the simulation device 100 may include an agent module 110 , a management module 120 and an interconnection module 130 .
  • the agent module 110 is configured to communicate with the object model 300 that simulates the neural network processor and to serve as an agent of the object model in the simulation device 100; that is, the object model 300 is used to simulate the functions of the neural network processor;
  • the management module 120 is configured to manage the simulation device 100;
  • the interconnection module 130 is configured to communicatively connect the agent module 110 and the management module 120.
  • although the simulation device 100 shown in FIG. 2 is the simulation device shown in FIG. 1A, the simulation device 100 in the simulation system 1000 may also be the simulation device shown in FIG. 1B.
  • the management module 120 can manage the agent module 110 to control the agent module 110 to perform corresponding functions.
  • the simulation device 100 receives a task sent by the host 200, sends the work information related to the task to the object model through the agent module 110, receives, through the agent module 110, the feedback information returned by the object model after processing the work information, and provides the feedback information to the host 200.
  • the simulation device 100 is implemented through a virtual simulation platform.
  • the simulation device 100 can be implemented as a QEMU (Quick EMUlator) virtual platform (virt platform) or the like.
  • the management module 120 is implemented through a virtual central processor.
  • the virtual central processor may be a RISC-V core (RISC stands for Reduced Instruction Set Computer; V denotes the fifth generation of RISC).
  • the task may be any task that needs to be performed by the neural network processor, such as object recognition, matrix operations (eg, matrix multiplication operations), etc.
  • the agent module 110 and the host 200 may communicate through at least one method.
  • the at least one method may include a socket method, a shared storage method, and/or a message queue method.
  • the agent module 110 communicates with the host 200 through sockets, shared storage, and message queues.
  • the socket mode may be a Unix domain socket (UDS) mode.
  • the shared storage mode indicates communication through a shared storage file.
  • the shared storage file can be the /dev/shm/ivshmem file in the host 200.
  • the ivshmem file can share the memory area created by the host 200 between different QEMU processes.
  • the message queue mode MQ1 can deliver messages in a first-in, first-out (FIFO) manner.
  • data with a large volume is communicated through shared storage, and data with a small volume is communicated through the socket method and/or the message queue method.
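The split between bulk data and small messages can be sketched as follows. This is only an illustration under stated assumptions: the `THRESHOLD` value, the message tuple layout, and the use of a plain `bytearray` as a stand-in for the shared storage file are all hypothetical choices, not details from the disclosure.

```python
import queue

THRESHOLD = 1024  # hypothetical cut-off between "small" and "large" payloads


def send(payload: bytes, shm: bytearray, mq: queue.Queue, offset: int = 0) -> None:
    """Send small payloads inline; place large ones in shared storage."""
    if len(payload) >= THRESHOLD:
        shm[offset:offset + len(payload)] = payload
        # Only a short descriptor travels through the message queue.
        mq.put(("shm", offset, len(payload)))
    else:
        mq.put(("inline", payload))


def receive(shm: bytearray, mq: queue.Queue) -> bytes:
    """Read the next message, following the descriptor if needed."""
    msg = mq.get()
    if msg[0] == "shm":
        _, offset, length = msg
        return bytes(shm[offset:offset + length])
    return msg[1]


shared = bytearray(64 * 1024)  # stand-in for the shared storage region
mq1 = queue.Queue()            # FIFO message queue

send(b"x" * 4096, shared, mq1)
send(b"small", shared, mq1)
big = receive(shared, mq1)
small = receive(shared, mq1)
print(len(big), small)
```

Sending only a descriptor for large payloads keeps the queue traffic small while the bulk data is written once into the shared region.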
  • the agent module 110 and the object model 300 can also communicate through at least one way.
  • the agent module 110 and the object model 300 can communicate through a message queue.
  • the agent module 110 sends task-related work information to the object model 300 through the message queue mode MQ2, and the object model 300 sends feedback information to the agent module 110 through the message queue mode MQ3.
  • message queue mode MQ2 and message queue mode MQ3 can use FIFO to deliver messages.
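The exchange over MQ2 and MQ3 can be sketched with two in-process FIFO queues. This is a minimal illustration, assuming a thread stands in for the object model process; the message fields (`id`, `data`, `result`) and the `None` shutdown sentinel are hypothetical.

```python
import queue
import threading

mq2 = queue.Queue()  # agent module -> object model (work information)
mq3 = queue.Queue()  # object model -> agent module (feedback information)


def object_model() -> None:
    """Consume work items in FIFO order and post feedback for each one."""
    while True:
        work = mq2.get()
        if work is None:  # hypothetical shutdown sentinel
            break
        mq3.put({"id": work["id"], "result": sum(work["data"])})


worker = threading.Thread(target=object_model)
worker.start()

# Agent module side: send two work items, then collect the feedback.
for i, data in enumerate([[1, 2, 3], [4, 5]]):
    mq2.put({"id": i, "data": data})
feedback = [mq3.get() for _ in range(2)]

mq2.put(None)
worker.join()
print(feedback)
```

Because both queues are FIFO and there is a single consumer, feedback arrives in the same order in which the work information was sent.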
  • the object model 300 can be connected to the simulation device 100 in a pluggable manner. That is to say, the simulation device 100 can simulate different object models. For example, different object models can simulate different neural network processors.
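One way to picture the pluggable connection is a shared interface that every object model satisfies. The sketch below is an assumption for illustration only: the `ObjectModel` protocol, the `plug` and `run` methods, and the two toy models are hypothetical names that do not appear in the disclosure.

```python
from typing import Protocol


class ObjectModel(Protocol):
    """Hypothetical interface every pluggable object model provides."""

    def process(self, work: dict) -> dict: ...


class VectorAddModel:
    def process(self, work: dict) -> dict:
        return {"out": [x + y for x, y in zip(work["a"], work["b"])]}


class DotProductModel:
    def process(self, work: dict) -> dict:
        return {"out": sum(x * y for x, y in zip(work["a"], work["b"]))}


class SimulationDevice:
    def __init__(self) -> None:
        self._model = None

    def plug(self, model: ObjectModel) -> None:
        """Attach (or swap in) an object model without changing the device."""
        self._model = model

    def run(self, work: dict) -> dict:
        if self._model is None:
            raise RuntimeError("no object model attached")
        return self._model.process(work)


device = SimulationDevice()
work = {"a": [1, 2, 3], "b": [4, 5, 6]}

device.plug(VectorAddModel())
added = device.run(work)

device.plug(DotProductModel())
dot = device.run(work)
print(added, dot)
```

The device code stays the same when the model is swapped, which mirrors the idea that one simulation device can front different simulated processors.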
  • the object model 300 can accelerate the computing speed of the neural network processor, save computing time, and improve computing efficiency.
  • the host 200 can obtain a task from an external device and send the task to the simulation device 100 through a socket.
  • information such as parameters of the neural network processor and task-related input and/or output are accessed by the host 200 and the simulation device 100 through shared storage for reading or writing operations.
  • the input and/or output related to the task may be determined according to the type of the task.
  • the task may be to identify the target object in an image and feed back the image labeled with the target object to the host 200. In this case, the input image is the task-related input, and the image identified through the object model 300 and labeled with the target object is the task-related output.
  • the simulation device 100 may send (for example, synchronize) the feedback information to the host 200 through a message queue.
  • the simulation device 100 further includes an input and output module 140, and the input and output module 140 can interact with the host 200, for example, communicate with the host 200 through a socket to receive tasks.
  • for example, the input and output module 140 is configured to send the task to the agent module 110 or the management module 120.
  • the input and output module 140 may include a buffer or the like for storing tasks.
  • the address space of the agent module 110 may include an agent register space Re and a model space Mem1.
  • the agent register space Re is used to define the registers of the neural network processor.
  • the registers of the neural network processor are all memory mapped.
  • Memory mapped refers to the unified addressing of the device's registers and memory, that is, using the memory address.
  • the register of the neural network processor is defined in the memory space and can be located through the memory address to achieve reading and writing.
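Memory-mapped register access can be sketched as reads and writes at addresses inside a flat byte array. The base address, the register names `CTRL` and `STATUS`, their offsets, and the 32-bit little-endian width are all assumptions made for this example.

```python
import struct


class RegisterFile:
    """Registers addressed through ordinary memory addresses."""

    def __init__(self, base: int, layout: dict) -> None:
        self.base = base
        self.layout = dict(layout)  # register name -> byte offset
        self.mem = bytearray(max(self.layout.values()) + 4)

    def _offset(self, addr: int) -> int:
        off = addr - self.base
        if off < 0 or off + 4 > len(self.mem):
            raise ValueError(f"address {addr:#x} is outside the register space")
        return off

    def addr_of(self, name: str) -> int:
        return self.base + self.layout[name]

    def write32(self, addr: int, value: int) -> None:
        struct.pack_into("<I", self.mem, self._offset(addr), value & 0xFFFFFFFF)

    def read32(self, addr: int) -> int:
        return struct.unpack_from("<I", self.mem, self._offset(addr))[0]


# Hypothetical layout: two 32-bit registers at a made-up base address.
regs = RegisterFile(0x1000_0000, {"CTRL": 0x0, "STATUS": 0x4})
regs.write32(regs.addr_of("CTRL"), 0x1)
regs.write32(0x1000_0004, 0xDEADBEEF)
print(hex(regs.read32(regs.addr_of("STATUS"))))
```

A register is located purely by its address, so the same load and store path serves both registers and ordinary memory.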
  • the agent register space Re can communicate with the host 200 and the object model 300 through a message queue.
  • model space Mem1 is used to store parameters of the neural network processor and task-related inputs and/or outputs.
  • the model space Mem1 is shared by the agent module 110 and the host 200. At this time, for example, the model space Mem1 communicates with the host 200 through shared storage.
  • model space Mem1 may be mapped to storage space Mem2 in the object model. The model space Mem1 and the storage space Mem2 are thus in the same address space, and the actual storage is mapped to the shared storage file in the host 200. That is to say, the model space Mem1 can actually be accessed by the simulation device 100, the host 200 and the object model 300. For example, after the host 200 writes the parameters of the neural network processor and the task-related input and/or output into the model space Mem1, the object model 300 can directly read the task-related input and/or output information written into the model space Mem1 for related processing.
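The effect of backing Mem1 and Mem2 with one shared file can be sketched with two memory mappings of the same file. This is an illustration only: a temporary file stands in for the shared storage file, and the size and byte values are arbitrary assumptions.

```python
import mmap
import tempfile

SIZE = 4096  # arbitrary size for the illustration

with tempfile.TemporaryFile() as shared_file:
    shared_file.truncate(SIZE)

    # Two independent mappings of the same backing file.
    host_view = mmap.mmap(shared_file.fileno(), SIZE)   # Mem1 as seen by the host
    model_view = mmap.mmap(shared_file.fileno(), SIZE)  # Mem2 as seen by the object model

    # The host writes task input; the object model reads it directly.
    host_view[0:4] = bytes([1, 2, 3, 4])
    seen_by_model = bytes(model_view[0:4])

    # The object model writes output; the host reads it back.
    model_view[4:8] = bytes([10, 20, 30, 40])
    seen_by_host = bytes(host_view[4:8])

    host_view.close()
    model_view.close()

print(seen_by_model, seen_by_host)
```

No copy or message is needed for the data itself, since both sides address the same underlying storage.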
  • the address space of the agent module 110 may also include the configuration instruction space Mem3.
  • the configuration instruction space Mem3 is used to store the configuration instructions of the neural network processor's registers and task-related control instructions.
  • the agent module 110, the management module 120, and the interconnect module 130 may be mounted into a peripheral component interconnect express (PCIe) system.
  • PCIe system can include the following device types: root complex (RC, root complex), bridge, switch, end device (Endpoint), etc.
  • the root complex is the interface between the CPU and the PCIe bus; a bridge provides an interface to other buses (such as PCI or PCI-X, or even another PCIe bus) and is sometimes also called a forwarding bridge;
  • a switch provides expansion or aggregation capabilities and allows more devices to be connected to a PCIe port.
  • the Switch can act as a packet router and identify which path a given packet needs to take based on the address or other routing information.
  • the endpoint is at the very end of the topology of the PCIe bus system, and it is generally used as the initiator (similar to the master in the PCI bus) or the completer (similar to the slave in the PCI bus) of a bus operation.
  • interconnect module 130 may be a root complex in a PCIe system.
  • the agent module 110 may be an endpoint in the PCIe system.
  • the agent module 110 may include multiple base address register (BAR) spaces, and the multiple base address register spaces include a first base address register space and a second base address register space.
  • the multiple base address register spaces may include BAR0 space to BAR5 space, the first base address register space may be BAR0 space, and the second base address register space may be BAR3 space.
  • the agent register space Re corresponds to the first base address register space, such as the BAR0 space.
  • the configuration instruction space Mem3 corresponds to the second base address register space, such as the BAR3 space.
  • the base address register space refers to the BAR space of PCIe;
  • the emulation device is a device of the host (HOST) and is connected to the HOST through the PCIe interface.
  • the HOST needs to map the space on the emulation device to the BAR space of PCIe; if there are multiple independently accessible spaces on the emulation device, PCIe provides multiple BAR spaces for mapping.
  • the configuration instruction space Mem3 is mapped to the shared storage file ivshmem, which is shared by the agent module 110 and the host 200 and is located on the host 200. That is to say, in the simulation device 100, all accesses to the BAR3 space are forwarded to the shared storage file ivshmem on the host 200, and the host 200 sends configuration instructions to the agent module 110 in the simulation device 100 by writing the shared storage file ivshmem.
  • model space Mem1 and storage space Mem2 can be mapped to the same location in the shared storage file, while the configuration instruction space Mem3 is mapped to a location in the shared storage file that is different from the location to which the model space Mem1/storage space Mem2 is mapped.
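The relationship between the BAR spaces and the shared storage file can be sketched as a small address-translation table. Only the BAR0/BAR3 assignment comes from the text above; the `Region` type, the sizes, and the file offsets are hypothetical values chosen so that Mem3 and Mem1/Mem2 occupy different locations in the file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    backing: str      # "registers" or "shared_file"
    file_offset: int  # offset inside the shared file (0 for registers)
    size: int


# Hypothetical sizes and offsets, chosen only for illustration.
BARS = {
    0: Region("agent register space Re", "registers", 0x0, 0x1000),
    3: Region("configuration instruction space Mem3", "shared_file", 0x10_0000, 0x1_0000),
}
MODEL_SPACE = Region("model space Mem1 / storage space Mem2", "shared_file", 0x0, 0x10_0000)


def translate(bar: int, offset: int):
    """Map an access at (BAR, offset) to its backing store and location."""
    region = BARS[bar]
    if not 0 <= offset < region.size:
        raise ValueError(f"offset {offset:#x} is outside BAR{bar}")
    return region.backing, region.file_offset + offset


def overlaps(a: Region, b: Region) -> bool:
    """True if two regions share any byte range of the same backing store."""
    return (a.backing == b.backing
            and a.file_offset < b.file_offset + b.size
            and b.file_offset < a.file_offset + a.size)


print(translate(3, 0x20), overlaps(BARS[3], MODEL_SPACE))
```

Keeping the two file ranges disjoint means configuration instructions written by the host cannot clobber model parameters or task data.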
  • the agent module 110 is configured to: based on the contents in the configuration instruction space Mem3, provide the contents in the agent register space Re and the model space Mem1 to the object model 300 to configure and schedule the object model 300 and perform work simulation. It should be noted that when the model space Mem1 can be mapped to the storage space Mem2 in the object model, the object model 300 can directly access the model space Mem1 to obtain required data, for example, task-related input data.
  • the host 200 can raise an interrupt to the agent module 110 through a socket; that is, the tasks sent by the host 200 and executed by the object model 300 can be triggered through the interrupt.
  • the agent module 110 may include an interrupt register, and the host writes task-related notification information to the interrupt register to notify the management module 120 to perform the task in an interrupt manner.
  • the communication between the memory manager (runtime) process in the host 200 and the simulation device 100 through the socket proceeds as follows: the memory manager writes task-related notification information to the interrupt register of the agent module 110 in the simulation device 100; once the notification information has been written to the interrupt register, an interrupt is raised to the management module 120; the management module 120 then executes the interrupt handler of the agent module 110 (which is part of the driver of the agent module 110). Because the interrupt handler runs in the kernel state (a process executing kernel code as the result of a system call is in the kernel running state, which has the highest privilege level), it is necessary to notify the scheduler of the user state (when a process is executing the user'
  • the simulation device 100 may also include a software module 150.
  • the software module 150 includes an application program App and a virtual machine system (Guest OS).
  • the application program App includes a variety of tools, library files, a scheduler, etc.; the virtual machine system can include the operating system kernel and drivers.
  • the management module 120 runs an operating system kernel.
  • the operating system kernel may be a Linux 5.2 kernel, etc.
  • the driver of the agent module 110 is loaded into the operating system kernel to be executed.
  • the driver of the agent module 110 may be loaded into the kernel through a kernel module.
  • the kernel module is a concept of the operating system; a driver is generally loaded by the operating system as a kernel module.
  • At least one embodiment of the present disclosure also provides a simulation system.
  • the simulation system 1000 may include a simulation device 100, a host 200 and an object model 300. It should be noted that, regarding the communication method between the simulation device 100, the host 200 and the object model 300, reference may be made to the description of the embodiment of the simulation device 100 above, and the details are not repeated here.
  • the host 200 is configured to obtain tasks and send the tasks to the simulation device 100 .
  • the host 200 includes a memory manager that communicates with the simulation device 100 to transmit tasks to the simulation device 100 and receive feedback information returned from the simulation device 100 after the object model 300 processes work information related to the task.
  • the object model 300 is configured to process work information to obtain feedback information.
  • the object model 300 is used to simulate the functionality of a hardware accelerator (eg, a neural network processor), and may be modeled using SystemC language.
  • SystemC is a modeling platform composed of a set of C++ class libraries plus a simulation kernel; it supports hardware modeling at the system level, the behavioral description level and the register transfer level.
  • the object model 300 can also be modeled using the Verilog language.
  • the abstraction level of the object model 300 is at least one of an algorithm level (ALM), a system architecture level (SAM), a transaction level (TLM), and a register transfer level (RTL).
  • the object model 300 may include execution units such as a matrix execution unit and a vector execution unit; storage-related units such as on-chip static random-access memory (SRAM) and an SRAM controller; and a microcontroller unit (MCU), etc.
  • the object model 300 can model at least part of the read storage pipeline and the calculation pipeline in the neural network processor, and calculate the average execution time for each operation in the neural network processor.
  • the calculation pipeline represents the integer execution unit, the floating point execution unit, etc.; accordingly, the read storage pipeline represents the IO operations, that is, the LOAD/SAVE (load/store) execution unit, which is usually also called the LSU.
  • the average execution time of each operation can represent the number of execution cycles of each stage (for example, a stage is a pipeline stage), that is, the number of cycles needed to execute each operation. For example, by counting the average execution time of each operation, an approximately cycle-accurate model can be obtained.
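How per-operation averages yield an approximate cycle count can be sketched as follows. The operation names and the sample cycle counts are invented for illustration; the disclosure does not give concrete figures.

```python
from collections import defaultdict


def average_cycles(samples):
    """Average the observed cycle counts per operation type."""
    totals = defaultdict(lambda: [0, 0])  # op -> [sum of cycles, sample count]
    for op, cycles in samples:
        totals[op][0] += cycles
        totals[op][1] += 1
    return {op: total / count for op, (total, count) in totals.items()}


def estimate_cycles(trace, avg):
    """Approximate a trace's total cycles from the per-operation averages."""
    return sum(avg[op] for op in trace)


# Hypothetical measurements: (operation, observed cycles).
samples = [("load", 4), ("load", 6), ("matmul", 20), ("matmul", 24), ("store", 5)]
avg = average_cycles(samples)

trace = ["load", "load", "matmul", "store"]
total = estimate_cycles(trace, avg)
print(avg, total)
```

The estimate ignores pipeline overlap and stalls, which is why the result is only approximately cycle-accurate.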
  • the object model 300 supports two modes: a functional mode and a performance mode. In the performance mode, only the delay of the pipeline is simulated (for example, the pipeline represents a linear communication model of pipeline segments that exchange data), and the actual operations corresponding to the read storage pipeline and the calculation pipeline are not executed. In the performance mode, an accurate number of simulated cycles can be obtained, and the execution speed is increased by an order of magnitude compared with the functional mode.
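The difference between the two modes can be sketched with a toy interpreter that always accounts for latency but executes the operations only in functional mode. The operation set, the `LATENCY` table and its values are assumptions made for this example.

```python
from enum import Enum


class Mode(Enum):
    FUNCTIONAL = "functional"
    PERFORMANCE = "performance"


# Hypothetical per-operation latencies in cycles.
LATENCY = {"load": 5, "scale": 3, "store": 5}


def run(program, mode: Mode, memory: dict) -> int:
    """Return the simulated cycle count; compute results only in functional mode."""
    cycles = 0
    acc = None
    for op, arg in program:
        cycles += LATENCY[op]  # delay is modeled in both modes
        if mode is Mode.PERFORMANCE:
            continue           # skip the actual operation
        if op == "load":
            acc = list(memory[arg])
        elif op == "scale":
            acc = [x * arg for x in acc]
        elif op == "store":
            memory[arg] = acc
    return cycles


program = [("load", "in"), ("scale", 2), ("store", "out")]

functional_memory = {"in": [1, 2, 3]}
functional_cycles = run(program, Mode.FUNCTIONAL, functional_memory)

performance_memory = {"in": [1, 2, 3]}
performance_cycles = run(program, Mode.PERFORMANCE, performance_memory)
print(functional_cycles, performance_cycles, functional_memory)
```

Both runs report the same cycle count, but only the functional run produces output data, which is where the speed advantage of the performance mode comes from.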
  • SystemC is used to model the delay in the read storage pipeline and the calculation pipeline of the object model 300.
  • the object model 300 meets the AI compiler's requirements for a high-performance cost model, so that the object model 300 can be used as the cost model of an AI compiler. Used as the cost model of the AI compiler, it can significantly improve the inference-time performance indicators of the ResNet50 network on the chip, for example, it can reduce the inference time.
  • At least one embodiment of the present disclosure also provides a simulation method applied to a simulation system.
  • the simulation system can be a simulation system provided by any embodiment of the present disclosure, for example, the simulation system 1000 shown in FIG. 2 .
  • Figure 3 is a schematic flow chart of a simulation method provided by at least one embodiment of the present disclosure.
  • the simulation method may include the following steps S10 to S13.
  • Step S10: Send the task through the host.
  • Step S11: Receive and parse the task through the simulation device to send task-related work information to the object model.
  • Step S12: The object model processes the work information to obtain feedback information.
  • Step S13: Provide feedback information to the host through the simulation device.
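Steps S10 to S13 can be sketched end to end as four plain functions. This is a minimal illustration under stated assumptions: the JSON task encoding, the field names, and the choice of a small matrix multiplication as the task are hypothetical and not taken from the disclosure.

```python
import json


def host_send_task() -> str:
    """S10: the host sends a task (here a small matrix multiplication)."""
    return json.dumps({"op": "matmul", "a": [[1, 2]], "b": [[3], [4]]})


def device_parse(task: str) -> dict:
    """S11: the simulation device parses the task into work information."""
    parsed = json.loads(task)
    return {"kind": parsed["op"], "operands": (parsed["a"], parsed["b"])}


def object_model_process(work: dict) -> dict:
    """S12: the object model processes the work information."""
    a, b = work["operands"]
    result = [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
              for row in a]
    return {"status": "done", "result": result}


def device_return(feedback: dict) -> str:
    """S13: the simulation device provides the feedback to the host."""
    return json.dumps(feedback)


reply = device_return(object_model_process(device_parse(host_send_task())))
print(reply)
```

Each function corresponds to one step, so the call chain on the last line reads in the same order as the method.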
  • step S10 is implemented by the host 200.
  • the host 200 can receive a task from an external device and send the task to the simulation device 100.
  • the task can be sent to the host 200 by the user through an external device.
  • step S11 is implemented by the simulation device 100.
  • the management module 120 in the simulation device 100 can parse the task, and then the agent module 110 in the simulation device 100 sends task-related work information to the object model 300.
  • step S12 is implemented by the object model 300.
  • the object model 300 can process the work information to obtain feedback information, and the feedback information can be sent to the simulation device 100.
  • the feedback information may include the results after the object model 300 processes the work information.
  • the feedback information may include the probability of having the target in the image, etc.
  • step S13 is implemented by the simulation device 100.
  • the simulation device 100 receives the feedback information returned by the object model 300 after processing the work information through the agent module 110.
  • At least one embodiment of the present disclosure also provides a simulation device, which may include one or more memories and one or more processors. It should be noted that the components of the above-mentioned simulation device are only exemplary and not restrictive. According to actual application requirements, the simulation device may also have other components, which are not specifically limited by the embodiments of the present disclosure.
  • one or more memories are configured to non-transitorily store computer-executable instructions; one or more processors are configured to execute the computer-executable instructions.
  • Computer-executable instructions, when executed by one or more processors, implement one or more steps in the simulation method according to any embodiment of the present disclosure. For the specific implementation and related explanations of each step of the simulation method, please refer to the embodiments of the above simulation method; the details are not repeated here.
  • a processor and a memory may communicate with each other directly or indirectly.
  • processors and memory can communicate over a network.
  • a network may include a wireless network, a wired network, and/or any combination of wireless and wired networks.
  • the processor and the memory can also communicate with each other through the system bus, which is not limited by this disclosure.
  • the processor and memory can be provided on the server side (or in the cloud).
  • the processor may control other components in the simulation device to perform desired functions.
  • the processor may be a central processing unit (CPU), a graphics processing unit (GPU), a network processor (NP), etc.; the processor may also be another form of processing unit with data processing capabilities and/or program execution capabilities, for example, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a tensor processing unit (TPU) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the central processing unit (CPU) can be X86 or ARM architecture, etc.
  • memory may be a computer-readable medium, and may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • Volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache), etc.
  • Non-volatile memory may include, for example, read-only memory (ROM), hard disk, erasable programmable read-only memory (EPROM), portable compact disk read-only memory (CD-ROM), USB memory, flash memory, and the like.
  • One or more computer-readable instructions may be stored on the computer-readable storage medium, and the processor may execute the computer-readable instructions to implement various functions of the simulation device.
  • Various applications and various data can also be stored in the storage medium.
  • FIG. 4 is a schematic diagram of a non-transitory computer-readable storage medium provided by at least one embodiment of the present disclosure.
  • one or more computer-executable instructions 401 may be stored non-transitorily on the non-transitory computer-readable storage medium 40.
  • one or more steps in the simulation method according to any embodiment of the present disclosure may be performed when the computer-executable instructions 401 are executed by a processor.
  • the non-transitory computer-readable storage medium 40 can be applied to the above-mentioned simulation device.
  • the non-transitory computer-readable storage medium 40 may include the memory in the above-mentioned simulation device.
  • for a description of the non-transitory computer-readable storage medium 40, refer to the description of the memory in the embodiment of the simulation device; repeated content is not described again.
  • FIG. 5 shows a schematic structural diagram of an electronic device 500 suitable for implementing embodiments of the present disclosure.
  • the electronic device 500 may be a terminal device (for example, a computer) or a processor, and may be used to execute the simulation method of the above embodiment.
  • Electronic devices in embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDA), tablet computers (Portable Android Device, PAD), portable media players (PMP), vehicle terminals (such as vehicle navigation terminals), and wearable electronic devices, and fixed terminals such as digital TVs, desktop computers, and smart home devices.
  • the electronic device shown in FIG. 5 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
  • the electronic device 500 may include a processing device (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503.
  • the RAM 503 also stores various programs and data required for the operation of the electronic device 500.
  • the processing device 501, the ROM 502 and the RAM 503 are connected to each other via a bus 504.
  • An input/output (I/O) interface 505 is also connected to bus 504.
  • the following devices may be connected to the I/O interface 505: an input device 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 507 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 508 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 509.
  • Communication device 509 may allow electronic device 500 to communicate wirelessly or wiredly with other devices to exchange data.
  • Although FIG. 5 illustrates the electronic device 500 with various means, it should be understood that implementing or providing all of the illustrated means is not required. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a non-transitory computer-readable medium, the computer program including program code for performing the method illustrated in the flowchart, so as to perform one or more steps of the simulation method described above.
  • the computer program may be downloaded and installed from the network via communication device 509, or from storage device 508, or from ROM 502.
  • When the computer program is executed by the processing device 501, it can cause the processing device 501 to perform the above-mentioned functions defined in the simulation method of the embodiment of the present disclosure.
  • a computer-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
  • the computer-readable storage medium may be, for example, but is not limited to: an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
  • More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein.
  • Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device .
  • Program code embodied on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; it may also exist independently without being assembled into the electronic device.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in the flowchart or block diagram may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logic functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown one after another may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
  • each block of the block diagram and/or flowchart illustration, and combinations of blocks in the block diagram and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or can be implemented using a combination of special-purpose hardware and computer instructions.
  • the units involved in the embodiments of the present disclosure can be implemented in software or hardware.
  • in some cases, the name of a unit does not constitute a limitation on the unit itself.
  • FPGAs Field Programmable Gate Arrays
  • ASICs Application Specific Integrated Circuits
  • ASSPs Application Specific Standard Products
  • SOCs Systems on Chips
  • CPLD Complex Programmable Logic Device
  • a simulation device is used to simulate a neural network processor, and includes: an agent module configured to communicate with an object model that simulates the neural network processor and to serve as an agent of the object model in the simulation device; a management module configured to manage the simulation device; and an interconnection module configured to communicatively connect the agent module and the management module, wherein the simulation device receives the task sent by the host, sends the work information related to the task to the object model through the agent module, receives through the agent module the feedback information returned by the object model after processing the work information, and provides the feedback information to the host.
  • the agent module communicates with the host through a socket, a shared storage, and/or a message queue.
  • the address space of the agent module includes an agent register space and a model space.
  • the agent register space is used to define registers of the neural network processor, and the registers of the neural network processor are located by means of memory addresses; the model space is used to store parameters of the neural network processor and input and/or output related to the task.
  • the model space is mapped to a storage space in the object model.
  • the model space is shared by the agent module and the host.
  • the address space of the agent module also includes a configuration instruction space, and the configuration instruction space is used to store configuration instructions of the registers of the neural network processor and control instructions related to the task.
  • the agent module includes a plurality of base address register spaces, the plurality of base address register spaces include a first base address register space and a second base address register space, the agent register space corresponds to the first base address register space, and the configuration instruction space corresponds to the second base address register space.
  • the configuration instruction space is mapped to a shared storage file shared by the agent module and the host and the shared storage file is located on the host.
  • the agent module is configured to provide, based on content in the configuration instruction space, content in the agent register space and the model space to the object model, so as to configure and schedule the object model and perform work simulation.
  • the agent module communicates with the object model through a message queue.
  • an operating system kernel runs in the management module, and the driver of the agent module is loaded into the operating system kernel.
  • the agent module includes an interrupt register, and the host writes notification information related to the task to the interrupt register to notify the management module, by way of an interrupt, to execute the task.
  • the simulation device further includes an input and output module.
  • the input and output module communicates with the host through a socket to receive the task.
  • the input and output module is configured to send the task to the agent module or the management module.
  • the simulation device is implemented through a virtual simulation platform, and the management module is implemented through a virtual central processor.
  • a simulation system includes: the simulation device according to any embodiment of the present disclosure, the object model and the host, wherein the host is configured to obtain the task and send the task to the simulation device; the object model is configured to process the work information to obtain the feedback information.
  • the abstraction level of the object model is an algorithm level, a system structure level, a transaction level or a register transfer level.
  • the object model models at least part of the load/store pipeline and the compute pipeline in the neural network processor, and collects the average execution time of each operation in the neural network processor.
  • a simulation method applied to the simulation system includes: sending a task through the host; receiving and parsing the task through the simulation device, so as to send work information related to the task to the object model; processing the work information by the object model to obtain the feedback information; and providing the feedback information to the host through the simulation device.
  • a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions, when executed by a processor, implement the simulation method according to any embodiment of the present disclosure.

Abstract

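Embodiments of the present disclosure provide a simulation device, a simulation system and a simulation method thereof, and a non-transitory computer-readable storage medium. The simulation device is used to simulate a neural network processor and includes: an agent module configured to communicate with an object model that simulates the neural network processor and to act as the agent of the object model in the simulation device; a management module configured to manage the simulation device; and an interconnection module configured to communicatively connect the agent module and the management module. The simulation device receives a task sent by a host, sends work information related to the task to the object model through the agent module, receives through the agent module the feedback information returned by the object model after processing the work information, and provides the feedback information to the host.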

Description

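Simulation device, simulation system and simulation method thereof, and storage medium
This application claims priority to Chinese Patent Application No. 202210613143.4, filed on May 31, 2022, the entire disclosure of which is incorporated herein by reference as part of this application.
Technical Field
Embodiments of the present disclosure relate to a simulation device, a simulation system and a simulation method thereof, and a non-transitory computer-readable storage medium.
Background
With the development of artificial intelligence (AI), the number of parameters in algorithm models has increased dramatically, and the demand for computing power keeps growing. Traditional hardware architectures (for example, the central processing unit (CPU) and the graphics processing unit (GPU)) balance different workload requirements at the architecture design stage, so the computing power they can offer AI applications is limited. Domain-specific accelerators (DSAs) emerged in response. The core idea of a DSA is likewise to use dedicated hardware for dedicated work, but a DSA serves the applications within a domain rather than one fixed application, so it strikes a compromise between flexibility and specialization.
Summary
This summary is provided to introduce, in a brief form, concepts that are described in detail in the Detailed Description below. It is not intended to identify key or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
At least one embodiment of the present disclosure provides a simulation device. The simulation device is used to simulate a neural network processor and includes: an agent module configured to communicate with an object model that simulates the neural network processor and to act as the agent of the object model in the simulation device; a management module configured to manage the simulation device; and an interconnection module configured to communicatively connect the agent module and the management module. The simulation device receives a task sent by a host, sends work information related to the task to the object model through the agent module, receives through the agent module the feedback information returned by the object model after processing the work information, and provides the feedback information to the host.
At least one embodiment of the present disclosure further provides a simulation system, including: the simulation device according to any embodiment of the present disclosure, an object model, and a host. The host is configured to obtain the task and send the task to the simulation device; the object model is configured to process the work information to obtain the feedback information.
At least one embodiment of the present disclosure further provides a simulation method applied to the simulation system according to any embodiment of the present disclosure, including: sending a task through the host; receiving and parsing the task through the simulation device, so as to send work information related to the task to the object model; processing the work information by the object model to obtain the feedback information; and providing the feedback information to the host through the simulation device.
At least one embodiment of the present disclosure further provides a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions, when executed by a processor, implement the simulation method according to any of the above embodiments.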
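Brief Description of the Drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
FIG. 1A is a schematic diagram of the hardware architecture of a simulation device provided by at least one embodiment of the present disclosure;
FIG. 1B is a schematic diagram of the hardware architecture of another simulation device provided by at least one embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the hardware architecture of a simulation system provided by at least one embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of a simulation method provided by at least one embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a non-transitory computer-readable storage medium provided by at least one embodiment of the present disclosure;
FIG. 5 is a schematic diagram of the hardware structure of an electronic device provided by at least one embodiment of the present disclosure.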
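Detailed Description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. In addition, method embodiments may include additional steps and/or omit some of the illustrated steps. The scope of the present disclosure is not limited in this respect.
As used herein, the term "include" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different devices, modules, or units, and are not used to limit the order of, or the interdependence between, the functions performed by these devices, modules, or units.
It should be noted that the modifiers "a" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art will understand that, unless the context clearly indicates otherwise, they should be read as "one or more".
The names of messages or information exchanged between multiple devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.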
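After several years of development, the architecture of AI hardware accelerators has become increasingly complex. It has evolved from the original single core with shared memory to today's homogeneous many-core design with distributed memory. The design of the cores in an AI hardware accelerator has likewise evolved from a simple very long instruction word (VLIW) structure to a Turing-complete instruction set architecture (ISA), and two implementation architectures for instruction-driven data access coexist: Master-Slave (data block) and Data-Streaming (data flow). For the early architecture exploration and the later verification of an AI hardware accelerator, a complete simulation system for simulating the accelerator has become indispensable; in other words, AI hardware accelerators currently lack a simple and efficient simulation framework for architecture design and functional verification.
The optimization strategies involved in an AI compiler mainly include operator fusion, operator splitting, and parallel scheduling of IO and computation, and together they form a huge optimization search space. Each optimization strategy executes with a different efficiency on the underlying accelerator, so a cost model is needed to evaluate each strategy and to shrink the search space. The cost model must be sufficiently accurate without adding too much overhead to the compilation stage. Currently available open-source full-system simulation platforms include the gem5 simulator, whose main drawbacks are that it is large and complex, is not designed for modeling AI accelerators, and executes inefficiently; moreover, its overhead is too high for use as a cost model. AI compilers therefore currently need a high-performance cost model for evaluation.
At least one embodiment of the present disclosure provides a simulation device. The simulation device is used to simulate a neural network processor and includes an agent module, a management module, and an interconnection module. The agent module is configured to communicate with an object model that simulates the neural network processor and to act as the agent of the object model in the simulation device; the management module is configured to manage the simulation device; the interconnection module is configured to communicatively connect the agent module and the management module. The simulation device receives a task sent by a host, sends work information related to the task to the object model through the agent module, receives through the agent module the feedback information returned by the object model after processing the work information, and provides the feedback information to the host.
The simulation device provided by the embodiments of the present disclosure models a full-system simulation platform for a neural network processor. It can effectively support architecture design, exploration, and functional verification of the neural network processor and speed up its structural and functional design, so that the development of the neural network processor is not held back by hardware. In addition, the simulation device has a simple structure and is easy to implement.
At least one embodiment of the present disclosure further provides a simulation system and a simulation method thereof, and a non-transitory computer-readable storage medium.
Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings, but the present disclosure is not limited to these specific embodiments. To keep the following description of the embodiments clear and concise, detailed descriptions of some known functions and known components are omitted.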
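FIG. 1A is a schematic diagram of the hardware architecture of a simulation device provided by at least one embodiment of the present disclosure, FIG. 1B is a schematic diagram of the hardware architecture of another simulation device provided by at least one embodiment of the present disclosure, and FIG. 2 is a schematic diagram of the hardware architecture of a simulation system provided by at least one embodiment of the present disclosure.
The simulation device provided by the embodiments of the present disclosure can be used to simulate a neural network processor. The neural network processor may be implemented in hardware, for example as a chip based on a programming language. For example, the neural network processor may be used to implement convolution operations, matrix operations, and the like.
As shown in FIG. 1A and FIG. 1B, in some embodiments of the present disclosure, the simulation device 100 may include an agent module 110, a management module 120, and an interconnection module 130. As shown in FIG. 2, the agent module 110 is configured to communicate with an object model 300 that simulates the neural network processor and to act as the agent of the object model in the simulation device 100; that is, the object model 300 is used to simulate the functions of the neural network processor. The management module 120 is configured to manage the simulation device 100, and the interconnection module 130 is configured to communicatively connect the agent module 110 and the management module 120. It should be noted that although the simulation device 100 shown in FIG. 2 is the one shown in FIG. 1A, the simulation device 100 in the simulation system 1000 may also be the one shown in FIG. 1B.
For example, the management module 120 may manage the agent module 110 to control the agent module 110 to perform corresponding functions.
For example, as shown in FIG. 2, the simulation device 100 receives a task sent by the host 200, sends work information related to the task to the object model through the agent module 110, receives through the agent module 110 the feedback information returned by the object model after processing the work information, and provides the feedback information to the host 200.
For example, the simulation device 100 is implemented through a virtual simulation platform; for example, the simulation device 100 may be implemented as a QEMU (Quick EMUlator) virtual platform (virt platform) or the like.
For example, the management module 120 is implemented through a virtual central processing unit; for example, in some embodiments, the virtual central processing unit may be a RISC-V core (RISC stands for Reduced Instruction Set Computer, and V denotes the fifth generation of RISC).
For example, the task may be any task that needs to be executed by the neural network processor, such as object recognition or a matrix operation (for example, matrix multiplication).
For example, the agent module 110 and the host 200 may communicate in at least one manner, which may include a socket manner, a shared storage manner, and/or a message queue manner. As shown in FIG. 2, in some embodiments, the agent module 110 communicates with the host 200 through the socket manner, the shared storage manner, and the message queue manner. The socket manner may be a Unix domain socket (UDS). The shared storage manner means communicating through a shared storage file; for example, the shared storage file may be the /dev/shm/ivshmem file on the host 200. The ivshmem file allows a memory region created by the host 200 to be shared between different QEMU processes. The message queue manner MQ1 may deliver messages in first-in, first-out (FIFO) order.
For example, between the agent module 110 and the host 200, data with a larger volume is exchanged through the shared storage manner, while data with a smaller volume is exchanged through the socket manner and/or the message queue manner.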
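The split between a bulk channel and a small-message channel can be sketched as follows. This is only an illustrative sketch, not the disclosed implementation: the descriptor layout (`HEADER`), the function names, and the use of an anonymous `mmap` region in place of the shared storage file are all assumptions made so the example runs in a single process.

```python
import mmap
import socket
import struct

# Hypothetical descriptor: (offset, length) of the payload inside the shared region.
HEADER = struct.Struct("<QQ")


def send_bulk(shared: mmap.mmap, sock: socket.socket, offset: int, data: bytes) -> None:
    """Place the large payload in the shared region, then send only a small descriptor."""
    shared[offset:offset + len(data)] = data
    sock.sendall(HEADER.pack(offset, len(data)))


def recv_bulk(shared: mmap.mmap, sock: socket.socket) -> bytes:
    """Read the descriptor from the socket and fetch the payload from the shared region."""
    header = b""
    while len(header) < HEADER.size:
        chunk = sock.recv(HEADER.size - len(header))
        if not chunk:
            raise ConnectionError("socket closed before the descriptor arrived")
        header += chunk
    offset, length = HEADER.unpack(header)
    return bytes(shared[offset:offset + length])


def demo_bulk_transfer() -> bytes:
    # An anonymous mapping stands in for the shared storage file in this sketch.
    with mmap.mmap(-1, 1 << 16) as shared:
        host_end, device_end = socket.socketpair()
        with host_end, device_end:
            send_bulk(shared, host_end, 256, b"weights" * 100)
            return recv_bulk(shared, device_end)
```

Only 16 bytes cross the socket here, regardless of how large the payload in the shared region is, which is the motivation for routing large data through shared storage.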
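For example, the agent module 110 and the object model 300 may also communicate in at least one manner, for example through a message queue. As shown in FIG. 2, in some embodiments, the agent module 110 sends the work information related to the task to the object model 300 through the message queue MQ2, and the object model 300 sends the feedback information to the agent module 110 through the message queue MQ3. Like MQ1, the message queues MQ2 and MQ3 may deliver messages in FIFO order.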
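The MQ2/MQ3 round trip can be sketched with two in-process FIFO queues. This is a minimal sketch under stated assumptions: the message fields (`job_id`, `operands`, `result`), the `None` shutdown sentinel, and the summing "work" are hypothetical stand-ins, and a thread stands in for the separate object model process.

```python
import queue
import threading


def object_model_worker(mq2: queue.Queue, mq3: queue.Queue) -> None:
    """Stand-in object model: consume work from MQ2, return feedback on MQ3."""
    while True:
        work = mq2.get()
        if work is None:  # hypothetical shutdown sentinel
            break
        mq3.put({"job_id": work["job_id"], "result": sum(work["operands"])})


def run_jobs(jobs: list) -> list:
    """Agent side: push work information in order, then collect feedback in order."""
    mq2: queue.Queue = queue.Queue()  # agent -> object model
    mq3: queue.Queue = queue.Queue()  # object model -> agent
    worker = threading.Thread(target=object_model_worker, args=(mq2, mq3))
    worker.start()
    for job in jobs:
        mq2.put(job)
    mq2.put(None)
    worker.join()
    return [mq3.get() for _ in jobs]
```

Because both queues are FIFO and there is a single consumer, feedback comes back in the same order in which the work information was sent.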
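For example, the object model 300 may be connected to the simulation device 100 in a pluggable manner; that is, the simulation device 100 can work with different object models, and different object models can simulate different neural network processors.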
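One way to picture the pluggable connection is a small interface that any object model satisfies. The sketch below is an assumption for illustration only; the names `ObjectModel`, `SimulationDevice`, `plug`, and `submit`, and the two toy models, do not come from the disclosure.

```python
from typing import Optional, Protocol


class ObjectModel(Protocol):
    """Hypothetical interface: anything that turns work information into feedback."""

    def process(self, work: dict) -> dict: ...


class SumModel:
    def process(self, work: dict) -> dict:
        return {"result": sum(work["operands"])}


class MaxModel:
    def process(self, work: dict) -> dict:
        return {"result": max(work["operands"])}


class SimulationDevice:
    """Forwards work to whichever object model is currently plugged in."""

    def __init__(self) -> None:
        self._model: Optional[ObjectModel] = None

    def plug(self, model: ObjectModel) -> None:
        self._model = model

    def submit(self, work: dict) -> dict:
        if self._model is None:
            raise RuntimeError("no object model is plugged in")
        return self._model.process(work)
```

Swapping the model changes what is simulated without touching the device code, which is the point of the pluggable design.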
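For example, the object model 300 can increase the operation speed of the neural network processor, save operation time, and improve operation efficiency.
For example, as shown in FIG. 2, the host 200 may obtain a task from an external device and send the task to the simulation device 100 through the socket manner.
For example, information such as the parameters of the neural network processor and the input and/or output related to the task is accessed by the host 200 and the simulation device 100 through the shared storage manner for reading, writing, and similar operations. The input and/or output related to the task depends on the type of the task. For example, in some embodiments, the task may be to recognize a target object in an image and return the image annotated with the target object to the host 200; in this case, the input image is the input related to the task, and the image in which the object model 300 has recognized and annotated the target object is the output related to the task.
For example, the simulation device 100 may send (for example, synchronize) the feedback information to the host 200 through the message queue manner.
For example, in some embodiments, as shown in FIG. 1B, the simulation device 100 may further include an input/output module 140. The input/output module 140 may interact with the host 200, for example by communicating with the host 200 through the socket manner to receive the task. For example, the input/output module 140 is configured to send the task to the agent module 110 or the management module 120. For example, the input/output module 140 may include a buffer or the like for storing the task.
For example, as shown in FIG. 1A and FIG. 1B, the address space of the agent module 110 may include an agent register space Re and a model space Mem1.
For example, the agent register space Re is used to define the registers of the neural network processor. The registers of the neural network processor are all memory mapped, which means that the device's registers and memory share a unified address space, so a register of the device is located by a memory address. The registers of the neural network processor are thus defined in the memory space and can be located by memory address for reading and writing.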
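A memory-mapped register space can be sketched as a byte buffer in which each register is reached through an address. This is an illustrative sketch only: the base address, the register names `CTRL` and `STATUS`, and the 32-bit little-endian register width are assumptions, not values from the disclosure.

```python
import struct
from typing import Dict


class RegisterSpace:
    """Registers located by memory address: address = base + register offset."""

    REG_SIZE = 4  # assumed 32-bit registers

    def __init__(self, base: int, layout: Dict[str, int]) -> None:
        self.base = base
        self.layout = layout  # register name -> byte offset
        self.backing = bytearray(max(layout.values()) + self.REG_SIZE)

    def address_of(self, name: str) -> int:
        return self.base + self.layout[name]

    def _offset(self, address: int) -> int:
        offset = address - self.base
        if offset not in self.layout.values():
            raise ValueError(f"no register at address {address:#x}")
        return offset

    def write(self, address: int, value: int) -> None:
        struct.pack_into("<I", self.backing, self._offset(address), value)

    def read(self, address: int) -> int:
        return struct.unpack_from("<I", self.backing, self._offset(address))[0]
```

A caller that knows only the address (for example `0x10000000`) can read or write the register, with no register-specific instruction or port involved.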
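For example, the agent register space Re may communicate with the host 200 and the object model 300 through the message queue manner.
For example, the model space Mem1 is used to store the parameters of the neural network processor and the input and/or output related to the task.
For example, in some embodiments, the model space Mem1 is shared by the agent module 110 and the host 200; in this case, for example, the model space Mem1 communicates with the host 200 through the shared storage manner.
For example, in some embodiments, as shown in FIG. 2, the model space Mem1 may be mapped to a storage space Mem2 in the object model. The model space Mem1 and the storage space Mem2 are therefore the same address space, and the actual storage is mapped to the shared storage file on the host 200. In other words, the model space Mem1 can in fact be accessed by the simulation device 100, the host 200, and the object model 300. For example, once the host 200 writes information such as the parameters of the neural network processor and the input and/or output related to the task into the model space Mem1, the object model 300 can directly read that information from the model space Mem1 for processing.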
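The effect of backing Mem1 and Mem2 with the same file can be sketched with two independent mappings of one temporary file. This is a sketch under assumptions: the temporary file stands in for the shared storage file on the host, both views live in one process here, and the function names are hypothetical.

```python
import mmap
import os
import tempfile


def open_view(path: str, size: int) -> mmap.mmap:
    """Map the backing file; each caller gets its own view of the same bytes."""
    with open(path, "r+b") as handle:
        return mmap.mmap(handle.fileno(), size)


def demo_shared_model_space() -> bytes:
    size = 4096
    with tempfile.NamedTemporaryFile(delete=False) as backing:
        backing.truncate(size)
        path = backing.name
    try:
        mem1 = open_view(path, size)  # stands in for model space Mem1
        mem2 = open_view(path, size)  # stands in for storage space Mem2
        try:
            mem1[0:5] = b"input"      # host-side write into Mem1
            return bytes(mem2[0:5])   # object-model-side read from Mem2
        finally:
            mem1.close()
            mem2.close()
    finally:
        os.remove(path)
```

No copy or message is needed between the writer and the reader; the data written through one view is visible through the other because both map the same file.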
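For example, in some embodiments, as shown in FIG. 2, the address space of the agent module 110 may further include a configuration instruction space Mem3. The configuration instruction space Mem3 is used to store the configuration instructions for the registers of the neural network processor and the control instructions related to the task.
For example, in some embodiments, the agent module 110, the management module 120, and the interconnection module 130 may be mounted in a Peripheral Component Interconnect Express (PCIe) system, a high-speed serial computer expansion bus. A PCIe system may include the following device types: root complex (RC), bridge, switch, endpoint, and so on. The root complex is the interface between the CPU and the PCIe bus. A bridge provides an interface to other buses (such as PCI or PCI-X, or even another PCIe bus) and is sometimes called a forwarding bridge. A switch provides fan-out or aggregation capability and allows more devices to be connected to one PCIe port; it can act as a packet router, determining from the address or other routing information which path a given packet must take, and is a PCIe-to-PCIe bridge. An endpoint sits at the very end of the PCIe bus topology and generally acts as the initiator of a bus operation (similar to the master on a PCI bus) or as the completer (similar to the slave on a PCI bus).
For example, in some embodiments, the interconnection module 130 may be the root complex in the PCIe system.
For example, in some embodiments, the agent module 110 may be an endpoint in the PCIe system. For example, the agent module 110 may include a plurality of base address register (BAR) spaces, including a first base address register space and a second base address register space. In some embodiments, the plurality of base address register spaces may include the BAR0 space to the BAR5 space, the first base address register space may be the BAR0 space, and the second base address register space may be the BAR3 space.
For example, the agent register space Re corresponds to the first base address register space, for example the BAR0 space, and the configuration instruction space Mem3 corresponds to the second base address register space, for example the BAR3 space. The base address register space refers to the BAR space of PCIe. The simulation device, as a device of the host (HOST), is connected to the HOST through a PCIe interface. For the HOST to access the simulation device, the spaces on the simulation device must be mapped into the PCIe BAR spaces; if the simulation device has several independently accessible spaces, PCIe provides multiple BAR spaces for the mapping.
For example, the configuration instruction space Mem3 is mapped to a shared storage file ivshmem shared by the agent module 110 and the host 200, and the shared storage file ivshmem is located on the host 200. In other words, in the simulation device 100, all accesses to the BAR3 space are forwarded to the shared storage file ivshmem on the host 200, and the host 200 sends configuration instructions to the agent module 110 in the simulation device 100 by writing to the shared storage file ivshmem.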
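The BAR mapping can be pictured as an address decoder that routes an access to the space behind the matching BAR. The sketch below is illustrative only: the base addresses and sizes are invented for the example, and only the BAR0 and BAR3 roles come from the text above.

```python
from typing import List, NamedTuple, Tuple


class Bar(NamedTuple):
    index: int
    base: int    # hypothetical bus address assigned to this BAR
    size: int    # hypothetical window size in bytes
    target: str  # which device space sits behind the BAR


# Assumed example layout: BAR0 -> agent register space, BAR3 -> configuration instruction space.
BARS: List[Bar] = [
    Bar(0, 0x4000_0000, 0x1000, "agent_register_space"),
    Bar(3, 0x5000_0000, 0x1_0000, "config_instruction_space"),
]


def decode(address: int, bars: List[Bar] = BARS) -> Tuple[int, int, str]:
    """Return (BAR index, offset inside the BAR, target space) for a bus address."""
    for bar in bars:
        if bar.base <= address < bar.base + bar.size:
            return bar.index, address - bar.base, bar.target
    raise ValueError(f"address {address:#x} is not claimed by any BAR")
```

In the arrangement described above, an access decoded to BAR3 would then be forwarded to the same offset in the shared storage file rather than to device-local storage.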
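For example, in some embodiments of the present disclosure, as shown in FIG. 2, the actual storage of the model space Mem1, the storage space Mem2, and the configuration instruction space Mem3 is mapped to the shared storage file on the host 200. It should be noted that the model space Mem1 and the storage space Mem2 may be mapped to the same location in the shared storage file, whereas the configuration instruction space Mem3 is mapped to a location in the shared storage file different from that of the model space Mem1 and the storage space Mem2.
For example, the agent module 110 is configured to provide, based on the content in the configuration instruction space Mem3, the content in the agent register space Re and the model space Mem1 to the object model 300, so as to configure and schedule the object model 300 and perform work simulation. It should be noted that when the model space Mem1 is mapped to the storage space Mem2 in the object model, the object model 300 can directly access the model space Mem1 to obtain the data it needs, such as the input related to the task.
For example, the host 200 may raise an interrupt to the agent module 110 through the socket manner; that is, a task sent by the host 200 for execution by the object model 300 may be executed by way of an interrupt.
For example, in some embodiments, the agent module 110 may include an interrupt register, and the host writes notification information related to the task to the interrupt register so as to notify the management module 120, by way of an interrupt, to execute the task. For example, the process of the memory manager (runtime) on the host 200 communicates with the simulation device 100 through the socket manner as follows: the memory manager writes to the interrupt register of the agent module 110 in the simulation device 100 to write the notification information related to the task; after the notification information is written, the interrupt register raises an interrupt to the management module 120; after the interrupt of the agent module 110 is delivered to the management module 120, the interrupt handler of the agent module 110 (part of the driver of the agent module 110) is executed. The interrupt handler runs in kernel mode (a process is in kernel mode, with the highest privilege level, when it has trapped into kernel code through a system call), so the scheduler running in user mode must be notified (a process is in user mode, with the lowest privilege level, when it executes the user's own code); for example, a signal may be used to notify the user-mode scheduler to accept the new task (that is, the task sent by the host 200).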
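The notification chain (register write, interrupt, handler, signal to the scheduler) can be sketched with plain callbacks. This is a simulation for illustration only: no real interrupts, kernel code, or OS signals are involved, and the class and function names are hypothetical.

```python
from typing import Callable, List


class Scheduler:
    """Stands in for the user-mode scheduler that accepts new tasks."""

    def __init__(self) -> None:
        self.pending: List[int] = []

    def on_signal(self, task_id: int) -> None:
        self.pending.append(task_id)


def make_interrupt_handler(scheduler: Scheduler) -> Callable[[int], None]:
    """Stands in for the kernel-mode handler in the agent module's driver."""

    def handler(task_id: int) -> None:
        # The real handler would notify user space with a signal; here it is a direct call.
        scheduler.on_signal(task_id)

    return handler


class InterruptRegister:
    """Writing the register raises the interrupt toward the management module."""

    def __init__(self, raise_irq: Callable[[int], None]) -> None:
        self._raise_irq = raise_irq
        self.value = 0

    def write(self, value: int) -> None:
        self.value = value
        self._raise_irq(value)
```

The host-side runtime only performs the register write; everything after that point happens inside the simulation device, which keeps the host interface to a single small operation.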
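For example, as shown in FIG. 1A and FIG. 1B, the simulation device 100 may further include a software module 150. The software module 150 includes an application App and a virtual machine system (Guest OS); the application App includes various tools (Tools), library files (Library), a scheduler (Scheduler), and the like, and the virtual machine system may include an operating system kernel and drivers.
For example, an operating system kernel runs in the management module 120; for example, the operating system kernel may be a Linux 5.2 kernel or the like.
For example, the driver of the agent module 110 is loaded into the operating system kernel for execution; for example, the driver of the agent module 110 may be loaded into the kernel as a kernel module. A kernel module is an operating system concept; drivers are generally loaded by the operating system as kernel modules.
At least one embodiment of the present disclosure further provides a simulation system.
For example, as shown in FIG. 2, the simulation system 1000 may include the simulation device 100, the host 200, and the object model 300. It should be noted that, for the communication among the simulation device 100, the host 200, and the object model 300, reference may be made to the description in the above embodiments of the simulation device 100; repeated content is not described again.
For example, the host 200 is configured to obtain a task and send the task to the simulation device 100. For example, the host 200 includes a memory manager, which communicates with the simulation device 100 to transmit the task to the simulation device 100 and to receive from the simulation device 100 the feedback information returned by the object model 300 after processing the work information related to the task.
For example, the object model 300 is configured to process the work information to obtain the feedback information.
For example, in some embodiments, the object model 300 is used to simulate the functions of a hardware accelerator (for example, a neural network processor) and may be modeled in the SystemC language. SystemC is a modeling platform composed of a set of C++ class libraries with an added simulation kernel, and it supports hardware modeling at the system level, the behavioral description level, and the register transfer level.
As another example, the object model 300 may also be modeled in the Verilog language.
For example, the abstraction level of the object model 300 is at least one of the algorithm level (ALM), the system architecture level (SAM), the transaction level (TLM), and the register transfer level (RTL).
For example, in some embodiments, the object model 300 may include execution units such as a matrix execution unit and a vector execution unit; storage-related units such as on-chip static random-access memory (SRAM) and an SRAM controller; and a microcontroller unit (MCU), among others.
For example, in some embodiments, the object model 300 may model at least part of the load/store pipeline and the compute pipeline in the neural network processor, and collect the average execution time of each operation in the neural network processor. Taking a CPU as an example, the compute pipeline corresponds to the integer execution unit, the floating-point execution unit, and so on; correspondingly, the load/store pipeline corresponds to IO operations, which are handled by the LOAD/SAVE (store) execution unit, commonly called the LSU.
For example, the average execution time of each operation may be expressed as the number of execution cycles of each stage (for example, a pipeline stage), that is, the number of cycles needed to execute each operation. For example, by collecting the average execution time of each operation, an approximately cycle-accurate model can be obtained.
To improve the execution efficiency of the object model 300 when it is used as a cost model, the object model 300 supports two modes: a functional mode and a performance mode. In the performance mode, only the latency on the pipeline is simulated (here, a pipeline denotes a linear communication model of pipeline segments that exchange data), and the actual operations corresponding to the load/store pipeline and the compute pipeline are not executed. The performance mode yields an accurate simulated cycle count, and it runs an order of magnitude faster than the functional mode.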
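The difference between the two modes can be sketched with a toy cost model. This is an illustrative sketch only: the per-operation cycle counts in `AVG_CYCLES` are invented, and summing them sequentially ignores any overlap between the load/store and compute pipelines that a real model would account for.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical average cycle counts per operation; real values would be measured.
AVG_CYCLES: Dict[str, int] = {"load": 20, "matmul": 64, "store": 20}

Operation = Tuple[str, Callable[[], object]]


def simulate(ops: List[Operation], mode: str) -> Tuple[int, list]:
    """Return (total cycles, outputs). Outputs are computed only in functional mode."""
    if mode not in ("functional", "performance"):
        raise ValueError(f"unknown mode: {mode}")
    cycles = 0
    outputs = []
    for name, action in ops:
        cycles += AVG_CYCLES[name]   # latency is accounted for in both modes
        if mode == "functional":
            outputs.append(action())  # the actual operation runs only here
    return cycles, outputs
```

Both modes report the same cycle count, but the performance mode never calls the operation bodies, which is where its speed advantage as a cost model comes from.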
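In the embodiments of the present disclosure, SystemC is used to model the latencies in the load/store pipeline and the compute pipeline of the object model 300. The object model 300 meets the AI compiler's need for a high-performance cost model and can therefore serve as the cost model of the AI compiler; used as such, it can markedly improve the inference-time performance metric of the ResNet50 network on the chip, for example by reducing the inference time.
At least one embodiment of the present disclosure further provides a simulation method applied to a simulation system. The simulation system may be the simulation system provided by any embodiment of the present disclosure, for example the simulation system 1000 shown in FIG. 2.
FIG. 3 is a schematic flowchart of a simulation method provided by at least one embodiment of the present disclosure.
As shown in FIG. 3, the simulation method may include the following steps S10 to S13.
Step S10: sending a task through the host.
Step S11: receiving and parsing the task through the simulation device, so as to send work information related to the task to the object model.
Step S12: processing the work information by the object model to obtain feedback information.
Step S13: providing the feedback information to the host through the simulation device.
For example, taking the simulation system 1000 shown in FIG. 2 as an example, step S10 is performed by the host 200. In step S10, the host 200 may receive a task from an external device and send the task to the simulation device 100. For example, in some embodiments, the task may be sent to the host 200 by a user through the external device.
For example, step S11 is performed by the simulation device 100. For example, the management module 120 in the simulation device 100 may parse the task, and the agent module 110 in the simulation device 100 then sends the work information related to the task to the object model 300.
For example, step S12 is performed by the object model 300. In step S12, the object model 300 may process the work information to obtain feedback information, which may be sent to the simulation device 100. For example, the feedback information may include the result obtained after the object model 300 processes the work information; for example, when the task is to recognize a target in an image, the feedback information may include the probability that the image contains the target.
For example, step S13 is performed by the simulation device 100. In step S13, the simulation device 100 receives, through the agent module 110, the feedback information returned by the object model 300 after processing the work information, and provides the feedback information to the host 200.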
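Steps S10 to S13 can be traced end to end with a small sketch. Everything here is an assumption for illustration: the JSON task encoding, the function names, and the matrix multiplication used as the task are not specified by the disclosure.

```python
import json
from typing import List


def host_send_task() -> str:
    """S10: the host sends a task (encoded here as JSON, which is an assumption)."""
    return json.dumps({"op": "matmul", "a": [[1, 2], [3, 4]], "b": [[5, 6], [7, 8]]})


def device_parse(task_text: str) -> dict:
    """S11: the simulation device parses the task into work information."""
    return json.loads(task_text)


def object_model_process(work: dict) -> dict:
    """S12: the object model processes the work information into feedback."""
    a: List[List[int]] = work["a"]
    b: List[List[int]] = work["b"]
    result = [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]
    return {"op": work["op"], "result": result}


def run_simulation() -> dict:
    """S13: the simulation device hands the feedback back to the host."""
    work = device_parse(host_send_task())
    return object_model_process(work)
```

Each function maps to one step, so replacing `object_model_process` with a different model leaves the host-facing steps S10 and S13 unchanged.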
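For the technical effects that the simulation method can achieve, reference may be made to the relevant description in the above embodiments of the simulation device and the simulation system; repeated content is not described again.
At least one embodiment of the present disclosure further provides a simulation device, which may include one or more memories and one or more processors. It should be noted that the components of the above simulation device are exemplary rather than restrictive; depending on actual application needs, the simulation device may also have other components, which the embodiments of the present disclosure do not specifically limit.
For example, the one or more memories are used to non-transitorily store computer-executable instructions, and the one or more processors are configured to run the computer-executable instructions. When run by the one or more processors, the computer-executable instructions implement one or more steps of the simulation method according to any embodiment of the present disclosure. For the specific implementation of each step of the simulation method and the related explanations, refer to the above embodiments of the simulation method; repeated content is not described again here.
For example, the processor and the memory may communicate with each other directly or indirectly.
For example, the processor and the memory may communicate through a network. The network may include a wireless network, a wired network, and/or any combination of wireless and wired networks. The processor and the memory may also communicate with each other through a system bus, which is not limited by the present disclosure.
For example, the processor and the memory may be provided on the server side (or in the cloud).
For example, the processor may control other components in the simulation device to perform desired functions. The processor may be a central processing unit (CPU), a graphics processing unit (GPU), a network processor (NP), or the like; the processor may also be another form of processing unit with data processing capability and/or program execution capability, for example a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a tensor processing unit (TPU) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The central processing unit (CPU) may be of the X86 or ARM architecture, among others.
For example, the memory may be a computer-readable medium and may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache). Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, erasable programmable read-only memory (EPROM), portable compact disk read-only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer-readable instructions may be stored on the computer-readable storage medium, and the processor may run the computer-readable instructions to implement the various functions of the simulation device. Various applications and various data may also be stored in the storage medium.
For the technical effects that the simulation device can achieve, reference may be made to the relevant description in the above embodiments of the simulation method; repeated content is not described again.
FIG. 4 is a schematic diagram of a non-transitory computer-readable storage medium provided by at least one embodiment of the present disclosure. For example, as shown in FIG. 4, one or more computer-executable instructions 401 may be stored non-transitorily on the non-transitory computer-readable storage medium 40. For example, when the computer-executable instructions 401 are executed by a processor, one or more steps of the simulation method according to any embodiment of the present disclosure may be performed.
For example, the non-transitory computer-readable storage medium 40 may be applied in the above simulation device. For example, the non-transitory computer-readable storage medium 40 may include the memory in the above simulation device.
For example, for a description of the non-transitory computer-readable storage medium 40, refer to the description of the memory in the embodiment of the simulation device; repeated content is not described again.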
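Referring now to FIG. 5, FIG. 5 shows a schematic structural diagram of an electronic device 500 suitable for implementing embodiments of the present disclosure. The electronic device 500 may be a terminal device (for example, a computer) or a processor, and may be used to execute the simulation method of the above embodiments. Electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDA), tablet computers (Portable Android Device, PAD), portable media players (PMP), vehicle terminals (such as vehicle navigation terminals), and wearable electronic devices, and fixed terminals such as digital TVs, desktop computers, and smart home devices. The electronic device shown in FIG. 5 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 5, the electronic device 500 may include a processing device (for example, a central processing unit, a graphics processor, etc.) 501, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500. The processing device 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following devices may be connected to the I/O interface 505: an input device 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 507 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 508 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 509. The communication device 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 5 shows the electronic device 500 with various devices, it should be understood that implementing or providing all of the illustrated devices is not required. More or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product that includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method shown in the flowchart, so as to perform one or more steps of the simulation method described above. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 509, or installed from the storage device 508, or installed from the ROM 502. When the computer program is executed by the processing device 501, it may cause the processing device 501 to perform the above functions defined in the simulation method of the embodiments of the present disclosure.
It should be noted that, in the context of the present disclosure, a computer-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to: an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wire, optical cable, RF (radio frequency), etc., or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or it may exist separately without being assembled into the electronic device.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logic function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a special-purpose hardware-based system that performs the specified functions or operations, or by a combination of special-purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. For example, in some cases the name of a unit does not constitute a limitation on the unit itself.
The functions described above may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include: field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), and so on.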
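In a first aspect, according to one or more embodiments of the present disclosure, a simulation device is used to simulate a neural network processor and includes: an agent module configured to communicate with an object model that simulates the neural network processor and to act as the agent of the object model in the simulation device; a management module configured to manage the simulation device; and an interconnection module configured to communicatively connect the agent module and the management module, wherein the simulation device receives a task sent by a host, sends work information related to the task to the object model through the agent module, receives through the agent module the feedback information returned by the object model after processing the work information, and provides the feedback information to the host.
According to one or more embodiments of the present disclosure, the agent module communicates with the host through a socket manner, a shared storage manner, and/or a message queue manner.
According to one or more embodiments of the present disclosure, the address space of the agent module includes an agent register space and a model space; the agent register space is used to define the registers of the neural network processor, and the registers of the neural network processor are located by memory address; the model space is used to store the parameters of the neural network processor and the input and/or output related to the task.
According to one or more embodiments of the present disclosure, the model space is mapped to a storage space in the object model.
According to one or more embodiments of the present disclosure, the model space is shared by the agent module and the host.
According to one or more embodiments of the present disclosure, the address space of the agent module further includes a configuration instruction space, which is used to store the configuration instructions for the registers of the neural network processor and the control instructions related to the task.
According to one or more embodiments of the present disclosure, the agent module includes a plurality of base address register spaces, including a first base address register space and a second base address register space; the agent register space corresponds to the first base address register space, and the configuration instruction space corresponds to the second base address register space.
According to one or more embodiments of the present disclosure, the configuration instruction space is mapped to a shared storage file shared by the agent module and the host, and the shared storage file is located on the host.
According to one or more embodiments of the present disclosure, the agent module is configured to provide, based on the content in the configuration instruction space, the content in the agent register space and the model space to the object model, so as to configure and schedule the object model and perform work simulation.
According to one or more embodiments of the present disclosure, the agent module communicates with the object model through a message queue manner.
According to one or more embodiments of the present disclosure, an operating system kernel runs in the management module, and the driver of the agent module is loaded into the operating system kernel.
According to one or more embodiments of the present disclosure, the agent module includes an interrupt register, and the host writes notification information related to the task to the interrupt register so as to notify the management module, by way of an interrupt, to execute the task.
According to one or more embodiments of the present disclosure, the simulation device further includes an input/output module; the input/output module communicates with the host through a socket manner to receive the task, and is configured to send the task to the agent module or the management module.
According to one or more embodiments of the present disclosure, the simulation device is implemented through a virtual simulation platform, and the management module is implemented through a virtual central processing unit.
In a second aspect, according to one or more embodiments of the present disclosure, a simulation system includes: the simulation device according to any embodiment of the present disclosure, the object model, and the host, wherein the host is configured to obtain the task and send the task to the simulation device, and the object model is configured to process the work information to obtain the feedback information.
According to one or more embodiments of the present disclosure, the abstraction level of the object model is the algorithm level, the system architecture level, the transaction level, or the register transfer level.
According to one or more embodiments of the present disclosure, the object model models at least part of the load/store pipeline and the compute pipeline in the neural network processor, and collects the average execution time of each operation in the neural network processor.
In a third aspect, according to one or more embodiments of the present disclosure, a simulation method applied to the simulation system according to any embodiment of the present disclosure includes: sending a task through the host; receiving and parsing the task through the simulation device, so as to send work information related to the task to the object model; processing the work information by the object model to obtain the feedback information; and providing the feedback information to the host through the simulation device.
In a fourth aspect, according to one or more embodiments of the present disclosure, a non-transitory computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions, when executed by a processor, implement the simulation method according to any embodiment of the present disclosure.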
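The above description is only of preferred embodiments of the present disclosure and of the technical principles applied. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Furthermore, although the operations are depicted in a specific order, this should not be understood as requiring that the operations be performed in the specific order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.
The following points should also be noted for the present disclosure:
(1) The drawings of the embodiments of the present disclosure relate only to the structures involved in the embodiments of the present disclosure; for other structures, reference may be made to common designs.
(2) Where there is no conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with each other to obtain new embodiments.
The above are only specific embodiments of the present disclosure, but the scope of protection of the present disclosure is not limited thereto; the scope of protection of the present disclosure shall be determined by the scope of protection of the claims.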

Claims (19)

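  1. A simulation device for simulating a neural network processor, comprising:
    an agent module configured to communicate with an object model that simulates the neural network processor and to act as the agent of the object model in the simulation device;
    a management module configured to manage the simulation device; and
    an interconnection module configured to communicatively connect the agent module and the management module;
    wherein the simulation device receives a task sent by a host, sends work information related to the task to the object model through the agent module, receives through the agent module the feedback information returned by the object model after processing the work information, and provides the feedback information to the host.
  2. The simulation device according to claim 1, wherein the agent module communicates with the host through a socket manner, a shared storage manner, and/or a message queue manner.
  3. The simulation device according to claim 1 or 2, wherein the address space of the agent module comprises an agent register space and a model space,
    the agent register space is used to define the registers of the neural network processor, and the registers of the neural network processor are located by memory address; and
    the model space is used to store the parameters of the neural network processor and the input and/or output related to the task.
  4. The simulation device according to claim 3, wherein the model space is mapped to a storage space in the object model.
  5. The simulation device according to claim 3 or 4, wherein the model space is shared by the agent module and the host.
  6. The simulation device according to any one of claims 3 to 5, wherein the address space of the agent module further comprises a configuration instruction space,
    the configuration instruction space being used to store the configuration instructions for the registers of the neural network processor and the control instructions related to the task.
  7. The simulation device according to claim 6, wherein
    the agent module comprises a plurality of base address register spaces, the plurality of base address register spaces comprising a first base address register space and a second base address register space, and
    the agent register space corresponds to the first base address register space, and the configuration instruction space corresponds to the second base address register space.
  8. The simulation device according to claim 6, wherein the configuration instruction space is mapped to a shared storage file shared by the agent module and the host, and the shared storage file is located on the host.
  9. The simulation device according to claim 6, wherein the agent module is configured to provide, based on the content in the configuration instruction space, the content in the agent register space and the model space to the object model, so as to configure and schedule the object model and perform work simulation.
  10. The simulation device according to any one of claims 1 to 9, wherein the agent module communicates with the object model through a message queue manner.
  11. The simulation device according to any one of claims 1 to 10, wherein an operating system kernel runs in the management module, and the driver of the agent module is loaded into the operating system kernel.
  12. The simulation device according to any one of claims 1 to 11, wherein the agent module comprises an interrupt register, and the host writes notification information related to the task to the interrupt register so as to notify the management module, by way of an interrupt, to execute the task.
  13. The simulation device according to any one of claims 1 to 12, further comprising an input/output module,
    wherein the input/output module communicates with the host through a socket manner to receive the task, and
    the input/output module is configured to send the task to the agent module or the management module.
  14. The simulation device according to any one of claims 1 to 13, wherein the simulation device is implemented through a virtual simulation platform, and the management module is implemented through a virtual central processing unit.
  15. A simulation system, comprising:
    the simulation device according to any one of claims 1 to 14, the object model, and the host,
    wherein the host is configured to obtain the task and send the task to the simulation device; and
    the object model is configured to process the work information to obtain the feedback information.
  16. The simulation system according to claim 15, wherein the abstraction level of the object model is the algorithm level, the system architecture level, the transaction level, or the register transfer level.
  17. The simulation system according to claim 15 or 16, wherein the object model models at least part of the load/store pipeline and the compute pipeline in the neural network processor, and collects the average execution time of each operation in the neural network processor.
  18. A simulation method applied to the simulation system according to any one of claims 15 to 17, comprising:
    sending a task through the host;
    receiving and parsing the task through the simulation device, so as to send work information related to the task to the object model;
    processing the work information by the object model to obtain the feedback information; and
    providing the feedback information to the host through the simulation device.
  19. A non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions, when executed by a processor, implement the simulation method according to claim 18.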
PCT/CN2023/097006 2022-05-31 2023-05-30 Simulation device, simulation system and simulation method thereof, and storage medium WO2023232006A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210613143.4A CN117217067A (zh) 2022-05-31 2022-05-31 Simulation device, simulation system and simulation method thereof, and storage medium
CN202210613143.4 2022-05-31

Publications (1)

Publication Number Publication Date
WO2023232006A1 true WO2023232006A1 (zh) 2023-12-07

Family

ID=89026921

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/097006 WO2023232006A1 (zh) 2022-05-31 2023-05-30 Simulation device, simulation system and simulation method thereof, and storage medium

Country Status (2)

Country Link
CN (1) CN117217067A (zh)
WO (1) WO2023232006A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
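CN117763876A (zh) * 2024-02-19 2024-03-26 Zhejiang University Parallel simulation analysis method and device for industrial equipment
CN117763876B (zh) * 2024-02-19 2024-05-10 Zhejiang University Parallel simulation analysis method and device for industrial equipment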

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
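CN102082674A (zh) * 2009-12-01 2011-06-01 ZTE Corporation Simulation method and system for a data channel
CN103023967A (zh) * 2012-11-15 2013-04-03 Wuhan Research Institute of Posts and Telecommunications Cloud computing simulation system and method based on the Simics system simulator
KR20130069106A (ko) * 2011-12-16 2013-06-26 Agency for Defense Development Embedded software verification apparatus and operating method thereof
CN103186458A (zh) * 2011-12-29 2013-07-03 Leadcore Technology Co., Ltd. Simulation debugging system and method based on an embedded operating system
CN104053179A (zh) * 2014-05-07 2014-09-17 Chongqing University of Posts and Telecommunications C-RAN system-level simulation platform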

Also Published As

Publication number Publication date
CN117217067A (zh) 2023-12-12

Similar Documents

Publication Publication Date Title
US10942716B1 (en) Dynamic computational acceleration using a heterogeneous hardware infrastructure
US20170323045A1 (en) Method and system for designing fpga based on hardware requirements defined in source code
US8572614B2 (en) Processing workloads using a processor hierarchy system
US10180850B1 (en) Emulating applications that use hardware acceleration
Ruaro et al. Memphis: a framework for heterogeneous many-core SoCs generation and validation
JP7096213B2 (ja) Computation method applied to an artificial intelligence chip, and artificial intelligence chip
CN110825435B (zh) Method and apparatus for processing data
US20200371843A1 (en) Framework for application driven exploration and optimization of hardware engines
WO2023232006A1 (zh) Simulation device, simulation system and simulation method thereof, and storage medium
US11593547B1 (en) Prediction and optimization of multi-kernel circuit design performance using a programmable overlay
US11392406B1 (en) Alternative interrupt reporting channels for microcontroller access devices
Popovici et al. Extending a RISC-V Core with a CAN-FD Communication Unit
Lantreibecq et al. Model checking and co-simulation of a dynamic task dispatcher circuit using CADP
US20140244232A1 (en) Simulation apparatus and simulation method
JP2021096829A (ja) Initialization and management of class-of-service attributes at runtime for optimizing deep learning training in a distributed environment
Bhimani et al. Design space exploration of GPU Accelerated cluster systems for optimal data transfer using PCIe bus
Wehner et al. Parallel and distributed simulation of networked multi-core systems
Reichenbach et al. LibHSA: one step towards mastering the era of heterogeneous hardware accelerators using FPGAs
Manavar et al. Experience with PCIe streaming on FPGA for high throughput ML inferencing
US11537457B2 (en) Low latency remoting to accelerators
CN115297169B (zh) Data processing method, apparatus, electronic device, and medium
US20240111694A1 (en) Node identification allocation in a multi-tile system with multiple derivatives
Fleming et al. PushPush: Seamless integration of hardware and software objects via function calls over AXI
US11630935B1 (en) Data traffic injection for simulation of circuit designs
US20230114858A1 (en) Circuit design simulation and clock event reduction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23815185

Country of ref document: EP

Kind code of ref document: A1