CN110825438A - Method and device for simulating data processing of artificial intelligence chip

Info

Publication number: CN110825438A
Application number: CN201810906709.6A
Authority: CN
Inventors: 柳嘉强
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Kunlun Core Beijing Technology Co ltd; Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2018-08-10
Filing date: 2018-08-10
Publication date: 2020-02-21
Anticipated expiration: 2038-08-10
Also published as: CN110825438B

Abstract

The embodiment of the application discloses a method and a device for simulating data processing of an artificial intelligence chip. The artificial intelligence chip comprises at least one module, and one specific implementation mode of the method comprises the following steps: acquiring a bit group sequence to be processed and hardware specification information of an artificial intelligence chip, wherein the hardware specification information comprises an instruction parsing rule, a supported instruction set and module information of modules related to instructions in the instruction set. At least one instruction is parsed from the sequence of bit groups according to an instruction parsing rule. For an instruction in at least one instruction, the simulation end time of the instruction is predicted according to the module information of the module related to the instruction, and the instruction is simulated and executed in response to the detection that the current simulation time reaches the simulation end time of the instruction. The implementation method can simulate the time sequence in the artificial intelligence chip, determine the sequence of executing each instruction according to the time sequence and increase the consistency of the running result of the simulator and the running result of the chip.

Description

Method and device for simulating data processing of artificial intelligence chip

Technical Field

The embodiment of the application relates to the technical field of computers, in particular to a method and a device for simulating data processing of an artificial intelligence chip.

Background

Before a chip is formally taped out, a simulator is used in place of the hardware chip to verify the completeness of the processor's design functions (whether they meet the requirements of the intended applications) and the correctness of the implementation (whether the code meets the design expectations). The simulator is also used for developing the software stack and optimizing the performance of upper-layer applications.

In summary, the simulator needs to simulate the behavior of the chip in both functional and performance aspects. Functionally, the results of the simulator and the chip must be consistent given the input program. In terms of performance, the simulator needs to output the time (usually expressed in cycles) required by the chip to run a given program.

Existing simulators typically need to simulate the data transmitted on the pipeline in every cycle during the execution of each instruction, and they simply execute instructions one by one without considering the timing of the instructions.

Disclosure of Invention

The embodiment of the application provides a method and a device for simulating data processing of an artificial intelligence chip.

In a first aspect, an embodiment of the present application provides a method for simulating data processing of an artificial intelligence chip, where the artificial intelligence chip includes at least one module, and the method includes: acquiring a bit group sequence to be processed and hardware specification information of the artificial intelligence chip, wherein the hardware specification information comprises an instruction parsing rule, a supported instruction set, and module information of the modules related to instructions in the instruction set; parsing at least one instruction from the bit group sequence according to the instruction parsing rule; and, for an instruction of the at least one instruction, predicting the simulation end time of the instruction according to the module information of the modules related to the instruction, and simulating execution of the instruction in response to detecting that the current simulation time reaches the simulation end time of the instruction.

In some embodiments, the module information includes at least one of: interconnection information among modules, hardware structure information of modules, and protocol information of interaction among modules.

In some embodiments, simulating execution of the instruction comprises: calling a preset function to simulate the function of the instruction.

In some embodiments, simulating execution of the instruction comprises: writing the instruction into a shared queue; and fetching the instruction from the shared queue and calling a preset function to simulate the function of the instruction.

In some embodiments, predicting the simulation end time of the instruction according to the module information of the modules related to the instruction comprises: simulating the process by which the instruction is executed according to the module information of the modules related to the instruction; determining the completion time of the instruction according to the simulated execution process; and determining the simulation end time of the instruction according to the current simulation time and the completion time.

In some embodiments, determining the completion time of the instruction according to the simulated execution process comprises: determining the internal processing time of the modules related to the instruction according to the hardware structure information of those modules; decomposing the interaction between modules during execution of the instruction into a plurality of transactions according to the protocol information of the interaction between the modules related to the instruction, and determining the pipeline time and the queuing time of the transactions; and determining the completion time of the instruction according to the internal processing time of the modules related to the instruction, the pipeline time of the transactions during execution of the instruction, and the queuing time.

In a second aspect, an embodiment of the present application provides an apparatus for simulating data processing of an artificial intelligence chip, where the artificial intelligence chip includes at least one module, and the apparatus includes: an acquisition unit configured to acquire a bit group sequence to be processed and hardware specification information of the artificial intelligence chip, wherein the hardware specification information comprises an instruction parsing rule, a supported instruction set, and module information of the modules related to instructions in the instruction set; a parsing unit configured to parse at least one instruction from the bit group sequence according to the instruction parsing rule; and a simulation unit configured to, for an instruction of the at least one instruction, predict the simulation end time of the instruction according to the module information of the modules related to the instruction, and simulate execution of the instruction in response to detecting that the current simulation time reaches the simulation end time of the instruction.

In some embodiments, the simulation unit is further configured to: call a preset function to simulate the function of the instruction.

In some embodiments, the simulation unit is further configured to: write the instruction into a shared queue; and fetch the instruction from the shared queue and call a preset function to simulate the function of the instruction.

In some embodiments, the simulation unit is further configured to: simulate the process by which the instruction is executed according to the module information of the modules related to the instruction; determine the completion time of the instruction according to the simulated execution process; and determine the simulation end time of the instruction according to the current simulation time and the completion time.

In some embodiments, the simulation unit is further configured to: determine the internal processing time of the modules related to the instruction according to the hardware structure information of those modules; decompose the interaction between modules during execution of the instruction into a plurality of transactions according to the protocol information of the interaction between the modules related to the instruction, and determine the pipeline time and the queuing time of the transactions; and determine the completion time of the instruction according to the internal processing time of the modules related to the instruction, the pipeline time of the transactions during execution of the instruction, and the queuing time.

In a third aspect, an embodiment of the present application provides a simulator, including: one or more processors. A storage device having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement a method as in any one of the first aspects.

In a fourth aspect, the present application provides a computer readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of the first aspect.

According to the method and the device for simulating data processing of an artificial intelligence chip provided by the embodiments of the present application, the completion time of each instruction and the state of the system after the instruction has been executed are simulated. No actual data copying or calculation is performed while the timing inside the artificial intelligence chip is simulated, which improves the running speed of the simulator. The order in which the instructions are executed is determined according to the timing simulation result, which increases the consistency between the simulation result and the result of running on the chip.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;

FIG. 2 is a flow diagram of one embodiment of a method for simulating data processing of an artificial intelligence chip according to the application;

FIG. 3 is a schematic diagram of one application scenario of a method for simulating data processing of an artificial intelligence chip in accordance with the present application;

FIG. 4 is a flow diagram of yet another embodiment of a method for simulating data processing of an artificial intelligence chip in accordance with the present application;

FIG. 5 is a schematic diagram of yet another application scenario of a method for simulating data processing of an artificial intelligence chip according to the application;

FIG. 6 is a schematic block diagram illustrating one embodiment of an apparatus for simulating data processing of an artificial intelligence chip according to the application;

FIG. 7 is a block diagram of a computer system suitable for use in implementing a simulator in accordance with embodiments of the present application.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the present methods for simulating data processing of an artificial intelligence chip or an apparatus for simulating data processing of an artificial intelligence chip may be applied.

As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and an AI (Artificial Intelligence) chip 105. AI chips are also referred to as AI accelerators or compute cards, i.e. modules dedicated to handling a large number of computational tasks in artificial intelligence applications (other non-computational tasks are still taken care of by the CPU). Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

A user may use the terminal devices 101, 102, 103 to interact with the chip 105 over the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a chip simulator, a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.

The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen and supporting processor simulation functions, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No particular limitation is imposed here.

The AI chip 105 may be a chip that provides various services, such as a voice recognition chip or an image recognition chip, and that supports the instructions simulated on the terminal devices 101, 102, 103. The AI chip 105 may perform processing such as decoding and execution on a received instruction and feed the operation result back to the terminal device. The terminal device can also obtain an instruction to be processed from the AI chip, simulate the execution of the instruction in software, and compare the operation result obtained by software simulation with the chip's operation result to verify the simulation effect.

It should be noted that the method for simulating data processing of an artificial intelligence chip provided in the embodiment of the present application is generally executed by the terminal devices 101, 102, and 103, and accordingly, the apparatus for simulating data processing of an artificial intelligence chip is generally disposed in the terminal devices 101, 102, and 103.

It should be understood that the numbers of terminal devices, networks, and AI chips in fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and AI chips, as desired for implementation.

With continued reference to FIG. 2, a flow 200 of one embodiment of a method for simulating data processing of an artificial intelligence chip in accordance with the present application is shown. The artificial intelligence chip includes at least one module. The method for simulating the data processing of the artificial intelligence chip comprises the following steps:

step 201, acquiring a bit group sequence to be processed and hardware specification information of an artificial intelligence chip.

In this embodiment, an execution subject of the method for simulating data processing of the artificial intelligence chip (for example, the terminal device shown in fig. 1) may obtain the bit group sequence to be processed and the hardware specification information of the artificial intelligence chip from a third-party server through a wired or wireless connection. Alternatively, the bit group sequence to be processed may be obtained from the AI chip or from the terminal itself. The bit group sequence may be input as a file or as data stored in memory in advance. Each bit group in the bit group sequence corresponds to one instruction. The hardware specification information comprises an instruction parsing rule, a supported instruction set, and module information of the modules related to instructions in the instruction set. The instruction parsing rule includes a decoding rule and a separation rule for the bit group sequence. The decoding rule specifies how a bit group is split and interpreted according to a predetermined instruction format, so that different instruction classes are identified and the operands are extracted; the operation code and address code of an instruction can be resolved from a bit group according to the decoding rule. For example, in a bit group 10 bits long, the first two bits may represent the opcode (e.g., 00 for addition and 01 for multiplication) and the last eight bits may represent the address code. The separation rule of the bit group sequence is used to distinguish the bit groups corresponding to different instructions. For example, different bit groups may be separated by a separator, or by a fixed bit group length. In storage, different bit group sequences can be separated using different separation forms.
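The decoding and separation rules described above can be illustrated with a minimal sketch. It assumes the 10-bit example format from the text (2-bit opcode, 8-bit address code); the names `split_fixed`, `split_delimited`, `decode`, and the `OPCODES` table are hypothetical and are not part of the application.

```python
from typing import List, NamedTuple

OPCODE_BITS = 2     # widths taken from the 10-bit example format in the text
ADDRESS_BITS = 8
GROUP_BITS = OPCODE_BITS + ADDRESS_BITS

# Hypothetical opcode table; the text names only 00 (addition) and 01 (multiplication).
OPCODES = {"00": "add", "01": "mul"}


class Instruction(NamedTuple):
    op: str
    address: int


def split_fixed(bits: str, width: int = GROUP_BITS) -> List[str]:
    """Separation rule: cut the sequence into groups of a fixed length."""
    if len(bits) % width != 0:
        raise ValueError("sequence length is not a multiple of the group width")
    return [bits[i:i + width] for i in range(0, len(bits), width)]


def split_delimited(bits: str, separator: str = ",") -> List[str]:
    """Separation rule: groups are divided by an explicit separator."""
    return [group for group in bits.split(separator) if group]


def decode(group: str) -> Instruction:
    """Decoding rule: first two bits are the opcode, last eight the address code."""
    if len(group) != GROUP_BITS or set(group) - {"0", "1"}:
        raise ValueError(f"malformed bit group: {group!r}")
    opcode, address = group[:OPCODE_BITS], group[OPCODE_BITS:]
    if opcode not in OPCODES:
        raise ValueError(f"unsupported opcode: {opcode}")
    return Instruction(OPCODES[opcode], int(address, 2))


# Two 10-bit groups stored back to back, separated by fixed length.
instructions = [decode(g) for g in split_fixed("01010011000010100110")]
```

With this assumed format, the first group decodes to a multiplication and the second to an addition; a real instruction set would supply its own format and opcode table through the hardware specification information.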

In some optional implementations of this embodiment, the module information includes at least one of: interconnection information among modules, hardware structure information of modules, and protocol information of interaction among modules. The interconnection information between modules indicates whether different modules are connected and, if so, by what means (for example, a bus connection). The hardware structure information of a module is used to determine when the module will interact with other modules. The protocol information of the interaction between modules refers to the parameters related to that interaction, for example information that affects instruction execution time, such as the size of the data read from an SRAM (Static Random-Access Memory) each time or the size of the data written into a DRAM (Dynamic Random-Access Memory).

Step 202, at least one instruction is parsed from the bit group sequence according to the instruction parsing rule.

In this embodiment, instructions are parsed from the input bit group sequence according to the instruction parsing rule. First, the bit group sequence is divided into at least one bit group according to the separation rule of the bit group sequence. Then each bit group is decoded into an instruction according to the decoding rule. Two points should be noted. First, instructions in the AI chip may be executed in parallel. Second, the parsing of the bit group sequence depends on how the AI chip itself parses instructions. For example, if the AI chip has multiple units that can receive instructions, the input may consist of multiple files or multiple pieces of data in memory; alternatively, it may be a single file or piece of data that different units parse independently.

Step 203, for an instruction of the at least one instruction, predicting the simulation end time of the instruction according to the module information of the modules related to the instruction, and simulating execution of the instruction in response to detecting that the current simulation time reaches the simulation end time of the instruction.

In this embodiment, after the instruction is obtained through analysis, the time sequence of each instruction in the actual execution process is simulated according to the module information of the module related to the instruction, that is, the specific implementation of hardware, and the time for ending the execution of each instruction is predicted. The current simulation time refers to the internal time of the simulated chip. In order to accurately predict the time of the end of each instruction execution, the simulator is usually required to simulate the interaction process of each module in the AI chip. The state of the chip (such as data in a memory) maintained inside the simulator can be abstracted, and the state can be updated after the simulation execution of the instruction function.

It should be noted that, in order to accurately predict the time when each instruction ends executing, the interaction and influence of each module need to be considered. But at this stage, it is not necessary to actually follow the flow within the chip to perform the function of the instruction. For example, it is not necessary to actually copy data from DRAM to SRAM multiple times.

After the execution end time of an instruction has been predicted, a timing task is started, and the function of the instruction is executed when the simulated time in the chip reaches that end time. This is necessary because instructions within the chip may execute in parallel, so another instruction may finish earlier than the current one. For example, at simulation time t0 the simulator predicts that instruction I1 will finish executing at t0+100 cycles; at a slightly later simulation time t1, the simulator predicts that instruction I2 will finish 30 cycles later, i.e., at t1+30. Although the simulator obtains the end time of I2 later than that of I1 (t0 < t1), the function of I2 should be executed before that of I1 (since t0+100 > t1+30). The timing task can be implemented in various ways, which are not described here.
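The application leaves the implementation of the timing task open. One possible approach, shown here only as a sketch, is a priority queue keyed by predicted end time. The class `TimingScheduler` and the concrete values t0 = 0 and t1 = 20 are assumptions chosen to reproduce the I1/I2 example.

```python
import heapq
from typing import List, Tuple


class TimingScheduler:
    """Hypothetical timing task: runs instruction functions in end-time order."""

    def __init__(self) -> None:
        self.now = 0                                   # current simulation time (cycles)
        self._pending: List[Tuple[int, int, str]] = []
        self._seq = 0                                  # tie-breaker for equal end times
        self.executed: List[Tuple[int, str]] = []

    def predict(self, name: str, cycles: int) -> None:
        """Record that `name` is predicted to finish `cycles` after the current time."""
        heapq.heappush(self._pending, (self.now + cycles, self._seq, name))
        self._seq += 1

    def advance_to(self, time: int) -> None:
        """Advance simulated time, executing every instruction whose end time is reached."""
        while self._pending and self._pending[0][0] <= time:
            end, _, name = heapq.heappop(self._pending)
            self.now = end
            self.executed.append((end, name))          # stand-in for the instruction's function
        self.now = time


sched = TimingScheduler()
sched.predict("I1", 100)   # predicted at t0 = 0, ends at t0 + 100
sched.advance_to(20)       # assumed t1 = 20
sched.predict("I2", 30)    # predicted at t1, ends at t1 + 30 = 50
sched.advance_to(200)
```

After the run, `sched.executed` lists I2 before I1, even though the end time of I2 was predicted later.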

In some optional implementations of this embodiment, simulating execution of the instruction includes calling a preset function to simulate the function of the instruction. The preset function may be called by the thread that simulates the timing. The timing thread simulates the timing of each instruction during actual execution and predicts the time at which each instruction finishes executing. A single preset function may simulate the functions of all instructions; alternatively, a function dedicated to the function of a particular instruction may be called. The function of an instruction may be data transfer, addition, multiplication, or the like.

In some optional implementations of this embodiment, simulating execution of the instruction includes: writing the instruction into a shared queue; and fetching the instruction from the shared queue and calling a preset function to simulate the function of the instruction. Here the function of the instruction is simulated by a dedicated thread in the simulator, and data is passed between the thread simulating the timing and the thread simulating the function through the shared queue. The timing thread only needs to write an instruction into the queue when the simulated time in the chip reaches the end time of that instruction. The function thread only needs to fetch instructions from the shared queue and execute them. In this way, the code for function simulation and the code for timing simulation are decoupled and can be developed independently, which speeds up development iterations of the simulator. In addition, running timing simulation and function simulation in different threads increases the running speed of the simulator.
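The shared-queue arrangement can be sketched with Python's standard `queue` and `threading` modules. This is an illustration under assumptions, not the simulator's actual code: the names `timing_thread`, `function_thread`, `run`, and the `FUNCTIONS` table are hypothetical, and the timing thread is reduced to emitting instructions in order of already-predicted end times.

```python
import queue
import threading
from typing import Callable, Dict, List, Tuple

Completed = Tuple[int, str, int, int]     # (end time, opcode, operand a, operand b)

# Hypothetical preset functions, one per instruction type.
FUNCTIONS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def timing_thread(completions: List[Completed], shared: "queue.Queue") -> None:
    """Writes each instruction to the shared queue when its end time is reached."""
    for item in sorted(completions):      # stand-in for advancing simulated time
        shared.put(item)
    shared.put(None)                      # sentinel: no more instructions


def function_thread(shared: "queue.Queue", results: List[Tuple[int, str, int]]) -> None:
    """Fetches instructions from the shared queue and simulates their function."""
    while True:
        item = shared.get()
        if item is None:
            break
        end_time, op, a, b = item
        results.append((end_time, op, FUNCTIONS[op](a, b)))


def run(completions: List[Completed]) -> List[Tuple[int, str, int]]:
    shared: "queue.Queue" = queue.Queue()
    results: List[Tuple[int, str, int]] = []
    threads = [
        threading.Thread(target=timing_thread, args=(completions, shared)),
        threading.Thread(target=function_thread, args=(shared, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run([(100, "add", 2, 3), (30, "mul", 4, 5)])
```

Because the only coupling between the two threads is the queue, either side could be replaced or developed separately, which is the decoupling the paragraph above describes.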

With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for simulating data processing of an artificial intelligence chip according to the present embodiment. In the application scenario of fig. 3, after receiving a bit group sequence to be processed (0101001100, 0010100110), a terminal device obtains hardware specification information of an artificial intelligence chip. At least one instruction is parsed from the bit group sequence according to the hardware specification information (0101001100 is decoded as a first instruction, 0010100110 is decoded as a second instruction), and then internal module timing simulation is performed according to the hardware specification information, so that the simulation end time of each instruction is predicted (t0+30 cycles and t0+100 cycles, respectively, where t0 is the current simulation time). The simulation data is put into a completion instruction queue according to the simulation end time. For each instruction in the completion instruction queue, the function of the instruction is simulated in response to detecting that the current simulation time reaches the simulation end time of the instruction; that is, the function of the first instruction is executed when the current simulation time reaches t0+30 cycles, and the function of the second instruction is executed when it reaches t0+100 cycles. The results of simulation execution may be compared with the results of actual operation of the AI chip to verify the performance and function of the simulator.

The method provided by the above embodiment of the present application simulates the internal timing of the chip by predicting the ending time of the instruction, determines the sequence of executing each instruction according to the result of timing simulation, and increases the consistency of the simulation operation result and the chip operation result.

With further reference to FIG. 4, a flow diagram 400 of yet another embodiment of a method for simulating data processing of an artificial intelligence chip is shown. The process 400 of the method for simulating data processing of an artificial intelligence chip comprises the following steps:

step 401, acquiring a bit group sequence to be processed and hardware specification information of an artificial intelligence chip.

Step 401 is substantially the same as step 201, and therefore is not described again.

Step 402, at least one instruction is parsed from the bit group sequence according to an instruction parsing rule.

Step 402 is substantially the same as step 202 and therefore will not be described in detail.

Step 403, for an instruction of the at least one instruction, simulating the process by which the instruction is executed according to the module information of the modules related to the instruction.

In this embodiment, the hardware specification information may include module information of the modules related to the instructions in the instruction set. That is, from the hardware specification information obtained in step 401 it can be determined which modules must cooperate to complete each instruction. Some instructions can be completed by a single module; for example, an add instruction or a multiply instruction can be completed by an arithmetic module alone. A data transfer instruction, by contrast, requires several modules to cooperate. The flow of instruction-related data between modules is determined according to the interconnection information between modules and the protocol information of the interaction between modules in the module information. The function of the instruction does not need to be actually executed following the flow inside the chip. Only at the predicted end time is the state of each module updated to the state that actual execution on the chip would produce.

In order to accurately predict the time at which each instruction finishes executing, the simulator usually needs to simulate the interaction of the modules in the AI chip. For example, assume that the internal structure of the AI chip is as shown in fig. 5, where module 1 is responsible for executing a data transfer instruction whose function is to copy a piece of data from the memory (DRAM) of the chip to the internal SRAM. Generally, in a hardware implementation, the transfer is split into multiple bus transactions. The data address, length, and other attributes of each transaction affect the time module 1 takes to complete the instruction. Therefore, the simulator also needs to simulate the impact of the splitting process. In addition, as shown in fig. 5, module 2 and module 3 may also access the DRAM, and module 2 may also access the SRAM. Therefore, the simulator needs to simulate the effect of arbitration or queuing caused by multiple modules sharing the same resource.
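The two effects named above, splitting a transfer into bus transactions and queuing on a shared resource, can be sketched as follows. The 64-byte burst boundary, the first-come-first-served arbitration, and the names `split_into_transactions` and `SharedResource` are all assumptions made for illustration; the application does not specify them.

```python
from dataclasses import dataclass
from typing import List, Tuple

BURST_BYTES = 64   # assumed maximum transaction size and alignment boundary


def split_into_transactions(address: int, length: int,
                            burst: int = BURST_BYTES) -> List[Tuple[int, int]]:
    """Split a copy of `length` bytes at `address` into (address, size) transactions
    that never cross a burst boundary."""
    transactions = []
    end = address + length
    while address < end:
        boundary = (address // burst + 1) * burst
        size = min(boundary, end) - address
        transactions.append((address, size))
        address += size
    return transactions


@dataclass
class SharedResource:
    """Hypothetical first-come-first-served model of a resource such as the DRAM port."""
    busy_until: int = 0

    def acquire(self, request_time: int, service_cycles: int) -> Tuple[int, int]:
        """Return (queuing time, finish time) for a request arriving at `request_time`."""
        start = max(request_time, self.busy_until)
        self.busy_until = start + service_cycles
        return start - request_time, self.busy_until


# An unaligned 100-byte copy becomes three transactions of different sizes.
txns = split_into_transactions(address=48, length=100)

# Module 1 and module 2 contend for the DRAM; the second request has to queue.
dram = SharedResource()
wait_1, done_1 = dram.acquire(request_time=0, service_cycles=10)
wait_2, done_2 = dram.acquire(request_time=4, service_cycles=10)
```

The same transfer length yields a different number of transactions depending on its start address, which is why the address and length of each transaction influence the predicted completion time.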

Step 404, determining the completion time of the instruction according to the simulated execution process.

In this embodiment, for an instruction that does not involve interaction between modules, the internal processing time of the module involved in the instruction may be determined as the completion time of the instruction according to the hardware structure information of the module involved in the instruction. For instructions involving interactions between modules, the completion time also needs to take into account the time of arbitration or queuing due to sharing the same resource, as well as the pipeline time resulting from data transfers between different modules. The interaction between the modules in the instruction execution process can be decomposed into a plurality of transactions according to the protocol information of the interaction between the modules of the module related to the instruction, and the time of the pipeline related to the transaction and the time of queuing can be determined. The sequence of operations from requesting a bus to completing bus usage is referred to as a bus transaction, which is a series of activities that occur in one bus cycle. Typical bus transactions include request operations, arbitration operations, address transfers, data transfers, and bus releases. Queuing time is also required for other communication connections. For an instruction involving interaction between modules, the completion time of the instruction is the sum of the internal processing time of the module involved in the instruction, the pipeline time involved in the transaction during execution of the instruction, and the time queued.

Step 405, determining the simulation ending time of the instruction according to the current simulation time and the completion time.

In this embodiment, the current simulation time is the time inside the chip as simulated in software. The simulation end time is the sum of the current simulation time and the completion time. Assuming that the current simulation time is t0 and the completion time determined in step 404 is t1, the simulation end time of the instruction is t0+t1.
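Steps 404 and 405 amount to two sums, which the following sketch makes explicit. The classes `Transaction` and `InstructionTiming`, the function `simulation_end_time`, and all cycle counts are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transaction:
    pipeline_cycles: int   # pipeline time of this transaction
    queue_cycles: int      # time spent queuing or waiting for arbitration


@dataclass
class InstructionTiming:
    internal_cycles: int                                   # internal processing time
    transactions: List[Transaction] = field(default_factory=list)

    def completion_time(self) -> int:
        """Step 404: internal processing time plus pipeline and queuing time of
        every transaction. With no transactions this is the internal time alone."""
        return self.internal_cycles + sum(
            t.pipeline_cycles + t.queue_cycles for t in self.transactions
        )


def simulation_end_time(current_time: int, timing: InstructionTiming) -> int:
    """Step 405: simulation end time = current simulation time + completion time."""
    return current_time + timing.completion_time()


# An add instruction handled by one module, with no interaction between modules.
add = InstructionTiming(internal_cycles=4)

# A data transfer instruction split into three transactions, one of which queues.
copy = InstructionTiming(
    internal_cycles=6,
    transactions=[Transaction(10, 0), Transaction(10, 6), Transaction(8, 0)],
)

t0 = 100
add_end = simulation_end_time(t0, add)
copy_end = simulation_end_time(t0, copy)
```

With these assumed numbers the add instruction completes in 4 cycles and the transfer in 40, so their simulation end times are 104 and 140.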

Step 406, in response to detecting that the current simulation time reaches the simulation end time of the instruction, simulating execution of the instruction.

In this embodiment, the current simulation time in step 406 differs from the current simulation time in step 405, because the current simulation time changes in real time. When the updated current simulation time is detected to have reached the value t0+t1 obtained in step 405, execution of the instruction is simulated. The simulated execution is substantially the same as in step 203 and is therefore not described again.

As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for simulating data processing of an artificial intelligence chip in this embodiment highlights the step of predicting the completion time of instructions that involve interaction between modules. Therefore, the scheme described in this embodiment does not need to copy or compute actual data when simulating the timing inside the chip, which increases the running speed of the simulator.

With further reference to fig. 6, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for simulating data processing of an artificial intelligence chip, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.

As shown in fig. 6, the apparatus 600 for simulating data processing of an artificial intelligence chip of the present embodiment includes: an acquisition unit 601, a parsing unit 602, and a simulation unit 603. The acquisition unit 601 is configured to acquire a bit group sequence to be processed and hardware specification information of an artificial intelligence chip, where the hardware specification information includes an instruction parsing rule, a supported instruction set, and module information of modules related to instructions in the instruction set. The parsing unit 602 is configured to parse at least one instruction from the bit group sequence according to the instruction parsing rule. The simulation unit 603 is configured to, for an instruction of the at least one instruction, predict a simulation end time of the instruction according to module information of a module to which the instruction relates, and simulate execution of the instruction in response to detecting that a current simulation time reaches the simulation end time of the instruction.

In this embodiment, for the specific processing of the acquisition unit 601, the parsing unit 602 and the simulation unit 603 of the apparatus 600 for simulating data processing of an artificial intelligence chip, reference may be made to step 201, step 202 and step 203 in the embodiment corresponding to fig. 2.
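How the three units might be chained can be sketched as follows. Everything concrete here is an assumption made for illustration: the `HardwareSpec` fields, the fixed two-byte instruction format (one opcode byte followed by one operand byte), the opcode values, and the reduction of module information to a single latency per instruction are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class HardwareSpec:
    """Hypothetical stand-in for the hardware specification information."""
    group_size: int                 # bytes per instruction (assumed fixed width)
    opcodes: Dict[int, str]         # supported instruction set: opcode -> name
    module_latency: Dict[str, int]  # module information reduced to one latency


def acquire() -> Tuple[bytes, HardwareSpec]:
    """Acquisition unit: returns the bit group sequence and the specification."""
    spec = HardwareSpec(group_size=2,
                        opcodes={0x01: "load", 0x02: "add"},
                        module_latency={"load": 14, "add": 5})
    return bytes([0x01, 0x10, 0x02, 0x20]), spec


def parse(sequence: bytes, spec: HardwareSpec) -> List[Tuple[str, int]]:
    """Parsing unit: split the sequence into (instruction name, operand) pairs."""
    instructions = []
    for i in range(0, len(sequence), spec.group_size):
        opcode, operand = sequence[i], sequence[i + 1]
        instructions.append((spec.opcodes[opcode], operand))
    return instructions


def simulate(instructions: List[Tuple[str, int]], spec: HardwareSpec,
             start_time: int = 0) -> List[Tuple[int, str]]:
    """Simulation unit: predict end times and order execution by them."""
    predicted = [(start_time + spec.module_latency[name], name)
                 for name, _ in instructions]
    return sorted(predicted)


sequence, spec = acquire()
order = simulate(parse(sequence, spec), spec)
```

With these assumed values, the parsed instructions are "load" and "add", and the simulation unit orders them as "add" (end time 5) followed by "load" (end time 14).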

In some optional implementations of this embodiment, the module information includes at least one of: interconnection information among modules, hardware structure information of modules, and protocol information of interaction among modules.

In some optional implementations of this embodiment, the simulation unit 603 is further configured to: call a preset function to simulate the function of the instruction.

In some optional implementations of this embodiment, the simulation unit 603 is further configured to: write the instruction to a shared queue; and fetch the instruction from the shared queue and call a preset function to simulate the function of the instruction.
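The shared-queue variant can be sketched as a producer that writes instructions and a consumer that fetches them and calls the matching preset function. The `PRESET_FUNCTIONS` table, the instruction tuple layout, the `None` sentinel, and the use of a separate worker thread are assumptions made for illustration; the application only states that a shared queue and a preset function are used.

```python
import queue
import threading
from typing import Callable, Dict, List

# Hypothetical preset functions, keyed by instruction name.
PRESET_FUNCTIONS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def run_worker(shared: "queue.Queue", results: List[int]) -> None:
    """Fetch instructions from the shared queue and call the preset function."""
    while True:
        instruction = shared.get()
        if instruction is None:  # sentinel: no more instructions
            break
        name, a, b = instruction
        results.append(PRESET_FUNCTIONS[name](a, b))


shared_queue: "queue.Queue" = queue.Queue()
results: List[int] = []
worker = threading.Thread(target=run_worker, args=(shared_queue, results))
worker.start()

# The timing side writes instructions whose simulation end time was reached.
for instruction in [("add", 2, 3), ("mul", 4, 5)]:
    shared_queue.put(instruction)
shared_queue.put(None)
worker.join()
```

Separating the two sides through a queue lets the timing logic decide the order of instructions while the functional simulation runs independently; with a single consumer, the results follow the order in which the instructions were written.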

In some optional implementations of this embodiment, the simulation unit 603 is further configured to: simulate the process of executing the instruction according to the module information of the module related to the instruction; determine the completion time of the instruction based on the simulated process of executing the instruction; and determine the simulation end time of the instruction according to the current simulation time and the completion time.

In some optional implementations of this embodiment, the simulation unit 603 is further configured to: determine the internal processing time of the module related to the instruction according to the hardware structure information of that module; decompose the interaction between modules during execution of the instruction into a plurality of transactions according to the protocol information of the interaction between the modules related to the instruction, and determine the pipeline time and the queuing time involved in the transactions; and determine the completion time of the instruction according to the internal processing time of the module related to the instruction, the pipeline time involved in the transactions during execution of the instruction, and the queuing time.

Referring now to FIG. 7, a block diagram of a computer system 700 suitable for use in implementing an electronic device (e.g., the terminal device/server shown in FIG. 1) of an embodiment of the present application is shown. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU)701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is installed into the storage section 708 as necessary.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by a Central Processing Unit (CPU)701, performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including an acquisition unit, a parsing unit, and a simulation unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquisition unit may also be described as a "unit for acquiring the bit group sequence to be processed and the hardware specification information of the artificial intelligence chip".

As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire a bit group sequence to be processed and hardware specification information of an artificial intelligence chip, wherein the hardware specification information comprises an instruction parsing rule, a supported instruction set and module information of modules related to instructions in the instruction set; parse at least one instruction from the bit group sequence according to the instruction parsing rule; and, for an instruction in the at least one instruction, predict the simulation end time of the instruction according to the module information of the module related to the instruction, and simulate execution of the instruction in response to detecting that the current simulation time reaches the simulation end time of the instruction.

The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims

1. A method for simulating data processing of an artificial intelligence chip, wherein the artificial intelligence chip comprises at least one module, the method comprising:

acquiring a bit group sequence to be processed and hardware specification information of the artificial intelligence chip, wherein the hardware specification information comprises an instruction parsing rule, a supported instruction set and module information of modules related to instructions in the instruction set;

parsing at least one instruction from the bit group sequence according to the instruction parsing rule;

and for an instruction in the at least one instruction, predicting a simulation end time of the instruction according to the module information of the module related to the instruction, and simulating execution of the instruction in response to detecting that the current simulation time reaches the simulation end time of the instruction.

2. The method of claim 1, wherein the module information comprises at least one of:

interconnection information among modules, hardware structure information of modules, and protocol information of interaction among modules.

3. The method of claim 1 or 2, wherein simulating execution of the instruction comprises:

calling a preset function to simulate the function of the instruction.

4. The method of claim 1 or 2, wherein simulating execution of the instruction comprises:

writing the instruction into a shared queue;

and taking the instruction out of the shared queue and calling a preset function to simulate the function of the instruction.

5. The method of claim 2, wherein predicting the simulated end time of the instruction according to the module information of the module to which the instruction relates comprises:

simulating the process of executing the instruction according to the module information of the module related to the instruction;

determining the completion time of the instruction according to the simulated process of executing the instruction;

and determining the simulation ending time of the instruction according to the current simulation time and the completion time.

6. The method of claim 5, wherein determining the completion time of the instruction based on a process that simulates the instruction being executed comprises:

determining the internal processing time of the module related to the instruction according to the hardware structure information of the module related to the instruction;

decomposing the interaction among the modules in the instruction execution process into a plurality of transactions according to the protocol information of the interaction among the modules of the module related to the instruction, and determining the time of a pipeline and the queuing time related to the transactions;

and determining the completion time of the instruction according to the internal processing time of the module related to the instruction, the pipeline time related to the transaction in the instruction execution process and the queuing time.

7. An apparatus for simulating data processing of an artificial intelligence chip, wherein the artificial intelligence chip comprises at least one module, the apparatus comprising:

an acquisition unit configured to acquire a bit group sequence to be processed and hardware specification information of the artificial intelligence chip, wherein the hardware specification information comprises an instruction parsing rule, a supported instruction set and module information of modules related to instructions in the instruction set;

a parsing unit configured to parse at least one instruction from the sequence of bit groups according to the instruction parsing rule;

and a simulation unit configured to, for an instruction of the at least one instruction, predict a simulation end time of the instruction according to the module information of the module related to the instruction, and simulate execution of the instruction in response to detecting that the current simulation time reaches the simulation end time of the instruction.

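8. The apparatus of claim 7, wherein the module information comprises at least one of:

interconnection information among modules, hardware structure information of modules, and protocol information of interaction among modules.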

9. The apparatus of claim 7 or 8, wherein the simulation unit is further configured to:

call a preset function to simulate the function of the instruction.

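10. The apparatus of claim 7 or 8, wherein the simulation unit is further configured to:

write the instruction into a shared queue;

and take the instruction out of the shared queue and call a preset function to simulate the function of the instruction.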

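11. The apparatus of claim 8, wherein the simulation unit is further configured to:

simulate the process of executing the instruction according to the module information of the module related to the instruction;

determine the completion time of the instruction according to the simulated process of executing the instruction;

and determine the simulation end time of the instruction according to the current simulation time and the completion time.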

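12. The apparatus of claim 11, wherein the simulation unit is further configured to:

determine the internal processing time of the module related to the instruction according to the hardware structure information of the module related to the instruction;

decompose the interaction among the modules in the instruction execution process into a plurality of transactions according to the protocol information of the interaction among the modules related to the instruction, and determine the pipeline time and the queuing time related to the transactions;

and determine the completion time of the instruction according to the internal processing time of the module related to the instruction, the pipeline time related to the transactions in the instruction execution process and the queuing time.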

13. A simulator, comprising:

one or more processors;

a storage device having one or more programs stored thereon,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.

14. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-6.