WO2024108907A1 - Data processing method and apparatus, ai chip, electronic device, and storage medium - Google Patents


Info

Publication number
WO2024108907A1
Authority
WO
WIPO (PCT)
Prior art keywords
hardware
network model
execution command
hardware execution
memory space
Application number
PCT/CN2023/092113
Other languages
French (fr)
Chinese (zh)
Inventor
刘军
彭凡
杨媛静
王鸥
Original Assignee
成都登临科技有限公司
Priority claimed from CN202211486830.0A external-priority patent/CN115586972B/en
Priority claimed from CN202211486836.8A external-priority patent/CN115576699B/en
Application filed by 成都登临科技有限公司 filed Critical 成都登临科技有限公司
Publication of WO2024108907A1 publication Critical patent/WO2024108907A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • the present application belongs to the field of artificial intelligence technology, and specifically relates to a data processing method, device, AI chip, electronic device and storage medium.
  • AI: Artificial Intelligence.
  • AI chips: may be implemented as various kinds of processors.
  • hardware execution commands: also referred to as hardware commands, or simply commands.
  • the hardware execution commands usually need to include or reflect: the type or content of the operation, the read address of the data source required for the operation, and the write address for storing the operation results.
  • the processor relies on a large amount of memory information when translating the various operations of the network model into hardware execution commands.
  • the process of generating hardware execution commands for network models involves allocating and occupying memory resources, and the data of different network models occupies different memory spaces, which increases the pressure that network models place on limited memory resources.
  • because the data required to execute the network model occupies a large amount of memory, memory shortages arise easily, which makes it difficult to translate the hardware execution commands for the network model as expected, and in turn makes it difficult for the hardware to run the network model as expected.
  • this insufficient memory may also affect the performance of the hardware in other aspects.
  • one aspect of the present application provides a data processing method to address two problems: first, that a current processor incurs a large performance overhead and takes a long time each time it runs a network model for data processing; and second, that in the related art, generating hardware execution commands for a network model requires a large memory resource overhead, which easily leads to insufficient memory.
  • an embodiment of the present application provides a data processing method, which may include: obtaining a computational graph of a network model to be run; translating each operation in the computational graph of the network model into a hardware execution command that can be executed by a target hardware device of the AI chip, wherein the hardware execution command includes device information of the target hardware device; using a network execution graph to store the hardware execution command, wherein the network execution graph is used to record all hardware execution commands generated for the network model, and the target hardware device is used to run the network model by executing the hardware execution commands in the network execution graph.
  • translating each operation contained in the computation graph of the network model into a hardware execution command that can be executed by the target hardware device of the AI chip may include: compiling the source code of each operation in the computation graph of the network model into instructions, and obtaining relevant information required for the target hardware device to perform each operation; generating the hardware execution command according to the corresponding instructions of each operation and the relevant information required to execute each operation.
  • the preset first API function (such as object creation API, compile instruction API) can be used to compile the source code of each operation in the calculation graph of the network model into instructions
  • the preset second API function (such as memory allocation API, data transfer API) can be used to obtain the relevant information required for the target hardware device to perform each operation (such as the address and length of the instruction, the number of memory addresses that the instruction needs to operate, the execution order between instructions, etc.)
  • the preset third API function (such as execution API) is used to generate hardware execution commands according to the corresponding instructions of each operation and the relevant information required to execute each operation.
  • each operation in the calculation graph of the network model can be quickly and accurately translated into hardware execution commands that can be executed by the target hardware device.
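The three-stage translation flow described above (compile each operation's source code into instructions, gather the execution-related information, then assemble the hardware execution command carrying the target device's information) can be sketched as follows. All function names, field names, and the stubbed compilation step are illustrative assumptions, not the API of any actual driver.

```python
# Hypothetical sketch of the three translation stages; not a real driver API.

def compile_op(op_source: str) -> bytes:
    # Stage 1: compile the operation's source code into instructions
    # (stubbed here by simply encoding the source text).
    return op_source.encode("utf-8")

def gather_info(instructions: bytes) -> dict:
    # Stage 2: collect the information the target device needs to execute the
    # instructions: instruction address and length, operand memory slots, etc.
    return {"addr": 0x1000, "length": len(instructions), "operand_slots": 2}

def build_command(instructions: bytes, info: dict, device_id: int) -> dict:
    # Stage 3: assemble the hardware execution command; it carries the
    # device information of the target hardware device.
    return {"device": device_id, "instructions": instructions, **info}

def translate(graph_ops, device_id=0):
    commands = []
    for op in graph_ops:
        ins = compile_op(op)
        commands.append(build_command(ins, gather_info(ins), device_id))
    return commands
```

For example, `translate(["conv2d", "relu"])` would yield one command per operation, each tagged with the target device's identifier.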
  • using a network execution graph to store the hardware execution command may include: storing the hardware execution command corresponding to each operation in the network execution graph in sequence according to the execution order of each operation contained in the network model, and recording key information of each hardware execution command, wherein the key information is used to obtain the hardware execution command.
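The network execution graph described above can be sketched as a small container that appends commands in the model's execution order and records key information (here, a sequence number and the command's device) for later retrieval. The class and field names are assumptions for illustration.

```python
# Hypothetical sketch of a network execution graph: commands are stored in
# execution order, and key information is recorded to retrieve each one.

class NetworkExecutionGraph:
    def __init__(self):
        self._commands = []   # all hardware execution commands, in order
        self._key_info = []   # per-command key info used for retrieval

    def store(self, command: dict) -> None:
        self._key_info.append({"seq": len(self._commands),
                               "device": command.get("device")})
        self._commands.append(command)

    def fetch(self, seq: int) -> dict:
        # The key information (sequence number) locates the stored command.
        return self._commands[seq]

    def all_commands(self):
        return list(self._commands)
```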
  • the data processing method may also include: when it is necessary to run the network model, obtaining the hardware execution command pre-stored in the network execution graph; sending the hardware execution command to the target hardware device for execution, so that the target hardware device executes the hardware execution command to realize running the network model on the target hardware device.
  • sending the hardware execution command to the target hardware device for execution may include: modifying the read address used to obtain input data in the hardware execution command, and/or modifying the write address used to store output data in the hardware execution command; and sending the modified hardware execution command to the target hardware device for execution, so that the target hardware device executes the modified command, thereby running the network model on the target hardware device to process the input data.
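Reusing a pre-stored command for new data, as described above, only requires rewriting its input (read) and output (write) addresses before dispatch. A minimal sketch, with hypothetical field names:

```python
# Sketch of rebinding a stored command's I/O addresses before dispatch;
# field names ("read_addr", "write_addr") are illustrative assumptions.

def rebind_io(command: dict, new_read=None, new_write=None) -> dict:
    patched = dict(command)           # leave the stored command untouched
    if new_read is not None:
        patched["read_addr"] = new_read    # where the new input data lives
    if new_write is not None:
        patched["write_addr"] = new_write  # where the new output should go
    return patched
```

Copying before patching means the version kept in the network execution graph stays valid for the next run.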
  • the data processing method may further include: copying the hardware execution command according to the total number of hardware devices in the AI chip; modifying the device information contained in the copied hardware execution command according to the device information of other hardware devices in the AI chip except the target hardware device, to obtain a hardware execution command with modified device information, wherein the hardware execution command with modified device information can be provided to the other hardware devices for execution.
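The copy-and-repatch step above avoids translating the network model once per device: the commands are deep-copied per hardware device and only the device information is rewritten. A sketch under those assumptions:

```python
import copy

# Sketch of scaling one translated command set to several hardware devices
# by cloning the commands and patching only the device information.

def replicate_for_devices(commands, device_ids):
    per_device = {}
    for dev in device_ids:
        cloned = copy.deepcopy(commands)
        for cmd in cloned:
            cmd["device"] = dev   # rewrite the device information only
        per_device[dev] = cloned
    return per_device
```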
  • the data processing method may further include: determining a first number of hardware devices currently required to run the network model based on an amount of data to be processed.
  • translating each operation in the computation graph of the network model into a hardware execution command that can be executed by the target hardware device of the AI chip may include: allocating a corresponding virtual memory space to the network model; and based on the virtual memory space, translating each operation contained in the network model into a corresponding first hardware execution command, wherein the addresses in the first hardware execution command are all virtual addresses, and the virtual memory space has the same properties as the real memory space.
  • using a network execution graph to store the hardware execution command may include: after translating each operation contained in the network model into a corresponding first hardware execution command based on the virtual memory space, storing the first hardware execution command, wherein the first hardware execution command is used to be provided to a hardware device that needs to run the network model for execution after address replacement.
  • allocating a corresponding virtual memory space for the network model may include: allocating a virtual memory space corresponding to the data size required to execute the network model.
  • the data processing method may also include: determining whether the network model will be executed within a preset time period after the current moment; when it is determined that the network model will not be executed within that period, executing the step of translating, based on the virtual memory space, each operation contained in the network model into a corresponding first hardware execution command.
  • the data processing method may also include: when determining that the network model is to be executed within a preset time period starting from the current moment, based on the real memory space, translating each operation contained in the network model into a corresponding second hardware execution command, wherein the addresses contained in the second hardware execution command are all real addresses, and the real memory space is used to store the data required to execute the network model.
  • the data processing method may also include: when the network model needs to be executed, loading the data required to execute the network model into the real memory space; replacing the virtual addresses in the first hardware execution command with the real addresses corresponding to the real memory space; and sending the replaced first hardware execution command, as the second hardware execution command, to the corresponding hardware device for execution.
  • replacing the virtual address in the first hardware execution command with the real address corresponding to the real memory space may include: scanning the first hardware execution commands and determining, as target commands, the part or all of them that currently contain virtual addresses; and replacing the virtual addresses in the target commands with the real addresses corresponding to the real memory space.
  • the data processing method may also include: replacing the real address in the second hardware execution command with the virtual address corresponding to the virtual memory space, and caching the command whose address has been replaced.
  • translating each operation contained in the network model into a corresponding first hardware execution command can include: compiling the source code of each operation contained in the network model into instructions corresponding to each operation, and based on the virtual memory space, obtaining relevant information required to execute each operation contained in the network model, the relevant information including address information; and generating the first hardware execution command according to the corresponding instructions of each operation and the relevant information required to execute each operation.
  • different network models correspond to different virtual memory spaces.
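The virtual-address scheme above can be sketched as follows: commands are first generated against a per-model virtual memory space (so no real memory is consumed at translation time); at run time each virtual address is swapped for a real one, and the swap can be reversed for caching. The base addresses and fixed per-operation offsets are illustrative assumptions.

```python
# Sketch of translating against virtual addresses, then binding/unbinding
# real addresses at run time. Layout and offsets are hypothetical.

VIRT_BASE = 0x8000_0000  # per-model virtual base; holds no real memory

def translate_virtual(ops):
    # First hardware execution commands: every address is virtual.
    return [{"op": op, "addr": VIRT_BASE + i * 0x100}
            for i, op in enumerate(ops)]

def bind_real(commands, real_base):
    # Replace each virtual address with a real one when the model runs,
    # yielding the second hardware execution commands.
    return [{**c, "addr": real_base + (c["addr"] - VIRT_BASE)}
            for c in commands]

def unbind_real(commands, real_base):
    # Reverse replacement, so commands can be cached address-independently.
    return [{**c, "addr": VIRT_BASE + (c["addr"] - real_base)}
            for c in commands]
```

Because translation touches only virtual addresses, many models can be pre-translated without competing for the limited real memory.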
  • an embodiment of the present application also provides a data processing method, which may include: when it is necessary to run a network model, obtaining a pre-stored hardware execution command that can be executed by a target hardware device corresponding to the network model; sending the hardware execution command to the target hardware device so that the target hardware device executes the hardware execution command to achieve the purpose of running the network model on the target hardware device to process the input data.
  • obtaining a pre-stored hardware execution command that can be executed by a target hardware device corresponding to the network model may include: when it is necessary to execute the network model, loading the network original data corresponding to the network model into a real memory space, and obtaining a pre-stored first hardware execution command, wherein the first hardware execution command is obtained by translating each operation contained in the network model based on a virtual memory space, and the virtual memory space has the same properties as the real memory space; and replacing the virtual address in the first hardware execution command with the real address corresponding to the real memory space.
  • after replacing the virtual address in the first hardware execution command with the real address corresponding to the real memory space, the data processing method also includes: sending the replaced first hardware execution command as the second hardware execution command to the corresponding hardware device.
  • the data processing method may also include: replacing the real address in the second hardware execution command with the virtual address corresponding to the virtual memory space, and caching the second hardware execution command whose address has been replaced.
  • an embodiment of the present application also provides a data processing device, which may include: an acquisition module, a command generation module and a storage module, wherein the acquisition module is configured to: acquire the calculation graph of the network model to be run; the command generation module is configured to: translate each operation in the calculation graph of the network model into a hardware execution command that can be executed by the corresponding target hardware device, and the hardware execution command contains the device information of the target hardware device; the storage module is configured to: store the hardware execution command using a network execution graph, wherein the network execution graph is used to record all hardware execution commands generated for the network model, and the target hardware device can run the network model by executing the hardware execution commands in the network execution graph.
  • the command generation module may include: an allocation module and a translation module, wherein the allocation module is configured to: allocate a corresponding virtual memory space for the network model; the translation module is configured to: based on the virtual memory space, translate each operation contained in the network model into a corresponding first hardware execution command, the addresses in the first hardware execution command are all virtual addresses, and the virtual memory space has the same properties as the real memory space.
  • the storage module is configured to: store the first hardware execution command, and the first hardware execution command is used to be provided to the hardware device that needs to run the network model for execution after the address is replaced.
  • an embodiment of the present application further provides a data processing device, which may include: an acquisition module and a sending module, wherein the acquisition module is configured to: when it is necessary to run a network model, acquire a pre-stored hardware execution command that can be executed by a target hardware device corresponding to the network model; the sending module is configured to: send the hardware execution command to the target hardware device so that the target hardware device executes the hardware execution command to achieve the purpose of running the network model on the target hardware device to process input data.
  • the acquisition module may include: a first hardware execution command acquisition module and a translation module, wherein the first hardware execution command acquisition module is configured to: when the network model needs to be executed, load the network original data corresponding to the network model into the real memory space, and obtain the pre-stored first hardware execution command; wherein the first hardware execution command is obtained by translating each operation contained in the network model based on the virtual memory space, and the virtual memory space has the same properties as the real memory space; the translation module is configured to: replace the virtual address in the first hardware execution command with the real address corresponding to the real memory space.
  • the sending module is further configured to: send the replaced first hardware execution command as the second hardware execution command to the corresponding hardware device.
  • an embodiment of the present application further provides an AI chip, which may include: a kernel and a storage device, wherein the kernel is configured to: obtain a computational graph of a network model to be run, and translate each operation in the computational graph of the network model into a hardware execution command executable by a target hardware device, wherein the hardware execution command contains device information of the target hardware device; the storage device is configured to: store the hardware execution command using a network execution graph, wherein the network execution graph is used to record all hardware execution commands generated for the network model, and the target hardware device can be used to run the network model by executing the hardware execution commands in the network execution graph.
  • the kernel is configured to: allocate a corresponding virtual memory space for the network model, and based on the virtual memory space, translate each operation contained in the network model into a corresponding first hardware execution command, the addresses in the first hardware execution command are all virtual addresses, and the virtual memory space has the same properties as the real memory space;
  • the storage device is configured to: store the first hardware execution command, wherein the first hardware execution command is to be provided, after address replacement, to the hardware device that needs to run the network model.
  • an embodiment of the present application also provides an AI chip, which may include: a hardware device, a storage device and a kernel, wherein the storage device is configured to: store hardware execution commands corresponding to each operation in the computational graph of the network model; the kernel is configured to: when it is necessary to run the network model, obtain the previously stored hardware execution commands from the storage device, and send the hardware execution commands to the hardware device; the hardware device is configured to: execute the hardware execution commands to achieve the purpose of running the network model to process the input data.
  • the storage device is also configured to: store a first hardware execution command; wherein the first hardware execution command is obtained by translating each operation contained in the network model based on a virtual memory space, and the virtual memory space has the same properties as the real memory space;
  • the kernel is also configured to: when the network model needs to be executed, load the network original data corresponding to the network model into the real memory space, obtain the first hardware execution command stored in the storage device, replace the virtual address in the first hardware execution command with the real address corresponding to the real memory space, and send the replaced first hardware execution command as the second hardware execution command to the hardware device;
  • the hardware device is also configured to: execute the second hardware execution command to achieve the purpose of running the network model to process the input data.
  • an embodiment of the present application further provides an electronic device, which may include: a memory and a processor, wherein the processor is connected to the memory, the memory is configured to store a program, and the processor is configured to call the program stored in the memory to execute the data processing method provided by the above-mentioned first aspect embodiment and/or any possible implementation thereof, or the data processing method provided by the above-mentioned second aspect embodiment and/or any possible implementation thereof.
  • an embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored.
  • when executed, the computer program performs the data processing method provided by the above-mentioned first aspect embodiment and/or any possible implementation thereof, or the data processing method provided by the above-mentioned second aspect embodiment and/or any possible implementation thereof.
  • FIG. 1 shows a schematic flow chart of a data processing method provided in an embodiment of the present application.
  • FIG. 2 is a flow chart showing some steps in a data processing method provided in an embodiment of the present application.
  • FIG. 3 shows another flowchart of some steps in a data processing method provided in an embodiment of the present application.
  • FIG. 4 shows a flow chart of another data processing method provided in an embodiment of the present application.
  • FIG. 5 is a flow chart showing some steps in another data processing method provided in an embodiment of the present application.
  • FIG. 6 shows a module diagram of a data processing device provided in an embodiment of the present application.
  • FIG. 7 shows a more detailed module diagram of a data processing device provided in an embodiment of the present application.
  • FIG. 8 shows a module diagram of another data processing device provided in an embodiment of the present application.
  • FIG. 9 shows a more detailed module diagram of yet another data processing device provided in an embodiment of the present application.
  • FIG. 10 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application.
  • FIG. 11 shows a schematic structural diagram of another electronic device provided in an embodiment of the present application.
  • a and/or B in this application is merely a term used to describe the association relationship between associated objects, indicating that three relationships may exist.
  • a and/or B may represent three situations: A exists alone, A and B exist at the same time, and B exists alone.
  • first and second in this application are only used to distinguish one entity or operation or object from another entity or operation or object, and do not require or imply any actual relationship or order between these entities or operations or objects.
  • the embodiments of the present application involve application scenarios in which network models (various neural network models) are used for data processing.
  • network models: various neural network models.
  • the relevant terms and concepts that may be involved in the embodiments of the present application are first introduced below.
  • the neural network model is composed of neural units and can be understood as a model with an input layer, hidden layers, and an output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the layers in between are all hidden layers (there can be many hidden layers).
  • the neural network model uses one or more layers (such as hidden layers, output layers, etc.) to generate outputs for received inputs.
  • the output of each hidden layer is used as the input of the next layer (such as the next hidden layer or output layer).
  • Each layer of the neural network model generates outputs for received inputs according to the current relevant parameters of the layer (such as weights).
  • Each operation contained in the neural network model (such as convolution, pooling, activation, normalization, classification, etc.) needs to be translated into hardware execution commands before the hardware device can execute them.
  • the hardware device implements the functions of the corresponding operations in the network model by executing these hardware execution commands, thereby supporting the function of running the neural network on the hardware device to process the input data.
  • a computational graph is often used to reflect the computational logic of the network model.
  • Each node in the computational graph can correspond to an operation in the network model. These operations are also called operators in the network model. Each operator has its own unique features and is used to complete specific functions.
  • the computational graph of a network model usually contains many different operations, such as convolution operations, pooling operations, activation functions, etc.
  • the inventors proposed the following embodiments according to the characteristics of the network model to improve the above problems.
  • the inventor of the present application observes that when a network model is used for data processing, the structure of the network model itself is fixed; however, the input data processed each time the network model is loaded into the hardware for execution may differ, and different input data may produce different output results.
  • the present application generates (translates) in advance, for each operation contained in the network model, the hardware execution commands that can be executed by the corresponding target hardware device, but does not immediately send them to the target hardware device (that is, the hardware device capable of executing the hardware execution commands). Instead, the hardware execution commands corresponding to each operation are first stored (for example, in the constructed network execution graph). Each time the network model is needed to process input data (for recognition, classification, feature extraction, size transformation, etc.), the pre-stored hardware execution commands can then be distributed to the corresponding hardware for execution, which helps to quickly load the computing logic and computing tasks of the network model onto the hardware that needs to run it.
  • the embodiment of the present application provides a data processing method that can be applied to network models used in various artificial intelligence application scenarios.
  • Artificial intelligence application scenarios include but are not limited to: text processing, speech recognition and processing, multi-language translation, image recognition, biometric recognition, and intelligent control.
  • the data processing method can be implemented in a driver program and applied to an AI chip, which can be a homogeneous processor or a heterogeneous processor.
  • computational graphs are a commonly used method for representing computational processes. They are often used to represent the computational logic of neural network model design and are widely used in various data processing platforms.
  • Each node in the computational graph represents the corresponding operation (i.e., operator or operation) that the network model needs to perform, and the directed edges between nodes represent the dependencies between the operations corresponding to the nodes.
  • after each operation (or operator) in the computational graph is translated into a hardware execution command, the command is sent to the corresponding hardware device for execution, thereby completing the execution of the network model.
  • the operators corresponding to the nodes in the computational graph can be defined at the granularity of algebraic operators (such as vector addition, subtraction, multiplication, division, and matrix multiplication). When the abstraction granularity of the operators is this low, the computational graph of a network model may contain many nodes (thousands, for example).
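The graph structure described above can be sketched directly: nodes are operators, directed edges are dependencies, and a topological order over the dependencies gives a valid order in which to translate and execute the operators. This sketch uses Python's standard-library `graphlib`.

```python
from graphlib import TopologicalSorter

# Sketch of a computational graph as a dependency map: each node maps to the
# set of nodes it depends on. A topological sort yields an execution order
# in which every operator runs after the operators it depends on.

def execution_order(edges):
    return list(TopologicalSorter(edges).static_order())
```

For a chain conv -> relu -> pool, `execution_order({"conv": set(), "relu": {"conv"}, "pool": {"relu"}})` places conv before relu before pool.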
  • the computation graph of the network model to be run, obtained in step S1, may be the original computation graph or an optimized one, for example a computation graph obtained after operator fusion. After the network structure of the network model is converted into the original computation graph, the graph may be optimized one or more times to obtain an optimized computation graph.
  • the AI chip can directly or indirectly obtain the calculation graph of the network model, as long as the structure of the network model can be determined and the various operations that need to be implemented by the network model can be known.
  • the AI chip is provided with a corresponding driver, which can be deployed in the kernel of the AI chip.
  • the translation process of S2 can be executed by the driver corresponding to the AI chip.
  • the translated hardware execution command may contain the device information of the target hardware device (such as the device identification), which is used to indicate which hardware device can execute the hardware execution command. Different target hardware devices have different corresponding device information. After the operation in the network model is translated, the obtained hardware execution command can be provided to the corresponding hardware device for execution when the network model corresponding to the hardware execution command needs to be run.
  • the target hardware device refers to the hardware device that runs the hardware execution command and is the hardware object that is expected to have the ability to run the network model.
  • An AI chip may involve multiple hardware devices.
  • an AI chip can be a dedicated computing acceleration chip (or accelerator) designed to undertake heavy computing tasks, such as a graphics processing unit (GPU), a tensor processing unit (TPU), etc.
  • an AI chip can also be other homogeneous or heterogeneous processors.
  • an AI chip may include multiple hardware devices, any of which may be used as a target hardware device according to actual needs.
  • a hardware device may include multiple hardware execution units.
  • a hardware device in an AI chip may include but is not limited to: a first unit (CU, Compute engine Unit) for general computing, a second unit (TU, Tensor Unit) for AI accelerated computing, and a third unit (DMA, Direct Memory Access) for data transfer, etc.
  • a hardware device in an AI chip can also be regarded as a computing cluster containing multiple hardware execution units. The number of hardware execution units contained in different types of hardware devices may be different, and the types may also be different.
  • the specific hardware architecture should not be understood as a limitation on the embodiments of the method of the present application.
  • S2 may include: compiling the source code of each operation in the computational graph of the network model into instructions (hardware machine instructions), and obtaining the relevant information required for the target hardware device to perform each operation; generating a hardware execution command according to the corresponding instructions of each operation and the relevant information required to perform each operation.
  • the relevant information required for the target hardware device to perform an operation can reflect the following information related to the operation: the address and length of the hardware instruction, how many memory addresses the instruction operates on, where those memory addresses are located, how large the memory regions are, the processing order between instructions, and so on.
  • the source code of each operation in the computational graph of the network model can be compiled into instructions using a preset first API (Application Programming Interface) function, and the relevant information required for the target hardware device to perform each operation can be obtained using a preset second API function; a preset third API function can be used to generate hardware execution commands based on the corresponding instructions of each operation and the relevant information required to perform each operation.
  • hardware execution commands corresponding to each operation can be generated in advance for each operation of the network model. There may be hundreds of hardware execution commands generated for one operation.
  • the calculation graph of the network model contains many different operations (also called operators, each operator has its own unique features and is used to complete specific functions), such as convolution operations, pooling operations, activation functions, etc.
  • the driver provides a set of relatively general API functions, such as object creation API, compilation instruction API, memory allocation API, data transfer API, execution API, etc.
  • for each operation of the network model, the driver provides a programmable language with C++-like syntax, and the source code of the operation can be implemented using this syntax.
  • the driver also uses a preset first API function (such as an object creation API, a compilation instruction API) to compile the source code of an operation in the computational graph of the network model into a hardware instruction corresponding to the operation through a compiler.
  • a memory allocation API provided by the driver can be used to allocate a space on the memory and provide it to the convolution operation.
  • some operations may involve moving data, so the data handling API provided by the driver is used to transfer data during the operation.
  • the driver can obtain the relevant information required for the target hardware device to perform each operation by using the preset second API function (such as the aforementioned memory allocation API, data handling API), and can use the preset third API function (such as the execution API) to generate hardware execution commands according to the corresponding instructions of each operation and the relevant information required to execute each operation.
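As a rough illustration of the three-stage flow described above (compile each operation's source into instructions, collect the information needed to execute it, then assemble a hardware execution command), the following sketch may help. All function names, the toy allocator, and the command layout are hypothetical, not the driver's actual API:

```python
# Hypothetical sketch of the three-stage command generation flow; every name
# here is illustrative, not the driver's real API.

def compile_op(source: str) -> bytes:
    """Stand-in for the first (compile-instruction) API: turn an operation's
    source code into hardware machine instructions (here just encoded bytes)."""
    return source.encode("utf-8")

def collect_exec_info(op_name: str, alloc) -> dict:
    """Stand-in for the second API (memory allocation / data handling):
    gather the memory addresses, sizes, etc. needed to run the operation."""
    addr, size = alloc(op_name)
    return {"op": op_name, "mem_addr": addr, "mem_size": size}

def build_command(instr: bytes, info: dict, device_id: int) -> dict:
    """Stand-in for the third (execution) API: bundle instructions and
    execution info into a command tagged with the target device."""
    return {"device": device_id, "instructions": instr, **info}

def make_allocator(base=0x1000, size=1024):
    """Toy allocator handing out fixed-size regions at consecutive addresses."""
    state = {"next": base}
    def alloc(_name):
        addr = state["next"]
        state["next"] += size
        return addr, size
    return alloc
```

A computational graph would then be walked once, each operation translated through these three stages, and the resulting commands cached rather than dispatched immediately.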
  • the process of generating the hardware execution command corresponding to the network model for the network model only needs to be done once, and the generated hardware execution command is first cached, for example, stored in the constructed network execution graph.
  • the command is distributed based on the hardware execution command stored in the constructed network execution graph to enable the hardware to execute these commands.
  • the translated hardware execution command is stored.
  • the constructed network execution graph can be used for storage.
  • the network execution graph can also be used to reflect the computing logic of the network model and can be regarded as a new computing graph. However, the network execution graph does not need to record or store the source code of each operation like the original computing graph of the network model.
  • the network execution graph is used to record all hardware execution commands generated for the network model, and can also be used to record the key information of each hardware execution command.
  • the key information may include the starting address, offset, and command execution order, etc. The length and storage location of the command can be known based on the starting address and offset.
  • the key information is used to obtain the hardware execution command, and the target hardware device can obtain the hardware execution command based on the key information.
  • the network execution graph stores all commands that the hardware needs to execute for the network model. Storing the hardware execution commands in the constructed network execution graph converts the various operations contained in the network model (including their characteristic parameters) into commands that the hardware can recognize and execute. Once the network execution graph, or the commands it contains, are provided to the target hardware device, the target hardware device can run the network model based on the hardware execution commands cached in the network execution graph.
  • the storage device in the AI chip may use the network execution graph to store the hardware execution command.
  • the network execution graph may be located on the target hardware device or not. For example, it may be located on a storage device connected to the target hardware device.
  • each operation in the calculation graph of the network model is first translated into a hardware execution command that can be executed by the target hardware device, but it is not sent to the target hardware device for execution first.
  • the translated hardware execution commands are first stored using the network execution graph, so that each time the network model needs to be run subsequently, the pre-stored hardware execution commands can be distributed to the corresponding hardware for execution, and there is no need to re-translate each operation in the calculation graph of the network model into hardware execution commands, thereby improving the problem that the processor requires a large performance overhead and a long time each time it runs the network model.
  • the process of using a network execution graph to store translated hardware execution commands can store the hardware execution commands corresponding to each operation in sequence, according to the execution order of the operations contained in the network model, while recording the key information of each command. Compared with storing commands in arbitrary order, storing them in execution order improves the efficiency of subsequent execution: since the commands must ultimately be executed in the order of the operations they implement in order to guarantee the normal operation of the network model, commands stored in that order can later be dispatched directly in storage order.
  • because the hardware execution commands corresponding to each operation are stored in the network execution graph in turn, and the key information of each command is recorded, the computation logic of the network (the execution order of the operations in the network model) can later be quickly recovered from the network execution graph. When the network model is executed, the corresponding hardware execution commands can then be sent to the target hardware device in turn, according to the key information recorded in the graph and the execution order of the operations, thereby avoiding execution-logic errors and improving efficiency.
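A minimal sketch of such an execution-order store, assuming an invented in-memory layout (a packed byte buffer plus per-command key information of start address, offset, and order), might look like:

```python
# Illustrative sketch (not the patent's concrete format) of a network
# execution graph: commands are appended in the execution order of the
# model's operations, and key information (start, offset, order) is
# recorded so each command can be located and dispatched in order.

class NetworkExecutionGraph:
    def __init__(self):
        self.buffer = bytearray()   # packed hardware execution commands
        self.key_info = []          # start / offset / order per command

    def append(self, command: bytes):
        self.key_info.append({"start": len(self.buffer),
                              "offset": len(command),
                              "order": len(self.key_info)})
        self.buffer += command

    def commands(self):
        """Yield commands in recorded execution order via the key info."""
        for k in sorted(self.key_info, key=lambda k: k["order"]):
            yield bytes(self.buffer[k["start"]:k["start"] + k["offset"]])
```

Dispatch then simply iterates `commands()` in storage order, which by construction is the operations' execution order.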
  • because a network model performs the same computing operations on each run, differing only in input data and output results, all the operations contained in the network model can be translated into a command sequence and stored in advance through the driver program; each time the network model is executed, the hardware execution commands only need to be fine-tuned.
  • the read address used to obtain input data in the hardware execution command is replaced with the address where the new data is located, and/or the write address used to store output data is replaced with a new write address, so that the same AI model can process the new input data and store the corresponding output result in a new location. This greatly reduces the burden on the processor and improves the efficiency of data processing.
  • this method can be applied to scenarios with multiple network models, that is, when there are multiple network models, correspondingly, for each network model, each operation contained in the network model can be translated into a hardware execution command that can be executed by the corresponding target hardware device, the hardware execution command contains the device information of the target hardware device, and the translated hardware execution command is stored.
  • one network model corresponds to a unique network execution graph.
  • the pre-stored hardware execution command corresponding to the required network model can be selected, so as to distribute the command so that the hardware executes these hardware execution commands corresponding to the corresponding network model.
  • the data processing method may further include S4: when it is necessary to run the network model to process the input data, obtain a pre-stored hardware execution command, and send the hardware execution command to the target hardware device for execution, so as to implement the operation of the network model on the target hardware device. For example, according to the execution order of each operation included in the network model, the corresponding hardware execution command is sent to the target hardware device for execution in sequence, thereby supporting the function of running the network model on the target hardware device to process the input data.
  • the commands can be distributed directly based on the pre-stored hardware execution commands to enable the hardware to execute these commands.
  • sending the hardware execution command to the target hardware device for execution may include: modifying the read address in the hardware execution command for obtaining input data, and/or modifying the write address in the hardware execution command for storing output data; sending the modified hardware execution command to the target hardware device for execution, so that the target hardware device executes the modified hardware execution command, and realizes the function or purpose of running the network model on the hardware to process the input data.
  • the modified corresponding hardware execution commands can be sent to the target hardware device for execution in sequence according to the execution order of each operation contained in the network model, so that the target hardware device executes the modified hardware execution command.
  • the modified corresponding hardware execution command is then sent to the target hardware device for execution.
  • input data can be obtained from different places, and output data can be stored in different places, which has better flexibility.
  • the hardware execution commands generated for a target hardware device can be quickly expanded to other hardware devices in the AI chip, so that when multiple hardware devices are required to run the network model in parallel, there is no need to re-translate each operation in the calculation graph of the network model into corresponding hardware execution commands for different hardware devices respectively, which further reduces the performance overhead of the processor and improves the efficiency of data processing.
  • the read address used to obtain the input data in the hardware execution command can be modified (that is, the read address is changed from the original position A to position C), and the write address used to store the output data in the hardware execution command can be modified (such as changing the write address from the original position B to position D).
  • the modified hardware execution command is sent to the target hardware device for execution in sequence according to the execution order of each operation contained in the network model, so that the target hardware device executes the modified hardware execution command, thereby realizing that when the target hardware device runs the network model, the input data stored at position C can be processed and the processed output data can be stored in position D.
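The address fine-tuning step above can be sketched as follows; the dictionary layout of a command is invented for illustration:

```python
# Minimal sketch of the address fine-tuning step: before dispatch, the
# cached command's read address (input) and/or write address (output) is
# replaced so the same command sequence can process new data.

def patch_addresses(command: dict, new_read=None, new_write=None) -> dict:
    patched = dict(command)                 # leave the cached copy intact
    if new_read is not None:
        patched["read_addr"] = new_read     # e.g. position A -> position C
    if new_write is not None:
        patched["write_addr"] = new_write   # e.g. position B -> position D
    return patched
```

The cached command is copied rather than mutated, so the stored sequence stays reusable for the next run with yet another pair of addresses.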
  • the data processing method may also include: S5: According to the total number of hardware devices in the AI chip, copy the hardware execution command obtained in S2; S6: According to the device information of other hardware devices in the AI chip except the target hardware device, modify the device information contained in the copied hardware execution command to obtain the hardware execution command with the modified device information, wherein the hardware execution command with the modified device information can be provided to other hardware devices for execution.
  • the hardware execution command is copied a specified number of times, where the specified number is the total number of hardware devices in the AI chip.
  • each copied hardware execution command is modified according to the device information of the hardware devices other than the target hardware device in the AI chip (that is, the device information in the hardware execution command is modified), so that each modified copied hardware execution command can be executed by other hardware devices.
  • the hardware execution command generated for the network model can be sent to each hardware device with matching device information in the AI chip according to the principle of device information matching, so that each hardware device in the AI chip can obtain the hardware execution command that it can execute, so that each hardware device in the AI chip can run the network model. It is understandable that this can also be done by sending a network execution graph.
  • the distribution of commands means copying the network execution graph into multiple copies, modifying the device information in the network execution graph, and then sending the multiple network execution graphs with different device information to each hardware device with matching device information in the AI chip.
  • the hardware execution command generated for the target hardware device can be copied twice, and for one of the copied hardware execution commands, the copied hardware execution command is modified according to the device information of hardware device 1, so that the modified copied hardware execution command can be executed by hardware device 1.
  • the other copied hardware execution command can be modified according to the device information of hardware device 2, so that the modified other copied hardware execution command can be executed by hardware device 2.
  • the hardware execution commands that have been translated for the network model are expanded in the same chip so that each hardware device in the chip can run the network model.
  • the hardware execution command generated for one hardware device can be quickly extended to other hardware devices.
  • the network execution graph can first be copied for the other hardware devices, and the device-related information in each copy then modified (i.e., the network execution graph is fine-tuned). In other words, based on hardware execution commands that have already been generated and cached, multiple copies are made and cached for multiple hardware devices, and the information in each copied command is modified to match its hardware device.
  • the information modification here is fine-tuning, and the purpose is to modify the copied hardware execution command to a command that is adapted to each hardware device.
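A hedged sketch of S5/S6 (copying the cached commands once per hardware device and rewriting only the device information) could look like this; the command representation is hypothetical:

```python
# Hypothetical sketch of S5/S6: the commands generated for the target
# device are duplicated once per other device in the chip, and only the
# device information in each copy is rewritten (fine-tuned) so that every
# hardware device receives commands it can execute.

def replicate_for_devices(commands, all_device_ids, target_device_id):
    per_device = {target_device_id: list(commands)}
    for dev in all_device_ids:
        if dev != target_device_id:
            # copy each command, changing only its device information
            per_device[dev] = [dict(c, device=dev) for c in commands]
    return per_device
```

Because only the device field changes, no re-translation of the network model's operations is needed to run it on additional devices in parallel.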
  • the number of hardware devices that need to run the network model at the current moment can be manually configured or determined based on the amount of data to be processed.
  • the data processing method may also include: determining the first number of hardware devices that need to run the network model based on the amount of data to be processed. The first number is less than or equal to the total number mentioned above. Each time the network model is run, it is not necessary to use up all the hardware devices in the chip. The specific number of hardware devices that need to be run can be determined based on the actual application scenario.
  • because the processor occupies memory space while generating hardware execution commands (referred to below simply as hardware commands or commands) for each network model, and the data required by different network models occupies different amounts of memory, memory can easily run short, leaving the hardware unable to execute the network model as expected and possibly degrading hardware performance.
  • the embodiment of the present application proposes a command generation method to improve the problem that each time the processor generates a hardware execution command for a network model, it needs to occupy a large amount of memory resource overhead, which easily leads to insufficient memory.
  • translating each operation in the computational graph of the network model into a hardware execution command that can be executed by the target hardware device of the AI chip may include the following S21 and S22.
  • a virtual memory space matching the size of the data required to execute the network model can be allocated; this data includes the input data to be processed by the network model, and may also include the characteristic data of the network model itself (such as weights and parameters). Since the virtual memory space is allocated according to the size of the data required to execute the network model, the requirements of command generation can be met without occupying the real memory resources of the hardware.
  • the creation or allocation of fake memory space can be regarded as occupying essentially no physical storage space. Even if a small amount of real physical storage must be used to allocate, record or mark the fake memory space, the total amount occupied is on the order of 1 KB to a few KB (these figures are only examples).
  • in other words, allocating fake memory space does not occupy the real memory of the hardware device that is to run the network model, although the process of allocating or recording the fake memory space may occupy a small amount of storage on hardware that does not need to run the model. Since the total amount occupied is extremely small (for example, about 1 KB), this occupation can be ignored, and the allocation is regarded as occupying no real memory and no real physical storage space.
  • a fake memory space of 2G can be allocated.
  • the fake memory space is like real memory space in that each storage row has an independent address, and its size is consistent with the real memory space that the data required to cache and execute the network model would be expected to occupy.
  • however, 2G of physical memory is not actually allocated.
  • the fake memory space has the same properties as the real memory space, such as the space size (the fake memory space can be the same size as the real memory space) and independent addresses (the address format and address search method can also be designed like the real memory).
  • the only difference is that the fake memory space is not a real physical storage space.
  • the fake memory allocated here is neither real physical memory nor traditional virtual memory (sometimes also called logical memory), which must be mapped to physical memory.
  • the address of the fake memory can be regarded as a fake address, which is a fake address deliberately created and allocated to meet the need of generating hardware execution commands.
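One way to picture such a fake memory space is a trivial bump allocator that hands out addresses from a deliberately non-physical range while keeping only a few bytes of real bookkeeping. The base address and sizes below are examples, not values from this application:

```python
# Illustrative fake-memory allocator: it hands out addresses without backing
# them with physical storage, so only the small bookkeeping below (a couple
# of integers) occupies real memory, however large the fake space is.

class FakeMemory:
    BASE = 0xF000_0000_0000          # deliberately non-physical base address

    def __init__(self, size: int):
        self.size = size             # e.g. 2 << 30 for a "2G" fake space
        self.used = 0                # bookkeeping only, not 2G of storage

    def alloc(self, nbytes: int) -> int:
        if self.used + nbytes > self.size:
            raise MemoryError("fake memory space exhausted")
        addr = FakeMemory.BASE + self.used
        self.used += nbytes
        return addr                  # a fake address: valid for command
                                     # generation, not for real data access
```

Addresses returned by `alloc` can be looked up, offset, and embedded in commands exactly like real addresses, which is all that command generation requires.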
  • each operation (i.e., each operator) contained in the network model is translated into a corresponding first hardware execution command; the first hardware execution commands generated for one operation may number in the hundreds. Since the command generation process is based on the fake memory space, it does not occupy real memory space. In this way, even if first hardware execution commands are generated for multiple network models, memory space will not run short.
  • the hardware execution commands of each network model usually need to include, for the various operations, a read address for obtaining the data source required by the operation, a write address for storing the operation result, and other operation information; therefore, addresses for reading and writing data must be available in order to generate a command.
  • the hardware execution commands are generated based on (the addresses of) the fake memory space, so the addresses in the commands are all fake addresses; when it is determined that certain commands need to be executed, the addresses in those commands are replaced. In this way, hardware execution commands can be translated in advance for each operation of the network model, which improves the execution efficiency of the network model, while the advance translation does not occupy excessive memory resources.
  • S22 can be executed by the core of the AI chip, and a driver for translating each operation included in the network model into corresponding hardware execution commands can be deployed in the core.
  • the implementation process of S22 may include: compiling the source code of each operation included in the network model into instructions corresponding to each operation, and obtaining relevant information required to execute each operation included in the network model based on the virtual memory space; generating a first hardware execution command according to the corresponding instructions of each operation and the relevant information required to execute each operation.
  • the source code of each operation included in the network model is compiled into instructions, and based on the virtual memory space, the relevant information required to execute each operation included in the network model is obtained, and then the hardware execution command is generated according to the corresponding instructions of each operation and the relevant information required to execute each operation.
  • each operation included in the network model can be quickly and accurately translated into the corresponding hardware execution command, and the relevant information required to execute each operation included in the network model is obtained based on the virtual memory space, and will not occupy the real memory resources of the hardware while meeting the requirements for generating commands.
  • in order to translate the various operations contained in the network model into hardware execution commands that can be executed by the hardware, the driver provides a set of relatively general API functions, such as a create-compiled-object API, a compile-instruction API, a create-memory API, a data transfer API, an execution API, etc.
  • for each operation of the network model, the driver provides a programming language with C++-like syntax, and the source code of the operation can be implemented using this syntax. The driver also uses a preset first API function (such as the create-compilation-object API and the compile-instruction API) to compile, via a compiler, the source code of an operation included in the network model into the hardware instructions corresponding to that operation.
  • a memory creation API provided by the driver can be used to allocate a space on the virtual memory space and provide it to the operator of the convolution operation.
  • some of the operations may involve the handling of data, so the driver provides a data handling API for handling data during the operation. In this way, the driver can obtain the relevant information required for the hardware device to execute each operation contained in the network model based on the virtual memory space by using the preset second API function (such as the aforementioned memory creation API and data handling API).
  • the relevant information required for the hardware device to execute an operation can be used to reflect the following related information: the address and length of the instruction, how many memory addresses the instruction needs to operate, where the specific location of the memory address is, how much the memory size is, what is the processing order between instructions, etc.
  • the preset third API function (such as the execution API) can be used to generate a first hardware execution command according to the corresponding instructions of each operation and the relevant information required to execute each operation.
  • the process of compiling an operation's source code into hardware instructions also requires some memory information, and since the instructions need not be executed immediately, nor any data actually read, written or loaded immediately, the aforementioned fake memory space can also be used when compiling the instructions. When data actually needs to be read, written or loaded, the fake addresses in the instructions are replaced with real addresses.
  • the first hardware execution command may also include device information of the hardware device (such as a device identifier) to indicate which hardware device executes the first hardware execution command, and different hardware devices have different corresponding device information.
  • the hardware device is a hardware object that is expected to have the ability to run the network model.
  • An AI chip may involve multiple hardware devices. The hardware execution command obtained after the network model is translated can be provided to the corresponding hardware device for execution when the network model corresponding to the hardware execution command needs to be run.
  • using a network execution graph to store the hardware execution commands may include S31, performed after S22: storing the first hardware execution command.
  • the first hardware execution command is stored for subsequent use.
  • the first hardware execution command may be stored in a pre-constructed network execution graph.
  • the first hardware execution command may be stored by a storage device in the AI chip using a network execution graph.
  • the network execution graph is used to record all first hardware execution commands generated for the network model, and can also be used to record key information of each first hardware execution command.
  • the key information may include a starting address, an offset, and a command execution order, etc., and the length and storage location of the command can be known based on the starting address and the offset.
  • the hardware device can obtain the first hardware execution command based on the key information.
  • each operation (i.e., each operator) contained in the network model is translated into a corresponding first hardware execution command based on a fake memory space. Since the process is based on a fake memory space, the command generation process does not occupy a large amount of real memory. In this way, even if many commands are generated for one network model, or first hardware execution commands are generated for multiple network models, real memory will not run short because of excessive occupation during command generation.
  • the various operations contained in the network model can be translated into a first hardware execution command that can be executed by the corresponding hardware device, but it is not sent to the hardware device for execution first.
  • the translated first hardware execution command is stored first, so that each subsequent time the network model is needed to process the input data, it does not need to be retranslated, but only needs to be fine-tuned by replacing the address in the first hardware execution command, for example, the address information related to the input data and output data in the command can be modified.
  • there is no need for the driver to re-translate the various operations in the network model into first hardware execution commands, thereby saving the performance overhead the processor would otherwise incur each time the network model is run.
  • the addresses in the first hardware execution command are all fake addresses.
  • although the fake addresses can be looked up and used during command generation, they cannot actually be used to store or load data during command execution. Therefore, when the network model subsequently needs to be executed, the command generation method may also include: loading the data required to execute the network model into real memory space, replacing the fake addresses in the first hardware execution command with the real addresses corresponding to that real memory space, and sending the replaced first hardware execution command, as the second hardware execution command, to the corresponding hardware device for execution.
  • the process of replacing the false address in the first hardware execution command with the real address may include: identifying the first hardware execution command, determining part or all of the first hardware execution command currently containing the false address as the target command; and replacing the false address in the target command with the real address corresponding to the real memory space.
  • before replacing, it is necessary to identify which first hardware execution commands use fake addresses.
  • the false address in part or all of the first hardware execution command can be replaced with the real address.
  • when performing address replacement, the first hardware execution commands are examined and only those containing false addresses are replaced, thereby avoiding erroneous or missed replacements.
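  • the identify-then-replace step can be sketched as follows. How a false address is recognized is not specified in this application; the reserved-range check below is purely an assumption for illustration:

```python
# Hypothetical sketch: identify which cached commands still hold false
# addresses, then rewrite only those. Commands already holding real
# addresses are skipped, avoiding erroneous or missed replacements.

FALSE_BASE = 0xF000_0000  # assumed marker range for false addresses

def is_false(addr):
    return addr >= FALSE_BASE

def replace_addresses(commands, addr_map):
    """Replace false addresses with real ones; return how many commands changed."""
    replaced = 0
    for cmd in commands:
        if not any(is_false(a) for a in cmd["addrs"]):
            continue  # not a target command: leave it untouched
        cmd["addrs"] = [addr_map.get(a, a) for a in cmd["addrs"]]
        replaced += 1
    return replaced
```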
  • the false addresses in the first hardware execution commands corresponding to all network models may be replaced at one time, or only the false addresses in the first hardware execution commands corresponding to some network models may be replaced.
  • the decision to replace the addresses of all commands or only some commands can be made based on factors such as the currently remaining available memory capacity, the processing progress of the network model, the amount of data to be processed, and the processing capacity supported by the chip.
  • the present application does not limit the number of commands to be replaced each time and/or the type of operation corresponding to each command.
  • the false addresses contained in a batch of commands corresponding to some operations in a network can be replaced each time, or the false addresses in all commands of an entire network model (or multiple network models) can be replaced at one time.
  • the replacement can be performed in batches, such as first replacing the address of the command of one of the network models, and then replacing the address of the command of the other network model.
  • the "real memory" mentioned in this application is physical memory, and the "real address" is the physical address that a physical storage medium has, while the "false address" is not a physical address, but an address that can be designed to have properties or a format similar to those of a physical address.
  • the "real address corresponding to the real memory space” used for address replacement can be either the physical address of the physical memory or the address of the virtual memory that has established a mapping relationship with the physical memory in advance.
  • this virtual memory, in contrast to the false memory of the present application, also has a real address and occupies physical storage space.
  • a mapping relationship can be established between one physical storage space (which may be physical external memory or physical internal memory) and another physical storage space (usually physical internal memory), so that originally discontinuous physical addresses become logically mapped and associated; physical addresses that were originally scattered, unrelated, or disordered thus become logically associated and ordered in some scenarios, and the actual loading, reading, and writing of data can also be completed through this mapping relationship.
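  • the mapping idea can be sketched as a simple page-mapping table that gives scattered physical pages a contiguous logical view. The class below is purely illustrative and assumes a tiny page size for demonstration:

```python
# Sketch of the mapping relationship: logical (contiguous) addresses are
# translated through a mapping table to scattered physical pages, and data
# loading/reading/writing is completed through that mapping.

class MappedSpace:
    def __init__(self, physical_pages, page_size=4):
        # logical page i is backed by physical page physical_pages[i]
        self.pages = physical_pages
        self.page_size = page_size
        self.backing = {p: bytearray(page_size) for p in physical_pages}

    def write(self, logical_addr, data):
        for i, byte in enumerate(data):
            addr = logical_addr + i
            phys = self.pages[addr // self.page_size]
            self.backing[phys][addr % self.page_size] = byte

    def read(self, logical_addr, n):
        out = bytearray()
        for i in range(n):
            addr = logical_addr + i
            phys = self.pages[addr // self.page_size]
            out.append(self.backing[phys][addr % self.page_size])
        return bytes(out)
```

  • a write that spans a logical page boundary lands in two unrelated physical pages, yet reads back as one contiguous block.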
  • the first hardware execution commands corresponding to each cached network model can be processed and executed on demand. For example, assuming the cache holds first hardware execution commands corresponding to 20 network models, but only the commands corresponding to one of the network models currently need to be executed, then only the addresses in those commands need to be replaced for now, and the replaced new commands (which can be called second hardware execution commands) are distributed to the specific hardware devices for execution.
  • the real addresses in the second hardware execution command can be replaced with the false addresses corresponding to the false memory space, so that the second hardware execution command again becomes a first hardware execution command containing false addresses (that is, the addresses that had been replaced with real addresses are changed back to false addresses), thereby releasing the corresponding physical memory resources.
  • the command generation method may also include: when it is determined that the network model will not be executed within a preset time period starting from the current moment, replacing the real addresses in the second hardware execution command corresponding to the network model with false addresses based on the false memory space, and caching the hardware execution command whose addresses have been replaced with false addresses so that it can be used the next time the same network model needs to be run.
  • the real address in the command is replaced with a false address corresponding to the false memory space and cached, thereby freeing up a portion of the corresponding real memory space.
  • the addresses in the first hardware execution command are all false addresses, and the addresses in the second hardware execution command are all real addresses. If the false addresses in the first hardware execution command are replaced with real addresses, the first hardware execution command after replacement is the second hardware execution command. Similarly, after replacing the real addresses in the replaced first hardware execution command (second hardware execution command) with false addresses, the first hardware execution command can be obtained again.
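  • the two-way conversion between the first and second command forms described above can be sketched as follows; the dictionary-based command representation and the false-to-real mapping are assumptions for illustration:

```python
# Illustrative sketch of the release path: swapping false addresses to real
# ones yields second commands ready to run; swapping back recovers the first
# commands so the backing real memory can be freed while the model is idle.

def to_second(cmds, false_to_real):
    """First -> second command form: false addresses become real addresses."""
    return [{**c, "addr": false_to_real[c["addr"]]} for c in cmds]

def to_first(cmds, false_to_real):
    """Second -> first command form: real addresses become false again,
    allowing the real memory behind them to be released."""
    real_to_false = {r: f for f, r in false_to_real.items()}
    return [{**c, "addr": real_to_false[c["addr"]]} for c in cmds]
```

  • applying `to_first` after `to_second` recovers the original cached commands exactly, which is what allows the same model to be re-run later without retranslation.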
  • the data processing method of the embodiment of the present application further includes another step, as shown in Figure 3. The principle will be described below in conjunction with Figure 3.
  • S210: Determine whether the network model will be executed within a preset time period starting from the current moment.
  • the preset time period can be set according to actual needs, such as minutes, hours, etc.
  • if it is determined that the network model will not be executed within the preset time period, the various operations contained in the network model are translated into corresponding first hardware execution commands based on the virtual memory space. This makes it possible to translate the network model in advance without reducing its processing efficiency, improves translation efficiency, and thereby helps improve the processing efficiency of the network model.
  • if it is determined that the network model will be executed within the preset time period, each operation contained in the network model is translated into a corresponding second hardware execution command based on the real memory space, wherein the addresses contained in the second hardware execution command are all real addresses, and the real memory space is used to store the data required when executing the network model.
  • each operation contained in the network model is directly translated into a corresponding second hardware execution command based on the real memory space.
  • compared with first generating the first hardware execution command, there is no need to subsequently convert its addresses into the real addresses required for execution, thereby improving the command translation efficiency and the processing efficiency of the network model to be executed.
  • a real memory space corresponding to the data size can be allocated according to the data size required to execute the network model, and each operation contained in the network model can be translated into a corresponding second hardware execution command.
  • the data required for executing the network model (the data here includes the input data to be processed by the network model, and can also include the characteristic data of the network model itself (such as weights, parameters, etc.)) is loaded into the real memory space, so that after translating each operation contained in the network model into the corresponding second hardware execution command, the second hardware execution command is directly sent to the corresponding hardware device for execution, so that the hardware device executes these second hardware execution commands to execute the network model.
  • after translating each operation contained in the network model into a corresponding second hardware execution command based on the real memory space, the second hardware execution command is stored. When it needs to be executed later, the second hardware execution command is directly sent to the corresponding hardware device for execution, so that the hardware device runs the network model by executing these second hardware execution commands.
  • the implementation principle of the above S230 is consistent with the implementation principle of S31 in Figure 2, except that the first hardware execution command is stored in S31, while the second hardware execution command is stored in this step.
  • the second hardware execution command can also be stored in the network execution graph.
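  • the S210 decision branch described above can be sketched as follows; the helper names and the address bases are assumptions, not the actual implementation:

```python
# Sketch of the S210 branch: if the model will not run within the preset
# period, translate against false memory (no physical backing) and cache the
# result; otherwise translate directly against real memory for dispatch.

def translate_model(ops, runs_within_period, alloc_false, alloc_real):
    """Return (kind, commands): "first" commands for later reuse, or
    "second" commands ready to send to the hardware device."""
    if not runs_within_period:
        base = alloc_false()  # false memory space: no data can be loaded here
        kind = "first"
    else:
        base = alloc_real()   # real memory space: execution data lives here
        kind = "second"
    cmds = [(op, base + i) for i, op in enumerate(ops)]
    return kind, cmds
```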
  • each operation included in the network model is translated into a corresponding first hardware execution command based on the virtual memory space.
  • the above-mentioned process of translating each operation in the network model into hardware execution commands can be implemented by the same AI chip or by two AI chips respectively.
  • AI chip 1 is only responsible for translating each operation in the network model into hardware execution commands
  • AI chip 2 is responsible for executing these hardware execution commands. The two processes are completed through the cooperation between the two AI chips.
  • AI chip 1 can translate each operation in the network model into hardware execution commands (including a first hardware execution command and a second hardware execution command) and store them; when the network model is to be run subsequently, the corresponding hardware execution command is sent to the hardware device of AI chip 2 for execution, or the corresponding first hardware execution command is converted into a second hardware execution command and then sent to the hardware device of AI chip 2 for execution.
  • the command conversion process includes: replacing the false address in the first hardware execution command with the real address corresponding to the real memory space, thereby obtaining the second hardware execution command.
  • AI chip 1 translates each operation in the network model into a hardware execution command, sends the hardware execution command to AI chip 2 for storage, and when the network model is to be run subsequently, AI chip 2 obtains the corresponding hardware execution command and sends it to the hardware device of AI chip 2 for execution.
  • AI chip 1 translates each operation in the network model into a first hardware execution command, sends the first hardware execution command to AI chip 2 for storage, and when the network model is to be run subsequently, AI chip 2 replaces the false address in the first hardware execution command with the real address corresponding to the real memory space to obtain the second hardware execution command, which is then sent to the hardware device of AI chip 2 for execution.
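  • the two-chip division of labor described above can be sketched as follows; both classes and the false-to-real address map are hypothetical illustrations of the cooperation, not actual chip interfaces:

```python
# Minimal sketch of the two-chip split: AI chip 1 only translates and stores
# commands; AI chip 2 receives patched commands and executes them.

class TranslatorChip:
    """Plays the role of AI chip 1: translate once, store, dispatch later."""
    def __init__(self):
        self.store = {}

    def translate(self, model, ops):
        # first hardware execution commands, holding false addresses
        self.store[model] = [{"op": op, "addr": 0xF000 + i}
                             for i, op in enumerate(ops)]

    def dispatch(self, model, executor, false_to_real):
        # convert to second commands and hand them to the executing chip
        for cmd in self.store[model]:
            patched = {**cmd, "addr": false_to_real[cmd["addr"]]}
            executor.execute(patched)

class ExecutorChip:
    """Plays the role of AI chip 2: executes whatever commands it receives."""
    def __init__(self):
        self.log = []

    def execute(self, cmd):
        self.log.append((cmd["op"], cmd["addr"]))
```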
  • the embodiment of the present application also provides another data processing method for the scenario of running a network model for data processing, and its principle is explained below in conjunction with Figures 4 and 5. Compared with Figure 1, Figure 4 is described only from the perspective of executing hardware execution commands.
  • the various operations contained in the network model can be translated into hardware execution commands executable by the target hardware device and stored in advance (the aforementioned network execution graph can be used for storage).
  • the pre-stored hardware execution commands corresponding to the network model are obtained and provided to the target hardware device for execution.
  • the hardware execution commands stored in the network execution graph in advance are distributed to the corresponding hardware devices for execution, and there is no need to re-translate each operation in the network model into hardware execution commands, thereby solving the problem that the processor incurs a large performance overhead and takes a long time each time it runs the network model.
  • the corresponding hardware execution commands may be sent to the target hardware device for execution in sequence according to the execution order of each operation contained in the network model.
  • the corresponding hardware execution commands may be sent to the target hardware device for execution in sequence according to the execution order of each operation contained in the network model, so that the target hardware device executes the hardware execution commands, thereby realizing the function of running the network model on the target hardware device, and facilitating the network model to be run on the target hardware device to process the input data.
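  • the in-order dispatch described above can be sketched as follows; the mapping from sequence index to command and the `send` callback are assumptions:

```python
# Sketch: send pre-stored commands to the target device in the recorded
# operation execution order, without any retranslation.

def dispatch_in_order(stored_commands, send):
    """stored_commands maps an operation-order index to a cached command."""
    for seq in sorted(stored_commands):
        send(stored_commands[seq])
    return len(stored_commands)
```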
  • the embodiment of the present application obtains the pre-stored hardware execution command executable by the target hardware device corresponding to the network model, which may include the following S110 and S120, as shown in Figure 5. The principle will be described below in conjunction with Figure 5.
  • the original network data corresponding to the network model (the data at this time includes the input data to be processed by the network model and the characteristic data of the network itself) is loaded into the real memory space, and the pre-stored first hardware execution command is obtained.
  • the first hardware execution command is obtained by translating each operation contained in the network model based on the virtual memory space, and the virtual memory space has the same properties as the real memory space.
  • the data processing method further includes: sending the replaced first hardware execution command to the corresponding hardware device.
  • the addresses in the first hardware execution command are all false addresses. Therefore, when executing the network model subsequently, it is necessary to use the real address corresponding to the real memory space to replace the false address in the first hardware execution command, and send the replaced first hardware execution command as the second hardware execution command to the corresponding hardware device for execution.
  • the real addresses in the second hardware execution command can be replaced with the false addresses corresponding to the false memory space.
  • the corresponding memory resources can be released; that is, the addresses in the first hardware execution command that had been replaced with real addresses are changed back to false addresses.
  • the embodiment of the present application also provides a data processing device 100, as shown in Figure 6, the data processing device 100 may include: an acquisition module 110, a command generation module 120 and a storage module 130.
  • the acquisition module 110 can also be referred to as a first acquisition module.
  • the acquisition module 110 may be configured to: acquire a computational graph of a network model to be run.
  • the command generation module 120 may be configured to: translate each operation in the computation graph of the network model into a hardware execution command executable by a corresponding target hardware device, wherein the hardware execution command includes device information of the target hardware device.
  • the storage module 130 can be configured to: store the hardware execution commands using a network execution graph, wherein the network execution graph is used to record all hardware execution commands generated for the network model, and the target hardware device can run the network model by executing the hardware execution commands in the network execution graph.
  • the command generation module 120 can be configured to use a preset first API function to compile the source code of each operation in the computational graph of the network model into instructions, and use a preset second API function to obtain the relevant information required for the target hardware device to perform each operation; and use a preset third API function to generate the hardware execution command according to the corresponding instructions of each operation and the relevant information required to perform each operation.
  • the storage module 130 can be configured to: store the hardware execution commands corresponding to each operation in the network execution graph in sequence according to the execution order of each operation contained in the network model, and record the key information of each hardware execution command, and the key information is used to obtain the hardware execution command.
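  • the network execution graph behavior described above can be sketched as a small container; the class and its method names are illustrative assumptions:

```python
# Hypothetical sketch of a network execution graph: commands are appended in
# the model's operation execution order, and key information is recorded so
# that each hardware execution command can be retrieved later.

class NetworkExecutionGraph:
    def __init__(self):
        self.commands = []   # all commands, kept in execution order
        self.key_info = {}   # key information -> index of the command

    def append(self, key, command):
        self.key_info[key] = len(self.commands)
        self.commands.append(command)

    def get(self, key):
        """Use the recorded key information to obtain one command."""
        return self.commands[self.key_info[key]]

    def in_order(self):
        """All commands, in the order the operations were stored."""
        return list(self.commands)
```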
  • the data processing device 100 may further include: a sending module.
  • the acquisition module 110 may also be configured to: when it is necessary to run the network model, acquire the hardware execution command pre-stored in the network execution graph.
  • the sending module may also be configured to: send the hardware execution command to the target hardware device for execution, so that the target hardware device executes the hardware execution command, thereby implementing the operation of the network model on the target hardware device.
  • the sending module can be configured to: modify the read address in the hardware execution command for obtaining input data, and/or modify the write address in the hardware execution command for storing output data; send the modified corresponding hardware execution command to the target hardware device for execution, so that the target hardware device executes the modified hardware execution command, thereby achieving the purpose of running the network model on the target hardware device to process the input data.
  • the data processing device 100 may further include: a copy module, which is configured to: copy the hardware execution command according to the total number of hardware devices in the AI chip; modify the device information contained in the copied hardware execution command according to the device information of other hardware devices in the AI chip except the target hardware device, to obtain a hardware execution command with modified device information, wherein the hardware execution command with modified device information can be provided to the other hardware devices for execution.
  • the copy module may also be configured to: determine a first number of hardware devices currently required to run the network model based on the amount of data to be processed.
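  • the copy-and-retarget behavior of the copy module can be sketched as follows; the per-device capacity parameter and dictionary command representation are assumptions for illustration:

```python
# Sketch of the copy module: derive how many devices the data volume needs,
# then replicate a translated command per device, rewriting only the device
# information so the copies can run on the other hardware devices.

def devices_needed(data_size, per_device_capacity, total_devices):
    """First number of hardware devices required for the data to be processed."""
    needed = -(-data_size // per_device_capacity)  # ceiling division
    return min(needed, total_devices)

def replicate(command, device_ids):
    """One copy per device, identical except for the device information."""
    return [{**command, "device": d} for d in device_ids]
```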
  • the command generation module 120 in the embodiment of the present application may include: an allocation module 121 and a translation module 122.
  • the allocation module 121 may be configured to allocate a corresponding virtual memory space for the network model.
  • the translation module 122 can be configured to: translate each operation included in the network model into a corresponding first hardware execution command based on the virtual memory space, the addresses in the first hardware execution command are all virtual addresses, and the virtual memory space has the same properties as the real memory space.
  • the translation module 122 can be configured to translate each operation contained in each network model into a corresponding first hardware execution command based on different virtual memory spaces for different network models, and different network models correspond to different virtual memory spaces.
  • the storage module 130 can also be configured to store the first hardware execution command, where the first hardware execution command is provided to the hardware device that needs to run the network model for execution after the address is replaced.
  • the allocation module 121 may be configured to allocate a virtual memory space corresponding to a data size required for executing the network model.
  • the command generation module 120 may further include a judgment module, which may be configured to: judge whether the network model is executed within a preset time period starting from the current moment.
  • the translation module 122 may be configured to: translate each operation included in the network model into a corresponding first hardware execution command based on the virtual memory space.
  • the translation module 122 may also be configured to: translate each operation included in the network model into a corresponding second hardware execution command based on the real memory space, wherein the addresses included in the second hardware execution command are all real addresses, and the real memory space stores the data required for executing the network model.
  • the storage module 130 can also be configured to store the second hardware execution command.
  • the command generation module 120 may also include an acquisition module and a sending module, and the acquisition module may be configured to: when executing the network model, load the data required for executing the network model into the real memory space.
  • the translation module 122 may also be configured to: replace the false address in the first hardware execution command with the real address corresponding to the real memory space.
  • the sending module may be configured to: send the replaced first hardware execution command as the second hardware execution command to the corresponding hardware device.
  • the translation module 122 may also be configured to replace the real address in the replaced first hardware execution command with the false address corresponding to the false memory space when it is determined that the network model is not executed within a preset time period starting from the current moment.
  • the translation module 122 may be configured to: examine the first hardware execution commands to identify those containing false addresses; and replace the false addresses in the identified commands with real addresses corresponding to the real memory space.
  • the translation module 122 can be configured to: compile the source code of each operation included in the network model into instructions, and based on the virtual memory space, obtain the relevant information required to execute each operation included in the network model; generate the first hardware execution command according to the corresponding instructions of each operation and the relevant information required to execute each operation.
  • the command generation module 120 provided in the embodiment of the present application has the same implementation principle and technical effects as those of the aforementioned method embodiment. For the sake of brief description, for matters not mentioned in the device embodiment, reference may be made to the corresponding contents in the aforementioned method embodiment.
  • the process of executing each module in the above-mentioned command generation module is beneficial to translating the network model in advance without reducing the processing efficiency of the network model, and can improve the translation efficiency, which is beneficial to improving the processing efficiency of the network model, thereby saving the performance overhead required by the processor each time the network model runs.
  • the embodiment of the present application also provides another data processing device 200 for a scenario where a network model is run for data processing; as shown in Figure 8, the data processing device 200 includes: an acquisition module 210 and a sending module 220.
  • the acquisition module 210 can also be referred to as a second acquisition module.
  • the acquisition module 210 may be configured to: when it is necessary to run the network model, acquire the pre-stored hardware execution command that can be executed by the target hardware device corresponding to the network model.
  • the sending module 220 may be configured to: send the hardware execution command to the target hardware device for execution, so that the target hardware device executes the hardware execution command, so as to achieve the purpose of running the network model on the target hardware device to process the input data.
  • the acquisition module 210 may include: a first hardware execution command acquisition module 211 and a translation module 212.
  • the first hardware execution command acquisition module 211 can be configured to: when the network model needs to be executed, load the network original data corresponding to the network model into the real memory space, and acquire the pre-stored first hardware execution command.
  • the first hardware execution command is obtained by translating each operation included in the network model based on the virtual memory space, and the virtual memory space has the same properties as the real memory space.
  • the translation module 212 may be configured to replace the false address in the first hardware execution command with the real address corresponding to the real memory space.
  • the sending module 220 may also be configured to send the replaced first hardware execution command to the corresponding hardware device.
  • the translation module 212 can also be configured to: when it is determined that the network model is not executed within a preset time period starting from the current moment, replace the real address in the replaced first hardware execution command with a virtual address corresponding to the virtual memory space, and the virtual memory space has the same properties as the real memory space.
  • the acquisition module 210 provided in the embodiment of the present application has the same implementation principle and technical effects as the aforementioned method embodiment.
  • the modules in the acquisition module 210 and the modules in the aforementioned command generation module 120 can be integrated together or used independently.
  • the data processing device 100 or data processing device 200 provided in the embodiment of the present application has the same implementation principle and technical effects as the aforementioned method embodiment.
  • the parts not mentioned in the device embodiment can refer to the corresponding content in the aforementioned method embodiment.
  • the embodiment of the present application also provides an AI chip, which may include: a core and a storage device.
  • the AI chip can be used to execute the above data processing method.
  • the kernel is used to obtain the computation graph of the network model to be run, and translate each operation in the computation graph of the network model into a hardware execution command executable by the target hardware device, wherein the hardware execution command contains the device information of the target hardware device;
  • a driver is deployed in the kernel, which can translate the various operations in the computation graph of the network model into hardware execution commands executable by the target hardware device, and sends the hardware execution commands to the storage device.
  • the kernel may compile the source code of each operation in the computation graph of the network model into instructions using a preset first API function, use a preset second API function to obtain the relevant information required for the target hardware device to perform each operation, and use a preset third API function to generate the hardware execution command according to the corresponding instructions of each operation and the relevant information required to perform each operation.
  • the storage device can be configured to store the hardware execution command using a network execution graph, wherein the network execution graph is used to record the hardware execution command, and the hardware execution command is used to run the network model.
  • the storage device may store the hardware execution commands corresponding to each operation in the network execution graph in sequence according to the execution order of each operation contained in the network model, and record key information of each hardware execution command, which is used to obtain the hardware execution command.
  • the kernel in the embodiment of the present application can also be configured to: allocate a corresponding virtual memory space for the network model, and based on the virtual memory space, translate each operation contained in the network model into a corresponding first hardware execution command, the addresses in the first hardware execution command are all virtual addresses, and the virtual memory space has the same properties as the real memory space.
  • a driver is deployed in the kernel, and the driver can translate various operations included in the network model into first hardware execution commands, and send the first hardware execution commands to the storage device for storage.
  • the kernel can also be configured to: compile the source code of each operation included in the network model into instructions, and based on the virtual memory space, obtain the relevant information required to execute each operation included in the network model; generate the first hardware execution command according to the corresponding instructions of each operation and the relevant information required to execute each operation.
  • the storage device may also be configured to store a first hardware execution command, where the first hardware execution command is provided to a hardware device that needs to run the network model for execution after address replacement.
  • the kernel may be configured to: allocate a virtual memory space corresponding to a data size required for executing the network model.
  • the kernel before the kernel translates each operation contained in the network model into a corresponding first hardware execution command based on the virtual memory space, the kernel can also be configured to: determine whether the network model is executed within a preset time period after the current moment, and only translate each operation contained in the network model into a corresponding first hardware execution command based on the virtual memory space when it is determined that the network model is not executed within the preset time period after the current moment.
  • the kernel is further configured to translate each operation contained in the network model into a corresponding second hardware execution command based on the real memory space, wherein the addresses contained in the second hardware execution command are all real addresses, and the real memory space stores the data required for executing the network model.
  • the storage device can also be configured to store the second hardware execution command.
  • the kernel can also be configured to: when executing the network model, load the data required to execute the network model into the real memory space; replace the false address in the first hardware execution command with the real address corresponding to the real memory space, and send the replaced first hardware execution command as the second hardware execution command to the corresponding hardware device.
  • the kernel can be configured to: identify the first hardware execution command, determine part or all of the first hardware execution command currently containing a false address as a target command; and replace the false address in the target command with a real address corresponding to the real memory space.
  • the kernel can also be configured to: after sending the replaced first hardware execution command as the second hardware execution command to the corresponding hardware device for execution, replace the real address in the replaced first hardware execution command (i.e., the second hardware execution command) with the false address corresponding to the false memory space, and cache the second hardware execution command whose addresses have been replaced with false addresses.
  • the embodiment of the present application also provides an AI chip, which may include: a hardware device, a kernel, and a storage device.
  • the AI chip can be used to execute the aforementioned data processing method.
  • the storage device may be configured to store hardware execution commands corresponding to each operation in the computation graph of the network model.
  • the kernel may be configured to: when the network model needs to be run, obtain the previously stored hardware execution command from the storage device, and send the hardware execution command to the hardware device.
  • the hardware device can be configured to: execute hardware execution commands to achieve the purpose of running the network model to process input data.
  • the AI chip can receive the hardware execution command sent by other AI chips and execute the hardware execution command.
  • the kernel is also used to receive the hardware execution command sent by other AI chips and store it for execution by the hardware device.
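The translate-once, run-many behaviour of the storage device and kernel could be modelled, under stated assumptions, with a simple cache keyed by network model; `CommandStore` and `translate_fn` are hypothetical names standing in for the storage device and the kernel's translator.

```python
# Sketch: the first request for a model triggers translation; later runs
# reuse the stored hardware execution commands without re-translating.

class CommandStore:
    def __init__(self, translate_fn):
        self._translate = translate_fn
        self._cache = {}          # model name -> list of hardware commands
        self.translations = 0     # how many times translation actually ran

    def commands_for(self, model):
        """Return stored commands for the model, translating only on a miss."""
        if model not in self._cache:
            self._cache[model] = self._translate(model)
            self.translations += 1
        return self._cache[model]
```

Every run after the first skips the translation cost entirely, which is the performance saving the application describes.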
  • the storage device in the embodiment of the present application can also be configured to store a first hardware execution command; wherein the first hardware execution command is obtained by translating each operation included in the network model based on a virtual memory space, and the virtual memory space has the same properties as the real memory space;
  • the kernel may also be configured to: when the network model needs to be executed, load the network raw data corresponding to the network model into the real memory space, obtain the first hardware execution command stored in the storage device, replace the virtual address in the first hardware execution command with the real address corresponding to the real memory space, and send the replaced first hardware execution command to the hardware device;
  • the hardware device may also be configured to: execute the replaced first hardware execution command to achieve the purpose of running the network model to process input data.
  • the AI chip provided in the embodiment of the present application has the same implementation principle and technical effects as those in the aforementioned method embodiment.
  • FIG. 10 shows a structural block diagram of an electronic device 300 provided in an embodiment of the present application.
  • the electronic device 300 may include: a transceiver 310, a memory 320, a communication bus 330 and a processor 340.
  • the transceiver 310, the memory 320, and the processor 340 are directly or indirectly electrically connected to each other to realize data transmission or interaction.
  • these elements can be electrically connected to each other through one or more communication buses 330 or signal lines.
  • the transceiver 310 can be configured to receive and send data.
  • the memory 320 can be configured to store computer programs, such as storing the software function modules shown in Figures 6 to 9, that is, the data processing device 100 of Figure 6 or the data processing device 200 of Figure 8.
  • the data processing device 100 includes at least one software function module that can be stored in the memory 320 in the form of software or firmware or fixed in the operating system (OS) of the electronic device 300.
  • the processor 340 can be configured to execute the executable module stored in the memory 320.
  • the processor 340 can be configured to execute the software function module or computer program included in the data processing device 100; for example:
  • the processor 340 can be configured to: obtain the calculation graph of the network model to be run; translate each operation in the calculation graph of the network model into a hardware execution command that can be executed by the target hardware device of the AI chip, and the hardware execution command contains the device information of the target hardware device; use the network execution graph to store the hardware execution command, wherein the network execution graph is used to record all hardware execution commands generated for the network model, and the target hardware device can run the network model by executing the hardware execution commands in the network execution graph.
  • the processor 340 can be configured to execute the software function modules or computer programs included in the data processing device 200; for example, the processor 340 can be configured to: when the network model needs to be run, obtain pre-stored hardware execution commands executable by the target hardware device corresponding to the network model; and send the hardware execution commands to the target hardware device, so that the target hardware device executes them, thereby running the network model on the target hardware device to process the input data.
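One way the pre-stored commands might be adapted to fresh input and output buffers on each run (as the read-address/write-address modification described elsewhere in this application suggests) is sketched below; the `read_addr`/`write_addr` field names are illustrative assumptions, not the actual command layout.

```python
# Sketch: stored commands carry I/O addresses from a previous run; before a
# new run the driver may rewrite them to point at the new buffers.

def patch_io(commands, new_read, new_write):
    """Return copies of the stored commands with fresh I/O addresses."""
    patched = []
    for cmd in commands:
        cmd = dict(cmd)                      # leave the stored copy untouched
        if "read_addr" in cmd:
            cmd["read_addr"] = new_read      # where the new input data lives
        if "write_addr" in cmd:
            cmd["write_addr"] = new_write    # where the new result should go
        patched.append(cmd)
    return patched
```

Working on copies keeps the stored command set reusable for the run after this one.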
  • the electronic device 300 may also include two processors 340, wherein one processor 340 is responsible for translating the operations in the network model into hardware execution commands, and the other processor 340 is responsible for executing the hardware execution commands.
  • the memory 320 can be, but is not limited to, random access memory (Random Access Memory, RAM), read-only memory (Read Only Memory, ROM), programmable read-only memory (Programmable Read-Only Memory, PROM), erasable read-only memory (Erasable Programmable Read-Only Memory, EPROM), electrically erasable read-only memory (Electric Erasable Programmable Read-Only Memory, EEPROM), etc.
  • the processor 340 may be an integrated circuit chip with signal processing capabilities.
  • the above-mentioned processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a graphics processor (GPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components.
  • the various methods, steps and logic block diagrams disclosed in the embodiments of the present application can be implemented or executed.
  • the general-purpose processor may be a microprocessor or the processor 340 may also be any conventional processor, etc.
  • the above-mentioned electronic devices 300 include but are not limited to smart phones, tablets, computers, industrial computers, vehicle-mounted equipment, servers, smart wearable devices, edge boxes, etc.
  • the memory can be configured to store the network model, and can also be configured to store the original data required to execute the network model, such as input data to be processed, and characteristic data of the network itself.
  • the first processor may be configured to allocate a corresponding virtual memory space for the network model, translate each operation contained in the network model into a corresponding first hardware execution command, and store the first hardware execution command.
  • the electronic device may also include a central processing unit (CPU), and the first processor may be a coprocessor that assists the central processing unit in data processing, such as a graphics processing unit (GPU) or a general-purpose graphics processing unit (GPGPU).
  • the CPU and the first processor may be regarded as the above-mentioned AI chip.
  • the first processor loads the data required to execute the network model into the real memory space of the first processor, and replaces the false address in the first hardware execution command with the real address corresponding to the real memory space, and sends the replaced first hardware execution command as the second hardware execution command to the corresponding hardware device for execution.
  • the real address in the second hardware execution command can then be replaced with the virtual address corresponding to the virtual memory space, and the command whose address has been replaced with the virtual address is cached.
  • Step 1: Initially, the network raw data (including the input data to be processed and the characteristic data of the network itself) and the network model are stored in a storage device (which may be a disk).
  • Step 2: Before the network model needs to be translated into hardware execution commands, the original network data and the data of the network model itself need to be loaded into the DDR (Double Data Rate) memory of the CPU. According to the CPU DDR space occupied by the data, a real DDR space of the same size is allocated and occupied in the dedicated DDR of the first processor. Through the collaboration of the CPU and the first processor, all data (including the input data) stored in the DDR of the CPU is moved to the DDR of the first processor.
  • Step 3: When translating the network model into hardware execution commands, the first processor combines all the operation operators in the network model with the DDR addresses of the feature data, the DDR addresses of the input data, and the DDR addresses for storing the operation results, based on the allocated real DDR space, to generate a series of hardware execution commands.
  • Step 4: These hardware execution commands are then executed directly.
  • a process of using the command generation method shown in this application may include:
  • Step 1: The original network data (including input data, and possibly the characteristic data of the network itself) and the network model may likewise be stored in a storage device (which may be a disk).
  • Step 2: Before the network model needs to be translated into the first hardware execution commands, a virtual memory space corresponding to the size of the data required to execute the network model is allocated, and the network model is loaded into the DDR of the first processor.
  • Step 3: When translating the network model into the first hardware execution commands, the first processor combines all the operation operators in the network model with the virtual DDR addresses of the feature data, the virtual DDR addresses of the input data, and the virtual DDR addresses for storing the operation results, based on the allocated virtual DDR space (the virtual memory space), to generate a series of first hardware execution commands, and stores these commands.
  • Step 4: When the network model is subsequently executed, the original network data is loaded into the DDR of the CPU. According to the DDR space occupied by the data, a DDR space of the same size (the real memory space) is allocated in the DDR of the first processor. Through the collaboration between the CPU and the first processor, all the data in the DDR of the CPU is moved to the DDR of the first processor. Then, according to the real addresses corresponding to the allocated real memory space, the virtual addresses in the first hardware execution commands are replaced, and the replaced first hardware execution commands are sent to the corresponding hardware device.
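Steps 1-4 above can be illustrated with a minimal sketch, assuming a bump allocator over a placeholder DDR base: fake addresses are handed out at translation time without allocating any real memory, and the same offsets are rebased onto the really allocated DDR space at run time. All names (`FAKE_DDR_BASE`, the buffer names) are hypothetical.

```python
# Sketch: hand out fake DDR addresses for each buffer, then rebase the same
# offsets onto the real DDR space once it has actually been allocated.

FAKE_DDR_BASE = 0x4000_0000


def allocate_fake(sizes):
    """Bump-allocate fake addresses for each named buffer; return total size."""
    addrs, offset = {}, 0
    for name, size in sizes.items():
        addrs[name] = FAKE_DDR_BASE + offset
        offset += size
    return addrs, offset          # total size = real space to allocate later


def rebase(addrs, real_base):
    """Step 4: replace fake addresses with real ones, preserving offsets."""
    return {n: real_base + (a - FAKE_DDR_BASE) for n, a in addrs.items()}
```

Because only offsets matter, the commands translated against the fake space line up exactly with the real space allocated later.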
  • the embodiment of the present application also provides a non-volatile computer-readable storage medium (hereinafter referred to as the storage medium), on which a computer program is stored.
  • the storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, etc.
  • the functional modules in the various embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
  • the functions are implemented in the form of software function modules and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of the present application, or the part thereof that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product, which is stored in a computer-readable storage medium and includes several instructions for enabling a computer device (which can be a personal computer, a laptop, a server, or an electronic device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present application.
  • the present application relates to a data processing method, device, AI chip, electronic device and storage medium, and belongs to the field of data processing technology.
  • the data processing method includes: obtaining a calculation graph of a network model to be run; translating each operation in the calculation graph of the network model into a hardware execution command that can be executed by the target hardware device of the AI chip; and storing the hardware execution command using the network execution graph.
  • the data processing method, device, AI chip, electronic device and storage medium of the present application are reproducible and can be used in a variety of industrial applications.
  • the data processing method, device, AI chip, electronic device and storage medium of the present application can be used in any device that needs to reduce the performance overhead of the processor and improve the efficiency of data processing.


Abstract

The present application relates to the technical field of data processing, and relates to a data processing method and apparatus, an AI chip, an electronic device, and a storage medium. The data processing method comprises: acquiring a computational graph of a network model to be run; translating each operation in the computational graph of the network model into a hardware execution command executable by a target hardware device of an AI chip; and storing the hardware execution command by using a network execution graph. Because each operation in the computational graph of the network model is translated into a corresponding hardware execution command executable by the target hardware device and the command is stored, every subsequent run of the network model can directly distribute the pre-stored hardware execution commands to the corresponding hardware for execution, without translating the operations into hardware execution commands again; this mitigates the problem that the processor incurs a large performance overhead and takes a long time every time it runs the network model.

Description

Data processing method, apparatus, AI chip, electronic device and storage medium

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to the Chinese patent application No. 202211486836.8, entitled "Data processing method, device, AI chip, electronic device and storage medium", filed with the State Intellectual Property Office of China on November 25, 2022, and to the Chinese patent application No. 202211486830.0, entitled "Command generation method, device, AI chip, electronic device and storage medium", filed with the State Intellectual Property Office of China on November 25, 2022, the entire contents of which are incorporated herein by reference.

Technical Field

The present application belongs to the field of artificial intelligence technology, and specifically relates to a data processing method and apparatus, an AI chip, an electronic device, and a storage medium.

Background
When an AI (Artificial Intelligence) network needs to be run for data processing and computation, the computing tasks corresponding to the network model usually need to be loaded onto the hardware device used to execute them. This process requires generating, for the network model, hardware execution commands that the hardware can recognize and execute.

At present, each time the processor runs a network model, it needs to re-translate each operation (also called an operator) in the network model into a corresponding hardware execution command and provide it to the hardware for execution as soon as possible, and translating each operation into a hardware execution command takes some time. Each time the processor needs to run the network model, for every operation it processes, the driver must translate that operation into a hardware execution command and send it to the hardware for execution before moving on to the next operation, and so on, until the driver has translated the last operation of the network model and has sent the hardware execution command corresponding to that last operation to the hardware for execution. This processing method incurs a large performance overhead and takes a long time every time the processor runs the network model for data processing.

In addition, in current scenarios where AI chips (which may be various processors) use network models for data processing, when the processor generates hardware execution commands (referred to as hardware commands or commands for short) for each network model, a hardware execution command usually needs to contain or reflect information such as: the type or content of the operation, the read address of the data source required by the operation, and the write address for storing the operation result. The processor therefore uses a large amount of memory information when translating the operations of a network model into hardware execution commands.

The process of generating hardware execution commands for a network model involves the allocation and occupation of memory resources, and the data of different network models occupies different memory spaces, which aggravates the occupation of limited memory resources by network models. Especially when the data required to execute a network model occupies a large amount of memory, insufficient memory can easily occur, making it difficult to translate hardware execution commands for the network model as expected, and in turn making it difficult for the hardware to execute the network model as expected. In addition, such memory shortages may also affect the performance of the hardware in other respects.
Summary

In view of this, one aspect of the present application provides a data processing method, to alleviate the problem that a processor currently incurs a large performance overhead and takes a long time each time it runs a network model for data processing, and to alleviate the problem in the related art that translating a network model into hardware execution commands occupies a large amount of memory resources, which easily leads to insufficient memory.

To achieve the above purpose, the embodiments of the present application are implemented as follows:

In a first aspect, an embodiment of the present application provides a data processing method, which may include: obtaining a computational graph of a network model to be run; translating each operation in the computational graph of the network model into a hardware execution command executable by a target hardware device of an AI chip, the hardware execution command containing device information of the target hardware device; and storing the hardware execution command using a network execution graph, wherein the network execution graph is used to record all hardware execution commands generated for the network model, and the target hardware device is used to run the network model by executing the hardware execution commands in the network execution graph.

In combination with a possible implementation of the first aspect, translating each operation contained in the computational graph of the network model into hardware execution commands executable by the target hardware device of the AI chip may include: compiling the source code of each operation in the computational graph of the network model into instructions, and obtaining the relevant information required for the target hardware device to execute each operation; and generating the hardware execution commands according to the instructions corresponding to each operation and the relevant information required to execute each operation.
A preset first API function (such as an object creation API or an instruction compilation API) may be used to compile the source code of each operation in the computational graph of the network model into instructions; a preset second API function (such as a memory allocation API or a data transfer API) may then be used to obtain the relevant information required for the target hardware device to execute each operation (such as the address and length of the instructions, how many memory addresses an instruction needs to operate on, and the execution order among instructions); and a preset third API function (such as an execution API) may then be used to generate the hardware execution commands according to the instructions corresponding to each operation and the relevant information required to execute each operation. In this way, each operation in the computational graph of the network model can be quickly and accurately translated into hardware execution commands executable by the target hardware device.

In combination with a possible implementation of the first aspect, storing the hardware execution commands using the network execution graph may include: storing the hardware execution command corresponding to each operation into the network execution graph in sequence, according to the execution order of the operations contained in the network model, and recording key information of each hardware execution command, the key information being used to retrieve the hardware execution command.

In combination with a possible implementation of the first aspect, the data processing method may further include: when the network model needs to be run, obtaining the hardware execution commands pre-stored in the network execution graph; and sending the hardware execution commands to the target hardware device for execution, so that the target hardware device executes the hardware execution commands, thereby running the network model on the target hardware device.

In combination with a possible implementation of the first aspect, sending the hardware execution commands to the target hardware device for execution may include: modifying the read address used to obtain input data in the hardware execution commands, and/or modifying the write address used to store output data in the hardware execution commands; and sending the modified hardware execution commands to the target hardware device for execution, so that the target hardware device executes the modified hardware execution commands, thereby achieving the purpose of running the network model on the target hardware device to process the input data.

In combination with a possible implementation of the first aspect, the data processing method may further include: copying the hardware execution commands according to the total number of hardware devices in the AI chip; and modifying the device information contained in the copied hardware execution commands according to the device information of hardware devices in the AI chip other than the target hardware device, to obtain hardware execution commands with modified device information, wherein the hardware execution commands with modified device information can be provided to the other hardware devices for execution.

In combination with a possible implementation of the first aspect, the data processing method may further include: determining, according to the amount of data to be processed, a first number of hardware devices currently required to run the network model.
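The copy-and-patch replication of commands to the chip's other hardware devices, described just above, could look roughly like the following sketch; the `device` field name is an assumption for illustration, not the actual command layout.

```python
# Sketch: the command stream translated for the target device is deep-copied
# once per additional device, and only the device-info field is rewritten.
import copy


def replicate(commands, all_devices, target_device):
    """Return one command stream per hardware device of the chip."""
    streams = {target_device: commands}       # original stream, unmodified
    for dev in all_devices:
        if dev == target_device:
            continue
        clone = copy.deepcopy(commands)       # keep the original intact
        for cmd in clone:
            cmd["device"] = dev               # point the copy at another device
        streams[dev] = clone
    return streams
```

Only the device information changes between copies, so one translation pass can serve every hardware device on the chip.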
结合上述第一方面实施例的一种可能的实施方式,将所述网络模型的计算图中的各个操作翻译成AI芯片的目标硬件设备能够执行的硬件执行命令,可以包括:为所述网络模型分配对应的虚假内存空间;以及基于所述虚假内存空间,将所述网络模型中包含的各个操作翻译成对应的第一硬件执行命令,所述第一硬件执行命令中的地址均为虚假地址,所述虚假内存空间与真实内存空间具备相同属性。In combination with a possible implementation manner of the first aspect embodiment above, translating each operation in the computation graph of the network model into a hardware execution command that can be executed by the target hardware device of the AI chip may include: allocating a corresponding virtual memory space to the network model; and based on the virtual memory space, translating each operation contained in the network model into a corresponding first hardware execution command, wherein the addresses in the first hardware execution command are all virtual addresses, and the virtual memory space has the same properties as the real memory space.
结合上述第一方面实施例的一种可能的实施方式,利用网络执行图存储所述硬件执行命令,可以包括:在基于所述虚假内存空间,将所述网络模型中包含的各个操作翻译成对应的第一硬件执行命令之后,存储所述第一硬件执行命令,所述第一硬件执行命令用于在被进行地址替换后提供给需要运行所述网络模型的硬件设备执行。In combination with a possible implementation manner of the first aspect embodiment mentioned above, using a network execution graph to store the hardware execution command may include: after translating each operation contained in the network model into a corresponding first hardware execution command based on the virtual memory space, storing the first hardware execution command, wherein the first hardware execution command is used to be provided to a hardware device that needs to run the network model for execution after address replacement.
结合上述第一方面实施例的一种可能的实施方式,为所述网络模型分配对应的虚假内存空间,可以包括:根据执行所述网络模型所需的数据大小,分配与所述数据大小对应的虚假内存空间。In combination with a possible implementation manner of the first aspect embodiment described above, allocating a corresponding virtual memory space for the network model may include: allocating a virtual memory space corresponding to the data size required to execute the network model.
以此有利于在不过多占用硬件的真实内存资源的情况下,满足生成命令所需的要求。This helps to meet the requirements for generating commands without occupying too much real memory resources of the hardware.
结合上述第一方面实施例的一种可能的实施方式,在基于所述虚假内存空间,将所述网络模型中包含的各个操作翻译成对应的第一硬件执行命令之前,所述数据处理方法还可以包括:判断从当前时刻开始后的预设时间段内是否执行所述网络模型;在确定从当前时刻开始后的预设时间段内不执行所述网络模型时,执行步骤:基于所述虚假内存空间,将所述网络模型中包含的各个操作翻译成对应的第一硬件执行命令。In combination with a possible implementation manner of the first aspect embodiment above, before translating the various operations contained in the network model into corresponding first hardware execution commands based on the virtual memory space, the data processing method may also include: determining whether the network model is executed within a preset time period after the current moment; when it is determined that the network model is not executed within the preset time period after the current moment, executing the steps: based on the virtual memory space, translating the various operations contained in the network model into corresponding first hardware execution commands.
结合上述第一方面实施例的一种可能的实施方式,所述数据处理方法还可以包括:在确定从当前时刻开始后的预设时间段内要执行所述网络模型时,基于所述真实内存空间,将所述网络模型中包含的各个操作翻译成对应的第二硬件执行命令,其中,所述第二硬件执行命令中包含的地址均为真实地址,所述真实内存空间用于存储执行所述网络模型时所需的数据。In combination with a possible implementation manner of the first aspect embodiment above, the data processing method may also include: when determining that the network model is to be executed within a preset time period starting from the current moment, based on the real memory space, translating each operation contained in the network model into a corresponding second hardware execution command, wherein the addresses contained in the second hardware execution command are all real addresses, and the real memory space is used to store the data required to execute the network model.
结合上述第一方面实施例的一种可能的实施方式,在存储所述第一硬件执行命令之后,所述数据处理方法还可以包括:在需要执行所述网络模型时,将执行所述网络模型所需的数据加载到所述真实内存空间;利用所述真实内存空间对应的真实地址替换掉所述第一硬件执行命令中的虚假地址;将替换后的第一硬件执行命令作为第二硬件执行命令,发送给对应的硬件设备,以供所述对应的硬件设备执行所述第二硬件执行命令。In combination with a possible implementation manner of the first aspect embodiment above, after storing the first hardware execution command, the data processing method may also include: when the network model needs to be executed, loading the data required to execute the network model into the real memory space; replacing the false address in the first hardware execution command with the real address corresponding to the real memory space; and sending the replaced first hardware execution command as the second hardware execution command to the corresponding hardware device, so that the corresponding hardware device can execute the second hardware execution command.
通过该实施方式,便可以在保证不占用硬件的真实内存资源的情况下,又不会影响网络模型的正常使用。Through this implementation, it is possible to ensure that the actual memory resources of the hardware are not occupied and the normal use of the network model is not affected.
With reference to a possible implementation of the embodiment of the first aspect, replacing the fake addresses in the first hardware execution command with the real addresses corresponding to the real memory space may include: identifying the first hardware execution commands to determine, as target commands, some or all of the first hardware execution commands that currently contain fake addresses; and replacing the fake addresses in the target commands with the real addresses corresponding to the real memory space.
With reference to a possible implementation of the embodiment of the first aspect, after the replaced first hardware execution command is sent, as the second hardware execution command, to the corresponding hardware device for execution, the data processing method may further include: replacing the real addresses in the second hardware execution command with the fake addresses corresponding to the fake memory space, and caching the command whose addresses have been replaced with fake addresses.
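The replace–dispatch–restore cycle in the two implementations above might be sketched as follows. This is a hedged illustration only: the command layout, the address-range test used to spot fake addresses, and the helper names are all invented here for clarity and are not part of the claimed method.

```python
FAKE_BASE = 0xDEAD0000        # assumed placeholder base address
FAKE_SIZE = 0x10000           # assumed size of the fake memory space

def patch_dispatch_restore(cached_cmds, real_base, send):
    """For each cached command: swap fake addresses for real ones, hand the
    command to the device via `send`, then swap the real addresses back so
    the cache stays valid after the real memory space is released."""
    for cmd in cached_cmds:
        # Identify the target fields that currently hold fake addresses.
        targets = [k for k, v in cmd.items()
                   if isinstance(v, int) and FAKE_BASE <= v < FAKE_BASE + FAKE_SIZE]
        for k in targets:                     # fake -> real
            cmd[k] = cmd[k] - FAKE_BASE + real_base
        send(cmd)                             # dispatch to the hardware device
        for k in targets:                     # real -> fake, re-cache
            cmd[k] = cmd[k] - real_base + FAKE_BASE
```

Because the fake addresses are restored immediately after dispatch, the cached commands remain reusable for the next run even if the real memory space is freed in between.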
With reference to a possible implementation of the embodiment of the first aspect, translating, based on the fake memory space, each operation contained in the network model into a corresponding first hardware execution command may include: compiling the source code of each operation contained in the network model into instructions corresponding to the respective operations, and obtaining, based on the fake memory space, the related information required to execute each operation contained in the network model, the related information including address information; and generating the first hardware execution command according to the instruction corresponding to each operation and the related information required to execute each operation.
With reference to a possible implementation of the embodiment of the first aspect, different network models correspond to different fake memory spaces.
According to a second aspect, an embodiment of the present application further provides a data processing method, which may include: when a network model needs to be run, obtaining pre-stored hardware execution commands that can be executed by the target hardware device corresponding to the network model; and sending the hardware execution commands to the target hardware device, so that the target hardware device executes the hardware execution commands, thereby running the network model on the target hardware device to process input data.
With reference to a possible implementation of the embodiment of the second aspect, when the network model needs to be run, obtaining the pre-stored hardware execution commands that can be executed by the target hardware device corresponding to the network model may include: when the network model needs to be executed, loading the network raw data corresponding to the network model into a real memory space, and obtaining a pre-stored first hardware execution command, where the first hardware execution command is obtained by translating each operation contained in the network model based on a fake memory space, and the fake memory space has the same attributes as the real memory space; and replacing the fake addresses in the first hardware execution command with the real addresses corresponding to the real memory space.
With reference to a possible implementation of the embodiment of the second aspect, after the fake addresses in the first hardware execution command are replaced with the real addresses corresponding to the real memory space, the data processing method further includes: sending the replaced first hardware execution command, as a second hardware execution command, to the corresponding hardware device.
With reference to a possible implementation of the embodiment of the second aspect, after the replaced first hardware execution command is sent, as the second hardware execution command, to the corresponding hardware device for execution, the data processing method may further include: replacing the real addresses in the second hardware execution command with the fake addresses corresponding to the fake memory space, and caching the second hardware execution command whose addresses have been replaced with fake addresses.
According to a third aspect, an embodiment of the present application further provides a data processing apparatus, which may include an obtaining module, a command generation module, and a storage module. The obtaining module is configured to obtain a computational graph of a network model to be run. The command generation module is configured to translate each operation in the computational graph of the network model into hardware execution commands that can be executed by a corresponding target hardware device, the hardware execution commands containing device information of the target hardware device. The storage module is configured to store the hardware execution commands by means of a network execution graph, where the network execution graph is used to record all hardware execution commands generated for the network model, and the target hardware device can run the network model by executing the hardware execution commands in the network execution graph.
With reference to a possible implementation of the embodiment of the third aspect, the command generation module may include an allocation module and a translation module. The allocation module is configured to allocate a corresponding fake memory space for the network model. The translation module is configured to translate, based on the fake memory space, each operation contained in the network model into a corresponding first hardware execution command, where the addresses in the first hardware execution command are all fake addresses, and the fake memory space has the same attributes as the real memory space.
With reference to a possible implementation of the embodiment of the third aspect, the storage module is configured to store the first hardware execution command, and the first hardware execution command is provided, after address replacement, to the hardware device that needs to run the network model for execution.
According to a fourth aspect, an embodiment of the present application further provides a data processing apparatus, which may include an obtaining module and a sending module. The obtaining module is configured to, when a network model needs to be run, obtain pre-stored hardware execution commands that can be executed by the target hardware device corresponding to the network model. The sending module is configured to send the hardware execution commands to the target hardware device, so that the target hardware device executes the hardware execution commands, thereby running the network model on the target hardware device to process input data.
With reference to a possible implementation of the embodiment of the fourth aspect, the obtaining module may include a first-hardware-execution-command obtaining module and a translation module. The first-hardware-execution-command obtaining module is configured to, when the network model needs to be executed, load the network raw data corresponding to the network model into a real memory space and obtain a pre-stored first hardware execution command, where the first hardware execution command is obtained by translating each operation contained in the network model based on a fake memory space, and the fake memory space has the same attributes as the real memory space. The translation module is configured to replace the fake addresses in the first hardware execution command with the real addresses corresponding to the real memory space.
With reference to a possible implementation of the embodiment of the fourth aspect, the sending module is further configured to send the replaced first hardware execution command, as a second hardware execution command, to the corresponding hardware device.
According to a fifth aspect, an embodiment of the present application further provides an AI chip, which may include a kernel and a storage device. The kernel is configured to obtain a computational graph of a network model to be run and to translate each operation in the computational graph of the network model into hardware execution commands that can be executed by a target hardware device, the hardware execution commands containing device information of the target hardware device. The storage device is configured to store the hardware execution commands by means of a network execution graph, where the network execution graph is used to record all hardware execution commands generated for the network model, and the target hardware device can run the network model by executing the hardware execution commands in the network execution graph.
With reference to a possible implementation of the embodiment of the fifth aspect, the kernel is configured to allocate a corresponding fake memory space for the network model and to translate, based on the fake memory space, each operation contained in the network model into a corresponding first hardware execution command, where the addresses in the first hardware execution command are all fake addresses, and the fake memory space has the same attributes as the real memory space. The storage device is configured to store the first hardware execution command, which is provided, after address replacement, to the hardware device that needs to run the network model for execution.
According to a sixth aspect, an embodiment of the present application further provides an AI chip, which may include a hardware device, a storage device, and a kernel. The storage device is configured to store the hardware execution commands corresponding to the operations in the computational graph of a network model. The kernel is configured to, when the network model needs to be run, obtain the previously stored hardware execution commands from the storage device and send them to the hardware device. The hardware device is configured to execute the hardware execution commands, thereby running the network model to process input data.
With reference to a possible implementation of the embodiment of the sixth aspect, the storage device is further configured to store a first hardware execution command, where the first hardware execution command is obtained by translating each operation contained in the network model based on a fake memory space, and the fake memory space has the same attributes as the real memory space. The kernel is further configured to, when the network model needs to be executed, load the network raw data corresponding to the network model into the real memory space, obtain the first hardware execution command stored in the storage device, replace the fake addresses in the first hardware execution command with the real addresses corresponding to the real memory space, and send the replaced first hardware execution command, as a second hardware execution command, to the hardware device. The hardware device is further configured to execute the second hardware execution command, thereby running the network model to process input data.
According to a seventh aspect, an embodiment of the present application further provides an electronic device, which may include a memory and a processor connected to the memory. The memory is configured to store a program; the processor is configured to call the program stored in the memory, so as to execute the data processing method provided by the embodiment of the first aspect and/or any possible implementation of the embodiment of the first aspect, or to execute the data processing method provided by the embodiment of the second aspect and/or any possible implementation of the embodiment of the second aspect.
According to an eighth aspect, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When the computer program is run by a processor, it executes the data processing method provided by the embodiment of the first aspect and/or any possible implementation of the embodiment of the first aspect, or executes the data processing method provided by the embodiment of the second aspect and/or any possible implementation of the embodiment of the second aspect.
Other features and advantages of the present application will be set forth in the description that follows. The objectives and other advantages of the present application may be realized and obtained through the structures particularly pointed out in the written description and the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of the present application or the related art more clearly, the drawings required in the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort. The above and other objectives, features, and advantages of the present application will become clearer from the drawings. The same reference numerals denote the same parts throughout the drawings. The drawings are not deliberately drawn to scale; the emphasis is on illustrating the gist of the present application.
FIG. 1 is a schematic flowchart of a data processing method provided in an embodiment of the present application.
FIG. 2 is a schematic flowchart of some steps in a data processing method provided in an embodiment of the present application.
FIG. 3 is another schematic flowchart of some steps in a data processing method provided in an embodiment of the present application.
FIG. 4 is a schematic flowchart of another data processing method provided in an embodiment of the present application.
FIG. 5 is a schematic flowchart of some steps in another data processing method provided in an embodiment of the present application.
FIG. 6 is a schematic block diagram of a data processing apparatus provided in an embodiment of the present application.
FIG. 7 is a more detailed schematic block diagram of a data processing apparatus provided in an embodiment of the present application.
FIG. 8 is a schematic block diagram of another data processing apparatus provided in an embodiment of the present application.
FIG. 9 is a more detailed schematic block diagram of another data processing apparatus provided in an embodiment of the present application.
FIG. 10 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
FIG. 11 is a schematic structural diagram of another electronic device provided in an embodiment of the present application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings. Where no conflict arises, the embodiments of the present application, or the more specific implementation details within the embodiments, may be combined with one another.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings. Meanwhile, in the description of the present application, terms such as "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or also includes elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
Furthermore, the term "and/or" in the present application merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent three cases: A exists alone, both A and B exist, and B exists alone.
The terms "first" and "second" in the present application are used only to distinguish one entity, operation, or object from another, and do not require or imply any actual relationship or order between these entities, operations, or objects.
The embodiments of the present application relate to application scenarios in which network models (various neural network models) are used for data processing. For a better understanding of the solutions of the embodiments, the relevant terms and concepts that may be involved are introduced first.
A neural network model may be composed of neural units and may be understood as a model having an input layer, hidden layers, and an output layer: generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers (there may be many of them). A neural network model uses one or more layers (e.g., hidden layers and an output layer) to generate an output for a received input; the output of each hidden layer serves as the input of the next layer (e.g., the next hidden layer or the output layer), and each layer generates its output from the received input according to that layer's current parameters (e.g., weights).
The operations contained in a neural network model (such as convolution, pooling, activation, normalization, and classification) must be translated into hardware execution commands before a hardware device can execute them. By executing these hardware execution commands, the hardware device implements the functions of the corresponding operations in the network model, thereby supporting the ability to run the neural network on the hardware device to process input data.
To express the computational logic of a network model, a computational graph is commonly used. Each node in the computational graph may correspond to one operation in the network model; these operations are also called operators, and each operator has its own characteristics and performs a specific function. The computational graph of a network model usually contains many different operations, for example convolution operations, pooling operations, and activation functions.
The inventors found that, at present, when data processing needs to be performed by running a network model, each time the processor runs the network model it must re-translate each operation in the model into hardware execution commands on the fly and send the commands temporarily generated for a single operation to the hardware for execution (that is, only after part of the hardware execution commands has been generated for one operation of the network model and sent to the hardware for execution are the hardware execution commands for the next operation of the same network model generated and sent to the hardware). As a result, the processor incurs a large performance overhead, and takes a long time, every time it runs the network model.
In view of this, based on the characteristics of network models, the inventors propose the following embodiments to mitigate the above problems.
The inventors of the present application observed that when a network model is used for data processing, the structure of the model itself is fixed; only the input data processed each time the model is loaded onto hardware for execution may differ, and different inputs may yield different outputs. Based on this, the present application generates (or translates) in advance, for each operation contained in the network model, the hardware execution commands that the corresponding target hardware device can execute, but does not immediately send them to the target hardware device (i.e., the hardware device capable of executing the commands). Instead, the hardware execution commands corresponding to the operations are stored first (for example, in a constructed network execution graph), so that each subsequent time the network model is needed to process input data (e.g., for recognition, classification, feature extraction, or resizing), the pre-stored hardware execution commands can be distributed to the corresponding hardware for execution. This helps quickly load the computational logic and computing tasks of the network model onto the hardware that needs to run it. With the hardware execution commands generated in advance for each operation of the network model, each execution only requires modifying the content related to input and output; there is no need to re-translate each operation in the model into hardware execution commands. This alleviates the problem that the processor incurs a large performance overhead and takes a long time every time it runs the network model.
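The "translate in advance, patch at run time" idea above can be sketched as follows. This is a minimal illustration only: the command format, the placeholder base address, and the function names are assumptions made for clarity, not the actual driver interface.

```python
# Sketch: operations are translated once into commands that reference a
# placeholder ("fake") memory space; every later run only rebases the
# addresses onto the real memory space instead of re-translating.

FAKE_BASE = 0xDEAD0000  # assumed placeholder base address

def translate_model(ops):
    """Build the cached command list (the 'network execution graph')."""
    commands, offset = [], 0
    for op in ops:
        commands.append({
            "op": op["name"],
            "device": op["device"],                   # target device id
            "src": FAKE_BASE + offset,                # fake input address
            "dst": FAKE_BASE + offset + op["size"],   # fake output address
        })
        offset += 2 * op["size"]
    return commands

def run_model(cached, real_base):
    """Per-run step: patch fake -> real addresses, then dispatch."""
    patched = []
    for cmd in cached:
        c = dict(cmd)
        c["src"] = cmd["src"] - FAKE_BASE + real_base
        c["dst"] = cmd["dst"] - FAKE_BASE + real_base
        patched.append(c)   # a real driver would send this to hardware
    return patched
```

Because `run_model` copies each command before patching, the cached fake-address version survives unchanged for the next run, which is the point of the scheme.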
An embodiment of the present application provides a data processing method applicable to network models used in various artificial-intelligence application scenarios, including but not limited to: text processing, speech recognition and processing, multilingual translation, image recognition, biometric recognition, and intelligent control. The data processing method can be applied to a driver program and to an AI chip, and the AI chip may be a homogeneous processor or a heterogeneous processor.
For better understanding, the data processing method provided in the embodiment of the present application is described below with reference to FIG. 1, FIG. 2, and FIG. 3.
S1: obtain the computational graph of the network model to be run.
Before the operations of the network model are translated into hardware execution commands, the computational graph of the network model to be run and translated is obtained.
In the field of artificial intelligence, a computational graph is a common representation of a computation process, often used to express the computational logic of a neural network model, and is widely used on various data processing platforms. Each node in the computational graph represents an operation (i.e., an operator) that the network model needs to perform, and the directed edges between nodes represent the dependencies between the operations corresponding to the nodes. After the operations (or operators) in the computational graph are translated into hardware execution commands, the commands are sent to the corresponding hardware devices for execution, thereby completing the execution of the network model. The operators corresponding to the nodes may be defined at the granularity of algebraic operators (such as vector addition, subtraction, multiplication, division, and matrix multiplication); when the abstraction granularity of the operators is low, the computational graph of a network model may include many nodes (for example, several thousand).
The computational graph of the network model to be run and translated, obtained in step S1, may be the original computational graph or an optimized computational graph, for example a graph obtained after operator fusion. After the network structure of the network model is converted into the original computational graph, the graph may be optimized one or more times to obtain the optimized computational graph.
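As a concrete illustration of the structure just described, a computational graph can be reduced to operations plus dependency edges, with a valid execution order given by a topological sort. This is a minimal, framework-agnostic sketch; the function name and data shapes are invented here.

```python
def topo_order(nodes, edges):
    """nodes: list of op names; edges: (producer, consumer) dependency pairs.
    Returns one valid execution order of the graph's operations."""
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = [n for n in nodes if indeg[n] == 0]   # ops with no pending inputs
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle")
    return order
```

For example, a conv -> pool -> act chain with a residual add consuming both conv and act is ordered so that every operation runs only after its producers.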
In one implementation, the AI chip may obtain the computational graph of the network model directly or indirectly, as long as the structure of the network model can be determined and the operations the network model needs to implement are known. The AI chip is provided with a corresponding driver program, which may be deployed in the kernel of the AI chip.
S2: translate each operation in the computational graph of the network model into hardware execution commands that the target hardware device of the AI chip can execute.
The translation process of S2 may be performed by the driver program corresponding to the AI chip.
翻译得到的硬件执行命令中可包含目标硬件设备的设备信息(如设备标识),用于表示该硬件执行命令可以由哪个硬件设备来执行,不同的目标硬件设备对应的设备信息不同。网络模型中的操作被翻译后,得到的硬件执行命令可以在需要运行该硬件执行命令对应的网络模型时,被提供给对应的硬件设备执行。The translated hardware execution command may contain the device information of the target hardware device (such as the device identification), which is used to indicate which hardware device can execute the hardware execution command. Different target hardware devices have different corresponding device information. After the operation in the network model is translated, the obtained hardware execution command can be provided to the corresponding hardware device for execution when the network model corresponding to the hardware execution command needs to be run.
目标硬件设备是指运行该硬件执行命令的硬件设备,是期望能够具有运行该网络模型这一能力的硬件对象。一个AI芯片可能会涉及多个硬件设备。The target hardware device refers to the hardware device that runs the hardware execution command and is the hardware object that is expected to have the ability to run the network model. An AI chip may involve multiple hardware devices.
For example, the AI chip may be a dedicated compute acceleration chip (or accelerator) designed for heavy computing workloads, such as a Graphics Processing Unit (GPU) or a Tensor Processing Unit (TPU); it may also be another homogeneous or heterogeneous processor.
Optionally, one AI chip may contain multiple hardware devices, any of which may serve as the target hardware device according to actual needs. A hardware device may contain multiple kinds of hardware execution units. For example, a hardware device in an AI chip may include, but is not limited to: a first unit for general-purpose computing (CU, Compute engine Unit), a second unit for AI-accelerated computing (TU, Tensor Unit), and a third unit for data transfer (DMA, Direct Memory Access). A hardware device in an AI chip may also be regarded as a compute cluster containing multiple hardware execution units. Different types of hardware devices may differ in both the number and the kinds of hardware execution units they contain; the specific hardware architecture should not be construed as limiting the method embodiments of the present application.
In an optional implementation, S2 may include: compiling the source code of each operation in the computation graph of the network model into instructions (hardware machine instructions) and obtaining the related information required by the target hardware device to perform each operation; and generating the hardware execution commands according to the instructions corresponding to each operation and the related information required to perform it. For example, the related information required by the target hardware device to perform an operation may reflect, for that operation: the address and length of the hardware instructions, how many memory addresses the instructions need to operate on, where those memory addresses are located, how large the memory is, the processing order between instructions, and so on.
For example, a preset first API (Application Programming Interface) function may be used to compile the source code of each operation in the computation graph into instructions, a preset second API function may be used to obtain the related information required by the target hardware device to perform each operation, and a preset third API function may be used to generate the hardware execution commands from the instructions of each operation and the related information required to perform it. The hardware execution commands corresponding to each operation of the network model may be generated in advance, and hundreds of hardware execution commands may be generated for a single operation.
The computation graph of a network model contains many different operations (also called operators; each operator has its own characteristics and performs a specific function), such as convolution, pooling, and activation functions. To translate the operations in the computation graph into hardware execution commands that the hardware can execute, the driver provides a set of relatively general API functions, such as an object-creation API, an instruction-compilation API, a memory-allocation API, a data-transfer API, and an execution API.
For example, for each operation of the network model, the driver provides a programmable language with C++-like syntax in which the source code of the operation can be written. The driver also uses the preset first API functions (such as the object-creation API and the instruction-compilation API) to compile, through a compiler, the source code of an operation in the computation graph into the hardware instructions corresponding to that operation. The implementation details of compiling source code into instructions with a compiler are well known in the art and are not described here.
Each operation needs operands, i.e., data to operate on. For example, the convolution operation convolves input data with weights, so a memory-allocation API provided by the driver can be used to allocate a block of memory for the convolution. In addition, some operations may involve moving data, so the data-transfer API provided by the driver is used to move data during computation. The driver can obtain the related information required by the target hardware device to perform each operation by using the preset second API functions (such as the aforementioned memory-allocation API and data-transfer API), and can use the preset third API function (such as the execution API) to generate the hardware execution commands from the instructions of each operation and the related information required to perform it. How to organize the instructions and related information of a single operation into the hardware execution commands for that operation is well known in the art and is not described here.
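The three-stage translation described above can be sketched as follows. All function names and data shapes here are illustrative stand-ins for the preset first, second, and third API functions; they are not a real driver API:

```python
# Hypothetical sketch of the S2 translation pipeline: compile each
# operation's source into instructions (first API), gather the related
# information such as buffers (second API), and assemble per-operation
# hardware execution commands (third API). Names are assumptions.

def compile_op(op_name):                      # stands in for the first API
    return f"machine-code({op_name})"

def query_related_info(op_name):              # stands in for the second API
    # e.g. instruction length and the memory buffers the op touches
    return {"instr_len": 64, "buffers": [f"{op_name}_in", f"{op_name}_out"]}

def build_commands(op_name):                  # stands in for the third API
    instr = compile_op(op_name)
    info = query_related_info(op_name)
    # one operation may expand into many commands; here, one per buffer
    return [{"op": op_name, "instr": instr, "buffer": buf,
             "len": info["instr_len"]} for buf in info["buffers"]]

commands = [cmd for op in ["conv", "pool"] for cmd in build_commands(op)]
```

The point of the sketch is the division of labor: compilation, information gathering, and command assembly are separate API calls whose outputs are combined per operation.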
With the data processing method provided in the embodiments of the present application, the process of generating the hardware execution commands for a network model needs to be performed only once. The generated commands are first cached, for example, stored in a constructed network execution graph; each time the model is to be executed, the commands stored in the graph are dispatched so that the hardware executes them. There is no need, as in the related art, to repeat the process many times, sending the generated commands to the hardware immediately each time (in the related art, a translation/conversion pass is needed before every dispatch to generate the required commands, so many dispatches require many translations).
S3: Store the hardware execution commands in a network execution graph.
After each operation of the network model is translated into hardware execution commands executable by the corresponding target hardware device, the translated commands are stored; in one implementation, a constructed network execution graph may be used for storage.
The network execution graph also reflects the computing logic of the network model and can be regarded as a new kind of computation graph, but unlike the original computation graph of the network model it does not need to record or store the source code of each operation.
The network execution graph records all hardware execution commands generated for the network model, and may also record key information for each command. The key information may include the start address, the offset, and the command execution order; from the start address and offset, the length and storage location of the command can be determined. The key information is used to obtain the hardware execution commands: the target hardware device can fetch a command according to its key information.
The network execution graph stores all commands of the network model that the hardware needs to execute. After the hardware execution commands are stored in the constructed network execution graph (this process converts the operations contained in the network model, including their characteristic parameters, into commands that the hardware can recognize and execute), and once the graph or the commands in it are provided to the target hardware device, the target hardware device can run the network model based on the commands cached in the graph.
A storage device in the AI chip may store the hardware execution commands in the network execution graph. The network execution graph may or may not reside on the target hardware device; for example, it may reside on a storage device connected to the target hardware device.
In the embodiments of the present application, each operation in the computation graph of the network model is first translated into hardware execution commands executable by the target hardware device, but the commands are not sent to the device immediately; instead, they are first stored in the network execution graph. Each subsequent time the network model needs to be run, the pre-stored commands are simply dispatched to the corresponding hardware, without re-translating the operations of the computation graph into hardware execution commands. This alleviates the problem that the processor incurs a large performance overhead and a long delay every time it runs the network model.
In an optional implementation, storing the translated hardware execution commands in the network execution graph may proceed as follows: the commands corresponding to each operation are stored in the graph in the execution order of the operations contained in the network model, and the key information of each command is recorded accordingly. Compared with random storage, storing the commands in execution order improves the efficiency of later command execution: since the commands must later be executed in the execution order of the operations contained in the network model to guarantee its correct operation, storing them in that order means that, at execution time, the commands can simply be dispatched in storage order.
By storing the commands corresponding to each operation in the execution order of the operations and recording the key information of each command, the computing logic of the network (the execution order of the operations in the network model) can later be quickly determined from the network execution graph. When executing the network model, the corresponding hardware execution commands can then be sent to the target hardware device in order, according to the key information recorded in the graph and the execution order of the operations, thereby avoiding execution-logic errors and improving efficiency.
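A minimal data-structure sketch of such a network execution graph follows. The field names (start address, offset used as command length, sequence index) are illustrative assumptions drawn from the key information described above:

```python
# Minimal sketch of a network execution graph: command payloads are
# appended in the operations' execution order, and key information
# (start address, offset/length, order index) is recorded per command.
# Field names are illustrative, not taken from any real driver.

class NetworkExecutionGraph:
    def __init__(self):
        self.commands = []      # command payloads, in execution order
        self.key_info = []      # one key-information record per command

    def append(self, payload, start_addr):
        seq = len(self.commands)            # dispatch position of this command
        self.key_info.append({"start": start_addr,
                              "offset": len(payload),  # command length
                              "order": seq})
        self.commands.append(payload)
        return seq

graph = NetworkExecutionGraph()
graph.append(b"\x01\x02\x03\x04", start_addr=0x1000)
graph.append(b"\x05\x06", start_addr=0x1004)
```

Dispatch then simply walks `commands` front to back, which reproduces the operations' execution order without consulting the original computation graph.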
Since a network model performs the same computing operations on every execution (only the input data and output results differ), all operations contained in the model can be translated in advance by the driver into a sequence of commands and stored; each time the network model is executed, the hardware execution commands need only be fine-tuned.
For example, in a scenario where an AI model is run to recognize different face images, two face recognition passes use the same AI model, so each time the AI model is executed, the essential computing logic of the network execution graph used (including the generated and cached hardware execution commands) is unchanged. When a new face image needs to be recognized after a previous one, only the input/output-related content and parameters in some of the commands need to be fine-tuned: for example, the read address used to fetch input data is replaced with the address of the new data, and/or the write address used to store output data is replaced with a new write address, so that the same AI model processes the new input data and stores the corresponding output at a new location. This greatly reduces the burden on the processor and improves data processing efficiency.
This method also applies to scenarios with multiple network models. When there are multiple network models, for each network model the operations it contains can be translated into hardware execution commands executable by the corresponding target hardware device, with the commands containing the device information of that device, and the translated commands stored. Each network model corresponds to a unique network execution graph. Because the operations contained in the different network models are translated into hardware execution commands and stored, whenever a particular network model needs to be executed later (the required model can be selected according to the task to be processed), the pre-stored commands corresponding to that model can be selected and dispatched so that the hardware executes the commands of that model.
Optionally, the data processing method may further include S4: when the network model needs to be run to process input data, obtain the pre-stored hardware execution commands and send them to the target hardware device for execution, so that the network model runs on the target hardware device. For example, the corresponding commands are sent to the target hardware device in the execution order of the operations contained in the network model, thereby supporting running the network model on the target hardware device to process the input data.
When all hardware execution commands corresponding to the network model have been stored in advance, subsequent runs of the network model on input data can dispatch commands directly from the pre-stored commands so that the hardware executes them.
Sending the hardware execution commands to the target hardware device for execution may include: modifying the read address used to fetch input data in a command, and/or modifying the write address used to store output data in a command; and sending the modified commands to the target hardware device so that it executes them, thereby running the network model on the hardware to process the input data. Optionally, the modified commands may be sent to the target hardware device in the execution order of the operations contained in the network model, so that the target hardware device executes the modified commands.
By modifying the read address used to fetch input data and/or the write address used to store output data in the hardware execution commands, and then sending the modified commands to the target hardware device, successive executions of the network model can fetch input data from different locations and store output data at different locations, providing greater flexibility.
In this way, hardware execution commands generated for one target hardware device can be quickly extended to other hardware devices in the AI chip. When multiple hardware devices need to run the network model in parallel, the operations of the computation graph of the network model do not have to be re-translated into commands separately for each device, further reducing processor overhead and improving data processing efficiency.
Suppose that in the previous execution of the network model the input data was stored at location A and the output data at location B. If the model now needs to process input data stored at location C, the read address used to fetch input data in the hardware execution commands can be changed from A to C, and the write address used to store output data can be changed, for example, from B to D. The modified commands are then sent to the target hardware device in the execution order of the operations contained in the network model, so that when the target hardware device runs the network model it processes the input data stored at C and stores the resulting output data at D.
If the fetch address of the input data is unchanged (for example, the previous input data was at location A and the new input data is also placed at A, replacing the original data), the read address in the hardware execution commands need not be modified. Likewise, if the storage address of the output data is unchanged (for example, the output of processing the new input is still to be stored at B), the write address need not be modified. If the output of processing the new input at A is to be stored at D instead, the write address in the commands can be changed from B to D.
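The address fine-tuning described in the preceding paragraphs can be sketched as a small patching pass over the cached commands. The command representation (dicts with `read_addr`/`write_addr` fields) is an illustrative assumption:

```python
# Hedged sketch of fine-tuning cached commands before dispatch: only
# the input read address and/or output write address are patched; the
# cached command sequence itself is reused unchanged.

def patch_addresses(commands, new_read=None, new_write=None):
    patched = []
    for cmd in commands:
        cmd = dict(cmd)                       # keep the cached copy intact
        if new_read is not None and "read_addr" in cmd:
            cmd["read_addr"] = new_read       # e.g. location A -> location C
        if new_write is not None and "write_addr" in cmd:
            cmd["write_addr"] = new_write     # e.g. location B -> location D
        patched.append(cmd)
    return patched

cached = [{"op": "load", "read_addr": 0xA000},
          {"op": "conv"},
          {"op": "store", "write_addr": 0xB000}]
ready = patch_addresses(cached, new_read=0xC000, new_write=0xD000)
```

Passing `None` for either argument leaves that address untouched, matching the case above where the fetch or storage address is unchanged between runs.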
The process described above concerns a single hardware device inside the AI chip. In one implementation, if the AI chip contains multiple parallel hardware devices, then to support running the network model on multiple parallel hardware devices the data processing method may further include: S5: replicate the hardware execution commands obtained in S2 according to the total number of hardware devices in the AI chip; S6: according to the device information of the hardware devices in the AI chip other than the target hardware device, modify the device information contained in the replicated commands to obtain commands with modified device information, where a command with modified device information can be provided to the corresponding other hardware device for execution.
By replicating the hardware execution commands into a specified number of copies, the specified number being determined by the total number of hardware devices in the AI chip, and modifying each copy according to the device information of a hardware device other than the target hardware device (i.e., modifying the device information in that copy), each modified copy can be run by the corresponding other hardware device.
After the commands with modified device information are obtained, the commands generated for the network model can be sent, according to the principle of device-information matching, to the hardware devices in the AI chip whose device information matches, so that every hardware device in the AI chip obtains commands it can execute and can therefore run the network model. Commands may also be distributed by sending the network execution graph: the graph is replicated into multiple copies, the device information in each copy is modified, and the copies with different device information are sent to the hardware devices in the AI chip whose device information matches.
Suppose the total number of hardware devices in the AI chip capable of running the network model is 3 (denoted hardware device 0, hardware device 1, and hardware device 2, with hardware device 0 being the target hardware device). The hardware execution commands generated for the target hardware device can then be copied twice. One copy is modified according to the device information of hardware device 1 so that the modified copy can be executed by hardware device 1; likewise, the other copy is modified according to the device information of hardware device 2 so that it can be executed by hardware device 2. The hardware execution commands already translated for the network model are thus extended within the same chip, so that every hardware device in the chip can run the network model.
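The replicate-and-retarget steps of S5/S6 can be sketched as follows. The command representation and the idea that device information is a single `device` field are illustrative assumptions; a real driver would patch device identifiers embedded in the command encoding:

```python
# Illustrative sketch of S5/S6: replicate the commands cached for the
# target device (device 0) and rewrite only the device information so
# each copy matches another hardware device in the same chip.

def extend_to_devices(commands, all_device_ids, target_id=0):
    per_device = {target_id: commands}        # original copy stays as-is
    for dev in all_device_ids:
        if dev == target_id:
            continue
        # replicate, then fine-tune the device information of the copy
        per_device[dev] = [dict(cmd, device=dev) for cmd in commands]
    return per_device

cached = [{"op": "conv", "device": 0}, {"op": "pool", "device": 0}]
per_device = extend_to_devices(cached, all_device_ids=[0, 1, 2])
```

Each entry of `per_device` can then be dispatched to the device whose information it matches, so translation still happens only once for the whole chip.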
In this implementation, hardware execution commands generated for one hardware device can be quickly extended to other hardware devices. In one application scenario, the network execution graph is first copied to the other hardware devices, and then the device-related information in each copy is modified (the graph is fine-tuned). Based on one set of generated and cached hardware execution commands, multiple copies are made and cached on multiple hardware devices, and the information in each copied set is modified for the matched device. The modification here is a fine-tuning whose purpose is to adapt the copied commands to each hardware device. For example, if a set of hardware execution commands has been generated and cached for hardware device 0 and an AI model, then after these commands are copied to hardware devices 1, 2, and 3, the three copies are modified into commands suitable for hardware device 1, hardware device 2, and hardware device 3, respectively.
Without the above command-extension approach, one would have to designate a hardware device, generate a set of hardware execution commands for it, then designate another device and generate another set for it, which is repetition at another level. Generating commands separately for different devices multiple times also affects processor performance to some extent, resulting in high power consumption and low efficiency.
Optionally, the number of hardware devices that need to run the network model at the current moment may be configured manually or determined from the amount of data to be processed. In the latter case, the data processing method may further include: determining, from the amount of data to be processed, a first number of hardware devices currently needed to run the network model. The first number is less than or equal to the aforementioned total number; running the network model does not necessarily use every hardware device in the chip, and the number of devices actually needed can be decided according to the actual application scenario.
Some or all of the hardware devices can be selected to run the network model, based on the actual amount of data to be processed and the total number of hardware devices that support running the network model, so as to process the input data. This determines the required number of hardware devices as reasonably as possible and improves processing efficiency.
When the amount of data to be processed is small, one hardware device may suffice; when the workload is large, multiple hardware devices may need to run in parallel. For example, when an AI model is used to recognize CT (Computed Tomography) images in a medical scenario, it may only be necessary to recognize one image or a small amount of image data, in which case one hardware device running the AI network meets the computing demand. For scenarios requiring a large amount of recognition in a short time, i.e., where a large amount of image data must be recognized, or where the computing results must be produced in real time, multiple hardware devices can run the AI network in parallel.
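One way to pick the first number of devices from the workload, consistent with the description above, is a simple capacity rule. The per-device capacity parameter and the ceiling-division policy are assumptions for illustration only:

```python
# Hypothetical policy for choosing how many devices should run the
# model: scale with the pending workload, capped by the chip's total
# device count and never below one device.

def devices_needed(num_items, items_per_device, total_devices):
    # ceiling division of the workload across devices
    needed = -(-num_items // items_per_device)
    return max(1, min(needed, total_devices))
```

With 3 devices and a capacity of 100 items each, a single CT image uses one device, while a 250-image batch engages all three.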
In addition, in current scenarios where an AI chip (which may be any of various processors) uses network models for data processing, the processor occupies memory space whenever it generates hardware execution commands (referred to simply as hardware commands or commands) for a network model, and the data required by different network models occupies different amounts of memory. Memory can therefore easily run short, preventing the hardware from executing the network model as expected and potentially degrading hardware performance. For this reason, embodiments of the present application propose a command generation method to mitigate the problem that generating hardware execution commands for a network model consumes a large amount of memory each time and easily exhausts available memory.
Based on the characteristics of network models, the inventors propose the following exemplary embodiments to address the above problem. For clarity, one possible approach provided by an embodiment of the present application is described below with reference to Figure 2. In an optional implementation, translating the operations in the computational graph of the network model into hardware execution commands executable by the target hardware device of the AI chip may include the following S21 and S22.
S21: Allocate a corresponding fake memory space for the network model.
In this embodiment, before generating the corresponding first hardware execution commands for the operations included in the network model, a corresponding fake memory space (which may be denoted as fake memory) is allocated for the network model. As one implementation, S21 may be executed by a kernel in the AI chip.
In one implementation, a fake memory space can be allocated whose size corresponds to the size of the data required to execute the network model (this data includes the input data to be processed by the network model, and may also include the network model's own feature data, such as weights and parameters). Because the fake memory space is sized according to the data required to execute the network model, the requirements for generating commands can be met without occupying the hardware's real memory resources.
It should be noted that creating or allocating a fake memory space can be regarded as occupying essentially no physical storage. Even if a small amount of real physical storage must be used to allocate, record, or mark the fake memory space, the total is only on the order of 1 KB to a few KB (this figure is merely illustrative). Allocating fake memory does not occupy the real memory of the hardware device that will run the network model; the bookkeeping involved may occupy a small amount of storage on hardware that does not need to run the model, but since that amount is tiny (around 1 KB, for example), it can be neglected and treated as occupying no real memory and no real physical storage.
For example, if the data required to execute a network model is 2 GB, then when allocating the fake memory resource, a fake memory space of 2 GB can be allocated. Like real memory, every storage row in the fake memory space has an independent address, and the size of the fake memory space matches the amount of real memory that caching the data required to execute the network model would be expected to occupy. No actual 2 GB of physical memory is allocated: when fake memory is allocated, the data required to execute the network model is not actually loaded (or written) into the real memory of the hardware device that will run the model, so no real memory resources of the hardware are occupied.
It should be understood that the fake memory space has the same attributes as a real memory space: it can have a size (the fake and real spaces can be equal in size) and independent addresses (the address format and lookup scheme can be designed just like real memory). The only difference is that the fake memory space is not a physically existing storage space. The fake memory allocated here is neither real physical memory nor conventional virtual memory (sometimes called logical memory) that must be mapped to physical memory. The addresses held by the fake memory can be regarded as fake addresses, deliberately created and allocated to satisfy the need to generate hardware execution commands.
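The fake memory described above can be sketched as pure address bookkeeping: an address range with real-memory-like attributes (size, independent addresses) but no backing storage. The class and field names below are illustrative assumptions, not the actual driver implementation.

```python
class FakeMemory:
    """Sketch of a fake memory space: hands out fake addresses in a
    real-address-like format without allocating backing physical storage."""

    def __init__(self, size_bytes: int, base: int = 0x4000_0000):
        self.base = base      # fake base address (format mimics a real one)
        self.size = size_bytes
        self._next = base     # simple bump allocator over fake addresses

    def alloc(self, nbytes: int) -> int:
        """Return a fake address; no data is loaded or written anywhere."""
        if self._next + nbytes > self.base + self.size:
            raise MemoryError("fake memory space exhausted")
        addr = self._next
        self._next += nbytes
        return addr

# A 2 GB fake space costs only a few bookkeeping fields, not 2 GB of RAM.
fm = FakeMemory(2 * 1024 ** 3)
a = fm.alloc(4096)  # fake address for, e.g., an operator's input buffer
```

Allocation succeeds within the declared size and fails beyond it, mirroring how a real allocator would behave while consuming only a handful of bytes of actual storage.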
S22: Based on the fake memory space, translate each operation included in the network model into a corresponding first hardware execution command.
In this embodiment, each operation (i.e., each operator) included in the network model is translated, based on the fake memory space (denoted, for example, as fake memory), into corresponding first hardware execution commands; a single operation may yield hundreds of such commands. Because command generation is performed against the fake memory space, it occupies no real memory space. Thus, even when first hardware execution commands are generated for each of multiple network models, memory will not run short.
Each network model's hardware execution commands typically contain operation information such as the operations themselves, read addresses for fetching the data sources an operation needs, and write addresses for storing its results; generating a command therefore requires addresses for reading and writing data. In the present application, considering that practice does not always demand that a command be executed, with immediate memory reads and writes, as soon as it is generated, the hardware execution commands are generated from (the addresses of) the fake memory space, so that all addresses in the commands are fake addresses. When it is determined that certain commands need to be executed, the addresses in those commands can be replaced. This makes it possible to translate hardware execution commands for the operations of a network model in advance, which improves the model's execution efficiency while avoiding the excessive memory consumption that translating commands ahead of time would otherwise cause.
S22 may be executed by the kernel of the AI chip, in which a driver for translating the operations included in the network model into corresponding hardware execution commands can be deployed.
Optionally, the implementation of S22 may include: compiling the source code of each operation included in the network model into instructions corresponding to that operation, and obtaining, based on the fake memory space, the related information required to execute each operation; then generating the first hardware execution commands from each operation's instructions and the related information required to execute it.
In this implementation, the source code of each operation in the network model is compiled into instructions, the related information needed to execute each operation is obtained from the fake memory space, and the hardware execution commands are then generated from the instructions and that related information. The operations of the network model can thus be translated into hardware execution commands quickly and accurately, and because the related information is obtained from the fake memory space, the requirements for command generation are met without occupying the hardware's real memory resources.
To translate the operations of a network model into hardware execution commands that the hardware can run, the driver provides a set of fairly general API functions, such as a create-compilation-object API, a compile-instruction API, a create-memory API, a data-transfer API, and an execute API.
Illustratively, for each operation of the network model, the driver provides a programmable language similar in syntax to C++, in which the operation's source code can be written. Backed by a compiler, the driver uses preset first API functions (for example, the create-compilation-object API and the compile-instruction API) to compile the source code of an operation into the hardware instructions for that operation. The details of compiling source code into hardware instructions with a compiler are well known in the art and are not repeated here.
Each operation needs operands, that is, data to operate on. For example, the convolution operation convolves input data with weights; a create-memory API provided by the driver can allocate a block within the fake memory space and hand it to the convolution operator. Some operations may also involve moving data, so the driver provides a data-transfer API for moving data during computation. Using preset second API functions (for example, the aforementioned create-memory API and data-transfer API), the driver can obtain, based on the fake memory space, the related information the hardware device needs to execute each operation. Illustratively, the related information for one operation can reflect: the address and length of its instructions, how many memory addresses the instructions operate on, where those addresses are located, how large the memory regions are, the processing order among instructions, and so on. Finally, a preset third API function (for example, the execute API) can be used to generate the first hardware execution commands from each operation's instructions and the related information required to execute it.
In some application scenarios, if compiling an operation's source code into hardware instructions itself requires some memory information, and the instructions need not be executed immediately nor any actual data read, written, or loaded for the time being, the fake memory space described above can also be used during instruction compilation; when data reads, writes, or loads become necessary, the fake addresses in the instructions are replaced with real addresses.
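The S22 flow can be sketched end to end: compile an operation, reserve fake addresses for its operands, and emit first hardware execution commands carrying those fake addresses. Every function name below is an assumption standing in for the driver's real APIs (create-compilation-object, compile-instruction, create-memory, execute).

```python
def compile_instructions(op_source):
    """Stand-in for the create-compilation-object / compile-instruction
    APIs; a real compiler would emit hardware instructions here."""
    return [f"{op_source}:instr{i}" for i in range(3)]

# Minimal bump allocator over an assumed fake address range.
_next_fake = [0x4000_0000]
def fake_alloc(nbytes):
    addr = _next_fake[0]
    _next_fake[0] += nbytes
    return addr

def translate_op(op_source, in_bytes, out_bytes):
    """Translate one operation into first hardware execution commands.

    Read/write addresses come from fake memory; no data is loaded yet,
    so no real memory is consumed while the commands are generated.
    """
    instrs = compile_instructions(op_source)
    read_addr = fake_alloc(in_bytes)    # create-memory API on fake memory
    write_addr = fake_alloc(out_bytes)
    return [{"instr": ins, "read": read_addr, "write": write_addr}
            for ins in instrs]

cmds = translate_op("conv2d", 1 << 20, 1 << 20)
```

The resulting commands hold fake addresses only; they become executable later, once those addresses are swapped for real ones.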
Optionally, in addition to the operations, the read addresses for fetching each operation's data sources, and the write addresses for storing its results, the first hardware execution command may also carry device information of the hardware device (such as a device identifier) indicating which hardware device is to execute it; different hardware devices have different device information. A hardware device is a hardware object expected to be capable of running the network model, and one AI chip may involve multiple hardware devices. The hardware execution commands obtained by translating a network model can be supplied to the corresponding hardware device for execution whenever the model corresponding to those commands needs to be run.
When there are multiple network models, translating each model's operations into corresponding first hardware execution commands based on fake memory space may proceed as follows: for different network models, different fake memory spaces are used to translate each model's operations into its first hardware execution commands, with each network model having its own fake memory space (i.e., one fake memory space per network model). Because fake memory space does not really exist, even a large number of network models adds little memory consumption.
In this embodiment, different fake memory spaces are used for different network models, so that subsequent address translation causes no logical confusion, ensuring efficient command conversion.
In an optional implementation, storing the hardware execution commands using a network execution graph may include, after S22, S31: storing the first hardware execution commands.
After S22, the first hardware execution commands are stored for later use. As one implementation, they may be stored in a previously constructed network execution graph; for example, a storage device in the AI chip may store them using the graph. The network execution graph records all first hardware execution commands generated for the network model, and may also record key information for each command. This key information may include the start address, the offset, and the command execution order; from the start address and offset, the length and storage location of the command can be determined. The hardware device can retrieve the first hardware execution commands from this key information.
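The per-command key information (start address, offset, execution order) can be sketched as a small record structure. The class and field names are illustrative assumptions; the point is that start address plus offset is enough to recover a command's location and length.

```python
from dataclasses import dataclass, field

@dataclass
class ExecGraph:
    """Sketch of a network execution graph: one record of key information
    per stored first hardware execution command."""
    records: list = field(default_factory=list)

    def append(self, start_addr, length):
        # Insertion order doubles as the command execution order here.
        order = len(self.records)
        self.records.append({"start": start_addr,
                             "offset": length,
                             "order": order})

    def locate(self, order):
        """Recover a command's storage span from its key information."""
        r = self.records[order]
        return r["start"], r["start"] + r["offset"]

g = ExecGraph()
g.append(0x1000, 64)
g.append(0x1040, 128)
```

A device looks up a command by execution order and reads exactly `offset` bytes starting at `start`.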
In this embodiment, each operation (i.e., each operator) included in the network model is translated into corresponding first hardware execution commands based on the fake memory space (denoted, for example, as fake memory). Because the process runs against the fake memory space, command generation does not consume large amounts of real memory. Thus, even when many commands are generated for one network model, or first hardware execution commands are generated for several models, real memory will not run short because of command generation. Furthermore, this method supports translating hardware execution commands for network models in advance, which helps generate each model's commands ahead of time for one or more models while avoiding the excessive use of limited memory resources that advance translation would otherwise cause.
Optionally, given an inherent feature of network models, namely that the model itself is fixed while only the input data differs between runs (with different inputs possibly yielding different outputs), the operations of the network model can first be translated into first hardware execution commands executable by the corresponding hardware device, but not immediately sent for execution. Instead, the translated first hardware execution commands are stored, so that each later time the model is needed to process input data, it need not be retranslated; only minor adjustment is required, replacing the addresses in the first hardware execution commands, for example modifying the address information related to the input and output data. The driver does not have to retranslate the model's operations into first hardware execution commands, saving the performance overhead the processor would otherwise incur on every run of the model.
Because the first hardware execution commands above are generated from the fake memory space, all addresses in them are fake. Although fake addresses can be looked up and used while generating commands, they cannot actually store loaded data during command execution. Therefore, when the network model later needs to be executed, the command generation method may further include: loading the data required to execute the network model into real memory space, replacing the fake addresses in the first hardware execution commands with the real addresses of that real memory space, and sending the replaced first hardware execution commands, as second hardware execution commands, to the corresponding hardware device for execution.
Replacing the fake addresses in the first hardware execution commands with real addresses may include: examining the first hardware execution commands to determine which of them (some or all) currently contain fake addresses, taking those as target commands; and replacing the fake addresses in the target commands with the real addresses of the real memory space. During replacement, it is necessary to identify which first hardware execution commands use fake addresses; once identified, the fake addresses in some or all of them can be replaced with the corresponding real addresses.
In this embodiment, during address replacement the first hardware execution commands are examined, and only those containing fake addresses are replaced, avoiding erroneous or missed replacements.
The identification may yield the following results: 1) when none of the first hardware execution commands has been executed, the addresses in all of them are fake; 2) while the network model is executing, some commands are being executed by the hardware device, and the addresses of those commands may already have been replaced with real (valid) addresses, while other, not-yet-executed commands may still hold fake addresses.
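The identify-and-replace step can be sketched as follows. The command layout and the identification rule (an address is fake if it falls inside the fake address range) are assumptions for illustration; already-patched commands pass through unchanged, matching result 2) above.

```python
# Assumed fake address range used when the commands were generated.
FAKE_BASE, FAKE_SIZE = 0x4000_0000, 0x1000_0000

def is_fake(addr):
    return FAKE_BASE <= addr < FAKE_BASE + FAKE_SIZE

def patch_commands(cmds, real_base):
    """Replace fake read/write addresses with real ones; commands that
    already hold real addresses are left untouched."""
    patched = []
    for c in cmds:
        c = dict(c)  # leave the cached original intact
        for key in ("read", "write"):
            if is_fake(c[key]):
                c[key] = real_base + (c[key] - FAKE_BASE)
        patched.append(c)
    return patched

cmds = [{"instr": "conv", "read": FAKE_BASE, "write": FAKE_BASE + 0x100}]
patched = patch_commands(cmds, real_base=0x8000_0000)
```

After patching, the commands reference real memory and can be dispatched as second hardware execution commands.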
When multiple network models are involved, the fake addresses in the first hardware execution commands of all models may be replaced at once, or only those of some of the models.
Whether to replace the addresses of all commands or only some can be decided from optional factors such as the currently remaining available memory, the processing progress of the network models, the amount of pending data, and the processing capacity the chip supports. The present application places no requirement on the number of commands replaced per pass or on the operation types they correspond to: each pass may replace the fake addresses in a batch of commands corresponding to some operations of one network, or replace the fake addresses in all commands of an entire network model (or several models) at once.
For example, if a task requires two network models, then during replacement, if the currently remaining available memory or the chip's processing capacity supports processing both models' data at once, the fake addresses in both models' first hardware execution commands can be replaced in one pass; if replacing both models' fake addresses with real addresses at once is not currently supported, replacement can proceed in batches, first replacing the addresses of one model's commands and then those of the other's.
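The batching decision in the example above can be sketched as a greedy grouping by available memory. The function and its inputs are illustrative assumptions; real drivers would also weigh processing progress and chip capacity, as noted earlier.

```python
def plan_replacement(model_mem, free_mem):
    """Greedily group models whose commands can have their addresses
    replaced in one pass, given the real memory each model's data needs.

    model_mem: list of (model_name, required_bytes) — illustrative only.
    """
    batches, current, used = [], [], 0
    for name, need in model_mem:
        if current and used + need > free_mem:
            batches.append(current)   # flush: next pass starts here
            current, used = [], 0
        current.append(name)
        used += need
    if current:
        batches.append(current)
    return batches

# Two 2 GB models but only 3 GB free: replace one model per pass.
batches = plan_replacement([("net_a", 2 << 30), ("net_b", 2 << 30)], 3 << 30)
```

With enough free memory the same call returns a single batch, matching the one-pass replacement case in the example.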
It should be understood that the "real memory" referred to in this application is physical memory, and a "real address" is a physical address possessed by a physical storage medium, whereas a "fake address" is not a physical address but an artificial address that can be designed with attributes or a format similar to a physical address. In the step of replacing the fake addresses in the first hardware execution commands with the real addresses of real memory space, the "real address of real memory space" used for replacement may be either a physical address of physical memory or an address of virtual memory for which a mapping to physical memory has been established in advance; relative to the fake memory of this application, such virtual memory also has real addresses and occupies physical storage. In general, virtual memory mapping can establish a mapping between one physical storage space (possibly external storage or physical memory) and another (usually physical memory), logically associating originally discontinuous physical addresses so that scattered, unrelated, unordered physical addresses become logically related and ordered in some scenarios; actual loading, reading, and writing of data can also be completed through this mapping. As for the real address used to replace a fake address, whether it is a physical-memory address or another physical address mapped to physical memory in advance, it suffices that the replaced first hardware execution command can be executed correctly.
The cached first hardware execution commands of each network model can be processed and executed on demand. For example, if first hardware execution commands for 20 network models are cached but only the commands of one model currently need to run, replacement may temporarily be done only for those commands, and the resulting new commands (which may be called second hardware execution commands) dispatched to the specific hardware device for execution.
Furthermore, optionally, the real addresses in a second hardware execution command can be replaced with the fake addresses of the fake memory space, so that the second hardware execution command again becomes a first hardware execution command containing fake addresses (i.e., the addresses of the address-replaced first hardware execution command are changed back to fake addresses), thereby releasing the corresponding physical memory resources. After the replaced first hardware execution commands have been sent to the corresponding hardware device as second hardware execution commands, when a network model no longer needs to be executed, the command generation method may further include: upon determining that the network model will not be executed within a preset time period starting from the current moment, replacing, based on the fake memory space, the real addresses in the model's second hardware execution commands with fake addresses, and caching the resulting commands for use the next time the same network model must be run.
Through this implementation, after the replaced first hardware execution commands are sent to the corresponding hardware device as second hardware execution commands, the real addresses in the commands are replaced with the fake addresses of the fake memory space and the commands are cached, freeing part of the corresponding real memory space.
It should be understood that the addresses in a first hardware execution command are all fake addresses, and the addresses in a second hardware execution command are all real addresses. If the fake addresses of a first hardware execution command are replaced with real addresses, the replaced command is a second hardware execution command; likewise, after the real addresses in the replaced first hardware execution command (the second hardware execution command) are changed back to fake addresses, the first hardware execution command is obtained again.
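The reverse replacement (second command back to first command) is the mirror image of the forward patch, and under the offset-based scheme assumed here it is an exact round trip. The base constants and command layout are illustrative assumptions.

```python
# Assumed fake and real base addresses; the fake space was laid out to
# mirror the real one, so the conversion is a fixed offset either way.
FAKE_BASE, REAL_BASE = 0x4000_0000, 0x8000_0000

def to_fake(cmd):
    """Turn a second hardware execution command back into a first one so
    the backing real memory can be released; the result is cacheable."""
    return dict(cmd,
                read=FAKE_BASE + (cmd["read"] - REAL_BASE),
                write=FAKE_BASE + (cmd["write"] - REAL_BASE))

second = {"instr": "conv", "read": REAL_BASE, "write": REAL_BASE + 0x100}
first_again = to_fake(second)  # identical to the originally generated command
```

Caching `first_again` lets the next run of the same model skip retranslation entirely: only the fake-to-real patch is repeated.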
Based on the same inventive concept as the above data processing method for alleviating the problem that the processor runs short of memory each time it generates hardware execution commands for a network model, before the operations included in the network model are translated into corresponding first hardware execution commands based on the false memory space, the data processing method of this embodiment of the present application further includes additional steps, as shown in Figure 3. The principle is described below with reference to Figure 3.
S210: Determine whether the network model will be executed within a preset time period starting from the current moment.
It is determined whether the network model will be executed or used within a preset time period starting from the current moment; if it is determined that the network model is to be executed within the preset time period, S220 is executed, and if it is determined that the network model will not be executed within the preset time period, S240 is executed. The preset time period can be set according to actual needs, for example in minutes or hours.
Through this implementation, the operations included in the network model are translated into corresponding first hardware execution commands based on the false memory space only when it is determined that the network model will not be executed within the preset time period starting from the current moment. This makes it possible to translate the network model in advance without reducing its processing efficiency, improves translation efficiency, and helps to improve the overall processing efficiency for the network model.
S220: Translate, based on the real memory space, each operation included in the network model into a corresponding second hardware execution command.
When it is determined that the network model is to be executed within the preset time period starting from the current moment, each operation included in the network model is translated into a corresponding second hardware execution command based on the real memory space. The addresses contained in the second hardware execution command are all real addresses, and the real memory space is used to store the data required when the network model is executed.
Through this implementation, when it is determined that the network model is to be executed within the preset time period starting from the current moment, each operation included in the network model is translated directly into a corresponding second hardware execution command based on the real memory space. When the network model needs to be executed as soon as possible, this avoids first generating a first hardware execution command based on the false memory space and then having to convert the addresses in that command into the real addresses required for execution, thereby improving the command translation efficiency and processing efficiency of the network model about to be executed.
When it is determined that the network model is to be executed within the preset time period starting from the current moment, a real memory space matching the size of the data required to execute the network model can be allocated, and each operation included in the network model is translated into a corresponding second hardware execution command on that basis. At the same time, the data required to execute the network model (which includes the input data to be processed by the network model and may also include the characteristic data of the network model itself, such as weights and parameters) is loaded into the real memory space, so that after the operations are translated, the second hardware execution commands can be sent directly to the corresponding hardware device, which executes them to run the network model.
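The sizing step above can be sketched as follows; the field names and the use of a `bytearray` standing in for device memory are illustrative assumptions, since the text does not fix a concrete data layout.

```python
def required_bytes(model):
    # Real memory must hold the input data plus the model's own
    # characteristic data (e.g. weights, parameters).
    return model["input_bytes"] + sum(model["weight_bytes"])

model = {"input_bytes": 1024, "weight_bytes": [4096, 2048]}
real_mem = bytearray(required_bytes(model))   # allocate a matching real space
assert len(real_mem) == 7168                  # 1024 + 4096 + 2048
```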
S230: Store the second hardware execution command.
After each operation included in the network model has been translated into a corresponding second hardware execution command based on the real memory space, the second hardware execution command is stored. When execution is needed later, the second hardware execution command is sent directly to the corresponding hardware device, which executes these second hardware execution commands to run the network model.
The implementation principle of S230 is the same as that of S31 in Figure 2, the difference being that S31 stores the first hardware execution command, whereas this step stores the second hardware execution command. The second hardware execution command can also be stored in the network execution graph.
S240: Translate, based on the false memory space, each operation included in the network model into a corresponding first hardware execution command.
When it is determined that the network model will not be executed within the preset time period starting from the current moment, each operation included in the network model is translated into a corresponding first hardware execution command based on the false memory space.
S250: Store the first hardware execution command.
The implementation principles of S240 and S250 are the same as those of S22 and S31 in Figure 2, respectively, and are not repeated here.
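The S210–S250 decision flow can be summarized with the hedged sketch below. `translate`, the memory-base arguments, and the command store are hypothetical stand-ins; the text does not prescribe a concrete interface.

```python
def translate(model, mem_base):
    # Stand-in for translating each operation into a hardware command whose
    # addresses are offsets from the given (real or false) memory base.
    return [(op, mem_base + i * 64) for i, op in enumerate(model["ops"])]

def prepare_commands(model, will_run_soon, real_base, fake_base, store):
    if will_run_soon:    # S210 -> S220/S230: real addresses, ready to dispatch
        store[model["name"]] = ("second", translate(model, real_base))
    else:                # S210 -> S240/S250: false addresses, cached for later
        store[model["name"]] = ("first", translate(model, fake_base))
    return store

store = {}
model = {"name": "resnet", "ops": ["conv", "relu"]}
prepare_commands(model, will_run_soon=False,
                 real_base=0x1000, fake_base=0xF000, store=store)
assert store["resnet"][0] == "first"   # idle model: translated ahead of time
```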
It should be noted that the process of translating the operations in the network model into hardware execution commands (including first hardware execution commands and second hardware execution commands) and the process of executing those hardware execution commands can be implemented by the same AI chip, or by two AI chips separately. For example, AI chip 1 may be responsible only for translating the operations in the network model into hardware execution commands while AI chip 2 is responsible for executing them, the two processes being completed through cooperation between the two chips.
When the two processes are implemented by two AI chips, AI chip 1 may translate the operations in the network model into hardware execution commands (including first hardware execution commands and second hardware execution commands) and store them; when the network model is to be run later, the corresponding hardware execution commands are sent to the hardware devices of AI chip 2 for execution, or the corresponding first hardware execution commands are first converted into second hardware execution commands and then sent to the hardware devices of AI chip 2 for execution, the command conversion process including: replacing the false addresses in the first hardware execution command with the real addresses corresponding to the real memory space to obtain the second hardware execution command. Alternatively, AI chip 1 may translate the operations in the network model into hardware execution commands and send them to AI chip 2 for storage; when the network model is to be run later, AI chip 2 retrieves the corresponding hardware execution commands and sends them to its hardware devices for execution. Alternatively, AI chip 1 may translate the operations in the network model into first hardware execution commands and send them to AI chip 2 for storage; when the network model is to be run later, AI chip 2 replaces the false addresses in the first hardware execution commands with the real addresses corresponding to the real memory space to obtain second hardware execution commands, which are then sent to its hardware devices for execution.
Based on the same inventive concept as the above data processing method for alleviating the problem that the processor incurs a large performance overhead and takes a long time each time it runs a network model, this embodiment of the present application further provides yet another data processing method applied to the scenario of running a network model for data processing; its principle is described below with reference to Figures 4 and 5. Compared with Figure 1, Figure 4 describes the method only from the perspective of executing hardware execution commands.
S10: When the network model needs to be run, obtain pre-stored hardware execution commands, corresponding to the network model, that the target hardware device is able to execute.
S20: Send the hardware execution commands to the target hardware device for execution, so that the target hardware device executes the hardware execution commands, thereby running the network model on the target hardware device.
To reduce the performance overhead of the processor and improve efficiency, the operations included in the network model can be translated in advance into hardware execution commands that the target hardware device is able to execute, and these commands can be stored (for example, using the aforementioned network execution graph). When the network model later needs to be run to process input data, the pre-stored hardware execution commands corresponding to the network model are obtained and provided to the target hardware device for execution.
By storing the hardware execution commands corresponding to the operations in the network model in advance, the hardware execution commands stored in the network execution graph can simply be distributed to the corresponding hardware for execution when the network model is subsequently run, with no need to translate the operations in the network model into hardware execution commands again. This solves the problem that the processor incurs a large performance overhead and takes a long time each time it runs the network model.
In one implementation, the corresponding hardware execution commands may be sent to the target hardware device for execution one by one in the execution order of the operations included in the network model. After the pre-stored hardware execution commands that the target hardware device is able to execute are obtained, they are sent in that order to the target hardware device, which executes them, thereby running the network model on the target hardware device so that it can process the input data.
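A minimal sketch of this replay idea, assuming a hypothetical `Device.submit` interface and a plain list standing in for the network execution graph:

```python
class Device:
    """Illustrative stand-in for a target hardware device."""
    def __init__(self):
        self.executed = []
    def submit(self, cmd):
        self.executed.append(cmd)

# Commands stored ahead of time, in the execution order of the operations.
execution_graph = ["cmd_conv", "cmd_relu", "cmd_fc"]

def run_model(graph, device):
    for cmd in graph:          # dispatch in the recorded execution order
        device.submit(cmd)

dev = Device()
run_model(execution_graph, dev)
assert dev.executed == execution_graph   # no re-translation was needed
```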
The principle and technical effects of the data processing method shown in Figure 4 are the same as those of the method embodiment shown in Figure 1. For brevity, for anything not mentioned in the embodiment of Figure 4, reference may be made to the corresponding content of the method embodiment shown in Figure 1.
Based on the same inventive concept as the above data processing method for alleviating the problem that the processor runs short of memory each time it generates hardware execution commands for a network model, in this embodiment of the present application, obtaining the pre-stored hardware execution commands, corresponding to the network model, that the target hardware device is able to execute when the network model needs to be run may include the following S110 and S120, as shown in Figure 5. The principle is described below with reference to Figure 5.
S110: When the network model needs to be used, load the network raw data corresponding to the network model into the real memory space, and obtain the pre-stored first hardware execution command.
When the network model needs to be used to process input data (for example, for image recognition or classification), the network raw data corresponding to the network model (which here includes the input data to be processed by the network model and the characteristic data of the network itself) is loaded into the real memory space, and the pre-stored first hardware execution command is obtained. The first hardware execution command is obtained by translating each operation included in the network model based on the false memory space, and the false memory space has the same attributes as the real memory space.
In this approach, each operation included in the network model needs to be translated in advance into a corresponding first hardware execution command based on the false memory space, and the command needs to be cached.
S120: Replace the false addresses in the first hardware execution command with the real addresses corresponding to the real memory space.
In this embodiment of the present application, after the false addresses in the first hardware execution command are replaced with the real addresses corresponding to the real memory space, the data processing method further includes: sending the replaced first hardware execution command to the corresponding hardware device.
Since the above first hardware execution command is generated based on the false memory space, the addresses in it are all false addresses. Therefore, when the network model is subsequently executed, the false addresses in the first hardware execution command need to be replaced with the real addresses corresponding to the real memory space, and the replaced first hardware execution command is sent, as a second hardware execution command, to the corresponding hardware device for execution.
When a certain network model no longer needs to be executed, the real addresses in the second hardware execution command can be replaced with the false addresses corresponding to the false memory space, so that the corresponding memory resources can be released; that is, the addresses in the first hardware execution command that had been replaced with real addresses are changed back to false addresses. In this case, after the replaced first hardware execution command has been sent as a second hardware execution command to the corresponding hardware device for execution, the command generation method may further include: when it is determined that the network model will not be executed within a preset time period starting from the current moment, replacing the real addresses in the replaced first hardware execution command (that is, the second hardware execution command) with the false addresses corresponding to the false memory space, and caching the hardware execution command whose addresses have been replaced with false addresses, where the false memory space has the same attributes and the same size as the real memory space.
This command generation process makes it possible to translate the network model in advance without reducing its processing efficiency, improves translation efficiency, and helps to improve the overall processing efficiency for the network model, thereby saving the performance overhead otherwise required each time the processor runs the network model.
Based on the same inventive concept as the above data processing method for alleviating the problem that the processor incurs a large performance overhead and takes a long time each time it runs a network model, this embodiment of the present application further provides a data processing apparatus 100. As shown in Figure 6, the data processing apparatus 100 may include: an acquisition module 110, a command generation module 120, and a storage module 130. The acquisition module 110 may also be referred to as a first acquisition module.
The acquisition module 110 may be configured to: acquire a computational graph of a network model to be run.
The command generation module 120 may be configured to: translate each operation in the computational graph of the network model into a hardware execution command that a corresponding target hardware device is able to execute, the hardware execution command containing device information of the target hardware device.
The storage module 130 may be configured to: store the hardware execution commands using a network execution graph, where the network execution graph is used to record all hardware execution commands generated for the network model, and the target hardware device is able to run the network model by executing the hardware execution commands in the network execution graph.
Optionally, the command generation module 120 may be configured to: compile the source code of each operation in the computational graph of the network model into instructions using a preset first API function, and obtain, using a preset second API function, the related information required by the target hardware device to perform each operation; and generate the hardware execution command, using a preset third API function, according to the instructions corresponding to each operation and the related information required to perform each operation. The storage module 130 may be configured to: store the hardware execution commands corresponding to the operations into the network execution graph one by one in the execution order of the operations included in the network model, and record key information of each hardware execution command, the key information being used to obtain the hardware execution command.
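The three preset API functions are not named in the text; the placeholder callables below only illustrate the shape of the pipeline (compile each operation, query execution information, assemble the command) and are assumptions for illustration.

```python
def compile_op(source):
    # Preset first API function: operation source code -> instructions.
    return f"insns({source})"

def query_device_info(device, op):
    # Preset second API function: related info the device needs for the op.
    return {"device": device, "op": op}

def build_command(insns, info):
    # Preset third API function: instructions + related info -> command.
    return {"insns": insns, **info}

cmd = build_command(compile_op("conv2d"), query_device_info("dev0", "conv2d"))
assert cmd["device"] == "dev0"   # command carries the device information
```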
Optionally, the data processing apparatus 100 may further include a sending module.
The acquisition module 110 may also be configured to: acquire, when the network model needs to be run, the hardware execution commands pre-stored in the network execution graph.
The sending module may also be configured to: send the hardware execution commands to the target hardware device for execution, so that the target hardware device executes the hardware execution commands, thereby running the network model on the target hardware device.
Optionally, the sending module may be configured to: modify the read address used to obtain input data in the hardware execution command, and/or modify the write address used to store output data in the hardware execution command; and send the modified hardware execution command to the target hardware device for execution, so that the target hardware device executes the modified hardware execution command, thereby running the network model on the target hardware device to process the input data.
Optionally, the data processing apparatus 100 may further include a replication module configured to: replicate the hardware execution command according to the total number of hardware devices in the AI chip; and modify the device information contained in the replicated hardware execution commands according to the device information of the hardware devices in the AI chip other than the target hardware device, to obtain hardware execution commands with modified device information, where a hardware execution command with modified device information can be provided to one of the other hardware devices for execution.
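The replication module's copy-and-rewrite behavior can be sketched as follows; the command fields and device identifiers are assumptions for illustration.

```python
import copy

def replicate(cmd, all_devices):
    """Copy a command once per device, rewriting the embedded device info."""
    cmds = []
    for dev in all_devices:
        c = copy.deepcopy(cmd)   # keep the original command untouched
        c["device"] = dev        # modify the copied device information
        cmds.append(c)
    return cmds

base = {"device": "dev0", "insns": "insns(conv2d)"}
copies = replicate(base, ["dev0", "dev1", "dev2"])
assert [c["device"] for c in copies] == ["dev0", "dev1", "dev2"]
```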
Optionally, the replication module may also be configured to: determine, according to the amount of data to be processed, a first number of hardware devices currently required to run the network model.
Based on the same inventive concept as the above data processing method for alleviating the problem that the processor runs short of memory each time it generates hardware execution commands for a network model, the command generation module 120 in this embodiment of the present application may include: an allocation module 121 and a translation module 122.
The allocation module 121 may be configured to allocate a corresponding false memory space for the network model.
The translation module 122 may be configured to: translate, based on the false memory space, each operation included in the network model into a corresponding first hardware execution command, where the addresses in the first hardware execution command are all false addresses and the false memory space has the same attributes as the real memory space.
If there are multiple network models, the translation module 122 may be configured to translate, for each network model and based on a different false memory space, the operations included in that network model into corresponding first hardware execution commands, different network models corresponding to different false memory spaces.
In this embodiment of the present application, optionally, the storage module 130 may also be configured to store the first hardware execution command, which, after address replacement, is provided to the hardware device that needs to run the network model for execution.
Optionally, the allocation module 121 may be configured to: allocate, according to the size of the data required to execute the network model, a false memory space corresponding to that data size.
Optionally, the command generation module 120 may further include a judgment module configured to: determine whether the network model will be executed within a preset time period starting from the current moment. When it is determined that the network model will not be executed within the preset time period starting from the current moment, the translation module 122 may be configured to: translate, based on the false memory space, each operation included in the network model into a corresponding first hardware execution command.
When it is determined that the network model is to be executed within the preset time period starting from the current moment, the translation module 122 may also be configured to: translate, based on the real memory space, each operation included in the network model into a corresponding second hardware execution command, where the addresses contained in the second hardware execution command are all real addresses and the real memory space stores the data required to execute the network model. The storage module 130 may also be configured to store the second hardware execution command.
Optionally, the command generation module 120 may also include an acquisition module and a sending module. The acquisition module may be configured to: load, when the network model is executed, the data required to execute the network model into the real memory space. The translation module 122 may also be configured to: replace the false addresses in the first hardware execution command with the real addresses corresponding to the real memory space. The sending module may be configured to: send the replaced first hardware execution command, as a second hardware execution command, to the corresponding hardware device.
The translation module 122 may also be configured to: replace, when it is determined that the network model will not be executed within a preset time period starting from the current moment, the real addresses in the replaced first hardware execution command with the false addresses corresponding to the false memory space.
Optionally, the translation module 122 may be configured to: examine the first hardware execution commands to identify those containing false addresses; and replace the false addresses in the identified first hardware execution commands with the real addresses corresponding to the real memory space.
The translation module 122 may be configured to: compile the source code of each operation included in the network model into instructions, and obtain, based on the false memory space, the related information required to perform each operation included in the network model; and generate the first hardware execution command according to the instructions corresponding to each operation and the related information required to perform each operation.
The implementation principle and technical effects of the command generation module 120 provided in this embodiment of the present application are the same as those of the foregoing method embodiments. For brevity, for anything not mentioned in the apparatus embodiment, reference may be made to the corresponding content of the foregoing method embodiments.
The processes performed by the modules of the above command generation module make it possible to translate the network model in advance without reducing its processing efficiency, improve translation efficiency, and help to improve the overall processing efficiency for the network model, thereby saving the performance overhead otherwise required each time the processor runs the network model.
Based on the same inventive concept as the above data processing method of mitigating the problem that the processor incurs a large, time-consuming performance overhead each time it runs a network model, an embodiment of the present application further provides another data processing apparatus 200 applied to scenarios in which a network model is run to process data. As shown in FIG. 8, the data processing apparatus 200 includes an acquisition module 210 and a sending module 220. The acquisition module 210 may also be referred to as a second acquisition module.

The acquisition module 210 may be configured to: when the network model needs to be run, acquire a pre-stored hardware execution command that the target hardware device corresponding to the network model is able to execute. The sending module 220 may be configured to: send the hardware execution command to the target hardware device for execution, so that the target hardware device executes the hardware execution command, thereby running the network model on the target hardware device to process the input data.
Based on the same inventive concept as the above data processing method of mitigating the problem that the processor runs out of memory each time it generates hardware execution commands for a network model, as shown in FIG. 9, the acquisition module 210 may include a first hardware execution command acquisition module 211 and a translation module 212.

The first hardware execution command acquisition module 211 may be configured to: when the network model needs to be executed, load the raw network data corresponding to the network model into the real memory space and acquire the pre-stored first hardware execution command. The first hardware execution command is obtained by translating each operation included in the network model based on the fake memory space, and the fake memory space has the same properties as the real memory space.

The translation module 212 may be configured to: replace the fake addresses in the first hardware execution command with the real addresses corresponding to the real memory space.

The sending module 220 may further be configured to: send the replaced first hardware execution command to the corresponding hardware device.

The translation module 212 may further be configured to: when it is determined that the network model will not be executed within a preset time period starting from the current moment, replace the real addresses in the replaced first hardware execution command with the fake addresses corresponding to the fake memory space, the fake memory space having the same properties as the real memory space.
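The two address swaps performed by the translation module 212 can be pictured as simple rebasing, as in this hedged sketch (the command layout and the base addresses are invented for illustration):

```python
# Illustrative sketch: swapping commands between fake and real addresses
# by rebasing their offsets. The bases and command shape are hypothetical.
FAKE_BASE = 0xF000_0000   # fake memory space (same properties as real DDR)
REAL_BASE = 0x4000_0000   # real memory space holding the loaded network data


def patch_to_real(commands):
    """Before execution: replace each fake address with the real address
    at the same offset in the real memory space."""
    return [{**c, "addr": REAL_BASE + (c["addr"] - FAKE_BASE)} for c in commands]


def patch_to_fake(commands):
    """When the model will not run within the preset period: restore fake
    addresses so the real memory space can be released."""
    return [{**c, "addr": FAKE_BASE + (c["addr"] - REAL_BASE)} for c in commands]


first_cmds = [{"op": "conv", "addr": FAKE_BASE},
              {"op": "fc", "addr": FAKE_BASE + 1024}]
real_cmds = patch_to_real(first_cmds)
```

Since the fake and real spaces share the same layout, the two patches are exact inverses of each other.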
The acquisition module 210 provided in this embodiment of the present application has the same implementation principle and technical effects as the foregoing method embodiments. For brevity, for matters not mentioned in this apparatus embodiment, reference may be made to the corresponding content of the foregoing method embodiments. The modules in the acquisition module 210 and the modules in the aforementioned command generation module 120 may be integrated together or used independently.

The process performed by the modules of the acquisition module above makes it possible to translate the network model in advance without reducing its processing efficiency, improves translation efficiency, and helps raise the processing efficiency of the network model, thereby saving the performance overhead the processor would otherwise incur each time the network model is run. The data processing apparatus 100 or data processing apparatus 200 provided in this embodiment of the present application has the same implementation principle and technical effects as the foregoing method embodiments; for brevity, for matters not mentioned in this apparatus embodiment, reference may be made to the corresponding content of the foregoing method embodiments.
Based on the same inventive concept as the above data processing method of mitigating the problem that the processor incurs a large, time-consuming performance overhead each time it runs a network model, an embodiment of the present application further provides an AI chip, which may include a kernel and a storage device. The AI chip can be used to perform the aforementioned data processing method.

The kernel is used to obtain the computation graph of the network model to be run, and to translate each operation in the computation graph of the network model into hardware execution commands executable by the target hardware device, the hardware execution commands containing the device information of the target hardware device.

A driver is deployed in the kernel; the driver translates each operation in the computation graph of the network model into hardware execution commands executable by the target hardware device and sends the hardware execution commands to the storage device.

The kernel may use a preset first API function to compile the source code of each operation in the computation graph of the network model into instructions, use a preset second API function to obtain the related information required by the target hardware device to execute each operation, and use a preset third API function to generate the hardware execution commands from the instructions corresponding to each operation and the related information required to execute each operation. The storage device may be configured to store the hardware execution commands using a network execution graph, where the network execution graph records the hardware execution commands, and the hardware execution commands are used to run the network model.

In an optional implementation, the storage device may store the hardware execution command corresponding to each operation into the network execution graph in sequence, following the execution order of the operations included in the network model, and record the key information of each hardware execution command, the key information being used to retrieve the hardware execution command.
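The ordered storage plus key information described here might look like the following sketch; the graph class and its key scheme are assumptions made for illustration, not the application's actual data structure.

```python
# Illustrative sketch: a network execution graph that stores commands in
# the model's execution order and records key information for retrieval.
class NetworkExecutionGraph:
    def __init__(self):
        self.commands = []   # hardware execution commands, in execution order
        self.key_info = {}   # key information: here, operation name -> index

    def append(self, op_name, command):
        """Store the next operation's command and record its key info."""
        self.key_info[op_name] = len(self.commands)
        self.commands.append(command)

    def get(self, op_name):
        """Use the key information to fetch a stored command."""
        return self.commands[self.key_info[op_name]]


graph = NetworkExecutionGraph()
graph.append("conv", "CMD_CONV")
graph.append("relu", "CMD_RELU")
```

Replaying `graph.commands` in order then corresponds to running the whole model without re-translating it.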
Based on the same inventive concept as the above data processing method of mitigating the problem that the processor runs out of memory each time it generates hardware execution commands for a network model, in an optional implementation the kernel in the embodiment of the present application may further be configured to: allocate a corresponding fake memory space for the network model, and, based on the fake memory space, translate each operation included in the network model into a corresponding first hardware execution command, where the addresses in the first hardware execution command are all fake addresses, and the fake memory space has the same properties as the real memory space.

A driver is deployed in the kernel; the driver translates each operation included in the network model into first hardware execution commands and sends the first hardware execution commands to the storage device for storage.

The kernel may further be configured to: compile the source code of each operation included in the network model into instructions, and obtain, based on the fake memory space, the related information required to execute each operation included in the network model; and generate the first hardware execution command from the instructions corresponding to each operation and the related information required to execute each operation.

The storage device may also be configured to store the first hardware execution command, which, after address replacement, is provided for execution to the hardware device that needs to run the network model.

Optionally, the kernel may be configured to: allocate, according to the size of the data required to execute the network model, a fake memory space corresponding to that data size.

Optionally, before translating each operation included in the network model into the corresponding first hardware execution command based on the fake memory space, the kernel may further be configured to: determine whether the network model will be executed within a preset time period starting from the current moment, and only when it is determined that the network model will not be executed within that period, translate each operation included in the network model into the corresponding first hardware execution command based on the fake memory space.

When it is determined that the network model will be executed within the preset time period starting from the current moment, the kernel is further used to translate, based on the real memory space, each operation included in the network model into a corresponding second hardware execution command, where the addresses contained in the second hardware execution command are all real addresses, and the real memory space stores the data required to execute the network model. In this case, the storage device may also be configured to store the second hardware execution command.
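The choice between the two translation paths can be sketched as a small dispatch function; the names and the time representation are hypothetical.

```python
# Illustrative sketch: deciding whether translation should target the fake
# or the real memory space, per the preset time window described above.
def choose_memory_space(seconds_until_next_run, preset_window_seconds):
    """Return which memory space the translation should target.

    A run due within the preset window is translated against real memory
    (second hardware execution commands, real addresses); otherwise the
    model is pre-translated against fake memory (first hardware execution
    commands, fake addresses), holding no real DDR in the meantime.
    """
    if seconds_until_next_run <= preset_window_seconds:
        return "real"   # -> second hardware execution command
    return "fake"       # -> first hardware execution command
```

For example, with a 60-second window, a model scheduled in 5 seconds would be translated against real memory, while one scheduled an hour out would be pre-translated against fake memory.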
Optionally, the kernel may further be configured to: when executing the network model, load the data required to execute the network model into the real memory space; replace the fake addresses in the first hardware execution command with the real addresses corresponding to the real memory space; and send the replaced first hardware execution command, as the second hardware execution command, to the corresponding hardware device.

Optionally, the kernel may be configured to: identify the first hardware execution commands, determine some or all of the first hardware execution commands that currently contain fake addresses as target commands, and replace the fake addresses in the target commands with the real addresses corresponding to the real memory space.

Optionally, the kernel may further be configured to: after sending the replaced first hardware execution command, as the second hardware execution command, to the corresponding hardware device for execution, replace the real addresses in the replaced first hardware execution command (that is, the second hardware execution command) with the fake addresses corresponding to the fake memory space, and cache the second hardware execution command whose addresses have been replaced with fake addresses.

Since the process of translating each operation in the network model into hardware execution commands and the process of executing the hardware execution commands may be implemented by different AI chips, after AI chip 1 obtains the hardware execution commands, it may store them in the storage device of AI chip 1, in the storage device of AI chip 2, or in a storage device shared by AI chip 1 and AI chip 2. Based on the same inventive concept as the above data processing method of mitigating the problem that the processor incurs a large, time-consuming performance overhead each time it runs a network model, an embodiment of the present application further provides an AI chip, which may include a hardware device, a kernel, and a storage device. The AI chip can be used to perform the aforementioned data processing method.
The storage device may be configured to store the hardware execution commands corresponding to each operation in the computation graph of the network model.

The kernel may be configured to: when the network model needs to be run, obtain the previously stored hardware execution commands from the storage device and send them to the hardware device.

The hardware device may be configured to: execute the hardware execution commands, thereby running the network model to process the input data.

Since the process of translating each operation in the network model into hardware execution commands and the process of executing them may be implemented by different AI chips, in one implementation this AI chip may receive hardware execution commands sent by another AI chip and execute them. In this case, the kernel is also used to receive the hardware execution commands sent by the other AI chip and store them for execution by the hardware device.
Based on the same inventive concept as the above data processing method of mitigating the problem that the processor runs out of memory each time it generates hardware execution commands for a network model, in an optional implementation the storage device in the embodiment of the present application may further be configured to store the first hardware execution command, where the first hardware execution command is obtained by translating each operation included in the network model based on the fake memory space, and the fake memory space has the same properties as the real memory space.

The kernel may further be configured to: when the network model needs to be executed, load the raw network data corresponding to the network model into the real memory space, obtain the first hardware execution command stored in the storage device, replace the fake addresses in the first hardware execution command with the real addresses corresponding to the real memory space, and send the replaced first hardware execution command to the hardware device.

The hardware device may further be configured to: execute the replaced first hardware execution command, thereby running the network model to process the input data.

The AI chip provided in this embodiment of the present application has the same implementation principle and technical effects as the foregoing method embodiments. For brevity, for matters not mentioned in the AI chip embodiment, reference may be made to the corresponding content of the foregoing method embodiments.
Based on the same inventive concept as the above data processing method of mitigating the problem that the processor incurs a large, time-consuming performance overhead each time it runs a network model, FIG. 10 shows a structural block diagram of an electronic device 300 provided in an embodiment of the present application. The electronic device 300 may include a transceiver 310, a memory 320, a communication bus 330, and a processor 340. The transceiver 310, the memory 320, and the processor 340 are electrically connected to one another, directly or indirectly, to enable data transmission or interaction; for example, these elements may be electrically connected through one or more communication buses 330 or signal lines. The transceiver 310 may be configured to transmit and receive data. The memory 320 may be configured to store a computer program, such as the software function modules shown in FIG. 6 to FIG. 9, that is, the data processing apparatus 100 of FIG. 6 or the data processing apparatus 200 of FIG. 8. The data processing apparatus 100 includes at least one software function module that may be stored in the memory 320 in the form of software or firmware, or solidified in the operating system (OS) of the electronic device 300. The processor 340 may be configured to execute the executable modules stored in the memory 320.

For example, when the processor 340 is configured to execute the software function modules or computer program included in the data processing apparatus 100, the processor 340 may be configured to: obtain the computation graph of the network model to be run; translate each operation in the computation graph of the network model into hardware execution commands executable by the target hardware device of the AI chip, the hardware execution commands containing the device information of the target hardware device; and store the hardware execution commands using a network execution graph, where the network execution graph records all hardware execution commands generated for the network model, and the target hardware device is able to run the network model by executing the hardware execution commands in the network execution graph.

When the processor 340 is configured to execute the software function modules or computer program included in the data processing apparatus 200, the processor 340 may be configured to: when the network model needs to be run, obtain the pre-stored hardware execution commands that the target hardware device corresponding to the network model is able to execute; and send the hardware execution commands to the target hardware device for execution, so that the target hardware device executes the hardware execution commands, thereby running the network model on the target hardware device to process the input data.

It can be understood that the electronic device 300 may also include two processors 340, one responsible for translating each operation in the network model into hardware execution commands and the other responsible for executing the hardware execution commands.
The memory 320 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or the like.

The processor 340 may be an integrated circuit chip with signal processing capabilities. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor 340 may be any conventional processor.

The aforementioned electronic device 300 includes, but is not limited to, a smartphone, a tablet, a computer, an industrial control computer, a vehicle-mounted device, a server, a smart wearable device, an edge box, and the like.
Based on the same inventive concept as the above data processing method of mitigating the problem that the processor runs out of memory each time it generates hardware execution commands for a network model, in an optional implementation the memory may be configured to store the network model, and may also be configured to store the raw data required to execute the network model, such as the input data to be processed and the feature data of the network itself.

The first processor may be configured to allocate a corresponding fake memory space for the network model, translate each operation included in the network model into corresponding first hardware execution commands, and store the first hardware execution commands. The electronic device may also include a central processing unit (CPU); the first processor may be a coprocessor that assists the CPU in data processing, such as a graphics processing unit (GPU) or a general-purpose graphics processing unit (General Purpose computing on Graphics Processing Units, GPGPU). Both the CPU and the first processor may be regarded as the aforementioned AI chip.

When the network model is to be executed, the first processor loads the data required to execute the network model into the real memory space of the first processor, replaces the fake addresses in the first hardware execution command with the real addresses corresponding to the real memory space, and sends the replaced first hardware execution command, as the second hardware execution command, to the corresponding hardware device for execution. Afterwards, when it is determined that the network model will not be executed within a preset time period starting from the current moment, the real addresses in the second hardware execution command may be replaced with the fake addresses corresponding to the fake memory space, and the second hardware execution command whose addresses have been replaced with fake addresses may be cached.
To better understand the principles of the present application, the command generation method provided in this application is compared below with a command generation method that does not use fake memory, with reference to the electronic device shown in FIG. 11.
Conventional process:

Step 1. Initially, the raw network data (including the input data to be processed, and possibly the feature data of the network itself) and the network model are stored in a storage device (which may be a disk).

Step 2. Before the network model can be translated into hardware execution commands, the raw network data and the data of the network model itself must be loaded into the CPU's DDR (Double Data Rate synchronous dynamic random access memory). According to the CPU DDR space occupied by the data, a real DDR space of the same size is allocated and locked in the dedicated DDR of the first processor, and through the cooperation of the CPU and the first processor, all the data stored in the CPU's DDR (including the input data) is moved into the DDR of the first processor.

Step 3. When translating the network model into hardware execution commands, the first processor, based on the allocated real DDR space, combines each operation operator of the network model with the DDR addresses of the feature data, the DDR addresses of the input data, and the DDR addresses for storing the operation results to produce a series of hardware execution commands.

Step 4. These hardware execution commands are then executed directly.
A process using the command generation method shown in this application may include:

Step 1. Initially, the raw network data (including the input data, and possibly the feature data of the network itself) and the network model may likewise be stored in a storage device (which may be a disk).

Step 2. Before the network model needs to be translated into first hardware execution commands, a fake memory space corresponding to the size of the data required to execute the network model is allocated, and the network model is loaded into the DDR of the first processor.

Step 3. When translating the network model into first hardware execution commands, the first processor, based on the allocated fake DDR space (fake memory space), combines each operation operator of the network model with the fake DDR addresses of the feature data, the fake DDR addresses of the input data, and the fake DDR addresses for storing the operation results to produce a series of first hardware execution commands, and stores these hardware execution commands.

Step 4. When the network model is subsequently executed, the raw network data is loaded into the CPU's DDR; according to the DDR space occupied by the data, a DDR space of the same size (real memory space) is allocated in the DDR of the first processor; through the cooperation of the CPU and the first processor, all the data in the CPU's DDR is moved into the DDR of the first processor; then the fake addresses in the first hardware execution commands are replaced with the real addresses corresponding to the allocated real memory space, and the replaced first hardware execution commands are sent to the corresponding hardware device.
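Putting these steps together, a minimal end-to-end sketch of the proposed flow might look like this; all addresses, sizes, and names are invented for illustration.

```python
# Illustrative end-to-end sketch of steps 2-4: pre-translate against fake
# memory, then patch to real addresses only when the model actually runs.
FAKE_BASE, REAL_BASE = 0xF000_0000, 0x4000_0000


def pre_translate(operations):
    """Steps 2-3: emit first hardware execution commands over fake memory,
    holding no real DDR space."""
    cmds, offset = [], 0
    for name, size in operations:
        cmds.append({"op": name, "addr": FAKE_BASE + offset})
        offset += size
    return cmds


def run(first_cmds):
    """Step 4: after the data is loaded into real memory, patch each fake
    address to its real counterpart and dispatch the commands."""
    patched = [{**c, "addr": REAL_BASE + (c["addr"] - FAKE_BASE)} for c in first_cmds]
    return patched  # in the real flow these go to the hardware device


stored = pre_translate([("conv", 2048), ("fc", 512)])  # no real DDR held yet
dispatched = run(stored)                               # real DDR needed only now
```

This mirrors the key point of the comparison: `pre_translate` can be repeated for many models without exhausting DDR, while `run` is invoked only for the model that actually executes.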
When multiple different network models need to be executed on the electronic device and the DDR of the CPU and of the first processor is limited, the situation is as follows:

Conventional process: steps 1, 2, and 3 are executed repeatedly, which quickly fills up the DDR.

With the command generation method shown in this application: steps 1 and 2 (and 3) above are executed repeatedly, so all the required hardware execution commands can be generated while occupying almost no DDR of the CPU or the first processor; when a network model needs to be executed, the corresponding step 4 is then performed.

It should be noted that the process shown in FIG. 11 is only one of many embodiments; the process of translating each operation included in the network model into corresponding hardware execution commands may also be performed by the aforementioned CPU. The embodiment of the present application further provides a non-volatile computer-readable storage medium (hereinafter, storage medium) on which a computer program is stored; when the computer program is run by a computer, such as the aforementioned electronic device 300, it performs the data processing method described above. The aforementioned computer-readable storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
需要说明的是,本说明书中的各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似的部分互相参见即可。It should be noted that the various embodiments in this specification are described in a progressive manner, and each embodiment focuses on the differences from other embodiments. The same or similar parts between the various embodiments can be referenced to each other.
另外,在本申请各个实施例中的各功能模块可以集成在一起形成一个独立的部分,也可以是各个模块单独存在,也可以两个或两个以上模块集成形成一个独立的部分。In addition, the functional modules in the various embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
所述功能如果以软件功能模块的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个计算机可读存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,笔记本电脑,服务器,或者电子设备等)执行本申请各个实施例所述方法的全部或部分步骤。If the functions are implemented in the form of software function modules and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, or the part that contributes to the prior art, or the part of the technical solution, can be embodied in the form of a software product, which is stored in a computer-readable storage medium and includes several instructions for enabling a computer device (which can be a personal computer, a laptop, a server, or an electronic device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present application.
以上所述，仅为本申请的具体实施方式，但本申请的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本申请揭露的技术范围内，可轻易想到变化或替换，都应涵盖在本申请的保护范围之内。因此，本申请的保护范围应以权利要求的保护范围为准。The above is only a specific implementation of the present application, but the protection scope of the present application is not limited thereto; any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed in the present application, and such changes or substitutions shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
工业实用性Industrial Applicability
本申请涉及一种数据处理方法、装置、AI芯片、电子设备及存储介质，属于数据处理技术领域。该数据处理方法包括：获取待运行的网络模型的计算图；将网络模型的计算图中的各个操作翻译成AI芯片的目标硬件设备能够执行的硬件执行命令；利用网络执行图存储硬件执行命令。通过将网络模型的计算图中的各个操作翻译成对应的目标硬件设备能够执行的硬件执行命令，并存储起来，使得后续每次需要运行该网络模型时，直接将事先存储的硬件执行命令分发给对应的硬件执行，不需要重新将该网络模型的计算图中的各个操作翻译成硬件执行命令，从而改善处理器每次运行网络模型时都需要很大的性能开销，且耗时长的问题。The present application relates to a data processing method and apparatus, an AI chip, an electronic device, and a storage medium, and belongs to the field of data processing technology. The data processing method includes: obtaining a computation graph of a network model to be run; translating each operation in the computation graph of the network model into hardware execution commands executable by a target hardware device of the AI chip; and storing the hardware execution commands using a network execution graph. By translating each operation in the computation graph of the network model into hardware execution commands executable by the corresponding target hardware device and storing them, whenever the network model subsequently needs to be run, the pre-stored hardware execution commands are distributed directly to the corresponding hardware for execution, without re-translating each operation in the computation graph into hardware execution commands, thereby alleviating the problem that the processor incurs a large performance overhead and a long delay each time it runs the network model.
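A further aspect summarized in the claims (claims 8–15) is that commands may first be generated against a "false" (placeholder) memory space, so that no real memory needs to be reserved at generation time, and the real addresses are patched in only when the model actually runs. The following is an illustrative sketch only, assuming a flat address layout; `FAKE_BASE`, `translate_with_fake_addresses`, and `patch_real_addresses` are hypothetical names, not APIs from the application.

```python
FAKE_BASE = 0xF000_0000   # placeholder base of the false memory space (assumed value)

def translate_with_fake_addresses(ops):
    """Generate 'first hardware execution commands' whose addresses all
    point into the placeholder memory space; real DDR is not touched."""
    cmds, offset = [], 0
    for op in ops:
        cmds.append({"op": op["name"], "addr": FAKE_BASE + offset})
        offset += op["size"]
    return cmds

def patch_real_addresses(cmds, real_base):
    """Once a real memory space is allocated and the model's data loaded,
    replace each false address with the matching real address, yielding
    the 'second hardware execution commands' ready for dispatch."""
    return [{"op": c["op"], "addr": real_base + (c["addr"] - FAKE_BASE)}
            for c in cmds]

ops = [{"name": "conv1", "size": 0x100}, {"name": "fc1", "size": 0x40}]
first_cmds = translate_with_fake_addresses(ops)
second_cmds = patch_real_addresses(first_cmds, real_base=0x2000_0000)
```

Because the false memory space has the same layout properties as a real one, the patched commands need only a base-address substitution, which is far cheaper than re-translating the whole computation graph.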
此外，可以理解的是，本申请的数据处理方法、装置、AI芯片、电子设备及存储介质是可以重现的，并且可以用在多种工业应用中。例如，本申请的数据处理方法、装置、AI芯片、电子设备及存储介质可以用于需要降低处理器的性能开销和提高数据处理时的效率的任何装置。In addition, it can be understood that the data processing method and apparatus, AI chip, electronic device, and storage medium of the present application are reproducible and can be used in a variety of industrial applications. For example, they can be used in any apparatus that needs to reduce the performance overhead of the processor and improve the efficiency of data processing.

Claims (32)

  1. 一种数据处理方法,其中,所述数据处理方法包括:A data processing method, wherein the data processing method comprises:
    获取待运行的网络模型的计算图;Get the computational graph of the network model to be run;
    将所述网络模型的计算图中的各个操作翻译成AI芯片的目标硬件设备能够执行的硬件执行命令,所述硬件执行命令中包含所述目标硬件设备的设备信息;Translate each operation in the computation graph of the network model into a hardware execution command executable by the target hardware device of the AI chip, wherein the hardware execution command includes device information of the target hardware device;
    利用网络执行图存储所述硬件执行命令,其中,所述网络执行图用于记录为所述网络模型生成的所有硬件执行命令,所述目标硬件设备用于通过执行所述网络执行图中的硬件执行命令来运行所述网络模型。The hardware execution commands are stored using a network execution graph, wherein the network execution graph is used to record all hardware execution commands generated for the network model, and the target hardware device is used to run the network model by executing the hardware execution commands in the network execution graph.
  2. 根据权利要求1所述的数据处理方法,其中,将所述网络模型的计算图中包含的各个操作翻译成AI芯片的目标硬件设备能够执行的硬件执行命令,包括:The data processing method according to claim 1, wherein translating each operation contained in the computation graph of the network model into a hardware execution command executable by a target hardware device of the AI chip comprises:
    利用预设第一API函数将所述网络模型的计算图中的各个操作的源代码编译成指令,并利用预设第二API函数获得目标硬件设备执行各个操作所需的相关信息;Compile the source code of each operation in the computation graph of the network model into instructions using a preset first API function, and obtain relevant information required for the target hardware device to perform each operation using a preset second API function;
    利用预设第三API函数根据各个操作的对应的指令与执行各个操作所需的相关信息,生成所述硬件执行命令。The hardware execution command is generated by using the preset third API function according to the corresponding instructions of each operation and the relevant information required to execute each operation.
  3. 根据权利要求1或2所述的数据处理方法,其中,利用网络执行图存储所述硬件执行命令,包括:The data processing method according to claim 1 or 2, wherein storing the hardware execution command using a network execution graph comprises:
    按照所述网络模型中包含的各个操作的执行顺序,依次将各个操作对应的硬件执行命令存储至所述网络执行图中,并记录每个硬件执行命令的关键信息,所述关键信息用于获取所述硬件执行命令。According to the execution order of each operation contained in the network model, the hardware execution command corresponding to each operation is stored in the network execution graph in sequence, and the key information of each hardware execution command is recorded, and the key information is used to obtain the hardware execution command.
  4. 根据权利要求1至3中任一项所述的数据处理方法,其中,所述数据处理方法还包括:The data processing method according to any one of claims 1 to 3, wherein the data processing method further comprises:
    在需要运行所述网络模型时,获取预先存储在所述网络执行图中的所述硬件执行命令;When the network model needs to be run, obtaining the hardware execution command pre-stored in the network execution graph;
    将所述硬件执行命令发送给所述目标硬件设备执行,以使所述目标硬件设备执行所述硬件执行命令,以实现在所述目标硬件设备上运行所述网络模型。The hardware execution command is sent to the target hardware device for execution, so that the target hardware device executes the hardware execution command, thereby realizing running the network model on the target hardware device.
  5. 根据权利要求4所述的数据处理方法,其中,将所述硬件执行命令发送给所述目标硬件设备执行,包括:The data processing method according to claim 4, wherein sending the hardware execution command to the target hardware device for execution comprises:
    修改所述硬件执行命令中用于获取输入数据的读地址,以及/或者修改所述硬件执行命令中用于存储输出数据的写地址;Modifying a read address in the hardware execution command for obtaining input data, and/or modifying a write address in the hardware execution command for storing output data;
    将修改后的硬件执行命令发送给所述目标硬件设备执行,以使所述目标硬件设备执行修改后的所述硬件执行命令,以实现在所述目标硬件设备上运行所述网络模型来对输入数据进行处理。The modified hardware execution command is sent to the target hardware device for execution, so that the target hardware device executes the modified hardware execution command, thereby running the network model on the target hardware device to process the input data.
  6. 根据权利要求1至5中任一项所述的数据处理方法,其中,所述数据处理方法还包括:The data processing method according to any one of claims 1 to 5, wherein the data processing method further comprises:
    根据所述AI芯片中的硬件设备的总数量,对所述硬件执行命令进行复制;Copying the hardware execution command according to the total number of hardware devices in the AI chip;
    根据所述AI芯片中除所述目标硬件设备外的其他硬件设备的设备信息,对复制出的所述硬件执行命令中包含的设备信息进行修改,得到修改过设备信息的硬件执行命令,其中,修改过设备信息的硬件执行命令能够被提供给所述其他硬件设备执行。According to the device information of other hardware devices in the AI chip except the target hardware device, the device information contained in the copied hardware execution command is modified to obtain the hardware execution command with the modified device information, wherein the hardware execution command with the modified device information can be provided to the other hardware devices for execution.
  7. 根据权利要求6所述的数据处理方法,其中,所述数据处理方法还包括:The data processing method according to claim 6, wherein the data processing method further comprises:
    根据待处理的数据量,确定当前需要运行所述网络模型的硬件设备的第一数量。According to the amount of data to be processed, a first number of hardware devices currently required to run the network model is determined.
  8. 根据权利要求1至7中任一项所述的数据处理方法,其中,将所述网络模型的计算图中的各个操作翻译成AI芯片的目标硬件设备能够执行的硬件执行命令,包括:The data processing method according to any one of claims 1 to 7, wherein translating each operation in the computation graph of the network model into a hardware execution command executable by a target hardware device of the AI chip comprises:
    为所述网络模型分配对应的虚假内存空间;以及Allocating corresponding false memory space for the network model; and
    基于所述虚假内存空间,将所述网络模型中包含的各个操作翻译成对应的第一硬件执行命令,所述第一硬件执行命令中的地址均为虚假地址,所述虚假内存空间与真实内存空间具备相同属性。Based on the virtual memory space, each operation included in the network model is translated into a corresponding first hardware execution command, the addresses in the first hardware execution command are all virtual addresses, and the virtual memory space has the same properties as the real memory space.
  9. 根据权利要求8所述的数据处理方法，其中，利用网络执行图存储所述硬件执行命令，包括：在基于所述虚假内存空间，将所述网络模型中包含的各个操作翻译成对应的第一硬件执行命令之后，存储所述第一硬件执行命令，所述第一硬件执行命令用于在被进行地址替换后提供给需要运行所述网络模型的硬件设备执行。The data processing method according to claim 8, wherein storing the hardware execution command using a network execution graph comprises: after translating each operation contained in the network model into a corresponding first hardware execution command based on the virtual memory space, storing the first hardware execution command, wherein the first hardware execution command is provided, after address replacement, to a hardware device that needs to run the network model for execution.
  10. 根据权利要求8或9所述的数据处理方法,其中,为所述网络模型分配对应的虚假内存空间,包括:The data processing method according to claim 8 or 9, wherein allocating a corresponding virtual memory space to the network model comprises:
    根据执行所述网络模型所需的数据大小,分配与所述数据大小对应的虚假内存空间。According to the data size required for executing the network model, a virtual memory space corresponding to the data size is allocated.
  11. 根据权利要求8至10中任一项所述的数据处理方法,其中,在基于所述虚假内存空间,将所述网络模型中包含的各个操作翻译成对应的第一硬件执行命令之前,所述数据处理方法还包括:The data processing method according to any one of claims 8 to 10, wherein, before translating each operation included in the network model into a corresponding first hardware execution command based on the virtual memory space, the data processing method further comprises:
    判断从当前时刻开始后的预设时间段内是否执行所述网络模型;Determining whether the network model is executed within a preset time period starting from the current moment;
    在确定从当前时刻开始后的预设时间段内不执行所述网络模型时，执行步骤：基于所述虚假内存空间，将所述网络模型中包含的各个操作翻译成对应的第一硬件执行命令。When it is determined that the network model will not be executed within the preset time period starting from the current moment, executing the step of: translating each operation included in the network model into a corresponding first hardware execution command based on the virtual memory space.
  12. 根据权利要求11所述的数据处理方法,其中,所述数据处理方法还包括:The data processing method according to claim 11, wherein the data processing method further comprises:
    在确定从当前时刻开始后的预设时间段内要执行所述网络模型时，基于所述真实内存空间，将所述网络模型中包含的各个操作翻译成对应的第二硬件执行命令，其中，所述第二硬件执行命令中包含的地址均为真实地址，所述真实内存空间用于存储执行所述网络模型时所需的数据。When it is determined that the network model is to be executed within the preset time period starting from the current moment, translating each operation included in the network model into a corresponding second hardware execution command based on the real memory space, wherein the addresses included in the second hardware execution command are all real addresses, and the real memory space is used to store the data required for executing the network model.
  13. 根据权利要求9至12中任一项所述的数据处理方法,其中,在存储所述第一硬件执行命令之后,所述数据处理方法还包括:The data processing method according to any one of claims 9 to 12, wherein, after storing the first hardware execution command, the data processing method further comprises:
    在需要执行所述网络模型时,将执行所述网络模型所需的数据加载到所述真实内存空间;When the network model needs to be executed, the data required for executing the network model is loaded into the real memory space;
    利用所述真实内存空间对应的真实地址替换掉所述第一硬件执行命令中的虚假地址;Replacing the false address in the first hardware execution command with the real address corresponding to the real memory space;
    将替换后的第一硬件执行命令作为第二硬件执行命令,发送给对应的硬件设备,以供所述对应的硬件设备执行所述第二硬件执行命令。The replaced first hardware execution command is sent to the corresponding hardware device as the second hardware execution command, so that the corresponding hardware device executes the second hardware execution command.
  14. 根据权利要求13所述的数据处理方法,其中,利用所述真实内存空间对应的真实地址替换掉所述第一硬件执行命令中的虚假地址,包括:The data processing method according to claim 13, wherein replacing the false address in the first hardware execution command with the real address corresponding to the real memory space comprises:
    对所述第一硬件执行命令进行识别,确定出当前包含虚假地址的部分或全部第一硬件执行命令,作为目标命令;Identifying the first hardware execution command, and determining part or all of the first hardware execution command currently containing the false address as the target command;
    利用所述真实内存空间对应的真实地址,替换掉所述目标命令中的虚假地址。The real address corresponding to the real memory space is used to replace the false address in the target command.
  15. 根据权利要求13或14所述的数据处理方法,其中,在将替换后的第一硬件执行命令作为第二硬件执行命令,发送给对应的硬件设备执行之后,所述数据处理方法还包括:The data processing method according to claim 13 or 14, wherein after sending the replaced first hardware execution command as the second hardware execution command to the corresponding hardware device for execution, the data processing method further comprises:
    将所述第二硬件执行命令中的真实地址替换为所述虚假内存空间对应的虚假地址,并将地址替换为虚假地址的命令进行缓存。The real address in the second hardware execution command is replaced with the false address corresponding to the false memory space, and the command in which the address is replaced with the false address is cached.
  16. 根据权利要求8至15任一项所述的数据处理方法,其中,基于所述虚假内存空间,将所述网络模型中包含的各个操作翻译成对应的第一硬件执行命令,包括:The data processing method according to any one of claims 8 to 15, wherein, based on the virtual memory space, translating each operation included in the network model into a corresponding first hardware execution command comprises:
    将所述网络模型中包含的各个操作的源代码编译成各操作分别对应的指令,并基于所述虚假内存空间,获得执行所述网络模型中包含的各个操作所需的相关信息,所述相关信息包括地址信息;以及Compiling source codes of various operations included in the network model into instructions corresponding to the various operations, and obtaining relevant information required to execute various operations included in the network model based on the virtual memory space, wherein the relevant information includes address information; and
    根据各个操作的对应的指令与执行各个操作所需的相关信息,生成所述第一硬件执行命令。The first hardware execution command is generated according to the corresponding instructions of each operation and the relevant information required to execute each operation.
  17. 一种数据处理方法,其中,所述数据处理方法包括:A data processing method, wherein the data processing method comprises:
    在需要运行网络模型时,获取预先存储的所述网络模型对应的目标硬件设备能够执行的硬件执行命令;When the network model needs to be run, a pre-stored hardware execution command executable by a target hardware device corresponding to the network model is obtained;
    将所述硬件执行命令发送给所述目标硬件设备,以使所述目标硬件设备执行所述硬件执行命令,以实现在所述目标硬件设备上运行所述网络模型对输入数据进行处理的目的。The hardware execution command is sent to the target hardware device so that the target hardware device executes the hardware execution command, thereby achieving the purpose of running the network model on the target hardware device to process the input data.
  18. 根据权利要求17所述的数据处理方法,其中,在需要运行网络模型时,获取预先存储的所述网络模型对应的目标硬件设备能够执行的硬件执行命令,包括:The data processing method according to claim 17, wherein, when it is necessary to run the network model, obtaining the pre-stored hardware execution command executable by the target hardware device corresponding to the network model comprises:
    在需要执行所述网络模型时,将所述网络模型对应的网络原始数据加载到真实内存空间,获取预先存储的第一硬件执行命令,其中,所述第一硬件执行命令是基于虚假内存空间对所述网络模型中包含的各个操作进行翻译得到的,所述虚假内存空间与所述真实内存空间具备相同属性;以及When the network model needs to be executed, the network original data corresponding to the network model is loaded into the real memory space, and a pre-stored first hardware execution command is obtained, wherein the first hardware execution command is obtained by translating each operation included in the network model based on the virtual memory space, and the virtual memory space has the same attributes as the real memory space; and
    利用所述真实内存空间对应的真实地址替换掉所述第一硬件执行命令中的虚假地址。The false address in the first hardware execution command is replaced by the real address corresponding to the real memory space.
  19. 根据权利要求18所述的数据处理方法,其中,在利用所述真实内存空间对应的真实地址替换掉所述第一硬件执行命令中的虚假地址之后,所述数据处理方法还包括:将替换后的第一硬件执行命令作为第二硬件执行命令,发送给对应的硬件设备。According to the data processing method of claim 18, after replacing the false address in the first hardware execution command with the real address corresponding to the real memory space, the data processing method further includes: sending the replaced first hardware execution command as the second hardware execution command to the corresponding hardware device.
  20. 根据权利要求19所述的数据处理方法，其中，在将替换后的第一硬件执行命令作为第二硬件执行命令，发送给对应的硬件设备执行之后，所述数据处理方法还包括：将所述第二硬件执行命令中的真实地址替换为虚假内存空间对应的虚假地址，并缓存地址替换为虚假地址的第二硬件执行命令。The data processing method according to claim 19, wherein, after the replaced first hardware execution command is sent, as the second hardware execution command, to the corresponding hardware device for execution, the data processing method further comprises: replacing the real address in the second hardware execution command with a false address corresponding to the false memory space, and caching the second hardware execution command in which the address has been replaced with the false address.
  21. 一种数据处理装置,其中,所述数据处理装置包括:A data processing device, wherein the data processing device comprises:
    获取模块,所述获取模块被配置成用于:获取待运行的网络模型的计算图;An acquisition module, wherein the acquisition module is configured to: acquire a computational graph of a network model to be run;
    命令生成模块,所述命令生成模块被配置成用于:将所述网络模型的计算图中的各个操作翻译成目标硬件设备能够执行的硬件执行命令,所述硬件执行命令中包含所述目标硬件设备的设备信息;以及A command generation module, the command generation module is configured to: translate each operation in the computation graph of the network model into a hardware execution command executable by a target hardware device, the hardware execution command including device information of the target hardware device; and
    存储模块,所述存储模块被配置成用于:利用网络执行图存储所述硬件执行命令,其中,所述网络执行图用于记录为所述网络模型生成的所有硬件执行命令,所述目标硬件设备用于通过执行所述网络执行图中的硬件执行命令来运行所述网络模型。A storage module, wherein the storage module is configured to: store the hardware execution commands using a network execution graph, wherein the network execution graph is used to record all hardware execution commands generated for the network model, and the target hardware device is used to run the network model by executing the hardware execution commands in the network execution graph.
  22. 根据权利要求21所述的数据处理装置,其中,所述命令生成模块包括:The data processing device according to claim 21, wherein the command generation module comprises:
    分配模块,所述分配模块被配置成用于:为所述网络模型分配对应的虚假内存空间;以及An allocation module, the allocation module being configured to: allocate a corresponding virtual memory space for the network model; and
    翻译模块,所述翻译模块被配置成用于:基于所述虚假内存空间,将所述网络模型中包含的各个操作翻译成对应的第一硬件执行命令,所述第一硬件执行命令中的地址均为虚假地址,所述虚假内存空间与真实内存空间具备相同属性。A translation module, wherein the translation module is configured to: based on the virtual memory space, translate each operation contained in the network model into a corresponding first hardware execution command, wherein the addresses in the first hardware execution command are all virtual addresses, and the virtual memory space has the same properties as the real memory space.
  23. 根据权利要求22所述的数据处理装置,其中,所述存储模块还被配置成用于:存储所述第一硬件执行命令,所述第一硬件执行命令用于在被进行地址替换后提供给需要运行所述网络模型的硬件设备执行。The data processing device according to claim 22, wherein the storage module is further configured to: store the first hardware execution command, and the first hardware execution command is used to be provided to the hardware device that needs to run the network model for execution after the address is replaced.
  24. 一种数据处理装置,所述数据处理装置包括:A data processing device, comprising:
    获取模块，所述获取模块被配置成用于：在需要运行网络模型时，获取预先存储的所述网络模型对应的目标硬件设备能够执行的硬件执行命令；以及An acquisition module, the acquisition module being configured to: when a network model needs to be run, acquire pre-stored hardware execution commands executable by a target hardware device corresponding to the network model; and
    发送模块,所述发送模块被配置成用于:将所述硬件执行命令发送给所述目标硬件设备,以使所述目标硬件设备执行所述硬件执行命令,以实现在所述目标硬件设备上运行所述网络模型对输入数据进行处理的目的。A sending module is configured to: send the hardware execution command to the target hardware device so that the target hardware device executes the hardware execution command to achieve the purpose of running the network model on the target hardware device to process the input data.
  25. 根据权利要求24所述的数据处理装置,其中,所述获取模块包括:The data processing device according to claim 24, wherein the acquisition module comprises:
    第一硬件执行命令获取模块,所述第一硬件执行命令获取模块被配置成用于:在需要执行所述网络模型时,将所述网络模型对应的网络原始数据加载到真实内存空间,获取预先存储的第一硬件执行命令;其中,所述第一硬件执行命令是基于虚假内存空间对所述网络模型中包含的各个操作进行翻译得到的,所述虚假内存空间与真实内存空间具备相同属性;以及A first hardware execution command acquisition module, the first hardware execution command acquisition module is configured to: when the network model needs to be executed, load the network original data corresponding to the network model into the real memory space, and acquire a pre-stored first hardware execution command; wherein the first hardware execution command is obtained by translating each operation included in the network model based on the virtual memory space, and the virtual memory space has the same properties as the real memory space; and
    翻译模块,所述翻译模块被配置成用于:利用所述真实内存空间对应的真实地址替换掉所述第一硬件执行命令中的虚假地址。A translation module is configured to replace the false address in the first hardware execution command with the real address corresponding to the real memory space.
  26. 根据权利要求25所述的数据处理装置,The data processing device according to claim 25,
    其中,所述发送模块还被配置成用于:将替换后的第一硬件执行命令作为第二硬件执行命令,发送给对应的硬件设备。The sending module is further configured to send the replaced first hardware execution command as the second hardware execution command to the corresponding hardware device.
  27. 一种AI芯片,其中,所述AI芯片包括:An AI chip, wherein the AI chip comprises:
    内核,所述内核被配置成用于:获取待运行的网络模型的计算图,并将所述网络模型的计算图中的各个操作翻译成目标硬件设备能够执行的硬件执行命令,所述硬件执行命令中包含所述目标硬件设备的设备信息;以及A kernel, the kernel being configured to: obtain a computation graph of a network model to be run, and translate each operation in the computation graph of the network model into a hardware execution command executable by a target hardware device, wherein the hardware execution command includes device information of the target hardware device; and
    存储设备,所述存储设备被配置成用于:利用网络执行图存储所述硬件执行命令,其中,所述网络执行图用于记录为所述网络模型生成的所有硬件执行命令,所述目标硬件设备用于通过执行所述网络执行图中的硬件执行命令来运行所述网络模型。A storage device, wherein the storage device is configured to: store the hardware execution commands using a network execution graph, wherein the network execution graph is used to record all hardware execution commands generated for the network model, and the target hardware device is used to run the network model by executing the hardware execution commands in the network execution graph.
  28. 根据权利要求27所述的AI芯片,其中,The AI chip according to claim 27, wherein:
    所述内核还被配置成:用于为所述网络模型分配对应的虚假内存空间,并基于所述虚假内存空间,将所述网络模型中包含的各个操作翻译成对应的第一硬件执行命令,所述第一硬件执行命令中的地址均为虚假地址,所述虚假内存空间与真实内存空间具备相同属性;以及The kernel is further configured to: allocate a corresponding virtual memory space for the network model, and translate each operation included in the network model into a corresponding first hardware execution command based on the virtual memory space, wherein addresses in the first hardware execution command are all virtual addresses, and the virtual memory space has the same attributes as the real memory space; and
    所述存储设备还被配置成用于:存储所述第一硬件执行命令,所述第一硬件执行命令用于在被进行地址替换后提供给需要运行所述网络模型的硬件设备执行。The storage device is further configured to store the first hardware execution command, where the first hardware execution command is provided to a hardware device that needs to run the network model for execution after address replacement.
  29. 一种AI芯片,其中,所述AI芯片包括:An AI chip, wherein the AI chip comprises:
    硬件设备;hardware equipment;
    存储设备,所述存储设备被配置成用于:存储网络模型的计算图中的各个操作对应的硬件执行命令;以及A storage device, the storage device being configured to: store hardware execution commands corresponding to each operation in a computational graph of a network model; and
    内核,所述内核被配置成用于:在需要运行所述网络模型时,从所述存储设备中获取先存储的所述硬件执行命令,并将所述硬件执行命令发送给所述硬件设备,A kernel, wherein the kernel is configured to: when it is necessary to run the network model, obtain the previously stored hardware execution command from the storage device and send the hardware execution command to the hardware device,
    其中,所述硬件设备被配置成用于:执行所述硬件执行命令,以实现运行所述网络模型对输入数据进行处理的目的。Wherein, the hardware device is configured to: execute the hardware execution command to achieve the purpose of running the network model to process the input data.
  30. 根据权利要求29所述的AI芯片,其中,The AI chip according to claim 29, wherein:
    所述存储设备还被配置成:用于存储第一硬件执行命令,其中,所述第一硬件执行命令是基于虚假内存空间对所述网络模型中包含的各个操作进行翻译得到的,所述虚假内存空间与真实内存空间具备相同属性;The storage device is further configured to: store a first hardware execution command, wherein the first hardware execution command is obtained by translating each operation included in the network model based on a virtual memory space, and the virtual memory space has the same properties as the real memory space;
    所述内核还被配置成用于：在需要执行所述网络模型时，将所述网络模型对应的网络原始数据加载到所述真实内存空间，获取存储于所述存储设备中的第一硬件执行命令，利用所述真实内存空间对应的真实地址替换掉所述第一硬件执行命令中的虚假地址，并将替换后的第一硬件执行命令作为第二硬件执行命令，发送给所述硬件设备；The kernel is further configured to: when the network model needs to be executed, load the network original data corresponding to the network model into the real memory space, obtain the first hardware execution command stored in the storage device, replace the false address in the first hardware execution command with the real address corresponding to the real memory space, and send the replaced first hardware execution command, as the second hardware execution command, to the hardware device;
    所述硬件设备还被配置成用于:执行所述第二硬件执行命令。The hardware device is further configured to: execute the second hardware execution command.
  31. 一种电子设备,其中,所述电子设备包括:An electronic device, wherein the electronic device comprises:
    存储器和处理器,所述处理器与所述存储器连接;A memory and a processor, wherein the processor is connected to the memory;
    所述存储器被配置成用于:存储程序;The memory is configured to: store a program;
    所述处理器被配置成用于:调用存储于所述存储器中的程序,以执行如权利要求1至20中任一项所述的方法。The processor is configured to call a program stored in the memory to execute the method according to any one of claims 1 to 20.
  32. 一种计算机可读存储介质,其中,在所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器运行时,执行如权利要求1至20中任一项所述的方法。 A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method according to any one of claims 1 to 20 is executed.
PCT/CN2023/092113 2022-11-25 2023-05-04 Data processing method and apparatus, ai chip, electronic device, and storage medium WO2024108907A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202211486830.0A CN115586972B (en) 2022-11-25 2022-11-25 Command generation method and device, AI chip, electronic device and storage medium
CN202211486830.0 2022-11-25
CN202211486836.8A CN115576699B (en) 2022-11-25 2022-11-25 Data processing method, device, AI chip, electronic equipment and storage medium
CN202211486836.8 2022-11-25

Publications (1)

Publication Number Publication Date
WO2024108907A1 true WO2024108907A1 (en) 2024-05-30

Family

ID=91195095

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/092113 WO2024108907A1 (en) 2022-11-25 2023-05-04 Data processing method and apparatus, ai chip, electronic device, and storage medium

Country Status (1)

Country Link
WO (1) WO2024108907A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070079106A1 (en) * 2005-09-22 2007-04-05 International Business Machines Corporation Method and apparatus for translating a virtual address to a real address using blocks of contiguous page table entries
CN110647981A (en) * 2019-09-23 2020-01-03 北京中科寒武纪科技有限公司 Data processing method, data processing device, computer equipment and storage medium
CN112529169A (en) * 2019-09-18 2021-03-19 华为技术有限公司 Data processing method, model optimization device and model execution device
CN114461221A (en) * 2022-01-27 2022-05-10 北京奕斯伟计算技术有限公司 Compiling method, compiling device, electronic device, and storage medium
CN114528022A (en) * 2015-04-24 2022-05-24 优创半导体科技有限公司 Computer processor implementing pre-translation of virtual addresses
CN115576699A (en) * 2022-11-25 2023-01-06 成都登临科技有限公司 Data processing method, data processing device, AI chip, electronic device and storage medium
CN115586972A (en) * 2022-11-25 2023-01-10 成都登临科技有限公司 Command generation method and device, AI chip, electronic device and storage medium

Similar Documents

Publication Publication Date Title
US11010681B2 (en) Distributed computing system, and data transmission method and apparatus in distributed computing system
US9047196B2 (en) Usage aware NUMA process scheduling
US8700838B2 (en) Allocating heaps in NUMA systems
CN111078323B (en) Data processing method and device based on coroutine, computer equipment and storage medium
US20220292082A1 (en) Method, apparatus and device for parallel execution of smart contract, and medium
CN111309649B (en) Data transmission and task processing method, device and equipment
CA2616070A1 (en) Adaptive process dispatch in a computer system having a plurality of processors
JP2014504768A (en) Method, computer program product, and apparatus for progressively unloading classes using a region-based garbage collector
US9063805B2 (en) Method and system for enabling access to functionality provided by resources outside of an operating system environment
CN115576699B (en) Data processing method, device, AI chip, electronic equipment and storage medium
JP2000347876A (en) Method and device for stack slot allocation
US11366689B2 (en) Hardware for supporting OS driven observation and anticipation based on more granular, variable sized observation units
US20230289187A1 (en) Method and apparatus for rectifying weak memory ordering problem
US20240143397A1 (en) Data processing method and system, and related device
CN111666210A (en) Chip verification method and device
KR102326280B1 (en) Method, apparatus, device and medium for processing data
CN114253713B (en) Asynchronous batch processing method and system based on reactor
CN115586972B (en) Command generation method and device, AI chip, electronic device and storage medium
WO2024108907A1 (en) Data processing method and apparatus, ai chip, electronic device, and storage medium
CN116680209A (en) WASM-based multi-intelligent contract instance management method
CN112214443B (en) Secondary unloading device and method arranged in graphic processor
US20120137300A1 (en) Information Processor and Information Processing Method
CN113177211A (en) FPGA chip for privacy computation, heterogeneous processing system and computing method
US20130166887A1 (en) Data processing apparatus and data processing method
CN110618794A (en) Method and system for accessing NandFlash by SSD firmware