WO2020259020A1 - Instruction block processing method and apparatus, storage medium, and electronic device - Google Patents


Info

Publication number
WO2020259020A1
WO2020259020A1 (PCT/CN2020/085180, CN2020085180W)
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
instruction block
jump
neural network
blocks
Prior art date
Application number
PCT/CN2020/085180
Other languages
French (fr)
Chinese (zh)
Inventor
姚海东
徐东
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Priority date
Filing date
Publication date
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Publication of WO2020259020A1 publication Critical patent/WO2020259020A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 - Arrangements for software engineering
    • G06F 8/40 - Transformation of program code
    • G06F 8/41 - Compilation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 - Arrangements for software engineering
    • G06F 8/60 - Software deployment
    • G06F 8/61 - Installation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 - Arrangements for software engineering
    • G06F 8/60 - Software deployment
    • G06F 8/61 - Installation
    • G06F 8/63 - Image based installation; Cloning; Build to order

Definitions

  • This application relates to the field of neural networks, and in particular to a method and device for processing instruction blocks, storage media, and electronic devices.
  • Solving business problems with a deep neural network model requires performing an inference process.
  • Devices that perform inference calculations generally include central processing units (Central Processing Unit, CPU), graphics processing units (Graphics Processing Unit, GPU), and field-programmable gate arrays (Field Programmable Gate Array, FPGA).
  • In a face recognition business scenario, a deep neural network model must first be called to detect whether an image contains a human face (face detection); if there is a face image, that image is input into another deep neural network for inference to obtain detailed feature information of the face image and perform identification (face recognition), finally obtaining the result required by the business.
  • the embodiments of the present application provide a method and device for processing instruction blocks, a storage medium, and an electronic device to solve the problems of how to schedule and process different instructions for one or more neural network systems in the related art.
  • According to one embodiment of the present application, a method for processing instruction blocks is provided, including: compiling a description file of a neural network model by a compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed; and loading the image package and processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
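  • To make the structure of the image package concrete, the following is a minimal Python sketch; the class and field names (ImagePackage, InstructionBlock, and so on) are illustrative assumptions, not identifiers from the patent:

```python
from dataclasses import dataclass

@dataclass
class InstructionBlock:
    name: str            # e.g. "NET_C0" (CPU block) or "NET_D1" (device block)
    device: str          # "C" = host CPU, "D" = acceleration device
    instructions: list   # opaque instruction stream for the target device

@dataclass
class ImagePackage:
    blocks: dict         # instruction block group: name -> InstructionBlock
    sequence_table: list # running order, e.g. ["C0", "D1", "C1", "D2", "C2"]
    jump_map: dict       # jump instruction mapping: block index -> next block to run
```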
  • In an embodiment of the present application, compiling the description file of the neural network model by the compilation module to obtain the image package includes: compiling the description files of multiple neural network models by the compilation module to obtain image packages corresponding to the multiple neural network models.
  • In an embodiment of the present application, processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table includes: instructing the execution devices specified in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
  • After the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table, the method further includes: for each instruction, after the instruction is processed in the running order, caching the obtained data in a pre-allocated buffer area.
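  • Continuing the sketch above, a scheduler that honors the sequence table and the per-step caching described here might look as follows (a simplified sketch; the host and device executor objects and their execute method are assumptions):

```python
def run_sequence(pkg, host, device, input_data):
    """Walk the instruction block sequence table, dispatch each block to the
    executor named by its 'C'/'D' prefix, and cache each block's output in a
    pre-allocated buffer area so the next block can consume it."""
    buffer = [None] * len(pkg.sequence_table)  # pre-allocated cache area
    data = input_data
    for i, entry in enumerate(pkg.sequence_table):  # e.g. "C0", "D1", ...
        block = pkg.blocks["NET_" + entry]
        executor = host if entry.startswith("C") else device
        data = executor.execute(block, data)
        buffer[i] = data  # cache the obtained data after each step
    return data
```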
  • According to another embodiment of the present application, an instruction block processing device is provided, including: a compilation module, configured to compile the description file of the neural network model to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed; and a processing module, configured to load the image package and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
  • the compilation module is configured to compile the description files of multiple neural network models to obtain image packages corresponding to the multiple neural network models.
  • the processing module is further configured to instruct the execution devices specified in the instruction block sequence table to process the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
  • the processing module is further configured to, for each instruction, cache the obtained data in a pre-allocated buffer area after the instruction is processed in the running order.
  • a storage medium in which a computer program is stored, wherein the computer program is configured to execute the steps in any one of the foregoing method embodiments when running.
  • an electronic device, including a memory and a processor; a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
  • FIG. 1 is a block diagram of the hardware structure of a terminal for an instruction block processing method according to an embodiment of the present application;
  • FIG. 2 is a flowchart of a method for processing an instruction block according to an embodiment of the present application;
  • FIG. 3 is a structural block diagram of an instruction block processing device according to an embodiment of the present application;
  • FIG. 4 is a schematic diagram of the workflow of a compilation module according to a preferred embodiment of the present application;
  • FIG. 5 is a schematic diagram of the structure of an image package according to a preferred embodiment of the present application;
  • FIG. 6 is a functional block diagram of a running state module according to a preferred embodiment of the present application;
  • FIG. 7 is a schematic diagram of input and output buffers and control information according to a preferred embodiment of the present application;
  • FIG. 8 is a schematic diagram of adding acceleration device instruction blocks and jump instructions according to a preferred embodiment of the present application;
  • FIG. 9 is a mapping table of acceleration device instruction block jump positions according to a preferred embodiment of the present application;
  • FIG. 10 is a flowchart of an acceleration device running according to instructions according to a preferred embodiment of the present application;
  • FIG. 11 is an internal functional block diagram of an acceleration device according to a preferred embodiment of the present application;
  • FIG. 12 is a flowchart of the interaction between a host and an acceleration device according to a preferred embodiment of the present application;
  • FIG. 13 is an overall system block diagram according to a preferred embodiment of the present application.
  • FIG. 1 is a hardware structure block diagram of a terminal in a method for processing instruction blocks in an embodiment of the present application.
  • the terminal 10 may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microcontroller (MCU) or a programmable logic device such as an FPGA) and a memory 104 for storing data; optionally, the terminal may also include a transmission device 106 and an input/output device 108 for communication functions.
  • the terminal 10 may also include more or fewer components than those shown in FIG. 1, or have a different configuration with functions equivalent to or beyond those shown in FIG. 1.
  • the memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the navigation method for online ride-hailing in the embodiment of the present application; the processor 102 runs the computer programs stored in the memory 104, thereby executing various functional applications and data processing, that is, implementing the above method.
  • the memory 104 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 104 may further include memory located remotely with respect to the processor 102, and such remote memory may be connected to the terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the transmission device 106 is used to receive or send data via a network.
  • the aforementioned specific examples of the network may include a wireless network provided by the communication provider of the terminal 10.
  • the transmission device 106 includes a network adapter (Network Interface Controller, NIC for short), which can be connected to other network devices through a base station to communicate with the Internet.
  • the transmission device 106 may be a radio frequency (Radio Frequency, referred to as RF) module, which is used to communicate with the Internet in a wireless manner.
  • FIG. 2 is a flowchart of the method for processing an instruction block according to an embodiment of the present application. As shown in FIG. 2, the process includes the following steps:
  • Step S202: The description file of the neural network model is compiled by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed;
  • Step S204: Load the image package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
  • Through this application, the description file of the neural network model is compiled by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed; the image package is loaded, and the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table. This solves the problem in the related art of how to schedule and process different instruction blocks for one or more neural network systems, so that the multiple instruction blocks of an instruction block group can be processed flexibly.
  • In an embodiment of the present application, compiling the description file of the neural network model by the compilation module to obtain the image package includes: compiling the description files of multiple neural network models by the compilation module to obtain image packages corresponding to the multiple neural network models.
  • In an embodiment of the present application, processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table includes: instructing the execution devices specified in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
  • After the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table, the method further includes: for each instruction, after the instruction is processed in the running order, caching the obtained data in a pre-allocated buffer area.
  • the method according to the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform; of course, it can also be implemented by hardware, but in many cases the former is the better implementation.
  • the technical solution of this application, in essence or in the part contributing to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions to enable a terminal device (which can be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.
  • an instruction block processing device is also provided; the device is used to implement the above embodiments and preferred implementations, and what has already been described will not be repeated.
  • the term "module" may refer to a combination of software and/or hardware that implements a predetermined function.
  • although the devices described in the following embodiments are preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and conceived.
  • Fig. 3 is a structural block diagram of an instruction block processing device according to an embodiment of the present application. As shown in Fig. 3, the device includes:
  • the compilation module 30 is configured to compile the description file of the neural network model to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block;
  • the execution device includes a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed;
  • the processing module 32 is configured to load the image package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
  • Through this application, the description file of the neural network model is compiled by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed; the image package is loaded, and the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table. This solves the problem in the related art of how to schedule and process different instruction blocks for one or more neural network systems, so that the multiple instruction blocks of an instruction block group can be processed flexibly.
  • the compilation module 30 is configured to compile the description files of multiple neural network models to obtain image packages corresponding to the multiple neural network models.
  • the processing module 32 is further configured to instruct the execution devices specified in the instruction block sequence table to process the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
  • the processing module 32 is further configured to, for each instruction, cache the obtained data in a pre-allocated buffer area after the instruction is processed in the running order.
  • the preferred embodiment of this application focuses on implementing the inference function of a face detection service.
  • neural network model 1 performs the face detection function: it recognizes whether a picture contains a face and gives the position of the face in the picture.
  • neural network model 2 performs face recognition: it extracts the features of the face located by neural network model 1, compares them against a database, and gives the recognition result. Generally, some preprocessing is required before data is input into a neural network model, and some post-processing work is performed after the network model runs.
  • the technical solution of the preferred embodiment of the present application includes the following steps:
  • Step 1: Input the two neural network models into the compilation module for compilation.
  • Step 1.1: The two neural network model description files are input into the compilation module for compilation;
  • Step 1.2: The compilation module compiles them and finally outputs the image package, as shown in FIG. 4;
  • the image package includes the instruction block group, the instruction block sequence description, and the acceleration device jump mapping table.
  • the instruction block group includes NET_C0, NET_D1, NET_C1, NET_D2, and NET_C2, which respectively correspond to: the CPU performing face detection preprocessing (NET_C0), the acceleration device performing face detection (NET_D1), the CPU performing face detection post-processing and face recognition preprocessing (NET_C1), the acceleration device performing face recognition processing (NET_D2), and the CPU performing face recognition post-processing to complete the business (NET_C2);
  • the instruction block sequence table gives C0-D1-C1-D2-C2, where Cx indicates that the CPU performs the calculation processing with sequence number x, and Dx indicates that the acceleration device (Device) performs the instruction segment with sequence number x.
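  • To make the Cx/Dx naming convention concrete, here is a small, hypothetical parser for sequence table entries (the entry format follows the example above; the function itself is illustrative):

```python
def parse_entry(entry: str):
    """Split a sequence table entry such as "C0" or "D1" into (device, index):
    'C' is the host CPU, 'D' is the acceleration device (Device)."""
    device, index = entry[0], int(entry[1:])
    if device not in ("C", "D"):
        raise ValueError(f"unknown device type in entry {entry!r}")
    return device, index

# The face recognition service's sequence C0-D1-C1-D2-C2 parses to:
# [('C', 0), ('D', 1), ('C', 1), ('D', 2), ('C', 2)]
sequence = [parse_entry(e) for e in "C0-D1-C1-D2-C2".split("-")]
```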
  • the preferred embodiment of the present application assumes that the related operations described by neural network model 1 are processed by the acceleration device as a whole (NET_D1), and that the calculation requirements described by neural network model 2 are likewise processed by the acceleration device (NET_D2). The CPU performs face detection preprocessing (NET_C0), and the output result is submitted to the acceleration device, which is to perform face detection.
  • after the acceleration device receives the relevant input, it completes the face detection (NET_D1) operation and outputs information such as the face position. The CPU obtains this output, performs face detection post-processing and face recognition preprocessing (described by NET_C1), and submits the processed data to the acceleration device for face recognition (NET_D2) processing. After that processing is completed, the result is submitted to the CPU for neural network model face recognition post-processing (NET_C2), completing the overall business function;
  • if the acceleration device supports only part of the operations described by a neural network model, the model can be divided into multiple segments; for details, refer to preferred embodiment 2.
  • Step 2: Related processing flow in the running state.
  • the image package produced in the compilation stage is submitted to the running state for operation.
  • the running state includes the following modules: a loading module, an acceleration device control and management module, an input/output management module, and an upper-level API interface, among which:
  • the loading module completes the loading of the instruction block groups onto the corresponding devices; the acceleration device management module controls the startup, stop, and reset of the acceleration device; and the API interface handles the interaction with upper-level users;
  • the input/output management module handles the input/output interaction with the acceleration device (FIG. 11 shows the internal block diagram of the acceleration device) and organizes the items to run through the control information contained in each buffer item. Specifically, seen from the acceleration device side, there are an input buffer (InBuffer) and an output buffer (OutBuffer). As shown in FIG. 7, the buffer content has two blocks: one block of control information and one block of data. The control information includes a picture sequence number Px and the instruction block's processing device and serial number Tx, where x is a number and T is a device type, either C or D: C represents the HOST-side CPU and D represents the acceleration device (Device).
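  • The buffer item layout described above can be sketched as follows; the type and field names are assumptions for illustration rather than structures defined by the patent:

```python
from dataclasses import dataclass

@dataclass
class ControlInfo:
    picture_seq: int  # Px: sequence number of the picture being processed
    device: str       # T: "C" for the HOST-side CPU, "D" for the acceleration device
    block_index: int  # x: serial number of the instruction block to run

@dataclass
class BufferItem:
    control: ControlInfo  # e.g. ControlInfo(1, "D", 1) encodes "P1-NET_D1"
    data: bytes           # payload produced by the previous processing step
```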
  • Step 2.1: Use the API interface to set the input and output and complete the programming;
  • Step 2.2: A general compilation tool (such as gcc) compiles the code and generates an executable file;
  • Step 2.3: Run the executable file; the running process is shown in FIG. 12.
  • the CPU is first scheduled to perform face detection preprocessing (NET_C0). After the calculation is completed, the data is filled into the InBuffer and the control information is filled in as P1-NET_D1, indicating that the face detection neural network model inference process is required.
  • the acceleration device obtains the content in the InBuffer and performs the face detection (NET_D1) instruction processing. After the processing is completed, the data is filled into the OutBuffer and the input control information (P1-NET_D1) is copied over at the same time.
  • after the HOST side obtains the content from the device's OutBuffer, it determines from the instruction block sequence table that face detection post-processing and face recognition neural network model preprocessing (NET_C1) are to be performed. After the processing is completed, it fills the data into the InBuffer and, according to the instruction block sequence table, fills in the control information as P1-NET_D2 (face recognition neural network model operation).
  • the acceleration device obtains this item, performs the face recognition (NET_D2) calculation, outputs the data, and copies the control information P1-NET_D2.
  • the CPU side obtains this item and, according to the instruction block sequence table, performs face recognition post-processing (NET_C2), completing the overall inference.
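  • The host-side half of this ping-pong exchange might look like the following sketch, reusing the BufferItem and ControlInfo types from the earlier sketch and assuming blocking buffer reads and writes (all identifiers are illustrative):

```python
def host_inference(pkg, host, device, picture_seq, picture):
    # NET_C0: face detection preprocessing on the host CPU
    data = host.execute(pkg.blocks["NET_C0"], picture)
    device.in_buffer.write(BufferItem(ControlInfo(picture_seq, "D", 1), data))

    # The device runs NET_D1 and fills its OutBuffer; the host picks the item
    # up, consults the sequence table, and runs the next CPU block, NET_C1.
    item = device.out_buffer.read()
    data = host.execute(pkg.blocks["NET_C1"], item.data)
    device.in_buffer.write(BufferItem(ControlInfo(picture_seq, "D", 2), data))

    # The device runs NET_D2; the host finishes with NET_C2 post-processing.
    item = device.out_buffer.read()
    return host.execute(pkg.blocks["NET_C2"], item.data)
```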
  • the acceleration device has two different functions, face detection (NET_D1) and face recognition (NET_D2).
  • time-division multiplexing uses jump instructions and the jump mapping table to switch between the neural network model inference functions. The relevant details are as follows:
  • Step a: According to the device situation, the compilation module generates the neural network model instruction blocks NET_Tx and dispatches them to the different devices (in this example, face detection processing (NET_D1) and face recognition processing (NET_D2));
  • Step b: The compilation module adds a jump instruction JMP 0 after each acceleration device instruction block NET_Dx;
  • Step c: The compilation module generates the acceleration device jump mapping table (it can also be generated in the running state; generation by the compilation module is described here).
  • The jump mapping table is shown in FIG. 9.
  • Step d: The compilation module adds a buff acquisition instruction and a JMP Rj instruction before the acceleration device instruction block NET_D0 (see FIG. 8).
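  • Putting steps a through d together, the device-side control flow reduces to a fetch/look-up/jump loop. The sketch below models it in Python; the Rj register, the jump mapping table lookup, and the trailing JMP 0 follow the description above, while every identifier is an illustrative assumption:

```python
def device_loop(device, jump_map):
    """Hypothetical acceleration device loop: the buff acquisition instruction
    fetches an input item, JMP Rj jumps to the instruction block named in the
    control information, and the JMP 0 appended after each NET_Dx block
    returns control to the fetch step."""
    while True:
        item = device.in_buffer.read()           # buff acquisition instruction
        rj = jump_map[item.control.block_index]  # e.g. 1 -> start of NET_D1
        result = device.run_block_at(rj, item.data)  # JMP Rj into the block
        device.out_buffer.write(BufferItem(item.control, result))
        # JMP 0: fall through back to the top of the loop
```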
  • Preferred embodiment 2: A single neural network model needs to run jointly on the host and the acceleration device.
  • Step 2.1: The neural network model is input into the compilation module, compiled, and the image package is output;
  • the neural network model's instruction block combination and instruction block sequence table are C0-D1-C1-D2-C2, meaning the neural network model is first preprocessed by the host CPU, then processed by the acceleration device, then by the CPU, then by the acceleration device again, and finally by the CPU;
  • the device jump table example is the same as in preferred embodiment 1 and will not be repeated here.
  • Step 2.2: Write the code against the API interface, compile it, and generate an executable file;
  • Step 2.3: Run it in the running state on the HOST side, loading and running the image; the process is roughly the same as step 2 in preferred embodiment 1;
  • Step 2.4: The running state continues to run and continuously produces inference calculation results.
  • Preferred embodiment 3: A combination of multiple neural network models, where a single neural network model needs to be split multiple times.
  • Step 3.1: Input the multiple neural network models into the compilation module, compile them, and output the image package;
  • the neural network model instruction block combination and instruction block sequence table can be set as C0-D1-C1-D2-C2-D3-C3-C4-D4-C5-D5-C6;
  • the device jump table example is the same as in preferred embodiment 1.
  • Step 3.2: The other steps are the same as in preferred embodiment 2.
  • the embodiments of the present application and the content disclosed in the preferred embodiments are applicable not only to combined inference services over multiple neural network models, but also to services that require a combination of the HOST and the acceleration device to complete a single neural network model.
  • a combination of multiple neural network models in which one neural network model is completed jointly by the HOST and an acceleration device can also be applied, as can a combination of a host and multiple acceleration devices; these all fall within the protection scope of this application.
  • aiming at the problem that business inference involving combinations of multiple neural network models is difficult to implement, the technical solutions of the above embodiments and preferred embodiments of the present application provide a method, device, and system that compile and run in two stages and use jump instructions, a jump mapping table, and the instruction block sequence table to implement combined multi-neural-network-model business inference.
  • an acceleration device uses a simple jump instruction, driven by a jump mapping table, to perform time-division multiplexing and thereby complete different computing functions;
  • a neural network model acceleration device includes: an instruction cache for storing related instructions; a functional unit set module implementing the calculation modules related to the neural network model; jump instructions and a jump mapping table for switching between neural network model function groups; a register group; and so on;
  • a compilation module for deep neural network models compiles and converts a deep neural network model into a related instruction set, and generates the related instruction block sequence table and jump mapping table;
  • a running state module for neural network model inference includes a loading module, which loads the related image to a specific location, and device control, which controls the start, stop, and reset of acceleration devices;
  • the device's input and output management provides the data and processing requirements to the device and obtains the device's processing results; the running state also includes the programming interface (API) provided to business users, as sketched below;
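  • As one way to picture how these running state responsibilities could be grouped behind the user-facing API, here is a hypothetical interface sketch (every name below is an assumption, not an API defined by the patent):

```python
class Runtime:
    """Hypothetical running state facade over the modules listed above."""

    def load(self, image_path: str) -> None:
        """Loading module: place each instruction block group on its device."""

    def start(self) -> None:
        """Device control: start the acceleration device."""

    def stop(self) -> None:
        """Device control: stop the acceleration device."""

    def reset(self) -> None:
        """Device control: reset the acceleration device."""

    def submit(self, data: bytes, control: "ControlInfo") -> None:
        """I/O management: hand data and its control information to the device."""

    def poll_result(self) -> "BufferItem":
        """I/O management: fetch a processed item from the device's OutBuffer."""
```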
  • using the compiler and runtime system (shown in FIG. 13), combined inference over multiple neural network models can be implemented conveniently, quickly, and efficiently, and the related business functions completed simply.
  • the embodiment of the present application also provides a storage medium in which a computer program is stored, wherein the computer program is configured to execute the steps in any of the foregoing method embodiments when running.
  • the foregoing storage medium may be configured to store a computer program for executing the following steps:
  • S1: Compile the description file of the neural network model by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed;
  • S2: Load the image package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
  • the foregoing storage medium may include, but is not limited to: a USB flash drive, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a mobile hard disk, a magnetic disk, an optical disk, and various other media that can store computer programs.
  • the embodiment of the present application also provides an electronic device, including a memory and a processor; a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any of the foregoing method embodiments.
  • the aforementioned electronic device may further include a transmission device and an input/output device, where the transmission device is connected to the processor and the input/output device is connected to the processor.
  • the foregoing processor may be configured to execute the following steps through a computer program:
  • S1: Compile the description file of the neural network model by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed;
  • S2: Load the image package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
  • the above modules or steps of this application can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices; optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; and in some cases, the steps shown or described can be executed in an order different from the one here, or they can each be made into individual integrated circuit modules, or multiple of the modules or steps can be made into a single integrated circuit module. In this way, this application is not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

An instruction block processing method and apparatus, a storage medium and an electronic apparatus. Said method comprises: compiling, by means of a compiling module, a description file of a neural network model, to obtain an image package, the image package comprising an instruction block group, an instruction block sequence table, a jump instruction mapping table, the instruction block group comprising a plurality of instruction blocks to be processed, the instruction block sequence table being used to indicate the operation sequence of the plurality of instruction blocks, and an execution device for operating the instruction blocks, the execution device comprising a processor, an acceleration device, a jump instruction being provided after each of the instruction blocks, the jump instruction mapping table comprising the jump instruction and a next instruction block to be executed (S202); and loading the image package and processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table (S204).

Description

Method and device for processing instruction blocks, storage medium, and electronic device
Cross-reference to related applications
This application is filed based on the Chinese patent application with application number 201910562823.6, filed on June 26, 2019, and claims the priority of that Chinese patent application, the entire content of which is hereby incorporated into this application by reference.
Technical field
This application relates to the field of neural networks, and in particular to a method and device for processing instruction blocks, a storage medium, and an electronic device.
Background
With the tremendous increase in computing power and the easy availability of big data, deep learning technology has made great progress; more and more problems, such as image processing and natural language analysis, can be solved well through deep learning technology.
Solving business problems with a deep neural network model requires performing an inference process. Devices that perform inference calculations generally include central processing units (Central Processing Unit, CPU), graphics processing units (Graphics Processing Unit, GPU), and field-programmable gate arrays (Field Programmable Gate Array, FPGA). In the process of putting this type of business into practice, efficiently using resources and quickly obtaining results requires both a deep understanding of the computing and storage architecture of the inference device and a deep understanding of the computing requirements described by the deep neural network. This is often quite difficult and takes a long time.
In particular, some business functions often require a combination of multiple neural networks. For example, in a face recognition business scenario, a deep neural network model must first be called to detect whether an image contains a human face (face detection); if there is a face image, that image is then input into another deep neural network for inference to obtain detailed feature information of the face image and perform identification (face recognition), finally obtaining the result required by the business.
Regarding the related art, there is currently no effective solution to problems such as how to schedule and process different instruction blocks for one or more neural network systems.
Summary of the invention
The embodiments of the present application provide a method and device for processing instruction blocks, a storage medium, and an electronic device, to solve problems in the related art such as how to schedule and process different instructions for one or more neural network systems.
According to an embodiment of the present application, a method for processing instruction blocks is provided, including: compiling a description file of a neural network model by a compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed; and loading the image package and processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
In an embodiment of the present application, compiling the description file of the neural network model by the compilation module to obtain the image package includes: compiling the description files of multiple neural network models by the compilation module to obtain image packages corresponding to the multiple neural network models.
In an embodiment of the present application, processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table includes: instructing the execution devices specified in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
In an embodiment of the present application, after the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table, the method further includes: for each instruction, after the instruction is processed in the running order, caching the obtained data in a pre-allocated buffer area.
According to another embodiment of the present application, an instruction block processing device is also provided, including: a compilation module, configured to compile the description file of the neural network model to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed; and a processing module, configured to load the image package and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
In an embodiment of the present application, the compilation module is configured to compile the description files of multiple neural network models to obtain image packages corresponding to the multiple neural network models.
In an embodiment of the present application, the processing module is further configured to instruct the execution devices specified in the instruction block sequence table to process the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
In an embodiment of the present application, the processing module is further configured to, for each instruction, cache the obtained data in a pre-allocated buffer area after the instruction is processed in the running order.
According to yet another embodiment of the present application, a storage medium is also provided; a computer program is stored in the storage medium, and the computer program is configured to execute the steps in any one of the foregoing method embodiments when running.
According to yet another embodiment of the present application, an electronic device is also provided, including a memory and a processor; a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the foregoing method embodiments.
Description of the drawings
The drawings described here are used to provide a further understanding of this application and constitute a part of this application. The exemplary embodiments of this application and their descriptions are used to explain this application and do not constitute an improper limitation of this application. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a terminal for an instruction block processing method according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for processing an instruction block according to an embodiment of the present application;
FIG. 3 is a structural block diagram of an instruction block processing device according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the workflow of a compilation module according to a preferred embodiment of the present application;
FIG. 5 is a schematic diagram of the structure of an image package according to a preferred embodiment of the present application;
FIG. 6 is a functional block diagram of a running state module according to a preferred embodiment of the present application;
FIG. 7 is a schematic diagram of input and output buffers and control information according to a preferred embodiment of the present application;
FIG. 8 is a schematic diagram of adding acceleration device instruction blocks and jump instructions according to a preferred embodiment of the present application;
FIG. 9 is a mapping table of acceleration device instruction block jump positions according to a preferred embodiment of the present application;
FIG. 10 is a flowchart of an acceleration device running according to instructions according to a preferred embodiment of the present application;
FIG. 11 is an internal functional block diagram of an acceleration device according to a preferred embodiment of the present application;
FIG. 12 is a flowchart of the interaction between a host and an acceleration device according to a preferred embodiment of the present application;
FIG. 13 is an overall system block diagram according to a preferred embodiment of the present application.
具体实施方式Detailed ways
下文中将参考附图并结合实施例来详细说明本申请。需要说明的是,在不冲突的情况下,本申请中的实施例及实施例中的特征可以相互组合。Hereinafter, the application will be described in detail with reference to the drawings and in conjunction with embodiments. It should be noted that the embodiments in this application and the features in the embodiments can be combined with each other if there is no conflict.
需要说明的是,本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。It should be noted that the terms "first" and "second" in the description and claims of the application and the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific sequence or sequence.
实施例1Example 1
本申请实施例1所提供的方法实施例可以在终端或者类似的运算装置中执行。以运行在终端上为例,图1是本申请实施例的一种指令块的处理方法的终端的硬件结构框图。如图1所示,终端10可以包括一个或多个(图1中仅示出一个)处理器102(处理器102可以包括但不限于微处理器MCU或可编程逻辑器件FPGA等的处理装置)和用于存储数据的存储器104,可选地,上述终端还可以包括用于通信功能的传输设备106以及输入输出设备108。本领域普通技术人员可以理解,图1所示的结构仅为示意,其并不对上述终端的结构造成限定。例如,终端10还可包括比图1中所示更多或者更少的组件,或者具有与图1所示等同功能或比图1所示功能更多的不同的配置。The method embodiment provided in Embodiment 1 of the present application may be executed in a terminal or similar computing device. Taking running on a terminal as an example, FIG. 1 is a hardware structure block diagram of a terminal in a method for processing instruction blocks in an embodiment of the present application. As shown in FIG. 1, the terminal 10 may include one or more (only one is shown in FIG. 1) processor 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) And the memory 104 for storing data, optionally, the aforementioned terminal may also include a transmission device 106 and an input/output device 108 for communication functions. Those of ordinary skill in the art can understand that the structure shown in FIG. 1 is only for illustration, and does not limit the structure of the foregoing terminal. For example, the terminal 10 may also include more or fewer components than those shown in FIG. 1, or have the same functions as those shown in FIG. 1 or more different configurations than those shown in FIG. 1.
存储器104可用于存储计算机程序,例如,应用软件的软件程序以及模块,如本申请实施例中的网约车的导航方法对应的计算机程序,处理器102通过运行存储在存储器104内的计算机程序,从而执行各种功能应用以及数据处理,即实现上述的方法。存储器104可包括高速随机存储器,还可包括非易失性存储器,如一个或者多个磁性存储装置、闪存、或者其他非易失性固态存储器。在一些实例中,存储器104可进一步包括相对于处理器102远程设置的存储器,这些远程存储器可以通过神经网络模型连接至终端10。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer programs corresponding to the navigation method of online ride-hailing in the embodiment of the present application. The processor 102 runs the computer programs stored in the memory 104, Thereby, various functional applications and data processing are executed, that is, the above-mentioned method is realized. The memory 104 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include a memory remotely provided with respect to the processor 102, and these remote memories may be connected to the terminal 10 through a neural network model. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
传输装置106用于经由一个网络接收或者发送数据。上述的网络具体实例可包括终端10的通信供应商提供的无线网络。在一个实例中,传输装置106包括一个网络适配器(Network Interface Controller,简称为NIC),其可通过基站与其他网络设备相连从而可与互联网进行通讯。在一个实例中,传输装置106可以为射频(Radio Frequency,简称为RF)模块,其用于通过无线方式与互联网进行通讯。The transmission device 106 is used to receive or send data via a network. The aforementioned specific examples of the network may include a wireless network provided by the communication provider of the terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC for short), which can be connected to other network devices through a base station to communicate with the Internet. In an example, the transmission device 106 may be a radio frequency (Radio Frequency, referred to as RF) module, which is used to communicate with the Internet in a wireless manner.
在本实施例中提供了一种运行于终端的指令块的处理方法,图2是根据本申请实施例的指令块的处理方法的流程图,如图2所示,该流程包括如下步骤:In this embodiment, a method for processing an instruction block running on a terminal is provided. FIG. 2 is a flowchart of the method for processing an instruction block according to an embodiment of the present application. As shown in FIG. 2, the process includes the following steps:
步骤S202,通过编译模块对神经网络模型的描述文件进行编译,得到镜像包,其中,镜像包包括:指令块组,指令块序表,跳转指令映射表,指令块组包括待处理的多个指令块,指令块序表用于指示多个指令块的运行顺序,以及运行指令块的执行设备,执行设备包括:处理器,加速设备,每一个指令块后设置有跳转指令,跳转指令映射表包括:跳转指令和下一个执行的指令块;In step S202, the description file of the neural network model is compiled by the compiling module to obtain a mirrored package, where the mirrored package includes: instruction block group, instruction block sequence table, jump instruction mapping table, and the instruction block group includes multiple to-be-processed Instruction block, instruction block sequence table is used to indicate the running sequence of multiple instruction blocks, and the execution device for running the instruction block, the execution device includes: processor, acceleration device, each instruction block is set with a jump instruction, jump instruction The mapping table includes: jump instructions and the next executed instruction block;
步骤S204,加载所述镜像包,并按照所述指令块序表和所述跳转指令映射表处理所述指令块组的指令块。Step S204: Load the mirrored package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
通过本申请,通过编译模块对神经网络模型的描述文件进行编译,得到镜像包,其中,所述镜像包包括:指令块组,指令块序表,跳转指令映射表,所述指令块组包括待处理的多个指令块,所述指令块序表用于指示 所述多个指令块的运行顺序,以及运行指令块的执行设备,所述执行设备包括:处理器,加速设备,每一个所述指令块后设置有跳转指令,所述跳转指令映射表包括:所述跳转指令和下一个执行的指令块;加载所述镜像包,并按照所述指令块序表和所述跳转指令映射表处理所述指令块组的指令块,解决了相关技术中,对于一个或多个神经网络系统,如何调度处理不同的指令块等问题,进而灵活的处理指令块组中的多个指令块。Through this application, the description file of the neural network model is compiled through the compilation module to obtain a mirrored package, wherein the mirrored package includes: instruction block group, instruction block sequence table, jump instruction mapping table, and the instruction block group includes A plurality of instruction blocks to be processed, the instruction block sequence table is used to indicate the execution sequence of the plurality of instruction blocks, and an execution device for running the instruction blocks. The execution device includes: a processor, an acceleration device, each A jump instruction is set after the instruction block, and the jump instruction mapping table includes: the jump instruction and the next instruction block to be executed; load the mirror package, and follow the instruction block sequence table and the jump instruction The instruction mapping table processes the instruction blocks of the instruction block group, which solves the problem of how to schedule and process different instruction blocks for one or more neural network systems in the related technology, and then flexibly process multiple instruction block groups. Instruction block.
In an embodiment of this application, compiling the description file of the neural network model by the compilation module to obtain the image package includes: compiling the description files of a plurality of neural network models by the compilation module to obtain image packages corresponding to the plurality of neural network models.
In an embodiment of this application, processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table includes: instructing the execution devices in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
In an embodiment of this application, after the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table, the method further includes: for each instruction, after the instruction has been processed according to the running order, caching the obtained data in a pre-allocated buffer area.
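To make the data structures concrete, the following is a minimal sketch in Python of an image package as described above; the class and field names (ImagePackage, InstructionBlock, and so on) are illustrative assumptions, not the actual layout used by the embodiments:
    # Illustrative sketch only: names, types, and layout are assumptions.
    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class InstructionBlock:
        name: str            # e.g. "NET_C0" (CPU block) or "NET_D1" (device block)
        device: str          # "C" = processor (host CPU), "D" = acceleration device
        instructions: bytes  # compiled instruction stream; the compiler sets a
                             # jump instruction after each block

    @dataclass
    class ImagePackage:
        blocks: Dict[str, InstructionBlock]  # the instruction block group
        sequence: List[str]                  # instruction block sequence table,
                                             # e.g. ["C0", "D1", "C1", "D2", "C2"]
        jump_map: Dict[str, int]             # jump instruction mapping table:
                                             # block name -> base address of the
                                             # next instruction block to execute
In this reading, loading the image package amounts to installing each block on its device and handing the sequence table and the jump mapping table to the runtime scheduler.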
Through the description of the above implementations, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of this application.
This embodiment also provides an apparatus for processing instruction blocks. The apparatus is used to implement the above embodiments and preferred implementations, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 3 is a structural block diagram of an apparatus for processing instruction blocks according to an embodiment of this application. As shown in Fig. 3, the apparatus includes:
a compilation module 30, configured to compile the description file of a neural network model to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution devices including a processor and an acceleration device; a jump instruction is set after each instruction block; and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed; and
a processing module 32, configured to load the image package and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
Through this application, the description file of a neural network model is compiled by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution devices including a processor and an acceleration device; a jump instruction is set after each instruction block; and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed. The image package is loaded, and the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table. This solves the problem in the related art of how to schedule and process different instruction blocks for one or more neural network systems, so that the multiple instruction blocks of an instruction block group can be processed flexibly.
In an embodiment of this application, the compilation module 30 is configured to compile the description files of a plurality of neural network models to obtain image packages corresponding to the plurality of neural network models.
In an embodiment of this application, the processing module 32 is further configured to instruct the execution devices in the instruction block sequence table to process the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
In an embodiment of this application, the processing module 32 is further configured to, for each instruction, cache the obtained data in a pre-allocated buffer area after the instruction has been processed according to the running order.
The process of processing the above instruction blocks is outlined below with reference to preferred embodiments, which are not intended to limit the technical solutions of the embodiments of this application.
Preferred embodiment 1
This preferred embodiment focuses on implementing the inference function of a face detection service. It should be noted that neural network model 1 performs the face detection function: it recognizes whether a picture contains a face and gives the position of the face in the picture. Neural network model 2 performs face recognition: it extracts features from the face given by neural network model 1, compares them against a database, and gives the recognition result. In general, some preprocessing is required before data is input into a neural network model, and some post-processing is performed after the model has run.
Based on the functions performed by neural network model 1 and neural network model 2, the technical solution of this preferred embodiment includes the following steps:
Step 1: the two neural network models are input into the compilation module for compilation.
Step 1.1: the description files of the two neural network models are input into the compilation module for compilation.
Step 1.2: the compilation module performs the compilation and finally outputs an image package, as shown in Fig. 4.
As shown in Fig. 5, the image package includes the instruction block group, the instruction block sequence description, and the acceleration device jump mapping table.
Specifically, the instruction block group includes NET_C0, NET_D1, NET_C1, NET_D2, and NET_C2, corresponding respectively to face detection preprocessing on the CPU (NET_C0), face detection on the acceleration device (NET_D1), face detection post-processing and face recognition preprocessing on the CPU (NET_C1), face recognition on the acceleration device (NET_D2), and face recognition post-processing on the CPU, which completes the service (NET_C2).
The instruction block sequence table gives C0-D1-C1-D2-C2, where Cx indicates that the CPU performs the computation with sequence number x, and Dx indicates that the acceleration device (Device) processes the instruction segment with sequence number x.
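As a hedged illustration of this notation, the sequence table can be parsed into (device, index) pairs as follows; parse_sequence is a hypothetical helper for illustration, not part of the embodiments:
    # Hypothetical helper: splits "C0-D1-C1-D2-C2" into (device, index) pairs,
    # where "C" is the host CPU and "D" is the acceleration device.
    def parse_sequence(table: str):
        entries = []
        for token in table.split("-"):
            device, index = token[0], int(token[1:])
            entries.append((device, index))
        return entries

    assert parse_sequence("C0-D1-C1-D2-C2") == [
        ("C", 0), ("D", 1), ("C", 1), ("D", 2), ("C", 2),
    ]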
It should be noted that this preferred embodiment assumes that the operations described by neural network model 1 are processed as a whole by the acceleration device (NET_D1), and that the computations described by neural network model 2 are likewise processed by the acceleration device (NET_D2). The input first undergoes face detection preprocessing (NET_C0), and the output is submitted to the acceleration device with a request for face detection. After the acceleration device receives the input, it completes the face detection operation NET_D1 and outputs information such as the face position. The CPU obtains this output and performs face detection post-processing and face recognition preprocessing (described by NET_C1). The processed data is submitted to the acceleration device for face recognition (NET_D2) processing; when that is complete, the result is submitted to the CPU for face recognition post-processing (NET_C2), completing the overall service function.
It should be noted that assuming a single instruction block per neural network model here does not lose generality: if the acceleration device supports only a part of the operations described by a neural network model, the model can be divided into multiple blocks, as described in preferred embodiment 2.
Step 2: processing flow in the running state.
The compilation stage outputs the image package, which is submitted to the running state for execution.
As shown in Fig. 6, the running state includes the following modules: a loading module, an acceleration device control and management module, an input/output management module, an upper-layer API interface, and so on, where:
the loading module loads the instruction block group onto the corresponding devices; the acceleration device management module controls the starting, stopping, resetting, and so on of the acceleration device; and the API interface handles interaction with upper-layer users;
the input/output management module handles input/output interaction with the acceleration device (Fig. 11 shows an internal block diagram of the acceleration device) and organizes the running items through the control information contained in each buffer item. Specifically, seen from the acceleration device side, there are an input buffer (InBuffer) and an output buffer (OutBuffer). As shown in Fig. 7, a buffer item has two parts, control information and data information; the control information includes a picture sequence number Px and an identifier Tx naming the processing device and the instruction block sequence number, where x is a number and T is a device type, either C or D: C denotes the HOST-side CPU, and D denotes the acceleration device (Device).
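Assuming the two-part layout just described, one buffer item can be sketched as follows; the class and field names are illustrative assumptions only:
    from dataclasses import dataclass

    @dataclass
    class BufferItem:
        picture_no: int  # Px: sequence number of the input picture
        device: str      # T: "C" for the HOST-side CPU, "D" for the acceleration device
        block_no: int    # x: sequence number of the instruction block
        data: bytes      # the data information part of the item

        def control_word(self) -> str:
            # renders e.g. "P1-NET_D1": picture 1, device instruction block NET_D1
            return f"P{self.picture_no}-NET_{self.device}{self.block_no}"
The control word is what ties a buffer item to a specific picture and instruction block as the item moves between the host and the device.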
Step 2.1: use the API interface to set up input and output and complete the programming.
Step 2.2: a general-purpose compilation tool (such as gcc) compiles the code and generates an executable file.
Step 2.3: run the executable file; the running process is shown in Fig. 12.
Following the instruction block sequence table, the HOST-side running state first schedules the CPU to perform face detection preprocessing (NET_C0). When the computation is complete, the data is filled into the InBuffer with the control information P1-NET_D1, indicating that the face detection neural network model inference is required.
The accelerator fetches this item from the InBuffer and processes the face detection (NET_D1) instructions. When processing is complete, the data is filled into the OutBuffer, and the input control information (P1-NET_D1) is copied along with it.
After the HOST side obtains this item from the device's OutBuffer, it determines from the instruction block sequence table that face detection post-processing and face recognition neural network model preprocessing (NET_C1) are to be performed. When that processing is complete, it fills the data into the InBuffer and, according to the instruction block sequence table, fills in the control information P1-NET_D2 (run the face recognition neural network model).
The acceleration device fetches this item, performs the face recognition computation (NET_D2), outputs the data, and copies the control information P1-NET_D2.
The CPU side obtains this item and, according to the instruction block sequence table, performs face recognition post-processing (NET_C2), completing the overall inference.
As can be seen from the above flow, when the instruction block sequence table is used, the acceleration device side only copies the control information, while the host side maintains and updates the control information according to the instruction block sequence table.
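Under the assumptions of the sketches above (BufferItem and parse_sequence), this host-side flow can be summarized as the loop below; run_on_cpu, in_buffer, and out_buffer are hypothetical stand-ins for the runtime's CPU instruction blocks and the device's InBuffer/OutBuffer, and the single-picture handling is a simplification:
    # Assumption-laden sketch of the HOST-side scheduling loop: C-blocks run
    # on the CPU, D-blocks are handed to the acceleration device via buffers.
    def host_run(sequence, picture_no, data, run_on_cpu, in_buffer, out_buffer):
        for device, block_no in sequence:          # e.g. parse_sequence("C0-D1-C1-D2-C2")
            if device == "C":
                data = run_on_cpu(block_no, data)  # pre-/post-processing on the host
            else:
                item = BufferItem(picture_no, "D", block_no, data)
                in_buffer.put(item)                # control word names the block to run
                result = out_buffer.get()          # device copies the control word back
                assert result.control_word() == item.control_word()
                data = result.data
        return data                                # overall inference result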
Here, the acceleration device performs two different functions, face detection (NET_D1) and face recognition (NET_D2). The time-division multiplexing between them, that is, switching which neural network model's inference runs on the device, is accomplished with jump instructions and the jump mapping table. The details are as follows:
First case: compile-time processing, in the following four steps:
Step a: according to the device configuration, the compilation module generates the instruction blocks NET_Tx that schedule the neural network models onto the different devices (in this example, face detection processing (NET_D1) and face recognition processing (NET_D2));
Step b: the compilation module appends a jump instruction JMP 0 after each acceleration device instruction block NET_Dx;
Step c: the compilation module generates the acceleration device jump mapping table (it can also be generated in the running state; generation by the compilation module is described here). The jump mapping table is shown in Fig. 9.
Step d: the compilation module adds a buffer-fetch instruction and a JMP Rj instruction before the acceleration device instruction block NET_D0 (Fig. 8).
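Steps a through d can be pictured with the following sketch, which lays out a device-side instruction image with the dispatcher at base address 0; the mnemonics GET_INBUFFER and JMP follow the text, while the function itself and the placeholder block contents are assumptions:
    # Illustrative layout of the device image: a buffer-fetch plus "JMP Rj"
    # dispatcher at base address 0 (step d), each block followed by "JMP 0"
    # (step b), and a jump mapping table of block base addresses (step c).
    def link_device_image(device_blocks):
        image = ["GET_INBUFFER", "JMP Rj"]       # dispatcher at instruction base 0
        jump_map = {}
        for name, instructions in device_blocks: # step a: e.g. NET_D1, NET_D2
            jump_map[name] = len(image)          # base address recorded in the table
            image.extend(instructions)
            image.append("JMP 0")                # return to the dispatcher
        return image, jump_map

    image, jump_map = link_device_image([
        ("NET_D1", ["FACE_DETECT_OPS"]),         # placeholder instruction streams
        ("NET_D2", ["FACE_RECOG_OPS"]),
    ])
    assert jump_map == {"NET_D1": 2, "NET_D2": 4}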
Second case: the acceleration device runs the instructions as follows (Fig. 10; a sketch follows this list):
1) execute instructions, starting from instruction base address 0;
2) fetch the data (control information + data input) from the InBuffer, parse the neural network model index from the control word, look up the mapping table to obtain the base address of the instructions to execute, and fill it into Rj;
3) jump to that instruction base address and start executing;
4) on reaching the end of that neural network model's instruction segment, output the execution result to the OutBuffer and write the input control word along with it;
5) then execute JMP 0 and repeat.
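Steps 1) through 5) amount to the run loop sketched below, a software simulation of what the text attributes to the device's Rj register and jump logic; run_block and the queue objects are hypothetical, and BufferItem is reused from the earlier sketch:
    # Behavioural simulation of the device run loop: fetch a buffer item,
    # resolve the block's base address via the jump mapping table (the value
    # loaded into Rj), execute from that base, emit the result with the input
    # control word copied, then "JMP 0" back to the dispatcher, modelled here
    # by the next iteration of the while loop.
    def accelerator_loop(image, jump_map, in_buffer, out_buffer, run_block):
        while True:
            item = in_buffer.get()                                # step 2: control word + data
            base = jump_map[f"NET_{item.device}{item.block_no}"]  # model index -> base (Rj)
            result = run_block(image, base, item.data)            # steps 1) and 3)
            out_buffer.put(BufferItem(item.picture_no, item.device,
                                      item.block_no, result))     # step 4: copy control word
            # step 5: JMP 0, i.e. loop back and wait for the next item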
Preferred embodiment 2: a single neural network model that must run jointly on the host and the acceleration device
Step 2.1: the neural network model is input into the compilation module, compiled, and an image package is output.
Assume the neural network model's instruction block combination and instruction block sequence table are C0-D1-C1-D2-C2, meaning that the neural network model is first preprocessed by the host CPU, then processed by the acceleration device, then by the CPU, then by the acceleration device again, and finally by the CPU.
The device jump table is the same as in preferred embodiment 1 and is not repeated here.
Step 2.2: code against the API interface and compile to generate an executable file.
Step 2.3: the HOST-side running state runs, loads the image, and executes; the process is largely the same as step 2 in preferred embodiment 1.
Step 2.4: the running state runs continuously and continuously produces inference results.
Preferred embodiment 3: a combination of multiple neural network models, where a single neural network model must be split into multiple parts
Step 3.1: the multiple neural network models are input into the compilation module, compiled, and an image package is output.
Without loss of generality, the neural network model instruction block combination and instruction block sequence table can be set as C0-D1-C1-D2-C2-D3-C3-C4-D4-C5-D5-C6.
The device jump table is the same as in preferred embodiment 1.
Step 3.2: the same as the other steps of preferred embodiment 2.
As the preferred embodiments show, the content disclosed in the embodiments and preferred embodiments of this application applies not only to inference services that combine multiple neural network models, but also to services in which a single neural network model must be completed jointly by the HOST and an acceleration device. In addition, combinations of multiple neural network models in which individual models themselves run jointly on the HOST and an acceleration device are also applicable, as are combinations of a host with multiple acceleration devices; all of these fall within the protection scope of this application.
Further, the technical solutions of the above embodiments and preferred embodiments of this application address the difficulty of putting into practice inference services that combine multiple neural network models: they provide a method, apparatus, and system that realize such combined service inference through two stages, compilation and running, using jump instructions, a jump mapping table, and an instruction block sequence table.
An optional embodiment of this application provides a method in which compilation generates an instruction block sequence table and, at runtime, multi-device scheduling and cooperative computation are performed according to that table.
An optional embodiment of this application provides a method and apparatus in which an acceleration device, based on a jump mapping table, uses simple jump instructions to perform time-division multiplexing and thereby complete different computation functions.
An optional embodiment of this application provides a neural network model acceleration device, including: an instruction cache, used to store the relevant instructions; a functional unit set module, implementing the computation modules related to neural network models; jump instructions and a jump mapping table, used to jump between neural network model function groups; a register set; and so on.
An optional embodiment of this application provides a compilation module for deep neural network models, which compiles a deep neural network model into the relevant instruction set and generates the corresponding instruction block sequence table and jump mapping table.
An optional embodiment of this application provides a running-state module for neural network model inference, including: a loading module, which loads the relevant image to a specific location; device control, which controls the starting, stopping, resetting, and so on of the acceleration device; input/output management for the device, which provides the data and requirements to be processed to the device and obtains the device's processing results; and a programming interface (API) provided to service users.
In summary, with the instruction block processing methods and apparatuses of the embodiments and preferred embodiments of this application, using the compilation and running-state system (shown in Fig. 13), inference over multiple neural network models can be put into practice conveniently, quickly, and efficiently, and the related service functions can be completed simply.
An embodiment of this application also provides a storage medium storing a computer program, where the computer program is configured to execute, when run, the steps in any one of the above method embodiments.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1: compile the description file of a neural network model by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution devices including a processor and an acceleration device; a jump instruction is set after each instruction block; and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed;
S2: load the image package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media that can store a computer program, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
An embodiment of this application also provides an electronic device, including a memory and a processor, where a computer program is stored in the memory and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, where the transmission device is connected to the processor and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps through the computer program:
S1: compile the description file of a neural network model by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution devices including a processor and an acceleration device; a jump instruction is set after each instruction block; and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed;
S2: load the image package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
Obviously, those skilled in the art should understand that each of the above modules or steps of this application can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network formed by multiple computing devices; and, optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device. In some cases, the steps shown or described can be executed in an order different from that given here, or they can be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. Thus, this application is not limited to any specific combination of hardware and software.
The above descriptions are only preferred embodiments of this application and are not intended to limit this application. For those skilled in the art, this application may have various modifications and changes. Any modification, equivalent replacement, improvement, and the like made within the principles of this application shall be included in the protection scope of this application.

Claims (10)

  1. A method for processing instruction blocks, comprising:
    compiling, by a compilation module, a description file of a neural network model to obtain an image package, wherein the image package comprises an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group comprises a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate a running order of the plurality of instruction blocks and an execution device for running the instruction blocks, the execution device comprising a processor and an acceleration device; a jump instruction is set after each of the instruction blocks; and the jump instruction mapping table comprises the jump instruction and a next instruction block to be executed; and
    loading the image package, and processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
  2. The method according to claim 1, wherein compiling, by the compilation module, the description file of the neural network model to obtain the image package comprises:
    compiling, by the compilation module, description files of a plurality of neural network models to obtain image packages corresponding to the plurality of neural network models.
  3. The method according to claim 1, wherein processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table comprises:
    instructing the execution device in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
  4. The method according to any one of claims 1 to 3, wherein after the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table, the method further comprises:
    for each instruction, after the instruction has been processed according to the running order, caching the obtained data in a pre-allocated buffer area.
  5. An apparatus for processing instruction blocks, comprising:
    a compilation module, configured to compile a description file of a neural network model to obtain an image package, wherein the image package comprises an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group comprises a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate a running order of the plurality of instruction blocks and an execution device for running the instruction blocks, the execution device comprising a processor and an acceleration device; a jump instruction is set after each of the instruction blocks; and the jump instruction mapping table comprises the jump instruction and a next instruction block to be executed; and
    a processing module, configured to load the image package and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
  6. The apparatus according to claim 5, wherein the compilation module is configured to compile description files of a plurality of neural network models to obtain image packages corresponding to the plurality of neural network models.
  7. The apparatus according to claim 5, wherein the processing module is further configured to instruct the execution device in the instruction block sequence table to process the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
  8. The apparatus according to any one of claims 5 to 7, wherein the processing module is further configured to, for each instruction, cache the obtained data in a pre-allocated buffer area after the instruction has been processed according to the running order.
  9. A storage medium, wherein a computer program is stored in the storage medium, and the computer program is configured to execute, when run, the method according to any one of claims 1 to 4.
  10. An electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the method according to any one of claims 1 to 4.
PCT/CN2020/085180 2019-06-26 2020-04-16 Instruction block processing method and apparatus, storage medium, and electronic device WO2020259020A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910562823.6A CN112148291A (en) 2019-06-26 2019-06-26 Instruction block processing method and device, storage medium and electronic device
CN201910562823.6 2019-06-26

Publications (1)

Publication Number Publication Date
WO2020259020A1 true WO2020259020A1 (en) 2020-12-30

Family

ID=73869963

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/085180 WO2020259020A1 (en) 2019-06-26 2020-04-16 Instruction block processing method and apparatus, storage medium, and electronic device

Country Status (2)

Country Link
CN (1) CN112148291A (en)
WO (1) WO2020259020A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106325967A (en) * 2015-06-30 2017-01-11 华为技术有限公司 Hardware acceleration method, compiler, and device
US20180011710A1 (en) * 2016-07-11 2018-01-11 DeePhi Technology Co., Ltd. Computing System and controller thereof
CN108027731A (en) * 2015-09-19 2018-05-11 微软技术许可有限责任公司 Debugging for block-based processor is supported
US20180293057A1 (en) * 2017-04-11 2018-10-11 Beijing Deephi Technology Co., Ltd. Programming model of neural network-oriented heterogeneous computing platform
CN109272109A (en) * 2018-10-30 2019-01-25 北京地平线机器人技术研发有限公司 The instruction dispatching method and device of neural network model
CN109919311A (en) * 2019-03-13 2019-06-21 北京地平线机器人技术研发有限公司 The method for generating instruction sequence, the method and apparatus for executing neural network computing

Also Published As

Publication number Publication date
CN112148291A (en) 2020-12-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20832669

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20832669

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25.05.2022)
