CN112148291A - Instruction block processing method and device, storage medium and electronic device - Google Patents

Instruction block processing method and device, storage medium and electronic device

Info

Publication number
CN112148291A
Authority
CN
China
Prior art keywords
instruction
instruction block
jump
neural network
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910562823.6A
Other languages
Chinese (zh)
Inventor
姚海东 (Yao Haidong)
徐东 (Xu Dong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201910562823.6A
Priority to PCT/CN2020/085180
Publication of CN112148291A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 - Arrangements for software engineering
    • G06F8/40 - Transformation of program code
    • G06F8/41 - Compilation
    • G06F8/60 - Software deployment
    • G06F8/61 - Installation
    • G06F8/63 - Image based installation; Cloning; Build to order

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The embodiment of the invention provides an instruction block processing method and apparatus, a storage medium, and an electronic apparatus. The method includes: compiling a description file of a neural network model through a compiling module to obtain a mirror image package, where the mirror image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table indicates the running order of the instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is arranged after each instruction block, and the jump instruction mapping table records each jump instruction and the next instruction block to be executed; and loading the mirror image package, and processing the instruction block group according to the instruction block sequence table and the jump instruction mapping table.

Description

Instruction block processing method and device, storage medium and electronic device
Technical Field
The invention relates to the field of neural networks, in particular to a method and a device for processing an instruction block, a storage medium and an electronic device.
Background
With the great improvement of computing power and the ease of acquiring big data, deep learning has made great progress, and more and more problems in image processing, natural language analysis, and the like can now be solved well with deep learning techniques.
Solving a business problem with a deep neural network model requires executing an inference process. Devices that perform inference operations generally include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Field Programmable Gate Array (FPGA), and the like. To use resources efficiently and obtain results quickly when running such services, one must deeply understand both the computation and storage architecture of the inference device and the computation requirements described by the deep neural network, which is often difficult and time-consuming.
In particular, a service function is often completed by combining several neural networks. For example, in a face recognition scenario, one deep neural network model is first called to detect whether an image contains a face (face detection); if it does, the image is fed into another deep neural network for inference, which extracts detailed feature information of the face image and performs discrimination (face identification), finally producing the result required by the service.
In the related art, no effective solution has yet been proposed for the problem of how to schedule and process different instruction blocks for one or more neural network systems.
Disclosure of Invention
The embodiment of the invention provides an instruction block processing method and apparatus, a storage medium, and an electronic apparatus, to at least solve the problem in the related art of how to schedule and process different instruction blocks for one or more neural network systems.
According to an embodiment of the present invention, a method for processing an instruction block is provided, including: compiling a description file of a neural network model through a compiling module to obtain a mirror image package, where the mirror image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table indicates the running order of the instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is arranged after each instruction block, and the jump instruction mapping table records each jump instruction and the next instruction block to be executed; and loading the mirror image package, and processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
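To make the structure concrete, the following is a minimal sketch, in Python, of the data the mirror image package carries. It is illustrative only: the patent fixes the three components but not their layout, so every type and field name here is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class InstructionBlock:
    block_id: str        # e.g. "NET_C0" (CPU block) or "NET_D1" (accelerator block)
    device: str          # "C" = host CPU, "D" = acceleration device
    instructions: bytes  # compiled instruction stream; a jump instruction follows each block

@dataclass
class MirrorPackage:
    blocks: Dict[str, InstructionBlock]  # the instruction block group
    sequence_table: List[str]            # running order, e.g. ["NET_C0", "NET_D1", "NET_C1", "NET_D2", "NET_C2"]
    jump_map: Dict[str, str]             # jump instruction -> block_id of the next instruction block to execute
```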
In an embodiment of the present invention, compiling the description file of the neural network model through the compiling module to obtain the mirror image package includes: compiling the description files of a plurality of neural network models through the compiling module to obtain mirror image packages corresponding to the plurality of neural network models.

In an embodiment of the present invention, processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table includes: instructing the execution device specified in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running order given by the instruction block sequence table and the jump instruction mapping table.

In an embodiment of the present invention, after processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table, the method further includes: for each instruction, after the instruction is processed according to the running order, caching the obtained data in a pre-allocated cache region.
According to another embodiment of the present invention, an instruction block processing apparatus is also provided, including: a compiling module, configured to compile a description file of a neural network model to obtain a mirror image package, where the mirror image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table indicates the running order of the instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is arranged after each instruction block, and the jump instruction mapping table records each jump instruction and the next instruction block to be executed; and a processing module, configured to load the mirror image package and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
In an embodiment of the present invention, the compiling module is configured to compile the description files of a plurality of neural network models to obtain the mirror image packages corresponding to the plurality of neural network models.

In an embodiment of the present invention, the processing module is further configured to instruct the execution device specified in the instruction block sequence table to process the instruction block group according to the running order given by the instruction block sequence table and the jump instruction mapping table.

In an embodiment of the present invention, the processing module is further configured to, for each instruction, cache the obtained data in a pre-allocated cache region after the instruction is processed according to the running order.
According to a further embodiment of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
According to the invention, the description file of the neural network model is compiled through the compiling module to obtain the mirror image package, where the mirror image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table indicates the running order of the instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is arranged after each instruction block, and the jump instruction mapping table records each jump instruction and the next instruction block to be executed. The mirror image package is then loaded, and the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table. This solves the problem in the related art of how to schedule and process different instruction blocks for one or more neural network systems, and allows the instruction blocks in the instruction block group to be processed flexibly.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 is a block diagram of the hardware structure of a terminal running an instruction block processing method according to an embodiment of the present invention;
FIG. 2 is a flow diagram of a method of processing an instruction block according to an embodiment of the invention;
FIG. 3 is a block diagram of a processing apparatus of an instruction block according to an embodiment of the present invention;
FIG. 4 is a compiler module workflow diagram according to the preferred embodiment of the present invention;
FIG. 5 is a diagram of a mirror package composition in accordance with a preferred embodiment of the present invention;
FIG. 6 is a functional block diagram of the run state modules in accordance with a preferred embodiment of the present invention;
FIG. 7 is a diagram illustrating input/output buffering and control information according to a preferred embodiment of the present invention;
FIG. 8 is a schematic diagram of an accelerator instruction block and jump instruction addition in accordance with a preferred embodiment of the present invention;
FIG. 9 is an accelerator instruction block jump location map in accordance with a preferred embodiment of the present invention;
FIG. 10 is a flowchart of an acceleration device operating on command in accordance with a preferred embodiment of the present invention;
FIG. 11 is a functional block diagram internal to an acceleration device in accordance with a preferred embodiment of the present invention;
FIG. 12 is a flowchart of host and acceleration device interaction, according to a preferred embodiment of the present invention;
fig. 13 is an overall system block diagram according to a preferred embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Example 1
The method provided by the embodiment 1 of the present application can be executed in a terminal or a similar computing device. Taking the example of running on a terminal, fig. 1 is a hardware structure block diagram of the terminal of a processing method of an instruction block according to an embodiment of the present invention. As shown in fig. 1, the terminal 10 may include one or more (only one shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, and optionally may also include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the terminal. For example, the terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration with equivalent functionality to that shown in FIG. 1 or with more functionality than that shown in FIG. 1.
The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as a computer program corresponding to the instruction block processing method in the embodiment of the present invention. The processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, thereby implementing the method described above. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102; such remote memory may be connected to the terminal 10 through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC), which can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In the present embodiment, a method for processing an instruction block executed in a terminal is provided, and fig. 2 is a flowchart of a method for processing an instruction block according to an embodiment of the present invention, as shown in fig. 2, the flowchart includes the following steps:
step S202, compiling a description file of a neural network model through a compiling module to obtain a mirror image package, where the mirror image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table indicates the running order of the instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is arranged after each instruction block, and the jump instruction mapping table records each jump instruction and the next instruction block to be executed;

step S204, loading the mirror image package, and processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
According to the invention, the description file of the neural network model is compiled through the compiling module to obtain the mirror image package described above; the mirror image package is then loaded, and the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table. This solves the problem in the related art of how to schedule and process different instruction blocks for one or more neural network systems, and allows the instruction blocks in the instruction block group to be processed flexibly.

In an embodiment of the present invention, compiling the description file of the neural network model through the compiling module to obtain the mirror image package includes: compiling the description files of a plurality of neural network models through the compiling module to obtain mirror image packages corresponding to the plurality of neural network models.

In an embodiment of the present invention, processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table includes: instructing the execution device specified in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running order given by the instruction block sequence table and the jump instruction mapping table.

In an embodiment of the present invention, after processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table, the method further includes: for each instruction, after the instruction is processed according to the running order, caching the obtained data in a pre-allocated cache region.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In this embodiment, a processing apparatus of an instruction block is further provided, and the apparatus is used to implement the foregoing embodiments and preferred embodiments, and details of which have been already described are omitted. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 3 is a block diagram of a processing apparatus of an instruction block according to an embodiment of the present invention, as shown in fig. 3, the apparatus including:
a compiling module 30, configured to compile a description file of a neural network model to obtain a mirror image package, where the mirror image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table indicates the running order of the instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is arranged after each instruction block, and the jump instruction mapping table records each jump instruction and the next instruction block to be executed;

and a processing module 32, configured to load the mirror image package and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
According to the invention, the description file of the neural network model is compiled through the compiling module to obtain the mirror image package described above; the mirror image package is then loaded, and the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table. This solves the problem in the related art of how to schedule and process different instruction blocks for one or more neural network systems, and allows the instruction blocks in the instruction block group to be processed flexibly.

In an embodiment of the present invention, the compiling module 30 is configured to compile the description files of a plurality of neural network models to obtain the mirror image packages corresponding to the plurality of neural network models.

In an embodiment of the present invention, the processing module 32 is further configured to instruct the execution device specified in the instruction block sequence table to process the instruction block group according to the running order given by the instruction block sequence table and the jump instruction mapping table.

In an embodiment of the present invention, the processing module 32 is further configured to, for each instruction, cache the obtained data in a pre-allocated cache region after the instruction is processed according to the running order.
The following describes the instruction block processing procedure with reference to preferred embodiments; the invention is not limited to the technical solutions of these embodiments.
Preferred embodiment 1
The preferred embodiment of the invention focuses on implementing a face detection business inference function. It should be explained that neural network model 1 performs the face detection function: it identifies whether a picture contains a face and gives the position of the face in the picture. Neural network model 2 performs face recognition: it extracts features from the face given by neural network model 1, compares them against a database, and gives a recognition result. Generally, some preprocessing is required before input enters a neural network model, and some post-processing work follows after the model runs.

Based on the functions completed by neural network model 1 and neural network model 2, the technical scheme of this preferred embodiment includes the following steps:

step 1: the two neural network models are input into a compiling module for compiling;

step 1.1: the description files of the two neural network models are input into the compiling module for compiling;

step 1.2: the compiling module compiles them and finally outputs a mirror image package, as shown in fig. 4;
as shown in fig. 5, the mirror packet includes an instruction block group and an instruction block sequence description, and an acceleration apparatus jump mapping table.
Specifically, the command block group includes NET _ C0, NET _ D1, NET _ C1, NET _ D2, NET _ C2; respectively corresponding to CPU face detection pretreatment (NET _ C0), carrying out face detection (NET _ D1) by an accelerating device, carrying out face detection post-treatment and face recognition pretreatment (NET _ C1) by the CPU, carrying out face recognition treatment (NET _ D2) by the accelerating device, carrying out face recognition post-treatment by the CPU, and finishing the service (NET _ C2);
The instruction block sequence table gives C0-D1-C1-D2-C2, where Cx means the CPU performs the computation with sequence number x, and Dx means the acceleration device processes the instruction segment with sequence number x.
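Read as data, this sequence table is just the ordered device/sequence-number pairs. A sketch follows; the names come from the text, while the tuple encoding is an assumption:

```python
# Instruction block sequence table for preferred embodiment 1.
# ("C", x) = CPU block NET_Cx, ("D", x) = acceleration device block NET_Dx.
SEQUENCE_TABLE = [
    ("C", 0),  # NET_C0: CPU face detection preprocessing
    ("D", 1),  # NET_D1: acceleration device face detection
    ("C", 1),  # NET_C1: CPU detection post-processing and recognition preprocessing
    ("D", 2),  # NET_D2: acceleration device face recognition
    ("C", 2),  # NET_C2: CPU recognition post-processing; end of service
]
```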
It should be noted that this preferred embodiment assumes that all operations described in neural network model 1 are processed by the acceleration device (NET_D1), and that the computation requirements described in neural network model 2 are likewise processed by the acceleration device (NET_D2). The input first undergoes face detection preprocessing (NET_C0), and the output is submitted to the acceleration device for face detection. After receiving the input, the acceleration device completes the face detection operation NET_D1 and outputs the face position and related information. The CPU obtains this information, performs face detection post-processing and face recognition preprocessing (described by NET_C1), and submits the processed data to the acceleration device for face recognition (NET_D2). Once that processing completes, the data is handed back to the CPU for face recognition post-processing (NET_C2), completing the whole service function.

It should also be noted that, without loss of generality, each neural network model is assumed here to compile into a single device instruction block; if the acceleration device supports only part of the operations a model describes, the model may be split into a plurality of blocks. See preferred embodiment 2 for details.
Step 2: run-state processing flow.

The compiling stage finishes by outputting the mirror image package, which is submitted to the run state for operation.

As shown in fig. 6, the run state includes the following modules: a loading module, an acceleration device control and management module, an input/output management module, an upper-layer API interface, and the like. The loading module loads the instruction block group onto the corresponding device; the acceleration device management module controls the starting, stopping, resetting, and the like of the acceleration device; the API interface handles interaction with upper-layer users.
The input/output management module handles input/output interaction with the acceleration device (fig. 11 shows an internal block diagram of the acceleration device) and organizes work items through the control information contained in each cache entry. Specifically, viewed from the acceleration device side, there are an input buffer (InBuffer) and an output buffer (OutBuffer). As fig. 7 shows, a cache entry contains two parts: control information and data information. The control information includes a picture serial number Px and an instruction block identifier Tx, where x is a number and T is a device type taking one of two values: C denotes the HOST-side CPU, and D denotes the acceleration device.
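As a sketch, one cache entry might be modeled as below. The control-word content (Px, T, x) is from the text; the class and field names are assumptions:

```python
from dataclasses import dataclass

@dataclass
class BufferEntry:
    """One InBuffer/OutBuffer entry: control information plus data information."""
    picture_serial: int  # Px: picture serial number
    device_type: str     # T: "C" = HOST-side CPU, "D" = acceleration device
    block_serial: int    # x: instruction block serial number
    data: bytes          # the data information

# Control information "P1-NET_D1": picture 1, to be processed by accelerator block D1.
entry = BufferEntry(picture_serial=1, device_type="D", block_serial=1, data=b"")
```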
Step 2.1: using an API interface to carry out input and output setting to complete programming;
step 2.2: compiling the code by a general compiling tool (such as gcc) to generate an executable file;
step 2.3: and running the executable file: the operation is shown in figure 12.
Following the instruction block sequence table, the HOST-side run state first schedules the CPU to perform face detection preprocessing (NET_C0). After the computation finishes, it fills the data into the InBuffer with control information P1-NET_D1, indicating the inference process of the face detection neural network model.

The acceleration device fetches the content of the InBuffer, performs the face detection instruction processing (NET_D1), fills the result data into the OutBuffer after processing completes, and copies over the input control information (P1-NET_D1).

After obtaining the content from the device OutBuffer, the HOST side consults the instruction block sequence table, performs face detection post-processing and face recognition preprocessing (NET_C1), fills the data into the InBuffer after processing completes, and, per the sequence table, fills in the control information P1-NET_D2 (run the face recognition neural network model).

The acceleration device fetches the entry, performs the face recognition computation (NET_D2), outputs the data, and copies the control information P1-NET_D2.

The CPU side fetches the entry and performs face recognition data post-processing (NET_C2) according to the instruction block sequence table, completing the overall inference.

As the above flow shows, while the instruction block sequence table is in use, the acceleration device side only copies the control information; it is the host side that maintains and updates the control information according to the sequence table.
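The host-side behavior just described, walking the sequence table, running CPU blocks locally, and handing accelerator blocks to the device through the buffers, can be sketched as follows. The two callables are hypothetical stand-ins for the real run-state primitives:

```python
def host_run(sequence_table, picture, run_cpu_block, submit_to_device):
    """Hypothetical HOST-side scheduling loop over the instruction block
    sequence table. run_cpu_block(x, data) runs NET_Cx on the CPU;
    submit_to_device(control, data) fills the InBuffer and returns the
    matching OutBuffer result."""
    data = picture
    for device, x in sequence_table:
        if device == "C":
            data = run_cpu_block(x, data)
        else:
            # Only the host maintains the control word (e.g. P1-NET_D1),
            # taken from the sequence table; the device copies it back
            # unchanged alongside the result.
            data = submit_to_device(control=(1, "D", x), data=data)
    return data
```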
The acceleration device carries two different functions, face detection (NET_D1) and face recognition (NET_D2), and time-division multiplexes between them: a jump instruction together with a jump mapping table completes the switching between neural network model functions during inference. The relevant cases are as follows:

In the first case, compile-state processing, there are four steps:

step a: the compiling module generates the instruction blocks NET_Tx of the neural network models to be dispatched to the different devices according to the device conditions (in this example, face detection processing (NET_D1) and face recognition processing (NET_D2));

step b: the compiling module adds a jump instruction JMP 0 after each acceleration device instruction block NET_Dx;

step c: the compiling module generates the acceleration device jump mapping table (alternatively the run state may generate it; it is described here). The jump mapping table is shown in fig. 9;

step d: the compiling module adds a buffer acquisition instruction and a JMP Rj instruction in front of the acceleration device instruction block NET_D0 (fig. 8).
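Steps a to d amount to a small layout pass over the accelerator blocks. The sketch below assumes a list-of-strings instruction encoding; the mnemonics JMP 0 and JMP Rj come from the text, while GET_BUFF is a hypothetical stand-in for the buffer acquisition instruction:

```python
def add_jumps(device_blocks):
    """Sketch of compile-state steps a-d for accelerator blocks NET_Dx."""
    image = ["GET_BUFF", "JMP Rj"]     # step d: fetch-and-dispatch stub at instruction base 0
    jump_map = {}                      # step c: model index -> instruction base address
    for index, block in enumerate(device_blocks, start=1):
        jump_map[index] = len(image)   # record where NET_Dx starts
        image.extend(block)            # step a: the block's own instructions
        image.append("JMP 0")          # step b: jump back to the dispatch stub
    return image, jump_map

# add_jumps([["FD_OP1", "FD_OP2"], ["FR_OP1"]]) lays out NET_D1 and NET_D2,
# each ending in JMP 0, and maps model 1 -> base 2, model 2 -> base 5.
```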
In the second case, the acceleration device operates on the instructions as follows (fig. 10):

1) execute instructions, starting from instruction base address 0;

2) fetch data (control information + data input) from the InBuffer, parse the neural network model index out of the control word, look up the mapping table to obtain the execution instruction base address, and fill that base address into Rj;

3) jump to the instruction base address and start execution;

4) upon reaching the end of the neural network model processing, output the execution result to the OutBuffer and write back the input control word;

5) then execute JMP 0, and repeat the cycle.
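Putting steps 1) to 5) together, the device's steady state is a fetch, look-up, jump loop. A sketch, with three callables standing in for device internals (in reality this runs in hardware):

```python
def accelerator_loop(image, jump_map, get_inbuffer_entry, execute_from, put_outbuffer_entry):
    """Sketch of the acceleration device run flow 1)-5)."""
    while True:
        # 1)-2) execute from instruction base 0: fetch control information plus
        # data input, parse the model index from the control word, look up the
        # jump mapping table, and fill the base address into Rj
        control, data = get_inbuffer_entry()
        rj = jump_map[control.block_serial]
        # 3)-4) jump to the base address and run until the model processing
        # ends, then output the result and write back the input control word
        result = execute_from(image, rj, data)
        put_outbuffer_entry(control, result)
        # 5) JMP 0: fall through to the top and repeat
```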
Preferred embodiment 2: the single neural network model needs to be operated together at the host and the accelerating equipment
Step 2.1: the neural network model is input into a compiling module for compiling and outputting a mirror image packet;
suppose the neural network model command block combination and command block sequence table is C0-D1-C1-D2-C2; representing that the neural network model needs to be preprocessed by a host CPU, then processed by an accelerating device, then processed by the CPU, then processed by the accelerating device, and finally processed by the CPU;
the device skip list is the same as the preferred embodiment 1, and is not described herein again.
Step 2.2: compiling to generate an executable file according to the API interface code;
step 2.3: the HOST side runs in a running state, and mirror loading and running are carried out; the process is substantially the same as step 2 in preferred embodiment 1;
step 2.4: the running state continuously runs and continuously gives the reasoning operation result.
Preferred embodiment 3: multiple neural network model combinations, and a single neural network model requires multiple splits
Step 3.1: the multi-neural network model is input into a compiling module for compiling and outputting a mirror image packet;
without loss of generality, the instruction block combination and instruction block sequence table of the neural network model can be set as C0-D1-C1-D2-C2-D3-C3-C4-D4-C5-D5-C6;
the apparatus jump table is the same as in preferred embodiment 1.
Step 3.2: the other steps of the preferred embodiment 2.
As these preferred embodiments show, the embodiments of the present invention are applicable not only to combined inference services over multiple neural network models, but also to services in which a single neural network model must be completed jointly by the HOST and the acceleration device. They further apply to combinations of multiple neural network models in which each model itself runs jointly on the HOST and the acceleration device, and to combinations of one host with multiple acceleration devices; all of these fall within the protection scope of the present invention.

Further, addressing the difficulty of implementing business inference that involves combinations of neural network models, the technical solutions of the above embodiments and preferred embodiments provide a method, apparatus, and system that use jump instructions, a mapping table, and an instruction block sequence table, across the two stages of compiling and running, to implement business inference for neural network model combinations.
In an optional embodiment of the present invention, a method is provided in which compilation generates an instruction block sequence table, and multi-device scheduling and cooperative operation are performed according to that table at run time;

In an optional embodiment of the present invention, a method and an apparatus are provided by which the acceleration device, using a simple jump instruction driven by a jump mapping table, time-division multiplexes among different operation functions;

In an optional embodiment of the present invention, a neural network model acceleration device is provided, including: an instruction cache for storing the related instructions; a functional unit set module implementing the neural-network-related computation modules; a jump instruction and a jump mapping table for jumping among the neural network model function groups; a register group; and the like;

In an optional embodiment of the present invention, a deep neural network model compiling module is provided, which compiles the deep neural network model into a related instruction set and generates the related instruction block sequence table and jump mapping table;

In an optional embodiment of the present invention, a run-state module for neural network model inference operation is provided, comprising: a loading module for loading the relevant mirror image to a specific position; device control for starting, stopping, resetting, and the like of the acceleration device; device input/output management, which supplies the data to be processed and the processing requests to the device and obtains the device's processing results; and programming interfaces (APIs) provided to business users;
in summary, by using the instruction block processing method and apparatus of the embodiments and preferred embodiments of the present invention, a compiling and running state system (as shown in fig. 13) is adopted, so that inference landing of a multi-neural network model can be conveniently, quickly, and efficiently completed, and related business functions can be simply and conveniently completed.
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:

S1, compiling a description file of a neural network model through a compiling module to obtain a mirror image package, where the mirror image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table indicates the running order of the instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is arranged after each instruction block, and the jump instruction mapping table records each jump instruction and the next instruction block to be executed;

S2, loading the mirror image package, and processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing computer programs, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by means of a computer program:

S1, compiling a description file of a neural network model through a compiling module to obtain a mirror image package, where the mirror image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table indicates the running order of the instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is arranged after each instruction block, and the jump instruction mapping table records each jump instruction and the next instruction block to be executed;

S2, loading the mirror image package, and processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented with a general-purpose computing device; they may be centralized on a single computing device or distributed across a network formed by multiple computing devices. Optionally, they may be implemented in program code executable by a computing device, so that they may be stored in a storage device and executed by a computing device; in some cases, the steps shown or described may be performed in an order different from that described here. Alternatively, they may each be fabricated as an individual integrated circuit module, or several of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for processing an instruction block, comprising:
compiling a description file of a neural network model through a compiling module to obtain a mirror image package, wherein the mirror image package comprises an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group comprises a plurality of instruction blocks to be processed; the instruction block sequence table indicates the running order of the instruction blocks and the execution device that runs each instruction block, the execution device comprising a processor and an acceleration device; a jump instruction is arranged after each instruction block, and the jump instruction mapping table comprises each jump instruction and the next instruction block to be executed;

and loading the mirror image package, and processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
2. The method of claim 1, wherein compiling the description file of the neural network model by the compiling module to obtain a mirror package comprises:
compiling the description files of a plurality of neural network models through the compiling module to obtain mirror image packages corresponding to the plurality of neural network models.
3. The method of claim 1, wherein processing the instruction blocks of the group of instruction blocks according to the instruction block order table and the jump instruction mapping table comprises:
instructing the execution device specified in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running order given by the instruction block sequence table and the jump instruction mapping table.
4. The method according to any of claims 1 to 3, wherein after processing the instruction blocks of the group of instruction blocks according to the instruction block order table and the jump instruction mapping table, the method further comprises:
for each instruction, after the instruction is processed according to the running order, caching the obtained data in a pre-allocated cache region.
5. An apparatus for processing an instruction block, comprising:
a compiling module, configured to compile a description file of a neural network model to obtain a mirror image package, wherein the mirror image package comprises an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group comprises a plurality of instruction blocks to be processed; the instruction block sequence table indicates the running order of the instruction blocks and the execution device that runs each instruction block, the execution device comprising a processor and an acceleration device; a jump instruction is arranged after each instruction block, and the jump instruction mapping table comprises each jump instruction and the next instruction block to be executed;

and a processing module, configured to load the mirror image package and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
6. The apparatus of claim 5, wherein the compiling module is configured to compile the description files of the plurality of neural network models through the compiling module to obtain the mirror image packages corresponding to the plurality of neural network models.
7. The apparatus of claim 5, wherein the processing module is further configured to instruct the execution device specified in the instruction block sequence table to process the instruction block group according to the running order given by the instruction block sequence table and the jump instruction mapping table.

8. The apparatus of any of claims 5 to 7, wherein the processing module is further configured to, for each instruction, cache the obtained data in a pre-allocated cache region after the instruction is processed according to the running order.
9. A storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of any of claims 1 to 4 when executed.
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 4.
CN201910562823.6A 2019-06-26 2019-06-26 Instruction block processing method and device, storage medium and electronic device Pending CN112148291A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910562823.6A CN112148291A (en) 2019-06-26 2019-06-26 Instruction block processing method and device, storage medium and electronic device
PCT/CN2020/085180 WO2020259020A1 (en) 2019-06-26 2020-04-16 Instruction block processing method and apparatus, storage medium, and electronic device


Publications (1)

Publication Number Publication Date
CN112148291A true CN112148291A (en) 2020-12-29

Family

ID=73869963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910562823.6A Pending CN112148291A (en) 2019-06-26 2019-06-26 Instruction block processing method and device, storage medium and electronic device

Country Status (2)

Country Link
CN (1) CN112148291A (en)
WO (1) WO2020259020A1 (en)


Also Published As

Publication number Publication date
WO2020259020A1 (en) 2020-12-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination