WO2020259020A1 - Instruction block processing method and apparatus, storage medium, and electronic device
- Publication number
- WO2020259020A1 (PCT/CN2020/085180)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- instruction
- instruction block
- jump
- neural network
- blocks
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/60—Software deployment
- G06F8/61—Installation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/60—Software deployment
- G06F8/61—Installation
- G06F8/63—Image based installation; Cloning; Build to order
Definitions
- This application relates to the field of neural networks, and in particular to an instruction block processing method and apparatus, a storage medium, and an electronic device.
- To solve business problems, a deep neural network model needs to perform an inference process.
- Devices that perform inference calculations generally include central processing units (Central Processing Unit, CPU), graphics processing units (Graphics Processing Unit, GPU), field-programmable gate arrays (Field Programmable Gate Array, FPGA), etc.
- For example, a deep neural network model needs to be called first to detect whether an image contains a human face (face detection); if a face image is detected, it is input into another deep neural network for inference to obtain detailed feature information of the face image and perform face identification, finally obtaining the result required by the business.
- The embodiments of the present application provide an instruction block processing method and apparatus, a storage medium, and an electronic device, to solve the problem in the related art of how to schedule and process different instruction blocks for one or more neural network systems.
- A method for processing instruction blocks is provided, which includes: compiling a description file of a neural network model by a compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table indicates the running sequence of the plurality of instruction blocks and the execution device that runs each instruction block; the execution device includes a processor and an acceleration device; a jump instruction is set after each instruction block; and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed; and loading the image package, and processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
- Compiling the description file of the neural network model by the compilation module to obtain the image package includes: compiling the description files of multiple neural network models by the compilation module to obtain the image packages corresponding to the multiple neural network models.
- Processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table includes: instructing the execution devices in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running sequence of the instruction block sequence table and the jump instruction mapping table.
- The method further includes: for each instruction, after the instruction is processed according to the running sequence, caching the obtained data in a pre-allocated buffer area.
- An instruction block processing device is provided, including: a compiling module for compiling the description file of the neural network model to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table indicates the running sequence of the plurality of instruction blocks and the execution device that runs each instruction block; the execution device includes a processor and an acceleration device; a jump instruction is set after each instruction block; and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed; and a processing module for loading the image package and processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
- The compilation module is configured to compile the description files of multiple neural network models to obtain the image packages corresponding to the multiple neural network models.
- The processing module is further configured to instruct the execution devices in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running sequence of the instruction block sequence table and the jump instruction mapping table.
- The processing module is further configured to, for each instruction, cache the obtained data in a pre-allocated buffer area after the instruction has been processed according to the running sequence.
- a storage medium in which a computer program is stored, wherein the computer program is configured to execute the steps in any one of the foregoing method embodiments when running.
- An electronic device is provided, including a memory and a processor; a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any of the above method embodiments.
- FIG. 1 is a block diagram of the hardware structure of a terminal for the instruction block processing method according to an embodiment of the present application;
- FIG. 2 is a flowchart of a method for processing an instruction block according to an embodiment of the present application;
- FIG. 3 is a structural block diagram of an instruction block processing device according to an embodiment of the present application;
- FIG. 4 is a schematic diagram of the working flow of a compilation module according to a preferred embodiment of the present application;
- FIG. 5 is a schematic diagram of the structure of an image package according to a preferred embodiment of the present application;
- FIG. 6 is a functional block diagram of a running state module according to a preferred embodiment of the present application;
- FIG. 7 is a schematic diagram of input and output buffers and control information according to a preferred embodiment of the present application;
- FIG. 8 is a schematic diagram of adding instruction blocks and jump instructions for an acceleration device according to a preferred embodiment of the present application;
- FIG. 9 is a mapping table of the jump positions of the instruction blocks of an acceleration device according to a preferred embodiment of the present application;
- FIG. 10 is a flowchart of an acceleration device operating according to instructions according to a preferred embodiment of the present application;
- FIG. 11 is an internal functional block diagram of an acceleration device according to a preferred embodiment of the present application;
- FIG. 12 is a flowchart of the interaction between a host and an acceleration device according to a preferred embodiment of the present application;
- FIG. 13 is an overall system block diagram according to a preferred embodiment of the present application.
- FIG. 1 is a hardware structure block diagram of a terminal in a method for processing instruction blocks in an embodiment of the present application.
- The terminal 10 may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data.
- The aforementioned terminal may also include a transmission device 106 and an input/output device 108 for communication functions.
- It can be understood that the terminal 10 may include more or fewer components than those shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
- The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer programs corresponding to the instruction block processing method in the embodiment of the present application.
- The processor 102 runs the computer programs stored in the memory 104, thereby executing various functional applications and data processing, that is, implementing the above-mentioned method.
- The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
- The memory 104 may further include memories remotely located relative to the processor 102, and these remote memories may be connected to the terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
- The transmission device 106 is used to receive or send data via a network.
- Specific examples of the aforementioned network may include a wireless network provided by the communication provider of the terminal 10.
- The transmission device 106 may include a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet.
- Alternatively, the transmission device 106 may be a radio frequency (RF) module, which is used to communicate with the Internet in a wireless manner.
- FIG. 2 is a flowchart of the method for processing an instruction block according to an embodiment of the present application. As shown in FIG. 2, the process includes the following steps:
- In step S202, the description file of the neural network model is compiled by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table. The instruction block group includes multiple instruction blocks to be processed; the instruction block sequence table indicates the running sequence of the multiple instruction blocks and the execution device that runs each instruction block; the execution device includes a processor and an acceleration device; a jump instruction is set after each instruction block; and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed.
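For illustration only, the structure of such an image package could be modeled as follows; this is a minimal sketch, and all names (Device, InstructionBlock, ImagePackage) are assumptions rather than part of the patent:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List

class Device(Enum):
    CPU = "C"    # HOST-side processor
    ACCEL = "D"  # acceleration device

@dataclass
class InstructionBlock:
    name: str            # e.g. "NET_C0", "NET_D1"
    device: Device       # execution device that runs this block
    instructions: bytes  # compiled instruction stream

@dataclass
class ImagePackage:
    blocks: Dict[str, InstructionBlock]  # instruction block group
    sequence_table: List[str]            # running order, e.g. ["C0", "D1", "C1", "D2", "C2"]
    jump_map: Dict[str, str]             # jump instruction -> next block to execute
```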
- In step S204, the image package is loaded, and the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table.
- Through the above steps, the description file of the neural network model is compiled through the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table indicates the running sequence of the plurality of instruction blocks and the execution device that runs each instruction block; the execution device includes a processor and an acceleration device; a jump instruction is set after each instruction block; and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed. The image package is loaded, and the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table. This solves the problem in the related art of how to schedule and process different instruction blocks for one or more neural network systems, so that the instruction blocks of multiple instruction block groups can be processed flexibly.
- Compiling the description file of the neural network model through the compilation module to obtain the image package includes: compiling the description files of multiple neural network models through the compilation module to obtain the image packages corresponding to the multiple neural network models.
- Processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table includes: instructing the execution devices in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running sequence of the instruction block sequence table and the jump instruction mapping table.
- After the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table, the method further includes: for each instruction, after the instruction is processed according to the running sequence, caching the obtained data in a pre-allocated buffer area.
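As a hedged sketch of this caching step (the buffer sizes, the run_block stand-in, and the function names are assumptions, not from the patent), the result of each processed instruction block could be written into buffers allocated before the run begins:

```python
import numpy as np

# Pre-allocated buffer areas, one per instruction block; the 1 MiB sizes
# are hypothetical -- a real system would derive them from the model.
buffers = {name: np.zeros(1 << 20, dtype=np.uint8)
           for name in ("C0", "D1", "C1", "D2", "C2")}

def run_block(name: str, data: bytes) -> bytes:
    """Stand-in for executing one instruction block on its device."""
    return data

def process_in_order(sequence_table, data: bytes) -> bytes:
    for name in sequence_table:
        data = run_block(name, data)
        # cache the result of this block in its pre-allocated buffer area
        buffers[name][: len(data)] = np.frombuffer(data, dtype=np.uint8)
    return data

result = process_in_order(["C0", "D1", "C1", "D2", "C2"], b"raw picture bytes")
```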
- Through the description of the above embodiments, those skilled in the art can clearly understand that the method according to the above embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
- Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to enable a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to execute the method described in each embodiment of the present application.
- In this embodiment, an instruction block processing device is also provided. The device is used to implement the above-mentioned embodiments and preferred implementations; what has already been described will not be repeated.
- the term "module" can implement a combination of software and/or hardware with predetermined functions.
- Although the devices described in the following embodiments are preferably implemented by software, implementation by hardware or by a combination of software and hardware is also possible and conceived.
- Fig. 3 is a structural block diagram of an instruction block processing device according to an embodiment of the present application. As shown in Fig. 3, the device includes:
- The compiling module 30 is used to compile the description file of the neural network model to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table. The instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table indicates the running sequence of the plurality of instruction blocks and the execution device that runs each instruction block; the execution device includes a processor and an acceleration device; a jump instruction is set after each instruction block; and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed.
- the processing module 32 is configured to load the image package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
- In the embodiment of the present application, the description file of the neural network model is compiled through the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table indicates the running sequence of the plurality of instruction blocks and the execution device that runs each instruction block; the execution device includes a processor and an acceleration device; a jump instruction is set after each instruction block; and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed. The image package is loaded, and the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table. This solves the problem in the related art of how to schedule and process different instruction blocks for one or more neural network systems, so that the instruction blocks of multiple instruction block groups can be processed flexibly.
- The compilation module 30 is configured to compile the description files of multiple neural network models to obtain the image packages corresponding to the multiple neural network models.
- The processing module 32 is further configured to instruct the execution devices in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running sequence of the instruction block sequence table and the jump instruction mapping table.
- The processing module 32 is further configured to, for each instruction, cache the obtained data in a pre-allocated buffer area after the instruction has been processed according to the running sequence.
- Preferred embodiment 1 of this application focuses on implementing the inference function of a face detection service.
- Neural network model 1 performs the face detection function: it recognizes whether a picture contains a face and gives the position of the face in the picture.
- Neural network model 2 performs face recognition: it extracts the features of the face given by neural network model 1, compares them with a database, and gives the recognition result. Generally, some preprocessing is required before data is input into a neural network model, and some post-processing is performed after the model runs.
- The technical solution of this preferred embodiment includes the following steps:
- Step 1: Input the two neural network models into the compilation module for compilation.
- Step 1.1: The description files of the two neural network models are input into the compilation module for compilation;
- Step 1.2: The compilation module compiles them and finally outputs the image package, as shown in Figure 4;
- The image package includes the instruction block group, the instruction block sequence table, and the acceleration device jump mapping table.
- The instruction block group includes NET_C0, NET_D1, NET_C1, NET_D2, and NET_C2, which respectively correspond to: CPU face detection preprocessing (NET_C0), face detection on the acceleration device (NET_D1), CPU face detection post-processing and face recognition preprocessing (NET_C1), face recognition processing on the acceleration device (NET_D2), and CPU face recognition post-processing that completes the business (NET_C2);
- The instruction block sequence table gives C0-D1-C1-D2-C2, where Cx indicates that the CPU performs the calculation processing with sequence number x, and Dx indicates that the acceleration device (Device) performs the instruction processing of segment x.
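A small sketch of how such Cx/Dx entries could be decoded; the parse_entry helper is purely illustrative:

```python
def parse_entry(entry: str):
    """Split a sequence-table entry such as 'C0' or 'D1' into
    (device type, block serial number): C = HOST-side CPU, D = accelerator."""
    device, serial = entry[0], int(entry[1:])
    assert device in ("C", "D"), "only CPU (C) and Device (D) are defined"
    return device, serial

# the sequence table from this embodiment
sequence = "C0-D1-C1-D2-C2".split("-")
print([parse_entry(e) for e in sequence])
# [('C', 0), ('D', 1), ('C', 1), ('D', 2), ('C', 2)]
```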
- This preferred embodiment assumes that the related operations described by neural network model 1 are processed by the acceleration device as a whole (NET_D1), and that the calculations described by neural network model 2 are likewise processed by the acceleration device (NET_D2). The CPU performs face detection preprocessing (NET_C0), and the output result is submitted to the acceleration device, which performs the required face detection.
- After the acceleration device receives the relevant input, it completes the face detection (NET_D1) operation and outputs information such as the face position. The CPU obtains this output, performs face detection post-processing and face recognition preprocessing (described by NET_C1), and submits the processed data to the acceleration device for face recognition (NET_D2) processing. After that processing is completed, the result is submitted to the CPU for face recognition post-processing (NET_C2), completing the overall business function;
- If the acceleration device supports only part of the operations described by a neural network model, the model can be split into multiple blocks; for details, refer to preferred embodiment 2.
- Step 2: Related processing flow in the running state.
- The image package is produced in the compilation stage and submitted to the running state for operation.
- The running state includes the following modules: a loading module, an acceleration device control and management module, an input/output management module, an upper-level API interface, etc.
- The loading module loads the instruction block group onto the corresponding device; the acceleration device management module controls the startup, stop, and reset of the acceleration device; and the API interface handles interaction with upper-level users;
- The input/output management module handles input and output interaction with the acceleration device (Figure 11 shows the internal block diagram of the acceleration device) and organizes the running items through the control information contained in each buffer item. Specifically, seen from the acceleration device side, there are an input buffer (InBuffer) and an output buffer (OutBuffer). As shown in Figure 7, each buffer item has two parts: a block of control information and a block of data. The control information includes the picture sequence number Px, and the instruction block processing device and instruction block serial number Tx, where x is a number and T is a device type, either C or D: C represents the HOST-side CPU, and D represents the acceleration device (Device).
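A minimal sketch of one buffer item as Figure 7 describes it, with the control information (picture sequence number Px, device type T, instruction block serial number x) ahead of the data block; the byte layout chosen here is an assumption, not the patent's:

```python
import struct

def pack_buffer_item(picture_seq: int, device: str, block_serial: int,
                     data: bytes) -> bytes:
    """Control information block followed by the data block.
    device is 'C' (HOST-side CPU) or 'D' (acceleration device)."""
    control = struct.pack("<IcB", picture_seq, device.encode(), block_serial)
    return control + data

def unpack_control(item: bytes):
    picture_seq, device, block_serial = struct.unpack_from("<IcB", item)
    return picture_seq, device.decode(), block_serial

# e.g. control information "P1-NET_D1": picture 1, accelerator, block 1
item = pack_buffer_item(1, "D", 1, b"...input tensor bytes...")
print(unpack_control(item))  # (1, 'D', 1)
```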
- Step 2.1: Use the API interface to set the input and output and complete the programming;
- Step 2.2: A general compilation tool (such as gcc) compiles the code and generates an executable file;
- Step 2.3: Run the executable file; the running process is shown in Figure 12.
- The CPU is first scheduled to perform face detection preprocessing (NET_C0). After the calculation is completed, the data is filled into the InBuffer, and the control information is filled in as P1-NET_D1, indicating that the face detection neural network model inference process is required.
- The accelerator obtains the content of the InBuffer and performs the face detection (NET_D1) instruction processing. After the processing is completed, it fills the data into the OutBuffer and copies the input control information (P1-NET_D1) at the same time.
- After the HOST side obtains the content from the device OutBuffer, it determines from the instruction block sequence table that face detection post-processing and face recognition preprocessing (NET_C1) should be performed. After that processing is completed, it fills the data into the InBuffer and, according to the instruction block sequence table, fills in the control information as P1-NET_D2 (face recognition neural network model operation).
- The acceleration device obtains this item, performs the face recognition calculation (NET_D2), outputs the data, and copies the control information P1-NET_D2.
- The CPU side obtains this item and, according to the instruction block sequence table, performs face recognition data post-processing (NET_C2), completing the overall inference.
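The host/device interaction above can be summarized in a short sketch; the function names and the submit/collect stand-ins for the InBuffer/OutBuffer exchange are illustrative assumptions:

```python
# Host-side driver for the C0-D1-C1-D2-C2 pipeline. cpu_blocks maps a
# block name such as "C0" to a host function; submit/collect stand in
# for filling the device InBuffer and reading its OutBuffer.
def run_pipeline(sequence, cpu_blocks, submit, collect, picture_seq, data):
    for entry in sequence:                  # e.g. ["C0", "D1", "C1", "D2", "C2"]
        device, serial = entry[0], int(entry[1:])
        if device == "C":
            data = cpu_blocks[entry](data)  # e.g. NET_C0 preprocessing on the CPU
        else:
            # fill InBuffer with the data plus control info such as "P1-NET_D1";
            # the device copies the control info alongside its OutBuffer result
            submit(picture_seq, f"NET_{entry}", data)
            data = collect(picture_seq, f"NET_{entry}")
    return data                             # final recognition result
```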
- The acceleration device has two different functions: face detection (NET_D1) and face recognition (NET_D2).
- Time-division multiplexing uses jump instructions and the jump mapping table to switch the neural network model function being inferred. The details are as follows:
- Step a: According to the device situation, the compilation module generates the neural network model instruction blocks NET_Tx dispatched to different devices (in this example, face detection processing (NET_D1) and face recognition processing (NET_D2));
- Step b: The compilation module adds a jump instruction JMP 0 after each acceleration device instruction block NET_Dx;
- Step c: The compilation module generates the acceleration device jump mapping table (it can also be generated in the running state; generation by the compilation module is described here).
- the jump mapping table is shown in Figure 9.
- Step d: The compilation module adds a buffer acquisition instruction and a JMP Rj instruction before the acceleration device instruction block NET_D0 (Figure 8).
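A sketch of the time-division multiplexing that steps a to d set up: the buffer acquisition instruction and JMP Rj select the block named in the control information through the jump mapping table, and the trailing JMP 0 returns to the fetch code. The table contents, addresses, and callables here are hypothetical:

```python
# Hypothetical jump mapping table (cf. Figure 9): instruction block name
# from the control information -> device entry address of that block.
JUMP_MAP = {"NET_D1": 0x1000, "NET_D2": 0x2000}

def device_loop(get_buffer_item, run_at):
    """Sketch of the device-side fetch loop: a buffer acquisition
    instruction followed by JMP Rj; each NET_Dx block ends with JMP 0,
    which returns control to this fetch code."""
    while True:
        item = get_buffer_item()       # buffer acquisition instruction
        if item is None:
            break
        target = JUMP_MAP[item.block]  # resolve Rj via the jump mapping table
        run_at(target, item.data)      # JMP Rj -> execute NET_Dx
        # the trailing JMP 0 of NET_Dx lands back here for the next item
```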
- Preferred embodiment 2: A single neural network model needs to run jointly on the host and the acceleration device.
- Step 2.1: The neural network model is input into the compilation module and compiled, and the image package is output;
- The neural network model instruction block combination and instruction block sequence table are C0-D1-C1-D2-C2, meaning that the model is first preprocessed by the host CPU, then processed by the acceleration device, then by the CPU, then by the acceleration device again, and finally post-processed by the CPU;
- The device jump table example is the same as in preferred embodiment 1 and will not be repeated here.
- Step 2.2: Write code against the API interface, then compile it and generate an executable file;
- Step 2.3: In the running state on the HOST side, load and run the image; the process is roughly the same as step 2 in preferred embodiment 1;
- Step 2.4: The running state keeps running and continuously produces inference calculation results.
- Preferred embodiment 3: A combination of multiple neural network models, where a single neural network model needs to be split multiple times.
- Step 3.1: Input the multiple neural network models into the compilation module, compile them, and output the image package;
- The neural network model instruction block combination and instruction block sequence table can be set as C0-D1-C1-D2-C2-D3-C3-C4-D4-C5-D5-C6;
- The device jump table example is the same as in preferred embodiment 1.
- Step 3.2: The other steps are the same as in preferred embodiment 2.
- The embodiments of the present application and the content disclosed in the preferred embodiments are applicable not only to combined inference services over multiple neural network models, but also to services that require a combination of the HOST and the acceleration device to complete a single neural network model.
- They can also be applied to a combination of multiple neural network models in which one model is completed jointly by the HOST and an acceleration device, and to a combination of a host and multiple acceleration devices; all of these fall within the protection scope of this application.
- The technical solutions of the above embodiments and preferred embodiments of the present application address the difficulty of implementing business inference that combines multiple neural network models. They provide a method, device, and system that compile and run in two stages, and that use jump instructions, a mapping table, and an instruction block sequence table to implement combined multi-neural-network-model business inference.
- An acceleration device uses simple jump instructions based on a jump mapping table to perform time-division multiplexing and complete different arithmetic functions;
- A neural network model acceleration device includes: an instruction cache for storing related instructions; a functional unit set module implementing the calculation modules related to neural network models; jump instructions and a jump mapping table for realizing jumps between neural network model function groups; a register group; etc.;
- A compilation module for deep neural network models compiles and converts a deep neural network model into a related instruction set, and generates the sequence table of the related instruction blocks and the jump mapping table;
- A running state module for neural network model inference includes: a loading module that loads related images to specific locations; and device control that controls the startup, stop, and reset of acceleration devices;
- The input/output management of the device provides the data and processing requirements to the device and obtains the device's processing results; a programming interface (API) is provided to business users;
- With the compiler and runtime system (shown in Figure 13), inference over a combination of multiple neural network models can be completed conveniently, quickly, and efficiently, and the related business functions can be implemented simply.
- An embodiment of the present application also provides a storage medium in which a computer program is stored, where the computer program is configured to execute the steps in any of the foregoing method embodiments when run.
- The foregoing storage medium may be configured to store a computer program for executing the following steps:
- S1: Compile the description file of the neural network model through the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes multiple instruction blocks to be processed; the instruction block sequence table indicates the running sequence of the multiple instruction blocks and the execution device that runs each instruction block; the execution device includes a processor and an acceleration device; a jump instruction is set after each instruction block; and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed;
- S2: Load the image package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
- The foregoing storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a mobile hard disk, a magnetic disk, an optical disc, and other media that can store computer programs.
- An embodiment of the present application also provides an electronic device, including a memory and a processor; a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any of the foregoing method embodiments.
- The aforementioned electronic device may further include a transmission device and an input/output device, both connected to the aforementioned processor.
- The foregoing processor may be configured to execute the following steps through a computer program:
- S1: Compile the description file of the neural network model through the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes multiple instruction blocks to be processed; the instruction block sequence table indicates the running sequence of the multiple instruction blocks and the execution device that runs each instruction block; the execution device includes a processor and an acceleration device; a jump instruction is set after each instruction block; and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed;
- S2: Load the image package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
- Obviously, those skilled in the art should understand that the above modules or steps of this application can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described can be executed in an order different from that given here, or they can be made into individual integrated circuit modules, or multiple of these modules or steps can be made into a single integrated circuit module. In this way, this application is not limited to any specific combination of hardware and software.
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Stored Programmes (AREA)
Abstract
Description
Claims (10)
- 1. A method for processing instruction blocks, comprising: compiling a description file of a neural network model by a compilation module to obtain an image package, wherein the image package comprises an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group comprises a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running sequence of the plurality of instruction blocks and the execution device that runs the instruction blocks; the execution device comprises a processor and an acceleration device; a jump instruction is set after each instruction block; and the jump instruction mapping table comprises the jump instruction and the next instruction block to be executed; and loading the image package, and processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
- 2. The method according to claim 1, wherein compiling the description file of the neural network model by the compilation module to obtain the image package comprises: compiling description files of multiple neural network models by the compilation module to obtain image packages corresponding to the multiple neural network models.
- 3. The method according to claim 1, wherein processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table comprises: instructing the execution device in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running sequence of the instruction block sequence table and the jump instruction mapping table.
- 4. The method according to any one of claims 1 to 3, wherein after processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table, the method further comprises: for each instruction, after an instruction is processed according to the running sequence, caching the obtained data in a pre-allocated buffer area.
- 5. An instruction block processing device, comprising: a compilation module, configured to compile a description file of a neural network model to obtain an image package, wherein the image package comprises an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group comprises a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running sequence of the plurality of instruction blocks and the execution device that runs the instruction blocks; the execution device comprises a processor and an acceleration device; a jump instruction is set after each instruction block; and the jump instruction mapping table comprises the jump instruction and the next instruction block to be executed; and a processing module, configured to load the image package and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
- 6. The device according to claim 5, wherein the compilation module is configured to compile description files of multiple neural network models to obtain image packages corresponding to the multiple neural network models.
- 7. The device according to claim 5, wherein the processing module is further configured to instruct the execution device in the instruction block sequence table to process the instruction block group according to the running sequence of the instruction block sequence table and the jump instruction mapping table.
- 8. The device according to any one of claims 5 to 7, wherein the processing module is further configured to, for each instruction, after an instruction is processed according to the running sequence, cache the obtained data in a pre-allocated buffer area.
- 9. A storage medium, wherein a computer program is stored in the storage medium, and the computer program is configured to execute, when run, the method according to any one of claims 1 to 4.
- 10. An electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the method according to any one of claims 1 to 4.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910562823.6A CN112148291A (en) | 2019-06-26 | 2019-06-26 | Instruction block processing method and device, storage medium and electronic device |
CN201910562823.6 | 2019-06-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020259020A1 (en) | 2020-12-30 |
Family
ID=73869963
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/085180 WO2020259020A1 (en) | 2019-06-26 | 2020-04-16 | Instruction block processing method and apparatus, storage medium, and electronic device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112148291A (en) |
WO (1) | WO2020259020A1 (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106325967A (en) * | 2015-06-30 | 2017-01-11 | Huawei Technologies Co., Ltd. | Hardware acceleration method, compiler, and device |
US20180011710A1 (en) * | 2016-07-11 | 2018-01-11 | DeePhi Technology Co., Ltd. | Computing System and controller thereof |
CN108027731A (en) * | 2015-09-19 | 2018-05-11 | Microsoft Technology Licensing, LLC | Debugging support for block-based processors |
US20180293057A1 (en) * | 2017-04-11 | 2018-10-11 | Beijing Deephi Technology Co., Ltd. | Programming model of neural network-oriented heterogeneous computing platform |
CN109272109A (en) * | 2018-10-30 | 2019-01-25 | Beijing Horizon Robotics Technology Research and Development Co., Ltd. | Instruction scheduling method and device for a neural network model |
CN109919311A (en) * | 2019-03-13 | 2019-06-21 | Beijing Horizon Robotics Technology Research and Development Co., Ltd. | Method for generating an instruction sequence, and method and device for executing neural network computation |
- 2019-06-26: CN CN201910562823.6A patent/CN112148291A/en — active, Pending
- 2020-04-16: WO PCT/CN2020/085180 patent/WO2020259020A1/en — active, Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN112148291A (en) | 2020-12-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102544522B1 (en) | Data processing method and related products | |
US10942716B1 (en) | Dynamic computational acceleration using a heterogeneous hardware infrastructure | |
US20200042856A1 (en) | Scheduler for mapping neural networks onto an array of neural cores in an inference processing unit | |
CN111258744A (en) | Task processing method based on heterogeneous computation and software and hardware framework system | |
CN111651207B (en) | Neural network model operation chip, method, device, equipment and medium | |
US11003429B1 (en) | Compile-time scheduling | |
CN111190741B (en) | Scheduling method, equipment and storage medium based on deep learning node calculation | |
CN110430444A (en) | A kind of video stream processing method and system | |
WO2021000971A1 (en) | Method and device for generating operation data and related product | |
CN109491664B (en) | iOS application program generation method, device, equipment and storage medium | |
US11733983B2 (en) | Method and apparatus for generating metadata by a compiler | |
US20210158131A1 (en) | Hierarchical partitioning of operators | |
CN109196476A (en) | Seamless high-performance interoperability between the different type figure of shared garbage collector | |
CN114217886A (en) | Function calling method, computing device and storage medium | |
KR101826828B1 (en) | System and method for managing log data | |
US11631001B2 (en) | Heterogeneous computing on a system-on-chip, including machine learning inference | |
WO2020259020A1 (en) | Instruction block processing method and apparatus, storage medium, and electronic device | |
CN115186305B (en) | Method for constructing data element model and producing data element | |
US11573777B2 (en) | Method and apparatus for enabling autonomous acceleration of dataflow AI applications | |
Peñil et al. | Automatic synthesis from UML/MARTE models using channel semantics | |
CN114168151A (en) | Container-based program compiling method and device, electronic equipment and storage medium | |
Delestrac et al. | Demystifying the TensorFlow eager execution of deep learning inference on a CPU-GPU tandem | |
WO2023071509A1 (en) | Model compilation method and apparatus, and model running system | |
US11537310B2 (en) | Threading of replication based on data type | |
CN110879744B (en) | Method and system for executing computation graph by multiple threads |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20832669; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: PCT application non-entry in European phase | Ref document number: 20832669; Country of ref document: EP; Kind code of ref document: A1 |
 | 32PN | Ep: public notification in the EP bulletin as the address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25.05.2022) |
 | 122 | Ep: PCT application non-entry in European phase | Ref document number: 20832669; Country of ref document: EP; Kind code of ref document: A1 |