WO2020259020A1 - Instruction block processing method and apparatus, storage medium, and electronic device - Google Patents


Info

Publication number
WO2020259020A1
WO2020259020A1 (PCT/CN2020/085180, CN2020085180W)
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
instruction block
jump
neural network
blocks
Prior art date
Application number
PCT/CN2020/085180
Other languages
French (fr)
Chinese (zh)
Inventor
姚海东
徐东
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Priority date
Filing date
Publication date
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Publication of WO2020259020A1 publication Critical patent/WO2020259020A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 - Arrangements for software engineering
    • G06F 8/40 - Transformation of program code
    • G06F 8/41 - Compilation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 - Arrangements for software engineering
    • G06F 8/60 - Software deployment
    • G06F 8/61 - Installation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 - Arrangements for software engineering
    • G06F 8/60 - Software deployment
    • G06F 8/61 - Installation
    • G06F 8/63 - Image based installation; Cloning; Build to order

Definitions

  • This application relates to the field of neural networks, and in particular to a method and device for processing instruction blocks, storage media, and electronic devices.
  • Solving business problems with a deep neural network model requires performing an inference process.
  • Devices that perform inference calculations generally include central processing units (Central Processing Unit, CPU), graphics processing units (Graphics Processing Unit, GPU), and field-programmable gate arrays (Field Programmable Gate Array, FPGA).
  • In a face recognition business scenario, a deep neural network model must first be called to detect whether an image contains a human face (face detection); if there is a face image, that image is input into another deep neural network for inference to obtain detailed feature information of the face image and perform identification (face recognition), finally obtaining the result required by the business.
  • the embodiments of the present application provide a method and device for processing instruction blocks, a storage medium, and an electronic device to solve the problems of how to schedule and process different instructions for one or more neural network systems in the related art.
  • According to one embodiment of the present application, a method for processing instruction blocks is provided, including: compiling a description file of a neural network model by a compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed; and loading the image package and processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
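  • To make the structure of the image package concrete, the following is a minimal Python sketch; the class and field names (ImagePackage, InstructionBlock, and so on) are illustrative assumptions, not identifiers from the patent:

```python
from dataclasses import dataclass

@dataclass
class InstructionBlock:
    name: str            # e.g. "NET_C0" (CPU block) or "NET_D1" (device block)
    device: str          # "C" = host CPU, "D" = acceleration device
    instructions: list   # opaque instruction stream for the target device

@dataclass
class ImagePackage:
    blocks: dict         # instruction block group: name -> InstructionBlock
    sequence_table: list # running order, e.g. ["C0", "D1", "C1", "D2", "C2"]
    jump_map: dict       # jump instruction mapping: block index -> next block to run
```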
  • In an embodiment of the present application, compiling the description file of the neural network model by the compilation module to obtain the image package includes: compiling the description files of multiple neural network models by the compilation module to obtain image packages corresponding to the multiple neural network models.
  • In an embodiment of the present application, processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table includes: instructing the execution devices specified in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
  • After the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table, the method further includes: for each instruction, after the instruction is processed in the running order, caching the obtained data in a pre-allocated buffer area.
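  • Continuing the sketch above, a scheduler that honors the sequence table and the per-step caching described here might look as follows (a simplified sketch; the host and device executor objects and their execute method are assumptions):

```python
def run_sequence(pkg, host, device, input_data):
    """Walk the instruction block sequence table, dispatch each block to the
    executor named by its 'C'/'D' prefix, and cache each block's output in a
    pre-allocated buffer area so the next block can consume it."""
    buffer = [None] * len(pkg.sequence_table)  # pre-allocated cache area
    data = input_data
    for i, entry in enumerate(pkg.sequence_table):  # e.g. "C0", "D1", ...
        block = pkg.blocks["NET_" + entry]
        executor = host if entry.startswith("C") else device
        data = executor.execute(block, data)
        buffer[i] = data  # cache the obtained data after each step
    return data
```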
  • According to another embodiment of the present application, an instruction block processing device is provided, including: a compilation module, configured to compile the description file of the neural network model to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed; and a processing module, configured to load the image package and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
  • the compilation module is configured to compile the description files of multiple neural network models to obtain image packages corresponding to the multiple neural network models.
  • the processing module is further configured to instruct the execution devices specified in the instruction block sequence table to process the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
  • the processing module is further configured to, for each instruction, cache the obtained data in a pre-allocated buffer area after the instruction is processed in the running order.
  • a storage medium in which a computer program is stored, wherein the computer program is configured to execute the steps in any one of the foregoing method embodiments when running.
  • an electronic device, including a memory and a processor; a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
  • FIG. 1 is a block diagram of the hardware structure of a terminal for an instruction block processing method according to an embodiment of the present application;
  • FIG. 2 is a flowchart of a method for processing an instruction block according to an embodiment of the present application;
  • FIG. 3 is a structural block diagram of an instruction block processing device according to an embodiment of the present application;
  • FIG. 4 is a schematic diagram of the workflow of a compilation module according to a preferred embodiment of the present application;
  • FIG. 5 is a schematic diagram of the structure of an image package according to a preferred embodiment of the present application;
  • FIG. 6 is a functional block diagram of a running state module according to a preferred embodiment of the present application;
  • FIG. 7 is a schematic diagram of input and output buffers and control information according to a preferred embodiment of the present application;
  • FIG. 8 is a schematic diagram of adding acceleration device instruction blocks and jump instructions according to a preferred embodiment of the present application;
  • FIG. 9 is a mapping table of acceleration device instruction block jump positions according to a preferred embodiment of the present application;
  • FIG. 10 is a flowchart of an acceleration device running according to instructions according to a preferred embodiment of the present application;
  • FIG. 11 is an internal functional block diagram of an acceleration device according to a preferred embodiment of the present application;
  • FIG. 12 is a flowchart of the interaction between a host and an acceleration device according to a preferred embodiment of the present application;
  • FIG. 13 is an overall system block diagram according to a preferred embodiment of the present application.
  • FIG. 1 is a hardware structure block diagram of a terminal in a method for processing instruction blocks in an embodiment of the present application.
  • the terminal 10 may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microcontroller (MCU) or a programmable logic device such as an FPGA) and a memory 104 for storing data; optionally, the terminal may also include a transmission device 106 and an input/output device 108 for communication functions.
  • the terminal 10 may also include more or fewer components than those shown in FIG. 1, or have a different configuration with functions equivalent to or beyond those shown in FIG. 1.
  • the memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the navigation method for online ride-hailing in the embodiment of the present application; the processor 102 runs the computer programs stored in the memory 104, thereby executing various functional applications and data processing, that is, implementing the above method.
  • the memory 104 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 104 may further include memory located remotely with respect to the processor 102, and such remote memory may be connected to the terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the transmission device 106 is used to receive or send data via a network.
  • the aforementioned specific examples of the network may include a wireless network provided by the communication provider of the terminal 10.
  • the transmission device 106 includes a network adapter (Network Interface Controller, NIC for short), which can be connected to other network devices through a base station to communicate with the Internet.
  • the transmission device 106 may be a radio frequency (Radio Frequency, referred to as RF) module, which is used to communicate with the Internet in a wireless manner.
  • FIG. 2 is a flowchart of the method for processing an instruction block according to an embodiment of the present application. As shown in FIG. 2, the process includes the following steps:
  • Step S202: The description file of the neural network model is compiled by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed;
  • Step S204: Load the image package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
  • Through this application, the description file of the neural network model is compiled by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed; the image package is loaded, and the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table. This solves the problem in the related art of how to schedule and process different instruction blocks for one or more neural network systems, so that the multiple instruction blocks of an instruction block group can be processed flexibly.
  • In an embodiment of the present application, compiling the description file of the neural network model by the compilation module to obtain the image package includes: compiling the description files of multiple neural network models by the compilation module to obtain image packages corresponding to the multiple neural network models.
  • In an embodiment of the present application, processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table includes: instructing the execution devices specified in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
  • After the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table, the method further includes: for each instruction, after the instruction is processed in the running order, caching the obtained data in a pre-allocated buffer area.
  • the method according to the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform; of course, it can also be implemented by hardware, but in many cases the former is the better implementation.
  • the technical solution of this application, in essence or in the part contributing to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions to enable a terminal device (which can be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.
  • an instruction block processing device is also provided; the device is used to implement the above embodiments and preferred implementations, and what has already been described will not be repeated.
  • the term "module" may refer to a combination of software and/or hardware that implements a predetermined function.
  • although the devices described in the following embodiments are preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and conceived.
  • Fig. 3 is a structural block diagram of an instruction block processing device according to an embodiment of the present application. As shown in Fig. 3, the device includes:
  • the compilation module 30 is configured to compile the description file of the neural network model to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block;
  • the execution device includes a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed;
  • the processing module 32 is configured to load the image package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
  • Through this application, the description file of the neural network model is compiled by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed; the image package is loaded, and the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table. This solves the problem in the related art of how to schedule and process different instruction blocks for one or more neural network systems, so that the multiple instruction blocks of an instruction block group can be processed flexibly.
  • the compilation module 30 is configured to compile the description files of multiple neural network models to obtain image packages corresponding to the multiple neural network models.
  • the processing module 32 is further configured to instruct the execution devices specified in the instruction block sequence table to process the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
  • the processing module 32 is further configured to, for each instruction, cache the obtained data in a pre-allocated buffer area after the instruction is processed in the running order.
  • the preferred embodiment of this application focuses on implementing the inference function of a face detection service.
  • neural network model 1 performs the face detection function: it recognizes whether a picture contains a face and gives the position of the face in the picture.
  • neural network model 2 performs face recognition: it extracts the features of the face located by neural network model 1, compares them against a database, and gives the recognition result. Generally, some preprocessing is required before data is input into a neural network model, and some post-processing work is performed after the network model runs.
  • the technical solution of the preferred embodiment of the present application includes the following steps:
  • Step 1: Input the two neural network models into the compilation module for compilation.
  • Step 1.1: The two neural network model description files are input into the compilation module for compilation;
  • Step 1.2: The compilation module compiles them and finally outputs the image package, as shown in FIG. 4;
  • the image package includes the instruction block group, the instruction block sequence description, and the acceleration device jump mapping table.
  • the instruction block group includes NET_C0, NET_D1, NET_C1, NET_D2, and NET_C2, which respectively correspond to: the CPU performing face detection preprocessing (NET_C0), the acceleration device performing face detection (NET_D1), the CPU performing face detection post-processing and face recognition preprocessing (NET_C1), the acceleration device performing face recognition processing (NET_D2), and the CPU performing face recognition post-processing to complete the business (NET_C2);
  • the instruction block sequence table gives C0-D1-C1-D2-C2, where Cx indicates that the CPU performs the calculation processing with sequence number x, and Dx indicates that the acceleration device (Device) performs the instruction segment with sequence number x.
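  • To make the Cx/Dx naming convention concrete, here is a small, hypothetical parser for sequence table entries (the entry format follows the example above; the function itself is illustrative):

```python
def parse_entry(entry: str):
    """Split a sequence table entry such as "C0" or "D1" into (device, index):
    'C' is the host CPU, 'D' is the acceleration device (Device)."""
    device, index = entry[0], int(entry[1:])
    if device not in ("C", "D"):
        raise ValueError(f"unknown device type in entry {entry!r}")
    return device, index

# The face recognition service's sequence C0-D1-C1-D2-C2 parses to:
# [('C', 0), ('D', 1), ('C', 1), ('D', 2), ('C', 2)]
sequence = [parse_entry(e) for e in "C0-D1-C1-D2-C2".split("-")]
```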
  • the preferred embodiment of the present application assumes that the related operations described by neural network model 1 are processed by the acceleration device as a whole (NET_D1), and that the calculation requirements described by neural network model 2 are likewise processed by the acceleration device (NET_D2). The CPU performs face detection preprocessing (NET_C0), and the output result is submitted to the acceleration device, which is to perform face detection.
  • after the acceleration device receives the relevant input, it completes the face detection (NET_D1) operation and outputs information such as the face position. The CPU obtains this output, performs face detection post-processing and face recognition preprocessing (described by NET_C1), and submits the processed data to the acceleration device for face recognition (NET_D2) processing. After that processing is completed, the result is submitted to the CPU for neural network model face recognition post-processing (NET_C2), completing the overall business function;
  • if the acceleration device supports only part of the operations described by a neural network model, the model can be divided into multiple segments; for details, refer to preferred embodiment 2.
  • Step 2: Related processing flow in the running state.
  • the image package produced in the compilation stage is submitted to the running state for operation.
  • the running state includes the following modules: a loading module, an acceleration device control and management module, an input/output management module, and an upper-level API interface, among which:
  • the loading module completes the loading of the instruction block groups onto the corresponding devices; the acceleration device management module controls the startup, stop, and reset of the acceleration device; and the API interface handles the interaction with upper-level users;
  • the input/output management module handles the input/output interaction with the acceleration device (FIG. 11 shows the internal block diagram of the acceleration device) and organizes the items to run through the control information contained in each buffer item. Specifically, seen from the acceleration device side, there are an input buffer (InBuffer) and an output buffer (OutBuffer). As shown in FIG. 7, the buffer content has two blocks: one block of control information and one block of data. The control information includes a picture sequence number Px and the instruction block's processing device and serial number Tx, where x is a number and T is a device type, either C or D: C represents the HOST-side CPU and D represents the acceleration device (Device).
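  • The buffer item layout described above can be sketched as follows; the type and field names are assumptions for illustration rather than structures defined by the patent:

```python
from dataclasses import dataclass

@dataclass
class ControlInfo:
    picture_seq: int  # Px: sequence number of the picture being processed
    device: str       # T: "C" for the HOST-side CPU, "D" for the acceleration device
    block_index: int  # x: serial number of the instruction block to run

@dataclass
class BufferItem:
    control: ControlInfo  # e.g. ControlInfo(1, "D", 1) encodes "P1-NET_D1"
    data: bytes           # payload produced by the previous processing step
```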
  • Step 2.1: Use the API interface to set the input and output and complete the programming;
  • Step 2.2: A general compilation tool (such as gcc) compiles the code and generates an executable file;
  • Step 2.3: Run the executable file; the running process is shown in FIG. 12.
  • the CPU is first scheduled to perform face detection preprocessing (NET_C0). After the calculation is completed, the data is filled into the InBuffer and the control information is filled in as P1-NET_D1, indicating that the face detection neural network model inference process is required.
  • the acceleration device obtains the content in the InBuffer and performs the face detection (NET_D1) instruction processing. After the processing is completed, the data is filled into the OutBuffer and the input control information (P1-NET_D1) is copied over at the same time.
  • after the HOST side obtains the content from the device's OutBuffer, it determines from the instruction block sequence table that face detection post-processing and face recognition neural network model preprocessing (NET_C1) are to be performed. After the processing is completed, it fills the data into the InBuffer and, according to the instruction block sequence table, fills in the control information as P1-NET_D2 (face recognition neural network model operation).
  • the acceleration device obtains this item, performs the face recognition (NET_D2) calculation, outputs the data, and copies the control information P1-NET_D2.
  • the CPU side obtains this item and, according to the instruction block sequence table, performs face recognition post-processing (NET_C2), completing the overall inference.
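  • The host-side half of this ping-pong exchange might look like the following sketch, reusing the BufferItem and ControlInfo types from the earlier sketch and assuming blocking buffer reads and writes (all identifiers are illustrative):

```python
def host_inference(pkg, host, device, picture_seq, picture):
    # NET_C0: face detection preprocessing on the host CPU
    data = host.execute(pkg.blocks["NET_C0"], picture)
    device.in_buffer.write(BufferItem(ControlInfo(picture_seq, "D", 1), data))

    # The device runs NET_D1 and fills its OutBuffer; the host picks the item
    # up, consults the sequence table, and runs the next CPU block, NET_C1.
    item = device.out_buffer.read()
    data = host.execute(pkg.blocks["NET_C1"], item.data)
    device.in_buffer.write(BufferItem(ControlInfo(picture_seq, "D", 2), data))

    # The device runs NET_D2; the host finishes with NET_C2 post-processing.
    item = device.out_buffer.read()
    return host.execute(pkg.blocks["NET_C2"], item.data)
```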
  • the acceleration device has two different functions, face detection (NET_D1) and face recognition (NET_D2).
  • time-division multiplexing uses jump instructions and the jump mapping table to switch between the neural network model inference functions. The relevant details are as follows:
  • Step a: According to the device situation, the compilation module generates the neural network model instruction blocks NET_Tx and dispatches them to the different devices (in this example, face detection processing (NET_D1) and face recognition processing (NET_D2));
  • Step b: The compilation module adds a jump instruction JMP 0 after each acceleration device instruction block NET_Dx;
  • Step c: The compilation module generates the acceleration device jump mapping table (it can also be generated in the running state; generation by the compilation module is described here).
  • The jump mapping table is shown in FIG. 9.
  • Step d: The compilation module adds a buff acquisition instruction and a JMP Rj instruction before the acceleration device instruction block NET_D0 (see FIG. 8).
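  • Putting steps a through d together, the device-side control flow reduces to a fetch/look-up/jump loop. The sketch below models it in Python; the Rj register, the jump mapping table lookup, and the trailing JMP 0 follow the description above, while every identifier is an illustrative assumption:

```python
def device_loop(device, jump_map):
    """Hypothetical acceleration device loop: the buff acquisition instruction
    fetches an input item, JMP Rj jumps to the instruction block named in the
    control information, and the JMP 0 appended after each NET_Dx block
    returns control to the fetch step."""
    while True:
        item = device.in_buffer.read()           # buff acquisition instruction
        rj = jump_map[item.control.block_index]  # e.g. 1 -> start of NET_D1
        result = device.run_block_at(rj, item.data)  # JMP Rj into the block
        device.out_buffer.write(BufferItem(item.control, result))
        # JMP 0: fall through back to the top of the loop
```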
  • Preferred embodiment 2: A single neural network model needs to run jointly on the host and the acceleration device.
  • Step 2.1: The neural network model is input into the compilation module, compiled, and the image package is output;
  • the neural network model's instruction block combination and instruction block sequence table are C0-D1-C1-D2-C2, meaning the neural network model is first preprocessed by the host CPU, then processed by the acceleration device, then by the CPU, then by the acceleration device again, and finally by the CPU;
  • the device jump table example is the same as in preferred embodiment 1 and will not be repeated here.
  • Step 2.2: Write the code against the API interface, compile it, and generate an executable file;
  • Step 2.3: Run it in the running state on the HOST side, loading and running the image; the process is roughly the same as step 2 in preferred embodiment 1;
  • Step 2.4: The running state continues to run and continuously produces inference calculation results.
  • Preferred embodiment 3: A combination of multiple neural network models, where a single neural network model needs to be split multiple times.
  • Step 3.1: Input the multiple neural network models into the compilation module, compile them, and output the image package;
  • the neural network model instruction block combination and instruction block sequence table can be set as C0-D1-C1-D2-C2-D3-C3-C4-D4-C5-D5-C6;
  • the device jump table example is the same as in preferred embodiment 1.
  • Step 3.2: The other steps are the same as in preferred embodiment 2.
  • the embodiments of the present application and the content disclosed in the preferred embodiments are applicable not only to combined inference services over multiple neural network models, but also to services that require a combination of the HOST and the acceleration device to complete a single neural network model.
  • a combination of multiple neural network models in which one neural network model is completed jointly by the HOST and an acceleration device can also be applied, as can a combination of a host and multiple acceleration devices; these all fall within the protection scope of this application.
  • aiming at the problem that business inference involving combinations of multiple neural network models is difficult to implement, the technical solutions of the above embodiments and preferred embodiments of the present application provide a method, device, and system that compile and run in two stages and use jump instructions, a jump mapping table, and the instruction block sequence table to implement combined multi-neural-network-model business inference.
  • an acceleration device uses a simple jump instruction, driven by a jump mapping table, to perform time-division multiplexing and thereby complete different computing functions;
  • a neural network model acceleration device includes: an instruction cache for storing related instructions; a functional unit set module implementing the calculation modules related to the neural network model; jump instructions and a jump mapping table for switching between neural network model function groups; a register group; and so on;
  • a compilation module for deep neural network models compiles and converts a deep neural network model into a related instruction set, and generates the related instruction block sequence table and jump mapping table;
  • a running state module for neural network model inference includes a loading module, which loads the related image to a specific location, and device control, which controls the start, stop, and reset of acceleration devices;
  • the device's input and output management provides the data and processing requirements to the device and obtains the device's processing results; the running state also includes the programming interface (API) provided to business users, as sketched below;
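  • As one way to picture how these running state responsibilities could be grouped behind the user-facing API, here is a hypothetical interface sketch (every name below is an assumption, not an API defined by the patent):

```python
class Runtime:
    """Hypothetical running state facade over the modules listed above."""

    def load(self, image_path: str) -> None:
        """Loading module: place each instruction block group on its device."""

    def start(self) -> None:
        """Device control: start the acceleration device."""

    def stop(self) -> None:
        """Device control: stop the acceleration device."""

    def reset(self) -> None:
        """Device control: reset the acceleration device."""

    def submit(self, data: bytes, control: "ControlInfo") -> None:
        """I/O management: hand data and its control information to the device."""

    def poll_result(self) -> "BufferItem":
        """I/O management: fetch a processed item from the device's OutBuffer."""
```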
  • using the compiler and runtime system (shown in FIG. 13), combined inference over multiple neural network models can be implemented conveniently, quickly, and efficiently, and the related business functions completed simply.
  • the embodiment of the present application also provides a storage medium in which a computer program is stored, wherein the computer program is configured to execute the steps in any of the foregoing method embodiments when running.
  • the foregoing storage medium may be configured to store a computer program for executing the following steps:
  • S1: Compile the description file of the neural network model by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed;
  • S2: Load the image package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
  • the foregoing storage medium may include, but is not limited to: a USB flash drive, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a mobile hard disk, a magnetic disk, an optical disk, and various other media that can store computer programs.
  • the embodiment of the present application also provides an electronic device, including a memory and a processor; a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any of the foregoing method embodiments.
  • the aforementioned electronic device may further include a transmission device and an input/output device, where the transmission device is connected to the processor and the input/output device is connected to the processor.
  • the foregoing processor may be configured to execute the following steps through a computer program:
  • S1: Compile the description file of the neural network model by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed;
  • S2: Load the image package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
  • the above modules or steps of this application can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices; optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; and in some cases, the steps shown or described can be executed in an order different from the one here, or they can each be made into individual integrated circuit modules, or multiple of the modules or steps can be made into a single integrated circuit module. In this way, this application is not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

An instruction block processing method and apparatus, a storage medium and an electronic apparatus. Said method comprises: compiling, by means of a compiling module, a description file of a neural network model, to obtain an image package, the image package comprising an instruction block group, an instruction block sequence table, a jump instruction mapping table, the instruction block group comprising a plurality of instruction blocks to be processed, the instruction block sequence table being used to indicate the operation sequence of the plurality of instruction blocks, and an execution device for operating the instruction blocks, the execution device comprising a processor, an acceleration device, a jump instruction being provided after each of the instruction blocks, the jump instruction mapping table comprising the jump instruction and a next instruction block to be executed (S202); and loading the image package and processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table (S204).

Description

Method and device for processing instruction blocks, storage medium, and electronic device
Cross-reference to related applications
This application is filed based on the Chinese patent application with application number 201910562823.6, filed on June 26, 2019, and claims the priority of that Chinese patent application, the entire content of which is hereby incorporated into this application by reference.
Technical field
This application relates to the field of neural networks, and in particular to a method and device for processing instruction blocks, a storage medium, and an electronic device.
Background
With the tremendous increase in computing power and the easy availability of big data, deep learning technology has made great progress; more and more problems, such as image processing and natural language analysis, can be solved well through deep learning technology.
Solving business problems with a deep neural network model requires performing an inference process. Devices that perform inference calculations generally include central processing units (Central Processing Unit, CPU), graphics processing units (Graphics Processing Unit, GPU), and field-programmable gate arrays (Field Programmable Gate Array, FPGA). In the process of putting this type of business into practice, efficiently using resources and quickly obtaining results requires both a deep understanding of the computing and storage architecture of the inference device and a deep understanding of the computing requirements described by the deep neural network. This is often quite difficult and takes a long time.
In particular, some business functions often require a combination of multiple neural networks. For example, in a face recognition business scenario, a deep neural network model must first be called to detect whether an image contains a human face (face detection); if there is a face image, that image is then input into another deep neural network for inference to obtain detailed feature information of the face image and perform identification (face recognition), finally obtaining the result required by the business.
Regarding the related art, there is currently no effective solution to problems such as how to schedule and process different instruction blocks for one or more neural network systems.
Summary of the invention
The embodiments of the present application provide a method and device for processing instruction blocks, a storage medium, and an electronic device, to solve problems in the related art such as how to schedule and process different instructions for one or more neural network systems.
According to an embodiment of the present application, a method for processing instruction blocks is provided, including: compiling a description file of a neural network model by a compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed; and loading the image package and processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
In an embodiment of the present application, compiling the description file of the neural network model by the compilation module to obtain the image package includes: compiling the description files of multiple neural network models by the compilation module to obtain image packages corresponding to the multiple neural network models.
In an embodiment of the present application, processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table includes: instructing the execution devices specified in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
In an embodiment of the present application, after the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table, the method further includes: for each instruction, after the instruction is processed in the running order, caching the obtained data in a pre-allocated buffer area.
According to another embodiment of the present application, an instruction block processing device is also provided, including: a compilation module, configured to compile the description file of the neural network model to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution device including a processor and an acceleration device; a jump instruction is set after each instruction block, and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed; and a processing module, configured to load the image package and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
In an embodiment of the present application, the compilation module is configured to compile the description files of multiple neural network models to obtain image packages corresponding to the multiple neural network models.
In an embodiment of the present application, the processing module is further configured to instruct the execution devices specified in the instruction block sequence table to process the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
In an embodiment of the present application, the processing module is further configured to, for each instruction, cache the obtained data in a pre-allocated buffer area after the instruction is processed in the running order.
According to yet another embodiment of the present application, a storage medium is also provided; a computer program is stored in the storage medium, and the computer program is configured to execute the steps in any one of the foregoing method embodiments when running.
According to yet another embodiment of the present application, an electronic device is also provided, including a memory and a processor; a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the foregoing method embodiments.
Description of the drawings
The drawings described here are used to provide a further understanding of this application and constitute a part of this application. The exemplary embodiments of this application and their descriptions are used to explain this application and do not constitute an improper limitation of this application. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a terminal for an instruction block processing method according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for processing an instruction block according to an embodiment of the present application;
FIG. 3 is a structural block diagram of an instruction block processing device according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the workflow of a compilation module according to a preferred embodiment of the present application;
FIG. 5 is a schematic diagram of the structure of an image package according to a preferred embodiment of the present application;
FIG. 6 is a functional block diagram of a running state module according to a preferred embodiment of the present application;
FIG. 7 is a schematic diagram of input and output buffers and control information according to a preferred embodiment of the present application;
FIG. 8 is a schematic diagram of adding acceleration device instruction blocks and jump instructions according to a preferred embodiment of the present application;
FIG. 9 is a mapping table of acceleration device instruction block jump positions according to a preferred embodiment of the present application;
FIG. 10 is a flowchart of an acceleration device running according to instructions according to a preferred embodiment of the present application;
FIG. 11 is an internal functional block diagram of an acceleration device according to a preferred embodiment of the present application;
FIG. 12 is a flowchart of the interaction between a host and an acceleration device according to a preferred embodiment of the present application;
FIG. 13 is an overall system block diagram according to a preferred embodiment of the present application.
具体实施方式Detailed ways
下文中将参考附图并结合实施例来详细说明本申请。需要说明的是,在不冲突的情况下,本申请中的实施例及实施例中的特征可以相互组合。Hereinafter, the application will be described in detail with reference to the drawings and in conjunction with embodiments. It should be noted that the embodiments in this application and the features in the embodiments can be combined with each other if there is no conflict.
需要说明的是,本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。It should be noted that the terms "first" and "second" in the description and claims of the application and the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific sequence or sequence.
实施例1Example 1
本申请实施例1所提供的方法实施例可以在终端或者类似的运算装置中执行。以运行在终端上为例,图1是本申请实施例的一种指令块的处理方法的终端的硬件结构框图。如图1所示,终端10可以包括一个或多个(图1中仅示出一个)处理器102(处理器102可以包括但不限于微处理器MCU或可编程逻辑器件FPGA等的处理装置)和用于存储数据的存储器104,可选地,上述终端还可以包括用于通信功能的传输设备106以及输入输出设备108。本领域普通技术人员可以理解,图1所示的结构仅为示意,其并不对上述终端的结构造成限定。例如,终端10还可包括比图1中所示更多或者更少的组件,或者具有与图1所示等同功能或比图1所示功能更多的不同的配置。The method embodiment provided in Embodiment 1 of the present application may be executed in a terminal or similar computing device. Taking running on a terminal as an example, FIG. 1 is a hardware structure block diagram of a terminal in a method for processing instruction blocks in an embodiment of the present application. As shown in FIG. 1, the terminal 10 may include one or more (only one is shown in FIG. 1) processor 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) And the memory 104 for storing data, optionally, the aforementioned terminal may also include a transmission device 106 and an input/output device 108 for communication functions. Those of ordinary skill in the art can understand that the structure shown in FIG. 1 is only for illustration, and does not limit the structure of the foregoing terminal. For example, the terminal 10 may also include more or fewer components than those shown in FIG. 1, or have the same functions as those shown in FIG. 1 or more different configurations than those shown in FIG. 1.
存储器104可用于存储计算机程序,例如,应用软件的软件程序以及模块,如本申请实施例中的网约车的导航方法对应的计算机程序,处理器102通过运行存储在存储器104内的计算机程序,从而执行各种功能应用以及数据处理,即实现上述的方法。存储器104可包括高速随机存储器,还可包括非易失性存储器,如一个或者多个磁性存储装置、闪存、或者其他非易失性固态存储器。在一些实例中,存储器104可进一步包括相对于处理器102远程设置的存储器,这些远程存储器可以通过神经网络模型连接至终端10。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer programs corresponding to the navigation method of online ride-hailing in the embodiment of the present application. The processor 102 runs the computer programs stored in the memory 104, Thereby, various functional applications and data processing are executed, that is, the above-mentioned method is realized. The memory 104 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include a memory remotely provided with respect to the processor 102, and these remote memories may be connected to the terminal 10 through a neural network model. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
传输装置106用于经由一个网络接收或者发送数据。上述的网络具体实例可包括终端10的通信供应商提供的无线网络。在一个实例中,传输装置106包括一个网络适配器(Network Interface Controller,简称为NIC),其可通过基站与其他网络设备相连从而可与互联网进行通讯。在一个实例中,传输装置106可以为射频(Radio Frequency,简称为RF)模块,其用于通过无线方式与互联网进行通讯。The transmission device 106 is used to receive or send data via a network. The aforementioned specific examples of the network may include a wireless network provided by the communication provider of the terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC for short), which can be connected to other network devices through a base station to communicate with the Internet. In an example, the transmission device 106 may be a radio frequency (Radio Frequency, referred to as RF) module, which is used to communicate with the Internet in a wireless manner.
在本实施例中提供了一种运行于终端的指令块的处理方法,图2是根据本申请实施例的指令块的处理方法的流程图,如图2所示,该流程包括如下步骤:In this embodiment, a method for processing an instruction block running on a terminal is provided. FIG. 2 is a flowchart of the method for processing an instruction block according to an embodiment of the present application. As shown in FIG. 2, the process includes the following steps:
步骤S202,通过编译模块对神经网络模型的描述文件进行编译,得到镜像包,其中,镜像包包括:指令块组,指令块序表,跳转指令映射表,指令块组包括待处理的多个指令块,指令块序表用于指示多个指令块的运行顺序,以及运行指令块的执行设备,执行设备包括:处理器,加速设备,每一个指令块后设置有跳转指令,跳转指令映射表包括:跳转指令和下一个执行的指令块;In step S202, the description file of the neural network model is compiled by the compiling module to obtain a mirrored package, where the mirrored package includes: instruction block group, instruction block sequence table, jump instruction mapping table, and the instruction block group includes multiple to-be-processed Instruction block, instruction block sequence table is used to indicate the running sequence of multiple instruction blocks, and the execution device for running the instruction block, the execution device includes: processor, acceleration device, each instruction block is set with a jump instruction, jump instruction The mapping table includes: jump instructions and the next executed instruction block;
步骤S204,加载所述镜像包,并按照所述指令块序表和所述跳转指令映射表处理所述指令块组的指令块。Step S204: Load the mirrored package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
通过本申请,通过编译模块对神经网络模型的描述文件进行编译,得到镜像包,其中,所述镜像包包括:指令块组,指令块序表,跳转指令映射表,所述指令块组包括待处理的多个指令块,所述指令块序表用于指示 所述多个指令块的运行顺序,以及运行指令块的执行设备,所述执行设备包括:处理器,加速设备,每一个所述指令块后设置有跳转指令,所述跳转指令映射表包括:所述跳转指令和下一个执行的指令块;加载所述镜像包,并按照所述指令块序表和所述跳转指令映射表处理所述指令块组的指令块,解决了相关技术中,对于一个或多个神经网络系统,如何调度处理不同的指令块等问题,进而灵活的处理指令块组中的多个指令块。Through this application, the description file of the neural network model is compiled through the compilation module to obtain a mirrored package, wherein the mirrored package includes: instruction block group, instruction block sequence table, jump instruction mapping table, and the instruction block group includes A plurality of instruction blocks to be processed, the instruction block sequence table is used to indicate the execution sequence of the plurality of instruction blocks, and an execution device for running the instruction blocks. The execution device includes: a processor, an acceleration device, each A jump instruction is set after the instruction block, and the jump instruction mapping table includes: the jump instruction and the next instruction block to be executed; load the mirror package, and follow the instruction block sequence table and the jump instruction The instruction mapping table processes the instruction blocks of the instruction block group, which solves the problem of how to schedule and process different instruction blocks for one or more neural network systems in the related technology, and then flexibly process multiple instruction block groups. Instruction block.
In an embodiment of this application, compiling the description file of the neural network model by the compilation module to obtain the image package includes: compiling the description files of a plurality of neural network models by the compilation module to obtain image packages corresponding to the plurality of neural network models.
In an embodiment of this application, processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table includes: instructing the execution devices in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
In an embodiment of this application, after the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table, the method further includes: for each instruction, after the instruction has been processed according to the running order, caching the obtained data in a pre-allocated buffer area.
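To make the data structures concrete, the following is a minimal sketch in Python of an image package as described above; the class and field names (ImagePackage, InstructionBlock, and so on) are illustrative assumptions, not the actual layout used by the embodiments:
    # Illustrative sketch only: names, types, and layout are assumptions.
    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class InstructionBlock:
        name: str            # e.g. "NET_C0" (CPU block) or "NET_D1" (device block)
        device: str          # "C" = processor (host CPU), "D" = acceleration device
        instructions: bytes  # compiled instruction stream; the compiler sets a
                             # jump instruction after each block

    @dataclass
    class ImagePackage:
        blocks: Dict[str, InstructionBlock]  # the instruction block group
        sequence: List[str]                  # instruction block sequence table,
                                             # e.g. ["C0", "D1", "C1", "D2", "C2"]
        jump_map: Dict[str, int]             # jump instruction mapping table:
                                             # block name -> base address of the
                                             # next instruction block to execute
In this reading, loading the image package amounts to installing each block on its device and handing the sequence table and the jump mapping table to the runtime scheduler.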
Through the description of the above implementations, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of this application.
This embodiment also provides an apparatus for processing instruction blocks. The apparatus is used to implement the above embodiments and preferred implementations, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 3 is a structural block diagram of an apparatus for processing instruction blocks according to an embodiment of this application. As shown in Fig. 3, the apparatus includes:
a compilation module 30, configured to compile the description file of a neural network model to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution devices including a processor and an acceleration device; a jump instruction is set after each instruction block; and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed; and
a processing module 32, configured to load the image package and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
Through this application, the description file of a neural network model is compiled by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution devices including a processor and an acceleration device; a jump instruction is set after each instruction block; and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed. The image package is loaded, and the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table. This solves the problem in the related art of how to schedule and process different instruction blocks for one or more neural network systems, so that the multiple instruction blocks of an instruction block group can be processed flexibly.
In an embodiment of this application, the compilation module 30 is configured to compile the description files of a plurality of neural network models to obtain image packages corresponding to the plurality of neural network models.
In an embodiment of this application, the processing module 32 is further configured to instruct the execution devices in the instruction block sequence table to process the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
In an embodiment of this application, the processing module 32 is further configured to, for each instruction, cache the obtained data in a pre-allocated buffer area after the instruction has been processed according to the running order.
The process of processing the above instruction blocks is outlined below with reference to preferred embodiments, which are not intended to limit the technical solutions of the embodiments of this application.
Preferred embodiment 1
This preferred embodiment focuses on implementing the inference function of a face detection service. It should be noted that neural network model 1 performs the face detection function: it recognizes whether a picture contains a face and gives the position of the face in the picture. Neural network model 2 performs face recognition: it extracts features from the face given by neural network model 1, compares them against a database, and gives the recognition result. In general, some preprocessing is required before data is input into a neural network model, and some post-processing is performed after the model has run.
Based on the functions performed by neural network model 1 and neural network model 2, the technical solution of this preferred embodiment includes the following steps:
Step 1: the two neural network models are input into the compilation module for compilation.
Step 1.1: the description files of the two neural network models are input into the compilation module for compilation.
Step 1.2: the compilation module performs the compilation and finally outputs an image package, as shown in Fig. 4.
As shown in Fig. 5, the image package includes the instruction block group, the instruction block sequence description, and the acceleration device jump mapping table.
Specifically, the instruction block group includes NET_C0, NET_D1, NET_C1, NET_D2, and NET_C2, corresponding respectively to face detection preprocessing on the CPU (NET_C0), face detection on the acceleration device (NET_D1), face detection post-processing and face recognition preprocessing on the CPU (NET_C1), face recognition on the acceleration device (NET_D2), and face recognition post-processing on the CPU, which completes the service (NET_C2).
The instruction block sequence table gives C0-D1-C1-D2-C2, where Cx indicates that the CPU performs the computation with sequence number x, and Dx indicates that the acceleration device (Device) processes the instruction segment with sequence number x.
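As a hedged illustration of this notation, the sequence table can be parsed into (device, index) pairs as follows; parse_sequence is a hypothetical helper for illustration, not part of the embodiments:
    # Hypothetical helper: splits "C0-D1-C1-D2-C2" into (device, index) pairs,
    # where "C" is the host CPU and "D" is the acceleration device.
    def parse_sequence(table: str):
        entries = []
        for token in table.split("-"):
            device, index = token[0], int(token[1:])
            entries.append((device, index))
        return entries

    assert parse_sequence("C0-D1-C1-D2-C2") == [
        ("C", 0), ("D", 1), ("C", 1), ("D", 2), ("C", 2),
    ]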
It should be noted that this preferred embodiment assumes that the operations described by neural network model 1 are processed as a whole by the acceleration device (NET_D1), and that the computations described by neural network model 2 are likewise processed by the acceleration device (NET_D2). The input first undergoes face detection preprocessing (NET_C0), and the output is submitted to the acceleration device with a request for face detection. After the acceleration device receives the input, it completes the face detection operation NET_D1 and outputs information such as the face position. The CPU obtains this output and performs face detection post-processing and face recognition preprocessing (described by NET_C1). The processed data is submitted to the acceleration device for face recognition (NET_D2) processing; when that is complete, the result is submitted to the CPU for face recognition post-processing (NET_C2), completing the overall service function.
It should be noted that assuming a single instruction block per neural network model here does not lose generality: if the acceleration device supports only a part of the operations described by a neural network model, the model can be divided into multiple blocks, as described in preferred embodiment 2.
Step 2: processing flow in the running state.
The compilation stage outputs the image package, which is submitted to the running state for execution.
As shown in Fig. 6, the running state includes the following modules: a loading module, an acceleration device control and management module, an input/output management module, an upper-layer API interface, and so on, where:
the loading module loads the instruction block group onto the corresponding devices; the acceleration device management module controls the starting, stopping, resetting, and so on of the acceleration device; and the API interface handles interaction with upper-layer users;
the input/output management module handles input/output interaction with the acceleration device (Fig. 11 shows an internal block diagram of the acceleration device) and organizes the running items through the control information contained in each buffer item. Specifically, seen from the acceleration device side, there are an input buffer (InBuffer) and an output buffer (OutBuffer). As shown in Fig. 7, a buffer item has two parts, control information and data information; the control information includes a picture sequence number Px and an identifier Tx naming the processing device and the instruction block sequence number, where x is a number and T is a device type, either C or D: C denotes the HOST-side CPU, and D denotes the acceleration device (Device).
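Assuming the two-part layout just described, one buffer item can be sketched as follows; the class and field names are illustrative assumptions only:
    from dataclasses import dataclass

    @dataclass
    class BufferItem:
        picture_no: int  # Px: sequence number of the input picture
        device: str      # T: "C" for the HOST-side CPU, "D" for the acceleration device
        block_no: int    # x: sequence number of the instruction block
        data: bytes      # the data information part of the item

        def control_word(self) -> str:
            # renders e.g. "P1-NET_D1": picture 1, device instruction block NET_D1
            return f"P{self.picture_no}-NET_{self.device}{self.block_no}"
The control word is what ties a buffer item to a specific picture and instruction block as the item moves between the host and the device.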
Step 2.1: use the API interface to set up input and output and complete the programming.
Step 2.2: a general-purpose compilation tool (such as gcc) compiles the code and generates an executable file.
Step 2.3: run the executable file; the running process is shown in Fig. 12.
Following the instruction block sequence table, the HOST-side running state first schedules the CPU to perform face detection preprocessing (NET_C0). When the computation is complete, the data is filled into the InBuffer with the control information P1-NET_D1, indicating that the face detection neural network model inference is required.
The accelerator fetches this item from the InBuffer and processes the face detection (NET_D1) instructions. When processing is complete, the data is filled into the OutBuffer, and the input control information (P1-NET_D1) is copied along with it.
After the HOST side obtains this item from the device's OutBuffer, it determines from the instruction block sequence table that face detection post-processing and face recognition neural network model preprocessing (NET_C1) are to be performed. When that processing is complete, it fills the data into the InBuffer and, according to the instruction block sequence table, fills in the control information P1-NET_D2 (run the face recognition neural network model).
The acceleration device fetches this item, performs the face recognition computation (NET_D2), outputs the data, and copies the control information P1-NET_D2.
The CPU side obtains this item and, according to the instruction block sequence table, performs face recognition post-processing (NET_C2), completing the overall inference.
As can be seen from the above flow, when the instruction block sequence table is used, the acceleration device side only copies the control information, while the host side maintains and updates the control information according to the instruction block sequence table.
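Under the assumptions of the sketches above (BufferItem and parse_sequence), this host-side flow can be summarized as the loop below; run_on_cpu, in_buffer, and out_buffer are hypothetical stand-ins for the runtime's CPU instruction blocks and the device's InBuffer/OutBuffer, and the single-picture handling is a simplification:
    # Assumption-laden sketch of the HOST-side scheduling loop: C-blocks run
    # on the CPU, D-blocks are handed to the acceleration device via buffers.
    def host_run(sequence, picture_no, data, run_on_cpu, in_buffer, out_buffer):
        for device, block_no in sequence:          # e.g. parse_sequence("C0-D1-C1-D2-C2")
            if device == "C":
                data = run_on_cpu(block_no, data)  # pre-/post-processing on the host
            else:
                item = BufferItem(picture_no, "D", block_no, data)
                in_buffer.put(item)                # control word names the block to run
                result = out_buffer.get()          # device copies the control word back
                assert result.control_word() == item.control_word()
                data = result.data
        return data                                # overall inference result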
Here, the acceleration device performs two different functions, face detection (NET_D1) and face recognition (NET_D2). The time-division multiplexing between them, that is, switching which neural network model's inference runs on the device, is accomplished with jump instructions and the jump mapping table. The details are as follows:
First case: compile-time processing, in the following four steps:
Step a: according to the device configuration, the compilation module generates the instruction blocks NET_Tx that schedule the neural network models onto the different devices (in this example, face detection processing (NET_D1) and face recognition processing (NET_D2));
Step b: the compilation module appends a jump instruction JMP 0 after each acceleration device instruction block NET_Dx;
Step c: the compilation module generates the acceleration device jump mapping table (it can also be generated in the running state; generation by the compilation module is described here). The jump mapping table is shown in Fig. 9.
Step d: the compilation module adds a buffer-fetch instruction and a JMP Rj instruction before the acceleration device instruction block NET_D0 (Fig. 8).
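Steps a through d can be pictured with the following sketch, which lays out a device-side instruction image with the dispatcher at base address 0; the mnemonics GET_INBUFFER and JMP follow the text, while the function itself and the placeholder block contents are assumptions:
    # Illustrative layout of the device image: a buffer-fetch plus "JMP Rj"
    # dispatcher at base address 0 (step d), each block followed by "JMP 0"
    # (step b), and a jump mapping table of block base addresses (step c).
    def link_device_image(device_blocks):
        image = ["GET_INBUFFER", "JMP Rj"]       # dispatcher at instruction base 0
        jump_map = {}
        for name, instructions in device_blocks: # step a: e.g. NET_D1, NET_D2
            jump_map[name] = len(image)          # base address recorded in the table
            image.extend(instructions)
            image.append("JMP 0")                # return to the dispatcher
        return image, jump_map

    image, jump_map = link_device_image([
        ("NET_D1", ["FACE_DETECT_OPS"]),         # placeholder instruction streams
        ("NET_D2", ["FACE_RECOG_OPS"]),
    ])
    assert jump_map == {"NET_D1": 2, "NET_D2": 4}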
Second case: the acceleration device runs the instructions as follows (Fig. 10; a sketch follows this list):
1) execute instructions, starting from instruction base address 0;
2) fetch the data (control information + data input) from the InBuffer, parse the neural network model index from the control word, look up the mapping table to obtain the base address of the instructions to execute, and fill it into Rj;
3) jump to that instruction base address and start executing;
4) on reaching the end of that neural network model's instruction segment, output the execution result to the OutBuffer and write the input control word along with it;
5) then execute JMP 0 and repeat.
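Steps 1) through 5) amount to the run loop sketched below, a software simulation of what the text attributes to the device's Rj register and jump logic; run_block and the queue objects are hypothetical, and BufferItem is reused from the earlier sketch:
    # Behavioural simulation of the device run loop: fetch a buffer item,
    # resolve the block's base address via the jump mapping table (the value
    # loaded into Rj), execute from that base, emit the result with the input
    # control word copied, then "JMP 0" back to the dispatcher, modelled here
    # by the next iteration of the while loop.
    def accelerator_loop(image, jump_map, in_buffer, out_buffer, run_block):
        while True:
            item = in_buffer.get()                                # step 2: control word + data
            base = jump_map[f"NET_{item.device}{item.block_no}"]  # model index -> base (Rj)
            result = run_block(image, base, item.data)            # steps 1) and 3)
            out_buffer.put(BufferItem(item.picture_no, item.device,
                                      item.block_no, result))     # step 4: copy control word
            # step 5: JMP 0, i.e. loop back and wait for the next item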
Preferred embodiment 2: a single neural network model that must run jointly on the host and the acceleration device
Step 2.1: the neural network model is input into the compilation module, compiled, and an image package is output.
Assume the neural network model's instruction block combination and instruction block sequence table are C0-D1-C1-D2-C2, meaning that the neural network model is first preprocessed by the host CPU, then processed by the acceleration device, then by the CPU, then by the acceleration device again, and finally by the CPU.
The device jump table is the same as in preferred embodiment 1 and is not repeated here.
Step 2.2: code against the API interface and compile to generate an executable file.
Step 2.3: the HOST-side running state runs, loads the image, and executes; the process is largely the same as step 2 in preferred embodiment 1.
Step 2.4: the running state runs continuously and continuously produces inference results.
Preferred embodiment 3: a combination of multiple neural network models, where a single neural network model must be split into multiple parts
Step 3.1: the multiple neural network models are input into the compilation module, compiled, and an image package is output.
Without loss of generality, the neural network model instruction block combination and instruction block sequence table can be set as C0-D1-C1-D2-C2-D3-C3-C4-D4-C5-D5-C6.
The device jump table is the same as in preferred embodiment 1.
Step 3.2: the same as the other steps of preferred embodiment 2.
As the preferred embodiments show, the content disclosed in the embodiments and preferred embodiments of this application applies not only to inference services that combine multiple neural network models, but also to services in which a single neural network model must be completed jointly by the HOST and an acceleration device. In addition, combinations of multiple neural network models in which individual models themselves run jointly on the HOST and an acceleration device are also applicable, as are combinations of a host with multiple acceleration devices; all of these fall within the protection scope of this application.
Further, the technical solutions of the above embodiments and preferred embodiments of this application address the difficulty of putting into practice inference services that combine multiple neural network models: they provide a method, apparatus, and system that realize such combined service inference through two stages, compilation and running, using jump instructions, a jump mapping table, and an instruction block sequence table.
An optional embodiment of this application provides a method in which compilation generates an instruction block sequence table and, at runtime, multi-device scheduling and cooperative computation are performed according to that table.
An optional embodiment of this application provides a method and apparatus in which an acceleration device, based on a jump mapping table, uses simple jump instructions to perform time-division multiplexing and thereby complete different computation functions.
An optional embodiment of this application provides a neural network model acceleration device, including: an instruction cache, used to store the relevant instructions; a functional unit set module, implementing the computation modules related to neural network models; jump instructions and a jump mapping table, used to jump between neural network model function groups; a register set; and so on.
An optional embodiment of this application provides a compilation module for deep neural network models, which compiles a deep neural network model into the relevant instruction set and generates the corresponding instruction block sequence table and jump mapping table.
An optional embodiment of this application provides a running-state module for neural network model inference, including: a loading module, which loads the relevant image to a specific location; device control, which controls the starting, stopping, resetting, and so on of the acceleration device; input/output management for the device, which provides the data and requirements to be processed to the device and obtains the device's processing results; and a programming interface (API) provided to service users.
In summary, with the instruction block processing methods and apparatuses of the embodiments and preferred embodiments of this application, using the compilation and running-state system (shown in Fig. 13), inference over multiple neural network models can be put into practice conveniently, quickly, and efficiently, and the related service functions can be completed simply.
An embodiment of this application also provides a storage medium storing a computer program, where the computer program is configured to execute, when run, the steps in any one of the above method embodiments.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1: compile the description file of a neural network model by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution devices including a processor and an acceleration device; a jump instruction is set after each instruction block; and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed;
S2: load the image package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media that can store a computer program, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
An embodiment of this application also provides an electronic device, including a memory and a processor, where a computer program is stored in the memory and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, where the transmission device is connected to the processor and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps through the computer program:
S1: compile the description file of a neural network model by the compilation module to obtain an image package, where the image package includes an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group includes a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate the running order of the plurality of instruction blocks and the execution device that runs each instruction block, the execution devices including a processor and an acceleration device; a jump instruction is set after each instruction block; and the jump instruction mapping table includes the jump instruction and the next instruction block to be executed;
S2: load the image package, and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
Obviously, those skilled in the art should understand that each of the above modules or steps of this application can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network formed by multiple computing devices; and, optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device. In some cases, the steps shown or described can be executed in an order different from that given here, or they can be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. Thus, this application is not limited to any specific combination of hardware and software.
The above descriptions are only preferred embodiments of this application and are not intended to limit this application. For those skilled in the art, this application may have various modifications and changes. Any modification, equivalent replacement, improvement, and the like made within the principles of this application shall be included in the protection scope of this application.

Claims (10)

  1. A method for processing instruction blocks, comprising:
    compiling, by a compilation module, a description file of a neural network model to obtain an image package, wherein the image package comprises an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group comprises a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate a running order of the plurality of instruction blocks and an execution device for running the instruction blocks, the execution device comprising a processor and an acceleration device; a jump instruction is set after each of the instruction blocks; and the jump instruction mapping table comprises the jump instruction and a next instruction block to be executed; and
    loading the image package, and processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
  2. The method according to claim 1, wherein compiling, by the compilation module, the description file of the neural network model to obtain the image package comprises:
    compiling, by the compilation module, description files of a plurality of neural network models to obtain image packages corresponding to the plurality of neural network models.
  3. The method according to claim 1, wherein processing the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table comprises:
    instructing the execution device in the instruction block sequence table to process the instruction blocks of the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
  4. The method according to any one of claims 1 to 3, wherein after the instruction blocks of the instruction block group are processed according to the instruction block sequence table and the jump instruction mapping table, the method further comprises:
    for each instruction, after the instruction has been processed according to the running order, caching the obtained data in a pre-allocated buffer area.
  5. An apparatus for processing instruction blocks, comprising:
    a compilation module, configured to compile a description file of a neural network model to obtain an image package, wherein the image package comprises an instruction block group, an instruction block sequence table, and a jump instruction mapping table; the instruction block group comprises a plurality of instruction blocks to be processed; the instruction block sequence table is used to indicate a running order of the plurality of instruction blocks and an execution device for running the instruction blocks, the execution device comprising a processor and an acceleration device; a jump instruction is set after each of the instruction blocks; and the jump instruction mapping table comprises the jump instruction and a next instruction block to be executed; and
    a processing module, configured to load the image package and process the instruction blocks of the instruction block group according to the instruction block sequence table and the jump instruction mapping table.
  6. The apparatus according to claim 5, wherein the compilation module is configured to compile description files of a plurality of neural network models to obtain image packages corresponding to the plurality of neural network models.
  7. The apparatus according to claim 5, wherein the processing module is further configured to instruct the execution device in the instruction block sequence table to process the instruction block group according to the running order of the instruction block sequence table and the jump instruction mapping table.
  8. The apparatus according to any one of claims 5 to 7, wherein the processing module is further configured to, for each instruction, cache the obtained data in a pre-allocated buffer area after the instruction has been processed according to the running order.
  9. A storage medium, wherein a computer program is stored in the storage medium, and the computer program is configured to execute, when run, the method according to any one of claims 1 to 4.
  10. An electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the method according to any one of claims 1 to 4.
PCT/CN2020/085180 2019-06-26 2020-04-16 Instruction block processing method and apparatus, storage medium, and electronic device WO2020259020A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910562823.6A CN112148291A (en) 2019-06-26 2019-06-26 Instruction block processing method and device, storage medium and electronic device
CN201910562823.6 2019-06-26

Publications (1)

Publication Number Publication Date
WO2020259020A1 true WO2020259020A1 (en) 2020-12-30

Family

ID=73869963

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/085180 WO2020259020A1 (en) 2019-06-26 2020-04-16 Instruction block processing method and apparatus, storage medium, and electronic device

Country Status (2)

Country Link
CN (1) CN112148291A (en)
WO (1) WO2020259020A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106325967A (en) * 2015-06-30 2017-01-11 华为技术有限公司 Hardware acceleration method, compiler, and device
US20180011710A1 (en) * 2016-07-11 2018-01-11 DeePhi Technology Co., Ltd. Computing System and controller thereof
CN108027731A (en) * 2015-09-19 2018-05-11 微软技术许可有限责任公司 Debugging for block-based processor is supported
US20180293057A1 (en) * 2017-04-11 2018-10-11 Beijing Deephi Technology Co., Ltd. Programming model of neural network-oriented heterogeneous computing platform
CN109272109A (en) * 2018-10-30 2019-01-25 北京地平线机器人技术研发有限公司 The instruction dispatching method and device of neural network model
CN109919311A (en) * 2019-03-13 2019-06-21 北京地平线机器人技术研发有限公司 The method for generating instruction sequence, the method and apparatus for executing neural network computing

Also Published As

Publication number Publication date
CN112148291A (en) 2020-12-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20832669

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20832669

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25.05.2022)
