WO2023027368A1 - Execution engine optimization method, execution engine optimization device, and execution engine optimization system - Google Patents

Execution engine optimization method, execution engine optimization device, and execution engine optimization system

Info

Publication number
WO2023027368A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
neural network
network model
execution engine
data
Prior art date
Application number
PCT/KR2022/011390
Other languages
English (en)
Korean (ko)
Inventor
이현재
김재우
Original Assignee
주식회사 에너자이
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020210111338A (KR102393767B1)
Application filed by 주식회사 에너자이
Publication of WO2023027368A1

Classifications

    • G06F 8/443 Optimisation
    • G06F 15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F 8/41 Compilation
    • G06F 8/447 Target code generation
    • G06F 8/75 Structural analysis for program understanding
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/08 Learning methods
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to an execution engine optimization method, an execution engine optimization device, and an execution engine optimization system. Specifically, the present application relates to an execution engine optimization method, an execution engine optimization device, and an execution engine optimization system for optimizing an execution engine used in an embedded device.
  • as artificial intelligence technology develops, there is a demand for applying artificial intelligence technology to embedded devices with embedded systems used in various industries. Accordingly, lightweight technologies have been developed so that artificial intelligence technology can be applied to embedded devices with low performance and low specifications.
  • artificial intelligence technology can be applied to embedded devices through inference engine technology, which is software developed to efficiently execute pre-trained artificial intelligence models on embedded devices.
  • a conventional embedded artificial intelligence execution engine adopts a method of acquiring information about model execution in the embedded device, allocating the memory required for model execution, and then executing the model.
  • An example of a representative execution engine adopting this method is TensorFlow Lite Micro. This method has the advantage of being able to flexibly analyze the model structure and control memory allocation even when the model is changed during model execution.
  • An object to be solved by the present invention is to provide an execution engine optimization method, an execution engine optimization device, and an execution engine optimization system for optimizing an execution engine in consideration of a computing environment of an embedded device.
  • An execution engine optimization method includes obtaining binary data of a neural network model for which training has been completed; extracting execution data of the neural network model from the binary data, wherein the execution data is related to at least one of execution sequence data of the neural network model and structure data of the neural network model; obtaining computing environment information of an embedded device, wherein the computing environment information includes at least one of memory information and processor information of the embedded device; predicting an operation of the neural network model in the embedded device based on the execution data and the computing environment information and performing optimization of the execution engine; obtaining optimal code information to be used in the execution engine based on the optimization result; and transmitting the optimal code information.
  • An execution engine optimization system includes a server and an embedded device, the server including a processor that generates optimal code for optimizing an execution engine to be used in the embedded device based on data of a neural network model for which training has been completed, and a transceiver for communicating with the embedded device, and the embedded device obtaining the optimal code and executing the optimal code. The processor may be configured to obtain binary data of the neural network model for which training has been completed; extract, from the binary data, execution data of the neural network model, the execution data being related to at least one of execution order information of the neural network model and structure data of the neural network model; obtain computing environment information of the embedded device, the computing environment information including at least one of memory information and processor information of the embedded device; predict the operation of the neural network model in the embedded device based on the execution data and the computing environment information and perform optimization of the execution engine; obtain optimal code information to be used in the execution engine based on the optimization result; and transmit the optimal code information to the embedded device through the transceiver.
  • the execution capability of a neural network model in an embedded device may be improved.
  • power consumption in an embedded device may be reduced.
  • FIG. 1 is a schematic diagram of an execution engine optimization system according to an embodiment of the present application.
  • FIG. 2 is a diagram illustrating operations of an execution engine optimization system according to an embodiment of the present application.
  • FIG. 3 is a flowchart illustrating a method of optimizing an execution engine according to an embodiment of the present application.
  • FIG. 4 is a flowchart detailing steps for performing optimization of an execution engine according to an embodiment of the present application.
  • FIG. 5 is a flowchart specifying a step of obtaining target structure information of a neural network model according to an embodiment of the present application.
  • FIG. 6 is a diagram illustrating one aspect of a method for generating a first optimal code according to an embodiment of the present application.
  • FIG. 7 is a flowchart illustrating in detail steps for performing optimization of an execution engine according to another embodiment of the present application.
  • FIG. 8 is a flowchart illustrating in detail a step of generating a second optimal code according to an embodiment of the present application.
  • FIG. 9 is a flowchart illustrating in detail steps for performing optimization of an execution engine according to another embodiment of the present application.
  • An execution engine optimization method includes obtaining binary data of a neural network model for which training has been completed; extracting execution data of the neural network model from the binary data, wherein the execution data is related to at least one of execution sequence data of the neural network model and structure data of the neural network model; obtaining computing environment information of an embedded device, wherein the computing environment information includes at least one of memory information and processor information of the embedded device; predicting an operation of the neural network model in the embedded device based on the execution data and the computing environment information and performing optimization of the execution engine; obtaining optimal code information to be used in the execution engine based on the optimization result; and transmitting the optimal code information.
  • the optimizing of the execution engine may include obtaining the structure data of the neural network model from the execution data; obtaining target structure information of the neural network model from the structure data; and generating a first optimal code for merging operations related to a data set of interest included in the target structure information.
  • the obtaining of the target structure information may include obtaining previously set structure-of-interest information of the neural network model; detecting the data set of interest corresponding to the structure-of-interest information from the structure data; and obtaining the target structure information of the neural network model based on the data set of interest.
  • the optimizing of the execution engine may include calculating, based on the execution data and the computing environment information, an expected memory usage when the neural network model is operated in the computing environment of the embedded device; and generating a second optimal code for determining a memory allocation amount based on the memory usage.
  • the generating of the second optimal code may include obtaining location information of a memory block from the memory information of the computing environment information; evaluating memory efficiency based on the memory usage and the memory allocation amount; and generating code for rearranging the memory block based on the location information of the memory block and the memory efficiency.
  • the optimizing of the execution engine may include comparing the memory usage and the memory allocation amount; and generating code for adjusting the memory usage based on a comparison result between the memory allocation amount and the memory usage.
  • the code for adjusting the memory usage may be related to an Im2Col conversion code.
  • the generating of the second optimal code may include obtaining location information of a memory block from the memory information of the computing environment information; evaluating memory efficiency based on the memory usage and the memory allocation amount; and generating code for rearranging the memory block based on the location information of the memory block and the memory efficiency.
  • a computer-readable recording medium recording a program for executing the execution engine optimization method may be provided.
  • An execution engine optimization system includes a server and an embedded device, the server including a processor that generates optimal code for optimizing an execution engine to be used in the embedded device based on data of a neural network model for which training has been completed, and a transceiver for communicating with the embedded device, and the embedded device obtaining the optimal code and executing the optimal code. The processor may be configured to obtain binary data of the neural network model for which training has been completed; extract, from the binary data, execution data of the neural network model, the execution data being related to at least one of execution order information of the neural network model and structure data of the neural network model; obtain computing environment information of the embedded device, the computing environment information including at least one of memory information and processor information of the embedded device; predict the operation of the neural network model in the embedded device based on the execution data and the computing environment information and perform optimization of the execution engine; obtain optimal code information to be used in the execution engine based on the optimization result; and transmit the optimal code information to the embedded device through the transceiver.
  • FIG. 1 is a schematic diagram of an execution engine optimization system according to an embodiment of the present application.
  • the execution engine optimization system 10 may include an embedded device 100 and a server 1000 (or an execution engine optimization device).
  • the server 1000 may have a computing environment that exhibits superior performance to that of the embedded device 100.
  • the embedded device 100 may have a first computing environment exhibiting first performance.
  • the server 1000 may have a second computing environment that exhibits second performance superior to the first performance.
  • here, performance may include any information related to a computing environment, such as memory capacity, processor specifications, execution speed, and power consumption.
  • the server 1000 of the execution engine optimization system 10 may perform operations for optimizing the execution engine of the neural network model based on data of the trained neural network model and the computing environment information of the embedded device 100 in which the neural network model will actually be executed.
  • the execution engine optimization system 10 according to an embodiment of the present application performs execution engine optimization in the server 1000 (or execution engine optimization device), which has relatively high performance, rather than in the embedded device 100, which has performance limitations. It is thereby possible to efficiently and quickly obtain an inference engine that is optimized for the computing environment of the embedded device 100 and can execute the neural network model.
  • the server 1000 may include a transceiver 1100, a memory 1200, and a processor 1300.
  • the transceiver 1100 of the server 1000 may communicate with any external device including the embedded device 100 .
  • the server 1000 may transmit optimal code information obtained by performing optimization of an execution engine to the embedded device 100 through the transceiver 1100 .
  • the server 1000 may receive computing environment information of the embedded device 100 from the embedded device 100 or any external device through the transceiver 1100 .
  • the server 1000 may transmit and receive various types of data by accessing a network through the transceiver 1100 .
  • the transceiver may largely include a wired type and a wireless type. Since the wired type and the wireless type each have advantages and disadvantages, the server 1000 may be provided with both types in some cases.
  • for the wireless type, a wireless local area network (WLAN)-based communication method such as Wi-Fi may mainly be used.
  • as the wireless type, a cellular communication method, e.g., an LTE or 5G-based communication method, may also be used.
  • the wireless communication protocol is not limited to the above examples, and any suitable wireless communication method may be used.
  • for the wired type, LAN (Local Area Network) or USB (Universal Serial Bus) communication may be used, for example.
  • the memory 1200 of the server 1000 may store various types of information. Various types of data may be temporarily or semi-permanently stored in the memory 1200. Examples of the memory include a hard disk drive (HDD), a solid state drive (SSD), flash memory, read-only memory (ROM), and random access memory (RAM).
  • the memory 1200 may be provided in a form embedded in the server 1000 or in a detachable form.
  • the memory 1200 may store various data necessary for the operation of the server 1000, including an operating system (OS) for driving the server 1000 and programs for operating each component of the server 1000.
  • the processor 1300 may control overall operations of the server 1000 .
  • the processor 1300 may control overall operations of the server 1000, such as an operation of acquiring binary data of a neural network model for which training has been completed, an operation of extracting execution data from the binary data, an operation of obtaining computing environment information of an embedded device, an operation of performing optimization of an execution engine based on the execution data and the computing environment information, an operation of obtaining optimal code information based on the optimization result, and an operation of transmitting the optimal code information.
  • the processor 1300 may load and execute a program for overall operation of the server 1000 from the memory 1200 .
  • the processor 1300 may be implemented as an application processor (AP), a central processing unit (CPU), a microcontroller unit (MCU), or a similar device, in hardware, software, or a combination thereof.
  • in terms of hardware, it may be provided in the form of an electronic circuit that processes electrical signals to perform a control function.
  • in terms of software, it may be provided in the form of a program or code that drives the hardware circuit.
  • the embedded device 100 may mean a device including any programmable embedded system made for a specific purpose (or specific function).
  • the embedded device 100 may include hardware including a processor and/or memory. Also, the embedded device 100 may include firmware for controlling hardware. In addition, the embedded device 100 may be configured to execute an artificial intelligence model by inputting arbitrary software including an artificial intelligence execution engine into firmware.
  • the artificial intelligence execution engine is software for executing pre-trained artificial intelligence models in the embedded device 100 as efficiently as possible; it is a technology aimed at the practical use of artificial intelligence and functions to increase efficiency in the environment of the device on which it is mounted.
  • for example, in a mobile device, an execution engine may be implemented in accordance with the computing environment of the mobile device, characterized by slow operation speed and low power consumption.
  • as another example, an execution engine may be implemented to maximize high-performance parallel processing capability.
  • the embedded device 100 may obtain code information optimized for the computing environment of the embedded device 100 from the server 1000 and add (or input) the optimized code information to the firmware.
  • optimized code information can be generated by analyzing the internal structure of the neural network model after learning has been completed.
  • the optimized code information may be generated in consideration of a computing environment including memory specifications and/or processor specifications of the embedded device 100 .
  • the embedded device 100 may add optimal code information generated from the server 1000 to firmware and execute a neural network model.
  • the server 1000 of the execution engine optimization system 10 may optimize an execution engine to be used in the embedded device 100 .
  • the server 1000 of the execution engine optimization system 10 may obtain optimal code information by performing an operation of optimizing the execution engine of the neural network model based on data of the trained neural network model and the computing environment information of the embedded device 100 in which the neural network model will actually be executed.
  • FIG. 2 is a diagram illustrating operations of the execution engine optimization system 10 according to an embodiment of the present application.
  • the server 1000 may obtain computing environment information of the embedded device 100 from the embedded device 100 .
  • the computing environment information may include at least one of memory information, processor information, and/or performance information of the embedded device 100 .
  • the computing environment information may include any appropriate information related to the computing environment (or computing specifications) of the embedded device 100.
  • the server 1000 may obtain data of a neural network model on which learning has been completed.
  • the data of the trained neural network model may be arbitrary data related to information of the neural network model.
  • the data of the neural network model that has been trained may be binary data.
  • the neural network model may be a model obtained by performing learning in the server 1000 according to an embodiment of the present application.
  • the neural network model may be a model obtained by performing learning in an external device of the server 1000 .
  • a neural network model may be learned in an external server having higher performance than the server 1000 .
  • the server 1000 may obtain binary data of the learned neural network model from an external server (or external device) through the transceiver 1100 .
  • the server 1000 may extract execution data of a neural network model from binary data.
  • the server 1000 may extract execution data related to at least one of execution sequence data of the neural network model and structure data of the neural network model from the binary data.
  • the server 1000 may perform optimization of an execution engine to be used in the embedded device 100 .
  • the server 1000 may optimize the execution engine based on the computing environment information and execution data of the embedded device 100 .
  • the server 1000 predicts the operation of the neural network model in the embedded device 100 using execution data of the neural network model and computing environment information of the embedded device 100, and optimizes the execution engine based on the prediction result. can be done
  • the server 1000 may obtain optimal code information to be used for the execution engine based on the optimization result of the execution engine.
  • the server 1000 may generate code for merging neural network model operations or code related to memory management. The obtaining of the optimal code information will be described in more detail with reference to FIGS. 3 to 9.
  • the server 1000 may transmit optimal code information to the embedded device 100 through the transceiver 1100 .
  • the embedded device 100 may acquire optimal code information through any appropriate transceiver. Also, the embedded device 100 may execute optimal code information. In detail, the embedded device 100 may execute a neural network model optimized for the computing environment of the embedded device 100 by adding optimal code information to firmware.
  • FIG. 3 is a flowchart illustrating a method of optimizing an execution engine according to an embodiment of the present application.
  • An execution engine optimization method may include obtaining binary data of a trained neural network model (S1000), extracting execution data (S2000), acquiring computing environment information of the embedded device 100 (S3000), optimizing an execution engine (S4000), and acquiring optimal code information (S5000).
  • the server 1000 may obtain binary data of the trained neural network model.
  • binary data may encompass any information file of a neural network model for which training has been completed.
  • binary data may be data obtained by converting any information file of the trained neural network model into binary form.
  • the neural network model may be learned in the server 1000 according to an embodiment of the present application.
  • the neural network model may be learned from an external server of the server 1000 according to an embodiment of the present application.
  • a neural network model may be learned from an external server having a computing environment superior in performance to that of the server 1000 .
  • the server 1000 may be implemented to acquire binary data of the neural network model from an external server through an arbitrary transceiver.
  • the server 1000 may extract execution data of the neural network model from binary data of the neural network model.
  • the binary data of the neural network model may be in the form of binarized information required to execute the neural network model, including information related to an execution sequence of the neural network model or information related to the internal structure of the neural network model. Accordingly, the server 1000 according to an embodiment of the present application may extract execution data necessary for the execution of the neural network model from binary data of the neural network model.
  • the server 1000 may obtain computing environment information of the embedded device 100 through the transceiver 1100.
  • the computing environment information may include any information related to the computing environment (or computing specifications) of the embedded device 100, including memory information or processor information of the embedded device 100.
  • the server 1000 may optimize the execution engine based on execution data of the neural network model and computing environment information of the embedded device 100 .
  • the server 1000 may predict the operation of the neural network model in the embedded device by using the execution data of the neural network model and the computing environment information of the embedded device 100, and, based on this, may perform optimization of the execution engine, which is the software related to the execution of the neural network model.
  • the server 1000 may obtain structure data of the neural network model from execution data of the neural network model, and may detect target structure information of the neural network model from the structure data.
  • the target structure information may be any information related to a computation structure commonly used in neural network models.
  • the server 1000 may perform optimization of an execution engine by generating code for merging operation structures included in the target structure information. This will be described in detail with reference to FIGS. 4 to 6.
  • the server 1000 may be implemented to generate code related to memory management using execution data of the neural network model and computing environment information of the embedded device 100 .
  • the server 1000 may optimize the execution engine by using the execution data of the neural network model and the computing environment information (e.g., memory information and processor information) of the embedded device 100 to calculate the expected memory usage when the neural network model is operated in the computing environment of the embedded device 100, and generating code related to memory management based on the calculated memory usage.
  • the server 1000 may generate code for determining a memory allocation amount based on memory usage.
  • the server 1000 may generate code for adjusting memory usage in order to utilize the allocated memory as much as possible. This will be described in detail with reference to FIGS. 7 to 9.
  • the server 1000 may generate code for rearranging memory blocks by using execution data of the neural network model and computing environment information of the embedded device 100 .
  • the server 1000 may obtain location information of a memory block of the embedded device 100 from computing environment information (eg, memory information) of the embedded device 100 .
  • the server 1000 may predict or evaluate memory efficiency based on the above-described memory usage amount and memory allocation amount.
  • the server 1000 may generate code for rearranging memory blocks based on the location information of the memory blocks and the memory efficiency. This will be described in detail with reference to FIG. 8.
  • the server 1000 may acquire optimal code information to be used for the execution engine based on the optimization result.
  • the optimal code information may include code information for merging the computational structure of the neural network model, code information related to memory management, and/or code information for rearranging memory blocks.
  • the execution engine optimization method may further include transmitting optimal code information.
  • the server 1000 may transmit the optimal code information to the embedded device 100 through the transceiver 1100 .
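  • purely for illustration, the sketch below ties steps S1000 to S5000 together in a simplified form; the dictionary layouts, field names, and toy decision rules are assumptions made for this example and do not represent the actual data formats or decision rules of the execution engine.

```python
# A minimal, self-contained Python sketch of the S1000-S5000 flow of FIG. 3.
# The dictionary layouts and the toy decision rules are illustrative
# assumptions only; they are not the actual formats or rules of the engine.

def optimize_execution_engine(execution_data: dict, device_info: dict) -> dict:
    """Predict the model's behavior on the embedded device (S4000) and
    build the optimal code information to be transmitted (S5000)."""
    # A crude stand-in for the predicted memory usage: the largest tensor.
    peak_bytes = max(t["bytes"] for t in execution_data["tensors"])

    return {
        "memory_allocation_bytes": peak_bytes,
        # Toy rule: enable the Im2Col expansion only if the device has headroom.
        "use_im2col": 2 * peak_bytes <= device_info["ram_bytes"],
        # Toy rule: fuse every depthwise convolution followed by a ReLU.
        "ops_to_fuse": [i for i, op in enumerate(execution_data["ops"][:-1])
                        if op == "DEPTHWISE_CONV_2D"
                        and execution_data["ops"][i + 1] == "RELU"],
    }

# S1000-S3000 inputs are shown inline here instead of being extracted from
# binary model data or received from the embedded device over a transceiver.
code_info = optimize_execution_engine(
    execution_data={"ops": ["CONV_2D", "DEPTHWISE_CONV_2D", "RELU"],
                    "tensors": [{"bytes": 32 * 1024}, {"bytes": 48 * 1024}]},
    device_info={"ram_bytes": 256 * 1024},
)
print(code_info)
```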
  • Hereinafter, a method of performing optimization of an execution engine according to embodiments of the present application will be described in detail with reference to FIGS. 4 to 9.
  • referring to FIGS. 4 to 6, an optimization operation of merging the computation structure of the neural network model is described in detail.
  • referring to FIGS. 7 to 9, optimization operations for memory management are described in detail.
  • FIG. 4 is a flowchart detailing steps for performing optimization of an execution engine according to an embodiment of the present application.
  • Optimizing the execution engine according to an embodiment of the present application may include acquiring structure data of the neural network model (S4110), acquiring target structure information of the neural network model from the structure data (S4120), and generating a first optimal code for merging operations related to the data set of interest included in the target structure information (S4130).
  • the server 1000 may obtain structural data representing the internal structure of the neural network model from execution data of the neural network model.
  • the server 1000 may obtain target structure information from the structure data of the neural network model.
  • a commonly used computation structure may exist for each type of neural network model. For example, in a specific network, a structure that performs a convolution operation, a depthwise convolution operation, and an activation operation may be commonly used.
  • the server 1000 may obtain structure-of-interest information related to such a commonly used structure, and may perform an operation of detecting a data set of interest corresponding to the structure-of-interest information from the structure data. Also, the server 1000 may obtain target structure information of the neural network model based on the detected data set of interest.
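  • as a purely illustrative sketch of this detection step, the structure data can be thought of as a sequence of operation identifiers that is scanned for a pre-set structure of interest; the list-based representation and the operation names below are assumptions made for the example only, not the actual data format of the execution engine.

```python
# Illustrative sketch: scan the structure data (here, a plain sequence of
# operation identifiers) for occurrences of a pre-set structure of interest.
# Operation names and the list representation are assumptions for the example.

def find_interest_sets(structure_data, structure_of_interest):
    """Return the start indices at which the structure of interest occurs."""
    matches = []
    span = len(structure_of_interest)
    for i in range(len(structure_data) - span + 1):
        if structure_data[i:i + span] == structure_of_interest:
            matches.append(i)
    return matches

ops = ["CONV_2D", "DEPTHWISE_CONV_2D", "RELU",
       "CONV_2D", "DEPTHWISE_CONV_2D", "RELU", "SOFTMAX"]
pattern = ["DEPTHWISE_CONV_2D", "RELU"]     # the structure of interest
print(find_interest_sets(ops, pattern))     # -> [1, 4]
```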
  • FIG. 5 is a flowchart specifying a step of obtaining target structure information of a neural network model according to an embodiment of the present application.
  • FIG. 6 is a diagram illustrating one aspect of a method for generating a first optimal code according to an embodiment of the present application.
  • Acquiring target structure information of a neural network model according to an embodiment of the present application may include acquiring structure-of-interest information (S4122), detecting a data set of interest corresponding to the structure-of-interest information from the structure data (S4124), and obtaining target structure information of the neural network model based on the data set of interest (S4126).
  • the server 1000 may obtain structure of interest information related to the neural network model.
  • a computation structure commonly used for each type of neural network model may constitute the neural network model.
  • for example, a structure in which a first operation O1 is performed and a second operation O2 is performed based on the output value of the first operation O1 may be commonly used in a neural network model having a specific network structure.
  • for example, a structure in which a convolution operation is performed, and a depthwise convolution operation and an activation operation are then sequentially performed, may be commonly used.
  • as a specific example, a model may include a structure in which an intermediate result value is obtained by performing a depthwise convolution operation that applies a filter to each channel (e.g., each color channel of an image) to compress the data of each channel, and a pointwise operation is then performed based on the intermediate result value.
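  • the following NumPy sketch illustrates, for reference only, what such a depthwise-then-pointwise structure computes; the shapes, "valid" padding, and random values are assumptions chosen to keep the example short and are not part of the disclosed method.

```python
import numpy as np

def depthwise_separable(x, dw_filters, pw_weights):
    """x: (H, W, C); dw_filters: (k, k, C); pw_weights: (C, C_out)."""
    h, w, c = x.shape
    k = dw_filters.shape[0]
    out_h, out_w = h - k + 1, w - k + 1

    # Depthwise step: one k x k filter per input channel (intermediate result).
    intermediate = np.zeros((out_h, out_w, c))
    for ch in range(c):
        for i in range(out_h):
            for j in range(out_w):
                patch = x[i:i + k, j:j + k, ch]
                intermediate[i, j, ch] = np.sum(patch * dw_filters[:, :, ch])

    # Pointwise step: a 1x1 convolution, i.e. a per-pixel matrix product
    # that mixes the per-channel intermediate results across channels.
    return intermediate @ pw_weights        # shape (out_h, out_w, C_out)

x = np.random.rand(8, 8, 3)     # a small feature map, e.g. an RGB patch
dw = np.random.rand(3, 3, 3)    # one 3x3 filter per input channel
pw = np.random.rand(3, 4)       # pointwise weights mixing 3 channels into 4
print(depthwise_separable(x, dw, pw).shape)   # -> (6, 6, 4)
```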
  • the server 1000 may obtain structure information of interest related to a structure for performing the first operation O1 and the second operation O2.
  • structure of interest information may be previously input by a user.
  • the server 1000 may obtain structure-of-interest information through a user's input.
  • however, this is only an example, and the server may be implemented to acquire structure-of-interest information related to the neural network model by any suitable method and to obtain target structure information based on the structure-of-interest information.
  • the server 1000 may be implemented to detect, based on the structure-of-interest information, a data set of interest related to an operation structure corresponding to the structure-of-interest information from among the data sets included in the structure data.
  • for example, when the structure-of-interest information includes information on a structure for performing the first operation O1 and the second operation O2, the server 1000 may detect, from among the data sets included in the structure data, a data set of interest related to a first target operation TO1 and a second target operation TO2 corresponding to the structure-of-interest information.
  • the server 1000 may obtain target structure information of the neural network model based on the data set of interest corresponding to the structure-of-interest information.
  • for example, target structure information of the neural network model may be obtained based on a data set of interest related to a structure in which the first target operation TO1 is performed and the second target operation TO2 is sequentially performed.
  • the execution engine optimization method may include generating a first optimal code for merging operations related to a data set of interest included in target structure information (S4130).
  • the server 1000 may generate a first optimal code configured to perform an operation by merging the first target operation TO1 and the second target operation TO2 related to the target structure information.
  • the first optimal code may be generated using an operation fusion technique.
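  • purely as an illustration of the operation-fusion idea, the sketch below replaces an adjacent pair of target operations with a single fused node, so that the intermediate result between them no longer needs to be written to and read back from memory; the list-of-dictionaries graph representation and operation names are assumptions for this example only.

```python
# Illustrative sketch of operation fusion: an adjacent pair consisting of the
# first target operation and the second target operation is replaced by one
# fused node, so the intermediate result no longer has to be materialized.

def fuse_pairs(ops, first_type, second_type):
    fused, i = [], 0
    while i < len(ops):
        if (i + 1 < len(ops)
                and ops[i]["type"] == first_type
                and ops[i + 1]["type"] == second_type):
            fused.append({"type": f"FUSED_{first_type}_{second_type}",
                          "inputs": ops[i]["inputs"],
                          "outputs": ops[i + 1]["outputs"]})
            i += 2      # TO1 and TO2 are consumed together
        else:
            fused.append(ops[i])
            i += 1
    return fused

graph = [{"type": "DEPTHWISE_CONV_2D", "inputs": ["x"], "outputs": ["t0"]},
         {"type": "RELU", "inputs": ["t0"], "outputs": ["y"]}]
print(fuse_pairs(graph, "DEPTHWISE_CONV_2D", "RELU"))
```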
  • FIG. 7 is a flowchart embodying a step (S4000) of performing optimization of an execution engine according to another embodiment of the present application. Specifically, FIG. 7 is a flowchart illustrating a method of optimizing a memory allocation amount according to another embodiment of the present application.
  • Optimizing the execution engine according to another embodiment of the present application may include calculating the expected memory usage when the neural network model is operated in the computing environment of the embedded device (S4210) and generating a second optimal code for determining a memory allocation amount based on the memory usage (S4220).
  • in the step of calculating the expected memory usage when the neural network model is operated in the computing environment of the embedded device (S4210), the server 1000 may calculate, based on the execution data of the neural network model and the computing environment information of the embedded device 100, the expected memory usage when the neural network model is operated in the computing environment of the embedded device 100.
  • in the step of generating the second optimal code for determining the memory allocation amount based on the memory usage (S4220), the server 1000 may perform optimization of the execution engine by generating the second optimal code for determining or adjusting the memory allocation amount using the memory usage.
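  • the following sketch illustrates one possible way to estimate the expected memory usage from execution data and to derive a memory allocation amount from it; the tensor shapes, 4-byte element size, and alignment value are illustrative assumptions rather than values prescribed by the present application.

```python
import math

# Illustrative sketch of S4210/S4220: estimate the activation memory needed
# at each execution step and derive a memory allocation amount from the peak.
# The shapes, 4-byte elements, and 16-byte alignment are assumed values.

def plan_memory(steps, element_bytes=4, alignment=16):
    """steps: one list of live tensor shapes per execution step."""
    peak = 0
    for live_shapes in steps:
        step_bytes = sum(element_bytes * math.prod(s) for s in live_shapes)
        peak = max(peak, step_bytes)
    # Round the allocation amount up to the assumed memory alignment.
    allocation = (peak + alignment - 1) // alignment * alignment
    return peak, allocation

# Tensors assumed to be alive at each step of a small model.
steps = [[(1, 28, 28, 8)],
         [(1, 28, 28, 8), (1, 14, 14, 16)],
         [(1, 14, 14, 16)]]
print(plan_memory(steps))   # -> (expected peak usage, allocation amount)
```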
  • FIG. 8 is a flowchart embodying a step of generating a second optimal code according to an embodiment of the present application. Specifically, FIG. 8 is a flowchart illustrating a method of optimizing a memory block according to an embodiment of the present application.
  • Generating the second optimal code (S4220) may include acquiring location information of memory blocks (S4230), evaluating memory efficiency (S4240), and generating code for rearranging the memory blocks (S4250).
  • the server 1000 may obtain location information of the memory blocks from the memory information of the computing environment information of the embedded device 100.
  • the server 1000 may calculate the memory efficiency using the memory allocation amount and the memory usage expected when the neural network model is operated in the computing environment of the embedded device 100. At this time, if a memory block is allocated to a particular location, memory efficiency may be degraded. In this case, the server 1000 according to an embodiment of the present application may rearrange the memory blocks based on the location information of the memory blocks and the memory efficiency, as will be described later.
  • the server 1000 may generate the code for rearranging the memory block based on location information and memory efficiency of the memory block.
  • the server 1000 may generate code for rearranging memory blocks by using location information of memory blocks when memory efficiency is lower than a preset threshold efficiency value.
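  • as an illustrative sketch of this step, memory efficiency can be expressed as the fraction of the currently laid-out memory arena that the expected usage actually occupies, and the blocks can be repacked when that fraction falls below a threshold; the block descriptors, the efficiency metric, and the threshold value below are assumptions made for the example.

```python
# Illustrative sketch of FIG. 8: compute a simple memory-efficiency figure
# from the expected usage and the current block layout, and repack the blocks
# when it falls below a threshold. Block descriptors, the efficiency metric,
# and the 0.8 threshold are assumptions made for the example.

def rearrange_blocks(blocks, expected_usage, threshold=0.8):
    """blocks: list of dicts with 'offset' and 'size' in bytes."""
    arena_end = max(b["offset"] + b["size"] for b in blocks)
    efficiency = expected_usage / arena_end     # fraction of the arena in use
    if efficiency >= threshold:
        return blocks                           # current layout is good enough
    # Repack the blocks back-to-back, preserving their order in the arena.
    offset, repacked = 0, []
    for blk in sorted(blocks, key=lambda b: b["offset"]):
        repacked.append({**blk, "offset": offset})
        offset += blk["size"]
    return repacked

blocks = [{"offset": 0, "size": 4096}, {"offset": 8192, "size": 2048}]
print(rearrange_blocks(blocks, expected_usage=6144))
```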
  • the execution engine optimization system 10 may allocate an optimal memory for the computing environment (or computing specification) of the embedded device 100 .
  • FIG. 9 is a flowchart detailing steps for performing optimization of an execution engine according to another embodiment of the present application.
  • Optimizing the execution engine according to an embodiment of the present application may include comparing the memory usage and the memory allocation amount (S4310) and generating code for adjusting the memory usage based on the comparison result (S4320).
  • in the step of comparing the memory usage and the memory allocation amount (S4310), the server 1000 may compare the memory usage expected when the neural network model is operated in the embedded device 100 with the memory allocation amount determined based on the memory usage.
  • in the step of generating code for adjusting the memory usage based on the comparison result (S4320), the server 1000 may generate code for adjusting the memory usage according to the result of comparing the memory usage and the memory allocation amount.
  • for example, the server 1000 may generate code for adjusting the memory usage to increase.
  • in detail, the server 1000 may be implemented to improve the performance of the neural network model in the embedded device 100 by using the Im2Col extension technique to increase memory usage and thereby increase the cache hit ratio. More specifically, when the Im2Col extension technique is used, execution speed can be expected to improve as the cache hit rate increases, in exchange for an increase in memory usage.
  • the server 1000 according to an embodiment of the present application may optimize the execution engine so that, when the memory usage expected when the neural network model is operated in the embedded device 100 is smaller than the memory allocation amount, the Im2Col extension technique is used to maximize the use of the allocated memory and improve the execution speed of the neural network model in the embedded device 100.
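  • the sketch below illustrates this trade-off in a simplified form: the extra buffer required by an Im2Col expansion is estimated, and the expansion is enabled only when the expected usage plus that buffer still fits within the memory allocation amount; the buffer-size formula, shapes, and 4-byte element size are assumptions for the example only.

```python
# Illustrative sketch of the comparison in FIG. 9: estimate the extra buffer
# an Im2Col expansion would need for a k x k convolution and enable it only
# when the expected usage plus that buffer still fits in the allocation.
# The buffer-size formula, shapes, and 4-byte elements are assumptions.

def im2col_bytes(h, w, c, k, element_bytes=4):
    """Size of the Im2Col buffer for a k x k convolution, 'valid' padding."""
    out_h, out_w = h - k + 1, w - k + 1
    return out_h * out_w * k * k * c * element_bytes

def should_use_im2col(expected_usage, allocation, h, w, c, k):
    return expected_usage + im2col_bytes(h, w, c, k) <= allocation

# Example: a 32 x 32 x 8 feature map convolved with a 3 x 3 kernel.
print(im2col_bytes(32, 32, 8, 3))                        # extra bytes needed
print(should_use_im2col(40_000, 300_000, 32, 32, 8, 3))  # -> True
```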
  • the server 1000 may generate code for adjusting the memory usage to be lowered.
  • the embedded device 100 of the execution engine optimization system 10 may directly run the neural network model, without itself analyzing the structure of the model or determining the memory allocation, by using the execution engine optimized in the server 1000, whose computing specifications are superior to those of the embedded device 100.
  • an algorithm for analyzing the structure of a model and merging operations for a specific structure may be applied to the execution engine.
  • a sophisticated and improved memory allocation algorithm or memory block rearrangement algorithm may be applied to the execution engine in consideration of the computing specifications of the embedded device 100. Accordingly, the execution capability of the neural network model in the embedded device 100 may be improved.
  • an operation of analyzing the structure of the above-described model or determining memory allocation is not performed in the embedded device 100 having limitations in the computing environment. Instead, it is performed in the server 1000, which has a relatively excellent computing environment. Therefore, when the model is executed in the embedded device 100, the memory of the embedded device 100 can be efficiently utilized and power consumption of the embedded device can be reduced.
  • Various operations of the server 1000 described above may be stored in the memory 1200 of the server 1000, and the processor 1300 of the server 1000 may be provided to perform the operations stored in the memory 1200.
  • the execution engine optimization method, execution engine optimization device, and execution engine optimization system disclosed in this application can be used for the efficient execution of artificial intelligence models in various embedded systems, including home appliances, vehicle sensors, products for the safety of infants or the elderly, and smart watches.

Abstract

According to an embodiment, the present application relates to an execution engine optimization method comprising the steps of: acquiring binary data of a trained neural network model; extracting execution data of the neural network model from the binary data; acquiring computing environment information of an embedded device; predicting an operation of the neural network model in the embedded device based on the execution data and the computing environment information, and optimizing an execution engine; acquiring, based on the optimization result, optimal code information to be used for the execution engine; and transmitting the optimal code information.
PCT/KR2022/011390 2021-08-24 2022-08-02 Execution engine optimization method, execution engine optimization device, and execution engine optimization system WO2023027368A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2021-0111338 2021-08-24
KR1020210111338A KR102393767B1 (ko) 2021-08-24 2021-08-24 실행 엔진 최적화 방법, 실행 엔진 최적화 장치, 및 실행 엔진 최적화 시스템
KR10-2022-0052708 2021-08-24
KR1020220052708A KR102573644B1 (ko) 2021-08-24 2022-04-28 실행 엔진 최적화 방법, 실행 엔진 최적화 장치, 및 실행 엔진 최적화 시스템

Publications (1)

Publication Number Publication Date
WO2023027368A1 (fr)

Family

ID=85323506

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/011390 WO2023027368A1 (fr) 2021-08-24 2022-08-02 Execution engine optimization method, execution engine optimization device, and execution engine optimization system

Country Status (2)

Country Link
KR (1) KR102573644B1 (fr)
WO (1) WO2023027368A1 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130086449A (ko) * 2012-01-25 2013-08-02 전자부품연구원 계층별 볼륨의 이용량에 따라 데이터 블럭을 재배치하는 볼륨 관리 방법
KR20190113928A (ko) * 2017-03-24 2019-10-08 구글 엘엘씨 강화 학습을 통한 디바이스 배치 최적화
KR20210042012A (ko) * 2019-10-08 2021-04-16 한국전자통신연구원 인공 지능 추론 장치 및 방법
KR102257028B1 (ko) * 2020-10-06 2021-05-27 주식회사 딥이티 컴퓨팅 플랫폼 기반의 적응형 딥러닝 작업 할당 장치 및 방법
KR20210090349A (ko) * 2020-01-10 2021-07-20 주식회사 소이넷 인공지능 모델 구동 가속 장치 및 방법
KR102393767B1 (ko) * 2021-08-24 2022-05-04 주식회사 에너자이(ENERZAi) 실행 엔진 최적화 방법, 실행 엔진 최적화 장치, 및 실행 엔진 최적화 시스템

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102192325B1 (ko) * 2019-06-04 2020-12-28 (주)딥엑스 인공신경망의 데이터 로컬리티 기반의 데이터 캐슁을 이용하여 고속의 인공신경망 오퍼레이션을 지원하는 데이터 관리 장치
KR20200131722A (ko) * 2019-10-15 2020-11-24 주식회사 뷰노 훈련된 심층 신경망 모델의 재현 성능을 개선하는 방법 및 이를 이용한 장치


Also Published As

Publication number Publication date
KR20230029494A (ko) 2023-03-03
KR102573644B1 (ko) 2023-09-01


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22861579

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE