WO2023027368A1

WO2023027368A1 - Execution engine optimization method, execution engine optimization device, and execution engine optimization system

Info

Publication number: WO2023027368A1
Application number: PCT/KR2022/011390
Authority: WO
Inventors: 이현재; 김재우
Original assignee: 주식회사 에너자이
Priority date: 2021-08-24
Filing date: 2022-08-02
Publication date: 2023-03-02
Also published as: KR20230029494A; KR102573644B1

Abstract

An execution engine optimization method according to one embodiment of the present application comprises the steps of: acquiring binary data of a learned neural network model; extracting, from the binary data, execution data of the neural network model; acquiring computing environment information of an embedded device; predicting an operation of the neural network model in the embedded device, on the basis of the execution data and the computing environment information, and optimizing an execution engine; acquiring, on the basis of the optimization result, optimal code information to be used for the execution engine; and transmitting the optimal code information.

Description

Execution Engine Optimization Method, Execution Engine Optimization Apparatus, and Execution Engine Optimization System

The present application relates to an execution engine optimization method, an execution engine optimization apparatus, and an execution engine optimization system. Specifically, the present application relates to an execution engine optimization method for optimizing an execution engine used in an embedded device, an execution engine optimization device, and an execution engine optimization system.

As artificial intelligence technology develops, there is a demand for artificial intelligence technology to be applied to embedded devices with embedded systems used in various industries. Accordingly, lightweight technologies have been developed, and artificial intelligence technology can be applied to embedded devices with low performance and low specifications. In particular, artificial intelligence technology can be applied to embedded devices through inference engine technology, which is software developed to efficiently execute pre-learned artificial intelligence models on embedded devices.

A conventional embedded artificial intelligence execution engine acquires information about model execution on the embedded device itself, allocates the memory required for model execution, and then executes the model. A representative execution engine that adopts this method is TensorFlow Lite Micro. This method has the advantage of being able to flexibly analyze the model structure and control memory allocation even when the model is changed during model execution.

However, in the case of embedded devices, there is a high possibility that the advantages of the existing method cannot be utilized because there are almost no updates during model execution. In addition, there is a concern that analyzing the structure of a model and determining memory allocation in an embedded device having limitations in a computing environment may act as a load on the memory of the embedded device. Additionally, embedded devices have limitations in using more complex and efficient algorithms to analyze the structure of a model and determine memory allocation because of limitations in computing specifications.

Therefore, it is required to develop an execution engine optimization method, apparatus, and system for implementing an execution engine that is efficient and has improved calculation speed in consideration of a computing environment or computing specifications of an embedded device.

An object to be solved by the present invention is to provide an execution engine optimization method, an execution engine optimization device, and an execution engine optimization system for optimizing an execution engine in consideration of a computing environment of an embedded device.

The problem to be solved by the present invention is not limited to the above-mentioned problem, and problems not mentioned will be clearly understood by those skilled in the art from this specification and the accompanying drawings.

An execution engine optimization method according to an embodiment of the present application includes obtaining binary data of a trained neural network model; extracting execution data of the neural network model from the binary data, wherein the execution data is related to at least one of execution sequence data of the neural network model and structural data of the neural network model; obtaining computing environment information of an embedded device, wherein the computing environment information includes at least one of memory information and processor information of the embedded device; predicting an operation of the neural network model in the embedded device based on the execution data and the computing environment information and performing optimization of the execution engine; obtaining optimal code information to be used in the execution engine based on the optimization result; and transmitting the optimal code information.

An execution engine optimization system according to an embodiment of the present application includes: a server including a processor that generates optimal code for optimizing an execution engine to be used in an embedded device based on data of a trained neural network model, and a transceiver for communicating with the embedded device; and an embedded device that obtains the optimal code and executes the optimal code. The processor may be configured to obtain binary data of the trained neural network model; extract, from the binary data, execution data of the neural network model, the execution data being related to at least one of execution order information of the neural network model and structural data of the neural network model; obtain computing environment information of the embedded device, the computing environment information including at least one of memory information and processor information of the embedded device; predict the operation of the neural network model in the embedded device based on the execution data and the computing environment information and perform optimization of the execution engine; obtain, based on the optimization result, optimal code information to be used in the execution engine; and transmit the optimal code information to the embedded device through the transceiver.

The solutions to the problems of the present invention are not limited to the above-described solutions, and solutions not mentioned will be clearly understood by those skilled in the art from this specification and the accompanying drawings.

According to the execution engine optimization method, the execution engine optimization apparatus, and the execution engine optimization system according to embodiments of the present application, the execution capability of a neural network model in an embedded device may be improved.

According to the execution engine optimization method, the execution engine optimization apparatus, and the execution engine optimization system according to embodiments of the present application, power consumption in an embedded device may be reduced.

Effects of the present invention are not limited to the above-mentioned effects, and effects not mentioned will be clearly understood by those skilled in the art from this specification and the accompanying drawings.

FIG. 1 is a schematic diagram of an execution engine optimization system according to an embodiment of the present application.

FIG. 2 is a diagram illustrating operations of an execution engine optimization system according to an embodiment of the present application.

FIG. 3 is a flowchart illustrating a method of optimizing an execution engine according to an embodiment of the present application.

FIG. 4 is a flowchart detailing steps for performing optimization of an execution engine according to an embodiment of the present application.

FIG. 5 is a flowchart detailing a step of obtaining target structure information of a neural network model according to an embodiment of the present application.

FIG. 6 is a diagram illustrating one aspect of a method for generating a first optimal code according to an embodiment of the present application.

FIG. 7 is a flowchart detailing steps for performing optimization of an execution engine according to another embodiment of the present application.

FIG. 8 is a flowchart detailing a step of generating a second optimal code according to an embodiment of the present application.

FIG. 9 is a flowchart detailing steps for performing optimization of an execution engine according to another embodiment of the present application.

According to an embodiment of the present application, the optimizing of the execution engine may include obtaining the structure data of the neural network model from the execution data; obtaining target structure information of the neural network model from the structure data; and generating a first optimal code for merging operations related to a data set of interest included in the target structure information.

According to an embodiment of the present application, the obtaining of the target structure information may include obtaining preset structure-of-interest information of the neural network model; detecting, from the structure data, the data set of interest corresponding to the structure-of-interest information; and obtaining the target structure information of the neural network model based on the data set of interest.

According to an embodiment of the present application, the optimizing of the execution engine may include calculating, based on the execution data and the computing environment information, the memory usage expected when the neural network model is operated in the computing environment of the embedded device; and generating a second optimal code for determining a memory allocation amount based on the memory usage.

According to one embodiment of the present application, the generating of the second optimal code may include obtaining location information of a memory block from the memory information of the computing environment information; evaluating memory efficiency based on the memory usage and the memory allocation amount; and generating a code for rearranging the memory block based on the location information of the memory block and the memory efficiency.

According to one embodiment of the present application, the optimizing of the execution engine may include comparing the memory usage and the memory allocation amount; and generating a code for adjusting the memory usage based on a comparison result between the memory allocation amount and the memory usage.

According to an embodiment of the present application, the code for adjusting the memory usage may be related to Im2Col conversion code.
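To illustrate why Im2Col conversion bears on memory usage: Im2Col unrolls each convolution window into a column so that the convolution becomes a matrix multiplication, at the cost of a patch buffer that, for stride 1, is roughly kernel-area times larger than the input. The following minimal Python sketch is not taken from the application. It estimates that buffer and picks how many output rows to convert per pass so the buffer stays within an allocated budget. The function names, the row-tiling strategy, and the example layer dimensions are all assumptions made for illustration.

```python
def im2col_bytes(channels, kernel, out_h, out_w, elem_bytes=1):
    """Size of the Im2Col patch matrix: one column of C*k*k values per output pixel."""
    return channels * kernel * kernel * out_h * out_w * elem_bytes


def rows_per_tile(channels, kernel, out_h, out_w, budget_bytes, elem_bytes=1):
    """Largest number of output rows whose patch matrix fits in budget_bytes."""
    one_row = im2col_bytes(channels, kernel, 1, out_w, elem_bytes)
    rows = min(out_h, budget_bytes // one_row)
    if rows == 0:
        raise ValueError("budget too small for even one output row")
    return rows


# Hypothetical layer: 16 input channels, 3x3 kernel, 32x32 output, int8 data.
full_buffer = im2col_bytes(16, 3, 32, 32)
tile_rows = rows_per_tile(16, 3, 32, 32, budget_bytes=16 * 1024)
```

In this hypothetical case the full buffer needs 147456 bytes (144 KiB), which exceeds a 16 KiB budget, so generated code under these assumptions would convert three output rows per pass.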

According to an embodiment of the present application, a computer-readable recording medium recording a program for executing the execution engine optimization method may be provided.

The foregoing objects, features and advantages of the present application will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. However, the present application can apply various changes and can have various embodiments. Hereinafter, specific embodiments will be illustrated in the drawings and described in detail.

Like reference numerals designate essentially like elements throughout the specification. In addition, components having the same function within the scope of the same idea appearing in the drawings of each embodiment will be described using the same reference numerals, and overlapping descriptions thereof will be omitted.

If it is determined that a detailed description of a known function or configuration related to the present application may unnecessarily obscure the subject matter of the present application, the detailed description thereof will be omitted. In addition, numbers (e.g., first, second, etc.) used in the description process of this specification are only identifiers for distinguishing one component from another component.

In addition, the suffixes "module" and "unit" for components used in the following embodiments are given or used interchangeably in consideration of ease of writing the specification, and do not have meanings or roles that are distinguished from each other by themselves.

In the following examples, expressions in the singular number include plural expressions unless the context clearly dictates otherwise.

In the following embodiments, terms such as "include" or "have" mean that features or components described in the specification exist, and do not preclude the possibility that one or more other features or components may be added.

In the drawings, the size of components may be exaggerated or reduced for convenience of description. For example, the size and thickness of each component shown in the drawings are arbitrarily shown for convenience of explanation, and the present invention is not necessarily limited to those shown.

If an embodiment is otherwise implementable, the order of specific processes may be performed differently from the order described. For example, two processes that are described in succession may be performed substantially concurrently, or may proceed in an order reverse to that described.

In the following embodiments, when components are connected, a case in which the components are directly connected as well as a case in which components are interposed between the components and connected indirectly is included.

For example, when it is said that components are electrically connected in this specification, not only the case where the components are directly electrically connected, but also the case where the components are interposed and electrically connected indirectly is included.

Hereinafter, the execution engine optimization method, execution engine optimization device, and execution engine optimization system of the present application will be described with reference to FIGS. 1 to 9.

The execution engine optimization system 10 according to an embodiment of the present application may include an embedded device 100 and a server 1000 (or an execution engine optimization device).

The server 1000 may have a computing environment that outperforms that of the embedded device 100. Specifically, the embedded device 100 may have a first computing environment exhibiting a first performance, whereas the server 1000 may have a second computing environment exhibiting a second performance superior to the first performance. Here, performance may include any information related to a computing environment, such as memory capacity, processor specifications, execution speed, and power consumption.

The server 1000 of the execution engine optimization system 10 according to an embodiment of the present application may perform an operation of optimizing the execution engine of a neural network model based on data of the trained neural network model and computing environment information of the embedded device 100 in which the neural network model is actually executed. By performing the optimization in the server 1000 (or execution engine optimization device), which has relatively high performance, rather than in the embedded device 100, which has performance limitations, the execution engine optimization system 10 can efficiently and quickly obtain an inference engine that is suited to the computing environment of the embedded device 100 and can execute the neural network model.

The server 1000 according to an embodiment of the present application may include a transceiver 1100, a memory 1200, and a processor 1300.

The transceiver 1100 of the server 1000 may communicate with any external device including the embedded device 100. For example, the server 1000 may transmit optimal code information obtained by performing optimization of an execution engine to the embedded device 100 through the transceiver 1100. In addition, the server 1000 may receive computing environment information of the embedded device 100 from the embedded device 100 or any external device through the transceiver 1100.

The server 1000 may transmit and receive various types of data by accessing a network through the transceiver 1100. Transceivers can be broadly divided into wired and wireless types. Since the wired type and the wireless type each have advantages and disadvantages, the server 1000 may be provided with both in some cases. In the case of the wireless type, a wireless local area network (WLAN)-based communication method such as Wi-Fi may mainly be used. Alternatively, a cellular communication method, e.g., an LTE- or 5G-based method, may be used. However, the wireless communication protocol is not limited to the above examples, and any suitable wireless communication method may be used. In the case of the wired type, LAN (Local Area Network) or USB (Universal Serial Bus) communication is a representative example, and other methods are also possible.

The memory 1200 of the server 1000 may store various types of information. Various types of data may be temporarily or semi-permanently stored in the memory 1200. Examples of the memory include a hard disk drive (HDD), a solid state drive (SSD), flash memory, read-only memory (ROM), and random access memory (RAM). The memory 1200 may be provided in a form embedded in the server 1000 or in a detachable form. The memory 1200 may store various data necessary for the operation of the server 1000, including an operating system (OS) for driving the server 1000 and programs for operating each component of the server 1000.

The processor 1300 may control overall operations of the server 1000. For example, the processor 1300 may control operations such as acquiring binary data of a trained neural network model, extracting execution data from the binary data, obtaining computing environment information of an embedded device, optimizing an execution engine based on the execution data and the computing environment information, obtaining optimal code information based on the optimization result, and transmitting the optimal code information. In detail, the processor 1300 may load and execute a program for overall operation of the server 1000 from the memory 1200. The processor 1300 may be implemented as an application processor (AP), a central processing unit (CPU), a microcontroller unit (MCU), or a similar device, in hardware, software, or a combination thereof. In terms of hardware, it may be provided in the form of an electronic circuit that processes electrical signals to perform a control function; in terms of software, it may be provided in the form of a program or code that drives a hardware circuit.

The embedded device 100 may mean a device including a programmable arbitrary embedded system made for a specific purpose (or specific function).

The embedded device 100 may include hardware including a processor and/or memory. Also, the embedded device 100 may include firmware for controlling hardware. In addition, the embedded device 100 may be configured to execute an artificial intelligence model by inputting arbitrary software including an artificial intelligence execution engine into firmware.

Here, the artificial intelligence execution engine (inference engine) is software for executing pre-trained artificial intelligence models in the embedded device 100 as efficiently as possible. It is a technology aimed at the practical use of artificial intelligence and serves to increase efficiency in the environment of the device on which it is mounted. For example, in the case of a mobile device, an execution engine may be implemented to suit the slow operation speed and low power consumption that characterize the computing environment of the mobile device. As another example, in the case of a PC server having relatively high computing performance, an execution engine may be implemented to maximize high-performance parallel processing capability.

The embedded device 100 according to an embodiment of the present application may obtain code information optimized for the computing environment of the embedded device 100 from the server 1000 and add (or input) the optimized code information to the firmware. As will be described later, the optimized code information can be generated by analyzing the internal structure of the trained neural network model. In addition, the optimized code information may be generated in consideration of a computing environment including memory specifications and/or processor specifications of the embedded device 100.

The embedded device 100 according to an embodiment of the present application may add optimal code information generated from the server 1000 to firmware and execute a neural network model.

Hereinafter, the operation of the execution engine optimization system 10 according to an embodiment of the present application will be described in detail with reference to FIGS. 2 to 9.

The server 1000 of the execution engine optimization system 10 according to an embodiment of the present application may optimize an execution engine to be used in the embedded device 100. Specifically, the server 1000 may obtain optimal code information by optimizing the execution engine of the neural network model based on data of the trained neural network model and computing environment information of the embedded device 100 in which the neural network model will actually be executed.

FIG. 2 is a diagram illustrating operations of the execution engine optimization system 10 according to an embodiment of the present application.

The server 1000 according to an embodiment of the present application may obtain computing environment information of the embedded device 100 from the embedded device 100 . For example, the computing environment information may include at least one of memory information, processor information, and/or performance information of the embedded device 100 . However, this is only an example, and the computing environment information may include any appropriate information related to the computing environment (or computing specifications) of the embedded device 100.

The server 1000 according to an embodiment of the present application may obtain data of a neural network model on which learning has been completed. Here, the data of the trained neural network model may be arbitrary data related to information of the neural network model. Also, the data of the neural network model that has been trained may be binary data.

The neural network model may be a model trained in the server 1000 according to an embodiment of the present application. Alternatively, the neural network model may be a model trained in a device external to the server 1000. For example, in order to train a more sophisticated neural network model, the neural network model may be trained in an external server having higher performance than the server 1000. In this case, the server 1000 may obtain binary data of the trained neural network model from the external server (or external device) through the transceiver 1100.

The server 1000 according to an embodiment of the present application may extract execution data of a neural network model from binary data. In detail, the server 1000 may extract execution data related to at least one of execution sequence data of the neural network model and structure data of the neural network model from the binary data.

The server 1000 according to an embodiment of the present application may perform optimization of an execution engine to be used in the embedded device 100. In detail, the server 1000 may optimize the execution engine based on the computing environment information of the embedded device 100 and the execution data. For example, the server 1000 may predict the operation of the neural network model in the embedded device 100 using the execution data of the neural network model and the computing environment information of the embedded device 100, and may optimize the execution engine based on the prediction result.

The server 1000 according to an embodiment of the present application may obtain optimal code information to be used for the execution engine based on the optimization result of the execution engine. In detail, the server 1000 may generate code for merging neural network model operations or code related to memory management. Obtaining the optimal code information will be described in more detail with reference to FIGS. 3 to 9.

The server 1000 according to an embodiment of the present application may transmit optimal code information to the embedded device 100 through the transceiver 1100.

The embedded device 100 according to an embodiment of the present application may acquire optimal code information through any appropriate transceiver. Also, the embedded device 100 may execute optimal code information. In detail, the embedded device 100 may execute a neural network model optimized for the computing environment of the embedded device 100 by adding optimal code information to firmware.

Refer to FIG. 3. FIG. 3 is a flowchart illustrating a method of optimizing an execution engine according to an embodiment of the present application.

An execution engine optimization method according to an embodiment of the present application may include obtaining binary data of a trained neural network model (S1000), extracting execution data (S2000), obtaining computing environment information of the embedded device 100 (S3000), optimizing an execution engine (S4000), and obtaining optimal code information (S5000).

In the step of obtaining binary data of the trained neural network model (S1000), the server 1000 may obtain binary data of the trained neural network model. Here, binary data may encompass any information file of the trained neural network model. Alternatively, binary data may be data obtained by converting any information file of the trained neural network model into binary form.

For example, the neural network model may be learned in the server 1000 according to an embodiment of the present application.

As another example, the neural network model may be trained in a server external to the server 1000 according to an embodiment of the present application. For example, the neural network model may be trained in an external server having a computing environment superior in performance to that of the server 1000. In this case, since the neural network model can be trained using more training data, a more sophisticated neural network model can be obtained. The server 1000 may then be implemented to acquire binary data of the neural network model from the external server through an arbitrary transceiver.

In the step of extracting execution data (S2000), the server 1000 may extract execution data of the neural network model from binary data of the neural network model. Specifically, the binary data of the neural network model may be in the form of binarized information required to execute the neural network model, including information related to an execution sequence of the neural network model or information related to the internal structure of the neural network model. Accordingly, the server 1000 according to an embodiment of the present application may extract execution data necessary for the execution of the neural network model from binary data of the neural network model.
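The extraction step can be pictured with a small Python sketch. The application does not specify the binary layout, so the container format below (magic bytes, operator count, fixed-size operator records) and every name in the code are hypothetical and serve only to show how an execution sequence and basic structural data might be read out of binary data.

```python
import struct

# Hypothetical container layout (an assumption, not the application's format):
#   4-byte magic b"NNM0", uint16 operator count, then one record per operator:
#   uint8 opcode, uint16 input tensor id, uint16 output tensor id, uint32 output bytes
OPCODES = {0: "conv2d", 1: "batch_norm", 2: "relu", 3: "dense"}
HEADER = struct.Struct("<4sH")
OP_RECORD = struct.Struct("<BHHI")


def extract_execution_data(blob):
    """Return operator records in execution order, parsed from the binary blob."""
    magic, op_count = HEADER.unpack_from(blob, 0)
    if magic != b"NNM0":
        raise ValueError("unrecognized model container")
    ops = []
    offset = HEADER.size
    for _ in range(op_count):
        opcode, src, dst, out_bytes = OP_RECORD.unpack_from(blob, offset)
        ops.append({"op": OPCODES[opcode], "input": src,
                    "output": dst, "out_bytes": out_bytes})
        offset += OP_RECORD.size
    return ops


# Build a two-operator example blob and parse it back.
blob = (HEADER.pack(b"NNM0", 2)
        + OP_RECORD.pack(0, 0, 1, 4096)
        + OP_RECORD.pack(2, 1, 2, 4096))
execution_data = extract_execution_data(blob)
```

Here the list order stands in for the execution sequence, and the tensor ids stand in for the structural data that links one operation to the next.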

In the step of obtaining computing environment information of the embedded device 100 (S3000), the server 1000 may obtain computing environment information of the embedded device 100 through the transceiver 1100. As described above, the computing environment information may include any information related to the computing environment (or computing specifications) of the embedded device 100, including memory information or processor information of the embedded device 100.

In the step of optimizing the execution engine (S4000), the server 1000 may optimize the execution engine based on execution data of the neural network model and computing environment information of the embedded device 100. In detail, the server 1000 may predict the operation of the neural network model in the embedded device 100 by using the execution data of the neural network model and the computing environment information of the embedded device 100, and based on this prediction may optimize the execution engine, which is the software responsible for executing the neural network model.

For example, the server 1000 may obtain structure data of the neural network model from execution data of the neural network model, and may detect target structure information of the neural network model from the structure data. Here, the target structure information may be any information related to a structure that is commonly used in the calculation structure of a neural network model. In this case, it may be more efficient to merge the operation structures included in the target structure information and perform them as a single operation. Accordingly, the server 1000 according to an embodiment of the present application may perform optimization of an execution engine by generating code for merging operation structures included in target structure information. This will be described in detail with reference to FIGS. 4 to 6.
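A minimal Python sketch of this kind of merging follows. It assumes the model is a simple linear chain of named operators and that the structures of interest are a short preset list of operator chains. Both are simplifying assumptions, since real models are graphs and the application does not enumerate which structures are merged. The pattern list and the `fused_` naming are illustrative only.

```python
# Preset structures of interest: assumed examples of commonly used operator chains.
# Longer chains are listed first so they win over their own prefixes.
PATTERNS_OF_INTEREST = [
    ("conv2d", "batch_norm", "relu"),
    ("conv2d", "relu"),
    ("dense", "relu"),
]


def detect_and_merge(op_sequence):
    """Greedily replace each matched chain with a single fused operator name."""
    merged = []
    i = 0
    while i < len(op_sequence):
        for pattern in PATTERNS_OF_INTEREST:
            if tuple(op_sequence[i:i + len(pattern)]) == pattern:
                merged.append("fused_" + "_".join(pattern))
                i += len(pattern)
                break
        else:
            # No structure of interest starts here; keep the operator as is.
            merged.append(op_sequence[i])
            i += 1
    return merged


ops = ["conv2d", "batch_norm", "relu", "max_pool", "dense", "relu", "softmax"]
merged_ops = detect_and_merge(ops)
```

In this example seven operators collapse to four, so intermediate results of the merged chains no longer need to be written to and read back from memory between steps.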

As another example, the server 1000 may be implemented to generate code related to memory management using execution data of the neural network model and computing environment information of the embedded device 100. Specifically, the server 1000 may optimize the execution engine by calculating, from the execution data of the neural network model and the computing environment information of the embedded device 100 (e.g., memory information and processor information), the memory usage expected when the neural network model is operated in the computing environment of the embedded device 100, and by generating code related to memory management based on the calculated memory usage. For example, the server 1000 may generate code for determining a memory allocation amount based on the memory usage. As another example, the server 1000 may generate code for adjusting memory usage in order to utilize the allocated memory as fully as possible. This will be described in detail with reference to FIGS. 7 to 9.
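One way to picture the memory-usage prediction is a liveness pass over the execution sequence: a tensor occupies memory from the step that produces it until its last consumer has run, and the expected usage is the largest total live at any step. The Python sketch below is an assumption-laden illustration, not the application's algorithm. The tuple layout, the 8-byte alignment, the reserved-RAM figure, and the example tensor sizes are all hypothetical.

```python
def peak_activation_bytes(ops):
    """ops: (name, input tensor ids, output tensor id, output bytes) in execution order."""
    last_use = {}
    for step, (_, inputs, output, _) in enumerate(ops):
        for tensor in inputs:
            last_use[tensor] = step
        last_use.setdefault(output, step)
    live, peak = {}, 0
    for step, (_, _, output, out_bytes) in enumerate(ops):
        live[output] = out_bytes
        peak = max(peak, sum(live.values()))
        # Release every tensor whose last consumer has now run.
        for tensor in [t for t in live if last_use[t] <= step]:
            del live[tensor]
    return peak


def plan_allocation(peak_bytes, ram_bytes, reserved_bytes, alignment=8):
    """Round the predicted peak up to the alignment and check it against free RAM."""
    arena = -(-peak_bytes // alignment) * alignment
    return {"arena_bytes": arena, "fits": arena <= ram_bytes - reserved_bytes}


# Hypothetical int8 model on a device with 64 KiB RAM, 16 KiB kept for stack and firmware.
model_ops = [
    ("input", [], 0, 3072),
    ("conv2d", [0], 1, 16384),
    ("relu", [1], 2, 16384),
    ("max_pool", [2], 3, 4096),
    ("dense", [3], 4, 10),
]
peak = peak_activation_bytes(model_ops)
plan = plan_allocation(peak, ram_bytes=64 * 1024, reserved_bytes=16 * 1024)
```

Because this runs on the server before deployment, the resulting allocation amount can be fixed in the generated code instead of being computed on the device at run time.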

As another example, the server 1000 may generate code for rearranging memory blocks by using execution data of the neural network model and computing environment information of the embedded device 100. Specifically, the server 1000 may obtain location information of a memory block of the embedded device 100 from computing environment information (e.g., memory information) of the embedded device 100. In addition, the server 1000 may predict or evaluate memory efficiency based on the above-described memory usage and memory allocation amount. Also, the server 1000 may generate code for rearranging memory blocks based on the location information of the memory blocks and the memory efficiency. This will be described in detail with reference to FIG. 8.
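The application does not define how blocks are rearranged or how memory efficiency is measured. As one possible reading, the Python sketch below treats rearrangement as assigning each tensor an offset in a shared arena so that tensors with non-overlapping lifetimes reuse the same addresses, and treats efficiency as the ratio of peak live bytes to arena size. The greedy first-fit strategy, the efficiency metric, and the example tensors are assumptions made for illustration.

```python
def assign_offsets(tensors):
    """tensors: (name, size, first_step, last_step). Greedy first-fit, largest first."""
    placed = []   # (offset, size, first_step, last_step)
    offsets = {}
    for name, size, first, last in sorted(tensors, key=lambda t: -t[1]):
        # Only blocks whose lifetimes overlap this tensor constrain its address.
        busy = sorted((o, s) for o, s, f, l in placed
                      if not (l < first or last < f))
        offset = 0
        for block_offset, block_size in busy:
            if offset + size <= block_offset:
                break  # the gap before this block is large enough
            offset = max(offset, block_offset + block_size)
        placed.append((offset, size, first, last))
        offsets[name] = offset
    arena = max(o + s for o, s, _, _ in placed)
    return offsets, arena


# Hypothetical tensors with lifetimes expressed as execution-step ranges.
tensors = [
    ("t0", 3072, 0, 1),
    ("t1", 16384, 1, 2),
    ("t2", 16384, 2, 3),
    ("t3", 4096, 3, 4),
    ("t4", 10, 4, 4),
]
offsets, arena_bytes = assign_offsets(tensors)
naive_bytes = sum(size for _, size, _, _ in tensors)  # one private block per tensor
peak_live = 32768                                     # t1 and t2 are live together
efficiency_before = peak_live / naive_bytes
efficiency_after = peak_live / arena_bytes
```

In this example the rearranged layout needs 32768 bytes instead of 39946, which raises the assumed efficiency ratio from about 0.82 to 1.0.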

In the step of obtaining optimal code information (S5000), the server 1000 may acquire optimal code information to be used for the execution engine based on the optimization result. For example, as described above, the optimal code information may include code information for merging the computational structure of the neural network model, code information related to memory management, and/or code information for rearranging memory blocks.

Meanwhile, although not shown in FIG. 3, the execution engine optimization method according to an embodiment of the present application may further include transmitting the optimal code information. Specifically, in the step of transmitting the optimal code information, the server 1000 may transmit the optimal code information to the embedded device 100 through the transceiver 1100.
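As a rough illustration only, the optimal code information could be modeled as a container holding the three kinds of code named above and serialized before it is handed to the transceiver. The class name `OptimalCodeInfo`, its field names, the `serialize` helper, and the use of JSON are all assumptions made for this sketch; the application does not specify a data format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class OptimalCodeInfo:
    """Hypothetical container for the three kinds of code named in the text."""
    operation_merging_code: Optional[str] = None
    memory_management_code: Optional[str] = None
    block_rearrangement_code: Optional[str] = None


def serialize(info: OptimalCodeInfo) -> bytes:
    """Encode only the entries that were actually generated (assumed JSON format)."""
    payload = {key: value for key, value in asdict(info).items() if value is not None}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


# Only operation-merging code was generated in this example.
info = OptimalCodeInfo(operation_merging_code="fused_conv_relu")
message = serialize(info)
```

Leaving unset entries out of the payload reflects the "and/or" wording above: an optimization run may produce any subset of the three kinds of code.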

Hereinafter, a method of optimizing an execution engine according to embodiments of the present application will be described in detail with reference to FIGS. 4 to 9. FIGS. 4 to 6 describe in detail an optimization operation of merging calculation structures of the neural network model. FIGS. 7 to 9 describe in detail optimization operations for memory management.

FIG. 4 is a flowchart detailing the step of performing optimization of an execution engine according to an embodiment of the present application.

The step of optimizing the execution engine (S4000) according to an embodiment of the present application may include acquiring structure data of the neural network model (S4110), acquiring target structure information of the neural network model from the structure data (S4120), and generating a first optimal code for merging operations related to the data set of interest included in the target structure information (S4130).

In the step of acquiring structure data of the neural network model (S4110), the server 1000 may obtain structure data representing the internal structure of the neural network model from the execution data of the neural network model.

In the step of obtaining target structure information of the neural network model from the structure data (S4120), the server 1000 may obtain target structure information from the structure data of the neural network model. Specifically, a commonly used structure may exist for each type of neural network model. For example, in a specific network space, a structure that performs a convolution operation, a depthwise convolution operation, and an activation operation may be commonly used.

According to an embodiment, the server 1000 may obtain structure-of-interest information related to a commonly used structure as described above, and may detect a data set of interest corresponding to the structure-of-interest information from the structure data. Also, the server 1000 may obtain target structure information of the neural network model based on the detected data set of interest.

FIG. 5 is a flowchart detailing the step of obtaining target structure information of a neural network model according to an embodiment of the present application. FIG. 6 is a diagram illustrating one aspect of a method for generating a first optimal code according to an embodiment of the present application.

The step of acquiring target structure information of a neural network model (S4120) according to an embodiment of the present application may include acquiring structure-of-interest information (S4122), detecting a data set of interest corresponding to the structure-of-interest information from the structure data (S4124), and acquiring target structure information of the neural network model based on the data set of interest (S4126).

In the step of obtaining structure-of-interest information (S4122), the server 1000 may obtain structure-of-interest information related to the neural network model.

As described above, a neural network model may be composed of calculation structures that are commonly used for each type of neural network model. Specifically, a structure in which a first operation O1 is performed and a second operation O2 is performed based on the output value of the first operation O1 may be commonly used in a neural network model having a specific network space. For example, a first model may commonly use a structure in which a convolution operation is performed and then a depthwise convolution operation and an activation operation are sequentially performed. As another example, a second model may include a structure that obtains an intermediate result value by performing a depthwise convolution operation, which compresses data for each channel by applying a filter to each channel related to the color of an image, and then performs a pointwise operation based on the intermediate result value.

In this case, the server 1000 may obtain structure-of-interest information related to a structure for performing the first operation O1 and the second operation O2. For example, the structure-of-interest information may be input in advance by a user, and the server 1000 may obtain the structure-of-interest information through the user's input. However, this is only an example, and the server 1000 may be implemented to acquire structure-of-interest information related to the neural network model by any suitable method and to obtain target structure information based on the structure-of-interest information.

In the step of detecting a data set of interest corresponding to the structure-of-interest information from the structure data (S4124), the server 1000 may detect, based on the structure-of-interest information, a data set of interest having an operation structure corresponding to the structure-of-interest information from among the data sets included in the structure data. In detail, if the structure-of-interest information includes information on a structure for performing the first operation O1 and the second operation O2, the server 1000 may detect, among the data sets included in the structure data, a first target operation TO1 and a second target operation TO2 related to the operation structure corresponding to the structure-of-interest information.

In the step of acquiring target structure information of the neural network model based on the data set of interest (S4126), the server 1000 may obtain target structure information of the neural network model based on the data set of interest corresponding to the structure-of-interest information. For example, the target structure information of the neural network model may be obtained based on a data set of interest related to a structure in which the first target operation TO1 is performed and the second target operation TO2 is then performed.
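Steps S4122 to S4126 can be pictured as a pattern search over the model's operations. The following minimal sketch assumes, purely for illustration, that the structure data is a flat, ordered list of operation names and that the structure-of-interest information is a short sequence of such names. Real structure data is typically a graph, and the function name `detect_sets_of_interest` and the operation labels are hypothetical.

```python
from typing import List, Sequence, Tuple


def detect_sets_of_interest(
    ops: Sequence[str], pattern: Sequence[str]
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where `pattern` occurs contiguously in `ops`."""
    matches: List[Tuple[int, int]] = []
    n, m = len(ops), len(pattern)
    i = 0
    while m > 0 and i + m <= n:
        if list(ops[i:i + m]) == list(pattern):
            matches.append((i, i + m))
            i += m  # jump past the match so detected ranges never overlap
        else:
            i += 1
    return matches


# Assumed structure data: operation names in execution order.
structure_data = [
    "conv", "depthwise_conv", "relu", "pool",
    "conv", "depthwise_conv", "relu", "fc",
]
# Assumed structure-of-interest information, e.g. supplied by a user.
structure_of_interest = ["conv", "depthwise_conv", "relu"]

# Each range identifies one data set of interest in the structure data.
target_structure_info = detect_sets_of_interest(structure_data, structure_of_interest)
```

In this sketch the list of index ranges plays the role of the target structure information: it records which operations in the model are candidates for merging.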

Referring back to FIG. 4, the execution engine optimization method according to an embodiment of the present application may include generating a first optimal code for merging operations related to the data set of interest included in the target structure information (S4130). In detail, the server 1000 may generate a first optimal code configured to perform an operation by merging the first target operation TO1 and the second target operation TO2 related to the target structure information. In this case, the first optimal code may be generated using an operation fusion technique. Through this optimization process of merging operation structures, the memory usage (or memory allocation) required for each operation can be reduced, and operation speed can be increased.
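The memory saving from operation fusion can be seen in a small sketch. Here the first target operation is stood in for by a per-element scale-and-bias (a simplification assumed for brevity, not a real convolution) and the second by a ReLU activation. The function names are hypothetical, and the sketch shows only the general fusion idea rather than the code the server 1000 would actually generate.

```python
from typing import List


def scale_bias(xs: List[float], w: float, b: float) -> List[float]:
    """Stand-in for the first target operation TO1 (assumed elementwise)."""
    return [w * x + b for x in xs]


def relu(xs: List[float]) -> List[float]:
    """Stand-in for the second target operation TO2 (activation)."""
    return [x if x > 0.0 else 0.0 for x in xs]


def run_unfused(xs: List[float], w: float, b: float) -> List[float]:
    # A full-size intermediate buffer is materialized between the two operations.
    intermediate = scale_bias(xs, w, b)
    return relu(intermediate)


def run_fused(xs: List[float], w: float, b: float) -> List[float]:
    # Both operations are applied per element in one pass, so no
    # intermediate buffer of len(xs) elements is ever allocated.
    out: List[float] = []
    for x in xs:
        y = w * x + b
        out.append(y if y > 0.0 else 0.0)
    return out


inputs = [-1.0, 0.5, 2.0]
fused_result = run_fused(inputs, 2.0, -0.5)
unfused_result = run_unfused(inputs, 2.0, -0.5)
```

Both paths compute the same values; the fused path makes one pass over the data and skips the intermediate buffer, which is where the reduction in memory usage described above comes from.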

Hereinafter, optimization of an execution engine according to another embodiment of the present application will be described in detail with reference to FIGS. 7 and 8.

FIG. 7 is a flowchart detailing the step (S4000) of performing optimization of an execution engine according to another embodiment of the present application. Specifically, FIG. 7 is a flowchart illustrating a method of optimizing a memory allocation amount according to another embodiment of the present application.

The step of optimizing the execution engine (S4000) according to another embodiment of the present application may include calculating the memory usage expected when the neural network model is operated in the computing environment of the embedded device (S4210), and generating a second optimal code for determining a memory allocation amount based on the memory usage (S4220).

In the step of calculating the memory usage expected when the neural network model is operated in the computing environment of the embedded device (S4210), the server 1000 may calculate, based on the execution data of the neural network model and the computing environment information of the embedded device 100, the memory usage expected when the neural network model is operated in the computing environment of the embedded device 100.

In the step of generating the second optimal code for determining the memory allocation amount based on the memory usage (S4220), the server 1000 may optimize the execution engine by generating a second optimal code for determining or adjusting the memory allocation amount using the memory usage.
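One way to picture steps S4210 and S4220 is sketched below. The sketch assumes a strictly sequential model in which only the input and output buffers of the currently running operation are live, takes the peak over all operations as the expected memory usage, and rounds that peak up to an alignment boundary to decide the allocation. The names `OpInfo`, `expected_peak_usage`, and `decide_allocation`, the 64-byte alignment, and the byte sizes are all assumptions; the application does not disclose a specific formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OpInfo:
    """Hypothetical per-operation record derived from the execution data."""
    name: str
    input_bytes: int
    output_bytes: int


def expected_peak_usage(ops: List[OpInfo]) -> int:
    """Peak of (input + output) bytes, assuming one operation runs at a time."""
    return max((op.input_bytes + op.output_bytes for op in ops), default=0)


def decide_allocation(peak_bytes: int, device_ram_bytes: int, align: int = 64) -> int:
    """Round the peak up to `align` bytes and check it against the device RAM."""
    aligned = -(-peak_bytes // align) * align  # ceiling division, then scale back
    if aligned > device_ram_bytes:
        raise MemoryError("model does not fit in the embedded device's memory")
    return aligned


ops = [
    OpInfo("conv", 3072, 8100),
    OpInfo("relu", 8100, 8100),
    OpInfo("fc", 8100, 40),
]
peak = expected_peak_usage(ops)               # memory usage (S4210)
allocation = decide_allocation(peak, 65536)   # memory allocation amount (S4220)
```

Because the server computes these values ahead of time, the embedded device can reserve a single buffer of the decided size instead of estimating it at run time.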

FIG. 8 is a flowchart detailing the step of generating a second optimal code according to an embodiment of the present application. Specifically, FIG. 8 is a flowchart illustrating a method of optimizing a memory block according to an embodiment of the present application.

The step of generating the second optimal code (S4220) according to another embodiment of the present application may include acquiring location information of memory blocks (S4230), evaluating memory efficiency (S4240), and generating code for rearranging the memory blocks (S4250).

In the step of acquiring location information of the memory block (S4230), the server 1000 may obtain location information of the memory block from memory information of the computing environment information of the embedded device 100.

In the step of evaluating the memory efficiency (S4240), the server 1000 may calculate the memory efficiency using the memory usage expected when the neural network model is operated in the computing environment of the embedded device 100 and the memory allocation amount. At this time, if a memory block is allocated to a specific location, memory efficiency may be degraded. In this case, the server 1000 according to an embodiment of the present application may rearrange the memory blocks based on the location information of the memory blocks and the memory efficiency, as will be described later.

In the step of generating the code for rearranging the memory block (S4250), the server 1000 may generate the code for rearranging the memory block based on the location information of the memory block and the memory efficiency. In detail, when the memory efficiency is lower than a preset threshold efficiency value, the server 1000 may generate code for rearranging memory blocks by using the location information of the memory blocks. By optimizing the location of the memory blocks in this way, the execution engine optimization system 10 according to an embodiment of the present application may allocate memory that is optimal for the computing environment (or computing specification) of the embedded device 100.
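Steps S4230 to S4250 can be sketched as follows. The sketch makes several assumptions that are not stated in the application: the location information is an (offset, size) pair per block, memory efficiency is the ratio of bytes actually used to the extent of the region the blocks span, blocks do not overlap, and rearranging means packing the blocks contiguously from offset zero. The names `Block`, `memory_efficiency`, `rearrange`, and `plan_blocks`, and the 0.8 threshold, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    """Hypothetical location information for one memory block."""
    name: str
    offset: int
    size: int


def memory_efficiency(blocks: List[Block]) -> float:
    """Used bytes divided by the extent of the region the blocks span."""
    used = sum(b.size for b in blocks)
    span = max((b.offset + b.size for b in blocks), default=0)
    return used / span if span else 1.0


def rearrange(blocks: List[Block]) -> List[Block]:
    """Pack blocks back to back, preserving their original address order."""
    packed: List[Block] = []
    cursor = 0
    for b in sorted(blocks, key=lambda blk: blk.offset):
        packed.append(Block(b.name, cursor, b.size))
        cursor += b.size
    return packed


def plan_blocks(blocks: List[Block], threshold: float = 0.8) -> List[Block]:
    """Rearrange only when efficiency falls below the preset threshold."""
    if memory_efficiency(blocks) < threshold:
        return rearrange(blocks)
    return list(blocks)


layout = [Block("a", 0, 100), Block("b", 300, 100), Block("c", 500, 200)]
before = memory_efficiency(layout)   # 400 used bytes over a 700-byte span
planned = plan_blocks(layout)
after = memory_efficiency(planned)
```

In this example the gaps between blocks push efficiency below the threshold, so the blocks are packed and the spanned region shrinks from 700 to 400 bytes.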

Hereinafter, with reference to FIG. 9, optimization of an execution engine according to another embodiment of the present application will be described in detail.

FIG. 9 is a flowchart detailing the step of performing optimization of an execution engine according to another embodiment of the present application.

The step of optimizing the execution engine (S4000) according to an embodiment of the present application may include comparing memory usage and memory allocation (S4310), and generating code for adjusting memory usage based on the comparison result (S4320).

In the step of comparing the memory usage and memory allocation (S4310), the server 1000 may compare the memory usage expected when the neural network model is operated in the embedded device 100 with the memory allocation determined based on the memory usage.

In step S4320 of generating code for adjusting memory usage based on the comparison result, the server 1000 may generate code for adjusting memory usage by comparing memory usage and memory allocation.

For example, when the calculated memory usage is less than the memory allocation, the server 1000 may generate code for adjusting the memory usage to increase. In detail, the server 1000 may be implemented to improve the performance of the neural network model in the embedded device 100 by using the Im2Col extension technique, which increases memory usage and thereby raises the cache hit ratio. More specifically, when the Im2Col extension technique is used, memory usage increases, but execution speed can be expected to improve because the cache hit rate increases. When the memory usage expected when the neural network model is operated in the embedded device 100 is smaller than the memory allocation amount, the server 1000 according to an embodiment of the present application may use the Im2Col extension technique to optimize the execution engine so that the allocated memory is used as fully as possible and the execution speed of the neural network model in the embedded device 100 is improved.

As another example, when the calculated memory usage is greater than the allocated memory, the server 1000 may generate code for adjusting the memory usage to be lowered.
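The trade-off between the two cases can be made concrete with a small sketch. It assumes a single-channel 2D input, stride 1, no padding, and the cross-correlation convention common in neural network libraries. The helper names and the `choose_conv` decision rule are assumptions made for this illustration: the rule selects the Im2Col path only when the extra buffer fits inside the allocation and otherwise falls back to a direct convolution that needs no extra buffer. Pure Python is used for self-containment, so the sketch shows the memory behavior but not the cache effect itself.

```python
from typing import Callable, List

Matrix = List[List[float]]


def im2col(image: Matrix, k: int) -> Matrix:
    """Copy every k-by-k patch of `image` into its own flattened row."""
    h, w = len(image), len(image[0])
    return [
        [image[r + dr][c + dc] for dr in range(k) for dc in range(k)]
        for r in range(h - k + 1)
        for c in range(w - k + 1)
    ]


def conv_im2col(image: Matrix, kernel: Matrix) -> Matrix:
    """Convolution as patch-matrix times flattened kernel (more memory, contiguous reads)."""
    k = len(kernel)
    flat_kernel = [v for row in kernel for v in row]
    out_w = len(image[0]) - k + 1
    products = [sum(a * b for a, b in zip(patch, flat_kernel)) for patch in im2col(image, k)]
    return [products[i:i + out_w] for i in range(0, len(products), out_w)]


def conv_direct(image: Matrix, kernel: Matrix) -> Matrix:
    """Convolution computed in place from the input, with no patch buffer."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[r + dr][c + dc] * kernel[dr][dc] for dr in range(k) for dc in range(k))
            for c in range(w - k + 1)
        ]
        for r in range(h - k + 1)
    ]


def choose_conv(
    expected_usage: int, allocation: int, im2col_extra_bytes: int
) -> Callable[[Matrix, Matrix], Matrix]:
    """Assumed rule: spend spare allocation on Im2Col, otherwise keep usage low."""
    if expected_usage + im2col_extra_bytes <= allocation:
        return conv_im2col
    return conv_direct


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]
patches = im2col(image, 2)  # 4 patches of 4 values each, versus 9 input values
result = choose_conv(1000, 4096, 64)(image, kernel)
```

The patch matrix duplicates overlapping input values, which is the source of the increased memory usage; when the allocation has no room for it, the direct path yields the same result with lower usage.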

By using an execution engine optimized in the server 1000, whose computing specifications are superior to those of the embedded device 100, the embedded device 100 of the execution engine optimization system 10 according to an embodiment of the present application can directly execute the neural network model without analyzing the structure of the model or determining memory allocation itself.

In particular, according to the execution engine optimization system 10 according to an embodiment of the present application, an algorithm for analyzing the structure of a model and merging operations for a specific structure may be applied to the execution engine. In addition, a complex and improved memory allocation algorithm or memory block rearrangement algorithm may be applied to the execution engine in consideration of the computing specifications of the embedded device 100. Accordingly, the execution capability of the neural network model in the embedded device 100 may be improved.

In addition, according to the execution engine optimization system 10 according to an embodiment of the present application, an operation of analyzing the structure of the above-described model or determining memory allocation is not performed in the embedded device 100 having limitations in the computing environment. Instead, it is performed in the server 1000, which has a relatively excellent computing environment. Therefore, when the model is executed in the embedded device 100, the memory of the embedded device 100 can be efficiently utilized and power consumption of the embedded device can be reduced.

Various operations of the server 1000 described above may be stored in the memory 1200 of the server 1000, and the processor 1300 of the server 1000 may be provided to perform the operations stored in the memory 1200.

The execution engine optimization method, execution engine optimization device, and execution engine optimization system disclosed in this application can be used for efficient execution of artificial intelligence models in various embedded systems, including home appliances, vehicle sensors, products for the safety of infants or the elderly, and smart watches.

The features, structures, effects, etc. described in the embodiments above are included in at least one embodiment of the present invention, and are not necessarily limited to only one embodiment. Furthermore, the features, structures, effects, etc. illustrated in each embodiment can be combined or modified with respect to other embodiments by those skilled in the art in the field to which the embodiments belong. Therefore, contents related to these combinations and variations should be construed as being included in the scope of the present invention.

In addition, although the embodiments have been described above, they are only examples and do not limit the present invention, and those skilled in the art to which the present invention pertains will appreciate that various modifications and applications not exemplified above are possible without departing from the essential characteristics of the embodiments. That is, each component specifically shown in the embodiments can be modified and implemented. Differences related to these modifications and applications should be construed as being included in the scope of the present invention as defined in the appended claims.

Claims

1. A method for optimizing, by a server exhibiting a second performance superior to a first performance, an execution engine to be used in an embedded device exhibiting the first performance, in consideration of a computing environment of the embedded device, the method comprising:

acquiring binary data of a neural network model on which training has been completed;

extracting execution data of the neural network model from the binary data, wherein the execution data is related to at least one of execution order data of the neural network model and structure data of the neural network model;

obtaining computing environment information of the embedded device, wherein the computing environment information includes at least one of memory information and processor information of the embedded device;

predicting an operation of the neural network model in the embedded device based on the execution data and the computing environment information, and performing optimization of the execution engine;

obtaining optimal code information to be used in the execution engine based on a result of the optimization; and

transmitting the optimal code information.

2. The method according to claim 1, wherein the performing of the optimization of the execution engine comprises:

acquiring the structure data of the neural network model from the execution data;

obtaining target structure information of the neural network model from the structure data; and

generating a first optimal code for merging operations related to a data set of interest included in the target structure information.

3. The method according to claim 2, wherein the obtaining of the target structure information comprises:

obtaining previously set structure-of-interest information of the neural network model;

detecting the data set of interest corresponding to the structure-of-interest information from the structure data; and

acquiring the target structure information of the neural network model based on the data set of interest.

4. The method according to claim 1, wherein the performing of the optimization of the execution engine comprises:

calculating, based on the execution data and the computing environment information, memory usage expected when the neural network model is operated in the computing environment of the embedded device; and

generating a second optimal code for determining a memory allocation amount based on the memory usage.

5. The method according to claim 4, wherein the generating of the second optimal code comprises:

obtaining location information of a memory block from the memory information of the computing environment information;

evaluating memory efficiency based on the memory usage and the memory allocation amount; and

generating code for rearranging the memory block based on the location information of the memory block and the memory efficiency.

6. The method according to claim 4, wherein the performing of the optimization of the execution engine comprises:

comparing the memory usage and the memory allocation amount; and

generating code for adjusting the memory usage based on a result of comparing the memory allocation amount and the memory usage.

7. The method according to claim 6, wherein the code for adjusting the memory usage is related to Im2Col conversion code.

8. A computer-readable recording medium on which a program for executing, on a computer, the method according to any one of claims 1 to 7 is recorded.

9. A system for optimizing an execution engine to be used in an embedded device in consideration of a computing environment of the embedded device, the system comprising:

a server including a processor configured to generate optimal code for optimizing the execution engine to be used in the embedded device based on data of a trained neural network model, and a transceiver configured to communicate with the embedded device; and

the embedded device, configured to obtain the optimal code and execute the optimal code,

wherein the processor is configured to:

acquire binary data of a neural network model on which training has been completed;

extract execution data of the neural network model from the binary data, wherein the execution data is related to at least one of execution order information of the neural network model and structure data of the neural network model;

obtain computing environment information of the embedded device, wherein the computing environment information includes at least one of memory information and processor information of the embedded device;

predict an operation of the neural network model in the embedded device based on the execution data and the computing environment information, and perform optimization of the execution engine;

obtain optimal code information to be used in the execution engine based on a result of the optimization; and

transmit the optimal code information to the embedded device through the transceiver.