WO2023163453A1

WO2023163453A1 - Neural network model optimization method to be executed in embedded device, neural network model optimization apparatus, and neural network model optimization system

Info

Publication number: WO2023163453A1
Application number: PCT/KR2023/002279
Authority: WO
Inventors: 이현재; 김재우
Original assignee: 주식회사 에너자이
Priority date: 2022-02-23
Filing date: 2023-02-16
Publication date: 2023-08-31

Abstract

A neural network model optimization method according to an embodiment of the present application comprises the steps of: obtaining execution data of a neural network model that has completed training, wherein the execution data includes at least one of layer data of the neural network model, operation data constituting the neural network model, and a parameter of the neural network model; on the basis of the execution data of the neural network model, optimizing the structure of the neural network model and obtaining instruction information; on the basis of the instruction information, optimizing an embedded device in which the neural network model is to be run and obtaining optimal code information; and transmitting the optimal code information.

Description

A method for optimizing a neural network model to be executed in an embedded device, a device for optimizing a neural network model, and a system for optimizing a neural network model

The present application relates to a method for optimizing a neural network model, an apparatus for optimizing a neural network model, and a system for optimizing a neural network model. Specifically, the present application relates to a method, apparatus, and system for optimizing a neural network model to be executed in an embedded device.

As artificial intelligence technology develops, there is a demand for artificial intelligence technology to be applied to embedded devices with embedded systems used in various industries. Accordingly, lightweight technologies have been developed, and artificial intelligence technology can be applied to embedded devices with low performance and low specifications. In particular, artificial intelligence technology can be applied to embedded devices through inference engine technology, which is software developed to efficiently execute pre-learned artificial intelligence models on embedded devices.

A conventional embedded AI execution engine adopts a method of reading information about model execution in the embedded device itself, setting a model execution order, and allocating memory necessary for model execution to execute the model. However, executing the above-described preparation processes for model execution in the embedded device itself, which has limitations in memory space, causes a considerable burden on the hardware environment of the embedded device.

In addition, conventional embedded artificial intelligence execution engines have a problem in that additional manual work is required for optimization of non-standardized hardware requirements. Specifically, conventional embedded artificial intelligence execution engines have limitations in that engine codes must be manually modified one by one in order to use special instructions of each hardware or to use optimization techniques such as For Loop Unrolling. That is, the conventional embedded artificial intelligence execution engine could be optimized only for specific hardware, and in order to actually use it in hardware other than specific hardware, a process of manually optimizing execution functions was essentially required.

Accordingly, it is required to develop a neural network model optimization method, device, and system for optimally executing a neural network model in an embedded device based on an artificial intelligence model and hardware information (computing specifications) of the embedded device.

One problem to be solved by the present invention is to provide a neural network model optimization method, a neural network model optimization apparatus, and a neural network model optimization system for optimally executing a neural network model in an embedded device in consideration of hardware information of the embedded device.

The problem to be solved by the present invention is not limited to the above-mentioned problems, and problems not mentioned will be clearly understood by those skilled in the art from this specification and the accompanying drawings.

A method for optimizing a neural network model according to an embodiment of the present application includes the steps of: acquiring execution data of a neural network model that has been trained, wherein the execution data includes at least one of layer data of the neural network model, operation data constituting the neural network model, and parameters of the neural network model; optimizing the structure of the neural network model based on the execution data of the neural network model and obtaining instruction information; performing optimization for an embedded device in which the neural network model is to be run based on the instruction information and obtaining optimal code information; and transmitting the optimal code information. The step of optimizing the structure of the neural network model and obtaining instruction information includes: generating a directed acyclic graph (DAG) based on the execution data; determining an execution order of operations based on the directed acyclic graph; detecting, based on a predetermined reference operation pattern, a target operation pattern of the directed acyclic graph corresponding to the reference operation pattern, and merging a first target operation and a second target operation included in the target operation pattern; obtaining a first memory space map based on the determined execution order, and performing optimization related to memory allocation based on the first memory space map; and generating an instruction related to a memory address based on a result of performing the optimization.

An apparatus for optimizing a neural network model according to an embodiment of the present application includes: a transceiver for acquiring execution data of a trained neural network model and computing environment information of an embedded device in which the neural network model is to be run; and a processor configured to optimize the neural network model based on the execution data and the computing environment information of the embedded device. The processor is configured to: obtain execution data of the trained neural network model, wherein the execution data includes at least one of layer data of the neural network model, operation data constituting the neural network model, and parameters of the neural network model; optimize the structure of the neural network model based on the execution data and obtain instruction information; perform optimization for the embedded device in which the neural network model is to be run based on the instruction information and obtain optimal code information; and transmit the optimal code information. The processor may be configured to obtain the instruction information by: generating a directed acyclic graph (DAG) based on the execution data; determining an execution order of operations based on the directed acyclic graph; detecting, based on a predetermined reference operation pattern, a target operation pattern of the directed acyclic graph corresponding to the reference operation pattern; merging a first target operation and a second target operation included in the target operation pattern; obtaining a first memory space map based on the determined execution order; performing optimization related to memory allocation based on the first memory space map; and generating an instruction related to a memory address based on a result of performing the optimization.

A method for optimizing a neural network model according to an embodiment of the present application includes the steps of: acquiring execution data of a neural network model that has been trained, wherein the execution data includes at least one of layer data of the neural network model, operation data constituting the neural network model, and parameters of the neural network model; optimizing the structure of the neural network model based on the execution data of the neural network model and obtaining instruction information, wherein the instruction information includes information related to at least one of a type of operation, an operation value, and a memory address; performing optimization for an embedded device in which the neural network model is to be run based on the instruction information and obtaining optimal code information; and transmitting the optimal code information. The step of obtaining the optimal code information includes: acquiring computing environment information of the embedded device; obtaining an optimization parameter from the instruction information through an agent trained by reinforcement learning; and generating code information to be used in the embedded device based on the optimization parameter.

An apparatus for optimizing a neural network model according to an embodiment of the present application includes: a transceiver for acquiring execution data of a trained neural network model and computing environment information of an embedded device in which the neural network model is to be run; and a processor configured to optimize the neural network model based on the execution data and the computing environment information of the embedded device. The processor is configured to: obtain execution data of the trained neural network model, wherein the execution data includes at least one of layer data of the neural network model, operation data constituting the neural network model, and parameters of the neural network model; optimize the structure of the neural network model based on the execution data and obtain instruction information including information related to at least one of a type of operation and a memory address; perform optimization for the embedded device in which the neural network model is to be run based on the instruction information and obtain optimal code information; and transmit the optimal code information. The processor is further configured to obtain the optimal code information by acquiring computing environment information of the embedded device, obtaining an optimization parameter from the instruction information through an agent trained by reinforcement learning, and generating code information to be used in the embedded device based on the optimization parameter.

The solutions to the problems of the present invention are not limited to the above-described solutions, and solutions not mentioned will be clearly understood by those skilled in the art from this specification and the accompanying drawings.

According to the neural network model optimization method, the neural network model optimization apparatus, and the neural network model optimization system according to an embodiment of the present application, the problem of non-standardized embedded device hardware and the structural limitations of existing artificial intelligence execution engines are addressed, and the neural network model can be optimized for the hardware platform of the embedded device.

According to the neural network model optimization method, the neural network model optimization apparatus, and the neural network model optimization system according to the embodiments of the present application, the execution capability of the neural network model in an embedded device can be improved.

According to the neural network model optimization method, the neural network model optimization apparatus, and the neural network model optimization system according to embodiments of the present application, power consumption required to execute a neural network model in an embedded device can be reduced.

Effects of the present invention are not limited to the above-mentioned effects, and effects not mentioned will be clearly understood by those skilled in the art from this specification and the accompanying drawings.

FIG. 1 is a schematic diagram of a neural network model optimization system according to an embodiment of the present application.

FIG. 2 is a flowchart illustrating a method for optimizing a neural network model according to an embodiment of the present application.

FIG. 3 is a flowchart detailing the step of optimizing the structure of a neural network model and obtaining instruction information according to an embodiment of the present application.

FIG. 4 is a diagram illustrating an aspect of a directed acyclic graph according to an embodiment of the present application.

FIG. 5 is a diagram illustrating an aspect of merging a first target operation and a second target operation according to an embodiment of the present application.

FIG. 6 is a diagram illustrating an aspect of a first memory space map according to an embodiment of the present application.

FIG. 7 is a diagram illustrating an aspect of optimization related to memory allocation according to an embodiment of the present application.

FIG. 8 is a flowchart detailing the step of optimizing an embedded device and obtaining optimal code information according to an embodiment of the present application.

FIG. 9 is a diagram illustrating an aspect of training an agent through a reinforcement learning method according to an embodiment of the present application.

According to an embodiment of the present application, the merging of the first target operation and the second target operation may include: obtaining predetermined reference operation pattern information, wherein the reference operation pattern information includes a first operation and a second operation associated with the first operation; detecting, from the directed acyclic graph and based on the reference operation pattern information, the first target operation corresponding to the first operation and the second target operation corresponding to the second operation; and merging the first target operation and the second target operation, and transforming a kernel based on a result of the merging.
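The pattern detection and merging described above can be illustrated with a minimal Python sketch. The specification does not name a concrete reference pattern, so the sketch assumes a hypothetical one: a "dense" operation (y = Wx + b) immediately followed by a per-channel "affine" operation (scale and shift). The `Op` class, the operation kinds, and both function names are illustrative assumptions and not part of the application.

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    kind: str                      # hypothetical kinds: "dense", "affine", ...
    inputs: list                   # names of the ops (or model inputs) it reads
    params: dict = field(default_factory=dict)

# Assumed reference pattern: a "dense" op directly followed by an "affine" op.
REFERENCE_PATTERN = ("dense", "affine")

def find_pattern(ops, pattern=REFERENCE_PATTERN):
    """Return (first, second) name pairs matching the pattern, where the
    first op's output is consumed only by the second op."""
    consumers = {}
    for op in ops.values():
        for src in op.inputs:
            consumers.setdefault(src, []).append(op.name)
    matches = []
    for op in ops.values():
        if op.kind != pattern[1] or len(op.inputs) != 1:
            continue
        first = ops.get(op.inputs[0])
        if (first is not None and first.kind == pattern[0]
                and consumers[first.name] == [op.name]):
            matches.append((first.name, op.name))
    return matches

def merge_dense_affine(ops, first_name, second_name):
    """Fold scale * (W x + b) + shift into one dense op (kernel transform)."""
    first, second = ops[first_name], ops[second_name]
    scale, shift = second.params["scale"], second.params["shift"]
    weight = [[s * w for w in row]
              for s, row in zip(scale, first.params["weight"])]
    bias = [s * b + t for s, b, t in zip(scale, first.params["bias"], shift)]
    merged = Op(first_name, "dense", list(first.inputs),
                {"weight": weight, "bias": bias})
    del ops[second_name]
    ops[first_name] = merged
    # Rewire every consumer of the removed op to read from the merged op.
    for op in ops.values():
        op.inputs = [first_name if src == second_name else src
                     for src in op.inputs]
    return merged
```

Folding the second operation into the weights of the first removes one pass over the intermediate tensor at run time, which is one plausible reading of "transforming a kernel based on a result of the merging".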

According to one embodiment of the present application, the performing of optimization related to memory allocation may include generating the first memory space map based on the determined execution order and the size of data output through the operation; changing a first memory tensor storing a value input through a third target operation into a second memory tensor storing a value output through the third target operation; and generating a second memory space map from the first memory space map based on the change result.
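As a rough illustration of this memory allocation optimization, the following Python sketch builds a first memory space map from an execution order and output sizes, then builds a second map in which an operation's output reuses the region that held its input. The rule for when reuse is safe (a single input, no other consumer, equal size) and all names are assumptions made for the example. For simplicity the second map is recomputed from the same inputs rather than derived by editing the first map.

```python
def build_memory_map(order, out_size):
    """First map: each op output gets its own region, laid out in execution order.
    Returns {op name: (offset, size)}."""
    offset, mem_map = 0, {}
    for name in order:
        mem_map[name] = (offset, out_size[name])
        offset += out_size[name]
    return mem_map

def apply_in_place(order, out_size, inputs, in_place_ops):
    """Second map: an op listed in in_place_ops writes its output over its
    input region when that input has no other consumer and the same size."""
    consumers = {}
    for name in order:
        for src in inputs[name]:
            consumers[src] = consumers.get(src, 0) + 1
    offset, new_map = 0, {}
    for name in order:
        srcs = inputs[name]
        can_reuse = (
            name in in_place_ops
            and len(srcs) == 1
            and srcs[0] in new_map            # skip external model inputs
            and consumers[srcs[0]] == 1
            and out_size[srcs[0]] == out_size[name]
        )
        if can_reuse:
            new_map[name] = new_map[srcs[0]]  # output aliases the input region
        else:
            new_map[name] = (offset, out_size[name])
            offset += out_size[name]
    return new_map

def footprint(mem_map):
    """Total bytes spanned by the map."""
    return max(off + size for off, size in mem_map.values())

# Hypothetical three-op model: conv -> relu -> fc, sizes in bytes.
order = ["conv", "relu", "fc"]
out_size = {"conv": 64, "relu": 64, "fc": 10}
inputs = {"conv": ["x"], "relu": ["conv"], "fc": ["relu"]}
first_map = build_memory_map(order, out_size)
second_map = apply_in_place(order, out_size, inputs, in_place_ops={"relu"})
```

In this toy case the footprint drops from 138 bytes to 74 bytes because the "relu" output shares the region of the "conv" output.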

According to an embodiment of the present application, the determining of the execution order of the operations may include: calculating a first memory space required for a fourth target operation included in a first branch of the directed acyclic graph and a second memory space required for a fifth target operation included in a second branch of the directed acyclic graph; comparing the first memory space and the second memory space; and determining an execution order of the fourth target operation and the fifth target operation according to the comparison result.

According to an embodiment of the present application, when the first memory space is larger than the second memory space, the fourth target operation may be assigned a lower priority in the execution order than the fifth target operation, and when the first memory space is smaller than the second memory space, the fourth target operation may be assigned a higher priority in the execution order than the fifth target operation.
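The priority rule above amounts to running the operation that needs less memory first. A minimal Python sketch follows; the function name and the byte counts are hypothetical, and the handling of ties (original order is kept) is an assumption, since the specification does not address equal memory requirements.

```python
def order_by_required_memory(required_bytes):
    """Return op names sorted so that ops needing less memory run first.

    required_bytes maps an op name to the memory it needs. Ties keep their
    original relative order because Python's sort is stable (an assumption;
    the specification does not define tie handling).
    """
    return sorted(required_bytes, key=lambda name: required_bytes[name])

# Hypothetical sizes: the fourth target operation needs more memory than the fifth.
branch_ops = {"op4": 4096, "op5": 1024}
execution_order = order_by_required_memory(branch_ops)
```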

According to an embodiment of the present application, the optimizing the structure of the neural network model and acquiring instruction information may include acquiring input data and output data related to an operation of the neural network model; and adjusting the input data and the output data to values corresponding to a predetermined integer range.
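One common way to adjust values to a predetermined integer range is affine quantization. The Python sketch below assumes a signed 8-bit range of [-128, 127] and a min/max calibration rule; the specification fixes neither choice, so the range, the function names, and the calibration rule are illustrative assumptions.

```python
def quantize(values, qmin=-128, qmax=127):
    """Map floats to integers in [qmin, qmax] with an affine transform.

    Returns (quantized values, scale, zero_point) such that
    value is approximately (q - zero_point) * scale.
    """
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = round(qmin - lo / scale)
    quantized = [max(qmin, min(qmax, round(v / scale) + zero_point))
                 for v in values]
    return quantized, scale, zero_point

def dequantize(quantized, scale, zero_point):
    """Approximate inverse of quantize."""
    return [(q - zero_point) * scale for q in quantized]

# Hypothetical activation values.
activations = [-1.0, 0.0, 2.0]
q, scale, zero_point = quantize(activations)
restored = dequantize(q, scale, zero_point)
```

Integer inputs and outputs let the embedded device use integer arithmetic and smaller buffers, at the cost of a rounding error of at most about one quantization step per value.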

According to an embodiment of the present application, a computer-readable recording medium recording a program for executing the neural network model optimization method may be provided.

According to an embodiment of the present application, the obtaining of the optimization parameter may include: inputting, to the agent, at least one of operation type information corresponding to the operation, memory state information, and computing environment information of the embedded device; and obtaining an optimization parameter output through the agent.

According to an embodiment of the present application, the optimization parameter may be related to at least one of a parameter for selecting an algorithm type to be performed for the operation, a parameter related to a block size of the operation, and a parameter related to a code length.

According to an embodiment of the present application, the agent may predict an optimization parameter, according to an initial rule, based on target device information related to a computing environment of a target embedded device, memory state information, and at least one piece of algorithm type information corresponding to an operation. The agent may be trained by updating the initial rule so that an evaluation value for the performance of code generated from the predicted value is maximized.
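The training loop described above can be sketched in Python with a deliberately simple stand-in: an epsilon-greedy agent that keeps a table of average rewards per (state, action) pair. This is not the agent of the application, whose architecture is not specified here. The state tuple, the candidate block sizes, and the `evaluate` function, which stands in for measuring generated code on the target device, are all hypothetical.

```python
import random

class BlockSizeAgent:
    """Epsilon-greedy agent over a discrete set of optimization parameters."""

    def __init__(self, actions, epsilon=0.2, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = {}   # (state, action) -> running mean of rewards
        self.count = {}   # (state, action) -> number of updates

    def best(self, state):
        """Action with the highest estimated reward (unseen actions count as 0)."""
        return max(self.actions, key=lambda a: self.value.get((state, a), 0.0))

    def act(self, state):
        """Explore with probability epsilon, otherwise follow the current rule."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best(state)

    def update(self, state, action, reward):
        """Move the estimate toward the observed evaluation value."""
        key = (state, action)
        n = self.count.get(key, 0) + 1
        old = self.value.get(key, 0.0)
        self.count[key] = n
        self.value[key] = old + (reward - old) / n

def evaluate(block_size):
    """Hypothetical evaluation value: best at block size 32, worse elsewhere."""
    return -abs(block_size - 32)

def train(agent, state, evaluate_fn, episodes):
    for _ in range(episodes):
        action = agent.act(state)
        agent.update(state, action, evaluate_fn(action))

# Hypothetical state: (target device, operation type).
state = ("cortex-m4", "matmul")
agent = BlockSizeAgent(actions=[8, 16, 32, 64])
train(agent, state, evaluate, episodes=200)
chosen_block_size = agent.best(state)
```

In a real setting the evaluation value would come from running or profiling the generated code on the target embedded device, and the state would also carry memory state information.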

According to one embodiment of the present application, generating code information to be used in the embedded device may include generating codes to correspond to the instruction information based on the optimization parameter; and compiling the generated code and converting it into a binary file format.
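To make the code generation step concrete, the Python sketch below emits C source for a fixed-length dot product, with the loop unroll factor playing the role of an optimization parameter. The choice of operation, the emitted function signature, and the generator name are assumptions for illustration; the compilation step into a binary file is not shown.

```python
def generate_dot_product(length, unroll):
    """Emit C source for a dot product of two float arrays of fixed length,
    with the main loop unrolled `unroll` times and a straight-line remainder."""
    if length < 0 or unroll < 1:
        raise ValueError("length must be >= 0 and unroll must be >= 1")
    main = length - length % unroll   # elements covered by the unrolled loop
    lines = [
        "float dot(const float *a, const float *b) {",
        "    float acc = 0.0f;",
    ]
    if main:
        lines.append(f"    for (int i = 0; i < {main}; i += {unroll}) {{")
        for k in range(unroll):
            lines.append(f"        acc += a[i + {k}] * b[i + {k}];")
        lines.append("    }")
    for i in range(main, length):     # leftover elements, fully unrolled
        lines.append(f"    acc += a[{i}] * b[{i}];")
    lines += ["    return acc;", "}"]
    return "\n".join(lines)

# Hypothetical parameters: 10 elements, unroll factor 4 chosen by the agent.
source = generate_dot_product(length=10, unroll=4)
```

Because the generator produces a different source file for each parameter value, the same instruction information can be turned into code variants suited to different target devices without manual edits.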

The foregoing objects, features and advantages of the present application will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. However, the present application can apply various changes and can have various embodiments. Hereinafter, specific embodiments will be illustrated in the drawings and described in detail.

Like reference numerals designate essentially like elements throughout the specification. In addition, components having the same function within the scope of the same idea appearing in the drawings of each embodiment will be described using the same reference numerals, and overlapping descriptions thereof will be omitted.

If it is determined that a detailed description of a known function or configuration related to the present application may unnecessarily obscure the subject matter of the present application, the detailed description thereof will be omitted. In addition, ordinal terms (e.g., first, second, etc.) used in this specification are only identifiers for distinguishing one component from another.

In addition, the suffixes "module" and "unit" for components used in the following embodiments are given or used interchangeably in consideration of ease of writing the specification, and do not have meanings or roles that are distinguished from each other by themselves.

In the following examples, expressions in the singular number include plural expressions unless the context clearly dictates otherwise.

In the following embodiments, terms such as include or have mean that features or components described in the specification exist, and do not preclude the possibility that one or more other features or components may be added.

In the drawings, the size of components may be exaggerated or reduced for convenience of explanation. For example, the size and thickness of each component shown in the drawings are arbitrarily shown for convenience of explanation, and the present invention is not necessarily limited to those shown.

If an embodiment is otherwise implementable, the order of specific processes may be performed differently from the order described. For example, two processes that are described in succession may be performed substantially concurrently, or may proceed in an order reverse to that described.

In the following embodiments, when components are connected, a case in which the components are directly connected as well as a case in which components are interposed between the components and connected indirectly is included.

For example, when it is said that components are electrically connected in this specification, not only the case where the components are directly electrically connected, but also the case where the components are interposed and electrically connected indirectly is included.

Hereinafter, the neural network model optimization method, the neural network model optimization apparatus, and the neural network model optimization system of the present application will be described with reference to FIGS. 1 to 9.

The neural network model optimization system 10 according to an embodiment of the present application may include the embedded device 100 and the neural network model optimization apparatus 1000 (or server).

The embedded device 100 may refer to a device including any programmable embedded system built for a specific purpose (or specific function). The embedded device 100 may include hardware including a processor and/or memory. The embedded device 100 may also include firmware for controlling the hardware. In addition, the embedded device 100 may be configured to execute an arbitrary artificial intelligence model by adding arbitrary software, including an artificial intelligence execution engine, to the firmware. Here, the artificial intelligence execution engine (inference engine) is software for executing a pre-trained neural network model on the embedded device 100 as efficiently as possible; it is a technology aimed at practical use of artificial intelligence and serves to increase efficiency in the environment of the device on which it is mounted. For example, in the case of a mobile device, an execution engine may be implemented to suit the slow operation speed and low power consumption that characterize the computing environment of the mobile device. As another example, in the case of a PC server having relatively high computing performance, an execution engine may be implemented to maximize high-performance parallel processing capability.

The embedded device 100 according to an embodiment of the present application may obtain, from the neural network model optimization apparatus 1000, optimal code information optimized for the computing environment of the embedded device 100, and may add (or input) the optimal code information to the firmware. As will be described later, the optimal code information can be generated by analyzing the internal structure of the neural network model. In addition, the optimal code information may be generated in consideration of a computing environment including memory specifications and/or processor specifications of the embedded device 100. The embedded device 100 may then add the optimal code information generated by the neural network model optimization apparatus 1000 to the firmware and execute the neural network model.

The neural network model optimization apparatus 1000 according to an embodiment of the present application may optimize the computational structure and/or memory allocation of a neural network model trained in any device (or server) other than the embedded device 100, so that the neural network model can be optimally executed in the computing environment of the embedded device 100. In addition, the neural network model optimization apparatus 1000 according to an embodiment of the present application may automatically generate optimal code for optimally executing the neural network model in the computing environment of the embedded device 100.

An apparatus 1000 for optimizing a neural network model according to an embodiment of the present application may include a transceiver 1100, a memory 1200, and a processor 1300.

The transceiver 1100 of the neural network model optimization apparatus 1000 may communicate with any external device including the embedded device 100. For example, the neural network model optimization apparatus 1000 may transmit optimal code information obtained by performing optimization to the embedded device 100 through the transceiver 1100. In addition, the neural network model optimization apparatus 1000 may receive computing environment information of the embedded device 100 from the embedded device 100 or any external device through the transceiver 1100. In addition, the neural network model optimization apparatus 1000 may receive a trained neural network model and/or execution data for executing the neural network model through the transceiver 1100.

The neural network model optimization apparatus 1000 may transmit and receive various types of data by accessing a network through the transceiver 1100. The transceiver may broadly be of a wired type or a wireless type. Since the wired type and the wireless type each have advantages and disadvantages, both may be provided together in the neural network model optimization apparatus 1000 in some cases. In the case of the wireless type, a wireless local area network (WLAN)-based communication method such as Wi-Fi may mainly be used. Alternatively, a cellular communication method, e.g., an LTE- or 5G-based method, may be used. However, the wireless communication protocol is not limited to the above examples, and any suitable wireless communication method may be used. In the case of the wired type, LAN (Local Area Network) or USB (Universal Serial Bus) communication is a representative example, and other methods are also possible.

The memory 1200 of the neural network model optimization apparatus 1000 may store various kinds of information. Various types of data may be temporarily or semi-permanently stored in the memory 1200. Examples of the memory may include a hard disk drive (HDD), a solid state drive (SSD), flash memory, read-only memory (ROM), and random access memory (RAM). The memory 1200 may be provided in a form embedded in the neural network model optimization apparatus 1000 or in a detachable form. The memory 1200 may store an operating system (OS) for driving the neural network model optimization apparatus 1000, a program for operating each component of the neural network model optimization apparatus 1000, and various data required for the operation of the neural network model optimization apparatus 1000.

The processor 1300 may control overall operations of the neural network model optimization apparatus 1000. For example, the processor 1300 may control an operation of acquiring execution data of a trained neural network model, an operation of optimizing the structure of the neural network model, an operation of performing optimization for an embedded device, an operation of obtaining optimal code information according to the optimization result, and/or an operation of transmitting the optimal code information. In detail, the processor 1300 may load and execute a program for overall operation of the neural network model optimization apparatus 1000 from the memory 1200. The processor 1300 may be implemented as an application processor (AP), a central processing unit (CPU), a microcontroller unit (MCU), or a similar device, using hardware, software, or a combination thereof. In terms of hardware, it may be provided in the form of an electronic circuit that processes electrical signals to perform a control function, and in terms of software, it may be provided in the form of a program or code that drives a hardware circuit.

Hereinafter, an operation of the neural network model optimization apparatus 1000 and a method for optimizing a neural network model according to an embodiment of the present application will be described in detail with reference to FIGS. 2 to 9.

A method for optimizing a neural network model according to an embodiment of the present application may include: obtaining execution data of a trained neural network model (S1000); optimizing the structure of the neural network model and obtaining instruction information (S2000); optimizing for the embedded device and obtaining optimal code information (S3000); and transmitting the optimal code information (S4000).

In the step of acquiring execution data of the trained neural network model (S1000), the neural network model optimization apparatus 1000 may obtain execution data of the trained neural network model through the transceiver 1100. Here, the execution data may encompass any appropriate data necessary to execute the neural network model, including layer data of the neural network model, operation data constituting the neural network model, and/or arbitrary weights (or parameters) related to the neural network model.

In the step of optimizing the structure of the neural network model and obtaining instruction information (S2000), the neural network model optimization apparatus 1000 may optimize the structure of the neural network model, for example by making the structure of the neural network model lighter, based on the execution data of the neural network model. For example, the neural network model optimization apparatus 1000 may be configured to detect an operation pattern included in an operation structure of the neural network model and merge target operations included in the operation pattern. As another example, the neural network model optimization apparatus 1000 may determine an execution order of operations of the neural network model, and perform memory allocation-related optimization based on the determined execution order.

In the step of optimizing the structure of the neural network model and obtaining instruction information (S2000), the neural network model optimization apparatus 1000 may obtain instruction information according to the optimization result. Here, the instruction information may include instructions for types of operations of the neural network model and/or instructions for a memory address related to each operation of the neural network model. Step S2000 will be described in more detail with reference to FIGS. 3 to 7.

In the step of optimizing for the embedded device and obtaining optimal code information (S3000), the neural network model optimization apparatus 1000 may generate code for optimally executing the trained neural network model in the computing environment of the embedded device 100. Specifically, the neural network model optimization apparatus 1000 may be implemented to obtain optimization parameters, through an agent trained with a reinforcement learning technique, based on the instruction information acquired in step S2000 and embedded device information, and to generate optimal code information to be used in the embedded device 100 based on the optimization parameters. Step S3000 will be described in more detail with reference to FIGS. 8 and 9.

In the step of transmitting the optimal code information (S4000), the neural network model optimization apparatus 1000 may transmit the obtained optimal code information, through the transceiver 1100, to any external device (or external server) including the embedded device 100.

Hereinafter, the optimization of the structure of the neural network model according to an embodiment of the present application will be described in more detail with reference to FIGS. 3 to 7.

The step of optimizing the structure of the neural network model and obtaining instruction information (S2000) according to an embodiment of the present application may further include: generating a directed acyclic graph (DAG) based on the execution data (S2100); determining an execution order of operations based on the DAG (S2200); detecting a target operation pattern of the DAG corresponding to a predetermined reference operation pattern, and merging a first target operation and a second target operation included in the target operation pattern (S2300); obtaining a first memory space map based on the determined execution order, and performing optimization related to memory allocation based on the first memory space map (S2400); and generating an instruction related to a memory address based on the optimization result (S2500).

In the step of generating a directed acyclic graph (DAG) based on the execution data (S2100), the neural network model optimization apparatus 1000 may generate a DAG based on the execution data of the neural network model (e.g., layer data of the neural network model, operation data constituting the neural network model, and/or parameters of the neural network model). A DAG may refer to any finite directed graph without directed cycles.

FIG. 4 is a diagram illustrating one aspect of a DAG according to an embodiment of the present application.

The DAG may include information related to the data dependency relationships of the functions constituting the neural network model, for example, the connection relationships between the functions. For example, the DAG may include information on the connection relationships of a first branch structure including a second operation (e.g., B) connected to a first operation (e.g., A) and a third operation (e.g., C) connected to the second operation. Likewise, the DAG may include information on the connection relationships of a second branch structure including a fourth operation (e.g., D) connected to the first operation (e.g., A) and a fifth operation (e.g., E) connected to the fourth operation. In addition, the DAG may include information on the connection relationship of a sixth operation (e.g., F) connected to the third operation and the fifth operation. However, FIG. 4 is only an example for convenience of description of the DAG, and the DAG should not be construed as being limited thereto.
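As a non-limiting illustration, the connection relationships of FIG. 4 can be held in adjacency maps. The following is a minimal sketch in Python; the names `FIG4_EDGES`, `build_dag`, and `is_acyclic` are hypothetical and do not appear in the present application, and the acyclicity check uses Kahn's algorithm as one possible choice.

```python
# Edges of the example DAG of FIG. 4: operation A feeds two branches
# (B -> C and D -> E), and both branches feed operation F.
FIG4_EDGES = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "E"), ("C", "F"), ("E", "F")]


def build_dag(edges):
    """Return (successors, predecessors) adjacency maps for a list of edges."""
    successors, predecessors = {}, {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
        successors.setdefault(dst, [])
        predecessors.setdefault(dst, []).append(src)
        predecessors.setdefault(src, [])
    return successors, predecessors


def is_acyclic(successors):
    """Kahn's algorithm: a finite directed graph is acyclic exactly when
    every node can be removed in dependency order."""
    indegree = {node: 0 for node in successors}
    for outs in successors.values():
        for node in outs:
            indegree[node] += 1
    ready = [node for node, degree in indegree.items() if degree == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return removed == len(successors)


successors, predecessors = build_dag(FIG4_EDGES)
```

In this sketch, `successors["A"]` lists the first operation of each branch (B and D), and `predecessors["F"]` lists the operations (C and E) whose outputs the sixth operation requires.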

In determining the execution order of operations based on the DAG ( S2200 ), the neural network model optimization apparatus 1000 may determine the execution order of one or more operations constituting the neural network model using the DAG.

Any suitable rules may be used to determine the order of execution.

For example, the neural network model optimization apparatus 1000 may determine the execution order of the operations of the DAG based on the memory space required for the operations constituting the neural network model. For example, the neural network model optimization apparatus 1000 may calculate a first memory space required for an operation included in a first branch of the DAG (e.g., B in FIG. 4) and a second memory space required for an operation included in a second branch of the DAG (e.g., D in FIG. 4), compare the first memory space with the second memory space, and determine, based on the comparison result, the execution order between the operation included in the first branch (e.g., B in FIG. 4) and the operation included in the second branch (e.g., D in FIG. 4). Specifically, when the first memory space is larger than the second memory space, the neural network model optimization apparatus 1000 may assign the operation included in the first branch (e.g., B in FIG. 4) a lower execution priority than the operation included in the second branch (e.g., D in FIG. 4), so that it is executed later. On the other hand, when the first memory space is smaller than the second memory space, the neural network model optimization apparatus 1000 may assign the operation included in the first branch (e.g., B in FIG. 4) a higher execution priority than the operation included in the second branch (e.g., D in FIG. 4), so that it is executed earlier.
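The comparison rule described above can be sketched as a topological sort in which, among the operations that are ready to run, the one requiring the smaller memory space is given priority. This is only one possible reading: the function name `memory_aware_order`, the byte counts, and the tie-breaking by operation name are assumptions for illustration and are not specified in the present application.

```python
import heapq

# Connection relationships of the example DAG of FIG. 4.
FIG4_SUCCESSORS = {
    "A": ["B", "D"], "B": ["C"], "C": ["F"],
    "D": ["E"], "E": ["F"], "F": [],
}

# Hypothetical memory space (in bytes) required by each operation.
REQUIRED_BYTES = {"A": 4096, "B": 8192, "C": 2048, "D": 1024, "E": 1024, "F": 512}


def memory_aware_order(successors, required_bytes):
    """Topological order in which, among ready operations, the one that
    requires less memory is executed first (ties broken by name)."""
    indegree = {node: 0 for node in successors}
    for outs in successors.values():
        for node in outs:
            indegree[node] += 1
    ready = [(required_bytes[n], n) for n, d in indegree.items() if d == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, node = heapq.heappop(ready)
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, (required_bytes[nxt], nxt))
    return order


order = memory_aware_order(FIG4_SUCCESSORS, REQUIRED_BYTES)
```

With these assumed sizes, B requires more memory than D, so D is placed before B and the resulting order is A, D, E, B, C, F.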

As another example, the neural network model optimization apparatus 1000 may determine the execution order of the operations constituting the neural network model in consideration of the branch structure of the DAG. For example, sequentially executing the operations included in the same branch may be more advantageous in terms of memory space than alternating between an operation included in a first branch and an operation included in a second branch. Therefore, the neural network model optimization apparatus 1000 may determine the execution order so that the operations included in the first branch (e.g., B and C in FIG. 4) are executed sequentially, and then the operations included in the second branch (e.g., D and E in FIG. 4) are executed. Alternatively, the neural network model optimization apparatus 1000 may determine the execution order so that the operations included in the second branch (e.g., D and E in FIG. 4) are executed first, and then the operations included in the first branch (e.g., B and C in FIG. 4) are executed.
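One way to obtain such a branch-by-branch order is a depth-first variant of topological sorting, in which a branch is followed to its end before the next branch is started. The sketch below is illustrative only; `branch_first_order` is a hypothetical name, and the order in which branches are listed under operation A is assumed to express which branch is executed first.

```python
def branch_first_order(successors):
    """Topological order that finishes one branch before starting the next.

    A stack (last in, first out) is used so that the successor of the
    operation just executed is preferred over operations of other branches.
    """
    indegree = {node: 0 for node in successors}
    for outs in successors.values():
        for node in outs:
            indegree[node] += 1
    stack = [n for n in reversed(list(successors)) if indegree[n] == 0]
    order = []
    while stack:
        node = stack.pop()
        order.append(node)
        # Push in reverse so the first-listed successor is executed first.
        for nxt in reversed(successors[node]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                stack.append(nxt)
    return order


first_branch_first = branch_first_order(
    {"A": ["B", "D"], "B": ["C"], "C": ["F"], "D": ["E"], "E": ["F"], "F": []}
)
second_branch_first = branch_first_order(
    {"A": ["D", "B"], "B": ["C"], "C": ["F"], "D": ["E"], "E": ["F"], "F": []}
)
```

The first call yields A, B, C, D, E, F and the second yields A, D, E, B, C, F, corresponding to the two alternatives described above.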

In the step of detecting the target operation pattern of the DAG corresponding to the predetermined reference operation pattern and merging the first target operation and the second target operation included in the target operation pattern (S2300), the neural network model optimization apparatus 1000 may obtain a predetermined reference operation pattern and detect a target operation pattern of the DAG corresponding to the reference operation pattern. In addition, in step S2300, the neural network model optimization apparatus 1000 may be implemented to merge the operations included in the detected target operation pattern.

More specifically, the step of merging the first target operation and the second target operation according to an embodiment of the present application may further include: obtaining predetermined reference operation pattern information, wherein the reference operation pattern information includes a first operation and a second operation associated with the first operation; detecting, from the DAG, a first target operation corresponding to the first operation and a second target operation corresponding to the second operation based on the reference operation pattern information; and merging the first target operation and the second target operation and converting a kernel based on a result of the merging.

In the step of obtaining predetermined reference operation pattern information, wherein the reference operation pattern information includes a first operation and a second operation associated with the first operation, the neural network model optimization apparatus 1000 may obtain the predetermined reference operation pattern information. In this case, the reference operation pattern information is information related to a commonly used operation pattern, and may be set in advance. For example, the reference operation pattern information may relate to an operation pattern including a first operation (e.g., convolution) and a second operation connected to the first operation (e.g., Rectified Linear Unit (ReLU)). However, this is merely an example for convenience of explanation, and any suitable operation pattern may be set in advance. For example, the reference operation pattern information may relate to an operation pattern in which a convolution operation is performed, followed in sequence by a depthwise convolution operation and an activation operation. As another example, the reference operation pattern information may include an operation pattern in which a depthwise convolution operation, which compresses data per channel by applying a filter to each channel related to the color of an image, is performed to obtain an intermediate result value, and a pointwise operation is then performed based on the intermediate result value.

In the step of detecting a first target operation corresponding to the first operation and a second target operation corresponding to the second operation from the DAG based on the reference operation pattern information, the neural network model optimization apparatus 1000 may use the reference operation pattern information to detect, from the DAG, a first target operation (e.g., the first target operation in FIG. 5) corresponding to the first operation (e.g., convolution) and a second target operation (e.g., the second target operation in FIG. 5) corresponding to the second operation (e.g., ReLU). Any pattern matching algorithm may be used to detect the target operation pattern.

In the step of merging the first target operation and the second target operation and converting the kernel based on the merge result, the neural network model optimization apparatus 1000 may merge the first target operation (e.g., convolution in FIG. 5) and the second target operation (e.g., ReLU in FIG. 5) and convert the kernels included in the target operation pattern into a single kernel (e.g., a Convolution + ReLU kernel) based on the merge result. According to the present embodiment, merging the first target operation and the second target operation and performing them as a single operation may reduce the memory space required for the operations and increase execution speed.
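The detection and merging described above can be sketched on a small operation graph as follows. This is a simplified illustration rather than the pattern matching algorithm of the present application: `fuse_pattern`, the operation names, and the condition that the first target operation has exactly one consumer are assumptions.

```python
def fuse_pattern(op_types, successors, pattern=("Conv", "ReLU")):
    """Merge each (first, second) pair matching `pattern` into one node.

    op_types maps an operation name to its type; successors maps an
    operation name to the operations that consume its output.
    """
    first_type, second_type = pattern
    op_types = dict(op_types)
    successors = {node: list(outs) for node, outs in successors.items()}
    for node in list(op_types):
        if op_types.get(node) != first_type:
            continue  # not a first target operation (or already merged away)
        outs = successors[node]
        if len(outs) != 1 or op_types[outs[0]] != second_type:
            continue
        follower = outs[0]
        producers = sum(follower in other for other in successors.values())
        if producers != 1:
            continue  # the second operation also depends on another output
        # Replace the two kernels by a single merged kernel.
        op_types[node] = first_type + "+" + second_type
        successors[node] = successors.pop(follower)
        del op_types[follower]
    return op_types, successors


fused_types, fused_successors = fuse_pattern(
    {"conv1": "Conv", "relu1": "ReLU", "conv2": "Conv", "relu2": "ReLU", "pool": "MaxPool"},
    {"conv1": ["relu1"], "relu1": ["conv2"], "conv2": ["relu2"], "relu2": ["pool"], "pool": []},
)
```

After merging, the intermediate tensor between each convolution and its ReLU no longer appears in the graph, which is the source of the memory saving mentioned above.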

Referring back to FIG. 2, in the step of optimizing the structure of the neural network model and obtaining instruction information (S2000), the neural network model optimization apparatus 1000 may perform integer conversion on the input data and/or output data of the operations constituting the neural network model. In detail, the neural network model optimization apparatus 1000 may convert floating-point input data and/or output data of an operation into integers within a specific range by using a quantization technique. Here, quantization is a technique of converting a 32-bit floating-point (float) value into an 8-bit integer (int). The neural network model optimization apparatus 1000 may calculate a scale and a zero point for each data tensor of the operations constituting the neural network model, and convert a 32-bit floating-point value into an 8-bit integer value (Int8Value), which has a relatively small size. For example, the neural network model optimization apparatus 1000 may be configured to convert or adjust a 32-bit floating-point value into an 8-bit integer value through the following equation.

Formula: Int8Value = (RealValue / Scale) - ZeroPoint

However, the aforementioned 8-bit integer quantization is just one example, and the neural network model optimization apparatus 1000 may be implemented to convert or adjust an arbitrary floating-point value into a form other than an 8-bit integer, such as zero or an 8-bit natural number, so as to perform optimization suited to the structure of the neural network model.
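The conversion formula can be illustrated with a short Python sketch that follows the sign convention of the formula above (the zero point is subtracted). The rounding, the clamping to the signed 8-bit range, and the way the scale and zero point are derived from a tensor's value range are assumptions added for illustration, as are all function names; the sketch also assumes `real_max` is greater than `real_min`.

```python
def compute_scale_and_zero_point(real_min, real_max, qmin=-128, qmax=127):
    """Derive a scale and zero point so that real_min maps to qmin
    under Int8Value = RealValue / Scale - ZeroPoint (assumed derivation)."""
    scale = (real_max - real_min) / (qmax - qmin)
    zero_point = round(real_min / scale) - qmin
    return scale, zero_point


def quantize(real_value, scale, zero_point, qmin=-128, qmax=127):
    """Int8Value = (RealValue / Scale) - ZeroPoint, rounded and clamped."""
    value = round(real_value / scale) - zero_point
    return max(qmin, min(qmax, value))


def dequantize(int8_value, scale, zero_point):
    """Approximate inverse of quantize under the same convention."""
    return (int8_value + zero_point) * scale


scale, zero_point = compute_scale_and_zero_point(0.0, 127.5)
lowest = quantize(0.0, scale, zero_point)
highest = quantize(127.5, scale, zero_point)
```

For a tensor whose values lie in the range 0.0 to 127.5, this sketch gives a scale of 0.5 and a zero point of 128, so 0.0 maps to -128 and 127.5 maps to 127.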

Referring back to FIG. 3, the step of optimizing the structure of the neural network model and obtaining instruction information (S2000) according to an embodiment of the present application may include obtaining a first memory space map based on the determined execution order and performing optimization related to memory allocation based on the first memory space map (S2400).

In the step of obtaining a first memory space map based on the determined execution order and performing optimization related to memory allocation based on the first memory space map (S2400), the neural network model optimization apparatus 1000 may generate the first memory space map based on the execution order determined in step S2200 and the memory space required for each operation. Specifically, the execution order of the operations constituting the neural network model (e.g., the order of A, B, C, D, E, and F in FIG. 6) may be determined according to the rules described above with reference to FIG. 4. In this case, the neural network model optimization apparatus 1000 may generate the first memory space map based on the memory space related to the size of the data output through each operation. For example, the neural network model optimization apparatus 1000 may place a memory tensor (e.g., T1) associated with operation A in consideration of the memory space required for the data output through operation A and the execution order of the operations (e.g., operation B and operation D) that require the data output through operation A. In addition, the neural network model optimization apparatus 1000 may place a memory tensor (e.g., T2) associated with operation B in consideration of the memory space required for the data output through operation B and the execution order of the operations (e.g., operation C) that require the data output through operation B. In this case, the T2 memory tensor, in which the data output through operation B is stored, may be placed adjacent to the T1 memory tensor associated with operation A, which stores the data required by operation B. In a similar way, the neural network model optimization apparatus 1000 may generate the first memory space map in consideration of the memory space related to the size of the data output through the operations constituting the neural network model (e.g., operations C, D, E, etc.) and the execution order of the operations.

In the above, the operation by which the neural network model optimization apparatus 1000 arranges memory tensors has been described with reference to the first memory space map shown in FIG. 6. However, the first memory space map shown in FIG. 6 is only an example for convenience of explanation, and the first memory space map should not be construed as being limited thereto.
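A memory space map of this kind can be sketched as an offset assigned to each operation's output tensor, chosen so that tensors whose lifetimes overlap do not share addresses. The first-fit placement strategy, the function names, and the sizes below are assumptions for illustration; the present application does not state which placement algorithm is used.

```python
def build_memory_map(order, output_bytes, successors):
    """Assign an offset to each operation's output tensor (first fit).

    A tensor is considered live from the step that produces it until the
    last step that reads it; live tensors may not overlap in memory.
    """
    step = {op: index for index, op in enumerate(order)}
    last_use = {
        op: max((step[c] for c in successors[op]), default=step[op])
        for op in order
    }
    memory_map = {}
    for op in order:
        size = output_bytes[op]
        # Address ranges of tensors that are still needed when `op` runs.
        live = sorted(
            (memory_map[o], output_bytes[o])
            for o in memory_map
            if last_use[o] >= step[op]
        )
        offset = 0
        for start, length in live:
            if offset + size <= start:
                break  # the gap before this live tensor is large enough
            offset = max(offset, start + length)
        memory_map[op] = offset
    return memory_map


def peak_bytes(memory_map, output_bytes):
    """Total arena size needed by the map."""
    return max(memory_map[op] + output_bytes[op] for op in memory_map)


ORDER = ["A", "B", "C", "D", "E", "F"]
SUCCESSORS = {"A": ["B", "D"], "B": ["C"], "C": ["F"], "D": ["E"], "E": ["F"], "F": []}
OUTPUT_BYTES = {"A": 4, "B": 4, "C": 2, "D": 4, "E": 2, "F": 1}  # hypothetical sizes

memory_map = build_memory_map(ORDER, OUTPUT_BYTES, SUCCESSORS)
```

In this example the tensor of operation B is placed directly after the tensor of operation A, and the regions of tensors that are no longer needed are reused, so the map needs 10 units instead of the 17 units that separate buffers would require.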

In the step of obtaining a first memory space map based on the determined execution order and performing optimization related to memory allocation based on the first memory space map (S2400), the neural network model optimization apparatus 1000 may obtain a first memory space map generated based on the execution order of the operations constituting the neural network model and the memory space related to the size of the data output through those operations.

Also, the neural network model optimization apparatus 1000 may perform memory allocation-related optimization based on the first memory space map. As an example, the neural network model optimization apparatus 1000 may use a memory in-placing technique to overwrite the input space of a specific operation (e.g., a ReLU operation, an Add operation, and/or a Sigmoid operation) with the output space of that operation. Specifically, the neural network model optimization apparatus 1000 may be implemented to use the memory in-placing technique to change the memory tensor in which a value input to a specific operation (e.g., the third target operation (operation C) in FIG. 7) is stored (e.g., the T2 memory tensor of FIG. 7) into the memory tensor in which a value output through that operation is stored (e.g., the T3 memory tensor of FIG. 7). Also, the neural network model optimization apparatus 1000 may generate a second memory space map from the first memory space map based on the result of changing the memory tensor. The second memory space map occupies a relatively smaller memory space than the first memory space map. Therefore, through this operation, the neural network model optimization method according to an embodiment of the present application can reduce the total required memory space and increase the execution speed of operations.
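The decision of which operations may write their output over their input can be sketched as follows. The set of in-place capable operation types is taken from the examples above (ReLU, Add, Sigmoid); the safety conditions (the operation must be the last reader of the input tensor and the sizes must match), the operation types assigned to A through F, and all names are assumptions for illustration.

```python
IN_PLACE_OPS = {"ReLU", "Add", "Sigmoid"}


def plan_in_place(order, op_types, predecessors, successors, output_bytes):
    """Map each operation that can overwrite an input tensor to the
    operation that produced that tensor."""
    step = {op: index for index, op in enumerate(order)}
    aliases = {}
    for op in order:
        if op_types[op] not in IN_PLACE_OPS:
            continue
        for producer in predecessors[op]:
            # Overwriting is safe only if no later operation reads the input.
            last_reader = max(successors[producer], key=step.__getitem__)
            same_size = output_bytes[op] == output_bytes[producer]
            if last_reader == op and same_size:
                aliases[op] = producer
                break
    return aliases


ORDER = ["A", "B", "C", "D", "E", "F"]
OP_TYPES = {"A": "Conv", "B": "Conv", "C": "ReLU", "D": "Conv", "E": "Conv", "F": "Add"}
PREDECESSORS = {"A": [], "B": ["A"], "C": ["B"], "D": ["A"], "E": ["D"], "F": ["C", "E"]}
SUCCESSORS = {"A": ["B", "D"], "B": ["C"], "C": ["F"], "D": ["E"], "E": ["F"], "F": []}
OUTPUT_BYTES = {op: 4 for op in ORDER}  # hypothetical, equal-sized tensors

aliases = plan_in_place(ORDER, OP_TYPES, PREDECESSORS, SUCCESSORS, OUTPUT_BYTES)
bytes_not_allocated = sum(OUTPUT_BYTES[op] for op in aliases)
```

Here operation C reuses the region of the tensor produced by operation B, which corresponds to replacing the T2 memory tensor with the T3 memory tensor in FIG. 7; a second memory space map would then omit a separate region for each aliased output.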

In the step of generating an instruction related to the memory address based on the optimization result (S2500), the neural network model optimization apparatus 1000 may generate, based on the optimization result related to the merging of the target operations included in the above-described target operation pattern or to memory allocation, an instruction related to the memory address of the memory tensor corresponding to each operation constituting the neural network model and/or an instruction related to the type of each operation constituting the neural network model.

Referring back to FIG. 2 , the method for optimizing a neural network model according to an embodiment of the present application may include optimizing an embedded device and obtaining optimal code information (S3000).

Hereinafter, with reference to FIGS. 8 and 9 , optimization of a neural network model for an embedded device according to an embodiment of the present application will be described in more detail. According to an embodiment of the present application, it may be implemented to perform optimization of a neural network model for an embedded device using a reinforcement learning technique.

The step of optimizing the embedded device and obtaining optimal code information (S3000) according to an embodiment of the present application may further include: obtaining computing environment information of the embedded device (S3100); obtaining an optimization parameter based on the instruction information through an agent trained by reinforcement learning (S3200); and generating optimal code information to be used in the embedded device based on the optimization parameter (S3300).

In the step of obtaining computing environment information of the embedded device (S3100), the neural network model optimization apparatus 1000 may obtain, through the transceiver 1100, from the embedded device 100 or any external device, computing environment information (e.g., memory information and processor information of the embedded device 100) of the embedded device 100, which is the target device on which the neural network model will be executed. Meanwhile, although not shown in FIG. 8, in the step of obtaining computing environment information of the embedded device (S3100), the neural network model optimization apparatus 1000 may also obtain any information about variables affecting the execution of the neural network model in the embedded device 100, such as device type information or target function information of the embedded device 100.

In the step of obtaining optimization parameters based on instruction information through an agent trained by reinforcement learning (S3200), the neural network model optimization apparatus 1000 may obtain optimization parameters for generating optimal code, using an agent trained with a reinforcement learning technique.

In the step of obtaining optimization parameters based on the instruction information through an agent trained by reinforcement learning (S3200), the neural network model optimization apparatus 1000 may input at least one of memory state information, operation type information included in the instruction information, and embedded device information including computing environment information of the embedded device 100 to the agent, and obtain an optimization parameter output through the agent. Here, the optimization parameter may be any parameter (variable) related to the code necessary to execute the neural network model, including a parameter for selecting the type of algorithm to be performed for the operation, a parameter related to the block size of the operation, and/or a parameter related to the length of the code.

As an example, it is assumed that the operation type input to the agent is convolution. At this time, the trained agent may output a parameter for selecting at least one algorithm from among convolution-related algorithms, for example, the Im2Col algorithm, the default algorithm, and the FFT algorithm, based on the input value. Also, the agent may output a parameter related to the block size of an operation and/or a parameter related to a code length based on an input value. In this case, the apparatus 1000 for optimizing the neural network model may obtain the optimization parameters output through the agent.

As described above, the neural network model optimization apparatus 1000 according to an embodiment of the present application may obtain optimization parameters through an agent trained by reinforcement learning. Specifically, the agent receives operation type information, memory state information, and embedded device information (or target device information) and, based on an initial rule (policy), outputs optimization parameters (e.g., a parameter for selecting an algorithm type, a parameter related to the block size of an operation, and/or a parameter related to the length of the code). Meanwhile, the code generator may obtain the optimization parameters and generate code based on the optimization parameters. The generated code is then executed and its performance is evaluated, and based on the evaluation result, the initial rule of the agent may be updated to maximize the evaluation value of the code performance (i.e., to maximize the performance of the code). Specifically, the agent may be trained by updating its initial rule so that it outputs an optimization parameter that maximizes the evaluation value of the performance of the code. The trained agent may receive operation type information, memory state information, and/or embedded device information and output optimization parameters capable of generating code with maximized performance.
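The training loop described above (output parameters, generate and evaluate code, update the rule) can be sketched with a very small learner. The present application does not specify the reinforcement learning algorithm, so a tabular epsilon-greedy value learner is substituted here purely for illustration. The evaluation function is synthetic and stands in for generating, executing, and measuring real code; the state tuple, the candidate block sizes, and all names are hypothetical.

```python
import random

ALGORITHMS = ["im2col", "default", "fft"]
BLOCK_SIZES = [8, 16, 32]
ACTIONS = [(a, b) for a in ALGORITHMS for b in BLOCK_SIZES]

# Hypothetical agent input: (operation type, free memory in KB, device name).
STATE = ("Conv", 256, "example-mcu")


def evaluate_generated_code(state, action):
    """Synthetic performance score (higher is better). A real system would
    generate code from `action`, run it on the device, and measure it."""
    algorithm, block_size = action
    base = {"im2col": 3.0, "default": 1.0, "fft": 2.0}[algorithm]
    return base - abs(block_size - 16) / 16.0


class TabularAgent:
    def __init__(self, epsilon=0.2, learning_rate=0.5, seed=0):
        self.values = {}  # (state, action) -> estimated score
        self.epsilon = epsilon
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)

    def act(self, state, explore=True):
        """Output optimization parameters for the given state."""
        if explore and self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.values.get((state, a), 0.0))

    def update(self, state, action, reward):
        """Move the estimate for (state, action) toward the observed score."""
        old = self.values.get((state, action), 0.0)
        self.values[(state, action)] = old + self.learning_rate * (reward - old)


def train(agent, state, episodes=2000):
    for _ in range(episodes):
        action = agent.act(state)
        reward = evaluate_generated_code(state, action)
        agent.update(state, action, reward)
    return agent


trained_agent = train(TabularAgent(), STATE)
best_parameters = trained_agent.act(STATE, explore=False)
```

Under the synthetic score used here, the trained agent selects the Im2Col algorithm with a block size of 16 for the convolution state. A practical agent would typically generalize across states with a learned function approximator rather than a lookup table.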

In the step of generating optimal code information to be used in the embedded device based on the optimization parameter (S3300), the neural network model optimization apparatus 1000 may generate optimal code information to be used in the embedded device 100 based on the optimization parameter generated in step S3200 (e.g., a parameter for selecting the type of algorithm to be performed for an operation, a parameter related to the block size of the operation, and/or a parameter related to the code length). In detail, the neural network model optimization apparatus 1000 may generate optimal code, through a code generator, based on an optimization parameter and the memory address of the corresponding operation. For example, when the optimization parameter includes at least one of a parameter for selecting at least one algorithm (e.g., the Im2Col algorithm) from among the Im2Col algorithm, the default algorithm, and the FFT algorithm, a parameter related to a block size value of an operation, and a parameter related to a code length value, the neural network model optimization apparatus 1000 may generate optimal code through the code generator based on the optimization parameter and the memory address of the operation included in the instruction information.

Meanwhile, although not shown in FIG. 8, the step of generating optimal code information to be used in an embedded device based on an optimization parameter (S3300) according to an embodiment of the present application may further include generating code corresponding to the instruction information based on the optimization parameter, and compiling the generated code and converting it into a binary file form. In detail, the neural network model optimization apparatus 1000 may generate code (e.g., C language code) based on the optimization parameter and store the generated code in a memory address included in the instruction information. In addition, the neural network model optimization apparatus 1000 may compile the generated code and convert it into a binary file (e.g., an API file). In addition, the neural network model optimization apparatus 1000 may transmit the binary file of the generated code to the embedded device 100 or an arbitrary external device through the transceiver 1100. According to this embodiment, the optimal code is transmitted to the embedded device 100 in the form of a binary file (e.g., an API file), so that the user of the embedded device 100 can visually check the optimal code.
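The code generation step can be illustrated by a small template-based generator that turns an optimization parameter (here, a block size) and the memory addresses from the instruction information into C source text. This is a sketch under stated assumptions: the function `generate_relu_kernel`, the single shared `arena` buffer, and the choice of an element-wise ReLU kernel are hypothetical, and the compilation into a binary file is outside the scope of the sketch.

```python
def generate_relu_kernel(name, length, in_offset, out_offset, block_size):
    """Emit C source for a blocked ReLU over int8 data in a shared arena.

    in_offset and out_offset play the role of the memory addresses in the
    instruction information; equal offsets give an in-place kernel.
    """
    if length % block_size != 0:
        raise ValueError("length must be a multiple of block_size in this sketch")
    return f"""#include <stdint.h>

void {name}(int8_t *arena) {{
    const int8_t *in = arena + {in_offset};
    int8_t *out = arena + {out_offset};
    for (int base = 0; base < {length}; base += {block_size}) {{
        for (int i = base; i < base + {block_size}; ++i) {{
            out[i] = in[i] > 0 ? in[i] : 0;
        }}
    }}
}}
"""


source = generate_relu_kernel("op_c_relu", 64, 4096, 4096, 16)
```

The resulting string could then be written to a file and passed to a compiler for the target device; which toolchain is used depends on the embedded device and is not specified here.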

Instructions for the various operations of the neural network model optimization apparatus 1000 described above may be stored in the memory 1200 of the neural network model optimization apparatus 1000, and the processor 1300 of the neural network model optimization apparatus 1000 may be provided to perform the operations stored in the memory 1200.

The neural network model optimization method, the neural network model optimization apparatus, and the neural network model optimization system disclosed in this application can be used for the efficient execution of artificial intelligence models in various embedded systems, including home appliances, vehicle sensors, products for the safety of infants or the elderly, and smart watches.

The features, structures, effects, etc. described in the embodiments above are included in at least one embodiment of the present invention, and are not necessarily limited to only one embodiment. Furthermore, the features, structures, effects, etc. illustrated in each embodiment can be combined or modified with respect to other embodiments by a person having ordinary knowledge in the field to which the embodiments belong. Therefore, contents related to these combinations and variations should be construed as being included in the scope of the present invention.

In addition, although embodiments have been described above, they are only examples and do not limit the present invention, and those of ordinary skill in the art to which the present invention pertains will appreciate that various modifications and applications not exemplified above are possible without departing from the essential characteristics of the embodiments. That is, each component specifically shown in the embodiments can be modified and implemented. Differences related to these modifications and applications should be construed as being included in the scope of the present invention as defined in the appended claims.

Claims

A method for optimizing a neural network model by an apparatus for optimizing a neural network model based on execution data of a trained neural network model in consideration of a computing environment of an embedded device in which the neural network model is to be driven,

Acquiring execution data of a neural network model for which learning has been completed, wherein the execution data includes at least one of layer data of the neural network model, operation data constituting the neural network model, and parameters of the neural network model;

performing optimization on the structure of the neural network model based on execution data of the neural network model and obtaining instruction information, wherein the instruction information includes information related to at least one of an operation type and a memory address;

optimizing an embedded device in which the neural network model is to be driven based on the instruction information and obtaining optimal code information; and

Transmitting the optimal code information; including,

Obtaining the optimal code information,

obtaining computing environment information of the embedded device;

obtaining an optimization parameter from instruction information through an agent trained by reinforcement learning; and

Generating code information to be used in the embedded device based on the optimization parameter; Further comprising,

Neural network model optimization methods.
According to claim 1,

Obtaining the optimization parameters,

inputting at least one of at least one operation type information corresponding to the operation, memory state information, and computing environment information of the embedded device to the agent; and

Acquiring an optimization parameter output through the agent; further comprising,

Methods for optimizing neural network models.
According to claim 2,

The optimization parameter is,

Associated with at least one of a parameter for selecting the type of algorithm to be performed for the operation, a parameter related to the block size of the operation, and a parameter related to the length of the code,

Neural network model optimization methods.
According to claim 1,

the agent,

According to the initial rule, based on target device information related to the computing environment of the target embedded device, memory state information, and at least one algorithm type information corresponding to an operation, a predicted value related to a parameter is output,

The agent is trained by updating the initial rule so that the evaluation value for the performance of the code generated through the predicted value is maximized.

Neural network model optimization methods.
According to claim 1,

The step of generating code information to be used in the embedded device,

generating code to correspond to the instruction information based on the optimization parameter; and

Compiling the generated code and converting it into a binary file form; further comprising,

Neural network model optimization methods.
A method for optimizing a neural network model by an apparatus for optimizing a neural network model based on execution data of a trained neural network model in consideration of a computing environment of an embedded device in which the neural network model is to be driven,

Obtaining execution data of a neural network model on which training has been completed, wherein the execution data includes at least one of layer data of the neural network model, operation data constituting the neural network model, and parameters of the neural network model;

optimizing the structure of the neural network model based on execution data of the neural network model and obtaining instruction information;

optimizing an embedded device in which the neural network model is to be driven based on the instruction information and obtaining optimal code information; and

Transmitting the optimal code information; including,

The step of optimizing the structure of the neural network model and obtaining instruction information,

generating a directed acyclic graph (DAG) based on the execution data;

determining an execution order of operations based on the directed acyclic graph;

detecting a target operation pattern of the directed acyclic graph corresponding to the reference operation pattern based on a predetermined reference operation pattern, and merging a first target operation and a second target operation included in the target operation pattern;

obtaining a first memory space map based on the determined execution order, and performing optimization related to memory allocation based on the first memory space map; and

Generating an instruction related to the memory address based on the result of performing the optimization; further comprising,

Neural network model optimization methods.
According to claim 6,

The merging of the first target operation and the second target operation comprises:

obtaining predetermined reference calculation pattern information, wherein the reference calculation pattern information includes a first calculation and a second calculation associated with the first calculation;

detecting the first target operation corresponding to the first operation and the second target operation corresponding to the second operation from the directed acyclic graph based on the reference operation pattern information; and

Further comprising merging the first target operation and the second target operation and converting a kernel based on the merging result.

Methods for optimizing neural network models.
According to claim 6,

wherein the step of performing the optimization related to the memory allocation further comprises:

generating the first memory space map based on the determined execution order and the size of data output by each operation;

changing a first memory tensor storing a value input to a third target operation into a second memory tensor storing a value output from the third target operation; and

generating a second memory space map from the first memory space map based on the change result,

Neural network model optimization method.
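A minimal sketch of the two memory space maps follows. It assumes a simple bump allocator for the first map and assumes that the third target operation can write its output over its input tensor (for example an element-wise activation whose input is not needed afterwards). The function names, byte sizes, and the `inplace_input` mapping are hypothetical. The second map is rebuilt from the same inputs rather than patched from the first, which is a simplification of the claimed step.

```python
def first_memory_map(order, out_size):
    """Give each operation's output a fresh offset, in execution order."""
    offsets, cursor = {}, 0
    for op in order:
        offsets[op] = cursor
        cursor += out_size[op]
    return offsets, cursor  # cursor is the total number of bytes required

def second_memory_map(order, out_size, inplace_input):
    """Re-plan so that each operation in inplace_input reuses its input's memory."""
    offsets, cursor = {}, 0
    for op in order:
        src = inplace_input.get(op)
        if src is not None and out_size[src] >= out_size[op]:
            offsets[op] = offsets[src]  # output overwrites the input tensor
        else:
            offsets[op] = cursor
            cursor += out_size[op]
    return offsets, cursor

order = ["conv1", "relu1", "conv2"]
out_size = {"conv1": 64, "relu1": 64, "conv2": 32}  # bytes, hypothetical

first = first_memory_map(order, out_size)
second = second_memory_map(order, out_size, {"relu1": "conv1"})
print(first)
print(second)
```

With these sizes the first map needs 160 bytes and the second needs 96, because the output of `relu1` shares the address of `conv1`'s output.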
According to claim 6,

wherein determining the execution order of the operations comprises:

computing a first memory space required for a fourth target operation included in a first branch of the directed acyclic graph and a second memory space required for a fifth target operation included in a second branch of the directed acyclic graph;

comparing the first memory space and the second memory space; and

determining an execution order of the fourth target operation and the fifth target operation according to the comparison result,

Neural network model optimization method.
According to claim 9,

when the first memory space is larger than the second memory space, the execution order of the fourth target operation is assigned a lower priority than the execution order of the fifth target operation; and

when the first memory space is smaller than the second memory space, the execution order of the fourth target operation is assigned a higher priority than the execution order of the fifth target operation,

Neural network model optimization method.
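The priority rule of this claim amounts to running the operation with the smaller memory requirement first. A minimal sketch, with hypothetical operation names and byte counts, is given below. The claim does not state what happens when the two memory spaces are equal; the sketch keeps the original order in that case, which is an assumption.

```python
def order_branches(required_memory):
    """Order branch operations so that a smaller memory requirement runs first.

    required_memory: {operation name: memory space required}.
    sorted() is stable, so operations with equal requirements keep their
    original relative order (an assumption, not stated in the claim).
    """
    return sorted(required_memory, key=lambda op: required_memory[op])

# First memory space larger than the second: the fifth operation runs first.
print(order_branches({"op4": 256, "op5": 128}))

# First memory space smaller than the second: the fourth operation runs first.
print(order_branches({"op4": 64, "op5": 128}))
```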
According to claim 10,

wherein the step of optimizing the structure of the neural network model and obtaining instruction information further comprises:

obtaining input data and output data related to the operation of the neural network model; and

adjusting the input data and the output data to values corresponding to a predetermined range of integers,

Neural network model optimization method.
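One common way to adjust data to a predetermined integer range is affine quantization. The sketch below assumes the range [-128, 127] (signed 8-bit) and a min/max-based scale and zero point; the claim itself names neither the range nor the mapping, so both are assumptions made for illustration.

```python
def quantize(values, qmin=-128, qmax=127):
    """Map floating-point values onto integers in [qmin, qmax].

    Returns the quantized values together with the scale and zero point
    needed to map them back approximately: v ~ (q - zero_point) * scale.
    """
    lo, hi = min(values), max(values)
    # Fall back to a scale of 1.0 when all values are identical.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    quantized = [
        max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point

q, scale, zero_point = quantize([0.0, 0.5, 1.0])
print(q, scale, zero_point)
```

The smallest input lands on the lower bound of the range and the largest on the upper bound, and the explicit clamp guards against rounding past either end.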
A computer-readable recording medium on which is recorded a program for executing, on a computer, the method according to any one of claims 1 to 11.
A neural network model optimization apparatus for optimizing a neural network model based on execution data of a trained neural network model and a computing environment of an embedded device in which the neural network model is to be driven,

a transceiver for acquiring execution data of the trained neural network model and computing environment information of an embedded device in which the neural network model is to be driven; and

a processor configured to perform optimization of the neural network model based on the execution data and the computing environment information of the embedded device,

wherein the processor is configured to:

obtain execution data of a neural network model on which training has been completed, wherein the execution data includes at least one of layer data of the neural network model, operation data constituting the neural network model, and parameters of the neural network model; perform optimization of the structure of the neural network model based on the execution data of the neural network model and obtain instruction information including information related to at least one of a type of operation and a memory address; perform, based on the instruction information, optimization of the embedded device in which the neural network model is to be driven and obtain optimal code information; and transmit the optimal code information,

wherein the processor is further configured to:

obtain the optimal code information by obtaining computing environment information of the embedded device, obtaining optimization parameters from the instruction information through an agent trained through reinforcement learning, and generating code information to be used in the embedded device based on the optimization parameters,

Neural network model optimization apparatus.
A neural network model optimization apparatus for optimizing a neural network model based on execution data of a trained neural network model and a computing environment of an embedded device in which the neural network model is to be driven,

a transceiver for acquiring execution data of the trained neural network model and computing environment information of an embedded device in which the neural network model is to be driven; and

a processor configured to perform optimization of the neural network model based on the execution data and the computing environment information of the embedded device,

wherein the processor is configured to:

obtain execution data of a neural network model on which training has been completed, wherein the execution data includes at least one of layer data of the neural network model, operation data constituting the neural network model, and parameters of the neural network model; perform optimization of the structure of the neural network model based on the execution data of the neural network model and obtain instruction information; perform, based on the instruction information, optimization of the embedded device in which the neural network model is to be driven and obtain optimal code information; and transmit the optimal code information,

wherein the processor is further configured to:

obtain the instruction information by generating a directed acyclic graph (DAG) based on the execution data, determining an execution order of operations based on the directed acyclic graph, detecting a target operation pattern of the directed acyclic graph corresponding to a predetermined reference operation pattern, merging a first target operation and a second target operation included in the target operation pattern, obtaining a first memory space map based on the determined execution order, performing optimization related to memory allocation based on the first memory space map, and generating instructions related to an execution order and a memory address based on a result of performing the optimization,

Neural network model optimization apparatus.