WO2023038159A1

WO2023038159A1 - Method and system for optimizing deep-learning model through layer-by-layer lightening

Info

Publication number: WO2023038159A1
Application number: PCT/KR2021/012100
Authority: WO
Inventors: 심경환; 김태호
Original assignee: 주식회사 노타
Priority date: 2021-09-07
Filing date: 2021-09-07
Publication date: 2023-03-16
Also published as: KR102552478B1; KR20230038636A

Abstract

Disclosed are a method and system for optimizing a deep-learning model through layer-by-layer lightening. The method for optimizing a deep-learning model according to an embodiment may comprise the steps of: determining the importance of each layer included in the deep-learning model on the basis of the degree of performance degradation of the deep-learning model when independently lightening each of the layers; and lightening the deep-learning model according to the importance determined for each of the layers.

Description

Deep learning model optimization method and system through weight reduction by layer

The following description relates to a deep learning model optimization method and system through layer-by-layer weight reduction.

Representative techniques for lightening deep learning models include network pruning, filter decomposition, and quantization. Network pruning reduces the size of a model or accelerates it by removing parameters judged to be of low importance. The removed parameters may be weight vectors, entire channels, or entire layers; that is, any component of the model. Filter decomposition lightens a model by approximating its multi-dimensional weights with a small number of coefficients; examples include Tucker decomposition, canonical polyadic decomposition (CP decomposition), and singular value decomposition (SVD). Quantization lightens a model by representing and approximating its weights with fewer bits than the original representation.
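As a rough illustration of why approximating weights with fewer coefficients lightens a model, the sketch below counts the coefficients of a full weight matrix and of a rank-r factorization of the same shape (the form a truncated SVD produces). The function names and the 512 x 512 shape are assumptions chosen for the example and do not come from the description.

```python
def dense_params(rows: int, cols: int) -> int:
    """Number of coefficients in a full rows x cols weight matrix."""
    return rows * cols


def low_rank_params(rows: int, cols: int, rank: int) -> int:
    """Coefficients in a factorization U (rows x rank) times V (rank x cols)."""
    return rank * (rows + cols)


# Hypothetical layer shape and rank, chosen only for illustration.
full = dense_params(512, 512)
compact = low_rank_params(512, 512, 32)
print(full, compact, compact / full)
```

With these assumed numbers the factorized form stores one eighth of the original coefficients; how much accuracy survives depends on the layer, which is the question the method described below addresses.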

To lighten a specific model with the above techniques, an algorithm is required that efficiently allocates the degree of lightening to each layer; even at the same overall degree of lightening, performance differs depending on the quality of that algorithm. For each technique, the degree of lightening of a layer is determined by how much network pruning is performed within the layer, how many coefficients are used to approximate the weights of the existing model (filter decomposition), or how few bits are used to approximate the weights (quantization). Allocating these degrees across layers corresponds to an NP-complete (nondeterministic polynomial-time complete) problem.

Because the characteristics of layers vary widely even within the same model, performance is not guaranteed when every layer is lightened to the same degree or to an arbitrary value chosen by the user. In addition, a way of determining the degree of lightening for each layer is required that is independent of the characteristics of each lightening technique.

However, various techniques for the purpose of weight reduction and acceleration continue to develop, and there is a problem in that it is very difficult to measure the degree of weight reduction for each layer suitable for each weight reduction technique.

[Prior document number]

Korean Patent Publication No. 10-2018-0013674

Provided are a deep learning model optimization method and system that can determine the degree of lightening for each layer independently of the lightening technique used.

Provided is a deep learning model optimization method performed by a computer device including at least one processor, the method comprising: determining, by the at least one processor, an importance of each of the layers included in the deep learning model based on a degree of performance degradation of the deep learning model when each of the layers is independently lightened; and lightening, by the at least one processor, the deep learning model according to the importance determined for each of the layers.

According to one aspect, the determining of the importance may include: a first step of generating a lightened first layer by lightening a first layer among the layers using a first value among preset values of a parameter for controlling an amount of compression; a second step of generating a first instance model of the deep learning model in which the first layer is replaced with the lightened first layer; and a third step of determining whether to select the first value as the importance of the first layer by comparing the performance of the first instance model with a difference between the performance of the deep learning model and a preset tolerance for performance degradation.

According to another aspect, the determining of the importance may further include a fourth step of re-performing the first to third steps on the first layer using a second value among the preset values of the parameter when the first value is not selected as the importance of the first layer.

According to another aspect, the step of determining the importance may further include a fifth step of repeatedly performing the first to fourth steps for a second layer among the layers.

According to another aspect, the permissible value of performance degradation may be preset based on a user's input.

According to another aspect, when the deep learning model is lightened using network pruning, the importance may indicate whether the layer corresponding to the importance is pruned or whether a weight vector of that layer is pruned.

According to another aspect, when the deep learning model is lightened using filter decomposition, the importance may indicate a coefficient to which a weight of the layer corresponding to the importance is approximated.

According to another aspect, when the deep learning model is lightened using quantization, the importance may indicate the number of bits representing the weight of the layer corresponding to the importance.

Provided is a computer program stored in a computer-readable recording medium to execute the method on a computer device in combination with the computer device.

Provided is a computer-readable recording medium on which a program for executing the method on a computer device is recorded.

Provided is a computer device including at least one processor implemented to execute computer-readable instructions, wherein the at least one processor determines the importance of each of the layers included in a deep learning model based on the degree of performance degradation of the deep learning model when each of the layers is independently lightened, and lightens the deep learning model according to the importance determined for each of the layers.

The degree of weight reduction for each layer can be determined independently of each weight reduction technique.

FIG. 1 is a block diagram illustrating an example of a computer device according to one embodiment of the present invention.

FIG. 2 is a block diagram showing an example of the internal configuration of a deep learning model optimization system according to an embodiment of the present invention.

FIG. 3 is a flowchart illustrating an example of a deep learning model optimization method according to an embodiment of the present invention.

FIG. 4 is a flowchart illustrating an example of a process of determining importance according to an embodiment of the present invention.

[Correction under Rule 91 02.11.2021]
[delete]

Hereinafter, an embodiment will be described in detail with reference to the accompanying drawings.

A deep learning model optimization system according to embodiments of the present invention may be implemented by at least one computer device. A computer program according to an embodiment of the present invention may be installed and run on the computer device, and the computer device may perform the deep learning model optimization method according to the embodiments of the present invention under the control of the running computer program. The computer program may be stored in a computer-readable recording medium so as to execute the deep learning model optimization method on a computer in combination with the computer device.

FIG. 1 is a block diagram illustrating an example of a computer device according to one embodiment of the present invention. As shown in FIG. 1, a computer device 100 may include a memory 110, a processor 120, a communication interface 130, and an I/O interface 140. The memory 110 is a computer-readable recording medium and may include a random access memory (RAM), a read only memory (ROM), and a permanent mass storage device such as a disk drive. Here, a permanent mass storage device such as a ROM or a disk drive may also be included in the computer device 100 as a separate permanent storage device distinct from the memory 110. An operating system and at least one program code may be stored in the memory 110. These software components may be loaded into the memory 110 from a computer-readable recording medium separate from the memory 110. Such a separate computer-readable recording medium may include a floppy drive, a disk, a tape, a DVD/CD-ROM drive, or a memory card. In another embodiment, software components may be loaded into the memory 110 through the communication interface 130 rather than from a computer-readable recording medium. For example, software components may be loaded into the memory 110 of the computer device 100 based on a computer program installed by files received through a network 160.

The processor 120 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input/output operations. Instructions may be provided to the processor 120 by the memory 110 or the communication interface 130. For example, the processor 120 may be configured to execute received instructions according to program code stored in a recording device such as the memory 110.

The communication interface 130 may provide a function for the computer device 100 to communicate with other devices through the network 160. For example, a request, command, data, or file generated by the processor 120 of the computer device 100 according to program code stored in a recording device such as the memory 110 may be transmitted to other devices through the network 160 under the control of the communication interface 130. Conversely, signals, commands, data, or files from other devices may be received by the computer device 100 through the network 160 and the communication interface 130 of the computer device 100. Signals, commands, and data received through the communication interface 130 may be transmitted to the processor 120 or the memory 110, and files may be stored in a storage medium (the permanent storage device described above) that the computer device 100 may further include.

The input/output interface 140 may be a means for interface with the input/output device (I/O device, 150). For example, the input device may include a device such as a microphone, keyboard, or mouse, and the output device may include a device such as a display or speaker. As another example, the input/output interface 140 may be a means for interface with a device in which functions for input and output are integrated into one, such as a touch screen. The input/output device 150 and the computer device 100 may be configured as one device.

Also, in other embodiments, computer device 100 may include fewer or more elements than those of FIG. 1 . However, there is no need to clearly show most of the prior art components. For example, the computer device 100 may be implemented to include at least a portion of the above-described input/output device 150 or may further include other components such as a transceiver and a database.

FIG. 2 is a block diagram showing an example of the internal configuration of a deep learning model optimization system according to an embodiment of the present invention, and FIG. 3 is a flowchart showing an example of a deep learning model optimization method according to an embodiment of the present invention.

The deep learning model optimization system 200 according to this embodiment may be implemented by at least one computer device 100. The deep learning model optimization system 200 of FIG. 2 may include an importance determination unit 210 and a weight reduction unit 220. Here, the importance determination unit 210 and the weight reduction unit 220 may be functional expressions of the functions that the processor 120 of the computer device 100 implementing the deep learning model optimization system 200 performs under the control of a computer program. For example, the processor 120 of the computer device 100 may be implemented to execute control instructions according to code of an operating system included in the memory 110 or code of at least one computer program. Here, the processor 120 may control the computer device 100 so that the computer device 100 performs steps 310 and 320 included in the method of FIG. 3 according to control commands provided by code stored in the computer device 100. The importance determination unit 210 and the weight reduction unit 220 may be used as functional expressions of the processor 120 for performing steps 310 and 320, respectively.

In step 310, the importance determination unit 210 may determine the importance of each layer based on the degree of performance degradation of the deep learning model when each of the layers included in the deep learning model is independently lightened. For example, the importance determination unit 210 may measure the performance of the original deep learning model, generate a first instance model of the deep learning model in which only the first layer is lightened, measure the performance of the first instance model, and determine the importance of the first layer based on how much the performance of the first instance model is degraded compared to the original deep learning model. The process of determining the importance of each of the layers is described in detail later with reference to FIG. 4.

In step 320, the weight reduction unit 220 may lighten the deep learning model according to the importance determined for each of the layers. For example, when the deep learning model is lightened using network pruning, the importance may indicate whether the layer corresponding to the importance is pruned or whether a weight vector of that layer is pruned. Since pruning of each of the layers or of each weight vector of the layers is determined through an absolute index of performance, the weight reduction unit 220 may determine the degree of lightening for each layer for the network pruning technique. When the deep learning model is lightened using filter decomposition, the importance may indicate a coefficient to which the weight of the layer corresponding to the importance is approximated. Since the coefficient to which the weight is approximated for each layer is determined through an absolute index of performance, the weight reduction unit 220 may determine the degree of lightening for each layer for the filter decomposition technique. When the deep learning model is lightened using quantization, the importance may indicate the number of bits representing the weight of the layer corresponding to the importance. Since the number of bits representing the weight of each layer is determined through an absolute index of performance, the weight reduction unit 220 may determine the degree of lightening for each layer for the quantization technique. As such, the deep learning model optimization system 200 can determine the degree of lightening for each layer independently of the lightening technique, and can thereby provide a deep learning model optimization method with guaranteed performance.
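To make the technique-specific meaning of the importance more concrete, the following is a minimal sketch of two per-layer compressors: one reads the importance as the fraction of weights kept by magnitude pruning, the other as the number of bits in a uniform quantizer. Both functions, their names, and the flat list used as a layer are assumptions made for illustration; the description does not prescribe a particular pruning criterion or quantization scheme, and filter decomposition is omitted here.

```python
def prune_by_magnitude(weights, keep_ratio):
    """Keep the largest-magnitude fraction `keep_ratio` of weights, zero the rest."""
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")
    keep = max(1, round(len(weights) * keep_ratio))
    # Indices ordered from largest to smallest absolute value.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(ranked[:keep])
    return [w if i in kept else 0.0 for i, w in enumerate(weights)]


def quantize_uniform(weights, bits):
    """Snap weights to 2**bits evenly spaced levels between their min and max."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((w - lo) / step) * step for w in weights]


# Hypothetical layer weights, used only to exercise the two functions.
layer = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
pruned = prune_by_magnitude(layer, 0.5)
quantized = quantize_uniform(layer, 2)
print(pruned)
print(quantized)
```

Under this reading, a pruning importance of 1 leaves the layer untouched and smaller values remove more weights, while a quantization importance of 2 leaves at most four distinct weight values.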

FIG. 4 is a flowchart illustrating an example of a process of determining importance according to an embodiment of the present invention. Steps 410 to 450 of FIG. 4 may be included in step 310 of FIG. 3 and performed by the importance determination unit 210.

In step 410, the importance determination unit 210 may generate a lightened first layer by lightening the first layer among the layers using a first value among preset values of a parameter for controlling the amount of compression. The set of preset values may vary according to the lightening technique to be used for lightening the deep learning model.

In step 420, the importance determination unit 210 may generate a first instance model of the deep learning model in which the first layer is replaced with the lightened first layer. In other words, the importance determination unit 210 may generate the first instance model by replacing the first layer of the deep learning model with the first layer lightened in step 410.
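Step 420 can be pictured as a copy-and-swap operation. The sketch below assumes, purely for illustration, that a model is a dictionary mapping layer names to weight lists; the helper name replace_layer and the layer names are hypothetical. The point it shows is that the instance model is a separate object, so the original model stays intact for the next candidate value or the next layer.

```python
import copy


def replace_layer(model, layer_name, new_layer):
    """Return a copy of `model` with one layer swapped; the original is untouched."""
    if layer_name not in model:
        raise KeyError(layer_name)
    instance = copy.deepcopy(model)
    instance[layer_name] = new_layer
    return instance


# Hypothetical two-layer model and a lightened version of its first layer.
model = {"conv1": [0.9, -0.05, 0.4], "fc": [0.3, 0.01]}
instance = replace_layer(model, "conv1", [0.9, 0.0, 0.4])
print(instance["conv1"], model["conv1"])
```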

In step 430, the importance determination unit 210 may compare the performance of the first instance model with the difference between the performance of the deep learning model and a preset tolerance for performance degradation.

In step 440, the importance determination unit 210 may determine whether to select the first value as the importance of the first layer.

For example, the preset values are values of the parameter for controlling the amount of compression and may be arranged in order of increasing compression. In step 410, the importance determination unit 210 may then select and use the values sequentially, starting from the value with the smallest amount of compression. For example, assuming that the set of preset values for network pruning is {1, 0.9, 0.8, …, 0.1}, the importance determination unit 210 may first select '1' as the first value and perform steps 410 to 440. In this case, when the performance of the first instance model including the first layer lightened using the first value is less than or equal to the difference between the performance of the deep learning model and the preset performance degradation tolerance (that is, when the performance degradation is close to the given tolerance), the first value may be selected as the importance of the first layer.

If the first value is not selected as the importance of the first layer (that is, when the performance of the first instance model exceeds the difference between the performance of the deep learning model and the preset performance degradation tolerance), the importance determination unit 210 may re-perform steps 410 to 440. In the re-performed step 410, the importance determination unit 210 may lighten the first layer using a second value, different from the first value, among the preset values of the parameter. As an example, assuming that the set of preset values for network pruning is {1, 0.9, 0.8, …, 0.1}, if '1' selected as the first value is not selected as the importance of the first layer, the importance determination unit 210 may select '0.9' as the second value and perform steps 410 to 440 again. For example, the importance determination unit 210 may generate a lightened first layer by lightening the first layer using the second value in the re-performed step 410, and may generate, in the re-performed step 420, a second instance model of the deep learning model in which the first layer is replaced with the lightened first layer. Thereafter, in the re-performed step 430, the importance determination unit 210 may compare the performance of the second instance model with the difference between the performance of the deep learning model and the preset tolerance for performance degradation. For example, if the performance of the second instance model is less than or equal to the difference, the importance determination unit 210 may select the second value as the importance of the first layer in the re-performed step 440. On the other hand, when the second value is not selected as the importance of the first layer, the importance determination unit 210 may perform steps 410 to 440 again using a third value (e.g., 0.8) different from the first and second values among the preset values of the parameter. This re-execution may continue until the importance is determined through step 440 or until steps 410 to 440 have been performed for all preset values of the parameter.
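The retry sequence for a single layer can be traced with a small worked example. The sketch below is illustrative only: the performance figures are invented, the function name select_importance is hypothetical, and returning None when no preset value qualifies is an assumption, since the description does not state what importance is assigned in that case. The comparison itself follows the rule stated above: a value is selected once the instance model's performance is at or below the original performance minus the tolerance.

```python
def select_importance(candidates, performance_of, original_perf, tolerance):
    """Return the first candidate whose instance-model performance is at or
    below original_perf - tolerance, or None if no candidate qualifies."""
    threshold = original_perf - tolerance
    for theta in candidates:  # ordered from least to most compression
        if performance_of(theta) <= threshold:
            return theta
    return None


# Invented measurements for one layer, keyed by the pruning parameter.
measured = {1.0: 0.920, 0.9: 0.915, 0.8: 0.905, 0.7: 0.890, 0.6: 0.850}
chosen = select_importance(list(measured), measured.get,
                           original_perf=0.92, tolerance=0.02)
print(chosen)
```

With these invented figures the values 1, 0.9, and 0.8 are tried and rejected in turn because their performance stays above 0.90, and 0.7 is the first value selected.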

In step 450, the importance determination unit 210 may check whether the importance has been determined for each of the layers. If there is a layer whose importance has not been determined, steps 410 to 440 may be re-performed on the next layer. If the importance has been determined for every layer, the process of determining the importance may end and step 320 of FIG. 3 may be performed.
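Because steps 410 to 450 handle one layer at a time and try each preset value at most once per layer, the number of instance models evaluated is bounded by the number of layers times the number of preset values. The short sketch below contrasts that bound with the number of ways of assigning one preset value to every layer jointly; the function names and the figures of 50 layers and 10 preset values are assumptions for illustration.

```python
def layerwise_evaluations(num_layers: int, num_values: int) -> int:
    """Upper bound on instance models evaluated when layers are handled one by one."""
    return num_layers * num_values


def joint_allocations(num_layers: int, num_values: int) -> int:
    """Number of ways to assign one preset value to every layer at once."""
    return num_values ** num_layers


# Hypothetical model size: 50 layers, 10 preset values per layer.
print(layerwise_evaluations(50, 10))
print(joint_allocations(50, 10))
```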

Table 1 below shows an example of an algorithm of a deep learning model optimization method.

Table 1 shows an example of an algorithm for lightening the deep learning model M by independently determining the importance of each of the layers included in the set L (1, 2, ..., l ∈ L) of layers of the deep learning model M. Here, θ may be a parameter for adjusting the amount of compression of the model, and it changes according to the compression method F used to obtain a compressed model M' or a compressed layer l'. As an example, θ ∈ (0, 1] (e.g., θ ∈ [1, 0.9, 0.8, …, 0.1]) in the case of network pruning, θ ∈ [0, 1, None] (e.g., θ ∈ [1, 0.9, 0.8, …, 0.1]) in the case of filter decomposition, and θ ∈ [2, 4, 8, 16, 32] in the case of quantization (where 2, 4, and 8 may mean the number of bits representing an integer, and 16 and 32 may mean the number of bits representing a real number (floating point)). The function infer() and the dataset D can be used to measure the performance of the deep learning model M or the compressed model M'. In addition, δ may mean the performance degradation tolerance provided by the user, and Θ may mean the set of preset values of the parameter for adjusting the amount of compression.
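One way to hold the preset sets Θ is a lookup keyed by compression method, as in the hedged sketch below. The dictionary name, the keys, and the helper number_format are hypothetical. Listing the quantization widths from 32 down to 2 is an assumption that applies the "least compression first" ordering to bit widths; the description itself lists them as [2, 4, 8, 16, 32].

```python
# Candidate values of theta per lightening technique, listed from least to
# most compression (the ordering for quantization is an assumption).
PRESET_VALUES = {
    "network_pruning": [round(1 - 0.1 * i, 1) for i in range(10)],
    "filter_decomposition": [round(1 - 0.1 * i, 1) for i in range(10)],
    "quantization": [32, 16, 8, 4, 2],
}


def number_format(bits: int) -> str:
    """Per the description, 16 and 32 bits denote floating point; 2, 4, 8 integers."""
    return "floating point" if bits in (16, 32) else "integer"


print(PRESET_VALUES["network_pruning"])
print([(b, number_format(b)) for b in PRESET_VALUES["quantization"]])
```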

In Table 1, the first row may be an example of declaring an array “Compressed” in which the importance determined for each of the layers included in the deep learning model M will be stored. The second row may be an example of measuring the performance “OrigAcc” of the deep learning model M by inputting the deep learning model M and the dataset D to the function infer(). The third row may mean that the following rows are repeated for every element of the set L of layers of the deep learning model M, that is, for each of the layers, and the fourth row may mean that the following rows are repeated for every element of the set Θ of preset values of the parameter for adjusting the amount of compression. The fifth row shows an example of generating a compressed layer l' by independently compressing (lightening) the layer l with a specific θ. Here, θ may be selected in order starting from the smallest amount of compression. The sixth row shows an example of generating a compressed deep learning model M' by replacing the layer l of the deep learning model M with the compressed layer l' using the function replace() for replacing a layer. The seventh row shows an example of measuring performance, such as accuracy, of the compressed deep learning model M' using the dataset D. In the eighth row, when the measured performance is less than or equal to the performance of the deep learning model M minus the performance degradation tolerance δ, the current value of θ for layer l may be stored in the array “Compressed” as the importance, as in the ninth row. Since θ is selected in row 5 in order starting from the smallest amount of compression, the specific θ at which the measured performance becomes less than or equal to the performance of the deep learning model M minus the degradation tolerance δ yields a compressed layer l' that has a performance degradation close to the tolerance while having the highest compression rate. After the current value of θ for layer l is determined as the importance, the “break” in the tenth row exits the repetition of the fourth row, and the importance for the other layers may be determined through the third row. Row 11 may mean the end of the “if” statement of row 8, row 12 the end of the “for” statement of row 4, and row 13 the end of the “for” statement of row 3. When a value of θ has been determined as the importance for every layer and the array “Compressed” is complete, the deep learning model M may be compressed in the 14th row through the compression method F according to the importances in the array “Compressed”, and the final compressed deep learning model M' may be generated.
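The row-by-row description above can be sketched in Python as follows. This is an illustrative reading of the described rows, not a reproduction of Table 1: the toy model (a dictionary of weight lists), the magnitude-pruning function standing in for the compression method F, the linear predict helper, and the performance score used by infer are all assumptions. The names compressed, orig_acc, infer, and replace mirror “Compressed”, “OrigAcc”, infer(), and replace(). Rows 11 to 13 only close the “if” and “for” blocks, which Python expresses through indentation, and leaving a layer uncompressed when no θ is selected is an assumption.

```python
def prune(weights, keep_ratio):
    """Stand-in for compression method F: keep the largest-magnitude weights."""
    keep = max(1, round(len(weights) * keep_ratio))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(order[:keep])
    return [w if i in kept else 0.0 for i, w in enumerate(weights)]


def replace(model, name, new_layer):
    """Return a copy of the model with layer `name` swapped for `new_layer`."""
    instance = dict(model)
    instance[name] = new_layer
    return instance


def predict(model, x):
    """Toy model output: dot product of all weights with the input vector."""
    flat = [w for name in sorted(model) for w in model[name]]
    return sum(w * xi for w, xi in zip(flat, x))


def infer(model, dataset):
    """Toy performance score: 1 minus the mean absolute prediction error."""
    errors = [abs(predict(model, x) - y) for x, y in dataset]
    return 1.0 - sum(errors) / len(errors)


def optimize(model, dataset, thetas, delta):
    compressed = {}                                  # row 1
    orig_acc = infer(model, dataset)                 # row 2
    for name in model:                               # row 3
        for theta in thetas:                         # row 4, least compression first
            light = prune(model[name], theta)        # row 5
            instance = replace(model, name, light)   # row 6
            acc = infer(instance, dataset)           # row 7
            if acc <= orig_acc - delta:              # row 8
                compressed[name] = theta             # row 9
                break                                # row 10
    # Row 14: compress the whole model according to the stored importances.
    # Layers without a selected theta are left unchanged (an assumption).
    final = {n: prune(w, compressed[n]) if n in compressed else list(w)
             for n, w in model.items()}
    return final, compressed


# Hypothetical two-layer model; the dataset is labeled by the original model.
model = {"a": [1.0, 0.1, 0.5, 0.02], "b": [0.8, 0.3, 0.2, 0.1]}
x = [1.0] * 8
dataset = [(x, predict(model, x))]
final, importance = optimize(model, dataset, thetas=[1.0, 0.75, 0.5, 0.25], delta=0.05)
print(importance)
print(final)
```

In this toy run layer "a" tolerates the removal of its smallest weight, so its importance settles at 0.5, while layer "b" already crosses the threshold at 0.75, which illustrates how layers of the same model can receive different degrees of lightening.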

As such, according to embodiments of the present invention, the degree of weight reduction for each layer can be determined independently for each weight reduction technique.

The system or device described above may be implemented as a hardware component or a combination of hardware components and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications running on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to execution of software. For convenience of understanding, the description sometimes refers to a single processing device, but those skilled in the art will understand that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or a processor and a controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may command the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device in order to be interpreted by a processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer readable medium. The computer readable medium may include program instructions, data files, data structures, etc. alone or in combination. The medium may continuously store programs executable by a computer or temporarily store them for execution or download. In addition, the medium may be various recording means or storage means in the form of a single or combined hardware, but is not limited to a medium directly connected to a certain computer system, and may be distributed on a network. Examples of the medium include magnetic media such as hard disks, floppy disks and magnetic tapes, optical recording media such as CD-ROM and DVD, magneto-optical media such as floptical disks, and ROM, RAM, flash memory, etc. configured to store program instructions. In addition, examples of other media include recording media or storage media managed by an app store that distributes applications, a site that supplies or distributes various other software, and a server. Examples of program instructions include high-level language codes that can be executed by a computer using an interpreter, as well as machine language codes such as those produced by a compiler.

As described above, although the embodiments have been described with reference to limited examples and drawings, those skilled in the art can make various modifications and variations from the above description. For example, appropriate results can be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims are within the scope of the following claims.

Claims

A deep learning model optimization method performed by a computer device including at least one processor, the method comprising:

determining, by the at least one processor, an importance of each of the layers included in a deep learning model based on a degree of performance degradation of the deep learning model when each of the layers is independently lightened; and

lightening, by the at least one processor, the deep learning model according to the importance determined for each of the layers.
The deep learning model optimization method of claim 1, wherein the determining of the importance comprises:

a first step of generating a lightened first layer by lightening a first layer among the layers using a first value among predetermined values of a parameter for controlling an amount of compression;

a second step of generating a first instance model of the deep learning model in which the first layer is replaced with the lightened first layer; and

a third step of determining whether to select the first value as the importance of the first layer by comparing the performance of the first instance model with the difference between the performance of the deep learning model and a predetermined tolerance for performance degradation.
The deep learning model optimization method of claim 2, wherein the determining of the importance further comprises:

a fourth step of re-performing the first to third steps using a second value among the predetermined values of the parameter when the first value is not selected as the importance of the first layer.
The deep learning model optimization method of claim 3, wherein the determining of the importance further comprises:

a fifth step of repeatedly performing the first to fourth steps for a second layer among the layers.
The deep learning model optimization method of claim 2, wherein the performance degradation tolerance is preset based on a user's input.
The deep learning model optimization method of claim 1, wherein, when the deep learning model is lightened using network pruning, the importance indicates whether the layer corresponding to the importance is pruned or whether a weight vector of the layer corresponding to the importance is pruned.
The deep learning model optimization method of claim 1, wherein, when the deep learning model is lightened using filter decomposition, the importance indicates a coefficient to which a weight of the layer corresponding to the importance is approximated.
The deep learning model optimization method of claim 1, wherein, when the deep learning model is lightened using quantization, the importance indicates the number of bits representing the weight of the layer corresponding to the importance.
A computer program stored in a computer-readable recording medium to execute the method of any one of claims 1 to 8 on a computer device in combination with the computer device.
A computer-readable recording medium on which a program for executing the method of any one of claims 1 to 8 on a computer device is recorded.
A computer device comprising:

at least one processor implemented to execute computer-readable instructions,

wherein the at least one processor is configured to:

determine an importance of each of the layers included in a deep learning model based on a degree of performance degradation of the deep learning model when each of the layers is independently lightened; and

lighten the deep learning model according to the importance determined for each of the layers.
The computer device of claim 11, wherein, to determine the importance, the at least one processor performs:

a first process of generating a lightened first layer by lightening a first layer among the layers using a first value among predetermined values of a parameter for controlling an amount of compression;

a second process of generating a first instance model of the deep learning model in which the first layer is replaced with the lightened first layer; and

a third process of determining whether to select the first value as the importance of the first layer by comparing the performance of the first instance model with the difference between the performance of the deep learning model and a predetermined tolerance for performance degradation.
The computer device of claim 12, wherein, to determine the importance, the at least one processor further performs:

a fourth process of re-performing the first to third processes using a second value among the predetermined values of the parameter when the first value is not selected as the importance of the first layer.
The computer device of claim 13, wherein, to determine the importance, the at least one processor further performs:

a fifth process of repeatedly performing the first to fourth processes on a second layer among the layers.