WO2022107951A1 - Method for training ultra-lightweight deep learning network - Google Patents


Info

Publication number
WO2022107951A1
Authority
WO
WIPO (PCT)
Prior art keywords
quantization
network
deep learning
learning network
knowledge information
Application number
PCT/KR2020/016635
Other languages
French (fr)
Korean (ko)
Inventor
박종희
이상설
장성준
Original Assignee
한국전자기술연구원 (Korea Electronics Technology Institute)
Application filed by 한국전자기술연구원 (Korea Electronics Technology Institute)
Publication of WO2022107951A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods



Abstract

Provided is a method for training and quantizing an ultra-lightweight deep learning network. In the method according to an embodiment of the present invention, the parameters of a Large Scale Network (LSN) are quantized, the quantization knowledge generated in that process is propagated to a Small Scale Network (SSN), and the parameters of the SSN are quantized using the propagated knowledge. This makes it possible to train a low-complexity base network for ultra-lightweight quantization.

Description

Method for training an ultra-lightweight deep learning network
The present invention relates to a method for training a deep learning network and, more particularly, to a method for training and quantizing an ultra-lightweight deep learning network.
In the prior art, an ultra-lightweight network is obtained by training the network in a floating-point number system and then quantizing its parameters.
Methods for quantizing floating-point data fall broadly into two types. The first technique, shown in FIG. 1, quantizes the trained floating-point data through a specific mapping function.
Because converting real-valued parameters to integers introduces quantization error, this approach significantly lowers the accuracy of the ultra-lightweight deep learning network compared with the floating-point result.
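The first technique's mapping function can be sketched as a simple affine quantizer. This is an illustrative example only; the min-max calibration and 8-bit width are assumptions, not part of the disclosure:

```python
import numpy as np

def quantize_affine(w, num_bits=8):
    """Map trained floating-point weights to integers through an affine
    mapping function (post-training quantization, as in FIG. 1)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = int(round(qmin - w.min() / scale))
    q = np.clip(np.round(w / scale + zero_point), qmin, qmax).astype(np.int32)
    return q, scale, zero_point

def dequantize_affine(q, scale, zero_point):
    """Recover approximate floating-point values from the integers."""
    return scale * (q.astype(np.float64) - zero_point)

w = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
q, s, z = quantize_affine(w)
# The residual below is exactly the quantization error the text refers to.
error = np.abs(w - dequantize_affine(q, s, z)).max()
```

The error is bounded by the quantization step, which is what degrades accuracy when the trained network is quantized only after training.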
The second technique, shown in FIG. 2, performs quantization during training, and was proposed to compensate for the quantization error of the first technique.
A quantization mapping function is inserted into the middle of the base network to be made ultra-lightweight, and the hyperparameters used by the mapping function are learned together with the network, yielding higher accuracy than the first technique.
However, because the structure of the base network itself is not changed, accuracy after quantization still degrades when the base network has low complexity.
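The second technique can be sketched as a fake-quantization mapping function inserted into the forward pass. The straight-through gradient shown is a common companion to this technique and is assumed here, not taken from the disclosure:

```python
import numpy as np

def fake_quant(x, scale, num_bits=8):
    """Quantization mapping function inserted mid-network: the forward
    pass sees quantized values while parameters stay in floating point,
    so the scale hyperparameter can be tuned during training."""
    qmax = 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

def ste_grad(upstream):
    """Straight-through estimator (assumed): rounding has zero gradient
    almost everywhere, so the upstream gradient is passed through."""
    return upstream

x = np.array([0.33, -0.72, 0.0])   # example activations (illustrative)
y = fake_quant(x, scale=0.1)       # quantized forward values
```

Because the mapping function sits inside the network, its hyperparameters (here, `scale`) are exposed to the training loop, which is what gives this technique higher accuracy than post-training quantization.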
The present invention has been devised to solve the above problems. An object of the present invention is to provide a method for preventing the loss of accuracy after quantization in an ultra-lightweight network by propagating the quantization knowledge information of an LSN and using it to quantize the parameters of an SSN.
Another object of the present invention is to provide a method for adaptively selecting the quantization technique of the SSN.
According to an embodiment of the present invention for achieving the above objects, a network quantization method includes: a first quantization step of quantizing parameters of a first deep learning network; a step of propagating quantization knowledge information, generated while performing the first quantization step, to a second deep learning network; and a second quantization step of quantizing parameters of the second deep learning network using the propagated quantization knowledge information.
The first deep learning network may be a Large Scale Network (LSN), and the second deep learning network may be a Small Scale Network (SSN).
The quantization knowledge information may include at least one of the data variance, the quantization error, and the variance of the error.
In the propagating step, if the number of layers of the first deep learning network and the number of layers of the second deep learning network are not the same, quantization knowledge information about the parameters of all layers may be propagated.
In the propagating step, if the number of layers of the first deep learning network and the number of layers of the second deep learning network are the same, quantization knowledge information about the parameters may be propagated layer by layer.
The second quantization step may quantize the parameters of the second deep learning network with a quantization technique that maps floating-point values to integers 1:N.
The network quantization method according to an embodiment of the present invention may further include: a first training step of training the first deep learning network; a step of propagating classification knowledge information obtained in the first training step to the second deep learning network; and a second training step of training the second deep learning network using the propagated classification knowledge information.
Meanwhile, according to another embodiment of the present invention, a network quantization system includes: a communication unit that receives quantization knowledge information generated while quantizing parameters of a first deep learning network; and a processor that quantizes parameters of a second deep learning network using the received quantization knowledge information.
As described above, according to embodiments of the present invention, introducing a structure that propagates quantization information from an LSN (teacher network) to an SSN (student network) makes it possible to train a low-complexity base network for ultra-lightweight quantization.
In addition, according to embodiments of the present invention, by exploring the structure of the base network while changing it simultaneously with quantization, the 1:N mapping quantization technique can be selected adaptively.
FIG. 1 illustrates a post-training quantization technique;
FIG. 2 illustrates a quantization-during-training technique;
FIG. 3 illustrates the concept of an ultra-lightweight deep learning network training/quantization method according to an embodiment of the present invention;
FIG. 4 is a flowchart provided to explain an ultra-lightweight deep learning network training/quantization method according to an embodiment of the present invention;
FIGS. 5 and 6 are diagrams provided to explain quantization knowledge propagation methods;
FIG. 7 is a diagram illustrating an adaptive quantization technique; and
FIG. 8 is a block diagram of an ultra-lightweight deep learning network training/quantization system according to another embodiment of the present invention.
Hereinafter, the present invention is described in more detail with reference to the drawings.
An embodiment of the present invention presents a method for training and quantizing an ultra-lightweight deep learning network.
The basic concept of the framework for training and quantizing an ultra-lightweight deep learning network is shown in FIG. 3, a diagram provided to explain the training/quantization method according to an embodiment of the present invention.
As shown, a Small Scale Network (SSN) (200), from which the ultra-lightweight network will be generated, is first trained with higher accuracy using a knowledge distillation technique: the classification knowledge information of a Large Scale Network (LSN) (100) is propagated to it. That is, the LSN (100) functions as the teacher network and the SSN (200) as the student network.
Next, the knowledge generated when the LSN (100) is quantized is propagated to the SSN (200), and the SSN (200) is quantized to form the ultra-lightweight network.
The concept shown in FIG. 3 is described in detail with reference to FIG. 4, a flowchart provided to explain the training/quantization method according to an embodiment of the present invention.
As shown, the LSN (100) is first trained (S310), and the classification knowledge information obtained in step S310 is propagated to the SSN (200) (S320). The SSN (200) is then trained using the classification knowledge information propagated in step S320 (S330).
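Steps S310 to S330 rely on knowledge distillation; a common formulation, in which softened teacher outputs are blended with the hard-label loss, is sketched below. The temperature `T`, weight `alpha`, and example logits are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T."""
    e = np.exp((z - z.max(axis=-1, keepdims=True)) / T)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend cross-entropy on hard labels with a KL term that transfers
    the LSN teacher's softened class probabilities to the SSN student."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean()
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]).mean()
    return alpha * ce + (1 - alpha) * (T ** 2) * kl

teacher = np.array([[4.0, 1.0, 0.0]])   # LSN classification knowledge
student = np.array([[2.5, 0.5, 0.2]])   # SSN outputs before training
loss = distillation_loss(student, teacher, labels=np.array([0]))
```

Minimizing this loss moves the student's outputs toward the teacher's, which is how the classification knowledge of step S320 guides the training of the SSN in step S330.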
Next, the parameters of the LSN (100) are quantized (S340), and the quantization knowledge information generated while performing step S340 is propagated to the SSN (200) (S350). The parameters of the SSN (200) are then quantized using the quantization knowledge information propagated in step S350 (S360).
The quantization knowledge information propagated in step S350 and used to quantize the parameters of the SSN (200) may include the data variance, the quantization error, and the variance of the error.
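The three statistics named above can be computed directly from the teacher's parameters before and after quantization. The symmetric 8-bit quantizer below is an assumption used only to produce example numbers:

```python
import numpy as np

def quantization_knowledge(w_float, w_dequant):
    """Quantization knowledge information propagated from LSN to SSN
    (S350): the variance of the original data, the per-parameter
    quantization error, and the variance of that error."""
    err = w_float - w_dequant
    return {
        "data_variance": float(np.var(w_float)),
        "quantization_error": err,
        "error_variance": float(np.var(err)),
    }

# Illustrative teacher weights and a symmetric 8-bit quantizer (assumed).
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.5, size=1000)
scale = np.abs(w).max() / 127
w_hat = np.round(w / scale) * scale
info = quantization_knowledge(w, w_hat)
```

The student can use these statistics, rather than re-deriving them from scratch, when quantizing its own parameters in step S360.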
The quantization performed in steps S340 to S360 can be carried out in one of two ways.
FIG. 5 shows the concept of the first method: quantization knowledge information about the parameters of all layers of the LSN (100) is propagated to the SSN (200), which quantizes the parameters of all of its layers based on that information.
This method is suitable when the number of layers of the LSN (100) and the number of layers of the SSN (200) are not the same.
FIG. 6 shows the second method: quantization knowledge information about the parameters of each layer of the LSN (100) is propagated to the corresponding layer of the SSN (200), which quantizes its parameters layer by layer based on that information.
This method is suitable when the number of layers of the LSN (100) and the number of layers of the SSN (200) are the same.
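The choice between the two propagation schemes reduces to comparing layer counts; a minimal sketch, with illustrative function and mode names:

```python
def propagation_mode(lsn_layers: int, ssn_layers: int) -> str:
    """Select the propagation scheme: layer-by-layer (FIG. 6) when the
    LSN and SSN have the same number of layers, whole-network (FIG. 5)
    otherwise. Names here are illustrative, not from the disclosure."""
    return "per_layer" if lsn_layers == ssn_layers else "whole_network"
```

For example, a 50-layer teacher distilling into an 18-layer student would fall back to whole-network propagation, while equal-depth networks can match statistics layer by layer.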
Meanwhile, in steps S340 to S360, the parameters of the SSN (200) can be quantized with a quantization technique that maps floating-point values to integers 1:N.
Because the ultra-lightweight network has far fewer parameters than the LSN (100), the discriminative power of the resulting network is also reduced. This is because a base network is conventionally quantized by mapping floating-point values to integers 1:1.
To select the mapping adaptively, the network training/quantization method according to an embodiment of the present invention applies 1:N mapping. In this case, however, the parameter N is heuristic, and selecting the optimal N for each data set requires several rounds of retraining.
To avoid this, as shown in FIG. 7, an N-dimensional selection vector, which selects the features actually used among the N mapped candidates, is placed in the middle of the 1:N mapping transformation to enable adaptive changes to the structure of the ultra-lightweight quantization network. Each element of the vector is designed as a sigmoid function that can be selected while training is in progress, so that the features to be used can be chosen adaptively.
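A possible sketch of the 1:N mapping with an N-dimensional sigmoid selection vector: each floating-point value maps to N integer candidates, and gated elements of the selection vector determine which mapped features are actually used. The candidate offsets, the 0.5 threshold, and the example logits are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_to_n_quantize(w, scale, selector_logits, offsets):
    """Map each float to N integer candidates (1:N mapping), then gate
    them with a sigmoid selection vector learned during training."""
    base = np.round(w[:, None] / scale)        # shape (M, 1)
    candidates = base + offsets[None, :]       # shape (M, N)
    gates = sigmoid(selector_logits) > 0.5     # which features are used
    return candidates[:, gates], gates

w = np.array([0.31, -0.44])
offsets = np.array([-1, 0, 1])                 # N = 3 candidates (assumed)
selector = np.array([-2.0, 3.0, 1.0])          # example learned logits
selected, gates = one_to_n_quantize(w, 0.1, selector, offsets)
```

Because the gates come from a differentiable sigmoid, which candidates survive can be decided during training itself, avoiding the per-dataset retraining that a hand-picked N would require.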
FIG. 8 is a block diagram of an ultra-lightweight deep learning network training/quantization system according to another embodiment of the present invention.
As shown, the training/quantization system according to an embodiment of the present invention can be implemented as a computing system comprising a communication unit (410), an output unit (420), a processor (430), an input unit (440), and a storage unit (450).
The communication unit (410) is a communication interface connected to an external network or external device to transmit and receive data/information. In this embodiment of the present invention, the communication unit (410) communicates with the system that trains the LSN (100) and receives the classification knowledge information and quantization knowledge information of the LSN (100).
The processor (430) trains the SSN (200) using the classification knowledge information received through the communication unit (410) and quantizes the SSN (200) using the quantization knowledge information to generate the ultra-lightweight network.
The input unit (440) is an input means that delivers user commands to the processor (430); the output unit (420) is an output means that outputs the execution results of the processor (430); and the storage unit (450) provides the storage space the processor (430) needs to operate and function.
A method for training and quantizing an ultra-lightweight deep learning network has thus been described in detail through preferred embodiments.
The embodiments above introduce a structure that propagates quantization information from the LSN (teacher network) (100) to the SSN (student network) (200), presenting a training technique for ultra-lightweight quantization of a low-complexity base network.
The embodiments also present a structure in which the 1:N mapping quantization technique can be selected adaptively by exploring the structure of the base network while changing it simultaneously with quantization.
As a result, deep learning technology that would otherwise run only on expensive GPU devices can be used in various fields, and the network can adapt through training to each field of application.
Furthermore, because deep learning algorithms can run in real time even on devices with low computational performance, such as mobile and edge devices, AI technology becomes applicable across various industries.
In addition, the network structure can be changed automatically based on data sets whose characteristics differ by field, enabling efficient operation.
Meanwhile, the technical idea of the present invention can of course also be applied to a computer-readable recording medium containing a computer program that performs the functions of the apparatus and method according to the present embodiments. The technical ideas according to various embodiments of the present invention may be implemented as computer-readable code recorded on a computer-readable recording medium. The computer-readable recording medium may be any data storage device that can be read by a computer and can store data, for example a ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, or hard disk drive. Computer-readable code or programs stored on such a medium may also be transmitted over a network connecting computers.
While preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Various modifications may be made by those of ordinary skill in the art without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical spirit or prospect of the present invention.

Claims (8)

  1. A network quantization method comprising:
    a first quantization step of quantizing parameters of a first deep learning network;
    propagating quantization knowledge information, generated while performing the quantization in the first quantization step, to a second deep learning network; and
    a second quantization step of quantizing parameters of the second deep learning network using the propagated quantization knowledge information.
  2. The method according to claim 1, wherein
    the first deep learning network is a large-scale network (LSN), and
    the second deep learning network is a small-scale network (SSN).
  3. The method according to claim 1, wherein
    the quantization knowledge information includes at least one of data variance, quantization error, and variance of the error.
  4. The method according to claim 3, wherein
    in the propagating step, when the number of layers of the first deep learning network and the number of layers of the second deep learning network are not equal, quantization knowledge information on the parameters of all layers is propagated as a whole.
  5. The method according to claim 3, wherein
    in the propagating step, when the number of layers of the first deep learning network and the number of layers of the second deep learning network are equal, quantization knowledge information on the parameters is propagated layer by layer.
  6. The method according to claim 1, wherein
    the second quantization step quantizes the parameters of the second deep learning network with a quantization technique that maps floating-point values to integers in a 1:N manner.
  7. The method according to claim 1, further comprising:
    a first learning step of training the first deep learning network;
    propagating classification knowledge information obtained through the first learning step to the second deep learning network; and
    a second learning step of training the second deep learning network using the propagated classification knowledge information.
  8. A network quantization system comprising:
    a communication unit that receives quantization knowledge information generated in the process of quantizing parameters of a first deep learning network; and
    a processor that quantizes parameters of a second deep learning network using the received quantization knowledge information.
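The quantization flow of claims 1 through 5 can be sketched as follows. This is a minimal illustration, not the patented implementation: the uniform affine quantizer, the `quantize`/`propagate` function names, and the aggregation rule for the unequal-layer-count case are all illustrative assumptions — the publication enumerates the knowledge items (data variance, quantization error, variance of the error) but does not fix concrete formulas.

```python
import numpy as np

def quantize(params, num_bits=8):
    """Quantize a parameter tensor and return (quantized values, knowledge info).

    Uses a uniform affine float-to-integer mapping as a stand-in for the
    claimed 1:N quantization; the knowledge dictionary carries the three
    items listed in claim 3."""
    qmin, qmax = 0, 2 ** num_bits - 1
    span = params.max() - params.min()
    scale = span / (qmax - qmin) if span > 0 else 1.0
    zero_point = qmin - params.min() / scale
    q = np.clip(np.round(params / scale + zero_point), qmin, qmax)
    dequantized = (q - zero_point) * scale
    error = params - dequantized  # per-element quantization error
    knowledge = {
        "data_variance": params.var(),        # data variance
        "quantization_error": error,          # quantization error
        "error_variance": error.var(),        # variance of the error
    }
    return q, knowledge

def propagate(teacher_layers, student_layers):
    """Claims 4-5: propagate per-layer knowledge when the layer counts of
    teacher and student match, otherwise a single network-level summary
    aggregated over all layers (an assumed aggregation, for illustration)."""
    knowledge = [quantize(w)[1] for w in teacher_layers]
    if len(teacher_layers) == len(student_layers):
        return knowledge  # layer-by-layer propagation
    all_err = np.concatenate([k["quantization_error"].ravel() for k in knowledge])
    return [{
        "data_variance": np.mean([k["data_variance"] for k in knowledge]),
        "quantization_error": all_err,
        "error_variance": all_err.var(),
    }]
```

A student-side quantizer (the second quantization step of claim 1) could then, for instance, use the propagated error variance to pick its own bit width or clipping range per layer.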
PCT/KR2020/016635 2020-11-20 2020-11-24 Method for training ultra-lightweight deep learning network WO2022107951A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020200156990A KR20220069653A (en) 2020-11-20 2020-11-20 Learning method for ultra light-weight deep learning network
KR10-2020-0156990 2020-11-20

Publications (1)

Publication Number Publication Date
WO2022107951A1 true WO2022107951A1 (en) 2022-05-27

Family

ID=81709227

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/016635 WO2022107951A1 (en) 2020-11-20 2020-11-24 Method for training ultra-lightweight deep learning network

Country Status (2)

Country Link
KR (1) KR20220069653A (en)
WO (1) WO2022107951A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200013710A (en) * 2017-07-07 2020-02-07 미쓰비시덴키 가부시키가이샤 Data processing apparatus, data processing method and storage medium
KR20200063330A (en) * 2018-11-21 2020-06-05 한국과학기술원 Method and system for transfer learning into any target dataset and model structure based on meta-learning
CN111709516A (en) * 2020-06-09 2020-09-25 深圳先进技术研究院 Compression method and compression device of neural network model, storage medium and equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
EHMANN, Christopher; SAMEK, Wojciech: "Transferring Information Between Neural Networks", 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 15 April 2018, pages 2361-2365, XP033404076, DOI: 10.1109/ICASSP.2018.8461511 *
XIE, Zheng; WEN, Zhiquan; LIU, Jing; LIU, Zhiqiang; WU, Xixian; TAN, Mingkui: "Deep Transferring Quantization", Computer Vision - ECCV 2020: 16th European Conference, Glasgow, UK, 23-28 August 2020, Proceedings, pages 625-642, XP009536880, ISSN: 0302-9743, ISBN: 978-3-030-41298-2, DOI: 10.1007/978-3-030-58598-3_37 *

Also Published As

Publication number Publication date
KR20220069653A (en) 2022-05-27

Similar Documents

Publication Publication Date Title
WO2021096009A1 (en) Method and device for supplementing knowledge on basis of relation network
WO2016159497A1 (en) Method, system, and non-transitory computer-readable recording medium for providing learning information
WO2019066104A1 (en) Process control method and system which use history data-based neural network learning
WO2022146080A1 (en) Algorithm and method for dynamically changing quantization precision of deep-learning network
Waytowich et al. A narration-based reward shaping approach using grounded natural language commands
WO2022107951A1 (en) Method for training ultra-lightweight deep learning network
WO2023113372A1 (en) Apparatus and method for label-based sample extraction for improvement of deep learning classification model performance for imbalanced data
WO2023058969A1 (en) Machine learning model compression using weighted low-rank factorization
WO2023033194A1 (en) Knowledge distillation method and system specialized for pruning-based deep neural network lightening
WO2022270840A1 (en) Deep learning-based word recommendation system for predicting and improving foreign language learner's vocabulary ability
WO2022163996A1 (en) Device for predicting drug-target interaction by using self-attention-based deep neural network model, and method therefor
CN115114927A (en) Model training method and related device
WO2022124449A1 (en) Method for optimizing hyper parameter of lightweight artificial intelligence algorithm by using genetic algorithm
EP4170552A1 (en) Method for generating neural network, and device and computer-readable storage medium
WO2021107231A1 (en) Sentence encoding method and device using hierarchical word information
WO2022107925A1 (en) Deep learning object detection processing device
WO2023080292A1 (en) Apparatus and method for generating adaptive parameter for deep learning acceleration device
WO2023214609A1 (en) Quantum circuit computation method for efficiently computing state vectors
WO2023214608A1 (en) Quantum circuit simulation hardware
WO2022145550A1 (en) Algorithm and method for dynamically varying quantization precision of deep learning network
WO2023090499A1 (en) Sparsity learning-based filter pruning method for deep neural networks
WO2021029563A1 (en) Method, system and non-transitory computer-readable recording medium for providing learning information
WO2020213757A1 (en) Word similarity determination method
WO2022005046A1 (en) Method for transferring knowledge from deep learning network to lightweight deep learning network
WO2022107910A1 (en) Mobile deep learning hardware device capable of retraining

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20962557

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20962557

Country of ref document: EP

Kind code of ref document: A1