WO2020149511A1

WO2020149511A1 - Electronic device and control method therefor

Info

Publication number: WO2020149511A1
Application number: PCT/KR2019/016235
Authority: WO
Inventors: 박치연; 김재덕; 정현주
Original assignee: 삼성전자주식회사 (Samsung Electronics Co., Ltd.)
Priority date: 2019-01-17
Filing date: 2019-11-25
Publication date: 2020-07-23

Abstract

An electronic device and a control method therefor are provided. The electronic device may comprise: a memory for storing at least one instruction; and a processor connected to the memory so as to control the electronic device, wherein the processor, by executing the at least one instruction: appends a second layer including a learnable function to a first layer in an artificial neural network including a plurality of layers; updates a parameter value included in the second layer by training the artificial neural network; acquires a function value by inputting the updated parameter value to the learnable function; and updates the first layer to a third layer by eliminating at least one channel among a plurality of channels included in the first layer on the basis of the acquired function value.

Description

Electronic device and control method therefor

The present disclosure relates to an electronic device and a control method thereof, and more particularly, to an electronic device that reduces the size of an artificial neural network by adding a layer including a learnable function to the artificial neural network, and a control method thereof.

Recently, artificial neural networks have been widely used in various fields such as machine translation, speech recognition, and image classification. As the use of artificial neural networks has widened and more precise calculation has been demanded, the size and required computation of artificial neural networks have increased exponentially. Because there is a limit to storing a large artificial neural network model on an on-device platform (for example, a smartphone or an Internet of Things (IoT) device), techniques are being researched and developed to reduce the size and computation of artificial neural networks while maintaining computational accuracy. One representative technology is channel pruning, which reduces the channel size of an artificial neural network layer.

Previously, a channel pruning technique was used that calculates the sum of the absolute values of the weights of each channel of a layer included in a convolutional neural network and removes channels in ascending order of that sum. However, this technique is limited in that the channel having the smallest sum of absolute weight values is not necessarily the least important channel in the artificial neural network.
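As a non-limiting illustration of the magnitude-based technique described above, the following minimal sketch ranks channels by the sum of the absolute values of their weights. The function names and the toy kernel are hypothetical and are not taken from the disclosure; the kernel is assumed to be a list of rows with the channel as the last dimension.

```python
# Illustrative sketch only: helper names and the toy kernel are assumptions.

def channel_l1_norms(kernel):
    """Return the sum of absolute weights for each channel (last dimension)."""
    num_channels = len(kernel[0])
    norms = [0.0] * num_channels
    for row in kernel:
        for c, weight in enumerate(row):
            norms[c] += abs(weight)
    return norms


def channels_to_prune(kernel, num_to_remove):
    """Return the indices of the channels with the smallest L1 norms."""
    norms = channel_l1_norms(kernel)
    ranked = sorted(range(len(norms)), key=lambda c: norms[c])
    return sorted(ranked[:num_to_remove])


# Two weight rows, three channels each.
toy_kernel = [
    [0.9, -0.1, 0.4],
    [-0.8, 0.2, 0.5],
]
pruned = channels_to_prune(toy_kernel, 1)
```

Here the middle channel has the smallest sum and would be cut first, regardless of how much that channel actually contributes to the network output, which is the limitation noted above.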

In addition, a channel pruning technique was previously used that finds the optimal combination by comparing all channel combinations of the layers of the artificial neural network. However, this technique is limited in that an exponential number of comparison operations must be performed in order to compare all channel combinations of an artificial neural network layer.

In addition, a soft gating technique has conventionally been used in which each channel of a layer is multiplied by a learned weight having a real value between 0 and 1, and channels having small values are removed. However, because the weight takes an intermediate value between 0 and 1, the weight learning is not complete on its own, and a regularization or annealing algorithm must additionally be applied to the weight.

In addition, a variational method that learns the parameter of a probability distribution over whether each channel is to be removed has been used, but it is limited in that the mask must be sampled repeatedly because whether a channel is removed depends on a probability.

The present disclosure has been devised in accordance with the above-described need, and an object of the present disclosure is to provide an electronic device that adds a layer including a learnable function to a layer of an artificial neural network and reduces the existing layer using the learnable function and a learned parameter, and a control method thereof.

An electronic device according to an embodiment of the present disclosure for achieving the above object includes a memory storing at least one instruction and a processor connected to the memory to control the electronic device, wherein the processor, by executing the at least one instruction, may append a second layer including a learnable function to a first layer of an artificial neural network including a plurality of layers, update a parameter value included in the second layer by training the artificial neural network, obtain a function value by inputting the updated parameter value to the learnable function, and update the first layer to a third layer by removing at least one channel among a plurality of channels included in the first layer based on the obtained function value.

Meanwhile, a control method of an electronic device according to an embodiment for achieving the above object may include: appending a second layer including a learnable function to a first layer of an artificial neural network including a plurality of layers; updating a parameter value included in the second layer by training the artificial neural network; obtaining a function value by inputting the updated parameter value to the learnable function; and updating the first layer to a third layer by removing at least one channel among a plurality of channels included in the first layer based on the obtained function value.

As described above, according to various embodiments of the present disclosure, the electronic device performs pruning by adding a layer including a learnable function to a layer of the artificial neural network, and can thereby reduce the size of the artificial neural network while maintaining the accuracy of calculation, without comparing all channel combinations. Accordingly, the user can efficiently utilize the compressed artificial neural network in various fields through an electronic device that performs pruning.

FIG. 1A is a view for explaining a control method of an electronic device according to an embodiment of the present disclosure;

FIG. 1B is a view for explaining a result obtained when an electronic device performs pruning on an artificial neural network according to an embodiment of the present disclosure;

FIG. 2A is a block diagram schematically illustrating a configuration of an electronic device according to an embodiment of the present disclosure;

FIG. 2B is a block diagram showing in detail the configuration of an electronic device according to an embodiment of the present disclosure;

FIG. 3 is a diagram for explaining a method of training an artificial neural network, according to an embodiment of the present disclosure;

FIG. 4 is a diagram for describing a learnable function included in a second layer of an electronic device, according to an embodiment of the present disclosure;

FIG. 5 is a view for explaining an experimental result related to an electronic device, according to an embodiment of the present disclosure;

FIG. 6 is a diagram for describing a control method of an electronic device according to an embodiment of the present disclosure.

Hereinafter, various embodiments of the present disclosure will be described with reference to the drawings.

FIG. 1A is a diagram illustrating a control method of an electronic device 100 according to an embodiment of the present disclosure. As illustrated in FIG. 1A, the electronic device 100 may store an artificial neural network including a plurality of layers. A layer may refer to each stage of the artificial neural network. Each of the plurality of layers included in the artificial neural network has a plurality of weight values, and the calculation of a layer may be performed on the calculation result of the previous layer and the plurality of weights. Specifically, the artificial neural network may be composed of a combination of several layers, and each layer may be represented by a plurality of weights. A kernel is the set of weights included in one layer. In one embodiment, the kernel may be implemented as a multi-dimensional tensor. A channel may mean the last dimension when the kernel is implemented as a tensor. Therefore, the channel also matches the last dimension of the tensor representing the output value of a particular layer.

In addition, examples of artificial neural networks include a Convolutional Neural Network (CNN), a Deep Neural Network (DNN), a Recurrent Neural Network (RNN), a Restricted Boltzmann Machine (RBM), a Deep Belief Network (DBN), a Bidirectional Recurrent Deep Neural Network (BRDNN), and Deep Q-Networks, and the neural network in the present disclosure is not limited to the above-mentioned examples unless otherwise specified.

The electronic device 100 may add a second layer including a learnable function 30 to a first layer among a plurality of layers included in the artificial neural network to compress the artificial neural network. According to an embodiment, the electronic device 100 may add a layer including a learnable function to all layers included in the artificial neural network. As another embodiment, when a first layer is selected from a plurality of layers according to a user command, the electronic device 100 may add a second layer to the selected first layer.

Meanwhile, the second layer may include a learnable function 30 and parameters W_{l,1}, W_{l,2}, …, W_{l,n} 20. The parameter W_{l,i} illustrated in FIG. 1A is the parameter for deriving the function value corresponding to the i-th channel of the first layer. The learnable function 30 may be a function having a non-trivial derivative when a parameter 20 defined over the real numbers is input. Specifically, the learnable function 30 may be the sum of a first function that outputs 0 or 1 and a differentiable second function, and the second function may be the product of a function having a preset gradient and a differentiable function. The first function may be a unit step function and the second function may be a sawtooth function, but this is only an example, and the second function may be implemented as any differentiable function whose output values have a limited range.

Then, the electronic device 100 may input an arbitrary parameter value into the learnable function 30 to obtain 0 or 1, the same as the output value of the first function. In one embodiment, the electronic device 100 may obtain a function value of 0 by inputting a negative parameter value into the learnable function 30, and may obtain a function value of 1 by inputting a positive parameter value into the learnable function 30. However, this is only an embodiment, and the electronic device 100 may input a negative or positive parameter value into the learnable function 30 to obtain a value that differs from 0 or 1, respectively, by no more than a threshold. The mathematical characteristics and proofs related to the learnable function 30 will be described in detail later with reference to FIG. 4.
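As a non-limiting illustration, the learnable function described above may be sketched as a unit step function plus a small sawtooth term. The specific sawtooth form (M·w − floor(M·w))/M, the value of M, the treatment of an input of exactly 0, and the choice of the constant 1 as the differentiable factor are all assumptions made for this sketch; the disclosure defers the exact definition to FIG. 4.

```python
import math

# Illustrative sketch only: the sawtooth form, M, and helper names are assumptions.

def unit_step(w):
    """First function: outputs 0 or 1 (an input of exactly 0 is mapped to 0 here)."""
    return 1.0 if w > 0 else 0.0


def sawtooth(w, m):
    """Second function: slope 1 almost everywhere, output limited to about [0, 1/m]."""
    return (m * w - math.floor(m * w)) / m


def learnable_gate(w, m=1000):
    """Sum of the first and second functions."""
    return unit_step(w) + sawtooth(w, m)


def binarize(value):
    """Map a gate value lying within a small threshold of 0 or 1 to exactly 0 or 1."""
    return 1 if value >= 0.5 else 0


gate_negative = learnable_gate(-0.7)  # stays within 1/m of 0
gate_positive = learnable_gate(0.3)   # stays within 1/m of 1
```

Because the sawtooth term is bounded by roughly 1/M, the gate value never moves far from the output of the step function, while its slope with respect to the parameter remains non-zero almost everywhere, so a gradient-based optimizer can still update the parameter.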

Meanwhile, the electronic device 100 may train the artificial neural network to update the parameter 20 included in the second layer. Specifically, the electronic device 100 may obtain the output data 40 of the second layer by multiplying the output data 10 of the first layer by the function value obtained when the parameter 20 is input to the learnable function 30 of the second layer.

In one embodiment, when the function value obtained through the learnable function 30 is 0, the electronic device 100 may multiply the function value of 0 by the channel of the output data 10 of the first layer that corresponds to the parameter 20 input to the learnable function 30. Accordingly, as illustrated in FIG. 1A, the output data 40 of the second layer may be data in which some channels 50-1 and 50-2 of the output data 10 of the first layer are masked to zero.

Meanwhile, the output data 10 and 40 of each layer may be implemented as a multi-dimensional tensor, but this is only an embodiment, and the output data may be implemented in various forms (e.g., vectors).
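As a non-limiting illustration of the channel-wise multiplication described above, the following sketch multiplies each channel of the first layer's output data by the corresponding function value. The data layout (a list of positions, each holding one value per channel) and all names are hypothetical.

```python
# Illustrative sketch only: data layout and names are assumptions.

def mask_channels(output_data, gate_values):
    """Multiply every position of the output data by the per-channel gate values."""
    return [
        [value * gate for value, gate in zip(position, gate_values)]
        for position in output_data
    ]


# Two positions, three channels each (channel is the last dimension).
first_layer_output = [
    [0.5, 1.2, -0.3],
    [0.1, 0.7, 0.9],
]
gate_values = [1, 0, 1]  # the second channel received a function value of 0
second_layer_output = mask_channels(first_layer_output, gate_values)
```

In this toy case the second channel of every position becomes zero, while the other channels pass through unchanged.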

Then, based on the output data 40 of the second layer, the electronic device 100 may generate a loss function for the difference between the output value of the artificial neural network and the output value that the electronic device 100 wants to obtain, and may obtain a parameter that yields the minimum function value (i.e., the minimum loss value) of the generated loss function. In one embodiment, the electronic device 100 may obtain and update the value of the parameter 20 at which the loss value is minimized by applying a stochastic gradient descent algorithm to the loss function. The stochastic gradient descent algorithm is an algorithm for obtaining a parameter value that yields the minimum function value (loss value) of a loss function. However, the electronic device 100 may also obtain a parameter value that yields the minimum function value by applying various other algorithms (for example, a momentum algorithm, an Adagrad algorithm, or an Adam algorithm) to the loss function. Then, the electronic device 100 may update the parameter included in the second layer with the parameter that yields the minimum function value of the loss function.

Meanwhile, the loss function may be a function that combines a loss function for maximizing the accuracy of the artificial neural network operation with an additional loss function indicating the size or computational complexity of the layer to be obtained after compression. In one embodiment, the loss function indicating the size of the layer may be calculated as the sum of the output values of the learnable function 30 included in the second layer, and the loss function indicating the computational complexity of the layer may be calculated as the product of the sum of the output values of the learnable function 30 included in the second layer and the size of the input value of the first layer. In another embodiment, the additional loss function may be implemented as a weighted sum of the loss functions indicating the size and the computational complexity of the layer.
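As a non-limiting illustration, the additional loss terms described above may be sketched as follows. Adding the additional loss to the accuracy loss, the weighting coefficients, and all names are assumptions made for this sketch; the disclosure states only that the two kinds of loss are combined.

```python
# Illustrative sketch only: the combination rule, weights, and names are assumptions.

def size_loss(gate_values):
    """Size of the compressed layer: sum of the learnable function outputs."""
    return sum(gate_values)


def complexity_loss(gate_values, input_size):
    """Computational complexity: sum of the outputs times the first layer's input size."""
    return sum(gate_values) * input_size


def total_loss(accuracy_loss, gate_values, input_size, size_weight, complexity_weight):
    """Accuracy loss plus a weighted sum of the size and complexity losses."""
    additional = (
        size_weight * size_loss(gate_values)
        + complexity_weight * complexity_loss(gate_values, input_size)
    )
    return accuracy_loss + additional


gates = [1, 0, 1, 1]          # three of four channels are kept
layer_input_size = 32 * 32 * 3
example_total = total_loss(0.25, gates, layer_input_size, 0.5, 0.0)
```

Minimizing such a loss pushes gate values toward 0 wherever removing the channel does not raise the accuracy loss by more than the penalty saved.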

Meanwhile, the electronic device 100 may obtain a function value by inputting the updated parameter 20 into the learnable function 30, and may update the first layer to a third layer by removing at least one of the plurality of channels included in the first layer based on the obtained function value. Specifically, when a function value is obtained by inputting the parameter W_{l,i} to the learnable function 30, the electronic device 100 may determine whether to remove the i-th channel of the first layer according to the obtained function value (0 or 1). If a function value of 0 is obtained by inputting the parameter W_{l,i} to the learnable function 30, the electronic device 100 may remove the i-th channel of the first layer 10. An output of 0 for the input W_{l,i} may mean that the i-th channel of the first layer is insignificant to the overall calculation of the artificial neural network, and the channel may therefore be removed by the electronic device 100. As a result, the overall size and computation amount of the artificial neural network are reduced while the accuracy of calculation is maintained. On the other hand, when a function value of 1 is obtained by inputting the parameter W_{l,i} to the learnable function 30, the electronic device 100 may keep the i-th channel of the first layer.

Then, the electronic device 100 may remove the second layer and update the first layer to the third layer, in which at least one channel among the plurality of channels included in the first layer has been removed based on the function value. However, this is only an example, and the electronic device 100 may first update the first layer to the third layer and then remove the second layer.
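As a non-limiting illustration of updating the first layer to the third layer, the following sketch keeps only the channels whose function value is 1. The kernel layout (a list of weight rows with the channel as the last dimension) and all names are hypothetical.

```python
# Illustrative sketch only: kernel layout and names are assumptions.

def prune_channels(kernel, gate_values):
    """Return the kernel without the channels whose gate value is 0, plus kept indices."""
    kept = [c for c, gate in enumerate(gate_values) if gate == 1]
    pruned_kernel = [[row[c] for c in kept] for row in kernel]
    return pruned_kernel, kept


first_layer_kernel = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
gate_values = [1, 0, 0, 1]  # function values obtained from the updated parameters
third_layer_kernel, kept_channels = prune_channels(first_layer_kernel, gate_values)
```

After this step the gating layer is no longer needed, since the removed channels would have been multiplied by 0 and the kept channels by 1.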

As an example, as shown in (a) of FIG. 1B, the electronic device 100 may perform pruning by adding a second layer to the first layer, thereby updating the first layer to a third layer with a reduced number of channels. Specifically, the electronic device 100 may update the first layer to the third layer by removing some channels 60-1 and 60-2 included in the first layer based on the function value output from the learnable function.

Meanwhile, as an embodiment of the present disclosure, the electronic device 100 may obtain a function value by inputting the updated value of the parameter 20 to the learnable function 30, may change a weight of the first kernel of the first layer based on the obtained function value, and may update the first kernel of the first layer to a second kernel including the changed weight.

Specifically, the electronic device 100 may add a second layer having as many learnable parameters 20 as there are weights in the kernel of the first layer, and may update the parameters by training the artificial neural network. Then, the electronic device 100 may determine whether to change a weight of the first kernel of the first layer to 0 according to the function value obtained by inputting the updated parameter to the learnable function 30. For example, when the first kernel of the first layer is implemented as a three-dimensional 3x3x64 matrix (i.e., a 3x3 filter implemented over 64 channels), the electronic device 100 may add, to the first layer, a second layer including learnable parameters 20 in the form of a 3x3x64 matrix that correspond one-to-one to the weights included in the first kernel. Then, the electronic device 100 may update the parameters by training the artificial neural network. When the function value obtained by inputting an updated parameter to the learnable function 30 is 0, the electronic device 100 may change to 0 the weight of the first kernel of the first layer that corresponds to the parameter input to the learnable function. For example, when the parameter that yields a function value of 0 corresponds to the weight at position (3,3,20) of the first kernel, the electronic device 100 may change the weight at position (3,3,20) of the first kernel to 0. When the obtained function value is 1, the electronic device 100 may keep the weight of the first kernel of the first layer that corresponds to the parameter input to the learnable function 30. Accordingly, the electronic device 100 may reduce the computation amount of the entire artificial neural network by changing some of the weights of the first kernel of the first layer to 0.
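As a non-limiting illustration of the weight-level embodiment, the following sketch pairs a 3x3x64 kernel with a 3x3x64 parameter tensor and zeroes the weight whose parameter yields a function value of 0. The plain step function is used in place of the learnable function for brevity, the weight and parameter values are made up, and position (3,3,20) is taken as one-based, so it becomes index [2][2][19]; all of these are assumptions.

```python
# Illustrative sketch only: values, indexing convention, and names are assumptions.

def step(parameter):
    """Stand-in for the learnable function: 0 for non-positive, 1 for positive input."""
    return 1 if parameter > 0 else 0


def filled(height, width, channels, value):
    """Build a height x width x channels nested list filled with one value."""
    return [[[value] * channels for _ in range(width)] for _ in range(height)]


def mask_kernel(kernel, parameters):
    """Set a weight to 0 where its one-to-one parameter yields a function value of 0."""
    return [
        [
            [weight * step(p) for weight, p in zip(k_cell, p_cell)]
            for k_cell, p_cell in zip(k_row, p_row)
        ]
        for k_row, p_row in zip(kernel, parameters)
    ]


first_kernel = filled(3, 3, 64, 0.5)
parameters = filled(3, 3, 64, 1.0)
parameters[2][2][19] = -1.0  # parameter for position (3,3,20), one-based
second_kernel = mask_kernel(first_kernel, parameters)
```

Only the weight at position (3,3,20) changes; the kernel keeps its 3x3x64 shape, so the saving in this embodiment comes from sparsity rather than from a smaller tensor.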

As an example, as shown in (b) of FIG. 1B, the electronic device 100 may perform pruning by adding a second layer to the first layer, changing some of the weights included in the first kernel of the first layer to 0 and thereby updating the first kernel to the second kernel 70 including the changed weights.

Meanwhile, as another embodiment, based on the function value obtained by inputting the updated parameter to the learnable function, the electronic device 100 may not only lighten the artificial neural network by removing some channels of a layer, or reduce the amount of computation by changing individual kernel weights of a layer to 0, but may also lighten the artificial neural network by removing a specific layer.

Specifically, the electronic device 100 may remove at least one layer among the plurality of layers included in the artificial neural network based on the function value obtained by inputting the updated parameter 20 into the learnable function 30. For example, when the parameter that yields a function value of 0 corresponds to the first layer, the electronic device 100 may remove the first layer of the artificial neural network. In addition, when the parameter that yields a function value of 1 corresponds to the first layer, the electronic device 100 may keep the first layer of the artificial neural network.

As an example, as shown in (c) of FIG. 1B, the electronic device 100 may perform pruning by adding a second layer including a learnable function 30 to each layer included in the artificial neural network, thereby removing some of the plurality of layers. For example, when the function value obtained through the learnable function 30 is 0 and the parameters input to the learnable function 30 correspond to the first and third layers, the electronic device 100 may lighten the artificial neural network by removing the first and third layers.
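As a non-limiting illustration of the layer-level embodiment, the following sketch assumes one parameter per layer and removes every layer whose parameter yields a function value of 0. The layer names, the parameter values, and the use of a plain step function are hypothetical, and the sketch does not check whether the remaining layers still have compatible input and output shapes.

```python
# Illustrative sketch only: layer names, parameter values, and helpers are assumptions.

def step(parameter):
    """Stand-in for the learnable function: 0 for non-positive, 1 for positive input."""
    return 1 if parameter > 0 else 0


def prune_layers(layers, layer_parameters):
    """Keep only the layers whose parameter yields a function value of 1."""
    return [
        layer
        for layer, parameter in zip(layers, layer_parameters)
        if step(parameter) == 1
    ]


layers = ["layer_1", "layer_2", "layer_3", "layer_4"]
layer_parameters = [-0.4, 0.9, -1.2, 0.3]  # updated parameters after training
remaining_layers = prune_layers(layers, layer_parameters)
```

With these made-up parameters, the first and third layers are removed, matching the example described above.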

FIG. 2A is a diagram schematically illustrating a configuration of an electronic device 100 according to an embodiment of the present disclosure. As illustrated in FIG. 2A, the electronic device 100 may include a memory 110 and a processor 120. The configuration shown in FIG. 2A is an exemplary diagram for implementing embodiments of the present disclosure, and appropriate hardware/software configurations that would be apparent to those skilled in the art may be additionally included in the electronic device 100.

The memory 110 may store instructions or data related to at least one other component of the electronic device 100. In particular, the memory 110 may be implemented as a non-volatile memory, a volatile memory, a flash memory, a hard disk drive (HDD), or a solid state drive (SSD). The memory 110 is accessed by the processor 120, and reading, writing, modifying, deleting, and updating of data may be performed by the processor 120. In the present disclosure, the term memory may include the memory 110, a ROM (not shown) or a RAM (not shown) in the processor 120, or a memory card (not shown) mounted on the electronic device 100 (e.g., a micro SD card or a memory stick).

In particular, the memory 110 may store an artificial neural network including a plurality of layers, and may store kernels and parameters included in each layer. In addition, the memory 110 may store output data of each layer and output data of the entire artificial neural network.

The processor 120 is electrically connected to the memory 110 to control the overall operations and functions of the electronic device 100. In particular, by executing at least one instruction stored in the memory 110, the processor 120 may add a second layer including a learnable function to a first layer of an artificial neural network including a plurality of layers, update a parameter value included in the second layer by training the artificial neural network, obtain a function value by inputting the updated parameter value to the learnable function, and update the first layer to a third layer by removing at least one of the plurality of channels included in the first layer based on the obtained function value.

In particular, the processor 120 may add a layer including a learnable function to all layers included in the artificial neural network in order to reduce the size of the artificial neural network. However, this is only an embodiment, and when a user command selecting a specific layer is input through the input unit 130, the processor 120 may add a learnable function to only the selected layer. Further, the learnable function may be the sum of a first function that outputs 0 or 1 and a differentiable second function, and the second function may be the product of a function having a preset gradient and a differentiable function. The mathematical characteristics and proof of the function will be described in detail with reference to FIG. 4.

Also, the processor 120 may update the parameter values included in the second layer by training the artificial neural network. For example, the processor 120 may obtain the output data of the second layer by multiplying, for each channel, the output data of the first layer by the function value of the learnable function included in the second layer. For example, when the parameters included in the second layer are (W_{l,1}, W_{l,2}, …, W_{l,n}), the processor 120 may input (W_{l,1}, W_{l,2}, …, W_{l,n}) to the learnable function to obtain function values, and may obtain the output data of the second layer by multiplying the obtained function values by the first channel, the second channel, …, and the n-th channel of the first layer, respectively. In addition, the processor 120 may generate a loss function based on the obtained output data of the second layer. Then, the processor 120 may apply a stochastic gradient descent algorithm to the loss function to obtain a parameter value that minimizes the function value (loss value), and may update the existing parameter value with the obtained parameter value. However, the stochastic gradient descent algorithm is only one embodiment, and the electronic device 100 may apply various other algorithms (for example, a momentum algorithm, an Adagrad algorithm, or an Adam algorithm) to the loss function to obtain a parameter value that yields the minimum function value.
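As a non-limiting illustration of the training step described above, the following sketch performs one gradient descent update of the gate parameters on a toy squared-error loss with a size penalty. The gate form (unit step plus the sawtooth (M·w − floor(M·w))/M), the use of the sawtooth slope 1 as the gate's derivative, the toy loss, the learning rate, and all names are assumptions made for this sketch, not the method defined by the disclosure.

```python
import math

# Illustrative sketch only: gate form, toy loss, and hyperparameters are assumptions.

def gate(w, m=1024):
    """Unit step plus a sawtooth term bounded by about 1/m."""
    step = 1.0 if w > 0 else 0.0
    return step + (m * w - math.floor(m * w)) / m


def gate_grad(w):
    """Slope of the sawtooth term, used as the gate's derivative almost everywhere."""
    return 1.0


def loss_and_grads(parameters, channel_outputs, target, size_weight):
    """Toy loss: squared error of the gated sum plus a weighted sum of gate values."""
    gates = [gate(w) for w in parameters]
    prediction = sum(g * x for g, x in zip(gates, channel_outputs))
    loss = (prediction - target) ** 2 + size_weight * sum(gates)
    grads = [
        gate_grad(w) * (2.0 * (prediction - target) * x + size_weight)
        for w, x in zip(parameters, channel_outputs)
    ]
    return loss, grads


def sgd_step(parameters, grads, learning_rate):
    """Move each parameter against its gradient."""
    return [w - learning_rate * g for w, g in zip(parameters, grads)]


parameters = [0.5, -0.25]       # one gate parameter per channel
channel_outputs = [1.0, 2.0]    # toy per-channel outputs of the first layer
loss, grads = loss_and_grads(parameters, channel_outputs, 1.0, 0.1)
updated_parameters = sgd_step(parameters, grads, 0.1)
```

In this toy state the prediction already matches the target, so the only gradient comes from the size penalty and both parameters move slightly toward the negative side; in a real network the accuracy term would push back on channels that matter.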

Then, the processor 120 may obtain the function value by inputting the updated parameter value to the learnable function. For example, when the updated parameter value is negative, the processor 120 may obtain a function value of 0 by inputting the negative parameter value into the learnable function. In addition, when the updated parameter value is positive, the processor 120 may obtain a function value of 1 by inputting the positive parameter value into the learnable function. However, this is only an embodiment, and the processor 120 may input a negative or positive parameter value into the learnable function to obtain a value that differs from 0 or 1, respectively, by no more than a threshold.

Also, the processor 120 may update the first layer to the third layer by removing at least one channel among the plurality of channels included in the first layer based on the obtained function value. Specifically, when the obtained function value is 0, the processor 120 may remove the kernel of the first layer corresponding to the parameter input to the learnable function. For example, when the parameter input when a function value of 0 is obtained is W_{l,1}, the processor 120 may delete the first channel of the first layer. In addition, when the parameter input when a function value of 1 is obtained is W_{l,2}, the processor 120 may keep the second channel of the first layer. Then, the processor 120 may update the first layer to the third layer, in which each channel is removed or kept based on the function value.

Meanwhile, the processor 120 may remove the second layer and then update the first layer to the third layer by removing at least one channel among the plurality of channels included in the first layer based on the function value, but this is only an embodiment, and the processor 120 may instead remove the second layer after updating the first layer to the third layer.

Meanwhile, the processor 120 may obtain a function value by inputting an updated parameter value to the learnable function, may change a weight of the first kernel of the first layer based on the obtained function value, and may update the first kernel of the first layer to a second kernel including the changed weight. In one embodiment, when the first kernel of the first layer is implemented as a 3x3x64 matrix and the obtained function value is 0, the processor 120 may change to 0 the weight of the first kernel of the first layer that corresponds to the parameter input to the learnable function. For example, when the parameter that yields a function value of 0 corresponds to the weight at position (3,3,20) of the first kernel, the processor 120 may change the weight at position (3,3,20) of the first kernel to 0. In addition, when the parameter that yields a function value of 1 corresponds to the weight at position (3,3,20) of the first kernel, the processor 120 may keep the weight at position (3,3,20) of the first kernel.

Meanwhile, the processor 120 may be composed of one or a plurality of processors. In this case, the one or more processors 120 may be a general-purpose processor such as a central processing unit (CPU) or an application processor (AP), a graphics-dedicated processor such as a graphics processing unit (GPU) or a visual processing unit (VPU), or a neural processing unit (NPU).

The one or more processors control the processing of input data according to predefined operation rules or artificial intelligence models stored in the memory 110. The predefined operation rules or artificial intelligence models are characterized by being created through learning.

Here, being created through learning means that a predefined operation rule or an artificial intelligence model having desired characteristics is created by applying a learning algorithm to a plurality of pieces of learning data. Such learning may be performed on the device on which the artificial intelligence according to the present disclosure is performed, or may be performed through a separate server or system.

The learning algorithm is a method of training a predetermined target device (e.g., a robot) using a plurality of pieces of learning data so that the target device can make a decision or a prediction by itself. Examples of learning algorithms include supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning, and the learning algorithm in the present disclosure is not limited to the above-mentioned examples unless otherwise specified.

FIG. 2B is a block diagram illustrating in detail the configuration of the electronic device 100 according to an embodiment of the present disclosure. As illustrated in FIG. 2B, the electronic device 100 may include a memory 110, a processor 120, an input unit 130, a communication unit 140, a display 150, and an audio output unit 160. Meanwhile, since the memory 110 and the processor 120 have been described with reference to FIG. 2A, redundant description will be omitted.

The input unit 130 may receive various user inputs and transmit them to the processor 120. In particular, the input unit 130 may include a touch sensor, a (digital) pen sensor, a pressure sensor, a key, or a microphone. For the touch sensor, for example, at least one of capacitive, pressure-sensitive, infrared, and ultrasonic methods may be used. The (digital) pen sensor may be, for example, a part of the touch panel or may include a separate recognition sheet.

As an embodiment, a user command for selecting a first layer to reduce the number of channels among a plurality of layers included in the artificial neural network may be received through the input unit 130 and transmitted to the processor 120.

The communication unit 140 may communicate with an external device through various communication methods. The communication unit 140 may be connected to an external device through communication with a third device (eg, a repeater, a hub, an access point, a server, or a gateway).

Meanwhile, the communication unit 140 may include various communication modules to perform communication with an external device. For example, the communication unit 140 may include a wireless communication module that uses at least one of, for example, LTE, LTE Advanced (LTE-A), code division multiple access (CDMA), wideband CDMA (WCDMA), universal mobile telecommunications system (UMTS), Wireless Broadband (WiBro), or Global System for Mobile Communications (GSM). As another example, the wireless communication module may include at least one of, for example, wireless fidelity (WiFi), Bluetooth, Bluetooth Low Energy (BLE), and Zigbee.

The display 150 may display various information under the control of the processor 120. In particular, the display 150 may display an indicator that the first layer has been updated to the third layer under the control of the processor 120.

In addition, the display 150 may be implemented as a touch screen together with a touch panel. However, it is not limited to the above-described implementation, and the display 150 may be implemented differently according to the type of the electronic device 100.

The audio output unit 160 is a component that outputs various notification sounds or voice messages, as well as various audio data on which processing operations such as decoding, amplification, and noise filtering have been performed by an audio processing unit (not shown). As an embodiment, the audio output unit 160 may output a notification sound indicating that the first layer has been updated to the third layer.

The audio output unit 160 may be implemented as a speaker 160, but this is only an example, and may be implemented as another output terminal capable of outputting audio data.

FIG. 3 is a diagram for explaining a method of training an artificial neural network including a plurality of layers stored in the electronic device 100 according to an embodiment of the present disclosure.

As an embodiment, when receiving training input data 310 from a user or an external device, the electronic device 100 may acquire the output data 330 by inputting the input data 310 to the first layer 320. The input data 310 may be video data or audio data, but is not limited thereto. In addition, the output data 330 may be a feature map obtained by performing a convolution operation between the kernel of the first layer 320 and the input data 310. Meanwhile, the input data 310 and the output data 330 illustrated in FIG. 3 may be implemented as multi-dimensional tensors, but this is only an embodiment, and they may be implemented in various forms (for example, vectors).

The electronic device 100 may obtain the masked output data 340 by multiplying the output data 330, for each channel, by the function value obtained by inputting the learnable parameter 370 into the learnable function 380. For example, when the parameter W_{1,i} of the function value to be multiplied by the i-th output data is negative, the electronic device 100 obtains a function value of 0 by inputting the negative parameter value to the learnable function, and the obtained function value of 0 can be multiplied by the i-th output data. When the parameter W_{1,i} of the function value to be multiplied by the i-th output data is positive, the electronic device 100 obtains a function value of 1 by inputting the positive parameter value to the learnable function, and the obtained function value of 1 can be multiplied by the i-th output data. Accordingly, the electronic device 100 may obtain the masked output data 340 by multiplying each channel of the output data 330 by the function value 0 or 1. Meanwhile, the learnable function 380 and the learnable parameter 370 may be included in the second layer added to the first layer.
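The channel-wise masking described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the names `unit_step`, `mask_channels`, `output_330`, `params_370`, and `masked_340` are hypothetical, the feature maps are plain nested lists, and the learnable function is approximated by an ideal step. Treating a parameter of exactly 0 as "not positive" is an assumption, since the text only covers negative and positive values.

```python
from typing import List

Channel = List[List[float]]  # one 2-D feature map per channel (hypothetical layout)


def unit_step(w: float) -> float:
    """Return 1.0 for a positive parameter, otherwise 0.0 (w == 0 maps to 0.0 by assumption)."""
    return 1.0 if w > 0 else 0.0


def mask_channels(output: List[Channel], params: List[float]) -> List[Channel]:
    """Multiply every value of channel i by the function value of params[i]."""
    if len(output) != len(params):
        raise ValueError("need exactly one parameter per channel")
    masked = []
    for channel, w in zip(output, params):
        m = unit_step(w)
        masked.append([[m * v for v in row] for row in channel])
    return masked


# Toy output data with two channels and one learnable parameter per channel.
output_330 = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
params_370 = [-0.3, 0.8]  # negative -> channel zeroed, positive -> channel kept
masked_340 = mask_channels(output_330, params_370)
```

In this toy case the first channel is multiplied by 0 and the second channel passes through unchanged.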

In addition, the learnable function 380 is a function obtained by adding a differentiable second function to a first function that outputs a discretized value, such as 0 or 1, or -1 or 1, and the second function may be a function obtained by multiplying a differentiable function by a function having a predetermined gradient. The mathematical properties and proofs related to the learnable function will be described in detail with reference to FIG. 4.

Then, as an example, the electronic device 100 may obtain a loss function based on the masked output data 340 and obtain the parameter that minimizes the function value (loss value) by applying a stochastic gradient descent algorithm to the loss function. Since the method of obtaining a parameter that minimizes the function value of a loss function through a stochastic gradient descent algorithm is a known technique, a detailed description will be omitted. Then, the electronic device 100 may update the existing parameter with the newly acquired parameter (360).

Then, when additional input data 310 is input, the electronic device 100 may acquire output data by inputting the input data to the first layer, and may acquire masked output data by multiplying the acquired output data, for each channel, by the function value obtained by inputting the updated parameter 370 into the learnable function included in the second layer. Thereafter, a loss function may be obtained based on the masked output data as described above, and a parameter value that minimizes the loss function value may be obtained by applying a stochastic gradient descent algorithm to the loss function. Then, the electronic device 100 may update the existing parameters with the acquired parameters. That is, the electronic device 100 may train the artificial neural network and update the parameter values as illustrated in FIG. 3.
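One update of the mask parameters can be sketched as below. This is a toy under stated assumptions rather than the disclosed training procedure: each channel's output is reduced to a single scalar, the loss is a squared error against a hypothetical target, the forward pass uses an ideal 0/1 step, and the backward pass uses an assumed constant slope of 1 in place of the step's zero derivative. All names (`step`, `gradient_shape`, `loss`, `sgd_step`) are hypothetical.

```python
from typing import List


def step(w: float) -> float:
    """Forward value of the mask: 1.0 for a positive parameter, else 0.0."""
    return 1.0 if w > 0 else 0.0


def gradient_shape(w: float) -> float:
    """Assumed predetermined slope used in the backward pass (constant 1 here)."""
    return 1.0


def loss(params: List[float], channel_values: List[float], target: float) -> float:
    """Squared error of the masked sum of channel values (toy loss, an assumption)."""
    pred = sum(step(w) * y for w, y in zip(params, channel_values))
    return (pred - target) ** 2


def sgd_step(params: List[float], channel_values: List[float],
             target: float, lr: float) -> List[float]:
    """Return the parameters after one gradient descent step on one sample."""
    pred = sum(step(w) * y for w, y in zip(params, channel_values))
    error = pred - target
    new_params = []
    for w, y in zip(params, channel_values):
        grad_mask = 2.0 * error * y             # d(loss) / d(mask_i)
        grad_w = grad_mask * gradient_shape(w)  # chain rule through the mask
        new_params.append(w - lr * grad_w)
    return new_params


params = [0.5, 0.5]           # both channels start unmasked
channel_values = [1.0, 2.0]   # scalar stand-ins for the channel outputs
target = 1.0
updated = sgd_step(params, channel_values, target, lr=0.1)
```

With these toy numbers, the parameter of the second channel becomes negative after one step, so that channel is masked on the next forward pass and the toy loss drops from 4.0 to 0.0.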

FIG. 4 is a diagram for describing a learnable function included in a second layer of an electronic device according to an embodiment of the present disclosure.

The learnable function 450 included in the second layer is a function obtained by adding a differentiable second function 420 to a first function 410 that outputs 0 or 1. As illustrated in FIG. 4, the first function 410 may be a unit step function that outputs 0 when a negative parameter value is input and 1 when a positive parameter value is input. However, this is only an example, and the first function 410 may be implemented as another function that outputs 0 or 1.

As an example embodiment, when the first function 410 is implemented as a unit step function, its derivative is 0 for an arbitrary parameter value, so the electronic device 100 cannot train the parameter by applying to the loss function a stochastic gradient descent algorithm, which relies on derivatives. Accordingly, a differentiable second function 420 can be added to the unit step function. The second function 420 may be a function obtained by multiplying a differentiable function 430 by a function (or derivative shape) 440 having a predetermined slope. The differentiable function 430 may be implemented as a function having a limited range of output values and a positive derivative, and may include a sawtooth function as an example. As an example, the differentiable function 430 implemented as a sawtooth function is expressed by Equation 1.

M in Equation 1 is a predetermined positive integer, and the slope value for any w (parameter) in Equation 1 is 1.

When the learnable function 450 and the function 440 having a predetermined slope are defined, the learnable function is expressed by Equation 2.

In addition, when the learnable function 450 is implemented as in Equation 2 and the value of M exceeds a threshold value (for example, 10^5), the learnable function 450 can output, for an arbitrary w (parameter), a function value that is equal to that of the first function or differs from it by an error within a threshold range. In addition, the learnable function may output a slope value that is equal to that of the function 440 having a predetermined slope or differs from it by an error within the threshold range.

The proof of the characteristic that, when M exceeds the threshold, the learnable function 450 outputs, for an arbitrary w (parameter), a function value that is equal to that of the first function or differs from it by an error within the threshold range is given by Equation 3.

The proof of the characteristic that, when M exceeds the threshold, the learnable function 450 outputs, for an arbitrary w (parameter), a slope value that is equal to that of the function having a predetermined slope or differs from it by an error within the threshold range is given by Equation 4.
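As an illustration of why both properties can hold, the following sketch uses assumed notation that is not taken from the disclosure: b(w) for the unit step, s^{(M)}(w) for one sawtooth that has slope 1 and a range bounded by 1/M, and g(w) for a bounded, differentiable function having the predetermined slope. It shows one construction consistent with the description, not necessarily the disclosed equations.

```latex
% Assumed sawtooth: slope 1 wherever Mw is not an integer, values in [0, 1/M)
\[ s^{(M)}(w) = \frac{Mw - \lfloor Mw \rfloor}{M}, \qquad 0 \le s^{(M)}(w) < \frac{1}{M} \]

% Assumed learnable function: step plus sawtooth times gradient shape
\[ b^{(M)}(w) = b(w) + s^{(M)}(w)\, g(w) \]

% Value: the correction term vanishes as M grows
\[ \left| b^{(M)}(w) - b(w) \right| = s^{(M)}(w)\, |g(w)| \le \frac{|g(w)|}{M} \xrightarrow[M \to \infty]{} 0 \]

% Slope (for w \ne 0 and Mw not an integer, where b'(w) = 0 and (s^{(M)})'(w) = 1)
\[ \frac{d}{dw} b^{(M)}(w) = g(w) + s^{(M)}(w)\, g'(w), \qquad
   \left| \frac{d}{dw} b^{(M)}(w) - g(w) \right| \le \frac{|g'(w)|}{M} \xrightarrow[M \to \infty]{} 0 \]
```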

According to an embodiment of the present disclosure, since the M value of the second function exceeds the threshold, the learnable function 450 may output, for an arbitrary w (parameter), a function value that is equal to that of the first function or differs from it by an error within the threshold range, and may output a slope value that is equal to that of the function 440 having a predetermined slope or differs from it by an error within the threshold range. Therefore, when a negative or positive parameter is input to the learnable function 450, the electronic device 100 may output a function value of 0 or 1, respectively, or a function value that differs from 0 or 1 by an error within the threshold range. In addition, since the derivative of the learnable function 450 is not 0, the parameter value included in the second layer can be updated by training the artificial neural network with a stochastic gradient descent method.
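The behaviour described above can be checked numerically with a small sketch. The sawtooth form, the gradient shape g(w) = 1 / (1 + w^2), the value M = 10^6, and all function names are assumptions chosen for illustration; the disclosure only states that M is a positive integer above a threshold such as 10^5.

```python
import math

M = 10 ** 6  # assumed value above the example threshold of 10^5


def unit_step(w: float) -> float:
    """First function: 0.0 for non-positive input, 1.0 for positive input."""
    return 1.0 if w > 0 else 0.0


def sawtooth(w: float, m: int = M) -> float:
    """Assumed sawtooth with slope 1 and values in [0, 1/m]."""
    return (m * w - math.floor(m * w)) / m


def gradient_shape(w: float) -> float:
    """Assumed bounded, differentiable function having the predetermined slope."""
    return 1.0 / (1.0 + w * w)


def gradient_shape_prime(w: float) -> float:
    """Derivative of gradient_shape."""
    return -2.0 * w / (1.0 + w * w) ** 2


def learnable(w: float) -> float:
    """Step plus (sawtooth times gradient shape)."""
    return unit_step(w) + sawtooth(w) * gradient_shape(w)


def learnable_grad(w: float) -> float:
    """Product-rule derivative, valid where the step and the floor are locally constant."""
    return gradient_shape(w) + sawtooth(w) * gradient_shape_prime(w)


samples = [-2.0, -0.37, 0.25, 1.5]
value_errors = [abs(learnable(w) - unit_step(w)) for w in samples]
slope_errors = [abs(learnable_grad(w) - gradient_shape(w)) for w in samples]
```

For these samples both error lists stay at or below roughly 1/M, while `learnable_grad` remains nonzero, which is what allows gradient-based training of the parameter.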

FIG. 5 is a view for explaining experimental results related to the electronic device 100 according to an embodiment of the present disclosure. Specifically, FIG. 5 compares the experimental results (the compression ratio and the accuracy of computation after compression) of the method by which the electronic device 100 compresses the first layer in the present disclosure with those of other methods. The compression ratio, measured in FLOPs (floating point operations), is obtained by dividing the computation amount of the existing artificial neural network by the computation amount of the artificial neural network compressed by adding the learnable second layer. Therefore, the larger the compression ratio, the more the artificial neural network is compressed.
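The compression ratio defined above is a simple quotient, sketched below for a single convolution layer. The layer shape and the helper names `conv_flops` and `compression_ratio` are hypothetical, and counting multiply-accumulate operations (ignoring bias) as a proxy for FLOPs is an assumption; the figures are not taken from FIG. 5.

```python
def conv_flops(in_channels: int, out_channels: int, kernel_size: int,
               out_height: int, out_width: int) -> int:
    """Multiply-accumulate count of one square-kernel convolution layer (bias ignored)."""
    return (in_channels * out_channels * kernel_size * kernel_size
            * out_height * out_width)


def compression_ratio(flops_before: int, flops_after: int) -> float:
    """Computation of the original network divided by that of the compressed one."""
    if flops_after <= 0:
        raise ValueError("compressed network must have a positive operation count")
    return flops_before / flops_after


# Hypothetical layer: removing half of the 16 output channels halves the operations.
before = conv_flops(16, 16, 3, 32, 32)
after = conv_flops(16, 8, 3, 32, 32)
ratio = compression_ratio(before, after)
```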

FIG. 5 shows experimental results obtained when the electronic device 100 compresses the ResNet-56 (v2) artificial neural network structure on the CIFAR-10 data set in various ways. Method (a) compresses the artificial neural network by obtaining the sum of the absolute values of the weights for each channel of the kernel of a layer included in the artificial neural network, and removing the kernel of the channel having the smallest sum of absolute values. Method (b) compresses the artificial neural network by multiplying each channel of a layer by a learned weight having a real value between 0 and 1, and removing channels having a small value. Method (c) compresses the artificial neural network using the above-described control method of the electronic device 100.

As shown in FIG. 5, the compression ratio of the artificial neural network with method (a) is 1.21 times, whereas with method (c) it is 2 times. With method (b), the compression ratio of the artificial neural network is also 2 times. However, with method (b) the accuracy of computation after compression of the artificial neural network decreased by 1%, whereas with method (c) it decreased by 0.46%. That is, the experimental results of FIG. 5 show that method (c) achieves a higher compression ratio than method (a) and a higher computation accuracy after compression than method (b).

FIG. 6 is a flowchart illustrating a control method of the electronic device 100 according to an embodiment of the present disclosure.

First, the electronic device 100 may connect a second layer including a learnable function to a first layer of an artificial neural network including a plurality of layers (S610). As an embodiment, the electronic device 100 may connect a layer including a learnable function to each of the layers included in the artificial neural network. As another embodiment, when a user command for selecting a layer whose number of channels is to be reduced is input, the electronic device 100 may connect a layer including a learnable function to the selected layer. The learnable function is a function obtained by adding a differentiable second function to a first function that outputs 0 or 1, and the second function is a function obtained by multiplying a differentiable function by a function having a predetermined gradient. Since the mathematical properties of the function have been described with reference to FIG. 4, redundant description will be omitted.

The electronic device 100 may train the artificial neural network and update the parameter values included in the second layer (S620). Specifically, the electronic device 100 may obtain the output data of the second layer by multiplying the output data of the first layer by the function value for each channel, generate a loss function based on the obtained output data of the second layer, and obtain a parameter that outputs the minimum function value of the generated loss function. For example, the electronic device 100 may obtain a parameter that outputs the minimum function value of the loss function by applying a stochastic gradient descent method to the loss function, and update the existing parameter with the acquired parameter.

Then, the electronic device 100 may obtain the function value by inputting the updated parameter value to the learnable function (S630). Then, the electronic device 100 may update the first layer to a third layer by removing at least one channel among the plurality of channels included in the first layer based on the obtained function value (S640). In an embodiment, when the updated parameter value is negative, the electronic device 100 may obtain a function value of 0 by inputting the negative parameter value into the learnable function, and remove the kernel of the first layer corresponding to the parameter. For example, when the negative parameter input to the learnable function is W_{1,i}, the electronic device 100 may acquire the function value 0 and remove the i-th channel of the first layer. If the positive parameter input to the learnable function is W_{1,i}, the electronic device 100 may acquire the function value 1 and maintain the i-th channel of the first layer.

Then, the electronic device 100 may remove the second layer added to the first layer and remove at least one channel among the plurality of channels included in the first layer based on the function value, thereby obtaining a third layer in which the channels of the first layer have been pruned. However, this is only an embodiment, and the second layer may instead be removed after the first layer has been updated to the third layer.
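Steps S630 and S640 can be sketched as follows. This is an illustrative simplification, not the disclosed implementation: each output channel's kernel is represented by a flat list of weights, the learnable function is approximated by an ideal step, and the names `unit_step`, `prune_first_layer`, `first_layer`, and `second_layer_params` are hypothetical.

```python
from typing import List, Tuple

Kernel = List[float]  # flattened weights of one output channel (hypothetical layout)


def unit_step(w: float) -> float:
    """Function value for a trained parameter: 1.0 if positive, else 0.0."""
    return 1.0 if w > 0 else 0.0


def prune_first_layer(kernels: List[Kernel],
                      params: List[float]) -> Tuple[List[Kernel], List[int]]:
    """Return the kernels whose function value is 1 and the indices of the kept channels."""
    if len(kernels) != len(params):
        raise ValueError("need exactly one parameter per channel")
    kept = [i for i, w in enumerate(params) if unit_step(w) == 1.0]
    third_layer = [kernels[i] for i in kept]
    return third_layer, kept


# Four-channel toy first layer and the trained parameters of the second layer.
first_layer = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.6], [0.7, 0.8]]
second_layer_params = [0.9, -0.1, 0.2, -1.4]

# The mask parameters are only consulted here; the returned layer no longer needs them.
third_layer, kept_channels = prune_first_layer(first_layer, second_layer_params)
```

In a complete network, the kernels of the following layer would also need to drop the input channels that correspond to the removed output channels; the sketch leaves that step out.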

Meanwhile, various embodiments of the present disclosure are described with reference to the accompanying drawings. However, this is not intended to limit the technology described in this disclosure to specific embodiments, and it should be understood that it includes various modifications, equivalents, and/or alternatives of embodiments of the present disclosure. In connection with the description of the drawings, similar reference numerals may be used for similar elements.

In the present disclosure, expressions such as “have,” “can have,” “includes,” or “can include” indicate the presence of a corresponding feature (e.g., a component such as a numerical value, function, operation, or part) and do not exclude the presence of additional features.

In the present disclosure, expressions such as “A or B,” “at least one of A or/and B,” or “one or more of A or/and B” may include all possible combinations of the items listed together. For example, “A or B,” “at least one of A and B,” or “at least one of A or B” may refer to all cases in which (1) at least one A is included, (2) at least one B is included, or (3) both at least one A and at least one B are included.

Expressions such as "first," "second," "1st," or "2nd," as used in the present disclosure may modify various components regardless of order and/or importance, and are used only to distinguish one component from other components, without limiting the components.

When a component (e.g., a first component) is referred to as being "(functionally or communicatively) coupled with/to" or "connected to" another component (e.g., a second component), it should be understood that the component may be directly connected to the other component or may be connected through yet another component (e.g., a third component). On the other hand, when a component (e.g., the first component) is referred to as being “directly coupled” or “directly connected” to another component (e.g., the second component), it can be understood that there is no other component (e.g., a third component) between the component and the other component.

The expression "configured to" as used in the present disclosure may be used interchangeably with, depending on the situation, for example, "having the capacity to," "designed to," "adapted to," "made to," or "capable of." The term "configured (or set) to" does not necessarily mean only "specifically designed to" in hardware. Instead, in some situations, the expression "device configured to" may mean that the device is "capable of" operating together with other devices or parts. For example, the phrase “subprocessor configured (or set) to perform A, B, and C” may mean a dedicated processor (e.g., an embedded processor) for performing the corresponding operations, or a general-purpose processor (e.g., a CPU or application processor) capable of performing the corresponding operations by executing one or more software programs stored in a memory device.

An electronic device according to various embodiments of the present disclosure may include, for example, at least one of a smart phone, a tablet PC, a desktop PC, a laptop PC, a netbook computer, a server, a PDA, a medical device, or a wearable device. In some embodiments, the electronic device may include at least one of a television, a refrigerator, an air conditioner, an air purifier, a set top box, or a media box (e.g., Samsung HomeSync™, Apple TV™, or Google TV™).

Meanwhile, the term user may refer to a person using an electronic device or a device using an electronic device (eg, an artificial intelligence electronic device). Hereinafter, the present disclosure will be described in more detail with reference to the drawings.

Various embodiments of the present disclosure may be implemented as software including instructions stored in a machine-readable (e.g., computer-readable) storage medium. The machine is a device that can call the stored instructions from the storage medium and operate according to the called instructions, and may include an electronic device (e.g., the electronic device 100) according to the disclosed embodiments. When an instruction is executed by a processor, the processor may perform the function corresponding to the instruction directly, or by using other components under the control of the processor. The instructions may include code generated or executed by a compiler or an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, 'non-transitory' means only that the storage medium does not include a signal and is tangible, and does not distinguish between data being stored semi-permanently or temporarily in the storage medium.

According to an embodiment, a method according to various embodiments disclosed in the present disclosure may be provided as being included in a computer program product. Computer program products can be traded between sellers and buyers as products. The computer program product may be distributed in the form of a device-readable storage medium (e.g., compact disc read only memory (CD-ROM)) or online through an application store (e.g., Play Store™). In the case of online distribution, at least a portion of the computer program product may be at least temporarily stored on a storage medium such as a memory of a manufacturer's server, an application store's server, or a relay server, or may be temporarily generated.

Each component (e.g., a module or a program) according to various embodiments may be composed of a single entity or a plurality of entities, and some of the aforementioned sub-components may be omitted, or other sub-components may be further included in various embodiments. Alternatively or additionally, some components (e.g., modules or programs) may be integrated into one entity that performs the same or similar functions performed by each corresponding component before integration. According to various embodiments, operations performed by a module, a program, or another component may be executed sequentially, in parallel, repeatedly, or heuristically, or at least some operations may be executed in a different order or omitted, or other operations may be added.

Claims

An electronic device comprising:

a memory that stores at least one instruction; and

a processor connected to the memory to control the electronic device,

wherein the processor, by executing the at least one instruction:

adds a second layer including a learnable function to a first layer of an artificial neural network including a plurality of layers,

trains the artificial neural network to update a parameter value included in the second layer,

acquires a function value by inputting the updated parameter value into the learnable function, and

updates the first layer to a third layer by removing at least one channel among a plurality of channels included in the first layer based on the acquired function value.
The electronic device of claim 1,

wherein the learnable function is a function obtained by adding a differentiable second function to a first function that outputs 0 or 1, and

the second function is a function obtained by multiplying a differentiable function by a function having a predetermined gradient.
The electronic device of claim 1,

wherein the processor:

obtains output data of the second layer by multiplying output data of the first layer by the function value channel-wise,

generates a loss function based on the output data of the first layer and the output data of the second layer, and

acquires a parameter outputting a minimum function value of the loss function.
The electronic device of claim 3,

wherein the processor

acquires the parameter outputting the minimum function value of the loss function by applying a stochastic gradient descent method to the loss function.
The electronic device of claim 3,

wherein the processor

updates a parameter included in the second layer with the parameter outputting the minimum function value of the loss function.
The electronic device of claim 5,

wherein the processor:

when the updated parameter value is negative, obtains a function value of 0 by inputting the negative parameter value into the learnable function, and

when the updated parameter value is positive, obtains a function value of 1 by inputting the positive parameter value into the learnable function.
The electronic device of claim 5,

wherein the processor:

when the acquired function value is 0, removes the channel of the first layer corresponding to the parameter input to the learnable function, and

when the acquired function value is 1, maintains the channel of the first layer corresponding to the parameter input to the learnable function.
The electronic device of claim 1,

wherein the processor

removes the second layer and removes the channel of the first layer based on the acquired function value, thereby updating the first layer to the third layer.
The electronic device of claim 1,

wherein the processor:

acquires the function value by inputting the updated parameter value into the learnable function,

changes a weight of a first kernel of the first layer based on the acquired function value, and

updates the first kernel of the first layer to a second kernel including the changed weight.
The electronic device of claim 9,

wherein the processor:

when the acquired function value is 0, changes to 0 the weight of the first kernel of the first layer corresponding to the parameter input to the learnable function, and

when the acquired function value is 1, maintains the weight of the first kernel of the first layer corresponding to the parameter input to the learnable function.
A control method of an electronic device, the method comprising:

adding a second layer including a learnable function to a first layer of an artificial neural network including a plurality of layers;

updating a parameter value included in the second layer by training the artificial neural network;

obtaining a function value by inputting the updated parameter value into the learnable function; and

updating the first layer to a third layer by removing at least one channel among a plurality of channels included in the first layer based on the obtained function value.
The method of claim 11,

wherein the learnable function is a function obtained by adding a differentiable second function to a first function that outputs 0 or 1, and

the second function is a function obtained by multiplying a differentiable function by a function having a predetermined gradient.
The method of claim 11,

wherein the updating of the parameter value comprises:

obtaining output data of the second layer by multiplying output data of the first layer by the function value channel-wise;

generating a loss function based on the output data of the first layer and the output data of the second layer; and

obtaining a parameter outputting a minimum function value of the loss function.
The method of claim 13,

wherein the updating of the parameter value comprises

obtaining the parameter outputting the minimum function value of the loss function by applying a stochastic gradient descent method to the loss function.
The method of claim 13,

wherein the updating of the parameter value comprises

updating a parameter included in the second layer with the parameter outputting the minimum function value of the loss function.