WO2022097954A1 - Neural network calculation method and neural network weight generation method - Google Patents

Neural network calculation method and neural network weight generation method

Info

Publication number
WO2022097954A1
WO2022097954A1 PCT/KR2021/014367
Authority
WO
WIPO (PCT)
Prior art keywords
layer
neural network
value
output
input
Prior art date
Application number
PCT/KR2021/014367
Other languages
English (en)
Korean (ko)
Inventor
정태영
Original Assignee
오픈엣지테크놀로지 주식회사
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 오픈엣지테크놀로지 주식회사
Publication of WO2022097954A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • the present invention relates to computing technology, and more particularly, to an operation structure inside a neural network and a method of generating parameters provided therefor.
  • The neural network 2 has a structure including an input layer, hidden layers, and an output layer; it performs an operation based on received input data (I1 and I2) and can produce output data (O1 and O2) based on the result of that operation.
  • Each of the layers included in the neural network 2 may include a plurality of channels.
  • a channel may correspond to a plurality of artificial nodes known as a neuron, a processing element (PE), a unit, or similar terms.
  • Channels included in each of the layers of the neural network 2 may be connected to each other to process data.
  • one channel may perform an operation by receiving data from other channels, and may output an operation result to other channels.
  • Data input to and output from each of the channels may be referred to as an input activation and an output activation. That is, an activation may be both the output of one channel and a parameter corresponding to an input of the channels included in the next layer. Meanwhile, each of the channels may determine its own activation based on the activations and weights received from the channels included in the previous layer.
  • the weight is a parameter used to calculate output activation in each layer, and may be a value assigned to a connection relationship between layers.
  • The activation of one input datum can have c values for each coordinate (two-dimensional, x and y, in the case of an image), and the axis along which these c values are arranged can be expressed as the channel axis.
  • one input may be composed of (coordinates*channel) data.
  • Activation for image processing can be composed of four dimensions, such as (batch, x, y, channel).
  • a batch may mean a dimension representing input 0, input 1, input 2, and the like.
  • Each of the channels may be processed by a computational unit or processing element that receives an input and outputs an output activation, and an input-output of each of the channels may be mapped.
  • When σ is an activation function, w^i_jk is the weight from the k-th channel included in the (i-1)-th layer to the j-th channel included in the i-th layer, b^i_j is the bias of the j-th channel included in the i-th layer, and a^i_j is the activation of the j-th channel of the i-th layer, the activation may be calculated as:

  $a^{i}_{j} = \sigma\left(\sum_{k}\left(w^{i}_{jk} \times a^{i-1}_{k}\right) + b^{i}_{j}\right)$ ... [Equation 1]

  • For example, the activation of the first channel CH1 of the second layer (Layer 2) may be expressed as a^2_1.
  • Equation 1 described above is only an example for explaining the activation and weight used to process data in the neural network 2 , and is not limited thereto.
  • An activation may be a value obtained by passing the sum of the weighted activations received from the previous layer through an activation function such as a Rectified Linear Unit (ReLU).
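  • As a minimal illustration of [Equation 1] (an illustrative sketch only, not part of the claimed subject matter; the array shapes, the function name, and the use of NumPy are assumptions), the activations of one layer with ReLU as the activation function σ could be computed as follows:

```python
import numpy as np

def layer_forward(a_prev, W, b):
    """Compute a^i = sigma(W @ a^{i-1} + b) for one layer, as in [Equation 1]."""
    # a_prev: activations a^{i-1}_k of the previous layer, shape (K,)
    # W: weights w^i_{jk} from channel k of layer i-1 to channel j of layer i, shape (J, K)
    # b: biases b^i_j of the i-th layer, shape (J,)
    z = W @ a_prev + b
    return np.maximum(z, 0.0)  # ReLU used as the activation function sigma
```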
  • The neural network quantization apparatus proposed in the above-mentioned prior art performs various processing functions, such as generating a neural network, training (or learning) a neural network, quantizing a floating-point neural network into a fixed-point neural network, and retraining a neural network.
  • a trained neural network can be generated by repeatedly training (learning) a given initial neural network.
  • the initial neural network may have floating-point type parameters in order to secure processing accuracy of the neural network.
  • Floating-point operations require a relatively large amount of computation and a high memory-access frequency compared to fixed-point operations.
  • processing of a neural network having floating-point type parameters may not be smooth in a mobile device, an embedded device, or the like, which has relatively low processing performance.
  • the floating-point type parameters are preferably quantized.
  • parameter quantization means converting a floating-point type parameter into a fixed-point type parameter.
  • the neural network quantization apparatus performs quantization by converting parameters of a trained neural network into a fixed-point type of predetermined bits, and transmits the quantized neural network to a device to be employed.
  • the performance or efficiency of the hardware accelerator may be improved.
  • An object of the present invention is to provide a technique capable of reducing hardware complexity of a computing device processing a neural network, increasing processing speed, or improving resource utilization.
  • I A denotes an input activation
  • O A denotes an output activation
  • w denotes a weight parameter
  • a hardware accelerator may be implemented to convert an input activation, an output activation, and a weight into integers in a fixed point form and process the operation through an integer operator.
  • I_A, O_A, and w may be defined as shown in FIG. 3A: each floating-point value is divided by its scale value (S_I, S_O, or S_w, respectively) and quantized into an n-bit integer (q_I, q_O, or q_w).
  • The scale value is determined so that the q value has sufficient resolution in the form of an n-bit integer, and the floating-point value can be converted into an integer fixed-point format through this divide-by-scale and quantize process.
  • The S_I, S_w, and S_O values are used in common within one layer or within a specific channel of one activation; the same applies to q_S, which is used in common within a layer or within a specific channel of one activation.
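  • The divide-by-scale and quantize process described above can be sketched as follows (an illustrative sketch only; the rounding rule, the signed n-bit clipping range, and the function names are assumptions not specified in this document):

```python
import numpy as np

def quantize(x, scale, n_bits):
    """q = quantize(x / scale): map a real value to an n-bit signed integer."""
    q = np.round(np.asarray(x, dtype=np.float64) / scale)
    q_max = 2 ** (n_bits - 1) - 1
    return np.clip(q, -q_max - 1, q_max).astype(np.int64)

def dequantize(q, scale):
    """Approximate reconstruction of the real value: x is roughly scale * q."""
    return scale * np.asarray(q, dtype=np.float64)

# Example: I_A is represented as S_I * q_I, w as S_w * q_w, and O_A as S_O * q_O.
q_I = quantize(0.37, scale=0.01, n_bits=8)   # -> 37
```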
  • a fixed-point-based quantized arithmetic method of the form shown in FIG. 4 is provided as a method for making simpler hardware while exhibiting an operation effect similar to that shown in FIGS. 2, 3A, and 3B.
  • Since one multiplier can be eliminated for each multiplication, there is an advantage in that hardware of a smaller size can be built.
  • the present invention includes a quantization algorithm for generating a weight parameter q' w for the above quantization operation.
  • This algorithm pre-multiplies the weight by S_I/S_O and then converts the result to a fixed-point integer.
  • According to one aspect, a computing device may be provided that comprises a data operation unit executing an operation of a specific layer of an integer-type neural network including an input layer, an intermediate layer, and an output layer, and an internal memory providing data for the operation to the data operation unit.
  • The specific layer is a first layer belonging to the intermediate layer part or the output layer.
  • The first node included in the first layer may include: a set of multipliers, each of which multiplies one of a set of input activations input to the first node by the corresponding one of a set of weights; an adder that adds the outputs of the set of multipliers to each other; and a shift unit that converts the output of the adder to generate the output activation of the first node.
  • each of the set of input activations and the set of weights may be n-bit integer data, and the shift unit may be configured to right-shift the output of the adder by n bits.
  • A second node included in the input layer may be adapted to convert an activation having a real value, which is input to the second node, into an activation having an integer value using a predetermined input scaling factor.
  • the computing device may further include a scaling unit that converts and outputs the activation output by the output layer using a predetermined output scaling factor.
  • The input scaling factor, the output scaling factor, and the set of weights may be provided to the computing device by another computing device. The other computing device may be configured to use information about an original neural network having a structure corresponding to the integer neural network, to generate the input scaling factor assigned to the input layer of the original neural network, to generate the output scaling factor assigned to the output layer of the original neural network, and to generate scaling factors assigned to each of the intermediate layers defined between the input layer and the output layer in the original neural network. In the original neural network, the L-th scaling factor assigned to the L-th layer directly upstream of the (L+1)-th layer including the node corresponding to the first node is divided by the (L+1)-th scaling factor assigned to the (L+1)-th layer to calculate a first value; the weight (w.ab^Layer_L) assigned to the link (link.ab^Layer_L) connected from the node having index b of the L-th layer to the node having index a of the (L+1)-th layer is multiplied by the first value to calculate a second value; the second value is multiplied by 2^n to calculate a third value; and the third value is approximated to an integer value to generate a fourth value, which may be the weight (q'_w.ab^Layer_L) of the integer neural network corresponding to the weight (w.ab^Layer_L).
  • According to another aspect, a neural network driving method may be provided in which a terminal including a processing unit and a storage unit drives a neural network to generate an output activation of a first node included in the intermediate layer part or the output layer of an integer neural network including an input layer, an intermediate layer part, and an output layer.
  • The neural network driving method may include: an acquiring step in which the processing unit obtains a set of input activations from the storage unit and obtains a set of weights corresponding to the set of input activations from the storage unit; a multiplication step in which the processing unit multiplies mutually corresponding values of the set of weights and the set of input activations to calculate a set of first values; an adding step in which the processing unit adds the set of first values to each other to calculate a second value; and a shifting step in which the processing unit calculates the output activation of the first node by right-shifting the second value by n bits.
  • each of the set of input activations and the set of weights may be n-bit integer data.
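  • A minimal software sketch of the acquiring, multiplication, adding, and shifting steps described above might look as follows (an illustrative sketch only; the bias is added before the shift, as in Equations 2 to 4 below, and Python's arithmetic right shift is assumed to stand in for the hardware shift unit):

```python
def node_forward(q_inputs, q_weights, bias, n):
    """Integer forward computation of one node of the intermediate layer part or output layer."""
    # Multiplication step: one integer product per (input activation, weight) pair.
    first_values = [q_i * q_w for q_i, q_w in zip(q_inputs, q_weights)]
    # Adding step: sum the products into a single accumulated value (plus the integer bias).
    second_value = sum(first_values) + bias
    # Shifting step: right-shift by n bits to obtain the node's output activation.
    return second_value >> n
```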
  • the method of driving the neural network includes: converting, by the processor, an activation having a real value input to a second node belonging to the input layer into an activation having an integer value using a predetermined input scaling factor; and converting, by the processing unit, the activation output from the output layer using a predetermined output scaling factor.
  • the input scaling factor, the output scaling factor, and the set of weights may be provided by the terminal from another computing device.
  • The other computing device may be configured to use information about an original neural network having a structure corresponding to the integer neural network, to generate the input scaling factor assigned to the input layer of the original neural network, to generate the output scaling factor assigned to the output layer of the original neural network, and to generate scaling factors assigned to each of the intermediate layers defined between the input layer and the output layer in the original neural network.
  • In the original neural network, the L-th scaling factor allocated to the L-th layer directly upstream of the (L+1)-th layer including the node corresponding to the first node is divided by the (L+1)-th scaling factor allocated to the (L+1)-th layer to calculate a first value.
  • The weight (w.ab^Layer_L) assigned to the link (link.ab^Layer_L) connected from the node having index b of the L-th layer to the node having index a of the (L+1)-th layer is multiplied by the first value to calculate a second value.
  • A third value is calculated by multiplying the second value by 2^n.
  • A fourth value is generated by approximating the third value to an integer value.
  • The fourth value may be the weight (q'_w.ab^Layer_L) of the integer neural network corresponding to the weight (w.ab^Layer_L).
  • According to another aspect, an integer neural network information processing method may be provided, including: a step in which a server generates an input scaling factor assigned to the input layer of a neural network having an input layer, an intermediate layer part, and an output layer, generates an output scaling factor assigned to the output layer, and generates scaling factors assigned to each layer of the intermediate layer part; and
  • a step in which the server divides the L-th scaling factor assigned to the L-th layer directly upstream of the (L+1)-th layer including a first node belonging to the intermediate layer part or the output layer by the (L+1)-th scaling factor assigned to the (L+1)-th layer to calculate a first value, multiplies the weight (w.ab^Layer_L) assigned to the link (link.ab^Layer_L) connected from the node having index b of the L-th layer to the node having index a of the (L+1)-th layer by the first value to calculate a second value, multiplies the second value by 2^n to calculate a third value, and approximates the third value to an integer value to generate a fourth value.
  • the fourth value may be a weight (q' w.ab layer_L ) of the integer neural network corresponding to the weight (w .ab Layer_L ).
  • the integer neural network information processing method may further include the step of providing, by the server, the input scaling factor, the output scaling factor, and a weight of the integer neural network to a computing device executing the operation of the integer neural network.
  • According to the present invention, it is possible to provide a technology capable of reducing the hardware complexity of a computing device processing a neural network, increasing processing speed, or improving resource utilization.
  • When the neural network is implemented as hardware, it is possible to provide a technique for reducing the complexity of the structure of each computation node in the neural network.
  • When the neural network is implemented as software, it is possible to provide a technique for simplifying the computation algorithm in each computation node in the neural network.
  • FIG. 1 is a diagram for explaining an operation performed in a neural network according to an embodiment.
  • FIGS. 3A and 3B are conceptual diagrams presented to explain the creative process of the present invention.
  • FIG. 5 illustrates an example of the structure of a neural network executed in a terminal provided according to an embodiment of the present invention.
  • FIG. 6 shows a method of generating an output activation at each node of the input layer of the neural network shown in FIG. 5 .
  • FIG. 7 shows a method of generating an output activation at each node of an intermediate layer (hidden layer) of the neural network shown in FIG. 5 .
  • FIG. 8 shows an output generation method at each node of the intermediate layer (hidden layer) of the neural network shown in FIG. 5 .
  • FIG. 9 shows an output generation method at each node of the output layer of the neural network shown in FIG. 5 .
  • FIG. 10 is a diagram illustrating a method of restoring a target value from an output integer value output from each node of the output layer of the neural network shown in FIG. 5 .
  • FIG. 11 shows information provided by the server to the terminal.
  • FIG. 12 shows the structure of a neural network that has been trained by the server.
  • the neural network shown in FIG. 12 has a structure corresponding to the neural network shown in FIG. 5 .
  • FIG. 13A is a block diagram illustrating a hardware configuration of a server according to an embodiment of the present invention.
  • FIG. 13B is a diagram for explaining the quantization of a pre-trained neural network according to an embodiment of the present invention and employing it in a terminal (hardware accelerator).
  • FIG. 14 is a diagram illustrating a neural network calculation method in a terminal provided according to a comparative embodiment.
  • FIG. 5 illustrates an example of the structure of a neural network executed in a terminal provided according to an embodiment of the present invention.
  • the neural network 420 may be implemented as hardware or software within the terminal 200 .
  • the neural network 420 may include a plurality of layers (layer 1, layer 2, layer 3, layer 4). Each layer may include one or a plurality of operation nodes. In the example shown in FIG. 5 , the input layer includes two nodes and the output layer includes one node. In FIG. 5, each node is expressed as a rectangle including the letter N.
  • a value input to each node of the input layer (layer 1) may be a real value in the form of a floating point or a fixed point.
  • each node of the input layer (layer 1) may convert the real value into an integer value and output it.
  • a value input to each node of the input layer (layer 1) may be an integer type.
  • The value output by each node of each layer is an integer value and may be expressed in an integer or fixed-point format.
  • the weight assigned to the link connecting each node is also an integer value, and may be expressed in an integer or fixed-point format.
  • the integer value output from the output layer (layer 4) is not the target value itself that the neural network 420 should calculate from the real value provided to the input layer, but is proportional to the target value.
  • the scale unit 210 may restore the target value from the output integer value by applying an output scaling factor to the output integer value output from the output layer.
  • the target value may be an integer or a real value.
  • the target value may be expressed in a fixed-point form or in a floating-point form.
  • Although the structure of the neural network 420 is presented simply in FIG. 5 for convenience of explanation, it may be implemented more complexly according to an embodiment. However, from the structure of the neural network 420 shown in FIG. 5, the more complex structures of the neural network 420 according to other embodiments can be fully understood.
  • Each of the nodes shown in FIG. 5 is configured to output an integer.
  • a multiplier existing inside each node may be configured to perform only integer multiplication. Since the multiplier does not need to perform multiplication between real numbers, its complexity is relatively low.
  • the neural network 420 may be a DNN or n-layer neural network including two or more hidden layers.
  • the neural network 420 may be a DNN including an input layer (Layer 1), two hidden layers (Layer 2 and Layer 3), and an output layer (Layer 4).
  • the neural network 420 is illustrated as including four layers, this is only an example and the neural network 420 may include fewer or more layers, or fewer or more channels. That is, the neural network 420 may include layers of various structures.
  • Each of the layers included in the neural network 420 may include a plurality of channels.
  • a channel may correspond to a plurality of artificial nodes/computation nodes/nodes known as neurons, processing elements (PEs), units, or similar terms.
  • Layer 1 may include two channels (nodes)
  • Layer 2 and Layer 3 may each include three channels.
  • each of the layers included in the neural network 420 may include a variable number of channels (nodes).
  • FIG. 6 shows a method of generating an output activation at each node of the input layer of the neural network shown in FIG. 5 .
  • Activation having a real value may be provided to each node (eg, N11 ) of the input layer (layer 1).
  • the real value may have a floating-point or fixed-point format.
  • Each node of the input layer may convert the input real value into an output activation having an integer value.
  • S I is the input scaling factor given in advance
  • I A is the input activation (real number)
  • q I is the output activation (integer)
  • the output activation q I is generated by approximating to the integer value by removing the decimal part from the value obtained by dividing the input activation I A by the input scaling factor S I given in advance.
  • the operator quantize ( ) is an operator that approximates a real number to an integer close to it.
  • Here, S^1 = S_I, and the superscript attached to each term indicates the layer number '1' of the first layer, which is the input layer.
  • the output of each node of the input layer is an integer.
  • the input scaling factor S I may be provided from a device different from the terminal 200 , for example, a server.
  • FIG. 7 shows a method of generating an output activation at each node of an intermediate layer (hidden layer) of the neural network shown in FIG. 5 .
  • FIG. 7 shows a method of generating output activation at each node of the second layer (layer 2) of the neural network shown in FIG. 5 .
  • the nodes N11 and N12 of the input layer output output activations q I1 1 and q I2 1 .
  • Output activations output from the nodes of the input layer may be input to each node of the second layer (layer 2) that is a downstream layer of the input layer (layer 1) as input activations for the second layer.
  • pre-prepared weights q' w.ab 1 corresponding to each of the respective input activations may be input to each node of the second layer.
  • The output activation q_Ia^2 of an arbitrary node a of the second layer may be expressed as:

  $q_{Ia}^{2} = \mathrm{Right\_Shift}\left\{\, n2,\ \left( q_{I1}^{1} \times {q'}_{w.a1}^{1} + q_{I2}^{1} \times {q'}_{w.a2}^{1} + b_{21} \right) \right\}$ ... [Equation 2]
  • A multiplication operation is performed in Equation 2, and since it is multiplication between integers, an operator that multiplies two integers is sufficient. This has the advantage of consuming fewer hardware resources than an operator performing multiplication between real numbers.
  • the multiplication operator only needs to be provided upstream of the summation operator, and does not need to be provided downstream of the summation operator. Only the Right_Shift operator needs to be provided downstream of the summation operator.
  • a denotes an index identifying a node of the second layer
  • b denotes an index identifying a node of a first layer upstream of the second layer
  • and the superscript indicates a layer number. That is, the weight assigned to each link of the neural network 420 shown in FIG. 5 may be a value provided independently according to the number of the layer in which the weight is provided and the source node and target node of each link.
  • each weight q' w.ab Layer_L may be a value provided to the terminal 200 by a device different from the terminal 200 , for example, a server.
  • both the input and output of each node of the second layer are integers.
  • b 21 shown in FIG. 7 is a bias applied to the node N21 and may have an integer value.
  • FIG. 8 shows an output generation method at each node of the intermediate layer (hidden layer) of the neural network shown in FIG. 5 .
  • FIG. 8 shows a method of generating output activation at each node of the third layer (layer 3) of the neural network 420 shown in FIG. 5 .
  • the 22nd node N22 of the second layer outputs the output activation q I2 2
  • the 23rd node N23 of the second layer outputs the output activation q I3 2 .
  • Output activations output from the nodes of the second layer may be input to each node of the third layer, which is a downstream layer of the second layer, as input activations for the third layer.
  • pre-prepared weights q' w.ab 2 corresponding to each of the respective input activations may be input to each node of the third layer.
  • the 31st node N31 of the third layer outputs an output activation q I1 3 .
  • the output activation q Ia 3 of an arbitrary node (a) of the third layer may be expressed as in [Equation 3].
  • $q_{Ia}^{3} = \mathrm{Right\_Shift}\left\{\, n3,\ \left( q_{I1}^{2} \times {q'}_{w.a1}^{2} + q_{I2}^{2} \times {q'}_{w.a2}^{2} + q_{I3}^{2} \times {q'}_{w.a3}^{2} + b_{31} \right) \right\}$ ... [Equation 3]
  • A multiplication operation is performed in Equation 3, and since it is multiplication between integers, an operator that multiplies two integers is sufficient.
  • a represents an index for identifying a node of the third layer
  • b represents an index for identifying a node of a second layer upstream of the third layer
  • and the superscript indicates a layer number.
  • b 31 shown in FIG. 8 is a bias applied to the node N31 and may have an integer value.
  • FIG. 9 shows an output generation method at each node of the output layer of the neural network shown in FIG. 5 .
  • FIG. 9 shows a method of generating output activation at each node of the fourth layer (output layer) of the neural network 420 shown in FIG. 5 .
  • In FIG. 9, the 41st node N41 among the nodes of the output layer shown in FIG. 5 is shown.
  • the 32nd node N32 of the third layer outputs the output activation q I2 3
  • the 33rd node N33 of the third layer outputs the output activation q I3 3 .
  • pre-prepared weights q' w.ab 3 corresponding to each of the respective input activations may be input to each node of the fourth layer.
  • the output activation q Ia 4 of any node of the fourth layer may be expressed as [Equation 4].
  • $q_{Ia}^{4} = \mathrm{Right\_Shift}\left\{\, n4,\ \left( q_{I1}^{3} \times {q'}_{w.a1}^{3} + q_{I2}^{3} \times {q'}_{w.a2}^{3} + q_{I3}^{3} \times {q'}_{w.a3}^{3} + b_{41} \right) \right\}$ ... [Equation 4]
  • A multiplication operation is performed in Equation 4, and since it is multiplication between integers, an operator that multiplies two integers is sufficient.
  • a represents an index for identifying a node of the fourth layer
  • b represents an index for identifying a node of a third layer upstream of the fourth layer
  • and the superscript indicates a layer number.
  • b 41 shown in FIG. 9 is a bias applied to the node N41 and may have an integer value.
  • Any two distinct numbers selected from n2, n3, and n4 shown in Equation 2, Equation 3, and Equation 4 may be different from each other or the same as each other.
  • In the generalized expression for an arbitrary layer, a is an index of a specific node of layer L+1, b is an index of a specific node of layer L, and q'_w.ab^Layer_L is the weight assigned to the link from node b of layer L to node a of layer L+1.
  • FIG. 10 is a diagram illustrating a method of restoring a target value from an output integer value output from each node of the output layer of the neural network shown in FIG. 5 .
  • The restored value O_A, restored by the scale unit 210 from the output integer value q_O output by a node belonging to the output layer, has the relationship of:

  $O_{A} = S_{O} \times q_{O}$ ... [Equation 6]
  • S O is a pre-given output layer scale factor.
  • the restored value output by the scale unit may be expressed in a real number of a fixed-point format or a floating-point format.
  • S_O may be a value provided to the terminal 200 by a device other than the terminal, for example, a server.
  • the Right_Shift operation performed at each node of the hidden layers and the output layer of the neural network 420 shown in FIG. 5 may all be an n-bit shift.
  • the output values of each node of the neural network 420 shown in FIG. 5 are all n-bit integers, and the values of each weight are also all n-bit integers.
  • FIG. 11 shows information provided by the server to the terminal.
  • The server 100 may provide the above-described input layer scaling factor (S_I) and output layer scaling factor (S_O) to the terminal 200.
  • the server 100 may provide all the above-described weights to the terminal 200 .
  • the server 100 may provide the terminal 200 with an n value used for the n-bit Right-Shift operation.
  • the n value used for the right-shift operation may be independently set for each layer. Accordingly, the server 100 may provide the terminal 200 with n values to be assigned to each layer of the integer neural network provided according to an embodiment of the present invention.
  • the server 100 must be able to generate the above-described data. This method will be described with reference to FIG. 12 .
  • The server 100 may provide the terminal 200 with only the input layer scaling factor (S_I), the output layer scaling factor (S_O), the weights, and the n values used for the n-bit Right-Shift operation.
  • the server 100 may have to provide not only the above-mentioned information but also the structure information of the neural network to the terminal 200 .
  • the neural network 410 shown in FIG. 12 has a structure corresponding to the neural network 420 shown in FIG. 5 .
  • the weight w.ab assigned to each link of the learned neural network 410 may be a real value.
  • the real value may be expressed in a fixed-point method or a floating-point method.
  • the neural network 410 illustrated in FIG. 12 includes a total of four layers including an input layer and an output layer. Each layer is assigned a corresponding scaling factor.
  • the value of the scaling factor may be determined as an arbitrary value, but in a preferred embodiment may be determined according to a well-designed method.
  • Patent Publication No. 10-2019-0014900 provides an example of generating a scaling factor assigned to each layer.
  • The scaling factors S^2 and S^3 allocated to the hidden layers do not need to be provided to the terminal 200, but may be used in the process of calculating the respective weights q'_w.ab^Layer_L to be provided to the terminal 200.
  • The weight to be provided by the server 100 to the terminal 200 may be calculated by:

  ${q'}_{w.ab}^{Layer\_L} = \mathrm{quantize}\left( w_{.ab}^{Layer\_L} \times \frac{S^{Layer\_L}}{S^{Layer\_L+1}} \times 2^{n} \right)$ ... [Equation 7]
  • layer_L represents the L-th layer
  • layer_L+1 represents the L+1-th layer
  • S Layer_L represents the scaling factor given to the L-th layer
  • S Layer_L+1 represents the scaling factor given to the L+1-th layer
  • w.ab Layer_L represents the weight given to the link from node b of the Lth layer to the node a of the L+1th layer
  • q' w.ab layer_L is an n-bit integer, which is a quantized weight obtained from w.ab Layer_L for the server to provide to the terminal.
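  • An illustrative sketch of this weight generation follows (assuming round-to-nearest as the integer approximation, which the document leaves open as "approximating to an integer value"):

```python
def generate_integer_weight(w_real, s_layer_L, s_layer_L1, n):
    """Compute q'_w.ab^Layer_L = quantize(w.ab^Layer_L * S^Layer_L / S^Layer_L+1 * 2**n)."""
    first_value = s_layer_L / s_layer_L1     # ratio of the two scaling factors
    second_value = w_real * first_value      # weight pre-multiplied by the scale ratio
    third_value = second_value * (2 ** n)    # scaled so the n-bit right shift cancels the factor
    return int(round(third_value))           # fourth value: approximated to an integer
```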
  • FIG. 13A is a block diagram illustrating a hardware configuration of a server according to an embodiment of the present invention.
  • the server may be referred to as a neural network quantization device.
  • the server 100 includes a processor 110 and a memory 120 .
  • In the server 100 shown in FIG. 13A, only the components related to the present embodiments are shown. Accordingly, it is apparent to those skilled in the art that the server 100 may further include other general-purpose components in addition to the components illustrated in FIG. 13A.
  • The server 100 corresponds to a computing device having various processing functions, such as generating a neural network 410, training (or learning) the neural network 410, quantizing a floating-point neural network 410 into a fixed-point neural network 420, or retraining the neural network 410.
  • the server 100 may be implemented with various types of devices such as a personal computer (PC), a server device, and a mobile device.
  • the processor 110 serves to perform an overall function for controlling the server 100 .
  • the processor 110 generally controls the server 100 by executing programs stored in the memory 120 in the server 100 .
  • the processor 110 may be implemented as a central processing unit (CPU), a graphics processing unit (GPU), an application processor (AP), etc. provided in the server 100 , but is not limited thereto.
  • the memory 120 is hardware for storing various types of data processed in the server 100 , and for example, the memory 120 may store data processed by the server 100 and data to be processed. In addition, the memory 120 may store applications to be driven by the server 100 , drivers, and the like.
  • the memory 120 may be a DRAM, but is not limited thereto.
  • the memory 120 may include at least one of a volatile memory and a nonvolatile memory.
  • Non-volatile memory includes ROM (Read Only Memory), PROM (Programmable ROM), EPROM (Electrically Programmable ROM), EEPROM (Electrically Erasable and Programmable ROM), Flash memory, PRAM (Phase-change RAM), MRAM (Magnetic RAM), RRAM (Resistive RAM), FRAM (Ferroelectric RAM), and the like.
  • Volatile memory includes DRAM (Dynamic RAM), SRAM (Static RAM), SDRAM (Synchronous DRAM), PRAM (Phase-change RAM), MRAM (Magnetic RAM), RRAM (Resistive RAM), FeRAM (Ferroelectric RAM), etc. .
  • The memory 120 may also be a hard disk drive (HDD), a solid state drive (SSD), a compact flash (CF) card, a secure digital (SD) card, a micro secure digital (Micro-SD) card, a mini secure digital (Mini-SD) card, an extreme digital (xD) card, or a Memory Stick.
  • the processor 110 may generate the trained neural network 410 by repeatedly training (learning) a given initial neural network.
  • the initial neural network may have floating-point type parameters, for example, parameters of 32-bit floating point precision in order to secure processing accuracy of the neural network.
  • the parameters may include various types of data input/output to the neural network, such as input/output activations, weights, and biases of the neural network. As the neural network iteratively trains, the floating-point parameters of the neural network can be tuned to compute a more accurate output for a given input.
  • Floating-point operations require a relatively large amount of computation and a high memory-access frequency compared to fixed-point operations.
  • most of the amount of computation required for processing a neural network is a convolution operation that performs computation of various parameters.
  • Accordingly, in a device having relatively low processing performance, the processing of the neural network 410 having floating-point type parameters may not be smooth.
  • In this case, the floating-point type parameters processed in the neural network 410 can be quantized.
  • parameter quantization means converting a floating-point type parameter into a fixed-point type parameter having an integer value.
  • The server 100 performs quantization by converting the parameters of the trained neural network 410 into a fixed-point type of predetermined bits, in consideration of the processing performance of the device (e.g., a mobile device or an embedded device) in which the neural network is to be deployed, and then transmits the quantized neural network 420 to the device in which it is to be employed.
  • the device to which the neural network 420 is to be employed may be the aforementioned terminal 200 . Specific examples include, but are not limited to, autonomous vehicles, robotics, smart phones, tablet devices, augmented reality (AR) devices, and Internet of Things (IoT) devices that perform voice recognition and image recognition using neural networks.
  • the processor 110 obtains data of the neural network 410 that is pre-trained using floating points, stored in the memory 120 .
  • The pretrained neural network 410 may be data repeatedly trained with floating-point type parameters. The neural network may first be iteratively trained with training-set data as input and then iteratively trained again with test-set data, but training is not necessarily limited thereto.
  • the training-set data is input data for training a neural network
  • the test set data is input data that does not overlap with the training-set data, and is data for training while measuring the performance of a neural network trained with the training-set data.
  • the processor 110 may analyze the statistical distribution for each channel of the floating-point type parameter values used in each layer included in each of the feature maps and the kernels from the pre-trained neural network data.
  • the processor 110 may determine the scaling factors corresponding to each of the above-described layers based on the analyzed statistical distribution for each layer. For example, the scaling factors may be determined so that the above-described q' w.ab Layer_L value has sufficient resolution in the form of an n bit integer.
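  • One simple way to derive such a scaling factor from the analyzed statistical distribution could be a max-absolute-value rule (this particular rule is an assumption given for illustration only and is not prescribed by this document):

```python
import numpy as np

def choose_scale(values, n_bits):
    """Pick a scale so the largest observed magnitude maps onto the n-bit signed integer range."""
    max_abs = float(np.max(np.abs(values)))
    return max_abs / (2 ** (n_bits - 1) - 1)
```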
  • The memory 120 may store data sets to be processed or already processed by the processor 110, for example, untrained initial neural network data, neural network data generated during the training process, neural network data for which all training has been completed, and quantized neural network data.
  • In addition, the memory 120 may store various programs to be executed by the processor 110, such as programs related to a neural network training algorithm and a quantization algorithm.
  • FIG. 13B is a diagram for explaining the quantization of a pre-trained neural network according to an embodiment of the present invention and employing it in a terminal (hardware accelerator).
  • The processor (110 in FIG. 13A) trains a neural network of a floating-point type (e.g., a 32-bit floating-point type). Since the pretrained neural network 410 itself may not be processed efficiently in a low-power or low-performance hardware accelerator because of its floating-point type parameters, the processor 110 of the server 100 quantizes the floating-point neural network 410 into a neural network 420 of a fixed-point type (e.g., a fixed-point type of 16 bits or less).
  • The terminal is dedicated hardware for driving the neural network 420, and since it is implemented with relatively low power or low performance, it may be better suited to fixed-point operations than to floating-point operations.
  • the hardware accelerator may correspond to, for example, a neural processing unit (NPU), a tensor processing unit (TPU), a neural engine, etc. which are dedicated modules for driving a neural network, but is not limited thereto.
  • the hardware accelerator for driving the quantized neural network 420 may be implemented in an independent device separate from the server 100 .
  • the present invention is not limited thereto, and the hardware accelerator may be implemented in the same device as the server 100 .
  • Alternatively, the terminal 200 may implement the quantized neural network 420 not as a hardware accelerator but by means of a CPU and software.
  • FIG. 14 is a diagram illustrating a neural network calculation method in a terminal provided according to a comparative embodiment.
  • In the comparative embodiment of FIG. 14, each node further includes one arithmetic unit for multiplying integers. Accordingly, the embodiment of FIG. 7 has lower complexity than the embodiment of FIG. 14.
  • In addition, in the comparative embodiment, an additional parameter q_S^2 must be provided for the one added operator. That is, the number of parameters that the server should provide to the terminal 200 is greater in the embodiment of FIG. 14 than in the embodiment of FIG. 7.
  • the operation structure in each node according to FIG. 7 has a great advantage compared to the operation structure in each node according to FIG. 14 .
  • the neural network of FIG. 5 having each node structure according to FIG. 7 can also enjoy improved technical effects.
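  • The difference between the two node structures can be summarized in the following sketch (illustrative only; it assumes the comparative structure of FIG. 14 multiplies the accumulated sum by the extra integer parameter q_S between the adder and the right shift):

```python
def node_forward_comparative(q_inputs, q_weights, bias, q_s, n):
    # FIG. 14 style: an extra integer multiplier (and the extra parameter q_S)
    # is needed downstream of the adder, before the right shift.
    acc = sum(q_i * q_w for q_i, q_w in zip(q_inputs, q_weights)) + bias
    return (acc * q_s) >> n

def node_forward_proposed(q_inputs, q_weights, bias, n):
    # FIG. 7 style: the scale ratio is folded into the weights in advance,
    # so only the right-shift operator remains downstream of the adder.
    acc = sum(q_i * q_w for q_i, q_w in zip(q_inputs, q_weights)) + bias
    return acc >> n
```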
  • the multiplier and the adder used in the embodiment of the present invention may be integer type operators.
  • The present invention was developed by Open Edge Technology Co., Ltd. (the task-executing organization) in the course of carrying out a research task under the Next-Generation Intelligent Semiconductor Technology Development (Design) - Artificial Intelligence Processor program, a research project supported by the Ministry of Science and ICT and the Information and Communication Planning and Evaluation Institute, namely the development of a complex-sensory-based situation-prediction-type mobile artificial intelligence processor (task unique number 2020001310, task number 2020-0-01310, research period 2020.04.01 to 2024.12.31).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a neural network driving method comprising a shifting step that consists of obtaining a set of first values by multiplying a set of weights by corresponding values of a set of input activations, respectively, obtaining a second value by adding the set of first values to one another, and obtaining an output activation of a first node by right-shifting the second value by n bits. Each of said set of input activations and said set of weights corresponds to n-bit integer-type data.
PCT/KR2021/014367 2020-11-03 2021-10-15 Neural network calculation method and neural network weight generation method WO2022097954A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020200145549A KR102384588B1 (ko) 2020-11-03 2020-11-03 Neural network calculation method and neural network weight generation method
KR10-2020-0145549 2020-11-03

Publications (1)

Publication Number Publication Date
WO2022097954A1 true WO2022097954A1 (fr) 2022-05-12

Family

ID=81183000

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2021/014367 WO2022097954A1 (fr) 2020-11-03 2021-10-15 Neural network calculation method and neural network weight generation method

Country Status (2)

Country Link
KR (1) KR102384588B1 (fr)
WO (1) WO2022097954A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20240099929A (ko) * 2022-12-22 2024-07-01 오픈엣지테크놀로지 주식회사 Network parameter calibration method for a neural network operating in an integer-type NPU, and apparatus therefor

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20030072860A (ko) * 2002-03-07 2003-09-19 엘지전자 주식회사 Array structure operation method for designing a trained neural network
KR20190062129A (ko) * 2017-11-27 2019-06-05 삼성전자주식회사 Low-power hardware acceleration method and system for convolutional neural network computation
KR20190074938A (ko) * 2017-12-20 2019-06-28 연세대학교 산학협력단 Digital neuron for an artificial neural network, artificial neuron, and inference engine including the same
JP2019185134A (ja) * 2018-04-02 2019-10-24 Kddi株式会社 Information processing apparatus, learning method, and program
KR20200061164A (ko) * 2018-11-23 2020-06-02 삼성전자주식회사 Neural network device for performing neural network operations, operating method of the neural network device, and application processor including the neural network device

Also Published As

Publication number Publication date
KR102384588B1 (ko) 2022-04-08

Similar Documents

Publication Publication Date Title
WO2021060609A1 (fr) Système informatique distribué comprenant une pluralité de périphéries et un nuage et procédé de fourniture de modèle pour l'utilisation d'intelligence adaptative de celui-ci
WO2019194465A1 (fr) Processeur de réseau neuronal
EP3735662A1 (fr) Procédé de réalisation d'apprentissage d'un réseau neuronal profond et appareil associé
WO2021054614A1 (fr) Dispositif électronique et son procédé de commande
WO2020235797A1 (fr) Appareil de traitement d'opération de multiplication modulaire et procédés associés
WO2021225262A1 (fr) Génération de modèle dnn optimisé basé sur la recherche d'architecture neuronale permettant l'exécution de tâches dans un dispositif électronique
WO2022097954A1 (fr) Procédé de calcul de réseau neuronal et procédé de production de pondération de réseau neuronal
WO2020231226A1 (fr) Procédé de réalisation, par un dispositif électronique, d'une opération de convolution au niveau d'une couche donnée dans un réseau neuronal, et dispositif électronique associé
WO2020045794A1 (fr) Dispositif électronique et procédé de commande associé
WO2020159016A1 (fr) Procédé d'optimisation de paramètre de réseau neuronal approprié pour la mise en œuvre sur matériel, procédé de fonctionnement de réseau neuronal et appareil associé
WO2021158085A1 (fr) Procédé de mise à jour de réseau neuronal, procédé de classification et dispositif électronique
WO2023287239A1 (fr) Procédé et appareil d'optimisation de fonction
WO2022255632A1 (fr) Dispositif et procédé de réseau de neurones artificiels de création de conception automatique, faisant appel à des bits ux
WO2022216109A1 (fr) Procédé et dispositif électronique de quantification d'un modèle de réseau neuronal profond (rnp)
WO2021230470A1 (fr) Dispositif électronique et son procédé de commande
WO2022270815A1 (fr) Dispositif électronique et procédé de commande de dispositif électronique
WO2021125496A1 (fr) Dispositif électronique et son procédé de commande
WO2023229094A1 (fr) Procédé et appareil pour la prédiction d'actions
WO2020091253A1 (fr) Dispositif électronique et procédé de commande d'un dispositif électronique
WO2019198900A1 (fr) Appareil électronique et procédé de commande associé
WO2023043108A1 (fr) Procédé et appareil permettant d'améliorer la précision efficace d'un réseau neuronal par extension d'architecture
WO2021158040A1 (fr) Dispositif électronique fournissant un énoncé correspondant au contexte d'une conversation, et procédé d'utilisation associé
WO2023075372A1 (fr) Procédé et dispositif électronique pour effectuer une opération de réseau neuronal profond
WO2024058572A1 (fr) Accumulateur multibits et processeur informatique en mémoire le comprenant
WO2024136129A1 (fr) Procédé de correction de paramètre de réseau pour réseau neuronal fonctionnant dans une npu de type entier, et dispositif associé

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21889419

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21889419

Country of ref document: EP

Kind code of ref document: A1