WO2022239448A1 - Quantization program, information processing device, and range determination method - Google Patents

Quantization program, information processing device, and range determination method

Info

Publication number
WO2022239448A1
Authority
WO
WIPO (PCT)
Prior art keywords
channel
value range
quantization
channels
range
Prior art date
Application number
PCT/JP2022/011040
Other languages
French (fr)
Japanese (ja)
Inventor
Yuichi Ozaki
Original Assignee
Konica Minolta, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Konica Minolta, Inc.
Priority to JP2023520860A (national-phase publication JPWO2022239448A1)
Publication of WO2022239448A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present invention relates to a quantization program, an information processing device, and a range determination method.
  • Quantized deep learning models that can run on high-speed, small-scale circuits are attracting attention.
  • For a quantized deep learning model, it is necessary to build a model that achieves both high accuracy and high-speed operation through quantization, and setting the optimal value range at quantization time is important.
  • The parameters (weights, biases) of each layer of a quantized deep learning model and the output of each layer have multiple channels, and depending on the model the scale of the values may differ greatly between channels.
  • When the value range is common to all channels and the per-channel scales differ greatly, the error due to quantization increases and accuracy may decrease.
  • In Non-Patent Document 1, learning using pseudo-quantization is used to maintain accuracy, but a value range for each channel is desirable.
  • In another approach, each channel has its own value range, but since the range is determined from the distribution of the parameters and outputs, it may not be the optimal range.
  • A search, through learning, for the value range that minimizes the loss against the correct data is also conceivable, but in many quantized deep learning models the number of channels in each layer is large compared with the number of layers, so executing the search per channel requires a very long learning time.
  • The number of learning runs for a value range common to all channels is the number of layers to be quantized multiplied by the number of range patterns per layer.
  • The number of learning runs for a per-channel value range is the number of layers to be quantized multiplied by the number of range patterns per layer and by the number of channels in each layer.
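The two counts above can be made concrete with a small worked example; all of the numbers below are hypothetical, since the extract gives no concrete layer or channel counts.

```python
# Hypothetical counts (not from the patent) illustrating the search cost.
num_layers = 6           # layers to be quantized
patterns_per_layer = 5   # range candidates per layer, e.g. x0.25 .. x4
channels_per_layer = 64  # channels in each quantized layer

# Channel-common range: layers x patterns.
runs_common = num_layers * patterns_per_layer
# Per-channel range: layers x patterns x channels.
runs_per_channel = num_layers * patterns_per_layer * channels_per_layer

print(runs_common, runs_per_channel)  # 30 vs. 1920 learning runs
```

Even with these modest counts, the per-channel search costs 64 times as many learning runs, which is the gap the invention aims to close.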
  • The present invention therefore addresses the problem of providing a quantization program, an information processing apparatus, and a value range determination method that both suppress degradation in the accuracy of a quantized deep learning model and shorten the learning time required for quantization.
  • The initial value of the quantization target layer may be the initial value of a layer different from the top layer.
  • The different value range for each channel is calculated using any one of the weight, bias, and output distributions for each channel in the quantization target layer (the quantization program according to any one of (1) to (6)).
  • The different value range for each channel is a ratio of the value ranges of the channels (the quantization program according to any one of (1) to (7)).
  • An information processing device comprising:
  • (13) A value range determination method including: separating, in the quantization target layer of a deep learning model, the value range in quantization of the parameters of the deep learning model into a value range common to the channels and a different value range for each channel; and determining the separated channel-common value range and the per-channel value ranges using different methods.
  • With the quantization program, the information processing device, and the value range determination method, it is possible to both suppress accuracy degradation of the quantized deep learning model and shorten the learning time required for quantization.
  • According to the present invention, determining the value range using the values of each layer before quantization prevents the performance of the pre-quantization model from being changed greatly by learning.
  • FIG. 3 is a functional block diagram showing functions of a CPU of the information processing device according to the embodiment;
  • FIG. 4 is an explanatory diagram showing an example of a deep learning model executed by an inference unit;
  • FIG. 4 is an explanatory diagram showing the number of parameters each layer of a deep learning model has;
  • FIG. 2 is a flowchart showing a process in which the information processing apparatus of the present embodiment executes range search of a deep learning model (No. 1);
  • FIG. 2 is a flowchart showing a process of executing a range search of a deep learning model by the information processing apparatus of the present embodiment (No. 2).
  • FIG. 1 is an explanatory diagram illustrating a main configuration example of an information processing apparatus 100 according to this embodiment.
  • the information processing device 100 determines the value range of the quantized deep learning model by executing the control program.
  • the same components are denoted by the same reference numerals, and the description thereof is omitted as appropriate.
  • The information processing apparatus 100 includes a CPU (Central Processing Unit) 110, a storage unit 120, a ROM (Read Only Memory) 130, a RAM (Random Access Memory) 140, an input unit 150, a display unit 160, and a communication unit 170.
  • The CPU 110 implements each process (function) shown in FIG. 2 by executing the control program stored in the storage unit 120 or the ROM 130. Each process implemented by the CPU 110 will be described later with reference to FIG. 2.
  • the storage unit 120 is configured by a large-capacity storage device, and includes, for example, a hard disk drive, a non-volatile memory, and the like. Storage unit 120 stores a control program.
  • the RAM 140 functions as a work area that temporarily stores various programs read from the ROM 130 and executable by the CPU 110, input or output data, parameters, etc. in various processes executed and controlled by the CPU 110.
  • the input unit 150 comprises a keyboard with cursor keys, number input keys, various function keys, etc., and a pointing device such as a mouse.
  • the input unit 150 outputs to the CPU 110 as an input signal a key press signal pressed on the keyboard or a mouse operation signal.
  • the CPU 110 executes various processes based on operation signals from the input unit 150 .
  • the display unit 160 includes a monitor such as a CRT (Cathode Ray Tube) or an LCD (Liquid Crystal Display).
  • the display unit 160 displays various screens according to instructions of display signals input from the CPU 110 .
  • the display unit 160 also has a function as the input unit 150 .
  • The communication unit 170 has a communication interface and communicates with external devices on the network.
  • FIG. 2 is a functional block diagram showing functions of the CPU 110 of the information processing apparatus 100 according to this embodiment. As shown in FIG. 2, by executing the control program, the CPU 110 implements the acquisition unit 10, the separation unit 20, the value range setting unit 30, the weight update unit 40, the inference unit 50, the loss calculation unit 60, the value range determination unit 70, the channel-by-channel value range determination unit 80, and the learning loss calculation unit 90.
  • the acquisition unit 10 acquires an input image and correct data corresponding to the input image.
  • the acquisition unit 10 acquires this input image and teacher data indicating correct data as a set.
  • Each acquired image is 8-bit grayscale (256 levels) with a size of 28 (width) × 28 (height) pixels.
  • the separation unit 20 separates the value range in quantization of the parameters of the deep learning model DLM into a value range common to channels and a value range different for each channel. Weights and biases included in the convolution layer and the full-connect layer, and output values from each layer are quantized using quantization parameters S and Z according to the following equation (1).
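Equation (1) itself is not reproduced in this extract. As an illustrative sketch only, the standard affine quantization form with a scale S and zero-point Z, which the text's parameters suggest, looks like this:

```python
import numpy as np

def quantize(x, S, Z, n_bits=8):
    """Affine quantization q = round(x / S) + Z, clipped to the n-bit range.

    The exact form of the patent's equation (1) is an assumption here; this
    is the common textbook formulation using a scale S and zero-point Z.
    """
    qmin, qmax = 0, 2 ** n_bits - 1
    q = np.round(x / S) + Z
    return np.clip(q, qmin, qmax).astype(np.int64)

def dequantize(q, S, Z):
    """Map quantized integers back to approximate real values."""
    return S * (q - Z)
```

For example, with S = 0.01 and Z = 128, the value 0.5 maps to 178 and dequantizes back to 0.5.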
  • Based on equation (2), the separating unit 20 separates the value range into the value range S_ALL common to all channels and the value range S_CH that differs for each channel.
  • The value range ratio S_CH of each channel is determined at a stage before the search by learning. The value range S_ALL common to all channels is then determined by computing the loss for multiple candidate ranges and searching for the range with the smallest loss.
  • the value range setting unit 30 includes a channel-by-channel value range setting unit 31 and a channel common value range setting unit 32 .
  • the value range setting unit 30 determines the value range common to the channels separated by the separation unit 20 and the value range different for each channel using different methods. For example, the channel-by-channel value range setting unit 31 sets a different value range for each channel, and the channel common value range setting unit 32 sets a common value range for each channel.
  • The channel-by-channel value range determination unit 80 determines the per-channel value range uniquely, without performing a search, using the ratio of the value ranges of the channels. For example, it calculates a different value range for each channel using any one of the weight, bias, and output distributions for each channel in the quantization target layer.
  • The channel-by-channel value range determination unit 80 determines the weight and bias ranges using the per-channel maximum and minimum of the pre-quantization parameter distribution, acquired in advance. It may also perform inference on the input data and determine the per-channel output range using each channel's histogram at that time. In this case, it may determine the initial value of the value range common to all channels by a similar method.
  • the channel-by-channel value range determination unit 80 may determine different value ranges for each channel based on the ratio of the value ranges for each channel. In this case, when a different value range is calculated for each channel by weighting or biasing, the channel-by-channel value range determining unit 80 determines a different value range for each channel using the ratio of the parameter distribution for each channel.
  • Alternatively, the channel-by-channel value range determination unit 80 may acquire input data and determine a different value range for each channel using the ratio of each channel's output distribution during inference on that data.
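The distribution-based determination above can be sketched as follows. The choice of absolute maximum as the per-channel statistic, and normalising by the largest channel, are assumptions; the text mentions only maxima, minima, and histograms.

```python
import numpy as np

def per_channel_ratio(W):
    """Compute per-channel range ratios S_CH from a weight tensor.

    W has shape (out_channels, in_channels, kh, kw). Each output channel's
    range is taken as its absolute maximum, then normalised by the largest
    channel so the ratios can be fixed before the common-range search.
    """
    ch_range = np.abs(W).reshape(W.shape[0], -1).max(axis=1)
    return ch_range / ch_range.max()
```

Fixing these ratios up front is what removes the per-channel dimension from the later loss-driven search.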
  • the channel-by-channel value range setting unit 31 sets a different value range for each channel, which is determined by the channel-by-channel value range determination unit 80 .
  • the different value ranges for each channel may be represented by bit shift.
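The text notes the per-channel ranges "may be represented by bit shift"; one plausible reading, sketched below under that assumption, is to round each ratio to a power of two so the per-channel scaling becomes a right shift in fixed-point hardware:

```python
import numpy as np

def to_bit_shifts(ratios):
    """Round per-channel ratios (each <= 1) to powers of two.

    Returns the shift amounts and the power-of-two ratios they encode;
    scaling by 2**-k is a k-bit right shift in integer arithmetic.
    """
    safe = np.maximum(ratios, 2.0 ** -15)        # guard against log2(0)
    shifts = np.round(-np.log2(safe)).astype(int)
    return shifts, 2.0 ** (-shifts)
```

A ratio of 0.3, for instance, would be approximated by a 2-bit shift (0.25), trading a little precision for shift-only hardware.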
  • the learning loss calculation unit 90 learns the deep learning model using the learning teacher data.
  • The learning loss calculator 90 acquires, for example, a training image data set of 60,000 images and a test image data set of 10,000 images via the acquisition unit 10.
  • the channel-common value range setting unit 32 temporarily sets the channel-common value range.
  • the weight updating unit 40 sequentially changes the value range common to the channels to quantize the parameters.
  • the weight updating unit 40 repeats the determination of the channel-common value range layer by layer from the layer immediately below the quantization target layer to the lowest layer of the deep learning model.
  • the inference unit 50 is composed of a deep learning model, which is a neural network.
  • The inference unit 50 performs quantization and inference by converting model parameters expressed in 32-bit floating point into values expressible in 8-bit fixed point within the value range set by the value range setting unit 30.
  • the value range that differs for each channel is the value range set by the channel-by-channel value range setting unit 31
  • the channel-common value range is the value range provisionally set by the channel-common value range setting unit 32 .
  • FIG. 3 is an explanatory diagram showing an example of the deep learning model DLM executed by the inference unit 50.
  • This deep learning model DLM takes handwritten digits from "0" to "9" in an 8-bit grayscale field of width 28 × height 28 as input data and outputs the recognized digit.
  • the deep learning model DLM is configured with convolution CNV1, max pooling M1, convolution CNV2, max pooling M2, and full connect FLC1 and FLC2.
  • Convolution means a convolution layer
  • "Full connect" means a fully connected layer.
  • the input image PT1 is input to the convolution CNV1.
  • The input image PT1 is an 8-bit grayscale image of width 28 × height 28.
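The FIG. 3 pipeline can be sketched in PyTorch using the layer sizes reported in FIG. 4. The `padding=1` settings and the ReLU activations are assumptions not stated in the extract; padding is chosen so the reported 32 × 28 × 28 and 64 × 7 × 7 output sizes work out.

```python
import torch
import torch.nn as nn

# Sketch of the DLM in FIG. 3; padding and ReLU placement are assumptions.
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),   # Convolution CNV1 -> 32x28x28
    nn.ReLU(),
    nn.MaxPool2d(2),                              # Max pooling M1   -> 32x14x14
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # Convolution CNV2 -> 64x14x14
    nn.ReLU(),
    nn.MaxPool2d(2),                              # Max pooling M2   -> 64x7x7
    nn.Flatten(),                                 # 64 * 7 * 7 = 3136
    nn.Linear(3136, 128),                         # Full connect FLC1
    nn.ReLU(),
    nn.Linear(128, 10),                           # Full connect FLC2 -> 10 digits
)

out = model(torch.zeros(1, 1, 28, 28))            # one 28x28 grayscale image
```

The first convolution's weight tensor has shape 32 × 1 × 3 × 3, matching the parameter counts listed for that layer in FIG. 4.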
  • FIG. 4 is an explanatory diagram showing the number of parameters each layer of the deep learning model DLM has.
  • Each layer of the deep learning model DLM in FIG. 3 will be referred to and explained as appropriate.
  • this explanatory diagram includes columns for layer name, input channel, output channel, kernel size, weight, bias, weight range, and output range.
  • Only the weights have a per-channel value range; the output value range is common to all channels.
  • the input image PT1 is input to the "input” layer, and the number of parameters of the output channel is "1".
  • a value range for quantizing the input image PT1 (grayscale) is set.
  • the number of input channel parameters is “1”
  • the number of output channel parameters is “32”
  • the number of kernel size parameters is “3 × 3”
  • the number of weighting parameters is “32 × 1 × 3 × 3”
  • the number of parameters for the bias is “32”
  • the number of parameters for the weighting region is “32”
  • the number of parameters for the output range is “1”.
  • The output size of this layer is 32 × 28 × 28.
  • the number of input channel parameters is “32”
  • the number of output channel parameters is “64”
  • the number of kernel size parameters is “3 × 3”
  • the number of weighting parameters is “64 × 32 × 3 × 3”
  • the number of bias parameters is “64”
  • the number of weighting area parameters is “64”
  • the number of output range parameters is “1”.
  • The 'Convolution CNV2' layer extracts 64 feature maps. The output size of this hierarchy is 64 × 7 × 7.
  • the number of input channel parameters is “3136”
  • the number of output channel parameters is “128”
  • the number of weighting parameters is “128 × 3136”
  • the number of bias parameters is “128”
  • the number of parameters in the weighting area is “128”
  • the number of parameters in the output range is “1”.
  • the number of input channel parameters is “128”
  • the number of output channel parameters is “10”
  • the number of weighting parameters is “10 × 128”
  • the number of bias parameters is “10”
  • the number of parameters in the weighting area is “10”
  • the number of parameters in the output range is “1”.
  • The "Full Connect FLC2" layer is the last layer, so its output dimension matches the number of classes, 10.
  • The loss calculation unit 60 calculates the loss of the estimation result produced by the deep learning model DLM from the input data, against the correct data. For example, the loss calculator 60 calculates the loss for the correct data based on equation (3).
  • the loss calculation unit 60 may calculate the loss for the correct data of the estimation result based on the input data by the deep learning model DLM after learning.
  • the value range determination unit 70 determines the value range when the loss is minimized as the value range common to all channels.
  • FIGS. 5A and 5B are flowcharts showing the process of executing the range search of the deep learning model DLM by the information processing apparatus 100 of this embodiment.
  • the information processing apparatus 100 receives a user's operation through the input unit 150, for example, receives input of the input image PT1 and correct data corresponding to the input image PT1 (step S001). Thereby, the acquiring unit 10 of the CPU 110 acquires the input image PT1 and the correct answer data corresponding to the input image PT1.
  • the separation unit 20 of the CPU 110 separates the value range in the quantization of the parameters of the deep learning model DLM into a value range common to channels and a value range different for each channel in the quantization target layer of the deep learning model DLM.
  • the separation unit 20 separates the value range in quantization into a value range common to channels and a value range different for each channel, for example, according to the above equation (2).
  • the channel-by-channel value range setting unit 31 of the value range setting unit 30 of the CPU 110 sets a different value range for each channel (step S003).
  • the channel-by-channel value range setting unit 31 uniquely determines and sets the value range ratio for each channel by the channel-by-channel value range determination unit 80 without searching.
  • the channel-by-channel value range determination unit 80 determines weighting and bias using the pre-quantized parameter distribution for each channel obtained in advance.
  • the distribution of parameters for each channel before quantization means, for example, maximum and minimum values.
  • the channel-by-channel value range determination unit 80 performs inference on the input data and determines the output of each channel using the output distribution of each channel at that time. Note that the output distribution for each channel is, for example, a histogram.
  • the channel-common value range setting unit 32 of the value range setting unit 30 of the CPU 110 provisionally sets the channel-common value range (step S005).
  • the channel-by-channel value range determination unit 80 determines the initial value of the value range common to all channels by a similar method.
  • The weight updating unit 40 sets range candidates, including the initial value, based on the initial value of the range common to all channels (step S007). For example, it sets five candidates: 0.25, 0.5, 1, 2, and 4 times the initial value. That is, the weight updating unit 40 sequentially changes the value range common to the channels to quantize the parameters.
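The candidate loop of steps S005 through S015 can be sketched as a small grid search; `evaluate_loss`, which stands in for quantize-then-infer-then-compute-loss, is a hypothetical placeholder supplied by the caller:

```python
import numpy as np

def search_common_range(S_init, evaluate_loss,
                        multipliers=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """Search the channel-common range S_ALL over constant multiples of an
    initial value and keep the candidate with the smallest loss."""
    candidates = [S_init * m for m in multipliers]
    losses = [evaluate_loss(S) for S in candidates]
    return candidates[int(np.argmin(losses))]
```

For instance, with a toy loss minimised at S = 2.0, the search over {0.25, 0.5, 1, 2, 4} returns 2.0. Because the per-channel ratios are fixed beforehand, only this one-dimensional search is repeated per layer.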
  • The inference unit 50 performs quantization and inference by converting model parameters expressed in 32-bit floating point into values expressible in 8-bit fixed point within the value range set by the value range setting unit 30 (step S009).
  • the loss calculation unit 60 calculates the loss for the correct data in the result of estimation by the deep learning model DLM based on the input data (step S011).
  • the loss calculator 60 may calculate the loss according to the above equation (3), or may use the mean square of the difference from the correct data.
  • the loss calculation unit 60 can also calculate the loss for the correct data of the estimation result based on the input data by the deep learning model DLM after learning.
  • the loss calculation unit 60 determines whether all the value ranges have been searched (step S013), and if all the value ranges have not been searched (No in step S013), the process returns to step S005.
  • When all value ranges have been searched (Yes in step S013), the value range determination unit 70 determines the value range that minimizes the loss when inferring the input image PT1 as the value range common to all channels (step S015), and proceeds to step S017.
  • the channel-common value range setting unit 32 provisionally sets the channel-common value range (step S005), and repeats the processing up to step S011.
  • In step S017, the value range determination unit 70 determines whether the value ranges of all parameters (weight, bias, and output) have been searched. If it determines that not all parameters have been searched (No in step S017), it returns to step S005.
  • The search may be performed on the channel-common value range and the per-channel value ranges separately, or on only one of them.
  • In step S019, the value range determination unit 70 determines whether all layers have been searched.
  • If all layers have been searched, the value range determination unit 70 ends the range search of the deep learning model DLM.
  • If not all layers have been searched (No in step S019), the value range determination unit 70 returns to step S005, and the CPU 110 repeats similar processing.
  • the weight updating unit 40 repeats determination of a channel-common value range layer by layer from the layer immediately below the quantization target layer to the lowest layer of the deep learning model DLM.
  • the initial value of the quantization target layer can be the initial value of the uppermost layer.
  • the initial value of the quantization target layer may be the initial value of a layer different from the top layer.
  • The CPU 110 of the information processing apparatus 100 determines the value range for each layer in order from the upper layers, which prevents the pre-quantization parameters and characteristics from varying greatly.
  • the CPU 110 of the information processing apparatus 100 repeats the search until all layers (from the top layer to the bottom layer) are searched (step S019).
  • The process of executing the range search of the deep learning model DLM then ends.
  • the CPU 110 of the information processing apparatus 100 separates the value range in the quantization of the parameters of the deep learning model DLM into a channel-common value range and a different value range for each channel. Then, the CPU 110 of the information processing apparatus 100 determines the value range common to the channels and the value range different for each channel using different methods.
  • By setting a value range for each channel, the CPU 110 of the information processing apparatus 100 suppresses performance degradation due to quantization, while the time required for learning can be reduced to about the same as when only a value range common to all channels is used.
  • the CPU 110 of the information processing apparatus 100 can achieve both suppression of accuracy deterioration and shortening of the learning time required for quantization.
  • The CPU 110 of the information processing apparatus 100 determines the value range ratio S_CH of each channel using the values of each layer before quantization. This can prevent the performance of the deep learning model DLM before quantization from changing significantly due to learning.
  • The processing in FIGS. 5A and 5B is supplemented below as appropriate for the invention according to this embodiment.
  • In step S001 in FIG. 5A, the acquisition unit 10 of the CPU 110 receives the input image PT1 and the correct data corresponding to it. Learned parameters (weights and biases) can also be loaded here.
  • In step S003 in FIG. 5A, the channel-by-channel value range setting unit 31 of the value range setting unit 30 of the CPU 110 sets a different value range for each channel. Inference is performed to determine the range, checking the output distribution of each layer.
  • The per-channel weight range S_CH (32 values in total) is calculated from the 1 × 3 × 3 weight distribution associated with each of the 32 output channels.
  • The value range S_CH, which differs for each channel, may be represented by a bit shift.
  • step S005 in FIG. 5A the channel-common value range setting unit 32 of the value range setting unit 30 of the CPU 110 provisionally sets a value range common to all channels.
  • S_ALL is assumed to be the initial value for the search.
  • the inference unit 50 of the CPU 110 can perform quantization using the ratio to the maximum value as the value range for each channel (total of 32 values).
  • The inference unit 50 of the CPU 110 may also calculate the value range S from the distribution of the 32 × 28 × 28 output of convolution CNV1 and use it as the initial value for the search.
  • In step S019 in FIG. 5B, it is determined whether or not all layers have been searched, and the value range is determined in order from the upper layers to the lower layers (step S021).
  • a value range can be determined for each layer, and an optimal value range for each layer can be determined.
  • The optimal value range is determined in each layer for each of the output (input), weight, and bias parameters, but the processing is not limited to steps S005 through S013 in FIG. 5A.
  • basic concepts for determining the range of output (input), weighting, and bias will be described below. Note that the processing of the CPU 110 based on the following concept may be executed in any step.
  • the CPU 110 makes an inference in advance in the inference unit 50 , sets the obtained value range S as an initial value at the time of searching, and sets a range candidate including the initial value by the value range setting unit 30 .
  • The weight updating unit 40 sets five candidates: 0.25, 0.5, 1, 2, and 4 times the initial value. This corresponds to the channel-common value range provisional setting process described in step S005 of FIG. 5A.
  • the inference unit 50 of the CPU 110 performs inference in each value range for the set range candidates, and the loss calculation unit 60 calculates a loss. This corresponds to the inference processing described in step S009 of FIG. 5A and the loss calculation processing described in step S011.
  • the loss calculator 60 of the CPU 110 repeats loss calculation for the number of candidates. This corresponds to the repeated loop from steps S005 to S013 of FIG. 5A.
  • the value range determination unit 70 determines the value range with the lowest loss as the value range for the output (input). This corresponds to the channel common range determination process described in step S015 of FIG. 5B.
  • The weight and bias of the quantization target layer are made trainable, and the value range set by the channel-common value range setting unit 32 is updated through learning by the learning loss calculation unit 90.
  • the CPU 110 causes the inference unit 50 to make an inference in advance, sets the obtained value range S as an initial value at the time of searching, and sets a range candidate including the initial value by the value range setting unit 30 .
  • The weight updating unit 40 sets five candidates: 0.25, 0.5, 1, 2, and 4 times the initial value. This corresponds to the channel-common value range provisional setting process described in step S005 of FIG. 5A.
  • The learning loss calculation unit 90 of the CPU 110 executes learning in each value range for the range candidates defined in (1) of (B), and while the weight and bias are updated, the loss calculation unit 60 calculates the loss. This corresponds to the weight update process described in step S007 in FIG. 5A, the inference process described in step S009, and the loss calculation process described in step S011.
  • the CPU 110 causes the loss calculation unit 60 to repeat calculation of the loss by the number of range candidates. This corresponds to the repeated loop from steps S005 to S013 of FIG. 5A.
  • CPU 110 determines, in value range determination section 70, the value range with the lowest loss in loss calculation section 60 as the value range for the weighting. This corresponds to the channel common range determination process described in step S015 of FIG. 5B.
  • the weighting and bias updated by the learning of the loss calculation unit 60 are recorded, and the state before learning is restored before searching.
  • the CPU 110 can use the values when the weights and biases were also learned in that range.
  • Method of determining the bias range of the convolution layer and the fully connected layer: the relationship between the weight and the bias in a convolution layer or fully connected layer can be expressed as shown in equation (4).
  • The bias scale can be regarded as equivalent to the product of the weight and the input in the quantization target layer, so the bias range S_BIAS can be calculated as in equation (5).
  • Since the weight range S can be separated into the channel-common range S_ALL and the per-channel range S_CH by equation (2), the channel-common bias range S_ALLBIAS can be expressed as in equation (6).
  • The per-channel bias value range S_CHBIAS can be expressed as in equation (7).
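Equations (4) through (7) are not reproduced in this extract. The reconstruction below follows the standard fixed-point convention that a bias shares the scale of the weight-input product, and should be read as an assumption rather than the patent's exact formulas:

```latex
y = W x + b                          % (4) convolution / fully connected layer
S_{BIAS}    = S_{IN} \cdot S         % (5) the bias scale matches the scale of W x
S_{ALLBIAS} = S_{IN} \cdot S_{ALL}   % (6) channel-common part, using S = S_{ALL} \cdot S_{CH}
S_{CHBIAS}  = S_{CH}                 % (7) per-channel part of the bias range
```

Under this reading, the per-channel part of the bias range reduces to the same ratio S_CH already fixed for the weights, which is why no separate per-channel bias search is needed.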
  • For the output (input), weight, and bias parameters, the processing is not limited to steps S005 through S013 in FIG. 5A; an optimal range can be determined for each.


Abstract

A quantization program executed by a CPU (110) includes: a procedure for separating, in a quantization target layer of a deep learning model (DLM), the value range in quantization of the parameters of the deep learning model (DLM) into a channel-common value range and value ranges that differ for each channel; and a procedure for determining the separated channel-common value range and the per-channel value ranges using respectively different methods. The quantization program causes the CPU (110) to execute these procedures.

Description

[Supplement under Rule 26, 05.04.2022] Quantization program, information processing device, and range determination method
The present invention relates to a quantization program, an information processing device, and a range determination method.
In recent years, AI (Artificial Intelligence) technology has advanced, and deep learning techniques have spread remarkably. Against this background, edge computing technology, which aims to introduce AI functions into a wide range of familiar devices at low cost, is attracting attention.
Because edge computing requires low cost and low power consumption, quantized deep learning models, which can be computed by fast, small-scale circuits, are attracting attention. A quantized deep learning model must achieve both high accuracy and high-speed operation through quantization, so setting an optimal value range at quantization time is important.
JP 2019-32833 A
JP 2020-149311 A
The parameters (weights, biases) of each layer of a quantized deep learning model, and the output of each layer, have multiple channels, and depending on the model the scale of the values may differ greatly between channels. On the other hand, when the value range is common to all channels and the per-channel scales differ greatly, the quantization error grows and accuracy may decrease.
Non-Patent Document 1 addresses this by training with pseudo-quantization to maintain accuracy, but it is desirable to have a value range for each channel.
Patent Documents 1 and 2 provide a value range for each channel, but because they determine the range from the distribution of parameters or outputs, the result may not be the optimal range.
One conceivable way to determine the optimal range is to use training to search for the range that minimizes the loss against the correct data. However, in many quantized deep learning models the number of channels in each layer is large compared to the number of layers, so running the search per channel requires a very long training time.
Specifically, the number of training runs when the range is common to all channels is the number of quantization target layers multiplied by the number of value-range patterns per layer. The number of training runs for per-channel ranges is the number of quantization target layers multiplied by the value-range patterns per layer and by the number of channels in each layer.
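The two counts can be checked with a small calculation; the layer and channel numbers below are hypothetical, chosen only to show the gap in scale.

```python
# Hypothetical model: 4 quantization target layers, 5 range patterns per layer.
layers = 4
patterns = 5
channels_per_layer = [32, 64, 128, 10]  # channel counts of the 4 layers

# Channel-common search: one range searched per layer
runs_common = layers * patterns

# Per-channel search: every channel of every layer gets its own search
runs_per_channel = patterns * sum(channels_per_layer)
```

With these numbers the common search needs 20 training runs while the per-channel search needs 1170, which is the cost the invention avoids by fixing the per-channel ratios without a search.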
Thus, with conventional quantized deep learning models, searching for the optimal per-channel value range requires a large amount of training time.
In view of this problem, an object of the present invention is to provide a quantization program, an information processing device, and a range determination method that both suppress accuracy degradation of a quantized deep learning model and shorten the training time required for quantization.
That is, the above problem of the present invention is solved by the following configurations.
(1) A quantization program for causing a computer to execute:
 a procedure for separating, in a quantization target layer of a deep learning model, the value range used for quantizing the parameters of the deep learning model into a channel-common value range and value ranges that differ for each channel; and
 a procedure for determining the separated channel-common value range and the per-channel value ranges by respectively different methods.
(2) The quantization program according to (1), further causing the computer to execute:
 a procedure for acquiring input data and correct data corresponding to the input data;
 a procedure for quantizing the parameters while sequentially changing the channel-common value range; and
 a procedure for calculating the loss of the estimation result of the deep learning model based on the input data against the correct data, and determining the value range that minimizes the loss as the channel-common value range.
(3) The quantization program according to (1), further causing the computer to execute:
 a procedure for acquiring input data and correct data corresponding to the input data;
 a procedure for quantizing the parameters while sequentially changing the channel-common value range, and training the deep learning model using training teacher data; and
 a procedure for calculating the loss of the estimation result of the trained deep learning model based on the input data against the correct data, and determining the value range that minimizes the loss as the channel-common value range.
(4) The quantization program according to (2) or (3), further causing the computer to execute a procedure for repeating, layer by layer, the determination of the channel-common value range from the layer immediately below the quantization target layer to the lowest layer of the deep learning model.
(5) The quantization program according to any one of (1) to (4), wherein the initial value of the quantization target layer is the initial value of the uppermost layer.
(6) The quantization program according to any one of (1) to (4), wherein the initial value of the quantization target layer is the initial value of a layer different from the uppermost layer.
(7) The quantization program according to any one of (1) to (6), wherein the per-channel value ranges are calculated using any of the weights, the bias, and the per-channel output distribution in the quantization target layer.
(8) The quantization program according to any one of (1) to (7), wherein the per-channel value ranges are ratios of the value ranges of the respective channels.
(9) The quantization program according to (8), wherein, when the per-channel value ranges are calculated from the weights or the bias, the per-channel value ranges are determined using the per-channel ratio of the parameter distribution.
(10) The quantization program according to any one of (2) to (4), further comprising, when the per-channel value ranges concern the output of the quantization target layer:
 a procedure for acquiring the input data; and
 a procedure for determining a different value range for each channel using the per-channel ratio of the output distribution at inference time for the input data.
(11) The quantization program according to any one of (1) to (10), wherein the per-channel value ranges are represented by bit shifts.
(12) An information processing device comprising:
 a separation unit that separates, in a quantization target layer of a deep learning model, the value range used for quantizing the parameters of the deep learning model into a channel-common value range and value ranges that differ for each channel; and
 a value range determination unit that determines the separated channel-common value range and the per-channel value ranges by respectively different methods.
(13) A range determination method comprising:
 a step of separating, in a quantization target layer of a deep learning model, the value range used for quantizing the parameters of the deep learning model into a channel-common value range and value ranges that differ for each channel; and
 a step of determining the separated channel-common value range and the per-channel value ranges by respectively different methods.
According to the present invention, the quantization program, information processing device, and range determination method can both suppress accuracy degradation of a quantized deep learning model and shorten the training time required for quantization. In addition, by determining the value range from the pre-quantization values of each layer, the present invention can also prevent the model's performance from deviating greatly, through training, from its pre-quantization performance.
FIG. 1 is an explanatory diagram illustrating a main configuration example of the information processing device according to this embodiment.
FIG. 2 is a functional block diagram showing the functions of the CPU of the information processing device according to this embodiment.
FIG. 3 is an explanatory diagram showing an example of the deep learning model executed by the inference unit.
FIG. 4 is an explanatory diagram showing the number of parameters in each layer of the deep learning model.
FIG. 5A is a flowchart showing the process by which the information processing device of this embodiment executes the range search of the deep learning model (part 1).
FIG. 5B is a flowchart showing the process by which the information processing device of this embodiment executes the range search of the deep learning model (part 2).
Embodiments for carrying out the present invention are described in detail below. The embodiment described below is one example for realizing the present invention and should be modified or changed as appropriate according to the configuration of the apparatus to which the present invention is applied and various conditions; the present invention is not limited to the following embodiment.
&lt;This Embodiment&gt;
[Overall Configuration of the Information Processing Device]
FIG. 1 is an explanatory diagram illustrating a main configuration example of an information processing device 100 according to this embodiment. The information processing device 100 determines the value range of a quantized deep learning model by executing a control program. In the information processing device 100 shown in FIG. 1, identical components are given identical reference numerals, and their description is omitted as appropriate.
The information processing device 100 according to this embodiment comprises a CPU (Central Processing Unit) 110, a storage unit 120, a ROM (Read Only Memory) 130, a RAM (Random Access Memory) 140, an input unit 150, a display unit 160, and a communication unit 170.
The CPU 110 implements each process (function) shown in FIG. 2 by executing a control program stored in the storage unit 120 or the ROM 130. Each process implemented by the CPU 110 is described later with reference to FIG. 2.
The storage unit 120 is a large-capacity storage device comprising, for example, a hard disk drive or non-volatile memory, and stores the control program.
The RAM 140 functions as a work area that temporarily stores programs read from the ROM 130 and executable by the CPU 110, input and output data, parameters, and the like for the various processes executed and controlled by the CPU 110.
The input unit 150 comprises a keyboard with cursor keys, numeric input keys, and various function keys, and a pointing device such as a mouse. The input unit 150 outputs key-press signals from the keyboard and operation signals from the mouse to the CPU 110 as input signals, and the CPU 110 executes various processes based on these operation signals.
The display unit 160 comprises a monitor such as a CRT (Cathode Ray Tube) or an LCD (Liquid Crystal Display) and displays various screens according to display signals input from the CPU 110. When a touch panel is adopted as the display unit 160, the display unit 160 also functions as the input unit 150.
The communication unit 170 has a communication interface and communicates with external devices on a network.
Next, the functions of the CPU 110 of the information processing device 100 according to this embodiment are described with reference to FIG. 2.
FIG. 2 is a functional block diagram showing the functions of the CPU 110 of the information processing device 100 according to this embodiment. As shown in FIG. 2, by executing the control program, the CPU 110 implements an acquisition unit 10, a separation unit 20, a value range setting unit 30, a weight update unit 40, an inference unit 50, a loss calculation unit 60, a value range determination unit 70, a per-channel value range determination unit 80, and a learning loss calculation unit 90.
The acquisition unit 10 acquires an input image and correct data corresponding to the input image, as a set with teacher data indicating the correct data. Each acquired image is in 8-bit grayscale (256 levels), 28 pixels wide by 28 pixels high.
In the quantization target layer of the deep learning model DLM, the separation unit 20 separates the value range used for quantizing the parameters of the deep learning model DLM into a channel-common value range and per-channel value ranges.
The weights and biases of the convolution layers and fully connected layers, and the output values of each layer, are quantized using quantization parameters S and Z according to the following Equation (1).
[Equation (1), rendered as an image in the source. From the surrounding text, an affine quantization of a value x with scale S and zero point Z, of the form x_q = round(x / S) + Z.]
However, to simplify the calculation, this embodiment sets Z = 0, leaving only S as the parameter requiring optimization.
Here, the separation unit 20 separates the range into a channel-common value range S ALL and a per-channel value range S CH based on Equation (2).
[Equation (2), rendered as an image in the source. From the surrounding text, the scale factors into a channel-common part and a per-channel part: S = S_ALL × S_CH.]
In this embodiment, the per-channel range ratio S CH is determined before the training-based quantization. The channel-common value range S ALL is then determined by computing the loss for multiple candidate ranges and searching for the range with the smallest loss.
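This two-stage scheme can be sketched with hypothetical data: the per-channel ratios S_CH are fixed in advance, and only the common factor S_ALL is searched. Quantization mean-squared error stands in here for the model loss used in the embodiment; the function names and candidate values are illustrative, not from the patent.

```python
import numpy as np

def quantize_dequantize(x, scale, n_bits=8):
    # Uniform quantization with zero point Z = 0, then back to float
    # so the quantization error can be measured.
    qmax = 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

def search_common_scale(weights, s_ch, candidates):
    # weights: (channels, n) array; s_ch: per-channel ratios fixed in advance.
    # Only S_ALL is searched; each channel's scale is S_ALL * S_CH (Equation (2)).
    best_s, best_err = None, None
    for s_all in candidates:
        err = sum(
            float(np.mean((weights[c] - quantize_dequantize(weights[c], s_all * s_ch[c])) ** 2))
            for c in range(weights.shape[0])
        )
        if best_err is None or err < best_err:
            best_s, best_err = s_all, err
    return best_s
```

The search cost is one loss evaluation per candidate, independent of the channel count, because the per-channel ratios never enter the search loop.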
The value range setting unit 30 comprises a per-channel value range setting unit 31 and a channel-common value range setting unit 32. The value range setting unit 30 determines the channel-common range and the per-channel ranges separated by the separation unit 20 by respectively different methods: the per-channel value range setting unit 31 sets a different range for each channel, while the channel-common value range setting unit 32 sets the channel-common range.
The per-channel value range determination unit 80 determines the per-channel range ratios uniquely, without a search. For example, it calculates a different range for each channel using any of the weights, the bias, and the per-channel output distribution in the quantization target layer.
For example, the per-channel value range determination unit 80 determines the ranges for the weights and the bias using the pre-quantization per-channel parameter distribution (maximum and minimum values) acquired in advance. For the per-channel outputs, it may run inference on input data and use the resulting per-channel histograms. In this case, it may determine the initial value of the channel-common range by the same method.
The per-channel value range determination unit 80 may also determine the per-channel ranges as ratios of the ranges of the respective channels. In this case, when the per-channel ranges are calculated from the weights or the bias, the unit determines them using the per-channel ratio of the parameter distribution.
When the per-channel ranges concern the output of the quantization target layer, the per-channel value range determination unit 80 may acquire input data and determine a different range for each channel using the per-channel ratio of the output distribution at inference time for that input data.
The per-channel value range setting unit 31 then sets the per-channel ranges determined by the per-channel value range determination unit 80. The per-channel ranges may be represented by bit shifts.
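When the per-channel ratio is constrained to a power of two, as the bit-shift representation implies, applying it to a fixed-point value needs no multiplier. A hypothetical sketch (the function name and values are illustrative):

```python
import math

def ratio_to_shift(ratio):
    """Shift amount k such that the per-channel ratio is represented as 2**k.
    Assumes the ratio was chosen (or rounded) to a power of two."""
    return round(math.log2(ratio))

q = 12                      # some fixed-point quantized value
k = ratio_to_shift(4.0)     # a per-channel ratio of 4 becomes a left shift by 2
scaled = q << k             # same result as q * 4, without a multiplication
```

This is why a bit-shift representation suits the small, fast circuits targeted by edge computing.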
The learning loss calculation unit 90 trains the deep learning model using training teacher data. Via the acquisition unit 10, it acquires, for example, an image data set of 60,000 training images and an image data set of 10,000 test images.
The channel-common value range setting unit 32 provisionally sets the channel-common value range.
The weight update unit 40 quantizes the parameters while sequentially changing the channel-common value range, and repeats the determination of the channel-common range layer by layer, from the layer immediately below the quantization target layer to the lowest layer of the deep learning model.
The inference unit 50 is composed of a deep learning model, a neural network. Using the value ranges set by the value range setting unit 30, the inference unit 50 performs quantization (inference) by converting model parameters expressed in 32-bit floating point into values expressible in 8-bit fixed point. The per-channel ranges are those set by the per-channel value range setting unit 31, and the channel-common range is the one provisionally set by the channel-common value range setting unit 32.
FIG. 3 is an explanatory diagram showing an example of the deep learning model DLM executed by the inference unit 50. This deep learning model DLM takes handwritten digits from "0" to "9", in 8-bit grayscale and 28 pixels wide by 28 pixels high, as input data and outputs the digit as the correct data.
As shown in FIG. 3, the deep learning model DLM comprises a convolution CNV1, max pooling M1, a convolution CNV2, max pooling M2, and fully connected layers FLC1 and FLC2. "Convolution" denotes a convolutional layer, and "full connect" denotes a fully connected layer. The input image PT1, an 8-bit grayscale image 28 pixels wide by 28 pixels high, is input to the convolution CNV1.
FIG. 4 is an explanatory diagram showing the number of parameters in each layer of the deep learning model DLM. The following description refers to the layers of the deep learning model DLM in FIG. 3 as appropriate.
As shown in FIG. 4, the diagram has columns for layer name, input channels, output channels, kernel size, weights, bias, weight value range, and output value range. In this embodiment, as an example, only the weights have a per-channel value range; the output range is common to all channels.
The "input" layer receives the input image PT1, and its output channel parameter count is "1". Its output range is set to the range for quantizing the input image PT1 (grayscale).
For the "convolution CNV1" layer, the input channel count is "1", the output channel count is "32", the kernel size is "3×3", the weight parameter count is "32×1×3×3", the bias parameter count is "32", the weight range parameter count is "32", and the output range parameter count is "1". As shown in FIG. 3, the output size of this layer is 32×28×28.
Next, the max pooling M1 layer applies ReLU activation, followed by a max pooling layer with kernel size 2 and stride 2. This downsamples the feature map to 32×14×14.
For the "convolution CNV2" layer, the input channel count is "32", the output channel count is "64", the kernel size is "3×3", the weight parameter count is "64×32×3×3", the bias parameter count is "64", the weight range parameter count is "64", and the output range parameter count is "1". The "convolution CNV2" layer extracts 64 feature maps; the output size of this layer is 64×14×14.
Next, the max pooling M2 layer follows up with ReLU activation and a max pooling layer with kernel size 2 and stride 2. The downsampled feature map size becomes 64×7×7.
For the "full connect FLC1" layer, the input channel count is "3136", the output channel count is "128", the weight parameter count is "128×3136", the bias parameter count is "128", the weight range parameter count is "128", and the output range parameter count is "1". The "full connect FLC1" layer has 64×7×7 = 3136 input nodes, each connected to the 128 nodes of the next layer.
For the "full connect FLC2" layer, the input channel count is "128", the output channel count is "10", the weight parameter count is "10×128", the bias parameter count is "10", the weight range parameter count is "10", and the output range parameter count is "1". Since "full connect FLC2" is the final layer, its output dimension matches the total number of classes, 10.
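The parameter counts in FIG. 4 follow directly from the layer shapes: a convolution has out_channels × in_channels × kernel × kernel weights, a fully connected layer has out × in weights, and each layer has one bias per output channel. A quick arithmetic check, which also confirms that the FLC1 input is 64 × 7 × 7 = 3136:

```python
def conv_params(in_ch, out_ch, k):
    # (weight count, bias count) of a convolution layer
    return out_ch * in_ch * k * k, out_ch

def fc_params(in_n, out_n):
    # (weight count, bias count) of a fully connected layer
    return out_n * in_n, out_n

cnv1 = conv_params(1, 32, 3)        # (288, 32)
cnv2 = conv_params(32, 64, 3)       # (18432, 64)
flc1 = fc_params(64 * 7 * 7, 128)   # (401408, 128); input is 64*7*7 = 3136
flc2 = fc_params(128, 10)           # (1280, 10)
```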
The loss calculation unit 60 calculates the loss of the estimation result of the deep learning model DLM based on the input data against the correct data, for example based on Equation (3).
[Equation (3), rendered as an image in the source; the loss of the model's estimation result against the correct data.]
The loss calculation unit 60 may also calculate the loss, against the correct data, of the estimation result based on the input data by the deep learning model DLM after training.
The value range determination unit 70 determines the value range that minimizes the loss as the channel-common value range.
[Processing of the Information Processing Device]
FIGS. 5A and 5B are flowcharts showing the process by which the information processing device 100 of this embodiment executes the range search of the deep learning model DLM.
First, the information processing device 100 receives a user operation via the input unit 150, for example input of the input image PT1 and the correct data corresponding to the input image PT1 (step S001). The acquisition unit 10 of the CPU 110 thereby acquires the input image PT1 and the corresponding correct data.
The separation unit 20 of the CPU 110 separates the value range used for quantizing the parameters of the deep learning model DLM, in the quantization target layer, into a channel-common range and per-channel ranges, for example according to Equation (2) above.
Then, the per-channel value range setting unit 31 of the value range setting unit 30 of the CPU 110 sets a different value range for each channel (step S003). In this case, the per-channel ratios are determined uniquely by the per-channel value range determination unit 80, without a search, and then set.
 The channel-by-channel value range determination unit 80 determines the ranges for the weights and biases using the per-channel distribution of the pre-quantization parameters, obtained in advance. The per-channel parameter distribution before quantization refers to, for example, the maximum and minimum values. For the per-channel outputs, the channel-by-channel value range determination unit 80 executes inference on the input data and determines the ranges using the resulting per-channel output distribution. The per-channel output distribution is, for example, a histogram.
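As a sketch of reading a per-channel output range off the observed distribution (the percentile cutoff below is an assumed heuristic; the text only says the per-channel output distribution, e.g. a histogram, is used):

```python
import numpy as np

def per_channel_output_range(activations, percentile=99.9):
    """Estimate a per-channel output range from activations observed while
    running inference on the input data.

    activations: array of shape (num_samples, channels).
    """
    return np.percentile(np.abs(activations), percentile, axis=0)
```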
 Meanwhile, the channel-common value range setting unit 32 of the value range setting unit 30 of the CPU 110 provisionally sets the value range common to all channels (step S005). In this case, the channel-by-channel value range determination unit 80 determines the initial value of the channel-common value range by the same method.
 The weight updating unit 40 generates value range candidates, including the initial value, based on the initial value of the channel-common value range (step S007). For example, the weight updating unit 40 sets five candidates obtained by multiplying the initial value by the constants 0.25, 0.5, 1, 2, and 4. In other words, the weight updating unit 40 quantizes the parameters while sequentially changing the channel-common value range.
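The candidate generation of step S007 can be sketched in a few lines (the function name is illustrative):

```python
def range_candidates(initial, multipliers=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """Build the channel-common range candidates of step S007 as constant
    multiples of the initial value (0.25x, 0.5x, 1x, 2x, 4x)."""
    return [initial * m for m in multipliers]
```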
 The inference unit 50 executes quantization (inference) by converting the model parameters, expressed in 32-bit floating point, into values expressible in 8-bit fixed point within the value range set by the value range setting unit 30 (step S009).
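A simulated ("fake") quantization step of this kind might look as follows; the symmetric signed grid is one plausible reading, since the text does not spell out the exact mapping:

```python
import numpy as np

def fake_quantize(x, s, bits=8):
    """Simulate 8-bit fixed-point quantization of float32 values over the
    symmetric range [-s, s], then map back to float for evaluation.
    """
    qmax = 2 ** (bits - 1) - 1                      # 127 for signed 8-bit
    q = np.clip(np.round(x / s * qmax), -qmax - 1, qmax)
    return q * s / qmax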
 The loss calculation unit 60 calculates the loss of the estimation result produced by the deep learning model DLM from the input data with respect to the correct answer data (step S011). The loss calculation unit 60 may calculate the loss according to Equation (3) above, or may use the mean square of the difference from the correct answer data. The loss calculation unit 60 can also calculate the loss, with respect to the correct answer data, of the estimation result produced from the input data by the deep learning model DLM after training.
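The mean-square variant mentioned for step S011 is simply:

```python
import numpy as np

def mse_loss(prediction, correct):
    """Mean-squared loss between the model's estimation result and the
    correct answer data, one of the loss choices mentioned for step S011."""
    prediction = np.asarray(prediction, dtype=float)
    correct = np.asarray(correct, dtype=float)
    return float(np.mean((prediction - correct) ** 2))
```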
 The loss calculation unit 60 determines whether all the value ranges have been searched (step S013); if not all value ranges have been searched (No in step S013), the process returns to step S005.
 On the other hand, when all the value ranges have been searched (Yes in step S013), the value range determination unit 70 determines the value range that minimizes the loss when inferring the input image PT1 as the value range common to all channels (step S015), and the process proceeds to step S017.
 When the process returns to step S005, the channel-common value range setting unit 32 provisionally sets the channel-common value range again (step S005), and the processing up to step S011 is repeated.
 In step S017, the value range determination unit 70 determines whether the value ranges of all the parameters (weights, biases, and per-channel outputs) have been searched (step S017). If it determines that not all parameters have been searched (No in step S017), the value range determination unit 70 returns to step S005.
 In this case, for the weights, biases, and per-channel outputs, the search may separate the value range into a channel-common range and a per-channel range for all of these parameters, or it may do so for only one of them.
 On the other hand, when all the value ranges have been searched (Yes in step S017), the value range determination unit 70 determines whether all layers have been searched (step S019).
 When all layers have been searched (Yes in step S019), the value range determination unit 70 ends the process of executing the value range search for the deep learning model DLM.
 On the other hand, if not all layers have been searched (No in step S019), the value range determination unit 70 returns to step S005, and the CPU 110 repeats the same processing.
 Also, in step S007, the weight updating unit 40 repeats the determination of the channel-common value range layer by layer, from the layer immediately below the quantization target layer down to the lowest layer of the deep learning model DLM. Note that the initial value of the quantization target layer can be the initial value of the uppermost layer, or it may be the initial value of a layer different from the uppermost layer.
 In this way, the CPU 110 of the information processing apparatus 100 can determine the value ranges one layer at a time, in order from the upper layers, before finalizing the value range, which prevents the characteristics from deviating greatly from the pre-quantization parameters.
 The CPU 110 of the information processing apparatus 100 according to the present embodiment repeats the search until all layers (from the uppermost layer to the lowest layer) have been searched (step S019); when the search of all layers is complete, the process of executing the value range search for the deep learning model DLM ends.
 As described above, the CPU 110 of the information processing apparatus 100 according to the present embodiment separates the value range used in quantizing the parameters of the deep learning model DLM into a channel-common value range and a per-channel value range, and determines the two by different methods.
 As a result, by setting a value range for each channel, the CPU 110 of the information processing apparatus 100 according to the present embodiment suppresses performance degradation due to quantization while keeping the training time comparable to that of setting a single value range common to all channels.
 Therefore, the CPU 110 of the information processing apparatus 100 according to the present embodiment can achieve both suppression of accuracy degradation and reduction of the training time required for quantization.
 Conversely, if a value range were set for each channel and the degree of freedom during quantization by training were increased, excessive training could cause the performance to deviate greatly from that of the deep learning model before quantization.
 In contrast, the CPU 110 of the information processing apparatus 100 according to the present embodiment determines the value range ratio SCH of each channel using the pre-quantization values of each layer. This prevents the performance of the deep learning model DLM from changing greatly from its pre-quantization behavior as a result of training.
 Next, the content of the processing in the invention according to the present embodiment is supplemented, as appropriate, with reference to FIGS. 5A and 5B.
 In step S001 in FIG. 5A, the acquisition unit 10 of the CPU 110 receives the input image PT1 and the correct answer data corresponding to it; at this stage, it can also load the pre-quantization parameters (weights and biases) learned in advance.
 In step S003 in FIG. 5A, the channel-by-channel value range setting unit 31 of the value range setting unit 30 of the CPU 110 sets a different value range for each channel. First, an image data set is input to the deep learning model DLM, inference is executed to determine the output value ranges, and the output distribution from each layer is checked.
 For example, in the case of convolution CNV1, for the weight ranges, the per-channel weight range SCH (32 values in total) is calculated from the 1×1×3 weight distributions associated with the 32 output channels. In this case, the per-channel value ranges SCH may be represented by bit shifts.
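A bit-shift representation constrains each per-channel ratio to a power of two; rounding to the nearest power of two, as below, is an assumption (the text only says the ranges may be represented by bit shifts):

```python
import math

def to_bit_shift(ratio):
    """Round a per-channel range ratio to the nearest power of two so it can
    be applied as a simple bit shift in fixed-point arithmetic."""
    shift = round(math.log2(ratio))
    return shift, 2.0 ** shift
```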
 In step S005 in FIG. 5A, the channel-common value range setting unit 32 of the value range setting unit 30 of the CPU 110 provisionally sets the channel-common value range. For example, in the case of convolution CNV1, the largest of the 32 weight ranges is taken as the initial value of the channel-common value range SALL for the search. The inference unit 50 of the CPU 110 can then execute quantization using the ratio of each channel's range to this maximum as the per-channel value range (32 values in total).
 In the case of the output value range, the inference unit 50 of the CPU 110 may calculate the value range S from the 32×28×28 distribution of the output of convolution CNV1 and use it as the initial value for the search.
 Also, in step S019 in FIG. 5B, it is determined whether all layers have been searched, and the value ranges are determined from the upper layers to the lower layers (step S021). Specifically, a value range can be determined for each layer in the following order, yielding the optimal value range for each layer.
 (1) Input of convolution CNV1
 (2) Weights of convolution CNV1
 (3) Bias of convolution CNV1
 (4) Output of convolution CNV1
 (5) Weights of convolution CNV2
 (6) Bias of convolution CNV2
 (7) Output of convolution CNV2
 (8) Weights of fully connected FLC1
 (9) Bias of fully connected FLC1
 (10) Output of fully connected FLC1
 (11) Weights of fully connected FLC2
 (12) Bias of fully connected FLC2
 (13) Output of fully connected FLC2
 In the present embodiment, the optimal value range is determined in each layer for each of the output (input), weight, and bias parameters; however, the processing from step S005 to step S013 in FIG. 5A is only an example, and the embodiment is not limited to it. The basic concepts for determining the value ranges of the output (input), the weights, and the bias, which are realized by the processing from step S005 to step S013 in FIG. 5A described above, are explained below. The processing of the CPU 110 based on these concepts may be executed in any of the steps.
 (A) Method of determining the output (input) value range
 The basic concept in determining the output (input) value range is to decide it based only on the loss value from inference, without updating the weights or biases through training.
 (1) The CPU 110 performs inference in advance in the inference unit 50, sets the obtained value range S as the initial value for the search, and sets range candidates including the initial value using the value range setting unit 30. For example, the weight updating unit 40 sets five candidates: 0.25, 0.5, 1, 2, and 4 times the initial value. This corresponds to the provisional setting of the channel-common value range described in step S005 of FIG. 5A.
 (2) The inference unit 50 of the CPU 110 executes inference for each of the set range candidates, and the loss calculation unit 60 calculates the loss. This corresponds to the inference processing described in step S009 of FIG. 5A and the loss calculation processing described in step S011.
 (3) The loss calculation unit 60 of the CPU 110 repeats the loss calculation as many times as there are candidates; this corresponds to the loop from steps S005 to S013 in FIG. 5A. The value range determination unit 70 determines the range with the lowest loss as the value range for that output (input). This corresponds to the channel-common range determination processing described in step S015 of FIG. 5B.
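Steps (1) through (3) of concept (A) can be sketched as a small grid search; `run_inference(x, s)` is a hypothetical stand-in for the inference unit 50 running quantized inference under range `s`:

```python
import numpy as np

def search_output_range(x, correct, run_inference, initial):
    """Grid-search the output value range over the candidate multiples and
    keep the one with the lowest inference loss (concept (A), steps (1)-(3))."""
    best_s, best_loss = None, float("inf")
    for m in (0.25, 0.5, 1.0, 2.0, 4.0):
        s = initial * m
        loss = float(np.mean((run_inference(x, s) - correct) ** 2))
        if loss < best_loss:
            best_s, best_loss = s, loss
    return best_s
```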
 (B) Method of determining the value range for convolution and fully connected weights
 Since the per-channel weight range SCH is already fixed by the prior inference, the basic concept here is to determine the channel-common range SALL by search. During the search, the weights and biases of the quantization target layer are updated through training.
 (1) First, the weights and biases of the quantization target layer are made variable, and the channel-common value range setting unit 32 arranges for them to be updated through training by the training loss calculation unit 90.
 (2) The CPU 110 performs inference in advance in the inference unit 50, sets the obtained value range S as the initial value for the search, and sets range candidates including the initial value using the value range setting unit 30. For example, the weight updating unit 40 sets five candidates: 0.25, 0.5, 1, 2, and 4 times the initial value. This corresponds to the provisional setting of the channel-common value range described in step S005 of FIG. 5A.
 (3) The training loss calculation unit 90 of the CPU 110 executes training for each of the range candidates defined in (2) of (B), and, while the weights and biases are updated, the loss calculation unit 60 calculates the loss. This corresponds to the weight updating processing described in step S007 of FIG. 5A, the inference processing described in step S009, and the loss calculation processing described in step S011.
 (4) The CPU 110 causes the loss calculation unit 60 to repeat the loss calculation as many times as there are range candidates; this corresponds to the loop from steps S005 to S013 in FIG. 5A. In the value range determination unit 70, the CPU 110 determines the range with the lowest loss in the loss calculation unit 60 as the value range for the weights. This corresponds to the channel-common range determination processing described in step S015 of FIG. 5B.
 When searching another candidate, the weights and biases updated by training are recorded, and the model is restored to its pre-training state before the next search. The CPU 110 can therefore use, for the weights and biases as well, the values obtained when training was performed with the chosen range.
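The record-and-restore pattern for the weight-range search in (B) might be sketched as follows; `train_and_loss(p, s)` is a hypothetical stand-in for units 90 and 60 that updates `p` in place and returns a loss:

```python
import copy

def try_candidates_with_training(params, candidates, train_and_loss):
    """For each range candidate, train from the same untrained state, record
    the loss and the trained parameters, and restore before the next trial."""
    trials = {}
    for s in candidates:
        trial = copy.deepcopy(params)      # restart from the untrained state
        loss = train_and_loss(trial, s)
        trials[s] = (loss, trial)          # keep the weights trained under s
    best = min(trials, key=lambda s: trials[s][0])
    return best, trials[best][1]           # best range and its trained weights
```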
 (C) Method of determining the bias value range for convolution and fully connected layers
 The relationship between the weights and bias in a convolution layer or a fully connected layer can be expressed as in Equation (4).
 (Equation (4) appears as an image in the original document.)
 Accordingly, the bias scale can be regarded as equivalent to the product of the weight scale and the input scale in the quantization target layer, so the bias value range SBIAS can be calculated as in Equation (5).
 (Equation (5) appears as an image in the original document.)
 In the present embodiment, Equation (2) separates the weight value range S into the channel-common range SALL and the per-channel range SCH, so the channel-common bias range SALLBIAS can be expressed as in Equation (6).
 (Equation (6) appears as an image in the original document.)
 The per-channel bias range SCHBIAS can be expressed as in Equation (7).
 (Equation (7) appears as an image in the original document.)
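Equations (4) through (7) appear only as images in the source. From the surrounding prose, plausible reconstructions (the symbol names follow the text; the exact forms are assumptions) are:

```latex
% (4) Output of channel c in a convolution or fully connected layer:
y_c = \sum_i w_{c,i}\,x_i + b_c

% (5) The bias scale matches the product of the weight and input scales:
S_{\mathrm{BIAS}} = S_{\mathrm{WEIGHT}} \cdot S_{\mathrm{IN}}

% (6) With S_{\mathrm{WEIGHT}} = S_{\mathrm{ALL}} \cdot S_{\mathrm{CH}} from
%     Equation (2), the channel-common part of the bias range:
S_{\mathrm{ALLBIAS}} = S_{\mathrm{ALL}} \cdot S_{\mathrm{IN}}

% (7) The per-channel part of the bias range:
S_{\mathrm{CHBIAS}} = S_{\mathrm{CH}}
```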
 As explained above, in the present embodiment, for each of the output (input), weight, and bias parameters, the optimal value range in each layer can be determined based on the basic concepts above, without being limited to the processing from step S005 to step S013 in FIG. 5A.
10 acquisition unit
20 separation unit
30 value range setting unit
40 weight updating unit
50 inference unit
60 loss calculation unit
70 value range determination unit
80 channel-by-channel value range determination unit
90 training loss calculation unit
100 information processing device
110 CPU
120 storage unit
130 ROM
140 RAM
150 input unit
160 display unit
170 communication unit

Claims (13)

  1.  A quantization program for causing a computer to execute:
      a procedure of separating, in a quantization target layer of a deep learning model, a value range in quantization of parameters of the deep learning model into a value range common to channels and a value range different for each channel; and
      a procedure of determining the separated value range common to the channels and the value ranges different for each channel by respectively different methods.
  2.  The quantization program according to claim 1, further causing the computer to execute:
      a procedure of acquiring input data and correct answer data corresponding to the input data;
      a procedure of quantizing the parameters while sequentially changing the value range common to the channels; and
      a procedure of calculating a loss of an estimation result of the deep learning model based on the input data with respect to the correct answer data, and determining the value range at which the loss is minimized as the value range common to the channels.
  3.  The quantization program according to claim 1, further causing the computer to execute:
      a procedure of acquiring input data and correct answer data corresponding to the input data;
      a procedure of quantizing the parameters while sequentially changing the value range common to the channels, and training the deep learning model using training teacher data; and
      a procedure of calculating a loss of an estimation result based on the input data by the deep learning model after training with respect to the correct answer data, and determining the value range at which the loss is minimized as the value range common to the channels.
  4.  The quantization program according to claim 2 or 3, further causing the computer to execute:
      a procedure of repeating the determination of the value range common to the channels layer by layer, from the layer immediately below the quantization target layer to the lowest layer of the deep learning model.
  5.  The quantization program according to any one of claims 1 to 4, wherein the initial value of the quantization target layer is the initial value of the uppermost layer.
  6.  The quantization program according to any one of claims 1 to 4, wherein the initial value of the quantization target layer is the initial value of a layer different from the uppermost layer.
  7.  The quantization program according to any one of claims 1 to 6, wherein the value range different for each channel is calculated using any one of the weights, the bias, and the per-channel output distribution in the quantization target layer.
  8.  The quantization program according to any one of claims 1 to 7, wherein the value range different for each channel is a ratio of the value ranges of the channels.
  9.  The quantization program according to claim 8, wherein, when the value range different for each channel is calculated from the weights or the bias, the value range different for each channel is determined using the per-channel ratio of the parameter distribution.
  10.  The quantization program according to any one of claims 2 to 4, wherein, when the value range different for each channel is that of the output of the quantization target layer, the program further includes:
      a procedure of acquiring the input data; and
      a procedure of determining the value range different for each channel using the per-channel ratio of the output distribution at the time of inference on the input data.
  11.  The quantization program according to any one of claims 1 to 10, wherein the value range different for each channel is represented by a bit shift.
  12.  An information processing device comprising:
      a separation unit that separates, in a quantization target layer of a deep learning model, a value range in quantization of parameters of the deep learning model into a value range common to channels and a value range different for each channel; and
      a value range determination unit that determines the separated value range common to the channels and the value ranges different for each channel by respectively different methods.
  13.  A value range determination method comprising:
      a step of separating, in a quantization target layer of a deep learning model, a value range in quantization of parameters of the deep learning model into a value range common to channels and a value range different for each channel; and
      a step of determining the separated value range common to the channels and the value ranges different for each channel by respectively different methods.
PCT/JP2022/011040 2021-05-10 2022-03-11 Quantization program, information processing device, and range determination method WO2022239448A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2023520860A JPWO2022239448A1 (en) 2021-05-10 2022-03-11

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-079555 2021-05-10
JP2021079555 2021-05-10

Publications (1)

Publication Number Publication Date
WO2022239448A1

Family

ID=84028153

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/011040 WO2022239448A1 (en) 2021-05-10 2022-03-11 Quantization program, information processing device, and range determination method

Country Status (2)

Country Link
JP (1) JPWO2022239448A1 (en)
WO (1) WO2022239448A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019032833A (en) * 2017-08-04 2019-02-28 三星電子株式会社Samsung Electronics Co.,Ltd. Method and apparatus for fixed-point quantized neural network
US20200394522A1 (en) * 2019-06-12 2020-12-17 Shanghai Cambricon Information Technology Co., Ltd Neural Network Quantization Parameter Determination Method and Related Products


Also Published As

Publication number Publication date
JPWO2022239448A1 (en) 2022-11-17


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22807155

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2023520860

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22807155

Country of ref document: EP

Kind code of ref document: A1