WO2022239448A1 - Quantization program, information processing device, and range determination method - Google Patents

Quantization program, information processing device, and range determination method

Info

Publication number
WO2022239448A1
Authority
WO
WIPO (PCT)
Prior art keywords
channel
value range
quantization
channels
range
Prior art date
Application number
PCT/JP2022/011040
Other languages
French (fr)
Japanese (ja)
Inventor
Yuichi Ozaki
Original Assignee
Konica Minolta, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Konica Minolta, Inc.
Priority to JP2023520860A (national-phase publication JPWO2022239448A1)
Publication of WO2022239448A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present invention relates to a quantization program, an information processing device, and a range determination method.
  • Quantized deep learning models that can run on high-speed, small-scale circuits are attracting attention.
  • For a quantized deep learning model, it is necessary to build a model that achieves both high accuracy and high-speed operation through quantization, and setting the optimal value range at quantization time is important.
  • The parameters (weights, biases) of each layer of a quantized deep learning model and the output of each layer have multiple channels, and depending on the model the scale of the values may differ greatly between channels.
  • When the value range is common to all channels and the per-channel scales differ greatly, the error due to quantization increases and accuracy may decrease.
  • In Non-Patent Document 1, learning using pseudo-quantization is used to maintain accuracy, but a value range for each channel is desirable.
  • In another approach, each channel has its own value range, but since the range is determined from the distribution of the parameters and outputs, it may not be the optimal range.
  • A search, through learning, for the value range that minimizes the loss against the correct data is also conceivable, but in many quantized deep learning models the number of channels in each layer is large compared with the number of layers, so executing the search per channel requires a very long learning time.
  • The number of learning runs for a value range common to all channels is the number of layers to be quantized multiplied by the number of range patterns per layer.
  • The number of learning runs for a per-channel value range is the number of layers to be quantized multiplied by the number of range patterns per layer and by the number of channels in each layer.
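The two counts above can be made concrete with a small worked example; all of the numbers below are hypothetical, since the extract gives no concrete layer or channel counts.

```python
# Hypothetical counts (not from the patent) illustrating the search cost.
num_layers = 6           # layers to be quantized
patterns_per_layer = 5   # range candidates per layer, e.g. x0.25 .. x4
channels_per_layer = 64  # channels in each quantized layer

# Channel-common range: layers x patterns.
runs_common = num_layers * patterns_per_layer
# Per-channel range: layers x patterns x channels.
runs_per_channel = num_layers * patterns_per_layer * channels_per_layer

print(runs_common, runs_per_channel)  # 30 vs. 1920 learning runs
```

Even with these modest counts, the per-channel search costs 64 times as many learning runs, which is the gap the invention aims to close.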
  • The present invention therefore addresses the problem of providing a quantization program, an information processing apparatus, and a value range determination method that both suppress degradation in the accuracy of a quantized deep learning model and shorten the learning time required for quantization.
  • The initial value of the quantization target layer may be the initial value of a layer different from the top layer.
  • The different value range for each channel is calculated using any one of the weight, bias, and output distributions for each channel in the quantization target layer (the quantization program according to any one of (1) to (6)).
  • The different value range for each channel is a ratio of the value ranges of the channels (the quantization program according to any one of (1) to (7)).
  • An information processing device comprising:
  • (13) A value range determination method including: separating, in the quantization target layer of a deep learning model, the value range in quantization of the parameters of the deep learning model into a value range common to the channels and a different value range for each channel; and determining the separated channel-common value range and the per-channel value ranges using different methods.
  • With the quantization program, the information processing device, and the value range determination method, it is possible to both suppress accuracy degradation of the quantized deep learning model and shorten the learning time required for quantization.
  • According to the present invention, determining the value range using the values of each layer before quantization prevents the performance of the pre-quantization model from being changed greatly by learning.
  • FIG. 3 is a functional block diagram showing functions of a CPU of the information processing device according to the embodiment;
  • FIG. 4 is an explanatory diagram showing an example of a deep learning model executed by an inference unit;
  • FIG. 4 is an explanatory diagram showing the number of parameters each layer of a deep learning model has;
  • FIG. 2 is a flowchart showing a process in which the information processing apparatus of the present embodiment executes range search of a deep learning model (No. 1);
  • FIG. 2 is a flowchart showing a process of executing a range search of a deep learning model by the information processing apparatus of the present embodiment (No. 2).
  • FIG. 1 is an explanatory diagram illustrating a main configuration example of an information processing apparatus 100 according to this embodiment.
  • the information processing device 100 determines the value range of the quantized deep learning model by executing the control program.
  • the same components are denoted by the same reference numerals, and the description thereof is omitted as appropriate.
  • The information processing apparatus 100 includes a CPU (Central Processing Unit) 110, a storage unit 120, a ROM (Read Only Memory) 130, a RAM (Random Access Memory) 140, an input unit 150, a display unit 160, and a communication unit 170.
  • The CPU 110 implements each process (function) shown in FIG. 2 by executing the control program stored in the storage unit 120 or the ROM 130. Each process implemented by the CPU 110 will be described later with reference to FIG. 2.
  • the storage unit 120 is configured by a large-capacity storage device, and includes, for example, a hard disk drive, a non-volatile memory, and the like. Storage unit 120 stores a control program.
  • the RAM 140 functions as a work area that temporarily stores various programs read from the ROM 130 and executable by the CPU 110, input or output data, parameters, etc. in various processes executed and controlled by the CPU 110.
  • the input unit 150 comprises a keyboard with cursor keys, number input keys, various function keys, etc., and a pointing device such as a mouse.
  • the input unit 150 outputs to the CPU 110 as an input signal a key press signal pressed on the keyboard or a mouse operation signal.
  • the CPU 110 executes various processes based on operation signals from the input unit 150 .
  • the display unit 160 includes a monitor such as a CRT (Cathode Ray Tube) or an LCD (Liquid Crystal Display).
  • the display unit 160 displays various screens according to instructions of display signals input from the CPU 110 .
  • the display unit 160 also has a function as the input unit 150 .
  • The communication unit 170 has a communication interface and communicates with external devices on the network.
  • FIG. 2 is a functional block diagram showing functions of the CPU 110 of the information processing apparatus 100 according to this embodiment. As shown in FIG. 2, by executing the control program, the CPU 110 implements the acquisition unit 10, the separation unit 20, the value range setting unit 30, the weight update unit 40, the inference unit 50, the loss calculation unit 60, the value range determination unit 70, the channel-by-channel value range determination unit 80, and the learning loss calculation unit 90.
  • the acquisition unit 10 acquires an input image and correct data corresponding to the input image.
  • the acquisition unit 10 acquires this input image and teacher data indicating correct data as a set.
  • Each acquired image is 8-bit grayscale (256 levels) with a size of 28 (width) × 28 (height) pixels.
  • the separation unit 20 separates the value range in quantization of the parameters of the deep learning model DLM into a value range common to channels and a value range different for each channel. Weights and biases included in the convolution layer and the full-connect layer, and output values from each layer are quantized using quantization parameters S and Z according to the following equation (1).
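Equation (1) itself is not reproduced in this extract. As an illustrative sketch only, the standard affine quantization form with a scale S and zero-point Z, which the text's parameters suggest, looks like this:

```python
import numpy as np

def quantize(x, S, Z, n_bits=8):
    """Affine quantization q = round(x / S) + Z, clipped to the n-bit range.

    The exact form of the patent's equation (1) is an assumption here; this
    is the common textbook formulation using a scale S and zero-point Z.
    """
    qmin, qmax = 0, 2 ** n_bits - 1
    q = np.round(x / S) + Z
    return np.clip(q, qmin, qmax).astype(np.int64)

def dequantize(q, S, Z):
    """Map quantized integers back to approximate real values."""
    return S * (q - Z)
```

For example, with S = 0.01 and Z = 128, the value 0.5 maps to 178 and dequantizes back to 0.5.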
  • Based on equation (2), the separating unit 20 separates the value range into the value range S_ALL common to all channels and the value range S_CH that differs for each channel.
  • The value range ratio S_CH of each channel is determined at a stage before the search by learning. The value range S_ALL common to all channels is then determined by computing the loss for multiple candidate ranges and searching for the range with the smallest loss.
  • the value range setting unit 30 includes a channel-by-channel value range setting unit 31 and a channel common value range setting unit 32 .
  • the value range setting unit 30 determines the value range common to the channels separated by the separation unit 20 and the value range different for each channel using different methods. For example, the channel-by-channel value range setting unit 31 sets a different value range for each channel, and the channel common value range setting unit 32 sets a common value range for each channel.
  • The channel-by-channel value range determination unit 80 determines the per-channel value range uniquely, without performing a search, using the ratio of the value ranges of the channels. For example, it calculates a different value range for each channel using any one of the weight, bias, and output distributions for each channel in the quantization target layer.
  • The channel-by-channel value range determination unit 80 determines the weight and bias ranges using the per-channel maximum and minimum of the pre-quantization parameter distribution, acquired in advance. It may also perform inference on the input data and determine the per-channel output range using each channel's histogram at that time. In this case, it may determine the initial value of the value range common to all channels by a similar method.
  • the channel-by-channel value range determination unit 80 may determine different value ranges for each channel based on the ratio of the value ranges for each channel. In this case, when a different value range is calculated for each channel by weighting or biasing, the channel-by-channel value range determining unit 80 determines a different value range for each channel using the ratio of the parameter distribution for each channel.
  • Alternatively, the channel-by-channel value range determination unit 80 may acquire input data and determine a different value range for each channel using the ratio of each channel's output distribution during inference on that data.
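The distribution-based determination above can be sketched as follows. The choice of absolute maximum as the per-channel statistic, and normalising by the largest channel, are assumptions; the text mentions only maxima, minima, and histograms.

```python
import numpy as np

def per_channel_ratio(W):
    """Compute per-channel range ratios S_CH from a weight tensor.

    W has shape (out_channels, in_channels, kh, kw). Each output channel's
    range is taken as its absolute maximum, then normalised by the largest
    channel so the ratios can be fixed before the common-range search.
    """
    ch_range = np.abs(W).reshape(W.shape[0], -1).max(axis=1)
    return ch_range / ch_range.max()
```

Fixing these ratios up front is what removes the per-channel dimension from the later loss-driven search.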
  • the channel-by-channel value range setting unit 31 sets a different value range for each channel, which is determined by the channel-by-channel value range determination unit 80 .
  • the different value ranges for each channel may be represented by bit shift.
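The text notes the per-channel ranges "may be represented by bit shift"; one plausible reading, sketched below under that assumption, is to round each ratio to a power of two so the per-channel scaling becomes a right shift in fixed-point hardware:

```python
import numpy as np

def to_bit_shifts(ratios):
    """Round per-channel ratios (each <= 1) to powers of two.

    Returns the shift amounts and the power-of-two ratios they encode;
    scaling by 2**-k is a k-bit right shift in integer arithmetic.
    """
    safe = np.maximum(ratios, 2.0 ** -15)        # guard against log2(0)
    shifts = np.round(-np.log2(safe)).astype(int)
    return shifts, 2.0 ** (-shifts)
```

A ratio of 0.3, for instance, would be approximated by a 2-bit shift (0.25), trading a little precision for shift-only hardware.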
  • the learning loss calculation unit 90 learns the deep learning model using the learning teacher data.
  • The learning loss calculator 90 acquires, for example, a training image data set of 60,000 images and a test image data set of 10,000 images via the acquisition unit 10.
  • the channel-common value range setting unit 32 temporarily sets the channel-common value range.
  • the weight updating unit 40 sequentially changes the value range common to the channels to quantize the parameters.
  • the weight updating unit 40 repeats the determination of the channel-common value range layer by layer from the layer immediately below the quantization target layer to the lowest layer of the deep learning model.
  • the inference unit 50 is composed of a deep learning model, which is a neural network.
  • The inference unit 50 performs quantization and inference by converting model parameters expressed in 32-bit floating point into values expressible in 8-bit fixed point within the value range set by the value range setting unit 30.
  • the value range that differs for each channel is the value range set by the channel-by-channel value range setting unit 31
  • the channel-common value range is the value range provisionally set by the channel-common value range setting unit 32 .
  • FIG. 3 is an explanatory diagram showing an example of the deep learning model DLM executed by the inference unit 50.
  • This deep learning model DLM takes handwritten digits from "0" to "9" in an 8-bit grayscale field of width 28 × height 28 as input data and outputs the recognized digit.
  • the deep learning model DLM is configured with convolution CNV1, max pooling M1, convolution CNV2, max pooling M2, and full connect FLC1 and FLC2.
  • Convolution means a convolution layer
  • "Full connect" means a fully connected layer.
  • the input image PT1 is input to the convolution CNV1.
  • The input image PT1 is an 8-bit grayscale image of width 28 × height 28.
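The FIG. 3 pipeline can be sketched in PyTorch using the layer sizes reported in FIG. 4. The `padding=1` settings and the ReLU activations are assumptions not stated in the extract; padding is chosen so the reported 32 × 28 × 28 and 64 × 7 × 7 output sizes work out.

```python
import torch
import torch.nn as nn

# Sketch of the DLM in FIG. 3; padding and ReLU placement are assumptions.
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),   # Convolution CNV1 -> 32x28x28
    nn.ReLU(),
    nn.MaxPool2d(2),                              # Max pooling M1   -> 32x14x14
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # Convolution CNV2 -> 64x14x14
    nn.ReLU(),
    nn.MaxPool2d(2),                              # Max pooling M2   -> 64x7x7
    nn.Flatten(),                                 # 64 * 7 * 7 = 3136
    nn.Linear(3136, 128),                         # Full connect FLC1
    nn.ReLU(),
    nn.Linear(128, 10),                           # Full connect FLC2 -> 10 digits
)

out = model(torch.zeros(1, 1, 28, 28))            # one 28x28 grayscale image
```

The first convolution's weight tensor has shape 32 × 1 × 3 × 3, matching the parameter counts listed for that layer in FIG. 4.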
  • FIG. 4 is an explanatory diagram showing the number of parameters each layer of the deep learning model DLM has.
  • Each layer of the deep learning model DLM in FIG. 3 will be referred to and explained as appropriate.
  • this explanatory diagram includes columns for layer name, input channel, output channel, kernel size, weight, bias, weight range, and output range.
  • Only the weights have a per-channel value range; the output value range is common to all channels.
  • the input image PT1 is input to the "input” layer, and the number of parameters of the output channel is "1".
  • a value range for quantizing the input image PT1 (grayscale) is set.
  • the number of input channel parameters is “1”
  • the number of output channel parameters is “32”
  • the number of kernel size parameters is “3 × 3”
  • the number of weighting parameters is “32 × 1 × 3 × 3”
  • the number of parameters for the bias is “32”
  • the number of parameters for the weighting region is “32”
  • the number of parameters for the output range is “1”.
  • The output size of this layer is 32 × 28 × 28.
  • the number of input channel parameters is “32”
  • the number of output channel parameters is “64”
  • the number of kernel size parameters is “3 × 3”
  • the number of weighting parameters is “64 × 32 × 3 × 3”
  • the number of bias parameters is “64”
  • the number of weighting area parameters is “64”
  • the number of output range parameters is “1”.
  • The 'Convolution CNV2' layer extracts 64 feature maps. The output size of this hierarchy is 64 × 7 × 7.
  • the number of input channel parameters is “3136”
  • the number of output channel parameters is “128”
  • the number of weighting parameters is “128 × 3136”
  • the number of bias parameters is “128”
  • the number of parameters in the weighting area is “128”
  • the number of parameters in the output range is “1”.
  • the number of input channel parameters is “128”
  • the number of output channel parameters is “10”
  • the number of weighting parameters is “10 × 128”
  • the number of bias parameters is “10”
  • the number of parameters in the weighting area is “10”
  • the number of parameters in the output range is “1”.
  • The "Full Connect FLC2" layer is the last layer, so its output dimension matches the number of classes, 10.
  • The loss calculation unit 60 calculates the loss of the estimation result produced by the deep learning model DLM from the input data, against the correct data. For example, the loss calculator 60 calculates the loss for the correct data based on equation (3).
  • the loss calculation unit 60 may calculate the loss for the correct data of the estimation result based on the input data by the deep learning model DLM after learning.
  • the value range determination unit 70 determines the value range when the loss is minimized as the value range common to all channels.
  • FIGS. 5A and 5B are flowcharts showing the process of executing the range search of the deep learning model DLM by the information processing apparatus 100 of this embodiment.
  • the information processing apparatus 100 receives a user's operation through the input unit 150, for example, receives input of the input image PT1 and correct data corresponding to the input image PT1 (step S001). Thereby, the acquiring unit 10 of the CPU 110 acquires the input image PT1 and the correct answer data corresponding to the input image PT1.
  • the separation unit 20 of the CPU 110 separates the value range in the quantization of the parameters of the deep learning model DLM into a value range common to channels and a value range different for each channel in the quantization target layer of the deep learning model DLM.
  • the separation unit 20 separates the value range in quantization into a value range common to channels and a value range different for each channel, for example, according to the above equation (2).
  • the channel-by-channel value range setting unit 31 of the value range setting unit 30 of the CPU 110 sets a different value range for each channel (step S003).
  • the channel-by-channel value range setting unit 31 uniquely determines and sets the value range ratio for each channel by the channel-by-channel value range determination unit 80 without searching.
  • the channel-by-channel value range determination unit 80 determines weighting and bias using the pre-quantized parameter distribution for each channel obtained in advance.
  • the distribution of parameters for each channel before quantization means, for example, maximum and minimum values.
  • the channel-by-channel value range determination unit 80 performs inference on the input data and determines the output of each channel using the output distribution of each channel at that time. Note that the output distribution for each channel is, for example, a histogram.
  • the channel-common value range setting unit 32 of the value range setting unit 30 of the CPU 110 provisionally sets the channel-common value range (step S005).
  • the channel-by-channel value range determination unit 80 determines the initial value of the value range common to all channels by a similar method.
  • The weight updating unit 40 sets range candidates, including the initial value, based on the initial value of the range common to all channels (step S007). For example, it sets five candidates: 0.25, 0.5, 1, 2, and 4 times the initial value. That is, the weight updating unit 40 sequentially changes the value range common to the channels to quantize the parameters.
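The candidate loop of steps S005 through S015 can be sketched as a small grid search; `evaluate_loss`, which stands in for quantize-then-infer-then-compute-loss, is a hypothetical placeholder supplied by the caller:

```python
import numpy as np

def search_common_range(S_init, evaluate_loss,
                        multipliers=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """Search the channel-common range S_ALL over constant multiples of an
    initial value and keep the candidate with the smallest loss."""
    candidates = [S_init * m for m in multipliers]
    losses = [evaluate_loss(S) for S in candidates]
    return candidates[int(np.argmin(losses))]
```

For instance, with a toy loss minimised at S = 2.0, the search over {0.25, 0.5, 1, 2, 4} returns 2.0. Because the per-channel ratios are fixed beforehand, only this one-dimensional search is repeated per layer.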
  • The inference unit 50 performs quantization and inference by converting model parameters expressed in 32-bit floating point into values expressible in 8-bit fixed point within the value range set by the value range setting unit 30 (step S009).
  • the loss calculation unit 60 calculates the loss for the correct data in the result of estimation by the deep learning model DLM based on the input data (step S011).
  • the loss calculator 60 may calculate the loss according to the above equation (3), or may use the mean square of the difference from the correct data.
  • the loss calculation unit 60 can also calculate the loss for the correct data of the estimation result based on the input data by the deep learning model DLM after learning.
  • the loss calculation unit 60 determines whether all the value ranges have been searched (step S013), and if all the value ranges have not been searched (No in step S013), the process returns to step S005.
  • When all value ranges have been searched (Yes in step S013), the value range determination unit 70 determines the value range that minimizes the loss when inferring the input image PT1 as the value range common to all channels (step S015), and proceeds to step S017.
  • the channel-common value range setting unit 32 provisionally sets the channel-common value range (step S005), and repeats the processing up to step S011.
  • In step S017, the value range determination unit 70 determines whether the value ranges of all parameters (weight, bias, and output) have been searched. If it determines that not all parameters have been searched (No in step S017), it returns to step S005.
  • The search may be performed on the channel-common value range and the per-channel value ranges separately, or on only one of them.
  • In step S019, the value range determination unit 70 determines whether all layers have been searched.
  • If all layers have been searched, the value range determination unit 70 ends the range search of the deep learning model DLM.
  • If not all layers have been searched (No in step S019), the value range determination unit 70 returns to step S005, and the CPU 110 repeats similar processing.
  • the weight updating unit 40 repeats determination of a channel-common value range layer by layer from the layer immediately below the quantization target layer to the lowest layer of the deep learning model DLM.
  • the initial value of the quantization target layer can be the initial value of the uppermost layer.
  • the initial value of the quantization target layer may be the initial value of a layer different from the top layer.
  • The CPU 110 of the information processing apparatus 100 determines the value range for each layer in order from the upper layers, which prevents the pre-quantization parameters and characteristics from varying greatly.
  • the CPU 110 of the information processing apparatus 100 repeats the search until all layers (from the top layer to the bottom layer) are searched (step S019).
  • The process of executing the range search of the deep learning model DLM then ends.
  • the CPU 110 of the information processing apparatus 100 separates the value range in the quantization of the parameters of the deep learning model DLM into a channel-common value range and a different value range for each channel. Then, the CPU 110 of the information processing apparatus 100 determines the value range common to the channels and the value range different for each channel using different methods.
  • By setting a value range for each channel, the CPU 110 of the information processing apparatus 100 suppresses performance degradation due to quantization, while the time required for learning can be reduced to about the same as when only a value range common to all channels is used.
  • the CPU 110 of the information processing apparatus 100 can achieve both suppression of accuracy deterioration and shortening of the learning time required for quantization.
  • The CPU 110 of the information processing apparatus 100 determines the value range ratio S_CH of each channel using the values of each layer before quantization. This can prevent the performance of the deep learning model DLM before quantization from changing significantly due to learning.
  • The processing in FIGS. 5A and 5B is supplemented below as appropriate for the invention according to this embodiment.
  • In step S001 in FIG. 5A, the acquisition unit 10 of the CPU 110 receives the input image PT1 and the correct data corresponding to it. Learned parameters (weights and biases) can also be loaded here.
  • In step S003 in FIG. 5A, the channel-by-channel value range setting unit 31 of the value range setting unit 30 of the CPU 110 sets a different value range for each channel. Inference is performed to determine the range, checking the output distribution of each layer.
  • The per-channel weight range S_CH (32 values in total) is calculated from the 1 × 3 × 3 weight distribution associated with each of the 32 output channels.
  • The value range S_CH, which differs for each channel, may be represented by a bit shift.
  • step S005 in FIG. 5A the channel-common value range setting unit 32 of the value range setting unit 30 of the CPU 110 provisionally sets a value range common to all channels.
  • S_ALL is assumed to be the initial value for the search.
  • the inference unit 50 of the CPU 110 can perform quantization using the ratio to the maximum value as the value range for each channel (total of 32 values).
  • The inference unit 50 of the CPU 110 may also calculate the value range S from the distribution of the 32 × 28 × 28 output of convolution CNV1 and use it as the initial value for the search.
  • In step S019 in FIG. 5B, it is determined whether or not all layers have been searched, and the value range is determined in order from the upper layers to the lower layers (step S021).
  • a value range can be determined for each layer, and an optimal value range for each layer can be determined.
  • The optimal value range is determined in each layer for each of the output (input), weight, and bias parameters, but the processing is not limited to steps S005 through S013 in FIG. 5A.
  • basic concepts for determining the range of output (input), weighting, and bias will be described below. Note that the processing of the CPU 110 based on the following concept may be executed in any step.
  • the CPU 110 makes an inference in advance in the inference unit 50 , sets the obtained value range S as an initial value at the time of searching, and sets a range candidate including the initial value by the value range setting unit 30 .
  • The weight updating unit 40 sets five candidates: 0.25, 0.5, 1, 2, and 4 times the initial value. This corresponds to the channel-common value range provisional setting process described in step S005 of FIG. 5A.
  • the inference unit 50 of the CPU 110 performs inference in each value range for the set range candidates, and the loss calculation unit 60 calculates a loss. This corresponds to the inference processing described in step S009 of FIG. 5A and the loss calculation processing described in step S011.
  • the loss calculator 60 of the CPU 110 repeats loss calculation for the number of candidates. This corresponds to the repeated loop from steps S005 to S013 of FIG. 5A.
  • the value range determination unit 70 determines the value range with the lowest loss as the value range for the output (input). This corresponds to the channel common range determination process described in step S015 of FIG. 5B.
  • The weight and bias of the quantization target layer are made trainable, and the value range set by the channel-common value range setting unit 32 is updated through learning by the learning loss calculation unit 90.
  • the CPU 110 causes the inference unit 50 to make an inference in advance, sets the obtained value range S as an initial value at the time of searching, and sets a range candidate including the initial value by the value range setting unit 30 .
  • The weight updating unit 40 sets five candidates: 0.25, 0.5, 1, 2, and 4 times the initial value. This corresponds to the channel-common value range provisional setting process described in step S005 of FIG. 5A.
  • The learning loss calculation unit 90 of the CPU 110 executes learning in each value range for the range candidates defined in (1) of (B), and while the weight and bias are updated, the loss calculation unit 60 calculates the loss. This corresponds to the weight update process described in step S007 in FIG. 5A, the inference process described in step S009, and the loss calculation process described in step S011.
  • the CPU 110 causes the loss calculation unit 60 to repeat calculation of the loss by the number of range candidates. This corresponds to the repeated loop from steps S005 to S013 of FIG. 5A.
  • CPU 110 determines, in value range determination section 70, the value range with the lowest loss in loss calculation section 60 as the value range for the weighting. This corresponds to the channel common range determination process described in step S015 of FIG. 5B.
  • the weighting and bias updated by the learning of the loss calculation unit 60 are recorded, and the state before learning is restored before searching.
  • the CPU 110 can use the values when the weights and biases were also learned in that range.
  • Method of determining the bias range of the convolution layer and the fully connected layer: the relationship between the weight and the bias in a convolution layer or fully connected layer can be expressed as shown in equation (4).
  • The bias scale can be regarded as equivalent to the product of the weight and the input in the quantization target layer, so the bias range S_BIAS can be calculated as in equation (5).
  • Since the weight range S can be separated into the channel-common range S_ALL and the per-channel range S_CH by equation (2), the channel-common bias range S_ALLBIAS can be expressed as in equation (6).
  • The per-channel bias value range S_CHBIAS can be expressed as in equation (7).
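Equations (4) through (7) are not reproduced in this extract. The reconstruction below follows the standard fixed-point convention that a bias shares the scale of the weight-input product, and should be read as an assumption rather than the patent's exact formulas:

```latex
y = W x + b                          % (4) convolution / fully connected layer
S_{BIAS}    = S_{IN} \cdot S         % (5) the bias scale matches the scale of W x
S_{ALLBIAS} = S_{IN} \cdot S_{ALL}   % (6) channel-common part, using S = S_{ALL} \cdot S_{CH}
S_{CHBIAS}  = S_{CH}                 % (7) per-channel part of the bias range
```

Under this reading, the per-channel part of the bias range reduces to the same ratio S_CH already fixed for the weights, which is why no separate per-channel bias search is needed.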
  • For the output (input), weight, and bias parameters, the processing is not limited to steps S005 through S013 in FIG. 5A; an optimal range can be determined for each.


Abstract

A quantization program executed by a CPU (110) includes: a procedure for separating, in a quantization target layer of a deep learning model (DLM), the value range in quantization of the parameters of the deep learning model (DLM) into a channel-common value range and value ranges that differ for each channel; and a procedure for determining the separated channel-common value range and the per-channel value ranges using respectively different methods. The quantization program causes the CPU (110) to execute these procedures.

Description

[Supplement under Rule 26, 05.04.2022] Quantization program, information processing device, and range determination method
The present invention relates to a quantization program, an information processing device, and a range determination method.
In recent years, AI (Artificial Intelligence) technology has advanced, and deep learning techniques have spread remarkably. Against this background, edge computing technology, which aims to introduce AI functions into a wide range of familiar devices at low cost, is attracting attention.
Because edge computing requires low cost and low power consumption, quantized deep learning models, which can be computed by fast, small-scale circuits, are attracting attention. A quantized deep learning model must achieve both high accuracy and high-speed operation through quantization, so setting an optimal value range at quantization time is important.
JP 2019-32833 A
JP 2020-149311 A
The parameters (weights, biases) of each layer of a quantized deep learning model, and the output of each layer, have multiple channels, and depending on the model the scale of the values may differ greatly between channels. On the other hand, when the value range is common to all channels and the per-channel scales differ greatly, the quantization error grows and accuracy may decrease.
Non-Patent Document 1 addresses this by training with pseudo-quantization to maintain accuracy, but it is desirable to have a value range for each channel.
Patent Documents 1 and 2 provide a value range for each channel, but because they determine the range from the distribution of parameters or outputs, the result may not be the optimal range.
One conceivable way to determine the optimal range is to use training to search for the range that minimizes the loss against the correct data. However, in many quantized deep learning models the number of channels in each layer is large compared to the number of layers, so running the search per channel requires a very long training time.
Specifically, the number of training runs when the range is common to all channels is the number of quantization target layers multiplied by the number of value-range patterns per layer. The number of training runs for per-channel ranges is the number of quantization target layers multiplied by the value-range patterns per layer and by the number of channels in each layer.
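The two counts can be checked with a small calculation; the layer and channel numbers below are hypothetical, chosen only to show the gap in scale.

```python
# Hypothetical model: 4 quantization target layers, 5 range patterns per layer.
layers = 4
patterns = 5
channels_per_layer = [32, 64, 128, 10]  # channel counts of the 4 layers

# Channel-common search: one range searched per layer
runs_common = layers * patterns

# Per-channel search: every channel of every layer gets its own search
runs_per_channel = patterns * sum(channels_per_layer)
```

With these numbers the common search needs 20 training runs while the per-channel search needs 1170, which is the cost the invention avoids by fixing the per-channel ratios without a search.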
Thus, with conventional quantized deep learning models, searching for the optimal per-channel value range requires a large amount of training time.
In view of this problem, an object of the present invention is to provide a quantization program, an information processing device, and a range determination method that both suppress accuracy degradation of a quantized deep learning model and shorten the training time required for quantization.
That is, the above problem of the present invention is solved by the following configurations.
(1) A quantization program for causing a computer to execute:
 a procedure for separating, in a quantization target layer of a deep learning model, the value range used for quantizing the parameters of the deep learning model into a channel-common value range and value ranges that differ for each channel; and
 a procedure for determining the separated channel-common value range and the per-channel value ranges by respectively different methods.
(2) The quantization program according to (1), further causing the computer to execute:
 a procedure for acquiring input data and correct data corresponding to the input data;
 a procedure for quantizing the parameters while sequentially changing the channel-common value range; and
 a procedure for calculating the loss of the estimation result of the deep learning model based on the input data against the correct data, and determining the value range that minimizes the loss as the channel-common value range.
(3) The quantization program according to (1), further causing the computer to execute:
 a procedure for acquiring input data and correct data corresponding to the input data;
 a procedure for quantizing the parameters while sequentially changing the channel-common value range, and training the deep learning model using training teacher data; and
 a procedure for calculating the loss of the estimation result of the trained deep learning model based on the input data against the correct data, and determining the value range that minimizes the loss as the channel-common value range.
(4) The quantization program according to (2) or (3), further causing the computer to execute a procedure for repeating, layer by layer, the determination of the channel-common value range from the layer immediately below the quantization target layer to the lowest layer of the deep learning model.
(5) The quantization program according to any one of (1) to (4), wherein the initial value of the quantization target layer is the initial value of the uppermost layer.
(6) The quantization program according to any one of (1) to (4), wherein the initial value of the quantization target layer is the initial value of a layer different from the uppermost layer.
(7) The quantization program according to any one of (1) to (6), wherein the per-channel value ranges are calculated using any of the weights, the bias, and the per-channel output distribution in the quantization target layer.
(8) The quantization program according to any one of (1) to (7), wherein the per-channel value ranges are ratios of the value ranges of the respective channels.
(9) The quantization program according to (8), wherein, when the per-channel value ranges are calculated from the weights or the bias, the per-channel value ranges are determined using the per-channel ratio of the parameter distribution.
(10) The quantization program according to any one of (2) to (4), further comprising, when the per-channel value ranges concern the output of the quantization target layer:
 a procedure for acquiring the input data; and
 a procedure for determining a different value range for each channel using the per-channel ratio of the output distribution at inference time for the input data.
(11) The quantization program according to any one of (1) to (10), wherein the per-channel value ranges are represented by bit shifts.
(12) An information processing device comprising:
 a separation unit that separates, in a quantization target layer of a deep learning model, the value range used for quantizing the parameters of the deep learning model into a channel-common value range and value ranges that differ for each channel; and
 a value range determination unit that determines the separated channel-common value range and the per-channel value ranges by respectively different methods.
(13) A range determination method comprising:
 a step of separating, in a quantization target layer of a deep learning model, the value range used for quantizing the parameters of the deep learning model into a channel-common value range and value ranges that differ for each channel; and
 a step of determining the separated channel-common value range and the per-channel value ranges by respectively different methods.
According to the present invention, the quantization program, information processing device, and range determination method can both suppress accuracy degradation of a quantized deep learning model and shorten the training time required for quantization. In addition, by determining the value range from the pre-quantization values of each layer, the present invention can also prevent the model's performance from deviating greatly, through training, from its pre-quantization performance.
FIG. 1 is an explanatory diagram illustrating a main configuration example of the information processing device according to this embodiment.
FIG. 2 is a functional block diagram showing the functions of the CPU of the information processing device according to this embodiment.
FIG. 3 is an explanatory diagram showing an example of the deep learning model executed by the inference unit.
FIG. 4 is an explanatory diagram showing the number of parameters in each layer of the deep learning model.
FIG. 5A is a flowchart showing the process by which the information processing device of this embodiment executes the range search of the deep learning model (part 1).
FIG. 5B is a flowchart showing the process by which the information processing device of this embodiment executes the range search of the deep learning model (part 2).
Embodiments for carrying out the present invention are described in detail below. The embodiment described below is one example for realizing the present invention and should be modified or changed as appropriate according to the configuration of the apparatus to which the present invention is applied and various conditions; the present invention is not limited to the following embodiment.
&lt;This Embodiment&gt;
[Overall Configuration of the Information Processing Device]
FIG. 1 is an explanatory diagram illustrating a main configuration example of an information processing device 100 according to this embodiment. The information processing device 100 determines the value range of a quantized deep learning model by executing a control program. In the information processing device 100 shown in FIG. 1, identical components are given identical reference numerals, and their description is omitted as appropriate.
The information processing device 100 according to this embodiment comprises a CPU (Central Processing Unit) 110, a storage unit 120, a ROM (Read Only Memory) 130, a RAM (Random Access Memory) 140, an input unit 150, a display unit 160, and a communication unit 170.
The CPU 110 implements each process (function) shown in FIG. 2 by executing a control program stored in the storage unit 120 or the ROM 130. Each process implemented by the CPU 110 is described later with reference to FIG. 2.
The storage unit 120 is a large-capacity storage device comprising, for example, a hard disk drive or non-volatile memory, and stores the control program.
The RAM 140 functions as a work area that temporarily stores programs read from the ROM 130 and executable by the CPU 110, input and output data, parameters, and the like for the various processes executed and controlled by the CPU 110.
The input unit 150 comprises a keyboard with cursor keys, numeric input keys, and various function keys, and a pointing device such as a mouse. The input unit 150 outputs key-press signals from the keyboard and operation signals from the mouse to the CPU 110 as input signals, and the CPU 110 executes various processes based on these operation signals.
The display unit 160 comprises a monitor such as a CRT (Cathode Ray Tube) or an LCD (Liquid Crystal Display) and displays various screens according to display signals input from the CPU 110. When a touch panel is adopted as the display unit 160, the display unit 160 also functions as the input unit 150.
The communication unit 170 has a communication interface and communicates with external devices on a network.
Next, the functions of the CPU 110 of the information processing device 100 according to this embodiment are described with reference to FIG. 2.
FIG. 2 is a functional block diagram showing the functions of the CPU 110 of the information processing device 100 according to this embodiment. As shown in FIG. 2, by executing the control program, the CPU 110 implements an acquisition unit 10, a separation unit 20, a value range setting unit 30, a weight update unit 40, an inference unit 50, a loss calculation unit 60, a value range determination unit 70, a per-channel value range determination unit 80, and a learning loss calculation unit 90.
The acquisition unit 10 acquires an input image and correct data corresponding to the input image, as a set with teacher data indicating the correct data. Each acquired image is in 8-bit grayscale (256 levels), 28 pixels wide by 28 pixels high.
In the quantization target layer of the deep learning model DLM, the separation unit 20 separates the value range used for quantizing the parameters of the deep learning model DLM into a channel-common value range and per-channel value ranges.
The weights and biases of the convolution layers and fully connected layers, and the output values of each layer, are quantized using quantization parameters S and Z according to the following Equation (1).
[Equation (1), rendered as an image in the source. From the surrounding text, an affine quantization of a value x with scale S and zero point Z, of the form x_q = round(x / S) + Z.]
However, to simplify the calculation, this embodiment sets Z = 0, leaving only S as the parameter requiring optimization.
Here, the separation unit 20 separates the range into a channel-common value range S ALL and a per-channel value range S CH based on Equation (2).
[Equation (2), rendered as an image in the source. From the surrounding text, the scale factors into a channel-common part and a per-channel part: S = S_ALL × S_CH.]
In this embodiment, the per-channel range ratio S CH is determined before the training-based quantization. The channel-common value range S ALL is then determined by computing the loss for multiple candidate ranges and searching for the range with the smallest loss.
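This two-stage scheme can be sketched with hypothetical data: the per-channel ratios S_CH are fixed in advance, and only the common factor S_ALL is searched. Quantization mean-squared error stands in here for the model loss used in the embodiment; the function names and candidate values are illustrative, not from the patent.

```python
import numpy as np

def quantize_dequantize(x, scale, n_bits=8):
    # Uniform quantization with zero point Z = 0, then back to float
    # so the quantization error can be measured.
    qmax = 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

def search_common_scale(weights, s_ch, candidates):
    # weights: (channels, n) array; s_ch: per-channel ratios fixed in advance.
    # Only S_ALL is searched; each channel's scale is S_ALL * S_CH (Equation (2)).
    best_s, best_err = None, None
    for s_all in candidates:
        err = sum(
            float(np.mean((weights[c] - quantize_dequantize(weights[c], s_all * s_ch[c])) ** 2))
            for c in range(weights.shape[0])
        )
        if best_err is None or err < best_err:
            best_s, best_err = s_all, err
    return best_s
```

The search cost is one loss evaluation per candidate, independent of the channel count, because the per-channel ratios never enter the search loop.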
The value range setting unit 30 comprises a per-channel value range setting unit 31 and a channel-common value range setting unit 32. The value range setting unit 30 determines the channel-common range and the per-channel ranges separated by the separation unit 20 by respectively different methods: the per-channel value range setting unit 31 sets a different range for each channel, while the channel-common value range setting unit 32 sets the channel-common range.
The per-channel value range determination unit 80 determines the per-channel range ratios uniquely, without a search. For example, it calculates a different range for each channel using any of the weights, the bias, and the per-channel output distribution in the quantization target layer.
For example, the per-channel value range determination unit 80 determines the ranges for the weights and the bias using the pre-quantization per-channel parameter distribution (maximum and minimum values) acquired in advance. For the per-channel outputs, it may run inference on input data and use the resulting per-channel histograms. In this case, it may determine the initial value of the channel-common range by the same method.
The per-channel value range determination unit 80 may also determine the per-channel ranges as ratios of the ranges of the respective channels. In this case, when the per-channel ranges are calculated from the weights or the bias, the unit determines them using the per-channel ratio of the parameter distribution.
When the per-channel ranges concern the output of the quantization target layer, the per-channel value range determination unit 80 may acquire input data and determine a different range for each channel using the per-channel ratio of the output distribution at inference time for that input data.
The per-channel value range setting unit 31 then sets the per-channel ranges determined by the per-channel value range determination unit 80. The per-channel ranges may be represented by bit shifts.
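When the per-channel ratio is constrained to a power of two, as the bit-shift representation implies, applying it to a fixed-point value needs no multiplier. A hypothetical sketch (the function name and values are illustrative):

```python
import math

def ratio_to_shift(ratio):
    """Shift amount k such that the per-channel ratio is represented as 2**k.
    Assumes the ratio was chosen (or rounded) to a power of two."""
    return round(math.log2(ratio))

q = 12                      # some fixed-point quantized value
k = ratio_to_shift(4.0)     # a per-channel ratio of 4 becomes a left shift by 2
scaled = q << k             # same result as q * 4, without a multiplication
```

This is why a bit-shift representation suits the small, fast circuits targeted by edge computing.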
The learning loss calculation unit 90 trains the deep learning model using training teacher data. Via the acquisition unit 10, it acquires, for example, an image data set of 60,000 training images and an image data set of 10,000 test images.
The channel-common value range setting unit 32 provisionally sets the channel-common value range.
The weight update unit 40 quantizes the parameters while sequentially changing the channel-common value range, and repeats the determination of the channel-common range layer by layer, from the layer immediately below the quantization target layer to the lowest layer of the deep learning model.
The inference unit 50 is composed of a deep learning model, a neural network. Using the value ranges set by the value range setting unit 30, the inference unit 50 performs quantization (inference) by converting model parameters expressed in 32-bit floating point into values expressible in 8-bit fixed point. The per-channel ranges are those set by the per-channel value range setting unit 31, and the channel-common range is the one provisionally set by the channel-common value range setting unit 32.
FIG. 3 is an explanatory diagram showing an example of the deep learning model DLM executed by the inference unit 50. This deep learning model DLM takes handwritten digits from "0" to "9", in 8-bit grayscale and 28 pixels wide by 28 pixels high, as input data and outputs the digit as the correct data.
As shown in FIG. 3, the deep learning model DLM comprises a convolution CNV1, max pooling M1, a convolution CNV2, max pooling M2, and fully connected layers FLC1 and FLC2. "Convolution" denotes a convolutional layer, and "full connect" denotes a fully connected layer. The input image PT1, an 8-bit grayscale image 28 pixels wide by 28 pixels high, is input to the convolution CNV1.
FIG. 4 is an explanatory diagram showing the number of parameters in each layer of the deep learning model DLM. The following description refers to the layers of the deep learning model DLM in FIG. 3 as appropriate.
As shown in FIG. 4, the diagram has columns for layer name, input channels, output channels, kernel size, weights, bias, weight value range, and output value range. In this embodiment, as an example, only the weights have a per-channel value range; the output range is common to all channels.
The "input" layer receives the input image PT1, and its output channel parameter count is "1". Its output range is set to the range for quantizing the input image PT1 (grayscale).
For the "convolution CNV1" layer, the input channel count is "1", the output channel count is "32", the kernel size is "3×3", the weight parameter count is "32×1×3×3", the bias parameter count is "32", the weight range parameter count is "32", and the output range parameter count is "1". As shown in FIG. 3, the output size of this layer is 32×28×28.
Next, the max pooling M1 layer applies ReLU activation, followed by a max pooling layer with kernel size 2 and stride 2. This downsamples the feature map to 32×14×14.
For the "convolution CNV2" layer, the input channel count is "32", the output channel count is "64", the kernel size is "3×3", the weight parameter count is "64×32×3×3", the bias parameter count is "64", the weight range parameter count is "64", and the output range parameter count is "1". The "convolution CNV2" layer extracts 64 feature maps; the output size of this layer is 64×14×14.
Next, the max pooling M2 layer follows up with ReLU activation and a max pooling layer with kernel size 2 and stride 2. The downsampled feature map size becomes 64×7×7.
For the "full connect FLC1" layer, the input channel count is "3136", the output channel count is "128", the weight parameter count is "128×3136", the bias parameter count is "128", the weight range parameter count is "128", and the output range parameter count is "1". The "full connect FLC1" layer has 64×7×7 = 3136 input nodes, each connected to the 128 nodes of the next layer.
For the "full connect FLC2" layer, the input channel count is "128", the output channel count is "10", the weight parameter count is "10×128", the bias parameter count is "10", the weight range parameter count is "10", and the output range parameter count is "1". Since "full connect FLC2" is the final layer, its output dimension matches the total number of classes, 10.
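The parameter counts in FIG. 4 follow directly from the layer shapes: a convolution has out_channels × in_channels × kernel × kernel weights, a fully connected layer has out × in weights, and each layer has one bias per output channel. A quick arithmetic check, which also confirms that the FLC1 input is 64 × 7 × 7 = 3136:

```python
def conv_params(in_ch, out_ch, k):
    # (weight count, bias count) of a convolution layer
    return out_ch * in_ch * k * k, out_ch

def fc_params(in_n, out_n):
    # (weight count, bias count) of a fully connected layer
    return out_n * in_n, out_n

cnv1 = conv_params(1, 32, 3)        # (288, 32)
cnv2 = conv_params(32, 64, 3)       # (18432, 64)
flc1 = fc_params(64 * 7 * 7, 128)   # (401408, 128); input is 64*7*7 = 3136
flc2 = fc_params(128, 10)           # (1280, 10)
```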
The loss calculation unit 60 calculates the loss of the estimation result of the deep learning model DLM based on the input data against the correct data, for example based on Equation (3).
[Equation (3), rendered as an image in the source; the loss of the model's estimation result against the correct data.]
The loss calculation unit 60 may also calculate the loss, against the correct data, of the estimation result based on the input data by the deep learning model DLM after training.
The value range determination unit 70 determines the value range that minimizes the loss as the channel-common value range.
[Processing of the Information Processing Device]
FIGS. 5A and 5B are flowcharts showing the process by which the information processing device 100 of this embodiment executes the range search of the deep learning model DLM.
First, the information processing device 100 receives a user operation via the input unit 150, for example input of the input image PT1 and the correct data corresponding to the input image PT1 (step S001). The acquisition unit 10 of the CPU 110 thereby acquires the input image PT1 and the corresponding correct data.
The separation unit 20 of the CPU 110 separates the value range used for quantizing the parameters of the deep learning model DLM, in the quantization target layer, into a channel-common range and per-channel ranges, for example according to Equation (2) above.
Then, the per-channel value range setting unit 31 of the value range setting unit 30 of the CPU 110 sets a different value range for each channel (step S003). In this case, the per-channel ratios are determined uniquely by the per-channel value range determination unit 80, without a search, and then set.
 The channel-by-channel value range determination unit 80 determines the ranges for the weights and biases using the per-channel distribution of the pre-quantization parameters, obtained in advance. The per-channel parameter distribution before quantization refers to, for example, the maximum and minimum values. For the per-channel outputs, the channel-by-channel value range determination unit 80 executes inference on the input data and determines the ranges using the resulting per-channel output distribution. The per-channel output distribution is, for example, a histogram.
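As a sketch of reading a per-channel output range off the observed distribution (the percentile cutoff below is an assumed heuristic; the text only says the per-channel output distribution, e.g. a histogram, is used):

```python
import numpy as np

def per_channel_output_range(activations, percentile=99.9):
    """Estimate a per-channel output range from activations observed while
    running inference on the input data.

    activations: array of shape (num_samples, channels).
    """
    return np.percentile(np.abs(activations), percentile, axis=0)
```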
 Meanwhile, the channel-common value range setting unit 32 of the value range setting unit 30 of the CPU 110 provisionally sets the value range common to all channels (step S005). In this case, the channel-by-channel value range determination unit 80 determines the initial value of the channel-common value range by the same method.
 The weight updating unit 40 generates value range candidates, including the initial value, based on the initial value of the channel-common value range (step S007). For example, the weight updating unit 40 sets five candidates obtained by multiplying the initial value by the constants 0.25, 0.5, 1, 2, and 4. In other words, the weight updating unit 40 quantizes the parameters while sequentially changing the channel-common value range.
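The candidate generation of step S007 can be sketched in a few lines (the function name is illustrative):

```python
def range_candidates(initial, multipliers=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """Build the channel-common range candidates of step S007 as constant
    multiples of the initial value (0.25x, 0.5x, 1x, 2x, 4x)."""
    return [initial * m for m in multipliers]
```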
 The inference unit 50 executes quantization (inference) by converting the model parameters, expressed in 32-bit floating point, into values expressible in 8-bit fixed point within the value range set by the value range setting unit 30 (step S009).
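A simulated ("fake") quantization step of this kind might look as follows; the symmetric signed grid is one plausible reading, since the text does not spell out the exact mapping:

```python
import numpy as np

def fake_quantize(x, s, bits=8):
    """Simulate 8-bit fixed-point quantization of float32 values over the
    symmetric range [-s, s], then map back to float for evaluation.
    """
    qmax = 2 ** (bits - 1) - 1                      # 127 for signed 8-bit
    q = np.clip(np.round(x / s * qmax), -qmax - 1, qmax)
    return q * s / qmax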
 The loss calculation unit 60 calculates the loss of the estimation result produced by the deep learning model DLM from the input data with respect to the correct answer data (step S011). The loss calculation unit 60 may calculate the loss according to Equation (3) above, or may use the mean square of the difference from the correct answer data. The loss calculation unit 60 can also calculate the loss, with respect to the correct answer data, of the estimation result produced from the input data by the deep learning model DLM after training.
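The mean-square variant mentioned for step S011 is simply:

```python
import numpy as np

def mse_loss(prediction, correct):
    """Mean-squared loss between the model's estimation result and the
    correct answer data, one of the loss choices mentioned for step S011."""
    prediction = np.asarray(prediction, dtype=float)
    correct = np.asarray(correct, dtype=float)
    return float(np.mean((prediction - correct) ** 2))
```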
 The loss calculation unit 60 determines whether all the value ranges have been searched (step S013); if not all value ranges have been searched (No in step S013), the process returns to step S005.
 On the other hand, when all the value ranges have been searched (Yes in step S013), the value range determination unit 70 determines the value range that minimizes the loss when inferring the input image PT1 as the value range common to all channels (step S015), and the process proceeds to step S017.
 When the process returns to step S005, the channel-common value range setting unit 32 provisionally sets the channel-common value range again (step S005), and the processing up to step S011 is repeated.
 In step S017, the value range determination unit 70 determines whether the value ranges of all the parameters (weights, biases, and per-channel outputs) have been searched (step S017). If it determines that not all parameters have been searched (No in step S017), the value range determination unit 70 returns to step S005.
 In this case, for the weights, biases, and per-channel outputs, the search may separate the value range into a channel-common range and a per-channel range for all of these parameters, or it may do so for only one of them.
 On the other hand, when all the value ranges have been searched (Yes in step S017), the value range determination unit 70 determines whether all layers have been searched (step S019).
 When all layers have been searched (Yes in step S019), the value range determination unit 70 ends the process of executing the value range search for the deep learning model DLM.
 On the other hand, if not all layers have been searched (No in step S019), the value range determination unit 70 returns to step S005, and the CPU 110 repeats the same processing.
 Also, in step S007, the weight updating unit 40 repeats the determination of the channel-common value range layer by layer, from the layer immediately below the quantization target layer down to the lowest layer of the deep learning model DLM. Note that the initial value of the quantization target layer can be the initial value of the uppermost layer, or it may be the initial value of a layer different from the uppermost layer.
 In this way, the CPU 110 of the information processing apparatus 100 can determine the value ranges one layer at a time, in order from the upper layers, before finalizing the value range, which prevents the characteristics from deviating greatly from the pre-quantization parameters.
 The CPU 110 of the information processing apparatus 100 according to the present embodiment repeats the search until all layers (from the uppermost layer to the lowest layer) have been searched (step S019); when the search of all layers is complete, the process of executing the value range search for the deep learning model DLM ends.
 As described above, the CPU 110 of the information processing apparatus 100 according to the present embodiment separates the value range used in quantizing the parameters of the deep learning model DLM into a channel-common value range and a per-channel value range, and determines the two by different methods.
 As a result, by setting a value range for each channel, the CPU 110 of the information processing apparatus 100 according to the present embodiment suppresses performance degradation due to quantization while keeping the training time comparable to that of setting a single value range common to all channels.
 Therefore, the CPU 110 of the information processing apparatus 100 according to the present embodiment can achieve both suppression of accuracy degradation and reduction of the training time required for quantization.
 Conversely, if a value range were set for each channel and the degree of freedom during quantization by training were increased, excessive training could cause the performance to deviate greatly from that of the deep learning model before quantization.
 In contrast, the CPU 110 of the information processing apparatus 100 according to the present embodiment determines the value range ratio SCH of each channel using the pre-quantization values of each layer. This prevents the performance of the deep learning model DLM from changing greatly from its pre-quantization behavior as a result of training.
 Next, the content of the processing in the invention according to the present embodiment is supplemented, as appropriate, with reference to FIGS. 5A and 5B.
 In step S001 in FIG. 5A, the acquisition unit 10 of the CPU 110 receives the input image PT1 and the correct answer data corresponding to it; at this stage, it can also load the pre-quantization parameters (weights and biases) learned in advance.
 In step S003 in FIG. 5A, the channel-by-channel value range setting unit 31 of the value range setting unit 30 of the CPU 110 sets a different value range for each channel. First, an image data set is input to the deep learning model DLM, inference is executed to determine the output value ranges, and the output distribution from each layer is checked.
 For example, in the case of convolution CNV1, for the weight ranges, the per-channel weight range SCH (32 values in total) is calculated from the 1×1×3 weight distributions associated with the 32 output channels. In this case, the per-channel value ranges SCH may be represented by bit shifts.
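A bit-shift representation constrains each per-channel ratio to a power of two; rounding to the nearest power of two, as below, is an assumption (the text only says the ranges may be represented by bit shifts):

```python
import math

def to_bit_shift(ratio):
    """Round a per-channel range ratio to the nearest power of two so it can
    be applied as a simple bit shift in fixed-point arithmetic."""
    shift = round(math.log2(ratio))
    return shift, 2.0 ** shift
```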
 In step S005 in FIG. 5A, the channel-common value range setting unit 32 of the value range setting unit 30 of the CPU 110 provisionally sets the channel-common value range. For example, in the case of convolution CNV1, the largest of the 32 weight ranges is taken as the initial value of the channel-common value range SALL for the search. The inference unit 50 of the CPU 110 can then execute quantization using the ratio of each channel's range to this maximum as the per-channel value range (32 values in total).
 In the case of the output value range, the inference unit 50 of the CPU 110 may calculate the value range S from the 32×28×28 distribution of the output of convolution CNV1 and use it as the initial value for the search.
 Also, in step S019 in FIG. 5B, it is determined whether all layers have been searched, and the value ranges are determined from the upper layers to the lower layers (step S021). Specifically, a value range can be determined for each layer in the following order, yielding the optimal value range for each layer.
 (1) Input of convolution CNV1
 (2) Weights of convolution CNV1
 (3) Bias of convolution CNV1
 (4) Output of convolution CNV1
 (5) Weights of convolution CNV2
 (6) Bias of convolution CNV2
 (7) Output of convolution CNV2
 (8) Weights of fully connected FLC1
 (9) Bias of fully connected FLC1
 (10) Output of fully connected FLC1
 (11) Weights of fully connected FLC2
 (12) Bias of fully connected FLC2
 (13) Output of fully connected FLC2
 In the present embodiment, the optimal value range is determined in each layer for each of the output (input), weight, and bias parameters; however, the processing from step S005 to step S013 in FIG. 5A is only an example, and the embodiment is not limited to it. The basic concepts for determining the value ranges of the output (input), the weights, and the bias, which are realized by the processing from step S005 to step S013 in FIG. 5A described above, are explained below. The processing of the CPU 110 based on these concepts may be executed in any of the steps.
 (A) Method of determining the output (input) value range
 The basic concept in determining the output (input) value range is to decide it based only on the loss value from inference, without updating the weights or biases through training.
 (1) The CPU 110 performs inference in advance in the inference unit 50, sets the obtained value range S as the initial value for the search, and sets range candidates including the initial value using the value range setting unit 30. For example, the weight updating unit 40 sets five candidates: 0.25, 0.5, 1, 2, and 4 times the initial value. This corresponds to the provisional setting of the channel-common value range described in step S005 of FIG. 5A.
 (2) The inference unit 50 of the CPU 110 executes inference for each of the set range candidates, and the loss calculation unit 60 calculates the loss. This corresponds to the inference processing described in step S009 of FIG. 5A and the loss calculation processing described in step S011.
 (3) The loss calculation unit 60 of the CPU 110 repeats the loss calculation as many times as there are candidates; this corresponds to the loop from steps S005 to S013 in FIG. 5A. The value range determination unit 70 determines the range with the lowest loss as the value range for that output (input). This corresponds to the channel-common range determination processing described in step S015 of FIG. 5B.
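Steps (1) through (3) of concept (A) can be sketched as a small grid search; `run_inference(x, s)` is a hypothetical stand-in for the inference unit 50 running quantized inference under range `s`:

```python
import numpy as np

def search_output_range(x, correct, run_inference, initial):
    """Grid-search the output value range over the candidate multiples and
    keep the one with the lowest inference loss (concept (A), steps (1)-(3))."""
    best_s, best_loss = None, float("inf")
    for m in (0.25, 0.5, 1.0, 2.0, 4.0):
        s = initial * m
        loss = float(np.mean((run_inference(x, s) - correct) ** 2))
        if loss < best_loss:
            best_s, best_loss = s, loss
    return best_s
```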
 (B) Method of determining the value range for convolution and fully connected weights
 Since the per-channel weight range SCH is already fixed by the prior inference, the basic concept here is to determine the channel-common range SALL by search. During the search, the weights and biases of the quantization target layer are updated through training.
 (1) First, the weights and biases of the quantization target layer are made variable, and the channel-common value range setting unit 32 arranges for them to be updated through training by the training loss calculation unit 90.
 (2) The CPU 110 performs inference in advance in the inference unit 50, sets the obtained value range S as the initial value for the search, and sets range candidates including the initial value using the value range setting unit 30. For example, the weight updating unit 40 sets five candidates: 0.25, 0.5, 1, 2, and 4 times the initial value. This corresponds to the provisional setting of the channel-common value range described in step S005 of FIG. 5A.
 (3) The training loss calculation unit 90 of the CPU 110 executes training for each of the range candidates defined in (2) of (B), and, while the weights and biases are updated, the loss calculation unit 60 calculates the loss. This corresponds to the weight updating processing described in step S007 of FIG. 5A, the inference processing described in step S009, and the loss calculation processing described in step S011.
 (4) The CPU 110 causes the loss calculation unit 60 to repeat the loss calculation as many times as there are range candidates; this corresponds to the loop from steps S005 to S013 in FIG. 5A. In the value range determination unit 70, the CPU 110 determines the range with the lowest loss in the loss calculation unit 60 as the value range for the weights. This corresponds to the channel-common range determination processing described in step S015 of FIG. 5B.
 When searching another candidate, the weights and biases updated by training are recorded, and the model is restored to its pre-training state before the next search. The CPU 110 can therefore use, for the weights and biases as well, the values obtained when training was performed with the chosen range.
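The record-and-restore pattern for the weight-range search in (B) might be sketched as follows; `train_and_loss(p, s)` is a hypothetical stand-in for units 90 and 60 that updates `p` in place and returns a loss:

```python
import copy

def try_candidates_with_training(params, candidates, train_and_loss):
    """For each range candidate, train from the same untrained state, record
    the loss and the trained parameters, and restore before the next trial."""
    trials = {}
    for s in candidates:
        trial = copy.deepcopy(params)      # restart from the untrained state
        loss = train_and_loss(trial, s)
        trials[s] = (loss, trial)          # keep the weights trained under s
    best = min(trials, key=lambda s: trials[s][0])
    return best, trials[best][1]           # best range and its trained weights
```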
 (C) Method of determining the bias value range for convolution and fully connected layers
 The relationship between the weights and bias in a convolution layer or a fully connected layer can be expressed as in Equation (4).
 (Equation (4) appears as an image in the original document.)
 Accordingly, the bias scale can be regarded as equivalent to the product of the weight scale and the input scale in the quantization target layer, so the bias value range SBIAS can be calculated as in Equation (5).
 (Equation (5) appears as an image in the original document.)
 In the present embodiment, Equation (2) separates the weight value range S into the channel-common range SALL and the per-channel range SCH, so the channel-common bias range SALLBIAS can be expressed as in Equation (6).
 (Equation (6) appears as an image in the original document.)
 The per-channel bias range SCHBIAS can be expressed as in Equation (7).
 (Equation (7) appears as an image in the original document.)
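Equations (4) through (7) appear only as images in the source. From the surrounding prose, plausible reconstructions (the symbol names follow the text; the exact forms are assumptions) are:

```latex
% (4) Output of channel c in a convolution or fully connected layer:
y_c = \sum_i w_{c,i}\,x_i + b_c

% (5) The bias scale matches the product of the weight and input scales:
S_{\mathrm{BIAS}} = S_{\mathrm{WEIGHT}} \cdot S_{\mathrm{IN}}

% (6) With S_{\mathrm{WEIGHT}} = S_{\mathrm{ALL}} \cdot S_{\mathrm{CH}} from
%     Equation (2), the channel-common part of the bias range:
S_{\mathrm{ALLBIAS}} = S_{\mathrm{ALL}} \cdot S_{\mathrm{IN}}

% (7) The per-channel part of the bias range:
S_{\mathrm{CHBIAS}} = S_{\mathrm{CH}}
```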
 As explained above, in the present embodiment, for each of the output (input), weight, and bias parameters, the optimal value range in each layer can be determined based on the basic concepts above, without being limited to the processing from step S005 to step S013 in FIG. 5A.
10 acquisition unit
20 separation unit
30 value range setting unit
40 weight updating unit
50 inference unit
60 loss calculation unit
70 value range determination unit
80 channel-by-channel value range determination unit
90 training loss calculation unit
100 information processing device
110 CPU
120 storage unit
130 ROM
140 RAM
150 input unit
160 display unit
170 communication unit

Claims (13)

  1.  A quantization program for causing a computer to execute:
      a procedure of separating, in a quantization target layer of a deep learning model, a value range in quantization of parameters of the deep learning model into a value range common to channels and a value range different for each channel; and
      a procedure of determining the separated value range common to the channels and the value ranges different for each channel by respectively different methods.
  2.  The quantization program according to claim 1, further causing the computer to execute:
      a procedure of acquiring input data and correct answer data corresponding to the input data;
      a procedure of quantizing the parameters while sequentially changing the value range common to the channels; and
      a procedure of calculating a loss of an estimation result of the deep learning model based on the input data with respect to the correct answer data, and determining the value range at which the loss is minimized as the value range common to the channels.
  3.  The quantization program according to claim 1, further causing the computer to execute:
      a procedure of acquiring input data and correct answer data corresponding to the input data;
      a procedure of quantizing the parameters while sequentially changing the value range common to the channels, and training the deep learning model using training teacher data; and
      a procedure of calculating a loss of an estimation result based on the input data by the deep learning model after training with respect to the correct answer data, and determining the value range at which the loss is minimized as the value range common to the channels.
  4.  The quantization program according to claim 2 or 3, further causing the computer to execute:
      a procedure of repeating the determination of the value range common to the channels layer by layer, from the layer immediately below the quantization target layer to the lowest layer of the deep learning model.
  5.  The quantization program according to any one of claims 1 to 4, wherein the initial value of the quantization target layer is the initial value of the uppermost layer.
  6.  The quantization program according to any one of claims 1 to 4, wherein the initial value of the quantization target layer is the initial value of a layer different from the uppermost layer.
  7.  The quantization program according to any one of claims 1 to 6, wherein the value range different for each channel is calculated using any one of the weights, the bias, and the per-channel output distribution in the quantization target layer.
  8.  The quantization program according to any one of claims 1 to 7, wherein the value range different for each channel is a ratio of the value ranges of the channels.
  9.  The quantization program according to claim 8, wherein, when the value range different for each channel is calculated from the weights or the bias, the value range different for each channel is determined using the per-channel ratio of the parameter distribution.
  10.  The quantization program according to any one of claims 2 to 4, wherein, when the value range different for each channel is that of the output of the quantization target layer, the program further includes:
      a procedure of acquiring the input data; and
      a procedure of determining the value range different for each channel using the per-channel ratio of the output distribution at the time of inference on the input data.
  11.  The quantization program according to any one of claims 1 to 10, wherein the value range different for each channel is represented by a bit shift.
  12.  An information processing device comprising:
      a separation unit that separates, in a quantization target layer of a deep learning model, a value range in quantization of parameters of the deep learning model into a value range common to channels and a value range different for each channel; and
      a value range determination unit that determines the separated value range common to the channels and the value ranges different for each channel by respectively different methods.
  13.  A value range determination method comprising:
      a step of separating, in a quantization target layer of a deep learning model, a value range in quantization of parameters of the deep learning model into a value range common to channels and a value range different for each channel; and
      a step of determining the separated value range common to the channels and the value ranges different for each channel by respectively different methods.
PCT/JP2022/011040 2021-05-10 2022-03-11 Quantization program, information processing device, and range determination method WO2022239448A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2023520860A JPWO2022239448A1 (en) 2021-05-10 2022-03-11

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-079555 2021-05-10
JP2021079555 2021-05-10

Publications (1)

Publication Number Publication Date
WO2022239448A1

Family

ID=84028153

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/011040 WO2022239448A1 (en) 2021-05-10 2022-03-11 Quantization program, information processing device, and range determination method

Country Status (2)

Country Link
JP (1) JPWO2022239448A1 (en)
WO (1) WO2022239448A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019032833A (en) * 2017-08-04 2019-02-28 三星電子株式会社Samsung Electronics Co.,Ltd. Method and apparatus for fixed-point quantized neural network
US20200394522A1 (en) * 2019-06-12 2020-12-17 Shanghai Cambricon Information Technology Co., Ltd Neural Network Quantization Parameter Determination Method and Related Products


Also Published As

Publication number Publication date
JPWO2022239448A1 (en) 2022-11-17


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22807155

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2023520860

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22807155

Country of ref document: EP

Kind code of ref document: A1