WO2021246249A1

WO2021246249A1 - Information processing device and information processing method

Info

Publication number: WO2021246249A1
Application number: PCT/JP2021/019876
Authority: WO
Inventors: 隆之氏家
Original assignee: ソニーグループ株式会社
Priority date: 2020-06-03
Filing date: 2021-05-25
Publication date: 2021-12-09

Abstract

An information processing device comprises a generation unit (151), a calculation unit (152), and an evaluation unit (153). The generation unit (151) generates a DNN model having a plurality of calculation layers that output calculation results based on input data, and a differential model that differs from a quantized DNN model obtained by quantizing the DNN model. The calculation unit (152) calculates, for each calculation layer, error factors that constitute the quantization errors in each calculation layer of the quantized DNN model, on the basis of the calculation results of mutually corresponding calculation layers of the DNN model and the differential model. The evaluation unit (153) evaluates the error factors derived for each calculation layer.

Description

Information processing equipment and information processing method

This disclosure relates to an information processing device and an information processing method.

In recent years, research on the quantization of artificial neural networks such as DNN (Deep Neural Network), which has multiple hidden layers between the input layer and the output layer, has been promoted for the purpose of speeding up calculations and reducing computational resources. ing.

For example, Patent Document 1 proposes a computer system that dynamically optimizes bit accuracy during training in order to reduce the demand for computational resources in a neural network. Further, Patent Document 2 proposes a method and an apparatus for quantizing the parameters of a neural network.

Japanese Unexamined Patent Publication No. 2019-164793 Japanese Unexamined Patent Publication No. 2019-32833

However, since DNN has the characteristic of accumulating operations in multiple layers, even if the final output result of the quantized DNN is referred to, the cause of the performance deterioration of the DNN due to the quantization is detailed. Difficult to analyze.

Therefore, in this disclosure, we propose an information processing device and an information processing method that can analyze in detail the factors of DNN performance deterioration due to quantization.

In order to solve the above problems, the information processing apparatus of one form according to the present disclosure includes a generation unit, a calculation unit, and an evaluation unit. The generation unit generates a difference model between a DNN model having a plurality of calculation layers that output calculation results based on input data and a quantized DNN model obtained by quantizing the DNN model. The calculation unit derives an error factor constituting the quantization error of each calculation layer of the quantized DNN model for each calculation layer based on the calculation results of the calculation layers corresponding to each other of the DNN model and the difference model. The evaluation unit evaluates the error factors derived for each arithmetic layer.

It is a figure which shows the outline of the information processing which concerns on embodiment of this disclosure. It is a figure which shows the outline of the information processing which concerns on embodiment of this disclosure. It is a block diagram which shows the structural example of the information processing apparatus which concerns on embodiment of this disclosure. It is a figure for demonstrating the outline of the approximation of the activation function which concerns on embodiment of this disclosure. It is a figure which shows the outline of the calculation method of the error factor in the maximum value pooling which concerns on embodiment of this disclosure. It is a figure which shows the information display example of the design tool which concerns on embodiment of this disclosure. It is a flowchart which shows the processing procedure example of the contribution signal display processing which concerns on embodiment of this disclosure. It is a flowchart which shows the processing procedure example of the optimization plan information display processing which concerns on embodiment of this disclosure. It is a figure which shows the outline of the information processing which concerns on the modification. It is a block diagram which shows the schematic configuration example of the computer which functions as the information processing apparatus which concerns on embodiment of this disclosure.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. In each of the following embodiments, duplicate description may be omitted by assigning the same reference numerals to the same parts. Further, in the present specification and the drawings, a plurality of components having substantially the same functional configuration may be distinguished by adding different numbers after the same reference numerals.

In addition, the present disclosure will be described according to the order of items shown below.
1. 1. Introduction 2. Outline of information processing according to the embodiment of the present disclosure 3. Configuration example of information processing device 4. Method of applying the error factor analysis of the present disclosure to each layer 4-1. About all connection layers 4-2. About activation function 4-3. About the pooling layer 4-3-1. Average value pooling 4-3-2. Maximum value pooling 5. Example of error factor analysis according to the embodiment of the present disclosure 6. Example of processing procedure of information processing device 6-1. Contribution signal display processing 6-2. Optimization plan information display processing 7. Modification example 7-1. Aggregation of error factors 7-2. Other modifications 8. Hardware configuration 9. Conclusion

<< 1. Introduction >>
While learning methods using neural networks such as DNN have high accuracy, the processing load related to computation is large, so research is underway to effectively reduce the processing load by quantizing the neural network. ..

Conventionally, in dynamic quantization in which the quantization bit accuracy is dynamically determined for each layer of the operation in DNN, the optimum bit accuracy value for each layer is determined manually by manual adjustment. However, manual adjustment by hand is established by trial and error of the operator and experience of tuning so far. And, such manual adjustment is not based on a quantitative understanding as to which layer should be given more bit precision throughout the network to mitigate the adverse effect of quantization on DNN performance.

Further, the above-mentioned Patent Document 1 and Patent Document 2 propose a method for dynamically determining the quantization bit accuracy. However, these methods also do not quantitatively analyze the influence of each layer on the entire network, but are methods that automate trial and error or analyze independently for each layer. I have a similar problem.

When analyzing the adverse effect of DNN on the performance due to quantization, there are the following problems depending on the characteristics of DNN.
(Problem 1) Since DNN stacks many layers for performing calculations, it is not possible to determine in which layer the quantization most affects the performance deterioration by observing only the output result.
(Problem 2) It is difficult to observe the influence of the quantization error propagated between layers because the result of the inner product or convolution operation is distorted by the activation operation by the nonlinear function.

The technique according to the present disclosure addresses the above-mentioned problems depending on the characteristics of DNN, and quantitatively determines which layer should be given more bit precision to mitigate the adverse effect of quantization on performance throughout the network. To give an understanding, we propose a method of propagating the quantization error generated in each layer of DNN and tracking the largest error factor. This makes it possible to analyze the layer that most affects the output error when applying the quantization to DNN.

<< 2. Outline of information processing according to the embodiment of the present disclosure >>
1 and 2 are diagrams showing an outline of information processing according to the embodiment of the present disclosure. The information processing (information processing method) according to the embodiment of the present disclosure is realized by the information processing apparatus 1 (see FIG. 3) described later.

As shown in FIG. 1, the information processing apparatus 1 generates a quantized DNN model M2 obtained by quantizing the DNN model M1. The quantization method is not particularly limited, and for example, quantization by an operation using a fixed-point number by a fixed-point method can be adopted.

Further, the information processing apparatus 1 generates a difference DNN model M3 between the DNN model M1 and the quantized DNN model M2. The difference DNN model M3 is a model having a network topology equivalent to that of the DNN model M1 and the quantized DNN model M2, which is composed of the feature map before and after the quantization in each layer of the DNN and the entire difference of the parameters.

The DNN model M1, the quantized DNN model M2, and the differential DNN model M3 each have a plurality of corresponding arithmetic layers L _m to L _n . A plurality of arithmetic layers L _m to an arithmetic layer L _n correspond to any one of an input layer, an output layer, and a plurality of hidden layers (intermediate layers) (for example, "m" and "n" have m + 2 <n. Satisfied positive integer). The plurality of arithmetic layers L _m to L _n may correspond to any of all connection layers, convolution layers, pooling layers, activation functions, or other types of layers, depending on the structure of the network. In FIG. 1, "m" is, for example, an arbitrary integer of 1 or more, and "n" is an arbitrary integer of 4 or more.

The DNN model M1 outputs the calculation result for the input data. The quantized DNN model M2 outputs an operation result (quantized output) for the quantized input. The difference DNN model M3 outputs a calculation result (difference output) for the difference input.

The information processing apparatus 1 calculates an error factor constituting the quantization error of each calculation layer of the quantized DNN model M2 based on the calculation results of the calculation layers corresponding to each other of the DNN model M1 and the difference DNN model M3. Derived for each layer. That is, the information processing apparatus 1 linearly combines the calculation results of each calculation layer of the quantized DNN model M2 with the calculation results of the calculation layers of the corresponding DNN model M1 and the calculation results of the calculation layer of the difference DNN model M3. Indirectly expressed as. By doing so, the calculation result of each calculation layer of the quantized DNN model M2 can be decomposed for each calculation layer, and the error factors constituting the quantization error can be propagated to the subsequent calculation layer.

Specifically, as shown in FIG. 2, the information processing apparatus 1, the input data and the differential data input from the operation layer L _{m (input} layer), not shown, corresponding arithmetic layer DNN model M1 and the differential DNN model M3 Enter each in L _{m + 1.} The difference input data is, for example, difference data between the input data to the DNN model M1 and the quantized input data to the quantized DNN model M2.

Subsequently, the information processing apparatus 1 uses the calculation result (output before quantization) _{of the calculation layer L m + 1} of the DNN model M1 and the calculation result (difference output) _{of the calculation layer L m + 1 of the difference DNN model M3.} An error factor E _{m + 1} that indirectly represents the arithmetic layer of the arithmetic layer L _{m + 1} of the quantized DNN model M2 is derived.

The error factor E _{m + 1} includes a plurality of factors m + 1_1 to factor m + 1_z. Factors m + 1_1 to factor m + 1_z correspond to errors caused by various parameters such as weight parameters and bias parameters of each arithmetic layer, non-linear components of the activation function, differences between elements in the pooling process, and the like.

Then, the information processing apparatus 1, the derived error factor _{E m + 1} of the operation layer _{L m + 1,} followed by a calculation layer _{L m + 1,} and outputs corresponding to the operation layer _{L m + 2} of DNN model M1 and the differential DNN model M3.

Subsequently, similarly to the operation layer _{L m + 1,} the information processing apparatus 1 derives an error factor _{E m + 2} of the operation layer _{L m + 2.} Then, the information processing apparatus 1, the error factor _{E m + 2} derived, followed by calculation layer _{L m + 2,} and outputs the corresponding operation layer _{L m + 3} of DNN model M1 and the differential DNN model M3.

Similarly, the information processing apparatus 1 sequentially propagates the error factor of each arithmetic layer to the subsequent arithmetic layer, and the error constituting the quantization error of each arithmetic layer up _{to the arithmetic layer Ln-1.} Derive the factor. Then, the information processing apparatus 1 executes the final output corresponding to the quantized output of quantized DNN model M2 from a not-shown operation layer L _{n (output} layer).

The information processing device 1 evaluates the error factors derived for each arithmetic layer. Specifically, the information processing apparatus 1 has a degree of influence of an error factor of each arithmetic layer on an error (output error) included in the quantized output of the quantized DNN model M2 based on a predetermined evaluation index. To evaluate.

As described above, the information processing apparatus 1 has an error constituting the quantization error of each arithmetic layer of the quantized DNN model M2 based on the arithmetic results of the arithmetic layers corresponding to each other of the DNN model M1 and the difference DNN model M3. Factors are derived for each arithmetic layer. Then, the information processing apparatus 1 evaluates the error factor derived for each arithmetic layer. As a result, the information processing apparatus 1 can analyze in detail the factors of the DNN performance deterioration due to the quantization.

<< 3. Information processing device configuration >>
Hereinafter, the configuration of the information processing apparatus 1 according to the embodiment of the present disclosure will be described. FIG. 3 is a block diagram showing a configuration example of the information processing apparatus according to the embodiment of the present disclosure.

As shown in FIG. 3, the information processing apparatus 1 includes an input unit 110, an output unit 120, a communication unit 130, a storage unit 140, and a control unit 150.

The input unit 110 detects an input operation by the administrator of the information processing device 1. The control unit 150, which will be described later, can input, for example, a data set for evaluating the quantization error according to the input operation detected by the input unit 110. The input unit 110 can be realized by, for example, various buttons, a keyboard, a touch panel, a mouse, a switch, and the like.

The output unit 120 outputs various information. The output unit 120 may be configured to include a display device that outputs visual information. The output unit 120 can display the window of the analysis tool executed in response to the operation from the administrator. The display device can be realized by, for example, a CRT (Cathode Ray Tube), an LCD (Liquid Crystal Display), an OLED (Organic Light Emitting Diode), or the like.

The communication unit 130 can be realized by, for example, a NIC (Network Interface Card), various communication modems, or the like. The communication unit 130 is connected to a network (Internet or the like) by wire or wirelessly, and transmits / receives information to / from an external device or the like via the network.

The storage unit 140 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory (Flash Memory), or a storage device such as a hard disk or an optical disk. The storage unit 140 functions as a storage means for the control unit 150. The storage unit 140 has a design tool storage unit 141 and an evaluation threshold storage unit 142.

The design tool storage unit 141 stores a design tool that provides various functions for designing a DNN model (for example, a DNN model M1). The design tool may include an analytical function for analyzing the quantization error of the quantized DNN model (eg, the quantized DNN model M2) obtained by quantizing the designed DNN model (eg, DNN model M1). can. This analysis function can provide the administrator of the information processing apparatus 1 with a function for analyzing and visualizing the quantization error.

The evaluation threshold storage unit 142 stores a threshold for evaluating the quantization error for each arithmetic layer. The evaluation threshold storage unit 142 is used for the evaluation process of the evaluation unit 153, which will be described later.

The control unit 150 is a controller that controls each unit of the information processing device 1. The control unit 150 is realized by a processor such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a GPU (Graphics Processing Unit), for example. The control unit 150 is realized by the processor executing various programs stored inside the information processing apparatus 1 with a RAM (Random Access Memory) or the like as a work area. The control unit 150 may be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The CPU, MPU, ASIC, and FPGA can all be regarded as controllers.

As shown in FIG. 3, the control unit 150 includes a generation unit 151, a calculation unit 152, and an evaluation unit 153, and realizes or executes the functions and operations of the information processing device 1 described below. Each block (generation unit 151 to evaluation unit 153) constituting the control unit 150 is a functional block indicating the function of the control unit 150, respectively. These functional blocks may be software blocks or hardware blocks. For example, each of the above-mentioned functional blocks may be one software module realized by software (including a microprogram), or may be one circuit block on a semiconductor chip (die). Of course, each functional block may be one processor or one integrated circuit. The method of configuring the functional block is arbitrary. The control unit 150 may be configured in a functional unit different from the above-mentioned functional block.

The generation unit 151 generates a difference DNN model M3 between a DNN model M1 having a plurality of calculation layers that output calculation results based on input data and a quantized DNN model M2 obtained by quantizing the DNN model M1. The quantization method of the DNN model M1 is not limited to a specific method. Further, the difference between the DNN model M1 and the quantized DNN model M2 is a model composed of the entire difference between the feature map and the parameters before and after the quantization in each layer of the DNN, and is equivalent to the original model DNN model M1. Represents a model with the topology of. That is, the DNN model M1, the quantized DNN model M2, and the differential DNN model M3 are each provided with corresponding layers.

The calculation unit 152 sets an error factor constituting the quantization error of each calculation layer of the quantized DNN model M2 based on the calculation results of the calculation layers corresponding to each other of the DNN model M1 and the difference DNN model M3. Derived for each. That is, the information processing apparatus 1 linearly combines the calculation results of each calculation layer of the quantized DNN model M2 with the calculation results of the calculation layers of the corresponding DNN model M1 and the calculation results of the calculation layer of the difference DNN model M3. Indirectly expressed as. By doing so, the calculation result of each calculation layer of the quantized DNN model M2 can be decomposed for each calculation layer, and the error factor can be propagated to the subsequent calculation layer.

Specifically, the calculation unit 152 transfers the calculation result of the calculation layer of the DNN model M1 and the calculation result of the corresponding calculation layer in the difference DNN model M3 to the next calculation layer of the DNN model M1 and the difference DNN model M3. Interact (input). Then, using the calculation result output from the DNN model M1 and the calculation result output from the difference DNN model M3, the quantization error of the corresponding calculation layer in the quantized DNN model M2 is indirectly expressed. The error factors that make up the quantization error are derived. After deriving the error factor, the calculation unit 152 propagates the derived error factor to the next calculation layer and sequentially executes the same processing to configure the quantization error of each calculation layer of the quantized DNN model M2. The error factors to be performed are derived for each calculation layer. The details of the processing of the calculation unit 152 will be described later with reference to the drawings.

The evaluation unit 153 evaluates each error factor derived for each calculation layer by the calculation unit 152. Specifically, the evaluation unit 153 calculates and evaluates the degree of influence of the error factor derived for each arithmetic layer on the error (output error) included in the quantized output of the quantized DNN model M2. .. Further, the evaluation unit 153 presents detailed information regarding the error factor for each calculation layer. In addition, the evaluation unit 153 presents advice information for optimizing the quantized DNN model M2. The details of the processing of the evaluation unit 153 will be described later with reference to the figure.

<< 4. Method of applying the error factor analysis of the present disclosure to each layer >>
Hereinafter, a method of applying the error factor analysis by the information processing apparatus 1 according to the embodiment of the present disclosure to each layer will be described.

<4-1. About all connection layers>
The calculation unit 152 determines the quantization error for each calculation layer based on the result of the inner product calculation of the input vector, weight parameter, and bias parameter for the DNN model M1 and the corresponding input vector, weight parameter, and bias parameter in the difference DNN model M3. Is decomposed as the sum of error factors.

In the DNN model M1, the input vectors (images) for all the connection layers are x, the weight parameter w, and the bias parameter b. Further, in the difference DNN model M3, the corresponding elements are Δx, Δw, and Δb. At this time, the output vector (image) y + Δy of all the connection layers in the quantized DNN model M2 can be transformed and expressed as the following equation (1). However, "・" represents a matrix product (none matrix vector product).

y + Δy = (x + Δx) · (w + Δw) + (b + Δb)
= X ・ w + x ・ Δw + Δx ・ w + Δx ・ Δw + b + Δb
= Y + x · Δw + Δx · w + Δx · Δw + Δb ... (1)

Thereby, the error Δy generated in the calculation result by the quantization in all the connection layers is expressed as the sum of the error factors included in the quantization error as shown in the following equation (2).

Δy = x · Δw + Δx · w + Δx · Δw + Δb ... (2)

Depending on the quantization method, the calculation result itself is held with a higher accuracy than the quantization bit accuracy, and when it propagates to the subsequent stage (layer), it is quantized again to match the bit accuracy. May be done. In such a case, the overall integration can be maintained by adding the residual of the difference output of the difference DNN model M3 and Δy as Δq.

The modification of the above equation can be applied not only to all connecting layers but also to all layers that perform linear operations such as convolution layers for images.

<4-2. About activation function ＞
When a nonlinear operation by the activation function is performed in the arithmetic layer of the DNN model M1 and the arithmetic layer corresponding to each other of the difference DNN model M3, the activation function is approximated, and each arithmetic layer is based on the approximate activation function. Derivation of the quantization error.

In general, the activation function f, which is inserted in the subsequent stage of all connection layers, convolution layers, etc. in DNN and performs a non-linear operation, needs to be handled approximately when analyzing the quantization error. FIG. 4 is a diagram for explaining an outline of approximation of the activation function according to the embodiment of the present disclosure. In Figure 4, the input x _{₁ + Δx 1} with quantization error [Delta] x ₁ is shows a case where the input to the activation function f.

As shown in FIG. 4, for example, the arithmetic unit 152 can separate the non-linearity of the activation function f into terms of Δf by linearly approximating the activation function f. _{That is, the output f (x 1} + Δx ₁ ) when input to the activation function of the quantized DNN model M2 uses the differential coefficient f ′ (x ₁ ) of the activation function f to the following equation (3). ) Can be expressed.

f (x ₁ + Δx ₁ ) = f (x ₁ ) + f ′ (x ₁ ) Δx _{1 −} Δf ... (3)

Here, [Delta] x _1, since the linear combination of error factors before layer (quantization error), _{f'(x 1)} f'also [Delta] x ₁ _{(x 1)} is the error factor since it is a constant It becomes a linear combination. Further, Δf can be calculated as the following equation (4), assuming that the output of the difference DNN model M3 is Δf (x _1).

Δf = f'(x ₁ ) Δx ₁ − Δf (x ₁ ) ... (4)

_{As described above, the term of f'(x 1} ) Δx _{1 −} Δf shown in the above equation (3) becomes an error factor of the entire activation function.

<4-3. About the pooling layer>
(4-3-1. Average pooling)
When the average value pooling process is performed in the arithmetic layer, the arithmetic unit 152 determines the quantization error for each arithmetic layer in the average value pooling process due to the filter size of the filter used for the average value pooling process and the linear combination of the elements included in the filter. The error factors that make up the above are derived.

For example, in the average value pooling in the pooling layer mainly inserted in the convolutional neural network for an image, the average value of each element (for example, a pixel value) in the filter used for the pooling process is output. That is, assuming that the filter size is α and each element of the image contained in the filter is _xi (i = 1, ..., α), the output value y _P1 which is the calculation result of the pooling layer of the DNN model M1 is It is calculated as y _P1 = Σ _i x _{i / α.} _{This operation is linear, and the filter output error Δy P1} is expressed as a linear combination with _{Δy P1} = Σ _i Δx _i / α even when the difference Δxi contained in the input element of the quantized DNN model M2 is taken into consideration. To. Therefore, it can be propagated to the subsequent layer.

(4-3-2. Maximum value pooling)
On the other hand, when the maximum value pooling process is performed in the calculation layer, the calculation unit 152 sets the difference between the element that should have been selected as the representative value before the quantization and the element selected as the representative value after the quantization. It is used to derive the error factors that make up the quantization error for each arithmetic layer.

In the maximum value pooling in the pooling layer, the maximum value of each element (for example, pixel value) in the filter is output as a representative value, but this process is treated as a non-linear process. The handling in error factor analysis is also affected by the non-linearity with respect to the difference in representative values before and after quantization. FIG. 5 is a diagram showing an outline of a method for calculating an error factor in the maximum value pooling according to the embodiment of the present disclosure. The left figure of FIG. 5 shows an example of selection of representative values before quantization, and the right figure of FIG. 5 shows an example of selection of representative values after quantization.

As shown in FIG. 5, before the quantization, the pixel value x0 is selected as the representative value, but after the quantization, the pixel value x3 + Δx3 is selected as the representative value, and the representative values selected before and after the quantization are. It's different. When the representative values selected before and after the quantization are the same, the filtering process merely has a role of propagating the calculation result and the error factor of the previous layer to the subsequent stage, and there is no problem. However, as shown in FIG. 5, when the representative values selected before and after the quantization are different, the error factor also changes, which is a problem.

Therefore, in such a case, the arithmetic unit 152 responds by introducing a difference (vector) Δp between the representative value originally selected before the quantization and the representative value actually selected after the quantization. .. Δp is 0 (zero) for the part where the same element is selected before and after quantization. By introducing such a mechanism, the filter output error Δy _P2 _{in the maximum value pooling becomes Δy P2} = Δx _c + Δp under c = argmax _i (x ₀ , ..., X _α ). .. It should be noted that the error factor itself does not reflect the change in the address (position), but only the change in the calculation result, but if necessary, an auxiliary index can be calculated and used for analysis. ..

<< 5. Example of error factor analysis according to the embodiment of the present disclosure >>
Hereinafter, an example of error factor analysis according to the embodiment of the present disclosure will be described. The function of error factor analysis according to the embodiment of the present disclosure can be additionally realized, for example, as an analysis function of a design tool or a library stored in the design tool storage unit 141. The administrator of the information processing apparatus 1 inputs the DNN by using the design tool, quantizes the DNN model M1 based on a specific method, and generates the quantized DNN model M2.

In addition, the administrator of the information processing apparatus 1 can use the function of analyzing and visualizing the quantization error for each arithmetic layer introduced in the design tool to quantize parameters such as the quantization bit width for each arithmetic layer. Gather information for coordination or changes in network structure. The error factors constituting the quantization error appear as differences from the output result / feature map of the original (pre-quantization) DNN model M1. Therefore, the information collected for each arithmetic layer can be variously processed and visualized and used as auxiliary information for further quantization such as quantization for each channel.

FIG. 6 is a diagram showing an information display example of the design tool according to the embodiment of the present disclosure. FIG. 6 shows an example of the analysis window 121 (“Tensor Board”) displayed on the output unit 120 by the design tool. In the analysis window 121 shown in FIG. 6, a DNN graph GR that visualizes the network structure of a quantized DNN model (for example, a quantized DNN model M2) obtained by quantizing a DNN model (for example, the DNN model M1) is displayed. Will be done. FIG. 6 shows, as an example, an example in which a DNN graph GR showing a network structure of a convolutional neural network is displayed. In the DNN graph GR, each block of "conv1", "conv2", "pool2", "vector", "dense3", and "output" is a quantized DNN model (for example, quantized DNN model M2). Each arithmetic layer is shown.

The error factors that are derived for each calculation layer by the calculation unit 152 and that constitute the quantization error that propagates to the output layer have the same dimensions for each term, and can be summarized and compared using the same evaluation index. The evaluation unit 153 describes an error included in the quantized output of the quantized DNN model (for example, the quantized DNN model M2) based on a predetermined evaluation index (hereinafter, referred to as “output error”). ) Is evaluated by the degree of influence of the error factor for each arithmetic layer. Examples of the evaluation index include the average value and the maximum value after taking the absolute value for each element, the length of the entire factor (vector) (L1, L∞, L2 norm), and the like. By applying such an evaluation index, each error factor constituting the quantization error can be summarized into a scalar value and used for analysis of the behavior change accompanying the quantization.

Then, the evaluation unit 153 displays the degree of influence (contribution degree) of the error factor for each calculation layer on the output error in different modes according to the evaluation result. Specifically, for example, as shown in FIG. 6, the evaluation unit 153 sets the evaluation results of each block of "conv1", "conv2", "pool2", "vector", "dense3", and "output". Quantization error contribution signals SG ₁ to SG ₆ are displayed in the upper left, respectively. The quantization error contribution signals SG ₁ to SG ₆ are relative to the quantization error (or its evaluation index) for each arithmetic layer in the output error of the quantized DNN model (for example, the quantized DNN model M2). It shows the degree (magnitude) of the influence of the error factor on the calculation layer. For example, the quantization error contribution signals SG ₁ to SG ₆ can display the contribution indicating the degree of influence by changing the display mode such as a color or a pattern. As an example, a signal corresponding to a high-contribution arithmetic layer can be displayed in red, a signal corresponding to a medium-contribution arithmetic layer can be displayed in yellow, and a signal corresponding to a low-contribution arithmetic layer can be displayed in green. The contribution degree is preset by the administrator of the information processing apparatus 1 and can be evaluated based on the evaluation threshold value stored in the evaluation threshold value storage unit 142.

Further, the evaluation unit 153 presents detailed information about the error factor for each of the error factors for each calculation layer. Specifically, when the analysis window 121 _{detects, for example, an operation on the quantization error contribution signal SG 2} , the evaluation unit 153 of the calculation layer (“conv2”) corresponding to the _{quantization error contribution signal SG 2} More detailed internal error factor information 121-1 is presented in a pop-up. The error factor information 121-1 is, for example, an error factor name (“Dxw”, “xDw”) having a high degree of contribution (high degree of influence) based on a default setting or a magnitude of an evaluation index specified in advance. , "Df", "Dq", etc.) are displayed in descending order. The administrator of the information processing apparatus 1 can adjust the network of the quantized DNN model M2 by referring to the error factor information 121-1.

Further, the evaluation unit 153 presents detailed information on the error factor and advice information for optimizing the quantized DNN model (for example, the quantized DNN model M2). Specifically, the evaluation unit 153 displays the optimization plan information (“Optimization Hint”) of the quantized DNN model (for example, the quantized DNN model M2) in the pop-up displaying the error factor information 121-1. 121-2 is presented. The optimization plan information ("Optimization Hint") 121-2 includes, for example, "increase the total number of bits of w by x bits", "shift the decimal part of x by bits", "increase the number of arithmetic bits by z bits", etc. The optimization hints associated with the error factors that have a large influence on the calculation layer are displayed in order from the top. The administrator of the information processing apparatus 1 performs network adjustment of the quantized DNN model (for example, the quantized DNN model M2) more easily by selecting a plan from the optimization plan information 121-2. Can be done.

<< 6. Information processing device processing procedure example >>
Hereinafter, an example of a processing procedure by the information processing apparatus 1 according to the embodiment of the present disclosure will be described with reference to FIGS. 7 and 8. FIG. 7 is a flowchart showing a processing procedure example of the contribution signal display processing according to the embodiment of the present disclosure. FIG. 8 is a flowchart showing a processing procedure example of the optimization plan information display processing according to the embodiment of the present disclosure.

<6-1. Contribution signal display processing>
Hereinafter, the display processing of the quantization error contribution signal will be described with reference to FIG. 7. The display process of the quantization error contribution signal shown in FIG. 7 is composed of a procedure PH ₁ _{for deriving the quantization error and a procedure PH 2} for classifying and displaying the quantization error contribution signal.

As shown in FIG. 7, the calculation unit 152 inputs the data set to be processed, evaluates the DNN model before and after quantization (step S101), and analyzes the error factor (step S102). The error factor analysis derives the error factors that make up the quantization error for each arithmetic layer.

After the error factor analysis, the calculation unit 152 determines whether or not the evaluation of the entire data set is completed (step S103).

When the calculation unit 152 determines that the evaluation of the entire data set has not been completed (step S103; No), the calculation unit 152 prepares the next data (step S104), and returns to the processing procedure of the above step S101.

On the other hand, when the calculation unit 152 determines that the evaluation of the entire data set has been completed (step S103; Yes), the calculation unit 152 averages the error factors of the entire data set (step S105).

Subsequently, the evaluation unit 153 executes a function of analyzing and visualizing the quantization error of each arithmetic layer according to the operation of the administrator of the information processing apparatus 1, and displays the DNN graph GR in the analysis window 121 (step). S106).

After displaying the DNN graph GR, the evaluation unit 153, for example, relativizes the error factors associated with the calculation layer of the DNN graph GR with respect to the original calculation result (step S107), and summarizes each error factor with a specific evaluation index. (Step S108).

After summarizing the error factors, the evaluation unit 153 compares the evaluation index value with the preset threshold value, classifies the contribution of the error factors constituting the quantization error of the corresponding arithmetic layer (step S109), and classifies the contribution as the classification result. The quantization error contribution signal SG is displayed in the analysis window 121 (step S110).

Subsequently, the evaluation unit 153 determines whether or not the quantization error contribution signal of all layers is displayed (step S111).

When the evaluation unit 153 determines that the quantization error contribution signal of all layers is not displayed (step S111; No), the evaluation unit 153 moves to the processing of the layer in which the quantization error contribution signal is not yet displayed (step). S112), the process returns to the processing procedure of step S107.

On the other hand, when the evaluation unit 153 determines that the quantization error contribution signal of all layers is displayed (step S111; Yes), the evaluation unit 153 ends the display process of the quantization error contribution signal shown in FIG. 7.

The processes of steps S107 to S112 can be executed in any order, such as processing in order from the final calculation layer of the DNN model quantized by the design tool (for example, the quantized DNN model M2).

<6-2. Optimization plan information display processing>
Hereinafter, the display processing of the optimization plan information will be described with reference to FIG. As shown in FIG. 8, the evaluation unit 153 ranks the error factors of the corresponding arithmetic layer based on the evaluation index value according to the operation detection of the administrator of the information processing apparatus 1 with respect to the quantization error contribution signal ( Step S201).

Subsequently, the evaluation unit 153 arranges the error factor names having a large influence on the output error of the quantized DNN model (for example, the quantized DNN model M2) in descending order based on the ranking result. The factor information 121-1 is displayed in a pop-up in the analysis window 121 (step S202).

Subsequently, the evaluation unit 153 acquires an error factor having a higher rank (greater influence) (step S203), and corresponds to the error factor so that the Δ (error) element (Δx, Δw, etc.) included in the error factor becomes smaller. Create a hint for assigning the quantization accuracy of the element (step S204).

Subsequently, the evaluation unit 153 determines whether or not the evaluation index value is less than the threshold value that can be determined to have a small effect based on the created allocation hint (step S205).

When the evaluation unit 153 determines that the evaluation index value is not less than the threshold value (threshold value that can be determined to have a small effect) (step S205; No), the evaluation unit 153 returns to the processing procedure of step S204 and creates another allocation hint.

On the other hand, when the evaluation unit 153 determines that the evaluation index value is less than the threshold value (threshold value that can be determined to have a small influence) (step S205; Yes), it determines whether or not hints have been created for all the error factors having higher ranks. Determination (step S206).

When the evaluation unit 153 determines as a result of the determination that hints have not been created for all the error factors having higher ranks (step S206; No), the process returns to the processing procedure of step S203.

On the other hand, when the evaluation unit 153 determines as a result of the determination that hints are created for all the error factors having higher ranks (step S206; Yes), the hints associated with the error factors having higher ranks are arranged in order. The optimization plan information 121-2 is displayed in the analysis window 121 (step S207), and the display process of the optimization plan information shown in FIG. 8 is terminated.

<< 7. Modification example >>
<7-1. Aggregation of error factors>
The above embodiment shows an example, and various modifications and applications are possible. In the information processing according to the embodiment of the present disclosure, the error factor increases exponentially as the number of arithmetic layers of the DNN increases. Therefore, in a huge DNN model, it is difficult to analyze the error factor with a realistic memory consumption. Hereinafter, a modification example for such a problem will be described. FIG. 9 is a diagram showing an outline of information processing according to a modified example.

As shown in FIG. 9, in the information processing related to the modification, only the error factor (for example, factor _{m + 1_P} _{) that recorded the maximum value among the error factors Em + 1} summarized by the specific evaluation index is independently held. Other error factors are aggregated and held as the factor sum Σ. By repeating this operation up to the output layer, only two factors, the maximum error factor of the evaluation index caused by a specific layer and the sum of the other error factors, are finally obtained. From this information, it is possible to identify the layer that most affects the quantization error from the viewpoint of the evaluation index specified in advance, and the quantization process that is the cause in particular. Therefore, the designer of the DNN model can iteratively make adjustments to a particular layer to improve the overall performance of the quantized DNN model (eg, image recognition performance).

According to the information processing related to the modification shown in FIG. 9, the error factors after finishing the processing of each calculation layer are aggregated into two, that is, the one with the maximum evaluation index value and the total of other error factors. To. Therefore, even if a huge DNN model is designed and the number of arithmetic layers becomes enormous, the memory area for holding the arithmetic results can be suppressed to a constant multiple of 3 times the original, which is a realistic memory consumption. It is possible to analyze the error factor with.

<7-2. Other variants>
The information processing apparatus 1 according to the embodiment of the present disclosure may be realized by a dedicated computer system or a general-purpose computer system.

Further, various programs for realizing the information processing method according to the embodiment of the present disclosure may be stored and distributed in a computer-readable recording medium such as an optical disk, a semiconductor memory, a magnetic tape, or a flexible disk. At this time, for example, the information processing apparatus 1 realizes the information processing method according to the embodiment of the present disclosure by installing and executing various programs on a computer.

Further, various programs for realizing the information processing method according to the embodiment of the present disclosure may be stored in a disk device provided in a server device on a network such as the Internet so that they can be downloaded to a computer or the like. Further, the functions provided by various programs for realizing the information processing method according to the embodiment of the present disclosure may be realized by the cooperation between the OS (Operating System) and the application software. In this case, the part other than the OS may be stored in a medium and distributed, or the part other than the OS may be stored in the server device so that it can be downloaded to a computer or the like.

Further, among the processes described in the above-described embodiment, all or a part of the processes described as being automatically performed can be manually performed, or the processes described as being manually performed can be performed. All or part of it can be done automatically by a known method. In addition, information including processing procedures, specific names, various data and parameters shown in the above documents and drawings can be arbitrarily changed unless otherwise specified. For example, the various information shown in each figure is not limited to the information shown in the figure.

Further, each component of each device shown in the figure is a functional concept, and does not necessarily have to be physically configured as shown in the figure. That is, the specific form of distribution / integration of each device is not limited to the one shown in the figure, and all or part of them may be functionally or physically distributed / physically in arbitrary units according to various loads and usage conditions. Can be integrated and configured.

Further, the above-described embodiments can be appropriately combined in an area where the processing contents do not contradict each other. Further, the order of each step shown in the sequence diagram or the flowchart of the present embodiment can be changed as appropriate.

<< 8. Hardware configuration >>
An example of the hardware configuration of the information processing apparatus according to the embodiment of the present disclosure will be described with reference to FIG. FIG. 10 is a block diagram showing a schematic configuration example of a computer that functions as an information processing apparatus according to the embodiment of the present disclosure. Note that FIG. 10 shows a schematic configuration of a computer that functions as an information processing apparatus 1, and some of the components shown in FIG. 10 may be omitted, and components other than those shown in FIG. 10 may be further omitted. It may be included.

As shown in FIG. 10, the computer 200 functioning as the information processing device 1 includes, for example, a CPU 201, a ROM 202, a RAM 203, an interface 204, an input device 205, an output device 206, a storage 207, and a drive 208. , Port 209 and communication device 210.

The CPU 201 functions as, for example, an arithmetic processing device or a control device, and controls all or a part of the operation of each component based on various programs recorded in the ROM 202. Various programs stored in the ROM 202 may be recorded in the storage 207 or the recording medium 301 connected via the drive 208. In this case, the CPU 201 controls all or a part of the operation of each component based on the program stored in the recording medium 301. The various programs include programs that provide various functions for realizing information processing of the information processing apparatus 1.

The ROM 202 functions as an auxiliary storage device for storing programs read into the CPU 201, data used for calculations, and the like. The RAM 203 functions as a main storage device for temporarily or permanently storing, for example, a program read into the CPU 201 and various parameters that are appropriately changed when the program read into the CPU 201 is executed.

The CPU 201, ROM 202, and RAM 203 realize various functions such as the generation unit 151, the calculation unit 152, and the evaluation unit 153 included in the control unit 150 described above in collaboration with software (various programs stored in the ROM 202 and the like). Can be.

The CPU 201, ROM 202, and RAM 203 are connected to each other via the bus 211. Further, the bus 211 is connected to each part of the computer 200 via the interface 204.

The input device 205 is realized by a device such as a mouse, a keyboard, a touch panel, a button, a switch, and a lever, in which information is input by a user. The input device 205 may be a remote controller capable of transmitting a control signal using infrared rays or other radio waves. Further, the input device 205 may include a voice input device such as a microphone. The function of the input unit 110 described above can be realized by the input device 205.

The output device 206 visually or audibly gives the acquired information to the user, for example, a display device such as a CRT, LCD, or an organic EL, an audio output device such as a speaker or headphones, a printer, a mobile phone, or a facsimile. It is a device that can be notified. The function of the output unit 120 described above can be realized by the output device 206.

The storage 207 is a device for storing various types of data, and for example, a magnetic storage device such as a hard disk drive (HDD), a semiconductor storage device, an optical storage device, an optical magnetic storage device, or the like is used. The function of the storage unit 140 described above can be realized by the storage 9.

The drive 208 is, for example, a device for reading out information recorded on the recording medium 301 and writing information to the recording medium 301. The recording medium 301 includes a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, and the like.

The port 209 is a connection port for connecting an external device 302, and includes a USB (Universal Serial Bus) port, an IEEE1394 port, a SCSI (Small Computer System Interface), an RS-232C port, an optical audio terminal, and the like. The external device 302 includes a printer, a portable music player, a digital camera, a digital video camera, an IC recorder, and the like.

The communication device 210 is a communication interface for connecting to a network. The communication device 210 is, for example, a communication card for a wired or wireless LAN (Local Area Network), LTE (Long Term Evolution), Bluetooth (registered trademark), WUSB (Wireless USB), or the like. Further, the communication device 210 may be a router for optical communication, various communication modems, or the like. The function of the communication unit 130 described above can be realized by the communication device 210.

<< 9. Conclusion >>
As described above, according to one embodiment of the present disclosure, the information processing apparatus 1 includes a generation unit 151, a calculation unit 152, and an evaluation unit 153. The generation unit 151 generates a difference DNN model M3 between a DNN model M1 having a plurality of calculation layers that output calculation results based on input data and a quantized DNN model M2 obtained by quantizing the DNN model M1. The calculation unit 152 sets an error factor constituting the quantization error of each calculation layer of the quantized DNN model M2 based on the calculation results of the calculation layers corresponding to each other of the DNN model M1 and the difference DNN model M3. Derived for each. The evaluation unit 13 evaluates the error factors derived for each arithmetic layer.

This makes it possible to analyze in detail the factors of the DNN performance deterioration due to the quantization when the DNN composed of a plurality of arithmetic layers is quantized.

Further, the calculation unit 152 performs an error for each calculation layer based on the inner product calculation result of the input vector, the weight parameter and the bias parameter for the DNN model M1 and the input vector, the weight parameter and the bias parameter corresponding to the difference DNN model M3. Derivation of factors. As a result, the quantization error generated in each layer of all the connection layers of the DNN can be expressed by linearly combining the terms corresponding to the error factors constituting the quantization error, and can be propagated between the calculation layers.

Further, when the non-linear calculation by the activation function is performed in the calculation layer, the calculation unit 152 approximates the activation function and derives an error factor for each calculation layer based on the approximate activation function. As a result, the calculation result by the activation function can be expressed as a linear combination and propagated between the calculation layers.

Further, when the average value pooling process is performed in the arithmetic layer, the arithmetic unit 152 causes an error in each arithmetic layer in the average value pooling process due to the filter size of the filter used for the average value pooling process and the linear combination of the elements included in the filter. Derive the factor. As a result, the calculation result by the average value pooling can be expressed by a linear combination and propagated between the calculation layers.

Further, the calculation unit 152 determines the difference between the element that should have been selected as the representative value before the quantization and the element selected as the representative value after the quantization when the maximum value pooling process is performed in the calculation layer. It is used to derive the error factor for each arithmetic layer. As a result, the calculation result by the maximum value pooling can be treated as a linear combination and can be propagated between the calculation layers.

Further, the evaluation unit 153 evaluates the degree of influence of the error factor for each arithmetic layer on the output error of the quantized DNN model M2 based on the evaluation index defined in advance. This makes it possible to identify in which arithmetic layer the quantization error has a large effect on the output error.

Further, the evaluation unit 153 displays the degree of influence of the error factor for each arithmetic layer on the output error of the quantized DNN model M2 in different modes according to the evaluation result. As a result, the degree of influence of the error factor for each calculation layer can be recognized at a glance.

Further, the evaluation unit 153 presents detailed information about the error factor for each of the error factors for each calculation layer. This makes it possible to identify the factor that has a large influence among the plurality of factors included in the error factor.

In addition, the evaluation unit 153 presents detailed information as well as advice information for optimizing the quantized DNN model M2. As a result, the quantized DNN model M2 can be adjusted more easily.

Further, when the error factor for each calculation layer is propagated to the calculation layer in the subsequent stage, the evaluation unit 153 aggregates the error factors based on the conditions specified in advance. As a result, even if the DNN model M1 to be quantized is huge, it is possible to analyze the error factor with a realistic memory consumption.

Further, the effects described in the present specification are merely explanatory or exemplary and are not limited. That is, the techniques of the present disclosure may have other effects apparent to those of skill in the art from the description herein, in addition to, or in lieu of, the above effects.

The technology of the present disclosure can be configured as follows, assuming that it belongs to the technical scope of the present disclosure.
(1)
A generator that generates a difference model between a DNN model having a plurality of calculation layers that output calculation results based on input data and a quantized DNN model obtained by quantizing the DNN model.
An operation for deriving an error factor constituting the quantization error of each arithmetic layer of the quantized DNN model for each arithmetic layer based on the arithmetic results of the arithmetic layers corresponding to each other of the DNN model and the difference model. Department and
An information processing device including an evaluation unit for evaluating the error factor derived for each calculation layer.
(2)
The calculation unit
Based on the result of the inner product calculation of the input vector, weight parameter and bias parameter for the DNN model and the corresponding input vector, weight parameter and bias parameter in the difference model, the error factor for each calculation layer is derived ( The information processing device according to 1).
(3)
The calculation unit
Described in (2) above, when a non-linear operation by an activation function is performed in the arithmetic layer, the activation function is approximated and the error factor for each arithmetic layer is derived based on the approximated activation function. Information processing equipment.
(4)
The calculation unit
When the average value pooling process is performed in the arithmetic layer, the error factor for each arithmetic layer in the average value pooling process is derived from the filter size of the filter used for the average value pooling process and the linear combination of the elements included in the filter. The information processing apparatus according to (3) above.
(5)
The calculation unit
When the maximum value pooling process is performed in the calculation layer, the calculation layer uses the difference between the element that should have been selected as the representative value before the quantization and the element selected as the representative value after the quantization. The information processing apparatus according to (3) above, which derives the error factor for each.
(6)
The evaluation unit
The degree of influence of the error factor for each arithmetic layer on the output error included in the quantized output of the quantized DNN model is evaluated based on a predetermined evaluation index (1) to (5). ) Is described in any one of the information processing devices.
(7)
The evaluation unit
The information processing apparatus according to (6), wherein the degree of influence of the error factor for each calculation layer on the error included in the quantized output is displayed in different modes according to the evaluation result.
(8)
The evaluation unit
The information processing apparatus according to (6) or (7) above, which presents detailed information about the error factor for each of the error factors for each calculation layer.
(9)
The evaluation unit
The information processing apparatus according to any one of (6) to (8) above, which presents advice information for optimizing the quantized DNN model together with the detailed information.
(10)
The evaluation department
The information processing apparatus according to (1), wherein when the error factor for each calculation layer is propagated to the calculation layer in the subsequent stage, the error factors are aggregated based on predetermined conditions.
(11)
The processor,
A difference model is generated between a DNN model having a plurality of calculation layers that output calculation results based on input data and a quantized DNN model obtained by quantizing the DNN model.
Based on the calculation results of the calculation layers corresponding to each other of the DNN model and the difference model, an error factor constituting the quantization error of each calculation layer of the quantized DNN model is derived for each calculation layer.
An information processing method for evaluating the error factor derived for each arithmetic layer.

1 Information processing unit 110 Input unit 120 Output unit 121 Analysis window 121-1 Error factor information 121-2 Optimization plan information 130 Communication unit 140 Storage unit 141 Design tool storage unit 142 Evaluation threshold storage unit 150 Control unit 151 Generation unit 152 Calculation unit 153 Evaluation unit 200 Computer 201 CPU
202 ROM
203 RAM
204 interface (I / F)
205 Input device 206 Output device 207 Storage 208 Drive 209 Port 210 Communication device 211 Bus

Claims

A generator that generates a difference model between a DNN model having a plurality of calculation layers that output calculation results based on input data and a quantized DNN model obtained by quantizing the DNN model.
An operation for deriving an error factor constituting the quantization error of each arithmetic layer of the quantized DNN model for each arithmetic layer based on the arithmetic results of the arithmetic layers corresponding to each other of the DNN model and the difference model. Department and
An information processing device including an evaluation unit for evaluating the error factor derived for each calculation layer.
The calculation unit
Claim to derive the error factor for each calculation layer based on the inner product calculation result of the input vector, weight parameter and bias parameter for the DNN model and the corresponding input vector, weight parameter and bias parameter in the difference model. The information processing apparatus according to 1.
The calculation unit
The second aspect of claim 2, wherein when the non-linear operation by the activation function is performed in the arithmetic layer, the activation function is approximated and the error factor for each arithmetic layer is derived based on the approximated activation function. Information processing device.
The calculation unit
When the average value pooling process is performed in the arithmetic layer, the error factor for each arithmetic layer in the average value pooling process is derived from the filter size of the filter used for the average value pooling process and the linear combination of the elements included in the filter. The information processing apparatus according to claim 3.
The calculation unit
When the maximum value pooling process is performed in the calculation layer, the calculation layer uses the difference between the element that should have been selected as the representative value before the quantization and the element selected as the representative value after the quantization. The information processing apparatus according to claim 3, wherein the error factor for each is derived.
The evaluation unit
The information processing apparatus according to claim 1, which evaluates the degree of influence of the error factor for each arithmetic layer on the error included in the quantized output of the quantized DNN model based on a predetermined evaluation index. ..
The evaluation unit
The information processing apparatus according to claim 6, wherein the degree of influence of the error factor for each arithmetic layer on the error included in the quantized output is displayed in different modes according to the evaluation result.
The evaluation unit
The information processing apparatus according to claim 7, wherein detailed information about the error factor is presented for each of the error factors for each calculation layer.
The evaluation unit
The information processing apparatus according to claim 8, which presents advice information for optimizing the quantized DNN model together with the detailed information.
The calculation unit
The information processing apparatus according to claim 1, wherein when the error factor for each calculation layer is propagated to the calculation layer in the subsequent stage, the error factors are aggregated based on predetermined conditions.
The processor,
A difference model is generated between a DNN model having a plurality of calculation layers that output calculation results based on input data and a quantized DNN model obtained by quantizing the DNN model.
Based on the calculation results of the calculation layers corresponding to each other of the DNN model and the difference model, an error factor constituting the quantization error of each calculation layer of the quantized DNN model is derived for each calculation layer.
An information processing method for evaluating the error factor derived for each arithmetic layer.