WO2021246249A1 - Information processing device and information processing method - Google Patents


Info

Publication number
WO2021246249A1
Authority
WO
WIPO (PCT)
Prior art keywords
calculation
layer
dnn model
error
information processing
Application number
PCT/JP2021/019876
Other languages
French (fr)
Japanese (ja)
Inventor
Takayuki Ujiie
Original Assignee
Sony Group Corporation
Application filed by Sony Group Corporation
Publication of WO2021246249A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Description

  • This disclosure relates to an information processing device and an information processing method.
  • In the following, "DNN" stands for Deep Neural Network.
  • Patent Document 1 proposes a computer system that dynamically optimizes bit accuracy during training in order to reduce the demand for computational resources in a neural network.
  • Patent Document 2 proposes a method and an apparatus for quantizing the parameters of a neural network.
  • Since a DNN has the characteristic of accumulating operations across multiple layers, it is difficult to analyze in detail the cause of the performance deterioration of the DNN due to quantization even by examining the final output result of the quantized DNN.
  • the information processing apparatus of one form according to the present disclosure includes a generation unit, a calculation unit, and an evaluation unit.
  • the generation unit generates a difference model between a DNN model having a plurality of calculation layers that output calculation results based on input data and a quantized DNN model obtained by quantizing the DNN model.
  • the calculation unit derives an error factor constituting the quantization error of each calculation layer of the quantized DNN model for each calculation layer based on the calculation results of the calculation layers corresponding to each other of the DNN model and the difference model.
  • the evaluation unit evaluates the error factors derived for each arithmetic layer.
  • Patent Document 1 and Patent Document 2 propose a method for dynamically determining the quantization bit accuracy.
  • However, these methods also do not quantitatively analyze the influence of each layer on the entire network; they either automate trial and error or analyze each layer independently, and therefore share the same problem.
  • The technique according to the present disclosure addresses the above-mentioned problems arising from the characteristics of DNNs, and quantitatively determines which layers should be given more bit precision to mitigate the adverse effect of quantization on performance throughout the network.
  • << Outline of information processing according to the embodiment of the present disclosure >> FIGS. 1 and 2 are diagrams showing an outline of information processing according to the embodiment of the present disclosure.
  • the information processing (information processing method) according to the embodiment of the present disclosure is realized by the information processing apparatus 1 (see FIG. 3) described later.
  • the information processing apparatus 1 generates a quantized DNN model M2 obtained by quantizing the DNN model M1.
  • The quantization method is not particularly limited; for example, quantization by operations using fixed-point numbers (a fixed-point method) can be adopted.
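As an illustrative sketch of such a fixed-point scheme (the bit widths below, 16 bits with 8 fractional bits, are assumed for illustration and are not specified in the disclosure), quantization rounds each value to a fixed-point grid, and the difference from the original value becomes the quantization error that feeds the difference model:

```python
import numpy as np

def quantize_fixed_point(x, frac_bits=8, total_bits=16):
    """Round to a signed fixed-point grid (illustrative bit widths)."""
    scale = 2.0 ** frac_bits
    lo = -(2 ** (total_bits - 1)) / scale
    hi = (2 ** (total_bits - 1) - 1) / scale
    return np.clip(np.round(x * scale) / scale, lo, hi)

x = np.array([0.1234, -1.5, 0.5])   # toy input values
xq = quantize_fixed_point(x)        # quantized input to the quantized model
dx = xq - x                         # quantization error: input to the difference model
```

With 8 fractional bits, each error element is bounded by half a grid step (2^-9).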
  • the information processing apparatus 1 generates a difference DNN model M3 between the DNN model M1 and the quantized DNN model M2.
  • The difference DNN model M3 is a model having a network topology equivalent to that of the DNN model M1 and the quantized DNN model M2, composed of the differences, before and after quantization, of the feature maps and parameters in each layer of the DNN.
  • The DNN model M1, the quantized DNN model M2, and the difference DNN model M3 each have a plurality of corresponding calculation layers L_m to L_n.
  • Each of the calculation layers L_m to L_n corresponds to one of an input layer, an output layer, and a plurality of hidden layers (intermediate layers) (for example, "m" and "n" are positive integers satisfying m + 2 ≤ n).
  • The calculation layers L_m to L_n may correspond to any of fully connected layers, convolution layers, pooling layers, activation functions, or other types of layers, depending on the structure of the network. In FIG. 1, "m" is, for example, an arbitrary integer of 1 or more, and "n" is an arbitrary integer of 4 or more.
  • the DNN model M1 outputs the calculation result for the input data.
  • the quantized DNN model M2 outputs an operation result (quantized output) for the quantized input.
  • the difference DNN model M3 outputs a calculation result (difference output) for the difference input.
  • The information processing apparatus 1 derives, for each calculation layer, an error factor constituting the quantization error of each calculation layer of the quantized DNN model M2, based on the calculation results of the mutually corresponding calculation layers of the DNN model M1 and the difference DNN model M3. That is, the information processing apparatus 1 indirectly expresses the calculation result of each calculation layer of the quantized DNN model M2 as a linear combination of the calculation result of the corresponding calculation layer of the DNN model M1 and the calculation result of the corresponding calculation layer of the difference DNN model M3. In this way, the calculation result of each calculation layer of the quantized DNN model M2 can be decomposed for each calculation layer, and the error factors constituting the quantization error can be propagated to the subsequent calculation layers.
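For a single linear layer, the decomposition described above can be sketched as follows. The shapes, error magnitudes, and factor dictionary keys are illustrative assumptions (the keys echo factor labels such as "Dxw" and "xDw" that appear in the design tool example later in this disclosure):

```python
import numpy as np

rng = np.random.default_rng(0)
x  = rng.normal(size=4)           # original layer input
w  = rng.normal(size=(3, 4))      # original weight parameter
b  = rng.normal(size=3)           # original bias parameter
dx = 0.01 * rng.normal(size=4)    # quantization errors (toy magnitudes)
dw = 0.01 * rng.normal(size=(3, 4))
db = 0.01 * rng.normal(size=3)

y = w @ x + b                     # calculation result of the DNN model layer
# Error factors making up the difference output, kept separate per cause:
factors = {"xDw": dw @ x, "Dxw": w @ dx, "DxDw": dw @ dx, "Db": db}
dy = sum(factors.values())        # calculation result of the difference model layer

yq = (w + dw) @ (x + dx) + (b + db)   # quantized-model layer output
assert np.allclose(yq, y + dy)        # exact linear decomposition per layer
```

Keeping the factors separate is what lets each cause of the quantization error be propagated and evaluated individually.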
  • The information processing apparatus 1 inputs the input data and the difference input data from the calculation layer L_m (input layer), not shown, into the corresponding calculation layers L_{m+1} of the DNN model M1 and the difference DNN model M3, respectively.
  • the difference input data is, for example, difference data between the input data to the DNN model M1 and the quantized input data to the quantized DNN model M2.
  • The information processing apparatus 1 uses the calculation result (output before quantization) of the calculation layer L_{m+1} of the DNN model M1 and the calculation result (difference output) of the calculation layer L_{m+1} of the difference DNN model M3 to derive an error factor E_{m+1} that indirectly represents the calculation result of the calculation layer L_{m+1} of the quantized DNN model M2.
  • The error factor E_{m+1} includes a plurality of factors, factor m+1_1 to factor m+1_z.
  • Factor m+1_1 to factor m+1_z correspond to errors caused by various parameters such as the weight and bias parameters of each calculation layer, the non-linear component of the activation function, differences between selected elements in the pooling process, and the like.
  • The information processing apparatus 1 outputs the derived error factor E_{m+1} of the calculation layer L_{m+1}, together with the outputs of the calculation layer L_{m+1}, to the corresponding calculation layers L_{m+2} of the DNN model M1 and the difference DNN model M3.
  • Similarly, the information processing apparatus 1 derives an error factor E_{m+2} of the calculation layer L_{m+2}. Then, the information processing apparatus 1 outputs the derived error factor E_{m+2}, together with the outputs of the calculation layer L_{m+2}, to the corresponding calculation layers L_{m+3} of the DNN model M1 and the difference DNN model M3.
  • In this way, the information processing apparatus 1 sequentially propagates the error factor of each calculation layer to the subsequent calculation layer, and derives the error factors constituting the quantization error of each calculation layer up to the calculation layer L_{n-1}. Then, the information processing apparatus 1 produces, from the calculation layer L_n (output layer), not shown, the final output corresponding to the quantized output of the quantized DNN model M2.
  • The information processing device 1 evaluates the error factors derived for each calculation layer. Specifically, the information processing apparatus 1 evaluates, based on a predetermined evaluation index, the degree of influence that the error factor of each calculation layer has on the error (output error) included in the quantized output of the quantized DNN model M2.
  • As described above, the information processing apparatus 1 derives, for each calculation layer, the error factors constituting the quantization error of each calculation layer of the quantized DNN model M2, based on the calculation results of the mutually corresponding calculation layers of the DNN model M1 and the difference DNN model M3. Then, the information processing apparatus 1 evaluates the error factors derived for each calculation layer. As a result, the information processing apparatus 1 can analyze in detail the factors of the DNN performance deterioration due to quantization.
  • FIG. 3 is a block diagram showing a configuration example of the information processing apparatus according to the embodiment of the present disclosure.
  • the information processing apparatus 1 includes an input unit 110, an output unit 120, a communication unit 130, a storage unit 140, and a control unit 150.
  • the input unit 110 detects an input operation by the administrator of the information processing device 1.
  • the control unit 150 which will be described later, can input, for example, a data set for evaluating the quantization error according to the input operation detected by the input unit 110.
  • the input unit 110 can be realized by, for example, various buttons, a keyboard, a touch panel, a mouse, a switch, and the like.
  • the output unit 120 outputs various information.
  • the output unit 120 may be configured to include a display device that outputs visual information.
  • the output unit 120 can display the window of the analysis tool executed in response to the operation from the administrator.
  • the display device can be realized by, for example, a CRT (Cathode Ray Tube), an LCD (Liquid Crystal Display), an OLED (Organic Light Emitting Diode), or the like.
  • the communication unit 130 can be realized by, for example, a NIC (Network Interface Card), various communication modems, or the like.
  • the communication unit 130 is connected to a network (Internet or the like) by wire or wirelessly, and transmits / receives information to / from an external device or the like via the network.
  • the storage unit 140 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory (Flash Memory), or a storage device such as a hard disk or an optical disk.
  • the storage unit 140 functions as a storage means for the control unit 150.
  • the storage unit 140 has a design tool storage unit 141 and an evaluation threshold storage unit 142.
  • the design tool storage unit 141 stores a design tool that provides various functions for designing a DNN model (for example, a DNN model M1).
  • The design tool may include an analysis function for analyzing the quantization error of a quantized DNN model (for example, the quantized DNN model M2) obtained by quantizing the designed DNN model (for example, the DNN model M1).
  • This analysis function can provide the administrator of the information processing apparatus 1 with a function for analyzing and visualizing the quantization error.
  • the evaluation threshold storage unit 142 stores a threshold for evaluating the quantization error for each arithmetic layer.
  • the evaluation threshold storage unit 142 is used for the evaluation process of the evaluation unit 153, which will be described later.
  • the control unit 150 is a controller that controls each unit of the information processing device 1.
  • the control unit 150 is realized by a processor such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a GPU (Graphics Processing Unit), for example.
  • the control unit 150 is realized by the processor executing various programs stored inside the information processing apparatus 1 with a RAM (Random Access Memory) or the like as a work area.
  • the control unit 150 may be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
  • the control unit 150 includes a generation unit 151, a calculation unit 152, and an evaluation unit 153, and realizes or executes the functions and operations of the information processing device 1 described below.
  • Each block (generation unit 151 to evaluation unit 153) constituting the control unit 150 is a functional block indicating the function of the control unit 150, respectively.
  • These functional blocks may be software blocks or hardware blocks.
  • each of the above-mentioned functional blocks may be one software module realized by software (including a microprogram), or may be one circuit block on a semiconductor chip (die).
  • each functional block may be one processor or one integrated circuit.
  • the method of configuring the functional block is arbitrary.
  • the control unit 150 may be configured in a functional unit different from the above-mentioned functional block.
  • the generation unit 151 generates a difference DNN model M3 between a DNN model M1 having a plurality of calculation layers that output calculation results based on input data and a quantized DNN model M2 obtained by quantizing the DNN model M1.
  • the quantization method of the DNN model M1 is not limited to a specific method.
  • The difference DNN model M3 between the DNN model M1 and the quantized DNN model M2 is a model composed of the differences, before and after quantization, of the feature maps and parameters in each layer of the DNN, and has a network topology equivalent to that of the original DNN model M1. That is, the DNN model M1, the quantized DNN model M2, and the difference DNN model M3 each include corresponding layers.
  • The calculation unit 152 derives, for each calculation layer, the error factors constituting the quantization error of each calculation layer of the quantized DNN model M2, based on the calculation results of the mutually corresponding calculation layers of the DNN model M1 and the difference DNN model M3. That is, the calculation unit 152 indirectly expresses the calculation result of each calculation layer of the quantized DNN model M2 as a linear combination of the calculation result of the corresponding calculation layer of the DNN model M1 and the calculation result of the corresponding calculation layer of the difference DNN model M3. In this way, the calculation result of each calculation layer of the quantized DNN model M2 can be decomposed for each calculation layer, and the error factors can be propagated to the subsequent calculation layers.
  • The calculation unit 152 inputs the calculation result of a calculation layer of the DNN model M1 and the calculation result of the corresponding calculation layer of the difference DNN model M3 into the next calculation layers of the DNN model M1 and the difference DNN model M3. Then, using the calculation result output from the DNN model M1 and the calculation result output from the difference DNN model M3, the calculation unit 152 derives the error factors that indirectly express, and constitute, the quantization error of the corresponding calculation layer of the quantized DNN model M2. After deriving the error factors, the calculation unit 152 propagates them to the next calculation layer and sequentially executes the same processing, thereby deriving, for each calculation layer, the error factors constituting the quantization error of each calculation layer of the quantized DNN model M2. The details of the processing of the calculation unit 152 will be described later with reference to the drawings.
  • The evaluation unit 153 evaluates each error factor derived for each calculation layer by the calculation unit 152. Specifically, the evaluation unit 153 calculates and evaluates the degree of influence that the error factor derived for each calculation layer has on the error (output error) included in the quantized output of the quantized DNN model M2. Further, the evaluation unit 153 presents detailed information regarding the error factors for each calculation layer. In addition, the evaluation unit 153 presents advice information for optimizing the quantized DNN model M2. The details of the processing of the evaluation unit 153 will be described later with reference to the drawings.
  • For a fully connected layer, the calculation unit 152 decomposes the quantization error of each calculation layer as a sum of error factors, based on the result of the inner product calculation of the input vector, weight parameter, and bias parameter of the DNN model M1 and of the corresponding input vector, weight parameter, and bias parameter of the difference DNN model M3.
  • Let x be the input vector (image) to a fully connected layer, w the weight parameter, and b the bias parameter.
  • Let Δx, Δw, and Δb be the corresponding elements of the difference DNN model M3.
  • The output vector (image) y + Δy of the fully connected layer in the quantized DNN model M2 can then be expanded and expressed as the following equation (1):

    y + Δy = (w + Δw)·(x + Δx) + (b + Δb) = (w·x + b) + (Δw·x + w·Δx + Δw·Δx + Δb) … (1)

  • Here, "·" represents a matrix product (or a matrix-vector product).
  • In an actual quantization operation, the calculation result itself may be held with a higher accuracy than the quantization bit accuracy and, when propagated to the subsequent stage (layer), be quantized again to match the bit accuracy.
  • In that case, overall consistency can be maintained by adding, as Δq, the residual between the difference output of the difference DNN model M3 and Δy.
  • FIG. 4 is a diagram for explaining an outline of approximation of the activation function according to the embodiment of the present disclosure.
  • FIG. 4 shows a case where an input x_1 + Δx_1 containing a quantization error Δx_1 is input to the activation function f.
  • The calculation unit 152 can separate the non-linearity of the activation function f into a term Δf by linearly approximating the activation function f. That is, the output f(x_1 + Δx_1) of the activation function in the quantized DNN model M2 can be expressed, using the differential coefficient f'(x_1) of the activation function f, as the following equation (3):

    f(x_1 + Δx_1) ≈ f(x_1) + f'(x_1)·Δx_1 … (3)

    Δf = f'(x_1)·Δx_1 ≈ f(x_1 + Δx_1) − f(x_1) … (4)
  • For the average value pooling process, the calculation unit 152 derives the error factors constituting the quantization error for each calculation layer from the filter size of the filter used for the average value pooling and the linear combination of the elements included in the filter.
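Since average value pooling is a linear combination of the filter elements, the error factor of such a layer is simply the pooled error map. A minimal 2x2, stride-2 sketch with toy shapes (not taken from the disclosure):

```python
import numpy as np

def avg_pool2x2(img):
    """2x2 average pooling with stride 2 (minimal sketch)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

rng = np.random.default_rng(1)
x  = rng.normal(size=(4, 4))
dx = 0.01 * rng.normal(size=(4, 4))   # per-element quantization errors

# Linearity: the pooled error map is exactly the error of the pooled output.
assert np.allclose(avg_pool2x2(x + dx) - avg_pool2x2(x), avg_pool2x2(dx))
```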
  • For the maximum value pooling process, the calculation unit 152 uses the difference between the element that should have been selected as the representative value before quantization and the element actually selected as the representative value after quantization to derive the error factors constituting the quantization error for each calculation layer.
  • FIG. 5 is a diagram showing an outline of a method for calculating an error factor in the maximum value pooling according to the embodiment of the present disclosure.
  • the left figure of FIG. 5 shows an example of selection of representative values before quantization, and the right figure of FIG. 5 shows an example of selection of representative values after quantization.
  • In the left figure, the pixel value x0 is selected as the representative value, whereas in the right figure, the pixel value x3 + Δx3 is selected as the representative value; that is, the representative values selected before and after quantization differ.
  • When the same element is selected before and after quantization, the filtering process merely propagates the calculation result and the error factor of the previous layer to the subsequent stage, and there is no problem.
  • When a different element is selected, however, the error factor also changes, which is a problem.
  • The calculation unit 152 handles this by introducing a difference (vector) Δp between the representative value originally selected before quantization and the representative value actually selected after quantization. Δp is 0 (zero) for the parts where the same element is selected before and after quantization.
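The role of Δp can be sketched on a single pooling window; the window values and error values below are illustrative:

```python
import numpy as np

x  = np.array([0.9, 0.3, 0.5, 0.2])     # one pooling window, before quantization
dx = np.array([-0.05, 0.0, 0.45, 0.0])  # quantization errors (toy values)

i_pre  = np.argmax(x)        # element that should have been selected (x0 here)
i_post = np.argmax(x + dx)   # element actually selected after quantization
dp = x[i_post] - x[i_pre]    # correction Delta-p for the changed selection

# The quantized pooled output decomposes into the original representative value,
# the propagated error of the selected element, and the selection-change term dp:
out_q = (x + dx)[i_post]
assert np.isclose(out_q, x[i_pre] + dx[i_post] + dp)
```

When quantization does not change which element wins, dp is zero and the decomposition reduces to plain error propagation.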
  • The error factor itself does not reflect the change in the address (position) of the selected element but only the change in the calculation result; if necessary, however, an auxiliary index can be calculated and used for analysis.
  • << Example of error factor analysis according to the embodiment of the present disclosure >> Hereinafter, an example of error factor analysis according to the embodiment of the present disclosure will be described.
  • the function of error factor analysis according to the embodiment of the present disclosure can be additionally realized, for example, as an analysis function of a design tool or a library stored in the design tool storage unit 141.
  • The administrator of the information processing apparatus 1 designs a DNN using the design tool, quantizes the DNN model M1 by a specific method, and generates the quantized DNN model M2.
  • The administrator of the information processing apparatus 1 can use the function, introduced in the design tool, of analyzing and visualizing the quantization error for each calculation layer to gather information for adjusting quantization parameters such as the quantization bit width of each calculation layer, or for changing the network structure. The error factors constituting the quantization error appear as differences from the output results and feature maps of the original (pre-quantization) DNN model M1. Therefore, the information collected for each calculation layer can be processed and visualized in various ways and used as auxiliary information for finer-grained quantization such as per-channel quantization.
  • FIG. 6 is a diagram showing an information display example of the design tool according to the embodiment of the present disclosure.
  • FIG. 6 shows an example of the analysis window 121 (“Tensor Board”) displayed on the output unit 120 by the design tool.
  • a DNN graph GR that visualizes the network structure of a quantized DNN model (for example, a quantized DNN model M2) obtained by quantizing a DNN model (for example, the DNN model M1) is displayed.
  • FIG. 6 shows, as an example, a DNN graph GR representing the network structure of a convolutional neural network.
  • Each block of "conv1", "conv2", "pool2", "vector", "dense3", and "output" corresponds to a calculation layer of the quantized DNN model (for example, the quantized DNN model M2).
  • The error factors derived for each calculation layer by the calculation unit 152, which constitute the quantization error propagated to the output layer, have the same dimensions for each term and can therefore be summarized and compared using a common evaluation index.
  • The evaluation unit 153 evaluates, based on a predetermined evaluation index, the degree of influence that the error factor of each calculation layer has on the error included in the quantized output of the quantized DNN model (for example, the quantized DNN model M2) (hereinafter referred to as the "output error"). Examples of the evaluation index include the average value and the maximum value after taking the absolute value of each element, and the length of the entire factor (vector) (the L1, L∞, or L2 norm). By applying such an evaluation index, each error factor constituting the quantization error can be summarized into a scalar value and used for analysis of the behavior change accompanying quantization.
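These evaluation indices can be sketched as follows; the function name and the sample factor tensor are illustrative:

```python
import numpy as np

def summarize(factor):
    """Collapse one error-factor tensor into scalar evaluation indices."""
    v = np.ravel(factor)
    return {
        "mean_abs": float(np.mean(np.abs(v))),  # average of absolute values
        "max_abs":  float(np.max(np.abs(v))),   # L-infinity norm
        "l1":       float(np.sum(np.abs(v))),   # L1 norm
        "l2":       float(np.linalg.norm(v)),   # L2 norm
    }

idx = summarize(np.array([[0.1, -0.2], [0.0, 0.4]]))
```

Each factor thus becomes a single scalar per index, which makes factors from different layers directly comparable.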
  • The evaluation unit 153 displays the degree of influence (contribution) of the error factor of each calculation layer on the output error in different modes according to the evaluation result. Specifically, for example, as shown in FIG. 6, the evaluation unit 153 displays quantization error contribution signals SG1 to SG6 at the upper left of each block of "conv1", "conv2", "pool2", "vector", "dense3", and "output", respectively. The quantization error contribution signals SG1 to SG6 indicate the degree (magnitude) of the influence that the error factor of each calculation layer, in terms of the quantization error (or its evaluation index) for that layer, has on the output error of the quantized DNN model (for example, the quantized DNN model M2).
  • The quantization error contribution signals SG1 to SG6 can display the contribution indicating the degree of influence by changing the display mode, such as the color or pattern. For example, a signal corresponding to a calculation layer with a high contribution can be displayed in red, a signal corresponding to a calculation layer with a medium contribution in yellow, and a signal corresponding to a calculation layer with a low contribution in green.
  • The contribution can be evaluated based on the evaluation threshold values preset by the administrator of the information processing apparatus 1 and stored in the evaluation threshold storage unit 142.
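The threshold-based classification into display colors might look like the following sketch; the numeric thresholds are placeholders standing in for the administrator-set values in the evaluation threshold storage unit 142:

```python
def classify_contribution(index_value, mid=0.01, high=0.1):
    """Map a layer's evaluation-index value to a contribution signal color.
    The thresholds mid/high are hypothetical placeholders."""
    if index_value >= high:
        return "red"      # high contribution
    if index_value >= mid:
        return "yellow"   # medium contribution
    return "green"        # low contribution

assert classify_contribution(0.2) == "red"
assert classify_contribution(0.05) == "yellow"
assert classify_contribution(0.001) == "green"
```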
  • The evaluation unit 153 presents detailed information for each of the error factors of each calculation layer. Specifically, when the analysis window 121 detects, for example, an operation on the quantization error contribution signal SG2, the evaluation unit 153 presents, in a pop-up, more detailed internal error factor information 121-1 of the calculation layer ("conv2") corresponding to the quantization error contribution signal SG2.
  • The error factor information 121-1 displays, for example, the names of the error factors ("Dxw", "xDw", "Df", "Dq", etc.) in descending order of contribution (degree of influence), based on a default setting or on the magnitude of a pre-specified evaluation index.
  • the administrator of the information processing apparatus 1 can adjust the network of the quantized DNN model M2 by referring to the error factor information 121-1.
  • The evaluation unit 153 presents, together with the detailed information on the error factors, advice information for optimizing the quantized DNN model (for example, the quantized DNN model M2). Specifically, the evaluation unit 153 presents optimization plan information ("Optimization Hint") 121-2 of the quantized DNN model (for example, the quantized DNN model M2) in the pop-up displaying the error factor information 121-1.
  • The optimization plan information ("Optimization Hint") 121-2 includes hints such as "increase the total number of bits of w by x bits", "shift the decimal part of x by y bits", and "increase the number of arithmetic bits by z bits". The optimization hints associated with the error factors that have a large influence on the calculation layer are displayed in order from the top.
  • The administrator of the information processing apparatus 1 can perform network adjustment of the quantized DNN model (for example, the quantized DNN model M2) more easily by selecting a plan from the optimization plan information 121-2.
  • FIG. 7 is a flowchart showing a processing procedure example of the contribution signal display processing according to the embodiment of the present disclosure.
  • FIG. 8 is a flowchart showing a processing procedure example of the optimization plan information display processing according to the embodiment of the present disclosure.
  • The display process of the quantization error contribution signal shown in FIG. 7 is composed of a procedure PH1 for deriving the quantization error and a procedure PH2 for classifying and displaying the quantization error contribution signal.
  • the calculation unit 152 inputs the data set to be processed, evaluates the DNN model before and after quantization (step S101), and analyzes the error factor (step S102).
  • the error factor analysis derives the error factors that make up the quantization error for each arithmetic layer.
  • the calculation unit 152 determines whether or not the evaluation of the entire data set is completed (step S103).
  • When the calculation unit 152 determines that the evaluation of the entire data set has not been completed (step S103; No), it prepares the next data (step S104) and returns to the processing procedure of step S101.
  • On the other hand, when the calculation unit 152 determines that the evaluation of the entire data set has been completed (step S103; Yes), it averages the error factors over the entire data set (step S105).
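Steps S101 to S105 amount to accumulating per-layer error factors over the data set and averaging them. A minimal sketch, in which analyze_errors is a hypothetical stand-in for the per-sample error factor analysis:

```python
import numpy as np

def analyze_errors(sample):
    """Stand-in for steps S101-S102: per-layer error-factor maps for one sample."""
    return {"conv1": np.abs(np.sin(sample)), "conv2": np.abs(np.cos(sample))}

dataset = [np.array([0.1, 0.2]), np.array([0.3, 0.4])]  # toy data set

totals = {}
for sample in dataset:                    # loop of steps S101-S104
    for layer, factor in analyze_errors(sample).items():
        totals[layer] = totals.get(layer, 0.0) + factor
# Step S105: average the accumulated error factors over the data set.
avg = {layer: t / len(dataset) for layer, t in totals.items()}
```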
  • The evaluation unit 153 executes the function of analyzing and visualizing the quantization error of each calculation layer according to the operation of the administrator of the information processing apparatus 1, and displays the DNN graph GR in the analysis window 121 (step S106).
  • After displaying the DNN graph GR, the evaluation unit 153, for example, relativizes the error factors associated with each calculation layer of the DNN graph GR with respect to the original calculation result (step S107), and summarizes each error factor with a specific evaluation index (step S108).
  • The evaluation unit 153 compares the evaluation index value with the preset threshold value, classifies the contribution of the error factors constituting the quantization error of the corresponding calculation layer (step S109), and displays the quantization error contribution signal SG corresponding to the classification result in the analysis window 121 (step S110).
  • The evaluation unit 153 determines whether or not the quantization error contribution signals of all layers have been displayed (step S111).
  • When the evaluation unit 153 determines that the quantization error contribution signals of all layers have not yet been displayed (step S111; No), it moves to the processing of a layer whose quantization error contribution signal has not yet been displayed (step S112), and returns to the processing procedure of step S107.
  • On the other hand, when the evaluation unit 153 determines that the quantization error contribution signals of all layers have been displayed (step S111; Yes), it ends the display process of the quantization error contribution signal shown in FIG. 7.
  • Steps S107 to S112 can be executed in any order, for example, in order from the final calculation layer of the DNN model quantized by the design tool (for example, the quantized DNN model M2).
  • Upon detecting an operation by the administrator of the information processing apparatus 1 on a quantization error contribution signal, the evaluation unit 153 ranks the error factors of the corresponding calculation layer based on the evaluation index values (step S201).
  • based on the ranking result, the evaluation unit 153 arranges the names of the error factors having a large influence on the output error of the quantized DNN model (for example, the quantized DNN model M2) in descending order, and displays them as the error factor information 121-1 in a pop-up in the analysis window 121 (step S202).
  • the evaluation unit 153 acquires an error factor of higher rank (greater influence) (step S203), and creates a hint for assigning the quantization accuracy of the element corresponding to the error factor so that the Δ (error) element (Δx, Δw, etc.) included in the error factor becomes smaller (step S204).
  • based on the created allocation hint, the evaluation unit 153 determines whether or not the evaluation index value falls below the threshold value at which the influence can be judged to be small (step S205).
  • when the evaluation unit 153 determines that the evaluation index value is not less than the threshold value (the threshold value at which the influence can be judged to be small) (step S205; No), it returns to step S204 and creates another allocation hint.
  • when the evaluation unit 153 determines that the evaluation index value is less than the threshold value (the threshold value at which the influence can be judged to be small) (step S205; Yes), it determines whether or not hints have been created for all of the higher-ranked error factors (step S206).
  • when the evaluation unit 153 determines that hints have not been created for all of the higher-ranked error factors (step S206; No), the process returns to step S203.
  • when the evaluation unit 153 determines that hints have been created for all of the higher-ranked error factors (step S206; Yes), it arranges the hints associated with the higher-ranked error factors in order, displays them as the optimization plan information 121-2 in the analysis window 121 (step S207), and ends the display process of the optimization plan information shown in FIG. 8.
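The ranking and hint-creation loop of FIG. 8 (steps S201 to S207) can be sketched as follows. The stopping threshold, the bit-widening rule, and the factor names are hypothetical placeholders introduced for the example; the embodiment does not specify them.

```python
# Sketch (assumption): rank error factors by evaluation-index value and emit
# quantization-accuracy allocation hints for the dominant ones.

def make_hints(factor_indices, threshold=0.1):
    """Return hint strings for factors whose index value exceeds `threshold`."""
    ranked = sorted(factor_indices.items(), key=lambda kv: kv[1], reverse=True)
    hints = []
    for name, value in ranked:
        if value < threshold:          # influence small enough; stop (cf. S205/S206)
            break
        bits = 8
        while value >= threshold:      # widen precision until influence looks small
            bits += 2
            value /= 4.0               # toy model: 2 extra bits quarter the error
        hints.append(f"assign {bits}-bit precision to element of {name}")
    return hints

# hypothetical evaluation-index values for the Δ elements of one layer
hints = make_hints({"dx_w": 0.35, "x_dw": 0.02, "db": 0.12})

assert len(hints) == 2               # only factors above the threshold get hints
assert hints[0].endswith("dx_w")     # largest factor is treated first
```

The resulting list corresponds loosely to the optimization plan information 121-2 displayed in step S207.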
  • FIG. 9 is a diagram showing an outline of information processing according to a modified example.
  • in the modified example, the error factors remaining after the processing of each calculation layer are aggregated into two: the factor with the maximum evaluation index value and the total of the other error factors.
  • as a result, the memory area for holding the calculation results can be kept to a constant multiple (three times) of the original, making it possible to analyze the error factors with a realistic memory consumption.
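A minimal sketch of this aggregation rule follows; the factor names and values are assumed for illustration only.

```python
# Sketch (assumption): after each layer, keep only the dominant error factor
# and fold all remaining factors into a single "others" bucket, so the number
# of tracked terms stays constant per layer.

def aggregate(factors):
    """Collapse a factor dict to {dominant_factor: value, "others": rest}."""
    dominant = max(factors, key=factors.get)
    others = sum(v for k, v in factors.items() if k != dominant)
    return {dominant: factors[dominant], "others": others}

factors = {"layer1:dx_w": 0.40, "layer1:x_dw": 0.05, "layer2:db": 0.15}
agg = aggregate(factors)

assert set(agg) == {"layer1:dx_w", "others"}
# the total quantization error is preserved by the aggregation
assert abs(sum(agg.values()) - sum(factors.values())) < 1e-12
```

Because only two entries survive per layer, the memory needed to carry the factors forward is bounded regardless of network depth.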
  • the information processing apparatus 1 may be realized by a dedicated computer system or a general-purpose computer system.
  • various programs for realizing the information processing method according to the embodiment of the present disclosure may be stored and distributed in a computer-readable recording medium such as an optical disk, a semiconductor memory, a magnetic tape, or a flexible disk.
  • the information processing apparatus 1 realizes the information processing method according to the embodiment of the present disclosure by installing and executing various programs on a computer.
  • various programs for realizing the information processing method according to the embodiment of the present disclosure may be stored in a disk device provided in a server device on a network such as the Internet so that they can be downloaded to a computer or the like.
  • the functions provided by various programs for realizing the information processing method according to the embodiment of the present disclosure may be realized by the cooperation between the OS (Operating System) and the application software.
  • in this case, the part other than the OS may be stored in a medium and distributed, or may be stored in the server device so that it can be downloaded to a computer or the like.
  • each component of each device shown in the figures is a functional concept and does not necessarily have to be physically configured as shown. That is, the specific form of distribution and integration of each device is not limited to that shown in the figures, and all or part of the devices can be functionally or physically distributed and integrated in arbitrary units according to various loads and usage conditions.
  • FIG. 10 is a block diagram showing a schematic configuration example of a computer that functions as the information processing apparatus according to the embodiment of the present disclosure. Note that FIG. 10 shows a schematic configuration of a computer that functions as the information processing apparatus 1; some of the components shown in FIG. 10 may be omitted, and components other than those shown in FIG. 10 may further be included.
  • the computer 200 functioning as the information processing device 1 includes, for example, a CPU 201, a ROM 202, a RAM 203, an interface 204, an input device 205, an output device 206, a storage 207, a drive 208, a port 209, and a communication device 210.
  • the CPU 201 functions as, for example, an arithmetic processing device or a control device, and controls all or a part of the operation of each component based on various programs recorded in the ROM 202.
  • Various programs stored in the ROM 202 may be recorded in the storage 207 or the recording medium 301 connected via the drive 208. In this case, the CPU 201 controls all or a part of the operation of each component based on the program stored in the recording medium 301.
  • the various programs include programs that provide various functions for realizing information processing of the information processing apparatus 1.
  • the ROM 202 functions as an auxiliary storage device for storing programs read into the CPU 201, data used for calculations, and the like.
  • the RAM 203 functions as a main storage device for temporarily or permanently storing, for example, a program read into the CPU 201 and various parameters that are appropriately changed when the program read into the CPU 201 is executed.
  • the CPU 201, ROM 202, and RAM 203 can realize the functions of the generation unit 151, the calculation unit 152, the evaluation unit 153, and the like included in the control unit 150 described above, in cooperation with software (various programs stored in the ROM 202 and the like).
  • the CPU 201, ROM 202, and RAM 203 are connected to each other via the bus 211. Further, the bus 211 is connected to each part of the computer 200 via the interface 204.
  • the input device 205 is realized by a device through which a user inputs information, such as a mouse, keyboard, touch panel, button, switch, or lever.
  • the input device 205 may be a remote controller capable of transmitting a control signal using infrared rays or other radio waves. Further, the input device 205 may include a voice input device such as a microphone.
  • the function of the input unit 110 described above can be realized by the input device 205.
  • the output device 206 is a device capable of visually or audibly notifying the user of acquired information, for example, a display device such as a CRT, LCD, or organic EL display, an audio output device such as a speaker or headphones, a printer, a mobile phone, or a facsimile.
  • the function of the output unit 120 described above can be realized by the output device 206.
  • the storage 207 is a device for storing various types of data; for example, a magnetic storage device such as a hard disk drive (HDD), a semiconductor storage device, an optical storage device, a magneto-optical storage device, or the like is used.
  • the function of the storage unit 140 described above can be realized by the storage 207.
  • the drive 208 is, for example, a device for reading out information recorded on the recording medium 301 and writing information to the recording medium 301.
  • the recording medium 301 includes a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, and the like.
  • the port 209 is a connection port for connecting an external device 302, and includes a USB (Universal Serial Bus) port, an IEEE1394 port, a SCSI (Small Computer System Interface), an RS-232C port, an optical audio terminal, and the like.
  • the external device 302 includes a printer, a portable music player, a digital camera, a digital video camera, an IC recorder, and the like.
  • the communication device 210 is a communication interface for connecting to a network.
  • the communication device 210 is, for example, a communication card for a wired or wireless LAN (Local Area Network), LTE (Long Term Evolution), Bluetooth (registered trademark), WUSB (Wireless USB), or the like. Further, the communication device 210 may be a router for optical communication, various communication modems, or the like. The function of the communication unit 130 described above can be realized by the communication device 210.
  • the information processing apparatus 1 includes a generation unit 151, a calculation unit 152, and an evaluation unit 153.
  • the generation unit 151 generates a difference DNN model M3 between a DNN model M1 having a plurality of calculation layers that output calculation results based on input data and a quantized DNN model M2 obtained by quantizing the DNN model M1.
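As a rough illustration of this relationship, a difference model can be built so that original parameter plus difference reconstructs the quantized parameter. The toy fixed-point quantizer and the parameter values below are assumptions for the sketch, not the embodiment's actual method.

```python
# Sketch (assumption): build a difference model M3 holding, per layer, the
# differences between the original parameters (M1) and their quantized
# counterparts (M2), with the same topology as M1 and M2.

def quantize(v, scale=16):
    """Toy fixed-point quantization: round to steps of 1/scale."""
    return round(v * scale) / scale

def make_difference_model(model):
    quantized = [{k: [quantize(v) for v in layer[k]] for k in layer}
                 for layer in model]
    difference = [{k: [q - o for q, o in zip(ql[k], ol[k])] for k in ol}
                  for ql, ol in zip(quantized, model)]
    return quantized, difference

model_m1 = [{"w": [0.33, -0.71], "b": [0.05]},     # layer 1 (illustrative values)
            {"w": [1.24, 0.08], "b": [-0.02]}]     # layer 2
model_m2, model_m3 = make_difference_model(model_m1)

# each difference entry reconstructs the quantized parameter exactly
for ol, ql, dl in zip(model_m1, model_m2, model_m3):
    for k in ol:
        for o, q, d in zip(ol[k], ql[k], dl[k]):
            assert abs((o + d) - q) < 1e-12
```

The difference model thus shares the topology of M1 and M2 while carrying only the quantization-induced deltas.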
  • the calculation unit 152 derives, for each calculation layer, the error factors constituting the quantization error of each calculation layer of the quantized DNN model M2, based on the calculation results of the mutually corresponding calculation layers of the DNN model M1 and the difference DNN model M3.
  • the evaluation unit 153 evaluates the error factors derived for each calculation layer.
  • the calculation unit 152 derives the error factors for each calculation layer based on the inner product calculation results of the input vector, weight parameters, and bias parameters for the DNN model M1 and of the corresponding input vector, weight parameters, and bias parameters for the difference DNN model M3.
  • as a result, the quantization error generated in each fully connected layer of the DNN can be expressed as a linear combination of the terms corresponding to the error factors constituting the quantization error, and can be propagated between the calculation layers.
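For a single fully connected neuron, this linear decomposition can be checked numerically. The following is a minimal sketch; the vectors and delta values are illustrative, and the factor names (x_dw, dx_w, etc.) are labels chosen for the example.

```python
# Sketch (assumption): decompose the quantization error of a fully connected
# layer, y_q = (x+Δx)·(w+Δw) + (b+Δb), into linear error-factor terms whose
# sum equals the total error against y = x·w + b.

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def fc_error_terms(x, w, dx, dw, db):
    """Error factors whose sum equals the total quantization error."""
    return {
        "x_dw": dot(x, dw),    # original input × weight error
        "dx_w": dot(dx, w),    # input error × original weights
        "dx_dw": dot(dx, dw),  # second-order cross term
        "db": db,              # bias error
    }

x, w, b = [1.0, 2.0], [0.5, -0.25], 0.1
dx, dw, db = [0.01, -0.02], [0.005, 0.01], -0.001

terms = fc_error_terms(x, w, dx, dw, db)
y_orig = dot(x, w) + b
y_quant = dot([xi + di for xi, di in zip(x, dx)],
              [wi + di for wi, di in zip(w, dw)]) + (b + db)
total_error = y_quant - y_orig

# the linear combination of factor terms reproduces the total error
assert abs(sum(terms.values()) - total_error) < 1e-12
```

Because each term is attributable to one delta, the dominant factor of the layer can be read off directly from `terms`.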
  • when a non-linear operation by an activation function is performed in a calculation layer, the calculation unit 152 approximates the activation function and derives the error factor for each calculation layer based on the approximated activation function.
  • the calculation result by the activation function can be expressed as a linear combination and propagated between the calculation layers.
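One way such an approximation can work is a first-order linearization of the activation around the pre-quantization input, so that the output-side error stays a linear function of the input-side error. This is a sketch under that assumption; the embodiment's actual approximation is described with FIG. 4 and may differ.

```python
# Sketch (assumption): linearize a sigmoid activation at the pre-quantization
# input z, so that the output error ≈ slope(z) * input error, which keeps the
# error a linear term that can be propagated to the next layer.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def propagate_through_activation(z, dz):
    """Approximate sigmoid(z + dz) - sigmoid(z) by slope * dz."""
    slope = sigmoid(z) * (1.0 - sigmoid(z))   # derivative of sigmoid at z
    return slope * dz

z, dz = 0.3, 0.01          # illustrative pre-activation value and its error
approx = propagate_through_activation(z, dz)
exact = sigmoid(z + dz) - sigmoid(z)

# the first-order approximation is close for small dz
assert abs(approx - exact) < 1e-4
```

The approximation error shrinks with the magnitude of the quantization error, which is exactly the regime in which this analysis is applied.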
  • when the average value pooling process is performed in a calculation layer, the calculation unit 152 derives the error factor for each calculation layer in the average value pooling process from the filter size of the filter used for the average value pooling process and a linear combination of the elements included in the filter.
  • the calculation result by the average value pooling can be expressed by a linear combination and propagated between the calculation layers.
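Because averaging is itself linear, the pooled output error equals the same average applied to the per-element input errors. The window values below are toy numbers assumed for the sketch.

```python
# Sketch (assumption): average pooling is linear, so the pooled output error
# is exactly the average pooling of the per-element quantization errors.

def avg_pool(values):
    return sum(values) / len(values)

window = [1.0, 2.0, 3.0, 4.0]          # original activations in one filter window
errors = [0.1, -0.1, 0.05, 0.05]       # quantization errors of those activations

pooled_orig = avg_pool(window)
pooled_quant = avg_pool([v + e for v, e in zip(window, errors)])
pooled_error = avg_pool(errors)         # same linear combination, applied to errors

assert abs((pooled_quant - pooled_orig) - pooled_error) < 1e-12
```

The 1/filter-size weighting is the only coefficient needed, so the error term stays linear across the pooling layer.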
  • when the maximum value pooling process is performed in a calculation layer, the calculation unit 152 derives the error factor for each calculation layer using the difference between the element that should have been selected as the representative value before quantization and the element actually selected as the representative value after quantization. As a result, the calculation result of the maximum value pooling can be treated as a linear combination and propagated between the calculation layers.
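A sketch of this representative-value difference follows, using a toy window in which quantization flips which element wins the max; the values are illustrative assumptions.

```python
# Sketch (assumption): when quantization changes which element is selected as
# the representative value, the pooled error is the selected element's own
# error plus the pre-quantization gap between the new and old winners.

def max_pool_error_factor(window, errors):
    quantized = [v + e for v, e in zip(window, errors)]
    idx_before = max(range(len(window)), key=lambda i: window[i])
    idx_after = max(range(len(quantized)), key=lambda i: quantized[i])
    selection_gap = window[idx_after] - window[idx_before]
    return errors[idx_after] + selection_gap

window = [1.0, 0.98]
errors = [-0.05, 0.01]     # quantization flips the winner from index 0 to index 1

total = max([v + e for v, e in zip(window, errors)]) - max(window)
assert abs(max_pool_error_factor(window, errors) - total) < 1e-12
```

Expressing the error this way keeps it a sum of attributable terms even though the max operation itself is not linear.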
  • the evaluation unit 153 evaluates the degree of influence of the error factor for each arithmetic layer on the output error of the quantized DNN model M2 based on the evaluation index defined in advance. This makes it possible to identify in which arithmetic layer the quantization error has a large effect on the output error.
  • the evaluation unit 153 displays the degree of influence of the error factor for each arithmetic layer on the output error of the quantized DNN model M2 in different modes according to the evaluation result. As a result, the degree of influence of the error factor for each calculation layer can be recognized at a glance.
  • the evaluation unit 153 presents detailed information about each of the error factors for each calculation layer. This makes it possible to identify which of the plural elements included in an error factor has a large influence.
  • the evaluation unit 153 presents detailed information as well as advice information for optimizing the quantized DNN model M2.
  • the quantized DNN model M2 can be adjusted more easily.
  • the evaluation unit 153 aggregates the error factors based on the conditions specified in advance. As a result, even if the DNN model M1 to be quantized is huge, it is possible to analyze the error factor with a realistic memory consumption.
  • (1) An information processing device comprising: a generation unit that generates a difference model between a DNN model having a plurality of calculation layers that output calculation results based on input data and a quantized DNN model obtained by quantizing the DNN model; a calculation unit that derives, for each calculation layer, the error factors constituting the quantization error of each calculation layer of the quantized DNN model, based on the calculation results of the mutually corresponding calculation layers of the DNN model and the difference model; and an evaluation unit that evaluates the error factors derived for each calculation layer.
  • (2) The information processing device according to (1), wherein the calculation unit derives the error factors for each calculation layer based on the results of the inner product calculations of the input vector, weight parameters, and bias parameters for the DNN model and of the corresponding input vector, weight parameters, and bias parameters in the difference model.
  • (3) The information processing device according to (2), wherein, when a non-linear operation by an activation function is performed in a calculation layer, the calculation unit approximates the activation function and derives the error factors for each calculation layer based on the approximated activation function.
  • (4) The information processing device according to (3), wherein, when the average value pooling process is performed in a calculation layer, the calculation unit derives the error factors for each calculation layer in the average value pooling process from the filter size of the filter used for the average value pooling process and a linear combination of the elements included in the filter.
  • (5) The information processing device according to (3), wherein, when the maximum value pooling process is performed in a calculation layer, the calculation unit derives the error factors for each calculation layer using the difference between the element that should have been selected as the representative value before quantization and the element selected as the representative value after quantization.
  • (6) The information processing device according to any one of (1) to (5), wherein the evaluation unit evaluates, based on a predetermined evaluation index, the degree of influence of the error factors for each calculation layer on the output error included in the quantized output of the quantized DNN model.
  • (7) The information processing device according to (6), wherein the degree of influence of the error factors for each calculation layer on the error included in the quantized output is displayed in different modes according to the evaluation result.
  • (8) The information processing device according to (6) or (7), wherein the evaluation unit presents detailed information about each of the error factors for each calculation layer.
  • (9) The information processing device according to any one of (6) to (8), wherein the evaluation unit presents advice information for optimizing the quantized DNN model together with the detailed information.
  • (10) The information processing device according to (1), wherein, when the error factors for each calculation layer are propagated to the subsequent calculation layer, the evaluation unit aggregates the error factors based on predetermined conditions.
  • (11) An information processing method in which a processor generates a difference model between a DNN model having a plurality of calculation layers that output calculation results based on input data and a quantized DNN model obtained by quantizing the DNN model, derives, for each calculation layer, the error factors constituting the quantization error of each calculation layer of the quantized DNN model based on the calculation results of the mutually corresponding calculation layers of the DNN model and the difference model, and evaluates the error factors derived for each calculation layer.
  • 1 Information processing device; 110 Input unit; 120 Output unit; 121 Analysis window; 121-1 Error factor information; 121-2 Optimization plan information; 130 Communication unit; 140 Storage unit; 141 Design tool storage unit; 142 Evaluation threshold storage unit; 150 Control unit; 151 Generation unit; 152 Calculation unit; 153 Evaluation unit; 200 Computer; 201 CPU; 202 ROM; 203 RAM; 204 Interface (I/F); 205 Input device; 206 Output device; 207 Storage; 208 Drive; 209 Port; 210 Communication device; 211 Bus


Abstract

An information processing device comprises a generation unit (151), a calculation unit (152), and an evaluation unit (153). The generation unit (151) generates a differential model between a DNN model having a plurality of calculation layers that output calculation results based on input data, and a quantized DNN model obtained by quantizing the DNN model. The calculation unit (152) derives, for each calculation layer, error factors that constitute the quantization errors in each calculation layer of the quantized DNN model, on the basis of the calculation results of mutually corresponding calculation layers of the DNN model and the differential model. The evaluation unit (153) evaluates the error factors derived for each calculation layer.

Description

Information processing device and information processing method
This disclosure relates to an information processing device and an information processing method.
In recent years, research on the quantization of artificial neural networks such as DNNs (Deep Neural Networks), which have multiple hidden layers between the input layer and the output layer, has been advancing with the aim of speeding up calculations and reducing computational resources.
For example, Patent Document 1 proposes a computer system that dynamically optimizes bit precision during training in order to reduce the demand for computational resources in a neural network. Patent Document 2 proposes a method and an apparatus for quantizing the parameters of a neural network.
Japanese Unexamined Patent Publication No. 2019-164793
Japanese Unexamined Patent Publication No. 2019-32833
However, since a DNN has the characteristic of accumulating operations over multiple layers, it is difficult to analyze in detail the causes of DNN performance degradation due to quantization even by referring to the final output of the quantized DNN.
Therefore, this disclosure proposes an information processing device and an information processing method that can analyze in detail the factors behind DNN performance degradation due to quantization.
In order to solve the above problems, an information processing device according to one form of the present disclosure includes a generation unit, a calculation unit, and an evaluation unit. The generation unit generates a difference model between a DNN model having a plurality of calculation layers that output calculation results based on input data and a quantized DNN model obtained by quantizing the DNN model. The calculation unit derives, for each calculation layer, the error factors constituting the quantization error of each calculation layer of the quantized DNN model, based on the calculation results of the mutually corresponding calculation layers of the DNN model and the difference model. The evaluation unit evaluates the error factors derived for each calculation layer.
FIG. 1 is a diagram showing an outline of information processing according to an embodiment of the present disclosure.
FIG. 2 is a diagram showing an outline of information processing according to an embodiment of the present disclosure.
FIG. 3 is a block diagram showing a configuration example of an information processing device according to an embodiment of the present disclosure.
FIG. 4 is a diagram for explaining an outline of the approximation of an activation function according to an embodiment of the present disclosure.
FIG. 5 is a diagram showing an outline of a method of calculating error factors in maximum value pooling according to an embodiment of the present disclosure.
FIG. 6 is a diagram showing an information display example of a design tool according to an embodiment of the present disclosure.
FIG. 7 is a flowchart showing an example of a processing procedure of the contribution signal display process according to an embodiment of the present disclosure.
FIG. 8 is a flowchart showing an example of a processing procedure of the optimization plan information display process according to an embodiment of the present disclosure.
FIG. 9 is a diagram showing an outline of information processing according to a modified example.
FIG. 10 is a block diagram showing a schematic configuration example of a computer that functions as an information processing device according to an embodiment of the present disclosure.
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. In each of the following embodiments, duplicate descriptions may be omitted by assigning the same reference numerals to the same parts. In the present specification and drawings, a plurality of components having substantially the same functional configuration may also be distinguished by appending different numerals after the same reference numeral.
The present disclosure will be described in the following order.
  1. Introduction
  2. Outline of information processing according to the embodiment of the present disclosure
  3. Configuration example of the information processing device
  4. Method of applying the error factor analysis of the present disclosure to each layer
  4-1. Fully connected layers
  4-2. Activation functions
  4-3. Pooling layers
  4-3-1. Average value pooling
  4-3-2. Maximum value pooling
  5. Example of error factor analysis according to the embodiment of the present disclosure
  6. Example of processing procedures of the information processing device
  6-1. Contribution signal display process
  6-2. Optimization plan information display process
  7. Modified examples
  7-1. Aggregation of error factors
  7-2. Other modified examples
  8. Hardware configuration
  9. Conclusion
<<1. Introduction>>
While learning methods using neural networks such as DNNs are highly accurate, the processing load of their computations is large. Research is therefore underway to effectively reduce this load by quantizing the neural network.
Conventionally, in dynamic quantization, in which the quantization bit precision is determined dynamically for each calculation layer of a DNN, the optimum bit precision for each layer has been determined by manual adjustment. However, such manual adjustment depends on the operator's trial and error and accumulated tuning experience. Moreover, it is not based on a quantitative understanding of which layers across the network should be given more bit precision in order to mitigate the adverse effect of quantization on DNN performance.
The above-mentioned Patent Documents 1 and 2 each propose a method for dynamically determining the quantization bit precision. However, these methods also do not quantitatively analyze the influence of each layer on the entire network; they automate trial and error or analyze each layer independently, and therefore suffer from the same problem as the manual adjustment described above.
When analyzing the adverse effects of quantization on DNN performance, the following problems arise from the characteristics of DNNs.
(Problem 1) Since a DNN stacks many calculation layers, observing only the output result cannot determine in which layer quantization most affects the performance degradation.
(Problem 2) Because activation by non-linear functions distorts the results of inner product and convolution operations, it is difficult to observe the influence of quantization errors propagated between layers.
The technique according to the present disclosure addresses these DNN-specific problems. To give a quantitative understanding of which layers across the network should be given more bit precision to mitigate the adverse effect of quantization on performance, it proposes a method of propagating the quantization error generated in each layer of the DNN and tracking the largest error factors. This makes it possible to identify the layer that most affects the output error when quantization is applied to a DNN.
<<2. Outline of information processing according to the embodiment of the present disclosure>>
FIGS. 1 and 2 are diagrams showing an outline of information processing according to the embodiment of the present disclosure. The information processing (information processing method) according to the embodiment of the present disclosure is realized by the information processing apparatus 1 (see FIG. 3) described later.
As shown in FIG. 1, the information processing apparatus 1 generates a quantized DNN model M2 by quantizing the DNN model M1. The quantization method is not particularly limited; for example, quantization by fixed-point arithmetic using fixed-point numbers can be adopted.
The information processing apparatus 1 also generates a difference DNN model M3 between the DNN model M1 and the quantized DNN model M2. The difference DNN model M3 is composed of the differences, before and after quantization, of the feature maps and parameters in each layer of the DNN, and has a network topology equivalent to that of the DNN model M1 and the quantized DNN model M2.
The DNN model M1, the quantized DNN model M2, and the difference DNN model M3 each have a plurality of mutually corresponding calculation layers L_m to L_n. Each of the calculation layers L_m to L_n corresponds to the input layer, the output layer, or one of the plural hidden layers (intermediate layers) (for example, m and n are positive integers satisfying m + 2 < n). Depending on the structure of the network, each of the calculation layers L_m to L_n may be a fully connected layer, a convolution layer, a pooling layer, an activation function, or another type of layer. In FIG. 1, m is, for example, an arbitrary integer of 1 or more, and n is an arbitrary integer of 4 or more.
The DNN model M1 outputs a calculation result for the input data. The quantized DNN model M2 outputs a calculation result (quantized output) for the quantized input. The difference DNN model M3 outputs a calculation result (difference output) for the difference input.
The information processing apparatus 1 derives, for each arithmetic layer, the error factors constituting the quantization error of each arithmetic layer of the quantized DNN model M2, based on the calculation results of the mutually corresponding arithmetic layers of the DNN model M1 and the difference DNN model M3. That is, the information processing apparatus 1 indirectly expresses the calculation result of each arithmetic layer of the quantized DNN model M2 as a linear combination of the calculation results of the corresponding arithmetic layers of the DNN model M1 and the difference DNN model M3. In this way, the calculation result of each arithmetic layer of the quantized DNN model M2 can be decomposed layer by layer, and the error factors constituting the quantization error can be propagated to the subsequent arithmetic layers.
Specifically, as shown in FIG. 2, the information processing apparatus 1 inputs the input data and the difference input data from an arithmetic layer Lm (input layer, not shown) into the corresponding arithmetic layers Lm+1 of the DNN model M1 and the difference DNN model M3, respectively. The difference input data is, for example, the difference between the input data to the DNN model M1 and the quantized input data to the quantized DNN model M2.
Subsequently, the information processing apparatus 1 uses the calculation result (output before quantization) of the arithmetic layer Lm+1 of the DNN model M1 and the calculation result (difference output) of the arithmetic layer Lm+1 of the difference DNN model M3 to derive an error factor Em+1 that indirectly expresses the arithmetic layer Lm+1 of the quantized DNN model M2.
The error factor Em+1 includes a plurality of factors m+1_1 to m+1_z. The factors m+1_1 to m+1_z correspond to errors caused by various parameters of each arithmetic layer, such as the weight parameters and bias parameters, to the nonlinear components of activation functions, to differences between elements in the pooling process, and the like.
Then, the information processing apparatus 1 outputs the derived error factor Em+1 of the arithmetic layer Lm+1 to the corresponding arithmetic layers Lm+2 of the DNN model M1 and the difference DNN model M3 that follow the arithmetic layer Lm+1.
Subsequently, in the same way as for the arithmetic layer Lm+1, the information processing apparatus 1 derives the error factor Em+2 of the arithmetic layer Lm+2. The information processing apparatus 1 then outputs the derived error factor Em+2 to the corresponding arithmetic layers Lm+3 of the DNN model M1 and the difference DNN model M3 that follow the arithmetic layer Lm+2.
In the same way, the information processing apparatus 1 sequentially propagates the error factors of each arithmetic layer to the subsequent arithmetic layers, deriving the error factors constituting the quantization error of each arithmetic layer up to the arithmetic layer Ln-1. The information processing apparatus 1 then executes, from an arithmetic layer Ln (output layer, not shown), the final output corresponding to the quantized output of the quantized DNN model M2.
The information processing apparatus 1 evaluates the error factors derived for each arithmetic layer. Specifically, based on a predefined evaluation index, the information processing apparatus 1 evaluates the degree of influence of the error factors of each arithmetic layer on the error (output error) included in the quantized output of the quantized DNN model M2.
As described above, the information processing apparatus 1 derives, for each arithmetic layer, the error factors constituting the quantization error of each arithmetic layer of the quantized DNN model M2, based on the calculation results of the mutually corresponding arithmetic layers of the DNN model M1 and the difference DNN model M3, and evaluates the error factors derived for each arithmetic layer. As a result, the information processing apparatus 1 can analyze in detail the causes of the performance degradation of the DNN due to quantization.
<<3. Configuration of the information processing apparatus>>
Hereinafter, the configuration of the information processing apparatus 1 according to the embodiment of the present disclosure will be described. FIG. 3 is a block diagram showing a configuration example of the information processing apparatus according to the embodiment of the present disclosure.
As shown in FIG. 3, the information processing apparatus 1 includes an input unit 110, an output unit 120, a communication unit 130, a storage unit 140, and a control unit 150.
The input unit 110 detects input operations by the administrator of the information processing apparatus 1. The control unit 150, described later, can input, for example, a data set for evaluating the quantization error in accordance with an input operation detected by the input unit 110. The input unit 110 can be realized by, for example, various buttons, a keyboard, a touch panel, a mouse, switches, and the like.
The output unit 120 outputs various types of information. The output unit 120 may include a display device that outputs visual information. The output unit 120 can display the window of an analysis tool executed in response to an operation by the administrator. The display device can be realized by, for example, a CRT (Cathode Ray Tube), an LCD (Liquid Crystal Display), an OLED (Organic Light Emitting Diode), or the like.
The communication unit 130 can be realized by, for example, a NIC (Network Interface Card), various communication modems, or the like. The communication unit 130 is connected to a network (such as the Internet) by wire or wirelessly, and transmits and receives information to and from external devices and the like via the network.
The storage unit 140 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. The storage unit 140 functions as storage means for the control unit 150. The storage unit 140 has a design tool storage unit 141 and an evaluation threshold storage unit 142.
The design tool storage unit 141 stores a design tool that provides various functions for designing a DNN model (for example, the DNN model M1). The design tool may include an analysis function for analyzing the quantization error of a quantized DNN model (for example, the quantized DNN model M2) obtained by quantizing the designed DNN model (for example, the DNN model M1). This analysis function can provide the administrator of the information processing apparatus 1 with functions for analyzing and visualizing the quantization error.
The evaluation threshold storage unit 142 stores thresholds for evaluating the quantization error of each arithmetic layer. The evaluation threshold storage unit 142 is used in the evaluation process of the evaluation unit 153, described later.
The control unit 150 is a controller that controls each unit of the information processing apparatus 1. The control unit 150 is realized by a processor such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a GPU (Graphics Processing Unit). The control unit 150 is realized by the processor executing various programs stored inside the information processing apparatus 1 using a RAM (Random Access Memory) or the like as a work area. The control unit 150 may also be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The CPU, MPU, ASIC, and FPGA can all be regarded as controllers.
As shown in FIG. 3, the control unit 150 includes a generation unit 151, a calculation unit 152, and an evaluation unit 153, and realizes or executes the functions and operations of the information processing apparatus 1 described below. Each block constituting the control unit 150 (the generation unit 151 to the evaluation unit 153) is a functional block indicating a function of the control unit 150. These functional blocks may be software blocks or hardware blocks. For example, each of the above functional blocks may be one software module realized by software (including microprograms), or may be one circuit block on a semiconductor chip (die). Of course, each functional block may be one processor or one integrated circuit. The method of configuring the functional blocks is arbitrary. The control unit 150 may also be configured in functional units different from the above functional blocks.
The generation unit 151 generates a difference DNN model M3 between the DNN model M1, which has a plurality of arithmetic layers that output calculation results based on input data, and the quantized DNN model M2 obtained by quantizing the DNN model M1. The quantization method for the DNN model M1 is not limited to any particular method. The difference between the DNN model M1 and the quantized DNN model M2 is a model composed of the differences of the feature maps and parameters before and after quantization in each layer of the DNN, and has a topology equivalent to that of the original DNN model M1. That is, the DNN model M1, the quantized DNN model M2, and the difference DNN model M3 are each configured with mutually corresponding layers.
The calculation unit 152 derives, for each arithmetic layer, the error factors constituting the quantization error of each arithmetic layer of the quantized DNN model M2, based on the calculation results of the mutually corresponding arithmetic layers of the DNN model M1 and the difference DNN model M3. That is, the information processing apparatus 1 indirectly expresses the calculation result of each arithmetic layer of the quantized DNN model M2 as a linear combination of the calculation results of the corresponding arithmetic layers of the DNN model M1 and the difference DNN model M3. In this way, the calculation result of each arithmetic layer of the quantized DNN model M2 can be decomposed layer by layer, and the error factors can be propagated to the subsequent arithmetic layers.
Specifically, the calculation unit 152 inputs the calculation result of an arithmetic layer of the DNN model M1 and the calculation result of the corresponding arithmetic layer of the difference DNN model M3 into the next arithmetic layers of the DNN model M1 and the difference DNN model M3, respectively. Then, using the calculation result output from the DNN model M1 and the calculation result output from the difference DNN model M3, the calculation unit 152 derives the error factors constituting the quantization error of the corresponding arithmetic layer of the quantized DNN model M2, which indirectly express that quantization error. After deriving the error factors, the calculation unit 152 propagates them to the next arithmetic layer and sequentially executes the same processing, thereby deriving the error factors constituting the quantization error of each arithmetic layer of the quantized DNN model M2 for every arithmetic layer. Details of the processing of the calculation unit 152 will be described later with reference to the drawings.
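The side-by-side propagation described above can be sketched for a chain of two linear (fully connected) layers. This is a minimal illustration with made-up shapes and random parameters, not the embodiment's implementation; `e1` and `e2` stand for the accumulated error factors of layers m+1 and m+2:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    """A plain linear (fully connected) layer."""
    return x @ w + b

# Parameters of the original model M1, and their quantization deltas
# (which together form the difference model M3).
w1, b1 = rng.normal(size=(4, 3)), rng.normal(size=3)
w2, b2 = rng.normal(size=(3, 2)), rng.normal(size=2)
dw1, db1 = 0.01 * rng.normal(size=(4, 3)), 0.01 * rng.normal(size=3)
dw2, db2 = 0.01 * rng.normal(size=(3, 2)), 0.01 * rng.normal(size=2)

x, dx = rng.normal(size=4), 0.01 * rng.normal(size=4)   # input and its delta

# Layer m+1: run M1 and M3 side by side; e1 collects the error factors.
y1 = layer(x, w1, b1)                                   # M1 output
e1 = x @ dw1 + dx @ w1 + dx @ dw1 + db1

# Layer m+2: the accumulated error e1 plays the role of Δx for this layer.
y2 = layer(y1, w2, b2)
e2 = y1 @ dw2 + e1 @ w2 + e1 @ dw2 + db2

# The quantized model M2, run directly, is recovered as y2 + e2.
yq = layer(layer(x + dx, w1 + dw1, b1 + db1), w2 + dw2, b2 + db2)
```

The final assertion-style check `yq == y2 + e2` is exactly the sense in which M2's output is expressed indirectly through M1 and M3.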
The evaluation unit 153 evaluates each of the error factors derived for each arithmetic layer by the calculation unit 152. Specifically, the evaluation unit 153 calculates and evaluates the degree of influence of the error factors derived for each arithmetic layer on the error (output error) included in the quantized output of the quantized DNN model M2. The evaluation unit 153 also presents detailed information on the error factors of each arithmetic layer, as well as advice information for optimizing the quantized DNN model M2. Details of the processing of the evaluation unit 153 will be described later with reference to the drawings.
<<4. Application of the error factor analysis of the present disclosure to each layer>>
Hereinafter, the method of applying the error factor analysis by the information processing apparatus 1 according to the embodiment of the present disclosure to each layer will be described.
<4-1. Fully connected layers>
The calculation unit 152 decomposes the quantization error of each arithmetic layer into a sum of error factors, based on the inner product of the input vector, weight parameters, and bias parameters of the DNN model M1 with the corresponding input vector, weight parameters, and bias parameters of the difference DNN model M3.
In the DNN model M1, let the input vector (image) to a fully connected layer be x, the weight parameters be w, and the bias parameters be b. In the difference DNN model M3, let the corresponding elements be Δx, Δw, and Δb. Then the output vector (image) y + Δy of the fully connected layer in the quantized DNN model M2 can be expanded as the following equation (1), where "·" denotes a matrix product or matrix-vector product.
y + Δy = (x + Δx)·(w + Δw) + (b + Δb)
       = x·w + x·Δw + Δx·w + Δx·Δw + b + Δb
       = y + x·Δw + Δx·w + Δx·Δw + Δb ... (1)
Thus, the error Δy that quantization introduces into the calculation result of the fully connected layer is expressed as a sum of the error factors included in the quantization error, as shown in the following equation (2).
Δy = x·Δw + Δx·w + Δx·Δw + Δb ... (2)
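Equation (2) can be verified numerically; the sketch below names the four factors after the labels that appear later in the analysis UI ("xDw", "Dxw", etc.), with shapes chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=(1, 8))            # input vector to the fully connected layer
w = rng.normal(size=(8, 4))            # weight parameters
b = rng.normal(size=(1, 4))            # bias parameters
dx = 0.01 * rng.normal(size=x.shape)   # quantization deltas Δx, Δw, Δb
dw = 0.01 * rng.normal(size=w.shape)
db = 0.01 * rng.normal(size=b.shape)

y = x @ w + b                          # output before quantization
yq = (x + dx) @ (w + dw) + (b + db)    # output of the quantized model

# The four error factors of equation (2):
factors = {"xDw": x @ dw, "Dxw": dx @ w, "DxDw": dx @ dw, "Db": db}
dy = sum(factors.values())             # Δy as the sum of the factors
```

Because each factor has the same dimension as the layer output, each can later be summarized and compared with a common evaluation index.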
Note that, depending on the quantization method, the calculation result itself may be held at a precision higher than the quantization bit precision and re-quantized to match the bit precision when propagated to the subsequent layer. In such a case, the overall consistency can be maintained by adding, as an additional term Δq, the residual between the difference output of the difference DNN model M3 and Δy.
The above transformation applies not only to fully connected layers but to layers that perform linear operations in general, such as convolution layers for images.
<4-2. Activation functions>
When a nonlinear operation by an activation function is performed in mutually corresponding arithmetic layers of the DNN model M1 and the difference DNN model M3, the activation function is approximated, and the quantization error of each arithmetic layer is derived based on the approximated activation function.
In general, an activation function f, which is inserted after a fully connected layer or convolution layer in a DNN and performs a nonlinear operation, requires approximate handling when analyzing the quantization error. FIG. 4 is a diagram for explaining an outline of the approximation of the activation function according to the embodiment of the present disclosure. FIG. 4 shows the case where an input x1 + Δx1 carrying a quantization error Δx1 is input to the activation function f.
As shown in FIG. 4, for example, the calculation unit 152 can separate the nonlinearity of the activation function f into a term Δf by linearly approximating f. That is, the output f(x1 + Δx1) of the activation function of the quantized DNN model M2 can be expressed, using the derivative f'(x1) of the activation function f, as the following equation (3).
f(x1 + Δx1) = f(x1) + f'(x1)Δx1 - Δf ... (3)
Here, since Δx1 is a linear combination of the error factors up to the previous layer (the quantization error), f'(x1)Δx1 is also a linear combination of error factors because f'(x1) is a constant. In practice, Δf can be calculated as shown in equation (4) below, where Δf(x1) denotes the output of the difference DNN model M3.
Δf = f'(x1)Δx1 - Δf(x1) ... (4)
Thus, the term f'(x1)Δx1 - Δf in the above equation (3) becomes the error factor of the activation function as a whole.
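Equations (3) and (4) can be checked concretely. The sketch below uses ReLU as an example activation (an assumption; the embodiment does not fix a particular f); note that Δf vanishes wherever the linearization is exact and becomes nonzero where the input crosses the kink:

```python
import numpy as np

f = lambda x: np.maximum(x, 0.0)    # example activation: ReLU (assumption)

x1 = np.array([-0.3, 0.2, 1.5])     # pre-activation values in the DNN model M1
dx1 = np.array([0.4, -0.05, 0.1])   # accumulated quantization error Δx1

df_out = f(x1 + dx1) - f(x1)        # Δf(x1): output of the difference model M3
fprime = (x1 > 0).astype(float)     # derivative f'(x1) of ReLU

# Equation (4): the nonlinear remainder Δf.
Df = fprime * dx1 - df_out
# Equation (3) then holds: f(x1 + Δx1) = f(x1) + f'(x1)Δx1 − Δf.
# Df[0] is nonzero (sign change at the ReLU kink); Df[1], Df[2] are ~0.
```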
<4-3. Pooling layers>
(4-3-1. Average pooling)
When average pooling is performed in an arithmetic layer, the calculation unit 152 derives the error factors constituting the quantization error of each arithmetic layer in the average pooling process from the filter size of the filter used for the average pooling and a linear combination of the elements included in the filter.
For example, average pooling in a pooling layer, which is mainly inserted in convolutional neural networks for images, outputs the average of the elements (for example, pixel values) within the filter used for the pooling process. That is, if the filter size is α and the elements of the image contained in the filter are xi (i = 1, ..., α), the output value yP1, which is the calculation result of the pooling layer of the DNN model M1, is calculated as yP1 = Σi xi / α. This operation is linear, and even when the differences Δxi contained in the input elements of the quantized DNN model M2 are taken into account, the filter output error ΔyP1 is expressed as the linear combination ΔyP1 = Σi Δxi / α. It can therefore be propagated to the subsequent layer.
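The linearity argument above can be illustrated directly: the output error of one average-pooling filter is simply the mean of the per-element errors (values below are arbitrary):

```python
import numpy as np

def avg_pool(window):
    """Average pooling over one filter window: yP1 = Σ xi / α."""
    return window.mean()

x = np.array([0.5, 1.0, -0.25, 2.0])      # elements inside one pooling filter
dx = np.array([0.01, -0.02, 0.005, 0.0])  # per-element quantization errors Δxi

y = avg_pool(x)                           # output before quantization
dy = avg_pool(x + dx) - y                 # filter output error ΔyP1
# Linearity: ΔyP1 = Σ Δxi / α, so the error stays a linear combination
# of the incoming error factors and can be propagated onward unchanged.
```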
(4-3-2. Max pooling)
On the other hand, when max pooling is performed in an arithmetic layer, the calculation unit 152 derives the error factors constituting the quantization error of each arithmetic layer using the difference between the element that would have been selected as the representative value before quantization and the element actually selected as the representative value after quantization.
Max pooling in a pooling layer outputs the maximum of the elements (for example, pixel values) within the filter as the representative value, and this process is treated as nonlinear. Its handling in the error factor analysis is likewise affected by nonlinearity with respect to the difference in the representative values before and after quantization. FIG. 5 is a diagram showing an outline of the method for calculating the error factor in max pooling according to the embodiment of the present disclosure. The left part of FIG. 5 shows an example of selecting the representative value before quantization, and the right part shows an example of selecting the representative value after quantization.
As shown in FIG. 5, before quantization the pixel value x0 is selected as the representative value, whereas after quantization the pixel value x3 + Δx3 is selected, so the representative values selected before and after quantization differ. When the same representative value is selected before and after quantization, the filtering process merely serves to propagate the calculation result and the error factors of the previous layer to the subsequent stage, and there is no problem. However, as shown in FIG. 5, when the representative values selected before and after quantization differ, the error factors also change, which poses a problem.
In such a case, the calculation unit 152 handles this by introducing the difference (vector) Δp between the representative value that would originally have been selected before quantization and the representative value actually selected after quantization. Δp is 0 (zero) for the parts where the same element is selected before and after quantization. By introducing this mechanism, the filter output error ΔyP2 in max pooling becomes ΔyP2 = Δxc + Δp, where c = argmax_i(x0, ..., xα). Note that the error factor itself does not reflect the change of address (position) but only the change of the calculation result; if necessary, an auxiliary index can be calculated and used for the analysis.
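The ΔyP2 = Δxc + Δp bookkeeping can be checked numerically. The sketch below uses a one-dimensional filter and derives Δp as the residual that makes the identity hold, which is one consistent reading of the definition above (the concrete values are made up so that quantization flips the selected element):

```python
import numpy as np

x = np.array([0.50, 0.10, 0.20, 0.49])    # filter elements before quantization
dx = np.array([-0.03, 0.00, 0.00, 0.02])  # per-element quantization errors

c = int(np.argmax(x))                     # index selected before quantization
c_q = int(np.argmax(x + dx))              # index selected after quantization

y = x[c]                                  # representative value before quantization
y_q = (x + dx)[c_q]                       # representative value after quantization

# Δp accounts for the change of the selected element; it is 0 when c == c_q.
dp = (x[c_q] + dx[c_q]) - (x[c] + dx[c])
dy = dx[c] + dp                           # ΔyP2 = Δxc + Δp
```

Here quantization changes the winner from index 0 to index 3, and `dy` still equals the actual output error `y_q - y`.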
<<5. Example of error factor analysis according to the embodiment of the present disclosure>>
Hereinafter, an example of the error factor analysis according to the embodiment of the present disclosure will be described. The error factor analysis function according to the embodiment of the present disclosure can be realized, for example, as an additional analysis function of the design tools and libraries stored in the design tool storage unit 141. Using the design tool, the administrator of the information processing apparatus 1 inputs a DNN, quantizes the DNN model M1 based on a specific method, and generates the quantized DNN model M2.
The administrator of the information processing apparatus 1 also uses the function introduced into the design tool for analyzing and visualizing the quantization error of each arithmetic layer, in order to collect information for adjusting quantization parameters such as the quantization bit width of each arithmetic layer, or for changing the network structure. The error factors constituting the quantization error appear as differences from the output results and feature maps of the original (pre-quantization) DNN model M1. The information collected for each arithmetic layer can therefore be processed and visualized in various ways and used as auxiliary information for further quantization, such as per-channel quantization.
FIG. 6 is a diagram showing an example of the information display of the design tool according to the embodiment of the present disclosure. FIG. 6 shows an example of the analysis window 121 ("Tensor Board") displayed on the output unit 120 by the design tool. The analysis window 121 shown in FIG. 6 displays a DNN graph GR visualizing the network structure of a quantized DNN model (for example, the quantized DNN model M2) obtained by quantizing a DNN model (for example, the DNN model M1). FIG. 6 shows, as an example, a DNN graph GR representing the network structure of a convolutional neural network. In the DNN graph GR, the blocks "conv1", "conv2", "pool2", "vector", "dense3", and "output" each represent an arithmetic layer of the quantized DNN model (for example, the quantized DNN model M2).
The error factors constituting the quantization error, derived for each arithmetic layer by the calculation unit 152 and propagated to the output layer, have terms of the same dimension and can therefore be summarized and compared using the same evaluation index. The evaluation unit 153 evaluates, based on a predefined evaluation index, the degree of influence of the error factors of each arithmetic layer on the error included in the quantized output of the quantized DNN model (for example, the quantized DNN model M2) (hereinafter referred to as the "output error"). Examples of evaluation indices include the mean or maximum of the element-wise absolute values, and the overall length of the factor (vector) (L1, L∞, L2 norms). By applying such evaluation indices, each error factor constituting the quantization error can be summarized into a scalar value and used to analyze the behavior change accompanying quantization.
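The evaluation indices listed above might be computed as follows (the function name and dictionary keys are illustrative assumptions):

```python
import numpy as np

def summarize(factor):
    """Summarize an error-factor tensor into the scalar evaluation indices
    mentioned above: element-wise |.| statistics and vector norms."""
    flat = factor.ravel()
    a = np.abs(flat)
    return {
        "abs_mean": a.mean(),                      # mean of absolute values
        "abs_max": a.max(),                        # maximum absolute value
        "l1": np.linalg.norm(flat, 1),             # L1 norm
        "l2": np.linalg.norm(flat, 2),             # L2 norm
        "linf": np.linalg.norm(flat, np.inf),      # L∞ norm
    }

e = np.array([[0.1, -0.2], [0.0, 0.4]])            # an example error factor
s = summarize(e)
```

Because every factor is reduced to comparable scalars, the per-layer contributions can then be ranked against thresholds, as done by the contribution signals described below.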
 The evaluation unit 153 then displays the degree of influence (contribution) of each calculation layer's error factor on the output error in a different manner depending on the evaluation result. Specifically, as shown in FIG. 6, the evaluation unit 153 displays quantization error contribution signals SG1 to SG6 at the upper left of the blocks "conv1", "conv2", "pool2", "vector", "dense3", and "output", respectively. The quantization error contribution signals SG1 to SG6 represent the degree (magnitude) of influence of the error factor of each calculation layer, seen relative to the per-layer quantization error (or its evaluation index) within the output error of the quantized DNN model (for example, the quantized DNN model M2). For example, the quantization error contribution signals SG1 to SG6 can indicate the contribution by changing a display attribute such as color or pattern: a signal corresponding to a calculation layer with a high contribution can be displayed in red, one with a medium contribution in yellow, and one with a low contribution in green. The contribution can be evaluated against evaluation thresholds that are preset by the administrator of the information processing apparatus 1 and stored in the evaluation threshold storage unit 142.
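For illustration, the threshold-based classification of a summarized contribution score into the three display colors could be sketched as follows; the threshold values are hypothetical placeholders for those kept in the evaluation threshold storage unit 142:

```python
def classify_contribution(score, low=0.01, high=0.1):
    """Map a summarized error-factor score to a display color.

    The thresholds `low` and `high` are hypothetical; in the apparatus
    they would come from the evaluation threshold storage unit 142.
    """
    if score >= high:
        return "red"      # high contribution to the output error
    if score >= low:
        return "yellow"   # medium contribution
    return "green"        # low contribution
```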
 The evaluation unit 153 also presents detailed information about each error factor of each calculation layer. Specifically, when an operation on, for example, the quantization error contribution signal SG2 is detected in the analysis window 121, the evaluation unit 153 presents, in a pop-up, more detailed error factor information 121-1 for the interior of the calculation layer ("conv2") corresponding to the quantization error contribution signal SG2. The error factor information 121-1 lists, in descending order, the names of the error factors with a high contribution (a large degree of influence), such as "Dxw", "xDw", "Df", and "Dq", based on, for example, a default setting or the magnitude of a previously specified evaluation index. By referring to the error factor information 121-1, the administrator of the information processing apparatus 1 can adjust the network of the quantized DNN model M2 and the like.
 Together with the detailed information about the error factors, the evaluation unit 153 presents advice information for optimizing the quantized DNN model (for example, the quantized DNN model M2). Specifically, in the pop-up displaying the error factor information 121-1, the evaluation unit 153 presents optimization plan information ("Optimization Hint") 121-2 for the quantized DNN model. The optimization plan information 121-2 lists, from the top, optimization hints tied to the error factors with a large influence in that calculation layer, for example "increase the total number of bits of w by x bits", "shift the fractional part of x by a given number of bits", and "increase the number of arithmetic bits by z bits". By selecting a plan from the optimization plan information 121-2, the administrator of the information processing apparatus 1 can adjust the network of the quantized DNN model more easily.
<<6. Example processing procedures of the information processing apparatus>>
 Hereinafter, example processing procedures performed by the information processing apparatus 1 according to the embodiment of the present disclosure will be described with reference to FIGS. 7 and 8. FIG. 7 is a flowchart showing an example procedure of the contribution signal display process according to the embodiment of the present disclosure. FIG. 8 is a flowchart showing an example procedure of the optimization plan information display process according to the embodiment of the present disclosure.
<6-1. Contribution signal display process>
 Hereinafter, the display process for the quantization error contribution signals will be described with reference to FIG. 7. The display process shown in FIG. 7 consists of a procedure PH1 for deriving the quantization error and a procedure PH2 for classifying and displaying the quantization error contribution signals.
 As shown in FIG. 7, the calculation unit 152 receives the data set to be processed, evaluates the DNN model before and after quantization (step S101), and analyzes the error factors (step S102). The error factor analysis derives the error factors that constitute the quantization error of each calculation layer.
 After the error factor analysis, the calculation unit 152 determines whether the evaluation of the entire data set has been completed (step S103).
 If the calculation unit 152 determines that the evaluation of the entire data set has not been completed (step S103; No), it prepares the next data (step S104) and returns to step S101.
 If, on the other hand, the calculation unit 152 determines that the evaluation of the entire data set has been completed (step S103; Yes), it averages the error factors over the entire data set (step S105).
 Subsequently, in response to an operation by the administrator of the information processing apparatus 1, the evaluation unit 153 executes the function of analyzing and visualizing the quantization error of each calculation layer, and displays the DNN graph GR in the analysis window 121 (step S106).
 After the DNN graph GR is displayed, the evaluation unit 153, for example, relativizes the error factors attached to a calculation layer of the DNN graph GR with respect to the original calculation result (step S107), and summarizes each error factor with a specific evaluation index (step S108).
 After summarizing the error factors, the evaluation unit 153 compares the evaluation index values with preset thresholds, classifies the contributions of the error factors constituting the quantization error of the calculation layer (step S109), and displays the quantization error contribution signal SG representing the classification result in the analysis window 121 (step S110).
 Subsequently, the evaluation unit 153 determines whether the quantization error contribution signals of all layers have been displayed (step S111).
 If the evaluation unit 153 determines that the quantization error contribution signals of all layers have not yet been displayed (step S111; No), it moves to a layer whose quantization error contribution signal has not yet been displayed (step S112) and returns to step S107.
 If, on the other hand, the evaluation unit 153 determines that the quantization error contribution signals of all layers have been displayed (step S111; Yes), the display process of the quantization error contribution signals shown in FIG. 7 ends.
 The processes of steps S107 to S112 can be executed in any order, for example starting from the final calculation layer of the DNN model quantized by the design tool (for example, the quantized DNN model M2).
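The core loop of steps S105 to S110 can be sketched as follows; the data layout (a mapping from layer name to per-sample factor vectors) and the mean-absolute summary metric are assumptions made for illustration:

```python
def contribution_signals(per_sample_factors, low=0.01, high=0.1):
    """Sketch of steps S105-S110: average the per-layer error factors
    over the data set, summarize each averaged factor into a scalar,
    and classify the result against preset thresholds.
    """
    signals = {}
    for layer, samples in per_sample_factors.items():
        n = len(samples)
        # Step S105: average the error factors over the whole data set.
        mean_factor = [sum(col) / n for col in zip(*samples)]
        # Step S108: summarize with a specific index (here, mean of |.|).
        score = sum(abs(v) for v in mean_factor) / len(mean_factor)
        # Step S109: classify the contribution against preset thresholds.
        if score >= high:
            signals[layer] = "red"
        elif score >= low:
            signals[layer] = "yellow"
        else:
            signals[layer] = "green"
    return signals
```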
<6-2. Optimization plan information display process>
 Hereinafter, the display process for the optimization plan information will be described with reference to FIG. 8. As shown in FIG. 8, upon detecting an operation by the administrator of the information processing apparatus 1 on a quantization error contribution signal, the evaluation unit 153 ranks the error factors of the corresponding calculation layer based on their evaluation index values (step S201).
 Subsequently, based on the ranking result, the evaluation unit 153 displays, in a pop-up in the analysis window 121, error factor information 121-1 in which the names of the error factors having a large influence on the output error of the quantized DNN model (for example, the quantized DNN model M2) are arranged in descending order (step S202).
 Subsequently, the evaluation unit 153 takes a highly ranked (large-influence) error factor (step S203) and creates a quantization precision allocation hint for the corresponding elements so that the Δ (error) elements (Δx, Δw, etc.) contained in the error factor become smaller (step S204).
 Subsequently, the evaluation unit 153 determines whether the created allocation hint brings the evaluation index value below the threshold at which the influence can be judged to be small (step S205).
 If the evaluation unit 153 determines that the evaluation index value does not fall below the threshold (the threshold at which the influence can be judged to be small) (step S205; No), it returns to step S204 and creates another allocation hint.
 If, on the other hand, the evaluation unit 153 determines that the evaluation index value falls below the threshold (step S205; Yes), it determines whether hints have been created for all of the highly ranked error factors (step S206).
 If the evaluation unit 153 determines that hints have not yet been created for all of the highly ranked error factors (step S206; No), it returns to step S203.
 If, on the other hand, the evaluation unit 153 determines that hints have been created for all of the highly ranked error factors (step S206; Yes), it displays, in the analysis window 121, optimization plan information 121-2 in which the hints tied to the highly ranked error factors are arranged in order (step S207), and the display process of the optimization plan information shown in FIG. 8 ends.
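Under the assumption that each error factor has already been summarized into a scalar index value, the ranking-and-hint loop of steps S201 to S207 might be sketched as follows; the hint wording is purely illustrative:

```python
def optimization_hints(layer_factors, threshold=0.01):
    """Sketch of steps S201-S207: rank the error factors of one
    calculation layer by their evaluation index values and create a
    precision-allocation hint for each factor still above the
    low-impact threshold.
    """
    # Step S201: rank the factors in descending order of index value.
    ranked = sorted(layer_factors.items(), key=lambda kv: kv[1], reverse=True)
    hints = []
    for name, score in ranked:  # Step S203: take highly ranked factors first.
        if score < threshold:   # already below the low-impact threshold
            break
        # Step S204: hint that shrinks the delta element of this factor.
        hints.append("raise quantization precision of the delta term in " + name)
    return hints
```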
<<7. Modifications>>
<7-1. Aggregation of error factors>
 The embodiment described above is only an example, and various changes and applications are possible. In the information processing according to the embodiment of the present disclosure, the number of error factors grows exponentially with the number of calculation layers of the DNN. For a huge DNN model, it is therefore difficult to analyze the error factors within a realistic memory budget. A modification that addresses this problem is described below. FIG. 9 is a diagram showing an outline of the information processing according to the modification.
 As shown in FIG. 9, in the information processing according to the modification, among the error factors Em+1 summarized with a specific evaluation index, only the error factor that records the maximum value (for example, factor m+1_P) is kept separately; the other error factors are aggregated and kept as the factor sum Σ. Repeating this operation up to the output layer finally yields only two factors: the error factor with the maximum evaluation index value, attributable to one particular layer, and the sum of all the other error factors. From this information, the layer that most affects the quantization error in terms of the previously specified evaluation index, and the quantization process within it that is the main cause, can be identified. The designer of the DNN model can therefore iteratively adjust that particular layer and improve the overall performance (for example, image recognition performance) of the quantized DNN model.
 According to the information processing of the modification shown in FIG. 9, after each calculation layer is processed, the error factors are aggregated into two: the one with the maximum evaluation index value and the sum of the rest. Therefore, even if a huge DNN model is designed and the number of calculation layers becomes enormous, the memory area for holding the calculation results is bounded by a constant factor of three times the original, making it possible to analyze the error factors with a realistic memory consumption.
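A minimal sketch of this aggregation rule, assuming a mean-absolute summary index, keeps the maximal factor separate and folds the rest into one element-wise sum:

```python
def aggregate_factors(factors):
    """Keep the factor with the largest summary value separate and
    aggregate all other factors into a single element-wise sum, so each
    layer hands at most two factor vectors to the next layer.
    """
    scores = [sum(abs(v) for v in f) / len(f) for f in factors]
    top = scores.index(max(scores))  # factor with the maximum index value
    rest = [0.0] * len(factors[0])
    for i, f in enumerate(factors):
        if i != top:
            rest = [r + v for r, v in zip(rest, f)]  # the factor sum
    return factors[top], rest
```

However many factors flow in, only two flow out, which is what bounds the memory consumption by a constant multiple.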
<7-2. Other modifications>
 The information processing apparatus 1 according to the embodiment of the present disclosure may be implemented by a dedicated computer system or by a general-purpose computer system.
 The various programs for implementing the information processing method according to the embodiment of the present disclosure may be stored and distributed on a computer-readable recording medium such as an optical disk, semiconductor memory, magnetic tape, or flexible disk. In that case, for example, the information processing apparatus 1 implements the information processing method according to the embodiment of the present disclosure by installing these programs on a computer and executing them.
 The various programs for implementing the information processing method according to the embodiment of the present disclosure may also be stored on a disk device of a server apparatus on a network such as the Internet so that they can be downloaded to a computer. The functions provided by these programs may also be implemented through cooperation between an OS (Operating System) and application software. In that case, the portion other than the OS may be stored on a medium and distributed, or the portion other than the OS may be stored on a server apparatus so that it can be downloaded to a computer.
 Among the processes described in the above embodiment, all or part of the processes described as being performed automatically can also be performed manually, and all or part of the processes described as being performed manually can also be performed automatically by a known method. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified. For example, the various pieces of information shown in each figure are not limited to the illustrated information.
 The components of each illustrated apparatus are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of each apparatus is not limited to the illustrated one, and all or part of it can be functionally or physically distributed or integrated in arbitrary units according to various loads and usage conditions.
 The embodiments described above can be combined as appropriate in areas where the processing contents do not contradict each other. The order of the steps shown in the sequence diagrams and flowcharts of the present embodiment can also be changed as appropriate.
<<8. Hardware configuration>>
 An example hardware configuration of the information processing apparatus according to the embodiment of the present disclosure will be described with reference to FIG. 10. FIG. 10 is a block diagram showing a schematic configuration example of a computer that functions as the information processing apparatus according to the embodiment of the present disclosure. Note that FIG. 10 shows the schematic configuration of a computer that functions as the information processing apparatus 1; some of the components shown in FIG. 10 may be omitted, and components other than those shown in FIG. 10 may further be included.
 As shown in FIG. 10, the computer 200 that functions as the information processing apparatus 1 includes, for example, a CPU 201, a ROM 202, a RAM 203, an interface 204, an input device 205, an output device 206, a storage 207, a drive 208, a port 209, and a communication device 210.
 The CPU 201 functions as, for example, an arithmetic processing device or a control device, and controls all or part of the operation of each component based on various programs recorded in the ROM 202. The programs stored in the ROM 202 may instead be recorded in the storage 207 or in a recording medium 301 connected via the drive 208, in which case the CPU 201 controls all or part of the operation of each component based on the programs stored in the recording medium 301. These programs include programs that provide various functions for implementing the information processing of the information processing apparatus 1.
 The ROM 202 functions as an auxiliary storage device that stores programs read by the CPU 201, data used for calculations, and the like. The RAM 203 functions as a main storage device that temporarily or permanently stores, for example, the programs read by the CPU 201 and various parameters that change as appropriate while those programs are executed.
 The CPU 201, the ROM 202, and the RAM 203, in cooperation with software (the various programs stored in the ROM 202 and the like), can implement the functions of the generation unit 151, the calculation unit 152, the evaluation unit 153, and the like included in the control unit 150 described above.
 The CPU 201, the ROM 202, and the RAM 203 are connected to one another via a bus 211. The bus 211 is connected to each part of the computer 200 via the interface 204.
 The input device 205 is implemented by a device through which a user inputs information, such as a mouse, keyboard, touch panel, buttons, switches, or levers. The input device 205 may be a remote controller capable of transmitting control signals using infrared rays or other radio waves. The input device 205 may also include a voice input device such as a microphone. The function of the input unit 110 described above can be implemented by the input device 205.
 The output device 206 is a device capable of visually or audibly notifying the user of acquired information, such as a CRT, LCD, or organic EL display, an audio output device such as a speaker or headphones, a printer, a mobile phone, or a facsimile. The function of the output unit 120 described above can be implemented by the output device 206.
 The storage 207 is a device for storing various kinds of data; for example, a magnetic storage device such as a hard disk drive (HDD), a semiconductor storage device, an optical storage device, or a magneto-optical storage device is used. The function of the storage unit 140 described above can be implemented by the storage 207.
 The drive 208 is, for example, a device that reads information recorded on a recording medium 301 and writes information to the recording medium 301. The recording medium 301 includes a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, and the like.
 The port 209 is a connection port for connecting an external device 302, and includes a USB (Universal Serial Bus) port, an IEEE 1394 port, a SCSI (Small Computer System Interface) port, an RS-232C port, an optical audio terminal, and the like. The external device 302 includes a printer, a portable music player, a digital camera, a digital video camera, an IC recorder, and the like.
 The communication device 210 is a communication interface for connecting to a network. The communication device 210 is, for example, a communication card for a wired or wireless LAN (Local Area Network), LTE (Long Term Evolution), Bluetooth (registered trademark), or WUSB (Wireless USB). The communication device 210 may also be a router for optical communication, a modem for various kinds of communication, or the like. The function of the communication unit 130 described above can be implemented by the communication device 210.
<<9. Conclusion>>
 As described above, according to an embodiment of the present disclosure, the information processing apparatus 1 includes the generation unit 151, the calculation unit 152, and the evaluation unit 153. The generation unit 151 generates a difference DNN model M3 between a DNN model M1 that has a plurality of calculation layers each outputting a calculation result based on input data and a quantized DNN model M2 obtained by quantizing the DNN model M1. Based on the calculation results of the mutually corresponding calculation layers of the DNN model M1 and the difference DNN model M3, the calculation unit 152 derives, for each calculation layer, the error factors that constitute the quantization error of that calculation layer of the quantized DNN model M2. The evaluation unit 153 evaluates the error factors derived for each calculation layer.
 This makes it possible, when quantizing a DNN composed of a plurality of calculation layers, to analyze in detail the causes of the performance degradation of the DNN due to quantization.
 The calculation unit 152 also derives the error factors of each calculation layer based on the results of inner product operations between the input vector, weight parameters, and bias parameters for the DNN model M1 and the corresponding input vector, weight parameters, and bias parameters in the difference DNN model M3. This makes it possible to express the quantization error generated in each fully connected layer of the DNN as a linear combination of the terms corresponding to the error factors constituting the quantization error, and to propagate it between calculation layers.
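As a hedged illustration of this linear decomposition, the output error of one fully connected neuron with quantized input x+Δx, weights w+Δw, and bias b+Δb splits exactly into the terms x·Δw, Δx·w, Δx·Δw, and Δb; the factor labels mirror the "xDw"/"Dxw" names used above:

```python
def dense_error_factors(x, dx, w, dw, db):
    """Decompose the output error of one fully connected neuron.

    With quantized input x+dx, weights w+dw and bias b+db, the identity
    (x+dx)·(w+dw) + (b+db) - (x·w + b) = x·dw + dx·w + dx·dw + db
    holds exactly, so the factors below sum to the output error.
    """
    dot = lambda a, b: sum(p * q for p, q in zip(a, b))
    return {
        "xDw": dot(x, dw),    # original input times weight error
        "Dxw": dot(dx, w),    # input error times original weights
        "DxDw": dot(dx, dw),  # second-order cross term
        "Db": db,             # bias quantization error
    }
```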
 When a nonlinear operation using an activation function is performed in a calculation layer, the calculation unit 152 approximates the activation function and derives the error factors of the calculation layer based on the approximated activation function. This makes it possible to express the result of the activation function as a linear combination and to propagate it between calculation layers.
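One common way to realize such an approximation, offered here only as an assumption rather than the disclosed method, is a first-order linearization f(x+Δx) ≈ f(x) + f'(x)·Δx; for ReLU this reduces to gating the input error by the activation's sub-gradient:

```python
def relu_error_linearized(x, dx):
    """First-order propagation of an input error through ReLU.

    Approximates f(x+dx) - f(x) by f'(x)*dx; for ReLU this is exact
    whenever x and x+dx lie on the same side of zero.
    """
    grad = 1.0 if x > 0 else 0.0  # ReLU sub-gradient at the original input
    return grad * dx
```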
 When average value pooling is performed in a calculation layer, the calculation unit 152 derives the error factors of the calculation layer in the average value pooling process from the filter size of the filter used for the average value pooling and a linear combination of the elements included in the filter. This makes it possible to express the result of average value pooling as a linear combination and to propagate it between calculation layers.
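Because average value pooling is itself linear, the pooled error is simply the same uniform 1/k combination of the input errors that the pooling applies to the inputs; a minimal sketch:

```python
def avg_pool_error(window_errors):
    """Error propagated through one average-pooling window: the same
    1/k-weighted linear combination that the pooling applies to the
    inputs, where k is the filter size."""
    return sum(window_errors) / len(window_errors)
```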
 When maximum value pooling is performed in a calculation layer, the calculation unit 152 derives the error factors of the calculation layer using the difference between the element that would have been selected as the representative value before quantization and the element actually selected as the representative value after quantization. This makes it possible to treat the result of maximum value pooling as a linear combination and to propagate it between calculation layers.
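A sketch of this difference, under the assumption that pooling selects its representative by a simple argmax over one window:

```python
def max_pool_error(x, dx):
    """Error propagated through one max-pooling window: the difference
    between the representative value selected after quantization and
    the one that would have been selected before quantization."""
    xq = [a + d for a, d in zip(x, dx)]
    before = max(x)   # representative value before quantization
    after = max(xq)   # representative value after quantization
    return after - before
```

Note that the selected index can change: in the first test below, quantization noise demotes the original maximum and a different element becomes the representative.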
 The evaluation unit 153 also evaluates, based on a predefined evaluation index, the degree of influence of each calculation layer's error factors on the output error of the quantized DNN model M2. This makes it possible to identify the calculation layer whose quantization error has a large effect on the output error.
 The evaluation unit 153 also displays the degree of influence of each calculation layer's error factors on the output error of the quantized DNN model M2 in a different manner depending on the evaluation result. This allows the degree of influence of the error factors of each calculation layer to be recognized at a glance.
 The evaluation unit 153 also presents, for each error factor of each calculation layer, detailed information about that error factor. This makes it possible to identify, among the plural factors included in the error factor, the factor with a large influence.
 The evaluation unit 153 also presents, together with the detailed information, advice information for optimizing the quantized DNN model M2. This allows the quantized DNN model M2 to be adjusted more easily.
 When propagating the error factors of each calculation layer to a subsequent calculation layer, the evaluation unit 153 also aggregates the error factors based on predefined conditions. This makes it possible to analyze the error factors with a realistic memory consumption even when the DNN model M1 to be quantized is huge.
 The effects described in the present specification are merely explanatory or illustrative and are not limiting. That is, the technology of the present disclosure may achieve, together with or instead of the above effects, other effects that are obvious to those skilled in the art from the description of the present specification.
 The technology of the present disclosure can also adopt the following configurations, which belong to the technical scope of the present disclosure.
(1)
 An information processing apparatus comprising:
 a generation unit that generates a difference model between a DNN model having a plurality of calculation layers each outputting a calculation result based on input data and a quantized DNN model obtained by quantizing the DNN model;
 a calculation unit that derives, for each calculation layer, error factors constituting the quantization error of each calculation layer of the quantized DNN model, based on the calculation results of the mutually corresponding calculation layers of the DNN model and the difference model; and
 an evaluation unit that evaluates the error factors derived for each calculation layer.
(2)
 The information processing apparatus according to (1), wherein the calculation unit derives the error factors of each calculation layer based on the results of inner product operations between the input vector, weight parameters, and bias parameters for the DNN model and the corresponding input vector, weight parameters, and bias parameters in the difference model.
(3)
 The information processing apparatus according to (2), wherein, when a nonlinear operation using an activation function is performed in a calculation layer, the calculation unit approximates the activation function and derives the error factors of the calculation layer based on the approximated activation function.
(4)
 The information processing apparatus according to (3), wherein, when average value pooling is performed in a calculation layer, the calculation unit derives the error factors of the calculation layer in the average value pooling process from the filter size of the filter used for the average value pooling and a linear combination of the elements included in the filter.
(5)
 The information processing apparatus according to (3), wherein, when maximum value pooling is performed in a calculation layer, the calculation unit derives the error factors of the calculation layer using the difference between the element that would have been selected as the representative value before quantization and the element selected as the representative value after quantization.
(6)
 The information processing apparatus according to any one of (1) to (5), wherein the evaluation unit evaluates, based on a predefined evaluation index, the degree of influence of the error factors of each calculation layer on the output error contained in the quantized output of the quantized DNN model.
(7)
 The information processing apparatus according to (6), wherein the evaluation unit displays the degree of influence of the error factors of each calculation layer on the error contained in the quantized output in a different manner depending on the evaluation result.
(8)
 The information processing apparatus according to (6) or (7), wherein the evaluation unit presents, for each error factor of each calculation layer, detailed information about that error factor.
(9)
 The information processing apparatus according to any one of (6) to (8), wherein the evaluation unit presents, together with the detailed information, advice information for optimizing the quantized DNN model.
(10)
 The information processing apparatus according to (1), wherein, when propagating the error factors of each calculation layer to a subsequent calculation layer, the evaluation unit aggregates the error factors based on predefined conditions.
(11)
 An information processing method in which a processor:
 generates a difference model between a DNN model having a plurality of calculation layers each outputting a calculation result based on input data and a quantized DNN model obtained by quantizing the DNN model;
 derives, for each calculation layer, error factors constituting the quantization error of each calculation layer of the quantized DNN model, based on the calculation results of the mutually corresponding calculation layers of the DNN model and the difference model; and
 evaluates the error factors derived for each calculation layer.
The technology of the present disclosure, as belonging to its technical scope, can also take the following configurations.
(1)
An information processing device comprising:
a generation unit that generates a difference model between a DNN model having a plurality of calculation layers, each outputting a calculation result based on input data, and a quantized DNN model obtained by quantizing the DNN model;
a calculation unit that derives, for each calculation layer, an error factor constituting the quantization error of each calculation layer of the quantized DNN model, based on the calculation results of the mutually corresponding calculation layers of the DNN model and the difference model; and
an evaluation unit that evaluates the error factor derived for each calculation layer.
(2)
The information processing device according to (1), wherein the calculation unit derives the error factor for each calculation layer based on the result of an inner product operation between the input vector, weight parameters, and bias parameters of the DNN model and the corresponding input vector, weight parameters, and bias parameters of the difference model.
(3)
The information processing device according to (2), wherein, when a nonlinear operation by an activation function is performed in a calculation layer, the calculation unit approximates the activation function and derives the error factor for each calculation layer based on the approximated activation function.
(4)
The information processing device according to (3), wherein, when average-value pooling is performed in a calculation layer, the calculation unit derives the error factor for each calculation layer in the average-value pooling from the filter size of the filter used for the average-value pooling and a linear combination of the elements included in the filter.
(5)
The information processing device according to (3), wherein, when maximum-value pooling is performed in a calculation layer, the calculation unit derives the error factor for each calculation layer using the difference between the element that would have been selected as the representative value before quantization and the element selected as the representative value after quantization.
(6)
The information processing device according to any one of (1) to (5), wherein the evaluation unit evaluates, based on a predefined evaluation index, the degree of influence of the error factor for each calculation layer on the output error included in the quantized output of the quantized DNN model.
(7)
The information processing device according to (6), wherein the evaluation unit displays the degree of influence of the error factor for each calculation layer on the error included in the quantized output in a different mode depending on the evaluation result.
(8)
The information processing device according to (6) or (7), wherein the evaluation unit presents, for each of the error factors of each calculation layer, detailed information about that error factor.
(9)
The information processing device according to any one of (6) to (8), wherein the evaluation unit presents, together with the detailed information, advice information for optimizing the quantized DNN model.
(10)
The information processing device according to (1), wherein, when propagating the error factor for each calculation layer to a subsequent calculation layer, the evaluation unit aggregates the error factors based on predefined conditions.
(11)
An information processing method, wherein a processor:
generates a difference model between a DNN model having a plurality of calculation layers, each outputting a calculation result based on input data, and a quantized DNN model obtained by quantizing the DNN model;
derives, for each calculation layer, an error factor constituting the quantization error of each calculation layer of the quantized DNN model, based on the calculation results of the mutually corresponding calculation layers of the DNN model and the difference model; and
evaluates the error factor derived for each calculation layer.
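For a linear layer, the difference-model idea in (1) and (2) can be illustrated numerically with a standard algebraic identity: the quantized output error decomposes exactly into a weight-delta term, an input-delta term, their cross term, and a bias-delta term. The patent gives no code; the sketch below is a minimal plain-Python illustration with toy values, and all names are assumptions.

```python
# For y = w.x + b and its quantized counterpart y_q = w_q.x_q + b_q, with
# dw = w_q - w, dx = x_q - x, db = b_q - b, the identity
#   y_q - y = dw.x + w.dx + dw.dx + db
# splits the per-layer quantization error into separable error factors.
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def error_factors(w, x, b, wq, xq, bq):
    dw = [a - c for a, c in zip(wq, w)]   # weight quantization delta
    dx = [a - c for a, c in zip(xq, x)]   # input quantization delta
    return {
        "weight": dot(dw, x),   # error caused by quantizing the weights
        "input": dot(w, dx),    # error caused by the quantized input
        "cross": dot(dw, dx),   # coupled weight/input term
        "bias": bq - b,         # error caused by quantizing the bias
    }

w, x, b = [0.5, -1.2], [2.0, 1.0], 0.3
wq, xq, bq = [0.5, -1.25], [2.0, 1.0], 0.25   # toy quantized values
factors = error_factors(w, x, b, wq, xq, bq)
# sum(factors.values()) equals (w_q.x_q + b_q) - (w.x + b) exactly.
```

The decomposition is exact for linear operations; handling activation functions and pooling, as in (3) through (5), would require the approximations described above.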
1 Information processing device
110 Input unit
120 Output unit
121 Analysis window
121-1 Error factor information
121-2 Optimization plan information
130 Communication unit
140 Storage unit
141 Design tool storage unit
142 Evaluation threshold storage unit
150 Control unit
151 Generation unit
152 Calculation unit
153 Evaluation unit
200 Computer
201 CPU
202 ROM
203 RAM
204 Interface (I/F)
205 Input device
206 Output device
207 Storage
208 Drive
209 Port
210 Communication device
211 Bus

Claims (11)

  1.  An information processing device comprising:
      a generation unit that generates a difference model between a DNN model having a plurality of calculation layers, each outputting a calculation result based on input data, and a quantized DNN model obtained by quantizing the DNN model;
      a calculation unit that derives, for each calculation layer, an error factor constituting the quantization error of each calculation layer of the quantized DNN model, based on the calculation results of the mutually corresponding calculation layers of the DNN model and the difference model; and
      an evaluation unit that evaluates the error factor derived for each calculation layer.
  2.  The information processing device according to claim 1, wherein
      the calculation unit derives the error factor for each calculation layer based on the result of an inner product operation between the input vector, weight parameters, and bias parameters of the DNN model and the corresponding input vector, weight parameters, and bias parameters of the difference model.
  3.  The information processing device according to claim 2, wherein,
      when a nonlinear operation by an activation function is performed in a calculation layer, the calculation unit approximates the activation function and derives the error factor for each calculation layer based on the approximated activation function.
  4.  The information processing device according to claim 3, wherein,
      when average-value pooling is performed in a calculation layer, the calculation unit derives the error factor for each calculation layer in the average-value pooling from the filter size of the filter used for the average-value pooling and a linear combination of the elements included in the filter.
  5.  The information processing device according to claim 3, wherein,
      when maximum-value pooling is performed in a calculation layer, the calculation unit derives the error factor for each calculation layer using the difference between the element that would have been selected as the representative value before quantization and the element selected as the representative value after quantization.
  6.  The information processing device according to claim 1, wherein
      the evaluation unit evaluates, based on a predefined evaluation index, the degree of influence of the error factor for each calculation layer on the error included in the quantized output of the quantized DNN model.
  7.  The information processing device according to claim 6, wherein
      the evaluation unit displays the degree of influence of the error factor for each calculation layer on the error included in the quantized output in a different mode depending on the evaluation result.
  8.  The information processing device according to claim 7, wherein
      the evaluation unit presents, for each of the error factors of each calculation layer, detailed information about that error factor.
  9.  The information processing device according to claim 8, wherein
      the evaluation unit presents, together with the detailed information, advice information for optimizing the quantized DNN model.
  10.  The information processing device according to claim 1, wherein,
      when propagating the error factor for each calculation layer to a subsequent calculation layer, the calculation unit aggregates the error factors based on predefined conditions.
  11.  An information processing method, wherein a processor:
      generates a difference model between a DNN model having a plurality of calculation layers, each outputting a calculation result based on input data, and a quantized DNN model obtained by quantizing the DNN model;
      derives, for each calculation layer, an error factor constituting the quantization error of each calculation layer of the quantized DNN model, based on the calculation results of the mutually corresponding calculation layers of the DNN model and the difference model; and
      evaluates the error factor derived for each calculation layer.
PCT/JP2021/019876 2020-06-03 2021-05-25 Information processing device and information processing method WO2021246249A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020096987 2020-06-03
JP2020-096987 2020-06-03

Publications (1)

Publication Number Publication Date
WO2021246249A1 true WO2021246249A1 (en) 2021-12-09

Family

ID=78831025

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/019876 WO2021246249A1 (en) 2020-06-03 2021-05-25 Information processing device and information processing method

Country Status (1)

Country Link
WO (1) WO2021246249A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019046072A (en) * 2017-08-31 2019-03-22 Tdk株式会社 Control device for array comprising neuromorphic element, calculation method of discretization step size, and program

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019046072A (en) * 2017-08-31 2019-03-22 Tdk株式会社 Control device for array comprising neuromorphic element, calculation method of discretization step size, and program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HIROSE KAZUTOSHI, ANDO KOTA, UEYOSHI KODAI, IKEBE MASAYUKI, ASAI TETSUYA, MOTOMURA MASATO, TAKAMAEDA-YAMAZAKI SHINYA: "Quantization Error-aware Neural Network Training", JSAI TECHNICAL REPORT, SIG-FPAI, vol. 104, no. SIG-FPAI-B507-01, 1 August 2017 (2017-08-01), pages 1 - 4, XP009533077, DOI: 10.11517/jsaifpai.104.0_01 *
YUKA OU; DAISUKE MURAKAMI; TATSUYA NAKAE; KOTA ANDO; TETSUYA ASAI; MASATO MOTOMURA; SHINYA TAKAMAEDA: "Examination of hardware-oriented accuracy improvement method of binarized neural network", IPSG SIG TECHNICAL REPORT, vol. 2019-ARC-236, no. 10, 4 June 2019 (2019-06-04), JP , pages 1 - 6, XP009532834, ISSN: 2188-8574 *


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21817586

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21817586

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP