JPWO2022201352A5 - Inference device, inference method, and inference program - Google Patents


Info

Publication number
JPWO2022201352A5
Authority
JP
Japan
Prior art keywords
quantization
feature data
inference
quantized
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2023508251A
Other languages
Japanese (ja)
Other versions
JP7350214B2 (en)
JPWO2022201352A1 (en)
Filing date
2021-03-24
Publication date
2023-06-28
Application filed Critical
Priority claimed from PCT/JP2021/012193 external-priority patent/WO2022201352A1/en
Publication of JPWO2022201352A1 publication Critical patent/JPWO2022201352A1/ja
Publication of JPWO2022201352A5 publication Critical patent/JPWO2022201352A5/ja
Application granted granted Critical
Publication of JP7350214B2 publication Critical patent/JP7350214B2/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Description

An inference device according to the present disclosure comprises:
a quantized inference unit that executes, using inference data, at least one quantization operation based on a machine learning technique; and
a non-quantized inference unit that executes, using the inference data, at least one of at least one non-quantization operation corresponding respectively to the at least one quantization operation,
wherein each of the at least one quantization operation is an operation according to a respective piece of quantization feature data indicating a feature of that quantization operation,
each of the at least one non-quantization operation is an operation according to a respective piece of non-quantization feature data indicating a feature of that non-quantization operation, and
the inference device further comprises
a data extraction unit that, when an overflow occurs in at least one of the at least one quantization operation, extracts the quantization feature data corresponding to each quantization operation in which the overflow occurred and the non-quantization feature data corresponding to the non-quantization operation corresponding to each quantization operation in which the overflow occurred.
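The structure described above can be illustrated with a minimal Python sketch. All names, the per-operation scale parameter, and the identity floating-point operation are illustrative assumptions; the disclosure does not prescribe a particular implementation.

```python
import numpy as np


def quantization_op(x, scale, bits=8):
    """One quantization operation: quantize x with the given scale and report overflow.
    (Hypothetical formulation; the disclosure does not fix a quantization scheme.)"""
    limit = 2 ** (bits - 1) - 1
    q = np.round(x / scale)
    overflowed = bool(np.any(q > limit) or np.any(q < -limit - 1))
    return np.clip(q, -limit - 1, limit).astype(np.int32), overflowed


def non_quantization_op(x):
    """The corresponding non-quantization (floating-point) operation: here simply the identity."""
    return np.asarray(x, dtype=np.float32)


def extract_feature_data(x, ops):
    """Data extraction unit: for every quantization operation in which an overflow occurred,
    collect its quantization feature data and the non-quantization feature data of the
    corresponding floating-point operation."""
    extracted = []
    for name, scale in ops:
        _, overflowed = quantization_op(x, scale)     # quantized inference unit
        y_float = non_quantization_op(x)              # non-quantized inference unit
        if overflowed:
            extracted.append({
                "operation": name,
                "quantization_feature": {"scale": scale},
                "non_quantization_feature": {"min": float(y_float.min()),
                                             "max": float(y_float.max())},
            })
    return extracted


# Usage: the second operation's scale cannot represent the value 40.0, so it overflows
# and its feature data (together with the observed float range) is extracted.
inference_data = np.array([0.5, 3.0, -2.5, 40.0])
print(extract_feature_data(inference_data, ops=[("op1", 0.5), ("op2", 0.05)]))
```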

According to the present disclosure, when an overflow occurs during inference based on a machine learning technique, the data extraction unit extracts the quantization feature data and the non-quantization feature data related to the overflow that occurred. Each of the quantization feature data and the non-quantization feature data is data different from the inference data. Therefore, according to the present disclosure, when an overflow occurs while inference is executed using certain inference data, data for analyzing that overflow can be obtained that is different from the inference data itself.

When an overflow occurs in the quantized inference process, the data extraction unit 130 extracts saved data from the quantized inference process and the non-quantized inference process. The saved data is data used to analyze the overflow in the quantized inference process and, as a specific example, consists of input data and feature data. The input data is the data that is input when each of the at least one quantization operation and each of the at least one non-quantization operation is executed. The feature data is data used in the quantization operations. Feature data is a collective term for quantization feature data and non-quantization feature data; it exists for each of the at least one quantization operation and each of the at least one non-quantization operation and, as a specific example, represents the range of the input data for the operation. The range of the input data is the range from the minimum value to the maximum value of the values that data assumed as the input data can take. When an overflow occurs in at least one of the at least one quantization operation, the data extraction unit 130 extracts the quantization feature data corresponding to each quantization operation in which the overflow occurred and the non-quantization feature data corresponding to the non-quantization operation corresponding to each quantization operation in which the overflow occurred.
When the machine learning technique is deep learning, the data extraction unit 130 may extract, as the quantization feature data, data indicating parameters of the layer corresponding to the quantization operation in which the overflow occurred, and may extract, as the non-quantization feature data, data indicating parameters of the layer corresponding to the non-quantization operation corresponding to the quantization operation in which the overflow occurred.
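As a concrete illustration of the saved data, the following hypothetical helper collects, for one layer whose quantization operation overflowed, the input data together with the quantization feature data (the value range the quantization parameters can represent) and the non-quantization feature data (the value range actually observed in the floating-point pass). Function and field names are assumptions for illustration, not an API defined by the disclosure.

```python
import numpy as np


def collect_saved_data(layer_name, layer_input, quant_scale, bits=8):
    """Build the saved data for one layer in which the quantization operation overflowed:
    the input data plus the quantization and non-quantization feature data."""
    limit = 2 ** (bits - 1) - 1
    representable = (-quant_scale * (limit + 1), quant_scale * limit)    # range the quantizer can express
    observed = (float(np.min(layer_input)), float(np.max(layer_input)))  # range seen in the float pass
    return {
        "layer": layer_name,                    # with deep learning, feature data are per-layer parameters
        "input_data": np.asarray(layer_input),  # input to both the quantized and the float operation
        "quantization_feature": {"scale": quant_scale, "range": representable},
        "non_quantization_feature": {"range": observed},
    }


# Example: the observed range (-3.2, 9.7) exceeds the representable range (-1.28, 1.27),
# which is exactly the situation the extracted data is meant to make analyzable.
saved = collect_saved_data("conv1", [0.1, -3.2, 9.7], quant_scale=0.01)
print(saved["quantization_feature"]["range"], saved["non_quantization_feature"]["range"])
```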

Claims (6)

1. An inference device comprising:
a quantized inference unit that executes, using inference data, at least one quantization operation based on a machine learning technique; and
a non-quantized inference unit that executes, using the inference data, at least one of at least one non-quantization operation corresponding respectively to the at least one quantization operation,
wherein each of the at least one quantization operation is an operation according to a respective piece of quantization feature data indicating a feature of that quantization operation,
each of the at least one non-quantization operation is an operation according to a respective piece of non-quantization feature data indicating a feature of that non-quantization operation, and
the inference device further comprises
a data extraction unit that, when an overflow occurs in at least one of the at least one quantization operation, extracts the quantization feature data corresponding to each quantization operation in which the overflow occurred and the non-quantization feature data corresponding to the non-quantization operation corresponding to each quantization operation in which the overflow occurred.
2. The inference device according to claim 1, wherein
each piece of the at least one quantization feature data includes a parameter corresponding to the respective quantization operation, and
each piece of the at least one non-quantization feature data includes a parameter corresponding to the respective non-quantization operation.
3. The inference device according to claim 1 or 2, wherein the machine learning technique is deep learning, and the data extraction unit
extracts, as the quantization feature data, data indicating a parameter of the layer corresponding to the quantization operation in which the overflow occurred, and
extracts, as the non-quantization feature data, data indicating a parameter of the layer corresponding to the non-quantization operation corresponding to the quantization operation in which the overflow occurred.
4. The inference device according to any one of claims 1 to 3, further comprising
a requantization unit that, based on the extracted quantization feature data and non-quantization feature data, modifies the quantization feature data corresponding to a target operation, which is the quantization operation corresponding to the extracted quantization feature data, so that no overflow occurs in the target operation,
wherein the quantized inference unit executes a quantization operation according to the modified quantization feature data.
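A minimal sketch of what such a requantization unit could do, assuming symmetric, scale-only quantization (the claim does not prescribe any particular formula): recompute the scale of the target operation from the range recorded in the extracted non-quantization feature data so that the quantized values fit within the signed integer range.

```python
def requantize_scale(observed_min: float, observed_max: float, bits: int = 8) -> float:
    """Return a modified quantization scale for the target operation so that the range
    recorded in the non-quantization feature data no longer overflows the integer range."""
    limit = 2 ** (bits - 1) - 1
    max_abs = max(abs(observed_min), abs(observed_max))
    return max_abs / limit if max_abs > 0 else 1.0


# Example: the floating-point pass observed values in [-3.2, 9.7]; with the modified scale,
# 9.7 maps to the maximum representable value 127, so the quantization operation no longer overflows.
new_scale = requantize_scale(-3.2, 9.7)
print(new_scale)  # approximately 0.0764
```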
5. An inference method comprising:
executing, by a computer, at least one quantization operation based on a machine learning technique using inference data; and
executing, by the computer, at least one of at least one non-quantization operation corresponding respectively to the at least one quantization operation using the inference data,
wherein each of the at least one quantization operation is an operation according to a respective piece of quantization feature data indicating a feature of that quantization operation,
each of the at least one non-quantization operation is an operation according to a respective piece of non-quantization feature data indicating a feature of that non-quantization operation, and
the computer, when an overflow occurs in at least one of the at least one quantization operation, extracts the quantization feature data corresponding to each quantization operation in which the overflow occurred and the non-quantization feature data corresponding to the non-quantization operation corresponding to each quantization operation in which the overflow occurred.
6. An inference program that causes an inference device, which is a computer, to execute:
quantized inference processing that executes, using inference data, at least one quantization operation based on a machine learning technique; and
non-quantized inference processing that executes, using the inference data, at least one of at least one non-quantization operation corresponding respectively to the at least one quantization operation,
wherein each of the at least one quantization operation is an operation according to a respective piece of quantization feature data indicating a feature of that quantization operation,
each of the at least one non-quantization operation is an operation according to a respective piece of non-quantization feature data indicating a feature of that non-quantization operation, and
the inference program further causes the inference device to execute
data extraction processing that, when an overflow occurs in at least one of the at least one quantization operation, extracts the quantization feature data corresponding to each quantization operation in which the overflow occurred and the non-quantization feature data corresponding to the non-quantization operation corresponding to each quantization operation in which the overflow occurred.
JP2023508251A 2021-03-24 2021-03-24 Inference device, inference method, and inference program Active JP7350214B2 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/012193 WO2022201352A1 (en) 2021-03-24 2021-03-24 Inference device, inference method, and inference program

Publications (3)

Publication Number Publication Date
JPWO2022201352A1 JPWO2022201352A1 (en) 2022-09-29
JPWO2022201352A5 JPWO2022201352A5 (en) 2023-06-28
JP7350214B2 JP7350214B2 (en) 2023-09-25

Family

ID=83396538

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2023508251A Active JP7350214B2 (en) 2021-03-24 2021-03-24 Inference device, inference method, and inference program

Country Status (3)

Country Link
JP (1) JP7350214B2 (en)
TW (1) TW202238458A (en)
WO (1) WO2022201352A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10552119B2 (en) 2016-04-29 2020-02-04 Intel Corporation Dynamic management of numerical representation in a distributed matrix processor architecture
JP7045947B2 (en) * 2018-07-05 2022-04-01 株式会社日立製作所 Neural network learning device and learning method
KR20200043169A (en) * 2018-10-17 2020-04-27 삼성전자주식회사 Method and apparatus for quantizing neural network parameters
