JP7350214B2

JP7350214B2 - Inference device, inference method, and inference program

Info

Publication number: JP7350214B2
Application number: JP2023508251A
Authority: JP
Inventors: 昌弘出口; 武尚水口
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2021-03-24
Filing date: 2021-03-24
Publication date: 2023-09-25
Anticipated expiration: 2041-03-24
Also published as: TW202238458A; JPWO2022201352A1; WO2022201352A1

Description

The present disclosure relates to an inference device, an inference method, and an inference program.

When a deep learning inference environment is installed on a resource-constrained device such as an embedded device, the computations involved in deep learning must be made lightweight. One way to do so is to replace floating-point operations with fixed-point or integer operations, a technique generally called quantization. Patent Document 1 discloses a technique that estimates the data distribution by a statistical method so that quantization for speeding up inference can be performed with relatively high accuracy.

Patent Document 1: Japanese Patent Application Publication No. 2018-010618

According to the technique disclosed in Patent Document 1, parameters and the like of the inference computations are adjusted during quantization so that no overflow occurs when a training data set or the like is used. However, a training data set or the like cannot cover every possible piece of inference data. The technique of Patent Document 1 therefore cannot guarantee that no overflow occurs regardless of the inference data used, and an overflow may occur depending on the inference data. Moreover, because inference data may be confidential, even when an overflow occurs during inference performed with certain inference data, it is not always possible to analyze that overflow using the inference data itself.
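The problem described above can be illustrated with a minimal sketch. It assumes an 8-bit signed fixed-point format whose number of fractional bits is chosen from calibration (training) values; the function names `to_fixed`, `fits`, and `choose_frac_bits` are hypothetical and do not come from the disclosure or from Patent Document 1.

```python
def to_fixed(x, frac_bits):
    """Convert a float to a fixed-point integer with frac_bits fractional bits."""
    return round(x * (1 << frac_bits))


def fits(q, bits=8):
    """Return True if q is representable as a signed integer of the given width."""
    return -(1 << (bits - 1)) <= q <= (1 << (bits - 1)) - 1


def choose_frac_bits(calibration, bits=8):
    """Pick the largest frac_bits for which every calibration value fits."""
    for frac_bits in range(bits - 1, -1, -1):
        if all(fits(to_fixed(v, frac_bits), bits) for v in calibration):
            return frac_bits
    raise ValueError("calibration values do not fit in the given width")


# Parameters tuned on calibration data only (values are illustrative).
frac_bits = choose_frac_bits([-1.5, 0.25, 3.0])

# A calibration value fits, but a larger value seen only at inference time does not.
calibration_ok = fits(to_fixed(3.0, frac_bits))
inference_ok = fits(to_fixed(5.0, frac_bits))
```

In this sketch the format is sized for values up to 3.0, so an inference-time value of 5.0 falls outside the representable range, which is the kind of overflow the disclosure is concerned with.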

An object of the present disclosure is to acquire, when an overflow occurs during inference performed with certain inference data, data that can be used to analyze the overflow and that differs from that inference data.

The inference device according to the present disclosure includes:
a quantization inference unit that executes, using inference data, at least one quantization operation based on a machine learning technique; and
a non-quantization inference unit that executes, using the inference data, at least one of the non-quantization operations that respectively correspond to the at least one quantization operation.
Each quantization operation is an operation performed in accordance with the corresponding piece of quantization feature data, which indicates a characteristic of that quantization operation.
Each non-quantization operation is an operation performed in accordance with the corresponding piece of non-quantization feature data, which indicates a characteristic of that non-quantization operation.
The inference device further includes:
a data extraction unit that, when an overflow occurs in any of the at least one quantization operation, extracts the quantization feature data corresponding to each quantization operation in which the overflow occurred and the non-quantization feature data corresponding to the non-quantization operation that corresponds to each such quantization operation.

According to the present disclosure, when an overflow occurs in inference based on a machine learning technique, the data extraction unit extracts the quantization feature data and the non-quantization feature data related to that overflow. Both the quantization feature data and the non-quantization feature data differ from the inference data. Therefore, according to the present disclosure, when an overflow occurs during inference performed with certain inference data, data that can be used to analyze the overflow and that differs from that inference data can be acquired.

FIG. 1 is a diagram showing a configuration example of the inference device 100 according to Embodiment 1.
FIG. 2 is a diagram illustrating process priorities according to Embodiment 1.
FIG. 3 is a diagram showing a hardware configuration example of the inference device 100 according to Embodiment 1.
FIG. 4 is a flowchart showing the operation of the inference device 100 according to Embodiment 1.
FIG. 5 is a diagram illustrating the operation of the inference device 100 according to Embodiment 1.
FIG. 6 is a diagram showing a hardware configuration example of the inference device 100 according to a modification of Embodiment 1.
FIG. 7 is a diagram showing a hardware configuration example of the inference device 100 according to a modification of Embodiment 1.
FIG. 8 is a diagram showing a configuration example of the inference device 100 according to Embodiment 2.
FIG. 9 is a flowchart showing the operation of the inference device 100 according to Embodiment 2.

In the description of the embodiments and in the drawings, identical and corresponding elements are denoted by the same reference numerals. Descriptions of elements with the same reference numerals are omitted or simplified as appropriate. Arrows in the figures mainly indicate the flow of data or of processing. The term "unit" may be read as "circuit," "step," "procedure," "process," or "circuitry" as appropriate.

Embodiment 1.
This embodiment is described in detail below with reference to the drawings.

***Description of Configuration***
FIG. 1 shows a configuration example of the inference device 100 according to this embodiment. As shown in the figure, the inference device 100 includes a quantization inference unit 110, a non-quantization inference unit 120, and a data extraction unit 130.
The inference device 100 is typically part of an embedded system. The inference device 100 may also include an inference process management unit that controls the quantization inference unit 110 and the non-quantization inference unit 120.

The quantization inference unit 110 executes a quantization inference process; that is, it executes at least one quantization operation based on a machine learning technique. Each quantization operation is performed in accordance with the corresponding piece of quantization feature data. Each piece of quantization feature data indicates a characteristic of the corresponding quantization operation and may include parameters for that operation. The quantization inference unit 110 uses inference data, which is the data input to the trained model when inference is executed.
The quantization inference process is also called a quantized inference process. A quantization algorithm, also called a quantization inference algorithm, is executed in the quantization inference process. When an overflow or the like occurs in the quantization inference process, the quantization inference unit 110 records information about it.

The non-quantization inference unit 120 executes a non-quantization inference process; that is, it executes at least one of the non-quantization operations that respectively correspond to the at least one quantization operation. The quantization operations and the non-quantization operations are based on the same machine learning technique. When the machine learning technique is a neural network, each quantization operation and each non-quantization operation is the operation of one layer of the neural network. Each non-quantization operation is performed in accordance with the corresponding piece of non-quantization feature data. Each piece of non-quantization feature data indicates a characteristic of the corresponding non-quantization operation and may include parameters for that operation. The non-quantization inference unit 120 uses the inference data and the save data described later.
The non-quantization inference process is also called an unquantized inference process. A non-quantization algorithm, also called a non-quantization inference algorithm, is executed in the non-quantization inference process. The non-quantization algorithm is the algorithm obtained as a result of training with training data. The quantization algorithm and the non-quantization algorithm are basically the same: the quantization algorithm is the non-quantization algorithm modified as appropriate to support quantization. Quantization typically means replacing floating-point operations with fixed-point or integer operations. Each of the two algorithms may be generated by the inference device 100 or by another device. The non-quantization inference unit 120 executes operations in accordance with, among other things, the input data that was in use when the overflow occurred in the quantization inference process.

When an overflow occurs in the quantization inference process, the data extraction unit 130 extracts save data from the quantization inference process and the non-quantization inference process. The save data is data used to analyze the overflow in the quantization inference process and, as a specific example, consists of input data and feature data. The input data is the data input when each quantization operation and each non-quantization operation is executed. The feature data is data used in the quantization computation. "Feature data" is a collective term for the quantization feature data and the non-quantization feature data; a piece of feature data exists for each quantization operation and each non-quantization operation and, as a specific example, represents the swing range of the input data of that operation. The swing range of the input data is the range from the minimum to the maximum of the values that data assumed as input may take. When an overflow occurs in any of the at least one quantization operation, the data extraction unit 130 extracts the quantization feature data corresponding to each quantization operation in which the overflow occurred and the non-quantization feature data corresponding to the non-quantization operation that corresponds to each such quantization operation.
When the machine learning technique is deep learning, the data extraction unit 130 may extract, as the quantization feature data, data indicating the parameters of the layer corresponding to the quantization operation in which the overflow occurred, and may extract, as the non-quantization feature data, data indicating the parameters of the layer corresponding to the non-quantization operation that corresponds to that quantization operation.
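The extraction performed by the data extraction unit 130 can be sketched as follows. This is only an illustration under assumptions: the `FeatureData` record (holding the assumed minimum and maximum of an operation's input) and the function `extract_feature_data` are hypothetical names, and the numeric ranges are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureData:
    """Hypothetical record: the value range assumed for one operation's input."""
    min_value: float
    max_value: float


def extract_feature_data(overflowed_ops, quantization_features, non_quantization_features):
    """Return the feature-data pair for every operation that overflowed.

    Both feature arguments map an operation index to its FeatureData.
    The inference data itself is never part of the result.
    """
    return {
        op: (quantization_features[op], non_quantization_features[op])
        for op in sorted(overflowed_ops)
    }


# Illustrative feature data for two operations (layers 0 and 1).
quantization_features = {0: FeatureData(-4.0, 3.96875), 1: FeatureData(-8.0, 7.9375)}
non_quantization_features = {0: FeatureData(-4.0, 4.0), 1: FeatureData(-8.0, 8.0)}

# Suppose only operation 1 overflowed in the quantization inference process.
extracted = extract_feature_data({1}, quantization_features, non_quantization_features)
```

Only the entries for the overflowed operation are returned, which mirrors the point that the extracted data differs from the inference data.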

FIG. 2 shows the priority of each process executed by the inference device 100. As shown in the figure, the quantization inference process has a higher priority than the non-quantization inference process. There may also be a process whose priority is lower than that of the quantization inference process and higher than that of the non-quantization inference process. The parentheses around the label "other process" indicate that such a process may or may not be present.

FIG. 3 shows a hardware configuration example of the inference device 100 according to this embodiment. The inference device 100 consists of a computer. The inference device 100 may consist of a plurality of computers.

As shown in the figure, the inference device 100 is a computer that includes hardware such as a processor 11, a memory 12, an auxiliary storage device 13, an input/output IF (Interface) 14, and a communication device 15. These pieces of hardware are connected as appropriate via a signal line 19.

The processor 11 is an IC (Integrated Circuit) that performs arithmetic processing and controls the hardware of the computer. Specific examples of the processor 11 are a CPU (Central Processing Unit), a DSP (Digital Signal Processor), and a GPU (Graphics Processing Unit).
The inference device 100 may include a plurality of processors in place of the processor 11. The plurality of processors share the role of the processor 11.

The memory 12 is typically a volatile storage device. The memory 12 is also called a main storage device or main memory. A specific example of the memory 12 is a RAM (Random Access Memory). Data stored in the memory 12 is saved to the auxiliary storage device 13 as needed.

The auxiliary storage device 13 is typically a nonvolatile storage device. Specific examples of the auxiliary storage device 13 are a ROM (Read Only Memory), an HDD (Hard Disk Drive), and a flash memory. Data stored in the auxiliary storage device 13 is loaded into the memory 12 as needed.
The memory 12 and the auxiliary storage device 13 may be configured as a single unit.

The input/output IF 14 is a port to which input devices and output devices are connected. A specific example of the input/output IF 14 is a USB (Universal Serial Bus) terminal. Specific examples of input devices are a camera, a keyboard, and a mouse. A specific example of an output device is a display.

The communication device 15 is a receiver and a transmitter. Specific examples of the communication device 15 are a communication chip and a NIC (Network Interface Card).

Each unit of the inference device 100 may use the communication device 15 as appropriate when communicating with other devices. Each unit of the inference device 100 may receive data via the input/output IF 14 or via the communication device 15.

The auxiliary storage device 13 stores an inference program. The inference program causes a computer to implement the functions of the units of the inference device 100. The inference program is loaded into the memory 12 and executed by the processor 11. The functions of the units of the inference device 100 are implemented by software.
The auxiliary storage device 13 also stores an OS (Operating System). At least part of the OS is loaded into the memory 12 and executed by the processor 11. That is, the processor 11 executes the inference program while executing the OS.

Data used when the inference program is executed, data obtained by executing the inference program, and the like are stored in a storage device as appropriate. Each unit of the inference device 100 uses the storage device as appropriate. As a specific example, the storage device consists of at least one of the memory 12, the auxiliary storage device 13, a register in the processor 11, and a cache memory in the processor 11. The terms "data" and "information" may be used with the same meaning. The storage device may be independent of the computer.
The functions of the memory 12 and the auxiliary storage device 13 may be implemented by other storage devices.

The inference program may be recorded on a computer-readable nonvolatile recording medium. Specific examples of the nonvolatile recording medium are an optical disc and a flash memory. The inference program may be provided as a program product.

***Description of Operation***
The operating procedure of the inference device 100 corresponds to an inference method. A program that implements the operation of the inference device 100 corresponds to an inference program.

FIG. 4 is a flowchart showing an example of the operation of the inference device 100. FIG. 5 schematically illustrates an example of the operation of the inference device 100. In FIG. 5, the parallelograms represent the data used in the operations, excluding the output data. The operation of the inference device 100 is described with reference to FIGS. 4 and 5. This embodiment is applicable to any machine learning technique that can use quantized data; for convenience, the description of the operation of the inference device 100 assumes that the technique is deep learning.

(Step S101)
The quantization inference process of the quantization inference unit 110 is started.

(Step S102)
If there is an instruction to start inference, the inference device 100 proceeds to step S103. Otherwise, the inference device 100 executes this step again.

(Step S103)
The quantization inference unit 110 starts inference processing using the inference data. The inference data may be data that the inference device 100 received together with the instruction to start inference.

(Step S104)
The quantization inference unit 110 executes the layer operation for the target layer. The layers are layers in deep learning. In the loop formed by steps S104 through S106, the target layer is the first layer when a layer operation is executed for the first time; otherwise, it is the layer following the target layer of the immediately preceding layer operation.
The operations in FIG. 5 correspond to layer operations. The quantization inference unit 110 executes each layer operation using, as input data, the data shown to the left of the thick arrow directly below the operation, and outputs the data shown to the right of that arrow.

(Step S105)
The quantization inference unit 110 checks whether an overflow occurred in the processing of step S104. Specific examples of how to check for an overflow are referring to a CPU flag, checking the variables involved in the operation before and after the operation, and checking with dedicated circuitry such as an FPGA.
If an overflow occurred, the inference device 100 proceeds to step S109. Otherwise, the inference device 100 proceeds to step S106.
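Of the checking methods listed in step S105, the one that inspects the variables before and after the operation can be sketched as follows for a signed addition. This is a minimal illustration, not the method of the disclosure: Python integers do not overflow, so the hypothetical helper `wrap` emulates a fixed-width two's-complement register, and `add_with_overflow_check` is likewise an assumed name.

```python
def wrap(value, bits=32):
    """Emulate two's-complement wraparound of a signed register of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def add_with_overflow_check(a, b, bits=32):
    """Add two signed values and report whether the addition overflowed.

    Overflow is detected from the operands and the result alone:
    it occurred if both operands have the same sign and the wrapped
    result has the opposite sign.
    """
    result = wrap(a + b, bits)
    same_sign_operands = (a >= 0) == (b >= 0)
    sign_changed = (result >= 0) != (a >= 0)
    return result, same_sign_operands and sign_changed


# 8-bit examples: 100 + 50 exceeds 127, 100 - 50 does not.
overflow_case = add_with_overflow_check(100, 50, bits=8)
normal_case = add_with_overflow_check(100, -50, bits=8)
negative_case = add_with_overflow_check(-100, -50, bits=8)
```

On real hardware the same information is typically available more cheaply from a status flag or from dedicated circuitry, which are the other two options named in step S105.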

(Step S106)
If all layer operations have finished, the inference device 100 proceeds to step S107. Otherwise, the inference device 100 returns to step S104.

(Step S107)
The quantization inference unit 110 notifies the party that requested the inference of the inference result as appropriate.

(Step S108)
If an overflow occurred in at least one of the layer operations, the inference device 100 proceeds to step S111. Otherwise, the inference device 100 returns to step S102.

(Step S109)
The data extraction unit 130 saves the save data corresponding to the target layer. As a specific example, the data extraction unit 130 extracts data about the target layer and saves the extracted data as save data. Specific examples of the data about the target layer are the input data to the target layer, the variable data in the target layer, and the feature data in the target layer. However, when the target layer is the first layer, its input data is the inference data, so the data extraction unit 130 does not extract the input data as save data. The data extraction unit 130 may nevertheless save the inference data. Here, the inference data is the data input to the first layer.
FIG. 5 shows the data extraction unit 130 saving, as the save data corresponding to the target layer, the data about the target layer in which the overflow occurred and the data about the layer immediately before it. The data saved by the data extraction unit 130 may be only the data about the target layer, or may be the data about each layer from the target layer back to the layer n layers before it (n is a natural number).
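The selection of layers in step S109 can be sketched as follows. The record layout (a dictionary with the keys `input`, `variables`, and `feature` per layer), the function name `collect_save_data`, and the sample values are assumptions made for illustration. The sketch takes the option in which the input of the first layer is not saved because it is the inference data itself.

```python
def collect_save_data(layer_records, target, n_previous=1):
    """Collect save data for the target layer and up to n_previous layers before it.

    layer_records[i] is a dict with the keys "input", "variables", and "feature"
    describing layer i. The input of layer 0 is the inference data, so it is
    dropped from the save data.
    """
    first = max(0, target - n_previous)
    saved = []
    for index in range(first, target + 1):
        record = dict(layer_records[index])  # copy so the original is untouched
        if index == 0:
            record.pop("input", None)
        saved.append({"layer": index, **record})
    return saved


# Illustrative per-layer records for a three-layer model.
layer_records = [
    {"input": [10, 30], "variables": {"acc": 60}, "feature": (-128, 127)},
    {"input": [20, 60], "variables": {"acc": 180}, "feature": (-128, 127)},
    {"input": [60, 127], "variables": {"acc": 117}, "feature": (-128, 127)},
]

# Overflow in layer 1: save layer 1 and the one layer before it.
save_data = collect_save_data(layer_records, target=1, n_previous=1)
```

Setting `n_previous=0` corresponds to saving only the target layer, and larger values correspond to saving the n preceding layers as well.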

(Step S110)
The quantization inference unit 110 executes a saturation operation and then proceeds to step S106.
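The loop formed by steps S104 through S106, together with the overflow branch through steps S109 and S110, can be sketched as follows. This is a simplified illustration under assumptions: each layer is modeled as a hypothetical element-wise integer function, the overflow check compares the unbounded result against an 8-bit signed range, and step S109 is reduced to recording the index of the overflowed layer.

```python
def run_quantized_inference(layers, inference_data, bits=8):
    """Run all layers, saturating and recording the layer index on overflow."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    overflowed_layers = []
    data = list(inference_data)
    for index, layer in enumerate(layers):
        raw = [layer(v) for v in data]                # S104: layer operation
        if any(v < lo or v > hi for v in raw):        # S105: overflow check
            overflowed_layers.append(index)           # S109: save (index only here)
            raw = [max(lo, min(hi, v)) for v in raw]  # S110: saturation operation
        data = raw                                    # S106: move to the next layer
    return data, overflowed_layers


# Three illustrative layers; the second one pushes a value past 127.
layers = [lambda v: v * 2, lambda v: v * 3, lambda v: v - 10]
output, overflowed_layers = run_quantized_inference(layers, [10, 30])
```

Because the overflowed value is clamped to the nearest representable value, inference continues to the last layer and still produces a result, while the recorded indices indicate that step S108 should branch to step S111.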

(Step S111)
The non-quantization inference process of the non-quantization inference unit 120 is started.
From this point, the inference device 100 executes the processing from step S102 onward and the processing from step S121 onward in parallel.

(Step S121)
The non-quantization inference unit 120 executes the non-quantization inference process using the inference data.
Although FIG. 5 shows the non-quantization inference unit 120 producing output data by executing the non-quantization inference process to the end, it is sufficient for the non-quantization inference unit 120 to execute layer operations up to the layer corresponding to the layer in which the overflow occurred in the quantization inference process.
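The remark that the non-quantization inference process only needs to run up to the overflowed layer can be sketched as follows. The layers are again modeled as hypothetical element-wise functions, this time on floats, and the name `run_non_quantized` is an assumption.

```python
def run_non_quantized(layers, inference_data, last_layer):
    """Run the floating-point layers 0..last_layer and keep each layer's input.

    Returns (layer_inputs, outputs), where layer_inputs[i] is the input that
    layer i received and outputs is the output of layer last_layer.
    """
    data = list(inference_data)
    layer_inputs = []
    for layer in layers[: last_layer + 1]:
        layer_inputs.append(data)
        data = [layer(v) for v in data]
    return layer_inputs, data


# Floating-point counterparts of three illustrative layers.
float_layers = [lambda v: v * 2.0, lambda v: v * 3.0, lambda v: v - 10.0]

# The quantization inference process overflowed in layer 1, so stop there.
layer_inputs, layer_output = run_non_quantized(float_layers, [10.0, 30.0], last_layer=1)
```

Stopping early avoids computation that contributes nothing to the save data, which may matter because the non-quantization inference process runs at a lower priority on a resource-constrained device.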

(Step S122)
The data extraction unit 130 extracts, as save data, the data about the layer corresponding to the layer in which the overflow occurred. Here, the data extraction unit 130 saves data about the same layers as those for which data was saved in the quantization inference process. As a specific example, as shown in FIG. 5, when the data extraction unit 130 has saved, in the quantization inference process, the data about the layer in which the overflow occurred and the data about the layer immediately before it, the data extraction unit 130 saves as save data the data about the layers of the non-quantization inference process that correspond to those two layers.
The data from the quantization inference process and the data from the non-quantization inference process saved by the data extraction unit 130 may be output to the outside of the inference device 100. An engineer or the like may obtain the output data and, based on it, reconfigure the parameters of the quantization inference process or change the parameters related to the quantization inference process in the source code.
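One conceivable use of the save data, shown here purely as an assumption and not as a procedure stated in the disclosure, is to widen the input range assumed by a layer's feature data so that it also covers the values observed in the non-quantization inference process. The function name `widened_range` and the values are hypothetical.

```python
def widened_range(assumed_range, observed_values):
    """Return the smallest range covering both the assumed range and the observed values."""
    lo, hi = assumed_range
    return min(lo, min(observed_values)), max(hi, max(observed_values))


# Assumed range from the quantization feature data, and values observed
# at the same layer in the non-quantization inference process.
new_range = widened_range((-4.0, 3.96875), [0.5, 5.0])
```

An engineer could then re-derive the quantization parameters of that layer from the widened range.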

(Step S123)
The non-quantization inference process of the non-quantization inference unit 120 is terminated.

***Description of Effects of Embodiment 1***
As described above, according to this embodiment, even when the inference data is confidential, the inference data is not extracted; only the data about the operation in which the overflow occurred is extracted. The data about the operation in which the overflow occurred is data from which the confidentiality present in the inference data has been removed. Therefore, according to this embodiment, an overflow that occurred in a quantization operation can be analyzed while accuracy in the inference environment is maintained and confidentiality is preserved. Consequently, the data needed to improve the trained model so that it copes with the overflow can be acquired.
Furthermore, according to this embodiment, parameters and the like in machine learning that involves quantization can be adjusted based not only on training data and the like but also on actual inference data.

＊＊＊他の構成＊＊＊
＜変形例１＞
量子化推論部１１０は、オーバーフローが発生した場合に推論を途中で打ち切ってもよい。具体例として、量子化推論部１１０は、少なくとも１回の量子化演算のいずれかにおいてオーバーフローが発生した場合に、オーバーフローが発生した演算よりも後の演算を実行しない。***Other configurations***
<Modification 1>
The quantization inference unit 110 may terminate inference midway if overflow occurs. As a specific example, if an overflow occurs in at least one quantization operation, the quantization inference unit 110 does not execute the operation subsequent to the operation in which the overflow occurred.

＜変形例２＞
図６は、本変形例に係る推論装置１００のハードウェア構成例を示している。本変形例に係る推論装置１００は、本図に示すように、オフロードデバイス１６を備える。
量子化推論プロセスは、オフロードデバイス１６によって実行されてもよい。オフロードデバイス１６は具体例として、ＧＰＵ又はＦＰＧＡである。本変形例において、非量子化推論プロセスはプロセッサ１１で実行されていてもよく、このとき、プロセッサ１１とオフロードデバイス１６との間で適宜通信が実行される。
<Modification 2>
FIG. 6 shows an example of the hardware configuration of the inference device 100 according to this modification. The inference device 100 according to this modification includes an offload device 16, as shown in the figure.
The quantization inference process may be performed by offload device 16. The offload device 16 is, for example, a GPU or an FPGA. In this modification, the non-quantized inference process may be executed by the processor 11, and at this time, communication is executed between the processor 11 and the offload device 16 as appropriate.

＜変形例３＞
図７は、本変形例に係る推論装置１００のハードウェア構成例を示している。
推論装置１００は、プロセッサ１１、プロセッサ１１とメモリ１２、プロセッサ１１と補助記憶装置１３、あるいはプロセッサ１１とメモリ１２と補助記憶装置１３に代えて、処理回路１８を備える。
処理回路１８は、推論装置１００が備える各部の少なくとも一部を実現するハードウェアである。
処理回路１８は、専用のハードウェアであってもよく、また、メモリ１２に格納されるプログラムを実行するプロセッサであってもよい。
<Modification 3>
FIG. 7 shows an example of the hardware configuration of the inference device 100 according to this modification.
The inference device 100 includes a processing circuit 18 in place of the processor 11; the processor 11 and the memory 12; the processor 11 and the auxiliary storage device 13; or the processor 11, the memory 12, and the auxiliary storage device 13.
The processing circuit 18 is hardware that implements at least a portion of each unit included in the inference device 100.
Processing circuit 18 may be dedicated hardware or may be a processor that executes a program stored in memory 12.

処理回路１８が専用のハードウェアである場合、処理回路１８は、具体例として、単一回路、複合回路、プログラム化したプロセッサ、並列プログラム化したプロセッサ、ＡＳＩＣ（ＡＳＩＣはＡｐｐｌｉｃａｔｉｏｎＳｐｅｃｉｆｉｃＩｎｔｅｇｒａｔｅｄＣｉｒｃｕｉｔ）、ＦＰＧＡ（ＦｉｅｌｄＰｒｏｇｒａｍｍａｂｌｅＧａｔｅＡｒｒａｙ）又はこれらの組み合わせである。
推論装置１００は、処理回路１８を代替する複数の処理回路を備えてもよい。複数の処理回路は、処理回路１８の役割を分担する。
When the processing circuit 18 is dedicated hardware, the processing circuit 18 is, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a combination thereof.
The inference device 100 may include a plurality of processing circuits in place of the processing circuit 18. The plurality of processing circuits share the role of the processing circuit 18.

推論装置１００において、一部の機能が専用のハードウェアによって実現されて、残りの機能がソフトウェア又はファームウェアによって実現されてもよい。 In the inference device 100, some functions may be realized by dedicated hardware, and the remaining functions may be realized by software or firmware.

処理回路１８は、具体例として、ハードウェア、ソフトウェア、ファームウェア、又はこれらの組み合わせにより実現される。
プロセッサ１１とメモリ１２と補助記憶装置１３と処理回路１８とを、総称して「プロセッシングサーキットリー」という。つまり、推論装置１００の各機能構成要素の機能は、プロセッシングサーキットリーにより実現される。
他の実施の形態に係る推論装置１００についても、本変形例と同様の構成であってもよい。
The processing circuit 18 is implemented, for example, by hardware, software, firmware, or a combination thereof.
The processor 11, the memory 12, the auxiliary storage device 13, and the processing circuit 18 are collectively referred to as "processing circuitry." That is, the function of each functional component of the inference device 100 is realized by processing circuitry.
The inference device 100 according to other embodiments may also have the same configuration as this modification.

実施の形態２．
以下、主に前述した実施の形態と異なる点について、図面を参照しながら説明する。
Embodiment 2.
Hereinafter, differences from the embodiments described above will be mainly described with reference to the drawings.

＊＊＊構成の説明＊＊＊
図８は、本実施の形態に係る推論装置１００の構成例を示している。推論装置１００は、本図に示すように再量子化部１４０を備える。
再量子化部１４０は、データ抽出部１３０が抽出したデータに基づいて量子化アルゴリズムを変更することにより再量子化アルゴリズムを生成する。ここで、再量子化アルゴリズムは、再量子化部１４０によって変更された量子化アルゴリズムである。また、再量子化部１４０は、再量子化を実行するタイミングを管理する。再量子化は、再量子化アルゴリズムを変更することであり、量子化アルゴリズムを再量子化アルゴリズムに置き換えることを含んでもよい。再量子化部１４０は、データ抽出部１３０によって抽出された量子化特徴データと非量子化特徴データとに基づいて、抽出された量子化特徴データに対応する量子化演算である対象演算においてオーバーフローが発生しないよう、対象演算に対応する量子化特徴データを変更する。
本実施の形態に係る量子化推論部１１０は、再量子化部１４０が生成した再量子化アルゴリズムを適宜利用し、変更された量子化特徴データに応じた量子化演算を実行する。また、量子化推論部１１０は、再量子化アルゴリズムと動作している量子化アルゴリズムとを入れ替えるタイミングを管理する。
***Explanation of configuration***
FIG. 8 shows a configuration example of the inference device 100 according to this embodiment. The inference device 100 includes a requantization unit 140 as shown in the figure.
The requantization unit 140 generates a requantization algorithm by changing the quantization algorithm based on the data extracted by the data extraction unit 130. Here, the requantization algorithm is a quantization algorithm modified by the requantization unit 140. Further, the requantization unit 140 manages the timing at which requantization is performed. Requantization is changing the requantization algorithm, and may include replacing the quantization algorithm with a requantization algorithm. Based on the quantized feature data and non-quantized feature data extracted by the data extraction unit 130, the requantization unit 140 changes the quantization feature data corresponding to a target operation so that overflow does not occur in the target operation, which is the quantization operation corresponding to the extracted quantized feature data.
The quantization inference unit 110 according to the present embodiment uses the requantization algorithm generated by the requantization unit 140 as appropriate and executes a quantization operation according to the changed quantization feature data. Further, the quantization inference unit 110 manages the timing at which the currently running quantization algorithm is swapped for the requantization algorithm.

＊＊＊動作の説明＊＊＊
図９は、推論装置１００の動作の一例を示すフローチャートである。本図を参照して、実施の形態１に係る推論装置１００の動作と、本実施の形態に係る推論装置１００の動作との差異を主に説明する。
***Operation explanation***
FIG. 9 is a flowchart showing an example of the operation of the inference device 100. With reference to this figure, differences between the operation of inference device 100 according to the first embodiment and the operation of inference device 100 according to the present embodiment will be mainly explained.

（ステップＳ１０２）
推論装置１００は、本ステップの処理を再度実行する代わりに、ステップＳ２０１に進む。
(Step S102)
The inference device 100 proceeds to step S201 instead of executing the process of this step again.

（ステップＳ１０８）
推論装置１００はステップＳ１０２に戻る代わりに、ステップＳ２０１に進む。
(Step S108)
The inference device 100 proceeds to step S201 instead of returning to step S102.

（ステップＳ２０１）
再量子化部１４０によって再量子化アルゴリズムが準備されていない場合、量子化推論部１１０はステップＳ１０２に進む。それ以外の場合、量子化推論部１１０はステップＳ２０２に進む。
(Step S201)
If the requantization algorithm is not prepared by the requantization unit 140, the quantization inference unit 110 proceeds to step S102. In other cases, the quantization inference unit 110 proceeds to step S202.

（ステップＳ２０２）
量子化推論部１１０は、量子化推論プロセスにおける量子化アルゴリズムを再量子化部１４０が準備した再量子化アルゴリズムに入れ替える。
(Step S202)
The quantization inference unit 110 replaces the quantization algorithm in the quantization inference process with the requantization algorithm prepared by the requantization unit 140.

（ステップＳ２２１）
再量子化部１４０は、退避データを用いて再量子化を実行することにより再量子化アルゴリズムを生成する。再量子化は、オーバーフローが発生したレイヤにおいてオーバーフローが発生しないように量子化アルゴリズムを調整することである。再量子化部１４０は、具体例として、オーバーフローが発生したレイヤにおける特徴データを変更する。
なお、複数のレイヤにおいてオーバーフローが発生した場合に、再量子化部１４０は、最も早く処理されるレイヤに対応する特徴データのみを調整してもよく、オーバーフローが発生した全てのレイヤの各々に対応する特徴データを一括で調整してもよい。
(Step S221)
The requantization unit 140 generates a requantization algorithm by performing requantization using the saved data. Requantization is adjusting the quantization algorithm so that overflow does not occur in the layer where overflow has occurred. As a specific example, the requantization unit 140 changes the feature data in the layer where the overflow has occurred.
Note that when overflow occurs in multiple layers, the requantization unit 140 may adjust only the feature data corresponding to the layer that is processed earliest, or may adjust at once the feature data corresponding to each of the layers in which overflow occurred.

（ステップＳ２２２）
再量子化部１４０は、ステップＳ２２１において生成した再量子化アルゴリズムを保存する。
(Step S222)
The requantization unit 140 stores the requantization algorithm generated in step S221.
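Steps S221 and S222 can be sketched as follows. This is one possible reading, not the disclosed algorithm: it assumes that the quantization algorithm is a mapping from layer index to an int8 scale, and that the saved data records, for each overflowing layer, the largest absolute value of the corresponding non-quantized output. The names `requantize_scale`, `requantize`, and the `margin` parameter are hypothetical.

```python
INT8_MAX = 127  # assumed upper bound of the quantized range


def requantize_scale(old_scale, max_abs_float_output, margin=1.1):
    """Return a scale large enough that max_abs_float_output fits in int8."""
    needed = max_abs_float_output * margin / INT8_MAX
    return max(old_scale, needed)


def requantize(algorithm, saved, only_earliest=False):
    """Build a requantization algorithm without modifying the running one.

    algorithm: dict mapping layer index -> scale (assumed representation).
    saved:     dict mapping overflowing layer index -> max |non-quantized output|.
    only_earliest: adjust only the earliest overflowing layer instead of all of them.
    """
    targets = sorted(saved)
    if only_earliest:
        targets = targets[:1]
    updated = dict(algorithm)  # copy, so the stored result is a separate object
    for index in targets:
        updated[index] = requantize_scale(algorithm[index], saved[index])
    return updated
```

The `only_earliest` flag corresponds to the two options described for multiple overflowing layers. Returning a new mapping rather than editing in place leaves the running algorithm untouched until the quantization inference unit decides to swap it in.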

＊＊＊実施の形態２の効果の説明＊＊＊
以上のように、本実施の形態によれば、オーバーフローが発生しないように量子化アルゴリズムを変更する処理を自動的に実行することができる。
***Explanation of effects of Embodiment 2***
As described above, according to the present embodiment, it is possible to automatically execute the process of changing the quantization algorithm so that overflow does not occur.

＊＊＊他の構成＊＊＊
＜変形例４＞
量子化推論プロセスにおける量子化アルゴリズムを再量子化部１４０が準備した量子化アルゴリズムに量子化推論部１１０が入れ替えるタイミングは、推論装置１００を備える組込みシステムの再起動時であってもよい。また、量子化推論部１１０は、次に推論を実行するタイミングを考慮して次に実行する推論に影響がないと判断した際に量子化アルゴリズムを入れ替えるよう動作予約する方式により量子化アルゴリズムを入れ替えてもよい。
***Other configurations***
<Modification 4>
The timing at which the quantization inference unit 110 replaces the quantization algorithm in the quantization inference process with the quantization algorithm prepared by the requantization unit 140 may be when the embedded system including the inference device 100 is restarted. Alternatively, the quantization inference unit 110 may replace the quantization algorithm by scheduling the replacement in advance, so that it takes place when the unit judges, in view of when the next inference will run, that the replacement will not affect that inference.
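The swap logic of steps S201 and S202, combined with the deferred replacement of this modification, might look like the sketch below. The class name, the `busy` flag, and the representation of an algorithm as a callable are assumptions for illustration only; the source does not specify how the unit judges that the next inference is unaffected.

```python
class QuantizedInference:
    """Holds the running quantization algorithm and an optional pending replacement."""

    def __init__(self, algorithm):
        self.active = algorithm  # currently running quantization algorithm
        self.pending = None      # requantization algorithm waiting to be swapped in
        self.busy = False        # True while an inference is running

    def prepare(self, requantized):
        """Called once a requantization algorithm has been generated and stored."""
        self.pending = requantized

    def swap_if_ready(self):
        """Swap only when a replacement is pending and no inference is running."""
        if self.pending is None or self.busy:
            return False
        self.active, self.pending = self.pending, None
        return True

    def infer(self, x):
        """Check for a pending replacement, then run the active algorithm."""
        self.swap_if_ready()
        self.busy = True
        try:
            return self.active(x)
        finally:
            self.busy = False
```

Performing the check at the start of `infer` means a replacement prepared during one inference takes effect from the next one. Replacing the algorithm at system restart would correspond to constructing the object with the stored requantization algorithm.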

＊＊＊他の実施の形態＊＊＊
前述した各実施の形態の自由な組み合わせ、あるいは各実施の形態の任意の構成要素の変形、もしくは各実施の形態において任意の構成要素の省略が可能である。
また、実施の形態は、実施の形態１から２で示したものに限定されるものではなく、必要に応じて種々の変更が可能である。フローチャート等を用いて説明した手順は、適宜変更されてもよい。
***Other embodiments***
It is possible to freely combine the embodiments described above, to modify any component of each embodiment, or to omit any component in each embodiment.
Further, the embodiments are not limited to those shown in Embodiments 1 and 2, and various changes can be made as necessary. The procedures described using flowcharts and the like may be modified as appropriate.

１１プロセッサ、１２メモリ、１３補助記憶装置、１４入出力ＩＦ、１５通信装置、１６オフロードデバイス、１８処理回路、１９信号線、１００推論装置、１１０量子化推論部、１２０非量子化推論部、１３０データ抽出部、１４０再量子化部。 11 processor, 12 memory, 13 auxiliary storage device, 14 input/output IF, 15 communication device, 16 offload device, 18 processing circuit, 19 signal line, 100 inference device, 110 quantization inference unit, 120 non-quantization inference unit, 130 data extraction unit, 140 requantization unit.

Claims

An inference device comprising:
a quantization inference unit that executes at least one quantization operation based on a machine learning method using inference data; and
a non-quantization inference unit that executes, using the inference data, at least one of at least one non-quantized operation corresponding to each of the at least one quantization operation, wherein
each of the at least one quantization operation is an operation according to each of at least one quantization feature data indicating a characteristic of each of the at least one quantization operation,
each of the at least one non-quantized operation is an operation according to each of at least one non-quantized feature data indicating a characteristic of each of the at least one non-quantized operation, and
the inference device further comprises a data extraction unit that, when an overflow occurs in at least one of the at least one quantization operation, extracts the quantization feature data corresponding to each quantization operation in which the overflow occurred and the non-quantized feature data corresponding to each non-quantized operation that corresponds to each quantization operation in which the overflow occurred.

The inference device according to claim 1, wherein
each of the at least one quantization feature data includes a parameter corresponding to each of the at least one quantization operation, and
each of the at least one non-quantized feature data includes a parameter corresponding to each of the at least one non-quantized operation.

The inference device according to claim 1 or 2, wherein
the machine learning method is deep learning, and
the data extraction unit
extracts, as the quantization feature data, data indicating a parameter for a layer corresponding to a quantization operation in which an overflow has occurred, and
extracts, as the non-quantized feature data, data indicating a parameter for a layer corresponding to the non-quantized operation that corresponds to the quantization operation in which the overflow has occurred.

The inference device according to any one of claims 1 to 3, further comprising
a requantization unit that, based on the extracted quantization feature data and non-quantized feature data, changes the quantization feature data corresponding to a target operation so that overflow does not occur in the target operation, the target operation being the quantization operation corresponding to the extracted quantization feature data,
wherein the quantization inference unit executes a quantization operation according to the changed quantization feature data.

An inference method, wherein
a computer executes at least one quantization operation based on a machine learning method using inference data,
the computer executes, using the inference data, at least one of at least one non-quantized operation corresponding to each of the at least one quantization operation,
each of the at least one quantization operation is an operation according to each of at least one quantization feature data indicating a characteristic of each of the at least one quantization operation,
each of the at least one non-quantized operation is an operation according to each of at least one non-quantized feature data indicating a characteristic of each of the at least one non-quantized operation, and
when an overflow occurs in at least one of the at least one quantization operation, the computer extracts the quantization feature data corresponding to each quantization operation in which the overflow occurred and the non-quantized feature data corresponding to each non-quantized operation that corresponds to each quantization operation in which the overflow occurred.

An inference program that causes an inference device, which is a computer, to execute:
quantization inference processing of executing at least one quantization operation based on a machine learning method using inference data; and
non-quantization inference processing of executing, using the inference data, at least one of at least one non-quantized operation corresponding to each of the at least one quantization operation, wherein
each of the at least one quantization operation is an operation according to each of at least one quantization feature data indicating a characteristic of each of the at least one quantization operation,
each of the at least one non-quantized operation is an operation according to each of at least one non-quantized feature data indicating a characteristic of each of the at least one non-quantized operation, and
the inference program further causes the inference device to execute data extraction processing of extracting, when an overflow occurs in at least one of the at least one quantization operation, the quantization feature data corresponding to each quantization operation in which the overflow occurred and the non-quantized feature data corresponding to each non-quantized operation that corresponds to each quantization operation in which the overflow occurred.