JP7120308B2

JP7120308B2 - DATA PROCESSING DEVICE, DATA PROCESSING CIRCUIT AND DATA PROCESSING METHOD

Info

Publication number: JP7120308B2
Application number: JP2020528664A
Authority: JP
Inventors: 芙美代鷹野; 誠也柴田; 崇竹中; 浩明井上
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2018-07-06
Filing date: 2018-07-06
Publication date: 2022-08-17
Anticipated expiration: 2038-07-06
Also published as: WO2020008643A1; JPWO2020008643A1

Description

The present invention relates to a data processing device, a data processing circuit, and a data processing method that perform data processing including operations at two levels of arithmetic precision.

As machine learning becomes widespread, further ingenuity is required to cope with situations that change from moment to moment.

To that end, the diverse raw data acquired in the environment of actual use must be incorporated into learning as training data. In learning with training data (machine learning), the parameters of the arithmetic expressions and discriminants used by a given learner are adjusted based on, for example, the input-output relationship indicated by the training data. A learner is, for example, a discriminative model that, when data is input, makes a determination on one or more labels.

Regarding the relationship between computational resources and arithmetic precision in machine learning, Non-Patent Document 1, for example, describes an example of a learning arithmetic circuit and a learning method for executing deep learning of a neural network efficiently, in particular with low power consumption.

Non-Patent Document 2 describes an example of a learning method for deep learning in a CNN (Convolutional Neural Network) that shortens learning time by dividing the multiple convolutional layers into layers whose weights are fixed and layers whose weights are updated (extended feature layers), thereby limiting the range of learning.

As an example of a circuit configuration for learning operations in machine learning, Non-Patent Document 3 describes an example of optimizing an accelerator design based on an FPGA (Field-Programmable Gate Array).

Non-Patent Document 1: Y. H. Chen, et al., "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks", IEEE Journal of Solid-State Circuits, vol. 52, no. 1, Jan. 2017, pp. 127-138.
Non-Patent Document 2: Wei Liu, et al., "SSD: Single Shot MultiBox Detector", arXiv:1512.02325v5, Dec. 2016.
Non-Patent Document 3: Chen Zhang, et al., "Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks", in ACM FPGA 2015, pp. 160-170.

Much of the machine learning that uses training data has been performed in cloud environments, where large-scale high-precision arithmetic circuits can be built to support general-purpose learning algorithms.

However, depending on the site, data movement is subject to various constraints such as limited network bandwidth and privacy protection. A mechanism is therefore desired that enables learning not in a cloud environment but within devices at the site (hereinafter referred to as the edge device layer). This in turn calls for a learning method that achieves a sufficient recognition rate with fewer computer resources and hence lower power consumption.

According to the learning method described in Non-Patent Document 1, using a 16-bit fixed-point arithmetic circuit is said to realize learning with lower power consumption than NVIDIA's TK1 (Jetson Kit), which performs learning with a 32-bit floating-point arithmetic circuit. However, that method merely reduces power consumption at the cost of arithmetic precision by reducing the bit width of the arithmetic circuit that performs all learning operations (all operations for adjusting parameters). It gives no consideration to the adverse effects of lowering the precision of the arithmetic circuit itself, for example the risk that sufficient precision for carrying out the learning operations cannot be ensured.
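As a rough illustration of the concern raised above, the following Python sketch models a 16-bit fixed-point format and shows a small weight update being rounded away. The Q8.8 layout (8 fractional bits), the helper name `to_fixed16`, and the numeric values are assumptions chosen for illustration; they are not taken from Non-Patent Document 1 or from the present invention.

```python
def to_fixed16(x, frac_bits=8):
    """Round x to a signed 16-bit fixed-point value (assumed Q8.8 layout)."""
    scale = 1 << frac_bits
    q = round(x * scale)
    q = max(-32768, min(32767, q))  # saturate to the int16 range
    return q / scale


weight = 0.5
update = 0.001  # e.g. learning rate * gradient; below half a Q8.8 step (1/512)

# Low-precision path: every operand and the result are stored as fixed point.
low = to_fixed16(to_fixed16(weight) - to_fixed16(update))

# High-precision path: ordinary floating point.
high = weight - update

print(low, high)
```

In this sketch the update is lost entirely on the fixed-point path, while the floating-point path retains it. Whether this happens in practice depends on the format and on the magnitude of the updates.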

For example, an arithmetic circuit that performs deep learning carries out multilayer operations using a structure in which multiple units are connected in layers. These multilayer operations divide broadly into a part that computes the unit outputs layer by layer (so-called inference processing, for example forward propagation) and a part that performs computations for updating the parameters used in that computation, such as weights (so-called parameter update processing, for example backpropagation). Of these, the parameter update processing in particular can be said to correspond to the actual learning operation in machine learning. The arithmetic precision of the parameter update processing therefore strongly affects the recognition rate in operation, and the higher it is, the better. In contrast, the arithmetic precision of inference processing often does not need to be very high.

Accordingly, if only those operations in the learning process that require high precision are performed at high precision, and operations that do not require it are performed at low precision, learning with sufficient accuracy becomes possible while power consumption is reduced. Consider, then, a device that performs processing in which operations requiring high precision and operations not requiring it are mixed, that has arithmetic circuits of two precisions, and that executes the processing while switching the circuit on which each operation is carried out according to the precision that operation requires. In such a device, data exchange between cores (arithmetic circuits) of different precision is an essential requirement. To make learning processing that includes such data exchange more efficient, it is important to speed up the data exchange between cores.

Note that the learning method described in Non-Patent Document 2 merely seeks to shorten learning time by limiting the range of learning. It gives no consideration to improving the efficiency of learning processing that includes data exchange between cores of different precision, and in particular none to the efficiency of that data exchange itself. Likewise, the method described in Non-Patent Document 3 merely seeks to reduce circuit scale and computation time by optimizing the configuration of a circuit that performs all learning operations. It too gives no consideration to improving the efficiency of learning processing that includes data exchange between cores of different precision, and in particular none to the efficiency of that data exchange itself.

In view of the problems described above, an object of the present invention is to provide a data processing device, a data processing circuit, and a data processing method that can further improve the efficiency of processing in which operations requiring high precision and operations not requiring high precision are mixed.

A data processing device according to the present invention comprises: low-precision arithmetic processing means for performing predetermined operations at a first precision; high-precision arithmetic processing means for performing predetermined operations at a second precision higher than the first precision; and first data conversion means provided at the endpoint, on the high-precision arithmetic processing means side, of a communication path for passing data between the high-precision arithmetic processing means and the low-precision arithmetic processing means. The first data conversion means performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means such that the data exchanged with the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is no greater than the amount when data of the first precision is used, and the precision of the data passing through the communication path is no higher than the first precision. The device further comprises second data conversion means provided at the endpoint of the communication path on the low-precision arithmetic processing means side. The first data conversion means and the second data conversion means each perform a predetermined conversion on data passing between the communication path and the connected arithmetic processing means such that the data exchanged with the connected arithmetic processing means is data that the connected arithmetic processing means can handle, the amount of data passing through the communication path is smaller than the amount when data of the first precision is used, and the precision of the data passing through the communication path is lower than the first precision.
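The arrangement of a converter at each endpoint of the communication path can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the high-precision side is represented by Python floats, the first precision is taken to be 16-bit fixed point with 8 fractional bits, and the communication path carries signed 8-bit words with a shared scale factor. The function names, word widths, and scale are hypothetical and are not specified by the source.

```python
CHANNEL_BITS = 8   # assumed width of a word on the communication path
LOW_BITS = 16      # assumed width of first-precision (low-precision) data


def encode_for_channel(values, scale):
    """Sending-side converter: quantize values to signed 8-bit channel words."""
    lo = -(1 << (CHANNEL_BITS - 1))
    hi = (1 << (CHANNEL_BITS - 1)) - 1
    # Clamp to the signed range, then store as two's-complement bytes.
    return bytes(max(lo, min(hi, round(v / scale))) & 0xFF for v in values)


def _signed(byte):
    """Interpret one channel byte as a two's-complement signed integer."""
    return byte - 256 if byte >= 128 else byte


def decode_for_high(payload, scale):
    """Converter at the high-precision endpoint: channel words -> floats."""
    return [_signed(b) * scale for b in payload]


def decode_for_low(payload, scale, frac_bits=8):
    """Converter at the low-precision endpoint: channel words -> int16 fixed point."""
    out = []
    for b in payload:
        q = round(_signed(b) * scale * (1 << frac_bits))
        out.append(max(-32768, min(32767, q)))
    return out


values = [0.5, -0.25, 1.0]
scale = 1 / 64
payload = encode_for_channel(values, scale)

bytes_on_channel = len(payload)
bytes_at_first_precision = len(values) * LOW_BITS // 8

print(bytes_on_channel, bytes_at_first_precision)
print(decode_for_high(payload, scale))
print(decode_for_low(payload, scale))
```

In this sketch each endpoint receives data in a form its own core can handle, while the path itself carries fewer bits per value than first-precision data would require.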

A data processing circuit according to the present invention comprises: a low-precision arithmetic circuit that performs predetermined operations at a first precision; a high-precision arithmetic circuit that performs predetermined operations at a second precision higher than the first precision; and a first data conversion circuit that is provided at the endpoint, on the high-precision arithmetic circuit side, of a communication path for passing data between the high-precision arithmetic circuit and the low-precision arithmetic circuit, and that performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic circuit. The data exchanged between the first data conversion circuit and the connected high-precision arithmetic circuit is data handled by the high-precision arithmetic circuit. The circuit further comprises a second data conversion circuit that is provided at the endpoint of the communication path on the low-precision arithmetic circuit side and that performs a predetermined conversion on data passing between the communication path and the low-precision arithmetic circuit. The data exchanged between the second data conversion circuit and the connected low-precision arithmetic circuit is data handled by the low-precision arithmetic circuit. The amount of data passing through the communication path is smaller than the amount when data of the first precision is used, and the precision of the data passing through the communication path is lower than the first precision.

In a data processing method according to the present invention, first data conversion means is provided at the endpoint, on the high-precision arithmetic processing means side, of a communication path for passing data between low-precision arithmetic processing means that performs predetermined operations at a first precision and high-precision arithmetic processing means that performs predetermined operations at a second precision higher than the first precision. The first data conversion means performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means such that the data exchanged with the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is no greater than the amount when data of the first precision is used, and the precision of the data passing through the communication path is no higher than the first precision. The first data conversion means also performs the predetermined conversion such that the data exchanged with the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is smaller than the amount when data of the first precision is used, and the precision of the data passing through the communication path is lower than the first precision. Second data conversion means, provided at the endpoint of the communication path on the low-precision arithmetic processing means side, performs a predetermined conversion on data passing between the communication path and the low-precision arithmetic processing means such that the data exchanged with the connected low-precision arithmetic processing means is data that the low-precision arithmetic processing means can handle, the amount of data passing through the communication path is smaller than the amount when data of the first precision is used, and the precision of the data passing through the communication path is lower than the first precision.

According to the present invention, processing in which operations requiring high precision and operations not requiring high precision are mixed can be made more efficient.

An explanatory diagram outlining a learning method as an example of the data processing method of the present invention.
An explanatory diagram showing an example of the inputs and outputs of a unit and its connections to other units.
A block diagram showing a configuration example of the learning device of the first embodiment.
A configuration diagram showing an example of the hardware configuration of the learning processing unit 106.
An explanatory diagram showing examples of combinations of the arithmetic precision of the low-precision arithmetic circuit 11 and that of the high-precision arithmetic circuit 12.
A schematic block diagram showing a configuration example of a computer for the learning device 100.
A schematic configuration diagram showing an example of an arithmetic circuit.
A schematic configuration diagram showing another example of an arithmetic circuit.
A schematic configuration diagram showing another example of an arithmetic circuit.
A schematic configuration diagram showing another example of an arithmetic circuit.
A flowchart showing an example of the operation of the learning device 100 of the first embodiment.
A flowchart showing a more specific example of the operation of the learning device 100.
A flowchart showing another, more specific example of the operation of the learning device 100.
A flowchart showing another, more specific example of the operation of the learning device 100.
An explanatory diagram showing a configuration example of the data processing device of the second embodiment.
An explanatory diagram showing a configuration example of the data processing device of the second embodiment.
A block diagram outlining the data processing device of the present invention.
A configuration diagram showing the configuration of the data processing circuit of the present invention.
A configuration diagram showing another configuration of the data processing circuit of the present invention.

Embodiments of the present invention are described below with reference to the drawings. In the following, the invention is described using learning processing in deep learning as an example of processing in which operations requiring high precision and operations not requiring it are mixed. However, the processing, devices, and data processing methods to which the invention applies are not limited to learning processing, learning devices, and learning methods.

First, an outline of learning processing as an example of the data processing of the present invention is given. FIG. 1(a) is an explanatory diagram showing an example of a general learning method, and a circuit configuration for it, in a neural network that includes one or more intermediate layers between an input layer and an output layer. FIG. 1(b) is an explanatory diagram showing an example of a learning method as an example of the data processing method of the present invention, and a circuit configuration for it.

In the example shown in FIG. 1(a), a large-scale learning circuit 90 is used to train the entire neural network, which is a predetermined discriminative model, so as to support general-purpose learning algorithms.

In FIG. 1, the balloons attached to the circuits schematically show the direction and range of processing in the learning process of the neural network. Within the balloons, reference numeral 51 (a circle in the figure) denotes a unit corresponding to a neuron in the neural network. Reference numeral 52 (a line connecting units) denotes a connection between units. Reference numeral 53 (a thick rightward arrow) denotes inference processing and its range. Reference numeral 54 (a thick leftward arrow) denotes parameter update processing and its range. FIG. 1 shows an example of a feedforward neural network in which the input to each unit is the output of units in the preceding layer, but the input to each unit is not limited to this. For example, when time-series information is retained, the input to each unit may include the output of units in the preceding layer at the previous time step, as in a recurrent neural network. Even in that case, the direction of inference processing is regarded as the direction from the input layer toward the output layer (the forward direction). Inference processing performed in a predetermined order starting from the input layer in this way is also called "forward propagation". The direction of parameter update processing, on the other hand, is not particularly limited. It may be the direction from the output layer toward the input layer (the reverse direction), as with the parameter update processing in the figure. The direction shown in the figure is an example of error backpropagation, but parameter update processing is not limited to error backpropagation. It may be, for example, STDP (Spike Timing Dependent Plasticity).

The following is an example of a method for training a model in deep learning, not limited to neural networks. First, after training data is input to the input layer, inference processing is performed in which the output of each unit is computed in the forward direction at each layer up to the output layer (forward propagation: see arrow 53 in the figure). Next, based on the error calculated from the output of the output layer (the final output) and, for example, the input-output relationship indicated by the training data, parameter update processing is performed: the layers from the output layer to the first layer are traversed in the reverse direction, and the parameters used to compute the output of each unit in each layer are updated so as to minimize that error (backpropagation: see arrow 54 in the figure).
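The two phases described above can be sketched in Python for a small fully connected network. This is a minimal illustration, not the patent's implementation: the sigmoid activation, the squared-error loss, the learning rate, and all names (`forward`, `backward`, `loss`) are assumptions introduced here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(layers, x):
    """Inference (forward propagation): compute each layer's outputs in order."""
    outputs = [x]
    for weights in layers:  # weights[j][i]: previous-layer unit i -> unit j
        prev = outputs[-1]
        outputs.append([sigmoid(sum(w * p for w, p in zip(row, prev)))
                        for row in weights])
    return outputs


def backward(layers, outputs, target, lr=0.5):
    """Parameter update (backpropagation): walk from the output layer to layer 1."""
    # dE/dy at the output for E = 1/2 * sum((y - t)^2)
    grad_out = [y - t for y, t in zip(outputs[-1], target)]
    new_layers = [[row[:] for row in weights] for weights in layers]
    for idx in reversed(range(len(layers))):
        ys, prev = outputs[idx + 1], outputs[idx]
        deltas = [g * y * (1.0 - y) for g, y in zip(grad_out, ys)]
        for j, d in enumerate(deltas):
            for i, p in enumerate(prev):
                new_layers[idx][j][i] -= lr * d * p
        # Gradient for the previous layer's outputs, using the old weights.
        grad_out = [sum(deltas[j] * layers[idx][j][i] for j in range(len(deltas)))
                    for i in range(len(prev))]
    return new_layers


def loss(layers, x, target):
    y = forward(layers, x)[-1]
    return 0.5 * sum((a - b) ** 2 for a, b in zip(y, target))


layers = [[[0.1, -0.2], [0.4, 0.3]], [[0.5, -0.6]]]
x, target = [1.0, 0.5], [1.0]
before = loss(layers, x, target)
updated = backward(layers, forward(layers, x), target)
after = loss(updated, x, target)
print(before, after)
```

Running one update step on a single training example should reduce the error for that example, which is what the final two values compare.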

As shown in FIG. 1(a), when the entire model is the target of learning, the parameter update processing updates, in every layer after the input layer (the first through n-th layers), the parameters used to compute the output of each unit in the layer (for example, the weights of the connections between each unit in the layer and units in other layers). By repeating such parameter update processing multiple times, for example while changing the training data, a trained model with a high recognition rate can be generated. FIG. 1(a) shows, as an implementation example of an arithmetic circuit that performs such learning, a large-scale learning circuit 90 that performs the inference processing and parameter update processing described above at high arithmetic precision. However, the higher the arithmetic precision of the inference processing and parameter update processing, and the wider the computation range of that processing, the more expansion terms the error function has and the larger the circuit becomes, so power consumption increases greatly.

In the present invention, by contrast, only part of the model is the target of learning, as shown in FIG. 1(b). As above, learning here refers to parameter update processing, which is the learning processing proper. When only part of the model is the target of learning, processing up to forward propagation is performed as described above. Then, based on the error calculated from the output of the output layer (the final output) and, for example, the input-output relationship indicated by the training data, parameter update processing is performed only for designated units (for example, the units in each layer from the n-th layer, which is the output layer, to the k-th layer), updating the parameters used to compute the outputs of those units (for example, the weights of their connections to other units).

FIG. 1(b) shows, as an implementation example of an arithmetic circuit 10 that performs such learning, a combination of a high-precision arithmetic circuit 12, which performs parameter update processing for the designated subset of units at high arithmetic precision, and a low-precision arithmetic circuit 11, which performs inference processing for at least the designated units at an arithmetic precision lower than that of the high-precision arithmetic circuit 12. With arithmetic circuits of two different precisions provided in this way, the high-precision arithmetic circuit 12 is made to perform, for example, parameter update processing for the subset of units that require high-precision operations, and the low-precision arithmetic circuit 11 is made to perform the other processing that does not require high-precision operations. In this way, within the learning operations for a single piece of training data, at least part of the inference processing is carried out at low precision and at least part of the parameter update processing at high precision, and the range of parameter update processing carried out at high precision is optimized. This makes more efficient use of computer resources (for example, lower power consumption) while ensuring sufficient arithmetic precision.
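The division of work between the two circuits can be modeled in software as follows. This is a simplified sketch under stated assumptions: each layer is reduced to a single unit with a tanh activation, the low-precision circuit is modeled by rounding to fixed point with 8 fractional bits, the high-precision circuit by ordinary Python floats, and the parameter `first_trainable` (standing in for the k-th layer) is a hypothetical name.

```python
import math


def quantize(x, frac_bits=8):
    """Model of the low-precision circuit: round to fixed point."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def low_precision_forward(weights, x):
    """Inference for all layers, with every intermediate value quantized."""
    acts = [quantize(x)]
    for w in weights:
        pre = quantize(quantize(w) * acts[-1])
        acts.append(quantize(math.tanh(pre)))
    return acts


def high_precision_update(weights, acts, target, first_trainable, lr=0.1):
    """Parameter update in full floating point, only for layers >= first_trainable."""
    grad = acts[-1] - target                # dE/dy for E = (y - t)^2 / 2
    new_weights = list(weights)
    for i in range(len(weights) - 1, first_trainable - 1, -1):
        grad *= 1.0 - acts[i + 1] ** 2      # derivative of tanh
        new_weights[i] = weights[i] - lr * grad * acts[i]
        grad *= weights[i]                  # pass the error to the previous layer
    return new_weights


weights = [0.5, -0.3, 0.8]
acts = low_precision_forward(weights, 1.0)
updated = high_precision_update(weights, acts, target=0.5, first_trainable=2)
print(acts)
print(updated)
```

With `first_trainable=2`, only the last layer's weight changes and the earlier weights stay fixed, mirroring the case where a subset of output-side layers is designated for high-precision updating.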

FIG. 1(b) shows an example in which some layers on the output side form the range in which parameters are updated (the actual learning range), but the update range is not limited to output-side layers. Individual designation is also possible, for example the odd-numbered or even-numbered layers among the first through n-th layers. Also, FIG. 1(b) shows an example that limits the range of the parameter update processing itself, but it is also possible to leave that range unrestricted and instead limit the range of parameter update processing carried out at high precision. That is, parameter update processing may be performed at high precision for only some of the units and at low precision for the remaining units. The units subject to parameter update processing may also be divided into three kinds: units updated by high-precision operations, units updated by low-precision operations, and units not updated (whose parameters are then fixed).

As another example of how to divide processing between high-precision and low-precision operations, inference processing for all units may be performed with low-precision operations and parameter update processing for all units with high-precision operations. It is also possible, for example, to perform inference processing for all units with low-precision operations and parameter update processing for some units with high-precision operations. In that case, the remaining units excluded from high-precision operations may have their parameters updated with low-precision operations, or may be excluded from parameter update processing altogether. It is also possible, for example, to perform both inference processing and parameter update processing with low-precision operations for some units, and both with high-precision operations for the remaining units.

In other words, in the learning method given as an example of the data processing method of the present invention, it suffices that the learning device includes a low-precision arithmetic circuit having relatively low arithmetic precision and a high-precision arithmetic circuit having relatively high arithmetic precision, causes the low-precision arithmetic circuit to perform the inference processing of at least some units, and causes the high-precision arithmetic circuit to perform the parameter update processing of at least some units. On that basis, the inference processing of the remaining units may be performed by either the low-precision or the high-precision arithmetic circuit. Likewise, the parameter update processing of the remaining units may be performed by the low-precision arithmetic circuit or omitted entirely. There is no particular limitation on which units are subject to high-precision or low-precision inference processing, or on which units are subject to high-precision parameter update processing, low-precision parameter update processing, or no parameter update processing.

The above is an example using two arithmetic circuits with different arithmetic precisions, but the same basically applies when, for example, two or more arithmetic circuits with different arithmetic precisions are used. That is, as long as the parameter update processing of some units is performed by an arithmetic circuit whose arithmetic precision is higher than that of the arithmetic circuit performing the inference processing of some units, there is no particular limitation on which arithmetic circuit performs the inference processing and parameter update processing of the other units, or on whether that processing is performed at all.
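The ways of dividing units between precisions described above can be sketched as a per-layer assignment table. This is only an illustration: the names `Precision`, `Update`, `output_side_plan`, `odd_layer_plan`, and `satisfies_minimum` are hypothetical and do not appear in the specification, and assigning per layer rather than per unit is a simplifying assumption.

```python
from enum import Enum


class Precision(Enum):
    LOW = "low"
    HIGH = "high"


class Update(Enum):
    HIGH = "high"      # updated by the high-precision circuit
    LOW = "low"        # updated by the low-precision circuit
    FROZEN = "frozen"  # not updated; parameters stay fixed


def output_side_plan(n_layers, n_updated):
    """Low-precision inference everywhere; only the last n_updated layers are updated."""
    plan = {}
    for layer in range(1, n_layers + 1):
        update = Update.HIGH if layer > n_layers - n_updated else Update.FROZEN
        plan[layer] = (Precision.LOW, update)
    return plan


def odd_layer_plan(n_layers):
    """Odd layers updated in high precision, even layers in low precision."""
    return {
        layer: (Precision.LOW, Update.HIGH if layer % 2 == 1 else Update.LOW)
        for layer in range(1, n_layers + 1)
    }


def satisfies_minimum(plan):
    """Check for at least one low-precision inference and one high-precision update."""
    has_low_inference = any(inf is Precision.LOW for inf, _ in plan.values())
    has_high_update = any(upd is Update.HIGH for _, upd in plan.values())
    return has_low_inference and has_high_update
```

Here `satisfies_minimum` encodes only the minimum condition stated in the text; every other assignment is left free, which mirrors the statement that the remaining choices are not particularly limited.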

FIG. 2 is an explanatory diagram showing an example of the inputs and output of a single unit and of its connections to other units. FIG. 2(a) shows an example of the inputs and output of one unit, and FIG. 2(b) shows an example of the connections between units arranged in two layers. As shown in FIG. 2(a), when one unit has four inputs ($x_1$ to $x_4$) and one output ($z$), the operation of the unit is expressed, for example, by equation (1A), where $f()$ denotes the activation function.

$$z = f(u) \tag{1A}$$
where
$$u = a + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 \tag{1B}$$

In equation (1B), $a$ is the intercept, and $w_1$ to $w_4$ are parameters such as weights corresponding to the respective inputs ($x_1$ to $x_4$).
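Equations (1A) and (1B) can be written out as a short function. This is a minimal sketch: the name `unit_output` is hypothetical, and the choice of `math.tanh` as the default activation is an assumption, since the text leaves $f$ unspecified.

```python
import math


def unit_output(x, w, a, f=math.tanh):
    """Compute z = f(u) with u = a + sum_k w_k * x_k (equations (1A) and (1B))."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    u = a + sum(wk * xk for wk, xk in zip(w, x))
    return f(u)


# Example with four inputs and an identity activation (an assumption for readability).
z = unit_output([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5], a=-5.0, f=lambda u: u)
```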

On the other hand, as shown in FIG. 2(b), when units are connected between two layers and attention is focused on the later layer, the outputs ($z_1$ to $z_4$) of the units in that layer for the inputs to those units ($x_1$ to $x_4$) are expressed, for example, as follows. Here $i$ is the identifier of a unit within the same layer ($i$ = 1 to 3 in this example).

$$z_i = f(u_i) \tag{2A}$$
where
$$u_i = a + w_{i,1} x_1 + w_{i,2} x_2 + w_{i,3} x_3 + w_{i,4} x_4 \tag{2B}$$

In the following, equation (2B) may be simplified and written as $z_i = \sum_k w_{i,k} * x_k$, with the intercept $a$ omitted. The intercept $a$ can also be regarded as the coefficient (one of the parameters) of a constant term whose value is 1. Here $k$ denotes an input to each unit in the layer, more specifically the identifier of the other unit that supplies that input. When the inputs to the units in the layer are only the outputs of the units in the preceding layer, the simplified expression can also be written as $u_i^{(L)} = \sum_k w_{i,k}^{(L)} * z_k^{(L-1)}$, where $L$ is the layer identifier. In these expressions, $w_{i,k}$ is a parameter of unit $i$ in the layer (the $L$-th layer), more specifically the weight of the connection between unit $i$ and another unit $k$ (inter-unit connection). In the following, without distinguishing individual units, the function that determines the output value of a unit (the activation function) may be simplified and written as $z = \sum w * x$.
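The layer-wise form, and the remark that the intercept can be treated as the weight of a constant input of value 1, can be sketched as follows. The names `layer_forward` and `fold_intercept` are hypothetical, and the code allows one intercept per unit, which is an assumption; equation (2B) writes a single symbol $a$.

```python
def layer_forward(W, z_prev, f):
    """Compute z_i = f(sum_k W[i][k] * z_prev[k]) for every unit i of a layer."""
    return [f(sum(w_ik * z_k for w_ik, z_k in zip(row, z_prev))) for row in W]


def fold_intercept(W, a):
    """Append intercept a[i] to row i so it acts as the weight of a constant input 1."""
    return [row + [a_i] for row, a_i in zip(W, a)]


def identity(u):
    return u


W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]   # two units, three inputs each
z_prev = [1.0, 2.0, 3.0]                  # outputs of the preceding layer

without_intercept = layer_forward(W, z_prev, identity)
with_intercept = layer_forward(fold_intercept(W, [0.5, 0.5]), z_prev + [1.0], identity)
```

Folding the intercept into the weight matrix keeps the whole layer as a pure sum of products, which is the form the simplified notation $z = \sum w * x$ assumes.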

In the above example, the calculation that obtains the output $z$ from the input $x$ for a unit corresponds to the inference processing of that unit; the parameter $w$ is fixed during this calculation. In contrast, the calculation that obtains the parameter $w$ for a unit corresponds to the parameter update processing of that unit.

Embodiment 1.
FIG. 3 is a block diagram showing a configuration example of the learning device according to the first embodiment. The learning device 100 shown in FIG. 3 includes a pre-learning model storage unit 101, a learning data storage unit 102, a learning processing unit 106, and a post-learning model storage unit 107.

The pre-learning model storage unit 101 stores information on the model before learning. This information may include the initial values of the parameters.

The learning data storage unit 102 stores learning data, that is, the data used to train the model. The format of the learning data is not particularly limited.

The learning processing unit 106 trains the model stored in the pre-learning model storage unit 101 using the learning data stored in the learning data storage unit 102.

The learning processing unit 106 of this embodiment includes at least a high-efficiency inference processing unit 103a, a high-precision parameter update processing unit 104b, and a control unit 105. As shown in FIG. 3, the learning processing unit 106 may further include a high-precision inference processing unit 103b and a high-efficiency parameter update processing unit 104a.

The high-efficiency inference processing unit 103a performs inference processing for a designated layer or unit with a first arithmetic precision.

The high-precision parameter update processing unit 104b performs parameter update processing for a designated layer, unit, or parameter with a second arithmetic precision that is higher than the first arithmetic precision.

The control unit 105 controls the processing units that perform the learning processing (in this example, the high-efficiency inference processing unit 103a, the high-precision inference processing unit 103b, the high-efficiency parameter update processing unit 104a, and the high-precision parameter update processing unit 104b) so that the necessary learning processing is carried out. More specifically, the control unit 105 reads the pre-learning model and the learning data, and performs switching control of the arithmetic precision used in the learning processing by issuing operation instructions to the processing units that perform the learning processing. An operation instruction includes designation of the units to be operated on and input of the parameters required for the operation.
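The switching control described above can be pictured as a small dispatcher that routes each operation instruction to one of the processing units. This is a loose sketch under stated assumptions: the classes `Instruction` and `Controller` and their fields are hypothetical, only the reference numerals 103a, 104a, and 104b are taken from the text, and routing every inference to 103a is just one of the permitted configurations.

```python
from dataclasses import dataclass, field


@dataclass
class Instruction:
    op: str     # "infer" or "update"
    unit: int   # unit designated as the target of the operation
    args: dict  # parameters required for the operation


@dataclass
class Controller:
    high_precision_updates: set           # units updated by 104b
    log: list = field(default_factory=list)

    def route(self, instr):
        """Return the reference numeral of the processing unit that should run instr."""
        if instr.op == "infer":
            target = "103a"   # high-efficiency (low-precision) inference
        elif instr.op == "update":
            if instr.unit in self.high_precision_updates:
                target = "104b"   # high-precision parameter update
            else:
                target = "104a"   # high-efficiency parameter update
        else:
            raise ValueError(f"unknown operation: {instr.op}")
        self.log.append((target, instr.op, instr.unit))
        return target
```

A caller would build one `Instruction` per unit and step, and the log shows which precision was actually used for each operation.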

The post-learning model storage unit 107 stores information on the model after learning. This information may include the updated parameter values of each unit.

FIG. 4 is a configuration diagram showing an example of the hardware configuration of the learning processing unit 106. As shown in FIG. 4, the learning processing unit 106 may be implemented by an arithmetic processing device or the like in which a low-precision arithmetic circuit 11, a high-precision arithmetic circuit 12, a memory 13, and a control device 14 are connected via a bus 15. The high-precision arithmetic circuit 12 may be any circuit capable of operating with higher arithmetic precision than the low-precision arithmetic circuit 11.

In that case, the high-efficiency inference processing unit 103a and the high-efficiency parameter update processing unit 104a may be implemented, for example, by the low-precision arithmetic circuit 11. The high-precision inference processing unit 103b and the high-precision parameter update processing unit 104b may be implemented, for example, by the high-precision arithmetic circuit 12. The control unit 105 may be implemented, for example, by the control device 14.

In this example, the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12 are each connected to the bus 15 and can exchange data over it, for example by notifying each other of their operation results. The memory 13 may also be connected to the bus 15, in which case the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12 can exchange data through the memory 13; the memory 13 is then treated as part of the communication path. The memory 13 may be mounted as on-chip memory on the same chip as the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12, that is, the three may be interconnected inside the chip. Alternatively, the memory 13 may be off-chip memory that is not mounted on the same chip as the low-precision arithmetic circuit 11 or the high-precision arithmetic circuit 12, that is, it may be connected externally via an external memory interface.

In this embodiment, "precision" or "arithmetic precision" refers to a measure of the breadth and fineness of the range of numerical values that a processing unit performing learning processing (in particular, inference processing and parameter update processing) actually uses in its arithmetic. More specifically, it is a measure of the breadth and fineness of the numerical range determined by the bit width, the handling of the decimal point, and so on in the arithmetic circuit that implements that processing unit. Examples of combinations of the low arithmetic precision of the low-precision arithmetic circuit 11 and the high arithmetic precision of the high-precision arithmetic circuit 12 include those shown in FIG. 5. FIG. 5 is an explanatory diagram showing examples of such combinations.

The combination of the arithmetic precision of the low-precision arithmetic circuit 11 and that of the high-precision arithmetic circuit 12 is not limited to those shown in FIG. 5. For example, the arithmetic precision of the low-precision arithmetic circuit 11 (low arithmetic precision) may be any of {1, 2, 8, 16}-bit fixed point or {1, 2, 8, 16}-bit integer, and the arithmetic precision of the high-precision arithmetic circuit 12 (high arithmetic precision) may be any of {2, 8, 16, 32}-bit fixed point, {9, 16, 32}-bit floating point, or {8, 16, 24, 32}-bit power-of-2 floating point. In every case, however, the high arithmetic precision must be higher than the low arithmetic precision (for example, a wider numerical range, a finer numerical range, or more generally a larger number of representable significant digits).
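The effect of bit width on the breadth and fineness of the numerical range can be illustrated by emulating fixed-point values in software. This is a sketch under assumptions that the text does not state: a signed two's-complement layout, round-to-nearest, saturation on overflow, and a particular split between integer and fractional bits. The names `to_fixed` and `resolution` are hypothetical.

```python
def to_fixed(value, total_bits, frac_bits):
    """Round value to a signed fixed-point grid and saturate at the range limits."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))          # smallest raw integer
    hi = (1 << (total_bits - 1)) - 1       # largest raw integer
    raw = max(lo, min(hi, round(value * scale)))
    return raw / scale


def resolution(frac_bits):
    """Spacing between adjacent representable values (the fineness of the range)."""
    return 1.0 / (1 << frac_bits)


low = to_fixed(0.3, total_bits=8, frac_bits=4)     # 8-bit fixed point
high = to_fixed(0.3, total_bits=16, frac_bits=12)  # 16-bit fixed point
```

With these settings the 16-bit value lies closer to 0.3 than the 8-bit value, and both formats saturate near 8 in magnitude because each reserves four bits, sign included, for the integer part.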

FIG. 6 is a schematic block diagram showing a configuration example of a computer for the learning device 100. The computer 1000 includes a processor 1008, a main storage device 1002, an auxiliary storage device 1003, an interface 1004, a display device 1005, and an input device 1006. The processor 1008 may include various arithmetic and processing devices such as a CPU 1001 and a GPU 1007.

The learning device 100 may be implemented, for example, on a computer 1000 as shown in FIG. 6. In that case, the operation of the learning device 100 (in particular, the control unit 105) may be stored in the auxiliary storage device 1003 in the form of a program. The CPU 1001 reads the program from the auxiliary storage device 1003, loads it into the main storage device 1002, and executes predetermined processing of the learning device 100 according to the program. The CPU 1001 is one example of an information processing device that operates according to a program; besides a CPU (Central Processing Unit), the computer 1000 may include, for example, an MPU (Micro Processing Unit), an MCU (Memory Control Unit), or a GPU (Graphics Processing Unit).

FIG. 6 shows an example in which the computer 1000 includes, in addition to the CPU 1001, a GPU 1007 that implements the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12 described above. This is not a limitation: when the low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12 are implemented by another processor or arithmetic device (such as a MAC (multiplier-accumulator), a multiplier tree, or an ALU (Arithmetic Logic Unit) array, described later), it suffices that the computer includes that other processor or arithmetic device. The low-precision arithmetic circuit 11 and the high-precision arithmetic circuit 12 may also be mounted on different chips, and the specific chip configuration is not particularly limited.

The auxiliary storage device 1003 is an example of a non-transitory tangible medium. Other examples of non-transitory tangible media include magnetic disks, magneto-optical disks, CD-ROMs, DVD-ROMs, and semiconductor memories connected via the interface 1004. When the program is delivered to the computer 1000 over a communication line, the computer 1000 that receives it may load the program into the main storage device 1002 and execute the predetermined processing of the learning device 100.

The program may implement only part of the predetermined processing of the learning device 100. The program may also be a differential program that implements the predetermined processing of the learning device 100 in combination with another program already stored in the auxiliary storage device 1003.

The interface 1004 transmits and receives information to and from other devices. The display device 1005 presents information to the user. The input device 1006 accepts input of information from the user.

Depending on the processing performed by the learning device 100, some elements of the computer 1000 can be omitted. For example, the display device 1005 can be omitted if the computer 1000 does not present information to a user, and the input device 1006 can be omitted if the computer 1000 does not accept information input from a user.

Some or all of the components described above are implemented by general-purpose or dedicated circuitry, processors, or the like, or by combinations of these. They may be configured as a single chip or as multiple chips connected via a bus. Some or all of the components may also be implemented by a combination of such circuitry and a program.

When some or all of the components described above are implemented by multiple information processing devices, circuits, or the like, these may be arranged centrally or in a distributed manner. For example, the information processing devices, circuits, and the like may be implemented in a form in which they are connected to one another via a communication network, such as a client-server system or a cloud computing system.

[Circuit configuration]
Next, several configurations of inference circuits are given as implementation examples of at least the high-efficiency inference processing unit 103a. For example, for each unit in a designated layer or for a designated unit, the high-efficiency inference processing unit 103a may, on receiving the input to that unit, perform the inference processing that calculates the output of the unit with a predetermined low arithmetic precision and output the result. In doing so, the high-efficiency inference processing unit 103a may receive as inputs the values of the unit's inputs and the values of the other variables (parameters such as weights and intercepts) used to calculate the output of the unit. In the following, an operation performed in inference processing may be called an inference operation.

In the following, a circuit for performing inference operations is called an "inference circuit"; in particular, a circuit that performs inference operations with an arithmetic precision lower than that of the parameter update operations performed by the high-precision parameter update processing unit 104b is called a "high-efficiency inference circuit". Power consumption is reduced by making the arithmetic precision of the inference circuit as low as possible, and at least lower than that of the parameter update operations performed by the high-precision parameter update processing unit 104b (for example, by reducing the bit width from 32 bits to 16 bits or by replacing floating-point arithmetic with fixed-point arithmetic). To distinguish it from a high-efficiency inference circuit, a circuit that performs inference operations with the same arithmetic precision as the parameter update operations performed by the high-precision parameter update processing unit 104b may be called a "high-precision inference circuit". The high-precision inference processing unit (not shown) described above may be implemented by such a high-precision inference circuit.

The inference circuit configurations shown below can be used regardless of whether the inference operations are performed with high or low precision. That is, the only difference between the high-efficiency inference processing unit 103a and the high-precision inference processing unit 103b may be the precision of the variables, adders, and multipliers used in the arithmetic circuit that implements the operation of the processing unit.

The simplest example of an inference circuit is a configuration with a single multiplier-adder (MAC) 221 that combines a multiplier and an adder (see the arithmetic circuit 22a in FIG. 7(a)). Reference numeral 21 denotes a bus.

The MAC 221 may include a multiplier, an adder, storage elements that hold three inputs, and a storage element that holds one output (see FIG. 7(b)). The MAC 221 shown in FIG. 7(b) is an example of an arithmetic circuit that, on receiving three variables a, w, and x, calculates one output variable z = a + w * x. In this example, z corresponds to the output of the unit, a and w to parameters (fixed during inference processing), and x to the input of the unit. In such a configuration, the arithmetic precision of the circuit is determined by the bit widths of the multiplier and adder it contains and by how they handle the decimal point (floating point, fixed point, and so on). For example, when the high-efficiency inference processing unit 103a is implemented by the arithmetic circuit 22a, it suffices that the variables (a, w, x, z) and the operations of the adder and multiplier in the MAC 221 of that circuit correspond to the low arithmetic precision (first arithmetic precision). The variables, additions, and multiplications in the circuit need not all have the same precision (the same applies hereinafter). For example, it suffices that the precision used for any of the variables, additions, or multiplications is lower than the precision used for any of the variables, additions, or multiplications in the arithmetic circuit that implements the high-precision parameter update processing unit 104b.
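The operation z = a + w * x of a single MAC can be emulated on integers to show how bit width and decimal-point handling fix the precision. The sketch below is an assumption-laden software model, not the circuit of FIG. 7(b): it uses signed fixed point with saturation, the same format for all four variables, and truncation by right shift after the multiply. The names `saturate`, `encode`, `decode`, and `mac_fixed` are hypothetical.

```python
def saturate(raw, bits):
    """Clamp a raw integer to the signed range representable in `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, raw))


def encode(value, bits, frac):
    """Convert a real value to a raw fixed-point integer with `frac` fractional bits."""
    return saturate(round(value * (1 << frac)), bits)


def decode(raw, frac):
    """Convert a raw fixed-point integer back to a real value."""
    return raw / (1 << frac)


def mac_fixed(a, w, x, bits, frac):
    """Compute z = a + w * x on raw fixed-point integers."""
    product = (w * x) >> frac   # rescale the double-width product (floor shift)
    return saturate(a + saturate(product, bits), bits)


BITS, FRAC = 8, 4
z_raw = mac_fixed(encode(0.5, BITS, FRAC), encode(1.5, BITS, FRAC),
                  encode(2.0, BITS, FRAC), BITS, FRAC)
z = decode(z_raw, FRAC)
```

Changing `BITS` and `FRAC` is the software analogue of swapping the low-precision circuit for the high-precision one while keeping the same MAC structure.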

FIGS. 8 to 10 are schematic configuration diagrams showing other examples of arithmetic circuits for inference operations (inference circuits). The inference circuit may have a configuration in which multiple MACs 221 are connected in parallel (a so-called GPU configuration), as in the arithmetic circuit 22b shown in FIG. 8. In this configuration as well, the arithmetic precision of the circuit is determined by the bit widths of the multipliers and adders it contains and by how they handle the decimal point (floating point, fixed point, and so on).

The inference circuit may also have a configuration in which multiple multiply-add trees 223 are connected in parallel via a memory layer 222, as in the arithmetic circuit 22c shown in FIG. 9. The multiply-add tree 223 shown in FIG. 9 is a circuit in which four multipliers, two adders, and one further adder are connected in a tree. An example of the arithmetic circuit 22c shown in FIG. 9 is also disclosed in Non-Patent Document 3. In this configuration as well, the arithmetic precision of the circuit is determined by the bit widths of the multipliers and adders it contains and by how they handle the decimal point (floating point, fixed point, and so on).
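The data flow of one multiply-add tree with four multipliers, two adders, and one final adder can be sketched as follows. The function name `multiply_add_tree` is hypothetical, and ordinary Python numbers stand in for whatever number format the circuit actually uses.

```python
def multiply_add_tree(w, x):
    """Sum of four products, computed in the order of a two-level adder tree."""
    if len(w) != 4 or len(x) != 4:
        raise ValueError("this tree takes exactly four weight/input pairs")
    p = [wk * xk for wk, xk in zip(w, x)]   # stage 1: four multipliers in parallel
    s0 = p[0] + p[1]                        # stage 2: first adder
    s1 = p[2] + p[3]                        # stage 2: second adder
    return s0 + s1                          # stage 3: final adder


u = multiply_add_tree([1, 2, 3, 4], [5, 6, 7, 8])
```

With an intercept of zero, the result corresponds to the sum in equation (1B) for a unit with four inputs.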

The inference circuit may also have a configuration in which multiple ALUs 224 are connected in an array via a memory layer 222 (a systolic array configuration), as in the arithmetic circuit 22d shown in FIG. 10. An example of the arithmetic circuit 22d shown in FIG. 10 is also disclosed in Non-Patent Document 1. In this configuration as well, the arithmetic precision of the circuit is determined by the bit widths of the multipliers and adders it contains and by how they handle the decimal point (floating point, fixed point, and so on).

For example, when the high-efficiency inference processing unit 103a is implemented by the arithmetic circuit 22b, 22c, or 22d shown in FIGS. 8 to 10, it suffices that the variables used in the operations of that circuit and the operations of its adders or multipliers correspond to the low arithmetic precision (first arithmetic precision).

Conversely, when the high-precision inference processing unit 103b is implemented by the arithmetic circuit 22a, 22b, 22c, or 22d, it suffices that the variables used in the operations of that circuit and the operations of its adders or multipliers correspond to the high arithmetic precision (second arithmetic precision).

Next, several configurations of parameter update circuits are given as implementation examples of at least the high-precision parameter update processing unit 104b. For example, for each parameter of each unit in a designated layer, each parameter of a designated unit, or a designated parameter, the high-precision parameter update processing unit 104b may perform, with a predetermined high arithmetic precision, the parameter update processing that solves an optimization problem for an objective function (such as an error function) containing that parameter as an adjustment parameter and thereby updates the adjustment parameter, and may output the updated value. In doing so, the high-precision parameter update processing unit 104b may receive as parameters the values of the variables used in solving the optimization problem (which may include the pre-update parameter values). In the following, an operation performed in parameter update processing may be called a parameter update operation.
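The pairing of low-precision inference with a high-precision parameter update can be sketched for a single unit in the simplified form $z = \sum w * x$. Everything specific here is an assumption made for illustration: the squared-error objective, plain gradient descent as the optimizer, the learning rate, and the 4-fractional-bit quantization grid. The names `quantize`, `infer_low`, `update_high`, and `train` are hypothetical.

```python
def quantize(v, frac_bits=4):
    """Round v to a fixed-point grid (stand-in for the low arithmetic precision)."""
    scale = 1 << frac_bits
    return round(v * scale) / scale


def infer_low(w, x, frac_bits=4):
    """Inference z = sum_k w_k * x_k using quantized copies of w and x."""
    return sum(quantize(wk, frac_bits) * quantize(xk, frac_bits)
               for wk, xk in zip(w, x))


def update_high(w, x, z, target, lr=0.05):
    """One gradient step on E = 0.5 * (z - target)**2 in full float precision."""
    err = z - target                       # dE/dz
    return [wk - lr * err * xk for wk, xk in zip(w, x)]   # dE/dw_k = err * x_k


def train(w, x, target, steps):
    """Alternate low-precision inference and high-precision parameter updates."""
    for _ in range(steps):
        z = infer_low(w, x)                # parameters are fixed during inference
        w = update_high(w, x, z, target)   # parameters change only here
    return w


x = [1.0, 2.0]
w = train([0.0, 0.0], x, target=3.0, steps=200)
```

The weights are stored and updated as ordinary floats, and only the copies used for inference are quantized, so each small update accumulates even when it is smaller than one step of the low-precision grid.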

In the following, a circuit for performing parameter update operations is called a "parameter update circuit"; in particular, a circuit that performs the learning operations (that is, the parameter update operations) with an arithmetic precision higher than that of the inference operations performed by the high-efficiency inference processing unit 103a is called a "high-precision parameter update circuit". To distinguish it from a high-precision parameter update circuit, a circuit that performs parameter update operations with the same arithmetic precision as the inference operations performed by the high-efficiency inference processing unit 103a may be called a "high-efficiency parameter update circuit". The high-efficiency parameter update processing unit (not shown) described above may be implemented by such a high-efficiency parameter update circuit.

The parameter update circuit configurations described below can be realized regardless of whether the circuit performs the parameter update calculation with high precision or with low precision. That is, the only difference between the high-efficiency parameter update processing unit 104a and the high-precision parameter update processing unit 104b may be the precision of the variables, adders, or multipliers used for calculation in the arithmetic circuit that implements the operation of the processing unit.

The simplest example of the parameter update circuit is, as with the inference circuit, a configuration including one multiply-adder (MAC) 221 that combines a multiplier and an adder (see, for example, the arithmetic circuit 22a in FIG. 7(a) and the MAC 221 in FIG. 7(b)). The parameter update circuit can also be realized by, for example, the arithmetic circuit 22b, the arithmetic circuit 22c, or the arithmetic circuit 22d shown in FIGS. 8 to 10. That is, the arithmetic circuits shown in FIGS. 7 to 10 are also examples of arithmetic circuits for the parameter update calculation.

For example, when the high-precision parameter update processing unit 104b is realized by the arithmetic circuit 22a, the arithmetic circuit 22b, the arithmetic circuit 22c, or the arithmetic circuit 22d, it suffices that the variables used for calculation in the circuit and the operations performed by its adders and multipliers support the high calculation accuracy (second calculation accuracy). The variables, additions, and multiplications need not all have the same precision; it suffices that the precision of any one of the variables, additions, and multiplications used for the parameter update calculation in the circuit is higher than the precision of any one of the variables, additions, and multiplications used for the inference calculation in the arithmetic circuit that realizes the high-efficiency inference processing unit 103a.

On the other hand, when the high-efficiency parameter update processing unit 104a is realized by the arithmetic circuit 22a, the arithmetic circuit 22b, the arithmetic circuit 22c, or the arithmetic circuit 22d, it suffices that the variables used for calculation in the circuit and the operations performed by its adders and multipliers support the low calculation accuracy (first calculation accuracy).
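As a non-limiting illustration of the point that the same multiply-add structure can serve either processing unit, with only the precision differing, the following sketch may be helpful. The fixed-point rounding helper `quantize`, the function `mac`, and the bit widths 4 and 16 are assumptions introduced here for illustration; they are not taken from the embodiment.

```python
def quantize(value, frac_bits):
    """Round value to a fixed-point grid with frac_bits fractional bits (assumed format)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def mac(weights, inputs, frac_bits):
    """Multiply-add loop; variables, products and sums are all held at the given precision."""
    acc = 0.0
    for w, x in zip(weights, inputs):
        product = quantize(quantize(w, frac_bits) * quantize(x, frac_bits), frac_bits)
        acc = quantize(acc + product, frac_bits)
    return acc


weights = [0.1, 0.2, 0.3]
inputs = [0.5, 0.25, 0.125]
exact = sum(w * x for w, x in zip(weights, inputs))
low = mac(weights, inputs, frac_bits=4)    # stands in for the first (low) calculation accuracy
high = mac(weights, inputs, frac_bits=16)  # stands in for the second (high) calculation accuracy
```

In this sketch the structure of `mac` is identical in both calls; only `frac_bits` changes, and the wider setting yields a result closer to the exact sum.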

[Operation]
Next, the operation of the learning device 100 of this embodiment will be described. FIG. 11 is a flowchart showing an example of the operation of the learning device 100 of this embodiment. The operation shown in FIG. 11 is executed, for example, under the control of the control unit 105.

In the example shown in FIG. 11, the control unit 105 first reads the pre-learning model from the pre-learning model storage unit 101 and reads the learning data from the learning data storage unit 102 (step S11).

Next, the control unit 105 controls the high-efficiency inference processing unit 103a and the high-precision inference processing unit 103b as necessary to perform inference processing, in order, for each unit included in all of the first to n-th layers (step S12: forward propagation). At this time, the control unit 105 causes the high-efficiency inference processing unit 103a to perform the inference processing of at least some of the units. The control unit 105 may cause the high-efficiency inference processing unit 103a to perform the inference processing of all units or of only some units. When the high-efficiency inference processing unit 103a performs the inference processing of only some units in forward propagation, the control unit 105 may cause the high-precision inference processing unit 103b to perform the inference processing of the remaining units.

The high-efficiency inference processing unit 103a and the high-precision inference processing unit 103b perform inference processing for the designated layers or units in accordance with instructions from the control unit 105.

Next, the control unit 105 controls the high-efficiency parameter update processing unit 104a and the high-precision parameter update processing unit 104b as necessary to perform parameter update processing for predetermined parameters among the parameters used to calculate the outputs of the units in each layer (step S13: parameter update processing). At this time, the control unit 105 causes the high-precision parameter update processing unit 104b to perform the parameter update processing for at least some of the parameters. The control unit 105 may cause the high-precision parameter update processing unit 104b to perform the parameter update processing for all parameters or for only some parameters. When the high-precision parameter update processing unit 104b performs the parameter update processing for only some parameters, the control unit 105 may cause the high-efficiency parameter update processing unit 104a to perform the parameter update processing for all of the remaining parameters, or for only some of the remaining parameters. In the latter case, the parameter update processing itself is omitted for some parameters.

The high-efficiency parameter update processing unit 104a and the high-precision parameter update processing unit 104b perform parameter update processing for the designated parameters in accordance with instructions from the control unit 105.

Finally, the control unit 105 stores the trained model including the parameters updated in step S13 in the post-learning model storage unit 107 (step S14).

As another variation of the above operation, when a plurality of learning data items are held, for example, the operations of steps S11 to S14 may be repeated as many times as there are learning data items. In that case, the trained model obtained as the learning result for the preceding learning data item is used as the pre-learning model for learning with the next learning data item.

Also, when a plurality of learning data items are held, for example, the operations of steps S12 to S13 may be repeated as many times as there are learning data items.

Furthermore, regardless of the number of learning data items, the above repetition of steps S11 to S14 or of steps S12 to S14 may itself be repeated a plurality of times using the same learning data (epoch processing).

In the forward propagation of step S12, for example, the range in which inference processing is performed with low calculation accuracy (low-precision inference range) need not only be determined in advance; it may also be made specifiable by the user, or changed for each learning data item or for each repetition of the epoch processing.

In the parameter update processing of step S13, for example, the range in which parameter update processing is performed with high calculation accuracy (high-precision parameter update range) may be limited to the fully connected layers only. Also, for example, the high-precision parameter update range, the range in which parameter update processing is performed with low calculation accuracy (low-precision parameter update range), and the range in which parameter update processing is not performed need not only be determined in advance; they may also be made specifiable by the user, or changed each time the processing is performed (for each learning data item or for each repetition of the epoch processing).
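The flow of steps S12 to S14 with configurable update ranges can be sketched roughly as follows. This is only an illustrative sketch: the chain of scalar linear layers, the squared-error objective, the plain gradient step, the names `train_step`, `high_range`, and `low_range`, the zero-based layer indices, and the bit widths are all assumptions introduced here, not details of the embodiment.

```python
LOW_BITS, HIGH_BITS = 6, 20  # assumed fractional bit widths for low / high calculation accuracy


def quantize(value, frac_bits):
    """Round value to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def train_step(weights, x, target, high_range, low_range, lr=0.1):
    # Step S12: forward propagation, here entirely at low calculation accuracy.
    acts = [quantize(x, LOW_BITS)]
    for w in weights:
        acts.append(quantize(quantize(w, LOW_BITS) * acts[-1], LOW_BITS))

    # Step S13: parameter update, with the accuracy chosen per layer.
    grad = acts[-1] - target  # derivative of 0.5 * (output - target) ** 2
    updated = list(weights)
    for i in reversed(range(len(weights))):
        bits = HIGH_BITS if i in high_range else LOW_BITS if i in low_range else None
        if bits is not None:  # layers in neither range are left unchanged
            step = quantize(lr * grad * acts[i], bits)
            updated[i] = quantize(weights[i] - step, bits)
        grad *= weights[i]  # propagate the error to the preceding layer
    return updated  # step S14 would store this as the trained model


new_weights = train_step([0.5, 0.5, 0.5], x=1.0, target=1.0, high_range={2}, low_range={1})
```

Here layer index 2 is updated with high accuracy, index 1 with low accuracy, and index 0 is skipped, mirroring the three ranges described above. Passing different sets on each call would correspond to changing the ranges per learning data item or per epoch.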

FIGS. 12 and 13 are flowcharts showing more specific operation examples of the learning device 100 of this embodiment. The operation examples shown in FIGS. 12 and 13 illustrate the operation of each step with a focus on the hardware constituting the learning device 100. The hardware configuration is assumed to be the one shown in FIG. 4.

In the example shown in FIG. 12, the low-precision arithmetic circuit 11 serving as the high-efficiency inference processing unit 103a first reads the learning data and the pre-learning model from the memory 13 in accordance with an instruction from the control device 14 serving as the control unit 105 (step S111).

Next, the low-precision arithmetic circuit 11 performs part of the forward propagation (in this example, the inference calculation for computing the output of each unit included in each of the first to (k-1)-th layers) with low calculation accuracy (step S112). The low-precision arithmetic circuit 11 then stores the calculation result of step S112 (in this example, the outputs of the units of the (k-1)-th layer) in the memory 13 (step S113).

In this example, the pre-learning model is assumed to be a neural network with a multilayer structure of n+1 layers, from the 0th layer to the n-th layer, where the input layer is the 0th layer and the output layer is the n-th layer. The (k-1)-th layer is an intermediate layer located after the input layer (0th layer) and before the output layer (n-th layer). That is, k is an integer satisfying 0 < k-1 < n.

Next, the high-precision arithmetic circuit 12 serving as the high-precision inference processing unit 103b reads the calculation result stored in step S113 (the outputs of the units of the (k-1)-th layer) in accordance with an instruction from the control device 14 (step S211).

The high-precision arithmetic circuit 12 then performs the remainder of the forward propagation (in this example, the inference calculation for computing the output of each unit included in each of the k-th to n-th layers) with high calculation accuracy (step S212).

Next, the high-precision arithmetic circuit 12 serving as the high-precision parameter update processing unit 104b performs, in accordance with an instruction from the control device 14 and with high calculation accuracy, the parameter update calculation for updating the parameters (connection weights with other units and the like) of each unit included in some of the layers (in this example, each of the k-th to n-th layers) (step S212). The high-precision arithmetic circuit 12 then stores the calculation result of step S212 (in this example, the updated parameters of each unit included in each of the k-th to n-th layers) in the memory 13 (step S213).

The updated parameters stored as the calculation result in step S213 correspond to the trained model described above.

The example shown in FIG. 12 is an operation example in which the low-precision arithmetic circuit 11 first performs inference processing for some of the layers as the high-efficiency inference processing unit 103a, and the high-precision arithmetic circuit 12 then performs inference processing and parameter update processing for the remaining layers as the high-precision parameter update processing unit 104b.
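The hand-off through the memory 13 in the FIG. 12 example might be sketched as follows. This is a hedged illustration only: the use of IEEE half and single precision (emulated with Python's `struct` module) for the two circuits, the dictionary standing in for the memory 13, the scalar layers, the squared-error gradient step, and all function and key names are assumptions made for this sketch.

```python
import struct


def fp16(v):
    """Round v to IEEE half precision (assumed low calculation accuracy)."""
    return struct.unpack("<e", struct.pack("<e", v))[0]


def fp32(v):
    """Round v to IEEE single precision (assumed high calculation accuracy)."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def low_precision_forward(memory, k):
    weights, x = memory["model"], memory["data"]  # step S111
    a = fp16(x)
    for w in weights[:k - 1]:                     # layers 1 .. k-1 (step S112)
        a = fp16(fp16(w) * a)
    memory["act_k_minus_1"] = a                   # step S113


def high_precision_forward_and_update(memory, k, target, lr=0.1):
    weights = memory["model"]
    acts = [fp32(memory["act_k_minus_1"])]        # step S211
    for w in weights[k - 1:]:                     # layers k .. n, forward pass
        acts.append(fp32(fp32(w) * acts[-1]))
    grad = acts[-1] - target                      # gradient of 0.5 * (out - target) ** 2
    updated = {}
    for offset in reversed(range(len(weights) - (k - 1))):
        layer = k + offset                        # 1-based layer number
        w = weights[layer - 1]
        updated[layer] = fp32(w - lr * grad * acts[offset])
        grad = fp32(grad * w)
    memory["updated"] = updated                   # step S213


memory = {"model": [0.5, 0.5, 0.5, 0.5], "data": 1.0}  # n = 4 layers of scalar weights
low_precision_forward(memory, k=3)
high_precision_forward_and_update(memory, k=3, target=1.0)
```

Only the output of layer k-1 crosses between the two functions, which is the value stored in step S113 and read in step S211.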

In the example shown in FIG. 13, the low-precision arithmetic circuit 11 serving as the high-efficiency inference processing unit 103a first reads the learning data and the pre-learning model from the memory 13 in accordance with an instruction from the control device 14 serving as the control unit 105 (step S121).

Next, the low-precision arithmetic circuit 11 performs the forward propagation (in this example, the inference calculation for computing the output of each unit included in each of the first to n-th layers) with low calculation accuracy (step S122). The low-precision arithmetic circuit 11 then stores the calculation result of step S122 (in this example, the outputs of the units of the n-th layer, which is the output layer) in the memory 13 (step S123).

In this example as well, the pre-learning model is assumed to be a neural network with a multilayer structure of n+1 layers, from the 0th layer to the n-th layer, where the input layer is the 0th layer and the output layer is the n-th layer.

Next, the high-precision arithmetic circuit 12 serving as the high-precision inference processing unit 103b reads the calculation result stored in step S123 (the outputs of the units of the n-th layer, which is the output layer) in accordance with an instruction from the control device 14 (step S221).

Next, the high-precision arithmetic circuit 12 performs, in accordance with an instruction from the control device 14 and with high calculation accuracy, the parameter update calculation for updating the parameters (connection weights with other units and the like) of each unit included in some of the layers (in this example, each of the k-th to n-th layers) (step S222). The high-precision arithmetic circuit 12 then stores the calculation result of step S222 (in this example, the updated parameters of each unit included in each of the k-th to n-th layers) in the memory 13 (step S223).

The updated parameters stored as the calculation result in step S223 correspond to the trained model described above.

The example shown in FIG. 13 is an operation example in which the low-precision arithmetic circuit 11 performs inference processing for all layers as the high-efficiency inference processing unit 103a, and the high-precision arithmetic circuit 12 then performs parameter update processing for some of the layers as the high-precision parameter update processing unit 104b.

After step S213 in FIG. 12 or step S223 in FIG. 13, the low-precision arithmetic circuit 11 may further perform the operation shown in FIG. 14 as the high-efficiency parameter update processing unit 104a.

That is, the low-precision arithmetic circuit 11, as the high-efficiency parameter update processing unit 104a, reads the updated parameters of each unit included in each of the k-th to n-th layers that have been stored in the memory 13 (step S231).

Next, the low-precision arithmetic circuit 11 performs, with low calculation accuracy, the parameter update calculation for updating the parameters (connection weights with other units and the like) of each unit included in the remaining layers (in this example, each of the first to (k-1)-th layers) (step S232). The low-precision arithmetic circuit 11 then stores the calculation result of step S232 (in this example, the updated parameters of each unit included in each of the first to (k-1)-th layers) in the memory 13 (step S233).

In this case, the updated parameters stored as the calculation result in step S213 or step S223, together with the updated parameters stored as the calculation result in step S233, correspond to the trained model described above.

The operations shown in FIGS. 12 to 14 are examples of learning processing for a single learning data item. Accordingly, when a plurality of learning data items are held, the above operation, or each calculation step included in it, may be repeated as many times as there are learning data items. Regardless of the number of learning data items, the above operation or each calculation step included in it may also be repeated a plurality of times using the same learning data (epoch processing). In addition, the k-th to n-th layers that form the high-precision parameter update range in the above operation may be fully connected layers, and k may be specified by the user or changed each time the processing is performed.

As described above, according to this embodiment, the arithmetic processing of the learning algorithm is divided into inference processing and parameter update processing, at least part of the inference processing is performed with low calculation accuracy, and at least part of the parameter update processing is performed with high calculation accuracy. The portion of the calculation that requires high calculation accuracy can thereby be optimized, so that learning with sufficient accuracy becomes possible while power consumption is reduced.

Embodiment 2.
Next, a second embodiment of the present invention will be described. FIG. 15 is a block diagram showing a configuration example of the main part of the data processing device of the second embodiment. The data processing device 300 shown in FIG. 15 includes a low-precision arithmetic processing unit 31, a high-precision arithmetic processing unit 32, a communication path 33, and a data conversion unit 34.

The low-precision arithmetic processing unit 31 is a processing unit that performs predetermined calculations with relatively low calculation accuracy. Here, relatively low calculation accuracy means any calculation accuracy lower than that of the calculations performed by the high-precision arithmetic processing unit 32.

The high-precision arithmetic processing unit 32 is a processing unit that performs predetermined calculations with relatively high calculation accuracy. Here, relatively high calculation accuracy means any calculation accuracy higher than that of the calculations performed by the low-precision arithmetic processing unit 31.

The low-precision arithmetic processing unit 31 may be, for example, the high-efficiency inference processing unit 103a or the high-efficiency parameter update processing unit 104a described above. The high-precision arithmetic processing unit 32 may be, for example, the high-precision inference processing unit 103b or the high-precision parameter update processing unit 104b described above. In this embodiment as well, "precision" or "calculation accuracy" refers to a measure of the breadth and fineness of the value range of the numerical data used in the calculations actually performed in the data processing of the low-precision arithmetic processing unit 31 and the high-precision arithmetic processing unit 32 (more specifically, a measure of the breadth and fineness of the value range of numerical data as determined by the bit width, the handling of the decimal point, and the like in the arithmetic circuit that realizes the processing unit).

In this example, the low-precision arithmetic processing unit 31 and the high-precision arithmetic processing unit 32 are connected via the communication path 33 and the data conversion unit 34. The data conversion unit 34 is provided between the high-precision arithmetic processing unit 32 and the communication path 33.

The communication path 33 may be realized by, for example, a bus. The communication path 33 may also be realized by a connection circuit (interconnect) provided inside a chip. In addition to the bus or the connection circuit, the communication path 33 may include a memory (an external memory, a buffer, or the like) connected to the bus or the connection circuit.

The data conversion unit 34 performs predetermined conversion processing on data exchanged between the low-precision arithmetic processing unit 31 and the high-precision arithmetic processing unit 32. The data conversion by the data conversion unit 34 is performed, for example, so that the communication (data exchange) over the communication path 33 is carried out using data of the calculation accuracy that yields the smaller data amount (communication amount per data item).

For example, the data conversion unit 34 converts transmitted and received data so that each data item passing through the communication path 33 is data of whichever of the calculation accuracy of the low-precision arithmetic processing unit 31 and the calculation accuracy of the high-precision arithmetic processing unit 32 has the smaller data amount. If the data amounts are the same, the transmitted and received data are converted into data of the lower calculation accuracy. With the configuration of FIG. 15, the data conversion unit 34 need only convert data so that each data item passing through the communication path 33 is data of the calculation accuracy of the low-precision arithmetic processing unit 31. Matching the lower calculation accuracy makes data conversion on the low-precision arithmetic processing unit 31 side unnecessary while minimizing the degradation of calculation accuracy caused by the data exchange.

Here, data conversion includes type conversion that matches the data type to the data type of lower calculation accuracy among the processing units serving as communication endpoints, data compression (in particular, numerical data compression such as numeric sequence compression and reduction of the number of digits), and the combining of two or more converted data items.

The data conversion unit 34, for example, receives via the communication path 33 data transmitted from the low-precision arithmetic processing unit 31 toward the high-precision arithmetic processing unit 32, converts the received data (data of low calculation accuracy) into data of the calculation accuracy handled by the high-precision arithmetic processing unit 32 (high calculation accuracy), and passes it to the high-precision arithmetic processing unit 32. The data conversion unit 34 also, for example, receives data transmitted from the high-precision arithmetic processing unit 32 toward the low-precision arithmetic processing unit 31, converts the received data (data of high calculation accuracy) into data of the calculation accuracy handled by the low-precision arithmetic processing unit 31 (low calculation accuracy), and sends it out to the communication path 33.

For example, when the calculation accuracy of the low-precision arithmetic processing unit 31 (here, the data type of the numerical values used for calculation) is 16-bit integer (INT16) and the calculation accuracy of the high-precision arithmetic processing unit 32 is 32-bit floating point (FP32), the data conversion unit 34 need only convert data so that the data passing through the communication path 33 is 16-bit integer data. Likewise, when the calculation accuracy of the low-precision arithmetic processing unit 31 is 16-bit integer (INT16) and the calculation accuracy of the high-precision arithmetic processing unit 32 is 16-bit floating point (FP16), the data conversion unit 34 need only convert data so that the data passing through the communication path 33 is 16-bit integer data.
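A conversion of this kind between FP32 on the high-precision side and INT16 on the communication path might look like the following sketch. The embodiment does not state how floating-point values are mapped to integers, so the fixed-point scale of 8 fractional bits, the saturation behavior, and the function names are assumptions made purely for illustration.

```python
import struct

FRAC_BITS = 8  # assumed fixed-point scale of the INT16 data on the communication path


def high_to_channel(value):
    """FP32 value -> 2-byte INT16 payload, saturating at the INT16 limits."""
    fixed = round(value * (1 << FRAC_BITS))
    fixed = max(-32768, min(32767, fixed))
    return struct.pack("<h", fixed)


def channel_to_high(payload):
    """2-byte INT16 payload -> value rounded to FP32."""
    (fixed,) = struct.unpack("<h", payload)
    return struct.unpack("<f", struct.pack("<f", fixed / (1 << FRAC_BITS)))[0]


payload = high_to_channel(1.5)
restored = channel_to_high(payload)
```

Each value occupies 2 bytes on the path instead of the 4 bytes an FP32 value would need, and the low-precision side can consume the INT16 payload without any conversion of its own.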

The data communication over the communication path 33 may be one-way communication (for example, only transmission from the low-precision arithmetic processing unit 31 to the high-precision arithmetic processing unit 32, or only transmission from the high-precision arithmetic processing unit 32 to the low-precision arithmetic processing unit 31). In that case, the data conversion unit 34 need only perform the data conversion corresponding to the communication that actually takes place.

In this embodiment, the data conversion unit 34 may be realized by a dedicated data conversion circuit that performs data conversion designed to suit the configuration and operation of the data processing device 300. Implementing the data conversion unit 34 as a dedicated circuit makes it possible to omit processing needed only for general-purpose use, such as reading setting values and states and branching on them, which further improves efficiency. Dedicating the data conversion also makes it easy to implement processing such as converting a plurality of data items together, or converting a plurality of data items in parallel and then gathering the results for transmission in a single batch, which improves efficiency still further. Here, parallel data conversion followed by gathering of the results is one example of data conversion that combines the conversion of each data item with the combining of the converted data. The data conversion unit 34 may realize such data conversion by, for example, SIMD (single instruction, multiple data) operations.
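The convert-then-combine pattern described here can be sketched as below. Plain Python does not execute SIMD instructions, so the sketch only mirrors the structure (convert every item, then merge the results into one payload for a single transfer); the INT16 fixed-point format with 8 fractional bits and the function names are assumptions introduced for illustration.

```python
import struct

FRAC_BITS = 8  # assumed fixed-point format of the data sent over the communication path


def convert_one(value):
    """Convert one value to a saturated INT16 fixed-point number."""
    fixed = round(value * (1 << FRAC_BITS))
    return max(-32768, min(32767, fixed))


def convert_and_pack(values):
    """Convert every value, then merge the results into one payload for a single transfer."""
    fixed = [convert_one(v) for v in values]
    return struct.pack(f"<{len(fixed)}h", *fixed)


payload = convert_and_pack([0.5, -1.25, 3.0, 200.0])
unpacked = struct.unpack("<4h", payload)
```

In a dedicated circuit or a SIMD implementation, the per-item conversions would run side by side on the lanes of one wide register rather than in a loop.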

FIG. 16 is a block diagram showing another configuration example of the data processing device of the second embodiment. In the example shown in FIG. 15, the data conversion unit 34 is provided only on the high-precision arithmetic processing unit 32 side, but a data conversion unit can also be provided on the low-precision arithmetic processing unit 31 side. The data processing device 300 shown in FIG. 16 differs from the configuration shown in FIG. 15 in that it further includes a data conversion unit 35 between the low-precision arithmetic processing unit 31 and the communication path 33. That is, in this example, the low-precision arithmetic processing unit 31 and the high-precision arithmetic processing unit 32 are connected via the data conversion unit 35, the communication path 33, and the data conversion unit 34.

データ変換部３５は、低精度演算処理部３１と高精度演算処理部３２との間でやりとりされるデータを対象に、所定の変換処理を行う。 The data conversion unit 35 performs predetermined conversion processing on data exchanged between the low-precision arithmetic processing unit 31 and the high-precision arithmetic processing unit 32 .

In this example, the data conversion unit 34 and the data conversion unit 35 convert data so that communication (data exchange) over the communication path 33 is carried out with data of an arithmetic precision whose data amount is even smaller than that of data communication carried out at the arithmetic precision handled by the low-precision arithmetic processing unit 31.

For example, the data conversion unit 34 and the data conversion unit 35 convert transmitted and received data so that each data item passing through the communication path 33 is data of an arithmetic precision (hereinafter, ultra-low arithmetic precision) that requires less communication volume than the arithmetic precision of the low-precision arithmetic processing unit 31.

In this example, the data conversion unit 34 receives, for example, data sent from the low-precision arithmetic processing unit 31 toward the high-precision arithmetic processing unit 32 via the data conversion unit 35 and the communication path 33, converts the received data (ultra-low-precision data produced by the data conversion unit 35) into data of the arithmetic precision handled by the high-precision arithmetic processing unit 32 (high arithmetic precision), and passes it to the high-precision arithmetic processing unit 32. The data conversion unit 34 also receives, for example, data sent from the high-precision arithmetic processing unit 32 toward the low-precision arithmetic processing unit 31, converts the received data (high-precision data) into ultra-low-precision data, and sends it out on the communication path 33.

The data conversion unit 35 receives, for example, data sent from the high-precision arithmetic processing unit 32 toward the low-precision arithmetic processing unit 31 via the data conversion unit 34 and the communication path 33, converts the received data (ultra-low-precision data produced by the data conversion unit 34) into data of the arithmetic precision handled by the low-precision arithmetic processing unit 31 (low arithmetic precision), and passes it to the low-precision arithmetic processing unit 31. The data conversion unit 35 also receives, for example, data sent from the low-precision arithmetic processing unit 31 toward the high-precision arithmetic processing unit 32, converts the received data (low-precision data) into ultra-low-precision data, and sends it out on the communication path 33.

For example, when the arithmetic precision of the low-precision arithmetic processing unit 31 (here, the data type of the numerical values used in its operations) is 16-bit integer (INT16) and that of the high-precision arithmetic processing unit 32 is 32-bit floating point (FP32), the data conversion unit 34 and the data conversion unit 35 may compress data so that data passing through the communication path 33 is 12-bit integer (INT12) or 8-bit integer (INT8), either of which has a smaller data amount than INT16. When compressing, the data conversion units 34 and 35 perform numerical data compression that lowers only the precision (for example, by dropping lower-order bits), so that the data does not lose its meaning as numerical data.
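The INT16/FP32/INT8 case can be sketched in software as follows. This is only an illustration under stated assumptions, not the circuit described in the embodiment: the function names, the fixed-point scale `FRACTION_BITS`, and the use of a Python float as a stand-in for FP32 are all hypothetical choices made for the example.

```python
# Minimal sketch of the two converters around the channel (assumptions only).
FRACTION_BITS = 8   # assumed fixed-point format: an INT16 value v means v / 2**8
DROPPED_BITS = 8    # INT16 -> INT8 drops the 8 low-order bits


def clamp(value: int, low: int, high: int) -> int:
    return max(low, min(high, value))


def unit35_to_channel(int16_value: int) -> int:
    """Low-precision side: INT16 -> INT8 by dropping low-order bits."""
    # Arithmetic right shift keeps the sign, so the result fits in INT8.
    return clamp(int16_value, -32768, 32767) >> DROPPED_BITS


def unit35_from_channel(int8_value: int) -> int:
    """Low-precision side: INT8 -> INT16; the dropped bits come back as zeros."""
    return int8_value << DROPPED_BITS


def unit34_from_channel(int8_value: int) -> float:
    """High-precision side: INT8 -> floating point (Python float stands in for FP32)."""
    return (int8_value << DROPPED_BITS) / (1 << FRACTION_BITS)


def unit34_to_channel(value: float) -> int:
    """High-precision side: floating point -> INT8 via the assumed INT16 grid."""
    int16_value = clamp(round(value * (1 << FRACTION_BITS)), -32768, 32767)
    return int16_value >> DROPPED_BITS


# Low-precision core sends 12345 (INT16); only an 8-bit value crosses the channel.
on_channel = unit35_to_channel(12345)
seen_by_high = unit34_from_channel(on_channel)

# High-precision core sends -3.7; the low-precision core receives an INT16 value.
back_on_channel = unit34_to_channel(-3.7)
seen_by_low = unit35_from_channel(back_on_channel)
```

In this sketch the value on the channel is always an 8-bit integer, while each core only ever sees its own data type; the price is the loss of the dropped low-order bits.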

The data conversion unit 34 and the data conversion unit 35 can also compress data by exploiting the characteristics of the activation functions used in deep learning. For example, using a step function, which is one kind of activation function, data can be compressed to 1 bit. Using ReLU (the ramp function), the sign bit of the data can be dropped.
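A small sketch of both ideas follows. It assumes signed 8-bit inputs and a step function that outputs 1 for strictly positive inputs; the threshold convention and all function names are assumptions of this example rather than details given in the text.

```python
def step(x: int) -> int:
    """Step activation; treating x > 0 as the 'on' condition is an assumption."""
    return 1 if x > 0 else 0


def pack_step_outputs(values) -> bytes:
    """Send step outputs as one bit each, eight per byte, most significant bit first."""
    packed = bytearray((len(values) + 7) // 8)
    for index, value in enumerate(values):
        if step(value):
            packed[index // 8] |= 0x80 >> (index % 8)
    return bytes(packed)


def relu_payload(x: int) -> int:
    """Return the 7-bit payload of ReLU applied to a signed 8-bit value.

    ReLU never produces a negative number, so the sign bit is always 0
    and does not need to be transmitted.
    """
    if not -128 <= x <= 127:
        raise ValueError("value is outside the INT8 range")
    activated = x if x > 0 else 0
    return activated & 0x7F


step_bytes = pack_step_outputs([3, -1, 0, 7, -5, 2, 2, -9])
relu_values = [relu_payload(v) for v in (-20, 0, 100, 127)]
```

Eight activations that would occupy eight bytes as INT8 values fit in a single byte after the step function, and every ReLU output fits in seven bits.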

When performing a data conversion that reduces the number of bits, the data conversion units 34 and 35 may also pack multiple data items of irregular bit width together, and split data packed in this way back into the individual data items (pack/unpack processing). Such pack/unpack processing can likewise be made more efficient by implementing it in dedicated form.
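As an illustration of pack/unpack processing, the sketch below packs unsigned 12-bit values back to back into bytes and recovers them again. The big-endian bit order, the zero padding of the final byte, and the names `pack12` and `unpack12` are assumptions chosen for this example.

```python
def pack12(values) -> bytes:
    """Pack unsigned 12-bit integers back to back (two values per three bytes)."""
    bit_buffer = 0
    bit_count = 0
    out = bytearray()
    for value in values:
        if not 0 <= value < 4096:
            raise ValueError("value does not fit in 12 bits")
        bit_buffer = (bit_buffer << 12) | value
        bit_count += 12
        while bit_count >= 8:          # emit every complete byte
            bit_count -= 8
            out.append((bit_buffer >> bit_count) & 0xFF)
        bit_buffer &= (1 << bit_count) - 1
    if bit_count:                      # pad the last partial byte with zeros
        out.append((bit_buffer << (8 - bit_count)) & 0xFF)
    return bytes(out)


def unpack12(data: bytes, count: int) -> list:
    """Split packed bytes back into `count` unsigned 12-bit integers."""
    bit_buffer = 0
    bit_count = 0
    values = []
    for byte in data:
        bit_buffer = (bit_buffer << 8) | byte
        bit_count += 8
        if bit_count >= 12 and len(values) < count:
            bit_count -= 12
            values.append((bit_buffer >> bit_count) & 0xFFF)
            bit_buffer &= (1 << bit_count) - 1
    return values


packed = pack12([0xABC, 0x123, 0xFFF])
restored = unpack12(packed, 3)
```

Three 12-bit values occupy five bytes here instead of the six bytes they would need if each were stored in a 16-bit slot.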

Reducing the amount of data passing through the communication path 33 in this way speeds up data exchange between cores (arithmetic circuits) of different arithmetic precision. Furthermore, when the cores exchange data through memory, the amount of memory used for the exchange is reduced, so the power consumed by memory use can also be reduced.
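The size of the reduction follows directly from the bit widths. The short calculation below assumes tightly packed values and a hypothetical transfer of 1000 values; it covers data volume only and says nothing about actual transfer time or power.

```python
def channel_bytes(value_count: int, bits_per_value: int) -> int:
    """Bytes needed to carry value_count values packed at bits_per_value bits each."""
    return (value_count * bits_per_value + 7) // 8


VALUE_COUNT = 1000  # hypothetical transfer size

sizes = {bits: channel_bytes(VALUE_COUNT, bits) for bits in (32, 16, 12, 8)}
# Fraction of the INT16 volume saved by sending INT12 or INT8 instead.
savings_vs_int16 = {bits: 1 - sizes[bits] / sizes[16] for bits in (12, 8)}
```

Under these assumptions, INT12 carries the same number of values in three quarters of the INT16 volume, and INT8 in half of it.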

When there are two or more combinations of cores of different arithmetic precision that exchange data, the data conversion unit 34 and/or the data conversion unit 35 described above may be provided, for each combination, at one or both end points of the communication path used for inter-core communication in that combination.

Next, an overview of the present invention will be described. FIG. 17 is a block diagram showing an overview of the data processing device of the present invention. The data processing device 500 shown in FIG. 17 includes low-precision arithmetic processing means 501, high-precision arithmetic processing means 502, and first data conversion means 504.

The low-precision arithmetic processing means 501 (for example, the low-precision arithmetic processing unit 31) performs a predetermined operation with a first precision.

The high-precision arithmetic processing means 502 (for example, the high-precision arithmetic processing unit 32) performs a predetermined operation with a second precision higher than the first precision.

The first data conversion means 504 (for example, the data conversion unit 34) is provided at the end point, on the high-precision arithmetic processing means 502 side, of the communication path 503 used to exchange data between the high-precision arithmetic processing means 502 and the low-precision arithmetic processing means 501.

The first data conversion means 504 performs a predetermined conversion on data passing between the communication path 503 and the high-precision arithmetic processing means 502 so that the data exchanged with the connected high-precision arithmetic processing means 502 is data that the high-precision arithmetic processing means 502 can handle, the amount of data passing through the communication path 503 is no greater than the amount when first-precision data is used, and the precision of the data passing through the communication path 503 is no higher than the first precision.

With this configuration, even processing that mixes operations requiring high precision with operations that do not can be made more efficient.

FIG. 18 is a diagram showing a configuration example of the data processing circuit of the present invention. The data processing circuit 600 shown in FIG. 18 includes a low-precision arithmetic circuit 601, a high-precision arithmetic circuit 602, and a first data conversion circuit 604.

The low-precision arithmetic circuit 601 (for example, the low-precision arithmetic processing unit 31 or the low-precision arithmetic circuit 11) performs a predetermined operation with a first precision.

The high-precision arithmetic circuit 602 (for example, the high-precision arithmetic processing unit 32 or the high-precision arithmetic circuit 12) performs a predetermined operation with a second precision higher than the first precision.

The first data conversion circuit 604 (for example, the data conversion unit 34) is provided at the end point, on the high-precision arithmetic circuit 602 side, of the communication path 603 used to exchange data between the high-precision arithmetic circuit 602 and the low-precision arithmetic circuit 601, and performs a predetermined conversion on data passing between the communication path 603 and the high-precision arithmetic circuit 602.

The data processing circuit 600 is configured so that the data exchanged between the first data conversion circuit 604 and the connected high-precision arithmetic circuit 602 is data handled by the high-precision arithmetic circuit 602, the amount of data passing through the communication path 603 is no greater than the amount when first-precision data is used, and the precision of the data passing through the communication path 603 is no higher than the first precision.

This configuration also makes processing that mixes operations requiring high precision with operations that do not more efficient.

As shown in FIG. 19, the data processing circuit 600 may further include a second data conversion circuit 605 that is provided at the end point of the communication path 603 on the low-precision arithmetic circuit 601 side and performs a predetermined conversion on data passing between the communication path 603 and the low-precision arithmetic circuit 601.

The data processing circuit 600 may further be configured so that the data exchanged between the second data conversion circuit 605 and the connected low-precision arithmetic circuit 601 is data handled by the low-precision arithmetic circuit 601, the amount of data passing through the communication path 603 is less than the amount when first-precision data is used, and the precision of the data passing through the communication path 603 is lower than the first precision.

This configuration makes processing that mixes operations requiring high precision with operations that do not even more efficient.

The above embodiments can also be described as in the following appendices.

(Appendix 1) A data processing device comprising: low-precision arithmetic processing means for performing a predetermined operation with a first precision; high-precision arithmetic processing means for performing a predetermined operation with a second precision higher than the first precision; and first data conversion means provided at the end point, on the high-precision arithmetic processing means side, of a communication path for exchanging data between the high-precision arithmetic processing means and the low-precision arithmetic processing means, wherein the first data conversion means performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means so that the data exchanged with the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is no greater than the amount when first-precision data is used, and the precision of the data passing through the communication path is no higher than the first precision.

(Appendix 2) The data processing device according to Appendix 1, wherein the first data conversion means accepts, from the communication path, data passed from the low-precision arithmetic processing means to the high-precision arithmetic processing means as first-precision data and converts the accepted first-precision data into data of a precision that the high-precision arithmetic processing means can handle, and the first data conversion means accepts data passed from the high-precision arithmetic processing means to the low-precision arithmetic processing means as data that the high-precision arithmetic processing means can handle and converts the accepted data into data of a precision that the low-precision arithmetic processing means can handle.

(Appendix 3) The data processing device according to Appendix 1 or Appendix 2, further comprising second data conversion means provided at the end point of the communication path on the low-precision arithmetic processing means side, wherein the first data conversion means and the second data conversion means each perform a predetermined conversion on data passing between the communication path and the connected arithmetic processing means so that the data exchanged with the connected arithmetic processing means is data that the connected arithmetic processing means can handle, the amount of data passing through the communication path is less than the amount when first-precision data is used, and the precision of the data passing through the communication path is lower than the first precision.

(Appendix 4) The data processing device according to Appendix 3, wherein the first data conversion means and the second data conversion means each accept, from the communication path, data passed from the connected arithmetic processing means to the counterpart arithmetic processing means as data of a predetermined third precision lower than the first precision and convert the accepted third-precision data into data of a precision that the connected arithmetic processing means can handle, and the first data conversion means and the second data conversion means each accept data passed from the connected arithmetic processing means to the counterpart arithmetic processing means as data that the connected arithmetic processing means can handle and convert the accepted data into data of a precision that the counterpart arithmetic processing means can handle.

(Appendix 5) The data processing device according to Appendix 3 or Appendix 4, wherein at least one of the first data conversion means and the second data conversion means, when sending converted data to the communication path, sends a plurality of converted data items together, and at least one of the first data conversion means and the second data conversion means accepts from the communication path a plurality of converted data items that have been put together, splits the accepted data items apart, and converts each resulting data item into data of a precision that the connected arithmetic processing means can handle.

(Appendix 6) The data processing device according to any one of Appendices 3 to 5, wherein the conversions performed by the first data conversion means and the second data conversion means are predetermined and fixed.

(Appendix 7) The data processing device according to any one of Appendices 1 to 6, wherein the data processing device is a learning device that learns a predetermined discriminant model composed of two or more units connected in layers, the data processing device comprises learning means for performing, when learning data is input, inference processing that calculates the output of each unit of the discriminant model in a predetermined order and parameter update processing that updates, based on the result of the inference processing, at least part of the parameters used to calculate the output of each unit, and the learning means includes: high-efficiency inference means for performing, as the low-precision arithmetic processing means, a specified operation among the operations performed in the inference processing with a first arithmetic precision; and high-precision parameter update means for performing, as the high-precision arithmetic processing means, a specified operation among the operations performed in the parameter update processing with a second arithmetic precision higher than the first arithmetic precision.

(Appendix 8A) A data processing circuit comprising: a low-precision arithmetic circuit that performs a predetermined operation with a first precision; a high-precision arithmetic circuit that performs a predetermined operation with a second precision higher than the first precision; and a first data conversion circuit that is provided at the end point, on the high-precision arithmetic circuit side, of a communication path for exchanging data between the high-precision arithmetic circuit and the low-precision arithmetic circuit and performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic circuit, wherein the data exchanged between the first data conversion circuit and the connected high-precision arithmetic circuit is data handled by the high-precision arithmetic circuit, the amount of data passing through the communication path is no greater than the amount when first-precision data is used, and the precision of the data passing through the communication path is no higher than the first precision.

(Appendix 8B) The data processing circuit according to Appendix 8A, further comprising a second data conversion circuit that is provided at the end point of the communication path on the low-precision arithmetic circuit side and performs a predetermined conversion on data passing between the communication path and the low-precision arithmetic circuit, wherein the data exchanged between the second data conversion circuit and the connected low-precision arithmetic circuit is data handled by the low-precision arithmetic circuit, the amount of data passing through the communication path is less than the amount when first-precision data is used, and the precision of the data passing through the communication path is lower than the first precision.

(Appendix 9) A data processing method wherein first data conversion means, provided at the end point on the high-precision arithmetic processing means side of a communication path for exchanging data between low-precision arithmetic processing means that performs a predetermined operation with a first precision and high-precision arithmetic processing means that performs a predetermined operation with a second precision higher than the first precision, performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means so that the data exchanged with the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is no greater than the amount when first-precision data is used, and the precision of the data passing through the communication path is no higher than the first precision.

(Appendix 10) The data processing method according to Appendix 9, wherein the first data conversion means performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means so that the data exchanged with the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is less than the amount when first-precision data is used, and the precision of the data passing through the communication path is lower than the first precision, and second data conversion means provided at the end point of the communication path on the low-precision arithmetic processing means side performs a predetermined conversion on data passing between the communication path and the low-precision arithmetic processing means so that the data exchanged with the connected low-precision arithmetic processing means is data that the low-precision arithmetic processing means can handle, the amount of data passing through the communication path is less than the amount when first-precision data is used, and the precision of the data passing through the communication path is lower than the first precision.

Although the present invention has been described above with reference to the embodiments and examples, the present invention is not limited to them. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

The present invention is not limited to deep learning; it is suitably applicable to any device that performs processing in which operations requiring high precision are mixed with operations that do not, when that processing is to be carried out while keeping power consumption low.

10 arithmetic circuit
11 low-precision arithmetic circuit
12 high-precision arithmetic circuit
13 memory
14 control device
15 bus
51 unit
52 inter-unit connection
53 inference processing
54 parameter update processing
100 learning device
101 pre-learning model storage unit
102 learning data storage unit
103a high-efficiency inference processing unit
103b high-precision inference processing unit
104a high-efficiency parameter update processing unit
104b high-precision parameter update processing unit
105 control unit
106 learning processing unit
107 post-learning model storage unit
1000 computer
1001 CPU
1002 main storage device
1003 auxiliary storage device
1004 interface
1005 display device
1006 input device
1007 GPU
1008 processor
21 bus
22a, 22b, 22c, 22d arithmetic circuit
221 MAC
222 memory layer
223 multiply-add tree
224 ALU
300 data processing device
31 low-precision arithmetic processing unit
32 high-precision arithmetic processing unit
33 communication path
34, 35 data conversion unit
500 data processing device
501 low-precision arithmetic processing means
502 high-precision arithmetic processing means
503 communication path
504 first data conversion means
600 data processing circuit
601 low-precision arithmetic circuit
602 high-precision arithmetic circuit
603 communication path
604 first data conversion circuit
605 second data conversion circuit
90 large-scale learning circuit

Claims

A data processing device comprising:
low-precision arithmetic processing means for performing a predetermined arithmetic operation with a first precision;
high-precision arithmetic processing means for performing a predetermined arithmetic operation with a second precision higher than the first precision; and
first data conversion means provided at an end point, on the high-precision arithmetic processing means side, of a communication path for exchanging data between the high-precision arithmetic processing means and the low-precision arithmetic processing means,
wherein the first data conversion means performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means so that the data exchanged with the connected high-precision arithmetic processing means is data that the high-precision arithmetic processing means can handle, the amount of data passing through the communication path is equal to or less than the amount when data of the first precision is used, and the precision of the data passing through the communication path is equal to or lower than the first precision,
the data processing device further comprises second data conversion means provided at an end point of the communication path on the low-precision arithmetic processing means side, and
the first data conversion means and the second data conversion means each perform a predetermined conversion on data passing between the communication path and the connected arithmetic processing means so that the data exchanged with the connected arithmetic processing means is data that the connected arithmetic processing means can handle, the amount of data passing through the communication path is less than the amount when data of the first precision is used, and the precision of the data passing through the communication path is lower than the first precision.

The first data conversion means and the second data conversion means each accept, from the communication path, data passed from the counterpart arithmetic processing means to the connected arithmetic processing means as data of a predetermined third precision lower than the first precision, and convert the accepted data of the third precision into data of a precision that the connected arithmetic processing means can handle, and
the first data conversion means and the second data conversion means each accept data passed from the connected arithmetic processing means to the counterpart arithmetic processing means as data that the connected arithmetic processing means can handle, and convert the accepted data into data of the third precision.

3. The data processing apparatus according to claim 1, wherein at least one of the first data conversion means and the second data conversion means, when sending converted data to the communication path, sends a plurality of converted data items together, and
at least one of the first data conversion means and the second data conversion means accepts from the communication path a plurality of converted data items that have been put together, splits the accepted data items apart, and converts each resulting data item into data of a precision that the connected arithmetic processing means can handle.

4. The data processing apparatus according to any one of claims 1 to 3, wherein the conversions performed by the first data conversion means and the second data conversion means are predetermined and fixed.

5. The data processing device according to any one of claims 1 to 4, wherein
the data processing device is a learning device that learns a predetermined discriminant model composed of two or more units connected in layers,
the data processing device comprises learning means that, when learning data is input, performs inference processing for calculating the output of each unit of the discriminant model in a predetermined order, and parameter update processing for partially updating the parameters, and
the learning means includes:
high-efficiency inference means that, as the low-precision arithmetic processing means, performs specified operations among the operations performed in the inference processing with a first arithmetic precision; and
high-precision parameter update means that, as the high-precision arithmetic processing means, performs specified operations among the operations performed in the parameter update processing with a second arithmetic precision higher than the first arithmetic precision.
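
The split between low-precision inference and high-precision parameter update can be illustrated with a single linear unit. This is a sketch under stated assumptions, not the patented learning procedure: the quantization step of 1/16 standing in for the first arithmetic precision, the squared-error gradient step, the learning rate, and all function names are hypothetical.

```python
def quantize(x, step=1 / 16):
    """Round a float to a coarse grid, standing in for the first arithmetic precision."""
    return round(x / step) * step


def infer_low_precision(weights, inputs):
    """Inference: weighted sum computed on quantized operands."""
    return sum(quantize(w) * quantize(x) for w, x in zip(weights, inputs))


def update_high_precision(weights, inputs, target, output, lr=0.1):
    """Parameter update: one squared-error gradient step kept in full float precision."""
    error = output - target
    return [w - lr * error * x for w, x in zip(weights, inputs)]


def train_step(weights, inputs, target):
    """One learning step: low-precision inference, then high-precision update."""
    output = infer_low_precision(weights, inputs)
    return update_high_precision(weights, inputs, target, output)
```

Keeping the stored weights in full precision lets small updates accumulate even when a single update is too small to change the quantized value used during inference.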

A data processing circuit comprising:
a low-precision arithmetic circuit that performs predetermined arithmetic operations with a first precision;
a high-precision arithmetic circuit that performs predetermined arithmetic operations with a second precision higher than the first precision;
a first data conversion circuit that is provided at the end point, on the high-precision arithmetic circuit side, of a communication path for exchanging data between the high-precision arithmetic circuit and the low-precision arithmetic circuit, and that performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic circuit; and
a second data conversion circuit that is provided at the end point of the communication path on the low-precision arithmetic circuit side, and that performs a predetermined conversion on data passing between the communication path and the low-precision arithmetic circuit,
wherein the data exchanged between the first data conversion circuit and the high-precision arithmetic circuit to which it is connected is data handled by the high-precision arithmetic circuit,
the data exchanged between the second data conversion circuit and the low-precision arithmetic circuit to which it is connected is data handled by the low-precision arithmetic circuit, and
the amount of data passing through the communication path is less than the amount of data when data of the first precision is used, and the precision of the data passing through the communication path is lower than the first precision.
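
The data-amount condition on the communication path can be checked with simple arithmetic. The sketch below assumes, hypothetically, a path that transfers whole bytes, an 8-bit first precision, and a 4-bit path precision; none of these figures come from the claims.

```python
import math


def path_bytes(num_values, bits_per_value):
    """Bytes needed to carry num_values items, assuming a byte-granular path."""
    return math.ceil(num_values * bits_per_value / 8)


def reduces_traffic(num_values, first_bits=8, path_bits=4):
    """True when path-precision transfer is strictly smaller than first-precision transfer."""
    return path_bytes(num_values, path_bits) < path_bytes(num_values, first_bits)
```

Under these assumptions, 1024 values need 512 bytes at 4 bits instead of 1024 bytes at 8 bits, whereas a single 4-bit value still occupies a full byte, so the reduction only appears when several values travel together.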

A data processing method, wherein:
a first data conversion means, provided at the end point, on the high-precision arithmetic processing means side, of a communication path for exchanging data between low-precision arithmetic processing means that performs predetermined arithmetic operations with a first precision and high-precision arithmetic processing means that performs predetermined arithmetic operations with a second precision higher than the first precision, performs a predetermined conversion on data passing between the communication path and the high-precision arithmetic processing means, so that the data exchanged with the connected high-precision arithmetic processing means is data that can be handled by the high-precision arithmetic processing means, the amount of data passing through the communication path is less than the amount of data when data of the first precision is used, and the precision of the data passing through the communication path is lower than the first precision; and
a second data conversion means, provided at the end point of the communication path on the low-precision arithmetic processing means side, performs a predetermined conversion on data passing between the communication path and the low-precision arithmetic processing means, so that the data exchanged with the connected low-precision arithmetic processing means is data that can be handled by the low-precision arithmetic processing means, the amount of data passing through the communication path is less than the amount of data when data of the first precision is used, and the precision of the data passing through the communication path is lower than the first precision.