JP7462140B2

JP7462140B2 - Neural network circuit and neural network operation method

Info

Publication number: JP7462140B2
Application number: JP2019196326A
Authority: JP
Inventors: 太樹尼崎; 康宏中原; 純太郎千竈; 全広飯田
Original assignee: Kumamoto University NUC
Current assignee: Kumamoto University NUC
Priority date: 2019-10-29
Filing date: 2019-10-29
Publication date: 2024-04-05
Anticipated expiration: 2039-10-29
Also published as: JP2021071772A

Description

The present invention relates to a neural network circuit and a neural network calculation method.

In recent years, attention has turned to edge computing, in which data is processed on computers close to users (hereinafter "edge computers"). This contrasts with conventional cloud computing, in which data collected by multiple computers is aggregated and processed on a single computer. Furthermore, with recent advances in convolutional neural network technology, it has been proposed to run convolutional neural networks on edge computers.


However, edge computers often lack the performance needed to run convolutional neural networks. For example, running a convolutional neural network on an edge computer requires a large number of memory accesses, which slows computation.

In view of the above circumstances, an object of the present invention is to provide a technique for improving computation speed in convolutional neural networks.

One aspect of the present invention is a neural network circuit that executes the processing of a convolutional layer or a fully connected layer in a convolutional neural network. The network has a fully connected layer and a plurality of convolutional layers. The neural network circuit comprises the following:

- a first storage unit that stores activation data, which are the values of the elements of the data input to each layer, the data being expressed as a tensor;
- a second storage unit that stores weight data for the processing executed in the convolutional layer or the fully connected layer;
- a plurality of calculation units. At a predetermined timing, each unit reads one item of the activation data from the first storage unit. It then obtains a term value: the value of the product term of the read activation data and one item of the weight data that satisfies a predetermined condition on the weight data.

After a calculation unit has calculated its term value, the activation data it read is output to another calculation unit associated with it in advance.
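The following Python sketch makes the claimed data reuse concrete. It models a chain of calculation units computing a fully connected layer. This is a hypothetical simplification, not the embodiment's dataflow, which is described with the later figures. The sketch makes three assumptions:

- unit m holds row m of the weight matrix;
- the "predetermined condition" is that the weight's column index matches the activation's index;
- only the head of the chain reads from the first storage unit at each timing step, and the other units receive the activation handed on by their predecessor.

The function name `chained_product_sums` and the `memory_reads` counter are illustrative only.

```python
def chained_product_sums(weights, activations):
    """Hypothetical chain of calculation units computing weights @ activations.

    Unit m holds row m of `weights`. One activation enters the chain per
    timing step; after a unit uses it to form a term value, the activation is
    handed to the next unit in the chain instead of being re-read from memory.
    """
    num_units = len(weights)
    accumulators = [0] * num_units   # running product-sum of each unit
    held = [None] * num_units        # (index, value) held by each unit this step
    memory_reads = 0

    for step in range(len(activations) + num_units - 1):
        # Hand each activation on to the next unit; the last unit's copy is dropped.
        held = [None] + held[:-1]
        if step < len(activations):
            held[0] = (step, activations[step])  # only the head unit reads memory
            memory_reads += 1
        for m, item in enumerate(held):
            if item is not None:
                j, x = item
                # Term value: the activation times the weight whose index matches it.
                accumulators[m] += weights[m][j] * x

    return accumulators, memory_reads


weights = [[1, 2], [3, 4], [5, 6]]
activations = [1, 1]
print(chained_product_sums(weights, activations))  # ([3, 7, 11], 2)
```

Each activation is read from memory only once (two reads here), yet it contributes to all three outputs. A scheme in which every unit fetched every activation itself would need six reads.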

One aspect of the present invention is the neural network circuit described above, in which the processing of the convolutional layer is a convolution (convolution integral). The weight data used when the convolution is performed are filter coefficients, that is, the values of the filter used to perform the convolution.
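As a reference point for what the circuit computes, here is a plain Python sketch of a multi-channel convolution, with the filter coefficients playing the role of the weight data. The sketch assumes stride 1, no padding, and the cross-correlation form commonly used in CNNs. None of these choices is specified by the text above.

```python
def convolve(activations, filter_coefficients):
    """Valid (no padding), stride-1 convolution of one filter over all channels.

    activations: nested lists indexed [channel][row][col]
    filter_coefficients: nested lists indexed [channel][row][col]
    Returns a single output channel as [row][col].
    """
    channels = len(activations)
    height, width = len(activations[0]), len(activations[0][0])
    k_h, k_w = len(filter_coefficients[0]), len(filter_coefficients[0][0])
    out_h, out_w = height - k_h + 1, width - k_w + 1

    output = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            total = 0
            for c in range(channels):
                for p in range(k_h):
                    for q in range(k_w):
                        total += filter_coefficients[c][p][q] * activations[c][i + p][j + q]
            output[i][j] = total
    return output


x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
w = [[[1, 0], [0, 1]]]
print(convolve(x, w))  # [[6, 8], [12, 14]]
```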

One aspect of the present invention is the neural network circuit described above, in which the weight data used when the processing of the fully connected layer is executed are the weight coefficients used in that processing.
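For comparison, the fully connected processing can be written as a weighted sum. The symbols are introduced here for illustration:

- N activation data x_n;
- M outputs;
- weight coefficients w_{mn}.

Any bias term is omitted because the text does not mention one. The activity of output m is then:

$$
u_m = \sum_{n=1}^{N} w_{mn}\, x_n, \qquad m = 1, \dots, M
$$

Each product w_{mn} x_n corresponds to one term value in the sense used in the first aspect above.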

One aspect of the present invention is a neural network calculation method executed by a neural network circuit. The circuit executes the processing of a convolutional layer or a fully connected layer in a convolutional neural network that has a fully connected layer and a plurality of convolutional layers. The circuit comprises the following:

- a first storage unit that stores activation data, which are the values of the elements of the data input to each layer, the data being expressed as a tensor;
- a second storage unit that stores weight data for the processing executed in the convolutional layer or the fully connected layer;
- a plurality of calculation units. At a predetermined timing, each unit reads one item of the activation data from the first storage unit. It then obtains a term value: the value of the product term of the read activation data and one item of the weight data that satisfies a predetermined condition on the weight data.

After a calculation unit has calculated its term value, the activation data it read is output to another calculation unit associated with it in advance.

The method includes two steps:

- an acquisition step, in which a calculation unit reads one item of the activation data from the first storage unit at a predetermined timing and obtains the term value, that is, the value of the product term of the read activation data and one item of the weight data that satisfies the predetermined condition;
- an output step, in which the activation data read by the calculation unit is output, after the calculation unit has calculated the term value, to the other calculation unit associated with it in advance.

The present invention makes it possible to improve computation speed in convolutional neural networks.

FIG. 1 is a diagram showing an example of the functional configuration of the neural network circuit 1 according to the embodiment.
FIG. 2 is an explanatory diagram illustrating an example of the flow of information within a unit processing unit 100 of the embodiment.
FIG. 3 is an explanatory diagram illustrating an example of the flow of information between the unit processing units 100 of the embodiment.
FIG. 4 is an explanatory diagram illustrating an example of the flow of configuration data between the unit processing units 100 of the embodiment.
FIG. 5 is an explanatory diagram illustrating an example of the processing flow in the product-sum calculation unit 102 of the embodiment.
FIG. 6 is a first explanatory diagram illustrating the convolution processing executed by the neural network circuit 1 of the embodiment.
FIG. 7 is a second explanatory diagram illustrating the convolution operation executed by the neural network circuit 1 of the embodiment.
FIG. 8 is a diagram illustrating the flow of information output from the first buffer memory unit 11 to each unit processing unit 100 in the embodiment.
FIG. 9 is a diagram illustrating the flow of information output from the second buffer memory unit 12 to each unit processing unit 100 in the embodiment.
FIG. 10 is a diagram illustrating the flow of information output from the unit processing units 100 to the first buffer memory unit 11 in the embodiment.
FIG. 11 is a diagram illustrating the flow of information output from the unit processing units 100 to the second buffer memory unit 12 in the embodiment.
FIG. 12 is a diagram illustrating the destinations of information output from the first buffer memory unit 11 to the main calculation unit 10 in the embodiment.
FIG. 13 is a diagram illustrating the sources of information output from the main calculation unit 10 to the first buffer memory unit 11 in the embodiment.
FIG. 14 is an explanatory diagram illustrating the first process of the convolution processing in the embodiment.
FIG. 15 is an explanatory diagram illustrating the second process of the convolution processing in the embodiment.
FIG. 16 is an explanatory diagram illustrating the third process of the convolution processing in the embodiment.
FIG. 17 is an explanatory diagram illustrating the fourth process of the convolution processing in the embodiment.
FIG. 18 is an explanatory diagram illustrating the fifth process of the convolution processing in the embodiment.
FIG. 19 is an explanatory diagram illustrating the sixth process of the convolution processing in the embodiment.
FIG. 20 is an explanatory diagram illustrating the seventh process of the convolution processing in the embodiment.
FIG. 21 is a flowchart illustrating an example of the flow of the convolution processing in the embodiment.
FIG. 22 is an explanatory diagram illustrating an overview of the fully connected processing in the embodiment.
FIG. 23 is a first explanatory diagram illustrating an example of the data flow between the unit processing units 100 when the fully connected processing of the embodiment is executed.
FIG. 24 is a second explanatory diagram illustrating an example of the data flow between the unit processing units 100 when the fully connected processing of the embodiment is executed.
FIG. 25 is an explanatory diagram illustrating an example of the first process of the fully connected processing in the embodiment.
FIG. 26 is a first explanatory diagram illustrating an example of the second process of the fully connected processing in the embodiment.
FIG. 27 is a second explanatory diagram illustrating an example of the second process of the fully connected processing in the embodiment.
FIG. 28 is an explanatory diagram illustrating an example of the third process of the fully connected processing in the embodiment.
FIG. 29 is an explanatory diagram illustrating an example of the fourth process of the fully connected processing in the embodiment.
FIG. 30 is an explanatory diagram illustrating an example of the fifth process of the fully connected processing in the embodiment.
FIG. 31 is an explanatory diagram illustrating an example of the sixth process of the fully connected processing in the embodiment.
FIG. 32 is a flowchart showing an example of the flow of the fully connected processing in the embodiment.

The neural network circuit and the neural network calculation method of the embodiment are described below with reference to the drawings. In the following description, "connection" means electrical connection. "The same circuit" means that the relative positions of the circuit elements are the same. A relative position is the position of another circuit element as seen from a reference circuit element. "The same circuit" does not necessarily mean that the connection relationships between the circuit elements are the same.

FIG. 1 is a diagram showing an example of the functional configuration of the neural network circuit 1 according to the embodiment. The neural network circuit 1 is connected to a first external memory 91, a second external memory 92, and a control external memory 93. From these it acquires the data to be processed by a convolutional neural network having a fully connected layer. The first external memory 91 and the second external memory 92 are configured using storage devices such as semiconductor storage devices, for example DRAM (Dynamic Random Access Memory). The control external memory 93 is likewise configured using a storage device such as a semiconductor storage device, for example flash memory.

Based on the acquired data, the neural network circuit 1 performs the convolution (convolution integral) of each convolutional layer of the convolutional neural network. Also based on the acquired data, it executes the processing of the fully connected layer of the convolutional neural network; specifically, this processing is a product-sum operation.

The neural network circuit 1 includes a first buffer memory unit 11, a second buffer memory unit 12, a control buffer memory unit 13, a control unit 14, and a main calculation unit 10.

The first buffer memory unit 11 is configured using a temporary storage medium, such as a semiconductor memory device. It is connected to the first external memory 91, the main calculation unit 10, and the control unit 14. The first buffer memory unit 11 temporarily stores data output by the first external memory 91 and data output by the main calculation unit 10. It operates under the control of the control unit 14. At a predetermined timing, it outputs the stored data to the main calculation unit 10 or to the first external memory 91.

The first buffer memory unit 11 stores, for example, activation data. The data input to each layer of the convolutional neural network is expressed as a tensor (hereinafter "layer input data"), and the activation data are the values of its elements. The layer input data may be a 0th-order tensor (a scalar), a 1st-order tensor (a vector), or a tensor of order 2 or higher. For example, suppose there are v1 channels and the data of each channel is a matrix of v2 rows and v3 columns. The layer input data is then a 3rd-order tensor.
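A short Python sketch of the 3rd-order case follows. The layer input data is a nested list of shape (v1, v2, v3), and each element is one item of activation data. The flattening order (channel, then row, then column) is an assumption made for illustration; the text does not say how the first buffer memory unit 11 lays the values out.

```python
def activation_data(layer_input):
    """Flatten a [v1][v2][v3] layer input into its individual activation values."""
    return [value for channel in layer_input for row in channel for value in row]


def flat_index(c, r, k, v2, v3):
    """Position of element (channel c, row r, column k) in the assumed flat order."""
    return (c * v2 + r) * v3 + k


v1, v2, v3 = 2, 2, 3
layer_input = [[[100 * c + 10 * r + k for k in range(v3)] for r in range(v2)] for c in range(v1)]
flat = activation_data(layer_input)
print(len(flat))                           # 12, i.e. v1 * v2 * v3 activation data
print(flat[flat_index(1, 0, 2, v2, v3)])   # 102
```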

The first buffer memory unit 11 also stores, for example, configuration data. Configuration data indicates the connection relationships among the plurality of unit processing units 100 included in the main calculation unit 10, which is described in detail later. A unit processing unit 100 may be connected to one other unit processing unit 100 or to a plurality of other unit processing units 100.

The second buffer memory unit 12 is configured using a temporary storage medium, such as a semiconductor memory device. It is connected to the second external memory 92, the main calculation unit 10, and the control unit 14. The second buffer memory unit 12 temporarily stores data output by the second external memory 92 and operates under the control of the control unit 14.

The second buffer memory unit 12 outputs the stored data to the main calculation unit 10 at a predetermined timing. It stores, for example, the element values (hereinafter "filter coefficients") of the one or more filters used for convolution in each layer (hereinafter "convolution filters"). It also stores, for example, the weights used in the processing of the fully connected layer (hereinafter "fully connected coefficients").

The control buffer memory unit 13 is configured using a temporary storage medium, such as a semiconductor memory device. It is connected to the control external memory 93 and the control unit 14. The control buffer memory unit 13 temporarily stores data output by the control external memory 93 and outputs the stored data to the control unit 14 at a predetermined timing. It stores, for example, control instructions such as microcode.

The control unit 14 controls the operation of each functional unit of the neural network circuit 1. Specifically, it includes a processor such as a CPU (Central Processing Unit) and a memory connected by a bus, and it controls the functional units by executing a program. The control unit 14 does this in several ways, for example:

- It controls the execution of the convolution processing described later, by controlling the operation of the first buffer memory unit 11, the second buffer memory unit 12, and the main calculation unit 10.
- In the same way, it controls the execution of the fully connected processing described later.
- It controls the timing at which configuration data is input from the first buffer memory unit 11 to the main calculation unit 10, and selects which configuration data to input according to that timing.
- It controls the operation of the connection unit 101, described later, which changes its connection destinations based on the configuration data.

The main calculation unit 10 performs the convolution of each convolutional layer of the convolutional neural network. It also executes the processing of the fully connected layer of the convolutional neural network.

The main calculation unit 10 includes a plurality of unit processing units 100 arranged in a lattice. Each unit processing unit 100 includes a connection unit 101 and a product-sum calculation unit 102. The connection unit 101 is an interface that connects the product-sum calculation unit 102 to three things: other unit processing units 100 associated with it in advance, the first buffer memory unit 11, and the second buffer memory unit 12. The other unit processing units 100 associated in advance with the connection unit 101 are those indicated by the configuration data.

At a predetermined timing, the connection unit 101 changes the unit processing units 100 it is connected to, based on the configuration data. More specifically, it rewires itself so as to connect to the destinations indicated by the configuration data. The wiring is changed in the same way as in an FPGA (field-programmable gate array), that is, by downloading configuration data from outside into a configuration memory. The predetermined timing is, for example, the start of operation of the neural network circuit 1. It may also be, for example, the point after the processing of every convolutional layer has finished and before the processing of the fully connected layer starts.
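The two example timings can be pictured with the following sketch. Here a connection unit holds an FPGA-style configuration memory that is overwritten by a download. The class `ConnectionUnit`, the `"destinations"` key, and the direction labels are hypothetical; the text does not define the format of the configuration data.

```python
class ConnectionUnit:
    """Hypothetical stand-in for a connection unit 101 with a configuration memory."""

    def __init__(self):
        self.configuration_memory = {}

    def download(self, configuration_data):
        # Overwrite the configuration memory; the wiring follows its contents.
        self.configuration_memory = dict(configuration_data)

    def destinations(self):
        return tuple(self.configuration_memory.get("destinations", ()))


def run_network(unit, conv_config, fc_config, num_conv_layers):
    """Record which destinations are wired while each layer runs."""
    trace = []
    unit.download(conv_config)  # timing 1: start of operation of the circuit
    for layer in range(num_conv_layers):
        trace.append(("conv", layer, unit.destinations()))
    unit.download(fc_config)    # timing 2: after all conv layers, before the FC layer
    trace.append(("fc", 0, unit.destinations()))
    return trace


unit = ConnectionUnit()
trace = run_network(unit, {"destinations": ["east"]}, {"destinations": ["north", "south"]}, 2)
for entry in trace:
    print(entry)
```

The wiring stays fixed for all convolutional layers and changes exactly once, before the fully connected layer.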

The product-sum calculation unit 102 executes a product-sum (multiply-accumulate) operation and obtains an activity from its result. It inputs the activity to a predetermined activation function and obtains the function's output (hereinafter the "activation result"). The activation function may be a step function, a sigmoid function, a ReLU function, an identity function, or a softmax function. The activation result is input to the next layer, for example, as activation data.
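The five activation functions listed above can be sketched in Python as follows. The step function's threshold is one common convention and an assumption here: it outputs 1 only for inputs strictly greater than 0. Softmax, unlike the others, acts on a whole vector of activities rather than on a single value.

```python
import math


def step(u):
    return 1.0 if u > 0 else 0.0


def sigmoid(u):
    # Branch on the sign so that exp() is only called on non-positive arguments.
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    z = math.exp(u)
    return z / (1.0 + z)


def relu(u):
    return max(0.0, u)


def identity(u):
    return u


def softmax(activities):
    # Subtract the maximum first so that exp() cannot overflow.
    peak = max(activities)
    exps = [math.exp(a - peak) for a in activities]
    total = sum(exps)
    return [e / total for e in exps]


print(relu(-2.0), step(0.5), sigmoid(0.0))  # 0.0 1.0 0.5
print(softmax([1.0, 1.0]))                  # [0.5, 0.5]
```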

Each unit processing unit 100 is connected to the first buffer memory unit 11 through its connection unit 101, by a conductor dedicated to that unit processing unit 100. It is connected to the second buffer memory unit 12 in the same way, through a dedicated conductor. "Dedicated" means the conductor carries no traffic between either buffer memory unit (11 or 12) and any other unit processing unit 100.

FIG. 2 is an explanatory diagram illustrating an example of the flow of information within a unit processing unit 100 of the embodiment. The connection unit 101 includes an input control circuit 111, a weight control circuit 112, and an output control circuit 113. The product-sum calculation unit 102 includes a product-sum calculation circuit 201 and an activation circuit 202.

The input control circuit 111 acquires configuration data from the first buffer memory unit 11 and connects to the destinations that data indicates. It acquires activation data output by the first buffer memory unit 11, as well as activation data output by the connected unit processing units 100.

The input control circuit 111 outputs activation data and configuration data to the connected unit processing units 100, that is, to the destinations it was connected to based on the configuration data. The input control circuit 111 includes a register 114, which stores the acquired configuration data and activation data.

The weight control circuit 112 acquires filter coefficients and fully connected coefficients from the second buffer memory unit 12. The weight control circuit 112 includes a register 115, which stores the acquired filter coefficients and fully connected coefficients.

The output control circuit 113 is connected to the product-sum calculation unit 102. It acquires the activation result computed by the product-sum calculation unit 102 and outputs it to the first buffer memory unit 11. The output control circuit 113 includes a register 116, which stores the acquired activation result. It also has a conductor (hereinafter the "feedback conductor 118") for inputting the activation result to the input control circuit 111. The feedback conductor 118 is used when the unit processing unit 100 feeds its activation result back to itself without going through external memory. In that case, under the control of the control unit 14, the activation result output by the output control circuit 113 is input to the input control circuit 111 via the feedback conductor 118.
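The following minimal sketch shows why the feedback conductor 118 helps. When a unit needs its activation result again as its own input, the result can be routed from register 116 straight to register 114. This avoids sending it out of the unit and reading it back. The class, its round-trip counter, and the choice of ReLU are assumptions for illustration. The sketch only counts transfers and does not model timing.

```python
class FeedbackUnit:
    """Hypothetical unit with an input register (114) and an output register (116)."""

    def __init__(self):
        self.register_114 = None   # input side: activation data to be used next
        self.register_116 = None   # output side: latest activation result
        self.round_trips = 0       # results sent out of the unit and read back

    def compute(self, weights, activations):
        activity = sum(w * a for w, a in zip(weights, activations))
        self.register_116 = max(0.0, activity)  # ReLU, chosen only for the example
        return self.register_116

    def reuse_result(self, use_feedback_conductor):
        if not use_feedback_conductor:
            # Without feedback, the result leaves the unit and must be read back in.
            self.round_trips += 1
        self.register_114 = self.register_116


unit = FeedbackUnit()
unit.compute([0.5, 0.5], [2.0, 4.0])        # activity 3.0 -> result 3.0
unit.reuse_result(use_feedback_conductor=True)
print(unit.register_114, unit.round_trips)  # 3.0 0
```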

The product-sum calculation circuit 201 is connected to the input control circuit 111, the weight control circuit 112, and the activation circuit 202. Via the input control circuit 111, it acquires activation data from the first buffer memory unit 11 or from another connected unit processing unit 100. Via the weight control circuit 112, it acquires filter coefficients or fully connected coefficients from the second buffer memory unit 12. It executes a product-sum operation on the activation data and the coefficients to obtain the activity. The product-sum calculation circuit 201 includes a register 117, which stores intermediate values of the product-sum operation and may also store the activity. It outputs the obtained activity to the activation circuit 202.

The activation circuit 202 acquires the output of the product-sum calculation circuit 201 (that is, the activity) and inputs it to the predetermined activation function. It obtains the activation result, which is the output of the activation function, and outputs it to the output control circuit 113.
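The division of labor between the two circuits can be sketched as a small Python class. The product-sum calculation circuit 201 accumulates into register 117, and the activation circuit 202 applies the activation function. Two details are assumptions of this sketch: register 117 is cleared after each activity is produced, and the activation function is passed in as a parameter.

```python
class ProductSumUnit:
    """Hypothetical model of product-sum circuit 201 plus activation circuit 202."""

    def __init__(self, activation_function):
        self.register_117 = 0.0  # intermediate value of the product-sum
        self.activation_function = activation_function

    def accumulate(self, activation, weight):
        # One multiply-accumulate step: add a single term value.
        self.register_117 += activation * weight

    def finish(self):
        activity = self.register_117   # output of the product-sum circuit
        self.register_117 = 0.0        # ready for the next output (assumed)
        return self.activation_function(activity)


unit = ProductSumUnit(lambda u: max(0.0, u))   # ReLU as the activation function
for activation, weight in [(1.0, 2.0), (3.0, -4.0)]:
    unit.accumulate(activation, weight)
print(unit.finish())   # activity -10.0, so the ReLU result is 0.0
```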

FIG. 3 is an explanatory diagram illustrating an example of the flow of information between the unit processing units 100 of the embodiment. As a concrete example, it shows the flow among five unit processing units: 100-1, 100-2, 100-3, 100-4, and 100-5. The unit processing unit 100-1 is connected to the four others. It outputs information, for example activation data, to the unit processing unit 100-2, one of its connection destinations, and also acquires information from it.

単位処理部１００－１は、接続先の他の単位処理部１００のひとつである単位処理部１００－３に情報を出力する。単位処理部１００－１は、接続先の他の単位処理部１００のひとつである単位処理部１００－３から情報を取得する。単位処理部１００－１は、接続先の他の単位処理部１００のひとつである単位処理部１００－４に情報を出力する。単位処理部１００－１は、接続先の他の単位処理部１００のひとつである単位処理部１００－４から情報を取得する。単位処理部１００－１は、接続先の他の単位処理部１００のひとつである単位処理部１００－５に情報を出力する。単位処理部１００－１は、接続先の他の単位処理部１００のひとつである単位処理部１００－５から情報を取得する。 The unit processing unit 100-1 outputs information to the unit processing unit 100-3, which is one of the other unit processing units 100 to which it is connected. The unit processing unit 100-1 acquires information from the unit processing unit 100-3, which is one of the other unit processing units 100 to which it is connected. The unit processing unit 100-1 outputs information to the unit processing unit 100-4, which is one of the other unit processing units 100 to which it is connected. The unit processing unit 100-1 acquires information from the unit processing unit 100-4, which is one of the other unit processing units 100 to which it is connected. The unit processing unit 100-1 outputs information to the unit processing unit 100-5, which is one of the other unit processing units 100 to which it is connected. The unit processing unit 100-1 acquires information from the unit processing unit 100-5, which is one of the other unit processing units 100 to which it is connected.

FIG. 4 is an explanatory diagram illustrating an example of the flow of configuration data between the unit processing units 100 of the embodiment.
Configuration data input to a unit processing unit 100 is stored in the register 114. Based on the configuration data stored in the register 114, the unit processing unit 100 is connected to the other unit processing units 100 indicated by the configuration data. The configuration data is then output to those connection destinations.
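The configuration flow above (store, connect, forward) can be sketched behaviorally as follows. This is a minimal software model under stated assumptions: configuration data is modeled as a hypothetical mapping from a unit name to the names of its connection destinations, each unit keeps a copy standing in for register 114, and `propagate_config` is an illustrative name. The actual data format is not specified in this section.

```python
from collections import deque
from typing import Dict, List, Set

Config = Dict[str, List[str]]  # hypothetical format: unit -> connection destinations


def propagate_config(config: Config, entry: str) -> Dict[str, Set[str]]:
    """Spread configuration data from `entry` and return each reached unit's links."""
    registers: Dict[str, Config] = {}   # per-unit copy (models register 114)
    links: Dict[str, Set[str]] = {}
    queue = deque([entry])
    while queue:
        unit = queue.popleft()
        if unit in registers:           # this unit is already configured
            continue
        registers[unit] = config                   # store the configuration data
        links[unit] = set(config.get(unit, []))    # connect to the indicated units
        queue.extend(links[unit])                  # forward the data to them
    return links


cfg = {"100-1": ["100-2", "100-3"], "100-2": ["100-1"], "100-3": ["100-1"]}
print(sorted(propagate_config(cfg, "100-1")["100-1"]))  # ['100-2', '100-3']
```

A breadth-first queue is used only so that each unit is configured once even when links are bidirectional, as in FIG. 3.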

FIG. 5 is an explanatory diagram illustrating an example of the processing flow in the product-sum operation unit 102 of the embodiment.
The product-sum operation unit 102 executes the product-sum operation. In FIG. 5, x denotes activation data and w denotes a filter coefficient or a fully connected coefficient. Intermediate values of the product-sum operation are stored in the register 117. The result of the product-sum operation, the activation (u in FIG. 5), is input to the activation function, and z in FIG. 5 denotes the activation result corresponding to the activation u. For simplicity, the neural network circuit 1 is described below for the case where the activation function is the identity function f(u) = u, so that the activation result z equals the activation u.
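A minimal behavioral sketch of the product-sum flow of FIG. 5, assuming a software model rather than the actual circuit: the hypothetical class `ProductSumUnit` accumulates w·x products in an attribute standing in for register 117, then passes the sum u through an activation function. The default is the identity, so z = u as in the text.

```python
from typing import Callable, Iterable, Tuple


class ProductSumUnit:
    """Behavioral model of a product-sum unit (hypothetical, not the actual circuit)."""

    def __init__(self, activation: Callable[[float], float] = lambda u: u) -> None:
        self.register = 0.0           # stands in for register 117 (intermediate value)
        self.activation = activation  # identity by default, so z == u

    def accumulate(self, x: float, w: float) -> None:
        # One multiply-accumulate step: add the product w * x to the intermediate value.
        self.register += w * x

    def run(self, pairs: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
        # Clear the intermediate value, accumulate every (x, w) pair, then apply f.
        self.register = 0.0
        for x, w in pairs:
            self.accumulate(x, w)
        u = self.register
        z = self.activation(u)
        return u, z


unit = ProductSumUnit()
u, z = unit.run([(1.0, 0.5), (2.0, -1.0), (3.0, 2.0)])
print(u, z)  # 4.5 4.5
```

Passing a different function, for example `lambda v: max(v, 0.0)`, models a non-identity activation circuit 202 without changing the accumulation logic.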

<Convolution processing in the convolution layer>
The convolution processing of a convolution layer executed by the main calculation unit 10 (hereinafter "convolution processing") is described with reference to FIGS. 6 to 21.

FIG. 6 is a first explanatory diagram illustrating the convolution processing executed by the neural network circuit 1 of the embodiment.
For simplicity, FIG. 6 describes the convolution processing using nine elements of the layer input data D101, which is represented by a second-order tensor with I rows and J columns: the elements from x(i, j), in row i and column j, to x(i+2, j+2), in row (i+2) and column (j+2). I and J are integers of 1 or more; i is an integer from 0 to (I-1), and j is an integer from 0 to (J-1). The layer input data D101 has one channel. Hereinafter, A and B in activation data x(A, B) are called the first index and the second index of the two-index notation of x. The two-index notation means that each item of activation data x is distinguished from the others by two indices. The symbols C, D and E, used later in addition to A and B, likewise denote indices that distinguish one item from another.

As described above, each layer may have more than one convolution filter. FIG. 6 shows the convolution filter F101 as one of them. The convolution filter F101 has two rows and two columns, that is, four filter coefficients: w(0, 0), w(0, 1), w(1, 0) and w(1, 1). For simplicity, the neural network circuit 1 is described below for the case where the convolution filter is a second-order tensor, in particular a square matrix. A filter coefficient w(A, B) is the element in row A and column B of the convolution filter; A and B are called the first index and the second index of the two-index notation of w.

FIG. 6(a) shows that convolving the layer input data D101 with the convolution filter F101 yields a feature map with two rows and two columns, each element of which is an activation. Specifically, the result is a feature map with the activations u(i, j), u(i, j+1), u(i+1, j) and u(i+1, j+1). Hereinafter, A and B in activation u(A, B) are called the first index and the second index of the two-index notation of u; the two-index notation means that each activation u is distinguished from the others by two indices.

Specifically, the activation u(i, j) is given by the following formula (1):

$$u(i, j) = \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} w(p, q)\, x(i+p,\ j+q) \qquad (1)$$

The symbols in formula (1) are defined as follows. When the convolution filter is a tensor with P rows and Q columns, p is an integer from 0 to (P-1) and q is an integer from 0 to (Q-1). For example, for a filter with 2 rows and 2 columns, p and q each take the value 0 or 1.
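Formula (1) can be checked with a short pure-Python sketch of a single-channel convolution. It assumes no padding and a stride of 1, which is consistent with FIG. 6, where a 3×3 region and a 2×2 filter give a 2×2 feature map; `conv2d_single` is an illustrative name.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_single(x: Matrix, w: Matrix) -> Matrix:
    """u(i, j) = sum_p sum_q w(p, q) * x(i + p, j + q), valid region only."""
    rows_in, cols_in = len(x), len(x[0])
    P, Q = len(w), len(w[0])
    rows_out, cols_out = rows_in - P + 1, cols_in - Q + 1
    u = [[0.0] * cols_out for _ in range(rows_out)]
    for i in range(rows_out):
        for j in range(cols_out):
            acc = 0.0
            for p in range(P):        # p = 0 .. P-1
                for q in range(Q):    # q = 0 .. Q-1
                    acc += w[p][q] * x[i + p][j + q]
            u[i][j] = acc
    return u


x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
w = [[1, 0],
     [0, -1]]
print(conv2d_single(x, w))  # [[-4.0, -4.0], [-4.0, -4.0]]
```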

The feature map obtained by the processing in the n-th convolution layer is input to the next stage, the (n+1)-th convolution layer, as its layer input data. n is a layer number that distinguishes the layers of the convolutional neural network; layers processed earlier have smaller numbers. n is an integer from 0 to (N-1), where N is an integer of 2 or more. Hereinafter, n is called the layer number. For example, the layer processed immediately after the n-th convolution layer is the (n+1)-th convolution layer. If the next layer is a fully connected layer, the feature map is input to that fully connected layer as its layer input data.

FIG. 6(b) illustrates an example of the flow for calculating the activation u(i, j), one element of the feature map, by showing which elements of the layer input data D101 are used for each term of the product-sum operation. Because the convolution filter F101 is a 2×2 tensor, u(i, j) is the sum of four terms, the first to the fourth. FIG. 6(b) describes the case where the first term is calculated first, then the second, then the third, and finally the fourth.

FIG. 6(b) shows that each term is calculated from four items of activation data of the layer input data D101: the first term uses x(i, j), x(i, j+1), x(i+1, j) and x(i+1, j+1); the second term uses x(i, j+1), x(i, j+2), x(i+1, j+1) and x(i+1, j+2); the third term uses x(i+1, j), x(i+1, j+1), x(i+2, j) and x(i+2, j+1); and the fourth term uses x(i+1, j+1), x(i+1, j+2), x(i+2, j+1) and x(i+2, j+2).

For ease of explanation, FIG. 6(b) shows the terms calculated in the order first, second, third, fourth, but they need not be calculated in that order; for example, they may be calculated in the order first, third, second, fourth.
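FIG. 6(b) can be read as computing the 2×2 feature map one filter term at a time: for each coefficient w(p, q), every output element receives w(p, q)·x(i+p, j+q). The sketch below models that term-by-term accumulation under the same no-padding, stride-1 assumption as formula (1) (`conv_by_terms` is a hypothetical name) and checks that every ordering of the four terms gives the same feature map.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def conv_by_terms(x: Matrix, w: Matrix, order: Sequence[Tuple[int, int]]) -> Matrix:
    """Accumulate one filter term w(p, q) into every output element per step."""
    P, Q = len(w), len(w[0])
    rows_out, cols_out = len(x) - P + 1, len(x[0]) - Q + 1
    u = [[0] * cols_out for _ in range(rows_out)]
    for p, q in order:                      # one term of the product sum per step
        for i in range(rows_out):
            for j in range(cols_out):
                u[i][j] += w[p][q] * x[i + p][j + q]
    return u


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
w = [[1, 2], [3, 4]]
terms = [(0, 0), (0, 1), (1, 0), (1, 1)]    # first, second, third, fourth term
results = {str(conv_by_terms(x, w, perm)) for perm in permutations(terms)}
print(results)  # {'[[37, 47], [67, 77]]'}
```

Integers are used so the comparison is exact; with floating-point values, reordering the terms can change the last bits of the result.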

FIG. 7 is a second explanatory diagram illustrating the convolution operation executed by the neural network circuit 1 of the embodiment.
FIG. 7 illustrates the convolution operation for the case where the layer input data to the n-th convolution layer has three channels and the data of each channel is a matrix with four rows and four columns; the layer input data is therefore a third-order tensor. Hereinafter, the number of channels of the layer input data (the "number of input channels") is denoted NumCH_in, and the number of channels of the feature map (the "number of output channels") is denoted NumCH_out. The number of output channels NumCH_out equals the number of convolution filters.

FIG. 7 shows that each of the two convolution filters is applied to every channel of the layer input data.

FIG. 7 shows that when two convolution filters with 3 rows and 3 columns, filter 1 and filter 2, are applied to the layer input data, the resulting feature map has NumCH_out = 2 output channels, and the data of each channel is a matrix with 2 rows and 2 columns.

Because the neural network circuit 1 applies each of the convolution filters to every channel of the layer input data, the activation u calculated by the neural network circuit 1 is given by the following formula (2):

$$u(n, CH_{out}, i, j) = \sum_{CH_{in}=0}^{NumCH_{in}-1} \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} w(n, CH_{in}, CH_{out}, p, q)\, x(n, CH_{in}, i+p,\ j+q) + b(n, CH_{out}) \qquad (2)$$

The following explains CH_out and CH_in in formula (2), and the indices of x, u and w.

CH_out in formula (2) distinguishes the channels of the feature map. For example, CH_out = V (V a non-negative integer) denotes the V-th channel of the feature map. Hereinafter, CH_out is called the output channel number.

CH_in in formula (2) distinguishes the channels of the layer input data. For example, CH_in = V (V a non-negative integer) denotes the V-th channel of the layer input data. Hereinafter, CH_in is called the input channel number.

In formula (2), each item of activation data x is distinguished from the others by four indices and is written x(A, B, C, D). x(A, B, C, D) is the value of an element of the layer input data input to the layer with layer number A, namely the element in row C and column D of the data of input channel number B. That is, the first index A of the four-index notation is the layer number n of the layer receiving the layer input data, the second index B is the input channel number CH_in, and the third index C and fourth index D indicate that x is the element in row C and column D of a second-order tensor. The two-index activation data x used earlier corresponds to this four-index notation in the case where the numbers of input and output channels are both 1. The four-index notation means that each item of activation data x is distinguished from the others by four indices.

The activation u(A, B, C, D) is the result of the operation in the layer with layer number A, namely the element in row C and column D of the data of output channel number B in the feature map. That is, the first index A of the four-index notation is the layer number n of the layer that executed the operation producing u, the second index B is the output channel number CH_out, and the third index C and fourth index D indicate that u is stored in the first buffer memory unit 11 as the element in row C and column D of a second-order tensor. The two-index activation u used earlier corresponds to this four-index notation in the case where the numbers of input and output channels are both 1. The four-index notation means that each activation u is distinguished from the others by four indices.

w(A, B, C, D, E) denotes a filter coefficient. In this five-index notation, the first index A is the layer number n in which the convolution is performed; the second index B is the input channel number CH_in of the activation data x that is multiplied by w during the convolution; the third index C is the output channel number CH_out in the feature map resulting from the convolution; and the fourth index D and fifth index E indicate that w is the element in row D and column E of the second-order tensor representing the convolution filter. The five-index notation means that each filter coefficient w is distinguished from the others by five indices.

b(n, CH_out) is a bias: the bias used in the operation of the layer with layer number n to calculate the activations u of output channel number CH_out. The bias b may be 0, and in the convolution processing it is 0. For simplicity, the convolution processing of the neural network circuit 1 is therefore described below for the case b = 0.
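Formula (2) extends formula (1) with a sum over input channels and a per-output-channel bias. The sketch below assumes list-of-lists tensors indexed as x[ch_in][row][col] and w[ch_in][ch_out][p][q] (the layer index n is dropped because a single layer is computed), no padding and a stride of 1; `conv2d_multi` is an illustrative name. With the shapes of FIG. 7 (three 4×4 input channels, two 3×3 filters) it yields two 2×2 output channels.

```python
from typing import List

Tensor3 = List[List[List[float]]]
Tensor4 = List[List[List[List[float]]]]


def conv2d_multi(x: Tensor3, w: Tensor4, b: List[float]) -> Tensor3:
    """u[co][i][j] = sum_ci sum_p sum_q w[ci][co][p][q] * x[ci][i+p][j+q] + b[co]."""
    num_in, num_out = len(x), len(w[0])
    P, Q = len(w[0][0]), len(w[0][0][0])
    rows_out = len(x[0]) - P + 1
    cols_out = len(x[0][0]) - Q + 1
    # Start every output element at its channel's bias b[co].
    u = [[[b[co]] * cols_out for _ in range(rows_out)] for co in range(num_out)]
    for co in range(num_out):
        for ci in range(num_in):
            for i in range(rows_out):
                for j in range(cols_out):
                    for p in range(P):
                        for q in range(Q):
                            u[co][i][j] += w[ci][co][p][q] * x[ci][i + p][j + q]
    return u


# FIG. 7 shapes: NumCH_in = 3 channels of 4x4, NumCH_out = 2 filters of 3x3.
x = [[[1.0] * 4 for _ in range(4)] for _ in range(3)]
w = [[[[1.0] * 3 for _ in range(3)] for _ in range(2)] for _ in range(3)]
u = conv2d_multi(x, w, b=[0.0, 0.0])   # bias 0, as assumed in the text
print(len(u), len(u[0]), len(u[0][0]), u[0][0][0])  # 2 2 2 27.0
```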

FIG. 8 is a diagram illustrating the flow of information output from the first buffer memory unit 11 to each unit processing unit 100 in the embodiment.

FIG. 8 shows (R×S) unit processing units 100, from the unit processing unit 100_0_0 to the unit processing unit 100_(R-1)_(S-1), where R is an integer equal to or greater than NumCH_in and S is an integer equal to or greater than J. R is preferably equal to the number of input channels NumCH_in, and S is preferably equal to the number of columns J of the layer input data. Hereinafter, for ease of explanation, the unit processing unit 100_r_s is denoted PB(r, s), where r is an integer from 0 to (R-1) and s is an integer from 0 to (S-1). A and B in PB(A, B) are called the first index and the second index of PB. In the grid arrangement of the unit processing units 100, the first index of PB indicates the column to which a unit belongs, and the second index indicates the row. Two unit processing units 100 are adjacent if their PB second indices are equal and their PB first indices differ by 1, or if their PB first indices are equal and their PB second indices differ by 1.
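The PB(A, B) labelling and its adjacency rule can be written as two small helpers. This is a sketch for illustration: `are_adjacent` and `neighbours` are hypothetical names, and the grid size R×S is a parameter (R = NumCH_in and S = J in the preferred case).

```python
from typing import List, Tuple

PB = Tuple[int, int]  # (first index, second index)


def are_adjacent(a: PB, b: PB) -> bool:
    # Adjacent when one index is equal and the other differs by exactly 1.
    (a1, a2), (b1, b2) = a, b
    return (a2 == b2 and abs(a1 - b1) == 1) or (a1 == b1 and abs(a2 - b2) == 1)


def neighbours(pb: PB, R: int, S: int) -> List[PB]:
    # All grid positions (r, s), 0 <= r < R, 0 <= s < S, adjacent to pb.
    return [(r, s) for r in range(R) for s in range(S) if are_adjacent(pb, (r, s))]


print(neighbours((0, 0), R=3, S=3))  # [(0, 1), (1, 0)]
print(neighbours((1, 1), R=3, S=3))  # [(0, 1), (1, 0), (1, 2), (2, 1)]
```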

The first buffer memory unit 11 can transmit information directly to each unit processing unit 100. "Directly" means that the information does not have to be relayed through another unit processing unit 100.

FIG. 9 is a diagram illustrating the flow of information output from the second buffer memory unit 12 to each unit processing unit 100 in the embodiment.
The second buffer memory unit 12 can transmit information directly to each unit processing unit 100.

FIG. 10 is a diagram illustrating the flow of information output from the unit processing units 100 to the first buffer memory unit 11 in the embodiment.
The first buffer memory unit 11 can receive information directly from each unit processing unit 100. "Directly" means that information sent by a unit processing unit 100 does not have to be relayed through another unit processing unit 100.

FIG. 11 is a diagram illustrating the flow of information output from the unit processing unit 100 to the second buffer memory unit 12 in the embodiment.
Each unit processing unit 100 can transmit information to predetermined other unit processing units 100, namely the connection destinations indicated by the configuration data.

FIG. 11 shows that information can be transmitted from PB(0, 0) to PB(1, 0), from PB(1, 0) to PB(2, 0), and from PB(S-1, 0) to PB(0, 0). In this way, in the convolution operation, each unit processing unit 100 whose PB first index is not (S-1) can transmit information to the unit processing unit 100 that has the same PB second index and a PB first index larger by 1, and each unit processing unit 100 whose PB first index is (S-1) can transmit information to the unit processing unit 100 that has the same PB second index and a PB first index of 0.
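The transfer pattern of FIG. 11 amounts to a cyclic shift along the PB first index. The sketch below models one transfer step, assuming, as stated in the text, that the first index wraps from (S-1) back to 0 while the second index is unchanged. `ring_destination` and `shift_step` are hypothetical names, and the string labels are placeholder data. After S steps every item is back at its starting unit.

```python
from typing import Dict, Tuple

PB = Tuple[int, int]


def ring_destination(pb: PB, S: int) -> PB:
    # PB(a, b) sends to PB(a + 1, b); PB(S - 1, b) wraps around to PB(0, b).
    a, b = pb
    return (a + 1) % S, b


def shift_step(held: Dict[PB, str], S: int) -> Dict[PB, str]:
    # Every unit forwards what it holds to its destination simultaneously.
    return {ring_destination(pb, S): data for pb, data in held.items()}


S = 4
held = {(a, 0): f"x_ch{a}" for a in range(S)}  # placeholder data in one line of units
step1 = shift_step(held, S)
print(step1[(0, 0)], step1[(1, 0)])  # x_ch3 x_ch0

state = held
for _ in range(S):
    state = shift_step(state, S)
print(state == held)  # True
```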

FIG. 12 is a diagram illustrating the destinations of information output from the first buffer memory unit 11 to the main calculation unit 10 in the embodiment. For simplicity, the neural network circuit 1 is described below for the case where the layer input data has NumCH_in = 64 input channels and the data of each channel is a second-order tensor with 64 rows and 64 columns. The following description does not, however, require 64 input channels or 64 rows and 64 columns; the data of each channel need only be a second-order tensor and need not be a square matrix.

FIG. 12 shows that the layer input data consists of 64 channels with input channel numbers 0 to 63, and that the data of each channel stored in the first buffer memory unit 11 is a tensor with 64 rows (rows 0 to 63) and 64 columns (columns 0 to 63).

FIG. 12 shows that the data in row 0 of each channel of the layer input data is output to the main calculation unit 10, and that each unit processing unit 100 of the main calculation unit 10 receives one item of activation data. For example, PB(0, 0) receives the activation data x(0, 0, 0, 0), located in row 0, column 0 of the data with input channel number 0.

FIG. 12 shows that the layer input data is input to the main calculation unit 10 according to the following first and second activation data input rules. The first rule is that unit processing units 100 with the same PB first index receive activation data x with the same input channel number. The second rule is that unit processing units 100 with the same PB second index receive activation data x whose second indices (input channel numbers) differ and whose third indices (row numbers) are the same.
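One assignment that satisfies both input rules, and matches the example in which PB(0, 0) receives x(0, 0, 0, 0), is to give PB(a, b) the element of input channel a, the current row t and column b. The column part is an assumption: the rules fix only the channel and the row. `load_row` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

PB = Tuple[int, int]
Layer = List[List[List[int]]]  # x[ch_in][row][col] for one layer n


def load_row(x: Layer, t: int) -> Dict[PB, int]:
    """Assumed mapping: PB(a, b) <- x(n, a, t, b)."""
    return {(a, b): x[a][t][b]
            for a in range(len(x))          # first PB index <-> input channel
            for b in range(len(x[a][t]))}   # second PB index <-> column (assumed)


# Tiny layer: 2 channels of 2x3 data, value = 100*channel + 10*row + col.
x = [[[100 * c + 10 * r + k for k in range(3)] for r in range(2)] for c in range(2)]
row0 = load_row(x, t=0)
print(row0[(0, 0)], row0[(1, 2)])  # 0 102

# First rule: same first PB index -> same input channel (hundreds digit).
assert all(v // 100 == a for (a, b), v in row0.items())
# Second rule: every unit receives data from the same row t = 0 (tens digit).
assert all((v // 10) % 10 == 0 for v in row0.values())
```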

FIG. 13 is a diagram illustrating the sources of information output from the main calculation unit 10 to the first buffer memory unit 11 in the embodiment.
For simplicity, the neural network circuit 1 is described below for the case where the feature map has NumCH_out = 64 output channels and the data of each channel is a second-order tensor with 64 rows and 64 columns. The following description does not, however, require 64 output channels or 64 rows and 64 columns; the data of each channel need only be a second-order tensor and need not be a square matrix.

FIG. 13 shows that the activation u (that is, the activation result z) computed by each unit processing unit 100 of the main calculation unit 10 is stored in the first buffer memory unit 11 as the value of an element in row 0 of a channel of the feature map. For example, u(0, 0, 0, 0), the result computed by PB(0, 0), is stored in the first buffer memory unit 11 as the element in row 0, column 0 of output channel number 0 of the feature map.

FIG. 13 shows that the activations u calculated by unit processing units 100 with the same PB first index are stored in the first buffer memory unit 11 as feature-map elements with the same output channel number. It also shows that the activations u calculated by unit processing units 100 with the same PB second index are stored as feature-map elements with different output channel numbers CH_out and the same input channel number CH_in.
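Write-back can be sketched as a scatter into the feature map, assuming that PB(a, b) produces the activation for output channel a at column b of the row t being processed. The channel part follows FIG. 13; the column part is an assumption. `store_row` is a hypothetical name.

```python
from typing import Dict, List, Tuple

PB = Tuple[int, int]


def store_row(fmap: List[List[List[float]]], results: Dict[PB, float], t: int) -> None:
    """Assumed mapping: u(n, a, t, b) <- result of PB(a, b)."""
    for (a, b), u in results.items():
        fmap[a][t][b] = u   # first PB index <-> output channel, as in FIG. 13


num_out, rows, cols = 2, 2, 3
fmap = [[[0.0] * cols for _ in range(rows)] for _ in range(num_out)]
results = {(a, b): float(10 * a + b) for a in range(num_out) for b in range(cols)}
store_row(fmap, results, t=0)
print(fmap[1][0])  # [10.0, 11.0, 12.0]
print(fmap[1][1])  # [0.0, 0.0, 0.0]
```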

An example of the flow of the convolution processing is described with reference to FIGS. 14 to 20. First, the data in the first row of each channel of the layer input data is input to the main calculation unit 10. Next, the second to sixth processes of the convolution processing, described below, are executed for one predetermined filter coefficient of the convolution filter (hereinafter the "initial filter coefficient"). The second to sixth processes are then executed for each of the remaining filter coefficients, one coefficient at a time, in a predetermined order. After that, the data in the second row of each channel of the layer input data is input to the main calculation unit 10, and the second to sixth processes are executed for each filter coefficient. The same processing is repeated up to the last row of each channel of the layer input data (for example, the 64th row).

The example below assumes a convolution filter with four filter coefficients, w_11, w_12, w_21 and w_22, and an initial filter coefficient of w_11. For simplicity, Figs. 14 to 18 illustrate the case in which the first row of data of each channel of the layer input data has been input to the main calculation unit 10 and the current filter coefficient is w_11. Fig. 19 illustrates how the data in the main calculation unit 10 changes when the filter coefficient used in the second to sixth processes switches from w_11 to w_12. Fig. 20 illustrates the data input to the main calculation unit 10 when the convolution process is executed on the other rows of data of each channel of the layer input data.

Fig. 14 is an explanatory diagram of the first process of the convolution process in the embodiment. Fig. 14 shows that, in the first process, the layer input data is input to the main calculation unit 10 according to the first activation data input rule and the second activation data input rule.

Fig. 15 is an explanatory diagram of the second process of the convolution process in the embodiment. The second process is executed after the first process.

In Fig. 15, w_AB denotes the filter coefficient w whose fourth index in five-index notation is A and whose fifth index is B. For example, w_12 denotes the filter coefficient whose fourth index is 1 and whose fifth index is 2. In the second process of the convolution process, the filter coefficients w whose fourth and fifth indices in five-index notation are both 1 are input to the main calculation unit 10.

Fig. 15 shows that, in the second process of the convolution process, the filter coefficients of the convolution filter stored in the second buffer storage unit 12 are input to the main calculation unit 10 according to the filter input rule. Under the filter input rule, a filter coefficient w that satisfies rule condition 2 below is input to a unit processing unit 100 that satisfies rule condition 1 below. Rule condition 1 includes the condition that the first index, in four-index notation, of the activation data x held by the unit processing unit 100 represents the same layer number as the first index of the filter coefficient w in five-index notation. Rule condition 1 also includes the condition that the activation data x input to the unit processing unit 100 has, as its second index in four-index notation, the same input channel number as the second index of the filter coefficient w in five-index notation. Rule condition 2 includes the condition that the first index of the filter coefficient w in five-index notation represents the same layer number as the first index, in four-index notation, of the activation data x input to the unit processing unit 100 that satisfies rule condition 1.
Rule condition 2 also includes the condition that the second index of the filter coefficient w in five-index notation represents the same input channel number as the second index, in four-index notation, of the activation data x input to the unit processing unit 100 that satisfies rule condition 1.
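The filter input rule works like a matching predicate between the indices of the held activation data and the indices of a stored coefficient. Below is a minimal Python sketch based on a few assumptions. Indices are plain integer tuples. The third index of the five-index notation, which this section does not define, is ignored. The names `satisfies_filter_input_rule` and `coefficients_for_pb` are hypothetical and are not part of the embodiment.

```python
from typing import List, Tuple

# Assumed index layouts (illustration only):
#   activation data x, four-index notation: (layer, input channel, idx3, idx4)
#   filter coefficient w, five-index notation: (layer, input channel, idx3, A, B)
XIndex = Tuple[int, int, int, int]
WIndex = Tuple[int, int, int, int, int]


def satisfies_filter_input_rule(x_idx: XIndex, w_idx: WIndex) -> bool:
    """Rule conditions 1 and 2: w is routed to the PB holding x when their
    layer numbers (first indices) and input channel numbers (second indices) agree."""
    return x_idx[0] == w_idx[0] and x_idx[1] == w_idx[1]


def coefficients_for_pb(x_idx: XIndex, stored: List[WIndex],
                        tap: Tuple[int, int]) -> List[WIndex]:
    """Stored coefficients that the rule would send to the PB holding x,
    restricted to the filter coefficient w_AB (tap = (A, B)) currently processed."""
    return [w for w in stored
            if satisfies_filter_input_rule(x_idx, w) and (w[3], w[4]) == tap]


# A PB holding x(0, 63, 0, 0) during the pass for w_11 (A = B = 1).
stored = [(0, 63, 0, 1, 1), (0, 63, 0, 1, 2), (0, 62, 0, 1, 1), (1, 63, 0, 1, 1)]
print(coefficients_for_pb((0, 63, 0, 0), stored, (1, 1)))  # -> [(0, 63, 0, 1, 1)]
```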

In the second process of the convolution process, each unit processing unit 100 calculates the product of the filter coefficient w_11 input to it and the activation data x input to it in the first process.

Fig. 16 is an explanatory diagram of the third process of the convolution process in the embodiment. The third process is executed after the second process.
In the third process, each unit processing unit 100 outputs the activation data x used in the immediately preceding process to the unit processing unit 100 it is connected to. For example, Fig. 16 shows that PB(63,0) outputs the activation data x(0,63,0,0), which it used in the second process, to PB(0,0). That is, x(0,63,0,0) is input to PB(0,0).

Fig. 17 is an explanatory diagram of the fourth process of the convolution process in the embodiment. The fourth process is executed after the third process. In the fourth process, filter coefficients w are input to the main calculation unit 10 according to the filter input rule.

Fig. 18 is an explanatory diagram of the fifth process of the convolution process in the embodiment. The fifth process is executed after the fourth process.
In the fifth process, each unit processing unit 100 outputs the activation data x used in the immediately preceding process to the unit processing unit 100 it is connected to. For example, Fig. 18 shows that PB(63,0) outputs the activation data x(0,62,0,0), which it used in the fourth process, to PB(0,0). That is, x(0,62,0,0) is input to PB(0,0).
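PB(63,0) feeds PB(0,0), so the third and fifth processes behave like a rotation along a ring of PBs. Below is a minimal Python sketch. It assumes that PB(i,0) initially holds x(0,i,0,0) and forwards its data to PB(i+1,0). This is consistent with Figs. 16 and 18, but the source does not state it for every PB. `ring_shift` is a hypothetical name.

```python
from typing import List, Tuple

XIndex = Tuple[int, int, int, int]


def ring_shift(held: List[XIndex]) -> List[XIndex]:
    """Each PB(i, 0) passes the x it just used to PB((i + 1) % n, 0);
    PB(n - 1, 0) therefore feeds PB(0, 0), closing the ring."""
    n = len(held)
    return [held[(i - 1) % n] for i in range(n)]


# Assumed initial layout: PB(i, 0) holds x(0, i, 0, 0) for i = 0..63.
held = [(0, i, 0, 0) for i in range(64)]

after_third = ring_shift(held)          # third process (Fig. 16)
after_fifth = ring_shift(after_third)   # fifth process (Fig. 18)
print(after_third[0], after_fifth[0])   # -> (0, 63, 0, 0) (0, 62, 0, 0)
```

After one shift, PB(0,0) holds x(0,63,0,0). After the second shift, it holds x(0,62,0,0), matching the examples of Figs. 16 and 18.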

Fig. 19 is an explanatory diagram of the sixth process of the convolution process in the embodiment. The sixth process is executed after the fifth process.
In the sixth process, each PB receives activation data x from the connected PB that has the same first index and a second index that is larger by one. Comparing Fig. 14 with Fig. 19, the activation data x held by each PB in Fig. 19 differs from the data held in Fig. 14 by one in the value of the fourth index of the four-index notation.

After the sixth process, the second to sixth processes are executed for the next filter coefficient. For example, if they have just been executed for w_11, the next filter coefficient is w_12. In this way, the second to sixth processes are executed for every filter coefficient of the convolution filter on the first row of data of each channel of the layer input data. The process shown in Fig. 20 is then executed.

Fig. 20 is an explanatory diagram of the seventh process of the convolution process in the embodiment.
Fig. 20 shows that the second row of data of each channel of the layer input data is input to the main calculation unit 10. After the seventh process, the second to sixth processes are executed for all filter coefficients of the convolution filter. The next row of data of each channel of the layer input data is then input to the main calculation unit 10. If the previous row was the i-th row, the next row is the (i+1)-th row; for example, the third row follows the second.

Fig. 21 is a flowchart showing an example of the flow of the convolution process in the embodiment. In Fig. 21, the sub-convolution process is the process of executing the second to sixth processes of the convolution process for all filter coefficients.

The convolution process starts (step S101). The activation data x of the first row of each channel of the layer input data is input from the first buffer storage unit 11 to the main calculation unit 10 (step S102). Execution of the sub-convolution process starts (step S103). In the sub-convolution process, the second process of the convolution process is executed first (step S104), followed by the third process (step S105), the fourth process (step S106), the fifth process (step S107) and the sixth process (step S108). When steps S104 to S108 have been executed for all filter coefficients, the sub-convolution process for the activation data x input in step S102 ends (step S109). It is then determined whether any row of the layer input data has not yet undergone the sub-convolution process (step S110). This determination is made, for example, by the control unit 14.
If some rows have not yet undergone the sub-convolution process (step S110: YES), the activation data x of such a row of each channel of the layer input data is input from the first buffer storage unit 11 to the main calculation unit 10 (step S111). If several such rows remain, the activation data x is input in a predetermined order, for example in ascending order of row number. The process then returns to step S103, the steps up to step S109 are executed, and the determination of step S110 is made again. If no such rows remain (step S110: NO), the convolution process ends (step S112).
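The control flow of Fig. 21 reduces to three nested loops: over rows, over filter coefficients, and over the second to sixth processes. Below is a minimal Python sketch. The callbacks `load_row` and `step` are hypothetical stand-ins for the hardware operations. Rows are visited in ascending order, which is the example order given above.

```python
from typing import Callable, List, Sequence


def convolution_process(num_rows: int,
                        filter_coeffs: Sequence[str],
                        load_row: Callable[[int], None],
                        step: Callable[[int, str], None]) -> None:
    """Control flow of Fig. 21 (steps S101 to S112)."""
    for row in range(num_rows):        # S102 for the first row, S111 for later rows
        load_row(row)                  # first (or seventh) process: load this row of every channel
        for w in filter_coeffs:        # S103 to S109: sub-convolution process
            for k in range(2, 7):      # S104 to S108: second to sixth processes
                step(k, w)
    # S110: NO (no rows left), so S112: the convolution process ends


log: List[str] = []
convolution_process(
    num_rows=2,
    filter_coeffs=["w_11", "w_12", "w_21", "w_22"],
    load_row=lambda r: log.append(f"load row {r + 1}"),
    step=lambda k, w: log.append(f"process {k} with {w}"),
)
print(len(log), log[:2])  # -> 42 ['load row 1', 'process 2 with w_11']
```

With two rows and four filter coefficients, each row contributes one load plus 4 × 5 = 20 process steps, giving 42 logged events.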

In the neural network circuit 1 of the embodiment configured in this way, data input to the main calculation unit 10 from the first buffer storage unit 11 and the second buffer storage unit 12 during convolutional-layer processing is passed from one unit processing unit 100 to another. As a result, the main calculation unit 10 reads data from the first buffer storage unit 11 and the second buffer storage unit 12 less often. This in turn means that the neural network circuit 1 reads data from the first external memory 91 and the second external memory 92 less often. The neural network circuit 1 configured in this way can therefore increase the calculation speed of a convolutional neural network in convolutional-layer processing. Because it reads from the first external memory 91 and the second external memory 92 less often, the neural network circuit 1 also consumes less power.

<Processing in the fully connected layer>
The processing in the fully connected layer executed by the main calculation unit 10 (hereinafter the "fully connected process") is described with reference to Figs. 22 to 32. In this description, the symbol w denotes a fully connected coefficient rather than a filter coefficient.

Fig. 22 is an explanatory diagram giving an overview of the fully connected process in the embodiment.
In Fig. 22, x_0, x_1, x_2 and x_3 denote activation data, w_00, w_01, w_02 and w_03 denote fully connected coefficients, and u_0, u_1, u_2 and u_3 denote activities.

In the fully connected process, each of x_0, x_1, x_2 and x_3 is multiplied by a predetermined fully connected coefficient. In Fig. 22, the fully connected coefficient w_AB is the value multiplied by the activation data x_B when calculating the activity u_A. Each fully connected coefficient w is distinguished from the others by two indices. The first index A indicates that the coefficient is used to calculate the activity u_A. The second index B indicates the activation data x_B that the coefficient multiplies. The fully connected process outputs the sum of the products as one activity. In the example of Fig. 22, the sum of the products x_0·w_00, x_1·w_01, x_2·w_02 and x_3·w_03 is output as the activity u_0. The same applies to the other activities u. The activities u calculated in the example of Fig. 22 are expressed by formulas (3) to (5).
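The four activities of the Fig. 22 example can be written out directly from the definition of w_AB given above. The numbered formulas of the original publication are not reproduced here, and their numbering is not reconstructed.

```latex
\begin{aligned}
u_0 &= w_{00}x_0 + w_{01}x_1 + w_{02}x_2 + w_{03}x_3\\
u_1 &= w_{10}x_0 + w_{11}x_1 + w_{12}x_2 + w_{13}x_3\\
u_2 &= w_{20}x_0 + w_{21}x_1 + w_{22}x_2 + w_{23}x_3\\
u_3 &= w_{30}x_0 + w_{31}x_1 + w_{32}x_2 + w_{33}x_3
\end{aligned}
\qquad\text{that is,}\qquad
u_A = \sum_{B=0}^{3} w_{AB}\,x_B .
```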

The fully connected process thus calculates several activities u_A. To execute it, the control buffer storage unit 13 stores in advance, for each activity u_A, information indicating which unit processing unit 100 is used to calculate u_A in the fully connected layer (hereinafter "fully connected correspondence information").

Fig. 23 is a first explanatory diagram of an example of the data flow between unit processing units 100 during execution of the fully connected process of the embodiment.
Fig. 23 shows that each unit processing unit 100 receives data from the other unit processing units 100 that the configuration data designates as its connection destinations. In the example of Fig. 23, PB(1,1) receives data from PB(0,1), PB(1,0), PB(1,2) and PB(2,1).

Fig. 24 is a second explanatory diagram of an example of the data flow between unit processing units 100 during execution of the fully connected process of the embodiment.
Fig. 24 shows that each unit processing unit 100 outputs data to the other unit processing units 100 that the configuration data designates as its connection destinations. In the example of Fig. 24, PB(1,1) outputs data to PB(0,1), PB(1,0), PB(1,2) and PB(2,1).
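The connection pattern shown in Figs. 23 and 24 matches a four-neighbour mesh. Below is a minimal Python sketch of configuration data with that shape. It assumes the data is held as a mapping from each PB to its connection destinations; this section does not describe the actual format of the configuration data. `mesh_configuration` is a hypothetical name.

```python
from typing import Dict, List, Tuple

PB = Tuple[int, int]


def mesh_configuration(rows: int, cols: int) -> Dict[PB, List[PB]]:
    """Hypothetical configuration data: PB(i, j) is connected to its
    up, left, right and down neighbours that lie inside the grid."""
    config: Dict[PB, List[PB]] = {}
    for i in range(rows):
        for j in range(cols):
            candidates = [(i - 1, j), (i, j - 1), (i, j + 1), (i + 1, j)]
            config[(i, j)] = [(a, b) for a, b in candidates
                              if 0 <= a < rows and 0 <= b < cols]
    return config


config = mesh_configuration(3, 3)
print(config[(1, 1)])  # -> [(0, 1), (1, 0), (1, 2), (2, 1)]
```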

An example of the flow of the fully connected process is described with reference to Figs. 25 to 31. The fully connected process includes the first to sixth processes described below. After the first to sixth processes have been executed, the fourth to sixth processes are repeated until a predetermined end condition is satisfied. One example of such a condition is that every fully connected coefficient has been used to calculate the activities u.

Fig. 25 is an explanatory diagram of an example of the first process of the fully connected process in the embodiment. In the first process, activation data x is input from the first buffer storage unit 11 to the unit processing units 100 associated with it in advance. The activation data in the fully connected process is, for example, the activities calculated by the convolution process.

The example of Fig. 25 shows that activation data x_0 is input to PB(0,0), x_1 to PB(0,1), x_2 to PB(1,0) and x_3 to PB(1,1). That is, the activation data x_0, x_1, x_2 and x_3 used in the calculation of formula (5) have been read into the main calculation unit 10. These values are the ones enclosed by dotted lines in formula (7), and they have been read into the unit processing units 100.

Figs. 26 and 27 are explanatory diagrams of the second process of the fully connected process.
Fig. 26 is a first explanatory diagram of an example of the second process of the fully connected process in the embodiment. The second process is executed after the first process. In the second process, the fully connected coefficients stored in the second buffer storage unit 12 are input to the main calculation unit 10 according to the first fully connected coefficient input rule. Under this rule, a fully connected coefficient w that satisfies rule conditions 4 and 5 below is input to a unit processing unit 100 that satisfies rule condition 3 below. Rule condition 3 is that the unit processing unit 100 is the one to which the first process input the activation data x_B indicated by the second index of the coefficient w_AB to be input. Rule condition 4 is that the first process has already input the activation data x_B indicated by the second index to the destination unit processing unit 100. Rule condition 5 is that the fully connected correspondence information identifies the destination unit processing unit 100 as the one used to calculate the activity u_A indicated by the first index.

Fig. 26 shows fully connected coefficients being input from the second buffer storage unit 12 to the main calculation unit 10. Specifically, the coefficient w_00 is input to PB(0,0) and the coefficient w_22 is input to PB(1,0). That is, in the example of Fig. 26, the values enclosed by dotted lines in formula (7) are read into the corresponding unit processing units 100.

Fig. 27 is a second explanatory diagram of an example of the second process of the fully connected process in the embodiment.
Fig. 27 shows fully connected coefficients being input from the second buffer storage unit 12 to the main calculation unit 10. Specifically, the coefficient w_11 is input to PB(0,1) and the coefficient w_33 is input to PB(1,1). That is, in the example of Fig. 27, the values enclosed by dotted lines in formula (8) are read into the corresponding unit processing units 100.

Fig. 28 is an explanatory diagram of an example of the third process of the fully connected process in the embodiment. The third process is executed after the second process.
In the third process, each unit processing unit 100 calculates the product of the values input to it in the first and second processes. In the example of Figs. 25 to 27, this yields the terms enclosed by dotted lines in formula (9). Each calculated value (hereinafter a "fully connected product result") is stored in the register 117 of the unit processing unit 100 that calculated it.

Fig. 29 is an explanatory diagram of an example of the fourth process of the fully connected process in the embodiment. The fourth process is executed after the third process, or after the sixth process described below.
In the fourth process, each unit processing unit 100 outputs the activation data x used in the immediately preceding process to the unit processing unit 100 it is connected to. For example, Fig. 29 shows that PB(1,0) outputs the activation data x_2, which it used in the third process, to PB(0,0). That is, x_2 is input to PB(0,0). More specifically, in the example of Fig. 29, the fourth process inputs the values of the terms enclosed by dotted lines in formula (10) to the corresponding unit processing units 100.

Fig. 30 is an explanatory diagram of an example of the fifth process of the fully connected process in the embodiment. The fifth process is executed after the fourth process.
In the fifth process, the fully connected coefficients stored in the second buffer storage unit 12 are input to the main calculation unit 10 according to the second fully connected coefficient input rule. Under this rule, a fully connected coefficient w that satisfies rule conditions 7 and 8 below is input to a unit processing unit 100 that satisfies rule condition 6 below. Rule condition 6 includes the condition that the unit processing unit 100 is the one to which the immediately preceding process (that is, the fourth process of the fully connected process) input the activation data x_B indicated by the second index of the coefficient w_AB. Rule condition 7 includes the condition that the coefficient's second index indicates the activation data x_B input to the unit processing unit 100 that satisfies rule condition 3. Rule condition 8 includes the condition that the activity u_A indicated by the coefficient's first index is the activity u_A indicated by the first index of the fully connected coefficient that the second process input to the unit processing unit 100 satisfying rule condition 3.

In the example of Fig. 30, the coefficient w_02 is input to PB(0,0), w_10 to PB(0,1), w_23 to PB(1,0) and w_31 to PB(1,1). More specifically, the fifth process inputs the values of the terms enclosed by dotted lines in formula (11) to the corresponding unit processing units 100.

Fig. 31 is an explanatory diagram of an example of the sixth process of the fully connected process in the embodiment. The sixth process is executed after the fifth process.
In the sixth process, each unit processing unit 100 calculates the product of the activation data input in the fourth process and the fully connected coefficient input in the fifth process. The calculated value is stored in the register 117 as a fully connected product result. In the example of Fig. 31, the terms enclosed by dotted lines in formula (12) are calculated.

Fig. 32 is a flowchart showing an example of the flow of the fully connected process in the embodiment.
The fully connected process starts (step S201). The first process of the fully connected process is executed (step S202), followed by the second process (step S203), the third process (step S204), the fourth process (step S205), the fifth process (step S206) and the sixth process (step S207). It is then determined whether any fully connected coefficient has not yet been used to calculate the activities u (step S208). This determination is made, for example, by the control unit 14. If unused coefficients remain (step S208: YES), the process returns to step S205. If none remain (step S208: NO), each unit processing unit 100 calculates the sum of the fully connected product results it has stored in its register 117 since step S201 (step S209). This sum is the activity u. After step S209, the fully connected process ends (step S210).
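Taken together, Figs. 25 to 32 describe each x_B travelling around a ring of PBs while each PB accumulates the products for its own activity u_A. Below is a minimal Python sketch covering only the square example of Figs. 25 to 31 (four PBs, four activities). Two details are inferred rather than stated. The ring order PB(0,0), PB(0,1), PB(1,1), PB(1,0) is inferred from Figs. 29 and 30. The assignment of PB(0,0), PB(0,1), PB(1,0) and PB(1,1) to u_0, u_1, u_2 and u_3 is inferred from Figs. 26 and 27. All function and variable names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

PB = Tuple[int, int]


def fully_connected_on_ring(ring: Sequence[PB],
                            assigned: Dict[PB, int],
                            initial: Dict[PB, int],
                            x: Sequence[float],
                            w: Sequence[Sequence[float]]) -> Dict[int, float]:
    """Compute u_A = sum_B w[A][B] * x[B] by passing x around a ring of PBs.

    ring     -- PB ring[k] passes its x to PB ring[(k + 1) % n]
    assigned -- fully connected correspondence info: PB -> index A of its u_A
    initial  -- first process: PB -> index B of the x_B it receives
    """
    n = len(ring)
    if not n == len(x) == len(w):
        raise ValueError("sketch covers only one PB per activity and per x")
    register: Dict[PB, List[float]] = {pb: [] for pb in ring}  # register 117

    def multiply_and_store(held: Dict[PB, int]) -> None:
        # Input rules: the PB computing u_A and holding x_B receives w_AB.
        for pb in ring:
            a, b = assigned[pb], held[pb]
            register[pb].append(w[a][b] * x[b])

    held = dict(initial)
    multiply_and_store(held)             # second and third processes
    used = n
    while used < n * n:                  # S208: unused coefficients remain
        # Fourth process: pass each x one step along the ring.
        held = {ring[(k + 1) % n]: held[ring[k]] for k in range(n)}
        multiply_and_store(held)         # fifth and sixth processes
        used += n
    # S209: each PB sums its register to obtain its activity u_A.
    return {assigned[pb]: sum(register[pb]) for pb in ring}


ring = [(0, 0), (0, 1), (1, 1), (1, 0)]
assigned = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3}
initial = dict(assigned)  # Fig. 25: PB(0,0)<-x_0, PB(0,1)<-x_1, PB(1,0)<-x_2, PB(1,1)<-x_3
x = [1.0, 2.0, 3.0, 4.0]
w = [[float(4 * a + b) for b in range(4)] for a in range(4)]
u = fully_connected_on_ring(ring, assigned, initial, x, w)
expected = {a: sum(w[a][b] * x[b] for b in range(4)) for a in range(4)}
print(u == expected)  # -> True
```

After the first shift, the sketch routes w_02, w_10, w_23 and w_31 to PB(0,0), PB(0,1), PB(1,0) and PB(1,1), matching Fig. 30. After three shifts, every PB has seen each x_B once, so its register sum equals the matrix-vector product.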

In the neural network circuit 1 of the embodiment configured in this way, data input to the main calculation unit 10 from the first buffer storage unit 11 and the second buffer storage unit 12 during fully-connected-layer processing is passed from one unit processing unit 100 to another. As a result, the main calculation unit 10 reads data from the first buffer storage unit 11 and the second buffer storage unit 12 less often. This in turn means that the neural network circuit 1 reads data from the first external memory 91 and the second external memory 92 less often. The neural network circuit 1 configured in this way can therefore increase the calculation speed of a convolutional neural network in fully-connected-layer processing. Because it reads from the first external memory 91 and the second external memory 92 less often, the neural network circuit 1 also consumes less power.

In addition, under the control of the control unit 14, the neural network circuit 1 of the embodiment can change the connection relationships between the unit processing units 100 based on configuration data. The same circuit can therefore execute both convolution layer processing and fully connected layer processing, which makes the neural network circuit 1 highly versatile.

(Modification)
In the main processing unit 10, the unit processing units 100 need not be arranged two-dimensionally in a lattice. The unit processing units 100 may be arranged in any way, as long as each is placed where it can be connected to the other unit processing units 100 according to the connection relationships indicated by the configuration data. For example, the unit processing units 100 may be arranged three-dimensionally, with one unit processing unit 100 at each vertex of a cubic lattice.

The first buffer memory unit 11 is an example of a first storage unit. The second buffer memory unit 12 is an example of a second storage unit. The unit processing unit 100 is an example of a calculation unit. The filter coefficients and the fully connected coefficients are examples of weight data. The product of the activation data and a filter coefficient, and the product of the activation data and a fully connected coefficient, are examples of term values. Each filter coefficient in FIGS. 15 and 17 and each fully connected coefficient in FIGS. 26 to 31 is an example of one piece of weight data that satisfies a predetermined condition related to the weight data. Rule conditions 1, 2, 4, 5, 7, and 8 are examples of the predetermined condition related to the weight data.
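To make the notion of a term value concrete for the convolution layer: each output element of the convolution is the sum of term values, each being one activation value multiplied by one filter coefficient. The Python sketch below assumes a single-channel input, stride 1, no padding, and an unflipped filter (the cross-correlation form used by most CNN frameworks); whether the circuit flips the filter is not stated here, and the function name `conv2d_terms` is hypothetical.

```python
def conv2d_terms(activation, filt):
    """Hypothetical sketch: 'valid' 2-D convolution built from term values.

    activation: 2-D list (rows of activation data values).
    filt: 2-D list of filter coefficients.
    Assumes stride 1, no padding, and no filter flip (cross-correlation).
    """
    h, w = len(activation), len(activation[0])
    kh, kw = len(filt), len(filt[0])
    output = []
    for r in range(h - kh + 1):
        row = []
        for c in range(w - kw + 1):
            # Each term value is one activation value times one filter coefficient.
            terms = [activation[r + i][c + j] * filt[i][j]
                     for i in range(kh) for j in range(kw)]
            row.append(sum(terms))  # an output element is the sum of its term values
        output.append(row)
    return output
```

For example, a 3×3 input `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` convolved with a 2×2 all-ones filter yields `[[12, 16], [24, 28]]`, each entry being the sum of four term values.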

An embodiment of the present invention has been described above in detail with reference to the drawings; however, the specific configuration is not limited to this embodiment and also includes designs that do not depart from the gist of the present invention.

1...Neural network circuit, 10...Main calculation unit, 11...First buffer memory unit, 12...Second buffer memory unit, 13...Control buffer memory unit, 14...Control unit, 100...Unit processing unit, 101...Connection unit, 102...Product-sum calculation unit, 111...Input control circuit, 112...Weight control circuit, 113...Output control circuit, 114, 115, 116, 117...Register, 118...Feedback conductor, 201...Product-sum calculation circuit, 202...Activation circuit

Claims

A neural network circuit for executing processing of a convolutional layer or a fully connected layer in a convolutional neural network having a fully connected layer and a plurality of convolutional layers, the neural network circuit comprising:
a first storage unit that stores activation data, the activation data being the values of the elements of data that is represented by a tensor and input to each layer;
a second storage unit that stores weight data for the processing executed in the convolutional layer or the fully connected layer; and
a plurality of calculation units, each of which reads one piece of the activation data from the first storage unit at a predetermined timing and obtains a term value, the term value being the value of a term of a product of the read activation data and one piece of the weight data that satisfies a predetermined condition related to the weight data,
wherein the activation data read by a calculation unit is output, after that calculation unit calculates the term value, to another calculation unit associated in advance, and
the calculation units are used in common for the processing of the convolutional layer and the processing of the fully connected layer.

The neural network circuit according to claim 1, wherein
the processing of the convolutional layer is a convolution integral, and
the weight data used when performing the convolution integral are filter coefficients, which are the values of a filter for performing the convolution integral.

The neural network circuit according to claim 1, wherein the weight data used when executing the processing of the fully connected layer are weight coefficients used in the processing of the fully connected layer.

A neural network operation method performed by a neural network circuit for executing processing of a convolutional layer or a fully connected layer in a convolutional neural network having a fully connected layer and a plurality of convolutional layers, the neural network circuit comprising: a first storage unit that stores activation data, the activation data being the values of the elements of data that is represented by a tensor and input to each layer; a second storage unit that stores weight data for the processing executed in the convolutional layer or the fully connected layer; and a plurality of calculation units, each of which reads one piece of the activation data from the first storage unit at a predetermined timing and acquires a term value, the term value being the value of a term of a product of the read activation data and one piece of the weight data that satisfies a predetermined condition related to the weight data, wherein the activation data read by a calculation unit is output, after that calculation unit calculates the term value, to another calculation unit associated in advance, and the calculation units are used in common for the processing of the convolutional layer and the processing of the fully connected layer, the neural network operation method comprising:
an acquisition step in which a calculation unit reads one piece of the activation data from the first storage unit at a predetermined timing and acquires a term value, the term value being the value of a term of a product of the read activation data and one piece of the weight data that satisfies a predetermined condition related to the weight data; and
an output step in which the activation data read by the calculation unit is output, after the calculation unit calculates the term value, to another calculation unit associated in advance.