JP2021026613A

JP2021026613A - Calculation device, calculation method and program

Info

Publication number: JP2021026613A
Application number: JP2019145592A
Authority: JP
Inventors: 鈴木　亮; Akira Suzuki; 亮鈴木
Original assignee: Denso Ten Ltd
Current assignee: Denso Ten Ltd
Priority date: 2019-08-07
Filing date: 2019-08-07
Publication date: 2021-02-22

Abstract

To eliminate processing waits between cores and speed up calculation. SOLUTION: A calculation device comprises a control unit. In a neural network in which the calculation in each layer is executed based on the calculation result of the preceding layer, the control unit causes the calculations of the layers at a given calculation timing to be executed in parallel, each based on the calculation result of its preceding layer at the previous calculation timing. SELECTED DRAWING: Figure 2

Description

The disclosed embodiments relate to a calculation device, a calculation method, and a program.

Conventionally, calculation devices are known that perform calculations for control in powertrain devices such as engines and brakes and in radar devices that detect objects. Such a calculation device is also called an embedded microcomputer and incorporates a system designed to control a specific device.

In recent years, there has been a demand to use AI (Artificial Intelligence) for the calculations of devices in this embedded field. However, embedded microcomputers often have only the minimum necessary specifications, so AI processing, which involves a large number of matrix operations, takes a long time and increases the processing load. Embedded microcomputers have therefore been unsuitable for AI processing.

However, as such embedded microcomputers have increasingly become multi-core, it has become possible to reduce the processing load by dividing the calculation processing among a plurality of cores (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 05-282272

However, the conventional technique described above leaves room for further improvement in eliminating processing waits between cores and speeding up calculation.

For example, in a neural network used in AI processing, the calculation for each node is independent, so parallel calculation in which different cores are assigned to different nodes is possible. However, the calculation of a given layer in a neural network requires that the calculation be complete at all nodes of the preceding layer. Therefore, depending on how nodes are assigned to cores, synchronization processing that waits for calculations on other cores to complete becomes necessary.

In such synchronization processing, if the calculation on one core is delayed by interrupt processing or the like, the other cores must wait for it to finish, so wasteful overhead arises despite the parallelization across multiple cores.

One aspect of the embodiment has been made in view of the above, and its object is to provide a calculation device, a calculation method, and a program that can eliminate processing waits between cores and speed up calculation.

A calculation device according to one aspect of the embodiment includes a control unit. In a neural network in which the calculation of each layer is executed based on the calculation result of its preceding layer, the control unit causes the calculation of each layer at a given calculation timing to be executed in parallel based on the calculation result of the preceding layer at the previous calculation timing.

According to one aspect of the embodiment, processing waits between cores can be eliminated and calculation can be sped up.

FIG. 1A is a schematic explanatory diagram (part 1) of the calculation method according to the embodiment.
FIG. 1B is a schematic explanatory diagram (part 2) of the calculation method according to the embodiment.
FIG. 1C is a schematic explanatory diagram (part 3) of the calculation method according to the embodiment.
FIG. 2 is a block diagram of the in-vehicle device according to the embodiment.
FIG. 3 is a diagram showing an operation image of the in-vehicle device according to the embodiment.
FIG. 4 is an explanatory diagram of the calculation time.
FIG. 5 is a flowchart showing a processing procedure executed by the in-vehicle device according to the embodiment.

Hereinafter, embodiments of the calculation device, calculation method, and program disclosed in the present application will be described in detail with reference to the accompanying drawings. The present invention is not limited to the embodiments shown below.

In the following, it is assumed that the calculation device has a multi-core configuration with three cores. The description takes as an example the case where the calculation device is an in-vehicle device 10 mounted on a vehicle.

In the following, an outline of the calculation method according to the embodiment is first described with reference to FIGS. 1A to 1C, and then the specific configuration of the in-vehicle device 10 to which the calculation method is applied is described with reference to FIGS. 2 to 5.

First, an outline of the calculation method according to the embodiment will be described with reference to FIGS. 1A to 1C. FIGS. 1A to 1C are schematic explanatory diagrams (part 1) to (part 3) of the calculation method according to the embodiment.

FIGS. 1A and 1B show an example of a neural network. In such a neural network, a circle may be called a "node" and a line connecting nodes may be called an "edge".

In the neural networks of FIGS. 1A and 1B, the values X1 to X3 are input to the input layer. According to the weights associated with each edge, the values Y1 to Y3 are calculated in the first layer of the intermediate layers, then the values Z1 to Z3 in the second layer, and finally the values A1 to A3 in the output layer.
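
As a minimal sketch of the network just described, the following Python computes Y1 to Y3, Z1 to Z3, and A1 to A3 from X1 to X3. The weight values, the tanh activation, and the function names are assumptions made for illustration; the source specifies none of them.

```python
import math

# Hypothetical weights for a 3-3-3-3 network; the source gives no concrete values.
W1 = [[0.2, -0.1, 0.4], [0.5, 0.3, -0.2], [-0.3, 0.1, 0.6]]
W2 = [[0.1, 0.7, -0.4], [-0.2, 0.2, 0.3], [0.6, -0.5, 0.1]]
W3 = [[0.3, 0.3, 0.3], [-0.6, 0.4, 0.2], [0.1, -0.1, 0.8]]


def layer_forward(weights, inputs):
    """Each node takes the weighted sum of all values from the preceding
    layer and applies tanh (the activation is an assumption)."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def forward(x):
    y = layer_forward(W1, x)  # first intermediate layer: Y1..Y3
    z = layer_forward(W2, y)  # second intermediate layer: Z1..Z3
    a = layer_forward(W3, z)  # output layer: A1..A3
    return y, z, a


y, z, a = forward([1.0, 0.5, -0.5])
```

Each layer needs every value of the layer before it, which is the dependency that the rest of the description works around.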

The in-vehicle device 10 according to the embodiment can be applied to, for example, various electronic control devices that electronically control in-vehicle equipment. The in-vehicle equipment controlled by such electronic control devices includes powertrain devices such as engines, transmissions, and brakes; radar devices that detect objects; body devices such as air conditioners, seats, and door locks; information devices such as navigation devices; and safety devices such as airbags and seat belts.

For example, the in-vehicle device 10 can use the neural network described above as a technique for determining engine knocking with high accuracy. In that case, the values X1 to X3 are, for example, images showing the waveform of a knock sensor, and the values A1 to A3 are, for example, probabilities of the presence or absence of knocking. Such values A1 to A3 can be used, for example, for failure detection in powertrain devices. This is only an example; estimation by the neural network is not limited to knocking determination and may be of any kind.

First, FIG. 1A shows a neural network serving as a comparative example for the present embodiment. As shown in FIG. 1A, in the neural network according to the comparative example, the calculations for the individual nodes in each layer from the intermediate layers onward are independent of one another, so on a multi-core device, parallel calculation with different cores assigned to different nodes is possible.

For example, FIG. 1A shows an assignment in which the first core calculates the values Y1, Z1, and A1, the second core calculates the values Y2, Z2, and A2, and the third core calculates the values Y3, Z3, and A3.

In this example, however, as shown in FIG. 1A, suppose that the third core has not yet finished calculating the value Y3 in the first layer of the intermediate layers. Even if the first core has finished calculating the value Y1 and the second core has finished calculating the value Y2, the second layer of the intermediate layers still requires the value Y3 as input.

Therefore, the first core, which calculates the value Z1, and the second core, which calculates the value Z2, must wait for the third core to finish calculating the value Y3. The calculation method using the neural network according to the comparative example thus has the problem that wasteful overhead arises despite the parallelization across multiple cores.
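
The waiting in the comparative example can be illustrated with a simple timing model. This is a sketch under stated assumptions: the function name and the per-core durations are hypothetical, and the model only assumes that no core may start a layer until every core has finished the layer before it.

```python
def barrier_schedule(durations):
    """durations[layer][core] is the time a core needs for its node in a layer.

    With a barrier after every layer, each layer takes as long as its slowest
    core, and every faster core sits idle for the difference.
    """
    n_cores = len(durations[0])
    total = 0
    idle = [0] * n_cores
    for layer in durations:
        slowest = max(layer)
        total += slowest
        for core, d in enumerate(layer):
            idle[core] += slowest - d
    return total, idle


# Hypothetical case: the third core is delayed (e.g. by an interrupt) in layer 1.
total, idle = barrier_schedule([[1, 1, 3], [1, 1, 1], [1, 1, 1]])
# total is 5 time units; the first and second cores each idle for 2 units.
```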

Therefore, in the calculation method according to the embodiment, a core is assigned to each layer of the neural network. In addition, the calculation of each layer at the current calculation timing is performed using as input the calculation result of the preceding layer at the previous calculation timing (hereinafter sometimes referred to as the "previous value").

Specifically, as shown in FIG. 1B, in the calculation method according to the embodiment, a core is assigned to each layer from the intermediate layers onward. For example, FIG. 1B shows an example in which the first core is assigned to the first layer L1 of the intermediate layers, the second core to the second layer L2, and the third core to the output layer.

Then, as shown in FIG. 1B, the calculation of each layer at the current calculation timing t is performed using as input the calculation result of the preceding layer at the previous calculation timing t-1.

To realize this, each node of the intermediate layers (the first layer L1 and the second layer L2) has double buffers #0 and #1, as shown in FIG. 1C. Each of the buffers #0 and #1 serves two purposes, "storing calculation results" and "referencing input values", and, as shown in FIG. 1C, the purposes are switched at every calculation timing. Therefore, no data transfer between buffers #0 and #1, such as copying the current value to the previous value, is needed.
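
The role switching of buffers #0 and #1 can be sketched as follows. The class and attribute names are hypothetical; the point illustrated is only that swapping an index changes which buffer is written and which is read, so no values are copied.

```python
class DoubleBuffer:
    """Two buffers whose roles (result storage / input reference) alternate."""

    def __init__(self, size):
        self._bufs = [[0.0] * size, [0.0] * size]  # buffer #0 and buffer #1
        self._write_idx = 0  # index of the buffer currently used for storing results

    def swap(self):
        """Called once per calculation timing: exchange the two roles."""
        self._write_idx ^= 1

    @property
    def write_buf(self):
        return self._bufs[self._write_idx]

    @property
    def read_buf(self):
        return self._bufs[self._write_idx ^ 1]


db = DoubleBuffer(3)
db.write_buf[:] = [1.0, 2.0, 3.0]  # current values stored at timing t
stored = db.write_buf
db.swap()                          # timing t+1: roles are exchanged
same_object = db.read_buf is stored  # the previous values are read in place
```

Because `read_buf` after the swap is the very same list that was written before it, the "current value" becomes the "previous value" without any data transfer.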

Therefore, as shown in FIG. 1B, even if the calculation of the first layer L1 by the first core is not yet complete, the calculation of the second layer L2 by the second core takes as input the values Y1_(t-1) to Y3_(t-1), which are the calculation results of the first layer L1 at the previous calculation timing t-1, so no waiting is required.

The calculation time of each core in each layer is sufficiently shorter than the period of the calculation timing. Therefore, all calculations by the cores in the respective layers can be completed before the next calculation timing. The calculation timing can be synchronized by, for example, a timer interrupt (time synchronization) or a pulse input interrupt (crank angle synchronization in the case of a powertrain system).

In this way, the calculation method according to the embodiment assigns a core to each layer of the neural network and performs the calculation of each layer at the current calculation timing using as input the calculation result of the preceding layer at the previous calculation timing.

Therefore, according to the calculation method of the embodiment, processing waits between cores can be eliminated and calculation can be sped up.

Hereinafter, the in-vehicle device 10 to which the calculation method according to the above-described embodiment is applied will be described more specifically.

FIG. 2 is a block diagram of the in-vehicle device 10 according to the embodiment. In FIG. 2, only the components necessary for explaining the features of the present embodiment are shown as functional blocks, and general components are omitted.

In other words, each component shown in FIG. 2 is functional and conceptual and does not necessarily have to be physically configured as shown. For example, the specific form of distribution and integration of the functional blocks is not limited to the one shown, and all or part of them can be functionally or physically distributed or integrated in arbitrary units according to various loads and usage conditions.

In the description using FIG. 2, the explanation of components that have already been described may be simplified or omitted.

As shown in FIG. 2, the in-vehicle device 10 includes a storage unit 11 and a control unit 12. The storage unit 11 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, and in the example of FIG. 2 includes double buffers 11a and 11b.

The double buffers 11a and 11b are each the double buffer described with reference to FIG. 1C, and the purpose of each buffer is switched between storing calculation results and referencing input values at every calculation timing.

Although FIG. 2 shows the storage unit 11 and the control unit 12 separately, part of the storage unit 11, for example a TCM (Tightly Coupled Memory), may be built into the control unit 12.

The control unit 12 is a so-called multi-core CPU (Central Processing Unit) and has a first core 121-1, a second core 121-2, and a third core 121-3.

In the present embodiment, the control unit 12 assigns the first core 121-1 to the first layer L1 of the intermediate layers, the second core 121-2 to the second layer L2 of the intermediate layers, and the third core 121-3 to the output layer.

The first core 121-1, the second core 121-2, and the third core 121-3 are controllers and are realized, for example, by executing various programs stored in a storage device inside the in-vehicle device 10 using the RAM as a work area. They can also be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

The first core 121-1 and the second core 121-2 each have a determination unit 121a, a switching unit 121b, and a calculation unit 121c. The third core 121-3 has a determination unit 121a and a calculation unit 121c.

The determination unit 121a acquires the timer interrupt or pulse input interrupt used to synchronize the calculation timing described above, and determines the calculation timing.

When the determination unit 121a determines that it is the calculation timing, the switching unit 121b switches the purpose of each buffer in the double buffers 11a and 11b between storing calculation results and referencing input values.

The calculation unit 121c executes the calculation of each node in the layer of the neural network to which its core is assigned. In the present embodiment, the calculation unit 121c of the first core 121-1 takes the input values, performs the calculation, and stores the result in the buffer of the double buffer 11a that is switched to storing calculation results at the current calculation timing.

The calculation unit 121c of the second core 121-2 reads the previous value of the first core 121-1 from the buffer of the double buffer 11a that is switched to referencing input values at the current calculation timing. It then performs the calculation based on this previous value and stores the result in the buffer of the double buffer 11b that is switched to storing calculation results at the current calculation timing.

The calculation unit 121c of the third core 121-3 reads the previous value of the second core 121-2 from the buffer of the double buffer 11b that is switched to referencing input values at the current calculation timing. It then performs the calculation based on this previous value and outputs the result as the output value.

Next, to make the description so far easier to follow, FIG. 3 shows an operation image at each of the calculation timings t and t+1. FIG. 3 is a diagram showing an operation image of the in-vehicle device 10 according to the embodiment.

As shown in FIG. 3, at the calculation timing t, the second core 121-2 and the third core 121-3, which are assigned to the second layer L2 of the intermediate layers and to the output layer respectively, take as input the previous values of their preceding layers (that is, the values Y1_(t-1) to Y3_(t-1) or the values Z1_(t-1) to Z3_(t-1), which are the calculation results at the previous calculation timing t-1) and execute the calculation of each node. The first core 121-1, which is assigned to the first layer L1 of the intermediate layers, takes the input values X1_t to X3_t and executes the calculation of each node. The calculations of these cores are executed in parallel.

The calculation results in the intermediate layers (that is, the values Y1_t to Y3_t or the values Z1_t to Z3_t) are each stored in the buffer that is switched to storing calculation results at the calculation timing t. The calculation results in the output layer (that is, the values A1_t to A3_t) are output.

At the calculation timing t+1, the purpose of each buffer in the double buffers 11a and 11b of the intermediate layers is switched first.

Then, the second core 121-2 and the third core 121-3 take as input the previous values of their preceding layers (that is, the values Y1_t to Y3_t or the values Z1_t to Z3_t, which are the calculation results at the previous calculation timing t) and execute the calculation of each node. The first core 121-1, which is assigned to the first layer L1 of the intermediate layers, takes the input values X1_(t+1) to X3_(t+1) and executes the calculation of each node. The calculations of these cores are executed in parallel.

The calculation results in the intermediate layers (that is, the values Y1_(t+1) to Y3_(t+1) or the values Z1_(t+1) to Z3_(t+1)) are each stored in the buffer that is switched to storing calculation results at the calculation timing t+1. The calculation results in the output layer (that is, the values A1_(t+1) to A3_(t+1)) are output.
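
The operation at successive calculation timings can be simulated with the sketch below. It is an illustration under assumptions, not the patented implementation: the function name is hypothetical, each layer is a stand-in function, the buffers start at zero, and the layers are evaluated one after another within a tick. The sequential evaluation is equivalent to parallel execution here because every layer reads only buffers written at the previous tick and writes only buffers that no layer reads during the current tick.

```python
def run_pipeline(layers, inputs_per_tick, width):
    """Run one layer per (simulated) core, feeding each layer the previous
    tick's result of the layer before it."""
    n = len(layers)
    # buffers[k] is the double buffer holding the output of layer k
    # (the last layer has none: its result is output directly).
    buffers = [[[0.0] * width, [0.0] * width] for _ in range(n - 1)]
    write = 0
    outputs = []
    for x in inputs_per_tick:
        write ^= 1          # swap buffer roles at each calculation timing
        read = write ^ 1
        for k, f in enumerate(layers):
            src = x if k == 0 else buffers[k - 1][read]  # previous value
            result = f(src)
            if k < n - 1:
                buffers[k][write] = result  # store the current value
            else:
                outputs.append(result)
    return outputs


# Stand-in layers: each adds 1 so that propagation is easy to trace.
add_one = lambda v: [e + 1 for e in v]
outputs = run_pipeline([add_one] * 3, [[10], [20], [30], [40]], width=1)
# outputs[2] == [13]: the input of tick 0 reaches the output at tick 2.
```

The first two outputs come from the zero-initialized buffers and carry no meaningful result; that warm-up handling is a simplification of this sketch.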

Because the calculation uses previous values as input in this way, it takes a time of "calculation timing period × number of layers from the intermediate layers onward" from the input of the values X1 to X3 until the corresponding output values A1 to A3 are output. FIG. 4 is an explanatory diagram of the calculation time.

As shown in FIG. 4, for example, the calculation result of the first core 121-1 at the calculation timing t is finally reflected in the calculation of the third core 121-3 at the calculation timing t+2. Therefore, when the number of layers from the intermediate layers onward is 3, as in the present embodiment, and the calculation timing period is 1 ms, the calculation time of the neural network is 3 ms. However, the failure detection period of powertrain devices, for example, is 50 ms or more, so the result is available well in time and the operation of the vehicle is not hindered.
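
The latency figure above is a simple product, shown here as a short check. The function name is hypothetical; the 1 ms period, the three layers, and the 50 ms failure detection period are the values given in the text.

```python
def pipeline_latency_ms(period_ms, n_layers):
    """Time from input to the corresponding output: one period per layer."""
    return period_ms * n_layers


latency = pipeline_latency_ms(period_ms=1, n_layers=3)  # 3 ms
fits_deadline = latency <= 50  # compared with the 50 ms detection period
```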

Next, the processing procedure executed by the in-vehicle device 10 according to the embodiment will be described with reference to FIG. 5. FIG. 5 is a flowchart showing the processing procedure executed by the in-vehicle device 10 according to the embodiment.

As shown in FIG. 5, the control unit 12 first assigns a core to each layer of the neural network (step S101). Then, the determination unit 121a of each core determines the calculation timing based on an interrupt (step S102).

If it is not the calculation timing (step S102, No), the processing from step S102 is repeated. If it is the calculation timing (step S102, Yes) and an intermediate layer is assigned (step S103, Yes), the switching unit 121b switches the buffers of the double buffers 11a and 11b (step S104).

If no intermediate layer is assigned (step S103, No), the process proceeds to step S108.

Next, it is determined whether the second or a later intermediate layer is assigned (step S105). If so (step S105, Yes) and a previous value exists, that is, a calculation result of the preceding layer at the previous calculation timing (step S106, Yes), that previous value is input (step S107).

If there is no previous value (step S106, No), the processing from step S102 is repeated. If, in step S105, the second or a later intermediate layer is not assigned (step S105, No), that is, the first intermediate layer is assigned, the input is taken from the input layer (step S108).

Then, the layers execute their calculations in parallel (step S109), and the processing from step S102 is repeated.

As described above, the in-vehicle device 10 according to the embodiment (corresponding to an example of the "calculation device") includes the control unit 12. In a neural network in which the calculation of each layer is executed based on the calculation result of its preceding layer, the control unit 12 causes the calculation of each layer at a given calculation timing to be executed in parallel based on the calculation result of the preceding layer at the previous calculation timing.

Therefore, according to the in-vehicle device 10 of the embodiment, processing waits between cores can be eliminated and calculation can be sped up.

The control unit 12 has the first core 121-1, the second core 121-2, and the third core 121-3 (corresponding to an example of "a plurality of cores") and assigns the first core 121-1, the second core 121-2, and the third core 121-3 to the respective layers of the neural network.

Therefore, according to the in-vehicle device 10 of the embodiment, when a plurality of cores are assigned to the layers, there is no need to wait for the calculations on all cores to complete in the preceding layer. This reduces processing waits and speeds up calculation.

The first core 121-1, the second core 121-2, and the third core 121-3 execute the calculation for each layer of the neural network in synchronization with calculation timings of a predetermined period.

Therefore, according to the in-vehicle device 10 of the embodiment, all cores execute the per-layer calculations of the neural network in synchronization, which minimizes the occurrence of processing waits and speeds up calculation.

The in-vehicle device 10 according to the embodiment has the double buffers 11a and 11b, which store the calculation results of the intermediate layers of the neural network at the previous and current calculation timings.

Therefore, according to the in-vehicle device 10 of the embodiment, the calculation of each layer of the neural network at a given calculation timing can be executed in parallel based on the calculation result of the preceding layer at the previous calculation timing.

The control unit 12 switches the purposes of the double buffers at every calculation timing.

Therefore, according to the in-vehicle device 10 of the embodiment, data transfer between buffers, such as copying the current value to the previous value, becomes unnecessary.

In the above-described embodiment, the number of cores of the control unit 12 is equal to the number of layers from the intermediate layers onward in the neural network, but the two numbers need not be equal. For example, if there are more cores than layers, any of the cores may be assigned. The cores may also be assigned dynamically, rotating among them.

If there are more layers than cores, the cores may be assigned in order starting from the first intermediate layer L1, as in first core 121-1 → second core 121-2 → third core 121-3 → first core 121-1, and so on. As long as one core is assigned to each layer, the cores may also be assigned randomly, regardless of order.
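The cyclic assignment can be written as a one-line mapping. This is a sketch only: `assign_round_robin` and the string core names are hypothetical, and layers are numbered from 1 to match L1.

```python
from typing import Dict, List


def assign_round_robin(num_layers: int, cores: List[str]) -> Dict[int, str]:
    """Assign layers 1..num_layers to the cores in cyclic order."""
    return {layer: cores[(layer - 1) % len(cores)]
            for layer in range(1, num_layers + 1)}
```

For example, with five layers and three cores, layers 4 and 5 wrap around to the first and second core.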

Further, for example, the control unit 12 may monitor the processing state of each core and dynamically change the assignment of the cores to the layers of the neural network according to the monitoring result.

As a result, for example, when a core remains under a high processing load or enters a runaway state, its assignment to the neural network can be released and another core assigned instead, which improves the availability of the in-vehicle device 10.
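One possible reassignment policy is sketched below. Everything in it is an assumption made for illustration: the embodiment does not specify how load is measured, so the sketch represents load as a value between 0 and 1, marks a runaway core with `None`, uses a hypothetical `threshold`, and picks the least-loaded healthy core as the replacement.

```python
from typing import Dict, Optional


def reassign(assignment: Dict[int, str],
             load: Dict[str, Optional[float]],
             threshold: float = 0.9) -> Dict[int, str]:
    """Move layers away from overloaded or runaway cores.

    assignment: layer number -> core name
    load:       core name -> load in [0, 1], or None for a runaway core
    """
    healthy = [c for c, v in load.items() if v is not None and v <= threshold]
    if not healthy:
        return dict(assignment)      # nothing better available; keep as is
    result: Dict[int, str] = {}
    for layer, core in assignment.items():
        if core in healthy:
            result[layer] = core     # keep a healthy assignment unchanged
        else:
            # Release the core and assign the least-loaded healthy one instead.
            result[layer] = min(healthy, key=lambda c: load[c])
    return result
```

A real controller would also have to decide when a released core may be assigned again; that is left out of this sketch.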

Further, in the above-described embodiment, the number of cores is three as an example, but it may be two, or four or more.

Further, in the above-described embodiment, the in-vehicle device 10 is an example of the arithmetic unit, but the use of the arithmetic unit is of course not limited to this, and it can be applied to various situations in which multi-core calculation of a neural network is performed.

Further effects and modifications can be readily derived by those skilled in the art. The broader aspects of the present invention are therefore not limited to the specific details and representative embodiments shown and described above. Accordingly, various modifications are possible without departing from the spirit or scope of the general inventive concept as defined by the appended claims and their equivalents.

10 In-vehicle device
11 Storage unit
11a, 11b Double buffer
12 Control unit
121-1 First core
121-2 Second core
121-3 Third core
121a Determination unit
121b Switching unit
121c Calculation unit

Claims

An arithmetic unit comprising a control unit that, in a neural network in which the operation of each layer is executed based on the operation result of the preceding layer, causes the operations of the layers at a predetermined operation timing to be executed in parallel, each based on the operation result of its preceding layer at the previous operation timing.

The arithmetic unit according to claim 1, wherein the control unit has a plurality of cores and allocates the cores to the layers of the neural network.

The arithmetic unit according to claim 2, wherein the cores execute the operation for each layer of the neural network in synchronization, at operation timings of a predetermined cycle.

The arithmetic unit according to claim 2 or 3, further comprising double buffers that store the operation results of the intermediate layers of the neural network at the previous and current operation timings.

The arithmetic unit according to claim 4, wherein the control unit switches the roles of the double buffers at each operation timing.

The arithmetic unit according to any one of claims 2 to 5, wherein the control unit monitors the processing state of the cores and changes the allocation of the cores to the layers of the neural network according to the processing state.

An arithmetic method comprising a control step of causing, in a neural network in which the operation of each layer is executed based on the operation result of the preceding layer, the operations of the layers at a predetermined operation timing to be executed in parallel, each based on the operation result of its preceding layer at the previous operation timing.

A program that causes a computer to execute a control procedure of causing, in a neural network in which the operation of each layer is executed based on the operation result of the preceding layer, the operations of the layers at a predetermined operation timing to be executed in parallel, each based on the operation result of its preceding layer at the previous operation timing.