JP2006154992A - Neuro-processor

Info

Publication number: JP2006154992A
Application number: JP2004341555A
Authority: JP
Inventors: Hirokazu Madokoro; 洋和間所; Kazuto Sato; 和人佐藤; Masaki Ishii; 雅樹石井
Original assignee: Akita Prefecture
Current assignee: Akita Prefecture
Priority date: 2004-11-26
Filing date: 2004-11-26
Publication date: 2006-06-15

Abstract

PROBLEM TO BE SOLVED: To provide a neuro-chip in which the connection weights of each neuron are easy to initialize.

SOLUTION: Intermediate layer modules 103-105 and output layer modules 106 and 107 each hold a connection weight for each of their inputs. The connections from the three input terminals 111-113 of the input layer to the inputs IN1-IN3 of the intermediate layer modules differ among modules 103, 104, and 105. Likewise, the connections from the outputs of the intermediate layer modules 103-105 to the inputs IN1-IN3 of the output layer modules differ between modules 106 and 107. Because the connections are shifted in this way among modules of the same layer, each module produces a different output for the same input from the upper layer even when every module is initialized with the same set of connection weights, so learning progresses.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a neuroprocessor that implements a neural network as a hardware circuit.

Processors that implement a neural network in hardware are disclosed in, for example, Patent Documents 1 to 4.

Of these, the processors of Patent Documents 1 and 2 implement only the recognition processing of the neural network in hardware; learning is performed on an external general-purpose computer.

The processors of Patent Documents 3 and 4 implement both recognition processing and learning processing in hardware. In Patent Document 3 in particular, each neuron of the input layer, intermediate layer, and output layer of a hierarchical neural network is built as a single chip. The memory in each chip stores the connection weights of the synaptic connections with the neurons of the upper layer, and these connection weights are used to carry out the forward propagation and back propagation computations.

Patent Document 1: JP-A-7-192073
Patent Document 2: JP-A-7-210532
Patent Document 3: JP-A-5-159087
Patent Document 4: JP-A-5-346914

When a neural network is trained, learning may converge very slowly, or in some cases not at all, unless the initial values of the connection weights between neurons are set appropriately. If every chip in a layer is given the same initial connection weights for the upper-layer neurons, every chip produces the same output for the same input, so the network stays in a uniform state and learning does not progress.

On the other hand, setting the connection weights of the synaptic connections so that they differ among the neurons (chips) of the same layer becomes very laborious as the number of neurons grows. The effort can be reduced by preparing an initial-value setting program that keeps the initial values from coinciding between chips, or by building an equivalent function into the processor as a hardware circuit, but the cost of such measures is not necessarily small.

One aspect of the present invention provides a neuroprocessor in which the initial values of the connection weights for each neuron module are easy to set.

Another aspect of the present invention provides a neuroprocessor that can execute forward propagation processing and back propagation processing alternately for learning.

The neuroprocessor of the present invention comprises an upper layer including a plurality of signal sources and a lower layer including a plurality of neuron modules. Each neuron module comprises a plurality of input terminals, arranged in a predetermined order, to which signals from the signal sources are input; a connection weight storage unit that stores a connection weight for each input terminal; and a forward propagation arithmetic circuit that performs a forward propagation computation using the signal input at each input terminal and the connection weight corresponding to that input terminal. The neuroprocessor is characterized in that the connection relationship between the signal sources and the input terminals, arranged in the predetermined order, of the neuron modules differs from one neuron module to another.

In this configuration, a signal source may be a simple signal input terminal or a neuron module of an upper layer. For example, in a neuroprocessor with a three-layer structure of one input layer, one intermediate layer, and one output layer, the input layer is the “upper layer” if the intermediate layer is regarded as the “lower layer”, and the intermediate layer is the “upper layer” if the output layer is regarded as the “lower layer”.

In a preferred aspect, each neuron module of the lower layer comprises a back propagation arithmetic circuit and back propagation output terminals. Based on the result of the forward propagation arithmetic circuit and a given back propagation signal, the back propagation arithmetic circuit corrects the connection weight stored for each input terminal in the connection weight storage unit and generates a back propagation signal for each signal source of the upper layer. The back propagation output terminals output these back propagation signals, and each is connected to the back propagation input terminal of the corresponding signal source.

In this aspect, the back propagation signal is a teacher signal supplied from outside in the case of the output layer, and an error back propagation signal propagated back from the lower layer in the case of the other layers.

In another preferred aspect, the neuroprocessor comprises a first clock generation circuit that, in the learning mode, generates a learning forward propagation clock and a learning back propagation clock from a given reference clock. Each signal source of the upper layer is a neuron module, and the neuron modules of the upper layer and of the lower layer execute forward propagation computations and back propagation computations alternately in accordance with the learning forward propagation clock and the learning back propagation clock.

In a further preferred aspect, the learning forward propagation clock and the learning back propagation clock have a predetermined period that is at least as long as the time required for the series of forward and back propagation computations in the upper and lower layers. The learning back propagation clock lags the learning forward propagation clock in phase by a predetermined delay that is at least as long as the time required for the series of forward propagation computations in the upper and lower layers. The neuron modules of the upper layer execute the forward propagation computation on the falling edge of the learning forward propagation clock and the back propagation computation on the rising edge of the learning back propagation clock. The neuron modules of the lower layer execute the forward propagation computation on the rising edge of the learning forward propagation clock and the back propagation computation on the falling edge of the learning back propagation clock.

The best mode for carrying out the present invention (hereinafter the “embodiment”) is described below with reference to the drawings.

First, the system configuration of the neuroprocessor 101 of this embodiment is described with reference to FIG. 1.

The neuroprocessor 101 is a hardware-chip implementation of a hierarchical neural network with one intermediate layer. It consists of a clock generator 102, intermediate layer modules 103 to 105, and output layer modules 106 and 107. The chip learns according to the learning rule of the error back propagation method.

The neuroprocessor 101 illustrated in FIG. 1 is a network built to extract human skin regions from digital color images in the RGB (red, green, blue) color system. The input layer therefore has three units in total, one for each of the colors R, G, and B: an input terminal 111 for R data, an input terminal 112 for G data, and an input terminal 113 for B data. The output layer has two units, which compute the likelihood that the input color (R, G, B) is a skin color (called the skin-color degree) and the likelihood that it is not (called the non-skin-color degree). For example, the output layer module 106 outputs the skin-color degree and the output layer module 107 outputs the non-skin-color degree. In other words, the neuroprocessor 101 accepts the R, G, and B values that give the color of one pixel and outputs the skin-color degree and the non-skin-color degree of that color. In the example of FIG. 1, the intermediate layer has three units, the intermediate layer modules 103 to 105.

This unit configuration is only an example; the number of units in each layer can be changed freely to suit the problem domain.

The R, G, and B input data entering at the input terminals 111 to 113 on the left side of FIG. 1 are distributed to each of the intermediate layer modules 103 to 105. In this embodiment, the distribution function of the input layer is implemented by a dedicated bus 108 that connects the input terminals 111 to 113 to the intermediate layer modules 103 to 105. As the wiring in FIG. 1 shows, every intermediate layer module 103 to 105 receives the input data of all units of the input layer (that is, all of R, G, and B).

In this embodiment, the connections between the input terminals 111 to 113 of the input layer and the three inputs IN1 to IN3 of the intermediate layer modules 103 to 105 differ from module to module. In the intermediate layer module 103, the R, G, and B signals are input to IN1, IN2, and IN3 in that order. In the intermediate layer module 104, R is input to IN2, G to IN3, and B to IN1. In the intermediate layer module 105, R is input to IN3, G to IN1, and B to IN2.
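The effect of this shifted wiring can be illustrated with a minimal Python sketch. The wiring table follows the description; the pixel values, the shared weight set, the zero threshold, and the function names are assumptions chosen only for illustration, and the sigmoid is computed directly rather than through a lookup table.

```python
import math

# Which input terminal (R, G or B) drives IN1, IN2, IN3 of each
# intermediate layer module, as described for modules 103 to 105.
WIRING = {
    103: ("R", "G", "B"),  # R->IN1, G->IN2, B->IN3
    104: ("B", "R", "G"),  # R->IN2, G->IN3, B->IN1
    105: ("G", "B", "R"),  # R->IN3, G->IN1, B->IN2
}


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def forward(module_id, pixel, weights, threshold=0.0):
    """Product-sum over IN1..IN3 followed by the sigmoid transfer function."""
    inputs = [pixel[name] for name in WIRING[module_id]]
    s = sum(w * x for w, x in zip(weights, inputs)) + threshold
    return sigmoid(s)


# Hypothetical values: every module gets the SAME weight set (w1, w2, w3).
shared_weights = (0.1, 0.2, 0.3)
pixel = {"R": 0.9, "G": 0.5, "B": 0.2}

outputs = {m: forward(m, pixel, shared_weights) for m in WIRING}
for module_id, value in outputs.items():
    print(module_id, value)
```

With identical weights in every module, the three outputs still differ because each module pairs the weights with the inputs in a different order. Note that this only holds when the shared set itself contains distinct values; if w1, w2, and w3 were all equal, the permutation would have no effect.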

The intermediate layer modules 103 to 105 and the output layer modules 106 and 107 are connected by a dedicated bus 109 for forward propagation and a dedicated bus 110 for back propagation. Each output layer module 106, 107 receives the forward propagation outputs of all the intermediate layer modules 103 to 105. Each intermediate layer module 103 to 105 receives one back propagation output from each of the output layer modules 106 and 107.

In this embodiment, the connections between the three forward propagation outputs of the intermediate layer modules 103 to 105 and the three inputs IN1 to IN3 of the output layer modules 106 and 107 likewise differ from one output layer module to the other. In the output layer module 106, the signals of the intermediate layer modules 103, 104, and 105 are input to IN1, IN2, and IN3 in that order. In the output layer module 107, the output of the intermediate layer module 103 is input to IN2, that of 104 to IN3, and that of 105 to IN1.

In the circuit of FIG. 1, the back propagation outputs of the output layer modules 106 and 107 are connected to the back propagation inputs of the intermediate layer modules 103 to 105 in accordance with these connections. As FIG. 4 shows, the back propagation output DW1 of the output layer module 401 outputs a back propagation value computed with the forward propagation input IN1; in general, each back propagation output DW1 to DW3 outputs the back propagation value computed from the forward propagation input with the same number. That is, the output layer module uses the forward propagation input value from an intermediate layer module to generate the back propagation output for that intermediate layer module. Each back propagation output DW1 to DW3 of the output layer modules 106 and 107 is therefore connected to whichever of the intermediate layer modules 103 to 105 drives the forward propagation input IN1 to IN3 with the same number.
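The routing rule, in which DWk returns to the module that drives INk, can be sketched as a small Python table lookup. The forward wiring of modules 106 and 107 follows the description; the function name, the numeric values, and the order in which each intermediate module collects its two signals are assumptions for illustration.

```python
# Which intermediate layer module drives IN1, IN2, IN3 of each
# output layer module, as described for modules 106 and 107.
OUTPUT_WIRING = {
    106: (103, 104, 105),  # 103->IN1, 104->IN2, 105->IN3
    107: (105, 103, 104),  # 103->IN2, 104->IN3, 105->IN1
}


def route_back(dw_outputs):
    """Send each DWk to the intermediate module that drives INk.

    dw_outputs maps an output layer module to its (DW1, DW2, DW3) values.
    Returns, per intermediate module, the list of values it receives.
    """
    received = {103: [], 104: [], 105: []}
    for out_module, sources in OUTPUT_WIRING.items():
        for source, value in zip(sources, dw_outputs[out_module]):
            received[source].append(value)
    return received


# Hypothetical back propagation values, chosen to make the routing visible.
routed = route_back({106: (1.0, 2.0, 3.0), 107: (10.0, 20.0, 30.0)})
print(routed)
```

Each intermediate module ends up with exactly two values, one from each output layer module, which matches the two back propagation inputs described for the intermediate layer.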

The processing results of the neuroprocessor 101 are output from the output terminals 114 and 115 on the right side of FIG. 1. The output terminal 114 outputs the forward propagation output of the output layer module 106, and the output terminal 115 that of the output layer module 107. The neuroprocessor 101 also has teacher signal input terminals 116 and 117; the teacher signal from terminal 116 is input to the output layer module 106, and the teacher signal from terminal 117 to the output layer module 107.

The clock generator 102 receives a reference clock CLK from outside and generates from it a forward propagation clock CLK_FP, which controls the operation of the intermediate layer modules 103 to 105 and the output layer modules 106 and 107 during forward propagation, and a back propagation clock CLK_BP, which controls their operation during back propagation. CLK_FP and CLK_BP are supplied to each intermediate layer module 103 to 105 and each output layer module 106, 107 over dedicated clock signal lines. The clock generator 102 generates different clock signals in the learning mode and the recognition mode. The learning mode is the mode in which the neuroprocessor 101 performs learning, and the recognition mode is the mode in which it performs pattern recognition. The mode is specified from outside by a mode signal MODE. The clock generator 102 is described in detail later.

This completes the outline of the neuroprocessor 101. Before the intermediate layer modules 103 to 105, the output layer modules 106 and 107, and the operation of the neuroprocessor 101 are described in detail, the engineering model of the hierarchical neural network adopted in this embodiment is explained.

FIG. 2 shows the engineering model of a hierarchical neural network. FIG. 2(a) shows, as an example, a hierarchical neural network in which the input layer 201 has three units (neurons), the intermediate layer 202 has three units, and the output layer 203 has two units. FIG. 2(b) schematically shows the relationship between one unit 206 of a lower layer and the units 204 of the upper layer connected to it. For example, if the lower layer is the intermediate layer, the upper layer is the input layer; if the lower layer is the output layer, the upper layer is the intermediate layer. In this model, each unit 206 of the lower layer is connected to all units 204 of the upper layer, and each connection has a connection weight 205. Inside the unit 206, a product-sum operation (Σ) is performed on the forward propagation output values of the upper-layer units 204 and the connection weights 205, and the result is passed through a predetermined transfer function (f) to determine the output value of the unit 206. A sigmoid function is commonly used as the transfer function.
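Written out as a formula, the unit model of FIG. 2(b) takes the following form. The symbols are assumptions introduced here for illustration: x_i is the forward propagation output of upper-layer unit i, w_{ji} is the connection weight from that unit to lower-layer unit j, n is the number of upper-layer units, and the standard logistic function is taken as the sigmoid.

```latex
s_j = \sum_{i=1}^{n} w_{ji}\, x_i ,
\qquad
y_j = f(s_j),
\qquad
f(s) = \frac{1}{1 + e^{-s}}
```

The hardware modules described later also add a stored threshold (offset) to the sum before the transfer function is applied.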

Next, the hardware model of the internal configuration of the intermediate layer modules 103 to 105 shown in FIG. 1 is described with reference to FIG. 3. In this embodiment, the intermediate layer modules 103 to 105 share the same internal hardware configuration, which is shown in FIG. 3 as an intermediate layer module 301. In the figure, arrows indicate signal flow: solid arrows for forward propagation and broken arrows for back propagation.

The intermediate layer module 301 has three forward propagation inputs IN1, IN2, and IN3, which receive forward propagation input signals 309, 310, and 311 from the input terminals 111 to 113 of the input layer via the bus 108. The intermediate layer module 301 has a weight storage unit 302 that stores the connection weight w1 for the forward propagation input IN1, a weight storage unit 303 that stores the connection weight w2 for IN2, and a weight storage unit 304 that stores the connection weight w3 for IN3. During forward propagation, a multiplier 320 multiplies the input signal 309 from IN1 by the connection weight w1. Similarly, multipliers 321 and 322 multiply the signals 310 and 311 from IN2 and IN3 by the connection weights w2 and w3, respectively. A threshold storage unit 305 stores the threshold (offset) of the neuron. A Σ block 306 adds the products from the multipliers 320 to 322 and the threshold from the threshold storage unit 305. This is the product-sum operation of forward propagation.

A sigmoid function LUT (lookup table) unit 307 stores a table of the sigmoid function, which is the transfer function of the neuron. The sigmoid function LUT unit 307 receives the output value of the Σ block 306, reads the sigmoid function value corresponding to that input from the table, and outputs it. Passing the product-sum result through the sigmoid function LUT unit 307 normalizes it to the range 0 to 1. The value output by the sigmoid function LUT unit 307 is the output value 312 of the intermediate layer module 301. The output value 312 leaves the module at the forward propagation output OUT and is supplied to the output layer modules 106 and 107 via the dedicated forward propagation bus 109. This is the forward propagation configuration of the intermediate layer module 301.
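A table-based sigmoid of this kind might look like the following Python sketch. The description does not give the table size, the input range, or the quantization scheme, so `LUT_SIZE`, `S_MIN`, `S_MAX`, and the nearest-entry lookup are all assumptions made for illustration.

```python
import math

LUT_SIZE = 256             # assumed number of table entries
S_MIN, S_MAX = -8.0, 8.0   # assumed input range covered by the table
STEP = (S_MAX - S_MIN) / (LUT_SIZE - 1)


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


# Precompute the table once, as a ROM in hardware would hold it.
SIGMOID_LUT = [sigmoid(S_MIN + i * STEP) for i in range(LUT_SIZE)]


def lut_sigmoid(s):
    """Look up the nearest table entry; inputs outside the range saturate."""
    clamped = min(max(s, S_MIN), S_MAX)
    index = round((clamped - S_MIN) / STEP)
    index = min(max(index, 0), LUT_SIZE - 1)
    return SIGMOID_LUT[index]


for s in (-100.0, -1.0, 0.0, 1.0, 100.0):
    print(s, lut_sigmoid(s))
```

Every returned value lies between 0 and 1, which corresponds to the normalization described above. A table for the derivative of the sigmoid could be built the same way by storing `sigmoid(s) * (1 - sigmoid(s))` in each entry.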

Next, the back propagation configuration, shown with broken lines in the figure, is described. Back propagation is executed during learning. During learning, back propagation signals 313 and 314 arrive from the output layer modules 106 and 107. The back propagation signal 313 is input at the back propagation input DW1 and the back propagation signal 314 at the back propagation input DW2. An adder 323 sums the back propagation signals 313 and 314 and passes the result to a multiplier 324. The other input of the multiplier 324 receives the output of a sigmoid derivative LUT unit 308. The sigmoid derivative LUT unit 308 stores a table of the derivative of the sigmoid function held in the sigmoid function LUT unit 307. It receives the product-sum result computed by the Σ block 306 during forward propagation in the learning process and outputs the corresponding derivative value. The multiplier 324 multiplies the sum of the back propagation signals 313 and 314 by this derivative value to obtain an error signal A. The error signal A is input to an adder 325 and to multipliers 330, 332, and 334.

The adder 325 adds the error signal A to the threshold output by the threshold storage unit 305 to compute the updated threshold. The updated value is written to the threshold storage unit 305, which updates the threshold.

The multiplier 330 multiplies the input signal 309 from the forward propagation input IN1 by the error signal A. An adder 331 adds the product to the connection weight w1 from the weight storage unit 302, and the sum replaces the value of w1 held in the weight storage unit 302.

Similarly, multipliers 332 and 334 multiply the input signals 310 and 311 by the error signal A, and adders 333 and 335 add the products to the original connection weights w2 and w3, which updates the values of w2 and w3 held in the weight storage units 303 and 304.
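The update path of the intermediate layer module can be summarized in a short Python sketch. It follows the data flow described for FIG. 3, including the fact that no learning-rate factor is mentioned; the function names, the direct computation of the sigmoid derivative instead of a lookup table, and the numeric values are assumptions for illustration.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def sigmoid_derivative(s):
    y = sigmoid(s)
    return y * (1.0 - y)


def intermediate_backward(weights, threshold, inputs, dw1, dw2):
    """One back propagation step of an intermediate layer module.

    weights, inputs: values for IN1..IN3; dw1, dw2: back propagation
    signals received from the two output layer modules.
    """
    # Product-sum kept from the forward pass (sum block 306).
    s = sum(w * x for w, x in zip(weights, inputs)) + threshold
    # Adder 323 sums the incoming signals; multiplier 324 applies f'(s).
    error_a = (dw1 + dw2) * sigmoid_derivative(s)
    # Adder 325: threshold update.
    new_threshold = threshold + error_a
    # Multipliers 330/332/334 and adders 331/333/335: weight updates.
    new_weights = [w + error_a * x for w, x in zip(weights, inputs)]
    return new_weights, new_threshold, error_a


# Hypothetical values chosen so the result is easy to check by hand:
# s = 0, f'(0) = 0.25, A = (0.3 + 0.1) * 0.25 = 0.1.
new_w, new_t, a = intermediate_backward(
    weights=[0.0, 0.0, 0.0], threshold=0.0,
    inputs=[1.0, 0.5, 0.0], dw1=0.3, dw2=0.1,
)
print(new_w, new_t, a)
```

If a learning rate were wanted, it would scale `error_a` before the updates; the description above does not mention one, so the sketch leaves it out.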

This completes the description of the configuration of the intermediate layer module 301 and of the forward and back propagation processing it performs.

Next, the hardware model of the output layer modules 106 and 107 is described with reference to FIG. 4. The output layer modules 106 and 107 share the same internal hardware configuration, which is shown in FIG. 4 as an output layer module 401.

The output layer module 401 has three forward propagation inputs IN1, IN2, and IN3, which receive forward propagation input signals 409, 410, and 411 from the three intermediate layer modules 103 to 105. The output layer module 401 has a weight storage unit 402 that stores the connection weight w1 for the forward propagation input IN1, a weight storage unit 403 that stores the connection weight w2 for IN2, and a weight storage unit 404 that stores the connection weight w3 for IN3. During forward propagation, multipliers 420, 421, and 422 multiply the signals 409, 410, and 411 from IN1, IN2, and IN3 by the connection weights w1, w2, and w3, respectively. A threshold storage unit 405 stores the threshold (offset) of the neuron. A Σ block 406 sums the products from the multipliers 420 to 422 and the threshold from the threshold storage unit 405 and outputs the result.

A sigmoid function LUT unit 407 stores a table of the same sigmoid function as in the intermediate layer module. It receives the output value of the Σ block 406 and outputs the corresponding sigmoid function value from the table. The value normalized to the range 0 to 1 by the sigmoid function LUT unit 407 is the output value 412 of the output layer module 401 and leaves the module at the forward propagation output OUT. This is the forward propagation configuration of the output layer module 401.

Next, the back propagation configuration is described. As in the intermediate layer module, back propagation is executed during learning. During learning, a teacher signal 413 is input at the teacher signal input TCH and supplied to one input of an adder 423. The other input of the adder 423 receives the forward propagation output value of the sigmoid function LUT unit 407 after its sign has been inverted by an inverter 418. The adder 423 therefore outputs the difference obtained by subtracting the forward propagation output of the output layer module 401 from the teacher signal 413. This difference is applied to one input of a multiplier 424, whose other input receives the output of a sigmoid derivative LUT unit 408. The sigmoid derivative LUT unit 408 stores a table of the derivative of the sigmoid function held in the sigmoid function LUT unit 407; it receives the product-sum result computed by the Σ block 406 during forward propagation and outputs the corresponding derivative value. The multiplier 424 multiplies the difference by this derivative value to obtain an error signal A. The error signal A is input to an adder 425 and to multipliers 430, 432, and 434.

The adder 425 computes the updated threshold by adding the error signal A to the threshold output from the threshold storage unit 405. The updated value is written to the threshold storage unit 405, which updates the threshold.

The multipliers 430, 432, and 434 multiply the input signals 409, 410, and 411 from the forward propagation inputs IN1, IN2, and IN3, respectively, by the error signal A. The adders 431, 433, and 435 add these products to the coupling weights w1, w2, and w3 read from the weight storage units 402, 403, and 404, respectively. The sums become the updated values of the coupling weights w1, w2, and w3 held in the weight storage units 402, 403, and 404.

The multipliers 436, 437, and 438 also multiply the error signal A by the coupling weights w1, w2, and w3, respectively. The products are output from the back propagation outputs DW1, DW2, and DW3 as the back propagation values 414, 416, and 417, respectively, and are supplied to the intermediate layer modules 103 to 105 via the dedicated back propagation bus 110.
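The forward and backward data paths of the output layer module described above can be summarized in a short floating-point sketch. It is only an illustration under stated assumptions: the function name `output_module_step` is hypothetical, the threshold is assumed to enter the product-sum as an additive bias, the back propagation values are assumed to be computed from the weights before they are updated, and exact `math.exp` sigmoids stand in for the LUT units 407 and 408.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def output_module_step(x, w, theta, teacher):
    """One forward and one backward pass of an output layer module.

    x: inputs at IN1..IN3, w: coupling weights w1..w3,
    theta: threshold, teacher: teacher signal at TCH.
    Returns (out, new_w, new_theta, dw).
    """
    # Forward: product-sum (threshold treated as a bias term, an assumption)
    net = sum(wi * xi for wi, xi in zip(w, x)) + theta
    out = sigmoid(net)
    # Error signal A = (teacher - output) * sigmoid'(net)
    a = (teacher - out) * sigmoid_deriv(net)
    # Back propagation values DW1..DW3 from the pre-update weights (assumption)
    dw = [a * wi for wi in w]
    # Threshold and weight updates: add A, and A times each input
    new_theta = theta + a
    new_w = [wi + a * xi for wi, xi in zip(w, x)]
    return out, new_w, new_theta, dw
```

Note that no separate learning-rate factor appears in the described data path, so the sketch applies the error signal directly.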

The above is the configuration of the output layer module 401 and the forward and back propagation processing it performs.

Next, with reference to FIG. 5, the clock signals generated by the clock generator 102 in each mode, and the operation of each module based on them, will be described.

From the reference clock 501 supplied externally to its CLK input, the clock generator 102 generates two kinds of clocks: forward propagation clocks 502 and 504, which control forward propagation, and back propagation clocks 503 and 505, which control back propagation. Learning and recognition (test) processing by the neurons of the intermediate layer modules 103 to 105 and the output layer modules 106 and 107 is controlled by the forward propagation clock 502 or 504 and the back propagation clock 503 or 505. The clock generator 102 switches between a learning mode and a recognition mode according to a mode setting signal supplied externally to its MODE input.

First, in the learning mode, the clock generator 102 generates the forward propagation clock 502 and the back propagation clock 503, outputting the former from the forward propagation clock output CLK_FP and the latter from the back propagation clock output CLK_BP.

In the example of FIG. 5, one learning iteration takes four reference clock cycles. First, the forward propagation clock 502 is driven low at the first rising edge of the reference clock 501. Triggered by this falling edge of the forward propagation clock 502, the intermediate layer modules 103 to 105 perform forward propagation in parallel. The forward propagation clock 502 is then driven high at the second rising edge of the reference clock 501. Triggered by this rising edge of the forward propagation clock 502, the output layer modules 106 and 107 perform forward propagation in parallel.

Next, the back propagation clock 503 is driven low at the third rising edge of the reference clock 501. Triggered by this falling edge of the back propagation clock 503, the output layer modules 106 and 107 perform back propagation. The back propagation clock 503 is then driven high at the fourth rising edge of the reference clock 501, and this rising edge triggers the intermediate layer modules 103 to 105 to perform back propagation.

In this way, one learning iteration is executed per period of the forward propagation clock 502 and the back propagation clock 503. That is, learning data is applied to the input layer of the neuroprocessor 101 and propagated forward to obtain the output of the output layer, and a teacher signal is then applied to the output layer and propagated backward to update the thresholds and coupling weights of the output layer modules and the intermediate layer modules. The clock period therefore only needs to be at least the total time required for forward propagation through each layer of the neuroprocessor 101 and the back propagation that follows.
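The four-clock learning schedule can be sketched as a small edge-detection model. This is an illustrative software model rather than the clock generator circuit itself: the names `learning_clocks`, `ACTIONS`, and `triggered_actions` are hypothetical, and both clocks are assumed to idle at H level before the first iteration.

```python
def learning_clocks(n_iterations):
    """Return (CLK_FP, CLK_BP) levels per reference clock in learning mode.

    1 = H level, 0 = L level. Each iteration spans four reference clocks.
    """
    levels = []
    for _ in range(n_iterations):
        levels += [(0, 1),  # 1st clock: CLK_FP falls
                   (1, 1),  # 2nd clock: CLK_FP rises
                   (1, 0),  # 3rd clock: CLK_BP falls
                   (1, 1)]  # 4th clock: CLK_BP rises
    return levels

# Processing triggered by each clock edge, as described in the text
ACTIONS = {
    ("FP", "fall"): "intermediate forward",
    ("FP", "rise"): "output forward",
    ("BP", "fall"): "output backward",
    ("BP", "rise"): "intermediate backward",
}

def triggered_actions(levels):
    """Detect clock edges and list the triggered processing in order."""
    prev_fp, prev_bp = 1, 1  # both clocks assumed idle at H level
    actions = []
    for fp, bp in levels:
        for name, prev, cur in (("FP", prev_fp, fp), ("BP", prev_bp, bp)):
            if cur != prev:
                edge = "rise" if cur > prev else "fall"
                actions.append(ACTIONS[(name, edge)])
        prev_fp, prev_bp = fp, bp
    return actions
```

Calling `triggered_actions(learning_clocks(1))` lists the four processing steps of one learning iteration in the order given in the text.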

In the example of FIG. 5, one period of the forward propagation clock 502 and of the back propagation clock 503 is four reference clock periods long. The forward propagation clock 502 is at L (low) level for 1/4 of its period (that is, one reference clock period) and at H (high) level for the remaining 3/4. Because the forward propagation clock 502 is generated from the rising edges of the reference clock in this example, its L-level period equals one period of the reference clock 501, but this is only an example. In principle, the L-level period of the forward propagation clock 502 only needs to be at least the time the intermediate layer modules 103 to 105 take to process the signals received at inputs IN1 to IN3 in their internal forward propagation circuits and generate the signal output from OUT. Similarly, the L-level period of the back propagation clock 503 only needs, in principle, to be at least the time the output layer modules 106 and 107 take to process the teacher signal received at the teacher signal input TCH in their internal back propagation circuits and generate the signals output from the back propagation outputs DW1 to DW3. Likewise, the interval from the rising edge of the forward propagation clock 502 to the next falling edge of the back propagation clock 503 only needs, in principle, to be at least the time the output layer modules 106 and 107 take to process the signals received at inputs IN1 to IN3 in their internal forward propagation circuits and generate the signal output from OUT. Accordingly, the back propagation clock 503 only needs to lag the forward propagation clock 502 in phase by at least the total time required for forward propagation through the layers of the neuroprocessor 101.
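The timing conditions stated in this and the preceding paragraphs can be collected as inequalities. All symbols are introduced here for illustration only and do not appear in the original: $t^{\mathrm{fwd}}_{\mathrm{mid}}$ and $t^{\mathrm{bwd}}_{\mathrm{mid}}$ are the forward and back propagation times of the intermediate layer modules, $t^{\mathrm{fwd}}_{\mathrm{out}}$ and $t^{\mathrm{bwd}}_{\mathrm{out}}$ those of the output layer modules, $T_L$ denotes an L-level period, $\Delta$ the interval from the rising edge of the forward propagation clock to the next falling edge of the back propagation clock, $\varphi$ the phase lag of the back propagation clock, and $T$ the common clock period.

```latex
\begin{aligned}
T_{L}^{\mathrm{FP}} &\ge t^{\mathrm{fwd}}_{\mathrm{mid}}, \\
T_{L}^{\mathrm{BP}} &\ge t^{\mathrm{bwd}}_{\mathrm{out}}, \\
\Delta &\ge t^{\mathrm{fwd}}_{\mathrm{out}}, \\
\varphi &\ge t^{\mathrm{fwd}}_{\mathrm{mid}} + t^{\mathrm{fwd}}_{\mathrm{out}}, \\
T &\ge t^{\mathrm{fwd}}_{\mathrm{mid}} + t^{\mathrm{fwd}}_{\mathrm{out}}
      + t^{\mathrm{bwd}}_{\mathrm{out}} + t^{\mathrm{bwd}}_{\mathrm{mid}}.
\end{aligned}
```

In the FIG. 5 example each of these intervals is set to one reference clock period, which satisfies the conditions provided every module finishes its computation within one reference clock.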

By using the forward propagation clock 502 and the back propagation clock 503 described above, forward propagation for one item of learning data and back propagation based on the corresponding teacher signal can be executed alternately, in other words together within each single learning iteration. Because forward and back propagation alternate during learning, it is easy to associate the forward propagation result for an item of learning data with the teacher signal for that item. In the apparatus of Patent Document 3, for example, forward propagation results are first computed for all of the learning data and accumulated in a storage device, and error back propagation learning is then performed by supplying the corresponding teacher data to the accumulated results in order. That method requires managing the correspondence between the accumulated forward propagation results and the teacher data. When forward and back propagation alternate as in the present embodiment, no such bookkeeping is needed.

Of course, the forward and back propagation clocks used for learning may also have their H and L levels inverted relative to those shown in FIG. 5. In that case, the intermediate layer modules perform forward propagation on the rising edge of the forward propagation clock and the output layer modules on its falling edge, while the output layer modules perform back propagation on the rising edge of the back propagation clock and the intermediate layer modules on its falling edge.

Next, processing in the recognition mode will be described. In the recognition mode, the clock generator 102 generates the forward propagation clock 504 and the back propagation clock 505, outputting the former from the forward propagation clock output CLK_FP and the latter from the back propagation clock output CLK_BP.

In this example, one recognition operation takes two reference clock cycles. First, the forward propagation clock 504 is driven low at the first rising edge of the reference clock 501. This triggers the intermediate layer modules 103 to 105 to perform forward propagation in parallel. Next, the forward propagation clock 504 is driven high at the second rising edge of the reference clock 501. This triggers the output layer modules 106 and 107 to perform forward propagation. Because no back propagation is performed during recognition, the back propagation clock 505 is a flat signal.

In this example, one period of the forward propagation clock 504 used for recognition is two cycles of the reference clock 501, but in principle the period only needs to be at least the sum of the forward propagation times of the layers in the neuroprocessor 101. Also in principle, the L-level period of the forward propagation clock 504 only needs to be at least the time required for forward propagation in the intermediate layer modules 103 to 105, and the H-level period at least the time required for forward propagation in the output layer modules 106 and 107.

The circuit configuration and operation of the neuroprocessor 101 have been described above. As the description shows, in this embodiment the layers are connected by dedicated buses and a common forward propagation clock and back propagation clock are supplied to the modules 103 to 107 of each layer, so that the modules can operate fully in parallel. In the learning mode, using a forward propagation clock and a back propagation clock with an appropriate phase relationship allows recognition of learning data and error back propagation learning with the corresponding teacher data to be executed alternately.

In addition, as described above, in this embodiment the connections between the input terminals 111 to 113 of the input layer and the inputs IN1 to IN3 of the intermediate layer modules 103 to 105 are shifted from one intermediate layer module to the next, and the connections between the forward propagation outputs of the intermediate layer modules 103 to 105 and the forward propagation inputs IN1 to IN3 of the output layer modules 106 and 107 are shifted from one output layer module to the next. This has the advantage that the initial values of the coupling weights are easy to set for each module. This point will be described with reference to FIG. 6.

FIG. 6 shows how the output values differ when the same initial coupling weights are set in the three intermediate layer modules 103 to 105, between (a) the conventional case in which the connections between the input layer and the intermediate layer are not shifted and (b) the case in which they are shifted as in this embodiment. In the illustrated example, the initial coupling weights w1 = 0.2, w2 = 0.6, and w3 = 0.5 are set in all of the intermediate layer modules 103 to 105.

With these initial values and no shift of the connections (a), when the R, G, and B inputs are applied from the input layer, the intermediate layer modules 103, 104, and 105 all output the same value (0.42 in this example). If all of the intermediate layer modules 103 to 105 output the same value, then even when error back propagation learning is applied, every intermediate layer module changes its coupling weights and threshold based on the same output value and the same back propagation value. The modules all change in the same way, the neural network stays in a uniform state, and learning does not progress.

In contrast, if the connections are shifted as in (b), the output values for a given input differ from module to module even when the coupling weights in the intermediate layer modules 103 to 105 are the same. When the output values differ among the intermediate layer modules, the state of the neural network is no longer uniform and learning progresses.

Therefore, in this embodiment, an initial setting under which learning progresses is obtained simply by setting a common set of weights (w1, w2, w3) in the intermediate layer modules 103 to 105. To do so, the weight storage units 302 of the intermediate layer modules 103 to 105 are tied to a common initial value setting line, the units 303 and the units 304 are likewise each tied to their own common initial value setting line, and the corresponding initial weight is supplied on each line. Unless a set of initial values with w1 = w2 = w3 is supplied, the shifted connections make the forward propagation results tend to differ from module to module, so learning progresses readily.
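The effect of shifting the connections can be checked with a small numerical sketch. The function `hidden_outputs` is hypothetical, the rotation direction of the shift is an assumption (the original only states that the connections differ per module), thresholds are taken as 0, and an exact sigmoid stands in for the LUT. The input values used in any call are likewise illustrative and are not those of FIG. 6.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_outputs(inputs, weights, shift):
    """Forward outputs of three intermediate modules sharing one weight set.

    With shift=True, module k sees the inputs rotated by k positions
    (one possible shift pattern; the rotation direction is an assumption).
    Thresholds are taken as 0 for simplicity.
    """
    outs = []
    for k in range(3):
        x = inputs[k:] + inputs[:k] if shift else inputs
        net = sum(w * xi for w, xi in zip(weights, x))
        outs.append(sigmoid(net))
    return outs
```

For example, `hidden_outputs([0.9, 0.4, 0.1], [0.2, 0.6, 0.5], shift=False)` returns three identical values, while the same call with `shift=True` returns three different ones. With equal weights such as `[0.3, 0.3, 0.3]` the shifted outputs coincide again, which is the exception noted in the text.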

Even in a circuit configuration that does not shift the connections, the same effect could be obtained by adding an internal circuit that sets the three initial coupling weights in the weight storage units 302 to 304 of each intermediate layer module while shifting them from module to module. However, this has the disadvantage of increasing the circuit scale of the neuroprocessor by the size of that internal circuit.

In contrast, the circuit configuration of this embodiment, which shifts the connections, keeps the increase in circuit scale down. Moreover, when the neuroprocessor 101 is built with a programmable logic device such as an FPGA (Field Programmable Gate Array), the shift is easy to implement: in the circuit description written in a hardware description language such as VHDL, it is enough to shift the relationships written in the port map between the input layer and the intermediate layer for each intermediate layer module.

The shift of the connections between the input layer and the intermediate layer has been described above, and the same applies to the connections between the intermediate layer and the output layer. In the embodiment the connections are shifted both between the input and intermediate layers and between the intermediate and output layers, but shifting only one of them also provides some benefit.

The embodiment above is a three-layer neural network with a single intermediate layer, but a neural network with two or more intermediate layers can be configured in the same manner.

Next, a specific example of the neuroprocessor 101 based on this embodiment will be described.

In one specific example, the neuroprocessor 101 uses the VGA (Video Graphics Array) pixel clock of 24.5454 MHz as its reference clock. Because one learning iteration takes four clocks during learning, about 100,000 learning iterations (24,545,400 ÷ 4 ÷ 60 ≈ 100,000) can be executed per frame for images drawn at 60 frames per second. For example, the data of 100,000 pixels of one frame can be supplied to the input layer one pixel at a time and learned (the teacher signal value for each pixel is of course supplied to the output layer at the same time). If one frame has more than 100,000 pixels, regions to be used for learning can be set in advance, with a total of no more than 100,000 pixels, and the pixel data of those learning regions supplied to the neuroprocessor 101 in order.
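The per-frame budget quoted above follows from simple arithmetic, sketched here. The constant and function names are illustrative, not part of the original.

```python
PIXEL_CLOCK_HZ = 24_545_400   # VGA pixel clock used as the reference clock
FRAMES_PER_SECOND = 60

def operations_per_frame(clocks_per_operation):
    """Number of operations that fit in one frame period."""
    return PIXEL_CLOCK_HZ / clocks_per_operation / FRAMES_PER_SECOND

# Four clocks per learning iteration gives roughly 100,000 iterations per frame
learning_per_frame = operations_per_frame(4)
```

The exact quotient is slightly above 100,000, so the figure in the text is a rounded, conservative budget.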

FIG. 8 shows an example of how such learning regions are set. In this example, a 64 × 64 pixel target region 702 is set at the center of the 640 × 480 pixel VGA screen 701, and four 32 × 32 pixel non-target regions 703, 704, 705, and 706 are set around it. The target region 702 is where an image of a color to be learned as skin color is placed, and the non-target regions 703 to 706 are where images of colors to be learned as not skin color are placed. Accordingly, when pixel data from the target region 702 is supplied to the input of the neuroprocessor 101 as learning data, the teacher signal supplied to the output layer module 106, which outputs the degree of skin color, is for example "1", and that supplied to the output layer module 107, which outputs the degree of non-skin color, is for example "0". Conversely, when pixel data from the non-target regions 703 to 706 is supplied as learning data, the teacher signal supplied to the output layer module 106 is for example "0", and that supplied to the output layer module 107 is for example "1".
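The region sizes and teacher signal assignment can be written down compactly. This sketch assumes that output layer module 106 outputs the degree of skin color and module 107 the degree of non-skin color; the names `teacher_signals` and `learning_pixels_per_frame` are hypothetical.

```python
TARGET_SIZE = 64        # target region 702 is 64 x 64 pixels
NON_TARGET_SIZE = 32    # each of regions 703 to 706 is 32 x 32 pixels
NUM_NON_TARGET = 4

def teacher_signals(in_target_region):
    """Return (teacher for module 106 'skin', teacher for module 107 'non-skin')."""
    return (1.0, 0.0) if in_target_region else (0.0, 1.0)

def learning_pixels_per_frame():
    """Return (target pixel count, non-target pixel count) for one frame."""
    target = TARGET_SIZE * TARGET_SIZE
    non_target = NUM_NON_TARGET * NON_TARGET_SIZE * NON_TARGET_SIZE
    return target, non_target
```

With these sizes the target and non-target regions contribute the same number of pixels per frame, and their total stays well within the per-frame learning budget of about 100,000 iterations.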

In the recognition process, one recognition operation can be performed every two clocks. Recognition can therefore be performed on one pixel in every two for a non-interlaced image, and on every pixel before up-scanning for an interlaced image.

Also in one specific example, numerical values inside the neuroprocessor 101 are represented in 16-bit fixed point, as shown in FIG. 7. The 16 bits consist of an 8-bit fraction part 603, a 7-bit integer part 602, and a 1-bit sign part 601. The resolution is therefore 0.00390625, the maximum value 127.99609375, and the minimum value -127.99609375.

The input to the neuroprocessor 101 is 8 bits for each of R, G, and B, and these 8 bits are treated as a fraction between 0 and 1. The intermediate layer modules 103 to 105 that receive this input pad the 7-bit integer part and the 1-bit sign part above those 8 bits with zeros to form 16-bit fixed-point data, and perform addition and multiplication on 16-bit fixed-point data. The coupling weights and thresholds are set in the same 16-bit fixed-point format. Although computation inside a module is thus performed in 16-bit fixed point, the output is normalized to a value from 0 to 1 by the sigmoid function LUT unit 307 and output as 8-bit data. The output layer modules 106 and 107 that receive it pad the upper bits of the 8-bit data with zeros to form 16-bit fixed-point data, as in the intermediate layer, and perform their internal computation.
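The 16-bit format and the zero-padding of 8-bit inputs can be illustrated as follows. This is a sketch under an assumption: a sign-magnitude layout is inferred from the symmetric maximum and minimum values quoted above, and the original does not state the encoding of negative numbers. The helper names `encode`, `decode`, and `pad_input` are hypothetical.

```python
FRAC_BITS = 8
INT_BITS = 7
SCALE = 1 << FRAC_BITS                        # 256 steps per unit
MAX_MAG = (1 << (INT_BITS + FRAC_BITS)) - 1   # largest 15-bit magnitude

def encode(value):
    """Encode a real number as a 16-bit sign-magnitude word (saturating)."""
    mag = min(int(round(abs(value) * SCALE)), MAX_MAG)
    sign = 1 if value < 0 else 0
    return (sign << 15) | mag

def decode(word):
    """Decode a 16-bit sign-magnitude word back to a real number."""
    mag = word & MAX_MAG
    return -mag / SCALE if word >> 15 else mag / SCALE

def pad_input(byte):
    """Zero-pad an 8-bit input into the fraction field of a 16-bit word."""
    return byte & 0xFF
```

Under this layout an 8-bit input of 128 decodes to 0.5, and the largest input, 255, decodes to 255/256, just below 1.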

When a neural network is written in a high-level language such as C and executed sequentially on a von Neumann computer, double-precision floating point is normally used. In contrast, the inventors have confirmed through simulation experiments that results obtained with a 16-bit fixed-point representation compare well with those obtained with double-precision floating point. The results of this comparison are described later with reference to FIG. 11.

In connection with this use of 16-bit fixed-point data, this specific example approximates the sigmoid function by a step function and the derivative of the sigmoid function by a pyramid-shaped function.

Specifically, as shown in FIG. 9(a), the sigmoid function used in forward propagation (broken line) is approximated by a 23-step step function (solid line). As shown in FIG. 9(b), the first derivative of the sigmoid function used in back propagation (solid line) is approximated by a 12-level pyramid-shaped function (broken line). The step width of each of these functions along the x axis is 0.5. The sigmoid function LUT units 307 and 407 hold a table representing the step function of FIG. 9(a), and the sigmoid derivative LUT units 308 and 408 hold a table representing the pyramid-shaped function of FIG. 9(b).
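One table layout consistent with these figures is sketched below. It is an assumption, not the contents of FIG. 9: bins of width 0.5 centered at -5.5 to +5.5 give 23 steps, and because the derivative is an even function the same bins yield 12 distinct levels. The names `K_MAX`, `bin_index`, `sigmoid_lut`, and `sigmoid_deriv_lut` are hypothetical, and the table values here are exact sigmoid samples rather than the quantized values the hardware would store.

```python
import math

STEP = 0.5      # step width along the x axis (from the text)
K_MAX = 11      # assumed: bins centered at -5.5 ... +5.5 give 23 steps

def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Staircase table: one sigmoid value per bin index k (bin center k * STEP)
SIGMOID_LUT = {k: _sigmoid(k * STEP) for k in range(-K_MAX, K_MAX + 1)}

# Pyramid table: the derivative is even, so it is built from |k|
DERIV_LUT = {k: _sigmoid(abs(k) * STEP) * (1.0 - _sigmoid(abs(k) * STEP))
             for k in range(-K_MAX, K_MAX + 1)}

def bin_index(x):
    """Map x to the nearest bin index, clamped to the table range."""
    k = math.floor(x / STEP + 0.5)
    return max(-K_MAX, min(K_MAX, k))

def sigmoid_lut(x):
    return SIGMOID_LUT[bin_index(x)]

def sigmoid_deriv_lut(x):
    return DERIV_LUT[bin_index(x)]
```

Inputs beyond the table range saturate at the outermost step, which mirrors how a hardware table with a finite address range would behave.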

The neuroprocessor of this embodiment is intended to be provided as an IP (Intellectual Property) core, and in the example of FIG. 10 it is implemented on an FPGA 804. This implementation is at the stage where its effectiveness has been confirmed for learning and extracting skin regions in real time from general scene images. A system for extracting skin regions is therefore described here as an example. However, the method of this embodiment is not limited to systems for skin region extraction and can be extended to pattern recognition in general, including shapes and speech.

In this implementation, an Altera Stratix series EP1S80 is mounted on the board 807 as the FPGA 804. A digital video decoder 805 is mounted as the image input LSI, and a video D/A converter 806 as the image output LSI. An image from the CCD camera 808 passes through the digital video decoder 805 and is input as digital data to the image processing block 802 in the FPGA 804. The image data is passed one pixel at a time, in order, to the neuroprocessor unit 801 in the FPGA 804 via the image processing block 802. The neuroprocessor unit 801 performs the processing described above on the received data, and the result is input to the image processing block 802. In the recognition mode, the image processing block 802 determines, for example from the skin-color and non-skin-color values of each pixel supplied by the neuroprocessor unit 801, whether the pixel belongs to a skin-color region. It then masks the values of pixels in the image data from the digital video decoder 805 that do not belong to a skin-color region and outputs only the values of pixels that do. In this way, an image of a face region usable for face recognition, for example, can be extracted. The output of the image processing block 802 is converted to analog data by the video D/A converter 806 and displayed on the VGA display 809. Because interlaced image data is input from the digital video decoder 805 while the video D/A converter 806 handles non-interlaced image data, the image processing block 802 performs the interlaced-to-non-interlaced conversion, that is, the up-scan processing. The RAM 803 is used as working memory for processing by the image processing block 802 and other blocks.
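The masking step in the recognition mode can be sketched as below. The decision rule is an assumption: the original only says that the judgment is based on the skin-color and non-skin-color values, and a simple comparison of the two is used here for illustration. The function name `mask_non_skin` and the black fill value are hypothetical.

```python
def mask_non_skin(pixels, skin, non_skin, fill=(0, 0, 0)):
    """Keep pixels judged to be skin and replace the rest with `fill`.

    pixels: list of (R, G, B) tuples; skin / non_skin: per-pixel outputs
    of the two output layer modules. The rule skin > non_skin is an
    assumption made for this sketch.
    """
    return [p if s > n else fill
            for p, s, n in zip(pixels, skin, non_skin)]
```

A threshold on the skin-color value alone would be another plausible rule; the comparison is chosen here only because it needs no extra parameter.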

Finally, with reference to FIG. 11, the extraction result 901, obtained by extracting skin regions with the apparatus of the specific implementation described above, is compared with the extraction result 902, obtained with a double-precision floating-point model executed on an ordinary von Neumann computer. In the implementation of this embodiment, the error rate converged to 1.746094 after 100,000 learning iterations. In the double-precision floating-point model, the error rate converged to 0.01 after 21,155 learning iterations. Because further learning could lead to overfitting, learning was stopped at 0.01. The extraction result 901 of the implementation shows false detections, particularly in the background region centered on the lower right of the screen, and missed detections below the right eye and on the upper part of the nose. Nevertheless, the result is comparable to the extraction result 902 of the double-precision floating-point model. The configuration of this embodiment can therefore be regarded as a hardware model that performs neural network learning and recognition in parallel and at high speed, at a level quite close to that of the software model (the double-precision floating-point model).

FIG. 1 is a diagram showing the configuration of the neuroprocessor of the embodiment.
FIG. 2 is a diagram for explaining the engineering model of a hierarchical neural network.
FIG. 3 is a diagram showing the internal configuration of an intermediate layer module.
FIG. 4 is a diagram showing the internal configuration of an output layer module.
FIG. 5 is a diagram showing the clock signals generated by the clock generator.
FIG. 6 is a diagram for explaining the effect of shifting the connection relationships.
FIG. 7 is a diagram explaining the fixed-point data representation inside each module in the specific example.
FIG. 8 is a diagram showing an example of a learning region set in the screen.
FIG. 9 is a diagram showing examples of the approximation functions for the sigmoid function and the sigmoid derivative function in the specific example.
FIG. 10 is a diagram showing a specific example of an image processing system using the neuroprocessor.
FIG. 11 is a diagram showing a comparison between the neuroprocessor of the specific example, which performs its internal data processing in fixed point, and a software neural network that performs double-precision floating-point operations.

Explanation of symbols

101: neuroprocessor; 102: clock generator; 103 to 105: intermediate layer modules; 106, 107: output layer modules; 108 to 110: buses; 111 to 113: input terminals; 114, 115: output terminals; 116, 117: teacher signal input terminals.

Claims

A neuroprocessor comprising:
an upper layer containing a plurality of signal sources; and
a lower layer containing a plurality of neuron modules,
wherein each of the neuron modules comprises: a plurality of input terminals, arranged in a predetermined order, to which signals are input from the signal sources; a coupling load storage unit that stores a coupling load for each of the input terminals; and a forward propagation operation circuit that performs a forward propagation operation using the signal input from each input terminal and the coupling load corresponding to that input terminal, and
wherein the connection relationship between the signal sources and the input terminals arranged in the predetermined order differs from one neuron module to another.
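
As an illustration of the connection shift recited in this claim, the following Python sketch models three lower-layer modules that all start from the same set of coupling loads. The cyclic rotation in `rotate`, the sigmoid activation, and all numeric values are assumptions chosen for illustration; they are not taken from the specification, which requires only that the connection relationship differ between modules.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def module_output(weights, inputs):
    """Forward propagation of one neuron module: weighted sum, then sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))


def rotate(signals, shift):
    """Hypothetical wiring: module k receives the sources cyclically shifted by k."""
    shift %= len(signals)
    return signals[shift:] + signals[:shift]


# Every module holds the SAME coupling loads, one per input terminal IN1..IN3
# (illustrative values only).
shared_weights = [0.5, -0.25, 0.1]
sources = [1.0, 2.0, 3.0]  # signals from the three upper-layer sources

# Identical wiring: all three modules compute exactly the same output.
unshifted = [module_output(shared_weights, sources) for _ in range(3)]

# Shifted wiring: module k sees the sources in a different order.
shifted = [module_output(shared_weights, rotate(sources, k)) for k in range(3)]

print("identical wiring:", unshifted)
print("shifted wiring:  ", shifted)
```

With identical wiring the three outputs coincide, so modules that start from the same loads cannot be told apart. With the shifted wiring the outputs differ even though the stored loads are the same, which is the property the abstract relies on to let learning progress from a uniform initialization.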

The neuroprocessor according to claim 1, wherein
each neuron module in the lower layer further comprises: a back propagation operation circuit that corrects the coupling load stored in the coupling load storage unit for each of the input terminals, based on the operation result of the forward propagation operation circuit and a given back propagation signal, and that generates a back propagation signal for each signal source in the upper layer; and back propagation output terminals that output the respective back propagation signals, and
each back propagation output terminal is connected to a back propagation input terminal of the corresponding signal source.

The neuroprocessor according to claim 2, further comprising
a first clock generation circuit that, in a learning mode, generates a learning forward propagation clock and a learning back propagation clock from a given reference clock, wherein
each signal source in the upper layer is a neuron module, and
the neuron modules of the upper layer and the neuron modules of the lower layer alternately execute the forward propagation operation and the back propagation operation in accordance with the learning forward propagation clock and the learning back propagation clock.

The neuroprocessor according to claim 3, wherein
the learning forward propagation clock and the learning back propagation clock each have a predetermined period equal to or longer than the time required for a series of forward propagation operations and back propagation operations in the upper layer and the lower layer, and the learning back propagation clock lags the learning forward propagation clock by a predetermined delay time equal to or longer than the time required for a series of forward propagation operations in the upper layer and the lower layer,
the neuron modules of the upper layer execute the forward propagation operation triggered by the falling edge of the learning forward propagation clock and execute the back propagation operation triggered by the rising edge of the learning back propagation clock, and
the neuron modules of the lower layer execute the forward propagation operation triggered by the rising edge of the learning forward propagation clock and execute the back propagation operation triggered by the falling edge of the learning back propagation clock.
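
The edge assignments in this claim can be pictured with a small Python sketch that lists which operation fires at each clock edge. The edge times, the period of 8 units, and the delay of 4 units are hypothetical values picked so that each operation has 2 units to finish; the actual waveforms are those of FIG. 5 and are not reproduced here.

```python
# Illustrative edge times within one clock period (arbitrary units, all assumed).
PERIOD = 8                      # >= time for all forward and back operations
FWD_FALL, FWD_RISE = 0, 2       # edges of the learning forward propagation clock
DELAY = 4                       # lag of the back propagation clock (>= forward time)
BACK_FALL, BACK_RISE = FWD_FALL + DELAY, FWD_RISE + DELAY


def schedule(cycles):
    """Return (time, operation) pairs for the given number of learning cycles."""
    events = []
    for n in range(cycles):
        base = n * PERIOD
        events += [
            (base + FWD_FALL, "upper forward"),    # falling edge, forward clock
            (base + FWD_RISE, "lower forward"),    # rising edge, forward clock
            (base + BACK_FALL, "lower backward"),  # falling edge, back clock
            (base + BACK_RISE, "upper backward"),  # rising edge, back clock
        ]
    return sorted(events)


one_period = [name for _, name in schedule(1)]
for time, name in schedule(2):
    print(time, name)
```

Under these assumed timings each period runs the upper-layer forward operation, the lower-layer forward operation, the lower-layer back propagation, and the upper-layer back propagation in that order, so data flows down the layers and the error signal flows back up before the next cycle starts.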

The neuroprocessor according to claim 3, wherein
the learning forward propagation clock and the learning back propagation clock each have a predetermined period equal to or longer than the time required for a series of forward propagation operations and back propagation operations in the upper layer and the lower layer, and the learning back propagation clock lags the learning forward propagation clock by a predetermined delay time equal to or longer than the time required for a series of forward propagation operations in the upper layer and the lower layer,
the neuron modules of the upper layer execute the forward propagation operation triggered by the rising edge of the learning forward propagation clock and execute the back propagation operation triggered by the falling edge of the learning back propagation clock, and
the neuron modules of the lower layer execute the forward propagation operation triggered by the falling edge of the learning forward propagation clock and execute the back propagation operation triggered by the rising edge of the learning back propagation clock.

The neuroprocessor according to claim 4 or 5, further comprising
a second clock generation circuit that, in a recognition mode, generates from the reference clock a recognition forward propagation clock having a predetermined period equal to or longer than the time required for a series of forward propagation operations in the upper layer and the lower layer, wherein
the neuron modules of the upper layer execute the forward propagation operation triggered by one of the rising edge and the falling edge of the recognition forward propagation clock, and the neuron modules of the lower layer execute the forward propagation operation triggered by the other of those edges.

The neuroprocessor according to claim 6, wherein
each neuron module of the upper layer and the lower layer includes a forward propagation clock input terminal and a back propagation clock input terminal,
the first clock generation circuit and the second clock generation circuit share a common output terminal for the learning forward propagation clock and the recognition forward propagation clock, and that output terminal is connected to the forward propagation clock input terminal of each neuron module of the upper layer and the lower layer,
the output terminal for the learning back propagation clock of the first clock generation circuit is connected to the back propagation clock input terminal of each neuron module of the upper layer and the lower layer, and
in the learning mode the first clock generation circuit operates and the second clock generation circuit is stopped, whereas in the recognition mode the first clock generation circuit is stopped and the second clock generation circuit operates.