JPH06266398A

JPH06266398A - Arithmetic unit using neural network

Info

Publication number: JPH06266398A
Application number: JP5055938A
Authority: JP
Inventors: Hiroya Murao; 浩也村尾; Toshiyuki Watanabe; 俊幸渡辺; Shinichi Tsurufuji; 真一鶴藤
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1993-03-16
Filing date: 1993-03-16
Publication date: 1994-09-22

Abstract

PURPOSE: To allow a single program to implement a plurality of neural networks by having a neural network operation unit perform the operation for whichever neural network is selected. CONSTITUTION: A neural network structure storage unit 100 sends neural network structure information to a neural network operation unit 40. On receiving this information, the neural network operation unit 40 determines the structure of the neural network and waits for data from a pattern creation unit 20. When the pattern creation unit 20 creates a speech pattern, a program written in the ROM of the neural network operation unit 40 performs the neural network operation on the basis of the speech pattern and the inter-unit coupling information and coupling coefficients designated by the neural network structure storage unit 100.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an arithmetic unit using a neural network, and in particular to a recognition device that recognizes speech patterns and the like by using feature parameters obtained from speech analysis as the input information to the neural network.

[0002]

[Prior Art] For speech recognition to become practical, it is important that a speech recognition device realize, on a computer, processing close to the judgment that humans make in everyday life. As one approach, neural networks, which implement a simplified model of the human neural circuitry on a computer, are widely used.

[0003] Neural network operations can be performed either by a dedicated LSI or by a general-purpose computer. Because a dedicated LSI increases cost, the operation is generally implemented as a program on a general-purpose computer.

[0004] FIG. 3 is a schematic configuration diagram of a speech recognition device using a neural network. In the figure, 11 is a speech analysis unit that analyzes the input speech, and 12 is a section detection unit that detects the speech section based on the feature parameters of the speech analyzed by the speech analysis unit 11; the speech analysis unit 11 and the section detection unit 12 together constitute a feature parameter extraction unit 10. 20 is a pattern creation unit that creates a pattern from the feature parameters of the speech section extracted by the feature parameter extraction unit 10. 31 is an inter-unit coupling information storage unit in which the coupling information between the units of the neural network is stored in numerical form, and 32 is a coupling coefficient storage unit in which the coupling coefficients between the units of the neural network are stored; at recognition time, the contents of the inter-unit coupling information storage unit 31 and the coupling coefficient storage unit 32 are held in a ROM 30. 40 is a neural network operation unit that performs the neural network operation from the pattern created by the pattern creation unit 20, the inter-unit coupling information stored in the inter-unit coupling information storage unit 31, and the coupling coefficients stored in the coupling coefficient storage unit 32. 50 is a recognition determination unit that determines the recognition result based on the operation result of the neural network operation unit 40.

[0005] FIG. 4 is a schematic configuration diagram of a neural network learning device for training the neural network of the speech recognition device shown in FIG. 3. Elements having the same function as in the speech recognition device of FIG. 3 are given the same reference numerals, and their description is omitted. 60 is a learning pattern storage unit that receives the feature parameters from the pattern creation unit 20 and accumulates speech patterns. 70 is a teacher signal generation unit that generates, as a teacher signal, the number of the category (here, a category is a vocabulary item to be recognized) to which the learning pattern sent from the learning pattern storage unit 60 to the neural network operation unit 40 belongs. 80 is an error calculation unit that calculates the error between the operation result of the neural network operation unit 40 and the teacher signal of the teacher signal generation unit 70. 90 is a learning calculation unit that changes the coupling coefficients in the coupling coefficient storage unit 32 based on the calculation result of the error calculation unit 80.

[0006] FIG. 5 is a schematic configuration diagram of a hierarchical neural network. 41a is an input layer, 42a is an intermediate layer, and 43a is an output layer, composed of I, J, and K units respectively. As illustrated, the units of vertically adjacent layers are connected by information transmission paths in accordance with the coupling information in the inter-unit coupling information storage unit 31. The coupling coefficients of these information transmission paths are stored in the coupling coefficient storage unit 32.

[0007] The number of units in each layer is now described concretely. Take as an example a total of 15 words: the single-digit spoken numerals "rei" (れい), "ichi" (いち), "ni" (に), "san" (さん), "shi" (し), "go" (ご), "roku" (ろく), "shichi" (しち), "hachi" (はち), and "ku" (く), together with their alternative readings "zero" (ぜろ), "maru" (まる), "yon" (よん), "nana" (なな), and "kyu" (きゅう).

[0008] The input speech is patterned by dividing the frequency band into 16 and the time axis into 8, so the number of units in the input layer 41a is I = 16 × 8 = 128.

[0009] Since there are 15 categories to be recognized, the number of units in the output layer 43a is K = 15. The number of units J in the intermediate layer 42a is set to 50.

[0010] With the configuration described above, the accumulation of learning patterns in the learning pattern storage unit 60 and the training of the neural network are explained next. The single-digit spoken numeral "rei" is input to the speech analysis unit 11. The speech analysis unit 11 converts the speech "rei" from analog to digital with an A/D converter (not shown), divides the frequency band of, for example, 100 to 6000 Hz into 16 bands, and extracts the magnitude in each band, that is, 16 frequency components, every 5 milliseconds; power information and the like are also extracted as needed. This information is used as the feature parameters.
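The framing arithmetic in this paragraph can be sketched as follows. This is only an illustration: the text does not say how the 100 to 6000 Hz range is split, so equal-width bands are an assumption here, and `band_edges`, `frame_count`, and the 400 ms utterance length are hypothetical names and values introduced for the example.

```python
def band_edges(low_hz, high_hz, num_bands):
    # Assumption: equal-width (linear) bands; the text does not specify the spacing.
    width = (high_hz - low_hz) / num_bands
    return [low_hz + i * width for i in range(num_bands + 1)]


def frame_count(duration_ms, frame_period_ms=5):
    # One frame of 16 frequency components is extracted per 5 ms period.
    return duration_ms // frame_period_ms


edges = band_edges(100.0, 6000.0, 16)
print(edges[:3])          # first three band edges in Hz
print(frame_count(400))   # number of frames in a hypothetical 400 ms utterance
```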

[0011] The section detection unit 12 determines the start time and end time of the input speech "rei" from the feature parameters extracted by the speech analysis unit 11, and thereby determines the speech section. The pattern creation unit 20 divides the speech section determined by the section detection unit 12 into 8 segments and, for each frequency component, calculates the average over each segment of the frequency components extracted every 5 milliseconds. A speech pattern consisting of 16 × 8 data values is thus created for each input utterance.
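As a minimal sketch of the pattern creation step, the function below averages 16-component frames over 8 time segments. The way the segment boundaries are chosen (an integer split of the frame range) and the segment-major ordering of the result are assumptions; the text only states that the section is divided into 8 and averaged per frequency component. `make_pattern` is a hypothetical name.

```python
def make_pattern(frames, num_segments=8):
    """Average per-frame frequency components over equal time segments.

    frames: list of frames, each a list of band magnitudes (one frame per 5 ms).
    Returns a flat list of num_segments * num_bands averaged values.
    """
    if len(frames) < num_segments:
        raise ValueError("need at least one frame per segment")
    num_bands = len(frames[0])
    pattern = []
    for s in range(num_segments):
        # Assumed boundaries: split the frame index range into equal integer parts.
        start = s * len(frames) // num_segments
        end = (s + 1) * len(frames) // num_segments
        segment = frames[start:end]
        for b in range(num_bands):
            pattern.append(sum(f[b] for f in segment) / len(segment))
    return pattern


# Synthetic input: 40 frames (200 ms at one frame per 5 ms), 16 bands each.
frames = [[float(t + b) for b in range(16)] for t in range(40)]
pattern = make_pattern(frames)
print(len(pattern))  # 128
```

The length of the result equals the number of input layer units, so the pattern can be fed to the input layer unit by unit.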

[0012] The speech pattern created by the pattern creation unit 20 is sent to the learning pattern storage unit 60 and stored as a learning pattern of the category corresponding to "rei".

[0013] In the same way, the learning patterns for the remaining 14 words, "ichi", "ni", ..., "kyu", are stored in the learning pattern storage unit 60 by category.

[0014] Once the learning patterns have been accumulated in the learning pattern storage unit 60, one learning pattern belonging to the category "rei" is input from the learning pattern storage unit 60 to the input layer 41a of the neural network operation unit 40. At the same time, the number of the category c to which the spoken numeral "rei" belongs (where 1 ≤ c ≤ K, c is an integer, and K = 15 in the above example) is sent from the learning pattern storage unit 60 to the teacher signal generation unit 70.

[0015] Using the network shape stored in the inter-unit coupling information storage unit 31 and the corresponding inter-unit coupling coefficients stored in the coupling coefficient storage unit 32, the neural network operation unit 40 performs the operation on the learning pattern input to the input layer 41a and outputs the result from the K units of the output layer 43a as output values O_k (k = 1, 2, ..., K) (hereinafter O_k).
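The operation from the input layer to the output values O_k might look like the sketch below for the 128-50-15 network of the example. The text does not give the unit function, so the sigmoid activation, the absence of bias terms, and full connection between adjacent layers are all assumptions, and the coefficient values are placeholders rather than trained values.

```python
import math


def sigmoid(x):
    # Assumed unit function; the text does not name one.
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights):
    # weights[j][i] is the coupling coefficient from input unit i to unit j.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def forward(pattern, w_hidden, w_output):
    hidden = layer_forward(pattern, w_hidden)
    return layer_forward(hidden, w_output)


I, J, K = 128, 50, 15
w_hidden = [[0.01] * I for _ in range(J)]   # placeholder coefficients
w_output = [[0.01] * J for _ in range(K)]
outputs = forward([1.0] * I, w_hidden, w_output)
print(len(outputs))  # 15
```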

[0016] The teacher signal generation unit 70 generates a teacher signal T_k (k = 1, 2, ..., K) (hereinafter T_k) corresponding to the category c sent from the learning pattern storage unit 60, and sends it to the error calculation unit 80.

[0017] The error calculation unit 80, for its part, calculates the error between the teacher signal T_k of the teacher signal generation unit 70 and the output value O_k of the neural network operation unit 40, that is,

$$E_k = T_k - O_k \quad (k = 1, 2, \ldots, K) \tag{1}$$
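A small sketch of the teacher signal and of equation (1) follows. The text says only that T_k corresponds to category c; encoding it as a one-hot vector (1.0 for unit c, 0.0 elsewhere) is an assumption, and the four-unit example values are illustrative only.

```python
def teacher_signal(c, num_categories):
    # Assumption: one-hot target, 1.0 for category c (1-based), 0.0 elsewhere.
    return [1.0 if k == c else 0.0 for k in range(1, num_categories + 1)]


def errors(teacher, outputs):
    # E_k = T_k - O_k for each output unit k, as in equation (1).
    return [t - o for t, o in zip(teacher, outputs)]


T = teacher_signal(2, 4)
O = [0.1, 0.7, 0.2, 0.0]   # illustrative output values
E = errors(T, O)
print(E)
```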

[0018] The learning calculation unit 90 changes the inter-unit coupling coefficients stored in the coupling coefficient storage unit 32 by the error backpropagation method, referring to the network shape stored in the inter-unit coupling information storage unit 31, so that the error E_k given by equation (1) is minimized.

[0019] By repeating this operation, the inter-unit coupling coefficients stored in the coupling coefficient storage unit 32 are changed gradually and brought closer to the optimal solution.
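The repeated update can be illustrated with a toy backpropagation loop. This is a sketch under stated assumptions, not the procedure of the learning calculation unit 90 itself: it assumes sigmoid units without bias, a squared-error objective, one intermediate layer, and a learning rate `eta` of 0.5, none of which are given in the text. The network is deliberately tiny (4-3-2) and is trained on a single made-up pattern.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w1, w2):
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    o = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in w2]
    return h, o


def train_step(x, t, w1, w2, eta=0.5):
    """Apply one backpropagation update in place; return the squared error before it."""
    h, o = forward(x, w1, w2)
    e = [tk - ok for tk, ok in zip(t, o)]                    # E_k = T_k - O_k
    d_out = [ek * ok * (1.0 - ok) for ek, ok in zip(e, o)]   # output-layer deltas
    # Propagate the output deltas back through the not-yet-updated w2.
    d_hid = [hj * (1.0 - hj) * sum(d_out[k] * w2[k][j] for k in range(len(w2)))
             for j, hj in enumerate(h)]
    for k, row in enumerate(w2):
        for j in range(len(row)):
            row[j] += eta * d_out[k] * h[j]
    for j, row in enumerate(w1):
        for i in range(len(row)):
            row[i] += eta * d_hid[j] * x[i]
    return sum(ek * ek for ek in e)


random.seed(0)
n_in, n_hid, n_out = 4, 3, 2
w1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
w2 = [[random.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_out)]
x, t = [1.0, 0.0, 0.5, 0.2], [1.0, 0.0]
history = [train_step(x, t, w1, w2) for _ in range(200)]
print(f"{history[0]:.4f} -> {history[-1]:.4f}")
```

Each call changes the coefficients slightly, which mirrors the gradual approach to a good solution described above.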

[0020] In practice the optimal solution is difficult to obtain, so a near-optimal solution is obtained by repeating the training a sufficient number of times; such a near-optimal solution poses no problem in practical use.

[0021] The above operation fixes the coupling coefficients in the coupling coefficient storage unit 32. These coupling coefficients and the inter-unit coupling information in the inter-unit coupling information storage unit 31 are written to ROM by a ROM writer (not shown) and used as the ROM 30 in FIG. 3.

[0022] Next, the speech recognition operation using the trained neural network is described.

[0023] Consider, for example, the case where the single-digit numeral "rei" is uttered. The speech analysis unit 11 converts the speech "rei" from analog to digital with an A/D converter (not shown), divides the frequency band of, for example, 100 to 6000 Hz into 16 bands, and extracts the magnitude in each band, that is, 16 frequency components, every 5 milliseconds; power information and the like are also extracted as needed.

[0024] The section detection unit 12 determines the start time and end time of the input speech "rei" from the feature parameters extracted by the speech analysis unit 11, and thereby determines the speech section. The pattern creation unit 20 divides the speech section determined by the section detection unit 12 into 8 segments and, for each frequency component, calculates the average over each segment of the frequency components extracted every 5 milliseconds. A speech pattern consisting of 16 × 8 data values is thus created for each input utterance.

[0025] When the pattern creation unit 20 has created the speech pattern, the operation is performed, according to the program written in the ROM (not shown) of the neural network operation unit 40, using the speech pattern and the inter-unit coupling information and coupling coefficients stored in the ROM 30. As the operation result, the output values of the output layer 43a corresponding to the 15 words of the categories to be recognized are passed to the recognition determination unit 50. The recognition determination unit 50 takes as the recognition result the category with the largest of the output values of the output layer 43a sent from the neural network operation unit 40. In this case the output value of the output layer 43a for the category "rei" is the largest, so the category "rei" is determined to be the recognition result.
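The decision rule of the recognition determination unit 50 amounts to picking the category with the largest output value, which can be sketched as follows. The romanized word list, the function name `recognize`, and the output values are illustrative; ties are resolved in favor of the first maximum, which is a choice made here and not stated in the text.

```python
STAGE1_WORDS = ["rei", "ichi", "ni", "san", "shi", "go", "roku", "shichi",
                "hachi", "ku", "zero", "maru", "yon", "nana", "kyu"]


def recognize(outputs, categories):
    if len(outputs) != len(categories):
        raise ValueError("one output value is expected per category")
    # Index of the largest output value; the first maximum wins on ties.
    best = max(range(len(outputs)), key=lambda k: outputs[k])
    return categories[best]


outputs = [0.91] + [0.05] * 14   # illustrative values only
print(recognize(outputs, STAGE1_WORDS))  # rei
```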

[0026] The above method has been used when the vocabulary to be recognized is fixed. In speech recognition, however, the vocabulary to be recognized is not necessarily fixed; the kind of vocabulary and the number of words generally vary. For example, when the number of words to be recognized changes, a neural network of the same structure can no longer perform the operation, and it becomes necessary to change the number of units in the output layer, the number of intermediate layers, the number of units in the intermediate layers, and even the number of units in the input layer.

[0027] Conventionally, to handle a plurality of recognition targets, the neural network operation unit 40 was provided with a plurality of neural networks by using a plurality of programs, one of which was selected according to the target vocabulary. As a result, the number of programs grew with the number of kinds of vocabulary to be recognized, and the program size grew accordingly.

[0028]

[Problem to Be Solved by the Invention] The present invention has been made in view of the above problem, and its object is to perform the neural network operation with a single program even when the vocabulary to be recognized changes.

[0029]

[Means for Solving the Problem] The arithmetic unit using a neural network according to the present invention comprises a neural network selection unit that selects a neural network, and a neural network structure storage unit that stores at least one network structure. In the neural network structure storage unit, the neural network structure information (the number of intermediate layers, the number of units in the input layer, the number of units in each intermediate layer, the number of units in the output layer, the address in ROM at which the inter-unit coupling information is stored, or the address in ROM at which the coupling coefficients are stored) is held as a table, and the neural network operation unit can perform the operation for an arbitrary neural network according to the selection made by the neural network selection unit.

[0030]

[Operation] With the above configuration, the neural network structure information is held as a table. When the neural network operation unit performs an operation, the neural network selection unit selects neural network structure information in the neural network structure storage unit, and the neural network operation unit performs the operation based on that neural network structure information.
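One way to picture the structure table and the selection step is the sketch below. The class and field names (`NetworkStructure`, `StructureStore`, `select`) are hypothetical; the two entries use the figures that the embodiment in this document gives for its stage 1 and stage 2 networks. The number of intermediate layers is represented implicitly by the length of `hidden_units`, which is a design choice of this example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class NetworkStructure:
    input_units: int
    hidden_units: Tuple[int, ...]   # one entry per intermediate layer
    output_units: int
    coupling_info_addr: int         # ROM address of the inter-unit coupling information
    coefficient_addr: int           # ROM address of the coupling coefficients


class StructureStore:
    """Plays the role of the structure storage unit: a table of structure sets."""

    def __init__(self, table: Dict[str, NetworkStructure]):
        self._table = dict(table)

    def select(self, name: str) -> NetworkStructure:
        # Plays the role of the selection unit: pick one set from the table.
        return self._table[name]


store = StructureStore({
    "stage1": NetworkStructure(128, (50,), 15, 1000, 2000),
    "stage2": NetworkStructure(128, (30, 30), 8, 3000, 4000),
})
selected = store.select("stage2")
print(len(selected.hidden_units), selected.output_units)  # 2 8
```

Because the operation code only ever sees a `NetworkStructure` value, adding a vocabulary means adding a table entry rather than another program.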

[0031]

[Embodiment] FIG. 1 is a schematic configuration diagram of one embodiment of the neural network of the present invention.

[0032] FIG. 2 is a schematic configuration diagram of a hierarchical neural network whose structure differs from that of FIG. 5.

[0033] In FIG. 1, elements having the same function as in FIG. 3 of the conventional example are given the same reference numerals, and their description is omitted.

[0034] The present invention differs from the conventional example in that a neural network structure storage unit 100 and a neural network selection unit 110 are provided. The neural network structure storage unit 100 stores, as one set, neural network structure information consisting of the number of intermediate layers, the number of units in the input layer, the number of units in each intermediate layer, the number of units in the output layer, and so on. The neural network selection unit 110 has the function of selecting the neural network structure information to be used for the operation in the neural network operation unit 40.

[0035] In FIG. 2, 41b is an input layer, 42b is a first intermediate layer coupled to the input layer, 42c is a second intermediate layer connected to the first intermediate layer, and 43b is an output layer connected to the second intermediate layer 42c; they are composed of L, M, N, and O units respectively. As illustrated, the units of vertically adjacent layers are connected by information transmission paths, and the coupling coefficients of those paths are stored in the ROM 30.

[0036] To state the number of units in each layer of FIG. 2 concretely: the input speech is patterned by dividing the frequency band into 16 and the time axis into 8, so the number of units in the input layer 41b is L = 16 × 8 = 128.

[0037] If the categories to be recognized are, for example, eight place names ("Tokyo", "Osaka", "Kyoto", "Kobe", "Yokohama", "Nagoya", "Sendai", "Sapporo"), the number of units in the output layer 43b is O = 8. The numbers of units M and N in the first intermediate layer 42b and the second intermediate layer 42c are each set to 30.

[0038] In the following, the case where single-digit numerals are the recognition target is called stage 1, and the case where place names are the recognition target is called stage 2.

[0039] Table 1 shows the neural network structure information for stage 1 and stage 2.

[0040]

[Table 1]

[0041] Consider, for example, recognizing the uttered single-digit numeral "rei". The training of the stage 1 and stage 2 neural networks is the same as in the conventional example and is not repeated here.

[0042] First, the user specifies stage 1 with the neural network selection unit 110. When stage 1 is specified, the neural network structure storage unit 100 passes to the neural network operation unit 40 the neural network structure information shown for stage 1 in Table 1: one intermediate layer, 128 units in the input layer, 50 units in the intermediate layer, 15 units in the output layer, address 1000 of the ROM 30 for the inter-unit coupling information, and address 2000 of the ROM 30 for the coupling coefficients.

[0043] On receiving this information, the neural network operation unit 40 determines the structure of the neural network and waits for data from the pattern creation unit 20.

[0044] When the user says "rei" into a microphone (not shown), the speech analysis unit 11 A/D-converts the speech "rei", divides the frequency band of, for example, 100 to 6000 Hz into 16 bands, and extracts the magnitude in each band, that is, 16 frequency components, every 5 milliseconds; power information and the like are also extracted as needed.

[0045] The section detection unit 12 determines the start time and end time of the input speech "rei" from the frequency components and power information extracted by the speech analysis unit 11, and thereby determines the speech section. The pattern creation unit 20 divides the speech section determined by the section detection unit 12 into 8 segments and, for each frequency component, calculates the average over each segment of the frequency components extracted every 5 milliseconds. A speech pattern consisting of 16 × 8 data values is thus created for each input utterance.

[0046] When the pattern creation unit 20 has created the speech pattern, the neural network operation is performed, according to the program written in the ROM (not shown) of the neural network operation unit 40, on the basis of the speech pattern and the inter-unit coupling information and coupling coefficients designated by the neural network structure storage unit 100. As the operation result, the output values of the output layer 43a corresponding to the 15 word categories to be recognized are passed to the recognition determination unit 50. The recognition determination unit 50 takes as the recognition result the category with the largest of the output values of the output layer 43a of the neural network operation unit 40. In this case the output value for the category "rei" is the largest, so the category "rei" is determined to be the recognition result.

[0047] Next, to recognize place names, stage 2 is specified with the neural network selection unit 110. When stage 2 is specified, the neural network structure storage unit 100 passes to the neural network operation unit 40 the neural network structure information shown for stage 2 in Table 1: two intermediate layers, 128 units in the input layer, 30 units in the first intermediate layer, 30 units in the second intermediate layer, 8 units in the output layer, address 3000 of the ROM 30 for the inter-unit coupling information, and address 4000 of the ROM 30 for the coupling coefficients. The processing from the speech analysis unit 11 to the neural network operation unit 40 is the same as before and is omitted here. As the operation result, the output values of the output layer 43b corresponding to the 8 word categories to be recognized are passed to the recognition determination unit 50. The recognition determination unit 50 takes as the recognition result the category with the largest of the output values of the output layer 43b sent from the neural network operation unit 40. In this case the output value for the category "Osaka" is the largest, so the category "Osaka" is determined to be the recognition result.

[0048] Through the above operation, even when the categories to be recognized change, the operations of a plurality of neural networks can be performed easily simply by giving the neural network structure information to the neural network operation unit 40.
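The central idea, one routine that evaluates whichever network the structure information describes, can be sketched as follows. Several points are assumptions rather than statements of the source: the sigmoid unit function, full connection between adjacent layers, the flat layer-by-layer layout of the coefficients, and the names `run_network`, `STAGES`, and `coefficient_count`. The offsets into the flat table are computed by the example and are not the ROM addresses 2000 and 4000 quoted in the text; the coefficients are placeholders.

```python
import math


def sigmoid(x):
    # Assumed unit function; the text does not name one.
    return 1.0 / (1.0 + math.exp(-x))


def run_network(pattern, layer_sizes, rom, coeff_addr):
    """Evaluate a fully connected layered network described only by data.

    layer_sizes: units per layer, input layer first, output layer last.
    rom: flat sequence of coupling coefficients.
    coeff_addr: index in rom where this network's coefficients start
                (assumed layout: layer by layer, one row per destination unit).
    """
    if len(pattern) != layer_sizes[0]:
        raise ValueError("pattern length must equal the input layer size")
    values, addr = list(pattern), coeff_addr
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        nxt = []
        for _ in range(n_out):
            row = rom[addr:addr + n_in]
            nxt.append(sigmoid(sum(w * v for w, v in zip(row, values))))
            addr += n_in
        values = nxt
    return values


def coefficient_count(layers):
    return sum(a * b for a, b in zip(layers, layers[1:]))


STAGES = {
    1: {"layers": (128, 50, 15)},      # single-digit numerals
    2: {"layers": (128, 30, 30, 8)},   # place names
}

# Pack both networks into one flat table with computed offsets.
rom, offset = [], 0
for stage in STAGES.values():
    stage["coeff_addr"] = offset
    n = coefficient_count(stage["layers"])
    rom.extend([0.01] * n)   # placeholder coefficients
    offset += n

pattern = [1.0] * 128
for number, stage in STAGES.items():
    out = run_network(pattern, stage["layers"], rom, stage["coeff_addr"])
    print(number, len(out))
```

The same `run_network` body handles one or two intermediate layers because the layer count is just the length of `layer_sizes`; switching vocabulary changes data, not code.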

[0049] Although this embodiment has been described with specific values for the number of intermediate layers, the number of units in each intermediate layer, the number of units in the input layer, and the number of units in the output layer, the present invention is not limited to these values. Likewise, although speech analysis results for speech recognition were used as the input to the pattern creation unit 20, feature parameters for character recognition and the like may also be used; the input to the pattern creation unit 20 is not limited.

[0050]

[Effect of the Invention] According to the present invention, the program that performs the neural network operation need not be changed when the categories targeted by speech recognition change. A single program can therefore realize a plurality of neural networks, and the neural network operation can be implemented with a small program size.

[Brief Description of the Drawings]

[FIG. 1] Schematic configuration diagram of an embodiment in which the neural network of the present invention is applied to speech recognition.

[FIG. 2] Schematic configuration diagram of a hierarchical neural network.

[FIG. 3] Schematic configuration diagram of an example in which a conventional neural network is applied to speech recognition.

[FIG. 4] Schematic configuration diagram of an example of a learning section for the neural network.

[FIG. 5] Schematic configuration diagram of another hierarchical neural network.

[Explanation of Symbols]

11 Speech analysis unit
12 Section detection unit
20 Pattern creation unit
31 Inter-unit coupling information storage unit
32 Coupling coefficient storage unit
40 Neural network operation unit
50 Recognition determination unit
100 Neural network structure storage unit
110 Neural network selection unit

Claims

[Claims]

1. An arithmetic unit using a neural network, comprising: a feature parameter extraction unit that extracts feature parameters of an input signal; a pattern creation unit that creates a pattern based on the feature parameters extracted by the feature parameter extraction unit; a neural network operation unit that executes a neural network operation using the pattern created by the pattern creation unit as input information; a neural network structure storage unit that stores the neural network structure information required for the operation in the neural network operation unit; and a neural network selection unit that selects one set of neural network structure information from a plurality of sets of neural network structure information stored in the neural network structure storage unit, wherein the neural network operation unit comprises an input layer to which the pattern created by the pattern creation unit is input, an intermediate layer connected to the input layer, and an output layer connected to the intermediate layer; the neural network structure storage unit stores a plurality of sets of neural network structure information each having as a constituent element at least one of the number of intermediate layers, the number of units in the input layer, the number of units in the intermediate layer, or the number of units in the output layer; and the neural network operation unit performs the operation based on the neural network structure information selected by the neural network selection unit.

2. The arithmetic unit using a neural network according to claim 1, wherein the input signal to the feature parameter extraction unit is a speech signal or character information.