JP2004258748A

JP2004258748A - Program for allowing computer to execute operation for obtaining approximate function and computer readable recording medium having its program recorded thereon

Info

Publication number: JP2004258748A
Application number: JP2003046003A
Authority: JP
Inventors: Yoshie Kono; 芳江河野; Taro Ando; 太郎安藤; Shigeru Saito; 茂斎藤
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2003-02-24
Filing date: 2003-02-24
Publication date: 2004-09-16

Abstract

PROBLEM TO BE SOLVED: To provide a program that causes a computer to execute an operation for obtaining an approximate function defining the relationship between input and output.

SOLUTION: The program obtains the approximate function by executing steps S1 and S2 in sequence. Step S1 accepts the (m1 × S) input values and the S output values contained in the sample data. Step S2 optimizes the parameters w_ij, θ_j, τ_j, W_j and Θ of a three-layer neural network using the high-dimensional algorithm, so as to obtain an approximate function that defines the relationship between m1 input values and one output value. During this optimization the intermediate-unit parameters w_ij, θ_j and τ_j are varied over a wide search range while the number of intermediate units is increased one at a time.

COPYRIGHT: (C)2004,JPO&NCIPI

Description

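[0001]
[Technical field of the invention]
The present invention relates to a program that causes a computer to execute an operation for obtaining an approximate function defining the relationship between input and output, and to a computer-readable recording medium on which the program is recorded.
[0002]
[Prior art]
Several high-accuracy quantum state calculation methods that take self-interaction into account have been proposed. In each of them, the structural parameters of a quantum well, the external electric field applied to the quantum well, and so on form a set of input variables. Once one set of values is drawn from those input variables, the method takes that set as input and performs computationally heavy operations such as integrating differential equations and optimizing various parameters. For the given input, the physically allowed energy levels, wave functions, and so on are then computed as output.
[0003]
The prior art of the present invention has been described above on the basis of general technical information known to the applicant. To the best of the applicant's recollection, the applicant has no information that should be disclosed as prior art document information as of the filing.
[0004]
[Problem to be solved by the invention]
In the conventional methods, however, obtaining an output value for each individual input value requires a dedicated calculation program and sufficient computing resources (memory, CPU (Central Processing Unit), etc.). Moreover, in actual nanodevice design the concrete requirements are almost always imposed on the outputs of the above calculation methods, such as energy levels and wave functions. That is, an energy level or wave function in the quantum well is given as a design value, and one must solve for how to set parameters such as the well-layer width, the well-layer band gap, the barrier-layer height, the barrier-layer width, the barrier-layer band gap, and the dopant doping level in the well layer or barrier layer so as to realize it.
[0005]
With the conventional methods, one has to search for a system satisfying the conditions by changing the input values by trial and error. This requires an enormous amount of computation, and running the calculation for every nanodevice design is inefficient.
[0006]
The present invention was made to solve this problem. Its object is to provide a program that causes a computer to execute an operation for obtaining an approximate function defining the relationship between input and output.
[0007]
Another object of the invention is to provide a computer-readable recording medium on which such a program is recorded.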
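[0008]
[Means for solving the problem and effects of the invention]
According to the invention, a program causes a computer to execute an operation for obtaining an approximate function defining the relationship between m inputs and n outputs, using S items of sample data (S a natural number) each consisting of m input values (m a natural number) and n output values (n a natural number). The program causes the computer to execute a first step of accepting m × S input values and n × S output values, and a second step. In the second step, among all the parameters of a hypersphere-discrimination-type three-layer neural network that computes n computed output values for m input values, the values of the discriminating-hypersphere parameters are varied over a search range wider than the normal search range; n × S computed output values are computed for the m × S input values by the hypersphere-discrimination-type operation; and the values of all parameters are optimized using those n × S computed output values so that the approximate function is obtained. The second step sets up a high-dimensional space whose dimensionality is higher than that defined by the number of all parameters, and optimizes all parameters by a high-dimensional algorithm that is expected to pass quickly through regions of that space where the parameter values are not optimal and to enter easily the region where they are optimal.
[0009]
Preferably, the second step sets the number of intermediate units, which are included in the three-layer neural network and perform the hypersphere-discrimination-type operation, to an initial value, computes the n × S computed output values, and optimizes all parameters.
[0010]
Preferably, the second step includes: a first substep of setting all parameters to initial values and computing the n × S computed output values by the hypersphere-discrimination-type operation; a second substep of computing a cost function value that evaluates the computed n × S output values and comparing it with a predetermined value; a third substep of, when the cost function value is at or below the predetermined value, taking the parameter values that yielded it as the optimal values; a fourth substep of, when the cost function value exceeds the predetermined value, computing by the high-dimensional algorithm, over the wide search range, all parameters for reducing the cost function value; a fifth substep of executing the first substep with the parameters computed in the fourth substep and then executing the second through fourth substeps; and a sixth substep of, when the cost function value still exceeds the predetermined value after the first through fifth substeps have been repeated a prescribed number of times, increasing the number of intermediate units and executing the first through fifth substeps.
[0011]
Preferably, the number of intermediate units is increased one at a time.
Preferably, the initial number of intermediate units is 1.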
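[0012]
Preferably, the second step sets the number of all parameters to an initial value, computes the n × S computed output values, and optimizes all parameters.
[0013]
Preferably, the second step includes: a first substep of setting all parameters to initial values and computing the n × S computed output values by the hypersphere-discrimination-type operation; a second substep of computing a cost function value that evaluates the computed n × S output values and comparing it with a predetermined value; a third substep of, when the cost function value is at or below the predetermined value, taking the parameter values that yielded it as the optimal values; a fourth substep of, when the cost function value exceeds the predetermined value, computing by the high-dimensional algorithm, over the wide search range, all parameters for reducing the cost function value; a fifth substep of executing the first substep with the parameters computed in the fourth substep and then executing the second through fourth substeps; and a sixth substep of, when the cost function value still exceeds the predetermined value after the first through fifth substeps have been repeated a prescribed number of times, increasing the number of all parameters and executing the first through fifth substeps.
[0014]
Preferably, the parameters are increased by a predetermined number at a time. When the number of parameters has been increased, the first through fifth substeps are executed with the values of the parameters that existed before the increase held fixed.
[0015]
Preferably, when the parameters that existed before the increase are called first parameters and the added parameters are called second parameters, the fourth substep fixes the first parameters, varies the second parameters over the wide search range, and computes by the high-dimensional algorithm all parameters for reducing the cost function value.
[0016]
Preferably, the second substep computes, as the cost function value, the mean of the sum of squared errors between the accepted n × S output values and the computed n × S output values.
[0017]
Preferably, the n × S output values are approximated by a combination of Gaussian-like distributions.
[0018]
Preferably, the m × S input values and the n × S output values are data computed by a computer.
[0019]
Preferably, the m × S input values and the n × S output values are data computed by a quantum level calculation program that computes the quantum levels of particles confined in a microstructure. The quantum level calculation program includes: a step A of computing an initial wave function based on the linear Schrödinger equation and providing it as a numerical sequence of discretized components; a step B of computing, from a first wave function having the discretized components and a Hamiltonian containing a nonlinear term that accounts for particle interaction, a cost function that is normalized by the number of particles present in the microstructure and represents the energy of the whole system; a step C of using the computed cost function to compute the final wave function for which the total energy of the system is minimal; and a step D of computing, from the final wave function and the Hamiltonian, the energy of the state represented by the final wave function.
[0020]
Preferably, the high-dimensional algorithm consists of: a step of defining as the semantic space the space of all parameters that appear in the problem to be solved and are to be optimized; a step of defining a new space by conjugate parameters that are conjugate to those parameters; a step of defining a high-dimensional space by adding the new space to the semantic space; a step of setting up the problem in the high-dimensional space; and a step of detecting the optimal values of all parameters by performing, in the high-dimensional space, autonomous motion that is expected to pass quickly through regions where the parameter values are not optimal and to enter easily the region where they are optimal.
Further, according to the invention, a computer-readable recording medium on which a program for causing a computer to execute the operation of obtaining an approximate function is recorded is a computer-readable recording medium on which the program according to any one of claims 1 to 14 is recorded.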
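[0021]
The program according to the invention computes, with the three-layer neural network, computed output values for the input values while varying the hypersphere parameters used in the hypersphere-discrimination-type operation over a range wider than the normal search range. It then optimizes the parameters with the high-dimensional algorithm so that the cost function evaluating those computed output values falls to or below a predetermined value, and thereby computes the approximate function defining the relationship between input values and output values.
[0022]
According to the invention, therefore, an approximate function from which output values for given input values can be obtained easily can be found.
[0023]
In the invention, when computing the computed output values, the program also varies the parameters over a range wider than the normal search range while performing the hypersphere-discrimination-type operation.
[0024]
According to the invention, therefore, both local features and global features can be reflected efficiently in the computed output values.
[0025]
Further, in the invention, the program optimizes the parameters with a high-dimensional algorithm that performs autonomous motion in a high-dimensional space whose dimensionality is higher than that set by the number of parameters of the three-layer neural network. Autonomous motion here means motion that passes quickly through regions containing non-optimal parameter values and easily enters the region containing the optimal values.
[0026]
According to the invention, therefore, the optimization is unlikely to be trapped in local solutions of the cost function that evaluates the computed output values, and it passes quickly through flat regions of the cost function.
[0027]
In particular, widening the parameter search range in the hypersphere-discrimination-type operation increases the flat regions of the cost function. Because the high-dimensional algorithm characteristically passes quickly through flat regions to reach the optimal solution, both local and global features can be reflected in the computed output values and the parameters can still be optimized quickly.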
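[0028]
[Embodiments of the invention]
Embodiments of the invention are described in detail with reference to the drawings. Identical or corresponding parts in the drawings carry the same reference numerals, and their description is not repeated.
[0029]
FIG. 1 shows the input values and output values that the program according to the invention uses in the operation for obtaining the approximate function. Set 10 represents the sets of input values and contains (x_1(1), ..., x_m1(1)), (x_1(2), ..., x_m1(2)), ..., (x_1(S), ..., x_m1(S)), ..., (x_1(M), ..., x_m1(M)). Set 20 represents the set of output values and contains z(1), z(2), ..., z(S), ..., z(M) (S, M: natural numbers).
[0030]
Output value z(1) is obtained for the input set (x_1(1), ..., x_m1(1)), output value z(2) for the input set (x_1(2), ..., x_m1(2)), and so on up to output value z(M) for the input set (x_1(M), ..., x_m1(M)). The input sets contained in set 10 and the output values contained in set 20 are values computed accurately in advance by a computer.
[0031]
The method of obtaining these input sets and output values is described later.
Sample data 30, used in the operation for obtaining the approximate function, is generated from sets 10 and 20. Sample data 30 consists of sample 1 through sample S. Sample 1 consists of the input set (x_1(1), ..., x_m1(1)) and output value z(1), sample 2 consists of the input set (x_1(2), ..., x_m1(2)) and output value z(2), and so on up to sample S, which consists of the input set (x_1(S), ..., x_m1(S)) and output value z(S). In other words, sample data 30 is a subset of the input sets and output values contained in sets 10 and 20.
[0032]
Sample data 30 for the operation of obtaining the approximate function is prepared in this way.
[0033]
Using sample data 30, the program according to the invention computes the approximate function defining the relationship between the input values ((x_1(1), ..., x_m1(1)), (x_1(2), ..., x_m1(2)), ..., (x_1(S), ..., x_m1(S))) and the output values (z(1), z(2), ..., z(S)), that is, a function f satisfying z(n) ≈ f(x_1(n), ..., x_m1(n)) for n = 1, ..., S. It does so with a function model based on a three-layer neural network and with the high-dimensional algorithm (see K. Shinjo, "High-dimensional algorithm", Bit, Vol. 31, No. 7, pp. 2-8 (1999), and K. Shinjo, "High-dimensional algorithm: one method for solving optimization problems", Journal of Japan Society for Fuzzy Theory and Systems, Vol. 11, No. 3, pp. 382-396 (1999)). In what follows, computation by the function model based on the three-layer neural network is simply called "computation by the three-layer neural network".
[0034]
FIG. 2 shows the relationship between input values and output values when there are two input variables. FIG. 2(a) shows the plotted points of the pairs of input values x_1(n), x_2(n) together with the contour lines of the function f. In FIG. 2(a), the black dots are the plotted points of the pairs x_1(n), x_2(n) when each input varies over [0, 1], and the solid lines are the contour lines of f. FIG. 2(b) is a three-dimensional representation of f.
[0035]
In this case sample data 30 contains pairs of input values x_1(n), x_2(n), so the pairs are represented by plotted points (1), (2), ..., (S). When each of the plotted points (1), (2), ..., (S) is supplied as input, the output values represented by contour lines 1 to 6 are obtained. Surface 7 in FIG. 2(b) is the three-dimensional representation of the surface described by contour lines 1 to 6. The output values for plotted points (1), (2), ..., (S) therefore lie on surface 7, and the program according to the invention computes, from those plotted points, the function f that represents surface 7, using the three-layer neural network and the high-dimensional algorithm.
[0036]
Surface 7 is a gently varying surface. That is, it is not a surface with many violent oscillations that would be better represented by a Fourier series or a sum of trigonometric functions; it is represented by a combination of Gaussian-like distributions. When the program according to the invention is used to obtain the function f, the output values (z(1), z(2), ..., z(S)) therefore preferably form a "gently varying surface" of the kind represented by a combination of Gaussian-like distributions.
[0037]
In this invention, a "gently varying surface" means a surface represented by a combination of Gaussian-like distributions.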
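[0038]
FIG. 3 is a conceptual diagram of the three-layer neural network with which the program according to the invention computes the computed output value Z(n) for the input values. The three-layer neural network 40 includes an input layer 41, an intermediate layer 42, and an output layer 43.
[0039]
The input layer 41 consists of m1 input units 41i (i = 1, ..., m1). The intermediate layer 42 consists of m2 intermediate units 42j (j = 1, ..., m2). The output layer 43 consists of one output unit 431.
[0040]
Input unit 41i receives the input value x_i(n) supplied to the i-th unit of input layer 41 and passes it to each of the m2 intermediate units 42j.
[0041]
The intermediate layer 42 plays the main role in extracting features between the input values and the output values. Let y_j, Y_j and θ_j be the internal state, the output and the threshold of intermediate unit 42j, and let w_ij be the coupling parameter between the i-th input unit 41i and the j-th intermediate unit 42j. Intermediate unit 42j then computes its internal state y_j by equation (1).
[0042]
[Equation 1]

[0043]
That is, intermediate unit 42j performs a hypersphere-discrimination-type operation. The reason is given later.
[0044]
Intermediate unit 42j then substitutes the computed internal state y_j into equation (2) to compute its output Y_j.
[0045]
[Equation 2]

[0046]
That is, intermediate unit 42j computes its output Y_j with a sigmoid function. In equation (2), T_j is a parameter that adjusts the slope of the transition region of the sigmoid. In this embodiment T_j is defined by equation (3) so that the exponential function in the denominator on the right-hand side of equation (2) does not diverge numerically.
[0047]
[Equation 3]

[0048]
In equation (3), τ_j represents the slope of the output function of intermediate unit 42j.
Having computed the output Y_j, intermediate unit 42j sends it to output unit 431 of output layer 43.
[0049]
Output layer 43 adjusts the final output by applying appropriate weights to the outputs of the m2 intermediate units 42j. Output unit 431 therefore computes the computed output value Z_1(n) by equation (4).
[0050]
[Equation 4]

[0051]
In equation (4), W_j is the coupling weight to the j-th intermediate unit 42j, and Θ is the threshold of output unit 431.
[0052]
In this invention the number of intermediate units is initially set to one and is then increased one at a time, depending on the result of the operation for obtaining the approximate function f with the units set so far. Output unit 431 therefore initially substitutes the output Y_1 from intermediate unit 421 and the coupling weight W_1 into equation (4) to compute Z_1(n). When the number of intermediate units is increased, it computes Z_1(n) using the outputs of, and the coupling weights to, the added intermediate units as well.
[0053]
In this way the three-layer neural network 40 computes one computed output value Z_1(n) for m1 input values (x_1(n), ..., x_m1(n)), with θ_j, w_ij, τ_j, W_j and Θ as parameters. Since each of the S samples 1 to S contains an input (x_1(n), ..., x_m1(n)) consisting of m1 data items, the three-layer neural network 40 computes S computed output values Z_1(1), ..., Z_1(S) for m1 × S input values.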
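The forward pass of network 40 can be sketched as follows. Equations (1) to (4) did not survive in this copy of the text, so the forms used here are assumptions chosen only to agree with the surrounding description: the internal state is taken as y_j = θ_j² − Σ_i (x_i − w_ij)² (so that y_j = 0 is a hypersphere of center w_ij and radius θ_j), the output as the logistic function of y_j / T_j, and the network output as Σ_j W_j Y_j − Θ. T_j is passed in directly because the definition of T_j in terms of τ_j (equation (3)) is not recoverable. All function names are illustrative.

```python
import math


def stable_sigmoid(u):
    """Logistic function 1 / (1 + exp(-u)), written to avoid overflow for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def hidden_output(x, center, radius, slope):
    """Output Y_j of one hypersphere-type intermediate unit (assumed form)."""
    dist2 = sum((xi - wi) ** 2 for xi, wi in zip(x, center))
    y = radius ** 2 - dist2           # assumed internal state: positive inside the sphere
    return stable_sigmoid(y / slope)  # slope plays the role of T_j


def network_output(x, units, big_theta):
    """Computed output Z(n): weighted sum of unit outputs minus the threshold (assumed sign)."""
    return sum(W * hidden_output(x, c, r, T) for (c, r, T, W) in units) - big_theta


# One unit: (center w_ij, radius theta_j, slope T_j, output weight W_j)
units = [((0.5, 0.5), 0.1, 0.03, 1.0)]
print(network_output((0.5, 0.5), units, 0.0))  # at the sphere center
print(network_output((0.0, 0.0), units, 0.0))  # far outside the sphere
```

With these assumed forms the unit responds only near its center, which is the local behavior the text attributes to the hypersphere-type operation.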
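[0054]
The reason intermediate unit 42j uses the hypersphere-discrimination-type operation of equation (1) to obtain its internal state y_j is as follows. In general, the internal state y_j is often obtained with the sum of products of the input values x_i(n) and the coupling parameters w_ij, expressed by equation (5).
[0055]
[Equation 5]

[0056]
In that case the discriminating-surface equation y_j = 0 specifies one hyperplane. FIGS. 4 and 5 show an example. FIG. 4 shows the internal state y_j computed with equation (5) for pairs of input values x_1(n), x_2(n). FIG. 5 shows the output Y_j of the intermediate unit for the same pairs.
[0057]
As FIG. 4 shows, the discriminating-surface equation y_j = 0 specifies one hyperplane 50. As FIG. 5 shows, the output Y_j of the intermediate unit is a surface that changes across hyperplane 50. Regions 51 and 52 on either side of hyperplane 50 are almost flat, and within each of them no feature on a smaller scale can be expressed. That is, the output Y_j computed for pairs x_1(n), x_2(n) can express the global feature of changing across hyperplane 50 but cannot express local features within smaller parts of regions 51 and 52.
[0058]
Computing the internal state y_j with equation (5) is called a hyperplane-discrimination-type operation. As described above, a hyperplane-discrimination-type operation can express global features, but a single intermediate unit cannot express local features. Expressing local features takes many intermediate units.
[0059]
When the internal state y_j is computed with equation (1), on the other hand, the discriminating-surface equation y_j = 0 specifies one hypersphere. FIGS. 6 and 7 show an example. FIG. 6 shows the internal state y_j computed with equation (1) for pairs of input values x_1(n), x_2(n). FIG. 7 shows the output Y_j of intermediate unit 42j for the same pairs.
[0060]
As FIG. 6 shows, the discriminating-sphere equation y_j = 0 specifies one hypersphere 60. As FIG. 7 shows, the output Y_j of intermediate unit 42j is flat outside hypersphere 60 and a convex surface inside it. Hypersphere 60 is generally formed in a small region. Local features can therefore be expressed by obtaining the internal state y_j of intermediate unit 42j with the hypersphere-discrimination-type operation.
[0061]
As described above, obtaining y_j with the hyperplane-discrimination-type operation expresses global features but not local ones, whereas obtaining y_j with the hypersphere-discrimination-type operation expresses local features but not global ones.
[0062]
Ideally, one would provide both intermediate units suited to local features and intermediate units suited to global features. Since it is unknown how many of each are needed, however, mixing hyperplane-type and hypersphere-type operations increases the amount of computation.
[0063]
In this invention, therefore, the hypersphere-discrimination-type operation, which is suited to local features, is adopted, and global features are expressed by varying the parameters (θ_j, w_ij, τ_j) with which intermediate unit 42j computes y_j and Y_j over a range wider than the normal search range.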
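[0064]
For pairs of input values x_1(n), x_2(n), equation (1) implies that the discriminating-surface equation y_j = 0 specifies a circle whose center is the parameter w_ij and whose radius is the parameter θ_j. The following describes how the function f changes when the ranges of θ_j and w_ij are changed.
[0065]
FIG. 8 shows the possible relative positions of the discriminating hypersphere and an example output of intermediate unit 42j when the ranges of θ_j and w_ij correspond to a small radius suited to local features. FIG. 9 shows the same when the ranges of θ_j and w_ij are widened. FIG. 10 shows the same for a small hypersphere whose center w_ij lies outside the domain of the input variables x_1, x_2.
[0066]
When the center (w_1j, w_2j) lies inside the domain of the input variables x_1, x_2 (the range [0, 1]^2) and the radius θ_j is limited to a small range of about [0, 0.5] (see FIG. 8(a)), intermediate unit 42j outputs a Y_j represented by surface 61 or 62, which reflects local features (see FIG. 8(b)).
[0067]
Surface 61 is obtained with center (w_1j, w_2j) = (0.5, 0.5), radius θ_j = 0.1 and T_j = 0.03. Surface 62 is obtained with center (w_1j, w_2j) = (0.5, 0.5), radius θ_j = 0.1 and T_j = 0.05.
[0068]
Thus, when the center (w_1j, w_2j) is set inside the domain of x_1, x_2 and the radius θ_j is small, intermediate unit 42j outputs a Y_j represented by surfaces 61 and 62, which reflect local features.
[0069]
When the center (w_1j, w_2j) is allowed to lie outside the domain of x_1, x_2 (the range [−0.2, 1.2]^2) and the radius θ_j may take values in the large range [0, 1.5] (see FIG. 9(a)), intermediate unit 42j outputs a Y_j represented by surface 63 or 64 (see FIG. 9(b)).
[0070]
Surface 63 is obtained with center (w_1j, w_2j) = (1.2, 1.2), radius θ_j = 1.0 and T_j = 0.1. Surface 64 is obtained with center (w_1j, w_2j) = (1.2, 1.2), radius θ_j = 1.0 and T_j = 0.5.
[0071]
Surfaces 63 and 64 resemble the surface shown in FIG. 5. Thus, when the center (w_1j, w_2j) is set outside the domain of x_1, x_2 and the radius θ_j is large, intermediate unit 42j outputs a Y_j represented by surfaces 63 and 64, which reflect global features. In other words, intermediate unit 42j approximately performs a hyperplane-discrimination-type operation. This means that, by setting the ranges of θ_j and w_ij in the hypersphere-discrimination-type operation wider than the ranges that reflect local features, intermediate unit 42j can output a Y_j that reflects global features.
[0072]
When the center (w_1j, w_2j) lies outside the domain of x_1, x_2 and the radius θ_j is small (see FIG. 10(a)), intermediate unit 42j outputs a Y_j represented by surface 65 (see FIG. 10(b)).
[0073]
In this case intermediate unit 42j outputs surface 65, which is almost zero over the entire domain, and contributes almost nothing to the final output, the computed output value Z_1(n).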
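The three regimes described for FIGS. 8 to 10 can be checked numerically with a small sketch. It reuses the same assumed unit form as before (y_j = θ_j² − squared distance to the center, logistic output with slope T_j), since equations (1) and (2) are missing from this copy. The parameter values for the local and global cases are the ones quoted for surfaces 61 and 63; the values for the third case are hypothetical, because the text gives none for FIG. 10.

```python
import math


def unit_output(x1, x2, center, radius, slope):
    """Assumed hypersphere-type unit: logistic((radius^2 - dist^2) / slope)."""
    d2 = (x1 - center[0]) ** 2 + (x2 - center[1]) ** 2
    u = (radius ** 2 - d2) / slope
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def summarize(center, radius, slope, steps=20):
    """Return (min, max, fraction of grid points with output > 0.5) over [0, 1]^2."""
    values = [unit_output(i / steps, k / steps, center, radius, slope)
              for i in range(steps + 1) for k in range(steps + 1)]
    active = sum(1 for v in values if v > 0.5) / len(values)
    return min(values), max(values), active


local_case = summarize((0.5, 0.5), 0.1, 0.03)   # parameters quoted for surface 61
global_case = summarize((1.2, 1.2), 1.0, 0.1)   # parameters quoted for surface 63
silent_case = summarize((1.5, 1.5), 0.1, 0.05)  # hypothetical values for the FIG. 10 case

for name, case in (("local", local_case), ("global", global_case), ("silent", silent_case)):
    print(name, case)
```

Under these assumptions the local unit is active on only a few grid points, the wide-range unit switches across a large part of the domain much like a hyperplane unit, and the third unit stays near zero everywhere.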
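[0074]
As described above, by changing the ranges of the parameters θ_j, w_ij and T_j (that is, τ_j), intermediate unit 42j can, with the hypersphere-discrimination-type operation alone, output both a Y_j that reflects local features (see FIG. 8) and a Y_j that reflects global features (see FIG. 9). Reflecting both local and global features in this way does not increase the number of parameters of intermediate unit 42j. In this invention, therefore, intermediate unit 42j performs the hypersphere-discrimination-type operation while varying θ_j, w_ij and T_j (that is, τ_j) over a range wider than that of an ordinary hypersphere-discrimination-type operation, and outputs Y_j. This is why the hypersphere-discrimination-type operation was chosen for intermediate unit 42j.
[0075]
The "normal search range" means the range over which θ_j, w_ij and T_j (that is, τ_j) are varied when the hypersphere-discrimination-type operation is to reflect only local features in the output Y_j.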
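[0076]
The output Y_j of intermediate unit 42j is supplied to output unit 431, which substitutes Y_j and the parameters W_j and Θ into equation (4) to compute the computed output value Z_1(n).
[0077]
Once Z_1(n) has been computed, the mean of the sum of squared errors between the output values z_1(n) contained in sample data 30 and the computed output values Z_1(n), V(w_ij, θ_j, τ_j, W_j, Θ), is computed by equation (6).
[0078]
[Equation 6]

[0079]
The mean of the sum of squared errors V(w_ij, θ_j, τ_j, W_j, Θ) is an index of how close the computed output values Z_1(n) produced by the three-layer neural network 40 are to the actual output values z_1(n); it is the cost function that evaluates Z_1(n).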
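A minimal sketch of this cost function, assuming it is the plain mean of the squared errors over the S samples. Equation (6) itself is missing from this copy, so any constant prefactor the original may have carried (for example 1/2) is not reproduced here.

```python
def cost(targets, computed):
    """Mean of squared errors between sample outputs z(n) and computed outputs Z(n)."""
    if len(targets) != len(computed) or not targets:
        raise ValueError("targets and computed must be non-empty and of equal length")
    return sum((z - Z) ** 2 for z, Z in zip(targets, computed)) / len(targets)


print(cost([1.0, 2.0], [1.0, 4.0]))  # (0 + 4) / 2
```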
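[0080]
The program according to the invention varies the parameters w_ij, θ_j, τ_j, W_j and Θ (the intermediate-unit parameters w_ij, θ_j and τ_j over a range wider than the range that reflects local features), computes the computed output values Z_1(n), and optimizes the parameters so that the value of the cost function V(w_ij, θ_j, τ_j, W_j, Θ), obtained by substituting Z_1(n) into equation (6), falls to or below a predetermined value ε. The program first sets the number of intermediate units 42j to one (j = 1). If the cost function value finally obtained with that single intermediate unit 421 is larger than ε, the program increases the number of intermediate units one at a time, optimizing the parameters each time, until the value is at or below ε.
[0081]
Once the parameters w_ij, θ_j, τ_j, W_j and Θ are optimized, the approximate function f defining the relationship between the input values x_1(n), ..., x_m1(n) and the output values z_1(n) (n = 1 to S) is determined. Optimizing these parameters is therefore equivalent to performing the operation that obtains f.
[0082]
As described above, intermediate unit 42j performs the hypersphere-discrimination-type operation while w_ij, θ_j and τ_j are varied over a range wider than the range that reflects local features. Varying them over a wide range, however, enlarges the flat regions of the cost function V(w_ij, θ_j, τ_j, W_j, Θ). As FIG. 10(a) shows, when the center w_ij lies outside the domain of the input values and the radius θ_j is small, the output Y_j of intermediate unit 42j is almost zero over the entire domain and contributes almost nothing to the final output Z_1(n). In the region of parameter space corresponding to this situation, the cost function is therefore flat.
[0083]
Algorithms commonly used to optimize parameters such as w_ij, θ_j, τ_j, W_j and Θ, for example error back-propagation (BP: Error Back-Propagation Algorithm) and simulated annealing (SA: Simulated Annealing), are very inefficient on cost functions with wide flat regions. One could add a penalty term to the cost function to avoid wide flat regions, but that distorts the original cost function and increases the computation needed to evaluate it.
[0084]
This invention therefore adopts the high-dimensional algorithm, an optimization method that is effective for cost functions containing wide flat regions and the multiple local solutions that are a constant problem in neural network training.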
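[0085]
Parameter optimization by the high-dimensional algorithm is described next. FIG. 11 is a flowchart of optimization by the high-dimensional algorithm. The space of all variables q that appear in the problem and are to be optimized is defined as the semantic space (step A). A variable p conjugate to q is introduced artificially, and a new space of the variable p is defined (step B). The space of p is then added to the semantic space, and the resulting higher-dimensional space is defined as the high-dimensional space (step C).
[0086]
The problem is then set up in the high-dimensional space of the variables q and p (step D). Finally, the optimal solution is searched for by autonomous motion in the high-dimensional space, and the solution that minimizes the cost function is detected (step E). Autonomous motion here means motion that passes quickly through regions where no solution exists and easily enters regions where a solution exists.
[0087]
In this way, the high-dimensional algorithm raises the dimensionality of the semantic space of the variables q to be optimized and detects the optimal solution by autonomous motion in the resulting high-dimensional space.
[0088]
FIG. 12 is a conceptual diagram of how the high-dimensional algorithm searches for a solution. FIG. 13 shows the landscape of a cost function with two parameters (k1, k2). The landscape in FIG. 13 has many peaks and valleys, and the solution corresponds to the lowest point (minimum) among all the valleys. To find the solution, one must pass through many intricately arranged peaks and valleys to reach the lowest valley.
[0089]
That is, as FIG. 12 shows, reaching solution 71 in the semantic space 70 of the variables q requires moving along the path indicated by arrow 72 through many peaks and valleys. Reaching a small solution 71 in a large semantic space 70 is difficult.
[0090]
A variable p conjugate to q is therefore added to define the high-dimensional space 80 of the variables q and p. In the high-dimensional space 80, the regions that are not solutions in the semantic space 70 become small and the region corresponding to the solution is enlarged. Solution 71 of the semantic space 70 is thus enlarged into region 81, where the solution exists, and motion along the path indicated by arrow 82 reaches region 81 easily. Because non-solution regions shrink and the solution region grows in the high-dimensional space 80, motion along arrow 82 passes quickly through non-solution regions and easily enters region 81. In other words, autonomous motion takes place in the high-dimensional space 80 and the solution is detected.
[0091]
That raising the dimensionality of the search space makes the target easier to find can be understood from the following example. Consider searching for a long hexagonal pencil. Viewed head-on, only its small hexagonal cross-section is visible, but if the line of sight is shifted, its length becomes visible too. Regarding this "length" as a dimension makes it easy to see why going from the semantic space 70 to the high-dimensional space 80 makes the solution easier to find.
[0092]
Because the high-dimensional algorithm raises the dimensionality of the search space in this way and searches for the solution in the resulting high-dimensional space, it is unlikely to be trapped in local solutions of the cost function, passes quickly through flat regions of the cost function, and finds the global solution easily.
[0093]
When the high-dimensional algorithm is used to optimize the parameters w_ij, θ_j, τ_j, W_j and Θ so that the computed output values Z_1(n) of the three-layer neural network 40 approach the actual output values z_1(n), the program according to the invention uses a Hamiltonian H(q, p), a function of the variables q and p in the high-dimensional space 80. The Hamiltonian H(q, p) is a concrete tool for representing a dynamical system in motion. It is used because, in the space it describes, it is easy to introduce both the variable q forming the semantic space 70 and the variable p that is conjugate to q and raises the semantic space 70 to the high-dimensional space 80. In a real dynamical system, q represents the position of a moving body and p represents its velocity. The high-dimensional algorithm is an optimization method characterized by searching for optimal parameters by analogy with such a dynamical system.
[0094]
For the problem of obtaining the approximate function in this invention, the variables to be optimized by the high-dimensional algorithm are q = w_ij, θ_j, τ_j, W_j, Θ (see step A of FIG. 11). These variables correspond to the position of the moving body. The potential energy V(q) is a function of q = w_ij, θ_j, τ_j, W_j, Θ.
[0095]
Next, a variable p conjugate to q is introduced artificially, and the new space formed by p is defined (see step B of FIG. 11). The high-dimensional space is defined as the semantic space with this new space added (see step C of FIG. 11). As stated above, p corresponds to the velocity of the moving body, and a function corresponding to the kinetic energy T(p), a function of p, is defined as for a dynamical system in motion (see step D of FIG. 11).
[0096]
The function H(q, p) in the high-dimensional space is then defined in terms of q and p by adding the function T(p) of the new space to the function V(q) of the semantic space (see step D of FIG. 11).
[0097]
In a Hamiltonian dynamical system, the position of the moving body at any time t is determined by the equations of motion derived from the Hamiltonian H(q, p). The high-dimensional algorithm proceeds in the same way.
[0098]
Finally, detecting the optimal solution by autonomous motion in the high-dimensional space corresponds to solving the equations of motion derived from H(q, p) and finding the optimized variable q (see step E of FIG. 11).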
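[0099]
The potential energy V(q) is given by equation (7).
[0100]
[Equation 7]

[0101]
That is, the potential energy V(q) is the sum of the cost function V_C(q) and a constraint potential V_L(q). The constraint potential V_L(q) limits the search range of the parameters q = w_ij, θ_j, τ_j, W_j, Θ. It is given by equation (8).
[0102]
[Equation 8]

[0103]
In equation (8), c is a parameter that controls the strength of the constraint; in this embodiment c = 1. The function ν_n(q_n) in equation (8) is given by equation (9).
[0104]
[Equation 9]

[0105]
In equation (9), θ(u) denotes the step function, where u stands for a_n − q_n or q_n − b_n: θ(u) = 1 when u > 0 and θ(u) = 0 when u < 0.
[0106]
That is, ν_n(q_n) is represented by curve 89 in FIG. 14 and defines the limits of the search range of the parameters q = w_ij, θ_j, τ_j, W_j, Θ.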
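The role of the constraint potential can be illustrated with the sketch below. Equations (8) and (9) are missing from this copy, so the wall shape is an assumption: the sketch uses a quadratic penalty that is zero inside the search interval [a_n, b_n] and grows outside it, gated by the step function described in the text. Only c = 1 and the use of a step function come from the source; the function names and the quadratic exponent are illustrative.

```python
def step(u):
    """Step function: 1 for u > 0, 0 otherwise."""
    return 1.0 if u > 0 else 0.0


def nu(q, a, b):
    """Assumed wall term: zero for a <= q <= b, quadratic outside the interval."""
    return (a - q) ** 2 * step(a - q) + (q - b) ** 2 * step(q - b)


def constraint_potential(qs, bounds, c=1.0):
    """V_L(q): strength c times the sum of the wall terms over all parameters."""
    return c * sum(nu(q, a, b) for q, (a, b) in zip(qs, bounds))


bounds = [(0.0, 1.0), (0.0, 1.0)]
print(constraint_potential([0.5, 0.5], bounds))  # inside the search range
print(constraint_potential([0.5, 2.0], bounds))  # second parameter one unit outside
```

Because the penalty vanishes inside the interval, it leaves the cost function untouched there and only pushes the trajectory back when a parameter leaves the allowed range.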
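[0107]
The kinetic energy T(p) is given by equation (10), and the Hamiltonian H(q, p) by equation (11).
[0108]
[Equation 10]

[0109]
[Equation 11]

[0110]
As a result, the equations of motion describing the motion of the system are given by equation (12).
[0111]
[Equation 12]

[0112]
The derivatives, with respect to the parameters q = w_ij, θ_j, τ_j, W_j, Θ, of the cost function V_C(q) and of the constraint potential V_L(q) that appear on the right-hand side of the lower equation in (12) are given by equations (13) and (14), respectively.
[0113]
[Equation 13]

[0114]
[Equation 14]

[0115]
Starting from randomly chosen initial values (q(0), p(0)), the set of 2N first-order differential equations in (12) is solved numerically with the Verlet method or the Runge-Kutta method to obtain the trajectory (q(t), p(t)) of the system. By monitoring the cost function V_C(q(t)) along the trajectory q(t) for a sufficiently long time, the optimal values of the parameters q = w_ij, θ_j, τ_j, W_j, Θ can be found.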
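The procedure in this paragraph (integrate the equations of motion, monitor the cost along the trajectory, keep the best point) can be sketched as follows. Equations (10) to (14) are missing from this copy, so the sketch assumes the standard form T(p) = Σ p_n²/2, which gives dq/dt = p and dp/dt = −∂V/∂q, and integrates with velocity Verlet. The cost function is a toy double-well chosen only because it has several minima; it is not the network cost of equation (6), and the constraint potential is omitted because the toy cost is already confining.

```python
import random


def cost(q):
    """Toy multi-minimum cost standing in for V_C(q); minima at every q_n = +/-1."""
    return sum((qi * qi - 1.0) ** 2 for qi in q)


def grad(q):
    """Gradient of the toy cost."""
    return [4.0 * qi * (qi * qi - 1.0) for qi in q]


def hamiltonian(q, p):
    """H(q, p) = T(p) + V(q) with the assumed kinetic energy T(p) = sum(p^2) / 2."""
    return 0.5 * sum(pi * pi for pi in p) + cost(q)


def hda_search(q, p, dt=0.005, steps=4000):
    """Velocity-Verlet integration; returns the lowest-cost point seen on the trajectory."""
    best_q, best_cost = list(q), cost(q)
    g = grad(q)
    for _ in range(steps):
        p = [pi - 0.5 * dt * gi for pi, gi in zip(p, g)]  # half kick
        q = [qi + dt * pi for qi, pi in zip(q, p)]        # drift
        g = grad(q)
        p = [pi - 0.5 * dt * gi for pi, gi in zip(p, g)]  # half kick
        c = cost(q)
        if c < best_cost:                                 # monitor V_C along q(t)
            best_q, best_cost = list(q), c
    return best_q, best_cost, q, p


rng = random.Random(0)
q0 = [rng.uniform(-2.0, 2.0) for _ in range(3)]
p0 = [rng.uniform(-1.0, 1.0) for _ in range(3)]
best_q, best_cost, q_end, p_end = hda_search(q0, p0)
print(best_cost, hamiltonian(q0, p0), hamiltonian(q_end, p_end))
```

The trajectory conserves H up to integration error, so it never settles; the answer is read off from the monitored minimum rather than from the end point, which matches the description of watching V_C(q(t)) over time.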
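[0116]
The system of motion described by the equations of motion (12) is a dynamical system that is mixing and whose total energy E is constant. Such a system is expected to move everywhere on the constant-energy surface H(q, p) = E in phase space with equal probability (the principle of equal a priori probabilities). On the basis of this principle, the expected time δ(q) that the system spends in the small volume between q and q + dq near position q is given by equation (15) (K. Shinjo and T. Sasada, "Hamiltonian systems with many degrees of freedom: asymmetric motion and intensity of motion in phase space", Physical Review E 54, pp. 4685-4700, 1996).
[0117]
[Equation 15]

[0118]
Equation (15) shows that, when the number of degrees of freedom N is 3 or more, the expected residence time is low in regions of high potential and high in regions of low potential, and that this tendency becomes more pronounced as N increases.
[0119]
By searching for the optimal solution in analogy with a dynamical system governed by a Hamiltonian H(q, p) with these properties, the high-dimensional algorithm is unlikely to be trapped in local solutions of the cost function and passes quickly through its flat regions.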
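[0120]
These features of the high-dimensional algorithm are explained conceptually below.
FIG. 15 compares the high-dimensional algorithm (HA) and simulated annealing (SA) in terms of how readily they are trapped in local solutions of the cost function. FIG. 15(a) shows the HA case and FIG. 15(b) the SA case. In both, the horizontal axis is the variable q to be optimized and the vertical axis is the cost function V(q).
[0121]
FIG. 16 compares HA and SA in terms of how fast they pass through flat regions of the cost function. FIG. 16(a) shows the HA case and FIG. 16(b) the SA case. In both, the horizontal axis is the variable q to be optimized and the vertical axis is the cost function V(q).
[0122]
As FIG. 15 shows, in HA the kinetic energy E is what allows escape from a valley (local solution) of the cost function V(q), and because the total energy is constant, the kinetic energy E is large where the cost function value is small. That is, the kinetic energy E grows once the motion enters a local solution. HA is therefore unlikely to be trapped in local solutions (see FIG. 15(a)). In SA, the motion can escape a local solution because it has a positive absolute temperature, but this temperature does not depend on position. If the temperature is lower than the peaks of the cost function, escaping the local solution is difficult. As a result, SA is easily trapped in local solutions of the cost function (see FIG. 15(b)).
[0123]
As FIG. 16 shows, HA moves in one direction by uniform linear motion and therefore passes quickly through flat regions of the cost function V(q) (see FIG. 16(a)). SA moves by a random walk (stepping randomly to the right or left on the page at each step) and therefore takes a long time to pass through flat regions of V(q) (see FIG. 16(b)).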
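The contrast drawn for FIG. 16 can be made concrete with a toy one-dimensional model, given here only as an illustration and not as a reproduction of either algorithm: on a perfectly flat stretch of a given width, uniform motion needs a number of steps proportional to the width, whereas an unbiased random walk needs at least that many and typically far more. The lattice model, the reflecting wall at the starting edge, and the function names are all assumptions of this sketch.

```python
import random


def ballistic_steps(width, step=1):
    """Steps needed to cross a flat region of the given width at constant velocity."""
    position, n = 0, 0
    while position < width:
        position += step
        n += 1
    return n


def random_walk_steps(width, rng, max_steps=10 ** 6):
    """Steps an unbiased +/-1 random walk needs to cross the same region."""
    position, n = 0, 0
    while position < width and n < max_steps:
        position = max(0, position + rng.choice((-1, 1)))  # reflecting wall at 0
        n += 1
    return n


rng = random.Random(1)
trials = [random_walk_steps(50, rng) for _ in range(20)]
print(ballistic_steps(50), sum(trials) / len(trials))
```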
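[0124]
Thus the high-dimensional algorithm is unlikely to be trapped in local solutions of the cost function and passes quickly through flat regions. As a result, it reaches the optimal solution with little computation.
[0125]
FIG. 17 is a flowchart of the operation by which the program according to the invention obtains the approximate function f defining the relationship between the input sets x_1(n), ..., x_m1(n) and the output values z_1(n). When the operation starts, the m1 × S input values and S output values contained in sample data 30 are accepted (step S1). The parameters w_ij, θ_j, τ_j, W_j and Θ of the three-layer neural network 40 are then optimized using a single intermediate unit 42j (j = 1).
[0126]
That is, the parameters w_ij, θ_j and τ_j of intermediate unit 42j are varied over a wide search range, and the parameters w_ij, θ_j, τ_j, W_j and Θ of the three-layer neural network 40 are optimized with the high-dimensional algorithm so as to obtain an approximate function f that mimics the relationship between m1 input values and one output value (step S2). Once these parameters are optimized, the approximate function f is determined and the series of operations ends.
[0127]
FIG. 18 is a flowchart explaining step S2 of FIG. 17 in detail. After step S1 of FIG. 17, the parameters w_ij, θ_j, τ_j, W_j and Θ of the three-layer neural network 40 are set to initial values, and S computed output values Z_1(1), ..., Z_1(S) are computed by the hypersphere-discrimination-type operation (step S21).
[0128]
A cost function value evaluating the S computed output values Z_1(1), ..., Z_1(S) is then computed (step S22). It is then determined whether the cost function value is at or below the predetermined value ε (step S23); if it is, the series of operations ends.
[0129]
If it is determined in step S23 that the cost function value is not at or below ε, it is determined whether the number of calculations is at or below the prescribed number (step S24). If it is, parameters for reducing the cost function value are computed by the high-dimensional algorithm over the wide search range (step S25).
[0130]
That is, equations (13) and (14) are used to obtain the partial derivatives, with respect to each of w_ij, θ_j, τ_j, W_j and Θ, of the cost function value computed in step S22 and of the constraint, and equation (12) is used to compute the next values of w_ij, θ_j, τ_j, W_j and Θ. These values are then used as the parameters for the computation in step S21.
[0131]
In this way, steps S21 to S24 are executed repeatedly after step S25.
[0132]
If it is determined in step S24 that the number of calculations is not at or below the prescribed number, the parameters that gave the smallest cost function value in the series of iterations are fixed as the optimal values for that intermediate unit, and the number of intermediate units 42j of the three-layer neural network 40 is increased by one (step S26). Steps S21 to S25 are then executed repeatedly for the newly added intermediate unit.
[0133]
Thus, in the program according to the invention, the number of intermediate units 42j is first set to one. With that single intermediate unit 421, the high-dimensional algorithm varies the parameters w_ij, θ_j, τ_j, W_j and Θ over the wide search range and the corresponding cost function values are computed one after another, in order to find parameters that make the cost function value small. If the smallest cost function value is still not at or below ε after the prescribed number of calculations with one intermediate unit, the number of intermediate units 42j is increased by one and the same computation is repeated. If the cost function value is still not at or below ε after the prescribed number of calculations with two intermediate units, the number is increased by one more. The search for new parameters by the high-dimensional algorithm and the increase in the number of intermediate units 42j are repeated until the cost function value is determined to be at or below ε in step S23 of FIG. 18.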
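The control flow of steps S21 to S26 (evaluate, compare with ε, retry up to a prescribed count, then freeze the best unit and add another) can be sketched as below. This is only a skeleton of the loop: plain random sampling over the wide search range stands in for the high-dimensional algorithm, the unit and output formulas are the assumed forms used elsewhere in these notes (equations (1) to (4) are missing), and the ranges for T_j, W_j and Θ as well as all names are hypothetical.

```python
import math
import random


def unit_output(x, center, radius, slope):
    """Assumed hypersphere-type unit: logistic((radius^2 - dist^2) / slope)."""
    u = (radius * radius - sum((xi - wi) ** 2 for xi, wi in zip(x, center))) / slope
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def predict(x, units, big_theta):
    return sum(W * unit_output(x, c, r, T) for (c, r, T, W) in units) - big_theta


def cost(samples, units, big_theta):
    return sum((z - predict(x, units, big_theta)) ** 2 for x, z in samples) / len(samples)


def random_unit(m1, rng):
    """Draw one unit from the wide search range (centers may leave [0, 1]^m1)."""
    center = tuple(rng.uniform(-0.2, 1.2) for _ in range(m1))
    return (center, rng.uniform(0.0, 1.5), rng.uniform(0.01, 0.5), rng.uniform(-1.0, 1.0))


def fit(samples, eps, k, max_units, rng):
    """Grow the network one unit at a time; requires max_units >= 1."""
    m1 = len(samples[0][0])
    frozen = []                                   # units fixed at their best values (S26)
    best = None
    while len(frozen) < max_units:
        best = None                               # (cost, unit, big_theta)
        for _ in range(k):                        # S21-S25, at most k evaluations
            unit, big_theta = random_unit(m1, rng), rng.uniform(-1.0, 1.0)
            c = cost(samples, frozen + [unit], big_theta)
            if best is None or c < best[0]:
                best = (c, unit, big_theta)
            if c <= eps:                          # S23: required accuracy reached
                return frozen + [unit], big_theta, c
        frozen.append(best[1])                    # S26: freeze best unit, add another
    return frozen, best[2], best[0]


grid = [i / 4 for i in range(5)]
samples = [((a, b), math.exp(-5 * ((a - 0.5) ** 2 + (b - 0.5) ** 2))) for a in grid for b in grid]
units, big_theta, final_cost = fit(samples, eps=0.003, k=200, max_units=3, rng=random.Random(0))
print(len(units), final_cost)
```

Swapping the random proposal for a trajectory-based search would change only the inner loop; the freeze-and-grow structure around it stays the same.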
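[0134]
When step S2 of FIG. 17 is first executed, the number of intermediate units 42j in intermediate layer 42 is set to one, so intermediate layer 42 computes the output Y_1 with intermediate unit 421 alone. FIG. 19 is a conceptual diagram of the three-layer neural network 40 with one intermediate unit. The network computes the computed output value Z_1(n) with the input layer 41, intermediate layer 42A and output layer 43 shown in FIG. 19.
[0135]
In this case intermediate unit 421 receives the input values x_1(n), ..., x_m1(n) (n = 1 to S) from the input units 41i (i = 1, ..., m1) of input layer 41 and substitutes them, together with the coupling parameters w_1,1, w_2,1, ..., w_m1,1 and the threshold θ_1, into equation (1) to compute the internal state y_1. It then substitutes y_1 and the parameter T_1 into equation (2) to compute the output Y_1, which it sends to output unit 431 of output layer 43.
[0136]
Output unit 431 substitutes the output Y_1 received from intermediate unit 421, the coupling weight W_1 and the threshold Θ into equation (4) to compute the computed output value Z_1(n). That is, the computed output value Z_11(n) is computed with the parameters (w_i1, θ_1, τ_1, W_1, Θ)_11. In the subscript {11} of (w_i1, θ_1, τ_1, W_1, Θ)_11 and Z_11(n), the first {1} is the number of intermediate units 42j and the second {1} indicates that these are the first parameters set.
[0137]
In the operation for obtaining the approximate function f defining the relationship between the input values x_1(n), ..., x_m1(n) (n = 1 to S) and the output values z_1(n), the cost function V_C(q) is given by equation (6). The computed output values Z_11(n) and the actual output values z_1(n) are therefore substituted into equation (6) to compute the value V_11 of the cost function V((w_i1, θ_1, τ_1, W_1, Θ)_11) (see step S22 of FIG. 18). The subscript {11} of V_11 has the same meaning as for the parameters and for Z_11(n).
[0138]
If V_11 is not at or below ε, parameters (w_i1, θ_1, τ_1, W_1, Θ)_12 for obtaining a cost function value V_12 smaller than V_11 are found with the high-dimensional algorithm (see step S25 of FIG. 18). The S computed output values Z_12(n) are then computed with those parameters (see step S21 of FIG. 18), and V_12 is computed by equation (6) (see step S22 of FIG. 18).
[0139]
It is then determined whether V_12 is at or below ε (step S23 of FIG. 18). If it is not, parameters (w_i1, θ_1, τ_1, W_1, Θ)_1h for obtaining a cost function value V_1h smaller than the smallest value computed so far are computed (h: a natural number with h ≤ k; k: the prescribed number) (see step S25 of FIG. 18).
[0140]
Thus, while the number of calculations is at or below the prescribed number k, the high-dimensional algorithm computes parameter values (w_i1, θ_1, τ_1, W_1, Θ)_1h aimed at as small a cost function value V_1h as possible (h a natural number not exceeding k), a new cost function value V_1h is computed with them, and this is repeated. In other words, the high-dimensional algorithm varies the parameters (w_i1, θ_1, τ_1, W_1, Θ) while the corresponding cost function values are monitored, in search of the smallest cost function value and the parameters that produce it.
[0141]
When the number of calculations reaches the prescribed number, the number of intermediate units 42j is increased by one and set to two (see step S26 of FIG. 18).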
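[0142]
FIG. 20 is a conceptual diagram of the three-layer neural network 40 with two intermediate units. The network computes the computed output value Z_21(n) with the input layer 41, intermediate layer 42B and output layer 43 shown in FIG. 20.
[0143]
In this case intermediate unit 421 receives the input values x_1(n), ..., x_m1(n) (n = 1 to S) from the input units 41i (i = 1, ..., m1) of input layer 41 and substitutes them, together with the coupling parameters w_1,1, w_2,1, ..., w_m1,1 and the threshold θ_1, into equation (1) to compute the internal state y_1. It then substitutes y_1 and the parameter T_1 into equation (2) to compute the output Y_1, which it sends to output unit 431 of output layer 43. The parameters w_1,1, w_2,1, ..., w_m1,1, θ_1 and T_1 are fixed at the values (w_i1, θ_1, τ_1, W_1)_1k at which the smallest cost function value was recorded.
[0144]
Intermediate unit 422 likewise receives the input values x_1(n), ..., x_m1(n) (n = 1 to S) from the input units 41i (i = 1, ..., m1) and substitutes them, together with the coupling parameters w_1,2, w_2,2, ..., w_m1,2 and the threshold θ_2, into equation (1) to compute the internal state y_2. It then substitutes y_2 and the parameter T_2 into equation (2) to compute the output Y_2, which it sends to output unit 431 of output layer 43.
[0145]
Output unit 431 substitutes the output Y_1 from intermediate unit 421, the output Y_2 from intermediate unit 422, the coupling weights W_1 and W_2, and the threshold Θ into equation (4) to compute the computed output value Z_21(n).
[0146]
The cost function value V_21 is then computed from Z_21(n), and the same computation as with intermediate unit 421 alone is repeated. If, with two intermediate units, the cost function value V_2h does not fall to or below ε even after the high-dimensional algorithm has varied the parameters (w_i1, θ_1, τ_1, W_1, Θ)_2h in search of a small V_2h for the prescribed number k of calculations, the number of intermediate units 42j is increased by one more and the same computation as with two units is repeated.
[0147]
Thus, once the number of intermediate units 42j is set to some value, new parameters (w_ij, θ_j, τ_j, W_j, Θ) are computed one after another by the high-dimensional algorithm in order to find the optimal values that reduce the cost function value evaluating the computed output values Z(n) obtained with that number of units. If the cost function value is not at or below ε even after the prescribed number of such computations, the number of intermediate units 42j is increased by one, and new parameters (w_ij, θ_j, τ_j, W_j, Θ) are again computed one after another by the high-dimensional algorithm so as to reduce the cost function value.
[0148]
The program according to the invention is therefore characterized by optimizing the parameters (w_ij, θ_j, τ_j, W_j, Θ) while increasing the number of intermediate units 42j of the three-layer neural network 40 one at a time.
[0149]
Although the number of intermediate units 42j was described above as initially set to "1", the invention is not limited to this, and the number may initially be set to more than one. That is, in this invention the number of intermediate units 42j need only be set initially to an initial value of one or more.
[0150]
The number of parameters (w_ij, θ_j, τ_j, W_j, Θ) of the three-layer neural network 40 increases by (m1 + 3) each time the number of intermediate units 42j increases by one. For example, when the number of intermediate units goes from one to two, the full parameter set grows by (m1 + 3), from (w_i1, θ_1, τ_1, W_1, Θ) to (w_i1, θ_1, τ_1, W_1; w_i2, θ_2, τ_2, W_2, Θ). Optimizing the parameters while increasing the number of intermediate units 42j one at a time is therefore equivalent to optimizing them while increasing the number of parameters by a predetermined number at a time. Of the parameters that existed before the increase, the values of (w_i1, θ_1, τ_1, W_1) are fixed; of the added parameters (w_i2, θ_2, τ_2, W_2) and the parameter Θ, the parameters (w_i2, θ_2, τ_2) of intermediate unit 422, which performs the hypersphere-discrimination-type operation, are varied over the wide search range to compute the computed output value Z_1(n). In this case, step S2 of FIG. 17 optimizes the parameters with the number of parameters set to a predetermined initial value.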
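The parameter bookkeeping in this paragraph can be written out directly. Each intermediate unit contributes m1 coupling parameters w_ij plus θ_j, τ_j and W_j, and the output threshold Θ is shared, so a network with m2 units has m2·(m1 + 3) + 1 parameters. Since the equations of motion are described as 2N first-order equations for N parameters, the phase space has twice that dimension. The function names are illustrative.

```python
def parameter_count(m1, m2):
    """Number N of parameters: (m1 + 3) per intermediate unit plus the shared threshold."""
    return m2 * (m1 + 3) + 1


def phase_space_dimension(m1, m2):
    """Dimension 2N of the (q, p) space used by the high-dimensional algorithm."""
    return 2 * parameter_count(m1, m2)


for m2 in (1, 2, 3):
    print(m2, parameter_count(2, m2), phase_space_dimension(2, m2))
```

For two inputs this gives 6 parameters with one unit and 11 with two, an increase of m1 + 3 = 5 per added unit.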
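[0151]
The program according to the invention is therefore characterized by optimizing the parameters while increasing the number of parameters of the three-layer neural network by a predetermined number at a time.
[0152]
The effect of the three-layer neural network 40 in the operation for obtaining the approximate function f, that is, the effect of performing the hypersphere-discrimination-type operation in intermediate units 42j, is described next.
[0153]
FIG. 21 shows the contour lines of three two-input, one-output test functions. FIG. 21(a) shows the contour lines of the test function z = −0.76x_1 + 0.19x_2 + 0.78. FIG. 21(b) shows the contour lines of the test function z = sin(πx_1)·sin(πx_2). FIG. 21(c) shows the contour lines of the test function z = 0.5exp{−5(x_1 − 0.2)^2 − 5(x_2 − 0.2)^2} + 0.9exp{−5(x_1 − 0.8)^2 − 10(x_2 − 0.6)^2}. In FIG. 21(a), (b) and (c), the horizontal and vertical axes are the input variables x_1 and x_2.
[0154]
The learning conditions were as follows.

For each of problems (i), (ii) and (iii), both the hypersphere-discrimination-type operation and the hyperplane-discrimination-type operation were performed and the results compared. In both cases the parameters were optimized with the high-dimensional algorithm.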
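[0155]
For problem (i) the following results were obtained.
<1> Hyperplane-discrimination network
Number of intermediate units acquired = 1
Average number of learning iterations ≈ 11,507
<2> Hypersphere-discrimination network
Number of intermediate units acquired = 1
Average number of learning iterations ≈ 64,103
As FIG. 21(a) shows, the contour lines of this test function are linear, so the problem clearly favors the hyperplane-discrimination network. Even so, the hypersphere-discrimination network achieved the relatively strict required accuracy (ε ≤ 0.003) with a single intermediate unit. This shows that widening the parameter search range lets the network reflect global as well as local features and strengthens its mapping ability.
[0156]
On the other hand, the hypersphere-discrimination network needed about six times as many learning iterations on average as the hyperplane-discrimination network. This is due to the wider parameter search range, but despite the clearly unfavorable case the increase is only about a factor of six.
[0157]
The results for problem (ii) are as follows.
<1> Hyperplane-discrimination network
Number of intermediate units acquired = 8
Average number of learning iterations ≈ 393,829
<2> Hypersphere-discrimination network
Number of intermediate units acquired = 1
Average number of learning iterations ≈ 12,045
As FIG. 21(b) shows, the contour lines of this test function are closed curves, which clearly favors the hypersphere-discrimination network. For the hyperplane-discrimination network, convergence rapidly became difficult when the required accuracy was tightened to ε ≤ 0.03, so ε ≤ 0.05 was used. The number of intermediate units required was eight for the hyperplane-discrimination network and one for the hypersphere-discrimination network, which shows that the mapping ability of the hypersphere-discrimination network is markedly superior.
[0158]
As for the average number of learning iterations, the hyperplane-discrimination network, which needs eight times as many intermediate units (and parameters), required about 33 times as many, which shows that the hyperplane-discrimination network is inefficient at expressing local features.
[0159]
Finally, the results for problem (iii) are as follows.
<1> Hyperplane-discrimination network
Number of intermediate units acquired = 6
Average number of learning iterations ≈ 264,706
<2> Hypersphere-discrimination network
Number of intermediate units acquired = 2
Average number of learning iterations ≈ 2,665
As FIG. 21(c) shows, this test function has asymmetric closed contour lines and is an example of the kind likely to occur in real applications. Because the hyperplane-discrimination network had difficulty achieving the required accuracy ε ≤ 0.01, ε ≤ 0.03 was used for it.
[0160]
The number of intermediate units is two for the hypersphere-discrimination network and six for the hyperplane-discrimination network, so here too the hypersphere-discrimination network has the higher mapping ability. The hyperplane-discrimination network also needed nearly 100 times as many learning iterations on average as the hypersphere-discrimination network.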
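The ratios quoted in the discussion ("about six times", "about 33 times", "nearly 100 times") follow directly from the reported averages. The short script below recomputes them from the figures given for problems (i) to (iii); the dictionary layout is just a convenience of this sketch.

```python
# (intermediate units acquired, average learning iterations) as reported in the text
results = {
    "i":   {"hyperplane": (1, 11507),  "hypersphere": (1, 64103)},
    "ii":  {"hyperplane": (8, 393829), "hypersphere": (1, 12045)},
    "iii": {"hyperplane": (6, 264706), "hypersphere": (2, 2665)},
}

ratios = {}
for problem, r in results.items():
    plane_iters = r["hyperplane"][1]
    sphere_iters = r["hypersphere"][1]
    # ratio > 1 means the hyperplane network needed more iterations
    ratios[problem] = plane_iters / sphere_iters
    print(problem, round(ratios[problem], 2), round(1 / ratios[problem], 2))
```

For problem (i) the ratio is below one: the hypersphere network needed about 5.6 times as many iterations, which the text rounds to six. For problems (ii) and (iii) the hyperplane network needed about 32.7 and 99.3 times as many, respectively.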
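[0161]
It was thus confirmed that training a hypersphere-discrimination network with an appropriately widened parameter search range by the high-dimensional algorithm generally gives a high mapping ability, achieving the required accuracy with fewer parameters than a hyperplane-discrimination network, together with good learning performance.
[0162]
The program comprising steps S1 and S2 of FIG. 17 and steps S21 to S26 of FIG. 18 is executed by the personal computer shown in FIG. 22. FIG. 22 is a schematic block diagram of the personal computer. Personal computer 90 includes a data bus BS, a CPU 91, a RAM (Random Access Memory) 92, a ROM (Read Only Memory) 93, a serial interface 94, a terminal 95, a CD-ROM drive 96, a display 97, and a keyboard 98.
[0163]
CPU 91 reads the program stored in ROM 93 via data bus BS. CPU 91 also stores in ROM 93 a program obtained via serial interface 94, terminal 95 and the Internet, or a program read from a CD (Compact Disk) 99 via CD-ROM drive 96. In addition, CPU 91 accepts instructions entered by the user on keyboard 98.
[0164]
RAM 92 is the working memory used when CPU 91 performs the operation for obtaining the approximate function f described above. ROM 93 stores the program and other data. Serial interface 94 exchanges data between data bus BS and terminal 95.
[0165]
Terminal 95 is a terminal for connecting personal computer 90 by cable to an interface (not shown) that connects it to the Internet. CD-ROM drive 96 reads the program recorded on CD 99. Display 97 presents various information to the user visually. Keyboard 98 accepts instructions from the user.
[0166]
In response to a user instruction entered via keyboard 98, CPU 91 reads the program stored in ROM 93 and executes it. CPU 91 then performs the operation for obtaining the approximate function f according to the flowcharts of FIGS. 17 and 18 and displays the optimized parameters on display 97.
[0167]
The program according to the invention is read from CD 99 by CPU 91 via CD-ROM drive 96 and stored in ROM 93, or is obtained via serial interface 94, terminal 95 and the Internet and stored in ROM 93.
[0168]
In this way, the user can run the program according to the invention on personal computer 90 to perform the operation for obtaining the approximate function f.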
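[0169]
As described above, the program according to the invention obtains the approximate function f defining the relationship between the input values x_1(n), ..., x_m1(n) (n = 1 to S) and the output values z_1(n). The input values (x_1(1), ..., x_m1(1)), (x_1(2), ..., x_m1(2)), ..., (x_1(S), ..., x_m1(S)), ..., (x_1(M), ..., x_m1(M)) contained in set 10 and the output values (z(1), z(2), ..., z(S), ..., z(M)) contained in set 20, from which sample data 30 is drawn, are obtained, for example, as input values consisting of the well-layer width d_w1, the well-layer electron density n_w1, the barrier-layer height W_d2, the barrier-layer width d_b and so on of a quantum well structure, and as output values consisting of the energy level E_w1 of the particles (electrons and holes) in the quantum well structure.
[0170]
The following describes how to obtain the input values consisting of the well-layer width d_w1, the well-layer electron density n_w1, the barrier-layer height W_d2, the barrier-layer width d_b and so on, and the output values consisting of the energy level E_w1.
[0171]
[Method of obtaining the sample data]
The energy level E_w1 is computed taking into account the self-interaction of the particles in the quantum well structure.
[0172]
FIG. 23 is a conceptual diagram of the quantum well treated by the quantum level calculation program. The vertical axis is energy and the horizontal axis is position. Quantum well 100 consists of barrier layers 101 and 102 and a well layer 103. Electrons confined in well layer 103 form an energy level 104.
[0173]
In this invention, the Schrödinger equation without particle interaction is solved to obtain the wave function Ψ in one quantum well 100. Starting from this Ψ, the wave function that minimizes the energy of the whole system is then sought on the basis of the variational principle. Here the energy of the whole system is defined as the expectation value, computed for the given wave function, of a Hamiltonian that includes the particle interaction as a nonlinear term. The span from the edge of one barrier layer 101 of quantum well 100 to the edge of the other barrier layer 102 is then divided into points x_1 to x_N (N a natural number), and the wave function Ψ is discretized into N components Ψ_1, ..., Ψ_i, ..., Ψ_N corresponding to the points x_i (1 ≤ i ≤ N). The positions x_1 to x_N are chosen so that the distance between every pair of adjacent points is the same.
[0174]
Once Ψ has been discretized into Ψ_1, ..., Ψ_i, ..., Ψ_N, each of these components is computed so that the energy of the whole system of quantum well 100 is minimal. The total energy of the system is then obtained from the computed components.
[0175]
In short, the wave function Ψ obtained by solving the Schrödinger equation without particle interaction is discretized into N components, and each component Ψ_1, ..., Ψ_i, ..., Ψ_N is computed so that the energy of the whole system is minimal.
[0176]
The following describes the case in which the Coulomb interaction between particles is used as the nonlinear term introduced into the Hamiltonian when the wave function obtained from the interaction-free Schrödinger equation is applied to the variational principle to compute the energy of the whole system.
[0177]
FIG. 24 is a flowchart of the steps of the quantum level calculation program. When the program is executed, the wave function of the electrons confined in quantum well 100 is computed for the case without self-interaction (step S100). That is, the initial electron wave function Ψ is computed with a Hamiltonian consisting of the kinetic energy term and the potential term due to the external electric field. This computation uses the transfer matrix method, the S-matrix method, or the shooting method.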
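[0178]
The expression for the energy of the whole system, which is to be minimized, is given by equation (16).
[0179]
[Equation 16]

[0180]
The second term on the right-hand side of equation (16) is the Coulomb interaction term, which includes the self-interaction. Here ε(x) is the dielectric constant at position x, and −e and m are the charge and mass of the electron. Further, d(y) is the volume density of fixed donors at position y, and each fixed donor is assumed to carry a charge of +e. That is, for the purposes of the computer calculation, ionized donors were assumed to carry a charge of +e on account of the free electrons.
[0181]
Equation (16) expresses the energy of the whole system of quantum well 100, but it is difficult to compute with directly. Space is therefore discretized and the expression is normalized by the number of particles in the whole system. That is, discretizing the wave function Ψ obtained in step S100 into N components Ψ_1, ..., Ψ_i, ..., Ψ_N and normalizing by the number of particles in the whole system turns equation (16) into equation (17).
[0182]
[Equation 17]

[0183]
In equation (17), Nt is the normalization factor and Ne is the number of electrons. Vi' is the Coulomb potential.
[0184]
Equation (17) expresses the energy per electron in the system and is called the "cost function" below, following the terminology of optimization problems. Equation (17) is partially differentiated with respect to each of the N discretized components Ψ_1, ..., Ψ_i, ..., Ψ_N to compute N derivatives. This gives equation (18).
[0185]
[Equation 18]

[0186]
In equation (18), Eint denotes the nonlinear interaction energy per particle. Deriving equations (17) and (18) from equation (16) corresponds to computing the cost function and its derivatives (step S102).
[0187]
Equation (18) is not the partial derivative of both sides of equation (17) with respect to each of Ψ_1, ..., Ψ_i, ..., Ψ_N; it is the partial derivative of the self-interaction-aware Hamiltonian H' of equation (17) with respect to each of those components. The reason is as follows.
[0188]
The purpose of the quantum level calculation program is to compute the wave function so that the energy of the whole system of electrons confined in well layer 103 of quantum well 100 is minimal, with self-interaction taken into account. For that reason, instead of differentiating the whole of equation (17) with respect to each of Ψ_1, ..., Ψ_i, ..., Ψ_N, the program differentiates the self-interaction-aware Hamiltonian H' with respect to each of them, so that the effect of the self-interaction is reflected as fully as possible in the computation.
[0189]
In step S102, therefore, the energy of the whole system S{(Ψ_i)} is computed by substituting the discretized components Ψ_1, ..., Ψ_i, ..., Ψ_N into the cost function (equation (17)), and the derivatives of the cost function are computed by partially differentiating the self-interaction-aware Hamiltonian H' with respect to each of those components (equation (18)).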
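[0190]
Next, a new wave function is calculated using the N derivatives obtained in step S102 and equation (19) (step S104).
[0191]
(Equation 19)

[0192]
In equation (19), η is a scaling factor chosen so that the minimum-energy calculation performed by the quantum level calculation program converges. The second term on the right side of equation (19) is obtained by evaluating equation (18), that is, the partial derivative of the Hamiltonian H' with respect to each component of the wave function Ψ, at the wave function Ψ^old.
[0193]
The new wave function Ψ_i^new given by equation (19) is the previously calculated wave function Ψ_i^old plus the change due to self-interaction (the second term on the right side of equation (19)). The new wave function Ψ_i^new therefore reflects the change caused by self-interaction. It is calculated for each of the N discretized components Ψ_1, ..., Ψ_i, ..., Ψ_N.
[0194]
Once the new wave function Ψ_i^new has been calculated from equation (19), it is substituted for Ψ_i in equation (17), and the cost function (equation (17)) and its derivatives (equation (18)) are calculated for the new wave function (step S106).
[0195]
Whether the energy of the entire system has converged is then determined by checking whether the cost function has increased, or whether all N derivatives calculated in step S106 are zero (step S108).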
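[0196]
In step S108, when the cost function has not increased and the N derivatives are not all zero, the energy of the entire system is judged not to have converged, and steps S104, S106, and S108 are repeated. When the cost function has not increased, its change is either zero or negative, so the energy of the entire system may still decrease further. Likewise, when the derivatives of the cost function are nonzero, the cost function is still changing, so the energy of the entire system may again decrease further.
[0197]
Conversely, when the cost function has increased or all N derivatives are zero in step S108, the energy of the entire system is judged to have converged, and the wave function from one step earlier is output (step S110).
[0198]
For example, suppose steps S104, S106, and S108 are executed five times and, on the fifth pass, step S108 finds that the cost function has increased or that all N derivatives are zero. The N components calculated with equation (19) in step S104 on the fourth pass are then output in step S110. This is because the result of the fifth pass means that the cost function calculated in step S106 from the fourth-pass wave function was the minimum.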
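[0199]
When the N components that minimize the energy of the entire system have been determined in step S110, the energy of the entire system is calculated from the wave function formed by these N components (the final wave function) and equation (20), the Hamiltonian that accounts for self-interaction (step S112), and the whole calculation ends.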
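The loop formed by steps S102 to S110 can be sketched as follows. This is a minimal illustration, not the program of the embodiment: because equations (17) to (19) are not reproduced here, the cost function and its derivatives are passed in as caller-supplied functions, the name `minimize_energy` is hypothetical, and the tolerance `tol` is an assumption that replaces the exact "all derivatives are zero" test with one that is usable in floating point. The update follows the form described for equation (19), new = old + η × derivative, with η expected to be negative.

```python
def minimize_energy(psi0, cost, grad, eta, max_steps=10000, tol=0.0):
    """Iterate steps S104-S108 and return (wave function, cost) from one step earlier."""
    psi_old = list(psi0)
    cost_old = cost(psi_old)        # step S102: cost function of the initial wave function
    grad_old = grad(psi_old)        # step S102: its N derivatives
    for _ in range(max_steps):
        # Step S104: psi_new = psi_old + eta * derivative (eta is negative in the text).
        psi_new = [p + eta * g for p, g in zip(psi_old, grad_old)]
        cost_new = cost(psi_new)    # step S106
        grad_new = grad(psi_new)
        # Step S108: converged if the cost increased or every derivative vanished.
        increased = cost_new > cost_old
        stationary = all(abs(g) <= tol for g in grad_new)
        if increased or stationary:
            return psi_old, cost_old    # step S110: output the previous step
        psi_old, cost_old, grad_old = psi_new, cost_new, grad_new
    return psi_old, cost_old            # step budget exhausted (assumed safeguard)
```

With a toy quadratic cost, a small negative `eta` drives the components toward the minimum, while an overly large magnitude makes the cost rise on the first pass and the initial guess is returned unchanged.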
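[0200]
(Equation 20)

[0201]
Of the steps described above, steps S102, S104, S106, S108, and S110 represent operations on a single particle. The first feature of the quantum level calculation program is therefore that, when determining the N components that minimize the energy of the entire system, it focuses on one particle and determines the N components so that the effect of self-interaction on the wave function of that particle is minimized. The second feature is that, once the wave function for one particle has been determined, it is applied to all particles in the system to calculate the energy of the entire system.
[0202]
As a result, the calculation can be made to converge even when the number of electrons confined in the well layer 103 increases because of a higher doping amount in the barrier layers 101, 102 or the well layer 103 of the quantum well 100, and the influence of the electron interaction becomes large.
[0203]
Examples calculated with the quantum level calculation program are described in detail below.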
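[0204]
FIG. 25 shows a specific example of the quantum well targeted by the quantum level calculation program. The vertical axis represents energy, and the horizontal axis represents the distance z in the thickness direction of the barrier layers 101, 102 and the well layer 103. Each of the two barrier layers 101, 102 consists of 35 monolayers (approximately 10 nm) of Al_0.2Ga_0.8As, and the well layer 103 consists of 35 monolayers (approximately 10 nm) of GaAs. The conduction band discontinuity ΔEc (= Ec1 − Ec2) is 167 meV.
[0205]
In the system shown in FIG. 25, the first level E1 is formed 28.25795 meV above the bottom edge of the GaAs well layer 103, and the initial wave function of the electron occupying this first level E1 is the wave function Ψ.
[0206]
The following describes the case in which donors are doped into the well layer 103 (referred to as "well doping") and the case in which donors are doped into the barrier layers 101, 102 (referred to as "barrier doping"). All doped donors are assumed to be activated, so that the total number of free electrons equals the total number of donors.
[0207]
Each monolayer of the barrier layers 101, 102 and the well layer 103 is divided into 10 points. That is, the wave function Ψ is discretized into 1051 components Ψ_1, ..., Ψ_i, ..., Ψ_1051.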
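The figure of 1051 components can be checked with a short calculation. The sketch below assumes, as an interpretation not stated in the text, that the three layers of 35 monolayers give 105 × 10 intervals and that both end points of the grid are counted; the function name `grid_point_count` is hypothetical.

```python
def grid_point_count(monolayers_per_layer, points_per_monolayer):
    """Number of grid points when every monolayer is split into equal intervals."""
    intervals = sum(monolayers_per_layer) * points_per_monolayer
    return intervals + 1  # assumed: both end points of the structure are included


# Barrier 101, well 103, barrier 102: 35 monolayers each, 10 points per monolayer.
n_components = grid_point_count([35, 35, 35], 10)
```

Under this reading the same structure divided into 20 points per monolayer would give 2101 components.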
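[0208]
In addition, the wave function is approximated as zero at both ends of the quantum well 100 formed by the barrier layers 101, 102 and the well layer 103.
[0209]
The scaling factor η in equation (19) was fixed at −1.15 × 10^7 for the calculation. This value applies only when each monolayer is divided into 10 points; other values are used for other numbers of divisions. For example, when each monolayer is divided into 20 points, η = −2.30 × 10^7 is used. Values of η other than these are also conceivable.
[0210]
FIG. 26 shows the calculation results for well doping as a function of doping amount. In each of (a) to (d) of FIG. 26, the vertical axis represents energy, and the horizontal axis represents the distance z in the thickness direction of the barrier layers 101, 102 and the well layer 103. The doping amount is 1.0 × 10^18 cm^-3 in (a), 5.0 × 10^18 cm^-3 in (b), 8.0 × 10^18 cm^-3 in (c), and 1.0 × 10^19 cm^-3 in (d). In (a) to (d) of FIG. 26, Ψ0 is the initial wave function, Ew1 to Ew4 are the ground-state energy values, and Ψw1 to Ψw4 are the wave functions calculated with the quantum level calculation program described above.
[0211]
The calculations for the four doping amounts with the quantum level calculation program each took 30 seconds or less, which is very short.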
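[0212]
In well doping, both the electrons and the donors are located in the well layer 103, so their charges cancel each other, which suppresses band bending outside the quantum well. This holds even at high doping amounts.
[0213]
The distortion of the calculated wave functions was also found to be small. The change in the ground-state energies Ew1 to Ew4 arises from the electron potential due to the ionized impurities (donors) and to the interaction between electrons. This change is not very large, whereas the band bending in the well layer 103 grows as the doping amount increases.
[0214]
Table 1 compares, for well doping, the ground-state energy values calculated with the quantum level calculation program against the results of the conventional S-P method.
[0215]
[Table 1]

[0216]
As Table 1 shows, for doping amounts in the range of 1.0 × 10^18 to 1.7 × 10^18 cm^-3, the results of the quantum level calculation program agree well with those of the conventional S-P method, and the difference is 1.3 × 10^-4 % or less, which is almost negligible.
[0217]
Furthermore, the conventional S-P method diverges for doping amounts of 1.8 × 10^18 cm^-3 and above, whereas the quantum level calculation program was found to converge reliably at least up to a doping amount of 1.0 × 10^19 cm^-3 (the fact that a numerical value is obtained indicates convergence; the same applies hereinafter).
[0218]
Thus, for doping of the well layer 103, the calculation method using the quantum level calculation program agrees well with the conventional S-P method at low doping amounts and continues to converge up to high doping amounts. The quantum level calculation program therefore makes it possible to obtain ground-state energy values up to high doping amounts.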
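[0219]
FIG. 27 shows the calculation results for barrier doping as a function of doping amount. In (a) to (d) of FIG. 27, the vertical axis represents energy, and the horizontal axis represents the distance z in the thickness direction of the barrier layers 101, 102 and the well layer 103. The symbols Ψ0, Ψw1 to Ψw4, and Ew1 to Ew4 have the same meanings as in (a) to (d) of FIG. 26. The doping amount is 1.0 × 10^17 cm^-3 in (a), 3.0 × 10^17 cm^-3 in (b), 5.0 × 10^17 cm^-3 in (c), and 7.0 × 10^17 cm^-3 in (d).
[0220]
In this case as well, the calculation time was 30 seconds or less. The distortion of the wave functions Ψw1 to Ψw4 increases as the doping amount in the barrier layers 101, 102 increases, for the following reason. In barrier doping, the confined electrons are located in the well layer 103 while the donors are located in the barrier layers 101, 102. The electrons in the well layer 103 therefore tend to spread toward the barrier layers 101, 102 on both sides, owing to the repulsion between electrons and the attraction from the donors in the barrier layers 101, 102. As the doping amount in the barrier layers 101, 102 increases, the attraction from the donors increases, and the electrons in the well layer 103 spread further. As a result, the distortion of the wave functions Ψw1 to Ψw4 increases with the doping amount in the barrier layers 101, 102.
[0221]
Band bending is also larger in barrier doping than in well doping, for the following reason. The electron potential due to the ionized donors raises the band-edge potential at the bottom of the well layer 103. Meanwhile, because the electrons and the donors are spatially separated, the electrons are less able to cancel the electrical influence of the ionized donors. As a result, an electric field extends from the well layer 103 into the barrier layers 101, 102, and the band bending becomes larger.
[0222]
Table 2 compares, for barrier doping, the ground-state energy values calculated with the quantum level calculation program against the results of the conventional S-P method.
[0223]
[Table 2]

[0224]
As Table 2 shows, for doping amounts in the range of 1.0 × 10^17 to 5.0 × 10^17 cm^-3, the results of the quantum level calculation program agree well with those of the conventional S-P method. The calculation method using the quantum level calculation program also converges at doping amounts of 6.0 × 10^17 cm^-3 and 7.0 × 10^17 cm^-3, where the conventional S-P method does not converge.
[0225]
Thus, for doping of the barrier layers 101, 102, the calculation method using the quantum level calculation program agrees well with the conventional S-P method at low doping amounts and continues to converge up to high doping amounts. The quantum level calculation program therefore makes it possible to obtain ground-state energy values up to high doping amounts.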
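[0226]
FIG. 28 compares the results of the quantum level calculation program with those of the conventional S-P method for well doping at a doping amount of 2.0 × 10^18 cm^-3. In (a) and (b) of FIG. 28, the vertical axis is energy, and the horizontal axis is the distance z in the thickness direction of the barrier layers 101, 102 and the well layer 103. (a) of FIG. 28 shows the results of the conventional S-P method, and (b) of FIG. 28 shows the results of the quantum level calculation program.
[0227]
The symbols Ψc and Ψivt denote wave functions, and the symbols Vc and Vivt denote band-edge potentials.
[0228]
In the conventional S-P method, small imbalances and calculation errors are amplified as the solution of the Schrödinger equation and the solution of the Poisson equation are repeated, and as a result the wave function Ψc oscillates as the calculation steps proceed.
[0229]
In contrast, in the calculation method using the quantum level calculation program, the correction given by equation (19) becomes smaller as the wave function approaches the true solution during the repetition of steps S104, S106, and S108 shown in FIG. 24, and as a result the wave function Ψivt converges.
[0230]
The above description covers the case of 10 divisions per monolayer; FIG. 29 shows what happens when this number is varied. The vertical axis represents the ground-state energy, and the horizontal axis represents the number of divisions per monolayer. Curve 105 (solid line) shows the results of the quantum level calculation program, and curve 106 (dotted line) shows the results of the conventional S-P method. The doping is applied to the well layer 103 at a doping amount of 1.0 × 10^18 cm^-3.
[0231]
In both methods, the ground-state energy value decreases as the number of divisions increases, showing that dividing the space more finely yields a more accurate wave function. With respect to the number of spatial divisions, no large difference was found between the calculation method using the quantum level calculation program and the conventional S-P method.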
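[0232]
In the well doping and barrier doping calculations described above, the initial state obtained by the shooting method was used.
[0233]
Table 3 shows the results of the quantum level calculation program for well doping and barrier doping when a mathematically exact wave function is used as the initial state. The energy value corresponding to the exact wave function adopted as the initial state is 28.27683 meV.
[0234]
[Table 3]

[0235]
The results show that the energy value for each doping amount is larger when the exact wave function, with an energy value of 28.27683 meV, is adopted as the initial state (see Tables 1 and 2). However, the difference is only about 1.5%, which is within the acceptable range for estimating quantum levels in actual semiconductor materials. Calculating the initial-state energy value by the shooting method is therefore not considered to pose any particular problem for the quantum level calculation program.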
[0236]
In this way, the quantum level calculation program is used to obtain input values consisting of the well layer width d_w1, the well layer electron density n_w1, the barrier layer height W_d2, the barrier layer width d_b, and so on of the quantum well structure, together with output values consisting of the energy level E_w1.
[0237]
Once the input values (x_1(1), ..., x_m1(1)), (x_1(2), ..., x_m1(2)), ..., (x_1(S), ..., x_m1(S)), ..., (x_1(M), ..., x_m1(M)) contained in the set 10 and the output values z(1), z(2), ..., z(S), ..., z(M) contained in the set 20 have been obtained, it is determined whether the output values z(1) to z(M) lie on the smooth curved surface described above. If they do, the input values (x_1(1), ..., x_m1(1)), (x_1(2), ..., x_m1(2)), ..., (x_1(S), ..., x_m1(S)) are extracted from the input values contained in the set 10, and the output values z(1), z(2), ..., z(S) are extracted from the output values contained in the set 20, to prepare the sample data 30.
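[0238]
When the personal computer 90 shown in FIG. 22 calculates the output values consisting of the energy level E_w1 with the quantum level calculation program, the CPU 91 stores in the RAM 92 the input values consisting of the well layer width d_w1, the well layer electron density n_w1, the barrier layer height W_d2, the barrier layer width d_b, and so on of the quantum well structure, together with the output values consisting of the energy level E_w1. When execution of the program that calculates the approximate function f is requested from the keyboard 98, the CPU 91 reads the input values and output values stored in the RAM 92 and shows them on the display 97.
[0239]
The user looks at the input values and output values shown on the display 97 and judges whether the output values lie on a smooth curved surface. If they do, the user specifies from the keyboard 98 the input values x_1(n), ..., x_m1(n) and the output values z_1(n) (n = 1 to S) that make up the sample data 30.
[0240]
The CPU 91 then constructs the sample data 30 from the specified input values x_1(n), ..., x_m1(n) and output values z_1(n), reads the program according to the present invention from the ROM 93, and executes it to calculate the approximate function f that defines the relationship between the input values x_1(n), ..., x_m1(n) and the output values z_1(n).
[0241]
Alternatively, a criterion for judging whether the output values z_1(n) (n = 1 to S) lie on a smooth curved surface may be given to the CPU 91 in advance, so that the CPU 91 extracts the input values x_1(n), ..., x_m1(n) and the output values z_1(n) automatically.
[0242]
More specifically, therefore, the program according to the present invention is characterized in that it calculates the approximate function f that defines the relationship between input values and output values by using sample data whose input values are the parameters of the barrier layers 101, 102 and the well layer 103 that determine the quantum well structure and whose output values are the energy levels of electrons and holes in the quantum well.
[0243]
The above description covers the case in which the input values x_1(n), ..., x_m1(n) and the output values z_1(n) are calculated by the personal computer 90. The present invention is not limited to this: the input values and output values may be obtained as experimental results, and these results may be entered into the personal computer 90 to construct the sample data 30.
[0244]
The sample data 30 may also be constructed with one of the input values x_1(n), ..., x_m1(n) and the output values z_1(n) being experimental results and the other being results calculated by the personal computer 90.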
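[0245]
Furthermore, the above description covers the case in which the approximate function f is calculated using the three-layer neural network 40, whose output layer 43 consists of a single output unit. The program according to the present invention may instead calculate the approximate function f using the three-layer neural network 40A shown in FIG. 30.
[0246]
FIG. 30 is another conceptual diagram of a three-layer neural network with which the program according to the present invention calculates output operation values for input values. The three-layer neural network 40A is the three-layer neural network 40 with the output layer 43 replaced by an output layer 43A; it is otherwise the same as the three-layer neural network 40.
[0247]
The output layer 43A consists of output units 43k (k = 1 to m3). Each output unit 43k receives the outputs Y_j (j = 1 to m2) from the intermediate units 42j (j = 1 to m2) and substitutes the received outputs Y_j, the connection weights W_jk, and the threshold Θ_k into equation (4) to calculate the output operation values Z_1(n), ..., Z_m3(n) (n = 1 to S).
[0248]
When the three-layer neural network 40A is used, the program according to the present invention likewise calculates the approximate function f according to the flowcharts shown in FIG. 17 and FIG. 18. In this case, m3 × S output values z_1(n), ..., z_m3(n) (n = 1 to S) are prepared.
[0249]
The rest is as described above.
Furthermore, the input values and output values used to calculate the approximate function f are not limited to data on quantum wells; they may be any kind of input values and output values, provided that the output values lie on a smooth curved surface.
[0250]
The embodiments disclosed herein should be considered illustrative in all respects and not restrictive. The scope of the present invention is indicated by the claims rather than by the above description of the embodiments, and is intended to include all modifications within the meaning and scope equivalent to the claims.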
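[Brief description of the drawings]
FIG. 1 is a diagram showing the input values and output values that the program according to the present invention uses in the operation for obtaining an approximate function.
FIG. 2 is a diagram showing the relationship between input values and output values when there are two input values.
FIG. 3 is a conceptual diagram of the three-layer neural network with which the program according to the present invention calculates output operation values for input values.
FIG. 4 is a diagram showing the internal state calculated using equation (5) when two input values are used.
FIG. 5 is a diagram showing the output from the intermediate unit shown in FIG. 3 when two input values are used.
FIG. 6 is a diagram showing the internal state calculated using equation (1) when two input values are used.
FIG. 7 is a diagram showing the output from the intermediate unit shown in FIG. 3 when two input values are used.
FIG. 8 is a diagram showing the possible relative positions of the identification hypersphere and example outputs of the intermediate unit 42j when the ranges of the parameters θ_j and w_ij are restricted to a small radius corresponding to local features.
FIG. 9 is a diagram showing the possible relative positions of the identification hypersphere and example outputs of the intermediate unit 42j when the ranges of the parameters θ_j and w_ij are widened.
FIG. 10 is a diagram showing the possible relative positions of the identification hypersphere and example outputs of the intermediate unit 42j for a small hypersphere whose parameter w_ij (center) lies outside the domain of the input variables.
FIG. 11 is a flowchart of optimization by the high-dimensional algorithm.
FIG. 12 is a conceptual diagram showing how the high-dimensional algorithm searches for a solution.
FIG. 13 is a diagram showing the landscape of the cost function when there are two parameters.
FIG. 14 is a schematic diagram of ν_n(q_n) given by equation (9).
FIG. 15 is a diagram comparing the high-dimensional algorithm (HA) and simulated annealing (SA) in terms of how readily they are trapped in local solutions of the cost function.
FIG. 16 is a diagram comparing the high-dimensional algorithm (HA) and simulated annealing (SA) in terms of how quickly they pass through flat regions of the cost function.
FIG. 17 is a flowchart by which the program according to the present invention calculates the approximate function f that defines the relationship between the input values x_1(n), ..., x_m1(n) and the output values z_1(n).
FIG. 18 is a flowchart explaining the detailed operation of step S2 shown in FIG. 17.
FIG. 19 is a conceptual diagram of the three-layer neural network when the number of intermediate units is set to one.
FIG. 20 is a conceptual diagram of the three-layer neural network when the number of intermediate units is set to two.
FIG. 21 is a diagram showing the contour lines of three two-input, one-output test functions.
FIG. 22 is a schematic block diagram of a personal computer.
FIG. 23 is a conceptual diagram of the quantum well targeted by the quantum level calculation program.
FIG. 24 is a flowchart showing the steps of the quantum level calculation program.
FIG. 25 shows a specific example of the quantum well targeted by the quantum level calculation program.
FIG. 26 shows the calculation results for well doping when the doping amount is varied.
FIG. 27 shows the calculation results for barrier doping when the doping amount is varied.
FIG. 28 is a diagram comparing the results of the quantum level calculation program with those of the conventional method.
FIG. 29 is a diagram showing how the ground-state energy value depends on the number of divisions.
FIG. 30 is another conceptual diagram of a three-layer neural network with which the program according to the present invention calculates output operation values for input values.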
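[Explanation of reference signs]
1 to 6: contour lines; 7: curved surface; 10, 20: sets; 30: sample data; 40, 40A: three-layer neural network; 41: input layer; 42, 42A, 42B: intermediate layer; 43, 43A: output layer; 50: hyperplane; 51, 52, 81: regions; 60: hypersphere; 61 to 65: curved surfaces; 70: semantic space; 71: solution; 72, 82: arrows; 80: high-dimensional space; 89, 105, 106: curves; 90: personal computer; 91: CPU; 92: RAM; 93: ROM; 94: serial interface; 95: terminal; 96: CD-ROM drive; 97: display; 98: keyboard; 99: CD; 100: quantum well; 101, 102: barrier layers; 103: well layer; 104: energy level; 411 to 41m1: input units; 421 to 42m2: intermediate units; 431 to 43m3: output units.
[0001]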
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a program for causing a computer to execute an operation for obtaining an approximate function that defines a relationship between an input and an output, and a computer-readable recording medium on which the program is recorded.
[0002]
[Prior art]
Several highly accurate quantum state calculation methods that take self-interaction into account have been proposed. In each of these methods, the structural parameters of the quantum well, the external electric field applied to the quantum well, and the like form a set of input variables. When one set of values is selected for these input variables, that set is used as input to a computation-intensive operation such as integrating differential equations and optimizing various parameters. A physically allowed energy level, wave function, or the like is then calculated as the output for the given input.
[0003]
The prior art relating to the present invention has been described above on the basis of general technical information known to the applicant. To the best of the applicant's recollection, the applicant has no information that should be disclosed as prior art document information before filing.
[0004]
[Problems to be solved by the invention]
However, the conventional methods require a dedicated calculation program and sufficient computing resources (such as memory and a CPU (Central Processing Unit)) to obtain an output value for each individual input value. In actual nanodevice design, moreover, the specific requirements are almost always imposed on the outputs of the calculation methods described above, such as the energy level and the wave function. That is, the energy level or wave function in the quantum well is given as a design value, and it is necessary to determine how to set parameters such as the width of the well layer, the band gap of the well layer, the height of the barrier layer, the width of the barrier layer, the band gap of the barrier layer, and the doping amount of the dopant in the well layer or the barrier layer in order to realize it.
[0005]
In this case, with the conventional methods described above, a system that satisfies the conditions must be searched for by trial and error while varying the input values. This requires an enormous amount of calculation, and performing such calculation each time a nanodevice is designed is inefficient.
[0006]
The present invention has therefore been made to solve this problem, and an object of the present invention is to provide a program for causing a computer to execute an operation for obtaining an approximate function that defines a relationship between an input and an output.
[0007]
Another object of the present invention is to provide a computer-readable recording medium on which a program for causing a computer to execute an operation for obtaining an approximate function that defines a relationship between an input and an output is recorded.
[0008]
Means for Solving the Problems and Effects of the Invention
According to the present invention, a program causes a computer to execute an operation for obtaining an approximate function that defines the relationship between m inputs and n outputs (m and n are natural numbers), using S sets of sample data (S is a natural number) each consisting of m input values and n output values. The program causes the computer to execute a first step of receiving m × S input values and n × S output values, and a second step of optimizing the values of all parameters of a hypersphere-identification-type three-layer neural network that calculates n output calculation values for m input values. In the second step, the parameter values of the identification hypersphere among all the parameters are varied over a search range wider than the normal search range, n × S output calculation values for the m × S input values are calculated by hypersphere-identification-type calculation, and the values of all parameters are optimized so that the approximate function is obtained from the calculated n × S output calculation values. The second step sets a high-dimensional space with more dimensions than the number determined by the number of all parameters, and optimizes all parameters by a high-dimensional algorithm that, in the set high-dimensional space, is expected to pass quickly through regions where the parameter values are not optimal and to enter easily regions where the parameter values are optimal.
[0009]
Preferably, in the second step, the number of intermediate units included in the three-layer neural network and performing the operation of the hypersphere identification type is set to an initial value, and n × S output operation values are calculated, Optimize all parameters.
[0010]
Preferably, the second step includes: a first sub-step of setting all parameters to initial values and calculating n × S output calculation values by hypersphere-identification-type calculation; a second sub-step of calculating a cost function value for evaluating the calculated n × S output calculation values and comparing it with a predetermined value; a third sub-step of, when the cost function value is equal to or less than the predetermined value, taking the values of all parameters that yielded this cost function value as the optimized values; a fourth sub-step of, when the cost function value is larger than the predetermined value, calculating by the high-dimensional algorithm, over the wide search range, values of all parameters that reduce the cost function value; a fifth sub-step of executing the first sub-step using the parameters calculated in the fourth sub-step and then executing the second to fourth sub-steps; and a sixth sub-step of, when the cost function value is still larger than the predetermined value after the first to fifth sub-steps have been repeated a specified number of times, increasing the number of intermediate units and executing the first to fifth sub-steps.
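The control flow of the six sub-steps can be outlined as below. This is a sketch under assumptions, not the claimed program: the high-dimensional algorithm is not implemented and is represented by a caller-supplied `improve` function, all names are hypothetical, the parameters are re-initialized whenever a unit is added (the text also allows previously found values to be kept fixed), and the cap `max_units` is an added safeguard.

```python
def fit_with_growing_units(init_params, cost, improve, threshold, max_repeats, max_units):
    """Return (units, params, cost value) once the cost is at or below threshold, else None."""
    for units in range(1, max_units + 1):      # sub-step 6: add intermediate units one by one
        params = init_params(units)            # sub-step 1: initial parameter values
        for _ in range(max_repeats):           # sub-step 5: repeat up to the specified count
            value = cost(params)               # sub-step 2: evaluate the cost function
            if value <= threshold:             # sub-step 3: accept these parameter values
                return units, params, value
            params = improve(params)           # sub-step 4: stand-in for the optimizer
    return None                                # no acceptable fit within max_units
```

Separating the growth loop from the optimizer keeps the stopping logic testable with any search routine plugged in as `improve`.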
[0011]
Preferably, the number of intermediate units is increased one by one.
Preferably, the initial value of the number of intermediate units is one.
[0012]
Preferably, in the second step, the number of all parameters is set to an initial value, n × S output operation values are calculated, and optimization of all parameters is performed.
[0013]
Preferably, the second step includes: a first sub-step of setting all parameters to initial values and calculating n × S output calculation values by hypersphere-identification-type calculation; a second sub-step of calculating a cost function value for evaluating the calculated n × S output calculation values and comparing it with a predetermined value; a third sub-step of, when the cost function value is equal to or less than the predetermined value, taking the values of all parameters that yielded this cost function value as the optimized values; a fourth sub-step of, when the cost function value is larger than the predetermined value, calculating by the high-dimensional algorithm, over the wide search range, values of all parameters that reduce the cost function value; a fifth sub-step of executing the first sub-step using the parameters calculated in the fourth sub-step and then executing the second to fourth sub-steps; and a sixth sub-step of, when the cost function value is still larger than the predetermined value after the first to fifth sub-steps have been repeated a specified number of times, increasing the number of all parameters and executing the first to fifth sub-steps.
[0014]
Preferably, the number of all parameters is increased by a predetermined number at a time. When the number of all parameters has been increased, the first to fifth sub-steps are executed with the values of the parameters that existed before the increase held fixed.
[0015]
Preferably, when the parameters that existed before the number of all parameters was increased are called first parameters and the newly added predetermined number of parameters are called second parameters, the fourth sub-step holds the first parameters fixed, varies the second parameters over the wide search range, and calculates by the high-dimensional algorithm the values of all parameters that reduce the cost function value.
[0016]
Preferably, the second sub-step calculates, as a cost function value, an average of the sum of square errors of the received n × S output values and the calculated n × S output calculation values.
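A cost function of this kind can be sketched as follows. The normalization is an assumption: the text speaks of an average of the sum of squared errors without giving the divisor, and this sketch divides by the total number n × S of output values. The function name is hypothetical.

```python
def mean_squared_error(targets, predictions):
    """Average squared error over S samples, each holding n output values."""
    total = 0.0
    count = 0
    for target_row, predicted_row in zip(targets, predictions):
        for z, big_z in zip(target_row, predicted_row):
            total += (z - big_z) ** 2   # squared error of one output value
            count += 1
    return total / count                # assumed divisor: n * S
```

Here `targets` holds the received output values z and `predictions` holds the output calculation values Z produced by the network.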
[0017]
Preferably, the n × S output values are approximated by a combination of Gaussian-like distributions.
[0018]
Preferably, the m × S input values and the n × S output values are data calculated by a computer.
[0019]
Preferably, the m × S input values and the n × S output values are data calculated by a quantum level calculation program for calculating quantum levels of particles confined in a microstructure. The quantum level calculation program includes: a step A of calculating an initial wave function based on the linear Schrödinger equation and providing the calculated initial wave function as a numerical sequence of discretized components; a step B of calculating a cost function that represents the energy of the entire system, normalized by the number of particles present in the microstructure, using the initial wave function consisting of the discretized components and a Hamiltonian including a nonlinear term that accounts for the interaction between particles; a step C of calculating, using the calculated cost function, a final wave function that minimizes the energy of the entire system; and a step D of calculating the energy of the state represented by the final wave function using the final wave function and the Hamiltonian.
[0020]
Preferably, the high-dimensional algorithm includes: a step of defining, as a semantic space, the space of all parameters that appear in the problem to be solved and are to be optimized; a step of defining a new space by conjugate parameters conjugate to all the parameters; a step of defining a high-dimensional space by adding the new space to the semantic space; a step of setting up the problem in the high-dimensional space; and a step of detecting the optimal values of all parameters by performing, in the high-dimensional space, an autonomous motion that is expected to pass quickly through regions where the parameter values are not optimal and to enter easily regions where the parameter values are optimal.
Further, according to the present invention, a computer-readable recording medium on which a program for causing a computer to execute an operation for obtaining an approximate function is recorded is a computer-readable recording medium on which the program according to any one of claims 1 to 14 is recorded.
[0021]
The program according to the present invention calculates output operation values for input values with a three-layer neural network, varying the hypersphere parameters used in the hypersphere-identification-type operation over a range wider than the normal search range. The program then optimizes the parameters with a high-dimensional algorithm so that a cost function for evaluating the calculated output operation values becomes equal to or less than a predetermined value, and thereby calculates an approximate function that defines the relationship between input values and output values.
[0022]
Therefore, according to the present invention, an approximate function that can easily obtain an output value with respect to an input value can be obtained.
[0023]
Further, in the present invention, when calculating the output calculation value, the program changes the search range of the parameter in a range wider than the normal search range, and performs the calculation of the hypersphere identification type.
[0024]
Therefore, according to the present invention, both the local feature and the global feature can be efficiently reflected in the output operation value.
[0025]
Furthermore, in the present invention, the program optimizes the parameters using a high-dimensional algorithm that performs an autonomous motion in a high-dimensional space higher than the number of dimensions set by the number of parameters of the three-layer neural network. The autonomous motion refers to a motion that quickly passes through a region where a value other than the optimum value of the parameter exists and easily enters a region where the optimum value exists.
[0026]
Therefore, according to the present invention, the optimization is less likely to be trapped in local solutions of the cost function that evaluates the output operation values, and the parameters can be optimized while passing quickly through flat regions of the cost function.
[0027]
In particular, if the search range of the parameter is widened when performing the operation of the hypersphere identification type, the flat region of the cost function increases, but the high-dimensional algorithm has a feature that the optimal solution is obtained by quickly passing through the flat region. As a result, both the local feature and the global feature can be reflected in the output operation value, and the parameters can be optimized quickly.
[0028]
BEST MODE FOR CARRYING OUT THE INVENTION
Embodiments of the present invention will be described in detail with reference to the drawings. In the drawings, the same or corresponding portions have the same reference characters allotted, and description thereof will not be repeated.
[0029]
FIG. 1 shows input values and output values used by a program according to the present invention for calculating an approximate function. The set 10 is a set of input values and consists of (x₁(1), ..., x_m1(1)), (x₁(2), ..., x_m1(2)), ..., (x₁(S), ..., x_m1(S)), ..., (x₁(M), ..., x_m1(M)). The set 20 is a set of output values and consists of z(1), z(2), ..., z(S), ..., z(M) (S and M are natural numbers).
[0030]
The output value z(1) corresponds to the set of input values (x₁(1), ..., x_m1(1)), the output value z(2) corresponds to the set of input values (x₁(2), ..., x_m1(2)), and likewise the output value z(M) corresponds to the set of input values (x₁(M), ..., x_m1(M)). Also, the sets of input values (x₁(1), ..., x_m1(1)), (x₁(2), ..., x_m1(2)), ..., (x₁(S), ..., x_m1(S)), ..., (x₁(M), ..., x_m1(M)) and the output values z(1), z(2), ..., z(S), ..., z(M) are …
[0031]
The method for obtaining the set of these input values and the output value will be described later.
From the sets 10 and 20, sample data 30 to be used in the operation for obtaining an approximate function is generated. The sample data 30 consists of samples 1 to S. Sample 1 consists of the set of input values (x₁(1), ..., x_m1(1)) and the output value z(1), sample 2 consists of the set of input values (x₁(2), ..., x_m1(2)) and the output value z(2), and likewise sample S consists of the set of input values (x₁(S), ..., x_m1(S)) and the output value z(S). In other words, the sample data 30 consists of a subset of the input-value sets and output values extracted from those contained in the sets 10 and 20.
[0032]
In this way, the sample data 30 to be used for the calculation for obtaining the approximate function is prepared.
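As a small illustration of how sample data 30 might be assembled from sets 10 and 20, the sketch below pairs the first S input-value sets with their output values. The function name and the choice of plain Python lists are assumptions made for the example.

```python
def build_sample_data(inputs, outputs, s):
    """Pair the first S input-value sets (set 10) with their output values (set 20)."""
    if len(inputs) != len(outputs):
        raise ValueError("sets 10 and 20 must contain the same number M of entries")
    if not 1 <= s <= len(inputs):
        raise ValueError("S must satisfy 1 <= S <= M")
    # Sample k consists of (x_1(k), ..., x_m1(k)) and z(k), for k = 1..S.
    return list(zip(inputs[:s], outputs[:s]))
```

Each element of the returned list is one sample: a tuple of the input-value set and the matching output value.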
[0033]
The program according to the present invention uses the sample data 30 to obtain an approximate function that defines the relationship between the input values (x₁(1), ..., x_m1(1)), (x₁(2), ..., x_m1(2)), ..., (x₁(S), ..., x_m1(S)) and the output values z(1), z(2), ..., z(S), that is, a function f satisfying z(n) ≒ f(x₁(n), ..., x_m1(n)) (n = 1, ..., S). The function is obtained by means of a function model based on a three-layer neural network and a high-dimensional algorithm (see Kazumasa Shinkami, "High-dimensional Algorithm", Bit, Vol. No. 7, pp. 2-8 (1999), and Kazumasa Shinkami, "High-dimensional Algorithm: One Method for Solving the Optimization Problem", Journal of the Japanese Fuzzy Society, Vol. 11, No. 3, pp. 382-396 (1999)). In the following, the calculation by the function model based on the three-layer neural network is simply referred to as "calculation by the three-layer neural network".
[0034]
FIG. 2 shows the relationship between input values and output values when there are two input variables. FIG. 2A shows the plot points of the pairs of input values x₁(n), x₂(n) and the contour lines of the function f. In FIG. 2A, the black circles indicate the plot points of the pairs x₁(n), x₂(n) when each of x₁(n) and x₂(n) varies over the range [0, 1], and the solid lines indicate the contour lines of the function f. FIG. 2B is a three-dimensional representation of the function f.
[0035]
In this case, the sample data 30 consists of pairs of two input values x₁(n), x₂(n), and these pairs are represented by the plot points (1), (2), ..., (S). When each of the plot points (1), (2), ..., (S) is given as an input, an output value represented by the contour lines 1 to 6 is obtained. The three-dimensional representation of the surface described by the contour lines 1 to 6 is the curved surface 7 shown in FIG. 2B. Therefore, when the plot points (1), (2), ..., (S) are the input values, the output values lie on the curved surface 7, and the program according to the present invention uses the plot points (1), (2), ..., (S) to calculate a function f representing the curved surface 7 by means of a three-layer neural network and a high-dimensional algorithm.
[0036]
The curved surface 7 is a smooth curved surface. That is, it is not an arbitrary surface containing many violent oscillations, such as one expressed by a Fourier series or a sum of trigonometric functions, but a surface expressed by a combination of Gaussian-like distributions. Therefore, when the function f is obtained using the program according to the present invention, the output values z(1), z(2), ..., z(S) preferably form a "smooth curved surface" that can be expressed by a combination of Gaussian-like distributions.
[0037]
In the present invention, a “smooth curved surface” refers to a curved surface represented by a combination of Gaussian-like distributions.
[0038]
FIG. 3 is a conceptual diagram of a three-layer neural network in which a program according to the present invention calculates an output operation value Z (n) for an input value. The three-layer neural network 40 includes an input layer 41, an intermediate layer 42, and an output layer 43.
[0039]
The input layer 41 includes m1 input units 41i (i = 1, ..., m1). The intermediate layer 42 includes m2 intermediate units 42j (j = 1, ..., m2). The output layer 43 includes one output unit 431.
[0040]
The input unit 41i receives the input value x_i(n) supplied to the i-th unit of the input layer 41 and transmits the received input value x_i(n) to each of the m2 intermediate units 42j.
[0041]
The intermediate layer 42 is the layer that plays the major role in extracting features between input values and output values. Let the internal state, output, and threshold of the intermediate unit 42j be y_j, Y_j, and θ_j, respectively, and let the connection parameter between the i-th input unit 41i and the j-th intermediate unit 42j be w_ij. The intermediate unit 42j then calculates the internal state y_j according to equation (1).
[0042]
(Equation 1)

[0043]
That is, the intermediate unit 42j performs an operation of the hypersphere identification type. The reason why the intermediate unit 42j performs the operation of the hypersphere identification type will be described later.
[0044]
Then, the intermediate unit 42j substitutes the calculated internal state y_j into equation (2) to calculate the output Y_j.
[0045]
(Equation 2)

[0046]
That is, the intermediate unit 42j calculates the output Y_j using a sigmoid function. In equation (2), T_j is a parameter that adjusts the slope of the transition region of the sigmoid function. In this embodiment, T_j is defined by equation (3) in order to prevent the exponential function in the denominator on the right side of equation (2) from diverging numerically.
[0047]
(Equation 3)

[0048]
In equation (3), τ_j represents the slope of the output function of the intermediate unit 42j.
Having calculated the output Y_j, the intermediate unit 42j passes the calculated output Y_j to the output unit 431 of the output layer 43.
[0049]
The output layer 43 adjusts the final output by appropriately weighting the output results of each of the m2 intermediate units 42j. Therefore, the output unit 431 calculates the output operation value Z₁(n) according to equation (4).
[0050]
(Equation 4)

[0051]
In equation (4), W_j is the connection weight with the j-th intermediate unit 42j, and Θ is the threshold value of the output unit 431.
[0052]
In the present invention, the number of intermediate units is initially set to one, and is increased one at a time according to the result of the calculation for obtaining the approximate function f with the intermediate units set so far. Therefore, the output unit 431 initially calculates the output operation value Z₁(n) from the output Y₁ of the intermediate unit 421 and the connection weight W₁. When the number of intermediate units is increased, the output operation value Z₁(n) is calculated using also the outputs from the added intermediate units and the connection weights with the added intermediate units.
[0053]
Thus, the three-layer neural network 40 takes m1 input values (x₁(n), ..., x_m1(n)) as input and, with θ_j, w_ij, τ_j, W_j, and Θ as parameters, calculates one output operation value Z₁(n). Since each of the S samples 1 to S has input values (x₁(n), ..., x_m1(n)), the three-layer neural network 40 calculates S output operation values Z₁(1), ..., Z₁(S) for the m1 × S input values.
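The forward calculation described above can be sketched in Python as follows. Equations (1), (2), and (4) are not reproduced in this text, so the forms used here are assumptions chosen to match the surrounding description: a hypersphere internal state y_j = θ_j² − Σ_i (x_i − w_ij)², a logistic output Y_j = 1 / (1 + exp(−y_j / T_j)), and an output Z₁ = Σ_j W_j Y_j − Θ. T_j is passed in directly because equation (3) is not available. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def stable_sigmoid(u: float) -> float:
    """Logistic function 1 / (1 + exp(-u)), evaluated without overflow."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def forward(
    x: Sequence[float],            # m1 input values x_1(n), ..., x_m1(n)
    w: Sequence[Sequence[float]],  # w[j][i]: connection parameter w_ij (center of unit j)
    theta: Sequence[float],        # theta[j]: threshold (radius) of unit j
    T: Sequence[float],            # T[j]: transition-slope parameter of unit j
    W: Sequence[float],            # W[j]: connection weight to the output unit
    Theta: float,                  # threshold of the output unit
) -> float:
    """Return one output operation value Z_1(n) under the assumed equations."""
    total = 0.0
    for j in range(len(theta)):
        dist2 = sum((xi - wij) ** 2 for xi, wij in zip(x, w[j]))
        y_j = theta[j] ** 2 - dist2        # assumed form of equation (1)
        Y_j = stable_sigmoid(y_j / T[j])   # assumed form of equation (2)
        total += W[j] * Y_j
    return total - Theta                   # assumed form of equation (4)


def forward_all(samples: Sequence[Sequence[float]], w, theta, T, W, Theta) -> List[float]:
    """Return the S output operation values Z_1(1), ..., Z_1(S)."""
    return [forward(x, w, theta, T, W, Theta) for x in samples]
```

Calling `forward_all` on S samples of m1 values each returns a list of S output operation values, which mirrors the m1 × S inputs and S outputs described above.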
[0054]
The reason why the intermediate unit 42j performs the hypersphere identification type operation according to equation (1) to obtain the internal state y_j will now be described. In general, the sum of products of the input values x_i(n) and the connection parameters w_ij, expressed by equation (5), is often used to determine the internal state y_j.
[0055]
(Equation 5)

[0056]
In this case, the equation y_j = 0 of the identification surface specifies one hyperplane. Examples are shown in FIGS. 4 and 5. FIG. 4 shows the internal state y_j calculated using equation (5) when a set of two input values x₁(n), x₂(n) is used. FIG. 5 shows the output Y_j from the intermediate unit when a set of two input values x₁(n), x₂(n) is used.
[0057]
As shown in FIG. 4, the equation y_j = 0 of the identification surface designates one hyperplane 50. Then, as shown in FIG. 5, the output Y_j is represented by a curved surface that changes with the hyperplane 50 as a boundary. The regions 51 and 52 on both sides of the hyperplane 50 are substantially flat, and within each of the regions 51 and 52 it is not possible to express features in a narrower region. That is, the intermediate unit output Y_j calculated for a set of two input values x₁(n), x₂(n) can express a global feature that changes with the hyperplane 50 as a boundary, but cannot express a local feature in a narrower area within the regions 51 and 52.
[0058]
Calculating the internal state y_j using equation (5) is referred to as a hyperplane identification type operation. When the hyperplane identification type operation is performed, a global feature can be expressed as described above, but a local feature cannot be expressed by one intermediate unit. Representing local features requires a large number of intermediate units.
[0059]
On the other hand, when the internal state y_j is calculated using equation (1), the equation y_j = 0 of the identification surface specifies one hypersphere. Examples are shown in FIGS. 6 and 7. FIG. 6 shows the internal state y_j calculated using equation (1) when a set of two input values x₁(n), x₂(n) is used. FIG. 7 shows the output Y_j from the intermediate unit 42j when a set of two input values x₁(n), x₂(n) is used.
[0060]
As shown in FIG. 6, the equation y_j = 0 of the identification surface designates one hypersphere 60. Then, as shown in FIG. 7, the output Y_j of the intermediate unit 42j is a flat surface outside the hypersphere 60 and a convex surface inside the hypersphere 60. The hypersphere 60 is generally formed in a narrow area. Therefore, obtaining the internal state y_j of the intermediate unit 42j by the hypersphere identification type operation makes it possible to express local features.
[0061]
As described above, when the internal state y_j is obtained by the hyperplane identification type operation, global features can be expressed, but local features cannot. On the other hand, when the internal state y_j is obtained by the hypersphere identification type operation, local features can be expressed, but global features cannot.
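The contrast between the two unit types can be illustrated with a small Python sketch. The functional forms are assumptions, since equations (1), (2), and (5) are not reproduced in this text: the hyperplane unit uses y = Σ_i w_i x_i − θ, the hypersphere unit uses y = θ² − Σ_i (x_i − w_i)², and both pass y / T through a logistic function. All parameter values in the usage note are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(u: float) -> float:
    """Overflow-safe logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def hyperplane_unit(x: Sequence[float], w: Sequence[float], theta: float, T: float) -> float:
    """Unit whose identification surface y = 0 is a hyperplane (assumed equation (5))."""
    y = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return sigmoid(y / T)


def hypersphere_unit(x: Sequence[float], center: Sequence[float], radius: float, T: float) -> float:
    """Unit whose identification surface y = 0 is a hypersphere (assumed equation (1))."""
    y = radius ** 2 - sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return sigmoid(y / T)
```

With w = (1, 0), θ = 0.5, and T = 0.05, the hyperplane unit is close to 0 everywhere on one side of the line x₁ = 0.5 and close to 1 everywhere on the other side, which is a global feature. With center (0.5, 0.5), radius 0.2, and T = 0.01, the hypersphere unit is large only near the center and close to 0 on both sides of it, which is a local feature.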
[0062]
Therefore, ideally, intermediate units suited to expressing local features and intermediate units suited to expressing global features should be used side by side, but it is not known in advance how many of each are needed. Moreover, if hyperplane identification type operations and hypersphere identification type operations are mixed, the amount of calculation increases.
[0063]
Therefore, in the present invention, the hypersphere identification type operation, which is advantageous for expressing local features, is adopted, and the parameters (θ_j, w_ij, τ_j) with which the intermediate unit 42j calculates the internal state y_j and the output Y_j are varied over a range wider than the normal search range so that global features can also be expressed.
[0064]
When a set of two input values x₁(n), x₂(n) is used, the equation y_j = 0 of the identification surface specifies, from equation (1), a circle whose center is the parameter w_ij and whose radius is the parameter θ_j. The change in the function f when the ranges of the parameters θ_j and w_ij are changed will now be described.
[0065]
FIG. 8 shows the possible relative positions of the identification hypersphere and output examples of the intermediate unit 42j when the range of the parameters θ_j and w_ij is set to that of a small radius corresponding to local features. FIG. 9 shows the possible relative positions of the identification hypersphere and output examples of the intermediate unit 42j when the range of the parameters θ_j and w_ij is widened. FIG. 10 shows the possible relative position of the identification hypersphere and an output example of the intermediate unit 42j in the case of a small hypersphere whose parameter w_ij (center) lies outside the domain of the input variables x₁, x₂.
[0066]
When the parameter (w_1j, w_2j) (center) lies within the domain ([0, 1]²) of the input variables x₁, x₂ and the possible range of the parameter θ_j (radius) is as small as [0, 0.5] (see FIG. 8A), the intermediate unit 42j outputs the output Y_j represented by the curved surface 61 or 62, which reflects a local feature (see FIG. 8B).
[0067]
The curved surface 61 is obtained when the parameter (w_1j, w_2j) (center) is (0.5, 0.5), the parameter θ_j (radius) is 0.1, and T_j is 0.03. The curved surface 62 is obtained when the parameter (w_1j, w_2j) (center) is (0.5, 0.5), the parameter θ_j (radius) is 0.1, and T_j is 0.05.
[0068]
Thus, if the parameter (w_1j, w_2j) (center) is set within the domain of the input variables x₁, x₂ and the parameter θ_j (radius) is set to a small value, the intermediate unit 42j outputs the output Y_j represented by the curved surfaces 61 and 62, which reflect local features.
[0069]
When the parameter (w_1j, w_2j) (center) is allowed to range beyond the domain of the input variables x₁, x₂ (over [-0.2, 1.2]²) and the possible range of the parameter θ_j (radius) is as large as [0, 1.5] (see FIG. 9A), the intermediate unit 42j outputs the output Y_j represented by the curved surface 63 or 64 (see FIG. 9B).
[0070]
The curved surface 63 is obtained when the parameter (w_1j, w_2j) (center) is (1.2, 1.2), the parameter θ_j (radius) is 1.0, and T_j is 0.1. The curved surface 64 is obtained when the parameter (w_1j, w_2j) (center) is (1.2, 1.2), the parameter θ_j (radius) is 1.0, and T_j is 0.5.
[0071]
The curved surfaces 63 and 64 are similar to the curved surface shown in FIG. 5. Therefore, if the parameter (w_1j, w_2j) (center) is set outside the domain of the input variables x₁, x₂ and the parameter θ_j (radius) is set to a large value, the intermediate unit 42j outputs the output Y_j represented by the curved surfaces 63 and 64, which reflect global features. That is, the intermediate unit 42j approximately performs a hyperplane identification type operation. This means that if the parameters θ_j and w_ij are set over a range wider than the range reflecting local features, the intermediate unit 42j can output an output Y_j that reflects global features.
[0072]
When the parameter (w_1j, w_2j) (center) lies outside the domain of the input variables x₁, x₂ and the parameter θ_j (radius) is small (see FIG. 10A), the intermediate unit 42j outputs the output Y_j represented by the curved surface 65 (see FIG. 10B).
[0073]
In this case, the intermediate unit 42j outputs the curved surface 65, whose value is substantially zero over the entire domain, and hardly contributes to the output operation value Z₁(n), which is the final output.
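The three regimes of FIGS. 8 to 10 can be checked numerically with the following Python sketch. It assumes y_j = θ_j² − Σ_i (x_i − w_ij)² and Y_j = 1 / (1 + exp(−y_j / T_j)), since equations (1) and (2) are not reproduced in this text, so the exact output values may differ from those in the figures. The center, radius, and T_j values for surfaces 61 and 63 are taken from the description; the values used for the FIG. 10 case are hypothetical because the description gives none.

```python
import math
from typing import List, Tuple


def unit_output(x1: float, x2: float, center: Tuple[float, float],
                radius: float, T: float) -> float:
    """Output Y_j of one hypersphere unit under the assumed equations (1) and (2)."""
    y = radius ** 2 - ((x1 - center[0]) ** 2 + (x2 - center[1]) ** 2)
    u = y / T
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)  # overflow-safe branch for strongly negative arguments
    return e / (1.0 + e)


def surface_on_grid(center: Tuple[float, float], radius: float, T: float,
                    n: int = 11) -> List[List[float]]:
    """Sample Y_j on an n x n grid covering the input domain [0, 1]^2."""
    pts = [i / (n - 1) for i in range(n)]
    return [[unit_output(a, b, center, radius, T) for b in pts] for a in pts]


surface_61 = surface_on_grid((0.5, 0.5), 0.1, 0.03)   # local feature (FIG. 8)
surface_63 = surface_on_grid((1.2, 1.2), 1.0, 0.1)    # global feature (FIG. 9)
surface_65 = surface_on_grid((1.2, 1.2), 0.05, 0.01)  # FIG. 10 case, hypothetical values
```

Under these assumptions, `surface_61` peaks at the grid point (0.5, 0.5) and is near zero at the corners, `surface_63` rises from near 0 at (0, 0) to near 1 at (1, 1) like a hyperplane-type ramp, and `surface_65` stays near zero over the whole domain.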
[0074]
As described above, by varying the parameters θ_j, w_ij, and T_j (that is, τ_j) over a wide range, the intermediate unit 42j can, with the hypersphere identification type operation alone, output both an output Y_j reflecting local features (see FIG. 8) and an output Y_j reflecting global features (see FIG. 9). Even though both local and global features are reflected by the hypersphere identification type operation, the number of parameters of the intermediate unit 42j does not increase. Accordingly, in the present invention, the intermediate unit 42j varies the parameters θ_j, w_ij, and T_j (that is, τ_j) over a range wider than the range used for the normal hypersphere identification type operation, performs the hypersphere identification type operation, and outputs the output Y_j. This is the reason why the intermediate unit 42j performs the hypersphere identification type operation.
[0075]
The “normal search range” means the range of the parameters θ_j, w_ij, and T_j (that is, τ_j) within which the hypersphere identification type operation makes the output Y_j reflect only local features.
[0076]
The output Y_j from the intermediate unit 42j is input to the output unit 431, and the output unit 431 substitutes the output Y_j and the parameters W_j and Θ into equation (4) to calculate the output operation value Z₁(n).
[0077]
When the output operation value Z₁(n) has been calculated, the average V(w_ij, θ_j, τ_j, W_j, Θ) of the sum of squared errors between the output value z₁(n) included in the sample data 30 and the output operation value Z₁(n) is calculated by equation (6).
[0078]
(Equation 6)

[0079]
The average V(w_ij, θ_j, τ_j, W_j, Θ) of the sum of squared errors is an index of how close the output operation value Z₁(n) calculated by the three-layer neural network 40 is to the actual output value z₁(n). That is, it is a cost function for evaluating the output operation value Z₁(n).
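A minimal Python sketch of this cost function is given below. Equation (6) is not reproduced in this text, so the form V = (1/S) Σ_n (Z₁(n) − z₁(n))² is an assumption based on the phrase "average of the sum of squared errors"; the original may include an additional constant factor. The function name is hypothetical.

```python
from typing import Sequence


def cost(Z: Sequence[float], z: Sequence[float]) -> float:
    """Mean squared error between computed outputs Z_1(n) and sample outputs z_1(n)."""
    if len(Z) != len(z):
        raise ValueError("Z and z must contain the same number S of samples")
    S = len(z)
    return sum((Zn - zn) ** 2 for Zn, zn in zip(Z, z)) / S
```

A value of zero means the network reproduces every sample output exactly, and the predetermined value ε used later in the description acts as the acceptance threshold on this quantity.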
[0080]
The program according to the present invention calculates the output operation value Z₁(n) while varying the parameters w_ij, θ_j, τ_j, W_j, Θ (the parameters w_ij, θ_j, τ_j of the intermediate unit 42j are varied over a range wider than the range reflecting local features), and optimizes the parameters w_ij, θ_j, τ_j, W_j, Θ so that the cost function V(w_ij, θ_j, τ_j, W_j, Θ), calculated by substituting the output operation value Z₁(n) into equation (6), becomes small. In doing so, the program first sets the number of intermediate units 42j to one (j = 1). If the function value of the cost function V(w_ij, θ_j, τ_j, W_j, Θ) finally obtained using the one intermediate unit 421 is larger than the predetermined value ε, the program increases the number of intermediate units 42j one at a time and optimizes the parameters w_ij, θ_j, τ_j, W_j, Θ until the function value becomes equal to or smaller than the predetermined value ε.
[0081]
When the parameters w_ij, θ_j, τ_j, W_j, Θ have been optimized, the approximate function f that defines the relationship between the input values x₁(n), ..., x_m1(n) and the output value z₁(n) (n = 1 to S) is determined. Optimizing the parameters w_ij, θ_j, τ_j, W_j, Θ therefore corresponds to performing the operation for obtaining the approximate function f.
[0082]
As described above, the intermediate unit 42j performs the hypersphere identification type operation while its parameters w_ij, θ_j, τ_j are varied over a range wider than the range reflecting local features. When the parameters w_ij, θ_j, τ_j are varied over such a wide range, the wide flat regions of the cost function V(w_ij, θ_j, τ_j, W_j, Θ) increase. As shown in FIG. 10A, when the center w_ij lies outside the domain of the input values and the radius θ_j is small, the output Y_j of the intermediate unit 42j is substantially zero over the entire domain and hardly contributes to the output operation value Z₁(n), which is the final output. Therefore, in the region of the parameter space corresponding to such a situation, the cost function V(w_ij, θ_j, τ_j, W_j, Θ) becomes flat.
[0083]
Algorithms that are often used to optimize parameters such as w_ij, θ_j, τ_j, W_j, Θ, for example the error back-propagation algorithm (BP) or the annealing method (SA: Simulated Annealing), are very inefficient on a cost function that contains such wide flat regions. A method of adding a penalty term to the cost function in order to avoid the appearance of wide flat regions is also conceivable, but this method distorts the original cost function and increases the amount of calculation required for evaluating the cost function.
[0084]
Therefore, the present invention adopts a high-dimensional algorithm, an optimization method that is effective for cost functions containing wide flat regions and multiple local solutions, both of which are persistent problems in neural network learning.
[0085]
The optimization of the parameters by the high-dimensional algorithm will be described. FIG. 11 shows a flowchart of optimization by a high-dimensional algorithm. The space of all variables q to be optimized appearing in the problem to be solved is defined as a semantic space (step A). Then, a variable p conjugate with the variable q is artificially introduced to define a new space for the variable p (step B). Then, the space of the variable p is added to the semantic space, and a space obtained by increasing the dimension of the semantic space is defined as a high-dimensional space (step C).
[0086]
Then, a problem is set in a high-dimensional space of variables q and p (step D). Finally, in a high-dimensional space, an optimal solution is searched by autonomous motion, and an optimal solution that minimizes the cost function is detected (step E). Here, the autonomous motion refers to a motion that quickly passes through a region where a solution does not exist and easily enters a region where a solution exists.
[0087]
As described above, the high-dimensional algorithm increases the dimension of the semantic space of the variable q to be optimized, and detects an optimal solution by performing an autonomous motion in the increased high-dimensional space.
[0088]
FIG. 12 is a conceptual diagram illustrating a solution search method using a high-dimensional algorithm. FIG. 13 shows a landscape of the cost function when the number of parameters is two (k1, k2). In the landscape shown in FIG. 13, there are many peaks and valleys, and the solution corresponds to the lowest point (minimum value) of all valleys. Therefore, in order to find a solution, it is necessary to reach the lowest valley through many complicated peaks and valleys.
[0089]
That is, as shown in FIG. 12, in order to reach the solution 71 in the semantic space 70 of the variable q to be optimized, it is necessary to move along the path indicated by the arrow 72 and pass through many peaks and valleys. However, it is difficult to reach a small solution 71 in a large semantic space 70.
[0090]
Therefore, a high-dimensional space 80 of the variables q and p is defined by adding a variable p conjugate to the variable q. In the high-dimensional space 80, the region that does not correspond to a solution in the semantic space 70 becomes smaller, and the region corresponding to the solution is enlarged. The solution 71 in the semantic space 70 is thus expanded into the region 81 where the solution exists, and a search that moves along the path indicated by the arrow 82 easily enters the region 81. That is, in the high-dimensional space 80, an autonomous motion is performed and a solution is detected.
[0091]
The following example makes it clear that a target becomes easier to find when the dimension of the search space is increased. Consider searching for a long hexagonal pencil. Viewed end-on, only its small hexagonal cross section is visible, but viewed from a different angle, the length of the pencil is visible as well. If this "length" is regarded as an added dimension, it is easy to see that increasing the dimension from the semantic space 70 to the high-dimensional space 80 makes it easier to search for a solution.
[0092]
The high-dimensional algorithm increases the dimension of the search space in this way and searches for a solution in the higher-dimensional space. As a result, it is less likely to be trapped by local solutions of the cost function, passes quickly through flat regions of the cost function, and can easily find a global solution.
[0093]
To optimize the parameters w_ij, θ_j, τ_j, W_j, Θ using the high-dimensional algorithm so that the output operation value Z₁(n) calculated by the three-layer neural network 40 approaches the actual output value z₁(n), the program according to the present invention uses a Hamiltonian H(q, p), which is a function of the variables q, p in the high-dimensional space 80. The Hamiltonian H(q, p) is a concrete tool for representing a dynamical system in motion. It is used because, in the space it describes, the variable q forming the semantic space 70 and the variable p conjugate to q, which extends the semantic space 70 into the high-dimensional space 80, are easily introduced. In an actual dynamical system, the variable q represents the position of the moving object, and the variable p represents the velocity of the moving object. The high-dimensional algorithm is an optimization method characterized by searching for optimal parameters by analogy with such a dynamical system.
[0094]
In the problem of obtaining the approximate function of the present invention, the variable q to be optimized by the high-dimensional algorithm is q = (w_ij, θ_j, τ_j, W_j, Θ) (see step A in FIG. 11). This variable q = (w_ij, θ_j, τ_j, W_j, Θ) corresponds to the position of the moving object. The potential energy V(q) is a function of the variable q = (w_ij, θ_j, τ_j, W_j, Θ).
[0095]
Next, a variable p conjugate to the variable q is artificially introduced, and a new space formed by the variable p is defined (see step B in FIG. 11). The space obtained by adding this new space to the semantic space is defined as a high-dimensional space (see step C in FIG. 11). The variable p corresponds to the velocity of the moving object as described above, and a function corresponding to the kinetic energy T(p), which is a function of the variable p, is defined in the same manner as for a moving dynamical system (see step D in FIG. 11).
[0096]
Thereafter, the function H(q, p) in the high-dimensional space, obtained by adding the function T(p) in the new space to the function V(q) in the semantic space and thereby increasing the dimension of the semantic space, is defined in terms of the variables q and p (see step D in FIG. 11).
[0097]
In a Hamiltonian dynamical system, the position of a moving object at an arbitrary time t is determined by the equation of motion derived from the Hamiltonian H(q, p), and the same applies to the high-dimensional algorithm.
[0098]
Finally, detecting an optimal solution by autonomous motion in the high-dimensional space is equivalent to finding the optimized variable q by solving the equation of motion derived from the Hamiltonian H(q, p) (see step E in FIG. 11).
[0099]
The potential energy V (q) is represented by equation (7).
[0100]
(Equation 7)

[0101]
That is, the potential energy V(q) is composed of the cost function V_C(q) and the constraint potential V_L(q). The constraint potential V_L(q) limits the search range of the parameters q = (w_ij, θ_j, τ_j, W_j, Θ), and is represented by equation (8).
[0102]
(Equation 8)

[0103]
In equation (8), c is a parameter for controlling the strength of the constraint, and in this embodiment, c = 1 is set. Also, ν_n(q_n) in equation (8) is represented by equation (9).
[0104]
(Equation 9)

[0105]
In equation (9), θ(u) represents a step function whose argument is u = a_n − q_n or u = q_n − b_n. When u > 0, θ(u) = 1, and when u < 0, θ(u) = 0.
[0106]
That is, ν_n(q_n) is represented by the curve 89 shown in FIG. 14, and defines the limits of the search range of the parameters q = (w_ij, θ_j, τ_j, W_j, Θ).
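The role of the constraint potential can be sketched in Python as follows. Equations (8) and (9) and the curve 89 are not reproduced in this text, so the quadratic wall used here, ν_n(q_n) = θ(a_n − q_n)(a_n − q_n)² + θ(q_n − b_n)(q_n − b_n)² with V_L(q) = c Σ_n ν_n(q_n), is only one hypothetical form consistent with the description: zero inside the search range [a_n, b_n] and rising outside it. The function names are hypothetical.

```python
from typing import List, Sequence


def step(u: float) -> float:
    """Step function: 1 for u > 0, otherwise 0."""
    return 1.0 if u > 0 else 0.0


def nu(q: float, a: float, b: float) -> float:
    """Hypothetical wall term: zero for a <= q <= b, quadratic outside."""
    return step(a - q) * (a - q) ** 2 + step(q - b) * (q - b) ** 2


def constraint_potential(q: Sequence[float], lower: Sequence[float],
                         upper: Sequence[float], c: float = 1.0) -> float:
    """V_L(q) = c * sum_n nu_n(q_n), with c controlling the constraint strength."""
    return c * sum(nu(qn, an, bn) for qn, an, bn in zip(q, lower, upper))


def constraint_gradient(q: Sequence[float], lower: Sequence[float],
                        upper: Sequence[float], c: float = 1.0) -> List[float]:
    """Derivative of V_L with respect to each q_n for the wall term above."""
    return [
        c * (-2.0 * step(an - qn) * (an - qn) + 2.0 * step(qn - bn) * (qn - bn))
        for qn, an, bn in zip(q, lower, upper)
    ]
```

Because the force is the negative of this gradient, a parameter that leaves its range is pushed back toward it, while parameters inside the range feel no force from V_L and the original cost function is left undistorted there.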
[0107]
Further, the kinetic energy T (p) is represented by Expression (10), and the Hamiltonian H (q, p) is represented by Expression (11).
[0108]
(Equation 10)

[0109]
(Equation 11)

[0110]
As a result, the equation of motion describing the motion of the system is represented by equation (12).
[0111]
(Equation 12)

[0112]
Further, the derivatives, with respect to the parameters q = (w_ij, θ_j, τ_j, W_j, Θ), of the cost function V_C(q) and of the function V_L(q) representing the constraint potential, which appear on the right side of the lower expression of equation (12), are shown in equations (13) and (14), respectively.
[0113]
(Equation 13)

[0114]
(Equation 14)

[0115]
Starting from randomly selected initial values (q(0), p(0)), the set of 2N first-order differential equations of equation (12) is integrated numerically by the Verlet method or a Runge-Kutta method to obtain the trajectory (q(t), p(t)) of the system. Then, by monitoring the cost function V_C(q(t)) along the trajectory q(t) for a sufficient time, the optimal values of the parameters q = (w_ij, θ_j, τ_j, W_j, Θ) can be found.
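The trajectory search described above can be sketched in Python with a velocity Verlet integrator. Equations (10) to (12) are not reproduced in this text, so the sketch assumes the standard forms T(p) = Σ_n p_n² / 2, H(q, p) = T(p) + V(q), dq/dt = p, and dp/dt = −∂V/∂q. The function names and the step size are hypothetical, and `V` and `grad_V` are supplied by the caller.

```python
from typing import Callable, List, Sequence, Tuple

Vec = List[float]


def hamiltonian(V: Callable[[Vec], float], q: Sequence[float], p: Sequence[float]) -> float:
    """H(q, p) = T(p) + V(q) with the assumed kinetic energy T(p) = sum(p_n^2) / 2."""
    return 0.5 * sum(pi * pi for pi in p) + V(list(q))


def search_by_trajectory(
    V: Callable[[Vec], float],
    grad_V: Callable[[Vec], Vec],
    q0: Sequence[float],
    p0: Sequence[float],
    dt: float,
    steps: int,
) -> Tuple[Vec, float, Vec, Vec]:
    """Integrate the equations of motion and record the lowest V seen on the way.

    Returns (best_q, best_V, final_q, final_p).
    """
    q, p = list(q0), list(p0)
    g = grad_V(q)
    best_q, best_V = list(q), V(q)
    for _ in range(steps):
        p = [pi - 0.5 * dt * gi for pi, gi in zip(p, g)]  # half step in p
        q = [qi + dt * pi for qi, pi in zip(q, p)]        # full step in q
        g = grad_V(q)
        p = [pi - 0.5 * dt * gi for pi, gi in zip(p, g)]  # second half step in p
        v = V(q)
        if v < best_V:                                    # monitor the cost along q(t)
            best_q, best_V = list(q), v
    return best_q, best_V, q, p
```

For the actual problem, `V` would be the sum of the cost function and the constraint potential evaluated on q = (w_ij, θ_j, τ_j, W_j, Θ). The integrator approximately conserves H(q, p), which is the property the stay-time argument in the following paragraphs relies on.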
[0116]
The system described by the equation of motion shown in equation (12) is a dynamical system that has the mixing property and a constant total energy E. In this case, the system is expected to move around with equal probability on the equal-energy surface satisfying H(q, p) = E in the phase space (the principle of equal weight). Based on this principle of equal weight, the expected value δ(q) of the time the system stays in the infinitesimal volume between q and q + dq near the position q is expressed by equation (15) (K. Shinjo and T. Sasada, "Hamiltonian Systems with Many Degrees of Freedom: Asymmetric Motion and Intensity of Motion in Phase Space", Physical Review E 54, pp. 4685-4700, 1996).
[0117]
(Equation 15)

[0118]
From equation (15), when the number of degrees of freedom N is 3 or more, the expected stay time is low in regions with a high potential value and high in regions with a low potential value. This tendency becomes more pronounced as the number of degrees of freedom N increases.
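The dependence on N can be illustrated with a short Python sketch. Equation (15) is not reproduced in this text. The sketch uses the standard microcanonical result for a kinetic energy T(p) = Σ_n p_n² / 2, in which the unnormalized weight of position q is (E − V(q))^(N/2 − 1). This is an assumption that agrees with the N ≥ 3 statement above but may differ in normalization or detail from equation (15). The function names are hypothetical.

```python
def stay_weight(V: float, E: float, N: int) -> float:
    """Unnormalized expected stay time at potential value V for total energy E."""
    if V >= E:
        return 0.0  # region not reachable at this energy
    return (E - V) ** (N / 2.0 - 1.0)


def low_to_high_ratio(V_low: float, V_high: float, E: float, N: int) -> float:
    """How much longer the system stays near V_low than near V_high."""
    return stay_weight(V_low, E, N) / stay_weight(V_high, E, N)
```

With E = 1, V_low = 0.1, and V_high = 0.9, the ratio is 1 for N = 2, 3 for N = 3, and about 6561 for N = 10, so under this assumed form the preference for low-potential regions grows rapidly with the number of parameters being optimized.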
[0119]
Therefore, because the high-dimensional algorithm searches for the optimal solution by analogy with a dynamical system of the Hamiltonian H(q, p) having these characteristics, it is unlikely to be trapped by the local solutions of the cost function described above, and it passes quickly through the flat regions of the cost function.
[0120]
The features of this high-dimensional algorithm will be described conceptually.
FIG. 15 compares how readily the high-dimensional algorithm (HA) and the annealing method (SA) are trapped by local solutions of the cost function. FIG. 15A shows the case of the high-dimensional algorithm (HA), and FIG. 15B shows the case of the annealing method (SA). In FIGS. 15A and 15B, the horizontal axis represents the variable q to be optimized, and the vertical axis represents the cost function V(q).
[0121]
FIG. 16 compares the speed at which the high-dimensional algorithm (HA) and the annealing method (SA) pass through a flat region of the cost function. FIG. 16A shows the case of the high-dimensional algorithm (HA), and FIG. 16B shows the case of the annealing method (SA). In FIGS. 16A and 16B, the horizontal axis represents the variable q to be optimized, and the vertical axis represents the cost function V(q).
[0122]
As shown in FIG. 15, in the case of the high-dimensional algorithm (HA), the kinetic energy E plays the role of carrying the system out of a valley (local solution) of the cost function V(q), and its value becomes large at positions where the function value of the cost function is small. In other words, the kinetic energy E increases when the system enters a local solution. Therefore, the high-dimensional algorithm (HA) is hardly trapped by a local solution (see FIG. 15A). On the other hand, in the case of the annealing method (SA), the search can escape from a local solution because it has a positive absolute temperature, but this absolute temperature does not depend on position. Therefore, if the absolute temperature is lower than the peak of the cost function, it is difficult to escape from the local solution. As a result, the annealing method (SA) is easily trapped by local solutions of the cost function (see FIG. 15B).
[0123]
Also, as shown in FIG. 16, in the case of the high-dimensional algorithm (HA), the system moves in one direction in uniform linear motion, so it passes through a flat region of the cost function V(q) quickly (see FIG. 16A). On the other hand, in the case of the annealing method (SA), the search moves by a random walk (at each step it moves randomly to the left or right on the page), so it cannot easily pass through a flat region of the cost function V(q) (see FIG. 16B).
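The difference in speed on a flat region can be illustrated with a toy one-dimensional Python sketch. It compares uniform linear motion with an unbiased unit-step random walk, used here as a simplified stand-in for the stochastic moves of the annealing method; it is not an implementation of either algorithm, and the function names are hypothetical.

```python
import random


def ballistic_distance(steps: int, speed: float = 1.0) -> float:
    """Distance covered by uniform linear motion on a flat region."""
    return steps * speed


def random_walk_rms(steps: int, trials: int, seed: int = 0) -> float:
    """Root-mean-square displacement of an unbiased +/-1 random walk."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pos = sum(rng.choice((-1, 1)) for _ in range(steps))
        total += pos * pos
    return (total / trials) ** 0.5
```

After n steps the uniform motion has covered a distance proportional to n, whereas the typical random-walk displacement grows only like the square root of n, so crossing a flat region of width L takes on the order of L steps in the first case and L² steps in the second.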
[0124]
As described above, the high-dimensional algorithm is unlikely to be trapped by local solutions of the cost function and can pass through flat regions quickly. As a result, an optimal solution can be reached with a small amount of calculation.
[0125]
FIG. 17 is a flowchart illustrating the operation by which the program according to the present invention calculates the approximate function f that defines the relationship between a set of input values x₁(n), ..., x_m1(n) and the output value z₁(n). When the calculation for obtaining the approximate function f is started, the m1 × S input values and S output values included in the sample data 30 are received (step S1). Then, the parameters w_ij, θ_j, τ_j, W_j, Θ of the three-layer neural network 40 are optimized, starting with the number of intermediate units 42j set to one (j = 1).
[0126]
That is, the parameters w_ij, θ_j, τ_j of the intermediate unit 42j of the three-layer neural network 40 are varied over a wide search range, and the parameters w_ij, θ_j, τ_j, W_j, Θ of the three-layer neural network 40 are optimized using the high-dimensional algorithm so as to obtain an approximate function f that reproduces the relationship between the m1 input values and one output value (step S2). When the parameters w_ij, θ_j, τ_j, W_j, Θ have been optimized in this way, the approximate function f is determined, and the series of operations ends.
[0127]
FIG. 18 shows a flowchart for explaining the detailed operation of step S2 shown in FIG. 17. After step S1 shown in FIG. 17, the parameters w_ij, θ_j, τ_j, W_j, Θ of the three-layer neural network 40 are set to initial values, and the S output operation values Z₁(1), ..., Z₁(S) are calculated by the hypersphere identification type operation (step S21).
[0128]
Then, a cost function value for evaluating the S output operation values Z₁(1), ..., Z₁(S) is calculated (step S22). Thereafter, it is determined whether or not the cost function value is equal to or smaller than the predetermined value ε (step S23). When the cost function value is equal to or smaller than the predetermined value ε, the series of operations ends.
[0129]
On the other hand, when it is determined in step S23 that the cost function value is not equal to or smaller than the predetermined value ε, it is determined whether the number of calculations is equal to or smaller than a specified number (step S24). Then, when the number of calculations is equal to or less than the specified number, a parameter for reducing the cost function value is calculated over a wide search range by a high-dimensional algorithm (step S25).
[0130]
That is, the derivatives of the cost function calculated in step S22 and of the constraint condition with respect to the parameters w_ij, θ_j, τ_j, W_j, Θ are obtained by equations (13) and (14), and the values of the parameters w_ij, θ_j, τ_j, W_j, Θ to be taken next are calculated by equation (12). The calculated values are then used as the parameters w_ij, θ_j, τ_j, W_j, Θ in the calculation in step S21.
[0131]
Thus, after step S25, steps S21 to S24 are repeatedly executed.
[0132]
On the other hand, when it is determined in step S24 that the number of calculations is not equal to or less than the specified number, the parameters with the smallest cost function value in the series of iterations are fixed as the optimum values of the intermediate unit, and the number of intermediate units of the three-layer neural network 40 is increased by one (step S26). Then, steps S21 to S25 are repeatedly executed for the newly added intermediate unit.
[0133]
As described above, according to the program of the present invention, the number of the intermediate units 42j is initially set to one, and, using the one intermediate unit 421, the parameters w_ij, θ_j, τ_j, W_j, Θ are varied over a wide search range in search of parameters that reduce the cost function value, and the corresponding cost function values are calculated one after another. If the smallest cost function value does not fall to or below the predetermined value ε even after the specified number of calculations has been performed with one intermediate unit 42j, the number of intermediate units 42j is increased by one and the same operation is repeated. If the cost function value still does not fall to or below the predetermined value ε after the specified number of calculations has been performed with two intermediate units 42j, the number of intermediate units 42j is increased by one again. The search for new parameters by the high-dimensional algorithm and the increase in the number of intermediate units 42j are repeated until it is determined in step S23 shown in FIG. 18 that the cost function value is equal to or smaller than the predetermined value ε.
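The control flow of steps S21 to S26 can be sketched in Python as follows. This is a simplified skeleton under stated assumptions: `cost`, `init_unit`, and `propose` are hypothetical callables supplied by the caller, `propose` stands in for the parameter update of step S25 (in the description this is the high-dimensional algorithm following a trajectory, whereas here each proposal simply starts from the best parameters so far), and `max_units` is a safety limit that the description does not mention.

```python
from typing import Callable, List, Tuple

Unit = List[float]  # parameters belonging to one intermediate unit


def grow_until_fit(
    cost: Callable[[List[Unit]], float],            # steps S21-S22: evaluate a network
    init_unit: Callable[[], Unit],                  # initial parameters of a new unit
    propose: Callable[[List[Unit]], List[Unit]],    # step S25: next candidate parameters
    eps: float,                                     # predetermined value epsilon
    k: int,                                         # specified number of calculations
    max_units: int,                                 # hypothetical safety limit
) -> Tuple[List[Unit], float]:
    """Add intermediate units one at a time until the cost is at most eps."""
    best_units = [init_unit()]                      # start with one intermediate unit
    best_cost = cost(best_units)
    while True:
        for _ in range(k):                          # step S24: bounded number of tries
            if best_cost <= eps:                    # step S23: good enough, stop
                return best_units, best_cost
            candidate = propose(best_units)
            c = cost(candidate)
            if c < best_cost:                       # keep the smallest cost seen so far
                best_units, best_cost = candidate, c
        if best_cost <= eps or len(best_units) >= max_units:
            return best_units, best_cost
        best_units = best_units + [init_unit()]     # step S26: fix best, add one unit
        best_cost = cost(best_units)
```

Keeping the earlier units at their best values and letting `propose` modify only the newest unit corresponds to the description's statement that steps S21 to S25 are repeated for the newly added intermediate unit.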
[0134]
When step S2 shown in FIG. 17 is first performed, the number of the intermediate units 42j of the intermediate layer 42 is set to one, so the intermediate layer 42 calculates the output Y₁ using only the intermediate unit 421. FIG. 19 shows a conceptual diagram of the three-layer neural network 40 when the number of intermediate units is set to one. Accordingly, the three-layer neural network 40 calculates the output operation value Z₁(n) using the input layer 41, the intermediate layer 42A, and the output layer 43 shown in FIG. 19.
[0135]
In this case, the intermediate unit 421 receives the input values x₁(n), ..., x_m1(n) (n = 1 to S) from the input units 41i (i = 1, ..., m1) of the input layer 41, respectively, and substitutes the received input values x₁(n), ..., x_m1(n) (n = 1 to S), the connection parameters w_1,1, w_2,1, ..., w_m1,1, and the threshold θ₁ into equation (1) to calculate the internal state y₁. Then, the intermediate unit 421 substitutes the calculated internal state y₁ and the parameter T₁ into equation (2) to calculate the output Y₁, and passes the calculated output Y₁ to the output unit 431 of the output layer 43.
[0136]
The output unit 431 substitutes the output Y₁ received from the intermediate unit 421, the connection weight W₁, and the threshold Θ into equation (4) to calculate the output operation value Z₁(n). That is, the output operation value Z₁₁(n) is calculated using the parameters (w_i1, θ₁, τ₁, W₁, Θ)₁₁. In the subscript 11 of the parameters (w_i1, θ₁, τ₁, W₁, Θ)₁₁ and of the output operation value Z₁₁(n), the first 1 indicates the number of intermediate units 42j, and the second 1 indicates that the parameters are set for the first time.
[0137]
The cost function V_C(Q) for the calculation that obtains the approximate function f defining the relationship between the input values x₁(n), ..., x_m1(n) (n = 1 to S) and the output value z₁(n) is represented by equation (6). Therefore, the calculated output operation value Z₁₁(n) and the actual output value z₁(n) are substituted into equation (6), and the cost function value V₁₁ of the cost function V((w_i1, θ₁, τ₁, W₁, Θ)₁₁) is calculated (see step S22 in FIG. 18). The subscript {11} of the cost function value V₁₁ has the same meaning as the subscript of the parameters (w_i1, θ₁, τ₁, W₁, Θ)₁₁ and of the output operation value Z₁₁(n).
[0138]
When the cost function value V₁₁ is not less than or equal to the predetermined value ε, parameters (w_i1, θ₁, τ₁, W₁, Θ)₁₂ that give a cost function value V₁₂ smaller than the cost function value V₁₁ are obtained by the high-dimensional algorithm (see step S25 in FIG. 18). Then, S output operation values Z₁₂(n) are calculated using the parameters (w_i1, θ₁, τ₁, W₁, Θ)₁₂ (see step S21 in FIG. 18), and the cost function value V₁₂ is calculated by equation (6) (see step S22 in FIG. 18).
[0139]
Then, it is determined whether the cost function value V₁₂ is less than or equal to the predetermined value ε (step S23 in FIG. 18). When the cost function value V₁₂ is not less than or equal to the predetermined value ε, parameters (w_i1, θ₁, τ₁, W₁, Θ)_1h that give a cost function value V_1h (h: a natural number with h ≦ k; k: the specified number of times) smaller than the smallest cost function value already calculated are calculated (see step S25 in FIG. 18).
[0140]
As described above, while the number of calculations is equal to or less than the specified number k, the parameters (w_i1, θ₁, τ₁, W₁, Θ)_1h that give a small cost function value V_1h (h is a natural number less than or equal to k) are calculated, a new cost function value V_1h is calculated using the calculated parameters (w_i1, θ₁, τ₁, W₁, Θ)_1h, and this is repeated one after another. That is, the parameters (w_i1, θ₁, τ₁, W₁, Θ) are changed while the corresponding cost function value is monitored, in search of the smallest cost function value and the parameters corresponding to it.
[0141]
Then, when the number of calculations reaches the specified number, the number of intermediate units 42j is increased by one and set to two (see step S26 in FIG. 18).
[0142]
FIG. 20 is a conceptual diagram of the three-layer neural network 40 when the number of intermediate units is set to two. Accordingly, the three-layer neural network 40 calculates the output operation value Z₂₁(n) using the input layer 41, the intermediate layer 42B, and the output layer 43 shown in FIG. 20.
[0143]
In this case, the intermediate unit 421 receives the input values x₁(n), ..., x_m1(n) (n = 1 to S) from the input units 41i (i = 1, ..., m1) of the input layer 41, and substitutes the received input values x₁(n), ..., x_m1(n) (n = 1 to S), the connection parameters w_1,1, w_2,1, ..., w_m1,1, and the threshold θ₁ into equation (1) to calculate the internal state y₁. Then, the intermediate unit 421 substitutes the calculated internal state y₁ and the parameter τ₁ into equation (2) to calculate the output Y₁, and outputs the calculated output Y₁ to the output unit 431 of the output layer 43. Note that the parameters w_1,1, w_2,1, ..., w_m1,1, θ₁, and τ₁ are fixed to the parameters (w_i1, θ₁, τ₁, W₁)_1k.
[0144]
Further, the intermediate unit 422 receives the input values x₁(n), ..., x_m1(n) (n = 1 to S) from the input units 41i (i = 1, ..., m1) of the input layer 41, and substitutes the received input values x₁(n), ..., x_m1(n) (n = 1 to S), the connection parameters w_1,2, w_2,2, ..., w_m1,2, and the threshold θ₂ into equation (1) to calculate the internal state y₂. Then, the intermediate unit 422 substitutes the calculated internal state y₂ and the parameter τ₂ into equation (2) to calculate the output Y₂, and outputs the calculated output Y₂ to the output unit 431 of the output layer 43.
[0145]
The output unit 431 substitutes the output Y₁ received from the intermediate unit 421, the output Y₂ received from the intermediate unit 422, the connection weights W₁ and W₂, and the threshold Θ into equation (4) to calculate the output operation value Z₂₁(n).
[0146]
Then, the cost function value V₂₁ is calculated using the output operation value Z₂₁(n), and the same calculation as when only the intermediate unit 421 is used is repeatedly executed. If the cost function value V_2h does not become equal to or smaller than the predetermined value ε even after the calculation for finding the parameters (w_i1, θ₁, τ₁, W₁, Θ)_2h and a small cost function value V_2h has been performed the specified number of times with the number of intermediate units 42j set to two, the number of intermediate units 42j is further increased by one, and the same operation as when the number of intermediate units 42j is set to two is repeatedly executed.
[0147]
As described above, when the number of intermediate units 42j is set to a certain value, new parameters (w_ij, θ_j, τ_j, W_j, Θ) are calculated one after another by the high-dimensional algorithm so that the cost function value, which evaluates the output operation value Z(n) calculated with that number of intermediate units 42j, becomes as small as possible. If the cost function value does not become less than or equal to the predetermined value ε even after new parameters have been calculated the specified number of times, the number of intermediate units 42j is increased by one, and new parameters (w_ij, θ_j, τ_j, W_j, Θ) are again calculated one after another by the high-dimensional algorithm.
[0148]
Therefore, the program according to the present invention optimizes the parameters (w_ij, θ_j, τ_j, W_j, Θ) while increasing the number of intermediate units 42j of the three-layer neural network 40 one at a time.
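The control flow described above (search for better parameters up to a specified number of times, then add one intermediate unit if the cost is still above ε) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the high-dimensional algorithm is not reproduced in this passage, so a plain random hill-climbing step stands in for it, and the names `optimize_growing`, `new_unit`, `perturb`, and `max_units` are hypothetical. The toy cost at the bottom is an illustration only and is not equation (6).

```python
import random

def optimize_growing(cost, new_unit, perturb, eps, k, max_units=8, seed=0):
    """Add units one at a time until cost <= eps.

    cost(units)        -> float, cost for a list of per-unit parameters
    new_unit(rng)      -> initial parameters for a newly added unit
    perturb(unit, rng) -> candidate parameters (a stand-in for the
                          high-dimensional algorithm, which is NOT shown)
    """
    rng = random.Random(seed)
    frozen = []                      # units whose parameters are fixed
    best_cost = float("inf")
    while len(frozen) < max_units:   # max_units is a safety cap (assumption)
        best = new_unit(rng)
        best_cost = cost(frozen + [best])
        tries = 0
        # Search until the cost is small enough or k tries are used up.
        while best_cost > eps and tries < k:
            cand = perturb(best, rng)
            cand_cost = cost(frozen + [cand])
            if cand_cost < best_cost:        # keep only improvements
                best, best_cost = cand, cand_cost
            tries += 1
        frozen.append(best)          # fix this unit before adding the next
        if best_cost <= eps:
            break
    return frozen, best_cost


# Toy problem (hypothetical): each "unit" is one number in [0, 1] and the
# cost is the distance of their sum from 2.5, so at least 3 units are needed.
def toy_cost(units):
    return abs(2.5 - sum(units))

def toy_new_unit(rng):
    return 0.0

def toy_perturb(u, rng):
    return min(1.0, max(0.0, u + rng.uniform(-0.2, 0.2)))

units, final_cost = optimize_growing(toy_cost, toy_new_unit, toy_perturb,
                                     eps=0.01, k=500)
```

In the toy problem, one or two units can never reach the target, so the loop is forced to grow the unit count, which mirrors how the number of intermediate units 42j is increased only when the specified number of calculations fails to reach ε.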
[0149]
In the above description, the number of intermediate units 42j is initially set to “1”. However, the present invention is not limited to this, and may be initially set to a plurality. That is, in the present invention, the number of intermediate units 42j may be initially set to one or more initial values.
[0150]
The number of parameters (w_ij, θ_j, τ_j, W_j, Θ) of the three-layer neural network 40 increases by (m1 + 3) each time the number of intermediate units 42j increases by one. For example, when the number of intermediate units 42j increases from one to two, the overall parameters grow from (w_i1, θ₁, τ₁, W₁, Θ) to (w_i1, θ₁, τ₁, W₁; w_i2, θ₂, τ₂, W₂, Θ), that is, by (m1 + 3). Therefore, optimizing the parameters (w_ij, θ_j, τ_j, W_j, Θ) while increasing the number of intermediate units 42j one at a time is equivalent to optimizing the parameters (w_ij, θ_j, τ_j, W_j, Θ) while increasing the number of parameters by a predetermined number. Of the parameters that existed before the increase, the parameters (w_i1, θ₁, τ₁, W₁) are fixed, and of the added parameters (w_i2, θ₂, τ₂, W₂) and the parameter Θ, the parameters (w_i2, θ₂, τ₂) are changed within a wide search range to calculate the output operation value Z₁(n). In this case, in step S2 shown in FIG. 17, the number of parameters is set to a predetermined initial value and the parameters are optimized.
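The parameter count in this paragraph can be checked with a short calculation. The sketch below assumes only what the text states: each intermediate unit carries m1 connection weights w_ij plus θ_j, τ_j, and W_j, and the output threshold Θ is shared by the whole network. The function name `parameter_count` and the choice m1 = 2 are illustrative.

```python
def parameter_count(m1, units):
    """Total number of parameters (w_ij, theta_j, tau_j, W_j, Theta).

    Each intermediate unit contributes m1 weights plus theta_j, tau_j, W_j;
    the output threshold Theta is counted once for the whole network.
    """
    return units * (m1 + 3) + 1

m1 = 2                               # illustrative: a two-input network
one_unit = parameter_count(m1, 1)    # (w_i1, theta_1, tau_1, W_1, Theta)
two_units = parameter_count(m1, 2)   # adds (w_i2, theta_2, tau_2, W_2)
added = two_units - one_unit         # equals m1 + 3
```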
[0151]
Therefore, the program according to the present invention is characterized in that the number of parameters of the three-layer neural network is increased by a predetermined number to optimize the parameters.
[0152]
The effect of the three-layer neural network 40 when performing an operation for obtaining the approximate function f will be described. That is, the effect of performing the operation of the hypersphere identification type in the intermediate unit 42j of the three-layer neural network 40 will be described.
[0153]
FIG. 21 shows the contours of three test functions with two inputs and one output. FIG. 21A shows the contour lines when the test function z is set to z = −0.76x₁ + 0.19x₂ + 0.78. FIG. 21B shows the contour lines when the test function z is set to z = sin(πx₁) · sin(πx₂). FIG. 21C shows the contour lines when the test function z is set to z = 0.5exp{−5(x₁ − 0.2)² − 5(x₂ − 0.2)²} + 0.9exp{−5(x₁ − 0.8)² − 10(x₂ − 0.6)²}. In FIGS. 21A, 21B, and 21C, the horizontal axis and the vertical axis represent the input variables x₁ and x₂, respectively.
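For reference, the three test functions can be written down directly. The sketch below is a plain transcription of the formulas above; the function names and the sampling helper `sample_grid` are hypothetical, and the unit square [0, 1] × [0, 1] used for sampling is an assumption, since the learning conditions are not reproduced in this passage.

```python
import math

def plane_fn(x1, x2):
    """Test function of FIG. 21A: linear contours."""
    return -0.76 * x1 + 0.19 * x2 + 0.78

def sine_fn(x1, x2):
    """Test function of FIG. 21B: closed contours."""
    return math.sin(math.pi * x1) * math.sin(math.pi * x2)

def two_bump_fn(x1, x2):
    """Test function of FIG. 21C: two asymmetric Gaussian-shaped bumps."""
    return (0.5 * math.exp(-5 * (x1 - 0.2) ** 2 - 5 * (x2 - 0.2) ** 2)
            + 0.9 * math.exp(-5 * (x1 - 0.8) ** 2 - 10 * (x2 - 0.6) ** 2))

def sample_grid(f, n=5):
    """Sample f on an n-by-n grid over the unit square (assumed domain)."""
    pts = [i / (n - 1) for i in range(n)]
    return [(x1, x2, f(x1, x2)) for x1 in pts for x2 in pts]

samples = sample_grid(two_bump_fn)
```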
[0154]
Each learning condition is as follows.

In each of the problems (i), (ii) and (iii), an operation of the hypersphere identification type and an operation of the hyperplane identification type were performed, and the results of the operations were compared. In both calculations, optimization of parameters was performed using a high-dimensional algorithm.
[0155]
For problem (i), the following results were obtained:
<1> Hyperplane identification network
Number of acquired intermediate units = 1
Average number of times of learning ≒ 11,507 times
<2> Hypersphere identification network
Number of acquired intermediate units = 1
Average learning times: 64,103 times
As shown in FIG. 21A, the contours of this test function are straight lines, which clearly favors the hyperplane identification network. However, the hypersphere identification network also achieves a relatively strict required accuracy (ε ≦ 0.003) with one intermediate unit. This shows that expanding the parameter search range allows the network to reflect global features as well as local features, which enhances its mapping ability.
[0156]
On the other hand, the hypersphere identification network requires about six times as many learning iterations on average as the hyperplane identification network. This is because the parameter search range is widened, but even in this clearly disadvantageous case the increase is at most about sixfold.
[0157]
Next, the result of the problem (ii) will be described.
<1> Hyperplane identification network
Number of acquired intermediate units = 8
Average number of times of learning 393,829 times
<2> Hypersphere identification network
Number of acquired intermediate units = 1
Average number of learnings = 12,045 times
As shown in FIG. 21 (b), the contour of this test function is a closed curve, which clearly favors the hypersphere identification network. For the hyperplane identification network, convergence of learning rapidly became difficult when the required accuracy was raised to ε ≦ 0.03, so ε ≦ 0.05 was used. The required number of intermediate units was eight for the hyperplane identification network and one for the hypersphere identification network, which shows that the mapping ability of the hypersphere identification network is markedly superior.
[0158]
As for the average number of learning iterations, the hyperplane identification network, which requires eight times as many intermediate units (and correspondingly more parameters), needs about 33 times as many iterations. This indicates that the hyperplane identification network is inefficient at expressing local features.
[0159]
Finally, the results of problem (iii) are shown.
<1> Hyperplane identification network
Number of acquired intermediate units = 6
Average number of learnings: 264,706
<2> Hypersphere identification network
Number of acquired intermediate units = 2
Average number of learning ≒ 2,665 times
As shown in FIG. 21C, this test function has an asymmetrical closed contour and is an example likely to be a practical application problem. Since it was difficult for the hyperplane identification network to achieve the required accuracy ε ≦ 0.01, ε ≦ 0.03 was set.
[0160]
The number of intermediate units is two for the hypersphere identification network and six for the hyperplane identification network. It can be seen that in this case also the hypersphere identification network has the higher mapping ability. The average number of learning iterations is also nearly 100 times greater for the hyperplane identification network than for the hypersphere identification network.
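The ratios quoted for problems (i), (ii), and (iii) ("about six times", "about 33 times", "nearly 100 times") follow directly from the reported average learning counts. The short sketch below only redoes that arithmetic; the dictionary layout and variable names are illustrative.

```python
# Average learning counts as reported for each problem.
results = {
    "(i)":   {"hyperplane": 11507,  "hypersphere": 64103},
    "(ii)":  {"hyperplane": 393829, "hypersphere": 12045},
    "(iii)": {"hyperplane": 264706, "hypersphere": 2665},
}

ratios = {}
for problem, counts in results.items():
    # Identify which network needed more iterations, and by what factor.
    slower = max(counts, key=counts.get)
    faster = min(counts, key=counts.get)
    ratios[problem] = (slower, counts[slower] / counts[faster])

for problem, (slower, factor) in ratios.items():
    print(f"problem {problem}: {slower} network needs {factor:.1f}x more iterations")
```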
[0161]
As described above, it was confirmed that a learning method for a hypersphere identification network in which the parameter search range is appropriately expanded by the high-dimensional algorithm generally achieves the required accuracy with fewer parameters than a hyperplane identification network, that is, it has a high mapping ability, and that its learning performance is also good.
[0162]
A program including steps S1 and S2 shown in FIG. 17 and steps S21 to S26 shown in FIG. 18 is executed by the personal computer shown in FIG. 22. FIG. 22 is a schematic block diagram of a personal computer. The personal computer 90 includes a data bus BS, a CPU 91, a RAM (Random Access Memory) 92, a ROM (Read Only Memory) 93, a serial interface 94, a terminal 95, a CD-ROM drive 96, a display 97, and a keyboard 98.
[0163]
CPU 91 reads out the program stored in ROM 93 via data bus BS. In addition, the CPU 91 stores in the ROM 93 a program obtained via the serial interface 94, the terminal 95 and the Internet network, or a program read from a CD (Compact Disk) 99 via the CD-ROM drive 96. Further, the CPU 91 receives an instruction from the user input from the keyboard 98.
[0164]
The RAM 92 is a work memory when the CPU 91 performs the calculation for obtaining the above-described approximate function f. The ROM 93 stores programs and the like. Serial interface 94 exchanges data between data bus BS and terminal 95.
[0165]
The terminal 95 is a terminal for connecting the personal computer 90 to an interface (not shown) for connecting the personal computer 90 to the Internet via a cable. The CD-ROM drive 96 reads out a program recorded on the CD 99. The display 97 gives the user various types of information as visual information. Keyboard 98 accepts instructions from the user.
[0166]
CPU 91 reads a program stored in ROM 93 in response to a user's instruction input via keyboard 98, and executes the read program. Then, the CPU 91 performs an operation for obtaining the approximate function f in accordance with the flowcharts shown in FIGS. 17 and 18, and displays the optimized parameters on the display 97.
[0167]
The program according to the present invention is read from the CD 99 by the CPU 91 via the CD-ROM drive 96 and stored in the ROM 93, or is obtained via the serial interface 94, the terminal 95 and the Internet and stored in the ROM 93.
[0168]
Thus, the user can execute the program according to the present invention by the personal computer 90 to perform the calculation for obtaining the approximate function f.
[0169]
As described above, the program according to the present invention performs an operation for obtaining an approximate function f that defines the relationship between the input values x₁(n), ..., x_m1(n) (n = 1 to S) and the output value z₁(n). The input values (x₁(1), ..., x_m1(1)), (x₁(2), ..., x_m1(2)), ..., (x₁(S), ..., x_m1(S)), ..., (x₁(M), ..., x_m1(M)) and the output values (z(1), z(2), ..., z(S), ..., z(M)) included in the set 20 are obtained, for example, as input values consisting of the width d_w1 of the well layer, the electron density n_w1 of the well layer, the height W_d2 of the barrier layer, and the width d_b of the barrier layer in a quantum well structure, and as output values consisting of the energy level E_w1 of the particles (electrons and holes) in the quantum well structure.
[0170]
Therefore, a method of obtaining the input values (x₁(1), ..., x_m1(1)), (x₁(2), ..., x_m1(2)), ..., (x₁(S), ..., x_m1(S)), ..., (x₁(M), ..., x_m1(M)) consisting of the width d_w1 of the well layer, the electron density n_w1 of the well layer, the height W_d2 of the barrier layer, and the width d_b of the barrier layer, and the output values (z(1), z(2), ..., z(S), ..., z(M)) consisting of the energy level E_w1 will be described.
[0171]
[How to get sample data]
The energy level E_w1 is calculated in consideration of the self-interaction of particles in the quantum well structure.
[0172]
FIG. 23 is a conceptual diagram of a quantum well to be calculated by the quantum level calculation program. The vertical axis indicates energy, and the horizontal axis indicates position. The quantum well 100 includes barrier layers 101 and 102 and a well layer 103. The electrons confined in the well layer 103 form an energy level 104.
[0173]
In the present invention, the wave function Ψ in one quantum well 100 is determined by solving the Schrodinger equation not including the interaction of particles. Starting from the obtained wave function Ψ, a wave function that minimizes the energy of the entire system is obtained based on the principle of the variational method. In this case, the energy of the entire system is defined as the expected value, calculated for a given wave function, of the Hamiltonian that includes the interaction of particles as a nonlinear term. Thereafter, the region from the end of one barrier layer 101 of the quantum well 100 to the end of the other barrier layer 102 is divided by a plurality of points x₁ to x_N (N is a natural number), and at each divided point x_i (1 ≦ i ≦ N) the wave function Ψ is discretized into N wave functions Ψ₁, ..., Ψ_i, ..., Ψ_N. In this case, the points x₁ to x_N are determined so that the distances between adjacent points x₁ to x_N are equal.
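The discretization step can be illustrated with a small sketch. It assumes equally spaced points and a normalization in which the sum of |Ψ_i|² times the spacing equals one; both are assumptions made for illustration, as are the function name `discretize` and the sine-shaped trial function, which is not the wave function of the quantum well 100.

```python
import math

def discretize(psi, x_start, x_end, n):
    """Sample psi at n equally spaced points from x_start to x_end inclusive.

    Returns the points x_1..x_N and the sampled values Psi_1..Psi_N,
    scaled so that sum(Psi_i**2) * dx == 1 (assumed normalization).
    """
    dx = (x_end - x_start) / (n - 1)
    xs = [x_start + i * dx for i in range(n)]
    values = [psi(x) for x in xs]
    norm = math.sqrt(sum(v * v for v in values) * dx)
    return xs, [v / norm for v in values]

# Illustrative trial function that vanishes at both ends of the interval.
xs, psi_values = discretize(lambda x: math.sin(math.pi * x), 0.0, 1.0, 101)
dx = xs[1] - xs[0]
total_probability = sum(v * v for v in psi_values) * dx
```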
[0174]
Then, the wave functions Ψ₁, ..., Ψ_i, ..., Ψ_N are calculated so that the energy of the entire system of the quantum well 100 is minimized, and the calculated wave functions Ψ₁, ..., Ψ_i, ..., Ψ_N are used to determine the energy of the entire system.
[0175]
In this way, the wave function Ψ obtained by solving the Schrodinger equation not including the interaction of particles is discretized into N pieces, and the discretized wave functions Ψ₁, ..., Ψ_i, ..., Ψ_N are calculated so that the energy of the entire system is minimized.
[0176]
The following describes the case where the Coulomb interaction of the particles is used as the nonlinear term incorporated into the Hamiltonian when the wave function obtained by solving the Schrodinger equation that does not include the particle interaction is applied to the principle of the variational method to calculate the energy of the entire system.
[0177]
FIG. 24 is a flowchart showing each step constituting the quantum level operation program. When the quantum level operation program is executed, the wave function for the case where the electrons confined in the quantum well 100 have no self-interaction is calculated (step S100). That is, the initial wave function Ψ of the electron is calculated using the Hamiltonian including the kinetic energy term and the potential term due to the external electric field. This calculation is performed using any one of the transfer matrix method, the S matrix method, and the shooting method.
[0178]
Then, an expression indicating the energy of the entire system to be minimized is given by Expression (16).
[0179]
(Equation 16)

[0180]
The second term on the right side of the equation (16) is a Coulomb interaction term including a self interaction. And ε (x) is the dielectric constant at the position x, and -e and m are the charge and mass of the electron, respectively. D (y) is the volume density of the fixed donor at position y, assuming that the fixed donor has a charge of + e. That is, for computer calculations, the ionized donor was assumed to carry a charge of + e due to free electrons.
[0181]
Equation (16) represents the energy of the entire system of the quantum well 100. However, since it is difficult to calculate using equation (16) directly, the space is discretized and the expression is normalized by the number of particles existing in the entire system. That is, the wave function Ψ obtained in step S100 is discretized into N wave functions Ψ₁, ..., Ψ_i, ..., Ψ_N and normalized by the number of particles in the entire system, whereby equation (17) is obtained from equation (16).
[0182]
[Equation 17]

[0183]
In Expression (17), Nt is a normalization factor, and Ne is the number of electrons. V_i′ is a Coulomb potential.
[0184]
Equation (17) represents the energy per electron present in the system, and is hereinafter referred to as the “cost function” in accordance with the terminology of optimization problems. Then, N derivatives are calculated, one for each of the N discretized wave functions Ψ₁, ..., Ψ_i, ..., Ψ_N of equation (17). That is, equation (18) is obtained.
[0185]
(Equation 18)

[0186]
In the equation (18), Eint means nonlinear mutual energy per particle. Determining Expressions (17) and (18) from Expression (16) corresponds to calculating the cost function and the derivative of the cost function (Step S102).
[0187]
Equation (18) is obtained not by partially differentiating both sides of equation (17) with respect to each of the wave functions Ψ₁, ..., Ψ_i, ..., Ψ_N, but by partially differentiating the Hamiltonian H′ of equation (17), which takes the self-interaction into account, with respect to each of the wave functions Ψ₁, ..., Ψ_i, ..., Ψ_N, for the following reason.
[0188]
The quantum level calculation program aims to calculate the wave function so that the energy of the entire system of electrons confined in the well layer 103 of the quantum well 100 is minimized with the self-interaction taken into account. For this reason, instead of partially differentiating the whole of equation (17) with respect to each of the wave functions Ψ₁, ..., Ψ_i, ..., Ψ_N, the Hamiltonian H′ that takes the self-interaction into account is partially differentiated with respect to each of the wave functions Ψ₁, ..., Ψ_i, ..., Ψ_N, so that the influence of the self-interaction is reflected to the maximum.
[0189]
Therefore, in step S102, the discretized wave functions Ψ₁, ..., Ψ_i, ..., Ψ_N are substituted into the cost function (equation (17)) to calculate the energy S{Ψ_i} of the entire system, and the derivative of the cost function is calculated by partially differentiating the Hamiltonian H′ that takes the self-interaction into account with respect to each of the wave functions Ψ₁, ..., Ψ_i, ..., Ψ_N (equation (18)).
[0190]
Thereafter, a new wave function is calculated using the N derivatives calculated in step S102 and the following equation (19) (step S104).
[0191]
[Equation 19]

[0192]
In Expression (19), η is a scaling factor for causing the calculation of the minimum energy by the quantum level calculation program to converge. The second term on the right side of equation (19) is obtained by evaluating equation (18), in which the Hamiltonian H′ that takes the self-interaction into account is partially differentiated with respect to each component of the wave function Ψ, using the wave function Ψ^old.
[0193]
According to equation (19), the new wave function Ψ_i^new is the sum of the already calculated wave function Ψ_i^old and the amount of change due to self-interaction (the second term on the right-hand side of equation (19)). Therefore, the new wave function Ψ_i^new is calculated so as to reflect the change in the self-interaction. The new wave function Ψ_i^new is calculated for each of the N discrete wave functions Ψ₁, ..., Ψ_i, ..., Ψ_N.
[0194]
When the new wave function Ψ_i^new is calculated from equation (19), it is substituted for the wave function Ψ_i of equation (17), and the cost function (equation (17)) and its derivative (equation (18)) are calculated using the new wave function (step S106).
[0195]
Then, whether the energy of the entire system has converged is determined by determining whether the cost function has increased or whether all of the N derivatives calculated in step S106 are zero (step S108).
[0196]
In step S108, when the cost function does not increase or when not all of the N derivatives are zero, it is determined that the energy of the entire system has not converged, and steps S104, S106, and S108 are repeatedly performed. This is because, when the cost function does not increase, the change in the cost function is zero or the cost function decreases, so the energy of the entire system may decrease further. Likewise, when not all of the N derivatives are zero, the cost function is still changing, and in this case also the energy of the entire system may be reduced further.
[0197]
On the other hand, in step S108, when the cost function increases or when all of the N derivatives are zero, it is determined that the energy of the entire system has converged. Then, the wave function one step before is output (step S110).
[0198]
For example, if steps S104, S106, and S108 are repeatedly executed five times, and in the fifth execution of step S108 the cost function increases or all of the N derivatives become zero, the N components calculated using equation (19) in the fourth execution of step S104 are output in step S110. The fact that the cost function has increased, or that all N derivatives have become zero, in the fifth execution of step S108 means that the cost function calculated in step S106 using the wave function calculated in the fourth execution of step S104 has reached its minimum value.
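The iteration of steps S104 to S110 can be sketched as a simple loop. Equations (17) to (19) are not reproduced in this passage, so the sketch assumes only the form stated in the text, namely that the new wave function is the old one plus η times the derivative, and uses a toy quadratic cost in place of equation (17). The function name `minimize_energy` and the `max_steps` guard are hypothetical additions.

```python
def minimize_energy(cost, derivs, psi0, eta, max_steps=10000):
    """Iterate psi_new = psi_old + eta * derivative (assumed form of eq. (19)).

    Stops when the cost increases or all derivatives are zero, and returns
    the wave function from one step before, as described for step S110.
    max_steps is a safety guard that is not part of the described method.
    """
    psi_old = list(psi0)
    cost_old = cost(psi_old)
    grad = derivs(psi_old)                                       # step S102
    for _ in range(max_steps):
        psi_new = [p + eta * g for p, g in zip(psi_old, grad)]   # step S104
        cost_new = cost(psi_new)                                 # step S106
        grad_new = derivs(psi_new)
        # Step S108: converged if the cost rose or every derivative is zero.
        if cost_new > cost_old or all(g == 0.0 for g in grad_new):
            return psi_old                                       # step S110
        psi_old, cost_old, grad = psi_new, cost_new, grad_new
    return psi_old


# Toy cost (NOT equation (17)): squared distance from a known minimizer.
target = [1.0, -2.0, 0.5]

def toy_cost(psi):
    return sum((p - t) ** 2 for p, t in zip(psi, target))

def toy_derivs(psi):
    return [2.0 * (p - t) for p, t in zip(psi, target)]

# eta is negative, like the values quoted for equation (19) in the text,
# so adding eta times the derivative moves toward lower cost.
result = minimize_energy(toy_cost, toy_derivs, [0.0, 0.0, 0.0], eta=-0.1)
```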
[0199]
When the N components that minimize the energy of the entire system are determined in step S110, the energy of the entire system is calculated by equation (20) using the wave function (final wave function) consisting of the determined N components and the Hamiltonian that takes the self-interaction into account (step S112), and the entire calculation operation ends.
[0200]
(Equation 20)

[0201]
Of the steps described above, steps S102, S104, S106, S108, and S110 represent an operation on one particle. Therefore, the first feature of the quantum level calculation program is that, when determining the N components so as to minimize the energy of the entire system, it focuses on one particle and determines the N components so that the influence of the self-interaction on the wave function of that one particle is minimized. Once the wave function for one particle is determined, the determined wave function is applied to the particles of the entire system to calculate the energy of the entire system.
[0202]
As a result, even if the number of electrons confined in the well layer 103 increases because of an increase in the doping amount of the barrier layers 101 and 102 or the well layer 103 of the quantum well 100, and the influence of the interaction of electrons increases accordingly, the calculation can be performed so that the energy of the entire system converges.
[0203]
Hereinafter, an example of calculation using the quantum level calculation program will be specifically described.
[0204]
FIG. 25 is a specific example of a quantum well to be calculated by the quantum level calculation program. The vertical axis indicates energy, and the horizontal axis indicates the distance z in the thickness direction of the barrier layers 101 and 102 and the well layer 103. Each of the two barrier layers 101 and 102 is composed of 35 monolayers (= 10 nm) of Al_0.2Ga_0.8As, and the well layer 103 is made of 35 monolayers (= about 10 nm) of GaAs. The discontinuity ΔEc (= Ec1−Ec2) of the band on the conduction band side is 167 meV.
[0205]
In the system shown in FIG. 25, a first level E1 is formed at a position of 28.25795 meV from the bottom edge of GaAs, which is the well layer 103, and the initial wave function of the electrons occupying the first level E1 is the wave function Ψ.
[0206]
Hereinafter, a case where the well layer 103 is doped with a donor (referred to as “well doping”) and a case where the barrier layers 101 and 102 are doped with a donor (referred to as “barrier doping”) will be described. Then, it is assumed that all the doped donors are activated and the total number of free electrons is equal to the total number of donors.
[0207]
Further, each monolayer of the barrier layers 101 and 102 and the well layer 103 is divided into 10 points. That is, the wave function Ψ is discretized into 1051 wave functions Ψ₁, ..., Ψ_i, ..., Ψ₁₀₅₁.
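The figure of 1051 points is consistent with three layers of 35 monolayers each, divided into 10 points per monolayer, if both end points of the structure are counted. The sketch below redoes that count; counting both end points is an assumption made to match the stated total, and the function name `grid_points` is illustrative.

```python
def grid_points(monolayers_per_layer, points_per_monolayer):
    """Number of discretization points across all layers.

    Each monolayer is split into points_per_monolayer intervals; one extra
    point is added so that both ends of the structure are included
    (an assumption chosen to reproduce the stated total).
    """
    intervals = sum(monolayers_per_layer) * points_per_monolayer
    return intervals + 1

# Barrier layer 101, well layer 103, barrier layer 102: 35 monolayers each.
n_points = grid_points([35, 35, 35], 10)
```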
[0208]
Further, the wave function at both ends of the quantum well 100 including the barrier layers 101 and 102 and the well layer 103 is approximated to be zero.
[0209]
Further, the scaling factor η in Expression (19) was fixed at −1.15 × 10⁷ for the calculation. This value of the scaling factor η applies when each monolayer is divided into 10 points, and another value is used for another division number. For example, when each monolayer is divided into 20 points, η = −2.30 × 10⁷ is used. Values other than these may also be used for the scaling factor η.
[0210]
FIG. 26 shows the calculation results for the well doping with respect to the doping amount. In each of FIGS. 26A to 26D, the vertical axis indicates energy, and the horizontal axis indicates distance z in the thickness direction of barrier layers 101 and 102 and well layer 103. FIG. 26A shows the case where the doping amount is 1.0 × 10¹⁸ cm^-3, FIG. 26B shows the case where the doping amount is 5.0 × 10¹⁸ cm^-3, FIG. 26C shows the case where the doping amount is 8.0 × 10¹⁸ cm^-3, and FIG. 26D shows the case where the doping amount is 1.0 × 10¹⁹ cm^-3. In FIGS. 26A to 26D, the wave function Ψ0 is the initial wave function, the symbols Ew1 to Ew4 indicate the energy values of the ground state, and the symbols Ψw1 to Ψw4 indicate the wave functions calculated using the quantum level calculation program described above.
[0211]
As a result of calculating the four doping amounts using the quantum level operation program, the calculation time was as short as 30 seconds or less.
[0212]
In well doping, since electrons and donors are both present in the well layer 103, charge cancellation occurs between the electrons and the donors, which suppresses band bending outside the quantum well. This is also true at high doping.
[0213]
It was also found that the calculated distortion of the wave function was small. Further, the change in the ground state energies Ew1 to Ew4 is caused by the potential of the electrons due to the interaction between the ionized impurities (donors) and the electrons. The bending of the band increases as the doping amount increases.
[0214]
Table 1 shows the ground state energy value calculated by using the quantum level operation program in the well doping in comparison with the calculation result using the conventional SP method.
[0215]
[Table 1]

[0216]
As is clear from Table 1, in the doping range of 1.0 × 10¹⁸ to 1.7 × 10¹⁸ cm^-3, the calculation result using the quantum level calculation program shows good agreement with the calculation result by the conventional SP method, and the difference is almost negligible at 1.3 × 10^-4 % or less.
[0217]
The conventional SP method diverges for doping amounts of 1.8 × 10¹⁸ cm^-3 or more, whereas the calculation using the quantum level operation program was found to converge reliably up to a doping amount of at least 1.0 × 10¹⁹ cm^-3 (the fact that a numerical value was obtained indicates that convergence was achieved; the same applies hereinafter).
[0218]
As described above, when the well layer 103 is doped, the calculation method using the quantum level calculation program shows good agreement with the calculation result by the conventional SP method in the region where the doping amount is low, and also yields converged results in the region where the doping amount is high. As a result, by using the quantum level operation program, the energy value of the ground state can be obtained up to a high doping amount range.
[0219]
FIG. 27 shows the calculation results for the barrier doping with respect to the doping amount. In (a) to (d) of FIG. 27, the vertical axis represents energy, and the horizontal axis represents the distance z in the thickness direction of the barrier layers 101 and 102 and the well layer 103. Further, in (a) to (d) of FIG. 27, the symbols Ψ0, Ψw1 to Ψw4, and Ew1 to Ew4 have the same meanings as in (a) to (d) of FIG. 26. FIG. 27A shows the case where the doping amount is 1.0 × 10¹⁷ cm^-3, FIG. 27B shows the case where the doping amount is 3.0 × 10¹⁷ cm^-3, FIG. 27C shows the case where the doping amount is 5.0 × 10¹⁷ cm^-3, and FIG. 27D shows the case where the doping amount is 7.0 × 10¹⁷ cm^-3.
[0220]
Also in this case, the calculation time was 30 seconds or less. As the doping amount of the barrier layers 101 and 102 increases, the distortion of the wave functions Ψw1 to Ψw4 increases. This is for the following reason. In the barrier doping, the confined electrons are present in the well layer 103 and the donors are present in the barrier layers 101 and 102. Therefore, the electrons in the well layer 103 repel one another through their mutual interaction and are drawn toward the barrier layers 101 and 102 on both sides by the attraction from the donors existing in the barrier layers 101 and 102. Then, as the doping amount of the barrier layers 101 and 102 increases, the attractive force from the donors existing in the barrier layers 101 and 102 increases, so that the spread of electrons in the well layer 103 increases. As a result, the distortion of the wave functions Ψw1 to Ψw4 increases as the doping amount of the barrier layers 101 and 102 increases.
[0221]
Also, in the case of barrier doping, the bending of the band is larger than in the case of well doping. This is for the following reason. The potential that the ionized donors exert on the electrons raises the band-edge potential at the bottom of the well layer 103. On the other hand, since the electrons and the donors are spatially separated from each other, the screening by the electrons of the electric influence of the ionized donors is weakened. As a result, an electric field is applied from the well layer 103 toward the barrier layers 101 and 102, and the bending of the band increases.
[0222]
Table 2 shows the ground state energy value calculated using the quantum level operation program in the barrier doping in comparison with the calculation result using the conventional SP method.
[0223]
[Table 2]

[0224]
As is clear from Table 2, in the doping range of 1.0 × 10¹⁷ to 5.0 × 10¹⁷ cm⁻³, the calculation result using the quantum level operation program agrees well with the calculation result by the conventional SP method. The calculation method using the quantum level calculation program also converges at high doping amounts, including 7.0 × 10¹⁷ cm⁻³, at which the conventional SP method does not converge.
[0225]
As described above, when the barrier layers 101 and 102 are doped, the calculation method using the quantum level calculation program agrees well with the calculation result by the conventional SP method in the region where the doping amount is low, and also yields converged calculation results in the range where the doping amount is high. As a result, by using the quantum level operation program, the energy value of the ground state can be obtained up to a high doping amount range.
[0226]
FIG. 28 shows, for well doping with a doping amount of 2.0 × 10¹⁸ cm⁻³, the calculation result using the quantum level calculation program in comparison with the calculation result by the conventional SP method. In FIGS. 28A and 28B, the vertical axis represents energy, and the horizontal axis represents the distance z in the thickness direction of the barrier layers 101 and 102 and the well layer 103. FIG. 28A shows the calculation result by the conventional SP method, and FIG. 28B shows the calculation result using the quantum level operation program.
[0227]
The symbols Ψc and Ψivt indicate wave functions, and the symbols Vc and Vivt indicate band edge potentials.
[0228]
In the conventional SP method, the Schrödinger equation and the Poisson equation are solved alternately and repeatedly. In the course of this repetition, small deviations in balance and calculation errors are amplified, and as a result, the wave function Ψc oscillates as the calculation steps proceed.
[0229]
On the other hand, in the calculation method using the quantum level calculation program, during the repetition of steps S104, S106, and S108 shown in FIG. 24, the correction given by equation (19) decreases as the wave function approaches the true solution. As a result, the wave function Ψivt converges.
[0230]
In the above description, the case where the number of divisions per monolayer is 10 points has been described. FIG. 29 shows the case where the number of divisions is changed. The vertical axis represents the ground state energy, and the horizontal axis represents the number of divisions per monolayer. A curve 105 (shown by a solid line) is the calculation result using the quantum level calculation program, and a curve 106 (shown by a dotted line) is the calculation result by the conventional SP method. The doping is performed on the well layer 103, and the doping amount is 1.0 × 10¹⁸ cm⁻³.
[0231]
In both methods, the energy value of the ground state became smaller as the number of divisions increased, showing that a more accurate wave function is obtained by dividing the space more finely. It was also found that the dependence on the number of spatial divisions does not differ greatly between the calculation method using the quantum level calculation program and the conventional SP method.
[0232]
In the calculations in the well doping and barrier doping described above, the initial state obtained by the shooting method was used.
[0233]
Table 3 shows the calculation results using the quantum level operation program when a mathematically exact wave function is used as the initial state in well doping and barrier doping. The energy value corresponding to the exact wave function adopted as the initial state is 28.27683 meV.
[0234]
[Table 3]

[0235]
As a result, it was found that the energy value for each doping amount was larger when the exact wave function, whose energy value is 28.27683 meV, was adopted as the initial state (compare Tables 1 and 2). However, the difference is about 1.5%, which is within the allowable range for estimating quantum levels in actual semiconductor materials. Therefore, in the quantum level calculation program, it is considered that there is no particular problem even if the initial state is calculated by the shooting method.
[0236]
In this manner, by using the quantum level operation program, the energy level E_w1 is obtained for the width d_w1 of the well layer, the electron density n_w1 of the well layer, the barrier layer height W_d2, and the width d_b of the barrier layer in the quantum well structure.
[0237]
Then, for the set 20 consisting of the input values (x₁(1), ..., x_m1(1)), (x₁(2), ..., x_m1(2)), ..., (x₁(S), ..., x_m1(S)), ..., (x₁(M), ..., x_m1(M)) and the output values (z(1), z(2), ..., z(S), ..., z(M)), it is determined whether the output values (z(1), z(2), ..., z(S), ..., z(M)) exist on the above-mentioned gentle curved surface. When the output values exist on a gentle curved surface, the input values (x₁(1), ..., x_m1(1)), (x₁(2), ..., x_m1(2)), ..., (x₁(S), ..., x_m1(S)) are extracted from the input values of the set 20, the output values (z(1), z(2), ..., z(S)) are extracted from the output values of the set 20, and the sample data 30 is prepared.
[0238]
When the personal computer 90 shown in FIG. calculates the output value consisting of the energy level E_w1 for the input values consisting of the width d_w1 of the well layer in the quantum well structure, the electron density n_w1 of the well layer, the barrier layer height W_d2, and the width d_b of the barrier layer, these input values and the energy level E_w1 are stored in the RAM 92. Then, when the CPU 91 is instructed from the keyboard 98 to execute the program for performing the operation for obtaining the approximate function f, the CPU 91 reads out the input values and the output values stored in the RAM 92 and displays them on the display 97 in accordance with the instruction.
[0239]
The user looks at the input values and the output values displayed on the display 97 and determines whether or not the output values exist on the gentle curved surface. When the output values exist on the gentle curved surface, the user designates from the keyboard 98 the input values x₁(n), ..., x_m1(n) and the output value z₁(n) (n = 1 to S) that are to constitute the sample data 30.
[0240]
Then, the CPU 91 constructs the sample data 30 from the designated input values x₁(n), ..., x_m1(n) and output value z₁(n), reads the program according to the present invention from the ROM 93, and executes the read program to perform the operation for obtaining the approximate function f that defines the relationship between the input values x₁(n), ..., x_m1(n) and the output value z₁(n).
[0241]
Alternatively, a criterion for determining whether the output value z₁(n) (n = 1 to S) exists on a gentle curved surface may be given to the CPU 91 in advance, and the CPU 91 may automatically extract the input values x₁(n), ..., x_m1(n) and the output value z₁(n).
[0242]
Therefore, more specifically, the program according to the present invention causes a computer to execute an operation for obtaining an approximate function f that defines the relationship between input values and output values, using sample data in which the parameters of the barrier layers 101 and 102 and the well layer 103 that determine the quantum well structure are the input values and the energy levels of electrons and holes in the quantum well are the output values.
[0243]
In the above, the case where the input values x₁(n), ..., x_m1(n) and the output value z₁(n) are calculated by the personal computer 90 has been described. However, the present invention is not limited to this. The input values x₁(n), ..., x_m1(n) and the output value z₁(n) may be acquired as experimental results, and the acquired experimental results may be input to the personal computer 90 to form the sample data 30.
[0244]
Also, one of the input values x₁(n), ..., x_m1(n) and the output value z₁(n) may be an experimental result, and the other may be a calculation result by the personal computer 90, the two together constituting the sample data 30.
[0245]
Further, in the above description, the case where the calculation for obtaining the approximate function f is performed using the three-layer neural network 40, whose output layer 43 includes one output unit, has been described. However, the calculation for obtaining the approximate function f may instead be performed using the three-layer neural network 40A.
[0246]
FIG. 30 shows another conceptual diagram of a three-layer neural network in which a program according to the present invention calculates an output operation value with respect to an input value. The three-layer neural network 40A is the same as the three-layer neural network 40 except that the output layer 43 of the three-layer neural network 40 is replaced with an output layer 43A.
[0247]
The output layer 43A includes output units 43k (k = 1 to m3). The output unit 43k receives the outputs Y_j (j = 1 to m2) from the intermediate units 42j (j = 1 to m2), substitutes the received outputs Y_j, the connection weights W_jk, and the threshold Θ_k into Expression (4), and calculates the output operation values Z₁(n), ..., Z_m3(n) (n = 1 to S).
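As an illustration of how an output layer with several output units could combine the intermediate-unit outputs, the following Python sketch may help. Expression (4) is not reproduced in this passage, so the form Z_k = Σ_j W_jk·Y_j − Θ_k used here is only an assumption made for illustration, and the function name `output_layer` is hypothetical.

```python
def output_layer(Y, W, Theta):
    """Compute m3 output operation values from m2 intermediate-unit outputs.

    Y:     list of m2 intermediate-unit outputs Y_j
    W:     m2 x m3 nested list of connection weights W_jk
    Theta: list of m3 thresholds Theta_k
    Assumed form (not taken from the source): Z_k = sum_j W_jk * Y_j - Theta_k
    """
    m2, m3 = len(Y), len(Theta)
    return [sum(W[j][k] * Y[j] for j in range(m2)) - Theta[k] for k in range(m3)]


# Two intermediate units feeding two output units.
Z = output_layer([0.5, 1.0], [[1.0, 2.0], [3.0, 4.0]], [0.5, 1.0])
```

With one output unit (m3 = 1) the same function reduces to the single-output case of the three-layer neural network 40.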
[0248]
Even when the three-layer neural network 40A is used, the program according to the present invention executes the calculation for obtaining the approximate function f according to the flowcharts shown in FIGS. When the calculation for obtaining the approximate function f is performed using the three-layer neural network 40A, n × S output values z₁(n), ..., z_m1(n) are prepared.
[0249]
The rest is as described above.
Further, the input values and the output values used in the calculation for obtaining the approximate function f are not limited to data related to the quantum well, and may be input values and output values of any type as long as the output values exist on a gentle curved surface.
[0250]
The embodiments disclosed this time are to be considered in all respects as illustrative and not restrictive. The scope of the present invention is defined by the terms of the claims, rather than the description of the embodiments, and is intended to include any modifications within the scope and meaning equivalent to the terms of the claims.
[Brief description of the drawings]
FIG. 1 is a diagram showing input values and output values used by a program according to the present invention for calculating an approximate function.
FIG. 2 is a diagram illustrating a relationship between an input value and an output value when there are two input values;
FIG. 3 is a conceptual diagram of a three-layer neural network in which a program according to the present invention calculates an output operation value with respect to an input value.
FIG. 4 is a diagram illustrating an internal state calculated using Expression (5) when two input values are used.
FIG. 5 is a diagram showing an output from the intermediate unit shown in FIG. 3 when two input values are used.
FIG. 6 is a diagram illustrating an internal state calculated using Expression (1) when two input values are used.
FIG. 7 is a diagram showing an output from the intermediate unit shown in FIG. 3 when two input values are used.
FIG. 8 is a diagram showing the relative positional relationships that the identification hypersphere can take and an output example of the intermediate unit 42j when the range of the parameters θ_j and w_ij is set to a small radius corresponding to a local feature.
FIG. 9 is a diagram showing the relative positional relationships that the identification hypersphere can take and an output example of the intermediate unit 42j when the range of the parameters θ_j and w_ij is widened.
FIG. 10 is a diagram showing the relative positional relationships that the identification hypersphere can take and an output example of the intermediate unit 42j when the identification hypersphere is a small hypersphere whose center (parameter w_ij) lies outside the domain of the input variables.
FIG. 11 is a flowchart of optimization by a high-dimensional algorithm.
FIG. 12 is a conceptual diagram illustrating a solution search method using a high-dimensional algorithm.
FIG. 13 is a diagram illustrating a landscape of a cost function when there are two parameters.
FIG. 14 is a diagram showing ν_n(Q_n) shown in equation (9).
FIG. 15 is a diagram showing the degree of localization of the cost function, comparing the high-dimensional algorithm (HA) with the annealing method (SA).
FIG. 16 is a diagram showing the speed of passing through a flat region of the cost function in comparison between the high-dimensional algorithm (HA) and the annealing method (SA).
FIG. 17 is a flowchart illustrating the operation by which a program according to the present invention obtains an approximate function f that defines the relationship between the input values x₁(n), ..., x_m1(n) and the output value z₁(n).
FIG. 18 is a flowchart illustrating a detailed operation of step S2 shown in FIG.
FIG. 19 is a conceptual diagram of a three-layer neural network when the number of intermediate units is set to one.
FIG. 20 is a conceptual diagram of a three-layer neural network when the number of intermediate units is set to two.
FIG. 21 is a diagram showing contour lines of three test functions of two inputs and one output.
FIG. 22 is a schematic block diagram of a personal computer.
FIG. 23 is a conceptual diagram of a quantum well to be operated by the quantum level operation program.
FIG. 24 is a flowchart showing steps constituting a quantum level operation program.
FIG. 25 is a specific example of a quantum well to be calculated by a quantum level calculation program.
FIG. 26 shows calculation results when the doping amount is changed in well doping.
FIG. 27 shows calculation results when the doping amount is changed in barrier doping.
FIG. 28 is a diagram showing a comparison between a calculation result using a quantum level calculation program and a calculation result using a conventional method.
FIG. 29 is a diagram showing the division number dependence of the energy value of the ground state.
FIG. 30 is another conceptual diagram of a three-layer neural network in which a program according to the present invention calculates an output operation value with respect to an input value.
[Explanation of symbols]
1 to 6 contours, 7 curved surfaces, 10, 20 sets, 30 sample data, 40, 40A three-layer neural network, 41 input layer, 42, 42A, 42B intermediate layer, 43, 43A output layer, 50 hyperplane, 51, 52 , 81 regions, 60 hyperspheres, 61-65 curved surfaces, 70 semantic spaces, 71 solutions, 72, 82 arrows, 80 high-dimensional spaces, 89, 105, 106 curves, 90 personal computers, 91 CPU, 92 RAM, 93 ROM, 94 serial interface, 95 terminals, 96 CD-ROM drive, 97 display, 98 keyboard, 99 CD, 100 quantum well, 101, 102 barrier layer, 103 well layer, 104 energy level, 411-41m1 input unit, 421-42m2 Intermediate unit, 431-43m3 Output unit.

Claims

A program for causing a computer to execute an operation for obtaining an approximate function that defines a relationship between m inputs and n outputs, using S (S is a natural number) sample data each consisting of m (m is a natural number) input values and n (n is a natural number) output values, the program causing the computer to execute:
A first step of receiving the m × S input values and the n × S output values; and
A second step of changing, among all the parameters of a hypersphere identification type three-layer neural network that calculates n output operation values for the m input values, the values of the parameters of the identification hypersphere within a search range wider than the normal search range, calculating n × S output operation values for the m × S input values by the hypersphere identification type operation, and optimizing the values of all the parameters so as to obtain the approximate function using the calculated n × S output operation values,
Wherein the second step optimizes all the parameters by a high-dimensional algorithm that sets a high-dimensional space having more dimensions than the number of dimensions defined by the number of all the parameters and that is expected, in the set high-dimensional space, to pass quickly through regions where the values of all the parameters are other than the optimal values and to enter easily the region where the values of all the parameters are the optimal values.

The program for causing a computer to execute according to claim 1, wherein, in the second step, the number of intermediate units that are included in the three-layer neural network and perform the hypersphere identification type operation is set to an initial value, the n × S output operation values are calculated, and all the parameters are optimized.

The program for causing a computer to execute according to claim 2, wherein the second step comprises:
A first sub-step of setting the parameters to initial values and calculating the n × S output operation values by the hypersphere identification type operation;
A second sub-step of calculating a cost function value for evaluating the calculated n × S output operation values, and comparing the calculated cost function value with a predetermined value;
A third sub-step of, when the cost function value is equal to or less than the predetermined value, adopting as optimum values the values of all the parameters with which that cost function value was obtained;
A fourth sub-step of, when the cost function value is larger than the predetermined value, calculating by the high-dimensional algorithm, within the wide search range, all the parameters for reducing the cost function value;
A fifth sub-step of executing the first sub-step using all the parameters calculated in the fourth sub-step, and thereafter executing the second to fourth sub-steps; and
A sixth sub-step of, when the cost function value is still larger than the predetermined value after the first to fifth sub-steps have been repeatedly executed a specified number of times, increasing the number of the intermediate units and executing the first to fifth sub-steps.
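The control flow of the six sub-steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the Gaussian-like unit response in `unit_output`, the parameter layout `(w, tau, W)`, and all function names are hypothetical, and the fourth sub-step is replaced by a simple random perturbation search as a stand-in, which is not the high-dimensional algorithm.

```python
import math
import random

def unit_output(x, w, tau):
    # Assumed Gaussian-like response centred on w (illustrative form only).
    d2 = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
    return math.exp(-d2 / (tau * tau))

def predict(x, units, Theta):
    # units: list of (w, tau, W) tuples, one per intermediate unit.
    return sum(W * unit_output(x, w, tau) for w, tau, W in units) - Theta

def cost(samples, units, Theta):
    # Mean squared error over the S samples (single output).
    return sum((predict(x, units, Theta) - z) ** 2 for x, z in samples) / len(samples)

def perturb(units, Theta, rng, scale):
    # Stand-in for the fourth sub-step: random perturbation,
    # NOT the high-dimensional algorithm.
    new_units = [([wi + rng.gauss(0, scale) for wi in w],
                  max(1e-3, tau + rng.gauss(0, scale)),
                  W + rng.gauss(0, scale)) for w, tau, W in units]
    return new_units, Theta + rng.gauss(0, scale)

def fit(samples, target, max_iters=2000, max_units=5, seed=0):
    rng = random.Random(seed)
    dim = len(samples[0][0])
    units, Theta = [], 0.0
    best = float("inf")
    while len(units) < max_units:
        # Start with one intermediate unit; later passes add one more (sixth sub-step).
        units.append(([rng.uniform(-1, 1) for _ in range(dim)], 1.0, 0.0))
        best = cost(samples, units, Theta)            # first and second sub-steps
        if best <= target:                            # third sub-step
            return units, Theta, best
        for _ in range(max_iters):                    # specified number of repetitions
            cand_units, cand_Theta = perturb(units, Theta, rng, 0.1)  # fourth sub-step
            c = cost(samples, cand_units, cand_Theta)                 # fifth sub-step
            if c < best:
                units, Theta, best = cand_units, cand_Theta, c
            if best <= target:                        # third sub-step
                return units, Theta, best
    return units, Theta, best
```

A call such as `fit(samples, target=1e-3)` takes `samples` as a list of `(input_tuple, output_value)` pairs and returns the unit parameters, the threshold, and the final cost. New units are added with zero output weight so that adding a unit never increases the cost.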

The program for causing a computer to execute according to claim 3, wherein the number of the intermediate units is increased by one.

The program for causing a computer to execute according to any one of claims 2 to 4, wherein an initial value of the number of the intermediate units is one.

The program for causing a computer to execute according to claim 1, wherein the second step sets the number of all the parameters to an initial value, calculates the n × S output operation values, and optimizes all the parameters.

The program for causing a computer to execute according to claim 6, wherein the second step comprises:
A first sub-step of setting the parameters to initial values and calculating the n × S output operation values by the hypersphere identification type operation;
A second sub-step of calculating a cost function value for evaluating the calculated n × S output operation values, and comparing the calculated cost function value with a predetermined value;
A third sub-step of, when the cost function value is equal to or less than the predetermined value, adopting as optimum values the values of all the parameters with which that cost function value was obtained;
A fourth sub-step of, when the cost function value is larger than the predetermined value, calculating by the high-dimensional algorithm, within the wide search range, all the parameters for reducing the cost function value;
A fifth sub-step of executing the first sub-step using all the parameters calculated in the fourth sub-step, and thereafter executing the second to fourth sub-steps; and
A sixth sub-step of, when the cost function value is still larger than the predetermined value after the first to fifth sub-steps have been repeatedly executed a specified number of times, increasing the number of all the parameters and executing the first to fifth sub-steps.

The program for causing a computer to execute according to claim 7, wherein the number of all the parameters is increased by a predetermined number at a time, and
When the number of all the parameters is increased, the first to fifth sub-steps are executed with the values of the predetermined number of parameters that existed before the increase held fixed.

The program for causing a computer to execute according to claim 8, wherein, when the predetermined number of parameters that existed before the number of all the parameters was increased are defined as first parameters, and the predetermined number of parameters added by the increase are defined as second parameters,
The fourth sub-step fixes the first parameters, changes the second parameters within the wide search range, and calculates by the high-dimensional algorithm all the parameters for reducing the cost function value.

The program for causing a computer to execute according to any one of claims 3 to 5 and claims 7 to 9, wherein the second sub-step calculates, as the cost function value, the average of the sum of squared errors between the received n × S output values and the calculated n × S output operation values.
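A cost function of this kind could be written as in the short Python sketch below. The function name `cost_function` is hypothetical, and dividing the summed squared error by the number of samples S (rather than by n × S) is an assumption, since the claim does not state the normalization.

```python
def cost_function(z_true, z_calc):
    """Average over S samples of the summed squared error of the n outputs.

    z_true, z_calc: lists of S rows, each row a list of n values.
    Normalizing by S is an assumption; the source does not specify it.
    """
    S = len(z_true)
    total = 0.0
    for row_t, row_c in zip(z_true, z_calc):
        total += sum((t - c) ** 2 for t, c in zip(row_t, row_c))
    return total / S
```

For example, with received values `[[1.0, 2.0], [3.0, 4.0]]` and calculated values `[[1.0, 1.0], [1.0, 4.0]]`, the squared errors are 1 and 4, so the function returns 2.5.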

The program for causing a computer to execute according to any one of claims 1 to 10, wherein the n × S output values are approximated by a combination of Gaussian-like distributions.

The program for causing a computer to execute according to any one of claims 1 to 11, wherein the m × S input values and the n × S output values are data calculated by the computer.

The program for causing a computer to execute according to claim 12, wherein the m × S input values and the n × S output values are data calculated by a quantum level calculation program for calculating a quantum level of a particle confined in a microstructure, and
The quantum level calculation program comprises:
Step A of calculating an initial wave function based on the linear Schrödinger equation and providing the calculated initial wave function as a numerical sequence of a plurality of discretized components;
Step B of calculating a cost function indicating the energy of the whole system, using the initial wave function having the plurality of discretized components, normalized by the number of particles existing in the microstructure, and a Hamiltonian including a nonlinear term that takes the interaction between particles into account;
Step C of calculating, using the calculated cost function, a final wave function that minimizes the energy of the whole system; and
Step D of calculating the energy of the state represented by the final wave function, using the final wave function and the Hamiltonian.

The program for causing a computer to execute according to any one of claims 1 to 13, wherein the high-dimensional algorithm comprises:
A step of defining, as a semantic space, the space of all the parameters that appear in the problem to be solved and are to be optimized;
A step of defining a new space spanned by conjugate parameters that are conjugate to all the parameters;
A step of adding the new space to the semantic space to define a high-dimensional space;
A step of setting the problem in the high-dimensional space; and
A step of detecting the optimum values of all the parameters by performing, in the high-dimensional space, an autonomous motion that passes quickly through regions where the values of all the parameters are other than the optimal values and enters easily the region where the values of all the parameters are the optimal values.
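The idea of augmenting the parameter space with conjugate parameters can be illustrated with the Python sketch below. The claim does not give the equations of motion, so this sketch substitutes plain Hamiltonian dynamics with H = C(q) + |p|²/2, integrated by the leapfrog method, as a stand-in for the autonomous motion; it should not be read as the high-dimensional algorithm itself. The names `grad` and `search` and the step sizes are hypothetical.

```python
def grad(cost, q, h=1e-6):
    # Central-difference gradient of the cost with respect to each parameter.
    g = []
    for i in range(len(q)):
        qp, qm = list(q), list(q)
        qp[i] += h
        qm[i] -= h
        g.append((cost(qp) - cost(qm)) / (2 * h))
    return g

def search(cost, q0, p0, dt=0.01, steps=2000):
    # q: parameters to optimize (the "semantic space").
    # p: conjugate parameters (the added space); (q, p) is the enlarged space.
    q, p = list(q0), list(p0)
    best_q, best_c = list(q), cost(q)
    g = grad(cost, q)
    for _ in range(steps):
        p = [pi - 0.5 * dt * gi for pi, gi in zip(p, g)]   # half kick
        q = [qi + dt * pi for qi, pi in zip(q, p)]          # drift
        g = grad(cost, q)
        p = [pi - 0.5 * dt * gi for pi, gi in zip(p, g)]   # half kick
        c = cost(q)
        if c < best_c:                                      # remember the best point visited
            best_q, best_c = list(q), c
    return best_q, best_c
```

In this stand-in the trajectory is not confined to descending the cost landscape, because the conjugate parameters carry it across regions of higher cost, and the best point visited along the way is recorded as the candidate optimum.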

A computer-readable recording medium recording the program according to any one of claims 1 to 14.