JP7239798B2

JP7239798B2 - Element value inference method, element value inference device, and element value inference program

Info

Publication number: JP7239798B2
Application number: JP2019089493A
Authority: JP
Inventors: 伸和 ▲高▼井; 雅史福田; 将大猿田
Original assignee: Gunma University NUC
Current assignee: Gunma University NUC
Priority date: 2019-05-10
Filing date: 2019-05-10
Publication date: 2023-03-15
Anticipated expiration: 2039-05-10
Also published as: JP2020187395A

Description

The present invention relates to an element value inference method, an element value inference device, and an element value inference program.

To solve the problem disclosed in Patent Document 1, namely "easily determining the constants of elements in a circuit", approaches such as (1) design based on mathematical formulas and a knowledge base, (2) a method using genetic programming, and (3) a method using a neural network have been adopted.

Patent Document 1: JP-A-2004-145410

However, in method (1), whether the designed element constants are appropriate depends on the designer's knowledge, experience, judgment, and the like. In method (2), a suitable solution is sought through genetic processes (e.g., mutation and chromosome crossover), so the method lacks reproducibility: it does not repeatedly yield the same solution, that is, the same constants for the elements. In method (3), the benefit of learning can be expected only within the learned range; for example, even after learning on the elements of one circuit, the element constants of another circuit that was not learned cannot be determined appropriately.

An object of the present invention is to easily determine element values of circuit elements that improve circuit performance.

To solve the above problem, an element value inference method according to the present invention includes: an element value changing step of changing, among a plurality of element values that one or more circuit elements included in a circuit can take, one element value to another element value; a simulation step of calculating a plurality of characteristics of the circuit using the other element value after the change; a value calculation step of calculating the value of the one element value before the change using the calculated plurality of characteristics; and an action-value function updating step of updating, using the calculated value of the one element value before the change, an action-value function that indicates the correspondence between the plurality of element values and the values of the plurality of element values, in order to infer, among the plurality of element values, an element value that is likely to improve the plurality of characteristics.

To solve the above problem, an element value inference device according to the present invention includes: an element value changing unit that changes, among a plurality of element values that one or more circuit elements included in a circuit can take, one element value to another element value; a simulation unit that calculates a plurality of characteristics of the circuit using the other element value after the change; a value calculation unit that calculates the value of the one element value before the change using the calculated plurality of characteristics; and an action-value function updating unit that updates, using the calculated value of the one element value before the change, an action-value function that indicates the correspondence between the plurality of element values and the values of the plurality of element values, in order to infer, among the plurality of element values, an element value that is likely to improve the plurality of characteristics.

To solve the above problem, an element value inference program according to the present invention causes a computer to execute: an element value changing step of changing, among a plurality of element values that one or more circuit elements included in a circuit can take, one element value to another element value; a simulation step of calculating a plurality of characteristics of the circuit using the other element value after the change; a value calculation step of calculating the value of the one element value before the change using the calculated plurality of characteristics; and an action-value function updating step of updating, using the calculated value of the one element value before the change, an action-value function that indicates the correspondence between the plurality of element values and the values of the plurality of element values, in order to infer, among the plurality of element values, an element value that is likely to improve the plurality of characteristics.

According to the element value inference method, the element value inference device, and the element value inference program of the present invention, the value of the one element value before the change is calculated using the plurality of characteristics of the circuit that were calculated with the other element value after the change, and the action-value function is then updated using that value. Because the updated action-value function is used to infer an element value that is likely to improve the plurality of characteristics of the circuit, element values of circuit elements that improve circuit performance can be determined easily.

FIG. 1 shows the configuration of the element value inference device of Embodiment 1.
FIG. 2 shows the configuration (hardware) of the element value inference device of Embodiment 1.
FIG. 3 shows the circuit of Embodiment 1.
FIG. 4 shows the MOS field-effect transistor of Embodiment 1.
FIG. 5 shows the characteristics of the circuit of Embodiment 1.
FIG. 6 shows the Q table of Embodiment 1.
FIG. 7 shows the operation of the element value inference device of Embodiment 1 (first half).
FIG. 8 shows the operation of the element value inference device of Embodiment 1 (second half).
FIG. 9 shows the updating of the Q table of Embodiment 1.
FIG. 10 shows the circuit characteristics and their values in Embodiment 1.
FIG. 11 shows the relationship between episodes, steps, and scores in Embodiment 1.
FIG. 12 shows the relationship between the stage of learning, steps, and scores in Embodiment 1.
FIG. 13 shows the circuit configuration of a modified example.
FIG. 14 shows the circuit characteristics of the modified example and their improvement.

<Embodiment 1>
An embodiment of the element value inference device according to the present invention is described below.

<Configuration of element value inference device (outline)>
FIG. 1 shows the configuration of the element value inference device of Embodiment 1. The element value inference device DV of Embodiment 1 uses reinforcement learning (Q-Learning) to infer "M" (described later with reference to FIG. 4), the element value of the MOS field-effect transistors (MOSFET: Metal-Oxide-Semiconductor Field-Effect Transistor) M1 to M8, which are the circuit elements constituting the circuit CK (shown in FIG. 3). More specifically, the element value inference device DV infers, that is, searches for, an "M" that is likely to improve the circuit characteristics CH of circuit CK, namely current consumption CH1 through input-referred noise CH13 (shown in FIG. 5), in other words an "M" that is likely to bring the circuit closer to the overall optimum. It does so using an action-value function for carrying out Q-Learning (hereinafter also called "Q table QT"; shown in FIG. 6).

Before describing the element value inference device DV of Embodiment 1, the following are explained: (1) the circuit CK, (2) the MOS field-effect transistors M1 to M8 constituting circuit CK, (3) the circuit characteristics CH of circuit CK, and (4) the Q table QT.

<Circuit configuration>
FIG. 3 shows the circuit of Embodiment 1.

Circuit CK is a conventionally known operational amplifier circuit. As shown in FIG. 3, circuit CK includes MOS field-effect transistors M1 to M8, a resistor R1, and a capacitor C1 in order to perform differential amplification. A power supply voltage Vdd (e.g., 5.0 V or 3.3 V) and a negative voltage Vss are applied to circuit CK. Circuit CK differentially amplifies the two signals input to terminals Vinm and Vinp and outputs the amplified signal from terminal Vout.

<Structure of MOS field-effect transistor>
FIG. 4 shows the MOS field-effect transistor of Embodiment 1.

In the MOS field-effect transistors M1 to M8, as shown in FIG. 4, a source, a drain, and a gate are formed on a substrate. "L" denotes the gate length, which is the distance between the source and the drain, and "W" denotes the gate width, which is the length of the gate in the depth direction. Embodiment 1 assumes that "L" and "W" are fixed and that "W" is multiplied by "M" (M is an arbitrary integer). "L" and "W" are fixed and only "M" is inferred in order to limit the time required for the inference, that is, the search, and thereby shorten the time required for learning as a whole.

<Breakdown of circuit characteristics>
FIG. 5 shows the circuit characteristics of the circuit of Embodiment 1.

As shown in FIG. 5, circuit CK (shown in FIG. 3) has circuit characteristics CH, more specifically current consumption CH1 through input-referred noise CH13. The values of CH1 through CH13 change with the length of "W" of the MOS field-effect transistors M1 to M8 constituting circuit CK, in other words with the magnitude of "M". Among CH1 through CH13, there are characteristics whose values improve relative to the minimum requirements RE when "M" is increased and, in contrast, characteristics whose values change without improving relative to the minimum requirements RE. It is therefore necessary to infer an "M" for which all of CH1 through CH13 satisfy the minimum requirements RE and which is optimal overall.
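
The condition that every characteristic must satisfy its minimum requirement RE can be sketched as a simple check. This is only an illustration: the characteristic names, limits, and directions below are hypothetical placeholders and are not taken from FIG. 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """Hypothetical minimum requirement RE for one circuit characteristic."""
    limit: float
    higher_is_better: bool  # True: value must be >= limit; False: <= limit


def meets_all(values: dict, reqs: dict) -> bool:
    """Return True only if every characteristic satisfies its requirement."""
    for name, req in reqs.items():
        value = values[name]
        ok = value >= req.limit if req.higher_is_better else value <= req.limit
        if not ok:
            return False
    return True


# Placeholder requirements (illustrative numbers, not from the patent).
REQS = {
    "dc_gain_db": Requirement(limit=40.0, higher_is_better=True),
    "current_ma": Requirement(limit=1.0, higher_is_better=False),
}
```

A candidate "M" would pass only when `meets_all` returns True for the simulated characteristics; ranking the passing candidates is a separate question handled by the score described later.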

<Configuration of Q table>
FIG. 6 shows the Q table of Embodiment 1.

As shown in FIG. 6, the Q table QT is a matrix. Its vertical axis indicates the state S and its horizontal axis indicates the action A. Each position (cell) defined by the vertical and horizontal axes holds a value, for example one of the values Q11 to Q66.

"State S" is synonymous with "state" as known in Q-Learning; here it refers to the circuit characteristics CH of circuit CK (current consumption CH1 through input-referred noise CH13).

"Action A" is synonymous with "action" as known in Q-Learning; here it refers to changing "M" (the element value) in order to improve the circuit characteristics of circuit CK (current consumption CH1 through input-referred noise CH13).

"Values Q11 to Q66" are synonymous with the "Q values" known in Q-Learning. For example, "value Q23" indicates the value of "action A3" when "action A3" is taken in "state S2". Here, it is the value of taking "action A3", that is, changing "M" (the element value) of circuit CK, when the circuit characteristics CH of circuit CK (current consumption CH1 through input-referred noise CH13) are in "state S2".
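
A minimal sketch of such a table, assuming six states and six actions as suggested by the labels Q11 to Q66. The zero starting values and the helper name `q_value` are illustrative choices, not part of the described device.

```python
N_STATES = 6   # states S1..S6 (rows of the Q table)
N_ACTIONS = 6  # actions A1..A6 (columns of the Q table)

# Q table QT as a matrix; zeros are used here only as placeholders.
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def q_value(table, state: int, action: int) -> float:
    """Return value Q<state><action> using the 1-based labels of the text."""
    return table[state - 1][action - 1]


# Value Q23: the value of taking action A3 while in state S2.
q_table[2 - 1][3 - 1] = 2.0
```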

<Configuration of element value inference device (details)>
Returning to FIG. 1, the element value inference device DV of Embodiment 1 includes, as shown in FIG. 1, a control unit 1, an initial setting/initialization unit 2, an element value changing unit 3, a simulation unit 4, a Q value calculation unit 5, a Q table updating unit 6, and a storage unit 7.

The control unit 1 monitors and controls the overall operation of the element value inference device DV.

The initial setting/initialization unit 2 is used to initialize Q-Learning, initialize the Q table QT, set the upper limit (final number) of steps ST, set the upper limit (final number) of episodes EP, set the target value MK, set α and γ of the update formula UD, and so on.

"Step ST" corresponds to the number of times "action A" (shown in FIG. 6) is changed; for example, "action A1" is performed at step ST = 1 and "action A2" is performed at step ST = 2.

"Episode EP" refers to one set consisting of a plurality of steps ST. For example, when the number of episodes EP is 3, the first episode performs a sequence of steps ST such as "action A1" → "action A2" → "action A3" → ... → "action A6", the second episode performs "action A1" → "action A3" → "action A5" → ... → "action A6", and the third episode performs "action A1" → "action A4" → "action A2" → ... → "action A6".

The target value MK is a constant used to calculate the reward r expressed by Formula 2, described later (the reward r is in turn used to calculate the "value Q" shown in FIG. 6). The user of the element value inference device DV can set the target value MK arbitrarily. For example, it can be set to the initial score SC (Formula 1, described later) of the circuit characteristics CH of circuit CK (current consumption CH1 through input-referred noise CH13), that is, the score before Q-Learning starts.

α and γ denote the learning rate and the discount rate, respectively, in the update formula UD expressed by Formula 3, described later. The learning rate α and the discount rate γ are parameters known in Q-Learning. Specifically, the learning rate α represents the degree to which the value Q is updated, and the discount rate γ represents the degree to which future value Q is discounted. As with the target value MK, the user of the element value inference device DV can set the learning rate α and the discount rate γ arbitrarily.

The element value changing unit 3 changes the magnitude of "M", the element value of the MOS field-effect transistors M1 to M8.

Each time the element value "M" is changed, the simulation unit 4 calculates, that is, simulates, the values of current consumption CH1 through input-referred noise CH13 of circuit CK using the changed "M". The simulation unit 4 performs this simulation using, for example, the conventionally known SPICE (Simulation Program with Integrated Circuit Emphasis). SPICE calculates the values of CH1 through CH13 from "M" on the basis of a netlist (a plurality of theoretical formulas EQ for calculating CH1 through CH13) that is generated from circuit CK either textually (as characters) or graphically (by image processing).
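
As an illustration of how a changed "M" could reach the simulator, SPICE netlists commonly describe a MOSFET on one element line that accepts W, L, and a multiplier M. The sketch below only renders such a line as text. The node names, the model name `nch`, and the dimensions are hypothetical, and treating M as a positive integer is an assumption of this sketch.

```python
def mosfet_card(name: str, drain: str, gate: str, source: str, bulk: str,
                model: str, w: str, l: str, m: int) -> str:
    """Render one SPICE MOSFET element line with multiplier M."""
    if m < 1:
        raise ValueError("multiplier M must be a positive integer")
    return f"{name} {drain} {gate} {source} {bulk} {model} W={w} L={l} M={m}"


# W and L stay fixed; only M changes between simulation runs.
line = mosfet_card("M1", "n1", "vinp", "n2", "vss", "nch", "2u", "1u", 3)
print(line)  # M1 n1 vinp n2 vss nch W=2u L=1u M=3
```

Regenerating the element lines for M1 to M8 and rerunning the simulator after each change is one plausible way to obtain the characteristics CH1 through CH13; the source does not specify how the netlist is edited.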

The Q value calculation unit 5 calculates the "value Q", for example "value Q11" to "value Q66" (shown in FIG. 6). Specifically, it (1) calculates the score SC using Formula 1, (2) calculates the reward r using Formula 2, and (3) calculates the "value Q" using Formula 3. Formula 1 is, for example, the evaluation criterion adopted in operational amplifier design contests. Formula 2 is intended to give a reward r when the score SC obtained by performing "action A" (the evaluation value described later; see FIG. 10) exceeds the target value MK (for example, the initial, pre-learning score SC of the circuit characteristics CH of circuit CK). The update formula UD of Formula 3 is the technique known in Q-Learning for updating Q values.

Score SC = (slew rate CH6 × common-mode input range CH11 × DC gain CH3) / power consumption CH2 ... (Formula 1)
Reward r = score SC / target value MK ... (Formula 2)

... (Formula 3)
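
Formulas 1 and 2 can be written out as a short sketch. The function and parameter names are illustrative, the source does not state the units of the characteristics, and the input checks are an added assumption.

```python
def score(slew_rate: float, cm_input_range: float, dc_gain: float,
          power: float) -> float:
    """Formula 1: SC = (CH6 * CH11 * CH3) / CH2."""
    if power <= 0:
        raise ValueError("power consumption must be positive")
    return (slew_rate * cm_input_range * dc_gain) / power


def reward(score_sc: float, target_mk: float) -> float:
    """Formula 2: r = SC / MK."""
    if target_mk == 0:
        raise ValueError("target value MK must be nonzero")
    return score_sc / target_mk
```

With the target value MK set to the pre-learning score, a reward above 1 means the new element value scored better than the starting point.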

Regarding Formula 3: in Q-Learning as ordinarily formulated, t denotes time, s_t denotes the state at time t (synonymous with state S shown in FIG. 6), a_t denotes the action at time t (synonymous with action A shown in FIG. 6), and A in Formula 3 denotes the set of actions a in Formula 3. In Embodiment 1, t denotes the order in which "action A" (shown in FIG. 6) is performed. For example, "action A3" is performed when t = t1 (in other words, step ST = 1), and "action A5" is performed when t = t2 (in other words, step ST = 2).
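
The body of Formula 3 does not appear in this text. For reference, the standard Q-Learning update, which uses the same symbols described above (α, γ, s_t, a_t, and the action set A), is usually written as follows. Whether Formula 3 matches it term for term, including the time index on the reward, is an assumption.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a \in A} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```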

The Q table updating unit 6 updates the Q table QT (shown in FIG. 6). Specifically, it updates the Q table QT by replacing an old "value Q" with the new "value Q" calculated by the Q value calculation unit 5. For example, suppose that "value Q23" of "action A3" in "state S2" is already "2" in the Q table QT. When the Q value calculation unit 5 newly calculates "3" as "value Q23", the Q table updating unit 6 updates the Q table QT by rewriting "value Q23" from the old "2" to the new "3".
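
The rewrite of Q23 from "2" to "3" can be illustrated with the standard Q-Learning update. This is a sketch under stated assumptions: the source gives no α, γ, or reward r for this example, so the values below are chosen only so that the arithmetic lands on 3, with the value 5 for state S3 borrowed from the FIG. 9 description.

```python
def q_update(q_table, state, action, reward_r, next_state, alpha, gamma):
    """Apply one standard Q-Learning update in place; return the new value.

    q_table is a list of rows (states) of lists (actions), 0-based.
    """
    best_next = max(q_table[next_state])
    old = q_table[state][action]
    new = old + alpha * (reward_r + gamma * best_next - old)
    q_table[state][action] = new
    return new


q_table = [[0.0] * 6 for _ in range(6)]
q_table[1][2] = 2.0   # old value Q23 (state S2, action A3)
q_table[2][0] = 5.0   # best value available in state S3

# Illustrative parameters: 2 + 0.5 * (1.5 + 0.5 * 5 - 2) = 3
new_q23 = q_update(q_table, state=1, action=2, reward_r=1.5,
                   next_state=2, alpha=0.5, gamma=0.5)
```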

The storage unit 7 stores the information that the control unit 1 through the Q table updating unit 6 need for their processing. As shown in FIG. 1, the storage unit 7 stores the circuit CK, the theoretical formulas EQ, the circuit characteristics CH, the Q table QT, the update formula UD, and the like.

<Configuration of element value inference device (hardware)>
FIG. 2 shows the configuration (hardware) of the element value inference device of Embodiment 1.

From a hardware standpoint, as shown in FIG. 2, the element value inference device DV includes an input unit 11, a CPU (Central Processing Unit) 12, an output unit 13, a storage medium 14, and a memory 15. The input unit 11 consists of, for example, a keyboard and a mouse. The CPU 12 is the well-known core of a computer, which, for example, operates the hardware according to software. The output unit 13 consists of, for example, a liquid crystal monitor and a printer. Like the storage unit 7 shown in FIG. 1, the storage medium 14 stores the circuit CK, the theoretical formulas EQ, the circuit characteristics CH, the Q table QT, the update formula UD, and the like, and additionally stores a program PR. For this purpose, the storage medium 14 consists of, for example, a hard disk drive (HDD), a solid state drive (SSD), and a ROM (Read Only Memory). The memory 15 consists of, for example, a DRAM (Dynamic Random Access Memory) and an SRAM (Static Random Access Memory).

As for the relationship between the functions (FIG. 1) and the hardware (FIG. 2) of the element value inference device DV, the functions of the control unit 1 through the storage unit 7 are realized when the CPU 12 executes the program PR stored in the storage medium 14 while using the memory 15 and, as necessary, controls the operation of the input unit 11 and the output unit 13.

<Operation of element value inference device>
FIGS. 7 and 8 are flowcharts showing the operation of the element value inference device of Embodiment 1. The operation is described below with reference to FIGS. 7 and 8. The "step S" labels that indicate the order of the processes in FIGS. 7 and 8 are unrelated to "step ST", the number of times "action A" is changed.

Step S10: The user of the element value inference device DV designs circuit CK or, more precisely, inputs information on circuit CK into the element value inference device DV. Instead of designing a new circuit CK, the user may adopt a conventionally known circuit (for example, an operational amplifier circuit, which is one of the typical analog circuits). The user inputs the information on circuit CK (for example, the types of circuit elements and the connections between them) into the element value inference device DV in text or image form, using the mouse and keyboard of the input unit 11.

Step S11: The user of the element value inference device DV uses the initial setting/initialization unit 2 to make the initial settings for Q-Learning. Specifically, the user sets how many episodes EP to run (the final episode number), how many steps ST to run (the final step number), the target value MK, and the values of the learning rate α and the discount rate γ. To simplify the following description, assume that the user sets "1000" as the final episode EP and "90" as the final step ST.

Step S12: The initial setting/initialization unit 2 initializes the Q table QT. Specifically, it sets arbitrary values as the "value Q" entries of the Q table QT, in other words sets them randomly. Some value is set, rather than leaving the entries blank, because if every "value Q" were blank it would be impossible to judge which of "action A1" to "action A6" should be selected preferentially. To keep the "value Q" from depending on this randomness, it is desirable to use, for example, the widely known ε-greedy method.
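
The random initialization and the ε-greedy method mentioned in step S12 can be sketched as follows. ε-greedy is commonly described as choosing a random action with probability ε and the currently best-valued action otherwise. The function names, the value range of the random entries, and the ε values are assumptions of this sketch.

```python
import random


def init_q_table(n_states: int, n_actions: int, rng: random.Random):
    """Fill the Q table with arbitrary (random) starting values in [0, 1)."""
    return [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]


def epsilon_greedy(q_row, epsilon: float, rng: random.Random) -> int:
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


rng = random.Random(0)              # fixed seed so runs are repeatable
q_table = init_q_table(6, 6, rng)
action = epsilon_greedy(q_table[1], epsilon=0.1, rng=rng)  # state S2
```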

Step S13: The initial setting/initialization unit 2 sets the episode EP to "0".

Step S14: The initial setting/initialization unit 2 sets the step ST to "0".

Step S15: The initial setting/initialization unit 2 initializes the element value "M". More specifically, it sets "M" to an arbitrary value, for example "2", which is a fixed value common to all episodes EP. "M" is initialized so that, for example, when the first episode EP ends, "M" is reset (initialized) to the fixed value before the following second episode EP begins.

Step S16: The element value changing unit 3 changes the element value "M". For example, it changes "M" from "2" to "3".

Step S17: The simulation unit 4 evaluates the theoretical formulas EQ described above by substituting "M", now "3", into them.

Step S18: As a result of evaluating the theoretical formulas EQ, the simulation unit 4 obtains the values of the circuit characteristics CH, that is, the values of current consumption CH1 through input-referred noise CH13.

Step S19: The simulation unit 4 calculates the "score SC" by substituting, from among the obtained values of current consumption CH1 through input-referred noise CH13, the values of slew rate CH6, common-mode input range CH11, DC gain CH3, and power consumption CH2 into Formula 1 above.

Step S20: The simulation unit 4 calculates the "reward r" by substituting the calculated "score SC" into Formula 2 above.

Step S21: The simulation unit 4 calculates the "value Q" by substituting the "reward r" calculated in step S20 and the learning rate α and discount rate γ set in step S11 into the update formula UD (Formula 3) above. The calculated value Q is, for example, the value of action A3 taking into account the reward r that is obtained by reaching state S3 as a result of performing action A3 in state S2. More concretely, when current consumption CH1 through input-referred noise CH13 as a whole are in state S2 and action A3, for example setting "M" to "3", is performed so that CH1 through CH13 as a whole reach state S3, the calculated value Q is the value of action A3 taking into account the reward r obtained by reaching state S3.

Step S22: The Q table updating unit 6 updates the Q table QT using the calculated "value Q" of action A.

Step S23: The control unit 1 determines whether step ST is the last step, "90".

Step S24: If step ST is not "90" in step S23, that is, if it has not yet reached "90", the control unit 1 increments step ST by 1 and returns to step S16.

Step S25: If step ST is "90" in step S23, that is, if it has reached "90", the control unit 1 determines whether episode EP is the last episode, "1000".

Step S26: If episode EP is not "1000" in step S25, that is, if it has not yet reached "1000", the control unit 1 increments episode EP by 1 and returns to step S14.

If episode EP is "1000" in step S25, that is, if it has reached "1000", the control unit 1 ends the operation.
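
The control flow of steps S16 to S26 can be sketched as two nested loops. This is a minimal illustration, not the patented implementation: the function `run_learning` and its three callback parameters are hypothetical names standing in for the element value changing unit 3, the simulation unit 4, and the Q value calculation and Q table updating steps, and the per-episode setup performed from step S14 onward is reduced to a comment because it is described outside this passage.

```python
def run_learning(change_value, simulate, update_q, n_episodes=1000, n_steps=90):
    """Run the episode/step loops and return the number of Q table updates.

    All three callbacks are hypothetical placeholders:
      change_value(episode, step) -> action   (step S16)
      simulate(action) -> reward              (steps S17 to S20)
      update_q(action, reward) -> None        (steps S21 and S22)
    """
    updates = 0
    # Outer loop: steps S25/S26 repeat until episode EP reaches the last episode.
    for episode in range(1, n_episodes + 1):
        # Per-episode setup (step S14 onward) would go here; it is assumed, not shown.
        # Inner loop: steps S23/S24 repeat until step ST reaches the last step.
        for step in range(1, n_steps + 1):
            action = change_value(episode, step)
            reward = simulate(action)
            update_q(action, reward)
            updates += 1
    return updates


if __name__ == "__main__":
    # Dummy callbacks, only to show that the loop bounds give 1000 * 90 updates.
    total = run_learning(lambda ep, st: (ep, st), lambda a: 0.0, lambda a, r: None)
    print(total)
```

With the embodiment's settings of "1000" episodes and "90" steps, the Q table is updated 1000 × 90 times in total.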

〈Updating the Q table〉
FIG. 9 shows the updating of the Q table in Embodiment 1.

The Q table QT is updated in steps S16 to S22 of the flowcharts (FIGS. 7 and 8) as follows.

As shown in 1. of FIG. 9, when the circuit characteristics CH (current consumption CH1 through input-referred noise CH13) of the circuit CK are in "state S2" and "action A3", which means setting "M" to "3", can be performed, the element value changing unit 3 performs "action A3" (corresponding to step S16 in FIG. 7). The simulation unit 4 finds that the circuit characteristics CH of the circuit CK enter "state S3", as shown in 2. of FIG. 9, and calculates the reward r using Equation 2 (corresponding to step S17 in FIG. 7 through step S20 in FIG. 8). For the "value Q23" of "action A3", the Q value calculation unit 5 calculates "3", as shown in 3. of FIG. 9, taking into account the value "5" of "state S3" reached by performing "action A3" (corresponding to step S21 in FIG. 8), and the Q table updating unit 6 updates the Q table QT using the "3" for "Q23" (corresponding to step S22 in FIG. 8).

Similarly, as shown in 4. of FIG. 9, when the circuit characteristics CH of the circuit CK are in "state S3" and "action A4", which means setting "M" to "4", can be performed, the element value changing unit 3 performs "action A4" (corresponding to step S16 in FIG. 7). The simulation unit 4 finds that the circuit characteristics CH of the circuit CK enter "state S4", as shown in 5. of FIG. 9, and calculates the reward r using Equation 2 (corresponding to step S17 in FIG. 7 through step S20 in FIG. 8). For the "value Q34" of "action A4", the Q value calculation unit 5 calculates "2", as shown in 6. of FIG. 9, taking into account the value "6" of "state S4" reached by performing "action A4" (corresponding to step S21 in FIG. 8), and the Q table updating unit 6 updates the Q table QT using the "2" for "Q34" (corresponding to step S22 in FIG. 8).

By repeating the above update of the Q table QT (corresponding to step S23 in FIG. 8), the fully updated Q table QT is completed, as shown in 7. of FIG. 9. That is, the Q table QT shown in 7. of FIG. 9 is completed once step ST reaches "90", in other words, as a result of changing the value of "M" "90" times.
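
A single update of this kind can be sketched in code under the assumption that the update formula UD (Equation 3) has the standard tabular Q-learning form Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') − Q(s, a)]. The function name `q_update`, the dictionary layout of the table, and every number below (the reward, α, γ, and the pre-existing table entry) are illustrative assumptions; they are not taken from FIG. 9 and are not intended to reproduce the values "3" and "2" shown there.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions, alpha, gamma):
    """Apply one tabular Q-learning update and return the new Q(state, action).

    Assumes the standard Q-learning rule; the patent's Equation 3 may differ.
    """
    # Value of the next state: the best action value available from it.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical table: unseen (state, action) pairs default to 0.0.
q_table = defaultdict(float)
q_table[("S3", "A4")] = 5.0  # assumed existing entry, so state S3 has value 5.0

# Being in state S2, taking action A3 (set "M" to 3), and reaching state S3.
new_q23 = q_update(
    q_table, "S2", "A3",
    reward=1.0, next_state="S3", next_actions=["A4"],
    alpha=0.5, gamma=0.9,
)
print(new_q23)  # 0.5 * (1.0 + 0.9 * 5.0 - 0.0) = 2.75
```

Keying the table by (state, action) pairs mirrors the Q23 and Q34 entries in the text, where the first index is the current state and the second is the action taken.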

As for the relationship between the episodes EP and the Q table QT, for example, the Q table QT completed at the end of the first episode EP is used when the subsequent second episode EP starts (arrow C in FIGS. 7 and 8). In other words, the second episode EP starts with the Q table QT completed in the preceding first episode EP.

Similarly, the third episode EP starts with the Q table QT completed in the second episode EP, and the fourth episode EP starts with the Q table QT completed in the third episode EP.

In Embodiment 1, the last episode EP is set to "1000", so a single Q table QT is ultimately completed when the first through 1000th episodes EP have finished.

Here, the Q table QT shown in 7. of FIG. 9 is completed when all of the first through 1000th episodes EP have finished. A Q table QT as shown in 7. of FIG. 9 may also be completed at the end of each individual episode from the first through the 1000th episode EP. Accordingly, the sequence of changes in "M" that reaches the largest "evaluation value", within the single completed Q table QT or among the plural completed Q tables QT, is inferred to be optimal.

〈Improvement of circuit characteristics〉
FIG. 10 shows the circuit characteristics and their values in Embodiment 1.

As shown in FIG. 10, the evaluation value (synonymous with the score SC given by Equation 1 above) of the circuit characteristics CH (current consumption CH1 through input-referred noise CH13) of the circuit CK (shown in FIG. 3) was 1.81×10^19 before the above learning (FIGS. 7 and 8) started and 7.71×10^19 after the learning finished; that is, it improved by a factor of about 4.

〈Relationship between episodes, steps, and scores〉
FIG. 11 shows the relationship between episodes, steps, and scores in Embodiment 1.

Among the "1000" episodes EP described above, as shown in FIG. 11, the scores SC for steps ST "1" to "90" hardly change between the "298"th and "300"th episodes EP. This indicates that the scores SC of steps ST "1" to "90", that is, the circuit characteristics CH, are in effect improved to their maximum by about the "300"th episode EP. In other words, it supports the conclusion that running about "300" episodes EP is enough to obtain reproducibility, in the sense of repeatedly achieving optimal circuit characteristics CH across steps ST "1" to "90".

〈Relationship between learning stage, steps, and scores〉
FIG. 12 shows the relationship between learning stage, steps, and scores in Embodiment 1.

In the early stage of the above learning (FIGS. 7 and 8), as shown in FIG. 12(A), the scores SC of steps ST "20" to "90", among steps ST "1" to "90", are below "1.81×10^19", the evaluation value (score SC) before learning.

In the middle stage of learning, as shown in FIG. 12(B), the scores SC of steps ST "1" to "80", among steps ST "1" to "90", exceed the above "1.81×10^19".

In the late stage of learning, as shown in FIG. 12(C), the scores SC of all steps ST "1" to "90" exceed the above "1.81×10^19".

These three facts support the conclusion that, as learning progresses, the number of steps ST at which the circuit characteristics CH of the circuit CK improve on the above "1.81×10^19" (the evaluation value (score SC) before learning) gradually increases.

〈Effects of the element value inference device〉
As described above, the element value inference device DV of Embodiment 1 uses Q-Learning to search for, that is, to infer, the values of "M" of the MOS field-effect transistors M1 to M8 constituting the circuit CK, so as to optimize the current consumption CH1 through the input-referred noise CH13 of the circuit CK as a whole. As a result, unlike the conventional approaches, (1) there is no need to rely on the knowledge, experience, judgment, and the like of the designer of the circuit CK (FIGS. 7 and 8); (2) reproducibility can be ensured, in the sense of repeatedly obtaining the same values of "M" that are capable of optimizing the current consumption CH1 through the input-referred noise CH13 as a whole (FIG. 11); and (3) because Q-Learning itself proceeds without limiting the scope of learning, the values of "M" can be determined appropriately regardless of whether they fall inside or outside an already-learned range.

〈Modification〉
〈Circuit configuration〉
FIG. 13 shows the circuit configuration of the modification.

The circuit CK of the modification has a larger number of MOS field-effect transistors, M1 to M24, than the circuit CK of Embodiment 1 (shown in FIG. 3).

〈Improvement of circuit characteristics〉
FIG. 14 shows the circuit characteristics of the modification and their improvement.

When the element value inference device DV (shown in FIGS. 1 and 2) performs the above learning (shown in FIGS. 7 and 8) on the circuit CK of the modification (shown in FIG. 13), the evaluation value (score SC) of the current consumption CH1 through the input-referred noise CH13 of that circuit changes, as shown in FIG. 14, from 7.17×10^18 before the learning starts to 1.26×10^19 after the learning finishes; that is, it improves by a factor of about 2.
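
As a quick arithmetic check on the reported improvements, the ratio of the evaluation value after learning to the value before learning can be computed directly from the figures quoted for FIG. 10 (Embodiment 1) and FIG. 14 (modification). The helper name `improvement_ratio` and the variable names are hypothetical choices for this illustration.

```python
def improvement_ratio(before: float, after: float) -> float:
    """Return how many times larger the evaluation value is after learning."""
    return after / before


# Evaluation values (score SC) quoted in the text for FIG. 10 and FIG. 14.
ratio_embodiment = improvement_ratio(1.81e19, 7.71e19)
ratio_modification = improvement_ratio(7.17e18, 1.26e19)

print(f"Embodiment 1: {ratio_embodiment:.2f}x")    # 4.26x
print(f"Modification: {ratio_modification:.2f}x")  # 1.76x
```

The ratios come to roughly 4.26 and 1.76, which the text rounds to "about 4" and "about 2" respectively.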

Because the circuit characteristics CH improved in both Embodiment 1 and the modification, the circuit characteristics CH of other circuits (not shown) are also expected to improve through the above learning.

DV: element value inference device; 1: control unit; 2: initial setting/initialization unit; 3: element value changing unit; 4: simulation unit; 5: Q value calculation unit; 6: Q table updating unit; 7: storage unit

Claims

1. An element value inference method in which a computer performs operations including:
an element value changing step of changing one element value to another element value among a plurality of element values that one or more circuit elements included in a circuit can take;
a simulation step of calculating a plurality of characteristics of the circuit using the other element value after the change;
a value calculation step of calculating a value of the one element value before the change using the calculated plurality of characteristics; and
an action-value function updating step of updating, using the calculated value of the one element value before the change, an action-value function indicating a correspondence between the plurality of element values and the values of the plurality of element values, in order to infer, among the plurality of element values, an element value that has a possibility of improving the plurality of characteristics.

2. The element value inference method according to claim 1, wherein, in the value calculation step, the value of the one element value before the change is calculated by adding a reward when the calculated evaluation value for the plurality of characteristics exceeds an initial evaluation value for the plurality of characteristics of the circuit.

3. An element value inference device including:
an element value changing unit that changes one element value to another element value among a plurality of element values that one or more circuit elements included in a circuit can take;
a simulation unit that calculates a plurality of characteristics of the circuit using the other element value after the change;
a value calculation unit that calculates a value of the one element value before the change using the calculated plurality of characteristics; and
an action-value function updating unit that updates, using the calculated value of the one element value before the change, an action-value function indicating a correspondence between the plurality of element values and the values of the plurality of element values, in order to infer, among the plurality of element values, an element value that has a possibility of improving the plurality of characteristics.

4. An element value inference program that causes a computer to execute:
an element value changing step of changing one element value to another element value among a plurality of element values that one or more circuit elements included in a circuit can take;
a simulation step of calculating a plurality of characteristics of the circuit using the other element value after the change;
a value calculation step of calculating a value of the one element value before the change using the calculated plurality of characteristics; and
an action-value function updating step of updating, using the calculated value of the one element value before the change, an action-value function indicating a correspondence between the plurality of element values and the values of the plurality of element values, in order to infer, among the plurality of element values, an element value that has a possibility of improving the plurality of characteristics.