JP2018124790A - Decision making device - Google Patents

Decision making device

Info

Publication number
JP2018124790A
Authority
JP
Japan
Prior art keywords
decision
electrolyte
electrodes
voltage
electrode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2017016294A
Other languages
Japanese (ja)
Other versions
JP6872226B2 (en)
Inventor
Takashi Tsuchiya (敬志 土屋)
Kazuya Terabe (一弥 寺部)
Toru Tsuruoka (徹 鶴岡)
Narikazu Kin (成主 金)
Masakazu Aono (正和 青野)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Institute for Materials Science
Original Assignee
National Institute for Materials Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Institute for Materials Science
Priority to JP2017016294A
Publication of JP2018124790A
Application granted
Publication of JP6872226B2
Legal status: Active
Anticipated expiration

Abstract

PROBLEM TO BE SOLVED: To provide a decision making device that makes decisions accurately on the basis of the tug-of-war principle, using a simple device that can be miniaturized. SOLUTION: A decision making device includes: learning means for performing learning by accumulating electric charge; charge supply means for supplying the learning means with electric charge according to the action of an event; and voltage reading means for reading a voltage of the learning means. The device makes a decision according to the voltage read by the voltage reading means. The learning means is composed of an electrolyte element in which an electrolyte material layer capable of transporting ions under an electric field is sandwiched between two or more electrodes. SELECTED DRAWING: Figure 2

Description

The present invention relates to a decision making device that selects an action with a high reward probability when event information is supplied as an electrical signal.

In recent years, highly efficient decision making has become increasingly important. In finance, for example, risky assets must be managed safely on the basis of market information that changes from moment to moment. In cognitive radio, the optimal radio system and frequency band must be selected according to the location of the terminal and the time of day. Games such as Go and shogi are typical examples in which decision making in a changing environment is the central problem, and matches between humans and computers have attracted attention in recent years.

Such problems are treated as multi-armed bandit problems and are usually solved by computation using conventional algorithms such as the softmax method or the ε-greedy method. However, these methods are not universal, and faster and more accurate solutions are required.
In recent years, the "tug-of-war principle" has been proposed as an efficient solution to the multi-armed bandit problem (Non-Patent Documents 1 to 3 and Patent Document 1). For example, when choosing between two actions with different reward probabilities, an object that is displaced (as in a tug of war) according to the rewards obtained through trial and error on each action is used to select the action with the higher reward probability. This is called decision making.
With reference to FIG. 1, consider choosing between two actions: action A with a reward probability of 80% and action B with a reward probability of 20%. The reward probabilities of actions A and B are unknown to the player, so the player estimates them from the experience of selecting each action and either obtaining or not obtaining a reward, and selects (decides on) the action with the higher reward probability. Under the tug-of-war principle, the player selects action A or B and displaces the object step by step according to the reward obtained, thereby selecting the action with the higher reward probability. For example, when action A is selected during trial and error, the object is displaced by +1 if a reward is obtained and by −ω if no reward is obtained. Conversely, when action B is selected, the object is displaced by −1 if a reward is obtained and by +ω if no reward is obtained. When the displacement of the object becomes biased to one side, a selection (decision) is regarded as having been made. Here, ω is defined as ω = γ/(2 − γ). In the case of FIG. 1, γ is the sum of the reward probability of action A (80%) and the reward probability of action B (20%) divided by 100, that is, γ = 1.0, which gives ω = 1 (Non-Patent Document 1).
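The update rule described above can be sketched as a short simulation. This is an illustrative sketch only, not the method of the cited documents: the function names, the rule of following the sign of the displacement (with a coin flip at exactly zero), and the use of the true reward probabilities to compute ω are all assumptions made for the example.

```python
import random


def omega(p_a, p_b):
    """Penalty weight from the text: omega = gamma / (2 - gamma), gamma = P_A + P_B.

    Probabilities are fractions (0.8, not 80); gamma = 2 is not handled here.
    """
    gamma = p_a + p_b
    return gamma / (2.0 - gamma)


def tug_of_war(p_a, p_b, trials, seed=0):
    """Return (final displacement, number of times action A was chosen)."""
    rng = random.Random(seed)
    w = omega(p_a, p_b)
    x = 0.0      # displacement of the object; positive favours action A
    picks_a = 0
    for _ in range(trials):
        # Assumed selection rule: follow the sign of x, coin flip at exactly 0.
        choose_a = x > 0 or (x == 0 and rng.random() < 0.5)
        if choose_a:
            picks_a += 1
            rewarded = rng.random() < p_a
            x += 1.0 if rewarded else -w    # +1 on reward, -omega otherwise
        else:
            rewarded = rng.random() < p_b
            x += -1.0 if rewarded else w    # -1 on reward, +omega otherwise
    return x, picks_a


x, picks_a = tug_of_war(p_a=0.8, p_b=0.2, trials=1000)
print(f"omega = {omega(0.8, 0.2):.2f}, displacement = {x:.0f}, A chosen {picks_a} times")
```

With the FIG. 1 values, every outcome pushes the displacement toward action A with probability 0.8, whichever action was tried, so the object drifts to the A side.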

Compared with conventional methods, the tug-of-war principle not only converges faster to the action with the higher reward probability but also adapts better to changes in the environment (the reward probability of each action). Furthermore, whereas other solutions are programs that rely on computation, the tug-of-war principle relies on a physical phenomenon. It can therefore avoid the growth in computational load that is a problem for programs, and the resulting limit on the number of operations that can be processed.

Attempts have been made to implement tug-of-war decision making with various physical phenomena and to use it for reinforcement learning (Non-Patent Documents 4 to 7). For example, when nitrogen defects in nanodiamond are used as a photon source, the tug-of-war principle can be implemented physically by exploiting the particle nature and probabilistic behavior of single photons (Non-Patent Document 6). However, this method requires a large-scale optical circuit and is therefore unsuitable for miniaturizing devices and circuits. There has also been an attempt to use the formation and rupture of metal filaments in a relatively small space for decision making (Non-Patent Document 7). However, that method has the fundamental problem that it cannot, even in principle, reproduce the tug-of-war principle accurately, and it is hardly practical. Thus, the problem of a decision making device that makes decisions accurately on the basis of the tug-of-war principle with a device that can be miniaturized has not been solved.

Patent Document 1: JP 2014-191598 A

Non-Patent Document 1: New J. Phys., vol. 17, p. 083023 (2015)
Non-Patent Document 2: Biosystems, vol. 101, pp. 29-36 (2010)
Non-Patent Document 3: LNCS, vol. 6079, pp. 69-80 (2010)
Non-Patent Document 4: Sci. Rep., vol. 3, p. 2370 (2013)
Non-Patent Document 5: J. Appl. Phys., vol. 116, p. 154303 (2014)
Non-Patent Document 6: AIMS Mater. Sci., vol. 3, pp. 245-259 (2016)
Non-Patent Document 7: DOI: 10.1039/c6nr00690f (2016)

An object of the present invention is to provide a decision making device that can make decisions accurately on the basis of the tug-of-war principle using a simple device that can be miniaturized.

The configurations of the present invention are as follows.
(Configuration 1)
A decision making device comprising learning means that learns by accumulating charge, charge supply means that supplies the learning means with charge according to the action of an event, and voltage reading means that reads the voltage of the learning means, the device making a decision according to the voltage read by the voltage reading means,
wherein the learning means consists of an electrolyte element in which an electrolyte material layer capable of transporting ions under an electric field is sandwiched between two or more electrodes.
(Configuration 2)
The decision making device according to Configuration 1, wherein a current caused by the inflow of the charge flows between the two or more electrodes, transporting the ions and generating a voltage between the electrodes.
(Configuration 3)
The decision making device according to Configuration 1 or 2, wherein movement of the ions toward at least one of the two or more electrodes, or their penetration into that electrode, generates electrons and holes in the two or more electrodes and thereby generates a voltage.
(Configuration 4)
The decision making device according to any one of Configurations 1 to 3, wherein the electrolyte material layer includes a liquid electrolyte or a solid electrolyte.
(Configuration 5)
The decision making device according to Configuration 4, wherein the liquid electrolyte includes at least one member of the group consisting of tetramethylammonium ion (TMA⁺), tetraethylammonium ion (TEA⁺), tetrabutylammonium ion (TBA⁺), tetrafluoroborate ion (BF₄⁻), N,N-diethyl-N-methyl-N-(2-methoxyethyl)ammonium bis(trifluoromethanesulfonyl)imide (DEME-TFSI), and N,N-diethyl-N-methyl-N-(2-methoxyethyl)ammonium tetrafluoroborate (DEME-BF₄).
(Configuration 6)
The decision making device according to Configuration 4, wherein the electrolyte material includes a polymer compound having mobile ions.
(Configuration 7)
The decision making device according to Configuration 6, wherein the polymer compound includes at least one of polyethylene oxide and Nafion.
(Configuration 8)
The decision making device according to Configuration 4, wherein the electrolyte material layer includes at least one of a metal oxide having mobile ions and silicic acid (SiO₂).
(Configuration 9)
The decision making device according to Configuration 8, wherein the metal oxide includes at least one member of the group consisting of cerium oxide (CeO₂), tantalum oxide (Ta₂O₅), zirconium oxide (ZrO₂), niobium oxide (Nb₂O₅), tungsten oxide (WO₃), and lithium oxide (Li₂O).
(Configuration 10)
The decision making device according to Configuration 4, wherein the two or more electrodes include at least one of a metal and a semiconductor having electron conductivity.
(Configuration 11)
The decision making device according to Configuration 10, wherein the metal includes at least one member of the group consisting of gold, platinum, silver, palladium, aluminum, iron, copper, tungsten, titanium, and tantalum.
(Configuration 12)
The decision making device according to Configuration 10, wherein the semiconductor includes at least one member of the group consisting of carbon, silicon, and lithium cobalt oxide.
(Configuration 13)
The decision making device according to Configuration 10, wherein the metal and the semiconductor include an active substance capable of reacting chemically with ions under an electric field.
(Configuration 14)
The decision making device according to Configuration 10, wherein the metal and the semiconductor include an electrolyte capable of ion transport under an electric field, and ions in the electrolyte material layer and in one of the two or more electrodes move and penetrate into the other electrode.
(Configuration 15)
The decision making device according to any one of Configurations 1 to 14, wherein the decision making device has wiring switching means.

According to the present invention, it is possible to provide a decision making device that can make decisions accurately on the basis of the tug-of-war principle using a simple device that can be miniaturized.

FIG. 1: Conceptual diagram explaining decision making by the tug-of-war principle.
FIG. 2: Block diagram showing the configuration of a decision making device.
FIG. 3: Circuit diagram showing the configuration of a power switch unit as an electric circuit.
FIG. 4: Circuit diagram showing the configuration of a power switch unit as an electric circuit.
FIG. 5: Cross-sectional view showing the configuration of an electrolyte element.
FIG. 6: Explanatory diagram showing the operating principle of the electrolyte element.
FIG. 7: Explanatory diagram of the electrical characteristics of the electrolyte element during learning and decision making.
FIG. 8: Cross-sectional view showing the configuration of an electrolyte element.
FIG. 9: Explanatory diagram showing the operating principle of the electrolyte element.
FIG. 10: Block diagram showing the configuration of a decision making device.
FIG. 11: Block diagram showing the configuration of a learning memory device unit.
FIG. 12: Explanatory diagram showing the operating principle of the learning memory device unit.
FIG. 13: Explanatory diagram showing the operating principle of the learning memory device unit.
FIG. 14: Characteristic diagram showing the change in the electromotive force of the electrolyte element when the reward probabilities (P_A, P_B) are (80%, 20%).
FIG. 15: Characteristic diagram showing the transition of the correct-answer probability when the reward probabilities (P_A, P_B) are switched repeatedly between (80%, 20%) and (20%, 80%).
FIG. 16: Characteristic diagram showing the transition of the correct-answer probability when the reward probabilities (P_A, P_B) are switched repeatedly between (70%, 30%) and (30%, 70%).
FIG. 17: Characteristic diagram showing the transition of the correct-answer probability when the reward probabilities (P_A, P_B) are switched repeatedly between (60%, 40%) and (40%, 60%).

Embodiments for carrying out the present invention are described below with reference to the drawings.
(Embodiment 1)
<Configuration of decision making device>
The decision making device of the present invention comprises learning means that learns by accumulating charge, charge supply means that supplies the learning means with charge according to the action of an event, and voltage reading means that reads the voltage of the learning means. Its configuration is shown in FIG. 2.
Here, the learning means that learns by accumulating charge consists of an electrolyte element 11 in which an electrolyte material layer capable of ion transport under an electric field is sandwiched between two or more electrodes.
The charge supply means consists of a power switch that supplies charge from a power source on the basis of an input signal for learning the action of an event, and the voltage reading means consists of a voltmeter 14. The voltmeter preferably has a high resistance (high impedance) so that it affects the current flowing through the circuit as little as possible.
The power switch has the functions of applying and cutting off a voltage, setting its polarity, and adjusting its magnitude by means of the power source and the input signal. FIG. 2 shows the case in which the power switch consists of a first power switch 12, which can apply a first voltage to the electrolyte element 11 and cut it off on the basis of an input signal 15, and a second power switch 13, which can apply a voltage of the opposite polarity to that of the first power source and cut it off on the basis of an input signal 16. This is only one example. The power switch may instead have a switch that, from a single power source and on the basis of input signals, can apply a predetermined voltage of either polarity to the electrolyte element 11 or interrupt the voltage application.

An example of the power switch 12 is one consisting of a MOS transistor switch 21, a DC power source 22, and a variable resistor 23, as shown in FIG. 3. When the input signal 15 for learning is input to the gate 24 of the MOS transistor 21, the MOS transistor 21 turns on and a voltage is applied to the electrolyte element 11. When the input signal 15 is not input, the MOS transistor 21 is off and no voltage is applied to the electrolyte element 11. The magnitude of the voltage applied to the electrolyte element 11 is adjusted to a predetermined value by the variable resistor 23.
An example of the power switch 13 is one consisting of a MOS transistor switch 25, a DC power source 26, and a variable resistor 27, as shown in FIG. 4. The DC power source 26 is set to supply a voltage of the opposite polarity to that of the DC power source 22. When the input signal 16 for learning is input to the gate 28 of the MOS transistor 25, the MOS transistor 25 turns on and a voltage of the opposite direction to that from the power switch 12 is applied to the electrolyte element 11. When the input signal 16 is not input, the MOS transistor 25 is off and no voltage is applied to the electrolyte element 11. As with the power switch 12, the magnitude of the voltage applied to the electrolyte element 11 is adjusted to a predetermined value by the variable resistor 27.
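The behavior of the two power switches can be summarized as a small truth-table function. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, the two voltage magnitudes stand in for the values set by the variable resistors 23 and 27, an open circuit is represented by `None`, and rejecting simultaneous input signals is a choice made for the example (the text does not say what happens in that case).

```python
def applied_voltage(signal_15, signal_16, v_first=1.0, v_second=1.0):
    """Voltage applied to the electrolyte element for a pair of input signals.

    signal_15 drives the first power switch (positive voltage),
    signal_16 drives the second power switch (opposite polarity).
    Returns None when both switches are off (element left open-circuit).
    """
    if signal_15 and signal_16:
        # Assumption for this sketch: the two switches are never closed together.
        raise ValueError("input signals 15 and 16 must not be active together")
    if signal_15:
        return +v_first     # MOS transistor 21 on: first voltage applied
    if signal_16:
        return -v_second    # MOS transistor 25 on: reverse voltage applied
    return None             # both transistors off: no voltage applied


print(applied_voltage(True, False))    # first switch closed
print(applied_voltage(False, True))    # second switch closed
print(applied_voltage(False, False))   # both open
```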

<Structure of electrolyte element>
For ease of understanding of the configuration and function, Embodiment 1 describes the case of an electrolyte element 11 with two electrodes (two-terminal electrolyte element 11).
The structure of the electrolyte element 11 is shown in the cross-sectional view of FIG. 5. The electrolyte element 11 has a laminated structure in which an electrolyte material layer 3, in which anions 1 and cations 2 can move, is sandwiched between a first electrode 4 and a second electrode 5. The effect of applying a current can be measured as the voltage (electromotive force) between the first electrode 4 and the second electrode 5.

Note that FIG. 5 and the subsequent conceptual diagrams illustrate the present invention conceptually. The actual structure need not be exactly similar in shape to the structures shown in these drawings, and elements not shown in the drawings may be added or replaced with equivalent elements.

As the material of the electrolyte material layer 3, for example, tetramethylammonium tetrafluoroborate (TMA-BF₄), a liquid electrolyte, can be used. A liquid electrolyte including at least one member of the group consisting of tetramethylammonium ion (TMA⁺), tetraethylammonium ion (TEA⁺), tetrabutylammonium ion (TBA⁺), tetrafluoroborate ion (BF₄⁻), N,N-diethyl-N-methyl-N-(2-methoxyethyl)ammonium bis(trifluoromethanesulfonyl)imide (DEME-TFSI), and N,N-diethyl-N-methyl-N-(2-methoxyethyl)ammonium tetrafluoroborate (DEME-BF₄) can also be used. Various additives other than the electrolyte can be added to the electrolyte material. Solid electrolytes, polymer compounds containing mobile ions, metal oxides having mobile ions, and silicic acid (SiO₂) can also be used as the electrolyte material.
Examples of polymer compounds containing mobile ions include polyethylene oxide and Nafion. Examples of metal oxides having mobile ions include cerium oxide (CeO₂), tantalum oxide (Ta₂O₅), zirconium oxide (ZrO₂), niobium oxide (Nb₂O₅), tungsten oxide (WO₃), and lithium oxide (Li₂O).

As the material of the first electrode 4 and the second electrode 5, for example, graphite, which is relatively inert to chemical reaction with the electrolyte, can be used. Besides graphite, metals having electron conductivity can be used, for example gold, platinum, silver, palladium, aluminum, iron, copper, tungsten, titanium, and tantalum. Semiconductors having electron conductivity, for example carbon, silicon, and lithium cobalt oxide, can also be used for the first electrode 4 and the second electrode 5. These metals and semiconductors include active substances capable of reacting chemically with ions under an electric field.

<Operation of decision making device>
The operation of the decision making device of the present invention, which is capable of dynamic reinforcement learning, is described with reference to FIGS. 6 and 7. FIG. 6 shows that the voltage (electromotive force) between the first electrode 4 and the second electrode 5 can be changed by passing a current through the two-terminal electrolyte element 11 of FIG. 5 from the second electrode side.

When the electrolyte element 11 shown in FIG. 6 has just been fabricated (origin state), the anions 1 and cations 2 are distributed uniformly in the electrolyte material layer 3, as shown in FIG. 5. When a current is then passed from the first electrode 4 side of the electrolyte element 11, the negatively charged anions 1 in the electrolyte material layer 3 move to the vicinity of the interface between the first electrode 4 and the electrolyte material layer 3 (hereinafter the first-electrode-side interface; the interface between the second electrode 5 and the electrolyte material layer 3 is likewise called the second-electrode-side interface). In some cases part of them penetrate into the first electrode, and they become concentrated there. As the anions 1 concentrate, positive charges h⁺ (conduction carriers of positive polarity) accumulate in the first electrode 4. Meanwhile, at the second electrode 5, which faces the first electrode, the anions 1 are depleted and the cations 2, which are ions of positive polarity, are left behind. Negative charges e⁻ (conduction carriers of negative polarity) therefore accumulate in the second electrode 5.
This state resembles that of a charged parallel-plate capacitor, so a voltage with the first electrode 4 positive arises as an electromotive force between the first electrode 4 and the second electrode 5. It is denoted V, and the polarity of the applied voltage is taken to be that of the voltage on the first electrode 4 side. This electromotive force V varies with the current that flows and with the ionic conductivity and ion transport number in the electrolyte material layer 3. The current is preferably passed for several milliseconds to several seconds.

Because the electromotive force V generated in this device is due to the charge accumulated by the current, it is not lost immediately even when the current is stopped and the circuit is opened. The electromotive force can be increased or decreased by passing further current.

Next, the procedure for reinforcement learning and decision making is described with reference to FIG. 7, taking as an example the selection between two actions A and B with reward probabilities P_A and P_B (%). When the decision making device 100 shows a positive electromotive force (voltage), it selects action A and obtains a reward with probability P_A; conversely, no reward is obtained with probability 100 − P_A (%). Similarly, when it shows a negative electromotive force (voltage), it selects action B and obtains a reward with probability P_B; in this case no reward is obtained with probability 100 − P_B (%).
If the electromotive force is judged to be positive at time t_1 in FIG. 7, action A is selected and a reward is obtained with probability P_A. On the device, a positive current of a predetermined value corresponding to this reward is passed for a fixed time. The positive current increases the electromotive force V in the positive direction (corresponding to V_1). When the current is stopped and the circuit is opened for a fixed time, the electromotive force V decays (corresponding to V_2). After the electromotive force V is judged at time t_2 with the circuit open, a current is passed again from time t_2 (in this case in the opposite direction to the above), and the electromotive force V is judged at time t_3. The circuit is then opened at time t_3 (corresponding to V_4), and the same process is repeated. The process from time t_1 to time t_2 in FIG. 7 (T_1 in FIG. 7) counts as one trial, and this trial is repeated. As the number of trials increases, the electromotive force becomes biased toward positive or negative. From this it is judged that the device has selected action A or action B. For example, if P_A > P_B and the electromotive force becomes biased toward positive, the decision making device 100 is interpreted as having correctly selected the action with the higher reward probability.
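The trial cycle described above (read the sign of the electromotive force, apply a current pulse, then let the electromotive force decay with the circuit open) can be sketched numerically. This is a toy model, not the patented device: the multiplicative `retention` factor standing in for the open-circuit decay, the unit pulse sizes borrowed from the tug-of-war rule (+1, −ω, −1, +ω), and all names are assumptions made for illustration.

```python
import random


def run_trials(p_a, p_b, trials, retention=0.99, seed=0):
    """Simulate repeated trials; return (final EMF, times action A was chosen).

    p_a and p_b are reward probabilities given as fractions (0.8, not 80).
    """
    rng = random.Random(seed)
    gamma = p_a + p_b
    w = gamma / (2.0 - gamma)   # omega from the tug-of-war rule
    v = 0.0                     # electromotive force in arbitrary units
    picks_a = 0
    for _ in range(trials):
        # Read the EMF: positive selects A, negative selects B (coin flip at 0).
        choose_a = v > 0 or (v == 0 and rng.random() < 0.5)
        if choose_a:
            picks_a += 1
            pulse = 1.0 if rng.random() < p_a else -w
        else:
            pulse = -1.0 if rng.random() < p_b else w
        v += pulse          # current pulse shifts the EMF
        v *= retention      # assumed decay while the circuit is open
    return v, picks_a


v, picks_a = run_trials(p_a=0.8, p_b=0.2, trials=2000)
print(f"final EMF = {v:.1f} (arbitrary units), A chosen {picks_a} of 2000 times")
```

In this model the sign of the electromotive force settles on the side of the action with the higher reward probability, which is the bias the text interprets as a decision.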

In the decision-making device 100 of the present invention, charge is accumulated in the electrolyte element 11 according to the action taken for an event, and after repeated trials the decision is made from the electromotive force produced by the charge that has finally accumulated. A key point of the present invention is the use of an electrolyte, which operates electrochemically, as this charge storage element.

For example, consider replacing the electrolyte element 11 with a capacitor, which stores electrons, as the charge storage element. For a capacitor, if the charge accumulated by applying current is Q, the charge that must be lost in order to respond to a change in the reward probabilities is also -Q, and this relationship holds strictly. To put the decision-making process in terms of pachinko, a player who has won 100,000 yen on one machine would be unable to give up on that machine until losing 100,000 yen or more on it, which can hardly be called smart decision making.

By contrast, in the present invention, which uses an electrolyte element as the charge storage element, charge is lost little by little as the electrochemical reaction proceeds, so the charge Q that must be lost in order to respond to a change in the reward probabilities is considerably smaller. In the pachinko example above, the player can decide to give up on the machine that earned 100,000 yen once it has lost, say, 30,000 yen and choose another machine, which is a smarter decision.
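The contrast between the capacitor and the electrolyte element can be illustrated with a small numerical sketch. It assumes, purely for illustration, that the gradual loss of charge can be modeled by multiplying the stored charge by a hypothetical `retention` factor before each pulse; `retention = 1.0` stands for the ideal capacitor, and the names `n_gain` and `step` are also introduced here.

```python
def trials_to_reverse(retention, n_gain=100, step=1.0):
    """Count the losing pulses needed to cancel charge built up by winning pulses.

    retention = 1.0 models an ideal capacitor (no charge is lost on its own);
    retention < 1.0 models gradual charge loss (illustrative assumption).
    """
    q = 0.0
    for _ in range(n_gain):          # phase 1: n_gain rewarded pulses
        q = retention * q + step
    n = 0
    while q > 0:                     # phase 2: pulses of opposite sign
        q = retention * q - step
        n += 1
    return n


if __name__ == "__main__":
    print("capacitor-like:", trials_to_reverse(1.0))
    print("leaky element: ", trials_to_reverse(0.9))
```

In this model the capacitor-like case needs as many losing pulses as it received winning ones, while the leaky case reverses after far fewer, which is the behavior the pachinko analogy describes.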

(Embodiment 2)
The same reinforcement learning and decision making can also be carried out for two or more actions by adding corresponding electrodes as needed. Specifically, the electromotive-force criterion described above is changed to selecting the action that shows the highest or the lowest electromotive force. In principle, therefore, there is no limit on the number of actions that can be handled.

This is described in detail below with reference to the drawings.
FIG. 8 shows an example of an electrolyte element 51 in which the number of electrodes is increased to three: a first electrode 4, a second electrode 5, and a third electrode 6. Consider the case where the first electrode 4, the second electrode 5, and the third electrode 6 correspond to actions A, B, and C, respectively. As in the case of the two-terminal electrolyte element 1 described in Embodiment 1, a current corresponding to the reward probability P_A is passed between the first electrode 4, the second electrode 5, and the third electrode 6, as shown in FIG. 9. By repeating such trials, the action with the highest reward probability can be selected.
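The changed criterion, selecting the action whose electrode shows the highest potential, can be sketched as follows. The update rule (raise the chosen electrode's potential on a reward, lower it otherwise, with a hypothetical `retention` factor for decay) is an assumption made for illustration; the patent does not specify how the current is routed between the three electrodes beyond FIG. 9.

```python
import random


def select_action(potentials):
    """Index of the electrode showing the highest potential (first one on ties)."""
    return max(range(len(potentials)), key=lambda i: potentials[i])


def run_multi(probs, n_trials, retention=0.9, seed=0):
    """Repeat trials over len(probs) actions and return the chosen indices."""
    rng = random.Random(seed)
    potentials = [0.0] * len(probs)
    choices = []
    for _ in range(n_trials):
        i = select_action(potentials)
        rewarded = rng.random() < probs[i]
        # All potentials decay, then the chosen one is pushed up or down.
        potentials = [retention * p for p in potentials]
        potentials[i] += 1.0 if rewarded else -1.0
        choices.append(i)
    return choices


if __name__ == "__main__":
    picks = run_multi([0.2, 0.7, 0.4], 300)
    print("most frequent choice:", max(set(picks), key=picks.count))
```

Because the criterion is an argmax over a list, the same code handles any number of actions, in line with the statement that the number of actions is in principle unlimited.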

FIG. 10 shows an example of a decision-making device 110 using a three-terminal electrolyte element 31 having three electrodes. In the decision-making device 110 of FIG. 10, the first electrode 33 faces the second electrode 34 and the third electrode 35 across the electrolyte material layer 32, but its function is the same as that of the three-terminal electrolyte element 51, in which the electrodes are arranged side by side. Using the power switches 41, 42, 44, 45, 47, and 48 and the voltmeters 43, 46, and 49, the decision-making device 110 handles actions A, B, and C in the same manner as in Embodiment 1 and can select the action with a high reward probability.

(Embodiment 3)
With this technique, a trial using a single electrolyte element can determine only the action with the highest reward probability, whereas increasing the number of elements makes it possible to solve more difficult problems. FIG. 11 shows a learning storage device unit 120 in which electrolyte elements 7 and 8, each fitted with a plurality of electrodes, are connected to a power source (DC power source) 60 via a wiring switch 9. Here, the learning storage device unit 120 is a part of the decision-making device and is a module consisting of the learning means and the charge supply means.

When the electrode showing the highest potential of the electrolyte element 8 is the first electrode 61, a current corresponding to the reward probability P_A is passed through the first electrode 61. At this time, as shown in FIG. 12, the first electrode 64 of the electrolyte element 8 and the first electrode 61 of the electrolyte element 7 are electrically connected, and an electrode of the electrolyte element 7 other than the first electrode 61, for example the third electrode 63, and the corresponding third electrode 66 of the electrolyte element 8 are electrically connected to the power source (DC power source) 60 so that a current flows. In this case, charges of opposite sign are accumulated in the third electrode 63 of the electrolyte element 7 and the corresponding third electrode 66 of the electrolyte element 8.

Next, when the second electrode 62 is selected as the electrode of the electrolyte element 7 other than the first electrode 61, charges of opposite sign are likewise accumulated in the second electrode 62 of the electrolyte element 7 and the second electrode 65 of the electrolyte element 8, as shown in FIG. 13.

By repeating such trials alternately with the first electrolyte element 7 and the second electrolyte element 8, the electrolyte elements 7 and 8 eventually select different actions, which correspond to the two actions with the highest reward probabilities.
Thus, while a trial with a single electrolyte element can determine only the action with the highest reward probability, increasing the number of electrolyte elements and using a wiring switch (wiring switching means) to connect and disconnect the electrodes of the electrolyte elements to and from one another and to and from the power source as appropriate makes it possible to solve the more difficult problem of determining the top two or more actions.
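The effect of coupling two elements can be sketched very loosely in software. The following does not model the circuit of FIGS. 11 to 13; it only abstracts the stated effect that corresponding electrodes of the two elements accumulate charges of opposite sign, so that an action favored by one element is disfavored by the other. The function names, the unit step, and the absence of decay are all assumptions made for illustration.

```python
import random


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def run_two_elements(probs, n_trials, seed=0):
    """Alternate trials between two coupled elements; return their final choices."""
    rng = random.Random(seed)
    elems = [[0.0] * len(probs), [0.0] * len(probs)]
    for t in range(n_trials):
        me, other = t % 2, 1 - t % 2      # the elements take turns
        i = argmax(elems[me])             # action chosen by the active element
        delta = 1.0 if rng.random() < probs[i] else -1.0
        elems[me][i] += delta             # charge on the active element
        elems[other][i] -= delta          # opposite-sign charge on the other
    return argmax(elems[0]), argmax(elems[1])


if __name__ == "__main__":
    print(run_two_elements([1.0, 1.0, 0.0], 20))
```

In this abstraction the opposite-sign coupling keeps the two elements from settling on the same action, which is the property the text relies on to identify the two best actions.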

The present invention is described in more detail below by way of examples. Note, however, that the present invention is of course not limited to the following examples and is defined only by the claims.

(Example 1)
In Example 1, decision making was evaluated using the decision-making device 100 shown in FIG. 2. The electrolyte element 11 had two electrodes, the predetermined voltage described below was applied between the two electrodes through the power switches 15 and 16 according to the reward probabilities P_A and P_B, and the change in electromotive force was monitored with the voltmeter 14. Graphite was used for the electrodes 4 and 5 of the electrolyte element 11, and tetramethylammonium tetrafluoroborate (TMA-BF4), a liquid electrolyte, was used as the electrolyte of the electrolyte material layer 3 (see FIG. 5).

The reward probabilities of actions A and B were set to P_A = 80% and P_B = 20%, respectively; action A was selected when a positive electromotive force was shown, and action B when a negative electromotive force was shown. For each of actions A and B, the applied current was 4 mA when a reward was obtained (with probability P_A or P_B) and 3.9 mA when it was not. The current application time and the open-circuit time were each 1 second. FIG. 14 shows an example of the change in electromotive force generated between the two electrodes in trials under these conditions. Electromotive-force changes similar to those described with reference to FIG. 7 were actually observed, with a magnitude on the order of several hundred mV. This arises from modulation of the electric double layer near the electrode interface by the applied current.

Because the aim is to have reinforcement learning select the best action for the situation from a group of actions whose reward probabilities change over time, the ability to follow changes in P_A and P_B is important. In this measurement, therefore, the values of P_A and P_B were swapped every 100 trials. FIG. 15 plots, against the number of trials, the probability that the device correctly selected the action with the higher reward probability (the correct-answer probability). The correct-answer probability was 40% or less for trials 0 to 10 but reached roughly 90% or more by trial 40. When the values of P_A and P_B were then reversed once the number of trials exceeded 100, the correct-answer probability dropped to 0% immediately afterward. However, it rose again in response to the change in reward probabilities and reached roughly 90% again by trial 150. Further changes in the reward probabilities were applied, and the correct-answer probability likewise recovered quickly.
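A correct-answer probability curve of the kind plotted in FIG. 15 can be estimated from a toy model by averaging many independent runs. The sketch below is not a reproduction of the measurement: the update rule, the hypothetical `retention` factor, and the number of runs `n_runs` are assumptions, so the resulting numbers should not be compared with the figures quantitatively.

```python
import random


def correct_rate_curve(p_hi=0.8, p_lo=0.2, n_trials=300, swap_every=100,
                       n_runs=200, retention=0.9, seed=0):
    """Fraction of runs choosing the better action at each trial index."""
    rng = random.Random(seed)
    correct = [0] * n_trials
    for _ in range(n_runs):
        v = 0.0  # stand-in for the electromotive force
        for t in range(n_trials):
            # The reward probabilities are swapped every swap_every trials.
            a_is_better = (t // swap_every) % 2 == 0
            p_a, p_b = (p_hi, p_lo) if a_is_better else (p_lo, p_hi)
            if v > 0:
                choose_a = True
            elif v < 0:
                choose_a = False
            else:
                choose_a = rng.random() < 0.5  # tie broken at random
            rewarded = rng.random() < (p_a if choose_a else p_b)
            polarity = 1.0 if choose_a else -1.0
            # Decay, then a pulse toward (reward) or away from (no reward)
            # the chosen action's polarity.
            v = retention * v + (polarity if rewarded else -polarity)
            if choose_a == a_is_better:
                correct[t] += 1
    return [c / n_runs for c in correct]


if __name__ == "__main__":
    curve = correct_rate_curve()
    for t in (0, 50, 100, 150):
        print(t, round(curve[t], 2))
```

Qualitatively, the model shows the same pattern as the text: a high correct-answer rate before a swap, a collapse immediately after it, and a recovery within the following trials.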

FIG. 16 shows the change in the correct-answer probability when P_A and P_B were set to 70% and 30% and swapped every 100 trials, as in FIG. 15. The number of trials needed for the correct-answer probability to converge to 90% or more increased to between 50 and 70. This is consistent with the fact that P_A and P_B are closer in value than in the trials of FIG. 15, which makes this a harder problem requiring more trials before a decision is reached, and it is a reasonable result.

(Example 2)
Example 2 shows the case where the same measurement as in Example 1 was performed with only the electrolyte of the device used in Example 1 changed from the liquid electrolyte to Nafion, a solid electrolyte. FIG. 17 shows the result of a measurement in which the reward probabilities P_A and P_B were set to 60% and 40% and swapped every 200 trials; the figure confirms a change in the correct-answer probability similar to that in Example 1. This shows that reinforcement learning, and the decision making that accompanies it, is made possible by ionic conduction regardless of whether the electrolyte is liquid or solid. In this example, the function is provided by protons conducted through the Nafion.

The tug-of-war principle is positioned as a form of reinforcement learning that strongly reflects learning results. The decision-making device of the present invention is a small, simple device that can make decisions efficiently on the basis of this reinforcement learning without requiring complicated calculation. For this reason, the decision-making device of the present invention may find wide use in industry.

1: anion
2: cation
3: electrolyte material layer
4: first electrode
5: second electrode
6: third electrode
7: electrolyte element
8: electrolyte element
9: wiring switch
11: electrolyte element
12: first power switch
13: second power switch
14: voltmeter
15, 16: input signal
21, 25: MOS transistor
22, 26: DC power source
23, 27: variable resistor
24, 28: gate
31: electrolyte element
32: electrolyte material layer
33: first electrode
34: second electrode
35: third electrode
41, 42, 44, 45, 47, 48: power switch
43, 46, 49: voltmeter
51: electrolyte element
60: power source (DC power source)
100, 110: decision-making device
120: learning storage device unit

Claims (15)

1. A decision-making device comprising: learning means that learns by accumulating charge; charge supply means that supplies the learning means with charge according to the action taken for an event; and voltage reading means that reads the voltage of the learning means, the device making a decision based on the voltage read by the voltage reading means, wherein the learning means comprises an electrolyte element in which an electrolyte material layer capable of transporting ions by an electric field is sandwiched between two or more electrodes.
2. The decision-making device according to claim 1, wherein a current due to the inflow of the charge is passed between the two or more electrodes to transport the ions and generate a voltage between the electrodes.
3. The decision-making device according to claim 1 or 2, wherein movement of the ions toward at least one of the two or more electrodes, or their penetration into the electrode, generates electrons and holes in the two or more electrodes and thereby generates a voltage.
4. The decision-making device according to any one of claims 1 to 3, wherein the electrolyte material layer includes a liquid electrolyte or a solid electrolyte.
5. The decision-making device according to claim 4, wherein the liquid electrolyte includes at least one selected from the group consisting of tetramethylammonium ion (TMA+), tetraethylammonium ion (TEA+), tetrabutylammonium ion (TBA+), tetrafluoroborate ion (BF4-), N,N-diethyl-N-methyl-N-(2-methoxyethyl)ammonium bis(trifluoromethanesulfonyl)imide (DEME-TFSI), and N,N-diethyl-N-methyl-N-(2-methoxyethyl)ammonium tetrafluoroborate (DEME-BF4).
6. The decision-making device according to claim 4, wherein the electrolyte material includes a polymer compound having mobile ions.
7. The decision-making device according to claim 6, wherein the polymer compound includes at least one of polyethylene oxide and Nafion.
8. The decision-making device according to claim 4, wherein the electrolyte material layer includes at least one of a metal oxide having mobile ions and silicic acid (SiO2).
9. The decision-making device according to claim 8, wherein the metal oxide includes at least one selected from the group consisting of cerium oxide (CeO2), tantalum oxide (Ta2O5), zirconium oxide (ZrO2), niobium oxide (Nb2O5), tungsten oxide (WO3), and lithium oxide (Li2O).
10. The decision-making device according to claim 4, wherein the two or more electrodes include at least one of a metal and a semiconductor having electronic conductivity.
11. The decision-making device according to claim 10, wherein the metal includes at least one selected from the group consisting of gold, platinum, silver, palladium, aluminum, iron, copper, tungsten, titanium, and tantalum.
12. The decision-making device according to claim 10, wherein the semiconductor includes at least one selected from the group consisting of carbon, silicon, and lithium cobalt oxide.
13. The decision-making device according to claim 10, wherein the metal and the semiconductor include an active substance capable of chemically reacting with ions under an electric field.
14. The decision-making device according to claim 10, wherein the metal and the semiconductor include an electrolyte capable of transporting ions under an electric field, and ions in the electrolyte material layer and in one of the two or more electrodes move so that the ions enter the other electrode.
15. The decision-making device according to any one of claims 1 to 14, wherein the decision-making device has wiring switching means.
JP2017016294A 2017-01-31 2017-01-31 Decision maker Active JP6872226B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2017016294A JP6872226B2 (en) 2017-01-31 2017-01-31 Decision maker

Publications (2)

Publication Number Publication Date
JP2018124790A true JP2018124790A (en) 2018-08-09
JP6872226B2 JP6872226B2 (en) 2021-05-19

Family

ID=63109684

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2017016294A Active JP6872226B2 (en) 2017-01-31 2017-01-31 Decision maker

Country Status (1)

Country Link
JP (1) JP6872226B2 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0335347A (en) * 1989-06-30 1991-02-15 Matsushita Electric Ind Co Ltd Information processing element
JP2012256657A (en) * 2011-06-08 2012-12-27 National Institute For Materials Science Synapse operation element

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020059723A1 (en) * 2018-09-18 2020-03-26 学校法人慶應義塾 Intention determination device, and method for controlling intention determination device
JP7403739B2 (en) 2018-09-18 2023-12-25 慶應義塾 Decision-making device and method for controlling the decision-making device

Also Published As

Publication number Publication date
JP6872226B2 (en) 2021-05-19

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20191223

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20201126

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20201208

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20210125

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20210330

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20210412

R150 Certificate of patent or registration of utility model

Ref document number: 6872226

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250