JP2010176583A

JP2010176583A - Apparatus, method, program and system for processing information

Info

Publication number: JP2010176583A
Application number: JP2009020841A
Authority: JP
Inventors: Masatake Fukunaga; 正剛福永
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2009-01-30
Filing date: 2009-01-30
Publication date: 2010-08-12

Abstract

PROBLEM TO BE SOLVED: To provide an information processing apparatus, an information processing method, an information processing program, and an information processing system that securely protect a learning routine and a learning result.

SOLUTION: The information processing apparatus includes: a first encrypting part 11 for encrypting a learning result learned on the basis of a predetermined learning algorithm having convergence; a first storing part 12 for storing the encrypted learning result; a first decrypting part 13 for decrypting the encrypted learning result; an input part 14 to which first information is input; an outputting part 15 for outputting second information to an external device; and a control part 16 which, when the first information is input, reads the encrypted learning result from the first storing part 12, decrypts it by the first decrypting part 13, generates, as the second information, operation information determining the next operation of the external device on the basis of the first information with reference to the decrypted learning result, and outputs the operation information to the external device through the outputting part 15.

COPYRIGHT: (C) 2010, JPO & INPIT

Description

The present invention relates to an information processing apparatus, an information processing method, an information processing program, and an information processing system for processing information.

Equipment shared by many users, such as an elevator, is provided with a control device so that it can be operated efficiently. The elevator control device has a learning function and uses that function to control the elevator efficiently.

Patent Literature 1 proposes a learning method that improves the traffic prediction accuracy of an elevator system by using artificial intelligence.

Reinforcement learning is a typical example of such a learning method. A reinforcement learning function is built into products such as elevators and is used as one function for providing users with a more convenient system.

Patent Literature 1: Japanese Patent Laid-Open No. 05-213542

The learning routine (processing) and the learning result obtained by a learning function built into an actual system differ from product to product, but they have considerable asset value, so there is a demand to protect them from unauthorized tampering and leakage.

If a learning routine (processing) or a learning result is tampered with, the product that incorporates it is expected to suffer serious consequences for security and performance. If the code or routine leaks, its asset value is likewise expected to be seriously affected.

Accordingly, an object of the present invention is to provide an information processing apparatus, an information processing method, an information processing program, and an information processing system that can securely protect a learning routine (processing) and a learning result.

The present invention solves the above problem by the following means. For ease of understanding, reference numerals corresponding to the embodiments of the present invention are given in the description, but the invention is not limited to them.

The invention of claim 1 is an information processing apparatus (1) comprising: first encryption means (11) for encrypting a learning result learned on the basis of a predetermined learning algorithm having convergence; first storage means (12) for storing the learning result encrypted by the first encryption means; first decryption means (13) for decrypting the encrypted learning result stored in the first storage means; input means (14) to which first information generated by an external device is input; output means (15) for outputting second information to the external device; and control means (16) which, when the first information is input to the input means, reads the encrypted learning result stored in the first storage means, decrypts the read encrypted learning result by the first decryption means, generates, as the second information, operation information that determines the next operation of the external device on the basis of the first information with reference to the decrypted learning result, and outputs the operation information as the second information to the external device via the output means.

The invention of claim 2 is the information processing apparatus (1) according to claim 1, further comprising: second encryption means (17) for encrypting the operation information generated by the control means; and second storage means (18) for storing the operation information encrypted by the second encryption means, wherein the control means outputs the encrypted operation information stored in the second storage means to the external device via the output means.

The invention of claim 3 is the information processing apparatus (1) according to claim 1 or 2, further comprising learning means (19) for performing learning on the basis of the algorithm, wherein, when operation result information, which is the result of an operation performed on the basis of the operation information, is input from the external device via the input means, the control means (16) causes the learning means to perform learning based on the algorithm with reference to the operation result information, encrypts the learning result obtained by the learning by the first encryption means, writes the encrypted learning result to the first storage means, and thereby updates the learning result stored in the first storage means.

The invention of claim 4 is the information processing apparatus (1) according to any one of claims 1 to 3, wherein the first information input to the input means is encrypted, the apparatus further comprises second decryption means (20) for decrypting the encrypted first information, and the control means (16) generates, as the second information, operation information that determines the next operation of the external device on the basis of the first information decrypted by the second decryption means, with reference to the learning result.

The invention of claim 5 is the information processing apparatus (1) according to claim 3, wherein the first information input by the input means (14) includes environment information at a predetermined time, and the learning means (19) has an algorithm suited to a dynamic environment and an algorithm suited to a static environment, determines on the basis of the environment information whether the environment is static or dynamic, selects the optimal algorithm on the basis of the result of that determination, and performs learning on the basis of the selected algorithm.
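The static-or-dynamic selection in claim 5 might be sketched as follows. This is only an illustration under stated assumptions: the text here does not say how the environment is judged, so the spread-of-recent-readings test, the `threshold` value, and the function names are all hypothetical, and the two algorithms are passed in as opaque placeholders.

```python
from statistics import pstdev

def is_dynamic(history, threshold=5.0):
    """Assumed heuristic: treat the environment as dynamic when recent
    environment readings (e.g. hall calls per minute) vary widely."""
    return len(history) >= 2 and pstdev(history) > threshold

def select_algorithm(history, static_algorithm, dynamic_algorithm):
    """Return the algorithm judged suitable for the observed environment."""
    return dynamic_algorithm if is_dynamic(history) else static_algorithm

# Placeholder algorithm labels; the real learning routines would go here.
steady = select_algorithm([10, 10, 11], "static_algo", "dynamic_algo")
bursty = select_algorithm([0, 30, 5, 40], "static_algo", "dynamic_algo")
```

Passing the algorithms in as arguments keeps the selection rule separate from the learning routines, so either side can be replaced without touching the other.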

The invention of claim 6 is an information processing method comprising: an encryption step of encrypting a learning result learned on the basis of a predetermined learning algorithm having convergence; a storage step of storing the learning result encrypted in the encryption step in a storage unit; a decryption step of decrypting the encrypted learning result stored in the storage unit; an input step in which first information generated by an external device is input; and a control step of, when the first information is input in the input step, reading the encrypted learning result stored in the storage unit, decrypting the read encrypted learning result in the decryption step, generating, as second information, operation information that determines the next operation of the external device on the basis of the first information with reference to the decrypted learning result, and outputting the operation information as the second information to the external device.

The invention of claim 7 is an information processing program for causing a computer to execute: an encryption step of encrypting a learning result learned on the basis of a predetermined learning algorithm having convergence; a storage step of storing the learning result encrypted in the encryption step in a storage unit; a decryption step of decrypting the encrypted learning result stored in the storage unit; an input step in which first information generated by an external device is input; and a control step of, when the first information is input in the input step, reading the encrypted learning result stored in the storage unit, decrypting the read encrypted learning result in the decryption step, generating, as second information, operation information that determines the next operation of the external device on the basis of the first information with reference to the decrypted learning result, and outputting the operation information as the second information to the external device.

The invention of claim 8 is an information processing system that includes an information processing apparatus and an external device and executes processing of predetermined information. The information processing apparatus (1) comprises: first encryption means (11) for encrypting a learning result learned on the basis of a predetermined learning algorithm having convergence; first storage means (12) for storing the learning result encrypted by the first encryption means; first decryption means (13) for decrypting the encrypted learning result stored in the first storage means; input means (14) to which first information generated by the external device is input; first output means (15) for outputting second information to the external device; and control means (16) which, when the first information is input to the input means, reads the encrypted learning result stored in the first storage means, decrypts the read encrypted learning result by the first decryption means, generates, as the second information, operation information that determines the next operation of the external device on the basis of the first information with reference to the decrypted learning result, and outputs the operation information as the second information to the external device via the first output means. The external device comprises: information generation means (31) for generating the first information; second output means (32) for outputting the first information to the input means; and operation control means (33) for executing an operation on the basis of the operation information as the second information.

According to the present invention, the following effects can be obtained.
(1) When the present invention is built into an IC card serving as a security device and inserted into an R/W (reader/writer) unit provided in an external device (for example, an elevator), it can control, by reinforcement learning, the device-specific operation of the external device (for example, the raising and lowering of the elevator) in a way adapted to the installation location, operating conditions, operating hours, and so on, and can securely protect the learning result obtained in the process.
In addition, because the present invention uses a learning result learned on the basis of a predetermined learning algorithm having convergence, the learning result converges to a fixed result as learning time elapses, so the external device can be controlled with a stable learning result.
(2) The present invention can prevent a learning routine and a learning result that have asset value from being tampered with or leaked.
(3) Because the present invention includes a learning unit, the operation of the external device can be controlled with a learning result obtained by a predetermined learning algorithm having convergence, even when the control device included in the external device has no learning function.
(4) When the present invention is built into an IC card serving as a security device and inserted into an R/W (reader/writer) unit provided in an external device (for example, an elevator), the first information (for example, reward information r(t) and environment information S(t)) arrives from the external device in encrypted form, so the first information cannot be misused and a highly secure system can be built.
(5) Because the present invention performs learning with an algorithm suited to the environment, the external device can be controlled with a learning result suited to the environment.

A functional block diagram showing a first configuration of the information processing apparatus.
A timing chart explaining the processing flow of a system that protects the learning result.
A functional block diagram showing a second configuration of the information processing apparatus.
A functional block diagram showing a third configuration of the information processing apparatus.
A timing chart explaining the processing flow of a system that protects the learning result and the learning routine.
A functional block diagram showing a fourth configuration of the information processing apparatus.
A functional block diagram showing the configuration of the control device.
A diagram showing the system configuration when the information processing apparatus is used for raising and lowering control of an external device.

Embodiments of the present invention are described below in more detail with reference to the drawings. FIG. 1 is a block diagram of an information processing apparatus 1 that communicates with an external device. The information processing apparatus 1 uses reinforcement learning to control the external device efficiently and uses encryption to protect the learning routine (processing) and the learning result obtained in the process. It is built into a medium such as an IC card serving as a security device. The medium has a tamper-resistant function that makes its internal information difficult to read, physically or logically. In the following, the "external device" is assumed to include a control device that controls the operation of the external device (for example, an elevator), and the control device is assumed to have an R/W (reader/writer) unit that can read information from and write information to a medium such as an IC card serving as a security device.

To realize this function, the information processing apparatus 1 includes, as shown in FIG. 1, a first encryption unit 11 (first encryption means), a first storage unit 12 (first storage means), a first decryption unit 13 (first decryption means), an input unit 14 (input means), an output unit 15 (output means), and a control unit 16 (control means).

The first encryption unit 11 encrypts a learning result learned on the basis of a predetermined learning algorithm having convergence. This algorithm executes reinforcement learning and is built into products such as elevators to provide users with a more convenient system. Reinforcement learning (for example, Q-Learning or Profit Sharing) is a scheme that assigns an evaluation (reward) only to the result (output) of an action and learns the output that maximizes that evaluation. To learn, it tries various actions, obtains an evaluation for each result, and preferentially executes the actions that were rated highly. Once learning is complete, each action has a specific reward associated with it, and actions with a high reward are selected preferentially.
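As a concrete illustration of the reinforcement learning mentioned above, the following is a minimal sketch of tabular Q-learning with epsilon-greedy action selection. It is not taken from the patent: the state and action names and the values of `ALPHA`, `GAMMA`, and `EPSILON` are assumptions chosen for the example.

```python
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate (assumed value)
GAMMA = 0.9    # discount factor (assumed value)
EPSILON = 0.1  # exploration probability (assumed value)

def choose_action(q, state, actions, rng=random):
    """Epsilon-greedy: usually take the highest-valued action, sometimes explore."""
    if rng.random() < EPSILON:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

def q_update(q, state, action, reward, next_state, actions):
    """One Q-learning step: move Q(s, a) toward r + GAMMA * max over a' of Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])

# The table of values plays the role of the "learning result".
q_table = defaultdict(float)
actions = ["up", "down", "stay"]  # hypothetical elevator operations
q_update(q_table, "lobby_call", "down", 1.0, "idle", actions)
```

After the single update, the value for ("lobby_call", "down") has moved one tenth of the way toward the reward, so the greedy choice in that state becomes "down".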

The first storage unit 12 stores the learning result encrypted by the first encryption unit 11. The first decryption unit 13 decrypts the encrypted learning result stored in the first storage unit 12. The input unit 14 receives the first information generated by the external device. The output unit 15 outputs the second information to the external device.

When the first information is input to the input unit 14, the control unit 16 reads the encrypted learning result stored in the first storage unit 12 and has the first decryption unit 13 decrypt it. Referring to the decrypted learning result, it generates, as the second information, operation information that determines the next operation of the external device on the basis of the first information, and outputs that operation information to the external device via the output unit 15.
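The read, decrypt, decide, and output sequence of the control unit 16 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the source names no cipher, so `xor_cipher` is a toy keystream placeholder that is not secure, and the class name, the JSON table layout, and the state and action strings are all assumptions.

```python
import hashlib
import json

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric placeholder (the same call encrypts and decrypts). NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

class InformationProcessor:
    """Hypothetical model of apparatus 1; unit numbers follow the description."""

    def __init__(self, key: bytes):
        self._key = key
        self._stored = b""  # first storage unit 12: only ciphertext is kept

    def store_learning_result(self, table: dict) -> None:
        # First encryption unit 11: encrypt before writing to storage.
        self._stored = xor_cipher(self._key, json.dumps(table).encode())

    def on_first_information(self, state: str) -> str:
        # Control unit 16: read the ciphertext, decrypt it (unit 13),
        # then pick the highest-valued operation for the reported state.
        table = json.loads(xor_cipher(self._key, self._stored))
        scores = table[state]
        return max(scores, key=scores.get)  # operation information (second information)

device = InformationProcessor(key=b"demo-key")
device.store_learning_result({"hall_call_floor_3": {"send_car_A": 0.2, "send_car_B": 0.9}})
next_operation = device.on_first_information("hall_call_floor_3")
```

The plaintext table exists only inside `on_first_information`; what persists between calls is the encrypted form, which is the property the description relies on.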

Next, the processing flow of the system that protects the learning result is described with reference to FIG. 2. In FIG. 2, the device containing the information processing apparatus 1 is the security device 101, the function of the control unit 16 that determines the next operation of the external device is the operation determination routine 102, the device that controls the external device is the control system 103, and the function that executes reinforcement learning is the learning routine 104.

The learning routine 104 supplies the learning result to the security device 101 (S1). The security device 101 encrypts the learning result and stores it in the first storage unit 12.
The control system 103 supplies information on the current state to the operation determination routine 102 and requests an instruction for the next operation (S2). The operation determination routine 102 determines the next operation on the basis of the learning result currently stored in the first storage unit 12 (S3) and supplies information instructing the next operation to the control system 103 (S4).

The control system 103 controls the operation of the external device on the basis of the information supplied from the operation determination routine 102 that determines the next operation, and supplies the situation information and the outcome at that time (passenger waiting time and so on) to the learning routine 104 (S5). The learning routine 104 executes reinforcement learning on the basis of the situation information and the outcome supplied from the control system 103 (S6). The control system 103 also supplies the resulting learning result to the security device 101 (S7). To update the learning result, the security device 101 encrypts the new learning result and stores it in the first storage unit 12, overwriting the old learning result.
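The S1 to S7 exchange described above can be traced with a small simulation. This is a sketch under stated assumptions: the `seal` and `unseal` hooks are hypothetical stand-ins for the encryption and decryption units (here they only serialize), the update in step S6 is a simple running average rather than a full reinforcement learning algorithm, and the reward is taken to be the negative waiting time.

```python
import json

class SecurityDevice:
    """Stand-in for security device 101. seal/unseal are hypothetical cipher hooks."""

    def __init__(self, seal, unseal):
        self._seal, self._unseal = seal, unseal
        self._blob = None  # first storage unit 12

    def put(self, table):
        # S1 and S7: store the learning result, overwriting any older one.
        self._blob = self._seal(table)

    def decide(self, state):
        # S3: operation determination routine 102 picks the best-rated operation.
        scores = self._unseal(self._blob)[state]
        return max(scores, key=scores.get)

def run_cycle(device, table, state, operate, rate=0.5):
    action = device.decide(state)  # S2 to S4: request and receive the next operation
    wait = operate(action)         # control system 103 acts; S5 reports the waiting time
    old = table[state][action]
    table[state][action] = old + rate * (-wait - old)  # S6: learning routine 104
    device.put(table)              # S7: the updated result replaces the old one
    return action

# Placeholder hooks: serialization only. A real device would encrypt and decrypt here.
device = SecurityDevice(seal=lambda t: json.dumps(t).encode(), unseal=json.loads)
table = {"hall_call": {"car_A": 0.0, "car_B": -1.0}}
device.put(table)  # S1
first = run_cycle(device, table, "hall_call", operate=lambda action: 10.0)
second = run_cycle(device, table, "hall_call", operate=lambda action: 1.0)
```

In this run the first cycle sends car_A, the long wait lowers its rating, and the second cycle therefore sends car_B, which shows the stored learning result steering the next operation.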

The information processing apparatus 1 configured in this way is, for example, built into an IC card serving as a security device and inserted into an R/W (reader/writer) unit provided in an external device (for example, an elevator). It can then control, by reinforcement learning, the device-specific operation of the external device (for example, the raising and lowering of the elevator) in a way adapted to the installation location, operating conditions, operating hours, and so on, and can securely protect the learning result obtained in the process. The information processing apparatus 1 can therefore prevent a learning result that has asset value from being tampered with or leaked. In addition, because the information processing apparatus 1 uses a learning result learned on the basis of a predetermined learning algorithm having convergence, the learning result converges to a fixed result as learning time elapses, so the external device can be controlled with a stable learning result.

As shown in FIG. 3, the information processing apparatus 1 may further include, in addition to the configuration shown in FIG. 1, a second encryption unit 17 (second encryption means) and a second storage unit 18 (second storage means). FIG. 3 shows only the configuration around the second encryption unit 17.

The second encryption unit 17 encrypts the operation information generated by the control unit 16. The second storage unit 18 stores the operation information encrypted by the second encryption unit 17. In this configuration, the control unit 16 outputs the encrypted operation information stored in the second storage unit 18 to the external device via the output unit 15.

The information processing apparatus 1 configured in this way is, for example, built into an IC card serving as a security device and inserted into an R/W (reader/writer) unit provided in an external device (for example, an elevator). It can then control, by reinforcement learning, the device-specific operation of the external device (for example, the raising and lowering of the elevator) in a way adapted to the installation location, operating conditions, operating hours, and so on, and can securely protect the learning routine and the learning result that appear as operation information obtained in the process. The information processing apparatus 1 can therefore prevent a learning routine or learning result that has asset value from being tampered with or leaked. In addition, because the information processing apparatus 1 uses a learning result learned on the basis of a predetermined learning algorithm having convergence, the learning result converges to a fixed result as learning time elapses, so the external device can be controlled with a stable learning result.

As shown in FIG. 4, the information processing apparatus 1 may further include, in addition to the configuration of FIG. 1, a learning unit 19 (learning means) that performs learning on the basis of the algorithm. In this configuration, when operation result information, which is the result of an operation performed on the basis of the operation information, is input from the external device via the input unit 14, the control unit 16 refers to the operation result information and causes the learning unit 19 to perform learning with the predetermined learning algorithm having convergence. It then has the first encryption unit 11 encrypt the learning result obtained by that learning and writes the encrypted learning result to the first storage unit 12, thereby updating the learning result stored there.

Next, the processing flow of the system that protects both the learning result and the learning routine is described with reference to FIG. 5. In FIG. 5, the device containing the information processing apparatus 1 is the security device 101, the function of the control unit 16 that determines the next operation of the external device is the operation determination routine 102, the device that controls the external device is the control system 103, and the function that executes reinforcement learning is the learning routine 104.

The control system 103 supplies information on the current state to the operation determination routine 102 and requests an instruction for the next operation (S11). The operation determination routine 102 determines the next operation on the basis of the learning result currently stored in the first storage unit 12 (S12) and supplies information instructing the next operation to the control system 103 (S13).

The control system 103 controls the operation of the external device (for example, the raising and lowering of the elevator) on the basis of the information supplied from the operation determination routine 102 that determines the next operation (S14), and supplies the situation information and the outcome at that time (passenger waiting time and so on) to the learning routine 104 (S15).

The learning routine 104 executes reinforcement learning based on the situation information and the result supplied from the control system 103, and updates the learning result (S16). Specifically, to update the learning result, the security device 101 encrypts the new learning result and stores it in the first storage unit 12, overwriting the old learning result. The learning routine 104 also reflects the updated learning result in the operation determination routine 102 (S17).
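The flow S11 to S17 can be pictured with the following minimal Python sketch. It is an illustration under stated assumptions, not the implementation described in the source: the class name `SecurityDevice`, the state and action labels, the averaging update rule, and the XOR keystream cipher are all hypothetical. The cipher in particular is a toy stand-in for the first encryption unit 11 and the first decryption unit 13 and is not secure.

```python
import hashlib
import json


def _keystream(key: bytes, length: int) -> bytes:
    """Derive a repeatable byte stream from the key (toy construction, NOT secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Placeholder for the first encryption unit 11 (XOR with a keystream)."""
    return bytes(p ^ k for p, k in zip(plaintext, _keystream(key, len(plaintext))))


# XOR is its own inverse, so the same function stands in for decryption unit 13.
toy_decrypt = toy_encrypt


class SecurityDevice:
    """Hypothetical model of the security device 101."""

    def __init__(self, key: bytes, actions):
        self._key = key
        self._actions = list(actions)
        self._storage = b""   # first storage unit 12: holds ciphertext only
        self._save({})        # start with an empty learning result

    def _save(self, table: dict) -> None:
        self._storage = toy_encrypt(self._key, json.dumps(table).encode())

    def _load(self) -> dict:
        return json.loads(toy_decrypt(self._key, self._storage).decode())

    def decide(self, state: str) -> str:
        """S11 to S13: choose the action with the highest stored value."""
        table = self._load()
        return max(self._actions, key=lambda a: table.get(f"{state}|{a}", 0.0))

    def learn(self, state: str, action: str, reward: float, rate: float = 0.5) -> None:
        """S15 to S17: update the learning result, re-encrypt, and overwrite."""
        table = self._load()
        entry = f"{state}|{action}"
        old = table.get(entry, 0.0)
        table[entry] = old + rate * (reward - old)  # simplified update rule (assumption)
        self._save(table)


device = SecurityDevice(b"demo-key", ["up", "down"])
device.learn("car@4F", "up", reward=-5.0)     # reward modelled as negative travel cost
device.learn("car@4F", "down", reward=-7.0)
print(device.decide("car@4F"))  # up
```

The learning result exists in plaintext only inside `_load` and `_save`; everything held in `_storage` is ciphertext, which mirrors the overwrite of the old encrypted learning result in S16.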

Because the information processing apparatus 1 configured in this way includes the learning unit 19, it can control the operation of the external device using a learning result obtained by a predetermined learning algorithm having convergence, even when the control device included in the external device has no learning function.

As shown in FIG. 6, the information processing apparatus 1 may also be configured so that, in addition to the configuration shown in FIG. 1, the first information input to the input unit 14 is encrypted and a second decryption unit 20 (second decryption means) is provided to decrypt the encrypted first information. In this configuration, the control unit 16 refers to the learning result and, based on the first information decrypted by the second decryption unit 20, generates operation information that determines the next operation of the external device as the second information. The first information input to the input unit 14 is also input to the learning unit 19 after being decrypted by the second decryption unit 20.

The information processing apparatus 1 configured in this way is, for example, built into an IC card serving as a security device and inserted into an R/W (reader/writer) unit provided in the external device (for example, an elevator). Because the first information (environment information such as the reward information r(t) and the environment information S(t) described later) is then input from the external device in encrypted form, the first information cannot be misused, and a highly secure system can be constructed.

The learning method executed by the learning unit 19 is now described in detail. The description below assumes that the Q-Learning algorithm is applied as the reinforcement learning method, but this is only an example; any other learning algorithm having convergence can be applied. In the following, as shown in FIG. 7, the control device 2 included in the external device 3 is assumed to include an information generation unit 31 that generates reward information r(t) and environment information S(t) as the first information, an output unit 32 that outputs the first information to the input unit 14 of the information processing apparatus 1, and an operation control unit 33 that controls the operation of the external device 3 (for example, an elevator) based on the operation information serving as the second information. The output unit 32 is also assumed to have a function of encrypting the first information before outputting it to the information processing apparatus 1.

The generalized expressions for reinforcement learning used by the learning unit 19 are as follows. Let the action information, reward information, environment information, and action index value at time t be a(t), r(t), S(t), and Q(t), respectively. The basic form is then given by equations (1) and (2).
$$a(t) = f(S(t), Q(t)) \tag{1}$$
$$Q(t+1) = E(r(t), S(t), a(t)) \tag{2}$$

Here, f(x) is an arbitrary function that determines an action from the environment information and the action index value, and E(x) is an arbitrary learning function that updates the action index value from the action information, reward information, and environment information. Because the learning unit 19 performs its computation so that the learning result converges, a learning function that makes Q(t) converge is selected.

In a configuration in which the information processing apparatus 1 is built into an IC card, the various information (first information) input from the external device 3 may be required to be encrypted. In that case, the output unit 32 of the control device 2 encrypts the information (first information) and outputs it to the IC card. Writing an arbitrary encryption function with key k and data to be encrypted x as Enc_k(x), the information (first information) output to the IC card is given by equations (3) and (4).
$$S'(t) = \mathrm{Enc}_k(S(t)) \tag{3}$$
$$r'(t) = \mathrm{Enc}_k(r(t)) \tag{4}$$

Here, S'(t) and r'(t) are the encrypted environment information and the encrypted reward information, respectively. The information processing apparatus 1 decrypts the encrypted first information with the second decryption unit 20. Writing the decryption function as Dec_k(x), the functions to be executed using the second decryption unit 20 are given by equations (5) and (6).
$$a(t) = f(\mathrm{Dec}_k(S'(t)), Q(t)) \tag{5}$$
$$Q(t+1) = E(\mathrm{Dec}_k(r'(t)), \mathrm{Dec}_k(S'(t)), a(t)) \tag{6}$$

As one example that fits these functions, applying the above conditions to the general expression of the Q-Learning algorithm gives equation (7).
$$Q(s_t, a) \leftarrow Q(\mathrm{Dec}_k(s'_t), a) + \alpha\left(\mathrm{Dec}_k(r'_{t+1}) + \gamma \max_{p} Q(\mathrm{Dec}_k(s'_{t+1}), p) - Q(\mathrm{Dec}_k(s'_t), a)\right) \tag{7}$$
Here s_t = Dec_k(s'_t) is the decrypted environment information at step t, α is the learning rate, γ is the discount factor, and the maximum is taken over the candidate actions p.
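The update in equation (7) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the state labels, and the repeating-key XOR used for Enc_k and Dec_k are hypothetical, and the XOR is only a runnable placeholder that offers no real security. The inputs are decrypted first, then the standard tabular Q-Learning step is applied.

```python
import json
from collections import defaultdict


def xor_bytes(key: bytes, data: bytes) -> bytes:
    """Toy reversible transform standing in for Enc_k and Dec_k (NOT secure)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def enc(key: bytes, value) -> bytes:
    """Placeholder for the encryption done by the output unit 32."""
    return xor_bytes(key, json.dumps(value).encode())


def dec(key: bytes, ciphertext: bytes):
    """Placeholder for the second decryption unit 20."""
    return json.loads(xor_bytes(key, ciphertext).decode())


def q_update_encrypted(q, key, s_enc, action, r_enc, s_next_enc, actions,
                       alpha=0.1, gamma=0.9):
    """Decrypt the inputs, then apply one Q-Learning step as in equation (7)."""
    state = dec(key, s_enc)
    reward = dec(key, r_enc)
    next_state = dec(key, s_next_enc)
    best_next = max(q[(next_state, p)] for p in actions)       # max over p
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


key = b"k3y"
actions = ["up", "down"]
q = defaultdict(float)  # action index values, all zero at the start

new_value = q_update_encrypted(
    q, key, enc(key, "4F"), "up", enc(key, -5.0), enc(key, "5F"),
    actions, alpha=0.5, gamma=0.9,
)
print(new_value)  # -2.5
```

With every table entry initially zero, the step reduces to 0 + 0.5 × (−5.0 + 0.9 × 0 − 0) = −2.5, which is the printed value.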

As described above, in the information processing apparatus 1 the first information input via the input unit 14 includes the environment information S(t) at a given time t. The learning unit 19 has both a predetermined learning algorithm with convergence suited to dynamic environments (for example, the Profit Sharing algorithm) and a predetermined learning algorithm with convergence suited to static environments (for example, the Q-Learning algorithm). Based on the environment information S(t), it determines whether the environment is static or dynamic, selects the optimal algorithm according to the result of that determination, and performs learning based on the selected algorithm.
With this configuration, the information processing apparatus 1 learns with an algorithm suited to the environment and can therefore control the external device using a learning result suited to the environment.
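The source does not state how the static or dynamic determination is made, so the following Python sketch fills that gap with a purely hypothetical criterion: the environment is treated as dynamic when more than half of consecutive observations differ. The function names and the geometric credit decay in the Profit Sharing update are likewise assumptions chosen for illustration.

```python
from collections import defaultdict


def looks_dynamic(recent_states, threshold=0.5):
    """Hypothetical test: fraction of consecutive observations that differ."""
    if len(recent_states) < 2:
        return False
    changes = sum(1 for a, b in zip(recent_states, recent_states[1:]) if a != b)
    return changes / (len(recent_states) - 1) > threshold


def select_algorithm(recent_states):
    """Return the name of the algorithm suited to the observed environment."""
    return "profit_sharing" if looks_dynamic(recent_states) else "q_learning"


def profit_sharing_update(weights, episode, reward, decay=0.5):
    """Share the final reward backwards over the episode with geometric decay."""
    credit = reward
    for state, action in reversed(episode):
        weights[(state, action)] += credit
        credit *= decay


print(select_algorithm([1, 1, 1, 1]))  # q_learning
print(select_algorithm([1, 3, 2, 5]))  # profit_sharing

weights = defaultdict(float)
profit_sharing_update(weights, [("4F", "up"), ("5F", "down")], reward=8.0)
print(weights[("5F", "down")], weights[("4F", "up")])  # 8.0 4.0
```

Profit Sharing credits every step of an episode once the reward arrives, whereas Q-Learning updates one state and action pair per step, so the dispatch function only has to return which update rule the learning unit should call.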

The case in which the information processing apparatus 1 is used for raising and lowering control of an elevator (external device) is now described. FIG. 8 is a schematic diagram of the overall system, which consists of an elevator car 201 as the external device, a control system 202 that controls the raising and lowering of the car 201, and a security device 203 having the information processing apparatus 1. The control system 202 has the same configuration as the control system 103 described above, and the security device 203 has the same configuration as the security device 101 described above. One form of elevator control is stop floor control. Stop floor control governs the movement (raising and lowering) of the car 201 when, for example, in a high-rise building the car 201 is stopped at the fourth floor and the hall call buttons on the first and fifth floors are pressed at about the same time.

From the viewpoint of reinforcement learning, what is given as the evaluation (reward) is the waiting time of the passengers waiting for the elevator car 201 on each floor. Suppose that, while the car 201 is stopped at the fourth floor, a passenger waiting on the first floor presses the button requesting travel to an upper floor and, at about the same time, a passenger waiting on the fifth floor presses the button requesting travel to a lower floor. The control system 202, which controls the raising and lowering of the car 201, can then instruct either of the following two control sequences. The raising and lowering of the car 201 comprises four operations (move, stop, move up, and move down), and the control system 202 controls each of them.
1. To pick up the first-floor passenger, the control system 202 moves the car 201 from the fourth floor to the first floor and stops it (after stopping, the door opens, the first-floor passenger boards, and the door closes). It then moves the car 201 from the first floor to the fifth floor and stops it (assuming no stop is requested at intermediate floors; after the car 201 stops, the door opens, the fifth-floor passenger boards, and the door closes). Finally, it moves the car 201 from the fifth floor back to the first floor and stops it (after stopping, the door opens, the passenger alights, and the door closes to await the next instruction).
2. To pick up the fifth-floor passenger, the control system 202 moves the car 201 from the fourth floor to the fifth floor and stops it (after stopping, the door opens, the fifth-floor passenger boards, and the door closes). It then moves the car 201 from the fifth floor to the first floor and stops it (assuming no stop is requested at intermediate floors; after stopping, the door opens, the passenger alights, and the door closes to await the next instruction).

In sequence 1, the car 201 travels 4F → 1F → 5F at the very least, so it must move seven floors when no stop is requested at intermediate floors. In sequence 2, the car 201 travels 4F → 5F → 1F, so transport of the passengers is completed with five floors of travel when no stop is requested at intermediate floors.
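The floor counts above follow from summing the distance between consecutive stops. A small Python check, with the helper name `floors_travelled` introduced here only for illustration:

```python
def floors_travelled(route):
    """Total number of floors moved along a sequence of stops."""
    return sum(abs(b - a) for a, b in zip(route, route[1:]))


sequence_1 = [4, 1, 5]  # serve the first floor, then the fifth floor
sequence_2 = [4, 5, 1]  # serve the fifth floor, then the first floor

print(floors_travelled(sequence_1))  # 7
print(floors_travelled(sequence_2))  # 5
```

Sequence 1 costs 3 + 4 = 7 floors and sequence 2 costs 1 + 4 = 5 floors, matching the text.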

After trying sequence 1 and sequence 2, the security device 203 having the information processing apparatus 1 gives the higher evaluation (reward) to sequence 2. However, because the security device 203 performs reinforcement learning based not only on the movements themselves but also on the number of passengers waiting on each floor and the actual usage conditions, it may in some cases give the higher evaluation (reward) to sequence 1.

The information to be protected (information with high asset value) is the "learning routine", which learns the evaluation result (passenger waiting time) for an action (moving to an upper floor or to a lower floor), and the "evaluation result", which reflects what has been learned and is used for the next action (movement control of the elevator car 201). The information processing apparatus 1 therefore encrypts and protects the code implementing the learning routine, the data representing the evaluation result, and the like.

Accordingly, a system that uses the information processing apparatus 1 to control the raising and lowering of the elevator car 201 can efficiently determine the next movement order of the car 201 using an evaluation result that is securely protected by encryption and generated by a predetermined learning algorithm having convergence.

In the information processing apparatus 1, the learning routine reflects learning based on actions and rewards in the evaluation result, so the evaluation result changes each time learning is repeated and, as the number of learning iterations grows, converges to a stable evaluation result that is supplied to the external device. The information processing apparatus 1 may also be configured to generate an evaluation result for each temporally regular condition (for example, per day of the week, per week, or per month) and to control the external device according to each condition.
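One way to picture the per-condition configuration is a separate evaluation table for each day of the week, selected by the current date. The Python sketch below is an assumption-laden illustration: the class `ConditionedResults`, the state label, and the stored values are hypothetical, and the encryption of each table is omitted to keep the example short.

```python
import datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def condition_for(day: datetime.date) -> str:
    """Map a date to its condition key (here: the day of the week)."""
    return DAY_NAMES[day.weekday()]  # date.weekday() returns 0 for Monday


class ConditionedResults:
    """Hypothetical store holding one evaluation table per condition."""

    def __init__(self):
        self._tables = {name: {} for name in DAY_NAMES}

    def update(self, day, state, action, value):
        self._tables[condition_for(day)][(state, action)] = value

    def best_action(self, day, state, actions):
        table = self._tables[condition_for(day)]
        return max(actions, key=lambda a: table.get((state, a), 0.0))


results = ConditionedResults()
friday = datetime.date(2009, 1, 30)
monday = datetime.date(2009, 2, 2)
results.update(friday, "car@4F", "down", 1.0)
results.update(monday, "car@4F", "up", 1.0)

print(results.best_action(friday, "car@4F", ["up", "down"]))  # down
print(results.best_action(monday, "car@4F", ["up", "down"]))  # up
```

Keeping the tables separate lets each condition converge on its own traffic pattern instead of averaging, for example, weekday and weekend usage into one result.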

Although this embodiment describes raising and lowering control of a single elevator car, the invention is not limited to this. It may be used for stop floor control where multiple elevators operate side by side, for non-stop control (floor cut) in which the car does not stop at specific floors, or for control other than elevator raising and lowering in any technical field to which reinforcement learning is applicable.

DESCRIPTION OF SYMBOLS
1 Information processing apparatus
2 Control device
3 External device
11 First encryption unit
12 First storage unit
13 First decryption unit
14 Input unit
15, 32 Output unit
16 Control unit
17 Second encryption unit
18 Second storage unit
19 Learning unit
20 Second decryption unit
31 Information generation unit
33 Operation control unit

Claims

An information processing apparatus comprising:
first encryption means for encrypting a learning result learned based on a predetermined learning algorithm having convergence;
first storage means for storing the learning result encrypted by the first encryption means;
first decryption means for decrypting the encrypted learning result stored in the first storage means;
input means to which first information generated by an external device is input;
output means for outputting second information to the external device; and
control means that, when the first information is input to the input means, reads the encrypted learning result stored in the first storage means, has the first decryption means decrypt the read encrypted learning result, refers to the decrypted learning result to generate, based on the first information, operation information that determines a next operation of the external device as the second information, and outputs the operation information as the second information to the external device via the output means.

The information processing apparatus according to claim 1, further comprising:
second encryption means for encrypting the operation information generated by the control means; and
second storage means for storing the operation information encrypted by the second encryption means,
wherein the control means outputs the encrypted operation information stored in the second storage means to the external device via the output means.

The information processing apparatus according to claim 1 or 2, further comprising:
learning means for performing learning based on the algorithm,
wherein, when operation result information that is the result of an operation performed based on the operation information is input from the external device via the input means, the control means refers to the operation result information, causes the learning means to perform learning based on the algorithm, has the first encryption means encrypt the learning result obtained by the learning, writes the encrypted learning result to the first storage means, and updates the learning result stored in the first storage means.

The information processing apparatus according to any one of claims 1 to 3,
wherein the first information input to the input means is encrypted, and the apparatus further comprises second decryption means for decrypting the encrypted first information, and
wherein the control means refers to the learning result and, based on the first information decrypted by the second decryption means, generates operation information that determines a next operation of the external device as the second information.

The information processing apparatus according to claim 3,
wherein the first information input via the input means includes environment information at a predetermined time, and
wherein the learning means has an algorithm suited to a dynamic environment and an algorithm suited to a static environment, determines based on the environment information whether the environment is static or dynamic, selects the optimal algorithm based on the result of the determination, and performs learning based on the selected algorithm.

An information processing method comprising:
an encryption step of encrypting a learning result learned based on a predetermined learning algorithm having convergence;
a storage step of storing the learning result encrypted in the encryption step in a storage unit;
a decryption step of decrypting the encrypted learning result stored in the storage unit;
an input step in which first information generated by an external device is input; and
a control step of, when the first information is input in the input step, reading the encrypted learning result stored in the storage unit, decrypting the read encrypted learning result in the decryption step, referring to the decrypted learning result to generate, based on the first information, operation information that determines a next operation of the external device as second information, and outputting the operation information as the second information to the external device.

An information processing program that causes a computer to execute:
an encryption step of encrypting a learning result learned based on a predetermined learning algorithm having convergence;
a storage step of storing the learning result encrypted in the encryption step in a storage unit;
a decryption step of decrypting the encrypted learning result stored in the storage unit;
an input step in which first information generated by an external device is input; and
a control step of, when the first information is input in the input step, reading the encrypted learning result stored in the storage unit, decrypting the read encrypted learning result in the decryption step, referring to the decrypted learning result to generate, based on the first information, operation information that determines a next operation of the external device as second information, and outputting the operation information as the second information to the external device.

An information processing system that includes an information processing apparatus and an external device and executes processing of predetermined information,
wherein the information processing apparatus comprises:
first encryption means for encrypting a learning result learned based on a predetermined learning algorithm having convergence;
first storage means for storing the learning result encrypted by the first encryption means;
first decryption means for decrypting the encrypted learning result stored in the first storage means;
input means to which first information generated by the external device is input;
first output means for outputting second information to the external device; and
control means that, when the first information is input to the input means, reads the encrypted learning result stored in the first storage means, has the first decryption means decrypt the read encrypted learning result, refers to the decrypted learning result to generate, based on the first information, operation information that determines a next operation of the external device as the second information, and outputs the operation information as the second information to the external device via the first output means, and
wherein the external device comprises:
information generating means for generating the first information;
second output means for outputting the first information to the input means; and
operation control means for controlling an operation based on the operation information serving as the second information.