JP2023175366A

JP2023175366A - Control device, control method, and program

Info

Publication number: JP2023175366A
Application number: JP2022087778A
Authority: JP
Inventors: 大地和田; Daichi Wada; 篤司大瀬戸; Atsushi OSEDO; 深作久田; Shinsaku HISADA
Original assignee: Japan Aerospace Exploration Agency JAXA
Current assignee: Japan Aerospace Exploration Agency JAXA
Priority date: 2022-05-30
Filing date: 2022-05-30
Publication date: 2023-12-12
Also published as: WO2023233857A1

Abstract

To provide a control device, a control method, and a program capable of suitably controlling a flight device regardless of the physique of a user or the presence or absence of a user.

SOLUTION: A control device according to an embodiment controls a user-wearable flight device. The control device comprises a processing unit which obtains state data regarding a state of the flight device and operation data regarding an operation of the flight device, inputs the obtained state data and operation data to a model trained using deep reinforcement learning, and controls the flight device based on an output result of the model to which the state data and the operation data are input.

SELECTED DRAWING: Figure 3

Description

The present invention relates to a control device, a control method, and a program.

Wearable flight devices (flight equipment) that let a user fly using the thrust of a jet or rocket are known. Such flight devices are also called portable personal air mobility systems. Separately, techniques for controlling robots using deep reinforcement learning are known (see, for example, Non-Patent Document 1).

Non-Patent Document 1: X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Sim-to-real transfer of robotic control with dynamics randomization,” in 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 3803-3810.

Because every person has a different physique, a flight device that is not a large machine such as a helicopter but a suit-like device, on which differences in physique have a relatively large effect, requires its control method to be adjusted to the user wearing it. With conventional technology, however, the control method of the flight device could not be adjusted sufficiently for each user. In addition, the control method had to be readjusted every time the user changed, which incurred large time and economic costs.

The present invention has been made in view of these circumstances, and one of its objects is to provide a control device, a control method, and a program that can suitably control a flight device regardless of the user.

One aspect of the present invention is a control device for controlling a flight device wearable by a user. The control device includes a processing unit that acquires state data regarding the state of the flight device and operation data regarding the operation of the flight device, inputs the acquired state data and operation data into a model trained using deep reinforcement learning, and controls the flight device based on the output result of the model to which the state data and operation data have been input.

According to one aspect of the present invention, the flight device can be suitably controlled regardless of the physique of the user or the presence or absence of a user.

FIG. 1 is a diagram for explaining a usage scene of the flight device 1 according to the embodiment.
FIG. 2 is a diagram illustrating a configuration example of the flight device 1 according to the embodiment.
FIG. 3 is a diagram illustrating a configuration example of the control device 100 according to the embodiment.
FIG. 4 is a flowchart showing the flow of a series of processes performed by the processing unit 170.
FIG. 5 is a diagram illustrating an example of the deep reinforcement learning model MDL.

Embodiments of a control device, a control method, and a program according to the present invention will be described below with reference to the drawings.

[Usage scenes of the flight device]
FIG. 1 is a diagram for explaining a usage scene of the flight device 1 according to the embodiment. As illustrated, the flight device 1 is worn by a user U. The flight device 1 worn by the user U flies under the control of the user U, or flies autonomously as if on autopilot. For example, the flight device 1 is used to travel from a departure point A to a destination B. After the user U wearing the flight device 1 has traveled from the departure point A to the destination B, removed the flight device 1, and alighted at the destination B, the flight device 1 may keep hovering near the destination B until the user U puts it on again, or may return from the destination B to the departure point A by autonomous flight. The flight device 1 may be used not only by a single predetermined user but also by an unspecified number of users.

For example, the flight device 1 may be used by a mountain rescue team to travel by air from a headquarters base at the foot of a mountain (departure point A) to a rescue site on a mountain trail (destination B). In this case, after the first rescuer arrives at the destination B, the rescuer removes the flight device 1 and alights at the destination B; the flight device 1 then returns to the departure point A on its own, and a second rescuer puts on the flight device 1 and heads to the rescue site. By repeating this, a single flight device 1 can dispatch multiple rescuers to the destination B. Alternatively, after a rescuer arrives at the destination B, removes the flight device 1, and alights at the destination B, the flight device 1 may fly on its own to the departure point A or a refueling point C, refuel there, and then return on its own to the destination B. In this case, even if the flight device 1 carries only enough fuel for a one-way trip from the departure point A to the destination B, so that only the outbound leg can be flown with a person, inserting an unmanned refueling flight by the flight device 1 makes it possible to fly the return leg from the destination B to the departure point A with a person as well. In this way, the cruising range can also be extended.

In addition to the uses described above, the flight device 1 may be used to transport a person awaiting rescue on the ground to a helicopter standing by in the air. Furthermore, use of the flight device 1 is not limited to land; it may also be used at sea. For example, the flight device 1 may be used to transport a person in distress at sea to a helicopter overhead or to a ship at sea.

[Configuration of the flight device]
FIG. 2 is a diagram illustrating a configuration example of the flight device 1 according to the embodiment. As illustrated, the flight device 1 includes, for example, a thrust device 10, wings 20, an attachment/detachment part 30, and a control device 100.

In FIG. 2, Σ_W denotes an earth-fixed coordinate system, which is one kind of inertial coordinate system; O_W denotes its origin; the X_W axis points to true north, the Y_W axis points east, and the Z_W axis points vertically downward. When the principal axes of inertia are taken as the body-fixed coordinate system, the X_B axis in the figure is the principal axis of inertia of the airframe with the origin at the center of gravity of the flight device 1, the Z_B axis points toward the underside of the airframe, and the Y_B axis points to the right with respect to the airframe's direction of travel. In other words, the X_B axis is the roll axis, the Z_B axis is the yaw axis, and the Y_B axis is the pitch axis.
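As a rough illustration of how the two coordinate systems relate, the following sketch rotates a vector from the body-fixed frame (X_B, Y_B, Z_B) into the earth-fixed north-east-down frame Σ_W. The Z-Y-X (yaw, pitch, roll) Euler convention and the function name `body_to_world` are assumptions made here for illustration; the source does not specify how attitude is parameterized.

```python
import math

def body_to_world(v_body, roll, pitch, yaw):
    """Rotate a 3-vector from the body frame into the earth-fixed
    north-east-down frame, assuming Z-Y-X Euler angles in radians."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    # Rotation matrix R = Rz(yaw) * Ry(pitch) * Rx(roll)
    r = [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp,     cp * sr,                cp * cr],
    ]
    return [sum(r[i][j] * v_body[j] for j in range(3)) for i in range(3)]

# With a 90-degree pitch-up attitude the body X_B axis points straight up,
# which is the -Z_W direction because Z_W points vertically downward.
up = body_to_world([1.0, 0.0, 0.0], 0.0, math.pi / 2, 0.0)
```

Here `up` comes out as approximately `[0, 0, -1]`, which matches the sign convention of a downward-pointing Z_W axis.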

The thrust device 10 uses fuel 11 to generate thrust for the flight device 1. A known jet engine, for example, may suitably be used as the thrust device 10. In the following description, as an example, a jet engine capable of thrust deflection is applied to the thrust device 10. The jet outlet of the jet engine is provided with a thrust deflection mechanism (for example, a thrust vectoring mechanism having paddles, a nozzle, a ring, or the like) for switching the direction of the jet flow produced by a ducted fan, and this thrust deflection mechanism is controlled by the control device 100.

The wings 20 maintain the attitude of the flight device 1 and change its direction of flight. The change of direction by the wings 20 may be performed by the user U operating the user interface 120 described later, by the control device 100, or by the user U and the control device 100 working together.

In this embodiment, the wings 20 have a link mechanism and can be folded like a bird's wings. The above-mentioned wingspan refers to the state in which the wings 20 are spread. Because the wings 20 can be folded, they provide the following functions: during high-speed flight, the wings 20 are folded to a smaller size to reduce air resistance, and during low-speed flight, takeoff, and landing, the wings 20 are spread wide to obtain aerodynamic force. Folding the wings 20 when the flight device 1 is not in use may also make the device easier to carry. The wings 20 are not limited to the above; instead of folding, they may have a telescoping structure that allows them to be deployed and stowed. Alternatively, they may be flat plates without a foldable structure (that is, fixed wings). In addition to the link mechanism, the wings 20 according to this embodiment include various actuators and can rotate about the roll axis X_B, the yaw axis Z_B, and the pitch axis Y_B shown in FIG. 2. Details will be described later.

Instead of being provided with the wings 20, the flight device 1 may be a wingsuit with fabric stretched between the arms and legs, or may have fixed wings as described above.

The attachment/detachment part 30 is a member with which the user U wears the flight device 1, and it has a structure that the user U can easily put on and take off. For example, the attachment/detachment part 30 may have a structure that includes shoulder straps like those of an ordinary backpack and fasteners for securing it to the user U. Alternatively, each user U may be equipped in advance with a mounting member shaped to match the attachment/detachment part 30, and the user U and the attachment/detachment part 30 may be fixed together as appropriate via that mounting member.

The control device 100 controls the thrust of the thrust device 10 and the direction of that thrust. Furthermore, the control device 100 adjusts the attitude of the flight device 1 and changes its direction of flight by controlling the shape and orientation of the wings 20.

[Configuration of the control device]
FIG. 3 is a diagram illustrating a configuration example of the control device 100 according to the embodiment. As illustrated, the control device 100 includes, for example, a communication interface 110, a user interface 120, a sensor 130, a power source 140, a storage unit 150, an actuator 160, and a processing unit 170.

The communication interface 110 communicates wirelessly with an external device via a network such as a WAN (Wide Area Network). The external device may be, for example, a remote controller capable of remotely operating the flight device 1. For example, the communication interface 110 may receive from the external device a command specifying the target attitude, speed, and so on that the flight device 1 should take. This allows an operator skilled in piloting to fly the flight device 1 from outside when the user U's piloting skills are immature and autonomous solo flight by the control unit 230 is not possible.

The communication interface 110 may also receive, from the external device, information for notifying the user U in flight that the destination B has changed, or information for conveying more detailed information about the destination B to the user U.

The communication interface 110 may also transmit information to the external device. For example, the communication interface 110 may transmit detailed information about the rescue site (coordinates, altitude, and so on) to the external device.

The user interface 120 includes an input interface 120a and an output interface 120b. The input interface 120a is, for example, a joystick, a handle, a button, a switch, or a microphone. The output interface 120b is, for example, a display or a speaker. For example, the user U may operate the joystick or the like of the input interface 120a to adjust the thrust of the thrust device 10 and its direction, or to adjust the shape and orientation of the wings 20. The user U may also adjust the thrust of the thrust device 10 and its direction, or the shape and orientation of the wings 20, by speaking into the microphone of the input interface 120a the speed, altitude, attitude, and so on that the flight device 1 should take.

The sensor 130 is, for example, an inertial measurement unit. The inertial measurement unit includes, for example, a three-axis acceleration sensor and a three-axis gyro sensor, and outputs the values they detect to the processing unit 170. The detected values include, for example, the acceleration and/or angular velocity in each of the horizontal, vertical, and depth directions, and the rate about each of the pitch, roll, and yaw axes. The sensor 130 may further include a radar, a finder, a sonar, a GPS (Global Positioning System) receiver, and the like.

The power source 140 is, for example, a secondary battery such as a lithium-ion battery. The power source 140 supplies power to components such as the actuator 160 and the processing unit 170. The power source 140 may further include a solar panel or the like.

The actuator 160, the processing unit 170, and other components may use electric power generated by the jet engine of the thrust device 10 instead of, or in addition to, the power supplied from the power source 140.

The storage unit 150 is realized by a storage device such as an HDD (Hard Disc Drive), a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), a ROM (Read Only Memory), or a RAM (Random Access Memory). In addition to various programs such as firmware and application programs, the storage unit 150 stores calculation results of the processing unit 170 and the like as logs. The storage unit 150 also stores model information 152. The model information 152 may be installed in the storage unit 150 from an external device via a network, or from a portable storage medium connected to a drive device of the control device 100. The model information 152 will be described later.

The actuator 160 includes, for example, a thrust actuator 162, a sweep actuator 164, and a fold actuator 168.

The thrust actuator 162 drives the thrust device 10 to give thrust to the flight device 1 or to change the direction of that thrust. The sweep actuator 164 rotates the wings 20 about the yaw axis Z_B.

The processing unit 170 is realized by, for example, a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit) executing a program stored in the storage unit 150. The processing unit 170 may also be realized by hardware such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array), or by software and hardware working together.

The processing unit 170 controls the thrust actuator 162 based on some or all of (i) an input operation by the user U on the input interface 120a, (ii) a detection result of the sensor 130, and (iii) a remote-operation command that the communication interface 110 has received from an external device. The thrust of the thrust device 10 and the direction of that thrust are thereby controlled. For example, by controlling the thrust actuator 162, the control device 100 adjusts the thrust by controlling the rotation speed of the ducted fan of the jet engine of the thrust device 10, and adjusts the thrust direction by controlling the thrust deflection mechanism of the jet engine.

When the wings 20 are variable wings, the control device 100 also controls the sweep actuator 164 and the fold actuator 168 based on some or all of (i) to (iii). The shape and orientation of the wings 20 are thereby controlled. The shape and orientation of the wings 20 are an example of an "operation amount of the variable wing."

[Processing flow of the processing unit]
The flow of a series of processes performed by the processing unit 170 is described below with reference to a flowchart. FIG. 4 is a flowchart showing the flow of a series of processes performed by the processing unit 170. The processing of this flowchart may be repeated, for example, at a predetermined cycle.

First, the processing unit 170 acquires a state variable s_t indicating the state of the environment surrounding the flight device 1 at the current time t (step S100). The state variable s_t includes, for example, at least one (preferably all) of the attitude, position, velocity, and angular velocity of the flight device 1 at the current time t. For example, the angle included in the state variable s_t may be the angle about the pitch axis (hereinafter, the pitch angle), and the angular velocity included in the state variable s_t may be the angular velocity of the pitch angle. The state variable s_t may further include the thrust of the thrust device 10 and its direction at the current time t, and the shape and orientation of the wings 20 at the current time t. At least one or all of the attitude, position, velocity, and angular velocity at the current time t are an example of "state data." The thrust of the thrust device 10 and its direction at the current time t, and the shape and orientation of the wings 20 at the current time t, are an example of "operation data."

For example, the processing unit 170 acquires the attitude, position, velocity, and angular velocity from the sensor 130 as the state variable s_t.

When the user U specifies the thrust of the thrust device 10 or its direction via the input interface 120a, the processing unit 170 may also add the user U's input operation on the input interface 120a to the state variable s_t.
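The composition of the state variable s_t can be sketched as follows. This is only an illustration: the class names, field names, units, and the ordering of the flattened vector are hypothetical and are not taken from the source, which states only which kinds of quantities may be included.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class StateData:
    pitch_angle: float      # attitude about the pitch axis [rad]
    position: List[float]   # position in the earth-fixed frame [m]
    velocity: List[float]   # velocity [m/s]
    pitch_rate: float       # angular velocity of the pitch angle [rad/s]

@dataclass
class OperationData:
    thrust: float           # current thrust magnitude
    thrust_angle: float     # current thrust direction [rad]
    wing_sweep: float       # current wing orientation (sweep) [rad]
    wing_fold: float        # current wing shape (fold ratio, 0..1)

def build_state(state: StateData, op: OperationData,
                user_input: Optional[List[float]] = None) -> List[float]:
    """Flatten state data, operation data and optional user input into s_t."""
    s_t = [state.pitch_angle, *state.position, *state.velocity, state.pitch_rate,
           op.thrust, op.thrust_angle, op.wing_sweep, op.wing_fold]
    if user_input is not None:
        s_t.extend(user_input)  # e.g. joystick-commanded thrust and direction
    return s_t
```

Keeping the user input as an optional tail of the vector mirrors the text, where the input operation is added to s_t only when the user actually gives an instruction; a real model would need a fixed input size, for example by padding with zeros.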

Next, the processing unit 170 reads the model information 152 from the storage unit 150 and, using the deep reinforcement learning model MDL defined by that model information 152, determines from the state variable s_t the optimal action (action variable) a_{t+1} that the flight device 1 can take at the next time t+1 (step S102).

The action (action variable) a_{t+1} in this embodiment is an action for accomplishing a desired task. It may include, for example, the thrust of the thrust device 10 and its direction needed to accomplish the task, and may further include the shape and orientation of the wings 20. The desired task may be any of various tasks, such as keeping the flight device 1 hovering at a constant altitude, transitioning smoothly from level flight to a hovering attitude, or flying straight even in strong wind.

FIG. 5 is a diagram illustrating an example of the deep reinforcement learning model MDL. The deep reinforcement learning model MDL according to this embodiment is a neural network that uses deep reinforcement learning. As illustrated, the deep reinforcement learning model MDL may be, for example, a recurrent neural network in which part of the intermediate (hidden) layers is an LSTM (Long Short Term Memory). The deep reinforcement learning model MDL is trained using domain randomization, in which the dynamics of the flight device 1, such as its weight, center of gravity, and moment of inertia, and the system response delay are set randomly.

When training with domain randomization (that is, with the dynamics of the flight device 1 randomized), the LSTM of the deep reinforcement learning model MDL stores a time series that reflects the randomly set dynamics of the flight device 1. Providing an LSTM in the neural network in this way allows learning by domain randomization to be carried out suitably.
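A minimal sketch of the per-episode sampling that domain randomization implies is shown below. The parameter names and every numeric range are assumptions chosen for illustration; the source names the randomized quantities (weight, center of gravity, moment of inertia, response delay) but gives no values.

```python
import random
from dataclasses import dataclass

@dataclass
class Dynamics:
    mass: float            # total mass of device plus wearer [kg]
    cg_offset: float       # center-of-gravity shift along X_B [m]
    inertia_pitch: float   # moment of inertia about the pitch axis [kg m^2]
    delay_steps: int       # system response delay in control steps

def sample_dynamics(rng: random.Random) -> Dynamics:
    """Draw one random set of dynamics for a training episode.
    All ranges below are illustrative assumptions, not values from the source."""
    return Dynamics(
        mass=rng.uniform(20.0, 120.0),       # from no wearer up to a heavy wearer
        cg_offset=rng.uniform(-0.10, 0.10),
        inertia_pitch=rng.uniform(1.0, 15.0),
        delay_steps=rng.randint(0, 5),
    )

rng = random.Random(0)
episodes = [sample_dynamics(rng) for _ in range(3)]  # one draw per episode
```

Because the policy never observes these parameters directly, a recurrent layer such as the LSTM has to infer their effect from the recent history of states and actions, which is consistent with the role the text assigns to the LSTM.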

For example, when the deep reinforcement learning algorithm used to train the deep reinforcement learning model MDL is value based, the deep reinforcement learning model MDL may be trained using a DQN (Deep Q-Network) or the like. DQN is a method in which, in the form of reinforcement learning called Q-learning, a neural network learns, as an approximate function, the action value function Q(s_t, a_t), which expresses as a function the value of selecting an action a_t under an environment state s_t at a time t. In other words, the deep reinforcement learning model MDL trained with a value-based method may be trained to output, from among the one or more actions (action variables) a_t that the flight device 1 can take at the current time t, the action (action variable) a_t with the maximum value (Q value).
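The two ingredients of value-based learning described above, choosing the action with the largest Q value and forming the one-step Q-learning target, can be sketched as follows. The discount factor `gamma` and the sample Q values are hypothetical; in a DQN the Q values would come from the neural network rather than from a list.

```python
def greedy_action(q_values):
    """Return the index of the action with the largest Q value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

def td_target(reward, next_q_values, gamma=0.99, done=False):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

q_t = [0.2, 1.5, -0.3]            # hypothetical Q(s_t, a) for three candidate actions
best = greedy_action(q_t)         # index 1
target = td_target(1.0, [0.0, 2.0], gamma=0.5)   # 1.0 + 0.5 * 2.0 = 2.0
```

Training then moves the network's estimate Q(s_t, a_t) toward `target`, for example by minimizing the squared difference between the two.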

In Q-learning, for example, the weights and biases of the deep reinforcement learning model MDL are learned with a higher reward given when the wings 20 and the thrust device 10 are in an ideal state. For example, the reward may be made high when, above a predetermined point, the flight device 1 is in a 90-degree pitch-up attitude and its speed is low enough to be regarded as stationary. Conversely, the reward may be made low (for example, zero) when the flight device 1 touches the ground or trees or deviates from a predetermined altitude.
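One way to turn the reward description above into a function is sketched here for the hovering task. The Gaussian-shaped falloff, the tolerance values, and the function and parameter names are assumptions made for illustration; the source specifies only when the reward should be high and when it should be low.

```python
import math

def hover_reward(pitch, speed, altitude, target_altitude, contact,
                 alt_tol=2.0, pitch_tol=math.radians(10), speed_tol=0.5):
    """Reward sketch for the hovering task.
    Returns 0 on contact or altitude deviation, otherwise a value that
    peaks at 1 for a 90-degree pitch-up attitude and zero speed."""
    if contact or abs(altitude - target_altitude) > alt_tol:
        return 0.0
    pitch_err = abs(pitch - math.pi / 2) / pitch_tol
    speed_err = abs(speed) / speed_tol
    return math.exp(-(pitch_err ** 2 + speed_err ** 2))
```

A smooth falloff is used instead of a hard threshold so that attitudes close to the ideal still receive some reward, which tends to make learning easier; a thresholded reward would be equally compatible with the text.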

Likewise, when the deep reinforcement learning algorithm used to train the deep reinforcement learning model MDL is policy based, the deep reinforcement learning model MDL may be trained using a policy gradient method (Policy Gradients) or the like.

Likewise, when the deep reinforcement learning algorithm used to train the deep reinforcement learning model MDL is Actor-Critic, which combines value and policy, the critic that evaluates the policy may be trained at the same time as the actor included in the deep reinforcement learning model MDL. The deep reinforcement learning model MDL illustrated in FIG. 5 is a model trained using an Actor-Critic method such as PPO (Proximal Policy Optimization); the upper layers are trained to output the policy, and the lower layers are trained to output the value.
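For reference, the clipped surrogate term that characterizes PPO can be written for a single sample as below. Here `ratio` stands for the probability of the taken action under the new policy divided by its probability under the old policy, and `advantage` is the critic-based advantage estimate; the clipping width `eps=0.2` is a commonly used default and is an assumption, not a value from the source.

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate term for one sample:
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)
```

The actor is updated to increase the average of this term over a batch, while the critic is trained separately to predict the value; the clipping keeps each policy update close to the policy that collected the data.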

The model information 152 that defines such a deep reinforcement learning model MDL includes various kinds of information, for example, connection information describing how the units included in each of the layers constituting the neural network are connected to one another, and the connection coefficients applied to the data passed between connected units. The connection information includes, for example, the number of units in each layer, information specifying the type of unit to which each unit is connected, the activation function that realizes each unit, and the gates provided between units of the hidden layers. The activation function that realizes a unit may be, for example, a rectified linear function (ReLU function), a sigmoid function, a step function, or some other function. A gate selectively passes or weights the data transmitted between units according to, for example, the value returned by the activation function (for example, 1 or 0). The connection coefficients include, for example, the weights applied to output data when data is output from a unit in one hidden layer of the neural network to a unit in a deeper layer. The connection coefficients may also include bias components specific to each layer. Furthermore, the model information 152 may include information specifying the type of activation function of each gate included in the LSTM, recurrent weights, peephole weights, and the like.
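To make the roles of the gates, activation functions, and recurrent weights concrete, the following sketch performs one step of a single-unit LSTM cell with scalar input. It uses the standard LSTM equations without peephole connections; the dictionary layout of the weights is a hypothetical stand-in for the corresponding entries of the model information 152.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a single-unit LSTM cell with scalar input.
    w maps each gate name to (input weight, recurrent weight, bias)."""
    def pre(name):
        wx, wh, b = w[name]
        return wx * x + wh * h_prev + b
    i = sigmoid(pre("input"))      # input gate
    f = sigmoid(pre("forget"))     # forget gate
    o = sigmoid(pre("output"))     # output gate
    g = math.tanh(pre("cell"))     # candidate cell value
    c = f * c_prev + i * g         # new cell state (the unit's memory)
    h = o * math.tanh(c)           # new hidden output
    return h, c
```

The cell state `c` carried from step to step is what lets the network retain a summary of the recent time series, which is the property the text relies on for learning under randomized dynamics.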

For example, upon acquiring at least one of the attitude, position, velocity, and angular velocity of the flight device 1 at the current time t, together with the thrust of the thrust device 10 and its direction at the current time t, the processing unit 170 inputs them to the deep reinforcement learning model MDL as the state variable s_t. Given the state variable s_t, the deep reinforcement learning model MDL outputs the thrust of the thrust device 10 and its direction that are optimal at the next time t+1. As described above, the deep reinforcement learning model MDL may be trained to output, in addition to or instead of the thrust that the thrust device 10 should output at the next time t+1 and its direction, the shape and orientation that the wing 20 should take at the next time t+1.
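A minimal sketch of how the state variable s_t could be assembled into a flat input vector follows. The container names and the dimensions (three components each for attitude, position, velocity, and angular velocity, a scalar thrust, and a two-angle thrust direction) are assumptions for illustration; the source does not fix a representation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FlightState:
    """Hypothetical sensor readings of the flight device at time t."""
    attitude: Sequence[float]          # e.g. roll, pitch, yaw (assumed Euler angles)
    position: Sequence[float]          # x, y, z
    velocity: Sequence[float]          # vx, vy, vz
    angular_velocity: Sequence[float]  # p, q, r


@dataclass
class ThrustState:
    """Hypothetical current operating point of the thrust device at time t."""
    magnitude: float
    direction: Sequence[float]         # e.g. two deflection angles (assumed)


def build_state_vector(flight: FlightState, thrust: ThrustState) -> List[float]:
    """Concatenate state data and operation data into the model input s_t."""
    return [
        *flight.attitude,
        *flight.position,
        *flight.velocity,
        *flight.angular_velocity,
        thrust.magnitude,
        *thrust.direction,
    ]
```

With the assumed dimensions the resulting vector has 15 elements; including the wing shape and orientation would simply append further entries.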

Returning to the flowchart of FIG. 4, the processing unit 170 next generates a control command for controlling the actuator 160 of the flight device 1 (step S104). The command is based on the action (action variable) a_{t+1} that the flight device 1 should take, as determined using the deep reinforcement learning model MDL, that is, the thrust and direction that the thrust device 10 should output at the next time t+1 and the shape and orientation that the wing 20 should take at the next time t+1.

For example, the processing unit 170 may generate a control command for the thrust actuator 162 based on the thrust of the thrust device 10 and its direction output by the deep reinforcement learning model MDL as the action variable a_{t+1}. The processing unit 170 may also generate control commands for the sweep actuator 164 and the fold actuator 168 based on the shape and orientation of the wing 20 output as the action variable a_{t+1}.
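The mapping from the action variable to per-actuator commands in step S104 can be sketched as follows. The dictionary keys, the command layout, and the clamping of thrust to `max_thrust` are hypothetical choices for illustration; only the actuator names (thrust actuator 162, sweep actuator 164, fold actuator 168) come from the description.

```python
from typing import Any, Dict


def clamp(value: float, low: float, high: float) -> float:
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def make_control_commands(action: Dict[str, Any], max_thrust: float = 1.0) -> Dict[str, Dict[str, Any]]:
    """Translate the model output a_{t+1} into commands for each actuator.

    Only the entries present in `action` produce commands, mirroring the
    "in addition to or instead of" wording for thrust and wing outputs.
    """
    commands: Dict[str, Dict[str, Any]] = {}
    if "thrust" in action:
        commands["thrust_actuator_162"] = {
            # Clamping to a normalized range is an assumption, not from the source.
            "thrust": clamp(action["thrust"], 0.0, max_thrust),
            "direction": tuple(action["thrust_direction"]),
        }
    if "sweep_angle" in action:
        commands["sweep_actuator_164"] = {"angle": action["sweep_angle"]}
    if "fold_angle" in action:
        commands["fold_actuator_168"] = {"angle": action["fold_angle"]}
    return commands
```

Calling `make_control_commands({"thrust": 1.5, "thrust_direction": [0.0, 0.1]})` yields a single command for the thrust actuator with the thrust limited to 1.0.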

Next, the processing unit 170 controls the actuator 160 based on the generated control command (step S106). The desired task is thereby carried out, and as a result the state of the environment surrounding the flight device 1 changes, so that the state variable representing that state changes from s_t to s_{t+1}.

As the state variable changes from s_t to s_{t+1}, the processing unit 170 reacquires the state variable s_{t+1} at time t+1. Based on the state variable s_{t+1} at time t+1, the processing unit 170 then continues to give control commands to the target actuator 160 so that the flight device 1 keeps accomplishing the desired task. This completes the processing of this flowchart.
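The repeated cycle of acquiring the state, querying the model, and driving the actuators can be summarized as a simple loop. This is a sketch under stated assumptions: `read_state`, `policy`, and `apply_command` are hypothetical callables standing in for the sensors, the trained model MDL, and the actuator interface.

```python
from typing import Any, Callable, List


def run_control_loop(
    read_state: Callable[[], Any],
    policy: Callable[[Any], Any],
    apply_command: Callable[[Any], None],
    steps: int,
) -> List[Any]:
    """Run the acquire -> infer -> actuate cycle for a fixed number of steps.

    Returns the sequence of states s_{t+1} observed after each command.
    """
    history: List[Any] = []
    state = read_state()              # acquire s_t
    for _ in range(steps):
        action = policy(state)        # model outputs a_{t+1} from s_t
        apply_command(action)         # generate and apply the command (S104, S106)
        state = read_state()          # environment has moved on: reacquire s_{t+1}
        history.append(state)
    return history
```

With a toy one-dimensional system whose state starts at 8.0, a policy returning `-0.5 * state`, and an actuator that adds the action to the state, three iterations produce the states 4.0, 2.0, and 1.0.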

According to the embodiment described above, the processing unit 170 of the control device 100 acquires, as the state variable s_t, at least one (preferably all) of the attitude, position, velocity, and angular velocity of the flight device 1 at the current time t, together with the thrust of the thrust device 10 and its direction at the current time t. In doing so, the processing unit 170 may acquire, in addition to or instead of the thrust of the thrust device 10 and its direction at the current time t, the shape and orientation of the wing 20 at the current time t as the state variable s_t.

Upon acquiring the state variable s_t, the processing unit 170 inputs it to the deep reinforcement learning model MDL, which has been trained in advance by deep reinforcement learning. The processing unit 170 controls the flight device 1 based on the action variable a_{t+1} for the next time t+1 that the deep reinforcement learning model MDL outputs in response to the input of the state variable s_t. In this way, the flight device 1 is controlled using a deep reinforcement learning model MDL trained by deep reinforcement learning on state variables s_t that include the attitude, position, velocity, and angular velocity of the flight device 1 at the current time t and the thrust of the thrust device 10 and its direction at the current time t. Therefore, in manned flight, even if the physique (weight, height, and so on) of the user U wearing the flight device 1 varies, the flight device 1 can be suitably controlled regardless of the physique of the user U. Furthermore, even when the user leaves the flight device 1 in mid-flight and the flight switches from manned to unmanned, the flight device 1 can be suitably controlled.

For example, as described above, when a mountain rescue team wears the flight device 1 and travels by air to a rescue site on a mountain trail (destination B), it is assumed that the first rescuer, after arriving at destination B, takes off the flight device 1 and alights at destination B, the flight device 1 then returns on its own to the departure point A, and a second rescuer puts on the flight device 1 and heads to the rescue site. In such a case, if, for example, the first and second rescuers differ greatly in physique, it is difficult with conventional techniques for them to use the same flight device 1. In this embodiment, by contrast, the recurrent neural network in particular has an LSTM layer, so it can retain time-series memory and can tune its control to the user's physique from the history of control outputs and state variables. As a result, the flight device 1 can be kept flying stably in the same manner whether it is worn by a heavy user U or by a light user U.
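To make the role of the LSTM layer concrete, the sketch below implements one time step of a standard LSTM cell in scalar form. It is an illustration of the general LSTM update, not the patent's network: the parameter names are hypothetical, the cell has a single unit, and peephole connections are omitted for brevity.

```python
import math
from typing import Dict, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h_prev: float, c_prev: float, p: Dict[str, float]) -> Tuple[float, float]:
    """One step of a single-unit LSTM cell (no peephole connections).

    x      : current input (e.g. one feature of the state/control history)
    h_prev : previous hidden output
    c_prev : previous cell state, the memory carried across time steps
    p      : weights w_*, recurrent weights u_*, and biases b_* for each gate
    """
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old memory, add new information
    h = o * math.tanh(c)     # expose part of the memory as output
    return h, c
```

Feeding successive inputs through `lstm_step` while carrying `h` and `c` forward lets the cell accumulate information over time, which is the mechanism that allows a recurrent policy to infer slowly varying properties such as the carried load from the history of inputs.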

Further, for example, when the first rescuer arrives at destination B and then takes off the flight device 1 and alights at destination B, the load on the flight device 1 decreases abruptly. In such a case, it is difficult with conventional techniques to keep the flight device 1 flying stably. In this embodiment, by contrast, deep reinforcement learning is performed taking into account not the user's physique but the variation in the dynamics and response delay of the flight device 1, that is, deep reinforcement learning is performed using domain randomization. Therefore, even when the user U leaves the flight device 1 and the flight device 1 is left on its own, the flight device 1 can be kept flying stably just as when the user U was wearing it.
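Domain randomization of this kind is typically implemented by drawing new simulator parameters at the start of every training episode. The sketch below shows one possible sampling step; the parameter names and all numeric ranges are assumptions chosen for illustration and do not come from the source.

```python
import random
from dataclasses import dataclass


@dataclass
class EpisodeDynamics:
    """Hypothetical simulator parameters drawn once per training episode."""
    mass_scale: float        # multiplier on the nominal total mass
    inertia_scale: float     # multiplier on the nominal moment of inertia
    response_delay_s: float  # actuator response delay in seconds


def sample_dynamics(rng: random.Random) -> EpisodeDynamics:
    """Draw one set of randomized dynamics (all ranges are assumed values)."""
    return EpisodeDynamics(
        mass_scale=rng.uniform(0.5, 1.5),
        inertia_scale=rng.uniform(0.5, 1.5),
        response_delay_s=rng.uniform(0.02, 0.20),
    )


def sample_training_domains(num_episodes: int, seed: int = 0) -> list:
    """Produce a reproducible list of per-episode dynamics for training."""
    rng = random.Random(seed)
    return [sample_dynamics(rng) for _ in range(num_episodes)]
```

Training the policy across many such draws exposes it to a spread of dynamics wide enough that a sudden change in load during flight falls within what it has already experienced.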

Although modes for carrying out the present invention have been described above using embodiments, the present invention is in no way limited to these embodiments, and various modifications and substitutions can be made without departing from the gist of the present invention.

Description of symbols: 1: flight device, 10: thrust device, 20: wing, 30: attachment/detachment unit, 100: control device, 110: communication interface, 120: user interface, 130: sensor, 140: power supply, 150: storage unit, 160: actuator, 170: processing unit

Claims

A control device for controlling a flight device wearable by a user, the control device comprising:
a processing unit that
acquires state data regarding a state of the flight device and operation data regarding an operation of the flight device,
inputs the acquired state data and operation data into a model trained using deep reinforcement learning, and
controls the flight device based on an output result of the model into which the state data and the operation data have been input.

The control device according to claim 1, wherein
the model is a neural network trained using domain randomization.

The control device according to claim 1 or 2, wherein
the model is a recurrent neural network including a memory layer.

The control device according to claim 1 or 2, wherein
the flight device includes a jet engine,
the state data includes at least one of the attitude, position, velocity, and angular velocity of the flight device,
the operation data includes the thrust of the jet engine and the direction of the thrust, and
the processing unit controls the attitude of the flight device based on the thrust and the direction of the thrust output by the model.

The control device according to claim 4, wherein
the flight device further includes a variable wing,
the operation data further includes an operation amount of the variable wing, and
the processing unit controls the attitude of the flight device based on the thrust, the direction of the thrust, and the operation amount of the variable wing output by the model.

A control method for controlling a flight device wearable by a user using a computer, the method comprising:
acquiring state data regarding a state of the flight device and operation data regarding an operation of the flight device;
inputting the acquired state data and operation data into a model trained using deep reinforcement learning; and
controlling the flight device based on an output result of the model into which the state data and the operation data have been input.

A program for causing a computer to control a flight device wearable by a user, the program causing the computer to execute:
acquiring state data regarding a state of the flight device and operation data regarding an operation of the flight device;
inputting the acquired state data and operation data into a model trained using deep reinforcement learning; and
controlling the flight device based on an output result of the model into which the state data and the operation data have been input.