JP7358808B2

JP7358808B2 - Robots and machine learning methods

Info

Publication number: JP7358808B2
Application number: JP2019126806A
Authority: JP
Inventors: 史紘佐々木
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2019-07-08
Filing date: 2019-07-08
Publication date: 2023-10-11
Anticipated expiration: 2039-07-08
Also published as: JP2021010984A

Description

The present invention relates to a robot and a machine learning method.

As part of work automation, robots that learn through machine learning and perform autonomous control are known.

One known type of robot that uses such machine learning for autonomous control first learns a model operation performed by a human operator and then performs the operation automatically under autonomous control (see, for example, Patent Document 1).
For such a robot to perform imitation learning that correctly imitates the operator's model operation, it is important for the robot to correctly understand, as one set of learning data, both the operation the operator performed and the observation information the operator was aware of and on which that operation was based.
In imitation learning, however, it is difficult to make the robot understand what the operator is looking at, for example, so such observation information has been difficult to input as learning data.

The present invention has been made in view of the above problem, and its object is to provide a robot that performs imitation learning based on the observation information the operator was aware of during the model operation.

To solve the above problem, the robot of the present invention includes observation means capable of making a plurality of observations of the situation around the robot during learning, and selection means for selecting one of the plurality of observation results obtained by the observation means through a switching operation of the observation means that the operator performs after checking the observation results. In the learning, the observation result selected by the selection means when the switching operation is performed is linked with the movement operation of the robot that the operator inputs after the selection means has selected that observation result, and the two are learned together.

According to the present invention, imitation learning can be performed efficiently, which improves the accuracy of automatic operation under autonomous control.

FIG. 1 is a diagram showing an example of the overall configuration of a robot control system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the configuration of the operation unit shown in FIG. 1.
FIG. 3 is a block diagram showing an example of the configuration of the control system shown in FIG. 1.
FIG. 4 is a diagram showing a comparative example of imitation learning.
FIG. 5 is a diagram showing an example of the operation of the learning control unit in the present invention.
FIG. 6 is a diagram showing an example of the operation of the selection means in the present invention.
FIG. 7 is a diagram showing an example of the control operation of the autonomous control unit in the present invention.
FIG. 8 is a diagram showing an example of the configuration of a second embodiment of the present invention.
FIG. 9 is a diagram showing an example of the operation of the selection means of the second embodiment of the present invention.
FIG. 10 is a diagram showing an example of the configuration of a third embodiment of the present invention.

As a first embodiment of the present invention, FIG. 1 shows a conceptual diagram of the overall configuration of a control system 100 for a robot 101 that an operator P operates with an operation unit 10 via a network 9.

The robot 101 includes a plurality of cameras 20a and 20b, which serve as moving image acquisition means provided as one of the observation means; a movable arm 40 for picking up a designated object Q; a drive unit, composed of motors or the like, that moves the arm 40; a receiving unit 22 for receiving operation instructions from the operation unit 10; and a control unit 30, a control terminal that controls each part of the robot 101 based on the operation instructions received by the receiving unit 22.
In this embodiment, the robot 101 is described as a pick-up robot that picks up a designated object Q; however, it may be a robot for any kind of operation, as long as it can machine-learn model operations by the operator P and operate autonomously based on the learning results.

In this embodiment, the camera 20a is attached to the arm 40 and the camera 20b is fixed near the workbench; both are supported so that they can photograph the arm 40 and the object Q.
The cameras 20a and 20b are of the same type; where they need not be distinguished, they are referred to collectively as the camera 20.
Although this embodiment describes only cameras, serving as imaging units, as the observation means, the observation means may instead be any sensor or detection means that observes sound, electromagnetic waves other than visible light, or the like, and can display the result on the monitor 11 in a form the operator P can recognize.
Likewise, sensors that can capture distance information, such as infrared sensors, radar, or lidar, may be used instead of or in addition to the camera 20, provided that their data (for example, numerical readings) can be displayed on the monitor 11.

As shown in FIG. 2, the operation unit 10 includes a movement instruction unit 12, which has a plurality of buttons for specifying the movement direction of the arm 40, and a moving image operation unit 13 for switching the moving image displayed on the monitor 11 or for moving the field of view of the camera 20.
The operation unit 10 communicates with the control unit 30 and the robot 101 over a wireless or wired network 9 to send and receive data.
Specifically, moving images captured by the camera 20 of the robot 101 are transmitted to the monitor 11, and the operation instructions that the operator P enters with the movement instruction unit 12 while watching the monitor 11 are transmitted to the receiving unit 22 of the robot 101.

The monitor 11 displays moving images, which are the observation results of the camera 20. In other words, the monitor 11 functions as display means that displays, from among the plurality of moving images observed by the camera 20, the one observation result selected with the moving image operation unit 13.
That is, in this embodiment, the monitor 11 functions as a moving image display unit that displays the still images or video captured by the camera 20, which the operator P refers to when operating the robot.
Accordingly, the moving image displayed on the monitor 11 can be switched with the moving image operation unit 13 of the operation unit 10.

The movement instruction unit 12 functions as a movement direction instruction unit with, for example, four buttons: a forward button 12a, a right button 12b, a backward button 12c, and a left button 12d. Although the movement instruction unit 12 has four independent buttons in this embodiment, it may instead be a joystick or another pointing device.
In this embodiment, the receiving unit 22 is shown separately from the control unit 30, but the receiving unit 22 may be provided as one function of the control unit 30; the configuration is not limited to the one shown.
If the operation unit 10 is a tablet terminal, the monitor 11 may be built into the operation unit 10.

As shown in FIG. 3, the control unit 30 includes a machine learning control unit 31, which learns by combining the model operation performed by the operator P with the moving image displayed on the monitor 11 during that operation, and an autonomous control unit 32, which autonomously controls the arm 40 based on the data set learned by the machine learning control unit 31.
The machine learning control unit 31 saves, as one data set, the model operation of the operator P (that is, a movement operation command indicating whether the arm moved forward, backward, left, or right), the monitor display switching command issued when the model operation was performed, and the moving image frame at the time the model operation was performed.
In this embodiment, the moving image frame is stored in the machine learning control unit 31 as the observation result in the data set; however, the observation result may also include overall state information of the robot 101, such as the identifier of the selected camera 20a or 20b.
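The data set described above can be sketched as a small record type. This is a minimal illustration under assumptions: the class name `DemoRecord`, the string command vocabularies, and the encoding of a moving image frame as a flat tuple of pixel values are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical command vocabularies (assumption: the patent only names the
# forward/right/backward/left buttons 12a-12d and a display switching command).
MOVES = ("forward", "right", "backward", "left")
SWITCHES = ("none", "camera_20a", "camera_20b")

@dataclass(frozen=True)
class DemoRecord:
    """One data set: the observation result plus the operator's commands."""
    frame: Tuple[float, ...]         # moving image frame F (flattened pixels, assumed)
    move: str                        # movement operation command M
    switch: str                      # monitor display switching command G
    camera_id: Optional[str] = None  # optional overall state info, e.g. selected camera

    def __post_init__(self) -> None:
        # Reject commands outside the assumed vocabularies.
        if self.move not in MOVES:
            raise ValueError(f"unknown movement command: {self.move}")
        if self.switch not in SWITCHES:
            raise ValueError(f"unknown switching command: {self.switch}")

record = DemoRecord(frame=(0.0, 1.0, 0.0), move="right",
                    switch="camera_20b", camera_id="20b")
```

Making the record immutable reflects that a saved demonstration should not change after it is logged; the optional `camera_id` field corresponds to the "overall state information" the text allows as part of the observation result.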

Furthermore, using the accumulated data sets of moving image frames F (the observation results), movement operation commands M for the arm 40, and monitor display switching commands G, the machine learning control unit 31 learns a function whose input is a moving image frame and whose outputs are a movement operation command and a monitor display switching command.
In other words, the machine learning control unit 31 stores and accumulates "the moving image frame F, which is the accumulated observation result" and "the movement operation command M and the monitor display switching command G, which are the operation instructions issued" as one data set, and learns the function from these data sets.
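A learned function from frame F to the command pair (M, G) can be illustrated with the simplest possible learner. The sketch below assumes 1-nearest-neighbour lookup in place of the unspecified model (the patent leaves the function class open); `fit_nearest_neighbor`, the frame encoding, and the command strings are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Each demonstration pairs a frame F with the operator's commands (M, G).
Demo = Tuple[Sequence[float], Tuple[str, str]]

def fit_nearest_neighbor(demos: List[Demo]) -> Callable[[Sequence[float]], Tuple[str, str]]:
    """Return a control function frame -> (M, G).

    Assumption: 1-nearest-neighbour lookup stands in for whatever learned
    model is actually used.
    """
    if not demos:
        raise ValueError("need at least one demonstration")

    def control(frame: Sequence[float]) -> Tuple[str, str]:
        # Return the commands of the stored frame closest in Euclidean distance.
        _, best_cmds = min(demos, key=lambda d: math.dist(d[0], frame))
        return best_cmds

    return control

demos = [
    ((1.0, 0.0, 0.0), ("forward", "none")),
    ((0.0, 0.0, 1.0), ("right", "camera_20b")),
]
policy = fit_nearest_neighbor(demos)
print(policy((0.1, 0.0, 0.9)))  # ('right', 'camera_20b')
```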

Based on the control function learned by the machine learning control unit 31, when a moving image frame is input, the autonomous control unit 32 operates the robot 101 according to the output movement operation command and, following the output monitor display switching command, controls the cameras 20a and 20b so as to produce the moving image frame that is input to the control function next.
As a concrete example, suppose the machine learning control unit 31 has learned that, during a model operation, the operator P looked at frame F1 displayed on the monitor 11, sent movement operation command M1, and then issued monitor display switching command G1.
When frame F1 is input, the autonomous control unit 32 outputs movement operation command M1; once the arm 40 has moved to the prescribed position under M1, it further outputs monitor display switching command G1 and acquires the resulting next frame F2.
Thereafter, with frame F2 as the input value, it outputs movement operation command M2 and monitor display switching command G2 based on the data set of the machine learning control unit 31, and continues autonomous control step by step.

In conventional machine learning methods, various approaches have been taken to deciding which inputs and outputs to save as data sets.
One example of a data set used for machine learning is a learning method that pairs input moving image frames with movement operation commands.
As an example of such a data set, FIG. 4 shows on the left, with solid lines, the field of view of moving image frames F1' to F4' captured by the camera 20; where the object Q lies outside the field of view of the camera 20, it is drawn with a broken line. In FIGS. 4(a) to 4(c), the movement operation commands M1' to M3' corresponding to the moving image frames F1' to F3' on the left are shown on the right.
In the following description, the states on the left of FIGS. 4(a), 4(b), and 4(c) are referred to as shooting states A, B, and C, respectively.

Consider a method that learns by linking, through imitation learning, shooting states A and B of FIGS. 4(a) and 4(b) with movement operation commands for the arm 40.
First, as model operations, the operator P performs movement operation command M1' for the pattern of moving image frame F1' and movement operation command M2' for the pattern of moving image frame F2', as shown in FIGS. 4(a) and 4(b), and the machine learning control unit 31 learns these as model operations.
With this learning method, when the autonomous control unit 32 detects shooting state A as moving image frame F1', as in FIG. 4(a), it can guide the arm 40 appropriately to the position of the object Q by issuing movement operation command M1', which moves toward the center of the camera 20's view.
Similarly, when the autonomous control unit 32 detects shooting state B as moving image frame F2', as in FIG. 4(b), it can guide the arm 40 appropriately to the position of the object Q by issuing movement operation command M2', which moves toward the upper right of the camera 20's view.

However, when shooting state C is detected, in which the object Q lies outside the camera's field of view as shown by the broken line in FIG. 4(c), it is difficult to read the object Q from moving image frame F3' of FIG. 4(c).
The operator P, on the other hand, can simply look directly at the object Q (shown by the broken line) even in shooting state C, and can therefore guide the arm 40 to the object Q lying to the right, outside the camera's field of view. Consequently, during model operations, movement operation command M3' is issued even in shooting state C and is accumulated as learning data.

The following describes the problem that arises when moving image frame F3', which in fact did not influence the judgment of the operator P, is repeatedly linked with movement operation command M3'.
For example, when shooting state D of FIG. 4(d) is input as moving image frame F4', the fully trained robot 101 mistakes it for moving image frame F3' and outputs movement operation command M3', moving the arm 40 to the right, even though the actual object Q lies to the left, outside the camera's field of view, as shown by the broken line in FIG. 4(d).
That is, it is difficult to predict the optimal movement operation command from the observation result of moving image frame F4' in shooting state D.
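The ambiguity between shooting states C and D can be made concrete: if two demonstrations present the same frame but different commands, no function of the frame alone can reproduce both. The toy 1x3 frames, the function name `find_conflicts`, and the command names below are assumptions chosen only to illustrate the point.

```python
from collections import defaultdict

def find_conflicts(pairs):
    """Group (frame, command) demonstrations by frame and return the frames
    whose demonstrations disagree on the command."""
    by_frame = defaultdict(set)
    for frame, command in pairs:
        by_frame[frame].add(command)
    return {frame: cmds for frame, cmds in by_frame.items() if len(cmds) > 1}

# Hypothetical 1x3 frames; 1 marks the object Q. In states C and D, Q is
# outside the camera's view, so camera 20 records the same empty frame.
EMPTY = (0, 0, 0)
pairs = [
    ((0, 1, 0), "forward"),  # state A: Q visible
    (EMPTY, "right"),        # state C: operator saw Q to the right directly
    (EMPTY, "left"),         # state D: Q actually lies to the left
]
conflicts = find_conflicts(pairs)
print({frame: sorted(cmds) for frame, cmds in conflicts.items()})
# {(0, 0, 0): ['left', 'right']}
```

Any frame-only policy must return a single command for `(0, 0, 0)`, so it errs on at least one of the two demonstrations.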

Alternatively, if the model operations include many learning patterns like those in FIGS. 4(c) and 4(d), shooting states C and D cannot be distinguished and the movement operation command is not uniquely determined, so learning may fail.

To solve this problem, the observation information on which the operator P based the operation (that is, the observation result the operator P is looking at) should match the observation result the robot 101 can recognize (the field of view of the camera 20 in the comparative example above).
However, if the operator P simply operates while looking only at the monitor 11 during model operations, it is difficult to perform the model operations of FIGS. 4(c) and 4(d), and those cases end up excluded from the learning patterns altogether.

To solve this problem, in the present invention the machine learning control unit 31 learns by linking the monitor display switching command G1 in addition to the moving image frame F1 and the movement operation command M1 of the arm 40.
A learning method using the monitor display switching command G is described below with reference to the flowchart of FIG. 5 and the operation examples illustrated in FIGS. 6(a) to 6(c).
The left side of FIGS. 6(a) to 6(c) shows example images from the camera 20a, and the right side shows example images from the camera 20b. The center of each of FIGS. 6(a) to 6(c) schematically shows, with arrows, the transitions of the screen displayed on the monitor 11 when the monitor display switching commands G1, G2, and G3 corresponding to the cameras 20a and 20b are pressed.
In the initial state, in which the operator P has confirmed shooting state C shown in FIG. 4(c), the operator P first uses the moving image operation unit 13 to input monitor display switching command G1, appropriately setting the field of view of the camera 20 displayed on the monitor 11 (step S101).
Monitor display switching command G1 may, for example, switch to either the camera 20a or the camera 20b, or it may move either camera so as to change from shooting state C to shooting state B among the shooting states shown in FIGS. 4(a) to 4(c).
Movement of the camera 20 may be, for example, pan, tilt, or roll, or any other operation that changes the field of view of the camera 20.
In either case, monitor display switching command G1 brings the position of the object Q within moving image frame F2 (step S102).
Such image selection can be performed, for example, by switching from the image of the camera 20a to the image of the camera 20b using monitor display switching commands G1, G2, and G3, as shown in FIGS. 6(a) to 6(c).
In the model operation of the operator P in this embodiment, when shooting state C of FIG. 4(c), that is, an image like the one on the left of FIG. 6(a), is displayed, the operator P uses monitor display switching command G1 to switch the monitor 11 from moving image frame F1 captured by the camera 20a to moving image frame F1 captured by the camera 20b, and then inputs movement operation command M1.
The machine learning control unit 31 links monitor display switching command G1, issued while moving image frame F1 was displayed, with movement operation command M1, issued after the display of the monitor 11 was switched to the other moving image frame F1, and saves them as a data set for learning (step S103).
Because "the moving image frame F1 that the operator P wanted to switch to" is reached through "the monitor display switching command G1 issued while moving image frame F1 was displayed," it functions as "the observation result selected by the selection means when the model operation was performed."
After movement operation command M1 is input, the moving image frames captured by the cameras 20a and 20b change to moving image frame F2, as shown in FIG. 6(b). With moving image frame F2 as the new observation result, learning loops through movement operation command M2, monitor display switching command G2, and so on, until it is determined that the model operation has ended (step S104). For example, when the frame changes to moving image frame F3 after movement operation command M2, as shown in FIG. 6(c), the next loop uses moving image frame F3 as the new observation result. In this embodiment, the end determination in step S104 concludes one learning pattern when the object Q has been picked up.
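The recording procedure of FIG. 5 can be sketched as a loop in which each step links the view the operator switched to with the movement command issued afterwards. This is a minimal sketch under assumptions: all callables (`capture`, `choose_view`, `choose_move`, `apply_move`), the camera identifiers, and the string frames are hypothetical stand-ins for the robot, the cameras 20a and 20b, and the operator P.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    shown: str     # frame on monitor 11 before the switching operation
    switch: str    # monitor display switching command G ("none" = no switch)
    selected: str  # frame after switching: the observation the operator acts on
    move: str      # movement operation command M issued after the switch

def record_demonstration(
    capture: Callable[[str], str],
    choose_view: Callable[[str], str],
    choose_move: Callable[[str], str],
    apply_move: Callable[[str], bool],
    camera: str = "20a",
    max_steps: int = 100,
) -> List[Step]:
    log: List[Step] = []
    for _ in range(max_steps):
        shown = capture(camera)
        switch = choose_view(shown)       # S101: operator selects a view
        if switch != "none":
            camera = switch
        selected = capture(camera)
        move = choose_move(selected)      # operator moves the arm based on that view
        log.append(Step(shown, switch, selected, move))  # S103: save as one data set
        if apply_move(move):              # S104: stop once Q has been picked up
            break
    return log

# Toy usage: frames are labelled "<camera>@t<number of moves so far>".
state = {"t": 0}

def capture(camera: str) -> str:
    return f"{camera}@t{state['t']}"

def choose_view(frame: str) -> str:
    return "20b" if frame.startswith("20a") else "none"

def choose_move(frame: str) -> str:
    return "right"

def apply_move(move: str) -> bool:
    state["t"] += 1
    return state["t"] >= 2  # pretend Q is picked up after two moves

log = record_demonstration(capture, choose_view, choose_move, apply_move)
```

Storing both `shown` and `selected` keeps the switching command G tied to the frame that prompted it, while the movement command M is tied to the frame the operator actually chose to look at.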

Needless to say, by treating each such learning pattern as one set and accumulating and storing the learning results of multiple patterns in the machine learning control unit 31, the control function used by the autonomous control unit 32 can be derived.
The control function obtained by this learning may take various forms, such as a convolutional neural network, a feedforward neural network, or a linear function; any function used in so-called machine learning will do, and the configuration is not limited to these.
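As one possible instance of the "linear function" option, the control function can be fitted with a multiclass perceptron that treats each (M, G) pair as one class. The training rule, the feature vectors, and the names `train_linear_policy` and `predict` are assumptions; the patent does not specify how the function is trained.

```python
from typing import Dict, List, Sequence, Tuple

Label = Tuple[str, str]  # (movement command M, switching command G)

def predict(weights: Dict[Label, List[float]], x: Sequence[float]) -> Label:
    """Return the (M, G) label whose linear score w . [x, 1] is highest."""
    xb = list(x) + [1.0]  # append a constant feature for the bias
    return max(weights, key=lambda lbl: sum(w * v for w, v in zip(weights[lbl], xb)))

def train_linear_policy(data: List[Tuple[Sequence[float], Label]],
                        epochs: int = 20) -> Dict[Label, List[float]]:
    """Multiclass perceptron: one weight vector (with bias) per (M, G) label."""
    labels = sorted({label for _, label in data})
    dim = len(data[0][0]) + 1
    weights = {label: [0.0] * dim for label in labels}
    for _ in range(epochs):
        for x, y in data:
            pred = predict(weights, x)
            if pred != y:
                xb = list(x) + [1.0]
                # Move the true label's weights toward x and the wrong one away.
                weights[y] = [w + v for w, v in zip(weights[y], xb)]
                weights[pred] = [w - v for w, v in zip(weights[pred], xb)]
    return weights

data = [((1.0, 0.0), ("forward", "none")),
        ((0.0, 1.0), ("right", "camera_20b"))]
weights = train_linear_policy(data)
print(predict(weights, (0.2, 0.8)))  # ('right', 'camera_20b')
```

Treating (M, G) as a joint label keeps the sketch short; separate output heads for M and G would be an equally reasonable design.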

As described above, the learning method performed by the machine learning control unit 31 in this embodiment includes an operation step S102 in which the operator P performs an operation; a selection step S101 in which at least one of the observation results obtained from the plurality of observation means is selected for the operation step S102; and a learning step S103 in which the moving image frame F selected in the selection step S101, the movement operation command M entered by the operator in the operation step S102, and the monitor display switching command G are saved and accumulated as one data set.
This configuration enables efficient machine learning based on model operations by the operator P.

The operation of the autonomous control unit 32 of the robot 101 after machine learning has been completed through such repeated learning from model operations is described with reference to FIG. 7.
First, the autonomous control unit 32 checks moving image frame F1 on the monitor 11 as the initial input value (step S201).
Using the learning data set stored in the machine learning control unit 31 and the control function based on it, the autonomous control unit 32 outputs monitor display switching command G1 and movement operation command M1 for the input moving image frame F1 (step S202).
The autonomous control unit 32 checks whether monitor display switching command G1 and movement operation command M1 have completed all of the operations performed in the model operation (step S203).
The autonomous control unit 32 then sequentially executes loop processing, taking moving image frame F2, as changed by monitor display switching command G1, as the new input value to the control function and outputting monitor display switching command G2 and movement operation command M2 (step S204).
In this way, by imitation learning of the model operation of the operator P, the robot 101 can autonomously complete the operation of picking up the object Q.
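The autonomous loop of FIG. 7 (steps S201 to S204) can be sketched as follows: the switching command output by the control function decides which view produces the next input frame. This is a minimal sketch, not the patent's implementation; `run_autonomous`, `policy`, `capture`, and `apply_move` are hypothetical, and a hand-written policy stands in for the learned function.

```python
from typing import Callable, List, Tuple

def run_autonomous(
    policy: Callable[[str], Tuple[str, str]],
    capture: Callable[[str], str],
    apply_move: Callable[[str], bool],
    camera: str = "20a",
    max_steps: int = 50,
) -> List[Tuple[str, str, str]]:
    trace: List[Tuple[str, str, str]] = []
    frame = capture(camera)                 # S201: initial input frame
    for _ in range(max_steps):
        switch, move = policy(frame)        # S202: control function outputs G and M
        trace.append((frame, switch, move))
        if switch != "none":
            camera = switch                 # G decides which view is observed next
        if apply_move(move):                # S203: all operations completed?
            break
        frame = capture(camera)             # S204: new frame becomes the next input
    return trace

# Toy usage with a hand-written policy standing in for the learned function.
state = {"t": 0}

def capture(camera: str) -> str:
    return f"{camera}@t{state['t']}"

def policy(frame: str) -> Tuple[str, str]:
    return ("20b", "right") if frame.startswith("20a") else ("none", "right")

def apply_move(move: str) -> bool:
    state["t"] += 1
    return state["t"] >= 2  # pretend Q is picked up after two moves

trace = run_autonomous(policy, capture, apply_move)
```

The `max_steps` cap is a safety assumption so that a policy that never completes the task cannot loop forever.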

In this embodiment, the robot 101 learns from model operations by the operator P and performs autonomous control based on the learning results. It has the cameras 20a and 20b, as observation means capable of making a plurality of observations of the situation around the robot 101 during learning, and the moving image operation unit 13, as selection means for selecting one of the plurality of observation results obtained by the cameras 20a and 20b.
During learning of the robot 101, the machine learning control unit 31 learns by linking the movement operation command M of the operator P with the moving image frame F1 selected with the moving image operation unit 13 when the operation was performed.
With this configuration, the operator P can use the moving image operation unit 13 to select, from among the plurality of moving image frames F, the frame best suited to issuing the movement operation command M, so imitation learning can be performed efficiently and the accuracy of automatic operation under autonomous control improves.

In this embodiment, the cameras 20a and 20b are provided as a plurality of observation means, and the moving image operation unit 13 can select the moving image captured by one of the cameras 20a and 20b as the observation result.
This configuration likewise lets the operator P select the moving image frame F best suited to issuing the movement operation command M, so imitation learning can be performed efficiently and the accuracy of automatic operation under autonomous control improves.

As a second embodiment of the present invention, a case using a camera 21 that can be driven independently of the arm 40 is described with reference to FIG. 8.
In the second embodiment, components identical to those of the first embodiment are given the same reference numerals, and their description is omitted.

The camera 21 has a camera arm 24 that can move freely, for example by panning, tilting, and rolling.
By operating the camera arm 24, the moving image operation unit 13 can adjust the field of view of the camera 21 to any position.
When the operator P initially obtains a moving image frame F1 such as the one shown in FIG. 9(a), the operator P, as part of the model operation, first uses the moving image operation unit 13 to pan, tilt, roll, and so on, and this operation is input as a monitor display switching command G1.
The machine learning control unit 31 takes the field of view of the camera 21, as changed by the monitor display switching command G1, as a new moving image frame F2, and stores it together with the monitor display switching command G2 and the movement operation command M2, which are the operator P's next operations, as one data set representing the model operation.
Here, the moving image frame F2 is the image that the operator P wanted to see during the model operation, that is, "an observation result arbitrarily selected with the moving image operation unit 13." In other words, the moving image frame F2 is the "observation information grasped by the operator P" on which the operator P's movement operation command M2 is based.
The machine learning control unit 31 derives a control function from these data sets. The autonomous control unit 32 performs autonomous control based on the control function and completes the series of operations of picking up the object Q so as to imitate the model operation performed by the operator P.
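The second embodiment's bookkeeping, and one possible form of the "control function", can be sketched as follows. The patent does not say which learning algorithm derives the control function; 1-nearest-neighbour behaviour cloning is swapped in here purely for illustration. The relative pan/tilt/roll command format, the feature vectors standing in for frames, the `capture` callback, and all class names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class CameraPose:
    """Pan/tilt/roll of camera 21 in degrees; also reused for relative switching commands G."""
    pan: float
    tilt: float
    roll: float

    def apply(self, g: "CameraPose") -> "CameraPose":
        # Assumption: a switching command is a relative pan/tilt/roll offset.
        return CameraPose(self.pan + g.pan, self.tilt + g.tilt, self.roll + g.roll)


@dataclass(frozen=True)
class Sample:
    frame_features: Vector  # features of the frame F the operator was viewing
    switch_cmd: CameraPose  # operator's next monitor display switching command G
    move_cmd: Vector        # operator's next movement operation command M


class DemoRecorder:
    """Logs a model operation: apply G, capture the new frame, pair it with the next G and M."""

    def __init__(self, capture: Callable[[CameraPose], Vector]):
        self.capture = capture  # hypothetical camera interface: pose -> frame features
        self.pose = CameraPose(0.0, 0.0, 0.0)
        self.samples: List[Sample] = []

    def switch_view(self, g: CameraPose) -> Vector:
        """Apply a switching command (e.g. G1) and return the new frame (e.g. F2)."""
        self.pose = self.pose.apply(g)
        return self.capture(self.pose)

    def record(self, frame: Vector, next_switch: CameraPose, next_move: Vector) -> None:
        """Store one data set: the frame plus the commands the operator issued next."""
        self.samples.append(Sample(frame, next_switch, next_move))


class NearestNeighbourPolicy:
    """1-NN behaviour cloning: replay the commands recorded for the most similar frame."""

    def __init__(self, samples: Sequence[Sample]):
        if not samples:
            raise ValueError("need at least one demonstration sample")
        self.samples = list(samples)

    def __call__(self, features: Vector) -> Tuple[CameraPose, Vector]:
        best = min(self.samples, key=lambda s: math.dist(s.frame_features, features))
        return best.switch_cmd, best.move_cmd
```

During autonomous control, such a policy would be called on the features of the current frame, and its `(G, M)` output would be forwarded to the camera arm 24 and the arm 40. A nearest-neighbour lookup is chosen only because it makes the frame-to-command association easy to see; any supervised regressor could fill the same role.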

In the second embodiment, the robot 101 has at least one camera 21 that moves independently of the movement of the arm 40, and the moving image operation unit 13 can select, as the observation result, a moving image captured by the camera 21 at any position.
Thus, even with only one camera 21 serving as the imaging unit, if that camera can be moved to any position, the same effect as "having a plurality of observation means" is obtained, and the moving image operation unit 13 can select one of the resulting selectable observation results.
With this configuration, the operator P can use the moving image operation unit 13 to select, from among the plurality of moving image frames F, a moving image frame F suitable for issuing the movement operation command M, so imitation learning can be performed efficiently and the accuracy of automatic operation under autonomous control improves.
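The equivalence argued above can be illustrated by discretizing the reachable poses of the single camera 21 into a finite set of presets, each of which behaves like a separate fixed camera. The patent allows any position; the pan/tilt grid, the label format, and the function names below are assumptions made only for illustration.

```python
from itertools import product
from typing import Dict, Iterable, Tuple

Pose = Tuple[float, float]  # (pan, tilt) in degrees


def preset_views(pans: Iterable[float], tilts: Iterable[float]) -> Dict[str, Pose]:
    """Label every (pan, tilt) combination of one movable camera as its own selectable view."""
    return {
        f"pan{p:+.0f}_tilt{t:+.0f}": (float(p), float(t))
        for p, t in product(pans, tilts)
    }


def select_view(views: Dict[str, Pose], label: str) -> Pose:
    """Return the camera-arm target pose for the view the operator picked."""
    if label not in views:
        raise KeyError(f"no such view: {label}")
    return views[label]
```

For example, three pan values and two tilt values give six selectable views from one camera, which the selection logic can treat exactly like six fixed cameras.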

Furthermore, as a third embodiment, a configuration using a spherical (omnidirectional) camera 23 attached to the tip of the arm 40 will be described.

The spherical camera 23 is attached to the tip of the arm 40 and functions as an imaging unit capable of capturing spherical or hemispherical images.
With the moving image operation unit 13, any position in the captured spherical or hemispherical moving image can be selected for cropping, enlargement, and the like, and the monitor 11 can display part or all of the selected spherical image.
Even when such a spherical camera 23 is used, when the moving image frame F1 is input, the machine learning control unit 31 saves and accumulates the corresponding monitor display switching command G1 from the moving image operation unit 13 and the movement operation command M1 as one data set.
The operation of the autonomous control unit 32 and the learning process are the same as described for the first or second embodiment, so their description is omitted.
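A minimal sketch of how a selected viewing direction could be turned into a crop of a frame from the spherical camera 23. The patent does not state the projection or the crop method. This sketch assumes an equirectangular layout (column 0 at yaw -180°, row 0 at pitch +90°) and takes a plain rectangle in that layout, which only approximates a true perspective view and distorts near the poles. `equirect_window` is a hypothetical name.

```python
from typing import List, Tuple


def equirect_window(width: int, height: int, yaw_deg: float, pitch_deg: float,
                    hfov_deg: float, vfov_deg: float) -> Tuple[List[int], range]:
    """Pixel columns and rows of an equirectangular frame covering a yaw/pitch viewing window.

    Columns wrap around the 360-degree seam; rows are clamped at the poles.
    """
    px_per_deg_x = width / 360.0
    px_per_deg_y = height / 180.0
    # Leftmost column of the window, measured from yaw = -180 degrees.
    x0 = round((yaw_deg - hfov_deg / 2 + 180.0) * px_per_deg_x)
    n_cols = min(width, round(hfov_deg * px_per_deg_x))
    cols = [(x0 + i) % width for i in range(n_cols)]
    # Top and bottom rows, measured downward from pitch = +90 degrees.
    y0 = max(0, round((90.0 - (pitch_deg + vfov_deg / 2)) * px_per_deg_y))
    y1 = min(height, round((90.0 - (pitch_deg - vfov_deg / 2)) * px_per_deg_y))
    return cols, range(y0, y1)
```

With a NumPy frame of shape `(height, width, 3)`, the crop would be `frame[rows.start:rows.stop, cols]`. The explicit column list handles a window that straddles the 360° seam, which a single slice could not.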

When, for example, a spherical image from the spherical camera 23 is used, the moving image operation unit 13 and the monitor 11 may be integrated into a head-up display (HUD) 15, as shown in FIG. 10.
In that case, the operator P, wearing the head-up display 15 and a pointing device 16, can complete the model operation simply by making a motion such as picking up the object Q located somewhere in the spherical image.

Furthermore, when the spherical camera 23 and the head-up display 15 are used in this way, the head-up display 15 keeps the position of the operator P's line of sight, or the operator's field of view, aligned at all times with a portion of the spherical image captured by the spherical camera 23. The selection means and the display means can therefore be integrated, and the model operation can be performed with a simple configuration.

Although preferred embodiments of the present invention have been described above, the present invention is not limited to these specific embodiments, and unless otherwise specifically limited in the above description, various modifications and changes are possible within the scope of the gist of the present invention described in the claims.

For example, in the above embodiments, the robot 101 was described as a robot that drives the arm 40 to pick up the object Q, but the present invention may also be applied to robots with other drive modes.

Further, although only cameras were described as observation means in the above embodiments, the robot may instead observe sound, electromagnetic waves other than visible light, or the like, and perform control based on those observation results.
Further, the learning method used in the above embodiments may be executed by a program that executes each of the above-described means in steps S101 to S104, or by various storage media on which that program is stored.

The effects described in the embodiments of the present invention are merely a list of the most favorable effects arising from the present invention, and the effects of the present invention are not limited to those described in the embodiments.

10 Operation unit
11 Video display unit (monitor)
12 Movement instruction unit
13 Selection means (moving image operation unit)
20 Observation means (camera)
20a, 20b Observation means (cameras)
21 Observation means (camera)
22 Receiving unit
23 Observation means (spherical camera)
24 Camera arm
30 Control unit (control terminal)
31 Learning control unit
32 Autonomous control unit
100 Control system
101 Robot
F Example of observation result
P Operator
Q Object

Japanese Patent No. 5361756

Claims

A robot that learns based on operations by an operator and performs autonomous control based on the results of the learning, the robot comprising:
observation means capable of observing a plurality of situations around the robot when the learning is performed; and
selection means for selecting one of a plurality of observation results observed by the observation means through a switching operation of the observation means that the operator performs after checking the observation results,
wherein, in the learning, the observation result selected by the selection means when the switching operation is performed is linked with the movement operation of the robot by the operator that is input after the observation result is selected by the selection means, and the two are learned together.

The robot according to claim 1,
The observation means has a plurality of imaging units,
The robot is characterized in that the selection means is capable of selecting a moving image photographed by one of the imaging units as the observation result.

The robot according to claim 1 or 2,
The observation means has at least one imaging unit that moves with the movement of the robot,
The robot is characterized in that the selection means is capable of selecting a moving image taken by the imaging unit at an arbitrary position as the observation result.

The robot according to any one of claims 1 to 3, further comprising
an autonomous control unit that causes the robot to operate autonomously based on the learning result,
wherein the autonomous control unit performs machine learning based on a data set consisting of a pair of the observation result selected by the selection means when the switching operation is performed and the operation of the robot by the operator that is input after the observation result is selected by the selection means.

The robot according to any one of claims 1 to 4,
The robot is characterized in that the observation means includes an imaging unit capable of capturing a spherical image or a hemispherical image.

The robot according to claim 5,
The robot is characterized in that the selection means is capable of selecting a predetermined range of the spherical image or the hemispherical image as the observation result.

A machine learning method comprising:
a selection step of selecting at least one of the observation results obtained from a plurality of observation means through a switching operation of the observation means that an operator performs after checking the observation results;
an operation step of performing a movement operation of a robot by the operator, input after the observation result is selected in the selection step; and
a learning step of storing and accumulating, as a data set, the observation result selected in the selection step and the operation instruction given by the operator in the operation step.