JP2019214112A

JP2019214112A - Machine learning device, and robot system equipped with the same

Info

Publication number: JP2019214112A
Application number: JP2018113672A
Authority: JP
Inventors: 国宗駒池; Kunimune Komaike
Original assignee: Yamaha Motor Co Ltd
Current assignee: Yamaha Motor Co Ltd
Priority date: 2018-06-14
Filing date: 2018-06-14
Publication date: 2019-12-19
Anticipated expiration: 2038-06-14
Also published as: JP7102241B2

Abstract

To provide a machine learning device capable of learning a robot motion that makes it possible to hold a workpiece which a hand part has been determined to be unable to hold because no holding space is secured.

SOLUTION: The motion of a robot (2) provided in a robot system (1) is controlled on the basis of a learning result of a machine learning device (5). The machine learning device (5) includes a determination part (7) and a learning part (63). The determination part (7) determines whether the workpiece that is the next candidate for holding by a hand part (26) is a non-holdable workpiece for which no holding space is secured. The learning part (63) learns a displacement technique capable of displacing the non-holdable workpiece so that a holding space is secured, and learns an action pattern of the robot (2) that uses the displacement technique.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a machine learning device that learns the motion of a robot that takes out workpieces from a randomly piled state, and to a robot system including the machine learning device.

As a system for taking out workpieces from a container that holds a plurality of workpieces in a randomly piled state, a robot system in which a robot equipped with a hand unit takes out the workpieces is known (see Patent Document 1). The robot system disclosed in Patent Document 1 includes a machine learning device that learns the robot's take-out motion. The machine learning device learns the robot motion for taking a workpiece out of the container on the basis of teacher data that associates the robot motion corresponding to a three-dimensional map of the workpieces, measured by a three-dimensional measuring instrument, with the result of determining whether the take-out succeeded or failed.

When the operation of taking workpieces out of the container is repeated, it may become impossible for the hand unit to hold the workpiece that is the next holding candidate. For example, when a workpiece lies close to the inner surface of the container, or when a plurality of workpieces lie close to one another, no holding space that would allow the hand unit to hold the workpiece is secured, and a workpiece in such a situation cannot be held by the hand unit.
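The two situations described above (a workpiece close to the container wall, or close to another workpiece) can be illustrated with a simple clearance check. This is only a minimal sketch and not the determination method of the invention: the circular container of radius `container_radius` centred at the origin, the reduction of each workpiece to a single XY point, and the `clearance` threshold are all assumptions introduced for illustration.

```python
import math


def has_holding_space(target, others, container_radius, clearance):
    """Return True if the target workpiece has room for the hand on all sides.

    Assumptions (hypothetical, not from the patent text):
    - the container is a circle of radius `container_radius` centred at (0, 0);
    - each workpiece is reduced to one (x, y) point;
    - `clearance` is the free distance the hand needs around the target.
    """
    tx, ty = target
    # Case 1: the workpiece lies too close to the inner surface of the container.
    wall_gap = container_radius - math.hypot(tx, ty)
    if wall_gap < clearance:
        return False
    # Case 2: another workpiece lies too close to the target workpiece.
    return all(math.hypot(tx - ox, ty - oy) >= clearance for ox, oy in others)


# A workpiece in the middle of the container with a distant neighbour.
free_case = has_holding_space((0.0, 0.0), [(5.0, 0.0)], 10.0, 2.0)
# A workpiece 1 unit away from the wall.
wall_case = has_holding_space((9.0, 0.0), [], 10.0, 2.0)
# Two workpieces 1 unit apart.
crowded_case = has_holding_space((0.0, 0.0), [(1.0, 0.0)], 10.0, 2.0)
```

Under these assumptions only the first case leaves a holding space; the other two correspond to workpieces that the hand unit could not hold.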

A technique for enabling the hand unit to hold a workpiece for which no holding space is secured is disclosed in, for example, Patent Document 2. In the technique disclosed in Patent Document 2, the workpieces in the container are stirred up by the hand unit. However, because the hand unit stirs the workpieces at random, a sufficient holding space may not be secured, and holding by the hand unit may still not become possible.

Patent Document 1: JP 2017-64910 A
Patent Document 2: JP 2011-115930 A

The present invention has been made in view of these circumstances, and its object is to provide a machine learning device capable of learning a robot motion that makes it possible to hold a workpiece that the hand unit cannot hold because no holding space is secured, and a robot system including the machine learning device.

A machine learning device according to one aspect of the present invention learns the motion of a robot having a hand unit that takes out workpieces, by holding them, from a container that holds a plurality of workpieces in a randomly piled state. The machine learning device includes: a determination unit that, before or when the hand unit holds one workpiece in the container, recognizes how each workpiece is accommodated in the container and determines whether the workpiece that is the next holding candidate for the hand unit is a non-holdable workpiece, that is, a workpiece around which no holding space allowing the hand unit to hold it is secured; a learning unit that learns a displacement method capable of displacing the non-holdable workpiece so that the holding space is secured, and learns an action pattern of the robot that uses the displacement method; and an action determination unit that determines the action pattern of the robot based on the learning result of the learning unit as the action pattern for making the non-holdable workpiece holdable by the hand unit.

With this machine learning device, when the determination unit determines that the workpiece that is the next holding candidate for the hand unit is a non-holdable workpiece, the learning unit learns a displacement method capable of displacing the non-holdable workpiece so that a holding space is secured, and learns an action pattern of the robot that uses the displacement method. The learning unit can thereby learn an action pattern of the robot, using a predetermined displacement method, that makes it possible to hold a workpiece that the hand unit could not hold. The action determination unit then determines the action pattern of the robot based on the learning result of the learning unit as the action pattern for making the non-holdable workpiece holdable by the hand unit. When the robot operates in accordance with this action pattern, a holding space that allows holding by the hand unit is secured around the workpiece that could not be held, and the hand unit can hold that workpiece. Stopping the robot because of a non-holdable workpiece can therefore be avoided as far as possible, and the operation of taking workpieces out of the container with the hand unit can continue.

In the above machine learning device, the displacement method may include a plurality of methods that differ in how they displace the non-holdable workpiece, and the learning unit may learn an action pattern of the robot in which the plurality of methods are combined.

In the above machine learning device, the displacement method may be a method in which the hand unit displaces the non-holdable workpiece by moving while the one workpiece it holds is kept in contact with the non-holdable workpiece. In that case, the action elements that define the action pattern of the robot learned by the learning unit may include an element that determines the position at which the one workpiece contacts the non-holdable workpiece and an element that determines the movement trajectory of the hand unit.

In the above machine learning device, the displacement method may be a method in which the hand unit displaces the non-holdable workpiece by moving while holding the container. In that case, the action elements that define the action pattern of the robot learned by the learning unit may include an element that determines the position at which the hand unit holds the container, an element that determines the movement trajectory of the hand unit, and an element that determines the movement speed of the hand unit.

A robot system according to another aspect of the present invention includes: a robot having a hand unit that takes out workpieces, by holding them, from a container that holds a plurality of workpieces in a randomly piled state; the above machine learning device, which learns the motion of the robot; and a control device that controls the motion of the robot on the basis of the learning result of the machine learning device.

This robot system includes the above machine learning device, which can learn an action pattern of the robot that makes it possible to hold a workpiece that the hand unit could not hold. Stopping the robot because of a non-holdable workpiece is therefore avoided as far as possible, and the robot can continue the operation of taking workpieces out of the container with the hand unit.

As described above, the present invention can provide a machine learning device capable of learning a robot motion that makes it possible to hold a workpiece that the hand unit cannot hold because no holding space is secured, and a robot system including the machine learning device.

A block diagram showing the configuration of a robot system according to an embodiment of the present invention.
A diagram showing an example of the robot provided in the robot system.
A diagram for explaining the operation of the state observation unit of the machine learning device provided in the robot system.
A diagram for explaining the operation of the action observation unit of the machine learning device.
A diagram for explaining the action elements that define the action pattern of the robot.
A diagram for explaining the displacement methods for displacing a non-holdable workpiece.
A diagram for explaining a first example of the displacement operation for displacing a non-holdable workpiece.
A diagram for explaining the operation of the displacement amount observation unit of the machine learning device.
A diagram for explaining the learning result information generated by the learning unit in the displacement operation of the first example.
A flowchart showing the operation of the machine learning device for the displacement operation of the first example.
A diagram for explaining a modification of the action pattern of the robot in the displacement operation of the first example.
A diagram for explaining a second example of the displacement operation for displacing a non-holdable workpiece.
A diagram for explaining the learning result information generated by the learning unit in the displacement operation of the second example.
A flowchart showing the operation of the machine learning device for the displacement operation of the second example.
A diagram for explaining the learning result information generated by the learning unit in the displacement operation of the third example.
A flowchart showing the operation of the machine learning device for the displacement operation of the third example.

[Overall configuration of the robot system]
FIG. 1 is a block diagram showing the configuration of a robot system 1 according to an embodiment of the present invention. The robot system 1 includes a robot 2, an imaging device 3, a control device 4, and a machine learning device 5. In the robot system 1, the machine learning device 5 learns the motion of the robot 2 on the basis of image data output from the imaging device 3, and the control device 4 controls the motion of the robot 2 on the basis of the learning result.

First, the robot 2 will be described with reference to FIG. 2. FIG. 2 is a diagram showing an example of the robot 2 provided in the robot system 1. The robot 2 takes out workpieces W from a container CN that holds a plurality of workpieces W in a randomly piled state. The container CN is formed in a bottomed cylindrical shape that is open at the top. The robot 2 takes out the workpieces W through the opening at the top of the container CN.

The robot 2 is not particularly limited as long as it has a hand unit capable of taking the workpieces W out of the container CN; for example, a vertical articulated robot, a horizontal articulated robot, or a dual-arm articulated robot can be used. The configuration of the robot 2 is described below using the six-axis vertical articulated robot shown in FIG. 2 as an example. The number of axes of the vertical articulated robot is not limited to six and may be any number. The robot 2 includes a base 21, a body 22, a first arm 23, a second arm 24, a wrist 25, and a hand unit 26.

The base 21 is a box that is fixed to a floor, a stand, or the like and houses a drive motor and other components (not shown). The body 22 is arranged on the upper surface of the base 21 so as to be rotatable in both the forward and reverse directions about a first axis 2A extending in the vertical (up-down) direction. The first arm 23 is an arm member of a predetermined length, and one longitudinal end of it is attached to the body 22 via a second axis 2B extending in the horizontal direction. The first arm 23 is rotatable in both the forward and reverse directions about the second axis 2B.

The second arm 24 includes an arm base 241 and an arm portion 242. The arm base 241 is the base part of the second arm 24 and is attached to the other longitudinal end of the first arm 23 via a third axis 2C that extends horizontally and parallel to the second axis 2B. The arm base 241 is rotatable in both the forward and reverse directions about the third axis 2C. The arm portion 242 is an arm member of a predetermined length, and one longitudinal end of it is attached to the arm base 241 via a fourth axis 2D perpendicular to the third axis 2C. The arm portion 242 is rotatable in both the forward and reverse directions about the fourth axis 2D.

The wrist 25 is attached to the other longitudinal end of the arm portion 242 via a fifth axis 2E that extends horizontally and parallel to the second axis 2B and the third axis 2C. The wrist 25 is rotatable in both the forward and reverse directions about the fifth axis 2E.

The hand unit 26 is the part of the robot 2 that takes the workpieces W out of the container CN, and is attached to the wrist 25 via a sixth axis 2F perpendicular to the fifth axis 2E. The hand unit 26 is rotatable in both the forward and reverse directions about the sixth axis 2F. The hand unit 26 is not particularly limited as long as it can hold a workpiece W in the container CN; for example, it may have a plurality of claw portions that grip and hold the workpiece W, or it may have an electromagnet or a negative-pressure generator that exerts an attractive force on the workpiece W. In this embodiment, the hand unit 26 has a plurality of claw portions 261 and takes out a workpiece W in the container CN by holding (gripping) it with the claw portions 261.

Next, the imaging device 3 is a device that images the entire interior of the container CN from above so that all of the workpieces W in the container CN fall within its field of view, and outputs image data that includes position information on the workpieces W. In this embodiment, as shown in FIG. 1, the imaging device 3 is a three-dimensional measuring instrument, such as a three-dimensional vision sensor, that includes a camera 31 and an image processing unit 32. The camera 31 images the entire interior of the container CN from above and acquires an image containing the image region of each of the workpieces W in the container CN. The image processing unit 32 processes the image acquired by the camera 31 to generate image data that includes three-dimensional position information on each workpiece W. The three-dimensional position information on each workpiece is expressed, for example, as coordinate values (X, Y, Z) in an XYZ orthogonal coordinate system. Here, the XYZ orthogonal coordinate system is one whose axes are arranged so that the plane containing the X axis and the Y axis (the XY plane) is horizontal and the Z axis is perpendicular to the XY plane. The image data output from the imaging device 3 is input to a displacement amount observation unit 64 and a determination unit 7 of the machine learning device 5, both described later.

Next, the control device 4 controls the operation of the robot 2 and also controls the operation of the imaging device 3. The control device 4 controls the operation of the robot 2 on the basis of information generated by an action determination unit 9 of the machine learning device 5, described later.

[Configuration of the machine learning device]
Next, the machine learning device 5 will be described. As shown in FIG. 1, the machine learning device 5 includes a learning processing unit 6 that executes a learning process for learning the motion of the robot 2 (machine learning), a determination unit 7, a storage unit 8, and an action determination unit 9. The learning method executed by the machine learning device 5 is not particularly limited; for example, supervised learning, unsupervised learning, or reinforcement learning can be used. In this embodiment, Q-learning, a form of reinforcement learning, is used as the learning method of the machine learning device 5. Q-learning divides the continuous motion of the robot 2 into a plurality of states and learns, for the actions of the robot 2 as it moves from one state to the next, high-value actions that earn a reward. The Q-learning executed by the machine learning device 5 can be implemented using, for example, a neural network. A neural network is modelled on the structure of the human brain and is formed by stacking, in multiple layers, logic circuits that imitate the function of neurons (nerve cells) in the human brain.
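For orientation, the standard tabular Q-learning update can be sketched as follows. This is the textbook rule, not the specific implementation of this embodiment (which may use a neural network instead of a table); the state and action labels, the reward value, and the parameters `alpha` and `gamma` are assumptions chosen for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Apply one standard Q-learning update and return the new Q value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    `alpha` (learning rate) and `gamma` (discount factor) are illustrative.
    """
    # Value of the best action available in the next state (0.0 if terminal).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


# Hypothetical labels: the robot grips in state S2, reaches S3, and is rewarded.
q = defaultdict(float)
first = q_update(q, "S2", "grip", 1.0, "S3", ["lift"])
# The earlier transition S1 -> S2 now inherits part of that value.
second = q_update(q, "S1", "approach", 0.0, "S2", ["grip"])
```

Repeating such updates propagates reward backwards through the state sequence, which is how actions that lead toward a rewarded state come to be valued highly.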

<About the learning processing unit>
The learning processing unit 6 is the part that executes the learning process for learning the motion of the robot 2. The learning processing unit 6 may execute the learning process while the robot 2 is performing its production operation, or it may execute the learning process separately from the production operation of the robot 2. The production operation of the robot 2 is the continuous operation in which the robot 2 takes a workpiece W out of the container CN with the hand unit 26 and places the workpiece W on a pallet PL (see FIG. 3, described later). The learning processing unit 6 includes a state observation unit 61, an action observation unit 62, a learning unit 63, and a displacement amount observation unit 64.

(About the state observation unit)
FIG. 3 is a diagram for explaining the operation of the state observation unit 61. FIG. 3 shows the robot 2 and the container CN as viewed from above, with three workpieces W1, W2, and W3 randomly piled in the container CN. The robot 2 executes a continuous operation in which it takes one workpiece W3 out of the container CN with the hand unit 26 and places the workpiece W3 on the pallet PL. The state observation unit 61 divides the continuous operation of the robot 2 into a plurality of states and observes the state of the robot 2.

The number of states of the robot 2 observed by the state observation unit 61 is not particularly limited; FIG. 3 shows four states, S1, S2, S3, and S4. State S1 is the state of the robot 2 in which the postures of the body 22, the first arm 23, the second arm 24, the wrist 25, and the hand unit 26 are adjusted so that the hand unit 26 is at a predetermined position above the pallet PL. State S2 is the state immediately before the claw portions 261 of the hand unit 26 hold the workpiece W3 to be held (gripped) in the container CN, in which the postures of the body 22, the first arm 23, the second arm 24, the wrist 25, and the hand unit 26 are adjusted so that the hand unit 26 is at a predetermined position directly above the workpiece W3. State S3 is the state of the robot 2 in which the postures of the body 22, the first arm 23, the second arm 24, the wrist 25, and the hand unit 26 are adjusted so that the claw portions 261 of the hand unit 26 hold the workpiece W3 to be held in the container CN. State S4 is the state of the robot 2 in which the postures of the body 22, the first arm 23, the second arm 24, the wrist 25, and the hand unit 26 are adjusted so that the workpiece W3 held by the claw portions 261 of the hand unit 26 is placed on the pallet PL. By moving continuously through states S1, S2, S3, and S4 in that order, the robot 2 takes one workpiece W3 out of the container CN with the hand unit 26 and places the workpiece W3 on the pallet PL.

The state of the robot 2 is defined by the state variables (ΔX, ΔY, ΔZ, p, d). The state variables (ΔX, ΔY, ΔZ, p, d) change each time the state of the robot 2 changes.

The state variable ΔX takes, as a reference value (hereinafter the "X reference value"), the X coordinate in the XYZ orthogonal coordinate system of the position in the container CN of the workpiece W3 to be held by the claw portions 261 of the hand unit 26, and represents the difference of the X coordinate of the position of the hand unit 26 (hereinafter the "hand X value") from the X reference value. The state variable ΔY likewise takes the Y coordinate of the position of the workpiece W3 in the container CN as a reference value (hereinafter the "Y reference value") and represents the difference of the Y coordinate of the position of the hand unit 26 (hereinafter the "hand Y value") from the Y reference value. The state variable ΔZ takes the Z coordinate of the position of the workpiece W3 in the container CN as a reference value (hereinafter the "Z reference value") and represents the difference of the Z coordinate of the position of the hand unit 26 (hereinafter the "hand Z value") from the Z reference value. The state variable p indicates whether the claw portions 261 of the hand unit 26 are holding the workpiece W3: p is 1 when the claw portions 261 are holding the workpiece W3 and 0 when they are not. The state variable d indicates whether a holding space that allows holding by the claw portions 261 is secured around the workpiece that is the next holding candidate after the one workpiece W3: d is 1 when a holding space is secured around the next holding candidate and 0 when it is not.
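The definitions of the state variables above can be written compactly as follows. This is a minimal sketch: the names `StateVariables` and `observe`, and the representation of positions as (X, Y, Z) tuples, are assumptions made for illustration and do not appear in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateVariables:
    """State variables (dX, dY, dZ, p, d) as defined in the text."""
    dx: float  # hand X value minus X reference value
    dy: float  # hand Y value minus Y reference value
    dz: float  # hand Z value minus Z reference value
    p: int     # 1 if the claw portions hold the target workpiece, else 0
    d: int     # 1 if the next candidate has a holding space, else 0


def observe(hand_xyz, work_xyz, holding, next_has_space):
    """Build the state variables from hypothetical observations.

    `work_xyz` supplies the reference values (position of the workpiece to be
    held); `hand_xyz` supplies the hand X, Y and Z values.
    """
    hx, hy, hz = hand_xyz
    wx, wy, wz = work_xyz
    return StateVariables(hx - wx, hy - wy, hz - wz,
                          int(holding), int(next_has_space))


# Hand directly above the workpiece, not yet holding it, next candidate is free.
above_work = observe((1, 2, 5), (1, 2, 3), holding=False, next_has_space=True)
```

In this example ΔX and ΔY are zero while ΔZ is not, so the hand is aligned with the workpiece in the XY plane but still above it.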

In the example shown in FIG. 3, when the robot 2 is in state S1, the hand unit 26 is separated from the container CN in the directions of the X, Y, and Z axes, the claw portions 261 are not holding the workpiece W3, and a holding space for the claw portions 261 is secured around the workpieces W1 and W2. Accordingly, in the state variables (ΔX, ΔY, ΔZ, p, d) that define state S1 of the robot 2, ΔX, ΔY, and ΔZ take the predetermined values XA, YA, and ZA, respectively, p is 0, and d is 1.

In the example shown in FIG. 3, when the robot 2 is in state S2, the hand unit 26 is not separated from the container CN in the X-axis and Y-axis directions but is separated from it in the Z-axis direction, the claw portions 261 are not holding the workpiece W3, and a holding space for the claw portions 261 is secured around the workpieces W1 and W2. Accordingly, in the state variables (ΔX, ΔY, ΔZ, p, d) that define state S2 of the robot 2, ΔX and ΔY are each 0, ΔZ takes the predetermined value ZA, p is 0, and d is 1.

In the example shown in FIG. 3, when the robot 2 is in state S3, the hand unit 26 is not separated from the container CN along any of the X, Y, and Z axes, the claw portion 261 is holding the workpiece W3, and a holding space for the claw portion 261 is secured around the workpieces W1 and W2. Accordingly, in the state variables (ΔX, ΔY, ΔZ, p, d) that define state S3 of the robot 2, “ΔX”, “ΔY”, and “ΔZ” are each “0”, “p” is “1”, and “d” is “1”.

In the example shown in FIG. 3, when the robot 2 is in state S4, the hand unit 26 is separated from the container CN along each of the X, Y, and Z axes, the claw portion 261 is holding the workpiece W3, and a holding space for the claw portion 261 is secured around the workpieces W1 and W2. Accordingly, in the state variables (ΔX, ΔY, ΔZ, p, d) that define state S4 of the robot 2, “ΔX”, “ΔY”, and “ΔZ” take the predetermined values “XA”, “YA”, and “ZA”, respectively, “p” is “1”, and “d” is “1”.

Based on the state variables (ΔX, ΔY, ΔZ, p, d), which change each time the state of the robot 2 transitions, the state observation unit 61 can recognize which of states S1, S2, S3, and S4 the robot 2 is in. When the robot 2 is in any of states S1, S2, and S3, a plurality of sub-states exist, depending on differences in, for example, the postures of the trunk 22, the first arm 23, the second arm 24, the wrist 25, and the hand unit 26. When the robot 2 is in any of states S1, S2, and S3, the state observation unit 61 also observes the sub-state. State S4, which is the final target state of the robot 2 in which the workpiece W3 held by the claw portion 261 of the hand unit 26 is placed on the pallet PL, has no sub-states of the kind that states S1, S2, and S3 have.
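The mapping from state variables to states S1 to S4 described above can be sketched as follows. This is only an illustrative sketch: the names `StateVariables` and `classify_state` and the tolerance `tol` are assumptions introduced here, and sub-states are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateVariables:
    dx: float  # ΔX: X offset of the hand unit from the container reference
    dy: float  # ΔY: Y offset
    dz: float  # ΔZ: Z offset
    p: int     # 1 if the claw portion holds the workpiece, else 0
    d: int     # 1 if a holding space exists around the next candidate, else 0


def classify_state(v: StateVariables, tol: float = 1e-6) -> str:
    """Return "S1".."S4" for the combinations given in the text, else "unknown".

    tol is a hypothetical tolerance for treating an offset as zero.
    """
    at_x = abs(v.dx) <= tol
    at_y = abs(v.dy) <= tol
    at_z = abs(v.dz) <= tol
    if v.p == 0:
        if at_x and at_y and not at_z:
            return "S2"  # above the container, not holding
        if not (at_x or at_y or at_z):
            return "S1"  # away from the container, not holding
    else:
        if at_x and at_y and at_z:
            return "S3"  # at the container, holding (d may be 0 or 1)
        if not (at_x or at_y or at_z):
            return "S4"  # away from the container, holding
    return "unknown"
```

Note that state S3 is returned for both d = 1 and d = 0, since the text distinguishes those two cases by the value of “d” within state S3.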

(About the behavior observation unit)
FIG. 4 is a diagram for explaining the operation of the behavior observation unit 62. FIG. 4 shows that, for the robot 2, state S1 has a plurality of sub-states “S1-1, S1-2, ... S1-n”, state S2 has a plurality of sub-states “S2-1, S2-2, ... S2-n”, and state S3 has a plurality of sub-states “S3-1, S3-2, ... S3-n”.

The behavior observation unit 62 observes the behavior pattern of the robot 2 when the state of the robot 2 transitions. More specifically, the behavior observation unit 62 observes the behavior pattern for the transition from state S1 to state S2, the behavior pattern for the transition from state S2 to state S3, and the behavior pattern for the transition from state S3 to state S4. The robot 2 can take a plurality of behavior patterns at each transition (action A1, action A2, ... action An), according to the number of sub-states in each of states S1, S2, and S3. When the state of the robot 2 transitions from state S3 to state S4, a take-out operation is performed in which one workpiece W in the container CN is taken out of the container CN while held by the claw portion 261 of the hand unit 26.

The action elements that define the behavior pattern of the robot 2 observed by the behavior observation unit 62 are shown in FIG. 5. They include the grip angle θ, the grip position HP, the rotation angle β1 and rotation speed pattern about the first axis 2A, the rotation angle β2 and rotation speed pattern about the second axis 2B, the rotation angle β3 and rotation speed pattern about the third axis 2C, the rotation angle β4 and rotation speed pattern about the fourth axis 2D, the rotation angle β5 and rotation speed pattern about the fifth axis 2E, and the rotation angle β6 and rotation speed pattern about the sixth axis 2F. As described above, the number of axes of the robot 2, which is a vertical articulated robot, is not limited to six and may be any number. The rotation angles and rotation speed patterns included in the action elements therefore correspond to the number of axes.

The grip angle θ is the angle formed by the two claw portions 261 that hold (grip) a workpiece W in the hand unit 26 (see FIG. 2). The grip position HP is the position at which the claw portions 261 hold (grip) the one workpiece W when the hand unit 26 takes it out. The rotation angle β1 about the first axis 2A is the rotation angle of the trunk 22 about the first axis 2A when the state of the robot 2 transitions. Because the trunk 22 can rotate about the first axis 2A in both the forward and reverse directions, the rotation angle β1 is expressed as a positive (plus) angle when the trunk 22 rotates in the forward direction and as a negative (minus) angle when it rotates in the reverse direction. The rotation angle β2 about the second axis 2B is the rotation angle of the first arm 23 about the second axis 2B when the state of the robot 2 transitions. Because the first arm 23 can rotate about the second axis 2B in both the forward and reverse directions, the rotation angle β2 is expressed as a positive (plus) angle when the first arm 23 rotates in the forward direction and as a negative (minus) angle when it rotates in the reverse direction. The rotation angle β3 about the third axis 2C is the rotation angle of the arm base 241 about the third axis 2C when the state of the robot 2 transitions. Because the arm base 241 can rotate about the third axis 2C in both the forward and reverse directions, the rotation angle β3 is expressed as a positive (plus) angle when the arm base 241 rotates in the forward direction and as a negative (minus) angle when it rotates in the reverse direction.

The rotation angle β4 about the fourth axis 2D is the rotation angle of the arm portion 242 about the fourth axis 2D when the state of the robot 2 transitions. Because the arm portion 242 can rotate about the fourth axis 2D in both the forward and reverse directions, the rotation angle β4 is expressed as a positive (plus) angle when the arm portion 242 rotates in the forward direction and as a negative (minus) angle when it rotates in the reverse direction. The rotation angle β5 about the fifth axis 2E is the rotation angle of the wrist 25 about the fifth axis 2E when the state of the robot 2 transitions. Because the wrist 25 can rotate about the fifth axis 2E in both the forward and reverse directions, the rotation angle β5 is expressed as a positive (plus) angle when the wrist 25 rotates in the forward direction and as a negative (minus) angle when it rotates in the reverse direction. The rotation angle β6 about the sixth axis 2F is the rotation angle of the hand unit 26 about the sixth axis 2F when the state of the robot 2 transitions. Because the hand unit 26 can rotate about the sixth axis 2F in both the forward and reverse directions, the rotation angle β6 is expressed as a positive (plus) angle when the hand unit 26 rotates in the forward direction and as a negative (minus) angle when it rotates in the reverse direction.

The rotation speed pattern for each of the axes 2A to 2F is the pattern of the rotation speed about that axis, and is classified into the first, second, and third patterns shown in FIG. 5. The first pattern consists of two regions: a rising region in which the rotation speed increases linearly over time, and a falling region in which the rotation speed decreases linearly over time from the end of the rising region. The second pattern consists of three regions: a rising region in which the rotation speed increases linearly over time, a constant-speed region in which the rotation speed stays constant for a fixed time from the end of the rising region, and a falling region in which the rotation speed decreases linearly over time from the end of the constant-speed region. The third pattern consists of two regions: a rising region in which the rotation speed increases along a curve over time, and a falling region in which the rotation speed decreases along a curve over time from the end of the rising region.
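The three rotation speed patterns can be illustrated with simple profile functions. This is a sketch under stated assumptions: the function names, the symmetric shapes, the peak at the midpoint, and the half sine wave used for the curved third pattern are all choices made here for illustration; the text only specifies linear or curved rising and falling regions.

```python
import math


def first_pattern(t: float, duration: float, peak: float) -> float:
    """Triangular profile: linear rise, then linear fall (assumed symmetric)."""
    if t < 0 or t > duration:
        return 0.0
    half = duration / 2.0
    return peak * t / half if t <= half else peak * (duration - t) / half


def second_pattern(t: float, duration: float, peak: float, ramp: float) -> float:
    """Trapezoidal profile: linear rise, constant speed, linear fall.

    ramp is the assumed length of each linear region (0 < ramp <= duration / 2).
    """
    if t < 0 or t > duration:
        return 0.0
    if t < ramp:
        return peak * t / ramp
    if t <= duration - ramp:
        return peak
    return peak * (duration - t) / ramp


def third_pattern(t: float, duration: float, peak: float) -> float:
    """Curved profile: a half sine wave is assumed for the curved rise and fall."""
    if t < 0 or t > duration:
        return 0.0
    return peak * math.sin(math.pi * t / duration)
```

Sampling each function over `0 <= t <= duration` gives the speed command for one axis; integrating it over time would give the rotation angle β for that axis.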

Based on these action elements, the behavior observation unit 62 can recognize the behavior pattern of the robot 2 when its state transitions.

The optimal behavior pattern of the robot 2 for the transition from state S1 to state S2 and the optimal behavior pattern for the transition from state S2 to state S3 have already been learned by the learning unit 63, described later. The optimal behavior pattern for the transition from state S3 to state S4 when the state variables (ΔX, ΔY, ΔZ, p, d) are (0, 0, 0, 1, 1) has also already been learned by the learning unit 63. In other words, for the transition from state S3 to state S4, the behavior pattern has already been learned for the case where the state variable “d” is “1”, that is, where a holding space for the claw portion 261 is secured around the workpiece that is the next holding candidate for the claw portion 261 of the hand unit 26. The behavior patterns of the robot 2 already learned by the learning unit 63 are stored in the storage unit 8.

An existing behavior pattern stored in the storage unit 8 is read from the storage unit 8 by the behavior determination unit 9, described later, and output to the control device 4. The control device 4 that receives the existing behavior pattern can control the operation of the robot 2 based on it. Under the control of the control device 4, the robot 2 performs a continuous production operation in which the hand unit 26 takes a workpiece W out of the container CN and places the workpiece W on the pallet PL.

In contrast, the optimal behavior pattern of the robot 2 for the transition from state S3 to state S4 when the state variables (ΔX, ΔY, ΔZ, p, d) are (0, 0, 0, 1, 0) is learned by reinforcement learning in the learning unit 63, described later. In other words, for the transition from state S3 to state S4, the learning unit 63 performs reinforcement learning of the behavior pattern for the case where the state variable “d” is “0”, that is, where no holding space for the claw portion 261 is secured around the workpiece that is the next holding candidate for the claw portion 261 of the hand unit 26.

The determination unit 7 determines whether the workpiece that is the next holding candidate for the claw portion 261 of the hand unit 26 is a non-holdable workpiece, that is, a workpiece that the claw portion 261 cannot hold because no holding space is secured around it. Before or when the hand unit 26 holds one workpiece W in the container CN with the claw portion 261, the determination unit 7 determines whether the workpiece that is the next holding candidate after that one workpiece W is a non-holdable workpiece. Before or when the hand unit 26 holds the one workpiece W in the container CN with the claw portion 261, the camera 31 of the imaging device 3 captures a reference image, and the image processing unit 32 processes the reference image to generate reference image data. The reference image data is image data that includes three-dimensional position information on the workpiece that is the next holding candidate. Based on the reference image data output from the imaging device 3, the determination unit 7 recognizes how each workpiece is accommodated in the container CN and determines whether the workpiece that is the next holding candidate is a non-holdable workpiece.

The determination unit 7 judges that a holding space enabling holding by the claw portion 261 is not secured, and determines that the workpiece is a non-holdable workpiece, when the workpiece that is the next holding candidate lies so close to the inner surface of the container CN that the claw portion 261 of the hand unit 26 cannot be inserted, or when a plurality of workpieces lie that close to one another. When the determination unit 7 determines that the workpiece that is the next holding candidate is a non-holdable workpiece, the state variables (ΔX, ΔY, ΔZ, p, d) for state S3 of the robot 2 are set to (0, 0, 0, 1, 0).
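A holding-space check of this kind could be sketched as follows. This is a simplified illustration and not the determination method of the text: it assumes a top view, circular workpiece footprints of equal radius, a rectangular container, and a single hypothetical `clearance` value that every gap must meet. The function name and parameters are introduced here for illustration only.

```python
import math


def holding_space_secured(candidate, neighbors, container, radius, clearance):
    """Return the state variable d: 1 if all gaps around the candidate are
    at least `clearance`, else 0.

    candidate: (x, y) center of the next holding candidate (top view)
    neighbors: list of (x, y) centers of the other workpieces
    container: (x_min, y_min, x_max, y_max) inner walls of the container
    radius:    assumed common footprint radius of the workpieces
    clearance: assumed gap needed to insert the claw portion
    """
    x, y = candidate
    x_min, y_min, x_max, y_max = container
    # Smallest gap between the workpiece edge and any inner wall.
    wall_gap = min(x - x_min, x_max - x, y - y_min, y_max - y) - radius
    if wall_gap < clearance:
        return 0
    # Gap between the candidate edge and each neighboring workpiece edge.
    for nx, ny in neighbors:
        if math.hypot(nx - x, ny - y) - 2 * radius < clearance:
            return 0
    return 1
```

A real system would derive these positions from the three-dimensional position information in the reference image data and would only need clearance where the claw portions are actually inserted.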

When the robot 2 is in state S3 and the determination unit 7 determines that the workpiece that is the next holding candidate is a non-holdable workpiece, the robot 2 performs, before the transition from state S3 to state S4, a displacement operation that displaces the non-holdable workpiece using a predetermined displacement method. Examples of the displacement method that the robot 2 uses to displace a non-holdable workpiece include the first to seventh methods shown in FIG. 6. A combination of two or more methods selected from the first to seventh methods may also be used as the displacement method. In FIG. 6, when one workpiece W3 in the container CN is held by the claw portion 261 of the hand unit 26, no holding space is secured around the workpieces W1 and W2, so the workpieces W1 and W2 are non-holdable workpieces.

The first method displaces the non-holdable workpiece W2 as follows: when the one workpiece W3 held by the claw portion 261 is moved toward the pallet PL, the hand unit 26 moves with the workpiece W3 in contact with the non-holdable workpiece W2. The displacement operation of the robot 2 using the first method secures, around the workpiece W2 that the claw portion 261 of the hand unit 26 could not hold, a holding space that enables holding by the claw portion 261, so that the workpiece W2 can be held by the claw portion 261.

The second method displaces the non-holdable workpiece W2 as follows: after the one workpiece W3 held by the claw portion 261 is placed on the pallet PL, the hand unit 26 moves while holding the container CN with the claw portion 261, so that the non-holdable workpiece W2 is displaced as the container CN moves. The displacement operation of the robot 2 using the second method secures, around the workpiece W2 that the claw portion 261 of the hand unit 26 could not hold, a holding space that enables holding by the claw portion 261, so that the workpiece W2 can be held by the claw portion 261.

The third method displaces the non-holdable workpiece W2 as follows: after the one workpiece W3 held by the claw portion 261 is placed on the pallet PL, the hand unit 26 moves with the claw portion 261 in contact with the non-holdable workpiece W2. The displacement operation of the robot 2 using the third method secures, around the workpiece W2 that the claw portion 261 of the hand unit 26 could not hold, a holding space that enables holding by the claw portion 261, so that the workpiece W2 can be held by the claw portion 261.

The fourth method displaces the non-holdable workpiece W2 as follows: after the one workpiece W3 held by the claw portion 261 is placed on the pallet PL, the claw portion 261 holds a workpiece WS that is of a different type from the workpieces W1, W2, and W3 and has been taken out of another container, and the hand unit 26 moves with the workpiece WS in contact with the non-holdable workpiece W2. The displacement operation of the robot 2 using the fourth method secures, around the workpiece W2 that the claw portion 261 of the hand unit 26 could not hold, a holding space that enables holding by the claw portion 261, so that the workpiece W2 can be held by the claw portion 261.

The fifth method displaces the non-holdable workpiece W2 as follows: after the one workpiece W3 held by the claw portion 261 is placed on the pallet PL, the claw portion 261 holds a dedicated jig JG, and the hand unit 26 moves with the dedicated jig JG in contact with the non-holdable workpiece W2. The displacement operation of the robot 2 using the fifth method secures, around the workpiece W2 that the claw portion 261 of the hand unit 26 could not hold, a holding space that enables holding by the claw portion 261, so that the workpiece W2 can be held by the claw portion 261.

The sixth method displaces the non-holdable workpiece W2 as follows: after the one workpiece W3 held by the claw portion 261 is placed on the pallet PL, the claw portion 261 holds a nozzle NZ capable of jetting a gas such as air, and the hand unit 26 moves while the nozzle NZ jets the gas toward the non-holdable workpiece W2. The displacement operation of the robot 2 using the sixth method secures, around the workpiece W2 that the claw portion 261 of the hand unit 26 could not hold, a holding space that enables holding by the claw portion 261, so that the workpiece W2 can be held by the claw portion 261.

The seventh method displaces the non-holdable workpieces W1 and W2 by causing them to collapse during the take-out operation that takes out the one workpiece W3 held by the claw portion 261. The seventh method is effective, for example, when the non-holdable workpieces W1 and W2 lie on top of the one workpiece W3. The displacement operation of the robot 2 using the seventh method secures, around the workpieces W1 and W2 that the claw portion 261 of the hand unit 26 could not hold, a holding space that enables holding by the claw portion 261, so that the workpieces W1 and W2 can be held by the claw portion 261.

When the determination unit 7 determines that the workpiece that is the next holding candidate is a non-holdable workpiece, the behavior observation unit 62 also observes the behavior pattern of the robot 2 in the displacement operation that displaces the non-holdable workpiece using one of the displacement methods above. The behavior observation unit 62 can recognize the behavior pattern of the robot 2 in the displacement operation based on the action elements shown in FIG. 5.

(About the displacement amount observation unit)
When the determination unit 7 determines that the workpiece that is the next holding candidate is a non-holdable workpiece and the robot 2 performs a displacement operation that displaces the non-holdable workpiece using a predetermined displacement method, the displacement amount observation unit 64 observes the workpiece displacement amount of the non-holdable workpiece. The displacement amount observation unit 64 observes the workpiece displacement amount based on the image data output from the imaging device 3 before and after the displacement operation by the robot 2.

More specifically, the displacement amount observation unit 64 observes the workpiece displacement amount of the non-holdable workpiece based on two sets of image data: the image data from before the displacement operation by the robot 2, namely the reference image data that the determination unit 7 refers to when determining whether a non-holdable workpiece exists, and the image data from after the displacement operation. The displacement amount observation unit 64 observes the workpiece displacement amount by calculating the difference between each coordinate value in the three-dimensional position information of the non-holdable workpiece contained in the reference image data and the corresponding coordinate value in the three-dimensional position information of that workpiece contained in the image data from after the displacement operation. The operation of the displacement amount observation unit 64 is described in detail later.
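The coordinate-difference calculation can be sketched as below. The per-axis differences follow the description; reducing them to a single scalar with the Euclidean norm is an assumption made here for illustration, as is the function name `work_displacement`.

```python
import math


def work_displacement(before, after):
    """Return the per-axis differences and an assumed scalar displacement.

    before: (x, y, z) of the non-holdable workpiece in the reference image data
    after:  (x, y, z) of the same workpiece after the displacement operation
    """
    dx, dy, dz = (a - b for a, b in zip(after, before))
    # Euclidean distance is an assumed way to combine the three differences.
    magnitude = math.sqrt(dx * dx + dy * dy + dz * dz)
    return (dx, dy, dz), magnitude
```

For example, a workpiece that moves from (0, 0, 0) to (3, 4, 0) has per-axis differences (3, 4, 0) and, under the Euclidean assumption, a scalar displacement of 5.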

(About the learning unit)
The learning unit 63 learns the optimal behavior pattern of the robot 2 when the state of the robot 2 transitions. In addition, when the determination unit 7 determines that the workpiece that is the next holding candidate for the hand unit 26 is a non-holdable workpiece, the learning unit 63 learns the optimal displacement method capable of displacing the non-holdable workpiece so that a holding space is secured, and learns the behavior pattern of the robot 2 that uses that displacement method.

As described above, the learning unit 63 has already learned the optimal behavior pattern of the robot 2 for the transition from state S1 to state S2 and the optimal behavior pattern for the transition from state S2 to state S3. For the transition from state S3 to state S4, the learning unit 63 has also already learned the behavior pattern for the case where a holding space for the claw portion 261 is secured around the workpiece that is the next holding candidate for the claw portion 261 of the hand unit 26. The behavior patterns of the robot 2 already learned by the learning unit 63 are stored in the storage unit 8. The following describes in detail the learning of the behavior pattern of the robot 2 in the displacement operation, in which the robot 2 in state S3 displaces a non-holdable workpiece using a predetermined displacement method.

The learning unit 63 learns the behavior pattern of the robot 2 observed by the behavior observation unit 62 when a non-holdable workpiece is displaced using a predetermined displacement method, in association with the workpiece displacement amount of the non-holdable workpiece observed by the displacement amount observation unit 64. Based on teacher data that associates the behavior pattern of the robot 2 with the workpiece displacement amount, the learning unit 63 learns the optimal displacement method and behavior pattern of the robot 2 for displacing the non-holdable workpiece so that a holding space can be secured.

As shown in FIG. 1, the learning unit 63 includes a reward setting unit 631 and a value function updating unit 632.

The reward setting unit 631 sets, for the behavior pattern of the robot 2 in the displacement operation observed by the behavior observation unit 62, a reward R (see FIG. 9, described later) that depends on the workpiece displacement amount of the non-holdable workpiece. The reward setting unit 631 may set the reward R in steps according to the workpiece displacement amount. For example, the reward setting unit 631 gives a reward R of a first value R1 (for example, “100”) to a behavior pattern of the robot 2 for which the workpiece displacement amount is equal to or greater than a predetermined threshold WDT (see FIG. 9, described later). The reward setting unit 631 gives a reward R of a second value R2 (for example, “10”), smaller than the first value R1, to a behavior pattern for which the workpiece displacement amount is equal to or greater than (threshold WDT × 0.5) and less than the threshold WDT. The reward setting unit 631 gives a reward R of a third value R3 (for example, “0”), smaller than the second value R2, to a behavior pattern for which the workpiece displacement amount is less than (threshold WDT × 0.5).

The threshold WDT is, for example, a value obtained by multiplying the thickness of the claw portion 261 of the hand unit 26 by a coefficient of 1 or more (for example, "1.2"). That is, the threshold WDT is set slightly larger than the holding space that the claw portion 261 needs in order to hold a workpiece, which corresponds to the thickness of the claw portion 261.
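The threshold and the three reward tiers described above can be sketched as follows. This is only an illustration: the function names are hypothetical, the example values 1.2, 100, 10 and 0 are the ones quoted in the text, and the sketch assumes the workpiece displacement amount has already been reduced to a single scalar, which the text does not specify.

```python
def displacement_threshold(claw_thickness: float, coefficient: float = 1.2) -> float:
    """Threshold WDT: claw thickness multiplied by a coefficient of 1 or more."""
    if coefficient < 1.0:
        raise ValueError("coefficient must be 1 or more")
    return claw_thickness * coefficient


def stepwise_reward(displacement: float, wdt: float,
                    r1: float = 100.0, r2: float = 10.0, r3: float = 0.0) -> float:
    """Reward R for a scalar workpiece displacement, set in three steps.

    Assumption: `displacement` is a single non-negative number; how the
    three-component displacement is reduced to a scalar is not given.
    """
    if displacement >= wdt:          # displacement >= WDT
        return r1
    if displacement >= 0.5 * wdt:    # WDT * 0.5 <= displacement < WDT
        return r2
    return r3                        # displacement < WDT * 0.5
```

With a threshold of 6.0, a displacement of 6.0 falls in the first tier, 3.0 in the second, and anything below 3.0 in the third.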

The value function updating unit 632 updates the value function that defines the value Q(s, a) of an action pattern of the robot 2 according to the reward R set by the reward setting unit 631. The value function updating unit 632 updates the value function using expression (1), the update expression for the value Q(s, a).
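Expression (1) itself is not reproduced in this text. As a hedged reading aid, the symbol definitions given for it (reward R(s, a), learning rate α, rate γ applied to the "max" term) match the standard Q-learning update, shown here for the initial case ε = 1. How the correction coefficient ε enters the expression cannot be determined from this text, so it is left out of the sketch rather than guessed.

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha \Bigl( R(s,a) + \gamma \max_{a'} Q(s',a') - Q(s,a) \Bigr)
```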

In expression (1), "s" represents the state of the robot 2 (state S3), and "a" represents an action of the robot 2 according to an action pattern. The action "a" moves the robot 2 from the state "s" (state S3) to the state "s'" (the state after the displacement operation). R(s, a) represents the reward R obtained through that state transition.

In expression (1), the term marked "max" is the value Q(s', a') obtained when the most valuable action "a'" is selected in the state "s'", multiplied by "γ". "γ" is a parameter called the attenuation rate (the discount rate in reinforcement learning terms) and lies in the range 0 < γ ≤ 1 (for example, 0.9). "α" is a parameter called the learning rate and lies in the range 0 < α ≤ 1 (for example, 0.1). "ε" is a parameter called the correction coefficient and lies in the range 0 < ε ≤ 1. The correction coefficient ε is calculated by the learning unit 63, as described in detail later. In expression (1), ε is set to 1 until the learning unit 63 has calculated the correction coefficient.

Expression (1) is an update expression that updates the value Q(s, a) of the action "a" in the state "s" based on the reward R(s, a) that the reward setting unit 631 sets for the action "a". That is, if the sum of the value Q(s', a') of the action "a'" in the state "s'" and the reward R(s, a) is larger than the value Q(s, a) of the action "a" in the state "s", the value Q(s, a) is increased; if the sum is smaller, the value Q(s, a) is decreased. In other words, by updating the value function with expression (1), the value function updating unit 632 brings the value Q(s, a) of a given action "a" in a given state "s" closer to the reward R set for that action combined with the value Q(s', a') of the best action "a'" in the next state "s'" reached by that action.
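The behaviour described above (raise Q(s, a) when the reward plus the best next value exceeds it, lower it otherwise) can be sketched as a tabular update. This is a minimal sketch of the standard Q-learning rule for the case ε = 1, not the expression as filed: the function name, the dictionary-based table and the state and action labels are assumptions made for illustration, and the correction coefficient ε is omitted because its placement is not given here.

```python
from collections import defaultdict


def q_update(q, s, a, reward, s_next, next_actions, alpha=0.1, gamma=0.9):
    """Move Q(s, a) toward reward + gamma * max_a' Q(s', a').

    q            -- mapping from (state, action) to value; missing keys read as 0
    next_actions -- actions available in the next state s'
    alpha, gamma -- learning rate and attenuation rate (example values from the text)
    """
    # Value of the best action in the next state; 0 if no action is listed.
    best_next = max((q[(s_next, a2)] for a2 in next_actions), default=0.0)
    target = reward + gamma * best_next
    # Increases Q(s, a) if the target is larger, decreases it if smaller.
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


q_table = defaultdict(float)
q_table[("S31", "A1'")] = 50.0
new_value = q_update(q_table, "S3", "A1", 100.0, "S31", ["A1'"])
```

Repeated calls with the same transition move the stored value step by step toward the target, at a pace set by `alpha`.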

Details are given later, but the first example of the displacement operation shown in FIGS. 7 to 11 illustrates the point. When the displacement operation by the action "a" of the robot 2 (action A1 in FIG. 7) displaces the non-holdable workpiece so that a holding space can be secured, the state of the robot 2 moves from the state "s" (state S3) to the state "s'" (state S31 in FIG. 7). The state variables (ΔX, ΔY, ΔZ, p, d) in this state "s'" (state S31) are (0, 0, 0, 1, 1). That is, the state variable "d" is "1", meaning that a holding space for the claw portion 261 has been secured around the workpiece that is the next holding candidate for the hand unit 26, so the next state of the robot 2 is the state S4. Accordingly, the value Q(s', a') is highest when action A1' (FIG. 7), the action "a'" that moves from the state "s'" (state S31) to the state S4, is selected and the robot moves to the state S4.

On the other hand, when the displacement operation by the action "a" of the robot 2 (action A2 in FIG. 7) displaces the non-holdable workpiece but not far enough to secure a holding space, the state of the robot 2 moves from the state "s" (state S3) to the state "s'" (state S32 in FIG. 7). The state variables (ΔX, ΔY, ΔZ, p, d) in this state "s'" (state S32) are (0, 0, 0, 1, 0). In this case, the state variable "d" is "0", meaning that no holding space has been secured around the workpiece that is the next holding candidate for the hand unit 26, so the robot 2 returns to the state S3 and the displacement operation is retried. Accordingly, the value Q(s', a') obtained when action A2' (FIG. 7), the action "a'" that moves from the state "s'" (state S32) to the state S3, is selected and the robot moves to the state S3 is lower than the value obtained when moving to the state S4.

Likewise, when the displacement operation by the action "a" of the robot 2 (action A3 in FIG. 7) hardly displaces the non-holdable workpiece at all, the state of the robot 2 moves from the state "s" (state S3) to the state "s'" (state S33 in FIG. 7). The state variables (ΔX, ΔY, ΔZ, p, d) in this state "s'" (state S33) are (0, 0, 0, 1, 0). In this case, the state variable "d" is "0", meaning that no holding space has been secured around the workpiece that is the next holding candidate for the hand unit 26, so the robot 2 returns to the state S3 and the displacement operation is retried. Accordingly, the value Q(s', a') obtained when action A3' (FIG. 7), the action "a'" that moves from the state "s'" (state S33) to the state S3, is selected and the robot moves to the state S3 is similarly low.

The update expression for the value Q(s, a) given as expression (1) also applies to the second example of the displacement operation shown in FIGS. 12 to 14 and to the third example shown in FIGS. 15 and 16, in the same way as for the first example.

The learning unit 63 generates learning result information representing the result of learning the action pattern of the robot 2 in the displacement operation that displaces a non-holdable workpiece using a predetermined displacement method. The learning result information generated by the learning unit 63 is stored in the storage unit 8. The learning unit 63 may learn the action pattern of the robot 2 in the displacement operation while the robot 2 is performing the production operation, or it may learn separately from the production operation of the robot 2.

<Action determination unit>
The action determination unit 9 determines the action pattern of the robot 2 at each state transition by reading the action pattern of the robot 2 from the storage unit 8. The action determination unit 9 outputs the action pattern read from the storage unit 8 to the control device 4. More specifically, the action determination unit 9 reads the existing action pattern described above from the storage unit 8 and outputs it to the control device 4. On receiving the existing action pattern, the control device 4 controls the operation of the robot 2 based on it. Under the control of the control device 4, the robot 2 performs a continuous production operation in which the hand unit 26 takes a workpiece W out of the container CN and places it on the pallet PL.

When the displacement operation for displacing a non-holdable workpiece is executed, the action determination unit 9 refers to the learning result information stored in the storage unit 8, which represents the learning result of the learning unit 63. For example, when an action pattern that can secure a holding space is registered in the learning result information, the action determination unit 9 selects that action pattern as the one that makes the non-holdable workpiece holdable by the claw portion 261. The action determination unit 9 determines the action pattern of the robot 2 for the displacement operation by reading the registered action pattern that can secure a holding space from the learning result information stored in the storage unit 8. The action determination unit 9 outputs the action pattern for the displacement operation to the control device 4. On receiving it, the control device 4 controls the operation of the robot 2 based on that action pattern. Under the control of the control device 4, the robot 2 displaces the non-holdable workpiece so that a holding space for the claw portion 261 is secured around the workpiece.
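The selection step described above can be sketched as a lookup over stored records. The layout of the learning result information is not specified in this passage, so the record fields, the class and function names, and the tie-breaking by highest reward are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LearnedPattern:
    """One hypothetical entry of the learning result information."""
    name: str            # label of the action pattern, e.g. "A1"
    displacement: float  # observed workpiece displacement (scalar, assumed)
    reward: float        # reward R set for this action pattern


def choose_displacement_pattern(records: Sequence[LearnedPattern],
                                wdt: float) -> Optional[LearnedPattern]:
    """Return a registered pattern that can secure a holding space, if any.

    A pattern is treated as securing a holding space when its displacement
    reaches the threshold WDT; among those, the highest reward wins
    (assumed tie-breaking rule).
    """
    candidates = [r for r in records if r.displacement >= wdt]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.reward)


records = [
    LearnedPattern("A1", 7.0, 100.0),
    LearnedPattern("A2", 4.0, 10.0),
    LearnedPattern("A3", 0.5, 0.0),
]
chosen = choose_displacement_pattern(records, wdt=6.0)
```

Returning `None` when nothing qualifies leaves the caller free to fall back to further learning trials.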

Once a holding space is secured around the non-holdable workpiece, the claw portion 261 can hold that workpiece. Therefore, after the holding space has been secured, the action determination unit 9 reads the existing action pattern described above from the storage unit 8 and outputs it to the control device 4. Under the control of the control device 4, the robot 2 then performs the continuous production operation in which the hand unit 26 takes a workpiece W out of the container CN and places it on the pallet PL.

[Specific examples of the displacement operation for displacing a non-holdable workpiece]
As described above, when the robot 2 is in the state S3, that is, when one workpiece is held by the claw portion 261 of the hand unit 26, and the determination unit 7 determines that the workpiece that is the next holding candidate is a non-holdable workpiece, a displacement operation using a predetermined displacement method is executed to displace the non-holdable workpiece so that a holding space is secured. The displacement operation is described in detail below using specific examples.

<First example of the displacement operation>
A first example of the displacement operation is described with reference to FIGS. 7 to 10. FIG. 7 is a diagram illustrating the first example of the displacement operation for displacing a non-holdable workpiece. FIG. 8 is a diagram illustrating the operation of the displacement amount observation unit 64. FIG. 9 is a diagram illustrating the learning result information JH1 generated by the learning unit 63 in the displacement operation of the first example. FIG. 10 is a flowchart showing the operation of the machine learning device 5 for the displacement operation of the first example.

The state observation unit 61 observes that the robot 2 has moved from the state S2 to the state S3, based on the state variables (ΔX, ΔY, ΔZ, p, d), which change each time the state of the robot 2 changes (step a1 in FIG. 10). When the robot 2 has moved from the state S2 to the state S3, that is, when one workpiece is held by the claw portion 261 of the hand unit 26, the determination unit 7 acquires the reference image data output from the imaging device 3 (step a2 in FIG. 10). The reference image data is image data that includes three-dimensional position information on the workpieces that are the next holding candidates. Based on the reference image data, the determination unit 7 recognizes how each workpiece is accommodated in the container CN and determines whether the workpiece that is the next holding candidate is a non-holdable workpiece (step a3 in FIG. 10).

When the determination unit 7 determines that the workpiece that is the next holding candidate is not a non-holdable workpiece, the state variables (ΔX, ΔY, ΔZ, p, d) for the state S3 are set to (0, 0, 0, 1, 1). In this case, the action determination unit 9 reads the existing action pattern for moving from the state S3 to the state S4 from the storage unit 8 and outputs it to the control device 4. On receiving the existing action pattern, the control device 4 controls the operation of the robot 2 based on it. Under the control of the control device 4, the robot 2 executes a take-out operation that removes the workpiece held by the claw portion 261 from the container CN (step a5 in FIG. 10).

On the other hand, when the determination unit 7 determines that the workpiece that is the next holding candidate is a non-holdable workpiece, the state variables (ΔX, ΔY, ΔZ, p, d) for the state S3 are set to (0, 0, 0, 1, 0), as shown in FIG. 7. In the example of FIG. 7, when the claw portion 261 of the hand unit 26 holds one workpiece W3 in the container CN, no holding space is secured around the workpieces W1 and W2, so the workpieces W1 and W2 are non-holdable workpieces. The state variable "d" is therefore "0", indicating that no holding space for the claw portion 261 is secured around the workpieces W1 and W2 that are the next holding candidates. In the example of FIG. 7, the non-holdable workpiece W1 lies close to the inner surface of the container CN, and the non-holdable workpiece W2 lies beside and close to the non-holdable workpiece W1. For this reason, no holding space is secured around the non-holdable workpieces W1 and W2.
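The branch on the state variable "d" in steps a3 to a5 can be sketched as below. The tuple layout follows the order (ΔX, ΔY, ΔZ, p, d) used in the text; the type name, field names and returned labels are hypothetical, and only the meaning of "d" is taken from this passage.

```python
from typing import NamedTuple


class StateVariables(NamedTuple):
    """State variables in the order (ΔX, ΔY, ΔZ, p, d)."""
    dx: float
    dy: float
    dz: float
    p: int
    d: int  # 1: holding space secured around the next candidate, 0: not secured


def next_operation(state: StateVariables) -> str:
    """Choose what the robot does next while in state S3."""
    if state.d == 1:
        # Next candidate can be held: take out the held workpiece (S3 -> S4).
        return "take_out"
    # Next candidate is non-holdable: run a displacement operation first.
    return "displace"


holdable = StateVariables(0, 0, 0, 1, 1)
blocked = StateVariables(0, 0, 0, 1, 0)
```

Here `next_operation(holdable)` selects the take-out path and `next_operation(blocked)` selects the displacement path.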

When the determination unit 7 determines that the workpieces W1 and W2 that are the next holding candidates are non-holdable workpieces, a displacement operation is executed that displaces at least one of the workpieces W1 and W2 so that a holding space for the claw portion 261 is secured around it. In the first example of the displacement operation, as shown in FIG. 7, the robot 2 displaces the workpiece W2 as the non-holdable workpiece through a displacement operation based on an action pattern that uses the first method. As described above, the first method is a displacement method in which, when the workpiece W3 held by the claw portion 261 is moved toward the pallet PL, the hand unit 26 moves with the workpiece W3 in contact with the non-holdable workpiece W2, thereby displacing the non-holdable workpiece W2.

The action observation unit 62 observes the action pattern of the robot 2 that uses the first method (step a4 in FIG. 10). The example of FIG. 7 shows three action patterns that use the first method: action A1, action A2 and action A3. In action A1, the tip of the workpiece W3 held by the claw portion 261 is in contact with one longitudinal end face of the non-holdable workpiece W2 (contact position CP), and in that state the hand unit 26 moves (movement trajectory MT) obliquely away from the non-holdable workpiece W1 relative to the direction in which the non-holdable workpieces W1 and W2 lie side by side. Action A2 is the same as action A1 except that the inclination of the movement trajectory MT relative to the side-by-side direction of the non-holdable workpieces W1 and W2 differs. In action A3, the tip of the workpiece W3 held by the claw portion 261 is in contact with a side surface of the non-holdable workpiece W2 (contact position CP), and in that state the hand unit 26 moves (movement trajectory MT) in the direction orthogonal to the side-by-side direction of the non-holdable workpieces W1 and W2, that is, along the side surface of the non-holdable workpiece W2.

The action elements that define the action pattern of the robot 2 observed by the action observation unit 62 are those shown in FIG. 5 described above: the grip angle θ, the grip position HP, the rotation angle β1 and rotation speed pattern of the first axis 2A, the rotation angle β2 and rotation speed pattern of the second axis 2B, the rotation angle β3 and rotation speed pattern of the third axis 2C, the rotation angle β4 and rotation speed pattern of the fourth axis 2D, the rotation angle β5 and rotation speed pattern of the fifth axis 2E, and the rotation angle β6 and rotation speed pattern of the sixth axis 2F. In an action pattern of the robot 2 that uses the first method, the action elements shown in FIG. 5 determine both the contact position CP of the workpiece W3 held by the claw portion 261 against the non-holdable workpiece W2 and the movement trajectory MT of the hand unit 26.

When the displacement operation based on the action pattern using the first method is complete, the displacement amount observation unit 64 acquires the post-displacement image data output from the imaging device 3 (step a6 in FIG. 10). The post-displacement image data is image data that includes three-dimensional position information on the non-holdable workpieces W1 and W2 after they have been displaced by the action pattern of the robot 2 using the first method. The displacement amount observation unit 64 observes the workpiece displacement amounts of the non-holdable workpieces W1 and W2 based on two sets of image data: the reference image data from before the displacement operation, which the determination unit 7 refers to when determining whether non-holdable workpieces are present, and the image data from after the displacement operation (step a7 in FIG. 10).

In the example of FIG. 8, before the displacement operation by the robot 2, when the workpiece W3 is held by the claw portion 261 of the hand unit 26, the camera 31 of the imaging device 3 captures a reference image GS, and the image processing unit 32 generates reference image data GDS by processing the reference image GS. The reference image GS includes an image region GW1 corresponding to the non-holdable workpiece W1 and an image region GW2 corresponding to the non-holdable workpiece W2. The reference image data GDS includes the coordinate values (X1, Y1, Z1) as three-dimensional position information of the non-holdable workpiece W1 and the coordinate values (X2, Y2, Z2) as three-dimensional position information of the non-holdable workpiece W2.

Also in the example of FIG. 8, after the displacement operation by the robot 2, the camera 31 of the imaging device 3 captures a first image G1, a second image G2 and a third image G3, and the image processing unit 32 generates first image data GD1, second image data GD2 and third image data GD3 by processing the images G1, G2 and G3, respectively.

The first image G1 and the first image data GD1 are the image and image data obtained after the displacement operation of the robot 2 based on the action pattern A1 using the first method (action A1 in FIG. 7). The first image G1 includes, for the non-holdable workpieces W1 and W2 after that displacement operation, an image region GW1 corresponding to the non-holdable workpiece W1 and an image region GW2 corresponding to the non-holdable workpiece W2. The first image data GD1 includes the coordinate values (X11, Y11, Z11) as three-dimensional position information of the non-holdable workpiece W1 and the coordinate values (X21, Y21, Z21) as three-dimensional position information of the non-holdable workpiece W2.

The second image G2 and the second image data GD2 are the image and image data obtained after the displacement operation of the robot 2 based on the action pattern A2 using the first method (action A2 in FIG. 7). The second image G2 includes, for the non-holdable workpieces W1 and W2 after that displacement operation, an image region GW1 corresponding to the non-holdable workpiece W1 and an image region GW2 corresponding to the non-holdable workpiece W2. The second image data GD2 includes the coordinate values (X12, Y12, Z12) as three-dimensional position information of the non-holdable workpiece W1 and the coordinate values (X22, Y22, Z22) as three-dimensional position information of the non-holdable workpiece W2.

The third image G3 and the third image data GD3 are the image and image data obtained after the displacement operation of the robot 2 based on the action pattern A3 using the first method (action A3 in FIG. 7). The third image G3 includes, for the non-holdable workpieces W1 and W2 after that displacement operation, an image region GW1 corresponding to the non-holdable workpiece W1 and an image region GW2 corresponding to the non-holdable workpiece W2. The third image data GD3 includes the coordinate values (X13, Y13, Z13) as three-dimensional position information of the non-holdable workpiece W1 and the coordinate values (X23, Y23, Z23) as three-dimensional position information of the non-holdable workpiece W2.

Based on the reference image data GDS and the first image data GD1, the displacement amount observation unit 64 observes a first workpiece displacement amount WD1, which represents how far the non-holdable workpieces W1 and W2 moved within the container CN when the displacement operation of the robot 2 based on the action pattern A1 using the first method was executed. The first workpiece displacement amount WD1 includes the workpiece displacement amount (XD11, YD11, ZD11) of the non-holdable workpiece W1 and the workpiece displacement amount (XD21, YD21, ZD21) of the non-holdable workpiece W2. For the non-holdable workpiece W1, "XD11" is the difference between the X coordinate value "X1" in the three-dimensional position information of W1 included in the reference image data GDS and the X coordinate value "X11" in the three-dimensional position information of W1 included in the first image data GD1. Likewise, "YD11" is the difference between the Y coordinate values "Y1" (reference image data GDS) and "Y11" (first image data GD1), and "ZD11" is the difference between the Z coordinate values "Z1" (reference image data GDS) and "Z11" (first image data GD1).

Similarly, for the non-holdable workpiece W2, "XD21" is the difference between the X coordinate value "X2" in the three-dimensional position information of W2 included in the reference image data GDS and the X coordinate value "X21" in the three-dimensional position information of W2 included in the first image data GD1. "YD21" is the difference between the Y coordinate values "Y2" (reference image data GDS) and "Y21" (first image data GD1), and "ZD21" is the difference between the Z coordinate values "Z2" (reference image data GDS) and "Z21" (first image data GD1).
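The per-axis differences described above can be sketched as follows. The function names and dictionary layout are hypothetical, the sign convention (after minus before) is an assumption because the text only speaks of a "difference", and the Euclidean magnitude is shown merely as one possible way to obtain a single number for comparison against a threshold.

```python
import math
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


def workpiece_displacements(before: Dict[str, Vec3],
                            after: Dict[str, Vec3]) -> Dict[str, Vec3]:
    """Per-axis displacement of each workpiece between two sets of image data.

    `before` plays the role of the reference image data GDS and `after`
    that of the post-displacement image data (e.g. GD1).
    """
    return {
        name: tuple(a - b for a, b in zip(after[name], before[name]))
        for name in before
    }


def magnitude(v: Vec3) -> float:
    """Euclidean length of a displacement vector (assumed scalar measure)."""
    return math.sqrt(sum(c * c for c in v))


gds = {"W1": (10.0, 20.0, 5.0), "W2": (30.0, 20.0, 5.0)}   # (X1, Y1, Z1), (X2, Y2, Z2)
gd1 = {"W1": (10.0, 20.0, 5.0), "W2": (38.0, 26.0, 5.0)}   # (X11, Y11, Z11), (X21, Y21, Z21)
wd1 = workpiece_displacements(gds, gd1)
```

In this made-up data, W1 does not move while W2 shifts by 8 along X and 6 along Y, which mirrors the qualitative outcome described for action pattern A1.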

As is clear from a comparison of the reference image GS and the first image G1, after the displacement operation of the robot 2 based on the action pattern A1 using the first method, the position of the non-holdable work W1 has hardly changed from its position before the displacement operation, whereas the position of the non-holdable work W2 has changed enough to secure a holding space for the claw portion 261. Consequently, in the first work displacement amount WD1 observed by the displacement amount observation unit 64, each value of the work displacement amount (XD11, YD11, ZD11) of the non-holdable work W1 is close to zero, while each value of the work displacement amount (XD21, YD21, ZD21) of the non-holdable work W2 reflects the displacement of the non-holdable work W2.

When the displacement operation of the robot 2 based on the action pattern A1 using the first method displaces the non-holdable work W2 so that a holding space can be secured, the state variables (ΔX, ΔY, ΔZ, p, d) for the state S31 of the robot 2 (the state after the displacement operation) are set to (0, 0, 0, 1, 1), as shown in FIG. 7. In the example shown in FIG. 7, the displacement operation of the robot 2 based on the action pattern A1 secures a holding space around the non-holdable work W2, so that the work W2 becomes holdable. The state variable “d” is therefore “1”, indicating that a holding space for the claw portion 261 has been secured around the work W2.

In addition, based on the reference image data GDS and the second image data GD2, the displacement amount observation unit 64 observes a second work displacement amount WD2 representing the displacement amounts of the non-holdable works W1 and W2 in the container CN when the displacement operation of the robot 2 based on the action pattern A2 using the first method is executed. The second work displacement amount WD2 includes the work displacement amount (XD12, YD12, ZD12) of the non-holdable work W1 and the work displacement amount (XD22, YD22, ZD22) of the non-holdable work W2. In the work displacement amount of the non-holdable work W1, “XD12” is the difference between the X coordinate value “X1” in the three-dimensional position information of the non-holdable work W1 included in the reference image data GDS and the X coordinate value “X12” in the three-dimensional position information of the non-holdable work W1 included in the second image data GD2. “YD12” is the corresponding difference between the Y coordinate values “Y1” and “Y12”, and “ZD12” is the corresponding difference between the Z coordinate values “Z1” and “Z12”.

Similarly, in the work displacement amount of the non-holdable work W2, “XD22” is the difference between the X coordinate value “X2” in the three-dimensional position information of the non-holdable work W2 included in the reference image data GDS and the X coordinate value “X22” in the three-dimensional position information of the non-holdable work W2 included in the second image data GD2. “YD22” is the corresponding difference between the Y coordinate values “Y2” and “Y22”, and “ZD22” is the corresponding difference between the Z coordinate values “Z2” and “Z22”.

As is clear from a comparison of the reference image GS and the second image G2, after the displacement operation of the robot 2 based on the action pattern A2 using the first method, the position of the non-holdable work W1 has hardly changed from its position before the displacement operation, whereas the position of the non-holdable work W2 has changed, but only within a range smaller than the holding space. Consequently, in the second work displacement amount WD2 observed by the displacement amount observation unit 64, each value of the work displacement amount (XD12, YD12, ZD12) of the non-holdable work W1 is close to zero, while each value of the work displacement amount (XD22, YD22, ZD22) of the non-holdable work W2 reflects the displacement of the non-holdable work W2.

When the displacement operation of the robot 2 based on the action pattern A2 using the first method displaces the non-holdable work W2 only within a range smaller than the holding space, the state variables (ΔX, ΔY, ΔZ, p, d) for the state S32 of the robot 2 (the state after the displacement operation) are set to (0, 0, 0, 1, 0), as shown in FIG. 7. In the example shown in FIG. 7, the non-holdable work W2 has been displaced by the displacement operation of the robot 2 based on the action pattern A2, but no holding space has been secured around it, and the work W2 still cannot be held. The state variable “d” is therefore “0”, indicating that no holding space for the claw portion 261 has been secured around the work W2.

In addition, based on the reference image data GDS and the third image data GD3, the displacement amount observation unit 64 observes a third work displacement amount WD3 representing the displacement amounts of the non-holdable works W1 and W2 in the container CN when the displacement operation of the robot 2 based on the action pattern A3 using the first method is executed. The third work displacement amount WD3 includes the work displacement amount (XD13, YD13, ZD13) of the non-holdable work W1 and the work displacement amount (XD23, YD23, ZD23) of the non-holdable work W2. In the work displacement amount of the non-holdable work W1, “XD13” is the difference between the X coordinate value “X1” in the three-dimensional position information of the non-holdable work W1 included in the reference image data GDS and the X coordinate value “X13” in the three-dimensional position information of the non-holdable work W1 included in the third image data GD3. “YD13” is the corresponding difference between the Y coordinate values “Y1” and “Y13”, and “ZD13” is the corresponding difference between the Z coordinate values “Z1” and “Z13”.

Similarly, in the work displacement amount of the non-holdable work W2, “XD23” is the difference between the X coordinate value “X2” in the three-dimensional position information of the non-holdable work W2 included in the reference image data GDS and the X coordinate value “X23” in the three-dimensional position information of the non-holdable work W2 included in the third image data GD3. “YD23” is the corresponding difference between the Y coordinate values “Y2” and “Y23”, and “ZD23” is the corresponding difference between the Z coordinate values “Z2” and “Z23”.

As is clear from a comparison of the reference image GS and the third image G3, after the displacement operation of the robot 2 based on the action pattern A3 using the first method, the positions of the non-holdable works W1 and W2 have hardly changed from their positions before the displacement operation. Consequently, in the third work displacement amount WD3 observed by the displacement amount observation unit 64, each value of the work displacement amount (XD13, YD13, ZD13) of the non-holdable work W1 and each value of the work displacement amount (XD23, YD23, ZD23) of the non-holdable work W2 is close to zero.

When the displacement operation of the robot 2 based on the action pattern A3 using the first method is executed, the state variables (ΔX, ΔY, ΔZ, p, d) for the state S33 of the robot 2 (the state after the displacement operation) are set to (0, 0, 0, 1, 0), as shown in FIG. 7. In the example shown in FIG. 7, the non-holdable work W2 is hardly displaced by the displacement operation of the robot 2 based on the action pattern A3 and no holding space is secured around it, so the work W2 cannot be held. The state variable “d” is therefore “0”, indicating that no holding space for the claw portion 261 has been secured around the work W2.

When the displacement amount observation unit 64 has observed the work displacement amounts of the non-holdable works W1 and W2, the reward setting unit 631 of the learning unit 63 determines whether the work displacement amount of at least one of the non-holdable works W1 and W2 (here, the non-holdable work W2) is equal to or greater than (threshold WDT × 0.5) (step a8 in FIG. 10). The reward setting unit 631 further determines whether the work displacement amount of the non-holdable work W2 is equal to or greater than the threshold WDT (step a9 in FIG. 10). For an action pattern of the robot 2 in which the work displacement amount of the non-holdable work W2 is equal to or greater than the predetermined threshold WDT (action A1 in FIG. 7), the reward setting unit 631 gives a reward R of a first value R1 (for example, “100”) (step a10 in FIG. 10). For an action pattern of the robot 2 in which the work displacement amount of the non-holdable work W2 is equal to or greater than (threshold WDT × 0.5) and less than the threshold WDT (action A2 in FIG. 7), the reward setting unit 631 gives a reward R of a second value R2 (for example, “10”) smaller than the first value R1 (step a15 in FIG. 10). For an action pattern of the robot 2 in which the work displacement amount of the non-holdable work W2 is less than (threshold WDT × 0.5) (action A3 in FIG. 7), the reward setting unit 631 gives a reward R of a third value R3 (for example, “0”) smaller than the second value R2 (step a14 in FIG. 10).
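The three-tier reward rule of steps a8 to a10, a14 and a15 can be sketched as a small function. This is an illustration under stated assumptions: the function name is hypothetical, and the work displacement amount is treated here as a single non-negative scalar, since this passage does not say how the three components (XD, YD, ZD) are compared with the threshold WDT. The default reward values are the examples given in the text.

```python
def displacement_reward(displacement: float, wdt: float,
                        r1: float = 100.0, r2: float = 10.0, r3: float = 0.0) -> float:
    """Return the reward R for one displacement operation.

    displacement: scalar work displacement amount (assumed scalar for this sketch).
    wdt: threshold WDT at or above which a holding space is regarded as secured.
    """
    if displacement >= wdt:            # step a9: at or above WDT
        return r1                      # step a10: first value R1
    if displacement >= 0.5 * wdt:      # step a8: at least half of WDT
        return r2                      # step a15: second value R2
    return r3                          # step a14: third value R3
```

With a hypothetical threshold of 10, a displacement of 12 would earn R1, a displacement of 5 would earn R2, and a displacement of 4.9 would earn R3.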

Next, the value function updating unit 632 of the learning unit 63 updates the value function that defines the value Q(s, a) of the action pattern of the robot 2, using the update formula of equation (1) above (steps a11 and a16 in FIG. 10).
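Equation (1) is not reproduced in this passage. The sketch below assumes, purely for illustration, that it has the form of the standard one-step Q-learning update, Q(s, a) ← Q(s, a) + α(R + γ max Q(s′, a′) − Q(s, a)). The learning rate `alpha`, the discount factor `gamma`, the table representation and the function name are assumptions, not values or structures taken from the specification.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, actions: Iterable[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one standard Q-learning update to q[(state, action)] and return it."""
    # Best value obtainable from the next state (0.0 if no actions are given).
    best_next = max((q[(next_state, a)] for a in actions), default=0.0)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: action A1 taken in state S3 leads to S31 with reward R1 = 100.
q: QTable = defaultdict(float)
q_update(q, "S3", "A1", 100.0, "S31", ["A1", "A2", "A3"])
```

Starting from an all-zero table, this single update moves Q(S3, A1) one tenth of the way toward the reward, because `alpha` is 0.1 and the next-state values are still zero.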

Each time the value function is updated by the value function updating unit 632, the learning unit 63 generates learning result information JH1 (FIG. 9) representing the result of learning the action pattern of the robot 2 in the displacement operation that displaces the non-holdable work W2 using the first method. The learning result information JH1 generated by the learning unit 63 is stored in the storage unit 8. The learning result information JH1 is, for example, information in which displacement method information J11, reference image data information J12, action pattern information J13, work displacement amount information J14, and reward information J15 are associated with one another. The displacement method information J11 represents the displacement method used in the displacement operation of the robot 2. The reference image data information J12 represents the reference image data GDS that the determination unit 7 referred to when determining whether a non-holdable work exists. The action pattern information J13 represents the action pattern of the robot 2 observed by the action observation unit 62 during the displacement operation of the robot 2, and includes the action elements that define the action pattern. The work displacement amount information J14 represents the work displacement amount of the non-holdable work observed by the displacement amount observation unit 64 during the displacement operation of the robot 2. The reward information J15 represents the reward R set by the reward setting unit 631 for the action pattern of the robot 2 observed by the action observation unit 62.
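One way to picture the association among J11 to J15 is as a record type. The sketch below is only an illustration: the class name, field names and field types are hypothetical, the reference image data is stood in for by an identifier string, and the displacement values are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LearningResultEntry:
    """One row of learning result information (hypothetical layout)."""
    displacement_method: str                                 # J11: e.g. "first method"
    reference_image_id: str                                  # J12: stand-in for reference image data GDS
    action_pattern: str                                      # J13: e.g. "A1"
    work_displacement: Dict[str, Tuple[float, float, float]]  # J14: per-work (XD, YD, ZD)
    reward: float                                            # J15: reward R


def best_entry(entries: List[LearningResultEntry]) -> LearningResultEntry:
    """Return the entry whose action pattern received the largest reward."""
    return max(entries, key=lambda e: e.reward)


# Hypothetical contents loosely mirroring the three action patterns of FIG. 9.
jh1 = [
    LearningResultEntry("first method", "GDS", "A1", {"W2": (3.0, 4.0, 0.0)}, 100.0),
    LearningResultEntry("first method", "GDS", "A2", {"W2": (1.5, 2.0, 0.0)}, 10.0),
    LearningResultEntry("first method", "GDS", "A3", {"W2": (0.0, 0.1, 0.0)}, 0.0),
]
```

Selecting the highest-reward entry from such a list corresponds to picking the action pattern that was found to secure a holding space.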

The learning result information JH1 illustrated in FIG. 9 shows that the displacement operation of the robot 2 based on the action patterns A1, A2 and A3 (action pattern information J13) using the first method (displacement method information J11) was executed on non-holdable works arranged as represented by the reference image data GDS (reference image data information J12). For the action pattern A1 using the first method, the work displacement amount WD1 of the non-holdable work is equal to or greater than the threshold WDT (work displacement amount information J14), and a reward R of the first value R1 (= 100) is given (reward information J15). In other words, the learning unit 63 has learned the action pattern A1 using the first method as the optimal action pattern of the robot 2 for displacing non-holdable works in the arrangement corresponding to the reference image data GDS so that a holding space can be secured. Referring to FIG. 7, by analyzing each action element that defines the action pattern A1 using the first method, the learning unit 63 learns at which position on the non-holdable work W2 the tip of the one work W3 held by the claw portion 261 should be brought into contact (contact position CP), and in which direction the hand unit 26 should move (movement trajectory MT), in order to displace the non-holdable work W2 so that a holding space can be secured. The learning unit 63 has also learned that the action patterns A2 and A3 using the first method do not secure a holding space around the non-holdable work.

The learning unit 63 ends the learning process when it recognizes an action pattern to which a reward R of the first value R1 (= 100) has been given, that is, an action pattern that displaced the non-holdable work so that a holding space was secured. In the example shown in FIG. 9, the learning unit 63 ends the learning process when it recognizes the action pattern A1 using the first method, to which a reward R of the first value R1 (= 100) has been given. When the displacement operation of the robot 2 based on an action pattern given a reward R of the first value R1 (= 100) has been executed, a holding space is secured around the non-holdable work and the work can be held by the claw portion 261. Accordingly, after the holding space has been secured around the non-holdable work, the action determination unit 9 reads the existing action pattern described above from the storage unit 8, thereby determining the action pattern of the robot 2 for the work around which the holding space has been secured (step a12 in FIG. 10), and outputs the determined action pattern to the control device 4 (step a13 in FIG. 10). Under the control of the control device 4, the robot 2 then executes the continuous production operation of taking out, with the hand unit 26, the work around which the holding space has been secured from the container CN and placing the taken-out work on the pallet PL.

On the other hand, when the learning unit 63 recognizes an action pattern to which a reward R of the second value R2 (= 10) or the third value R3 (= 0) has been given, that is, an action pattern that does not secure a holding space around the non-holdable work, it determines whether the number of learning processes has reached a predetermined learning count (step a17 in FIG. 10). If action patterns given a reward R of the second value R2 (= 10) or the third value R3 (= 0) are recognized repeatedly until the predetermined learning count is reached, the learning unit 63 judges that a holding space cannot be secured around the non-holdable work and outputs work-holding-impossible information (step a18 in FIG. 10). The work-holding-impossible information indicates that the work cannot be held by the claw portion 261 of the hand unit 26. When the learning unit 63 outputs the work-holding-impossible information, the one work that was held by the claw portion 261 during the displacement operation of the robot 2 based on the action pattern using the first method is placed on the pallet PL, after which the production operation of the robot 2 is interrupted. Once the production operation of the robot 2 has been interrupted, the operator can check how the works are accommodated in the container CN and take measures such as moving any work that is presumed to be impossible for the hand unit 26 to hold.
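The termination logic described above (stop as soon as an action pattern earns R1; otherwise give up once the predetermined learning count is reached) can be sketched as a simple loop. The function name, the `execute` callback and the use of `None` to stand for the work-holding-impossible information are assumptions made for this illustration; how candidate action patterns are generated is not modeled.

```python
from typing import Callable, Iterable, Optional


def learn_until_space_secured(actions: Iterable[str],
                              execute: Callable[[str], float],
                              max_trials: int,
                              r1: float = 100.0) -> Optional[str]:
    """Try action patterns until one earns the first value R1.

    execute(action) performs the displacement operation and returns reward R.
    Returns the successful action pattern, or None (standing in for the
    work-holding-impossible information) once max_trials trials have failed.
    """
    trials = 0
    for action in actions:
        reward = execute(action)
        trials += 1
        if reward >= r1:           # holding space secured: learning ends
            return action
        if trials >= max_trials:   # step a17: predetermined learning count reached
            break
    return None                    # step a18: holding judged impossible


# Hypothetical rewards for three action patterns.
rewards = {"A3": 0.0, "A2": 10.0, "A1": 100.0}
found = learn_until_space_secured(["A3", "A2", "A1"], rewards.__getitem__, max_trials=5)
gave_up = learn_until_space_secured(["A3", "A2", "A1"], rewards.__getitem__, max_trials=2)
```

In this usage, `found` is the pattern that reached R1 within five trials, while `gave_up` is `None` because the limit of two trials was hit first.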

The learning result information JH1 generated by the learning unit 63 to represent the current learning result is referred to when the robot 2 performs displacement operations from the next time onward. For example, suppose that learning result information JH1 in which an action pattern given a reward R of the first value R1 (= 100) is registered is stored in the storage unit 8. If the determination unit 7 determines that there is a non-holdable work whose arrangement is identical or similar to the arrangement represented by the reference image data GDS registered in the learning result information JH1 stored in the storage unit 8, the learning process by the learning unit 63 for the displacement operation is omitted. In this case, the action determination unit 9 determines the action pattern of the robot 2 for the displacement operation by reading the action pattern given a reward R of the first value R1 (= 100) that is registered in the learning result information JH1 stored in the storage unit 8. The action determination unit 9 outputs the action pattern for the displacement operation of the robot 2 read from the storage unit 8 to the control device 4. On receiving the action pattern for the displacement operation, the control device 4 controls the operation of the robot 2 based on that action pattern. Under the control of the control device 4, the robot 2 displaces the non-holdable work so that a holding space for the claw portion 261 is secured around the work.

As described above, when the determination unit 7 determines that the work that is the next holding candidate for the claw portion 261 of the hand unit 26 is a non-holdable work, the learning unit 63 learns an action pattern of the robot 2 using the first method that can displace the non-holdable work so that a holding space is secured. The learning unit 63 can thereby learn the optimal action pattern of the robot 2 using the first method that makes it possible to hold a work that could not be held by the claw portion 261 of the hand unit 26. In the next production operation of the robot 2, the action determination unit 9 selects the action pattern given a reward R of the first value R1 (= 100), registered in the learning result information JH1 generated by the learning unit 63, as the action pattern for making the non-holdable work holdable by the hand unit 26. When the robot 2 performs the displacement operation in accordance with this action pattern, a holding space that allows holding by the claw portion 261 is secured around the work that could not be held by the claw portion 261 of the hand unit 26, and the work can then be held by the claw portion 261. As a result, stopping the operation of the robot 2 because of the presence of a non-holdable work can be avoided as far as possible, and the operation of taking out works from the container CN with the hand unit 26 can be continued.

Note that, in the displacement operation of the robot 2 based on an action pattern using the first method, the action pattern of the robot 2 is not limited to the one illustrated in FIG. 7 and may be, for example, the action pattern shown in FIG. 11. FIG. 11 is a diagram for explaining a modification of the action pattern of the robot 2 in the displacement operation of the first example.

In the example shown in FIG. 11, when one work W3 in the container CN is held by the claw portion 261 of the hand unit 26, no holding space is secured around the works W1 and W2, and the works W1 and W2 are therefore non-holdable works. The state variables (ΔX, ΔY, ΔZ, p, d) when the robot 2 is in the state S3 are therefore set to (0, 0, 0, 1, 0). In the example shown in FIG. 11, as in the example shown in FIG. 7, the non-holdable work W1 is located close to the inner surface of the container CN, and the non-holdable work W2 is located beside and close to the non-holdable work W1. For this reason, no holding space is secured around the non-holdable works W1 and W2.

第１手法を用いたロボット２の行動パターンとして、図１１に例示される行動Ａ４は、爪部２６１によって保持した一のワークＷ３の先端が容器ＣＮの内面に近接して配置された保持不可ワークＷ１の長手方向一端面に当接（当接位置ＣＰ）した状態で、ハンド部２６が移動（移動軌跡ＭＴ）するような行動パターンである。行動Ａ４では、ハンド部２６は、その移動途中において一のワークＷ３の先端の当接位置ＣＰが保持不可ワークＷ１から保持不可ワークＷ２へ遷移するように、保持不可ワークＷ１，Ｗ２の並列方向に関して保持不可ワークＷ１から斜めに離れる方向に移動（移動軌跡ＭＴ）する。 As an action pattern of the robot 2 using the first method, an action A4 illustrated in FIG. 11 is a non-holding work in which the tip of one work W3 held by the claw portion 261 is arranged close to the inner surface of the container CN. This is an action pattern in which the hand unit 26 moves (movement trajectory MT) in a state where the hand unit 26 is in contact with one longitudinal end surface of W1 (contact position CP). In the action A4, the hand unit 26 moves in the parallel direction of the non-holdable works W1 and W2 so that the contact position CP at the tip of one work W3 changes from the non-holdable work W1 to the non-holdable work W2 during the movement. It moves in a direction obliquely away from the non-holdable work W1 (movement locus MT).

第１手法を用いた行動パターンＡ４（行動Ａ４）に基づくロボット２の変位動作が実行されると、保持不可ワークＷ１及び保持不可ワークＷ２の双方のワークを変位させることが可能であり、少なくとも保持不可ワークＷ２については保持スペースが確保される程度に変位させることが可能である。 When the displacement operation of the robot 2 based on the action pattern A4 (action A4) using the first method is executed, both the non-holdable work W1 and the non-holdable work W2 can be displaced, and at least the non-holdable work W2 can be displaced to such an extent that a holding space is secured.

報酬設定部６３１は、上記のような、複数の保持不可ワークＷ１，Ｗ２を変位させ、少なくとも１つの保持不可ワークＷ２を保持スペースが確保される程度に変位させる行動パターンＡ４については、第１の値Ｒ１（＝１００）よりも大きな値の報酬Ｒを与えるようにしてもよい。 For an action pattern such as the action pattern A4 described above, which displaces the plurality of non-holdable works W1 and W2 and displaces at least one non-holdable work W2 to such an extent that a holding space is secured, the reward setting unit 631 may give a reward R larger than the first value R1 (= 100).

＜変位動作の第２例について＞ <Second example of the displacement operation>
図１２〜図１４を参照して、変位動作の第２例について説明する。図１２は、保持不可ワークを変位させる変位動作の第２例を説明するための図である。図１３は、第２例の変位動作において学習部６３によって生成される学習結果情報ＪＨ２を説明するための図である。図１４は、第２例の変位動作に関する機械学習装置５の動作を示すフローチャートである。第２例の変位動作は、第２手法を用いた行動パターンに基づくロボット２の変位動作である。 A second example of the displacement operation will be described with reference to FIGS. 12 to 14. FIG. 12 is a diagram for explaining the second example of the displacement operation for displacing a non-holdable work. FIG. 13 is a diagram for explaining the learning result information JH2 generated by the learning unit 63 in the displacement operation of the second example. FIG. 14 is a flowchart illustrating the operation of the machine learning device 5 for the displacement operation of the second example. The displacement operation of the second example is a displacement operation of the robot 2 based on an action pattern using the second method.

状態観測部６１は、ロボット２の状態が移行されるごとに変化する状態変数（ΔＸ，ΔＹ，ΔＺ，ｐ，ｄ）に基づいて、ロボット２の状態が状態Ｓ２から状態Ｓ３へ移行されたことを観測する（図１４のステップｂ１）。ロボット２の状態が状態Ｓ２から状態Ｓ３へ移行されたとき、判定部７は、撮像装置３から出力された基準画像データを取得する（図１４のステップｂ２）。判定部７は、基準画像データに基づいて容器ＣＮ内での各ワークの収容状況を認識し、次の保持候補となるワークが保持不可ワークであるか否かを判定する（図１４のステップｂ３）。 The state observation unit 61 observes that the state of the robot 2 has shifted from the state S2 to the state S3, based on the state variables (ΔX, ΔY, ΔZ, p, d), which change each time the state of the robot 2 shifts (step b1 in FIG. 14). When the state of the robot 2 has shifted from the state S2 to the state S3, the determination unit 7 acquires the reference image data output from the imaging device 3 (step b2 in FIG. 14). The determination unit 7 recognizes how each work is accommodated in the container CN based on the reference image data, and determines whether the work that is the next holding candidate is a non-holdable work (step b3 in FIG. 14).

次の保持候補となるワークが保持不可ワークではないと判定部７によって判定された場合には、ロボット２の状態が状態Ｓ３であるときの状態変数（ΔＸ，ΔＹ，ΔＺ，ｐ，ｄ）が（０，０，０，１，１）とされる。この場合、行動決定部９は、状態Ｓ３から状態Ｓ４へと移行させるための既存の行動パターンを記憶部８から読み出して制御装置４に向けて出力する。既存の行動パターンが入力された制御装置４は、当該既存の行動パターンに基づいて、ロボット２の動作を制御する。制御装置４の制御によってロボット２は、爪部２６１によって保持された一のワークを容器ＣＮから取り出す取り出し動作を実行する（図１４のステップｂ５）。 When the determination unit 7 determines that the work to be the next holding candidate is not a work that cannot be held, the state variables (ΔX, ΔY, ΔZ, p, d) when the state of the robot 2 is the state S3 are determined. (0, 0, 0, 1, 1). In this case, the action determining unit 9 reads an existing action pattern for shifting from the state S3 to the state S4 from the storage unit 8 and outputs the pattern to the control device 4. The control device 4 to which the existing behavior pattern has been input controls the operation of the robot 2 based on the existing behavior pattern. Under the control of the control device 4, the robot 2 executes a take-out operation of taking out one work held by the claw portion 261 from the container CN (step b5 in FIG. 14).

一方、次の保持候補となるワークが保持不可ワークであると判定部７によって判定された場合には、図１２に示すように、ロボット２の状態が状態Ｓ３であるときの状態変数（ΔＸ，ΔＹ，ΔＺ，ｐ，ｄ）が（０，０，０，１，０）とされる。図１２に示す例では、ハンド部２６の爪部２６１によって容器ＣＮ内の一のワークＷ３を保持したときに、ワークＷ１，Ｗ２の周囲に保持スペースが確保されておらず、ワークＷ１，Ｗ２が保持不可ワークとされている。なお、図１２に示す例では、図７に示す例と同様に、保持不可ワークＷ１は容器ＣＮの内面に近接して配置され、保持不可ワークＷ２は保持不可ワークＷ１の側方において当該保持不可ワークＷ１に近接して配置されている。このため、保持不可ワークＷ１，Ｗ２の周囲に保持スペースが確保されていない。 On the other hand, when the determination unit 7 determines that the work that is the next holding candidate is a non-holdable work, the state variables (ΔX, ΔY, ΔZ, p, d) when the robot 2 is in the state S3 are set to (0, 0, 0, 1, 0), as shown in FIG. 12. In the example illustrated in FIG. 12, when one work W3 in the container CN is held by the claw portion 261 of the hand unit 26, no holding space is secured around the works W1 and W2, so the works W1 and W2 are treated as non-holdable works. In the example illustrated in FIG. 12, as in the example illustrated in FIG. 7, the non-holdable work W1 is disposed close to the inner surface of the container CN, and the non-holdable work W2 is disposed beside the non-holdable work W1 and close to it. For this reason, no holding space is secured around the non-holdable works W1 and W2.
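
The branch on the state variable d described above can be sketched as follows. This is a minimal illustration only: the class name `StateVariables`, the function `next_operation`, and the string labels are hypothetical, and the sketch assumes, as the two cases in the text indicate, that d = 1 when the next holding candidate is not a non-holdable work and d = 0 when it is. The meanings of the remaining fields are defined elsewhere in the description and are not modeled here.

```python
from typing import NamedTuple


class StateVariables(NamedTuple):
    """State variables (dX, dY, dZ, p, d) of the robot; field names are hypothetical."""
    dx: float
    dy: float
    dz: float
    p: int
    d: int  # assumed: 1 = next candidate can be held, 0 = non-holdable work


def next_operation(state: StateVariables) -> str:
    """Choose the take-out operation or the displacement operation from d."""
    return "take_out" if state.d == 1 else "displace"


# State S3 with a holdable next candidate, and with a non-holdable one.
holdable = StateVariables(0, 0, 0, 1, 1)
non_holdable = StateVariables(0, 0, 0, 1, 0)
```

With these values, `next_operation(holdable)` selects the existing take-out pattern and `next_operation(non_holdable)` selects a displacement operation.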

次の保持候補となるワークＷ１，Ｗ２が保持不可ワークであると判定部７によって判定された場合、保持不可ワークであるワークＷ１及びワークＷ２の少なくともいずれか一方のワークを、爪部２６１による保持スペースが周囲に確保されるように変位させる変位動作が実行される。変位動作の第２例においては、ロボット２は、図１２に示すように、第２手法を用いた行動パターンに基づく変位動作によって保持不可ワークを変位させる。なお、第２手法は、前述したように、爪部２６１によって保持した一のワークＷ３をパレットＰＬに載置した後、ハンド部２６が爪部２６１によって容器ＣＮを保持した状態で移動することにより、容器ＣＮの移動に応じて保持不可ワークを変位させる変位手法である。 When the determination unit 7 determines that the work W1 or W2 that is the next holding candidate is a work that cannot be held, the claw unit 261 holds at least one of the work W1 and the work W2 that is the work that cannot be held. A displacement operation is performed to displace so that a space is secured around. In the second example of the displacement operation, as shown in FIG. 12, the robot 2 displaces the work that cannot be held by the displacement operation based on the behavior pattern using the second method. Note that, as described above, the second method is such that, after placing one work W3 held by the claw portion 261 on the pallet PL, the hand portion 26 moves while holding the container CN by the claw portion 261. This is a displacement method for displacing the non-holdable work in accordance with the movement of the container CN.

行動観測部６２は、第２手法を用いたロボット２の行動パターンを観測する（図１４のステップｂ４）。図１２に示す例では、第２手法を用いたロボット２の行動パターンとして、行動Ａ１、行動Ａ２及び行動Ａ３の３種の行動パターンが示されている。行動Ａ１は、ハンド部２６が爪部２６１によって容器ＣＮを保持した状態で、保持不可ワークＷ１，Ｗ２の並列方向に対して保持不可ワークＷ１，Ｗ２に近づくように傾斜する方向に、所定の移動速度パターンで移動（移動軌跡ＭＴ）するような行動パターンである。行動Ａ２は、ハンド部２６の移動時における加速度が行動Ａ１よりも遅く、移動速度パターンが異なる以外は、行動Ａ１と同様の行動パターンである。行動Ａ３は、ハンド部２６が爪部２６１によって容器ＣＮを保持した状態で、保持不可ワークＷ１，Ｗ２の並列方向に対して保持不可ワークＷ１，Ｗ２から離れるように傾斜する方向に、所定の移動速度パターンで移動（移動軌跡ＭＴ）するような行動パターンである。 The behavior observation unit 62 observes the behavior pattern of the robot 2 using the second technique (Step b4 in FIG. 14). In the example illustrated in FIG. 12, three types of action patterns of the action A1, the action A2, and the action A3 are illustrated as the action patterns of the robot 2 using the second technique. The action A1 is a predetermined movement in a direction in which the hand unit 26 holds the container CN by the claw unit 261 and inclines so as to approach the non-holdable works W1 and W2 with respect to the parallel direction of the non-holdable works W1 and W2. This is an action pattern that moves in a speed pattern (movement locus MT). The action A2 is the same as the action A1, except that the acceleration of the hand unit 26 when moving is slower than the action A1 and the moving speed pattern is different. The action A3 is a predetermined movement in a direction in which the hand unit 26 holds the container CN by the claw unit 261 and is inclined away from the non-holdable works W1 and W2 with respect to the parallel direction of the non-holdable works W1 and W2. This is an action pattern that moves in a speed pattern (movement locus MT).

行動観測部６２によって観測されるロボット２の行動パターンを規定する行動要素としては、前述の図５に示される、把持角θ、把持位置ＨＰ、第１軸２Ａにおける回転角β１及び回転速度パターン、第２軸２Ｂにおける回転角β２及び回転速度パターン、第３軸２Ｃにおける回転角β３及び回転速度パターン、第４軸２Ｄにおける回転角β４及び回転速度パターン、第５軸２Ｅにおける回転角β５及び回転速度パターン、第６軸２Ｆにおける回転角β６及び回転速度パターンが含まれる。図５に示される各行動要素は、第２手法を用いたロボット２の行動パターンにおいて、爪部２６１が容器ＣＮを保持する保持位置を決定付ける要素となり、ハンド部２６の移動軌跡ＭＴを決定付ける要素となり、ハンド部２６の移動速度パターンを決定付ける要素となる。 The behavior elements that define the behavior pattern of the robot 2 observed by the behavior observation unit 62 include the grip angle θ, the grip position HP, the rotation angle β1 and the rotation speed pattern on the first axis 2A shown in FIG. Rotation angle β2 and rotation speed pattern on second shaft 2B, rotation angle β3 and rotation speed pattern on third shaft 2C, rotation angle β4 and rotation speed pattern on fourth shaft 2D, rotation angle β5 and rotation speed on fifth shaft 2E The pattern, the rotation angle β6 on the sixth axis 2F, and the rotation speed pattern are included. Each of the action elements shown in FIG. 5 is an element that determines the holding position where the claw 261 holds the container CN in the action pattern of the robot 2 using the second method, and determines the movement trajectory MT of the hand unit 26. This is an element that determines the moving speed pattern of the hand unit 26.

第２手法を用いた行動パターンに基づく変位動作が完了すると、変位量観測部６４は、撮像装置３から出力された、変位動作後の画像データを取得する（図１４のステップｂ６）。変位動作後の画像データは、第２手法を用いたロボット２の行動パターンによって変位された後の保持不可ワークＷ１，Ｗ２に関する三次元位置情報を含む画像データとなる。変位量観測部６４は、ロボット２による変位動作前における画像データであって、判定部７が保持不可ワークＷ１，Ｗ２の存否を判定するときに参照する基準画像データと、ロボット２による変位動作後における画像データとに基づいて、保持不可ワークＷ１，Ｗ２のワーク変位量を観測する（図１４のステップｂ７）。 When the displacement operation based on the action pattern using the second method is completed, the displacement amount observation unit 64 acquires the post-displacement image data output from the imaging device 3 (step b6 in FIG. 14). The post-displacement image data includes three-dimensional position information on the non-holdable works W1 and W2 after they have been displaced by the action pattern of the robot 2 using the second method. The displacement amount observation unit 64 observes the work displacement amounts of the non-holdable works W1 and W2 based on two sets of image data: the reference image data, which is the image data from before the displacement operation by the robot 2 and which the determination unit 7 refers to when determining whether the non-holdable works W1 and W2 exist, and the image data from after the displacement operation by the robot 2 (step b7 in FIG. 14).
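
As a rough illustration of step b7, the sketch below computes a work displacement amount from three-dimensional positions taken before and after the displacement operation. The description does not state how the displacement amount is calculated, so the Euclidean distance used here is an assumption, and the function name, dictionary layout, and coordinate values are all hypothetical.

```python
import math


def work_displacement(before, after):
    """Assumed metric: Euclidean distance between the 3D positions of one work."""
    return math.dist(before, after)


# Hypothetical 3D positions extracted from the reference image data
# (before displacement) and from the post-displacement image data.
reference = {"W1": (10.0, 5.0, 0.0), "W2": (10.0, 12.0, 0.0)}
displaced = {"W1": (13.0, 9.0, 0.0), "W2": (10.0, 12.0, 0.0)}

displacements = {
    name: work_displacement(reference[name], displaced[name])
    for name in reference
}
```

In this made-up data, W1 moves by 3 in X and 4 in Y, and W2 does not move.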

図１２に示す例では、第２手法を用いた行動パターンＡ１に基づくロボット２の変位動作後において、保持不可ワークＷ１及び保持不可ワークＷ２の双方のワークが変位し、少なくとも保持不可ワークＷ２は保持スペースが確保される程度に変位している。第２手法を用いた行動パターンＡ１に基づくロボット２の変位動作によって、少なくとも保持不可ワークＷ２が保持スペースの確保が可能に変位されると、ロボット２の状態が状態Ｓ３１（変位動作後の状態）であるときの状態変数（ΔＸ，ΔＹ，ΔＺ，ｐ，ｄ）が（０，０，０，１，１）とされる。 In the example shown in FIG. 12, after the displacement operation of the robot 2 based on the action pattern A1 using the second method, both the work that cannot be held W1 and the work that cannot be held W2 are displaced, and at least the work W2 that cannot be held is held. It is displaced to the extent that space is secured. When at least the non-holdable work W2 is displaced by the displacement operation of the robot 2 based on the action pattern A1 using the second method so that a holding space can be secured, the state of the robot 2 is changed to the state S31 (state after the displacement operation). , The state variables (ΔX, ΔY, ΔZ, p, d) are set to (0, 0, 0, 1, 1).

また、図１２に示す例では、第２手法を用いた行動パターンＡ２に基づくロボット２の変位動作後において、保持不可ワークＷ１は殆ど変位していないが、保持不可ワークＷ２は保持スペースよりも小さい範囲で変位している。第２手法を用いた行動パターンＡ２に基づくロボット２の変位動作によって、保持不可ワークＷ２は変位したけれども保持スペースよりも小さい範囲の変位であるので、ロボット２の状態が状態Ｓ３２（変位動作後の状態）であるときの状態変数（ΔＸ，ΔＹ，ΔＺ，ｐ，ｄ）が（０，０，０，１，０）とされる。 In the example shown in FIG. 12, after the displacement operation of the robot 2 based on the action pattern A2 using the second method, the non-holdable work W1 is hardly displaced, but the non-holdable work W2 is smaller than the holding space. It is displaced in the range. Since the non-holdable work W2 is displaced by the displacement operation of the robot 2 based on the action pattern A2 using the second method, but is displaced in a range smaller than the holding space, the state of the robot 2 is changed to the state S32 (after the displacement operation). State variable (ΔX, ΔY, ΔZ, p, d) is (0, 0, 0, 1, 0).

また、図１２に示す例では、第２手法を用いた行動パターンＡ３に基づくロボット２の変位動作後において、保持不可ワークＷ１及び保持不可ワークＷ２の双方のワークが殆ど変位していない。第２手法を用いた行動パターンＡ３に基づくロボット２の変位動作によって、保持不可ワークＷ１及び保持不可ワークＷ２の双方のワークが殆ど変位しておらず、その周囲に保持スペースが確保されていないので、ロボット２の状態が状態Ｓ３３（変位動作後の状態）であるときの状態変数（ΔＸ，ΔＹ，ΔＺ，ｐ，ｄ）が（０，０，０，１，０）とされる。 Further, in the example shown in FIG. 12, after the displacement operation of the robot 2 based on the behavior pattern A3 using the second method, both the non-holdable work W1 and the non-holdable work W2 hardly displace. Due to the displacement operation of the robot 2 based on the action pattern A3 using the second method, both the non-holdable work W1 and the non-holdable work W2 are hardly displaced, and no holding space is secured around them. When the state of the robot 2 is the state S33 (state after the displacement operation), the state variables (ΔX, ΔY, ΔZ, p, d) are set to (0, 0, 0, 1, 0).

変位量観測部６４によって保持不可ワークＷ１，Ｗ２のワーク変位量が観測されると、学習部６３の報酬設定部６３１は、保持不可ワークＷ１，Ｗ２の少なくともいずれか一方のワーク（保持不可ワークＷ２）のワーク変位量が（閾値ＷＤＴ×０．５）以上であるか否かを判定する（図１４のステップｂ８）。更に、報酬設定部６３１は、保持不可ワークＷ２のワーク変位量が閾値ＷＤＴ以上であるか否かを判定する（図１４のステップｂ９）。報酬設定部６３１は、保持不可ワークＷ２のワーク変位量が所定の閾値ＷＤＴ以上となるロボット２の行動パターン（図１２の行動Ａ１）に対しては、第１の値Ｒ１（例えば「１００」）の報酬Ｒを与える（図１４のステップｂ１０）。報酬設定部６３１は、保持不可ワークＷ２のワーク変位量が（閾値ＷＤＴ×０．５）以上且つ閾値ＷＤＴ未満となるロボット２の行動パターン（図１２の行動Ａ２）に対しては、第１の値Ｒ１よりも小さい第２の値Ｒ２（例えば「１０」）の報酬Ｒを与える（図１４のステップｂ１５）。報酬設定部６３１は、保持不可ワークＷ２のワーク変位量が（閾値ＷＤＴ×０．５）未満となるロボット２の行動パターン（図１２の行動Ａ３）に対しては、第２の値Ｒ２よりも小さい第３の値Ｒ３（例えば「０：ゼロ」）の報酬Ｒを与える（図１４のステップｂ１４）。 When the displacement amount observation unit 64 has observed the work displacement amounts of the non-holdable works W1 and W2, the reward setting unit 631 of the learning unit 63 determines whether the work displacement amount of at least one of the non-holdable works W1 and W2 (here, the non-holdable work W2) is equal to or greater than (threshold WDT × 0.5) (step b8 in FIG. 14). The reward setting unit 631 further determines whether the work displacement amount of the non-holdable work W2 is equal to or greater than the threshold WDT (step b9 in FIG. 14). For an action pattern of the robot 2 in which the work displacement amount of the non-holdable work W2 is equal to or greater than the predetermined threshold WDT (action A1 in FIG. 12), the reward setting unit 631 gives a reward R of a first value R1 (for example, "100") (step b10 in FIG. 14). For an action pattern of the robot 2 in which the work displacement amount of the non-holdable work W2 is equal to or greater than (threshold WDT × 0.5) and less than the threshold WDT (action A2 in FIG. 12), the reward setting unit 631 gives a reward R of a second value R2 (for example, "10") smaller than the first value R1 (step b15 in FIG. 14). For an action pattern of the robot 2 in which the work displacement amount of the non-holdable work W2 is less than (threshold WDT × 0.5) (action A3 in FIG. 12), the reward setting unit 631 gives a reward R of a third value R3 (for example, "0: zero") smaller than the second value R2 (step b14 in FIG. 14).
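
The three-tier reward rule of steps b8 to b10, b14, and b15 can be written compactly as below. The thresholds and the example reward values 100, 10, and 0 come from the text; the function name `set_reward` and its parameter names are hypothetical, and the numeric value used for WDT in the usage lines is made up purely for illustration.

```python
def set_reward(displacement, wdt, r1=100, r2=10, r3=0):
    """Return the reward R for one action pattern from its work displacement amount.

    displacement >= WDT              -> R1 (holding space secured)
    0.5 * WDT <= displacement < WDT  -> R2
    displacement < 0.5 * WDT         -> R3
    """
    if displacement >= wdt:
        return r1
    if displacement >= 0.5 * wdt:
        return r2
    return r3


# Hypothetical displacements for actions A1, A2, A3 with an assumed WDT of 10.
WDT = 10.0
rewards = {name: set_reward(wd, WDT) for name, wd in
           {"A1": 12.0, "A2": 6.0, "A3": 1.0}.items()}
```

Under these assumed displacements, A1 receives R1, A2 receives R2, and A3 receives R3, matching the assignment described for FIG. 12.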

次に、学習部６３の価値関数更新部６３２は、ロボット２の行動パターンの価値Ｑ（ｓ，ａ）を規定する価値関数を、上記式（１）の更新式を用いて更新する（図１４のステップｂ１１，ｂ１６）。 Next, the value function updating unit 632 of the learning unit 63 updates the value function defining the value Q (s, a) of the behavior pattern of the robot 2 using the updating expression of the above expression (1) (FIG. 14). Steps b11 and b16).
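
Expression (1) is defined earlier in the description and is not reproduced in this passage. Purely as an assumption, the sketch below uses the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α (R + γ max Q(s', a') − Q(s, a)), to show what updating the value Q(s, a) with a reward R could look like. The learning rate `alpha`, the discount factor `gamma`, the function name, and the table layout are hypothetical and may differ from the update actually used by the value function updating unit 632.

```python
from collections import defaultdict


def update_q(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one standard Q-learning update (assumed form, not taken from the source)."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


q_table = defaultdict(float)          # all values start at 0.0
actions = ["A1", "A2", "A3"]

# Action A1 in state S3 led to state S31 and earned the first value R1 = 100.
q_a1 = update_q(q_table, "S3", "A1", 100, "S31", actions)  # 0 + 0.5 * (100 + 0 - 0) = 50.0
# Action A2 in state S3 led to state S32 and earned the second value R2 = 10.
q_a2 = update_q(q_table, "S3", "A2", 10, "S32", actions)   # 0 + 0.5 * (10 + 0 - 0) = 5.0
```

After these two updates the table ranks A1 above A2 in state S3, which is the ordering the rewards are meant to produce.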

価値関数更新部６３２によって価値関数が更新されるごとに学習部６３は、第２手法を用いて保持不可ワークＷ２を変位させる変位動作におけるロボット２の行動パターンの学習結果を表す学習結果情報ＪＨ２（図１３）を生成する。学習部６３によって生成された学習結果情報ＪＨ２は、記憶部８に記憶される。学習結果情報ＪＨ２は、前述した図９に示す学習結果情報ＪＨ１と同様に、例えば、変位手法情報Ｊ２１と、基準画像データ情報Ｊ２２と、行動パターン情報Ｊ２３と、ワーク変位量情報Ｊ２４と、報酬情報Ｊ２５とが関連付けられた情報である。 Each time the value function is updated by the value function updating unit 632, the learning unit 63 generates learning result information JH2 (FIG. 13), which represents the learning result for the action pattern of the robot 2 in the displacement operation that displaces the non-holdable work W2 using the second method. The learning result information JH2 generated by the learning unit 63 is stored in the storage unit 8. Like the learning result information JH1 shown in FIG. 9 described above, the learning result information JH2 is, for example, information in which displacement method information J21, reference image data information J22, action pattern information J23, work displacement amount information J24, and reward information J25 are associated with one another.
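
One way to picture the learning result information JH2 is as a list of records, each associating the five items J21 to J25. The sketch below is only an illustrative data layout: the class and field names are hypothetical, the work displacement values are invented, and the actual storage format in the storage unit 8 is not specified in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearningResult:
    displacement_method: str   # J21: displacement method information
    reference_image: str       # J22: reference image data information
    action_pattern: str        # J23: action pattern information
    work_displacement: float   # J24: work displacement amount information
    reward: int                # J25: reward information


# Hypothetical entries for JH2; displacement amounts are illustrative only.
jh2 = [
    LearningResult("second method", "GDS", "A1", 12.0, 100),
    LearningResult("second method", "GDS", "A2", 6.0, 10),
    LearningResult("second method", "GDS", "A3", 1.0, 0),
]

# The entry with the highest reward is the pattern learned as most suitable.
best = max(jh2, key=lambda record: record.reward)
```

Keeping the reference image data in each record is what later allows a stored action pattern to be matched against a newly observed arrangement of works.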

図１３に例示される学習結果情報ＪＨ２においては、基準画像データ情報Ｊ２２にて表される基準画像データＧＤＳに対応した配置状況の保持不可ワークに対して、第２手法（変位手法情報Ｊ２１）を用いた行動パターンＡ１，Ａ２，Ａ３（行動パターン情報Ｊ２３）に基づくロボット２の変位動作が実行されたことが示されている。そして、第２手法を用いた行動パターンＡ１は、保持不可ワークのワーク変位量ＷＤ１が閾値ＷＤＴ以上となり（ワーク変位量情報Ｊ２４）、第１の値Ｒ１（＝１００）の報酬Ｒ（報酬情報Ｊ２５）が与えられている。つまり、学習部６３は、基準画像データＧＤＳに対応した配置状況の保持不可ワークを保持スペースの確保が可能に変位させるための最適なロボット２の行動パターンとして、第２手法を用いた行動パターンＡ１を学習したことになる。図１２を参照して説明すると、学習部６３は、第２手法を用いた行動パターンＡ１を規定する各行動要素を解析することによって、爪部２６１によって容器ＣＮのどの位置を保持し（保持位置）、ハンド部２６がどの方向に、どのような移動速度パターンで移動（移動軌跡ＭＴ）すれば、保持スペースの確保が可能に保持不可ワークＷ２を変位させることができるかを学習する。また、学習部６３は、第２手法を用いた行動パターンＡ２，Ａ３については、保持不可ワークの周囲に保持スペースを確保するには至らない行動パターンであることを学習したことになる。 The learning result information JH2 illustrated in FIG. 13 shows that displacement operations of the robot 2 based on the action patterns A1, A2, and A3 (action pattern information J23) using the second method (displacement method information J21) were executed on non-holdable works in the arrangement corresponding to the reference image data GDS represented by the reference image data information J22. For the action pattern A1 using the second method, the work displacement amount WD1 of the non-holdable work is equal to or greater than the threshold WDT (work displacement amount information J24), and a reward R of the first value R1 (= 100) is given (reward information J25). In other words, the learning unit 63 has learned the action pattern A1 using the second method as the optimal action pattern of the robot 2 for displacing non-holdable works in the arrangement corresponding to the reference image data GDS so that a holding space can be secured. Referring to FIG. 12, by analyzing each action element that defines the action pattern A1 using the second method, the learning unit 63 learns which position on the container CN the claw portion 261 should hold (holding position), and in which direction and with what movement speed pattern the hand unit 26 should move (movement trajectory MT), in order to displace the non-holdable work W2 so that a holding space can be secured. The learning unit 63 has also learned that the action patterns A2 and A3 using the second method do not secure a holding space around the non-holdable work.

学習部６３は、第１の値Ｒ１（＝１００）の報酬Ｒが与えられた行動パターン、すなわち、保持スペースが確保されるように保持不可ワークを変位させた行動パターンを認識した時点で学習処理を終了する。図１３に示す例では、学習部６３は、第１の値Ｒ１（＝１００）の報酬Ｒが与えられた、第２手法を用いた行動パターンＡ１を認識した時点で学習処理を終了する。このように、第１の値Ｒ１（＝１００）の報酬Ｒが与えられた行動パターンに基づくロボット２の変位動作が実行されたときには、保持不可ワークの周囲に保持スペースが確保され、当該ワークの爪部２６１による保持が可能となる。従って、保持不可ワークの周囲に保持スペースが確保された後、行動決定部９は、前述した既存の行動パターンを記憶部８から読み出すことによって、保持スペースが確保されたワークに対するロボット２の行動パターンを決定し（図１４のステップｂ１２）、その決定した行動パターンを制御装置４に向けて出力する（図１４のステップｂ１３）。これにより、制御装置４の制御によってロボット２は、ハンド部２６によって容器ＣＮから保持スペースが確保されたワークを取り出し、その取り出したワークをパレットＰＬに載置するという、連続的な生産動作を実行する。 The learning unit 63 ends the learning process when it recognizes an action pattern that has been given a reward R of the first value R1 (= 100), that is, an action pattern that displaced the non-holdable work so that a holding space is secured. In the example illustrated in FIG. 13, the learning unit 63 ends the learning process when it recognizes the action pattern A1 using the second method, which has been given a reward R of the first value R1 (= 100). When the displacement operation of the robot 2 based on an action pattern given a reward R of the first value R1 (= 100) is executed in this way, a holding space is secured around the non-holdable work, and the work can be held by the claw portion 261. Accordingly, after the holding space has been secured around the non-holdable work, the action determining unit 9 reads the existing action pattern described above from the storage unit 8 to determine the action pattern of the robot 2 for the work around which the holding space has been secured (step b12 in FIG. 14), and outputs the determined action pattern to the control device 4 (step b13 in FIG. 14). As a result, under the control of the control device 4, the robot 2 executes a continuous production operation in which the hand unit 26 takes out of the container CN the work around which the holding space has been secured and places that work on the pallet PL.

一方、第２の値Ｒ２（＝１０）又は第３の値Ｒ３（＝０）の報酬Ｒが与えられた行動パターン、すなわち、保持不可ワークの周囲に保持スペースを確保するには至らない行動パターンを認識した場合、学習部６３は、学習処理の回数が所定の学習回数に達したか否かを判定する（図１４のステップｂ１７）。第２の値Ｒ２（＝１０）又は第３の値Ｒ３（＝０）の報酬Ｒが与えられた行動パターンの学習部６３による認識が繰り返されて、所定の学習回数に達した場合、学習部６３は、保持不可ワークの周囲に保持スペースの確保ができないと判断し、ワーク保持不可情報を出力する（図１４のステップｂ１８）。学習部６３によってワーク保持不可情報が出力された場合、第２手法を用いた行動パターンに基づくロボット２の変位動作の実行時において、爪部２６１による容器ＣＮの保持を解除した後、ロボット２の生産動作が中断される。ロボット２の生産動作が中断されると、作業者は、容器ＣＮ内におけるワークの収容状況を確認し、ハンド部２６による保持が不可能であると想定されるワークを移動させる等の処置を行えばよい。 On the other hand, when the learning unit 63 recognizes an action pattern that has been given a reward R of the second value R2 (= 10) or the third value R3 (= 0), that is, an action pattern that does not secure a holding space around the non-holdable work, it determines whether the number of learning processes has reached a predetermined number (step b17 in FIG. 14). When the learning unit 63 has repeatedly recognized action patterns given a reward R of the second value R2 (= 10) or the third value R3 (= 0) and the predetermined number of learning processes has been reached, the learning unit 63 judges that a holding space cannot be secured around the non-holdable work and outputs work-holding-impossible information (step b18 in FIG. 14). When the learning unit 63 outputs the work-holding-impossible information, the production operation of the robot 2 is interrupted after the claw portion 261 releases its hold on the container CN during execution of the displacement operation of the robot 2 based on the action pattern using the second method. Once the production operation of the robot 2 has been interrupted, an operator can check how the works are accommodated in the container CN and take measures such as moving a work that is presumed impossible for the hand unit 26 to hold.
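
The termination logic of the learning process (stop as soon as an action pattern earns R1, otherwise give up once the predetermined number of learning processes is reached) can be sketched as a simple loop. Everything named here is hypothetical: `try_pattern` stands in for executing one displacement operation and returning its reward, and returning `None` stands in for outputting the work-holding-impossible information of step b18.

```python
def learn_displacement(try_pattern, patterns, max_trials, r1=100):
    """Return the first action pattern that earns r1, or None if the trial limit is hit."""
    for trial, pattern in enumerate(patterns, start=1):
        if try_pattern(pattern) >= r1:
            return pattern            # holding space secured: learning ends
        if trial >= max_trials:
            break                     # predetermined number of learning processes reached
    return None                       # corresponds to "work holding impossible"


# Hypothetical rewards observed for each candidate action pattern.
observed = {"A3": 0, "A2": 10, "A1": 100}

found = learn_displacement(observed.get, ["A3", "A2", "A1"], max_trials=5)
gave_up = learn_displacement(observed.get, ["A3", "A2", "A1"], max_trials=2)
```

With a limit of five trials the loop reaches A1 and stops there; with a limit of two it ends after A3 and A2 without success, which is the case in which production would be interrupted for operator intervention.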

なお、学習部６３により生成された今回の学習結果を表す学習結果情報ＪＨ２は、次回以降のロボット２の変位動作の実行時に参照される。例えば、第１の値Ｒ１（＝１００）の報酬Ｒが与えられた行動パターンが登録された学習結果情報ＪＨ２が記憶部８に記憶されていることを想定する。記憶部８に記憶された学習結果情報ＪＨ２に登録されている基準画像データＧＤＳにて表される配置状況と同一又は類似した配置状況の保持不可ワークの存在が判定部７によって判定された場合、学習部６３による変位動作に関する学習処理は省略される。この場合、行動決定部９は、記憶部８に記憶された学習結果情報ＪＨ２に登録されている、第１の値Ｒ１（＝１００）の報酬Ｒが与えられた行動パターンを読み出すことによって、変位動作の実行時におけるロボット２の行動パターンを決定する。行動決定部９は、記憶部８から読み出したロボット２の変位動作時の行動パターンを制御装置４に向けて出力する。変位動作時の行動パターンが入力された制御装置４は、当該行動パターンに基づいて、ロボット２の動作を制御する。制御装置４の制御によってロボット２は、爪部２６１による保持スペースが周囲に確保されるように保持不可ワークを変位させる。 The learning result information JH2 representing the current learning result generated by the learning unit 63 is referred to when the robot 2 executes displacement operations from the next time onward. For example, assume that learning result information JH2 in which an action pattern given a reward R of the first value R1 (= 100) is registered is stored in the storage unit 8. When the determination unit 7 determines that there is a non-holdable work whose arrangement is the same as or similar to the arrangement represented by the reference image data GDS registered in the learning result information JH2 stored in the storage unit 8, the learning process for the displacement operation by the learning unit 63 is omitted. In this case, the action determining unit 9 determines the action pattern of the robot 2 for the displacement operation by reading the action pattern given a reward R of the first value R1 (= 100) that is registered in the learning result information JH2 stored in the storage unit 8. The action determining unit 9 outputs the action pattern for the displacement operation of the robot 2 read from the storage unit 8 to the control device 4. The control device 4, having received the action pattern for the displacement operation, controls the operation of the robot 2 based on that action pattern. Under the control of the control device 4, the robot 2 displaces the non-holdable work so that a holding space for the claw portion 261 is secured around it.

以上説明したように、ハンド部２６の爪部２６１による次の保持候補となるワークが保持不可ワークであることが判定部７によって判定された場合、学習部６３は、保持スペースが確保されるように保持不可ワークを変位させることが可能な、第２手法を用いたロボット２の行動パターンを学習する。これにより、学習部６３は、ハンド部２６の爪部２６１による保持が不可能とされたワークの保持を可能とする、第２手法を用いたロボット２の最適な行動パターンを学習することができる。そして、次回のロボット２の生産動作において、行動決定部９は、学習部６３により生成された学習結果情報ＪＨ２に登録された、第１の値Ｒ１（＝１００）の報酬Ｒが与えられた行動パターンを、保持不可ワークをハンド部２６によって保持可能とするための行動パターンとして決定する。この行動パターンに従ってロボット２が変位動作を実行することにより、ハンド部２６の爪部２６１による保持が不可能とされたワークの周囲に、爪部２６１による保持を可能とするための保持スペースが確保され、当該ワークの爪部２６１による保持が可能となる。このため、保持不可ワークの存在に起因してロボット２の動作を停止させることを可及的に回避することができ、ハンド部２６による容器ＣＮからのワークの取り出し動作を継続させることができる。 As described above, when the determination unit 7 determines that the work that is the next holding candidate for the claw portion 261 of the hand unit 26 is a non-holdable work, the learning unit 63 learns an action pattern of the robot 2 using the second method that can displace the non-holdable work so that a holding space is secured. The learning unit 63 can thereby learn the optimal action pattern of the robot 2 using the second method that makes it possible to hold a work that the claw portion 261 of the hand unit 26 could not hold. Then, in the next production operation of the robot 2, the action determining unit 9 selects the action pattern given a reward R of the first value R1 (= 100), which is registered in the learning result information JH2 generated by the learning unit 63, as the action pattern for making the non-holdable work holdable by the hand unit 26. When the robot 2 executes the displacement operation in accordance with this action pattern, a holding space that allows holding by the claw portion 261 is secured around the work that the claw portion 261 of the hand unit 26 could not hold, and the work can then be held by the claw portion 261. Stopping the operation of the robot 2 because of a non-holdable work can therefore be avoided as far as possible, and the operation of taking works out of the container CN with the hand unit 26 can be continued.

＜変位動作の第３例について＞ <Third example of the displacement operation>
図１５及び図１６を参照して、変位動作の第３例について説明する。図１５は、第３例の変位動作において学習部６３によって生成される学習結果情報ＪＨ３を説明するための図である。図１６は、第３例の変位動作に関する機械学習装置５の動作を示すフローチャートである。第３例では、機械学習装置５は、保持不可ワークを変位させる変位手法を切り替えながら、保持スペースの確保が可能に保持不可ワークを変位させる最適な行動パターンを学習する。 A third example of the displacement operation will be described with reference to FIGS. 15 and 16. FIG. 15 is a diagram for explaining the learning result information JH3 generated by the learning unit 63 in the displacement operation of the third example. FIG. 16 is a flowchart illustrating the operation of the machine learning device 5 for the displacement operation of the third example. In the third example, the machine learning device 5 learns the optimal action pattern for displacing the non-holdable work so that a holding space can be secured, while switching among the displacement methods used to displace the non-holdable work.

第３例によるロボット２の変位動作において試行される変位手法の種類、数、及び試行順位は、特に限定されるものではない。学習部６３は、変位手法の種類、数、及び試行順位を、予め設定する。以下では、ロボット２の変位動作において、変位手法の試行順位が、前述の図６に例示される第３手法、第４手法、第２手法、第５手法、第６手法の順位に設定されている場合について説明する。 The types, number, and trial order of the displacement methods tried in the displacement operation of the robot 2 according to the third example are not particularly limited. The learning unit 63 sets the types, number, and trial order of the displacement methods in advance. The following describes the case where, in the displacement operation of the robot 2, the trial order of the displacement methods is set to the third method, the fourth method, the second method, the fifth method, and the sixth method illustrated in FIG. 6 described above, in that order.

状態観測部６１は、ロボット２の状態が移行されるごとに変化する状態変数（ΔＸ，ΔＹ，ΔＺ，ｐ，ｄ）に基づいて、ロボット２の状態が状態Ｓ２から状態Ｓ３へ移行されたことを観測する（図１６のステップｃ１）。ロボット２の状態が状態Ｓ２から状態Ｓ３へ移行されたとき、判定部７は、撮像装置３から出力された基準画像データを取得する（図１６のステップｃ２）。判定部７は、基準画像データに基づいて容器ＣＮ内での各ワークの収容状況を認識し、次の保持候補となるワークが保持不可ワークであるか否かを判定する（図１６のステップｃ３）。 The state observation unit 61 observes that the state of the robot 2 has shifted from the state S2 to the state S3, based on the state variables (ΔX, ΔY, ΔZ, p, d), which change each time the state of the robot 2 shifts (step c1 in FIG. 16). When the state of the robot 2 has shifted from the state S2 to the state S3, the determination unit 7 acquires the reference image data output from the imaging device 3 (step c2 in FIG. 16). The determination unit 7 recognizes how each work is accommodated in the container CN based on the reference image data, and determines whether the work that is the next holding candidate is a non-holdable work (step c3 in FIG. 16).

次の保持候補となるワークが保持不可ワークではないと判定部７によって判定された場合には、ロボット２の状態が状態Ｓ３であるときの状態変数（ΔＸ，ΔＹ，ΔＺ，ｐ，ｄ）が（０，０，０，１，１）とされる。この場合、行動決定部９は、状態Ｓ３から状態Ｓ４へと移行させるための既存の行動パターンを記憶部８から読み出して制御装置４に向けて出力する。既存の行動パターンが入力された制御装置４は、当該既存の行動パターンに基づいて、ロボット２の動作を制御する。制御装置４の制御によってロボット２は、爪部２６１によって保持された一のワークを容器ＣＮから取り出す取り出し動作を実行する（図１６のステップｃ５）。 If the determination unit 7 determines that the work to be the next holding candidate is not a work that cannot be held, the state variables (ΔX, ΔY, ΔZ, p, d) when the state of the robot 2 is the state S3 are determined. (0, 0, 0, 1, 1). In this case, the action determining unit 9 reads an existing action pattern for shifting from the state S3 to the state S4 from the storage unit 8 and outputs the pattern to the control device 4. The control device 4 to which the existing behavior pattern has been input controls the operation of the robot 2 based on the existing behavior pattern. Under the control of the control device 4, the robot 2 executes a take-out operation of taking out one work held by the claw portion 261 from the container CN (step c5 in FIG. 16).

一方、次の保持候補となるワークが保持不可ワークであると判定部７によって判定された場合には、ロボット２の状態が状態Ｓ３であるときの状態変数（ΔＸ，ΔＹ，ΔＺ，ｐ，ｄ）が（０，０，０，１，０）とされる。 On the other hand, when the determination unit 7 determines that the work that is the next holding candidate is a non-holdable work, the state variables (ΔX, ΔY, ΔZ, p, d) when the robot 2 is in the state S3 are set to (0, 0, 0, 1, 0).

次の保持候補となるワークが保持不可ワークであると判定部７によって判定された場合、爪部２６１による保持スペースが周囲に確保されるように保持不可ワークを変位させる変位動作が実行される。変位動作の第３例においては、ロボット２は、まず、第３手法を用いた行動パターンに基づく変位動作によって保持不可ワークを変位させる試行を行う。なお、第３手法は、前述したように、爪部２６１によって保持した一のワークをパレットＰＬに載置した後、ハンド部２６が爪部２６１を保持不可ワークに当接させた状態で移動することにより、当該保持不可ワークを変位させる変位手法である。 When the determination unit 7 determines that the work to be the next holding candidate is a work that cannot be held, a displacement operation is performed to displace the work that cannot be held so that a space for holding the claw 261 is secured around the work. In the third example of the displacement operation, the robot 2 first performs a trial of displacing the non-holdable work by the displacement operation based on the action pattern using the third method. In the third method, as described above, after one work held by the claw portion 261 is placed on the pallet PL, the hand portion 26 moves while the claw portion 261 is in contact with the non-holdable work. This is a displacement method for displacing the work that cannot be held.

行動観測部６２は、第３手法を用いたロボット２の行動パターンを観測する（図１６のステップｃ４）。第３手法を用いた行動パターンに基づく変位動作が完了すると、変位量観測部６４は、撮像装置３から出力された、変位動作後の画像データを取得する（図１６のステップｃ６）。変位動作後の画像データは、第３手法を用いたロボット２の行動パターンによって変位された後の保持不可ワークに関する三次元位置情報を含む画像データとなる。変位量観測部６４は、ロボット２による変位動作前における画像データであって、判定部７が保持不可ワークの存否を判定するときに参照する基準画像データと、ロボット２による変位動作後における画像データとに基づいて、保持不可ワークのワーク変位量を観測する（図１６のステップｃ７）。 The behavior observation unit 62 observes the behavior pattern of the robot 2 using the third technique (Step c4 in FIG. 16). When the displacement operation based on the action pattern using the third method is completed, the displacement amount observation unit 64 acquires the image data after the displacement operation, which is output from the imaging device 3 (step c6 in FIG. 16). The image data after the displacement operation is image data including three-dimensional position information on the non-holdable work that has been displaced by the behavior pattern of the robot 2 using the third method. The displacement amount observing section 64 is image data before the displacing operation by the robot 2, reference image data to be referred to when the judging section 7 judges the presence or absence of a work that cannot be held, and image data after the displacing operation by the robot 2. Based on the above, the work displacement amount of the work that cannot be held is observed (step c7 in FIG. 16).

When the displacement amount observation unit 64 has observed the work displacement amount of the non-holdable work, the reward setting unit 631 of the learning unit 63 determines whether the work displacement amount is equal to or greater than (threshold WDT × 0.5) (step c8 in FIG. 16). The reward setting unit 631 further determines whether the work displacement amount is equal to or greater than the threshold WDT (step c9 in FIG. 16). For an action pattern of the robot 2 in which the work displacement amount is equal to or greater than the predetermined threshold WDT, the reward setting unit 631 gives a reward R of a first value R1 (for example, "100") (step c10 in FIG. 16). For an action pattern in which the work displacement amount is equal to or greater than (threshold WDT × 0.5) and less than the threshold WDT, it gives a reward R of a second value R2 (for example, "10") that is smaller than the first value R1 (step c15 in FIG. 16). For an action pattern in which the work displacement amount is less than (threshold WDT × 0.5), it gives a reward R of a third value R3 (for example, "0") that is smaller than the second value R2 (step c14 in FIG. 16).
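The three reward tiers can be sketched as follows. The example values 100, 10 and 0 come from the text; the function name `set_reward` and the numeric threshold used in the example call are illustrative assumptions.

```python
R1, R2, R3 = 100, 10, 0  # example reward values given in the description

def set_reward(displacement, wdt):
    """Map a work displacement amount to a reward using the threshold WDT."""
    if displacement >= wdt:          # enough movement to secure the holding space
        return R1
    if displacement >= 0.5 * wdt:    # at least half the threshold, but below it
        return R2
    return R3                        # less than half the threshold

# Example with a hypothetical threshold of 10 (units not specified in the text).
rewards = [set_reward(d, 10.0) for d in (12.0, 7.0, 2.0)]
```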

Next, the value function updating unit 632 of the learning unit 63 updates the value function that defines the value Q(s, a) of the action pattern of the robot 2, using the update formula of equation (1) above (steps c11 and c16 in FIG. 16).
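Equation (1) is not reproduced in this part of the description. The sketch below therefore assumes a standard tabular Q-learning update in which the reward enters as εR(s, a), as the later passages indicate. The learning rate `alpha`, the discount factor `gamma`, and the dictionary representation of Q are assumptions, not details taken from the patent.

```python
def update_q(q, s, a, reward, s_next, actions, eps=1.0, alpha=0.1, gamma=0.9):
    """One assumed Q-learning step: Q <- Q + alpha * (eps*R + gamma*max Q' - Q)."""
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (eps * reward + gamma * best_next - old)
    return q[(s, a)]

# Example: a first update from an empty table with reward R1 = 100 and eps = 1.
q_table = {}
new_value = update_q(q_table, "s0", "method_3", 100, "s1", ["method_3", "method_4"])
```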

When a reward R of the third value R3 (= 0) is given to the action pattern using the third method, the learning unit 63 determines whether the reward R of the third value R3 (= 0) has been given to that action pattern a reference number of times (for example, "20") in succession (step c17 in FIG. 16). If it has, the learning unit 63 judges that the third method is poorly suited to displacing the non-holdable work and switches the displacement method from the third method to the fourth method, which is next in the trial order (step c18 in FIG. 16). As described above, in the fourth method, after one work held by the claw portion 261 is placed on the pallet PL, the claw portion 261 holds a work WS taken out of another container, and the hand unit 26 moves with the work WS in contact with the non-holdable work, thereby displacing it.

The learning unit 63 determines whether the number of learning processes has reached a predetermined learning count (step c19 in FIG. 16). If it has not, the process returns to step c4, and the learning process for the action pattern using the fourth method selected in step c18 is carried out in the same way as for the third method. If the predetermined learning count has been reached, the learning unit 63 judges that none of the displacement methods tried so far can secure a holding space around the non-holdable work, and outputs work holding impossible information (step c20 in FIG. 16). When the learning unit 63 outputs the work holding impossible information, the production operation of the robot 2 is interrupted. The operator can then check how the works are accommodated in the container CN and take action such as moving the work that the hand unit 26 is presumed unable to hold.
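The switching logic of steps c17 to c20 can be sketched as a loop. This is one reading of the flow, assuming that each method is tried at most the reference number of times, that learning ends as soon as the first value R1 is given, and that `None` stands in for the work holding impossible information. The callback `try_once` and all names are hypothetical.

```python
def run_learning(methods, try_once, base_count=20, learning_limit=100):
    """Try displacement methods in order; return (chosen_method, history).

    history maps each finished method to (best reward, longest zero streak).
    Returns (None, history) when no method earns R1 or the limit is reached.
    """
    total = 0
    history = {}
    for method in methods:
        streak = longest = best = 0
        for _ in range(base_count):
            if total >= learning_limit:      # step c19: learning count reached
                return None, history         # step c20: holding impossible
            reward = try_once(method)
            total += 1
            best = max(best, reward)
            streak = streak + 1 if reward == 0 else 0
            longest = max(longest, streak)
            if reward == 100:                # R1: holding space secured
                break
        history[method] = (best, longest)
        if best == 100:
            return method, history
    return None, history
```

Feeding in the reward sequences described later in this section (20 zeros for the third method, 19 zeros and one "10" for the fourth, 15 zeros and then "100" for the second) selects the second method.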

以下では、所定の学習回数の範囲内で第３手法に続いて第４手法、第２手法、第５手法、第６手法の順番に、変位手法が試行されたものとして説明を続ける。 Hereinafter, the description will be continued assuming that the displacement technique has been tried in the order of the fourth technique, the second technique, the fifth technique, and the sixth technique within the range of the predetermined number of times of learning.

For the action pattern using the third method, a reward R of the third value R3 (= 0) was given the reference number of times (= 20) in succession. The learning unit 63 therefore judged that the third method is poorly suited to displacing the non-holdable work. Suppose that, for the action pattern using the fourth method tried next, a reward R of the third value R3 (= 0) is given "19" times in succession, which is fewer than the reference number, and a reward R of the second value R2 (= 10) is given "1" time. In this case, the learning unit 63 judges that the fourth method is slightly better suited to displacing the non-holdable work than the third method, but that, because no reward R of the first value R1 (= 100) has been given, it has not displaced the non-holdable work far enough to secure a holding space. The learning unit 63 therefore switches the displacement method from the fourth method to the second method, which is next in the trial order. As described above, in the second method, after one work held by the claw portion 261 is placed on the pallet PL, the hand unit 26 moves while holding the container CN with the claw portion 261, so that the non-holdable work is displaced as the container CN moves.

第４手法の次に試行された第２手法を用いた行動パターンに対しては、第３の値Ｒ３（＝０）の報酬Ｒが基準回数よりも少ない「１５」回連続して与えられ、その後、第１の値Ｒ１（＝１００）の報酬Ｒが与えられたものとする。この場合、学習部６３は、第１の値Ｒ１（＝１００）の報酬Ｒが与えられた時点で保持不可ワークの周囲に保持スペースが確保されたと判断し、学習処理を終了する。 For the behavior pattern using the second method that has been tried next to the fourth method, the reward R of the third value R3 (= 0) is continuously given “15” times smaller than the reference number, Thereafter, it is assumed that the reward R of the first value R1 (= 100) has been given. In this case, the learning unit 63 determines that the holding space is secured around the non-holdable work when the reward R of the first value R1 (= 100) is given, and ends the learning process.

When the robot 2 executes the displacement operation based on the action pattern using the second method, to which a reward R of the first value R1 (= 100) was given, a holding space is secured around the non-holdable work and the work can be held by the claw portion 261. After the holding space has been secured, the action determination unit 9 therefore reads the existing action pattern described above from the storage unit 8 to determine the action pattern of the robot 2 for the work around which the holding space has been secured (step c12 in FIG. 16), and outputs the determined action pattern to the control device 4 (step c13 in FIG. 16). Under the control of the control device 4, the robot 2 then carries out a continuous production operation in which the hand unit 26 takes the work with the secured holding space out of the container CN and places it on the pallet PL.

As described above, the learning unit 63 ended the learning process because a reward R of the first value R1 (= 100) was given to the action pattern using the second method. For the second method, however, the number of times a reward R of the third value R3 (= 0) was given is not zero: although fewer than the reference number, it was given "15" times in succession. The learning unit 63 therefore judges that the second method is not the optimal displacement method for the arrangement of the non-holdable work tried this time, as represented by the reference image data acquired by the determination unit 7. Accordingly, when the determination unit 7 determines that a non-holdable work is present in an arrangement identical or similar to the one tried this time, the learning unit 63 executes the learning process for the action pattern using the fifth method, which follows the second method in the trial order. As described above, in the fifth method, after one work held by the claw portion 261 is placed on the pallet PL, the claw portion 261 holds a dedicated jig JG, and the hand unit 26 moves with the dedicated jig JG in contact with the non-holdable work, thereby displacing it.

Suppose that, for the action pattern using the fifth method, the number of times a reward R of the third value R3 (= 0) is given is zero and a reward R of the first value R1 (= 100) is given. In this case, the learning unit 63 judges that the fifth method is the optimal displacement method for the arrangement of the non-holdable work tried this time, and ends the learning process when the reward R of the first value R1 (= 100) is given. Because the fifth method has been judged optimal for this arrangement, the sixth method, which follows the fifth method in the trial order, is not tried.

学習部６３は、変位手法を切り替えながら保持不可ワークを変位させる変位動作におけるロボット２の行動パターンの学習結果を表す学習結果情報ＪＨ３（図１５）を生成する。学習部６３によって生成された学習結果情報ＪＨ３は、記憶部８に記憶される。学習結果情報ＪＨ３は、例えば、基準画像データ情報Ｊ３１と、変位手法情報Ｊ３２と、報酬情報Ｊ３３と、報酬ゼロ連続回数情報Ｊ３４と、修正係数情報Ｊ３５とが関連付けられた情報である。 The learning unit 63 generates learning result information JH3 (FIG. 15) representing a learning result of the behavior pattern of the robot 2 in the displacement operation of displacing the non-holdable work while switching the displacement method. The learning result information JH3 generated by the learning unit 63 is stored in the storage unit 8. The learning result information JH3 is, for example, information in which reference image data information J31, displacement technique information J32, reward information J33, consecutive zero reward information J34, and correction coefficient information J35 are associated with each other.
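One way to picture the learning result information JH3 is as a record per reference image that holds one entry per displacement method. The class and field names below are hypothetical; only the five associated items (J31 to J35) come from the text, and `None` is used here to represent "not performed".

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MethodEntry:
    method: str                          # displacement method information J32
    reward: Optional[int] = None         # reward information J33
    zero_streak: Optional[int] = None    # zero-reward consecutive count J34
    correction: Optional[float] = None   # correction coefficient information J35

@dataclass
class LearningResult:
    reference_image: str                 # reference image data information J31
    entries: List[MethodEntry] = field(default_factory=list)

# Example: the third method (20 consecutive zero rewards) and the untried sixth.
jh3 = LearningResult("GDS", [MethodEntry("third", 0, 20), MethodEntry("sixth")])
```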

基準画像データ情報Ｊ３１は、判定部７が保持不可ワークの存否を判定する際に参照した基準画像データＧＤＳを表す情報である。変位手法情報Ｊ３２は、学習部６３によって設定された変位手法を表す情報である。図１５に示す例では、変位手法情報Ｊ３２として、試行順位の順に第３手法、第４手法、第２手法、第５手法及び第６手法が登録されている。報酬情報Ｊ３３は、行動観測部６２により観測されたロボット２の行動パターンに対して報酬設定部６３１が設定した報酬Ｒを表す情報である。図１５に示す例では、第３手法に対しては第３の値Ｒ３（＝０）の報酬Ｒが与えられ、第４手法に対しては第２の値Ｒ２（＝１０）の報酬Ｒが与えられ、第２手法及び第５手法に対しては第１の値Ｒ１（＝１００）の報酬Ｒが与えられたことが示されている。なお、第６手法については試行が未実施であるため、その旨を表す「未実施」が登録されている。 The reference image data information J31 is information representing the reference image data GDS that the determination unit 7 refers to when determining whether there is a work that cannot be held. The displacement method information J32 is information representing the displacement method set by the learning unit 63. In the example illustrated in FIG. 15, the third technique, the fourth technique, the second technique, the fifth technique, and the sixth technique are registered as the displacement technique information J32 in the order of the trial order. The reward information J33 is information indicating the reward R set by the reward setting unit 631 for the behavior pattern of the robot 2 observed by the behavior observation unit 62. In the example shown in FIG. 15, a reward R of a third value R3 (= 0) is given to the third method, and a reward R of a second value R2 (= 10) is given to the fourth method. It is shown that the reward R of the first value R1 (= 100) was given to the second technique and the fifth technique. Since the trial has not been performed for the sixth method, “not performed” indicating that fact has been registered.

The zero-reward consecutive count information J34 represents the number of consecutive times a reward R of the third value R3 (= 0) was given during trials of the displacement operation of the robot 2 based on the action pattern using the displacement method represented by the displacement method information J32. In the example shown in FIG. 15, this count is "20" for the third method (the same as the reference number), "19" for the fourth method, "15" for the second method, and "0" for the fifth method. For the sixth method, "not performed" is registered to indicate that no trial has been carried out.

The correction coefficient information J35 represents a correction coefficient ε, which indicates how well suited the displacement method represented by the displacement method information J32 is to displacing the non-holdable work. The learning unit 63 calculates the correction coefficient ε according to the formula "correction coefficient ε = (M - K) / M", where "M" is the reference number that serves as the standard for the number of trials of each displacement method and "K" is the number of consecutive times a reward R of the third value R3 (= 0) was given, as represented by the zero-reward consecutive count information J34. The smaller the count "K", the larger the correction coefficient ε; in other words, a larger correction coefficient ε indicates a method better suited to displacing the non-holdable work. In the example shown in FIG. 15, the correction coefficient ε is "0" for the third method, "0.05" for the fourth method, "0.25" for the second method, and "1" for the fifth method. For the sixth method, "not performed" is registered to indicate that no trial has been carried out. The correction coefficient ε calculated by the learning unit 63 is reflected in the "ε" of the update formula for the value Q(s, a) shown in equation (1) above. Until the learning unit 63 has calculated the correction coefficient ε, the "ε" in that update formula is set to "ε = 1".
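The formula ε = (M - K) / M can be checked against the FIG. 15 values with a few lines of code. The function name is hypothetical; M = 20 is the example reference number from the text.

```python
def correction_coefficient(k, m=20):
    """Correction coefficient: eps = (M - K) / M.

    k: consecutive count of zero rewards (K); m: reference number (M).
    """
    return (m - k) / m

# Consecutive zero-reward counts for the third, fourth, second and fifth methods.
eps_values = [correction_coefficient(k) for k in (20, 19, 15, 0)]
```

The resulting values are 0, 0.05, 0.25 and 1, matching the figures quoted for FIG. 15.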

The learning result information JH3 generated by the learning unit 63 is referred to the next time and thereafter when the robot 2 executes a displacement operation. When the determination unit 7 determines that a non-holdable work is present in an arrangement identical or similar to the arrangement represented by the reference image data GDS registered in the learning result information JH3 stored in the storage unit 8, the learning unit 63 refers to the reward information J33 and the correction coefficient information J35 and judges whether a new learning process is necessary. Specifically, for each displacement method represented by the displacement method information J32, the learning unit 63 multiplies the reward R represented by the reward information J33 by the correction coefficient ε represented by the correction coefficient information J35 to calculate a corrected reward value (corresponding to "εR(s, a)" in equation (1) above). Because the corrected reward value is the reward R multiplied by the correction coefficient ε, which indicates how well suited a method is to displacing the non-holdable work, it can be regarded as a reward that takes the suitability of each displacement method into account.
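As a worked example, the corrected reward value εR can be computed for each tried method from the FIG. 15 rewards and correction coefficients. The dictionary layout and method labels are illustrative assumptions.

```python
# (reward R, correction coefficient eps) per method, as listed for FIG. 15.
fig15 = {
    "third": (0, 0.0),
    "fourth": (10, 0.05),
    "second": (100, 0.25),
    "fifth": (100, 1.0),
}

def corrected_rewards(table):
    """Return eps * R for every displacement method in the table."""
    return {method: eps * reward for method, (reward, eps) in table.items()}

corrected = corrected_rewards(fig15)
```

Under these values the corrected rewards are 0, 0.5, 25 and 100, so the second and fifth methods, which both earned R1, are clearly separated.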

When the learning unit 63 recognizes, based on the learning result information JH3, a displacement method whose corrected reward value equals the reward R of the first value R1 (= 100), that is, a displacement method to which a reward R of the first value R1 (= 100) was given and whose correction coefficient ε is "1" (the fifth method in FIG. 15), it judges that this displacement method is optimal for displacing the non-holdable work and omits the learning process. In this case, the action determination unit 9 determines the action pattern using the fifth method, registered in the learning result information JH3 stored in the storage unit 8, as the action pattern of the robot 2 for the displacement operation, and outputs it to the control device 4. The control device 4 controls the operation of the robot 2 based on the received action pattern. Under the control of the control device 4, the robot 2 displaces the non-holdable work so that a holding space for the claw portion 261 is secured around it.
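The decision to omit learning can be sketched as a lookup for a method whose corrected reward value equals R1. The function name, the table layout, and the use of `None` for an untried method are assumptions made for illustration.

```python
def find_optimal_method(table, r1=100):
    """Return the first method with eps * R == R1, or None if learning is needed.

    table maps method -> (reward, eps); an untried method has (None, None).
    """
    for method, (reward, eps) in table.items():
        if reward is not None and eps * reward == r1:
            return method
    return None

fig15_table = {
    "third": (0, 0.0),
    "fourth": (10, 0.05),
    "second": (100, 0.25),
    "fifth": (100, 1.0),
    "sixth": (None, None),   # not performed
}
optimal = find_optimal_method(fig15_table)
```

With the FIG. 15 values this returns the fifth method; if no entry qualifies, the caller would fall back to a new learning process.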

When a displacement method to which a reward R of the first value R1 (= 100) was given and whose correction coefficient ε is "1" (the fifth method in FIG. 15) is recognized, the learning process may be omitted as described above. If that method has been tried only a few times, however, robustness can be improved by starting the learning process again. In that case, the learning unit 63 treats the displacement method as a candidate for the optimal method for displacing the non-holdable work (the optimal candidate method; the fifth method in the example of FIG. 15) and executes the learning process for it again. When doing so, the learning unit 63 may set the number of trials in advance and calculate the correction coefficient ε from the proportion of those trials in which a reward R of the first value R1 (= 100) was given. For example, suppose that the number of trials of the optimal candidate method is set to "3" and the determination unit 7 determines "3" times that a non-holdable work is present in an arrangement identical or similar to the arrangement represented by the reference image data GDS. Each time, the learning unit 63 selects the optimal candidate method as the displacement method and calculates the correction coefficient ε according to the number of times a reward R of the first value R1 (= 100) was given. If the optimal candidate method is tried "3" times and a reward R of the first value R1 (= 100) is given all "3" times, the learning unit 63 sets the correction coefficient ε to "1" and judges that the optimal candidate method is the optimal method for displacing the non-holdable work.
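A minimal sketch of this re-evaluation, assuming the correction coefficient is simply the fraction of the preset trials that earned R1 (the text says only that ε is calculated from that proportion; the function name is hypothetical):

```python
def reevaluate_correction(rewards, r1=100):
    """Recompute eps as the share of trials in which the reward R1 was given."""
    successes = sum(1 for r in rewards if r == r1)
    return successes / len(rewards)

# Three preset trials of the optimal candidate method, all earning R1.
eps_after_retrial = reevaluate_correction([100, 100, 100])
```

A result of 1 confirms the candidate as optimal; a lower value would indicate that the candidate is not yet reliable for this arrangement.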

また、一旦例えば上述の図１５における第５手法のように、ある変位手法が保持不可ワークを変位させるための最適な手法であると判断された後、記憶部８に記憶された学習結果情報ＪＨ３に登録されている基準画像データＧＤＳにて表される配置状況と同一又は類似した配置状況の保持不可ワークの存在が判定部７によって判定された場合、学習部６３は、図１６に示す学習処理を繰り返して実行するようにしてもよい。この場合、図１６のステップｃ４において最初に試行する変位手法として、図１５における第５手法のように一旦最適な手法であると判断された手法を用いればよいが、その後、その手法に対して第１の値Ｒ１（＝１００）の報酬Ｒが与えられなかった場合には、例えば図１５における第６手法等の他の変位手法を試行するようにしてもよい。すなわち、学習部６３は、最適と判断する手法が変わることを許容してもよい。 Further, once it is determined that a certain displacement technique is the optimal technique for displacing the non-holdable work, for example, as in the above-described fifth technique in FIG. 15, the learning result information JH3 stored in the storage unit 8 is obtained. When the determination unit 7 determines that the non-holdable work having the same or similar arrangement status as the arrangement status represented by the reference image data GDS registered in the learning unit 63 is determined, the learning unit 63 performs the learning process illustrated in FIG. May be repeatedly executed. In this case, as the displacement technique to be tried first in step c4 in FIG. 16, a technique once determined to be the optimal technique like the fifth technique in FIG. 15 may be used. If the reward R of the first value R1 (= 100) is not given, another displacement method such as the sixth method in FIG. 15 may be tried. That is, the learning unit 63 may allow a change in the method determined to be optimal.
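The retry behavior can be sketched as a reordering of the trial list so that the previously optimal method is tried first. The text only requires that other methods, such as the sixth, be tried if the first attempt does not earn R1; keeping the remaining methods in their original order is an assumption, as are the names.

```python
def trial_order(default_order, previous_best=None):
    """Put the previously optimal method first, keeping the rest in order."""
    if previous_best is None:
        return list(default_order)
    rest = [m for m in default_order if m != previous_best]
    return [previous_best] + rest

order = trial_order(["third", "fourth", "second", "fifth", "sixth"], "fifth")
```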

なお、記憶部８に記憶された学習結果情報ＪＨ３に登録されている基準画像データＧＤＳにて表される配置状況とは大きく異なる配置状況の保持不可ワークの存在が判定部７によって判定された場合、学習部６３は、予め設定した試行順位に従って各変位手法を用いた行動パターンに基づくロボット２の変位動作を試行し、その行動パターンを学習する。 In the case where the determination unit 7 determines that there is a work that cannot be held in an arrangement state that is significantly different from the arrangement state represented by the reference image data GDS registered in the learning result information JH3 stored in the storage unit 8, The learning unit 63 trials the displacement operation of the robot 2 based on the behavior pattern using each displacement method according to the trial order set in advance, and learns the behavior pattern.

As described above, when the determination unit 7 determines that the work that is the next candidate to be held by the claw portion 261 of the hand unit 26 is a non-holdable work, the learning unit 63 learns an optimal displacement method capable of displacing the non-holdable work so that a holding space is secured, and also learns the action pattern of the robot 2 using that displacement method. The learning unit 63 can thereby learn an optimal action pattern of the robot 2, using an optimal displacement method, that makes it possible to hold a work that could not be held by the claw portion 261 of the hand unit 26. In the next production operation of the robot 2, the action determination unit 9 determines, based on the learning result information JH3 generated by the learning unit 63, the action pattern using the optimal displacement method as the action pattern for enabling the hand unit 26 to hold the non-holdable work. When the robot 2 executes the displacement operation according to this action pattern, a holding space that enables holding by the claw portion 261 is secured around the work that could not be held, and the work can then be held by the claw portion 261. As a result, stopping the robot 2 because of a non-holdable work can be avoided as far as possible, and the hand unit 26 can continue taking works out of the container CN.

In the above description, each trial displaces the non-holdable work by a displacement operation based on an action pattern using a single displacement method, but a displacement operation based on an action pattern that combines a plurality of methods may also be tried. One conceivable example is a trial of a displacement operation based on an action pattern that combines the fifth method, which uses the dedicated jig JG, with the second method, which moves the container CN.

また、例えば、パレットＰＬに載置後のワークについて、ロボット２の変位動作に起因した傷等が発生しているかなどを検査し、その検査結果を加味した報酬Ｒを、ロボット２の変位動作に対応した行動パターンに与えるようにしてもよい。この場合、例えば、保持スペースの確保が可能に保持不可ワークを変位させ、且つ、傷等が発生しないような変位手法を用いた行動パターンに対しては、第１の値Ｒ１（＝１００）に所定値（例えば「１」）を加算した値の報酬Ｒを与えるようにすればよい。 Further, for example, the work after being placed on the pallet PL is inspected for any damage caused by the displacement operation of the robot 2 and the like, and a reward R considering the inspection result is added to the displacement operation of the robot 2. You may make it give to the corresponding action pattern. In this case, for example, for an action pattern using a displacement method that displaces a work that cannot be held so that a holding space can be secured and that does not generate a scratch or the like, the first value R1 (= 100) is set. What is necessary is just to give the reward R of the value which added the predetermined value (for example, "1").
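The inspection-based variant can be sketched by adding the predetermined bonus to R1 when the displacement secures the holding space and the inspection finds no damage. The bonus of 1 and the reward values come from the text; the function name, the boolean `damaged` flag, and leaving the lower tiers unchanged are assumptions.

```python
def reward_with_inspection(displacement, wdt, damaged, bonus=1):
    """Reward tiers from the threshold WDT, plus a bonus for damage-free success."""
    if displacement >= wdt:
        base = 100                      # first value R1
        return base if damaged else base + bonus
    if displacement >= 0.5 * wdt:
        return 10                       # second value R2 (unchanged, assumed)
    return 0                            # third value R3 (unchanged, assumed)

clean_success = reward_with_inspection(12.0, 10.0, damaged=False)
```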

Reference Signs List
1 Robot system
2 Robot
26 Hand unit
3 Imaging device
4 Control device
5 Machine learning device
6 Learning processing unit
61 State observation unit
62 Action observation unit
63 Learning unit
64 Displacement amount observation unit
7 Determination unit
8 Storage unit
9 Action determination unit

Claims

A machine learning device that learns the operation of a robot including a hand unit that holds and takes out works from a container accommodating a plurality of the works in a bulk state, the machine learning device comprising:
a determination unit that, before or when the hand unit holds one work in the container, recognizes the accommodation state of each work in the container and determines whether the work that is the next candidate to be held by the hand unit is a non-holdable work around which a holding space enabling holding by the hand unit is not secured;
a learning unit that learns a displacement method capable of displacing the non-holdable work so that the holding space is secured, and learns an action pattern of the robot using the displacement method; and
an action determination unit that determines an action pattern of the robot based on a learning result of the learning unit as an action pattern for enabling the hand unit to hold the non-holdable work.

The displacement method includes a plurality of different methods for displacing the non-holdable work,
The machine learning device according to claim 1, wherein the learning unit learns an action pattern of the robot in which the plurality of methods are combined.

The machine learning device according to claim 1, wherein the displacement method is a method of displacing the non-holdable work by moving the hand unit in a state where the one held work is in contact with the non-holdable work, and
the action elements that define the action pattern of the robot learned by the learning unit include an element that determines a contact position of the one work with the non-holdable work and an element that determines a movement trajectory of the hand unit.

The machine learning device according to claim 1, wherein the displacement method is a method of displacing the non-holdable work by moving the hand unit while holding the container, and
the action elements that define the action pattern of the robot learned by the learning unit include an element that determines a holding position at which the hand unit holds the container, an element that determines a movement trajectory of the hand unit, and an element that determines a moving speed of the hand unit.

A robot system comprising:
a robot having a hand unit that holds and takes out works from a container accommodating a plurality of the works in a bulk state;
the machine learning device according to any one of claims 1 to 4, which learns an operation of the robot; and
a control device that controls the operation of the robot based on a learning result of the machine learning device.