JP2019098419A

JP2019098419A - Machine learning method for learning extraction operation of powder, grain or fluid, and robot machine learning control device

Info

Publication number: JP2019098419A
Application number: JP2017228176A
Authority: JP
Inventors: Yoshio Miki (三木 良雄); Shuhei Orui (大類 脩平); Takeshi Ogino (荻野 武)
Original assignee: QP Corp; Kogakuin University
Current assignee: QP Corp; Kogakuin University
Priority date: 2017-11-28
Filing date: 2017-11-28
Publication date: 2019-06-24

Abstract

PROBLEM TO BE SOLVED: To provide a machine learning method for causing a robot to learn an operation of scooping a defined amount of powder, granules, or fluid out of a container that holds it in bulk and dispensing it into another container, and to provide a robot machine learning control device.

SOLUTION: Provided is a machine learning method for causing a robot machine learning control device to learn an operation of scooping a defined amount of an object, which is powder, granules, or fluid, from a first container into a second container by using a multi-axis robot and an extraction means installed at its tip. The machine learning method comprises: a reality learning process in which reinforcement learning is performed in a real space with the multi-axis robot and the object; and a simulation learning process in which reinforcement learning is performed in a simulation space with a pseudo multi-axis robot that simulates the multi-axis robot and a pseudo-object that simulates the object. The pseudo-object has the variation parameters possessed by the object.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a machine learning method for causing a robot to learn an operation of scooping a defined amount of powder, granules, or fluid out of a container that holds it in bulk and dispensing it into another container, and to a robot machine learning control device.

<Description of Background Art-1> A device has long been known that operates a robot using command values calculated from a three-dimensional map computed for each of a plurality of randomly placed workpieces, picks the workpieces up with a hand, and performs machine learning while associating each workpiece's three-dimensional map with the result of the picking operation (see Patent Document 1).

<Description of Background Art-2> Patent Document 1 describes a robot that picks a workpiece from a plurality of workpieces with a hand; a state quantity observation unit that observes state quantities of the robot, including output data from a three-dimensional measuring instrument that acquires a three-dimensional map of each workpiece; an operation result acquisition unit that acquires the result of the robot's picking operation; and command data that instructs the robot to perform the picking operation. It provides a system that learns manipulated variables, including the command data, in association with the state quantities and the result of the picking operation.

<Desired Technology> However, unlike the workpieces described in Patent Document 1, powder, granules, and fluid have no fixed shape, and the operation of dispensing them in portions requires quantitative accuracy. A technique that solves these problems is desired.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2017-30135

<Problems of Background Art> The present invention has been made in view of these circumstances. Its object is to provide a machine learning method for causing a robot to learn an operation of scooping a defined amount of powder, granules, or fluid out of a container that holds it in bulk and dispensing it into another container, and to provide a robot machine learning control device.

<Contents of Claim 1> To achieve this object, the present invention is understood through the following configurations. (1) The present invention is a machine learning method for causing a robot machine learning control device to learn an operation of scooping a defined amount of an object, which is powder, granules, or fluid, from a first container into a second container using a multi-axis robot and an extraction means installed at its tip. The method comprises: a reality learning process in which reinforcement learning is performed in a real space with (the multi-axis robot, the extraction means, the first container, the second container, and) the object; and a simulation learning process in which reinforcement learning is performed in a simulation space with (a pseudo multi-axis robot, a pseudo extraction means, a pseudo first container, and a pseudo second container that simulate the multi-axis robot, the extraction means, the first container, and the second container, and) a pseudo-object that simulates the object. The pseudo-object has the variation parameters possessed by the object.

<Contents of Claim 2> (2) In configuration (1) above, the variation parameters of the powder or granules include the angle of repose of the object.

<Contents of Claim 3> (3) In configuration (2) above, the variation parameters of the powder or granules include a plurality of angles of repose corresponding to the temperature and/or humidity of the object.

<Contents of Claim 4> (4) In configuration (1) above, the variation parameter of the powder or granules is the interparticle friction coefficient of the object.

<Contents of Claim 5> (5) In configuration (4) above, the variation parameters of the powder or granules include a plurality of interparticle friction coefficients corresponding to the temperature and/or humidity of the object.

<Contents of Claim 6> (6) In configuration (1) above, the variation parameter of the fluid is the viscosity or kinematic viscosity of the object.

<Contents of Claim 7> (7) The present invention is a robot machine learning control device trained by the machine learning method according to any one of (1) to (6) above.

According to the present invention, it is possible to provide a machine learning method for causing a robot to learn an operation of scooping a defined amount of powder, granules, or fluid out of a container that holds it in bulk and dispensing it into another container, and to provide a robot machine learning control device.

FIG. 1 is a block diagram showing the conceptual configuration of a machine learning method according to an embodiment of the present invention.

<Description of Embodiment-1> A mode for carrying out the present invention (hereinafter, the "embodiment") is described in detail below with reference to the accompanying drawings. The same elements are denoted by the same reference signs throughout the description of the embodiment.

<Description of Embodiment-2> First, the machine learning method of the present invention is described with reference to FIG. 1.

<Object> In the present invention, powder, granules, or fluid refers to powders such as sugar, salt, organic acid salts, and powdered spices; granules such as cereals, beans, and finely chopped vegetables; and fluids such as water, oil, vinegar, egg yolk, and soy sauce. Viscous materials such as miso and sweet bean paste are also counted as fluids. Because these materials have no fixed shape, they must be measured quantitatively when they are blended during food production. In high-mix, low-volume food production, however, the materials and their physical properties vary widely, so the measurement is difficult to automate and is often done by people.

<Multi-Axis Robot> The multi-axis robot in the real space B is not limited to a humanoid type and refers to industrial robots such as vertical articulated robots, horizontal articulated robots, Cartesian robots, and parallel link robots. A low-power vertical articulated robot with six or more axes is particularly preferable because it poses no danger even when operated in a human work area and can take over human work at high-mix food production sites while the existing work environment is kept as it is. The multi-axis robot is preferably compatible with ROS (Robot Operating System). A ROS-compatible multi-axis robot can easily be replaced with a model from a different manufacturer. A multi-axis robot that is not ROS-compatible can be handled in the same way as a ROS-compatible one by incorporating into the robot machine learning control device, described later, a driver layer that includes a coordinate axis transformation matrix, a motion speed transformation matrix, and a motion angle transformation matrix.
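
As a rough illustration of the driver layer mentioned above, the following Python sketch converts a controller-side command into a robot-specific one with a coordinate transformation matrix, a speed transformation matrix, and an angle transformation matrix. The names `DriverLayer` and `to_vendor`, the two-joint example, and all matrix values are assumptions made for illustration; the specification does not define them.

```python
import math
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


def mat_vec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply a matrix by a column vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass
class DriverLayer:
    """Hypothetical adapter between controller commands and one robot model."""
    coord: Matrix  # 4x4 homogeneous transform: controller frame -> robot frame
    speed: Matrix  # joint speed conversion (diagonal in this example)
    angle: Matrix  # joint angle conversion (diagonal in this example)

    def to_vendor(self, position, joint_speeds, joint_angles):
        x, y, z, _ = mat_vec(self.coord, list(position) + [1.0])
        return (x, y, z), mat_vec(self.speed, joint_speeds), mat_vec(self.angle, joint_angles)


# Assumed example: the robot frame is shifted 0.1 m along x, speeds are
# expressed in percent of maximum, and angles in degrees instead of radians.
rad_to_deg = math.degrees(1.0)
layer = DriverLayer(
    coord=[[1, 0, 0, 0.1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
    speed=[[100.0, 0.0], [0.0, 100.0]],
    angle=[[rad_to_deg, 0.0], [0.0, rad_to_deg]],
)
pos, speeds, angles = layer.to_vendor((0.2, 0.0, 0.3), [0.5, 0.25], [0.0, 1.0])
```

With one such layer per robot model, the operation sequence generated by the controller could stay the same when the robot is exchanged; only the three matrices would change.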

<Extraction Means> An extraction means for the object (powder, granules, or fluid) is attached to the tip of the multi-axis robot. The extraction means may be anything that can scoop out powder, granules, or fluid, such as a spoon, a rice paddle, or a cup. The extraction means has no weighing function, such as a load cell, that is precise but requires the motion to stop for weighing; it may, however, have a function that aggregates the actuator loads of the multi-axis robot in real time. With this function, feedback control can take into account the amount of the object that the extraction means is attempting to scoop. The extraction means preferably also has a quantity control function. For a spoon or cup, the quantity control function is a mechanism that uses an actuator to change the depth or shape and thereby vary the volume; for a rice paddle, it is a mechanism that varies the area of the paddle on which the object is picked up. The quantity control function allows the amount taken per scoop to be changed arbitrarily. With a spoon or cup, it also enables a leveling-off motion, which improves the quantitative accuracy of the amount taken per scoop.

<Container> The container that holds the powder, granules, or fluid may be any existing container such as a tray, bucket, pail can, drum, or bag. The first container is not particularly limited, but one with a sloped bottom is more preferable. With a sloped bottom, the powder, granules, or fluid tends to collect at the deepest point, which shortens the time needed to learn the motion of the extraction means. Even a flat-bottomed container can be given a sloped bottom by installing the container itself tilted rather than level. A weighing device is installed directly beneath the second container to obtain the actual weighing result 3b.

<Defined Amount> The defined amount in the present invention is the amount specified in a recipe for producing food or the like. The extraction means may take out the defined amount in a single scoop, or it may take it out over several scoops so that the total equals the defined amount.
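
The choice between one scoop and several can be sketched as follows. The function `plan_scoops`, the equal split, and the gram-based capacity limits are illustrative assumptions; the specification only states that the total over one or more scoops should equal the defined amount.

```python
import math


def plan_scoops(defined_amount_g: float, min_scoop_g: float, max_scoop_g: float):
    """Split a defined amount into equal scoops within the tool's adjustable range."""
    # Use as few scoops as the largest scoop size allows.
    n = max(1, math.ceil(defined_amount_g / max_scoop_g))
    per_scoop = defined_amount_g / n
    if per_scoop < min_scoop_g:
        raise ValueError("no equal split fits the scoop range")
    return [per_scoop] * n


single = plan_scoops(80.0, 10.0, 100.0)     # fits in one scoop
several = plan_scoops(250.0, 10.0, 100.0)   # needs three scoops
```

In practice each scoop would deviate from its planned share, so a controller would likely re-plan the remaining scoops from the amount still missing.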

<Robot Machine Learning Control Device> In the robot machine learning control device of the present invention, different mechanisms operate in the learning process and in the operation process. The learning process in turn consists of the reality learning process and the simulation learning process described later. In the learning process, the robot machine learning control device consists of: an operation sequence generation mechanism that generates, within a defined range, an operation sequence for the multi-axis robot (in the reality learning process) or for the pseudo multi-axis robot described later (in the simulation learning process); a measurement mechanism that obtains the actual measurement result (in the reality learning process) or the computed measurement result described later (in the simulation learning process); and a reward feedback mechanism that feeds back a more favorable reward the smaller the difference between the measurement result and the defined amount. In the operation process, the robot machine learning control device consists of at least the operation sequence generation mechanism, but the measurement mechanism, which obtains actual measurement results in the real space B, and the reward feedback mechanism may continue to run during the operation process so that learning proceeds concurrently with operation.
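
A minimal sketch of the reward feedback described above is given below. Using the negative absolute error is an assumption; the specification only requires that a smaller difference between the measurement result and the defined amount yield a more favorable reward.

```python
def reward(measured_g: float, defined_g: float) -> float:
    """Reward that grows (toward zero) as the measured amount nears the defined amount."""
    return -abs(measured_g - defined_g)


# Two hypothetical trials against a defined amount of 100 g.
close_trial = reward(98.0, 100.0)
far_trial = reward(95.0, 100.0)
```

Here `close_trial` is larger than `far_trial`, so the sequence that produced 98 g would be reinforced more strongly than the one that produced 95 g.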

<Operation Sequence Generation Mechanism> The operation sequence generation mechanism of the present invention consists of at least a limited random number generator and the learning data described later. When multiple variation parameters are held for different environmental conditions, as described later, it also includes an environmental condition collector. The limited random number generator is used when an operation sequence is generated. While the number of learning iterations is still small, the share of the operation sequence that it contributes is kept high to prevent the learning data from falling into a local optimum; once the number of iterations has grown, its share is lowered to avoid needlessly increasing the number of iterations.
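
The decreasing share of the limited random number generator resembles a decaying exploration rate in reinforcement learning. The sketch below assumes an exponential decay with a floor and treats "limited" as "bounded to a given range"; the names `random_share` and `generate_sequence` and all constants are hypothetical.

```python
import random


def random_share(iteration: int, start: float = 0.9, floor: float = 0.05,
                 decay: float = 0.99) -> float:
    """Fraction of sequence steps drawn at random; shrinks as learning proceeds."""
    return max(floor, start * decay ** iteration)


def generate_sequence(learned, bounds, iteration, rng):
    """Replace each learned step by a bounded random value with probability random_share."""
    share = random_share(iteration)
    lo, hi = bounds
    return [rng.uniform(lo, hi) if rng.random() < share else step for step in learned]


rng = random.Random(0)
learned = [0.2, 0.4, 0.6, 0.8]  # steps proposed by the learning data (assumed values)
early = generate_sequence(learned, (0.0, 1.0), iteration=0, rng=rng)
late = generate_sequence(learned, (0.0, 1.0), iteration=1000, rng=rng)
```

Early sequences are mostly random, which helps avoid a local optimum; late sequences mostly follow the learning data, which avoids spending iterations on unhelpful trials.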

<Reality Learning Process> The reality learning process of the present invention consists, in the real space B, of the operation sequence generation mechanism that generates operation sequences for the multi-axis robot, the measurement mechanism that obtains actual measurement results, and the reward feedback mechanism. The learning results of the reality learning process are accumulated in the learning data.

<Pseudo Multi-Axis Robot and Pseudo Extraction Means> The pseudo multi-axis robot of the present invention is a solid model of the multi-axis robot in the real space B, placed in the simulation space A. Likewise, the pseudo extraction means is a solid model of the extraction means in the real space B, placed in the simulation space A. Both are modeled so that, for a given operation sequence, the motion of the robot does not differ between the real space B and the simulation space A.

<Pseudo-Object> The pseudo-object of the present invention is a particle simulation model of the object in the real space B, placed in the simulation space A. It is given the variation parameters of the object in the real space B so that the behavior of the pseudo-object approximates the behavior of the object. The pseudo-object is placed inside a solid model of the first container in the real space B (the pseudo first container), is taken out within the simulation space A by the motion of the pseudo multi-axis robot and the pseudo extraction means, and is dispensed into a solid model of the second container in the real space B (the pseudo second container).

<Variation Parameter> The variation parameters of the present invention are physical parameters of the object in the real space B. When the object is powder or granules, they include at least the angle of repose or the interparticle friction coefficient; when the object is a fluid, they include at least the viscosity or kinematic viscosity. Using different variation parameters for different environmental conditions reduces the difference between the behavior of the object and the behavior of the pseudo-object, that is, it raises the simulation accuracy. The environmental conditions are at least temperature or humidity.
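
One way to hold variation parameters per environmental condition is a lookup table keyed by temperature and humidity. The sketch below is an assumption about how such a table could look; the table values are placeholders rather than measurements, and the nearest-entry rule (which mixes degrees and percent in one distance) is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariationParams:
    repose_angle_deg: float
    friction_coeff: float


# Hypothetical table for one powder:
# (temperature in deg C, relative humidity in %) -> parameters.
TABLE = {
    (20.0, 40.0): VariationParams(32.0, 0.62),
    (20.0, 80.0): VariationParams(41.0, 0.87),
    (30.0, 80.0): VariationParams(44.0, 0.97),
}


def params_for(temp_c: float, humidity_pct: float) -> VariationParams:
    """Pick the entry whose conditions are nearest to the measured ones."""
    key = min(TABLE, key=lambda k: (k[0] - temp_c) ** 2 + (k[1] - humidity_pct) ** 2)
    return TABLE[key]


dry = params_for(21.0, 45.0)
humid = params_for(29.0, 75.0)
```

The selected parameters would then be passed both to the particle simulation and, as an input, to the operation sequence generation.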

<Angle of Repose> When the object is powder or granules, the angle of repose in the present invention is the maximum slope angle at which the object remains stable without collapsing when it is piled up continuously, for example from a conveyor. It can be reproduced by a particle simulation model that takes into account the interparticle friction coefficient described later. A particle simulation that does not take the interparticle friction coefficient into account produces no angle of repose, so the robot machine learning control device cannot learn the operation of scooping out powder or granules in the simulation space A. The angle of repose may also vary with environmental conditions. It is particularly preferable to hold a plurality of angles of repose corresponding to temperature and/or humidity as variation parameters, which makes it possible for the robot machine learning control device to learn a different motion for scooping out, for example, damp sugar.
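
For an idealized, cohesionless granular pile, the angle of repose θ and the friction coefficient μ are related by tan θ = μ. The sketch below uses that textbook relation to show why zero friction gives no angle of repose. It is an idealization, not the particle simulation model of the specification: in a real particle simulation the bulk angle also depends on particle shape, rolling resistance, and cohesion.

```python
import math


def repose_angle_deg(friction_coeff: float) -> float:
    """Idealized angle of repose for a cohesionless pile: theta = atan(mu)."""
    return math.degrees(math.atan(friction_coeff))


def friction_from_repose(angle_deg: float) -> float:
    """Inverse relation: mu = tan(theta)."""
    return math.tan(math.radians(angle_deg))


no_friction = repose_angle_deg(0.0)   # the pile flattens completely
unit_friction = repose_angle_deg(1.0)
```

Under this idealization a measured angle of repose can be converted into a starting value for the friction coefficient, which would then be tuned until the simulated pile matches the real one.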

<Simulation Learning Process> The simulation learning process of the present invention consists, in the simulation space A, of the operation sequence generation mechanism that generates operation sequences for the pseudo multi-axis robot, the measurement mechanism that obtains computed measurement results, and the reward feedback mechanism. The learning results of the simulation learning process are accumulated in the learning data. Learning in the reality learning process requires the object, the containers, the multi-axis robot, and so on to be prepared, and the learning itself often requires human labor. With the simulation learning process, however, the position information of the pseudo-object can easily be reset after each learning iteration, so the robot machine learning control device can learn continuously in the simulation space A.

<Computed Measurement Result> The computed measurement result of the present invention is the calculated amount of the pseudo-object that was scooped out of the pseudo first container and dispensed into the pseudo second container using the pseudo multi-axis robot and the pseudo extraction means. The pseudo-object is a particle simulation model; the pseudo multi-axis robot, the pseudo extraction means, the pseudo first container, and the pseudo second container are solid models. The computed measurement result is obtained by having the measurement mechanism compute the motion of the pseudo-object, which is a particle simulation model, in response to the motion of the solid models.
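
One simple way to obtain such a computed measurement result is to count the simulated particles that end up inside the pseudo second container and multiply by the mass per particle. The sketch below assumes an axis-aligned box for the container interior and equal-mass particles; `Box`, `computed_amount_g`, and the coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned interior of the pseudo second container (metres)."""
    lo: tuple
    hi: tuple

    def contains(self, p) -> bool:
        return all(a <= x <= b for a, x, b in zip(self.lo, p, self.hi))


def computed_amount_g(particles, container: Box, particle_mass_g: float) -> float:
    """Total mass of the particles that came to rest inside the container."""
    return particle_mass_g * sum(1 for p in particles if container.contains(p))


second_container = Box(lo=(0.0, 0.0, 0.0), hi=(0.1, 0.1, 0.05))
final_positions = [(0.05, 0.05, 0.01), (0.5, 0.5, 0.0), (0.09, 0.01, 0.02)]
amount = computed_amount_g(final_positions, second_container, particle_mass_g=0.5)
```

Two of the three particles lie inside the box, so `amount` is 1.0 g; the particle outside the box represents material that was spilled or never scooped.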

<Learning Data> The learning data of the present invention is a function, composed of a multilayer neural network, that returns an operation sequence for a given defined amount and given variation parameters. In the learning process, for a given defined amount, variation parameters, and operation sequence, backpropagation is applied so that the difference between the obtained measurement result and the defined amount becomes smaller, and the variables inside the function are corrected (learned). Learning data that has not yet gone through any learning process is generated from random numbers.
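
The correction step can be illustrated with a deliberately tiny stand-in: a single weight instead of a multilayer network, and a central finite difference instead of backpropagation. The linear `measure` function, the constants, and the scoop-depth interpretation are all assumptions; only the idea of shrinking the squared difference between the measurement result and the defined amount comes from the text.

```python
import random


def measure(depth_mm: float, grams_per_mm: float = 4.0) -> float:
    """Stand-in for the measurement mechanism: scooped mass grows linearly with depth."""
    return grams_per_mm * depth_mm


def loss(w: float, defined_g: float) -> float:
    depth = w * defined_g  # one-weight "network": defined amount -> scoop depth
    return (measure(depth) - defined_g) ** 2


def train(w: float, defined_g: float, steps: int = 200,
          lr: float = 1e-6, h: float = 1e-4) -> float:
    """Gradient descent on the squared error, with a finite-difference gradient."""
    for _ in range(steps):
        grad = (loss(w + h, defined_g) - loss(w - h, defined_g)) / (2 * h)
        w -= lr * grad
    return w


w0 = random.Random(0).uniform(0.0, 1.0)  # untrained learning data starts random
w = train(w0, defined_g=100.0)
```

After training, `w` is close to 0.25, so the proposed depth for 100 g is about 25 mm, which the stand-in measurement maps back to about 100 g.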

<Effect of Embodiment-1> The effects of the embodiment described above are as follows. In the simulation space A of the embodiment, the pseudo multi-axis robot and the pseudo extraction means move according to the operation sequence produced by the operation sequence generation mechanism and scoop out the pseudo-object. The measurement mechanism yields the computed measurement result for the scooped amount, and the reward feedback mechanism corrects the learning data according to the difference between the computed measurement result and the defined amount. By repeating the learning process in the simulation space A in this way, the learning data can be corrected efficiently without human involvement.

<Effect of Embodiment-2> In the real space B of the embodiment, the multi-axis robot and the extraction means move according to the operation sequence produced by the operation sequence generation mechanism and scoop out the object. The measurement mechanism yields the actual measurement result for the scooped amount, and the reward feedback mechanism corrects the learning data according to the difference between the actual measurement result and the defined amount. By repeating the learning process in the real space B in this way, fine corrections to the learning data that could not be made in the simulation space A become possible.

<Effect of Embodiment-3> In the simulation space A of the embodiment, giving the pseudo-object the variation parameters of the object in the real space B reduces the difference between the computed measurement result and the actual measurement result, so that the learning data can be corrected in the simulation learning process in a way that is close to the reality learning process.

<Effect of Embodiment-4> Because the learning data of the embodiment is corrected in both the simulation learning process and the reality learning process, the number of learning iterations can be increased while the time requiring human involvement is reduced, and the robot machine learning control device can efficiently learn the operation of scooping out a defined amount of powder, granules, or fluid.
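
The benefit of combining the two learning processes can be sketched with a toy model in which the simulation is slightly wrong about how much material one unit of motion yields. The gains 4.0 (simulation) and 4.3 (real), the one-weight model, and the proportional correction rule are all assumptions chosen for illustration.

```python
def run_trials(w, gain, defined_g, tolerance_g=1.0, lr=0.0005, max_trials=10_000):
    """Repeat scoop -> measure -> correct until the error is within tolerance.

    Returns the corrected weight and the number of trials used.
    """
    for trial in range(1, max_trials + 1):
        measured = gain * (w * defined_g)
        error = defined_g - measured
        if abs(error) <= tolerance_g:
            return w, trial
        w += lr * error  # proportional correction of the single weight
    return w, max_trials


DEFINED_G = 100.0

# Reality only: every trial needs the real robot and real material.
_, real_only_trials = run_trials(0.0, gain=4.3, defined_g=DEFINED_G)

# Simulation first (no real-world cost), then fine-tuning in reality.
w_sim, sim_trials = run_trials(0.0, gain=4.0, defined_g=DEFINED_G)
w_final, real_after_sim_trials = run_trials(w_sim, gain=4.3, defined_g=DEFINED_G)
```

In this toy setting the pre-trained weight needs fewer real trials than learning from scratch, and the remaining real trials correct only the mismatch between the simulated and the real material.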

<Configuration of Modification-1> In the above embodiment, the variation parameters of the powder or granules that constitute the object may include the angle of repose and/or the interparticle friction coefficient. In that case, a particle simulation model close to the real space becomes possible, and learning efficiency improves further.

<Configuration of Modification-2> In the above embodiment, the variation parameters of the fluid that constitutes the object may include the viscosity or kinematic viscosity. In that case, a particle simulation model close to the real space becomes possible even for an object that sticks to the extraction means, such as miso, and learning efficiency improves further.

<Configuration of Modification-3> In the above embodiment, a plurality of variation parameters corresponding to environmental conditions such as temperature and/or humidity may be included. By learning efficiently across various environmental conditions, an operation sequence suited to the environmental conditions can be generated even for powders and granules, such as sugar, whose properties change with temperature and humidity.

Although the present invention has been described above using the embodiment, the technical scope of the present invention is, needless to say, not limited to the scope described in the embodiment. It is apparent to those skilled in the art that various changes or improvements can be made to the embodiment. It is also apparent from the claims that forms incorporating such changes or improvements can be included in the technical scope of the present invention.

Claims

A machine learning method for causing a robot machine learning control device to learn an operation of scooping a defined amount of an object, the object being powder, granules, or fluid, from a first container into a second container using a multi-axis robot and an extraction means installed at the tip of the multi-axis robot, the method comprising: a reality learning process of performing reinforcement learning with the object in a real space; and a simulation learning process of performing reinforcement learning in a simulation space with a pseudo-object that simulates the object, wherein the pseudo-object has a variation parameter possessed by the object.

The machine learning method according to claim 1, wherein the variation parameter of the powder or granules includes an angle of repose of the object.

The machine learning method according to claim 2, wherein the variation parameter of the powder or granules includes a plurality of angles of repose corresponding to the temperature and/or humidity of the object.

The machine learning method according to claim 1, wherein the variation parameter of the powder or granules is an interparticle friction coefficient of the object.

The machine learning method according to claim 4, wherein the variation parameter of the powder or granules includes a plurality of interparticle friction coefficients corresponding to the temperature and/or humidity of the object.

The machine learning method according to claim 1, wherein the variation parameter of the fluid is the viscosity or kinematic viscosity of the object.

A robot machine learning control device trained by the machine learning method according to any one of claims 1 to 6.