JP2017132260A

JP2017132260A - System capable of calculating optimum operating conditions in injection molding

Info

Publication number: JP2017132260A
Application number: JP2017053182A
Authority: JP
Inventors: Wataru Shiraishi (白石 亘); Kazuhiro Yamamoto (山本 和弘)
Original assignee: Fanuc Corp
Current assignee: Fanuc Corp
Priority date: 2017-03-17
Filing date: 2017-03-17
Publication date: 2017-08-03

Abstract

PROBLEM TO BE SOLVED: To provide an injection molding system capable of adjusting operating conditions, including molding conditions, in a short time and of molding under conditions of lower power consumption. SOLUTION: In a system in which a plurality of injection molding systems according to the present invention are interconnected via communication means, each of the injection molding systems comprises: a state observing section for observing a physical quantity relating to injection molding; a physical quantity data storing section for storing the physical quantity data; a reward calculating section for calculating a reward on the basis of the physical quantity data and the reward condition; an operation condition adjustment learning section for machine-learning operation condition adjustments; a learning result storing section for storing a result of machine learning by the operation condition adjustment learning section; and an operation condition adjustment amount outputting section for determining and outputting a target for operation condition adjustment and an amount of adjustment on the basis of the machine learning performed by the operation condition adjustment learning section. The individual injection molding systems send, receive, and share the data stored by the physical quantity data storing section and/or the learning result storing section. SELECTED DRAWING: Figure 2

Description

The present invention relates to an injection molding system, and more particularly to an injection molding system capable of calculating optimum operating conditions without adjustment by an operator.

When a mold for a new molded product is produced, optimum values of the operating conditions, including the molding conditions, must be determined before mass production of the molded product with that mold begins. In the work of determining the optimum operating conditions of an injection molding machine, the operator sets roughly standard operating conditions based on experience and performs molding, then adjusts the various operating conditions toward the optimum while checking the molding state by referring to process monitoring data and molded product weight measurements and by visually inspecting the molded products. The operator therefore had to spend time adjusting the various operating conditions and comparing the products molded under each condition in order to arrive at the optimum operating conditions.

Meanwhile, as conventional techniques for supporting the operator in setting molding conditions, techniques have been disclosed that store molding conditions and molding data in nonvolatile memory for comparative display, or that read out and reuse past molding conditions at the operator's request (for example, Patent Documents 1 and 2).

Patent Document 1: Japanese Patent Laid-Open No. 06-039889
Patent Document 2: Japanese Patent Laid-Open No. 11-333899

When an operator determines the operating conditions, the time needed to arrive at the optimum operating conditions depends on the operator's skill, and the level (quality) of the resulting conditions may differ from one operator to another. It is therefore difficult to determine operating conditions to the same standard every time.

In determining the operating conditions, it is also important, from the viewpoint of the manufacturing cost of the molded product in mass production, to derive operating conditions that keep power consumption during molding low. However, deriving operating conditions that reduce power consumption while keeping the quality of the molded product high is difficult even for a skilled operator.

These problems cannot be solved merely by storing and reusing the history of molding conditions as disclosed in Patent Documents 1 and 2.

Accordingly, an object of the present invention is to provide an injection molding system that can adjust operating conditions, including molding conditions, in a short time and that can perform molding under conditions of lower power consumption.

The invention according to claim 1 of the present application is a system in which a plurality of injection molding systems, each including at least one injection molding machine and having artificial intelligence that performs machine learning, are connected to one another via communication means. Each of the injection molding systems comprises: a state observation unit that, when injection molding is executed by the injection molding machine, observes physical quantities related to the injection molding being executed; a physical quantity data storage unit that stores the physical quantity data observed by the state observation unit; a reward condition setting unit that sets reward conditions for the machine learning; a reward calculation unit that calculates a reward based on the physical quantity data observed by the state observation unit and the reward conditions set in the reward condition setting unit; an operation condition adjustment learning unit that performs machine learning of operation condition adjustment based on the reward calculated by the reward calculation unit, the operating conditions set in the injection molding system, and the physical quantity data; a learning result storage unit that stores the learning result of the machine learning by the operation condition adjustment learning unit; and an operation condition adjustment amount output unit that determines and outputs the target and amount of operation condition adjustment based on the learning result of the operation condition adjustment learning unit. The injection molding systems send, receive, and share the physical quantity data stored in their physical quantity data storage units and the learning results stored in their learning result storage units.

The invention according to claim 2 of the present application is the system according to claim 1, wherein the learning result stored in the learning result storage unit is used for the learning of the operation condition adjustment learning unit.

The invention according to claim 3 of the present application is the system according to claim 1 or 2, wherein each of the injection molding systems further comprises measuring means; the physical quantity data observed by the state observation unit includes at least one of the weight and dimensions of the molded product measured by the measuring means, the appearance, length, angle, area, and volume calculated from image data of the molded product, an optical inspection result of an optical molded product, and a strength measurement result of the molded product; and the physical quantity data storage unit stores it together with the other physical quantity data as the physical quantity data of one molded product.

The invention according to claim 4 of the present application is the system according to any one of claims 1 to 3, wherein reward conditions can be input to the reward condition setting unit from a display provided on the injection molding machine.

The invention according to claim 5 of the present application is the system according to any one of claims 1 to 4, wherein the reward calculation unit gives a positive reward, according to the degree, for a contribution to at least one of stabilization of the physical quantity data, shortening of the cycle time, and energy saving.

The invention according to claim 6 of the present application is the system according to any one of claims 1 to 5, wherein the reward calculation unit gives a negative reward, according to the degree, when at least one of destabilization of the physical quantity data, lengthening of the cycle time, and increased energy consumption occurs.

The invention according to claim 7 of the present application is the system according to any one of claims 1 to 6, wherein an allowable value is set in advance for the physical quantity data, and the reward calculation unit gives a positive reward when the physical quantity data falls within the allowable value.

The invention according to claim 8 of the present application is the system according to any one of claims 1 to 7, wherein an allowable value is set in advance for the physical quantity data, and the reward calculation unit gives a negative reward, based on the amount of deviation, when the physical quantity data falls outside the allowable value.

The invention according to claim 9 of the present application is the system according to any one of claims 1 to 8, wherein a target value is set in advance for the physical quantity data, and the reward calculation unit gives a positive reward, based on the deviation between the target value and the physical quantity data, when the physical quantity data approaches the target value.

The invention according to claim 10 of the present application is the system according to any one of claims 1 to 9, wherein a target value is set in advance for the physical quantity data, and the reward calculation unit gives a negative reward, based on the deviation between the target value and the physical quantity data, when the physical quantity data moves away from the target value.

The invention according to claim 11 of the present application is the system according to any one of claims 1 to 10, wherein the reward calculation unit gives a negative reward, according to the degree, when a state indicating a molding defect occurs.

The invention according to claim 12 of the present application is the system according to claim 11, wherein the molding defect is at least one of burrs, sink marks, warpage, bubbles, short shots, flow marks, weld lines, silver streaks, color unevenness, discoloration, carbonization, contamination by impurities, deviation of the optical axis of a lens molded product beyond its allowable value, and defective molded product thickness.

The invention according to claim 13 of the present application is the system according to any one of claims 1 to 12, wherein the operating conditions machine-learned by the operation condition adjustment learning unit are at least one of mold clamping conditions, ejection conditions, injection and pressure-holding conditions, metering conditions, temperature conditions, nozzle touch conditions, resin supply conditions, mold thickness conditions, molded product removal conditions, and hot runner conditions.

The invention according to claim 14 of the present application is the system according to claim 13, further comprising a robot serving as molded product removal means for which the molded product removal conditions are set.

The invention according to claim 15 of the present application is the system according to any one of claims 1 to 14, wherein at least one of the operating conditions is varied within a predetermined range and learned by the operation condition adjustment learning unit.

In the present invention, incorporating machine learning into the calculation of the optimum operating conditions makes it possible to adjust the various operating conditions in a short time and to achieve more stable molding. It also becomes possible to mold under conditions of lower power consumption. Furthermore, by sharing the molding data and learning data obtained during adjustment across a plurality of injection molding systems and using them for machine learning, each injection molding system can achieve machine learning that yields better results.

FIG. 1 is a diagram explaining the basic concept of a reinforcement learning algorithm.
FIG. 2 is a schematic configuration diagram of the injection molding system in an embodiment of the present invention.
FIG. 3 is an example in which the injection and holding pressure data for one shot is displayed as a pressure waveform.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
In the present invention, a machine learning device serving as artificial intelligence is introduced into the injection molding system, and the various conditions in injection molding are adjusted by machine learning of the operating conditions related to injection molding. The invention thereby proposes an injection molding system that can calculate the optimum operating conditions in a short period, stabilize molding further, and contribute more to energy saving.

<1. Machine learning>
In general, machine learning is classified into various algorithms, such as supervised learning and unsupervised learning, according to purpose and conditions. The present invention aims at learning the work of determining operating conditions for a mold. In an injection molding system, some parameters cannot be measured directly in the injection environment, and it is difficult to state explicitly which action (adjustment of the operating conditions) is correct for a given state of the molded product resulting from injection. For these reasons, the invention adopts a reinforcement learning algorithm, in which the machine learning device automatically learns the actions needed to reach the goal simply by being given rewards.

FIG. 1 is a diagram explaining the basic concept of a reinforcement learning algorithm. In reinforcement learning, the agent's learning and actions proceed through interaction between the agent (machine learning device), which is the learning subject, and the environment (system to be controlled). More specifically, the following exchange takes place between the agent and the environment: (1) the agent observes the state s_t of the environment at a given time; (2) based on the observation and past learning, the agent selects an action a_t from those available to it and executes it; (3) executing a_t changes the state of the environment from s_t to the next state s_{t+1}; (4) the agent receives a reward r_{t+1} based on the change of state resulting from a_t; and (5) the agent advances its learning based on the state s_t, the action a_t, the reward r_{t+1}, and the results of past learning.

In the learning of step (5), the agent acquires a mapping of the observed state s_t, action a_t, and reward r_{t+1} as reference information for judging the amount of reward obtainable in the future. For example, if the number of states that can be taken at each time is m and the number of actions that can be taken is n, repeating actions yields an m × n two-dimensional array that stores the reward r_{t+1} for each pair of state s_t and action a_t.
Then, using a value function (evaluation function), which indicates how good the current state or action is based on the mapping obtained above, the agent updates the value function as it repeats actions and thereby learns the optimal action for each state.

The state value function is a value function that indicates how good a given state s_t is. It is expressed as a function that takes the state as its argument and, during learning through repeated actions, is updated based on the reward obtained for an action in a given state, the value of the future state reached by that action, and so on. The update formula of the state value function is defined according to the reinforcement learning algorithm. For example, in TD learning, one of the reinforcement learning algorithms, the state value function is defined by the following Equation 1. In Equation 1, α is called the learning coefficient and γ the discount rate, and they are defined in the ranges 0 < α ≤ 1 and 0 < γ ≤ 1.
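For reference, the TD learning update in its standard textbook form, using the learning coefficient α and discount rate γ defined above, can be written as follows. This is the well-known TD(0) update, offered as a likely reading of Equation 1 rather than a verbatim copy of it; the symbol V for the state value function is an assumption of this sketch.

$$
V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right]
$$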

The action value function is a value function that indicates how good an action a_t is in a given state s_t. It is expressed as a function that takes the state and the action as its arguments and, during learning through repeated actions, is updated based on the reward obtained for an action in a given state, the value of actions in the future state reached by that action, and so on. The update formula of the action value function is defined according to the reinforcement learning algorithm. For example, in Q-learning, one of the representative reinforcement learning algorithms, the action value function is defined by the following Equation 2. In Equation 2, α is called the learning coefficient and γ the discount rate, and they are defined in the ranges 0 < α ≤ 1 and 0 < γ ≤ 1.
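For reference, the Q-learning update in its standard textbook form, with the same α and γ, can be written as follows. As with the TD learning case, this is the well-known form and is offered as a likely reading of Equation 2 rather than a verbatim copy of it; the symbol Q for the action value function is an assumption of this sketch.

$$
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
$$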

As a method of storing the value function (evaluation function), besides using an approximation function or an array, a supervised learner such as a multi-valued-output SVM or a neural network that takes the state s_t and action a_t as input and outputs a value (evaluation) may be used, for example when the state s can take many values.

In the action selection of step (2), the agent uses the value function (evaluation function) built by past learning to select, in the current state s_t, the action a_t that maximizes the future reward (r_{t+1} + r_{t+2} + ...). With a state value function, this is the action that moves to the highest-valued state; with an action value function, it is the highest-valued action in that state. During learning, the agent may also select a random action with a fixed probability in step (2) in order to advance the learning (the ε-greedy method).
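The ε-greedy selection described above can be sketched in a few lines of Python. This is a minimal illustration assuming a tabular action value function; the function name `select_action`, the dictionary layout of `q_table`, and the integer encoding of states and actions are assumptions of this sketch, not part of the patent.

```python
import random


def select_action(q_table, state, n_actions, epsilon, rng=random):
    """Epsilon-greedy choice over a tabular action value function.

    q_table maps (state, action) -> estimated value; missing entries count as 0.0.
    All names here are hypothetical and chosen for illustration only.
    """
    if rng.random() < epsilon:
        # Explore: with probability epsilon, pick a random action.
        return rng.randrange(n_actions)
    # Exploit: pick the action with the highest estimated value in this state.
    values = [q_table.get((state, a), 0.0) for a in range(n_actions)]
    return max(range(n_actions), key=values.__getitem__)
```

With `epsilon=0.0` the choice is purely greedy, and with `epsilon=1.0` it is purely random; intermediate values trade off exploiting what has been learned against trying other adjustments.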

In this way, learning proceeds by repeating (1) to (5). Even when the agent is placed in a new environment after learning has finished in some environment, additional learning lets it adapt to the new environment. Therefore, when reinforcement learning is applied to the work of determining operating conditions as in the present invention, even when a mold for a new molded product is produced, the agent can build on its past learning of operating condition determination and perform additional learning with the new mold as the new environment, so that the various conditions can be adjusted in a short time.
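The cycle of steps (1) to (5) can be sketched as a small tabular Q-learning loop. The sketch below is an illustration under stated assumptions: the three-state `toy_env_step` environment is entirely hypothetical and stands in for the injection molding system, and the hyperparameter defaults are arbitrary choices, not values from the patent.

```python
import random


def q_learning(env_step, n_states, n_actions, episodes=200, steps=20,
               alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning following steps (1) to (5); returns the m x n value table."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0  # (1) observe the current state (each episode starts in state 0)
        for _ in range(steps):
            # (2) select an action: random with probability epsilon, else greedy
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=q[s].__getitem__)
            # (3) the environment moves to the next state, (4) and returns a reward
            s_next, r = env_step(s, a)
            # (5) update the action value with the Q-learning rule
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
    return q


def toy_env_step(state, action):
    """Hypothetical 3-state environment: action 1 moves toward state 2, which pays reward 1."""
    next_state = min(state + 1, 2) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == 2 else 0.0
    return next_state, reward
```

Calling `q_learning(toy_env_step, n_states=3, n_actions=2)` returns a table in which action 1 is valued above action 0 in states 0 and 1, that is, the agent has learned to move toward the rewarded state. Continuing to call the same update on a different `env_step` corresponds to the additional learning in a new environment described above.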

In reinforcement learning, a plurality of agents can also be connected via a network or the like to form a system in which information such as the state s, action a, and reward r is shared among the agents and used in each agent's learning. Each agent then learns while taking the environments of the other agents into account, and this distributed reinforcement learning makes the learning efficient. In the present invention as well, a plurality of agents (machine learning devices) that control a plurality of environments (the injection molding machines to be controlled) perform distributed machine learning while connected via a network or the like, so that the work of determining operating conditions for a mold can be learned efficiently.
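One simple way to picture the sharing of s, a, and r among agents is to pool the transitions observed by every agent and let each agent apply its own update to the pooled data. The text does not specify how the shared information is used, so the scheme below, along with the names `q_update` and `share_and_learn` and the tuple layout `(s, a, r, s_next)`, is an assumption made for illustration.

```python
def q_update(q, transition, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update to table q for a transition (s, a, r, s_next)."""
    s, a, r, s_next = transition
    q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])


def share_and_learn(agents_q, local_transitions):
    """Let every agent learn from the transitions observed by all agents.

    agents_q:          one value table (list of lists) per agent
    local_transitions: one list of (s, a, r, s_next) tuples per agent
    """
    # Pool the experience of all agents, as if exchanged over the network.
    pooled = [t for transitions in local_transitions for t in transitions]
    for q in agents_q:
        for t in pooled:
            q_update(q, t)
```

In this sketch an agent that has never tried a particular adjustment still benefits from another machine's result for it, which is the intuition behind the efficiency gain described above.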

Various reinforcement learning algorithms, such as Q-learning, the SARSA method, TD learning, and the AC method, are well known, and any of them may be adopted as the method applied to the present invention. Since each of these algorithms is well known, a detailed description of each is omitted in this specification.
The injection molding system of the present invention, into which a machine learning device is introduced, is described below based on a specific embodiment.

<2. Embodiment>
FIG. 2 is a diagram showing the schematic configuration of an injection molding system according to an embodiment of the present invention. The injection molding system 1 of this embodiment is composed of objects to be controlled, such as an injection molding machine 2, a mold 3, and other peripheral devices, and a machine learning device 20 serving as artificial intelligence that performs machine learning. Comparing the configuration shown in FIG. 2 with the elements of reinforcement learning shown in FIG. 1, the machine learning device 20 corresponds to the agent, and the whole comprising the objects to be controlled, such as the injection molding machine 2, the mold 3, and the other peripheral devices, corresponds to the environment.

The injection molding system 1 is composed of various devices, each of which may be equipped with a control device, sensors, and the like.
The control devices of the injection molding machine 2 include a mold clamping force control device 7 and a control device 9 for driving mold opening and closing, mold thickness adjustment, ejection, screw rotation, screw forward and backward movement, injection unit forward and backward movement, rotation of the feeder of a metered supply device, and the like, as well as a temperature control device for the nozzle, cylinder, and lower part of the hopper, a pressure control device, a nozzle touch force control device, and the like.

The devices related to the mold 3 include a mold temperature control device, a hot runner temperature and nozzle opening/closing control device, a movable member control device for injection compression, a core set and core pull control device, a control device for slide tables and rotary tables used in multicolor molding and the like, a material flow path switching device for multicolor molding, and a mold vibration device.

The peripheral devices include a molded product removal device (robot) 5 controlled by a control device 8, as well as an insert part insertion device, a mold insert insertion device, a foil feeding device for in-mold molding, a hoop feeding device for hoop molding, a gas injection device for gas-assisted molding, a gas injection device and a long fiber injection device for foam molding using supercritical fluid, a two-material mixing device for LIM molding, a deburring device for molded products, a runner cutting device, a molded product weight scale, a molded product strength tester, an optical inspection device for molded products, a molded product imaging device and image processing device, and a molded product transport robot.
Some of these devices are control devices that have sensors and perform closed-loop feedback control or feedforward control. Others only output data.

The machine learning device 20 that performs machine learning includes a state observation unit 21, a physical quantity data storage unit 22, a reward condition setting unit 23, a reward calculation unit 24, an operation condition adjustment learning unit 25, a learning result storage unit 26, and an operation condition adjustment amount output unit 27. The machine learning device 20 may be provided inside the injection molding machine 2 or in a personal computer or the like outside the injection molding machine 2.

The state observation unit 21 is functional means that observes the physical quantity data related to injection molding output from each device of the injection molding system 1 and acquires it into the machine learning device 20. The physical quantity data include temperature, position, speed, acceleration, current, voltage, pressure, time, image data, image analysis data, torque, force, strain, power consumption, mold opening amount, backflow amount, tie bar deformation amount, heater heating rate, weight of the molded product, strength of the molded product, dimensions of the molded product, appearance calculated from image data of the molded product, length, angle, area, and volume of each part of the molded product, optical inspection results of an optical molded product, strength measurement results of the molded product, optical axis deviation of a transparent molded product, and values calculated by arithmetic processing of these physical quantities. The set of these physical quantity data defines the state s of the environment used in the machine learning.

The physical quantity data storage unit 22 is functional means that receives and stores physical quantity data and outputs the stored data to the reward calculation unit 24 and the operation condition adjustment learning unit 25. The physical quantity data storage unit 22 stores the physical quantity data observed by the state observation unit 21 during injection molding as the physical quantity data of the one molded product formed by that injection molding. The input physical quantity data may be data acquired in the latest molding operation or data acquired in a past molding operation. Physical quantity data stored in another injection molding system 1 or in a centralized management system 30 can also be input and stored, or output.

The reward condition setting unit 23 is a functional means for setting the conditions under which rewards are given in machine learning. Rewards may be positive or negative and can be set as appropriate. The input to the reward condition setting unit 23 may come from a personal computer, tablet terminal, or the like used with the centralized management system, but allowing input through the display 6 of the injection molding machine 2 makes the setting simpler.
The reward calculation unit 24 analyzes the physical quantity data input from the state observation unit 21 or the physical quantity data storage unit 22 based on the conditions set by the reward condition setting unit 23, and outputs the calculated reward to the operation condition adjustment learning unit 25. The reward output by the reward calculation unit 24 corresponds to the reward r used for machine learning.

Examples of the reward conditions set by the reward condition setting unit 23 in this embodiment are given below.
● [Reward 1: A case where a positive reward is given for contributing to at least one of stabilization of physical quantity data, cycle time reduction, and energy saving]
Stabilization of the physical quantity data may be judged by statistically processing the data and, when the variation has been reduced, giving a positive reward according to the degree of reduction. The standard deviation is commonly used as the index of variation.
For cycle time reduction, a positive reward is given according to the degree to which the cycle time has been shortened.
For energy saving, the power consumption of the injection molding machine alone, the power consumption of the entire injection molding system, the total power consumption of a plurality of injection molding systems, or the like is used as the index, and a positive reward is given according to the degree of reduction.
Conversely, when the physical quantity data become unstable, the cycle time lengthens, or the energy consumption increases, a negative reward is given according to the degree.
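
As a minimal sketch of how Reward 1 might be computed, the following Python function compares the previous and current batches of shots and returns a signed reward. The function names, the use of relative (fractional) changes, and the weights are assumptions made for illustration; the description only requires that the reward follow the degree of improvement or deterioration.

```python
from statistics import pstdev


def relative_drop(before, after):
    """Fractional decrease from before to after; negative if the value rose."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before


def reward_1(prev_values, curr_values, prev_cycle_s, curr_cycle_s,
             prev_energy_kwh, curr_energy_kwh,
             w_stab=1.0, w_cycle=1.0, w_energy=1.0):
    """Signed reward for stabilization, cycle time, and energy (hypothetical weights)."""
    # Standard deviation is the variation index named in the description.
    stab = relative_drop(pstdev(prev_values), pstdev(curr_values))
    cycle = relative_drop(prev_cycle_s, curr_cycle_s)
    energy = relative_drop(prev_energy_kwh, curr_energy_kwh)
    # Each term is positive for an improvement and negative for a deterioration.
    return w_stab * stab + w_cycle * cycle + w_energy * energy
```

Because each term is signed, the same function also yields the negative reward described for destabilization, longer cycle times, and higher energy consumption.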

● [Reward 2: A case where allowable values are set in advance for the physical quantity data and the reward calculation unit gives a positive reward when the physical quantity data fall within the allowable values]
For example, when it is known that burrs occur on the molded product if the maximum injection pressure exceeds 200 MPa and that short shots occur if it falls below 190 MPa, allowable values such as a maximum of 200 MPa and a minimum of 190 MPa are set for the injection pressure in the injection process, and a positive reward is given when the physical quantity data fall within the allowable values. When the data fall outside the allowable values, a negative reward may be given whose magnitude increases with the amount of deviation.
In addition, as shown in FIG. 3, the screen of an injection molding machine is known to provide a function that displays the injection and holding pressure data for one shot as a pressure waveform, with time or screw position on the horizontal axis and pressure on the vertical axis, sets upper and lower limits on a plurality of sections of the waveform, and displays an alarm message or makes a pass/fail judgment when the waveform leaves those limits. The upper and lower limits set on this pressure waveform can also be used as the allowable values set by the reward condition setting unit 23, so that a reward is given for the pressure waveform.
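
The allowable-value rule can be sketched as follows. This is an illustrative assumption, not the implementation of the embodiment: the names reward_2 and waveform_reward, the fixed bonus, and the linear penalty per unit of deviation are all hypothetical choices.

```python
def reward_2(value, lower, upper, bonus=1.0, penalty_per_unit=0.1):
    """Positive reward inside [lower, upper]; penalty grows with the deviation."""
    if lower <= value <= upper:
        return bonus
    deviation = lower - value if value < lower else value - upper
    return -penalty_per_unit * deviation


def waveform_reward(samples, bands, bonus=1.0, penalty_per_unit=0.1):
    """Apply the same rule to a pressure waveform with per-sample (lower, upper) limits.

    Returns the bonus only if every sample is within its limits; otherwise
    returns the penalty of the worst excursion.
    """
    return min(reward_2(v, lo, hi, bonus, penalty_per_unit)
               for v, (lo, hi) in zip(samples, bands))
```

With the 190 MPa to 200 MPa example above, reward_2(195.0, 190.0, 200.0) returns the bonus, while a peak of 205 MPa returns a penalty proportional to the 5 MPa excess.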

● [Reward 3: A case where a target value is set in advance for the physical quantity data and, when the physical quantity data approach the target value, the reward calculation unit gives a positive reward based on the amount of deviation between the target value and the physical quantity]
A target value may be set for the molded product weight based on the mold design and the resin selection, with a larger positive reward given the closer the weight comes to the target value.
Conversely, when the physical quantity data move away from the target value, a negative reward may be given based on the amount of deviation between the target value and the physical quantity. Furthermore, if an additional negative reward is given based on the rate of change of the deviation when that rate increases, a still larger negative reward can be given when the deviation grows at an accelerating pace.
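
One way to express the target-value rule, including the extra penalty for an accelerating deviation, is sketched below. The function name, the 1/(1 + deviation) shape of the positive reward, and the weights are assumptions for illustration only.

```python
def reward_3(deviations, scale=1.0, accel_weight=0.5):
    """deviations: |target - value| for the last three shots, oldest first."""
    d_old, d_prev, d_now = deviations
    if d_now <= d_prev:
        # Approaching (or holding) the target: larger reward the smaller the deviation.
        return scale / (1.0 + d_now)
    # Moving away from the target: negative reward based on the deviation.
    reward = -scale * d_now
    growth_now, growth_before = d_now - d_prev, d_prev - d_old
    if growth_now > growth_before:
        # The rate of change of the deviation has increased: penalize further.
        reward -= accel_weight * growth_now
    return reward
```

For example, deviations of (1.0, 2.0, 3.0) give a reward of -3.0, whereas the accelerating sequence (1.0, 1.5, 3.0) gives -3.75 under these default weights.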

The allowable value setting of Reward 2 and the target value setting of Reward 3 may also be combined, with the target value set near the upper limit within the allowable values. Even for the same grade, the resin used in injection molding varies from lot to lot in molecular weight distribution and melt viscosity, so even when molding is performed within the allowable values, the probability of short shots may increase near the lower limit. To avoid this risk, the target value is set at or below the upper limit of the allowable values and close to it. Molding then stabilizes within the allowable values and near their upper limit, which reduces the probability of short shots. A plurality of reward conditions can be combined in this way.
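
The combination of allowable values and a target value might look like the following sketch. The function name and the way the two terms are added are assumptions; the description states only that the two conditions can be used together.

```python
def combined_reward(value, lower, upper, target,
                    bonus=1.0, scale=1.0, penalty_per_unit=0.1):
    """In-range bonus plus a closeness-to-target term; penalty outside the range."""
    if not lower <= target <= upper:
        raise ValueError("target must lie within the allowable values")
    if value < lower:
        return -penalty_per_unit * (lower - value)
    if value > upper:
        return -penalty_per_unit * (value - upper)
    # Inside the allowable values: reward grows as the value nears the target.
    return bonus + scale / (1.0 + abs(target - value))
```

With allowable values of 190 MPa to 200 MPa and a hypothetical target of 199 MPa, a peak of 199 MPa earns more than a peak of 191 MPa, which steers learning away from the lower limit where short shots become more likely.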

● [Reward 4: A case where a negative reward is given when a state indicating a molding defect occurs]
A negative reward is given when a molding defect is detected from an image of the molded product, from image analysis data obtained by analyzing the image, or by an optical inspection device or the like. Such defects include burrs, sink marks, warpage, bubbles, short shots, flow marks, weld lines, silver streaks, color unevenness, discoloration, carbonization, contamination by impurities, deviation of the optical axis of a lens molded product beyond its allowable value, and defective molded product thickness.
The magnitude of the negative reward may also be varied according to the severity of the defect. For example, when discoloration occurs, its degree is quantified with a color difference meter or by image analysis of a captured image, and the magnitude of the negative reward is varied accordingly.
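
A severity-weighted defect penalty could be sketched as below. The defect names, the weight table, and the convention that severity is normalized to the range 0 to 1 are hypothetical; how severity is measured (for example from a color difference meter) is outside this sketch.

```python
# Hypothetical per-defect weights; real values would be set as reward conditions.
DEFECT_WEIGHTS = {"burr": 1.0, "short_shot": 1.0, "discoloration": 0.5}


def reward_4(defects, default_weight=1.0):
    """defects: mapping of defect name to severity in [0, 1] (0 means no defect)."""
    return -sum(DEFECT_WEIGHTS.get(name, default_weight) * severity
                for name, severity in defects.items())
```

A shot with no detected defects returns 0, and a stronger discoloration returns a more negative reward than a faint one.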

Returning to FIG. 2, the operation condition adjustment learning unit 25 performs machine learning (reinforcement learning) based on the physical quantity data, the adjustments it has itself made to the operation conditions, including the molding conditions, of the injection molding system, and the reward calculated by the reward calculation unit 24. The learning results stored in the learning result storage unit 26, described later, may also be used in this machine learning (reinforcement learning).
The operation condition adjustment amount output unit 27, described later, then outputs, based on the learning result of the operation condition adjustment learning unit 25, adjustment amounts for the operation conditions of the injection molding system, such as mold clamping conditions, ejection conditions, injection and pressure-holding conditions, metering conditions, temperature conditions, nozzle touch conditions, resin supply conditions, mold thickness conditions, molded product take-out conditions, hot runner conditions, and the setting conditions of the control devices of the peripheral devices. This adjustment of the operation conditions corresponds to the action a used for machine learning.

In the machine learning performed by the operation condition adjustment learning unit 25, the state s_t is defined by the combination of physical quantity data at a certain time t; the action a_t is the adjustment of the operation conditions of the injection molding system made for the state s_t and output by the operation condition adjustment amount output unit 27, described later; and the reward r_{t+1} is the value calculated by the reward calculation unit 24 from the data obtained as a result of injection molding performed on the basis of that adjustment. The value function used for learning is determined according to the learning algorithm to be applied. For example, when Q-learning is used, learning may proceed by updating the action value function Q(s_t, a_t) according to Equation 2 given above.
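
As an illustration of the update of Q(s_t, a_t), the sketch below uses the standard one-step tabular Q-learning rule, on the assumption that Equation 2 has that form. The learning rate alpha, the discount rate gamma, the dictionary-based table, and the state and action labels are hypothetical.

```python
from collections import defaultdict


def q_update(q, s_t, a_t, r_next, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s_t, a_t) toward r_{t+1} + gamma * max_a Q(s_{t+1}, a)."""
    best_next = max(q[(s_next, a)] for a in actions)
    td_error = r_next + gamma * best_next - q[(s_t, a_t)]
    q[(s_t, a_t)] += alpha * td_error
    return q[(s_t, a_t)]


# Usage with hypothetical states and operation-condition adjustments as actions.
actions = ["rpm+10", "bp+1"]
q = defaultdict(float)            # unseen (state, action) pairs start at 0.0
q_update(q, "s0", "rpm+10", 1.0, "s1", actions)
```

Here each state label stands for a combination of physical quantity data, and each action label stands for one adjustment output by the operation condition adjustment amount output unit 27.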

During learning, each operation condition may be set to a predetermined initial value and at least one operation condition then varied within a predetermined range. For example, by performing molding while automatically raising the screw rotation speed during metering in steps of 10 rpm from an initial value of 100 rpm, or automatically raising the back pressure in steps of 1 MPa from an initial value of 5 MPa, and learning from the physical quantities captured at each step, the system can learn the combination of screw rotation speed and back pressure that minimizes power consumption within the range in which good products can be molded stably without molding defects.
Furthermore, the ε-greedy method described above may be adopted, selecting a random action with a predetermined probability to advance the learning.
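
The stepwise variation and the ε-greedy selection can be sketched as follows. The start values and step sizes come from the example above; the number of steps, the function names, and the result tuple layout are assumptions for illustration.

```python
import itertools
import random


def sweep_grid(rpm_start=100, rpm_step=10, rpm_count=5,
               bp_start=5, bp_step=1, bp_count=4):
    """All (screw rpm, back pressure in MPa) pairs to try; the counts are hypothetical."""
    rpms = [rpm_start + i * rpm_step for i in range(rpm_count)]
    bps = [bp_start + i * bp_step for i in range(bp_count)]
    return list(itertools.product(rpms, bps))


def best_setting(results):
    """results: iterable of (rpm, back_pressure_mpa, power_kwh, is_good) tuples.

    Returns the (rpm, back_pressure_mpa) of the good shot with the lowest power,
    or None if no shot was good.
    """
    good = [r for r in results if r[3]]
    return min(good, key=lambda r: r[2])[:2] if good else None


def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """q_values: mapping of action to estimated value."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))      # explore: random action
    return max(q_values, key=q_values.get)     # exploit: best known action
```

In a learning loop, sweep_grid would supply the settings to mold with, the observed power consumption and quality would be recorded per setting, and epsilon_greedy would occasionally pick a setting other than the current best so that learning does not stall.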

The molded product take-out device 5, for which the molded product take-out conditions are set, may be an articulated robot. When the molded product (green body) is very fragile, as in metal powder molding, it cannot be removed with an ordinary take-out machine, so it is advisable to use a robot that removes it at low speed with the gripping force held to the necessary minimum. However, because the take-out speed strongly affects the cycle time, the system should be made to learn to remove the molded product as fast as possible without breaking it. Whether the molded product has been damaged can be output as physical quantity data and judged by molded product weight measurement, image analysis, or the like.

The learning result storage unit 26 stores the results learned by the operation condition adjustment learning unit 25. When the operation condition adjustment learning unit 25 reuses a learning result, the stored learning result is output to the operation condition adjustment learning unit 25. As described above, the learning result may be stored as the value function corresponding to the machine learning algorithm in use, held as an approximate function, an array, or a supervised learner such as a multi-value-output SVM or neural network.
Learning results stored in another injection molding system 1 or in the centralized management system 30 can also be input to and stored in the learning result storage unit 26, and the learning results stored in the learning result storage unit 26 can be output to another injection molding system 1 or to the centralized management system 30.

The operation condition adjustment amount output unit 27 determines and outputs the target of the operation condition adjustment and the adjustment amount based on the learning result of the operation condition adjustment learning unit 25. Better learning results are obtained by changing the operating conditions of the injection molding system 1 according to the adjustment amount output from the operation condition adjustment amount output unit 27 and repeating the learning with the physical quantity data newly input to the state observation unit 21.

Machine learning can also be performed using an evaluation function that takes the physical quantity data and the operation as arguments, so that the reward is maximized. The machine learning may be carried out while acquiring the physical quantity data of the latest molding, or with previously acquired physical quantity data stored in the physical quantity data storage unit.

When performing machine learning, at least one of the operation conditions can be varied within a predetermined range while the operation condition adjustment learning unit 25 learns. Deliberately introducing variation in this way makes it possible to learn the effects of that variation efficiently.

When each of a plurality of injection molding systems 1 further includes means for communicating with the outside, the physical quantity data stored in each physical quantity data storage unit 22 and the learning results stored in each learning result storage unit 26 can be transmitted, received, and shared, so that machine learning is performed more efficiently. For example, when learning by varying operation conditions within a predetermined range, a plurality of injection molding systems 1 can each vary a different operation condition within its predetermined range while molding, and exchange physical quantity data and learning data with one another so that learning proceeds in parallel and therefore efficiently.
When data are exchanged among a plurality of injection molding systems 1, the communication may pass through a host computer such as the centralized management system 30, the injection molding systems 1 may communicate with each other directly, or a cloud may be used. Because large amounts of data may be handled, a communication means with as high a communication speed as possible is preferable.
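
One possible reading of the shared, parallel learning is that each system contributes its observed transitions to a common pool from which a value table is updated. The sketch below assumes that reading, tabular Q-learning, and hypothetical state and action labels; the description does not prescribe this particular mechanism.

```python
from collections import defaultdict


def learn_from_shared(q, shared_transitions, actions, alpha=0.1, gamma=0.9):
    """Replay (state, action, reward, next_state) tuples pooled from several systems."""
    for s, a, r, s_next in shared_transitions:
        best_next = max(q[(s_next, b)] for b in actions)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q


# Two systems vary different operation conditions and pool what they observed.
system_a = [("s0", "rpm+10", 1.0, "s1")]    # varied the screw rotation speed
system_b = [("s0", "bp+1", -1.0, "s2")]     # varied the back pressure
q = learn_from_shared(defaultdict(float), system_a + system_b, ["rpm+10", "bp+1"])
```

After the pooled update, the table holds estimates for both adjustments even though each system tried only one of them.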

Although an embodiment of the present invention has been described above, the present invention is not limited to the example embodiment described and can be implemented in various forms with appropriate modifications.

DESCRIPTION OF SYMBOLS
1 Injection molding system
2 Injection molding machine
3 Mold
5 Molded product take-out device
6 Display
7 Mold clamping force control device
8 Control device
9 Control device
20 Machine learning device
21 State observation unit
22 Physical quantity data storage unit
23 Reward condition setting unit
24 Reward calculation unit
25 Operation condition adjustment learning unit
26 Learning result storage unit
27 Operation condition adjustment amount output unit
30 Centralized management system

Claims

A system in which a plurality of injection molding systems, each having at least one injection molding machine and having artificial intelligence that performs machine learning, are connected to one another via communication means, wherein
each of the injection molding systems comprises:
a state observation unit that, when injection molding is performed by the injection molding machine, observes physical quantities related to the injection molding being performed;
a physical quantity data storage unit that stores the physical quantity data observed by the state observation unit;
a reward condition setting unit that sets reward conditions for the machine learning;
a reward calculation unit that calculates a reward based on the physical quantity data observed by the state observation unit and the reward conditions set in the reward condition setting unit;
an operation condition adjustment learning unit that performs machine learning of operation condition adjustment based on the reward calculated by the reward calculation unit, the operation conditions set in the injection molding system, and the physical quantity data;
a learning result storage unit that stores the learning result of the machine learning performed by the operation condition adjustment learning unit; and
an operation condition adjustment amount output unit that determines and outputs the target of the operation condition adjustment and the adjustment amount based on the learning result of the operation condition adjustment learning unit,
and wherein the injection molding systems transmit, receive, and share the physical quantity data stored in their respective physical quantity data storage units and the learning results stored in their respective learning result storage units.

The system according to claim 1, wherein the learning result stored in the learning result storage unit is used for learning of the operation condition adjustment learning unit.

The system according to claim 1 or 2, wherein each of the injection molding systems further comprises measuring means,
the physical quantity data observed by the state observation unit include at least one of the following, measured by the measuring means: the weight, dimensions, appearance calculated from image data, length, angle, area, and volume of the molded product, optical inspection results for an optical molded product, and molded product strength measurement results, and
the physical quantity data storage unit stores these together with the other physical quantity data as the physical quantity data of one molded product.

The system according to any one of claims 1 to 3, wherein reward conditions can be input to the reward condition setting unit from a display provided on the injection molding machine.

The system according to any one of claims 1 to 4, wherein the reward calculation unit gives a positive reward, according to the degree of contribution, for a contribution to at least one of stabilization of the physical quantity data, cycle time reduction, and energy saving.

The system according to any one of claims 1 to 5, wherein the reward calculation unit gives a negative reward, according to the degree, when at least one of destabilization of the physical quantity data, cycle time extension, and increased energy consumption occurs.

The system according to any one of claims 1 to 6, wherein an allowable value is set in advance for the physical quantity data, and the reward calculation unit gives a positive reward when the physical quantity data fall within the allowable value.

The system according to any one of claims 1 to 7, wherein an allowable value is set in advance for the physical quantity data, and the reward calculation unit gives a negative reward based on the amount of deviation when the physical quantity data fall outside the allowable value.

The system according to any one of claims 1 to 8, wherein a target value is set in advance for the physical quantity data, and, when the physical quantity data approach the target value, the reward calculation unit gives a positive reward based on the amount of deviation between the target value and the physical quantity.

The system according to any one of claims 1 to 9, wherein a target value is set in advance for the physical quantity data, and, when the physical quantity data move away from the target value, the reward calculation unit gives a negative reward based on the amount of deviation between the target value and the physical quantity.

The system according to any one of claims 1 to 10, wherein the reward calculation unit gives a negative reward, according to the degree, when a state indicating a molding defect occurs.

The system according to claim 11, wherein the molding defect is at least one of burrs, sink marks, warpage, bubbles, short shots, flow marks, weld lines, silver streaks, color unevenness, discoloration, carbonization, contamination by impurities, deviation of the optical axis of a lens molded product beyond its allowable value, and defective molded product thickness.

The system according to any one of claims 1 to 12, wherein the operation conditions learned by the operation condition adjustment learning unit are at least one of mold clamping conditions, ejection conditions, injection and pressure-holding conditions, metering conditions, temperature conditions, nozzle touch conditions, resin supply conditions, mold thickness conditions, molded product take-out conditions, and hot runner conditions.

The system according to claim 13, further comprising a robot as molded product take-out means for which the molded product take-out conditions are set.

The system according to any one of the preceding claims, wherein at least one of the operation conditions is varied within a predetermined range while the operation condition adjustment learning unit learns.