JP2020074177A

JP2020074177A - Machine learning device for learning display of operation menu, numerical control device, machine tool system, manufacturing system, and machine learning method

Info

Publication number: JP2020074177A
Application number: JP2020007640A
Authority: JP
Inventors: 友磯黒川; Yuki Kurokawa
Original assignee: Fanuc Corp
Current assignee: Fanuc Corp
Priority date: 2016-02-05
Filing date: 2020-01-21
Publication date: 2020-05-14
Anticipated expiration: 2036-02-05
Also published as: JP6898479B2

Abstract

To provide a machine learning device capable of displaying an optimum operation menu for each operator, as well as a numerical control device, a machine tool system, a manufacturing system, and a machine learning method.
SOLUTION: A machine learning device 2 that detects an operator, communicates with a database in which operator information is registered, and learns the display of an operation menu based on the operator information includes a state observation unit 21 that observes the operation history of the operation menu, and a learning unit 22 that learns the display of the operation menu based on the operation history observed by the state observation unit.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a machine learning device, a numerical control device, a machine tool system, a manufacturing system, and a machine learning method for learning the display of operation menus.

Conventionally, the operation menu of a numerical control device (NC (Numerical Control) device) that controls, for example, a machine tool is used by various people (operators). These operators include people of various positions and authority levels, such as developers and persons in charge at a machine tool builder (MTB), operators (users), and service personnel (service engineers).

In this specification, the term NC device also covers a computer numerical control device (CNC device) and the like. The operation menu of the NC device is displayed on a display device such as a liquid crystal panel with a touch (position input) function, and changes according to the content displayed on that display device. The display device that shows the operation menu may, for example, be integrated with the NC device, but it may also be located away from the NC device and connected to it by wire or wirelessly, and the operation menu on the display device may be operated near the machine tool or elsewhere.

Conventionally, it has been proposed, for example, to arrange function icons as cells in a matrix and to rearrange them into the arrangement desired by the user according to the number of times each icon is used (see, for example, Patent Document 1). It has also been proposed, when displaying the menu of a navigation device, to reorder the menu items into an order suited to the user's current situation without bothering the user (see, for example, Patent Document 2). It is further conceivable to set an authority level for each operator of a machine tool and to change the operation menu according to that authority level (see, for example, Patent Document 3).

Patent Document 1: JP 2009-181501 A
Patent Document 2: JP 2010-127814 A
Patent Document 3: JP 2009-086964 A

As described above, simply arranging menus according to how often they are used may make the desired menu hard to reach when someone other than the operator who normally runs the machine, such as a service person, operates it. Another conceivable approach is to detect the operator and then reorder the menus for each operator according to usage counts; in that case, however, the operator must first use the machine for some time so that it can learn, before a menu screen that is easy for that user to use is obtained.

Furthermore, if a situation-appropriate menu is displayed by preparing in advance a table that determines the display order of menu items from parameters, the system cannot respond dynamically to changes from the assumed situations, so the table must be rebuilt manually every time a new operator position or authority level is added.

In view of the above problems of the prior art, an object of the present invention is to provide a machine learning device, a numerical control device, a machine tool system, a manufacturing system, and a machine learning method capable of displaying an optimum operation menu for each operator.

According to a first embodiment of the present invention, there is provided a machine learning device that detects an operator, communicates with a database in which operator information is registered, and learns the display of an operation menu based on the operator information. The machine learning device includes a state observation unit that observes the operation history of the operation menu, and a learning unit that learns the display of the operation menu based on the operation history observed by the state observation unit.

The operation history of the operation menu preferably includes the number of times each operation menu is accessed and transition information of the operation menu. The state observation unit preferably further observes at least one of: information on the currently selected operation menu, information indicating whether the machine tool is in a machining operation, alarm information of the numerical control device and the machine tool, and information indicating whether a program is being edited. The machine learning device may further include a decision-making unit that refers to the operation menu display learned by the learning unit and determines the positions and order of the operation menus displayed on the display device.
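A minimal sketch of how the operation history described above (access counts plus menu-to-menu transition information) might be recorded. The class name `OperationHistory`, its methods, and the menu names used below are hypothetical; the source does not specify a data structure.

```python
from collections import Counter, defaultdict


class OperationHistory:
    """Hypothetical recorder for menu access counts and menu-to-menu transitions."""

    def __init__(self):
        self.access_counts = Counter()           # menu name -> number of accesses
        self.transitions = defaultdict(Counter)  # previous menu -> Counter of next menus
        self._last_menu = None

    def record(self, menu):
        """Record one access to `menu` and the transition from the previously used menu."""
        self.access_counts[menu] += 1
        if self._last_menu is not None:
            self.transitions[self._last_menu][menu] += 1
        self._last_menu = menu

    def most_likely_next(self, menu):
        """Return the menu most often opened right after `menu`, or None if never seen."""
        followers = self.transitions.get(menu)  # .get() does not create an empty entry
        if not followers:
            return None
        return followers.most_common(1)[0][0]
```

For example, after recording `POS, PROG, EDIT, PROG, EDIT`, the access count of `PROG` is 2 and the most likely menu after `PROG` is `EDIT`.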

The learning unit may include a reward calculation unit that calculates a reward based on the output of the state observation unit, and a value function updating unit that, based on the outputs of the state observation unit and the reward calculation unit, updates according to the reward a value function that determines the value of the positions and order of the operation menus displayed on the display device. The reward calculation unit preferably gives a positive reward when an operation is performed from a menu placed at an easily accessible position and order, and a negative reward when an operation is performed from a menu placed at a hard-to-access position and order.
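The reward rule above can be sketched as follows. The function `menu_reward`, the assumption that "easy to access" means one of the first `easy_slots` positions of the displayed order, and the reward magnitudes of +1 and -1 are all illustrative assumptions, not values taken from the source.

```python
def menu_reward(displayed_order, selected_menu, easy_slots=3, plus=1.0, minus=-1.0):
    """Return `plus` if the operated menu sits in an easy-to-access slot, else `minus`.

    Hypothetical rule: the first `easy_slots` positions of `displayed_order` count as
    easy to access. A menu that is not displayed at all also earns the negative reward.
    """
    if selected_menu in displayed_order[:easy_slots]:
        return plus
    return minus
```

With the order `["PROG", "EDIT", "POS", "ALARM"]`, operating `EDIT` yields +1.0, while operating `ALARM` (fourth slot) yields -1.0.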

The learning unit may include an error calculation unit that calculates an error based on the output of the state observation unit and input teacher data, and a learning model updating unit that, based on the outputs of the state observation unit and the error calculation unit, updates a learning model that determines the error in the positions and order of the operation menus displayed on the display device. The machine learning device may include a neural network. The operator information may include information on the operator's position or authority level, and the operation menu based on the operator information may change according to that position or authority level information.
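For the supervised variant, one possible reading of the error calculation and learning model update is sketched below, assuming a linear scoring model (one weight row per menu) and a one-hot teacher vector marking the menu the operator actually chose. The function names and the linear model are hypothetical stand-ins; the source indicates that a neural network may be used instead.

```python
def predict(weights, features):
    """Score each menu as the dot product of its weight row with the state feature vector."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def error(weights, features, chosen_index):
    """Error calculation: predicted scores minus a one-hot teacher vector for the chosen menu."""
    scores = predict(weights, features)
    return [s - (1.0 if i == chosen_index else 0.0) for i, s in enumerate(scores)]


def update(weights, features, chosen_index, lr=0.1):
    """Learning model update: one gradient step on the squared error 0.5 * sum(e_i ** 2).

    The gradient with respect to weights[i][j] is e_i * features[j].
    """
    errs = error(weights, features, chosen_index)
    return [[w - lr * e * x for w, x in zip(row, features)]
            for row, e in zip(weights, errs)]
```

Repeatedly updating with the same state features and the same chosen menu drives that menu's score toward 1 and the others toward 0, so sorting menus by score gives a display order.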

According to a second embodiment of the present invention, there is provided a numerical control device including a detection unit that detects the operator, a communication unit that communicates with a database in which operator information is registered, the machine learning device according to the first embodiment described above, and a display device that displays the operation menu learned by the machine learning device.

According to a third embodiment of the present invention, there is provided a machine tool system including a numerical control device, a machine tool controlled by the numerical control device, and the machine learning device according to the first embodiment described above.

According to a fourth embodiment of the present invention, there is provided a manufacturing system including a plurality of the machine tool systems according to the third embodiment described above, in which a machine learning device is provided in each machine tool system, and the machine learning devices provided in the plurality of machine tool systems share or exchange data with one another via a communication medium. The machine learning device may reside on a cloud server.

According to a fifth embodiment of the present invention, there is provided a machine learning method that detects an operator, communicates with a database in which operator information is registered, and learns the display of an operation menu based on the operator information. The method observes the operation history of the operation menu and learns the display of the operation menu based on the observed operation history. The operation history preferably includes the number of times each operation menu is accessed and transition information of the operation menu.

The machine learning device, numerical control device, machine tool system, manufacturing system, and machine learning method according to the present invention make it possible to display an optimum operation menu for each operator.

FIG. 1 is a block diagram schematically showing an embodiment of a machine tool system according to the present invention.
FIG. 2 is a block diagram showing an embodiment of the numerical control device according to the present invention.
FIG. 3 is a diagram schematically showing a neuron model.
FIG. 4 is a diagram schematically showing a three-layer neural network configured by combining the neurons shown in FIG. 3.
FIG. 5 is a flowchart showing an example of the operation of the machine learning device shown in FIG. 1.
FIG. 6 is a diagram (part 1) for explaining the display of the operation menu learned by the machine learning device shown in FIG. 5.
FIG. 7 is a diagram (part 2) for explaining the display of the operation menu learned by the machine learning device shown in FIG. 5.
FIG. 8 is a diagram (part 3) for explaining the display of the operation menu learned by the machine learning device shown in FIG. 5.
FIG. 9 is a block diagram schematically showing another embodiment of the machine tool system according to the present invention.

Embodiments of the machine learning device, numerical control device, machine tool system, manufacturing system, and machine learning method according to the present invention are described in detail below with reference to the accompanying drawings. FIG. 1 is a block diagram schematically showing an embodiment of a machine tool system according to the present invention.

As shown in FIG. 1, the machine tool system includes a machine tool 1, a machine learning device 2, and a numerical control device (NC device, CNC device) 3. The machine tool 1 is, for example, a lathe, a drilling machine, a boring machine, a milling machine, a grinding machine, a gear cutting/gear finishing machine, a machining center, an electric discharge machine, a punch press, a laser processing machine, a conveying machine, or a plastic injection molding machine, and is controlled by the numerical control device 3. Although the machine learning device 2 is provided separately from the numerical control device 3 in FIG. 1, it may also be provided as part of the numerical control device 3. The display device 30 is, for example, a liquid crystal panel with a touch function, and displays the operation menu of the numerical control device 3. The display device 30 may be integrated with the numerical control device 3, or may be located away from the numerical control device 3 and connected to it by wire or wirelessly.

The machine learning device 2 learns, for example, the display of the operation menu on the display device 30, and includes a state observation unit 21, a learning unit 22, and a decision-making unit 25. The state observation unit 21 observes the operation history of the operation menu, that is, state quantities (state variables) such as the number of times each operation menu is accessed and transition information of the operation menu. The state observation unit 21 can also observe, as state quantities, at least one of: information on the currently selected operation menu, information indicating whether the machine tool is in a machining operation, alarm information of the numerical control device and the machine tool, and information indicating whether a program is being edited.

The state observation unit 21 can receive from the numerical control device 3, for example, the number of times each operation menu is accessed, transition information of the operation menu, information on the currently selected operation menu, information indicating whether the machine tool 1 is in a machining operation, alarm information of the numerical control device 3, and information indicating whether a program is being edited. The state observation unit 21 can also receive from the machine tool 1, for example, information indicating whether the machine tool 1 is in a machining operation and alarm information of the machine tool.
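As a hedged illustration, the quantities received from the numerical control device 3 and the machine tool 1 could be merged into a single hashable state that can serve as a key in a Q-table. `MenuState`, `observe`, the dictionary keys, and the inclusion of the operator's position as a state field are assumptions, not details from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen -> hashable, so it can be used as a dict key
class MenuState:
    """Hypothetical snapshot of the quantities the state observation unit may observe."""
    current_menu: str
    machining: bool  # is the machine tool in a machining operation?
    alarm: bool      # is an NC device or machine tool alarm active?
    editing: bool    # is a program being edited?
    role: str        # operator position / authority level from the database


def observe(nc_signals, machine_signals, role):
    """Merge signals from the NC device and the machine tool into one state.

    The machine tool's own machining flag takes precedence over the NC device's,
    and an alarm from either source marks the state as alarmed.
    """
    return MenuState(
        current_menu=nc_signals["current_menu"],
        machining=machine_signals.get("machining", nc_signals.get("machining", False)),
        alarm=bool(nc_signals.get("alarm")) or bool(machine_signals.get("alarm")),
        editing=nc_signals.get("editing", False),
        role=role,
    )
```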

The learning unit 22 learns the display of the operation menu based on the state quantities observed by the state observation unit 21, and includes a reward calculation unit 23 and a value function updating unit 24. The reward calculation unit 23 calculates a reward based on the output of the state observation unit 21, and the value function updating unit 24, based on the outputs of the state observation unit 21 and the reward calculation unit 23, updates according to the reward a value function that determines the value of the positions and order of the operation menus displayed on the display device 30. For example, the reward calculation unit 23 gives a positive reward when an operation is performed from a menu placed at an easily accessible position and order, and a negative reward when an operation is performed from a menu placed at a hard-to-access position and order. This is described later with reference to FIG. 5.

FIG. 2 is a block diagram showing an embodiment of the numerical control device according to the present invention, in which the machine learning device 2 shown in FIG. 1 is built into the numerical control device 3. FIG. 2 also shows a plurality of machine learning devices 2 (2_1, 2_2, ..., 2_n) sharing or exchanging data with one another via a communication medium.

As shown in FIG. 2, the numerical control device 3 includes the display device 30 on which the above-described operation menu is displayed, a detection unit 31, a communication unit 32, and the machine learning device 2 (2_1). The detection unit 31 detects the operator 6 based on a predetermined action by the operator 6, for example, the entry of a predetermined code by the operator 6 or the reading of an IC card held up by the operator 6. Based on the output of the detection unit 31, the communication unit 32 communicates with a database 5 provided, for example, outside the numerical control device 3.

Information on the operator 6 is registered in the database 5; for example, the position and authority level of the operator 6 are registered in advance. That is, the output data D1 from the communication unit 32 to the machine learning device 2 includes information on the operator 6 detected by the detection unit 31, for example, information on the operator's position and authority level, such as whether the operator 6 is a developer or person in charge at the machine tool builder, an operator (user), or a service engineer.

The detection unit 31 is not limited to a keyboard operated by the operator 6 or an IC card reader; various known input devices, sensors, and the like may be used as long as they can detect the operator 6. In this embodiment, the operation menu preferably changes based on information on the operator 6, such as the detected operator's position and authority level. That is, the machine learning device 2 preferably learns, based on the operator's position or authority level, the optimum operation menu for that position or authority level and displays it on the display device 30.
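A minimal sketch of operator detection, database lookup (the operator information carried in output data D1), and authority-dependent menu filtering. The credential strings, `OPERATOR_DB`, `MENU_MIN_LEVEL`, the menu names, and the numeric authority levels are hypothetical placeholders for the database 5 and its contents.

```python
# Hypothetical stand-in for database 5: credential (code or IC card ID) -> operator info.
OPERATOR_DB = {
    "card-0001": {"name": "A", "position": "operator", "authority_level": 1},
    "card-0002": {"name": "B", "position": "service_engineer", "authority_level": 3},
    "card-0003": {"name": "C", "position": "mtb_developer", "authority_level": 4},
}

# Hypothetical minimum authority level required to see each menu (insertion order kept).
MENU_MIN_LEVEL = {"POS": 1, "PROG": 1, "ALARM": 1, "PARAM": 3, "MAINT": 3, "LADDER": 4}


def detect_operator(credential, db=OPERATOR_DB):
    """Return the registered operator information, or None for an unknown credential."""
    return db.get(credential)


def visible_menus(operator_info):
    """List the menus the operator's authority level permits; unknown operators see none."""
    level = operator_info["authority_level"] if operator_info else 0
    return [menu for menu, needed in MENU_MIN_LEVEL.items() if level >= needed]
```

The learned ordering would then be applied only within the menus returned by `visible_menus`.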

As shown in FIG. 2, the plurality of machine learning devices 2 (2_1 to 2_n) may be configured to share or exchange data with one another via a communication medium. For example, in a machine tool factory with a plurality of machine tools 1 each controlled by a numerical control device 3, that is, in a manufacturing system with a plurality of machine tool systems (1, 2), the machine learning devices 2 (2_1 to 2_n) of the respective machine tool systems can share or exchange data with one another via a communication medium. The machine learning devices 2 (2_1 to 2_n) may also be provided on a cloud server instead of in each numerical control device 3.
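One possible way for several machine learning devices to share what they have learned is to average their Q-values key by key, as sketched below. The averaging scheme and the name `merge_q_tables` are assumptions; the source only states that data are shared or exchanged via a communication medium.

```python
def merge_q_tables(tables):
    """Average Q-values for every (state, action) key seen by any of the learners.

    A key known to only some learners is averaged over those learners alone.
    """
    sums, counts = {}, {}
    for table in tables:
        for key, value in table.items():
            sums[key] = sums.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}
```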

As described with reference to FIG. 1, the machine learning device 2 (state observation unit 21) receives, for example as output data D2 from the display device 30, the operation history of the operation menu (state quantities, for example the access counts and transition information of the operation menu), performs machine learning (for example, reinforcement learning), and outputs change data D3 (manipulated variables, for example the display order and display positions of the menus) that controls the operation menu displayed on the display device 30. FIG. 2 is merely an example, and various modifications and changes are of course possible.
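The decision-making step that turns learned values into change data D3 could look like the sketch below. It assumes, hypothetically, a Q-table keyed by (state, menu) whose value expresses how worthwhile it is to place that menu in an accessible slot, with position 0 taken to be the most accessible slot. `decide_layout` is an illustrative name.

```python
def decide_layout(q_table, state, menus, default=0.0):
    """Order menus by descending learned value in `state` and assign positions.

    Returns change data as (position, menu) pairs; position 0 is the most accessible
    slot. Menus with equal values keep their original relative order, because
    Python's sorted() remains stable even with reverse=True.
    """
    ordered = sorted(menus, key=lambda m: q_table.get((state, m), default), reverse=True)
    return list(enumerate(ordered))
```

For instance, with values 0.9 for `EDIT` and 0.5 for `PROG` in an editing state, the layout becomes `EDIT`, `PROG`, then the unvalued menus in their original order.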

According to this embodiment, for example, when an operator operates a machine (machine tool system) for the first time, a reasonably appropriate menu can be displayed from the start if the operation histories of people whose position or authority level is close to that operator's have already been learned. In addition, each time a new operator position or authority level is added, appropriate menus can gradually be displayed without creating a dedicated table.
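The first-use benefit can be illustrated by seeding a new operator's menu statistics from operators in a similar position. Treating "close position" as "same position" and the helper name `initial_counts_for` are simplifying assumptions made only for this sketch.

```python
from collections import Counter


def initial_counts_for(new_position, histories):
    """Seed a new operator's menu access counts from operators in the same position.

    `histories` is a list of (position, Counter of menu accesses) pairs.
    An unseen position yields an empty Counter.
    """
    seed = Counter()
    for position, counts in histories:
        if position == new_position:
            seed.update(counts)  # Counter.update adds counts rather than replacing them
    return seed
```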

The machine learning device 2 analyzes a set of data input to it, extracts the useful rules, knowledge representations, judgment criteria, and the like contained in that data, outputs the judgment results, and learns knowledge (machine learning). There are various machine learning methods, but they can be broadly classified into, for example, "supervised learning", "unsupervised learning", and "reinforcement learning". In addition, in implementing these methods, there is a technique called "deep learning" that learns to extract the feature quantities themselves.

The machine learning device 2 shown in FIG. 1 applies "reinforcement learning (Q-learning)", whereas the machine learning device 4 described later with reference to FIG. 9 applies "supervised learning". These machine learning devices 2 and 4 can use a general-purpose computer or processor, but faster processing is possible if, for example, GPGPU (General-Purpose computing on Graphics Processing Units) or a large-scale PC cluster is used.

In supervised learning, a large amount of teacher data, that is, sets of data pairing an input with a result (label), is given to the machine learning device so that it learns the features in those data sets and inductively acquires a model (error model) that estimates the result from the input, that is, the relationship between them. This can be realized using an algorithm such as the neural network described later.

In unsupervised learning, only a large amount of input data is given to the machine learning device so that it learns how the input data are distributed; the device learns to compress, classify, shape, and otherwise process the input data without being given corresponding teacher output data. For example, the features in those data sets can be clustered into groups of similar items. Using this result, outputs can be predicted by assigning outputs so as to optimize some defined criterion.
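Clustering as described above can be sketched with a small k-means routine, applied for example to per-operator menu-usage vectors. k-means is one common clustering technique named here only for illustration; the source does not specify an algorithm. Initializing the centers from the first `k` points keeps the example deterministic.

```python
def kmeans(points, k, iters=20):
    """Plain k-means: returns (cluster label per point, list of cluster centers)."""

    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    centers = [list(p) for p in points[:k]]  # deterministic initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assignment step: nearest center
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        for j, members in enumerate(clusters):  # update step: mean of members
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
    return labels, centers
```

With usage vectors such as (PROG accesses, MAINT accesses), operators who mostly edit programs and operators who mostly open maintenance screens fall into separate clusters.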

Between unsupervised and supervised learning there is also an intermediate problem setting called semi-supervised learning, which corresponds, for example, to the case where input-output data pairs exist for only part of the data and the rest is input-only. In this embodiment, learning can be made more efficient by using, for unsupervised learning, data that can be obtained without actually running the machine tool system (machine tool 1 and numerical control device 3), such as image data and simulation data.

Next, reinforcement learning is described. First, consider the following problem setting for reinforcement learning.
・The machine tool system (that is, the machine tool and the numerical control device; to simplify the explanation, it is also referred to below simply as the numerical control device) observes the state of the environment and decides on an action.
・The environment changes according to some rules, and the system's own actions may also change the environment.
・A reward signal is returned each time an action is taken.
・What is to be maximized is the total (discounted) reward over the future.
・Learning starts from a state in which the consequences of actions are completely unknown or only incompletely known. That is, the numerical control device can obtain the result as data only after actually acting; in other words, it must search for the optimum action by trial and error.
・Learning can also start from a good starting point by using, as the initial state, a state that has been pre-trained to imitate human behavior (using methods such as the supervised learning described above or inverse reinforcement learning).
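The quantity to be maximized in the problem setting above, the total discounted reward, can be computed for a finite sequence of rewards as in this minimal sketch; `gamma` plays the role of the discount rate γ defined below, and the function name is illustrative.

```python
def discounted_return(rewards, gamma=0.9):
    """Total discounted reward: sum over t of gamma**t * r_t for rewards r_0, r_1, ..."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total
```

For rewards `[1, 1, 1]` and `gamma=0.5`, the result is 1 + 0.5 + 0.25 = 1.75: rewards further in the future count for less.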

Reinforcement learning learns not only judgment and classification but also actions: it learns appropriate actions while taking into account the interaction between actions and the environment, that is, it learns how to maximize the rewards obtained in the future. Q-learning is used below as an example, but the invention is not limited to Q-learning.

Q-learning is a method of learning the value Q(s, a) of selecting action a in a given environmental state s. That is, in a given state s, the action a with the highest value Q(s, a) should be selected as the optimum action. At first, however, the correct value of Q(s, a) is not known at all for any combination of state s and action a. The agent (the acting entity) therefore selects various actions a in a given state s and receives a reward for each action a. In this way, the agent learns to select better actions, that is, learns the correct value Q(s, a).
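The selection rule "take the action with the highest Q(s, a), but also try various actions" is commonly implemented as epsilon-greedy selection. The sketch below uses that technique; epsilon-greedy, the value of `epsilon`, and the name `select_action` are assumptions rather than details given in the source.

```python
import random


def select_action(q_table, state, actions, epsilon=0.1, rng=random):
    """Epsilon-greedy selection over a dict-based Q-table keyed by (state, action).

    With probability `epsilon` a random action is explored; otherwise the action with
    the highest known Q(s, a) is exploited. Unseen pairs are treated as Q = 0.
    """
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible.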

Furthermore, since the aim is to maximize the total reward obtained in the future as a result of the actions, the ultimate goal is to obtain $Q(s, a) = \mathbb{E}\left[\sum_t \gamma^t r_t\right]$. Here, the expected value is taken over state changes that follow the optimum action; since that action is not known, it must be learned while exploring. An update formula for such a value Q(s, a) can be expressed, for example, by the following formula (1):

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right) \qquad (1)$$

In formula (1) above, $s_t$ represents the state of the environment at time $t$, and $a_t$ represents the action at time $t$. The action $a_t$ changes the state to $s_{t+1}$, and $r_{t+1}$ represents the reward obtained through that state change. The term with max is the Q value obtained when the action $a$ with the highest Q value known at that time is selected in state $s_{t+1}$, multiplied by $\gamma$. Here, $\gamma$ is a parameter satisfying $0 < \gamma \le 1$ and is called the discount rate, and $\alpha$ is a learning coefficient in the range $0 < \alpha \le 1$.

Formula (1) describes how the evaluation value $Q(s_t, a_t)$ of action $a_t$ in state $s_t$ is updated on the basis of the reward $r_{t+1}$ returned as a result of trial $a_t$. That is, if the reward $r_{t+1}$ plus the discounted evaluation value $\gamma \max_a Q(s_{t+1}, a)$ of the best action in the next state reached through action $a_t$ is larger than $Q(s_t, a_t)$, then $Q(s_t, a_t)$ is increased; if it is smaller, $Q(s_t, a_t)$ is decreased. In other words, the value of a given action in a given state is brought closer to the reward returned immediately as a result plus the value of the best action in the next state reached through that action.
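As a minimal sketch of formula (1), assuming a tabular representation in which Q is a Python dictionary keyed by hypothetical (state, action) pairs and unseen pairs default to 0, a single update can be written as follows. The names `q_update`, `alpha` and `gamma` are illustrative and do not come from the source.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update (formula (1)) to a tabular Q and return the new Q(s, a)."""
    # max_a Q(s_{t+1}, a): value of the best action currently known in the next state
    best_next = max(Q[(s_next, b)] for b in actions)
    # Move Q(s_t, a_t) toward the target r_{t+1} + gamma * max_a Q(s_{t+1}, a)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


# Unvisited state-action pairs default to 0.0 ("the correct values are not known at first").
Q = defaultdict(float)
Q[("s1", "x")] = 2.0
print(q_update(Q, "s0", "x", r=1.0, s_next="s1", actions=["x", "y"]))  # 0.5 * (1.0 + 0.9 * 2.0) ≈ 1.4
```

Because the target contains the current estimate of the next state, values propagate backward through repeated updates rather than being known in advance.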

Q(s, a) can be represented on a computer either by holding its values for all state-action pairs (s, a) in a table, or by preparing a function that approximates Q(s, a). In the latter method, the update of formula (1) can be realized by adjusting the parameters of the approximation function with a technique such as stochastic gradient descent. A neural network, described below, can be used as the approximation function.
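For the function-approximation alternative, one minimal sketch is semi-gradient Q-learning with a linear model Q(s, a) ≈ w · φ(s, a), where the feature vectors φ are hypothetical. It performs one stochastic-gradient step toward the target of formula (1). The patent does not specify the approximator, so the linear form is an assumption; a neural network could take its place.

```python
def q_value(w, phi):
    """Approximate Q(s, a) as the linear function w . phi(s, a)."""
    return sum(wi * pi for wi, pi in zip(w, phi))


def sgd_q_update(w, phi_sa, reward, next_phis, alpha=0.1, gamma=0.9):
    """One semi-gradient step toward the target of formula (1).

    phi_sa    -- feature vector of the (state, action) pair that was taken
    next_phis -- feature vectors of every action available in the next state
    """
    best_next = max((q_value(w, p) for p in next_phis), default=0.0)
    td_error = reward + gamma * best_next - q_value(w, phi_sa)
    # For a linear model dQ/dw = phi_sa, so the step is alpha * td_error * phi_sa
    return [wi + alpha * td_error * pi for wi, pi in zip(w, phi_sa)]


w = sgd_q_update([0.0, 0.0], phi_sa=[1.0, 0.0], reward=1.0, next_phis=[[0.0, 1.0]])
print(w)  # [0.1, 0.0]
```

Only the weights attached to active features move, which is what lets an approximator generalize across states that a table would treat separately.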

A neural network can also be used as an algorithm for approximating the value function in reinforcement learning. FIG. 3 schematically shows a neuron model, and FIG. 4 schematically shows a three-layer neural network configured by combining the neurons shown in FIG. 3. That is, the neural network is composed of, for example, an arithmetic unit and a memory that imitate the neuron model shown in FIG. 3.

As shown in FIG. 3, a neuron produces an output (result) y for a plurality of inputs x (in FIG. 3, inputs x1 to x3 as an example). Each input x (x1, x2, x3) is multiplied by the weight w (w1, w2, w3) corresponding to that input. The neuron thereby outputs the result y expressed by the following formula (2). The input x, the result y, and the weight w are all vectors. In formula (2), θ is a bias and f_k is an activation function.

$$y = f_k\left( \sum_{i=1}^{n} x_i w_i - \theta \right) \tag{2}$$
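Formula (2) can be sketched for a single neuron as follows. The sigmoid default for f_k is an assumption; the text only states that f_k is an activation function.

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def neuron(x, w, theta, f_k=sigmoid):
    """Formula (2): y = f_k(sum_i x_i * w_i - theta)."""
    u = sum(xi * wi for xi, wi in zip(x, w)) - theta
    return f_k(u)


print(neuron([1.0, 1.0, 1.0], [0.5, 0.25, 0.25], theta=1.0))  # weighted sum 1.0 minus 1.0 = 0, sigmoid(0) = 0.5
print(neuron([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], theta=1.0, f_k=lambda u: u))  # identity activation: 5.0
```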

A three-layer neural network configured by combining the neurons shown in FIG. 3 is described with reference to FIG. 4. As shown in FIG. 4, a plurality of inputs x (here, inputs x1 to x3 as an example) enter from the left side of the neural network, and results y (here, results y1 to y3 as an example) are output from the right side. Specifically, the inputs x1, x2, and x3 are each multiplied by the corresponding weight and fed to each of the three neurons N11 to N13. The weights applied to these inputs are collectively labeled W1.

The neurons N11 to N13 output z11 to z13, respectively. In FIG. 4, z11 to z13 are collectively labeled as a feature vector Z1, which can be regarded as a vector obtained by extracting the features of the input vector. The feature vector Z1 lies between the weights W1 and W2. z11 to z13 are each multiplied by the corresponding weight and fed to each of the two neurons N21 and N22. The weights applied to these feature vectors are collectively labeled W2.

The neurons N21 and N22 output z21 and z22, respectively. In FIG. 4, z21 and z22 are collectively labeled as a feature vector Z2. The feature vector Z2 lies between the weights W2 and W3. z21 and z22 are each multiplied by the corresponding weight and fed to each of the three neurons N31 to N33. The weights applied to these feature vectors are collectively labeled W3.

Finally, the neurons N31 to N33 output the results y1 to y3, respectively. A neural network operates in a learning mode and a value prediction mode. For example, in the learning mode the weights W are learned using a learning data set, and in the prediction mode the learned parameters are used to determine the actions of the numerical control device. Although the term "prediction" is used for convenience, various other tasks such as detection, classification, and inference are of course also possible.

Data obtained by actually operating the numerical control device in the prediction mode can be learned immediately and reflected in the next action (online learning). Alternatively, learning can be performed collectively on a previously collected data group, after which the detection mode is run with those parameters from then on (batch learning). An intermediate approach is also possible, in which a learning mode is interposed each time a certain amount of data has accumulated.

The weights W1 to W3 can be learned by the error backpropagation method (backpropagation). The error information enters from the right side and flows to the left side. Backpropagation is a technique that adjusts (learns) the weights of each neuron so as to reduce the difference between the output y obtained when an input x is given and the true output y (teacher). Such a neural network can be extended beyond three layers (this is called deep learning). It is also possible to automatically acquire, from teacher data alone, an arithmetic unit that extracts features of the input stepwise and regresses the result.
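The 3-3-2-3 network of FIG. 4 and its training by backpropagation can be sketched in plain Python as below. This is an illustration under stated assumptions, not the device's implementation: sigmoid activations, a squared-error loss, per-neuron biases θ as in formula (2), random initial weights, and a single hypothetical training pair (x, t). The activations returned by `forward` correspond to x, Z1, Z2 and y.

```python
import math
import random


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def init_network(sizes=(3, 3, 2, 3), seed=0):
    """Weights W1..W3 and biases theta for the 3-3-2-3 layout of FIG. 4."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def forward(net, x):
    """Return [x, Z1, Z2, y], the activations of every layer."""
    acts = [list(x)]
    for W, theta in net:
        prev = acts[-1]
        acts.append([sigmoid(sum(w * v for w, v in zip(row, prev)) - th)
                     for row, th in zip(W, theta)])
    return acts


def squared_error(y, t):
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, t))


def backprop_step(net, x, t, lr=0.5):
    """One gradient-descent step; the error flows from the output (right) to the input (left)."""
    acts = forward(net, x)
    y = acts[-1]
    # dE/du at the output layer for E = 0.5 * sum((y - t)^2) with sigmoid units
    delta = [(yi - ti) * yi * (1.0 - yi) for yi, ti in zip(y, t)]
    for k in range(len(net) - 1, -1, -1):
        W, theta = net[k]
        a_in = acts[k]
        prev_delta = None
        if k > 0:
            # Propagate the error to the previous layer using the weights before updating them
            prev_delta = [sum(W[j][i] * delta[j] for j in range(len(W))) * a_in[i] * (1.0 - a_in[i])
                          for i in range(len(a_in))]
        for j, row in enumerate(W):
            for i in range(len(row)):
                row[i] -= lr * delta[j] * a_in[i]
            theta[j] += lr * delta[j]  # u = w.x - theta, so dE/dtheta = -delta
        delta = prev_delta


net = init_network()
x, t = [1.0, 0.5, -0.5], [0.9, 0.1, 0.5]  # hypothetical input and teacher output
before = squared_error(forward(net, x)[-1], t)
for _ in range(200):
    backprop_step(net, x, t)
after = squared_error(forward(net, x)[-1], t)
print(before > after)  # the error shrinks as the weights are adjusted
```

Computing `prev_delta` before the weight update matters: the error must be sent back through the weights that produced the output, not the already-corrected ones.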

As described above, the machine learning device 2 of this embodiment includes, for example, the state observation unit 21, the learning unit 22, and the decision-making unit 25 in order to carry out Q-learning. However, the machine learning method applicable to the present invention is not limited to Q-learning. As also described above, the machine learning (machine learning device 2) can be implemented using, for example, GPGPU or a large-scale PC cluster.

FIG. 5 is a flowchart showing an example of the operation of the machine learning device shown in FIG. 1. As shown in FIG. 5, when machine learning starts, the position or authority level of the operator is acquired in step ST1, and the process proceeds to step ST2. In step ST2, the operation history is acquired, and the process proceeds to step ST3. The operation history includes, for example, the number of menu accesses and transition information.

In step ST3, the menu is displayed; in step ST4, the operator selects a menu item; and in step ST5, it is determined whether the item was operated from a menu at an easy-to-operate position. If it is determined that the item was operated from a menu at a hard-to-operate position, the process proceeds to step ST8 and a negative reward is set. If it is determined that the item was operated from a menu at an easy-to-operate position, the process proceeds to step ST6 and a positive reward is set.

The rewards set in steps ST6 and ST8 are aggregated by the reward calculation in step ST7, and the process then proceeds to step ST9, where the action value table is updated based on the reward calculated in step ST7.

That is, the value function updating unit 24 updates the value function (action value table). Then, for example, the decision-making unit 25 determines the positions and order of the operation menu items displayed on the display device 30 based on the value function updated by the value function updating unit 24. When step ST9 is completed, the process returns to step ST1 and the same processing is repeated. In this way, according to this embodiment, an optimal operation menu can be displayed for each operator.
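The loop of FIG. 5 (steps ST3 to ST9) can be sketched as follows. The patent does not fix the state and action encoding, so everything here is a hypothetical simplification: the operator's authority level stands in for the state, the "action" is whether an icon is placed in the easy-to-operate region (assumed to be the first three slots), the reward is +1 or -1 as in steps ST5 to ST8, and the update is formula (1) with γ = 0. The decision-making step orders icons by how much more value they have in the easy region than in the hard region.

```python
from collections import defaultdict

EASY_SLOTS = 3  # hypothetical: slots 0-2 (top row) count as easy to operate


def reward(slot):
    """ST5-ST8: positive reward for a selection from an easy slot, negative otherwise."""
    return 1.0 if slot < EASY_SLOTS else -1.0


class MenuLearner:
    def __init__(self, icons, alpha=0.5):
        self.icons = list(icons)
        self.alpha = alpha
        # Value of placing an icon in a region, per operator level: q[(level, icon, region)]
        self.q = defaultdict(float)

    def layout(self, level):
        """Decision-making step: icons that most prefer the easy region come first."""
        def preference(icon):
            return self.q[(level, icon, "easy")] - self.q[(level, icon, "hard")]
        return sorted(self.icons, key=preference, reverse=True)  # stable, so ties keep the initial order

    def observe(self, level, shown, selected):
        """ST4-ST9 for one selection made on the displayed layout `shown`."""
        slot = shown.index(selected)
        region = "easy" if slot < EASY_SLOTS else "hard"
        key = (level, selected, region)
        # One-step update with gamma = 0, a simplification of formula (1)
        self.q[key] += self.alpha * (reward(slot) - self.q[key])


icons = ["Browser", "Memo", "Manual", "NC operation", "Maintenance", "Settings"]
learner = MenuLearner(icons)
for selected in ["Settings", "Settings", "NC operation"]:  # hypothetical selections by an MTB operator
    shown = learner.layout("MTB")            # ST3: display the menu
    learner.observe("MTB", shown, selected)  # ST4-ST9: select, reward, update
print(learner.layout("MTB"))
# ['Settings', 'NC operation', 'Browser', 'Memo', 'Manual', 'Maintenance']
```

Selecting an icon from a hard-to-operate slot lowers its "hard" value, which moves it forward on the next display; selecting it from an easy slot reinforces its place. Because values are keyed by level, an operator with a different authority level keeps an independent layout.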

FIGS. 6 to 8 illustrate operation menu displays learned by the machine learning device operating as shown in FIG. 5. FIGS. 6(a) and 6(b) illustrate how the operation menu is updated (learned) based on the number of accesses to the operation menu (a state quantity), and FIGS. 7(a) to 8(b) illustrate how the operation menu is learned based on the transition information of the operation menu (a state quantity). To simplify the description, in each figure (the display screen 300 of the display device 30 on which the operation menu is displayed), the upper and left positions are, for example, treated as easy to operate, and the lower and right positions as hard to operate.

FIG. 6(a) shows the operation menu (home screen) in its initial state, and FIG. 6(b) shows an example after access by a certain operator (for example, a person in charge at a machine tool builder (MTB)). Reference signs P1 to P6 indicate the positions of the icons of the operation menu displayed on the display screen 300 of the display device 30. As shown in FIG. 6(a), in the initial operation menu, a "Browser" icon is displayed at position P1 of the display screen 300, a "Memo" icon at position P2, and a "Manual" icon at position P3. An "NC operation" icon is displayed at position P4, a "Maintenance" icon at position P5, and a "Settings" icon at position P6.

The "Browser" icon is used to browse the Internet, the "Memo" icon to display a memo screen, and the "Manual" icon to refer to the manual. These "Browser", "Memo", and "Manual" icons are arranged in the row direction and make up [Category A]. The "NC operation" icon is used to create machining programs, check the machining status, and so on; the "Maintenance" icon is used to check machine alarm information and so on; and the "Settings" icon is used to set parameters and so on. These "NC operation", "Maintenance", and "Settings" icons are arranged in the row direction and make up [Category B].

As shown in FIG. 6(b), suppose that a person in charge at the MTB, as the operator 6, accesses the operation menu (display screen 300) and operates the "Browser" icon 2 times, the "Memo" icon once, the "Manual" icon 3 times, the "NC operation" icon 5 times, the "Maintenance" icon once, and the "Settings" icon 10 times. The icons of [Category A] are then operated 6 times in total and those of [Category B] 16 times in total, so [Category B] is moved to the upper row, which is easier to operate than the row of [Category A].

Furthermore, within each of [Category A] and [Category B], the icons are arranged from the easy-to-operate left side in descending order of operation count. That is, the "Settings" icon is displayed at position P1, the "NC operation" icon at position P2, and the "Maintenance" icon at position P3, while the "Manual" icon is displayed at position P4, the "Browser" icon at position P5, and the "Memo" icon at position P6.
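The rearrangement of FIG. 6(b) amounts to two sorts by operation count: rows by category total, then icons within a row by their own counts. A minimal sketch using the counts given above (the function name is illustrative, not from the source):

```python
def arrange_home_screen(categories, counts):
    """Order category rows top-to-bottom and icons left-to-right by operation count."""
    def row_total(item):
        _, icons = item
        return sum(counts.get(icon, 0) for icon in icons)

    rows = sorted(categories.items(), key=row_total, reverse=True)
    return [(name, sorted(icons, key=lambda icon: counts.get(icon, 0), reverse=True))
            for name, icons in rows]


categories = {"A": ["Browser", "Memo", "Manual"],
              "B": ["NC operation", "Maintenance", "Settings"]}
counts = {"Browser": 2, "Memo": 1, "Manual": 3,
          "NC operation": 5, "Maintenance": 1, "Settings": 10}
layout = arrange_home_screen(categories, counts)
print(layout)
# [('B', ['Settings', 'NC operation', 'Maintenance']), ('A', ['Manual', 'Browser', 'Memo'])]
```

Reading the rows in order gives positions P1 to P6. This sort only reproduces the resulting layout; in the embodiment the arrangement comes out of the learning process rather than a fixed rule.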

FIGS. 7(a) and 7(b) show an example of the operation menu displayed on the display screen 300 after, for example, a person in charge at the MTB, as the operator 6, operates (for example, touches) the "Settings" icon in FIG. 6(a). Two display screens (operation menus), <page 1> and <page 2>, can be selected with the "Page forward" and "Page back" icons. Reference signs P11 to P15 and P21 to P25 indicate the positions of the icons of the <page 1> and <page 2> operation menus displayed on the display screen 300 of the display device 30. The "Page forward" icon on the <page 1> screen is assumed to be fixed at position P15, and the "Page back" icon on the <page 2> screen is assumed to be fixed at position P25.

Machine learning is then performed by the machine learning device 2 based on the number of times the person in charge at the MTB, as the operator 6, has operated the icons "A" to "H" in the past, and the operation menu is displayed, for example, as shown in FIGS. 8(a) and 8(b). Suppose the "A" icon was operated 10 times, "B" 8 times, "C" 7 times, "D" 5 times, "E" 6 times, "F" 9 times, "G" 4 times, and "H" 3 times. Then, as shown in FIG. 8(a), on the easy-to-operate <page 1>, the icons are arranged from left to right in descending order of operation count: "A" at position P11, "F" at P12, "B" at P13, "C" at P14, and the "Page forward" icon at P15. As shown in FIG. 8(b), on <page 2>, the remaining icons are likewise arranged from left to right in descending order of operation count: "E" at position P21, "D" at P22, "G" at P23, "H" at P24, and the "Page back" icon at P25.
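Similarly, the two-page layout of FIGS. 8(a) and 8(b) can be sketched as a descending sort with the navigation icon pinned to the last slot of each page. The sketch assumes five slots per page and handles only the two-page case shown; padding with `None` is an illustrative choice that keeps the navigation icons at P15 and P25.

```python
def paginate(counts, slots_per_page=5):
    """Two-page layout of FIG. 8: icons by descending count, navigation icon in the last slot."""
    per_page = slots_per_page - 1  # last slot (P15 / P25) is reserved for navigation
    ranked = sorted(counts, key=lambda icon: counts[icon], reverse=True)
    if len(ranked) > 2 * per_page:
        raise ValueError("this sketch only covers the two-page case shown in FIG. 8")

    def fill(icons):
        # Pad with empty slots so the navigation icon never shifts position
        return icons + [None] * (per_page - len(icons))

    page1 = fill(ranked[:per_page]) + ["Page forward"]
    page2 = fill(ranked[per_page:]) + ["Page back"]
    return page1, page2


counts = {"A": 10, "B": 8, "C": 7, "D": 5, "E": 6, "F": 9, "G": 4, "H": 3}
page1, page2 = paginate(counts)
print(page1)  # ['A', 'F', 'B', 'C', 'Page forward']
print(page2)  # ['E', 'D', 'G', 'H', 'Page back']
```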

In the above description, the operation history (the number of accesses to the operation menu and the transition information) and the operation menu have been simplified. It goes without saying, however, that the state quantities input to the machine learning device 2 (such as the operation history), the manipulated quantities output from the machine learning device 2 (such as the learned operation menu), and the position, authority level, and so on of the operator 6 can be modified and changed in various ways.

FIG. 9 is a block diagram schematically showing another embodiment of the machine tool system according to the present invention, to which supervised learning is applied. As is clear from a comparison of FIG. 9 with FIG. 1 described above, the machine tool system of FIG. 9, to which supervised learning is applied, differs from the machine tool system of FIG. 1, to which Q-learning (reinforcement learning) is applied, in that teacher data (data with results (labels)) is provided.

As shown in FIG. 9, the machine learning device 4 in the machine tool system to which supervised learning is applied includes a state observation unit 41, a learning unit 42, and a decision-making unit 45. The learning unit 42 includes an error calculation unit 43 and a learning model updating unit (error model updating unit) 44. In the machine tool system of this embodiment as well, the machine learning device 4 learns the display of the operation menu based on the operation history of the operation menu.

That is, like the state observation unit 21 in FIG. 1, the state observation unit 41 observes, as state quantities, the operation history of the operation menu, for example the number of accesses to the operation menu and the transition information of the operation menu. The state observation unit 41 can also observe, as a state quantity, at least one of the following: information on the currently selected operation menu, information indicating whether the machine tool is in machining operation, alarm information of the numerical control device and the machine tool, and information indicating whether a program is being edited.
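The state quantities listed above could be held in a structure like the following. This is a hypothetical sketch: the field names, and the choice to record transition information as counts of (from, to) menu pairs, are assumptions rather than part of the source.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class ObservedState:
    """Hypothetical container for the state quantities observed by the state observation unit."""
    access_counts: Counter = field(default_factory=Counter)  # number of accesses per menu
    transitions: Counter = field(default_factory=Counter)    # (from_menu, to_menu) -> count
    current_menu: Optional[str] = None                       # currently selected operation menu
    machining: bool = False                                  # machine tool in machining operation?
    alarms: Tuple[str, ...] = ()                             # NC / machine tool alarm information
    editing_program: bool = False                            # program being edited?

    def record_selection(self, menu: str) -> None:
        """Update access counts and transition information when a menu is selected."""
        self.access_counts[menu] += 1
        if self.current_menu is not None:
            self.transitions[(self.current_menu, menu)] += 1
        self.current_menu = menu


state = ObservedState()
for menu in ["Home", "Settings", "Home", "Settings"]:
    state.record_selection(menu)
print(state.access_counts["Settings"], state.transitions[("Home", "Settings")])  # 2 2
```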

As shown in FIG. 9, the learning unit 42 includes the error calculation unit 43 and the learning model updating unit 44, which correspond, respectively, to the reward calculation unit 23 and the value function updating unit 24 in the machine tool system of FIG. 1 to which Q-learning is applied. This embodiment differs from the configuration described with reference to FIG. 1, however, in that teacher data is input to the error calculation unit 43 from the outside, and the learning model updating unit 44 updates the learning model so as to reduce the difference between the teacher data and the learning model (error model).

That is, the error calculation unit 43 receives the output of the state observation unit 41 and the teacher data, and calculates the error between the data with results (labels) and the learning model implemented in the learning unit 42. As the teacher data, for example, when the same machine tool system is made to perform the same work, the labeled data obtained up to the day before a given day on which the work is actually performed can be held and provided to the error calculation unit 43 as teacher data on that day.

Alternatively, data obtained by a simulation or the like performed outside the machine tool system (numerical control device, machine tool, etc.), or labeled data from another machine tool system, can be provided as teacher data to the error calculation unit 43 of the machine tool system via a memory card or a communication line. Furthermore, the teacher data (labeled data) can be held in a nonvolatile memory, such as a flash memory, built into the learning unit 42, and the labeled data held in the nonvolatile memory can be used as-is by the learning unit 42.
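A minimal sketch of the error calculation unit 43 and the learning model updating unit 44, assuming a linear learning model, a mean-squared error, and hypothetical labeled data. The features and the meaning of the label below are invented for illustration; the source does not define the learning model.

```python
def predict(w, b, x):
    """Hypothetical linear learning model: a score for placing an icon at an easy position."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def error(w, b, data):
    """Error calculation: mean squared error between model output and labels."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in data) / len(data)


def update_model(w, b, data, lr=0.1):
    """Learning-model update: one gradient-descent step that reduces the error."""
    n = len(data)
    grad_w, grad_b = [0.0] * len(w), 0.0
    for x, y in data:
        e = predict(w, b, x) - y
        grad_w = [g + 2.0 * e * xi / n for g, xi in zip(grad_w, x)]
        grad_b += 2.0 * e / n
    return [wi - lr * g for wi, g in zip(w, grad_w)], b - lr * grad_b


# Hypothetical labeled data: x = [normalized access count, machining in progress (0/1)],
# y = label (1.0 means the icon should be shown at an easy-to-operate position)
teacher = [([1.0, 0.0], 1.0), ([0.2, 0.0], 0.0), ([0.5, 1.0], 0.6)]
w, b = [0.0, 0.0], 0.0
print(round(error(w, b, teacher), 4))
for _ in range(500):
    w, b = update_model(w, b, teacher)
print(round(error(w, b, teacher), 4))
```

The teacher data could come from any of the sources described above (earlier days, simulations, other machine tool systems, or nonvolatile memory); the update itself does not depend on where the labels originate.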

In a manufacturing system that includes a plurality of machine tool systems, for example, a machine learning device 2 (4) is provided for each machine tool system, and the machine learning devices 2 (4) provided in the plurality of machine tool systems can share or exchange data with one another via a communication medium. The machine learning device 2 (4) can also reside on a cloud server.
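One simple form that data sharing between machine learning devices could take is pooling their operation counts. The sketch below assumes each device keeps a `collections.Counter` of icon operations; the source does not specify what data is exchanged, and the function name is hypothetical.

```python
from collections import Counter


def merge_access_counts(*histories):
    """Combine operation-count histories shared by several machine learning devices."""
    merged = Counter()
    for history in histories:
        merged.update(history)  # Counter.update adds counts instead of replacing them
    return merged


cell_a = Counter({"Settings": 10, "NC operation": 5})
cell_b = Counter({"Settings": 2, "Manual": 3})
shared = merge_access_counts(cell_a, cell_b)
print(shared.most_common(1))  # [('Settings', 12)]
```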

Thus, not only reinforcement learning but also various other machine learning techniques, such as supervised learning, unsupervised learning, and semi-supervised learning, can be applied to the machine learning device according to the present invention.

Although embodiments have been described above, all examples and conditions described herein are provided to aid understanding of the inventive concepts applied to the invention and the technology. The specifically described examples and conditions are not intended to limit the scope of the invention, nor does such description in the specification indicate advantages or disadvantages of the invention. Although the embodiments of the invention have been described in detail, it should be understood that various changes, substitutions, and modifications can be made without departing from the spirit and scope of the invention.

1 Machine tool
2, 4 Machine learning device
3 Numerical control device
5 Database
6 Operator
21, 41 State observation unit
22, 42 Learning unit
23 Reward calculation unit
24 Value function updating unit
25, 45 Decision-making unit
30 Display device
31 Detection unit
32 Communication unit
43 Error calculation unit
44 Learning model updating unit

Claims

1. A machine learning device that detects an operator, communicates with a database in which information on the operator is registered, and learns the display of an operation menu based on the information on the operator, the machine learning device comprising:
a state observation unit that observes an operation history of the operation menu; and
a learning unit that learns the display of the operation menu based on the operation history of the operation menu observed by the state observation unit.

2. The machine learning device according to claim 1, wherein
the operation history of the operation menu includes the number of accesses to the operation menu and transition information of the operation menu.

3. The machine learning device according to claim 1 or 2, wherein
the state observation unit further observes at least one of information on the currently selected operation menu, information indicating whether the machine tool is in machining operation, alarm information of the numerical control device and the machine tool, and information indicating whether a program is being edited.

4. The machine learning device according to claim 1, further comprising
a decision-making unit that determines the position and order of the operation menu displayed on a display device with reference to the display of the operation menu learned by the learning unit.

5. The machine learning device according to any one of claims 1 to 4, wherein the learning unit comprises:
a reward calculation unit that calculates a reward based on the output of the state observation unit; and
a value function updating unit that updates, according to the reward, a value function that determines the value of the position and order of the operation menu displayed on the display device, based on the outputs of the state observation unit and the reward calculation unit.

6. The machine learning device according to claim 5, wherein the reward calculation unit
gives a positive reward when an operation is made from a menu arranged at an easy-to-access position and order of the operation menu, and
gives a negative reward when an operation is made from a menu arranged at a hard-to-access position and order of the operation menu.

7. The machine learning device according to any one of claims 1 to 4, wherein the learning unit comprises:
an error calculation unit that calculates an error based on the output of the state observation unit and input teacher data; and
a learning model updating unit that updates a learning model, which determines an error in the position and order of the operation menu displayed on the display device, based on the outputs of the state observation unit and the error calculation unit.

8. The machine learning device according to claim 1, wherein
the machine learning device comprises a neural network.

9. The machine learning device according to claim 1, wherein
the information on the operator includes information on the position or authority level of the operator, and
the operation menu based on the information on the operator changes based on the information on the position or authority level of the operator.

10. A numerical control device comprising:
a detection unit that detects the operator;
a communication unit that communicates with a database in which the information on the operator is registered;
the machine learning device according to any one of claims 1 to 9; and
a display device that displays the operation menu learned by the machine learning device.

11. A machine tool system comprising a numerical control device, a machine tool controlled by the numerical control device, and the machine learning device according to claim 1.

12. A manufacturing system comprising a plurality of the machine tool systems according to claim 11, wherein
the machine learning device is provided in each of the machine tool systems, and
the plurality of machine learning devices provided in the plurality of machine tool systems are configured to share or exchange data with one another via a communication medium.

13. The manufacturing system according to claim 12, wherein
the machine learning device resides on a cloud server.

14. A machine learning method of detecting an operator, communicating with a database in which information on the operator is registered, and learning the display of an operation menu based on the information on the operator, the method comprising:
observing an operation history of the operation menu; and
learning the display of the operation menu based on the observed operation history of the operation menu.

15. The machine learning method according to claim 14, wherein
the operation history of the operation menu includes the number of accesses to the operation menu and transition information of the operation menu.