JP6663538B1

JP6663538B1 - Machine learning device

Info

Publication number: JP6663538B1
Application number: JP2019508980A
Authority: JP
Inventors: 慎吾千田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2018-04-17
Filing date: 2018-04-17
Publication date: 2020-03-11
Anticipated expiration: 2038-04-17
Also published as: CN111954582A; CN111954582B; JPWO2019202672A1; WO2019202672A1

Abstract

The machine learning device (100) learns control parameters for controlling machining conditions in an electric discharge machine (1). The machine learning device (100) includes a state observation unit (30) and a learning unit (40). The state observation unit (30) observes a plurality of state variables representing the machining state during electric discharge machining. The learning unit (40) learns the control parameters based on the plurality of state variables.

Description

The present invention relates to a machine learning device that learns control parameters for controlling electric discharge machining. It also relates to an electric discharge machine and a machine learning method.

To machine stably, an electric discharge machine has an adaptive control function. This function automatically changes machining conditions expressed as physical quantities. Examples include changing the power supply voltage waveform and the power supply current waveform, and changing the gap control operation, which is a servo operation. These machining conditions are determined by several to more than ten types of machining parameters that the user can change. The machining parameters include the following:
- parameters that change the magnitude of the voltage applied to machine the workpiece, or the shape of the machining current pulse;
- parameters that adjust the relative distance between the workpiece and the machining electrode serving as the tool;
- parameters that change the feed rate of the machining electrode.

Suitable combinations of these machining parameters may be determined experimentally as sets of values, using representative machining shapes, workpiece materials, and electrode materials. In some cases, several such sets are preset in the electric discharge machine so that the user can select one. However, electric discharge machines produce complex three-dimensional shapes. They can also machine any material that conducts electricity, so workpiece materials vary widely. The machining parameters therefore need to be optimized. For example, Patent Literature 1 discloses setting the machining parameters automatically from a machining state entered by an operator.

Patent Literature 1: Japanese Patent Application Laid-Open No. H2-212041

However, the automatic setting described in Patent Literature 1 uses only a single type of machining state. It adjusts only some of the machining parameters, and the user can set those parameters anyway. Moreover, turning machining parameters into machining conditions expressed as physical quantities relies on countless underlying control parameters. Patent Literature 1 does not adjust these control parameters.

There has therefore been a demand for adaptive control of electric discharge machines that can obtain more appropriate machining conditions as physical quantities.

The present invention has been made in view of the above. Its object is to provide a machine learning device that can automatically learn more appropriate machining conditions for electric discharge machining.

To solve the above problem and achieve this object, a machine learning device of the present invention learns control parameters for determining machining conditions in an electric discharge machine. The machine learning device includes a state observation unit and a learning unit. The state observation unit observes a plurality of state variables representing the machining state during electric discharge machining. The learning unit compares each state variable with its previous value and learns control parameters that stabilize the machining state. A control parameter is a correspondence table between machining condition setting values, which the user sets, and machining conditions, which are physical quantities.

The machine learning device according to the present invention can automatically learn more appropriate machining conditions for electric discharge machining.

FIG. 1 is a block diagram showing the configuration of an electric discharge machine according to a first embodiment of the present invention.
FIG. 2 is a diagram in which the machining conditions according to the first embodiment are classified by control purpose.
FIG. 3 is a diagram explaining the control parameters of machining conditions related to voltage control according to the first embodiment.
FIG. 4 is a diagram showing the relationship between machining conditions related to voltage control and the current pulse generation cycle according to the first embodiment.
FIG. 5 is a diagram explaining the control parameters of machining conditions related to pulse control according to the first embodiment.
FIG. 6 is a diagram showing the relationship between machining conditions related to pulse control and the shape of the current pulse according to the first embodiment.
FIG. 7 is a diagram explaining the control parameters of machining conditions related to axis drive control according to the first embodiment.
FIG. 8 is a diagram showing the relationship between machining conditions related to axis drive control and gap control by axis drive according to the first embodiment.
FIG. 9 is a diagram explaining the states of voltage pulses and current pulses according to the first embodiment.
FIG. 10 is a diagram showing a case where the voltage pulses and current pulses according to the first embodiment are stable.
FIG. 11 is a diagram showing a case where the voltage pulses and current pulses according to the first embodiment are unstable.
FIG. 12 is a diagram showing an ideal distribution of average voltage values according to the first embodiment.
FIG. 13 is a diagram showing the distribution of average voltage values when stable discharge continues according to the first embodiment.
FIG. 14 is a diagram showing the distribution of average voltage values when unstable discharge continues according to the first embodiment.
FIG. 15 is a flowchart explaining optimization by learning the control parameters of machining conditions related to voltage control according to the first embodiment.
FIG. 16 is a flowchart explaining optimization by learning the control parameters of machining conditions related to pulse control according to the first embodiment.
FIG. 17 is a flowchart explaining optimization by learning the control parameters of machining conditions related to axis drive control according to the first embodiment.
FIG. 18 is a block diagram showing the configuration of an electric discharge machine according to a second embodiment of the present invention.
FIG. 19 is a block diagram showing the configuration of an electric discharge machine according to a third embodiment of the present invention.
FIG. 20 is a diagram showing a hardware configuration in which the functions of the machine learning device according to the first to third embodiments are implemented by a computer system.

A machine learning device, an electric discharge machine, and a machine learning method according to embodiments of the present invention are described in detail below with reference to the drawings. The present invention is not limited by these embodiments.

Embodiment 1.
FIG. 1 is a block diagram showing the configuration of an electric discharge machine 1 according to the first embodiment of the present invention. The electric discharge machine 1 includes:
- a machining electrode 2 serving as the machining tool;
- a driving device 4 that controls the distance between the machining electrode 2 and a workpiece 3;
- a machining power supply 5 that generates electric discharges between the machining electrode 2 and the workpiece 3;
- a control device 10 that controls the driving device 4 and the machining power supply 5.

The workpiece 3 is connected to the machining power supply 5. The driving device 4 can drive the machining electrode 2, the workpiece 3, or both.

The control device 10 includes the following units:
- an axis drive control unit 11 that controls the driving device 4;
- a machining power supply control unit 12 that controls the machining power supply 5;
- a machining condition setting unit 15 that sets machining condition setting values;
- a control parameter holding unit 13 that holds the control parameters corresponding to the machining conditions;
- an initial parameter setting unit 14 that sets the initial values of the control parameters.

A machining condition setting value is a setting value that specifies a machining condition.

A control parameter defines the relationship between machining condition setting values and a machining condition. The machining condition, expressed as a specific physical quantity, is determined from the setting value and the control parameter. Machining is performed by generating electric discharges between the machining electrode 2 and the workpiece 3. The machining conditions of the machining pattern used for this are determined from two inputs: the setting values set by the machining condition setting unit 15, and the control parameters held in the control parameter holding unit 13. In other words, the control parameters control the physical machining conditions of the electric discharge machine 1.

The user can set machining condition setting values but cannot set or change the control parameters. The axis drive control unit 11 and the machining power supply control unit 12 use the information from the machining condition setting unit 15 and the control parameter holding unit 13. From it, they issue commands corresponding to the machining pattern of these machining conditions. The control parameters are changed as described later. Their initial values in the control parameter holding unit 13 are set by the initial parameter setting unit 14. When a control parameter is expressed as a correspondence table between setting values and machining conditions, its initial value is an initial correspondence table.
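The separation between user-visible setting values and hidden control parameters can be pictured as a lookup table per machining condition. The following is a minimal sketch under that reading. The class names, the `replace_table` method, and the numeric values are hypothetical illustrations, not the patent's data format. Here `ControlParameterStore` plays the role of the control parameter holding unit 13.

```python
from dataclasses import dataclass


@dataclass
class ControlParameter:
    """Hypothetical control parameter: a correspondence table mapping a
    user-visible machining condition setting value (an index) to a
    physical quantity. Field names are illustrative assumptions."""
    name: str
    unit: str
    table: list  # table[setting_value] -> physical quantity

    def resolve(self, setting_value: int) -> float:
        if not 0 <= setting_value < len(self.table):
            raise ValueError(f"{self.name}: no entry for setting value {setting_value}")
        return self.table[setting_value]


class ControlParameterStore:
    """Stands in for the control parameter holding unit 13. It is seeded with
    initial tables (the role of the initial parameter setting unit 14) and
    is changed only through replace_table(), a hypothetical hook standing
    in for the parameter changing unit 50."""

    def __init__(self, initial: dict):
        # Copy the initial tables so later changes do not alias the seed data.
        self._params = {name: ControlParameter(p.name, p.unit, list(p.table))
                        for name, p in initial.items()}

    def machining_conditions(self, settings: dict) -> dict:
        # User setting values select table entries; the tables stay hidden.
        return {name: self._params[name].resolve(v) for name, v in settings.items()}

    def replace_table(self, name: str, new_table: list) -> None:
        self._params[name].table = list(new_table)
```

For example, with an illustrative pause-time table `[10.0, 20.0, 40.0]`, a setting value of 2 resolves to 40.0. The same setting value can resolve to a different physical value after the table is replaced, and the user never edits the table directly.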

The driving device 4 controls the relative distance and relative speed between the machining electrode 2 and the workpiece 3, following the commands from the axis drive control unit 11. The machining power supply 5 applies a voltage between the machining electrode 2 and the workpiece 3, following the commands from the machining power supply control unit 12. In doing so, it controls the current waveform during discharge.

The control device 10 further includes an input/output unit 20, a machine learning device 100, a parameter changing unit 50, and a learning result storage unit 80.

The input/output unit 20 is an interface that accepts user input and displays information that helps the user check the machining. It includes a machining condition input unit 21 and a display unit 22. The machining condition input unit 21 accepts the setting values that the user wants the machining condition setting unit 15 to set. The display unit 22 lets the user observe and check the machining state.

The machine learning device 100 includes a state observation unit 30 and a learning unit 40. The state observation unit 30 includes an axis drive recognition unit 31, a pulse state recognition unit 32, and a machining state observation unit 33. The learning unit 40 includes a reward calculation unit 47 and a function updating unit 48, and it optimizes the control parameters by learning them.

The reward calculation unit 47 has one reward calculation unit per control category:
- a first reward calculation unit 41 that calculates the reward for voltage control;
- a second reward calculation unit 43 that calculates the reward for pulse control;
- a third reward calculation unit 45 that calculates the reward for axis drive control.

The function updating unit 48 has one function updating unit per control category:
- a first function updating unit 42 that updates the function for voltage control;
- a second function updating unit 44 that updates the function for pulse control;
- a third function updating unit 46 that updates the function for axis drive control.
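This excerpt names the reward calculation and function updating units for each control category, but it does not define the reward or the function. The sketch below assumes tabular Q-learning, which is a swapped-in technique and not confirmed by the source. It also assumes a reward of +1, -1, or 0, depending on whether a scalar instability index fell, rose, or stayed the same compared with the previous observation. This follows the summary's statement that the learning unit compares each state variable with its previous value. The state encoding, the action names, and the hyperparameters are all hypothetical.

```python
import random
from collections import defaultdict


class CategoryLearner:
    """Hypothetical tabular Q-learner for one control category (voltage,
    pulse, or axis drive control). Q-learning and the reward rule are
    assumptions, not taken from the patent text."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def reward(self, prev_instability: float, cur_instability: float) -> float:
        # Role of a reward calculation unit (41/43/45): compare with the previous observation.
        if cur_instability < prev_instability:
            return 1.0
        if cur_instability > prev_instability:
            return -1.0
        return 0.0

    def choose(self, state):
        # Epsilon-greedy choice of a control parameter change.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, r, next_state):
        # Role of a function updating unit (42/44/46): one Q-learning step.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = r + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


# One independent learner per category, mirroring units 41/42, 43/44, and 45/46.
# The actions are hypothetical changes to a notch distribution value.
learners = {cat: CategoryLearner(actions=("decrease", "keep", "increase"))
            for cat in ("voltage", "pulse", "axis")}
```

Keeping three separate learners mirrors the three reward/function pairs above. Because several machining conditions belong to more than one category, a real system would still have to coordinate changes between the learners.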

The parameter changing unit 50 has one control parameter changing unit per control category:
- a first control parameter changing unit 51 that changes the control parameters of machining conditions related to voltage control;
- a second control parameter changing unit 52 that changes the control parameters of machining conditions related to pulse control;
- a third control parameter changing unit 53 that changes the control parameters of machining conditions related to axis drive control.

The parameter changing unit 50 changes the control parameters held by the control parameter holding unit 13 based on what the learning unit 40 has learned.

The learning result storage unit 80 stores the learning results of the machine learning device 100.

When the electric discharge machine 1 starts electric discharge machining, the machining condition setting unit 15 outputs setting values. The axis drive control unit 11 and the machining power supply control unit 12 issue commands based on these values. The resulting operation of the driving device 4 and the machining power supply 5 generates electric discharges between the machining electrode 2 and the workpiece 3.

During electric discharge machining, the driving device 4 follows the commands of the axis drive control unit 11. It searches for the optimal relative distance at which discharge occurs by decreasing or increasing the relative distance between the machining electrode 2 and the workpiece 3. The axis drive recognition unit 31 acquires the position and motion of the drive axis during this search. It records this information in the machining state observation unit 33 as the axis behavior history.

At the same time as the driving device 4 operates, the machining power supply 5 follows commands from the machining power supply control unit 12. It applies a voltage between the machining electrode 2 and the workpiece 3 and generates current pulses with the commanded current waveform. The machining power supply control unit 12 uses the setting values from the machining condition setting unit 15 to control the voltage of the machining power supply 5. The aim is to generate current pulses of the commanded waveform at a constant cycle. However, the physics of electric discharge machining makes it impossible to guarantee a constant pulse cycle. In addition, a current pulse may take a shape different from the theoretical current waveform.

The pulse state recognition unit 32 acquires the following information and records it in the machining state observation unit 33 as the pulse behavior history:
- the current pulse generation cycle and the current pulse shape;
- the magnitude and application cycle of the applied voltage from which the pulses originate;
- the voltage waveform representing the voltage pulse shape.

From the pulse behavior history and the axis behavior history, the machining state observation unit 33 obtains the following:
- the distribution of voltage values over a certain period;
- the current pulse generation cycle;
- the position, velocity, and acceleration of the axis when current pulses occur.

This information comes from machining performed under the control parameters currently set in the control parameter holding unit 13. The machining state observation unit 33 links the information to those control parameters and supplies both to the learning unit 40.
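The quantities listed above can be derived from timestamped histories in a straightforward way. The following is a minimal sketch under these assumptions:
- pulse times, axis samples (time, position), and voltage samples are available as plain lists;
- velocity and acceleration come from backward finite differences;
- the voltage distribution is a histogram with a hypothetical bin width.

The function names are illustrative only.

```python
import math
from bisect import bisect_right
from collections import Counter


def pulse_intervals(pulse_times):
    """Intervals between successive current pulses (samples of the generation cycle)."""
    return [b - a for a, b in zip(pulse_times, pulse_times[1:])]


def axis_state_at(t, times, positions):
    """Position, velocity, and acceleration of the axis at time t, computed from
    the latest three samples at or before t. Assumes strictly increasing times."""
    i = bisect_right(times, t) - 1
    if i < 2:
        raise ValueError("need at least three axis samples at or before t")
    t0, t1, t2 = times[i - 2], times[i - 1], times[i]
    p0, p1, p2 = positions[i - 2], positions[i - 1], positions[i]
    v1 = (p1 - p0) / (t1 - t0)      # velocity at the midpoint of [t0, t1]
    v2 = (p2 - p1) / (t2 - t1)      # velocity at the midpoint of [t1, t2]
    a = (v2 - v1) / ((t2 - t0) / 2)  # the two midpoints are (t2 - t0)/2 apart
    return p2, v2, a


def voltage_distribution(samples, bin_width):
    """Histogram of voltage samples in one window: bin lower edge -> count."""
    return Counter(math.floor(v / bin_width) * bin_width for v in samples)


def observe(pulse_times, axis_times, axis_positions, voltages, bin_width=5.0):
    """Bundle the observed state variables for one observation period."""
    return {
        "voltage_distribution": voltage_distribution(voltages, bin_width),
        "pulse_intervals": pulse_intervals(pulse_times),
        "axis_at_pulses": [axis_state_at(t, axis_times, axis_positions) for t in pulse_times],
    }
```

In the patent, this bundle would be tagged with the control parameters in use before it is passed to the learning unit 40. The sketch leaves that tagging to the caller.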

The machining conditions and their relationship to the control parameters are described in detail below.
- FIG. 2 classifies the machining conditions according to the first embodiment by control purpose.
- FIG. 3 explains the control parameters of machining conditions related to voltage control.
- FIG. 4 shows the relationship between machining conditions related to voltage control and the current pulse generation cycle.
- FIG. 5 explains the control parameters of machining conditions related to pulse control.
- FIG. 6 shows the relationship between machining conditions related to pulse control and the shape of the current pulse.
- FIG. 7 explains the control parameters of machining conditions related to axis drive control.
- FIG. 8 shows the relationship between machining conditions related to axis drive control and gap control by axis drive.

FIG. 2 covers the following machining conditions:
- (1) machining circuit type;
- (2) circuit auxiliary setting;
- (3) current pulse peak value;
- (4) current pulse length;
- (5) pulse pause time;
- (6) gap adjustment value;
- (7) jump speed;
- (8) jump height;
- (9) deepest value duration;
- (10) axis responsiveness;
- (11) target voltage value.

A black circle in the corresponding column marks each condition that relates to voltage control, pulse control, or axis drive control. Machining conditions related to voltage control affect the current pulse generation cycle. Those related to pulse control affect the shape of the current pulse. Those related to axis drive control affect gap control.

FIGS. 3, 5, and 7 show the control parameters that the control parameter holding unit 13 holds for the machining conditions whose setting values the machining condition setting unit 15 sets. Each machining condition in FIG. 2 relates to the current pulse generation cycle, the current pulse shape, or gap control, and some relate to more than one. Changing a machining parameter to alter one of these three therefore may also affect the other two.
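The overlap can be made concrete with a small lookup. The category membership below is taken from the lists of conditions the description gives for FIGS. 4, 6, and 8. FIG. 2 itself is not reproduced here, so this is a reading of the text, not of the figure. The English condition names and the function names are illustrative.

```python
# Machining conditions (1)-(11) as numbered in the description of FIG. 2.
CONDITIONS = {
    1: "machining circuit type",
    2: "circuit auxiliary setting",
    3: "current pulse peak value",
    4: "current pulse length",
    5: "pulse pause time",
    6: "gap adjustment value",
    7: "jump speed",
    8: "jump height",
    9: "deepest value duration",
    10: "axis responsiveness",
    11: "target voltage value",
}

# Membership per control category, from the condition lists given for
# FIG. 4 (voltage control), FIG. 6 (pulse control), and FIG. 8 (axis drive control).
CATEGORY_MEMBERS = {
    "voltage": {4, 5, 6, 9, 10, 11},
    "pulse": {1, 2, 3, 4, 6, 11},
    "axis": {6, 7, 8, 9, 10, 11},
}


def affected_categories(condition: int) -> set:
    """Control categories influenced when this machining condition changes."""
    return {cat for cat, members in CATEGORY_MEMBERS.items() if condition in members}


def shared_conditions() -> dict:
    """Conditions that belong to more than one category, with their categories."""
    shared = {}
    for c in CONDITIONS:
        cats = affected_categories(c)
        if len(cats) > 1:
            shared[c] = cats
    return shared
```

By this reading, (6) gap adjustment value and (11) target voltage value touch all three categories. That is why changing one of them can shift the pulse cycle, the pulse shape, and gap control together.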

The setting value of each machining condition selects a notch for that condition. A set of machining conditions is expressed as a notch pattern, which is the combination of the notches selected for each condition. A notch is a discrete step that specifies the physical quantity representing a machining condition. Typically, several to several tens of notch patterns are preregistered in the electric discharge machine 1.

Apart from the notch selected by its setting value, each machining condition has control parameters that the user cannot change. As described above, the machining condition, a specific physical quantity, is determined from two inputs: the setting value, which selects a notch, and the control parameters. Two concrete examples of control parameters are the notch division number and the notch distribution values:
- The notch division number is the number of notches that can be selected for a machining condition.
- The notch distribution values are the physical quantities of the machining condition assigned to each notch.

The control parameters are not limited to these two. If the control parameters of all eleven machining conditions listed in FIG. 2 are treated as variables, there are tens to hundreds of variables in total.

FIG. 3 explains the control parameters of the machining conditions related to voltage control. FIG. 4 outlines how these conditions relate to the current pulse generation cycle. They are (4) current pulse length, (5) pulse pause time, (6) gap adjustment value, (9) deepest value duration, (10) axis responsiveness, and (11) target voltage value. (4) Current pulse length and (5) pulse pause time appear in FIG. 4 as the widths indicated by arrows. (6) Gap adjustment value, (9) deepest value duration, (10) axis responsiveness, and (11) target voltage value are machining conditions related to the current pulse generation cycle.
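The following simulation illustrates why the generation cycle is not fixed even when (4) and (5) are fixed. It assumes the common EDM timing model, in which each discharge cycle consists of three parts: a random ignition delay before breakdown, the current pulse of length (4), and the pause time (5). The exponential delay distribution and every numeric value are illustrative assumptions, not data from the patent.

```python
import random


def discharge_cycles(n, pulse_length_us, pause_us, mean_delay_us, seed=0):
    """Simulated discharge cycle lengths in microseconds.

    Each cycle = random ignition delay + current pulse length (4) + pause time (5).
    The exponential delay is an assumed model for illustration only."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / mean_delay_us) + pulse_length_us + pause_us
            for _ in range(n)]


cycles = discharge_cycles(1000, pulse_length_us=4.0, pause_us=8.0, mean_delay_us=3.0)
shortest = min(cycles)  # never below pulse length + pause (12 us here)
```

Under this model, (4) and (5) set a lower bound on the cycle, and the ignition delay adds a random extra term. The extra term is why the cycle fluctuates from pulse to pulse.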

For example, the control parameters for (4) current pulse length are its notch division number and notch distribution values, which act as length control parameters. Suppose a notch division number and notch distribution values are set for the current pulse length. These control parameters then define the following correspondence:
- the notch selected by setting value 0 corresponds to a current pulse length of 2 μs;
- the notch selected by setting value 1 corresponds to 4 μs;
- the notch selected by setting value 2 corresponds to 8 μs.

Changing the control parameters for the current pulse length changes this correspondence, so the same setting value then yields a different current pulse length. However, depending on the new notch distribution values, the machining condition value need not change for every setting value.
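The example above can be written as a table built from a notch division number and notch distribution values. This is a minimal sketch that assumes the distribution values are simply listed, one per notch. The patent does not specify how the values are encoded, and `build_table` is a hypothetical helper. The changed value 5.0 μs is invented for illustration.

```python
def build_table(division_number, distribution_values):
    """Hypothetical: setting value -> current pulse length (us), built from a
    notch division number and one distribution value per notch."""
    if len(distribution_values) != division_number:
        raise ValueError("need one distribution value per notch")
    return dict(enumerate(distribution_values))


before = build_table(3, [2.0, 4.0, 8.0])  # setting values 0, 1, 2 -> 2, 4, 8 us
after = build_table(3, [2.0, 5.0, 8.0])   # one distribution value changed

# Setting values whose physical pulse length changed.
changed = {s for s in before if before[s] != after[s]}
```

Only setting value 1 maps to a new pulse length here. This matches the statement that a control parameter change need not alter the machining condition for every setting value.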

FIG. 5 explains the control parameters of the machining conditions related to pulse control. FIG. 6 outlines how these conditions relate to the shape of the current pulse. They are (1) machining circuit type, (2) circuit auxiliary setting, (3) current pulse peak value, (4) current pulse length, (6) gap adjustment value, and (11) target voltage value. Their effects are as follows:
- (1) Changing the circuit call parameter for the machining circuit type changes the machining circuit, and therefore the shape of the current pulse.
- (2) The circuit auxiliary setting defines the rising slope of the current pulse.
- (3) The current pulse peak value defines the peak of the current pulse.
- (4) The current pulse length defines the length of the current pulse.
- (6) The gap adjustment value and (11) the target voltage value are machining conditions related to the interval between current pulses.

For example, the control parameters for (3) current pulse peak value are its notch division number and notch distribution values, which act as peak control parameters. Suppose these are set for the current pulse peak value I_p. They then define the following correspondence:
- the notch selected by setting value 0 corresponds to I_p = 1 A;
- the notch selected by setting value 1 corresponds to I_p = 2 A;
- the notch selected by setting value 2 corresponds to I_p = 4 A.

Changing the control parameters for I_p changes this correspondence, so the same setting value then yields a different value of I_p. However, depending on the new notch distribution values, the machining condition value need not change for every setting value.

FIG. 7 explains the control parameters of the machining conditions related to axis drive control. FIG. 8 outlines how these conditions relate to axis drive control. They are (6) gap adjustment value, (7) jump speed, (8) jump height, (9) deepest value duration, (10) axis responsiveness, and (11) target voltage value. The conditions fall into two groups, and (10) belongs to both:
- The approach motion between the machining electrode 2 and the workpiece 3 depends on (6) gap adjustment value, (9) deepest value duration, (10) axis responsiveness, and (11) target voltage value.
- The retreat of the machining electrode 2 from the workpiece 3, including the jump motion of the drive axis, depends on (7) jump speed, (8) jump height, and (10) axis responsiveness.

Next, the stability or instability of the voltage and current pulses in electric discharge machining is described. FIG. 9 illustrates the states of the voltage and current pulses according to the first embodiment. FIG. 10 shows a case in which the voltage and current pulses are stable, and FIG. 11 a case in which they are unstable, both according to the first embodiment. In FIGS. 9 to 11, the upper trace is the voltage waveform and the lower trace is the current waveform.

When a voltage is applied between the machining electrode 2 and the workpiece 3, dielectric breakdown occurs at an unpredictable moment and current flows. If the ideal voltage-current relationship for stable machining holds, a current pulse is generated. This pulse is shaped by a transistor circuit or the like and is close to a rectangular wave with a fixed slope. FIG. 9 shows it as a stable discharge.

When this ideal relationship is not satisfied, one of two things may happen. The current waveform may deviate from the ideal shape, as in the unstable discharge of FIG. 9. Alternatively, an irregularly shaped current that does not contribute to machining may flow across the gap, as in the abnormal discharge of FIG. 9.

In controlling electric discharge machining, one index used to control the relative distance across the gap is the average voltage per fixed time interval while discharges occur. Control is performed by observing this average voltage.

When the ideal voltage-current relationship is maintained, the average voltage stays at the theoretical value and stable discharge continues, as shown in FIG. 10. When the unstable or abnormal discharges of FIG. 9 repeat, however, the average voltage drifts from the theoretical value and unstable discharge continues, as shown in FIG. 11.

The average voltage also deviates for other reasons. If the gap closes and the machining electrode 2 touches the workpiece 3, a short circuit results. If the electrode 2 moves too far from the workpiece 3 for discharge to occur, an open state results. A deviation of the average voltage from the theoretical value therefore does not by itself show whether the discharge pulses are stable.

Moreover, even when the stable current pulse pattern of FIG. 10 keeps occurring under ideal conditions, the discharge period is not constant. This is because the interval before dielectric breakdown, called the no-load voltage time, is unpredictable. An increase or decrease in the discharge period is therefore an index independent of machining stability.

FIG. 12 shows the ideal distribution of the average voltage according to the first embodiment. FIG. 13 shows the distribution while stable discharge continues, and FIG. 14 shows it while unstable discharge continues, both according to the first embodiment. In FIGS. 12 to 14, the horizontal axis is the average voltage per fixed time interval during discharge, and the vertical axis is the number of pulses per fixed time interval.

When the ideal voltage-current relationship is maintained, every average voltage falls at the theoretical value, with the number of pulses determined for that value, as shown in FIG. 12. In actual machining, physical effects cause both the average voltage and the pulse count to spread around the target voltage value. The target voltage value is the machining condition that specifies the desired voltage, and it need not equal the theoretical value.

- When machining is stable and stable discharge continues as in FIG. 10, the average voltage varies little and the pulse count peaks at the target voltage value, as shown in FIG. 13.
- When machining is unstable and unstable discharge continues as in FIG. 11, the average voltage is widely dispersed around the target voltage value and the pulse count also varies, as shown in FIG. 14.
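The sketch below makes the difference between FIGS. 13 and 14 concrete. It bins per-interval average voltages into a histogram and measures their RMS deviation from the target voltage value. The voltage samples, the 30 V target and the 1 V bin width are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter


def voltage_histogram(avg_voltages, bin_width=1.0):
    """Count pulses per voltage bin, keyed by the lower edge of each bin."""
    return Counter(bin_width * (v // bin_width) for v in avg_voltages)


def rms_deviation(avg_voltages, target):
    """Root-mean-square deviation of the average voltages from the target voltage."""
    return (sum((v - target) ** 2 for v in avg_voltages) / len(avg_voltages)) ** 0.5


TARGET = 30.0  # hypothetical target voltage value [V]
stable = [29.8, 30.1, 30.0, 30.2, 29.9]    # illustrative, tightly clustered (cf. FIG. 13)
unstable = [24.0, 33.5, 30.0, 37.0, 27.5]  # illustrative, widely dispersed (cf. FIG. 14)

for label, data in (("stable", stable), ("unstable", unstable)):
    peak_bin, peak_count = voltage_histogram(data).most_common(1)[0]
    print(label, round(rms_deviation(data, TARGET), 2), peak_bin, peak_count)
```

For the stable set, the most populated bin is the one containing the target voltage. For the unstable set, no bin stands out.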

The pulse state recognition unit 32 judges whether the voltage distribution during discharges over a fixed period is good or poor. From this it determines whether the pulses are stable or unstable. As one example, the unit bases this determination on three quantities obtained from the machining power supply control unit 12: the average voltage, the target voltage value and a voltage threshold.

Specifically, the pulse state recognition unit 32 works from commands of the machining power supply control unit 12 as follows:

- It generates a pulse unstable signal whenever the average voltage per fixed time interval during discharge deviates from the target voltage value by more than the voltage threshold (in absolute value).
- It counts these unstable signals over a predetermined period longer than that fixed interval. This count is the first-state value.
- It counts the pulses generated during the same predetermined period. This count is the second-state value.

The predetermined period may be, for example, one jump cycle. The cycle starts when a retreat operation, called a jump operation, ends, continues through gap position control for generating discharges, and ends when the next jump operation is performed.

The axis drive recognition unit 31 takes as the third-state value the feed amount of the axis of the drive device 4, obtained from the commands of the axis drive control unit 11. The third-state value becomes more positive as the feed in the machining direction grows, and more negative as the feed in the retreat direction grows.

The first-, second- and third-state values are each state variables representing the machining state during electric discharge machining. The machining state observation unit 33 shows these state variables on the display unit 22, for example as a distribution chart or a bar-graph histogram, so that the user can inspect them visually. In this way, the state observation unit 30 observes the plurality of state variables. The first reward calculation unit 41, the second reward calculation unit 43 and the third reward calculation unit 45 of the learning unit 40 then calculate rewards from these values, as acquired by the machining state observation unit 33.
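The three state variables can be sketched as shown below. This sketch rests on assumptions:

- The function and field names are hypothetical.
- The per-interval average voltages are assumed to be already available for one predetermined period, for example from one jump operation to the next.
- The axis feed is taken as the signed difference of axis positions, positive in the machining direction.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MachiningState:
    first: int    # unstable signals accumulated over the predetermined period
    second: int   # pulses generated during the predetermined period
    third: float  # axis feed: positive = machining direction, negative = retreat


def observe_state(interval_avg_voltages: Sequence[float], target_voltage: float,
                  voltage_threshold: float, pulse_count: int,
                  start_position: float, end_position: float) -> MachiningState:
    """Hypothetical sketch of units 31 and 32 for one predetermined period."""
    # One unstable signal per fixed interval whose average voltage deviates
    # from the target by more than the threshold.
    unstable_signals = sum(
        1 for v in interval_avg_voltages
        if abs(v - target_voltage) > voltage_threshold
    )
    # Assumes the position coordinate increases in the machining direction.
    feed = end_position - start_position
    return MachiningState(first=unstable_signals, second=pulse_count, third=feed)


state = observe_state([30.2, 36.0, 29.5, 22.0], target_voltage=30.0,
                      voltage_threshold=5.0, pulse_count=120,
                      start_position=0.0, end_position=0.015)
print(state)
```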

The machine learning device 100 comprises the state observation unit 30, the learning unit 40 and the parameter changing unit 50, and it may use any learning algorithm. As an example, the case where reinforcement learning is applied is described.

In reinforcement learning, an agent (the acting entity) in some environment observes the current state and decides which action to take. Each action it chooses earns a reward from the environment. Over a sequence of actions, the agent learns a policy that maximizes the total reward. Q-learning and TD-learning are well-known representative reinforcement learning methods. In Q-learning, for example, the general update equation of the action value function Q(s, a) is given by Equation (1). The action value function Q(s, a) is also called the action value table.
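In standard form, the Q-learning update of Equation (1), written with the symbols defined below, reads:

$$
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right) \tag{1}
$$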

In Equation (1), s_t is the state at time t and a_t is the action at time t. Action a_t changes the state to s_{t+1}. r_{t+1} is the reward received for that state change, γ is the discount rate and α is the learning coefficient.

The update in Equation (1) compares two values: the action value of the best action a at time t+1, and the action value Q of the action a_t executed at time t. If the former is larger, the update increases the action value Q at time t; otherwise it decreases it. In other words, the action value function Q(s_t, a_t) is updated so that the action value Q of action a_t at time t approaches the best action value at time t+1. As a result, the best action value in a given environment propagates step by step back to the action values of the environments that preceded it.
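A minimal tabular sketch of the update in Equation (1) is shown below. It assumes the following:

- The class `QTable`, the action names and the values of α and γ are hypothetical.
- States are discretized tuples such as (first-state value, second-state value).
- The table starts at 0 for every state-action pair.

```python
from collections import defaultdict


class QTable:
    """Tabular action value function Q(s, a) updated as in Equation (1)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha              # learning coefficient α
        self.gamma = gamma              # discount rate γ
        self.q = defaultdict(float)     # (state, action) -> value, 0.0 if unseen

    def best_value(self, state):
        """max_a Q(state, a) over the available actions."""
        return max(self.q[(state, a)] for a in self.actions)

    def best_action(self, state):
        """Action with the largest value in this state (ties: first listed)."""
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Move Q(s_t, a_t) toward r_{t+1} + γ max_a Q(s_{t+1}, a)."""
        target = reward + self.gamma * self.best_value(next_state)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
        return self.q[(state, action)]


# Hypothetical actions: lower, keep, or raise the notch of one control parameter.
table = QTable(["notch_down", "keep", "notch_up"], alpha=0.5, gamma=0.9)
print(table.update((3, 100), "notch_up", reward=1.0, next_state=(1, 120)))
print(table.best_action((3, 100)))
```

The target uses the best value of the next state. As a result, a high value learned for a later state flows back into earlier states over repeated updates, which is the propagation described above.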

This maps directly onto the operation of the machine learning device 100 described below. A change of the control parameters corresponds to the action a_t at time t, and the first, second and third states correspond to the state s_t at time t. Under this mapping, the device can be understood as performing Q-learning.

The optimization of the control parameters by the machine learning device 100 is described below.

FIG. 15 is a flowchart of the learning-based optimization of the control parameters for the voltage-control machining conditions according to the first embodiment. These control parameters are the variable values used for voltage control, and the target voltage value set as a machining condition is derived from them. They cover three things:

- the magnitude of the voltage;
- the shape of the voltage waveform;
- a voltage reference value, called the reference voltage, used to detect discharges.

Optimizing these control parameters can also involve changing a notch pattern. For example, the initial notch patterns of the no-load-time voltage and of the target voltage value, both set as control parameters, may be changed to different notch patterns.

The first reward calculation unit 41 calculates the reward for voltage control. It computes the change in reward from the first- and second-state values obtained by the pulse state recognition unit 32. The only requirement is that the reward increases when the first-state value decreases and the second-state value increases; otherwise, the way the two values are combined is not restricted. Specifically:

- The reward is increased when the first-state value decreases and reduced when it increases.
- The reward is increased when the second-state value increases and reduced when it decreases.

The basic criterion is to increase the reward when the number of unstable pulses falls and the number of stable pulses rises. In addition, the reward calculation may be defined so that the reward decreases whenever the number of stable pulses falls, even if the number of unstable pulses also falls.

Based on the reward calculated by the first reward calculation unit 41, the first function update unit 42 updates the action value function Q used to determine the control parameters related to voltage control. Based on the updated Q, the first control parameter changing unit 51 changes the control parameters of the voltage-control machining conditions toward the values that yield the largest reward.

On this basis, the optimization of the six control parameters of the voltage-control machining conditions shown in FIG. 3 is described with reference to FIG. 15. The process of FIG. 15 runs while the electric discharge machine 1 continues electric discharge machining. The description assumes that a priority order is set for the control parameters to be changed, but the six control parameters may instead be optimized simultaneously.

Before the flowchart of FIG. 15 runs, the first reward calculation unit 41 is assumed to already hold an initial value of the reward for voltage control. The initial value may be any fixed value, for example 0. The flowchart then begins as follows:

- Step S101: The state observation unit 30 observes information from the machining power supply control unit 12 while machining runs with the current machining conditions and control parameters. Specifically, it acquires the commands of the machining power supply control unit 12 during machining.
- Step S102: The pulse state recognition unit 32 calculates the first- and second-state values from these commands.

The machining state observation unit 33 then passes the first- and second-state values to the first reward calculation unit 41. Each pair of values is linked to the control parameters currently in use, as set in the control parameter holding unit 13.

The first reward calculation unit 41 then compares the received first-state value with the previous first-state value, which it keeps for this purpose (step S103). The outcomes are:

- Smaller than before (step S103: smaller): the reward is increased by a predetermined amount (step S104), because a smaller first-state value indicates a more stable state.
- Equal (step S103: same): the reward is left unchanged (step S105).
- Larger than before (step S103: larger): the reward is reduced by a predetermined amount (step S106), because a larger first-state value indicates a less stable state.

The first time step S103 is executed there is no previous first-state value, so the process proceeds to step S105.

Next, the first reward calculation unit 41 compares the received second-state value with the previous second-state value, which it also keeps (step S107). The outcomes are:

- Larger than before (step S107: larger): the reward is increased by a predetermined amount (step S108), because a larger second-state value indicates a more stable state.
- Equal (step S107: same): the reward is left unchanged (step S109).
- Smaller than before (step S107: smaller): the reward is reduced by a predetermined amount (step S110), because a smaller second-state value indicates a less stable state.

The first time step S107 is executed there is no previous second-state value, so the process proceeds to step S109.
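Steps S103 to S110 amount to a simple comparison rule, and the sketch below is one possible reading of it. The function name is hypothetical, the predetermined increment and decrement are assumed to be 1.0, and `None` stands for the missing previous value on the first pass.

```python
def update_reward(reward, first, second, prev_first=None, prev_second=None,
                  increment=1.0, decrement=1.0):
    """Hypothetical sketch of steps S103-S110 of FIG. 15."""
    # S103-S106: fewer unstable signals than last time means a more stable state.
    if prev_first is not None:
        if first < prev_first:
            reward += increment      # S104
        elif first > prev_first:
            reward -= decrement      # S106
        # equal: S105, reward unchanged
    # S107-S110: more pulses than last time means a more stable state.
    if prev_second is not None:
        if second > prev_second:
            reward += increment      # S108
        elif second < prev_second:
            reward -= decrement      # S110
        # equal: S109, reward unchanged
    return reward


reward = 0.0
reward = update_reward(reward, first=5, second=100)  # first pass: no previous values
reward = update_reward(reward, 3, 120, prev_first=5, prev_second=100)
print(reward)
```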

The remaining steps of FIG. 15 are:

- Step S111: The first function update unit 42 updates the action value function Q according to Equation (1), using the reward calculated by the first reward calculation unit 41.
- Step S112: The first function update unit 42 determines whether Q has converged, that is, whether step S111 no longer produces updates.
- Step S113 (S112: No): The first control parameter changing unit 51 changes the control parameter of the voltage-control machining conditions, based on the Q updated in step S111. The process then returns to step S101.
- Step S114 (S112: Yes): The learning unit 40 determines whether the first control parameter changing unit 51 has changed all of the control parameters of the voltage-control machining conditions.
- Step S115 (S114: No): The parameter targeted by the first control parameter changing unit 51 in step S113 is switched to another voltage-control parameter that has not yet been changed. The process then proceeds to step S113.

The change of control parameters in step S113 is described in detail below. As mentioned above, the six control parameters of the voltage-control machining conditions shown in FIG. 3 are changed in a defined priority order. The first time step S113 is entered, the first control parameter changing unit 51 changes the voltage control parameter, which is the control parameter of the target voltage value. Each time step S112 determines that Q has converged, step S115 switches the target to the next parameter in this order:

1. the GAIN control parameter, which is the control parameter of the axis response;
2. the length control parameter for the pulse pause time;
3. the gap control parameter for the gap adjustment value;
4. the length control parameter for the deepest-value duration;
5. the length control parameter for the current pulse length.
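The priority-ordered sweep of steps S112 to S115 can be outlined as shown below. This outline rests on assumptions:

- The parameter identifiers are hypothetical labels for the six control parameters listed above.
- `step` is a placeholder for one pass of steps S101 to S113 on the current parameter. It returns True once Q has converged.
- The `max_iters` safeguard is an added assumption and is not part of the flowchart.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical labels, in the change priority described for FIG. 15.
VOLTAGE_CONTROL_PRIORITY = [
    "target_voltage",           # voltage control parameter
    "axis_response_gain",       # GAIN control parameter
    "pulse_pause_length",       # length control parameter (pulse pause time)
    "gap_adjustment",           # gap control parameter
    "deepest_value_duration",   # length control parameter (deepest-value duration)
    "current_pulse_length",     # length control parameter (current pulse length)
]


def optimize_in_priority_order(
    params: Dict[str, float],
    step: Callable[[Dict[str, float], str], bool],
    max_iters: int = 1000,
) -> Tuple[Dict[str, float], List[str]]:
    """Tune one control parameter at a time until Q converges, then move on."""
    history: List[str] = []
    for name in VOLTAGE_CONTROL_PRIORITY:      # S115: switch to next unchanged parameter
        for _ in range(max_iters):
            history.append(name)
            if step(params, name):             # S112: Yes -> Q has converged
                break
    return params, history                     # S116: caller stores the learning result


# Toy stand-in for S101-S113: nudges the parameter and "converges" on the 2nd call.
calls: Dict[str, int] = {}


def toy_step(params: Dict[str, float], name: str) -> bool:
    calls[name] = calls.get(name, 0) + 1
    params[name] = params.get(name, 0.0) + 1.0
    return calls[name] >= 2


final_params, order = optimize_in_priority_order({}, toy_step)
print(order[:3], final_params["gap_adjustment"])
```

For the simultaneous variant, the outer loop would collapse into a single pass in which `step` changes all six parameters together.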

The optimization ends when the learning unit 40 determines that Q has converged and that all of the voltage-control parameters have been changed (step S114: Yes). The learning result is then stored in the learning result storage unit 80 (step S116). The learning result contains:

- the control parameters finally determined through the changes in step S113;
- the intermediate values of each control parameter during those changes;
- the first- and second-state values corresponding to those control parameters.

The stored result can be used to judge whether a control parameter change was an improvement, by comparing the values before and after it. The finally determined control parameters are those that earned the largest reward during learning. They are held in the control parameter holding unit 13 as the optimal control parameters for the given machining-condition set value.

Optimizing the voltage-control parameters by learning in this way makes it possible to prevent unstable signals from arising between the start and end of machining, and to maximize the number of stable pulses.

As noted above, the six voltage-control parameters may instead be optimized simultaneously. In that case, the first control parameter changing unit 51 changes all six at once in step S113, based on the Q updated in step S111. Steps S114 and S115 are then unnecessary: when step S112 determines that Q has converged (step S112: Yes), the process proceeds directly to step S116.

FIG. 16 is a flowchart of the learning-based optimization of the control parameters for the pulse-control machining conditions according to the first embodiment. These control parameters are the variable values used for current pulse control. Examples are the pulse slope and the abnormal-discharge detection threshold, from which the theoretical value of the pulse generation period is derived. The parameters cover:

- the magnitude and width of the current pulse;
- the shape of the current waveform;
- the gap adjustment value, which adjusts the relative distance between the machining electrode 2 and the workpiece 3 so that the current approaches its ideal shape.

Optimizing these control parameters can also involve changing a notch pattern. For example, the initial notch pattern of the current magnitude and width, set as control parameters, may be changed to a different notch pattern.

The second reward calculation unit 43 calculates the reward for pulse control. It works from the first- and second-state values obtained by the pulse state recognition unit 32 and uses the same method as the first reward calculation unit 41.

Based on the reward calculated by the second reward calculation unit 43, the second function update unit 44 updates the action value function Q used to determine the control parameters related to pulse control. Based on the updated Q, the second control parameter changing unit 52 changes the control parameters of the pulse-control machining conditions toward the values that yield the largest reward.

On this basis, the optimization of the six control parameters of the pulse-control machining conditions shown in FIG. 5 is described with reference to FIG. 16. The process of FIG. 16 runs while the electric discharge machine 1 continues electric discharge machining. The description assumes that a priority order is set for the control parameters to be changed, but the six control parameters may instead be optimized simultaneously.

Before the flowchart of FIG. 16 runs, the second reward calculation unit 43 is assumed to hold an initial value of the reward for pulse control. The initial value may be any fixed value, for example 0. The flowchart then begins as follows:

- Step S201: The state observation unit 30 observes information from the machining power supply control unit 12 while machining runs with the current machining conditions and control parameters. Specifically, it acquires the commands of the machining power supply control unit 12 during machining.
- Step S202: The pulse state recognition unit 32 calculates the first- and second-state values from these commands.

The machining state observation unit 33 then passes the first- and second-state values to the second reward calculation unit 43. Each pair of values is linked to the control parameters currently in use, as set in the control parameter holding unit 13.

The second reward calculation unit 43 then compares the received first-state value with the previous first-state value (step S203); because the unit retains the previously received value, it can compare it with the current one. If the first-state value is smaller than the previous one (step S203: smaller), the second reward calculation unit 43 increases the reward by a predetermined amount (step S204). If the value is unchanged (step S203: same), the reward is left unchanged (step S205). If the value is larger than the previous one (step S203: larger), the reward is decreased by a predetermined amount (step S206). When step S203 is executed for the first time, no previous first-state value exists, so the process proceeds to step S205.

Next, the second reward calculation unit 43 compares the received second-state value with the previous second-state value (step S207), which it likewise retains. If the second-state value is larger than the previous one (step S207: larger), the second reward calculation unit 43 increases the reward by a predetermined amount (step S208). If the value is unchanged (step S207: same), the reward is left unchanged (step S209). If the value is smaller than the previous one (step S207: smaller), the reward is decreased by a predetermined amount (step S210). When step S207 is executed for the first time, no previous second-state value exists, so the process proceeds to step S209.
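The three-way comparisons of steps S203 to S210 can be sketched in Python as follows. This is a minimal illustration only: the class name `PulseRewardCalculator`, its fields, and the unit increment and decrement are assumptions standing in for the "predetermined values" of the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PulseRewardCalculator:
    """Hypothetical sketch of the second reward calculation unit 43 (steps S203 to S210)."""

    reward: float = 0.0        # initial reward: any fixed value, here 0
    increment: float = 1.0     # assumed "predetermined" increase
    decrement: float = 1.0     # assumed "predetermined" decrease
    prev_first: Optional[float] = None
    prev_second: Optional[float] = None

    def update(self, first: float, second: float) -> float:
        # S203 to S206: a smaller first-state value than last time is rewarded.
        if self.prev_first is not None:
            if first < self.prev_first:
                self.reward += self.increment   # S204
            elif first > self.prev_first:
                self.reward -= self.decrement   # S206
            # equal values: S205, reward unchanged
        # S207 to S210: a larger second-state value than last time is rewarded.
        if self.prev_second is not None:
            if second > self.prev_second:
                self.reward += self.increment   # S208
            elif second < self.prev_second:
                self.reward -= self.decrement   # S210
            # equal values: S209, reward unchanged
        # Retain the current values for the next comparison.
        self.prev_first, self.prev_second = first, second
        return self.reward
```

On the first call there is nothing to compare against, so the reward stays at its initial value, matching the jumps to steps S205 and S209.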

The second function update unit 44 then updates the action value function Q according to formula (1), based on the reward calculated by the second reward calculation unit 43 (step S211). The second function update unit 44 further determines whether updates in step S211 have ceased, that is, whether Q has converged (step S212). If Q has not converged (step S212: No), the second control parameter changing unit 52 changes the control parameters of the machining conditions related to pulse control based on the action value function Q updated in step S211 (step S213), and the process returns to step S201. If Q has converged (step S212: Yes), the learning unit 40 determines whether the second control parameter changing unit 52 has changed all the control parameters of the machining conditions related to pulse control (step S214). If not all of them have been changed (step S214: No), the control parameter to be changed by the second control parameter changing unit 52 in step S213 is switched to another one (step S215). The new target selected in step S215 is a pulse-control parameter that has not yet been changed. After step S215, the process proceeds to step S213.
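Formula (1) is not reproduced in this section. Assuming it is the standard tabular Q-learning update, $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,(r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t))$, steps S211 and S212 could be sketched as below. The learning rate `alpha`, discount `gamma`, tolerance `tol`, and both helper names are hypothetical, and treating "updates have ceased" as "recent changes fall below a tolerance" is likewise an assumption.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Sequence, Tuple

# Q-table keyed by (state, action); unseen pairs default to 0.0.
QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, actions: Sequence[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """One tabular Q-learning update (assumed form of formula (1)); returns |change|."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return abs(alpha * td_error)


def has_converged(recent_changes: Iterable[float], tol: float = 1e-6) -> bool:
    """Assumed S212 criterion: Q counts as converged when recent updates are negligible."""
    return all(change < tol for change in recent_changes)
```

For example, starting from an empty table, `q_update(q, "s0", "raise_voltage", 1.0, "s1", ["raise_voltage", "lower_voltage"], alpha=0.5)` sets `q[("s0", "raise_voltage")]` to 0.5.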

The change of the pulse-control parameters in step S213 is described in detail below. As noted above, a priority order is defined for the six control parameters shown in FIG. 5 that are changed in step S213. The parameter changed by the second control parameter changing unit 52 on first entering step S213 is the voltage control parameter, which controls the target voltage value. Each time the action value function Q is determined to have converged in step S212, the target of the second control parameter changing unit 52 is switched in step S215, in the following order: the gap control parameter (inter-electrode gap adjustment value), the pulse slope control parameter (circuit auxiliary setting), the length control parameter (current pulse length), the peak control parameter (current pulse peak value), and the circuit call parameter (type of machining circuit).
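The priority-ordered switching of steps S212 to S215 amounts to optimizing one parameter at a time. The sketch below encodes the order given above; the string keys and the `learn_until_converged` callback (which would run steps S201 to S213 for one parameter until Q converges) are hypothetical placeholders, not part of the described device.

```python
from typing import Callable, Dict, List

# Priority order of the six FIG. 5 parameters, as described for steps S213 and S215.
PULSE_PRIORITY: List[str] = [
    "voltage",       # target voltage value (changed first)
    "gap",           # inter-electrode gap adjustment value
    "pulse_slope",   # circuit auxiliary setting
    "length",        # current pulse length
    "peak",          # current pulse peak value
    "circuit_call",  # type of machining circuit
]


def optimize_in_priority_order(
    priority: List[str],
    learn_until_converged: Callable[[str], float],
) -> Dict[str, float]:
    """Change one parameter until Q converges (S212: Yes), then switch (S215).

    Returns the final value reported for each parameter; the loop ends once
    every parameter has been changed (S214: Yes).
    """
    results: Dict[str, float] = {}
    for name in priority:
        results[name] = learn_until_converged(name)
    return results
```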

When the learning unit 40 determines that the action value function Q has converged and that all the control parameters of the pulse-control machining conditions have been changed (step S214: Yes), the learning-based optimization of those parameters ends, and the learning result is stored in the learning result storage unit 80 (step S216). The learning result includes the control parameters finally determined through the changes in step S213, the intermediate values taken by each parameter during the changes, and the first-state and second-state values corresponding to the control parameters. The stored learning result can be used to judge whether conditions improved or worsened before and after a control parameter change. The finally determined control parameters, which earned the largest reward during learning, are held in the control parameter holding unit 13 as the optimal control parameters for the given machining condition set values. Optimizing the pulse-control parameters by learning makes it possible to prevent unstable signals from occurring between the start and end of machining and to maximize the number of stable-signal pulses. As described above, when the six pulse-control parameters are optimized simultaneously, the second control parameter changing unit 52 changes all six at once in step S213, based on the action value function Q updated in step S211. In this case, steps S214 and S215 are unnecessary: once Q is determined to have converged in step S212 (step S212: Yes), the process may proceed directly to step S216.

FIG. 17 is a flowchart illustrating the learning-based optimization of the control parameters of the machining conditions related to axis drive control according to the first embodiment. These control parameters, also called inter-electrode control parameters, are variables for changing the axis drive behavior of the electric discharge machine 1, such as the deceleration distance used when bringing the machining electrode 2 close to the workpiece 3, and the speed and acceleration parameters that generate an instantaneous retraction called a jump operation. Changes to the inter-electrode control parameters include not only changes to axis responsiveness for generating discharges stably in the gap, but also changes to the jump operation for clearing debris from the gap and changes that prevent excitation of natural-frequency vibration caused by axis over-response.

The third reward calculation unit 45, which calculates the reward for axis drive control, bases the reward on the second-state value, a state variable obtained by the pulse state recognition unit 32, and the third-state value, a state variable obtained by the axis drive recognition unit 31. As long as the third reward calculation unit 45 computes the reward change so that the reward increases when the second-state and third-state values become larger, there is no restriction on how these values are used. Specifically, the reward is increased when the second-state value increases and decreased when it decreases; likewise, the reward is increased when the third-state value increases and decreased when it decreases. The reward calculation may also be defined so that the reward increases if the number of discharge pulses increases even with no change in the axis feed amount, and decreases if the number of discharge pulses decreases even when the axis feed amount grows in the machining direction.
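One possible reward-change rule for the optional variant in the last sentence is sketched below. It works directly on the signed changes in the discharge pulse count (`d_pulses`) and in the axis feed along the machining direction (`d_feed`); the function name, the sign-based scoring, and the unit step are assumptions, since the text deliberately leaves the exact formula open.

```python
def _sign(x: float) -> int:
    """Return +1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def axis_reward_delta(d_pulses: float, d_feed: float, step: float = 1.0) -> float:
    """Hypothetical reward change for the third reward calculation unit 45.

    - More discharge pulses earn a reward even if the feed did not change.
    - Fewer discharge pulses are penalised even if the feed grew.
    - Otherwise pulse and feed changes each contribute +/- one step.
    """
    if d_pulses < 0:
        return -step
    return step * (_sign(d_pulses) + _sign(d_feed))
```

A feed increase cannot offset a drop in pulse count under this rule, which is one way to satisfy the condition stated above; other weightings would satisfy it equally well.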

Based on the reward calculated by the third reward calculation unit 45, the third function update unit 46 updates the action value function Q, the function used to determine the control parameters related to axis drive control. Based on the updated Q, the third control parameter changing unit 53 changes the control parameters of the axis-drive machining conditions toward the values that yield the greatest reward.

Based on the above, the optimization of the five control parameters of the machining conditions related to axis drive control shown in FIG. 7 will be described with reference to FIG. 17. The process of FIG. 17 is executed while the electric discharge machine 1 is continuously performing electric discharge machining. The description assumes that a priority order is set for the control parameters to be changed; however, the five control parameters may instead be optimized simultaneously.

Before the flowchart of FIG. 17 is executed, the third reward calculation unit 45 is assumed to already hold an initial value of the reward for axis drive control. The initial value is not restricted as long as it is a fixed value; it may be 0. First, the state observation unit 30 observes information from the machining power supply control unit 12 while machining is performed with the current machining conditions and control parameters (step S301). Specifically, the state observation unit 30 acquires the commands issued by the machining power supply control unit 12 during machining. Based on these commands, the pulse state recognition unit 32 calculates the second-state value (step S303). The state observation unit 30 also observes information from the axis drive control unit 11 while machining is performed with the current machining conditions and control parameters (step S302); specifically, it acquires the commands issued by the axis drive control unit 11 during machining. Based on these commands, the axis drive recognition unit 31 calculates the third-state value (step S304). The second-state value obtained by the pulse state recognition unit 32 and the third-state value obtained by the axis drive recognition unit 31 are then passed from the machining state observation unit 33 to the third reward calculation unit 45, linked to the control parameters currently set in the control parameter holding unit 13.

The third reward calculation unit 45 then compares the received second-state value with the previous second-state value (step S305), which it retains. If the second-state value is larger than the previous one (step S305: larger), the third reward calculation unit 45 increases the reward by a predetermined amount (step S306). If the value is unchanged (step S305: same), the reward is left unchanged (step S307). If the value is smaller than the previous one (step S305: smaller), the reward is decreased by a predetermined amount (step S308). When step S305 is executed for the first time, no previous second-state value exists, so the process proceeds to step S307.

Next, the third reward calculation unit 45 compares the received third-state value with the previous third-state value (step S309), which it likewise retains. If the third-state value is larger than the previous one (step S309: larger), the third reward calculation unit 45 increases the reward by a predetermined amount (step S310); that is, the reward is increased when the third-state value indicates a more stable state than before. If the value is unchanged (step S309: same), the reward is left unchanged (step S311). If the value is smaller than the previous one (step S309: smaller), the reward is decreased by a predetermined amount (step S312); that is, the reward is decreased when the third-state value indicates a less stable state than before. When step S309 is executed for the first time, no previous third-state value exists, so the process proceeds to step S311.
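Because both comparisons in steps S305 to S312 reward an increase, they can share one helper. In the sketch below, `compare_step` and `axis_drive_reward` are hypothetical names and the unit increment and decrement are assumed values; passing `larger_is_better=False` would reproduce the first-state rule of FIG. 16 instead.

```python
from typing import Optional


def compare_step(prev: Optional[float], curr: float, larger_is_better: bool = True,
                 increment: float = 1.0, decrement: float = 1.0) -> float:
    """Reward change for one state variable, following the flowchart's three-way branch."""
    if prev is None or curr == prev:
        return 0.0  # first pass or no change: reward unchanged (S307 / S311)
    improved = (curr > prev) if larger_is_better else (curr < prev)
    return increment if improved else -decrement


def axis_drive_reward(reward: float,
                      prev_second: Optional[float], second: float,
                      prev_third: Optional[float], third: float) -> float:
    """Steps S305 to S312: larger second- and third-state values are both rewarded."""
    reward += compare_step(prev_second, second)  # S305 to S308
    reward += compare_step(prev_third, third)    # S309 to S312
    return reward
```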

The third function update unit 46 then updates the action value function Q according to formula (1), based on the reward calculated by the third reward calculation unit 45 (step S313). The third function update unit 46 further determines whether updates in step S313 have ceased, that is, whether Q has converged (step S314). If Q has not converged (step S314: No), the third control parameter changing unit 53 changes the control parameters of the machining conditions related to axis drive control based on the action value function Q updated in step S313 (step S315), and the process returns to steps S301 and S302. If Q has converged (step S314: Yes), the learning unit 40 determines whether the third control parameter changing unit 53 has changed all the control parameters of the machining conditions related to axis drive control (step S316). If not all of them have been changed (step S316: No), the control parameter to be changed by the third control parameter changing unit 53 in step S315 is switched to another one (step S317). The new target selected in step S317 is an axis-drive parameter that has not yet been changed. After step S317, the process proceeds to step S315.

The change of the axis-drive parameters in step S315 is described in detail below. As noted above, a priority order is defined for the five control parameters shown in FIG. 7 that are changed in step S315. The parameter changed by the third control parameter changing unit 53 on first entering step S315 is the voltage control parameter, which controls the target voltage value. Each time the action value function Q is determined to have converged in step S314, the target of the third control parameter changing unit 53 is switched in step S317, in the following order: the GAIN control parameter (axis responsiveness), the length control parameter (deepest-value duration), the jump control parameter (jump speed and jump height), and the gap control parameter (inter-electrode gap adjustment value).

When the learning unit 40 determines that the action value function Q has converged and that all the control parameters of the axis-drive machining conditions have been changed (step S316: Yes), the learning-based optimization of those parameters ends, and the learning result is stored in the learning result storage unit 80 (step S318). The learning result includes the control parameters finally determined through the changes in step S315, the intermediate values taken by each parameter during the changes, and the second-state and third-state values corresponding to the control parameters. The stored learning result can be used to judge whether conditions improved or worsened before and after a control parameter change. The finally determined control parameters, which earned the largest reward during learning, are held in the control parameter holding unit 13 as the optimal control parameters for the given machining condition set values. Optimizing the axis-drive parameters by learning increases the number of pulses within one operation unit, that is, the interval that starts when a retraction called a jump operation ends, continues through inter-electrode position control for generating discharges, and lasts until the next jump operation begins. This increases the axis feed per observation in the machining direction and thereby promotes machining progress. As described above, when the five axis-drive parameters are optimized simultaneously, the third control parameter changing unit 53 changes all five at once in step S315, based on the action value function Q updated in step S313. In this case, steps S316 and S317 are unnecessary: once Q is determined to have converged in step S314 (step S314: Yes), the process may proceed directly to step S318.
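The difference between the priority-ordered mode and the simultaneous mode is mainly a difference in the action set. In this sketch, each action is assumed to nudge a parameter by -1, 0, or +1 step; the parameter keys follow the order given for step S315, but the step encoding and function names are hypothetical.

```python
import itertools
from typing import Dict, List

# Priority order of the five FIG. 7 parameters, as described for steps S315 and S317.
AXIS_PRIORITY: List[str] = ["voltage", "gain", "length", "jump", "gap"]
DELTAS = (-1, 0, 1)  # assumed encoding: decrease, keep, or increase by one step


def sequential_actions(target: str) -> List[Dict[str, int]]:
    """Priority-ordered mode: only the parameter currently being changed may move."""
    return [{target: d} for d in DELTAS]


def simultaneous_actions() -> List[Dict[str, int]]:
    """Simultaneous mode: a single action changes all five parameters at once."""
    return [dict(zip(AXIS_PRIORITY, combo))
            for combo in itertools.product(DELTAS, repeat=len(AXIS_PRIORITY))]
```

With three choices per parameter, the simultaneous mode has 3^5 = 243 joint actions, against 3 per stage (15 across the five stages) in the priority-ordered mode, so its Q-table is larger and will typically need more observations before it converges.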

In the descriptions of FIGS. 15 to 17, the method of changing the control parameters based on the updated action value function Q is not particularly limited, as long as it finds the action $a_t$, that is, the control parameters, that maximizes the action value $Q(s_t, a_t)$ in the current state $s_t$.
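The simplest method meeting this condition is a greedy choice over a finite set of candidate parameter values, sketched below. The dictionary-based Q-table and the function name are assumptions; practical Q-learning implementations often mix in exploration (for example epsilon-greedy selection), which the text neither requires nor excludes.

```python
from typing import Dict, Hashable, Sequence, Tuple


def greedy_action(q: Dict[Tuple[Hashable, Hashable], float], state: Hashable,
                  candidates: Sequence[Hashable]) -> Hashable:
    """Return the candidate a_t that maximises Q(s_t, a_t).

    Unseen (state, action) pairs count as 0.0; on ties the earliest candidate wins.
    """
    return max(candidates, key=lambda a: q.get((state, a), 0.0))
```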

Because identical machining conditions share the same control parameters, when the flowcharts of FIGS. 15 to 17 are executed in parallel, the same control parameters receive the changes made by each of the flowcharts.

The control-parameter optimization according to the flowcharts of FIGS. 15 to 17 begins once the electric discharge machine 1 starts machining and a discharge occurs, and continues until electric discharge machining ends. That is, the state observation unit 30 observes the machining state from the start of machining, and the learning unit 40 and the parameter changing unit 50 search for the optimal control parameters until machining ends. The machine learning device 100 executes the flowcharts of FIGS. 15 to 17 in parallel, and the control parameters continue to be updated until all of their end conditions are satisfied. Once all the end conditions are satisfied, the control parameters are no longer changed.
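The parallel execution on shared control parameters can be sketched as a lockstep loop. This is a single-threaded illustration under stated assumptions: each flowchart is modeled as a hypothetical callable that performs one iteration on a shared parameter dictionary and returns True once its end condition holds, finished flowcharts are skipped, and `max_steps` stands in for the end of machining.

```python
from typing import Callable, Dict, List

# One iteration of a flowchart (FIG. 15, 16 or 17) acting on the shared parameters;
# returns True once that flowchart's end condition is satisfied.
Learner = Callable[[Dict[str, float]], bool]


def run_until_all_done(params: Dict[str, float], learners: List[Learner],
                       max_steps: int = 10_000) -> int:
    """Step all learners in lockstep on one shared parameter dict; return steps used."""
    done = [False] * len(learners)
    for step in range(max_steps):
        for i, learner in enumerate(learners):
            if not done[i]:
                # Every learner sees the changes the others made to the same parameters.
                done[i] = learner(params)
        if all(done):
            return step + 1
    raise RuntimeError("end conditions not met before machining ended")
```

A real controller would more likely run the three loops concurrently and synchronize access to the shared parameters; the lockstep version only makes the shared-state behavior easy to follow.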

The learning behavior of the machine learning device 100 described above continues from the start of electric discharge machining until machining ends. The reward is obtained from the first, second, and third states, and the control parameters are changed accordingly. Through this learning, the action value Q under the optimal control parameters obtained at the end of machining is higher than the action value Q under the initially set control parameters. By raising the action value Q in this way, the electric discharge machine 1 according to the first embodiment shortens the time required to complete machining and, through machining with stable discharges, improves the machining accuracy and surface quality of the workpiece.

In conventional adaptive control, the machining condition set values were controlled according to rules designed to stabilize machining, but no adaptive control changed the control parameters themselves. In contrast, the machine learning device 100 performs optimization learning that adjusts the control parameters while actually carrying out electric discharge machining on the given workpiece shape and material, so it can automatically learn machining conditions that are more appropriate and stable as physical quantities. That is, the machine learning device 100 can optimize control parameters without restricting the scope of adaptive control, even under conditions that are hard to anticipate in advance, such as the workpiece shape, electrode material, and electrode shape, thereby improving machining stability, speed, and accuracy.

Embodiment 2.
FIG. 18 is a block diagram showing the configuration of an electric discharge machine 1A according to a second embodiment of the present invention. The electric discharge machine 1A adds a machining result input unit 23 to the input/output unit 20 of the electric discharge machine 1 according to the first embodiment; this unit enables additional learning based on machining results.

The first embodiment described how the control parameters are learned during one particular machining operation. In the second embodiment, machining is assumed to have already been performed once with the same workpiece 3 material and the same machining condition set values, and that run is assumed to have yielded the post-machining surface roughness of the workpiece 3 and the post-machining electrode wear of the machining electrode 2, expressed as worn weight or worn length.

The machining result input unit 23 accepts machining results entered by the user, such as the post-machining surface roughness of the workpiece 3 and the post-machining electrode wear of the machining electrode 2. The input format is not limited: the display unit 22 may display selectable options and the machining result input unit 23 may accept the user's selection, or the machining result input unit 23 may accept numerical values for the surface roughness and electrode wear entered by the user. The method of evaluating whether the received surface roughness and electrode wear are good or poor is also a design matter and is not particularly limited. Alternatively, the machining result input unit 23 may directly accept the good/poor judgment of the surface roughness and electrode wear itself.

When machining is performed again with the same machining-condition settings as the earlier machining, the result of the earlier machining accepted by the machining result input unit 23 can be used to add or remove limits on how much the control parameters may be changed, as described with reference to FIGS. 15 to 17. Based on the machining result accepted by the machining result input unit 23, the machining state observation unit 33 or a similar unit causes the parameter changing unit 50 to add or remove these limits.

Specifically, when the surface roughness of the workpiece 3 after machining, as accepted by the machining result input unit 23, is judged to be poor, changes to the control parameters that affect the quality of the machined surface are limited. For example, the parameter changing unit 50 limits the change width of the length control parameter, which controls the current pulse length, so that this parameter is not changed by more than a certain value.

When the electrode consumption of the machining electrode 2 after machining, as accepted by the machining result input unit 23, is judged to be small with margin remaining, the parameter changing unit 50 removes the limit on changes to the control parameters that affect electrode consumption. For example, the change width of the pulse slope control parameter in the circuit auxiliary setting is increased, which removes the change limit. Conversely, when the electrode consumption is large, the parameter changing unit 50 can restrict changes to the control parameters further.
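The text leaves the concrete limit values and the good/poor evaluation method open. The following is a minimal Python sketch of how the parameter changing unit 50 might apply an earlier machining result. It assumes that each limit is a maximum change per update and that a limit is halved to tighten it and doubled to relax it. The class name, function names, `wear_budget`, and `factor` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeLimits:
    """Maximum change allowed per update (hypothetical units)."""
    pulse_length: float  # length control parameter (current pulse length)
    pulse_slope: float   # pulse slope control parameter (circuit auxiliary setting)


def limits_from_previous_result(base: ChangeLimits, surface_ok: bool,
                                wear: float, wear_budget: float,
                                factor: float = 0.5) -> ChangeLimits:
    """Tighten or relax change limits from the previous machining result.

    surface_ok:  user's good/poor judgment of the surface roughness
    wear:        measured electrode consumption (weight or length)
    wear_budget: hypothetical consumption regarded as acceptable
    factor:      hypothetical scaling; multiply to tighten, divide to relax
    """
    # Poor surface roughness: narrow the change width of the pulse length parameter.
    pulse_length = base.pulse_length if surface_ok else base.pulse_length * factor
    if wear < wear_budget:      # margin remains: widen the pulse slope change width
        pulse_slope = base.pulse_slope / factor
    elif wear > wear_budget:    # heavy consumption: restrict it further
        pulse_slope = base.pulse_slope * factor
    else:
        pulse_slope = base.pulse_slope
    return ChangeLimits(pulse_length, pulse_slope)


def clamp_change(current: float, proposed: float, max_delta: float) -> float:
    """Move toward a proposed parameter value, limiting the step to +/- max_delta."""
    step = max(-max_delta, min(max_delta, proposed - current))
    return current + step
```

For example, with base limits of 4.0 for pulse length and 2.0 for pulse slope, a poor surface result combined with low wear gives limits of 2.0 and 4.0. `clamp_change(10.0, 15.0, 2.0)` then moves the parameter only to 12.0.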

In the electric discharge machine 1A according to Embodiment 2, the result of machining that has already been performed once is accepted as input. This allows the limits on control parameter changes to depend on that result whenever the same workpiece 3 material is machined with the same machining-condition setting values. Therefore, in addition to the effects of Embodiment 1, this embodiment improves accuracy, for example through better surface quality of the machined surface, and reduces cost, for example through lower electrode consumption.

Embodiment 3.
FIG. 19 is a block diagram showing the configuration of an electric discharge machine 1B according to Embodiment 3 of the present invention. The electric discharge machine 1B is the electric discharge machine 1A of Embodiment 2 with a communication unit 60 added. The communication unit 60 includes three units. The learning content filing unit 61 converts the learning results stored in the learning result storage unit 80 into transmittable learning result data. The receiving unit 62 receives learning result data from outside. The transmitting unit 63 transmits learning result data to the outside. The receiving unit 62 and the transmitting unit 63 are connected to a cloud server 300 outside the electric discharge machine 1B and can communicate with it.

The cloud server 300 is also connected to electric discharge machines 301 to 303, which have the same learning function as the control device 10 of the electric discharge machine 1B. The electric discharge machine 1B can therefore communicate with the other electric discharge machines 301 to 303 via the communication unit 60. The cloud server 300 can store learning result data from the electric discharge machines 301 to 303 as well as from the electric discharge machine 1B. The communication method between the cloud server 300 and the electric discharge machines 1B and 301 to 303 is not particularly limited, and any known technique may be used.

When optimization learning of the control parameters, as described in Embodiments 1 and 2, has already been executed, the learning content filing unit 61 can convert the learning results stored in the learning result storage unit 80 into learning result data. This data is in a format that the external electric discharge machines 301 to 303 can use. Any format is acceptable as long as a control device similar to the control device 10 can use it.

The learning result data created by the learning content filing unit 61 can be stored in the cloud server 300 via the transmitting unit 63. The cloud server 300 transmits the stored learning result data to the electric discharge machines 301 to 303 either automatically or actively. The users of the electric discharge machines 301 to 303 decide whether their machines use that data.

When this learning result data is loaded into the machine learning devices in the electric discharge machines 301 to 303, those machines can use what the machine learning device 100 has learned in the same way.
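The text deliberately leaves the learning result data format open. The following is a minimal Python sketch showing how the learning content filing unit 61 could serialize a learning result and how a receiving machine could restore it. It assumes that the learning result is an action-value table keyed by (state, action) strings and that JSON is an acceptable transport format. It also attaches a machine-specification tag so that data from a machine of a different specification is rejected. The table layout, the tag, and the function names are all hypothetical.

```python
import json
from typing import Dict, Tuple

# Hypothetical learning result: (state, action) -> learned value.
QTable = Dict[Tuple[str, str], float]


def to_learning_result_data(q: QTable, machine_spec: str) -> str:
    """Serialize a learning result into transmittable text (JSON)."""
    # JSON object keys must be strings, so store each (state, action) pair as a record.
    entries = [{"state": s, "action": a, "value": v}
               for (s, a), v in sorted(q.items())]
    return json.dumps({"spec": machine_spec, "entries": entries})


def from_learning_result_data(data: str, machine_spec: str) -> QTable:
    """Restore a learning result, refusing data from a different machine specification."""
    payload = json.loads(data)
    if payload["spec"] != machine_spec:
        raise ValueError(f"spec mismatch: {payload['spec']!r} != {machine_spec!r}")
    return {(e["state"], e["action"]): e["value"] for e in payload["entries"]}
```

In this sketch, a machine of the same specification gets back an identical table, and a machine of any other specification raises `ValueError`. This reflects the text's point that learning results are shared among machines of the same specification.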

Conversely, learning result data created by learning in the electric discharge machines 301 to 303 can also be used by the control device 10 via the cloud server 300 and the receiving unit 62. The display unit 22 can then display, from the data received via the receiving unit 62, what the control devices of the electric discharge machines 301 to 303 have learned or the states they have observed.

As a result, the electric discharge machine 1B can use learning results obtained by the control devices of external electric discharge machines 301 to 303, such as machines at remote sites. It can also observe the machining states of those machines. Likewise, the electric discharge machines 301 to 303 of the same specification can use the learning results of the electric discharge machine 1B. Performance can therefore be improved across multiple electric discharge machines of the same specification instead of adjusting one machine at a time, and this becomes more efficient as the number of such machines grows.

The machine learning device 100 according to Embodiments 1 to 3 is implemented by a computer system such as a personal computer or a general-purpose computer. FIG. 20 is a diagram showing the hardware configuration used when the functions of the machine learning device 100 according to Embodiments 1 to 3 are implemented by a computer system. In that case, as shown in FIG. 20, the functions of the machine learning device 100 are implemented by a CPU (Central Processing Unit) 201, a memory 202, a storage device 203, a display device 204, and an input device 205. These functions are implemented by software, firmware, or a combination of both. The software or firmware is written as programs and stored in the storage device 203. The CPU 201 reads the software or firmware from the storage device 203 into the memory 202 and executes it to implement the functions of the machine learning device 100. In other words, the computer system includes the storage device 203 for storing programs that, when the CPU 201 executes the functions of the machine learning device 100, carry out the steps of the machine learning method according to Embodiments 1 to 3. These programs can also be described as causing a computer to execute the processing implemented by the functions of the machine learning device 100. The memory 202 is a volatile storage area such as RAM (Random Access Memory). The storage device 203 is a nonvolatile or volatile semiconductor memory, such as ROM (Read Only Memory) or flash memory, or a magnetic disk. Examples of the display device 204 include a monitor and a display. Examples of the input device 205 include a keyboard, a mouse, and a touch panel.

The configurations described in the above embodiments are examples of the content of the present invention. They can be combined with other known techniques, and parts of them can be omitted or changed without departing from the gist of the present invention.

1, 1A, 1B, 301 to 303 electric discharge machine, 2 machining electrode, 3 workpiece, 4 drive device, 5 machining power supply, 10 control device, 11 axis drive control unit, 12 machining power supply control unit, 13 control parameter holding unit, 14 initial parameter setting unit, 15 machining condition setting unit, 20 input/output unit, 21 machining condition input unit, 22 display unit, 23 machining result input unit, 30 state observation unit, 31 axis drive recognition unit, 32 pulse state recognition unit, 33 machining state observation unit, 40 learning unit, 41 first reward calculation unit, 42 first function update unit, 43 second reward calculation unit, 44 second function update unit, 45 third reward calculation unit, 46 third function update unit, 47 reward calculation unit, 48 function update unit, 50 parameter changing unit, 51 first control parameter changing unit, 52 second control parameter changing unit, 53 third control parameter changing unit, 60 communication unit, 61 learning content filing unit, 62 receiving unit, 63 transmitting unit, 80 learning result storage unit, 100 machine learning device, 201 CPU, 202 memory, 203 storage device, 204 display device, 205 input device, 300 cloud server.

Claims

A machine learning device for learning control parameters that control machining conditions in an electric discharge machine, the machine learning device comprising:
a state observation unit that observes a plurality of state variables representing a machining state during electric discharge machining; and
a learning unit that learns the control parameters based on the plurality of state variables, wherein
the state observation unit observes, as the plurality of state variables, a first state value that is the accumulated number of occurrences of a pulse unstable signal during a predetermined period, a second state value that is the number of pulses generated during the predetermined period, and a third state value that is the feed amount of an axis in a drive device,
the learning unit includes a reward calculation unit that calculates a reward based on the state variables, and a function update unit that updates, based on the reward, a function for determining the control parameters,
the reward calculation unit includes a first reward calculation unit that calculates a reward for voltage control, a second reward calculation unit that calculates a reward for pulse control, and a third reward calculation unit that calculates a reward for axis drive control,
the function update unit includes a first function update unit that updates a function related to voltage control, a second function update unit that updates a function related to pulse control, and a third function update unit that updates a function related to axis drive control, and
the first reward calculation unit increases the reward when the first state value is smaller than the previous value, and decreases the reward when the first state value is larger than the previous value.

A machine learning device for learning control parameters that control machining conditions in an electric discharge machine, the machine learning device comprising:
a state observation unit that observes a plurality of state variables representing a machining state during electric discharge machining; and
a learning unit that learns the control parameters based on the plurality of state variables, wherein
the state observation unit observes, as the plurality of state variables, a first state value that is the accumulated number of occurrences of a pulse unstable signal during a predetermined period, a second state value that is the number of pulses generated during the predetermined period, and a third state value that is the feed amount of an axis in a drive device,
the learning unit includes a reward calculation unit that calculates a reward based on the state variables, and a function update unit that updates, based on the reward, a function for determining the control parameters,
the reward calculation unit includes a first reward calculation unit that calculates a reward for voltage control, a second reward calculation unit that calculates a reward for pulse control, and a third reward calculation unit that calculates a reward for axis drive control,
the function update unit includes a first function update unit that updates a function related to voltage control, a second function update unit that updates a function related to pulse control, and a third function update unit that updates a function related to axis drive control, and
the first reward calculation unit increases the reward when the second state value is larger than the previous value, and decreases the reward when the second state value is smaller than the previous value.

A machine learning device for learning control parameters that control machining conditions in an electric discharge machine, the machine learning device comprising:
a state observation unit that observes a plurality of state variables representing a machining state during electric discharge machining; and
a learning unit that learns the control parameters based on the plurality of state variables, wherein
the state observation unit observes, as the plurality of state variables, a first state value that is the accumulated number of occurrences of a pulse unstable signal during a predetermined period, a second state value that is the number of pulses generated during the predetermined period, and a third state value that is the feed amount of an axis in a drive device,
the learning unit includes a reward calculation unit that calculates a reward based on the state variables, and a function update unit that updates, based on the reward, a function for determining the control parameters,
the reward calculation unit includes a first reward calculation unit that calculates a reward for voltage control, a second reward calculation unit that calculates a reward for pulse control, and a third reward calculation unit that calculates a reward for axis drive control,
the function update unit includes a first function update unit that updates a function related to voltage control, a second function update unit that updates a function related to pulse control, and a third function update unit that updates a function related to axis drive control, and
the second reward calculation unit increases the reward when the first state value is smaller than the previous value, and decreases the reward when the first state value is larger than the previous value.

A machine learning device for learning control parameters that control machining conditions in an electric discharge machine, the machine learning device comprising:
a state observation unit that observes a plurality of state variables representing a machining state during electric discharge machining; and
a learning unit that learns the control parameters based on the plurality of state variables, wherein
the state observation unit observes, as the plurality of state variables, a first state value that is the accumulated number of occurrences of a pulse unstable signal during a predetermined period, a second state value that is the number of pulses generated during the predetermined period, and a third state value that is the feed amount of an axis in a drive device,
the learning unit includes a reward calculation unit that calculates a reward based on the state variables, and a function update unit that updates, based on the reward, a function for determining the control parameters,
the reward calculation unit includes a first reward calculation unit that calculates a reward for voltage control, a second reward calculation unit that calculates a reward for pulse control, and a third reward calculation unit that calculates a reward for axis drive control,
the function update unit includes a first function update unit that updates a function related to voltage control, a second function update unit that updates a function related to pulse control, and a third function update unit that updates a function related to axis drive control, and
the second reward calculation unit increases the reward when the second state value is larger than the previous value, and decreases the reward when the second state value is smaller than the previous value.

A machine learning device for learning control parameters that control machining conditions in an electric discharge machine, the machine learning device comprising:
a state observation unit that observes a plurality of state variables representing a machining state during electric discharge machining; and
a learning unit that learns the control parameters based on the plurality of state variables, wherein
the state observation unit observes, as the plurality of state variables, a first state value that is the accumulated number of occurrences of a pulse unstable signal during a predetermined period, a second state value that is the number of pulses generated during the predetermined period, and a third state value that is the feed amount of an axis in a drive device,
the learning unit includes a reward calculation unit that calculates a reward based on the state variables, and a function update unit that updates, based on the reward, a function for determining the control parameters,
the reward calculation unit includes a first reward calculation unit that calculates a reward for voltage control, a second reward calculation unit that calculates a reward for pulse control, and a third reward calculation unit that calculates a reward for axis drive control,
the function update unit includes a first function update unit that updates a function related to voltage control, a second function update unit that updates a function related to pulse control, and a third function update unit that updates a function related to axis drive control, and
the third reward calculation unit increases the reward when the second state value is larger than the previous value, and decreases the reward when the second state value is smaller than the previous value.

A machine learning device for learning control parameters that control machining conditions in an electric discharge machine, the machine learning device comprising:
a state observation unit that observes a plurality of state variables representing a machining state during electric discharge machining; and
a learning unit that learns the control parameters based on the plurality of state variables, wherein
the state observation unit observes, as the plurality of state variables, a first state value that is the accumulated number of occurrences of a pulse unstable signal during a predetermined period, a second state value that is the number of pulses generated during the predetermined period, and a third state value that is the feed amount of an axis in a drive device,
the learning unit includes a reward calculation unit that calculates a reward based on the state variables, and a function update unit that updates, based on the reward, a function for determining the control parameters,
the reward calculation unit includes a first reward calculation unit that calculates a reward for voltage control, a second reward calculation unit that calculates a reward for pulse control, and a third reward calculation unit that calculates a reward for axis drive control,
the function update unit includes a first function update unit that updates a function related to voltage control, a second function update unit that updates a function related to pulse control, and a third function update unit that updates a function related to axis drive control, and
the third reward calculation unit increases the reward when the third state value is larger than the previous value, and decreases the reward when the third state value is smaller than the previous value.
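The claims above specify only the direction in which each reward changes. The following is a minimal Python sketch that computes one reward per control target from two consecutive observations. It assumes a fixed step of ±1 per period and a zero reward when a value is unchanged, a case the claims do not address. It also uses one claimed rule per unit: the first state value for voltage control, the second for pulse control, and the third for axis drive control. The claims also recite other pairings. All field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachiningState:
    """State variables named in the claims (field names are hypothetical)."""
    unstable_count: int  # first state: unstable-pulse signals accumulated over the period
    pulse_count: int     # second state: pulses generated in the same period
    axis_feed: float     # third state: feed amount of the axis in the drive device


def lower_is_better(current: float, previous: float, step: float = 1.0) -> float:
    """+step if the value decreased since the previous period, -step if it increased."""
    if current < previous:
        return step
    if current > previous:
        return -step
    return 0.0  # unchanged: not specified by the claims; 0 is an assumption


def higher_is_better(current: float, previous: float, step: float = 1.0) -> float:
    """+step if the value increased since the previous period, -step if it decreased."""
    return lower_is_better(previous, current, step)


def compute_rewards(current: MachiningState, previous: MachiningState) -> dict:
    """One reward per control target, each using one of the claimed comparison rules."""
    return {
        # first reward calculation unit (voltage control): fewer unstable pulses is better
        "voltage": lower_is_better(current.unstable_count, previous.unstable_count),
        # second reward calculation unit (pulse control): more generated pulses is better
        "pulse": higher_is_better(current.pulse_count, previous.pulse_count),
        # third reward calculation unit (axis drive control): more axis feed is better
        "axis": higher_is_better(current.axis_feed, previous.axis_feed),
    }
```

For example, moving from `MachiningState(5, 100, 1.0)` to `MachiningState(3, 120, 1.5)` gives a reward of +1.0 for all three units. In the claimed device, each function update unit would then use its own reward to update the function for its control target.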