WO2019202672A1

WO2019202672A1 - Machine learning device, electric discharge machine, and machine learning method

Info

Publication number: WO2019202672A1
Application number: PCT/JP2018/015910
Authority: WO
Inventors: 慎吾千田
Original assignee: 三菱電機株式会社
Priority date: 2018-04-17
Filing date: 2018-04-17
Publication date: 2019-10-24
Also published as: CN111954582A; JPWO2019202672A1; CN111954582B; JP6663538B1

Abstract

A machine learning device (100) learns a control parameter for controlling a machining condition in an electric discharge machine (1). The machine learning device (100) is provided with a state observation unit (30) for observing a plurality of state variables representing a machining state during electric discharge machining, and a learning unit (40) for learning the control parameter on the basis of the plurality of state variables.

Description

Machine learning device, electric discharge machine, and machine learning method

The present invention relates to a machine learning device, an electric discharge machine, and a machine learning method for learning a control parameter for controlling electric discharge machining.

An electric discharge machine has an adaptive control function that, in order to perform stable machining, automatically changes machining conditions expressed as physical quantities, such as the power supply voltage waveform, the power supply current waveform, and the inter-electrode control operation, which is a servo operation. The machining conditions are determined by several to a dozen or so machining parameters that the user can change. Examples of machining parameters include a parameter that changes the magnitude of the voltage applied to machine the workpiece or the shape of the machining current pulse, a parameter that adjusts the relative distance between the workpiece and the machining electrode serving as the tool, and a parameter that changes the feed rate of the machining electrode.

Appropriate combinations of these machining parameters can be determined experimentally in advance as sets of values for typical machining shapes, workpiece materials, and electrode materials, and in some cases the user can select among them. However, the shapes machined by an electric discharge machine are complicated three-dimensional shapes, and because an electric discharge machine can machine any material that conducts electricity, the materials to be machined vary widely. The machining parameters therefore need to be optimized. For example, Patent Document 1 discloses automatically setting machining parameters using a machining state input by an operator.

Japanese Patent Laid-Open No. 2-212041

However, the automatic setting described in Patent Document 1 only adjusts some of the user-settable machining parameters based on a single type of machining state. In addition, a large number of control parameters lie behind the machining parameters and are needed to realize the machining conditions that are ultimately expressed as physical quantities, and these control parameters are not adjusted.

Therefore, in the adaptive control of the electric discharge machine, there has been a demand for adaptive control that can acquire more appropriate machining conditions as physical quantities.

The present invention has been made in view of the above, and an object of the present invention is to obtain a machine learning device that can automatically learn more appropriate machining conditions in electric discharge machining.

In order to solve the above-described problems and achieve the object, the machine learning device of the present invention learns control parameters for controlling machining conditions in an electric discharge machine. The machine learning device of the present invention includes a state observation unit that observes a plurality of state variables representing a machining state during electric discharge machining, and a learning unit that learns control parameters based on the plurality of state variables.

The machine learning device according to the present invention has an effect of being able to automatically learn more appropriate machining conditions in electric discharge machining.

FIG. 1 is a block diagram showing a configuration of an electric discharge machine according to a first embodiment of the present invention.
FIG. 2 is a diagram in which machining conditions according to the first embodiment are classified for control purposes.
FIG. 3 is a diagram for explaining control parameters of machining conditions related to voltage control according to the first embodiment.
FIG. 4 is a diagram showing the relationship between the machining conditions related to voltage control according to the first embodiment and the generation period of a current pulse.
FIG. 5 is a diagram for explaining control parameters of machining conditions related to pulse control according to the first embodiment.
FIG. 6 is a diagram showing the relationship between the machining conditions related to pulse control according to the first embodiment and the shape of a current pulse.
FIG. 7 is a diagram for explaining control parameters of machining conditions related to shaft drive control according to the first embodiment.
FIG. 8 is a diagram showing the relationship between the machining conditions related to shaft drive control according to the first embodiment and distance control by shaft drive.
FIG. 9 is a diagram for explaining states of voltage pulses and current pulses according to the first embodiment.
FIG. 10 is a diagram showing a case where the voltage pulses and current pulses according to the first embodiment are stable.
FIG. 11 is a diagram showing a case where the voltage pulses and current pulses according to the first embodiment are unstable.
FIG. 12 is a diagram showing an ideal distribution of average voltage values according to the first embodiment.
FIG. 13 is a diagram showing the distribution of average voltage values when stable discharge according to the first embodiment continues.
FIG. 14 is a diagram showing the distribution of average voltage values when unstable discharge according to the first embodiment continues.
FIG. 15 is a flowchart for explaining optimization processing by learning control parameters of machining conditions related to voltage control according to the first embodiment.
FIG. 16 is a flowchart for explaining optimization processing by learning control parameters of machining conditions related to pulse control according to the first embodiment.
FIG. 17 is a flowchart for explaining optimization processing by learning control parameters of machining conditions related to shaft drive control according to the first embodiment.
FIG. 18 is a block diagram showing a configuration of an electric discharge machine according to a second embodiment of the present invention.
FIG. 19 is a block diagram showing a configuration of an electric discharge machine according to a third embodiment of the present invention.
FIG. 20 is a diagram showing a hardware configuration in a case where the functions of the machine learning device according to the first to third embodiments are realized by a computer system.

Hereinafter, a machine learning device, an electric discharge machine, and a machine learning method according to an embodiment of the present invention will be described in detail with reference to the drawings. Note that the present invention is not limited to the embodiments.

Embodiment 1.
FIG. 1 is a block diagram showing a configuration of an electric discharge machine 1 according to a first embodiment of the present invention. The electric discharge machine 1 includes a machining electrode 2 serving as a machining tool, a driving device 4 for controlling the distance between the machining electrode 2 and a workpiece 3, a machining power source 5 for generating electric discharge between the machining electrode 2 and the workpiece 3, and a control device 10 for controlling the driving device 4 and the machining power source 5. The workpiece 3 is connected to the machining power source 5. The driving device 4 can drive either the machining electrode 2 or the workpiece 3, or both.

The control device 10 includes a shaft drive control unit 11 that controls the driving device 4, a machining power source control unit 12 that controls the machining power source 5, a machining condition setting unit 15 that sets machining condition setting values, a control parameter holding unit 13 that holds control parameters corresponding to the machining conditions, and an initial parameter setting unit 14 that sets initial values of the control parameters. A machining condition setting value is a setting value for specifying a machining condition.

A control parameter is a parameter that defines the relationship between a machining condition setting value and a machining condition; the machining condition, expressed as a specific physical quantity, is determined from the machining condition setting value and the control parameter. Therefore, the machining conditions of the machining pattern used when machining is performed by generating an electric discharge between the machining electrode 2 and the workpiece 3 are determined based on the machining condition setting values set by the machining condition setting unit 15 and the control parameters held by the control parameter holding unit 13. That is, the machining conditions expressed as physical quantities in the electric discharge machine 1 are controlled by the control parameters. The user can set the machining condition setting values, but cannot set or change the control parameters. The shaft drive control unit 11 and the machining power source control unit 12 issue commands corresponding to the machining pattern of the machining conditions based on the information given from the machining condition setting unit 15 and the control parameter holding unit 13. As will be described later, the control parameters are changed during operation, but the initial values first set in the control parameter holding unit 13 are set by the initial parameter setting unit 14. When a control parameter is expressed as a correspondence table between machining condition setting values and machining conditions, the initial value of the control parameter is the initial correspondence table.

The driving device 4 controls the relative distance and the relative speed between the machining electrode 2 and the workpiece 3 based on the command from the shaft drive control unit 11. The machining power source 5 applies a voltage between the machining electrode 2 and the workpiece 3 based on the command from the machining power source control unit 12 to control the current waveform during discharge.

The control device 10 further includes an input / output unit 20, a machine learning device 100, a parameter changing unit 50, and a learning result storage unit 80.

The input / output unit 20 is an input / output interface that accepts user input and supports the user's confirmation work by display. The input / output unit 20 includes a machining condition input unit 21 that receives a machining condition setting value that the user wants to set in the machining condition setting unit 15, and a display unit 22 that allows the user to perform a confirmation operation for observing the machining state.

The machine learning device 100 includes a state observation unit 30 and a learning unit 40. The state observation unit 30 includes a shaft drive recognition unit 31, a pulse state recognition unit 32, and a machining state observation unit 33. The learning unit 40 includes a reward calculation unit 47 and a function update unit 48, and learns the control parameters in order to optimize them.

The reward calculation unit 47 includes a first reward calculation unit 41 that calculates a voltage control reward, a second reward calculation unit 43 that calculates a pulse control reward, and a third reward calculation unit 45 that calculates a shaft drive control reward. The function update unit 48 includes a first function update unit 42 that updates a function related to voltage control, a second function update unit 44 that updates a function related to pulse control, and a third function update unit 46 that updates a function related to shaft drive control.

The parameter changing unit 50 includes a first control parameter changing unit 51 that changes a control parameter of a machining condition related to voltage control, a second control parameter changing unit 52 that changes a control parameter of a machining condition related to pulse control, and a third control parameter changing unit 53 that changes a control parameter of a machining condition related to shaft drive control. The parameter changing unit 50 changes the control parameters held by the control parameter holding unit 13 based on the result learned by the learning unit 40.

The learning result storage unit 80 stores a learning result by the machine learning device 100.

When the electric discharge machine 1 starts electric discharge machining, the shaft drive control unit 11 and the machining power source control unit 12 issue commands based on the machining condition setting values output from the machining condition setting unit 15, and the driving device 4 and the machining power source 5 operate accordingly. As a result of this operation, an electric discharge is generated between the machining electrode 2 and the workpiece 3.

While electric discharge machining is being performed, the driving device 4 searches for the optimum relative distance at which electric discharge occurs by decreasing or increasing the relative distance between the machining electrode 2 and the workpiece 3 in accordance with commands from the shaft drive control unit 11. Information about the position and the operation of the drive shaft at this time is acquired by the shaft drive recognition unit 31 and recorded in the machining state observation unit 33 as a shaft behavior history.

While electric discharge machining is being performed, and simultaneously with the operation of the driving device 4 described above, the machining power source 5 applies a voltage between the machining electrode 2 and the workpiece 3 in accordance with commands from the machining power source control unit 12, and current pulses having a current waveform of the commanded shape are generated. Based on the machining condition setting values from the machining condition setting unit 15, the machining power source control unit 12 controls the voltage of the machining power source 5 so as to generate current pulses of the commanded waveform at a constant period. However, because of the physical characteristics of electric discharge machining, current pulses cannot be reliably generated at a constant period. The current pulse shape may also differ from the current waveform given by the theoretical value. The pulse state recognition unit 32 acquires the current pulse generation period and the current pulse shape and, in addition, the magnitude of the applied voltage, the application period, and the voltage waveform indicating the voltage pulse shape. These are recorded in the machining state observation unit 33 as a pulse behavior history.

The machining state observation unit 33 obtains, from the pulse behavior history and the shaft behavior history, the voltage value distribution, the current pulse generation period, and the position, speed, and acceleration of the shaft at the time each current pulse is generated. The machining state observation unit 33 associates the information obtained from electric discharge machining performed under the control parameters currently in use with those control parameters, which are set in the control parameter holding unit 13, and supplies it to the learning unit 40.

Hereinafter, the processing conditions and the relationship between the processing conditions and the control parameters will be described in detail. FIG. 2 is a diagram in which machining conditions according to the first embodiment are classified for control purposes. FIG. 3 is a diagram for explaining control parameters of machining conditions related to voltage control according to the first embodiment. FIG. 4 is a diagram illustrating a relationship between a machining condition related to voltage control according to the first embodiment and a generation period of a current pulse. FIG. 5 is a diagram for explaining control parameters of machining conditions related to pulse control according to the first embodiment. FIG. 6 is a diagram illustrating a relationship between the processing conditions related to the pulse control according to the first embodiment and the shape of the current pulse. FIG. 7 is a diagram for explaining control parameters of machining conditions related to the shaft drive control according to the first embodiment. FIG. 8 is a diagram illustrating a relationship between the machining conditions related to the shaft drive control according to the first embodiment and the distance control by the shaft drive.

In FIG. 2, for each of the machining conditions (1) type of machining circuit, (2) circuit auxiliary setting, (3) current pulse peak value, (4) current pulse length, (5) pulse pause time, (6) inter-electrode gap adjustment value, (7) jump speed, (8) jump height, (9) deepest value duration, (10) axis response, and (11) target voltage value, a black circle is placed in the corresponding column when the condition falls under the machining conditions related to voltage control, the machining conditions related to pulse control, or the machining conditions related to shaft drive control. The machining conditions related to voltage control relate to the generation period of the current pulse, the machining conditions related to pulse control relate to the shape of the current pulse, and the machining conditions related to shaft drive control relate to the inter-electrode control.

FIGS. 3, 5, and 7 show the control parameters held by the control parameter holding unit 13 for the machining conditions whose setting values are set by the machining condition setting unit 15. Each machining condition shown in FIG. 2 relates to the generation period of the current pulse, the shape of the current pulse, or the inter-electrode control, and some conditions relate to more than one of these. Thus, changing a machining parameter in order to change one of the current pulse generation period, the current pulse shape, or the inter-electrode control may also affect the others.

Also, each machining condition is designated as a notch by a machining condition setting value, and a plurality of machining conditions are expressed as a notch pattern that is a combination of notches designated for each machining condition. A notch is a step that discretely specifies a physical quantity indicating a processing condition. Usually, several types or several tens of types of notch patterns are registered in the electric discharge machine 1 in advance. Each machining condition has a control parameter that cannot be changed by the user separately from the selection of the notch by the machining condition set value. As described above, a machining condition that is a specific physical quantity is determined based on a machining condition setting value indicating the selection of a notch and a control parameter. Specific examples of the control parameters are the number of notch divisions and the notch distribution value. The number of notch divisions is the number of notches that can be selected under the processing conditions. The notch distribution value is a value of a physical quantity of a machining condition assigned to each notch. However, the control parameters are not limited to these. When the control parameters for each of the 11 types of machining conditions shown in FIG. 2 are regarded as variables, the total number of variables ranges from tens to hundreds.

FIG. 3 explains the control parameters of the machining conditions related to voltage control. FIG. 4 outlines how the machining conditions related to voltage control, namely (4) current pulse length, (5) pulse pause time, (6) inter-electrode gap adjustment value, (9) deepest value duration, (10) axis response, and (11) target voltage value, relate to the current pulse generation period. (4) Current pulse length and (5) pulse pause time are the machining conditions indicated by the widths marked with arrows in FIG. 4. (6) Inter-electrode gap adjustment value, (9) deepest value duration, (10) axis response, and (11) target voltage value are machining conditions related to the current pulse generation period.

The control parameters of (4) current pulse length are the number of notch divisions and the notch distribution values, which serve as the length control parameters. Suppose that a certain number of notch divisions and certain notch distribution values are set as the control parameters for the current pulse length. In that case, the control parameters define a correspondence in which a current pulse length of 2 μsec corresponds to the notch specified by machining condition setting value 0, a current pulse length of 4 μsec corresponds to the notch specified by machining condition setting value 1, and a current pulse length of 8 μsec corresponds to the notch specified by machining condition setting value 2. When the control parameters of the current pulse length are changed, this correspondence changes, so the current pulse length for the same machining condition setting value changes. However, depending on the changed notch distribution values, the machining condition values do not necessarily change for every machining condition setting value.
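The correspondence described above can be pictured as a small lookup table. The following is a minimal sketch, assuming the control parameter is held simply as a list of notch distribution values; the names `NotchTable` and `machining_condition`, and the changed values, are hypothetical and are not taken from this description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NotchTable:
    """Hypothetical container for the control parameters of one machining condition."""

    distribution: List[float]  # notch distribution values: one physical quantity per notch

    @property
    def divisions(self) -> int:
        # The number of notch divisions is the number of selectable notches.
        return len(self.distribution)

    def machining_condition(self, setting_value: int) -> float:
        # Map a user-selected machining condition setting value to a physical quantity.
        if not 0 <= setting_value < self.divisions:
            raise ValueError("setting value is outside the available notches")
        return self.distribution[setting_value]


# Current pulse length in microseconds, using the example values from the text.
pulse_length = NotchTable([2.0, 4.0, 8.0])
before = pulse_length.machining_condition(1)

# Changing the control parameter changes the physical value obtained for the
# same setting value. The new distribution below is illustrative only.
pulse_length = NotchTable([2.0, 5.0, 8.0])
after = pulse_length.machining_condition(1)
```

In this sketch only the value for setting value 1 moves, which mirrors the remark that a change of control parameters need not alter the machining condition for every setting value.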

FIG. 5 illustrates the control parameters of the machining conditions related to pulse control. FIG. 6 outlines how the machining conditions related to pulse control, namely (1) type of machining circuit, (2) circuit auxiliary setting, (3) current pulse peak value, (4) current pulse length, (6) inter-electrode gap adjustment value, and (11) target voltage value, relate to the shape of the current pulse. When the circuit call parameter for (1) type of machining circuit is changed, the machining circuit is changed, so the shape of the current pulse changes. (2) Circuit auxiliary setting defines the rising slope of the current pulse. (3) Current pulse peak value defines the peak value of the current pulse. (4) Current pulse length defines the pulse length of the current pulse. (6) Inter-electrode gap adjustment value and (11) target voltage value are machining conditions related to the interval between current pulses.

As a specific example, the control parameters of (3) current pulse peak value are the number of notch divisions and the notch distribution values, which serve as the peak control parameters. Suppose that a certain number of notch divisions and certain notch distribution values are set as the control parameters for the current pulse peak value I_p. In that case, the control parameters define a correspondence in which I_p = 1 A corresponds to the notch specified by machining condition setting value 0, I_p = 2 A corresponds to the notch specified by machining condition setting value 1, and I_p = 4 A corresponds to the notch specified by machining condition setting value 2. When the control parameters of the current pulse peak value I_p are changed, this correspondence changes, so the value of I_p for the same machining condition setting value changes. However, depending on the changed notch distribution values, the machining condition values do not necessarily change for every machining condition setting value.

FIG. 7 illustrates the control parameters of the machining conditions related to shaft drive control. FIG. 8 outlines how the machining conditions related to shaft drive control, namely (6) inter-electrode gap adjustment value, (7) jump speed, (8) jump height, (9) deepest value duration, (10) axis response, and (11) target voltage value, relate to the shaft drive control. (6) Inter-electrode gap adjustment value, (9) deepest value duration, (10) axis response, and (11) target voltage value are machining conditions related to the approach operation between the machining electrode 2 and the workpiece 3. (7) Jump speed, (8) jump height, and (10) axis response are machining conditions related to the retreat operation of the machining electrode 2 from the workpiece 3, including the jump operation of the drive shaft.

Next, the stability and instability of voltage pulses and current pulses in electric discharge machining will be described. FIG. 9 is a diagram for explaining states of voltage pulses and current pulses according to the first embodiment. FIG. 10 is a diagram illustrating a case where the voltage pulses and current pulses according to the first embodiment are stable. FIG. 11 is a diagram illustrating a case where the voltage pulses and current pulses according to the first embodiment are unstable. In FIGS. 9 to 11, the upper part shows the voltage waveform, and the lower part shows the current waveform.

When a voltage is applied between the machining electrode 2 and the workpiece 3, dielectric breakdown occurs at an unpredictable timing and a current flows. When the ideal voltage and current relationship that enables stable machining holds, a current pulse close to a rectangular wave, with a certain slope formed by a transistor circuit or the like, is generated. This current pulse is shown as a stable discharge in FIG. 9. If this ideal voltage and current relationship does not hold, the current waveform may deviate from the ideal, as in the unstable discharge in FIG. 9, or an irregularly shaped current waveform that is not effective for machining may be generated, as in the abnormal discharge in FIG. 9.

In the control of electric discharge machining, the average voltage value per fixed time at the time of discharge is observed and used as an index for controlling the relative distance between the electrodes. When the ideal voltage and current relationship is maintained, the average voltage value stays at the theoretical value as shown in FIG. 10, and stable discharge continues. However, when the unstable or abnormal discharge of FIG. 9 is repeated, the average voltage value fluctuates about the theoretical value as shown in FIG. 11, and unstable discharge continues. When the inter-electrode distance becomes zero and the machining electrode 2 and the workpiece 3 come into contact, a short-circuit state results, and when the machining electrode 2 and the workpiece 3 are too far apart for discharge to occur, an open state results. Therefore, fluctuation of the average voltage value relative to the theoretical value does not by itself determine whether the discharge pulses are stable or unstable. Further, even when the stable current pulse pattern shown in FIG. 10 continues to be generated under ideal conditions, there is an unpredictable time interval, called the no-load voltage time, before dielectric breakdown occurs, so the generation period is not constant. Therefore, an increase or decrease in the discharge generation period is an index that is independent of machining stability.

FIG. 12 is a diagram showing an ideal average voltage value distribution according to the first embodiment. FIG. 13 is a diagram illustrating a distribution of average voltage values when the stable discharge according to the first embodiment continues. FIG. 14 is a diagram illustrating a distribution of average voltage values when the unstable discharge according to the first embodiment continues. In FIGS. 12 to 14, the horizontal axis indicates the average voltage value per fixed time when the discharge occurs, and the vertical axis indicates the number of pulses per fixed time.

When the ideal voltage and current relationship is maintained, as shown in FIG. 12, the average voltage value takes the theoretical value, with the number of pulses determined by theory. In actual machining, because of physical phenomena, the average voltage values are distributed around the target voltage value, which is the machining condition specifying the voltage to be aimed at, and the number of pulses is distributed accordingly. The target voltage value need not be the theoretical value. When machining is stable and stable discharge continues as shown in FIG. 10, the variation in the average voltage value is small as shown in FIG. 13, and the number of pulses is largest where the average voltage value equals the target voltage value. In contrast, when machining is unstable and unstable discharge continues as shown in FIG. 11, the average voltage values are widely dispersed around the target voltage value as shown in FIG. 14, and the number of pulses also varies.
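The contrast between FIG. 13 and FIG. 14 can be illustrated with a small histogram calculation. This is a sketch only: the bin width, the sample voltages, and the use of a root-mean-square deviation from the target voltage as a measure of spread are assumptions made for illustration and are not criteria stated in this description.

```python
from collections import Counter
from typing import Dict, List


def voltage_histogram(avg_voltages: List[float], bin_width: float) -> Dict[float, int]:
    """Count pulses per average-voltage bin; each bin is keyed by its lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in avg_voltages)
    return dict(sorted(counts.items()))


def spread_about_target(avg_voltages: List[float], target: float) -> float:
    """Root-mean-square deviation of the average voltage values from the target."""
    return (sum((v - target) ** 2 for v in avg_voltages) / len(avg_voltages)) ** 0.5


TARGET = 80.0  # hypothetical target voltage value in volts

# Hypothetical average voltage values per fixed time, one per observed discharge.
stable = [79.0, 80.0, 80.0, 81.0, 80.0]
unstable = [60.0, 95.0, 80.0, 70.0, 100.0]

stable_hist = voltage_histogram(stable, bin_width=5.0)
unstable_hist = voltage_histogram(unstable, bin_width=5.0)
stable_spread = spread_about_target(stable, TARGET)
unstable_spread = spread_about_target(unstable, TARGET)
```

With these sample values the stable case concentrates in the bin containing the target voltage, while the unstable case is scattered over several bins and has a much larger spread.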

The pulse state recognition unit 32 evaluates the voltage distribution at the time of discharge occurrence over a certain period and determines whether the pulses are stable or unstable. As an example, the pulse state recognition unit 32 determines whether the voltage pulses and current pulses are stable or unstable based on the relationship among the average voltage value, the target voltage value, and a voltage threshold, which are obtained from the machining power source control unit 12. Specifically, when the absolute value of the deviation of the average voltage value per fixed time at the time of discharge occurrence from the target voltage value, which is based on the command of the machining power source control unit 12, is larger than the voltage threshold, the pulse state recognition unit 32 generates a pulse unstable signal, and obtains, as the value of the first state, the accumulated number of occurrences of the unstable signal during a predetermined period longer than the fixed time. Furthermore, the pulse state recognition unit 32 obtains, as the value of the second state, the number of pulses generated in the predetermined period, obtained from the command of the machining power source control unit 12. The predetermined period can be, for example, the operation time from the completion of a retreat operation called a jump operation, through the inter-electrode position control for generating discharge, until the next jump operation is performed. The shaft drive recognition unit 31 obtains, as the value of the third state, the shaft feed amount of the driving device 4 obtained from the command of the shaft drive control unit 11. The value of the third state is set so that it becomes more positive as the shaft feed amount in the machining progress direction increases and more negative as the shaft feed amount in the backward direction increases.

The value of the first state, the value of the second state, and the value of the third state are state variables representing the machining state during electric discharge machining. The machining state observation unit 33 displays the acquired values of the first, second, and third states on the display unit 22 in the form of a distribution chart or a histogram using a bar graph, so that the user can observe them visually. In this way, the state observation unit 30 observes the values of the first state, the second state, and the third state, which are a plurality of state variables. The first reward calculation unit 41, the second reward calculation unit 43, and the third reward calculation unit 45 of the learning unit 40 calculate rewards based on the values of the first state, the second state, and the third state acquired by the machining state observation unit 33.
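The three state variables can be summarized in code. The following is a minimal sketch under stated assumptions: the function name `observe_states`, its argument names, and the representation of the third state as forward feed minus backward feed are hypothetical choices made for illustration, not the method of this description.

```python
from typing import List, Tuple


def observe_states(
    avg_voltages: List[float],  # average voltage per fixed time, one value per window
    target_voltage: float,
    voltage_threshold: float,
    pulse_count: int,           # pulses generated in the predetermined period
    feed_forward: float,        # shaft feed in the machining progress direction
    feed_backward: float,       # shaft feed in the backward direction
) -> Tuple[int, int, float]:
    """Return (first state, second state, third state) for one predetermined period."""
    # First state: accumulated number of unstable signals, where a signal is
    # raised whenever |average voltage - target voltage| exceeds the threshold.
    first = sum(1 for v in avg_voltages if abs(v - target_voltage) > voltage_threshold)
    # Second state: number of pulses generated in the predetermined period.
    second = pulse_count
    # Third state: signed shaft feed amount, positive in the machining progress
    # direction and negative in the backward direction (assumed to be a net value).
    third = feed_forward - feed_backward
    return first, second, third


# Hypothetical observation for one period between two jump operations.
states = observe_states(
    avg_voltages=[80.0, 95.0, 60.0, 82.0],
    target_voltage=80.0,
    voltage_threshold=10.0,
    pulse_count=120,
    feed_forward=0.5,
    feed_backward=0.25,
)
```

Here two of the four windows deviate from the target by more than the threshold, so the first state is 2, the second state is the pulse count, and the third state is positive because the shaft advanced more than it retreated.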

Any learning algorithm may be used by the machine learning device 100, which includes the state observation unit 30, the learning unit 40, and the parameter changing unit 50. As an example, a case where reinforcement learning is applied will be described.

In reinforcement learning, an agent, that is, an acting entity in a certain environment, observes the current state and decides which action to take. The agent receives a reward from the environment by selecting an action, and learns a policy that obtains the most reward through a series of actions. Q-learning and TD-learning are known as typical methods of reinforcement learning. For example, in the case of Q-learning, the general update formula of the action value function Q(s, a) is represented by the following Equation (1). The action value function Q(s, a) is also called an action value table.

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right) \tag{1}$$

In Equation (1), s_t represents the state at time t, and a_t represents the action at time t. The action a_t changes the state to s_{t+1}. r_{t+1} represents the reward obtained by that change in state, γ represents the discount rate, and α represents the learning coefficient.

The update formula represented by Equation (1) in Q-learning increases the action value Q at time t if the action value of the best action a at time t + 1 is greater than the action value Q of the action a_t executed at time t, and decreases the action value Q at time t in the opposite case. In other words, the action value function Q(s_t, a_t) is updated so that the action value Q of the action a_t at time t approaches the best action value at time t + 1. As a result, the best action value in a certain environment is propagated successively to the action values in the preceding environments.

Accordingly, in the operation of the machine learning device 100 described below, if a change of a control parameter is regarded as the action a_t at time t, and the first, second, and third states are regarded as the state s_t at time t, it can be understood that Q-learning is being performed.
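
The following minimal Python sketch shows one step of the standard tabular Q-learning update, in which the action value of the executed action is moved toward the reward plus the discounted best action value of the next state. The encoding of a state as a tuple of the three state values and of an action as a step applied to one control parameter is an assumption made for illustration only, and the names `q_update`, `q_table`, and `actions` are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, actions: Sequence[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one tabular Q-learning update and return the new Q(s_t, a_t)."""
    # Best action value available from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Difference between the bootstrapped target and the current estimate.
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# Assumed encoding: state = (unstable count, pulse count, feed direction),
# action = step applied to the control parameter currently being tuned.
q_table: QTable = defaultdict(float)
actions = (-1, 0, +1)
new_q = q_update(q_table, (3, 1200, 1), +1, 1.0, (2, 1250, 1), actions)
```

Unvisited state and action pairs default to an action value of 0 here; any other fixed initial value would work equally well for the sketch.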

Hereinafter, the operation of optimizing the control parameters by the machine learning device 100 will be described.

FIG. 15 is a flowchart for explaining the optimization process by learning of the control parameters of the machining conditions related to voltage control according to the first embodiment. The control parameters of the machining conditions related to voltage control are variable values for performing voltage control with reference to the target voltage value set as a machining condition. They make it possible to change not only the magnitude of the voltage but also the shape of the voltage waveform. A voltage reference value for detecting discharge, called a reference voltage, is also included. In addition, optimizing the control parameters of the machining conditions related to voltage control also involves processing such as changing the initial notch pattern of the no-load voltage time and the target voltage value set as control parameters to another notch pattern.

The first reward calculation unit 41, which calculates the reward related to voltage control, calculates the amount of change in the reward based on the value of the first state and the value of the second state, which are the state variables obtained by the pulse state recognition unit 32. As long as the first reward calculation unit 41 calculates the amount of change so that the reward increases when the value of the first state is small and the value of the second state is large, there is no restriction on how the value of the first state and the value of the second state are used to determine the amount of change in the reward. Specifically, the reward is increased when the value of the first state becomes smaller, and the reward is decreased when the value of the first state becomes larger. In addition, the reward is increased when the value of the second state becomes larger, and the reward is decreased when the value of the second state becomes smaller. Beyond the basic criterion of increasing the reward when the number of unstable pulses decreases and the number of stable pulses increases, the reward calculation method may be defined so that the reward decreases when the number of stable pulses decreases even though the number of unstable pulses also decreases.

Based on the reward calculated by the first reward calculation unit 41, the first function update unit 42 updates the action value function Q, which is a function for determining the control parameters related to voltage control. Based on the updated action value function Q, the first control parameter changing unit 51 changes the control parameters of the machining conditions related to voltage control so as to obtain the control parameters that yield the most reward.

Based on the above, the optimization of the six types of control parameters of the machining conditions related to voltage control shown in FIG. 3 will be described with reference to FIG. 15. The flow of FIG. 15 is executed while the electric discharge machine 1 is continuously performing electric discharge machining. The description assumes that a priority order is set for the control parameters to be changed, but the six types of control parameters may instead be optimized simultaneously.

Assume that the first reward calculation unit 41 already holds the initial value of the reward for voltage control before the flowchart of FIG. 15 is executed. The initial value of the reward may be any fixed value, and may be 0. First, the state observation unit 30 observes the information of the machining power supply control unit 12 while machining is being performed with the current machining conditions and control parameters (step S101). Specifically, the state observation unit 30 acquires the command of the machining power supply control unit 12 during machining. Then, based on the command of the machining power supply control unit 12, the pulse state recognition unit 32 calculates the value of the first state and the value of the second state (step S102). Next, the value of the first state and the value of the second state, which are the state variables obtained by the pulse state recognition unit 32, are given from the machining state observation unit 33 to the first reward calculation unit 41. Here, the value of the first state and the value of the second state are given from the machining state observation unit 33 to the first reward calculation unit 41 in association with the currently used control parameters set in the control parameter holding unit 13.

Then, the first reward calculation unit 41 compares the given value of the first state with the previous value of the first state (step S103). The first reward calculation unit 41 holds the value of the first state given last time, and can compare it with the value of the first state given this time. When the value of the first state is smaller than the previous value of the first state (step S103: small), the first reward calculation unit 41 increases the reward (step S104). That is, if the value of the first state indicates a more stable state than the previous time, the reward is increased. The amount of increase of the reward here is a predetermined value. When the value of the first state is the same as the previous value of the first state (step S103: same), the first reward calculation unit 41 does not change the reward (step S105). When the value of the first state is larger than the previous value of the first state (step S103: large), the first reward calculation unit 41 decreases the reward (step S106). That is, if the value of the first state indicates a more unstable state than the previous time, the reward is decreased. The amount of decrease of the reward here is a predetermined value. Note that when step S103 is executed for the first time, no previously given value of the first state exists, so the process proceeds to step S105.

Next, the first reward calculating unit 41 compares the given second state value with the previous second state value (step S107). The first reward calculation unit 41 holds the value of the second state given last time, and can compare it with the value of the second state given this time. When the value of the second state is larger than the value of the second state of the previous time (step S107: large), the first reward calculation unit 41 increases the reward (step S108). That is, when the value of the second state indicates a more stable state than the previous time, the reward is increased. The increase value of the reward here is a predetermined value. When the value of the second state is the same as the value of the second state of the previous time (step S107: the same), the first reward calculation unit 41 does not change the reward (step S109). When the value of the second state is smaller than the previous value of the second state (step S107: small), the first reward calculation unit 41 reduces the reward (step S110). That is, if the value of the second state indicates a more unstable state than the previous time, the reward is reduced. Here, the decrease value of the reward is a predetermined value. When step S107 is executed for the first time, the value of the second state given last time does not exist, so the process proceeds to step S109.
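
The comparisons in steps S103 to S110 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the predetermined increase and decrease are both taken to be the same value `step`, and the names `reward_step` and `voltage_reward` are hypothetical. A missing previous value, as on the first pass, leaves the reward unchanged.

```python
from typing import Optional


def reward_step(current: float, previous: Optional[float],
                higher_is_better: bool, step: float = 1.0) -> float:
    """Return the reward change contributed by one state variable.

    With no previous value (first pass) or an unchanged value, the reward
    stays as it is, mirroring steps S105 and S109.
    """
    if previous is None or current == previous:
        return 0.0
    improved = (current > previous) == higher_is_better
    return step if improved else -step


def voltage_reward(reward: float,
                   first: float, prev_first: Optional[float],
                   second: float, prev_second: Optional[float]) -> float:
    """Update the running reward from the first and second state values."""
    # Steps S103 to S106: fewer unstable signals is better.
    reward += reward_step(first, prev_first, higher_is_better=False)
    # Steps S107 to S110: more generated pulses is better.
    reward += reward_step(second, prev_second, higher_is_better=True)
    return reward
```

For example, `voltage_reward(0.0, 3, 5, 120, 100)` raises the reward twice, once for the drop in unstable signals and once for the rise in pulse count.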

Then, the first function update unit 42 updates the action value function Q according to Equation (1) based on the reward calculated by the first reward calculation unit 41 (step S111). Further, the first function update unit 42 determines whether the action value function Q is no longer updated in step S111, that is, whether it has converged (step S112). When it is determined that the action value function Q has not converged (step S112: No), the first control parameter changing unit 51 changes the control parameter of the machining conditions related to voltage control based on the action value function Q updated in step S111 (step S113). After step S113, the process returns to step S101. When it is determined that the action value function Q has converged (step S112: Yes), the learning unit 40 determines whether all the control parameters of the machining conditions related to voltage control have been changed by the first control parameter changing unit 51 (step S114). If it is determined that not all the control parameters of the machining conditions related to voltage control have been changed (step S114: No), the control parameter to be changed by the first control parameter changing unit 51 in step S113 is switched to another control parameter (step S115). The control parameter newly targeted in step S115 is a control parameter related to voltage control that has not yet been changed. After step S115, the process proceeds to step S113.

The change of the control parameters of the machining conditions related to voltage control in step S113 will be described in detail below. As described above, a priority order for changing is determined for the six types of control parameters of the machining conditions related to voltage control shown in FIG. 3 that are changed in step S113. When step S113 is first entered, the first control parameter changing unit 51 changes the voltage control parameter, which is the control parameter for the target voltage value. Thereafter, each time it is determined in step S112 that the action value function Q has converged, the control parameter to be changed by the first control parameter changing unit 51 is switched in step S115 in the following order: the GAIN control parameter, which is the control parameter for the axis response; the length control parameter, which is the control parameter for the pulse pause time; the gap control parameter, which is the control parameter for the inter-electrode gap adjustment value; the length control parameter, which is the control parameter for the deepest-value duration; and the length control parameter, which is the control parameter for the current pulse length.

When the learning unit 40 determines that the action value function Q has converged and all the control parameters of the machining conditions related to voltage control have been changed (step S114: Yes), the optimization process by learning of the control parameters of the machining conditions related to voltage control ends, and the learning result is stored in the learning result storage unit 80 (step S116). The learning result includes, in addition to the control parameters changed and finally determined in step S113, the values that each control parameter took in the course of being changed and the corresponding values of the first state and the second state. The learning result stored in the learning result storage unit 80 can be used for pass/fail judgment before and after a change of a control parameter. Further, the control parameters finally determined as described above are those that obtained the most reward in the learning, and they are held in the control parameter holding unit 13 as the optimal control parameters for the given machining condition setting values. By optimizing the control parameters of the machining conditions related to voltage control through learning, it becomes possible to prevent unstable signals from occurring between the start and the end of machining and to maximize the number of stable signal pulses. As described above, when the six types of control parameters of the machining conditions related to voltage control are optimized simultaneously, in step S113 the first control parameter changing unit 51 changes the six types of control parameters simultaneously based on the action value function Q updated in step S111. In this case, steps S114 and S115 are unnecessary, and if it is determined in step S112 that the action value function Q has converged (step S112: Yes), the process may proceed immediately to step S116.
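
The control flow of FIG. 15, in which the parameters are tuned one at a time in priority order, might be sketched as follows. This is only an outline under assumptions: the observation, reward calculation, Q update, and parameter change of steps S101 to S113 are collapsed into a caller-supplied callback `run_pass`, the parameter names in `PRIORITY` are hypothetical labels for the six parameters, and `max_iterations` is an added safety limit that the flowchart does not contain.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical labels, listed in the priority order described above.
PRIORITY = ["target_voltage", "gain", "pulse_pause", "gap_adjust",
            "deepest_duration", "pulse_length"]


def optimize_in_priority_order(
    params: Dict[str, float],
    order: Sequence[str],
    run_pass: Callable[[Dict[str, float], str], bool],
    max_iterations: int = 1000,
) -> List[Dict[str, float]]:
    """Tune parameters one at a time and return the change history.

    run_pass(params, name) performs one pass of observation, reward
    calculation, and Q update for the parameter `name`. If Q has not
    converged it changes params[name] in place and returns False;
    otherwise it returns True (step S112: Yes).
    """
    history = [dict(params)]
    for name in order:                     # S114/S115: next unchanged parameter
        for _ in range(max_iterations):    # added guard against non-convergence
            converged = run_pass(params, name)
            history.append(dict(params))
            if converged:
                break
    return history                         # S116: the caller stores the result
```

The returned history corresponds to the values each parameter took in the course of being changed, which the learning result is said to retain alongside the final values.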

FIG. 16 is a flowchart for explaining the optimization process by learning of the control parameters of the machining conditions related to pulse control according to the first embodiment. The control parameters of the machining conditions related to pulse control are variable values for performing current pulse control, such as the pulse inclination and the abnormal discharge detection threshold that serves as the basis of the theoretical value of the pulse generation period. They include not only the magnitude and width of the current pulse but also the current waveform shape for bringing the current value close to the ideal shape, and the inter-electrode gap adjustment value for adjusting the relative distance between the machining electrode 2 and the workpiece 3. In addition, optimizing the control parameters of the machining conditions related to pulse control also involves processing such as changing the initial notch pattern of the current magnitude and width set as control parameters to another notch pattern.

The second reward calculation unit 43, which calculates the reward related to pulse control, calculates the reward based on the value of the first state and the value of the second state, which are the state variables obtained by the pulse state recognition unit 32. The reward calculation method of the second reward calculation unit 43 is the same as that of the first reward calculation unit 41.

Based on the reward calculated by the second reward calculation unit 43, the second function update unit 44 updates the action value function Q, which is a function for determining the control parameters related to pulse control. Based on the updated action value function Q, the second control parameter changing unit 52 changes the control parameters of the machining conditions related to pulse control so as to obtain the control parameters that yield the most reward.

Based on the above, the optimization of the six types of control parameters of the machining conditions related to pulse control shown in FIG. 5 will be described with reference to FIG. 16. The flow of FIG. 16 is executed while the electric discharge machine 1 is continuously performing electric discharge machining. The description assumes that a priority order is set for the control parameters to be changed, but the six types of control parameters may instead be optimized simultaneously.

Assume that the second reward calculation unit 43 already holds the initial value of the reward for pulse control before the flowchart of FIG. 16 is executed. The initial value of the reward may be any fixed value, and may be 0. First, the state observation unit 30 observes the information of the machining power supply control unit 12 while machining is being performed with the current machining conditions and control parameters (step S201). Specifically, the state observation unit 30 acquires the command of the machining power supply control unit 12 during machining. Then, based on the command of the machining power supply control unit 12, the pulse state recognition unit 32 calculates the value of the first state and the value of the second state (step S202). Next, the value of the first state and the value of the second state, which are the state variables obtained by the pulse state recognition unit 32, are given from the machining state observation unit 33 to the second reward calculation unit 43. Here, the value of the first state and the value of the second state are given from the machining state observation unit 33 to the second reward calculation unit 43 in association with the currently used control parameters set in the control parameter holding unit 13.

Then, the second reward calculation unit 43 compares the given value of the first state with the previous value of the first state (step S203). The second reward calculation unit 43 holds the value of the first state given last time, and can compare it with the value of the first state given this time. When the value of the first state is smaller than the previous value of the first state (step S203: small), the second reward calculation unit 43 increases the reward (step S204). The amount of increase of the reward here is a predetermined value. When the value of the first state is the same as the previous value of the first state (step S203: same), the second reward calculation unit 43 does not change the reward (step S205). When the value of the first state is larger than the previous value of the first state (step S203: large), the second reward calculation unit 43 decreases the reward (step S206). The amount of decrease of the reward here is a predetermined value. When step S203 is executed for the first time, no previously given value of the first state exists, so the process proceeds to step S205.

Next, the second reward calculation unit 43 compares the given second state value with the previous second state value (step S207). The second reward calculation unit 43 holds the value of the second state given last time and can compare it with the value of the second state given this time. When the value of the second state is larger than the value of the second state of the previous time (step S207: large), the second reward calculation unit 43 increases the reward (step S208). The increase value of the reward here is a predetermined value. When the value of the second state is the same as the value of the second state of the previous time (step S207: the same), the second reward calculation unit 43 does not change the reward (step S209). When the value of the second state is smaller than the previous value of the second state (step S207: small), the second reward calculation unit 43 reduces the reward (step S210). Here, the decrease value of the reward is a predetermined value. When step S207 is executed for the first time, the value of the second state given last time does not exist, so the process proceeds to step S209.

Then, the second function update unit 44 updates the action value function Q according to Equation (1) based on the reward calculated by the second reward calculation unit 43 (step S211). Further, the second function update unit 44 determines whether the action value function Q is no longer updated in step S211, that is, whether it has converged (step S212). When it is determined that the action value function Q has not converged (step S212: No), the second control parameter changing unit 52 changes the control parameter of the machining conditions related to pulse control based on the action value function Q updated in step S211 (step S213). After step S213, the process returns to step S201. When it is determined that the action value function Q has converged (step S212: Yes), the learning unit 40 determines whether all the control parameters of the machining conditions related to pulse control have been changed by the second control parameter changing unit 52 (step S214). When it is determined that not all the control parameters of the machining conditions related to pulse control have been changed (step S214: No), the control parameter to be changed by the second control parameter changing unit 52 in step S213 is switched to another control parameter (step S215). The control parameter newly targeted in step S215 is a control parameter related to pulse control that has not yet been changed. After step S215, the process proceeds to step S213.

The change of the control parameters of the machining conditions related to pulse control in step S213 will be described in detail below. As described above, a priority order for changing is determined for the six types of control parameters of the machining conditions related to pulse control shown in FIG. 5 that are changed in step S213. When step S213 is first entered, the second control parameter changing unit 52 changes the voltage control parameter, which is the control parameter for the target voltage value. Thereafter, each time it is determined in step S212 that the action value function Q has converged, the control parameter to be changed by the second control parameter changing unit 52 is switched in step S215 in the following order: the gap control parameter, which is the control parameter for the inter-electrode gap adjustment value; the pulse tilt control parameter, which is the control parameter for the circuit auxiliary setting; the length control parameter, which is the control parameter for the current pulse length; the peak control parameter, which is the control parameter for the current pulse peak value; and the circuit call parameter, which is the control parameter for the machining circuit type.

When the learning unit 40 determines that the action value function Q has converged and all the control parameters of the machining conditions related to pulse control have been changed (step S214: Yes), the optimization process by learning of the control parameters of the machining conditions related to pulse control ends, and the learning result is stored in the learning result storage unit 80 (step S216). The learning result includes, in addition to the control parameters changed and finally determined in step S213, the values that each control parameter took in the course of being changed and the corresponding values of the first state and the second state. The learning result stored in the learning result storage unit 80 can be used for pass/fail judgment before and after a change of a control parameter. Further, the control parameters finally determined as described above are those that obtained the most reward in the learning, and they are held in the control parameter holding unit 13 as the optimal control parameters for the given machining condition setting values. By optimizing the control parameters of the machining conditions related to pulse control through learning, it becomes possible to prevent unstable signals from occurring between the start and the end of machining and to maximize the number of stable signal pulses. As described above, when the six types of control parameters of the machining conditions related to pulse control are optimized simultaneously, in step S213 the second control parameter changing unit 52 changes the six types of control parameters simultaneously based on the action value function Q updated in step S211. In this case, steps S214 and S215 are unnecessary, and if it is determined in step S212 that the action value function Q has converged (step S212: Yes), the process may proceed immediately to step S216.

FIG. 17 is a flowchart for explaining the optimization process by learning of the control parameters of the machining conditions related to axis drive control according to the first embodiment. The control parameters of the machining conditions related to axis drive control are also called inter-electrode control parameters. They are variable values for changing the shaft drive behavior of the electric discharge machine 1, such as the deceleration distance used when the machining electrode 2 and the workpiece 3 are brought close to each other, and the speed and acceleration parameters that produce the instantaneous retreat action called a jump operation. Changes to the inter-electrode control parameters include not only changes to the shaft response for generating stable discharge between the electrodes, but also changes to the jump operation for cleaning the gap and parameter changes for preventing natural-frequency vibration caused by over-response of the shaft.

The third reward calculation unit 45, which calculates the reward related to axis drive control, calculates the reward based on the value of the second state, which is the state variable obtained by the pulse state recognition unit 32, and the value of the third state, which is the state variable obtained by the shaft drive recognition unit 31. As long as the third reward calculation unit 45 calculates the amount of change so that the reward increases when the value of the second state is large and the value of the third state increases, there is no restriction on how the value of the second state and the value of the third state are used to determine the amount of change in the reward. Specifically, the reward is increased when the value of the second state becomes larger, and the reward is decreased when the value of the second state becomes smaller. In addition, the reward is increased when the value of the third state becomes larger, and the reward is decreased when the value of the third state becomes smaller. The reward increases if the number of discharge pulses increases even when the feed amount of the shaft does not change. However, the reward calculation method may be defined so that the reward decreases if the number of discharge pulses decreases even when the feed amount of the shaft increases in the machining progress direction.
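
One possible way to combine the basic rule with the optional rule for axis drive control is sketched below in Python. It assumes a single step size for increases and decreases and assumes that previous values already exist, so the first-pass case is omitted. The name `axis_reward_delta` and the decision to let a drop in pulse count override a larger forward feed with a single negative step are illustrative choices, not details given in the embodiment.

```python
def axis_reward_delta(pulses: int, prev_pulses: int,
                      feed: float, prev_feed: float,
                      step: float = 1.0) -> float:
    """Return the reward change from the second and third state values."""
    def sign(x: float) -> int:
        return (x > 0) - (x < 0)

    d_pulses = sign(pulses - prev_pulses)   # change in pulse count
    d_feed = sign(feed - prev_feed)         # change in signed feed amount

    # Optional rule: fewer pulses outweighs a larger forward feed.
    if d_pulses < 0 and d_feed > 0:
        return -step
    # Basic rule: each improving variable adds a step, each worsening
    # variable subtracts one, and an unchanged variable contributes nothing.
    return step * (d_pulses + d_feed)
```

Under the basic rule alone, fewer pulses combined with a larger feed would cancel to zero; the optional branch turns that case into a net decrease.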

Based on the reward calculated by the third reward calculation unit 45, the third function update unit 46 updates the action value function Q, which is a function for determining the control parameters related to axis drive control. Based on the updated action value function Q, the third control parameter changing unit 53 changes the control parameters of the machining conditions related to axis drive control so as to obtain the control parameters that yield the most reward.

Based on the above, the optimization of the five types of control parameters of the machining conditions related to axis drive control shown in FIG. 7 will be described with reference to FIG. 17. The flow of FIG. 17 is executed while the electric discharge machine 1 is continuously performing electric discharge machining. The description assumes that a priority order is set for the control parameters to be changed, but the five types of control parameters may instead be optimized simultaneously.

Assume that the third reward calculation unit 45 already holds the initial value of the reward for axis drive control before the flowchart of FIG. 17 is executed. The initial value of the reward may be any fixed value, and may be 0. First, the state observation unit 30 observes the information of the machining power supply control unit 12 while machining is being performed with the current machining conditions and control parameters (step S301). Specifically, the state observation unit 30 acquires the command of the machining power supply control unit 12 during machining. Then, based on the command of the machining power supply control unit 12, the pulse state recognition unit 32 calculates the value of the second state (step S303). Further, the state observation unit 30 observes the information of the shaft drive control unit 11 while machining is being performed with the current machining conditions and control parameters (step S302). Specifically, the state observation unit 30 acquires the command of the shaft drive control unit 11 during machining. Then, based on the command of the shaft drive control unit 11, the shaft drive recognition unit 31 calculates the value of the third state (step S304). Next, the value of the second state obtained by the pulse state recognition unit 32 and the value of the third state obtained by the shaft drive recognition unit 31 are given from the machining state observation unit 33 to the third reward calculation unit 45. Here, the value of the second state and the value of the third state are given from the machining state observation unit 33 to the third reward calculation unit 45 in association with the currently used control parameters set in the control parameter holding unit 13.

Then, the third reward calculation unit 45 compares the given value of the second state with the previous value of the second state (step S305). The third reward calculation unit 45 holds the value of the second state given last time, and can compare it with the value of the second state given this time. When the value of the second state is larger than the previous value of the second state (step S305: large), the third reward calculation unit 45 increases the reward (step S306). The amount of increase of the reward here is a predetermined value. When the value of the second state is the same as the previous value of the second state (step S305: same), the third reward calculation unit 45 does not change the reward (step S307). When the value of the second state is smaller than the previous value of the second state (step S305: small), the third reward calculation unit 45 decreases the reward (step S308). The amount of decrease of the reward here is a predetermined value. When step S305 is executed for the first time, no previously given value of the second state exists, so the process proceeds to step S307.

Next, the third reward calculation unit 45 compares the given third state value with the previous third state value (step S309). The third reward calculation unit 45 holds the value of the third state given last time, and can compare it with the value of the third state given this time. If the value of the third state is larger than the value of the previous third state (step S309: large), the third reward calculation unit 45 increases the reward (step S310). That is, when the value of the third state indicates a more stable state than the previous time, the reward is increased. The increase value of the reward here is a predetermined value. When the value of the third state is the same as the value of the previous third state (step S309: the same), the third reward calculation unit 45 does not change the reward (step S311). When the value of the third state is smaller than the value of the previous third state (step S309: small), the third reward calculation unit 45 reduces the reward (step S312). That is, when the value of the third state indicates a more unstable state than the previous time, the reward is reduced. Here, the decrease value of the reward is a predetermined value. When step S309 is executed for the first time, the value of the third state given last time does not exist, so the process proceeds to step S311.

Then, the third function update unit 46 updates the action value function Q according to Equation (1) based on the reward calculated by the third reward calculation unit 45 (step S313). Further, the third function update unit 46 determines whether the action value function Q is no longer updated in step S313, that is, whether it has converged (step S314). When it is determined that the action value function Q has not converged (step S314: No), the third control parameter changing unit 53 changes the control parameter of the machining conditions related to axis drive control based on the action value function Q updated in step S313 (step S315). After step S315, the process returns to steps S301 and S302. When it is determined that the action value function Q has converged (step S314: Yes), the learning unit 40 determines whether all the control parameters of the machining conditions related to axis drive control have been changed by the third control parameter changing unit 53 (step S316). When it is determined that not all the control parameters of the machining conditions related to axis drive control have been changed (step S316: No), the control parameter to be changed by the third control parameter changing unit 53 in step S315 is switched to another control parameter (step S317). The control parameter newly targeted in step S317 is a control parameter related to axis drive control that has not yet been changed. After step S317, the process proceeds to step S315.

The change of the control parameters of the machining conditions related to the axis drive control in step S315 is described in detail below. As described above, a priority order of change is determined for the five types of control parameters of the machining conditions related to the axis drive control shown in FIG. 7 that are changed in step S315. When step S315 is entered for the first time, the third control parameter changing unit 53 changes the voltage control parameter, which is the control parameter for the target voltage value. Each time it is determined in step S314 that the action value function Q has converged, the control parameter to be changed by the third control parameter changing unit 53 is switched in step S317, in order, to the GAIN control parameter, which is the control parameter for the axis response; the length control parameter, which is the control parameter for the deepest-value persistence time; the jump control parameter, which is the control parameter for the jump speed and jump height; and the gap control parameter, which is the control parameter for the inter-electrode gap adjustment value.
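The priority order used in steps S316 and S317 can be pictured as a simple ordered list. The sketch below is illustrative only; the string identifiers and the function name are assumptions, not names taken from FIG. 7.

```python
from typing import List, Optional

# Priority order for the axis drive control, as listed in the description.
AXIS_DRIVE_PRIORITY: List[str] = ["voltage", "gain", "length", "jump", "gap"]


def next_target(current: str,
                order: List[str] = AXIS_DRIVE_PRIORITY) -> Optional[str]:
    """Return the control parameter to change after `current` has converged.

    None means every parameter has been changed (step S316: Yes), so the
    process would move on to storing the learning result (step S318).
    """
    i = order.index(current)
    return order[i + 1] if i + 1 < len(order) else None
```

For example, `next_target("voltage")` yields `"gain"`, and `next_target("gap")` yields `None`.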

When the action value function Q has converged and the learning unit 40 determines that all the control parameters of the machining conditions related to the axis drive control have been changed (step S316: Yes), the optimization of these control parameters by learning ends, and the learning result is stored in the learning result storage unit 80 (step S318). In addition to the control parameters finally obtained through the changes in step S315, the learning result includes the values each control parameter took during the change process and the values of the second state and the third state corresponding to those control parameters. The learning result stored in the learning result storage unit 80 can be used for pass/fail judgment before and after a change of a control parameter. The control parameters finally determined in this way are those that earned the highest reward during learning, and they are held in the control parameter holding unit 13 as the optimal control parameters for the given machining condition setting values. Optimizing the control parameters of the machining conditions related to the axis drive control by learning increases the number of pulses in one operation unit, which runs from the completion of the retreat operation called a jump operation, through the inter-electrode position control that generates discharge, until the next jump operation is performed. It also increases the feed amount of the axis in the machining progress direction at each observation, which promotes the progress of machining.

When the five types of control parameters of the machining conditions related to the axis drive control are optimized simultaneously, the third control parameter changing unit 53 changes the five types of control parameters simultaneously in step S315 based on the action value function Q updated in step S313. In this case, steps S316 and S317 are unnecessary, and if it is determined in step S314 that the action value function Q has converged (step S314: Yes), the process may proceed directly to step S318.

In the description of FIGS. 15 to 17, the method of changing a control parameter based on the updated action value function Q is not particularly limited. For example, a method may be used that obtains the action a_t, that is, the control parameter, for which the action value Q(s_t, a_t) given by the action value function Q in the current state s_t is maximal.
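One way to obtain the action with the maximal action value in the current state is a greedy lookup over a Q table. This is a minimal sketch under the assumption that Q is stored as a dictionary keyed by (state, action); the function name and the action labels are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def greedy_action(q: Dict[Tuple[str, str], float], state: str,
                  actions: Sequence[str]) -> str:
    """Return the action a_t that maximizes Q(s_t, a_t) for the given state.

    Unvisited (state, action) pairs are treated as having value 0.0;
    ties are resolved in favor of the earliest action in `actions`.
    """
    return max(actions, key=lambda a: q.get((state, a), 0.0))


q_table = {("s", "increase"): 0.2, ("s", "decrease"): 0.5}
chosen = greedy_action(q_table, "s", ["increase", "keep", "decrease"])
```

A purely greedy choice never explores; since the description leaves the method open, an exploratory policy would be an equally valid design.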

Note that a control parameter of the same machining condition is one and the same parameter wherever it appears, so when the flowcharts of FIGS. 15 to 17 are executed in parallel, the same control parameter may be changed by more than one flowchart.

In FIGS. 15 to 17, the control parameter optimization operation starts at the stage where the machining operation by the electric discharge machine 1 has started and electric discharge is generated, and continues until the electric discharge machining is completed. That is, the state observation unit 30 observes the machining state from the start of machining, and the learning unit 40 and the parameter changing unit 50 search for optimal control parameters until the machining is completed. The machine learning device 100 executes the flowcharts of FIGS. 15 to 17 in parallel, and the control parameter updates continue until all the end conditions of FIGS. 15 to 17 are satisfied. When all the end conditions are satisfied, the control parameter changes end.

The learning behavior of the machine learning device 100 continues from the start of electric discharge machining to the end of machining. A reward for the learning behavior is obtained based on the first, second, and third states, and the control parameters are changed accordingly. As a result of this learning behavior, the action value Q based on the optimal control parameters obtained by the end of machining is higher than the action value Q based on the initially set control parameters. With the electric discharge machine 1 according to the first embodiment, this increase in the action value Q shortens the time required to finish machining and, through stable electric discharge, improves the machining accuracy and machined surface quality of the workpiece.

In conventional adaptive control, the machining condition setting values are controlled according to rules determined to stabilize machining, but the control parameters themselves are not adaptively changed. In contrast, the machine learning device 100 executes optimization learning that adjusts the control parameters while electric discharge machining is actually performed, according to the workpiece shape and the workpiece material, so stable machining conditions can be learned automatically. In other words, the machine learning device 100 can optimize the control parameters without limiting the applicable range of adaptive control, even under usage conditions that are difficult to anticipate in advance, such as the workpiece shape, the electrode material, and the electrode shape. The resulting improvement in machining stability improves machining speed and machining accuracy.

Embodiment 2.
FIG. 18 is a block diagram showing a configuration of an electric discharge machine 1A according to the second embodiment of the present invention. The electric discharge machine 1A is obtained by adding, to the input/output unit 20 of the electric discharge machine 1 according to the first embodiment, a machining result input unit 23 for performing additional learning using machining results.

The first embodiment described the learning behavior for the control parameters when a specific machining operation is performed. The second embodiment assumes that machining has already been performed once with the same material of the workpiece 3 and the same machining condition setting values. It is further assumed that this machining yielded the surface roughness of the workpiece 3 after machining and the electrode consumption of the machining electrode 2 after machining, expressed as a consumed weight or a consumed length.

The machining result input unit 23 receives machining results entered by the user, such as the surface roughness of the workpiece 3 after machining and the electrode consumption of the machining electrode 2 after machining. The machining results may be entered by displaying selectable options on the display unit 22 and having the machining result input unit 23 receive the user's selection. Alternatively, the machining result input unit 23 may accept numerical data on the surface roughness of the workpiece 3 after machining and the electrode consumption of the machining electrode 2 after machining; the input format is not limited. The method of evaluating whether the surface roughness and the electrode consumption received by the machining result input unit 23 are good or bad is also a design matter and is not particularly limited. The machining result input unit 23 may also accept the good or bad evaluations of the surface roughness and the electrode consumption themselves.

When machining is performed again with the same machining condition settings as the earlier machining, the machining result of the earlier machining received by the machining result input unit 23 can be used to add or remove restrictions on the amount of change in the control parameter changes described with reference to FIGS. 15 to 17. Based on the machining result received by the machining result input unit 23, the machining state observation unit 33 or the like causes the parameter changing unit 50 to add or remove a restriction on the amount of change of a control parameter.

More specifically, when the surface roughness of the workpiece 3 after machining received by the machining result input unit 23 is evaluated as poor, changes to control parameters that affect the machined surface quality are restricted. As an example, the parameter changing unit 50 limits the change width of the length control parameter, which is the control parameter for the current pulse length, so that it does not change by a certain value or more.

In addition, when the electrode consumption of the machining electrode 2 after machining received by the machining result input unit 23 is determined to be small, leaving a margin, the parameter changing unit 50 releases the restriction on changes to control parameters that affect the electrode consumption. As an example, the change width of the pulse inclination control parameter of the circuit auxiliary setting is increased, and the restriction on its change is released. Conversely, when the electrode consumption is large, the parameter changing unit 50 can further restrict changes to the control parameters.
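Adding and releasing a restriction on the change width can be sketched as a clamp applied to each proposed parameter change. The class, its method names, and the parameter identifiers below are hypothetical; the description does not specify how the parameter changing unit 50 represents restrictions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ChangeLimiter:
    """Hypothetical per-parameter limit on how far one change may move a value."""

    max_step: Dict[str, Optional[float]] = field(default_factory=dict)

    def restrict(self, name: str, width: float) -> None:
        """Add a restriction, e.g. after a poor surface roughness result."""
        self.max_step[name] = width

    def release(self, name: str) -> None:
        """Remove a restriction, e.g. when electrode consumption leaves a margin."""
        self.max_step[name] = None

    def apply(self, name: str, current: float, proposed: float) -> float:
        """Clamp the proposed value so the change stays within the allowed width."""
        width = self.max_step.get(name)
        if width is None:
            return proposed
        delta = max(-width, min(width, proposed - current))
        return current + delta


limiter = ChangeLimiter()
limiter.restrict("length", 2.0)
limited = limiter.apply("length", 10.0, 15.0)  # change is capped at +2.0
```

Parameters without an entry are left unrestricted, which corresponds to the default behavior of the first embodiment.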

With the electric discharge machine 1A according to the second embodiment, accepting a machining result obtained by machining once allows the control parameter changes to depend on the result of machining with the same material of the workpiece 3 and the same machining condition setting values. Thus, in addition to the effects obtained in the first embodiment, an accuracy improvement such as better machined surface quality and a cost reduction such as lower electrode consumption can be obtained.

Embodiment 3.
FIG. 19 is a block diagram showing a configuration of an electric discharge machine 1B according to the third embodiment of the present invention. The electric discharge machine 1B is obtained by adding a communication unit 60 to the electric discharge machine 1A according to the second embodiment. The communication unit 60 includes a learning content filing unit 61 that converts learning results stored in the learning result storage unit 80 into transmittable learning result data, a receiving unit 62 that receives learning result data from the outside, and a transmitting unit 63 that transmits learning result data to the outside. The receiving unit 62 and the transmitting unit 63 are connected to, and can communicate with, a cloud server 300 located outside the electric discharge machine 1B.

The cloud server 300 is also connected to electric discharge machines 301 to 303, each having the same learning function as the control device 10 of the electric discharge machine 1B. The electric discharge machine 1B can therefore communicate with the electric discharge machines 301 to 303, which are other electric discharge machines, via the communication unit 60. The cloud server 300 can store not only the learning result data of the electric discharge machine 1B but also the learning result data of the electric discharge machines 301 to 303. The communication method between the cloud server 300 and the electric discharge machines 1B and 301 to 303 may be any known technique and is not particularly limited.

When control parameter optimization learning as described in the first and second embodiments has already been performed, the learning content filing unit 61 can convert the learning results stored in the learning result storage unit 80 into learning result data in a format that the external electric discharge machines 301 to 303 can use. The learning result data is not limited to any particular format, as long as it can be used by a control device similar to the control device 10.

The learning result data created by the learning content filing unit 61 can be stored in the cloud server 300 via the transmitting unit 63. The learning result data stored in the cloud server 300 is transmitted to the electric discharge machines 301 to 303 automatically or actively, and whether the electric discharge machines 301 to 303 use the learning result data can be left to the judgment of the users of the electric discharge machines 301 to 303.

The contents learned by the machine learning device 100 can be used in the electric discharge machines 301 to 303 in the same manner by taking this learning result data into the machine learning devices existing in the electric discharge machines 301 to 303.
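The conversion into transmittable learning result data and its import on another machine could look like the sketch below. JSON is chosen purely as an assumption, since the description leaves the data format open, and the record fields and function names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class LearningResult:
    """Hypothetical record held in the learning result storage unit 80."""

    machining_condition: str
    control_parameters: Dict[str, float]
    history: List[Dict[str, float]] = field(default_factory=list)


def to_learning_result_data(result: LearningResult) -> str:
    """Serialize a learning result for transmission (role of filing unit 61)."""
    return json.dumps(asdict(result), sort_keys=True)


def from_learning_result_data(data: str) -> LearningResult:
    """Rebuild a learning result on the receiving electric discharge machine."""
    return LearningResult(**json.loads(data))


result = LearningResult("E1234", {"gain": 3.0},
                        [{"gain": 2.0, "second_state": 40.0}])
restored = from_learning_result_data(to_learning_result_data(result))
```

A text format such as this keeps the data readable by any control device with the same learning function, at the cost of larger payloads than a binary format.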

Conversely, learning result data created by learning in the electric discharge machines 301 to 303 can also be used by the control device 10 via the cloud server 300 and the receiving unit 62. In this case, the contents learned by the control devices of the electric discharge machines 301 to 303, or their observed states, received via the receiving unit 62 can be displayed on the display unit 22.

As a result, the learning results of the control devices of the electric discharge machines 301 to 303 located elsewhere, such as at a remote site, can be used in the electric discharge machine 1B, and the machining states of the electric discharge machines 301 to 303 can be observed from the electric discharge machine 1B. In addition, the learning results obtained by the electric discharge machine 1B can be used by the electric discharge machines 301 to 303 having the same specifications. Accordingly, not only can a single electric discharge machine be adjusted, but the machine performance of a plurality of electric discharge machines of the same specifications can also be improved, and this improvement becomes more efficient as the number of electric discharge machines of the same specifications increases.

The machine learning device 100 according to the first to third embodiments is realized by a computer system such as a personal computer or a general-purpose computer. FIG. 20 is a diagram illustrating a hardware configuration in which the functions of the machine learning device 100 according to the first to third embodiments are realized by a computer system. In this case, the functions of the machine learning device 100 are realized by a CPU (Central Processing Unit) 201, a memory 202, a storage device 203, a display device 204, and an input device 205, as shown in FIG. 20. The functions executed by the machine learning device 100 are realized by software, firmware, or a combination of software and firmware. The software or firmware is described as a program and stored in the storage device 203. The CPU 201 implements the functions of the machine learning device 100 by reading the software or firmware stored in the storage device 203 into the memory 202 and executing it. That is, the computer system includes the storage device 203 for storing a program that, when the functions of the machine learning device 100 are executed by the CPU 201, results in execution of the steps of the machine learning method according to the first to third embodiments. These programs can also be said to cause a computer to execute the processing realized by the functions of the machine learning device 100. The memory 202 corresponds to a volatile storage area such as a RAM (Random Access Memory). The storage device 203 corresponds to a nonvolatile or volatile semiconductor memory such as a ROM (Read Only Memory) or a flash memory, or to a magnetic disk. Specific examples of the display device 204 are a monitor and a display. Specific examples of the input device 205 are a keyboard, a mouse, and a touch panel.

The configurations described in the above embodiments show examples of the content of the present invention. They can be combined with other known techniques, and part of a configuration can be omitted or changed without departing from the gist of the present invention.

1, 1A, 1B, 301 to 303 electric discharge machine, 2 machining electrode, 3 workpiece, 4 drive device, 5 machining power source, 10 control device, 11 axis drive control unit, 12 machining power source control unit, 13 control parameter holding unit, 14 initial parameter setting unit, 15 machining condition setting unit, 20 input/output unit, 21 machining condition input unit, 22 display unit, 23 machining result input unit, 30 state observation unit, 31 axis drive recognition unit, 32 pulse state recognition unit, 33 machining state observation unit, 40 learning unit, 41 first reward calculation unit, 42 first function update unit, 43 second reward calculation unit, 44 second function update unit, 45 third reward calculation unit, 46 third function update unit, 47 reward calculation unit, 48 function update unit, 50 parameter changing unit, 51 first control parameter changing unit, 52 second control parameter changing unit, 53 third control parameter changing unit, 60 communication unit, 61 learning content filing unit, 62 receiving unit, 63 transmitting unit, 80 learning result storage unit, 100 machine learning device, 201 CPU, 202 memory, 203 storage device, 204 display device, 205 input device, 300 cloud server.

Claims

1. A machine learning device for learning a control parameter for controlling machining conditions in an electric discharge machine, the machine learning device comprising:
a state observation unit that observes a plurality of state variables representing a machining state during electric discharge machining; and
a learning unit that learns the control parameter based on the plurality of state variables.

2. The machine learning device according to claim 1, wherein the state observation unit observes, as the plurality of state variables, a value of a first state, which is an accumulated number of occurrences of unstable pulse signals during a predetermined period, a value of a second state, which is the number of pulses generated during the predetermined period, and a value of a third state, which is a feed amount of an axis of a drive device.

3. The machine learning device according to claim 2, wherein the state observation unit comprises:
a pulse state recognition unit that obtains the value of the first state and the value of the second state based on a command of a machining power source control unit that controls a machining power source; and
an axis drive recognition unit that obtains the value of the third state based on a command of an axis drive control unit that controls the drive device for controlling the distance between a machining electrode and a workpiece.

4. The machine learning device according to claim 2, wherein the learning unit comprises:
a reward calculation unit that calculates a reward based on the state variables; and
a function update unit that updates, based on the reward, a function for determining the control parameter.

5. The machine learning device according to claim 4, wherein the reward calculation unit increases the reward when the state variable indicates a more stable state than the previous time, and reduces the reward when the state variable indicates a more unstable state than the previous time.

6. The machine learning device according to claim 4, wherein
the reward calculation unit includes a first reward calculation unit that calculates a reward related to voltage control, a second reward calculation unit that calculates a reward related to pulse control, and a third reward calculation unit that calculates a reward related to axis drive control, and
the function update unit includes a first function update unit that updates a function related to voltage control, a second function update unit that updates a function related to pulse control, and a third function update unit that updates a function related to axis drive control.

7. The machine learning device according to claim 6, wherein the first reward calculation unit increases the reward when the value of the first state is smaller than the previous value, and reduces the reward when the value of the first state is larger than the previous value.

8. The machine learning device according to claim 6, wherein the first reward calculation unit increases the reward when the value of the second state is larger than the previous value, and reduces the reward when the value of the second state is smaller than the previous value.

9. The machine learning device according to claim 6, wherein the second reward calculation unit increases the reward when the value of the first state is smaller than the previous value, and reduces the reward when the value of the first state is larger than the previous value.

10. The machine learning device according to claim 6, wherein the second reward calculation unit increases the reward when the value of the second state is larger than the previous value, and reduces the reward when the value of the second state is smaller than the previous value.

11. The machine learning device according to claim 6, wherein the third reward calculation unit increases the reward when the value of the second state is larger than the previous value, and reduces the reward when the value of the second state is smaller than the previous value.

12. The machine learning device according to claim 6, wherein the third reward calculation unit increases the reward when the value of the third state is larger than the previous value, and reduces the reward when the value of the third state is smaller than the previous value.

13. An electric discharge machine comprising:
the machine learning device according to any one of claims 1 to 12;
a machining condition setting unit that sets a machining condition setting value;
a control parameter holding unit that holds the control parameter;
a machining power source control unit that controls a machining power source based on the machining condition setting value and the control parameter;
an axis drive control unit that controls, based on the machining condition setting value and the control parameter, a drive device for controlling the distance between a machining electrode and a workpiece; and
a parameter changing unit that changes the control parameter held by the control parameter holding unit based on a learning result of the learning unit.

14. The electric discharge machine according to claim 13, further comprising a machining result input unit that receives a machining result, wherein the parameter changing unit adds or removes a restriction on the change of the control parameter based on the machining result.

15. The electric discharge machine according to claim 13 or 14, further comprising a communication unit capable of communicating with another electric discharge machine.

16. The electric discharge machine according to any one of claims 13 to 15, further comprising a learning result storage unit that stores the control parameter changed by the parameter changing unit.

17. A machine learning method of a machine learning device that learns a control parameter for controlling machining conditions in an electric discharge machine, the machine learning method comprising:
obtaining a plurality of state variables representing a machining state during electric discharge machining; and
learning the control parameter based on the plurality of state variables.