WO2019202672A1 - Machine learning device, electric discharge machine, and machine learning method - Google Patents


Info

Publication number
WO2019202672A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
state
machining
reward
unit
Prior art date
Application number
PCT/JP2018/015910
Other languages
French (fr)
Japanese (ja)
Inventor
慎吾 千田
Original Assignee
三菱電機株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 三菱電機株式会社
Priority to PCT/JP2018/015910 priority Critical patent/WO2019202672A1/en
Priority to JP2019508980A priority patent/JP6663538B1/en
Priority to CN201880092284.8A priority patent/CN111954582B/en
Publication of WO2019202672A1 publication Critical patent/WO2019202672A1/en

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B23 MACHINE TOOLS; METAL-WORKING NOT OTHERWISE PROVIDED FOR
    • B23H WORKING OF METAL BY THE ACTION OF A HIGH CONCENTRATION OF ELECTRIC CURRENT ON A WORKPIECE USING AN ELECTRODE WHICH TAKES THE PLACE OF A TOOL; SUCH WORKING COMBINED WITH OTHER FORMS OF WORKING OF METAL
    • B23H1/00 Electrical discharge machining, i.e. removing metal with a series of rapidly recurring electrical discharges between an electrode and a workpiece in the presence of a fluid dielectric
    • B23H1/02 Electric circuits specially adapted therefor, e.g. power supply, control, preventing short circuits or other abnormal discharges
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B23 MACHINE TOOLS; METAL-WORKING NOT OTHERWISE PROVIDED FOR
    • B23H WORKING OF METAL BY THE ACTION OF A HIGH CONCENTRATION OF ELECTRIC CURRENT ON A WORKPIECE USING AN ELECTRODE WHICH TAKES THE PLACE OF A TOOL; SUCH WORKING COMBINED WITH OTHER FORMS OF WORKING OF METAL
    • B23H7/00 Processes or apparatus applicable to both electrical discharge machining and electrochemical machining
    • B23H7/14 Electric circuits specially adapted therefor, e.g. power supply
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B23 MACHINE TOOLS; METAL-WORKING NOT OTHERWISE PROVIDED FOR
    • B23H WORKING OF METAL BY THE ACTION OF A HIGH CONCENTRATION OF ELECTRIC CURRENT ON A WORKPIECE USING AN ELECTRODE WHICH TAKES THE PLACE OF A TOOL; SUCH WORKING COMBINED WITH OTHER FORMS OF WORKING OF METAL
    • B23H7/00 Processes or apparatus applicable to both electrical discharge machining and electrochemical machining
    • B23H7/14 Electric circuits specially adapted therefor, e.g. power supply
    • B23H7/20 Electric circuits specially adapted therefor, e.g. power supply for programme-control, e.g. adaptive

Definitions

  • The present invention relates to a machine learning device, an electric discharge machine, and a machine learning method for learning a control parameter for controlling electric discharge machining.
  • An electric discharge machine has an adaptive control function that, in order to perform stable machining, automatically changes machining conditions expressed as physical quantities, such as the power supply voltage waveform, the power supply current waveform, and the inter-pole control operation, which is a servo operation.
  • The machining conditions are determined by several to a dozen or so different machining parameters that can be changed by the user.
  • Machining parameters include, for example, a parameter that changes the magnitude of the voltage applied to machine the workpiece or the shape of the machining current pulse, a parameter that adjusts the relative distance between the workpiece and the machining electrode that serves as the tool, and a parameter that changes the feed rate of the machining electrode.
  • Machining parameters can be determined experimentally in advance as sets of appropriate values using typical machining shapes, workpiece materials, and electrode materials, and in some cases the user can select among these sets. However, the shapes machined by an electric discharge machine are complicated three-dimensional shapes, and because an electric discharge machine can machine any material that conducts electricity, the materials to be machined are varied. Therefore, it is necessary to optimize the machining parameters. For example, Patent Document 1 discloses that machining parameters are automatically set using a machining state input by an operator.
  • However, Patent Document 1 only adjusts some of the machining parameters that can be set by the user, based on a single type of machining state.
  • The machining parameters have various control parameters in the background, and those control parameters are not adjusted.
  • The present invention has been made in view of the above, and an object of the present invention is to obtain a machine learning device that can automatically learn more appropriate machining conditions in electric discharge machining.
  • The machine learning device of the present invention learns control parameters for controlling machining conditions in an electric discharge machine.
  • The machine learning device of the present invention includes a state observation unit that observes a plurality of state variables representing a machining state during electric discharge machining, and a learning unit that learns control parameters based on the plurality of state variables.
  • The machine learning device has the effect of being able to automatically learn more appropriate machining conditions in electric discharge machining.
  • FIG. 1 is a block diagram showing a configuration of an electric discharge machine according to a first embodiment of the present invention.
  • FIG. 2 is a diagram in which machining conditions according to the first embodiment are classified for control purposes.
  • FIG. 3 is a diagram for explaining control parameters of machining conditions related to voltage control according to the first embodiment.
  • FIG. 4 is a diagram illustrating a relationship between a machining condition related to voltage control according to the first embodiment and a generation period of a current pulse.
  • FIG. 5 is a diagram for explaining control parameters of machining conditions related to pulse control according to the first embodiment.
  • FIG. 6 is a diagram illustrating a relationship between the machining conditions related to pulse control according to the first embodiment and the shape of the current pulse.
  • FIG. 7 is a diagram for explaining control parameters of machining conditions related to shaft drive control according to the first embodiment.
  • FIG. 8 is a diagram illustrating a relationship between the machining conditions related to shaft drive control according to the first embodiment and the distance control by shaft drive.
  • FIG. 9 is a diagram for explaining a state of voltage pulses and current pulses according to the first embodiment.
  • FIG. 10 is a diagram illustrating a case where the voltage pulse and the current pulse according to the first embodiment are stable.
  • FIG. 11 is a diagram illustrating a case where the voltage pulse and the current pulse according to the first embodiment are unstable.
  • FIG. 12 is a diagram showing an ideal average voltage value distribution according to the first embodiment.
  • FIG. 13 is a diagram illustrating a distribution of average voltage values when the stable discharge according to the first embodiment continues.
  • FIG. 14 is a diagram illustrating a distribution of average voltage values when the unstable discharge according to the first embodiment continues.
  • FIG. 15 is a flowchart for explaining optimization processing by learning control parameters of machining conditions related to voltage control according to the first embodiment.
  • A further flowchart explains optimization processing by learning control parameters of machining conditions related to pulse control according to the first embodiment.
  • A further flowchart explains optimization processing by learning control parameters of machining conditions related to the axis drive control according to the first embodiment.
  • FIG. 1 is a block diagram showing a configuration of an electric discharge machine 1 according to a first embodiment of the present invention.
  • The electric discharge machine 1 includes a machining electrode 2 serving as a machining tool, a driving device 4 for controlling the distance between the machining electrode 2 and the workpiece 3, a machining power source 5 that applies a voltage across the gap between the machining electrode 2 and the workpiece 3, and a control device 10.
  • the workpiece 3 is connected to a machining power source 5.
  • the driving device 4 can drive either the processing electrode 2 or the workpiece 3 or both.
  • The control device 10 includes a shaft drive control unit 11 that controls the driving device 4, a machining power source control unit 12 that controls the machining power source 5, a machining condition setting unit 15 that sets machining condition setting values, a control parameter holding unit 13 that holds control parameters corresponding to the machining conditions, and an initial parameter setting unit 14 that sets initial values of the control parameters.
  • The machining condition setting value is a setting value for specifying the machining condition.
  • The control parameter is a parameter that defines the relationship between the machining condition setting value and the machining condition, and the machining condition expressed by a specific physical quantity is determined based on the machining condition setting value and the control parameter. Therefore, the machining conditions of the machining pattern used when machining by generating an electric discharge between the machining electrode 2 and the workpiece 3 are determined based on the machining condition setting value set by the machining condition setting unit 15 and the control parameters held in the control parameter holding unit 13. That is, the machining condition expressed by the physical quantity in the electric discharge machine 1 is controlled by the control parameter. The user can set the machining condition setting value, but cannot set or change the control parameter.
  • The shaft drive control unit 11 and the machining power source control unit 12 issue commands corresponding to the machining pattern of the machining conditions, based on the information given from the machining condition setting unit 15 and the control parameter holding unit 13. As will be described later, the control parameter is changed, but the initial value of the control parameter first set in the control parameter holding unit 13 is set by the initial parameter setting unit 14.
  • The initial value of the control parameter is an initial correspondence table.
  • the driving device 4 controls the relative distance and the relative speed between the machining electrode 2 and the workpiece 3 based on the command from the shaft drive control unit 11.
  • the machining power source 5 applies a voltage between the machining electrode 2 and the workpiece 3 based on the command from the machining power source control unit 12 to control the current waveform during discharge.
  • the control device 10 further includes an input / output unit 20, a machine learning device 100, a parameter changing unit 50, and a learning result storage unit 80.
  • the input / output unit 20 is an input / output interface that accepts user input and supports the user's confirmation work by display.
  • the input / output unit 20 includes a machining condition input unit 21 that receives a machining condition setting value that the user wants to set in the machining condition setting unit 15, and a display unit 22 that allows the user to perform a confirmation operation for observing the machining state.
  • the machine learning device 100 includes a state observation unit 30 and a learning unit 40.
  • the state observation unit 30 includes an axis drive recognition unit 31, a pulse state recognition unit 32, and a machining state observation unit 33.
  • The learning unit 40 includes a reward calculation unit 47 and a function update unit 48, and optimizes the control parameters by learning them.
  • The reward calculation unit 47 includes a first reward calculation unit 41 that calculates a voltage control reward, a second reward calculation unit 43 that calculates a pulse control reward, and a third reward calculation unit 45 that calculates a shaft drive control reward.
  • The function update unit 48 includes a first function update unit 42 that updates a function related to voltage control, a second function update unit 44 that updates a function related to pulse control, and a third function update unit 46 that updates a function related to axis drive control.
  • The parameter changing unit 50 includes a first control parameter changing unit 51 that changes a control parameter of a machining condition related to voltage control, a second control parameter changing unit 52 that changes a control parameter of a machining condition related to pulse control, and a third control parameter changing unit 53 that changes a control parameter of a machining condition related to axis drive control.
  • the parameter changing unit 50 changes the control parameter held by the control parameter holding unit 13 based on the result learned by the learning unit 40.
  • the learning result storage unit 80 stores a learning result by the machine learning device 100.
  • The shaft drive control unit 11 and the machining power source control unit 12 issue commands based on the machining condition setting value output from the machining condition setting unit 15, and the driving device 4 and the machining power source 5 operate accordingly. Due to this operation, an electric discharge is generated between the machining electrode 2 and the workpiece 3.
  • The driving device 4 searches for an optimum relative distance at which electric discharge occurs while decreasing or increasing the relative distance between the machining electrode 2 and the workpiece 3 in accordance with a command from the shaft drive control unit 11. Information about the position of the drive shaft and the operation of the drive shaft at this time is acquired by the shaft drive recognition unit 31 and recorded in the machining state observation unit 33 as a shaft behavior history.
  • the machining power supply 5 applies a voltage between the machining electrode 2 and the workpiece 3 in accordance with a command from the machining power supply control unit 12 simultaneously with the operation of the drive device 4 described above. Then, current pulses having a current waveform having the commanded shape are generated. Based on the machining condition setting value from the machining condition setting unit 15, the machining power source control unit 12 controls the voltage of the machining power source 5 so as to generate current pulses with a current waveform having a commanded shape at a constant period. However, due to physical characteristics, it is impossible to reliably generate current pulses at a constant period in electric discharge machining. Also, the current pulse shape may be different from the current waveform indicated by the theoretical value.
  • The pulse state recognition unit 32 acquires the current pulse generation period and the current pulse shape, and in addition acquires the magnitude of the applied voltage, the application period, and the voltage waveform indicating the voltage pulse shape, and records them in the machining state observation unit 33 as a pulse behavior history.
  • The machining state observation unit 33 obtains the voltage value distribution, the current pulse generation period, and the axis position information, speed information, and acceleration information at the time the current pulse is generated, from the pulse behavior history and the shaft behavior history.
  • The machining state observation unit 33 associates the information obtained from electric discharge machining performed under the currently used control parameters with those control parameters, which are set in the control parameter holding unit 13, and gives it to the learning unit 40.
  • FIG. 2 is a diagram in which machining conditions according to the first embodiment are classified for control purposes.
  • FIG. 3 is a diagram for explaining control parameters of machining conditions related to voltage control according to the first embodiment.
  • FIG. 4 is a diagram illustrating a relationship between a machining condition related to voltage control according to the first embodiment and a generation period of a current pulse.
  • FIG. 5 is a diagram for explaining control parameters of machining conditions related to pulse control according to the first embodiment.
  • FIG. 6 is a diagram illustrating a relationship between the processing conditions related to the pulse control according to the first embodiment and the shape of the current pulse.
  • FIG. 7 is a diagram for explaining control parameters of machining conditions related to the shaft drive control according to the first embodiment.
  • FIG. 8 is a diagram illustrating a relationship between the machining conditions related to the shaft drive control according to the first embodiment and the distance control by the shaft drive.
  • In FIG. 2, for each of the machining conditions (1) type of machining circuit, (2) circuit auxiliary setting, (3) current pulse peak value, (4) current pulse length, (5) pulse pause time, (6) inter-pole gap adjustment value, (7) jump speed, (8) jump height, (9) deepest value duration, (10) axis response, and (11) target voltage value, a black circle is placed in the corresponding column when the condition corresponds to the machining conditions related to voltage control, the machining conditions related to pulse control, or the machining conditions related to axis drive control.
  • The machining conditions related to voltage control are related to the generation period of the current pulse, the machining conditions related to pulse control are related to the shape of the current pulse, and the machining conditions related to shaft drive control are related to the inter-pole control.
  • FIG. 3, FIG. 5, and FIG. 7 show the control parameters held by the control parameter holding unit 13 corresponding to the machining conditions for which the machining condition setting unit 15 sets the machining condition setting values.
  • Each machining condition shown in FIG. 2 is related to the generation period of the current pulse, the shape of the current pulse, or the inter-pole control, and these relationships may overlap. Thus, changing the relevant machining parameters in order to change any one of the current pulse generation period, the current pulse shape, or the inter-pole control may affect the others.
  • each machining condition is designated as a notch by a machining condition setting value, and a plurality of machining conditions are expressed as a notch pattern that is a combination of notches designated for each machining condition.
  • A notch is a step that discretely specifies a physical quantity indicating a machining condition. Usually, several to several tens of notch patterns are registered in the electric discharge machine 1 in advance.
  • Separately from the selection of the notch by the machining condition setting value, each machining condition has control parameters that cannot be changed by the user. As described above, a machining condition that is a specific physical quantity is determined based on a machining condition setting value indicating the selection of a notch and on the control parameters.
  • Examples of control parameters are the number of notch divisions and the notch distribution value.
  • the number of notch divisions is the number of notches that can be selected under the processing conditions.
  • the notch distribution value is a value of a physical quantity of a machining condition assigned to each notch.
  • the control parameters are not limited to these. When the control parameters for each of the 11 types of machining conditions shown in FIG. 2 are regarded as variables, the total number of variables ranges from tens to hundreds.
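  • The relationship between a notch selection and the resulting physical quantity can be sketched as a small lookup structure. This is only an illustration: the class name `NotchTable`, its fields, and the numeric values are hypothetical and do not come from the source, which does not describe how the control parameter holding unit 13 stores its data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NotchTable:
    """Hypothetical holder for the control parameters of one machining condition."""
    name: str
    distribution: List[float]  # notch distribution values: one physical quantity per notch

    @property
    def divisions(self) -> int:
        # Number of notch divisions = number of notches that can be selected.
        return len(self.distribution)

    def physical_value(self, notch: int) -> float:
        # Map a selected notch (0-based here, by assumption) to its physical quantity.
        if not 0 <= notch < self.divisions:
            raise ValueError(f"notch {notch} is out of range for {self.name}")
        return self.distribution[notch]


# Illustrative numbers only; they are not taken from the source.
pulse_length = NotchTable("current pulse length", [2.0, 4.0, 8.0, 16.0])
setting_value = 2  # machining condition setting value chosen by the user
condition = pulse_length.physical_value(setting_value)
```

  • In this sketch the user only ever supplies `setting_value`, while learning would change `distribution`, which mirrors the split between user-settable setting values and control parameters described above.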
  • FIG. 3 explains the control parameters of machining conditions related to voltage control.
  • FIG. 4 outlines how the machining conditions related to voltage control, namely (4) current pulse length, (5) pulse pause time, (6) inter-pole gap adjustment value, (9) deepest value duration, (10) axis response, and (11) target voltage value, relate to the current pulse generation period.
  • (4) Current pulse length and (5) pulse pause time are the machining conditions indicated by the widths marked with arrows in FIG. 4, and (6) inter-pole gap adjustment value, (9) deepest value duration, (10) axis response, and (11) target voltage value are machining conditions related to the current pulse generation period.
  • FIG. 5 illustrates control parameters for machining conditions related to pulse control.
  • FIG. 6 outlines how the machining conditions related to pulse control, namely (1) type of machining circuit, (2) circuit auxiliary setting, (3) current pulse peak value, (4) current pulse length, (6) inter-pole gap adjustment value, and (11) target voltage value, relate to the shape of the current pulse.
  • When the circuit call parameter for (1) the type of machining circuit is changed, the machining circuit is changed, so the shape of the current pulse changes. (2) The circuit auxiliary setting defines the rising slope of the current pulse. (3) The current pulse peak value defines the peak value of the current pulse. (4) The current pulse length defines the pulse length of the current pulse. (6) The inter-pole gap adjustment value and (11) the target voltage value are machining conditions related to the interval between current pulses.
  • FIG. 7 illustrates control parameters of machining conditions related to the axis drive control.
  • FIG. 8 outlines how the machining conditions related to axis drive control, namely (6) inter-pole gap adjustment value, (7) jump speed, (8) jump height, (9) deepest value duration, (10) axis response, and (11) target voltage value, relate to the shaft drive control.
  • Machining conditions including (11) target voltage value are related to the approach operation between the machining electrode 2 and the workpiece 3.
  • (7) Jump speed, (8) jump height, and (10) axis response are machining conditions related to the retreat operation of the machining electrode 2 from the workpiece 3, including the jump operation of the drive shaft.
  • FIG. 9 is a diagram for explaining a state of voltage pulses and current pulses according to the first embodiment.
  • FIG. 10 is a diagram illustrating a case where the voltage pulse and the current pulse according to the first embodiment are stable.
  • FIG. 11 is a diagram illustrating a case where the voltage pulse and the current pulse according to the first embodiment are unstable. In FIGS. 9 to 11, the upper part shows the voltage waveform and the lower part shows the current waveform.
  • Control is performed by observing the average voltage value per fixed time at the time of electric discharge.
  • When stable discharge is repeated, the average voltage value is maintained at the theoretical value as shown in FIG. 10, and stable discharge continues.
  • When the unstable or abnormal discharge of FIG. 9 is repeated, the average voltage value deviates from the theoretical value as shown in FIG. 11, and unstable discharge continues.
  • Fluctuation of the average voltage value from the theoretical value does not immediately indicate whether the discharge pulse is stable or unstable. Further, even when the stable current pulse pattern shown in FIG. 10 continues to be generated under ideal conditions, there is an unpredictable time interval, called the no-load voltage time, until dielectric breakdown occurs, so the generation period is not constant. Therefore, an increase or decrease in the discharge generation cycle is an index independent of machining stability.
  • FIG. 12 is a diagram showing an ideal average voltage value distribution according to the first embodiment.
  • FIG. 13 is a diagram illustrating a distribution of average voltage values when the stable discharge according to the first embodiment continues.
  • FIG. 14 is a diagram illustrating a distribution of average voltage values when the unstable discharge according to the first embodiment continues.
  • In FIGS. 12 to 14, the horizontal axis indicates the average voltage value per fixed time when the discharge occurs, and the vertical axis indicates the number of pulses per fixed time.
  • In the ideal case, the average voltage value is the theoretical value and the number of pulses is the number determined by the theoretical value.
  • In practice, owing to physical phenomena, the average voltage value is distributed around the target voltage value, which is the machining condition indicating the voltage to aim for, and the number of pulses is distributed as well.
  • The target voltage value need not be a theoretical value.
  • When the machining is stable and stable discharge continues as shown in FIG. 10, the variation in the average voltage value is small as shown in FIG. 13, and the number of pulses is maximized when the average voltage value equals the target voltage value.
  • When unstable discharge continues, the average voltage value is dispersed around the target voltage value as shown in FIG. 14, and the number of pulses also varies.
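  • The distributions described for FIGS. 12 to 14 (average voltage value on the horizontal axis, number of pulses on the vertical axis) can be approximated by a simple histogram. The following is a minimal sketch under stated assumptions: the function name `voltage_histogram`, the bin width, and the sample voltages are hypothetical and are not taken from the source.

```python
from collections import Counter
from typing import Dict, List


def voltage_histogram(avg_voltages: List[float], bin_width: float) -> Dict[float, int]:
    """Count pulses per average-voltage bin; keys are the lower edges of the bins."""
    counts: Counter = Counter()
    for v in avg_voltages:
        bin_start = (v // bin_width) * bin_width  # lower edge of the bin containing v
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


# Illustrative average voltage values per fixed time (not measured data).
samples = [58.0, 60.0, 60.5, 61.0, 63.0]
hist = voltage_histogram(samples, bin_width=2.0)
peak_bin = max(hist, key=hist.get)  # bin holding the largest number of pulses
```

  • With a narrow, single-peaked histogram centered near the target voltage value the situation resembles FIG. 13, while a wide or scattered histogram resembles FIG. 14.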
  • The pulse state recognition unit 32 judges the quality of the voltage distribution at the time of discharge in a certain period, and thereby determines whether the pulse is stable or unstable. As an example, the pulse state recognition unit 32 determines whether the voltage pulse and the current pulse are stable or unstable based on the relationship between the average voltage value, the target voltage value, and the voltage threshold obtained from the machining power source control unit 12. Specifically, based on the command of the machining power source control unit 12, the pulse state recognition unit 32 checks whether the absolute value of the deviation of the average voltage value per fixed time at the time of discharge from the target voltage value is larger than the voltage threshold.
  • the pulse state recognition part 32 calculates
  • The predetermined period can be, for example, the operation time from when a retreat operation called a jump operation is completed, through the inter-pole position control for generating a discharge, until the next jump operation is performed.
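  • The threshold comparison described above can be sketched as follows. Two points are assumptions, not statements from the source: that a pulse is counted as unstable when the deviation exceeds the voltage threshold and as stable otherwise, and that the value of the first state and the value of the second state are the resulting unstable and stable counts over the predetermined period. The function name and sample numbers are hypothetical.

```python
from typing import Iterable, Tuple


def count_pulse_states(avg_voltages: Iterable[float],
                       target_voltage: float,
                       voltage_threshold: float) -> Tuple[int, int]:
    """Return (unstable_count, stable_count) for one observation period."""
    unstable = 0
    stable = 0
    for v in avg_voltages:
        # Assumption: deviation above the threshold marks the pulse as unstable.
        if abs(v - target_voltage) > voltage_threshold:
            unstable += 1
        else:
            stable += 1
    return unstable, stable


# Illustrative values only.
first_state, second_state = count_pulse_states(
    [60.0, 61.0, 72.0, 59.5, 45.0], target_voltage=60.0, voltage_threshold=5.0
)
```

  • Under these assumptions a smaller first value and a larger second value both indicate more stable machining, which is consistent with how the two values are compared in steps S103 and S107.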
  • the shaft drive recognition unit 31 obtains the shaft feed amount in the drive device 4 obtained from the command of the shaft drive control unit 11 as the value of the third state.
  • the value of the third state is set to be a positive value as the shaft feed amount increases in the machining progress direction, and to a negative value as the shaft feed amount increases in the backward direction.
  • The value of the first state, the value of the second state, and the value of the third state are state variables representing the machining state during electric discharge machining.
  • The machining state observation unit 33 displays the acquired state variables, namely the value of the first state, the value of the second state, and the value of the third state, on the display unit 22 so that the user can visually observe them in the form of a distribution chart or a bar-graph histogram.
  • the state observation unit 30 observes the values of the first state, the second state, and the third state, which are a plurality of state variables.
  • The first reward calculation unit 41, the second reward calculation unit 43, and the third reward calculation unit 45 of the learning unit 40 calculate rewards based on the value of the first state, the value of the second state, and the value of the third state acquired by the machining state observation unit 33.
  • The machine learning device 100 including the state observation unit 30, the learning unit 40, and the parameter changing unit 50 may use any learning algorithm.
  • As an example, reinforcement learning (Reinforcement Learning) can be applied.
  • In reinforcement learning, an agent, that is, the acting entity in an environment, observes the current state and decides the action to take. The agent receives a reward from the environment by selecting an action, and learns a policy that obtains the most reward through a series of actions.
  • Q-learning and TD-learning are known reinforcement learning methods.
  • A general update formula of the action value function Q(s, a) is represented by the following formula (1).
  • $$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right) \quad (1)$$
  • The action value function Q(s, a) is also called an action value table.
  • In formula (1), $s_t$ represents the state at time t and $a_t$ represents the action at time t. The action $a_t$ changes the state to $s_{t+1}$.
  • $r_{t+1}$ represents the reward obtained by that change in the state, $\gamma$ represents the discount rate, and $\alpha$ represents the learning coefficient.
  • If the change of the control parameters is regarded as the action $a_t$ at time t, and the first, second, and third states are regarded as the state $s_t$ at time t, the learning can be understood as Q-learning.
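  • A tabular version of the standard Q-learning update can be sketched as follows. This is a generic textbook form, not the device's actual implementation: the function name `q_update`, the action labels, the state tuples, and the values of the learning coefficient and discount rate are all hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, s: Hashable, a: Hashable, reward: float,
             s_next: Hashable, actions: Sequence[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one Q-learning update to the table entry for (s, a) and return it."""
    # Best value reachable from the next state over all candidate actions.
    best_next = max(q[(s_next, a2)] for a2 in actions)
    q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
    return q[(s, a)]


q: QTable = defaultdict(float)  # unseen entries start at 0.0
# Hypothetical actions: ways of changing one control parameter.
actions = ["raise_param", "keep_param", "lower_param"]
state = (3, 12, 1)        # (first, second, third state values), illustrative
next_state = (2, 14, 1)
new_q = q_update(q, state, "raise_param", reward=1.0,
                 s_next=next_state, actions=actions)
```

  • A dictionary keyed by (state, action) is used here only because it keeps the sketch short; the source does not say how the action value table is stored.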
  • FIG. 15 is a flowchart for explaining an optimization process by learning control parameters of machining conditions related to voltage control according to the first embodiment.
  • Control parameters of machining conditions related to voltage control are variable values for performing the voltage control that underlies the target voltage value set as a machining condition. They make it possible to change not only the magnitude of the voltage but also the shape of the voltage waveform.
  • A voltage reference value for detecting discharge, called a reference voltage, is also included.
  • Processing such as changing the initial notch pattern of the no-load voltage time and the target voltage value set as control parameters to another notch pattern is also performed.
  • the 1st reward calculation part 41 which calculates the reward concerning voltage control calculates the variation
  • the reward is increased when the value of the second state is increased, and the reward is decreased when the value of the second state is decreased.
  • In some cases, the number of stable pulses decreases even when the number of unstable pulses decreases. In such a case, the reward calculation method may be determined so that the reward decreases.
  • The first function update unit 42 updates the action value function Q, which is a function for determining a control parameter related to voltage control. Based on the updated action value function Q, the first control parameter changing unit 51 changes the control parameter of the machining condition related to voltage control to the control parameter that obtains the most reward.
  • The flowchart of FIG. 15 is executed while the electric discharge machine 1 is continuously performing electric discharge machining. Although the description assumes that priorities are set for the control parameters to be changed, the six types of control parameters may be optimized at the same time.
  • the state observation unit 30 observes information of the machining power source control unit 12 when machining is performed with the current machining conditions and control parameters (step S101). Specifically, the state observation unit 30 acquires a command from the machining power source control unit 12 during machining. Then, based on the command from the machining power supply control unit 12, the pulse state recognition unit 32 calculates the value of the first state and the value of the second state (step S102).
  • The value of the first state and the value of the second state, which are the state variables obtained by the pulse state recognition unit 32, are given from the machining state observation unit 33 to the first reward calculation unit 41.
  • When given from the machining state observation unit 33 to the first reward calculation unit 41, the value of the first state and the value of the second state are associated with the currently used control parameter set in the control parameter holding unit 13.
  • The first reward calculation unit 41 compares the given value of the first state with the previous value of the first state (step S103).
  • the first reward calculation unit 41 holds the value of the first state given last time, and can compare it with the value of the first state given this time.
  • When the value of the first state is smaller than the previous value of the first state (step S103: small), the first reward calculation unit 41 increases the reward (step S104). That is, if the value of the first state indicates a more stable state than the previous time, the reward is increased.
  • The increase value of the reward is a predetermined value.
  • When the value of the first state is the same as the previous value of the first state (step S103: same), the first reward calculation unit 41 does not change the reward (step S105).
  • When the value of the first state is larger than the previous value of the first state (step S103: large), the first reward calculation unit 41 reduces the reward (step S106). That is, if the value of the first state indicates a more unstable state than the previous time, the reward is reduced.
  • the decrease value of the reward is a predetermined value. Note that when step S103 is executed for the first time, the value of the first state given last time does not exist, so the process proceeds to step S105.
  • the first reward calculating unit 41 compares the given second state value with the previous second state value (step S107).
  • the first reward calculation unit 41 holds the value of the second state given last time, and can compare it with the value of the second state given this time.
  • When the value of the second state is larger than the previous value of the second state (step S107: large), the first reward calculation unit 41 increases the reward (step S108). That is, when the value of the second state indicates a more stable state than the previous time, the reward is increased.
  • The increase value of the reward here is a predetermined value.
  • When the value of the second state is the same as the previous value of the second state (step S107: same), the first reward calculation unit 41 does not change the reward (step S109).
  • When the value of the second state is smaller than the previous value of the second state (step S107: small), the first reward calculation unit 41 reduces the reward (step S110). That is, if the value of the second state indicates a more unstable state than the previous time, the reward is reduced.
  • the decrease value of the reward is a predetermined value.
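  • The comparison in steps S103 to S110 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `update_reward`, the constant `REWARD_STEP`, and the use of one common step size for increases and decreases are assumptions. The text states that the first comparison is skipped when no previous value exists; treating the second state the same way is also an assumption.

```python
from typing import Optional

REWARD_STEP = 1.0  # hypothetical "predetermined value" for each increase or decrease


def update_reward(
    reward: float,
    first: float,
    second: float,
    prev_first: Optional[float] = None,
    prev_second: Optional[float] = None,
    step: float = REWARD_STEP,
) -> float:
    """Adjust the reward by comparing state values with the previous observation."""
    # First state: a smaller value than last time is treated as more stable
    # (S104), a larger value as less stable (S106), equal as no change (S105).
    if prev_first is not None:
        if first < prev_first:
            reward += step
        elif first > prev_first:
            reward -= step
    # Second state: a larger value than last time is treated as more stable
    # (S108), a smaller value as less stable (S110), equal as no change (S109).
    if prev_second is not None:
        if second > prev_second:
            reward += step
        elif second < prev_second:
            reward -= step
    return reward
```

  • With these assumptions, a drop in the first state combined with a rise in the second state raises the reward by two steps, and the very first observation leaves the reward unchanged.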
  • The first function update unit 42 updates the action value function Q according to formula (1) based on the reward calculated by the first reward calculation unit 41 (step S111). Further, the first function update unit 42 determines whether the update in step S111 has stopped changing the action value function Q, that is, whether the action value function Q has converged (step S112). When it is determined that the action value function Q has not converged (step S112: No), the first control parameter changing unit 51 changes the control parameter of the machining condition related to voltage control based on the action value function Q updated in step S111 (step S113). After step S113, the process returns to step S101.
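  • Steps S111 and S112 can be illustrated with a small tabular sketch. Formula (1) is not reproduced in this passage, so the code assumes it has the form of the standard one-step Q-learning update; the learning rate `ALPHA`, the discount factor `GAMMA`, the tolerance `tol`, and the function names are all hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

ALPHA = 0.1  # assumed learning rate
GAMMA = 0.9  # assumed discount factor

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Iterable[Hashable],
    alpha: float = ALPHA,
    gamma: float = GAMMA,
) -> float:
    """Apply one Q-learning update (S111) and return the size of the change."""
    # Best value reachable from the next state under the current estimate.
    best_next = max(q[(next_state, a)] for a in actions)
    old = q[(state, action)]
    new = old + alpha * (reward + gamma * best_next - old)
    q[(state, action)] = new
    return abs(new - old)


def has_converged(delta: float, tol: float = 1e-6) -> bool:
    """Treat Q as converged (S112: Yes) when an update barely changes it."""
    return delta < tol
```

  • A table created with `defaultdict(float)` starts every action value at zero. Judging convergence from a single small update is a simplification; a practical criterion would likely look at several consecutive updates.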
  • When it is determined that the action value function Q has converged (step S112: Yes), the learning unit 40 determines whether all the control parameters of the machining conditions related to voltage control have been changed by the first control parameter changing unit 51 (step S114). If it is determined that not all of the control parameters of the machining conditions related to voltage control have been changed (step S114: No), the control parameter that the first control parameter changing unit 51 changes in step S113 is switched to another control parameter (step S115). The other control parameter that newly becomes the change target in step S115 is a control parameter related to voltage control that has not yet been changed. After step S115, the process proceeds to step S113.
  • A priority order of change is determined for the six types of control parameters of the machining conditions related to the voltage control shown in FIG. 3 that are changed in step S113.
  • First, the first control parameter changing unit 51 changes the voltage control parameter, which is the control parameter for the target voltage value.
  • Thereafter, the control parameter to be changed by the first control parameter changing unit 51 is switched in step S115 in the following order: the GAIN control parameter, which is the control parameter for the axis response; the length control parameter, which is the control parameter for the pulse pause time; the gap control parameter, which is the control parameter for the inter-electrode gap adjustment value; the length control parameter, which is the control parameter for the deepest-value duration; and the length control parameter, which is the control parameter for the current pulse length.
  • When the learning unit 40 determines that the action value function Q has converged and all the control parameters of the machining conditions related to voltage control have been changed (step S114: Yes), the optimization process by learning of the control parameters of the machining conditions related to voltage control ends, and the learning result is stored in the learning result storage unit 80 (step S116).
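  • The outer loop of FIG. 15 (steps S112 to S116) amounts to optimizing one control parameter at a time in a fixed priority order. The sketch below shows only that control flow; the parameter labels in `VOLTAGE_PRIORITY` are hypothetical shorthand for the six voltage-control parameters, and `learn_until_converged` is an assumed callback standing in for the inner loop of steps S101 to S113.

```python
from typing import Any, Callable, Dict, List

# Hypothetical labels, listed in the priority order described in the text.
VOLTAGE_PRIORITY: List[str] = [
    "target_voltage",
    "gain",
    "pulse_pause_length",
    "gap_adjustment",
    "deepest_duration_length",
    "current_pulse_length",
]


def optimize_in_priority_order(
    priority: List[str],
    learn_until_converged: Callable[[str], Any],
) -> Dict[str, Any]:
    """Optimize each parameter in turn and collect the learning results."""
    results: Dict[str, Any] = {}
    for name in priority:
        # Inner loop (S101 to S113) runs until Q converges (S112: Yes);
        # moving to the next name corresponds to step S115.
        results[name] = learn_until_converged(name)
    # All parameters changed (S114: Yes); the caller stores the result (S116).
    return results
```

  • When all parameters are optimized simultaneously instead, this outer loop disappears and a single converged inner loop leads directly to storing the result.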
  • In addition to the control parameters changed in step S113 and finally determined, the learning result includes the values of each control parameter during the change process and the values of the first state and the second state corresponding to those control parameters.
  • the learning result stored in the learning result storage unit 80 can be used for pass / fail judgment before and after the change of the control parameter.
  • The control parameter finally determined as described above is the one that obtained the most reward in the learning, and it is held in the control parameter holding unit 13 as the optimal control parameter for the given machining condition setting value.
  • When the six types of control parameters are optimized simultaneously, the first control parameter changing unit 51 changes the six types of control parameters simultaneously in step S113 based on the action value function Q updated in step S111. In this case, steps S114 and S115 are unnecessary, and if it is determined in step S112 that the action value function Q has converged (step S112: Yes), the process may immediately proceed to step S116.
  • FIG. 16 is a flowchart for explaining an optimization process by learning control parameters of machining conditions related to the pulse control according to the first embodiment.
  • Control parameters of machining conditions related to pulse control are variable values for performing current pulse control, such as a pulse inclination and an abnormal discharge detection threshold that is a basis of a theoretical value of a pulse generation period.
  • Control parameters of machining conditions related to pulse control include not only the magnitude and width of the current pulse but also the current waveform shape, as well as the inter-electrode gap adjustment value for adjusting the relative distance between the machining electrode 2 and the workpiece 3 so as to bring the current value close to the ideal shape.
  • processing such as changing the initial notch pattern of the current magnitude and width set as the control parameters to another notch pattern is performed by optimizing the control parameters of the processing conditions related to the pulse control.
  • The second reward calculation unit 43, which calculates the reward related to pulse control, calculates the reward based on the value of the first state and the value of the second state, which are the state variables obtained by the pulse state recognition unit 32.
  • the reward calculation method of the second reward calculation unit 43 is the same as that of the first reward calculation unit 41.
  • The second function update unit 44 updates the action value function Q, which is a function for determining a control parameter related to pulse control. Based on the updated action value function Q, the second control parameter changing unit 52 changes the control parameter of the machining condition related to the pulse control to the control parameter that obtains the most reward.
  • The flowchart of FIG. 16 is executed while the electric discharge machine 1 is continuously performing electric discharge machining. The description here assumes that a priority order is set for the control parameters to be changed, but the six types of control parameters may instead be optimized simultaneously.
  • the second reward calculation unit 43 holds the initial value of reward for pulse control before the flowchart of FIG. 16 is executed.
  • the initial value of the reward is not limited as long as it is a fixed value, and may be 0.
  • the state observation unit 30 observes information of the machining power source control unit 12 when machining is performed with the current machining conditions and control parameters (step S201). Specifically, the state observation unit 30 acquires a command from the machining power source control unit 12 during machining. Then, based on the command of the machining power supply control unit 12, the pulse state recognition unit 32 calculates the value of the first state and the value of the second state (step S202).
  • the value of the first state and the value of the second state which are the state variables obtained by the pulse state recognition unit 32, are given from the machining state observation unit 33 to the second reward calculation unit 43.
  • At this time, the value of the first state and the value of the second state are associated with the control parameters currently set in the control parameter holding unit 13 and are given from the machining state observation unit 33 to the second reward calculation unit 43.
  • The second reward calculation unit 43 compares the given value of the first state with the previous value of the first state (step S203).
  • the second reward calculation unit 43 holds the value of the first state given last time, and can compare it with the value of the first state given this time.
  • When the value of the first state is smaller than the previous value of the first state (step S203: small), the second reward calculation unit 43 increases the reward (step S204).
  • The increase value of the reward is a predetermined value. If the value of the first state is the same as the previous value of the first state (step S203: same), the second reward calculation unit 43 does not change the reward (step S205).
  • When the value of the first state is larger than the previous value of the first state (step S203: large), the second reward calculation unit 43 reduces the reward (step S206).
  • the decrease value of the reward is a predetermined value.
  • the second reward calculation unit 43 compares the given second state value with the previous second state value (step S207).
  • the second reward calculation unit 43 holds the value of the second state given last time and can compare it with the value of the second state given this time.
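  • When the value of the second state is larger than the previous value of the second state (step S207: large), the second reward calculation unit 43 increases the reward (step S208).
  • The increase value of the reward here is a predetermined value.
  • When the value of the second state is the same as the previous value of the second state (step S207: same), the second reward calculation unit 43 does not change the reward (step S209).
  • When the value of the second state is smaller than the previous value of the second state (step S207: small), the second reward calculation unit 43 reduces the reward (step S210).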
  • the decrease value of the reward is a predetermined value.
  • The second function update unit 44 updates the action value function Q according to formula (1) based on the reward calculated by the second reward calculation unit 43 (step S211). Furthermore, the second function update unit 44 determines whether the update in step S211 has stopped changing the action value function Q, that is, whether the action value function Q has converged (step S212). When it is determined that the action value function Q has not converged (step S212: No), the second control parameter changing unit 52 changes the control parameter of the machining condition related to pulse control based on the action value function Q updated in step S211 (step S213). After step S213, the process returns to step S201.
  • When it is determined that the action value function Q has converged (step S212: Yes), the learning unit 40 determines whether all the control parameters of the machining conditions related to the pulse control have been changed by the second control parameter changing unit 52 (step S214). When it is determined that not all of the control parameters of the machining conditions related to the pulse control have been changed (step S214: No), the control parameter that the second control parameter changing unit 52 changes in step S213 is switched to another control parameter (step S215).
  • The other control parameter that newly becomes the change target in step S215 is a control parameter related to pulse control that has not yet been changed. After step S215, the process proceeds to step S213.
  • The change of the control parameters of the machining conditions related to the pulse control in step S213 will be described in detail below. As described above, a priority order of change is determined for the six types of control parameters of the machining conditions related to the pulse control shown in FIG. 5 that are changed in step S213.
  • First, the second control parameter changing unit 52 changes the voltage control parameter, which is the control parameter for the target voltage value.
  • Thereafter, the control parameter to be changed by the second control parameter changing unit 52 is switched in the following order: the gap control parameter, which is the control parameter for the inter-electrode gap adjustment value; the pulse inclination control parameter, which is the control parameter for the circuit auxiliary setting; the length control parameter, which is the control parameter for the current pulse length; the peak control parameter, which is the control parameter for the current pulse peak value; and the circuit call parameter, which is the control parameter for the machining circuit type.
  • If the learning unit 40 determines that the action value function Q has converged and all of the control parameters of the machining conditions related to pulse control have been changed (step S214: Yes), the optimization process by learning of the control parameters of the machining conditions related to pulse control ends, and the learning result is stored in the learning result storage unit 80 (step S216).
  • In addition to the control parameters changed in step S213 and finally determined, the learning result includes the values of each control parameter during the change process and the values of the first state and the second state corresponding to those control parameters.
  • the learning result stored in the learning result storage unit 80 can be used for pass / fail judgment before and after the change of the control parameter.
  • The control parameter finally determined as described above is the one that obtained the most reward in the learning, and it is held in the control parameter holding unit 13 as the optimal control parameter for the given machining condition setting value.
  • When the six types of control parameters are optimized simultaneously, the second control parameter changing unit 52 changes the six types of control parameters simultaneously in step S213 based on the action value function Q updated in step S211. In this case, steps S214 and S215 are unnecessary, and if it is determined in step S212 that the action value function Q has converged (step S212: Yes), the process may proceed immediately to step S216.
  • FIG. 17 is a flowchart for explaining an optimization process by learning control parameters of machining conditions related to the axis drive control according to the first embodiment.
  • The control parameters of the machining conditions related to the axis drive control are also called inter-electrode control parameters. They are variable values for changing the axis drive behavior of the electric discharge machine 1, such as the deceleration distance used when the machining electrode 2 and the workpiece 3 are brought close to each other, and the speed and acceleration parameters that produce the instantaneous retreat action called a jump operation.
  • The third reward calculation unit 45, which calculates the reward related to the axis drive control, calculates the reward based on the value of the second state, which is a state variable obtained by the pulse state recognition unit 32, and the value of the third state, which is a state variable obtained by the shaft drive recognition unit 31. If the value of the second state is large and the value of the third state increases, the third reward calculation unit 45 calculates the amount of change in the reward so as to increase the reward. There is no restriction on how the state values are used to determine the amount of change in the reward. Specifically, the reward is increased when the value of the second state increases, and the reward is decreased when the value of the second state decreases.
  • Likewise, the reward is increased when the value of the third state increases, and the reward is decreased when the value of the third state decreases. For example, the reward increases if the number of discharge pulses increases even though the feed amount of the axis does not change. However, if the number of discharge pulses decreases even though the feed amount of the axis in the machining progress direction increases, the reward calculation method may be determined so that the reward decreases.
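  • One way to obtain the behavior described above, where a drop in the number of discharge pulses lowers the reward even though the axis feed increased, is to weight the second state more heavily than the third. The sketch below is only one such choice, since the text leaves the calculation method open; the function names and the step sizes `pulse_step` and `feed_step` are assumptions.

```python
def sign(x: float) -> int:
    """Return 1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def axis_reward_delta(
    pulses: float,
    feed: float,
    prev_pulses: float,
    prev_feed: float,
    pulse_step: float = 2.0,  # assumed weight of the second state (pulse count)
    feed_step: float = 1.0,   # assumed weight of the third state (axis feed)
) -> float:
    """Change in reward from the second and third states versus last time."""
    # Each state contributes +step, 0, or -step depending on whether it
    # rose, stayed the same, or fell relative to the previous observation.
    return pulse_step * sign(pulses - prev_pulses) + feed_step * sign(feed - prev_feed)
```

  • With these weights, more pulses at an unchanged feed gives +2, while fewer pulses at a larger feed gives -1, so the reward falls in the second case.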
  • The third function update unit 46 updates the action value function Q, which is a function for determining a control parameter related to the axis drive control. Based on the updated action value function Q, the third control parameter changing unit 53 changes the control parameter of the machining condition related to the axis drive control to the control parameter that obtains the most reward.
  • The flowchart of FIG. 17 is executed while the electric discharge machine 1 is continuously performing electric discharge machining. The description here assumes that a priority order is set for the control parameters to be changed, but the five types of control parameters may instead be optimized simultaneously.
  • the state observation unit 30 observes information of the machining power source control unit 12 when machining is performed with the current machining conditions and control parameters (step S301). Specifically, the state observation unit 30 acquires a command from the machining power source control unit 12 during machining. Then, based on the command of the machining power supply control unit 12, the pulse state recognition unit 32 calculates the value of the second state (step S303).
  • the state observing unit 30 observes information of the shaft drive control unit 11 when the machining is being executed with the current machining conditions and control parameters (step S302). Specifically, the state observation unit 30 acquires a command from the shaft drive control unit 11 during machining. Then, based on the command from the shaft drive control unit 11, the shaft drive recognition unit 31 calculates the value of the third state (step S304). Next, the value of the second state obtained by the pulse state recognition unit 32 and the value of the third state obtained by the shaft drive recognition unit 31 are provided from the machining state observation unit 33 to the third reward calculation unit 45.
  • At this time, the value of the second state and the value of the third state are associated with the control parameters currently set in the control parameter holding unit 13 and are given from the machining state observation unit 33 to the third reward calculation unit 45.
  • The third reward calculation unit 45 compares the given value of the second state with the previous value of the second state (step S305).
  • the third reward calculation unit 45 holds the value of the second state given last time, and can compare it with the value of the second state given this time.
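  • When the value of the second state is larger than the previous value of the second state (step S305: large), the third reward calculation unit 45 increases the reward (step S306).
  • The increase value of the reward here is a predetermined value.
  • When the value of the second state is the same as the previous value of the second state (step S305: same), the third reward calculation unit 45 does not change the reward (step S307).
  • When the value of the second state is smaller than the previous value of the second state (step S305: small), the third reward calculation unit 45 reduces the reward (step S308).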
  • the decrease value of the reward is a predetermined value.
  • the third reward calculation unit 45 compares the given third state value with the previous third state value (step S309).
  • the third reward calculation unit 45 holds the value of the third state given last time, and can compare it with the value of the third state given this time. If the value of the third state is larger than the value of the previous third state (step S309: large), the third reward calculation unit 45 increases the reward (step S310). That is, when the value of the third state indicates a more stable state than the previous time, the reward is increased.
  • the increase value of the reward here is a predetermined value.
  • When the value of the third state is the same as the previous value of the third state (step S309: same), the third reward calculation unit 45 does not change the reward (step S311).
  • When the value of the third state is smaller than the previous value of the third state (step S309: small), the third reward calculation unit 45 reduces the reward (step S312). That is, when the value of the third state indicates a more unstable state than the previous time, the reward is reduced.
  • the decrease value of the reward is a predetermined value.
  • The third function update unit 46 updates the action value function Q according to formula (1) based on the reward calculated by the third reward calculation unit 45 (step S313). Further, the third function update unit 46 determines whether the update in step S313 has stopped changing the action value function Q, that is, whether the action value function Q has converged (step S314). When it is determined that the action value function Q has not converged (step S314: No), the third control parameter changing unit 53 changes the control parameter of the machining condition related to the axis drive control based on the action value function Q updated in step S313 (step S315). After step S315, the process returns to steps S301 and S302.
  • When it is determined that the action value function Q has converged (step S314: Yes), the learning unit 40 determines whether all the control parameters of the machining conditions related to the axis drive control have been changed by the third control parameter changing unit 53 (step S316). When it is determined that not all of the control parameters of the machining conditions related to the axis drive control have been changed (step S316: No), the control parameter that the third control parameter changing unit 53 changes in step S315 is switched to another control parameter (step S317). The other control parameter that newly becomes the change target in step S317 is a control parameter related to the axis drive control that has not yet been changed. After step S317, the process proceeds to step S315.
  • The change of the control parameters of the machining conditions related to the axis drive control in step S315 will be described in detail below.
  • A priority order of change is determined for the five types of control parameters of the machining conditions related to the axis drive control shown in FIG. 7 that are changed in step S315.
  • The control parameter to be changed by the third control parameter changing unit 53 is switched in the following order: the GAIN control parameter, which is the control parameter for the axis response; the length control parameter, which is the control parameter for the deepest-value duration; the jump control parameters, which are the control parameters for the jump speed and the jump height; and the gap control parameter, which is the control parameter for the inter-electrode gap adjustment value.
  • When the learning unit 40 determines that the action value function Q has converged and all of the control parameters of the machining conditions related to the axis drive control have been changed (step S316: Yes), the optimization process by learning of the control parameters of the machining conditions related to the axis drive control ends, and the learning result is stored in the learning result storage unit 80 (step S318).
  • In addition to the control parameters changed in step S315 and finally determined, the learning result includes the values of each control parameter during the change process and the values of the second state and the third state corresponding to those control parameters.
  • the learning result stored in the learning result storage unit 80 can be used for pass / fail judgment before and after the change of the control parameter.
  • The control parameter finally determined as described above is the one that obtained the most reward in the learning, and it is held in the control parameter holding unit 13 as the optimal control parameter for the given machining condition setting value.
  • The number of pulses can be increased within one operation unit, which runs from the completion of the retreat operation called a jump operation, through the inter-electrode position control that generates discharge, until the next jump operation is performed. The feed amount of the axis in the machining progress direction at each observation can also be increased, which promotes the progress of machining.
  • When the five types of control parameters are optimized simultaneously, the third control parameter changing unit 53 changes the five types of control parameters simultaneously in step S315 based on the action value function Q updated in step S313. In this case, steps S316 and S317 are unnecessary, and if it is determined in step S314 that the action value function Q has converged (step S314: Yes), the process may proceed to step S318 immediately.
  • The method of changing the control parameter based on the updated action value function Q is not particularly limited, as long as it obtains the action a_t, that is, the control parameter, that maximizes the action value given by the action value function Q(s_t, a_t) in the current state s_t.
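  • The simplest method that satisfies this condition is greedy selection over a table of action values. The sketch below assumes Q is stored as a dictionary keyed by (state, action) pairs and that unseen pairs default to zero; both the storage format and the function name `greedy_action` are illustrative assumptions.

```python
from typing import Dict, Hashable, Sequence, Tuple


def greedy_action(
    q: Dict[Tuple[Hashable, Hashable], float],
    state: Hashable,
    actions: Sequence[Hashable],
) -> Hashable:
    """Return the action with the largest Q(state, action)."""
    # Pairs missing from the table are treated as having value 0.0.
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

  • Here each action would correspond to a candidate control parameter change, and ties are resolved in favor of the first candidate listed.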
  • the control parameter optimization operation is performed from the stage where the machining operation by the electric discharge machine 1 is started and electric discharge is generated, and is continued until the electric discharge machining is completed. That is, simultaneously with the start of machining, the machining state is observed by the state observing unit 30, and an optimum control parameter is searched for by the learning unit 40 and the parameter changing unit 50 until the machining is completed. That is, the machine learning device 100 executes the flowcharts of FIGS. 15 to 17 in parallel, and the control parameter update continues until all the end conditions of FIGS. 15 to 17 are satisfied. When all the end conditions are satisfied, the control parameter change ends.
  • the learning action by the machine learning device 100 is continuously performed from the start of electric discharge machining to the end of machining.
  • a reward for the learning behavior is obtained based on the first, second, and third states, and the control parameter changing behavior is performed.
  • the behavior value Q based on the optimal control parameter obtained after the processing is finished is higher than the behavior value Q based on the control parameter that is initially set.
  • Because the electric discharge machine 1 according to the first embodiment increases the action value Q, the time required to finish machining is shortened, and the machining accuracy and machined surface quality of the workpiece obtained through stable electric discharge are improved.
  • the machining condition set value is controlled according to a rule determined to stabilize the machining, but the adaptive control for changing the control parameter is not performed.
  • The optimization learning for adjusting the control parameters is executed while electric discharge machining is actually performed, so that stable machining conditions suited to the workpiece shape and the workpiece material can be learned automatically.
  • optimization of control parameters is possible without limiting the applicable range of adaptive control even under adaptive control usage conditions that are difficult to assume in advance, such as the shape of the workpiece, the electrode material, and the electrode shape. It is possible to improve the processing speed and processing accuracy by improving the processing stability.
  • FIG. 18 is a block diagram showing a configuration of an electric discharge machine 1A according to the second embodiment of the present invention.
  • The electric discharge machine 1A is obtained by adding, to the input/output unit 20 of the electric discharge machine 1 according to the first embodiment, a machining result input unit 23, which is a configuration for performing additional learning using machining results.
  • the learning behavior of the control parameter when performing a specific machining has been described.
  • Suppose that machining has been performed once in advance with the same material of the workpiece 3 and the same machining condition setting values. As a result of that machining, it is assumed that the surface roughness of the workpiece 3 after machining and the electrode consumption amount of the machining electrode 2 after machining, expressed as a consumed weight or a consumed length, have been obtained.
  • the machining result input unit 23 receives machining results such as the surface roughness after machining of the workpiece 3 and the electrode consumption after machining of the machining electrode 2 input by the user.
  • The machining result may be input in a format in which selectable options are displayed on the display unit 22 and the machining result input unit 23 receives the user's selection.
  • the processing result input unit 23 may accept numerical data regarding the surface roughness after processing of the workpiece 3 and the electrode consumption after processing of the processing electrode 2 input by the user, and is not limited. Further, the surface roughness after processing of the workpiece 3 received by the processing result input unit 23 and the method for evaluating the quality of the electrode consumption after processing of the processed electrode 2 are also design items and are not particularly limited. Further, the processing result input unit 23 may accept the surface roughness after processing the workpiece 3 and the quality of the electrode consumption after processing the processing electrode 2 itself.
  • In the control parameter change described with reference to FIGS. 15 to 17, a restriction on the amount of change can be added or removed by using the machining result of the previous machining accepted by the machining result input unit 23.
  • the machining state observation unit 33 or the like causes the parameter changing unit 50 to add or remove a restriction on the change amount in the change of the control parameter.
  • the control parameter change that affects the processing surface quality is limited.
  • the parameter changing unit 50 limits the change width of the length control parameter so as not to change the length control parameter that is a control parameter of the current pulse length by a certain value or more.
  • When the electrode consumption amount is small, the parameter changing unit 50 releases the restriction on changes to the control parameters that affect the electrode consumption amount. As an example, the change width of the pulse inclination control parameter of the circuit auxiliary setting is increased, releasing the restriction on its change. Conversely, when the electrode consumption amount is large, the parameter changing unit 50 can further restrict the change of the control parameter.
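  • Restricting and releasing the change width can be pictured as clamping each proposed parameter change to a per-parameter limit. The sketch below is an illustration under assumptions: the function name `clamp_change`, the use of `None` to mean "no restriction", and the numeric values are not taken from the specification.

```python
from typing import Optional


def clamp_change(
    current: float,
    proposed: float,
    max_delta: Optional[float],
) -> float:
    """Limit how far a control parameter may move in a single change."""
    # None represents a released restriction: accept the proposal as is.
    if max_delta is None:
        return proposed
    # Otherwise keep the change within [-max_delta, +max_delta].
    delta = max(-max_delta, min(max_delta, proposed - current))
    return current + delta
```

  • For example, a limit of 2.0 on the current pulse length would turn a proposed change from 10.0 to 15.0 into 12.0, while a released limit on the pulse inclination parameter would let the full change through.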
  • In the electric discharge machine 1A, by accepting a machining result obtained by machining once, the change of the control parameter can be made to depend on the result of machining with the same material of the workpiece 3 and the same machining condition setting value.
  • an accuracy improvement effect such as an improvement in the quality of the processed surface after processing and a cost reduction effect such as a reduction in the amount of electrode consumption can be obtained.
  • FIG. 19 is a block diagram showing a configuration of an electric discharge machine 1B according to the third embodiment of the present invention.
  • a communication unit 60 is added to the electric discharge machine 1A according to the second embodiment.
  • the communication unit 60 includes a learning content filing unit 61 that converts learning results stored in the learning result storage unit 80 into transmittable learning result data, a receiving unit 62 that receives learning result data from the outside, and a transmitting unit 63 that transmits learning result data to the outside.
  • the receiving unit 62 and the transmitting unit 63 are connected to the cloud server 300 existing outside the electric discharge machine 1B and can communicate with each other.
  • the cloud server 300 is also connected to electric discharge machines 301 to 303 having the same learning function as the control device 10 of the electric discharge machine 1B. Therefore, the electric discharge machine 1B can communicate with the electric discharge machines 301 to 303, which are other electric discharge machines, via the communication unit 60.
  • the cloud server 300 can store not only learning result data of the electric discharge machine 1B but also learning result data of the electric discharge machines 301 to 303.
  • the communication method between the cloud server 300 and the electric discharge machines 1B and 301 to 303 is not particularly limited as long as a known technique is used.
  • the learning content filing unit 61 can convert the learning results stored in the learning result storage unit 80 into learning result data in a format that can be used by the externally located electric discharge machines 301 to 303.
  • the learning result data is not limited as long as it is a data format that can be used by a control device similar to the control device 10.
  • the learning result data created by the learning content filing unit 61 can be stored in the cloud server 300 via the transmission unit 63.
  • the learning result data stored in the cloud server 300 is transmitted to the electric discharge machines 301 to 303 automatically or actively, and whether or not the electric discharge machines 301 to 303 use the learning result data can be left to the judgment of the users of the electric discharge machines 301 to 303.
  • the contents learned by the machine learning device 100 can be used in the electric discharge machines 301 to 303 in the same manner by taking this learning result data into the machine learning devices existing in the electric discharge machines 301 to 303.
  • learning result data created by learning in the electric discharge machines 301 to 303 can also be used by the control device 10 via the cloud server 300 and the receiving unit 62. At this time, the contents learned by the control devices of the electric discharge machines 301 to 303, or their observed states, obtained via the receiving unit 62, can be displayed on the display unit 22.
  • the learning results of the control devices of the externally located electric discharge machines 301 to 303, for example at a remote site, can be used in the electric discharge machine 1B, and the machining states of the electric discharge machines 301 to 303 can be observed from the electric discharge machine 1B.
  • the learning results obtained by the electric discharge machine 1B can be used by the electric discharge machines 301 to 303 having the same specifications. Accordingly, not only can a single electric discharge machine be adjusted, but the mechanical performance of a plurality of electric discharge machines of the same specifications can also be improved, and this improvement becomes more efficient as the number of electric discharge machines of the same specifications increases.
  • the machine learning device 100 is realized by a computer system such as a personal computer or a general-purpose computer.
  • FIG. 20 is a diagram illustrating a hardware configuration when the function of the machine learning device 100 according to the first to third embodiments is realized by a computer system.
  • when the functions of the machine learning device 100 are realized by a computer system, they are realized by a CPU (Central Processing Unit) 201, a memory 202, a storage device 203, a display device 204, and an input device 205, as shown in FIG. 20.
  • the function executed by the machine learning device 100 is realized by software, firmware, or a combination of software and firmware. Software or firmware is described as a program and stored in the storage device 203.
  • the CPU 201 implements the functions of the machine learning device 100 by reading the software or firmware stored in the storage device 203 into the memory 202 and executing it. That is, the computer system includes the storage device 203 for storing a program that, when the functions of the machine learning device 100 are executed by the CPU 201, results in execution of the steps of the machine learning method according to the first to third embodiments.
  • these programs can also be said to cause a computer to execute the processing realized by the functions of the machine learning device 100.
  • the memory 202 corresponds to a volatile storage area such as RAM (Random Access Memory).
  • the storage device 203 corresponds to a nonvolatile or volatile semiconductor memory such as a ROM (Read Only Memory) or a flash memory, or a magnetic disk.
  • Specific examples of the display device 204 are a monitor and a display.
  • Specific examples of the input device 205 are a keyboard, a mouse, and a touch panel.
  • the configurations described in the above embodiments show examples of the content of the present invention; they can be combined with other known techniques, and part of a configuration can be omitted or changed within the scope of the present invention.
  • 1, 1A, 1B, 301 to 303 electric discharge machine; 2 machining electrode; 3 workpiece; 4 drive device; 5 machining power source; 10 control device; 11 axis drive control unit; 12 machining power control unit; 13 control parameter holding unit; 14 initial parameter setting unit; 15 machining condition setting unit; 20 input/output unit; 21 machining condition input unit; 22 display unit; 23 machining result input unit; 30 state observation unit; 31 axis drive recognition unit; 32 pulse state recognition unit; 33 machining state observation unit; 40 learning unit; 41 first reward calculation unit; 42 first function update unit; 43 second reward calculation unit; 44 second function update unit; 45 third reward calculation unit; 46 third function update unit; 47 reward calculation unit; 48 function update unit; 50 parameter changing unit; 51 first control parameter changing unit; 52 second control parameter changing unit; 53 third control parameter changing unit; 60 communication unit; 61 learning content filing unit; 62 receiving unit; 63 transmitting unit; 80 learning result storage unit; 100 machine learning device; 201 CPU; 202 memory; 203 storage device; 204 display device; 205 input device; 300 cloud server.

Abstract

A machine learning device (100) learns a control parameter for controlling a machining condition in an electric discharge machine (1). The machine learning device (100) is provided with a state observation unit (30) for observing a plurality of state variables representing a machining state during electric discharge machining, and a learning unit (40) for learning the control parameter on the basis of the plurality of state variables.

Description

Machine learning device, electric discharge machine, and machine learning method
The present invention relates to a machine learning device, an electric discharge machine, and a machine learning method for learning control parameters for controlling electric discharge machining.
To perform stable machining, an electric discharge machine has an adaptive control function that automatically changes machining conditions expressed as physical quantities, for example by changing the power supply voltage waveform and power supply current waveform or by changing the inter-electrode control operation, which is a servo operation. The machining conditions are determined by several to a dozen or so machining parameters that the user can change. Machining parameters include a parameter that changes the magnitude of the voltage applied to machine the workpiece or the shape of the machining current pulse, a parameter that adjusts the relative distance between the workpiece and the machining electrode serving as the tool, and a parameter that changes the feed rate of the machining electrode.
In some cases, combinations of these machining parameters are determined experimentally as sets of appropriate values using representative machining shapes, workpiece materials, and electrode materials, and a plurality of such sets are preset in the electric discharge machine so that the user can select among them. However, the shapes machined by an electric discharge machine are complicated three-dimensional shapes, and because an electric discharge machine can machine any material that conducts electricity, the workpiece materials also vary widely. The machining parameters therefore need to be optimized. For example, Patent Document 1 discloses automatically setting machining parameters using a machining state input by an operator.
Japanese Patent Laid-Open No. 2-212041
However, the automatic setting described in Patent Document 1 only adjusts some of the machining parameters that the user can also set, based on a single type of machining state. In addition, behind the machining parameters there are innumerable control parameters that are needed to realize the machining conditions ultimately expressed as physical quantities, and these control parameters are not adjusted.
Accordingly, for the adaptive control of electric discharge machines, there has been a demand for adaptive control that can obtain machining conditions that are more appropriate as physical quantities.
The present invention has been made in view of the above, and an object thereof is to obtain a machine learning device that can automatically learn more appropriate machining conditions in electric discharge machining.
In order to solve the above-described problems and achieve the object, a machine learning device of the present invention learns control parameters for controlling machining conditions in an electric discharge machine. The machine learning device includes a state observation unit that observes a plurality of state variables representing a machining state during electric discharge machining, and a learning unit that learns the control parameters based on the plurality of state variables.
The machine learning device according to the present invention has the effect of being able to automatically learn more appropriate machining conditions in electric discharge machining.
FIG. 1 is a block diagram showing the configuration of an electric discharge machine according to a first embodiment of the present invention.
FIG. 2 is a diagram in which machining conditions according to the first embodiment are classified by control purpose.
FIG. 3 is a diagram for explaining control parameters of machining conditions related to voltage control according to the first embodiment.
FIG. 4 is a diagram showing the relationship between machining conditions related to voltage control according to the first embodiment and the generation period of current pulses.
FIG. 5 is a diagram for explaining control parameters of machining conditions related to pulse control according to the first embodiment.
FIG. 6 is a diagram showing the relationship between machining conditions related to pulse control according to the first embodiment and the shape of a current pulse.
FIG. 7 is a diagram for explaining control parameters of machining conditions related to axis drive control according to the first embodiment.
FIG. 8 is a diagram showing the relationship between machining conditions related to axis drive control according to the first embodiment and inter-electrode control by axis drive.
FIG. 9 is a diagram for explaining the states of voltage pulses and current pulses according to the first embodiment.
FIG. 10 is a diagram showing a case where the voltage pulses and current pulses according to the first embodiment are stable.
FIG. 11 is a diagram showing a case where the voltage pulses and current pulses according to the first embodiment are unstable.
FIG. 12 is a diagram showing an ideal distribution of average voltage values according to the first embodiment.
FIG. 13 is a diagram showing the distribution of average voltage values when stable discharge continues according to the first embodiment.
FIG. 14 is a diagram showing the distribution of average voltage values when unstable discharge continues according to the first embodiment.
FIG. 15 is a flowchart for explaining optimization processing by learning of control parameters of machining conditions related to voltage control according to the first embodiment.
FIG. 16 is a flowchart for explaining optimization processing by learning of control parameters of machining conditions related to pulse control according to the first embodiment.
FIG. 17 is a flowchart for explaining optimization processing by learning of control parameters of machining conditions related to axis drive control according to the first embodiment.
FIG. 18 is a block diagram showing the configuration of an electric discharge machine according to a second embodiment of the present invention.
FIG. 19 is a block diagram showing the configuration of an electric discharge machine according to a third embodiment of the present invention.
FIG. 20 is a diagram showing a hardware configuration when the functions of the machine learning device according to the first to third embodiments are realized by a computer system.
Hereinafter, a machine learning device, an electric discharge machine, and a machine learning method according to embodiments of the present invention will be described in detail with reference to the drawings. Note that the present invention is not limited by these embodiments.
Embodiment 1.
FIG. 1 is a block diagram showing the configuration of an electric discharge machine 1 according to the first embodiment of the present invention. The electric discharge machine 1 includes a machining electrode 2 serving as a machining tool, a drive device 4 for controlling the distance between the machining electrode 2 and a workpiece 3, a machining power source 5 for generating an electric discharge between the machining electrode 2 and the workpiece 3, and a control device 10 that controls the drive device 4 and the machining power source 5. The workpiece 3 is connected to the machining power source 5. The drive device 4 can drive either the machining electrode 2 or the workpiece 3, or both.
The control device 10 includes an axis drive control unit 11 that controls the drive device 4, a machining power control unit 12 that controls the machining power source 5, a machining condition setting unit 15 that sets machining condition setting values, a control parameter holding unit 13 that holds control parameters corresponding to the machining conditions, and an initial parameter setting unit 14 that sets the initial values of the control parameters. A machining condition setting value is a setting value that specifies a machining condition.
A control parameter is a parameter that defines the relationship between a machining condition setting value and a machining condition; the machining condition expressed as a specific physical quantity is determined from the machining condition setting value and the control parameter. Therefore, the machining conditions of the machining pattern used when machining by generating an electric discharge between the machining electrode 2 and the workpiece 3 are determined from the machining condition setting values set by the machining condition setting unit 15 and the control parameters held in the control parameter holding unit 13. In other words, the machining conditions expressed as physical quantities in the electric discharge machine 1 are controlled by the control parameters. The user can set the machining condition setting values but cannot set or change the control parameters. The axis drive control unit 11 and the machining power control unit 12 issue commands corresponding to the machining pattern of these machining conditions, based on the information given by the machining condition setting unit 15 and the control parameter holding unit 13. As described later, the control parameters are changed, but the initial values of the control parameters first set in the control parameter holding unit 13 are set by the initial parameter setting unit 14. When a control parameter is expressed as a correspondence table between machining condition setting values and machining conditions, the initial value of the control parameter is the initial correspondence table.
The drive device 4 controls the relative distance and relative speed between the machining electrode 2 and the workpiece 3 based on the commands from the axis drive control unit 11. The machining power source 5 applies a voltage between the machining electrode 2 and the workpiece 3 based on the commands from the machining power control unit 12 and controls the current waveform during discharge.
The control device 10 further includes an input/output unit 20, a machine learning device 100, a parameter changing unit 50, and a learning result storage unit 80.
The input/output unit 20 is an input/output interface that accepts user input and supports the user's confirmation work through its display. The input/output unit 20 includes a machining condition input unit 21 that accepts the machining condition setting values the user wants the machining condition setting unit 15 to set, and a display unit 22 on which the user confirms the observed machining state.
The machine learning device 100 includes a state observation unit 30 and a learning unit 40. The state observation unit 30 includes an axis drive recognition unit 31, a pulse state recognition unit 32, and a machining state observation unit 33. The learning unit 40 includes a reward calculation unit 47 and a function update unit 48, and optimizes the control parameters by learning them.
The reward calculation unit 47 includes a first reward calculation unit 41 that calculates a reward relating to voltage control, a second reward calculation unit 43 that calculates a reward relating to pulse control, and a third reward calculation unit 45 that calculates a reward relating to axis drive control. The function update unit 48 includes a first function update unit 42 that updates a function relating to voltage control, a second function update unit 44 that updates a function relating to pulse control, and a third function update unit 46 that updates a function relating to axis drive control.
The parameter changing unit 50 includes a first control parameter changing unit 51 that changes the control parameters of machining conditions related to voltage control, a second control parameter changing unit 52 that changes the control parameters of machining conditions related to pulse control, and a third control parameter changing unit 53 that changes the control parameters of machining conditions related to axis drive control. The parameter changing unit 50 changes the control parameters held by the control parameter holding unit 13 based on the results learned by the learning unit 40.
The learning result storage unit 80 stores the learning results of the machine learning device 100.
When the electric discharge machine 1 starts electric discharge machining, the axis drive control unit 11 and the machining power control unit 12 issue commands based on the machining condition setting values output by the machining condition setting unit 15, and the operation of the drive device 4 and the machining power source 5 generates an electric discharge between the machining electrode 2 and the workpiece 3.
While electric discharge machining is in progress, the drive device 4 follows the commands of the axis drive control unit 11 and searches for the optimum relative distance at which discharge occurs while decreasing or increasing the relative distance between the machining electrode 2 and the workpiece 3. Information about the position and motion of the drive axis at this time is acquired by the axis drive recognition unit 31 and recorded in the machining state observation unit 33 as an axis behavior history.
In addition, while electric discharge machining is in progress, simultaneously with the operation of the drive device 4 described above, the machining power source 5 applies a voltage between the machining electrode 2 and the workpiece 3 in accordance with commands from the machining power control unit 12 and generates current pulses with the commanded current waveform. Based on the machining condition setting values from the machining condition setting unit 15, the machining power control unit 12 controls the voltage of the machining power source 5 so that current pulses with the commanded waveform are generated at a constant period. However, because of the physical characteristics of the process, it is impossible to generate current pulses reliably at a constant period in electric discharge machining. The generated current pulse shape may also differ from the theoretical current waveform. The pulse state recognition unit 32 acquires the current pulse generation period and current pulse shape, together with the magnitude and application period of the applied voltage from which the pulses originate and the voltage waveform indicating the voltage pulse shape, and this information is recorded in the machining state observation unit 33 as a pulse behavior history.
From the pulse behavior history and the axis behavior history, the machining state observation unit 33 obtains the distribution of voltage values over a fixed period, the current pulse generation period, and the position, velocity, and acceleration of the axis at the times current pulses occurred. The machining state observation unit 33 associates this information, obtained from electric discharge machining performed under the control parameters currently in use, with those control parameters as set in the control parameter holding unit 13, and passes it to the learning unit 40.
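As a rough illustration of how such quantities might be derived from the recorded histories, the following sketch computes pulse intervals from pulse timestamps, bins voltage samples into a histogram, and estimates axis velocity and acceleration by finite differences. The data layout, the function names, the bin width, and the use of finite differences are all assumptions made for illustration; the source does not specify how the machining state observation unit 33 computes these values.

```python
from collections import Counter
from statistics import mean


def pulse_intervals(pulse_times):
    """Intervals between consecutive current pulses (same unit as the input)."""
    return [t1 - t0 for t0, t1 in zip(pulse_times, pulse_times[1:])]


def finite_difference(values, times):
    """First-order finite difference of values with respect to times."""
    return [(v1 - v0) / (t1 - t0)
            for v0, v1, t0, t1 in zip(values, values[1:], times, times[1:])]


def summarize_state(pulse_times, axis_times, axis_positions, voltages,
                    control_parameters, bin_width=20):
    """Bundle observed quantities with the control parameters in use (hypothetical layout)."""
    velocity = finite_difference(axis_positions, axis_times)
    # Each velocity sample is attributed to the midpoint of its two time stamps.
    mid_times = [(t0 + t1) / 2 for t0, t1 in zip(axis_times, axis_times[1:])]
    acceleration = finite_difference(velocity, mid_times)
    # Assumed representation of the "distribution of voltage values": a binned histogram.
    histogram = Counter(int(v // bin_width) * bin_width for v in voltages)
    return {
        "mean_pulse_interval": mean(pulse_intervals(pulse_times)),
        "voltage_histogram": dict(histogram),
        "velocity": velocity,
        "acceleration": acceleration,
        "control_parameters": control_parameters,
    }
```

Returning the control parameters alongside the observations mirrors the association described in the text, so that a learner can relate each observation to the parameters that produced it.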
The machining conditions, and the relationship between the machining conditions and the control parameters, are described in detail below. FIG. 2 is a diagram in which machining conditions according to the first embodiment are classified by control purpose. FIG. 3 is a diagram for explaining control parameters of machining conditions related to voltage control according to the first embodiment. FIG. 4 is a diagram showing the relationship between machining conditions related to voltage control according to the first embodiment and the generation period of current pulses. FIG. 5 is a diagram for explaining control parameters of machining conditions related to pulse control according to the first embodiment. FIG. 6 is a diagram showing the relationship between machining conditions related to pulse control according to the first embodiment and the shape of a current pulse. FIG. 7 is a diagram for explaining control parameters of machining conditions related to axis drive control according to the first embodiment. FIG. 8 is a diagram showing the relationship between machining conditions related to axis drive control according to the first embodiment and inter-electrode control by axis drive.
In FIG. 2, for the machining conditions (1) type of machining circuit, (2) circuit auxiliary setting, (3) current pulse peak value, (4) current pulse length, (5) pulse pause time, (6) inter-electrode gap adjustment value, (7) jump speed, (8) jump height, (9) deepest value duration, (10) axis responsiveness, and (11) target voltage value, a black circle is placed in the corresponding column when the condition is a machining condition related to voltage control, pulse control, or axis drive control. Machining conditions related to voltage control relate to the current pulse generation period, machining conditions related to pulse control relate to the current pulse shape, and machining conditions related to axis drive control relate to inter-electrode control.
FIGS. 3, 5, and 7 show the control parameters that the control parameter holding unit 13 holds for the machining conditions whose setting values are set by the machining condition setting unit 15. Each machining condition shown in FIG. 2 relates to the current pulse generation period, the current pulse shape, or inter-electrode control, and a condition may relate to more than one of these. Therefore, changing the related machining parameters in order to change any one of the current pulse generation period, the current pulse shape, or inter-electrode control may also affect the others.
In addition, for each machining condition a notch is specified by the machining condition setting value, and a plurality of machining conditions are expressed as a notch pattern, which is the combination of the notches specified for the respective machining conditions. A notch is a step that discretely specifies the physical quantity representing a machining condition. Usually, several to several tens of notch patterns are registered in the electric discharge machine 1 in advance. Apart from the notch selected by the machining condition setting value, each machining condition has control parameters that the user cannot change. As described above, the machining condition as a specific physical quantity is determined from the machining condition setting value, which indicates the selected notch, and the control parameters. Specific examples of control parameters are the number of notch divisions and the notch distribution values. The number of notch divisions is the number of notches selectable for the machining condition. A notch distribution value is the physical quantity of the machining condition assigned to each notch. However, the control parameters are not limited to these. When the control parameters of each of the 11 machining conditions listed in FIG. 2 are regarded as variables, the total number of variables ranges from several tens to several hundreds.
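The relationship between notches, notch patterns, and control parameters can be pictured with a small sketch. It is only an illustration under assumed names and values: the `ControlParameter` class, the two condition names, and every number are hypothetical, and only the notions of a number of notch divisions and per-notch distribution values come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlParameter:
    """Hypothetical container holding one physical value per selectable notch."""
    notch_distribution: tuple

    @property
    def notch_division_count(self):
        # The number of selectable notches equals the number of assigned values.
        return len(self.notch_distribution)


# Hypothetical control parameters for two machining conditions (values are assumed).
control_parameters = {
    "pulse_pause_time_us": ControlParameter((10, 20, 40, 80)),
    "jump_height_mm": ControlParameter((0.5, 1.0, 2.0)),
}


def resolve(notch_pattern, params):
    """Map a notch pattern (condition name -> notch index) to physical quantities."""
    return {name: params[name].notch_distribution[notch]
            for name, notch in notch_pattern.items()}


def variable_count(params):
    """Count one variable for the division count plus one per distribution value."""
    return sum(1 + p.notch_division_count for p in params.values())
```

Under this counting convention, which is itself an assumption, two conditions with four and three notches already contribute nine variables, which gives a feel for how 11 conditions can reach tens to hundreds of variables.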
FIG. 3 illustrates the control parameters of the machining conditions related to voltage control. FIG. 4 outlines how the machining conditions related to voltage control, namely (4) current pulse length, (5) pulse pause time, (6) inter-electrode gap adjustment value, (9) deepest value duration, (10) axis responsiveness, and (11) target voltage value, relate to the generation period of the current pulses. (4) Current pulse length and (5) pulse pause time are the machining conditions indicated by the widths marked with arrows in FIG. 4, while (6) inter-electrode gap adjustment value, (9) deepest value duration, (10) axis responsiveness, and (11) target voltage value are machining conditions related to the generation period of the current pulses.
As a specific example, the control parameters of (4) current pulse length are the notch division number and the notch distribution values, which serve as length control parameters. Suppose that a certain notch division number and certain notch distribution values are set as the control parameters for the current pulse length. These control parameters then define a correspondence such as the following: a current pulse length of 2 μsec for the notch designated by machining condition setting value 0, 4 μsec for the notch designated by machining condition setting value 1, and 8 μsec for the notch designated by machining condition setting value 2. When the control parameters of the current pulse length are changed, this correspondence changes, so the current pulse length obtained for the same machining condition setting value changes. Depending on the changed notch distribution values, however, the machining condition values need not change for every machining condition setting value.
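The correspondence between machining condition setting values and physical values can be pictured with a small Python sketch. The class name `NotchControlParameter`, its fields, and the retuned value of 5 μsec are illustrative assumptions and do not appear in the specification; only the 2, 4, and 8 μsec assignments come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NotchControlParameter:
    """Hypothetical container for the control parameters of one machining condition."""
    division_number: int   # notch division number: how many notches can be selected
    distribution: tuple    # notch distribution values: physical value per notch

    def __post_init__(self):
        if len(self.distribution) != self.division_number:
            raise ValueError("one distribution value is needed per notch")

    def value_for(self, setting_value: int) -> float:
        """Return the physical quantity selected by a machining condition setting value."""
        if not 0 <= setting_value < self.division_number:
            raise ValueError("setting value does not select a registered notch")
        return self.distribution[setting_value]


# Current pulse length in microseconds, as in the example in the text.
pulse_length_us = NotchControlParameter(3, (2.0, 4.0, 8.0))

# Assumed retuning: only the notch for setting value 1 is reassigned, so
# setting values 0 and 2 keep their previous physical values.
retuned_pulse_length_us = NotchControlParameter(3, (2.0, 5.0, 8.0))
```

In this sketch the user still only picks a setting value such as 1; changing the control parameters changes which physical value that same setting value produces.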
FIG. 5 illustrates the control parameters of the machining conditions related to pulse control. FIG. 6 outlines how the machining conditions related to pulse control, namely (1) machining circuit type, (2) circuit auxiliary setting, (3) current pulse peak value, (4) current pulse length, (6) inter-electrode gap adjustment value, and (11) target voltage value, relate to the shape of the current pulses. When the circuit call parameter of (1) machining circuit type is changed, the machining circuit changes and so does the shape of the current pulse. (2) Circuit auxiliary setting defines the slope of the rising edge of the current pulse. (3) Current pulse peak value defines the peak value of the current pulse. (4) Current pulse length defines the length of the current pulse. (6) Inter-electrode gap adjustment value and (11) target voltage value are machining conditions related to the interval between current pulses.
As a specific example, the control parameters of (3) current pulse peak value are the notch division number and the notch distribution values, which serve as peak control parameters. Suppose that a certain notch division number and certain notch distribution values are set as the control parameters for the current pulse peak value Ip. These control parameters then define a correspondence such as the following: Ip = 1 A for the notch designated by machining condition setting value 0, Ip = 2 A for the notch designated by machining condition setting value 1, and Ip = 4 A for the notch designated by machining condition setting value 2. When the control parameters of the current pulse peak value Ip are changed, this correspondence changes, so the value of Ip obtained for the same machining condition setting value changes. Depending on the changed notch distribution values, however, the machining condition values need not change for every machining condition setting value.
FIG. 7 illustrates the control parameters of the machining conditions related to axis drive control. FIG. 8 outlines how the machining conditions related to axis drive control, namely (6) inter-electrode gap adjustment value, (7) jump speed, (8) jump height, (9) deepest value duration, (10) axis responsiveness, and (11) target voltage value, relate to axis drive control. (6) Inter-electrode gap adjustment value, (9) deepest value duration, (10) axis responsiveness, and (11) target voltage value are machining conditions related to the approach operation between the machining electrode 2 and the workpiece 3. (7) Jump speed, (8) jump height, and (10) axis responsiveness are machining conditions related to the retraction of the machining electrode 2 from the workpiece 3, including the jump operation of the drive axis.
Next, the stability and instability of the voltage pulses and current pulses in electric discharge machining are described. FIG. 9 is a diagram illustrating the states of the voltage pulses and current pulses according to the first embodiment. FIG. 10 is a diagram showing a case where the voltage pulses and current pulses according to the first embodiment are stable. FIG. 11 is a diagram showing a case where the voltage pulses and current pulses according to the first embodiment are unstable. In FIGS. 9 to 11, the upper trace shows the voltage waveform and the lower trace shows the current waveform.
When a voltage is applied between the machining electrode 2 and the workpiece 3, dielectric breakdown occurs at an unpredictable time and a current flows. When the ideal voltage-current relationship that allows stable machining is established, a current pulse is generated that is close to a rectangular wave with a fixed slope, shaped by a transistor circuit or the like. This current pulse is shown in FIG. 9 as a stable discharge. When this ideal relationship is not satisfied, either the current waveform deviates from the ideal shape, as in the unstable discharge of FIG. 9, or an irregularly shaped current waveform that does not contribute to machining appears across the gap, as in the abnormal discharge of FIG. 9.
In controlling electric discharge machining, the average voltage value per fixed time during discharge is observed and used as one index for controlling the relative distance between the electrodes. When the ideal voltage-current relationship is maintained, the average voltage value is held at the theoretical value as shown in FIG. 10 and stable discharge continues. When the unstable discharge or abnormal discharge of FIG. 9 is repeated, however, the average voltage value fluctuates from the theoretical value as shown in FIG. 11 and unstable discharge continues. If the inter-electrode distance falls to zero and the machining electrode 2 touches the workpiece 3, a short-circuit state results; if the machining electrode 2 moves so far from the workpiece 3 that no discharge occurs, an open state results. For this reason, a deviation of the average voltage value from the theoretical value does not by itself determine whether the discharge pulses are stable or unstable. Moreover, even when the stable current pulse pattern of FIG. 10 continues under ideal conditions, the discharge period is not constant, because each dielectric breakdown is preceded by an unpredictable interval called the no-load voltage time. An increase or decrease in the discharge period is therefore an index independent of machining stability.
FIG. 12 is a diagram showing the ideal distribution of the average voltage value according to the first embodiment. FIG. 13 is a diagram showing the distribution of the average voltage value when stable discharge continues according to the first embodiment. FIG. 14 is a diagram showing the distribution of the average voltage value when unstable discharge continues according to the first embodiment. In FIGS. 12 to 14, the horizontal axis shows the average voltage value per fixed time during discharge, and the vertical axis shows the number of pulses per fixed time.
When the ideal voltage-current relationship is maintained, the distribution consists of the prescribed number of pulses located at the theoretical value of the average voltage, as shown in FIG. 12. In actual machining, because of the physical phenomena involved, the average voltage values are distributed around the target voltage value, which is the machining condition that specifies the desired voltage, and the number of pulses is distributed as well. The target voltage value need not equal the theoretical value. When machining is stable and stable discharge continues as in FIG. 10, the spread of the average voltage value is small and the number of pulses peaks at the target voltage value, as shown in FIG. 13. When machining is unstable and unstable discharge continues as in FIG. 11, the average voltage values scatter widely around the target voltage value and the number of pulses also varies, as shown in FIG. 14.
The pulse state recognition unit 32 judges the quality of the distribution of the voltage during discharge over a certain period and thereby determines whether the pulses are stable or unstable. As one example, the pulse state recognition unit 32 determines whether the voltage pulses and current pulses are stable or unstable from the relationship among the average voltage value obtained from the machining power supply control unit 12, the target voltage value, and a voltage threshold. Specifically, based on the command of the machining power supply control unit 12, the pulse state recognition unit 32 generates a pulse instability signal whenever the absolute value of the deviation of the average voltage value per fixed time during discharge from the target voltage value exceeds the voltage threshold, and obtains as the first state value the number of instability signals accumulated over a predetermined period longer than that fixed time. The pulse state recognition unit 32 also obtains as the second state value the number of pulses generated in the predetermined period, derived from the command of the machining power supply control unit 12. The predetermined period can be, for example, the operating time from the end of one retraction operation, called a jump operation, through the inter-electrode position control that produces discharge, until the next jump operation is performed. The axis drive recognition unit 31 obtains as the third state value the axis feed amount of the drive device 4, derived from the command of the axis drive control unit 11. The third state value is set so that it becomes a larger positive value as the axis feed in the machining direction increases and a larger negative value as the axis feed in the retreating direction increases. The first, second, and third state values are state variables that each represent the machining state during electric discharge machining. The machining state observation unit 33 displays these acquired state variables on the display unit 22 in a form the user can inspect visually, such as a distribution chart or a bar-graph histogram. In this way, the state observation unit 30 observes the first, second, and third state values as a plurality of state variables. The first reward calculation unit 41, the second reward calculation unit 43, and the third reward calculation unit 45 of the learning unit 40 then calculate rewards based on the first, second, and third state values acquired by the machining state observation unit 33.
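As a rough illustration of how the first and second state values might be derived, the following Python sketch counts instability signals and pulses over one predetermined period. The function name, the argument layout (one average voltage and one pulse count per fixed time window), and all numeric values are assumptions made for the example; the specification only defines the threshold comparison and the accumulation.

```python
def first_and_second_state(avg_voltages, pulse_counts, target_voltage, voltage_threshold):
    """Return (first state value, second state value) for one predetermined period.

    avg_voltages: average voltage during discharge for each fixed time window
    pulse_counts: number of pulses observed in each of those windows
    """
    # An instability signal is raised for every window whose average voltage
    # deviates from the target voltage by more than the threshold.
    unstable_signals = sum(
        1 for v in avg_voltages if abs(v - target_voltage) > voltage_threshold
    )
    # The second state value is the total number of pulses in the period.
    total_pulses = sum(pulse_counts)
    return unstable_signals, total_pulses


# Assumed sample data for one period between two jump operations.
first_state, second_state = first_and_second_state(
    avg_voltages=[60.0, 75.0, 58.0, 40.0],
    pulse_counts=[10, 8, 11, 3],
    target_voltage=60.0,
    voltage_threshold=10.0,
)
```

With these assumed inputs, two windows (75 V and 40 V) exceed the 10 V threshold, so the first state value is 2 and the second state value is 32.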
The machine learning device 100, which includes the state observation unit 30, the learning unit 40, and the parameter changing unit 50, may use any learning algorithm. As one example, the case where reinforcement learning is applied is described.
In reinforcement learning, an agent acting in an environment observes the current state and decides which action to take. The agent obtains a reward from the environment by selecting an action and learns a policy that maximizes the reward obtained through a series of actions. Q-learning and TD-learning are known as representative reinforcement learning methods. In Q-learning, for example, a general update formula for the action value function Q(s, a) is expressed by formula (1) below. The action value function Q(s, a) is also called an action value table.
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right) \tag{1}$$
In formula (1), $s_t$ represents the state at time $t$ and $a_t$ represents the action at time $t$. The action $a_t$ changes the state to $s_{t+1}$. $r_{t+1}$ represents the reward obtained through that change of state, $\gamma$ represents the discount rate, and $\alpha$ represents the learning coefficient.
The update expressed by formula (1) in Q-learning increases the action value Q at time $t$ if the action value of the best action $a$ at time $t+1$ is greater than the action value Q of the action $a_t$ executed at time $t$, and decreases it in the opposite case. In other words, the action value function $Q(s_t, a_t)$ is updated so that the action value Q of the action $a_t$ at time $t$ approaches the best action value at time $t+1$. As a result, the best action value in a given environment propagates successively to the action values in the preceding environments.
Therefore, in the operation of the machine learning device 100 described below, if the action of changing a control parameter is taken as the action $a_t$ at time $t$ and the first, second, and third states are taken as the state $s_t$ at time $t$, the device can be understood to be performing Q-learning.
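The general Q-learning update described above can be sketched in a few lines of Python using a table keyed by (state, action) pairs. This is a generic tabular form under assumed names: the action labels, the state labels, and the values chosen for the learning coefficient and discount rate are illustrative and are not taken from the specification.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Best action value available from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Move Q(state, action) toward reward + gamma * best_next by a fraction alpha.
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


# Action value table with every entry initialised to 0.0.
q = defaultdict(float)

# Hypothetical actions: how a control parameter is changed in one step.
actions = ("decrease", "keep", "increase")
```

In the setting of this document, `state` would stand for a discretised combination of the first, second, and third state values, and `action` for a change applied to a control parameter; how those are encoded is left open here.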
The operation by which the machine learning device 100 optimizes the control parameters is described below.
FIG. 15 is a flowchart illustrating the learning-based optimization of the control parameters of the machining conditions related to voltage control according to the first embodiment. The control parameters of the machining conditions related to voltage control are variable values for performing the voltage control that underlies the target voltage value set as a machining condition. They therefore include not only the magnitude of the voltage but also the shape of the voltage waveform and the voltage reference value, called the reference voltage, used to detect discharge. Optimizing these control parameters also involves processing such as changing the initial notch pattern of the voltage during the no-load voltage time and of the target voltage value, which are set as control parameters, to a different notch pattern.
The first reward calculation unit 41, which calculates the reward for voltage control, calculates the amount of change in the reward based on the first and second state values, which are the state variables obtained by the pulse state recognition unit 32. As long as the first reward calculation unit 41 calculates the change so that the reward increases when the first state value becomes smaller and the second state value becomes larger, there is no restriction on how the first and second state values are used. Specifically, the reward is increased when the first state value decreases and reduced when it increases. In addition, the reward is increased when the second state value increases and reduced when it decreases. Beyond the basic criterion of increasing the reward when the number of unstable pulses falls and the number of stable pulses rises, the reward calculation may also be defined so that the reward decreases when the number of stable pulses falls even though the number of unstable pulses has fallen.
Based on the reward calculated by the first reward calculation unit 41, the first function update unit 42 updates the action value function Q, which is the function used to determine the control parameters related to voltage control. Based on the updated action value function Q, the first control parameter changing unit 51 changes the control parameters of the machining conditions related to voltage control so that they become the control parameters yielding the largest reward.
With the above in mind, the optimization of the six control parameters of the machining conditions related to voltage control shown in FIG. 3 is described with reference to FIG. 15. The process of FIG. 15 is executed while the electric discharge machine 1 continues electric discharge machining. The description assumes that a priority order is assigned to the control parameters to be changed, but the six control parameters may instead be optimized simultaneously.
Assume that the first reward calculation unit 41 already holds an initial value of the reward for voltage control before the flowchart of FIG. 15 is executed. The initial value may be any fixed value, for example 0. First, the state observation unit 30 observes information from the machining power supply control unit 12 while machining is performed with the current machining conditions and control parameters (step S101). Specifically, the state observation unit 30 acquires the command of the machining power supply control unit 12 during machining. Based on that command, the pulse state recognition unit 32 calculates the first state value and the second state value (step S102). Next, the first and second state values, which are the state variables obtained by the pulse state recognition unit 32, are passed from the machining state observation unit 33 to the first reward calculation unit 41. They are passed in association with the control parameters that are set in the control parameter holding unit 13 and currently in use.
The first reward calculation unit 41 then compares the given first state value with the previous first state value (step S103). The first reward calculation unit 41 holds the first state value given last time and can compare it with the value given this time. If the first state value is smaller than the previous one (step S103: smaller), the first reward calculation unit 41 increases the reward (step S104). That is, the reward is increased when the first state value indicates a more stable state than before. The amount of the increase is a predetermined value. If the first state value equals the previous one (step S103: same), the first reward calculation unit 41 leaves the reward unchanged (step S105). If the first state value is larger than the previous one (step S103: larger), the first reward calculation unit 41 reduces the reward (step S106). That is, the reward is reduced when the first state value indicates a less stable state than before. The amount of the reduction is a predetermined value. When step S103 is executed for the first time, no previous first state value exists, so the process proceeds to step S105.
Next, the first reward calculation unit 41 compares the given second state value with the previous second state value (step S107). The first reward calculation unit 41 holds the second state value given last time and can compare it with the value given this time. If the second state value is larger than the previous one (step S107: larger), the first reward calculation unit 41 increases the reward (step S108). That is, the reward is increased when the second state value indicates a more stable state than before. The amount of the increase is a predetermined value. If the second state value equals the previous one (step S107: same), the first reward calculation unit 41 leaves the reward unchanged (step S109). If the second state value is smaller than the previous one (step S107: smaller), the first reward calculation unit 41 reduces the reward (step S110). That is, the reward is reduced when the second state value indicates a less stable state than before. The amount of the reduction is a predetermined value. When step S107 is executed for the first time, no previous second state value exists, so the process proceeds to step S109.
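Steps S103 to S110 amount to two three-way comparisons against the previous observation. A minimal Python sketch follows, assuming a single fixed increment `step` for both increases and reductions; the function name, the use of `None` for a missing previous value, and the sample numbers are illustrative choices rather than details from the specification.

```python
def update_reward(reward, first, second, prev_first=None, prev_second=None, step=1.0):
    """Return the reward after comparing the state values with the previous ones."""
    # Steps S103 to S106: fewer instability signals than last time is better.
    if prev_first is not None:
        if first < prev_first:
            reward += step      # S104: more stable, increase the reward
        elif first > prev_first:
            reward -= step      # S106: less stable, reduce the reward
        # Equal values leave the reward unchanged (S105).

    # Steps S107 to S110: more pulses than last time is better.
    if prev_second is not None:
        if second > prev_second:
            reward += step      # S108
        elif second < prev_second:
            reward -= step      # S110
        # Equal values leave the reward unchanged (S109).

    # On the first pass no previous values exist, so the reward is returned as is.
    return reward


# Assumed example: instability signals fall from 5 to 3, pulses rise from 100 to 120.
improved = update_reward(0.0, first=3, second=120, prev_first=5, prev_second=100)
```

Because the two comparisons are independent, a period with fewer instability signals but also fewer pulses leaves the reward unchanged in this sketch, which is one way to realise the optional criterion mentioned for the first reward calculation unit 41.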
The first function update unit 42 then updates the action value function Q according to formula (1), based on the reward calculated by the first reward calculation unit 41 (step S111). The first function update unit 42 further determines whether the action value function Q has converged, that is, whether step S111 no longer changes it (step S112). If the action value function Q is determined not to have converged (step S112: No), the first control parameter changing unit 51 changes a control parameter of the machining conditions related to voltage control based on the action value function Q updated in step S111 (step S113). After step S113, the process returns to step S101. If the action value function Q is determined to have converged (step S112: Yes), the learning unit 40 determines whether all the control parameters of the machining conditions related to voltage control have been changed by the first control parameter changing unit 51 (step S114). If not all of them have been changed (step S114: No), the control parameter targeted for change by the first control parameter changing unit 51 in step S113 is switched to another control parameter (step S115). The control parameter newly targeted in step S115 is a control parameter related to voltage control that has not yet been changed. After step S115, the process proceeds to step S113.
The change of the control parameters of the machining conditions related to voltage control in step S113 is described in detail below. As noted above, a priority order for change is defined for the six control parameters of the machining conditions related to voltage control shown in FIG. 3 that are changed in step S113. When step S113 is first entered, the first control parameter changing unit 51 changes the voltage control parameter, which is the control parameter of the target voltage value. Thereafter, each time the action value function Q is determined to have converged in step S112, the control parameter targeted by the first control parameter changing unit 51 is switched in step S115 in the following order: the GAIN control parameter, which is the control parameter of axis responsiveness; the length control parameter, which is the control parameter of pulse pause time; the gap control parameter, which is the control parameter of inter-electrode gap adjustment value; the length control parameter, which is the control parameter of deepest value duration; and the length control parameter, which is the control parameter of current pulse length.
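The priority-ordered procedure can be summarised as an outer loop over the six control parameters, where each parameter is tuned until the action value function converges before the next one becomes the target. The Python sketch below is a simplified outline under stated assumptions: the string keys in `PRIORITY` are made-up labels for the parameters named above, and `learn_until_converged` is a hypothetical callback standing in for the inner loop of steps S101 to S113.

```python
# Assumed labels for the six voltage-control parameters, in the stated priority order.
PRIORITY = (
    "target_voltage.voltage_control",
    "axis_responsiveness.gain_control",
    "pulse_pause_time.length_control",
    "gap_adjustment.gap_control",
    "deepest_value_duration.length_control",
    "current_pulse_length.length_control",
)


def optimize_in_priority_order(params, learn_until_converged, order=PRIORITY):
    """Tune one control parameter at a time and return (final params, history).

    learn_until_converged(name, params) is a hypothetical callback that runs the
    observe / reward / update / change loop for one parameter and returns the
    value it settled on once the action value function has converged.
    """
    params = dict(params)          # work on a copy; the caller's dict is untouched
    history = []
    for name in order:             # switching to the next target corresponds to S115
        params[name] = learn_until_converged(name, params)
        history.append((name, params[name]))
    return params, history         # what would be stored as the result in S116
```

Running the loop with a stub callback shows the order in which parameters become the change target. For the simultaneous variant, the outer loop would collapse into a single call that adjusts all six parameters at once.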
When the learning unit 40 determines that the action value function Q has converged and that all of the control parameters of the machining conditions related to voltage control have been changed (step S114: Yes), the learning-based optimization of these control parameters ends, and the learning result is stored in the learning result storage unit 80 (step S116). The learning result includes, in addition to the control parameters finally determined through the changes in step S113, the values each control parameter took during the change process and the first-state and second-state values corresponding to the control parameters. The learning result stored in the learning result storage unit 80 can be used to judge whether the machining was better or worse before and after a control parameter change. The control parameters finally determined in this way are those that obtained the highest reward during the learning, and they are held in the control parameter holding unit 13 as the optimal control parameters for the given machining condition setting values. Optimizing the control parameters of the machining conditions related to voltage control by learning makes it possible to prevent unstable signals from occurring between the start and end of machining and to maximize the number of stable-signal pulses. As described above, when the six types of control parameters of the machining conditions related to voltage control are optimized simultaneously, the first control parameter changing unit 51 changes the six types of control parameters simultaneously in step S113, based on the action value function Q updated in step S111. In this case, steps S114 and S115 are unnecessary, and when the action value function Q is determined to have converged in step S112 (step S112: Yes), the process may proceed directly to step S116.
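The prioritized, one-parameter-at-a-time flow of steps S112 to S116 can be sketched as follows. The parameter names in `VOLTAGE_PRIORITY` are illustrative labels for the six parameters listed above, and `tune_until_converged` is a hypothetical callback standing in for the inner loop of steps S101 to S113; neither name comes from the source.

```python
# Illustrative labels for the six voltage-control parameters, in priority order.
VOLTAGE_PRIORITY = [
    "target_voltage",          # voltage control parameter (changed first)
    "axis_gain",               # GAIN control parameter
    "pulse_pause_time",        # length control parameter
    "gap_adjustment",          # gap control parameter
    "deepest_value_duration",  # length control parameter
    "current_pulse_length",    # length control parameter
]

def optimize_by_priority(priority, tune_until_converged):
    """Tune one parameter at a time, in priority order.

    tune_until_converged(name) is assumed to repeat observe/reward/update/change
    for that parameter until Q converges (step S112: Yes) and return its value.
    """
    learning_result = {}
    for name in priority:                      # S115: switch to next unchanged parameter
        learning_result[name] = tune_until_converged(name)
    return learning_result                     # S114: Yes, then stored in S116

# Usage with a stub tuner that records the order in which it is called.
visited = []
def stub_tuner(name):
    visited.append(name)
    return 0.0

result = optimize_by_priority(VOLTAGE_PRIORITY, stub_tuner)
```

For the simultaneous variant described above, the loop over `priority` would collapse into a single call that adjusts all six parameters together, which is why steps S114 and S115 drop out.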
FIG. 16 is a flowchart illustrating the learning-based optimization of the control parameters of the machining conditions related to pulse control according to the first embodiment. The control parameters of the machining conditions related to pulse control are variable values for performing current pulse control, such as the pulse slope and the abnormal discharge detection threshold on which the theoretical value of the pulse generation period is based. They include not only the magnitude and width of the current pulse but also the shape of the current waveform and the inter-electrode gap adjustment value, which adjusts the relative distance between the machining electrode 2 and the workpiece 3 so as to bring the current value closer to its ideal shape. Optimizing the control parameters of the machining conditions related to pulse control also involves processing such as changing the initial notch pattern of current magnitude and width, which is set as a control parameter, to another notch pattern.
The second reward calculation unit 43, which calculates the reward for pulse control, calculates the reward based on the first-state and second-state values, which are the state variables obtained by the pulse state recognition unit 32. The second reward calculation unit 43 calculates the reward in the same way as the first reward calculation unit 41.
Based on the reward calculated by the second reward calculation unit 43, the second function update unit 44 updates the action value function Q, which is the function used to determine the control parameters related to pulse control. Based on the updated action value function Q, the second control parameter changing unit 52 changes the control parameters of the machining conditions related to pulse control so that they become the control parameters that yield the highest reward.
Based on the above, the optimization of the six types of control parameters of the machining conditions related to pulse control shown in FIG. 5 is described with reference to FIG. 16. The flow of FIG. 16 is executed while the electric discharge machine 1 is continuously performing electric discharge machining. The description assumes that a priority order is set for the control parameters to be changed, but the six types of control parameters may instead be optimized simultaneously.
Assume that, before the flowchart of FIG. 16 is executed, the second reward calculation unit 43 holds an initial value of the reward for pulse control. The initial value is not restricted as long as it is a fixed value, and may be 0. First, the state observation unit 30 observes information from the machining power supply control unit 12 while machining is performed under the current machining conditions and control parameters (step S201). Specifically, the state observation unit 30 acquires the commands of the machining power supply control unit 12 during machining. Based on these commands, the pulse state recognition unit 32 calculates the first-state value and the second-state value (step S202). Next, the first-state and second-state values, which are the state variables obtained by the pulse state recognition unit 32, are supplied from the machining state observation unit 33 to the second reward calculation unit 43. When supplied, these values are associated with the control parameters currently in use, as set in the control parameter holding unit 13.
Then, the second reward calculation unit 43 compares the supplied first-state value with the previous first-state value (step S203). The second reward calculation unit 43 holds the previously supplied first-state value and can therefore compare it with the value supplied this time. If the first-state value is smaller than the previous first-state value (step S203: smaller), the second reward calculation unit 43 increases the reward (step S204). The amount of the increase is a predetermined value. If the first-state value is the same as the previous first-state value (step S203: same), the second reward calculation unit 43 leaves the reward unchanged (step S205). If the first-state value is larger than the previous first-state value (step S203: larger), the second reward calculation unit 43 decreases the reward (step S206). The amount of the decrease is a predetermined value. When step S203 is executed for the first time, no previous first-state value exists, so the process proceeds to step S205.
Next, the second reward calculation unit 43 compares the supplied second-state value with the previous second-state value (step S207). The second reward calculation unit 43 holds the previously supplied second-state value and can therefore compare it with the value supplied this time. If the second-state value is larger than the previous second-state value (step S207: larger), the second reward calculation unit 43 increases the reward (step S208). The amount of the increase is a predetermined value. If the second-state value is the same as the previous second-state value (step S207: same), the second reward calculation unit 43 leaves the reward unchanged (step S209). If the second-state value is smaller than the previous second-state value (step S207: smaller), the second reward calculation unit 43 decreases the reward (step S210). The amount of the decrease is a predetermined value. When step S207 is executed for the first time, no previous second-state value exists, so the process proceeds to step S209.
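The two comparisons of steps S203 to S210 can be sketched as a single function. The function name, the use of `None` for "no previous value yet", and the common increment `step` are assumptions made for illustration; the source states only that the increase and decrease amounts are predetermined.

```python
def pulse_reward(reward, first, second, prev_first=None, prev_second=None, step=1.0):
    """Return the reward after the comparisons of steps S203 to S210.

    A smaller first-state value and a larger second-state value both raise
    the reward. With no previous value (first pass) the reward is unchanged.
    """
    # Steps S203 to S206: compare the first-state value with the previous one.
    if prev_first is not None:
        if first < prev_first:
            reward += step      # S204
        elif first > prev_first:
            reward -= step      # S206
        # equal: S205, no change
    # Steps S207 to S210: compare the second-state value with the previous one.
    if prev_second is not None:
        if second > prev_second:
            reward += step      # S208
        elif second < prev_second:
            reward -= step      # S210
        # equal: S209, no change
    return reward

first_pass = pulse_reward(0.0, first=5, second=10)
improved = pulse_reward(0.0, first=3, second=12, prev_first=5, prev_second=10)
worsened = pulse_reward(0.0, first=7, second=8, prev_first=5, prev_second=10)
```

Here `first_pass` stays at 0.0, `improved` becomes 2.0, and `worsened` becomes -2.0, since each of the two comparisons moves the reward by one `step`.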
Then, the second function update unit 44 updates the action value function Q according to Formula (1), based on the reward calculated by the second reward calculation unit 43 (step S211). The second function update unit 44 further determines whether the action value function Q has converged, that is, whether the update in step S211 no longer changes it (step S212). If it is determined that the action value function Q has not converged (step S212: No), the second control parameter changing unit 52 changes a control parameter of the machining conditions related to pulse control, based on the action value function Q updated in step S211 (step S213). After step S213, the process returns to step S201. If it is determined that the action value function Q has converged (step S212: Yes), the learning unit 40 determines whether all of the control parameters of the machining conditions related to pulse control have been changed by the second control parameter changing unit 52 (step S214). If it is determined that not all of these control parameters have been changed (step S214: No), the control parameter targeted for change by the second control parameter changing unit 52 in step S213 is switched to another control parameter (step S215). The control parameter newly targeted in step S215 is a control parameter related to pulse control that has not yet been changed. After step S215, the process proceeds to step S213.
The change of the control parameters of the machining conditions related to pulse control in step S213 is described in detail below. As described above, a priority order for change is defined for the six types of control parameters of the machining conditions related to pulse control shown in FIG. 5, which are changed in step S213. When step S213 is first entered, the second control parameter changing unit 52 changes the voltage control parameter, which is the control parameter for the target voltage value. Thereafter, each time the action value function Q is determined to have converged in step S212, the control parameter targeted for change by the second control parameter changing unit 52 is switched in step S215 in the following order: the gap control parameter, which is the control parameter for the inter-electrode gap adjustment value; the pulse slope control parameter, which is the control parameter for the auxiliary circuit setting; the length control parameter, which is the control parameter for the current pulse length; the peak control parameter, which is the control parameter for the current pulse peak value; and the circuit call parameter, which is the control parameter for the type of machining circuit.
When the learning unit 40 determines that the action value function Q has converged and that all of the control parameters of the machining conditions related to pulse control have been changed (step S214: Yes), the learning-based optimization of these control parameters ends, and the learning result is stored in the learning result storage unit 80 (step S216). The learning result includes, in addition to the control parameters finally determined through the changes in step S213, the values each control parameter took during the change process and the first-state and second-state values corresponding to the control parameters. The learning result stored in the learning result storage unit 80 can be used to judge whether the machining was better or worse before and after a control parameter change. The control parameters finally determined in this way are those that obtained the highest reward during the learning, and they are held in the control parameter holding unit 13 as the optimal control parameters for the given machining condition setting values. Optimizing the control parameters of the machining conditions related to pulse control by learning makes it possible to prevent unstable signals from occurring between the start and end of machining and to maximize the number of stable-signal pulses. As described above, when the six types of control parameters of the machining conditions related to pulse control are optimized simultaneously, the second control parameter changing unit 52 changes the six types of control parameters simultaneously in step S213, based on the action value function Q updated in step S211. In this case, steps S214 and S215 are unnecessary, and when the action value function Q is determined to have converged in step S212 (step S212: Yes), the process may proceed directly to step S216.
FIG. 17 is a flowchart illustrating the learning-based optimization of the control parameters of the machining conditions related to axis drive control according to the first embodiment. These control parameters, also called inter-electrode control parameters, are variable values for changing the axis drive behavior of the electric discharge machine 1, such as the deceleration distance used when bringing the machining electrode 2 close to the workpiece 3 and the speed and acceleration parameters that generate the instantaneous retraction motion called a jump operation. Changes to the inter-electrode control parameters include not only changes to the axis responsiveness for generating stable discharge across the gap, but also changes to the jump operation for cleaning the gap and parameter changes for preventing excitation at the natural frequency caused by over-response of the axis.
The third reward calculation unit 45, which calculates the reward for axis drive control, calculates the reward based on the second-state value, which is a state variable obtained by the pulse state recognition unit 32, and the third-state value, which is a state variable obtained by the axis drive recognition unit 31. As long as the third reward calculation unit 45 calculates the change in reward so that the reward increases when the second-state value and the third-state value become larger, there is no restriction on how these values are used to obtain the change in reward. Specifically, the reward is increased when the second-state value increases and decreased when it decreases. In addition, the reward is increased when the third-state value increases and decreased when it decreases. The reward calculation may also be defined so that the reward increases if the number of discharge pulses increases even when the axis feed amount is unchanged, and decreases if the number of discharge pulses decreases even when the axis feed amount increases in the machining progress direction.
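One way to realize the optional rule in the last sentence is to weight the pulse-count comparison more heavily than the feed comparison. The sketch below assumes that the second-state value tracks the number of discharge pulses and the third-state value tracks the axis feed amount in the machining progress direction; the weights `pulse_step` and `feed_step` and the function names are hypothetical.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def axis_reward_delta(second, third, prev_second, prev_third,
                      pulse_step=2.0, feed_step=1.0):
    """Change in reward for axis drive control.

    Because pulse_step > feed_step, a drop in pulse count lowers the reward
    even when the feed amount grew, and a rise in pulse count raises the
    reward even when the feed amount is unchanged.
    """
    return (pulse_step * sign(second - prev_second)
            + feed_step * sign(third - prev_third))

more_pulses_same_feed = axis_reward_delta(12, 5, 10, 5)
fewer_pulses_more_feed = axis_reward_delta(8, 6, 10, 5)
```

With equal weights the second case would net out to zero, so an unequal weighting (or an explicit override) is needed to obtain the decrease described in the text.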
Based on the reward calculated by the third reward calculation unit 45, the third function update unit 46 updates the action value function Q, which is the function used to determine the control parameters related to axis drive control. Based on the updated action value function Q, the third control parameter changing unit 53 changes the control parameters of the machining conditions related to axis drive control so that they become the control parameters that yield the highest reward.
Based on the above, the optimization of the five types of control parameters of the machining conditions related to axis drive control shown in FIG. 7 is described with reference to FIG. 17. The flow of FIG. 17 is executed while the electric discharge machine 1 is continuously performing electric discharge machining. The description assumes that a priority order is set for the control parameters to be changed, but the five types of control parameters may instead be optimized simultaneously.
Assume that, before the flowchart of FIG. 17 is executed, the third reward calculation unit 45 already holds an initial value of the reward for axis drive control. The initial value is not restricted as long as it is a fixed value, and may be 0. First, the state observation unit 30 observes information from the machining power supply control unit 12 while machining is performed under the current machining conditions and control parameters (step S301). Specifically, the state observation unit 30 acquires the commands of the machining power supply control unit 12 during machining. Based on these commands, the pulse state recognition unit 32 calculates the second-state value (step S303). The state observation unit 30 also observes information from the axis drive control unit 11 while machining is performed under the current machining conditions and control parameters (step S302). Specifically, the state observation unit 30 acquires the commands of the axis drive control unit 11 during machining. Based on these commands, the axis drive recognition unit 31 calculates the third-state value (step S304). Next, the second-state value obtained by the pulse state recognition unit 32 and the third-state value obtained by the axis drive recognition unit 31 are supplied from the machining state observation unit 33 to the third reward calculation unit 45. When supplied, these values are associated with the control parameters currently in use, as set in the control parameter holding unit 13.
Then, the third reward calculation unit 45 compares the supplied second-state value with the previous second-state value (step S305). The third reward calculation unit 45 holds the previously supplied second-state value and can therefore compare it with the value supplied this time. If the second-state value is larger than the previous second-state value (step S305: larger), the third reward calculation unit 45 increases the reward (step S306). The amount of the increase is a predetermined value. If the second-state value is the same as the previous second-state value (step S305: same), the third reward calculation unit 45 leaves the reward unchanged (step S307). If the second-state value is smaller than the previous second-state value (step S305: smaller), the third reward calculation unit 45 decreases the reward (step S308). The amount of the decrease is a predetermined value. When step S305 is executed for the first time, no previous second-state value exists, so the process proceeds to step S307.
Next, the third reward calculation unit 45 compares the supplied third-state value with the previous third-state value (step S309). The third reward calculation unit 45 holds the previously supplied third-state value and can therefore compare it with the value supplied this time. If the third-state value is larger than the previous third-state value (step S309: larger), the third reward calculation unit 45 increases the reward (step S310). That is, the reward is increased when the third-state value indicates a more stable state than before. The amount of the increase is a predetermined value. If the third-state value is the same as the previous third-state value (step S309: same), the third reward calculation unit 45 leaves the reward unchanged (step S311). If the third-state value is smaller than the previous third-state value (step S309: smaller), the third reward calculation unit 45 decreases the reward (step S312). That is, the reward is decreased when the third-state value indicates a less stable state than before. The amount of the decrease is a predetermined value. When step S309 is executed for the first time, no previous third-state value exists, so the process proceeds to step S311.
Then, the third function update unit 46 updates the action value function Q according to Formula (1), based on the reward calculated by the third reward calculation unit 45 (step S313). The third function update unit 46 further determines whether the action value function Q has converged, that is, whether the update in step S313 no longer changes it (step S314). If it is determined that the action value function Q has not converged (step S314: No), the third control parameter changing unit 53 changes a control parameter of the machining conditions related to axis drive control, based on the action value function Q updated in step S313 (step S315). After step S315, the process returns to steps S301 and S302. If it is determined that the action value function Q has converged (step S314: Yes), the learning unit 40 determines whether all of the control parameters of the machining conditions related to axis drive control have been changed by the third control parameter changing unit 53 (step S316). If it is determined that not all of these control parameters have been changed (step S316: No), the control parameter targeted for change by the third control parameter changing unit 53 in step S315 is switched to another control parameter (step S317). The control parameter newly targeted in step S317 is a control parameter related to axis drive control that has not yet been changed. After step S317, the process proceeds to step S315.
The change of the control parameters of the machining conditions related to axis drive control in step S315 is described in detail below. As described above, a priority order for change is defined for the five types of control parameters of the machining conditions related to axis drive control shown in FIG. 7, which are changed in step S315. When step S315 is first entered, the third control parameter changing unit 53 changes the voltage control parameter, which is the control parameter for the target voltage value. Thereafter, each time the action value function Q is determined to have converged in step S314, the control parameter targeted for change by the third control parameter changing unit 53 is switched in step S317 in the following order: the GAIN control parameter, which is the control parameter for axis responsiveness; the length control parameter, which is the control parameter for the deepest-value duration; the jump control parameter, which is the control parameter for jump speed and jump height; and the gap control parameter, which is the control parameter for the inter-electrode gap adjustment value.
When the learning unit 40 determines that the action value function Q has converged and that all of the control parameters of the machining conditions related to axis drive control have been changed (step S316: Yes), the optimization of these control parameters by learning ends, and the learning result is stored in the learning result storage unit 80 (step S318). In addition to each control parameter finally determined through the changes in step S315, the learning result includes the values each control parameter took during the change process, together with the second-state and third-state values corresponding to those control parameter values. The learning result stored in the learning result storage unit 80 can be used to judge whether a control parameter change made things better or worse. The control parameters finally determined in this way are those that earned the largest reward during learning, and they are held in the control parameter holding unit 13 as the optimal control parameters for the given machining condition setting values. Optimizing the control parameters of the machining conditions related to axis drive control by learning increases the number of pulses in one operation unit, that is, the interval from the end of a retreat operation called a jump operation, through the inter-electrode position control that generates discharge, until the next jump operation. It also increases the axis feed amount per observation in the machining progress direction, which promotes the progress of machining. As described above, when the five types of control parameters of the machining conditions related to axis drive control are optimized simultaneously, the third control parameter changing unit 53 changes all five control parameters at once in step S315, based on the action value function Q updated in step S313. In this case, steps S316 and S317 are unnecessary, and when the action value function Q is determined to have converged in step S314 (step S314: Yes), the process may proceed directly to step S318.
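The one-at-a-time switching order described above can be sketched in Python as follows. This is a minimal sketch rather than the disclosed implementation: the names PRIORITY_ORDER, optimize_sequentially, and tune_until_converged are hypothetical, and the callback merely stands in for the loop that repeats the parameter change of step S315 until step S314 reports convergence.

```python
from typing import Callable, Dict, Sequence

# Priority order of the five axis-drive control parameters, as described above.
# The short string labels are hypothetical names chosen for this sketch.
PRIORITY_ORDER: Sequence[str] = ("voltage", "gain", "length", "jump", "gap")


def optimize_sequentially(
    tune_until_converged: Callable[[str], float],
    order: Sequence[str] = PRIORITY_ORDER,
) -> Dict[str, float]:
    """Tune one parameter at a time, moving on each time Q has converged.

    `tune_until_converged` is a hypothetical callback: it keeps changing the
    named parameter (step S315) until the action value function is judged to
    have converged (step S314: Yes), then returns the value it settled on.
    """
    learned: Dict[str, float] = {}
    for name in order:
        # Corresponds to step S317: switch the target parameter.
        learned[name] = tune_until_converged(name)
    # All parameters handled (step S316: Yes); the caller would store this
    # result, which corresponds to step S318.
    return learned
```

In the simultaneous variant mentioned in the text, the loop over `order` would disappear and a single convergence check would cover all five parameters.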
In the description of FIGS. 15 to 17 above, the method of changing a control parameter based on the updated action value function Q is not particularly limited, as long as it finds the action a_t, that is, the control parameter, that maximizes the action value Q given by the action value function Q(s_t, a_t) in the current state s_t.
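One way to find the value-maximizing action is a plain greedy lookup over a tabular Q. The sketch below assumes a dictionary-backed table and a finite list of candidate actions (for example, discrete increments to a control parameter). The function name greedy_action and the default value 0.0 for unseen pairs are assumptions for illustration; the text itself leaves the method open.

```python
from typing import Dict, Hashable, Iterable, Tuple

State = Hashable
Action = Hashable


def greedy_action(
    q_table: Dict[Tuple[State, Action], float],
    state: State,
    candidate_actions: Iterable[Action],
) -> Action:
    """Return the candidate action with the largest Q(state, action).

    Unseen (state, action) pairs are treated as having value 0.0, which is an
    assumption of this sketch. On ties, the first candidate wins.
    """
    return max(candidate_actions, key=lambda a: q_table.get((state, a), 0.0))
```

A purely greedy choice never explores; a practical learner would usually mix in some exploration, which is outside what the text specifies.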
Note that the control parameter for a given machining condition is one and the same parameter, so when the flowcharts of FIGS. 15 to 17 are executed in parallel, that same control parameter is subject to the changes made by each of the flowcharts.
The control parameter optimization of the flowcharts of FIGS. 15 to 17 begins once the machining operation of the electric discharge machine 1 has started and discharge has occurred, and it continues until the electric discharge machining ends. That is, the state observation unit 30 observes the machining state from the start of machining, and the learning unit 40 and the parameter changing unit 50 search for the optimal control parameters until machining ends. The machine learning device 100 executes the flowcharts of FIGS. 15 to 17 in parallel, and the control parameters continue to be updated until all of the end conditions of FIGS. 15 to 17 are satisfied. When all of the end conditions are satisfied, the changing of the control parameters ends.
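The termination rule above (keep updating until every flowchart has met its end condition) can be illustrated with a small loop. The StubLearner class is a hypothetical stand-in for one flowchart, and the round-robin interleaving is an assumption; the text only states that the flowcharts run in parallel.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StubLearner:
    """Hypothetical stand-in for one flowchart (FIG. 15, 16, or 17)."""

    name: str
    steps_left: int  # updates remaining before this learner's end condition

    def finished(self) -> bool:
        return self.steps_left <= 0

    def step(self) -> None:
        # Placeholder for: observe state, compute reward, update Q,
        # and change the control parameter.
        self.steps_left -= 1


def run_until_all_finished(learners: List[StubLearner]) -> int:
    """Interleave the learners until every end condition holds.

    Returns the number of rounds that were needed.
    """
    rounds = 0
    while not all(learner.finished() for learner in learners):
        for learner in learners:
            if not learner.finished():
                learner.step()
        rounds += 1
    return rounds
```

Parameter updates stop only when the slowest learner is done, which matches the statement that changes end when all end conditions are satisfied.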
The learning behavior of the machine learning device 100 described above is carried out continuously from the start of electric discharge machining until machining ends. The reward for the learning behavior is obtained based on the first, second, and third states, and the control parameters are changed accordingly. As a result of this learning behavior, the action value Q under the optimal control parameters obtained by the end of machining is higher than the action value Q under the initially set control parameters. In the electric discharge machine 1 according to the first embodiment, the higher action value Q shortens the time required to complete machining and, through machining by stable discharge, improves the machining accuracy and machined surface quality of the workpiece.
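As a rough illustration of how a reward might be derived from the observed states, the sketch below follows the direction of the rules stated in claims 7 to 12: fewer accumulated unstable-pulse signals (first state), more pulses (second state), and a larger axis feed amount (third state) each raise the reward. The unit step size, the zero reward for an unchanged value, and the summing of two terms into one reward are assumptions of this sketch, not details taken from the disclosure.

```python
def reward_delta(previous: float, current: float, higher_is_better: bool,
                 step: float = 1.0) -> float:
    """Return +step if the value moved in the good direction, -step if it
    moved in the bad direction, and 0.0 if it did not change (assumption)."""
    if current == previous:
        return 0.0
    improved = (current > previous) == higher_is_better
    return step if improved else -step


def voltage_reward(prev_unstable: float, unstable: float,
                   prev_pulses: float, pulses: float) -> float:
    """First reward calculation: fewer unstable signals and more pulses are good."""
    return (reward_delta(prev_unstable, unstable, higher_is_better=False)
            + reward_delta(prev_pulses, pulses, higher_is_better=True))


def pulse_reward(prev_unstable: float, unstable: float,
                 prev_pulses: float, pulses: float) -> float:
    """Second reward calculation: same state directions as the voltage reward."""
    return voltage_reward(prev_unstable, unstable, prev_pulses, pulses)


def axis_drive_reward(prev_pulses: float, pulses: float,
                      prev_feed: float, feed: float) -> float:
    """Third reward calculation: more pulses and a larger feed amount are good."""
    return (reward_delta(prev_pulses, pulses, higher_is_better=True)
            + reward_delta(prev_feed, feed, higher_is_better=True))
```

For example, `voltage_reward(10, 6, 100, 120)` rewards both the drop in unstable signals and the rise in pulse count.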
In conventional adaptive control, the machining condition setting values were controlled according to rules determined so as to stabilize machining, but adaptive control that changes the control parameters was not performed. In contrast, the machine learning device 100 performs optimization learning that adjusts the control parameters while electric discharge machining is actually being executed, according to the workpiece shape and workpiece material, so it can automatically learn machining conditions that are more appropriate as physical quantities and more stable. That is, even under adaptive control usage conditions that are difficult to anticipate in advance, such as the workpiece shape, electrode material, and electrode shape, the machine learning device 100 can optimize the control parameters without limiting the range over which adaptive control is applied, which increases machining stability and improves machining speed and machining accuracy.
Embodiment 2.
FIG. 18 is a block diagram showing the configuration of an electric discharge machine 1A according to the second embodiment of the present invention. The electric discharge machine 1A is the electric discharge machine 1 according to the first embodiment with a machining result input unit 23, a component for performing additional learning using machining results, added to the input/output unit 20.
The first embodiment described the learning behavior for the control parameters when a particular machining operation is performed. In the second embodiment, it is assumed that machining has already been performed once with the same workpiece 3 material and the same machining condition setting values. It is further assumed that this earlier machining yielded the surface roughness of the workpiece 3 after machining and the electrode consumption amount of the machining electrode 2 after machining, expressed as a consumed weight or a consumed length.
The machining result input unit 23 accepts machining results entered by the user, such as the surface roughness of the workpiece 3 after machining and the electrode consumption amount of the machining electrode 2 after machining. The machining results may be entered by having the display unit 22 present selectable options and having the machining result input unit 23 accept the user's selection. Alternatively, the machining result input unit 23 may accept numerical data entered by the user for the surface roughness of the workpiece 3 after machining and the electrode consumption amount of the machining electrode 2 after machining; the input format is not limited. The method of evaluating whether the accepted surface roughness and electrode consumption amount are good or poor is likewise a design matter and is not particularly limited. The machining result input unit 23 may also accept the good-or-poor evaluation itself for the surface roughness of the workpiece 3 after machining and the electrode consumption amount of the machining electrode 2 after machining.
When machining is executed again with the same machining condition settings as machining that has already been performed once, the machining result of the earlier machining accepted by the machining result input unit 23 can be used to add or remove limits on the change amount in the control parameter changes described with reference to FIGS. 15 to 17. Based on the machining result accepted by the machining result input unit 23, the machining state observation unit 33 or the like causes the parameter changing unit 50 to add or remove a limit on the change amount of a control parameter.
Specifically, when the surface roughness of the workpiece 3 after machining accepted by the machining result input unit 23 is evaluated as poor, a limit is placed on changes to the control parameters that affect machined surface quality. As one example, the parameter changing unit 50 limits the change width of the length control parameter, which is the control parameter for the current pulse length, so that the length control parameter is not changed by more than a fixed value.
When the electrode consumption amount of the machining electrode 2 after machining accepted by the machining result input unit 23 is judged to be small, leaving some margin, the parameter changing unit 50 removes the limit on changes to the control parameters that affect the electrode consumption amount. As one example, the change width of the pulse slope control parameter of the circuit auxiliary setting is increased, lifting the limit on its change. Conversely, when the electrode consumption amount is large, the parameter changing unit 50 can place a further limit on control parameter changes.
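The limit handling described in the preceding paragraphs might look like the following sketch. Everything named here is hypothetical: the parameter keys "length" and "pulse_slope", the "low"/"normal"/"high" wear labels, and the scale factors 0.5 and 2.0 are illustrative assumptions, since the text treats the evaluation method and the size of the limits as design matters.

```python
from typing import Dict


def adjust_change_limits(
    limits: Dict[str, float],
    surface_ok: bool,
    wear: str,
    tighten: float = 0.5,
    relax: float = 2.0,
) -> Dict[str, float]:
    """Return new maximum change widths per control parameter.

    `limits` maps a parameter name to the largest change allowed in one step.
    `wear` is "low", "normal", or "high" (assumed labels for the evaluated
    electrode consumption amount). The input dictionary is not modified.
    """
    new_limits = dict(limits)
    if not surface_ok:
        # Poor surface roughness: restrict the pulse-length parameter.
        new_limits["length"] *= tighten
    if wear == "low":
        # Consumption has margin: widen the pulse-slope change width.
        new_limits["pulse_slope"] *= relax
    elif wear == "high":
        # Heavy consumption: restrict the pulse-slope parameter further.
        new_limits["pulse_slope"] *= tighten
    return new_limits


def clamp_change(requested: float, limit: float) -> float:
    """Clip a requested parameter change to the range [-limit, +limit]."""
    return max(-limit, min(limit, requested))
```

A parameter changing step would then pass each requested change through `clamp_change` before applying it.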
With the electric discharge machine 1A according to the second embodiment, accepting the machining result of machining that has already been performed once allows the limits on control parameter changes to depend on that machining result, for the same workpiece 3 material and the same machining condition setting values. As a result, in addition to the effects obtained in the first embodiment, an accuracy improvement effect, such as better quality of the machined surface, and a cost reduction effect, such as a lower electrode consumption amount, are obtained.
Embodiment 3.
FIG. 19 is a block diagram showing the configuration of an electric discharge machine 1B according to the third embodiment of the present invention. The electric discharge machine 1B is the electric discharge machine 1A according to the second embodiment with a communication unit 60 added. The communication unit 60 includes a learning content filing unit 61 that converts the learning results stored in the learning result storage unit 80 into transmittable learning result data, a receiving unit 62 that receives learning result data from outside, and a transmitting unit 63 that transmits learning result data to the outside. The receiving unit 62 and the transmitting unit 63 are connected to, and can communicate with, a cloud server 300 located outside the electric discharge machine 1B.
The cloud server 300 is also connected to electric discharge machines 301 to 303, which have a learning function similar to that of the control device 10 of the electric discharge machine 1B. The electric discharge machine 1B can therefore communicate, via the communication unit 60, with the electric discharge machines 301 to 303, which are other electric discharge machines. The cloud server 300 can store not only the learning result data of the electric discharge machine 1B but also the learning result data of the electric discharge machines 301 to 303. The communication method between the cloud server 300 and the electric discharge machines 1B and 301 to 303 is not particularly limited, and any known technique may be used.
When the control parameter optimization learning described in the first and second embodiments has already been executed, the learning content filing unit 61 can convert the learning results stored in the learning result storage unit 80 into learning result data in a format that the external electric discharge machines 301 to 303 can use. The format of the learning result data is not limited, as long as it is a data format that a control device similar to the control device 10 can use.
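Because the data format is left open, the conversion performed by the learning content filing unit 61 can only be illustrated under assumptions. The sketch below uses JSON encoded as UTF-8, which is a choice made for this example rather than a format named in the disclosure; the function names and the dictionary layout are likewise hypothetical.

```python
import json
from typing import Any, Dict


def to_learning_result_data(result: Dict[str, Any]) -> bytes:
    """Serialize a learning result into a transmittable byte string.

    JSON is an assumed format; keys are sorted so the output is deterministic.
    """
    return json.dumps(result, sort_keys=True, ensure_ascii=False).encode("utf-8")


def from_learning_result_data(payload: bytes) -> Dict[str, Any]:
    """Restore a learning result on the receiving machine."""
    return json.loads(payload.decode("utf-8"))
```

A hypothetical learning result could hold the final control parameters together with the change history and the corresponding second-state and third-state values, mirroring what the learning result storage unit 80 is described as storing.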
The learning result data created by the learning content filing unit 61 can be stored in the cloud server 300 via the transmitting unit 63. The learning result data stored in the cloud server 300 is transmitted to the electric discharge machines 301 to 303 automatically or actively, and whether the electric discharge machines 301 to 303 use that learning result data can be decided by the users of the electric discharge machines 301 to 303.
By importing this learning result data into the machine learning devices in the electric discharge machines 301 to 303, the content learned by the machine learning device 100 can be used in the same way by the electric discharge machines 301 to 303.
Conversely, learning result data created by learning in the electric discharge machines 301 to 303 can also be used by the control device 10 via the cloud server 300 and the receiving unit 62. In that case, the content learned or the states observed by the control devices of the electric discharge machines 301 to 303, as received via the receiving unit 62, can be displayed on the display unit 22.
This makes it possible for the electric discharge machine 1B to use the learning results of the control devices of the electric discharge machines 301 to 303 located elsewhere, for example at a remote site, and to observe the machining states of the electric discharge machines 301 to 303. It also allows the learning results of the electric discharge machine 1B to be used by the electric discharge machines 301 to 303 of the same specification. Consequently, beyond the adjustment of a single electric discharge machine, the machine performance of a plurality of electric discharge machines of the same specification can be improved, and the improvement becomes more efficient as the number of electric discharge machines of the same specification increases.
The machine learning device 100 according to the first to third embodiments is realized by a computer system such as a personal computer or a general-purpose computer. FIG. 20 is a diagram showing the hardware configuration used when the functions of the machine learning device 100 according to the first to third embodiments are realized by a computer system. In that case, as shown in FIG. 20, the functions of the machine learning device 100 are realized by a CPU (Central Processing Unit) 201, a memory 202, a storage device 203, a display device 204, and an input device 205. The functions executed by the machine learning device 100 are realized by software, firmware, or a combination of software and firmware. The software or firmware is written as a program and stored in the storage device 203. The CPU 201 realizes the functions of the machine learning device 100 by reading the software or firmware stored in the storage device 203 into the memory 202 and executing it. That is, the computer system includes the storage device 203 for storing a program which, when the functions of the machine learning device 100 are executed by the CPU 201, results in execution of the steps of the machine learning method according to the first to third embodiments. These programs can also be said to cause a computer to execute the processing realized by the functions of the machine learning device 100. The memory 202 is a volatile storage area such as a RAM (Random Access Memory). The storage device 203 is a nonvolatile or volatile semiconductor memory such as a ROM (Read Only Memory) or a flash memory, or a magnetic disk. Specific examples of the display device 204 are a monitor and a display. Specific examples of the input device 205 are a keyboard, a mouse, and a touch panel.
The configurations described in the above embodiments are examples of the content of the present invention. They can be combined with other known techniques, and parts of the configurations can be omitted or modified without departing from the gist of the present invention.
1, 1A, 1B, 301 to 303 electric discharge machine; 2 machining electrode; 3 workpiece; 4 drive device; 5 machining power source; 10 control device; 11 axis drive control unit; 12 machining power source control unit; 13 control parameter holding unit; 14 initial parameter setting unit; 15 machining condition setting unit; 20 input/output unit; 21 machining condition input unit; 22 display unit; 23 machining result input unit; 30 state observation unit; 31 axis drive recognition unit; 32 pulse state recognition unit; 33 machining state observation unit; 40 learning unit; 41 first reward calculation unit; 42 first function update unit; 43 second reward calculation unit; 44 second function update unit; 45 third reward calculation unit; 46 third function update unit; 47 reward calculation unit; 48 function update unit; 50 parameter changing unit; 51 first control parameter changing unit; 52 second control parameter changing unit; 53 third control parameter changing unit; 60 communication unit; 61 learning content filing unit; 62 receiving unit; 63 transmitting unit; 80 learning result storage unit; 100 machine learning device; 201 CPU; 202 memory; 203 storage device; 204 display device; 205 input device; 300 cloud server.

Claims (17)

  1.  A machine learning device that learns a control parameter for controlling machining conditions in an electric discharge machine, the machine learning device comprising:
     a state observation unit that observes a plurality of state variables representing a machining state during electric discharge machining; and
     a learning unit that learns the control parameter based on the plurality of state variables.
  2.  The machine learning device according to claim 1, wherein
     the state observation unit observes, as the plurality of state variables, a value of a first state that is a value obtained by accumulating the number of occurrences of an unstable pulse signal over a predetermined period, a value of a second state that is the number of pulses generated in the predetermined period, and a value of a third state that is a feed amount of an axis in a drive device.
  3.  The machine learning device according to claim 2, wherein the state observation unit comprises:
     a pulse state recognition unit that obtains the value of the first state and the value of the second state based on a command from a machining power source control unit that controls a machining power source; and
     an axis drive recognition unit that obtains the value of the third state based on a command from an axis drive control unit that controls the drive device for controlling a distance between a machining electrode and a workpiece.
  4.  The machine learning device according to claim 2 or 3, wherein the learning unit comprises:
     a reward calculation unit that calculates a reward based on the state variables; and
     a function update unit that updates, based on the reward, a function for determining the control parameter.
  5.  The machine learning device according to claim 4, wherein
     the reward calculation unit increases the reward when the state variables indicate a more stable state than the previous time, and decreases the reward when the state variables indicate a more unstable state than the previous time.
  6.  The machine learning device according to claim 4 or 5, wherein
     the reward calculation unit comprises a first reward calculation unit that calculates a reward for voltage control, a second reward calculation unit that calculates a reward for pulse control, and a third reward calculation unit that calculates a reward for axis drive control, and
     the function update unit comprises a first function update unit that updates a function for voltage control, a second function update unit that updates a function for pulse control, and a third function update unit that updates a function for axis drive control.
  7.  The machine learning device according to claim 6, wherein
     the first reward calculation unit increases the reward when the value of the first state is smaller than the previous value, and decreases the reward when the value of the first state is larger than the previous value.
  8.  The machine learning device according to claim 6, wherein
     the first reward calculation unit increases the reward when the value of the second state is larger than the previous value, and decreases the reward when the value of the second state is smaller than the previous value.
  9.  The machine learning device according to claim 6, wherein
     the second reward calculation unit increases the reward when the value of the first state is smaller than the previous value, and decreases the reward when the value of the first state is larger than the previous value.
  10.  The machine learning device according to claim 6, wherein
     the second reward calculation unit increases the reward when the value of the second state is larger than the previous value, and decreases the reward when the value of the second state is smaller than the previous value.
  11.  The machine learning device according to claim 6, wherein
     the third reward calculation unit increases the reward when the value of the second state is larger than the previous value, and decreases the reward when the value of the second state is smaller than the previous value.
  12.  The machine learning device according to claim 6, wherein
     the third reward calculation unit increases the reward when the value of the third state is larger than the previous value, and decreases the reward when the value of the third state is smaller than the previous value.
  13.  An electric discharge machine comprising:
     the machine learning device according to any one of claims 1 to 12;
     a machining condition setting unit that sets machining condition setting values;
     a control parameter holding unit that holds the control parameter;
     a machining power source control unit that controls a machining power source based on the machining condition setting values and the control parameter;
     an axis drive control unit that controls, based on the machining condition setting values and the control parameter, a drive device for controlling a distance between a machining electrode and a workpiece; and
     a parameter changing unit that changes the control parameter held by the control parameter holding unit based on a result of learning by the learning unit.
  14.  The electric discharge machine according to claim 13, further comprising a machining result input unit that accepts a machining result, wherein
     the parameter changing unit adds or removes a limit on changes to the control parameter based on the machining result.
  15.  The electric discharge machine according to claim 13 or 14, further comprising a communication unit capable of communicating with another electric discharge machine.
  16.  The electric discharge machine according to any one of claims 13 to 15, further comprising a learning result storage unit that stores the control parameter changed by the parameter changing unit.
  17.  A machine learning method for a machine learning device that learns a control parameter for controlling machining conditions in an electric discharge machine, the machine learning method comprising:
     a step of obtaining a plurality of state variables representing a machining state during electric discharge machining; and
     a step of learning the control parameter based on the plurality of state variables.
PCT/JP2018/015910 2018-04-17 2018-04-17 Machine learning device, electric discharge machine, and machine learning method WO2019202672A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/JP2018/015910 WO2019202672A1 (en) 2018-04-17 2018-04-17 Machine learning device, electric discharge machine, and machine learning method
JP2019508980A JP6663538B1 (en) 2018-04-17 2018-04-17 Machine learning device
CN201880092284.8A CN111954582B (en) 2018-04-17 2018-04-17 Machine learning device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/015910 WO2019202672A1 (en) 2018-04-17 2018-04-17 Machine learning device, electric discharge machine, and machine learning method

Publications (1)

Publication Number Publication Date
WO2019202672A1 true WO2019202672A1 (en) 2019-10-24

Family

ID=68239485

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/015910 WO2019202672A1 (en) 2018-04-17 2018-04-17 Machine learning device, electric discharge machine, and machine learning method

Country Status (3)

Country Link
JP (1) JP6663538B1 (en)
CN (1) CN111954582B (en)
WO (1) WO2019202672A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021186521A1 (en) * 2020-03-17 2021-09-23 三菱電機株式会社 Machining condition searching device and machining condition searching method
CN114746203A (en) * 2019-12-03 2022-07-12 三菱电机株式会社 Control device, electric discharge machine, and machine learning device
US20220236722A1 (en) * 2019-07-03 2022-07-28 Mitsubishi Electric Corporation Machine learning apparatus, numerical control apparatus, wire electric discharge machine, and machine learning method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7204952B1 (en) * 2021-06-21 2023-01-16 三菱電機株式会社 MACHINING CONDITION SEARCH DEVICE AND MACHINING CONDITION SEARCH METHOD

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02262915A (en) * 1989-03-31 1990-10-25 Mitsubishi Electric Corp Controller for electric discharge machine
JPH06170645A (en) * 1992-12-03 1994-06-21 Sodick Co Ltd Electric discharge machining control method and electric discharge machine control device
JP2013094940A (en) * 2011-11-04 2013-05-20 Fanuc Ltd Machining condition adjusting device for electric discharge machine
JP2017042882A (en) * 2015-08-27 2017-03-02 ファナック株式会社 Wire electric discharge machine for working while adjusting the working condition
JP2017068566A (en) * 2015-09-30 2017-04-06 ファナック株式会社 Machine learning device and machine learning method for optimizing frequency of tool correction for machine tool and machine tool provided with machine learning device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2951111A4 (en) * 2013-02-04 2017-01-25 Anca Pty. Ltd. Pulse and gap control for electrical discharge machining equipment
JP6619192B2 (en) * 2015-09-29 2019-12-11 ファナック株式会社 Wire electrical discharge machine with function to warn of abnormal load on moving axis
JP6235543B2 (en) * 2015-09-30 2017-11-22 ファナック株式会社 Machine learning device, motor control device, processing machine, and machine learning method for optimizing cycle processing time of processing machine

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220236722A1 (en) * 2019-07-03 2022-07-28 Mitsubishi Electric Corporation Machine learning apparatus, numerical control apparatus, wire electric discharge machine, and machine learning method
US11669077B2 (en) * 2019-07-03 2023-06-06 Mitsubishi Electric Corporation Machine learning apparatus, numerical control apparatus, wire electric discharge machine, and machine learning method
CN114746203A (en) * 2019-12-03 2022-07-12 三菱电机株式会社 Control device, electric discharge machine, and machine learning device
CN114746203B (en) * 2019-12-03 2023-08-18 三菱电机株式会社 Control device, electric discharge machine, and machine learning device
WO2021186521A1 (en) * 2020-03-17 2021-09-23 三菱電機株式会社 Machining condition searching device and machining condition searching method

Also Published As

Publication number Publication date
CN111954582A (en) 2020-11-17
JPWO2019202672A1 (en) 2020-04-30
CN111954582B (en) 2021-09-03
JP6663538B1 (en) 2020-03-11

Similar Documents

Publication Publication Date Title
WO2019202672A1 (en) Machine learning device, electric discharge machine, and machine learning method
JP6140228B2 (en) Wire electrical discharge machine for machining while adjusting machining conditions
US10564611B2 (en) Control system and machine learning device
JP6219897B2 (en) Machine tools that generate optimal acceleration / deceleration
JP6063013B1 (en) Numerical control device with machining condition adjustment function to suppress chatter or tool wear / breakage
US10180667B2 (en) Controller-equipped machining apparatus having machining time measurement function and on-machine measurement function
US10331104B2 (en) Machine tool, simulation apparatus, and machine learning device
JP6077617B1 (en) Machine tools that generate optimal speed distribution
JP6457563B2 (en) Numerical control device and machine learning device
JP4964096B2 (en) Servo gain adjusting device and servo gain adjusting method
EP3173171A1 (en) Simulation apparatus of wire electric discharge machine having function of determining welding positions of core using machine learning
KR102224970B1 (en) Controller and machine learning device
JP6704550B1 (en) Machining condition search device and wire electric discharge machine
EP2871016A2 (en) Wire-cut electrical discharge machining machine and method of machining therein
WO2021001974A1 (en) Machine learning device, numerical control device, wire electric discharge machine, machine learning method
JP6880350B1 (en) Learning device, electric discharge machine and learning method
WO2022269664A1 (en) Machining condition searching device and machining condition searching method
CN114746203A (en) Control device, electric discharge machine, and machine learning device
CN116438028B (en) Machining condition setting device, machining condition setting method, and electric discharge machining device
KR20220157437A (en) Dental machining system for generating process parameters of machining
KR20050035463A (en) Electrical discharge machining apparatus for micro discharge, and method for as the same
JPH10217039A (en) Electric discharge machining control device

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2019508980

Country of ref document: JP

Kind code of ref document: A

121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18915193

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 18915193

Country of ref document: EP

Kind code of ref document: A1