WO2020008633A1 - Machine learning device, numerical control device, machine tool, and machine learning method - Google Patents

Machine learning device, numerical control device, machine tool, and machine learning method

Info

Publication number
WO2020008633A1
Authority
WO
WIPO (PCT)
Prior art keywords
work
chuck
unit
position command
axis direction
Prior art date
Application number
PCT/JP2018/025746
Other languages
French (fr)
Japanese (ja)
Inventor
勇太 加藤
泰一 石田
Original Assignee
三菱電機株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 三菱電機株式会社
Priority to PCT/JP2018/025746 (WO2020008633A1)
Priority to DE112018007687.3T (DE112018007687T5)
Priority to CN201880095230.7A (CN112368656B)
Priority to JP2018562139A (JP6505341B1)
Publication of WO2020008633A1

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B23 MACHINE TOOLS; METAL-WORKING NOT OTHERWISE PROVIDED FOR
    • B23Q DETAILS, COMPONENTS, OR ACCESSORIES FOR MACHINE TOOLS, e.g. ARRANGEMENTS FOR COPYING OR CONTROLLING; MACHINE TOOLS IN GENERAL CHARACTERISED BY THE CONSTRUCTION OF PARTICULAR DETAILS OR COMPONENTS; COMBINATIONS OR ASSOCIATIONS OF METAL-WORKING MACHINES, NOT DIRECTED TO A PARTICULAR RESULT
    • B23Q15/00 Automatic control or regulation of feed movement, cutting velocity or position of tool or work
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B23 MACHINE TOOLS; METAL-WORKING NOT OTHERWISE PROVIDED FOR
    • B23B TURNING; BORING
    • B23B2270/00 Details of turning, boring or drilling machines, processes or tools not otherwise provided for
    • B23B2270/12 Centering of two components relative to one another
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B23 MACHINE TOOLS; METAL-WORKING NOT OTHERWISE PROVIDED FOR
    • B23B TURNING; BORING
    • B23B31/00 Chucks; Expansion mandrels; Adaptations thereof for remote control
    • B23B31/02 Chucks
    • B23B31/24 Chucks characterised by features relating primarily to remote control of the gripping means
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 Programme-control systems
    • G05B19/02 Programme-control systems electric
    • G05B19/18 Numerical control [NC], i.e. automatically operating machines, in particular machine tools, e.g. in a manufacturing environment, so as to execute positioning, movement or co-ordinated operations by means of programme data in numerical form
    • G05B19/402 Numerical control [NC], i.e. automatically operating machines, in particular machine tools, e.g. in a manufacturing environment, so as to execute positioning, movement or co-ordinated operations by means of programme data in numerical form characterised by control arrangements for positioning, e.g. centring a tool relative to a hole in the workpiece, additional detection means to correct position
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/30 Nc systems
    • G05B2219/49 Nc machine tool, till multiple
    • G05B2219/49184 Compensation for bending of workpiece, flexible workpiece

Definitions

  • the present invention relates to a machine learning device, a numerical control device, a machine tool, and a machine learning method for learning a work transfer operation.
  • In a loader that conveys a work, the work gripped by the chuck on the sending side is handed over to the chuck on the receiving side.
  • During this transfer, the work may fail to reach the center position of the chuck area on the receiving side, for example because a long work bends or because the chuck fails to grasp the work properly.
  • If the transfer position shifts from the appropriate position during the transfer of the work, the transfer may fail. Therefore, a technique capable of suppressing the shift of the transfer position is desired.
  • The loader control device described in Patent Literature 1 expresses, as a function, the correlation between the shift amount of the work transfer position and the motor torque of the servo motor that drives the loader. It predicts the shift amount of the transfer position based on this correlation and the measured motor torque, and corrects the transfer position based on the predicted shift amount.
  • However, in Patent Literature 1, since the function indicating the correlation is a fixed function, the probability of failure in transferring the work does not decrease even if the work transfer operation is repeated.
  • The present invention has been made in view of the above, and an object of the present invention is to obtain a machine learning device that can reduce the probability of failure in transferring a work by learning the work transfer operation.
  • To achieve this object, the present invention provides a machine learning device that learns a position command to a drive mechanism for moving the first chuck when a work is transferred between a first chuck, which holds and sends the work, and a second chuck, which holds and receives the work.
  • The machine learning device includes a state observation unit that observes the position command to the drive mechanism and feedback data from the drive mechanism as state variables, and a learning unit that learns, in accordance with a data set created based on the state variables, a position command that suppresses the positional deviation of the work transfer position between the first chuck and the second chuck.
  • the machine learning device has an effect that by learning the work transfer operation, it is possible to reduce the probability that the work transfer will fail.
  • FIG. 2 is a diagram illustrating a configuration of a control system including a numerical control device according to an embodiment.
  • FIG. 3 is a flowchart illustrating an operation procedure of the machining system according to the embodiment.
  • FIG. 4 is a diagram for describing a first learning example by the machine learning device according to the embodiment.
  • FIG. 7 is a diagram for describing a second learning example by the machine learning device according to the embodiment.
  • FIG. 4 is a view for explaining a positional relationship between a spindle chuck and a work included in the machine tool according to the embodiment.
  • FIG. 2 is a diagram illustrating an example of a hardware configuration of a numerical control device according to an embodiment.
  • FIG. 1 is a diagram illustrating a configuration of a machining system according to an embodiment.
  • FIG. 1 shows a case where the machining system 1 is viewed from a vertical direction.
  • In the following description, the vertical direction is the Y-axis direction, and the horizontal directions, which are the moving directions of the work 40, are the X-axis direction and the Z-axis direction.
  • The machining system 1 includes a machine tool 2 for machining a work 40 and a control system 3 for controlling the operation of the machine tool 2.
  • Examples of the machine tool 2 are a lathe and a machining center.
  • a case where the machine tool 2 is a lathe will be described.
  • the machine tool 2 includes a rotating unit 35, a loader chuck 32 serving as a first chuck, a spindle chuck 31 serving as a second chuck, and a loader 36 serving as a transfer mechanism for a workpiece 40.
  • the operation of the loader 36 is controlled by the control system 3.
  • the loader chuck 32 is connected to the loader 36 and moves together with the loader 36.
  • The loader chuck 32 can grip the work 40, which is the object to be machined. Examples of the loader chuck 32 are a three-jaw chuck and a collet chuck.
  • the loader chuck 32 transfers the work 40 to the spindle chuck 31 when starting the processing of the work 40, and receives the work 40 from the spindle chuck 31 after the processing of the work 40 is completed.
  • The rotating unit 35 rotates about the Z axis, which is the main axis.
  • the spindle chuck 31 is connected to the rotating unit 35 and rotates together with the rotating unit 35.
  • the spindle chuck 31 can hold the work 40.
  • Examples of the spindle chuck 31 are a three-jaw chuck and a collet chuck.
  • the rotating unit 35 rotates while the spindle chuck 31 holds the work 40, thereby rotating the work 40.
  • An example of the rotating unit 35 is a spindle mechanism.
  • the machine tool 2 grips one end of the work 40 with the loader chuck 32 when loading the work 40 on the rotating unit 35.
  • the loader 36 moves to the minus side in the X-axis direction and stops at the position facing the spindle chuck 31 (s0). Then, the loader 36 moves to the minus side in the Z-axis direction.
  • the loader 36 moves the work 40 to a position where the spindle chuck 31 can grip the work 40 (s1).
  • the position of the loader 36 where the spindle chuck 31 can grip the work 40 is a desired delivery position.
  • The machine tool 2 starts the closing operation of the spindle chuck 31 using an auxiliary function such as an M code (s2), and waits until the closing operation of the spindle chuck 31 is completed (s3).
  • When the closing operation of the spindle chuck 31 is completed, the machine tool 2 starts the opening operation of the loader chuck 32 (s4) and waits until the opening operation of the loader chuck 32 is completed (s5). Then, the loader 36 moves to the plus side in the Z-axis direction. As a result, the loader 36 retreats in the direction away from the work 40 (s6), and further moves to the plus side in the X-axis direction.
  • When the work 40 is unloaded from the rotating unit 35, each process from s0 to s6 is performed in the reverse direction. That is, in the processing of s0 and s6, the moving direction of the loader 36 is reversed between the time of loading and the time of unloading. At the time of unloading, the closing operation of the loader chuck 32 and the opening operation of the spindle chuck 31 are performed.
  • Specifically, at the time of unloading, the loader 36 moves to the minus side in the X-axis direction, and further moves in a direction approaching the work 40. Then, the loader chuck 32 starts the closing operation, and when the closing operation is completed, the spindle chuck 31 starts the opening operation. When the opening operation is completed, the loader 36 retreats in a direction away from the work 40, and further moves to the plus side in the X-axis direction.
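  • The loading and unloading sequences described above can be written down as ordered step lists. The following is a minimal sketch only: the `Step` type, the unloading labels `u0` to `u5`, and the helper `receiver_closes_before_sender_opens` are hypothetical names introduced for illustration and do not appear in the source. The helper checks the property that both sequences share, namely that the receiving chuck finishes closing before the sending chuck starts opening.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    label: str      # s0..s6 follow the text; u0..u5 are hypothetical labels
    actor: str      # "loader", "spindle_chuck" or "loader_chuck"
    operation: str  # "move", "close" or "open"
    note: str


# Loading: the loader chuck (sender) hands the work to the spindle chuck (receiver).
LOADING: List[Step] = [
    Step("s0", "loader", "move", "X minus: stop at the position facing the spindle chuck"),
    Step("s1", "loader", "move", "Z minus: advance to the delivery position"),
    Step("s2", "spindle_chuck", "close", "start closing, e.g. via an M code"),
    Step("s3", "spindle_chuck", "close", "wait until closing is completed"),
    Step("s4", "loader_chuck", "open", "start opening"),
    Step("s5", "loader_chuck", "open", "wait until opening is completed"),
    Step("s6", "loader", "move", "Z plus: retreat from the work, then move to X plus"),
]

# Unloading: the roles of the two chucks are swapped.
UNLOADING: List[Step] = [
    Step("u0", "loader", "move", "X minus, then approach the work"),
    Step("u1", "loader_chuck", "close", "start closing"),
    Step("u2", "loader_chuck", "close", "wait until closing is completed"),
    Step("u3", "spindle_chuck", "open", "start opening"),
    Step("u4", "spindle_chuck", "open", "wait until opening is completed"),
    Step("u5", "loader", "move", "retreat from the work, then move to X plus"),
]


def receiver_closes_before_sender_opens(steps: List[Step]) -> bool:
    """Return True if every 'close' step precedes every 'open' step."""
    closes = [i for i, s in enumerate(steps) if s.operation == "close"]
    opens = [i for i, s in enumerate(steps) if s.operation == "open"]
    return bool(closes) and bool(opens) and max(closes) < min(opens)
```

  • With this ordering the work 40 is always held by at least one chuck during the hand-over.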
  • The machine tool 2 has machine-specific characteristics in the process of gripping the work 40, the transfer process, and the like. Because of these characteristics, a deviation can arise between the position command to the loader chuck 32 and the actual position of the loader chuck 32 in the X-axis direction. Consequently, even when the work 40 is loaded with a position command that is intended to be appropriate, the position of the work 40 in the X-axis direction may be shifted when the loader chuck 32 attempts to pass the work 40 to the spindle chuck 31, and the work 40 may collide with the end 30 of the spindle chuck 31 on the plus side in the Z-axis direction. In this case, the loader chuck 32 cannot transfer the work 40 to the spindle chuck 31.
  • A numerical control (NC: Numerical Control) device 10, described later and provided in the control system 3, learns the position in the X-axis direction to which the loader chuck 32 is moved when the loader chuck 32 attempts to transfer the work 40 to the spindle chuck 31. That is, by learning the position command to the loader chuck 32, the numerical controller 10 reduces the probability that the delivery of the work 40 fails. If the posture and the shape of the work 40 gripped by the loader chuck 32 are the same each time, the position of the work 40 in the X-axis direction corresponds to the position of the loader chuck 32 in the X-axis direction. Therefore, in the present embodiment, the displacement amount of the loader chuck 32 in the X-axis direction and the displacement amount of the work 40 in the X-axis direction are used synonymously.
  • FIG. 2 is a diagram illustrating a configuration of a control system including the numerical control device according to the embodiment.
  • the control system 3 includes a numerical control device 10, a drive unit 37, and a servomotor 38.
  • the numerical controller 10 is a computer that controls the position of the loader 36 by sending a position command 53 to the drive unit 37.
  • the position command 53 sent by the numerical controller 10 to the drive unit 37 is a command specifying the position of the loader 36, and includes a position command in the X-axis direction and a position command in the Z-axis direction.
  • The numerical controller 10 controls the transfer of the work 40 from the loader chuck 32 to the spindle chuck 31. When the transfer fails, it changes the X-axis direction position command 53 to the loader chuck 32 and performs the transfer control again.
  • The numerical controller 10 learns an appropriate X-axis direction position command 53 to the loader chuck 32 based on the X-axis direction position command 53 that was used and on whether the transfer using that position command 53 failed or succeeded.
  • the drive unit 37 is a drive mechanism that moves the loader 36 by driving a servo motor 38.
  • the drive unit 37 calculates a current value to be sent to the servomotor 38 based on the position command 53 from the numerical controller 10.
  • the drive unit 37 drives the servo motor 38 by sending a current corresponding to the position command 53 to the servo motor 38.
  • the drive unit 37 sends a feedback (FB: Feed-Back) current 55, which is data indicating a current to be sent to the servomotor 38, to the numerical controller 10.
  • the FB current 55 is an example of feedback data from the drive unit 37 to the numerical controller 10.
  • The drive unit 37 calculates an FB position 54, which is data indicating the current position of the work 40, based on the rotation speed detected by the encoder 39, and sends the FB position 54 to the numerical controller 10.
  • the FB position 54 is an example of feedback data from the drive unit 37 to the numerical controller 10.
  • the servomotor 38 is connected to the loader 36 and moves the loader 36 according to the current from the drive unit 37.
  • the servomotors 38 include a servomotor that moves the loader 36 in the X-axis direction and a servomotor that moves the loader 36 in the Z-axis direction.
  • An encoder 39 for detecting the rotation speed of the servomotor 38 is attached to each of the servomotors 38 in the X-axis direction and the Z-axis direction. The encoder 39 transmits information indicating the detected rotation speed to the drive unit 37.
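  • The source does not state how the drive unit 37 converts the encoder reading into the FB position 54. As a rough sketch, assuming the encoder reports accumulated motor revolutions and each axis is a linear axis with a fixed travel per revolution, the conversion could look like the following. The function name and the parameters `lead_mm_per_rev` and `origin_mm` are assumptions introduced for illustration.

```python
def fb_position_mm(revolutions: float, lead_mm_per_rev: float, origin_mm: float = 0.0) -> float:
    """Estimate the axis position from accumulated motor revolutions.

    Assumption (not from the source): the axis travels a fixed distance
    `lead_mm_per_rev` for each motor revolution, starting at `origin_mm`.
    """
    return origin_mm + revolutions * lead_mm_per_rev
```

  • One such value would be computed per axis, giving the X-axis and Z-axis components of the FB position 54.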
  • the numerical control device 10 includes a control processing program storage unit 11, an analysis unit 12, a control unit 13, a storage unit 14, and a machine learning device 20.
  • the control processing program storage unit 11 stores a control processing program used when processing the work 40.
  • the control machining program includes a loading command 61 for loading the work 40 on the rotating unit 35, a machining command for machining the work 40, and an unloading command for unloading the work 40 from the rotating unit 35.
  • FIG. 2 illustrates the loading command 61. Among these commands, the loading command 61 and the unloading command are dedicated commands for transferring the work 40.
  • the loading command 61 is sent to the analysis unit 12 as a G code 51 for positioning the loader 36.
  • the analysis unit 12 analyzes the control machining program.
  • The analysis unit 12 determines whether or not the analyzed command is a dedicated command. If the analyzed command is a dedicated command such as the loading command 61, the analysis unit 12 generates, based on the G code 51, transfer position information 52 indicating the position at which the work 40 is transferred. That is, the analysis unit 12 generates the transfer position information 52 based on the positioning command of the loader 36 included in the G code 51.
  • The transfer position information 52 is information on the position at which the work 40 is transferred between the loader chuck 32 and the spindle chuck 31. Specifically, the transfer position information 52 is the end point of the loader 36. When the work 40 is transferred for the first time, the information used for the transfer operation is set by the arguments of the dedicated command.
  • the dedicated command includes the following command arguments (A1) to (A5).
  • (A1) The end point of the loader 36, which is the transfer position information 52 of the work 40
  • (A2) The reference current value A, which is the criterion for determining whether a collision has occurred when the work 40 is transferred
  • (A3) The retraction amount Lz of the loader 36 when it is determined that a collision has occurred
  • (A4) The direction and amount Lx by which the loader 36 is moved during learning
  • (A5) The maximum moving distance Lmax in one direction during learning
  • the transfer position information 52 of (A1) includes the X coordinate and the Z coordinate.
  • The reference current value A in (A2) is a value for determining whether or not the delivery is abnormal, and is compared with the FB current 55, which indicates the current sent to the servomotor 38.
  • When an FB current 55 larger than the reference current value A is sent to the servomotor 38, this indicates an abnormal state in which the work 40 has collided with the spindle chuck 31 at a position other than the transfer position of the spindle chuck 31.
  • An example of a position other than the transfer position at which the work 40 collides with the spindle chuck 31 is the end 30 of the spindle chuck 31 described above.
  • the retraction amount Lz in (A3) is the distance that the work 40 is pulled back along the Z-axis direction when the movement of the work 40 is stopped due to the collision of the work 40 at the end 30.
  • the movement amount Lx in (A4) is a distance by which the work 40 can be moved in the X-axis direction during this learning.
  • the work 40 is moved in the X-axis direction by the movement amount Lx, and then the work 40 is moved toward the spindle chuck 31 along the Z-axis direction.
  • the work 40 is moved by the moving amount Lx until the moving position at the time of delivery falls within the allowable range.
  • the movement distance Lmax in (A5) is a limit distance at which the work 40 can be moved in the X-axis direction during learning. That is, even during the learning, the work 40 is not moved farther than the moving distance Lmax.
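  • The command arguments (A1) to (A5) can be grouped into one record. The sketch below is illustrative only: the class `TransferArgs`, its field names, and the helper `candidate_x_positions` are hypothetical, and the helper models only a fixed-step search in one direction, whereas the source leaves the choice of the next X position to the learning unit.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TransferArgs:
    end_x: float          # (A1) X coordinate of the loader end point
    end_z: float          # (A1) Z coordinate of the loader end point
    ref_current: float    # (A2) reference current value A
    retraction_lz: float  # (A3) retraction amount Lz along Z after a collision
    step_lx: float        # (A4) signed movement amount Lx along X during learning
    max_lmax: float       # (A5) maximum moving distance Lmax in one direction


def candidate_x_positions(args: TransferArgs) -> List[float]:
    """List the X positions reachable in steps of Lx without exceeding Lmax.

    Simplification (an assumption, not the source's method): the search
    runs in a single direction with a fixed step.
    """
    if args.step_lx == 0:
        raise ValueError("step_lx must be non-zero")
    positions = [args.end_x]
    n = 1
    while abs(n * args.step_lx) <= args.max_lmax:
        positions.append(args.end_x + n * args.step_lx)
        n += 1
    return positions
```

  • For example, with an end point of X = 10.0, Lx = 0.5 and Lmax = 2.0, the candidates are 10.0, 10.5, 11.0, 11.5 and 12.0.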
  • the analysis unit 12 sends the transfer position information 52 of (A1) and the retraction amount Lz of (A3) to the control unit 13. Further, the analysis unit 12 sends the values of the command arguments (A2), (A4), and (A5) to the machine learning device 20. Note that the analysis unit 12 is not limited to acquiring the value of the command argument from the dedicated command, but may acquire the value corresponding to the command argument from the parameter. In this case, a value corresponding to the command argument is stored in the storage unit 14 as a parameter.
  • the control unit 13 generates the position command 53 according to the transfer position information 52 sent from the analysis unit 12 or the action 58 given from the machine learning device 20.
  • the action 58 is the next position command 53 in the X-axis direction.
  • the control unit 13 sends the position command 53 to the drive unit 37 and the machine learning device 20.
  • the control unit 13 controls the continuation of the transfer operation of the work 40.
  • Upon receiving a notification from the machine learning device 20 indicating that the moving position of the work 40 is out of the allowable range, the control unit 13 generates a position command 53 that pulls back the work 40 along the Z-axis direction by the retraction amount Lz.
  • the machine learning device 20 includes a state observation unit 25 and a learning unit 21.
  • The state observation unit 25 acquires the reference current value A of the command argument (A2) from the analysis unit 12, and the learning unit 21 acquires the movement amount Lx of (A4) and the moving distance Lmax of (A5) from the analysis unit 12.
  • The state observation unit 25 acquires the position command 53 in the X-axis direction and the Z-axis direction from the control unit 13, and acquires the FB position 54 in the X-axis direction and the Z-axis direction and the FB current 55 in the X-axis direction and the Z-axis direction from the drive unit 37.
  • The state observation unit 25 determines whether or not the moving position of the work 40 is within the allowable range based on the position command 53, the FB position 54, and the FB current 55 in the X-axis direction and the Z-axis direction. The state observation unit 25 may instead make this determination based on the position command 53 in the X-axis direction, the FB position 54 in the X-axis direction, and the FB current 55 in the X-axis direction.
  • The state observation unit 25 may also determine whether or not the moving position of the work 40 is within the allowable range based on the position command 53 in the Z-axis direction, the FB position 54 in the Z-axis direction, and the FB current 55 in the Z-axis direction. In addition, the state observation unit 25 may make this determination using the FB position 54 in the Z-axis direction and the position command 53 in the Z-axis direction, without using the FB current 55 in the Z-axis direction.
  • The state observation unit 25 observes the position command 53 in the X-axis direction, the FB position 54 in the X-axis direction, and the FB current 55 in the X-axis direction as the state variable 56, and sends the state variable 56 as the observation result to the learning unit 21. That is, the state variable 56 sent from the state observation unit 25 to the learning unit 21 includes the position command 53 in the X-axis direction, the FB position 54 in the X-axis direction, and the FB current 55 in the X-axis direction.
  • the delivery position of the work 40 may be shifted in the X-axis direction from the center position of the spindle chuck 31. If the transfer position of the work 40 by the loader chuck 32 is not appropriate, the transfer position of the work 40 may be shifted from the center position of the spindle chuck 31 in the X-axis direction. In these cases, there is a high possibility that the workpiece 40 collides with the spindle chuck 31 at a position other than the delivery position of the spindle chuck 31 and stops at a position where the workpiece 40 cannot be gripped by the spindle chuck 31.
  • the state observation unit 25 determines whether or not the moving position of the work 40 is within the allowable range based on the comparison result between the position command 53 and the FB position 54.
  • the state observation unit 25 sends a result of the determination as to whether or not the moving position of the work 40 is within the allowable range to the control unit 13.
  • the learning unit 21 learns the action 58 that is the next position command 53 according to the state variable 56. In other words, the learning unit 21 learns the position command 53 that reduces the amount of displacement of the transfer position of the workpiece 40 by the loader 36.
  • The learning unit 21 learns the action 58 according to a data set created based on the state variable 56 including the position command 53, the FB position 54, and the FB current 55.
  • the learning unit 21 includes a function updating unit 22 and a reward calculating unit 23.
  • the reward calculator 23 calculates a reward 57 based on the state variable 56.
  • the reward calculator 23 calculates the difference between the position indicated by the position command 53 and the FB position 54 based on the state variable 56, and extracts the FB current 55 from the state variable 56.
  • the reward calculation unit 23 increases the reward 57 when the difference between the position indicated by the position command 53 and the FB position 54 is equal to or less than the threshold value and the FB current 55 is equal to or less than the reference current value A. In this case, the reward calculation unit 23 increases the reward 57 as the difference between the position indicated by the position command 53 and the FB position 54 is smaller, and increases the reward 57 as the FB current 55 is smaller.
  • The reward calculation unit 23 sends the calculated reward 57 to the function updating unit 22.
  • the difference between the position indicated by the position command 53 and the FB position 54 may be referred to as a position difference.
  • the function update unit 22 stores a function for determining the action 58, and updates the function for determining the action 58 based on the reward 57.
  • An example of a function for determining the action 58 is an action value function Q(s_t, a_t) described later.
  • The function updating unit 22 of the present embodiment updates the action value function Q(s, a) such that the displacement of the delivery position is reduced each time the delivery operation of the work 40 is repeated in the machine tool 2.
  • The function updating unit 22 calculates the action 58 using the updated action value function Q(s, a).
  • The function updating unit 22 sends the calculated action 58 to the control unit 13, and sends the learning data up to the previous time, the data used for learning, and the data necessary for controlling the loader 36 to the storage unit 14.
  • An example of the learning data is the next position command 53 calculated when the delivery is successful, and an example of the data used for the learning is the action value function Q(s, a) used by the learning unit 21 for the learning.
  • An example of the data used for controlling the loader 36 is the retraction amount Lz.
  • the storage unit 14 stores learning data up to the previous time, data used for learning, and data necessary for controlling the loader 36 by the control unit 13.
  • FIG. 3 is a flowchart illustrating an operation procedure of the machining system according to the embodiment.
  • When the delivery of the work 40 is performed for the first time, that is, when learning has not yet been performed, the numerical controller 10 starts moving the loader 36 to the transfer position specified by the command argument (A1) (step ST1).
  • the control unit 13 sends a position command 53 to the drive unit 37 and the state observation unit 25. Accordingly, the loader 36 moves in the X-axis direction with the loader chuck 32 gripping the work 40, and then moves in the Z-axis direction.
  • the drive unit 37 acquires the FB position 54 and the FB current 55 for each specific cycle, and sends them to the state observation unit 25. Accordingly, the state observation unit 25 monitors the position command 53, the FB position 54, and the FB current 55.
  • the state observation unit 25 determines whether the FB current 55 in each of the X-axis and the Z-axis is equal to or smaller than the reference current value A (step ST2). Note that, as the reference current value A, different values may be used for the X axis and the Z axis.
  • When the FB current 55 in either the X axis or the Z axis is larger than the reference current value A (No in step ST2), the state observation unit 25 sends the state variable 56 including the position command 53 in the X-axis direction and the Z-axis direction, the FB position 54 in the X-axis direction and the Z-axis direction, and the FB current 55 in the X-axis direction and the Z-axis direction to the learning unit 21. Further, the state observation unit 25 notifies the control unit 13 that the movement position of the work 40 is out of the allowable range.
  • The learning unit 21 sets a small reward 57 for the position command 53 used for the transfer. The learning unit 21 thereby learns an appropriate position command 53 according to the state variable 56, and determines the action 58, which is the next position command 53, so that the reward 57 is maximized (step ST4a).
  • The control unit 13 pulls back the work 40 along the Z-axis direction by the retraction amount Lz (step ST5). Then, the control unit 13 starts moving the loader 36 to the transfer position (step ST1). That is, the control unit 13 moves the work 40 in the X-axis direction according to the position command 53 generated from the action 58, and thereafter moves the work 40 in the Z-axis direction.
  • On the other hand, when the FB current 55 in each of the X axis and the Z axis is equal to or smaller than the reference current value A (Yes in step ST2), the state observation unit 25 determines whether or not the position difference between the position indicated by the position command 53 and the FB position 54 in each of the X axis and the Z axis is equal to or smaller than a threshold (step ST3).
  • When the position difference is larger than the threshold (No in step ST3), the state observation unit 25 sends the state variable 56 including the position command 53 in the X-axis direction, the FB position 54 in the X-axis direction, and the FB current 55 in the X-axis direction to the learning unit 21. Further, the state observation unit 25 notifies the control unit 13 that the movement position of the work 40 is out of the allowable range. Thereby, the processing of steps ST4a and ST5 described above is performed, and then the processing of step ST1 is performed again.
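  • The retry loop formed by steps ST1, ST2, ST3, ST4a and ST5 can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented procedure: `run_transfer`, `attempt` and `fake_machine` are hypothetical names, only the X axis is modelled, the next command is chosen by a fixed step Lx instead of by the learning unit 21, and the Z-axis pull-back of step ST5 is assumed to happen inside `attempt`.

```python
from typing import Callable, List, Optional, Tuple


def run_transfer(
    attempt: Callable[[float], Tuple[float, float]],
    x_start: float,
    lx: float,
    lmax: float,
    ref_current: float,
    threshold: float,
) -> Tuple[Optional[float], List[Tuple[float, bool]]]:
    """Retry the hand-over until it succeeds or the search leaves Lmax.

    `attempt(x_cmd)` is a hypothetical callback that performs one approach
    and returns (fb_position, fb_current) for the X axis.
    """
    if lx == 0:
        raise ValueError("lx must be non-zero")
    history: List[Tuple[float, bool]] = []
    x_cmd = x_start
    while abs(x_cmd - x_start) <= lmax:
        fb_position, fb_current = attempt(x_cmd)              # ST1: move to the transfer position
        current_ok = fb_current <= ref_current                # ST2: no collision detected
        position_ok = abs(x_cmd - fb_position) <= threshold   # ST3: commanded position reached
        success = current_ok and position_ok
        history.append((x_cmd, success))
        if success:
            return x_cmd, history
        # ST4a stand-in: a fixed step replaces the learned action 58.
        # ST5 (pull back by Lz) is assumed to be handled by `attempt`.
        x_cmd += lx
    return None, history


def fake_machine(x_cmd: float) -> Tuple[float, float]:
    """Hypothetical stand-in for the machine: success only near X = 10.5."""
    if abs(x_cmd - 10.5) < 0.2:
        return x_cmd, 0.2   # reached the command with a low current
    return x_cmd, 2.0       # collision: current above the reference value
```

  • Calling `run_transfer(fake_machine, 10.0, 0.25, 1.0, 1.0, 0.05)` tries X = 10.0 and 10.25 without success and then succeeds at 10.5.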
  • the learning process of the action 58 by the learning unit 21 will be described.
  • the learning algorithm used for the learning unit 21 may be any learning algorithm.
  • As an example, a case where reinforcement learning (Reinforcement Learning) is applied will be described.
  • In reinforcement learning, an agent (acting subject) in a certain environment observes the current state indicated by the state variable 56 and determines the action 58 to be taken based on the observation result.
  • The agent obtains the reward 57 from the environment by selecting the action 58, and learns a policy that maximizes the reward 57 through a series of actions 58.
  • Q-learning (Q-Learning) and TD learning (TD-Learning) are known as representative methods of reinforcement learning.
  • a general update equation (action value table) of the action value function Q (s, a) is represented by the following equation (1). That is, an example of the action value table is the action value function Q (s, a) of Expression (1).
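  • Equation (1) is not reproduced in this text. Assuming it is the standard Q-learning update, which is consistent with the symbols used in this description (environment s_t, action a_t, reward r_{t+1}, discount rate γ, and learning coefficient α), it can be written as:

```latex
\begin{equation}
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right)
\tag{1}
\end{equation}
```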
  • In Expression (1), s_t represents the environment at time t, and a_t represents the action at time t.
  • The action a_t changes the environment to s_{t+1}.
  • r_{t+1} represents the reward 57 obtained by that change in the environment, γ represents a discount rate, and α represents a learning coefficient.
  • When Q-learning is applied, the next position command 53 of the delivery operation is the action a_t.
  • The update expression represented by Expression (1) increases the action value Q if the action value of the best action a at time t + 1 is larger than the action value Q of the action a executed at time t, and reduces the action value Q in the opposite case. In other words, the action value function Q(s, a) is updated so that the action value Q of the action a at time t approaches the best action value at time t + 1. As a result, the best action value in one environment is sequentially propagated to the action values in the preceding environments.
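  • A tabular version of this update can be sketched in a few lines. The code below is a generic Q-learning step, not code from the source: the function name `q_update`, the dictionary-based table, and the default values of `alpha` and `gamma` are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Iterable[Hashable],
    alpha: float = 0.1,   # learning coefficient (assumed value)
    gamma: float = 0.9,   # discount rate (assumed value)
) -> float:
    """Move Q(state, action) toward reward + gamma * max_a Q(next_state, a)."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Unseen (state, action) pairs start at an action value of 0.0.
q_table: QTable = defaultdict(float)
```

  • In the setting of this description, the state would be derived from the state variable 56 and each action would be a candidate next position command 53 in the X-axis direction.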
  • the reward calculator 23 calculates the reward 57 based on the difference between the position indicated by the position command 53 and the FB position 54 and the FB current 55.
  • the reward calculation unit 23 increases the reward 57 when the position difference between the position indicated by the position command 53 and the FB position 54 is equal to or smaller than the threshold value and the FB current 55 is equal to or smaller than the reference current value A. At this time, the reward calculation unit 23 gives a reward 57 of, for example, "1".
  • the reward calculation unit 23 reduces the reward 57 when the position difference between the position indicated by the position command 53 and the FB position 54 is larger than the threshold value or when the FB current 55 is larger than the reference current value A. At this time, the reward calculation unit 23 gives a reward 57 of, for example, "-1".
  • the reward calculation unit 23 sets the reward 57 as the maximum reward.
  • the reward calculation unit 23 sets the reward 57 to half the maximum reward.
  • An example of the case where the FB current 55 is half of the reference current value A is a case where the transfer of the work 40 is successful, but the transfer position is slightly shifted from the desired position. When the workpiece 40 reaches the delivery position while rubbing against the spindle chuck 31, the FB current 55 increases during rubbing.
  • when the transfer position is slightly deviated from the desired position and the spindle chuck 31 attempts to grip the work 40 at the center of the transfer position after the work 40 has reached the transfer position, the work 40 is pushed by the spindle chuck 31. As a result, the FB current 55 increases. In such a case, the transfer position is within the allowable range, and the machine learning device 20 gives the reward 57 according to whether or not the work 40 has collided with the end 30.
  • the reward calculator 23 sets the reward 57 as the minimum reward when the position difference is larger than the threshold value or when the FB current 55 is larger than the reference current value A.
  • the reward calculation unit 23 sends the calculated reward 57 to the function update unit 22.
  • the function update unit 22 updates a function for determining the behavior 58 according to the reward 57 calculated by the reward calculation unit 23.
  • an action value function Q (s t , a t ) represented by Expression (1) is a function for calculating the action 58, and is updated by the function update unit 22.
  • FIG. 4 is a diagram for explaining a first learning example by the machine learning device according to the embodiment.
  • FIG. 4 shows positions P0 to P6 of the work 40 in the X-axis direction. It is assumed that the position of the work 40 in the X-axis direction when the work 40 collides with the end 30 of the spindle chuck 31 is the position P0. In this case, the machine learning device 20 repeats the process of pulling the work 40 back to the plus side in the Z-axis direction until the work 40 no longer collides with the end 30 of the spindle chuck 31, the process of moving the work 40 to the next position in the X-axis direction, and the process of inserting the work 40 toward the minus side in the Z-axis direction.
  • the machine learning device 20 moves the work 40 in the X-axis direction in the order of position P1, position P2, position P3, position P4, position P5, and position P6, which are positions in the X-axis direction.
  • the distance between the position P0 and the position P1 is a distance Lx.
  • the distance between the position P0 and the position P3 and the distance between the position P0 and the position P6 are each the movement distance Lmax.
  • the machine learning device 20 when moving the work 40 to the position P1, the machine learning device 20 sends an action 58 for loading the work 40 to the position P1 to the control unit 13.
  • the loader 36 moves in the X-axis direction according to the position command 53 corresponding to the action 58.
  • the machine learning device 20 completes moving the workpiece 40 in the X-axis direction.
  • the machine learning device 20 gives a low reward 57 to the position command 53 when the work 40 collides with the end 30, and gives a high reward 57 to the position command 53 when the work 40 does not collide with the end 30.
  • the machine learning device 20 may move the work 40 to the positions P1 to P6 in any order.
  • the machine learning device 20 may move the work 40 in the X-axis direction in order of proximity to the position P0, such as the order of the position P1, the position P4, the position P2, the position P5, the position P3, and the position P6.
  • the machine learning device 20 is not limited to setting the six positions P1 to P6 as the positions in the X-axis direction, and may set five or fewer positions or seven or more positions in the X-axis direction.
  • Step ST3 the learning unit 21 learns the action 58 which is the next position command 53 according to the state variable 56 (Step ST4b). That is, the learning unit 21 sets the reward 57 having a large value for the position command 53 used for delivery, and then determines the action 58 corresponding to the next position command 53.
  • the state observation unit 25 notifies the control unit 13 that the movement position of the work 40 is within the allowable range. Thereby, the control unit 13 proceeds with the process of transferring the work 40 between the chucks.
  • control unit 13 causes the machine tool 2 to execute the above-described operations from s2 to s6. That is, the control unit 13 starts the operation of closing the spindle chuck 31 (step ST6), and waits until the spindle chuck 31 is closed (step ST7). When the spindle chuck 31 is closed, the spindle chuck 31 grips the work 40.
  • the controller 13 starts the operation of opening the loader chuck 32 (step ST8), and waits until the loader chuck 32 is opened (step ST9). Then, the control unit 13 retreats the loader 36 to the plus side in the Z-axis direction (step ST10).
  • the machine learning device 20 learns the position command 53 by the same processing as when the work 40 is loaded from the loader 36 to the spindle chuck 31.
  • the machine learning device 20 causes the delivery of the work 40 to be executed again according to the movement amount Lx, so that the delivery position can be corrected. Further, since the machine learning device 20 learns the transfer position of the work 40, it is possible to prevent the transfer failure. Further, since the machine learning device 20 learns the transfer position of the work 40, the machine learning device 20 can be applied to an environment in which the transfer fails due to a slight displacement such as a collet chuck. Further, since the machine learning device 20 can execute the delivery of the work 40 again and prevent the failure of the delivery, the productivity of the machine tool 2 is improved.
  • since the machine learning device 20 determines the transfer position of the work 40 based on the position difference between the position indicated by the position command 53 and the FB position 54, there is no need to provide a special mechanism or device, such as a camera, for confirming the transfer of the work 40. Therefore, delivery can be confirmed at low cost.
  • since the machine learning device 20 executes the delivery of the work 40 again according to the movement amount Lx, no manual recovery is needed even when the work 40 collides with the end 30 of the spindle chuck 31. Therefore, downtime when loading the work 40 can be reduced, and deterioration in productivity can be suppressed.
  • the position command 53 sent by the numerical controller 10 to the drive unit 37 includes a position command in the X-axis direction, a position command in the Y-axis direction, and a position command in the Z-axis direction.
  • the servomotors 38 include servomotors for moving the loader 36 in the X-axis direction, the Y-axis direction, and the Z-axis direction. Then, the machine learning device 20 learns a position command in the X-axis direction, a position command in the Y-axis direction, and a position command in the Z-axis direction.
  • FIG. 5 is a diagram for explaining a second learning example by the machine learning device according to the embodiment.
  • FIG. 6 is a diagram for explaining a positional relationship between a spindle chuck and a work included in the machine tool according to the embodiment.
  • FIGS. 5 and 6 show positions of the work 40 in the XY plane when the work 40 is viewed from the Z-axis direction.
  • the spindle chuck 31A shown in FIG. 6 is an example of the spindle chuck 31 described in FIG.
  • the spindle chuck 31A is a three-jaw chuck.
  • Each of the three jaws of the spindle chuck 31A moves toward the center Q1 of the circular chuck area 45 to grip the work 40.
  • the numerical controller 10 generates a position command 53 for inserting the work 40 at the center Q1, but the actual work 40 may be inserted at a position shifted from the center Q1, such as the position P0, and the work 40 may collide with the end 30. For this reason, the machine learning device 20 learns the position command 53 such that the work 40 does not collide with the end 30.
  • the machine learning device 20 repeats the process of pulling the work 40 back to the plus side in the Z-axis direction until the work 40 no longer collides with the end 30 of the spindle chuck 31A, the process of moving the work 40 to the next position in the X-axis direction and the Y-axis direction, and the process of inserting the work 40 toward the minus side in the Z-axis direction. In this case, the machine learning device 20 calculates the next movement position of the work 40 based on the movement amount Lx and the movement distance Lmax. That is, when learning the movement position, the work 40 is moved to a position limited by a specific learning direction and a specific learning movement amount in the XY plane.
  • the machine learning device 20 moves the work 40 in the order of the positions P11, P12, P13, P14, P15, P16, P17, P18, and P19, which are positions in the XY plane.
  • the distance between the position P0 and the position P11, the distance between the position P11 and the position P12, and the distance between the position P12 and the position P13 are each the movement amount Lx.
  • the distance between the position P0 and the position P14, the distance between the position P14 and the position P15, and the distance between the position P15 and the position P16 are each the movement amount Lx.
  • the distance between the position P0 and the position P17, the distance between the position P17 and the position P18, and the distance between the position P18 and the position P19 are each the movement amount Lx.
  • the distance between the position P0 and the position P13, the distance between the position P0 and the position P16, and the distance between the position P0 and the position P19 are each the movement distance Lmax.
  • the direction from the position P0 to the position P13 and the direction from the position P0 to the position P16 form a learning angle θ, and the direction from the position P0 to the position P16 and the direction from the position P0 to the position P19 also form a learning angle θ.
  • the machine learning device 20 searches for an appropriate movement position while moving the work 40 in the first direction in steps of the movement amount Lx, up to the maximum movement distance Lmax.
  • the machine learning device 20 then performs the same search in the second direction, which is obtained by rotating the first direction by the learning angle θ.
  • the machine learning device 20 repeats the search for an appropriate movement position in steps of the movement amount Lx and the rotation by the learning angle θ until the sum of the learning angles θ in the chuck area 45 exceeds 360 degrees.
  • the machine learning device 20 when moving the work 40 to the position P11, the machine learning device 20 sends an action 58 for loading the work 40 to the position P11 to the control unit 13. Thereby, the loader 36 moves in the X-axis direction and the Y-axis direction by the position command 53 corresponding to the action 58.
  • the machine learning device 20 completes moving the work 40 in the X-axis direction and the Y-axis direction.
  • the machine learning device 20 may move the work 40 to the positions P11 to P19 in any order.
  • the machine learning device 20 may move the work 40 in order of proximity to the position P0, such as the order of the position P11, the position P14, the position P17, the position P12, the position P15, the position P18, the position P13, the position P16, and the position P19.
  • the machine learning device 20 is not limited to setting the nine positions P11 to P19 as positions in the XY plane, and may set eight or fewer positions or ten or more positions in the XY plane.
  • FIG. 7 is a diagram illustrating a hardware configuration example of the numerical control device according to the embodiment.
  • the numerical controller 10 can be realized by the control circuit 300 shown in FIG.
  • An example of the processor 301 is a CPU (Central Processing Unit), a central processing unit, a processing unit, an arithmetic unit, a microprocessor, a microcomputer, a processor, a DSP (Digital Signal Processor), or a system LSI (Large Scale Integration).
  • Examples of the memory 302 are a RAM (Random Access Memory) and a ROM (Read Only Memory).
  • the numerical controller 10 is realized by the processor 301 reading and executing a program stored in the memory 302 for executing the operation of the numerical controller 10. It can also be said that this program causes a computer to execute the procedure or method of the numerical control device 10.
  • the memory 302 is also used as a temporary memory when the processor 301 executes various processes.
  • a part of the numerical control device 10 may be realized by dedicated hardware, and another part may be realized by software or firmware. Further, the machine learning device 20 may be realized by the control circuit 300 shown in FIG.
  • whether or not the transfer position is within the allowable range is determined based on the FB current 55 and the position difference. However, it may instead be determined based on the position difference alone, without using the FB current 55.
  • the position command 53 to the transfer position is learned based on the FB current 55 in the X-axis direction and the position difference in the X-axis direction, but the position command 53 to the transfer position may instead be learned based on the position difference in the X-axis direction without using the FB current 55.
  • the machine learning device 20 may learn the position command 53 based on the position shift detected on the spindle chuck 31 side.
  • the machine learning device 20 may perform machine learning according to other known methods, such as a neural network, genetic programming, functional logic programming, or a support vector machine.
  • the control unit 13 controls the loader 36 based on the action 58 learned by the learning unit 21, but the control unit 13 may control the loader 36 without using the action 58.
  • the control unit 13 determines whether the movement position of the work 40 is within the allowable range based on the position command 53, the FB position 54, and the FB current 55, and, if the amount of positional deviation is not within the allowable range, outputs a new position command 53 shifted from the position indicated by the position command 53. That is, the control unit 13 searches for an appropriate movement position of the work 40 by shifting the movement position of the work 40 little by little. Specifically, the control unit 13 executes the process of determining the positional deviation and the process of outputting a new position command 53 when the amount of positional deviation is not within the allowable range, once or a plurality of times, thereby keeping the positional deviation within the allowable range.
  • the position command 53 for suppressing the displacement of the transfer position of the work 40 between the chucks is learned in accordance with the data set created based on the state variable 56. Therefore, as the transfer operation of the work 40 is repeated, it is possible to reduce the probability that the delivery of the work 40 fails.
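
The reward rule and the Q-learning update described in the excerpts above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names `reward` and `q_update`, the single coarse state label, the candidate offsets, and all numeric values are assumptions made for the example. Only the +1 / -1 reward rule and the form of the tabular update follow the text.

```python
from collections import defaultdict


def reward(pos_cmd, fb_pos, fb_current, pos_threshold, ref_current):
    """Return +1 when both the position difference and the FB current are
    within their limits, and -1 otherwise (the example values in the text)."""
    position_difference = abs(pos_cmd - fb_pos)
    if position_difference <= pos_threshold and fb_current <= ref_current:
        return 1.0
    return -1.0


def q_update(q, state, action, r, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward
    r + gamma * max over a' of Q(s', a'). alpha and gamma are assumed values."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
    return q[(state, action)]


# Hypothetical setup: actions are X-axis offsets (mm) applied to the next
# position command, and the environment is reduced to one state label.
actions = [-0.2, -0.1, 0.0, 0.1, 0.2]
q = defaultdict(float)

r = reward(pos_cmd=100.0, fb_pos=100.02, fb_current=1.5,
           pos_threshold=0.05, ref_current=2.0)
q_update(q, "handoff", 0.1, r, "handoff", actions)
```

A real state would presumably encode the position command and the feedback data rather than a single label; the table form is used here only because the text refers to an action value table.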

Landscapes

  • Engineering & Computer Science (AREA)
  • Mechanical Engineering (AREA)
  • Numerical Control (AREA)
  • Machine Tool Sensing Apparatuses (AREA)
  • Gripping On Spindles (AREA)
  • Turning (AREA)

Abstract

A machine learning device (20), which learns a position command (53) for a drive unit (37) that moves a loader chuck (32) when a workpiece (40) is transferred between the loader chuck (32), which grips and transports the workpiece (40), and a main shaft chuck (31), which grips and receives the workpiece (40), said machine learning device being equipped with: a state observation unit (25) for observing, as state variables (56), the position command (53) for the drive unit (37) and feedback data from the drive unit (37); and a learning unit (21) for learning the position command (53) for suppressing deviation in the transfer position of the workpiece (40) between the loader chuck (32) and the main shaft chuck (31), in accordance with a data set created on the basis of the state variables (56).

Description

Machine learning device, numerical control device, machine tool, and machine learning method
The present invention relates to a machine learning device, a numerical control device, a machine tool, and a machine learning method for learning a work transfer operation.
In a machine tool such as a lathe, when a work is transferred between a chuck that grips and sends the work and a chuck that grips and receives the work, a loader that conveys the work moves the work gripped by the sending-side chuck to a transfer position. In some cases the work cannot be moved to the center position of the chuck area on the receiving side, for example because a long work bends or because the chuck fails to grip the work properly. When the transfer position deviates from the appropriate position in this way, the transfer may fail, so a technique capable of suppressing the positional deviation of the transfer position is desired.
The loader control device described in Patent Literature 1 expresses, as a function, the correlation between the amount of deviation of the work transfer position and the motor torque of the servomotor that drives the loader, predicts the amount of deviation of the transfer position based on this correlation and the measured motor torque, and corrects the transfer position based on the predicted amount of deviation.
Patent Literature 1: JP-A-2002-187040
However, in Patent Literature 1, the function indicating the correlation is a fixed function, so the probability that the transfer of the work fails does not decrease even if the work transfer operation is repeated.
The present invention has been made in view of the above, and an object of the present invention is to obtain a machine learning device that can reduce the probability of failing to transfer a work by learning the work transfer operation.
In order to solve the above-described problem and achieve the object, the present invention provides a machine learning device that learns a position command to a drive mechanism that moves a first chuck when a work is transferred between the first chuck, which grips and sends the work, and a second chuck, which grips and receives the work. The machine learning device includes: a state observation unit that observes, as state variables, the position command to the drive mechanism and feedback data from the drive mechanism; and a learning unit that learns, in accordance with a data set created based on the state variables, a position command that suppresses positional deviation of the work transfer position between the first chuck and the second chuck.
The machine learning device according to the present invention has the effect that, by learning the work transfer operation, it can reduce the probability that the work transfer fails.
FIG. 1 is a diagram illustrating a configuration of a machining system according to an embodiment.
FIG. 2 is a diagram illustrating a configuration of a control system including a numerical control device according to the embodiment.
FIG. 3 is a flowchart illustrating an operation procedure of the machining system according to the embodiment.
FIG. 4 is a diagram for describing a first learning example by the machine learning device according to the embodiment.
FIG. 5 is a diagram for describing a second learning example by the machine learning device according to the embodiment.
FIG. 6 is a diagram for explaining a positional relationship between a spindle chuck and a work included in the machine tool according to the embodiment.
FIG. 7 is a diagram illustrating a hardware configuration example of the numerical control device according to the embodiment.
Hereinafter, a machine learning device, a numerical control device, a machine tool, and a machine learning method according to an embodiment of the present invention will be described in detail with reference to the drawings. The present invention is not limited by this embodiment.
Embodiment
FIG. 1 is a diagram illustrating a configuration of a machining system according to the embodiment. FIG. 1 shows the machining system 1 as viewed from the vertical direction. In the present embodiment, a case will be described in which the vertical direction is the Y-axis direction and the horizontal directions, in which the work 40 moves, are the X-axis direction and the Z-axis direction.
The machining system 1 includes a machine tool 2 that machines a work 40 and a control system 3 that controls the operation of the machine tool 2. Examples of the machine tool 2 are a lathe and a machining center. Hereinafter, a case where the machine tool 2 is a lathe will be described.
The machine tool 2 includes a rotating unit 35, a loader chuck 32 that is a first chuck, a spindle chuck 31 that is a second chuck, and a loader 36 that is a transfer mechanism for the work 40. The operation of the loader 36 is controlled by the control system 3. The loader chuck 32 is connected to the loader 36 and moves together with the loader 36. The loader chuck 32 can grip the work 40, which is the object to be machined. Examples of the loader chuck 32 are a three-jaw chuck and a collet chuck. The loader chuck 32 passes the work 40 to the spindle chuck 31 when machining of the work 40 starts, and receives the work 40 from the spindle chuck 31 after machining of the work 40 is completed.
The rotating unit 35 rotates about the Z axis, which is the main axis. The spindle chuck 31 is connected to the rotating unit 35 and rotates together with the rotating unit 35. The spindle chuck 31 can grip the work 40. Examples of the spindle chuck 31 are a three-jaw chuck and a collet chuck. When the work 40 is machined, the rotating unit 35 rotates with the spindle chuck 31 gripping the work 40, thereby rotating the work 40. An example of the rotating unit 35 is a spindle mechanism.
When loading the work 40 onto the rotating unit 35, the machine tool 2 grips one end of the work 40 with the loader chuck 32. In this state, the loader 36 moves to the minus side in the X-axis direction and stops at a position facing the spindle chuck 31 (s0). Then, the loader 36 moves to the minus side in the Z-axis direction. Thus, the loader 36 moves the work 40 to a position where the spindle chuck 31 can grip the work 40 (s1). The position of the loader 36 at which the spindle chuck 31 can grip the work 40 is the desired transfer position. The machine tool 2 starts the closing operation of the spindle chuck 31 using an auxiliary function such as an M code (s2), and waits until the closing operation of the spindle chuck 31 is completed (s3). When the spindle chuck 31 closes, the spindle chuck 31 grips the other end of the work 40.
After this, the machine tool 2 starts the opening operation of the loader chuck 32 (s4) and waits until the opening operation of the loader chuck 32 is completed (s5). Then, the loader 36 moves to the plus side in the Z-axis direction. As a result, the loader 36 retreats in the direction away from the work 40 (s6), and further moves to the plus side in the X-axis direction.
When machining of the work 40 is completed, the machine tool 2 unloads the work 40 from the rotating unit 35 by performing the processes s0 to s6 described above in reverse order. In this case, each of the processes s0 to s6 is itself performed in the reverse direction. That is, in the processes s0 and s6, the moving direction of the loader 36 is opposite between loading and unloading. In addition, during unloading, the closing operation of the loader chuck 32 and the opening operation of the spindle chuck 31 are performed.
Specifically, during unloading, the loader 36 moves to the minus side in the X-axis direction and then moves in the direction approaching the work 40. The loader chuck 32 then starts its closing operation. When the closing operation is completed, the spindle chuck 31 starts its opening operation. When the opening operation is completed, the loader 36 retreats in the direction away from the work 40 and further moves to the plus side in the X-axis direction.
Since the process of loading the work 40 onto the rotating unit 35 and the process of unloading the work 40 from the rotating unit 35 are similar, the process of loading the work 40 onto the rotating unit 35 will be described below.
The machine tool 2 has machine-specific quirks in processes such as gripping the work 40 and conveying it. For this reason, the relationship between the position command to the loader chuck 32 and the actual position of the loader chuck 32 in the X-axis direction depends on these machine-specific quirks. Consequently, even when the work 40 is loaded with what should be an appropriate position command, the position of the work 40 in the X-axis direction may be shifted when the loader chuck 32 attempts to pass the work 40 to the spindle chuck 31, and the work 40 may collide with the end 30 of the spindle chuck 31 on the plus side in the Z-axis direction. In this case, the loader chuck 32 cannot pass the work 40 to the spindle chuck 31.
In the present embodiment, a numerical control (NC) device 10, which is included in the control system 3 and described later, learns the position of the loader chuck 32 in the X-axis direction at the time the loader chuck 32 attempts to pass the work 40 to the spindle chuck 31. That is, the numerical control device 10 learns the position command to the loader chuck 32 and thereby reduces the probability that the transfer of the work 40 fails. If the posture and shape of the work 40 gripped by the loader chuck 32 are the same every time, the position of the work 40 in the X-axis direction corresponds to the position of the loader chuck 32 in the X-axis direction. Therefore, in the present embodiment, the amount of positional deviation of the loader chuck 32 in the X-axis direction and the amount of positional deviation of the work 40 in the X-axis direction are used synonymously.
Next, the configuration of the numerical control device 10 that controls the operation of the machine tool 2 will be described. FIG. 2 is a diagram illustrating a configuration of a control system including the numerical control device according to the embodiment. The control system 3 includes the numerical control device 10, a drive unit 37, and a servomotor 38.
The numerical control device 10 is a computer that controls the position of the loader 36 by sending a position command 53 to the drive unit 37. The position command 53 sent by the numerical control device 10 to the drive unit 37 specifies the position of the loader 36 and includes a position command in the X-axis direction and a position command in the Z-axis direction. The numerical control device 10 controls the transfer of the work 40 from the loader chuck 32 to the spindle chuck 31 and, when the transfer fails, changes the X-axis position command 53 to the loader chuck 32 and controls another transfer attempt. The numerical control device 10 learns an appropriate X-axis position command 53 to the loader chuck 32 based on the X-axis position command 53 that was used and on whether the transfer performed with that position command 53 failed or succeeded.
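
The retry behavior described above, combined with the search pattern of the learning examples (steps of the movement amount Lx up to the movement distance Lmax, along directions separated by a learning angle θ), can be sketched as below. This is an illustrative sketch under stated assumptions: the names `candidate_positions`, `retry_handoff`, and the `try_handoff` callback are hypothetical, Lmax is assumed to be an integer multiple of Lx, and the stopping rule for the rotation is one reading of the text. Setting θ to 180 degrees reduces the pattern to the X-axis-only case.

```python
import math


def candidate_positions(p0, lx, lmax, theta_deg):
    """Yield (x, y) candidates around p0: steps of lx out to lmax along each
    direction, rotating by theta_deg until a full turn has been covered."""
    x0, y0 = p0
    steps = int(round(lmax / lx))  # assumes lmax is a multiple of lx
    k = 0
    while k * theta_deg < 360.0:
        angle = math.radians(k * theta_deg)
        for i in range(1, steps + 1):
            yield (x0 + i * lx * math.cos(angle),
                   y0 + i * lx * math.sin(angle))
        k += 1


def retry_handoff(p0, lx, lmax, theta_deg, try_handoff):
    """Return the first candidate for which the hypothetical callback
    try_handoff(position) reports success, or None if none succeeds."""
    for pos in candidate_positions(p0, lx, lmax, theta_deg):
        if try_handoff(pos):
            return pos
    return None


# X-axis-only search (theta = 180 degrees): three steps on each side of P0.
x_only = list(candidate_positions((0.0, 0.0), 1.0, 3.0, 180.0))
# Three directions 120 degrees apart: nine candidates in the XY plane.
xy = list(candidate_positions((0.0, 0.0), 1.0, 3.0, 120.0))
# Stand-in for a real transfer attempt: succeeds only left of x = -1.5.
found = retry_handoff((0.0, 0.0), 1.0, 3.0, 180.0, lambda p: p[0] < -1.5)
```

In practice `try_handoff` would stand for pulling the work back, issuing the shifted position command, reinserting the work, and checking the feedback data.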
The drive unit 37 is a drive mechanism that moves the loader 36 by driving the servomotor 38. Based on the position command 53 from the numerical control device 10, the drive unit 37 calculates the current value to send to the servomotor 38, and it drives the servomotor 38 by sending the current corresponding to the position command 53. The drive unit 37 also sends a feedback (FB) current 55, which is data indicating the current sent to the servomotor 38, to the numerical control device 10. The FB current 55 is an example of feedback data from the drive unit 37 to the numerical control device 10.
When the encoder 39 sends information indicating the rotation speed of the servomotor 38, the drive unit 37 calculates from this rotation speed an FB position 54, which is data indicating the current position of the work 40, and sends it to the numerical control device 10. The FB position 54 is an example of feedback data from the drive unit 37 to the numerical control device 10.
The servomotor 38 is connected to the loader 36 and moves the loader 36 according to the current from the drive unit 37. The servomotor 38 includes a servomotor that moves the loader 36 in the X-axis direction and a servomotor that moves the loader 36 in the Z-axis direction. Each of the X-axis and Z-axis servomotors 38 is fitted with an encoder 39 that detects the rotation speed of that servomotor 38. The encoder 39 transmits information indicating the detected rotation speed to the drive unit 37.
The numerical control device 10 includes a control machining program storage unit 11, an analysis unit 12, a control unit 13, a storage unit 14, and a machine learning device 20. The control machining program storage unit 11 stores the control machining program used when machining the work 40. The control machining program includes a loading command 61 for loading the work 40 onto the rotating unit 35, a machining command for machining the work 40, and an unloading command for unloading the work 40 from the rotating unit 35. FIG. 2 shows the loading command 61. Of these commands, the loading command 61 and the unloading command are dedicated commands for transferring the work 40. The loading command 61 is sent to the analysis unit 12 as a G code 51 for positioning the loader 36.
The analysis unit 12 analyzes the control machining program. The analysis unit 12 determines whether the analyzed command is a dedicated command and, if it is a dedicated command such as the loading command 61, generates transfer position information 52 indicating the position at which the work 40 is transferred, based on the G code 51. That is, the analysis unit 12 generates the transfer position information 52 from the positioning command for the loader 36 contained in the G code 51. The transfer position information 52 is information on the position at which the work 40 is handed over between the loader chuck 32 and the spindle chuck 31. Specifically, the transfer position information 52 is the end point of the loader 36. When the work 40 is transferred for the first time, the information used for the transfer operation is set by the arguments of the dedicated command.
The dedicated command includes the following command arguments (A1) to (A5).
(A1) The end point of the loader 36, which is the transfer position information 52 of the work 40
(A2) The reference current value A, which is the criterion for determining whether a collision has occurred during transfer of the work 40
(A3) The retraction amount Lz of the loader 36 when a collision is determined
(A4) The direction and the movement amount Lx by which the loader 36 is moved during learning
(A5) The maximum movement distance Lmax in one direction during learning
The transfer position information 52 of (A1) described above includes an X coordinate and a Z coordinate. The reference current value A of (A2) is a value for determining whether the transfer is in an abnormal state, and it is compared with the FB current 55, which indicates the current sent to the servomotor 38. A case in which an FB current 55 larger than the reference current value A is sent to the servomotor 38 is an abnormal state, such as the work 40 colliding with the spindle chuck 31 at a position other than the transfer position of the spindle chuck 31. An example of a position, other than the transfer position, at which the work 40 collides is the end 30 of the spindle chuck 31 described above. The retraction amount Lz of (A3) is the distance by which the work 40 is pulled back along the Z-axis direction when the work 40 collides with the end 30 and stops moving.
When the work 40 collides with the end 30 and stops moving, the machine learning device 20 learns the position command 53. The movement amount Lx of (A4) is the distance by which the work 40 is moved in the X-axis direction during this learning. The work 40 is moved in the X-axis direction by the movement amount Lx and is then moved along the Z-axis direction toward the spindle chuck 31. The work 40 is moved in increments of the movement amount Lx until its movement position at the time of transfer falls within the allowable range. The movement distance Lmax of (A5) is the limit distance over which the work 40 may be moved in the X-axis direction during learning. That is, even during learning, the work 40 is never moved farther than the movement distance Lmax.
The analysis unit 12 sends the transfer position information 52 of (A1) and the retraction amount Lz of (A3) to the control unit 13. The analysis unit 12 also sends the values of the command arguments (A2), (A4), and (A5) to the machine learning device 20. The analysis unit 12 is not limited to obtaining the command argument values from the dedicated command; it may instead obtain the corresponding values from parameters. In that case, the values corresponding to the command arguments are stored in the storage unit 14 as parameters.
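The routing of the command arguments can be illustrated with a minimal Python sketch. The class name `TransferCommandArgs`, its field names, and the two helper methods are hypothetical and chosen only for illustration; the patent does not specify any data layout. The signed `step_lx` field is an assumption used to carry both the direction and the amount of (A4).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferCommandArgs:
    """Hypothetical container for the dedicated-command arguments (A1) to (A5)."""
    end_x: float            # (A1) X coordinate of the loader end point
    end_z: float            # (A1) Z coordinate of the loader end point
    ref_current: float      # (A2) reference current value A
    pullback_lz: float      # (A3) retraction amount Lz
    step_lx: float          # (A4) movement amount Lx; the sign encodes the direction (assumption)
    max_travel_lmax: float  # (A5) maximum movement distance Lmax in one direction

    def for_control_unit(self):
        # (A1) and (A3) are sent to the control unit.
        return {"end_point": (self.end_x, self.end_z), "Lz": self.pullback_lz}

    def for_learning_device(self):
        # (A2), (A4), and (A5) are sent to the machine learning device.
        return {"A": self.ref_current, "Lx": self.step_lx, "Lmax": self.max_travel_lmax}


args = TransferCommandArgs(120.0, -35.0, 2.0, 5.0, 0.1, 0.3)
control_part = args.for_control_unit()
learning_part = args.for_learning_device()
```

Keeping the two views separate mirrors the description: the control unit only needs the geometry of the move, while the learning side only needs the judgment criterion and the search limits.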
The control unit 13 generates the position command 53 according to the transfer position information 52 sent from the analysis unit 12 or the action 58 given by the machine learning device 20. The action 58 is the next position command 53 in the X-axis direction. The control unit 13 sends the position command 53 to the drive unit 37 and the machine learning device 20. On receiving a notification from the machine learning device 20 that the movement position of the work 40 is within the allowable range, the control unit 13 controls the remainder of the transfer operation of the work 40.
On receiving a notification from the machine learning device 20 that the movement position of the work 40 is outside the allowable range, the control unit 13 pulls the work 40 back along the Z-axis direction by the retraction amount Lz and generates the position command 53 according to the action 58.
The machine learning device 20 includes a state observation unit 25 and a learning unit 21. The state observation unit 25 acquires the reference current value A of (A2) from the analysis unit 12, and the learning unit 21 acquires the movement amount Lx of (A4) and the movement distance Lmax of (A5) from the analysis unit 12.
The state observation unit 25 acquires the X-axis and Z-axis position commands 53 from the control unit 13, the X-axis and Z-axis FB positions 54 from the drive unit 37, and the X-axis and Z-axis FB currents 55 from the drive unit 37.
When determining whether the movement position of the work 40 is within the allowable range, the state observation unit 25 uses the X-axis and Z-axis position commands 53, the X-axis and Z-axis FB positions 54, and the X-axis and Z-axis FB currents 55. Alternatively, the state observation unit 25 may make this determination based on the X-axis position command 53, the X-axis FB position 54, and the X-axis FB current 55 only, or based on the Z-axis position command 53, the Z-axis FB position 54, and the Z-axis FB current 55 only. The state observation unit 25 may also make the determination based on the Z-axis FB position 54 and the Z-axis position command 53 without using the Z-axis FB current 55.
When the learning unit 21 learns the position command 53 to the drive unit 37, the state observation unit 25 observes the X-axis position command 53, the X-axis FB position 54, and the X-axis FB current 55 as a state variable 56 and sends the observed state variable 56 to the learning unit 21. That is, the state variable 56 sent from the state observation unit 25 to the learning unit 21 contains the X-axis position command 53, the X-axis FB position 54, and the X-axis FB current 55.
Depending on the shape of the work 40, the transfer position of the work 40 may deviate in the X-axis direction from the center position of the spindle chuck 31. The transfer position may also deviate in the X-axis direction from the center position of the spindle chuck 31 when the position at which the loader chuck 32 presents the work 40 is not appropriate. In these cases, it becomes more likely that the work 40 collides with the spindle chuck 31 at a position other than the transfer position and stops at a position where the spindle chuck 31 cannot grip it.
When the work 40 collides with the spindle chuck 31 at a position other than the transfer position, the movement position of the work 40 is outside the allowable range. The movement position of the work 40 is also outside the allowable range when the work 40 reaches the transfer position while rubbing against the spindle chuck 31.
In an abnormal case where the movement position of the work 40 is outside the allowable range, the load on the loader 36 increases in at least one of the X-axis and Z-axis directions. Consequently, the current sent to the servomotor 38 rises, and the FB current 55 rises with it. The state observation unit 25 therefore determines whether the movement position of the work 40 is within the allowable range based on a comparison between the reference current value A and the FB current 55. In the normal case where the movement position is within the allowable range, the work 40 stops at the transfer position of the spindle chuck 31, so the load on the loader 36 does not increase and the FB current 55 does not rise.
In addition, in an abnormal case where the movement position of the work 40 is outside the allowable range, the position of the loader 36 corresponding to the position command 53 and the position of the loader 36 corresponding to the FB position 54 do not become equal within a specific time. The state observation unit 25 therefore also determines whether the movement position of the work 40 is within the allowable range based on a comparison between the position command 53 and the FB position 54. The state observation unit 25 sends the result of this determination to the control unit 13.
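The two criteria described above can be combined in a short Python sketch. The function name `within_allowable_range`, the `pos_threshold` parameter, and the use of absolute values are assumptions for illustration; the text states only that the FB current 55 is compared with the reference current value A and that the position command 53 is compared with the FB position 54.

```python
def within_allowable_range(cmd_pos, fb_pos, fb_current, ref_current, pos_threshold):
    """Return True when one axis looks normal under both criteria.

    cmd_pos       -- position given by the position command
    fb_pos        -- FB position reported by the drive unit
    fb_current    -- FB current reported by the drive unit
    ref_current   -- reference current value A
    pos_threshold -- largest tolerated |command - feedback| (hypothetical parameter)
    """
    current_ok = abs(fb_current) <= ref_current
    position_ok = abs(cmd_pos - fb_pos) <= pos_threshold
    return current_ok and position_ok


def all_axes_within_range(observations, ref_current, pos_threshold):
    """observations maps an axis name to (cmd_pos, fb_pos, fb_current)."""
    return all(
        within_allowable_range(cmd, fb, cur, ref_current, pos_threshold)
        for cmd, fb, cur in observations.values()
    )


normal = {"X": (10.0, 10.0, 0.4), "Z": (-35.0, -35.0, 0.6)}
collided = {"X": (10.0, 10.0, 0.4), "Z": (-35.0, -31.5, 2.8)}
```

In this sketch `all_axes_within_range(normal, 2.0, 0.05)` evaluates to true, while the `collided` observation fails on the Z axis because both the current and the position error exceed their limits.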
The learning unit 21 learns the action 58, which is the next position command 53, according to the state variable 56. In other words, the learning unit 21 learns a position command 53 that reduces the positional deviation of the position at which the loader 36 transfers the work 40.
Specifically, the learning unit 21 learns the action 58 according to a data set created from the state variable 56, which contains the position command 53, the FB position 54, and the FB current 55. The learning unit 21 includes a function update unit 22 and a reward calculation unit 23.
The reward calculation unit 23 calculates a reward 57 based on the state variable 56. From the state variable 56, the reward calculation unit 23 calculates the difference between the position indicated by the position command 53 and the FB position 54 and extracts the FB current 55. The reward calculation unit 23 increases the reward 57 when this difference is at or below a threshold and the FB current 55 is at or below the reference current value A. In that case, the reward calculation unit 23 increases the reward 57 further as the difference becomes smaller and as the FB current 55 becomes smaller. The learning unit 21 sends the calculated reward 57 to the function update unit 22. In the following description, the difference between the position indicated by the position command 53 and the FB position 54 may be referred to as the position difference.
The function update unit 22 stores a function for determining the action 58 and updates that function based on the reward 57. An example of the function for determining the action 58 is the action value function Q(s_t, a_t) described later. Each time the transfer operation of the work 40 is repeated on the machine tool 2, the function update unit 22 of the present embodiment updates the action value function Q(s, a) so that the positional deviation of the transfer position decreases. The function update unit 22 calculates the action 58 using the updated action value function Q(s, a). The function update unit 22 sends the calculated action 58 to the control unit 13 and sends the learning data up to the previous time, the data used for learning, and the data needed to control the loader 36 to the storage unit 14. An example of the learning data is the position command 53 for the next time that was calculated when a transfer succeeded, and an example of the data used for learning is the action value function Q(s, a) used by the learning unit 21. An example of the data used to control the loader 36 is the retraction amount Lz. The storage unit 14 stores the learning data up to the previous time, the data used for learning, and the data the control unit 13 needs to control the loader 36.
Next, the operation procedure of the machining system 1 will be described. FIG. 3 is a flowchart illustrating the operation procedure of the machining system according to the embodiment. When the numerical control device 10 reads the loading command 61 and the work 40 is being transferred for the first time, that is, nothing has been learned yet, it starts moving the loader 36 to the transfer position specified by the command argument (A1) described above (step ST1). At this time, the control unit 13 sends the position command 53 to the drive unit 37 and the state observation unit 25. As a result, with the loader chuck 32 gripping the work 40, the loader 36 moves in the X-axis direction and then in the Z-axis direction.
Once the work 40 starts moving, the drive unit 37 acquires the FB position 54 and the FB current 55 at a specific period and sends them to the state observation unit 25. The state observation unit 25 thereby monitors the position command 53, the FB position 54, and the FB current 55.
The state observation unit 25 determines whether the FB current 55 on each of the X axis and the Z axis is at or below the reference current value A (step ST2). Different values of the reference current value A may be used for the X axis and the Z axis.
If the FB current 55 on the X axis or the Z axis exceeds the reference current value A before the transfer position is reached (step ST2, No), the state observation unit 25 sends to the learning unit 21 a state variable 56 containing the X-axis and Z-axis position commands 53, the X-axis and Z-axis FB positions 54, and the X-axis and Z-axis FB currents 55. The state observation unit 25 also notifies the control unit 13 that the movement position of the work 40 is outside the allowable range.
When the FB current 55 of an axis is larger than the reference current value A, that is, when the movement to the transfer position has failed, the learning unit 21 assigns a small reward 57 to the position command 53 that was used for the transfer. The learning unit 21 thereby learns an appropriate position command 53 according to the state variable 56 and determines the action 58, which is the next position command 53, so that the reward 57 is maximized (step ST4a). The control unit 13 pulls the work 40 back along the Z-axis direction by the retraction amount Lz (step ST5) and then starts moving the loader 36 to the transfer position again (step ST1). In doing so, the control unit 13 moves the work 40 in the X-axis direction with the position command 53 generated according to the action 58 and then moves the work 40 in the Z-axis direction.
If, in step ST2, the FB current 55 on each of the X axis and the Z axis is at or below the reference current value A (step ST2, Yes), the state observation unit 25 determines whether the position difference between the position indicated by the position command 53 and the FB position 54 on each of the X axis and the Z axis is at or below the threshold (step ST3).
If the FB position 54 and the position command 53 differ on the X axis and the Z axis (step ST3, No), the state observation unit 25 sends to the learning unit 21 a state variable 56 containing the X-axis position command 53, the X-axis FB position 54, and the X-axis FB current 55. The state observation unit 25 also notifies the control unit 13 that the movement position of the work 40 is outside the allowable range. The processing of steps ST4a and ST5 described above is then performed, followed by the processing of step ST1.
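The retry loop formed by steps ST1 to ST5 can be sketched in Python as follows. This is only an illustrative outline under stated assumptions: `attempt` and `choose_next` are hypothetical callables standing in for the machine and for the learning unit, the sketch handles the X axis only, the pull-back of step ST5 is assumed to happen inside `attempt`, and the `max_attempts` cap is added here for safety and is not part of the flowchart.

```python
def run_transfer(start_x, attempt, choose_next, ref_current, pos_threshold, max_attempts=10):
    """Repeat the transfer until both checks pass or the attempt budget runs out.

    attempt(x)                          -> (fb_pos, fb_current) observed for command x
    choose_next(x, fb_pos, fb_current)  -> next X position command (the action)
    """
    x = start_x
    history = []
    for _ in range(max_attempts):
        fb_pos, fb_current = attempt(x)                  # ST1: move to the transfer position
        current_ok = fb_current <= ref_current           # ST2: current check
        position_ok = abs(x - fb_pos) <= pos_threshold   # ST3: position-difference check
        success = current_ok and position_ok
        history.append((x, success))
        if success:
            return x, history
        x = choose_next(x, fb_pos, fb_current)           # ST4a: decide the next command
        # ST5 (pull back by Lz) is assumed to be performed inside attempt().
    return None, history


def fake_attempt(x):
    # Toy machine: the chuck opening is centred at 5.0 and is 2.0 wide.
    if abs(x - 5.0) <= 1.0:
        return x, 0.2        # reaches the commanded position with a low current
    return x - 0.5, 2.5      # stalls short of the target with a high current


final_x, log = run_transfer(2.0, fake_attempt, lambda x, p, c: x + 1.0, 2.0, 0.05)
```

With this toy machine the first two commands (2.0 and 3.0) fail and the third (4.0) succeeds, so `log` records three attempts.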
The learning process of the action 58 by the learning unit 21 will now be described. Any learning algorithm may be used for the learning unit 21. Here, a case in which reinforcement learning is applied as the learning algorithm is described. In reinforcement learning, an agent, which is the acting entity in a given environment, observes the current state indicated by the state variable 56 and determines the action 58 to take based on the observation. By selecting the action 58, the agent obtains a reward 57 from the environment, and through a series of actions 58 it learns the policy that yields the greatest reward 57. Q-learning and TD-learning are known as representative reinforcement learning methods. In Q-learning, for example, the general update formula (action value table) of the action value function Q(s, a) is given by the following equation (1). That is, an example of the action value table is the action value function Q(s, a) of equation (1).
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right) \tag{1}$$
In equation (1), s_t represents the environment at time t, and a_t represents the action at time t. The action a_t changes the environment to s_{t+1}. r_{t+1} represents the reward 57 received as a result of that change in the environment, γ represents the discount rate, and α represents the learning coefficient. When Q-learning is applied, the next position command 53 for the transfer operation is the action a_t.
The update formula of equation (1) increases the action value Q if the action value of the best action a at time t+1 is larger than the action value Q of the action a executed at time t, and decreases the action value Q in the opposite case. In other words, the action value function Q(s, a) is updated so that the action value Q of the action a at time t approaches the best action value at time t+1. In this way, the best action value in a given environment propagates successively to the action values in the environments that preceded it.
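As an illustration of this behavior, the standard tabular Q-learning update, Q(s_t, a_t) ← Q(s_t, a_t) + α(r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)), can be sketched in Python. The table layout (a dictionary keyed by state and action), the state labels, and the action names "+Lx" and "-Lx" are assumptions made for this sketch; the patent does not describe how the action value table is stored.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q        -- mapping from (state, action) to the action value, default 0.0
    actions  -- the actions available in next_state
    alpha    -- learning coefficient
    gamma    -- discount rate
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


actions = ["+Lx", "-Lx"]          # hypothetical action labels
q = defaultdict(float)
first = q_update(q, "P0", "+Lx", 1.0, "P1", actions, alpha=0.5, gamma=0.9)
q[("P1", "-Lx")] = 2.0            # pretend a later state is already known to be valuable
second = q_update(q, "P0", "+Lx", 1.0, "P1", actions, alpha=0.5, gamma=0.9)
```

The second update is larger than the first because the best action value of the successor state has grown, which is the propagation effect described in the paragraph above.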
The reward calculation unit 23 calculates the reward 57 based on the position difference between the position indicated by the position command 53 and the FB position 54, and on the FB current 55.
As described above, the reward calculation unit 23 increases the reward 57 when the position difference between the position indicated by the position command 53 and the FB position 54 is at or below the threshold and the FB current 55 is at or below the reference current value A. In this case, the reward calculation unit 23 gives a reward 57 of, for example, "1".
Conversely, the reward calculation unit 23 reduces the reward 57 when the difference between the position indicated by the position command 53 and the FB position 54 is larger than the threshold, or when the FB current 55 is larger than the reference current value A. In this case, the reward calculation unit 23 gives a reward 57 of, for example, "-1".
For example, the reward calculation unit 23 sets the reward 57 to the maximum reward when the position difference is 0 and the change in the FB current 55 is 0. When the position difference is at or below the threshold and the FB current 55 is half the reference current value A, the reward calculation unit 23 sets the reward 57 to half the maximum reward. An example of the FB current 55 being half the reference current value A is a case in which the transfer of the work 40 succeeds but the transfer position is slightly off the desired position. When the work 40 reaches the transfer position while rubbing against the spindle chuck 31, the FB current 55 is elevated for as long as the rubbing continues. Likewise, when the transfer position is slightly off the desired position and the spindle chuck 31, after the transfer position is reached, tries to grip the work 40 at the center of the transfer position, the work 40 is pushed by the spindle chuck 31 and the FB current 55 rises. In such cases the transfer position is within the allowable range, but the machine learning device 20 gives a reward 57 that lies between the reward for the case in which the work 40 does not collide with the end 30 and the reward for the case in which it does.
The reward calculation unit 23 sets the reward 57 to the minimum reward when the position difference is larger than the threshold or when the FB current 55 is larger than the reference current value A. The reward calculation unit 23 sends the calculated reward 57 to the function update unit 22.
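One possible reward shaping that is consistent with these anchor points is sketched below in Python. The interpolation rule is an assumption: the description fixes only the maximum reward, the half reward at half the reference current value A, the minimum reward on failure, and the direction of the trend. The sketch also uses the magnitude of the FB current rather than its change, and the function and parameter names are hypothetical. Both `ref_current` and `pos_threshold` are assumed to be positive.

```python
def compute_reward(pos_diff, fb_current, ref_current, pos_threshold, r_max=1.0, r_min=-1.0):
    """Return a graded reward for one transfer attempt.

    Failure (either limit exceeded) gives r_min. Otherwise the reward falls
    linearly from r_max as the worse of the two normalised quantities grows.
    """
    d = abs(pos_diff)
    i = abs(fb_current)
    if d > pos_threshold or i > ref_current:
        return r_min
    load = max(i / ref_current, d / pos_threshold)  # 0.0 is ideal, 1.0 is at the limit
    return r_max * (1.0 - load)


ideal = compute_reward(0.0, 0.0, 2.0, 0.1)      # no position error, no current
rubbing = compute_reward(0.0, 1.0, 2.0, 0.1)    # FB current at half of A
collision = compute_reward(0.0, 3.0, 2.0, 0.1)  # FB current above A
```

Under this shaping the rubbing case lands midway between the ideal case and a collision, which matches the intermediate reward described for transfers that succeed with a slightly offset position.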
The function update unit 22 updates the function for determining the action 58 according to the reward 57 calculated by the reward calculation unit 23. In the case of Q-learning, for example, the action value function Q(s_t, a_t) of equation (1) is the function for calculating the action 58 and is updated by the function update unit 22.
 FIG. 4 is a diagram for explaining a first learning example by the machine learning device according to the embodiment. FIG. 4 shows positions P0 to P6 of the work 40 in the X-axis direction. The position of the work 40 in the X-axis direction when the work 40 collides with the end 30 of the spindle chuck 31 is assumed to be the position P0. In this case, the machine learning device 20 causes the following processes to be repeated until the work 40 no longer collides with the end 30 of the spindle chuck 31: a process of pulling the work 40 back to the plus side in the Z-axis direction, a process of moving the work 40 to the next position in the X-axis direction, and a process of inserting the work 40 toward the minus side in the Z-axis direction.
 Specifically, the machine learning device 20 moves the work 40 in the X-axis direction in the order of the positions P1, P2, P3, P4, P5, and P6, which are positions in the X-axis direction. The position P0 and the position P1 are separated by the movement amount Lx. Similarly, the positions P1 and P2, the positions P2 and P3, the positions P0 and P4, the positions P4 and P5, and the positions P5 and P6 are each separated by the movement amount Lx. The positions P0 and P3, and the positions P0 and P6, are each separated by the movement distance Lmax.
 For example, when moving the work 40 to the position P1, the machine learning device 20 sends an action 58 for loading the work 40 at the position P1 to the control unit 13. As a result, the loader 36 moves in the X-axis direction in accordance with the position command 53 corresponding to the action 58.
 When the work 40 has been moved to the spindle chuck 31 without colliding with the end 30, the machine learning device 20 completes the movement of the work 40 in the X-axis direction. The machine learning device 20 gives a low reward 57 to the position command 53 used when the work 40 collided with the end 30, and gives a high reward 57 to the position command 53 used when the work 40 did not collide with the end 30.
 Note that the machine learning device 20 may move the work 40 to the positions P1 to P6 in any order. For example, the machine learning device 20 may move the work 40 in the X-axis direction in order of proximity to the position P0, such as the order of the positions P1, P4, P2, P5, P3, and P6. Further, the machine learning device 20 is not limited to setting the six positions P1 to P6 as positions in the X-axis direction, and may set five or fewer, or seven or more, positions in the X-axis direction.
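 The two visiting orders described for the first learning example can be generated as in the following sketch. The function names are hypothetical, offsets are measured from the position P0, and which side of P0 is treated as positive is an assumption; the description only states that P1 to P3 and P4 to P6 each extend from P0 in steps of Lx up to Lmax.

```python
def candidate_offsets_1d(step, max_distance):
    """X-axis offsets from P0 in the order P1, P2, P3, P4, P5, P6.

    step corresponds to the movement amount Lx and max_distance to Lmax.
    """
    n = int(round(max_distance / step))          # number of steps per side
    one_side = [step * k for k in range(1, n + 1)]     # P1, P2, P3 (assumed + side)
    other_side = [-step * k for k in range(1, n + 1)]  # P4, P5, P6 (assumed - side)
    return one_side + other_side


def nearest_first(offsets):
    """Reorder offsets by distance from P0 (P1, P4, P2, P5, P3, P6).

    sorted() is stable, so at equal distance the original order is kept.
    """
    return sorted(offsets, key=abs)
```

 With Lx = 1 and Lmax = 3, the first function returns the offsets of P1 to P6 in numerical order, and the second returns them in order of proximity to P0.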
 When the FB current 55 on the X axis and the Z axis is equal to or less than the reference current value A (step ST2, Yes), and the position difference between the position indicated by the position command 53 and the FB position 54 on the X axis and the Z axis is equal to or less than the threshold value (step ST3, Yes), the learning unit 21 learns the action 58, which is the next position command 53, in accordance with the state variable 56 (step ST4b). That is, the learning unit 21 sets a large reward 57 for the position command 53 used for the transfer, and then determines the action 58 corresponding to the next position command 53.
 Further, after the work 40 has moved to the transfer position, the state observation unit 25 notifies the control unit 13 that the movement position of the work 40 is within the allowable range. The control unit 13 then proceeds with the process of transferring the work 40 between the chucks.
 Specifically, the control unit 13 causes the machine tool 2 to execute the operations s2 to s6 described above. That is, the control unit 13 starts the operation of closing the spindle chuck 31 (step ST6) and waits until the spindle chuck 31 is closed (step ST7). By closing, the spindle chuck 31 grips the work 40.
 Thereafter, the control unit 13 starts the operation of opening the loader chuck 32 (step ST8) and waits until the loader chuck 32 is open (step ST9). The control unit 13 then retracts the loader 36 to the plus side in the Z-axis direction (step ST10).
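 The order of steps ST6 to ST10 can be summarized by the following sketch. The class FakeMachine and all of its method names are hypothetical stand-ins used only to make the sequence executable; they do not represent an interface of the machine tool 2 or of the control unit 13.

```python
class FakeMachine:
    """Hypothetical stand-in for the machine tool 2; it only records commands."""

    def __init__(self):
        self.log = []
        self.spindle_chuck_closed = False
        self.loader_chuck_open = False

    def close_spindle_chuck(self):
        self.log.append("close spindle chuck")
        self.spindle_chuck_closed = True   # a real chuck would take time to close

    def open_loader_chuck(self):
        self.log.append("open loader chuck")
        self.loader_chuck_open = True      # a real chuck would take time to open

    def retract_loader_plus_z(self):
        self.log.append("retract loader to +Z")


def hand_over(machine, wait=lambda: None):
    """Run the transfer sequence in the order of steps ST6 to ST10."""
    machine.close_spindle_chuck()               # step ST6: start closing
    while not machine.spindle_chuck_closed:     # step ST7: wait until closed
        wait()
    machine.open_loader_chuck()                 # step ST8: start opening
    while not machine.loader_chuck_open:        # step ST9: wait until open
        wait()
    machine.retract_loader_plus_z()             # step ST10: retract the loader
    return machine.log
```

 The point of the ordering is that the loader chuck 32 is opened only after the spindle chuck 31 has gripped the work 40, so the work 40 is held by at least one chuck throughout the transfer.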
 When the work 40 is unloaded from the spindle chuck 31 to the loader 36, the machine learning device 20 also learns the position command 53 by the same processing as when the work 40 is loaded from the loader 36 to the spindle chuck 31.
 In this way, when the transfer of the work 40 fails, the machine learning device 20 causes the transfer of the work 40 to be executed again in accordance with the movement amount Lx, so that the transfer position can be corrected. Further, since the machine learning device 20 learns the transfer position of the work 40, transfer failures can be prevented. Further, since the machine learning device 20 learns the transfer position of the work 40, the machine learning device 20 can also be applied to an environment, such as one using a collet chuck, in which the transfer fails due to a slight displacement. Further, since the machine learning device 20 causes the transfer of the work 40 to be executed again and can prevent transfer failures, the productivity of the machine tool 2 improves.
 Further, since the machine learning device 20 determines the transfer position of the work 40 based on the position difference between the position indicated by the position command 53 and the FB position 54, there is no need to provide a special mechanism or device, such as a camera, for confirming the transfer of the work 40. Therefore, the transfer can be confirmed at low cost.
 Further, when the transfer of the work 40 fails, the machine learning device 20 executes the transfer of the work 40 again in accordance with the movement amount Lx, so that manual recovery is unnecessary even when the work 40 has collided with the end 30 of the spindle chuck 31. Therefore, downtime when loading the work 40 can be shortened, and deterioration in productivity can be suppressed.
 In the present embodiment, the case where the machine tool 2 moves the work 40 in the X-axis direction and the Z-axis direction has been described, but the machine tool 2 may move the work 40 in the X-axis direction, the Y-axis direction, and the Z-axis direction. In this case, the position command 53 that the numerical control device 10 sends to the drive unit 37 includes a position command in the X-axis direction, a position command in the Y-axis direction, and a position command in the Z-axis direction. The servomotor 38 includes servomotors that move the loader 36 in the X-axis direction, the Y-axis direction, and the Z-axis direction, respectively. The machine learning device 20 then learns the position command in the X-axis direction, the position command in the Y-axis direction, and the position command in the Z-axis direction.
 FIG. 5 is a diagram for explaining a second learning example by the machine learning device according to the embodiment. FIG. 6 is a diagram for explaining the positional relationship between the work and the spindle chuck included in the machine tool according to the embodiment. Here, a case where the machine tool 2 moves the work 40 in the X-axis direction, the Y-axis direction, and the Z-axis direction will be described. FIGS. 5 and 6 show positions of the work 40 in the XY plane when the work 40 is viewed from the Z-axis direction.
 The spindle chuck 31A shown in FIG. 6 is an example of the spindle chuck 31 described with reference to FIG. 1. The spindle chuck 31A is a three-jaw chuck. Each of the three jaws of the spindle chuck 31A moves toward the center Q1 of the circular chuck area 45, whereby the spindle chuck 31A grips the work 40.
 The numerical control device 10 generates the position command 53 for inserting the work 40 at the center Q1, but the work 40 may actually be inserted at a position shifted from the center Q1, such as the position P0, and may collide with the end 30. For this reason, the machine learning device 20 learns a position command 53 with which the work 40 does not collide with the end 30.
 The position of the work 40 when the work 40 collides with the end 30 of the spindle chuck 31A is assumed to be the position P0. In this case, the machine learning device 20 causes the following processes to be repeated until the work 40 no longer collides with the end 30 of the spindle chuck 31A: a process of pulling the work 40 back to the plus side in the Z-axis direction, a process of moving the work 40 to the next position in the X-axis direction and the Y-axis direction, and a process of inserting the work 40 toward the minus side in the Z-axis direction. In doing so, the machine learning device 20 calculates the next movement position of the work 40 based on the movement amount Lx and the movement distance Lmax. That is, while the movement position is being learned, the work 40 is moved to positions in the XY plane that are restricted by a specific learning direction and a specific learning movement amount.
 Specifically, the machine learning device 20 moves the work 40 in the order of the positions P11, P12, P13, P14, P15, P16, P17, P18, and P19, which are positions in the XY plane. The positions P0 and P11, the positions P11 and P12, and the positions P12 and P13 are each separated by the movement amount Lx. Similarly, the positions P0 and P14, the positions P14 and P15, and the positions P15 and P16 are each separated by the movement amount Lx. Similarly, the positions P0 and P17, the positions P17 and P18, and the positions P18 and P19 are each separated by the movement amount Lx. The positions P0 and P13, the positions P0 and P16, and the positions P0 and P19 are each separated by the movement distance Lmax.
 The direction from the position P0 toward the position P13 and the direction from the position P0 toward the position P16 form a learning angle θ, and the direction from the position P0 toward the position P16 and the direction from the position P0 toward the position P19 likewise form the learning angle θ. The machine learning device 20 repeats a process of moving the work 40 from the position P0 in a first direction in increments of the movement amount Lx and, once the movement distance Lmax has been reached, a process of moving the work 40 from the position P0, in increments of the movement amount Lx, in a second direction obtained by rotating the first direction by the learning angle θ. That is, when first searching for an appropriate movement position for the work 40, the machine learning device 20 searches in the first direction, moving the work 40 in increments of the movement amount Lx up to the maximum movement distance Lmax. When no appropriate movement position is found in the first direction, the machine learning device 20 performs the same search in the second direction, which is obtained by rotating the first direction by the learning angle θ. The machine learning device 20 repeats the process of searching for an appropriate movement position while moving the work 40 in increments of the movement amount Lx and the process of rotating the direction by the learning angle θ until the total of the learning angles θ within the chuck area 45 exceeds 360 degrees.
 For example, when moving the work 40 to the position P11, the machine learning device 20 sends an action 58 for loading the work 40 at the position P11 to the control unit 13. As a result, the loader 36 moves in the X-axis direction and the Y-axis direction in accordance with the position command 53 corresponding to the action 58.
 When the work 40 has been moved to the spindle chuck 31A without colliding with the end 30, the machine learning device 20 completes the movement of the work 40 in the X-axis direction and the Y-axis direction.
 Note that the machine learning device 20 may move the work 40 to the positions P11 to P19 in any order. For example, the machine learning device 20 may move the work 40 in order of proximity to the position P0, such as the order of the positions P11, P14, P17, P12, P15, P18, P13, P16, and P19. Further, the machine learning device 20 is not limited to setting the nine positions P11 to P19 as positions in the XY plane, and may set eight or fewer, or ten or more, positions in the XY plane.
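 The candidate positions of the second learning example can be enumerated as in the following sketch, which walks outward from the position P0 in steps of Lx up to Lmax and then rotates the search direction by the learning angle θ. The function name, the starting direction, and the stopping rule (stop once the accumulated rotation reaches 360 degrees) are assumptions. The description gives no numerical value for θ; the value of 120 degrees mentioned below is only one choice that yields three directions and nine positions, matching the count of P11 to P19.

```python
import math


def candidate_positions_2d(p0, step, max_distance, angle_deg, start_deg=0.0):
    """XY candidates around p0, ordered direction by direction.

    p0           : (x, y) of the position P0
    step         : movement amount Lx
    max_distance : movement distance Lmax
    angle_deg    : learning angle theta in degrees
    start_deg    : first search direction (hypothetical parameter)
    """
    steps = int(round(max_distance / step))   # positions per direction
    positions = []
    turned = 0.0                              # accumulated rotation
    while turned < 360.0:                     # assumed stopping rule
        rad = math.radians(start_deg + turned)
        for k in range(1, steps + 1):
            # k-th position along the current direction, k * Lx away from P0.
            positions.append((p0[0] + k * step * math.cos(rad),
                              p0[1] + k * step * math.sin(rad)))
        turned += angle_deg
    return positions
```

 With Lmax = 3 Lx and an assumed learning angle of 120 degrees, the function returns nine positions in the order corresponding to P11 through P19. Sorting the result by distance from P0 would give the proximity-based order that the description mentions as an alternative.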
 Here, the hardware configuration of the numerical control device 10 will be described. FIG. 7 is a diagram illustrating an example of the hardware configuration of the numerical control device according to the embodiment.
 The numerical control device 10 can be realized by the control circuit 300 shown in FIG. 7, that is, by a processor 301 and a memory 302. Examples of the processor 301 are a CPU (Central Processing Unit; also called a central processing device, a processing device, an arithmetic device, a microprocessor, a microcomputer, a processor, or a DSP (Digital Signal Processor)) and a system LSI (Large Scale Integration). Examples of the memory 302 are a RAM (Random Access Memory) and a ROM (Read Only Memory).
 The numerical control device 10 is realized by the processor 301 reading and executing a program that is stored in the memory 302 and that executes the operations of the numerical control device 10. This program can also be said to cause a computer to execute the procedures or methods of the numerical control device 10. The memory 302 is also used as temporary memory when the processor 301 executes various processes.
 Note that the functions of the numerical control device 10 may be realized partly by dedicated hardware and partly by software or firmware. The machine learning device 20 may also be realized by the control circuit 300 shown in FIG. 7.
 In the present embodiment, whether or not the transfer position is within the allowable range is determined based on the FB current 55 and the position difference; however, it may instead be determined based on the position difference alone, without using the FB current 55.
 Further, in the present embodiment, the position command 53 to the transfer position is learned based on the FB current 55 in the X-axis direction and the position difference in the X-axis direction; however, the position command 53 to the transfer position may instead be learned based on the position difference in the X-axis direction alone, without using the FB current 55 in the X-axis direction.
 Note that when the spindle chuck 31 is an electric chuck, the displacement of the work 40 can also be detected on the spindle chuck 31 side. In this case, the machine learning device 20 may learn the position command 53 based on the displacement detected on the spindle chuck 31 side.
 In the present embodiment, the case where the machine learning device 20 performs machine learning using reinforcement learning has been described; however, the machine learning device 20 may perform machine learning in accordance with another known method, such as a neural network, genetic programming, functional logic programming, or a support vector machine.
 Further, in the present embodiment, the case where the control unit 13 controls the loader 36 based on the action 58 learned by the learning unit 21 has been described; however, the control unit 13 may control the loader 36 without using the action 58. In this case, the control unit 13 determines, based on the position command 53, the FB position 54, and the FB current 55, whether or not the movement position of the work 40 is within the allowable range, and when the displacement amount is not within the allowable range, outputs a new position command 53 in which the position indicated by the position command 53 is shifted. That is, the control unit 13 searches for an appropriate movement position of the work 40 by shifting the movement position of the work 40 little by little. Specifically, the control unit 13 brings the displacement amount within the allowable range by executing, once or a plurality of times, the process of determining the displacement and the process of outputting a new position command 53 when the displacement amount is not within the allowable range.
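 This variant without learning amounts to a retry loop, which the following sketch illustrates. The function name and the callback try_insert are hypothetical; the callback is assumed to perform one pull-back, shift, and insertion cycle with the given position command and to return the resulting position difference and FB current.

```python
def search_transfer_position(try_insert, candidates, threshold, reference_current):
    """Try shifted position commands until the displacement is within range.

    try_insert        : hypothetical callback; takes a position command and
                        returns (position_diff, fb_current) for that attempt
    candidates        : position commands to try, in order
    threshold         : allowable position difference
    reference_current : reference current value A
    Returns the first command within the allowable range, or None.
    """
    for command in candidates:
        position_diff, fb_current = try_insert(command)
        # Same success criteria as in the embodiment: both values within limits.
        if position_diff <= threshold and fb_current <= reference_current:
            return command
    return None  # no candidate succeeded; handling this case is left open here
```

 The candidate list could, for example, be the sequence of positions obtained by shifting the commanded position in steps of the movement amount Lx, as in the first and second learning examples.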
 As described above, according to the embodiment, the position command 53 that suppresses the displacement of the transfer position of the work 40 between the chucks is learned in accordance with a data set created based on the state variable 56, so that the probability of the transfer of the work 40 failing can be reduced as the transfer operation of the work 40 is repeated.
 The configurations described in the above embodiment show an example of the contents of the present invention; they can be combined with other known techniques, and parts of the configurations can be omitted or changed without departing from the gist of the present invention.
 1 machining system, 2 machine tool, 3 control system, 10 numerical control device, 11 control machining program storage unit, 12 analysis unit, 13 control unit, 14 storage unit, 20 machine learning device, 21 learning unit, 22 function update unit, 23 reward calculation unit, 25 state observation unit, 30 end, 31, 31A spindle chuck, 32 loader chuck, 35 rotating unit, 36 loader, 37 drive unit, 38 servomotor, 39 encoder, 40 work, 52 transfer position information, 53 position command, 54 FB position, 55 FB current, 56 state variable, 57 reward, 58 action, 61 loading command.

Claims (11)

  1.  A machine learning device that learns a position command to a drive mechanism that moves a first chuck when a work is transferred between the first chuck, which grips and sends the work, and a second chuck, which grips and receives the work, the machine learning device comprising:
     a state observation unit that observes, as state variables, the position command to the drive mechanism and feedback data from the drive mechanism; and
     a learning unit that learns, in accordance with a data set created based on the state variables, the position command that suppresses a displacement of a transfer position of the work between the first chuck and the second chuck.
  2.  A numerical control device comprising:
     the machine learning device according to claim 1; and
     a control unit that outputs the position command learned by the learning unit to the drive mechanism.
  3.  The numerical control device according to claim 2, wherein
     the feedback data includes a feedback position indicating a position of a transfer mechanism that transfers the work, and
     the learning unit comprises:
     a reward calculation unit that calculates a reward based on a difference between the position indicated by the position command to the drive mechanism and the feedback position; and
     a function update unit that updates, based on the reward, a function for determining the position command.
  4.  The numerical control device according to claim 3, wherein
     the feedback data includes a feedback current, which is data indicating a current output from the drive mechanism in order for the drive mechanism to move the first chuck, and
     the reward calculation unit calculates the reward based on the difference and the feedback current.
  5.  The numerical control device according to claim 4, wherein
     the reward calculation unit increases the reward when the difference is equal to or less than a threshold value and the feedback current is equal to or less than a reference current value, and reduces the reward when the difference is greater than the threshold value or the feedback current is greater than the reference current value.
  6.  The numerical control device according to any one of claims 3 to 5, wherein
     the function update unit updates, in accordance with the reward, an action value table representing the function.
  7.  The numerical control device according to claim 4, wherein
     the state observation unit determines that the transfer has failed when the difference is greater than a threshold value or the feedback current is greater than a reference current value,
     the learning unit learns the position command when the transfer is determined to have failed, and
     the control unit retries the transfer with the position command learned by the learning unit when the transfer is determined to have failed.
  8.  The numerical control device according to any one of claims 2 to 7, wherein
     the learning unit learns, among the position commands to the drive mechanism, a position command in a first direction perpendicular to the direction of the transfer between the first chuck and the second chuck.
  9.  A machine tool that is controlled by the numerical control device according to any one of claims 2 to 8 and is driven by the drive mechanism.
  10.  A machine learning method for learning a position command to a drive mechanism that moves a first chuck when a work is transferred between the first chuck, which is on the side that grips and sends the work, and a second chuck, which is on the side that grips and receives the work, the machine learning method comprising:
     a state observation step of observing, as state variables, the position command to the drive mechanism and feedback data from the drive mechanism; and
     a learning step of learning, in accordance with a data set created based on the state variables, the position command that suppresses a displacement of a transfer position of the work between the first chuck and the second chuck.
  11.  A numerical control device that outputs a position command to a drive mechanism that moves a first chuck when a work is transferred between the first chuck, which grips and sends the work, and a second chuck, which grips and receives the work, the numerical control device comprising:
     a control unit that determines, based on the position command to the drive mechanism and feedback data from the drive mechanism, a displacement of a transfer position of the work between the first chuck and the second chuck, and that, when the displacement amount is not within an allowable range, outputs a new position command in which the position indicated by the position command is shifted, wherein
     the control unit brings the displacement amount within the allowable range by executing, once or a plurality of times, the process of determining the displacement and the process of outputting the new position command when the displacement amount is not within the allowable range.
PCT/JP2018/025746 2018-07-06 2018-07-06 Machine learning device, numerical control device, machine tool, and machine learning method WO2020008633A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PCT/JP2018/025746 WO2020008633A1 (en) 2018-07-06 2018-07-06 Machine learning device, numerical control device, machine tool, and machine learning method
DE112018007687.3T DE112018007687T5 (en) 2018-07-06 2018-07-06 Machine learning device, numerical control device, machine tool, and machine learning method
CN201880095230.7A CN112368656B (en) 2018-07-06 2018-07-06 Machine learning device, numerical control device, machine tool, and machine learning method
JP2018562139A JP6505341B1 (en) 2018-07-06 2018-07-06 Machine learning apparatus, numerical control apparatus, machine tool and machine learning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/025746 WO2020008633A1 (en) 2018-07-06 2018-07-06 Machine learning device, numerical control device, machine tool, and machine learning method

Publications (1)

Publication Number Publication Date
WO2020008633A1 (en) 2020-01-09

Family

ID=66324176

Family Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/025746 WO2020008633A1 (en) 2018-07-06 2018-07-06 Machine learning device, numerical control device, machine tool, and machine learning method

Country Status (4)

Country Link
JP (1) JP6505341B1 (en)
CN (1) CN112368656B (en)
DE (1) DE112018007687T5 (en)
WO (1) WO2020008633A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020149505A (en) * 2019-03-14 2020-09-17 ファナック株式会社 Grip force adjustment device and grip force adjustment system

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220161333A1 (en) 2019-04-11 2022-05-26 Citizen Watch Co., Ltd. Machine tool and detection method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09258814A (en) * 1996-03-22 1997-10-03 Kayaba Ind Co Ltd Device and method for controlling position of assembling robot
JP2002187040A (en) * 2000-12-19 2002-07-02 Murata Mach Ltd Loader control device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4458321A (en) * 1981-08-19 1984-07-03 The Charles Stark Draper Laboratory, Inc. Self-teaching robot feedback system
US8996167B2 (en) * 2012-06-21 2015-03-31 Rethink Robotics, Inc. User interfaces for robot training
KR102023588B1 (en) * 2016-03-03 2019-10-14 구글 엘엘씨 Deep machine learning method and device for robot gripping

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020149505A (en) * 2019-03-14 2020-09-17 ファナック株式会社 Grip force adjustment device and grip force adjustment system
US11691236B2 (en) 2019-03-14 2023-07-04 Fanuc Corporation Gripping force adjustment device and gripping force adjustment system

Also Published As

Publication number Publication date
JPWO2020008633A1 (en) 2020-07-27
CN112368656B (en) 2021-08-20
JP6505341B1 (en) 2019-04-24
CN112368656A (en) 2021-02-12
DE112018007687T5 (en) 2021-02-25

Similar Documents

Publication Publication Date Title
JP6333795B2 (en) Robot system with simplified teaching and learning performance improvement function by learning
US11279033B2 (en) Method and apparatus for collision-free motion planning of a manipulator
JP4122652B2 (en) Robot control device
CN108568814B (en) Robot and robot control method
JP2021053802A (en) Industrial robot and operation method thereof
JP5382359B2 (en) Robot system
US20170160717A1 (en) Combined system having machine tool and robot
JP4513663B2 (en) Operation teaching method of assembly mechanism in automatic assembly system
JP5383756B2 (en) Robot with learning control function
US20120239190A1 (en) Robot and Method For Operating A Robot
JP2014024162A (en) Robot system, robot control device, robot control method and robot control program
WO2020008633A1 (en) Machine learning device, numerical control device, machine tool, and machine learning method
CN117157596A (en) Numerical controller and numerical control system
CN110914020B (en) Handling device with robot, method and computer program
CN110315517A (en) Robot system and control method
JP2007066001A (en) Control unit for robot
US9827673B2 (en) Robot controller inhibiting shaking of tool tip in robot equipped with travel axis
US20180178383A1 (en) Moving Along A Predetermined Path With A Robot
JP2002187040A (en) Loader control device
JP5324397B2 (en) Information processing method, apparatus and program
KR102671535B1 (en) Method and computer program for correcting errors in manipulator system
CN112292238B (en) Method and system for transferring end effector of robot between one end effector pose and another end effector pose
CN110892344B (en) Machine learning device, numerical control device, machine tool, and machine learning method
WO2020194752A1 (en) Numerical control device and numerical control method
JP2002187041A (en) Loader control equipment

Legal Events

Date Code Title Description
ENP Entry into the national phase. Ref document number: 2018562139; Country of ref document: JP; Kind code of ref document: A
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 18925177; Country of ref document: EP; Kind code of ref document: A1
122 EP: PCT application non-entry in European phase. Ref document number: 18925177; Country of ref document: EP; Kind code of ref document: A1