WO2020008633A1

WO2020008633A1 - Machine learning device, numerical control device, machine tool, and machine learning method

Info

Publication number: WO2020008633A1
Application number: PCT/JP2018/025746
Authority: WO
Inventors: 勇太加藤; 泰一石田
Original assignee: Mitsubishi Electric Corporation (三菱電機株式会社)
Priority date: 2018-07-06
Filing date: 2018-07-06
Publication date: 2020-01-09
Also published as: JPWO2020008633A1; CN112368656B; JP6505341B1; CN112368656A; DE112018007687T5

Abstract

A machine learning device (20) learns a position command (53) for a drive unit (37) that moves a loader chuck (32) when a workpiece (40) is transferred between the loader chuck (32), which grips and transports the workpiece (40), and a spindle chuck (31), which grips and receives the workpiece (40). The machine learning device is equipped with: a state observation unit (25) that observes, as state variables (56), the position command (53) for the drive unit (37) and feedback data from the drive unit (37); and a learning unit (21) that learns, in accordance with a data set created on the basis of the state variables (56), a position command (53) that suppresses deviation in the transfer position of the workpiece (40) between the loader chuck (32) and the spindle chuck (31).

Description

Machine learning device, numerical control device, machine tool, and machine learning method

The present invention relates to a machine learning device, a numerical control device, a machine tool, and a machine learning method for learning a work transfer operation.

In a machine tool such as a lathe, when a workpiece is transferred between a chuck that grips and sends the workpiece and a chuck that grips and receives it, a loader that conveys the workpiece moves the workpiece, gripped by the chuck on the sending side, to the transfer position. The workpiece may fail to reach the center of the chuck area on the receiving side, for example because a long workpiece bends or because the sending chuck grips the workpiece incorrectly. When the transfer position deviates from the appropriate position in this way, the transfer may fail. A technique that suppresses such deviation of the transfer position is therefore desired.

The loader control device described in Patent Literature 1 expresses, as a function, the correlation between the shift amount of the workpiece transfer position and the motor torque of the servo motor that drives the loader. Based on this correlation and the measured motor torque, it predicts the shift amount of the transfer position and corrects the transfer position according to the predicted shift amount.

Patent Literature 1: JP-A-2002-187040

However, in Patent Literature 1 the function expressing the correlation is fixed, so the probability of a failed workpiece transfer does not decrease even when the transfer operation is repeated.

The present invention has been made in view of the above, and an object of the present invention is to obtain a machine learning device that can reduce the probability of a failed workpiece transfer by learning the workpiece transfer operation.

In order to solve the above problems and achieve the object, the present invention provides a machine learning device that learns a position command to a drive mechanism that moves a first chuck when a workpiece is transferred between the first chuck, which grips and sends the workpiece, and a second chuck, which grips and receives the workpiece. The machine learning device includes: a state observation unit that observes, as state variables, the position command to the drive mechanism and feedback data from the drive mechanism; and a learning unit that learns, in accordance with a data set created based on the state variables, a position command that suppresses positional deviation of the workpiece transfer position between the first chuck and the second chuck.

The machine learning device according to the present invention has the effect that, by learning the workpiece transfer operation, it can reduce the probability that the workpiece transfer fails.

FIG. 1 is a diagram illustrating the configuration of a machining system according to an embodiment.
FIG. 2 is a diagram illustrating the configuration of a control system including a numerical control device according to the embodiment.
FIG. 3 is a flowchart illustrating the operation procedure of the machining system according to the embodiment.
FIG. 4 is a diagram for describing a first learning example by the machine learning device according to the embodiment.
FIG. 5 is a diagram for describing a second learning example by the machine learning device according to the embodiment.
FIG. 6 is a diagram for explaining the positional relationship between the spindle chuck and the workpiece of the machine tool according to the embodiment.
FIG. 7 is a diagram illustrating an example of the hardware configuration of the numerical control device according to the embodiment.

Hereinafter, a machine learning device, a numerical control device, a machine tool, and a machine learning method according to an embodiment of the present invention will be described in detail with reference to the drawings. It should be noted that the present invention is not limited by the embodiment.

Embodiment

FIG. 1 is a diagram illustrating the configuration of a machining system according to the embodiment, showing the machining system 1 viewed from the vertical direction. In the present embodiment, the vertical direction is the Y-axis direction, and the horizontal directions in which the workpiece 40 moves are the X-axis direction and the Z-axis direction.

The machining system 1 includes a machine tool 2 that machines a workpiece 40 and a control system 3 that controls the operation of the machine tool 2. Examples of the machine tool 2 are a lathe and a machining center. A case where the machine tool 2 is a lathe is described below.

The machine tool 2 includes a rotating unit 35, a loader chuck 32 serving as a first chuck, a spindle chuck 31 serving as a second chuck, and a loader 36 serving as a transfer mechanism for the workpiece 40. The operation of the loader 36 is controlled by the control system 3. The loader chuck 32 is connected to the loader 36 and moves together with it. The loader chuck 32 can grip the workpiece 40. Examples of the loader chuck 32 are a three-jaw chuck and a collet chuck. The loader chuck 32 transfers the workpiece 40 to the spindle chuck 31 when machining of the workpiece 40 starts, and receives the workpiece 40 from the spindle chuck 31 after machining is completed.

The rotating unit 35 rotates about the Z axis, which is the spindle axis. The spindle chuck 31 is connected to the rotating unit 35 and rotates together with it. The spindle chuck 31 can grip the workpiece 40. Examples of the spindle chuck 31 are a three-jaw chuck and a collet chuck. When the workpiece 40 is machined, the rotating unit 35 rotates while the spindle chuck 31 grips the workpiece 40, thereby rotating the workpiece 40. An example of the rotating unit 35 is a spindle mechanism.

The machine tool 2 grips one end of the work 40 with the loader chuck 32 when loading the work 40 on the rotating unit 35. In this state, the loader 36 moves to the minus side in the X-axis direction and stops at the position facing the spindle chuck 31 (s0). Then, the loader 36 moves to the minus side in the Z-axis direction. Thus, the loader 36 moves the work 40 to a position where the spindle chuck 31 can grip the work 40 (s1). The position of the loader 36 where the spindle chuck 31 can grip the work 40 is a desired delivery position. The machine tool 2 starts the closing operation of the spindle chuck 31 using an auxiliary function such as an M code (s2), and waits until the closing operation of the spindle chuck 31 is completed (s3). When the spindle chuck 31 is closed, the spindle chuck 31 grips the other end of the work 40.

Thereafter, the machine tool 2 starts the opening operation of the loader chuck 32 (s4) and waits until the opening operation of the loader chuck 32 is completed (s5). Then, the loader 36 moves to the plus side in the Z-axis direction. As a result, the loader 36 retreats in the direction away from the work 40 (s6), and further moves to the plus side in the X-axis direction.

When machining of the workpiece 40 is completed, the machine tool 2 unloads the workpiece 40 from the rotating unit 35 by performing the processes s0 to s6 described above in reverse. That is, in processes s0 and s6 the moving direction of the loader 36 is reversed between loading and unloading, and at unloading the loader chuck 32 performs the closing operation and the spindle chuck 31 performs the opening operation.

Specifically, at unloading, the loader 36 moves to the minus side in the X-axis direction and then moves in the direction approaching the workpiece 40. The loader chuck 32 then starts the closing operation, and when the closing operation is completed, the spindle chuck 31 starts the opening operation. When the opening operation is completed, the loader 36 retreats in the direction away from the workpiece 40 and then moves to the plus side in the X-axis direction.
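The loading steps s0 to s6 and the unloading order described above differ mainly in which chuck closes first and which one opens. The Python sketch below captures that symmetry as an ordered list of operations. It is only an illustration: the function name `transfer_sequence` and the step strings are hypothetical, not commands of any real controller, and the unloading order follows the description above (loader chuck closes, then spindle chuck opens).

```python
from __future__ import annotations


def transfer_sequence(loading: bool) -> list[str]:
    """Ordered loader/chuck operations for loading (s0-s6) or unloading.

    Hypothetical step names; only the ordering follows the description.
    """
    # The receiving chuck closes first; the sending chuck then opens.
    receiver, sender = ("spindle", "loader") if loading else ("loader", "spindle")
    return [
        "move loader to X- side",               # s0: stop facing the spindle chuck
        "move loader to Z- side",               # s1: approach the transfer position
        f"start closing {receiver} chuck",      # s2: e.g. triggered by an M code
        f"wait until {receiver} chuck closed",  # s3
        f"start opening {sender} chuck",        # s4
        f"wait until {sender} chuck open",      # s5
        "move loader to Z+ side",               # s6: retreat from the workpiece
        "move loader to X+ side",               # final move away from the spindle
    ]


for step in transfer_sequence(loading=False):
    print(step)
```

Generating both sequences from one function keeps the shared motion steps in one place, so only the roles of the two chucks change between loading and unloading.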

Since the process of unloading the workpiece 40 from the rotating unit 35 corresponds to the process of loading it onto the rotating unit 35, only the loading process is described below.

The machine tool 2 has machine-specific characteristics in the process of gripping the workpiece 40, in the transfer process, and the like. Because of these characteristics, a deviation arises between the position command to the loader chuck 32 and the actual position of the loader chuck 32 in the X-axis direction. As a result, even when the workpiece 40 is loaded with a position command intended to be appropriate, the position of the workpiece 40 in the X-axis direction may shift when the loader chuck 32 attempts to pass the workpiece 40 to the spindle chuck 31, and the workpiece 40 may collide with the end 30 of the spindle chuck 31 on the plus side in the Z-axis direction. In this case, the loader chuck 32 cannot transfer the workpiece 40 to the spindle chuck 31.

In the present embodiment, a numerical control (NC) device 10 provided in the control system 3, described later, learns the X-axis position to which the loader chuck 32 is moved when the loader chuck 32 attempts to transfer the workpiece 40 to the spindle chuck 31. By learning the position command to the loader chuck 32 in this way, the numerical controller 10 reduces the probability that the transfer of the workpiece 40 fails. If the posture and shape of the workpiece 40 gripped by the loader chuck 32 are the same every time, the X-axis position of the workpiece 40 corresponds to the X-axis position of the loader chuck 32. Therefore, in the present embodiment, the X-axis displacement of the loader chuck 32 and the X-axis displacement of the workpiece 40 are used synonymously.

Next, the configuration of the numerical controller 10 that controls the operation of the machine tool 2 will be described. FIG. 2 is a diagram illustrating a configuration of a control system including the numerical control device according to the embodiment. The control system 3 includes a numerical control device 10, a drive unit 37, and a servomotor 38.

The numerical controller 10 is a computer that controls the position of the loader 36 by sending a position command 53 to the drive unit 37. The position command 53 is a command specifying the position of the loader 36 and includes an X-axis position command and a Z-axis position command. The numerical controller 10 controls the transfer of the workpiece 40 from the loader chuck 32 to the spindle chuck 31, and when the transfer fails, it changes the X-axis position command 53 to the loader chuck 32 and repeats the transfer. Based on the X-axis position command 53 and on whether the transfer using that command succeeded or failed, the numerical controller 10 learns an appropriate X-axis position command 53 to the loader chuck 32.

The drive unit 37 is a drive mechanism that moves the loader 36 by driving a servo motor 38. The drive unit 37 calculates a current value to be sent to the servomotor 38 based on the position command 53 from the numerical controller 10. The drive unit 37 drives the servo motor 38 by sending a current corresponding to the position command 53 to the servo motor 38. The drive unit 37 sends a feedback (FB: Feed-Back) current 55, which is data indicating a current to be sent to the servomotor 38, to the numerical controller 10. The FB current 55 is an example of feedback data from the drive unit 37 to the numerical controller 10.

When information indicating the number of rotations of the servomotor 38 is sent from the encoder 39, the drive unit 37 calculates from it an FB position 54, which is data indicating the current position of the workpiece 40, and sends the FB position 54 to the numerical controller 10. The FB position 54 is an example of feedback data from the drive unit 37 to the numerical controller 10.

The servomotor 38 is connected to the loader 36 and moves the loader 36 according to the current from the drive unit 37. The servomotors 38 include a servomotor that moves the loader 36 in the X-axis direction and a servomotor that moves the loader 36 in the Z-axis direction. An encoder 39 for detecting the rotation speed of the servomotor 38 is attached to each of the servomotors 38 in the X-axis direction and the Z-axis direction. The encoder 39 transmits information indicating the detected rotation speed to the drive unit 37.

The numerical control device 10 includes a control processing program storage unit 11, an analysis unit 12, a control unit 13, a storage unit 14, and a machine learning device 20. The control processing program storage unit 11 stores a control machining program used when machining the workpiece 40. The control machining program includes a loading command 61 for loading the workpiece 40 onto the rotating unit 35, a machining command for machining the workpiece 40, and an unloading command for unloading the workpiece 40 from the rotating unit 35. FIG. 2 illustrates the loading command 61. Among these commands, the loading command 61 and the unloading command are dedicated commands for transferring the workpiece 40. The loading command 61 is sent to the analysis unit 12 as a G code 51 for positioning the loader 36.

The analysis unit 12 analyzes the control machining program and determines whether the analyzed command is a dedicated command. If the analyzed command is a dedicated command such as the loading command 61, the analysis unit 12 generates, based on the G code 51, transfer position information 52 indicating the position at which the workpiece 40 is transferred. That is, the analysis unit 12 generates the transfer position information 52 from the positioning command for the loader 36 included in the G code 51. The transfer position information 52 is the position at which the workpiece 40 is transferred between the loader chuck 32 and the spindle chuck 31; specifically, it is the end point of the loader 36. When the workpiece 40 is transferred for the first time, the information used for the transfer operation is set by the arguments of the dedicated command.

The dedicated command includes the following command arguments (A1) to (A5).
(A1) The end point of the loader 36, which is the transfer position information 52 of the workpiece 40
(A2) The reference current value A, which is the criterion for determining whether a collision occurs when the workpiece 40 is transferred
(A3) The retraction amount Lz of the loader 36 when a collision is determined to have occurred
(A4) The direction and amount Lx by which the loader 36 is moved during learning
(A5) The maximum moving distance Lmax in one direction during learning
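The five arguments (A1) to (A5) can be grouped into one record that the analysis unit 12 passes on. This Python sketch is only an illustration: the class name `TransferCommandArgs`, the field names, the use of the sign of Lx to encode the direction in (A4), and the validation rules are all assumptions, since the text does not specify data types, units, or checks.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TransferCommandArgs:
    """Hypothetical container for the dedicated-command arguments (A1)-(A5)."""
    end_x: float        # (A1) X coordinate of the loader end point (transfer position)
    end_z: float        # (A1) Z coordinate of the loader end point
    ref_current: float  # (A2) reference current value A for collision detection
    retract_z: float    # (A3) retraction amount Lz after a detected collision
    step_x: float       # (A4) movement amount Lx; its sign is assumed to give the direction
    max_x: float        # (A5) maximum moving distance Lmax in one direction

    def __post_init__(self) -> None:
        # Illustrative sanity checks; the real validation rules are not specified.
        if self.ref_current <= 0:
            raise ValueError("reference current value A must be positive")
        if self.retract_z <= 0:
            raise ValueError("retraction amount Lz must be positive")
        if self.step_x == 0 or abs(self.step_x) > self.max_x:
            raise ValueError("require 0 < |Lx| <= Lmax")


args = TransferCommandArgs(end_x=120.0, end_z=-35.0, ref_current=4.0,
                           retract_z=5.0, step_x=-0.2, max_x=1.0)
```

The same record could equally be filled from parameters in the storage unit 14 instead of command arguments, which the text allows as an alternative source.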

The transfer position information 52 of (A1) includes an X coordinate and a Z coordinate. The reference current value A of (A2) is a value for determining whether the transfer is abnormal, and is compared with the FB current 55, which indicates the current sent to the servomotor 38. When an FB current 55 larger than the reference current value A is sent to the servomotor 38, the transfer is in an abnormal state in which the workpiece 40 has collided with the spindle chuck 31 at a position other than the transfer position of the spindle chuck 31. An example of such a position is the end 30 of the spindle chuck 31 described above. The retraction amount Lz of (A3) is the distance by which the workpiece 40 is pulled back along the Z-axis direction when the workpiece 40 collides at the end 30 and stops moving.

When the workpiece 40 collides at the end 30 and stops moving, the machine learning device 20 learns the position command 53. The movement amount Lx of (A4) is the distance by which the workpiece 40 is moved in the X-axis direction during this learning. The workpiece 40 is moved in the X-axis direction by the movement amount Lx and then moved toward the spindle chuck 31 along the Z-axis direction. This shift by the movement amount Lx is repeated until the moving position at the time of transfer falls within the allowable range. The moving distance Lmax of (A5) is the limit of how far the workpiece 40 can be moved in the X-axis direction during learning; that is, even during learning, the workpiece 40 is not moved farther than the moving distance Lmax.
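The bounded stepping by Lx up to Lmax can be sketched as the list of X offsets that remain admissible. This Python sketch assumes, hypothetically, that Lx is signed (its sign giving the direction of (A4)) and that offsets are measured from the original X transfer position. In the device described here the learning unit 21 chooses the next position command, so the sketch only illustrates the limits that (A4) and (A5) impose; the name `candidate_x_offsets` is an illustration.

```python
from __future__ import annotations


def candidate_x_offsets(step_x: float, max_x: float) -> list[float]:
    """X offsets Lx, 2*Lx, ... whose magnitude does not exceed Lmax.

    step_x: signed movement amount Lx from (A4) (sign = direction, assumed).
    max_x:  maximum moving distance Lmax from (A5).
    """
    if step_x == 0:
        raise ValueError("Lx must be non-zero")
    offsets = []
    k = 1
    # Multiply instead of accumulating to avoid drift; the small tolerance
    # keeps an offset exactly equal to Lmax despite floating-point rounding.
    while abs(k * step_x) <= max_x + 1e-9:
        offsets.append(k * step_x)
        k += 1
    return offsets


print(candidate_x_offsets(0.5, 2.0))  # [0.5, 1.0, 1.5, 2.0]
```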

The analysis unit 12 sends the transfer position information 52 of (A1) and the retraction amount Lz of (A3) to the control unit 13. Further, the analysis unit 12 sends the values of the command arguments (A2), (A4), and (A5) to the machine learning device 20. Note that the analysis unit 12 is not limited to acquiring the value of the command argument from the dedicated command, but may acquire the value corresponding to the command argument from the parameter. In this case, a value corresponding to the command argument is stored in the storage unit 14 as a parameter.

The control unit 13 generates the position command 53 according to the transfer position information 52 sent from the analysis unit 12 or the action 58 given from the machine learning device 20. The action 58 is the next position command 53 in the X-axis direction. The control unit 13 sends the position command 53 to the drive unit 37 and the machine learning device 20. When receiving a notification from the machine learning device 20 indicating that the movement position of the work 40 is within the allowable range, the control unit 13 controls the continuation of the transfer operation of the work 40.

Upon receiving a notification from the machine learning device 20 indicating that the moving position of the workpiece 40 is out of the allowable range, the control unit 13 generates a position command 53 that pulls back the workpiece 40 along the Z-axis direction by the retraction amount Lz.

The machine learning device 20 includes a state observation unit 25 and a learning unit 21. The state observation unit 25 acquires the reference current value A of command argument (A2) from the analysis unit 12, and the learning unit 21 acquires the movement amount Lx of command argument (A4) and the moving distance Lmax of command argument (A5) from the analysis unit 12.

The state observation unit 25 acquires the X-axis and Z-axis position commands 53 from the control unit 13, and the X-axis and Z-axis FB positions 54 and FB currents 55 from the drive unit 37.

When determining whether the moving position of the workpiece 40 is within the allowable range, the state observation unit 25 uses the position commands 53, the FB positions 54, and the FB currents 55 in the X-axis and Z-axis directions. The state observation unit 25 may instead make this determination based on the X-axis position command 53, FB position 54, and FB current 55 only, or based on the Z-axis position command 53, FB position 54, and FB current 55 only. The state observation unit 25 may also make the determination using the Z-axis FB position 54 and the Z-axis position command 53 without using the Z-axis FB current 55.

When the learning unit 21 learns the position command 53 to the drive unit 37, the state observation unit 25 observes the X-axis position command 53, the X-axis FB position 54, and the X-axis FB current 55 as a state variable 56, and sends the state variable 56, which is the observation result, to the learning unit 21. That is, the state variable 56 sent from the state observation unit 25 to the learning unit 21 includes the X-axis position command 53, the X-axis FB position 54, and the X-axis FB current 55.

Depending on the shape of the workpiece 40, the transfer position of the workpiece 40 may be shifted in the X-axis direction from the center position of the spindle chuck 31. If the position at which the loader chuck 32 transfers the workpiece 40 is not appropriate, the transfer position may likewise be shifted from the center position of the spindle chuck 31 in the X-axis direction. In these cases, it is highly likely that the workpiece 40 collides with the spindle chuck 31 at a position other than its transfer position and stops at a position where the spindle chuck 31 cannot grip it.

When the workpiece 40 collides with the spindle chuck 31 at a position other than the transfer position, the moving position of the workpiece 40 is out of the allowable range. The moving position of the workpiece 40 is also out of the allowable range when the workpiece 40 moves to the transfer position while rubbing against the spindle chuck 31.

When an abnormality occurs in which the moving position of the workpiece 40 is out of the allowable range, the load on the loader 36 increases in at least one of the X-axis and Z-axis directions. The current sent to the servomotor 38 therefore increases, and so does the FB current 55. For this reason, the state observation unit 25 determines whether the moving position of the workpiece 40 is within the allowable range by comparing the FB current 55 with the reference current value A. When the moving position of the workpiece 40 is normal, that is, within the allowable range, the workpiece 40 stops at the transfer position of the spindle chuck 31, so neither the load on the loader 36 nor the FB current 55 increases.

In the case of an abnormality in which the moving position of the workpiece 40 is out of the allowable range, the position of the loader 36 corresponding to the position command 53 and the position of the loader 36 corresponding to the FB position 54 do not coincide within a specified time. The state observation unit 25 therefore also determines whether the moving position of the workpiece 40 is within the allowable range by comparing the position command 53 with the FB position 54, and sends the result of this determination to the control unit 13.
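The two checks described above can be combined into one judgement: over-current against the reference current value A, and agreement of the FB position 54 with the position command 53 within a specified time. In this Python sketch the sample structure `AxisSample`, the position tolerance, and the number of sampling cycles standing in for the "specified time" are hypothetical parameters that the text does not give.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AxisSample:
    """One sampling cycle for a single axis (hypothetical structure)."""
    command: float      # position command 53
    fb_position: float  # FB position 54
    fb_current: float   # FB current 55


def within_allowable_range(samples: list[AxisSample], ref_current: float,
                           pos_tolerance: float, settle_cycles: int) -> bool:
    """Judge whether the moving position of the workpiece is within range."""
    # Check 1: any FB current above the reference current value A indicates
    # an increased load on the loader (collision or rubbing).
    if any(abs(s.fb_current) > ref_current for s in samples):
        return False
    # Check 2: the FB position must reach the commanded position (within
    # pos_tolerance) during the first settle_cycles samples.
    return any(abs(s.command - s.fb_position) <= pos_tolerance
               for s in samples[:settle_cycles])
```

The same function can be applied per axis, which matches the option in the text of judging the X axis, the Z axis, or both.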

The learning unit 21 learns the action 58 that is the next position command 53 according to the state variable 56. In other words, the learning unit 21 learns the position command 53 that reduces the amount of displacement of the transfer position of the workpiece 40 by the loader 36.

Specifically, the learning unit 21 learns the action 58 according to a data set created based on the state variable 56, which includes the position command 53, the FB position 54, and the FB current 55. The learning unit 21 includes a function updating unit 22 and a reward calculating unit 23.

The reward calculator 23 calculates a reward 57 based on the state variable 56. The reward calculator 23 calculates the difference between the position indicated by the position command 53 and the FB position 54 based on the state variable 56, and extracts the FB current 55 from the state variable 56. The reward calculation unit 23 increases the reward 57 when the difference between the position indicated by the position command 53 and the FB position 54 is equal to or less than the threshold value and the FB current 55 is equal to or less than the reference current value A. In this case, the reward calculation unit 23 increases the reward 57 as the difference between the position indicated by the position command 53 and the FB position 54 is smaller, and increases the reward 57 as the FB current 55 is smaller. The learning unit 21 sends the calculated reward 57 to the function updating unit 22. In the following description, the difference between the position indicated by the position command 53 and the FB position 54 may be referred to as a position difference.

The function updating unit 22 stores a function for determining the action 58 and updates that function based on the reward 57. An example of such a function is the action value function Q(s_t, a_t) described later. The function updating unit 22 of the present embodiment updates the action value function Q(s, a) so that the deviation of the transfer position decreases each time the machine tool 2 repeats the transfer operation of the workpiece 40. The function updating unit 22 calculates the action 58 using the updated action value function Q(s, a), sends the calculated action 58 to the control unit 13, and sends to the storage unit 14 the learning data up to the previous time, the data used for learning, and the data necessary for controlling the loader 36. An example of the learning data is the next position command 53 calculated when the transfer succeeds; an example of the data used for learning is the action value function Q(s, a) used by the learning unit 21 for learning; and an example of the data used for controlling the loader 36 is the retraction amount Lz. The storage unit 14 stores the learning data up to the previous time, the data used for learning, and the data that the control unit 13 needs to control the loader 36.

Next, the operation procedure of the machining system 1 will be described. FIG. 3 is a flowchart illustrating the operation procedure of the machining system according to the embodiment. When the numerical controller 10 reads the loading command 61 and the workpiece 40 is being transferred for the first time, that is, when no learning has yet been performed, the numerical controller 10 starts moving the loader 36 to the transfer position specified by command argument (A1) (step ST1). At this time, the control unit 13 sends a position command 53 to the drive unit 37 and the state observation unit 25. The loader 36 thus moves in the X-axis direction with the loader chuck 32 gripping the workpiece 40, and then moves in the Z-axis direction.

When the work 40 starts to move, the drive unit 37 acquires the FB position 54 and the FB current 55 for each specific cycle, and sends them to the state observation unit 25. Accordingly, the state observation unit 25 monitors the position command 53, the FB position 54, and the FB current 55.

The state observation unit 25 determines whether the FB current 55 in each of the X-axis and the Z-axis is equal to or smaller than the reference current value A (step ST2). Note that, as the reference current value A, different values may be used for the X axis and the Z axis.

If the FB current 55 in the X axis or the Z axis becomes larger than the reference current value A before the transfer position is reached (No in step ST2), the state observation unit 25 sends to the learning unit 21 a state variable 56 including the X-axis and Z-axis position commands 53, the X-axis and Z-axis FB positions 54, and the X-axis and Z-axis FB currents 55. The state observation unit 25 also notifies the control unit 13 that the moving position of the workpiece 40 is out of the allowable range.

When the FB current 55 of an axis is larger than the reference current value A, that is, when the movement to the transfer position has failed, the learning unit 21 assigns a small reward 57 to the position command 53 used for the transfer. The learning unit 21 thereby learns an appropriate position command 53 according to the state variable 56 and determines the action 58, which is the next position command 53, so that the reward 57 is maximized (step ST4a). The control unit 13 pulls back the workpiece 40 along the Z-axis direction by the retraction amount Lz (step ST5) and then starts moving the loader 36 to the transfer position again (step ST1). The control unit 13 thus moves the workpiece 40 in the X-axis direction with the position command 53 generated according to the action 58, and then moves the workpiece 40 in the Z-axis direction.

In step ST2, when the FB currents 55 of both the X axis and the Z axis are equal to or smaller than the reference current value A (Yes in step ST2), the state observation unit 25 determines, for each of the X axis and the Z axis, whether the position difference between the position indicated by the position command 53 and the FB position 54 is equal to or smaller than a threshold (step ST3).

When the FB position 54 in the X axis or the Z axis does not match the position command 53, that is, when the position difference exceeds the threshold (No in step ST3), the state observation unit 25 sends to the learning unit 21 the state variable 56 including the X-axis position command 53, the X-axis FB position 54, and the X-axis FB current 55, and notifies the control unit 13 that the moving position of the workpiece 40 is out of the allowable range. Steps ST4a and ST5 described above are then performed, followed by step ST1.
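The loop formed by steps ST1 to ST5 can be sketched as follows. This Python sketch is a simplification under stated assumptions: the callables `try_transfer` (standing in for the machine and the drive unit 37) and `next_command` (standing in for the learning unit 21) are hypothetical, each attempt is summarized by its peak FB current and final position difference rather than being monitored cycle by cycle, the Yes branch of step ST3 is treated as a successful transfer, and the attempt limit is an added safeguard (in the text, the X movement is bounded by Lmax instead).

```python
from __future__ import annotations

from typing import Callable, Optional, Tuple


def run_transfer(
    try_transfer: Callable[[float], Tuple[float, float]],
    next_command: Callable[[float, float, float], float],
    x0: float,
    ref_current: float,
    pos_threshold: float,
    max_attempts: int = 20,
) -> Optional[float]:
    """Repeat steps ST1-ST5 until an attempt passes both ST2 and ST3.

    try_transfer(x) returns (peak FB current, final position difference)
    for an attempt with X position command x. Returns the successful X
    command, or None if no attempt succeeded.
    """
    x = x0
    for _ in range(max_attempts):
        peak_current, pos_diff = try_transfer(x)  # ST1: move to transfer position
        if peak_current <= ref_current and pos_diff <= pos_threshold:  # ST2, ST3
            return x
        x = next_command(x, peak_current, pos_diff)  # ST4a: choose next command
        # ST5: pulling back by Lz along Z would be executed here on a real machine.
    return None
```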

Here, the learning process of the action 58 by the learning unit 21 will be described. Any learning algorithm may be used for the learning unit 21; here, a case where reinforcement learning is applied will be described. In reinforcement learning, an agent in a certain environment observes the current state indicated by the state variable 56 and determines the action 58 to take based on the observation. By selecting the action 58, the agent obtains the reward 57 from the environment, and it learns a policy that maximizes the reward 57 over a series of actions 58. Q-learning and TD-learning are known as typical reinforcement learning methods. For example, in Q-learning, the general update equation (action value table) of the action value function Q(s, a) is expressed by the following equation (1). That is, an example of the action value table is the action value function Q(s, a) of equation (1).

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right) \qquad (1)$$

In Expression (1), s_t represents the environment at time t, and a_t represents the action at time t. The action a_t changes the environment to s_{t+1}. r_{t+1} represents the reward 57 obtained through that change in the environment, γ represents the discount rate, and α represents the learning coefficient. When Q learning is applied, the action a_t is the next position command 53 for the transfer operation.
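For reference, the standard one-step Q-learning update can be written with the symbols defined above (s_t, a_t, r_{t+1}, γ, α). This is the textbook form and is assumed to correspond to Expression (1):

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right)
```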

The update of Expression (1) increases the action value Q(s_t, a_t) if the action value of the best action a at time t + 1 is larger than the action value Q of the action a executed at time t. It decreases Q(s_t, a_t) if the best action value is smaller. In other words, the action value function Q(s, a) is updated so that the action value Q of the action a at time t approaches the best action value at time t + 1. As a result, the best action value in a given environment propagates step by step back to the action values in the preceding environments.
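A minimal tabular sketch of this update in Python follows. It assumes states and actions have already been discretized into hashable keys. The state and action labels in the usage line are hypothetical, and the default α and γ are placeholders, not values from the source.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning step to the action value table `q` in place.

    `q` maps (state, action) pairs to action values; missing entries read as 0.0.
    `actions` lists the candidate actions (next position commands) in next_state.
    """
    best_next = max(q[(next_state, a)] for a in actions)  # max_a Q(s_{t+1}, a)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error  # move Q(s_t, a_t) toward the target
    return q[(state, action)]


q = defaultdict(float)
# Hypothetical discretized states and candidate position commands.
print(q_update(q, "collided@P0", "P1", 1.0, "seated@P1", ["P1", "P2"], alpha=0.5))  # 0.5
```

A `defaultdict` keeps the table sparse, because only visited (state, command) pairs are ever stored.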

The reward calculation unit 23 calculates the reward 57 based on the FB current 55 and on the difference between the position indicated by the position command 53 and the FB position 54.

As described above, the reward calculation unit 23 increases the reward 57 when two conditions hold: the position difference between the position indicated by the position command 53 and the FB position 54 is equal to or smaller than the threshold value, and the FB current 55 is equal to or smaller than the reference current value A. In this case, the reward calculation unit 23 gives a reward 57 of, for example, “1”.

On the other hand, the reward calculation unit 23 decreases the reward 57 when the position difference is larger than the threshold value or the FB current 55 is larger than the reference current value A. In this case, the reward calculation unit 23 gives a reward 57 of, for example, “−1”.
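The two-valued rule above can be sketched as follows. This is a minimal Python sketch. The function name is hypothetical, and using the magnitude of the FB current is an assumption.

```python
def reward_binary(position_diff, fb_current, threshold, ref_current_a):
    """Return +1 when both the position check and the current check pass, otherwise -1."""
    position_ok = abs(position_diff) <= threshold
    current_ok = abs(fb_current) <= ref_current_a  # assumption: magnitude of FB current 55
    return 1.0 if position_ok and current_ok else -1.0


print(reward_binary(0.01, 0.3, threshold=0.05, ref_current_a=1.0))  # 1.0
print(reward_binary(0.20, 0.3, threshold=0.05, ref_current_a=1.0))  # -1.0
```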

For example, when the position difference is 0 and the amount of change in the FB current 55 is 0, the reward calculation unit 23 sets the reward 57 to the maximum reward. When the position difference is equal to or smaller than the threshold value and the FB current 55 is half the reference current value A, the reward calculation unit 23 sets the reward 57 to half the maximum reward. The FB current 55 is half the reference current value A, for example, when the transfer of the work 40 succeeds but the transfer position is slightly shifted from the desired position. If the work 40 rubs against the spindle chuck 31 on its way to the transfer position, the FB current 55 increases during the rubbing. Likewise, suppose the transfer position is slightly off the desired position and the spindle chuck 31 tries to grip the work 40 at the center of the transfer position after the work 40 arrives. The spindle chuck 31 then pushes the work 40, and the FB current 55 increases. In such cases the transfer position is within the allowable range. However, the machine learning device 20 determines whether the work 40 has avoided colliding with the end 30 and gives the reward 57 accordingly.

The reward calculator 23 sets the reward 57 as the minimum reward when the position difference is larger than the threshold value or when the FB current 55 is larger than the reference current value A. The reward calculation unit 23 sends the calculated reward 57 to the function update unit 22.
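The graded scheme can be sketched as below. The text fixes only three points: the maximum reward, half the maximum at A/2, and the minimum outside the limits. This sketch assumes the reward falls linearly with the FB current between those points, which reproduces the two examples. The text speaks of the change in the FB current, whereas this sketch uses the current value itself, which is a further simplification. The constants and the function name are hypothetical.

```python
R_MAX = 1.0   # assumed maximum reward
R_MIN = -1.0  # assumed minimum reward


def reward_graded(position_diff, fb_current, threshold, ref_current_a):
    """Graded reward: R_MAX at zero current, R_MAX / 2 at A / 2, R_MIN outside the limits.

    Assumption: inside the limits the reward decreases linearly with |FB current|.
    """
    if abs(position_diff) > threshold or abs(fb_current) > ref_current_a:
        return R_MIN
    return R_MAX * (1.0 - abs(fb_current) / ref_current_a)


print(reward_graded(0.0, 0.0, threshold=0.05, ref_current_a=2.0))   # 1.0
print(reward_graded(0.01, 1.0, threshold=0.05, ref_current_a=2.0))  # 0.5
```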

The function update unit 22 updates the function for determining the action 58 according to the reward 57 calculated by the reward calculation unit 23. For example, in Q learning, the action value function Q(s_t, a_t) of Expression (1) is the function for calculating the action 58, and the function update unit 22 updates it.

FIG. 4 is a diagram for explaining a first learning example by the machine learning device according to the embodiment. FIG. 4 shows positions P0 to P6 of the work 40 in the X-axis direction. Assume that the work 40 is at position P0 in the X-axis direction when it collides with the end 30 of the spindle chuck 31. In this case, the machine learning device 20 repeats the following cycle until the work 40 no longer collides with the end 30 of the spindle chuck 31:
- pull the work 40 back toward the plus side in the Z-axis direction;
- move it to the next position in the X-axis direction;
- insert it toward the minus side in the Z-axis direction.

Specifically, the machine learning device 20 moves the work 40 in the X-axis direction through positions P1, P2, P3, P4, P5, and P6, in that order. Position P1 is separated from position P0 by the movement amount Lx. Likewise, each of the pairs P1 and P2, P2 and P3, P0 and P4, P4 and P5, and P5 and P6 is separated by the movement amount Lx. Positions P3 and P6 are each separated from position P0 by the movement distance Lmax.
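The candidate positions of this first example can be generated as sketched below, with Lmax a whole multiple of Lx. This is a minimal Python sketch. It assumes that P1 to P3 lie on the plus side of P0 and P4 to P6 on the minus side, which is one reading of the distances given. The function name is hypothetical.

```python
def candidate_positions_1d(p0, lx, lmax):
    """Return (plus_side, minus_side) X positions stepping outward from p0.

    Assumption: P1..P3 are on the plus side of p0 and P4..P6 on the minus side.
    """
    n_steps = round(lmax / lx)  # e.g. Lmax = 3 * Lx gives three steps per side
    plus_side = [p0 + k * lx for k in range(1, n_steps + 1)]
    minus_side = [p0 - k * lx for k in range(1, n_steps + 1)]
    return plus_side, minus_side


plus, minus = candidate_positions_1d(p0=10.0, lx=0.5, lmax=1.5)
print(plus + minus)  # [10.5, 11.0, 11.5, 9.5, 9.0, 8.5]
```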

For example, when moving the work 40 to the position P1, the machine learning device 20 sends an action 58 for loading the work 40 to the position P1 to the control unit 13. As a result, the loader 36 moves in the X-axis direction according to the position command 53 corresponding to the action 58.

(4) When the workpiece 40 can be moved to the spindle chuck 31 without colliding with the end 30, the machine learning device 20 completes moving the workpiece 40 in the X-axis direction. The machine learning device 20 gives a low reward 57 to the position command 53 when the work 40 collides with the end 30, and gives a high reward 57 to the position command 53 when the work 40 does not collide with the end 30.
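The search in this example can be sketched as a loop that assigns a low reward to each commanded position that collides and a high reward to the first one that does not. This is a minimal Python sketch. `collides` is a hypothetical stand-in for one retract, shift, and reinsert cycle, and the ±1 reward values are assumptions borrowed from the reward examples.

```python
def search_transfer_position(candidates, collides):
    """Try candidate X positions in order and record a reward for each attempt.

    `collides(pos)` stands in for one cycle of retracting along +Z, shifting in X,
    and reinserting along -Z, and reports whether the work hit the end 30.
    """
    rewards = {}
    for pos in candidates:
        if collides(pos):
            rewards[pos] = -1.0  # low reward: this command led to a collision
        else:
            rewards[pos] = 1.0   # high reward: this command seated the work
            return pos, rewards
    return None, rewards  # no candidate up to Lmax succeeded


# Hypothetical fixture: the chuck opening only accepts X >= 11.0.
found, rewards = search_transfer_position([10.5, 11.0, 11.5], lambda x: x < 11.0)
print(found, rewards)  # 11.0 {10.5: -1.0, 11.0: 1.0}
```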

The machine learning device 20 may move the work 40 to the positions P1 to P6 in any order. For example, it may move the work 40 in the X-axis direction in order of proximity to the position P0, such as P1, P4, P2, P5, P3, and then P6. The machine learning device 20 is also not limited to six positions P1 to P6 in the X-axis direction. It may set five or fewer, or seven or more, positions in the X-axis direction.
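The nearest-first order can be produced by taking one step from each direction in turn. This is a minimal Python sketch that assumes each per-direction list is ordered outward from P0. The helper name is hypothetical. With three lists, the same helper also yields the order P11, P14, P17, P12, and so on, used in the second learning example.

```python
from itertools import chain, zip_longest


def interleave_nearest_first(*rays):
    """Merge per-direction candidate lists so that positions closest to P0 come first.

    Each list is assumed to be ordered outward from P0; lists may differ in length.
    """
    gap = object()  # sentinel that pads exhausted lists
    merged = chain.from_iterable(zip_longest(*rays, fillvalue=gap))
    return [p for p in merged if p is not gap]


print(interleave_nearest_first(["P1", "P2", "P3"], ["P4", "P5", "P6"]))
# ['P1', 'P4', 'P2', 'P5', 'P3', 'P6']
```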

Two conditions lead to step ST4b. First, the FB current 55 on the X axis and the Z axis becomes equal to or less than the reference current value A (Yes in step ST2). Second, the position difference between the position indicated by the position command 53 and the FB position 54 on the X axis and the Z axis becomes equal to or smaller than the threshold value (Yes in step ST3). The learning unit 21 then learns the action 58, which is the next position command 53, according to the state variable 56 (step ST4b). That is, the learning unit 21 assigns a large reward 57 to the position command 53 used for the transfer and then determines the action 58 corresponding to the next position command 53.

(5) After the work 40 moves to the transfer position, the state observation unit 25 notifies the control unit 13 that the movement position of the work 40 is within the allowable range. The control unit 13 then proceeds with the process of transferring the work 40 between the chucks.

Specifically, the control unit 13 causes the machine tool 2 to execute the above-described operations from s2 to s6. That is, the control unit 13 starts the operation of closing the spindle chuck 31 (step ST6), and waits until the spindle chuck 31 is closed (step ST7). When the spindle chuck 31 is closed, the spindle chuck 31 grips the work 40.

After that, the control unit 13 starts opening the loader chuck 32 (step ST8) and waits until the loader chuck 32 is open (step ST9). Then, the control unit 13 retracts the loader 36 toward the plus side in the Z-axis direction (step ST10).
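Steps ST6 to ST10 form a fixed handover sequence, sketched below in Python. The `machine` object and all its method names are hypothetical stand-ins for the control unit's actual I/O. The polling timeout in `wait_until` is an added safety assumption that the text does not mention.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout_s: float = 5.0, poll_s: float = 0.01) -> None:
    """Poll `condition` until it holds; the timeout is an added safety assumption."""
    deadline = time.monotonic() + timeout_s
    while not condition():
        if time.monotonic() > deadline:
            raise TimeoutError("chuck did not reach the expected state")
        time.sleep(poll_s)


def hand_over_to_spindle(machine) -> None:
    """Steps ST6 to ST10; `machine` is a hypothetical object exposing these calls."""
    machine.close_spindle_chuck()             # ST6: start closing spindle chuck 31
    wait_until(machine.spindle_chuck_closed)  # ST7: wait until it grips the work 40
    machine.open_loader_chuck()               # ST8: start opening loader chuck 32
    wait_until(machine.loader_chuck_open)     # ST9: wait until the work is released
    machine.retract_loader_z_plus()           # ST10: retract loader 36 toward +Z


class FakeMachine:
    """Hypothetical stand-in that records the order of the calls."""

    def __init__(self):
        self.log = []

    def close_spindle_chuck(self):
        self.log.append("close_spindle")

    def spindle_chuck_closed(self):
        return "close_spindle" in self.log

    def open_loader_chuck(self):
        self.log.append("open_loader")

    def loader_chuck_open(self):
        return "open_loader" in self.log

    def retract_loader_z_plus(self):
        self.log.append("retract")


m = FakeMachine()
hand_over_to_spindle(m)
print(m.log)  # ['close_spindle', 'open_loader', 'retract']
```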

Likewise, when the work 40 is unloaded from the spindle chuck 31 to the loader 36, the machine learning device 20 learns the position command 53 by the same processing as when the work 40 is loaded from the loader 36 into the spindle chuck 31.

In this way, when the transfer of the work 40 fails, the machine learning device 20 shifts the position by the movement amount Lx and has the transfer executed again, so the transfer position can be corrected. Because the machine learning device 20 learns the transfer position of the work 40, transfer failures can be prevented. For the same reason, the device can also be applied where even a slight displacement causes the transfer to fail, such as with a collet chuck. Further, because the machine learning device 20 can re-execute the transfer of the work 40 and prevent transfer failures, the productivity of the machine tool 2 improves.

Further, the machine learning device 20 judges the transfer position of the work 40 from the position difference between the position indicated by the position command 53 and the FB position 54. There is therefore no need for a special mechanism or device, such as a camera, to confirm the transfer of the work 40. The transfer can thus be confirmed at low cost.

Further, when the transfer fails, the machine learning device 20 re-executes the transfer of the work 40 according to the movement amount Lx. No manual recovery is therefore needed when the work 40 collides with the end 30 of the spindle chuck 31. This reduces downtime during loading of the work 40 and suppresses loss of productivity.

The present embodiment describes the case where the machine tool 2 moves the work 40 in the X-axis and Z-axis directions. The machine tool 2 may instead move the work 40 in the X-axis, Y-axis, and Z-axis directions. In this case, the position command 53 that the numerical controller 10 sends to the drive unit 37 includes position commands in the X-axis, Y-axis, and Z-axis directions. The servomotors 38 then include servomotors that move the loader 36 in each of these three directions. The machine learning device 20 then learns the position commands in the X-axis, Y-axis, and Z-axis directions.

FIG. 5 is a diagram for explaining a second learning example by the machine learning device according to the embodiment. FIG. 6 is a diagram for explaining a positional relationship between a spindle chuck and a work included in the machine tool according to the embodiment. Here, a case where the machine tool 2 moves the workpiece 40 in the X-axis direction, the Y-axis direction, and the Z-axis direction will be described. FIGS. 5 and 6 show positions of the work 40 in the XY plane when the work 40 is viewed from the Z-axis direction.

The spindle chuck 31A shown in FIG. 6 is an example of the spindle chuck 31 described in FIG. The spindle chuck 31A is a three-jaw chuck. Each of its three jaws moves toward the center Q1 of the circular chuck area 45 to grip the work 40.

The numerical controller 10 generates a position command 53 for inserting the work 40 at the center Q1. In practice, however, the work 40 may be inserted at a position shifted from the center Q1, such as the position P0, and collide with the end 30. The machine learning device 20 therefore learns a position command 53 with which the work 40 does not collide with the end 30.

Assume that the work 40 is at position P0 when it collides with the end 30 of the spindle chuck 31A. In this case, the machine learning device 20 repeats the following cycle until the work 40 no longer collides with the end 30 of the spindle chuck 31A:
- pull the work 40 back toward the plus side in the Z-axis direction;
- move it to the next position in the X-axis and Y-axis directions;
- insert it toward the minus side in the Z-axis direction.

The machine learning device 20 calculates the next movement position of the work 40 from the movement amount Lx and the movement distance Lmax. That is, during learning, the work 40 is moved only to positions determined by a specific learning direction and a specific learning movement amount in the XY plane.

Specifically, the machine learning device 20 moves the work 40 through positions P11, P12, P13, P14, P15, P16, P17, P18, and P19 in the XY plane, in that order. Each of the pairs P0 and P11, P11 and P12, and P12 and P13 is separated by the movement amount Lx. Likewise, each of the pairs P0 and P14, P14 and P15, and P15 and P16 is separated by the movement amount Lx, as is each of the pairs P0 and P17, P17 and P18, and P18 and P19. Positions P13, P16, and P19 are each separated from position P0 by the movement distance Lmax.

The direction from P0 to P13 and the direction from P0 to P16 form the learning angle θ. The direction from P0 to P16 and the direction from P0 to P19 also form the learning angle θ. In its first search for an appropriate movement position, the machine learning device 20 moves the work 40 from P0 in a first direction in steps of the movement amount Lx, up to the maximum movement distance Lmax. If no appropriate movement position is found in that direction, the machine learning device 20 rotates the search direction by θ to obtain a second direction. It then performs the same search from P0 in steps of Lx. The machine learning device 20 keeps searching in steps of Lx and rotating the search direction by θ until the cumulative learning angle in the chuck area 45 exceeds 360 degrees.
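The search pattern of this second example can be sketched as one ray of candidates per learning direction. This is a minimal Python sketch and the function name is hypothetical. The sketch stops adding directions once the cumulative rotation would reach 360 degrees, so that no direction is searched twice. This is an assumed reading of the stopping rule. With θ = 120 degrees and Lmax = 3·Lx, it yields the nine positions P11 to P19 in three rays.

```python
import math


def candidate_positions_2d(x0, y0, lx, lmax, theta_deg, first_dir_deg=0.0):
    """Return one list of (x, y) candidates per search direction around P0 = (x0, y0).

    Each ray steps outward by Lx up to Lmax; successive rays are rotated by theta.
    Assumption: rays are added only while k * theta < 360, so no direction repeats.
    """
    if lx <= 0 or theta_deg <= 0:
        raise ValueError("lx and theta_deg must be positive")
    n_steps = round(lmax / lx)
    rays = []
    k = 0
    while k * theta_deg < 360.0:
        ang = math.radians(first_dir_deg + k * theta_deg)
        ux, uy = math.cos(ang), math.sin(ang)  # unit vector of the k-th learning direction
        rays.append([(x0 + j * lx * ux, y0 + j * lx * uy) for j in range(1, n_steps + 1)])
        k += 1
    return rays


rays = candidate_positions_2d(0.0, 0.0, lx=1.0, lmax=3.0, theta_deg=120.0)
print(len(rays), [len(r) for r in rays])  # 3 [3, 3, 3]
```

The rays are returned direction by direction, so they can be searched one direction at a time, as described above, or merged into a nearest-first order.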

For example, when moving the work 40 to the position P11, the machine learning device 20 sends an action 58 for loading the work 40 to the position P11 to the control unit 13. Thereby, the loader 36 moves in the X-axis direction and the Y-axis direction by the position command 53 corresponding to the action 58.

When the work 40 can be moved to the spindle chuck 31A without colliding with the end 30, the machine learning device 20 completes moving the work 40 in the X-axis direction and the Y-axis direction.

The machine learning device 20 may move the work 40 to the positions P11 to P19 in any order. For example, it may move the work 40 in order of proximity to the position P0: P11, P14, P17, P12, P15, P18, P13, P16, and then P19. The machine learning device 20 is also not limited to nine positions P11 to P19 in the XY plane. It may set eight or fewer, or ten or more, positions in the XY plane.

Here, the hardware configuration of the numerical controller 10 will be described. FIG. 7 is a diagram illustrating a hardware configuration example of the numerical control device according to the embodiment.

The numerical controller 10 can be realized by the control circuit 300 shown in FIG. Examples of the processor 301 are a CPU (Central Processing Unit), a DSP (Digital Signal Processor), and a system LSI (Large Scale Integration). A CPU is also called a central processing unit, processing unit, arithmetic unit, microprocessor, microcomputer, or processor. Examples of the memory 302 are a RAM (Random Access Memory) and a ROM (Read Only Memory).

The numerical controller 10 is realized by the processor 301 reading and executing a program stored in the memory 302 for executing the operation of the numerical controller 10. It can also be said that this program causes a computer to execute the procedure or method of the numerical control device 10. The memory 302 is also used as a temporary memory when the processor 301 executes various processes.

Note that some of the functions of the numerical control device 10 may be realized by dedicated hardware, and some may be realized by software or firmware. Further, the machine learning device 20 may be realized by the control circuit 300 shown in FIG.

In the present embodiment, whether the transfer position is within the allowable range is determined from both the FB current 55 and the position difference. However, it may instead be determined from the position difference alone, without using the FB current 55.

Further, in the present embodiment, the position command 53 to the transfer position is learned from the FB current 55 in the X-axis direction and the position difference in the X-axis direction. It may instead be learned from the X-axis position difference alone, without using the FB current 55.

When the spindle chuck 31 is an electric chuck, the displacement of the workpiece 40 can be detected also on the spindle chuck 31 side. In this case, the machine learning device 20 may learn the position command 53 based on the position shift detected on the spindle chuck 31 side.

The present embodiment describes the case where the machine learning device 20 performs machine learning by reinforcement learning. However, the machine learning device 20 may instead perform machine learning according to other known methods, such as neural networks, genetic programming, functional logic programming, or support vector machines.

Further, the present embodiment describes the case where the control unit 13 controls the loader 36 based on the action 58 learned by the learning unit 21. However, the control unit 13 may control the loader 36 without using the action 58. In this case, the control unit 13 determines the displacement of the movement position of the work 40 from the position command 53, the FB position 54, and the FB current 55. If the amount of displacement is not within the allowable range, the control unit 13 outputs a new position command 53 shifted from the position indicated by the previous position command 53. That is, the control unit 13 searches for an appropriate movement position of the work 40 by shifting the movement position little by little. Specifically, the control unit 13 performs two processes: determining the displacement, and outputting a new position command 53 when the displacement is not within the allowable range. It executes these once or a plurality of times, which brings the displacement within the allowable range.
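This learning-free variant can be sketched as a bounded retry loop. This is a minimal Python sketch. `measure_displacement` is a hypothetical hook that returns the signed displacement observed after one attempt. The shifting rule (a fixed step in the direction that reduces the displacement) and the `max_tries` limit are assumptions, because the text says only that the command is shifted little by little.

```python
import math


def correct_by_shifting(command, measure_displacement, step, tolerance, max_tries=20):
    """Shift the position command until the measured displacement is within tolerance.

    `measure_displacement(command)` is a hypothetical hook returning the signed
    displacement of the work from the desired transfer position after one attempt.
    Assumed shifting rule: move by a fixed `step` toward reducing the displacement.
    """
    for _ in range(max_tries):
        d = measure_displacement(command)
        if abs(d) <= tolerance:
            return command                 # displacement is within the allowable range
        command += math.copysign(step, d)  # new command shifted from the previous one
    raise RuntimeError("displacement not brought within the allowable range")


# Hypothetical fixture: the correct transfer position is at 1.0.
print(correct_by_shifting(0.0, lambda c: 1.0 - c, step=0.25, tolerance=0.1))  # 1.0
```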

As described above, according to the embodiment, the position command 53 that suppresses displacement of the transfer position of the work 40 between the chucks is learned in accordance with the data set created from the state variable 56. As this learning is repeated, the probability that the transfer of the work 40 fails can be reduced.

The configurations described in the above embodiment are merely examples of the contents of the present invention. They can be combined with other known technologies, and part of a configuration can be omitted or changed without departing from the gist of the present invention.

1 machine system, 2 machine tool, 3 control system, 10 numerical control device, 11 control machining program storage unit, 12 analysis unit, 13 control unit, 14 storage unit, 20 machine learning device, 21 learning unit, 22 function update unit, 23 reward calculation unit, 25 state observation unit, 30 end unit, 31 and 31A spindle chuck, 32 loader chuck, 35 rotation unit, 36 loader, 37 drive unit, 38 servo motor, 39 encoder, 40 work, 52 delivery position information, 53 Position command, 54 FB position, 55 FB current, 56 state variable, 57 reward, 58 action, 61 loading command.

Claims

A machine learning device that learns a position command to a drive mechanism that moves a first chuck when a work is transferred between the first chuck, which grips and sends the work, and a second chuck, which grips and receives the work, the machine learning device comprising:
a state observation unit that observes, as state variables, the position command to the drive mechanism and feedback data from the drive mechanism; and
a learning unit that learns the position command that suppresses a displacement of a transfer position of the work between the first chuck and the second chuck, according to a data set created based on the state variables.
A numerical control device comprising:
the machine learning device according to claim 1; and
a control unit that outputs the position command learned by the learning unit to the drive mechanism.
The numerical control device according to claim 2, wherein
the feedback data includes a feedback position indicating a position of a transfer mechanism that transfers the work, and
the learning unit includes:
a reward calculation unit that calculates a reward based on a difference between the position indicated by the position command to the drive mechanism and the feedback position; and
a function update unit that updates, based on the reward, a function for determining the position command.
The numerical control device according to claim 3, wherein
the feedback data includes a feedback current, which is data indicating a current output from the drive mechanism for the drive mechanism to move the first chuck, and
the reward calculation unit calculates the reward based on the difference and the feedback current.
The numerical control device according to claim 4, wherein
the reward calculation unit increases the reward when the difference is equal to or less than a threshold and the feedback current is equal to or less than a reference current value, and decreases the reward when the difference is greater than the threshold or the feedback current is greater than the reference current value.
The numerical control device according to any one of claims 3 to 5, wherein
the function update unit updates an action value table representing the function according to the reward.
The numerical control device according to claim 4, wherein
the state observation unit determines that the transfer has failed when the difference is greater than a threshold or the feedback current is greater than a reference current value,
the learning unit learns the position command when the transfer is determined to have failed, and
the control unit retries the transfer with the position command learned by the learning unit when the transfer is determined to have failed.
The numerical control device according to any one of claims 2 to 7, wherein
the learning unit learns, among the position commands to the drive mechanism, a position command in a first direction perpendicular to the transfer direction between the first chuck and the second chuck.
A machine tool that is controlled by the numerical control device according to any one of claims 2 to 8 and is driven by the drive mechanism.
A machine learning method for learning a position command to a drive mechanism that moves a first chuck when a work is transferred between the first chuck, which grips and sends the work, and a second chuck, which grips and receives the work, the method comprising:
a state observation step of observing, as state variables, the position command to the drive mechanism and feedback data from the drive mechanism; and
a learning step of learning the position command that suppresses a displacement of a transfer position of the work between the first chuck and the second chuck, according to a data set created based on the state variables.
A numerical control device that outputs a position command to a drive mechanism that moves a first chuck when a work is transferred between the first chuck, which grips and sends the work, and a second chuck, which grips and receives the work, the numerical control device comprising:
a control unit that determines, based on the position command to the drive mechanism and feedback data from the drive mechanism, a displacement of a transfer position of the work between the first chuck and the second chuck, and that outputs a new position command shifted from the position indicated by the position command when the amount of displacement is not within an allowable range,
wherein the control unit brings the amount of displacement within the allowable range by executing, once or a plurality of times, the process of determining the displacement and the process of outputting the new position command when the amount of displacement is not within the allowable range.