CN112368656B

CN112368656B - Machine learning device, numerical control device, machine tool, and machine learning method

Info

Publication number: CN112368656B
Application number: CN201880095230.7A
Authority: CN
Inventors: 加藤勇太; 石田泰一
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2018-07-06
Filing date: 2018-07-06
Publication date: 2021-08-20
Anticipated expiration: 2038-07-06
Also published as: JPWO2020008633A1; WO2020008633A1; JP6505341B1; CN112368656A; DE112018007687T5

Abstract

A machine learning device (20) that learns a position command (53) to a drive unit (37) that moves a loader chuck (32) when transferring a workpiece (40) between the loader chuck (32) that grips and conveys the workpiece (40) and a spindle chuck (31) that grips and receives the workpiece (40), the device comprising: a state observation unit (25) that observes, as state variables (56), position commands (53) to the drive unit (37) and feedback data from the drive unit (37); and a learning unit (21) that learns a position command (53) that suppresses a positional deviation of a transfer position of the workpiece (40) between the loader chuck (32) and the spindle chuck (31) in accordance with a data set created on the basis of the state variable (56).

Description

Machine learning device, numerical control device, machine tool, and machine learning method

Technical Field

The present invention relates to a machine learning device, a numerical control device, a machine tool, and a machine learning method for learning a transfer operation of a workpiece.

Background

In a machine tool such as a lathe, in a transfer operation of a workpiece between a chuck that grips the workpiece and is on a conveying side and a chuck that grips the workpiece and is on a receiving side, a loader that conveys the workpiece moves the workpiece gripped by the chuck on the conveying side to a transfer position. For example, when the workpiece is a long workpiece, the workpiece may not be moved to the center of the chuck region on the workpiece receiving side due to flexure of the workpiece or clamping damage of the workpiece by the chuck. As described above, when transferring a workpiece, if the transfer position is displaced from the appropriate position, the transfer may fail, and therefore a technique capable of suppressing the displacement of the transfer position is desired.

The loader control device described in patent document 1 is configured to convert a correlation between the offset amount of the delivery position of the workpiece and the motor torque of the servo motor that drives the loader into a function, predict the offset amount of the delivery position based on the correlation and the measured motor torque, and correct the delivery position based on the predicted offset amount.

Patent document 1: Japanese Laid-Open Patent Publication No. 2002-187040

Disclosure of Invention

However, in patent document 1, since the function indicating the correlation is a fixed function, the probability of failure in the delivery of the workpiece does not decrease even if the delivery operation of the workpiece is repeated.

The present invention has been made in view of the above circumstances, and an object of the present invention is to obtain a machine learning device that can reduce the probability of failure in delivery of a workpiece by learning the delivery operation of the workpiece.

In order to solve the above-described problems and achieve the object, the present invention is a machine learning device that learns a position command to a drive mechanism that moves a 1st chuck when a workpiece is transferred between the 1st chuck, which grips and conveys the workpiece, and a 2nd chuck, which grips and receives the workpiece, the machine learning device including: a state observation unit that observes, as state variables, a position command to the drive mechanism and feedback data from the drive mechanism; and a learning unit that learns a position command for suppressing a positional shift of a transfer position of the workpiece between the 1st chuck and the 2nd chuck, in accordance with a data set created based on the state variables.

Advantageous Effects of Invention

The machine learning device according to the present invention has an effect that the probability of the delivery failure of the workpiece can be reduced by learning the delivery operation of the workpiece.

Drawings

Fig. 1 is a diagram showing a configuration of a work system according to an embodiment.

Fig. 2 is a diagram showing a configuration of a control system including a numerical control device according to an embodiment.

Fig. 3 is a flowchart showing an operation procedure of the work system according to the embodiment.

Fig. 4 is a diagram for explaining a 1st learning example realized by the machine learning device according to the embodiment.

Fig. 5 is a diagram for explaining a 2nd learning example realized by the machine learning device according to the embodiment.

Fig. 6 is a diagram for explaining a positional relationship between a spindle chuck and a workpiece included in a machine tool according to an embodiment.

Fig. 7 is a diagram showing an example of a hardware configuration of the numerical control device according to the embodiment.

Detailed Description

Hereinafter, a machine learning device, a numerical control device, a machine tool, and a machine learning method according to embodiments of the present invention will be described in detail with reference to the drawings. The present invention is not limited to the embodiments.

Detailed description of the preferred embodiments

Fig. 1 is a diagram showing a configuration of a work system according to an embodiment. Fig. 1 shows a case where the work system 1 is viewed from the vertical direction. In the present embodiment, a case where the vertical direction is the Y-axis direction and the horizontal direction, which is the moving direction of the workpiece 40, is the X-axis direction and the Z-axis direction will be described.

The work system 1 includes: a machine tool 2 that machines a workpiece 40; and a control system 3 that controls the operation of the machine tool 2. Examples of the machine tool 2 are a lathe and a machining center. Next, a case where the machine tool 2 is a lathe will be described.

The machine tool 2 includes: a rotating portion 35; a loader chuck 32, which is the 1st chuck; a spindle chuck 31, which is the 2nd chuck; and a loader 36, which is a conveying mechanism for the workpiece 40. The operation of the loader 36 is controlled by the control system 3. The loader chuck 32 is connected to the loader 36 and moves with the loader 36. The loader chuck 32 can grip the workpiece 40, which is the object to be machined. Examples of the loader chuck 32 are a three-jaw chuck and a collet chuck. The loader chuck 32 transfers the workpiece 40 to the spindle chuck 31 when starting the machining of the workpiece 40, and receives the workpiece 40 from the spindle chuck 31 after the machining of the workpiece 40 is completed.

The rotating portion 35 rotates about the Z-axis, which is the main spindle axis, as its rotation axis. The spindle chuck 31 is connected to the rotating portion 35 and rotates together with the rotating portion 35. The spindle chuck 31 can grip the workpiece 40. Examples of the spindle chuck 31 are a three-jaw chuck and a collet chuck. When the workpiece 40 is machined, the rotating portion 35 rotates to rotate the workpiece 40 in a state where the spindle chuck 31 holds the workpiece 40. An example of the rotating portion 35 is a spindle mechanism.

When the machine tool 2 loads the workpiece 40 onto the rotating portion 35, one end portion of the workpiece 40 is gripped by the loader chuck 32. In this state, the loader 36 moves to the negative side in the X-axis direction and stops at a position facing the spindle chuck 31 (s0). Then, the loader 36 moves to the negative side in the Z-axis direction. Thereby, the loader 36 moves the workpiece 40 to a position where the spindle chuck 31 can grip the workpiece 40 (s1). The position of the loader 36 at which the spindle chuck 31 can grip the workpiece 40 is the desired delivery position. The machine tool 2 starts the closing operation of the spindle chuck 31 using an auxiliary function such as an M code (s2), and waits until the closing operation of the spindle chuck 31 is completed (s3). When the spindle chuck 31 closes, it grips the other end of the workpiece 40.

Then, the machine tool 2 starts the releasing operation of the loader chuck 32 (s4), and waits until the releasing operation of the loader chuck 32 is completed (s5). Then, the loader 36 moves to the positive side in the Z-axis direction. Thereby, the loader 36 is retracted in the direction of separating from the workpiece 40 (s6), and further moved to the positive side in the X-axis direction.

When the machining of the workpiece 40 is completed, the machine tool 2 unloads the workpiece 40 from the rotating portion 35 by performing the processing of s0 to s6 described above in reverse order. In this case, each process of s0 to s6 is itself reversed. That is, in the processing of s0 and s6, the moving direction of the loader 36 during unloading is opposite to that during loading. In addition, at the time of unloading, the closing operation of the loader chuck 32 and the opening operation of the spindle chuck 31 are performed.
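The loading steps s0 to s6 described above can be summarized as an ordered list. The following is a minimal Python sketch for illustration only; the class name, member names, and helper function are hypothetical and do not appear in the patent.

```python
from enum import Enum


class LoadStep(Enum):
    """Steps s0 to s6 of the loading sequence, in execution order.

    Member names and descriptions are illustrative labels, not patent terms.
    """
    S0_MOVE_X_TO_FACE_SPINDLE = "loader moves in -X and stops facing the spindle chuck"
    S1_MOVE_Z_TO_DELIVERY = "loader moves in -Z to the delivery position"
    S2_START_SPINDLE_CLOSE = "spindle chuck starts closing (e.g. via an M code)"
    S3_WAIT_SPINDLE_CLOSED = "wait until the spindle chuck has closed"
    S4_START_LOADER_RELEASE = "loader chuck starts releasing"
    S5_WAIT_LOADER_RELEASED = "wait until the loader chuck has released"
    S6_RETRACT_Z = "loader retracts in +Z away from the workpiece"


def loading_sequence():
    """Return the steps in the order they are executed (Enum definition order)."""
    return list(LoadStep)


if __name__ == "__main__":
    for step in loading_sequence():
        print(step.name, "-", step.value)
```

The position learning discussed in this description concerns the transition from s0 to s1, where an X-axis offset can cause the workpiece to hit the spindle chuck.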

Specifically, at the time of unloading, the loader 36 moves to the negative side in the X-axis direction and further moves in a direction to approach the workpiece 40. Then, the loader chuck 32 starts the closing operation, and if the closing operation is completed, the spindle chuck 31 starts the opening operation, and if the opening operation is completed, the loader 36 is retreated in the direction of separating from the workpiece 40, and further moves to the positive side in the X-axis direction.

Since the process of loading the workpiece 40 on the rotating portion 35 and the process of unloading the workpiece 40 from the rotating portion 35 are the same process, the process of loading the workpiece 40 on the rotating portion 35 will be described below.

The machine tool 2 has machine-specific characteristics in the process of gripping the workpiece 40, the conveyance process, and the like. Consequently, the correspondence between the position command to the loader chuck 32 and the actual position of the loader chuck 32 in the X-axis direction depends on these machine-specific characteristics. Therefore, even if the workpiece 40 is to be loaded by an appropriate position command, when the loader chuck 32 attempts to transfer the workpiece 40 to the spindle chuck 31, the position of the workpiece 40 in the X-axis direction may be shifted, and the workpiece 40 may collide with the end 30 on the Z-axis direction positive side of the spindle chuck 31. In this case, the loader chuck 32 cannot transfer the workpiece 40 to the spindle chuck 31.

In the present embodiment, a numerical control (NC) device 10, described later, included in the control system 3 learns the position of the loader chuck 32 in the X-axis direction when the loader chuck 32 attempts to transfer the workpiece 40 to the spindle chuck 31. That is, the numerical control device 10 learns the position command to the loader chuck 32, thereby reducing the probability of the delivery failure of the workpiece 40. If the posture and shape of the workpiece 40 gripped by the loader chuck 32 are the same each time, the position of the workpiece 40 in the X-axis direction corresponds to the position of the loader chuck 32 in the X-axis direction. Therefore, in the present embodiment, the amount of positional displacement in the X-axis direction of the loader chuck 32 and the amount of positional displacement in the X-axis direction of the workpiece 40 are used synonymously.

Next, the configuration of the numerical control device 10 that controls the operation of the machine tool 2 will be described. Fig. 2 is a diagram showing a configuration of a control system including a numerical control device according to an embodiment. The control system 3 has a numerical control device 10, a drive unit 37, and a servo motor 38.

The numerical control device 10 is a computer that controls the position of the loader 36 by sending a position command 53 to the drive unit 37. The position command 53 transmitted from the numerical control device 10 to the drive unit 37 is a command for specifying the position of the loader 36, and includes a position command in the X-axis direction and a position command in the Z-axis direction. The numerical control device 10 controls the transfer of the workpiece 40 from the loader chuck 32 to the spindle chuck 31, and when the transfer fails, it changes the position command 53 in the X-axis direction to the loader chuck 32 and attempts the transfer again. The numerical control device 10 learns the appropriate position command 53 in the X-axis direction for the loader chuck 32 based on whether the delivery succeeded or failed when a given position command 53 in the X-axis direction was used.

The drive unit 37 is a drive mechanism that moves the loader 36 by driving the servo motor 38. The drive unit 37 calculates the value of the current to be supplied to the servo motor 38 based on a position command 53 from the numerical control device 10. The drive unit 37 drives the servo motor 38 by supplying a current corresponding to the position command 53 to the servo motor 38. The drive unit 37 supplies a feedback (FB: Feed-Back) current 55, which is data indicating a current supplied to the servo motor 38, to the numerical control device 10. The FB current 55 is an example of feedback data from the driving unit 37 to the numerical control device 10.

When information indicating the rotational speed of the servo motor 38 is transmitted from the encoder 39, the drive unit 37 calculates the FB position 54, which is data indicating the current position of the workpiece 40, based on the rotational speed, and transmits the calculated FB position 54 to the numerical control device 10. The FB position 54 is an example of feedback data from the drive unit 37 to the numerical control device 10.

The servo motor 38 is connected to the loader 36, and moves the loader 36 in accordance with the current from the drive unit 37. The servo motor 38 includes a servo motor that moves the loader 36 in the X-axis direction and a servo motor that moves the loader 36 in the Z-axis direction. An encoder 39 for detecting the rotational speed of the servo motor 38 is attached to each servo motor 38 in the X-axis direction and the Z-axis direction. The encoder 39 transmits information indicating the detected rotation speed to the driving unit 37.

The numerical control device 10 includes a control machining program storage unit 11, an analysis unit 12, a control unit 13, a storage unit 14, and a machine learning device 20. The control machining program storage unit 11 stores a control machining program used when machining the workpiece 40. The control machining program includes a loading command 61 for loading the workpiece 40 on the rotating portion 35, a machining command for machining the workpiece 40, and an unloading command for unloading the workpiece 40 from the rotating portion 35. The loading command 61 is illustrated in Fig. 2. Among these commands, the loading command 61 and the unloading command are dedicated commands for transferring the workpiece 40. The loading command 61 is transmitted to the analysis unit 12 as the G code 51 for positioning the loader 36.

The analysis unit 12 analyzes the control machining program. The analysis unit 12 determines whether or not the analyzed command is a dedicated command, and when the dedicated command is the loading command 61, generates delivery position information 52 indicating the position at which to deliver the workpiece 40 based on the G code 51. That is, the analysis unit 12 generates the delivery position information 52 based on the positioning command of the loader 36 included in the G code 51. The delivery position information 52 is information on the position at which the workpiece 40 is transferred between the loader chuck 32 and the spindle chuck 31. Specifically, the delivery position information 52 is the end point of the loader 36. When the first handover of the workpiece 40 is performed, the information used in the handover operation is set by the arguments of the dedicated command.

The dedicated instruction includes the following instruction arguments (A1) to (A5).

(A1) Transfer position information 52 of the workpiece 40, i.e., the end point of the loader 36

(A2) Reference current value a as a reference for determining whether or not a collision is caused during transfer of workpiece 40

(A3) Pull-back amount Lz of loader 36 in the case where collision is determined

(A4) Direction and amount of movement Lx for moving loader 36 during learning

(A5) Maximum moving distance Lmax in one direction at the time of learning

The delivery position information 52 of (A1) includes an X coordinate and a Z coordinate. The reference current value a of (A2) is a value for determining whether or not the handover is in an abnormal state, and is compared with the FB current 55, which indicates the current supplied to the servo motor 38. An FB current 55 larger than the reference current value a indicates that the workpiece 40 has collided with the spindle chuck 31 at a position other than the delivery position of the spindle chuck 31. An example of such a position is the end 30 of the spindle chuck 31. The pull-back amount Lz of (A3) is the distance by which the workpiece 40 is pulled back in the Z-axis direction when the workpiece 40 collides with the end portion 30 and the movement of the workpiece 40 is stopped.

When the workpiece 40 collides with the end portion 30 and its movement is stopped, the machine learning device 20 learns the position command 53. The movement amount Lx of (A4) is the distance that the workpiece 40 moves in the X-axis direction during this learning. After the workpiece 40 is moved in the X-axis direction by the movement amount Lx, the workpiece 40 is moved toward the spindle chuck 31 in the Z-axis direction. The workpiece 40 is moved by the movement amount Lx repeatedly until the movement position at the time of delivery falls within the allowable range. The moving distance Lmax of (A5) is the limit distance that the workpiece 40 may move in the X-axis direction during learning. That is, even during learning, the workpiece 40 does not move farther than the moving distance Lmax.

The analysis unit 12 transmits the delivery position information 52 of (A1) and the pull-back amount Lz of (A3) to the control unit 13. The analysis unit 12 transmits the values of the command arguments (A2), (A4), and (A5) to the machine learning device 20. The analysis unit 12 is not limited to acquiring the values of the command arguments from the dedicated command, and may acquire the values corresponding to the command arguments from parameters. In this case, the values corresponding to the command arguments are stored in advance in the storage unit 14 as parameters.
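As an illustration of how the command arguments (A1) to (A5) bound the search, the following Python sketch groups them in a data structure and enumerates the X positions reachable in one direction by stepping Lx without exceeding Lmax. The class, field, and function names, the use of a signed Lx to encode direction, and the numeric values are assumptions made for this sketch; the patent does not specify a data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferArgs:
    """Hypothetical container for the dedicated-command arguments (A1) to (A5)."""
    end_x: float         # (A1) delivery position, X coordinate
    end_z: float         # (A1) delivery position, Z coordinate
    ref_current: float   # (A2) reference current value a
    pull_back_lz: float  # (A3) pull-back amount Lz
    step_lx: float       # (A4) movement amount Lx; the sign encodes the direction (assumption)
    max_lmax: float      # (A5) maximum moving distance Lmax in one direction


def candidate_x_positions(args):
    """X positions tried in one direction: end_x + k*Lx while |k*Lx| <= Lmax."""
    if args.step_lx == 0:
        return []
    positions = []
    k = 1
    # A small tolerance guards against floating-point rounding at the Lmax boundary.
    while abs(k * args.step_lx) <= args.max_lmax + 1e-12:
        positions.append(args.end_x + k * args.step_lx)
        k += 1
    return positions


if __name__ == "__main__":
    # Illustrative values only.
    args = TransferArgs(end_x=100.0, end_z=-50.0, ref_current=5.0,
                        pull_back_lz=10.0, step_lx=0.5, max_lmax=2.0)
    print(candidate_x_positions(args))
```

With these illustrative values, the loader would try at most four shifted X positions in the given direction before reaching the Lmax limit.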

The control unit 13 generates the position command 53 in accordance with the delivery position information 52 transmitted from the analysis unit 12 or the action 58 given from the machine learning device 20. The action 58 is the next position command 53 in the X-axis direction. The control unit 13 transmits the position command 53 to the drive unit 37 and the machine learning device 20. The control unit 13 continues the delivery operation of the workpiece 40 if it receives a notification from the machine learning device 20 that the movement position of the workpiece 40 is within the allowable range.

Upon receiving a notification from the machine learning device 20 that the movement position of the workpiece 40 is outside the allowable range, the control unit 13 pulls the workpiece 40 back in the Z-axis direction by the pull-back amount Lz, and generates a position command 53 in accordance with the action 58.

The machine learning device 20 includes a state observation unit 25 and a learning unit 21. The state observation unit 25 acquires the reference current value a of (A2) in the command arguments from the analysis unit 12, and the learning unit 21 acquires the movement amount Lx of (A4) and the moving distance Lmax of (A5) in the command arguments from the analysis unit 12.

The state observation unit 25 acquires a position command 53 in the X-axis direction and the Z-axis direction from the control unit 13, an FB position 54 in the X-axis direction and the Z-axis direction from the drive unit 37, and an FB current 55 in the X-axis direction and the Z-axis direction from the drive unit 37.

When determining whether or not the movement position of the workpiece 40 is within the allowable range, the state observation unit 25 uses a position command 53 in the X-axis direction and the Z-axis direction, an FB position 54 in the X-axis direction and the Z-axis direction, and an FB current 55 in the X-axis direction and the Z-axis direction. The state observation unit 25 may determine whether or not the movement position of the workpiece 40 is within the allowable range based on the position command 53 in the X-axis direction, the FB position 54 in the X-axis direction, and the FB current 55 in the X-axis direction. The state observation unit 25 may determine whether or not the movement position of the workpiece 40 is within the allowable range based on the position command 53 in the Z-axis direction, the FB position 54 in the Z-axis direction, and the FB current 55 in the Z-axis direction. The state observation unit 25 may determine whether or not the movement position of the workpiece 40 is within the allowable range based on the FB position 54 in the Z-axis direction and the position command 53 in the Z-axis direction, without using the FB current 55 in the Z-axis direction.

When the learning unit 21 learns the position command 53 for the drive unit 37, the state observation unit 25 observes the position command 53 in the X-axis direction, the FB position 54 in the X-axis direction, and the FB current 55 in the X-axis direction as the state variables 56, and transmits the state variables 56, which are the observation results, to the learning unit 21. That is, the state variables 56 sent from the state observation unit 25 to the learning unit 21 include the position command 53 in the X-axis direction, the FB position 54 in the X-axis direction, and the FB current 55 in the X-axis direction.

Depending on the shape of the workpiece 40, the transfer position of the workpiece 40 may be shifted in the X-axis direction from the center position of the spindle chuck 31. In addition, when the transfer position of the workpiece 40 with respect to the loader chuck 32 is inappropriate, the transfer position of the workpiece 40 may be shifted in the X-axis direction from the center position of the spindle chuck 31. In these cases, the workpiece 40 collides with the spindle chuck 31 at a position other than the delivery position of the spindle chuck 31, and the workpiece 40 is likely to stop at a position where it cannot be gripped by the spindle chuck 31.

When the workpiece 40 collides with the spindle chuck 31 at a position other than the transfer position, the movement position of the workpiece 40 is out of the allowable range. When the workpiece 40 moves to the transfer position while rubbing against the spindle chuck 31, the movement position of the workpiece 40 is also out of the allowable range.

When the movement position of the workpiece 40 is out of the allowable range, the load on the loader 36 in at least one of the X-axis direction and the Z-axis direction increases. Therefore, when the movement position of the workpiece 40 is out of the allowable range, the current to be supplied to the servo motor 38 increases, and the FB current 55 also increases. Therefore, the state observation unit 25 determines whether or not the movement position of the workpiece 40 is within the allowable range based on the comparison result between the reference current value a and the FB current 55. In a normal state in which the movement position of the workpiece 40 is within the allowable range, the workpiece 40 stops at the delivery position of the spindle chuck 31, and therefore the load to the loader 36 does not increase and the FB current 55 does not rise.

In addition, when the movement position of the workpiece 40 is out of the allowable range, the position of the loader 36 corresponding to the position command 53 and the position of the loader 36 corresponding to the FB position 54 become different within a certain time. Therefore, the state observation unit 25 determines whether or not the movement position of the workpiece 40 is within the allowable range based on the comparison result between the position command 53 and the FB position 54. The state observation unit 25 transmits the determination result of whether or not the movement position of the workpiece 40 is within the allowable range to the control unit 13.
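The two checks described above, a current comparison and a position comparison, can be sketched as a single predicate for one axis. This is a minimal Python illustration under stated assumptions: the function and parameter names are hypothetical, the current is compared by magnitude, and the position tolerance is passed in as a threshold.

```python
def within_allowable_range(fb_current, ref_current,
                           position_command, fb_position,
                           position_threshold):
    """Return True when both checks pass for one axis.

    fb_current         : FB current 55 for the axis
    ref_current        : reference current value a for the axis
    position_command   : position indicated by position command 53
    fb_position        : FB position 54
    position_threshold : allowed |command - feedback| difference (assumed parameter)
    """
    # Current check: an FB current above the reference suggests a collision
    # or rubbing. Comparing the magnitude is an assumption of this sketch.
    current_ok = abs(fb_current) <= ref_current
    # Position check: the feedback position should track the commanded position.
    position_ok = abs(position_command - fb_position) <= position_threshold
    return current_ok and position_ok


if __name__ == "__main__":
    # Illustrative values only.
    print(within_allowable_range(2.0, 5.0, 100.0, 100.02, 0.1))
    print(within_allowable_range(6.5, 5.0, 100.0, 100.02, 0.1))
```

In practice the check would be evaluated for the X axis, the Z axis, or both, matching the alternatives the description allows.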

The learning unit 21 learns the next action 58, which is the position command 53, in accordance with the state variables 56. In other words, the learning unit 21 learns the position command 53 for reducing the amount of positional deviation of the delivery position of the workpiece 40 with respect to the loader 36.

Specifically, the learning unit 21 learns the action 58 according to a data set created based on a state variable 56 including the position command 53, the FB position 54, and the FB current 55. The learning unit 21 includes a function updating unit 22 and a reward calculating unit 23.

The reward calculation unit 23 calculates a reward 57 based on the state variable 56. The reward calculation unit 23 calculates the difference between the position indicated by the position command 53 and the FB position 54 based on the state variable 56, and extracts the FB current 55 from the state variable 56. The reward calculation unit 23 increases the reward 57 when the difference between the position indicated by the position command 53 and the FB position 54 is equal to or smaller than the threshold value and the FB current 55 is equal to or smaller than the reference current value a. In this case, the reward calculation unit 23 makes the reward 57 larger as the difference between the position indicated by the position command 53 and the FB position 54 becomes smaller, and also makes the reward 57 larger as the FB current 55 becomes smaller. The learning unit 21 transmits the calculated reward 57 to the function updating unit 22. In the following description, the difference between the position indicated by the position command 53 and the FB position 54 is sometimes referred to as a position difference.

The function update unit 22 stores a function for determining the action 58 in advance, and updates the function for determining the action 58 based on the reward 57. An example of the function for determining the action 58 is the action value function Q(s_t, a_t) described later. Each time the machine tool 2 repeats the delivery operation of the workpiece 40, the function update unit 22 of the present embodiment updates the action value function Q(s, a) so that the positional deviation amount of the delivery position is reduced. The function update unit 22 calculates the action 58 using the updated action value function Q(s, a). The function update unit 22 transmits the calculated action 58 to the control unit 13, and transmits the learning data up to the previous time, the data used in the learning, and the data necessary for controlling the loader 36 to the storage unit 14. An example of the learning data is the position command 53 for the next use calculated when the handover is successful, and an example of the data used for the learning is the action value function Q(s, a) used by the learning unit 21 for the learning. An example of the data used in the control of the loader 36 is the pull-back amount Lz. The storage unit 14 stores the learning data up to the previous time, the data used in learning, and the data necessary for controlling the loader 36 by the control unit 13.
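The reward rule described above can be illustrated with a short Python sketch. Only the qualitative behavior comes from the description: the reward increases when both the position difference and the FB current are within their limits, and it grows as either quantity shrinks. The specific scale (a base of 1.0 plus two margins, and -1.0 otherwise), the function name, and the parameter names are assumptions of this sketch.

```python
def compute_reward(position_command, fb_position, fb_current,
                   ref_current, position_threshold):
    """Illustrative reward 57 for one transfer attempt.

    The numeric scale is hypothetical; the patent gives only the ordering
    (smaller position difference and smaller FB current mean larger reward).
    """
    if ref_current <= 0 or position_threshold <= 0:
        raise ValueError("ref_current and position_threshold must be positive")

    position_diff = abs(position_command - fb_position)
    current = abs(fb_current)

    if position_diff <= position_threshold and current <= ref_current:
        # Each margin lies in [0, 1] and grows as the measured value shrinks.
        position_margin = 1.0 - position_diff / position_threshold
        current_margin = 1.0 - current / ref_current
        return 1.0 + position_margin + current_margin

    # Outside the limits: a smaller (here negative) reward.
    return -1.0


if __name__ == "__main__":
    # Illustrative values only.
    print(compute_reward(100.0, 100.0, 0.0, 5.0, 0.1))
    print(compute_reward(100.0, 100.0, 6.0, 5.0, 0.1))
```

Any monotone scale with the same ordering would fit the description equally well; the linear margins are chosen here only for readability.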

Next, the operation sequence of the work system 1 will be explained. Fig. 3 is a flowchart showing an operation procedure of the work system according to the embodiment. When the numerical control device 10 reads the loading command 61 and the workpiece 40 is being delivered for the first time, that is, when no learning has yet been performed, the movement of the loader 36 to the delivery position specified by the command argument (A1) described above is started (step ST1). At this time, the control unit 13 sends a position command 53 to the drive unit 37 and the state observation unit 25. Thereby, the loader 36 moves in the X-axis direction and then moves in the Z-axis direction with the loader chuck 32 gripping the workpiece 40.

When the workpiece 40 starts moving, the driving unit 37 acquires the FB position 54 and the FB current 55 at every specific cycle, and sends them to the state observing unit 25. Thereby, the state observation unit 25 monitors the position command 53, the FB position 54, and the FB current 55.

The state observation unit 25 determines whether or not the FB current 55 in each of the X-axis and Z-axis is equal to or lower than the reference current value a (step ST2). The reference current value a may be different between the X axis and the Z axis.

When the FB current 55 in each of the X-axis and Z-axis directions before reaching the delivery position is larger than the reference current value a (No at step ST2), the state observation unit 25 transmits the state variable 56 including the position command 53 in the X-axis direction and the Z-axis direction, the FB position 54 in the X-axis direction and the Z-axis direction, and the FB current 55 in the X-axis direction and the Z-axis direction to the learning unit 21. Further, the state observation unit 25 notifies the control unit 13 that the movement position of the workpiece 40 is out of the allowable range.

When the FB current 55 for each axis is larger than the reference current value a, that is, when the movement to the handover position fails, the learning unit 21 sets a report 57 having a smaller value than the position command 53 used for the handover. Thus, the learning unit 21 learns the appropriate position command 53 in accordance with the state variable 56, and determines the next position command 53, i.e., the action 58, so that the report 57 becomes the maximum (step ST4 a). The control unit 13 retracts the workpiece 40 in the Z-axis direction by the retraction amount Lz (step ST 5). Then, the control unit 13 starts the movement of the loader 36 to the delivery position (step ST 1). Thus, the control unit 13 moves the workpiece 40 in the X-axis direction by the position command 53 generated in accordance with act 58, and then moves the workpiece 40 in the Z-axis direction.

In the process of step ST2, if the FB current 55 in each of the X-axis and Z-axis becomes equal to or less than the reference current value a (Yes at step ST2), the state observation unit 25 determines whether or not the difference between the position indicated by the position command 53 and the position of the FB position 54 in each of the X-axis and Z-axis is equal to or less than a threshold value (step ST 3).

When the difference between the FB position 54 and the position indicated by the position command 53 in the X axis and the Z axis is larger than the threshold value (No at step ST3), the state observation unit 25 transmits the state variable 56 including the position command 53 in the X-axis direction, the FB position 54 in the X-axis direction, and the FB current 55 in the X-axis direction to the learning unit 21. Further, the state observation unit 25 notifies the control unit 13 that the movement position of the workpiece 40 is out of the allowable range. Thus, the processing of steps ST4a and ST5 described above is performed, and the processing of step ST1 is then performed again.
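The two checks of steps ST2 and ST3 can be pictured with a short sketch. The following Python is only an illustration under assumptions: the names `AxisFeedback` and `check_move`, the comparison of current magnitudes, and the string results are hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass


@dataclass
class AxisFeedback:
    position_command: float  # position command 53 for this axis
    fb_position: float       # FB position 54 for this axis
    fb_current: float        # FB current 55 for this axis


def check_move(axes, reference_currents, position_threshold):
    """Classify one move toward the delivery position.

    Returns "current_exceeded" (No at step ST2), "position_deviation"
    (No at step ST3), or "ok" (Yes at both steps).
    """
    # Step ST2: every axis must stay at or below its own reference current a.
    # Assumption: the magnitude of the current is what is compared.
    for name, fb in axes.items():
        if abs(fb.fb_current) > reference_currents[name]:
            return "current_exceeded"
    # Step ST3: commanded and fed-back positions must agree within the threshold.
    for fb in axes.values():
        if abs(fb.position_command - fb.fb_position) > position_threshold:
            return "position_deviation"
    return "ok"
```

Either failure result corresponds to the path through steps ST4a, ST5, and ST1; only "ok" lets the delivery proceed.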

Here, the learning process for the action 58 performed by the learning unit 21 will be described. The learning unit 21 may use any learning algorithm. Here, a case where reinforcement learning is applied as the learning algorithm will be described. In reinforcement learning, an agent acting in an environment observes the current state indicated by the state variables 56 and decides the action 58 to be taken based on the observation result. The agent obtains a reward 57 from the environment by selecting an action 58, and learns a policy that maximizes the reward 57 obtained through a series of actions 58. Q-learning and TD-learning are known as typical reinforcement learning methods. For example, in the case of Q-learning, a general update formula (action value table) of the action value function Q(s, a) is represented by the following formula (1). That is, an example of the action value table is the action value function Q(s, a) of formula (1).

[Formula 1]

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right) \quad (1)$$

In formula (1), s_t represents the environment at time t, and a_t represents the action at time t. The action a_t changes the environment to s_{t+1}. r_{t+1} represents the reward 57 obtained by that change of the environment, γ represents the discount rate, and α represents the learning coefficient. When Q-learning is applied, the position command 53 for the next delivery operation becomes the action a_t.

Regarding the update represented by formula (1), if the action value of the optimal action a at time t+1 is greater than the action value Q of the action a performed at time t, the action value Q is increased, and in the opposite case, the action value Q is decreased. In other words, the action value function Q(s, a) is updated so that the action value Q of the action a at time t approaches the optimum action value at time t+1. Thus, the optimal action value in a certain environment is propagated in turn to the action values in the environments preceding it.
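The behavior described for formula (1) can be illustrated with the standard tabular Q-learning update. The sketch below is an assumption-laden example, not the implementation of the learning unit 21: the function name `q_update`, the state labels, the candidate actions, and the values of α and γ are all hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one standard Q-learning update to the action value table q."""
    # Best action value available in the environment reached at time t+1.
    best_next = max(q[(next_state, a)] for a in actions)
    # Move Q(s_t, a_t) toward r_{t+1} + gamma * max_a Q(s_{t+1}, a).
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# Hypothetical example: actions are candidate X positions, rewards are +1 / -1.
q = defaultdict(float)
actions = ["P1", "P2", "P3"]
q_update(q, "collided", "P1", -1.0, "collided", actions)   # failed retry
q_update(q, "collided", "P2", 1.0, "delivered", actions)   # successful retry
q_update(q, "start", "P2", 0.0, "collided", actions)       # value propagates back
```

The third call shows the propagation described above: the state "start" receives a positive value only because a good action is already known in the environment that follows it.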

The reward calculation unit 23 calculates a reward 57 based on the position difference between the position indicated by the position command 53 and the FB position 54 and the FB current 55.

As described above, the reward calculation unit 23 increases the reward 57 when the difference between the position indicated by the position command 53 and the position of the FB position 54 is equal to or smaller than the threshold value and the FB current 55 is equal to or smaller than the reference current value a. At this time, the reward calculation unit 23 gives a reward 57 of "1", for example.

On the other hand, the reward calculation unit 23 decreases the reward 57 when the difference between the position indicated by the position command 53 and the FB position 54 is larger than the threshold value, or when the FB current 55 is larger than the reference current value a. At this time, the reward calculation unit 23 gives a reward 57 of "-1", for example.

For example, when the position difference is 0 and the amount of change in the FB current 55 is 0, the reward calculation unit 23 sets the reward 57 to the maximum reward. When the position difference is equal to or smaller than the threshold value and the FB current 55 is half the reference current value a, the reward calculation unit 23 sets the reward 57 to half the maximum reward. An example of the case where the FB current 55 is half the reference current value a is a case where the delivery of the workpiece 40 succeeds but the delivery position is slightly shifted from the desired position. When the workpiece 40 reaches the contact position while rubbing against the spindle chuck 31, the FB current 55 increases during the rubbing. In addition, when the delivery position is slightly shifted from the desired position and the spindle chuck 31 tries to grip the workpiece 40 at the center of the delivery position after the workpiece 40 has reached the delivery position, the workpiece 40 is pressed by the spindle chuck 31 and the FB current 55 increases. In these cases, the delivery position is within the allowable range, but the machine learning device 20 gives a reward 57 that lies between the reward for the case where the workpiece 40 does not collide with the end portion 30 and the reward for the case where the workpiece 40 collides with the end portion 30.

The reward calculation unit 23 sets the reward 57 to the minimum reward when the position difference is larger than the threshold value or when the FB current 55 is larger than the reference current value a. The reward calculation unit 23 sends the calculated reward 57 to the function update unit 22.
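One way to picture the graded reward described above is the following sketch. It assumes a linear scaling of the reward with the FB current inside the allowable range, which matches the two examples given (zero current gives the maximum reward, half the reference current gives half the maximum) but is otherwise an assumption. The function name `compute_reward` and the default values +1 and -1 are illustrative.

```python
def compute_reward(position_diff, fb_current, threshold, reference_current,
                   max_reward=1.0, min_reward=-1.0):
    """Return a reward 57 from the position difference and the FB current 55."""
    # Outside the allowable range: position difference above the threshold,
    # or FB current above the reference current value a.
    if abs(position_diff) > threshold or abs(fb_current) > reference_current:
        return min_reward
    # Inside the allowable range. Assumption: the reward falls linearly as the
    # FB current approaches the reference current value a.
    return max_reward * (1.0 - abs(fb_current) / reference_current)
```

With a reference current of 2.0, for example, a current of 0.0 yields 1.0 and a current of 1.0 yields 0.5, while any out-of-range measurement yields -1.0.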

The function update unit 22 updates the function for determining the action 58 in accordance with the reward 57 calculated by the reward calculation unit 23. For example, in the case of Q-learning, the action value function Q(s_t, a_t) represented by formula (1) is the function for calculating the action 58, and is updated by the function update unit 22.

Fig. 4 is a diagram for explaining a 1st learning example realized by the machine learning device according to the embodiment. Fig. 4 shows positions P0 to P6 in the X-axis direction of the workpiece 40. The position of the workpiece 40 in the X-axis direction when the workpiece 40 collides with the end 30 of the spindle chuck 31 is set to a position P0. In this case, the machine learning device 20 repeats the process of pulling back the workpiece 40 to the positive side in the Z-axis direction, the process of moving the workpiece 40 to the next position in the X-axis direction, and the process of inserting the workpiece 40 to the negative side in the Z-axis direction until the workpiece 40 does not collide with the end 30 of the spindle chuck 31.

Specifically, the machine learning device 20 moves the workpiece 40 in the X-axis direction through the X-axis positions in the order of the position P1, the position P2, the position P3, the position P4, the position P5, and the position P6. The position P0 and the position P1 are separated by a movement amount Lx. Likewise, the positions P1 and P2, the positions P2 and P3, the positions P0 and P4, the positions P4 and P5, and the positions P5 and P6 are each separated by the movement amount Lx. In addition, the positions P0 and P3, and the positions P0 and P6, are each separated by a movement distance Lmax.

For example, when the machine learning device 20 moves the workpiece 40 to the position P1, it sends an action 58 for loading the workpiece 40 at the position P1 to the control unit 13. Thereby, the loader 36 is moved in the X-axis direction by the position command 53 corresponding to action 58.

The machine learning device 20 completes the operation of moving the workpiece 40 in the X-axis direction if the workpiece 40 can move to the spindle chuck 31 without colliding with the end 30. The machine learning device 20 gives a low return 57 to the position command 53 when the workpiece 40 collides with the end portion 30, and gives a high return 57 to the position command 53 when the workpiece 40 does not collide with the end portion 30.

The machine learning device 20 may move the workpiece 40 to the positions P1 to P6 in an arbitrary order. For example, the machine learning device 20 may move the workpiece 40 in the X-axis direction in order of increasing distance from the position P0, such as the position P1, the position P4, the position P2, the position P5, the position P3, and the position P6. The positions in the X-axis direction are not limited to the 6 positions P1 to P6; the machine learning device 20 may set 5 or fewer positions or 7 or more positions in the X-axis direction.
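The candidate X positions of the 1st learning example can be generated from P0, Lx, and Lmax. The sketch below is illustrative only: the function name `x_candidates` is hypothetical, and placing P1 to P3 on the positive side of P0 and P4 to P6 on the negative side is an assumption, since the sides are defined by Fig. 4 rather than by the text.

```python
def x_candidates(p0, lx, lmax, nearest_first=False):
    """List candidate X positions on both sides of the collision position P0."""
    offsets = []
    k = 1
    # Step outward by Lx until the movement distance Lmax is reached.
    while k * lx <= lmax + 1e-9:
        offsets.append(k * lx)
        k += 1
    side_a = [p0 + d for d in offsets]  # P1, P2, P3 (assumed positive side)
    side_b = [p0 - d for d in offsets]  # P4, P5, P6 (assumed negative side)
    if nearest_first:
        # Order P1, P4, P2, P5, P3, P6: increasing distance from P0.
        ordered = []
        for a, b in zip(side_a, side_b):
            ordered.extend([a, b])
        return ordered
    # Order P1, P2, P3, P4, P5, P6.
    return side_a + side_b
```

Choosing a smaller Lx or a larger Lmax yields more than 6 candidates, which corresponds to the remark that the number of positions is not limited to 6.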

If the FB current 55 in the X-axis and Z-axis becomes equal to or less than the reference current value a (Yes at step ST2) and the position difference between the position indicated by the position command 53 in the X-axis and Z-axis and the FB position 54 becomes equal to or less than the threshold value (Yes at step ST3), the learning unit 21 learns the next position command 53, i.e., the action 58, in accordance with the state variables 56 (step ST4b). That is, the learning unit 21 sets a large reward 57 for the position command 53 used for the delivery, and then determines the action 58 corresponding to the next position command 53.

Further, the state observation unit 25 notifies the control unit 13 that the movement position of the workpiece 40 is within the allowable range after moving the workpiece 40 to the delivery position. Thereby, the control unit 13 performs the transfer process of the workpiece 40 between the chucks.

Specifically, the control unit 13 causes the machine tool 2 to execute the operations s2 to s6 described above. That is, the control unit 13 starts the operation of closing the spindle chuck 31 (step ST6), and waits until the spindle chuck 31 is closed (step ST7). When the spindle chuck 31 is closed, the spindle chuck 31 grips the workpiece 40.

Then, the control unit 13 starts the operation of separating the loader chuck 32 (step ST8), and waits until the loader chuck 32 is separated (step ST9). Then, the control unit 13 retracts the loader 36 to the positive side in the Z-axis direction (step ST10).

When the workpiece 40 is unloaded from the spindle chuck 31 to the loader 36, the machine learning device 20 also learns the position command 53, through the same processing as when the workpiece 40 is loaded from the loader 36 to the spindle chuck 31.

As described above, when the delivery of the workpiece 40 fails, the machine learning device 20 can correct the delivery position because the delivery of the workpiece 40 is performed again in accordance with the movement amount Lx. Further, since the machine learning device 20 learns the delivery position of the workpiece 40, failure of delivery can be prevented. Further, since the machine learning device 20 learns the delivery position of the workpiece 40, the machine learning device 20 can be applied even in an environment where delivery fails due to a slight positional deviation, such as a collet chuck. In addition, the machine learning device 20 can re-execute the delivery of the workpiece 40 and can prevent failure of the delivery, so that productivity achieved by the machine tool 2 is improved.

Further, since the machine learning device 20 determines the delivery position of the workpiece 40 based on the position difference between the position indicated by the position command 53 and the FB position 54, it is not necessary to provide a special mechanism or device such as a camera for confirming delivery of the workpiece 40. Therefore, the handover can be confirmed at low cost.

Further, since the machine learning device 20 performs the transfer of the workpiece 40 again in accordance with the movement amount Lx when the transfer of the workpiece 40 fails, the workpiece 40 does not need to be manually restored even when the workpiece 40 collides with the end 30 of the spindle chuck 31. Therefore, the downtime for loading the workpiece 40 can be reduced, and the deterioration of productivity can be suppressed.

In the present embodiment, the description has been given of the case where the machine tool 2 moves the workpiece 40 in the X-axis direction and the Z-axis direction, but the machine tool 2 may move the workpiece 40 in the X-axis direction, the Y-axis direction, and the Z-axis direction. In this case, the position command 53 transmitted from the numerical control device 10 to the drive unit 37 includes a position command in the X-axis direction, a position command in the Y-axis direction, and a position command in the Z-axis direction. The servo motors 38 include respective servo motors that move the loader 36 in the X-axis direction, the Y-axis direction, and the Z-axis direction. Then, the machine learning device 20 learns the position command in the X-axis direction, the position command in the Y-axis direction, and the position command in the Z-axis direction.

Fig. 5 is a diagram for explaining a 2nd learning example realized by the machine learning device according to the embodiment. Fig. 6 is a diagram for explaining a positional relationship between a spindle chuck and a workpiece included in a machine tool according to an embodiment. Here, a case where the machine tool 2 moves the workpiece 40 in the X-axis direction, the Y-axis direction, and the Z-axis direction will be described. Figs. 5 and 6 show the position of the workpiece 40 in the XY plane when the workpiece 40 is viewed from the Z-axis direction.

The spindle chuck 31A shown in fig. 6 is an example of the spindle chuck 31 described in fig. 1. The spindle chuck 31A is a three-jaw chuck. The three jaws of the spindle chuck 31A each grip the workpiece 40 by moving toward the center Q1 of the circular chuck region 45.

The numerical control device 10 generates the position command 53 for inserting the workpiece 40 into the center Q1, but the actual workpiece 40 is sometimes inserted into a position offset from the center Q1 such as the position P0, and the workpiece 40 collides with the end 30. Therefore, the machine learning device 20 learns the position command 53 that prevents the workpiece 40 from colliding with the end portion 30.

The position of the workpiece 40 when the workpiece 40 collides with the end 30 of the spindle chuck 31A is set to a position P0. In this case, the machine learning device 20 repeats the process of retracting the workpiece 40 to the positive side in the Z-axis direction, the process of moving the workpiece 40 to the next position in the X-axis direction and the Y-axis direction, and the process of inserting the workpiece 40 to the negative side in the Z-axis direction until the workpiece 40 does not collide with the end 30 of the spindle chuck 31A. In this case, the machine learning device 20 calculates the next movement position of the workpiece 40 based on the movement amount Lx and the movement distance Lmax. That is, when the movement position is learned, the workpiece 40 moves to a position in the XY plane where the movement is restricted by a specific learning direction and a specific learning movement amount.

Specifically, the machine learning device 20 moves the workpiece 40 in the order of the position P11, the position P12, the position P13, the position P14, the position P15, the position P16, the position P17, the position P18, and the position P19, which are positions in the XY plane. Between position P0 and position P11, between position P11 and position P12, and between position P12 and position P13 are each separated by an amount of movement Lx. Likewise, between the position P0 and the position P14, between the position P14 and the position P15, and between the position P15 and the position P16 are each separated by the moving amount Lx. Likewise, between the position P0 and the position P17, between the position P17 and the position P18, and between the position P18 and the position P19 are each separated by the moving amount Lx. In addition, between the position P0 and the position P13, between the position P0 and the position P16, and between the position P0 and the position P19 are separated by a movement distance Lmax, respectively.

In addition, a learning angle θ is formed between a direction from the position P0 toward the position P13 and a direction from the position P0 toward the position P16, and a learning angle θ is formed between a direction from the position P0 toward the position P16 and a direction from the position P0 toward the position P19. The machine learning device 20 repeatedly performs the following processes: a process of moving the workpiece 40 in the 1st direction from the position P0 by the movement amount Lx; and, once the workpiece 40 has been moved up to the movement distance Lmax, a process of moving the workpiece 40 from the position P0 by the movement amount Lx in the 2nd direction, which is rotated from the 1st direction by the learning angle θ. That is, when the machine learning device 20 first searches for the appropriate movement position of the workpiece 40, it searches while moving the workpiece 40 by the movement amount Lx in the 1st direction up to the maximum movement distance Lmax. When the appropriate movement position is not found in the 1st direction, the machine learning device 20 performs the same search in the 2nd direction, which is obtained by rotating the 1st direction by the learning angle θ. The machine learning device 20 repeats the following processes until the total of the learning angles θ exceeds 360 degrees in the chuck region 45: a process of searching for an appropriate movement position while moving the workpiece 40 by the movement amount Lx; and a process of rotating the search direction by the learning angle θ.
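The radial search of the 2nd learning example can be sketched as a generator of XY candidates. This is an illustration under assumptions: the function name `xy_candidates` and the `start_deg` parameter for the 1st direction are hypothetical, and the sketch stops before a direction would repeat the 1st direction, which is one reading of the 360-degree limit.

```python
import math


def xy_candidates(p0, lx, lmax, theta_deg, start_deg=0.0):
    """List XY candidates around P0: steps of Lx up to Lmax, rotated by theta."""
    x0, y0 = p0
    steps = int(math.floor(lmax / lx + 1e-9))  # number of Lx steps per direction
    points = []
    i = 0
    # Rotate by the learning angle until a full turn has been covered.
    while i * theta_deg < 360.0:
        angle = math.radians(start_deg + i * theta_deg)
        for k in range(1, steps + 1):
            r = k * lx
            points.append((x0 + r * math.cos(angle), y0 + r * math.sin(angle)))
        i += 1
    return points
```

With Lmax equal to three times Lx and a learning angle of 120 degrees, the sketch produces 9 candidates in 3 directions, the same count as the positions P11 to P19.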

For example, when the machine learning device 20 moves the workpiece 40 to the position P11, it sends an action 58 for loading the workpiece 40 at the position P11 to the control unit 13. Thereby, the loader 36 is moved in the X-axis direction and the Y-axis direction by the position command 53 corresponding to action 58.

The machine learning device 20 completes the operation of moving the workpiece 40 in the X-axis direction and the Y-axis direction if the workpiece 40 can move to the spindle chuck 31A without colliding with the end 30.

The machine learning device 20 may move the workpiece 40 to the positions P11 to P19 in an arbitrary order. For example, the machine learning device 20 may move the workpiece 40 in order of increasing distance from the position P0, such as the position P11, the position P14, the position P17, the position P12, the position P15, the position P18, the position P13, the position P16, and the position P19. The positions in the XY plane are not limited to the 9 positions P11 to P19; the machine learning device 20 may set 8 or fewer positions or 10 or more positions in the XY plane.

Here, the hardware configuration of the numerical control device 10 will be explained. Fig. 7 is a diagram showing an example of a hardware configuration of the numerical control device according to the embodiment.

The numerical control device 10 can be realized by a control circuit 300 shown in Fig. 7, that is, by a processor 301 and a memory 302. Examples of the processor 301 are a CPU (also referred to as a central processing unit, a processing unit, an arithmetic unit, a microprocessor, a microcomputer, a processor, or a DSP (Digital Signal Processor)) and a system LSI (Large Scale Integration). Examples of the memory 302 are a RAM (Random Access Memory) and a ROM (Read Only Memory).

The numerical control device 10 is realized by the processor 301 reading and executing a program for executing the operation of the numerical control device 10 stored in the memory 302. The program may be a program for causing a computer to execute a procedure or a method of the numerical control device 10. The memory 302 is also used for a temporary memory when the processor 301 executes various processes.

Further, the functions of the numerical control device 10 may be partially implemented by dedicated hardware and partially implemented by software or firmware. The machine learning device 20 may be realized by the control circuit 300 shown in fig. 7.

In the present embodiment, it is determined whether or not the delivery position is within the allowable range based on the FB current 55 and the position difference, but it may be determined whether or not the delivery position is within the allowable range based on the position difference without using the FB current 55.

In the present embodiment, the position command 53 for the delivery position is learned based on the FB current 55 in the X-axis direction and the position difference in the X-axis direction, but the position command 53 for the delivery position may be learned based on the position difference in the X-axis direction without using the FB current 55 in the X-axis direction.

In addition, even when the spindle chuck 31 is an electric chuck, the positional deviation of the workpiece 40 can be detected on the spindle chuck 31 side. In this case, the machine learning device 20 can learn the position command 53 based on the positional deviation detected on the spindle chuck 31 side.

In the present embodiment, the case where the machine learning device 20 performs machine learning by reinforcement learning has been described, but the machine learning device 20 may perform machine learning by other known methods, for example, a neural network, genetic programming, functional logic programming, a support vector machine, and the like.

In the present embodiment, the description has been given of the case where the control unit 13 controls the loader 36 based on the action 58 learned by the learning unit 21, but the control unit 13 may control the loader 36 without using the action 58. In this case, the control unit 13 determines whether or not the movement position of the workpiece 40 is within the allowable range based on the position command 53, the FB position 54, and the FB current 55, and outputs a new position command 53 in which the position indicated by the position command 53 is shifted when the position shift amount is not within the allowable range. That is, the control unit 13 slightly shifts the movement position of the workpiece 40 and searches for an appropriate movement position of the workpiece 40. Specifically, the control unit 13 executes the process of determining the position offset and the process of outputting the new position command 53 1 or more times when the position offset is not within the allowable range, thereby converging the position offset within the allowable range.
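The non-learning variant described above amounts to a retry loop that shifts the commanded position until the move is judged to be within the allowable range. The following sketch is only an illustration: `retry_handover`, `try_insert`, and `retract` are hypothetical names, and the callbacks stand in for the machine operations and the allowable-range judgment.

```python
def retry_handover(initial, offsets, try_insert, retract):
    """Try the initial position, then shifted positions, until one succeeds.

    try_insert(target) must return True when the move to target is within
    the allowable range; retract() pulls the workpiece back before a retry.
    Returns (successful position or None, number of attempts made).
    """
    targets = [initial] + [initial + d for d in offsets]
    for attempt, target in enumerate(targets, start=1):
        if try_insert(target):
            return target, attempt
        # The move failed: retract before outputting the next shifted command.
        retract()
    return None, len(targets)
```

A caller would supply `try_insert` as the judgment based on the position command 53, the FB position 54, and the FB current 55, and could pass offsets that are multiples of a small shift amount.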

As described above, according to the embodiment, the position command 53 for suppressing the positional deviation of the transfer position of the workpiece 40 between chucks is learned in accordance with the data set created based on the state variable 56. Therefore, as the transfer operation of the workpiece 40 is repeated, the probability that the transfer of the workpiece 40 fails can be reduced.

The configuration described in the above embodiment is an example of the content of the present invention, and may be combined with other known techniques, and a part of the configuration may be omitted or modified without departing from the scope of the present invention.

Description of the reference numerals

1 work system, 2 machine tool, 3 control system, 10 numerical control device, 11 control program storage unit, 12 analysis unit, 13 control unit, 14 storage unit, 20 machine learning device, 21 learning unit, 22 function update unit, 23 reward calculation unit, 25 state observation unit, 30 end, 31, 31A spindle chuck, 32 loader chuck, 35 rotation unit, 36 loader, 37 drive unit, 38 servo motor, 39 encoder, 40 workpiece, 52 transfer position information, 53 position command, 54 FB position, 55 FB current, 56 state variable, 57 reward, 58 action, 61 load command.

Claims

1. A machine learning device for learning a position command to a drive mechanism that moves a 1st chuck when a workpiece is transferred between the 1st chuck, which grips and conveys the workpiece, and a 2nd chuck, which grips and receives the workpiece,

the machine learning device is characterized by comprising:

a state observation unit that observes the position command to the drive mechanism and feedback data from the drive mechanism as state variables; and

a learning unit that learns the position command for suppressing a positional shift of a transfer position of the workpiece between the 1 st chuck and the 2 nd chuck in accordance with a data set created based on the state variables,

the feedback data includes a feedback position indicating a position of a conveying mechanism that conveys the workpiece,

the learning unit includes:

a reward calculation unit that calculates a reward based on a difference between the position indicated by the position command to the drive mechanism and the feedback position; and

a function updating unit that updates a function for determining the position command based on the reward.

2. A numerical control apparatus, comprising:

the machine learning device of claim 1; and

and a control unit that outputs the position command learned by the learning unit to the drive mechanism.

3. The numerical control apparatus according to claim 2,

the feedback data includes a feedback current which is data representing a current outputted from the driving mechanism in order to move the 1 st chuck,

the reward calculation unit calculates the reward based on the difference and the feedback current.

4. The numerical control apparatus according to claim 3,

the reward calculation unit increases the reward when the difference is equal to or smaller than a threshold value and the feedback current is equal to or smaller than a reference current value, and decreases the reward when the difference is larger than the threshold value or the feedback current is larger than the reference current value.

5. The numerical control apparatus according to claim 3 or 4,

the function updating unit updates an action value table representing the function in accordance with the reward.

6. The numerical control apparatus according to claim 3,

the state observation unit determines that the handover has failed when the difference is greater than a threshold value or the feedback current is greater than a reference current value,

the learning unit learns the position command if the handover is determined to have failed,

if the handover is determined to have failed, the control unit retries the handover by the position command learned by the learning unit.

7. The numerical control apparatus according to any one of claims 2 to 6,

the learning unit learns a position command in a 1st direction perpendicular to the direction of transfer between the 1st chuck and the 2nd chuck among the position commands to the drive mechanism.

8. A machine tool, characterized in that

the machine tool is controlled by the numerical control device according to any one of claims 2 to 7 and is driven by the drive mechanism.

9. A machine learning method for learning a position command to a drive mechanism that moves a 1st chuck when a workpiece is transferred between the 1st chuck, which grips and conveys the workpiece, and a 2nd chuck, which grips and receives the workpiece,

the machine learning method is characterized by comprising:

a state observation step of observing the position command to the drive mechanism and feedback data from the drive mechanism as state variables; and

a learning step of learning the position command for suppressing a positional shift of a transfer position of the workpiece between the 1 st chuck and the 2 nd chuck in accordance with a data set created based on the state variables,

the learning step includes:

a reward calculation step of calculating a reward based on a difference between the position indicated by the position command to the drive mechanism and the feedback position; and

a function updating step of updating a function for determining the position command based on the reward.